THE DEMAND FOR A DISTRIBUTED DATA NETWORK
Today’s business IT environment treats data as the new ‘oil’. Data access and sharing are seen as primary business operations, required to avoid being digitally disrupted by new entrants and competitors. This is possible only through the collection and secure sharing of data among business partners. Dataparency Distributed Data Network (D-DDN™) is the proposed solution. D-DDN is a collection of Dataparency Information Sharing Platform (D-ISP) servers deployed throughout a data network such as the global NGS network of Synadia Communications. NGS provides a global, always-on messaging network that D-DDN uses to provide a data source that is always available from anywhere: a data dial tone for all your applications.
D-DDN AND NATS JETSTREAM – SYNADIA – NGS
Dataparency D-DDN is a Distributed Data Network running on top of a NATS JetStream Network, one of which is Synadia NGS, a global NATS JetStream network. As such, it is addressable from anywhere on the Internet. NATS JetStream is a Publish/Subscribe Streaming Network, first envisioned by Derek Collison of Synadia. NATS provides a location-independent communication channel routed by the topics through which publishers and subscribers communicate. It provided the basis from which D-DDN was conceived.
One can think of a NATS stream in radio broadcasting terms: the publisher is the radio station broadcasting music (the data) on a certain frequency (the topic); the subscriber is a radio tuned to that frequency, which receives the music and can save it (a recording). D-DDN uses these communication ‘channels’ to subscribe to service requests (data queries/posts) and to send responses back through the same channel.
NATS JetStream allows clients to persist data in a stream from which it can later be retrieved. D-DDN persists this data in a form that can also be queried at a later time. It is the flexible query and filtering features that distinguish D-DDN persistence from NATS JetStream’s persistence mechanism. They are also what make D-DDN, within a NATS JetStream Network, such a formidable solution to today’s IT problems.
D-ISP IN A D-DDN NETWORK
Examine the following diagram. It shows the relationship of D-ISP within a D-DDN Distributed Data Network backed by a NATS JetStream Network, for example NGS.
FIGURE 1 REQUEST FLOW
D-ISP is built on an Entity-bound database storing hierarchical JSON documents under a URL path and controlled by a Relationship Distributed Identifier (RDID). This path and the RDID embedded in it are used to retrieve the document stored in the backing KV data store.
Query parameters passed in the request determine what the query should return from the targeted document hierarchy. D-ISP clients make requests either through NATS client function calls to NATS JetStream queues or through HTTPS requests. The examples shown will use the HTTPS interface. Even when requests are made through HTTPS, all Entity/data updates are still distributed throughout the network through NATS JetStream update messages.
An Entity Token, obtained by request, must be supplied in all calls to D-ISP. This token encodes the entity and its passCode and expires after 24 hours by default.
DATA DESIGN DECISIONS
Use of D-ISP starts with the data structure layout. Part of the design phase is to group data elements in their logical groups. This makes queries more efficient and allows discrete access control to groups of like data elements.
It also controls the amount of data returned from the request. With properly designed grouping, there is no need for a graph query such as GraphQL to compose the minimal response.
Take this example: (JSON double quotes skipped)
Data: {
  ⓪Order : {
    ①Header : {
      CustomerNo : 0001,
      Customer : {
        CustomerName : Customer One,
        CustomerStreet : 101 Anystreet,
        CustomerCity : Chicago,
        CustomerState : Illinois,
        CustomerZip : 69039,
        CustomerPhone : (312) 345-0394
      },
      CustomerCreditLimit : 3,500.00
    },
    ②Lines : [
      Line : { LineNo : 1, ItemNo : XDFWE40349, UnitCost : 4.93, Qty : 350, Extension : 1725.50 },
      Line : { LineNo : 2, ItemNo : CDFWE40333, UnitCost : 1356.90, Qty : 3, Extension : 4070.70 }
    ],
    ③Shipping_Info : {
      Shipper : {
        FOB : Chicago, Ill.,
        ShipperID : FTGCART1,
        ShipperName : FTG Cartage,
        . . .
      }
    }
  }
}
Note the following:
- ⓪ The Order object in the document can be retrieved in its entirety by the tag “Order”. D-ISP will return the Order node and all its children in the response.
The query param would be ‘k=Order’. The request URL would be: https://localhost/ourbusiness/customers/<RDID>/orders?k=Order
- ② The Lines grouping can be retrieved through the tag “Lines”. D-ISP will return Lines and all its children, which include each Line and all of its children.
The query param would be ‘k=Lines’. The request URL would be: https://localhost/ourbusiness/customers/<RDID>/orders?k=Lines
- The second line item can be retrieved through the tag “LineNo” and the value “2”.
The query params would be ‘k=LineNo’ and ‘v=2’. The request URL would be: https://localhost/ourbusiness/customers/<RDID>/orders?k=LineNo&v=2
D-ISP will search the document structure to find all LineNo elements with the value “2” and return the matching LineNo 2 nodes as the result.
By default, queries return the targeted element’s parent and all siblings in depth.
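The tag/value lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not D-ISP's implementation: the names `find_by_tag` and `build_query_url` are hypothetical, the RDID in the usage note is a placeholder, and the document is a trimmed copy of the Order example held as an ordinary nested dict (the Line entries are simplified to a plain list). Following the default noted above, a match returns the object that contains the matching element, which carries all of its siblings.

```python
from urllib.parse import urlencode

# Hypothetical, trimmed copy of the Order document shown above.
ORDER = {
    "Order": {
        "Header": {"CustomerNo": "0001"},
        "Lines": [
            {"LineNo": 1, "ItemNo": "XDFWE40349", "Qty": 350},
            {"LineNo": 2, "ItemNo": "CDFWE40333", "Qty": 3},
        ],
    }
}


def find_by_tag(node, tag, value=None):
    """Return every dict that holds `tag`, optionally with a matching value.

    Returning the containing dict mirrors the stated default of returning
    the targeted element's parent together with its siblings.
    """
    hits = []
    if isinstance(node, dict):
        if tag in node and (value is None or str(node[tag]) == str(value)):
            hits.append(node)
        for child in node.values():
            hits.extend(find_by_tag(child, tag, value))
    elif isinstance(node, list):
        for child in node:
            hits.extend(find_by_tag(child, tag, value))
    return hits


def build_query_url(base_url, tag, value=None):
    """Append the k (tag) and optional v (value) query params to a path URL."""
    params = {"k": tag}
    if value is not None:
        params["v"] = value
    return f"{base_url}?{urlencode(params)}"
```

For example, `find_by_tag(ORDER, "LineNo", 2)` yields the second line object, and `build_query_url(".../customers/RDID-PLACEHOLDER/orders", "LineNo", 2)` produces a URL ending in `?k=LineNo&v=2`.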
FLEXIBILITY and EXTENSIBILITY
The ability to specify the URL in the request allows much flexibility in data design and placement. There is no manual, maintenance-intensive specification of tables in a database. Data placement becomes fluid and manageable.
As Microservices can be registered as Entities, they can ‘own’ and manage their data without impacting other services. When they are included in a Container as in Kubernetes, the multiple instances pose no problem as each instance will connect to D-DDN using its acquired Entity Token and RDIDs retrieved or requested from D-DDN.
Multi-tenancy is a no-brainer: just set a different first segment in the path. Further access control can be accomplished through non-evident URL path segments.
This use of an encoding of components of the path and RDID makes for a fluid, unrestricted method for placement of data and removes the requirement for pre-defined schemas, although schema validations can be applied in pre-posting functions. We are developing an IDE (Integrated Development Environment) that will provide the governance and design helpers to make this more intuitive. CI/CD processes can use the output of the IDE to validate the development pipeline and, as security is built into the process, will be capable of performing security audits in the build process.
All data elements and their values are indexed into a dictionary during the parsing process. This allows D-ISP to interrogate the dictionary to exclude documents that do not contain the query target, which allows up to 175,000 document searches per second per server during query processing. It also allows queries for values anywhere in the document, within any data element, matched either exactly or by substring.
Document metadata (tag/value dictionary, timestamps, and IDs) is stored in an array within a path-specific system record in LIFO order. This makes it easy to retrieve either the last document stored or the first. This layout allows queries to run without retrieving non-matching data records, which would be required for non-key queries in SQL/RDBMS. The timestamp within the array can be matched to a timestamp param in the query.
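The dictionary-first filtering described here can be sketched as follows. This is a minimal illustration under assumed names: `PathIndex`, `store`, `candidates`, `latest`, and `first` are hypothetical, and a plain Python dict stands in for the backing KV store. The point it shows is that non-matching documents are ruled out from metadata alone, without loading them.

```python
import time


def _collect(node, tags, key=None):
    """Walk a document and record every tag with the scalar values seen under it."""
    if isinstance(node, dict):
        for k, v in node.items():
            tags.setdefault(k, set())
            _collect(v, tags, k)
    elif isinstance(node, list):
        for item in node:
            _collect(item, tags, key)
    elif key is not None:
        tags[key].add(str(node))


class PathIndex:
    """Hypothetical path-specific system record; newest metadata entry first."""

    def __init__(self):
        self.entries = []  # LIFO order: index 0 is the most recent document
        self.docs = {}     # stand-in for the backing KV data store

    def store(self, doc_id, document, timestamp=None):
        tags = {}
        _collect(document, tags)
        ts = time.time() if timestamp is None else timestamp
        self.entries.insert(0, {"id": doc_id, "ts": ts, "tags": tags})
        self.docs[doc_id] = document

    def candidates(self, tag, value=None):
        """IDs of documents whose dictionary has the tag (and value substring)."""
        ids = []
        for entry in self.entries:
            values = entry["tags"].get(tag)
            if values is None:
                continue  # ruled out without touching the stored document
            if value is None or any(str(value) in v for v in values):
                ids.append(entry["id"])
        return ids

    def latest(self):
        return self.entries[0]["id"]

    def first(self):
        return self.entries[-1]["id"]
```

Only the IDs returned by `candidates` would then need to be fetched and decrypted.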
IMMUTABILITY
All requests to store Entity data result in a new document, entered into LIFO ordered lists, with an optional expiration period. On expiration, the document will be deleted from the data store. As no DELETE action is accepted, non-expired data persists indefinitely and is therefore immutable. This immutability, in conjunction with storing in ordered lists, allows historical auditing of versions of documents within a ‘collection’ of Entity data.
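A minimal sketch of this append-only behavior, using assumed names (`Version`, `Collection`, `post`, `history` are hypothetical, and time is passed in explicitly to keep the example deterministic): every post adds a new version at the head of the list, there is no update or delete method, and only expiration removes a version.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    """One stored document; frozen so it cannot be changed after posting."""
    timestamp: float
    body: Any
    expires_at: Optional[float] = None


class Collection:
    """Hypothetical append-only 'collection' of Entity data for one path."""

    def __init__(self):
        self._versions = []  # newest first (LIFO)

    def post(self, body, now, ttl=None):
        expires_at = None if ttl is None else now + ttl
        self._versions.insert(0, Version(now, body, expires_at))

    def history(self, now):
        """Return all non-expired versions, newest first, for auditing."""
        self._versions = [
            v for v in self._versions
            if v.expires_at is None or v.expires_at > now
        ]
        return list(self._versions)
```

Reading `history` gives the full audit trail of a document's versions; the newest entry is the current state.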
DATA PRIVACY AND SECURITY
When an Entity is registered with the system, it is given a uniquely created private/public key pair. This pair is used to encrypt/decrypt the entity data on disk. The key pair is held within a system record, itself encrypted with the system key pair. Additionally, the key to the KV document in the server data store is constructed by an algorithm that includes multiple hashes of the URL segments, the entity identifier in the system retrieved from the RDID, and a UUID hash that becomes the document ID. For security, there is no possible iteration, i.e. simple numeric incrementation, of the document ID, which would allow traversal of the data store.
These hashes are then hashed to a 128-bit hash, encoded in base-62, and used for the document key. A 128-bit hash allows for roughly 3.4 × 10^38 (340 trillion trillion trillion) possible values.
This key construction/encryption protocol results in a nearly impossible-to-hack data store. Even if hackers gain access to the server data store (the server must be down to do so), without knowledge of the components that made up the original request and the proprietary hash they have no way to locate the data, and without the encryption key they cannot use it. Therefore all data must be accessed through the D-ISP server, with no end runs to bypass its security & privacy controls.
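The shape of this key construction can be sketched as below. D-ISP's hash is proprietary and its exact algorithm is not given here, so this sketch substitutes BLAKE2b truncated to 128 bits purely as a stand-in; the function names, the order in which components are combined, and the base-62 alphabet are all assumptions.

```python
import hashlib
import uuid

# Assumed base-62 alphabet; the actual ordering used by D-ISP is not stated.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def h128(data: bytes) -> bytes:
    """Stand-in 128-bit hash (BLAKE2b), NOT the proprietary D-ISP hash."""
    return hashlib.blake2b(data, digest_size=16).digest()


def document_key(path: str, entity_id: str, doc_uuid: uuid.UUID) -> str:
    """Hash each URL segment, the entity ID, and the UUID, then hash the result."""
    parts = [h128(seg.encode()) for seg in path.strip("/").split("/")]
    parts.append(h128(entity_id.encode()))
    parts.append(h128(doc_uuid.bytes))
    combined = h128(b"".join(parts))
    return base62(int.from_bytes(combined, "big"))
```

Because the document ID comes from a random UUID rather than a counter, two documents posted to the same path by the same entity get unrelated keys, and a 128-bit value never needs more than 22 base-62 characters.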
Data queries generally require possession of an approved RDID. This approval is given by the targeted Entity or its proxy. RDIDs are specific to an Entity and its requestor and cannot be used to retrieve private data of other Entities.
Data can be marked ‘private’ or ‘public’. Private data can only be retrieved through an appropriate RDID supplied in the request. Public data can be accessed without an RDID as long as the data was stored under an RDID to which the caller is a party.
Further data element access control is offered through the use of ‘Access Templates’. These are system documents stored with the server that describe the roles and group memberships that must have been assigned to the Entity that is requesting the data element. Without meeting the requirements, no data will be returned. Entity roles and group memberships can be added or removed at any time by a system administrator Entity.
Using grouping as described above, individual elements or whole groups of elements can be ‘redacted’ from values returned from a query using their tag or key. This offers great flexibility in managing data privacy. These requirements are stored with the document so they are bound to the data elements at time of storage. Access Templates are immutable and there can be only one instance in the system. This removes the ability of a hacker to change permissions by changing the template to gain access to protected data.
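Tag-based redaction under an Access Template might look like the following sketch. The template layout (a mapping from tag to required roles and groups), the names `allowed` and `redact`, and the role and group names are all hypothetical; the actual Access Template format is not described here.

```python
# Hypothetical Access Template: tag -> roles/groups the requesting Entity must hold.
TEMPLATE = {
    "CustomerCreditLimit": {"roles": {"credit"}},
    "Customer": {"groups": {"sales"}},
}


def allowed(rule, roles, groups):
    """True when the Entity holds every role and group the rule requires."""
    return rule.get("roles", set()) <= roles and rule.get("groups", set()) <= groups


def redact(node, template, roles, groups):
    """Return a copy of the document with unauthorized tags (and subtrees) removed."""
    if isinstance(node, dict):
        return {
            key: redact(value, template, roles, groups)
            for key, value in node.items()
            if key not in template or allowed(template[key], roles, groups)
        }
    if isinstance(node, list):
        return [redact(item, template, roles, groups) for item in node]
    return node
```

Redacting by a group tag such as `Customer` removes the whole group of elements beneath it, which is why the data grouping discussed earlier matters for privacy control.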
HASHING
Most hashing within D-ISP operations is done with a proprietary, fixed-width hash of either 64 or 128 bits. Its distinctive property is that hashes can be added to or subtracted from one another to produce a resulting hash. This resulting hash is then used, with the system hash, to verify that an appropriate hash was passed to the system. It also obscures internal values from hackers. Hashes of lengths greater than 128 bits can be accommodated if higher security is required.
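The add/subtract property can be illustrated with ordinary modular arithmetic. This is not the proprietary D-ISP hash: the sketch assumes a 64-bit BLAKE2b digest as a stand-in and uses addition modulo 2^64 as one possible way to make hashes combinable. The names `h64`, `blind`, and `verify` are hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep every result within a fixed 64-bit width


def h64(text: str) -> int:
    """Stand-in 64-bit hash (BLAKE2b), NOT the proprietary D-ISP hash."""
    digest = hashlib.blake2b(text.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def add(a: int, b: int) -> int:
    return (a + b) & MASK64


def sub(a: int, b: int) -> int:
    return (a - b) & MASK64


def blind(value: str, system_secret: str) -> int:
    """Combine a value's hash with the system hash so the raw hash is never exposed."""
    return add(h64(value), h64(system_secret))


def verify(presented: int, expected_value: str, system_secret: str) -> bool:
    """Subtract the system hash and compare with the hash of the expected value."""
    return sub(presented, h64(system_secret)) == h64(expected_value)
```

Because addition and subtraction are inverses at a fixed width, the server can recover and check the original hash, while a party without the system hash sees only the combined value.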
MULTI-SERVER SYNCHRONIZATION UPDATES IN A DISTRIBUTED NETWORK
D-DDN has provisions for keeping the D-ISP servers’ data in synchronization using NATS JetStream queues. Entities, Relationships (RDIDs), Access Templates, and Entity Data are kept in sync through the publishing of updates to specific topic queues that each server listens on. When a server receives an update message on a particular queue topic, it updates that data in the appropriate area of its data store. This is especially efficient for Entity Data, as there can only be one instance of a document. The document is simply inserted at the appropriate path with its metadata, as it has already been parsed and encrypted with the Entity’s public key. The system documents relating to Entities, RDIDs, and Templates have also been encrypted with the server’s system public key, shared by all servers, making them unusable by other parties.
These updates are stored in NATS JetStream durable file-based streams that hold all messages on that stream. These file streams allow D-DDN network D-ISP instances to recover lost or missing updates and allow a D-ISP instance to recreate the entire data store from the file streams of each update topic area.
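The recovery behavior described above can be modeled without a live NATS server. The sketch below is an in-memory simulation only: `UpdateStream` stands in for a durable file-based JetStream stream, `Server` for a D-ISP instance, and the subject names are hypothetical. It makes no NATS client calls and folds all update topics into a single sequence for brevity.

```python
class UpdateStream:
    """In-memory stand-in for a durable stream that keeps every message."""

    def __init__(self):
        self.messages = []

    def publish(self, subject, key, payload):
        seq = len(self.messages) + 1
        self.messages.append((seq, subject, key, payload))
        return seq

    def replay(self, after_seq=0):
        """Return all retained messages with a sequence number above after_seq."""
        return [m for m in self.messages if m[0] > after_seq]


class Server:
    """Stand-in for a D-ISP instance applying update messages to its store."""

    def __init__(self):
        self.store = {}
        self.last_seq = 0

    def apply(self, message):
        seq, subject, key, payload = message
        if seq <= self.last_seq:
            return  # already applied; replaying is harmless
        self.store[(subject, key)] = payload
        self.last_seq = seq

    def catch_up(self, stream):
        """Recover missed updates, or rebuild everything when last_seq is 0."""
        for message in stream.replay(self.last_seq):
            self.apply(message)
```

A server that missed messages calls `catch_up` to fill the gap, and a brand-new server calling it from sequence 0 ends up with the same store as its peers.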
As data is backed up by the multiple servers in a D-DDN Network running on a NATS JetStream Network, there is little chance for a hacker or ransomware attacker to disrupt access to the data of the D-DDN Network. It will always be recoverable if there is an operating NATS JetStream server.
GEOLOCATION, GOVERNMENTAL JURISDICTIONS, & SERVER PLACEMENT
Within D-DDN, D-ISP server instances and their attached data stores can be placed in geolocations to comply with governmental requirements. As Entities are assigned to NATS JetStream topic queues by geolocation when registered, privacy and user data can be provided within the requirements of the controlling jurisdiction policies, e.g. GDPR or CCPA/CPRA. And with RDID revocation, the ‘right to be forgotten’ can easily be accommodated. As RDIDs are specific to individual entities, an entity can ‘forget’ an entity and lose access to its data while leaving the data and other data access alone. Whereas other methods require deletion of data from the forgetting entity, D-ISP’s controls allow the data to still exist and be usable by other approved entities. The ‘forgetting’ entity just doesn’t have access to it anymore, anywhere in the network.
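The difference between revoking an RDID and deleting data can be shown with a small model. Everything here is a hypothetical sketch: the `RelationshipRegistry` class, its methods, and the entity and RDID names are invented for illustration and do not reflect D-ISP's internal records.

```python
class RelationshipRegistry:
    """Hypothetical model: RDIDs link a data-owning Entity to one requestor."""

    def __init__(self):
        self._rdids = {}  # rdid -> (owner, requestor)
        self._data = {}   # owner -> list of stored documents

    def grant(self, rdid, owner, requestor):
        self._rdids[rdid] = (owner, requestor)

    def revoke(self, rdid):
        """Remove the relationship only; the owner's data is left untouched."""
        self._rdids.pop(rdid, None)

    def put(self, owner, document):
        self._data.setdefault(owner, []).append(document)

    def read(self, rdid, requestor):
        link = self._rdids.get(rdid)
        if link is None or link[1] != requestor:
            raise PermissionError("no approved RDID for this requestor")
        return list(self._data.get(link[0], []))
```

After `revoke`, the forgetting requestor can no longer read through that RDID, while any other requestor holding its own approved RDID for the same owner continues to see the same data.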
Additionally, server topic queue allocation can be used to provide more restrictive access and data location to enterprise data. Entity tokens can then be used to allocate access to sensitive enterprise resources without mixing access privileges with less permissioned entities.
The use of NATS JetStream queues to connect with D-ISP servers in the D-DDN Network means there is no host locating process required by the client. NATS JetStream will route the request to whatever server is operating wherever it is and as all queue servers are Entity and data synchronized, any server can process the request. This obviates the typically manual maintenance overhead generally required in a distributed system.
SCALABILITY CONCERNS
D-ISP running within a D-DDN Network is a highly scalable offering. As all data is accessed by the Entity ID encoded within the RDID passed in the request, the D-ISP server instance’s request queue topic, stored in the Entity system record, can be used to route the request to the appropriate server queue group using NATS JetStream’s load-balanced topic routing. Scalability can therefore be achieved by creating appropriate instances of D-ISP servers ‘listening’ on entity-specific topic queues. Many topologies can be implemented: serverless with one server per Entity, serverless with one instance for multiple Entities, serverless instance(s) serving Entities within a governing jurisdiction, and many other possibilities. NATS JetStream topic queues map Entity data to D-ISP instances serving that Entity’s data.
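The routing idea can be sketched as a small simulation. It assumes hypothetical subject names such as `disp.eu.requests`, and it uses a simple round-robin as a stand-in for the load balancing that NATS performs across members of a queue group; no NATS client calls are made.

```python
import itertools


class QueueGroup:
    """Stand-in for servers sharing one request subject as a queue group."""

    def __init__(self, subject, servers):
        self.subject = subject
        self._next_server = itertools.cycle(servers)

    def deliver(self):
        """Pick one member to handle the request (round-robin stand-in)."""
        return next(self._next_server)


class Router:
    """Maps an Entity to the request subject held in its system record."""

    def __init__(self):
        self.groups = {}
        self.entity_subject = {}

    def add_group(self, subject, servers):
        self.groups[subject] = QueueGroup(subject, servers)

    def register_entity(self, entity_id, subject):
        self.entity_subject[entity_id] = subject

    def route(self, entity_id):
        subject = self.entity_subject[entity_id]
        return subject, self.groups[subject].deliver()
```

Adding capacity for a busy Entity or jurisdiction then amounts to adding another server to the relevant group; clients keep addressing the same subject.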
With each D-ISP server capable of performing up to 145 fully indexed 1k document writes per second and up to 175,000 reads per second, you can do the math to see the possible total throughput of a D-DDN Network.
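As a rough worked example of that math, the sketch below multiplies out the per-server figures quoted above. It rests on two assumptions that are not stated in the text: reads scale linearly with the number of servers, and every server in a synchronized queue group applies every write, so write capacity grows with the number of independent groups rather than with the number of servers. Network and synchronization overhead are ignored, so the results are upper bounds.

```python
READS_PER_SERVER = 175_000  # reads per second per server, as quoted above
WRITES_PER_SERVER = 145     # fully indexed 1k document writes per second


def network_throughput(groups: int, servers_per_group: int):
    """Upper-bound (reads/s, writes/s) for a network of synchronized groups."""
    reads = groups * servers_per_group * READS_PER_SERVER
    writes = groups * WRITES_PER_SERVER
    return reads, writes
```

Under these assumptions, four entity-specific queue groups of three servers each would top out at 2,100,000 reads and 580 writes per second.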
TRANSACTION CONTROLLING OPERATIONS
D-ISP has two features that can be used to provide distributed transaction processing even though it does not have a true distributed transaction process. D-ISP POSTs allow setting two parameters that can be used to implement a transaction control protocol.
The first is the ability to set an expiration TTL (Time to Live) for a document. This allows setting, say, a TTL of 30 seconds for a ‘locking document’, after which the document will be deleted. This is done in a ‘lazy’ manner: document expiration and deletion are only performed on the next read/write access for that path and entity. This feature can be used to post an entity- and path-specific ‘lock’ on a path which will time out within a specified interval. These lock documents can be used by an external process to orchestrate a ‘saga’, a distributed transaction controlling multiple steps where each step may be undone, thereby invalidating the entire transaction and requiring rollbacks.
Additionally, a second feature allows retrieval of the latest document in the LIFO list. This allows the most recently posted locking document to be retrieved to determine locking status.
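These two features can be combined into a simple lock protocol, sketched below under stated assumptions. `LockPath` and `try_lock` are hypothetical names, a Python list stands in for the LIFO document list at one path, and time is passed in explicitly. The check-then-post sequence shown here is not atomic across independent clients, so a real saga orchestrator would still need to handle races.

```python
class LockPath:
    """Hypothetical lock path: lock documents kept newest first, with TTLs."""

    def __init__(self):
        self._docs = []

    def _purge(self, now):
        # Lazy expiration: expired documents are dropped on the next access.
        self._docs = [d for d in self._docs if d["expires_at"] > now]

    def latest(self, now):
        """Return the most recent non-expired lock document, if any."""
        self._purge(now)
        return self._docs[0] if self._docs else None

    def try_lock(self, owner, now, ttl=30.0):
        """Post a lock document unless another owner holds a live lock."""
        current = self.latest(now)
        if current is not None and current["owner"] != owner:
            return False
        self._docs.insert(0, {"owner": owner, "expires_at": now + ttl})
        return True
```

If a saga step crashes while holding the lock, no cleanup is needed: the lock document simply expires and the next caller can proceed.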
USE CASES
There are many applications where D-DDN’s features provide value. Just a few:
• As a data platform for sharing between business partners
o Cooperatives
o Business Groups
o B2B processing; Supply Chain, Logistics, Orders, Invoicing, etc.
• Sharing patient user data; Healthcare, wearables, sensors, etc.
o Physicians
o Hospitals
o Govt. Agencies
o Caregivers
• Sharing Military deployments, equipment locations and transport, sensors, etc.
o On land, in the air, and in space
o Predictive maintenance
• Securing CI/CD resource data pipelines for hack-resistant production builds.
o No SolarWinds type secondary supply chain hacks
o Maintain secure version metadata of Open Source dependencies
• Securing ICAM (Identity, Credentials, and Access Management) framework.
o Providing Zero Trust implementation
o Providing secure private key storage
o Granular access, at all levels, no limits, minimum maintenance required
• IoT and Edge Computing
o Ability to place data where it is best used. Attached to a NATS Leaf node
o Small footprint (~10Mb) allows D-ISP server instance to run in edge device
o Edge D-ISP instance can propagate data to central cloud
o Interrogate Edge data directly with on-device analytics/AI
• 5G and Edge Computing
o Allows offering security/data service to 5G/Virtual Private Networks
o Allow secure manufacturing 4.0 implementation over 5G wireless
o Potential to add security layer to subscriber services
• NFT-like ownership platform