Deep dive
DHT stands for Distributed Hash Table, which is a type of distributed key-value database. In this system, each member of the network can store information, such as details about themselves.
The implementation of DHT in TON is similar to the Kademlia protocol, which is also used in IPFS.
Any network participant can operate a DHT node, generate keys, and store data. To do this, they need to create a random ID and inform other nodes about their presence.
An algorithm determines the "distance" between the node and the key, which helps identify which node should store the data. The algorithm is straightforward: it takes the node's ID and the key's ID and performs the XOR
operation. A smaller resulting value indicates a closer proximity between the node and the key.
The goal is to store the key on nodes that are as close as possible to the key so that other network participants can, using the same algorithm, easily locate a node that can provide data associated with that key.
Finding a value by key
Let's examine an example involving a search for a key: connect to any DHT node and establish a connection via ADNL UDP.
Suppose we want to find the address and public key needed to connect to the node hosting the foundation.ton
site.
Assuming we have already obtained this site's ADNL address by executing the "get method" of the DNS contract, the ADNL address in hexadecimal format is 516618cf6cbe9004f6883e742c9a2e3ca53ed02e3e36f4cef62a98ee1e449174
.
Our objective is to determine the IP address, port number, and public key of the node associated with this address.
To achieve this, we first need to get the ID of the DHT key. We will begin by populating the DHT key schema:
dht.key id:int256 name:bytes idx:int = dht.Key
The term name
refers to the type of key. For ADNL addresses, the term address
is used. For instance, when searching for shard nodes, the term nodes
is used. However, the key type can vary and may consist of any array of bytes, depending on the specific value you are seeking.
By applying this schema, we get:
8fde67f6 -- TL ID dht.key
516618cf6cbe9004f6883e742c9a2e3ca53ed02e3e36f4cef62a98ee1e449174 -- the ADNL address being searched
07 61646472657373 -- key type, the word "address" as a TL array of bytes
00000000 -- index 0 because there is only 1 key
Next, retrieve the key ID and the SHA-256 hash from the bytes serialized above. It will be b30af0538916421b46df4ce580bf3a29316831e0c3323a7f156df0236c5b2f75
.
Now we can begin our search. To do this, we need to execute a query that uses the following schema:
dht.findValue key:int256 k:int = dht.ValueResult
The key
represents the ID of our DHT key, while k
indicates the "width" of the search. A smaller value for k
results in a more accurate search but limits the number of potential nodes to query. In TON, the specific value of k
depends on the implementation.
Now, let's populate this structure, serialize it, and send the request using the adnl.message.query
schema. For more details, please refer to the documentation here.
In response, we can get:
dht.valueNotFound
- if the value is not found.dht.valueFound
- if the value is found on this node.
dht.valueNotFound
If we receive dht.valueNotFound
, the response will include a list of nodes that are known to the node we queried and as close as possible to the key we requested. In this situation, we need to connect to these received nodes and add them to our list of known nodes.
Afterwards, we will select the closest, accessible nodes that have not yet been queried from our entire list of known nodes and send the same request to one of them. We will continue this process until we have tried all the nodes within our chosen range or until we stop receiving new nodes.
Let’s analyze the response fields and the schemas used in more detail:
adnl.address.udp ip:int port:int = adnl.Address;
adnl.addressList addrs:(vector adnl.Address) version:int reinit_date:int priority:int expire_at:int = adnl.AddressList;
dht.node id:PublicKey addr_list:adnl.addressList version:int signature:bytes = dht.Node;
dht.nodes nodes:(vector dht.node) = dht.Nodes;
dht.valueNotFound nodes:dht.nodes = dht.ValueResult;
dht.nodes -> nodes
- list of DHT nodes (array).
Each node has an id
, which serves as its public key, typically represented as pub.ed25519. This key is used to connect to the node via ADNL. Additionally, each node contains a list of addresses, addr_list:adnl.addressList
, along with its version and signature.
We need to verify the signature of each node. To do this, we first read the value of the signature
field and then set it to zero, effectively making it an empty byte array. Next, we serialize the TL structure dht.node
using this empty signature and check the signature
field that we emptied earlier.
We validate the serialized bytes using the public key from the id
field. [Please see implementation example].
From the list addrs:(vector adnl.Address)
, we select an address and attempt to establish an ADNL UDP connection, using id
(the public key) as the server key.
To determine the "distance" to this node, we retrieve the key ID from the id
field and calculate the distance using the XOR
operation between the node's key ID and the desired key. If the distance is small enough, we can make the same request to this node. This process continues until we find a value or run out of new nodes.