This project implements a Kademlia Distributed Hash Table (DHT) for storing and retrieving key-value pairs across a decentralised network
This project focuses on a simplified Kademlia implementation demonstrating core concepts. While a real-world Kademlia Paper uses 160-bit identifiers, I'm using 6 bits for easier testing (32 possible nodes).
- Scalability - Efficiently in storage we can distribute data across all nodes
- Decentralization - maintaining p2p network (trustless)
- Fault Tolerance
- Node Identification
- Distance Metric (Which Node should own <k,v> pair ?)
- Routing Storing Adjacent Node
- Routing (where to find <k,v>?)
- Store values in Nodes
Each node in the network receives a unique 6-bit identifier (000000 to 111111).
In a real-world scenario, cryptographic hashing algorithms like SHA-1 would generate these unique IDs.
Kademlia uses the XOR (^) operation to calculate the distance between node IDs. It adheres to these properties:
d(x, x) = 0
(distance to itself is zero)d(x, y) > 0
(distance is positive)d(x, y) d(y, z) >= d(x, z)
(triangle inequality)
- This metric helps determine which node should store a key-value pair based on the key's distance to node IDs. The node with the closest distance "owns" the key-value pair.
Similarly Let's take <k,v> pair and calculate the distance between <k,v> and n0,n1.....n31 nodes . Node with less distance we will store <k,v> with Node
Observation: the most common Prefix of key and shorter distance
how would node 7 find what’s the value at key 13?
Every node will have a routing table with IP addresses and IDs of at least “K” other active nodes in a prefix range
For Example if the node identifier size is 4 bit then the total size of network 2^4 = 16
So every node should store k buckets i.e 4 buckets in this case
So 1111 i.e 15 nodes should know 4 nodes in different subtrees with distance 1,2,4,8 so k-buckets for node 15 will be 14,13,11,7
A node checks its k-buckets and chooses the node with the least distance to the key. It then routes the request from one node to another until it reaches the node that "owns" the <k,v> pair.
It will check for k-buckets and choose the least distance with key and routes from one node to another node
Peer GUID | 2^0 = 1 | 2^1 = 2 | 2^2 = 4 | 2^3 = 8 |
---|---|---|---|---|
0 | 1 | 2 | 4 | 8 |
1 | 0 | 3 | 5 | 9 |
2 | 3 | 0 | 4 | 10 |
3 | 2 | 1 | 7 | 11 |
4 | 5 | 6 | 0 | 12 |
5 | 4 | 7 | 1 | 13 |
6 | 7 | 4 | 2 | 14 |
7 | 6 | 5 | 3 | 15 |
8 | 9 | 10 | 12 | 0 |
9 | 8 | 11 | 13 | 1 |
10 | 11 | 8 | 14 | 2 |
11 | 10 | 9 | 15 | 3 |
12 | 13 | 14 | 8 | 4 |
13 | 12 | 15 | 9 | 5 |
14 | 15 | 12 | 10 | 6 |
15 | 14 | 13 | 11 | 7 |
Once the node that owns the <k,v> pair is identified, the value can be stored on that node. Kademlia also implements mechanisms to maintain the routing table and k-buckets as the network evolves.
- GET
/api/ping
- Checks if the current node is reachable and current Buckets
- GET
/api/pingServer
- Pings a specific node by its port number
- GET
/api/get/:key
- Retrieves the value associated with a key from its own network.
- GET
/api/save/:key/:value
- Initiates storing a key-value pair in the associated node
- GET
/api/findValue/:key
- Locates the value of key responsible for storing a key's data.
- GET
/api/findNode/:key
- Finds the closest node responsible for a key (not necessarily storing the data)
I have tested the current algorithm with 32 nodes i.e 6 Bit Node identifier, view all test cases src/test/kademila.test.ts