To design a parallel distributed key-value store using consistent hashing on a cluster of Raspberry Pis.


Our key-value store supports the following operations:

Distributing the database across several nodes provides multi-faceted advantages:

The basic architecture consists of a central coordinator and several storage nodes. The clients send requests to the coordinator which routes the requests to appropriate storage nodes. Request routing decisions are taken based on a consistent-hashing ring. The central coordinator has information about all the storage nodes present in the system. Storage nodes store the key-value pairs assigned to it in-memory and service requests routed to it. We'll be programming our project in Go as it enables parallelism via low-cost goroutines. The GoRPC package spawns a new goroutine for each request which enables parallelism across multiple requests.

Claim: The simple key-value system store does not have compute intensive tasks therefore it is wasteful to use beefy compute nodes to implement the system as they are very power hungry.

We plan to build this system using wimpy compute nodes (Raspberry Pis) which are power efficient. We want to benchmark the system and comment if it is useful to build a key-value store using Raspberry Pis. We also would like to perform dynamic scaling of the storage nodes in the system as the in-memory data structure on individual nodes start to bloat. We plan to use consistent hashing to redistribute the data among nodes.


A few challenges that we foresee:




A few addons that we want to explore if time permits:


We hope to demo a working model of this proposed small-scale key-value architecture. We would demonstrate all the features guaranteed by our system namely put, get and delete. We also hope to compare the performance of this distributed key-value store against a single node version of it.



Raspberry Pis are cheap and power efficient and provides fast access to persistent storage (flash vs disks). Pis are readily accessible and therefore we hope that our project can be used by other students to play around with simple distributed systems for academic purposes. Golang is used as it's an effective systems programming language to design thread-safe efficient parallel distributed systems.



Performance comparison:

We tried running a workload using 1024 requests of 128 byte values. This workload was split across clients ranging from 5 to 100 and was run both on Raspberry Pis and Andrew Machines using various configurations like: 1 master 2 workers, 1 master 3 workers and 1 master 4 workers. We observed that Pis scaled poorly because of a single master but the maximum latency was still below 100ms. For any decent web application, this is more than acceptable. As a result, Pis could be an acceptable replacement for disk based powerful compute servers for such applications.

Performance comparison between Raspberry Pis and Andrew Machines:

One of the main motivations behind using Raspberry Pis was that they are power efficient. We tried modeling the performance per watt graphs for both Raspberry Pis and Andrew machines to compare their relative power efficiencies. The results are as follows:

We also observed that increasing the size of values stored in the key-value store gives similar performance upto Values of size 1KB. Above that, especially on higher sizes like 1MB values, continuously streaming data increases average latency upto 2400 ms. This is definitely not acceptable especially for a web application which requires latencies in the order of 100ms. This is mainly due to the NIC of the single master getting overwhelmed by transferring 1MB of data per client request as all requests are streamed from the worker->master->client. Our findings are shown below:

To improve the latency for big data, we removed master from the critical path of data movement. In the new system, the master would just forward the ip address and port number of the worker chosen to handle the request to the client. The client would then establish a new connection with the worker and get the data directly from it. This optimization reduces data movement as master is no longer needed to download data from the worker and stream to the client. This also allows requests to different workers to be served in parallel by removing the master's NIC in the data path which was a bottleneck. We tried the same workload as before with 1 master and 2 workers with 256 STORE requests of 1MB values. We chose nodeIDs in advance such that 128 requests went to each worker. The readings are shown below in comparison to the previous latency values:



So far we have achieved the following:



Our proposed goals were to implement a Key-Value store with GET, PUT and DEL API support. We were hoping to demonstrate the dynamic scale-in/scale-out of the nodes using consistent hashing as the load on the system increases or decreases. This load can be described along two axes:

As we scale-in/scale-out the cluster of Pis we were planning to compare its perf/watt and perf/dollar to a regular commercial machine.


Our goals for the project are pretty much the same as we had initially proposed. Only thing changed is that initially we were hoping to perform dynamic scaling of the system but we don't think we will be able to write an efficient implementation (without a big lock) in the given amount of time. In context of what would be feasible to demo on the project presentation day we think we will demonstrate working model of our system. We plan to show graph of average response times as the number of clients grow, performance/watt and performance/dollar. For the performance/watt we, for now, plan to use the power numbers that are mentioned in the hardware specification document of the concerned machines that we are using. To perform a live demo we need to go forward with the assumption that the IP address of the master node is known to all the slave nodes. This is one addition to our initial goals that we hope to implement a very basic slave node discovery protocol which will run on the master. We don’t know if we will be to implement this given the amount of time that we have but as a fallback we will be using static IP address or a configuration file which will contain the IP address of all the nodes so that the communication between nodes is easy.