Requirements

  • Size of a key-value pair is small - less than 10 KB
  • Ability to store big data
  • High availability - System responds quickly, even during failures
  • High scalability - System can be scaled to support large data
  • Automatic scaling - Addition/Deletion of servers should be automatic based on traffic
  • Tunable consistency
  • Low latency

We will be designing an AP system (prioritizing availability and partition tolerance over strong consistency)

Partitioning of data

  • To store big data, we will need to partition the data into smaller partitions and store them in multiple servers
  • Consistent hashing can be used to partition the data as it provides these benefits:
    • Evenly distribute data
    • Minimize data movement when nodes are added/removed
    • Virtual nodes can be used to handle heterogeneous server capacity: servers with higher capacity can be assigned more virtual nodes (see the sketch after this list)
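
Below is a minimal Python sketch of consistent hashing with virtual nodes. The server names, the MD5 hash, and the virtual-node counts are illustrative assumptions, not part of the original notes; a production store would typically use a faster hash and add rebalancing logic on top.

```python
import hashlib
from bisect import bisect_right, insort

def _hash(key: str) -> int:
    # MD5 gives a stable, well-spread integer position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, vnodes_per_node: int = 100):
        self.vnodes_per_node = vnodes_per_node
        self._ring: list[int] = []          # sorted virtual-node positions
        self._owners: dict[int, str] = {}   # position -> physical node name

    def add_node(self, node: str, weight: int = 1) -> None:
        # Higher-capacity servers get proportionally more virtual nodes.
        for i in range(self.vnodes_per_node * weight):
            pos = _hash(f"{node}#vnode-{i}")
            insort(self._ring, pos)
            self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        # Only this node's virtual nodes move; keys owned by others stay put.
        for pos in [p for p, n in self._owners.items() if n == node]:
            self._ring.remove(pos)
            del self._owners[pos]

    def get_node(self, key: str) -> str:
        # Walk clockwise: the first virtual node at or after the key owns it.
        idx = bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owners[self._ring[idx]]

ring = ConsistentHashRing()
ring.add_node("server-a")
ring.add_node("server-b", weight=2)   # twice the capacity -> twice the vnodes
print(ring.get_node("user:42"))       # prints whichever server owns the key
```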

Ensuring high availability and reliability

Assuming that we are using consistent hashing

  • To ensure high availability and reliability, data needs to be replicated across N servers, where N is a configurable parameter
  • Replica placement is easy on the hash ring: walk clockwise from the key's position and choose the first N distinct physical nodes (a sketch follows this list)
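
A minimal sketch of replica placement, reusing the ConsistentHashRing and _hash helper from the earlier sketch; the function name and the N = 3 default are assumptions for illustration.

```python
from bisect import bisect_right

def get_replica_nodes(ring: ConsistentHashRing, key: str, n: int = 3) -> list[str]:
    """Walk clockwise from the key's hash position and collect the first
    n distinct physical nodes; virtual nodes belonging to the same server
    are skipped so replicas land on different machines."""
    start = bisect_right(ring._ring, _hash(key))
    replicas: list[str] = []
    for i in range(len(ring._ring)):
        owner = ring._owners[ring._ring[(start + i) % len(ring._ring)]]
        if owner not in replicas:
            replicas.append(owner)
        if len(replicas) == n:
            break
    return replicas

print(get_replica_nodes(ring, "user:42", n=2))   # e.g. ['server-b', 'server-a']
```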

Taking care of consistency

  • Since data is now replicated across nodes, it must be synchronized across all replicas to ensure high consistency.
  • We can use quorum consensus to guarantee consistency for both read and write operations: with N replicas, a write quorum W, and a read quorum R, choosing W + R > N guarantees that reads see the latest acknowledged write (a sketch follows this list)
  • In this design, we aim for eventual consistency (W + R ≤ N), which favors availability
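
A minimal sketch of the quorum arithmetic. The replica objects with write/read methods and a version attribute are hypothetical stand-ins for real replica clients; N, W, and R are the tunable parameters described above.

```python
N = 3  # replicas per key
W = 2  # a write succeeds once W replicas acknowledge it
R = 2  # a read succeeds once R replicas respond

# With W + R > N, the read and write quorums overlap, so every read
# reaches at least one replica that holds the latest acknowledged write.
assert W + R > N

def quorum_write(replicas, key, value) -> bool:
    acks = sum(1 for rep in replicas if rep.write(key, value))
    return acks >= W

def quorum_read(replicas, key):
    responses = [rep.read(key) for rep in replicas]
    responses = [v for v in responses if v is not None]
    if len(responses) < R:
        raise TimeoutError("read quorum not reached")
    # Pick the freshest value; 'version' could be a counter or a vector clock.
    return max(responses, key=lambda v: v.version)
```

Tuning W and R is the knob behind the "tunable consistency" requirement above: lowering them (e.g. W = 1 or R = 1) returns faster and stays available during failures, at the cost of possibly stale reads.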

Resolving conflicts/inconsistencies

  • Replication gives high availability but causes inconsistencies among replicas
  • We will use versioning and vector clocks to detect and resolve inconsistencies (a sketch follows this list)
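
A minimal vector-clock sketch, assuming a clock is a plain dict mapping a server name to a counter; the helper names and the Sx/Sy example are illustrative assumptions.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a: dict, b: dict) -> bool:
    """True if clock `a` is equal to or newer than clock `b` on every node."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"       # b is an ancestor of a; b can be discarded
    if descends(b, a):
        return "b is newer"
    return "conflict"             # concurrent writes; client must reconcile

# D1 was written twice via server Sx, while D2 branched off the first
# write and was then updated via server Sy: the versions are concurrent.
d1 = increment(increment({}, "Sx"), "Sx")   # {"Sx": 2}
d2 = increment(increment({}, "Sx"), "Sy")   # {"Sx": 1, "Sy": 1}
print(compare(d1, d2))                      # "conflict"
```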

Handling failures

High level architecture

  • Clients communicate with the key-value store through two APIs: get(key) and put(key, value) (a sketch tying these pieces together follows this list)
  • Coordinator is a node that acts as a proxy between the client and the key-value store
  • Nodes are distributed on a hash ring
  • System is completely decentralized - adding/removing nodes can be automatic
  • Data is replicated at multiple nodes
  • No single point of failure as every node has the same set of responsibilities
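
A minimal end-to-end sketch of the get/put APIs and the coordinator role, reusing ConsistentHashRing, get_replica_nodes, W, and R from the sketches above. The Node and Coordinator classes are illustrative assumptions; in the actual design any node can act as coordinator, so there is no dedicated proxy to fail.

```python
class Node:
    """One storage node; holds its shard of key-value pairs in memory."""
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        self.store[key] = value
        return True

    def read(self, key: str):
        return self.store.get(key)

class Coordinator:
    """Maps a key to its replicas on the hash ring and forwards the
    client's get/put to them, applying the quorum rules from above."""
    def __init__(self, ring: ConsistentHashRing, nodes: dict[str, Node], n: int = 3):
        self.ring, self.nodes, self.n = ring, nodes, n

    def put(self, key: str, value: str) -> bool:
        owners = get_replica_nodes(self.ring, key, self.n)
        acks = sum(1 for name in owners if self.nodes[name].write(key, value))
        return acks >= W   # write quorum

    def get(self, key: str):
        owners = get_replica_nodes(self.ring, key, self.n)
        values = [self.nodes[name].read(key) for name in owners]
        values = [v for v in values if v is not None]
        # Return the first value for brevity; a real store would reconcile
        # diverging versions with vector clocks before answering.
        return values[0] if len(values) >= R else None

nodes = {name: Node(name) for name in ("server-a", "server-b")}
coord = Coordinator(ring, nodes, n=2)
coord.put("user:42", "alice")
print(coord.get("user:42"))   # "alice"
```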

Responsibilities of each node

Sources