Database sharding

Example

The hash function user_id % 4 is used to find the server with the relevant user data

Sharding key (Partition key) consists of one or more columns that determine how data is distributed. In the example above, user_id is the sharding key.
It allows you to determine the correct database for a query
Choose a key that can ensure that the data is evenly distributed

Resharding data:
- Needed when
  - A single shard can no longer hold more data
  - Certain shards are exhausted due to uneven data distribution
- Requires updating the sharding function and moving data around to comply with the new function
  - Consistent hashing is a commonly used technique to solve this problem
Celebrity problem:
- Also called the hotspot key problem
- Excessive access to a specific shard could cause server overload
- Example: Data for a celebrity such as Lionel Messi is on a particular shard. In case of a social application, that shard will be overwhelmed with read operations
- Solution:
  - Allocate a shard for each celebrity
  - Increase the number of shards
Join and de-normalization:
- Hard to perform join operations across database shards
- Common workaround is to de-normalize the database so that queries can be performed in a single table.