Separates large databases into smaller, manageable parts called shards
Each shard shares the same schema.
Actual data on each shard is unique to the shard
Example
Choosing the right sharding key
Sharding key (Partition key) consists of one or more columns that determine how data is distributed. In the example above, user_id is the sharding key.
It allows you to determine the correct database for a query
Choose a key that can ensure that the data is evenly distributed
Challenges
Resharding data:
Needed when
A single shard can no longer hold more data
Certain shards are exhausted due to uneven data distribution
Requires updating the sharding function and moving data around to comply with the new function
Excessive access to a specific shard could cause server overload
Example: Data for a celebrity such as Lionel Messi is on a particular shard. In case of a social application, that shard will be overwhelmed with read operations
Solution:
Allocate a shard for each celebrity
Increase the number of shards
Join and de-normalization:
Hard to perform join operations across database shards
Common workaround is to de-normalize the database so that queries can be performed in a single table.