Cassandra: Strategies for Distributed Data Storage

Accepted Session
Short Form
Scheduled: Wednesday, June 2, 2010 from 3:45 – 4:30pm in Morrison

Excerpt

Cassandra is an open source, highly scalable distributed database that brings together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. In this talk we'll discuss the strategies Cassandra employs to provide an eventually consistent data model.

Description

Cassandra is an open source, highly scalable distributed database that’s rapidly gaining momentum in the NoSQL community. It combines Dynamo’s fully distributed design with Bigtable’s ColumnFamily-based data model, yielding a data store suited to a wide variety of use cases.

Professor Eric Brewer’s CAP theorem states that a distributed system design can offer at most two out of three desirable properties: Consistency, Availability, and Partition Tolerance. So, how do you provide consistency when your distributed system’s primary requirements are availability and partition tolerance?

In this talk we’ll introduce eventual consistency and the four strategies that Cassandra uses to provide it, while still maintaining high availability:

  • Gossip
  • Read Repair
  • Hinted Handoff
  • Anti-Entropy
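
To make one of these strategies concrete, here is a toy sketch of the idea behind read repair: a read compares the copies held by several replicas, returns the newest one, and writes it back to any replica that is stale or missing the key. This is an illustration only, not Cassandra's implementation. It assumes each value carries a write timestamp and that the highest timestamp wins; the names `Replica` and `read_with_repair` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]     # (write timestamp, value)
Replica = Dict[str, Versioned]  # hypothetical in-memory stand-in for one node


def read_with_repair(replicas: List[Replica], key: str) -> Optional[str]:
    """Return the newest value for key and push it to any stale replica."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None  # no replica has ever seen this key
    # Assumption: conflicts are resolved by taking the highest timestamp.
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest  # repair the stale or missing copy
    return newest[1]


# Three replicas that have drifted apart: one current, one stale, one empty.
replicas: List[Replica] = [
    {"user:1": (2, "alice@new.example")},
    {"user:1": (1, "alice@old.example")},
    {},
]
result = read_with_repair(replicas, "user:1")
```

After the call, all three replicas hold the same newest version, so the act of reading has moved the system toward consistency without blocking writes beforehand.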

Speaking experience

Speaker