In our second episode (12 minutes long), Alex and Nat talk about the new generation of “NoSQL” databases that have generated a lot of interest among web developers, especially those lucky people dealing with thousands of simultaneous users and terabytes of data.
Please feel free to leave a comment below after you’ve listened to the episode. We’re still total newbies at this podcasting thing, so your feedback and encouragement are a big help!
If you want to learn more about NoSQL than what we covered in the show, check out these links:
- Nice introduction to all the basic concepts: consistency models, replication, vector clocks.
- A comparison of NoSQL alternatives and a good braindump of the subject matter.
- Amazon Dynamo paper. Great readable paper introducing the core concepts for massively scalable datastores.
- BigTable paper. Another cornerstone paper.
- How FriendFeed uses MySQL to store schema-less data.
The Big Guys:
- HBase — We didn’t get to this one, but it’s modelled on BigTable, and can replicate across geographically separated datacenters (Cassandra needs faster roundtrips). It is part of the Hadoop ecosystem and stores its data on Hadoop’s HDFS.
- MongoDB — Great for storing JSON objects.
- Redis — Like memcached, but with persistence and useful list, set and ordered-set datatypes.
- Redis Twitter implementation — Simple example of building a Twitter-like system on top of Redis.
- Consistent Hashing.
- Vector Clocks — See section 4.4 in the Amazon Dynamo paper.
- CAP Theorem — A distributed datastore cannot guarantee Consistency, Availability and Partition Tolerance all at once.
The image above is a picture of a Google datacenter in Oregon, where they no doubt run BigTable.