Which key/value store is the most promising/stable?

Question:

I’m looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I’ve got no idea where to begin. Just listing from memory, I can think of:

  1. CouchDB
  2. MongoDB
  3. Riak
  4. Redis
  5. Tokyo Cabinet
  6. Berkeley DB
  7. Cassandra
  8. MemcacheDB

And I’m sure that there are more out there that have slipped through my search efforts. With all the information out there, it’s hard to find solid comparisons between all of the competitors. My criteria and questions are:

  1. (Most Important) Which do you recommend, and why?
  2. Which one is the fastest?
  3. Which one is the most stable?
  4. Which one is the easiest to set up and install?
  5. Which ones have bindings for Python and/or Ruby?

Edit:
So far it looks like Redis is the best solution, but that’s only because I’ve gotten one solid response (from ardsrk). I’m looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why?

Edit 2:
If anyone has experience with CouchDB, Riak, or MongoDB, I’d love to hear about it (and even more so if you can offer a comparative analysis of several of them).

Asked By: Mike Trpcic

Answers:

They all have different features. And don’t forget Project Voldemort, which LinkedIn actually tests in production before each release.

It’s hard to compare. You have to ask yourself what you need. For example, do you want partitioning? If so, some of them, like CouchDB, don’t support it. Do you want erasure coding? Then most of them don’t have that. And so on.

Berkeley DB is a very basic, low-level storage engine that can perhaps be excused from this discussion. Several key-value systems are built on top of it to provide additional features like replication, versioning, coding, etc.

Also, what does your application need? Several of the solutions contain complexity that may not be necessary. For example, if you only store static data that won’t change, you can store each item under its SHA-1 content hash (i.e. use the content hash as the key). In that case you don’t have to worry about freshness, synchronization, or versioning, and a lot of complexity goes away.
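
The content-hash idea above can be sketched in a few lines of Python. This is only an illustration: the ContentStore class and its in-memory dict are hypothetical stand-ins for whatever store you actually pick, and only hashlib from the standard library is assumed.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-1 of the value.

    The dict is a hypothetical stand-in for a real key-value backend.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so the same bytes always
        # map to the same key and a stored value never needs updating.
        key = hashlib.sha1(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ContentStore()
key = store.put(b"hello world")
```

Because a key can only ever refer to one value, writing the same data twice is harmless and there is nothing to invalidate or version.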

Answered By: OverClocked

Which do you recommend, and why?

I recommend Redis. Why? Continue reading!!

Which one is the fastest?

I can’t say whether it’s the fastest. But Redis is fast. It’s fast because it holds all the data in RAM. A virtual memory feature was added recently, but all the keys still stay in main memory, with only rarely used values being swapped to disk.

Which one is the most stable?

Again, since I have no direct experience with the other key-value stores I can’t compare. However, Redis is being used in production by many web applications like GitHub and Instagram, among many others.

Which one is the easiest to set up and install?

Redis is fairly easy to set up. Grab the source and run make install on a Linux box. This yields a redis-server binary that you can put on your path and start.

redis-server binds to port 6379 by default. Have a look at redis.conf that comes with the source for more configuration and setup options.

Which ones have bindings for Python and/or Ruby?

Redis has excellent Ruby and Python support.

In response to Xorlev’s comment below: Memcached is just a simple key-value store. Redis supports complex data types like lists, sets and sorted sets and at the same time provides a simple interface to these data types.

There is also make 32bit, which makes all pointers only 32 bits in size even on 64-bit machines. This saves considerable memory on machines with less than 4GB of RAM.

Answered By: ardsrk

I really like memcached personally.

I use it on a couple of my sites and it’s simple, fast, and easy. It really was incredibly simple to use, and the API is straightforward. It doesn’t store anything on disk (hence the name memcached), so it’s out if you’re looking for a persistent storage engine.

Python has python-memcached.

I haven’t used the Ruby client, but a quick Google search reveals RMemCache.

If you just need a caching engine, memcached is the way to go. It’s developed, it’s stable, and it’s bleedin’ fast. There’s a reason LiveJournal made it and Facebook develops it. It’s in use at some of the largest sites out there to great effect. It scales extremely well.

Answered By: Xorlev

At this year’s PyCon, Jeremy Edberg of Reddit gave a talk:

http://pycon.blip.tv/file/3257303/

He said that Reddit uses Postgres as a key-value store, presumably with a simple two-column table; according to his talk, it had benchmarked faster than any other key-value store they had tried. And, of course, it’s very mature.

Ultimately, OverClocked is right; your use case determines the best store. But RDBMSs have long been (ab)used as key-value stores, and they can be very fast, too.
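
For what it’s worth, here is roughly what a two-column key-value table looks like in practice. This is a sketch under assumptions, not Reddit’s actual schema: it uses SQLite from Python’s standard library as a stand-in for Postgres so that it runs anywhere, and the table name kv and the helpers kv_set and kv_get are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real Postgres server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def kv_set(conn, key, value):
    # INSERT OR REPLACE is SQLite syntax; on Postgres the equivalent would be
    # INSERT ... ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value.
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def kv_get(conn, key, default=None):
    # The primary-key index makes this a single indexed lookup.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


kv_set(conn, "user:1:name", "mike")
print(kv_get(conn, "user:1:name"))
```

The primary key on the first column is what makes lookups by key cheap; everything else an RDBMS offers (transactions, backups, ad hoc queries) comes along for free.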

Answered By: AdamKG

Just to make the list complete: there’s Dreamcache, too. It’s protocol-compatible with Memcached, so you can use any client library written for Memcached; it’s just faster.

Answered By: grokk

I’ve been playing with MongoDB and it has one thing that makes it perfect for my application: the ability to store complex Maps/Lists in the database directly. I have a large Map where each value is a list, and I can write and retrieve it without doing anything special and without knowing all the different keys and list values. I don’t know much about the other options, but the speed and that ability make Mongo perfect for my application. Plus the Java driver is very simple to use.

Answered By: MattGrommes

There is also ZODB.

Answered By: mikerobi

One distinction you have to make is what you will use the DB for.
Don’t jump on board just because it’s trendy. Do you need a key-value store, or do you need a document-based store? What is your memory footprint requirement? Will you run it on a small VM or a separate one?

I recommend listing your requirements first and then seeing which ones overlap with your requirements.

With that said, I have used CouchDB and MongoDB, and I prefer MongoDB for its ease of setup and its easy transition from MySQL-style queries. I chose MongoDB over SQL because of dynamic schemas (no migration files!) and better data modeling (arrays, hashes). I did not evaluate based on scalability.

MongoMapper is a great MongoDB ORM for Ruby, and there’s already a working Rails 3 fork.

I listed some more details about why I preferred MongoDB in my Scribd slides:
http://tommy.chheng.com/index.php/2010/02/mongodb-for-natural-development/

Answered By: tommy chheng

I notice that everyone is confusing memcached with memcachedb. They are two different systems. The OP asked about memcachedb.

memcached is memory storage. memcachedb uses Berkeley DB as its datastore.

Answered By: drr

I only have experience with Berkeley DB, so I’ll mention what I like about it.

  • It is fast
  • It is very mature and stable
  • It has outstanding documentation
  • It has C, C++, Java & C# bindings out of the box. Other language bindings are available. I believe Python comes with bindings as part of its “batteries”.

The only downside I’ve run into is that the C# bindings are new and don’t seem to support every feature.

Answered By: Ferruccio

You need to understand what the modern NoSQL phenomenon is about.
It is not about key-value stores. They’ve been available for decades (Berkeley DB, for example). Why all the fuss now?

It is not about fancy document or object oriented schemas and overcoming “impedance mismatch”. Proponents of these features have been touting them for years and they got nowhere.

It is simply about addressing three technical problems: automatic (for maintainers) and transparent (for application developers) failover, sharding and replication.
Thus you should ignore any trendy products that do not deliver on this front. These include Redis, MongoDB, CouchDB, etc. Concentrate instead on truly distributed solutions like Cassandra, Riak, etc.

Otherwise you’ll lose all the good stuff SQL gives you (ad hoc queries, Crystal Reports for your boss, third-party tools and libraries) and get nothing in return.
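
For readers unfamiliar with the three terms this answer names, here is a deliberately naive Python sketch of sharding, replication and failover. Everything in it is an assumption made for illustration: the node names, the replica count and the helper functions are hypothetical, and real systems such as Cassandra or Riak use far more careful schemes than hashing modulo the node count.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICAS = 2  # each key is stored on this many nodes


def owners(key, nodes=NODES, replicas=REPLICAS):
    """Sharding + replication: map a key to the nodes that hold it."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # The primary is nodes[start]; copies go to the next nodes in order.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def pick_replica(key, down=frozenset()):
    """Failover: read from the first owner that is not marked as down."""
    for node in owners(key):
        if node not in down:
            return node
    raise RuntimeError("all replicas for %r are down" % key)


print(owners("user:42"))
```

The point of the answer is that the application should never have to write code like this itself; a truly distributed store does it automatically and transparently.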

Answered By: Vagif Verdi

Cassandra seems to be popular.

Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines.

Answered By: yfeldblum

Which key value store is the most promising/stable?

G-WAN KV store looks rather promising:

DB engine            Traversal
-----------          ----------------------------
SQLite               0.261 ms  (b-tree)
Tokyo-Cabinet (TC)   4.188 ms  (hash table)
TC-FIXED             0.103 ms  (fixed-size array)
G-WAN KV             0.010 ms  (unnamed)

Also, it is used internally by the G-WAN web server, known for its high-concurrency performance (that’s for the stability question).

Answered By: Bert

As the others said, it always depends on your needs. I, for example, prefer whatever suits my application best.

I first used memcached for fast read/write access. As the Java API I used SpyMemcached, which comes with a very easy interface for writing and reading data. Because of memory leaks (no more RAM) I had to look for another solution. I also wasn’t able to scale properly; just increasing the memory for a single process didn’t seem like a good approach.

After some reviewing I found Couchbase. It comes with replication, clustering, auto-failover, and a community edition (MS Windows, Mac OS, Linux). The best thing for me was that its Java client also implements SpyMemcached, so I had almost nothing to do other than set up the server and use Couchbase instead of memcached as the datastore. Advantage? Sure: my data is now persistent, replicated, and indexed. It comes with a web console for writing map-reduce functions for document views in Erlang.

It has support for Python, Ruby, .NET and more, and easy configuration through the web console and client tools. It runs stably. In some tests I was able to write about 10k records per second for records 200–400 bytes long. Read performance was way higher, though (both tested locally). Have a lot of fun making your decision.

Answered By: Alex M

I only have experience with MongoDB, memcached and Redis. Here’s a comparison between them and CouchDB.

MongoDB seems to be the most popular. It supports sharding and replication, is eventually consistent, and has good support in Ruby (Mongoid). It also has a richer feature set than the other two. MongoDB, Redis and memcached can all store key-value data in memory, but Redis seems to be much faster: according to this post, Redis is 2x faster than MongoDB for writes and 3x faster for reads. It also has better-designed data structures and is more lightweight.

I would say they have different uses: MongoDB is probably good for large datasets and document storage, while memcached and Redis are better for storing caches or logs.

Answered By: Bruce Lin