24 Apr 2009 10:26
The weekend is coming and I have a very small pet project for it. I would rather keep the idea private for now, but it involves processing hundreds of entries per second and analyzing data from multiple sources. It would have a dead-simple web interface.
The nature of the project requires a really fast data backend, capable of storing and retrieving a few thousand items per second. The dataset would be approximately 5GB, with an average item size of 0.5KB.
When it came to choosing tools, after brief consideration I picked Sinatra for the web interface and Redis as a memory-only (with disk dumps) key-value datastore. It should be capable of handling 100 000 requests per second and deal well with large datasets, so it fits perfectly. It also differs from Memcached or MemcacheDB in that it has great higher-level structures like Lists and Sets, plus basic sorting and selection commands.
Recently there has been a lot of hype about (distributed) key-value storage. For more info I recommend a nice article by Richard Jones, "Anti-RDBMS: A list of distributed key-value stores". The HighScalability blog also has a lot of references and articles.
Redis looks like the way to go. The only problem with it is that the whole dataset (database) must fit in RAM, otherwise performance might degrade terribly (because of swapping). Raw throughput itself is not an issue: you would need several concurrent clients to actually hit it as a limit.
Anyway, initially I wanted to deploy the project on Amazon EC2 because of the hyped scalability, price etc. But here comes a surprise: the performance simply sucks. I guess this is because the instances share common hardware, so the memory bandwidth actually available to you might be limited.
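To show what "higher-level structures" buys you over a plain key-value cache, here is a tiny in-process toy model of list and set values. It is only a sketch: `ToyStore` and its methods are made up for illustration, they merely borrow the names of the Redis commands LPUSH, LPOP, SADD and SINTER, and nothing here talks to a real Redis server.

```python
from collections import defaultdict, deque


class ToyStore:
    """In-process toy model of list and set values. NOT a Redis client."""

    def __init__(self):
        self.lists = defaultdict(deque)  # key -> list value
        self.sets = defaultdict(set)     # key -> set value

    def lpush(self, key, value):
        """Push a value onto the head of the list; return the new length."""
        self.lists[key].appendleft(value)
        return len(self.lists[key])

    def lpop(self, key):
        """Pop the head of the list, or return None if it is empty or missing."""
        items = self.lists.get(key)
        return items.popleft() if items else None

    def sadd(self, key, member):
        """Add a member to the set; return True only if it was not there yet."""
        is_new = member not in self.sets[key]
        self.sets[key].add(member)
        return is_new

    def sinter(self, *keys):
        """Return the members common to all the named sets."""
        found = [self.sets.get(k, set()) for k in keys]
        return set.intersection(*found) if found else set()


if __name__ == "__main__":
    store = ToyStore()
    store.lpush("queue", "first")
    store.lpush("queue", "second")
    print(store.lpop("queue"))  # the most recently pushed item comes out first

    store.sadd("tags:a", "redis")
    store.sadd("tags:a", "ec2")
    store.sadd("tags:b", "redis")
    print(store.sinter("tags:a", "tags:b"))
```

With a plain string-only store, the client would have to fetch, modify and write back a whole serialized list or set for each of these operations; here the store does the work on the value itself.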
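A quick back-of-the-envelope check of what the numbers above mean, as a rough sketch. It assumes binary units (1 GB = 1024**3 bytes), and the `overhead_factor` safety margin is a made-up placeholder, not a measured Redis figure.

```python
# Back-of-the-envelope sizing for the dataset described in this post.
# Assumption: binary units (1 GB = 1024**3 bytes, 0.5 KB = 512 bytes).
DATASET_BYTES = 5 * 1024**3   # approximately 5 GB in total
AVG_ITEM_BYTES = 512          # 0.5 KB average item


def estimated_items(dataset_bytes, avg_item_bytes):
    """Number of average-sized items that make up the dataset."""
    return dataset_bytes // avg_item_bytes


def fits_in_ram(dataset_bytes, ram_bytes, overhead_factor=1.5):
    """Rough check that the dataset fits in memory.

    overhead_factor is a hypothetical margin for per-key bookkeeping;
    replace it with a value measured on your own data.
    """
    return dataset_bytes * overhead_factor <= ram_bytes


if __name__ == "__main__":
    print(estimated_items(DATASET_BYTES, AVG_ITEM_BYTES))  # 10485760
    print(fits_in_ram(DATASET_BYTES, 8 * 1024**3))         # True
    print(fits_in_ram(DATASET_BYTES, 4 * 1024**3))         # False
```

So the project is looking at roughly ten million items, and a box with less RAM than the dataset plus some margin is out of the question.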
Here are my results of running
./redis-benchmark -n 100000
Amazon Small instance ($0.10/h)
====== PING ======
  100042 requests completed in 11.95 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
8369.61 requests per second

====== SET ======
  100023 requests completed in 12.13 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
8247.28 requests per second

====== GET ======
  100004 requests completed in 14.26 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
7010.94 requests per second

====== INCR ======
  100000 requests completed in 14.40 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
6945.89 requests per second

====== LPUSH ======
  100000 requests completed in 12.24 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
8171.27 requests per second

====== LPOP ======
  100000 requests completed in 14.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
7033.83 requests per second
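If you want to compare runs without squinting at the raw output, the throughput figures can be pulled out with a small script. This is only a sketch: `parse_benchmark` is a hypothetical helper, and it assumes the output looks like the listing above (a `====== NAME ======` header followed later by an `N requests per second` figure).

```python
import re

# A "====== NAME ======" header, then the first "<number> requests per second"
# that follows it. DOTALL lets the lazy gap span line breaks.
_RESULT = re.compile(
    r"=+\s*(\w+)\s*=+.*?([\d.]+) requests per second",
    re.DOTALL,
)


def parse_benchmark(text):
    """Return {command name: requests per second} from benchmark output."""
    return {name: float(rps) for name, rps in _RESULT.findall(text)}


if __name__ == "__main__":
    sample = """
====== PING ======
  100042 requests completed in 11.95 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
8369.61 requests per second

====== SET ======
  100023 requests completed in 12.13 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
8247.28 requests per second
"""
    print(parse_benchmark(sample))
```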
The small instance is a no-go if you want to use it for Redis. Keep in mind it is AMD-based and in general the High CPU instances (with Intel Xeons) outperform their AMD brothers dramatically.
Amazon High CPU Medium ($0.20/h)
====== PING ======
  100007 requests completed in 6.52 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
15333.79 requests per second

====== SET ======
  100006 requests completed in 2.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
44986.95 requests per second

====== GET ======
  100009 requests completed in 2.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
45252.94 requests per second

====== INCR ======
  100000 requests completed in 2.35 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
42625.75 requests per second

====== LPUSH ======
  100009 requests completed in 2.24 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
44686.78 requests per second

====== LPOP ======
  100011 requests completed in 2.28 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
43787.66 requests per second
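To put the two runs side by side, here is a small sketch that computes the per-command speedup and a crude "requests per dollar" figure. The throughput numbers and hourly prices are the ones quoted in this post; `speedup` and `requests_per_dollar` are hypothetical helper names, and the cost figure naively assumes the instance sustains the benchmark rate for a full hour.

```python
# Requests per second, copied from the two benchmark runs in this post.
SMALL = {"PING": 8369.61, "SET": 8247.28, "GET": 7010.94,
         "INCR": 6945.89, "LPUSH": 8171.27, "LPOP": 7033.83}
HIGH_CPU_MEDIUM = {"PING": 15333.79, "SET": 44986.95, "GET": 45252.94,
                   "INCR": 42625.75, "LPUSH": 44686.78, "LPOP": 43787.66}

# Hourly prices in USD, as quoted in this post.
SMALL_PRICE = 0.10
HIGH_CPU_MEDIUM_PRICE = 0.20


def speedup(baseline, candidate):
    """Per-command ratio of candidate throughput to baseline throughput."""
    return {cmd: candidate[cmd] / baseline[cmd] for cmd in baseline}


def requests_per_dollar(rps, price_per_hour):
    """Requests served per dollar if the rate were sustained for an hour."""
    return rps * 3600 / price_per_hour


if __name__ == "__main__":
    for cmd, ratio in speedup(SMALL, HIGH_CPU_MEDIUM).items():
        print(f"{cmd:6s} {ratio:.2f}x")
    small_value = requests_per_dollar(SMALL["SET"], SMALL_PRICE)
    medium_value = requests_per_dollar(HIGH_CPU_MEDIUM["SET"], HIGH_CPU_MEDIUM_PRICE)
    print(f"SET per dollar, medium vs small: {medium_value / small_value:.2f}x")
```

SET comes out roughly 5.5 times faster on the High CPU Medium instance for twice the price, so even per dollar it is clearly the better of the two.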
This is much better, but still sucks. For a similar price you could get a dedicated box at SoftLayer, our current provider, with more than double the performance AND good upgrade options.
Surprisingly, more expensive EC2 instances could not deliver much higher performance, and were in every respect slower than any decent dedicated box. You can find more benchmarks on the Redis website. Our office quad-core server was also able to get about 100 000 inserts per second.
I know the power of Amazon is not exactly "inexpensive hardware", but rather flexibility, the range of added services, probably easier administration… but there are kinds of services you really do not want to put in a virtualized environment. Talking to "bare metal" is extremely important when running Redis, and probably any memory-intensive software.
Also, since Redis datasets must fit in memory, it would be nice to be able to get cheap boxes (slow drives are ok) with lots of RAM. So it is worth asking whether Amazon EC2 is really the best option.
Even so, I am considering running the project on EC2 in the initial period, but you really need to be careful about the choice of instance.
How does this relate to Wikidot?
When I was testing EC2 instances with PostgreSQL installed, populated with a copy of the Wikidot.com database, I was getting only 50% of the performance of the dedicated server, even on the fastest instances, for queries that certainly used only cached data. So it looks like moving our database server to EC2 would significantly decrease our performance. At this moment that is not acceptable. This post on the Amazon forums suggests memory bandwidth problems in EC2 instances.
I previously presented a possible migration to Amazon EC2 services. After a while it looks like our whole database / webserver infrastructure would need to be reconsidered to benefit from the EC2 architecture. In the end we will need to partition our datasets (sharding) and probably modify the storage for uploaded files, but honestly I would rather postpone that moment as long as I can, as long as we still have plenty of options within our current setup.
BTW: A weekend (short) project is a kind of project that should take only a few days to complete, or at least to build a reasonably functional working prototype. It should be fun and educational, and give a chance to explore new solutions and technologies. Ideally, I would welcome more people on board.
tags: amazon ec2 redis sinatra