Welcome to my blog, a place for ideas, comments, interesting hacks and boring personal stuff. Enjoy!
27 Apr 2009 08:31
My weekend project involved some Twitter data mining and link analysis (i.e. analysis of the links that people tweet about). The results we gathered during the weekend were surprising, to say the least.
Looking at popular Twitter link aggregators like Tweetmeme.com or twitt(url)y, you might get the impression that Twitter is full of useful news and links to interesting stuff, a truly user-driven, next-level form of communication.
Below is the result of a one-day data analysis: the top links people were tweeting about on Friday:
24 Apr 2009 10:26
The weekend is coming and I have a very small pet project for it. I would rather keep the idea private for now, but it involves processing hundreds of entries per second and analyzing data from multiple sources. It will have a dead-simple web interface.
The nature of the project requires a really fast data backend, capable of storing and retrieving a few thousand items per second. The dataset will be approximately 5GB, with an average item size of 0.5KB.
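For a rough sense of scale, those figures can be turned into an item count and a data rate. This is a back-of-the-envelope sketch in Python, not part of the project itself: it reads 5GB as 5 GiB and 0.5KB as 512 bytes, and it picks 3,000 operations per second as a stand-in for "a few thousand".

```python
# Back-of-the-envelope sizing. All three constants are assumptions
# derived from the rough figures in the post, not measured values.
DATASET_BYTES = 5 * 1024 ** 3   # "approximately 5GB", read here as 5 GiB
AVG_ITEM_BYTES = 512            # "0.5KB" average item size
OPS_PER_SECOND = 3000           # stand-in for "a few thousand items per second"


def estimate(dataset_bytes: int, avg_item_bytes: int, ops_per_second: int):
    """Return (number of items in the dataset, payload bytes moved per second)."""
    item_count = dataset_bytes // avg_item_bytes
    payload_per_second = ops_per_second * avg_item_bytes
    return item_count, payload_per_second


items, bytes_per_sec = estimate(DATASET_BYTES, AVG_ITEM_BYTES, OPS_PER_SECOND)
print(f"items in dataset: {items:,}")
print(f"payload per second: {bytes_per_sec / 1e6:.2f} MB")
```

Under these assumptions the backend holds roughly ten million items while moving only about 1.5 MB of payload per second, so the pressure is on per-request overhead rather than on raw bandwidth.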
When it came to choosing tools, after a short deliberation I picked Sinatra for the web interface and Redis as a memory-only (with disk dumps) key-value datastore. Redis should be capable of handling 100,000 requests per second and deal well with large datasets, so it fits perfectly. It also differs from Memcached or MemcacheDB in that it has great higher-level structures like Lists and Sets, plus basic sorting and selection commands.
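To make the List, Set and sorting features a bit more concrete, here is a small sketch of how they might be used from Python. It is an illustration only, not the project's code: the helper names are hypothetical, and `client` is assumed to be a redis-py style object (for example `redis.Redis()`) exposing `lpush`, `ltrim`, `sadd` and `sort`.

```python
from typing import Any, List


def push_recent(client: Any, key: str, item: str, limit: int = 100) -> None:
    """Prepend an item to a Redis List and cap the list at `limit` entries."""
    client.lpush(key, item)           # LPUSH: insert at the head of the list
    client.ltrim(key, 0, limit - 1)   # LTRIM: keep only the newest `limit` items


def add_unique(client: Any, key: str, member: str) -> bool:
    """Add a member to a Redis Set; True if it was not already present."""
    # SADD returns the number of members that were actually added.
    return client.sadd(key, member) == 1


def first_sorted(client: Any, key: str, count: int) -> List[Any]:
    """Return the first `count` members in alphabetical order using SORT."""
    # Equivalent to: SORT key LIMIT 0 count ALPHA
    # A real redis-py client returns bytes unless created with
    # decode_responses=True.
    return client.sort(key, start=0, num=count, alpha=True)
```

With plain Memcached, a capped "recent items" list or a uniqueness check like this would have to be built by reading, modifying and rewriting a whole value on the client side; here each step is a single server-side command.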
Tags: amazon ec2 redis sinatra