Posts tagged "bigdata"

Useful how-to on setting up Hadoop in Eclipse to enable dev and contribution.

—Jason

The guys at Infochimps published this pricing breakdown for the various EC2 instances.

Seems like it will come in handy one day…

(via @rjurney)

—Jason

This is an excellent how-to by Michael Noll on setting up a multi-broker Kafka cluster.

—Jason

SpatialHadoop is a MapReduce framework designed specifically to handle huge datasets of spatial data. It ships with a built-in spatial high-level language, spatial data types, spatial indexes, and efficient spatial operations.

Code on GH.

—Jason

A nice intro video on ElasticSearch from Air Mozilla. It runs about 56 minutes.

—Jason

Spark and Shark tutorial/course given at Strata, with materials online.

—Jason

This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.

Russell is in the midst of writing a book on this topic, and it is currently available for review on O’Reilly’s Open Feedback Publishing System.

—Jason

Great blog post at Chimpler on using Storm to compute rollups on impression logs and storing the results in MongoDB. Code on GH.

—Jason

Great blog post on mining the Twitter stream for religious tweets using Storm. Code on GH.

—Jason

Great (short) presentation on “Storing Time Series Metrics With Cassandra and Composite Columns” by Joe Stein from Medialets. With code on GH.

—Jason