Materials from the Spark and Shark tutorial given at Strata are available online.
This is a great post on using Python, Neo4j, and Bulbflow to build a recommendation system on top of a graph database. It looks like the authors crawled SnapGuide to get their data.
The code for Bulbflow is on GH.
Here is another post on Graph Recommendation Systems using Gremlin.
Article on using SolrCloud for low latency analytics. Example configs on GH.
This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.
Russell is in the midst of writing a book on this topic, and the book is currently available for review on O’Reilly’s Open Feedback Publishing System.
Great blog post at Chimpler on using Storm to compute rollups on impression logs and store the results in MongoDB. Code on GH.
Great blog post on mining the Twitter stream for religious tweets using Storm. Code on GH.
Great (short) presentation on “Storing Time Series Metrics With Cassandra and Composite Columns” by Joe Stein from Medialets. Code on GH.
Great article on using Gremlin to query graph data in various datastores.
Nice tutorial on writing UDFs for Hive.
Some nice Pig and HBase hackery.