Trident-ML is a real-time online machine learning library. It lets you build real-time predictive features using scalable online algorithms. The library is built on top of Storm, a distributed stream processing framework that runs on a cluster of machines and supports horizontal scaling. The packaged algorithms are designed to fit into limited memory and processing time, but they don’t work in a distributed way.
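To make the "online" part concrete, here is a minimal sketch in plain Python of a learner that sees one example at a time and keeps only a fixed-size weight vector in memory. It is not Trident-ML's API (that library is built on Storm); the `OnlinePerceptron` class and the sample stream are made up for illustration.

```python
class OnlinePerceptron:
    """Binary classifier updated one example at a time (illustrative only)."""

    def __init__(self, n_features, learning_rate=1.0):
        # Memory use is fixed by the feature count, not by the stream length.
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else -1

    def update(self, features, label):
        """Learn from a single (features, label) pair; label is +1 or -1."""
        if self.predict(features) != label:
            # Classic perceptron rule: nudge the weights toward the true label.
            step = self.learning_rate * label
            self.weights = [w + step * x for w, x in zip(self.weights, features)]
            self.bias += step


# Hypothetical stream of labeled examples arriving one by one.
stream = [([2.0, 1.0], 1), ([-1.0, -2.0], -1), ([1.5, 0.5], 1), ([-2.0, -1.0], -1)]
model = OnlinePerceptron(n_features=2)
for features, label in stream:
    model.update(features, label)
```

Each example is discarded after the update, which is what lets this kind of algorithm run inside a stream processor with bounded memory and per-tuple processing time.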
Storm-R uses the multilang protocol to integrate R function calls with a Trident Function.
Based on the cascading.pattern project, the Pattern sub-project for http://Cascading.org/, which uses flows as containers for machine learning models, importing PMML model descriptions from R, SAS, Weka, RapidMiner, KNIME, SQL Server, etc.
—Jason

This is a great post using Python, Neo4j, and Bulbflow to build a recommendation system using a graph database. It looks like they crawled SnapGuide to get their data for this.
The code for Bulbflow is on GitHub.
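For a feel of what a graph-based recommendation looks like, here is a rough sketch using plain Python dictionaries instead of Neo4j or Bulbflow. It is not the code from the post: the `likes` data and the `recommend` function are invented for illustration, and the scoring (count the paths user -> item -> other user -> new item) is just one simple choice.

```python
from collections import Counter

# Hypothetical bipartite graph: each user points at the guides they liked.
likes = {
    "alice": {"guide_a", "guide_b"},
    "bob": {"guide_b", "guide_c"},
    "carol": {"guide_a", "guide_c", "guide_d"},
}


def recommend(user, likes, top_n=3):
    """Rank unseen items by the number of two-hop paths that reach them."""
    # Invert the edges so we can walk from an item back to the users who liked it.
    liked_by = {}
    for person, items in likes.items():
        for item in items:
            liked_by.setdefault(item, set()).add(person)

    seen = likes[user]
    scores = Counter()
    for item in seen:
        for neighbor in liked_by[item]:
            if neighbor == user:
                continue
            # Every item the neighbor liked that the user has not seen gets a vote.
            for candidate in likes[neighbor] - seen:
                scores[candidate] += 1

    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("alice", likes)
```

In a graph database the same walk would be expressed as a traversal query rather than nested loops, but the shape of the computation is the same.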

Here is another post on Graph Recommendation Systems using Gremlin.
—Jason
This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.

Russell is in the midst of writing a book on this topic, and the book is currently available for review on O’Reilly’s Open Feedback Publishing System.
—Jason
Great (short) presentation on “Storing Time Series Metrics With Cassandra and Composite Columns” by Joe Stein from Medialets. With code on GitHub.
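The general idea behind composite columns for time series can be sketched without a database: each row holds one metric for one time bucket, and each column name is a tuple that sorts by timestamp first, so a time range becomes a contiguous slice. The in-memory model below is only an illustration of that layout, not the schema from the presentation; `WideRowStore`, the per-day bucketing, and the sample data are all assumptions.

```python
import bisect
from datetime import datetime, timezone


class WideRowStore:
    """In-memory stand-in for a wide-row store with composite column names."""

    def __init__(self):
        # row key -> list of ((timestamp, source), value), kept sorted by column name
        self.rows = {}

    @staticmethod
    def row_key(metric, ts):
        # Assumed bucketing: one row per metric per day.
        return f"{metric}:{ts:%Y%m%d}"

    def insert(self, metric, ts, source, value):
        column = (int(ts.timestamp()), source)  # composite column name
        row = self.rows.setdefault(self.row_key(metric, ts), [])
        bisect.insort(row, (column, value))

    def slice_range(self, metric, start, end):
        """Return columns with start <= timestamp < end from start's day bucket."""
        row = self.rows.get(self.row_key(metric, start), [])
        # A one-component probe sorts just before any column sharing its timestamp.
        lo = bisect.bisect_left(row, ((int(start.timestamp()),),))
        hi = bisect.bisect_left(row, ((int(end.timestamp()),),))
        return row[lo:hi]


# Hypothetical sample data.
store = WideRowStore()
t0 = datetime(2013, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
store.insert("page_views", t0, "web", 10)
store.insert("page_views", t0.replace(minute=1), "mobile", 4)
store.insert("page_views", t0.replace(minute=5), "web", 7)
window = store.slice_range("page_views", t0, t0.replace(minute=5))
```

Because columns stay sorted by their first component, reading a time window touches only the matching stretch of one row instead of scanning every data point.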
—Jason