Posts tagged "storm"

trident-ml

Trident-ML is a real-time online machine learning library. It lets you build real-time predictive features using scalable online algorithms. The library is built on top of Storm, a distributed stream processing framework that runs on a cluster of machines and supports horizontal scaling. The packaged algorithms are designed to fit into limited memory and processing time, but they don’t work in a distributed way.
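To make "online algorithm in limited memory" concrete, here is a minimal sketch of a perceptron that learns one example at a time. It is not Trident-ML's API or one of its packaged implementations; the class name, the learning rate, and the tiny stream below are all made up for illustration.

```python
from typing import List, Sequence


class OnlinePerceptron:
    """Binary classifier that learns one example at a time.

    Memory use is O(number of features), regardless of how many
    examples have been seen, which is the property online learners rely on.
    """

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else -1

    def update(self, x: Sequence[float], label: int) -> None:
        # Adjust the model only when the current prediction is wrong.
        if self.predict(x) != label:
            for i, xi in enumerate(x):
                self.weights[i] += self.learning_rate * label * xi
            self.bias += self.learning_rate * label


# Hypothetical stream of (features, label) pairs; labels are +1 or -1.
stream = [
    ([2.0, 1.0], 1),
    ([-1.0, -2.0], -1),
    ([1.5, 0.5], 1),
    ([-2.0, -1.0], -1),
]

model = OnlinePerceptron(n_features=2)
for features, label in stream:
    model.update(features, label)  # each example is seen once, then discarded
```

Each example is used for a single update and then thrown away, so the model can keep up with an unbounded stream. In a Storm topology the update step would live inside a bolt or Trident state, but that wiring is left out here.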

storm-r

Storm-R uses the multilang protocol to integrate R function calls with a Trident Function.

storm-pattern

Based on the cascading.pattern project, the Pattern sub-project of http://Cascading.org/, which uses flows as containers for machine learning models and imports PMML model descriptions from R, SAS, Weka, RapidMiner, KNIME, SQL Server, etc.

—Jason

This is an excellent how-to by Michael Nole on setting up a multi-broker Kafka cluster.

—Jason

Chimpler

Great blog post at Chimpler on using Storm to compute rollups on impression logs and store the results in MongoDB. Code on GH.

—Jason

Great blog post on mining the Twitter stream for religious tweets using Storm. Code on GH.

—Jason

I’m not sure how I haven’t seen this before, but it looks pretty amazing. It provides all the lecture slides and videos as well as student projects.

Check out the video playlist on YouTube.

Short article about realtime/stream processing with big data.

Reason 1: Results are changing all the time anyway
Reason 2: You can’t have real-time, exactness, and big data
Reason 3: Exactness is not necessary
Reason 4: You already have an exact batch processing system in place

—Jason

Unbelievably detailed and well-written article on using Storm for real-time trending topics analytics. Great diagrams and code snippets. Also, it looks like the author contributed a bunch of his examples to the storm-starter project on GH.
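For a rough feel of what a trending-topics computation involves, here is a small sliding-window counter sketch. It is not the article's code or the storm-starter implementation; the class, its method names, and the sample topics are hypothetical, and the Storm wiring (spouts, bolts, tick tuples) is left out entirely.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts topics over the most recent `num_slots` time slots."""

    def __init__(self, num_slots: int) -> None:
        # A bounded deque: appending to a full deque drops the oldest slot.
        self.slots: Deque[Counter] = deque([Counter()], maxlen=num_slots)

    def increment(self, topic: str) -> None:
        # New observations always go into the newest slot.
        self.slots[-1][topic] += 1

    def advance(self) -> None:
        # Called once per time slot (for example, on a timer tick).
        self.slots.append(Counter())

    def top(self, n: int) -> List[Tuple[str, int]]:
        # Sum the counts of every slot still inside the window.
        totals: Counter = Counter()
        for slot in self.slots:
            totals.update(slot)
        return totals.most_common(n)


counter = SlidingWindowCounter(num_slots=2)
for topic in ["storm", "storm", "kafka"]:
    counter.increment(topic)

counter.advance()
for topic in ["kafka", "kafka"]:
    counter.increment(topic)
before = counter.top(2)  # both slots are still inside the window

counter.advance()  # the first slot falls out of the window
after = counter.top(2)
```

Splitting the window into slots means old counts expire a whole slot at a time, which keeps memory bounded by the number of slots times the number of distinct topics rather than by the length of the stream.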

—Jason

This is a great article on using Storm’s Trident, Hadoop, and SploutSQL to build a toy Lambda Architecture (Nathan Marz’s concept). Best of all, they put all their code on GitHub.

I’d never heard of SploutSQL, but it looks like an interesting technology as well.