Trident-ML is a real-time online machine learning library. It lets you build real-time predictive features using scalable online algorithms. The library is built on top of Storm, a distributed stream processing framework that runs on a cluster of machines and supports horizontal scaling. The packaged algorithms are designed to fit within limited memory and processing time, but they do not work in a distributed way.
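To make the "limited memory and processing time" point concrete, here is a minimal sketch of an online learner that sees one example at a time and keeps only a fixed-size weight vector. It is a generic perceptron written in plain Python for illustration, not Trident-ML's API; the class name `OnlinePerceptron`, the helper `demo_stream`, and the sample data are all made up for this example.

```python
from typing import List, Sequence, Tuple


class OnlinePerceptron:
    """Hypothetical binary classifier that learns from one example at a time."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        # Memory use is fixed: one weight per feature plus a bias term.
        self.weights: List[float] = [0.0] * n_features
        self.bias: float = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        """Return +1 or -1 for a single feature vector."""
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else -1

    def update(self, x: Sequence[float], label: int) -> None:
        """Adjust the model only when the current prediction is wrong."""
        if self.predict(x) != label:
            for i, xi in enumerate(x):
                self.weights[i] += self.learning_rate * label * xi
            self.bias += self.learning_rate * label


def demo_stream() -> List[Tuple[List[float], int]]:
    """Made-up labelled examples standing in for an incoming stream."""
    return [
        ([2.0, 1.0], 1),
        ([-1.0, -2.0], -1),
        ([1.5, 0.5], 1),
        ([-2.0, -0.5], -1),
    ]


model = OnlinePerceptron(n_features=2)
for features, label in demo_stream():
    # Each example is used once and then discarded; nothing is buffered.
    model.update(features, label)
```

Each call to `update` costs time proportional to the number of features and never stores past examples, which is the property that lets this kind of algorithm run inside a stream processor.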
Storm-R uses Storm's multilang protocol to integrate R function calls with a Trident Function.
Based on the cascading.pattern project, the Pattern sub-project for http://Cascading.org/, which uses flows as containers for machine learning models and imports PMML model descriptions from R, SAS, Weka, RapidMiner, KNIME, SQL Server, etc.
—Jason
This is an excellent how-to by Michael Nole on setting up a multi-broker Kafka cluster.
—Jason
Great article on Lucene’s new-ish (3.1) feature for supporting near-real-time search.
—Jason
Brief article about an upcoming Lucene feature. The LiveFieldValues class…
—Jason
Short article about real-time/stream processing with big data.
Reason 1: Results are changing all the time anyway
Reason 2: You can’t have real-time, exactness, and big data
Reason 3: Exactness is not necessary
Reason 4: You already have an exact batch processing system in place
—Jason
Unbelievably detailed and well-written article on using Storm for real-time trending-topics analytics. Great diagrams and code snippets. Also, it looks like the author contributed a bunch of his examples to the storm-starter project on GitHub.
—Jason