
SpatialHadoop is a MapReduce framework designed specifically to handle huge datasets of spatial data. It ships with a built-in high-level spatial language, spatial data types, spatial indexes, and efficient spatial operations.
The code is on GitHub.
—Jason
This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.

Russell is in the midst of writing a book on this topic, and it is currently available for review on O’Reilly’s Open Feedback Publishing System.
—Jason
Nice tutorial on writing UDFs for Hive.
—Jason
Embarrassingly, this is the first I’ve heard of this one, but it looks very promising.
Apache Ambari is a tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari consists of a set of RESTful APIs and a browser-based management console UI. The set of Hadoop components currently supported by Ambari includes:
—Jason
Nice listing of Hadoop MapReduce frameworks, broken out by language.
—Jason
I’m not sure how I haven’t seen this before, but it looks pretty amazing. It provides all the lecture slides and videos as well as student projects.
Check out the video playlist on YouTube.
Brief intro to using Apache Accumulo and Pig together.
—Jason
Short article on interacting with Apache Blur through its REPL.
If you haven’t checked out Blur before, you definitely should. It is an open source search engine built over Lucene and Hadoop. It provides a Thrift API for queries, and it mainly uses MapReduce for ETL/ingest.
Here are some of the features listed on its website:
—Jason