Posts tagged "mapreduce"

Parquet is a columnar storage format for Hadoop.

Format and MapReduce code on GitHub. Includes a Loader and Storer for Pig.

Released under the Apache 2.0 License.
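If "columnar" is new to you, here is a toy sketch of the idea in plain Python. It is not Parquet's actual on-disk format or API; the records and field names are made up for illustration. It only shows why storing values column by column helps when a query needs a single field.

```python
# Toy illustration of row-oriented vs. column-oriented layout.
# NOT Parquet's real format; the records and field names are hypothetical.

rows = [
    {"user": "alice", "country": "US", "clicks": 3},
    {"user": "bob", "country": "DE", "clicks": 7},
    {"user": "carol", "country": "US", "clicks": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs one field only has to read that column's values,
# instead of scanning every full record.
total_clicks = sum(columns["clicks"])
print(total_clicks)
```

Keeping similar values next to each other is also what tends to make column layouts compress well, though this sketch does not attempt any encoding or compression.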

SpatialHadoop is a MapReduce framework designed specifically to handle huge datasets of spatial data. It ships with a built-in spatial high-level language, spatial data types, spatial indexes, and efficient spatial operations.

Code on GitHub.

—Jason

This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.

Russell is in the midst of writing a book on this topic, and it is currently available for review on O’Reilly’s Open Feedback Publishing System.

—Jason

Nice tutorial on writing UDFs for Hive.

—Jason

Embarrassingly, this is the first I’ve heard of this one, but it looks very promising.

Apache Ambari is a tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari consists of a set of RESTful APIs and a browser-based management console UI. The set of Hadoop components that are currently supported by Ambari includes:

  • Apache Hadoop - HDFS
  • Apache Hadoop - MapReduce
  • Apache Hive
  • Apache HCatalog
  • Apache HBase
  • Apache ZooKeeper
  • Apache Oozie
  • Apache Pig
  • Apache Sqoop

—Jason

Nice listing of Hadoop MapReduce frameworks, broken out by language.
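For anyone new to the model these frameworks wrap, here is a minimal single-process sketch of the map, shuffle, and reduce steps as a word count. It is plain Python, not taken from any framework on the list, and the function names are my own; a real Hadoop job distributes these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["pig hive pig", "hive hbase"])
print(counts)
```

The per-language frameworks mostly differ in how you express the map and reduce functions; the grouping in between is handled for you.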

—Jason

I’m not sure how I haven’t seen this before, but it looks pretty amazing. It provides all the lecture slides and videos as well as student projects.

Check out the video playlist on YouTube.

Brief intro to using Apache Accumulo and Pig together.

—Jason

Short article on interacting with Apache Blur through its REPL.

If you haven’t checked out Blur before, you definitely should. It is an open source search engine built over Lucene and Hadoop. It provides a Thrift API for queries, and it mainly uses MapReduce for ETL/ingest.

Here are some of the features listed on its website:

  • Fast data ingestion
  • Hierarchical data storage
  • Record-level access control
  • Paged results
  • Quick search
  • Boolean search logic
  • Fuzzy searches
  • Wildcard searches
  • Facets
  • Term statistics
  • Term lists

—Jason