Posts tagged "pig"

This is an awesome presentation by Russell Jurney on building data-driven applications that use big data.

Russell is in the midst of writing a book on this topic, and the book is currently available for review on O’Reilly’s Open Feedback Publishing System.

—Jason

Short howto on using Pig to read/write data from/into MongoDB.

—Jason

Embarrassingly, this is the first I’ve heard of this one, but it looks very promising.

Apache Ambari is a tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari consists of a set of RESTful APIs and a browser-based management console UI. The set of Hadoop components currently supported by Ambari includes:

  • Apache Hadoop - HDFS
  • Apache Hadoop - MapReduce
  • Apache Hive
  • Apache HCatalog
  • Apache HBase
  • Apache ZooKeeper
  • Apache Oozie
  • Apache Pig
  • Apache Sqoop

—Jason

Nice listing of Hadoop MapReduce frameworks, broken out by language.

—Jason

Today I stumbled upon these posts at Data Recipes by @thedatachef. They are really good: solid write-ups, code snippets, and code on GitHub (varaha, sounder, and probably more). I suggest you check them out.

—Jason

I’m not sure how I haven’t seen this before, but it looks pretty amazing. It provides all the lecture slides and videos as well as student projects.

Check out the video playlist on YouTube.

Brief intro to using Apache Accumulo and Pig together.

—Jason

Great little article on DataFu, a collection of Pig UDFs from LinkedIn for data analysis on Hadoop.

—Jason