Cascading – MapReduce without the complexity

Just bumped into Cascading, an open source (GPL 3) framework "for defining and executing complex and fault tolerant data processing workflows on a Hadoop cluster". Hadoop, as I'm sure you all know, is an open source implementation of MapReduce, the programming model at the core of how Google does its processing.

Anyway, the Cascading API "lets the developer quickly assemble complex distributed processes without having to 'think' in MapReduce". That is welcome, because expressing a non-trivial application directly as MapReduce jobs can get really complex.
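To give a flavour of what "thinking in MapReduce" means, here is a minimal in-memory sketch of the classic word count, split into the explicit map, shuffle and reduce phases you write against in raw MapReduce. This is plain Python, not Hadoop or Cascading code. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the shuffle step, which Hadoop performs for you across the cluster, is simulated with a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Simulated shuffle/sort: group all emitted values by key, sorted by key.

    In a real Hadoop job the framework does this between mappers and reducers.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): combine every value seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run one map -> shuffle -> reduce pass over an in-memory list of lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
    # → {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Word count fits neatly into a single map/reduce pair. Anything richer, such as a join followed by a second aggregation, typically means designing and chaining several such jobs by hand, with their intermediate keys and values. That bookkeeping is what Cascading's higher-level workflow API aims to take off the developer's plate.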