Monthly Archives: March 2014

Operationalizing a Hadoop Eco-System (Part 2: Customizing Map Reduce)


It gives me great pleasure to introduce a new contributor to DataTechBlog. Ms. Neha Sharma makes her debut with this blog post. Neha is a talented software engineer and big data enthusiast. In this post, she demonstrates how to enhance the word count MapReduce job that ships with Hadoop. The enhancements include removing stop words, adding an option for case-insensitive counting, and removing punctuation.
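To illustrate the three enhancements, here is a minimal plain-Java sketch of the per-line normalization a customized mapper might apply before emitting each word. It is not Neha's implementation: the class name `WordNormalizer`, the `normalize` method, and the tiny stop-word list are hypothetical, and in a real job this logic would be called from inside the mapper's `map()` method rather than from `main`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper showing three normalization steps a customized
 * word-count mapper could apply to each input line. Plain Java only.
 */
public class WordNormalizer {
    // Tiny illustrative stop-word list (assumption); a real job would load a fuller list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    /**
     * Splits a line on whitespace, strips punctuation, optionally lower-cases
     * each word, and drops stop words.
     */
    public static List<String> normalize(String line, boolean caseInsensitive) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            // Remove every character that is not a Unicode letter or digit.
            String word = token.replaceAll("[^\\p{L}\\p{N}]", "");
            if (caseInsensitive) {
                word = word.toLowerCase(Locale.ROOT);
            }
            // Skip empty tokens; stop words are matched case-insensitively
            // even when counting itself is case-sensitive.
            if (word.isEmpty() || STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                continue;
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [quick, brown, fox, jumps, over, lazy, dog]
        System.out.println(normalize("The quick, brown Fox jumps over the lazy dog!", true));
    }
}
```

Keeping the normalization in a separate static method makes it easy to unit-test without starting a cluster; the case-insensitivity flag would typically be read from the job configuration.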

In Part 1 of this series, you saw how to install and configure a Hadoop cluster. This post shows how to modify a MapReduce job, specifically the word count example that ships with Hadoop.
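For readers new to the example, the stock word count job follows a simple pattern: the mapper emits a (word, 1) pair for every whitespace-separated token, Hadoop groups and sorts the pairs by word, and the reducer sums each group. The sketch below simulates that flow in memory with plain Java and no Hadoop dependency; `WordCountSimulation`, `map`, and `reduce` are hypothetical names, and a real job would extend Hadoop's `Mapper` and `Reducer` classes instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation (not Hadoop code) of the word-count data flow.
 */
public class WordCountSimulation {
    // Map phase: emit (word, 1) for each whitespace-separated token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group pairs by key (TreeMap keeps keys sorted,
    // mirroring Hadoop's sorted shuffle) and sum the values per key.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello world", "hello hadoop"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

Because the map step only splits on whitespace, "Hello" and "hello," would be counted as different words; that is the behavior the enhancements in this post address.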



Filed under Big Data