In part 1 of this series, I demonstrated how to install, configure, and run a three-node Hadoop cluster. In part 2, I showed how to take the default “word count” YARN job that ships with Hadoop 2.2.0 and improve it. In this leg of the journey, I will demonstrate how to install and run Hive. Hive is a tool that sits atop Hadoop and lets you run YARN (next-generation MapReduce) jobs without writing Java code. Hive and its query language, HiveQL, make it simple to query data stored in HDFS. HiveQL is a SQL-like language, so anyone who already knows SQL can start working with data in HDFS right away. HiveQL also lets you call custom MapReduce scripts directly from within a query.
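To make that last point concrete, here is a minimal sketch of the kind of custom script a HiveQL query can call through its `TRANSFORM ... USING` clause. Hive streams each row to the script's standard input as tab-separated columns and reads tab-separated rows back from its standard output. The script name `upper_words.py`, the table `word_counts`, and the columns `word` and `cnt` are hypothetical names chosen for illustration; they do not come from this series.

```python
#!/usr/bin/env python
"""Hypothetical Hive TRANSFORM script: upper-cases the first column of each row.

A query could call it along these lines (table and column names are assumptions):

    ADD FILE upper_words.py;
    SELECT TRANSFORM (word, cnt)
           USING 'python upper_words.py'
           AS (word, cnt)
    FROM word_counts;
"""
import sys


def transform_line(line):
    """Upper-case the first tab-separated column and pass the rest through."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # Hive writes one row per line to stdin and expects one row per line on stdout.
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in `transform_line` means it can be checked locally, for example by piping a small tab-separated file through the script, before it is ever run against the cluster.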