In part 1 of this series, I demonstrated how to install, configure, and run a three node Hadoop cluster. In part 2, you were shown how to take the default “word count” YARN job that comes with Hadoop 2.2.0 and make it better. In this leg of the journey, I will demonstrate how to install and run Hive. Hive is a tool that sits atop Hadoop and facilitates YARN (next generation map-reduce) jobs without having to write Java code. With HIVE, and its scripting language HiveQL, querying data across HDFS is made simple. HiveQL is a SQL like scripting language which enables those with SQL knowledge immediate access to data in HDFS. HiveQL also lets you reference custom MapReduce scripts right in HiveQL queries.
The application of analytics in healthcare has been transforming over the past five to six years. Prior to this transformation, analytics applied to patient data were mostly descriptive in nature. That is to say, the simple reports generated by healthcare providers were basic and only told the story of “what happened.” In this era of big data, more and more healthcare organizations are looking to take advantage of their data in a more meaningful way. Their goal is to extract business relevant information that enables providers, managers, and executives to derive actionable insight from their data. Recently, I had the pleasure of researching this topic for a graduate class I took. I feel strongly that we are seeing a paradigm shift in how providers and payers are looking at their data (both structured and unstructured). This research addresses the key issues facing the healthcare industry today as well as in the future.