Monthly Archives: July 2015

Data Warehouse ETL Offload with Hadoop

Data volumes are growing at an exponential rate, causing problems for traditional IT infrastructures. As a result, more and more organizations are taking advantage of emerging technologies like Hadoop to relieve the pressure of exploding data volumes. Hadoop and its ecosystem of tools play an important role in tackling the tough problems that plague traditional IT and data warehouse environments.

Specifically, Extract, Transform, and Load (ETL) workloads can be offloaded to Hadoop when exploding data volumes are breaking traditional ETL processes. Using screencasts and animated video, I will demonstrate how Hadoop can be used to offload the most taxing ETL workloads.

First, I will illustrate the overarching problem of ETL overload at a fictitious company called Acme Sales. From there, I will move on to actual demonstrations (screencasts) of an ETL offload using Hadoop, Hive, and Sqoop.
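To make the idea of an offload concrete before the screencasts, here is a minimal sketch, not the pipeline shown in the demonstrations. It only builds the two pieces such a job typically needs: a Sqoop import command that copies a source table into HDFS, and a HiveQL statement that performs the transform inside Hadoop. The host name, database, user, table names, and the `region` and `amount` columns are all hypothetical, chosen purely for illustration.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir, mappers=4):
    """Return the argument list for a Sqoop import of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL of the source database
        "--username", username,
        "-P",                           # prompt for the password at run time
        "--table", table,               # source table to extract
        "--target-dir", target_dir,     # HDFS directory that receives the data
        "--num-mappers", str(mappers),  # number of parallel import tasks
    ]


def build_hive_transform(staging_table, summary_table):
    """Return a HiveQL statement that runs the transform step in Hadoop.

    The column names (region, amount) are assumptions for this sketch.
    """
    return (
        f"INSERT OVERWRITE TABLE {summary_table} "
        f"SELECT region, SUM(amount) AS total_amount "
        f"FROM {staging_table} GROUP BY region"
    )


if __name__ == "__main__":
    # Hypothetical connection details for the fictitious Acme Sales database.
    cmd = build_sqoop_import(
        "jdbc:mysql://dbhost/acme_sales", "etl_user", "orders", "/staging/orders"
    )
    print(shlex.join(cmd))
    print(build_hive_transform("staging_orders", "sales_by_region"))
```

Nothing here is executed against a cluster; the script only prints the command and the query so they can be reviewed. The point of the split is that the extract runs as parallel Hadoop tasks and the aggregation runs in Hive, so neither step consumes capacity on the data warehouse itself.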

Before we begin, I would like to introduce Noelle Dattilo, the newest guest author on DataTechBlog. Noelle is an education expert who specializes in animation technology. Noelle gets full credit for creating all the animations you are about to see in this post. I asked Noelle to explain her process for creating compelling animations.

“This series of animations was created with an online program called VideoScribe, a whiteboard animation tool. These animations leverage problem-based learning to paint a conceptual picture in a form that is easily digestible to a wide audience. To create the animations, I found some interesting graphics, turned them into SVG files (that’s the tricky part), uploaded them into VideoScribe, and placed them in the order to be drawn. Once I uploaded the audio track that Louis recorded, I synced the timing of each animation to the track, and voilà, we have our Acme Sales animations.”

Filed under Big Data Use Cases, Tutorials