Statistics & Business Decision Making
The role of statistics in business can be traced back hundreds of years. As early as the late 12th century, Gerald of Wales used statistics to complete the first population census of Wales (1). Merchants, in turn, came to realize that statistics could be used to measure and quantify trade. The first record of this comes from Florence, in Giovanni Villani’s “Nuova Cronica” of 1346 (1). Statistical methods were later adopted to help drive quality, and in doing so they helped advance statistics itself. In 1908, William Sealy Gosset, a brewer for Guinness in Dublin who later became its head brewer, devised the t-test (2) to measure consistency between batches of stout (1).
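Gosset’s idea can be illustrated with a two-sample t statistic. The following is a minimal Python sketch, not Gosset’s original procedure. It assumes two small samples of a hypothetical quality measurement taken from two batches (the numbers are invented for illustration, not historical data). It also assumes equal variances, which gives the pooled-variance form of Student’s t-test. The sketch computes only the statistic and its degrees of freedom. Turning that into a p-value requires a t distribution table or a library such as SciPy.

```python
import math
import statistics

# Hypothetical measurements from two batches (invented for illustration).
batch_a = [5.1, 4.9, 5.0, 5.2, 4.8]
batch_b = [5.4, 5.3, 5.6, 5.2, 5.5]


def students_t(x, y):
    """Two-sample Student's t statistic, assuming equal population variances."""
    nx, ny = len(x), len(y)
    # Sample variances use the n - 1 denominator.
    vx, vy = statistics.variance(x), statistics.variance(y)
    # Pool the two variance estimates, weighted by their degrees of freedom.
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


t, df = students_t(batch_a, batch_b)
print(f"t = {t:.3f}, df = {df}")  # prints: t = -4.000, df = 8
```

Here |t| = 4.0 with 8 degrees of freedom exceeds the two-sided 5% critical value of about 2.306, so these hypothetical batches would be judged inconsistent at that level. When the equal-variance assumption is doubtful, Welch’s variant of the test is the usual alternative.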
Back in January of 2014, I wrote a post describing my first MOOC experience. Around the same time, I shared my insights on a new construct called a Data Lake. The data lake concept has evolved substantially since I first reported on it in February of 2014. In fact, the ideas around Big Data are in a constant state of flux. This stuff is evolving at the speed of now!
Fast forward to today, and I find myself in a unique position (shaking my head in disbelief) as part of a team within EMC developing and offering a MOOC called “Data Lakes for Big Data.” As a member of the Big Data Solutions team, I support the training and education portfolio for EMC’s go-to-market strategy for Big Data across EMC’s federation of companies.
As one of the MOOC’s instructors, I can give you a bit more insight into the course, but first I want each and every one of you to sign up right here: Data Lakes for Big Data. The class is now open and is being delivered asynchronously, meaning you can consume the material at your convenience.
The overarching goal of this MOOC is to take a person who has no familiarity with Big Data, Data Science, and Data Lakes and give them a basic foundation of knowledge from which they can grow. The MOOC is broken up into four one-week sessions, with each week introducing a new topic:
- Week 1: What is Big Data and Data Science?
- Week 2: What is the Value of Big Data and Big Data Analytics?
- Week 3: What is a Data Lake?
- Week 4: How is a Data Lake Operationalized?
This online course is for newbies; there are no prerequisites beyond a genuine interest in Big Data and a willingness to learn. In this course you will see videos from today’s top thought leaders on Big Data and Data Science, including EMC’s Big Data Solutions CTO Chris Harrold, EMC’s Data Science guru David Dietrich, and none other than the Dean of Big Data himself, EMC’s Bill Schmarzo!
All those who finish the MOOC (with a passing grade of 70 or above) will receive a certificate of completion.
I look forward to seeing you in the course.
The History and Use of R
Recently I attended a great lecture on the statistical programming language R. Titled “The History and Use of R,” the talk was held at HackReduce in Cambridge, Massachusetts, and was sponsored by MediaMath. The lecturer, Joe Kambourakis, is a colleague of mine and the lead Data Science instructor at EMC Educational Services. Joe is also a talented Data Scientist.
He did a great job of tracing the genesis and evolution of R, one of the hottest programming languages for statistics and graphics today. If you are a practitioner of R, I encourage you to check out this presentation.
The application of analytics in healthcare has been undergoing a transformation over the past five to six years. Prior to this transformation, analytics applied to patient data were mostly descriptive in nature. That is to say, the simple reports generated by healthcare providers were basic and told only the story of “what happened.” In this era of big data, more and more healthcare organizations are looking to use their data in more meaningful ways. Their goal is to extract business-relevant information that enables providers, managers, and executives to derive actionable insight. Recently, I had the pleasure of researching this topic for a graduate class I took. I feel strongly that we are seeing a paradigm shift in how providers and payers look at their data (both structured and unstructured). This research addresses the key issues facing the healthcare industry today and in the future.
In the inaugural post of DataTechBlog, I stated a goal of helping others learn more about the emerging field of data science and big data analytics. It was my original intent to tackle this lofty goal primarily via instructive tutorials. Lately, however, I have come to realize just how important formal instruction is to the learning process. Prior to my professional career in data, I spent the better part of 12 years in college earning various degrees and taking many (extra) classes. The education process afforded me an opportunity to saturate myself in various topics. In the milieu of the classroom, opportunities abounded, and as a motivated student I was able to run with the proverbial ball. I guess my point here is that, having been out of college for a while, I forgot just how important classroom learning is to fostering expertise in a particular subject or discipline. As such, I signed up for and successfully completed my first MOOC (Massive Open Online Course).