Training for the Aspiring Data Scientist: My Experience with a MOOC.

In the inaugural post of DataTechBlog, I stated a goal of helping others learn more about the emerging field of data science and big data analytics. My original intent was to tackle this lofty goal primarily through instructive tutorials. Lately, however, I have come to realize just how important formal instruction is to the learning process. Prior to my professional career in data, I spent the better part of 12 years in college earning various degrees and taking many (extra) classes. The education process afforded me an opportunity to saturate myself in various topics. In the milieu of the classroom, opportunities abounded and I, having been a motivated student, was able to run with the proverbial ball. My point is that, having been out of college for a while, I had forgotten just how important classroom learning is for fostering expertise in a particular subject or discipline. So I signed up for and successfully completed my first MOOC (Massive Open Online Course).

A MOOC is a college course that is delivered entirely online. I feel compelled to write about it because it is the best educational deal you will ever encounter! You see, MOOCs are absolutely free, they are offered by tier 1 institutions (in the United States and abroad), and the classes are sourced from actual undergraduate and graduate courses offered by the respective universities. I chose Coursera for my first online class; there are other providers as well, each with its own pros and cons. The class I took was titled Data Analysis; it was offered by Johns Hopkins University and taught by Jeff Leek. The course is based on a graduate course at Johns Hopkins.

Course logistics

1. No formal prerequisites
2. No prior programming skills needed, but a basic knowledge of the R programming language is highly recommended
3. Grades were calculated based on 8 quizzes and two peer-reviewed data analysis reports
4. All material was delivered via Coursera’s web portal

The 8 quizzes were worth 10 points each and the two analysis reports were worth 40 points each. You needed 100 points to pass and 144 points to pass with distinction (yours truly achieved 139 points, oh so close!).
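For anyone who wants to check the arithmetic, here is a small sketch in Python of the point scheme described above. The point values and thresholds come from the course as I described it; the names (`MAX_POINTS`, `course_result`) are my own for illustration, and I am assuming the thresholds are inclusive, i.e. exactly 100 points passes.

```python
# Point values as described in the post; all names here are illustrative.
QUIZ_COUNT = 8
QUIZ_POINTS = 10
REPORT_COUNT = 2
REPORT_POINTS = 40

PASS_THRESHOLD = 100         # assumed inclusive
DISTINCTION_THRESHOLD = 144  # assumed inclusive

# Maximum attainable score: 8 * 10 + 2 * 40 = 160
MAX_POINTS = QUIZ_COUNT * QUIZ_POINTS + REPORT_COUNT * REPORT_POINTS


def course_result(points):
    """Classify a final score under the assumed inclusive thresholds."""
    if points >= DISTINCTION_THRESHOLD:
        return "pass with distinction"
    if points >= PASS_THRESHOLD:
        return "pass"
    return "no pass"


if __name__ == "__main__":
    print(MAX_POINTS)                                # 160
    print(100 * PASS_THRESHOLD / MAX_POINTS)         # 62.5
    print(100 * DISTINCTION_THRESHOLD / MAX_POINTS)  # 90.0
    print(course_result(139))                        # pass
```

In other words, passing took 62.5% of the 160 available points and distinction took 90%, which puts my 139 points five short of the mark.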

Course implementation (Video Lectures, Quizzes, and Data Analysis assignments)

Once the class got underway, I found the delivery vehicle to be quite intuitive. At the beginning of each week (8 weeks in all) the professor released a set of short videos. The number of videos ranged from 4 to 8, with running times from 5 minutes to approximately 20 minutes. All supporting documentation was provided along with the videos and was easy to download. Each week also brought a quiz that you could take at your leisure. It must be stated that these so-called quizzes bore much more of a resemblance to a big old pile of homework. You were allowed to use any material you saw fit, so long as there was no collusion with other students. The quizzes had between 6 and 10 questions, some of which required quite a bit of R programming. On a few quizzes I spent about 2 hours answering the questions.

As mentioned above, there were two data analysis assignments, and you had two weeks to complete each. The only real complaint I had (along with many of the students) is that when an assignment was handed out you generally still had about a week of material to learn before you could complete it. This essentially left you with only 1 week. In the meantime you still had lectures to watch, a quiz to take, and whatever part of the assignment you could get done. Point being, you were always busy with this class.

Each assignment was laid out in the published rubric of the class. If you followed the rubric you did well; if you deviated from it you did not. The professor was kind enough to provide us with a sample (ideal) data analysis project, so there was really no excuse for not getting the format and layout 100% correct. However, some in the class chose to deviate and as such were smacked pretty hard during grading.

How do I know, you ask? Well, the grading is done by your fellow students, and doing it is actually part of your grade for the analysis projects. After the submission deadline, every student who submitted an assignment has to grade four other papers. The grading method was laid out in the course rubric, which made it easy and did not leave much room for interpretation. The course has a forum where people post questions and comments, and the ones who got smacked during grading had some ugly things to say to the community (grading is anonymous, so you can’t direct your anger at an individual). In fact, when these posts went up the result was a frenzy of e-fights. I did my best to stay clear of these skirmishes. Anyhow, the assignments were clearly defined and in the end not too difficult; again, if you followed the rubric, everything was spelled out for you. In fact, my last blog post was a data analysis project whose steps were based on what I learned in the class. I encourage you to take a look at the data analysis project; you can find it here. You will get a good feel for what is taught in the class and what you will learn.

My thoughts on the class

I can say with complete certainty that this class was well worth my time and effort. The class took on average about 8 hours/week of my time and did not present any major obstacles. You learn some complex subject matter, which is key to being able to fill the role of a data scientist. If you take this class, expect to work very hard and to need real focus to get the work done. However, if you have to withdraw, it is as simple as clicking an icon. Once you withdraw there is no public record of your having been enrolled.

PROS

– Class was well organized
– Material was riveting, insightful, and challenging
– You work at your own pace
– Robust and active forums, your questions always get an answer
– It is FREE

CONS

– The peer grading is flawed. It is a crapshoot whether your assignments will be graded fairly
– You cannot communicate with the professor, only the Community TAs. To be fair, when there are 70,000 people enrolled in a class it is hard for the professor to talk to the students.

Final Thoughts & Comments

There are lots of great articles on the web that speak to the quality and efficacy of MOOCs, particularly:

Two Cheers for Web U! – Great article in the N.Y. Times Sunday Review by A.J. Jacobs. The author recounts taking 11 MOOCs and grades several aspects of the experience.
The Condensed Classroom – Insightful analysis of MOOCs and flipped classrooms. Published in The Atlantic, author Ian Bogost.
Massive open online courses: a first report card – Published in The Principal, author Owen Youngman.

Again, I can’t stress enough how great a deal this MOOC (and all MOOCs, for that matter) is. You get first-class instruction for the low, low price of FREE from a first-class instructor at a first-class institution. I encourage you to look through Coursera’s catalog and see just how rich and diverse the offerings are. Further, there are a large number of classes geared toward data science and data analysis. For the aspiring data scientist, this is a great way to test the waters and see if you have the “stuff” needed to be a data scientist.

 

Regards, Louis.
