In the first part of this two-part series, Structuring a Data Analysis using R (Part 1 of 2), I discussed several key aspects of any successful data analysis project. In that post, I also began a prototypical data analysis project, working my way up through the data-munging stage. Those steps prepared the ground for the analysis and modeling stages of the project. This post picks up where Part 1 left off and continues the data analysis, culminating in a formal write-up of the work demonstrated here.
Over the next two tutorials, I am going to walk you through a complete data analysis project. I will show you the steps needed to ensure a consistent, repeatable process that you can apply to all your data analysis projects. Simply put, the goal of these tutorials is to create a framework and provide a set of tools that can support any data science project.
In the last three tutorials (Tutorial 1, Tutorial 2, Tutorial 3), I demonstrated how to create an infrastructure to support data science projects. The next step is to show you how to load data into Greenplum and R for analysis. For this tutorial, I am using the famous Fisher Iris data set. This data set is most often used to demonstrate how discriminant analysis can reveal similarities and differences between groups of objects; in the case of the Fisher Iris data set, those groups are three species of Iris. I chose this data set because we will be using it in an upcoming tutorial.