Tag Archives: R Statistics

Structuring a Data Analysis using R (Part 2 of 2) – Analyzing, Modeling, and the Write-up.


In the first part of this two-part series (Structuring a Data Analysis using R (Part 1 of 2)), I discussed several key aspects of any successful data analysis project. In that post I also began a prototypical data analysis project and worked my way up through munging the data. Those steps enabled me to begin the analysis and modeling parts of the project. This post picks up where Part 1 left off and continues the data analysis, culminating in a formal write-up of the analysis demonstrated here.



Filed under Foundations, Tutorials

Greenplum, R, RStudio, and Data. The Basic Ingredients for Successful Recipes.

In the last three tutorials (Tutorial 1, Tutorial 2, Tutorial 3), I demonstrated how to create an infrastructure to support data science projects. The next step is to show you how to load data into Greenplum and R for analysis. For this tutorial I am using the famous Fisher Iris data set. This data set is most often used to demonstrate how discriminant analysis can reveal the similarities and differences among objects; in the case of the Fisher Iris data, among three species of Iris. I chose this particular data set because we will be using it in an upcoming tutorial.
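As a rough illustration of the R side of this step, here is a minimal sketch that assumes the copy of the Iris data bundled with base R (the `iris` data frame in the `datasets` package) stands in for whatever file the full tutorial loads. It inspects the data, summarizes it per species, and writes a CSV with a header row that a bulk loader could ingest. The Greenplum load itself is not shown, and the output file name `iris.csv` is hypothetical.

```r
# Fisher's Iris data ships with base R (package 'datasets'),
# so no download is needed to get a data frame to work with.
data(iris)

# Inspect the structure: 150 rows, four numeric measurements and a Species factor.
str(iris)

# Each of the three species contributes 50 observations.
species_counts <- table(iris$Species)
print(species_counts)

# Per-species means of the four measurements; these between-species
# differences are what discriminant analysis later exploits.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Write a CSV with a header row and no row names, suitable as input to a
# bulk loader. The file name and location are hypothetical choices.
out_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, out_file, row.names = FALSE)
```

Writing without row names keeps the file to the five real columns, which avoids an extra unnamed index column appearing when the file is loaded elsewhere.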



Filed under Foundations, Tutorials

Building an Infrastructure to Support Data Science Projects (Part 3 of 3) – Installing and Configuring R / RStudio with Pivotal Greenplum Integration

In this third and final part of the series (Part 1 of 3, Part 2 of 3), I walk you through the installation and configuration of R and RStudio. I also demonstrate how R is integrated with Pivotal Greenplum. For those of you who don’t know what R is, you can go here for a lot of useful information. In short, R is a scripting language and runtime environment for performing statistical analysis of data, from simple to complex. It is available for free under the GNU General Public License. RStudio is a free and open-source IDE for R. You can go here for more information about RStudio.



Filed under Infrastructure, Tutorials