Monthly Archives: September 2013
Building an Infrastructure to Support Data Science Projects (Part 3 of 3) – Installing and Configuring R / RStudio with Pivotal Greenplum Integration
In this third and final part of the series (following Part 1 of 3 and Part 2 of 3), I walk you through the installation and configuration of R and RStudio. I also demonstrate how R is integrated with Pivotal Greenplum. For those of you who don’t know what R is: in short, it is a scripting language and runtime environment for performing statistical analysis of data, whether complex or simple. R is available for free under the GNU General Public License. RStudio is a free and open source IDE for R.
Building an Infrastructure to Support Data Science Projects (Part 2 of 3) – Installing Greenplum with MADlib
In the first part of this series (Part 1 of 3) we installed and configured CentOS on a virtual machine. That laid the foundation and prepared the environment in which we will now install Pivotal Greenplum Community Edition. Under Pivotal’s license model, this edition may be used for any purpose on a single node. As part of this tutorial I will also demonstrate how to install the open-source MADlib libraries into Greenplum. MADlib provides a rich set of libraries for advanced in-database data analysis and mining, all of which can be called via regular SQL. Installing Greenplum and MADlib will support some of the data science exercises I will be demonstrating in the near future.
Building an Infrastructure to Support Data Science Projects (Part 1 of 3) – Creating a Virtualized Environment