In the first part of this series (Part 1 of 3), we installed and configured CentOS on a virtual machine. That laid the foundation for the environment we will now use to install Pivotal Greenplum Community Edition. Under Pivotal's license model, this edition may be used for any purpose on a single node. As part of this tutorial, I will also demonstrate how to install the open-source MADlib libraries into Greenplum. MADlib provides a rich set of libraries for advanced in-database data analysis and mining that can be called via regular SQL. Installing Greenplum and MADlib will support some of the data science exercises I will be demonstrating in the near future.
Building an Infrastructure to Support Data Science Projects (Part 2 of 3) – Installing Greenplum with MADlib
Welcome to DataTechBlog. My name is Louis and I am a data professional. I embrace all data: big, small, structured, semi-structured, unstructured, dark, sensor; I do not discriminate. Over the past 20 years I have gained expertise in many aspects of data, including analytics, management, operations, architecture, technology, administration, and engineering.
Over the past several years the terms "data science" and "big data" have become commonplace. My goal is to help other data and database professionals learn about the emerging disciplines of data science and big data analytics. Here you will find tutorials, how-tos, and topic discussions on various dimensions of these disciplines, including data mining, exploratory data analysis, data prep/scrubbing, data engineering, tools (e.g. Greenplum, R, MADlib, Hadoop, Hive, Pig), visualizations, and much more.
Coming from a traditional data architecture background, I can help bridge the gap for RDBMS practitioners who are interested in learning more about data science and big data analytics.