Building an Infrastructure to Support Data Science Projects (Part 1 of 3) – Creating a Virtualized Environment.


As with any project or experiment, infrastructure has to be in place to support the intended work. For a data science project, the obvious first step is the computing environment. Simply stated, you can’t do advanced analytics on large data sets without CPU, RAM and disk. With these as your foundation, much can be designed, engineered and built. Before we can walk through a data science project, we first need hardware and software in place. For the tutorials here on DataTechBlog, a PC or laptop with adequate CPU, RAM and disk will suffice. Further, I plan to use only open or free software and code for all tutorials.

This tutorial walks you through the installation of VMware Player and CentOS 6.x (optimized for Pivotal Greenplum). This lays the foundation for the next steps, which will include the installation of Pivotal Greenplum, the MADlib libraries, R, and RStudio. When this environment is complete, you will be able to perform many types of “in-database” analysis using SQL with MADlib, analysis using R with Greenplum, and analysis with R against flat files or manually entered data.

Let’s Begin

Step 1.  Download and install VMware Player. This should take only 5-10 minutes. Because we are installing a 64-bit O.S., you need to verify that your CPU supports 64-bit virtualization. You can check for that support here. If your CPU does support virtualization, you also need to check that it is enabled in your BIOS. Here is a great YouTube video that talks you through the proper BIOS configuration to support 64-bit virtualization.
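If your host machine happens to run Linux, the CPU check can also be done from a terminal. The sketch below is only an illustration and is not part of the VMware tooling: it looks for the “vmx” (Intel VT-x) or “svm” (AMD-V) CPU flag in /proc/cpuinfo. The function name check_virt is hypothetical. It reports only what the CPU advertises; whether virtualization is enabled in the BIOS still has to be checked separately.

```shell
#!/bin/bash
# check_virt [FILE]
# Prints "supported" if FILE (default: /proc/cpuinfo) lists the
# vmx (Intel VT-x) or svm (AMD-V) CPU flag, otherwise "not supported".
# The function name is hypothetical; it assumes a Linux host.
check_virt() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    if grep -Eqw 'vmx|svm' "$cpuinfo"; then
        echo "supported"
    else
        echo "not supported"
    fi
}
```

Calling check_virt with no argument inspects the machine it runs on.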
Step 2.  Download and stage the CentOS ISO images (at the time of this post, 6.4 was the latest release). There are many places from which you can download these files; I found mine here. The two files you want are CentOS-6.x-x86_64-bin-DVD1.iso and CentOS-6.x-x86_64-bin-DVD2.iso, where “6.x” is the latest release.
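Large ISO downloads occasionally arrive corrupted, so it is worth comparing each file's SHA-256 checksum with the value published by the mirror before installing. A minimal sketch, assuming a host with the GNU sha256sum utility and a mirror that publishes checksums alongside the images. verify_iso is a hypothetical helper name, and the expected hash must come from the mirror, not from this post.

```shell
#!/bin/bash
# verify_iso FILE EXPECTED_SHA256
# Compares the SHA-256 checksum of FILE with EXPECTED_SHA256
# (lowercase hex). Returns 0 on a match, 1 on a mismatch.
# Hypothetical helper; assumes GNU sha256sum is installed.
verify_iso() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

Run it once per image, for example: verify_iso CentOS-6.x-x86_64-bin-DVD1.iso followed by the hash copied from the mirror.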
Step 3.  Launch VMware Player and choose “Create New Virtual Machine.”
Step 4.   At “New Virtual Machine Wizard” choose “I will install the operating system later.”
Click “Next”
Step 5.  At the “Select a Guest Operating System” screen choose “Linux”, with version “CentOS 64-bit”.
Click “Next”
Step 6.  At the “Name the Virtual Machine” screen give the VM a name and a location. Ensure that the location has adequate disk space for the VM files.
Click “Next”
Step 7.  At the “Specify Disk Capacity” screen you can take the defaults. This is good enough for our work.
Click “Next”
Step 8.  Review the settings, then click on “Finish”. You now have a VM shell that is ready for the O.S.
Step 9.  Right-click on your new VM and select “Virtual Machine Settings”.
Step 10.  Click on “CD/DVD (IDE)” in the window, then “Use an ISO Image file:”. Browse to and choose CentOS-6.x-x86_64-bin-DVD1.iso. Take the rest of the defaults, and ensure that the “Network Adapter” is set to “NAT”.
Click “OK”
Step 11.  Back on the Home screen choose “Play virtual machine”.  At the “Welcome to CentOS 6.x!” screen choose  “Install or upgrade an existing system”.
Click “Next”
Step 12.   At the “Test Media” screen choose “Skip”.
Click “Next”
Step 13.  Click on “Next” at the CentOS 6 screen.  At the Language screen pick your language.
Click “Next”
Step 14.  Choose your keyboard setup at the keyboard screen.
Click “Next”
Step 15.  Choose “Basic Storage Devices”.
Click “Next”
Step 16.   At the “Storage Device Warning” screen choose “Yes, discard any data”.
Step 17.  Provide a hostname for your VM.
Click “Next”
Step 18.  Pick a time zone.
Click “Next”
Step 19.  Provide a password for the “root” user.
Click “Next”
Step 20.  At the “Which type of installation would you like” screen choose “Create Custom Layout”.
Click “Next”
Step 21.  At the disk partition utility click on “Free”,
                       then click on “Create”,
                       then choose “Standard Partition”,
                       then click on “Create”.
                       For file system type choose “swap”.
                       For Size choose 1000 MB.
Click “OK”
Step 22.  At the disk partition utility click on “Free”,
                       then click on “Create”,
                       then choose “Standard Partition”,
                       then click on “Create”.
                       For file system type choose “ext3”.
                       For Mount Point choose “/boot”.
                       For Size choose 250 MB.
Click “OK”
Step 23.  At the disk partition utility click on “Free”,
                       then click on “Create”,
                       then choose “Standard Partition”,
                       then click on “Create”.
                       For file system type choose “ext3”.
                       For Mount Point choose “/”.
                       For Size choose 6000 MB.
Click “OK”
Step 24.  At the disk partition utility click on “Free”,
                       then click on “Create”,
                       then choose “Standard Partition”,
                       then click on “Create”.
                       For file system type choose “xfs”.
                       For Mount Point choose “/opt”.
                       For Size go to the “Additional Size Options” pane and choose “Fill to
                       maximum allowable size”. This tells the installer to use all remaining
                       space for this mount point.
                       ** Make sure that you choose “xfs” as the file system type for the
                       /opt mount point. This is crucial for the installation of Pivotal
                       Greenplum.
Click “Next”
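To see roughly how much space “Fill to maximum allowable size” leaves for /opt, subtract the three fixed partitions from Steps 21 through 23 from the size of the virtual disk. A rough sketch of that arithmetic, assuming a 20 GB (20480 MB) virtual disk purely for illustration; substitute the size you accepted in Step 7, and expect the installer to report slightly less because of partitioning overhead. The function name is hypothetical.

```shell
#!/bin/bash
# Fixed partition sizes from Steps 21-23, in MB.
SWAP_MB=1000
BOOT_MB=250
ROOT_MB=6000

# remaining_for_opt DISK_MB
# Prints the space (in MB) left over for /opt on a disk of DISK_MB.
remaining_for_opt() {
    local disk_mb="$1"
    echo $(( disk_mb - SWAP_MB - BOOT_MB - ROOT_MB ))
}

# 20480 MB is an assumed disk size, not a value from the installer.
remaining_for_opt 20480   # prints 13230
```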
Step 25.  At the “Format Warnings” screen choose “Format”
Choose “Write Changes to disk”
Click “Next”
Step 26.  At the Boot Loader screen take the defaults.
Click “Next”
Step 27.  At the O.S. install screen choose “Basic Server”.
                       Choose “Customize Now” at the bottom of the screen.
Click “Next”
Step 28.  Choose the following add-ins:
                       Applications => Internet Browser
                       Desktops => General Purpose Desktop
                       Desktops => Graphical Administration Tools
                       Desktops => X Window System
                       Servers => System Administration Tools
Click “Next”
Step 29.  When complete you will be prompted to “Reboot”.
Click on “Reboot”
Step 30.  When the system comes back up you will be asked if you would like to make changes to the system configuration.
                       Choose “Firewall Configuration” and disable the firewall.
                       Highlight “OK”, then press Enter.
                       Tab to “Quit”, then press Enter.
Step 31.  At the “login” prompt log in as “root” with the password from Step 19 above. If the desktop did not launch, issue the “startx” command at the command prompt: # startx
Step 32.  Once the desktop is active, right-click and choose “Open in Terminal”, then issue the “route” command (# route). This returns several items that we will use to enable networking on the VM. Look for and make note of the following:
                         Default Gateway: In my case it was 192.168.107.2
                         Network Mask: In my case it was 255.255.255.0
** In my case 192.168.107.* is the basis for the network I am configuring. Your VM may produce a different basis, but the pattern (as shown here) will be the same in your environment.
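The network and broadcast addresses for your own environment follow directly from the gateway and netmask reported by “route”: AND the address with the mask to get the network, then OR in the inverted mask to get the broadcast address. The sketch below shows that arithmetic in bash. The helper names are hypothetical, and the sample values are the ones from my environment.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1            # split on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_addr IP NETMASK: address AND mask.
network_addr() {
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# broadcast_addr IP NETMASK: network OR inverted mask.
broadcast_addr() {
    local mask
    mask=$(ip_to_int "$2")
    int_to_ip $(( ($(ip_to_int "$1") & mask) | (~mask & 0xFFFFFFFF) ))
}

network_addr   192.168.107.2 255.255.255.0   # prints 192.168.107.0
broadcast_addr 192.168.107.2 255.255.255.0   # prints 192.168.107.255
```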
Step 33.  I will now edit the ifcfg-eth0 file, but first I will make a backup copy:
#  cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0.orig
                         Configure /etc/sysconfig/network-scripts/ifcfg-eth0 as follows:
                          a.) Leave HWADDR as is
                          b.) DEVICE=eth0
                          c.) BOOTPROTO=static
                          d.) ONBOOT=yes
                          e.) IPADDR=192.168.107.100
                          f.) NETMASK=255.255.255.0
                          g.) BROADCAST=192.168.107.255
                          h.) GATEWAY=192.168.107.2
                          i.) DNS1=192.168.107.2
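A missing or misspelled key in ifcfg-eth0 can keep the network from coming up, so it can help to confirm that every setting from the list above is present before going further. This is only a sketch: check_ifcfg is a hypothetical helper, and it checks only that each key exists, not that its value is correct.

```shell
#!/bin/bash
# check_ifcfg FILE
# Reports any of the keys from Step 33 that are missing from FILE.
# Returns 0 if all keys are present, 1 otherwise.
# Hypothetical helper; it does not validate the values themselves.
check_ifcfg() {
    local file="$1" key missing=0
    for key in HWADDR DEVICE BOOTPROTO ONBOOT IPADDR NETMASK BROADCAST GATEWAY DNS1; do
        if ! grep -q "^${key}=" "$file"; then
            echo "missing: ${key}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0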
Step 34.  Restart the network services for the change to take effect.
                       # service network restart
Step 35.  Test that the networking is working as expected.
                       # ping www.yahoo.com
                        You should see successful ping results.
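If the ping fails, it helps to know whether the problem is basic connectivity or name resolution. One way to narrow it down, sketched below under the assumption that your gateway answers pings, is to ping the gateway by IP address first and only then a hostname. check_net is a hypothetical helper name.

```shell
#!/bin/bash
# check_net GATEWAY_IP HOSTNAME
# Returns 1 if the gateway is unreachable, 2 if the gateway answers
# but the hostname cannot be reached (often a DNS problem), 0 otherwise.
# Hypothetical helper; assumes the gateway responds to ping.
check_net() {
    local gateway="$1" host="$2"
    if ! ping -c 3 "$gateway" > /dev/null 2>&1; then
        echo "gateway ${gateway} unreachable: check IPADDR, NETMASK and GATEWAY"
        return 1
    fi
    if ! ping -c 3 "$host" > /dev/null 2>&1; then
        echo "cannot reach ${host}: check DNS1"
        return 2
    fi
    echo "network OK"
}
```

With the values from my environment the call would be: check_net 192.168.107.2 www.yahoo.com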

 

Congratulations, you now have a virtualized environment which will be the foundation for the next part of this series:

(Part 2 of 3) – Installing Pivotal Greenplum with MADlib.

Louis V. Frolio


