In the last three tutorials (Tutorial 1, Tutorial 2, Tutorial 3), I demonstrated how to create an infrastructure to support data science projects. The next step is to show how you can load data into Greenplum and R for analysis. For this tutorial I am using the famous Fisher Iris data set. This data is most often used to demonstrate how discriminant analysis can reveal the similarities and dissimilarities among objects, in this case three species of Iris. I chose this particular data because we will be using it in a tutorial in the near future.
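To give a rough feel for what loading the Iris data involves, here is a minimal sketch in Python rather than Greenplum or R, so it is not the tutorial's own method. It embeds only the first two rows of each species from the Fisher Iris data set; the column names and the helper functions `load_iris` and `mean_by_species` are my own assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Small sample: the first two rows of each species in the Fisher Iris data.
# Column names are an assumption; published copies of the data vary.
IRIS_SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

NUMERIC_COLUMNS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def load_iris(text):
    """Parse CSV text into a list of dicts with numeric measurements as floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {name: float(record[name]) for name in NUMERIC_COLUMNS}
        row["species"] = record["species"]
        rows.append(row)
    return rows


def mean_by_species(rows, column):
    """Return the mean of one measurement column for each species."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["species"]].append(row[column])
    return {species: sum(values) / len(values) for species, values in groups.items()}


if __name__ == "__main__":
    iris = load_iris(IRIS_SAMPLE)
    for species, mean in mean_by_species(iris, "petal_length").items():
        print(f"{species}: mean petal length {mean:.2f}")
```

Even on six rows, the mean petal length separates the three species cleanly, which hints at why this data set is a popular choice for demonstrating discriminant analysis.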
Monthly Archives: October 2013
In an earlier post, I recommended a short read on introductory data science and big data. That book gave a fantastic overview of the major areas and ideas governing these disciplines. However, if you are motivated to dig deeper and wrap your head around the details, then Data Science for Business should be your next read. This book does a fantastic job of helping readers understand how they should think if they are considering data science as a profession, or if they simply want to understand what all the hype is about. Further, much detail and time is given to the idea of the “Data Analytics Lifecycle,” which governs data science projects through a process and a framework. The authors meticulously step through the various modeling techniques with solid examples and explanations. Some sections of the book work through the math and its derivations, which may prove to be a challenge if your math is rusty. However, this should not present too much of an obstacle to understanding the gist of what is being conveyed. In the preface, the authors state that the book is intended for business people who are working with data scientists, managing data scientists, or seeking to understand the value in data science. The book is also suited to developers implementing data science solutions and, finally, to aspiring data scientists. I believe that this book has a role to play in one’s education in data science and that it is an appropriate read for those wishing to understand, in detail, how data science is done and what it aims to achieve.
Louis V. Frolio