Genetic Programming Theory and Practice XIII

Using Genetic Programming for Data Science: Lessons Learned

Steven Gustafson, Ram Narasimhan, Ravi Palla, and Aisha Yousuf


Abstract In this chapter we present a case study demonstrating how the current
state of the art in Genetic Programming (GP) fares as a tool for the emerging field
of Data Science. Data Science refers to the practice of extracting knowledge from
data, often Big Data, to glean insights useful for predicting business, political, or
societal outcomes. Data Science tools are important to the practice because they allow
Data Scientists to be productive and accurate. GP has many features that make it
well suited to Data Science, but it is not yet widely considered a Data Science
method. We therefore performed a real-world comparison of GP with a popular Data
Science method to understand GP's strengths and weaknesses. GP found equally strong
solutions, leveraged the new Big Data infrastructure, and provided several benefits
such as direct feature importance and solution confidence. However, GP could not
quickly build and test models, required much more intensive computing power, and,
owing to its lack of commercial maturity, created challenges for productization and
for integration with data management and visualization capabilities. The lessons
learned lead to several recommendations that point future research toward the key
areas where GP must improve as a Data Science tool.


Keywords Genetic programming • Data Science • Gradient boosted regression •
Machine learning • Industrial applications • Real-world application • Lessons
learned • Diversity • Ensembles


1 Introduction


More than 10 years ago, in this same book series, Castillo et al. (2004) evaluated
Genetic Programming (GP) as a suitable technique for industrial systems modeling.
The authors examined the state-of-the-art GP system developed within Dow


S. Gustafson (✉) • R. Palla • A. Yousuf
Knowledge Discovery Lab, GE Global Research, Niskayuna, NY, USA


R. Narasimhan
Data Science, GE Software, San Ramon, CA, USA


© Springer International Publishing Switzerland 2016
R. Riolo et al. (eds.), Genetic Programming Theory and Practice XIII,
Genetic and Evolutionary Computation, DOI 10.1007/978-3-319-34223-8_7

