iris['target'] contains the output flower name
(encoded as 0, 1, or 2) for each flower row of the
input data matrix. Print iris['target'] and you can
see the first 50 entries are all 0s, so the first 50 rows
of the input data matrix are all the same flower type.
Each number corresponds to a flower name in
iris['target_names']: 0 is I. setosa, 1 is I. versicolor,
and 2 is I. virginica.
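
If you want to check this yourself, a short snippet
along these lines should do it (assuming the dataset
has already been loaded into a variable called iris, as
earlier in the tutorial; the import is repeated so the
snippet runs on its own):

from sklearn.datasets import load_iris

iris = load_iris()
print(iris['target'][:50])    # the first 50 labels are all 0
print(iris['target_names'])   # ['setosa' 'versicolor' 'virginica']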

You can get more information on the dataset by
printing out the DESCR of the object:

print(iris['DESCR'])

OK, so we have our data! The machine learning
algorithm that will learn the mapping from the
flower features to the flower name will be a decision
tree. Decision trees are models that try to build a
tree of questions that split the data into the separate
classes (flower types).
We’ll start by importing the algorithm from
scikit-learn.

from sklearn.tree import DecisionTreeClassifier

We want to train our model on some of the data,
and save a portion of our data for testing. It would not
be wise to test a student by just getting them to do
an exam they have been practising with and already
have the answers for, right? They could just memorise
the answers without learning the pattern. So, you give
them a different exam that they don’t already have the
answers to, and compare the answers they ‘predict’
with the real ‘target’ answers.
So, we split the data into training and test data, and
also shuffle the data rows so that there is a good mix
of each flower in both train and test data.
The train_test_split function in scikit-learn does
all of this for us and, by default, puts 25% of the
rows into the test set and 75% into the training set.

from sklearn.model_selection import train_test_split
X = iris['data']
y = iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y)
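
As a quick sanity check, you can print the shapes of
the resulting arrays; with the default split of the 150
iris rows you should see roughly a 75/25 division,
something like:

print(X_train.shape, X_test.shape)   # expect (112, 4) and (38, 4)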

Now we create a new decision tree model and
train/fit it to our training data. This is the training
phase, so it gets to look at the inputs (flower features
stored in X_train) alongside the outputs (flower
names stored in y_train).
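
A minimal sketch of this step looks like the following
(the variable name model is just this sketch's choice);
you can then get the model's 'predicted' answers for
the unseen test rows with model.predict(X_test) and
compare them with the real 'target' answers in y_test:

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

print(model.predict(X_test))   # the model's predicted flower labels
print(y_test)                  # the real target labels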

[Figure, left: example of a simple decision tree
model after learning the iris problem]

[Decision tree diagram: a root question on Petal
Length > 2.4 leads to Setosa on one branch; the other
branch asks Petal Length < 1.4 to separate Versicolor
from Virginica]