

Thinking Pi


FORGE

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
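The snippet above assumes that X_train and y_train already exist. Here is a minimal, self-contained sketch of how they might be produced. It assumes the iris data comes from scikit-learn's built-in load_iris and is split with train_test_split. The random_state value is an arbitrary choice made here so that the split is repeatable; it does not come from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the 150 iris samples: X holds the four measurements, y the species labels 0-2
iris = load_iris()
X, y = iris.data, iris.target

# Hold back 25% of the data (the default test_size) for testing;
# random_state=0 is an assumption, used only to make the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a decision tree on the training portion only
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
```

With the default split, 112 flowers end up in the training set and 38 in the test set.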

That’s it! Now you can get predictions on the test data:

model.predict(X_test)

You can see how many of these it predicts correctly
by comparing them by hand with the answers stored
in y_test. You should find that most of them are
correct, but there’s an easier way to check the
model’s performance. Scikit-learn classifiers have a
.score method that takes some input data and the
matching correct answers, and returns the fraction
of predictions the model gets right (its accuracy):

model.score(X_test, y_test)
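To see that .score is the same as the "compare by hand" approach, you can count the matching answers yourself. This sketch rebuilds the model so that it runs on its own, with the same assumed load_iris/train_test_split setup and an arbitrary random_state. It then checks that the fraction of predictions equal to y_test matches what .score reports.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed setup (not from the article): built-in iris data, default 25% test split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

predictions = model.predict(X_test)
# Element-wise comparison: True where the prediction matches the stored answer
correct = predictions == y_test
# The mean of a boolean array is the fraction of True values
manual_accuracy = np.mean(correct)

print(correct.sum(), "of", len(y_test), "correct")
print(manual_accuracy, model.score(X_test, y_test))
```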

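We got a result of 0.9473684, which means about
94.7% of the predictions were correct. (Your figure may
differ slightly, depending on which flowers ended up in
the test set.) It’s not perfect, but it’s pretty good for
a few lines of code.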
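Where does a number like 0.9473684 come from? If we assume the test set was the default 25% of the 150 iris flowers (the article doesn't say how the data was split), that is 38 test flowers. A score of 0.9473684 then corresponds to 36 correct predictions out of 38. The count of 36 is inferred from the reported score, not stated in the article.

```python
import math

total_samples = 150
# Assumption: default 25% test split; scikit-learn rounds the test count up
test_size = math.ceil(0.25 * total_samples)  # 38
correct = 36  # inferred from the reported score, not stated in the article
accuracy = correct / test_size

print(test_size, round(accuracy, 7))  # prints: 38 0.9473684
```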
Now, because we’re running Python on a Raspberry Pi,
we can use the model’s output to control hardware.
An LED, perhaps? We can import the gpiozero library,
which makes it easy to control hardware from Python.

from gpiozero import LED
# an LED connected to GPIO17 (gpiozero uses BCM numbering; this is physical pin 11)
led = LED(17)

We can create a function that turns the LED on if the
model predicts that a flower is of the species setosa
(output label 0), and turns it off otherwise.

def led_on_if_setosa(input_data):
    # input_data is a list with the following values:
    # [sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)]
    prediction = model.predict([input_data])[0]

    if prediction == 0:
        led.on()
    else:
        led.off()
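If you want to try this logic on an ordinary computer before wiring anything up, one option is to pass the model and the LED in as parameters, so that a stand-in object can take the real LED's place. The FakeLED class below is hypothetical and written only for this sketch. It mimics the on(), off() and is_lit members of gpiozero's LED. On the Pi, you would pass a real LED(17) instead. Training on the whole iris dataset here is also an assumption made to keep the sketch short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


class FakeLED:
    """Hypothetical stand-in for gpiozero's LED: only records on/off state."""

    def __init__(self):
        self.is_lit = False

    def on(self):
        self.is_lit = True

    def off(self):
        self.is_lit = False


def led_on_if_setosa(model, led, input_data):
    # input_data: [sepal length, sepal width, petal length, petal width] in cm
    prediction = model.predict([input_data])[0]
    if prediction == 0:  # label 0 is setosa
        led.on()
    else:
        led.off()


iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

led = FakeLED()
led_on_if_setosa(model, led, [4.7, 3.2, 1.3, 0.2])
print(led.is_lit)
```

Passing the LED in keeps the decision logic separate from the hardware, so the same function works with either object.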

Let’s see if we’ve found a setosa flower:

led_on_if_setosa([4.7, 3.2, 1.3, 0.2])

Hopefully your LED has turned on.
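To print the species name rather than the numeric label, you can look the label up in the dataset's target_names array. This sketch assumes the data came from scikit-learn's load_iris, and it trains on the full dataset for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# target_names maps labels 0, 1, 2 to species names
print(iris.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']

label = model.predict([[4.7, 3.2, 1.3, 0.2]])[0]
print(label, iris.target_names[label])  # 0 setosa
```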
This is the basic structure for bringing machine
learning into physical computing projects: you need
some training data, a learning algorithm, and a way
of performing actions based on the predictions. If an
automatic flower-identifying kit isn’t what you’re after,
you can feed almost any type of data into this
system, provided it’s not too noisy. Temperature and
other environmental sensors can work really well, but
the best choice depends on what you want to control.
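One way to keep that structure reusable is to separate the three parts, so that the sensor and the actions can be swapped without touching the model code. In this sketch, act_on_prediction, the lambda standing in for a sensor, and the actions dictionary are all hypothetical placeholders. The iris model stands in for whatever you train on your own data. A real project would read from something like a temperature sensor and drive GPIO outputs inside the actions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def act_on_prediction(model, read_sensor, actions):
    """Read one sample, classify it, and run the action for that label."""
    reading = read_sensor()              # a list of feature values
    label = model.predict([reading])[0]  # predicted class label
    actions[label]()                     # call the matching action
    return label


# Training data + learning algorithm (iris used here as a stand-in)
iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Hypothetical actions: replace the log entries with real hardware calls
log = []
actions = {
    0: lambda: log.append("setosa: LED on"),
    1: lambda: log.append("versicolor: LED off"),
    2: lambda: log.append("virginica: LED off"),
}

# Hypothetical sensor: a lambda returning a fixed reading
label = act_on_prediction(model, lambda: [4.7, 3.2, 1.3, 0.2], actions)
print(log)
```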

Above: Iris versicolor, also known as the Blue Flag or Purple Iris, is the official flower of Quebec. Credit: Danielle Langlois, CC-BY-SA

IRIS DATASET


The iris dataset is one of the most commonly
used datasets in machine learning. It was first
presented by Ronald Fisher in his 1936 paper
The use of multiple measurements in taxonomic
problems. The four attributes (Sepal Length, Sepal
Width, Petal Length, and Petal Width) combine to
determine the species, though no single attribute
can do so on its own.
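You can check the claim that no single attribute is enough with a quick experiment. Train a decision tree on each measurement alone and see whether it can classify even its own training data perfectly. Because some measurement values are shared by flowers of different species, a tree that sees only one column can't tell those flowers apart. This sketch uses scikit-learn's copy of the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)  # (150, 4): 150 flowers, four measurements each

# Fit a tree on one column at a time and score it on the same data.
single_feature_scores = {}
for i, name in enumerate(iris.feature_names):
    column = X[:, [i]]  # keep the 2-D shape (150, 1) that scikit-learn expects
    tree = DecisionTreeClassifier(random_state=0).fit(column, y)
    single_feature_scores[name] = tree.score(column, y)

for name, score in single_feature_scores.items():
    print(f"{name}: {score:.3f}")
```

Each score comes out below 1.0, because identical values that belong to different species put a ceiling on what a one-feature tree can learn.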
It’s a great dataset to get started with. If you
want to try your new-found machine learning skills
with more inputs, there are some other datasets at
hsmag.cc/HGGFOa that you can download
and use (though some of them may need a little
manipulation before they’re in a suitable format).