
(Marcin) #1

Thinking Pi


FORGE

In the example of learning the rules of Quidditch
purely by observation, you can imagine that the
simpler the rules, the fewer observations you would
need in order to learn them. The same principle
applies to all machine learning: the more complex the
problem, the more data you will need to learn it, and
vice versa. The other issue, compensating for the
anti-learning effect of seeing a bad referee by seeing
even more good referees, is mirrored in machine
learning as well. It’s known as the noisy data problem
(signal-to-noise, for Nate Silver fans). The noisier the
data (i.e. the more examples that don’t perfectly
match the pattern you are trying to learn), the more
data your machine learning model is going to need to
compensate for it.
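
To make the noise point concrete, here is a minimal sketch rather than anything from a real library. It assumes a toy setup: the correct referee call is always 1, each observation is flipped to 0 with probability `noise`, and we "learn" the call by majority vote. The function name `majority_vote_accuracy` and its parameters are made up for illustration.

```python
import random


def majority_vote_accuracy(noise, n_observations, trials=2000, seed=0):
    """Fraction of trials in which a majority vote over noisy
    observations recovers the true call (hypothetical toy model)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Each observation shows the true call unless noise flips it.
        good_calls = sum(
            1 for _ in range(n_observations) if rng.random() >= noise
        )
        # The majority is right when more than half the calls are good.
        if good_calls > n_observations / 2:
            correct += 1
    return correct / trials


# Odd observation counts avoid tied votes.
for noise in (0.0, 0.2, 0.4):
    for n in (1, 11, 101):
        print(noise, n, majority_vote_accuracy(noise, n))
```

With no noise a single observation is enough, while at a noise level of 0.4 the vote only becomes reliable once there are many observations to average over.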

HELLO PETAL
It should come as no surprise that the problem we
will be solving using the Raspberry Pi is a simple one
with very clean data (and because of that, we won’t
need very much data). The problem is predicting what
type of flower we are looking at, based on different
attributes (or ‘features’ in machine learning lingo) of
the flowers, e.g. petal length and width. First, let’s get
the data!
If you haven’t done interactive Python coding before,
you’re in for a treat! In the Raspberry Pi command line,
you open an interactive Python session by typing in
and running (pressing ENTER) the following:

python

All commands will now run in Python as they
are entered.
Let’s import our flower data from scikit-learn.
It comes with nice toy datasets to play with.

from sklearn.datasets import load_iris
iris = load_iris()

iris.keys() shows all of the different things stored
inside the iris object:

iris.keys()
>>> ['target_names', 'data', 'target', 'DESCR', 'feature_names']

iris['data'] contains the input data (flower
features) for 150 different flowers in a matrix, where
each row is a different flower, and each column is a
different feature.
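
As a quick check of that layout, the short sketch below prints the shape of the matrix, the names of its columns and flower types, and the first five rows. The variable name `X` is just a convenience chosen here, and newer scikit-learn releases may list a few more keys in `iris.keys()` than older ones.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris['data']  # feature matrix: one row per flower, one column per feature

print(X.shape)                # (rows, columns) = (150, 4)
print(iris['feature_names'])  # what each of the four columns measures
print(iris['target_names'])   # the three flower types to predict
print(X[:5])                  # the first five flowers
```

Each printed row holds four measurements in centimetres, in the order given by `iris['feature_names']`.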

[Figure: amount of data needed, plotted against pattern complexity and noise in data. Caption: Noisy data and complex problems require a lot of data to learn]

[Figure caption: The first five rows of flower data]