Elektor_Mag_-_January-February_2021


102 January & February 2021 http://www.elektormagazine.com


sentative of the reading of a single sensor at a given instance in
time. The dataset also contains labels that distinguish failures and
anomalies from the proper functioning of the system.

Once the dataset is downloaded, we install the libraries mentioned
above. From the command line, enter:
$ pip install numpy pandas scikit-learn matplotlib seaborn jupyter
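
As a quick sanity check after installation, a short script along the following lines can report which packages the current Python interpreter is able to find. This is only a sketch using the standard library; the helper name check_packages is our own, and note that the import names differ from the pip names in two cases (scikit-learn is imported as sklearn, and the jupyter package is checked here through its jupyter_core component).

```python
import importlib.util

# Import names, which are not always the same as the names used with pip.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn", "jupyter_core"]


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:14s} {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with pip before moving on.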

Once the libraries are installed we can set up a simple pipeline
for data analysis.

The first notebook
The first step is to create a new notebook. From the command line
we launch Jupyter using the following instruction: 
$ jupyter-notebook 

A screen similar to the one shown in Figure 1 will open. We create
a notebook by selecting New > Python 3. A new tab will open in our
browser with the newly created notebook. Let’s take a moment
to familiarize ourselves with the interface, shown in Figure 2,
which resembles (very vaguely) an interactive command line with a
top menu and several options.

The first thing that jumps out is the cell, the part of the view now
in front of us. Each cell is executed by pressing the Run button,
independently of the other cells (bearing in mind that the usual
rules of variable scope still apply: a variable defined in one cell
is available to the others only once that cell has been run).
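
The interplay between independent execution and shared variables can be imitated in plain Python. The sketch below is only a rough model of a kernel, not how Jupyter is actually implemented: each "cell" is a string of code executed in one shared dictionary, and the variable names are made up for the illustration.

```python
# Rough model of a notebook kernel: every cell runs in the same namespace.
kernel_namespace = {}

cell_1 = "threshold = 0.5"
cell_2 = "readings = [0.2, 0.7, 0.9]"
cell_3 = "alarms = [r for r in readings if r > threshold]"

# Cells are run one at a time; each one sees what the previous ones defined.
exec(cell_1, kernel_namespace)
exec(cell_2, kernel_namespace)
exec(cell_3, kernel_namespace)
print(kernel_namespace["alarms"])  # [0.7, 0.9]

# A fresh namespace (comparable to a restarted kernel) has forgotten everything,
# so running cell 3 first fails because its inputs do not exist yet.
fresh_kernel = {}
try:
    exec(cell_3, fresh_kernel)
except NameError as err:
    print("NameError:", err)
```

This is why the order in which cells are run matters more than the order in which they appear on the page.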

The three buttons immediately to the right of the Run button
allow you to stop, restart, or restart and re-run the kernel, i.e. the
instance that Jupyter associates with our notebook. Restarting the
instance may be necessary to reset the local and global variables
associated with the script, which is especially useful when you are
experimenting with new methods and libraries.

Another useful option is the one that allows you to select the cell
type, choosing between Code (i.e. Python code), Markdown (useful

we can think of a dataset as an Excel spreadsheet. The rows provide
the samples, that is, the individual observations of the phenomenon,
while the columns provide the features, that is, the values
that characterize each of the aspects of the process. Returning to
the example of smart manufacturing, each row will represent the
conditions of the production chain at a given moment while each
column will indicate the reading of a given sensor.
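
The spreadsheet analogy maps directly onto a pandas DataFrame. In the minimal sketch below, the sensor names and readings are invented purely for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Made-up readings: each row is one moment in time, each column one sensor.
dataset = pd.DataFrame(
    {
        "temperature_c": [71.2, 71.5, 88.9, 70.8],
        "pressure_bar": [1.01, 1.02, 1.35, 1.00],
        "vibration_mm_s": [0.12, 0.11, 0.47, 0.13],
    },
    index=pd.Index(["t0", "t1", "t2", "t3"], name="sample"),
)

n_samples, n_features = dataset.shape
print(n_samples, "samples,", n_features, "features")

# One row = the state of the production chain at one moment.
print(dataset.loc["t2"])

# One column = the history of a single sensor.
print(dataset["temperature_c"])
```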


When we talked about Scikit-Learn, we briefly mentioned
the concept of label or class. The presence or absence of labels
allows you to distinguish between supervised and unsupervised
algorithms. The difference is, at least in principle, quite simple:
supervised algorithms require a priori knowledge of the class of each
sample in the example dataset, while the unsupervised algorithms do
not. In practical terms, a supervised algorithm requires
a domain expert to establish the class to which each
sample belongs. In the case of a smart manufacturing process, an ‘expert’
could determine whether a set of readings, taken at a specific moment in time,
represents an abnormal situation or not. Thus the single sample
can be associated with one of these two possible classes
(abnormal/normal). This is not necessary for unsupervised algorithms.
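
In Scikit-Learn the difference shows up in what is passed to the fit method. The following sketch uses a handful of synthetic readings and two algorithms chosen only as examples (a decision tree as the supervised one, k-means clustering as the unsupervised one); neither the data nor the choice of algorithms comes from the dataset discussed in this article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Synthetic readings from two hypothetical sensors (one row per sample).
X = np.array([
    [70.0, 1.00], [71.0, 1.02], [69.5, 0.99], [70.5, 1.01],  # normal operation
    [90.0, 1.40], [91.5, 1.38], [89.0, 1.42],                # abnormal operation
])
# Labels assigned by the "expert": 0 = normal, 1 = abnormal.
y = np.array([0, 0, 0, 0, 1, 1, 1])

# Supervised: the labels are passed to fit() together with the readings.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[90.5, 1.39]]))  # a new reading close to the abnormal ones

# Unsupervised: only the readings are passed; the algorithm groups them itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster numbers are arbitrary, only the grouping matters
```

Note that the clusters found by the unsupervised algorithm carry no meaning of their own: it is still up to a person to decide which group, if any, corresponds to the abnormal situation.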


Furthermore, a distinction must be made between processes with
independent and identically distributed (IID) data and processes
whose data are in chronological order (time series). The difference is
related to the nature of the phenomenon under observation. Samples of
an IID process are independent of each other, while in a time series
each sample depends on a linear or non-linear combination of the values
that the process output at previous points in time.
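
The distinction can be made tangible with a small numerical experiment. The sketch below generates an IID sequence and, as one simple example of a time series, a first-order autoregressive process in which each sample is a linear function of the previous one plus noise. The coefficient 0.9 and the sample count are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5000

# IID process: every sample is drawn independently from the same distribution.
iid = rng.normal(size=n)

# Time series: a first-order autoregressive process, where each sample
# depends linearly on the previous one plus fresh noise.
phi = 0.9  # illustrative coefficient
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + noise[t]


def lag1_autocorr(x):
    """Correlation between a series and itself shifted by one step."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


print(f"IID   lag-1 autocorrelation: {lag1_autocorr(iid):.2f}")
print(f"AR(1) lag-1 autocorrelation: {lag1_autocorr(ar1):.2f}")
```

The first value should come out close to zero and the second close to the chosen coefficient, which is one practical way to tell whether the order of the samples carries information.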


Let’s get started!
With the necessary theoretical and practical terms covered, we
move on to using a suitable dataset for our example case. The
dataset used is SECOM, an acronym that stands for SEmiCOnductor
Manufacturing, which contains the values read by a set of sensors
during the monitoring of a semiconductor manufacturing process.
In the dataset, which can be downloaded from different sources
(such as Kaggle [5]) there are 590 variables, each of which is repre-


Figure 1: The home screen for managing notebooks in Jupyter.

Figure 2: An empty notebook.