MySQL for the Internet of Things

(Steven Felgate) #1
Chapter 4 ■ Data transformation

Let’s say you decide to read temperature values every hour and soil moisture values every six hours.
Given those intervals, you would have six temperature values for each soil moisture value. What do you do
with them? Do you average the temperature values or throw away five of them?
Clearly, discarding five sensor readings is a potential loss of information. In this case, you could lose
data about when the temperature changed. For example, if the temperature changed by 4 degrees in the
first hour (not unusual in my area) but only 1 degree over the next five hours, saving only the last value
obscures when the temperature changed and, more importantly, loses the rapid change itself as an event in
time. Even averaging the values loses data and obscures knowledge. The loss of knowledge may not be
obvious and requires a bit of thought. Table 4-3 shows an example of the type of data we could collect.


Table 4-3. Sensor Data Frequency and Loss of Knowledge

Hour   Temperature   Soil Moisture
1      24.5
2      24.7
3      24.9
4      25.2
5      25.4
6      25.6          426
7      25.8
8      27.9
9      30.1
10     29.3
11     28.9
12     28.6          410


Notice that we have a temperature value (in Celsius) for every hour but only one soil moisture value for
each six-hour period. If we stored the temperature only when the soil moisture is read, we would see a large
change in value and would not know when the temperature changed, only that it changed since the last
temperature reading six hours prior.
For example, notice the temperature and soil moisture for hour 6. Here we stored the values
(25.6, 426), respectively. Now notice the values for hour 12. Here we stored the values (28.6, 410). While
there wasn’t much change in the soil moisture, we see a temperature change of 28.6 – 25.6 = 3.0 degrees.
However, we have lost the moment when the temperature changed the most between hours 7 and 10 and even
the fact that the temperature was highest at hour 9.
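To make the loss concrete, here is a minimal Python sketch (not code from the book) that uses the hourly temperatures in Table 4-3. The variable names are ours, chosen for illustration. It compares what we learn from the two stored readings with what the full hourly series reveals.

```python
# Hourly temperature readings (Celsius) from Table 4-3, keyed by hour.
temps = {1: 24.5, 2: 24.7, 3: 24.9, 4: 25.2, 5: 25.4, 6: 25.6,
         7: 25.8, 8: 27.9, 9: 30.1, 10: 29.3, 11: 28.9, 12: 28.6}

# Keep only the readings taken alongside soil moisture (hours 6 and 12).
kept = {hour: temps[hour] for hour in (6, 12)}
change_seen = round(kept[12] - kept[6], 1)

# What the full hourly series shows that the two kept readings cannot.
peak_hour = max(temps, key=temps.get)
hourly_changes = {h: round(temps[h] - temps[h - 1], 1) for h in range(2, 13)}
steepest_hour = max(hourly_changes, key=lambda h: abs(hourly_changes[h]))

print("Change visible from stored readings:", change_seen)
print("Hour of highest temperature:", peak_hour, temps[peak_hour])
print("Largest hour-to-hour change ends at hour:", steepest_hour,
      hourly_changes[steepest_hour])
```

The stored pair tells us only that the temperature rose by 3.0 degrees. The peak of 30.1 at hour 9 and the jumps of more than 2 degrees in consecutive hours are visible only in the hourly data.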
Alternatively, if we averaged the temperature values read, we would save data at hour 6 of (25.05, 426)
and hour 12 of (28.43, 410). While we have factored in the values over time, we haven’t gained much more
information. Yes, we still detect the trend of the temperature rising between the intervals, but the hour where
the temperature was highest is still lost. You could also say we’ve lost knowledge of the rate of change and
even accuracy, since the stored temperature is not the value that was actually read at the time recorded.
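The averages are easy to check. The following is a small Python sketch, again using the values from Table 4-3 and names of our own choosing, that averages each six-hour window and compares the result with the highest reading in that window.

```python
from statistics import mean

# Hourly temperature readings (Celsius) for hours 1 through 12 from Table 4-3.
temps = [24.5, 24.7, 24.9, 25.2, 25.4, 25.6,
         25.8, 27.9, 30.1, 29.3, 28.9, 28.6]

window = 6  # one soil moisture reading per six temperature readings

# Average each six-hour window, rounded to two decimal places.
averages = [round(mean(temps[i:i + window]), 2)
            for i in range(0, len(temps), window)]

# The highest reading in each window, which the average hides.
peaks = [max(temps[i:i + window]) for i in range(0, len(temps), window)]

for avg, peak in zip(averages, peaks):
    print(f"average {avg:.2f}, highest reading {peak}")
```

In the second window the average is 28.43 while the highest reading was 30.1, so the stored value is a temperature that was never actually measured at hour 12.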
When you encounter situations where storing the sensor data as a single row will obscure knowledge,
you are going to need to divide your data over two tables instead of one. Figure 4-2 shows an example of a
solution where we can save sensor data at different rates but still associate it with a single thing.
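As a rough illustration of the idea, the sketch below stores each sensor's readings in its own table and ties them together with a shared identifier for the thing. This is an assumption-laden sketch, not the design in Figure 4-2: the table and column names (temperature_reading, soil_moisture_reading, thing_id) are hypothetical, and Python's built-in sqlite3 module stands in for a MySQL server so the example runs without any setup.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per sensor, each row tagged with the thing it belongs to.
conn.executescript("""
    CREATE TABLE temperature_reading (
        thing_id INTEGER NOT NULL,
        hour     INTEGER NOT NULL,
        celsius  REAL    NOT NULL,
        PRIMARY KEY (thing_id, hour)
    );
    CREATE TABLE soil_moisture_reading (
        thing_id INTEGER NOT NULL,
        hour     INTEGER NOT NULL,
        moisture INTEGER NOT NULL,
        PRIMARY KEY (thing_id, hour)
    );
""")

# Values from Table 4-3: twelve hourly temperatures, two soil moisture readings.
temps = [24.5, 24.7, 24.9, 25.2, 25.4, 25.6,
         25.8, 27.9, 30.1, 29.3, 28.9, 28.6]
conn.executemany("INSERT INTO temperature_reading VALUES (1, ?, ?)",
                 list(enumerate(temps, start=1)))
conn.executemany("INSERT INTO soil_moisture_reading VALUES (1, ?, ?)",
                 [(6, 426), (12, 410)])

# Every temperature reading is kept; soil moisture appears only where it was read.
rows = conn.execute("""
    SELECT t.hour, t.celsius, s.moisture
    FROM temperature_reading AS t
    LEFT JOIN soil_moisture_reading AS s
      ON s.thing_id = t.thing_id AND s.hour = t.hour
    WHERE t.thing_id = 1
    ORDER BY t.hour
""").fetchall()

for row in rows:
    print(row)
```

Because each sensor has its own table, neither sampling rate forces the other to discard or average readings, and a join on the thing and the time brings the values back together when we need them side by side.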
