Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

set of numbers (the “one less than” is to do with the number of degrees of
freedom in the sample, a statistical notion that we don’t want to get into here).
The probability density function for a normal distribution with mean μ and
standard deviation σ is given by the rather formidable expression:

But fear not! All this means is that if we are considering a yes outcome when
temperature has a value, say, of 66, we just need to plug x = 66, μ = 73, and
σ = 6.2 into the formula. So the value of the probability density function is

By the same token, the probability density of a yes outcome when humidity has
a value, say, of 90 is calculated in the same way:

The probability density function for an event is very closely related to its
probability. However, it is not quite the same thing. If temperature is a
continuous scale, the probability of the temperature being exactly 66 (or
exactly any other value, such as 63.14159262) is zero. The real meaning of the
density function f(x) is that the probability that the quantity lies within a
small region around x, say, between x − ε/2 and x + ε/2, is ε f(x). What we
have written above is correct

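\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
\]

\[
f(\mathit{temperature} = 66 \mid \mathit{yes}) = \frac{1}{\sqrt{2\pi}\cdot 6.2}\, e^{-\frac{(66-73)^2}{2\cdot 6.2^2}} = 0.0340
\]

\[
f(\mathit{humidity} = 90 \mid \mathit{yes}) = 0.0221
\]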
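
The two density values can be checked with a few lines of Python. This is a minimal sketch rather than code from the book: the function names normal_density and normal_cdf and the interval width epsilon = 1 are assumptions introduced here for illustration. The final lines compare ε f(x) with the exact probability of the interval from x − ε/2 to x + ε/2, computed from the normal cumulative distribution using math.erf.

```python
import math


def normal_density(x, mean, std_dev):
    """Normal probability density f(x) for the given mean and standard deviation."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * std_dev)
    exponent = -((x - mean) ** 2) / (2.0 * std_dev ** 2)
    return coefficient * math.exp(exponent)


def normal_cdf(x, mean, std_dev):
    """Probability that a normal variable is at most x (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2.0))))


# Means and standard deviations for the "yes" class, taken from Table 4.4.
f_temperature = normal_density(66, mean=73, std_dev=6.2)
f_humidity = normal_density(90, mean=79.1, std_dev=10.2)
print(f"f(temperature = 66 | yes) = {f_temperature:.4f}")
print(f"f(humidity = 90 | yes)    = {f_humidity:.4f}")

# Density versus probability: for a small width epsilon (assumed to be 1 here),
# epsilon * f(x) approximates the probability of the surrounding interval.
epsilon = 1.0
exact = normal_cdf(66 + epsilon / 2, 73, 6.2) - normal_cdf(66 - epsilon / 2, 73, 6.2)
approximate = epsilon * f_temperature
print(f"exact interval probability = {exact:.5f}")
print(f"epsilon * f(66)            = {approximate:.5f}")
```

The density itself is not a probability; only its product with the interval width is, and the approximation improves as the width shrinks.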

4.2 STATISTICAL MODELING 93


Table 4.4 The numeric weather data with summary statistics.

Outlook               Temperature            Humidity               Windy              Play
          yes   no               yes   no               yes   no           yes   no    yes   no

sunny     2     3                83    85               86    85    false  6     2     9     5
overcast  4     0                70    80               96    90    true   3     3
rainy     3     2                68    65               80    70
                                 64    72               65    95
                                 69    71               70    91
                                 75                     80
                                 75                     70
                                 72                     90
                                 81                     75

sunny     2/9   3/5   mean       73    74.6  mean       79.1  86.2  false  6/9   2/5   9/14  5/14
overcast  4/9   0/5   std. dev.  6.2   7.9   std. dev.  10.2  9.7   true   3/9   3/5
rainy     3/9   2/5
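
The mean and std. dev. rows of Table 4.4 can be reproduced from the temperature and humidity columns. The sketch below is illustrative only: it assumes the sample standard deviation (dividing by one less than the number of values), which is what Python's statistics.stdev computes, and the names columns, summarize, and summary are introduced here rather than taken from the book.

```python
import statistics

# Numeric columns of Table 4.4, grouped by attribute and by the value of play.
columns = {
    ("temperature", "yes"): [83, 70, 68, 64, 69, 75, 75, 72, 81],
    ("temperature", "no"): [85, 80, 65, 72, 71],
    ("humidity", "yes"): [86, 96, 80, 65, 70, 80, 70, 90, 75],
    ("humidity", "no"): [85, 90, 70, 95, 91],
}


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


summary = {key: summarize(values) for key, values in columns.items()}

for (attribute, outcome), (mean, std_dev) in summary.items():
    print(f"{attribute:<12}{outcome:<4} mean = {mean:.1f}   std. dev. = {std_dev:.1f}")
```

Rounded to one decimal place, the results should agree with the summary rows of the table.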
