Fundamentals of Probability and Statistics for Engineers


10.3 Kolmogorov–Smirnov Test


The so-called Kolmogorov–Smirnov goodness-of-fit test, referred to as the K–S
test in the rest of this chapter, is based on a statistic that measures the
deviation of the observed cumulative histogram from the hypothesized cumulative
distribution function.
Given a set of sample values $x_1, x_2, \ldots, x_n$ observed from a population
$X$, a cumulative histogram can be constructed by (a) arranging the sample
values in increasing order of magnitude, denoted here by
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, (b) determining the observed distribution
function of $X$ at $x_{(1)}, x_{(2)}, \ldots$, denoted by
$F^0[x_{(1)}], F^0[x_{(2)}], \ldots$, from the relations $F^0[x_{(i)}] = i/n$,
and (c) connecting the values of $F^0[x_{(i)}]$ by straight-line segments.
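Steps (a) and (b) can be sketched in a few lines of Python. This is only an
illustration: the function name cumulative_histogram and the four data values
are assumptions made for the example and do not come from the text.

```python
def cumulative_histogram(sample):
    """Return the ordered values x_(1..n) and the observed values F^0[x_(i)] = i/n."""
    ordered = sorted(sample)            # step (a): increasing order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")
    observed = [i / n for i in range(1, n + 1)]  # step (b): F^0[x_(i)] = i/n
    return ordered, observed


# Illustrative data only (not taken from the text).
ordered, observed = cumulative_histogram([0.9, 0.2, 1.7, 0.5])
for x, f0 in zip(ordered, observed):
    print(f"x = {x:.1f}  F0 = {f0:.2f}")
```

Step (c), joining the points by straight-line segments, is a plotting matter
and is left out of the sketch.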
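The test statistic to be used in this case is

$$
D_2 = \max_{i=1}^{n} \left\{ \left| F^0[X_{(i)}] - F_X[X_{(i)}] \right| \right\}
    = \max_{i=1}^{n} \left\{ \left| \frac{i}{n} - F_X[X_{(i)}] \right| \right\},
\tag{10.12}
$$
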
where $X_{(i)}$ is the $i$th-order statistic of the sample. Statistic $D_2$
thus measures the maximum of the absolute values of the $n$ differences between
the observed probability distribution function (PDF) and the hypothesized PDF
evaluated for the observed samples. In the case where parameters in the
hypothesized distribution must be estimated, the values for $F_X[X_{(i)}]$ are
obtained by using estimated parameter values.
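A minimal sketch of the sample value $d_2$, following Equation (10.12) as
written in the text (each ordered value is compared with $i/n$ only). The names
ks_statistic and exponential_cdf, the unit-rate exponential hypothesis, and the
data are assumptions chosen for illustration. Some software definitions of the
K–S statistic also compare against $(i-1)/n$, so their results may differ from
this one.

```python
import math


def ks_statistic(sample, cdf):
    """Sample value d_2 = max_i | i/n - F_X(x_(i)) | for a hypothesized CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return max(abs(i / n - cdf(x)) for i, x in enumerate(ordered, start=1))


def exponential_cdf(x, rate=1.0):
    """Hypothesized distribution function (assumed here: exponential)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Illustrative data only (not taken from the text).
d2 = ks_statistic([0.9, 0.2, 1.7, 0.5], exponential_cdf)
print(f"d2 = {d2:.4f}")
```

If the rate had to be estimated from the data, the estimate would simply be
passed in place of the fixed rate, as the paragraph above describes.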
While the distribution of $D_2$ is difficult to obtain analytically, its
distribution function at various values can be computed numerically and
tabulated. It can be shown that the probability distribution of $D_2$ is
independent of the hypothesized distribution and is a function only of $n$, the
sample size (see, e.g., Massey, 1951).
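One way to explore this numerically is a small Monte Carlo sketch. Because the
distribution of the statistic does not depend on the hypothesized distribution,
the sketch assumes the simplest continuous case, a uniform distribution on
(0, 1), for which $F_X(x) = x$. The function names, trial count, and seed are
assumptions for illustration, and the result is a simulation estimate, not a
substitute for tabulated values.

```python
import random


def d2_uniform(n, rng):
    """One simulated value of D_2 under a uniform(0, 1) hypothesis, F_X(x) = x."""
    ordered = sorted(rng.random() for _ in range(n))
    return max(abs(i / n - u) for i, u in enumerate(ordered, start=1))


def estimate_critical_value(n, alpha, trials=20000, seed=0):
    """Estimate c such that roughly a fraction alpha of simulated D_2 values exceed c."""
    rng = random.Random(seed)
    draws = sorted(d2_uniform(n, rng) for _ in range(trials))
    # Empirical (1 - alpha) quantile of the simulated statistic.
    index = min(trials - 1, int((1.0 - alpha) * trials))
    return draws[index]


print(estimate_critical_value(n=10, alpha=0.05))
```

Repeating the simulation with a different continuous hypothesized distribution
(sampling from it and evaluating its own distribution function) should give
estimates that agree up to simulation noise.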
The execution of the K–S test now follows that of the $\chi^2$ test. At a
specified $\alpha$ significance level, the operating rule is to reject
hypothesis $H$ if $d_2 > c_{n,\alpha}$; otherwise, accept $H$. Here, $d_2$ is
the sample value of $D_2$, and the value of $c_{n,\alpha}$ is defined by

$$
P(D_2 > c_{n,\alpha}) = \alpha. \tag{10.13}
$$

The values of $c_{n,\alpha}$ for $\alpha = 0.01$, 0.05, and 0.10 are given in
Table A.6 in Appendix A as functions of $n$.
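The operating rule itself reduces to a single comparison. In the sketch below,
the function name ks_decision is hypothetical, and both numbers in the usage
line are placeholders for illustration; they are not values from Table A.6.

```python
def ks_decision(d2, critical_value):
    """Operating rule: reject H if d2 > c_{n,alpha}; otherwise accept H."""
    return "reject H" if d2 > critical_value else "accept H"


# Placeholder numbers for illustration only (not values from Table A.6).
print(ks_decision(d2=0.18, critical_value=0.30))
```

In practice the critical value would be looked up for the sample size and
significance level at hand.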
It is instructive to note the important differences between this test and the
$\chi^2$ test. Whereas the $\chi^2$ test is a large-sample test, the K–S test
is valid for all values of $n$. Furthermore, the K–S test utilizes sample
values in their unaltered and unaggregated form, whereas data lumping is
necessary in the execution of the $\chi^2$ test. On the negative side, the K–S
test is strictly valid only for continuous