Mathematical Methods for Physics and Engineering: A Comprehensive Guide


31.7 HYPOTHESIS TESTING


In the last equality, we rewrote the expression in matrix notation by defining the column vector $\mathbf{f}$ with elements $f_i = f(x_i; \mathbf{a})$. The value $\chi^2(\hat{\mathbf{a}})$ at this minimum can be used as a statistic to test the null hypothesis $H_0$, as follows. The $N$ quantities $y_i - f(x_i; \mathbf{a})$ are Gaussian distributed. However, provided the function $f(x_i; \mathbf{a})$ is linear in the parameters $\mathbf{a}$, the equations (31.98) that determine the least-squares estimate $\hat{\mathbf{a}}$ constitute a set of $M$ linear constraints on these $N$ quantities. Thus, as discussed in subsection 30.15.2, the sampling distribution of the quantity $\chi^2(\hat{\mathbf{a}})$ will be a chi-squared distribution with $N - M$ degrees of freedom (d.o.f.), which has the expectation value and variance
\[
E[\chi^2(\hat{\mathbf{a}})] = N - M \quad\text{and}\quad V[\chi^2(\hat{\mathbf{a}})] = 2(N - M).
\]

Thus we would expect the value of $\chi^2(\hat{\mathbf{a}})$ to lie typically in the range $(N - M) \pm \sqrt{2(N - M)}$. A value lying outside this range may suggest that the assumed model for the data is incorrect. A very small value of $\chi^2(\hat{\mathbf{a}})$ is usually an indication that the model has too many free parameters and has ‘over-fitted’ the data. More commonly, the assumed model is simply incorrect, and this usually results in a value of $\chi^2(\hat{\mathbf{a}})$ that is larger than expected.
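This sampling distribution can be illustrated numerically. Below is a minimal Monte Carlo sketch (not from the text) that assumes a straight-line model with $N = 10$ points, $M = 2$ fitted parameters and a known error $\sigma$; it repeatedly generates data under $H_0$, refits by least squares and checks that the mean and variance of $\chi^2(\hat{\mathbf{a}})$ come out close to $N - M$ and $2(N - M)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (not from the text): straight-line model y = m*x + c
N, M = 10, 2              # number of data points and of fitted parameters
sigma = 0.5               # known Gaussian measurement error on each y_i
m_true, c_true = 1.0, 0.5
x = np.linspace(1.0, 5.0, N)

chi2_values = []
for _ in range(20000):
    # Generate data under the null hypothesis H0 (the model is correct)
    y = m_true * x + c_true + rng.normal(0.0, sigma, N)
    # Least-squares estimates of the M = 2 parameters
    m_hat, c_hat = np.polyfit(x, y, 1)
    # chi^2 evaluated at the least-squares minimum
    chi2_values.append(np.sum(((y - m_hat * x - c_hat) / sigma) ** 2))

chi2_values = np.array(chi2_values)
print("sample mean    :", chi2_values.mean(), "  (expect N - M =", N - M, ")")
print("sample variance:", chi2_values.var(), "  (expect 2(N - M) =", 2 * (N - M), ")")
```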


One can choose to perform either a one-tailed or a two-tailed test on the value of $\chi^2(\hat{\mathbf{a}})$. It is usual, for a given significance level $\alpha$, to define the one-tailed rejection region to be $\chi^2(\hat{\mathbf{a}}) > k$, where the constant $k$ satisfies
\[
\int_k^\infty P(\chi^2_n)\, d\chi^2_n = \alpha, \qquad (31.127)
\]
and $P(\chi^2_n)$ is the PDF of the chi-squared distribution with $n = N - M$ degrees of freedom (see subsection 30.9.4).
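In practice the constant $k$ in (31.127) is just the $(1 - \alpha)$ quantile of the chi-squared distribution with $n$ degrees of freedom, so it can be obtained numerically. A minimal sketch using scipy (the values $\alpha = 0.05$ and $n = 8$ anticipate the worked example below):

```python
from scipy.stats import chi2

alpha = 0.05   # one-tailed significance level
n = 8          # degrees of freedom, n = N - M

# k satisfies: integral from k to infinity of P(chi^2_n) d(chi^2_n) = alpha,
# i.e. k is the (1 - alpha) quantile of the chi-squared distribution
k = chi2.ppf(1.0 - alpha, df=n)
print(k)       # approximately 15.5
```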


An experiment produces the following data sample pairs $(x_i, y_i)$:

$x_i$:  1.85  2.72  2.81  3.06  3.42  3.76  4.31  4.47  4.64  4.99
$y_i$:  2.26  3.10  3.80  4.11  4.74  4.31  5.24  4.03  5.69  6.57

where the $x_i$-values are known exactly but each $y_i$-value is measured only to an accuracy of $\sigma = 0.5$. At the one-tailed 5% significance level, test the null hypothesis $H_0$ that the underlying model for the data is a straight line $y = mx + c$.

These data are the same as those investigated in section 31.6 and plotted in figure 31.9. As shown previously, the least-squares estimates of the slope $m$ and intercept $c$ are given by
\[
\hat{m} = 1.11 \quad\text{and}\quad \hat{c} = 0.4. \qquad (31.128)
\]
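The estimates (31.128) are easy to verify numerically; a short sketch assuming an ordinary (unweighted) least-squares fit, which suffices here because every $y_i$ carries the same error $\sigma$:

```python
import numpy as np

x = np.array([1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99])
y = np.array([2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57])

# Straight-line least-squares fit y = m*x + c
m_hat, c_hat = np.polyfit(x, y, 1)
print(m_hat, c_hat)   # approximately 1.11 and 0.4, as in (31.128)
```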

Since the error on each $y_i$-value is drawn independently from a Gaussian distribution with standard deviation $\sigma$, we have
\[
\chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[ \frac{y_i - f(x_i; \mathbf{a})}{\sigma} \right]^2
                   = \sum_{i=1}^{N} \left[ \frac{y_i - m x_i - c}{\sigma} \right]^2. \qquad (31.129)
\]


Inserting the values (31.128) into (31.129), we obtain $\chi^2(\hat{m}, \hat{c}) = 11.5$. In our case, the number of data points is $N = 10$ and the number of fitted parameters is $M = 2$. Thus, the number of degrees of freedom is $n = N - M = 8$.
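The whole test can be sketched end to end; the code below is illustrative rather than the text's own computation, and it assumes scipy for the critical value:

```python
import numpy as np
from scipy.stats import chi2

x = np.array([1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99])
y = np.array([2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57])
sigma = 0.5                                  # known error on each y_i

# Least-squares straight-line fit and chi^2 at the minimum, as in (31.129)
m_hat, c_hat = np.polyfit(x, y, 1)
chi2_min = np.sum(((y - m_hat * x - c_hat) / sigma) ** 2)

# One-tailed 5% rejection region chi^2 > k, with n = N - M degrees of freedom
N, M = len(x), 2
k = chi2.ppf(0.95, df=N - M)

print(f"chi^2(m_hat, c_hat) = {chi2_min:.1f}")   # about 11.5
print(f"critical value k    = {k:.1f}")          # about 15.5
print("reject H0" if chi2_min > k else "cannot reject H0 at the 5% level")
```

Since the observed value $\chi^2 \approx 11.5$ lies below the critical value $k \approx 15.5$, it falls outside the rejection region, so $H_0$ (a straight-line model) cannot be rejected at the one-tailed 5% significance level.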
