in which the parameter $\xi$ is drawn from a distribution $p(\xi)$, then the error function
defined over this expanded data set can be written as
\[
\widetilde{E} = \frac{1}{2} \iiint \{y(\mathbf{s}(\mathbf{x},\xi)) - t\}^2\, p(t|\mathbf{x})\, p(\mathbf{x})\, p(\xi)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t\, \mathrm{d}\xi. \tag{5.130}
\]
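One way to read (5.130) is as the error obtained by training on randomly transformed copies of the data, which suggests a simple Monte Carlo estimate. A minimal NumPy sketch of that estimate, assuming a toy fixed model $y$, a simple shift transformation $\mathbf{s}(\mathbf{x},\xi)=\mathbf{x}+\xi$, and a Gaussian $p(\xi)$ (all illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def y(x):                # toy model standing in for a trained network (assumption)
    return np.sin(x)

def s(x, xi):            # transformation: a simple shift, s(x, xi) = x + xi (assumption)
    return x + xi

# toy data drawn from p(x), with noisy targets standing in for p(t|x)
x = rng.uniform(0.0, 2.0 * np.pi, size=1000)
t = np.sin(x) + 0.1 * rng.normal(size=x.shape)

# Monte Carlo estimate of (5.130): average the squared error over draws xi ~ p(xi),
# a zero-mean distribution with small variance
xi = rng.normal(scale=0.05, size=(50, 1))   # 50 draws of the transformation parameter
E_tilde = 0.5 * np.mean((y(s(x, xi)) - t) ** 2)
print(E_tilde)
```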

We now assume that the distribution $p(\xi)$ has zero mean with small variance, so that
we are only considering small transformations of the original input vectors. We can
then expand the transformation function as a Taylor series in powers of $\xi$ to give

\[
\begin{aligned}
\mathbf{s}(\mathbf{x},\xi) &= \mathbf{s}(\mathbf{x},0) + \xi \left.\frac{\partial}{\partial\xi}\mathbf{s}(\mathbf{x},\xi)\right|_{\xi=0} + \frac{\xi^2}{2}\left.\frac{\partial^2}{\partial\xi^2}\mathbf{s}(\mathbf{x},\xi)\right|_{\xi=0} + O(\xi^3) \\
&= \mathbf{x} + \xi\boldsymbol{\tau} + \frac{1}{2}\xi^2\boldsymbol{\tau}' + O(\xi^3)
\end{aligned}
\]

where $\boldsymbol{\tau}'$ denotes the second derivative of $\mathbf{s}(\mathbf{x},\xi)$ with respect to $\xi$ evaluated at $\xi=0$.
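For a concrete transformation these tangent vectors can be checked numerically. A minimal sketch, assuming $\mathbf{s}$ is a rotation of 2-D inputs through angle $\xi$, for which $\boldsymbol{\tau}$ and $\boldsymbol{\tau}'$ have closed forms (this choice of transformation is illustrative, not from the text):

```python
import numpy as np

def s(x, xi):
    """Rotate a 2-D point x through angle xi (the transformation s(x, xi))."""
    c, s_ = np.cos(xi), np.sin(xi)
    return np.array([[c, -s_], [s_, c]]) @ x

x = np.array([1.0, 0.5])
h = 1e-5

# central finite differences for tau = ds/dxi at xi=0 and tau' = d^2 s/dxi^2 at xi=0
tau       = (s(x, h) - s(x, -h)) / (2 * h)
tau_prime = (s(x, h) - 2 * s(x, 0.0) + s(x, -h)) / h ** 2

# closed forms for a rotation: tau = (-x2, x1) and tau' = -x
print(tau, np.array([-x[1], x[0]]))   # should agree
print(tau_prime, -x)                  # should agree
```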
This expansion of $\mathbf{s}(\mathbf{x},\xi)$ allows us to expand the model function to give

\[
y(\mathbf{s}(\mathbf{x},\xi)) = y(\mathbf{x}) + \xi\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x}) + \frac{\xi^2}{2}\left[ (\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\,\boldsymbol{\tau} \right] + O(\xi^3).
\]
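The second-order accuracy of this expansion can be verified numerically. A sketch, assuming the same rotation transformation as above and a toy model $y(\mathbf{x}) = (\mathbf{w}^{\mathrm{T}}\mathbf{x})^2$ whose gradient and Hessian are available in closed form (both choices are assumptions for illustration only):

```python
import numpy as np

# toy model with closed-form gradient and Hessian: y(x) = (w.x)^2
w = np.array([0.3, -1.2])
def y(x):      return (w @ x) ** 2
def grad_y(x): return 2.0 * (w @ x) * w
H = 2.0 * np.outer(w, w)              # Hessian (nabla nabla y), constant for this model

# rotation transformation; tangent vectors at xi = 0 in closed form
def s(x, xi):
    c, s_ = np.cos(xi), np.sin(xi)
    return np.array([[c, -s_], [s_, c]]) @ x

x = np.array([1.0, 0.5])
tau, tau_p = np.array([-x[1], x[0]]), -x

xi = 0.01
exact  = y(s(x, xi))
approx = (y(x) + xi * tau @ grad_y(x)
          + 0.5 * xi ** 2 * (tau_p @ grad_y(x) + tau @ H @ tau))
print(exact, approx)                  # should differ only at O(xi^3)
```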

Substituting into the mean error function (5.130) and expanding, we then have

\[
\begin{aligned}
\widetilde{E} ={}& \frac{1}{2}\iint \{y(\mathbf{x}) - t\}^2\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t \\
&+ \mathbb{E}[\xi] \iint \{y(\mathbf{x}) - t\}\,\boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x})\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t \\
&+ \mathbb{E}[\xi^2] \iint \Bigl[ \{y(\mathbf{x}) - t\}\,\frac{1}{2}\bigl\{ (\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\,\boldsymbol{\tau} \bigr\} + \bigl( \boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x}) \bigr)^2 \Bigr]\, p(t|\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}\,\mathrm{d}t + O(\xi^3).
\end{aligned}
\]

Because the distribution of transformations has zero mean we have $\mathbb{E}[\xi]=0$. Also,
we shall denote $\mathbb{E}[\xi^2]$ by $\lambda$. Omitting terms of $O(\xi^3)$, the average error function then
becomes
\[
\widetilde{E} = E + \lambda\Omega \tag{5.131}
\]
where $E$ is the original sum-of-squares error, and the regularization term $\Omega$ takes the
form

\[
\Omega = \int \Bigl[ \{y(\mathbf{x}) - \mathbb{E}[t|\mathbf{x}]\}\,\frac{1}{2}\bigl\{ (\boldsymbol{\tau}')^{\mathrm{T}}\nabla y(\mathbf{x}) + \boldsymbol{\tau}^{\mathrm{T}}\nabla\nabla y(\mathbf{x})\,\boldsymbol{\tau} \bigr\} + \bigl( \boldsymbol{\tau}^{\mathrm{T}}\nabla y(\mathbf{x}) \bigr)^2 \Bigr]\, p(\mathbf{x})\,\mathrm{d}\mathbf{x} \tag{5.132}
\]

in which we have performed the integration over $t$.
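As a rough illustration of how (5.132) could be estimated in practice, the following sketch evaluates $\Omega$ by Monte Carlo for a 1-D shift transformation $s(x,\xi)=x+\xi$, for which $\tau=1$ and $\tau'=0$, so that the bracket reduces to $\{y-\mathbb{E}[t|x]\}\frac{1}{2}y'' + (y')^2$. The model $y$, its derivatives, and the conditional mean $\mathbb{E}[t|x]$ are all toy closed-form stand-ins, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 1-D setup: shift transformation s(x, xi) = x + xi, hence tau = 1, tau' = 0
def y(x):   return np.sin(x)           # model (assumption: stands in for a trained network)
def dy(x):  return np.cos(x)           # nabla y
def d2y(x): return -np.sin(x)          # nabla nabla y
def Et(x):  return np.sin(0.95 * x)    # conditional mean E[t|x] (hypothetical)

x = rng.uniform(0.0, 2.0 * np.pi, size=100_000)   # samples from p(x)

# Monte Carlo estimate of Omega in (5.132); with tau = 1 and tau' = 0 the bracket
# is {y - E[t|x]} * (1/2) * y'' + (y')^2
Omega = np.mean((y(x) - Et(x)) * 0.5 * d2y(x) + dy(x) ** 2)
print(Omega)
```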