Pattern Recognition and Machine Learning
5.5. Regularization in Neural Networks

Figure 5.12 An illustra ...
Figure 5.13 A schematic illustration of why early stopping can give similar results to weight decay in th ...
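To make the procedure concrete, here is a minimal sketch of early stopping (the code and the names train_step and val_error are ours, not the book's): training halts once the validation error has stopped improving, and the weights from the validation minimum are kept.

```python
# A minimal sketch of early stopping, assuming user-supplied train_step
# and val_error callables (these names are ours): keep the model state
# from the iteration with the lowest validation error.
import copy

def train_with_early_stopping(model, train_step, val_error,
                              max_iters=1000, patience=20):
    best_err, best_model = float("inf"), copy.deepcopy(model)
    since_best = 0
    for _ in range(max_iters):
        train_step(model)               # one gradient-based update
        err = val_error(model)          # error on the held-out set
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:  # validation error stopped improving
                break
    return best_model                   # weights at the validation minimum
```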
Figure 5.14 Illustration of the synthetic warping of a handwritten digit. The origina ...
will be one-dimensional, and will be parameterized by ξ. Let the vector that results from acting on x_n by th ...
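In this construction, the derivative of the transformed vector with respect to ξ, evaluated at ξ = 0, gives the tangent vector used by tangent propagation. In practice it is often approximated by finite differences, as in the following sketch (our construction, not the book's, using image rotation as the one-parameter transformation and scipy.ndimage.rotate):

```python
# A finite-difference estimate of the tangent vector
# tau_n = d s(x_n, xi) / d xi at xi = 0, with rotation as the
# transformation s. scipy.ndimage.rotate works in degrees.
import numpy as np
from scipy.ndimage import rotate

def tangent_vector(image, eps=0.1):
    """image: 2-D array of pixel intensities x_n; eps: small angle xi."""
    plus = rotate(image, eps, reshape=False, order=1)
    minus = rotate(image, -eps, reshape=False, order=1)
    # Central difference: (s(x, eps) - s(x, -eps)) / (2 eps).
    return (plus - minus) / (2.0 * eps)
```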
Figure 5.16 Illustration showing (a) the original image x of a handwritten digit, (b) ...
in which the parameter ξ is drawn from a distribution p(ξ), then the error function defined over this expand ...
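A simple way to minimize such an error function is to draw a fresh value of ξ each time a pattern is presented, so that the network effectively trains on an expanded data set of transformed patterns. The sketch below (with placeholder callables transform and grad_step of our own naming) shows the idea:

```python
# A hedged sketch of training on transformed data: each time a pattern
# is presented, a fresh xi is drawn from p(xi) and the transformed
# input s(x, xi) is used for the update.
import numpy as np

rng = np.random.default_rng(0)

def augmented_epoch(X, T, grad_step, transform, xi_std=0.1):
    """One pass over the data with a random transformation per pattern."""
    for x, t in zip(X, T):
        xi = rng.normal(0.0, xi_std)    # draw xi ~ p(xi), here a Gaussian
        grad_step(transform(x, xi), t)  # ordinary update on s(x, xi)
```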
We can further simplify this regularization term as follows. In Section 1.5.5 we saw ...
[Figure: input image → convolutional layer → sub-sampling layer]
Figure 5.17 Diagram illustrating part of a convolution ...
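As an illustrative sketch (ours, not the book's), the two layer types of Figure 5.17 can be written as a shared-kernel convolution over local receptive fields followed by average sub-sampling:

```python
# Minimal numpy versions of the two operations: a convolutional layer
# with a single shared kernel, and a k x k average sub-sampling layer.
import numpy as np

def conv_layer(image, kernel, bias=0.0):
    """Valid 2-D convolution with one shared weight kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]  # local receptive field
            out[i, j] = np.sum(patch * kernel) + bias
    return np.tanh(out)                        # sigmoidal nonlinearity

def subsample(feature_map, k=2):
    """Average-pool k x k blocks (sub-sampling layer)."""
    H, W = feature_map.shape
    H, W = H - H % k, W - W % k
    f = feature_map[:H, :W]
    return f.reshape(H // k, k, W // k, k).mean(axis=(1, 3))
```

Because the same kernel is applied at every position, all units in the feature map share one set of weights, which is what builds the translation invariance into the network.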
the network outputs to translations and distortions of the input image. Because we wi ...
Recall that the simple weight decay regularizer, given in (5.112), can be viewed as the negative log of a ...
The effect of the regularization term is therefore to pull each weight towards the ce ...
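As an illustration (a sketch of ours, not the book's code), the soft weight-sharing penalty, the negative log of a mixture-of-Gaussians prior over the weights, can be evaluated as follows; with a single zero-mean component it reduces to simple weight decay:

```python
# Omega(w) = -sum_i ln sum_j pi_j N(w_i | mu_j, sigma_j^2)
import numpy as np

def soft_weight_sharing_penalty(w, pi, mu, sigma):
    """w: flat weight vector, shape (n,); pi, mu, sigma: shape (J,)."""
    w = w[:, None]                                    # (n, 1) for broadcasting
    log_comp = (np.log(pi)
                - 0.5 * np.log(2.0 * np.pi * sigma**2)
                - 0.5 * ((w - mu) / sigma) ** 2)      # ln pi_j N(w_i|mu_j, sigma_j^2)
    m = log_comp.max(axis=1, keepdims=True)           # log-sum-exp for stability
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_mix.sum()
```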
5.6. Mixture Density Networks

Figure 5.18 The left figure shows a two-link robot arm, in which the Cartesian coordinates (x_1, x_2) of t ...
Figure 5.19 On the left is the data set for a simple ‘forward problem’ in which the red curve ...
[Figure 5.20 network: inputs x_1, ..., x_D; output activations θ_1, ..., θ_M determine the parameters of p(t|x)]
Figure 5.20 The mixture density network can represent general conditional probabil ...
directly by the network output activations

\[
\mu_{kj}(\mathbf{x}) = a^{\mu}_{kj}.
\qquad (5.152)
\]

The adaptive parameters of the ...
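As a sketch of how these choices fit together (our code, not the book's), the output activations of a mixture density network can be mapped to valid mixture parameters: a softmax for the mixing coefficients, exponentials for the scales (keeping them positive), and the identity for the component means, in the spirit of (5.150)-(5.152):

```python
import numpy as np

def mdn_parameters(a, K, L):
    """Split and transform the output activations of an MDN.

    a: activation vector of length K + K + K*L
       (mixing, scale, and mean activations, in that order)
    K: number of mixture components
    L: dimensionality of the target variable t
    """
    a_pi, a_sigma, a_mu = a[:K], a[K:2*K], a[2*K:]
    a_pi = a_pi - a_pi.max()                # stabilized softmax
    pi = np.exp(a_pi) / np.exp(a_pi).sum()  # mixing coefficients sum to one
    sigma = np.exp(a_sigma)                 # scales are strictly positive
    mu = a_mu.reshape(K, L)                 # means taken directly
    return pi, sigma, mu
```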
Figure 5.21 (a) Plot of the mixing coefficients π_k(x) as a function of x for the three kernel functions in a ...
where we have used (5.148). Because a standard network trained by least squares is approximati ...

5.7. Bayesian Neural Networks
to the posterior distribution (Hinton and van Camp, 1993) and also using a full-covariance Gaussian (Bar ...
form

\[
\ln p(\mathbf{w} \mid D) = -\frac{\alpha}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w}
- \frac{\beta}{2}\sum_{n=1}^{N}\{y(\mathbf{x}_n, \mathbf{w}) - t_n\}^{2} + \mathrm{const}
\qquad (5.165)
\]

which corresponds to a regul ...
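For illustration, the regularized error implied by (5.165) can be evaluated as follows (a sketch of ours, with y standing in for the network function y(x, w)):

```python
# Negative log posterior up to an additive constant:
# 0.5*alpha*w^T w + 0.5*beta * sum_n (y(x_n, w) - t_n)^2
import numpy as np

def neg_log_posterior(w, X, t, y, alpha, beta):
    """alpha: prior precision; beta: noise precision; y: callable y(x, w)."""
    residuals = np.array([y(x, w) for x in X]) - t
    return 0.5 * alpha * w @ w + 0.5 * beta * residuals @ residuals
```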
where the input-dependent variance is given by

\[
\sigma^{2}(\mathbf{x}) = \beta^{-1} + \mathbf{g}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{g}.
\qquad (5.173)
\]

We see that the predictive ...
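Evaluating (5.173) is straightforward once the gradient g of the network output with respect to the weights and the Hessian A of the negative log posterior are available; the sketch below (ours) solves Az = g rather than forming the inverse explicitly:

```python
import numpy as np

def predictive_variance(g, A, beta):
    z = np.linalg.solve(A, g)      # z = A^{-1} g
    return 1.0 / beta + g @ z      # beta^{-1} + g^T A^{-1} g
```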