REFERENCES
Nag, R., K. Wong, and F. Fallside (1986). Script recognition using hidden Markov models. In ICASSP86, pp. 2071–2074. IEEE.
Neal, R. M. (1993). Probabilistic inference using
Markov chain Monte Carlo methods. Technical
Report CRG-TR-93-1, Department of Computer
Science, University of Toronto, Canada.
Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer. Lecture Notes in Statistics 118.
Neal, R. M. (1997). Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report 9702, Department of Statistics, University of Toronto.
Neal, R. M. (1999). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In M. I. Jordan (Ed.), Learning in Graphical Models, pp. 205–228. MIT Press.
Neal, R. M. (2000). Markov chain sampling for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249–265.
Neal, R. M. (2003). Slice sampling. Annals of Statistics 31, 705–767.
Neal, R. M. and G. E. Hinton (1999). A new view of the EM algorithm that justifies incremental and other variants. In M. I. Jordan (Ed.), Learning in Graphical Models, pp. 355–368. MIT Press.
Nelder, J. A. and R. W. M. Wedderburn (1972). Generalized linear models. Journal of the Royal Statistical Society, A 135, 370–384.
Nilsson, N. J. (1965). Learning Machines. McGraw-Hill. Reprinted as The Mathematical Foundations of Learning Machines, Morgan Kaufmann (1990).
Nocedal, J. and S. J. Wright (1999). Numerical Optimization. Springer.
Nowlan, S. J. and G. E. Hinton (1992). Simplifying neural networks by soft weight sharing. Neural Computation 4(4), 473–493.
Ogden, R. T. (1997). Essential Wavelets for Statistical Applications and Data Analysis. Birkhäuser.
Opper, M. and O. Winther (1999). A Bayesian approach to on-line learning. In D. Saad (Ed.), On-Line Learning in Neural Networks, pp. 363–378. Cambridge University Press.
Opper, M. and O. Winther (2000a). Gaussian processes and SVM: mean field theory and leave-one-out. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans (Eds.), Advances in Large Margin Classifiers, pp. 311–326. MIT Press.
Opper, M. and O. Winther (2000b). Gaussian processes for classification. Neural Computation 12(11), 2655–2684.
Osuna, E., R. Freund, and F. Girosi (1996). Support
vector machines: training and applications. A.I.
Memo AIM-1602, MIT.
Papoulis, A. (1984). Probability, Random Variables, and Stochastic Processes (Second ed.). McGraw-Hill.
Parisi, G. (1988). Statistical Field Theory. Addison-Wesley.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Pearlmutter, B. A. (1994). Fast exact multiplication by the Hessian. Neural Computation 6(1), 147–160.
Pearlmutter, B. A. and L. C. Parra (1997). Maximum likelihood source separation: a context-sensitive generalization of ICA. In M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.), Advances in Neural Information Processing Systems, Volume 9, pp. 613–619. MIT Press.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, Sixth Series 2, 559–572.
Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola (Eds.), Advances in Kernel Methods – Support Vector Learning, pp. 185–208. MIT Press.