REFERENCES
Choudrey, R. A. and S. J. Roberts (2003). Variational mixture of Bayesian independent component analyzers. Neural Computation 15(1), 213–252.
Clifford, P. (1990). Markov random fields in statistics. In G. R. Grimmett and D. J. A. Welsh (Eds.), Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, pp. 19–32. Oxford University Press.
Collins, M., S. Dasgupta, and R. E. Schapire (2002). A generalization of principal component analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, Volume 14, pp. 617–624. MIT Press.
Comon, P., C. Jutten, and J. Herault (1991). Blind source separation, 2: problems statement. Signal Processing 24(1), 11–20.
Corduneanu, A. and C. M. Bishop (2001). Variational Bayesian model selection for mixture distributions. In T. Richardson and T. Jaakkola (Eds.), Proceedings Eighth International Conference on Artificial Intelligence and Statistics, pp. 27–34. Morgan Kaufmann.
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein (2001). Introduction to Algorithms (Second ed.). MIT Press.
Cortes, C. and V. N. Vapnik (1995). Support vector networks. Machine Learning 20, 273–297.
Cotter, N. E. (1990). The Stone-Weierstrass theorem and its application to neural networks. IEEE Transactions on Neural Networks 1(4), 290–295.
Cover, T. and P. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory IT-13(1), 21–27.
Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory. Wiley.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (1999). Probabilistic Networks and Expert Systems. Springer.
Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics 14(1), 1–13.
Cox, T. F. and M. A. A. Cox (2000). Multidimensional Scaling (Second ed.). Chapman and Hall.
Cressie, N. (1993). Statistics for Spatial Data. Wiley.
Cristianini, N. and J. Shawe-Taylor (2000). Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
Csató, L. and M. Opper (2002). Sparse on-line Gaussian processes. Neural Computation 14(3), 641–668.
Csiszár, I. and G. Tusnády (1984). Information geometry and alternating minimization procedures. Statistics and Decisions 1(1), 205–237.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 303–314.
Dawid, A. P. (1979). Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society, Series B 41(1), 1–31.
Dawid, A. P. (1980). Conditional independence for statistical operations. Annals of Statistics 8, 598–617.
de Finetti, B. (1970). Theory of Probability. Wiley and Sons.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39(1), 1–38.
Denison, D. G. T., C. C. Holmes, B. K. Mallick, and A. F. M. Smith (2002). Bayesian Methods for Nonlinear Classification and Regression. Wiley.
Diaconis, P. and L. Saloff-Coste (1998). What do we know about the Metropolis algorithm? Journal of Computer and System Sciences 57, 20–36.
Dietterich, T. G. and G. Bakiri (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2, 263–286.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987). Hybrid Monte Carlo. Physics Letters B 195(2), 216–222.
Duda, R. O. and P. E. Hart (1973). Pattern Classification and Scene Analysis. Wiley.