Understanding Machine Learning: From Theory to Algorithms


448 Index


forward greedy selection, 360
frequentist, 353
gain, 253
GD, see gradient descent
generalization error, 35
generative models, 342
Gini index, 254
Glivenko-Cantelli, 58
gradient, 158
gradient descent, 185
Gram matrix, 219
growth function, 73
halfspace, 118
    homogenous, 118, 205
    non-separable, 119
    separable, 118
Halving, 289
hidden layers, 270
Hilbert space, 217
Hoeffding’s inequality, 56, 425
hold out, 146
hypothesis, 34
hypothesis class, 36
i.i.d., 38
ID3, 252
improper, see representation independent
inductive bias, see bias
information bottleneck, 317
information gain, 254
instance, 33
instance space, 33
integral image, 143
Johnson-Lindenstrauss lemma, 329
k-means, 311, 313
    soft k-means, 352
k-median, 312
k-medoids, 312
Kendall tau, 239
kernel PCA, 326
kernels, 215
    Gaussian kernel, 220
    kernel trick, 217
    polynomial kernel, 220
    RBF kernel, 220
label, 33
Lasso, 365, 386
    generalization bounds, 386
latent variables, 348
LDA, 347
Ldim, 290, 291
learning curves, 153
least squares, 124
likelihood ratio, 348
linear discriminant analysis, see LDA
linear predictor, 117
    homogenous, 118
linear programming, 119
linear regression, 122
linkage, 310
Lipschitzness, 160, 176, 191
    sub-gradient, 190
Littlestone dimension, see Ldim
local minimum, 158
logistic regression, 126
loss, 35
loss function, 48
    0-1 loss, 48, 167
    absolute value loss, 124, 128, 166
    convex loss, 163
    generalized hinge-loss, 233
    hinge loss, 167
    Lipschitz loss, 166
    log-loss, 345
    logistic loss, 127
    ramp loss, 209
    smooth loss, 166
    square loss, 48
    surrogate loss, 167, 302
margin, 203
Markov’s inequality, 422
Massart lemma, 380
max linkage, 310
maximum a-posteriori, 355
maximum likelihood, 343
McDiarmid’s inequality, 378
MDL, 89, 90, 251
measure concentration, 55, 422
Minimum Description Length, see MDL
mistake bound, 288
mixture of Gaussians, 348
model selection, 144, 147
multiclass, 47, 227, 402
    cost-sensitive, 232
    linear predictors, 230, 405
    multi-vector, 231, 406
    Perceptron, 248
    reductions, 227, 405
    SGD, 235
    SVM, 234
multivariate performance measures, 243
Naive Bayes, 347
Natarajan dimension, 402
NDCG, 239
Nearest Neighbor, 258
    k-NN, 258
neural networks, 268
    feedforward networks, 269
    layered networks, 269
    SGD, 277
no-free-lunch, 61
non-uniform learning, 84