Understanding Machine Learning: From Theory to Algorithms
19.2 Analysis

Proof  Since $L_{\mathcal{D}}(h_S) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathbb{1}_{[h_S(x)\neq y]}\big]$, we obtain that $\mathbb{E}_S[L_{\mathcal{D}}(h_S)]$ is the probability to sample a training set $S$ and an additional example $(x,y)$ such that the label of $\pi_1(x)$ is different from $y$. …
Proof  From the linearity of expectation, we can rewrite
$$\mathbb{E}_S\Bigg[\sum_{i:\,C_i\cap S=\emptyset}\mathbb{P}[C_i]\Bigg] \;=\; \sum_{i=1}^{r}\mathbb{P}[C_i]\,\mathbb{E}_S\big[\mathbb{1}_{[C_i\cap S=\emptyset]}\big].$$
…
Since the number of boxes is $r=(2/\epsilon)^d$, we get that
$$\mathbb{E}_{S,x}\big[\|x - x_{\pi_1(x)}\|\big] \;\le\; \sqrt{d}\left(\frac{2^d\,\epsilon^{-d}}{m\,e} + \epsilon\right).$$
Combining the preceding …
The exponential dependence on the dimension is known as the curse of dimensionality. As we saw, the 1-NN rule …
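The 1-NN rule discussed here is simple to state: predict the label of the training point nearest to the query. A minimal sketch in Python (the function and variable names are illustrative, not the book's):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """1-NN rule: return the label of the training point closest to x
    in Euclidean distance (the point pi_1(x) in the book's notation)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Tiny usage example with d = 2.
X_train = np.array([[0.1, 0.2], [0.9, 0.8], [0.4, 0.5]])
y_train = np.array([0, 1, 0])
print(nn_predict(X_train, y_train, np.array([0.85, 0.75])))  # -> 1
```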
… is consistent (with respect to the hypothesis class of all functions from $\mathbb{R}^d$ to $\{0,1\}$). A good presentation …

19.6 Exercises
We use the notation $y \sim p$ as a shorthand for “$y$ is a Bernoulli random variable with expected value $p$.” Prove the …
W.l.o.g. assume that $p \le 1/2$. Now use Lemma 19.7 to show that
$$\mathbb{P}_{y_1,\ldots,y_j,\;y\sim p}\big[h_S(x)\neq y\big] \;\le\; \Big(1+\sqrt{\tfrac{8}{k}}\Big)\,\mathbb{P}_{y\sim p}\big[\,\cdots\,\big] …$$
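The quantity being bounded can be simulated directly. In the sketch below the setup and the final comparison line are assumptions of the example: the $k$ nearest labels are taken i.i.d. Bernoulli($p$), $h_S(x)$ is their majority vote, and the bound's right-hand side is taken as $(1+\sqrt{8/k})\,p$ for $p \le 1/2$, which is the flavor of bound the exercise asks for:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_error(p, k, trials=100_000):
    """Estimate P[h_S(x) != y] when the labels of the k nearest
    neighbors are i.i.d. Bernoulli(p) and h_S(x) is their majority vote."""
    neighbor_labels = rng.random((trials, k)) < p    # k labels per trial
    h = neighbor_labels.sum(axis=1) > k / 2          # majority vote (k odd)
    y = rng.random(trials) < p                       # fresh test label y ~ p
    return np.mean(h != y)

p, k = 0.3, 9
print(f"estimated P[h_S(x) != y] = {majority_vote_error(p, k):.3f}")
print(f"(1 + sqrt(8/k)) * p      = {(1 + np.sqrt(8 / k)) * p:.3f}")
```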
20 Neural Networks

An artificial neural network is a model of computation inspired by the structure of neural networks in the brain. …
… hope it will find a reasonable solution (as happens to be the case in several practical tasks). …

20.1 Feedforward Neural Networks
… $a_{t+1,j}(x) = \sum_{r:\,(v_{t,r},v_{t+1,j})\in E} w\big((v_{t,r},v_{t+1,j})\big)\,o_{t,r}(x)$ and $o_{t+1,j}(x) = \sigma\big(a_{t+1,j}(x)\big)$. That is, the input to $v_{t+1,j}$ is a weighted sum of the outputs of the neurons in $V_t$ that are connected to it, where the weighting is according to $w$, and its output is the activation function $\sigma$ applied to that input. …
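As a concrete illustration of this layer-by-layer computation, here is a minimal forward-pass sketch in Python. The dense weight matrices, the sigmoid activation, and the omission of the book's constant (bias) neurons are assumptions of the example; the book's definition allows arbitrary edge sets and activations:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(weights, x, sigma=sigmoid):
    """Compute the network output layer by layer.
    weights[t] maps layer t's outputs to layer t+1's inputs:
    a_{t+1} = weights[t] @ o_t,  o_{t+1} = sigma(a_{t+1})."""
    o = x
    for W in weights:
        a = W @ o          # weighted sum of the previous layer's outputs
        o = sigma(a)       # elementwise activation
    return o

# Usage: a network with layers of sizes 3 -> 4 -> 1.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
print(forward(weights, np.array([0.5, -1.0, 2.0])))
```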
… That is, the parameters specifying a hypothesis in the hypothesis class are the weights over the edges of the network. …

20.3 The Expressive Power of Neural Networks
… the functions $g_i(x)$, and therefore can be written as
$$f(x) = \operatorname{sign}\Bigg(\sum_{i=1}^{k} g_i(x) + k - 1\Bigg),$$
which concludes our proof. …
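The formula implements a disjunction over $\pm 1$-valued functions: $\operatorname{sign}\big(\sum_i g_i(x) + k - 1\big)$ equals $+1$ exactly when at least one $g_i(x) = +1$. A quick exhaustive check in Python (an illustrative sketch, with arbitrary $\pm 1$ values standing in for the $g_i(x)$):

```python
import numpy as np
from itertools import product

def disjunction(g_values):
    """sign(sum g_i + k - 1): +1 iff at least one g_i is +1."""
    k = len(g_values)
    return np.sign(sum(g_values) + k - 1)

# Exhaustive check against a plain OR over all +/-1 patterns for k = 3.
for g in product([-1, 1], repeat=3):
    assert disjunction(g) == (1 if 1 in g else -1)
print("sign(sum g_i + k - 1) = OR(g_1,...,g_k) verified for k = 3")
```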
… implement conjunctions, disjunctions, and negation of their inputs. Circuit complexity …
Let us start with a depth 2 network, namely, a network with a single hidden layer. Each neuron in the hidden layer …
20.4 The Sample Complexity of Neural Networks

Proof  To simplify the notation throughout the proof, let us denote the hypothesis class …
… we only consider networks in which the weights have a short representation as floating point numbers with $O(1)$ bits. …
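To see why this discretization controls the sample complexity, a back-of-the-envelope count (a sketch of the standard finite-class argument, with constants elided) goes as follows:

```latex
% Each of the |E| weights takes at most 2^{O(1)} distinct values, so
\[
  |\mathcal{H}| \le \big(2^{O(1)}\big)^{|E|} = 2^{O(|E|)}
  \quad\Longrightarrow\quad
  \mathrm{VCdim}(\mathcal{H}) \le \log_2 |\mathcal{H}| = O(|E|),
\]
% hence the sample complexity of such networks scales with the number
% of edges in the underlying graph.
```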
20.6 SGD and Backpropagation

… hope it will find a reasonable solution (as happens to be the case in several practical tasks). …
SGD for Neural Networks

parameters:
    number of iterations $\tau$
    step size sequence $\eta_1, \eta_2, \ldots, \eta_\tau$
    regularization parameter $\lambda > 0$
…
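A minimal runnable sketch of this loop in Python. The update $w^{(i+1)} = w^{(i)} - \eta_i(v_i + \lambda w^{(i)})$ is consistent with the listed regularization parameter; the toy least-squares objective, the step-size schedule, and names such as `sgd` and `stochastic_grad` are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad, dim, tau=1000, eta=lambda i: 0.1 / np.sqrt(i), lam=0.01):
    """Regularized SGD: w <- w - eta_i * (v_i + lam * w), where v_i is a
    stochastic gradient at w (e.g. backpropagation on one example)."""
    w = rng.standard_normal(dim) * 0.1   # random initialization
    for i in range(1, tau + 1):
        v = grad(w)
        w = w - eta(i) * (v + lam * w)   # the update rule from the box
    return w

# Usage on a toy least-squares objective: sample one (x, y) per step.
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def stochastic_grad(w):
    j = rng.integers(len(X))             # sample one example
    return (X[j] @ w - y[j]) * X[j]      # gradient of 0.5*(x.w - y)^2

print(sgd(stochastic_grad, dim=3))       # approaches w_true (with shrinkage)
```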
Explaining How Backpropagation Calculates the Gradient:  We next explain how the backpropagation algorithm …
Next, we discuss how to calculate the partial derivatives with respect to the edges from $V_{t-1}$ to $V_t$, namely, …
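To make this layer-by-layer computation of partial derivatives concrete, here is a backpropagation sketch in Python for the dense sigmoid network of the earlier forward-pass sketch. Squared loss, the absence of bias neurons, and all names are assumptions of the example, and a finite-difference check guards the arithmetic:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(weights, x, y):
    """Return the partial derivatives of 0.5*||o_T - y||^2 with respect
    to each weight matrix, computed layer by layer (backpropagation)."""
    outputs = [x]                        # forward pass: store each o_t
    for W in weights:
        outputs.append(sigmoid(W @ outputs[-1]))
    grads = [None] * len(weights)
    # delta holds dLoss/da_t for the current layer; sigmoid' = o*(1-o).
    delta = (outputs[-1] - y) * outputs[-1] * (1 - outputs[-1])
    for t in range(len(weights) - 1, -1, -1):
        grads[t] = np.outer(delta, outputs[t])        # dLoss/dW_t
        if t > 0:
            o = outputs[t]
            delta = (weights[t].T @ delta) * o * (1 - o)
    return grads

# Gradient check against a finite-difference approximation.
rng = np.random.default_rng(1)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
x, y = rng.standard_normal(3), np.array([0.0, 1.0])

def loss(ws):
    o = x
    for W in ws:
        o = sigmoid(W @ o)
    return 0.5 * np.sum((o - y) ** 2)

g = backprop(weights, x, y)
eps = 1e-6
W0 = weights[0].copy()
weights[0][0, 0] += eps
numeric = (loss(weights) - loss([W0, weights[1]])) / eps
print(np.isclose(g[0][0, 0], numeric, atol=1e-4))  # -> True
```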