Understanding Machine Learning: From Theory to Algorithms
19.2 Analysis

Proof  Since $L_{\mathcal{D}}(h_S) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathbb{1}_{[h_S(x)\neq y]}\big]$, we obtain that $\mathbb{E}_S[L_{\mathcal{D}}(h_S)]$ is the probability to sample a training set $S$ and an additional example $(x,y)$ such that the label of $\pi_1(x)$ is different from $y$. …
Proof  From the linearity of expectation, we can rewrite
$$\mathbb{E}_S\Bigg[\sum_{i:\,C_i\cap S=\emptyset}\mathbb{P}[C_i]\Bigg] \;=\; \sum_{i=1}^{r}\mathbb{P}[C_i]\,\mathbb{E}_S\big[\mathbb{1}_{[C_i\cap S=\emptyset]}\big].$$
…
Since the number of boxes is $r=(2/\epsilon)^d$, we get that
$$\mathbb{E}_{S,x}\big[\|x - x_{\pi_1(x)}\|\big] \;\le\; \sqrt{d}\left(\frac{2^d\,\epsilon^{-d}}{m\,e} + \epsilon\right).$$
Combining the preceding …
The exponential dependence on the dimension is known as the curse of dimensionality. As we saw, the 1-NN rule …
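The 1-NN rule discussed here is simple to state: predict the label of the training point nearest to the query. A minimal sketch in Python (the function and variable names are illustrative, not the book's):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """1-NN rule: return the label of the training point closest to x
    in Euclidean distance (the point pi_1(x) in the book's notation)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Tiny usage example with d = 2.
X_train = np.array([[0.1, 0.2], [0.9, 0.8], [0.4, 0.5]])
y_train = np.array([0, 1, 0])
print(nn_predict(X_train, y_train, np.array([0.85, 0.75])))  # -> 1
```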
… is consistent (with respect to the hypothesis class of all functions from $\mathbb{R}^d$ to $\{0,1\}$). A good presentation …

19.6 Exercises
We use the notation $y \sim p$ as a shorthand for “$y$ is a Bernoulli random variable with expected value $p$.” Prove the …
W.l.o.g. assume that $p \le 1/2$. Now use Lemma 19.7 to show that
$$\mathbb{P}_{y_1,\ldots,y_j,\;y\sim p}\big[h_S(x)\neq y\big] \;\le\; \Big(1+\sqrt{\tfrac{8}{k}}\Big)\,\mathbb{P}_{y\sim p}\big[\,\cdots\,\big] …$$
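The quantity being bounded can be simulated directly. In the sketch below the setup and the final comparison line are assumptions of the example: the $k$ nearest labels are taken i.i.d. Bernoulli($p$), $h_S(x)$ is their majority vote, and the bound's right-hand side is taken as $(1+\sqrt{8/k})\,p$ for $p \le 1/2$, which is the flavor of bound the exercise asks for:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_error(p, k, trials=100_000):
    """Estimate P[h_S(x) != y] when the labels of the k nearest
    neighbors are i.i.d. Bernoulli(p) and h_S(x) is their majority vote."""
    neighbor_labels = rng.random((trials, k)) < p    # k labels per trial
    h = neighbor_labels.sum(axis=1) > k / 2          # majority vote (k odd)
    y = rng.random(trials) < p                       # fresh test label y ~ p
    return np.mean(h != y)

p, k = 0.3, 9
print(f"estimated P[h_S(x) != y] = {majority_vote_error(p, k):.3f}")
print(f"(1 + sqrt(8/k)) * p      = {(1 + np.sqrt(8 / k)) * p:.3f}")
```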
20 Neural Networks

An artificial neural network is a model of computation inspired by the structure of neural networks in the brain. …
… hope it will find a reasonable solution (as happens to be the case in several practical tasks). …

20.1 Feedforward Neural Networks
… $a_{t+1,j}(x) = \sum_{r:\,(v_{t,r},v_{t+1,j})\in E} w\big((v_{t,r},v_{t+1,j})\big)\,o_{t,r}(x)$ and $o_{t+1,j}(x) = \sigma\big(a_{t+1,j}(x)\big)$. That is, the input to $v_{t+1,j}$ is a weighted sum of the outputs of the neurons in $V_t$ that are connected to it, where the weighting is according to $w$, and its output is the activation function $\sigma$ applied to that input. …
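As a concrete illustration of this layer-by-layer computation, here is a minimal forward-pass sketch in Python. The dense weight matrices, the sigmoid activation, and the omission of the book's constant (bias) neurons are assumptions of the example; the book's definition allows arbitrary edge sets and activations:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(weights, x, sigma=sigmoid):
    """Compute the network output layer by layer.
    weights[t] maps layer t's outputs to layer t+1's inputs:
    a_{t+1} = weights[t] @ o_t,  o_{t+1} = sigma(a_{t+1})."""
    o = x
    for W in weights:
        a = W @ o          # weighted sum of the previous layer's outputs
        o = sigma(a)       # elementwise activation
    return o

# Usage: a network with layers of sizes 3 -> 4 -> 1.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
print(forward(weights, np.array([0.5, -1.0, 2.0])))
```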
… That is, the parameters specifying a hypothesis in the hypothesis class are the weights over the edges of the network. …

20.3 The Expressive Power of Neural Networks
… the functions $g_i(x)$, and therefore can be written as
$$f(x) = \operatorname{sign}\Bigg(\sum_{i=1}^{k} g_i(x) + k - 1\Bigg),$$
which concludes our proof. …
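The formula implements a disjunction over $\pm 1$-valued functions: $\operatorname{sign}\big(\sum_i g_i(x) + k - 1\big)$ equals $+1$ exactly when at least one $g_i(x) = +1$. A quick exhaustive check in Python (an illustrative sketch, with arbitrary $\pm 1$ values standing in for the $g_i(x)$):

```python
import numpy as np
from itertools import product

def disjunction(g_values):
    """sign(sum g_i + k - 1): +1 iff at least one g_i is +1."""
    k = len(g_values)
    return np.sign(sum(g_values) + k - 1)

# Exhaustive check against a plain OR over all +/-1 patterns for k = 3.
for g in product([-1, 1], repeat=3):
    assert disjunction(g) == (1 if 1 in g else -1)
print("sign(sum g_i + k - 1) = OR(g_1,...,g_k) verified for k = 3")
```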
… implement conjunctions, disjunctions, and negation of their inputs. Circuit complexity …
Let us start with a depth 2 network, namely, a network with a single hidden layer. Each neuron in the hidden layer …
20.4 The Sample Complexity of Neural Networks

Proof  To simplify the notation throughout the proof, let us denote the hypothesis class …
… we only consider networks in which the weights have a short representation as floating point numbers with $O(1)$ bits. …
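To see why this discretization controls the sample complexity, a back-of-the-envelope count (a sketch of the standard finite-class argument, with constants elided) goes as follows:

```latex
% Each of the |E| weights takes at most 2^{O(1)} distinct values, so
\[
  |\mathcal{H}| \le \big(2^{O(1)}\big)^{|E|} = 2^{O(|E|)}
  \quad\Longrightarrow\quad
  \mathrm{VCdim}(\mathcal{H}) \le \log_2 |\mathcal{H}| = O(|E|),
\]
% hence the sample complexity of such networks scales with the number
% of edges in the underlying graph.
```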
20.6 SGD and Backpropagation

… hope it will find a reasonable solution (as happens to be the case in several practical tasks). …
SGD for Neural Networks

parameters:
    number of iterations $\tau$
    step size sequence $\eta_1, \eta_2, \ldots, \eta_\tau$
    regularization parameter $\lambda > 0$
…
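A minimal runnable sketch of this loop in Python. The update $w^{(i+1)} = w^{(i)} - \eta_i(v_i + \lambda w^{(i)})$ is consistent with the listed regularization parameter; the toy least-squares objective, the step-size schedule, and names such as `sgd` and `stochastic_grad` are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad, dim, tau=1000, eta=lambda i: 0.1 / np.sqrt(i), lam=0.01):
    """Regularized SGD: w <- w - eta_i * (v_i + lam * w), where v_i is a
    stochastic gradient at w (e.g. backpropagation on one example)."""
    w = rng.standard_normal(dim) * 0.1   # random initialization
    for i in range(1, tau + 1):
        v = grad(w)
        w = w - eta(i) * (v + lam * w)   # the update rule from the box
    return w

# Usage on a toy least-squares objective: sample one (x, y) per step.
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def stochastic_grad(w):
    j = rng.integers(len(X))             # sample one example
    return (X[j] @ w - y[j]) * X[j]      # gradient of 0.5*(x.w - y)^2

print(sgd(stochastic_grad, dim=3))       # approaches w_true (with shrinkage)
```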
Explaining How Backpropagation Calculates the Gradient:  We next explain how the backpropagation algorithm …
Next, we discuss how to calculate the partial derivatives with respect to the edges from $V_{t-1}$ to $V_t$, namely, …
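To make this layer-by-layer computation of partial derivatives concrete, here is a backpropagation sketch in Python for the dense sigmoid network of the earlier forward-pass sketch. Squared loss, the absence of bias neurons, and all names are assumptions of the example, and a finite-difference check guards the arithmetic:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(weights, x, y):
    """Return the partial derivatives of 0.5*||o_T - y||^2 with respect
    to each weight matrix, computed layer by layer (backpropagation)."""
    outputs = [x]                        # forward pass: store each o_t
    for W in weights:
        outputs.append(sigmoid(W @ outputs[-1]))
    grads = [None] * len(weights)
    # delta holds dLoss/da_t for the current layer; sigmoid' = o*(1-o).
    delta = (outputs[-1] - y) * outputs[-1] * (1 - outputs[-1])
    for t in range(len(weights) - 1, -1, -1):
        grads[t] = np.outer(delta, outputs[t])        # dLoss/dW_t
        if t > 0:
            o = outputs[t]
            delta = (weights[t].T @ delta) * o * (1 - o)
    return grads

# Gradient check against a finite-difference approximation.
rng = np.random.default_rng(1)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
x, y = rng.standard_normal(3), np.array([0.0, 1.0])

def loss(ws):
    o = x
    for W in ws:
        o = sigmoid(W @ o)
    return 0.5 * np.sum((o - y) ** 2)

g = backprop(weights, x, y)
eps = 1e-6
W0 = weights[0].copy()
weights[0][0, 0] += eps
numeric = (loss(weights) - loss([W0, weights[1]])) / eps
print(np.isclose(g[0][0, 0], numeric, atol=1e-4))  # -> True
```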