Understanding Machine Learning: From Theory to Algorithms

1.6 Notation 27

Chapter 30.

Chapters 12, 13.

Chapter 14.

Chapter 8.

Chapter 17.

Chapter 29.

Chapter 19.

Chapter 20.

Chapter 21.

1.6 Notation

Most of the notation we use throughout the book is either standard or defined on the spot. In this section we describe our main conventions and provide a table summarizing our notation (Table 1.1). The reader is encouraged to skip this section and return to it if during the reading of the book some notation is unclear. We denote scalars and abstract objects with lowercase letters (e.g.xandλ). Often, we would like to emphasize that some object is a vector and then we use boldface letters (e.g.xandλ). Theith element of a vectorxis denoted byxi. We use uppercase letters to denote matrices, sets, and sequences. The meaning should be clear from the context. As we will see momentarily, the input of a learning algorithm is a sequence of training examples. We denote byzan abstract example and byS=z 1 ,...,zma sequence ofmexamples. Historically, Sis often referred to as a trainingset; however, we will always assume thatSis asequencerather than a set. A sequence ofmvectors is denoted byx 1 ,...,xm. Theith element ofxtis denoted byxt,i. Throughout the book, we make use of basic notions from probability. We denote byDa distribution over some set,^2 for example,Z. We use the notation z∼ Dto denote thatzis sampled according toD. Given a random variable f:Z→R, its expected value is denoted byEz∼D[f(z)]. We sometimes use the shorthandE[f] when the dependence onzis clear from the context. Forf:Z→ {true,false}we also usePz∼D[f(z)] to denoteD({z:f(z) =true}). In the next chapter we will also introduce the notationDmto denote the probability overZminduced by sampling (z 1 ,...,zm) where each pointziis sampled from Dindependently of the other points. In general, we have made an effort to avoid asymptotic notation. However, we occasionally use it to clarify the main results. In particular, givenf:R→R+ andg:R→R+we writef=O(g) if there existx 0 ,α∈R+such that for all x > x 0 we havef(x)≤αg(x). We writef=o(g) if for everyα >0 there exists

(^2) To be mathematically precise,Dshould be defined over someσ-algebra of subsets ofZ.
The user who is not familiar with measure theory can skip the few footnotes and remarks
regarding more formal measurability definitions and assumptions.

Understanding Machine Learning: From Theory to Algorithms

1.6 Notation

Get our desktop app

Company

Features

Documentation

Resources