1 2.0 3.3 4.0
2 3.3 2.0 3.7
3 4.0 4.3 4.0
4 2.3 0.0 3.3
5 2.3 1.0 3.3
6 3.3 3.7 4.0
Lacking a header for the data, R named the columns V1, V2, and V3. Row
numbers appear on the left. As you might be thinking, it would be better
to have a header in our data file, with meaningful names like Exam1. In later
examples, we will usually specify names.
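As a minimal sketch of what a header buys us, the following reads the six rows shown above from an in-memory string that begins with a header line. The column names Exam2 and Quiz are assumptions made for illustration (only Exam1 is suggested in the text), and the variable names txt and named are hypothetical.

```r
# Hypothetical file contents: the six rows shown above, preceded by a header line.
# Exam2 and Quiz are assumed names; only Exam1 is suggested in the text.
txt <- "Exam1 Exam2 Quiz
2.0 3.3 4.0
3.3 2.0 3.7
4.0 4.3 4.0
2.3 0.0 3.3
2.3 1.0 3.3
3.3 3.7 4.0"

# header = TRUE tells read.table() to take column names from the first line
named <- read.table(text = txt, header = TRUE)
names(named)

# Names can also be assigned after the fact to a frame read without a header
unnamed <- read.table(text = txt, header = FALSE, skip = 1)
names(unnamed) <- c("Exam1", "Exam2", "Quiz")
```

With names in place, a column can be referred to as named$Exam1 rather than by position.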
Let’s try to predict the exam 2 score (given in the second column of
examsquiz) from exam 1 (first column):
lma <- lm(examsquiz[,2] ~ examsquiz[,1])
The lm() (for linear model) function call here instructs R to fit this prediction
equation:
$$\text{predicted Exam 2} = \beta_0 + \beta_1 \cdot \text{Exam 1}$$
Here, $\beta_0$ and $\beta_1$ are constants to be estimated from our data. In other
words, we are fitting a straight line to the (exam 1, exam 2) pairs in our
data. This is done through a classic least-squares method. (Don’t worry if
you don’t have background in this.)
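For readers who would like to see what least squares amounts to here, the sketch below computes the slope and intercept by hand with the standard closed-form expressions for a single predictor and checks that lm() arrives at the same values. It uses only the six rows displayed earlier, not the full data set, so the fitted numbers will differ from those obtained from the complete file. The names x, y, b0, and b1 are introduced for illustration.

```r
# Only the six rows displayed earlier; the full data file has more rows.
examsquiz <- data.frame(
  V1 = c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3),
  V2 = c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7),
  V3 = c(4.0, 3.7, 4.0, 3.3, 3.3, 4.0)
)

x <- examsquiz[, 1]  # exam 1 scores (predictor)
y <- examsquiz[, 2]  # exam 2 scores (response)

# Closed-form least-squares estimates for one predictor
b1 <- cov(x, y) / var(x)      # slope
b0 <- mean(y) - b1 * mean(x)  # intercept

# lm() should produce the same two numbers
lma <- lm(y ~ x)
all.equal(unname(coef(lma)), c(b0, b1))
```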
Note that the exam 1 scores, which are stored in the first column of our
data frame, are collectively referred to as examsquiz[,1]. Omission of the first
subscript (the row number) means that we are referring to an entire column
of the frame. The exam 2 scores are similarly referenced. So, our call to lm()
above predicts the second column of examsquiz from the first.
We also could have written
lma <- lm(examsquiz$V2 ~ examsquiz$V1)
recalling that a data frame is just a list whose elements are vectors. Here, the
columns are the V1, V2, and V3 components of the list.
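A quick check, again on just the six rows displayed earlier, suggests that these ways of naming a column really do refer to the same vector. The double-bracket form examsquiz[[1]] is not used in the text above; it is included only because it is the usual list-style way to extract a component.

```r
# Only the six rows displayed earlier; the full data file has more rows.
examsquiz <- data.frame(
  V1 = c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3),
  V2 = c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7),
  V3 = c(4.0, 3.7, 4.0, 3.3, 3.3, 4.0)
)

is.list(examsquiz)                        # a data frame is a list
identical(examsquiz[, 1], examsquiz$V1)   # matrix-style vs. component name
identical(examsquiz[[1]], examsquiz$V1)   # list-style vs. component name
```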
The results returned by lm() are now in an object that we’ve stored in
the variable lma. It is an instance of the class lm. We can list its components
by calling attributes():
attributes(lma)
$names
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
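Because the object is a list with a class attribute, its components can be pulled out with the $ operator. The sketch below refits the model on only the six rows displayed earlier (so the values will not match a fit on the full data) and inspects a few of the components named in the listing above.

```r
# Only the six rows displayed earlier; the full data file has more rows.
examsquiz <- data.frame(
  V1 = c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3),
  V2 = c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7),
  V3 = c(4.0, 3.7, 4.0, 3.3, 3.3, 4.0)
)
lma <- lm(examsquiz[, 2] ~ examsquiz[, 1])

class(lma)              # the class attribute
names(lma)              # the component names
lma$coefficients        # estimated intercept and slope
length(lma$residuals)   # one residual per row of data

# Fitted values plus residuals reconstruct the observed exam 2 scores
all.equal(unname(lma$fitted.values + lma$residuals), examsquiz[, 2])
```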