For instance, with our previous example, we can use lapply as follows:
> d
  kids ages
1 Jack   12
2 Jill   10
> dl <- lapply(d,sort)
> dl
$kids
[1] "Jack" "Jill"

$ages
[1] 10 12
So, dl is a list consisting of two vectors, the sorted versions of kids
and ages.
Note that dl is just a list, not a data frame. We could coerce it to a data
frame, like this:
> as.data.frame(dl)
  kids ages
1 Jack   10
2 Jill   12
But this would make no sense, as the correspondence between names
and ages has been lost. Jack, for instance, is now listed as 10 years old
instead of 12. (But if we wished to sort the data frame with respect to one
of the columns, preserving the correspondences, we could follow the
approach presented on page 135.)
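As a minimal sketch of sorting while keeping each row intact, one common base-R idiom is to index the rows with order(). This is an illustration only and may not be exactly the approach referred to above; the data frame d is rebuilt here so that the snippet is self-contained, and the name d_sorted is introduced just for this example.

```r
# Rebuild the small example data frame (names kept as character strings)
d <- data.frame(kids = c("Jack", "Jill"), ages = c(12, 10),
                stringsAsFactors = FALSE)

# order() returns the row permutation that sorts ages in increasing order;
# indexing the rows with it moves whole rows, so each name stays with its age
d_sorted <- d[order(d$ages), ]
d_sorted
```

Here Jill, who is 10, moves to the first row, and Jack is still listed as 12, in contrast to the result of applying sort() to each column separately.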
5.4.2 Extended Example: Applying Logistic Regression Models
Let’s run a logistic regression model on the abalone data we used in
Section 2.9.2, predicting gender from each of the other eight variables
(height, weight, rings, and so on), one at a time.
The logistic model is used to predict a 0- or 1-valued random variable Y
from one or more explanatory variables. The function value is the
probability that Y = 1, given the explanatory variables. Let’s say we have
just one of the latter, X. Then the model is as follows:
$$\Pr(Y = 1 \mid X = t) = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 t)]}$$
As with linear regression models, the β_i values are estimated from the
data, using the function glm() with the argument family=binomial.
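To make the connection between the formula and glm() concrete, here is a small sketch on simulated data. The data, the "true" coefficients, and the names x, y, fit, b, and t_vals are all assumptions made for illustration; this is not the abalone data. It fits the one-predictor model and then checks that plugging the estimated coefficients into the logistic formula gives the same probabilities as predict().

```r
# Simulated data (an assumption for illustration; not the abalone data)
set.seed(1)
n <- 200
x <- rnorm(n)
# "True" coefficients chosen arbitrarily: beta0 = -0.5, beta1 = 2
p <- 1 / (1 + exp(-(-0.5 + 2 * x)))
y <- rbinom(n, size = 1, prob = p)

# Fit the logistic model; family = binomial uses the logit link by default
fit <- glm(y ~ x, family = binomial)
b <- coef(fit)   # b[1] estimates beta0, b[2] estimates beta1

# Estimated Pr(Y = 1 | X = t) from the formula, for a few values of t
t_vals <- c(-1, 0, 1)
p_formula <- 1 / (1 + exp(-(b[1] + b[2] * t_vals)))

# The same probabilities as computed by predict()
p_predict <- predict(fit, newdata = data.frame(x = t_vals),
                     type = "response")
```

The two vectors p_formula and p_predict should agree up to floating-point error, which confirms that the fitted coefficients are the estimated β_0 and β_1 in the model above.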