34 Dirk Geeraerts and Dirk Speelman
Table 2. The results of the multiple linear regression analysis
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.061618   0.350465   3.029  0.00281 **
missing.places  -0.005888   0.001984  -2.968  0.00341 **
lack.famil       0.740298   0.142952   5.179 5.94e-07 ***
prop.multiword   2.782169   0.428651   6.491 8.04e-10 ***
non.uniqueness   0.053341   0.007283   7.324 7.78e-12 ***
neg.affect       0.540066   0.120095   4.497 1.23e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.216 on 180 degrees of freedom
Multiple R-squared: 0.6232, Adjusted R-squared: 0.6128
F-statistic: 59.55 on 5 and 180 DF, p-value: < 2.2e-16
Before we have a closer look at the results, a number of technical remarks
need to be made; these will be relevant only for readers who are familiar
with the technical apparatus of regression analysis. First, because
the residual values are not normally distributed when heterogeneity as such
is used as the response variable, the regression analysis is based on the
logarithm of heterogeneity. Second, to avoid cases of extreme data sparseness,
we have restricted the analysis to concepts that are attested in at least
ten places. This leaves us with 186 of the original 206 concepts. Third, two
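The two preprocessing steps just described, log-transforming the response and dropping sparsely attested concepts, can be sketched as follows. The record layout and values are invented for illustration and do not reflect the authors' actual data:

```python
import math

# Hypothetical records: (concept, number of places attested, heterogeneity).
data = [
    ("concept_a", 25, 1.8),
    ("concept_b", 7, 2.4),   # attested in fewer than ten places -> dropped
    ("concept_c", 112, 0.9),
]

# Restrict the analysis to concepts attested in at least ten places,
# and use the logarithm of heterogeneity as the response variable.
kept = [(c, n, math.log(h)) for c, n, h in data if n >= 10]

print(len(kept))  # 2 of the 3 toy concepts survive the cut
```

In the actual study this filter reduces the data set from 206 to 186 concepts; the log transform is what makes the residuals approximately normal.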
interactions need to be mentioned in addition to the basic results. For one
thing, lack of familiarity enhances heterogeneity only in the case of low or
medium non-uniqueness, but it has no effect in the case of extremely high
non-uniqueness. The second interaction is similar: negative affect triggers
heterogeneity only in the case of low or medium non-uniqueness, but it has
no effect in the case of extremely high non-uniqueness. Because neither
interaction substantially influences the analysis (from either a technical
or an interpretative point of view), we consider it legitimate to simply
focus on the model without interactions in the rest of the discussion -
even though the model with the interactions is intrinsically more accurate.
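For readers who want to see how such an interaction enters a regression model, a minimal sketch follows; the predictor names echo the coefficient table above, but the values are invented:

```python
# An interaction term is simply the product of the two predictors,
# added to the design matrix as an extra column. Values are invented.
lack_famil     = [0.25, 0.5, 0.75]
non_uniqueness = [8.0, 40.0, 96.0]

interaction = [lf * nu for lf, nu in zip(lack_famil, non_uniqueness)]

# Each row of the design matrix for the model with the interaction
# then reads: [1, missing.places, lack.famil, prop.multiword,
#              non.uniqueness, neg.affect, lack.famil * non.uniqueness]
print(interaction)  # [2.0, 20.0, 72.0]
```

A significant coefficient on such a column means the effect of one predictor depends on the level of the other, which is exactly the pattern reported here: lack of familiarity and negative affect matter only at low or medium non-uniqueness.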
Fourth, we find 3 outliers and 19 influential observations in the data set.
Leaving these 22 observations out of the analysis yields a slightly better
model than the one presented in the table: we reach an adjusted R-squared
of 0.7173, and the standard error for residuals decreases slightly. However,