What Transformations to Use

Positively skewed or when the standard deviation is proportional to the mean

There are two possible transformations for positively skewed data: the square root
transformation for moderate positive skewness and the logarithmic transformation for data
with a severe positive skew. Both transformations 'pull in' the right tail of a distribution.
Skewness is affected by outliers, so check these first.
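As a rough illustration of the choice between the two transformations, the sketch below applies both to one variable and compares the skewness before and after. The data set name SCORES, the variable X and the data values are all hypothetical, invented only to show the mechanics.

```sas
/* Hypothetical data: a small positively skewed variable X (values invented) */
data scores;
   input x @@;
   sqrtx = sqrt(x);    /* square root: suggested for moderate positive skew */
   logx  = log10(x);   /* log base 10: suggested for severe positive skew (needs x > 0) */
   datalines;
1 2 2 3 3 3 4 5 7 9 14 25 60
;
run;

/* Compare the skewness of the raw and the transformed variables */
proc means data=scores n mean std skewness;
   var x sqrtx logx;
run;
```

The transformation whose skewness is closest to zero would normally be preferred, though the histogram and normal probability plot should still be inspected.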
The logarithmic transformation generally uses log10 (log to the base 10). The log10 of a
number is the power to which 10 must be raised to give that number, so log10(10)=1 and,
similarly, log10(1000)=3. Because the logarithm of zero is undefined, when there
are a number of zeros in the data set a constant of 0.5 is added to each data value.
The transformation then becomes log10(xi+0.5), where xi=original data value. In SAS
code this would be placed in a DATA step as NEWX=LOG10(OLDX+0.5);. If there
are negative values in the data, take the absolute value, |a|, of the most negative
value and add |a|+0.5 to every data value so that all values become positive, i.e.,
log10(xi+(|a|+0.5)), where xi=original data value. Log to the base e (e=2.7182...) can be
used rather than log10, as this has the same transformation effect. Switching from one base
to another only changes the scale, not the shape, of a distribution. Figure 5.12 shows a
histogram and normal probability plot for the log-transformed variable CORRD
(percentage correct in difficult reading passage). The relevant SAS code that produced this
output is:
data a;
  /* lrecl= sets the logical record length of the input file */
  infile 'a:amanda.dat' lrecl=72;
  input id 1-3 corre 57-58 vocab 67-69 corrd 70-72;
  /* add 0.5 so that zero scores can be log transformed */
  newlog=log10(corrd+0.5);
  label corrd='Percentage Correct Syntactic Score (difficult)';
run;
proc print;
  var corrd newlog;
run;
proc chart;
  vbar newlog;
  title1 'Distribution of Log Percentage Correct Syntactic Scores (DIFFICULT)';
run;
proc univariate plot normal;
  var newlog;
run;
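
The adjustment for negative values described above needs the most negative data value before the shift can be applied. One possible sketch, assuming a hypothetical data set RAW with a variable OLDX (values invented), stores that minimum in a macro variable and then uses it in a DATA step. The same step also takes natural logs to illustrate that changing the base only rescales the transformed values. The names RAW, SHIFTED, MINX, SHIFT, NEWXE and RATIO are illustrative and do not come from the original program.

```sas
/* Hypothetical data containing zero and negative values (values invented) */
data raw;
   input oldx @@;
   datalines;
-4 -1 0 2 5 9 20 48
;
run;

/* Store the most negative value in the macro variable MINX */
proc sql noprint;
   select min(oldx) into :minx from raw;
quit;

/* Assumes the minimum is negative; otherwise this shift is not needed */
data shifted;
   set raw;
   shift = abs(&minx) + 0.5;      /* |a| + 0.5, which is 4.5 for these data */
   newx  = log10(oldx + shift);   /* log to the base 10 of the shifted value */
   newxe = log(oldx + shift);     /* LOG is the natural log (base e) in SAS */
   ratio = newxe / newx;          /* same constant, ln(10), on every row */
run;

proc print data=shifted;
   var oldx shift newx newxe ratio;
run;
```

Because RATIO is the same on every row, the two sets of logged values differ only by a constant multiplier, so histograms of NEWX and NEWXE have the same shape.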