Robust estimation of style analysis coefficients 167
both in the constituent series (in X) and in the portfolio returns (in Y). The percentage
of outlier contamination was set to both 1% and 5%. We considered median
regression (θ = 0.5) and used T = 250, 500, 1000 as sample sizes. We carried out
1000 Monte Carlo runs for each simulation of the experimental set-up.
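The experimental set-up can be illustrated with a minimal sketch. It is not the code used for the study: it assumes only two constituents (the study uses five), Gaussian returns, no intercept, a hypothetical outlier shift of 10, and 200 runs instead of 1000, and all function names are illustrative. With two constituents and weights summing to one, the constrained problem reduces to a single weight w in [0, 1], so the constrained LS estimate is a clipped slope and the constrained median regression estimate is a clipped weighted median.

```python
import random
import statistics


def simulate(T, w=0.3, contam=0.0, where="none", rng=None):
    """Simulate y = w*x1 + (1-w)*x2 + e, then contaminate a share of observations.

    where: "none", "X" (constituent x1), "Y" (portfolio) or "XY" (both).
    Distributions and the outlier shift of 10 are assumptions of this sketch.
    """
    rng = rng or random.Random(0)
    x1 = [rng.gauss(0, 1) for _ in range(T)]
    x2 = [rng.gauss(0, 1) for _ in range(T)]
    y = [w * a + (1 - w) * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
    for t in rng.sample(range(T), round(contam * T)):
        if where in ("X", "XY"):
            x1[t] += 10.0
        if where in ("Y", "XY"):
            y[t] += 10.0
    return x1, x2, y


def clip01(value):
    """Project a scalar onto the unit interval."""
    return min(1.0, max(0.0, value))


def ls_weight(x1, x2, y):
    """Constrained LS: regress y - x2 on x1 - x2 (no intercept), clip to [0, 1]."""
    z = [a - b for a, b in zip(x1, x2)]
    d = [c - b for c, b in zip(y, x2)]
    slope = sum(zi * di for zi, di in zip(z, d)) / sum(zi * zi for zi in z)
    return clip01(slope)


def median_weight(x1, x2, y):
    """Constrained median regression for the same one-dimensional problem.

    sum |d_t - w*z_t| = sum |z_t| * |d_t/z_t - w|, so an unconstrained minimiser
    is the weighted median of the ratios d_t/z_t with weights |z_t|; the
    objective is convex in w, so clipping gives a constrained minimiser.
    """
    pairs = sorted(
        ((c - b) / (a - b), abs(a - b)) for a, b, c in zip(x1, x2, y) if a != b
    )
    half = sum(weight for _, weight in pairs) / 2
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return clip01(ratio)


def monte_carlo(T, where, contam=0.01, runs=200, seed=1):
    """Return the Monte Carlo standard deviations of the (LS, median) estimates."""
    rng = random.Random(seed)
    ls, qr = [], []
    for _ in range(runs):
        x1, x2, y = simulate(T, contam=contam, where=where, rng=rng)
        ls.append(ls_weight(x1, x2, y))
        qr.append(median_weight(x1, x2, y))
    return statistics.stdev(ls), statistics.stdev(qr)


if __name__ == "__main__":
    for where in ("none", "Y", "X", "XY"):
        for T in (250, 500, 1000):
            sd_ls, sd_qr = monte_carlo(T, where)
            print(f"{where:>4} T={T:4d}  sd(LS)={sd_ls:.4f}  sd(QR)={sd_qr:.4f}")
```

Running the script prints the standard deviation of each estimator for every contamination case and sample size, which is the quantity the boxplots summarise visually.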
Figure 1 depicts the impact of outlying observations on LS and QR estimators.
Each row of the panel graph refers to a portfolio constituent (i = 1, ..., 5) while
the columns show the different cases of presence of outliers: no outliers, outliers
in portfolio returns (in Y), outliers in constituent returns (in X), and outliers both in
portfolio returns and in constituent returns (in X and Y). In each panel the left boxplot
refers to the LS estimator while the right one depicts the QR estimator behaviour. As
expected, the impact of outlying observations can be very serious on LS estimates
of style coefficients, especially when considering outliers in the constituent series.
It is worth noting that the variability of the LS estimates increases considerably, and
this can have serious practical drawbacks: since the style coefficients lie in the
unit interval, highly variable estimates are of limited practical
utility. Clearly, when no outlying observations are present in the data, the LS estimates
are more efficient than quantile estimates. However, the differences between the two
distributions are not so evident. Although it is well known that quantile regression
estimators are robust only in Y [16], the simulation study shows more evidence of
robustness in the case of outliers in constituent returns (third column of panels in
Fig. 1). A possible explanation can be given by considering the presence of the
double constraint, which forces each estimated coefficient to be inside the unit interval.
However, a formal study based on the influence function of the constrained estimators
is not available at the moment. This issue is still under investigation.
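For reference, the constrained quantile regression problem can be written as below. The notation is assumed for this sketch rather than taken from the text: y_t is the portfolio return, x_t the vector of the five constituent returns, and w the vector of style coefficients. The double constraint consists of non-negativity and the sum-to-one condition, which together imply that each coefficient lies in the unit interval.

```latex
\hat{\mathbf{w}}(\theta)
  = \arg\min_{\mathbf{w}} \sum_{t=1}^{T} \rho_\theta\!\left(y_t - \mathbf{x}_t^{\top}\mathbf{w}\right)
  \quad \text{s.t.} \quad w_i \ge 0 \;\; (i = 1, \dots, 5), \qquad \sum_{i=1}^{5} w_i = 1,
\qquad
\rho_\theta(u) = u\,\bigl(\theta - \mathbb{1}\{u < 0\}\bigr).
```

For θ = 0.5 the check function reduces to ρ(u) = |u|/2, so the median regression estimator minimises the sum of absolute residuals under the same constraints.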
In order to obtain information on consistency of the constrained median estimators,
we use different values for the length of time series. Figure 2 depicts the
behaviour of the QR estimator for T = 250 (left boxplot in each panel), T = 500
(middle boxplot) and T = 1000 (right boxplot). As in the previous figure, the rows
of the plot refer to the different constituents while the columns report the different
cases treated in our simulation with respect to the presence of outlying observations.
It is evident that efficiency increases with sample size in every case.
Figures 1 and 2 are built using a percentage of outlier contamination set to 1%.
Similar patterns have been noticed for the case of 5% contamination and so the
related plots are not reported here for the sake of brevity. With 5% contamination,
as expected, the variability of the QR estimator increases,
although the difference between the two cases is very limited.
For the sake of space we do not include any results for the comparison
between the different cases of outlier contamination considered. It is nevertheless
straightforward to note that the increase in the variability of the QR estimator due to
a higher percentage of outlier contamination is counterbalanced by moving the number
of observations from T = 250 to T = 500 and then to T = 1000.