Wall time: 5.61 s
Let us separate the index data since we need it regularly:
In [4]: dax = pd.DataFrame(data.pop('^GDAXI'))
The DataFrame object data now has price data for the 30 DAX stocks:
In [5]: data[data.columns[:6]].head()
Out[5]:              ADS.DE   ALV.DE   BAS.DE  BAYN.DE   BEI.DE   BMW.DE
        Date
        2010-01-04    38.51    88.54    44.85    56.40    46.44    32.05
        2010-01-05    39.72    88.81    44.17    55.37    46.20    32.31
        2010-01-06    39.40    89.50    44.45    55.02    46.17    32.81
        2010-01-07    39.74    88.47    44.15    54.30    45.70    33.10
        2010-01-08    39.60    87.99    44.02    53.82    44.38    32.65
Applying PCA
Usually, PCA works with normalized data sets. Therefore, the following convenience
function proves helpful:
In [6]: scale_function = lambda x: (x - x.mean()) / x.std()
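To make the effect of the normalization concrete, the following minimal sketch applies the same lambda function column by column to a small, made-up DataFrame (the object toy and its values are hypothetical and not part of the DAX data set). After scaling, every column should have a mean of zero and a standard deviation of one:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the DAX data set used in the text.
toy = pd.DataFrame({
    "A": [10.0, 11.0, 12.0, 13.0],
    "B": [100.0, 90.0, 110.0, 100.0],
})

# Same convenience function as in the text: subtract the mean,
# divide by the (sample) standard deviation.
scale_function = lambda x: (x - x.mean()) / x.std()

# DataFrame.apply calls the function once per column.
scaled = toy.apply(scale_function)

col_means = scaled.mean()  # expected to be numerically zero
col_stds = scaled.std()    # expected to be numerically one
```

Since both columns end up on the same scale, a series with a high price level no longer dominates the analysis merely because of its units.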
To begin, consider a PCA with multiple components (i.e., we do not restrict the
number of components):
In [7]: pca = KernelPCA().fit(data.apply(scale_function))
The importance, or explanatory power, of each component is given by its eigenvalue.
The eigenvalues are found in the lambdas_ attribute of the KernelPCA object. Without a
restriction, the analysis gives too many components:
In [8]: len(pca.lambdas_)
Out[8]: 655
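The role of the eigenvalues can be illustrated without KernelPCA. The following is a minimal sketch of ordinary (linear) PCA via an eigendecomposition of the correlation matrix, which is a swapped-in technique and not the call used in the text. The data is synthetic (five series driven by one common factor), and all names are assumptions made for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 500 observations of 5 series that
# share one common factor plus individual noise.
factor = rng.standard_normal(500)
X = np.column_stack(
    [factor + 0.5 * rng.standard_normal(500) for _ in range(5)]
)

# Normalize each column as in the text (mean 0, sample std 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance of the normalized data, i.e., the correlation matrix.
corr = np.cov(Z, rowvar=False)

# eigvalsh returns the eigenvalues in ascending order; reverse them
# so that the most important component comes first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
```

In this linear setting there is one eigenvalue per input series, and the eigenvalues add up to the number of series. A large first eigenvalue indicates that a single component captures most of the common movement.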
Therefore, let us only have a look at the first 10 components. The tenth component already
has almost negligible influence:
In [9]: pca.lambdas_[:10].round()
Out[9]: array([ 22816.,   6559.,   2535.,   1558.,    697.,    442.,    378.,
                  255.,    183.,    151.])
We are mainly interested in the relative importance of each component, so we will
normalize these values. Again, we use a convenience function for this:
In [10]: get_we = lambda x: x / x.sum()
In [11]: get_we(pca.lambdas_)[:10]
Out[11]: array([ 0.6295725 ,  0.1809903 ,  0.06995609,  0.04300101,  0.01923256,
                 0.01218984,  0.01044098,  0.00704461,  0.00505794,  0.00416612])
With this information, the picture becomes much clearer. The first component already
explains about 63% of the variability in the 30 time series. The first five components
explain about 94% of the variability:
In [12]: get_we(pca.lambdas_)[:5].sum()
Out[12]: 0.94275246704834414
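The same question can be asked the other way around: how many components are needed to reach a given share of the variability? The following short sketch takes the ten normalized weights printed in Out[11] and accumulates them. The threshold of 90% is a hypothetical choice for illustration and does not come from the text:

```python
import numpy as np

# Normalized weights of the first 10 components, copied from Out[11].
weights = np.array([
    0.6295725, 0.1809903, 0.06995609, 0.04300101, 0.01923256,
    0.01218984, 0.01044098, 0.00704461, 0.00505794, 0.00416612,
])

# Running total of explained variability, component by component.
cumulative = weights.cumsum()

# Hypothetical target share (an assumption for this example).
threshold = 0.90

# Index of the first cumulative value >= threshold, plus one to turn
# the zero-based index into a number of components.
n_needed = int(np.searchsorted(cumulative, threshold)) + 1
```

Here cumulative[4] reproduces the value from Out[12], and n_needed tells us how many leading components are required for the chosen threshold.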
Constructing a PCA Index
Next, we use PCA to construct a PCA (or factor) index over time and compare it with the
original index. First, we have a PCA index with a single component only:
In [13]: pca = KernelPCA(n_components=1).fit(data.apply(scale_function))
         dax['PCA_1'] = pca.transform(-data)
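The idea of a single-component index can be sketched in a self-contained way. The example below uses ordinary linear PCA via an eigendecomposition instead of KernelPCA, and synthetic prices instead of the DAX data; the names prices, index, and pca_1 are hypothetical. It also shows that the sign of a principal component is arbitrary, which is presumably why the text applies the transformation to the negated data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days, n_stocks = 250, 6

# Synthetic prices (an assumption): a common market random walk plus
# a smaller stock-specific random walk for each series.
market = rng.standard_normal(n_days).cumsum()
prices = pd.DataFrame({
    f"S{i}": 50 + market + 0.3 * rng.standard_normal(n_days).cumsum()
    for i in range(n_stocks)
})

# Stand-in for the real index: the equally weighted average price.
index = prices.mean(axis=1)

# Normalize the columns as in the text.
scaled = prices.apply(lambda x: (x - x.mean()) / x.std())

# eigh sorts eigenvalues in ascending order, so the last eigenvector
# belongs to the largest eigenvalue (the first principal component).
eigvals, eigvecs = np.linalg.eigh(np.cov(scaled.values, rowvar=False))
first = eigvecs[:, -1]

# Project the normalized data onto the first component.
pca_1 = pd.Series(scaled.values @ first, index=prices.index)

# The sign of an eigenvector is arbitrary; flip it if necessary so
# that the PCA index moves in the same direction as the index.
if pca_1.corr(index) < 0:
    pca_1 = -pca_1

correlation = pca_1.corr(index)
```

Because the PCA index and the original index live on different scales, a meaningful comparison looks at their co-movement (e.g., the correlation) or at normalized versions of both series rather than at the raw levels.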