Nature 2020 01 30 Part.02

Article


Code availability


Source code for the distogram, reference distogram and torsion
prediction neural networks, together with the neural network weights
and input data for the CASP13 targets, are available for research and
non-commercial use at
https://github.com/deepmind/deepmind-research/tree/master/alphafold_casp13.
We make use of several open-source libraries to conduct our experiments,
particularly HHblits^36, PSI-BLAST^37 and the machine-learning framework
TensorFlow (https://github.com/tensorflow/tensorflow) along with the
TensorFlow library Sonnet (https://github.com/deepmind/sonnet), which
provides implementations of individual model components^50. We also
used Rosetta^9 under license.



  34. Dawson, N. L. et al. CATH: an expanded resource to predict protein function through
    structure and sequence. Nucleic Acids Res. 45, D289–D295 (2017).

  35. Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein
    sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).

  36. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative
    protein sequence searching by HMM–HMM alignment. Nat. Methods 9, 173–175
    (2012).

  37. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database
    search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  38. Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. Preprint at
    arXiv https://arxiv.org/abs/1511.07122 (2015).

  39. Oord, A. d. et al. Wavenet: a generative model for raw audio. Preprint at arXiv
    https://arxiv.org/abs/1609.03499 (2016).

  40. Clevert, D.-A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning
    by exponential linear units (ELUs). Preprint at arXiv https://arxiv.org/abs/1511.07289
    (2015).

  41. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a
    simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958
    (2014).

  42. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern
    recognition of hydrogen-bonded and geometrical features. Biopolymers 22,
    2577–2637 (1983).

  43. Yang, Y. et al. Sixty-five years of the long march in protein secondary structure prediction:
    the final stretch? Briefings Bioinf. 19, 482–494 (2018).

  44. Zemla, A., Venclovas, C., Moult, J. & Fidelis, K. Processing and analysis of CASP3 protein
    structure predictions. Proteins 37, 22–29 (1999).

  45. Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for
    comparing protein structures and models using distance difference tests. Bioinformatics
    29, 2722–2728 (2013).

  46. Abriata, L. A., Tamo, G. E. & Dal Peraro, M. A further leap of improvement in tertiary
    structure prediction in CASP13 prompts new routes for future assessments. Proteins 87,
    1100–1112 (2019).

  47. Kayikci, M. et al. Visualization and analysis of non-covalent contacts using the Protein
    Contacts Atlas. Nat. Struct. Mol. Biol. 25, 185–194 (2018).

  48. Croll, T. I. et al. Evaluation of template-based modeling in CASP13. Proteins 87, 1113–1127
    (2019).

  49. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th
    International Conference on Machine Learning Vol. 70, 3319–3328 (2017).

  50. Abadi, M. et al. Tensorflow: a system for large-scale machine learning. In Proc. 12th
    USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283
    (2016).

  51. Söding, J., Biegert, A. & Lupas, A. N. The HHpred interactive server for protein homology
    detection and structure prediction. Nucleic Acids Res. 33, W244–W248 (2005).

  52. Cong, Q. et al. An automatic method for CASP9 free modeling structure prediction
    assessment. Bioinformatics 27, 3371–3378 (2011).

  53. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the
    TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  54. Tovchigrechko, A., Wells, C. A. & Vakser, I. A. Docking of protein models. Protein Sci. 11,
    1888–1896 (2002).

  55. Audet, M. et al. Crystal structure of misoprostol bound to the labor inducer prostaglandin
    E2 receptor. Nat. Chem. Biol. 15, 11–17 (2019).


Acknowledgements We thank C. Meyer for assistance in preparing the paper; B. Coppin, O.
Vinyals, M. Barwinski, R. Sun, C. Elkin, P. Dolan, M. Lai and Y. Li for their contributions and
support; O. Ronneberger for reading the paper; the rest of the DeepMind team for their
support; the CASP13 organisers and the experimentalists whose structures enabled the
assessment.
Author contributions R.E., J.J., J.K., L.S., A.W.S., C.Q., T.G., A.Ž., A.B., H.P. and K.S. designed and
built the AlphaFold system with advice from D.S., K.K. and D.H. D.T.J. provided advice and
guidance on protein structure prediction methodology. S.P. contributed to software
engineering. S.C., A.W.R.N., K.K. and D.H. managed the project. J.K., A.W.S., T.G., A.Ž., A.B., R.E.,
P.K. and J.J. analysed the CASP results for the paper. A.W.S. and J.K. wrote the paper with
contributions from J.J., R.E., L.S., T.G., A.B., A.Ž., D.T.J., P.K., K.K. and D.H. A.W.S. led the team.

Competing interests A.W.S., J.K., T.G., J.J., L.S., R.E., H.P., C.Q., K.S., A.Ž. and A.B. have filed
provisional patent applications relating to machine learning for predicting protein structures.
The remaining authors declare no competing interests.
Additional information
Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-019-1923-7.
Correspondence and requests for materials should be addressed to A.W.S.
Peer review information Nature thanks Mohammed AlQuraishi and the other, anonymous,
reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.