Nature | Vol 582 | 25 June 2020 | 595

Overall, 38.4% of the identified proteins did not have any functional
annotation for the biological processes, and interestingly this was
true even for 22.9% of the 100 most highly abundant proteins of each
species at the biological-process level, and for 10% when considering
protein functional domains (Extended Data Fig. 7 and Supplementary
Table 6). Thus, our data point to a very large number of highly
expressed proteins without any functional annotation or sequence
homology to proteins with known gene ontology terms. Exploration
of this part of the ‘dark proteome’ would be attractive: these proteins
may indicate essential but unique features in the evolutionary development
of these organisms that may be of biological or biotechnological
interest.

[Figure 3, panels a–d. a: abundance-ranked proteins (%) against portion of protein mass (%); legend: minimum to maximum, median with 25–75% percentile. b: graph data model (Project, Taxonomy, Protein, Functional region, Annotation; functional annotation and homology links) for Glycine max and Vitis vinifera proteins: (i) Lon protease homologue, mitochondrial; (ii) ATP-dependent Clp protease proteolytic subunit; (iii) Lon protease homologue 2, peroxisomal; (iv) ATP-dependent Clp protease proteolytic subunit. c: log10-transformed intensity against rank for all proteins, with 'protein quality control for misfolded or incompletely synthesized proteins' highlighted (proteins i–iv). d: log10-transformed intensity against rank for enriched functions (P < 0.05, P < 0.01): (i) respiratory electron transport chain (8); (ii) protein quality control for misfolded or incompletely synthesized proteins (15); (iii) post-translational protein targeting to membrane, translocation (5); (iv) ribosomal RNA processing (63); (v) cell wall organization (58); (vi) protein phosphorylation (376); (vii) peptidyl-threonine phosphorylation (7).]

Fig. 3 | Organism-resolved integration of proteome data into a global
analysis. a, Cumulative protein intensities (ranked by abundance; x axis) and
their contribution to total protein mass (y axis) across all organisms (n = 100
organisms). b, Exemplified structure from the data model of the graph
database, illustrating the connection between two homologous proteins of
G. max and V. vinifera, and related annotations. c, All quantified proteins
from G. max are displayed, plotting their intensities against their rank in the
dynamic range. All proteins whose functions are associated with
‘protein quality control for misfolded or incompletely synthesized proteins’
are highlighted. d, Significantly enriched functions (grey circles, P < 0.05;
red circles, P < 0.01) within the proteome of G. max (with seven specific
examples) and their distribution across the dynamic range (sample sizes in
parentheses; one-sided Mann–Whitney U-test to the mean functional
expression level). Error bars represent minimum to maximum values, and
boxes show 10–90% percentiles.
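
The quantities plotted in panels a and c can be illustrated with a minimal sketch. It assumes that each organism's data reduce to a plain list of positive protein intensities; the function names and the toy values are hypothetical and are not taken from the authors' pipeline.

```python
import math


def cumulative_mass_curve(intensities):
    """Return (rank_percent, cumulative_mass_percent) pairs, most abundant first.

    Sketch of the panel a quantity: the share of total protein mass covered by
    the top-ranked proteins. Assumes intensity is a proxy for protein mass.
    """
    ranked = sorted(intensities, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    n = len(ranked)
    curve = []
    running = 0.0
    for i, value in enumerate(ranked, start=1):
        running += value
        curve.append((100.0 * i / n, 100.0 * running / total))
    return curve


def ranked_log10(intensities):
    """Return (rank, log10 intensity) pairs as in panel c, rank 1 = most abundant."""
    ranked = sorted(intensities, reverse=True)
    return [(rank, math.log10(value)) for rank, value in enumerate(ranked, start=1)]


# Hypothetical toy intensities spanning three orders of magnitude.
toy = [10.0, 1000.0, 1.0, 100.0]
curve = cumulative_mass_curve(toy)
ranks = ranked_log10(toy)
```

In this toy case the single most abundant protein (the top 25% of the ranking) already accounts for about 90% of the summed intensity, which is the kind of skew that a curve like panel a is designed to expose.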