Nature | Vol 582 | 25 June 2020 | 595
Overall, 38.4% of the identified proteins had no functional annotation
for biological processes. Notably, this was true even for 22.9% of the
100 most abundant proteins of each species at the biological-process
level, and for 10% when considering protein functional domains
(Extended Data Fig. 7 and Supplementary Table 6). Thus, our data point
to a very large number of highly expressed proteins without any
functional annotation or sequence homology to proteins with known gene
ontology terms. Exploring this part of the ‘dark proteome’ would be
worthwhile: these proteins may reflect essential but unique features in
the evolutionary development of these organisms that may be of
biological or biotechnological interest.
[Fig. 3 graphic; labels recovered from the panels are listed here.]
Panel a axes: abundance-ranked proteins (%) versus portion of protein mass (%).
Panel b node types: Project, Taxonomy, Annotation, Protein, Functional region. Edge types: functional annotation, homology. Proteins of Glycine max and Vitis vinifera shown: (i) Lon protease homologue, mitochondrial; (ii) ATP-dependent Clp protease proteolytic subunit; (iii) Lon protease homologue 2, peroxisomal; (iv) ATP-dependent Clp protease proteolytic subunit. Annotation shown: protein quality control for misfolded or incompletely synthesized proteins.
Panel c axes: rank versus log10-transformed intensity, with proteins (i) to (iv) highlighted.
Panel d functions (sample sizes in parentheses): (i) Respiratory electron transport chain (8); (ii) Protein quality control for misfolded or incompletely synthesized proteins (15); (iii) Post-translational protein targeting to membrane, translocation (5); (iv) Ribosomal RNA processing (63); (v) Cell wall organization (58); (vi) Protein phosphorylation (376); (vii) Peptidyl-threonine phosphorylation (7).
Legend entries: All proteins; P < 0.05; P < 0.01; Minimum to maximum; Median with 25–75% percentile.
Fig. 3 | Organism-resolved integration of proteome data into a global
analysis. a, Cumulative protein intensities (ranked by abundance; x axis) and
their contribution to total protein mass (y axis) across all organisms (n = 100
organisms). b, Example structure from the data model of the graph
database, illustrating the connection between two homologous proteins of
G. max and V. vinifera, and related annotations. c, All quantified proteins
from G. max are displayed, plotting their intensities against their rank in the
dynamic range. All proteins for which the functions are associated with
‘protein quality control for misfolded or incompletely synthesized proteins’
are highlighted. d, Significantly enriched functions (grey circles, P < 0.05;
red circles, P < 0.01) within the proteome of G. max (with seven specific
examples) and their distribution across the dynamic range (sample sizes in
parentheses; one-sided Mann–Whitney U-test to the mean functional
expression level). Error bars represent minimum to maximum values, and
boxes show 10–90% percentiles.
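
The curve in panel a can be illustrated with a minimal sketch, assuming that each protein has a linear (not log10-transformed) intensity and that protein mass is proportional to intensity. The function name `cumulative_mass_curve` and the example intensities are hypothetical and are not taken from the study; the sketch only shows how ranking by abundance and accumulating intensities gives the two percentage axes.

```python
from itertools import accumulate


def cumulative_mass_curve(intensities):
    """Return (rank_percent, mass_percent) pairs for abundance-ranked proteins.

    Assumption: intensities are linear-scale and proportional to protein mass.
    """
    # Rank proteins from most to least abundant.
    ranked = sorted(intensities, reverse=True)
    total = sum(ranked)
    if not ranked or total <= 0:
        raise ValueError("need at least one positive intensity")
    n = len(ranked)
    # For each rank, report the share of proteins covered so far (x axis)
    # and the share of total intensity they account for (y axis).
    return [
        (100.0 * (i + 1) / n, 100.0 * running / total)
        for i, running in enumerate(accumulate(ranked))
    ]


# Hypothetical intensities for five proteins of one organism.
example = [1000.0, 500.0, 300.0, 150.0, 50.0]
curve = cumulative_mass_curve(example)
for rank_pct, mass_pct in curve:
    print(f"top {rank_pct:5.1f}% of proteins -> {mass_pct:5.1f}% of protein mass")
```

In this toy input the single most abundant protein (the top 20% of five proteins) already accounts for half of the total intensity. Repeating the calculation per organism and summarizing the resulting curves would give the kind of spread shown across the 100 organisms.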