Nature - USA (2020-08-20)

multiplexed lanes of an Illumina HiSeq3000 (Illumina Inc.) resulting
in 18–47 million read pairs per sample.


Analysis of RNA-seq data
Raw RNA-seq reads were mapped to the C. elegans reference assem-
bly (PRJNA13758, WormBase v.WS250) by TopHat2 (v.2.0.14, default
options)^38. Expression levels were estimated as fragments per kilo-
base transcript per million mapped reads (FPKM) for each sample
individually by Cufflinks (v.2.2.1, default options)^39. FPKM values were
then transformed into z-scores using the scale function of R (nega-
tive values were converted into zeros and the maximum z-score was
set to 10). Z-scores were then used to perform hierarchical cluster-
ing (R function heatmap with scale = ‘none’ option and using single
linkage clustering algorithm). In addition, to identify differentially
expressed genes, we compared the ECR overexpressing samples against
their respective control sample with the help of the Cuffdiff program
(v.2.2.1, --library-norm-method classic-fpkm)^39. We defined genes to be
significantly differentially expressed if they showed an FDR-corrected
P < 0.1 and an absolute fold change >2 in at least two of the ECR over-
expressing samples.
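
As an illustration of the two rules described above, the following minimal Python sketch applies a clipped z-score transform and the differential expression criterion. It is not the code used in the study (which relied on R's scale function and Cuffdiff). The function names are hypothetical, and two points are assumptions: that the z-scores are computed with the sample standard deviation (as R's scale does), and that "absolute fold change >2" corresponds to |log2 fold change| > 1 on the log2 values that Cuffdiff reports.

```python
import statistics


def clipped_z_scores(values, upper=10.0):
    """Z-score a list of FPKM values, then clip the result to [0, upper].

    Uses the sample standard deviation (n - 1 denominator), as R's scale() does.
    Whether scaling was applied per gene or per sample is not stated in the text.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All values identical: every z-score is treated as zero.
        return [0.0 for _ in values]
    # Negative z-scores become 0; z-scores above `upper` are capped.
    return [min(max((v - mean) / sd, 0.0), upper) for v in values]


def is_differentially_expressed(results, fdr_cutoff=0.1, min_samples=2):
    """Apply the stated rule to one gene.

    `results` holds one (q_value, log2_fold_change) pair per ECR-overexpressing
    sample compared against its control. Assumption: an absolute fold change
    greater than 2 is read as |log2 fold change| > 1.
    """
    hits = sum(1 for q, log2_fc in results if q < fdr_cutoff and abs(log2_fc) > 1.0)
    return hits >= min_samples
```

For example, a gene with results [(0.01, 1.5), (0.05, -2.0), (0.5, 3.0)] passes in two comparisons and would be called differentially expressed under these assumptions.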


Bioinformatic analysis
To bridge the evolutionary gap between the nematode and human
proteins, a bioinformatics pipeline was built which derives consensus
sequences with iterative sequence alignment using HMMER v.3.2.1.
This algorithm was demonstrated to be more sensitive than regular
sequence alignments in detecting remote orthologues (www.hmmer.org).
The workflow was written using Python 3.5. Protein sequences
were retrieved using the UniProt API. BioPython was used for handling
of alignments produced by jackhmmer. Alignments were visualized
using ‘JavaScript Sequence Alignment Viewer’.
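
For orientation, an iterative jackhmmer search of the kind described here could be launched from Python roughly as follows. This is a sketch, not the published workflow (see Code availability for the actual source). The function name and file paths are hypothetical, and the iteration count is an assumption (five is jackhmmer's default); the options -N, -A and --tblout are standard jackhmmer options.

```python
def jackhmmer_command(query_fasta, target_fasta, alignment_out, table_out, iterations=5):
    """Build the argument list for an iterative jackhmmer search.

    -N        maximum number of search iterations
    -A        save the multiple alignment of hits from the final iteration (Stockholm)
    --tblout  save a per-sequence table of hits
    """
    return [
        "jackhmmer",
        "-N", str(iterations),
        "-A", alignment_out,
        "--tblout", table_out,
        query_fasta,   # query protein sequence(s)
        target_fasta,  # sequence database to search
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where HMMER is installed.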
Genes identified in the screen were mapped to their corresponding
UniProt entries. For each entry a consensus sequence profile was
iteratively created using the jackhmmer algorithm^40. Each consensus
profile was then used to search for human proteins and genes which
match this profile. The potential human homologues were ranked by
three criteria: first, the relative alignment length (0 to 1, where 1 means
the potential human homologue has the same length as the consensus
profile); second, gene expression in the brain (−1 if not expressed,
0 if unknown, 1 if expressed)^41; and third, the association with known neuronal
or mental diseases (sum of all scores)^42. Brain gene expression values
were taken from the Allen Human Brain Atlas^41. Association with known
diseases was taken from DisGeNET^42. Only diseases that are classified
as ‘diseases of mental health’ or ‘nervous system disease’ in the Disease
Ontology^43 were taken into consideration. For each gene, the sum of
the weighted evidence score was used. Human orthologues of ECR
candidates were selected with ≥25% alignment and a total score ≥ 1.2.
For three candidates (F11E6.3, endu-1 and lron-13), orthologues are
listed from WormBase (Supplementary Table 4). Possible protein
structures, folds and functions were predicted using the webserver Phyre2
with default settings^44. Folds with over 90% confidence are reported
(Supplementary Table 1). Several hits have predicted enzymatic func-
tions such as hydrolase or transferase activity, which could change
post-translational modifications to inhibit aggregation.
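
The ranking and selection step can be sketched as below. The text does not spell out how the three criteria are combined into the total score, so this example assumes a simple sum of the relative alignment length, the brain expression value and the summed disease scores. The class and function names are hypothetical, and the published code may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    relative_alignment_length: float  # 0 to 1
    brain_expression: int             # -1 not expressed, 0 unknown, 1 expressed
    disease_scores: list              # evidence scores for qualifying diseases

    def total_score(self):
        # Assumption: the total score is the plain sum of the three criteria.
        return (self.relative_alignment_length
                + self.brain_expression
                + sum(self.disease_scores))


def select_orthologues(candidates, min_alignment=0.25, min_total=1.2):
    """Keep candidates passing both thresholds, ranked by descending total score."""
    kept = [c for c in candidates
            if c.relative_alignment_length >= min_alignment
            and c.total_score() >= min_total]
    return sorted(kept, key=lambda c: c.total_score(), reverse=True)
```

Under this reading, a candidate covering 80% of the profile that is expressed in the brain passes the total score threshold even without any disease association.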


Statistical analysis
No statistical method was used to predetermine sample size and ani-
mals were randomly distributed between conditions. Aggregate count-
ing experiments were performed blinded except with phenotypes
precluding blinding. For analysis of aggregation, two-sided Fisher’s
exact test (GraphPad Prism, v.7.04) was performed to analyse two
aggregation categories (animals with no puncta versus animals <10
puncta) and chi-square test (GraphPad) for three categories (animals
with no puncta versus animals <10 puncta versus animals >10 puncta).
When one aggregation category had fewer than three animals in both
conditions, we performed two-sided Fisher’s exact test combining
two categories together (animals with no puncta versus animals <10
puncta plus animals >10 puncta). For multiple aggregation categories
and to analyse the effect on protein aggregation of multiple RNAi treat-
ments compared to control RNAi, we used an ordinal logistic regression
model, which was performed using R (v.3.6.0) and its MASS package
(v.7.3-51.4)^45. For two aggregation categories with multiple
comparisons, we controlled for the false discovery rate with the
Benjamini–Hochberg correction. Lifespan and survival assays were not performed
blind and analysis was carried out by log-rank test with Bonferroni
correction for multiple comparisons using OASIS 2
(https://sbi.postech.ac.kr/oasis2/)^46. For qRT–PCR experiments, unpaired two-sided
Student’s t-test with Welch’s correction (GraphPad) was used on ΔCt of
transcript levels, treated versus untreated from biologically independent
samples. Enrichment analysis of ECRs in Supplementary Tables 2 and
3 was performed with WormExp
(http://wormexp.zoologie.uni-kiel.de/wormexp/)^47, category Microbes, using a one-sided Fisher’s exact test with
Bonferroni correction (P < 0.05). All numerical values used for graphs
and detailed statistical analysis can be found in the Source data.
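
To make two of the procedures named above concrete, the following Python sketch implements a two-sided Fisher's exact test for a 2 × 2 table and the Benjamini–Hochberg adjustment. The analyses in the paper were run in GraphPad Prism and R, whose implementations may differ in detail. The function names are hypothetical, and the two-sided P value here follows the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables at most as probable as the observed one (small float tolerance).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a table of, say, animals with no puncta versus animals with puncta in control and treated conditions, fisher_exact_two_sided takes the four counts, and the P values from several such comparisons can then be passed together to benjamini_hochberg.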

Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.

Data availability
All relevant data are available and/or included with the manuscript
or its Supplementary Information. RNA-sequencing data have been
uploaded to the European Nucleotide Archive under the study acces-
sion PRJEB36386. Source data are provided with this paper.

Code availability
The source code for the bioinformatics analysis of homologues is avail-
able at https://github.com/Ashafix/C_Elegans_Homologs.


  31. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal
    peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).

  32. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane
    protein topology with a hidden Markov model: application to complete genomes. J. Mol.
    Biol. 305, 567–580 (2001).

  33. Hansen, M., Hsu, A. L., Dillin, A. & Kenyon, C. New genes tied to endocrine, metabolic,
    and dietary regulation of lifespan from a Caenorhabditis elegans genomic RNAi screen.
    PLoS Genet. 1, 119–128 (2005).

  34. Groh, N. et al. Methods to study changes in inherent protein aggregation with age in
    Caenorhabditis elegans. J. Vis. Exp. 129, 56464 (2017).

  35. Shen, X., Ellis, R. E., Sakaki, K. & Kaufman, R. J. Genetic interactions due to constitutive
    and inducible gene regulation mediated by the unfolded protein response in C. elegans.
    PLoS Genet. 1, e37 (2005).

  36. Huang, C. et al. Intrinsically aggregation-prone proteins form amyloid-like aggregates
    and contribute to tissue aging in Caenorhabditis elegans. eLife 8, e43059 (2019).

  37. Bischof, L. J., Huffman, D. L. & Aroian, R. V. Assays for toxicity studies in C. elegans with Bt
    crystal proteins. Methods Mol. Biol. 351, 139–154 (2006).

  38. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions,
    deletions and gene fusions. Genome Biol. 14, R36 (2013).

  39. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals
    unannotated transcripts and isoform switching during cell differentiation. Nat.
    Biotechnol. 28, 511–515 (2010).

  40. Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and
    iterative HMM search procedure. BMC Bioinformatics 11, 431 (2010).

  41. Hawrylycz, M. J. et al. An anatomically comprehensive atlas of the adult human brain
    transcriptome. Nature 489, 391–399 (2012).

  42. Piñero, J. et al. DisGeNET: a discovery platform for the dynamical exploration of human
    diseases and their genes. Database (Oxford) 2015, bav028 (2015).

  43. Kibbe, W. A. et al. Disease Ontology 2015 update: an expanded and updated database of
    human diseases for linking biomedical knowledge through disease data. Nucleic Acids
    Res. 43, D1071–D1078 (2015).

  44. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2
    web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858
    (2015).

  45. Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S (Springer-Verlag New York,
    2002).

  46. Han, S. K. et al. OASIS 2: online application for survival analysis 2 with features for
    the analysis of maximal lifespan and healthspan in aging research. Oncotarget 7,
    56147–56152 (2016).
