The Lotus japonicus Genome


space coverage of v. 3.0 is ~98 % based on the
placement of 57,916 de novo assembled RNA-seq
contigs and 25,694 tentative contigs (TCs, L.
japonicus Gene Index v. 6.0) (Sato et al. 2014).


4.3 Gene Annotation


In early versions of the genome assembly, 1.0
and 2.5, annotation relied on ab initio predictions
and homology to available protein sequences. In
version 3.0, a hierarchical approach that prioritized
transcript evidence was used instead (Sato
et al. 2014). The differences in annotation
approaches did not greatly influence the number
of annotated protein-coding genes, which
remained around 40,000. However, the number
of annotated amino acids increased by 20 %
between v. 1.0 and 3.0, and this increase was
reflected in a larger number of peptides
assigned from proteomics data (Sato
et al. 2014).
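
The idea of a hierarchical, evidence-prioritized annotation can be illustrated with a minimal sketch. The source states only that transcript evidence was prioritized; the tier names, their order below the transcript tier, and the function `select_models` are hypothetical and are not taken from Sato et al. (2014).

```python
# Hypothetical evidence tiers, highest priority first. Only the priority of
# transcript evidence comes from the text; the rest is an assumption.
TIER_PRIORITY = ["transcript", "protein_homology", "ab_initio"]


def select_models(candidates):
    """Pick one gene model per locus from the highest-priority tier present.

    candidates: iterable of (locus_id, tier, model_id) tuples.
    Returns a dict mapping locus_id -> (tier, model_id).
    """
    rank = {tier: i for i, tier in enumerate(TIER_PRIORITY)}
    chosen = {}
    for locus, tier, model in candidates:
        current = chosen.get(locus)
        # Replace the current choice only if the new tier ranks strictly higher.
        if current is None or rank[tier] < rank[current[0]]:
            chosen[locus] = (tier, model)
    return chosen


# Made-up loci and model identifiers, for illustration only.
example = [
    ("locus1", "ab_initio", "m1"),
    ("locus1", "transcript", "m2"),
    ("locus2", "protein_homology", "m3"),
    ("locus2", "ab_initio", "m4"),
    ("locus3", "ab_initio", "m5"),
]
selected = select_models(example)
for locus, (tier, model) in sorted(selected.items()):
    print(locus, tier, model)
```

In this sketch a locus falls back to a lower tier only when no higher-tier model exists, which is one simple way to keep the gene count stable while changing which model represents each gene.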

Fig. 4.1 Summary of the hybrid assembly strategy used for L. japonicus genome sequence version 3.0

[Figure labels: Illumina mate-pair reads, 38 bp (2 kb, 5 kb); Illumina paired-end reads, 108 bp (200 bp, 500 bp); 454 GS FLX paired-end reads (3 kb, 8 kb); ~14 Gbp; ~3 Gbp; filtering of repeat sequences against Lj_rep; de novo assembly by SOAPdenovo; LjSGA seqs; integration by mapping (Megablast); combined assembly by PCAPrep; version 2.5 phase 1 regions; phase 2, 3 sequences in version 2.5 and additional finished TAC/BAC clones; finished clone integrated contigs; anchoring the scaffolds (marker-seq and pair-end info.); upgraded pseudomolecules (build 3.0)]


Table 4.1 Assembly statistics of L. japonicus genome sequences in versions 1.0, 2.5 and 3.0

                                              Version 1.0    Version 2.5    Version 3.0
Total length of scaffolds (bases)             315,073,275    296,886,266    393,918,449
Anchored scaffolds
  Number of anchored scaffolds                        594            647            132
  Length of anchored scaffolds (bases)        130,251,279    184,542,525    231,615,632
Unanchored scaffolds
  Number of unanchored scaffolds                  110,346         67,902         23,572
  Length of unanchored scaffolds (bases)      184,821,996    112,343,741    162,302,817
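
The figures in Table 4.1 can be cross-checked with a short script. The sketch below copies the scaffold lengths from the table, confirms that anchored and unanchored lengths sum to the reported total, and derives the anchored share of each assembly. The names `ASSEMBLIES`, `is_consistent` and `anchored_fraction` are illustrative and do not come from the source.

```python
# Scaffold lengths in bases, copied from Table 4.1.
ASSEMBLIES = {
    "1.0": {"total": 315_073_275, "anchored": 130_251_279, "unanchored": 184_821_996},
    "2.5": {"total": 296_886_266, "anchored": 184_542_525, "unanchored": 112_343_741},
    "3.0": {"total": 393_918_449, "anchored": 231_615_632, "unanchored": 162_302_817},
}


def is_consistent(stats):
    """Check that anchored plus unanchored length equals the reported total."""
    return stats["anchored"] + stats["unanchored"] == stats["total"]


def anchored_fraction(stats):
    """Return the share of total scaffold length that is anchored."""
    return stats["anchored"] / stats["total"]


for version, stats in ASSEMBLIES.items():
    print(
        f"v{version}: consistent={is_consistent(stats)}, "
        f"anchored={anchored_fraction(stats):.1%}"
    )
```

The totals are consistent for all three versions. The anchored share is roughly 41 % in version 1.0, 62 % in version 2.5 and 59 % in version 3.0, so version 3.0 has the greatest anchored length in absolute terms but a slightly smaller anchored proportion than version 2.5, because the total assembly length also grew.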

4 Genome Sequencing 37
