The Lotus japonicus Genome


vectors. The average insert sizes of the main TAC
libraries were around 100 kb. In addition to the
main libraries, TAC libraries with shorter inserts
and BAC libraries were constructed as sublibraries.
Together, these libraries amounted to 20
haploid genome equivalents. Three-dimensional
DNA pools for PCR screening were prepared
from the main libraries and BAC sublibraries.
End sequences of these clones were accumulated
to facilitate walking clone selection from the seed
sequences.
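
The relation between clone number, insert size and library depth described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that every clone carries a 100 kb insert (the text only gives this as the approximate average for the main TAC libraries), it uses the 472 Mb genome size cited later in this section, and the function names are hypothetical. The probability estimate is the standard Clarke-Carbon formula for randomly sampled clones.

```python
import math

GENOME_BP = 472e6   # reported L. japonicus genome size (Ito et al. 2000)
INSERT_BP = 100e3   # ASSUMPTION: uniform 100 kb inserts for all clones


def genome_equivalents(n_clones, mean_insert_bp, genome_bp):
    """Library depth expressed in haploid genome equivalents."""
    return n_clones * mean_insert_bp / genome_bp


def clones_needed(target_equivalents, mean_insert_bp, genome_bp):
    """Number of clones required to reach a target library depth."""
    return math.ceil(target_equivalents * genome_bp / mean_insert_bp)


def prob_locus_present(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon probability that a given locus is in at least one clone."""
    fraction_per_clone = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


# Clones needed for the 20 genome equivalents mentioned in the text,
# under the uniform-insert assumption above.
n_clones = clones_needed(20, INSERT_BP, GENOME_BP)
depth = genome_equivalents(n_clones, INSERT_BP, GENOME_BP)
p_present = prob_locus_present(n_clones, INSERT_BP, GENOME_BP)

print(n_clones, depth, p_present)
```

Under these assumptions, a depth of 20 genome equivalents makes it very unlikely that any given locus is missing from the libraries, although real libraries deviate from random sampling because of cloning bias.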
Aiming to identify seed points for sequencing
as well as to generate a catalogue of expressed
genomic regions, a large scale analysis of
expressed sequence tags (ESTs) was carried out
in the initial phase of the project. Based on the
EST information, TAC/BAC clones were selec-
ted from the genomic libraries as seed points for
the clone-by-clone approach. The nucleotide
sequence of each clone was determined using a
shotgun strategy with three- to fivefold
redundancy, and the sequenced clones were
anchored onto six chromosomes using a total of
788 microsatellite markers derived from the clone
sequences. Since there are chromosomal inversions,
presumably caused by translocation
between the top arm of MG-20 chromosome
1 and the bottom arm of B-129 chromosome 2, the
genetic distance of these regions is nearly zero,
with limited recombination. Therefore, the order
of the clones placed within the chromosomal
inversion regions was assigned based on the
corresponding markers on the genetic linkage
maps of L. filicaulis × L. japonicus Gifu and
L. burttii × L. japonicus Gifu, as well as the results
of fluorescent in situ hybridisation (FISH) analysis.
The constructed pseudomolecules represented
the physical form of the MG-20 genome. In parallel
with the clone-by-clone approach, shotgun
sequencing of selected genomic regions was used
to accumulate draft sequence information for the
remaining gene-rich regions (Sato et al. 2008).
The total length of the v. 1.0 assembly was
315 Mbp, consisting of 594 anchored supercon-
tigs with a total length of 130 Mbp and 110,000
unanchored contigs with a total length of
184 Mbp. While this assembly corresponded to
67 % of the reported L. japonicus genome
(472 Mb) (Ito et al. 2000), it was estimated that it
covered ~91 % of the gene space because 11,404
out of 12,485 tentative consensus (TC) sequences
of the L. japonicus Gene Index provided by
the Gene Index Project [http://compbio.dfci.harvard.edu/tgi/plant.html]
could be mapped to
the v. 1.0 assembly.
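
The two coverage figures quoted for the v. 1.0 assembly follow directly from the numbers in the text, as this short check shows. The variable and function names are illustrative only; all values are taken from the paragraph above.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Values quoted in the text for the v. 1.0 assembly
ASSEMBLY_MBP = 315      # total assembly length
ANCHORED_MBP = 130      # 594 anchored supercontigs
UNANCHORED_MBP = 184    # 110,000 unanchored contigs
GENOME_MBP = 472        # reported genome size (Ito et al. 2000)
TC_MAPPED = 11404       # TC sequences mapped to the assembly
TC_TOTAL = 12485        # TC sequences in the Gene Index

genome_coverage = percent(ASSEMBLY_MBP, GENOME_MBP)
gene_space_coverage = percent(TC_MAPPED, TC_TOTAL)

# The anchored and unanchored lengths sum to 314 Mbp; the 1 Mbp gap to the
# stated 315 Mbp total is presumably due to rounding of the components.
component_sum = ANCHORED_MBP + UNANCHORED_MBP

print(f"genome coverage:     {genome_coverage:.1f} %")      # 66.7 %
print(f"gene-space coverage: {gene_space_coverage:.1f} %")  # 91.3 %
```

The gene-space estimate treats the fraction of mappable TC sequences as a proxy for the fraction of genes captured, which is why it can be much higher than the fraction of total genome length assembled.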
In 2010, an updated genome sequence, v. 2.5,
was constructed by adding genome sequence
information from 460 TAC/BAC clones ana-
lyzed after the release of v. 1.0, increasing the
total length of anchored contigs to 195 Mbp. This
Sanger-based sequence information is available
through the web database at
http://www.kazusa.or.jp/lotus/build2.5.

4.2 Whole-genome Sequencing

The advent of high-throughput and low-cost
next-generation sequencing (NGS) technology
led to a general change in sequencing strategy
from clone-by-clone Sanger sequencing to
whole-genome shotgun approaches. In 2009, the
NGS shotgun approach was implemented in the
L. japonicus genome sequencing project using
two emerging NGS platforms, 454 GS FLX
(Margulies et al. 2005) and Illumina (Bentley
et al. 2008). Since there was no assembly program
available that could carry out hybrid
assembly of short NGS reads and longer
Sanger-based contigs, a step-by-step approach for
combined assembly was used (Fig. 4.1).
Using this hybrid assembly approach, a total
of 132 scaffolds covering 232 Mbp of the genome
were aligned to the six L. japonicus chromosomes.
Thus, relative to v. 2.5, the total number of
scaffolds has been decreased to one-fifth of the
previous 646 scaffolds, and the total length of
anchored contigs has been increased by 20 % of
the original 195 Mbp. The remaining 23,572
unanchored contigs, corresponding to 162 Mbp,
were assigned to a virtual chromosome 0
(Table 4.1). Version 3.0 of the Lotus genome
(Sato et al. 2014) is thus comparable to other
published legume reference genomes, including
Medicago truncatula v. 3.5 and soybean (Schmutz
et al. 2010; Young et al. 2011). The gene

36 S. Sato and S.U. Andersen
