The Lotus japonicus Genome


4.4 Repetitive Sequences


Since highly repetitive sequences in the L. japonicus genome tend to
be excluded from scaffolds and contigs during NGS and hybrid assembly,
these regions could be underrepresented in the latest assembly.
Therefore, TAC end sequences, which are less biased, were used to
survey the highly repetitive sequences across the entire L. japonicus
genome. Based on analysis of 37,000 TAC end sequences, 34 types of
highly repetitive sequences were identified (Sato et al. 2008). These
included sequences localized to highly condensed heterochromatic
regions of the genome known as chromosome knobs, as well as
centromere-associated sequences (Pedrosa et al. 2002; Sato et al.
2008).
In parallel, a combined computer-assisted and experimental analysis of
transposable elements (TEs) was carried out on the v. 1.0 assembly
(Holligan et al. 2006). The analysis revealed that the L. japonicus
genome is rich in Pack-MULEs, nonautonomous MULEs (Mutator-like
elements), and Copia-like elements with an additional ORF (Holligan et
al. 2006). Transposon display indicated a significant level of
insertion polymorphism between Miyakojima MG-20 and Gifu B-129,
suggesting recent element activity. Indeed, the transposition activity
of two gypsy-like retrotransposons, LORE1 and LORE2, was confirmed
(Fukai et al. 2010; Madsen et al. 2005). Later, the construction of a
large-scale L. japonicus insertion mutant collection was initiated
based on the transposition activity of LORE1 (Fukai et al. 2012;
Urbanski et al. 2012, 2013).
As a further analysis of repetitive sequences, RECON (Bao and Eddy
2002) was used to identify dispersed repetitive sequences in v. 3.0.
As a result, a variety of repeat elements, including class I and class
II TE subfamilies and elements that are difficult to classify into
known subfamilies, were stored in repetitive sequence libraries in
addition to the highly repetitive sequences identified by TAC end
sequence analysis. RepeatMasker analysis based on these L. japonicus
repetitive sequence libraries classified ~32 % of the L. japonicus
genome sequence (version 3.0) as repetitive (Table 4.2). About half of
the total length of the repetitive sequences identified consisted of
class I TEs (retrotransposons), while the nonautonomous class II TEs,
including Pack-MULEs and MITEs, were the most abundant, with nearly
130,000 copies. A substantial portion of these nonautonomous class II
TEs was found in introns and UTRs. A class I retroelement with a short
insert size, a member of the short interspersed nuclear elements
(SINEs), was also preferentially observed in introns and 3' UTRs
(Fawcett et al. 2006). The fraction of

Table 4.2 Repetitive sequences in L. japonicus genome sequence v. 3.0

Repeat type                Number of elements   Coverage (kb)   Percentage of sequence (%)
Class I
  SINEs                                   294            36.9                         0.01
  LINEs                                11,650          4105.7                         1.04
  LTR: Copia                           56,317         30773.5                         7.80
  LTR: Gypsy                           36,819         26773.3                         6.79
  LTR: other                            4,915          2410.2                         0.61
  Total class I                       109,995         64099.6                        16.25
Class II
  Autonomous class II                  57,146         15214.7                         3.86
  Nonautonomous class II              129,182         30833.1                         7.81
  Total class II                      186,328         46047.8                        11.67
Short tandem repeats                      650           612.7                         0.16
Unclassified                           68,567         17327.5                         4.39
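
The subtotals in Table 4.2 and the "about half" statement in the text can be checked with simple arithmetic. The following is a minimal Python sketch, not part of the original analysis; the dictionary layout and the names TABLE_4_2 and class_totals are illustrative assumptions, and the numbers are transcribed from the table.

```python
# Rows transcribed from Table 4.2: (repeat type, number of elements, coverage in kb).
# The data structure and function names are illustrative, not from the source.
TABLE_4_2 = {
    "Class I": [
        ("SINEs", 294, 36.9),
        ("LINEs", 11650, 4105.7),
        ("LTR: Copia", 56317, 30773.5),
        ("LTR: Gypsy", 36819, 26773.3),
        ("LTR: other", 4915, 2410.2),
    ],
    "Class II": [
        ("Autonomous class II", 57146, 15214.7),
        ("Nonautonomous class II", 129182, 30833.1),
    ],
    "Short tandem repeats": [("Short tandem repeats", 650, 612.7)],
    "Unclassified": [("Unclassified", 68567, 17327.5)],
}


def class_totals(table):
    """Sum element counts and coverage (kb) for each top-level repeat class."""
    totals = {}
    for repeat_class, rows in table.items():
        count = sum(n for _, n, _ in rows)
        coverage_kb = round(sum(kb for _, _, kb in rows), 1)
        totals[repeat_class] = (count, coverage_kb)
    return totals


totals = class_totals(TABLE_4_2)

# Total length annotated as repetitive, in kb.
repeat_kb = round(sum(kb for _, kb in totals.values()), 1)

# Share of the total repeat length contributed by class I TEs.
class1_share = totals["Class I"][1] / repeat_kb

for repeat_class, (count, kb) in totals.items():
    print(f"{repeat_class}: {count} elements, {kb} kb")
print(f"Class I share of repeat length: {class1_share:.1%}")
```

With the values as transcribed, the computed class I and class II sums match the "Total" rows of the table, and class I elements account for roughly 50 % of the total repeat length, consistent with the text.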

38 S. Sato and S.U. Andersen
