Synthetic Biology Parts, Devices and Applications

4.3 The E. coli Genome

the genomic islands, they are called “core genome” and “auxiliary genome,”
respectively.
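In set terms, the core genome is the intersection of the gene-family sets of all compared genomes, and the auxiliary genome is whatever remains of their union (the pan-genome). The short Python sketch that follows illustrates this on hypothetical toy data; the family IDs (g1, i1, ...), the strain names and the helper `split_core_auxiliary` are invented for illustration only.

```python
# Hypothetical toy data (not from any real strain): each genome is the set of
# gene-family IDs it carries. g1-g3 are shared by all; i1-i4 are island genes.
genomes = {
    "strain_A": {"g1", "g2", "g3", "i1"},
    "strain_B": {"g1", "g2", "g3", "i2", "i3"},
    "strain_C": {"g1", "g2", "g3", "i1", "i4"},
}


def split_core_auxiliary(genome_sets):
    """Return (core, auxiliary, pan) for a mapping of genome name -> family set.

    core      = families present in every genome (set intersection)
    pan       = families present in at least one genome (set union)
    auxiliary = pan minus core
    """
    sets = list(genome_sets.values())
    if not sets:
        raise ValueError("need at least one genome")
    pan = set().union(*sets)
    core = set.intersection(*sets)
    return core, pan - core, pan


core, auxiliary, pan = split_core_auxiliary(genomes)
print("core:", sorted(core))            # core: ['g1', 'g2', 'g3']
print("auxiliary:", sorted(auxiliary))  # auxiliary: ['i1', 'i2', 'i3', 'i4']
```

Adding a fourth genome that lacks g1, g2 or g3 would shrink the core, because an intersection can only get smaller as more sets are included, while the union can only grow.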
How many genes belong to the core genome? The answer depends on the sample:
the more genomes are compared, the smaller the core genome appears, and the core
identified within a phylogroup is larger than the core obtained when distant
relatives are included. A comparison of 61 sequenced E. coli genomes revealed
that out of a huge pan-genome of 15 741 gene families, only 993 (6%) were
represented in every genome (the core genome) [32]. The accessory genes thus
make up more than 90% of the pan-genome and about 80% of a typical genome [32].
It should be noted, however, that the selection criteria used to identify
conserved genes might miss homologs in distantly related strains. Moreover,
different strains might use alternative genetic solutions for the same function.
A refined comparison of 186 sequenced E. coli genomes [33], based on homologous
gene clusters (HGCs), revealed a pan-genome of 16 373 HGCs. The “soft core,”
defined as all HGCs found in at least 95% of the genomes, consisted of 3051 HGCs
(Figure 4.2). A recent census of 2085 sequenced E. coli genomes revealed that
the pan-genome still grew linearly with the number of genomes added, while the
core genome, at 3188 gene families, hardly changed in size [34].
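Because a strict core requires presence in every genome, a single missed homolog removes a family from it, whereas the soft core used in [33] tolerates absence from up to 5% of the genomes. The following Python sketch contrasts the two definitions as genomes are added. The generator `simulate_genomes` and all of its parameters (family counts, presence probabilities) are hypothetical and not fitted to real E. coli data; in particular, this toy model's pan-genome saturates, unlike the open, linearly growing E. coli pan-genome reported in [34].

```python
import random
from collections import Counter


def core_pan_sizes(genomes, soft_percent=95):
    """Return (strict core, soft core, pan-genome) sizes.

    genomes: list of sets of gene-family IDs, one set per genome.
    strict core: families found in every genome.
    soft core:   families found in at least soft_percent % of the genomes.
    pan-genome:  families found in at least one genome.
    """
    n = len(genomes)
    counts = Counter(fam for genome in genomes for fam in genome)
    strict = sum(1 for c in counts.values() if c == n)
    # Integer comparison avoids floating-point rounding at the 95% threshold.
    soft = sum(1 for c in counts.values() if 100 * c >= soft_percent * n)
    return strict, soft, len(counts)


def simulate_genomes(n_genomes, seed=1):
    """Hypothetical toy model, not fitted to any data.

    1000 conserved families, each detected with probability 0.995 per genome
    (occasional misses mimic undetected homologs), plus a pool of 10000
    accessory families, each present with probability 0.1.
    """
    rng = random.Random(seed)
    genomes = []
    for _ in range(n_genomes):
        genome = {f"cons{i}" for i in range(1000) if rng.random() < 0.995}
        genome |= {f"acc{i}" for i in range(10000) if rng.random() < 0.1}
        genomes.append(genome)
    return genomes


genomes = simulate_genomes(100)
# Prefixes in a fixed order; a real accumulation curve would average
# over many random orderings of the genomes.
for k in (10, 25, 50, 100):
    strict, soft, pan = core_pan_sizes(genomes[:k])
    print(f"{k:3d} genomes: strict core {strict}, soft core {soft}, pan-genome {pan}")
```

Under this toy model the expected strict core is about 1000 × 0.995^k (roughly 950 at k = 10 and 600 at k = 100), whereas the soft core stays close to 1000 once there are at least 20 genomes, the point at which 5% allows one absence.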
Why do we think that a significant part of the genome is dispensable without
loss of fitness? E. coli evolved its gene set in the lower gut of animals, with
periodic shedding into the environment. It obviously has many genes that are


[Figure 4.2 (diagram): pan-genome ≈16 000 HGCs; average E. coli genome ≈4800 HGCs; soft core ≈3000 HGCs; strict core ≈1700 HGCs]

Figure 4.2 Comparison of the pan-genome and core-genome sizes, defined by homologous
gene clusters (HGCs). Data and classification criteria are from [33] and are based on the
analysis of 186 sequenced E. coli genomes. HGCs are generated by sequence similarity (95% of
HGCs have <0.242 substitutions per site). The soft-core genome is defined as all HGCs that
have members in at least 95% of the 186 genomes. The strict core genome is defined as all
HGCs that have members in all genomes. The pan-genome is defined as all HGCs.
