may contain more than one copy of their genome. Genomic DNA from nearly all
prokaryotic and eukaryotic organisms is also complexed with protein and termed
chromosomal DNA. Each gene is located at a particular position along the chromosome,
termed the locus, whilst the particular form of the gene is termed the allele. In
mammalian DNA each gene is present in two allelic forms which may be identical
(homozygous) or which may vary (heterozygous). It is thought that there are approximately
20 000 genes present in the human genome, although not all will be expressed
in a given cell at the same time. However, various processing events such as alternative
splicing or RNA editing can increase the number of distinct proteins found in the cell,
relative to the number of genes, to nearly 1 million. The occurrence of different alleles
at the same site in the genome is termed polymorphism. In general the more complex
an organism the larger its genome, although this is not always the case since many
higher organisms have non-coding sequences, some of which are repeated numerous
times and termed repetitive DNA. In mammalian DNA repetitive sequences may be
divided into low copy number and high copy number DNA. The latter is composed of
repeat sequences that are dispersed throughout the genome and those that are clustered
together. The clustered repeat DNA may be divided into so-called classical satellite
DNA, minisatellite and microsatellite DNA, the latter being mainly composed of
dinucleotide repeats (Table 5.2). These sequences vary between individuals and are
therefore termed polymorphic (collectively, polymorphisms); they also form the basis of
genetic fingerprinting.
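The idea that clustered repeats differ in copy number between individuals can be illustrated with a short Python sketch. It scans a DNA string for runs of a tandemly repeated unit (here a CA dinucleotide) and reports how many copies each run contains. The function name `find_tandem_repeats`, the `min_copies` threshold and the two example sequences are hypothetical and chosen only for illustration; real microsatellite typing relies on laboratory assays and dedicated software rather than a simple pattern match.

```python
import re


def find_tandem_repeats(sequence, unit_size=2, min_copies=4):
    """Return (start, unit, copies) for each tandem run of a repeat unit.

    Illustrative only: unit_size and min_copies are arbitrary assumptions.
    """
    # ([ACGT]{n}) captures one repeat unit; \1{k,} demands k or more
    # further copies of exactly that unit immediately afterwards.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_size, min_copies - 1))
    repeats = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        # Skip runs of a single base (e.g. AAAA read as AA+AA) when
        # looking for units longer than one nucleotide.
        if unit_size > 1 and len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // unit_size
        repeats.append((match.start(), unit, copies))
    return repeats


# Two made-up alleles of the same locus that differ only in CA copy number.
allele_1 = "GGT" + "CA" * 6 + "TTG"
allele_2 = "GGT" + "CA" * 9 + "TTG"

print(find_tandem_repeats(allele_1))  # [(3, 'CA', 6)]
print(find_tandem_repeats(allele_2))  # [(3, 'CA', 9)]
```

The two invented alleles share the same flanking sequence but carry six and nine CA copies respectively, which is the kind of length difference that is compared between individuals.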
5.3.2 Single nucleotide polymorphisms (SNPs)
A further important source of polymorphic diversity known to be present in genomes
is termed single nucleotide polymorphisms or SNPs (pronounced snips). SNPs are
substitutions of one base at a precise location within the genome. Those that occur in
coding regions are termed cSNPs. Estimates indicate that an SNP occurs every once in
Table 5.2 Repetitive satellite sequences found in DNA, and their characteristics

Types of repetitive DNA     Repeat unit size (bp)   Characteristics/motifs
Satellite DNA               5–200                   Large repeat unit range (Mb); usually found at centromeres
Minisatellite DNA
  Telomere sequence         6                       Found at the ends of chromosomes. Repeat unit may span up to 20 kb. G-rich sequence
  Hypervariable sequence    10–60                   Repeat unit may span up to 20 kb
Microsatellite DNA          1–4                     Mononucleotide repeats of adenine; dinucleotide repeats common (CA). Usually known as VNTR (variable number tandem repeat)

Notes: bp, base-pairs; kb, kilobase-pairs.