15.2.2 Late Sequence-Based Molecular Markers
(EST-SSR and SNPs): Broadening Knowledge
on Genetic Variability in Cannabis
The increasing availability of high-throughput genomic tools, finally leading to the
release of the first complete genome and transcriptome sequences of C. sativa (van
Bakel et al. 2011; see also Chap. 10 of this volume), opened the way to a larger-scale
development of molecular markers, and to genome-wide approaches to the study of
genetic variability in Cannabis. In particular, the bioinformatic analysis of
collections of available expressed sequence tags (ESTs) revealed that these sequences are
a rich source of polymorphic SSRs in almost all taxa examined, with the additional
advantage of being tightly associated with expressed genes, and therefore of potential
agronomic, productive or qualitative interest (Ellis and Burke 2007).
The EST-SSR approach has recently been exploited in C. sativa by Gao et al.
(2014). These authors found a frequency of SSRs in Cannabis EST sequences of
about 1 per 8.7 kb; about 11% of the Cannabis ESTs examined contained SSRs,
a proportion similar to that found in other species analyzed. Starting from the
3442 EST-SSRs detected in a GenBank collection of over 32,000 ESTs, Gao et al. (2014)
designed 117 EST-SSR primers, from which 45 were selected for genetic analysis
conducted on a collection of Chinese germplasm, with the addition of a few
European fiber varieties. A preliminary test on a set of 24 varieties, however,
showed that 21 of the 119 loci that can be scored by these markers (about 18%)
were monomorphic. This polymorphism level is lower than that reported for
traditional genomic microsatellites, suggesting either that the 24 varieties selected to test
the markers were relatively closely related, or that SSRs located inside expressed
sequences are subject to variation constraints and are less variable than genomic
SSRs. In addition, Gao et al. (2014) found that the trinucleotide AAG/CTT repeat
motif was the most abundant in the Cannabis genome (17.96%), in contrast to
Alghanim and Almirall (2003), who indicated the AG/CT dinucleotide
as the most frequent. This discrepancy may be due to the
different strategies employed for SSR marker development (probe technology vs.
transcriptome database search), and may truly reflect differences in the distribution of
SSRs when the genome is considered in its expressed fraction or as a whole.
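The kind of database search described above can be illustrated with a minimal Python sketch that scans EST sequences for perfect di-, tri- and tetranucleotide repeats and summarizes their frequency. This is not the pipeline used by Gao et al. (2014): the function names (`find_ssrs`, `ssr_summary`) and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical choices made only for illustration, and reverse-complement motifs (e.g. AAG and CTT) are not merged.

```python
import re

# Illustrative minimum repeat counts per motif length (an assumption,
# not the thresholds applied by Gao et al. 2014).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit
            # (e.g. AAA, or AGAG counted as a tetranucleotide).
            if any(
                motif_len % k == 0 and motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return hits


def ssr_summary(ests):
    """Summarize SSR count, kb of sequence per SSR, and fraction of ESTs with SSRs."""
    per_est = [find_ssrs(s) for s in ests]
    n_ssrs = sum(len(h) for h in per_est)
    total_kb = sum(len(s) for s in ests) / 1000
    return {
        "ssr_count": n_ssrs,
        "kb_per_ssr": total_kb / n_ssrs if n_ssrs else None,
        "fraction_with_ssr": sum(1 for h in per_est if h) / len(ests),
    }


# Toy sequences, invented for the example (not real Cannabis ESTs).
toy_ests = [
    "GATTACA" + "AAG" * 6 + "CCGTA",
    "ACGTTGCAAGCTTAGGCTA",
    "TTC" + "AG" * 8 + "TCCA",
]
summary = ssr_summary(toy_ests)
```

Applied to a real EST collection, the `kb_per_ssr` and `fraction_with_ssr` values correspond to the two kinds of figures quoted in the text (one SSR per 8.7 kb, and about 11% of ESTs containing SSRs), although the exact numbers depend strongly on the repeat thresholds chosen.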
The availability of extensive sequencing data accumulated in C. sativa also led
to the identification of a huge number of single nucleotide polymorphisms (SNPs),
often within expressed genes, and therefore with a strong functional meaning (see
Sect. 15.3). Following genome and transcriptome sequencing, van Bakel et al.
(2011) carried out a survey of the genome sequences of two marijuana varieties,
Purple Kush and Chemdawg, and of two hemp varieties, Finola and USO-31,
and estimated the rate of occurrence of SNPs as varying from 0.38% (Purple Kush
vs. Chemdawg) to 0.64% (Chemdawg vs. Finola); the estimated heterozygosity was
comparable in the four varieties, being respectively 0.20, 0.26, 0.25 and 0.18%.
These values were very similar to those estimated with RAPD markers by other
authors (0.15–0.20%, Forapani et al. 2001).
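As a rough illustration of what the two percentages above measure, the following Python sketch computes a pairwise SNP rate between two aligned sequences and a within-sample heterozygosity from a diploid consensus sequence. It is a simplified sketch under stated assumptions, not the read-mapping and variant-calling procedure of van Bakel et al. (2011): the inputs are assumed to be already aligned and of equal length, heterozygous sites are assumed to be encoded with two-base IUPAC ambiguity codes, and the function names `snp_rate` and `heterozygosity` are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")
# IUPAC codes that stand for exactly two alternative bases.
HET_CODES = set("RYSWKM")


def snp_rate(seq_a, seq_b):
    """Percentage of differing sites among positions called A/C/G/T in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Gaps, N and ambiguity codes are excluded from the comparison.
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100 * diffs / compared


def heterozygosity(consensus):
    """Percentage of called sites that carry a two-base IUPAC ambiguity code."""
    seq = consensus.upper()
    het = sum(1 for base in seq if base in HET_CODES)
    called = het + sum(1 for base in seq if base in UNAMBIGUOUS)
    if called == 0:
        raise ValueError("no called positions")
    return 100 * het / called


# Toy aligned sequences, invented for the example.
rate = snp_rate("ACGTACGTAC", "ACGTACGAAC")   # one difference in ten sites
het = heterozygosity("ACGTRACGTY")            # two heterozygous calls in ten sites
```

On real data both quantities would be computed genome-wide and only over positions with adequate sequencing coverage, which is why published values such as those cited above are far lower than in this toy example.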
324 C. Onofri and G. Mandolino