Extended Data Fig. 1 | Canary song annotation and sequence statistics.
a, Architecture of the machine learning algorithm for syllable segmentation and
annotation. (i) A spectrogram is fed into the algorithm as a 2D matrix in
segments of 1 s. (ii) Convolutional and max-pooling layers learn local spectral
and temporal filters. (iii) Bidirectional recurrent LSTM layer learns temporal
sequencing features. (iv) Projection onto syllable classes assigns a probability
for each 2.7-ms time bin and syllable. b, After manual proofreading
(see Methods), a support vector machine classifier was used to assess the
pairwise confusion between all syllable classes of bird 1 (see Methods). The test
set confusion matrix (right) and its histogram (left) show that the error
exceeded 1% only in rare cases and reached at most 6%. As the higher values occurred only
in phrases with tens of syllables, this metric guarantees that most of the
syllables in every phrase cannot be confused as belonging to another syllable
class. Accordingly, the possibility of making a mistake in identifying a phrase
type is negligible. c, Number of phrases per song for the three birds used in this
study. d, Song durations for the three birds. e, Mean syllable durations for 85
syllable classes from three birds. Red arrow marks the duration below which all
trill types have more than ten repetitions on average. f, Relation between
phrase class mean duration (x axis) and standard deviation (y axis). Syllable
classes (dots) of three birds are coloured according to bird number. Dashed line
marks 450 ms (upper limit for the decay time constant of GCaMP6f). g, Range of
mean number of syllables per phrase (y axis) for all syllable types with mean
duration shorter than the x-axis value. Red line is the median, light grey marks
the 25% and 75% quantiles and dark grey marks the 5% and 95% quantiles (blue
line marks the number of syllable types contributing to these statistics). The
red arrow matches the arrow in e. h, Cumulative histogram of trill phrase
durations. i, All complex phrase transitions with second-order or higher
dependence on song history context (for birds 1 and 2). For each phrase type
that precedes a complex transition, the context dependence is visualized by a
PST (see Methods). Transition outcome probabilities are marked by pie charts
at the centre of each node. The song context (phrase sequence) that leads to
the transition is marked by concentric circles, the innermost being the phrase
type that preceded the transition. Nodes are connected to indicate the
sequences in which they are added in the search for longer Markov chains that
describe context dependence (for example, i–iii for first- to third-order Markov
chains). Grey arrows indicate additional incoming links that are omitted for
simplicity.
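
The context dependence described in panel i can be illustrated with a minimal sketch that estimates transition outcome probabilities conditioned on progressively longer phrase histories. This is not the PST fitting procedure from the Methods; it only computes empirical conditional frequencies. The function name `transition_probabilities` and the toy phrase labels are hypothetical, introduced for illustration only.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs, order):
    """Estimate P(next phrase | previous `order` phrases) from phrase sequences.

    Hypothetical helper: plain empirical frequencies, not a fitted PST.
    """
    counts = defaultdict(Counter)
    for song in songs:
        for i in range(order, len(song)):
            # The context is the `order` phrases that precede position i.
            context = tuple(song[i - order:i])
            counts[context][song[i]] += 1

    probs = {}
    for context, outcomes in counts.items():
        total = sum(outcomes.values())
        probs[context] = {nxt: n / total for nxt, n in outcomes.items()}
    return probs


# Toy data with made-up phrase labels (not taken from the study).
songs = [
    ["A", "C", "D"],
    ["A", "C", "D"],
    ["B", "C", "E"],
    ["B", "C", "E"],
]

first_order = transition_probabilities(songs, order=1)
second_order = transition_probabilities(songs, order=2)

# Given only the preceding phrase "C", the outcome is ambiguous.
print(first_order[("C",)])
# Adding one more phrase of history resolves the outcome.
print(second_order[("A", "C")])
print(second_order[("B", "C")])
```

In this toy example the transition out of phrase "C" looks random under a first-order model but becomes deterministic once the phrase before "C" is taken into account, which is the kind of second-order dependence that the concentric circles in panel i depict.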