13,513,873 sites contain at least two mutations
affecting more than one sample, which implies
that up to 17.5% of variable sites could result
from more than one ancestral mutation. A high
proportion of sites with more than ~100 mutations
on chromosome 20 have sequencing or
alignment quality issues as defined by the
TGP accessibility mask (6) or are in minimal
linkage disequilibrium to their surrounding
sites (fig. S6), which suggests that they are largely erroneous. Moreover, analysis of data
simulated with an empirically calibrated error
profile and evaluation of the enrichment of
multiple mutations at sites with known elevated
mutation rates suggests that most of
Wohns et al., Science 375, eabi8264 (2022) 25 February 2022 2 of 9
[Figure 1, panels A to D. Panel A: tree sequence with sample nodes 0 to 3 and ancestral nodes 4 to 7; stars mark mutations; vertical axis is relative age. Panel B: Step 0, Order by Frequency; Step 1, Infer Tree Sequence Topology; Step 2, Date Tree Sequence; Step 3, Constrain Ages with Ancient Samples (if available); Step 4, Order by Estimated Age. Panels C and D: Modern Samples Only versus Modern + Ancient Samples (if available); populations CEU, CHB, YRI.]
Fig. 1. Schematic overview and validation of the inference methodology.
(A) An example tree sequence topology with four samples (nodes 0 to 3),
two marginal trees, four ancestral haplotypes (nodes 4 to 7), and two mutations.
T_span measures the genomic span of each marginal tree topology, with
the dotted line indicating the location of a recombination event. The graph
representation is equivalent to the tree representation. (B) Schematic
representation of the inference methodology. Step 0: Alleles are ordered by
frequency (freq.); the mutation represented by the four-point star is considered
to be older. Step 1: The tree sequence topology is inferred with tsinfer using
modern samples. Step 2: The tree sequence is dated with tsdate. Step 3:
Node date estimates are constrained with the known age of ancient samples.
Step 4: Ancestral haplotypes are reordered by the estimated age of their focal
mutation; the five-pointed star mutation is now inferred to be older. The
algorithm returns to step 1 to reinfer the tree sequence topology with ancient
samples. Arrows refer to modes of operation: steps 0, 1, and 2 only (red);
steps 0, 1, 2, 4, 1, and 2 (green); or steps 0, 1, 2, 3, 4, 1, and 2 (blue) (24).
(C) Scatter plots and accuracy metrics comparing simulated (x axis) and inferred
(y axis) mutation ages from msprime neutral coalescent simulations, using
tsdate with the simulated topology (left) and inferred topology from tsinfer
(right). RMSLE, root mean squared log error. (D) Accuracy metrics, RMSLE (top),
and Spearman rank correlation coefficient (r) (bottom), with modern samples
only (first panel), after one round of iteration (second panel), and with increasing
numbers of ancient samples (third panel) [colored arrows as in (B)]. Ancient
samples from three eras of human history are considered, as in the schematic
(24). CEU, Utah residents with Northern and Western European Ancestry;
CHB, Han Chinese; YRI, Yorubans.
RESEARCH | RESEARCH ARTICLE
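The two accuracy metrics named in panels (C) and (D), RMSLE and the Spearman rank correlation coefficient, can be illustrated with a minimal standard-library sketch. This is not the authors' evaluation code. The function names are hypothetical, and the log(1 + x) convention used for RMSLE is an assumption, since the caption does not state which variant was applied.

```python
import math


def rmsle(simulated_ages, inferred_ages):
    """Root mean squared log error between two equal-length age sequences.

    Assumption: the log(1 + x) form is used so that an age of 0 is allowed.
    """
    if len(simulated_ages) != len(inferred_ages) or not simulated_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(s) - math.log1p(e)) ** 2
        for s, e in zip(simulated_ages, inferred_ages)
    ]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up mutation ages in generations, for illustration only.
    simulated = [120.0, 950.0, 4300.0, 15000.0]
    inferred = [200.0, 700.0, 5100.0, 9000.0]
    print("RMSLE:", rmsle(simulated, inferred))
    print("Spearman:", spearman(simulated, inferred))
```

RMSLE penalizes errors on a relative (log) scale, which suits ages spanning several orders of magnitude, whereas the Spearman coefficient only checks whether the inferred ages preserve the simulated ordering.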