Science - USA (2022-01-28)


MOLECULAR BIOLOGY


Sequence specificity in DNA binding is mainly governed by association


Emil Marklund^1, Guanzhong Mao^1†, Jinwen Yuan^1†, Spartak Zikrin^1, Eldar Abdurakhmanov^2, Sebastian Deindl^1*, Johan Elf^1*


Sequence-specific binding of proteins to DNA is essential for accessing genetic information. We derive a model that predicts an anticorrelation between the macroscopic association and dissociation rates of DNA-binding proteins. We tested the model for thousands of different lac operator sequences with a protein binding microarray and by observing kinetics for individual lac repressor molecules in single-molecule experiments. We found that sequence specificity is mainly governed by the efficiency with which the protein recognizes different targets. The variation in probability of recognizing different targets is at least 1.7 times as large as the variation in microscopic dissociation rates. Modulating the rate of binding instead of the rate of dissociation effectively reduces the risk of the protein being retained on nontarget sequences while searching.


Sequence-specific recognition and binding of DNA target sites by proteins such as polymerases, DNA-modifying enzymes, and transcription factors are essential for gene expression and regulation across all kingdoms of life (1). The textbook explanation for this sequence dependence of binding posits that favorable hydrogen bonding interactions between the protein and particular DNA sequences result in prolonged binding times (2). Consequently, the rate of protein dissociation would depend on the DNA sequence, whereas the association rate would be invariant with respect to sequence. Indeed, the rate of protein association with DNA has often been assumed to be sequence independent (3–6). However, single-molecule measurements have shown that when a protein scans the DNA for binding sites, the association rate does depend on the sequence (7), and that different target sequences can be bypassed with distinct probabilities (8). These variations have been ascribed to differences in the probability of recognition when the protein is centered on the target sequence (7). It is, however, unknown whether the specific retention of a given sequence is chiefly caused by differences in the probability of recognition or in the rate of dissociation. The physical constraints on the rate constants are unknown beyond the fact that the ratio of association and dissociation rates is necessarily dictated by the free-energy difference between the free and bound states.
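The Python sketch below is a minimal illustration of that thermodynamic constraint. It computes the standard binding free energy $\Delta G^\circ = RT \ln(K_D/c^\circ)$ from $K_D = k_d/k_a$ for two hypothetical operators. The rate constants, the 25 °C temperature, and the 1 M standard state are assumptions chosen for illustration; none of them are values from this study. Only the ratio $k_d/k_a$ enters the calculation. Fast-on/fast-off binding and slow-on/slow-off binding therefore give the same free energy, so thermodynamics alone cannot say whether specificity comes from association or from dissociation.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # assumed temperature, K (25 °C)


def binding_free_energy(k_a, k_d, c0=1.0):
    """Standard binding free energy (J/mol) from kinetic rate constants.

    k_a is in M^-1 s^-1 and k_d in s^-1, so K_D = k_d / k_a is in M;
    c0 is the 1 M standard-state concentration (an assumption).
    """
    k_D = k_d / k_a
    return R * T * math.log(k_D / c0)  # negative when K_D < 1 M


# Two hypothetical operators: kinetics differ 100-fold, K_D does not.
dg_fast = binding_free_energy(k_a=1e8, k_d=1e-2)
dg_slow = binding_free_energy(k_a=1e6, k_d=1e-4)
print(f"fast: {dg_fast / 1000:.1f} kJ/mol, slow: {dg_slow / 1000:.1f} kJ/mol")
```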
To explore what limits the association and dissociation rates, we considered the theoretical standard model (9), according to which a protein has a nonspecific testing mode where it is bound nonspecifically to DNA (Fig. 1A). In the testing state, where the protein can slide into the target sequence through nonspecific interactions, the protein can either specifically bind the target with probability $p_{\mathrm{tot}}$ or dissociate into solution with probability $1 - p_{\mathrm{tot}}$. When the association process is modeled as a three-state (specifically bound, nonspecifically bound, and dissociated) continuous-time Markov chain, the effective macroscopic target association and dissociation rate constants ($k_a$ and $k_d$) relate to each other according to

$$k_a = k_{\mathrm{on,max}} - \frac{k_{\mathrm{on,max}}}{k_{\mathrm{off,m}}}\,k_d \qquad (1)$$

where $k_{\mathrm{on,max}}$ is the association rate constant given by a searching protein that binds the target upon every nonspecific encounter ($p_{\mathrm{tot}} = 1$), and $k_{\mathrm{off,m}}$ is the rate of microscopic dissociation from the bound state into the nonspecifically bound searching mode (see supplementary text for derivation of Eq. 1). This equation implies that the macroscopic association and dissociation rates are inherently coupled. They are also linearly anticorrelated if binding sites exhibit identical microscopic dissociation rates, because $k_{\mathrm{on,max}}$ does not depend on the specific sequence. The linear relationship between $k_a$ and $k_d$ described by Eq. 1 is implicitly parameterized by the probability $p_{\mathrm{tot}}$ of binding rather than dissociating from the nonspecifically bound state. An increase in $p_{\mathrm{tot}}$ therefore causes an increase in $k_a$ and a corresponding decrease in $k_d$ (Fig. 1B). The anticorrelation can be understood intuitively through the number of attempts each process requires. If fewer target site encounters are needed for successful binding (on average $1/p_{\mathrm{tot}}$), then more microscopic dissociation attempts are needed before the protein macroscopically dissociates from the target (on average $1/(1 - p_{\mathrm{tot}})$) (Fig. 1C).
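The relation in Eq. 1 can be checked with a small Monte Carlo simulation. The Python sketch below is a minimal version of the three-state chain. Its simplifying assumptions are ours and not necessarily the authors':

- the time spent in the testing state is neglected;
- a free protein encounters the target at a pseudo-first-order rate $k_{\mathrm{on,max}}$ (fixed protein concentration);
- each encounter ends in specific binding with probability $p_{\mathrm{tot}}$;
- a bound protein returns to the testing state at rate $k_{\mathrm{off,m}}$.

All function names and parameter values are hypothetical.

```python
import random


def time_to_bind(rng, k_on_max, p_tot):
    """Waiting time for a free protein to become specifically bound.

    Target encounters arrive at rate k_on_max (treated here as a
    pseudo-first-order rate); each one succeeds with probability p_tot,
    otherwise the protein returns to solution and searches again.
    """
    t = 0.0
    while True:
        t += rng.expovariate(k_on_max)
        if rng.random() < p_tot:
            return t


def time_to_escape(rng, k_off_m, p_tot):
    """Waiting time for a specifically bound protein to reach solution.

    Each microscopic dissociation (rate k_off_m) leads to the testing
    state, where the protein rebinds with probability p_tot or escapes
    with probability 1 - p_tot.
    """
    t = 0.0
    while True:
        t += rng.expovariate(k_off_m)
        if rng.random() >= p_tot:
            return t


def macroscopic_rates(k_on_max, k_off_m, p_tot, n=20000, seed=1):
    """Estimate (k_a, k_d) as inverse mean first-passage times."""
    rng = random.Random(seed)
    mean_on = sum(time_to_bind(rng, k_on_max, p_tot) for _ in range(n)) / n
    mean_off = sum(time_to_escape(rng, k_off_m, p_tot) for _ in range(n)) / n
    return 1.0 / mean_on, 1.0 / mean_off


K_ON_MAX, K_OFF_M = 10.0, 2.0  # hypothetical values, arbitrary time units
for p_tot in (0.2, 0.5, 0.8):
    k_a, k_d = macroscopic_rates(K_ON_MAX, K_OFF_M, p_tot)
    k_a_eq1 = K_ON_MAX - (K_ON_MAX / K_OFF_M) * k_d  # Eq. 1 prediction
    print(f"p_tot={p_tot}: k_a={k_a:.2f}, k_d={k_d:.3f}, Eq. 1 -> {k_a_eq1:.2f}")
```

Under these assumptions the estimates approach $k_a = k_{\mathrm{on,max}}\,p_{\mathrm{tot}}$ and $k_d = k_{\mathrm{off,m}}(1 - p_{\mathrm{tot}})$. Eliminating $p_{\mathrm{tot}}$ between the two gives Eq. 1.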
Notably, Eq. 1 makes it possible to access the microscopic parameters $k_{\mathrm{off,m}}$ and $p_{\mathrm{tot}}$ from macroscopically measurable parameters, such as $k_a$ and $k_d$. In Fig. 1D, we show predictions for the distributions of $k_a$ and $k_d$ that would be observed experimentally for different binding sequences when $k_{\mathrm{off,m}}$ and $p_{\mathrm{tot}}$ are varied in different ways (see also materials and methods). Three scenarios that yield the same range of $K_D = k_d/k_a$ are simulated by (i) varying mainly $k_{\mathrm{off,m}}$, (ii) varying $k_{\mathrm{off,m}}$ and $p_{\mathrm{tot}}$ to the same extent, or (iii) varying mainly $p_{\mathrm{tot}}$. All three scenarios give distinct distributions in $(k_a, k_d)$-space (Fig. 1D). The linear anticorrelation between $k_a$ and $k_d$ is observed only in the scenario where $p_{\mathrm{tot}}$ varies to a larger extent than $k_{\mathrm{off,m}}$.
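The three scenarios of Fig. 1D can be mimicked by sampling microscopic parameters for many hypothetical binding sites. Each site is mapped to macroscopic rates with $k_a = k_{\mathrm{on,max}}\,p_{\mathrm{tot}}$ and $k_d = k_{\mathrm{off,m}}(1 - p_{\mathrm{tot}})$, the simplified three-state relations that combine into Eq. 1. In the Python sketch below, several choices are illustrative assumptions: the sampling ranges, the uniform and log-uniform distributions, and the shared $k_{\mathrm{on,max}}$. Unlike the paper's simulations, the ranges are not tuned to give identical $K_D$ spreads. The last lines also invert Eq. 1 for the scenario in which $k_{\mathrm{off,m}}$ barely varies. There, a straight-line fit of $k_a$ against $k_d$ has intercept $k_{\mathrm{on,max}}$ and slope $-k_{\mathrm{on,max}}/k_{\mathrm{off,m}}$.

```python
import math
import random

K_ON_MAX = 10.0  # hypothetical, identical for all sequences


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def sample_sites(p_range, log10_koff_range, n=2000, seed=0):
    """Draw hypothetical sites and return their (k_a, k_d) lists."""
    rng = random.Random(seed)
    ka, kd = [], []
    for _ in range(n):
        p_tot = rng.uniform(*p_range)                   # uniform p_tot
        k_off_m = 10 ** rng.uniform(*log10_koff_range)  # log-uniform k_off,m
        ka.append(K_ON_MAX * p_tot)
        kd.append(k_off_m * (1.0 - p_tot))
    return ka, kd


# (p_tot range, log10 k_off,m range) per scenario; both are assumptions.
SCENARIOS = {
    "(i) mainly k_off,m": ((0.45, 0.55), (-1.0, 1.0)),
    "(ii) both vary": ((0.02, 0.98), (-1.0, 1.0)),
    "(iii) mainly p_tot": ((0.02, 0.98), (0.28, 0.32)),
}
for name, (p_range, koff_range) in SCENARIOS.items():
    ka, kd = sample_sites(p_range, koff_range)
    print(f"{name}: r(k_a, k_d) = {pearson(ka, kd):+.2f}")

# Invert Eq. 1 for scenario (iii), where k_off,m is nearly shared.
ka, kd = sample_sites(*SCENARIOS["(iii) mainly p_tot"])
slope, intercept = fit_line(kd, ka)
print(f"k_on,max ~ {intercept:.2f}, k_off,m ~ {-intercept / slope:.2f}")
```

With these assumed ranges the correlation should be close to −1 only in scenario (iii), where the points fall near the single line given by Eq. 1. When $k_{\mathrm{off,m}}$ dominates, $k_a$ barely changes while $k_d$ spreads widely.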
To experimentally test whether there is anticorrelation between association and dissociation rates, we measured the kinetics by which a prototypical DNA-binding protein, the lac repressor (LacI), binds to different operator sequences. To directly compare the rates for the association to and dissociation from different operators under identical experimental conditions, we used a protein binding microarray (PBM) (10) with 2479 different operator sequences. These are mutated versions of the natural O1 and O2 as well as the artificially strong Osym sequences. PBMs are normally used to study equilibrium binding, but by mounting the array in a flow cell on the microscope (fig. S1A), we were able to monitor the binding and unbinding kinetics of fluorescent LacI-Cy3 in real time (Fig. 2, A and B). The Cy3 label, at a site distal to the DNA binding domain, has previously been shown to affect neither the specific nor the nonspecific DNA binding (8) (labeling efficiency: 84.5%; see also supplementary text and table S1). Because it is impossible to measure a dissociation rate when the fluorescence signal at equilibrium is not substantially higher than the background, measurements for individual weak target sequences showed poor reproducibility in repeat experiments (gray points in Fig. 2, C and D). In the remainder of our analysis we therefore focused on operators where the fluorescence signal at equilibrium was >3% of the signal for Osym. For these operators the measurements of both association and dissociation rates were reproducible in repeat experiments (cyan points in Fig. 2C; see also fig. S1B). Moreover, equilibrium dissociation constants estimated kinetically as $K_D = k_d/k_a$ show excellent agreement with $K_D$ estimated from the fluorescence values at equilibrium (see methods; fig. S1C). In a plot of the association versus dissociation rates for all operators, we observed an anticorrelation (Fig. 2D). This implies that the microscopic binding probability $p_{\mathrm{tot}}$ differs between operators. To quantify the relative importance of $p_{\mathrm{tot}}$ and $k_{\mathrm{off,m}}$ for binding strength, we computed which range of $p_{\mathrm{tot}}$ and $k_{\mathrm{off,m}}$ values would give rise to the observed spread in $(k_a, k_d)$-space. According to this analysis, the ratio of variation in $p_{\mathrm{tot}}$ to

442 28 JANUARY 2022 • VOL 375 ISSUE 6579 science.org SCIENCE


^1Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Box 596, 751 24 Uppsala, Sweden. ^2Drug Discovery and Development Platform, Science for Life Laboratory, Department of Chemistry, BMC, Uppsala University, Box 576, 751 23 Uppsala, Sweden.
*Corresponding author. Email: [email protected] (S.D.);
[email protected] (J.E.)
†These authors contributed equally to this work.
RESEARCH | REPORTS
