1004   4 MARCH 2022 • VOL 375 ISSUE 6584   science.org SCIENCE
Fig. 4. Transcriptional neighborhood predicts transcript isoform expression levels and lengths.
(A) Averaged feature importance scores for models predicting TSS or TES positioning or expression level changes (Δexpn level) for genes in the SCRaMbLE strains, learned using Gradient Boosted Regression Trees (GBRT). Stacked bars show the fractional contribution of sequence features and transcriptional features (transcriptional similarity on either strand, expression level fold change, and distance to the nearest isoform) in the 5′ and 3′ neighborhoods (within 3 kb) for each prediction. The importance of all 5′ and 3′ features sums to one for each prediction task. (B) Performance of models predicting TSS or TES positioning or Δexpn level trained using genomic features only (sequence), features related to the transcriptional neighborhood only (transcription), or all features (full). Bars indicate 95% CI across all models. MSE, mean squared error. (C) Observed versus predicted (from −SCRaMbLE) flanking transcriptional similarities for rearranged segments and their correlation (Pearson's correlation coefficient, r). Areas of greater density are darker. Because transcript isoform coverage vectors on both strands were used, cosine similarity ranges from −1 to 1.
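The last sentence of the caption can be illustrated with a small sketch. The caption does not say how the two strands are combined into one vector, so the encoding below is an assumption made only for illustration: plus-strand coverage is counted as positive and minus-strand coverage as negative at each position. The function names `signed_coverage` and `cosine_similarity` and the toy coverage values are hypothetical and not taken from the paper.

```python
import math

def signed_coverage(plus, minus):
    """Merge per-position coverage on both strands into one signed vector.

    Assumed encoding (not stated in the caption): plus-strand counts are
    positive and minus-strand counts are negative.
    """
    return [p - m for p, m in zip(plus, minus)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: no coverage gives 0
    return dot / (norm_u * norm_v)

# Toy neighborhoods (hypothetical coverage values over four positions).
reference = signed_coverage([5, 5, 0, 0], [0, 0, 0, 0])
same_strand = signed_coverage([10, 10, 0, 0], [0, 0, 0, 0])
opposite_strand = signed_coverage([0, 0, 0, 0], [5, 5, 0, 0])

print(cosine_similarity(reference, same_strand))      # close to 1
print(cosine_similarity(reference, opposite_strand))  # close to -1
```

With non-negative coverage on a single strand, the similarity could not fall below 0. Under the signed encoding assumed here, transcription of the same region on the opposite strand yields a similarity near −1, which is consistent with the −1 to 1 range noted in the caption.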
[Fig. 4 graphic. Panel A: feature importance (0 to 0.5) in the 5′ and 3′ neighborhoods for the TSS, TES, and expn level predictions; legend: sequence, −strand similarity, +strand similarity, fold change, isoform distance. Panel B: performance (1/MSE) of the sequence, transcription, and full models for TSS, TES, and expn level. Panel C: observed similarity (+SCRaMbLE) versus predicted similarity (−SCRaMbLE) for the 5′ and 3′ neighborhoods, both axes from −1 to 1; annotations: r = 0.7, p = 3.3e-26 and r = 0.78, p = 2.7e-12.]
Fig. 5. Neighboring gene expression regulates and
can be used to engineer 3′ UTR lengths. (A) 3′ UTR
lengths of convergent genes binned by 100-bp
increments of intergenic distance in the WT genome.
(B) Change in 3′ UTR lengths of convergent gene pairs
plotted by increased (100-bp increments) intergenic
distance after SCRaMbLE. (C) Expression fold changes
of genes convergent to those with minor (<100 nt) or
major (≥100 nt) 3′ UTR extensions after rearrangement.
(D) Length of overlap (nt) between novel convergent
transcripts where the downstream member is
expressed at a low (≤50 TPM) or high (≥150 TPM) level.
(E) Distribution of YLR082C TESs (relative to its
CDS) when the convergent gene is overexpressed (Gal,
black) or not (Raf, gray). (F) Fraction of genes in
convergent and tandem pairs with significantly altered
TES positions (Kolmogorov-Smirnov test, P ≤ 0.001,
applied to each gene) when an adjacent gene is
overexpressed by a factor of ≥20 (hatched) or not
(white) after galactose-induced transcription factor (TF)
overexpression. Dashed lines indicate the fraction of
randomly selected genes with significantly altered TESs
in galactose. Numbers of genes tested are indicated
above the bars. (G) Change in 3′ UTR length distributions
for convergent, tandem, and random gene pairs upon
TF overexpression in galactose as assessed by the
change in the area under the curve (Δauc) of TES
cumulative distributions. Negative values indicate
isoform shortening. (H) cDNA sequencing reads aligned to
YIR018W (above) and YIR018C-A (below) in a
tetracycline-repressible YIR018C-A strain in the absence (gray) and
presence (green) of doxycycline. (I) Change in YIR018C-A
expression (left), plotted as mean ± SD, and YIR018W 3′
UTR length (right) upon doxycycline-induced inhibition
of YIR018C-A expression. (J) The ability to control 3′ UTR
length by altering convergent gene expression levels could
be applied to embed a reversibly expressed, functional sequence tag in transcript 3′ UTRs. Only adjacent bins were tested for significance in (A) and (B). Boxplots indicate
median and IQR, and whiskers extend to the minimum and maximum values within 1.5 times the IQR. Notches indicate 95% CIs. Asterisks denote significance levels
in the Mann-Whitney U test: *P ≤ 0.05, **P ≤ 1e-2, ****P ≤ 1e-4.
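The per-gene comparison in (F) relies on a two-sample Kolmogorov-Smirnov test on TES positions. As a minimal sketch of what such a test measures, the code below computes the KS statistic (the largest gap between two empirical cumulative distributions) and a large-sample approximation of the two-sided P value. The caption does not specify the authors' implementation; the function names and the TES distances are hypothetical, and an exact or library-provided test may be preferable for small samples.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Large-sample two-sided P value from the Kolmogorov series.

    This is an approximation; it is not claimed to match the
    implementation used for the figure.
    """
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.05:
        return 1.0  # the series converges slowly here; P is effectively 1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical TES distances from the CDS (nt) for one gene.
raf = [200 + 3 * i for i in range(50)]  # neighbor not induced
gal = [100 + 3 * i for i in range(50)]  # neighbor induced, TESs 100 nt closer

d = ks_statistic(raf, gal)
p = ks_pvalue_asymptotic(d, len(raf), len(gal))
print(d, p, p <= 0.001)  # the last value applies the caption's threshold
```

Because the test compares whole distributions rather than means, it can flag a shift in TES usage even when only a subset of isoforms changes length.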
[Fig. 5 graphic. Panel A: 3′ UTR length (nt) versus intergenic distance (nt). Panel B: change in 3′ UTR length (nt) versus change in intergenic distance (nt), −SCRaMbLE to +SCRaMbLE. Panel C: expression fold change of the 3′ convergent gene for 3′ UTR extensions of <100 nt or ≥100 nt. Panel D: convergent transcript overlap (nt) by 3′ expression (TPM; 50, 150). Panel E: density of YLR082C TES distance from CDS (nt), induced (Gal) and uninduced (Raf). Panel F: fraction of genes with significant TES changes for convergent and tandem pairs, grouped by neighbor fold change (<20 or ≥20), with # tested above the bars. Panel G: Δauc of TES cumulative distributions for convergent, tandem, and random pairs; negative values correspond to shorter isoforms. Panel H: # reads along YIR018W and YIR018C-A (tetOFFp), + dox and − dox, versus distance (kb). Panel I: log2 fold change and TES distance (nt), +dox versus −dox. Panel J: schematic of a custom tag switched off or on by convergent expression.]
RESEARCH | RESEARCH ARTICLES