Science - USA (2022-03-04)

1004    4 MARCH 2022 • VOL 375 ISSUE 6584    science.org    SCIENCE


Fig. 4. Transcriptional neighborhood predicts transcript isoform expression levels and lengths.
(A) Averaged feature importance scores for models predicting TSS or TES positioning or expression
level changes (Δexpn level) for genes in the SCRaMbLE strains, learned using Gradient Boosted
Regression Trees (GBRT). Stacked bars show the fractional contribution of sequence features and
transcriptional features (transcriptional similarity on either strand, expression level fold change,
and distance to the nearest isoform) in the 5′ and 3′ neighborhoods (within 3 kb) for each
prediction. The importance of all 5′ and 3′ features sums to one for each prediction task.
(B) Performance of models predicting TSS or TES positioning or Δexpn level trained using genomic
features only (sequence), features related to the transcriptional neighborhood only (transcription),
or all features (full). Bars indicate 95% CI across all models. MSE, mean squared error.
(C) Observed versus predicted (from −SCRaMbLE) flanking transcriptional similarities for rearranged
segments and their correlation (Pearson’s correlation coefficient, r). Areas of greater density are
darker. Because transcript isoform coverage vectors on both strands were used, cosine similarity
ranges from −1 to 1.
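
The −1 to 1 range noted for (C) can be illustrated with a minimal sketch. It assumes one possible encoding, which is not necessarily the one the authors used: at each position, plus-strand coverage counts as positive and minus-strand coverage as negative, so a neighborhood transcribed on the opposite strand scores close to −1. The function names and the toy coverage values are hypothetical.

```python
import math


def signed_coverage(plus, minus):
    """Merge per-position coverage into one vector.

    Assumed encoding (not confirmed by the source): plus-strand coverage is
    positive and minus-strand coverage is negative at each position.
    """
    return [p - m for p, m in zip(plus, minus)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length coverage vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no coverage on either strand: similarity is undefined, report 0
    return dot / (norm_u * norm_v)


# Toy neighborhoods (hypothetical read coverage at four positions).
before = signed_coverage([5, 5, 0, 0], [0, 0, 0, 0])    # plus-strand transcript
scaled = signed_coverage([10, 10, 0, 0], [0, 0, 0, 0])  # same shape, higher expression
flipped = signed_coverage([0, 0, 0, 0], [5, 5, 0, 0])   # same shape, opposite strand

print(cosine_similarity(before, scaled))   # close to 1
print(cosine_similarity(before, flipped))  # close to -1
```

Under this encoding, coverage on both strands at the same position partly cancels, which is a limitation of the sketch rather than a statement about the published analysis.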


[Fig. 4 graphic, panels A to C. Recoverable labels: (A) feature importance (0 to 0.5) in the 5′ and 3′ neighborhoods for the TSS, TES, and expn level predictions; legend: sequence, −strand similarity, +strand similarity, fold change, isoform distance. (B) performance (1/MSE) of the sequence, transcription, and full models for TSS, TES, and expn level. (C) observed similarity (+SCRaMbLE) versus predicted similarity (−SCRaMbLE) for the 5′ and 3′ neighborhoods, both axes from −1 to 1; reported correlations: r = 0.7 (p = 3.3e-26) and r = 0.78 (p = 2.7e-12).]

Fig. 5. Neighboring gene expression regulates and can be used to engineer 3′ UTR lengths.
(A) 3′ UTR lengths of convergent genes binned by 100-bp increments of intergenic distance in the
WT genome. (B) Change in 3′ UTR lengths of convergent gene pairs plotted by increased (100-bp
increments) intergenic distance after SCRaMbLE. (C) Expression fold changes of genes convergent to
those with minor (<100 nt) or major (≥100 nt) 3′ UTR extensions after rearrangement. (D) Length of
overlap (nt) between novel convergent transcripts where the downstream member is expressed at a
low (≤50 TPM) or high (≤150 TPM) level. (E) Distribution of YLR082C TESs (relative to its CDS) when
the convergent gene is overexpressed (Gal, black) or not (Raf, gray). (F) Fraction of genes in
convergent and tandem pairs with significantly altered TES positions (Kolmogorov-Smirnov test,
P ≤ 0.001, applied to each gene) when an adjacent gene is overexpressed by a factor of ≥20 (hatched)
or not (white) after galactose-induced transcription factor (TF) overexpression. Dashed lines
indicate the fraction of randomly selected genes with significantly altered TESs in galactose.
Numbers of genes tested are indicated above the bars. (G) Change in 3′ UTR length distributions for
convergent, tandem, and random gene pairs upon TF overexpression in galactose as assessed by the
change in the area under the curve (Δauc) of TES cumulative distributions. Negative values indicate
isoform shortening. (H) cDNA sequencing reads aligned to YIR018W (above) and YIR018C-A (below) in a
tetracycline-repressible YIR018C-A strain in the absence (gray) and presence (green) of doxycycline.
(I) Change in YIR018C-A expression (left), plotted as mean ± SD, and YIR018W 3′ UTR length (right)
upon doxycycline-induced inhibition of YIR018C-A expression. (J) The ability to control 3′ UTR
length by altering convergent gene expression levels could be applied to embed a reversibly
expressed, functional sequence tag in transcript 3′ UTRs. Only adjacent bins were tested for
significance in (A) and (B). Boxplots indicate median and IQR, and whiskers extend to the minimum
and maximum values within 1.5 times the IQR. Notches indicate 95% CIs. Asterisks denote significance
levels in the Mann-Whitney U test: *P ≤ 0.05, **P ≤ 1e-2, ****P ≤ 1e-4.
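
The per-gene comparison in (F) can be sketched as a two-sample Kolmogorov-Smirnov test on TES positions from induced and uninduced cells. The sketch below is illustrative only: it uses the standard two-sample statistic with the asymptotic Kolmogorov series for the P value, and the TES positions are invented toy values, not data from the paper. The function names are hypothetical; in practice a library routine such as `scipy.stats.ks_2samp` would likely be used instead.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / n  # fraction of sample_a at or below x
        cdf_b = bisect_right(b, x) / m  # fraction of sample_b at or below x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic two-sided P value from the Kolmogorov series.

    This is a large-sample approximation and is rough for small samples.
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


# Toy TES positions relative to the CDS end, in nt (hypothetical values).
tes_raf = [100, 110, 120, 130, 140]  # uninduced
tes_gal = [300, 310, 320, 330, 340]  # neighbor overexpressed

d = ks_statistic(tes_raf, tes_gal)
p = ks_pvalue_asymptotic(d, len(tes_raf), len(tes_gal))
print(d, p, p <= 0.001)  # threshold taken from the caption
```

With only five positions per condition, even fully separated distributions do not reach P ≤ 0.001 under this approximation, which shows why the test needs many reads per gene.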


[Fig. 5 graphic, panels A to J. Recoverable labels: (A) 3′-UTR length (nt) versus intergenic distance (nt), 100 to 500. (B) change in 3′-UTR length (nt) versus change in intergenic distance (nt), −SCRaMbLE and +SCRaMbLE. (C) expression fold change for 3′-UTR extensions of <100 or ≥100 nt. (D) convergent transcript overlap (nt) versus 3′ expression level (50 or 150 TPM). (E) YLR082C: density versus TES distance from CDS (nt), induced (Gal) and uninduced (Raf). (F) fraction of genes with significant TES changes in convergent and tandem pairs, by fold change (<20 or ≥20), with the number tested above each bar. (G) convergent, tandem, and random pairs on an axis from −0.1 to 0.1, with the direction of shorter isoforms marked. (H) number of reads with and without dox across YIR018W, YIR018C-A, and tetOFFp; distance (kb). (I) log2 fold change and TES distance (nt), +dox versus −dox. (J) schematic of a custom tag switched on or off by expression.]

RESEARCH | RESEARCH ARTICLES
