RESEARCH ARTICLE
INTEGRATIVE OMICS
An atlas of the protein-coding genes in the human,
pig, and mouse brain
Evelina Sjöstedt^1,2, Wen Zhong^2, Linn Fagerberg^2, Max Karlsson^2, Nicholas Mitsios^1, Csaba Adori^1,
Per Oksvold^2, Fredrik Edfors^2, Agnieszka Limiszewska^1, Feria Hikmet^3, Jinrong Huang^4,5,6,7,
Yutao Du^5,8, Lin Lin^4,6, Zhanying Dong^4,5,6, Ling Yang^4,5,6, Xin Liu^5,8, Hui Jiang^9, Xun Xu^5,8,
Jian Wang^5,8, Huanming Yang^5,8, Lars Bolund^4,5,6, Adil Mardinoglu^2, Cheng Zhang^2,
Kalle von Feilitzen^2, Cecilia Lindskog^3, Fredrik Pontén^3, Yonglun Luo^4,5,6, Tomas Hökfelt^1,
Mathias Uhlén^1,2†, Jan Mulder^1†
The brain, with its diverse physiology and intricate cellular organization, is the most complex organ of
the mammalian body. To expand our basic understanding of the neurobiology of the brain and its
diseases, we performed a comprehensive molecular dissection of 10 major brain regions and multiple
subregions using a variety of transcriptomics methods and antibody-based mapping. This analysis was
carried out in the human, pig, and mouse brain to allow the identification of regional expression profiles,
as well as to study similarities and differences in expression levels between the three species. The
resulting data have been made available in an open-access Brain Atlas resource, part of the Human
Protein Atlas, to allow exploration and comparison of the expression of individual protein-coding genes in
various parts of the mammalian brain.
The brain is an extraordinarily complex
organ owing to its diverse physiology,
complex cellular organization, and abun-
dance of expressed genes. Identifying the
molecular organization of the brain at
regional, cellular, and subcellular levels will
advance our understanding of its function
under normal and diseased conditions. The
Human Protein Atlas (HPA) program aims to
combine antibody-based profiling with genome-
wide transcriptomics analysis to explore the
spatial expression levels of transcripts and
proteins across cells, tissues, and organs ( 1 ).
The Tissue Atlas ( 1 , 2 )—a subsection of the
HPA—includes only a limited number of hu-
man brain regions (the cerebral cortex, hip-
pocampus, caudate nucleus, and cerebellum).
Here, we describe genome-wide expression
profiles for the protein-coding genes in 10
major well-defined mammalian brain regions
to capture the complexity of the cellular or-
ganization. To identify differences and sim-
ilarities of the brain in different phylogenetic
orders, the expression profiles have been analyzed
in three species: Primates (human),
Cetartiodactyla (pig), and Rodentia (mouse).
The effort described here is complementary
to several brain mapping projects focused on
basic organization and regional or cellular gene
expression of the mammalian brain. The Allen
Institute for Brain Science (https://alleninstitute.org)
hosts several knowledge resources, including
an in situ hybridization atlas of the
adult ( 3 ) and developing ( 4 ) mouse brain; and
a microarray-based atlas of the adult human
brain ( 5 ) has been complemented with a map
of the human brain during development ( 6 ).
More recently, brain atlas strategies have been
launched on the basis of different approaches:
fluorescence-activated cell sorting in mouse
( 7 ), antibody-based cell sorting in human ( 8 ),
single-cell gene expression in mouse ( 9 ) and
human ( 10 , 11 ), and covariation analysis of
transcriptomics expression ( 12 ). These efforts
have been further complemented with several
large-scale mapping programs, including the
National Institutes of Health (NIH) BRAIN
Initiative Cell Census Network ( 13 ), the Euro-
pean Human Brain Project ( 14 ), the NIH Hu-
man BioMolecular Atlas Program ( 15 ), and the
Human Cell Atlas project ( 16 ).
Here, we present the HPA Brain Atlas ( 17 ),
where the data collected have been used for
cell topological analysis, systems modeling,
and data integration, with the aim to create a
knowledge resource of messenger RNA and pro-
tein expression in the mammalian brain. We
complement the transcriptomics with antibody-
based protein profiles of selected proteins in
multiple regions of the mouse brain. In this
open-access resource, transcriptomics data from
three external sources—the Genotype-Tissue
Expression (GTEx) portal ( 18 ), the Function-
al Annotation of Mammalian Genomes 5
(FANTOM5) project ( 19 ), and the Allen Mouse
Brain Atlas ( 3 )—are presented together with
RNA profiles and protein stainings generated
“in-house.” The classification of all protein-coding
genes with regard to brain regional
specificity is reported, and this is integrated
with the tissue and organ specificity across the
human body.
Transcriptomics analysis of the human brain
Transcriptomics analysis was performed on
anatomically dissected human, pig, and mouse
brain regions (Fig. 1A and figs. S1 to S3). For
the human brain, we integrated publicly available
RNA sequencing (RNA-seq) data generated
by the GTEx consortium ( 18 ) and cap
analysis of gene expression (CAGE) data from
the FANTOM consortium ( 19 ), with data from
the HPA ( 1 ), for a total of 1710 samples from
selected human brain regions (table S1). The
combined dataset contains 23 human brain
regions, including white matter (corpus callo-
sum) and spinal cord, as outlined in Fig. 1B.
Several issues complicate the combining of
datasets. First, samples may not be homoge-
neous, especially for regions with a high level
of cellular heterogeneity, such as the hypo-
thalamus, midbrain, pons, and medulla oblong-
ata. Furthermore, both HPA and GTEx data are
based on RNA-seq protocols using polyadenyl-
ate [poly(A)] tail enrichment, whereas CAGE
data are based on the selection and sequencing
of the 5′ cap. As a result, transcripts lacking a
poly(A) tail, such as canonical histone mRNAs,
are only detected by CAGE. Despite these complications,
the large number of included samples
and our gene classification approach enable
us to generate a comprehensive overview of bio-
logically relevant gene expression and regional
and species variation.
We used normalization strategies to avoid
batch effects caused by sampling, technology
platforms, and differences in transcriptome
size between different types of tissues and
also to allow both within-sample and between-
sample comparisons ( 20 , 21 ). The within-sample
normalization was based on protein-coding
transcripts per million (pTPM), while the
between-sample normalization was based on
trimmed means of M values (TMM) ( 22 ), Pareto
scaling per gene ( 23 ), and limma ( 24 ), resulting
in normalized expression (NX) values calculated
for all genes across all tissue types, as out-
lined in Fig. 1C and described in detail in the
supplementary information (figs. S4 to S6).
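As an illustration only, two of the steps named above can be sketched in a few lines of Python. The sketch assumes that pTPM means rescaling TPM values so that the protein-coding genes in each sample sum to one million, and it uses the textbook form of Pareto scaling (mean-center each gene, then divide by the square root of its standard deviation). Whether the authors center the data, and the exact order and parameters of the steps, are not stated in this section. The TMM and limma steps are omitted because they are provided by the R packages edgeR and limma. The function names, the toy matrix, and the log2 transform are hypothetical choices made for this example.

```python
import numpy as np

def ptpm(tpm, is_protein_coding):
    """Within-sample step (assumed definition): keep protein-coding genes
    and rescale each sample (column) so that they sum to 1e6."""
    tpm = np.asarray(tpm, dtype=float)
    mask = np.asarray(is_protein_coding, dtype=bool)
    coding = tpm[mask, :]
    totals = coding.sum(axis=0, keepdims=True)
    return coding / totals * 1e6

def pareto_scale(expr):
    """Textbook Pareto scaling per gene (row): mean-center, then divide
    by the square root of the sample standard deviation."""
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    # Genes with zero variance are left at zero instead of dividing by zero.
    scale = np.sqrt(np.where(sd > 0, sd, 1.0))
    return centered / scale

# Toy matrix: 4 genes (rows) x 3 samples (columns); the last gene is
# treated as non-coding and is dropped by ptpm().
tpm = np.array([[10.0, 20.0, 30.0],
                [30.0, 20.0, 10.0],
                [60.0, 60.0, 60.0],
                [900.0, 100.0, 500.0]])
coding_mask = [True, True, True, False]

p = ptpm(tpm, coding_mask)
# The log2 transform is an illustrative choice, not taken from the text.
scaled = pareto_scale(np.log2(p + 1))
```

Compared with unit-variance scaling, Pareto scaling shrinks the influence of highly variable genes less aggressively, which keeps more of the original data structure.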
The uniform manifold approximation and
projection (UMAP) (fig. S7) of all 1710 human
brain samples shows the expected global ex-
pression patterns after normalization. Devel-
opmentally related anatomical regions cluster
together, with the cerebellum being an outlier
Sjöstedt et al., Science 367, eaay5947 (2020) 6 March 2020 1 of 16
^1 Department of Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden.
^2 Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden.
^3 Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden.
^4 Lars Bolund Institute of Regenerative Medicine, BGI-Qingdao, Qingdao 266555, China.
^5 BGI-Shenzhen, Shenzhen 518083, China.
^6 Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark.
^7 Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark.
^8 China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China.
^9 MGI, BGI-Shenzhen, Shenzhen 518083, China.
*These authors contributed equally to this work.
†Corresponding author. Email: [email protected]
(M.U.); [email protected] (J.M.)