Computational Drug Discovery and Design

of the fragile equilibrium between protein solubility and aggregation by energetically favoring the native conformations of targeted proteins and degrading sticky misfolded species [8–10]. Unfortunately, under certain conditions, such as cellular stress, aging, downregulation of the proteostasis network and/or specific genetic mutations, some proteins manage to escape quality control and consequently aggregate, compromising cell fitness [1, 11, 12]. It is therefore not surprising that protein aggregation is closely linked to the onset of more than 40 severe human disorders, including the devastating neurodegenerative Alzheimer’s and Parkinson’s diseases [1, 13–16]. In addition, protein
aggregation represents one of the major constraints on the pharmaceutical and biotechnological manufacturing of protein-based therapies. During synthesis, purification or storage, proteins can aggregate into visible or subvisible particles with significant immunogenicity [17–21]. Thus, aggregation not only reduces the amount of active molecules available in the formulation but can also convert a beneficial drug into a deadly agent. Last but not least, the
development of new bioinspired nanomaterials with self-assembling features is significantly constrained by our present understanding of the molecular mechanisms behind protein aggregation and of the mechanical and chemical properties of these insoluble structures [22–26].
One classical strategy to overcome aggregation has been the rational redesign of vulnerable protein regions by protein engineering [27–30]. However, the subsequent expression and purification of the recombinant proteins and the experimental assessment of their solubility are time-consuming and preclude the high-throughput analysis of a significant number of variants. These limitations drove the development of a series of predictive algorithms that anticipate the aggregation propensity of protein sequences and allow virtual screening for solubilizing mutations. To date, more than twenty first-generation prediction algorithms are available as online servers or
software packages. They are known as linear predictors because they take the amino acid sequence of the protein as input. Their generic architecture involves a window of variable length that slides along the sequence and averages, using different functions, the theoretical aggregation propensities of the residues enclosed in the window. Despite this shared pipeline, each predictor takes different variables into account when assigning theoretical aggregation values to a given amino acid; these include experimentally obtained data, theoretical physicochemical
properties, or a combination of both. Nonetheless, although they perform well on disordered polypeptides or unfolded protein regions, they tend to overestimate the aggregation propensity of globular, compact proteins [31]. These regions usually emerge from the spatial approximation

428 Jordi Pujols et al.
