A Deep Neural Network (DNN) is a type of artificial neural network that contains multiple hidden layers between the input and
output layers. The layers in a DNN form a hierarchy of abstraction, with each layer identifying
increasingly complex and abstract features. In effect, each layer builds upon those that came before it.
Neural networks can be viewed as “universal approximators”—they are flexible enough to learn how to
approximate a wide range of functions. Their hidden layers are a “black box” of sorts—we cannot easily
associate a given neuron/node with a given feature. A neural network must be trained to fit the
particular function we wish to approximate. A key part of this process involves iteratively refining
weights—numerical values that encapsulate the strength/importance of each connection between two
nodes/neurons. This is analogous to the concept of learning in human cognition, wherein links between neurons
become stronger with practice, reinforcing important pathways.
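As a minimal illustration of this weight-refinement process (not AlphaFold's training code, and with purely invented data), a single linear "neuron" with two input connections can be fit to a simple target function by gradient descent:

```python
import numpy as np

# Minimal sketch of iterative weight refinement: a single linear "neuron" with
# two input connections is fit by gradient descent to an invented target
# function (y = x1 + x2). All names and values are illustrative.
rng = np.random.default_rng(0)
weights = rng.normal(size=2)           # strength of each input connection
inputs = rng.normal(size=(100, 2))     # invented training examples
targets = inputs.sum(axis=1)           # the function we want to approximate

learning_rate = 0.1
for step in range(200):
    predictions = inputs @ weights                  # forward pass
    errors = predictions - targets                  # how far off we are
    gradient = inputs.T @ errors / len(inputs)      # direction to adjust each weight
    weights -= learning_rate * gradient             # reinforce useful connections

print(weights)  # converges toward [1, 1], the weights of the true function
```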
Deep learning is the key to
AlphaFold's success. In the case of proteins, if we can extract distinctive features from the comparison
of a sequence with an unknown structure against a database of sequences with known structures, we can make
inferences about the unknown structure.
Now that we have a broad overview of deep learning, let us look at AlphaFold 2 more concretely. AlphaFold's architecture can be roughly divided into three main stages:
In the first part of the process, a multiple-sequence alignment (MSA) is performed on the input sequence. MSA
compares the input sequence with similar sequences in a genetic database. We can use information from similar
sequences to determine coevolutionary relationships, from which we can draw inferences about the spatial
proximity of coevolved amino acids (AAs). If two AA positions mutated together, they are likely close to each
other in the folded structure, since a mutation at one position must often be compensated by a mutation at the
other to preserve the overall structure.
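As a toy illustration of this coevolution signal (not the statistical model AlphaFold actually uses), the mutual information between two columns of a small, invented alignment shows how positions that mutate together stand out:

```python
from collections import Counter
from math import log2

# Toy illustration of the coevolution idea: columns of an MSA that mutate
# together hint at residues that are in contact. The tiny alignment below is
# invented purely for demonstration.
msa = [
    "ACDKL",
    "ACEKL",
    "GCDRL",
    "GCERL",
]

def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns."""
    n = len(msa)
    p_i = Counter(seq[col_i] for seq in msa)
    p_j = Counter(seq[col_j] for seq in msa)
    p_ij = Counter((seq[col_i], seq[col_j]) for seq in msa)
    return sum(
        (c / n) * log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

# Columns 0 and 3 always change together (A with K, G with R), a coevolution
# signal, while the fully conserved column 1 carries no covariation signal.
print(mutual_information(0, 3))  # 1.0 bit
print(mutual_information(0, 1))  # 0.0 bits
```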
The first stage also includes a template search, which produces an initial pair representation. The
template search identifies portions of candidate known
structures that may be shared with the input sequence. This “pair representation” is a coordinate-independent
way of representing the structure of a protein—it does not specify the position of each amino acid in 3D
space. Rather, it encodes the relationships between each pair of amino acids.
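As a rough sketch of what such a coordinate-independent representation might look like (the distances, bin edges, and shapes here are illustrative, not AlphaFold's internal encoding), one can picture an N x N x C array whose (i, j) entry describes the relationship between residues i and j:

```python
import numpy as np

# Rough sketch of a coordinate-independent pair representation: an N x N x C
# array whose (i, j) entry describes the relationship between residues i and j
# (here a one-hot distance bin). Distances and bin edges are invented.
distances = np.array([              # hypothetical residue-residue distances (Å)
    [0.0, 3.8, 6.5, 11.0],
    [3.8, 0.0, 3.8, 7.2],
    [6.5, 3.8, 0.0, 3.8],
    [11.0, 7.2, 3.8, 0.0],
])
bin_edges = np.array([4.0, 8.0, 12.0])        # 4 bins: <4, 4-8, 8-12, >=12 Å
bins = np.digitize(distances, bin_edges)      # bin index for every residue pair
pair_representation = np.eye(len(bin_edges) + 1)[bins]   # one-hot, shape (4, 4, 4)

# Residue i, residue j, feature channel: no 3D coordinates are stored anywhere.
print(pair_representation.shape)
```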
In the second stage, both representations are refined by the Evoformer, the first
of the two neural networks in AlphaFold. At a high level, the Evoformer works by passing information between two
transformer modules, each refining its own representation based on the output of the other. A transformer
is a type of DNN that specializes in identifying relationships in sequential data (such as a protein sequence!).
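The following is a highly simplified, schematic sketch of that exchange, assuming toy shapes and plain attention rather than AlphaFold's actual Evoformer layers: the pair representation biases attention over the sequence representation, and the updated sequence representation feeds back into the pair representation.

```python
import numpy as np

# Highly simplified, schematic sketch of the Evoformer idea (toy shapes, plain
# attention; not AlphaFold's actual layers): the two representations take turns
# refining each other.
rng = np.random.default_rng(0)
n, c = 6, 8                               # residues, feature channels
seq_rep = rng.normal(size=(n, c))         # sequence/MSA representation
pair_rep = rng.normal(size=(n, n))        # pair representation (one channel here)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

for _ in range(3):
    # 1) The pair representation biases attention over the sequence representation,
    #    so residue pairs believed to interact exchange more information.
    attention = softmax(seq_rep @ seq_rep.T / np.sqrt(c) + pair_rep)
    seq_rep = attention @ seq_rep
    # 2) The updated sequence representation feeds back into the pair
    #    representation through an outer-product-style update.
    pair_rep = pair_rep + (seq_rep @ seq_rep.T) / c

print(seq_rep.shape, pair_rep.shape)      # both representations keep their shapes
```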
In the third and final stage, the second network (the structure module) generates 3D Cartesian coordinates
from the two refined representations, producing a structural model along with a confidence value.
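AlphaFold 2 reports its per-residue confidence (pLDDT, on a 0-100 scale) in the B-factor column of the predicted PDB file; the following is a minimal sketch of inspecting it, with "ranked_0.pdb" standing in for whichever output file is being examined:

```python
# Minimal sketch of inspecting AlphaFold 2's per-residue confidence. AlphaFold 2
# writes pLDDT (0-100) into the B-factor column of the predicted PDB file;
# "ranked_0.pdb" stands in for whichever output file is being examined.
plddt_by_residue = {}
with open("ranked_0.pdb") as pdb:
    for line in pdb:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            plddt_by_residue[residue_number] = float(line[60:66])

mean_plddt = sum(plddt_by_residue.values()) / len(plddt_by_residue)
print(f"Mean pLDDT over {len(plddt_by_residue)} residues: {mean_plddt:.1f}")
```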
We used AlphaFold 2 to predict the structures of five different versions of the gpT7 protein. We did not go
further and perform any biosimulation alongside it. Our goal was to view the structures, anticipate what might
happen in our Wet Lab experiments, and help explain why those experiments produced the results they did.
One critique raised during the modeling concerned the model's confidence scores. AlphaFold's algorithm and
internal structure depend on pre-existing data, so unfolded or disordered regions in a model receive lower
confidence scores. The models displayed here follow this pattern, but are color-coded by secondary structure.