Model

Robust computational modeling was a core tenet of our project. This modeling page describes our computational work in depth: the core features of our model, its assumptions and limitations, its data inputs, and the insights we gained. Developing a computational model that predicts beneficial mutations for ligand-receptor binding allowed us to shorten the wet lab work needed to generate a new protein, and it showed how in silico approaches can reduce wet lab time and resources.

How Our Project Model is Unique

One feature that distinguishes our project from other protein engineering work in the field is the method of mutagenesis. In the literature, mutations are traditionally introduced by random mutagenesis, typically through error-prone PCR. Although this approach is straightforward, it is not cost-effective and makes poor use of resources. Our project moves away from this traditional approach by using targeted mutagenesis guided by a wide range of recently developed computational tools, such as AlphaFold. By choosing mutations intentionally and specifically, we ease the burden on wet lab resources and aim to direct the evolution of the Ste2 protein toward recognizing cystatin C, a kidney disease biomarker, in the most time-efficient manner.

More specifically, our project is inspired by Adeniran et al. (2018), who used step-wise random mutagenesis with wet lab techniques to evolve Ste2, a yeast membrane receptor, to detect cystatin C.

In the figure, the original Ste2 is shown in pink. The researchers introduced mutations to evolve it into a diagnostic receptor that recognizes the biomarker. Our project combines computational tools with the wet lab techniques outlined in that article to speed up directed evolution for biological applications. Whereas the foundational article relies on random mutagenesis, we apply targeted mutagenesis as a building block for the directed evolution of disease-detection receptors. The wet lab team will validate the computationally determined mutations through a fluorescence readout.

Computational Tools Used

Structure of the Ste2/alpha-factor pheromone complex from the RCSB Protein Data Bank (Velazhahan and Tate 2021)

ColabFold (Mirdita et al.) - takes an amino acid sequence as input and outputs a predicted structure in PDB format. Based on the work of Tsaban et al. (2022), ColabFold can also be used to predict receptor-ligand docking by modeling the receptor and ligand together as a complex.
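As a minimal sketch of how a receptor-ligand complex can be prepared for ColabFold, the hypothetical helper below joins the two chains with ':', which ColabFold reads as a chain separator when predicting complexes. The function name and the placeholder sequences are illustrative assumptions, not the team's actual inputs.

```python
# Hypothetical helper: formats a two-chain query for ColabFold.
# ColabFold reads ':' inside a sequence as a break between chains,
# so the receptor and ligand are predicted together as one complex.

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def colabfold_complex_fasta(job_id: str, receptor_seq: str, ligand_seq: str) -> str:
    """Return a FASTA record describing a receptor-ligand complex."""
    chains = []
    for label, seq in (("receptor", receptor_seq), ("ligand", ligand_seq)):
        seq = seq.strip().upper()
        unknown = set(seq) - STANDARD_AA
        if not seq or unknown:
            raise ValueError(
                f"{label} sequence is empty or has non-standard residues: {sorted(unknown)}"
            )
        chains.append(seq)
    return f">{job_id}\n{':'.join(chains)}\n"


if __name__ == "__main__":
    # Placeholder sequences only; substitute the real receptor and ligand sequences.
    print(colabfold_complex_fasta("ste2_complex", "MKTAYIAK", "GSHMLE"), end="")
```

Validating the residues up front catches copy-paste errors (stray numbers or non-standard letters) before a prediction job is submitted.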

Prodigy (Vangone et al. 2015 and Xue et al. 2016) - identifies which residues interact with each other in the Ste2/cystatin C and Ste2/alpha-factor complexes and predicts the associated binding energy, a measure of how strongly the receptor and ligand bind. Our project aims to minimize this energy, because a more negative value indicates stronger binding.
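Prodigy reports a predicted binding free energy (ΔG, in kcal/mol) together with a dissociation constant (Kd). The sketch below shows how the two relate through ΔG = RT ln Kd, and how a mutation's effect can be expressed as ΔΔG = ΔG(mutant) minus ΔG(wild type). The function names are our own, and 25 °C is assumed as the default temperature.

```python
import math

# Gas constant in kcal/(mol*K), matching the kcal/mol units of the binding energies.
R_KCAL = 1.987204e-3


def kd_from_dg(dg_kcal: float, temp_c: float = 25.0) -> float:
    """Dissociation constant (M) from binding free energy, using dG = RT ln Kd."""
    return math.exp(dg_kcal / (R_KCAL * (temp_c + 273.15)))


def dg_from_kd(kd_molar: float, temp_c: float = 25.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant (M)."""
    return R_KCAL * (temp_c + 273.15) * math.log(kd_molar)


def ddg(dg_mutant: float, dg_wildtype: float) -> float:
    """Change in binding energy; negative means the mutant is predicted to bind more tightly."""
    return dg_mutant - dg_wildtype


if __name__ == "__main__":
    wt, mut = -8.5, -9.0  # the residue 112 example reported later on this page
    print(f"ddG = {ddg(mut, wt):+.2f} kcal/mol")
    print(f"Kd fold change = {kd_from_dg(wt) / kd_from_dg(mut):.2f}")
```

For that residue 112 example, ΔΔG is -0.5 kcal/mol, which corresponds to roughly a 2.3-fold lower predicted Kd at 25 °C.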

Project Modeling Part 1: Determining mutations through BLOSUM 62 matrix

Our goal is to identify mutations that optimize binding, that is, lower the binding energy, between Ste2 and the biomarker cystatin C so that Ste2 recognizes the biomarker.

Workflow:

The first step in our pipeline was to build a list of candidate mutations by identifying the interacting residues with Prodigy and PyMOL. We focused on interacting residues because these are the positions where a mutation could optimize binding by reducing the binding energy.

The PyMOL rendering below shows the 30 interacting residues (colored blue). We identified these residues with Prodigy: we submitted the PDB file of the Ste2/cystatin C complex, and Prodigy returned the residues that interact with each other.
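Prodigy's published method treats two residues on opposite chains as interacting when any of their heavy atoms lie within 5.5 Å of each other. The sketch below reimplements that distance check in plain Python for illustration only; it is not Prodigy's code, the `Atom` record is a hypothetical structure, and parsing a real PDB file (for example with Biopython's `PDBParser`) is left out.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record: chain ID, residue number, element, coordinates in Å."""
    chain: str
    resnum: int
    element: str
    x: float
    y: float
    z: float


def interface_residues(atoms: Iterable[Atom], chain_a: str, chain_b: str,
                       cutoff: float = 5.5) -> List[Tuple[int, int]]:
    """Return (residue in chain_a, residue in chain_b) pairs with any heavy atoms within `cutoff` Å."""
    heavy = [a for a in atoms if a.element.upper() != "H"]  # skip hydrogens
    side_a = [a for a in heavy if a.chain == chain_a]
    side_b = [a for a in heavy if a.chain == chain_b]
    contacts = set()
    for a in side_a:
        for b in side_b:
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
                contacts.add((a.resnum, b.resnum))
    return sorted(contacts)
```

The double loop is quadratic in the number of atoms, which is fine for a single receptor-ligand interface; larger systems would normally use a spatial index instead.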

The next step uses the BLOSUM 62 matrix to decide what to mutate each of these residues to. BLOSUM 62 is a substitution matrix derived from conserved blocks of aligned protein sequences: each score reflects how often one amino acid replaces another in related proteins, and a higher score means the two amino acids are more similar. The idea behind this mutation strategy is to mutate each interacting residue to its most similar amino acid and test whether this slight change optimizes binding by lowering the binding energy.

The BLOSUM 62 matrix is shown below.

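For example, aspartic acid is most similar to glutamic acid: apart from aspartic acid itself, glutamic acid has the highest score in the aspartic acid row (+2). This agrees with their structural similarity, since both have acidic side chains that differ by a single methylene group.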
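The "most similar amino acid" lookup can be sketched in Python with the standard BLOSUM62 scores hard-coded. The page does not say how ties were handled (asparagine, for instance, scores +1 against D, H, and S), so this sketch returns every tied best substitute; the function names are our own.

```python
# Standard BLOSUM62 substitution scores; rows and columns follow AA_ORDER.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62_ROWS = {
    "A": [ 4, -1, -2, -2,  0, -1, -1,  0, -2, -1, -1, -1, -1, -2, -1,  1,  0, -3, -2,  0],
    "R": [-1,  5,  0, -2, -3,  1,  0, -2,  0, -3, -2,  2, -1, -3, -2, -1, -1, -3, -2, -3],
    "N": [-2,  0,  6,  1, -3,  0,  0,  0,  1, -3, -3,  0, -2, -3, -2,  1,  0, -4, -2, -3],
    "D": [-2, -2,  1,  6, -3,  0,  2, -1, -1, -3, -4, -1, -3, -3, -1,  0, -1, -4, -3, -3],
    "C": [ 0, -3, -3, -3,  9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],
    "Q": [-1,  1,  0,  0, -3,  5,  2, -2,  0, -3, -2,  1,  0, -3, -1,  0, -1, -2, -1, -2],
    "E": [-1,  0,  0,  2, -4,  2,  5, -2,  0, -3, -3,  1, -2, -3, -1,  0, -1, -3, -2, -2],
    "G": [ 0, -2,  0, -1, -3, -2, -2,  6, -2, -4, -4, -2, -3, -3, -2,  0, -2, -2, -3, -3],
    "H": [-2,  0,  1, -1, -3,  0,  0, -2,  8, -3, -3, -1, -2, -1, -2, -1, -2, -2,  2, -3],
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3,  4,  2, -3,  1,  0, -3, -2, -1, -3, -1,  3],
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3,  2,  4, -2,  2,  0, -3, -2, -1, -2, -1,  1],
    "K": [-1,  2,  0, -1, -3,  1,  1, -2, -1, -3, -2,  5, -1, -3, -1,  0, -1, -3, -2, -2],
    "M": [-1, -1, -2, -3, -1,  0, -2, -3, -2,  1,  2, -1,  5,  0, -2, -1, -1, -1, -1,  1],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1,  0,  0, -3,  0,  6, -4, -2, -2,  1,  3, -1],
    "P": [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4,  7, -1, -1, -4, -3, -2],
    "S": [ 1, -1,  1,  0, -1,  0,  0,  0, -1, -2, -2,  0, -1, -2, -1,  4,  1, -3, -2, -2],
    "T": [ 0, -1,  0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1,  1,  5, -2, -2,  0],
    "W": [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1,  1, -4, -3, -2, 11,  2, -3],
    "Y": [-2, -2, -2, -3, -2, -1, -2, -3,  2, -1, -1, -2, -1,  3, -3, -2, -2,  2,  7, -1],
    "V": [ 0, -3, -3, -3, -1, -2, -2, -3, -3,  3,  1, -2,  1, -1, -2, -2,  0, -3, -1,  4],
}


def blosum62(a: str, b: str) -> int:
    """Substitution score between two one-letter amino acid codes."""
    return BLOSUM62_ROWS[a.upper()][AA_ORDER.index(b.upper())]


def most_similar(aa: str) -> list:
    """All amino acids other than `aa` that share the highest BLOSUM62 score with it."""
    aa = aa.upper()
    others = [b for b in AA_ORDER if b != aa]
    best = max(blosum62(aa, b) for b in others)
    return [b for b in others if blosum62(aa, b) == best]
```

Calling `most_similar("D")` reproduces the aspartic acid to glutamic acid example above. Note that the lowest scores (such as -4 for D versus L or W) mark the least similar pairs, not the most similar.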

We introduced the most similar amino acid at each of the 30 interacting residues using PyMOL's mutagenesis tool. We then submitted each mutated structure, saved as a PDB file, to Prodigy to evaluate the change in binding energy.

This graph shows the change in binding energy for these residues. The y-axis is the binding energy, and the x-axis lists the mutation made at each interacting residue. For example, mutating residue 112 to serine lowered the predicted binding energy from -8.5 to -9.0 kcal/mol, indicating that this mutation is predicted to strengthen binding between Ste2 and cystatin C.

The table below lists the same results in tabular form, with the 3 mutations that produced the greatest decrease in binding energy highlighted in yellow.
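Picking out the mutations with the greatest decrease in binding energy can be sketched as a sort on ΔΔG. The dictionary keys are hypothetical mutation labels, and the sketch assumes every energy came from the same Prodigy settings so the values are directly comparable.

```python
from typing import Dict, List, Tuple


def rank_mutations(wt_dg: float, mutant_dgs: Dict[str, float],
                   top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the `top_n` mutations with the most negative ddG (mutant dG minus wild-type dG)."""
    deltas = [(label, dg - wt_dg) for label, dg in mutant_dgs.items()]
    deltas.sort(key=lambda item: item[1])  # most negative (largest predicted improvement) first
    return deltas[:top_n]
```

For example, `rank_mutations(-8.5, {"112S": -9.0})` returns the residue 112 serine mutant with a ΔΔG of -0.5 kcal/mol; with the full set of 30 mutants, the first three entries correspond to the highlighted rows.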

To design primers that introduce these mutations, we used the yeast codon table from Bennetzen and Hall, 1981. This told us which codon to substitute at each interacting residue to encode the desired amino acid.
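The codon substitution step can be sketched as follows. This sketch swaps in a simple heuristic that the page does not describe: among the codons for the target amino acid in the standard genetic code, it picks the one needing the fewest nucleotide changes, breaking ties alphabetically. It does not encode the yeast codon preferences from Bennetzen and Hall, which a real primer design would add, and it assumes residue numbering starts at the first codon of the coding sequence, which may differ from the numbering in the structure file.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def substitute_residue(cds: str, residue_number: int, target_aa: str):
    """Replace the codon at a 1-based residue with a codon for target_aa.

    Heuristic (an assumption, not the team's documented rule): choose the
    codon needing the fewest base changes, ties broken alphabetically.
    Returns (new_cds, old_codon, new_codon).
    """
    cds = cds.upper()
    start = (residue_number - 1) * 3
    old_codon = cds[start:start + 3]
    if residue_number < 1 or len(old_codon) != 3:
        raise IndexError(f"residue {residue_number} is outside the coding sequence")
    candidates = sorted(c for c, aa in CODON_TABLE.items() if aa == target_aa.upper())
    if not candidates:
        raise ValueError(f"unknown amino acid code: {target_aa!r}")
    new_codon = min(candidates, key=lambda c: sum(x != y for x, y in zip(c, old_codon)))
    return cds[:start] + new_codon + cds[start + 3:], old_codon, new_codon
```

Translating the returned sequence is a cheap check that the intended amino acid, and only that one, changed before the mutagenic primer is ordered.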

As part of our iterative workflow, we passed these mutations and primer designs to the wet lab team to test whether each mutation improves binding between Ste2 and cystatin C by measuring fluorescence intensity.

Iteration 2:

The second iteration uses evolutionary information, following the strategy “mutate to the most similar evolutionary amino acid.”

We determined the evolutionary relationships using the tree below:

We identified the most evolutionarily related amino acids using a pairwise alignment of each related receptor to the reference Ste2 structure. In each receptor (the reference and the evolutionarily related ones), we looked at the residues involved in binding both alpha-factor and cystatin C. For the evolutionarily related receptors, we mapped these binding residues back onto the corresponding positions of the reference Ste2 receptor.
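Mapping residues from a related receptor back onto the reference can be sketched from a gapped pairwise alignment. The alignment strings, the '-' gap character, and the function names are assumptions for illustration; positions are 1-based residue numbers in each ungapped sequence.

```python
from typing import Dict, Iterable, Set


def map_positions(ref_aln: str, other_aln: str, gap: str = "-") -> Dict[int, int]:
    """Map 1-based residue numbers in `other` to the aligned residue numbers in `ref`.

    Residues aligned to a gap in the reference have no counterpart and are omitted.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = other_pos = 0
    for r, o in zip(ref_aln, other_aln):
        if r != gap:
            ref_pos += 1
        if o != gap:
            other_pos += 1
        if r != gap and o != gap:
            mapping[other_pos] = ref_pos
    return mapping


def map_binding_residues(binding_in_other: Iterable[int], mapping: Dict[int, int]) -> Set[int]:
    """Project binding residues of a related receptor onto reference Ste2 numbering."""
    return {mapping[p] for p in binding_in_other if p in mapping}
```

Binding residues that fall in an insertion relative to Ste2 are dropped, since they have no reference position to land on.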

The “binding” residues were defined as amino acids in the reference structure that either bind both alpha-factor and cystatin C themselves or map to residues in an evolutionarily related receptor that bind both. In this way, we broadened the set of residues that could be classified as binding. As depicted in the picture below, the candidates for mutation were residues that were classified as “binding” and also had high Shannon entropy, a measure of how variable a position is across the aligned sequences. The Shannon entropy criterion restricted the candidates to residues with an evolutionary history of high variability. We excluded low-entropy residues because they are most likely important for the overall structure of the protein. Although we could mutate these conserved residues as well, our computational predictions would likely not capture the dramatic structural changes that mutating them could cause.
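For one column of a multiple sequence alignment, Shannon entropy is H = -Σ p_a log2 p_a, where p_a is the fraction of sequences carrying amino acid a at that position; H is 0 for a fully conserved column and grows as the column becomes more variable. The sketch below computes it and applies the two selection criteria described above. Ignoring gaps, using base-2 logarithms, and the entropy threshold are assumptions, since the page does not state them; `entropy_by_pos` is assumed to be keyed by reference Ste2 residue number.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def shannon_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def candidate_residues(ref_binding_both: Iterable[int], mapped_binding_both: Iterable[int],
                       entropy_by_pos: Dict[int, float], threshold: float) -> List[int]:
    """Residues that count as 'binding' (directly or via a related receptor) AND are variable."""
    binding = set(ref_binding_both) | set(mapped_binding_both)
    return sorted(p for p in binding if entropy_by_pos.get(p, 0.0) >= threshold)
```

Positions missing from `entropy_by_pos` are treated as conserved, so they are excluded rather than mutated by default.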

Finally, each target residue was mutated to each of its evolutionarily related amino acids, and the binding energy of each mutant was determined with Prodigy.
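Enumerating these substitutions can be sketched as collecting, for each target position, the amino acids that the related receptors carry at the aligned column. This assumes the reference and related sequences are rows of a single gapped alignment of equal length; the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def evolutionary_substitutions(ref_aln: str, related_alns: Iterable[str],
                               targets: Iterable[int], gap: str = "-") -> Dict[int, List[str]]:
    """For each target reference residue (1-based), list the amino acids seen at that
    aligned column in the related receptors, excluding the wild-type residue."""
    related = list(related_alns)
    if any(len(s) != len(ref_aln) for s in related):
        raise ValueError("all aligned sequences must have the same length")
    wanted = set(targets)
    options: Dict[int, List[str]] = {}
    ref_pos = 0
    for col, ref_aa in enumerate(ref_aln):
        if ref_aa == gap:
            continue  # insertion relative to the reference; no Ste2 residue here
        ref_pos += 1
        if ref_pos in wanted:
            seen = {s[col] for s in related if s[col] not in (gap, ref_aa)}
            options[ref_pos] = sorted(seen)
    return options
```

Each (position, amino acid) pair in the result is one mutant to build and score with Prodigy, as in the first iteration.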

This graph compares the binding energies of mutations to the most evolutionarily related amino acid against the wild-type Ste2 receptor. This strategy produced residue mutations that decreased the binding energy considerably more than the BLOSUM 62 mutations did.