Modelling

Aptamers are isolated using a method called Systematic Evolution of Ligands by Exponential Enrichment (SELEX). It is an iterative process in which highly specific oligonucleotides are selected from a pool of oligonucleotides against a specific target (a protein, toxin, organic compound, single cell, etc.). Aptamers isolated against a particular microbial species using SELEX are generally not checked for their exact molecular target in the cells. As part of our dry lab work, we aimed to identify the possible targets of the aptamers used in our model.

1) Literature Survey

We performed a comprehensive literature survey using keywords such as ‘aptamer’, ‘bacteria’, ‘e.coli’, ‘salmonella’, ‘binding constant’, ‘impedance’ and ‘SELEX’ to identify the best-suited aptamers against E. coli and Salmonella typhimurium. A similar search was carried out for the penicillin molecule, which we used as a biomarker for the fungus Penicillium sp. The aptamers showing high specificity, the strongest reported binding (judged from the dissociation constant, Kd, where a lower value means tighter binding) and a considerable change in impedance were taken into the pipeline. The binding constant gave us an idea of the stability and kinetics of binding, whereas the change in impedance is an important factor when using the aptamer for detection.

Organism | Aptamer | Aptamer sequence
E. coli | P12-31 | 5′-CCCTCCGGGGGGGGGGGTCATCGGGATACCTGGTAAGGATACCCTCCGGGGGGGTCATCGGGATACCTGGTAAGGATA-3′
Salmonella typhimurium | ST2P | 5′-ATAGGAGTCACGACGACCAGAAAGTAATGCCCGGTAGTTATTCAAAGATGAGTAGGAAAAGATATGTGCGTCTACCTCTTGACTAAT-3′
Penicillium sp. (Penicillin) | P8 | 5′-GGGAGGACGAAGCGGAACGAGATGTAGATGAGGCTCGATCCGAATGCGTGACGTCTATCGGAATACTCGTTTTTACGCCTCAGAAGACACGCCCGACA-3′

2) Aptamer 3D prediction

The screened aptamer sequences were subjected to 2-dimensional (secondary) structure prediction with Mfold on the UNAFold web server (http://www.unafold.org/). The mfold algorithm predicts the overall minimum free energy, ΔG, together with the minimum free energy of the foldings that must contain any particular base pair (ri-rj). Base pairs whose conditional minimum free energy lies within a chosen increment of the overall minimum are plotted in an ‘energy plot’, and the foldings that contain each chosen base pair are computed at that conditional minimum free energy. During our predictions, the folding temperature was fixed at 37 ℃. The dot-bracket secondary structure notation (Vienna format) of each aptamer obtained from Mfold was submitted to the 3dDNA web server (http://biophy.hust.edu.cn/new/3dRNA/create) for 3D structure prediction. The server assembles the smallest secondary structure elements into hairpins or duplexes and then into a complete tertiary structure.
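To make the dot-bracket (Vienna) notation concrete, the following minimal Python sketch turns such a string into a list of paired positions. It is an illustration only and not part of Mfold or the 3D prediction server; the function name `dot_bracket_pairs` is hypothetical, and pseudoknot symbols such as `[` and `]` are deliberately not handled.

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the 1-based (i, j) base pairs encoded in a dot-bracket string."""
    open_positions: List[int] = []   # stack of unmatched '(' positions
    pairs: List[Tuple[int, int]] = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            # A ')' always pairs with the most recent unmatched '('.
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)
```

For a toy hairpin such as `((..))`, the function reports that position 1 pairs with position 6 and position 2 with position 5, which is the kind of information a 3D assembly step needs in order to build the stems.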

E. coli aptamer

S. typhimurium aptamer

Penicillin aptamer

3) Protein Clusters

PseAAC (pseudo amino acid composition) is a technique for converting proteins into numeric vectors for further processing. The input is either a single polypeptide or the separated chains of a multi-chain protein, each of which is vectorised individually. The resulting vector contains the amino acid frequencies (AAC) and the correlations between residues separated by a given distance along the sequence, i.e. the kth-tier correlations. The algorithm generates a vector of length 20 + λ, where λ is the maximal value of k. Proteins that perform a similar function or show structural homology tend to have similar amino acid frequencies and residue correlations.

Consider a polypeptide P of length L with residues R1, R2, ..., RL. Its primary structure can therefore be written as:

P = R1 R2 R3 ... RL

This is a sequence of amino acid names; to convert it into a numeric vector we use PseAAC.
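As a rough illustration of the vectorisation, the sketch below computes a simplified PseAAC vector in Python. Several choices here are assumptions rather than details from our pipeline: only one residue property is used (the Kyte-Doolittle hydropathy index, whereas the classical formulation combines several properties), the weight factor is set to 0.05, and the names `pseaac`, `lam` and `weight` are hypothetical.

```python
from math import sqrt
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used as the single residue property.
# Assumption: the property set used by the actual software is not stated.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def _standardised(prop: Dict[str, float]) -> Dict[str, float]:
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(sequence: str, lam: int = 5, weight: float = 0.05) -> List[float]:
    """Return a vector of length 20 + lam: frequencies, then tier correlations."""
    seq = sequence.upper()
    if any(residue not in HYDROPATHY for residue in seq):
        raise ValueError("sequence contains a non-standard residue")
    length = len(seq)
    if not 0 <= lam < length:
        raise ValueError("lam must be smaller than the sequence length")
    h = _standardised(HYDROPATHY)
    # First 20 components: normalised amino acid frequencies.
    freqs = [seq.count(a) / length for a in AMINO_ACIDS]
    # kth-tier correlation: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(length - k)]
        thetas.append(sum(diffs) / (length - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Calling `pseaac(chain_sequence, lam=5)` on a single chain gives a 25-component vector whose entries sum to 1, so chains of different lengths become directly comparable points for clustering.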


This technique generates the vectors used to train AI classifiers. We used the same vectorisation to classify membrane proteins according to their structural similarities. With the software KAMI (Kwick Aptamer binding Motif Identification), we demonstrated that this approach can identify the best-binding protein for an aptamer generated by whole-cell SELEX, and we verified the software by finding the best-binding proteins for the E. coli and S. typhimurium aptamers.

We first download the FASTA files of all the membrane proteins from RCSB PDB and separate them into individual chains. We then input this file into KAMI and perform k-means clustering of the polypeptide chains. The output of the software is a list of the cluster centres, the protein cluster text file and the t-SNE plots of the clustering.

We then dock the cluster centres against the aptamer of interest to find the cluster that shows the lowest binding energy, which indicates that this cluster contains the best-binding protein. We divide that cluster into segments and dock randomly selected sequences from each segment. The number of clusters and segments is user-defined, but the software also includes a predictor for the ideal number of clusters. The segment that shows the best binding is selected and repeatedly divided until the best-binding protein is obtained.

This drastically reduces the number of docking runs needed to identify the best-binding protein, and makes it possible to identify the protein that an aptamer binds to even though the aptamer was generated by whole-cell SELEX. It also opens the opportunity for in-silico modification of the aptamer to enhance its binding capacity, and for repurposing the aptamer.
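The divide-and-dock narrowing described above can be sketched in Python as follows. This is not the KAMI source code: `narrow_down` and its parameters are hypothetical names, `dock` stands for any function that returns a binding energy for a chain (lower is better), and the sketch assumes one random representative is docked per segment.

```python
import random
from typing import Callable, List, Sequence, Tuple


def narrow_down(
    candidates: Sequence[str],
    dock: Callable[[str], float],
    n_segments: int = 4,
    seed: int = 0,
) -> Tuple[str, float, int]:
    """Return (best candidate, its energy, number of docking calls made)."""
    if n_segments < 2:
        raise ValueError("n_segments must be at least 2")
    if not candidates:
        raise ValueError("candidates must not be empty")
    rng = random.Random(seed)          # seeded so that runs are repeatable
    pool: List[str] = list(candidates)
    calls = 0
    while len(pool) > n_segments:
        size = -(-len(pool) // n_segments)  # ceiling division
        segments = [pool[i:i + size] for i in range(0, len(pool), size)]
        scored = []
        for segment in segments:
            # Dock one randomly chosen representative of each segment.
            representative = rng.choice(segment)
            scored.append((dock(representative), segment))
            calls += 1
        # Keep only the segment whose representative bound best.
        pool = min(scored, key=lambda item: item[0])[1]
    # The pool is now small enough to dock every remaining chain.
    final = [(dock(chain), chain) for chain in pool]
    calls += len(pool)
    energy, best = min(final, key=lambda item: item[0])
    return best, energy, calls
```

With 100 chains and four segments, this sketch makes at most 14 docking calls instead of 100. Because only sampled representatives are docked, it is a heuristic: it relies on similar chains sitting next to each other in the candidate list (for example, ordered within a cluster) and can miss the overall best binder.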

t-SNE plot of E. coli membrane proteins

t-SNE plot of S. typhi membrane proteins

4) Molecular Docking

Prior to docking, the water molecules, ions and small molecules found in the retrieved PDB structure of the protein were removed. Protein-DNA aptamer docking was then performed using HEX 8.0.0 in ‘Shape Only’ docking mode. HEX is a Fourier transform (FFT)-based docking program that performs a 6D docking run, i.e. a search over the three translational and three rotational degrees of freedom of one molecule relative to the other.
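The clean-up step can be approximated with a few lines of Python that filter the records of a PDB file. This is a minimal sketch under the assumption that everything to be removed is stored as HETATM records; the function name `strip_hetero_records` is hypothetical, and it is not necessarily how the structures were prepared for this work.

```python
from typing import Iterable, List


def strip_hetero_records(pdb_lines: Iterable[str]) -> List[str]:
    """Drop HETATM records (waters, ions, ligands) and keep everything else."""
    kept: List[str] = []
    for line in pdb_lines:
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        if record == "HETATM":
            continue
        kept.append(line)
    return kept
```

Note that this also removes modified residues that are stored as HETATM (selenomethionine, for example) and leaves any CONECT records untouched, so the result should be inspected before docking.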

The docked complexes were sorted by binding energy score. The complexes with the lowest binding energy were selected and subjected to molecular dynamics simulation.

We selected the best-binding protein from E. coli (2400+ membrane proteins) using the KAMI program. Salmonella was also run through the KAMI pipeline, but in addition all of its membrane proteins were docked to verify the program's accuracy.

Docking results of all membrane proteins in S. typhi

Visualisation and ligand interactions

The docked complexes were visualised and studied for their interactions using PyMOL and Discovery Studio. They were examined for intermolecular hydrogen-bond, hydrophobic and electrostatic interactions, at both the atomic and the molecular level.

5) Molecular Dynamic Simulations

To determine the binding efficacy and stability of the protein-DNA aptamer binding, molecular dynamics was performed using GROningen MAchine for Chemical Simulations (GROMACS 2022.3). The ‘AMBER99SB-ILDN protein, nucleic AMBER94’ force field was used for the simulation of the systems. The required positional restraint files (.itp; to restrain the positions of heavy atoms), processed structural files (.gro; containing all the atoms defined within the force field) and topology file (.top; containing all the information related to bonded and non-bonded parameters necessary to define the molecule within a simulation) were generated using the abovementioned force field.
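As an illustration of how these files are typically produced, the Python sketch below assembles a `gmx pdb2gmx` command line. The file names (`complex.pdb`, `processed.gro`, `topol.top`, `posre.itp`) are placeholders chosen for this example, and the command is only printed, not executed; the exact invocation used for our systems may have differed.

```python
import shlex
from typing import List


def pdb2gmx_command(
    structure: str = "complex.pdb",        # placeholder input file name
    force_field: str = "amber99sb-ildn",   # AMBER99SB-ILDN force field
    water: str = "tip3p",                  # TIP3P water model
) -> List[str]:
    """Build the argument list for generating .gro, .top and .itp files."""
    return [
        "gmx", "pdb2gmx",
        "-f", structure,         # input structure
        "-o", "processed.gro",   # processed structure file
        "-p", "topol.top",       # topology file
        "-i", "posre.itp",       # positional restraint file
        "-ff", force_field,
        "-water", water,
    ]


print(shlex.join(pdb2gmx_command()))
```

Building the command as a list keeps each option next to its value and lets the same list be passed to `subprocess.run` on a machine where GROMACS is installed.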

Thereafter, the systems were placed in a dodecahedral box with a size of 1.0 nm and filled with TIP3P water. Once solvated, they were neutralized by adding the required number of Na+ and Cl− ions. To ensure that the systems had no steric clashes or inappropriate geometry, they were relaxed by energy minimization. The steepest descent algorithm was used, with a maximum of 50,000 steps and a tolerance of 1000 kJ mol−1 nm−1. Bond lengths in the system were constrained using the LINCS algorithm, and periodic boundary conditions were applied in all three (XYZ) directions. The Particle Mesh Ewald (PME) method was used to compute the long-range electrostatics, with 0.16 nm Fourier spacing and a 1.2 nm cut-off.
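The minimization settings above map onto GROMACS .mdp options roughly as in the following Python sketch, which renders a parameter dictionary as .mdp text. It is an approximation rather than the file used for these runs: applying the 1.2 nm cut-off to `rcoulomb` is an assumption, the `constraints` option is left out because the text does not say which bonds were constrained, and `render_mdp` is a hypothetical helper.

```python
from typing import Dict

# Energy minimization parameters taken from the description above.
EM_SETTINGS: Dict[str, str] = {
    "integrator": "steep",             # steepest descent
    "emtol": "1000.0",                 # tolerance in kJ mol^-1 nm^-1
    "nsteps": "50000",                 # maximum number of steps
    "constraint_algorithm": "lincs",   # LINCS bond constraints
    "pbc": "xyz",                      # periodic in all three directions
    "coulombtype": "PME",              # Particle Mesh Ewald electrostatics
    "fourierspacing": "0.16",          # nm
    "rcoulomb": "1.2",                 # nm; assumed target of the 1.2 nm cut-off
}


def render_mdp(settings: Dict[str, str]) -> str:
    """Format settings as aligned 'key = value' lines, one per option."""
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


print(render_mdp(EM_SETTINGS))
```

Keeping the parameters in a dictionary makes it easy to derive the NVT and NPT files from the same base by overriding a handful of keys.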

To optimize the systems further, i.e. to bring them to the required simulation temperature and density, they were equilibrated in the NVT and NPT ensembles, respectively. The V-rescale thermostat was used for the NVT equilibration (at 300 K) and Parrinello-Rahman pressure coupling for the NPT equilibration (at a pressure of 1 bar). Each equilibration step was run for 200 ps. After the positional restraints were released, the equilibrated systems were set up for the final production run: 3 ns for the E. coli complex and 5 ns for the Salmonella complex.
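GROMACS takes run lengths as a number of integration steps rather than as a duration, so the times above have to be converted. The short Python sketch below does this under the assumption of a 2 fs time step, which is a common choice when bond lengths are constrained but is not stated in the text; `steps_for` is a hypothetical helper name.

```python
def steps_for(duration_ps: float, dt_fs: float = 2.0) -> int:
    """Number of integration steps needed to cover duration_ps at time step dt_fs."""
    # 1 ps = 1000 fs, so steps = duration in fs divided by the time step in fs.
    return round(duration_ps * 1000.0 / dt_fs)


runs = [
    ("NVT or NPT equilibration", 200),   # 200 ps each
    ("E. coli production", 3000),        # 3 ns
    ("Salmonella production", 5000),     # 5 ns
]
for label, duration_ps in runs:
    print(f"{label}: nsteps = {steps_for(duration_ps)}")
```

Under that assumption, each 200 ps equilibration corresponds to 100,000 steps, and the 3 ns and 5 ns production runs to 1,500,000 and 2,500,000 steps respectively.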

Visualisation

VMD 1.9.3 software was used for visualising the systems at different steps of simulations as well as the final trajectory files. The final trajectories were corrected for the periodicity and then visualised (Videos below). All the graphs were plotted using the inbuilt GROMACS commands and visualised using Xmgrace.
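The .xvg files written by the GROMACS analysis commands are plain text, so the plotted values can also be summarised directly. The following Python sketch reads the time and value columns and reports their range; the function names and the file name `rmsd.xvg` are placeholders for this example, not part of our workflow.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def read_xvg(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Parse (time, value) pairs, skipping '#' comments and '@' directives."""
    points: List[Tuple[float, float]] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        fields = stripped.split()
        points.append((float(fields[0]), float(fields[1])))
    return points


def summarise(points: List[Tuple[float, float]]) -> Dict[str, float]:
    """Return the minimum, maximum and mean of the value column."""
    values = [value for _, value in points]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Example usage (rmsd.xvg is a placeholder file name):
# with open("rmsd.xvg") as handle:
#     print(summarise(read_xvg(handle)))
```

A summary of this kind gives the fluctuation range of a trajectory as numbers, which complements reading it off the plotted curve.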

For E. coli: The RMSD fluctuates in the range of 0.2-0.4 nm over the 3 ns simulation, and the fluctuations appear to stabilise over time. However, to establish the stability of the complex firmly, the system would need to be simulated for a longer duration, which was unfortunately not possible with the computing resources available to us.

E. coli MDS

E. coli RMSD plot

For Salmonella: The RMSD fluctuates in the range of 0.1-0.2 nm over the 5 ns simulation. The smaller fluctuations over the trajectory suggest that the complex settles into a preferred set of conformations, and a stable RMSD over the simulation time indicates the stability of the docked complex.

Salmonella MDS

Salmonella RMSD plot

References

  1. https://2021.igem.org/Team:Rochester/Experiments
  2. Nengo AI
  3. https://ezgif.com/maker
  4. https://www.molsoft.com/activeicmjs.html
  5. https://pypi.org/project/brian/
  6. http://www.unafold.org/
  7. http://biophy.hust.edu.cn/new/3dRNA/create
  8. GROMACS 2022.3