Modelling

Aptamers are isolated using a method called Systematic Evolution of Ligands by Exponential Enrichment (SELEX). It is an iterative process in which highly specific oligonucleotide molecules are selected from a pool of oligonucleotides against a specific target (which can be a protein, toxin, organic compound, single cell, etc.). Aptamers isolated against a particular microbial species using SELEX are generally not checked for their exact target in the cells. As part of our dry lab work, we aimed to identify the possible targets of the aptamers used in our model.

1) Literature Survey

We performed a comprehensive literature survey using relevant keywords such as ‘aptamer’, ‘bacteria’, ‘e.coli’, ‘salmonella’, ‘binding constant’, ‘impedance’ and ‘SELEX’ to identify the best-suited aptamers against E. coli and Salmonella typhimurium. A similar search was carried out for the penicillin molecule, which we used as a biomarker for the fungus Penicillium sp. The aptamers showing high specificity, the highest binding affinity (i.e. the lowest dissociation constant, Kd) and a considerable change in impedance were taken into the pipeline. The binding constant gave us an idea of the stability and kinetics of binding, whereas the change in impedance is an important factor to consider when using the aptamer for detection.

Organism	Aptamer	Aptamer sequence (5′ to 3′)
E. coli	P12-31	5′-CCCTCCGGGGGGGGGGGTCATCGGGATACCTGGTAAGGATACCCTCCGGGGGGGTCATCGGGATACCTGGTAAGGATA-3′
Salmonella typhimurium	ST2P	5′-ATAGGAGTCACGACGACCAGAAAGTAATGCCCGGTAGTTATTCAAAGATGAGTAGGAAAAGATATGTGCGTCTACCTCTTGACTAAT-3′
Penicillium sp. (Penicillin)	P8	5′-GGGAGGACGAAGCGGAACGAGATGTAGATGAGGCTCGATCCGAATGCGTGACGTCTATCGGAATACTCGTTTTTACGCCTCAGAAGACACGCCCGACA-3′

2) Aptamer 3D prediction

The screened aptamer sequences were subjected to two-dimensional (secondary) structure prediction with mfold on the UNAFold web server (http://www.unafold.org/). The mfold algorithm predicts the overall minimum free energy, ΔG, and, for every possible base pair (ri-rj), the minimum free energy of any folding that must contain that pair. Base pairs whose conditional minimum free energy lies within a chosen increment of the overall minimum are selected and plotted in an ‘energy dot plot’, and the foldings that contain the chosen base pairs are then computed. During our predictions, the folding temperature was fixed at 37 ℃. The dot-bracket secondary structure notation (Vienna format) of the aptamers obtained from mfold was submitted to the 3dDNA web server (http://biophy.hust.edu.cn/new/3dRNA/create) for 3D structure prediction. The server assembles the smallest secondary structure elements into hairpins or duplexes and then into a complete tertiary structure.
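To make the dot-bracket (Vienna) notation concrete, the following minimal Python sketch converts a dot-bracket string into a list of paired positions. The function name `dot_bracket_pairs` is hypothetical and is not part of mfold or the 3D prediction server; the sketch also assumes a structure without pseudoknots, i.e. only `(`, `)` and `.` characters.

```python
def dot_bracket_pairs(structure):
    """Return the base pairs encoded in a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened pair and
    '.' marks an unpaired nucleotide. Positions are 0-based.
    """
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A toy hairpin: two stacked base pairs closing a two-nucleotide loop.
    print(dot_bracket_pairs("((..))"))
```

For the toy hairpin `((..))`, position 0 pairs with position 5 and position 1 with position 4; the two dots form the loop.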

E. coli aptamer

S. typhi aptamer

Penicillin aptamer

3) Protein Clusters

PseAAC (pseudo amino acid composition) is a technique used to vectorise proteins for further processing. The process involves either taking a single polypeptide or separating the chains of different proteins and then vectorising them individually. The resulting vector contains the amino acid composition (AAC), i.e. the frequency of each of the 20 amino acids, together with the correlation between residues separated by a certain distance k (the k-th tier correlation). The algorithm generates a vector of length 20 + λ, where λ is the maximal value of k. Proteins that perform a similar function or show structural homology tend to have similar amino acid frequencies and residue correlations.

Consider a polypeptide P of length L with residues R1, R2, ..., RL. We can therefore write the primary structure of the polypeptide as:

P = R1 R2 R3 ... RL

This is a list of amino acid names; to convert it into a numeric vector we use PseAAC.
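The vector of length 20 + λ described above can be sketched in a few lines of Python. This is a simplified illustration, not the KAMI implementation: it uses a single physicochemical property (the Kyte-Doolittle hydropathy scale, standardised to zero mean and unit variance) to compute the tier correlations, and the weight of 0.05 and the function name `pseaac` are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the only property (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardise(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    values = list(scale.values())
    mu, sigma = mean(values), pstdev(values)
    return {residue: (value - mu) / sigma for residue, value in scale.items()}


PROPERTY = _standardise(KYTE_DOOLITTLE)


def pseaac(sequence, lam=5, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    sequence = sequence.upper()
    if any(residue not in PROPERTY for residue in sequence):
        raise ValueError("sequence contains a non-standard residue")
    if len(sequence) <= lam:
        raise ValueError("sequence must be longer than lam")

    # First 20 components: plain amino acid frequencies.
    counts = Counter(sequence)
    frequencies = [counts[aa] / len(sequence) for aa in AMINO_ACIDS]

    # k-th tier correlation: mean squared property difference between
    # residues that are k positions apart, for k = 1 .. lam.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [
            (PROPERTY[sequence[i]] - PROPERTY[sequence[i + k]]) ** 2
            for i in range(len(sequence) - k)
        ]
        thetas.append(sum(diffs) / len(diffs))

    # Normalise so that all 20 + lam components sum to 1.
    denominator = sum(frequencies) + weight * sum(thetas)
    return [f / denominator for f in frequencies] + [
        weight * t / denominator for t in thetas
    ]


if __name__ == "__main__":
    vector = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=5)
    print(len(vector))
```

With `lam=5` the example chain is mapped to 25 numbers: 20 frequency terms followed by 5 correlation terms. Vectors of this kind can then be passed to any clustering or classification routine.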

This technique generates vectors that can be used to train AI classifiers. We used the same vectorisation technique to classify membrane proteins according to their structural similarities. With our software KAMI (Kwick Aptamer binding Motif Identification), we demonstrate that this approach can identify the best-binding protein for an aptamer generated by whole-cell SELEX. We verified the software by finding the best-binding protein for the E. coli and S. typhi aptamers.

We first download the FASTA files of all membrane proteins from RCSB PDB and separate them into individual chains. We then input this file into KAMI and perform k-means clustering of the polypeptide chains. The output of the software is a list of the cluster centres, a text file of the protein clusters and t-SNE plots of the clustering. We then dock the cluster centres against the aptamer of interest to find the cluster that shows the lowest binding energy, which indicates that this cluster contains the best-binding protein. We divide that cluster into segments and dock randomly selected sequences from each segment. The number of clusters and segments is user-defined, but a predictor for the ideal number of clusters is also built into the software. The segment that shows the best binding is selected and repeatedly subdivided until the best-binding protein is obtained.

This drastically reduces the number of docking runs needed to identify the best-binding protein and makes it possible to identify the protein that an aptamer binds to, even though the aptamer was generated by whole-cell SELEX. This opens the opportunity for in-silico modification of the aptamer to enhance its binding capacity and for repurposing the aptamer.
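The segment-wise narrowing step can be illustrated with a short Python sketch. It is not the KAMI source code: the function `narrow_down`, its parameters and the `dock` callable (which stands in for an external docking run and returns a binding energy, lower being better) are all hypothetical. The sketch also assumes that the candidates are ordered so that neighbouring entries tend to bind similarly, for example by their distance from the cluster centre.

```python
import random


def narrow_down(candidates, dock, n_segments=4, samples_per_segment=1, seed=0):
    """Repeatedly split the pool into segments, dock a random sample from
    each segment and keep only the best-scoring segment.

    Returns (best_candidate, best_energy, number_of_dock_calls).
    """
    if n_segments < 2:
        raise ValueError("n_segments must be at least 2")
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates supplied")
    rng = random.Random(seed)
    calls = 0

    while len(pool) > n_segments:
        size = -(-len(pool) // n_segments)  # ceiling division
        segments = [pool[i:i + size] for i in range(0, len(pool), size)]
        best_segment, best_energy = None, float("inf")
        for segment in segments:
            picks = rng.sample(segment, min(samples_per_segment, len(segment)))
            calls += len(picks)
            energy = min(dock(pick) for pick in picks)
            if best_segment is None or energy < best_energy:
                best_segment, best_energy = segment, energy
        pool = best_segment  # continue with the most promising segment only

    # The remaining pool is small enough to dock every member.
    energies = [(dock(candidate), candidate) for candidate in pool]
    calls += len(pool)
    best_energy, best = min(energies, key=lambda item: item[0])
    return best, best_energy, calls


if __name__ == "__main__":
    # Toy stand-in for docking: the "binding energy" is lowest at index 37.
    toy_dock = lambda index: abs(index - 37)
    print(narrow_down(range(100), toy_dock))
```

In the toy run, 100 candidates are screened with at most 14 calls to `dock` instead of 100. Because each segment is judged by a random sample, the search is a heuristic and may miss the true optimum; raising `samples_per_segment` trades extra docking runs for a lower chance of that.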

t-SNE plot of E. coli membrane proteins

t-SNE plot of S. typhi membrane proteins

4) Molecular Docking

Prior to docking, water molecules, ions and other small molecules found in the retrieved PDB structures of the proteins were removed. Protein-DNA aptamer docking was then performed using HEX 8.0.0 in ‘Shape Only’ docking mode. HEX is a fast Fourier transform (FFT)-based protein docking program that performs a 6D docking run.
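The clean-up step can be sketched as a simple filter over the PDB records. This is only an illustration of the idea, not the tool that was actually used: the function name `strip_heteroatoms` is hypothetical, and the sketch assumes that everything stored as a HETATM record (waters, ions, ligands) should be discarded, which would also remove modified residues that are stored as HETATM.

```python
def strip_heteroatoms(pdb_text):
    """Remove HETATM records (waters, ions, small molecules) from PDB text.

    ANISOU records that belong to a removed atom and CONECT records, which
    mostly describe heteroatom bonding, are dropped as well.
    """
    kept = []
    last_atom_dropped = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "HETATM":
            last_atom_dropped = True
            continue
        if record == "ATOM":
            last_atom_dropped = False
        elif record == "ANISOU" and last_atom_dropped:
            continue
        elif record == "CONECT":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    example = (
        "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
        "HETATM  500  O   HOH A 201      10.000  10.000  10.000  1.00 20.00           O\n"
        "TER\n"
        "END\n"
    )
    print(strip_heteroatoms(example), end="")
```

In the example, the water oxygen (HOH) is removed while the protein atom and the TER and END records are kept.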

The docked complexes were sorted by binding energy score. The complexes with the minimum binding energy were selected and subjected to molecular dynamics simulation analysis.

We selected the best-binding protein from E. coli (2400+ membrane proteins) using the KAMI program. Salmonella was also run through the KAMI pipeline, but in addition all of its membrane proteins were docked to verify the program's accuracy.

Docking results of all membrane proteins in S. typhi

Visualisation and ligand interactions

The docked complexes were visualised and their interactions studied using PyMol and Discovery Studio. These tools were used to visualise intermolecular hydrogen-bond, hydrophobic and electrostatic interactions. The interactions were observed at both the atomic and the molecular level.

5) Molecular Dynamic Simulations

To determine the binding efficacy and stability of the protein-DNA aptamer binding, molecular dynamics was performed using GROningen MAchine for Chemical Simulations (GROMACS 2022.3). The ‘AMBER99SB-ILDN protein, nucleic AMBER94’ force field was used for the simulation of the systems. The required positional restraint files (.itp; to restrain the positions of heavy atoms), processed structural files (.gro; containing all the atoms defined within the force field) and topology file (.top; containing all the information related to bonded and non-bonded parameters necessary to define the molecule within a simulation) were generated using the abovementioned force field.

Thereafter, the systems were placed in a dodecahedron box with a size of 1.0 nm and filled with the TIP3P water model. Once solvated, they were neutralised by adding the required amount of Na+ and Cl− ions. To ensure that the systems had no steric clashes or inappropriate geometry, they were relaxed by energy minimisation. The steepest descent algorithm was used to minimise the energy of the built systems, with a maximum of 50,000 steps and an energy tolerance of 1000 kJ mol−1 nm−1. Bond lengths in the system were constrained using the LINCS algorithm, and periodic boundary conditions were applied in all three (x, y, z) directions. The particle mesh Ewald (PME) method was used to compute the long-range electrostatics in the systems, with a Fourier spacing of 0.16 nm and a cut-off of 1.2 nm.
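For readers who want to reproduce a comparable set-up, the minimisation settings quoted above map onto GROMACS `.mdp` options roughly as in the following Python sketch, which simply renders a parameter dictionary as `.mdp` text. The dictionary is an assumption-laden reconstruction, not the original input file: in particular, the 1.2 nm cut-off is applied here to `rcoulomb` only, and options that the text does not mention (for example `rvdw` or which bonds are constrained) are deliberately left out.

```python
# Energy-minimisation settings taken from the description above.
# Mapping them to these option names is an assumption of this sketch.
EM_PARAMS = {
    "integrator": "steep",           # steepest descent minimisation
    "emtol": 1000.0,                 # stop below 1000 kJ mol^-1 nm^-1
    "nsteps": 50000,                 # maximum number of minimisation steps
    "pbc": "xyz",                    # periodic boundaries in all directions
    "coulombtype": "PME",            # particle mesh Ewald electrostatics
    "rcoulomb": 1.2,                 # 1.2 nm cut-off (assumed to be Coulomb)
    "fourierspacing": 0.16,          # 0.16 nm PME grid spacing
    "constraint_algorithm": "lincs", # LINCS for bond constraints
}


def render_mdp(params):
    """Format a parameter dictionary as 'name = value' lines of an .mdp file."""
    width = max(len(name) for name in params)
    lines = [f"{name.ljust(width)} = {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mdp(EM_PARAMS), end="")
```

The rendered text could be saved as, say, `minim.mdp` and passed to `gmx grompp`; any real run should be checked against the GROMACS documentation for the version in use.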

To optimise the systems further, i.e. to bring them to the required simulation temperature and density, they were equilibrated under the NVT and NPT ensembles, respectively. The V-rescale thermostat and the Parrinello-Rahman barostat were used for the NVT equilibration (at 300 K) and the NPT equilibration (at a pressure of 1 bar), respectively. Each equilibration step was run for 200 ps. After the positional restraints were released, the equilibrated systems were set up for the final production run: 3 ns for the E. coli complex and 5 ns for the Salmonella complex.

Visualisation

VMD 1.9.3 software was used for visualising the systems at different steps of simulations as well as the final trajectory files. The final trajectories were corrected for the periodicity and then visualised (Videos below). All the graphs were plotted using the inbuilt GROMACS commands and visualised using Xmgrace.

For E. coli: The RMSD fluctuates in the range of 0.2-0.4 nm over the 3 ns simulation. The fluctuations appear to stabilise over time. However, to establish the stability of the complex firmly, the system still needs to be simulated for a longer duration, which was unfortunately not possible given the system requirements.

E. coli MDS

E. coli RMSD

For Salmonella: The RMSD fluctuates in the range of 0.1-0.2 nm over the 5 ns simulation. The RMSD values show smaller fluctuations over the trajectory, indicating that the complex settles into its preferred conformations. A stable RMSD trajectory over the simulation time indicates the stability of the docked complex.

Salmonella MDS

Salmonella RMSD
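The RMSD values discussed for both complexes measure how far the atoms of a trajectory frame have moved from a reference structure. The Python sketch below shows the bare formula only; it is not the GROMACS implementation, which (in `gmx rms`) first performs a least-squares fit of each frame onto the reference. The function name `rmsd` and the toy coordinates are illustrative.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally long coordinate lists.

    Each entry is an (x, y, z) tuple in nm. No superposition is performed,
    so the frames are assumed to be already aligned.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Every atom displaced by 0.3 nm along x.
    frame = [(x + 0.3, y, z) for x, y, z in reference]
    print(round(rmsd(frame, reference), 3))
```

A frame in which every atom is displaced by 0.3 nm gives an RMSD of 0.3 nm, which is the same order of magnitude as the fluctuations reported above.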

Neurasyn takes in electrical data from the aptamers and reacts to it in real time. Neurons connected to the input are sensitive to signals in the range of 70-100 mV. We use this property to detect the change in electrical impedance and quantify the amount of receptor-ligand interaction on our test strip. Higher impedance implies a lower current input to that particular set of neurons. The chip is trained so that the neurons are activated to send a signal when the current input is lowered. This is achieved by cross-connecting the input and output electrodes. We detect using electrical impedance data rather than fluorescent markers because it is more reliable and is a linear property of aptamers. This method of detection also allows us to overcome the problems caused by faulty equipment and sensitivity issues in traditional detection methods. By detecting bacteria at the cellular level we also avoid the problem of cell-pixel loss, which makes for a platform that can easily be expanded to multi-cohort detection.

To model such systems, it is necessary to take into account not only the training of the neurons but also the overall activity of the group of neurons in the network. We use Nengo, a Python library, to model our hardware. This allows us to simulate large systems with ease and to examine the global properties of the network.

It is difficult to model the network mathematically from the detailed firing properties of every single neuron. We therefore use a LIF (leaky integrate-and-fire) model for the individual neurons in the network to reduce the computation time. The overall result, however, is a weighted sum of all the output neurons, and changing the neuron model does not influence the output to a great degree. This was verified by modelling the same system with a smaller number of neurons in an HH (Hodgkin-Huxley) model with the alpha and beta coefficients of real neurons.
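To show why the LIF model is cheap to simulate, here is a minimal single-neuron sketch in plain Python. It is not taken from our Nengo model: the function name `simulate_lif`, the normalised threshold of 1 and the time constants (20 ms membrane, 2 ms refractory) are illustrative assumptions.

```python
def simulate_lif(input_current, dt=0.001, tau_rc=0.02, tau_ref=0.002, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron and return its spike times.

    `input_current` holds one (normalised) input value per time step of
    length `dt` seconds. The membrane voltage leaks towards the input,
    and a spike is emitted whenever it reaches `threshold`.
    """
    refractory_steps = round(tau_ref / dt)
    voltage = 0.0
    wait = 0  # remaining refractory steps
    spike_times = []
    for step, current in enumerate(input_current):
        if wait > 0:
            wait -= 1  # neuron is silent during the refractory period
            continue
        # Forward-Euler update of dV/dt = (J - V) / tau_rc
        voltage += dt * (current - voltage) / tau_rc
        voltage = max(voltage, 0.0)
        if voltage >= threshold:
            spike_times.append(step * dt)
            voltage = 0.0
            wait = refractory_steps
    return spike_times


if __name__ == "__main__":
    for level in (0.5, 1.5, 2.0):
        spikes = simulate_lif([level] * 1000)  # one second of constant input
        print(level, len(spikes))
```

Each time step costs a single multiply-add per neuron, whereas a Hodgkin-Huxley neuron requires integrating several coupled gating variables. With a constant input below the threshold the neuron stays silent, and above the threshold the firing rate grows with the input.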

In our model we create two computational sections, the input and the output groups. These are encoded as neuron ensembles containing 1000 and 200 neurons, respectively. These numbers are approximated from our wet lab images of the neural chip. A variation analysis of the neuron number did not cause any significant change in the output, which demonstrates the robustness of the hardware.

The input ensemble (calc 1 and 2) is connected to 3 stimuli of dimension 1. The dimension refers to the number of variable stimuli under consideration. Since NeuraSyn detects 2 bacteria (Escherichia coli and Salmonella typhi) and 1 fungus (Penicillium chrysogenum), we decided to use the same number of stimuli in our model. In the Nengo framework the number of dimensions of the input stimulus can be changed very easily. This makes modelling future implementations of multi-cohort detection simple.

The output ensemble was monitored for its global activity. The dimension of the output ensemble was kept at 2 to match the physical chip. This means that the output ensemble contains two subgroups of neurons. We monitored the firing activity and the global network property, namely the voltage, for both subgroups of the output ensemble.

Response curves and tuning curves

Output currents simulated

The program was initially run for 20 cycles to generate the random connections in the network. The connectivity plots were plotted and the network was visualised. We then ran our system with varying input strengths to look at the activity of the output neurons. After many iterations the neurons were trained to closely decode the input parameter values.

Evolution of the synaptic connection for different input neurons.
