KAMI

NeuraSyn works by measuring and analysing the change in impedance that occurs when aptamers bind to a specific protein on the bacterial membrane. The aptamers we used were generated by whole-cell SELEX, the most commonly used and preferred method for developing selective, specific aptamers against a bacterium of interest. However, this method does not reveal which protein the aptamer binds to. This limits the potential of the technique, because the aptamer cannot then be modified in silico to increase its binding ability. With the receptor unknown, we also cannot use the aptamers as therapeutic agents, nor check whether an aptamer binds to proteins of other origins because of structural homology.

We therefore developed KAMI (Kwick Aptamer-based Motif Identification). The software takes as input FASTA files containing the polypeptide sequences of the proteins of interest. It also offers the option of generating polypeptides that mimic natural proteins.
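
To illustrate the kind of input described above, a multi-record FASTA file can be read into (header, sequence) pairs as sketched below. This is a generic reader written for illustration, not KAMI's own parser: the function names `read_fasta` and `standard_only` and the demo headers are hypothetical, and dropping chains with non-standard letters is an assumption about how such input might be cleaned.

```python
import io

STANDARD = set("ARNDCQEGHILKMFPSTWYV")  # the 20 standard amino acids


def read_fasta(handle):
    """Return a list of (header, sequence) pairs from an open FASTA file."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def standard_only(records):
    """Keep chains made only of the 20 standard letters (assumed clean-up step)."""
    return [(h, s) for h, s in records if s and set(s) <= STANDARD]


# Hypothetical two-chain input; the second chain contains the non-standard letter X.
example = io.StringIO(">chain_1 demo\nMKTL\nLVAA\n>chain_2 demo\nACDX\n")
records = read_fasta(example)
print(standard_only(records))  # keeps only chain_1
```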

Generation of Polypeptides

We analysed proteins and did a literature survey to come up with rules for generating polypeptides that mimicked natural protein fragments.

20 Amino Acids: A R N D C Q E G H I L K M F P S T W Y V

Amino acids with α helix propensity (α pool): A L R M K Q E H

Amino acids with β sheet propensity (β pool): I V T F Y N W C M L S Q

Two generated sequences cannot be the same.
The same amino acid (letter) can appear at most 4 times consecutively in a peptide sequence.
E.g. ATWITTAACCCCWWS allowed; KTDDDDWTIIIIIGE not allowed
The first amino acid (letter) of every sequence is chosen at random.
If an amino acid (letter) is from the α pool, then the next amino acid (letter) has a higher probability of being from the α pool than from the β pool, i.e., P(αi|αi−1) > P(βi|αi−1).
Similarly, P(βi|βi−1) > P(αi|βi−1).
Here αi is a random amino acid (letter) from the α pool at the ith position in the sequence, and βi is defined likewise for the β pool.
The amino acid (letter) from within a pool (α or β) is chosen at random.
For every sequence we can generate:
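
The generation rules listed above can be sketched as a small sampler. This is a minimal illustration and not KAMI's actual generator. Several choices are assumptions, because the rules leave them open: the probability of staying in the current pool (`P_STAY = 0.7`; the rules only require it to exceed 0.5), tracking the pool that was sampled from (M, L and Q belong to both pools), and picking a starting pool at random when the first letter is in both pools or in neither (D, G, P). In this sketch D, G and P can therefore appear only at the first position.

```python
import random
from itertools import groupby

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
ALPHA_POOL = "ALRMKQEH"
BETA_POOL = "IVTFYNWCMLSQ"
MAX_RUN = 4    # longest allowed run of one letter (from the rules)
P_STAY = 0.7   # assumed value; the rules only require more than 0.5


def longest_run(seq):
    """Length of the longest stretch of identical consecutive letters."""
    return max(len(list(group)) for _, group in groupby(seq))


def initial_pool(letter, rng):
    """Pool state for the first letter; ambiguous letters pick a pool at random."""
    in_alpha, in_beta = letter in ALPHA_POOL, letter in BETA_POOL
    if in_alpha != in_beta:
        return "alpha" if in_alpha else "beta"
    return rng.choice(["alpha", "beta"])


def generate_peptide(length, rng, p_stay=P_STAY):
    """Generate one peptide of the given length (length >= 1)."""
    seq = [rng.choice(AMINO_ACIDS)]      # first letter: uniform over all 20
    state = initial_pool(seq[0], rng)
    while len(seq) < length:
        if rng.random() >= p_stay:       # switch pool with probability 1 - p_stay
            state = "beta" if state == "alpha" else "alpha"
        pool = ALPHA_POOL if state == "alpha" else BETA_POOL
        letter = rng.choice(pool)        # uniform choice within the pool
        while seq[-MAX_RUN:] == [letter] * MAX_RUN:  # would create a run of 5
            letter = rng.choice(pool)
        seq.append(letter)
    return "".join(seq)


def generate_library(count, length, seed=0):
    """Generate `count` distinct peptides; duplicates are discarded and redrawn."""
    rng = random.Random(seed)
    unique = set()
    while len(unique) < count:
        unique.add(generate_peptide(length, rng))
    return sorted(unique)


library = generate_library(count=50, length=15, seed=1)
print(library[:3])
```

The `longest_run` helper can be used to check the repeat rule on any sequence, including the two examples given in the rules.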

Conversion of polypeptides into vectors (PseAAC)

PseAAC (pseudo amino acid composition) is a method for converting protein sequences into numeric vectors for further processing. Either a single polypeptide is used, or multi-chain proteins are split into their chains and each chain is vectorized separately. The resulting vector combines the amino acid composition (AAC), i.e. the frequency of each amino acid, with the k-th tier correlation, i.e. the correlation between residues that are k positions apart. The vector has length 20 + lambda, where lambda is the maximum value of k. Proteins that carry out comparable functions or exhibit structural homology tend to have similar amino acid frequencies and residue correlations.

For every polypeptide of length L, we can write the primary sequence as the following vector:
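
In the usual PseAAC notation, which is assumed here, with R_i denoting the residue at position i, this reads:

```latex
\mathbf{P} = [\, R_1, R_2, R_3, \ldots, R_L \,]
```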

Based on the following formulae, the primary sequence vector can then be converted into the numeric PseAAC vector. Here, w is the weight factor and tau(k) is the k-th tier correlation factor, which captures the sequence-order correlation between residues that are k positions apart. Lambda is the maximum value of k.
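
Assuming the standard PseAAC definition, the components X_u of the (20 + lambda)-dimensional vector can be written as below. The symbol f_u, the occurrence frequency of the u-th of the 20 amino acids in the sequence, is notation introduced for this sketch; \tau_k and \lambda correspond to tau(k) and lambda in the text.

```latex
X_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \tau_j}, & 1 \le u \le 20 \\[2ex]
\dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \tau_j}, & 21 \le u \le 20 + \lambda
\end{cases}
```

The first 20 components carry the amino acid composition and the remaining lambda components carry the sequence-order information, with w controlling how strongly the latter is weighted.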

Tau can be formulated as follows:
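
In the standard PseAAC formulation, which is assumed here, the k-th tier correlation factor is the average of a pairwise term \Theta (a symbol introduced for this sketch) over all residue pairs that are k positions apart:

```latex
\tau_k = \frac{1}{L-k} \sum_{i=1}^{L-k} \Theta(R_i, R_{i+k}), \qquad k = 1, \ldots, \lambda, \quad \lambda < L

\Theta(R_i, R_j) = \frac{1}{\Gamma} \sum_{q=1}^{\Gamma} \left[ \Phi_q(R_j) - \Phi_q(R_i) \right]^2
```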

Here, phi_q(Ri) is the value of the q-th property function for residue Ri, and gamma is the total number of property functions under consideration.
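
The vectorization can be sketched in a few lines of Python. This is an illustration of the standard PseAAC calculation under stated assumptions and is not KAMI's implementation. The text does not say which property functions KAMI uses, so the Kyte-Doolittle hydropathy index stands in as a single example property (gamma = 1). The default weight `w = 0.05` and the standardisation of each property to zero mean and unit variance over the 20 amino acids are also assumptions.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy index, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardise(prop):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, lam, w=0.05, properties=(KYTE_DOOLITTLE,)):
    """Return the (20 + lam)-dimensional PseAAC vector of a peptide sequence."""
    if lam >= len(seq):
        raise ValueError("lambda must be smaller than the sequence length")
    props = [standardise(p) for p in properties]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]

    taus = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - k):
            # Mean squared property difference between residues i and i + k.
            total += sum((p[seq[i + k]] - p[seq[i]]) ** 2 for p in props) / len(props)
        taus.append(total / (len(seq) - k))

    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


vector = pseaac("ATWITTAACCCCWWS", lam=14)
print(len(vector))  # 34, i.e. 20 + 14
```

Because both parts share the same denominator, the components of the vector sum to 1, which makes vectors from sequences of different lengths directly comparable.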

The vectors produced by this method are used to train AI classifiers. We have categorised membrane proteins by structural similarity using the same vectorization method. In developing KAMI, we demonstrated that this method can identify the optimal binding protein for an aptamer generated by whole-cell SELEX, and we further tested the programme by identifying the ideal binding proteins for aptamers against E. coli and S. typhimurium.

We first download the FASTA data for all membrane proteins from the RCSB PDB and sort the entries according to the chains they are made up of. Using this file as input, KAMI performs k-means clustering on the polypeptide chains. The cluster centres are then docked against the target aptamer, and the cluster whose centre gives the lowest binding energy is taken to contain the most effective binding protein. This cluster is divided into segments, and randomly chosen sequences from each segment are docked. The segment that shows the best binding is selected and subdivided again, and this is repeated until the best binding protein is found. The number of clusters and segments is user-defined, although the programme also includes a predictor that recommends a number of clusters.

This approach significantly lowers the number of docking operations required to determine the optimal binding protein, and it makes it possible to identify the protein an aptamer binds to even though the aptamer was produced by whole-cell SELEX. This opens up the possibility of improving the aptamer's binding capability through in-silico alterations, as well as repurposing the aptamer.
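
The cluster-then-dock search described above can be sketched as follows. This is a minimal illustration under several assumptions, not KAMI's code. A small pure-Python k-means stands in for whatever clustering routine KAMI uses. `dock_energy` is a hypothetical callable representing an external docking run that returns a binding energy for the sequence at a given index. At every level the sketch docks the member nearest each cluster centre, whereas the text describes docking randomly chosen sequences from each segment.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centres[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def representatives(vectors, labels, centres):
    """Map each non-empty cluster to the index of its member nearest the centre."""
    reps = {}
    for c, centre in enumerate(centres):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(members, key=lambda i: sq_dist(vectors[i], centre))
    return reps


def narrow_down(vectors, dock_energy, k, seed=0):
    """Cluster, dock one representative per cluster, keep the best cluster, repeat."""
    candidates = list(range(len(vectors)))
    docked = 0
    while len(candidates) > k:
        sub = [vectors[i] for i in candidates]
        labels, centres = kmeans(sub, k, seed=seed)
        reps = representatives(sub, labels, centres)
        if len(reps) < 2:  # the candidates could not be split any further
            break
        energies = {c: dock_energy(candidates[i]) for c, i in reps.items()}
        docked += len(energies)
        best = min(energies, key=energies.get)  # lowest binding energy wins
        candidates = [candidates[i] for i, lab in enumerate(labels) if lab == best]
    final = {i: dock_energy(i) for i in candidates}  # dock what is left
    docked += len(final)
    return min(final, key=final.get), docked
```

`narrow_down` returns the index of the selected sequence together with the number of docking calls made, so the saving relative to docking every sequence can be checked directly. Like the procedure it illustrates, the search is greedy: it follows the best representative at each level and is not guaranteed to find the global optimum.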

General workflow of KAMI

t-SNE plot of generated polypeptides. The lambda value was set to 14, so the generated vectors were 34-dimensional (20 + 14). Clustering was done with 10 clusters.

Link to software: https://gitlab.igem.org/2022/software-tools/iiser-mohali

Steps to use KAMI:

Step 1.1: Upload a FASTA file (specify lambda and the number of clusters)

Step 1.2: Alternatively, generate random proteins (specify length, lambda and the number of clusters)

Step 2: Click run

Step 3: Download the cluster centres and aptamers, then begin docking.

References

Street AG, Mayo SL. Intrinsic beta-sheet propensities result from van der Waals interactions between side chains and the local backbone. Proc Natl Acad Sci U S A. 1999 Aug 3;96(16):9074-6. doi: 10.1073/pnas.96.16.9074. PMID: 10430897; PMCID: PMC17734.
Pace CN, Scholtz JM. A helix propensity scale based on experimental studies of peptides and proteins. Biophys J. 1998 Jul;75(1):422-7. doi: 10.1016/s0006-3495(98)77529-0. PMID: 9649402; PMCID: PMC1299714.
Armstrong KM, Baldwin RL. Charged histidine affects alpha-helix stability at all positions in the helix by interacting with the backbone charges. Proc Natl Acad Sci U S A. 1993 Dec 1;90(23):11337-40. doi: 10.1073/pnas.90.23.11337. PMID: 8248249; PMCID: PMC47977.
Katti MV, Sami-Subbu R, Ranjekar PK, Gupta VS. Amino acid repeat patterns in protein sequences: their diversity and structural-functional implications. Protein Sci. 2000 Jun;9(6):1203-9. doi: 10.1110/ps.9.6.1203. PMID: 10892812; PMCID: PMC2144659.