.
Many Puccinia spp. proteins have been sequenced and stored in databases. However, to a large extent, the functions and structures of these proteins remain uncharacterised. To identify viable protein targets, our project used protein sequence alignment and protein-protein interaction network methods. Following this, a peptide-based inhibitor was designed using known protein-binding proteins as templates. The peptide was produced by identifying the active site regions in these known protein-binding proteins, and was then tested using a protein-protein docking simulation. In parallel, the wet lab team used the literature to identify potential targets in the fungus and modelled these proteins in AlphaFold.
.
.
We began our modelling process by screening the existing Puccinia spp. literature for viable proteins to target. This involved reviewing the Puccinia graminis proteome through the literature and entries from the UniProt database (UniProt, 2022). We discovered that the Puccinia graminis proteome was largely uncharacterised, with only 15 of the 15808 proteins being reviewed (UniProt, 2022). Upon further analysis, these 15 proteins were found to be highly conserved amongst other species. Consequently, we ruled them out as potential targets due to concerns over off-target effects.
Viable protein targets would therefore need to be screened from the unreviewed section of UniProt. This presented a major challenge, due to the lack of characterised protein functions and the sheer volume of Puccinia graminis proteins. Additionally, many of these proteins were highly conserved. Thus, to aid our screening, we devised a pipeline using BLASTp and GeneMANIA to annotate unreviewed protein sequences (NCBI, 2022; Warde-Farley et al., 2010). This would allow us to assign biological functions to each unreviewed protein, using reviewed proteins from other species.
To aid our screening of the Puccinia graminis proteome, we decided to use BLASTp (NCBI, 2022). This allowed us to compare our unreviewed Puccinia graminis proteins with reviewed protein entries from other species. We assumed that proteins with >90% identity are equivalent to their annotated counterparts. This provides more annotated proteins with which to assess the importance of each protein to Puccinia graminis.
The first hurdle we faced was the sheer volume of proteins to be reviewed. Manually entering these proteins on the BLASTp webpage would be extremely inefficient. As a result, we started writing a Python script to run the BLAST searches and analyse the resulting data. This would allow us to identify similar protein sequences and their similarity scores.
However, the NCBI servers were not designed to handle many automated queries in quick succession (NCBI, 2022). So, we devised a plan to break the list of proteins into smaller lists and have 20 computers each run the script on one list. This would involve writing scripts to run the BLAST searches, as well as to resume at the correct protein if the process was interrupted. Time estimates were produced by running a single BLASTp search on the NCBI site, which took approximately 1.5-2 minutes. At 2 minutes per search, it would take 20 computers continuously running the script about 26 hours to finish searching the entire proteome.
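The plan and time estimate above can be sketched as follows. This is an illustrative outline only, not the script we ran: the function names and the placeholder protein IDs are assumptions, and the estimate simply assumes each machine runs its share of queries back to back.

```python
import math

def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous lists whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def remaining(chunk, done_ids):
    """Proteins in a chunk with no saved result yet, so an interrupted run can resume."""
    done = set(done_ids)
    return [protein for protein in chunk if protein not in done]

def estimated_hours(n_proteins, minutes_per_query, n_machines):
    """Wall-clock hours if every machine works through its share without pausing."""
    queries_per_machine = math.ceil(n_proteins / n_machines)
    return queries_per_machine * minutes_per_query / 60

# Placeholder IDs standing in for the 15808 Puccinia graminis entries.
protein_ids = [f"protein_{i}" for i in range(15808)]
chunks = split_into_chunks(protein_ids, 20)

print(len(chunks), len(chunks[0]), len(chunks[-1]))   # 20 791 790
print(round(estimated_hours(15808, 2.0, 20), 1))      # 26.4
print(round(estimated_hours(15808, 1.5, 20), 1))      # 19.8
```

The 26-hour figure corresponds to the upper end of the measured range (2 minutes per search); at 1.5 minutes per search the same calculation gives roughly 20 hours.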
However, when the script ran BLAST, it was unable to produce results as quickly as searches made through the NCBI site. Further research showed that the NCBI servers were set to prioritise users searching manually, and that the servers only ran at full performance between certain times. These were significant issues, as the script's BLAST search for a single protein would take 10-20 minutes. Ultimately, this meant that web scraping was not viable.
To work around our web scraping hurdles, we were fortunate enough to gain access to Gadi, Australia's HPC system. This provided us with the computing power to run a local BLASTp. However, due to GeneMANIA's organism limitations, we decided to limit our BLASTp search to comparing Puccinia graminis's unannotated proteome against the reviewed proteomes of specific organisms. Only reviewed proteomes were selected because unreviewed proteins were unavailable to query in GeneMANIA.
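For readers who want to reproduce a local search, the sketch below builds the two BLAST+ commands typically involved: `makeblastdb` to index the reviewed proteomes and `blastp` to search against them. The file names, database name, thread count and the choice of tabular output (`-outfmt 6`) are assumptions for illustration; they are not taken from our Gadi job scripts.

```python
import shlex

def makeblastdb_cmd(fasta_path, db_name):
    """Command to index a protein FASTA file as a local BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]

def blastp_cmd(query_fasta, db_name, out_path, threads=8):
    """Command to search query proteins against the local database, tabular output."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_name,
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", out_path,
    ]

# Hypothetical file names, used only to show the resulting command lines.
commands = [
    makeblastdb_cmd("reviewed_proteomes.fasta", "reviewed_db"),
    blastp_cmd("pgraminis_unreviewed.fasta", "reviewed_db", "blastp_results.tsv"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.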
Following this, the results of our BLASTp were exported as a text file. This needed to be analysed to identify proteins with a high % identity to our Puccinia graminis proteins (the query proteins). Thus, a script was built to extract a list of proteins that shared >90% identity with our Puccinia graminis proteins.
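A minimal sketch of such a filter is given below. It assumes the results are in BLAST's tab-separated tabular layout, where the first three columns are the query ID, the subject ID and the percent identity; the actual layout of our text file may differ, and the IDs in the example are placeholders.

```python
import csv
import io

def high_identity_hits(handle, min_identity=90.0):
    """Yield (query_id, subject_id, percent_identity) for hits above the cutoff."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, subject_id, pident = row[0], row[1], float(row[2])
        if pident > min_identity:
            yield query_id, subject_id, pident

# Placeholder rows: only the first three columns matter for this filter.
example = io.StringIO(
    "query_A\tsubject_1\t95.20\t210\n"
    "query_A\tsubject_2\t61.00\t198\n"
    "query_B\tsubject_3\t90.00\t150\n"
)
print(list(high_identity_hits(example)))
# [('query_A', 'subject_1', 95.2)]
```

The comparison is strict, so a hit at exactly 90% identity is excluded, matching the ">90%" criterion.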
This produced a list of 174 protein sequences from the dataset of every model species' reviewed proteome (83495 protein sequences). These proteins were queried on GeneMANIA against the 9 available species databases, producing small protein-protein networks.
GeneMANIA takes the queried protein names and produces a protein-protein interaction network based on its databases. Each node represents a gene coding for a queried protein, and the edges represent known relationships from GeneMANIA's databases.
We used GeneMANIA's protein-protein interaction networks for model species to produce a protein-protein interaction network of Puccinia graminis proteins. This data was used to search for a protein that is vital to Puccinia graminis proliferation.
To screen for the best protein target to disrupt Puccinia graminis, we took the genes with the greatest number of edges, because inhibiting such a protein could also disrupt the proteins connected to it by edges. We then queried these genes on UniProt to ensure that each gene is not highly conserved in Homo sapiens or other non-Puccinia graminis organisms. This was repeated for all species databases on GeneMANIA. However, every candidate we found was highly conserved, meaning we could not identify any ideal protein targets.
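Ranking genes by their number of edges amounts to computing node degree. The sketch below assumes the network has been exported as a list of gene pairs; the gene names are placeholders and the export format is an assumption rather than a description of GENEMANIA's own files.

```python
def rank_by_degree(edges):
    """Return (gene, degree) pairs sorted from most to fewest distinct neighbours."""
    neighbours = {}
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-loops
        neighbours.setdefault(gene_a, set()).add(gene_b)
        neighbours.setdefault(gene_b, set()).add(gene_a)
    # Sort by descending degree, then alphabetically so ties are stable.
    return sorted(
        ((gene, len(partners)) for gene, partners in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Placeholder network; the repeated pair (B, A) is counted once.
edges = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "D")]
print(rank_by_degree(edges))
# [('A', 2), ('C', 2), ('B', 1), ('D', 1)]
```

Using sets of neighbours means that several edge types between the same two genes count as a single connection; counting every edge separately would be an equally defensible choice.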
Using model species databases from GeneMANIA does not give an accurate model of Puccinia graminis protein interactions. This is because the data used to generate the protein-protein interaction networks comes from experiments conducted on other species. Our model therefore assumes that protein-protein interactions are conserved between Puccinia graminis and GeneMANIA's model species. Furthermore, this assumption means that any protein target identified will carry a high risk of off-target effects, since the model is based on data from other species. It is also highly likely that the network produced consists mostly of genes that are highly conserved between distantly related organisms, which makes identifying a unique protein target for Puccinia graminis highly unlikely.
Nevertheless, using a model species' protein-protein interaction network to model Puccinia graminis's interactions can be justified, because the quality of the model produced depends on how closely related the two species are. This is evident when comparing the networks of Saccharomyces cerevisiae and Escherichia coli: Escherichia coli was unable to produce a network, due to insufficient data on the query proteins, whereas Saccharomyces cerevisiae produced the largest network.
We chose to closely analyse the Saccharomyces cerevisiae network, because this species was the most closely related to Puccinia graminis. For the genes in this network, BLASTp and UniProt were used to determine whether the protein each gene encodes was poorly conserved outside Puccinia graminis or outside Fungi. This process filtered out all proteins except JJJ2 and ZUO1; however, there is no evidence in the literature to support these proteins as targets for Puccinia graminis.
.
.
Template peptides were identified by screening the literature for proteins known to bind to our target protein. Following this, Latch Bio's AlphaFold was used to construct the 3D structure of the known peptide (Jumper et al., 2021). We modelled several variations of the known peptide, including an unaltered peptide and a variation fused with a GGSG linker and an mVenus fluorescent protein. Additionally, we modelled our protein target fused with an mCerulean fluorescent protein. These fusions were modelled to allow for FRET analysis between our protein target and peptide, and because resolved structures were lacking.
After generating the protein structures, we used ClusPro to perform protein-peptide docking simulations (Porter et al., 2017). This modelled potential binding sites between our protein target and peptide inhibitor. The top ten ranked models were used to identify potential active site regions in PyMOL. Active regions were identified by selecting peptide residues that were within 5 Angstroms of the receptor. The selected residues were validated by checking for polar contacts between residues and by referencing AlphaFold's pLDDT values (per-residue prediction confidence) for both the receptor and ligand structures. Additionally, we incorporated amino acids that were part of the secondary structures formed by our selected residues, as well as those influencing the structure of the selected residues.
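The distance criterion can be illustrated outside PyMOL with a short sketch. It assumes atom coordinates (in Angstroms) have already been extracted from a docked model; the data layout, function name and toy coordinates are assumptions for illustration, and this is not the PyMOL selection we used.

```python
import math

def contact_residues(peptide_atoms, receptor_atoms, cutoff=5.0):
    """Return peptide residue numbers with at least one atom within cutoff of the receptor.

    peptide_atoms: iterable of (residue_number, (x, y, z))
    receptor_atoms: iterable of (x, y, z)
    """
    receptor = list(receptor_atoms)
    selected = set()
    for residue_number, xyz in peptide_atoms:
        if residue_number in selected:
            continue  # residue already qualifies through another atom
        if any(math.dist(xyz, other) <= cutoff for other in receptor):
            selected.add(residue_number)
    return sorted(selected)

# Toy coordinates: residues 1 and 2 sit near a receptor atom, residue 3 does not.
peptide = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0)), (3, (20.0, 0.0, 0.0))]
receptor = [(0.0, 3.0, 0.0), (12.0, 3.0, 0.0)]
print(contact_residues(peptide, receptor))
# [1, 2]
```

A residue is selected as soon as any one of its atoms falls inside the cutoff, which mirrors a residue-level selection built from an atom-level distance test.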
PyMOL analysis data was used to identify overlaps in active sites between the top 10 ranked models. From these overlaps, we produced 5 unique potential active sites from the template peptide sequence. Each variation of the active site was used as a foundation to design two unique peptide sequences.
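One simple way to quantify such overlaps is to count how many docking models select each residue. The sketch below is a hypothetical illustration: the residue numbers are made up, and the support threshold is an assumption, since the overlaps described above were identified by inspection rather than by a fixed rule.

```python
from collections import Counter

def residue_support(models):
    """Count in how many docking models each peptide residue was selected."""
    counts = Counter()
    for residues in models:
        counts.update(set(residues))  # each model contributes at most once per residue
    return counts

def consensus_residues(models, min_models):
    """Residues selected in at least min_models of the docking models."""
    counts = residue_support(models)
    return sorted(residue for residue, n in counts.items() if n >= min_models)

# Made-up contact residues from three models.
models = [[3, 4, 5], [4, 5, 6], [5, 6, 7]]
print(consensus_residues(models, min_models=2))
# [4, 5, 6]
```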
Furthermore, a peptide consisting of only the linker and fluorescent protein was produced so that its docking simulations could be compared with those of the engineered peptides; this allowed us to observe the binding affinity of the linker and fluorescent protein themselves.
Given that the results of AlphaFold and ClusPro can be unreliable, we must test whether the predicted active sites are accurate. This will be achieved by testing peptide-protein binding affinity in the lab. For this purpose, we designed a peptide based specifically on the predicted active site. To produce a peptide that binds to the target protein and prevents other proteins from binding to it, we also created a second peptide. It was designed by substituting non-active site regions with parts of sequences that have high similarity to the template protein and known 3D structures. This reference to known structures will allow us to better understand the structure and binding potential of our peptides. Thus, using the data produced in the lab together with further AlphaFold structure prediction and ClusPro docking prediction, we can assess the binding affinity of our peptides and better understand the optimal structure and active site of our peptide.
.
.
As a scarcely studied organism, Puccinia spp. remains a mystery. As such, it was essential to our project to extensively model and predict the structure of the proteins involved in fungal pathogenesis. While the dry lab team was responsible for scouring genome databases, our wet lab team was tasked with consulting the literature to find effector proteins that have been at least partly characterised. In doing so, we compiled a list of potential targets in the fungus that we could hit using an inhibitory peptide. To broaden our approach, we also considered targeting plant proteins involved in the pathogenesis; by reducing their activity, the plant may be stimulated to mount an immune response against the fungus.
Target | Description of function | Structure characterisation | References |
---|---|---|---|
PstSTE12 | Found in Puccinia striiformis f. sp. tritici. Acts as a transcription factor that regulates the expression of proteins involved in the invasive growth of the fungus on wheat. Localised in the nucleus of the fungal cell. | Gene contains an open reading frame of 2637 bp. Protein comprises 879 amino acids. Motifs found in the protein: | (Zhu et al., 2017) |
PstSCR1 | Found in Puccinia striiformis f. sp. tritici. Small secreted cysteine-rich effector protein. Suppresses host immunity, mediates nutrient uptake and subsequently enables parasitism. It is predicted that the effector is only functional (in triggering a plant immune response) if secreted into the plant apoplast (the space between the cell wall and the cell membrane). | Mature transcript contains 488 bp. Protein has a chain of 116 amino acids. Motifs found in the protein: | (Dagvadorj et al., 2017) |
PgtSR1 | Found in Puccinia graminis. Involved in RNA silencing in plants to impede plant defences by altering the abundance of small RNAs that serve as defence regulators. Promotes susceptibility to multiple pathogens and partially suppresses cell death triggered by multiple plant resistance proteins. Localised in the plant cytoplasm and nucleus. | Protein contains a chain of 145 amino acids. | (Yin et al., 2019) |
PEC6 | Found in Puccinia striiformis f. sp. tritici. Effector protein that suppresses pattern-triggered immunity in the host wheat plant. Targets adenosine kinases (ADKs) and may affect metabolism regulation, cytokinin interconversion and methyl transfer reactions to favour fungal growth. Localised in the nucleus and cytoplasm of the plant cell. | Small, cysteine-rich protein with a total length of 66 amino acids. Contains a signal peptide of 22 amino acids at the N-terminus. Shown to interact with the C-terminus of ADK in yeast. | (Liu et al., 2016) |
Pst_12806 | Found in Puccinia striiformis f. sp. tritici. Upregulated during infection, and its knockdown reduces fungal growth and development, likely due to increased ROS accumulation. Translocates into chloroplasts and affects chloroplast function. Interacts with the C-terminal Rieske domain of the wheat TaISP protein (a component of the cytochrome b6-f complex that connects the photosystems). | Mature protein is 146 amino acids in length. Contains a signal peptide and a transit peptide at the N-terminus. | (Xu et al., 2019) |
PsHXT1 | Found in Puccinia striiformis f. sp. tritici. Hexose transporter; hexoses are a major form of sugar utilised by this obligate biotrophic fungus. Involved in nutrient uptake to sustain fungal growth and development. Indispensable for establishing the fungal-wheat interaction. Localised to the plasma membrane of the fungal cell. | Protein has a length of 551 amino acids. Predicted to have 12 transmembrane domains. | (Chang et al., 2020) |
Pgt-IaaM | Found in Puccinia graminis. This tryptophan 2-monooxygenase is involved in the synthesis of the auxin precursor indole-3-acetamide (IAM). Induces wheat plants to accumulate auxin in infected leaf tissue. Transient silencing of the gene in infected wheat plants indicated that it was required for full pathogenicity. Expressed in haustoria cells in infected plant tissue. | Protein has a predicted length of 588 amino acids. | (Yin et al., 2014) |
PGTG_10537.2/VPS9 complex | PGTG_10537.2 contains fibronectin type III and breast cancer type 1 susceptibility protein domains. VPS9 is a vacuolar protein sorting-associated protein 9 with a coupling of ubiquitin to endoplasmic reticulum degradation domain. These proteins are suggested to exist as a complex in vivo. Initiates the hypersensitive response in the host, causing significant damage to the leaf. Localised to the cytoplasm of the host plant cell. | PGTG_10537.2 is predicted to be 818 amino acids long. VPS9 is predicted to be 744 amino acids long. | (Nirmala et al., 2011) |
TaDIR1-2 | Defective in Induced Resistance 1 (DIR1-2) is a lipid transfer protein in wheat. Upon immune response induction, DIR1 moves from locally infected to distant uninfected leaves to activate defence priming. Knocking down the expression of TaDIR1-2 increased wheat resistance to Puccinia, accompanied by a hypersensitive response, increased accumulation of H2O2 and salicylic acid, increased expression of TaPR1, TaPR2, TaPAL and TaNOX, and decreased expression of two reactive oxygen species (ROS) scavenging genes, TaCAT and TaSOD. TaDIR1-2 acts as a negative regulator of wheat resistance to Puccinia by modulating ROS and/or salicylic acid-induced signalling. Localised in the cytoplasm and the cell membrane of wheat mesophyll protoplasts. | Protein is approximately 100 amino acids in length. Motifs found in the protein: | (Ahmed et al., 2017) |
Though the gene and/or amino acid sequences were known for these targets, there was limited understanding of their protein structures. To address this, our wet lab team predicted these structures from the amino acid sequences using AlphaFold v2.0. The results of this modelling are shown below. We then selected and proceeded with the targets that gave the most confident predictions.
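One way to compare prediction confidence between targets is to average the per-residue pLDDT, which AlphaFold writes into the B-factor column of its PDB output. The sketch below is a hypothetical illustration of that comparison, not the procedure we followed: the helper that builds example lines, the target names and the scores are all made up.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold output) over C-alpha atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return sum(scores) / len(scores)

def example_atom_line(serial, name, residue, number, plddt):
    """Build a minimal fixed-width ATOM record for the toy example (coordinates zeroed)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {residue:3s} A{number:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )

# Made-up structures for two hypothetical targets.
predictions = {
    "target_A": "\n".join([
        example_atom_line(1, "CA", "MET", 1, 90.0),
        example_atom_line(2, "N", "MET", 1, 10.0),   # ignored: not a C-alpha atom
        example_atom_line(3, "CA", "GLY", 2, 70.0),
    ]),
    "target_B": example_atom_line(1, "CA", "MET", 1, 55.0),
}
ranked = sorted(predictions, key=lambda name: mean_plddt(predictions[name]), reverse=True)
print(ranked)
# ['target_A', 'target_B']
```

A single mean hides poorly predicted regions, so inspecting the per-residue values alongside the average is advisable.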
Our team then decided to test the molecular interactions between the fungal targets and our designed peptide candidates using a Förster resonance energy transfer (FRET) assay. This required conjugating the targets with a fluorescent protein that gives an output signal when the fungal target and peptide are in close proximity. As such, we used AlphaFold v2.0 to model the structure of each fungal target joined by a short linker sequence to mCerulean3, the fluorescent protein that was available to us in our lab, as shown below. This was to predict any structural changes that may occur with such a large addition, which could severely affect the true binding of our peptide. To lessen this impact, we fused the fluorescent tag to whichever terminus of the protein was protruding outwards. Though this still resulted in some changes for a few of the protein targets, others showed no apparent conformational changes. We therefore proceeded to synthesise and test in the lab the mCerulean3-conjugated targets that retained the same conformation.
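The reason a FRET signal reports close proximity is the steep distance dependence of the transfer efficiency, E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor distance and R0 is the Förster distance at which efficiency is 50%. The sketch below evaluates this standard relation; distances are expressed as multiples of R0 because no R0 value for our fluorescent protein pair is assumed here.

```python
def fret_efficiency(distance, forster_distance):
    """Standard FRET efficiency: E = 1 / (1 + (r / R0)**6), same units for both inputs."""
    if distance < 0 or forster_distance <= 0:
        raise ValueError("distance must be >= 0 and forster_distance must be > 0")
    return 1.0 / (1.0 + (distance / forster_distance) ** 6)

# Distances given as multiples of R0 (so R0 = 1.0 here by construction).
for multiple in (0.5, 1.0, 1.5, 2.0):
    print(multiple, round(fret_efficiency(multiple, 1.0), 3))
# 0.5 0.985
# 1.0 0.5
# 1.5 0.081
# 2.0 0.015
```

Efficiency falls from about 98% at half of R0 to under 2% at twice R0, which is why even modest changes in how the tag is positioned can affect the measured signal.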
This approach was used in conjunction with the dry lab approach towards the common goal of building a better-resolved understanding of cereal rust pathogenesis and how our solution fits into it. Without this understanding of the fungal target structures, it would have been near impossible to design an inhibitory peptide suited to bind to them.
.
.
.
.
https://doi.org/10.5281/zenodo.7178227
https://doi.org/10.5281/zenodo.7178898
.
.