Many Puccinia spp. proteins have been sequenced and stored in databases. However, the functions and structures of these proteins remain largely uncharacterised. To identify viable protein targets, our project used protein sequence alignment and protein-protein interaction network methods. Following this, a peptide-based inhibitor was designed using proteins known to bind the target as templates. The peptide was produced by identifying the active site regions in these known binding proteins, and was tested using a protein-protein docking simulation. In parallel, the wet lab team used the literature to identify potential targets in the fungus and modelled these proteins in AlphaFold.


Protein Target Identification

We began our modelling process by screening the existing Puccinia spp. literature for viable proteins to target. This involved reviewing the Puccinia graminis proteome through the literature and entries in the UniProt database (UniProt, 2022). We discovered that the Puccinia graminis proteome was largely uncharacterised, with only 15 of its 15,808 proteins having been reviewed (UniProt, 2022). Upon further analysis, these 15 proteins were found to be highly conserved in other species. Consequently, we ruled them out as potential targets due to concerns over off-target effects.

Viable protein targets would therefore need to be screened from the unreviewed section of UniProt. This presented a major challenge, due to the lack of characterised protein functions and the sheer volume of Puccinia graminis proteins. Additionally, many of these proteins were highly conserved. To aid our screening, we devised a pipeline using BLASTp and GENEMANIA to annotate unreviewed protein sequences (NCBI, 2022; Warde-Farley et al., 2010). This would allow us to assign biological functions to each unreviewed protein, using reviewed proteins from other species.

BLASTp – Web scraping from NCBI

To aid our screening of the Puccinia graminis proteome, we used BLASTp (NCBI, 2022). This allowed us to compare our unreviewed Puccinia graminis proteins with reviewed protein entries from other species. We assumed that an unreviewed protein sharing >90% identity with a reviewed protein is functionally equivalent to its annotated counterpart. This gave us more annotated proteins with which to assess the importance of each protein to Puccinia graminis.

The first hurdle we faced was the sheer volume of proteins to be reviewed. Manually entering these proteins on the BLASTp webpage would be extremely inefficient. As a result, we started writing a Python script to run BLAST and analyse the resulting data. This would allow us to identify similar protein sequences and their similarity scores.

However, the NCBI servers are not designed to handle many scripted queries in quick succession (NCBI, 2022). We therefore planned to split the list of proteins into smaller lists and have 20 computers each run the script on one list. This would involve writing scripts to run BLAST, as well as to resume at the correct protein if the process was interrupted. We estimated the run time by running a single BLASTp search on the NCBI site, which took approximately 1.5-2 minutes. At that rate, 20 computers running the script continuously would take about 26 hours to finish blasting the entire proteome.

However, when the script ran BLAST, it was unable to produce results as quickly as a search on the NCBI site. Further research into NCBI showed that the servers prioritise users searching manually and only run scripted queries at full performance between certain times. These were significant issues, as the script's BLAST search for a single protein would take 10-20 minutes. Ultimately, this meant that web scraping was not viable.
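The plan above (splitting the protein list, estimating run time, and resuming after an interruption) can be sketched as follows. This is a minimal illustration, not the script the team wrote: the function names, the protein identifiers, and the idea of tracking completed IDs in a list are all assumptions made for the example.

```python
import math


def split_into_chunks(protein_ids, n_workers):
    """Split the ID list into at most n_workers contiguous, nearly equal chunks."""
    size = math.ceil(len(protein_ids) / n_workers)
    return [protein_ids[i:i + size] for i in range(0, len(protein_ids), size)]


def estimated_hours(n_proteins, n_workers, minutes_per_query):
    """Wall-clock estimate when every worker runs its queries one after another."""
    per_worker = math.ceil(n_proteins / n_workers)
    return per_worker * minutes_per_query / 60


def remaining(chunk, completed_ids):
    """IDs still to be searched, so an interrupted run can resume where it stopped."""
    done = set(completed_ids)
    return [pid for pid in chunk if pid not in done]


# Hypothetical identifiers standing in for the 15,808 proteome entries.
ids = [f"PROT_{i:05d}" for i in range(15808)]
chunks = split_into_chunks(ids, 20)

low = estimated_hours(len(ids), 20, 1.5)   # optimistic per-query time
high = estimated_hours(len(ids), 20, 2.0)  # pessimistic per-query time
print(f"{len(chunks)} chunks, estimated {low:.1f} to {high:.1f} hours")
```

With 1.5-2 minutes per query this gives roughly 20 to 26 hours, consistent with the 26-hour figure at the slower rate. The same function shows why a per-query time of 10-20 minutes makes the approach impractical.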

Fig 1. NCBI Usage Guidelines containing rules for high volume users

BLASTp – Local BLASTp on HPC Gadi

To work around our web scraping hurdles, we were fortunate enough to gain access to Gadi, Australia's HPC system. This provided us with the computing power to conduct a local BLASTp. However, due to GENEMANIA's organism limitations, we restricted our BLASTp search to the unannotated Puccinia graminis proteome against the reviewed proteomes of specific organisms. Only reviewed proteomes were selected because unreviewed proteins could not be queried in GENEMANIA.

Following this, the results of our BLASTp were exported as a text file. This needed to be analysed to identify proteins with a high % identity to our Puccinia graminis proteins (query proteins). Thus, a script was built to extract a list of proteins that shared >90% identity with our Puccinia graminis proteins.

This produced a list of 174 protein sequences from the dataset of every model species' reviewed proteome (83,495 protein sequences). These proteins were queried on GENEMANIA against the 9 available species databases and produced small protein-protein interaction networks.
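The extraction step might look like the sketch below. It assumes the BLASTp results were written in BLAST+ tabular format (`-outfmt 6`), where the first three tab-separated columns are query ID, subject ID and percent identity; the source only says the results were exported as a text file, so the format, the function name and the example identifiers are assumptions.

```python
import csv
import io


def high_identity_hits(handle, min_identity=90.0):
    """Return {query_id: [(subject_id, percent_identity), ...]} for hits above the threshold."""
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, subject_id, pident = row[0], row[1], float(row[2])
        if pident > min_identity:
            hits.setdefault(query_id, []).append((subject_id, pident))
    return hits


# Hypothetical tabular rows (12 columns each), not real BLAST output.
example = io.StringIO(
    "PGT_0001\tsp|X00001|EXAMPLE_YEAST\t95.2\t300\t14\t0\t1\t300\t1\t300\t1e-150\t600\n"
    "PGT_0001\tsp|X00002|EXAMPLE_MOUSE\t62.0\t290\t110\t2\t1\t290\t5\t294\t1e-80\t310\n"
    "PGT_0002\tsp|X00003|EXAMPLE_HUMAN\t91.5\t120\t10\t0\t1\t120\t1\t120\t1e-60\t240\n"
)
hits = high_identity_hits(example)
for query_id, matches in hits.items():
    print(query_id, matches)
```

Only the subject IDs that pass the threshold would then be carried forward as the query list for GENEMANIA.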

Fig 2. List of GENEMANIA organisms: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Homo sapiens and Saccharomyces cerevisiae.

GENEMANIA

GENEMANIA takes the queried protein names and produces a protein-protein interaction network based on its databases. Each node represents a gene coding for a queried protein, and the edges represent known relationships from GENEMANIA's databases.
We used GENEMANIA's protein-protein interaction networks of model species to approximate a protein-protein interaction network of Puccinia graminis proteins. This data was used to look for a protein that is vital to Puccinia graminis proliferation.

Fig 3. GENEMANIA network of Saccharomyces cerevisiae

Screening Proteins using GENEMANIA

To screen for the best protein target to disrupt Puccinia graminis, we took the genes with the greatest number of edges, because inhibiting such a protein could also disrupt the other proteins connected to it by edges. We then queried these genes on UniProt to check that each gene is not highly conserved in Homo sapiens or other non-Puccinia graminis organisms. This was repeated for all species databases on GENEMANIA. However, every candidate we found was highly conserved, meaning we could not identify any ideal protein targets.
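Ranking genes by their number of edges can be sketched as below. The example assumes the network has been exported as a list of gene pairs; the export format, the function name and the gene names are hypothetical and are not taken from the team's data.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (gene, degree) pairs, highest degree first, ties broken alphabetically.

    Duplicate and reversed pairs are counted once, and self-loops are ignored.
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)
    return sorted(
        ((gene, len(linked)) for gene, linked in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical edge list standing in for an exported network.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_A"),  # same relationship as GENE_A-GENE_C, counted once
]
ranked = rank_by_degree(edges)
print(ranked[:3])
```

The genes at the top of this ranking correspond to the central nodes that were then checked for conservation on UniProt.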

Fig 4. GENEMANIA network of Saccharomyces cerevisiae with circled central nodes

Analysis Results

Model Issues

Using model species databases from GENEMANIA does not give an accurate model of Puccinia graminis protein interactions. This is because the data used to generate the protein-protein interaction networks comes from experiments conducted on other species. Our model therefore assumes that protein-protein interactions are conserved between Puccinia graminis and GENEMANIA's model species. Furthermore, this assumption means that any protein target identified will carry a high risk of off-target effects, since the model is based on data from other species. It is also highly likely that the protein-protein interaction network produced consists mostly of genes that are highly conserved between distantly related organisms, which makes identifying a unique protein target for Puccinia graminis very unlikely.

Validity of the Model

The use of a model species' protein-protein interaction network to model Puccinia graminis's protein-protein interactions is valid to the extent that the two species are closely related, because the network produced depends on how closely related they are. This is evident when comparing the networks for Saccharomyces cerevisiae and Escherichia coli: Escherichia coli was unable to produce a network due to insufficient data on the query proteins, whereas Saccharomyces cerevisiae produced the largest network.

Analysing the Saccharomyces cerevisiae Model

We chose to closely analyse the Saccharomyces cerevisiae network because this species was the most closely related to Puccinia graminis. For the genes in this network, BLASTp and UniProt were used to determine whether the protein each gene produced was specific to Puccinia graminis or to Fungi, that is, not highly conserved in other organisms. This process filtered out all proteins except for JJJ2 and ZUO1; however, we found no evidence in the literature to support these proteins as targets in Puccinia graminis.

Future work

Build a pipeline to take the proteins from BLASTp and filter out proteins that are highly conserved with distantly related organisms
Analyse the networks from the other model species on GENEMANIA
Consider analysing RNA-seq datasets of Puccinia graminis infection to identify potential protein targets


Designing a Peptide-Based Inhibitor for Puccinia spp. Proteins

Template peptides were identified by screening the literature for proteins known to bind to our target protein. Following this, Latch Bio's AlphaFold was used to construct the 3D structure of the known peptide (Jumper et al., 2021). We modelled several variations of the known peptide, including an unaltered peptide and a variation fused to a GGSG linker and an mVenus fluorescent protein. Additionally, we modelled our protein target fused to an mCerulean fluorescent protein. These fusions were modelled to allow for FRET analysis between our protein target and peptide, and because no resolved structures were available.

Active Site Determination

After generating the protein structures, we used ClusPro to perform protein-peptide docking simulations (Porter et al., 2017). This modelled potential binding sites between our protein target and peptide inhibitor. The top ten ranked models were used to identify potential active site regions in PyMOL. Active regions were identified by selecting peptide residues that were within 5 Angstroms of the receptor. The selected residues were validated by checking for polar contacts between residues and by referencing AlphaFold's pLDDT values (predicted accuracy) for both the receptor and ligand structures. Additionally, we included amino acids that were part of the secondary structures formed by our selected residues, as well as those influencing the structure of the selected residues.
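The 5 Angstrom distance criterion can be illustrated outside PyMOL with a short sketch. It assumes atom coordinates have already been read from a docked model into plain lists; the function name, the input layout and the coordinates are invented for the example and do not reproduce the team's PyMOL workflow.

```python
import math


def contact_residues(peptide_atoms, receptor_atoms, cutoff=5.0):
    """Return sorted peptide residue numbers with any atom within `cutoff` of the receptor.

    Both arguments are lists of (residue_number, (x, y, z)) with coordinates in Angstroms.
    """
    contacts = set()
    for residue_number, coord in peptide_atoms:
        if residue_number in contacts:
            continue  # one close atom is enough to select the residue
        if any(math.dist(coord, other) <= cutoff for _, other in receptor_atoms):
            contacts.add(residue_number)
    return sorted(contacts)


# Hypothetical coordinates: peptide residues 1 and 3 sit near the receptor, residue 2 does not.
peptide = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0)), (3, (20.0, 0.0, 0.0))]
receptor = [(50, (0.0, 4.0, 0.0)), (51, (22.0, 3.0, 0.0))]
selected = contact_residues(peptide, receptor)
print(selected)
```

A distance check of this kind only proposes candidate residues. The polar-contact and pLDDT checks described above are still needed to validate them.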

Designing a Novel Peptide Inhibitor

The PyMOL analysis data was used to identify overlaps in active sites between the top 10 ranked models. From these overlaps we derived 5 unique potential active sites within the template peptide sequence. Each variation of the active site was used as the foundation for two unique peptide sequences.
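One way to find overlaps between models is to count how often each residue is selected and then group the recurring residues into contiguous segments. The sketch below assumes each model has been reduced to a set of residue numbers; the threshold, the function names and the example sets are assumptions, and the source does not state how the overlaps were actually scored.

```python
from collections import Counter


def consensus_residues(models, min_models):
    """Residues selected in at least `min_models` of the docking models."""
    counts = Counter(res for model in models for res in set(model))
    return sorted(res for res, count in counts.items() if count >= min_models)


def group_into_segments(residues, max_gap=1):
    """Merge residue numbers into (start, end) segments, allowing steps of up to `max_gap`."""
    segments = []
    for res in sorted(residues):
        if segments and res - segments[-1][1] <= max_gap:
            segments[-1][1] = res
        else:
            segments.append([res, res])
    return [tuple(segment) for segment in segments]


# Hypothetical contact sets from three docking models.
models = [{3, 4, 5, 10}, {4, 5, 6, 10, 11}, {4, 5, 11, 20}]
shared = consensus_residues(models, min_models=2)
segments = group_into_segments(shared)
print(shared, segments)
```

Each resulting segment is a candidate active site region that could seed a peptide design.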

Peptide 1 is made by removing parts of the sequence that were not part of the active site (not including linker or fluorescent protein).

Peptide 2 is made by substituting non-active site areas with a part of another sequence that had high similarity to the template peptide. This was produced by 3 runs of PSI-BLAST against the PDB database (Altschul et al., 1997).
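The two design rules can be sketched as simple string operations. This is a simplified illustration: it assumes the active site is one contiguous span given by 1-based positions, and that the similar sequence has already been aligned to the template position by position with `-` marking gaps. The sequences and function names are invented for the example.

```python
def trim_to_active_site(template, site_start, site_end):
    """Peptide 1 sketch: keep only the active-site span (1-based, inclusive)."""
    return template[site_start - 1:site_end]


def substitute_flanks(template, homolog_aligned, site_start, site_end):
    """Peptide 2 sketch: keep active-site residues, take all other positions from the homolog.

    Where the homolog has a gap ('-'), the template residue is kept.
    """
    if len(template) != len(homolog_aligned):
        raise ValueError("sequences must be aligned to the same length")
    designed = []
    for position, (t_res, h_res) in enumerate(zip(template, homolog_aligned), start=1):
        inside_site = site_start <= position <= site_end
        designed.append(t_res if inside_site or h_res == "-" else h_res)
    return "".join(designed)


# Hypothetical sequences; positions 4-7 play the role of the active site.
template = "ACDEFGHIKL"
homolog = "MCDQFGHV-L"
peptide_1 = trim_to_active_site(template, 4, 7)
peptide_2 = substitute_flanks(template, homolog, 4, 7)
print(peptide_1, peptide_2)
```

In practice the linker and fluorescent protein sequences would be appended afterwards, and each design would be re-modelled before docking.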

Furthermore, a peptide with only the linker and fluorescent protein was produced to compare its docking simulations to the engineered peptides; this is to observe the binding affinity of the linker and fluorescent protein itself.

Reasoning of methodology

Given that the results of AlphaFold and ClusPro can be unreliable, we must test whether the predicted active sites are accurate. This will be achieved by testing peptide-protein binding affinity in the lab. For this purpose, we designed a peptide based specifically on the predicted active site. To produce a peptide that binds to the target protein and prevents other proteins from binding to it, we also created a second peptide. It was designed by substituting the non-active site regions with parts of sequences that have high similarity to the template protein and known 3D structures. This reference to known structures will allow us to better understand the structure and binding potential of our peptides. Thus, using the data produced in the lab together with further AlphaFold structure prediction and ClusPro docking prediction, we can assess the binding affinity of our peptides and better understand their optimal structure and active site.

Future work

Producing a pipeline to analyse all peptide-protein docking structures for potential active sites
Using different protein-protein docking software to benchmark known docking structures, to identify the most accurate and reliable software to use
Using a variety of protein-protein docking software to generate more data to accurately predict active site regions
Using peptides and proteins without the fluorescent protein and linker attached for docking simulations, to compare the effects of the linker and fluorescent protein to binding affinity


Puccinia spp. is a scarcely studied organism, and much about it remains unknown. As such, it was essential to our project to extensively model and predict the structure of the proteins involved in fungal pathogenesis. While the dry lab team was responsible for scouring genome databases, our wet lab team was tasked with consulting the literature to find effector proteins that had been at least partly characterised. In doing so, we compiled a list of potential targets in the fungus that we could hit using an inhibitory peptide. To broaden our approach, we also considered targeting plant proteins involved in the pathogenesis, since reducing their activity may stimulate the plant to mount an immune response against the fungus.

Target	Description of function	Structure characterisation	References
PstSTE12	Found in Puccinia striiformis f. sp. tritici; acts as a transcription factor that regulates the expression of proteins involved in the invasive growth of the fungus on wheat; localised in the nucleus of the fungal cell	Gene contains an open reading frame of 2637 bp; protein is 879 amino acids long; motifs found in the protein: three helices in the homeodomain, conserved phenylalanine and tryptophan sites, and two C2/H2-Zn2+ finger domains; the C-terminus is necessary for the activation of transcription	(Zhu et al., 2017)
PstSCR1	Found in Puccinia striiformis f. sp. tritici; small secreted cysteine-rich effector protein; suppresses host immunity, mediates nutrient uptake and subsequently enables parasitism; the effector is predicted to be functional (in triggering the plant immune response) only if secreted into the plant apoplast (the extracellular space outside the plasma membrane, including the cell wall)	Mature transcript contains 488 bp; protein is 116 amino acids long; motifs found in the protein: three conserved (Y/F/W)x(C) motifs, one of which is located in the N-terminus (the first amino acid of this motif is aromatic, i.e. tyrosine, phenylalanine or tryptophan, and the last is always cysteine); contains a signal peptide, which facilitates the crossing of proteins through cellular membranes	(Dagvadorj et al., 2017)
PgtSR1	Found in Puccinia graminis; involved in RNA silencing in plants, impeding plant defences by altering the abundance of small RNAs that serve as defence regulators; promotes susceptibility to multiple pathogens and partially suppresses cell death triggered by multiple plant resistance proteins; localised in the plant cytoplasm and nucleus	Protein is 145 amino acids long	(Yin et al., 2019)
PEC6	Found in Puccinia striiformis f. sp. tritici; effector protein that suppresses pattern-triggered immunity in the host wheat plant; targets adenosine kinases (ADKs) and may affect metabolism regulation, cytokinin interconversion and methyl transfer reactions to favour fungal growth; localised in the nucleus and cytoplasm of the plant cell	Small, cysteine-rich protein with a total length of 66 amino acids; contains a 22 amino acid signal peptide at the N-terminus; shown to interact with the C-terminus of ADK in yeast	(Liu et al., 2016)
Pst_12806	Found in Puccinia striiformis f. sp. tritici; upregulated during infection, and its knockdown reduces fungal growth and development, likely due to increased ROS accumulation; translocates into chloroplasts and affects chloroplast function; interacts with the C-terminal Rieske domain of the wheat TaISP protein (a component of the cytochrome b6-f complex that connects the photosystems)	Mature protein is 146 amino acids long; contains a signal peptide and a transit peptide at the N-terminus	(Xu et al., 2019)
PsHXT1	Found in Puccinia striiformis f. sp. tritici; hexoses are a major form of sugar utilised by this obligate biotrophic fungus; involved in nutrient uptake to sustain fungal growth and development; indispensable for establishing the fungal-wheat interaction; localised to the plasma membrane of the fungal cell	Protein is 551 amino acids long; predicted to have 12 transmembrane domains	(Chang et al., 2020)
Pgt-IaaM	Found in Puccinia graminis; this tryptophan 2-monooxygenase is involved in the synthesis of the auxin precursor indole-3-acetamide (IAM); induces wheat plants to accumulate auxin in infected leaf tissue; transient silencing of the gene in infected wheat plants indicated that it is required for full pathogenicity; expressed in haustoria cells in infected plant tissue	Protein has a predicted length of 588 amino acids	(Yin et al., 2014)
PGTG_10537.2/VPS9 complex	PGTG_10537.2 contains fibronectin type III and breast cancer type 1 susceptibility protein domains; VPS9 is a vacuolar protein sorting-associated protein 9 with a coupling of ubiquitin to endoplasmic reticulum degradation domain; these proteins are suggested to exist as a complex in vivo; initiates the hypersensitive response in the host, causing significant damage to the leaf; localised to the cytoplasm of the host plant cell	PGTG_10537.2 is predicted to be 818 amino acids long; VPS9 is predicted to be 744 amino acids long	(Nirmala et al., 2011)
TaDIR1-2	Defective in Induced Resistance 1 (DIR1-2) is a lipid transfer protein in wheat; upon immune response induction, DIR1 moves from locally infected to distant uninfected leaves to activate defence priming; knocking down the expression of TaDIR1-2 increased wheat resistance to Puccinia, accompanied by a hypersensitive response, increased accumulation of H2O2 and salicylic acid, increased expression of TaPR1, TaPR2, TaPAL and TaNOX, and decreased expression of two reactive oxygen species (ROS) scavenging genes, TaCAT and TaSOD; TaDIR1-2 acts as a negative regulator of wheat resistance to Puccinia by modulating ROS and/or salicylic acid-induced signalling; localised in the cytoplasm and the cell membrane of wheat mesophyll protoplasts	Protein is approximately 100 amino acids long; motifs found in the protein: 8 cysteine residues forming 4 intrachain disulfide bridges, a flexible hydrophobic cavity that interacts non-specifically with lipid molecules, and a proline-rich domain	(Ahmed et al., 2017)

Although the gene and/or amino acid sequences were known for these targets, there was limited understanding of their protein structures. To address this, our wet lab team predicted these structures from the amino acid sequences using AlphaFold v2.0. The resulting models are shown below. We then proceeded with the targets that gave the most confident predictions.
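One simple way to compare prediction confidence between targets is the mean per-residue pLDDT, which AlphaFold writes into the B-factor column of its PDB output. The sketch below assumes that criterion; the source does not state how confidence was compared, and the function names, target names and example records are hypothetical.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold output) over C-alpha atoms."""
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def rank_models(models):
    """models maps a target name to PDB text; returns (name, mean pLDDT), best first."""
    scored = ((name, mean_plddt(text)) for name, text in models.items())
    return sorted(scored, key=lambda item: -item[1])


def example_ca_line(residue_number, plddt):
    """Build a fixed-column C-alpha ATOM record for demonstration purposes only."""
    return (
        f"ATOM  {residue_number:5d}  CA  ALA A{residue_number:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Hypothetical two-residue "models" with invented pLDDT values.
models = {
    "TARGET_X": "\n".join(example_ca_line(i, s) for i, s in enumerate([90.0, 80.0], 1)),
    "TARGET_Y": "\n".join(example_ca_line(i, s) for i, s in enumerate([60.0, 50.0], 1)),
}
ranking = rank_models(models)
print(ranking)
```

A mean hides local variation, so per-residue scores around the expected binding region are worth inspecting as well.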

HXT1

PEC6

PGTG_10537.2

Pgt-IaaM

Pst_12806

SCR1

PgtSR1

STE12

TaDIR1-2

VPS9

Our team then decided to test the molecular interactions between the fungal targets and our designed peptide candidates using a Förster resonance energy transfer (FRET) assay. This required conjugating each target to a fluorescent protein that gives an output signal when the fungal target and peptide are in close proximity. We therefore used AlphaFold v2.0 to model each fungal target joined by a short linker sequence to mCerulean3, the fluorescent protein that was available in our lab, as shown below. The aim was to predict any structural changes caused by such a large addition, which could severely affect the true binding of our peptide. To lessen this impact, we fused the fluorescent tag to whichever terminus of the protein protruded outwards. Although this still changed the predicted structure of a few protein targets, others showed no apparent conformational changes. We therefore proceeded to synthesise and test in the lab the mCerulean3-conjugated targets whose conformation remained the same.

HXT1 + mCerulean3

Pgt-IaaM + mCerulean3

PEC6 + mCerulean3

PgtSR1 + mCerulean3

SCR1 + mCerulean3

Pst_12806 + mCerulean3

This approach was used in conjunction with the dry lab approach towards a common goal: building a clearer understanding of cereal rust pathogenesis and of how our solution fits into it. Without this understanding of the fungal target structures, it would have been nearly impossible to design an inhibitory peptide suited to bind to them.


Ahmed, S.M. et al. (2017) “TaDIR1-2, a wheat ortholog of lipid transfer protein AtDIR1 contributes to negative regulation of wheat resistance against Puccinia striiformis f. sp. tritici,” Frontiers in Plant Science, 8. Available at: https://doi.org/10.3389/fpls.2017.00521.
Chang, Q. et al. (2020) “Hexose Transporter PsHXT1‐mediated sugar uptake is required for pathogenicity of wheat stripe rust,” Plant Biotechnology Journal, 18(12), pp. 2367–2369. Available at: https://doi.org/10.1111/pbi.13398.
Dagvadorj, B. et al. (2017) “A Puccinia striiformis f. sp. tritici secreted protein activates plant immunity at the cell surface,” Scientific Reports, 7(1). Available at: https://doi.org/10.1038/s41598-017-01100-z.
Jumper, J. et al. (2021) “Highly accurate protein structure prediction with AlphaFold,” Nature, 596(7873), pp. 583–589. Available at: https://doi.org/10.1038/s41586-021-03819-2.
Liu, C. et al. (2016) “The stripe rust fungal effector PEC6 suppresses pattern‐triggered immunity in a host species‐independent manner and interacts with adenosine kinases,” New Phytologist [Preprint]. Available at: https://doi.org/10.1111/nph.14034.
NCBI (2022) BLASTp. Available at: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins
NCI (2022) HPC Systems. Available at: https://nci.org.au/our-systems/hpc-systems
Nirmala, J. et al. (2011) “Concerted action of two avirulent spore effectors activates reaction to Puccinia graminis 1 (rpg1)-mediated cereal stem rust resistance,” Proceedings of the National Academy of Sciences, 108(35), pp. 14676–14681. Available at: https://doi.org/10.1073/pnas.1111771108.
Porter, K.A. et al. (2017) “ClusPro PeptiDock: efficient global docking of peptide recognition motifs using FFT,” Bioinformatics, 33(20), pp. 3299–3301. Available at: https://doi.org/10.1093/bioinformatics/btx216.
UniProt (2022) UniProt. Available at: https://www.uniprot.org/uniprotkb?query=proteome%3AUP000008783
Warde-Farley, D. et al. (2010) “The GENEMANIA prediction server: biological network integration for gene prioritization and predicting gene function,” Nucleic Acids Research, 38(suppl_2), pp. W214–W220. Available at: https://doi.org/10.1093/nar/gkq537.
Yin, C. et al. (2014) “Characterization of a tryptophan 2-monooxygenase gene from Puccinia graminis f. sp. tritici involved in auxin biosynthesis and rust pathogenicity,” Molecular Plant-Microbe Interactions, 27(3), pp. 227–235. Available at: https://doi.org/10.1094/mpmi-09-13-0289-fi.
Yin, C. et al. (2019) “A novel fungal effector from Puccinia graminis suppressing RNA silencing and plant defense responses,” New Phytologist, 222(3), pp. 1561–1572. Available at: https://doi.org/10.1111/nph.15676.
Zhu, X. et al. (2017) “The transcription factor PstSTE12 is required for virulence of Puccinia striiformis f. sp. tritici,” Molecular Plant Pathology, 19(4), pp. 961–974. Available at: https://doi.org/10.1111/mpp.12582.


Data and scripts used for protein target screening of Puccinia spp.

Link:

https://doi.org/10.5281/zenodo.7178227

Contents of package:

HPC Job Script used to BLASTp Puccinia spp. unreviewed proteins against reviewed proteins of model species
Script used to extract >90% identity from BLASTp data
List of protein names extracted from BLASTp data
Unreviewed Puccinia spp. proteins and reviewed proteins of model species in separate FASTA files

Data on Peptide Engineering

Link:

https://doi.org/10.5281/zenodo.7178898

Contents of package:

Active site analysis data and documentation of TaISP protein with PST12806 target
Data and documentation of peptide engineering design process
ClusPro docking simulation data between engineered peptides and PST12806 target
