Many Puccinia spp. proteins have been sequenced and stored in databases. However, to a large extent, the functions and structures of these proteins remain uncharacterised. To identify viable protein targets, our project used protein sequence alignment and protein-protein interaction network methods. Following this, a peptide-based inhibitor was designed using known protein-binding proteins as templates. The peptide was produced by identifying the active site regions in these known protein-binding proteins and was tested using a protein-protein docking simulation. Furthermore, the wet lab team used the literature to identify potential targets in the fungus and modelled these proteins in AlphaFold.


Protein Target Identification


We began our modelling process by screening the existing Puccinia spp. literature for viable proteins to target. This involved reviewing the Puccinia graminis proteome through the literature and entries in the UniProt database (UniProt, 2022). We discovered that the Puccinia graminis proteome was largely uncharacterised, with only 15 of the 15808 proteins being reviewed (UniProt, 2022). Upon further analysis, these 15 proteins were found to be highly conserved in other species. Consequently, we ruled them out as potential targets due to concerns over off-target effects.

Viable protein targets would therefore need to be found in the unreviewed section of UniProt. This presented a major challenge, due to the lack of characterised protein functions and the sheer volume of Puccinia graminis proteins. Additionally, many of these proteins were highly conserved. Thus, to aid our screening, we devised a pipeline using BLASTp and GENEMANIA to annotate unreviewed protein sequences (NCBI, 2022; Warde-Farley et al., 2010). This would allow us to assign biological functions to each unreviewed protein, using reviewed proteins from other species.


BLASTp – Web scraping from NCBI


To aid our screening of the Puccinia graminis proteome, we decided to use BLASTp (NCBI, 2022). This allowed us to compare our unreviewed Puccinia graminis proteins with reviewed protein entries from other species. We assumed that an unreviewed protein sharing >90% identity with an annotated protein has the same function as its annotated counterpart. This would give us more annotated proteins from which to assess the importance of each protein to Puccinia graminis.

The first hurdle we faced was the sheer volume of proteins to be reviewed. Manually entering these proteins on the BLASTp webpage would be extremely inefficient. As a result, we started writing a Python script to run the BLAST searches and analyse the resulting data. This would allow us to identify similar protein sequences and their similarity scores.

However, the NCBI servers are not designed to handle many scripted queries in quick succession (NCBI, 2022). So, we planned to break the list of proteins into smaller lists and have 20 computers each run the script on one list. This would involve writing scripts to run the BLAST searches, as well as to resume at the correct protein if the process was interrupted. We estimated the time required by running a single BLASTp search on the NCBI site, which took approximately 1.5-2 minutes. At 2 minutes per search, it would take 20 computers running the script continuously about 26 hours to BLAST the entire proteome.

However, when the script ran BLAST, it was unable to produce results as quickly as a manual search on the NCBI site. Further research into NCBI showed that the servers prioritise users searching manually and only run scripted searches at full performance between certain times. These were significant issues, as the script's BLAST search for a single protein would take 10-20 minutes. Ultimately, this meant that web scraping was not viable.
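The batching plan and time estimate above can be sketched as follows. This is a minimal illustration, not the team's script: the function names and placeholder protein IDs are assumptions, and the only figures taken from the text are the 15808 proteins, the 20 computers and the 1.5-2 minutes per search.

```python
import math

def split_into_batches(items, n_batches):
    """Split items into contiguous, near-equal batches (one per machine)."""
    size = math.ceil(len(items) / n_batches)
    return [items[i:i + size] for i in range(0, len(items), size)]

def estimated_hours(n_proteins, minutes_per_query, n_machines):
    """Wall-clock hours if every machine works through its batch serially."""
    per_machine = math.ceil(n_proteins / n_machines)
    return per_machine * minutes_per_query / 60

# Placeholder IDs standing in for the unreviewed proteome (hypothetical names).
protein_ids = [f"protein_{i}" for i in range(15808)]
batches = split_into_batches(protein_ids, 20)

low = estimated_hours(len(protein_ids), 1.5, 20)
high = estimated_hours(len(protein_ids), 2.0, 20)
print(f"{len(batches)} batches, roughly {low:.1f} to {high:.1f} hours")
```

The 26-hour figure corresponds to the upper end of the per-search time. A resume step could be added by recording the index of the last completed protein in each batch.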

Fig 1. NCBI Usage Guidelines containing rules for high volume users

BLASTp – Local BLASTp on HPC Gadi


To work around our web scraping hurdles, we were fortunate enough to gain access to Australia's HPC system Gadi. This provided us with the computing power to conduct a local BLASTp. However, due to GENEMANIA's organism limitations, we decided to limit our search to a BLASTp of Puccinia graminis's unannotated proteome against the reviewed proteomes of specific organisms. Only reviewed proteomes were selected because unreviewed proteins could not be queried in GENEMANIA.

Following this, the results of our BLASTp were exported as a text file. This needed to be analysed to identify proteins with high % identity to our Puccinia graminis proteins (query proteins). Thus, a script was built to extract a list of proteins that shared >90% identity with our Puccinia graminis proteins.
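The filtering step described above might look like the following sketch. It assumes the BLASTp results were written in BLAST+ tabular format (`-outfmt 6`), where the first three tab-separated columns are the query ID, the subject ID and the percent identity; the source only says the results were exported as a text file, so this format, the function name and the example IDs are all assumptions.

```python
import csv
import io

def high_identity_hits(handle, min_identity=90.0):
    """Yield (query, subject, percent identity) for hits above the cut-off.

    Assumes BLAST+ tabular output (-outfmt 6): tab-separated, with the
    query ID, subject ID and percent identity in the first three columns.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, identity = row[0], row[1], float(row[2])
        if identity > min_identity:
            yield query, subject, identity

# Two made-up result lines (hypothetical IDs and values, for illustration only).
example = io.StringIO(
    "PGTG_00001\tsp|P00001|EX1_YEAST\t95.2\t300\t14\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "PGTG_00002\tsp|P00002|EX2_YEAST\t62.0\t280\t100\t3\t1\t280\t5\t284\t1e-80\t300\n"
)
hits = list(high_identity_hits(example))
print(hits)
```

In practice the `io.StringIO` object would be replaced by an open file handle on the exported results.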

This produced a list of 174 protein sequences from the dataset of all reviewed proteome sequences of the model species (83495 protein sequences). These proteins were queried on GENEMANIA against the 9 available species databases, producing small protein-protein interaction networks.

Fig 2. List of GENEMANIA organisms: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Homo sapiens and Saccharomyces cerevisiae.

GENEMANIA


GENEMANIA takes the queried protein names and produces a protein-protein interaction network based on its databases. Each node represents a gene coding for a queried protein, and the edges represent known relationships from GENEMANIA's databases.
We used GENEMANIA's protein-protein interaction networks of model species to produce a protein-protein interaction network of Puccinia graminis proteins. This data was used to search for a protein that is vital to Puccinia graminis proliferation.

Fig 3. GENEMANIA network of Saccharomyces cerevisiae

Screening Proteins using GENEMANIA


To find the best protein target for disrupting Puccinia graminis, we took the genes with the greatest number of edges, because inhibiting such a protein could also disrupt the proteins connected to it by edges. We then queried these genes on UniProt to check that each gene is not highly conserved in Homo sapiens or other non-Puccinia graminis organisms. This was repeated for all species databases on GENEMANIA. However, every candidate we found was highly conserved, meaning we could not identify any ideal protein targets.
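Ranking genes by their number of edges is a node-degree calculation, which can be sketched as below. The edge list is a made-up placeholder rather than GENEMANIA output, and the function name is an assumption; how the network is exported from GENEMANIA is not covered here.

```python
from collections import Counter

def rank_by_degree(edges):
    """Return (gene, degree) pairs, most connected first.

    Edges are treated as undirected; duplicates and self-loops are ignored.
    """
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for gene_a, gene_b in unique:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by descending degree, then alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

# Placeholder network (hypothetical gene names).
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_A"),  # duplicate of GENE_A-GENE_C, counted once
]
ranking = rank_by_degree(edges)
print(ranking)
```

The top-ranked genes would then be the candidates checked for conservation on UniProt.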

Fig 4. GENEMANIA network of Saccharomyces cerevisiae with circled central nodes

Analysis Results


Model Issues

Using model species databases from GENEMANIA does not give an accurate model of Puccinia graminis protein interactions. This is because the data used to generate the protein-protein interaction networks comes from experiments conducted on other species. Our model therefore assumes that protein-protein interactions are conserved between Puccinia graminis and GENEMANIA's model species. Because of this assumption, any protein target identified will carry a high risk of off-target effects. It is also highly likely that the resulting network consists mostly of genes that are highly conserved between distantly related organisms, which makes identifying a target unique to Puccinia graminis highly unlikely.


Validity of the Model

The use of a model species' protein-protein interaction network to model Puccinia graminis's protein-protein interactions is valid to the extent that the two species are closely related, since the network produced depends on this relatedness. This is evident when comparing the networks of Saccharomyces cerevisiae and Escherichia coli: the Escherichia coli query was unable to produce a network, due to insufficient data on the query proteins, whereas Saccharomyces cerevisiae produced the largest network.


Analysing the Saccharomyces cerevisiae Model

We chose to analyse the Saccharomyces cerevisiae network closely, because this species was the most closely related to Puccinia graminis. Taking the genes from this network, we used BLASTp and UniProt to determine whether the protein each gene encodes was conserved only within Puccinia graminis or within Fungi. This process filtered out all proteins except for JJJ2 and ZUO1. However, we found no evidence in the literature to support these proteins as targets in Puccinia graminis.


Future work

  1. Build a pipeline to take the proteins from BLASTp and filter out proteins that are highly conserved in distantly related organisms
  2. Analyse the networks from the other model species on GENEMANIA
  3. Consider analysing RNA-seq datasets of Puccinia graminis infection to identify potential protein targets


Designing a Peptide-Based Inhibitor for Puccinia spp. Proteins


Template peptides were identified by screening the literature for proteins known to bind to our target protein. Following this, Latch Bio's AlphaFold was used to construct the 3D structure of the known peptide (Jumper et al., 2021). We modelled several variations of the known peptide, including an unaltered peptide and a variation fused to a GGSG linker and the mVenus fluorescent protein. Additionally, we modelled our protein target fused to an mCerulean fluorescent protein. These fusions were modelled to allow for FRET analysis between our protein target and peptide, and the structures had to be predicted because resolved structures were lacking.


Active Site Determination


After generating the protein structures, we used ClusPro to perform protein-peptide docking simulations (Porter et al., 2017). This modelled potential binding sites between our protein target and peptide inhibitor. The top ten ranked models were used to identify potential active site regions in PyMOL. Active regions were identified by selecting peptide residues that were within 5 Å of the receptor. The selected residues were validated by checking for polar contacts between residues and by referring to AlphaFold's pLDDT values (per-residue confidence scores) for both the receptor and ligand structures. Additionally, we included amino acids that were part of the secondary structures formed by our selected residues, as well as those influencing the structure of the selected residues.
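The 5 Å selection described above was done in PyMOL; the same idea can be sketched in plain Python as a distance check between atoms. The atom representation, the function name and the coordinates below are illustrative assumptions, not data from the docking models.

```python
import math

def residues_within(peptide_atoms, receptor_atoms, cutoff=5.0):
    """Return sorted peptide residue numbers with any atom within
    `cutoff` angstroms of any receptor atom.

    Each atom is a (residue_number, (x, y, z)) pair.
    """
    close = set()
    for res_num, coord in peptide_atoms:
        if res_num in close:
            continue  # residue already selected via another atom
        if any(math.dist(coord, r_coord) <= cutoff
               for _, r_coord in receptor_atoms):
            close.add(res_num)
    return sorted(close)

# Made-up coordinates (angstroms), one atom per residue for brevity.
peptide = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0)), (3, (20.0, 0.0, 0.0))]
receptor = [(50, (0.0, 4.0, 0.0)), (51, (12.0, 3.0, 0.0))]

contacts = residues_within(peptide, receptor)
print(contacts)
```

This brute-force loop compares every atom pair, which is fine for a short peptide; a spatial index would be the usual choice for large complexes.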


Designing a Novel Peptide Inhibitor


The PyMOL analysis data was used to identify overlaps in active sites between the top 10 ranked models. From these overlaps, we derived 5 unique potential active sites in the template peptide sequence. Each variation of the active site was used as a foundation to design two unique peptide sequences.

  • Peptide 1 is made by removing parts of the sequence that were not part of the active site (not including linker or fluorescent protein).

  • Peptide 2 is made by substituting non-active site areas with a part of another sequence that had high similarity to the template peptide. This was produced by 3 runs of PSI-BLAST against the PDB database (Altschul et al., 1997).


  • Furthermore, a peptide with only the linker and fluorescent protein was produced to compare its docking simulations to the engineered peptides; this is to observe the binding affinity of the linker and fluorescent protein itself.
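Finding overlaps between the contact residues of several docking models, as described above, can be sketched as a simple count across models followed by grouping into contiguous regions. The threshold of "seen in at least two models", the function names and the residue numbers are assumptions made for illustration; the source does not state how overlaps were scored.

```python
from collections import Counter

def consensus_residues(models, min_models):
    """Residues that appear in the contact set of at least `min_models` models."""
    counts = Counter(res for contacts in models for res in set(contacts))
    return sorted(res for res, n in counts.items() if n >= min_models)

def contiguous_regions(residues):
    """Group sorted residue numbers into (start, end) runs."""
    regions = []
    for res in residues:
        if regions and res == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], res)  # extend the current run
        else:
            regions.append((res, res))  # start a new run
    return regions

# Made-up contact residues from three docking models.
models = [
    [3, 4, 5, 6, 12],
    [4, 5, 6, 13],
    [4, 5, 6, 12, 13],
]
shared = consensus_residues(models, min_models=2)
regions = contiguous_regions(shared)
print(shared, regions)
```

Each resulting region is a candidate active site segment; a truncated peptide in the style of Peptide 1 would keep only these segments of the template sequence.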


    Reasoning of methodology


    Given that the results of AlphaFold and ClusPro can be unreliable, we must test whether the predicted active sites are accurate. This will be achieved by testing peptide-protein binding affinity in the lab. For this purpose, we designed a peptide based specifically on the predicted active site. To produce a peptide that binds to the target protein and prevents other proteins from binding to it, we also created a second peptide. It was designed by substituting non-active site regions with parts of sequences that have high similarity to the template protein and known 3D structures. This reference to known structures will allow us to better understand the structure and binding potential of our peptides. Thus, using the data produced in the lab together with further AlphaFold structure predictions and ClusPro docking predictions, we can assess the binding affinity of our peptides and better understand the optimal structure and active site of our peptide.


    Future work



    Puccinia spp. is a scarcely studied organism, and much about it remains unknown. As such, it was essential to our project to extensively model and predict the structures of the proteins involved in fungal pathogenesis. While the dry lab team was responsible for scouring genome databases, our wet lab team was tasked with consulting the literature to find partially characterised effector proteins. In doing so, we compiled a list of potential targets in the fungus that we could hit using an inhibitory peptide. To broaden our approach, we also considered targeting plant proteins involved in pathogenesis, since reducing their activity may stimulate the plant to mount an immune response against the fungus.


    Target: PstSTE12

    Description of function:
    Found in the Puccinia striiformis f. sp. tritici species
    Acts as a transcription factor that regulates the expression of proteins involved in the invasive growth of the fungus on wheat
    Localised in the nucleus of the fungal cell

    Structure characterisation:
    Gene contains an open reading frame of 2637 bp
    Protein is 879 amino acids long
    Motifs found in the protein:
    • three helices in the homeodomain
    • conserved phenylalanine and tryptophan sites
    • two C2/H2-Zn2+ finger domains
    It is noted that the C-terminus is necessary for the activation of transcription

    References: (Zhu et al., 2017)


    Target: PstSCR1

    Description of function:
    Found in the Puccinia striiformis f. sp. tritici species
    Small secreted cysteine-rich effector protein
    Suppresses host immunity, mediates nutrient uptake and subsequently enables parasitism
    It is predicted that the effector is only functional (in triggering plant immune response) if secreted into the plant apoplast (the space between the cell wall and the cell membrane)

    Structure characterisation:
    Mature transcript contains 488 bp
    Protein has a chain of 116 amino acids
    Motifs found in the protein:
    • three conserved (Y/F/W)x(C) motifs, one of which is located in the N-terminus
    • first amino acid of this motif is aromatic (tyrosine, phenylalanine or tryptophan) and the last is always cysteine
    Contains a signal peptide which facilitates crossing of proteins through cellular membranes

    References: (Dagvadorj et al., 2017)


    Target: PgtSR1

    Description of function:
    Found in the Puccinia graminis species
    Involved in RNA silencing in plants to impede plant defences by altering the abundance of small RNAs that serve as defence regulators
    Promotes susceptibility to multiple pathogens and partially suppresses cell death triggered by multiple plant resistance proteins
    Localised in the plant cytoplasm and nucleus

    Structure characterisation:
    Protein chain is 145 amino acids long

    References: (Yin et al., 2019)


    Target: PEC6

    Description of function:
    Found in the Puccinia striiformis f. sp. tritici species
    Effector protein that suppresses pattern-triggered immunity in the host wheat plant
    Targets adenosine kinases (ADKs) and may affect metabolism regulation, cytokinin interconversion and methyl transfer reactions to favour fungal growth
    Localised in the nucleus and cytoplasm of the plant cell

    Structure characterisation:
    Small and cysteine-rich protein with a total length of 66 amino acids
    Contains a signal peptide of 22 amino acids at the N-terminus
    Shown to interact with the C-terminus of ADK in yeast

    References: (Liu et al., 2016)


    Target: Pst_12806

    Description of function:
    Found in the Puccinia striiformis f. sp. tritici species
    Upregulated during infection, and its knockdown reduces fungal growth and development, likely due to increased ROS accumulation
    Translocates into chloroplasts and affects chloroplast function
    Interacts with the C-terminal Rieske domain of the wheat TaISP protein (a component of the cytochrome b6-f complex that connects the photosystems)

    Structure characterisation:
    Mature protein is 146 amino acids in length
    Contains a signal peptide and a transit peptide at the N-terminus

    References: (Xu et al., 2019)


    Target: PsHXT1

    Description of function:
    Found in the Puccinia striiformis f. sp. tritici species
    Hexoses are a major form of sugar utilised by this obligate biotrophic fungus
    Involved in nutrient uptake to sustain fungal growth and development
    Indispensable for establishing the fungal–wheat interaction
    Localised to the plasma membrane of the fungal cell

    Structure characterisation:
    Protein is 551 amino acids long
    Predicted to have 12 transmembrane domains

    References: (Chang et al., 2020)


    Target: Pgt-IaaM

    Description of function:
    Found in the Puccinia graminis species
    This tryptophan 2-monooxygenase is involved in the synthesis of the auxin precursor indole-3-acetamide (IAM)
    Induces wheat plants to accumulate auxin in infected leaf tissue
    Transient silencing of the gene in infected wheat plants indicated that it was required for full pathogenicity
    Expressed in haustoria cells in infected plant tissue

    Structure characterisation:
    Protein has a predicted length of 588 amino acids

    References: (Yin et al., 2014)


    Target: PGTG_10537.2/VPS9 complex

    Description of function:
    PGTG_10537.2 contains fibronectin type III and breast cancer type 1 susceptibility protein domains
    VPS9 is a vacuolar protein sorting-associated protein 9 with a coupling of ubiquitin to endoplasmic reticulum degradation domain
    Suggested that these proteins exist as a complex in vivo
    Initiates the hypersensitive response in the host, causing significant damage to the leaf
    Localised to the cytoplasm of the host plant cell

    Structure characterisation:
    PGTG_10537.2 is predicted to be 818 amino acids long
    VPS9 is predicted to be 744 amino acids long

    References: (Nirmala et al., 2011)


    Target: TaDIR1-2

    Description of function:
    Defective in Induced Resistance 1 (DIR1-2) is a lipid transfer protein in wheat
    Upon immune response induction, DIR1 moves from locally infected to distant uninfected leaves to activate defence priming
    Knocking down the expression of TaDIR1-2 increased wheat resistance to Puccinia, accompanied by hypersensitive response, increased accumulation of H2O2 and salicylic acid, increased expression of TaPR1, TaPR2, TaPAL, and TaNOX, and decreased expression of two reactive oxygen species (ROS) scavenging genes TaCAT and TaSOD
    TaDIR1-2 acts as a negative regulator in wheat resistance to Puccinia by modulating ROS and/or salicylic acid-induced signalling
    Localised in the cytoplasm and the cell membrane of wheat mesophyll protoplast

    Structure characterisation:
    Protein is approximately 100 amino acids in length
    Motifs found in the protein:
    • 8 cysteine residues, forming 4 intrachain disulfide bridges
    • a flexible hydrophobic cavity which interacts non-specifically with lipid molecules
    • a proline rich domain

    References: (Ahmed et al., 2017)

    Though the gene and/or amino acid sequences were known for these targets, there was limited understanding of their protein structures. To address this, our wet lab team predicted these structures from the amino acid sequences using the AlphaFold v2.0 software. The results of this modelling are shown below. We then selected and proceeded with the targets that gave the most confident predictions.


    HXT1
    PEC6
    PGTG_10537.2
    Pgt-IaaM
    Pst_12806
    SCR1
    PgtSR1
    STE12
    TaDIR1-2
    VPS9
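One way to compare prediction confidence between targets is the mean pLDDT of each model. The sketch below assumes the predictions are available as PDB files in which, as in standard AlphaFold output, the per-residue pLDDT is stored in the B-factor column; whether the team ranked targets this way is not stated, so the function names and the toy records are assumptions.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold PDB output) over
    C-alpha atoms, so that each residue is counted once."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))  # PDB columns 61-66
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)

def atom_line(serial, name, res_name, res_seq, b_factor):
    """Build a fixed-width PDB ATOM record with dummy coordinates (test helper)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")

# Toy three-residue model with made-up confidence values.
toy_pdb = "\n".join([
    atom_line(1, "N", "MET", 1, 10.0),   # ignored: not a C-alpha atom
    atom_line(2, "CA", "MET", 1, 90.0),
    atom_line(3, "CA", "ALA", 2, 80.0),
    atom_line(4, "CA", "GLY", 3, 70.0),
])
score = mean_plddt(toy_pdb)
print(f"mean pLDDT: {score:.1f}")
```

Running `mean_plddt` over each target's model would give one number per target to sort by, although a mean can hide poorly predicted regions that matter for binding.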

    Our team then decided to test the molecular interactions between the fungal targets and our designed peptide candidates using a Förster resonance energy transfer (FRET) assay. This required conjugating each target to a fluorescent protein that gives an output signal when the fungal target and peptide are in close proximity. As such, we used AlphaFold v2.0 to model the structure of each fungal target joined by a short linker sequence to mCerulean3, the fluorescent protein that was available to us in our lab, as shown below. This was to predict any structural changes caused by such a large addition, which could severely affect the true binding of our peptide. To lessen this impact, we fused the fluorescent tag to whichever terminus of the protein was protruding outwards. Though this still resulted in some changes for a few of the protein targets, others showed no apparent conformational changes. We therefore proceeded to synthesise and test in the lab those mCerulean3-conjugated targets whose conformation remained the same.


    HXT1 + mCerulean3
    Pgt-IaaM + mCerulean3
    PEC6 + mCerulean3
    PgtSR1 + mCerulean3
    SCR1 + mCerulean3
    Pst_12806 + mCerulean3

    This approach was used in conjunction with the dry lab approach, with the common goal of building a clearer understanding of cereal rust pathogenesis and of how our solution fits into it. Without this understanding of the fungal target structures, it would have been nearly impossible to design an inhibitory peptide suited to bind to them.


    Data and scripts used for protein target screening of Puccinia spp.


    Link:

    https://doi.org/10.5281/zenodo.7178227


    Contents of package:

    Data on Peptide Engineering


    Link:

    https://doi.org/10.5281/zenodo.7178898


    Contents of package:
