Software

We built our dry lab plan through several rounds of filtering and refinement. We started by defining the goals we wanted to prove and considering the difficulties we might face in proving them experimentally, so our computational workflow had to cover every aspect of our systems. We knew the plan would involve continuous generation of libraries and many simultaneous computational jobs, so we looked for something that would save us time and effort and leave us room to be more innovative.

So, here we present our golden solution, the automated pipeline, PharaohSuite.

PharaohSuite is a set of automated scripts that use different modeling, docking, assembly, molecular dynamics (MD) simulation, and mathematical modeling algorithms to carry out numerous jobs automatically instead of submitting them one by one. You can either run a single script on multiple inputs or execute the whole pipeline at once.

Fig. 1: Figure illustrating the PharaohSuite protein pipeline.

The idea

Our project is based on two systems: the Snitch system, which mimics the ubiquitin-proteasome degradation cascade, and the Plug-Sink system, which relies on a switchable protease that is activated only upon recognition of the target. Both systems depend on protein modeling and simulation and on predicting the activity of the assembled systems.

The first step was to select the tools for each stage. We drew up lists of tools for 3D modeling, docking, and so on, checked their published benchmarks, and then filtered them by running trial jobs to validate their algorithms and reproducibility.

Version 1 - The Protein Simulation Pipeline

The first package of our suite, or as we call it, "Team1," consists of seven scripts, each with a distinct job. Executing these scripts one after the other takes you from predicted 3D structures to a completely assembled system.

Note: This pipeline was constructed based on the NCBI protocol [1].

Step 1 (Structure Assessment)

There are many algorithms for different types of 3D modeling, such as ab initio and homology modeling, all of which can generate multiple results. Visualization is important when choosing the best-fit model, but so is a set of structure assessment parameters that validate its quality; these parameters can be calculated using multiple servers. In our case, the SWISS-MODEL Structure Assessment server calculated all of the parameters we needed, since it offers a MolProbity plug-in along with the QMEAN calculation service.

Our first script uploads your 3D models to the SWISS-MODEL server and retrieves the JSON files of the assessed models, then ranks them out of 6 according to their parameter scores. Automating this process saves time and produces a CSV file containing the ranking, so it is easy to select the top predicted 3D model for the next steps. It is even more useful if you have hundreds of models to choose from, as we did.

Note: This code was constructed in collaboration with SWISS-MODEL (you can read more about it on our Human Practices page).
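As a rough illustration of the ranking step only, the sketch below orders models by summing their per-metric ranks and writes the result as CSV text. It is not the suite's actual code: the metric names, their directions, the example values, and the rank-sum scheme are all assumptions made for this example, and it does not reproduce the out-of-6 scoring, whose exact rule is not described here. It also does not contact the server; it starts from scores that have already been parsed from the JSON files.

```python
import csv
import io

# Hypothetical metric names (assumed, not the server's real JSON keys).
# True means a higher value is better, False means a lower value is better.
METRICS = {
    "qmean": True,
    "molprobity_score": False,
    "clash_score": False,
    "ramachandran_favoured": True,
    "ramachandran_outliers": False,
    "rotamer_outliers": False,
}

# Made-up example values, used only to show the data shape.
EXAMPLE = {
    "model_a": {"qmean": -1.2, "molprobity_score": 1.8, "clash_score": 3.0,
                "ramachandran_favoured": 95.0, "ramachandran_outliers": 0.5,
                "rotamer_outliers": 1.0},
    "model_b": {"qmean": -3.5, "molprobity_score": 2.9, "clash_score": 12.0,
                "ramachandran_favoured": 88.0, "ramachandran_outliers": 3.0,
                "rotamer_outliers": 4.0},
}


def rank_models(scores):
    """Return (model, rank_sum) pairs, best model first (lowest rank sum)."""
    rank_sum = {name: 0 for name in scores}
    for metric, higher_is_better in METRICS.items():
        ordered = sorted(scores, key=lambda n: scores[n][metric],
                         reverse=higher_is_better)
        for position, name in enumerate(ordered, start=1):
            rank_sum[name] += position
    return sorted(rank_sum.items(), key=lambda item: (item[1], item[0]))


def ranking_csv(scores):
    """Render the ranking as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["rank", "model", "rank_sum"])
    for rank, (name, total) in enumerate(rank_models(scores), start=1):
        writer.writerow([rank, name, total])
    return buffer.getvalue()


if __name__ == "__main__":
    print(ranking_csv(EXAMPLE))
```

A rank sum is only one possible way to combine metrics; a threshold-based score per parameter would work equally well with the same input shape.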

Step 2 (Docking)

Whether it is protein-ligand or protein-protein docking, finding a suitable tool is often troublesome. We tried a variety of tools before settling on three: the ClusPro and Galaxy web servers for rigid-body docking, and LightDock, a flexible protein-protein/protein-peptide docking framework based on the Glowworm Swarm Optimization (GSO) algorithm (you can read more about it on our Docking page).

LightDock is open-source, Linux-based software. Users usually input one receptor and one ligand and wait for the job to finish, but our second script lets users input a set of receptors and ligands all at once and runs the jobs for all of them in a matrix-based manner.
There are many features available for docking with this software; our default code runs flexible docking and gives you the option to run with a restraints file and to score your docked results with different scoring functions (you can read more about the code in our user guide).
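One way to read "matrix-based" is that every receptor is paired with every ligand, giving one docking job per cell of the receptor-by-ligand matrix. The sketch below builds such a job list under that assumption. The function name, the job dictionary layout, and the file names are hypothetical, and the sketch deliberately stops before invoking any docking program.

```python
from itertools import product
from pathlib import Path


def build_docking_jobs(receptors, ligands):
    """Pair every receptor with every ligand (one job per matrix cell)."""
    jobs = []
    for receptor, ligand in product(receptors, ligands):
        # Name each job after both inputs so results are easy to trace back.
        name = f"{Path(receptor).stem}__{Path(ligand).stem}"
        jobs.append({"name": name, "receptor": receptor, "ligand": ligand})
    return jobs


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    for job in build_docking_jobs(["recA.pdb", "recB.pdb"],
                                  ["lig1.pdb", "lig2.pdb"]):
        print(job["name"])
```

With N receptors and M ligands this yields N x M jobs, so the list grows quickly for large libraries.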

Step 3 (Ranking)

Having run our docking with different software that uses different algorithms, it was only wise to score the models externally using the same pipeline and parameters so that we would be able to compare the results; that is where the third code comes in.

Our third script scores docked models and outputs their binding affinity (ΔG) using the PRODIGY (PROtein binDIng enerGY prediction) service. The energies are written to a text file, and the user can then evaluate each model based on its energy alongside the visualization.
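To make the energies easier to compare, a predicted ΔG can be converted to a dissociation constant with the standard relation ΔG = RT ln(Kd), and models can be sorted so the most negative ΔG comes first. The sketch below assumes ΔG values in kcal/mol that have already been read from the text file; the function names and the example numbers are made up for illustration.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol K)


def dissociation_constant(delta_g, temperature_c=25.0):
    """Convert a binding free energy (kcal/mol) to Kd (mol/L) via dG = RT ln(Kd)."""
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g / (GAS_CONSTANT * temperature_k))


def rank_by_affinity(energies):
    """Sort (model, dG) pairs so the most negative dG (strongest binding) is first."""
    return sorted(energies.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    example = {"complex_1": -8.1, "complex_2": -11.3, "complex_3": -9.6}
    for model, delta_g in rank_by_affinity(example):
        print(f"{model}\t{delta_g:.1f} kcal/mol\tKd = {dissociation_constant(delta_g):.2e} M")
```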

Step 4 (Molecular Dynamics Simulation)

Finally, once you have your docked structures, you are ready to simulate the resulting systems in their real environment. GROMACS is one of the most popular software packages for MD simulation, and we wanted it to be automated like the other steps. So, we wrote our code to let the user input a set of PDB files and set their MD conditions; the structures are then simulated one after the other.
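The idea of queuing several structures can be sketched as follows. This is an abridged illustration, not the suite's script: it only builds the first GROMACS preparation commands (pdb2gmx, editconf, solvate) for each PDB file and does not execute them. The force field, water model, box settings, and output file names are assumptions chosen for the example, and the later stages (adding ions, energy minimization, equilibration, and the production run) are left out.

```python
from pathlib import Path


def preparation_commands(pdb_path, force_field="amber99sb-ildn", water="tip3p"):
    """Build the first system-preparation commands for one structure."""
    stem = Path(pdb_path).stem
    return [
        # Generate a topology and a processed structure.
        ["gmx", "pdb2gmx", "-f", str(pdb_path), "-o", f"{stem}_processed.gro",
         "-p", f"{stem}.top", "-ff", force_field, "-water", water],
        # Centre the protein in a cubic box with 1.0 nm padding.
        ["gmx", "editconf", "-f", f"{stem}_processed.gro", "-o", f"{stem}_box.gro",
         "-c", "-d", "1.0", "-bt", "cubic"],
        # Fill the box with water and update the topology.
        ["gmx", "solvate", "-cp", f"{stem}_box.gro", "-cs", "spc216.gro",
         "-o", f"{stem}_solv.gro", "-p", f"{stem}.top"],
    ]


def build_queue(pdb_files):
    """Return (structure name, commands) pairs in a fixed, sorted order."""
    return [(Path(p).stem, preparation_commands(p)) for p in sorted(pdb_files)]


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    for name, commands in build_queue(["complex_b.pdb", "complex_a.pdb"]):
        print(name)
        for command in commands:
            print("  " + " ".join(command))
```

To actually run the queue, each command list could be passed to `subprocess.run(command, check=True, cwd=...)`, ideally with one working directory per structure so that output files do not overwrite each other.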

After your simulation is done, you can easily analyze the results with our post-MD analysis script, which calculates parameters such as temperature, pressure, potential energy, RMSF, RMSD, solvent-accessible surface area, and radius of gyration, and produces Ramachandran plots.
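For readers unfamiliar with two of these quantities, the sketch below computes RMSD and the mass-weighted radius of gyration directly from coordinates in plain Python. It only illustrates the definitions and is not the analysis script itself; it assumes the frames have already been superimposed on the reference, and the function names and the toy coordinates are made up for the example.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the frame has already been fitted onto the reference.
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, frame)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    total = sum(masses)
    centre = [sum(m * p[i] for p, m in zip(coords, masses)) / total
              for i in range(3)]
    weighted = sum(m * sum((p[i] - centre[i]) ** 2 for i in range(3))
                   for p, m in zip(coords, masses))
    return math.sqrt(weighted / total)


if __name__ == "__main__":
    # Toy two-atom system, for illustration only.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(reference, shifted))
    print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]))
```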

As mentioned previously, you could use each one of our tools separately or run them all at once using our combined script (sirAlex); whatever option you choose, we hope our software serves its purpose and becomes a little part of your success.

Extra Packages

Our main aim when generating this pipeline was to make things easier for future iGEM teams because the tools of this suite have certainly saved us a lot of time and effort and made it easier to inspect results and choose from the enormous libraries we have generated for our linkers, peptides, and so on.

Accordingly, to make things even easier, we added extra packages that are more specific to our project but could still be helpful to many other iGEM teams, who can modify them to fit their own purposes. You can find an R-based data analysis package, a MATLAB-based package for predicting enzymatic activity, and a Python-based mathematical modeling package for predicting transcription and translation rates, along with our ubiquitination model, which can be used by anyone simulating our system.
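To give a sense of what a transcription and translation model can look like, the sketch below integrates the generic two-stage gene expression model, dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p, with a simple forward-Euler scheme. This is a textbook model used here as an assumption, not the package's actual equations; the parameter names and values are hypothetical.

```python
def simulate_expression(k_tx, k_tl, d_m, d_p, t_end, dt=0.01):
    """Integrate mRNA (m) and protein (p) levels from zero with forward Euler.

    k_tx: transcription rate, k_tl: translation rate per mRNA,
    d_m / d_p: first-order degradation rates of mRNA and protein.
    """
    m = 0.0
    p = 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
    return m, p


def steady_state(k_tx, k_tl, d_m, d_p):
    """Analytical steady state: m* = k_tx / d_m, p* = k_tl * m* / d_p."""
    m_star = k_tx / d_m
    return m_star, k_tl * m_star / d_p


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units.
    params = dict(k_tx=2.0, k_tl=5.0, d_m=0.5, d_p=0.1)
    print(simulate_expression(t_end=200.0, **params))
    print(steady_state(**params))
```

Comparing the simulated end point with the analytical steady state is a quick sanity check before fitting any rates to data.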