The MD and Gromacs

1.What is MD simulation?

Molecular Dynamics, often known as MD, is a computer simulation method used in various engineering and scientific fields to calculate the motion and equilibrium of each atom or molecule.[1]

2.Why do we need MD simulation?

Based on a generic model of the physics driving interatomic interactions, molecular dynamic (MD) simulations can describe the motion of each atom in a protein or other molecular system over time [2]. These simulations can capture a wide variety of significant biomolecular processes, such as conformational change, ligand binding, and protein folding and display the positions of all the atoms at a temporal resolution of femtoseconds. Importantly, these simulations may also anticipate how biomolecules would behave, on an atomic level, to perturbations like mutation, phosphorylation, protonation, or the addition or removal of a ligand. This is important because it enables researchers to understand better how biomolecules work [3].

3.Gromacs and how to install it?


The MD simulation generates a solvated protein by simulating water as discrete molecules. Since the interaction with water is a major process for protein folding, this step is essential for validating the structures. GROMACS is a community-driven open-source software package that can perform molecular dynamics, sometimes known as simulating the Newton equations of motion for systems that contain hundreds of thousands or millions of particles [4]. You can download the last version of Gromacs for Linux, from here. Follow these commands respectively to install Gromacs from here.

Force field and code construction

4. Force fields in MD, and why did we choose CHARMM36?


All popular molecular mechanics force fields are supported in Gromacs; these include 15 different implementations of AMBER, CHARMM, GROMOS, and OPLS. In addition, there are other community-supported force fields to choose from. Users can create their own tables to store data in a non-standard functional format [5]. After a lot of literature, we decided to use CHARMM36 force field and implement it in Gromacs force fields files download from here, (Wbsite link), we used CHARMM36 as our force field due to its heavily referenced work that it is suitable to work in the extracellular region of the brain and aggregated proteins[6].. With CHARMM36 we used TIP3P (transferable intermolecular potential) as a water solvation box which works in rigid geometry closely matching actual water molecules. The proposed models have three interaction points corresponding to the atoms of the water molecule. Only oxygen atoms have the Lennard-Jones parameters [7], so this water model is thought to be the best for our case[8].

-For Adding CHARMM36 in force fields file of Gromacs use the following command to copy the files into Gromacs directory.

"Sudo cp $(directory of CHARMM36 file)/usr/local/gromacs/share/gromacs/top"

5.Automated Code overview


-The code change the directory to be inside the build folder in gromacs file.
-It takes the path of a file which contains all your PDB files.
-Before the running of the code add this files to the folder which contain your PDB files it has all -the parameter adjusted to work in the brain environment.
We developed 3 codes for running MD on our structures, All of the codes were build based on Gromacs tutorial 1 lysozymwe in water
The running codes respectively: -

1.Generating the topology file:
To make a structure file that can be read by GROMACS, you must complete this procedure. It is first necessary to verify that the PDB consists entirely of protein chains and no aqueous solvents.
If any exist, this command can eliminate them:

$ grep -v HOH (your protein name).pdb >(your protein name) _clean.pdb
The pdb2gmx command is used to construct the topology. The necessary command is:

$ gmx pdb2gmx -f your protein name_clean.pdb -o your protein name_processed.gro -water spce
"A menu should appear asking to choose a forcefield – choose number 9 which is CHARMM36"

2.Generating explicit solvent and environment:
At this point, we need to build and establish the environment that will be used to simulate the protein. Inside the tutorial, we will work with a system that involves water. The following commands will allow us to generate our "box," also known as our environment:

$ gmx editconf -f (your protein name) _processed.gro -o (your protein name)_newbox.gro -c -d 1.0 -bt cubic
We have used the -bt cubic command to create a cube and the -c and -d 1.0 commands to ensure that our protein is in the exact center of the cube, at least 1.0nm from any of the box's edges.
Following these commands, our newly specified box will be solvated with spc216, a standard 3-point solvent arrangement. As the SPC, SPC/E, and TIP3 are all 3-point water models, this also works with them.

$ gmx solvate -cp your protein name_newbox.gro -cs spc216.gro -o your protein name_solv.gro -p topol.top
By enclosing the protein in a box made of water molecules, we can study it more thoroughly. The next step is to provide the protein with ions to cancel out any negative charges it might have.

$ gmx grompp -f ions.mdp -c your protein name_solv.gro -p topol.top -o ions.tpr -maxwarn 1
$ gmx genion -s ions.tpr -o (your protein name)_solv_ions.gro -p topol.top -pname SOD -nname CLA -neutral -conc 0.15
The -neutral command will make the total charge of the system equal to zero by adding positive and negative ions under flags -pname and -nname respectively, SOD and CLA is the name of Na and Cl atoms as defined in charmm36 force field, -conc 0.15 is to add number of Na and Cl atoms to reach the concentration 0.15 mol/liter. You can check the net charge of your system before this step by opening “topol.top” and reading the last line of the [atoms] section. Net charge should be displayed as “qtot X”.
#Additionally, the last command will bring up a menu. Specifically, we need to select option 13: SOL, which confines the ions to the solvent.

3.Energy minimization (EM)
This is a vital stage in the process of creating a high-quality protein model. If our protein model has any steric conflicts or unnatural geometries, we can fix them using an EM run. To begin energy minimization, use the following commands:

$ gmx grompp -f minim.mdp -c your protein name_solv_ions.gro -p topol.top -o em.tpr -maxwarn 1 $ gmx mdrun -v -deffnm em
When we execute this command, we obtain many files, one of which, em.gro, includes the energy-minimized structure's model.
We also provide a binary energy file (em.edr) that may be used to make a graphic of the energy evolution during the energy minimization simulation.

4.Equilibration
The temperature of the system and the concentrations of the solvent and ions relative to the protein must now be equilibrated. While energy minimization improved our protein structure, it did not improve it with its (solvated) surroundings. When the system temperature is right, we'll apply pressure to get it to the necessary density.
The following commands will begin the first stage of equilibration, which requires increasing the temperature:

$ gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr -maxwarn 1
$ gmx mdrun -deffnm nvt
Now we will go through the second stage of equilibration, putting pressure into our system involves running the following commands:

$ gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
$ gmx mdrun -deffnm npt
5.Starting Molecular Dynamic Simulation
Once the system has reached the set point for temperature and pressure, the production MD can be started, and data collecting can begin. In the end, the mdrun program is used, which is GROMACS' primary computational suite. The subsequent instructions are what need to be run in production MD.

$ gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr -maxwarn 1
$ gmx mdrun -deffnm md_0_1
It's important to realize that this process takes a significant amount of time. After it's done, we'll have a .gro file representing the protein's final solvent-exposed structure, and a PDB file will also be formed. [4,9,10]

6.Our MD models:
Concerning our Project, our first priority was to obtain MD simulation for our 2 main proteins which are Amyloid-beta and Tau, we made MD simulation for 2 isoforms of tau PHF and PHF* due to our limited computational resources needed to run MD for the whole tau protein, then we ran a simple simulation for one of our parts in the snitch system which is Coh2-linker-Td28rev.

Results:
We have Run two trial jobs to check whether our codes are right and compatible with our models, but After the Running of these jobs, through a literature search, we discovered that 100 ps was not enough for the MD to reach the final form and its plateau to conclude that our system become stable, we should at least run the jobs in 0.15 ns to get the results successfully; but unfortunately we didn't have the ability to do it due to our facilities.

(GST) Coh-Linker-TD28rev


Figure 1. the MD of COH-Linker-Td28rev model in a solvation box visualized by Pymol.


Tau (PHF)


Figure 2. the MD of COH-Linker-Td28rev model in a solvation box visualized by Pymol.


Our Future work: we aim to run all the jobs again in a supercomputer with 0.5 ns and apply the post MD analysis on it then compare it with 100 ps results to define the difference between them.


Post MD

7.Post MD analysis:

After the MD results, we wanted to analyze it to consider whether these results are suitable to the normal and expected ranges or not. After going through literature and tutorials, we have reached to several analysis to perform in our results and we have filtered them to reach to the main analysis which will benefit us in our results. These parameters: are
Temperature Analysis.
Pressure Analysis.
Potential Energy Analysis.
RMSF (Root Mean Square Fluctuation).
RMSD (Root Mean Square Deviation).
Radius of Gyration.
SASA (Surface Accessible Surface Area).
Ramachandran Plot.
Including these parameters, we constructed a post MD analysis code automated by bash script language through Linux.

8.Code Construction:

-Before starting any procedure, GROMACS needs to be activated so that you can use its command:

source /usr/local/gromacs/bin/GMXRC
-check if the MD run is Successful or not

gmx check -f md_0_1.xtc
-Temperature Analysis (select number 15 then press 2 enter)

gmx energy -f md_0_1.edr -o temperature.xvg
xmgrace -nxy temperature.xvg
-Pressure Analysis (select number 16 then press 2 enter)

gmx energy -f md_0_1.edr -o pressure.xvg
xmgrace -nxy pressure.xvg
-Potential Energy Analysis (select number 11 then press 2 enter)

gmx energy -f md_0_1.edr -o potential.xvg
xmgrace potential.xvg
-RMSF (Root Mean Square Fluctuation) Analysis

Root-mean-square fluctuation (RMSF) quantifies the typical angular displacement (of a particle, such as a protein residue, from a standard position) over a given amount of time (typically the time-averaged position of the particle). Therefore, RMSF examines the structural components that deviate most noticeably from their average state (or least). [11]

gmx rmsf -f md_0_1.xtc -s md_0_1.tpr -o rmsf-per-residue.xvg -ox average.pdb -res
(Select number 4 backbone)

xmgrace rmsd-per-residue.xvg
-RMSD (Root Mean Square Deviation) Analysis

We use it to find out how much of a deviation there is between the experimental or observed value and the theoretical or expected value by squaring each difference, axmgrace radius-of-gyration.xvgdding them all up, and then dividing by the total number of observed or predicted values [9,11]. The closer experimental data are to theoretical predictions, the smaller the RMSD.
gmx rms -f md_0_1.xtc -s md_0_1.tpr -o rmsd-all-atoms-vs-start.xvg
(Select number 4 backbone for both the least-squares fit)

xmgrace rmsd-all-atoms-vs-start.xvg
Calculate RMSD with the selection of backbone atoms

gmx rms -f md_0_1.xtc -s md_0_1.tpr -o rmsd-backbone-vs-start.xvg
gmx trjconv -f md_0_1.xtc -s md_0_1.tpr -o peptide.xtc
xmgrace peptide.xtc
gmx rms -f peptide.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg
xmgrace rmsd-all-atom-vs-average.xvg
gmx rms -f peptide.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg
xmgrace rmsd-backbone-vs-average.xvg
-Radius of Gyration (Rg)

A protein's compactness can be quantified by its radius of gyration. It is expected that Rg will remain reasonably constant in a protein that has folded correctly. Rg can shift over time if a protein unfolds. First, let's look at the lysozyme simulation's radius of gyration [9,11].

(Select group 1 for protein analysis)

gmx gyrate -f md_0_1.xtc -s md_0_1.tpr -p -o radius-of-gyration.xvg
-SASA (Surface Accessible Surface Area)

SASA is surface-accessible surface area in, which it indicates how much the surface area is exposed (or any residue) in your protein [11].

gmx sasa -s md_0_1.gro -n protein.ndx -o area.xvg
xmgrace area.xvg
-Ramachandran plot

gmx rama -s md_0_1.tpr -f md_0_1.xtc -o rama.xvg
xmgrace rama.xvg
Note that this command will be accepted for gromacs 2022 or above only so in case your version is lower than 2022 use this 2 commands
gmx chi -s md_0_1.gro -f md_0_1.xtc -rama yes and gmx chi -s md_0_1.gro -f md_0_1.xtc -viol yes
You can check the Programming club to get the code.

9.Post MD analysis Results

The following results were performed using our Post MD analysis code , we run the code on our 2 trial jobs to evaluate the results.
-(GST) COH-linker-TD28rev
-Temperature -Pressure


Figure 3. the temperature and pressure analysis of COH-Linker-Td28rev model visualized by xmgrace.


-Potential energy

Figure 4. the potential energy analysis of COH-Linker-Td28rev model visualized by xmgrace.


-RMSD

Figure 5. the RMSD analysis (all atoms vs average) of COH-Linker-Td28rev model visualized by xmgrace.



Figure 6. the RMSD analysis (backbone vs average) of COH-Linker-Td28rev model visualized by xmgrace.



Figure 7. the RMSD analysis (backbone vs Start) of COH-Linker-Td28rev model visualized by xmgrace.



Figure 8. the RMSD analysis (all atoms vs start) of COH-Linker-Td28rev model visualized by xmgrace.


-Radius of Gyration & Ramachandran plot

Figure 9. the Radius of Gyration analysis and Ramachandran plot of COH-Linker-Td28rev model visualized by xmgrace.


-RMSF
There were no results for it due to that the time was not enough for the MD to generate fluctuations.
-SASA
There were no results for it due to that the time was not enough for the MD to generate SASA
.

-Tau (PHF)


-Temperature & Pressure

Figure 10. the temperatureand Pressure analysis of Tau (PHF) model visualized by xmgrace.

-Potential energy


Figure 11. the potential energy analysis of Tau (PHF) model visualized by xmgrace.


-RMSD

Figure 12. the RMSD analysis (all atoms vs average) of Tau (PHF) model visualized by xmgrace.



Figure 13. the RMSD analysis (backbone vs average) of Tau (PHF) model visualized by xmgrace.



Figure 14. the RMSD analysis (backbone vs start) of Tau (PHF) model visualized by xmgrace.



Figure 15. the RMSD analysis (all atoms vs start) of Tau (PHF) model visualized by xmgrace.


-Radius of Gyration & Ramachandran Plot

Figure 16. the Radius of Gyration and Ramachandranplot analysis of Tau (PHF) model visualized by xmgrace.


-RMSF
There were no results for it due to that the time was not enough for the MD to generate fluctuations.
-SASA
There were no results for it due to that the time was not enough for the MD to generate SASA.

Conclusion:

In our journey, we faced many obstacles. Running the MD jobs was very troublesome and exhausting for our devices as our personal laptops have limited qualities.

Why have we performed umbrella sampling for Beta-amyloid Specifically?

Knowing the importance of studying Amyloid beta 42 aggregations (ID: 2BEG) according to their proven toxicity and promoting Alzheimer's disease pathogenesis, we tried to simulate these aggregations and study their binding energy (ΔGbind) according to many configurations generated from umbrella sampling simulations. Umbrella sampling simulation is a method for improving the sampling system to estimate the free energy of binding profiles along reaction coordinations. The main force derives from umbrella sampling is the pulling force between molecules and each other according to pull.MDP file was specific to the chosen force field. Thus, it can estimate the aggregation formation process and stability.

Following Gromacs Umbrella sampling codes, we simulated Ab42 based on 3 main steps:

1. Generating different configurations according to where the AB42 molecules are of interest (entitled here are our ligands). Each configuration depends on the degree of freedom (Molecule coordination) and where They will be restrained to increase their center of mass distance by the umbrella sampling potential.

2. According to the generated confirmations, the trajectory file is generated based on each molecule's desired center of mass distance (COM).

3. Running umbrella sampling simulations according to generated COM distance for each configuration.

4. Delta G calculation using Weighted Histogram Analysis Method (WHAM) analysis

workflow of codes will be displayed below:

1-pdb2gmx in the code for generating topology file (Coordinate file) for our AB42 Molecule after its acetylation process.

2- Setting up the force field, we will consider the GROMOS96 53A6 parameter set, AP water environment, optimize each chain for None for the N-terminal, COO- for the C terminal

Our output file is: Complex.gro

gmx pdb2gmx -f 2BEG_model1_capped.pdb -ignh -ter -o complex.gro


Defining Unit Cell

3-editconf setting up the box that resembles the environment of the brain with suitable ions and molecules will be detailed below, and place inside it our molecule of interest location

Note: The solvation box coordinates are 12.0-nm boxes, with a distance of 0.5 nm to avoid complications. After generating topology file , COM of our fibril to be (3.280, 2.181, 2.4775) in box dimensions choose to be 6.560 x 4.362 x 12

Output file : newbox.gro

gmx editconf -f complex.gro -o newbox.gro -center 3.280 2.181 2.4775 -box 6.560 4.362 12


Generating Toplology file

4-Solvate : determine a specially number of water molecules will be added in the topology file according to our water module SPC water and then write it in our Topology file.

-cp file represents the protein configuration resulted from the previous editconf step.

-cs represents solvent configuration will be added (SPC water model in our case)

Output file : solv.gro

gmx solvate -cp newbox.gro -cs spc216.gro -o solv.gro -p topol.top


Adding Our ions

5-genion: for adding ions (In our case , 100mM NaCl as a neutralizing concentrations) by using ions.mdp file specific t GROMOS96 53A6 force field. This step will generate atomic level descriptions displayed in a binary file ion.tpr

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr


Choose 13 (SOL) to replace water molecules with our ions and embedding it . Also apply other feature like defining positive and negative ion names , specific concentrations and neutralization.

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.1


Relaxing our assembled electroneutral structure using Energy minimization Process with specific minim.mdp file for GROMOS96 53A6 . For process of EM we will assemble our topology , structure and simulation parameter in a .tpr file but this time we will pass it through MD engine of GROMACS “mdrun”

gmx grompp -f minim.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em


Using posre.itp generated files from pdb2gmx command for heavy atoms of the protein to apply position restraining form. This step aims to equilibriate solvent around our protein without any other added variables. Pressure Equilibriation is done under NPT ensemble that is called "isothermal-isobaric" ensemble and mostly similar to experimental conditions. NPT ensemble include 3 constants of particles , pressure and Temperature. Process done with specific npt.mdp file specific to GROMOS96 53A6 force field .

gmx grompp -f npt.mdp -c em.gro -p topol.top -r em.gro -o npt.tpr
gmx mdrun -deffnm npt


Before umbrella sampling performance, we need to preform configurations as a starting point for the umbrella sampling window and generate an index file for customizing initiative groups that will drive the pulling force. This can be done with a pull.mdp file that will pull peptide_A from the protofibril at 500 ps MD. Leading to capturing enough configurations along with reaction coordinated, more regular umbrella sampling windows, and dissociation between A and B Peptides.

gmx grompp -f md_pull.mdp -c npt.gro -p topol.top -r npt.gro -n index.ndx -t npt.cpt -o pull.tpr
gmx mdrun -deffnm pull -pf pullf.xvg -px pullx.xvg



Figure 17. displays the beginning of the first dissociation of peptide A from the protofibril structure, as seen in the harmonic spring force graph.



Frame extraction for Umbrella sampling in 4 steps:

1-Window space coordination set up

2- frame extraction from generated pulling trajectory file using Bash script written in gromacs

3- COM distance measurement of the coordinate of Peptide A and Peptide B

4- Using extracted frames as input for umbrella sampling

Note: Summary.dat file explain the distance between COMs for generated configurations to choose which ones we will initiate the sampling based on their high values.

gmx trjconv -s pull.tpr -f pull.xtc -o conf.gro -sep


In our MD , we will simulate from 0.5 to 5.0 nm distance with space 0.2 nm along Z-axis coordinates. After generating configurations, we will need to generate input files according to this generation for initiating umbrella sampling, we considered in our case, configuration 6 for the initiation, by using npt equilibration mdp file.

gmx grompp -f npt_umbrella.mdp -c conf6.gro -p topol.top -r conf6.gro -n index.ndx -o npt0.tpr


gmx mdrun -deffnm npt0


Due to low of facilities and Labptop powers, we run this job on 13 different time for 17 days on one configuration. Our future work is to assemble all of these configurations with each other on super computer to know the actual ∆G of amyloid beta 42 . Also, we will go through PHF6 and PHF6* sampling with our peptides for knowing the actual free enrgy of binding.


Figure 18: explaining the npt equilibration method for generating multiple input files that initiate umbrella sampling



Figure 19: represents how the pressure equilibration step derived by pulling force affected the Center of the mass distance of our molecules of interest in our desired configurations.



After equilibrating our configurations, it is now time for our umbrella sampling process, we will first pass our equilibrated file through grompp to be processed by md_umbrella.mdp file. Initial COM distance is set to zero and acts as a reference distance for our simulation.

gmx grompp -f md_umbrella.mdp -c npt0.gro -t npt0.cpt -p topol.top -r npt0.gro -n index.ndx -o umbrella0.tpr


gmx mdrun -deffnm umbrella0


Created two text files the first one containing the Pullf.xvg files and the second contains the tpr file. WHAM code in Gromacs will open every file and calculate deltaG of them.

gmx wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal




Figure [20, 21]: The two graphs represent the analysis of PMF and umbrella histogram according to our configuration using number 6 with its input file to generate umbrella sampling.




Figure 22. represents the aggregates' final side and lateral view.



References

1- Kondori, J., Zendehboudi, S., & Hossain, M. E. (2017). A review on simulation of methane production from gas hydrate reservoirs: Molecular dynamics prospective. Journal of Petroleum Science and Engineering, 159, 754-772.

2- Karplus, M., & McCammon, J. A. (2002). Molecular dynamics simulations of biomolecules. Nature structural biology, 9(9), 646-652.

3- Hollingsworth, S. A., & Dror, R. O. (2018). Molecular Dynamics Simulation for All. Neuron, 99(6), 1129–1143. https://doi.org/10.1016/j.neuron.2018.08.011.

4- Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1, 19-25.

5- https://www.gromacs.org/about.html

6- Tang, J. D., McAnany, C. E., Mura, C., & Lampe, K. J. (2016). Toward a designable extracellular matrix: Molecular dynamics simulations of an engineered laminin-mimetic, elastin-like fusion protein. Biomacromolecules, 17(10), 3222-3233.

7- https://computecanada.github.io/molmodsim-md-theory-lesson-novice/09-water_models/index.html

8- Robustelli, P., Piana, S., & Shaw, D. E. (2018). Developing molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences, 115(21), E4758-E4766.

9- Lemkul, Justin A. Ph.D. (2018). GROMACS tutorial Lysozyme in water. Retrieved from: http://www.mdtutorials.com/gmx/lysozyme/index.html

10- A Guide to Molecular Dynamics using GROMACS by Dimitar Dimitrov by team KCL_UK

11- http://cgmartini.nl/~mdcourse/pepmd/analysis.html.

12- http://www.mdtutorials.com/gmx/umbrella/index.html