Overview:

The first step the dry lab took in preparation for delivering primer sequences to the wet lab was to identify appropriate primer binding regions. Two considerations guided this choice: commonality and specificity. That is, the region we choose should be specific enough that it is not conserved between different species, but common enough that it is shared among most populations within the species. A region that is not specific enough could lead to false positive results, while a region that is not common enough could lead to false negative results. We would then use primer design software to generate the actual primer sequences for viable primer sets for each region chosen in the first step. Finally, we would pass these sequences through our own filter, PrimerScorer, and deliver the resulting primers to the wet lab for experiments.

Identifying Target Regions For Primer Design:

Our initial research suggested that the ITS region would satisfy our needs for commonality and specificity. However, this was quickly dismissed, as more thorough research made it clear that the ITS region is universally conserved across fungi and is therefore an ineffective region on which to build primers (Schoch et al., 2012). We used the sequences available in GenBank to narrow down which regions of the Bretziella fagacearum genome we should focus on, and created a table of valid target regions (image below). From this table, we decided that the BT and MCM7 regions would be good candidates as target regions for primer design.

Generating Primers:

We used several existing software tools to generate primers for the target regions BT and MCM7. Of the publicly available tools, we used GLAPD (Jia et al., 2019) and PrimerExplorer (FUJITSU LIMITED).

Ranking Primer Outputs with PrimerScorer:

Mathematical modelling for predicting and ranking the success of LAMP primer sets generated by different software

Different primer design tools often follow different guidelines to achieve enhanced functionality and optimal amplification results. For example, GLAPD applies specificity, commonality and binding tendency checks to ensure that primers are able to amplify all and only the target regions from the genome background. Because their guidelines differ, different tools return different primer sequences. Additionally, the sequences each tool generates may reflect the biases and knowledge of its creators. Therefore, we aimed to apply an additional layer of checks through PrimerScorer in order to standardize the primer ranking process. This way, users (in this case iGEM Toronto) would not be constrained to experimenting with the results of just one tool.

In our team discussions about creating additional filters, we talked about critical factors such as repeating domains, temperatures, dimerization, and mutations. Some of these guidelines would later be implemented in PrimerScorer. Other considerations, such as visualising the mutation rate by calling all collectable variants against the reference genome, were difficult for us to implement because of the sheer amount of computation and the number of possibilities we would have to consider.

The guidelines we ended up with are implemented in PrimerScorer. PrimerScorer is written in Python in Google Colab so that as many team members as possible could contribute their skills. Put simply, PrimerScorer awards a primer set one point for every primer design guideline it satisfies, then divides the point count by the total number of points that can be earned. A higher percentage means that more guidelines are satisfied and that the primer set is predicted to bind better to the target region it was designed for.

Primer distances (5pts total)

One point for each criterion that is satisfied.

(F2/B2): 120-165 bp
Loop (F1c-F2): 40-60 bp
F2-F3: 0-60 bp
B2-B1c: 40-60 bp
B3-B2: 0-60 bp
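
The distance check above can be sketched as follows. This is a minimal illustration rather than PrimerScorer's actual code: the function name, the dictionary keys, and the assumption that the distances have already been measured (following the Eiken convention) are all ours.

```python
# Hypothetical distance windows in bp, copied from the list above.
DISTANCE_WINDOWS = {
    "F2-B2": (120, 165),
    "F1c-F2": (40, 60),
    "F2-F3": (0, 60),
    "B2-B1c": (40, 60),
    "B3-B2": (0, 60),
}


def score_distances(distances, windows=DISTANCE_WINDOWS):
    """Return one point per measured distance inside its window (inclusive)."""
    return sum(
        1
        for name, (low, high) in windows.items()
        if name in distances and low <= distances[name] <= high
    )


# Made-up distances for one primer set; B2-B1c (70 bp) is out of range.
example = {"F2-B2": 150, "F1c-F2": 45, "F2-F3": 10, "B2-B1c": 70, "B3-B2": 5}
print(score_distances(example))  # 4
```

Passing a different `windows` dictionary is one way to expose the adjustable minimum and maximum distances to the user.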

Coding this guideline required care, in particular understanding how the distance between two primers is actually measured. Do we start at the 3' end of the primer or the 5' end, or is it different for each pair of primers? With the help of Eiken Chemical Co.'s primer design manual as well as Li et al. (2016), we were able to measure the distances between primers correctly. Below is a redrawing of the diagram that Eiken Chemical Co. uses to explain primer distance, which we followed closely.

Primer length (6pts total)

One point for each primer that satisfies the below.

F3, B3: 15-25 bp
F2, B2: 15-25 bp
F1c, B1c: 15-25 bp

GC Content (6pts total)

One point for each primer that has 40-60% GC Content.
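
The length and GC content checks can be sketched together. The helper names and the toy sequences below are made up for illustration and are not taken from PrimerScorer or from our actual primer sets.

```python
PRIMER_NAMES = ("F3", "F2", "F1c", "B1c", "B2", "B3")


def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def score_length(primers, low=15, high=25):
    """One point per primer whose length is within [low, high] bp."""
    return sum(1 for name in PRIMER_NAMES if low <= len(primers[name]) <= high)


def score_gc(primers, low=40.0, high=60.0):
    """One point per primer whose GC content is within [low, high] percent."""
    return sum(1 for name in PRIMER_NAMES if low <= gc_percent(primers[name]) <= high)


# Toy primer set: F2 is all A/T, F1c is too long and all G/C, B3 is too short.
primers = {
    "F3": "ACGT" * 4 + "AC",   # 18 bp, 50% GC
    "F2": "AT" * 9,            # 18 bp, 0% GC
    "F1c": "GC" * 14,          # 28 bp, 100% GC
    "B1c": "ACGT" * 5,         # 20 bp, 50% GC
    "B2": "AACCGGTT" * 2,      # 16 bp, 50% GC
    "B3": "ACGT" * 2 + "AC",   # 10 bp, 50% GC
}
print(score_length(primers))  # 4
print(score_gc(primers))      # 4
```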

Melting Temperature (6pts total)

One point for each primer that satisfies the below.

F3, B3: 55-63 °C
F2, B2: 55-63 °C
F1c, B1c: 60-68 °C

Primer3 was used to calculate melting temperature. Small modifications were made to these criteria, as none of the primers generated by GLAPD and PrimerExplorer for either MCM7 or BT reached the minimum temperature. Therefore, for each separate analysis, the minimum temperature was lowered to a value that allowed the three highest melting temperatures for each primer (F3, F2, F1c, B1c, B2, B3) to score a point. The team felt comfortable lowering the threshold because melting temperature is a calculated estimate rather than a measured value, so not meeting the threshold cited in the literature was not considered critical. Additionally, doing so lets the score reflect the possibility that primers with higher melting temperatures than their alternatives have a better chance of success.
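
The threshold adjustment described above can be sketched as follows. The sketch assumes the melting temperatures have already been computed (for example with Primer3) and are passed in as plain numbers. The function names and the example values are hypothetical, and never raising the threshold above the literature value is our assumption.

```python
def relaxed_minimum(tms, default_min, top_n=3):
    """Lower default_min just enough for the top_n highest Tm values to pass.

    The threshold is never raised above default_min (an assumption).
    """
    if not tms:
        return default_min
    ranked = sorted(tms, reverse=True)
    cutoff = ranked[min(top_n, len(ranked)) - 1]
    return min(default_min, cutoff)


def score_tm(tm, low, high):
    """One point if the melting temperature lies within [low, high]."""
    return 1 if low <= tm <= high else 0


# Made-up Tm values (degrees Celsius) of the F3 primer across five primer sets.
f3_tms = [52.1, 50.4, 53.8, 49.9, 51.0]
new_min = relaxed_minimum(f3_tms, default_min=55.0)
print(new_min)                                       # 51.0
print([score_tm(t, new_min, 63.0) for t in f3_tms])  # [1, 0, 1, 0, 1]
```

Because the cutoff is recomputed for each analysis, a point earned for melting temperature in one analysis is not directly comparable with a point earned in another.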

Melting Temperature Difference (3pts total)

One point for each primer pair that satisfies the below.

Each primer pair (F3/B3, F2/B2, F1c/B1c) has a maximum Tm difference of 5 degrees Celsius.
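
A minimal sketch of this pairwise check, again assuming precomputed melting temperatures. The names and values are illustrative only.

```python
PRIMER_PAIRS = (("F3", "B3"), ("F2", "B2"), ("F1c", "B1c"))


def score_tm_difference(tms, max_diff=5.0):
    """One point per forward/reverse pair whose Tm values differ by at most max_diff."""
    return sum(1 for fwd, rev in PRIMER_PAIRS if abs(tms[fwd] - tms[rev]) <= max_diff)


# Made-up Tm values: F3/B3 differ by 2.5, F2/B2 by 6.0, F1c/B1c by 3.0.
example_tms = {"F3": 56.0, "B3": 58.5, "F2": 57.0, "B2": 63.0, "F1c": 64.0, "B1c": 61.0}
print(score_tm_difference(example_tms))  # 2
```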

Runs of 3 or more of one base (e.g. ACCC) (6pts total)

One point for each primer that does not have a run of three or more of one base.

Dinucleotide repeats (e.g. ATATATAT) (6pts total)

One point for each primer that does not have a run of three or more dinucleotide repeats.
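
Both repeat checks lend themselves to regular expressions. The sketch below reads "three or more dinucleotide repeats" as the same two-base unit occurring at least three times in a row, which is our interpretation. The patterns, function names, and sequences are illustrative rather than PrimerScorer's own.

```python
import re

# Same base three or more times in a row, e.g. CCC.
BASE_RUN = re.compile(r"([ACGT])\1{2,}")
# A two-base unit of two different bases repeated three or more times, e.g. ATATAT.
# The lookahead keeps homopolymers such as AAAAAA out of this category.
DINUCLEOTIDE_REPEAT = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{2,}")


def has_base_run(seq):
    return BASE_RUN.search(seq.upper()) is not None


def has_dinucleotide_repeat(seq):
    return DINUCLEOTIDE_REPEAT.search(seq.upper()) is not None


def score_no_base_runs(primers):
    """One point per primer without a run of three or more identical bases."""
    return sum(1 for seq in primers.values() if not has_base_run(seq))


def score_no_dinucleotide_repeats(primers):
    """One point per primer without three or more consecutive dinucleotide repeats."""
    return sum(1 for seq in primers.values() if not has_dinucleotide_repeat(seq))


# Toy primer set: F2 contains GGG, F1c contains TATATA.
primers = {
    "F3": "ACGTACGTACGTACGTAC",
    "F2": "ACGGGTACGTACGTAC",
    "F1c": "ACGTATATATCGTACG",
    "B1c": "ACGTACGTACGTACGT",
    "B2": "ACGTACGTACGTACGT",
    "B3": "ACGTACGTACGTACGT",
}
print(score_no_base_runs(primers))             # 5
print(score_no_dinucleotide_repeats(primers))  # 5
```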

Forward and reverse primers complementary sequences (3pts total)

One point for each pair of forward and reverse primers that do not have complementary sequences of three or more bases.

Self-complement (6pts total)

One point for each primer that doesn't have complementary sequences of three or more bases within its own sequence.
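
One simple way to test for complementary stretches is to look for the reverse complement of every 3-base window of one primer inside the other primer. The sketch below takes that approach for both the pairwise and the self-complement check. It is a simplification under our own assumptions (it ignores loop length and thermodynamics), not necessarily how PrimerScorer does it, and the all-A and all-C sequences are toy inputs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PRIMER_PAIRS = (("F3", "B3"), ("F2", "B2"), ("F1c", "B1c"))


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_complementary_stretch(a, b, k=3):
    """True if any k-base window of a can base-pair with a stretch of b."""
    a, b = a.upper(), b.upper()
    return any(reverse_complement(a[i:i + k]) in b for i in range(len(a) - k + 1))


def score_pair_complementarity(primers, k=3):
    """One point per forward/reverse pair without a complementary stretch."""
    return sum(
        1 for fwd, rev in PRIMER_PAIRS
        if not has_complementary_stretch(primers[fwd], primers[rev], k)
    )


def score_self_complementarity(primers, k=3):
    """One point per primer without a stretch complementary to itself."""
    return sum(
        1 for seq in primers.values()
        if not has_complementary_stretch(seq, seq, k)
    )


# Toy primer set: F2/B2 are fully complementary, F1c is self-complementary.
primers = {
    "F3": "A" * 15, "B3": "C" * 15,
    "F2": "A" * 15, "B2": "T" * 15,
    "F1c": "ACGTACGTACGTACG", "B1c": "C" * 15,
}
print(score_pair_complementarity(primers))  # 2
print(score_self_complementarity(primers))  # 5
```

With a window of only three bases, most realistic primers will contain some match, so `k` is exposed as a parameter for users who want a stricter or looser definition.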

Total: 47 points.
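
The overall score and ranking can then be sketched as below. The category names and the example point counts are hypothetical. Only the maximum points per category come from the guidelines above.

```python
# Maximum points per guideline, taken from the sections above.
MAX_POINTS = {
    "distance": 5,
    "length": 6,
    "gc_content": 6,
    "melting_temperature": 6,
    "tm_difference": 3,
    "base_runs": 6,
    "dinucleotide_repeats": 6,
    "pair_complementarity": 3,
    "self_complementarity": 6,
}
TOTAL_POINTS = sum(MAX_POINTS.values())  # 47


def percent_score(points):
    """Points earned divided by the points available, as a percentage."""
    for name, earned in points.items():
        if not 0 <= earned <= MAX_POINTS[name]:
            raise ValueError(f"invalid point count for {name}: {earned}")
    return 100.0 * sum(points.values()) / TOTAL_POINTS


def rank_primer_sets(scored_sets):
    """Return primer set names ordered from highest to lowest percentage."""
    return sorted(scored_sets, key=lambda name: percent_score(scored_sets[name]), reverse=True)


# Made-up point counts for two primer sets that differ by a single point.
set_a = {
    "distance": 4, "length": 6, "gc_content": 5, "melting_temperature": 5,
    "tm_difference": 3, "base_runs": 5, "dinucleotide_repeats": 6,
    "pair_complementarity": 2, "self_complementarity": 4,
}
set_b = dict(set_a, self_complementarity=3)

print(TOTAL_POINTS)                     # 47
print(round(percent_score(set_a), 1))   # 85.1
print(round(percent_score(set_b), 1))   # 83.0
print(rank_primer_sets({"set_B": set_b, "set_A": set_a}))  # ['set_A', 'set_B']
```

With 47 points available, a single point is worth about 2.13 percentage points, so two primer sets that look a couple of percent apart may differ on only one guideline.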

Primer design guidelines were inspired by Lucigen's LAMP primer design video, Eiken Chemical Co., and Li et al., 2016.

The nature of primer binding and biology does not lend itself to one-size-fits-all criteria. Therefore, throughout PrimerScorer, we incorporated places where guidelines can be modified. Users can customize the guidelines, as we did for melting temperatures, to better suit their experiments and needs. The adjustable parameters include the minimum and maximum distance between primer pairs. There are also print statements at the end of most cells so that users can follow the output of each code cell and understand our approach, instead of treating the scorer as a black box.

Additionally, PrimerScorer accepts primer sequences that are entered manually as well as sequences provided as a formatted text file (as in the output of GLAPD). With just a few other inputs, such as the region reference ID (e.g. MG269953.1 for BT), PrimerScorer outputs a text file with the ranking and computed score of the primers.

PrimerScorer was uploaded to the GitLab software repository. PDF versions of our code and software are also provided here: one displays the code and logic behind the scorer, and the other shows what users will see when using the scorer.

Results:

PrimerScorer rankings of the primer sets output by GLAPD and PrimerExplorer for MCM7 and BT.

From these rankings, the wet lab pulled the top three primer sets of each list. To validate PrimerScorer, they also chose from the MCM7 list a primer set that scored similarly to the top primer sets, and from the BT list the lowest-scoring primer set. In theory, if the third-highest-ranking MCM7 primer set showed results, then the similarly scored primer set should also show results. In contrast, if the top three BT primer sets showed results, the low-scoring fourth primer set would be expected to show none.

In summary, for the MCM7 region, the wet lab chose the following primer sets (the wet lab's names for them are in brackets): PrimerExplorer ID39 (MCM7 PE1), PrimerExplorer ID6 (MCM7 PE2), GLAPD 0 (MCM7 GLAPD1), and GLAPD 1 (MCM7 GLAPD2). For the BT region, the wet lab chose: PrimerExplorer ID12 (BT PE3), PrimerExplorer ID4 (BT PE2), PrimerExplorer ID4 (BT PE1), and GLAPD 0 (BT GLAPD1).

When comparing the wet lab's experimental results with the PrimerScorer results, we saw successful prediction within the primer rankings of each region, but some discrepancies when comparing the MCM7 primers with the BT primers. Within the MCM7 primers, PrimerScorer correctly predicted the most successful primer set, PrimerExplorer ID39 (MCM7 PE1). PrimerExplorer ID6 (MCM7 PE2), GLAPD 0 (MCM7 GLAPD1), and GLAPD 1 (MCM7 GLAPD2) all received the same score, but PrimerExplorer ID6 (MCM7 PE2) and GLAPD 0 (MCM7 GLAPD1) performed better than GLAPD 1 (MCM7 GLAPD2) in the wet lab. Within the BT primers, none of the primer sets showed results, so the predictive quality of PrimerScorer could not be properly validated for this region. However, the lowest-scoring set, GLAPD 0 (BT GLAPD1), also showed no positive results, which is at least consistent with the low score PrimerScorer gave it.

Comparing the scores of the BT primers with those of the MCM7 primers, one might expect the BT primers to outperform the MCM7 primers. However, this was not the case. This might be explained by the melting temperature adjustments described earlier. For each separate analysis of the MCM7 and BT primers, the minimum melting temperature was lowered to include the three highest melting temperatures of each primer within the primer sets. This should not affect rankings within an analysis, but it can affect comparisons between analyses. In fact, the melting temperatures of the BT primers were generally lower than those of the MCM7 primers, but this difference is not reflected in the scores because of how we approached the analysis. Another factor is that PrimerScorer is scored out of 47 points, so a difference of a few percent may correspond to a single point, which might not translate into any real-world difference.

Learnings, Difficulties, and Improvements:

There were many difficulties and considerations during the project journey. First, while the initial goal of the dry lab was to build a machine learning model to determine optimal primer design, resource constraints made this difficult, as there would not have been enough data to benefit from the advantages of machine learning. Moreover, several LAMP primer design tools had already been built by previous research teams, and we did not want to reinvent the wheel by building our own. Additionally, we wanted to find a way to incorporate a mathematical model into our project while keeping in mind both the complexity of primer binding and the skill set of our team. From these observations, as well as the observation that each LAMP design tool follows its own approach, we came up with the idea of PrimerScorer. The purpose of PrimerScorer is to help the team shortlist which primers to test in the wet lab. We had hoped to use PrimerScorer in an iterative process, whereby we would get results back from the lab for the initial primers, identify which primers were successful, and then make corresponding changes. However, limited funding meant that even if PrimerScorer gave feedback about which guidelines the primer sequences failed, we would be unable to create these new sequences and experiment on them.

There are also some areas for improvement in PrimerScorer itself. Points about melting temperature were already made in the results section, but there are other things to consider. In terms of automation and output, there is not much to improve. However, the logic behind PrimerScorer can be improved. For example, many of the guidelines are potentially related to each other, so counting them separately could double-count points. Melting temperature and GC content, for instance, are not independent: a deficit in GC content can lower the melting temperature. However, these were weighted equally in PrimerScorer. Moreover, PrimerScorer is scored out of only 47 points. Based on our results, most primer sets differed from each other by only a few percent, which can be just one or two points. Therefore, while the ranking separates the primer sets visually for the user, in reality the success of primer binding might not depend on those one or two points. In the future, PrimerScorer can be improved so that scoring is more sensitive and thus provides a more informative distribution of the primer set rankings.

The significance of a scoring system like PrimerScorer should not be overlooked. From our results, we see demonstrations of the predictive quality of PrimerScorer. The scorer offers a standardized way to evaluate primers across different primer design tools, which can help reduce the bias of individual tools, conserve resources, and make wet lab experiments more efficient.

References:

Blaser, Simon et al. "A Loop-Mediated Isothermal Amplification (LAMP) Assay For Rapid Identification Of *Bemisia Tabaci*". *Journal Of Visualized Experiments*, no. 140, 2018. *Myjove Corporation*, https://doi.org/10.3791/58502.

Eiken Chemical Co., Ltd. “A Guide to LAMP primer designing (PrimerExplorer V5).” 15 Oct. 2019.

FUJITSU LIMITED. "PrimerExplorer." http://primerexplorer.jp/lampv5e/index.html

Jia, Ben et al. “GLAPD: Whole Genome Based LAMP Primer Design for a Set of Target Genomes.” *Frontiers in microbiology* vol. 10 2860. 13 Dec. 2019, doi:10.3389/fmicb.2019.02860

Li, Jing-Jian et al. “Loop-Mediated Isothermal Amplification (LAMP): Emergence As an Alternative Technology for Herbal Medicine Identification.” *Frontiers in plant science* vol. 7 1956. 26 Dec. 2016, doi:10.3389/fpls.2016.01956

LucigenVideo. *Loop-Mediated Isothermal Amplification (LAMP): Primer Design And Assay Optimization*. 2018, https://www.youtube.com/watch?v=GJkvQqDufh0.

“PRIMER3.” Primer3, https://primer3.org/.

Schoch, Conrad L. et al. "Nuclear Ribosomal Internal Transcribed Spacer (ITS) Region As A Universal DNA Barcode Marker For Fungi". *Proceedings Of The National Academy Of Sciences*, vol 109, no. 16, 2012, pp. 6241-6246. *Proceedings Of The National Academy Of Sciences*, https://doi.org/10.1073/pnas.1117018109.

Si Ammour, Melissa et al. "Use Of LAMP For Assessing Botrytis Cinerea Colonization Of Bunch Trash And Latent Infection Of Berries In Grapevines". *Plants*, vol 9, no. 11, 2020, p. 1538. *MDPI AG*, https://doi.org/10.3390/plants9111538.

Silva Zatti, Matheus et al. "Isothermal Nucleic Acid Amplification Techniques For Detection And Identification Of Pathogenic Fungi: A Review". *Mycoses*, vol 63, no. 10, 2020, pp. 1006-1020. *Wiley*, https://doi.org/10.1111/myc.13140.

Wu, C. P. et al. "Rapid And Accurate Detection Of Ceratocystis Fagacearum From Stained Wood And Soil By Nested And Real-Time PCR". *Forest Pathology*, vol 41, no. 1, 2011, pp. 15-21. *Wiley*, https://doi.org/10.1111/j.1439-0329.2009.00628.x.

Introduction: Ideas and Goals