To bring our iGEM project to life, and to fulfil our goal of using biological systems for solving
real-world problems, we had to use the engineering cycle in multiple stages of the project.
The Origin of the Project
During the initial literature review and brainstorming phase of our project, we were interested in human BRCA mutations, which predispose women to a much higher risk of developing breast cancer. Our initial idea was to use these mutations as a marker for early breast cancer screening. On further reading, however, we came across the link between HPV and a small percentage of breast cancer cases. HPV caught our attention because it is the primary causative agent of cervical cancer, which has a higher mortality rate and is heavily underdiagnosed owing to the lack of appropriate screening tools. Identifying this large unsolved problem, which disproportionately affects people in developing countries and leaves room for detection and preventative measures, was the starting point for our further efforts.
Design of the proposed kit
1. Biomarker Selection
HPV has over 150 strains, but only about 20 of them have the potential to cause cancer. 90% of the sexually active population would test positive for HPV. It is therefore essential to limit our diagnosis to the high-risk strains of HPV in order to convey useful results to the patient. Among these, HPV16 is the most carcinogenic. These statistics demonstrate the need to focus on the diagnosis of HPV16 and 18.
Previous attempts at HPV detection in the literature, as well as existing diagnostics, target the viral capsid, i.e. the L1 capsid protein gene, as a biomarker. A literature review to assess the suitability of L1 as a target in high-risk strains suggested that the gene is highly variable, even among HPV16 sublineages [1]. We performed a multiple sequence alignment (MSA) of the L1 genes of HPV16 and HPV18, and clear variations can be observed directly.
A more appropriate biomarker was required.
A recent study demonstrated the importance of the E7 oncogene in cervical cancer progression and the role its conservation plays [2]. To confirm this, we performed a multiple sequence alignment of the E7 genes of HPV16 sublineages. HPV16 is classified into 4 lineages (A, B, C and D), which are further divided into sublineages (A1-4, B1-4, C1-4, D1-4) [3]. All of them carry different cervical cancer risks. Our literature review found that A4, C, D2 and D3 are the most prevalent in Asia, so we chose to align these 4 groups to confirm the conservation of HPV16 E7. The results show that E7 is highly conserved across all of these high-risk sublineages and can serve as a suitable biomarker.
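As a rough illustration of what "highly conserved" means for an alignment like this, the sketch below scores each column of a set of pre-aligned sequences. The function name `column_identity` and the input are hypothetical: the first two sequences are the opening 20 bases of E7 as listed later on this page, and the third is a made-up variant, not real sublineage data. A real analysis would run on the output of an alignment tool.

```python
from collections import Counter

def column_identity(aligned):
    """Return, per alignment column, the fraction of sequences that
    carry the most common character in that column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Toy input: hypothetical, for illustration only.
toy_alignment = [
    "ATGCATGGAGATACACCTAC",
    "ATGCATGGAGATACACCTAC",
    "ATGCATGGAGACACACCTAC",  # made-up single substitution in column 12
]

scores = column_identity(toy_alignment)
fully_conserved = sum(1 for s in scores if s == 1.0)
print(f"{fully_conserved}/{len(scores)} columns fully conserved")
```

A gene in which nearly every column scores 1.0 across the sublineages of interest is a better detection target than one with many variable columns, because a single primer pair and guide RNA can then cover all of them.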
MSA of oncogenic sublineages of HPV16:
2. Choice of detection method
We decided to opt for a cell-free system as the basis for our kit, due to reasons of simplicity and practicality, as our solution is intended for low-resource settings.
Our initial approach to HPV DNA detection used an RNA toehold switch complementary to the target gene, coupled with LacZ expression. However, this required a cell-free expression system to produce the colorimetric readout, which was not practical. As an alternative, we turned to CRISPR-Cas based approaches because of their high specificity for user-defined targets. The trans-cleavage property of Cas12 made it our tool of choice, as it allows a measurable readout triggered by the presence of target DNA.
3. Primer Design for PCR
Forward and reverse primers were needed to amplify our target gene, E7. We followed this general set of rules when designing them:
- The GC content of the primers should be greater than 40%.
- The length of each primer should be between 18 and 20 nucleotides.
- The primers should end in either GG or CC at the 3’ end. G and C pair through three hydrogen bonds, so this gives a stronger bond between the primer and the target.
The forward primer is taken from the start of one strand as it is, and the reverse primer is the reverse complement of the end of the same strand. Following this, the sequences of our primers were:
- E7 Forward Primer (5’ - 3’) - ATGCATGGAGATACACCTAC
- E7 Reverse Primer (5’ - 3’) - TTATGGTTTCTGAGAACAGATGG
The feasibility of these primers was analysed using IDT’s OligoAnalyzer tool.
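The derivation and the general rules above can be checked with a few lines of Python. This is a minimal sketch and not the analysis run in IDT's tool: the helper names are hypothetical, the primer lengths (20 and 23) are simply read off the two primers listed above, and `e7_start` and `e7_end` are the first 20 and last 23 bases of the E7 sequence given on this page.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def primer_report(primer):
    """Evaluate a primer against the three general rules listed above."""
    return {
        "length": len(primer),
        "gc_percent": round(gc_percent(primer), 1),
        "gc_above_40": gc_percent(primer) > 40,
        "length_18_to_20": 18 <= len(primer) <= 20,
        "ends_gg_or_cc": primer.endswith(("GG", "CC")),
    }

# First 20 and last 23 bases of the E7 strand as written (5' to 3').
e7_start = "ATGCATGGAGATACACCTAC"
e7_end = "CCATCTGTTCTCAGAAACCATAA"

forward_primer = e7_start                    # strand taken as it is
reverse_primer = reverse_complement(e7_end)  # reverse complement of the 3' end

for name, primer in [("forward", forward_primer), ("reverse", reverse_primer)]:
    print(name, primer, primer_report(primer))
```

The report returns each rule as a separate flag instead of rejecting a primer outright, so it is easy to see which of the general rules a given primer meets and which it relaxes.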
4. Guide RNA Design
The PAM, or Protospacer Adjacent Motif, is an essential component of any CRISPR system and plays an important role in designing the guide RNAs for CRISPR reactions. A PAM is a 2-6 nucleotide sequence next to the target site that the Cas enzyme must recognize before it can cut the target DNA. PAM sequences differ between Cas enzymes. Cas12a recognizes TTTV as its PAM site (V = A, C or G). This PAM site is recognized on the 5’ to 3’ strand of the target DNA, and the guide RNA binds immediately downstream of the PAM but on the opposite strand.
Our first goal was to determine the PAM sites on our target sequence, the E7 gene. The sequence was taken from NCBI. We considered the 297 conserved bases of E7 and ran our own PAM finder algorithm, which found three PAM sites on the 5’ to 3’ strand of the gene. The guide RNA sequence is taken from the bases immediately downstream of each PAM on the 5’ - 3’ strand.
According to our literature review, 21 bases is the optimum length for the target-specific part of a Cas12 guide RNA. The constant region of a Cas12a guide RNA is also 21 bases long, making each of our guide RNAs 42 bases in length.
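The length arithmetic can be made concrete with a short sketch. The page does not say which Cas12a ortholog or constant region was used; the 21-nt direct repeat below is the one commonly used with LbCas12a and is an assumption here, as is the helper name `build_crrna`. The spacer is the first of the three 21-base target sequences listed later in this section.

```python
# Assumption: LbCas12a direct repeat (constant region), written 5' to 3' as RNA.
# The page does not state which Cas12a ortholog or repeat sequence was used.
ASSUMED_DIRECT_REPEAT = "UAAUUUCUACUAAGUGUAGAU"

def build_crrna(spacer_dna, direct_repeat=ASSUMED_DIRECT_REPEAT):
    """Join the constant region and the spacer into one crRNA (5' to 3').

    For Cas12a the constant region sits at the 5' end and the
    target-specific spacer at the 3' end. The spacer is given in DNA
    letters and converted to RNA by replacing T with U.
    """
    spacer_rna = spacer_dna.upper().replace("T", "U")
    return direct_repeat + spacer_rna

crrna = build_crrna("CAACCAGAGACAACTGATCTC")
print(crrna, len(crrna))
```

With a 21-base constant region and a 21-base spacer, the assembled guide comes to the 42 bases stated above.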
Here are the three PAM sequences on the 5’-3’ strand of E7 (highlighted in red in the original; each is TTTG, at bases 42-45, 170-173 and 234-237):
ATGCATGGAGATACACCTACATTGCATGAATATATGTTAGATTTGCAACCAGAGACAACTGATCTCTACT
GTTATGAGCAATTAAATGACAGCTCAGAGGAGGAGGATGAAATAGATGGTCCAGCTGGACAAGCAGAACC
GGACAGAGCCCATTACAATATTGTAACCTTTTGTTGCAAGTGTGACTCTACGCTTCGGTTGTGCGTACAA
AGCACACACGTAGACATTCGTACTTTGGAAGACCTGTTAATGGGCACACTAGGAATTGTGTGCCCCATCT
GTTCTCAGAAACCATAA
Therefore, the target-specific sequences of the three guide RNAs sent to IDT for synthesis are:
1. CAACCAGAGACAACTGATCTC
2. TTGCAAGTGTGACTCTACGCT
3. GAAGACCTGTTAATGGGCACA
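The search described above can be reproduced in a few lines. This is a minimal sketch of a TTTV scan, not the team's actual PAM finder tool; the function name `find_cas12a_guides` and its parameters are hypothetical. It scans only the 5’-3’ strand as written, uses a lookahead so that overlapping candidates such as TTTTG are not skipped, and takes the 21 bases immediately downstream of each PAM.

```python
import re

# E7 sequence (5' to 3') exactly as listed above, 297 bases.
E7 = (
    "ATGCATGGAGATACACCTACATTGCATGAATATATGTTAGATTTGCAACCAGAGACAACTGATCTCTACT"
    "GTTATGAGCAATTAAATGACAGCTCAGAGGAGGAGGATGAAATAGATGGTCCAGCTGGACAAGCAGAACC"
    "GGACAGAGCCCATTACAATATTGTAACCTTTTGTTGCAAGTGTGACTCTACGCTTCGGTTGTGCGTACAA"
    "AGCACACACGTAGACATTCGTACTTTGGAAGACCTGTTAATGGGCACACTAGGAATTGTGTGCCCCATCT"
    "GTTCTCAGAAACCATAA"
)

def find_cas12a_guides(seq, spacer_len=21):
    """Return (pam_start, pam, spacer) for every TTTV site on the given strand.

    pam_start is a 0-based index. Sites too close to the 3' end to leave a
    full-length spacer are skipped.
    """
    seq = seq.upper()
    hits = []
    # The lookahead keeps matches zero-width, so overlapping sites are all found.
    for match in re.finditer(r"(?=(TTT[ACG]))", seq):
        pam_start = match.start()
        spacer_start = pam_start + 4
        spacer = seq[spacer_start:spacer_start + spacer_len]
        if len(spacer) == spacer_len:
            hits.append((pam_start, match.group(1), spacer))
    return hits

guides = find_cas12a_guides(E7)
for pam_start, pam, spacer in guides:
    print(f"PAM {pam} at base {pam_start + 1}: {spacer}")
```

Run on the E7 sequence above, the scan finds three sites, and the 21 bases after each one match the three sequences in the list. A fuller tool would also scan the reverse complement strand, which this sketch leaves out.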
EXPERIMENTAL PROTOCOLS
Initially, we followed the manufacturers’ standard protocols for the CRISPR/Cas12a reaction. However, multiple unsuccessful experiments led us to alter the protocol to better suit our needs. After discussions with experts and some out-of-the-box thinking, we ran multiple iterations of the protocol, varying reaction parameters and reagent concentrations. Finally, we arrived at a combination of reaction time, Cas12a concentration and ssDNA reporter concentration that yielded successful results.
We designed the initial protocol and tested it; when it failed, we read more and altered the protocol until it gave usable results. The engineering cycle greatly helped us arrive at the correct protocol. For more information, please refer to our Experimental Protocols page.
MACHINE LEARNING MODEL
This model was intended to tell us which guide RNA is likely to perform best, based on data from existing CRISPR/Cas12 reactions. It predicts the indel frequency of the CRISPR reaction at the target, i.e. the 3’ to 5’ strand of the E7 gene. Indel frequency is one of the main outputs of CRISPR indel analysis and represents the CRISPR editing efficiency. We used this parameter to assess which crRNA would give the most efficient results. Please refer to the Model page for more details.
COMMUNITY FEEDBACK
We interacted with various experts in the industry and incorporated their valuable insights into the project as much as possible. We also spoke with doctors and patients to understand the severity of the problem in depth. This fed into the “Learning” aspect of the engineering cycle, on the basis of which we could formulate new protocols and test them to achieve the desired results. For more information, please refer to the Human Practices page.
USEFUL TOOLS
To ease the design cycle, we developed a software tool that finds the PAM sites in long DNA sequences and gives the corresponding guide RNA sequences as output. This tool helped us in our project, and we hope it is also of help to fellow researchers when finding PAM sites and designing guide RNAs for their CRISPR reactions. For more information, please refer to the Web Application page under Drylab.
References:
- El Aliani, A., El Abid, H., Kassal, Y., Khyatti, M., Attaleb, M., Ennaji, M.M. and El Mzibri, M., 2020. HPV16 L1 diversity and its potential impact on the vaccination-induced immunity. Gene, 747, p.144682.
- Mirabello, L., Yeager, M., Yu, K., Clifford, G.M., Xiao, Y., Zhu, B., Cullen, M., Boland, J.F., Wentzensen, N., Nelson, C.W. and Raine-Bennett, T., 2017. HPV16 E7 genetic conservation is critical to carcinogenesis. Cell, 170(6), pp.1164-1174.
- Clifford, G.M., Tenet, V., Georges, D., Alemany, L., Pavón, M.A., Chen, Z., Yeager, M., Cullen, M., Boland, J.F., Bass, S., Steinberg, M., Raine-Bennett, T., Lorey, T., Wentzensen, N., Walker, J., Zuna, R., Schiffman, M. and Mirabello, L., 2019. Human papillomavirus 16 sub-lineage dispersal and cervical cancer risk worldwide: Whole viral genome sequences from 7116 HPV16-positive women. Papillomavirus Research, 7, pp.67-74. doi:10.1016/j.pvr.2019.02.001.