Our human practices integrated feedback from academic experts, clinical researchers, biology educators, and high school students to create new design and accessibility features in our project. These human practices brought us face-to-face with a breadth of envisioned end users and stakeholders, allowing us to gauge the impact of our work from multiple angles and communities.
The goal of our project was to develop a computational pipeline that can be used by a wide variety of researchers in different wet lab settings, but what does this look like in practice? Beyond its technical features, our project must be one that users trust and are comfortable using. Because the computational methods our project relies on are so new, designing project features with the feedback of end users and the general public in mind was essential to instilling confidence in our technology.
From the early stages of our project, we wanted to ensure that our human practices gathered feedback from diverse groups of stakeholders on a wide variety of topics. Thus, the first stage of our human practices was brainstorming how to best accomplish this.
The first question we had to ask ourselves was: Who exactly do we want to gather feedback from? To answer this, we turned to the ultimate goal of our project. We wanted our pipeline to be widely accessible, and we envision applications for it in academia and clinical research as well as in educational settings. In addition to aiding researchers, our project seeks to address inequities within synthetic biology by reducing the amount of wet lab resources needed to engineer novel proteins, making protein engineering more accessible to labs with limited resources. Thus, we broke our human practices demographics down into three categories: academic, clinical, and educational. In addition, we wanted to make sure we solicited feedback from labs of different sizes, to assess how we could better address these inequities through our project.
We then asked ourselves how we could best gather this information. Who should we reach out to? What format would best suit our questions? On the academic side, we decided it would be feasible to reach out to professors at various Harvard and MIT labs and discuss our project with them in one-on-one conversations, meeting at multiple points during our project's development. For clinics, we found that many of the people we reached out to could not fit multiple one-on-one meetings into their schedules, so we instead opted to obtain a breadth of clinical feedback through electronic surveys and occasional follow-up discussions. Lastly, to gain a sense of how our project could be incorporated into educational biology settings, we reached out to various summer programs in wet lab biology and became directly involved with the BioSTAR bioengineering program. In addition to serving as an educational and mentorship experience for our members, our interactions with students and facilitators in these programs taught us a lot about how bioengineering is currently taught and how we can adapt our project to fit more easily into current curricula.
Given our location in Cambridge, Massachusetts, a major hub of biotechnology research, we were fortunate to find a number of labs willing to provide feedback on our project through personal discussions over the course of the project cycle. To adequately address how our pipeline can be adapted for researchers with a wide range of familiarity with computational methods, we sought direct conversations with labs that incorporate varying degrees of computational work into their research. We ended up connecting with four labs. Two of these labs were primarily computationally oriented: we received feedback from researchers in the labs of Sean Eddy, a Professor of Molecular and Cellular Biology and of Applied Mathematics whose lab focuses on dry lab computational biology, and Sergey Ovchinnikov, whose group works on computational protein folding. In addition, we received feedback from the labs of Andrew Murray, who studies yeast through a combination of wet lab and computational methods, and Lee Rubin, a professor of stem cell and regenerative biology with a primarily wet lab focus.
These labs differed in the degree to which they incorporate computational methods, allowing us to first gauge how researchers from different backgrounds would respond to our initial computational pipeline.
Through multiple conversations with these labs, various design and accessibility features of our project evolved. Our project started as a terminal command-line software package, but wet lab researchers expressed concerns about the accessibility of this approach. Moreover, our initial thinking was to keep the output of our pipeline simple and limit it to residue mutations, but at various stages of the project, academic researchers requested that specific sets of data generated by our pipeline be accessible in the output, such as evolutionary trees and binding metrics beyond mere binding affinity. In particular, the Murray group was interested in the evolutionary relationships between successive generations of protein candidates in our pipeline. Since presenting all of this data to the user at once would be overwhelming, we developed a system of secondary file outputs that includes these datasets in an interpretable format, organized by specific research application.
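To illustrate the idea of secondary outputs organized by research application, the sketch below shows one way such a layout could be written to disk. The file names, directory structure, and result fields are hypothetical placeholders, not the exact format our pipeline produces.

```python
# Hypothetical sketch: a concise primary summary plus secondary files grouped by the
# research application that requested them. Names and fields are illustrative only.
import json
from pathlib import Path

def write_outputs(results: dict, out_dir: str = "pipeline_output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)

    # Primary output: the short summary most users asked for.
    (out / "summary.json").write_text(json.dumps({
        "top_mutations": results.get("top_mutations", []),
        "binding_affinity": results.get("binding_affinity"),
    }, indent=2))

    # Secondary outputs, grouped by application (e.g., evolutionary relationships,
    # extended binding metrics) so they do not clutter the main summary.
    secondary = {
        "evolution/lineage_tree.nwk": results.get("lineage_newick", ""),
        "binding/extended_metrics.json": json.dumps(results.get("binding_metrics", {}), indent=2),
    }
    for rel_path, content in secondary.items():
        path = out / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
```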
Moreover, feedback from computational researchers enabled us to confront many of the current limitations of AlphaFold. Advice from the Ovchinnikov and Eddy groups informed us about AlphaFold's shortcomings when predicting the effects of single-point mutations. Since AlphaFold tries to predict overall folding, the effects of single point mutations can be lost in a variety of ways. In response, we adjusted our design to incorporate groups of mutations at once and to use an n-1 approach to identify crucial mutations.
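One way to read the n-1 idea is as a leave-one-out comparison: score the full mutation group, then drop each mutation in turn and re-score, flagging mutations whose removal most degrades the result as crucial. The minimal sketch below assumes a placeholder scoring function (`score_binding`) standing in for whatever structure-prediction and binding step the pipeline actually runs.

```python
# Minimal sketch of a leave-one-out (n-1) analysis over a mutation group.
# `score_binding` is a hypothetical stand-in for the pipeline's scoring step.
from typing import Callable, Sequence, Dict

def n_minus_one(mutations: Sequence[str],
                score_binding: Callable[[Sequence[str]], float]) -> Dict[str, float]:
    """Return each mutation's contribution: full-group score minus score without it."""
    full_score = score_binding(mutations)
    contributions = {}
    for mut in mutations:
        reduced = [m for m in mutations if m != mut]
        # A large drop when `mut` is removed suggests it is a crucial mutation.
        contributions[mut] = full_score - score_binding(reduced)
    return contributions
```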
While we initially wanted to have the same kind of one-on-one relationship with clinical researchers as we had with academic labs, this ultimately proved infeasible due to the demanding schedules of the researchers we reached out to. However, our team adapted to this early roadblock by transitioning our clinical human practices work to a survey-based system.
To prepare for this transition, our human practices team completed CITI Social/Behavioral Research training. In addition, we submitted our survey and questions to the Harvard University Committee on the Use of Human Subjects to ensure that our survey upheld the ethical standards maintained by Harvard and iGEM. The survey was not considered in-person human subjects research, and thus did not have to go through the entire IRB approval process for regulated research. Once our questionnaire was finalized and approved, we sent it out to various clinics at Mass General Hospital, Brigham and Women's Hospital, and Mount Auburn Hospital. These clinical researchers varied in specialty, and although most worked in the diagnostics space, we again saw high variation in familiarity with computational tools. Our survey accounted for the backgrounds and current research areas of our respondents to help identify the different needs of different groups of stakeholders.
Participation in our survey was entirely voluntary. The Harvard iGEM Team offered no incentives for filling out the survey, and participants were free to stop at any time and to skip any questions they did not want to answer. All questions were chosen carefully to take into consideration different backgrounds, identities, and groups.
A Brief Overview of the Backgrounds of Our Respondents
The results of questions 12-14 were by far what stood out to us the most, and informed how we designed our project with our end users in mind.
In addition to the questions showcased above, many clinical researchers expressed concern about fully automated software in the open-response portion of our survey. By far the most common concern was that computational programs are a “black box” that does not provide researchers with insight into underlying biological mechanisms. This was quite a different tone from our academic feedback, where researchers seemed more open to using the software so long as it was accessible and straightforward to work with. This feedback made us rethink our approach. Under the hood, our software does generate biological insights, notably the identification of residue interactions between the receptor and ligand. However, at this stage in our project, our software did not showcase all of its identified receptor-ligand interactions, highlighting mainly binding energy and identified mutations. We thus asked ourselves what design features we could implement to address this concern among a large portion of our envisioned stakeholders.
Ultimately, we decided to rework how our pipeline operated by breaking a single, multistep process down into multiple steps. We separated the different stages of our pipeline into individual programs that can be run one by one, with each program generating its own output. This way, users can look into each step and the biological insights it generates, and can follow along to see how the data generated by one step is used by the next. By emphasizing the process through which our pipeline reaches its final result, we hope to instill confidence in computational biology among our clinical end users. These finalized modifications to our pipeline are documented on our wiki and were sent to our clinical respondents for another round of evaluation and feedback. We are currently awaiting their responses and look forward to hearing our clinical partners' reactions to these modifications.
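As a rough sketch of what "run one by one" could look like in practice, each stage could be its own script that reads the previous stage's output file and writes its own, so users can pause and inspect intermediate results before continuing. The script names and flags below are hypothetical, not the actual commands in our pipeline.

```python
# Hypothetical driver showing the step-by-step layout: each stage is a separate
# program with an explicit input and output, so intermediate files can be inspected.
import subprocess

STEPS = [
    ["python", "01_generate_candidates.py", "--in", "target.fasta", "--out", "candidates.csv"],
    ["python", "02_predict_structures.py", "--in", "candidates.csv", "--out", "structures/"],
    ["python", "03_score_binding.py", "--in", "structures/", "--out", "binding_metrics.json"],
    ["python", "04_rank_mutations.py", "--in", "binding_metrics.json", "--out", "ranked_mutations.csv"],
]

for step in STEPS:
    # Stop if any stage fails, leaving its predecessors' outputs available for review.
    subprocess.run(step, check=True)
```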
Through our experiences with both academic and clinical researchers, we realized that many of the cutting-edge computational tools our project uses are not well integrated into research settings. We asked ourselves how we could solve this problem, and one of our members posed a question: To what degree are researchers exposed to computational tools during their training? This prompted us to look further into synthetic biology education and investigate how current biology curricula approach computation. Moreover, we wanted to get involved in educational initiatives ourselves and bring computational biology into an educational setting, which we see as an essential step toward our mission of greater computational and wet lab integration.
To do so, we reached out to various bioengineering summer research programs and got a response from BioSTAR, a summer wet lab program for high schoolers. This group of students was a very different demographic from our other human practices contributors, and allowed us to further reflect on how we can make our computational tools more accessible to a broader audience. Over the course of the BioSTAR program, our team members had opportunities to mentor high schoolers in wet lab biology and give presentations on computational tools and bioengineering. This direct work with high schoolers prompted us to expand our educational initiative beyond the BioSTAR program and develop our own dedicated synthetic biology lecture series, which is described on our education and communication page.
Developing educational materials for high schoolers taught us a lot about how the accessibility of computational biology can be expanded. First and foremost, we realized that many commonly used terms in computational protein engineering, such as binding affinity, Shannon entropy, and multiple sequence alignment, were unfamiliar to our high school audience. This made it initially difficult for the students to follow the mathematical models we used, and one student even commented that computational biologists “try to solve problems by making them more complicated.” We realized that, without a strong emphasis on the underlying biology that computational variables represent, it is very difficult to instill confidence in these tools among biologists in training. We thus circled back to these terms and dedicated an entire discussion section to breaking down each variable in our pipeline and the corresponding phenomenon that variable tracks. This insight was not only helpful for our educational materials; we also incorporated these descriptions into the outputs of our pipeline, where users receive a file that explains each variable and describes the meaning of the particular value the user's data generated (e.g., “A binding affinity this low indicates the protein will not bind the active site.”).
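The sketch below illustrates the idea of that explanatory output file: each reported variable is paired with a plain-language description and an interpretation of the user's particular value. The glossary wording and the numeric threshold are illustrative assumptions, not the calibrated values our pipeline uses.

```python
# Simplified sketch of the explanatory output: variable descriptions plus an
# interpretation of the user's value. Thresholds and wording are placeholders.

GLOSSARY = {
    "binding_affinity": "Predicted strength of the receptor-ligand interaction "
                        "(kcal/mol; more negative means tighter binding).",
    "shannon_entropy": "Variability at a residue position across the multiple "
                       "sequence alignment (higher means less conserved).",
}

def interpret_binding_affinity(value: float) -> str:
    if value > -5.0:  # placeholder threshold for illustration only
        return "A binding affinity this low indicates the protein is unlikely to bind the active site."
    return "This binding affinity is consistent with stable binding to the active site."

def write_explanations(results: dict, path: str = "variable_explanations.txt") -> None:
    with open(path, "w") as fh:
        for name, value in results.items():
            fh.write(f"{name} = {value}\n  {GLOSSARY.get(name, 'No description available.')}\n")
            if name == "binding_affinity":
                fh.write(f"  {interpret_binding_affinity(value)}\n")
```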
Moreover, this work with students reemphasized the need for visualization within our pipeline. A consistent point of feedback from students was a desire to see the actual residue-residue interactions in a 3D model, which, along with the clinical feedback, pushed us to break our pipeline up into multiple steps, including a step where the 3D PyMOL structure can be pulled up for users to examine.
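For a sense of what that visualization step could involve, the sketch below uses PyMOL's Python API to load a predicted complex and highlight residues near the receptor-ligand interface. It must be run inside a PyMOL-enabled Python environment, and the chain IDs and distance cutoff are illustrative assumptions rather than our pipeline's exact settings.

```python
# Rough sketch of an interface-visualization step using PyMOL's Python API.
# Chain identifiers and the 4 Angstrom cutoff are placeholders.
from pymol import cmd

def show_interface(structure_path: str, receptor_chain: str = "A", ligand_chain: str = "B") -> None:
    cmd.load(structure_path, "complex")
    # Select receptor residues within 4 Angstroms of the ligand to highlight contacts.
    cmd.select("interface", f"chain {receptor_chain} within 4 of chain {ligand_chain}")
    cmd.show("sticks", "interface")
    cmd.show("sticks", f"chain {ligand_chain} within 4 of chain {receptor_chain}")
    cmd.zoom("interface")
```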
Overall, our human practices brought us in direct contact with a wide breadth of envisioned end users, and allowed us to build our project in a way that made it more accessible to a larger range of stakeholders. Many of the features of our finalized pipeline would not have existed had it not been for the feedback we received. We would like to thank all the researchers, students, and educators we got to work with over the past nine months. Thank you all for being such a vital part of our Harvard iGEM 2022 experience!