Updated on 2022-10-28: We were nominated for Best Software Tool this year.
Introduction
The research approach of synthetic biology resembles an engineering method more than a traditional biological one. In practice, synthetic biology designs focus on biological functions and are built from the bottom up: from parts to devices and finally to systems. Parts, as the foundation of this hierarchy, therefore play an important role.
The idea of standardizing parts was put forward by Drew Endy in 2005. When biological components are created or modified, they must be standardized: with standardized settings, a part created by one person can be reused by all researchers. This is why every team in the iGEM Competition must submit its parts.
Since the iGEM community provides a large database of standardized parts, there is every reason to make full use of it. However, we ran into a problem when we first collected information and resources for our part design this year: although the database is available, we could not quickly identify which parts were most relevant to our project from the parts registry pages, because no powerful search tool existed for it.
Therefore, we decided to develop PartHub for the iGEM community as an assistant tool for the parts database. We hope that PartHub can help future teams and researchers in their Design-Build-Test-Learn (DBTL) pipelines.
How to use PartHub
For common users
Below is an overview of PartHub’s workflow.
- Visit PartHub, for example, http://3.238.241.161:5000/. The website should look like this:
- Select the search type and enter search terms (e.g., search type Content and search term carotene). The format of the search terms for each search type is as follows:
Search type | Meaning and format | Example | Example URL |
---|---|---|---|
ID | The ID of the part (e.g. BBa_xxxxxxxx). The search term can be entered with or without the BBa_ prefix. | K3790012 | http://3.238.241.161:5000/sp?s=K3790012&searchtype=number |
Name | The name of the part in the Registry of Standard Biological Parts. | GFP | http://3.238.241.161:5000/sp?s=GFP&searchtype=name |
Sequence | The sequence of the part (if it exists). The search term may only contain the characters a, t, g, c, A, T, G, C. | TTAACTTTAAGAAGGAG | http://3.238.241.161:5000/sp?s=TTAACTTTAAGAAGGAG&searchtype=sequence |
Designer | The name of the person who designed the part. | Guanqiao Chi | http://3.238.241.161:5000/sp?s=Guanqiao%20Chi&searchtype=designer |
Team | The name of the team that designed the part. | Fudan | http://3.238.241.161:5000/sp?s=Fudan&searchtype=team |
Content | The content of the part in the Registry of Standard Biological Parts. | carotene | http://3.238.241.161:5000/sp?s=carotene&searchtype=contents |

Our search engine is case-insensitive and supports partial matching. In addition, PartHub supports Boolean search with multiple search terms, in the form xxx AND xxx or xxx OR xxx.
PartHub also supports fuzzy search. For example, if you want to search for beta AND carotene but accidentally type bata AND carotene, PartHub automatically performs a fuzzy search, and the results look like this:
- View the sorted search results (you can choose the sort order). PartHub provides several ways to sort the results, such as Most cited and Best match, so users can order the results to suit their needs. Recommended is a weighted sort that considers several aspects of suitability; use it when you are unsure which sort order best fits your needs.
- Click on a part you are interested in to visualize its relationship network. The page looks like this:
The relationship network is interactive: scroll to zoom the canvas and drag to move the nodes. Click a node to display the part's details, and double-click it to go to the part's page. On this page, you can also get the part's sequence directly.
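The example URLs in the search table follow one pattern: the `/sp` endpoint takes the search term as `s` and the search type as `searchtype`. The sketch below builds such URLs with Python's standard library, for example to link to PartHub searches from a script. The `SEARCH_TYPES` mapping is copied from the example URLs (note that ID maps to `number` and Content maps to `contents`); the function name `build_search_url` is hypothetical, and the base URL is the example instance from step 1, which may change.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://3.238.241.161:5000/sp"

# Search type shown in the table -> value of the searchtype parameter,
# taken from the example URLs.
SEARCH_TYPES = {
    "ID": "number",
    "Name": "name",
    "Sequence": "sequence",
    "Designer": "designer",
    "Team": "team",
    "Content": "contents",
}

VALID_BASES = set("atgcATGC")


def build_search_url(term: str, search_type: str, base_url: str = BASE_URL) -> str:
    """Return a PartHub search URL for one search term and search type."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type!r}")
    term = term.strip()
    if search_type == "Sequence" and not set(term) <= VALID_BASES:
        # Sequence queries may only contain a, t, g, c in either case.
        raise ValueError("sequence search terms may only contain a, t, g, c")
    params = {"s": term, "searchtype": SEARCH_TYPES[search_type]}
    # quote (instead of the default quote_plus) encodes spaces as %20,
    # matching the Designer example in the table.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"
```

For instance, `build_search_url("Guanqiao Chi", "Designer")` returns the Designer example URL from the table.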
For developers
The source code is on GitLab.
The system requirements to deploy PartHub are as follows:
Component | Minimum | Recommended |
---|---|---|
CPU | 2 cores, 2.4 GHz | 8 cores, 2.9 GHz |
Memory | 8 GB | 16 GB |
Hard drive | 20 GB SSD | 512 GB SSD |
Network | Broadband Internet connection | Broadband Internet connection |
- The `DataBase` folder contains code to import data into the database.
- The `WebCrawler` folder contains code to get data from the Registry of Standard Biological Parts. To update the data for a new year, replace the year in line 679 of `software_ver0.4.py` with the year you want to get data for. Then run `MergeCSVFiles.py` to get the new `all_collections.csv`, and run `Preprocessing.py` to get `all_collections_filted.csv`. After uploading the data via `LoadCSVFile.py`, you can use PartHub with the updated data. We recommend running the crawler on a PC with more than 8 CPU cores to get a single year of data, and on a higher-performance computer to get multiple years of data. Remember to set an appropriate number of threads in `software_ver0.4.py` for better performance.
- The back-end code of PartHub is written in Python (`.py` files). The role of each part of the code is explained in comments in the files.
- The `static` folder contains static files (images and JavaScript files), and the `templates` folder contains the front-end HTML files.
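The data update for a new year is a fixed sequence of scripts, so it can be chained in one place. The sketch below is a hypothetical helper (`run_update` is not part of the repository): it assumes each script runs without command-line arguments and that you have already edited the year on line 679 of `software_ver0.4.py`. The scripts may live in different folders of the repository, so adjust the paths in `UPDATE_STEPS` to your checkout.

```python
import subprocess
import sys

# The update steps listed above, in order. Adjust each path to where the
# script lives in your checkout of the repository.
UPDATE_STEPS = [
    "software_ver0.4.py",  # crawl the year set on line 679
    "MergeCSVFiles.py",    # -> all_collections.csv
    "Preprocessing.py",    # -> all_collections_filted.csv
    "LoadCSVFile.py",      # upload the cleaned data into the database
]


def run_update(steps=UPDATE_STEPS, dry_run=False):
    """Run the update scripts one after another and return the commands.

    With dry_run=True nothing is executed, which is a cheap way to check
    paths and order before starting a long crawl.
    """
    commands = [[sys.executable, script] for script in steps]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed crawl never feeds a half-written CSV to later steps.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_update(dry_run=True)` only lists the commands; `run_update()` executes them with the same Python interpreter that runs the helper.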
Contribution to the iGEM community
Since PartHub contains information on almost all parts in the Registry of Standard Biological Parts from 2004 onwards, it can be useful for teams in every track. In the Design-Build-Test-Learn (DBTL) biological engineering pipeline, PartHub can support iGEMers in every phase.
In the design phase, PartHub can help designers find and focus on sequences, engineering approaches, workflows, componentry, and organisms of interest. In the build phase, PartHub can help you build combinatorial assemblies of DNA-encoded componentry from existing iGEM BioBricks. In the test phase, you can find useful protocols through PartHub. In the learn phase, when organizing and examining data from the design, build, and test phases, the relationships between parts shown by PartHub may suggest the next step for improvement.
The first users of PartHub were the members of Team Fudan responsible for parts. After trying PartHub, they found that it made their part searches significantly more efficient than the search engine of the Registry of Standard Biological Parts, and they eventually replaced the Registry's search engine with PartHub (for more details, please visit our parts page). Their experimental work confirmed PartHub's usefulness in their DBTL pipeline. In addition, the web-based PartHub has a good interface and can easily be integrated into other teams' workflows, and because it covers all tracks and all types of projects, it is useful for any iGEM project. We introduced PartHub to the community in the iGEM 2022 Global Slack and received positive feedback:
Design and implementation
Web Crawler
To get part data from the Registry of Standard Biological Parts, we developed a Web Crawler based on the Selenium framework. Because the information on each part page of the Registry is laid out in a standardized way, it can be obtained by crawling the elements at fixed XPaths. We ended up deploying the Web Crawler on a Windows-based server (Linux is also supported). Because the Registry has a vast number of pages, we recommend running the crawler on a high-performance computer or computing cluster, especially an HPC platform; a laptop may not be suitable for this heavy workload. Our computer configuration is as follows:
Component | Configuration |
---|---|
CPUs | Dual Intel® Xeon E5-2680 v2 (10 cores, 20 threads, 2.80 GHz to 3.60 GHz) |
Memory | 64 GB DDR3, 1600 MHz |
Hard drive | 500 GB NVMe SSD |
GPU | NVIDIA® GeForce GTX 1050 Ti |
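The fixed-XPath crawling described above parallelizes naturally across threads. The sketch below is a minimal illustration, not the code in `software_ver0.4.py`: the XPath strings, the field names, and the part-page URL pattern are hypothetical placeholders, and `make_driver` is any zero-argument callable returning a Selenium WebDriver. Each worker thread creates its own browser, since a single WebDriver should not be shared between threads, and `n_threads` plays the role of the thread count the developer notes suggest tuning.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical XPaths and URL pattern: inspect a Registry part page (and
# software_ver0.4.py) to find the ones that actually apply.
FIELD_XPATHS = {
    "name": "//h1",
    "designer": "//*[@id='designer']",
}
PART_URL = "https://parts.igem.org/Part:{part_id}"


def crawl(part_ids, make_driver, n_threads=8):
    """Scrape the fields in FIELD_XPATHS for every part, n_threads at a time.

    make_driver is a zero-argument callable returning a Selenium WebDriver,
    e.g. selenium.webdriver.Chrome. Results keep the order of part_ids.
    """
    local = threading.local()
    drivers = []
    lock = threading.Lock()

    def scrape(part_id):
        # Lazily create one driver per worker thread and remember it.
        if not hasattr(local, "driver"):
            local.driver = make_driver()
            with lock:
                drivers.append(local.driver)
        driver = local.driver
        driver.get(PART_URL.format(part_id=part_id))
        record = {"id": part_id}
        for field, xpath in FIELD_XPATHS.items():
            # "xpath" is the value of Selenium's By.XPATH locator strategy.
            record[field] = driver.find_element("xpath", xpath).text
        return record

    try:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(scrape, part_ids))
    finally:
        for driver in drivers:
            driver.quit()  # close every browser, even if a page failed
```

With Selenium installed, a call could look like `crawl(["BBa_K3790012"], webdriver.Chrome, n_threads=8)` after `from selenium import webdriver`. More threads means more browsers running at once, which is why the crawl benefits from many CPU cores.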
Additionally, to keep the parts stored in PartHub up to date, you can use the Web Crawler to get data for several consecutive years, or crawl only the most recent year's data and merge it with the existing data.
Finally, after merging each year's collection and cleaning the data, we save the collected data to a .csv file, which includes information such as the part number, name, content, and citations.
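Merging yearly collections and doing basic cleaning can be done with the standard `csv` module. The sketch below is an illustration, not the code in `MergeCSVFiles.py` or `Preprocessing.py`: it assumes every yearly file shares one header with a part-number column, here called `part_number` (a hypothetical name), and that a newer year's row should replace an older row for the same part. The function name `merge_collections` is likewise hypothetical.

```python
import csv


def merge_collections(yearly_paths, out_path, key="part_number"):
    """Merge yearly CSV exports into one file with one row per part.

    Files are read in the order given, so list them oldest first if a
    newer year's row should replace an older row for the same part.
    Cleaning here is deliberately basic: whitespace is stripped and rows
    without a part number are dropped. Returns the number of parts written.
    """
    merged, fieldnames = {}, None
    for path in yearly_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            fieldnames = fieldnames or reader.fieldnames
            for row in reader:
                # Skip the None key DictReader uses for surplus columns.
                row = {k: (v or "").strip() for k, v in row.items() if k is not None}
                if row.get(key):
                    merged[row[key]] = row
    if fieldnames is None:
        raise ValueError("no input files or all files were empty")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(merged.values())
    return len(merged)
```

Keying a dictionary on the part number makes the incremental update cheap: crawl only the latest year, pass the previous merged file first and the new year's file last, and the newer rows win.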
Deployment of PartHub
The resulting .csv file is loaded into a Neo4j database: a node is created for each part, and a relationship network of parts is built from citation and twin-part relationships. The Neo4j database is then queried with Cypher, Neo4j's efficient query language. To transfer data between the back-end and the database, the back-end accesses the database through py2neo.
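Loading the cleaned CSV into Neo4j amounts to one node per part plus one edge per citation. The sketch below is a minimal example, not PartHub's `LoadCSVFile.py`: it assumes a `part_number` column, a `name` column, and a `cites` column holding semicolon-separated part numbers (all hypothetical), uses the hypothetical label `Part` and relationship type `CITES`, and sends batched Cypher through py2neo's `Graph.run`. The connection URI and credentials are placeholders, and the pure function `csv_to_graph` can be tried without a database.

```python
import csv

# Batched Cypher via UNWIND; the Part label and CITES type are hypothetical.
NODE_QUERY = """
UNWIND $rows AS row
MERGE (p:Part {number: row.number})
SET p.name = row.name
"""
EDGE_QUERY = """
UNWIND $edges AS e
MATCH (a:Part {number: e.src}), (b:Part {number: e.dst})
MERGE (a)-[:CITES]->(b)
"""


def csv_to_graph(path):
    """Read the cleaned CSV and return (node rows, citation edges)."""
    nodes, edges = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            number = row["part_number"]
            nodes.append({"number": number, "name": row.get("name") or ""})
            for cited in (row.get("cites") or "").split(";"):
                if cited.strip():
                    edges.append({"src": number, "dst": cited.strip()})
    return nodes, edges


def load_into_neo4j(path, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Create the part nodes first, then the CITES relationships."""
    from py2neo import Graph  # imported lazily so csv_to_graph needs no py2neo

    graph = Graph(uri, auth=auth)
    nodes, edges = csv_to_graph(path)
    graph.run(NODE_QUERY, rows=nodes)
    graph.run(EDGE_QUERY, edges=edges)
```

Twin-part relationships could be added the same way with a second relationship type. For a database the size of the Registry, creating a uniqueness constraint on `Part.number` before loading keeps the `MERGE` lookups fast, and very large lists are better sent in chunks than in one transaction.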
PartHub's back-end is written in Python, and a public instance of PartHub is currently running on an AWS (Amazon Web Services) t3.xlarge server. Private deployments of PartHub and public mirrors of PartHub are warmly welcomed.
Search logic
Users can choose which field to search, based on the different features recorded in the Registry of Standard Biological Parts. PartHub provides search optimizations such as complementary sequence search, fuzzy search, and result sorting. Boolean search with multiple terms combined with AND / OR is also available. In the search results, each entry represents a part, with its basic information listed alongside. Click on the title to view the relationship network tree of this part; you will find many connections between parts that cannot be found with the iGEM built-in search engine.
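The search features above can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions, not PartHub's back-end (which queries Neo4j): a query is one term or several terms joined by a single kind of operator (`AND` or `OR`); matching is case-insensitive substring matching; if nothing matches, each term is replaced by its closest word in the data using `difflib.get_close_matches`, one possible fuzzy-matching technique; and a sequence query also matches the reverse complement, which is our assumption about what complementary sequence search covers. All function names are hypothetical.

```python
import difflib
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_query(query: str):
    """Split 'x AND y' or 'x OR y' into an operator and lower-cased terms."""
    for op in ("AND", "OR"):
        terms = re.split(rf"\s+{op}\s+", query.strip())
        if len(terms) > 1:
            return op, [t.lower() for t in terms]
    return "AND", [query.strip().lower()]


def _term_hits(term, text, sequence):
    # Partial match; a sequence may also match on the opposite strand.
    return term in text or (sequence and reverse_complement(term) in text)


def _run(records, field, op, terms, sequence):
    combine = all if op == "AND" else any
    return [r for r in records
            if combine(_term_hits(t, r[field].lower(), sequence) for t in terms)]


def search(records, query, field="content", sequence=False):
    op, terms = parse_query(query)
    results = _run(records, field, op, terms, sequence)
    if results or sequence:
        return results
    # Fuzzy fallback: swap each term for its closest word seen in the data.
    vocab = sorted({w for r in records
                    for w in re.findall(r"[a-z0-9]+", r[field].lower())})
    fixed = []
    for t in terms:
        close = difflib.get_close_matches(t, vocab, n=1, cutoff=0.7)
        fixed.append(close[0] if close else t)
    return _run(records, field, op, fixed, sequence)
```

With records whose `content` includes "beta-carotene synthesis", the query `bata AND carotene` finds no exact match, so `bata` is corrected to `beta` and that record is returned, mirroring the fuzzy-search example in the user guide.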
PartHub also provides several ways to sort the results, such as Most cited and Best match, so that users can order the results to suit their needs. Recommended is a weighted sort that considers several aspects of suitability and can be used when users are unsure which sort order best fits their needs.
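The exact weighting behind Recommended is not described on this page, so the following only illustrates how a weighted sort of this kind could work. Each result gets a score combining a normalized match score, a normalized (log-damped) citation count, and recency; the weights and the field names `match`, `citations`, and `year` are hypothetical. Most cited and Best match are then single-field sorts.

```python
import math

# Hypothetical weights; PartHub's actual weighting may differ.
WEIGHTS = {"match": 0.5, "citations": 0.3, "recency": 0.2}


def _normalize(values):
    """Scale values linearly to [0, 1]; equal values all map to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def recommended_order(results, weights=WEIGHTS):
    """Return results sorted by a weighted score, best first."""
    if not results:
        return []
    match = _normalize([r["match"] for r in results])
    # log1p damps very highly cited parts so they do not dominate.
    cites = _normalize([math.log1p(r["citations"]) for r in results])
    recency = _normalize([r["year"] for r in results])
    scored = [
        (weights["match"] * m + weights["citations"] * c + weights["recency"] * y, r)
        for m, c, y, r in zip(match, cites, recency, results)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


def most_cited(results):
    return sorted(results, key=lambda r: r["citations"], reverse=True)


def best_match(results):
    return sorted(results, key=lambda r: r["match"], reverse=True)
```

Normalizing each signal first keeps the weights comparable: without it, raw citation counts in the hundreds would swamp a match score between 0 and 1.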
Visualization of relationship between parts
Building on the Neo4j graph structure, we visualize the relationships between parts. Each node stands for a specific iGEM part in the database, and each edge between nodes represents a citation (cites / cited by) relationship or a twin-part relationship. A node's size shows the part's number of citations, and its color shows the part's publication time. The canvas is interactive, allowing the user to zoom, drag nodes, click nodes, and so on. Basic information about the clicked part is shown in a table at the side. For further information, the user can go directly to the part's page in the Registry of Standard Biological Parts or obtain its sequence (if it exists).
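The visual encoding described above (size for citations, color for publication time) comes down to two mapping functions that produce node attributes for a front-end graph library. This is a sketch with hypothetical pixel ranges and a hypothetical blue-to-red color ramp, not code taken from PartHub's `static` JavaScript; the year range starts at 2004 because PartHub covers parts from 2004 onwards. A square-root scale is used for size so that a heavily cited part does not dwarf the rest of the canvas.

```python
import math


def node_size(citations, max_citations, min_px=10.0, max_px=60.0):
    """Map a citation count to a node diameter on a square-root scale."""
    if max_citations <= 0:
        return min_px
    frac = math.sqrt(min(citations, max_citations) / max_citations)
    return min_px + frac * (max_px - min_px)


def node_color(year, first_year=2004, last_year=2022):
    """Map a publication year to a hex color from blue (old) to red (new)."""
    span = max(last_year - first_year, 1)
    t = min(max((year - first_year) / span, 0.0), 1.0)
    red, blue = round(255 * t), round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"


def to_vis_node(part, max_citations):
    """Build the attributes one node of the canvas would need."""
    return {
        "id": part["number"],
        "label": part["number"],
        "size": node_size(part["citations"], max_citations),
        "color": node_color(part["year"]),
    }
```

With these defaults, a part cited a quarter as often as the most-cited part in view gets a diameter halfway between the minimum and maximum (35 px), since the square root of 0.25 is 0.5.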
Cross-platform compatibility
To make PartHub robustly cross-platform, we chose to develop a web-based user interface. When developing web applications, it is also important to use universal features and interfaces to achieve good cross-platform performance. PartHub can be used on a PC (Windows, Linux, or macOS), iPad, iPhone, or Android phone. It has been tested and works on the following browsers (browsers not listed may work as well):
- Chrome
- Edge
- Firefox
- Safari
Outlook
At present, only part of the information in iGEM is standardized. We hope that the descriptions and designs of parts will be standardized in the future so that more useful knowledge can be obtained from the iGEM community. In addition, it would be great for the iGEM community to have a public PartHub that runs long-term and updates its data every year.
Acknowledgment
- Sunzhe Kang @KangSunzhe
- Shitao Gong @Tom_GhoST_Smith, and 2021's Part Camera