GenoCAD is one of the earliest computer-assisted design tools for synthetic biology.[1] The software is a bioinformatics tool developed and maintained by GenoFAB, Inc. GenoCAD facilitates the design of protein expression vectors, artificial gene networks and other genetic constructs for genetic engineering, and is based on the theory of formal languages.[2]
Initial release | 30 August 2007 |
---|---|
Stable release | 2.3.1 / 11 January 2014 |
Repository | |
Written in | PHP, JavaScript, C, MySQL |
Type | Computer-aided design, Bioinformatics |
License | Apache v2.0 |
Website | genocad |
History
GenoCAD originated as an offshoot of an attempt to formalize functional constraints of genetic constructs using the theory of formal languages. In 2007, the website genocad.org (now retired) was set up as a proof of concept by researchers at the Virginia Bioinformatics Institute, Virginia Tech. Using the website, users could design genes by repeatedly replacing high-level genetic constructs with lower-level genetic constructs, and eventually with actual DNA sequences.[2]
On August 31, 2009, the National Science Foundation awarded a three-year grant of $1,421,725 to Dr. Jean Peccoud, an associate professor at the Virginia Bioinformatics Institute at Virginia Tech, for the development of GenoCAD.[3] GenoCAD was and continues to be developed by GenoFAB, Inc., a company founded by Peccoud (currently CSO and acting CEO), who was also one of the authors of the originating study.[2]
Source code for GenoCAD was originally released on SourceForge in December 2009.[4]
GenoCAD version 2.0 was released in November 2011 and included the ability to simulate the behavior of the designed genetic code. This feature was a result of a collaboration with the team behind COPASI.[5]
In April 2015, Peccoud and colleagues published a library of biological parts, called GenoLIB,[6] that can be incorporated into the GenoCAD platform.[7]
Goals
The four aims of the project are to develop a:[8]
- computer language to represent the structure of synthetic DNA molecules used in E. coli, yeast, mice, and Arabidopsis thaliana cells
- compiler capable of translating DNA sequences into mathematical models in order to predict the encoded phenotype
- collaborative workflow environment that allows users to share parts, designs, and fabrication resources
- means to forward the results to the user community through an external advisory board, an annual user conference, and outreach to industry
Features
The main features of GenoCAD can be organized into three categories.[9]
- Management of genetic sequences: The purpose of this group of features is to help users identify, within large collections of genetic parts, the parts needed for a project and to organize them in project-specific libraries.
  - Genetic parts: Parts have a unique identifier, a name and a more general description. They also have a DNA sequence. Parts are associated with a grammar and assigned to a parts category such as promoter, gene, etc.
  - Parts libraries: Collections of parts are organized in libraries. In some cases, parts libraries correspond to parts imported from a single source such as another sequence database. In other cases, libraries correspond to the parts used for a particular design project. Parts can be moved from one library to another through a temporary storage area called the cart (analogous to e-commerce shopping carts).
  - Searching parts: Users can search the parts database using the Lucene search engine. Basic and advanced search modes are available. Users can develop complex queries and save them for future reuse.
  - Importing/Exporting parts: Parts can be imported and exported individually or as entire libraries using standard file formats (e.g., GenBank, tab delimited, FASTA, SBML).
- Combining sequences into genetic constructs: The purpose of this group of features is to streamline the process of combining genetic parts into designs compliant with a specific design strategy.
  - Point-and-click design tool: This wizard guides the user through a series of design decisions that determine the design structure and the selection of parts included in the design.
  - Design management: Designs can be saved in the user workspace. Design statuses are regularly updated to warn users of the consequences of editing parts on previously saved designs.
  - Exporting designs: Designs can be exported using standard file formats (e.g., GenBank, tab delimited, FASTA).
  - Design safety: Designs are protected from some types of errors by forcing the user to follow the appropriate design strategy.
  - Simulation: Sequences designed in GenoCAD can be simulated to display chemical production in the resulting cell.[10]
- User workspace: Users can personalize their workspace by adding parts to the GenoCAD database, creating specialized libraries corresponding to specific design projects, and saving designs at different stages of development.
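The parts model described in the list above can be pictured with a small Python sketch. It is illustrative only and does not reproduce GenoCAD's actual schema or code: the class names, field names, the `transfer_via_cart` helper and the `to_fasta` helper are all assumptions made for this example, and the sketch copies parts into the target library rather than removing them from the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Part:
    """A genetic part; the field names are hypothetical, chosen to mirror the text."""
    part_id: str      # unique identifier
    name: str
    description: str
    sequence: str     # DNA sequence
    grammar: str      # grammar the part is associated with
    category: str     # parts category, e.g. "promoter" or "gene"


@dataclass
class Library:
    """A named collection of parts, keyed by part identifier."""
    name: str
    parts: dict = field(default_factory=dict)

    def add(self, part: Part) -> None:
        self.parts[part.part_id] = part

    def find_by_category(self, category: str) -> list:
        return [p for p in self.parts.values() if p.category == category]


def transfer_via_cart(source: Library, target: Library, part_ids: list) -> int:
    """Collect parts in a temporary cart, then add them to the target library.

    Assumption: parts stay in the source library (a copy, not a removal).
    """
    cart = [source.parts[pid] for pid in part_ids]
    for part in cart:
        target.add(part)
    return len(cart)


def to_fasta(part: Part, width: int = 60) -> str:
    """Render one part as a FASTA record with fixed-width sequence lines."""
    header = f">{part.part_id} {part.name}"
    lines = [part.sequence[i:i + width] for i in range(0, len(part.sequence), width)]
    return "\n".join([header] + lines)


if __name__ == "__main__":
    # Placeholder part: the identifier and sequence are made up for the demo.
    demo = Part("demo_001", "demo promoter", "placeholder part",
                "ACGT" * 20, "demo grammar", "promoter")
    imported = Library("imported parts")
    project = Library("project library")
    imported.add(demo)
    transfer_via_cart(imported, project, ["demo_001"])
    print(to_fasta(project.find_by_category("promoter")[0]))
```

Keying each library by part identifier keeps the sketch consistent with the statement that every part has a unique identifier.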
Theoretical foundation
GenoCAD is rooted in the theory of formal languages; in particular, the design rules describing how to combine different kinds of parts form context-free grammars.[2]
A context-free grammar can be defined by its terminals, variables, start variable and substitution rules.[11] In GenoCAD, the terminals of the grammar are sequences of DNA that perform a particular biological purpose (e.g. a promoter). The variables are less homogeneous: a variable can represent a longer sequence that has multiple functions, or a section of DNA that can hold any one of several different sequences that perform the same function (e.g. a variable that represents the set of promoters). GenoCAD includes built-in substitution rules to ensure that the DNA sequence is biologically viable. Users can also define their own sets of rules for other purposes.
Designing a sequence of DNA in GenoCAD is much like creating a derivation in a context-free grammar. The user starts with the start variable and repeatedly selects a variable and a substitution for it until only terminals are left.[2]
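The derivation process described above can be sketched in a few lines of Python. The grammar below is a hypothetical toy example, not one of GenoCAD's built-in rule sets: the variable names, the terminal names and the `derive` function are assumptions made for illustration. As a simplification, the sketch always rewrites the leftmost variable, whereas a GenoCAD user chooses which variable to replace.

```python
# Hypothetical toy grammar: each variable maps to its substitution rules.
# Any symbol that is not a key of RULES is treated as a terminal (a part).
RULES = {
    "CASSETTE": [["PROMOTER", "RBS", "GENE", "TERMINATOR"]],
    "PROMOTER": [["pro_demo_1"], ["pro_demo_2"]],
    "RBS": [["rbs_demo_1"]],
    "GENE": [["gene_demo_1"], ["gene_demo_2"]],
    "TERMINATOR": [["ter_demo_1"]],
}
START = "CASSETTE"


def derive(choices):
    """Build a leftmost derivation from the start variable.

    `choices` gives, for each step, the index of the substitution rule to
    apply to the leftmost remaining variable. Returns every intermediate
    form; the last one contains only terminals.
    """
    form = [START]
    steps = [list(form)]
    choice_iter = iter(choices)
    while True:
        # Position of the leftmost variable, or None when only terminals remain.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            return steps
        try:
            rule_index = next(choice_iter)
        except StopIteration:
            raise ValueError("not enough choices to finish the derivation") from None
        replacement = RULES[form[idx]][rule_index]
        form = form[:idx] + replacement + form[idx + 1:]
        steps.append(list(form))


if __name__ == "__main__":
    for step in derive([0, 1, 0, 0, 0]):
        print(" ".join(step))
```

Because every form is produced by applying a rule from the grammar, a finished design can only contain combinations of parts that the rules allow, which is the sense in which the grammar constrains the design.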
Alternatives
The most common alternatives to GenoCAD are Proto, GEC and EuGene.[12]
Tool | Advantages | Disadvantages |
---|---|---|
GEC | | |
EuGene | | |
Proto | | |
References
- Beal, Jacob; Phillips, Andrew; Densmore, Douglas; Cai, Yizhi (2011). "High-Level Programming Languages for Biomolecular Systems". In Koeppl, Heinz; Densmore, Douglas; Setti, Gianluca; di Bernardo, Mario (eds.). Design and Analysis of Biomolecular Circuits. New York Dordrecht Heidelberg London: Springer. p. 241. doi:10.1007/978-1-4419-6766-4. ISBN 978-1-4419-6765-7.
- Cai Y; Hartnett B; Gustafsson C; Peccoud J (2007). "A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts". Bioinformatics. 23 (20): 2760–7. doi:10.1093/bioinformatics/btm446. PMID 17804435.
- Jodi Lewis (September 14, 2009). "National Science Foundation awards $1.4 million for GenoCAD development". Archived from the original on June 11, 2015. Retrieved October 7, 2013.
- "GenoCAD Code". Sourceforge. Retrieved 8 October 2013.
- Wilson, Mandy. "GenoCAD Release Notes". Peccoud Lab. Archived from the original on 13 October 2013. Retrieved 8 October 2013.
- Adames, Neil; Wilson, Mandy; Fang, Gang; Lux, Matthew; Glick, Benjamin; Peccoud, Jean (April 29, 2015). "GenoLIB: a database of biological parts derived from a library of common plasmid features". Nucleic Acids Research. 43 (10): 4823–32. doi:10.1093/nar/gkv272. PMC 4446419. PMID 25925571.
- Adames N, Wilson M, Fang G, Lux M, Glick B, Peccoud J (2015). "GenoLIB: a database of biological parts derived from a library of common plasmid features". Nucleic Acids Research. 43 (10): 4823–32. doi:10.1093/nar/gkv272. PMC 4446419. PMID 25925571.
- Jean Peccoud (June 21, 2013). "GenoCAD: Computer Assisted Design of Synthetic DNA". Archived from the original on July 7, 2013. Retrieved October 7, 2013.
- Wilson ML; Hertzberg R; Adam L; Peccoud J (2011). "A Step-by-Step Introduction to Rule-Based Design of Synthetic Genetic Constructs Using GenoCAD". Synthetic Biology, Part B - Computer Aided Design and DNA Assembly. Methods in Enzymology. Vol. 498. pp. 173–88. doi:10.1016/B978-0-12-385120-8.00008-5. ISBN 9780123851208. PMID 21601678.
- Cai, Y.; Lux, M. W.; Adam, L.; Peccoud, J. (2009). Sauro, Herbert M (ed.). "Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars". PLOS Computational Biology. 5 (10): e1000529. Bibcode:2009PLSCB...5E0529C. doi:10.1371/journal.pcbi.1000529. PMC 2748682. PMID 19816554.
- Sipser, Michael (2013). Introduction to the Theory of Computation, Third edition. Boston, MA, USA: Cengage Learning. p. 104. ISBN 978-1-133-18779-0.
- Habibi, N., Mohd Hashim, S. Z., Rodriguez, C. A., & Samian, M. R. (2013). A Review of CADs, Languages and Data Models for Synthetic Biology. Jurnal Teknologi, 63(1).
- Pedersen, M. (2010). Modular languages for systems and synthetic biology.