CH391L/S14/CAD systems

Introduction
Computer-Aided Design (CAD) tools are software packages that help design and engineer new systems. In traditional engineering fields, these programs have long been used to aid in optimizing production processes, modeling chemical reactions, and creating new products. Graphical User Interfaces (GUIs) act as the human-readable visualization of computer languages which are designed to assemble components into useful products or devices. Many of these programs can simulate the outcome of a given assembled device and can automate the assembly with a specific goal in mind. CAD tools offer high-throughput design and analysis of synthetic biological devices, making synthetic biology more accessible, cost-effective and powerful. In general, the use of a CAD program in synthetic biology involves the following steps:

Step 1: The user draws a biological system.

Step 2: The user performs some analysis and returns to step 1 if the results are not as expected.

The analysis in step 2 can include mathematical analysis of non-linear systems; kinetic and chemical analysis; stochastic simulations, structural analysis, and methods from systems biology; prediction of evolutionary trajectories for directed evolution; and database look-up to find suitable components.
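As an illustration of the stochastic simulations mentioned among the analysis options, the following is a minimal sketch of a Gillespie-style simulation for a single protein that is produced at a constant rate and degraded in proportion to its copy number. The function name, the rate parameters and the two-reaction model are assumptions chosen for illustration and are not taken from any particular CAD tool.

```python
import random


def gillespie_birth_death(k_make, k_deg, t_end, seed=0):
    """Simulate protein copy number under constant production (k_make)
    and first-order degradation (k_deg * n). Illustrative model only."""
    if k_make <= 0:
        raise ValueError("k_make must be positive")
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [t], [n]
    while True:
        a_make = k_make          # propensity of the production reaction
        a_deg = k_deg * n        # propensity of the degradation reaction
        a_total = a_make + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, weighted by its propensity.
        if rng.random() * a_total < a_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_make=10.0, k_deg=0.1, t_end=50.0)
print(len(times), "events; final copy number:", counts[-1])
```

Under these assumptions the copy number fluctuates around k_make / k_deg once the system has settled, which is the kind of variability a deterministic model would not show.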

Synthetic Biology CAD Tools
Synthetic Biology CAD tools are programs which help to create novel biological constructs. At the most basic, these programs are essentially enhanced DNA editors which provide a user interface to facilitate easier manipulation of the basic “parts” which comprise biological devices. Some of the more advanced programs have a variety of functions including visualization, asserting validity of constructs, and simulations of metabolic networks. In general, CAD programs for synthetic biology should comply with SBOL (Synthetic Biology Open Language) to facilitate use with the Parts Registry and sharing of parts with other researchers.

Basic Design and Alignment Tools

In the majority of CAD programs for biology, the basic program is a GUI for editing and annotating DNA sequences. The interface often provides a way to edit the sequence for parts and devices, in addition to annotating various regions of the DNA. Most programs have, at the very least, a sequence/part editor which will output the information according to a standard for exchanging biological parts, e.g. SBOL. Many also contain visualization features which show the parts assembled into a vector or plasmid in a compact way, as in VectorEditor VectorEditor2012 or ApE. Others also include improved design features such as codon optimization (e.g. Gene Designer 2.0 GeneDesigner2006). More advanced transcription/translation optimizer software is also available commercially (GeneOptimizer), and takes into account factors such as mRNA secondary structure and GC content when choosing the most productive device design.
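A minimal sketch of two of the calculations mentioned above, GC content and a naive form of codon optimization, might look as follows. The preferred-codon table is a small hypothetical example covering only a few amino acids and is not a measured codon-usage table. Real optimizers also weigh factors such as mRNA secondary structure, which this sketch ignores.

```python
# Hypothetical "preferred codon" table for a handful of amino acids.
# Each codon does encode the listed residue, but the choice of which
# synonymous codon is "preferred" is an assumption for illustration.
PREFERRED_CODON = {
    "M": "ATG",
    "K": "AAA",
    "L": "CTG",
    "S": "AGC",
    "G": "GGC",
    "A": "GCG",
    "*": "TAA",  # stop
}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def back_translate(protein):
    """Naive codon optimization: pick one fixed codon per amino acid."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


dna = back_translate("MKLSGA*")
print(dna, round(gc_content(dna), 2))
```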

Many of the basic DNA editors also allow for the design of primers for traditional cloning. In light of more recent advances in large-scale cloning techniques, some newer programs such as Gibthon provide automated design of primers for Gibson cloning and other new cloning strategies.
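The idea behind automated primer design for Gibson cloning can be sketched as follows: the forward primer for a fragment carries a 5' tail that is homologous to the end of the upstream fragment, so that the two PCR products share an overlapping end. This is a simplified illustration and not Gibthon's algorithm. The fixed overlap and annealing lengths are assumptions, since real tools typically choose lengths from melting temperature calculations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gibson_primers(upstream, fragment, overlap=20, anneal=20):
    """Design a primer pair for `fragment` so that its PCR product
    overlaps the 3' end of `upstream` (illustrative fixed lengths)."""
    # Forward primer: homology tail from the upstream fragment,
    # followed by the region that anneals to the start of the fragment.
    forward = upstream[-overlap:] + fragment[:anneal]
    # Reverse primer: anneals to the end of the fragment.
    reverse = reverse_complement(fragment[-anneal:])
    return forward.upper(), reverse


fwd, rev = gibson_primers("AAAACCCC", "GGGGTTTTAC", overlap=4, anneal=4)
print(fwd, rev)
```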

BLAST (Basic Local Alignment Search Tool) is a web-based utility which aligns genetic sequences to a reference sequence. This tool is a basic requirement for almost all synthetic biology research, as it is used to verify that the sequencing results of a given part or device match the expected composition for the design. Moreover, BLAST is the tool of choice for detecting similar/homologous or identical sequences (including non-contiguous sequences) within a user-defined set of genome sequences from publicly available nucleotide and protein data banks.
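To illustrate what a local alignment computes, the sketch below scores the best local alignment between two sequences using the Smith-Waterman dynamic programming method. This is not how BLAST itself works: BLAST uses faster heuristics to search large databases, whereas Smith-Waterman is the exact but slower technique substituted here for clarity. The scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# A sequencing read that contains the expected part scores highly against it.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```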

Other more advanced alignment programs (such as Chromas or Geneious) will align multiple sequences directly from the trace files which show signal intensity output from sequencing software. Geneious is particularly useful for generating a complete and organized view of the genome of choice. From annotating genome sequences to keeping track of everything related to a particular genetic engineering project (primer design, genetic modifications, sequence analysis, etc.), these sorts of multi-purpose stand-alone software tools are becoming very popular in the synthetic biology community.

Assembly Tools

Several of the more advanced CAD programs provide features which aid in the assembly of simple biological parts into more complex features and devices. In some cases, the framework provides a way to compile various simple parts into more complex features with error checking to validate the composition of a component. For example, a complex device such as the genetic toggle switch Togglepaper2000, which is composed of several simple parts (e.g. promoters), can be error-checked using the Eugene Language Eugene2011, which strictly defines synthetic biology devices, part types, parts and properties, to validate a functional composition. More advanced algorithms automate the assembly of components by checking the entire set of permutations containing a given group of parts for valid constructs, returning only those designs which are likely to be functional for the desired task. There are also downloadable tools such as Genome Compiler or Gene Composer and web-based tools such as DNAWorks DNAworks2002 or GeneDesign Genedesign2006 which are designed to facilitate the assembly of much larger devices from simple and complex parts.
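The permutation-checking approach described above can be sketched in a few lines: enumerate every ordering of a set of parts and keep only those that satisfy a list of composition rules. The part names, part types and rules below are illustrative assumptions written in plain Python. They are not Eugene syntax and do not reproduce Eugene's actual rule set.

```python
from itertools import permutations

# Illustrative part library: name -> part type.
PARTS = {
    "pTet": "promoter",
    "B0034": "rbs",
    "lacI": "cds",
    "B0015": "terminator",
}


def position_of(order, part_type):
    """Index of the first part of the given type in an ordering."""
    return next(i for i, name in enumerate(order) if PARTS[name] == part_type)


# Hypothetical composition rules, each a predicate over an ordering.
RULES = [
    lambda o: position_of(o, "promoter") < position_of(o, "cds"),
    lambda o: position_of(o, "rbs") + 1 == position_of(o, "cds"),
    lambda o: position_of(o, "terminator") == len(o) - 1,
]


def valid_designs(part_names):
    """Return every ordering of the parts that satisfies all rules."""
    return [order for order in permutations(part_names)
            if all(rule(order) for rule in RULES)]


print(valid_designs(list(PARTS)))
```

Exhaustive enumeration grows factorially with the number of parts, so this brute-force form is only practical for small part sets.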

Database Tools

Several software programs are designed for maintaining records of BioBricks or other synthetic constructs. These programs are primarily focused on providing access to the collections of parts which are available. One example is the Joint BioEnergy Institute's JBEI GD-ICE program, which is a web-based tool for creating and maintaining an "Inventory of Composable Elements" for a lab group. The tool is primarily designed for creating private databases within a smaller group of researchers, but JBEI also maintains a public database of parts. Clotho also has built-in capability for maintaining a local database of biological parts within a lab group or institution.

Addgene is a non-profit plasmid repository that features a free online cloning vector analysis tool. Its mission is to maintain a high-quality library of published plasmids for use in research and discovery, and for preservation and distribution Addgene2014. Its platform conveniently links plasmids with their corresponding research articles. The BioBricks Foundation is presently partnering with Addgene to distribute plasmids that have been contributed under the BioBrick™ Public Agreement.

Pathway prediction/construction tools

FMM can reconstruct metabolic pathways from one metabolite to another, and so provides essential support for synthetic biologists. This user-friendly, freely available web service works by combining KEGG (metabolic pathway database) maps and KEGG LIGAND information to form combined pathway maps, identifying the corresponding genes and organisms, and producing an output in which different pathways can be compared. Although it is limited to characterized pathways in the KEGG framework, it can provide a convenient starting point for many investigations. A more advanced method, BNICE Hatzimanikatis2005, predicts novel pathways on the basis of somewhat broader reaction rules derived from the Enzyme Commission classification system. Because BNICE is not restricted to entries from a specific database, it can also predict unknown pathways that are potentially chemically feasible. Another prediction system based on enzymatic reactions, DESHARSKY Rodrigo2008, uses the choice of host organism as the starting point for pathway prediction. Its algorithm searches for all possible pathways that connect the metabolic network of the organism to a target compound, after which the thermodynamic favourability and the energy loss in transcription and translation are calculated. A comprehensive review of these and various other pathway prediction tools has been published recently Medema2012.
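At its simplest, finding a route from one metabolite to another can be treated as a search over a graph of known reactions. The sketch below uses a breadth-first search to find a shortest route. The metabolite and enzyme names are placeholders, and this is an assumption-laden illustration of the general idea rather than the algorithm used by FMM, BNICE or DESHARSKY.

```python
from collections import deque

# Placeholder reaction list: (substrate, product, enzyme).
REACTIONS = [
    ("A", "B", "enz1"),
    ("B", "C", "enz2"),
    ("A", "D", "enz3"),
    ("D", "C", "enz4"),
    ("C", "E", "enz5"),
]


def find_pathway(reactions, source, target):
    """Breadth-first search for a shortest reaction route.
    Returns a list of (substrate, enzyme, product) steps, or None."""
    graph = {}
    for substrate, product, enzyme in reactions:
        graph.setdefault(substrate, []).append((product, enzyme))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        metabolite, path = queue.popleft()
        if metabolite == target:
            return path
        for product, enzyme in graph.get(metabolite, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [(metabolite, enzyme, product)]))
    return None


print(find_pathway(REACTIONS, "A", "E"))
```

A tool that ranks alternatives would go further, for example by enumerating several routes and scoring each one on thermodynamic favourability.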

Full Featured Tools

There are several stand-alone CAD applications which combine a number of these tools into a single package. From importing large sets of parts in spreadsheet format (e.g. Clotho) to simulating the metabolite levels from a network containing synthetic devices (e.g. TinkerCell TinkerCell2009), these integrated packages aim to provide the entire toolbox of CAD capabilities to synthetic biologists. In addition to these full-featured packages, some programs are designed solely for the purpose of modeling metabolic networks (e.g. SynBioSS SynBioSS2010).
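As a small example of the kind of simulation such packages perform, the sketch below integrates the two-repressor model of the genetic toggle switch mentioned earlier, in its dimensionless form du/dt = alpha1 / (1 + v^beta) - u and dv/dt = alpha2 / (1 + u^gamma) - v. The parameter values, the forward Euler integrator and the function name are assumptions chosen for illustration and are not drawn from TinkerCell or SynBioSS.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Integrate the two-repressor toggle switch model with forward Euler.
    u and v are the (dimensionless) concentrations of the two repressors."""
    u, v = u0, v0
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u   # repressor 1: repressed by v
        dv = alpha2 / (1.0 + u ** gamma) - v  # repressor 2: repressed by u
        u += dt * du
        v += dt * dv
    return u, v


# Two different starting points settle into two different stable states.
print(simulate_toggle(5.0, 1.0))
print(simulate_toggle(1.0, 5.0))
```

With these assumed parameters the system is bistable: whichever repressor starts ahead ends up dominant, which is the behavior that makes the circuit usable as a switch.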

j5 is a web-based tool with multiple design features. It automates the assembly of scar-free devices from multiple biological parts and can design a variety of assembly protocols, including Gibson, Golden Gate, and circular polymerase extension cloning (CPEC). j5 also offers engineering-related features such as cost optimization, enforcement of design specification rules, and automated construction of combinatorial libraries j52011.

SnapGene Viewer is software for creating, browsing, editing and sharing richly annotated DNA sequence files up to 1 Gb in length. Sequence data may be entered directly, imported from a GenBank record, or opened from an annotated sequence stored in one of many common file formats. It has built-in automatic annotation of common features, such as identification of open reading frames (ORFs), with a single mouse click.
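Automatic identification of open reading frames can be illustrated with a short scan for a start codon followed in-frame by a stop codon. This is a minimal sketch under stated assumptions (forward strand only, ATG as the only start codon, a hypothetical minimum length parameter) and is not SnapGene's implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop codon, including the ATG."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGCC"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```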

GenoCAD provides a framework that automatically manages the constraints associated with the different standards, which should help the community better leverage ongoing standardization efforts. It uses a context-free grammar (CFG) CFG to model the structure of genetic constructs, making it possible for users to quickly assemble, from a rich library of genetic parts, constructs compliant with any of six BioBrick assembly standards Cai2010. Modeling the design of synthetic genetic constructs as a grammar allows GenoCAD to be used in two ways: a user can design a synthetic construct by successively selecting design rules to transform the structure of the design; or a user can upload a DNA sequence designed outside GenoCAD to validate its consistency with the grammatical model.
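The second mode of use, validating a design against a grammar, can be sketched with a toy context-free grammar over part types. The grammar below is a hypothetical example written for this illustration and is not one of GenoCAD's grammars. It says that a construct is one or more cassettes, and a cassette is a promoter, one or more rbs-cds pairs, and a terminator.

```python
# Toy grammar: non-terminal -> list of alternative productions.
# Any symbol that is not a key is treated as a terminal (a part type).
GRAMMAR = {
    "construct": [["cassette", "construct"], ["cassette"]],
    "cassette": [["promoter", "gene", "terminator"]],
    "gene": [["rbs", "cds", "gene"], ["rbs", "cds"]],
}


def parse(symbol, tokens, pos):
    """Return every position reachable after deriving `symbol`
    from the tokens starting at index `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for sym in production:
            positions = {end for p in positions for end in parse(sym, tokens, p)}
        ends |= positions
    return ends


def is_valid_construct(part_types):
    """True if the whole list of part types can be derived from 'construct'."""
    return len(part_types) in parse("construct", part_types, 0)


print(is_valid_construct(["promoter", "rbs", "cds", "terminator"]))
print(is_valid_construct(["rbs", "promoter", "cds", "terminator"]))
```

Because every production consumes at least one part before recursing, the parser always terminates. Swapping in a different grammar changes which designs are accepted without changing the checking code.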

Standardizing Representation of Synthetic Biology Parts


If images representing biological parts are not formalized and every CAD software developer creates their own symbols and representations, the result would be considerable confusion and a steeper CAD learning curve for the synthetic biology community. Standardized representation of biological parts is critical for the advancement of synthetic biology. The Synthetic Biology Open Language (SBOL) is an open-source standard for representing designs consisting of both DNA sequence information and higher-level annotation of parts with defined roles and behaviors Galdzicki2011. The core specification of this system has been developed as an RFC SBOLRFC. Several different synthetic biology CAD software programs use this format. Parts represented at this higher level can be visualized and simulated in some of these systems (e.g., TinkerCell).

The Eugene Language Eugene2011 is an open-source human-readable language designed to facilitate automatic creation of new devices from a collection of parts. Eugene includes a standardized format for specifying devices and parts as well as constraints on how they can be assembled into higher-level devices (e.g. a genetic toggle switch). Eugene also features functions for automatic generation of functional assemblies into complex devices. Eugene does not support visualization of constructs.

iGEM Software Tools Development
The iGEM competition for development of software tools is designed to promote creation of publicly available CAD programs for synthetic biology. As with the Registry of Standard Biological Parts, the software tools entered into the competition must adhere to certain standards of interoperability and data format in order to facilitate reuse and ease of collaboration among researchers. There are several categories developers can pursue, including specific modular CAD frameworks (e.g. Clotho) as well as sharing data and interfacing with the Parts Registry. iGEM hosts a freely available repository of these open source software packages from past competitions.

One exciting tool is the MoClo Planner, a multi-touch interface for supporting the design of complex and useful biological constructs. It draws information from the MIT Registry of Biological Parts, PubMed, and the iGEM archive. Its design implements Golden Gate Modular Cloning (MoClo) MoClo, a novel laboratory method that allows the efficient creation of multi-gene constructs from a library of biological parts. Using this method, biological parts are permuted and joined together in a tiered fashion to create new synthetic biology constructs. The MoClo method includes: browsing a library of over 2200 biological parts; selecting biological parts based on their function, genetic sequence, and other biological characteristics; computing possible permutations of parts in predefined arrangements; and designing primers and fusion recognition sites.
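Computing the possible combinations of parts in a predefined arrangement amounts to taking one part per slot from the library, which can be sketched as a Cartesian product. The library contents, part names and slot order below are hypothetical and serve only to illustrate the step. They are not taken from the MoClo Planner.

```python
from itertools import product

# Hypothetical part library grouped by part type.
LIBRARY = {
    "promoter": ["pA", "pB"],
    "rbs": ["r1"],
    "cds": ["gfp", "rfp"],
    "terminator": ["t1"],
}

# Predefined arrangement: the order of slots in each construct.
ARRANGEMENT = ["promoter", "rbs", "cds", "terminator"]


def enumerate_constructs(library, arrangement):
    """Return every construct that fills each slot with one part of that type."""
    return list(product(*(library[slot] for slot in arrangement)))


for construct in enumerate_constructs(LIBRARY, ARRANGEMENT):
    print("-".join(construct))
```

The number of candidate constructs is the product of the number of choices per slot, so libraries of this kind grow quickly as parts are added.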

Future Directions
Although there is a vast collection of useful synthetic biology CAD programs, there is a pressing need for improved standardization and modularity. This includes finding consensus on how to define individual components or parts, and implementing restrictions intended to simplify the process of building synthetic networks while making them more robust and interchangeable. One existing standard is BioBrick standard assembly Shetty2008, which has made DNA assembly simpler. In the future, it is anticipated that standards will also exist for describing the dynamics of a part; for example, a standard promoter part might carry a "strength" value describing its efficiency in recruiting RNA polymerase under some standard environmental condition Kelly2009. Standardization is also important in naming such future values, as well as parts, so that they always remain in a computer-readable format such as the Resource Description Framework (RDF) Galdzicki2009 standards.

The current understanding of how DNA parts come together to make a functional biological device is incomplete. Advances are coming swiftly with the advent of high-throughput technologies, but Computer-Aided Design programs have yet to catch up. Specifically, it is not fully understood how a part changes its function when placed in different devices, so it has proven difficult to create a fully functional, complete language for combining parts efficiently while maintaining their expected functionality. Whereas we are currently capable of modeling metabolic networks to study the effects of a single step in the pathway of synthesis of a relevant material (e.g. a biofuel), one can envision a time in the future when software tools will have advanced to the point of being able to create de novo networks for the synthesis of completely new products (e.g. non-protein/nucleic acid polymers) within the context of a cell. In the coming years, synthetic biology CAD programs will be able to facilitate the rapid advancement of completely new engineered biological devices Medema2012.