
Certain nucleotide or amino acid sequences may reduce the yield of the protein of interest. In such cases, optimizing the sequence may improve the yield. When designing the template DNA sequence for PUREfrex®, please consider the following points.







Codon usage

Ribosomes, tRNAs, and translation factors in PUREfrex® are derived from E. coli. Therefore, in principle, it is recommended to design the gene encoding the protein of interest with codons optimized for translation in E. coli.
Nucleotide sequences optimized for E. coli by available optimization tools sometimes rely on a single codon that is frequently used in E. coli; for example, only the CTG codon may be used for leucine. Using such DNA as a template may reduce the yield of the synthesized protein. If the codon usage in the optimized sequence is excessively imbalanced, reassign the codons to follow the codon usage of the E. coli genome.
The codon usage of the E. coli genome (W3110) is available at the following website:
https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=316407
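
As a rough illustration of the imbalance check described above, the following Python sketch counts codons per amino acid in an ORF and flags amino acids for which one codon accounts for nearly all occurrences. The function name `dominant_codons` and the `threshold` and `min_count` values are assumptions chosen for illustration; they are not part of the kit documentation.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def dominant_codons(orf, threshold=0.9, min_count=3):
    """Return {amino acid: (codon, count, total)} where one codon dominates.

    threshold and min_count are illustrative assumptions, not recommended values.
    """
    orf = orf.upper().replace("U", "T")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]

    # Count how often each codon is used for each amino acid.
    per_aa = defaultdict(Counter)
    for codon in codons:
        aa = CODON_TABLE.get(codon)
        if aa is not None and aa != "*":
            per_aa[aa][codon] += 1

    flagged = {}
    for aa, counts in per_aa.items():
        total = sum(counts.values())
        n_synonyms = sum(1 for a in CODON_TABLE.values() if a == aa)
        codon, n = counts.most_common(1)[0]
        # Skip Met and Trp (single codon) and amino acids that occur too rarely.
        if n_synonyms > 1 and total >= min_count and n / total >= threshold:
            flagged[aa] = (codon, n, total)
    return flagged
```

For any amino acid flagged this way, the codon frequencies on the website above can guide how to redistribute the codons.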



AT content just after the start codon

Select codons to maximize AT content in the region immediately after the start codon (usually up to the 6th codon) (Figure 1). In this region, prefer AT-rich codons over codons frequently used in E. coli.



Figure 1. Effect of codons in the N-terminal region on protein yield
A. Codons of the N-terminal region of the heavy chain (HC) of trastuzumab (Herceptin)
B. Comparison of yields from template DNA with different N-terminal codons
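
The AT content of the region after the start codon can be checked with a short script. The sketch below is a minimal example, assuming the ORF begins with the start codon and that codons 2 to 6 are the region of interest; the function names `at_fraction_after_start` and `at_rich_synonyms` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def at_fraction_after_start(orf, n_codons=5):
    """Fraction of A/T bases in the n_codons codons that follow the start codon."""
    orf = orf.upper().replace("U", "T")
    region = orf[3:3 + 3 * n_codons]  # skip the start codon itself
    if not region:
        raise ValueError("ORF is too short to contain codons after the start codon")
    return sum(base in "AT" for base in region) / len(region)


def at_rich_synonyms(codon):
    """Synonymous codons for the same amino acid, most AT-rich first."""
    codon = codon.upper().replace("U", "T")
    aa = CODON_TABLE[codon]
    synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
    # Sort by descending A/T count, then alphabetically for a stable order.
    return sorted(synonyms, key=lambda c: (-sum(b in "AT" for b in c), c))
```

For example, `at_rich_synonyms("CTG")` lists TTA first, since it is the leucine codon with the most A/T bases. Whether a given substitution actually improves the yield still has to be confirmed experimentally.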



Secondary structure of mRNA around the start codon

If the mRNA can form a stable secondary structure around the start codon, from the SD sequence to the N-terminal region of the ORF (about 6-10 codons), and the SD sequence is hidden, binding of the mRNA to the ribosome is impaired and the yield may decrease. If a stable secondary structure is predicted in this region, substitute the N-terminal codons with synonymous codons.
Figure 2 shows an example of secondary structure prediction. In this example, RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) is used to analyze the DHFR DNA included in the kit as the positive control. Because the region around the start codon does not form a stable secondary structure, the yield is not reduced.


Figure 2. Secondary structure prediction of the translation initiation region
A. Nucleotide sequence of 5'-terminal region of DHFR DNA. The region analyzed is highlighted in yellow.
B. Secondary structure prediction by RNAfold
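
To prepare a sequence for this kind of prediction, the translation initiation region can be cut out of the template and converted to RNA. The sketch below is a hypothetical helper, not part of RNAfold or the kit: the name `initiation_region_rna` and the default window (`upstream=20` nucleotides before the start codon, `n_codons=10` codons including the start codon) are assumptions, and the upstream length should be adjusted so that the SD sequence of the actual template is included.

```python
def initiation_region_rna(template, start_index, upstream=20, n_codons=10):
    """Return the RNA sequence around the start codon for structure prediction.

    template    : template DNA sequence (sense strand)
    start_index : 0-based position of the A of the ATG start codon
    upstream    : nucleotides to include before the start codon (assumed window)
    n_codons    : codons to include, counting the start codon (assumed window)
    """
    template = template.upper()
    if template[start_index:start_index + 3] != "ATG":
        raise ValueError("start_index does not point at an ATG start codon")

    begin = max(0, start_index - upstream)
    end = start_index + 3 * n_codons
    region = template[begin:end]

    # Transcribe the sense-strand DNA to mRNA.
    return region.replace("T", "U")
```

The returned string can be pasted into the RNAfold web server, and the prediction can be repeated after substituting synonymous codons to compare the structures.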



Amino acid sequence at N-terminus

The yield may be low if the second and third amino acids, immediately after the first methionine, are proline or glycine. Avoid proline and glycine at these positions if possible.
Figure 3 shows an example in which a protein synthesized from a template DNA encoding Gly immediately after the first Met is produced in a lower amount.





Figure 3. Effect of Gly right after the first Met on the amount of synthesis
A. Construction of template DNA
B. SDS-PAGE
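
A simple check for this case is to inspect the second and third codons of the ORF. The following sketch assumes the ORF starts with the start codon; the function name `n_terminal_pro_gly` is hypothetical.

```python
# All codons for proline (CCN) and glycine (GGN) in the standard genetic code.
PRO_GLY_CODONS = {"CCT", "CCC", "CCA", "CCG", "GGT", "GGC", "GGA", "GGG"}


def n_terminal_pro_gly(orf):
    """Report Pro or Gly at amino acid positions 2 and 3 of the ORF.

    Returns a list of (position, codon, residue) tuples; empty if none found.
    """
    orf = orf.upper().replace("U", "T")
    hits = []
    for position in (2, 3):  # 1-based amino acid positions after the first Met
        codon = orf[3 * (position - 1):3 * position]
        if codon in PRO_GLY_CODONS:
            residue = "Pro" if codon.startswith("CC") else "Gly"
            hits.append((position, codon, residue))
    return hits
```

An empty list means that neither position encodes proline or glycine.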



Sequences that cause frameshift

Nucleotide sequences that are likely to cause a frameshift during elongation, such as X/XXY/YYZ (slashes mark codon boundaries), should be replaced with synonymous codons. For example, the sequence “A/AAA/AAA”, encoding two consecutive lysine residues, should be replaced with “A/AAG/AAA” to avoid a frameshift.

Sharma V. et al. (2014) Nucleic Acids Res., vol.42, p.7210.


Sequences containing consecutive proline residues

Proteins with consecutive proline residues may be produced at a low yield. A translation factor called EF-P is known to be involved in elongation at consecutive proline residues in E. coli, but the kit (PUREfrex® 2.0/2.1) contains no EF-P and may therefore produce a lower-than-normal yield. EF-P is available as a supplement (#PFS052-0.5-EX).
Examples of the effect of EF-P on yield are shown in the following poster.
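
To see whether a template encodes consecutive proline residues, the ORF can be scanned for runs of proline codons (CCN). The sketch below is only an illustration: the function name `proline_runs` and the default `min_run=2` are assumptions, and a reported run only indicates a site where adding EF-P may be worth testing.

```python
def proline_runs(orf, min_run=2):
    """Find runs of consecutive proline codons (CCN) in an ORF.

    Returns a list of (first_position, length), where first_position is the
    1-based amino acid position of the first proline in the run.
    """
    orf = orf.upper().replace("U", "T")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]

    runs = []
    run_start = None
    # The empty string at the end acts as a sentinel that closes a final run.
    for i, codon in enumerate(codons + [""]):
        if len(codon) == 3 and codon.startswith("CC"):
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                runs.append((run_start + 1, i - run_start))
            run_start = None
    return runs
```

For example, an ORF encoding Met-Pro-Pro-Pro-Lys-Pro-Glu yields a single run of three prolines starting at position 2; the isolated proline at position 6 is not reported.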