Lecture 2 - Protein overexpression in E. coli

In the first lecture, we dealt with inserting a piece of DNA into a plasmid and introducing the recombinant plasmid into bacteria, where the plasmid, together with the inserted foreign DNA, is replicated by the cell's machinery. We also briefly considered the special type of plasmid vectors called expression plasmids, which are designed so that a foreign gene can be placed under the control of a promoter that functions in E. coli. Under appropriate conditions, the foreign gene is transcribed and the resulting messenger RNA translated in the bacterial cell, producing the protein encoded by the gene. The plasmids used in the lab class (the pBB96 series) are of this type.

As discussed in Lecture 1, expression plasmids make it possible for us to obtain relatively large amounts of proteins that might otherwise be very difficult to purify. Proteins obtained by expression of cloned genes in bacteria can be used in a variety of ways. For example, they may be used for studying the biochemical function of the protein. Also, since the gene is cloned, it is relatively simple to mutate it, and obtain mutant protein, to see how changes in the gene can affect the function of the protein it encodes. Some other uses for overexpressed proteins (that is, proteins that are made in abnormally large amounts by the expression of the gene encoding them) are in raising antibodies, and in structural studies. In order to successfully set up the expression of a cloned gene of interest in bacterial cells, it is necessary to first consider the basic processes involved in the expression of genes.

What are the important features of gene expression in E. coli?

Gene expression in bacteria, as in all cells, involves:

a) the production of messenger RNA, by RNA polymerase copying the DNA template;

b) translation of the message into protein by the protein synthesis machinery.

In bacteria, these two steps are closely linked. Transcription starts when RNA polymerase binds to the promoter region of the gene and proceeds to copy the DNA sequence into mRNA. As the 5' end of the transcript (i.e., the mRNA) is made, ribosomes immediately attach to it and start translation, even before the entire message has been synthesized. The ribosomes move along the message, translating the mRNA as it is being made. Behind the ribosomes, the 5' end of the message is rapidly degraded. The entire process can be completed in a couple of minutes.
The important features of this process are:

 

1. RNA polymerase binding to the promoter region of the gene, followed by transcription to produce a messenger RNA. When the polymerase reaches a terminator sequence, transcription stops, and the mRNA is released from the DNA template. This mRNA contains a 5' flanking sequence, the actual coding sequence that will be translated, and a 3' flanking sequence. Because translation does not usually begin at the 5' end of the message, the mRNAs also carry signals that define the beginning and end of each encoded protein.

 

2. Initiation of protein synthesis (translation):
The signal that indicates the start site of protein synthesis is usually the AUG, or start codon, which specifies methionine. This site is recognized by the initiator tRNA, which carries a formylmethionine residue (this tRNA is different from the one that inserts methionines within the protein sequence).

Another important signal on the mRNA is a purine-rich region about 10 nucleotides upstream of the AUG. This sequence, called a Shine-Dalgarno sequence (SD sequence), contains a sequence complementary to the 3' end of 16S ribosomal RNA, which is part of the 30S ribosomal subunit. Because of this, this sequence is also called the Ribosome Binding Site, or RBS. The initiator tRNA will bring a formylmethionine residue to the AUG nearest to the RBS, to initiate translation.

 

3. Termination of protein synthesis:
The end of the protein coding sequence is indicated by the presence of a stop codon (UAA, UGA, or UAG). This signal is recognized by release factors, which detach the completed protein from the ribosome.
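The three features above can be sketched in a few lines of Python. This is only an illustration, not a gene-finding tool: the motif AGGAGG (a common Shine-Dalgarno consensus), the allowed spacing of up to 12 nucleotides, the function names, and the toy mRNA are all assumptions made for the example and are not taken from the lecture or from pBB96.

```python
# Toy model of translation initiation and termination in E. coli.
# Assumptions (hypothetical, for illustration only): the SD motif "AGGAGG",
# the max_gap spacing window, and the made-up mRNA sequence below.

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; "*" marks the stop codons UAA, UAG and UGA.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_start(mrna, sd="AGGAGG", max_gap=12):
    """Return the index of the first AUG shortly downstream of the SD motif."""
    sd_pos = mrna.find(sd)
    if sd_pos == -1:
        return None  # no ribosome binding site found
    search_from = sd_pos + len(sd)
    aug = mrna.find("AUG", search_from)
    if aug == -1 or aug - search_from > max_gap:
        return None  # no start codon close enough to the RBS
    return aug


def translate(mrna, start):
    """Translate codon by codon from `start` until a stop codon is reached."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break  # release factors recognise the stop codon
        protein.append(residue)
    return "".join(protein)


# 5' flank (contains an AUG that is NOT used), SD motif, spacer, coding region, 3' flank
mrna = "GGCAUGCAU" + "AGGAGG" + "ACAUCU" + "AUGAAACGCUAA" + "GGG"
start = find_start(mrna)
print(start, translate(mrna, start))  # prints: 21 MKR
```

Note that the AUG in the 5' flanking sequence is skipped: only the AUG that follows the ribosome binding site is used as the start codon, which is the point made in item 2.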

What features should an expression vector have?

To express a foreign gene in E. coli, then, the most basic requirements are to provide the gene with a promoter and a ribosome binding site (RBS). Many plasmid vectors are now available that contain a promoter, an RBS, and convenient cloning sites into which one can insert the gene of interest. The pBB96 plasmids we are using in the lab carry a promoter from the T5 coliphage. This promoter, which is recognized by E. coli RNA polymerase, directs the expression of the dhfr gene cloned into the plasmid. The plasmid also provides a convenient ribosome binding site. Although these are the minimum requirements for expression of the cloned gene, there are several refinements and improvements one can make. These include devices to make the purification of the overexpressed protein easier, as well as strategies to regulate the production of the protein in the bacteria. These are briefly described below.

Affinity tags
To facilitate the purification of the expressed protein, one may use affinity tags. These are encoded by DNA sequences that are fused to the coding sequence of the gene, either at the 5' end or the 3' end. These sequences are transcribed and translated together with the gene, resulting in the formation of what is known as a fusion protein. The advantage of making such a construct is that the gene of interest can be fused to sequences encoding a peptide or protein that can be readily purified using affinity chromatography. For example, proteins may be expressed from the plasmid pMAL c2 as fusion proteins with the maltose-binding protein, MBP. When the protein is expressed in bacteria, the cells can be lysed, and the resulting lysate passed over a column of amylose (a polymer containing many maltose units), which specifically binds the maltose-binding protein together with its fusion partner. Other proteins are washed away. Elution of the column with maltose then yields only the pure fusion protein. We use a variation on this theme in the pBB96 plasmids, where a DNA sequence encoding six histidine residues is added to the 5' end of the dhfr gene. When the gene is expressed in bacteria, the protein is made with six histidine residues attached to its N-terminus. The histidines serve as the affinity tag in this case, because they bind tightly to a nickel-NTA column. As in the previous example, a lysate of cells expressing the gene is passed over a nickel-NTA column, allowing the histidines, together with the attached DHFR protein, to bind. Contaminating proteins are washed off, and pure DHFR is eluted using imidazole.
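At the DNA level, adding an N-terminal histidine tag amounts to placing six histidine codons between the start codon and the rest of the coding sequence, without disturbing the reading frame. The sketch below illustrates this. The function name, the choice of CAC as the histidine codon (CAT also encodes histidine), the decision to keep a single ATG in front of the tag, and the toy open reading frame are assumptions for illustration; this is not the actual pBB96 or dhfr sequence.

```python
# Minimal sketch of building an N-terminal 6xHis fusion at the DNA level.
# Assumptions (hypothetical): codon choice CAC, a single ATG ahead of the tag,
# and a made-up 12-nucleotide ORF standing in for a real gene.

HIS_CODON = "CAC"  # histidine; CAT would serve equally well


def add_n_terminal_his_tag(orf, n_his=6):
    """Return the ORF with n_his histidine codons inserted after the start codon."""
    if not orf.startswith("ATG"):
        raise ValueError("ORF must begin with the start codon ATG")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 to stay in frame")
    # Start codon, then the tag, then the remainder of the original ORF.
    return "ATG" + HIS_CODON * n_his + orf[3:]


toy_orf = "ATGATTTCTTAA"  # encodes Met-Ile-Ser followed by a stop codon
fused = add_n_terminal_his_tag(toy_orf)
print(fused)            # ATGCACCACCACCACCACCACATTTCTTAA
print(len(fused) % 3)   # 0, so the reading frame is preserved
```

Because the tag is a whole number of codons, the downstream gene is still read in its original frame, which is what allows the tag and the protein to be made as one fusion protein.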

Regulation of gene expression
One important consideration in expressing a foreign gene in bacteria is that the foreign gene product may be toxic to E. coli. If this is the case, allowing the gene to be constantly transcribed and translated could kill the bacterial cells, which would defeat your purpose. Sometimes, even if the protein is not toxic to the bacteria, production of large quantities of the protein can lead to the formation of inclusion bodies, which are structures in the cell that contain aggregates of insoluble protein. This makes it difficult to recover the protein. For these reasons, it is good practice, in general, to arrange things so that one has control over the production of the protein. To do this one can use a variety of tricks, the commonest of which relies on a regulatory sequence from a bacterial gene, the lac operator, to control the expression of the foreign gene. The lac operator is a component of a system that controls lactose metabolism in E. coli. This system operates as follows: when bacteria are grown in the absence of lactose, they do not make beta-galactosidase, an enzyme that metabolizes lactose. If lactose is added to the medium, the cells very rapidly start to make the enzyme, by transcribing and translating the lacZ gene, which encodes beta-galactosidase. The way in which this control is achieved is described below:
The lac operator, a sequence just downstream of the site where RNA polymerase binds, is normally occupied by a tetrameric protein called the lac repressor, which is the product of the lacI gene. This is the state of affairs in the absence of lactose. As long as the repressor is bound to the operator site, RNA polymerase cannot bind the promoter, and lacZ cannot be transcribed. If lactose is now added, it binds to the repressor (strictly, the molecule that binds is allolactose, an isomer of lactose formed in the cell), causing a conformational change in the repressor so that it no longer binds the operator site. This frees up the promoter for RNA polymerase binding, leading to transcription of lacZ and production of beta-galactosidase.

 

In pBB96, we have inserted the lac operator sequence next to the T5 promoter, to control the expression of the dhfr gene. Instead of lactose, we use a chemical analog, isopropylthiogalactoside (IPTG), which is not metabolized by E. coli, as an inducer. This inducer binds to the lac repressor, just as lactose would, and turns on the transcription of dhfr. Where does the lac repressor come from? Normal E. coli cells make some lac repressor, which is encoded by the bacterial lacI gene. In our experiments we use a strain of E. coli in which the lacI gene has a mutant promoter that leads to the production of about ten times as much lac repressor as usual. This mutant lacI gene is called lacIq. Strains bearing the lacIq mutation are ideal for expressing genes under lac operator control, because they make a lot of repressor, thus ensuring that the operator site remains occupied by the repressor until the inducer is added. For us, this means that the dhfr gene will remain turned off until we choose to turn it on by adding IPTG.
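Why more repressor gives tighter control can be illustrated with a very simple equilibrium-binding model, in which the fraction of time the operator is occupied is R / (R + K) for a repressor level R and a dissociation constant K. This is a deliberately crude sketch: the function names, the arbitrary units, the value K = 1, the repressor levels 10 and 100 (chosen only to reflect the tenfold difference mentioned above), and the assumption that IPTG inactivates all of the repressor are illustrative choices, not measured values.

```python
# Toy equilibrium model of lac operator control.
# Assumptions (hypothetical): arbitrary units, kd = 1, wild-type repressor = 10,
# lacIq repressor = 100 (tenfold more), and IPTG inactivating all repressor.


def operator_occupancy(active_repressor, kd):
    """Fraction of time the operator is bound by active repressor."""
    return active_repressor / (active_repressor + kd)


def relative_expression(repressor, kd, iptg_present):
    """Expression relative to the fully unrepressed promoter (1.0 = fully on)."""
    active = 0.0 if iptg_present else repressor
    return 1.0 - operator_occupancy(active, kd)


KD = 1.0
for strain, repressor in [("wild-type lacI", 10.0), ("lacIq", 100.0)]:
    off = relative_expression(repressor, KD, iptg_present=False)
    on = relative_expression(repressor, KD, iptg_present=True)
    print(f"{strain}: uninduced {off:.3f}, induced {on:.3f}")
```

In this model the uninduced ("leaky") expression is roughly tenfold lower in the lacIq strain, while the induced level is the same in both, which is the behaviour one wants when the gene product may be toxic.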

There are several other ways in which transcription of the foreign gene may be controlled. Of these, a popular choice is one where the gene to be expressed is under the control of a T7 phage promoter. This promoter is not recognised by E. coli RNA polymerase, and must instead be transcribed by the phage polymerase. This is achieved by transforming the plasmid into a special E. coli strain that has the gene for the phage RNA polymerase integrated into its genome. In this case, as long as no phage RNA polymerase is produced, there will be no expression of the cloned gene.

Other considerations


No single strategy will work for all the different proteins that one might want to express in E. coli. Each gene is different, and the specific problems that you might encounter will vary. Some proteins may be harmful to the cell when expressed at high levels, so fine-tuning the regulation of expression may be necessary.

In addition to the strategies outlined, expression may be induced at lowered temperatures, or at different levels of inducer. Sometimes, it may even be desirable to switch to a weaker promoter. For some proteins, the problem may be that although they are expressed, they are degraded easily by the action of proteases. In such cases, a strain of E. coli deficient in proteases might be tried as a host strain. The problem of stabilizing an expressed protein is also sometimes solved by expressing it as a fusion with another protein. Other ways to get around this problem include adding a signal sequence to the protein, causing it to be secreted into the periplasmic space. This can also, in some cases, make it easier to purify the protein.
