Gene Expression Regulation Notes
Bacteria must (and do) respond rapidly to changes in their environment. Since all cellular information is in DNA and virtually all cellular tools are in the form of protein, control of when RNA is made from DNA to make protein is important for cells to be able to respond to environmental changes. The term 'gene expression' is used to describe the synthesis of mRNA followed by processing and then translation to protein. Each of the individual steps in gene expression can be controlled, depending on conditions, allowing the cell to 'fine-tune' its response. The first level of control is transcription - synthesis of RNA.

As you have seen, sugar metabolism is important for cells. For E. coli, as for most cells, glucose is the most readily metabolized sugar, but glucose is not always available. Consequently, E. coli must be able to respond appropriately when sugars besides glucose are present. Lactose is a disaccharide composed of glucose and galactose. Genes for lactose utilization are activated once E. coli cells run out of glucose and sense the presence of lactose or a similar compound. Cells respond to changes in the environment by differentially controlling genes. In bacteria, gene control often exists at the level of transcriptional regulation of a unit known as the operon. An operon is a collection of linked genes under common, coordinate transcriptional control. Operons generally contain three important features (proposed originally by Jacob and Monod). First, they contain one or more genes called regulator gene(s) that help control the expression of the other genes in the operon. Second, operons contain specific control sequences in their DNA (called operator sequences) that are bound by the protein products of the regulator gene(s). Third, operons have the actual genes (called structural genes) whose transcription is being controlled. The lactose operon (more commonly called the Lac operon) of E. coli is the most intensely studied genetic regulatory system. (Note that operons are characteristic of prokaryotes. In eukaryotes, genes are generally regulated separately, not in clusters as they are in operons.)

The Lac operon consists of three linked structural genes that encode enzymes of lactose utilization, plus adjacent regulatory sites. The three structural genes--z, y, and a--encode beta-galactosidase, beta-galactoside permease - also called permease - (a transport protein), and thiogalactoside transacetylase (an enzyme with a relatively obscure function), respectively. Beta-galactosidase is an enzyme that breaks down the more complicated sugar lactose into two simpler sugars, glucose and galactose. Permease transports lactose into the cell. Transacetylase has a non-essential enzymatic activity - not discussed here. The concentration of beta-galactosidase is very low in cells normally (basal level); but when lactose is the sole carbon source, the concentration of the enzyme is elevated markedly (a process called induction). After all the lactose is metabolized, beta-galactosidase returns to a very low level in the cell.

Transcription of the lac operon commences at a promoter (lacP) to the left of lacZ and transcribes a 5,200 nucleotide messenger RNA molecule (mRNA), ending at a terminator beyond lacA. Transcription yields a single polycistronic messenger RNA (that is, an RNA containing all three genes). (Note - the term "cistron" is used here to indicate a region of a genome that encodes one polypeptide chain. A polycistronic message, therefore, contains coding for more than one polypeptide chain.)

In the presence of an inducer (a molecule that activates transcription of the operon), all three enzymes (z, y, and a) accumulate simultaneously, but to different levels. Lactose itself ultimately leads to induction of the lactose operon. In the laboratory one can use a synthetic inducer to activate the operon in cells. One such synthetic inducer is isopropylthiogalactoside (IPTG), which induces the lactose operon but is not cleaved by beta-galactosidase. IPTG is often paired with a non-inducing substrate, called Xgal, which turns BLUE when hydrolyzed by beta-galactosidase and so gives an indication that the operon is active. Thus, E. coli cells turn blue when treated with IPTG and Xgal.

Lac Repressor Protein

The lac operon exhibits both positive and negative transcriptional regulation. Negative regulation is the easiest to understand. The product of the i gene is a protein called the lac repressor which, in its active form, binds to the lac operator and blocks RNA polymerase from binding, thereby blocking transcription. The lac repressor has the ability to bind allolactose - a byproduct of the action of beta-galactosidase on lactose. When the repressor binds allolactose, the repressor is INACTIVATED and cannot bind the operator, allowing the operon to be activated. When lactose (and thus allolactose) is absent, the lac repressor is ACTIVE and turns OFF the operon. The lac repressor gene is transcribed from its own promoter (separate from the operon). Control by the lac repressor is exceedingly efficient, particularly in view of the minute amount of repressor present in an E. coli cell: only about 10 molecules of repressor tetramer per cell.

The repressor works in a very simple way. If it is not bound to an inducer, it binds very tightly to the lac operator, so that RNA polymerase cannot bind to the promoter or, if it does bind, cannot transcribe the operon. Binding of IPTG, allolactose, or some other inducer by the lac repressor, however, inactivates it by vastly decreasing its affinity for DNA. The repressor then no longer occupies the operator, which permits RNA polymerase to bind to lacP and to initiate transcription.

Inactivating the repressor in this way can help to stimulate transcription of z, y, and a. Thus, the introduction of an inducer activates synthesis of the gene products involved in its catabolism by removing a barrier to their transcription. This mode of regulation is negative, because the active regulatory element (the repressor) is an inhibitor of transcription. Positive control is also a factor and is described below.

The E. coli genome has a single region with a sequence matching the binding site of the lac repressor. Thus, the lac repressor only affects the transcription of one operon - the lac operon. By contrast, there are multiple binding sites for another E. coli repressor called pur. Pur differs from the lac repressor in that it binds to DNA only when it has bound a small molecule (a base, actually). Because it has multiple binding sites, pur is involved in regulation of many operons.

Positive Activation of the Lac Operon

The promoter of the lactose operon differs from ideal -35 and -10 sequences and does not function very efficiently by itself to stimulate transcription because RNA polymerase does not bind it well.


[Figure: -35 and -10 sequences of an ideal promoter compared with those of the lactose operon promoter]

A positive activator protein known as CRP (Cyclic AMP Receptor Protein) (also called CAP by your book) binds to a specific sequence adjacent to the promoter and assists RNA polymerase in binding to the promoter. In order to function to stimulate RNA polymerase binding to the lac promoter, CRP must bind to cAMP. Some mutant cells with promoter sequences more like the "ideal" promoter no longer require CRP assistance to stimulate lac operon transcription.
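
One way to make "differs from ideal" concrete is to count mismatches against the ideal -35 and -10 hexamers. The sketch below is only an illustration: the hexamer sequences used (TTGACA and TATAAT for the ideal promoter, TTTACA and TATGTT for the wild-type lac promoter) are commonly cited values that are not listed in these notes, so treat them as assumptions. The function names are hypothetical.

```python
# Ideal ("consensus") hexamers and lac promoter hexamers.
# ASSUMPTION: these sequences are supplied for illustration; the notes do not list them.
IDEAL = {"-35": "TTGACA", "-10": "TATAAT"}
LAC_P = {"-35": "TTTACA", "-10": "TATGTT"}


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def promoter_distance(promoter, ideal=IDEAL):
    """Mismatch count for each element (-35 and -10) relative to the ideal promoter."""
    return {element: mismatches(promoter[element], ideal[element]) for element in ideal}


if __name__ == "__main__":
    # Report how far the lac promoter is from the ideal at each element.
    print(promoter_distance(LAC_P))
```

A promoter with fewer mismatches would be expected to bind RNA polymerase better on its own, which is consistent with the mutants described above that no longer need CRP.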

When it binds cAMP, CRP undergoes a conformational change. The change greatly increases its affinity for certain DNA sites, including a site in the lac operon adjacent to the RNA polymerase binding site. This binding facilitates transcription of the lac operon by stimulating the binding of RNA polymerase to the promoter of the operon by a factor of about 50.

The cAMP--CRP complex activates several different gene systems in E. coli, all of them involved with energy generation. They include operons for utilization of other sugars, including galactose, maltose, arabinose, and sorbitol, and several amino acids. Among the operons that have been analyzed, the DNA binding site of the cAMP-activated dimer varies considerably with respect to the transcriptional start point, suggesting that regulatory mechanisms involving this protein are complex.

The structure of the CRP-cAMP-DNA complex, as revealed by x-ray crystallography, shows how the protein binds to DNA. CRP has a DNA-binding structural element called helix-turn-helix (HTH). It is found in several DNA-binding regulatory proteins, suggesting common evolutionary origins for this family of proteins. Analysis of the DNA - protein complex shows that CRP induces DNA to bend quite sharply when it binds.

Another protein structure involved in DNA binding is that of beta strands, as contained in the methionine repressor.

Catabolite Repression

E. coli grown on glucose do not metabolize lactose or other sugars until all of the glucose in the medium is exhausted. This is called catabolite repression. It once was thought that catabolite repression in E. coli was mediated by varying levels of cAMP which, in turn, affected the binding of CRP to its binding site in the lac operon. However, cAMP levels are the same in cells grown on glucose alone or on lactose alone.

Since there are always adequate concentrations of cAMP in E. coli to bind and activate CRP, CRP is bound to its binding site at all times (even during repression by the lac repressor). Thus, when lactose is available, the overall control of the operon is determined by the presence or absence of glucose.

1. When glucose is present, it represses the lac operon of E. coli by prohibiting the entry of lactose through any permease molecules that happen to be present. Consequently, in the presence of glucose there is never enough lactose inside the uninduced cells to be transformed into inducer (allolactose) by the very few molecules of beta-galactosidase, so the repressor remains active and no transcription of the operon results.

2. In the absence of glucose, lactose enters the cell and is converted by the very few beta-galactosidase molecules in the cell to allolactose, the natural inducer of the lac operon. Allolactose binds to the repressor and changes the shape of the repressor so that it no longer can bind to the operator. This removes the repressor from the operator and transcription of the operon is favored.
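
The two cases above can be sketched as a small on/off model. This is a simplification under stated assumptions: basal transcription is ignored, every step is treated as all-or-nothing, and the function name lac_operon_state is hypothetical.

```python
def lac_operon_state(glucose, lactose):
    """Follow the two numbered cases above for one growth condition.

    ASSUMPTION: every step is treated as a simple on/off switch.
    """
    # Case 1: glucose prevents lactose from entering through permease.
    lactose_inside = lactose and not glucose
    # The few basal beta-galactosidase molecules convert internal lactose
    # to allolactose, the natural inducer.
    allolactose_present = lactose_inside
    # The repressor stays on the operator unless allolactose binds it.
    repressor_on_operator = not allolactose_present
    return {
        "lactose_inside": lactose_inside,
        "repressor_on_operator": repressor_on_operator,
        "operon_transcribed": not repressor_on_operator,
    }


if __name__ == "__main__":
    # Print the outcome for all four combinations of glucose and lactose.
    for glucose in (True, False):
        for lactose in (True, False):
            state = lac_operon_state(glucose, lactose)
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> "
                  f"transcribed={state['operon_transcribed']}")
```

In this model only one of the four combinations (lactose present, glucose absent) leaves the operator free of repressor.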

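In summary, when lactose is available, the operon is transcribed only in the absence of glucose: glucose keeps lactose out of the cell, so the repressor stays active, whereas without glucose allolactose forms and inactivates the repressor.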

Eukaryotic Gene Regulation

Gene regulation in eukaryotic cells is considerably more complex than that found in prokaryotic cells. This is due to the larger genomes, larger number of genes, and the phenomena associated with multicellularity, including cellular differentiation. Adding to this complexity in eukaryotes is the presence of histone proteins covering DNA sequences, creating chromatin. Five major histones are found in chromatin. Four of them, H2A, H2B, H3, and H4, associate with each other. The fifth is called H1. A repeating structure of about 200 bp of DNA associated with an octamer composed of two copies each of H2A, H2B, H3, and H4 is called a nucleosome and is an important repeating unit of chromatin. A smaller complex of about 145 bp of DNA bound to the histone octamer is called a nucleosome core particle. The DNA linking adjacent core particles is called linker DNA. Histone H1 binds in part to linker DNA. The four histones of the core particle have considerable structural similarity. All of the histones are basic proteins, facilitating their interaction with the negatively charged DNA. Tails of the histones are rich in lysine and arginine and are covalently modified by enzymes in the cell as a means of affecting their affinity for DNA and controlling gene expression. Histone amino acid sequences are remarkably similar from yeast to humans.
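
The numbers above lend themselves to a little arithmetic. The sketch below uses only the approximate figures from this paragraph (200 bp per nucleosome, 145 bp per core particle, eight core histones per octamer); the function names and the 1,000,000 bp example are hypothetical, and real repeat lengths vary.

```python
NUCLEOSOME_REPEAT_BP = 200   # approximate DNA per nucleosome (from the notes)
CORE_PARTICLE_BP = 145       # approximate DNA on the histone octamer (from the notes)
HISTONES_PER_OCTAMER = 8     # two copies each of H2A, H2B, H3, and H4


def linker_length_bp():
    """Approximate linker DNA per nucleosome: repeat length minus core DNA."""
    return NUCLEOSOME_REPEAT_BP - CORE_PARTICLE_BP


def nucleosome_count(dna_length_bp):
    """Approximate number of whole nucleosomes that fit on a stretch of DNA."""
    return dna_length_bp // NUCLEOSOME_REPEAT_BP


def core_histone_count(dna_length_bp):
    """Approximate number of core histone molecules needed for that DNA."""
    return nucleosome_count(dna_length_bp) * HISTONES_PER_OCTAMER


if __name__ == "__main__":
    example_bp = 1_000_000  # hypothetical stretch of DNA
    print(linker_length_bp(), nucleosome_count(example_bp), core_histone_count(example_bp))
```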

Wrapping of DNA around the nucleosome core shrinks it in size, but not sufficiently by itself to account for the compacting of DNA actually seen in cells. Nucleosomes themselves are arranged in helical arrays for a higher order of packing. Winding of the DNA around the histone core in a left-handed fashion stores negative supercoils. When the nucleosome is straightened out, it unwinds, allowing the strands to separate and transcription to occur.

Chromatin Remodeling

Chromatin consists of DNA wrapped around histones and these ordered in a higher order structure. In this configuration, the DNA sequences are not readily available for important processes, such as transcription. In order to make DNA sequences accessible to transcriptional apparatuses, the chromatin must be 'remodeled.' Remodeling of chromatin appears to be closely linked to genes that are to be transcribed. It appears to be cell type specific - occurring near regions of genes in tissues where those genes are transcribed and not occurring near the same genes in tissues where those genes are not normally transcribed. Remodeling of DNA appears to make stretches of DNA 'open' near transcriptional start sites. This can be readily detected by the susceptibility of these sequences to DNase I treatment. Open DNA sequences will be attacked by DNase I, but DNA sequences covered up in chromatin will not. DNase I sensitivity then is a measure of chromatin remodeling.


As discussed previously, enhancer sequences are regions of DNA bound by proteins that affect the transcription of genes nearby. Enhancer proteins are specific to certain tissues where transcription of specific genes is needed. Enhancer proteins may help remodel chromatin structure by binding to DNA and exposing it, which is necessary for providing access to transcriptional machinery. Moving enhancer sequences in front of genes may alter their transcriptional behavior. For example, taking an enhancer that activates genes in muscle cells and placing it experimentally in front of another gene that is not normally active in muscle cells causes that gene to become active in muscle cells.

DNA Modifications

In eukaryotic cells, methylation of the cytosine in CpG (p refers to phosphate) sequences in DNA provides another way of controlling gene expression. For example, if one examines the CpG sequences of globin genes in tissues where globin is actively made, one finds that methylation of the CpG sequences surrounding the gene is very much reduced compared to the same sequences in tissues where globin is not made. Note that methylation of cytosine provides yet another mutational opportunity in cells because deamination of 5-methylcytosine (which can occur spontaneously) creates a thymine.
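
A short sketch can illustrate both ideas in this paragraph: locating CpG sequences and the C-to-T change that follows deamination of a methylated cytosine. The example sequence and the function names are hypothetical, chosen only for illustration.

```python
def cpg_positions(seq):
    """Return the 0-based index of every C that is immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate(seq, methylated_positions):
    """Model deamination of 5-methylcytosine: each listed C becomes a T."""
    bases = list(seq.upper())
    for i in methylated_positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is not a cytosine")
        bases[i] = "T"
    return "".join(bases)


if __name__ == "__main__":
    example = "ATCGGCGTACG"           # hypothetical sequence
    print(cpg_positions(example))     # CpG sites before any mutation
    mutated = deaminate(example, {5})  # assume the C at index 5 was methylated
    print(mutated, cpg_positions(mutated))
```

Each deamination event removes a CpG site and leaves a TpG in its place, which is why the change counts as a mutation rather than a reversible mark.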

Transcriptional Activation and Repression

Protein-protein interactions play important roles in controlling gene expression. In eukaryotic cells these interactions are important because individual proteins rarely control gene expression by themselves. Instead, they usually work as part of a complex of many proteins. Indeed a given protein may have one role (say activation of gene expression) in one complex and an opposite role in another one. This varying level of control is called combinatorial control and is an important means of regulation for differentiated cells.

Steroid Regulation

Steroids such as estrogen (and other chemicals) can affect patterns of gene expression. Estrogens are hydrophobic molecules and can readily move across cell membranes. Inside cells, they bind to receptor proteins called estrogen receptors. After an estrogen receptor binds estrogen, it modifies gene expression by binding to DNA control regions in front of specific genes. There are about 50 nuclear hormone receptors in the human genome. These receptors have two regions - a hormone binding region and a DNA binding region. The hormone receptors use cysteine residues in the DNA binding region to recognize and bind to specific DNA sequences. It is in this way that specific genes can be targeted. The cysteine residues bind zinc ions and form structures called zinc fingers. Binding of ligands (estrogen, in this case) induces structural changes in the receptor and these ultimately must lead to changes in its ability to regulate gene expression. Note that the changes in ability to regulate gene expression are NOT due to changes in ability to bind DNA.

Coactivator Recruitment

Coactivator proteins, such as SRC-1, GRIP-1 and NcoA-1 (these are called part of the p160 family) bind to nuclear receptor proteins after the receptor proteins bind a ligand, such as estrogen. Binding of estrogen induces changes in the receptor protein structure that favor binding of the coactivator in a process called recruitment that results in activation of transcription. Note that some nuclear receptor proteins can act as repressors of transcription when they are not bound by ligand. In this case, the unbound nuclear receptor binds to corepressor proteins.

Drug Targeting of Receptors

Molecules that bind to a receptor and stimulate it to act are called agonists. Estrogens binding to estrogen receptors are agonists. Molecules that bind to receptors, but do not stimulate them to act are called antagonists. Antagonists act like competitive inhibitors of enzymes. One good example of an antagonist of the estrogen receptor is tamoxifen, which binds in the same place as estradiol. Tamoxifen appears to work by binding the site estradiol normally binds and then a portion of the tamoxifen molecule extends across the receptor and prevents coactivators from binding. Estradiol binding does not inhibit coactivator binding.

Chromatin Structure Modification

Nuclear receptors, such as the estrogen receptor, act by recruiting coactivator or corepressor molecules to the chromatin. Coactivators and corepressors appear to act by covalently modifying the amino-terminal tails of histones and other proteins. Some p160 coactivators (and enzymes they recruit) catalyze transfer of acetyl groups from acetyl-CoA to specific lysine side chains in the amino portions of histones. Enzymes catalyzing such reactions are called histone acetyltransferases. Addition of an acetyl group to lysine converts it from having a positive charge to a zero charge, reducing the affinity of that portion of the histone for DNA and loosening the hold of the histone complex on DNA. As a result, DNA in this region becomes exposed to transcriptional proteins. Many proteins involved in regulating eukaryotic transcription contain a domain called a bromodomain that recognizes the acetyllysine residue. Two large complexes are involved in this. One is a complex of 10 proteins that binds TBP. These proteins are called TAFs. TAFII250 is one such prominent protein. There are five steps in eukaryotic gene regulation:

Remodeling engines use ATP hydrolysis energy to move nucleosomes along DNA and thus change the structure of the chromatin.

Histone deacetylases contribute to the opposite effect - repression of transcription. These enzymes remove acetyl groups from histones and favor re-formation of the original, closed chromatin structure and thus make access of RNA polymerase to transcriptional sites more difficult. Histone deacetylases thus allow cells to turn off genes when necessary and, together with histone acetyltransferases, allow for a thorough means of control of gene expression.
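
The charge argument behind acetylation and deacetylation can be sketched with simple counting. The tail sequence below is an illustrative assumption, not taken from these notes, and the model counts only lysine and arginine side chains (one positive charge each, with acetylated lysine counted as zero); the function name tail_charge is hypothetical.

```python
# ASSUMPTION: an illustrative lysine- and arginine-rich histone tail sequence.
TAIL = "SGRGKGGKGLGKGGAKRHRK"


def tail_charge(tail, acetylated=frozenset()):
    """Positive charge from K and R residues; acetylated lysines count as zero.

    Positions in `acetylated` are 1-based and must point at lysines.
    """
    for pos in acetylated:
        if tail[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine")
    charge = 0
    for pos, residue in enumerate(tail, start=1):
        if residue == "R":
            charge += 1
        elif residue == "K" and pos not in acetylated:
            charge += 1
    return charge


if __name__ == "__main__":
    print(tail_charge(TAIL))                  # unmodified tail
    print(tail_charge(TAIL, {5, 8, 12, 16}))  # after acetylation of four lysines
```

Acetyltransferases move the tail toward the lower number and deacetylases move it back, which matches the loosening and tightening of the histone-DNA interaction described above.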

Ligand Binding and the Phosphorylation Cascade

As you have seen before, binding of epinephrine to a cell surface receptor can activate G proteins and stimulate cAMP production and subsequently stimulate protein kinase A (PKA). PKA can activate glycogen breakdown and inhibit glycogen synthesis, among other effects. Besides these pathways, PKA phosphorylates the cyclic AMP-response element binding protein (CREB). CREB is a transcription factor that, when phosphorylated, activates transcription of a variety of genes by interacting with a coactivator protein called CBP (CREB binding protein). The domain structure of CBP is interesting. It includes a domain called KIX, which stands for Kinase Inducible Interaction (it binds the phosphorylated region of CREB), one bromodomain for binding acetylated lysines in chromatin, and two TAZ domains. TAZ domains are zinc binding regions of a protein that facilitate binding of CBP to other proteins.

Posttranscriptional Gene Regulation

Gene expression is also regulated in cells by mechanisms other than the transcriptional controls described so far. In prokaryotes, the coordinated processes of transcription and translation are exploited for regulation by a process called attenuation. Attenuation occurs in several amino acid operons (remember that operons are prokaryotic collections of genes under common transcriptional control). Attenuation relies on sequences in the 5' end of the mRNA of these operons. One such operon regulated by attenuation is the tryptophan (trp) operon. This operon contains the genes necessary for the bacterium to synthesize tryptophan. Controls on it allow E. coli to express the genes only when necessary.

The critical feature of attenuation is that translation begins very soon after transcription starts in prokaryotes. The short peptide coded for at the beginning of the sequence contains two tryptophans, and two regions of the sequence, in the region called the attenuator, are capable of forming stem structures by base pairing. When tryptophan is abundant, the ribosome moves quickly across the region encoding the tryptophan residues and doesn't stall. As a result of this quick movement, the ribosome covers the red region of the mRNA (the first region capable of forming a stem) and prevents it from forming a stem. Instead, the blue region forms a stem and causes the RNA/DNA duplex to be unstable (remember how factor-independent transcriptional termination occurs), causing the RNA polymerase to come "loose" from the DNA, releasing the mRNA in the process before further transcription can occur.

On the other hand, when tryptophan is in short supply, the ribosome translates the mRNA more slowly, stalling at the tryptophan codons. In this case, the ribosome does NOT cover the red region of the mRNA, allowing it to form an alternative stem, which prevents termination from occurring, so transcription of the operon continues. This scheme of using multiple codons for one amino acid in the leader sequence of an mRNA is a common regulatory mechanism for amino acid operons in E. coli.
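
The chain of cause and effect in attenuation can be sketched as a small decision function. This is a deliberately simplified on/off model; the function name trp_attenuation and the dictionary keys are hypothetical labels, not terms from the notes.

```python
def trp_attenuation(tryptophan_abundant):
    """Trace the attenuation decision for the trp operon leader.

    ASSUMPTION: each step is treated as all-or-nothing.
    """
    # With plenty of tryptophan the ribosome reads the two Trp codons
    # without stalling; when tryptophan is scarce it stalls there.
    ribosome_stalls = not tryptophan_abundant
    # A fast-moving ribosome covers the first stem-forming region.
    first_region_covered = not ribosome_stalls
    # If the first region is covered, the terminating stem forms instead;
    # otherwise the alternative stem forms and termination is prevented.
    terminator_stem_forms = first_region_covered
    return {
        "ribosome_stalls": ribosome_stalls,
        "terminator_stem_forms": terminator_stem_forms,
        "operon_transcribed": not terminator_stem_forms,
    }


if __name__ == "__main__":
    for abundant in (True, False):
        print(abundant, trp_attenuation(abundant))
```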

Iron Metabolism in Eukaryotes

Iron is a nutrient that is important for hemoglobin, cytochromes, and other proteins in eukaryotes. To understand iron movement in the body of animals, one must know of several proteins. They include transferrin (carries iron in blood serum), transferrin receptor (membrane protein that binds transferrin), and ferritin (iron storage protein of kidney and liver). When iron is low in the body, transferrin receptor synthesis increases and little or no new ferritin is synthesized. The levels of transcription of these genes do NOT change under these different conditions. Instead, the amount of each protein is regulated at the level of translation.

Ferritin mRNA can form a stem-loop structure. The stem of this structure (called an Iron Response Element - IRE) is the binding target of a protein called IRE-BP (Iron Response Element Binding Protein). When iron is abundant, IRE-BP binds iron and cannot bind the stem. Without the IRE-BP on the IRE, the ferritin mRNA is translated, but when the IRE-BP is bound to the IRE (low iron), translation of ferritin is reduced.

The transferrin-receptor mRNA also has IRE regions that can serve as binding sites for IRE-BP. These regions are AFTER the coding region of the protein, in the untranslated 3' end of the mRNA. Under low iron conditions, IRE-BP binds to these regions and protects the mRNA from nucleolytic degradation. When iron levels are high, IRE-BP does not bind the mRNA, which is then rapidly degraded and thus not translated into transferrin-receptor.
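
The opposite effects of IRE-BP on the two mRNAs can be summarized in a short sketch. It is an on/off simplification of the description above; the function name iron_response and the dictionary keys are hypothetical.

```python
def iron_response(iron_abundant):
    """Summarize how IRE-BP affects ferritin and transferrin receptor.

    ASSUMPTION: binding and its consequences are treated as all-or-nothing.
    """
    # IRE-BP that has bound iron cannot bind the IRE stem-loops.
    ire_bp_binds_mrna = not iron_abundant
    return {
        "ire_bp_binds_mrna": ire_bp_binds_mrna,
        # Bound IRE-BP reduces translation of ferritin mRNA.
        "ferritin_translated": not ire_bp_binds_mrna,
        # Bound IRE-BP protects transferrin-receptor mRNA from degradation.
        "transferrin_receptor_mrna_stable": ire_bp_binds_mrna,
    }


if __name__ == "__main__":
    for abundant in (True, False):
        print(abundant, iron_response(abundant))
```

The same protein binding the same kind of element therefore gives opposite outcomes, depending on whether the IRE sits before or after the coding region.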

An interesting side note about IRE-BP is that it turns out to be an aconitase (citric acid cycle enzyme) with an iron-sulfur center.