Initially, enhancers and promoter-proximal
elements were thought to be distinct types of transcription-control
elements. However, as more enhancers and promoter-proximal elements
were analyzed, the distinctions between them became less clear.
For example, both types of element generally can stimulate transcription
even when inverted, and both types can be cell-type specific.
The general consensus now is that a spectrum of control regions
regulate transcription by RNA polymerase II. Enhancers that can
stimulate transcription from a promoter tens of thousands of base
pairs away (e.g., the SV40 enhancer) are at one extreme. Promoter-proximal
elements, such as the upstream elements controlling the HSV tk
gene, which lose their influence when moved an additional 15-20
base pairs further from the promoter, are at the other extreme.
Researchers have identified a large number of transcription-control
regions that can stimulate transcription from distances between
these two extremes (Fig. ET2).
Fig. 11-42 summarizes the locations
of transcription-control sequences for a hypothetical mammalian
gene. Transcription initiates at the cap site encoding the first
nucleotide of the first exon of an mRNA. For many genes, especially
those encoding abundantly expressed proteins, a TATA box located
25-30 base pairs upstream from the cap site directs RNA polymerase
II to the start site. Promoter-proximal elements roughly within
the first 200 base pairs upstream of the cap site stimulate transcription.
Enhancers, which strongly activate transcription, frequently in
a specific differentiated cell type, usually are 100-200 base
pairs long. Although enhancers often lie within a few kilobases
of the cap site, in some cases they lie much further upstream
or downstream from the cap site or within an intron. Some genes
are controlled by more than one enhancer region, as in the case
of the Drosophila even-skipped gene.
The S. cerevisiae genome contains regulatory elements called
upstream activating sequences (UASs), which function similarly
to enhancers and promoter-proximal elements in higher eukaryotes.
Most yeast genes contain only one UAS, which generally lies within
a few hundred base pairs of the cap site. In addition, yeast genes
contain a TATA box about 100 base pairs upstream from the transcription
start site (Figure 11-42b).
As in E. coli, the various eukaryotic transcription-control elements described in the previous section are binding sites for regulatory proteins. Binding of these proteins, called transcription factors, generally stimulates transcription, although eukaryotic regulatory proteins that repress transcription also have been identified. In this section, we discuss the identification, purification, and structures of proteins that function as transcription activators.
In yeast, Drosophila, and other genetically tractable eukaryotes, classical genetic studies have identified genes encoding transcription factors. However, in mammals and other vertebrates, which are less amenable to such genetic analysis, most transcription factors have been identified by biochemical purification.
Once a DNA regulatory element has been identified
by the kinds of mutational analyses described in the previous
section, proteins in an extract of cell nuclei can be assayed
by their ability to bind specifically to the identified sequence.
In this approach, DNA fragments containing the identified regulatory
sequence are used in an electrophoretic mobility shift assay or
DNase I footprinting assay to identify a putative transcription
factor as it is purified by column chromatography. A particular
type of affinity chromatography, called sequence-specific DNA
affinity chromatography, is a powerful technique for the final
purification step. Long DNA strands containing multiple copies
of the transcription-factor binding site are synthesized and hybridized.
These synthetic molecules then are coupled to a solid support
to create a sequence-specific affinity column. A partially purified
extract containing the desired transcription factor is applied
to the column in a low-salt buffer (100 mM KCl). Proteins that
do not bind to the specific binding site are washed off the column
with additional low-salt buffer. Proteins with low affinity for
the binding site are eluted with buffer containing 300 mM KCl.
Finally, highly purified transcription factor is eluted with buffer
containing 1 M KCl. The entire procedure for identifying DNA regulatory
elements and isolating transcription factors that bind to them
is summarized in Fig. 11-43. As a
final test, the ability of the isolated protein to stimulate transcription
of a template containing the corresponding binding site is assayed
in an in vitro transcription reaction.
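The stepwise salt elution described above can be pictured with a small Python sketch. It is a toy model rather than a protocol: the three KCl concentrations come from the text, but the `Protein` class, the `release_at_mm` values, and the protein names are hypothetical and chosen only for illustration. Each protein is assumed to leave the column at the first salt step that meets or exceeds the concentration at which it dissociates.

```python
from dataclasses import dataclass

# Salt steps (mM KCl) from the text: load/wash, intermediate elution, final elution.
SALT_STEPS_MM = (100, 300, 1000)


@dataclass(frozen=True)
class Protein:
    name: str
    # Hypothetical value: lowest KCl concentration (mM) at which the protein
    # dissociates from the column. Not a measured number.
    release_at_mm: int


def fractionate(proteins, salt_steps=SALT_STEPS_MM):
    """Assign each protein to the first salt step that releases it."""
    fractions = {step: [] for step in salt_steps}
    retained = []
    for protein in proteins:
        for step in salt_steps:
            if protein.release_at_mm <= step:
                fractions[step].append(protein.name)
                break
        else:
            retained.append(protein.name)  # still bound after the last step
    return fractions, retained


# Illustrative extract; names and release values are assumptions.
extract = [
    Protein("nonbinding contaminant", 50),
    Protein("low-affinity DNA binder", 250),
    Protein("sequence-specific factor", 800),
]
fractions, retained = fractionate(extract)
for step, names in fractions.items():
    print(f"{step} mM KCl: {names}")
```

In this simplified picture the sequence-specific factor appears only in the 1 M KCl fraction, which is why the final step yields highly purified protein. Real elution profiles are broader and overlap between fractions.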
Genetic Identification of Genes Encoding Transcription Factors
In yeast, genes encoding transcription factors
were first identified through classical genetic analysis. For
example, one of the yeast genes required for growth on galactose
is called GAL4. Incubation of wildtype yeast cells in galactose
medium results in more than a thousandfold increase in the concentration
of mRNAs encoding the enzymes catalyzing galactose metabolism
(Fig. ET3). This
activation of mRNA expression is not observed in gal4 mutants.
(In S. cerevisiae, wildtype genes are designated with capital
letters in italics, while recessive mutant alleles of the gene
are indicated with lowercase letters in italics. The encoded protein
is designated by the name of the gene in Roman type, with the
first letter capitalized, e.g., Gal4.) Directed mutagenesis
studies like those described in the previous section identified
UASs for the induced genes. Each of the UASs was found to contain
one or more copies of a related 17-bp sequence called UASGAL.
When a copy of UASGAL was cloned upstream
of a TATA box followed by a lacZ reporter gene, expression
of lacZ was activated in galactose medium in wildtype cells,
but not in gal4 mutants. This indicated that UASGAL
is a transcription-control element activated by the Gal4 protein
in galactose medium.
The GAL4 gene was isolated by complementation of a gal4
mutant with a library of wildtype yeast DNA. Using recombinant
DNA techniques, the Gal4 protein was expressed in E. coli
and found to bind to UASGAL. Thus, the Gal4
protein binds to UASGAL sequences and activates
transcription from a nearby promoter when cells are placed in
galactose medium.
Classical genetic studies in a number of other organisms including
Drosophila, the nematode C. elegans, and higher
plants have uncovered several genes encoding transcription factors.
For example, many mutations that interfere with normal Drosophila
development have been identified. One of these inactivates the ultrabithorax
(Ubx) gene, causing an extra pair of wings to develop from
the third thoracic segment. The wildtype Ubx gene was cloned
and sequenced, and the encoded protein expressed in large amounts
using recombinant DNA techniques. Transcription assays showed
that the Ubx protein functions as a transcription factor. The
remarkable change in phenotype observed in Ubx- mutants
indicates that Ubx protein influences transcription of a large
number of Drosophila genes.
A remarkable set of experiments with the yeast
Gal4 protein demonstrated that this transcription factor is composed
of separable functional domains: a DNA-binding domain,
which interacts with specific DNA sequences, and an activation
domain, which interacts with other proteins to stimulate transcription
from a nearby promoter. In these experiments, a series of gal4
deletion mutants were tested for their ability to activate transcription
of a reporter gene (lacZ) in yeast cells (Fig. 11-46);
binding of the encoded
mutant Gal4 proteins to the UASGAL sequence
also was assayed. Transcription activation was measured in cells
lacking the GAL4 gene, so that wildtype Gal4 protein would
not interfere with the analysis. A small deletion from the N-terminus
of Gal4 eliminated its ability to bind, whereas a series of mutant
proteins with deletions from the C-terminus extending all the
way to amino acid 74 retained the ability to bind specifically
to UASGAL (Figure 11-46b). These results demonstrate
that a domain present in the N-terminal 74 amino acids of Gal4
is capable of binding to this DNA sequence. A similar analysis
of the yeast transcription factor Gcn4, which regulates genes
required for the synthesis of many amino acids, showed that it
contains a different DNA-binding domain within its C-terminal
60 amino acids.
As expected, deletion of the N-terminal region of Gal4 eliminated
its ability to activate expression of the reporter gene, because
it could not bind to the UASGAL sequence. Deletion
of about 125 or more amino acids from the C-terminus of Gal4 was
required to completely eliminate its activation ability; these
deletions did not affect its DNA-binding activity. Surprisingly,
when the N-terminal DNA-binding domain of Gal4 was fused directly
to various C-terminal fragments, the resulting truncated proteins,
which lacked most of the internal portion of the protein, retained
the ability to stimulate expression of the reporter gene (see
Figure 11-46b, bottom). Thus a region of about 100 amino acids at
the C-terminus of Gal4 contains an activation domain that, when fused
to the N-terminal DNA-binding domain, is capable of stimulating transcription
from the reporter gene.
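The logic of this deletion analysis can be summarized in a short Python sketch. It rests on several assumptions that should be read as such. The full length of Gal4 (881 residues) is not given in the text. The activation region is drawn as the last 125 residues or so, which is only an approximation of the boundaries implied by the deletion results. The "small N-terminal deletion" is represented by an arbitrary construct beginning at residue 50. The rule used is deliberately crude: a construct binds UASGAL if it retains residues 1-74 intact, and it activates the reporter if it binds and retains any part of the activation region.

```python
FULL_LENGTH = 881        # assumed Gal4 length in residues (not stated in the text)
DNA_BINDING = (1, 74)    # N-terminal region sufficient for binding, per the text
# Approximate boundary: roughly the last 125 residues (an assumption).
ACTIVATION = (FULL_LENGTH - 124, FULL_LENGTH)


def contains(segments, region):
    """True if a single retained segment covers the whole region."""
    start, end = region
    return any(s <= start and end <= e for s, e in segments)


def overlaps(segments, region):
    """True if any retained segment shares at least one residue with the region."""
    start, end = region
    return any(s <= end and start <= e for s, e in segments)


def predict(segments):
    """Return (binds_UASGAL, activates_reporter) for a construct.

    segments: list of (first, last) residue ranges retained in the construct.
    """
    binds = contains(segments, DNA_BINDING)
    activates = binds and overlaps(segments, ACTIVATION)
    return binds, activates


# Illustrative constructs; exact endpoints are hypothetical.
constructs = {
    "wild type": [(1, 881)],
    "small N-terminal deletion": [(50, 881)],
    "C-terminal deletion to residue 74": [(1, 74)],
    "binding domain fused to C-terminal fragment": [(1, 74), (782, 881)],
}
for name, segments in constructs.items():
    print(name, predict(segments))
```

The last construct is the informative one: it lacks most of the internal portion of the protein, yet the model, like the experiment, scores it as both binding and activating.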
Similar experiments with Gcn4 indicated that it contains an activation
domain of about 20 amino acids near the middle of its sequence.
The wildtype Gcn4 protein contains a region between this activation
domain and its C-terminal DNA-binding domain that is extremely
sensitive to digestion by proteases. Such highly protease-sensitive
regions generally are relatively unstructured, flexible stretches
of polypeptide chain, which can readily bend into a conformation
that fits easily into the active site of a protease.
Further evidence for the existence of distinct activation domains
in Gal4 and Gcn4 came from experiments in which their activation
domains were fused to a DNA-binding domain from an entirely different
protein, the E. coli LexA repressor. LexA has an N-terminal
DNA-binding domain that binds specifically to lexA operator
sequences. In these studies, a reporter gene with an upstream
lexA operator was introduced into yeast cells. Expression
of the LexA DNA-binding domain in these yeast cells did not activate
expression of the reporter gene. But introduction of DNA constructs
containing the coding sequence for the LexA DNA-binding domain
ligated to the coding sequence for either the Gal4 or Gcn4 activation
domain led to expression of the reporter gene in yeast cells.
In this case, a fusion protein consisting of the DNA-binding domain
from one transcription factor and the activation domain from a
different factor was expressed in vivo and activated transcription.
Thus, entirely novel transcription factors composed of prokaryotic
and eukaryotic elements can be constructed.
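The domain-swap experiments amount to a simple rule, sketched below in Python: a reporter is expressed only when the protein carries a DNA-binding domain that recognizes the site upstream of the reporter and also carries an activation domain. The `Construct` class and `reporter_expressed` function are hypothetical names introduced for this illustration, and the rule ignores expression levels, protein stability, and every other quantitative detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Construct:
    name: str
    binding_site: Optional[str]       # DNA site recognized by the DNA-binding domain
    activation_domain: Optional[str]  # source of the activation domain, if any


def reporter_expressed(construct: Construct, upstream_site: str) -> bool:
    """Reporter is on only if the protein binds the upstream site and can activate."""
    binds = construct.binding_site == upstream_site
    return binds and construct.activation_domain is not None


REPORTER_SITE = "lexA operator"

# The three constructs described in the text.
constructs = [
    Construct("LexA DNA-binding domain alone", "lexA operator", None),
    Construct("LexA DNA-binding domain + Gal4 activation domain", "lexA operator", "Gal4"),
    Construct("LexA DNA-binding domain + Gcn4 activation domain", "lexA operator", "Gcn4"),
]
results = {c.name: reporter_expressed(c, REPORTER_SITE) for c in constructs}
for name, expressed in results.items():
    print(f"{name}: reporter {'on' if expressed else 'off'}")
```

Treating the two domains as independent fields mirrors the modular view of activators: either field can be exchanged without changing the rule that decides expression.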
Studies such as these have now been carried out with many eukaryotic
transcription factors. Activation domains in mammalian transcription
factors are frequently assayed by fusing them to the Gal4 DNA-binding
domain since mammalian cells do not contain an endogenous transcription
factor that binds to the UASGAL sequence. The
structural model of eukaryotic activators that has emerged from
these studies is a modular one in which one or more activation
domains are connected to a sequence-specific DNA-binding domain
through relatively flexible protein domains (Fig. 11-47). In some cases, amino acids included in the DNA-binding
domain also contribute to transcriptional activation. Activation
domains are thought to function through protein-protein interactions
with transcription factors bound at the promoter. The flexible
protein domains in activators, which connect the DNA-binding domains
to activation domains, may explain why alterations in the spacing
between control elements are so well tolerated in eukaryotic control
regions. When the DNA-binding domains of neighboring transcription
factors are shifted in their relative positions on the DNA, their
activation domains may still be able to interact because they
are attached to their DNA-binding domains through flexible protein
regions.