Regulation of Eukaryotic Transcription I

Most eukaryotic genes are regulated by multiple transcription-control elements

Initially, enhancers and promoter-proximal elements were thought to be distinct types of transcription-control elements. However, as more enhancers and promoter-proximal elements were analyzed, the distinctions between them became less clear. For example, both types of element generally can stimulate transcription even when inverted, and both types can be cell-type specific.

The general consensus now is that a spectrum of control regions regulates transcription by RNA polymerase II. Enhancers that can stimulate transcription from a promoter tens of thousands of base pairs away (e.g., the SV40 enhancer) are at one extreme. Promoter-proximal elements, such as the upstream elements controlling the HSV tk gene, which lose their influence when moved an additional 15-20 base pairs farther from the promoter, are at the other extreme. Researchers have identified a large number of transcription-control regions that can stimulate transcription from distances between these two extremes (Fig. ET2).

Fig. 11-42 summarizes the locations of transcription-control sequences for a hypothetical mammalian gene. Transcription initiates at the cap site, which encodes the first nucleotide of the first exon of an mRNA. For many genes, especially those encoding abundantly expressed proteins, a TATA box located 25-30 base pairs upstream from the cap site directs RNA polymerase II to the start site. Promoter-proximal elements, which lie roughly within the first 200 base pairs upstream of the cap site, stimulate transcription. Enhancers, which strongly activate transcription, frequently in a specific differentiated cell type, usually are 100-200 base pairs long. Although enhancers often lie within a few kilobases of the cap site, in some cases they lie much farther upstream or downstream from the cap site or within an intron. Some genes are controlled by more than one enhancer region, as in the case of the Drosophila even-skipped gene.

The S. cerevisiae genome contains regulatory elements called upstream activating sequences (UASs), which function similarly to enhancers and promoter-proximal elements in higher eukaryotes. Most yeast genes contain only one UAS, which generally lies within a few hundred base pairs of the cap site. In addition, yeast genes contain a TATA box about 100 base pairs upstream from the transcription start site (Figure 11-42b).

Eukaryotic Transcription Factors

As in E. coli, the various eukaryotic transcription-control elements described in the previous section are binding sites for regulatory proteins. Binding of these proteins, called transcription factors, generally stimulates transcription, although eukaryotic regulatory proteins that repress transcription also have been identified. In this section, we discuss the identification, purification, and structures of proteins that function as transcription activators.

Biochemical and genetic techniques have been used to identify transcription factors

In yeast, Drosophila, and other genetically tractable eukaryotes, classical genetic studies have identified genes encoding transcription factors. However, in mammals and other vertebrates, which are not amenable to such genetic analysis, most transcription factors have been identified by biochemical purification.

Biochemical isolation of transcription factors

Once a DNA regulatory element has been identified by the kinds of mutational analyses described in the previous section, proteins in an extract of cell nuclei can be assayed for their ability to bind specifically to the identified sequence. In this approach, DNA fragments containing the identified regulatory sequence are used in an electrophoretic mobility shift assay or DNase I footprinting assay to identify a putative transcription factor as it is purified by column chromatography. A particular type of affinity chromatography, called sequence-specific DNA affinity chromatography, is a powerful technique for the final purification step. Long DNA strands containing multiple copies of the transcription-factor binding site are synthesized and hybridized. These synthetic molecules then are coupled to a solid support to create a sequence-specific affinity column. A partially purified extract containing the desired transcription factor is applied to the column in a low-salt buffer (100 mM KCl). Proteins that do not bind to the specific binding site are washed off the column with additional low-salt buffer. Proteins with low affinity for the binding site are eluted with buffer containing 300 mM KCl. Finally, highly purified transcription factor is eluted with buffer containing 1 M KCl. The entire procedure for identifying DNA regulatory elements and isolating transcription factors that bind to them is summarized in Fig. 11-43. As a final test, the ability of the isolated protein to stimulate transcription of a template containing the corresponding binding site is assayed in an in vitro transcription reaction.
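The logic of the stepwise salt elution can be illustrated with a small Python sketch. It is a toy model, not a description of real column behavior: each protein is given a hypothetical release threshold (the KCl concentration at or above which it leaves the column), and the protein names and threshold values are invented for illustration. Only the three buffer concentrations (100 mM, 300 mM, and 1 M KCl) come from the procedure described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    release_mM: float  # hypothetical KCl concentration at which the protein leaves the column


def step_elution(proteins, steps_mM=(100, 300, 1000)):
    """Assign each protein to the first salt step that reaches its release threshold."""
    fractions = {step: [] for step in steps_mM}
    retained = []  # proteins that no step is salty enough to release
    for protein in proteins:
        for step in steps_mM:
            if step >= protein.release_mM:
                fractions[step].append(protein.name)
                break
        else:
            retained.append(protein.name)
    return fractions, retained


# Illustrative extract; thresholds are assumptions, not measured values.
sample = [
    Protein("nonbinding protein", 0),       # never binds, leaves in the low-salt wash
    Protein("low-affinity protein", 250),   # weak binding, released at 300 mM KCl
    Protein("transcription factor", 800),   # tight, sequence-specific binding
]

fractions, retained = step_elution(sample)
for step, names in fractions.items():
    print(f"{step} mM KCl: {names}")
```

In this simplified picture, the purification works because the sequence-specific factor is the only protein whose release threshold lies between the 300 mM and 1 M steps.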

Genetic Identification of Genes Encoding Transcription Factors

In yeast, genes encoding transcription factors were first identified through classical genetic analysis. For example, one of the yeast genes required for growth on galactose is called GAL4. Incubation of wildtype yeast cells in galactose medium results in more than a thousandfold increase in the concentration of mRNAs encoding the enzymes that catalyze galactose metabolism (Fig. ET3). This activation of mRNA expression is not observed in gal4 mutants. (In S. cerevisiae, wildtype genes are designated with capital letters in italics, while recessive mutant alleles of the gene are indicated with lowercase letters in italics. The encoded protein is designated by the name of the gene in Roman type, with the first letter capitalized, e.g., Gal4.) Directed mutagenesis studies like those described in the previous section identified UASs for the induced genes. Each of the UASs was found to contain one or more copies of a related 17-bp sequence called UASGAL. When a copy of UASGAL was cloned upstream of a TATA box followed by a lacZ reporter gene, expression of lacZ was activated in galactose medium in wildtype cells, but not in gal4 mutants. This indicated that UASGAL is a transcription-control element activated by the Gal4 protein in galactose medium.

The GAL4 gene was isolated by complementation of a gal4 mutant with a library of wildtype yeast DNA. Using recombinant DNA techniques, the Gal4 protein was expressed in E. coli and found to bind to UASGAL. Thus, the Gal4 protein binds to UASGAL sequences and activates transcription from a nearby promoter when cells are placed in galactose medium.

Classical genetic studies in a number of other organisms, including Drosophila, the nematode C. elegans, and higher plants, have uncovered several genes encoding transcription factors. For example, many mutations that interfere with normal Drosophila development have been identified. One of these inactivates the ultrabithorax (Ubx) gene, causing an extra pair of wings to develop from the third thoracic segment. The wildtype Ubx gene was cloned and sequenced, and the encoded protein was expressed in large amounts using recombinant DNA techniques. Transcription assays showed that the Ubx protein functions as a transcription factor. The remarkable change in phenotype observed in Ubx- mutants indicates that Ubx protein influences transcription of a large number of Drosophila genes.

Many Transcription Factors Are Modular Proteins Composed of Distinct Functional Domains

A remarkable set of experiments with the yeast Gal4 protein demonstrated that this transcription factor is composed of separable functional domains: a DNA-binding domain, which interacts with specific DNA sequences, and an activation domain, which interacts with other proteins to stimulate transcription from a nearby promoter. In these experiments, a series of gal4 deletion mutants was tested for the ability to activate transcription of a reporter gene (lacZ) in yeast cells (Fig. 11-46); binding of the encoded mutant Gal4 proteins to the UASGAL sequence also was assayed. Transcription activation was measured in cells lacking the GAL4 gene, so that wildtype Gal4 protein would not interfere with the analysis. A small deletion from the N-terminus of Gal4 eliminated its ability to bind to UASGAL, whereas a series of mutant proteins with deletions from the C-terminus extending all the way to amino acid 74 retained the ability to bind specifically to UASGAL (Figure 11-46b). These results demonstrate that a domain present in the N-terminal 74 amino acids of Gal4 is capable of binding to this DNA sequence. A similar analysis of the yeast transcription factor Gcn4, which regulates genes required for the synthesis of many amino acids, showed that it contains a different DNA-binding domain within its C-terminal 60 amino acids.

As expected, deletion of the N-terminal region of Gal4 eliminated its ability to activate expression of the reporter gene, because it could not bind to the UASGAL sequence. Deletion of about 125 or more amino acids from the C-terminus of Gal4 was required to completely eliminate its activation ability; these deletions did not affect its DNA-binding activity. Surprisingly, when the N-terminal DNA-binding domain of Gal4 was fused directly to various C-terminal fragments, the resulting truncated proteins, which lacked most of the internal portion of the protein, retained the ability to stimulate expression of the reporter gene (see Figure 11-46b, bottom). Thus a region of about 100 amino acids at the C-terminus of Gal4 contains an activation domain that, when fused to the N-terminal DNA-binding domain, is capable of stimulating transcription from the reporter gene.
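The reasoning behind this kind of deletion mapping can be sketched in Python: among all constructs that still show a given activity, the one retaining the fewest residues sets an upper bound on the region sufficient for that activity. The construct list below is hypothetical and greatly simplified. The residue boundaries (for example 20, 756, and 782) and the full-length value of 881 residues are assumptions chosen only to agree with the approximate figures in the text (binding within the N-terminal 74 residues, loss of activation after removal of about 125 C-terminal residues, and an activation region of about 100 C-terminal residues); they are not the actual constructs of Figure 11-46.

```python
from dataclasses import dataclass

GAL4_LENGTH = 881  # assumed full-length residue count; not stated in the text


@dataclass(frozen=True)
class Construct:
    name: str
    segments: tuple    # retained residue ranges, inclusive, e.g. ((1, 74), (782, 881))
    binds_uas: bool    # binds UASGAL in the binding assay
    activates: bool    # activates the lacZ reporter in yeast

    def residues(self):
        """Return the set of residue numbers retained in this construct."""
        return {r for start, end in self.segments for r in range(start, end + 1)}


def smallest_sufficient(constructs, phenotype):
    """Return the construct with the fewest residues that still shows the phenotype."""
    positives = [c for c in constructs if getattr(c, phenotype)]
    return min(positives, key=lambda c: len(c.residues()))


# Hypothetical constructs and results, for illustration only.
CONSTRUCTS = [
    Construct("wild type", ((1, GAL4_LENGTH),), True, True),
    Construct("N-terminal deletion", ((20, GAL4_LENGTH),), False, False),
    Construct("C-terminal deletion to 756", ((1, 756),), True, False),
    Construct("C-terminal deletion to 74", ((1, 74),), True, False),
    Construct("fusion 1-74 + 782-881", ((1, 74), (782, GAL4_LENGTH)), True, True),
]

binding = smallest_sufficient(CONSTRUCTS, "binds_uas")
activating = smallest_sufficient(CONSTRUCTS, "activates")

# Residues needed for activation beyond those already needed for DNA binding.
activation_only = sorted(activating.residues() - binding.residues())

print("Sufficient for binding:", binding.segments)
print("Additional region for activation:", (activation_only[0], activation_only[-1]))
```

The sketch only bounds the regions that are sufficient for each activity. Showing that a region is also necessary requires the complementary deletions, such as the N-terminal deletion that abolishes binding.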

Similar experiments with Gcn4 indicated that it contains an activation domain of about 20 amino acids near the middle of its sequence. The wildtype Gcn4 protein contains a region between this activation domain and its C-terminal DNA-binding domain that is extremely sensitive to digestion by proteases. Such highly protease-sensitive regions generally are relatively unstructured, flexible stretches of polypeptide chain, which can readily bend into a conformation that fits easily into the active site of a protease.

Further evidence for the existence of distinct activation domains in Gal4 and Gcn4 came from experiments in which their activation domains were fused to a DNA-binding domain from an entirely different protein, the E. coli LexA repressor. LexA has an N-terminal DNA-binding domain that binds specifically to lexA operator sequences. In these studies, a reporter gene with an upstream lexA operator was introduced into yeast cells. Expression of the LexA DNA-binding domain alone in these yeast cells did not activate expression of the reporter gene. But introduction of DNA constructs containing the coding sequence for the LexA DNA-binding domain ligated to the coding sequence for either the Gal4 or Gcn4 activation domain led to expression of the reporter gene in yeast cells. In this case, a fusion protein consisting of the DNA-binding domain from one transcription factor and the activation domain from a different factor was expressed in vivo and activated transcription. Thus, entirely novel transcription factors composed of prokaryotic and eukaryotic elements can be constructed.
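The outcome of these domain-swap experiments follows from a simple modular rule, which the Python sketch below states explicitly: a reporter is expressed only when the factor's DNA-binding domain recognizes the element upstream of the reporter and the factor also carries an activation domain. The class, field, and function names are hypothetical, and the rule is a deliberate simplification that ignores expression levels, repressors, and any contribution of the DNA-binding domain to activation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Factor:
    binding_site: str                  # control element recognized by the DNA-binding domain
    activation_domain: Optional[str]   # protein the activation domain came from, or None


def reporter_expressed(factor: Factor, upstream_element: str) -> bool:
    """Simplified rule: expression needs both site recognition and an activation domain."""
    return factor.binding_site == upstream_element and factor.activation_domain is not None


# The three proteins tested against a reporter carrying an upstream lexA operator.
lexa_dbd_alone = Factor("lexA operator", None)
lexa_gal4_fusion = Factor("lexA operator", "Gal4")
lexa_gcn4_fusion = Factor("lexA operator", "Gcn4")

for label, factor in [
    ("LexA DNA-binding domain alone", lexa_dbd_alone),
    ("LexA-Gal4 fusion", lexa_gal4_fusion),
    ("LexA-Gcn4 fusion", lexa_gcn4_fusion),
]:
    print(label, "->", reporter_expressed(factor, "lexA operator"))
```

Under this rule, the two fusion proteins activate the reporter while the LexA DNA-binding domain alone does not, which matches the results described above.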

Studies such as these have now been carried out with many eukaryotic transcription factors. Activation domains in mammalian transcription factors are frequently assayed by fusing them to the Gal4 DNA-binding domain, since mammalian cells do not contain an endogenous transcription factor that binds to the UASGAL sequence. The structural model of eukaryotic activators that has emerged from these studies is a modular one in which one or more activation domains are connected to a sequence-specific DNA-binding domain through relatively flexible protein domains (Fig. 11-47). In some cases, amino acids included in the DNA-binding domain also contribute to transcriptional activation. Activation domains are thought to function through protein-protein interactions with transcription factors bound at the promoter. The flexible protein domains in activators, which connect the DNA-binding domains to activation domains, may explain why alterations in the spacing between control elements are so well tolerated in eukaryotic control regions. When the DNA-binding domains of neighboring transcription factors are shifted in their relative positions on the DNA, their activation domains may still be able to interact because they are attached to their DNA-binding domains through flexible protein regions.