Regulation of Eukaryotic Transcription II

The most common recognition pattern between transcription factors and DNA is an interaction between an alpha-helical domain of the factor and about five base pairs within the major groove.

Examples of Regulatory Proteins

Type                    Abbreviation    Example
helix-turn-helix        HTH             lac repressor, CRP (CAP)
basic leucine zipper    bZIP            CREB, AP1 (Fos, Jun)
zinc finger             zif             TFIIIA, Gal4

Note: AP1 is formed from Fos and Jun, which belong to two different families of transcription factors. Jun can form both homodimers and heterodimers (the heterodimers have 10-fold higher DNA-binding affinity than the homodimers). Fos can form only heterodimers and cannot bind DNA without a partner. CREB is the cyclic AMP response element-binding protein. TFIIIA is a general transcription factor for RNA polymerase III.

A Variety of Protein Structures Form the DNA-binding Domains of Eukaryotic Transcription Factors

Eukaryotic transcription factors contain a variety of structural motifs that interact with specific DNA sequences. As with most bacterial activators and repressors, alpha helices in the DNA-binding domain of eukaryotic transcription factors are oriented so that they lie in the major groove of DNA, where protein atoms make specific hydrogen bonds and van der Waals interactions with atoms in the DNA. Interactions with sugar-phosphate backbone atoms and, in some cases, with atoms in the DNA minor groove also contribute to binding. X-ray crystallographic analyses of several complexes between specific protein-binding sites in DNA and isolated transcription factor DNA-binding domains have revealed a number of structural motifs that can present an alpha helix to the major groove.

Transcription factors often are classified according to the type of DNA-binding domain they contain. Most of the structural classes of DNA-binding domains have characteristic consensus amino acid sequences. Consequently, newly characterized transcription factors frequently can be classified once the corresponding genes or cDNAs are cloned and sequenced. A few of the more common classes of DNA-binding domains whose three-dimensional structures have been determined are described here. Many additional classes are recognized, and new classes are still being characterized. The genomes of higher eukaryotes may encode dozens of classes of DNA-binding domains and literally hundreds of transcription factors (Figure AY).

Zinc-Finger Proteins. A number of different proteins have regions that fold around a central Zn2+ ion, producing a compact domain from a relatively short length of polypeptide chain. Termed a zinc finger (zif), this structural motif was first recognized in DNA-binding domains, but it is now known to occur also in proteins that do not bind DNA. To date, three classes of zinc-finger proteins have been identified.

Fig. ET6 shows the structure of one type of zinc-finger DNA-binding domain, termed the C2H2 (or classic) zinc finger. The name is derived from the sequence of the repeating unit initially identified in the DNA-binding domain of transcription factor IIIA (TFIIIA), which is required for transcription of 5S rRNA genes by RNA polymerase III. Each repeating unit has the consensus sequence (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His, where X is any amino acid. Each repeating unit binds one zinc ion through the side chains of its two cysteines (C) and two histidines (H), hence "C2H2". The name "zinc finger" was coined because a two-dimensional diagram of the structure resembles a finger. When the three-dimensional structure was solved, it became clear that binding of the zinc ion by the two cysteines and two histidines folds the relatively short polypeptide sequence into a compact domain that can insert its alpha helix into the major groove of DNA.
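A consensus like this one can be written as a regular expression, which is roughly how motif-scanning tools encode such patterns. The following is a minimal Python sketch, assuming the consensus exactly as stated above; the pattern name, the function `find_c2h2_fingers`, and the test sequence are illustrative choices, not entries from any real protein database.

```python
import re

# Consensus as given in the text:
# (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His
# One-letter codes: Y = Tyr, F = Phe, C = Cys, L = Leu, H = His; "." = any residue (X).
C2H2_PATTERN = re.compile(r"[YF].C.{2,4}C.{3}[FY].{5}L.{2}H.{3,4}H")


def find_c2h2_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Illustrative sequence containing one finger-like repeat.
    seq = "MERPYACPVESCDRRFSRSDELTRHIRIHTGQK"
    print(find_c2h2_fingers(seq))  # [(4, 'YACPVESCDRRFSRSDELTRHIRIH')]
    # Replacing the first Cys (a zinc ligand) with Ser abolishes the match.
    print(find_c2h2_fingers(seq.replace("C", "S", 1)))  # []
```

A pattern match only flags a candidate finger; whether the segment actually folds around zinc and binds DNA has to be shown experimentally.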

The C2H2 zinc finger is one of the most common DNA-binding motifs in eukaryotic transcription factors. More than a thousand of these consensus sequences are found in current protein databases. The repeating units in these proteins can interact with successive groups of base pairs, primarily within the major groove, as the protein wraps around the DNA double helix.

Leucine-Zipper Proteins. Another structural motif present in a large class of transcription factors is exemplified by the DNA-binding domain of yeast Gcn4. The first transcription factors recognized in this class contained the hydrophobic amino acid leucine at every seventh position in the C-terminal portion of their DNA-binding domains (Fig. ET4). These proteins bind to DNA as dimers, and mutagenesis of the leucines showed that they were required for dimerization. Consequently, the name leucine zipper was coined to denote this structural motif.

X-ray crystallographic analysis of complexes between DNA and the Gcn4 DNA-binding domain has shown that the dimeric protein contains two extended alpha helices that "grip" the DNA molecule, much like a pair of scissors, at two adjacent major grooves separated by about half a turn of the double helix (Fig. ET5). The portions of the alpha helices contacting the DNA include basic residues that interact with phosphates in the DNA backbone and additional residues that interact with specific bases in the major groove.

Gcn4 forms dimers via hydrophobic interactions between the C-terminal regions of the alpha helices, forming a coiled-coil structure. This structure is common in proteins containing amphipathic alpha helices in which hydrophobic amino acid residues are regularly spaced, alternately three and four positions apart in the sequence. As a result of this characteristic spacing, the hydrophobic side chains form a stripe down one side of the alpha helix. The hydrophobic stripes make up the interacting surfaces between the alpha-helical monomers in a coiled-coil dimer.
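The alternating 3/4 spacing is the heptad repeat of coiled coils, conventionally labeled abcdefg with the hydrophobic residues at positions a and d (this labeling is an assumption of the sketch below; it is not introduced in the text). The Python sketch generates those positions from a chosen starting "a" residue and measures the longest run of leucines spaced exactly seven residues apart. The sequence is an illustrative GCN4-like zipper segment used only as test input, and the function names are hypothetical.

```python
def heptad_core_positions(first_a: int, n_heptads: int) -> list[int]:
    """0-based positions of the heptad 'a' and 'd' residues.

    From an 'a' position the next 'd' lies 3 residues later and the next
    'a' lies 4 residues after that, giving the alternating 3/4 spacing.
    """
    positions = []
    pos = first_a
    for _ in range(n_heptads):
        positions.append(pos)      # 'a' position
        positions.append(pos + 3)  # 'd' position
        pos += 7                   # next heptad's 'a' position
    return positions


def longest_leucine_heptad_run(protein: str) -> int:
    """Length of the longest run of leucines spaced exactly seven residues apart."""
    best = 0
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        run = 1
        while start + 7 * run < len(protein) and protein[start + 7 * run] == "L":
            run += 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    zipper = "RMKQLEDKVEELLSKNYHLENEVARLKKLVGER"  # illustrative GCN4-like segment
    core = heptad_core_positions(1, 4)
    print(core)                                # [1, 4, 8, 11, 15, 18, 22, 25]
    print("".join(zipper[i] for i in core))    # MLVLNLVL
    print(longest_leucine_heptad_run(zipper))  # 4
```

In this segment every "d" position carries a leucine, which is the stripe that forms the "zipper"; the "a" positions hold other residues, mostly hydrophobic, consistent with the observation that non-leucine hydrophobic residues can fill these slots.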

As noted above, the first transcription factors in this class to be analyzed contained leucine residues at every seventh position in the dimerization region and thus were named leucine-zipper proteins. However, additional DNA-binding proteins containing other hydrophobic amino acids in these positions subsequently were identified. Like leucine-zipper proteins, they form dimers containing a C-terminal coiled-coil region and an N-terminal DNA-binding domain. The term basic zipper (bZIP) now is frequently used to refer to all proteins with these common structural features. Many basic-zipper transcription factors are heterodimers of two different polypeptide chains, each containing one basic-zipper region.

Heterodimeric Transcription Factors Increase Regulatory Diversity

Three types of DNA-binding proteins can form heterodimers: C4 zinc-finger proteins, basic-zipper proteins, and helix-loop-helix proteins. Other classes of transcription factors whose structures have not yet been determined also form heterodimeric proteins. One consequence of heterodimer formation is an expansion of the number of potential DNA sequences that a family of factors can bind. Heterodimer formation also allows different combinations of activation domains to be brought together at regulatory sequences. In addition, some basic-zipper and helix-loop-helix proteins block DNA binding when they dimerize with a polypeptide that could otherwise bind DNA. When these inhibitory factors are expressed, they repress transcriptional activation by the factors with which they interact.

The rules governing the interactions of members of a transcription-factor class are complex. In the example shown in Fig. 11-52, factors A, B, and C can interact with each other, but the inhibitory factor interacts only with factor A. This combinatorial complexity expands both the number of DNA sites from which these factors can activate transcription and the ways in which they can be regulated. This is not possible for transcription factors that bind only as monomers or homodimers.
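The combinatorial gain can be made concrete by enumerating the dimers that a set of pairing rules allows. For n factors that all pair freely, there are n(n+1)/2 distinct dimers: n homodimers plus n(n-1)/2 heterodimers. The Python sketch below encodes the Fig. 11-52 scheme as described in the text, plus two assumptions the text does not state: that A, B, and C can also form homodimers, and that any dimer containing the inhibitor ("I", a hypothetical label) cannot bind DNA.

```python
from itertools import combinations_with_replacement


def allowed_dimers(factors, can_pair):
    """List every unordered pair (homodimers included) permitted by can_pair."""
    return [
        (x, y)
        for x, y in combinations_with_replacement(factors, 2)
        if can_pair(x, y)
    ]


# Hypothetical encoding of the Fig. 11-52 scheme.
ACTIVATORS = {"A", "B", "C"}


def can_pair(x, y):
    # A, B, and C pair with each other (homodimers assumed allowed).
    if x in ACTIVATORS and y in ACTIVATORS:
        return True
    # The inhibitor pairs only with A.
    return {x, y} == {"A", "I"}


def binds_dna(dimer):
    # Assumption: any dimer containing the inhibitor cannot bind DNA.
    return "I" not in dimer


if __name__ == "__main__":
    dimers = allowed_dimers(["A", "B", "C", "I"], can_pair)
    print(len(dimers))                               # 7
    print([d for d in dimers if not binds_dna(d)])   # [('A', 'I')]
```

Only three activator polypeptides yield six DNA-binding dimers, each potentially with its own site preference and activation-domain combination, and expressing the inhibitor selectively removes factor A from that pool.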

A Diverse Group of Amino Acid Sequences Are Found in Activation Domains

Although the three-dimensional structures of the DNA-binding domains from numerous eukaryotic transcription factors have been determined, the structure of not one activation domain has yet been solved. Nonetheless, activation domains defined by mutational analysis exhibit common amino acid sequence features in some cases.

For example, the activation domains of Gal4, Gcn4, and most other yeast transcription factors analyzed so far are rich in acidic amino acids (aspartic and glutamic acids). Deletion analyses of numerous transcription factors from mammals and Drosophila have identified several classes of activation domains. Some are glutamine rich, some are proline rich, and some are rich in the closely related amino acids serine and threonine, both of which have hydroxyl groups. However, some strong activation domains that are not particularly rich in any specific amino acid also have been identified.
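"Rich in" is a statement about residue composition, which is straightforward to compute. The Python sketch below reports the fraction of each residue class named above and lists the classes that reach a cutoff. The 25% cutoff, the function names, and the test sequence are arbitrary illustrative assumptions, not a published classification rule; as the text notes, some strong activation domains would not be flagged by any composition threshold.

```python
# Residue classes named in the text (one-letter amino acid codes).
CLASSES = {
    "acidic (D/E)": set("DE"),
    "glutamine (Q)": set("Q"),
    "proline (P)": set("P"),
    "serine/threonine (S/T)": set("ST"),
}


def composition(segment: str) -> dict[str, float]:
    """Fraction of residues in the segment that fall into each class."""
    seg = segment.upper()
    if not seg:
        raise ValueError("segment must not be empty")
    return {
        name: sum(residue in members for residue in seg) / len(seg)
        for name, members in CLASSES.items()
    }


def enriched_classes(segment: str, cutoff: float = 0.25) -> list[str]:
    """Classes whose fraction meets a hypothetical enrichment cutoff."""
    return [name for name, frac in composition(segment).items() if frac >= cutoff]


if __name__ == "__main__":
    # Made-up acidic segment: 10 of its 22 residues are D or E.
    print(enriched_classes("MDEDLDEFSQEEWAPDEGLLSD"))  # ['acidic (D/E)']
```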

Most activation domains characterized in yeast transcription factors also stimulate transcription in mammalian cells, whereas a number of mammalian activation domains do not stimulate transcription when tested in yeast. Thus, while some activation mechanisms function in all eukaryotic cells, some mechanisms may have evolved since the divergence of yeast and animals.

There are many types of DNA-binding domains

Comparisons of the sequences of many transcription factors reveal common types of motifs that are responsible for binding to DNA. The motifs are usually quite short and comprise only a small part of the protein structure. Motifs have also been identified that are responsible for activating transcription via interactions with proteins of the transcription apparatus.

We have detailed information about several groups of proteins that regulate transcription by using particular motifs to bind DNA:

The steroid receptors are defined as a group by a functional relationship: each receptor is activated by binding a particular steroid. The glucocorticoid receptor is the most fully analyzed. Together with other receptors, such as the thyroid hormone receptor or the retinoic acid receptor, the steroid receptors are members of a superfamily of transcription factors with the same general mode of action.

The zinc finger motif comprises a DNA-binding domain. It was originally recognized in factor TFIIIA, which is required for RNA polymerase III to transcribe 5S rRNA genes. It has since been identified in several other transcription factors (and presumed transcription factors). A distinct form of the motif is found also in the steroid receptors.

The helix-turn-helix motif was originally identified as the DNA-binding domain of phage repressors. One alpha helix lies in the wide (major) groove of DNA; the other lies at an angle across the DNA. A related form of the motif is present in the homeodomain, a sequence first characterized in several proteins coded by genes concerned with developmental regulation in Drosophila. It is also present in mammalian transcription factors.

The amphipathic helix-loop-helix (HLH) motif has been identified in some developmental regulators and in genes coding for eukaryotic DNA-binding proteins. Each amphipathic helix presents a face of hydrophobic residues on one side and charged residues on the other side. The length of the connecting loop varies from 12 to 28 amino acids. The motif enables proteins to dimerize, and a basic region near this motif contacts DNA.

Leucine zippers consist of a stretch of amino acids with a leucine residue in every seventh position. A leucine zipper in one polypeptide interacts with a zipper in another polypeptide to form a dimer. Adjacent to each zipper is a stretch of positively charged residues that is involved in binding to DNA.

The activity of an inducible transcription factor may be regulated in any one of several ways (Figure AZ):

A factor is tissue-specific because it is synthesized only in a particular type of cell. This is typical of factors that regulate development, such as homeodomain proteins.

The activity of a factor may be directly controlled by modification. HSTF is converted to the active form by phosphorylation. AP1 (a heterodimer of the subunits Jun and Fos) is converted to the active form by phosphorylation of the Jun subunit.

A factor is activated or inactivated by binding a ligand. The steroid receptors are prime examples. Ligand binding may influence the localization of the protein (causing transport from cytoplasm to nucleus), as well as determining its ability to bind to DNA.

One transcription factor is produced as a protein bound to the nuclear envelope and endoplasmic reticulum. The absence of sterols (such as cholesterol) causes the cytosolic domain to be cleaved; it then translocates to the nucleus and provides the active form of the transcription factor.

Availability of a factor may vary. For example, the factor NF-kappa B (which activates immunoglobulin kappa genes in B lymphocytes) is present in many cell types, but it is sequestered in the cytoplasm by the inhibitory protein I-kappa B. In B lymphocytes, NF-kappa B is released from I-kappa B and moves to the nucleus, where it activates transcription.

A dimeric factor may have alternative partners. One partner may cause it to be inactive; synthesis of the active partner may displace the inactive partner. Such situations may be amplified into networks in which various alternative partners pair with one another, especially among the HLH proteins.

Mechanisms of Hormonal Control of Nuclear-Receptor Activity

Hormone binding to a nuclear receptor regulates its activity as a transcription factor. This regulation differs in some respects for heterodimeric and homodimeric nuclear receptors.

When heterodimeric nuclear receptors (e.g., RXR-VDR, responsive to vitamin D3; RXR-TR, responsive to thyroid hormone; and RXR-RAR, responsive to retinoic acid; RXR is the monomer common to all of these) are bound to their cognate sites in DNA, they act as repressors or activators of transcription depending on whether hormone occupies the ligand-binding site. In the absence of hormone, these nuclear receptors direct histone deacetylation at nearby nucleosomes. In the presence of hormone, the ligand-binding domain undergoes a dramatic conformational change. In the ligand-bound conformation, these nuclear receptors can direct hyperacetylation of histones in nearby nucleosomes, thereby reversing the repressing effects of the free ligand-binding domain. The N-terminal activation domain in these receptors then probably interacts with additional factors, stimulating the cooperative assembly of an initiation complex.

In contrast to heterodimeric nuclear receptors, which are located exclusively in the nucleus, homodimeric receptors are found in both the cytoplasm and the nucleus, and their activity is regulated by controlling their transport from the cytoplasm to the nucleus. The hormone-dependent translocation of the homodimeric glucocorticoid receptor (GR) was demonstrated in the transfection experiments shown in Figure BA1. The GR hormone-binding domain alone mediates this transport. Subsequent studies showed that, in the absence of hormone, the glucocorticoid receptor is anchored in the cytoplasm as a large protein aggregate complexed with inhibitor proteins, including Hsp90, a protein related to Hsp70, the major heat-shock chaperone. In this situation, the receptor cannot interact with target genes; hence, no transcriptional activation occurs. Binding of hormone releases the glucocorticoid receptor from its cytoplasmic anchor, allowing it to enter the nucleus, where it can bind to response elements associated with target genes (Figure BA2). Once the receptor with bound hormone interacts with a response element, it activates transcription by directing histone hyperacetylation and facilitating cooperative assembly of an initiation complex.
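The contrast between the two receptor classes reduces to one rule for each, sketched below as Python decision functions. This is a deliberately coarse model and an assumption of the sketch: it collapses the multistep mechanisms described above into a single hormone-bound/unbound switch and ignores cofactors, receptor levels, and response-element sequence. The names `Outcome`, `heterodimeric_receptor`, and `glucocorticoid_receptor` are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    REPRESS = "histone deacetylation at nearby nucleosomes; transcription repressed"
    ACTIVATE = "histone hyperacetylation and initiation-complex assembly; transcription activated"
    INACTIVE = "receptor anchored in cytoplasm; no contact with target genes"


def heterodimeric_receptor(hormone_bound: bool) -> Outcome:
    """RXR-VDR, RXR-TR, RXR-RAR: nuclear and already bound to their DNA sites.

    Hormone occupancy of the ligand-binding domain switches the receptor
    from repressor to activator.
    """
    return Outcome.ACTIVATE if hormone_bound else Outcome.REPRESS


def glucocorticoid_receptor(hormone_bound: bool) -> Outcome:
    """Homodimeric GR: activity is controlled by entry into the nucleus.

    Without hormone, GR is held in the cytoplasm in a complex that includes
    Hsp90; hormone binding releases it so it can reach response elements.
    """
    return Outcome.ACTIVATE if hormone_bound else Outcome.INACTIVE


if __name__ == "__main__":
    for bound in (False, True):
        print(bound, heterodimeric_receptor(bound).name, glucocorticoid_receptor(bound).name)
    # False REPRESS INACTIVE
    # True ACTIVATE ACTIVATE
```

The two functions differ only in the hormone-free case: an unliganded heterodimeric receptor actively represses its target genes, whereas an unliganded glucocorticoid receptor simply never reaches them.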

Some Eukaryotic Regulatory Proteins Function As Repressors

Most eukaryotic transcription factors that have been studied extensively are activators, which stimulate transcription. However, proteins that repress transcription also have been identified in eukaryotes. Some repressor proteins function by binding to DNA sequences that overlap activator-binding sites. Other repressors function by binding to sequences that overlap a transcription start site, much like prokaryotic repressors. In both cases, binding of a repressor molecule to a specific DNA site blocks binding of proteins required to initiate transcription.

In many cases, however, eukaryotic repressors inhibit transcription without interfering with the binding of an activator or general transcription factor. One important example is the protein encoded by the Wilms' tumor (WT1) gene, which is expressed preferentially in the developing kidney. Children who inherit mutations in both the maternal and paternal WT1 genes, so that they produce no functional WT1 protein, invariably develop kidney tumors early in life. The WT1 protein has a C2H2 zinc-finger DNA-binding domain, and binding sites for the protein were discovered in the control region of the gene encoding a transcription activator called EGR-1. The experiment outlined in Fig. 11-55 demonstrated that WT1 protein repressed transcription of a reporter gene linked to the EGR-1 promoter region.

Eukaryotic transcription repressors like WT1 appear to be the functional converse of activators. They can inhibit transcription from a gene they do not normally regulate when their cognate binding sites are placed within a few hundred base pairs of the gene's start site. This effect was demonstrated in an experiment with Krüppel protein, which represses transcription of several genes during embryonic development of Drosophila. Like activators, many eukaryotic repressors have two functional domains: a DNA-binding domain and a repression domain. When the Krüppel repression domain was fused to the DNA-binding domain of the E. coli lac repressor, the resulting fusion protein inhibited transcription of a reporter gene linked to upstream lac operator sites. As discussed previously for activation domains, a variety of amino acid sequences can function as repression domains. Little information is yet available on the mechanism by which eukaryotic repressors inhibit transcription.