SPLICING

Eukaryotic genes are often interrupted by sequences that do not appear in the final RNA. The intervening sequences that are removed are called introns. The process by which introns are removed is referred to as splicing. The sequences remaining after splicing are called exons. All of the different major types of RNA in a eukaryotic cell can have introns. Although most higher eukaryotic genes have introns, some do not. Higher eukaryotes tend to have a larger percentage of their genes containing introns than lower eukaryotes, and the introns tend to be larger as well. The pattern of intron size and usage roughly follows the evolutionary tree, but this is only a general tendency. The human titin gene has the largest number of exons (178), the longest single exon (17,106 nucleotides) and the longest coding sequence (80,781 nucleotides = 26,927 amino acids). The longest primary transcript, however, is produced by the dystrophin gene (2.4 million nucleotides).

RNA-DNA Hybridization Reveals Spliced-out Introns

RNA splicing was discovered during analysis of adenovirus mRNA synthesis. In these studies, the abundant viral mRNA encoding the major virion capsid protein, called hexon, was isolated by gel electrophoresis of cytoplasmic polyadenylated RNA. To map the region of the viral DNA coding for hexon mRNA, researchers hybridized the isolated mRNA to the coding strand and the RNA-DNA hybrid was visualized in the electron microscope (Figure 12-17). Three loops of single-stranded DNA (A, B, and C) were observed; these correspond to the three introns in the hexon gene. Since these intron sequences in the viral genomic DNA are not present in mature hexon mRNA, they loop out between the exon sequences that hybridize to their complementary sequences in the mRNA.

Similar analyses of hybrids between RNA isolated from the nuclei of infected cells and viral DNA revealed RNAs that were colinear with the viral DNA (primary transcripts) and RNAs with one or two of the introns removed (processing intermediates). These results, together with the findings that the 5' cap and 3' poly-A tail of mRNA precursors are retained in mature cytoplasmic mRNAs, led to the realization that introns are removed from primary transcripts as exons are spliced together. For short transcription units, RNA splicing usually follows cleavage and polyadenylation of the 3' end of the primary transcript. But for long transcription units containing multiple exons, splicing of exons in the nascent RNA sometimes begins before transcription of the gene is complete.

Splice Sites in Pre-mRNAs Exhibit Short, Conserved Sequences

The location of splice sites in a pre-mRNA can be determined by comparing the sequence of genomic DNA with that of the cDNA prepared from the corresponding mRNA. Sequences that are present in the genomic DNA but absent from the cDNA represent introns and indicate the positions of exon-intron boundaries. Such analysis of a large number of different mRNAs revealed moderately conserved, short consensus sequences at intron-exon boundaries in eukaryotic pre-mRNA; in higher organisms, a pyrimidine-rich region just upstream of the 3' splice site also is common (Figure 12-18). The only universally conserved nucleotides are the (5')GU and (3')AG in the intron. Deletion analyses of the center portion of introns in various pre-mRNAs have shown that generally only 30-40 nucleotides at each end of an intron are necessary for splicing to occur at normal rates.
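The genomic-versus-cDNA comparison described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the cDNA is taken to differ from the genomic sequence by exactly one removed segment, the sequences are invented, and the function name find_intron is hypothetical. Real comparisons use alignment tools that tolerate multiple introns and sequencing errors.

```python
def find_intron(genomic: str, cdna: str) -> list[tuple[int, int]]:
    """Return (start, end) of a single intron in genomic DNA, end exclusive.

    Assumes the cDNA equals the genomic sequence with exactly one
    contiguous segment (the intron) removed.
    """
    intron_len = len(genomic) - len(cdna)
    if intron_len <= 0:
        return []
    candidates = []
    for start in range(len(cdna) + 1):
        end = start + intron_len
        # Removing genomic[start:end] must reproduce the cDNA exactly.
        if genomic[:start] + genomic[end:] != cdna:
            continue
        intron = genomic[start:end]
        # Keep only candidates that obey the GT...AG rule (GU...AG in RNA).
        if intron.startswith("GT") and intron.endswith("AG"):
            candidates.append((start, end))
    return candidates


# Hypothetical sequences, invented for illustration only.
EXON1 = "ATGGCC"
INTRON = "GTAAGTTTTTTTACTAACTTTTCCCTTTCAG"
EXON2 = "GATTGA"
GENOMIC = EXON1 + INTRON + EXON2
CDNA = EXON1 + EXON2

print(find_intron(GENOMIC, CDNA))
```

In this toy example, shifting the intron boundaries one nucleotide downstream also reproduces the cDNA, because the intron and the second exon both begin with G. Only the conserved GT...AG ends single out the true boundaries, which is one reason the consensus nucleotides are useful when mapping splice sites.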

Recombinant DNAs containing the 5' exon-intron junction of one transcription unit (e.g., SV40 late region) and the 3' intron-exon junction of another (e.g., mouse beta-globin gene) have been prepared and introduced into cultured cells. Spliced mRNA molecules are formed in which the two exon sequences are joined and the chimeric intron is deleted precisely. The formation of correctly spliced mRNAs in such experiments indicates that the cell's splicing machinery can recognize and correctly join heterologous 5' and 3' splice sites.

Excision of Introns and Splicing of Exons in Pre-mRNAs Occur via Two Transesterification Reactions

Experiments with cell extracts that accurately excise introns and join exons at the correct sites in a pre-mRNA molecule were critical for understanding the mechanism of RNA splicing. Analysis of the intermediates formed during such in vitro splicing reactions led to the conclusion that introns are not cut out as linear molecules (Figure 12-19). Surprisingly, an intron is removed as a lariat structure in which the 5' G of the intron is joined in an unusual 2'-5' phosphodiester bond to an adenosine near the 3' end of the intron. The adenosine is called the branch point because it forms an RNA branch in the lariat structure (Figure of Lariat Structure).

The finding that excised introns have a branched lariat structure led to the discovery that splicing of exons proceeds via two sequential transesterification reactions as illustrated in Figure 12-20. In each reaction, one phosphate-ester bond is exchanged for another. Since the number of phosphate-ester bonds in the molecule is not changed in either reaction, no energy is consumed. The net result of these two transesterification reactions is that two exons are ligated and the intervening intron is released as a branched lariat structure.
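The bond bookkeeping behind this argument can be checked with a small sketch. The Species class, the splice function, and the nucleotide counts below are hypothetical and serve only to count phosphodiester bonds at each stage; nothing here models the chemistry itself.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    nucleotides: int
    branched: bool = False  # True if the RNA carries one 2'-5' branch bond

    @property
    def bonds(self) -> int:
        # A chain of n nucleotides has n - 1 backbone (3'-5') bonds;
        # a lariat has one additional 2'-5' bond at the branch point.
        return self.nucleotides - 1 + (1 if self.branched else 0)


def total_bonds(species: list[Species]) -> int:
    return sum(s.bonds for s in species)


def splice(exon1: int, intron: int, exon2: int):
    """Return the RNA species present before, between, and after the reactions."""
    pre_mrna = [Species("pre-mRNA", exon1 + intron + exon2)]
    # Reaction 1: the 2'-OH of the branch-point A attacks the 5' splice site,
    # releasing exon 1 and forming the lariat intermediate.
    after_first = [
        Species("exon 1", exon1),
        Species("lariat intron-exon 2", intron + exon2, branched=True),
    ]
    # Reaction 2: the 3'-OH of exon 1 attacks the 3' splice site,
    # joining the exons and releasing the lariat intron.
    after_second = [
        Species("ligated exons", exon1 + exon2),
        Species("lariat intron", intron, branched=True),
    ]
    return pre_mrna, after_first, after_second


# Hypothetical lengths in nucleotides.
for stage in splice(exon1=120, intron=500, exon2=200):
    print([s.name for s in stage], total_bonds(stage))
```

Each stage reports the same total, reflecting that every bond broken is matched by a bond formed.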

Small Nuclear Ribonucleoprotein Particles Assist in Splicing

Six small U-rich RNAs are abundant in the nuclei of mammalian cells. Designated U1 through U6, these small nuclear RNAs (snRNAs) range in size from 107 to 210 nucleotides (Figure 12-21). Even before splicing was accomplished in vitro, several observations led to the suggestion that snRNAs assisted in the splicing reaction. First, the short consensus sequence at the 5' end of introns (CAG|GUAAGU) was found to be complementary to a sequence near the 5' end of the snRNA called U1. Second, snRNAs were found associated with hnRNPs in nuclear extracts.

The snRNAs associate in the nucleus with six to ten proteins to form small nuclear ribonucleoprotein particles (snRNPs). Some of these proteins are common to all snRNPs, and some are specific for individual snRNPs. For unknown reasons, antisera from patients with the autoimmune disease systemic lupus erythematosus (SLE) contain antibodies (called anti-Sm antibody) directed against a protein that is common to the U1, U2, U4, and U5 snRNPs (Figure 12-21b). Antisera from some SLE patients were found to have greater specificity for one or another of the individual snRNPs. These specific antisera have been widely used in characterizing components of the splicing reaction. For example, when antiserum that is specific for U1 snRNP is added to an in vitro splicing mixture, the splicing reaction is interrupted, confirming the importance of the U1 snRNP.

In subsequent studies, a synthetic oligonucleotide that is complementary to and thus hybridizes with the 5'-end region of U1 snRNA was found to block splicing, indicating that this is the region that binds to pre-mRNA during the splicing reaction. Further evidence for the importance of base pairing between the 5' end of U1 snRNA and the conserved 5' splice-site sequence of pre-mRNA came from experiments with genes that were mutated in the 5' splice-site consensus sequence of an intron. When genes containing these mutations were transfected into cells, splicing of the corresponding pre-mRNAs was blocked. However, when a mutant gene was cotransfected with a mutant U1 snRNA gene containing a compensating sequence change that restored base pairing with the mutant 5' splice site, splicing was restored (Figure 12-22). This result argued strongly that base pairing between the 5' splice site of a pre-mRNA and the 5' region of U1 snRNA is required for RNA splicing.
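The logic of the compensatory-mutation experiment can be illustrated with a short sketch that counts Watson-Crick pairs between a 5' splice site and a U1-like sequence. Several things here are assumptions: the U1 region is idealized as the exact reverse complement of the consensus CAG|GUAAGU rather than taken from a real snRNA, the point mutation is invented, G-U wobble pairs are ignored, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_positions(splice_site: str, u1_region: str) -> int:
    """Count Watson-Crick pairs with the two RNAs aligned antiparallel.

    Both sequences are written 5'->3'; wobble pairs are not counted.
    """
    return sum(
        COMPLEMENT[a] == b for a, b in zip(splice_site, reversed(u1_region))
    )


WT_SITE = "CAGGUAAGU"                   # consensus 5' splice site from the text
WT_U1 = reverse_complement(WT_SITE)     # idealized U1 5' region (assumption)
MUT_SITE = "CAGGUACGU"                  # hypothetical A-to-C point mutation
MUT_U1 = reverse_complement(MUT_SITE)   # compensating change in U1

for site, u1 in [(WT_SITE, WT_U1), (MUT_SITE, WT_U1), (MUT_SITE, MUT_U1)]:
    print(site, u1, paired_positions(site, u1))
```

The mutant site loses one pair against the wild-type U1 sequence, and the compensating change restores full pairing, mirroring the loss and rescue of splicing seen in the transfection experiments.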

After discovery of the lariat structure of excised introns, a consensus sequence was recognized in the region flanking the branch point in pre-mRNAs (see Figure 12-18). In the yeast S. cerevisiae, virtually all introns have the sequence UACUAAC in the region of the branch-point A (the A nearest the 3' end of this sequence). Except for the branch-point A, the yeast sequence is complementary to an internal sequence in U2 snRNA. Compensatory mutation experiments, similar to those just described with U1 snRNA and 5' splice sites, demonstrated that base pairing between U2 snRNA and the branch-site sequence in pre-mRNA is critical to splicing. Significantly, the branch-point A itself, which is not base-paired to U2 snRNA, "bulges out," allowing its 2'-hydroxyl to participate in the first transesterification reaction of RNA splicing (Figure 12-20).
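Because the yeast branch-site sequence is so highly conserved, a candidate branch point can be located by a simple text search. The sketch below assumes an exact match to UACUAAC and uses an invented intron; the function name is hypothetical, and such a search would not be adequate for higher eukaryotes, where the branch-site sequence is less conserved.

```python
import re

BRANCH_CONSENSUS = "UACUAAC"
BRANCH_A_OFFSET = 5  # 0-based index of the branch-point A within the consensus


def find_branch_points(intron_rna: str) -> list[int]:
    """Return 0-based positions of candidate branch-point adenosines."""
    # A zero-width lookahead reports every position where the consensus starts.
    pattern = re.compile(f"(?={BRANCH_CONSENSUS})")
    return [m.start() + BRANCH_A_OFFSET for m in pattern.finditer(intron_rna)]


# Hypothetical yeast-like intron, invented for illustration.
INTRON = "GUAUGU" + "UUUU" + "UACUAAC" + "UUUUUUUUUAG"

positions = find_branch_points(INTRON)
print(positions, [INTRON[p] for p in positions])
```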

These studies with U1 and U2 snRNAs indicate that during splicing they base-pair with pre-mRNA as shown in Figure 12-23. Additional compensatory mutation experiments demonstrated that other RNA-RNA interactions also occur during splicing (Figure 12-23b,c). Based on the results of these experiments, identification of reaction intermediates, and other biochemical analyses, the spliceosomal splicing model depicted in Figure 12-24 was proposed. According to this model, U1 and U2 snRNAs, as part of the U1 and U2 snRNPs, base-pair with the 5' splice site and branch-point regions of an intron, respectively. In yeast, base pairing between U2 snRNA and the branch site occurs over six bases (see Figure 12-23a). In higher eukaryotes where the branch-site sequence is less highly conserved, the association of U2 snRNP with pre-mRNA is assisted by a protein called U2AF, which binds to the pyrimidine-rich region near the 3' splice site. U2AF interacts with RNA through an RNP motif and is thought to interact with other proteins required for splicing through a domain containing repeats of the dipeptide serine-arginine (the SR motif). The snRNAs in the U4 and U6 snRNPs base-pair over an extended complementary region (see Figure 12-23b); this complex then associates, presumably via protein-protein interactions, with the previously formed complex consisting of pre-mRNA base-paired to U1 and U2 snRNPs. The resulting high-molecular-weight (60S) ribonucleoprotein complex is called a spliceosome (Figure 12-25).

After formation of the spliceosome, extensive rearrangements occur in the pairing of snRNAs and the pre-mRNA. The U4 and U6 snRNAs dissociate from each other; U6 snRNA then base-pairs to a sequence in U2 snRNA that is just 5' to the sequence that interacts with the branch site in pre-mRNA. U1 is thought to dissociate from the 5' splice site in the pre-mRNA, after which U5 base-pairs with exon sequences flanking the splice sites. These rearrangements yield the interactions shown in Figure 12-23c.

The rearranged spliceosome then catalyzes the two transesterification reactions that result in RNA splicing. After the second transesterification reaction, the ligated exons are released from the spliceosome while the lariat intron remains associated with the snRNPs. This final intron-snRNP complex is unstable and dissociates. The individual snRNPs released can participate in a new cycle of splicing. The excised intron is rapidly degraded by a "debranching enzyme," which hydrolyzes the 2'-5' phosphodiester bond at the branch point, and other nuclear RNases.

It is estimated that at least one hundred proteins are involved in RNA splicing, making this process comparable in complexity to protein synthesis and initiation of transcription. Some of these splicing factors are associated with snRNPs, but others are not. In yeast, various proteins that are required for pre-mRNA splicing have been identified by analysis of PRP (precursor RNA processing) mutants. In these temperature-sensitive mutants, RNA splicing is blocked at nonpermissive temperatures. In mammalian systems, purification of nuclear extracts has led to identification of proteins required for specific steps in the spliceosomal splicing cycle. Many of the yeast PRP genes have been cloned, as have some genes encoding mammalian splicing factors. Sequencing of these genes has revealed that the encoded proteins contain domains with the RNP motif and the SR motif; some proteins also exhibit sequence homologies to known RNA helicases. As noted earlier, RNP domains are involved in binding RNA; SR domains are involved in protein-protein interactions and also may contribute to binding RNA. RNA helicases may be necessary for the base-pairing rearrangements that occur in snRNAs during the spliceosomal splicing cycle, particularly the dissociation of U4 from U6 and of U1 from the 5' splice site.

Drosophila Sex Determination

The isolation and characterization of genes required for normal sex determination in Drosophila showed that three of these genes operate in a cascade of gene regulation by controlling the splicing of specific primary transcripts. The first of these genes to be expressed, called sex-lethal (sxl), is transcribed from an early promoter (PE) that is active only in very early female embryos (Figure 12-30). The primary sxl transcript is spliced into an mRNA with two exons, resulting in expression of Sex-lethal (Sxl) protein in early female embryos. Sex-lethal protein is a sequence-specific RNA-binding protein that binds to RNA through an RNP motif at the C-terminal end.

As development of the Drosophila embryo proceeds, transcription of the sxl gene from PE is repressed, while transcription from an upstream late promoter (PL) is induced (Figure 12-30b). Transcription from PL occurs in both males and females and continues from this time in development onward. The sxl transcript produced from PL contains four exons; the first one is located upstream and the second one downstream of the first exon in the early sxl transcript produced from PE. In both males and females, the first exon of the PL-derived late transcript is spliced to the second exon, at a splice site that is not used in processing the early transcript. Subsequent splicing of PL transcripts differs in males and females due to the presence of Sex-lethal protein resulting from transcription from PE in females.

In males, the second exon of the late transcript is spliced to a third exon, which in turn is spliced to a fourth exon encoding the RNA-binding RNP motif. The third exon of the resulting mRNA contains a stop codon (UGA). Consequently, translation of this mRNA yields a Sxl protein that lacks the RNA-binding domain and is nonfunctional. In females, the Sxl protein produced from the early PE transcript binds to specific sequences in the late sxl pre-mRNA, preventing the splicing of exon 2 to exon 3. Instead, exon 2 is spliced to exon 4, producing an mRNA in which exon 3 of the male-specific sxl mRNA containing the stop codon is "skipped." This late female-specific mRNA is translated into a protein containing the RNP motif that binds specifically to the sxl transcript. As a result, the late female Sxl protein, which differs from the early female Sxl protein at the N-terminus but has the identical RNP motif at its C-terminus, also regulates splicing of the late sxl primary transcript in females. In this way, the late female Sxl protein autoregulates its own production, ensuring continued expression of a functional Sxl protein after repression of the early promoter and activation of the late promoter.
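The effect of including or skipping the stop-codon-containing exon can be sketched by scanning each spliced mRNA for its first in-frame stop codon. The exon sequences below are invented and far shorter than the real sxl exons, exon 1 is omitted, the reading frame is assumed to begin at the first nucleotide of exon 2, and the function name is hypothetical; only the position of the UGA relative to exon 4 is meant to mirror the text.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop(mrna: str, start: int = 0):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical exon sequences; each length is a multiple of 3 so that
# skipping an exon does not shift the reading frame.
EXON2 = "AUGGCUGCA"
EXON3 = "GGUUGACCA"      # carries an in-frame UGA stop codon
EXON4 = "CGUAAAGGUUAA"   # stands in for the region encoding the RNP motif

male_mrna = EXON2 + EXON3 + EXON4   # exon 3 included
female_mrna = EXON2 + EXON4         # exon 3 skipped

male_stop = first_stop(male_mrna)
female_stop = first_stop(female_mrna)
exon4_start_in_male = len(EXON2) + len(EXON3)

print("male stop at", male_stop, "before exon 4:", male_stop < exon4_start_in_male)
print("female stop at", female_stop, "of", len(female_mrna), "nucleotides")
```

In the male pattern, translation terminates within exon 3, so none of exon 4 is translated. In the female pattern the reading frame runs through exon 4 to its own stop codon.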

Sxl protein also regulates expression of the transformer (tra) gene, the second gene to be expressed in the regulatory cascade leading to sexual differentiation in Drosophila. This control involves binding of Sxl protein to the 3' pyrimidine-rich region in the intron between exons 1 and 2 in tra pre-mRNA (Figure 12-31). The bound Sxl blocks binding of U2AF and U2 snRNPs in this region, preventing splicing to the 5' end of exon 2 and favoring splicing to an alternative splice site at the 5' end of exon 3. As a result, female embryos express functional Transformer (Tra) protein. As in the case of the sxl gene, male embryos produce tra mRNAs containing a stop codon; these are translated into a nonfunctional protein. Although Tra protein does not contain an RNA-binding domain, it forms a complex with another protein, Transformer-2 (Tra2), that does. The Tra-Tra2 complex binds to a repeated 13-base sequence in the pre-mRNA synthesized from the double-sex (dsx) gene. In this case, the bound proteins activate an upstream 3' splice site that deviates considerably from the consensus 3' splice-site sequence (see Figure 12-18). This splice site is not used in males, because they produce no functional Tra protein and therefore cannot activate the splice site. Both Tra and Tra2 proteins contain serine-arginine (SR) repeats similar to those in U2AF protein; as noted earlier, U2AF has been shown to interact with the pyrimidine-rich region near 3' splice sites and with several proteins required for RNA splicing. The SR repeats in the Tra-Tra2 complex bound to the dsx pre-mRNA bind to SR repeats in one of the general splicing factors. This interaction promotes assembly of a complex of splicing proteins that activates the upstream 3' splice site, leading to production of a female-specific dsx mRNA, which is polyadenylated at the end of exon 4 (see Figure 12-31). This mRNA is translated into a female-specific Double-sex protein that represses transcription of genes required for male sexual development. In male embryos, which lack Tra protein, the third exon of the dsx transcript is spliced to an alternative exon (exon 5); this is spliced to additional downstream exons, producing a male-specific dsx mRNA. This mRNA encodes an alternative transcription repressor that inhibits female sexual development.

Consequences of Splicing

Splicing has evolutionary implications. Exons often coincide with protein "domains". Domains are parts of the protein with a specific function. Exons can be readily "exchanged" between different genes by recombination. This means that new types of proteins can be formed relatively easily. Splicing also allows a cell to "swap" exons during gene expression. For example, during development, some genes are spliced one way, and then spliced a different way later. Changing the way an mRNA is spliced changes the amino acid sequence in the protein made from it, so cells can in this way "modify" the sequence, and function, of a protein. Splicing thus offers yet another opportunity for regulation of gene expression in eukaryotic cells. (Remember we have already seen how regulation can occur at the level of transcription initiation. Splicing is yet another mechanism for regulating whether or not a specific version of a protein is made, how much of it is made, and when it is made.)

One very good example of exon shuffling can be seen in the tropomyosin gene (Figure 28.34). Tropomyosin is a protein involved in muscle-like contraction in cells. It is present in many different types of cells of the body. Examination of the tropomyosin mRNAs in different cells reveals a strikingly different arrangement of exons. Students should note that "shuffling" of exons, as shown here, has some constraints. Exons that are 3' to another exon are never placed 5' to it after splicing. Another interesting feature of the tropomyosin gene organization is the presence of two different 3' polyadenylation sites.
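The ordering constraint on alternatively spliced exons can be stated as a simple check: the exons retained in an mRNA must appear in the same 5'-to-3' order as in the gene. The sketch below uses a made-up five-exon gene with placeholder names rather than the actual tropomyosin exon layout, and the function name is hypothetical.

```python
def keeps_genomic_order(isoform: list[str], gene_exons: list[str]) -> bool:
    """True if the isoform's exons appear in the same order as in the gene."""
    position = {name: i for i, name in enumerate(gene_exons)}
    indices = [position[exon] for exon in isoform]
    # Strictly increasing indices: no exon is moved 5' of an upstream exon,
    # and no exon is used twice.
    return all(a < b for a, b in zip(indices, indices[1:]))


# Hypothetical gene with placeholder exon names.
GENE = ["E1", "E2", "E3", "E4", "E5"]

print(keeps_genomic_order(["E1", "E3", "E5"], GENE))  # exons skipped, order kept
print(keeps_genomic_order(["E1", "E4", "E2"], GENE))  # E2 placed 3' of E4
```

Skipping exons is compatible with the constraint, whereas rearranging them is not, so the set of possible mRNAs is limited to ordered subsets of the gene's exons.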

As mentioned earlier, about 5 percent of all pre-mRNAs in higher eukaryotes are subject to this kind of regulated splicing (Other Examples of Splicing Patterns). Regulated splicing can lead to the production of different proteins (e.g., male- and female-specific Double-sex proteins) or can control expression of functional proteins by including or excluding a stop codon as in the Drosophila sex-lethal and transformer genes. It is likely that in many other examples of regulated RNA splicing, sequence-specific RNA-binding proteins either inhibit or activate splice sites. During in vitro splicing of pre-mRNA from complex transcription units, differences in the relative concentrations of hnRNP proteins and essential splicing factors also can influence the selection of alternative splice sites. Consequently, it has been suggested that differences in the concentrations of general splicing factors and hnRNP proteins between different cell types also may contribute to cell type-specific splicing of pre-mRNAs from complex transcription units.