Eukaryotic genes are often interrupted by sequences that do not appear in the final RNA. These intervening sequences are called introns, and the process by which they are removed is referred to as splicing. The sequences that remain after splicing are called exons. All of the major types of RNA in a eukaryotic cell can have introns. Although most genes of higher eukaryotes have introns, some do not. Higher eukaryotes tend to have introns in a larger percentage of their genes than lower eukaryotes, and the introns tend to be larger as well. The pattern of intron size and usage roughly follows the evolutionary tree, but this is only a general tendency. The human titin gene has the largest number of exons (178), the longest single exon (17,106 nucleotides), and the longest coding sequence (80,781 nucleotides = 26,927 amino acids). The longest primary transcript, however, is produced by the dystrophin gene (2.4 million nucleotides).
RNA splicing was discovered during analysis
of adenovirus mRNA synthesis. In these studies, the abundant viral
mRNA encoding the major virion capsid protein, called hexon, was
isolated by gel electrophoresis of cytoplasmic polyadenylated
RNA. To map the region of the viral DNA coding for hexon mRNA,
researchers hybridized the isolated mRNA to the coding strand
and the RNA-DNA hybrid was visualized in the electron microscope
(Figure 12-17). Three
loops of single-stranded DNA (A, B, and C) were observed; these
correspond to the three introns in the hexon gene. Since these
intron sequences in the viral genomic DNA are not present in mature
hexon mRNA, they loop out between the exon sequences that hybridize
to their complementary sequences in the mRNA.
Similar analyses of hybrids between RNA isolated from the nuclei
of infected cells and viral DNA revealed RNAs that were colinear
with the viral DNA (primary transcripts) and RNAs with one or
two of the introns removed (processing intermediates). These results,
together with the findings that the 5' cap and 3' poly-A tail
of mRNA precursors are retained in mature cytoplasmic mRNAs,
led to the realization that introns are removed from primary transcripts
as exons are spliced together. For short transcription units,
RNA splicing usually follows cleavage and polyadenylation of the
3' end of the primary transcript. But for long transcription units
containing multiple exons, splicing of exons in the nascent RNA
sometimes begins before transcription of the gene is complete.
The location of splice sites in a pre-mRNA
can be determined by comparing the sequence of genomic DNA with
that of the cDNA prepared from the corresponding mRNA. Sequences
that are present in the genomic DNA but absent from the cDNA represent
introns and indicate the positions of exon-intron boundaries.
Such analysis of a large number of different mRNAs revealed moderately
conserved, short consensus sequences at intron-exon boundaries
in eukaryotic pre-mRNA; in higher organisms, a pyrimidine-rich
region just upstream of the 3' splice site also is common (Figure
12-18). The only universally conserved
nucleotides are the (5')GU and (3')AG in the intron. Deletion
analyses of the center portion of introns in various pre-mRNAs
have shown that generally only 30-40 nucleotides at each end of
an intron are necessary for splicing to occur at normal rates.
Recombinant DNAs containing the 5' exon-intron junction of one
transcription unit (e.g., SV40 late region) and the 3' intron-exon
junction of another (e.g., mouse beta-globin gene) have been prepared and
introduced into cultured cells. Spliced mRNA molecules are formed
in which the two exon sequences are joined and the chimeric intron
is deleted precisely. The formation of correctly spliced mRNAs
in such experiments indicates that the cell's splicing machinery
can recognize and correctly join heterologous 5' and 3' splice
sites.
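The genomic DNA versus cDNA comparison described above can be sketched in a few lines of Python. This is only a toy illustration, not a real spliced aligner: it assumes that the cDNA matches the exons exactly, that the genomic fragment begins at the first exon base, and that every intron starts with GT and ends with AG (the DNA equivalents of the conserved GU and AG). The function name locate_introns and the sequences are invented for the example.

```python
from functools import lru_cache


def locate_introns(genomic, cdna, min_len=4):
    """Return half-open (start, end) genomic intervals absent from the cDNA.

    Returns None if the cDNA cannot be explained by exact exon matches
    separated by GT...AG introns. Recursive, so only suitable for short
    toy sequences.
    """
    genomic, cdna = genomic.upper(), cdna.upper()

    @lru_cache(maxsize=None)
    def walk(g, c):
        # Every cDNA base has been matched: no further introns are needed.
        if c == len(cdna):
            return ()
        if g >= len(genomic):
            return None
        # Option 1: the genomic base is exonic and matches the cDNA.
        if genomic[g] == cdna[c]:
            rest = walk(g + 1, c + 1)
            if rest is not None:
                return rest
        # Option 2: an intron (GT ... AG) starts here, between two exons.
        if c > 0 and genomic.startswith("GT", g):
            end = genomic.find("AG", g + 2)
            while end != -1:
                stop = end + 2
                if stop - g >= min_len:
                    rest = walk(stop, c)
                    if rest is not None:
                        return ((g, stop),) + rest
                end = genomic.find("AG", end + 1)
        return None

    result = walk(0, 0)
    return None if result is None else list(result)


# Hypothetical two-exon gene with one intron.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTACTAACTTTTTTTAG", "GATTGA"
print(locate_introns(exon1 + intron + exon2, exon1 + exon2))  # [(6, 29)]
```

The reported interval (6, 29) covers exactly the 23-nucleotide intron, so its two ends mark the exon-intron and intron-exon boundaries.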
Experiments with cell extracts that accurately
excise introns and join exons at the correct sites in a pre-mRNA
molecule were critical for understanding the mechanism of RNA
splicing. Analysis of the intermediates formed during such in vitro
splicing reactions led to the conclusion that introns are not
cut out as linear molecules (Figure 12-19). Surprisingly,
an intron is removed as a lariat structure in which the 5' G of
the intron is joined in an unusual 2'-5'-phosphodiester bond to
an adenosine near the 3' end of the intron. The adenosine is called
the branch point because it forms an RNA branch in the lariat
structure (Figure of Lariat Structure).
The finding that excised introns have a branched lariat structure
led to the discovery that splicing of exons proceeds via two sequential
transesterification reactions as illustrated in Figure
12-20. In each reaction, one phosphate-ester
bond is exchanged for another. Since the number of phosphate-ester
bonds in the molecule is not changed in either reaction, no energy
is consumed. The net result of these two transesterification reactions
is that two exons are ligated and the intervening intron is released
as a branched lariat structure.
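The bookkeeping behind the statement that the number of phosphate-ester bonds does not change can be followed in a short Python sketch. It treats the pre-mRNA as a string and simply counts bonds before and after each transesterification; the sequence, the function name splice, and the index conventions are assumptions made for the illustration.

```python
def splice(pre_mrna, five_ss, branch, three_ss):
    """Model the two transesterification steps on a toy pre-mRNA.

    five_ss  : index of the first intron nucleotide (the 5' G)
    branch   : index of the branch-point A inside the intron
    three_ss : index of the first nucleotide of the downstream exon
    """
    assert five_ss < branch < three_ss and pre_mrna[branch] == "A"

    # A linear chain of n nucleotides contains n - 1 phosphodiester bonds.
    bonds_start = len(pre_mrna) - 1

    # Step 1: the 2'-OH of the branch-point A attacks the 5' splice site.
    # One 3'-5' bond is broken and one 2'-5' bond is formed.
    exon1 = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]  # intron + exon 2, branched
    bonds_step1 = (len(exon1) - 1) + (len(lariat_intermediate) - 1) + 1

    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site.
    # One 3'-5' bond is broken and one 3'-5' bond is formed.
    exon2 = pre_mrna[three_ss:]
    lariat_intron = pre_mrna[five_ss:three_ss]
    mrna = exon1 + exon2
    bonds_step2 = (len(mrna) - 1) + (len(lariat_intron) - 1) + 1

    return {
        "mrna": mrna,
        "lariat_intron": lariat_intron,
        "bond_counts": (bonds_start, bonds_step1, bonds_step2),
    }


# Hypothetical pre-mRNA: exon 1, an intron with a UACUAAC branch site, exon 2.
e1, ivs, e2 = "AUGGCC", "GUAAGUUACUAACUUUUUUUUAG", "GAUUGA"
branch_index = len(e1) + ivs.index("UACUAAC") + 5
result = splice(e1 + ivs + e2, len(e1), branch_index, len(e1) + len(ivs))
print(result["mrna"])         # AUGGCCGAUUGA
print(result["bond_counts"])  # (34, 34, 34)
```

The three equal counts correspond to the pre-mRNA, the products of the first reaction, and the products of the second reaction.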
Six small U-rich RNAs are abundant in the nuclei
of mammalian cells. Designated U1 through U6, these small nuclear
RNAs (snRNAs) range in size from 107 to 210 nucleotides (Figure
12-21). Even before splicing was
accomplished in vitro, several observations led to the suggestion
that snRNAs assisted in the splicing reaction. First, the short
consensus sequence at the 5' end of introns (CAG|GUAAGU) was found
to be complementary to a sequence near the 5' end of the snRNA
called U1. Second, snRNAs were found associated with hnRNPs in
nuclear extracts.
The snRNAs associate in the nucleus with six to ten proteins
to form small nuclear ribonucleoprotein particles (snRNPs). Some
of these proteins are common to all snRNPs, and some are specific
for individual snRNPs. For unknown reasons, antisera from patients
with the autoimmune disease systemic lupus erythematosus (SLE)
contain antibodies (called anti-Sm antibody) directed against
a protein that is common to the U1, U2, U4, and U5 snRNPs (Figure
12-21b). Antisera from some SLE patients were found to have greater
specificity for one or another of the individual snRNPs. These
specific antisera have been widely used in characterizing components
of the splicing reaction. For example, when antiserum that is
specific for U1 snRNP is added to an in vitro splicing mixture,
the splicing reaction is interrupted, confirming the importance
of the U1 snRNP.
In subsequent studies, a synthetic oligonucleotide that is complementary
to and thus hybridizes with the 5'-end region of U1 snRNA was
found to block splicing, indicating that this is the region that
binds to pre-mRNA during the splicing reaction. Further evidence
for the importance of base pairing between the 5' end of U1 snRNA
and the conserved 5' splice-site sequence of pre-mRNA came from
experiments with genes that were mutated in the 5' splice-site
consensus sequence of an intron. When genes containing these mutations
were transfected into cells, splicing of the corresponding pre-mRNAs
was blocked. However, when a mutant gene was cotransfected with
a mutant U1 snRNA gene containing a compensating sequence change
that restored base pairing with the mutant 5' splice site, splicing
was restored (Figure 12-22). This
result argued strongly that base pairing between the 5' splice
site of a pre-mRNA and the 5' region of U1 snRNA is required for
RNA splicing.
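The logic of the compensatory mutation experiment can be illustrated with a small Python sketch that counts Watson-Crick pairs between a 5' splice site and an snRNA region aligned antiparallel to it. The snRNA region used here is simply computed as the exact reverse complement of the consensus CAGGUAAGU; that is an assumption for the illustration, not a quoted U1 sequence, and the chosen point mutation is likewise hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the RNA strand that pairs perfectly with rna, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def count_pairs(site, snrna_region):
    """Count Watson-Crick pairs between two 5'->3' strands aligned antiparallel.

    G-U wobble pairs are deliberately not counted in this simplified model.
    """
    return sum(PAIR[s] == g for s, g in zip(site, reversed(snrna_region)))


wild_site = "CAGGUAAGU"    # consensus 5' splice site (exon CAG | intron GUAAGU)
mutant_site = "CAGGUCAGU"  # hypothetical A-to-C change in the intron

wild_snrna = reverse_complement(wild_site)             # assumed pairing region
compensating_snrna = reverse_complement(mutant_site)   # compensating change

print(count_pairs(wild_site, wild_snrna))            # 9
print(count_pairs(mutant_site, wild_snrna))          # 8
print(count_pairs(mutant_site, compensating_snrna))  # 9
```

In this simplified picture, the mutation in the splice site removes one base pair and the compensating change in the snRNA restores it, mirroring the loss and recovery of splicing seen in the cotransfection experiment.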
After discovery of the lariat structure of excised introns, a
consensus sequence was recognized in the region flanking the branch
point in pre-mRNAs (see Figure 12-18). In the yeast S. cerevisiae,
virtually all introns have the sequence UACUAAC in the
region of the branch point; the branch-point A is the sixth
nucleotide, immediately 5' of the final C. Except for
the branch-point A, the yeast sequence is complementary to an
internal sequence in U2 snRNA. Compensatory mutation experiments,
similar to those just described with U1 snRNA and 5' splice sites,
demonstrated that base pairing between U2 snRNA and the branch-site
sequence in pre-mRNA is critical to splicing. Significantly, the
branch-point A itself, which is not base-paired to U2 snRNA, "bulges
out," allowing its 2'-hydroxyl to participate in the first
transesterification reaction of RNA splicing (Figure 12-20).
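As a rough illustration of the yeast branch site, the following Python sketch scans an intron for UACUAAC and reports the position of the branch-point A, taking it to be the sixth nucleotide of the consensus. It then removes that bulged A and computes the six-base sequence that would pair with the remaining nucleotides. The pairing partner is derived by complementarity rather than copied from a U2 snRNA sequence, and the intron and function names are invented for the example.

```python
BRANCH_CONSENSUS = "UACUAAC"
BRANCH_OFFSET = 5  # zero-based position of the branch-point A in the consensus
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_branch_points(intron):
    """Return zero-based positions of candidate branch-point adenosines."""
    positions = []
    start = intron.find(BRANCH_CONSENSUS)
    while start != -1:
        positions.append(start + BRANCH_OFFSET)
        start = intron.find(BRANCH_CONSENSUS, start + 1)
    return positions


def pairing_partner(consensus=BRANCH_CONSENSUS, offset=BRANCH_OFFSET):
    """Complement of the consensus with the bulged branch-point A left out."""
    paired = consensus[:offset] + consensus[offset + 1:]
    return "".join(PAIR[base] for base in reversed(paired))


toy_intron = "GUAAGUUACUAACUUUUUUUUAG"  # hypothetical intron
print(find_branch_points(toy_intron))   # [11]
print(pairing_partner())                # GUAGUA
```

The partner sequence is six bases long because the branch-point A is excluded from the duplex.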
These studies with U1 and U2 snRNAs indicate that during splicing
they base-pair with pre-mRNA as shown in Figure 12-23. Additional compensatory mutation experiments demonstrated
that other RNA-RNA interactions also occur during splicing (Figure
12-23b,c). Based on the results of these experiments, identification
of reaction intermediates, and other biochemical analyses, the
spliceosomal splicing model depicted in Figure 12-24 was proposed. According to this model, U1 and U2 snRNAs,
as part of the U1 and U2 snRNPs, base-pair with the 5' splice
site and branch-point regions of an intron, respectively. In yeast,
base pairing between U2 snRNA and the branch site occurs over
six bases (see Figure 12-23a). In higher eukaryotes where the
branch-site sequence is less highly conserved, the association
of U2 snRNP with pre-mRNA is assisted by a protein called U2AF,
which binds to the pyrimidine-rich region near the 3' splice site.
U2AF interacts with RNA through an RNP motif and is thought to
interact with other proteins required for splicing through a domain
containing repeats of the dipeptide serine-arginine (the SR motif).
The snRNAs in the U4 and U6 snRNPs base-pair over an extended
complementary region (see Figure 12-23b); this complex then associates,
presumably via protein-protein interactions, with the previously
formed complex consisting of pre-mRNA base-paired to U1 and U2
snRNPs. The resulting high-molecular-weight (60S) ribonucleoprotein
complex is called a spliceosome (Figure 12-25).
After formation of the spliceosome, extensive rearrangements occur
in the pairing of snRNAs and the pre-mRNA. The U4 and U6 snRNAs
dissociate from each other; U6 snRNA then base-pairs to a sequence
in U2 snRNA that is just 5' to the sequence that interacts with
the branch site in pre-mRNA. U1 is thought to dissociate from
the 5' splice site in the pre-mRNA, after which U5 base-pairs
with exon sequences flanking the splice sites. These rearrangements
yield the interactions shown in Figure 12-23c.
The rearranged spliceosome then catalyzes the two transesterification
reactions that result in RNA splicing. After the second transesterification
reaction, the ligated exons are released from the spliceosome
while the lariat intron remains associated with the snRNPs. This
final intron-snRNP complex is unstable and dissociates. The individual
snRNPs released can participate in a new cycle of splicing. The
excised intron is rapidly degraded by a "debranching enzyme,"
which hydrolyzes the 2'-5' phosphodiester bond at the branch point,
and other nuclear RNases.
It is estimated that at least one hundred proteins are involved
in RNA splicing, making this process comparable in complexity
to protein synthesis and initiation of transcription. Some of
these splicing factors are associated with snRNPs, but others
are not. In yeast, various proteins that are required for pre-mRNA
splicing have been identified by analysis of PRP (precursor
RNA processing) mutants. In these temperature-sensitive mutants,
RNA splicing is blocked at nonpermissive temperatures. In mammalian
systems, purification of nuclear extracts has led to identification
of proteins required for specific steps in the spliceosomal splicing
cycle. Many of the yeast PRP genes have been cloned, as
have some genes encoding mammalian splicing factors. Sequencing
of these genes has revealed that the encoded proteins contain
domains with the RNP motif and the SR motif; some proteins also
exhibit sequence homologies to known RNA helicases. As noted earlier,
RNP domains are involved in binding RNA; SR domains are involved
in protein-protein interactions and also may contribute to binding
RNA. RNA helicases may be necessary for the base-pairing rearrangements
that occur in snRNAs during the spliceosomal splicing cycle, particularly
the dissociation of U4 from U6 and of U1 from the 5' splice site.
The isolation and characterization of genes
required for normal sex determination in Drosophila showed
that three of these genes operate in a cascade of gene regulation
by controlling the splicing of specific primary transcripts. The
first of these genes to be expressed, called sex-lethal
(sxl), is transcribed from an early promoter (PE) that
is active only in very early female embryos (Figure 12-30). The primary sxl transcript is spliced into
an mRNA with two exons, resulting in expression of Sex-lethal
(Sxl) protein in early female embryos. Sex-lethal protein is a
sequence-specific RNA-binding protein that binds to RNA through
an RNP motif at the C-terminal end.
As development of the Drosophila embryo proceeds, transcription
of the sxl gene from PE is repressed, while transcription
from an upstream late promoter (PL) is induced (Figure 12-30b).
Transcription from PL occurs in both males and females and continues
from this time in development onward. The sxl transcript
produced from PL contains four exons; the first one is located
upstream and the second one downstream of the first exon in the
early sxl transcript produced from PE. In both males and
females, the first exon of the PL-derived late transcript is spliced
to the second exon, at a splice site that is not used in processing
the early transcript. Subsequent splicing of PL transcripts differs
in males and females due to the presence of Sex-lethal protein
resulting from transcription from PE in females.
In males, the second exon of the late transcript is spliced to
a third exon, which in turn is spliced to a fourth exon encoding
the RNA-binding RNP motif. The third exon of the resulting mRNA
contains a stop codon (UGA). Consequently, translation of this
mRNA yields a Sxl protein that lacks the RNA-binding domain and
is nonfunctional. In females, the Sxl protein produced from the
early PE transcript binds to specific sequences in the late sxl
pre-mRNA, preventing the splicing of exon 2 to exon 3. Instead,
exon 2 is spliced to exon 4, producing an mRNA in which exon 3
of the male-specific sxl mRNA containing the stop codon
is "skipped." This late female-specific mRNA is translated
into a protein containing the RNP motif that binds specifically
to the sxl transcript. As a result, the late female Sxl
protein, which differs from the early female Sxl protein at the
N-terminus, but has the identical RNP motif at its C-terminus,
also regulates splicing of the late sxl primary transcript
in females. In this way, the late female Sxl protein autoregulates
its own production, ensuring continued expression of a functional
Sxl protein after repression of the early promoter and activation
of the late promoter.
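The effect of including or skipping exon 3 can be made concrete with a toy Python model. The exon sequences below are invented and far shorter than the real sxl exons; the only features they share with the description above are that exon 3 carries an in-frame UGA stop codon and that exon 4 lies downstream of it. The function names are likewise assumptions for the sketch.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical exon sequences; exon 3 contains an in-frame UGA.
EXONS = {1: "AUGGCU", 2: "GAACGU", 3: "CCAUGAGGC", 4: "AAAGGGCUUUAA"}


def splice_late_sxl(sxl_protein_bound):
    """Join exons 1-2-4 when Sxl protein blocks exon 3, else exons 1-2-3-4."""
    order = [1, 2, 4] if sxl_protein_bound else [1, 2, 3, 4]
    return "".join(EXONS[n] for n in order)


def codons_before_stop(mrna):
    """Count codons from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return 0
    count = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


male_mrna = splice_late_sxl(sxl_protein_bound=False)
female_mrna = splice_late_sxl(sxl_protein_bound=True)
print(codons_before_stop(male_mrna))    # 5
print(codons_before_stop(female_mrna))  # 7
```

In the male-type mRNA, translation stops inside exon 3 and never reaches exon 4; in the female-type mRNA, the reading frame continues through exon 4, which in the real gene encodes the RNA-binding RNP motif.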
Sxl protein also regulates expression of the transformer
(tra) gene, the second gene to be expressed in the regulatory
cascade leading to sexual differentiation in Drosophila.
This control involves binding of Sxl protein to the 3' pyrimidine-rich
region in the intron between exons 1 and 2 in tra pre-mRNA
(Figure 12-31). The
bound Sxl blocks binding of U2AF and U2 snRNPs in this region,
preventing splicing to the 5' end of exon 2 and favoring splicing
to an alternative splice site at the 5' end of exon 3. As a result,
female embryos express functional Transformer (Tra) protein. As
in the case of the sxl gene, male embryos produce tra
mRNAs containing a stop codon; these are translated into a nonfunctional
protein. Although Tra protein does not contain an RNA-binding
domain, it forms a complex with another protein, Transformer-2
(Tra2), that does. The Tra-Tra2 complex binds to a repeated 13-base
sequence in the pre-mRNA synthesized from the double-sex
(dsx) gene. In this case, the bound proteins activate an
upstream 3' splice site that deviates considerably from the consensus
3' splice-site sequence (see Figure 12-18). This splice site is
not used in males, because they produce no functional Tra protein
and therefore cannot activate the splice site. Both Tra and Tra2
proteins contain serine-arginine (SR) repeats similar to those
in U2AF protein; as noted earlier, U2AF has been shown to interact
with the pyrimidine-rich region near 3' splice sites and with
several proteins required for RNA splicing. The SR repeats in
the Tra-Tra2 complex bound to the dsx pre-mRNA bind to
SR repeats in one of the general splicing factors. This interaction
promotes assembly of a complex of splicing proteins that activates
the upstream 3' splice site, leading to production of a female-specific
dsx mRNA, which is polyadenylated at the end of exon 4 (see
Figure 12-31). This mRNA is translated into a female-specific
Double-sex protein that represses transcription of genes required
for male sexual development. In male embryos, which lack Tra protein,
the third exon of the dsx transcript is spliced to an
alternative exon (exon 5); this is spliced to additional downstream
exons, producing a male-specific dsx mRNA. This mRNA encodes
an alternative transcription repressor that inhibits female sexual
development.
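The regulatory cascade described above can be summarized as a chain of yes/no decisions. The Python sketch below is a deliberately coarse boolean model: it ignores protein levels and timing, and it assumes that Tra2 is available in both sexes, which the text does not state. The function and key names are invented for the illustration.

```python
def sex_determination_cascade(early_sxl_made):
    """Follow the sxl -> tra -> dsx splicing cascade for one embryo.

    early_sxl_made: True if the early promoter PE produced Sxl protein
    (female embryos), False otherwise (male embryos).
    """
    # Late sxl pre-mRNA: exon 3 (with its stop codon) is skipped only
    # when functional Sxl protein is already present.
    late_sxl_functional = early_sxl_made

    # tra pre-mRNA: Sxl blocks the exon 2 splice site, so the stop codon
    # is avoided only when Sxl is functional.
    tra_functional = late_sxl_functional

    # dsx pre-mRNA: the Tra-Tra2 complex activates the upstream 3' splice
    # site. Tra2 is assumed to be present in both sexes.
    tra2_present = True
    female_site_active = tra_functional and tra2_present
    dsx_isoform = "female-specific" if female_site_active else "male-specific"

    return {
        "late_sxl_functional": late_sxl_functional,
        "tra_functional": tra_functional,
        "dsx_isoform": dsx_isoform,
    }


print(sex_determination_cascade(True)["dsx_isoform"])   # female-specific
print(sex_determination_cascade(False)["dsx_isoform"])  # male-specific
```

The model makes the dependency explicit: a single early event, expression of Sxl from PE, determines the outcome of every later splicing decision in the cascade.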
Splicing has evolutionary implications. Exons
often coincide with protein "domains". Domains are parts
of the protein with a specific function. Exons can be readily
"exchanged" between different genes by recombination.
This means that new types of proteins can be formed relatively
easily. Splicing also allows a cell to
"swap" exons during gene expression. For example, during
development, some genes are spliced one way, and then spliced
a different way later. Changing the way an mRNA is spliced changes
the amino acid sequence in the protein made from it, so cells
can in this way "modify" the sequence, and function,
of a protein. Splicing thus offers yet another opportunity for
regulation of gene expression in eukaryotic cells. (Remember we
have already seen how regulation can occur at the level of transcription
initiation. Splicing is yet another mechanism for regulating whether
or not a specific version of a protein is made, how much of it
is made, and when it is made).
One very good example of exon shuffling can be seen in the tropomyosin
gene (Figure 28.34). Tropomyosin
is a protein involved in muscle-like contraction in cells. It
is present in many different types of cells of the body. Examination
of the tropomyosin mRNAs in different cells reveals a strikingly
different arrangement of exons. Students should note that "shuffling"
of exons, as shown here, has some constraints. Exons that are
3' to another exon are never placed 5' to it after splicing. Another
interesting feature of the tropomyosin gene organization is the
presence of two different 3' polyadenylation sites.
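The ordering constraint on exon "shuffling" can be stated compactly in code. The Python sketch below numbers exons by their 5'-to-3' position in the gene, checks whether a proposed mRNA respects that order, and enumerates every order-preserving combination for a small hypothetical gene. Real genes are far more restricted (for example, by which exons are constitutive or mutually exclusive), so the enumeration is only an upper bound under the single rule given above.

```python
from itertools import combinations


def is_order_preserving(isoform):
    """True if exons appear in strictly increasing 5'-to-3' genomic order."""
    return all(a < b for a, b in zip(isoform, isoform[1:]))


def order_preserving_isoforms(n_exons):
    """All non-empty exon combinations that keep genomic order."""
    exons = range(1, n_exons + 1)
    return [
        list(combo)
        for size in range(1, n_exons + 1)
        for combo in combinations(exons, size)
    ]


print(is_order_preserving([1, 3, 4]))     # True: exon 2 is skipped
print(is_order_preserving([1, 4, 3]))     # False: exon 4 placed 5' of exon 3
print(len(order_preserving_isoforms(4)))  # 15
```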
As mentioned earlier, about 5 percent of all pre-mRNAs in higher eukaryotes are subject to this kind of regulated splicing (Other Examples of Splicing Patterns). Regulated splicing can lead to the production of different proteins (e.g., male- and female-specific Double-sex proteins) or can control expression of functional proteins by including or excluding a stop codon, as in the Drosophila sex-lethal and transformer genes. It is likely that in many other examples of regulated RNA splicing, sequence-specific RNA-binding proteins either inhibit or activate splice sites. During in vitro splicing of pre-mRNA from complex transcription units, differences in the relative concentrations of hnRNP proteins and essential splicing factors also can influence the selection of alternative splice sites. Consequently, it has been suggested that differences in the concentrations of general splicing factors and hnRNP proteins between different cell types also may contribute to cell type-specific splicing of pre-mRNAs from complex transcription units.