Lecture 9

Eukaryotic RNA Processing

As mentioned in the last lecture, higher cells (eukaryotic cells) have three types of RNA polymerase (called RNA polymerase I, RNA polymerase II, and RNA polymerase III). In eukaryotes, each type of RNA is made by its own polymerase:

RNA polymerase I makes rRNA
RNA polymerase II makes mRNA
RNA polymerase III makes tRNA

Moreover, RNAs are made in the nucleus of a higher cell, but function in protein synthesis in the cytoplasm. Unlike prokaryotic mRNAs, eukaryotic mRNAs are first synthesized by RNA polymerase II as larger precursor RNAs (pre-mRNAs) and then undergo extensive modification. These changes include capping, polyadenylation, and splicing.


Modification of the 5'-ends of eukaryotic mRNAs is called capping. The cap consists of a methylated guanine nucleotide (7-methylguanosine) linked to the rest of the mRNA by a 5'-to-5' triphosphate "bridge" (FIGURE). Capping occurs very early during the synthesis of eukaryotic mRNAs, even before RNA polymerase II has finished making the mRNA molecule. Capped mRNAs are very efficiently translated by ribosomes to make proteins. In fact, some viruses, such as poliovirus, prevent capped cellular mRNAs from being translated into proteins. This enables poliovirus to take over the protein synthesizing machinery in the infected cell to make new viruses.


Modification of the 3'-ends of eukaryotic mRNAs is called polyadenylation (FIGURE). Polyadenylation is the addition of several hundred A nucleotides to the 3' ends of mRNAs. All eukaryotic mRNAs destined to get a poly(A) tail (note: most, but not all, eukaryotic mRNAs get such a tail) contain the sequence AAUAAA about 11-30 nucleotides upstream of the site where the tail is added. AAUAAA is recognized by an endonuclease that cuts the RNA, allowing the tail to be added by a specific enzyme: poly(A) polymerase.
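The cut-then-add-tail logic described above can be sketched in a few lines of Python. This is only an illustration, not a model of the real enzymes: the function name, the default cut offset of 20 nucleotides (chosen from within the 11-30 range), the short default tail, the choice to measure the offset from the end of the signal, and the use of the first AAUAAA found are all assumptions made for the example.

```python
def polyadenylate(pre_mrna, cut_offset=20, tail_length=200):
    """Cut downstream of the first AAUAAA and append a run of A's.

    cut_offset and tail_length are illustrative assumptions; the notes
    only say the cut falls about 11-30 nucleotides past the signal and
    that several hundred A's are added.
    """
    signal = "AAUAAA"
    pos = pre_mrna.find(signal)
    if pos == -1:
        # No signal: this RNA is treated as one that gets no tail.
        return pre_mrna
    # Assumption: the offset is counted from the end of the signal.
    cut_site = min(pos + len(signal) + cut_offset, len(pre_mrna))
    # Everything 3' of the cut is discarded, then the tail is added.
    return pre_mrna[:cut_site] + "A" * tail_length
```

For example, calling `polyadenylate(rna, cut_offset=15, tail_length=250)` would cut 15 nucleotides past the signal and add 250 A's.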


Eukaryotic genes are often interrupted by sequences that do not appear in the final RNA. The intervening sequences that are removed are called "introns". The process by which introns are removed is referred to as "splicing" (see FIGURE). The sequences remaining after splicing are called "exons." All of the different major types of RNA in a eukaryotic cell can have introns. Although most higher eukaryotic genes have introns, some do not. Higher eukaryotes tend to have a larger percentage of their genes containing introns than lower eukaryotes, and the introns tend to be larger as well. The pattern of intron size and usage roughly follows the evolutionary tree, but this is only a general tendency. The human titin gene has the largest number of exons (178), the longest single exon (17,106 nucleotides) and the longest coding sequence (80,781 nucleotides = 26,927 amino acids). The longest primary transcript, however, is produced by the dystrophin gene (2.4 million nucleotides).
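The amino acid count quoted for titin follows from the fact that each amino acid is specified by a codon of three nucleotides. A minimal arithmetic check (the function name is made up for this example):

```python
def codons_in(coding_nucleotides):
    """Return (whole codons, leftover nucleotides) for a coding length."""
    return divmod(coding_nucleotides, 3)

# Titin coding sequence length quoted in the notes.
codons, leftover = codons_in(80_781)
print(codons, leftover)
```

A leftover of zero means the quoted length is an exact multiple of three, as a coding sequence should be.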

The location of splice sites in a pre-mRNA can be determined by comparing the sequence of genomic DNA with that of the cDNA prepared from the corresponding mRNA. Sequences that are present in the genomic DNA but absent from the cDNA represent introns and indicate the positions of exon-intron boundaries. Such analysis of a large number of different mRNAs revealed moderately conserved, short consensus sequences at intron-exon boundaries in eukaryotic pre-mRNA; in higher organisms, a pyrimidine-rich region just upstream of the 3' splice site is also common (FIGURE). The only universally conserved nucleotides are the (5')GU and AG(3') at the 5' and 3' ends of the intron and the conserved branch point A.
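The genomic-versus-cDNA comparison can be illustrated with a toy search. The sketch below is not how real alignment programs work. It assumes the cDNA matches the exons exactly, that the genomic sequence starts at the first exon and ends at the last one, and that every intron begins with GT and ends with AG (the DNA equivalents of GU and AG). The function name is hypothetical, and the recursion is only suitable for short example sequences.

```python
def find_introns(genomic, cdna):
    """Return intron spans as (start, end) pairs, end exclusive, or None.

    Toy backtracking search: walk along both sequences together, and
    wherever the genomic DNA could start an intron (GT ... AG), try
    skipping it and continuing the match.
    """
    def search(g, c):
        if c == len(cdna):
            # All of the cDNA is explained; genomic must be used up too.
            return [] if g == len(genomic) else None
        if g >= len(genomic):
            return None
        # Option 1: this genomic base belongs to an exon.
        if genomic[g] == cdna[c]:
            rest = search(g + 1, c + 1)
            if rest is not None:
                return rest
        # Option 2: an intron starts here and ends at some later AG.
        if c > 0 and genomic.startswith("GT", g):
            end = genomic.find("AG", g + 2)
            while end != -1:
                rest = search(end + 2, c)
                if rest is not None:
                    return [(g, end + 2)] + rest
                end = genomic.find("AG", end + 1)
        return None

    return search(0, 0)
```

Each returned pair marks a stretch that is present in the genomic DNA but absent from the cDNA, so `genomic[start:end]` gives the intron sequence itself.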

An intron is removed as a lariat structure in which the 5' G of the intron is joined in an unusual 2'-5'-phosphodiester bond to an adenosine near the 3' end of the intron. The adenosine is called the branch point because it forms an RNA branch in the lariat structure (FIGURE).

Remember that transcription occurs in the nucleus of eukaryotic cells. Splicing occurs in the nucleus before mRNA is exported to the cytoplasm. Splicing involves both protein and RNA factors in the nucleus. The RNA factors are called snRNAs (small nuclear RNAs). snRNAs are complexed with proteins to form what are called snRNPs (small nuclear ribonucleoprotein particles, or "snurps" as they are called). snRNPs are components of the structures that actually do the splicing. These structures are called "spliceosomes" and they include snRNPs, other proteins, and the RNA being spliced. Each snRNP usually contains a single snRNA and multiple (5-15) proteins. snRNAs involved in splicing appear to be essential (not surprisingly) to the cell. Mutations in the genes encoding them are always lethal. There are four major classes of snRNPs, named according to the snRNA they contain. These are called U1, U2, U5, and U4/U6 (the last one is the only snRNP containing two snRNAs, U4 and U6). snRNAs have distinctive secondary structures (shapes) and can form specific base pairs with precursor RNA molecules. For example, one part of U1 can form base pairs with the 5' end of the intron (called the donor site) and one part of U2 can form base pairs with sequences at the branch point (see FIGURE).

In the process of splicing, U1 is the first snRNP to bind (see FIGURE). Its binding helps the second snRNP, U2, to bind. U2 binds within the intron at the branch point, a special sequence near the 3' end of the intron to which the 5' end of the intron will ultimately become attached, forming a structure described as a lariat. After U2 binds, the other snRNPs (U4/U6 and U5) bind. At this time, U1 is released and the snRNPs shift positions. Finally, the 3' end of one exon is covalently attached to the 5' end of the next, and the intron sequence is removed in the form of a lariat.
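The outcome of these steps (two exons joined, intron released as a lariat) can be sketched as a simple string operation. The snRNPs themselves are not modeled. The function name, the way positions are passed in, and the "loop"/"tail" representation of the lariat are all assumptions made for illustration.

```python
def splice(pre_mrna, donor, branch, acceptor_end):
    """Join two exons and return (mature_rna, lariat).

    donor:        index of the first intron nucleotide (the G of GU)
    branch:       index of the branch-point A inside the intron
    acceptor_end: index just past the last intron nucleotide (the G of AG)
    """
    intron = pre_mrna[donor:acceptor_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must begin with GU and end with AG")
    if not (donor < branch < acceptor_end - 2) or pre_mrna[branch] != "A":
        raise ValueError("branch point must be an A inside the intron")

    # The 5' G of the intron is attached to the branch-point A, so the
    # stretch from the 5' G through the A closes into the lariat loop.
    loop = pre_mrna[donor:branch + 1]
    # The rest of the intron, 3' of the branch point, hangs off as a tail.
    tail = pre_mrna[branch + 1:acceptor_end]

    # The 3' end of the upstream exon is joined to the 5' end of the next.
    mature = pre_mrna[:donor] + pre_mrna[acceptor_end:]
    return mature, {"loop": loop, "tail": tail}
```

The loop and tail together always add up to the complete intron, which is a quick sanity check that nothing is lost or duplicated.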

At the 5' end of the intron (donor site) the only strongly conserved sequence is GU.
At the 3' end of the intron (acceptor site) the only strongly conserved sequence is AG.
There are many GU's and AG's in RNAs that are not used in splicing, and it is not completely clear why they are passed over.
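To get a feel for how common unused GU's and AG's are, one can simply count them. With four equally likely bases, any given dinucleotide is expected about once in every 16 positions, so even a short RNA contains many candidate pairs. The sketch below is a rough illustration only; the function name is hypothetical, and it counts every GU followed anywhere downstream by a non-overlapping AG, with no other rules applied.

```python
def count_candidate_sites(rna):
    """Return (number of GU's, number of AG's, number of GU...AG pairs)."""
    donors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    acceptors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "AG"]
    # A pair counts if the AG starts at or after the end of the GU,
    # i.e. the two dinucleotides do not overlap.
    pairs = sum(1 for d in donors for a in acceptors if a >= d + 2)
    return len(donors), len(acceptors), pairs
```

Running this on any real mRNA sequence gives far more candidate pairs than there are introns, which is the point made above.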

Evolutionary implications of splicing

Exons often coincide with protein "domains". Domains are parts of the protein with a specific function. Exons can be readily "exchanged" between different genes by recombination. This means that new types of proteins can be formed relatively easily.

Splicing also allows a cell to "swap" exons during gene expression. For example, during development, some genes are spliced one way, and then spliced a different way later. Changing the way an mRNA is spliced changes the amino acid sequence in the protein made from it, so cells can in this way "modify" the sequence, and function, of a protein. Splicing thus offers yet another opportunity for regulation of gene expression in eukaryotic cells. Splicing can regulate whether or not a specific version of a protein is made, how much of it is made, and when it is made.

One very good example of exon "shuffling" during splicing (usually called alternative splicing) can be seen in the tropomyosin gene (see FIGURE). Tropomyosin is a protein involved in muscle-like contraction in cells. It is present in many different types of cells of the body. Examination of the tropomyosin mRNAs in different cells reveals a strikingly different arrangement of exons. Students should note that "shuffling" of exons, as shown here, has some constraints. Exons that are 3' to another exon are never placed 5' to it after splicing. Another interesting feature of the tropomyosin gene organization is the presence of two different 3' polyadenylation sites.
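The ordering constraint can be made concrete with a small enumeration. The sketch below lists every way of including or skipping exons while keeping them in their original 5'-to-3' order. The exon names E1 to E4 and the idea of marking some exons as always included are hypothetical and do not represent the actual tropomyosin gene layout.

```python
from itertools import combinations

def ordered_isoforms(exons, constitutive=()):
    """List exon combinations that keep the original 5'-to-3' order.

    exons:        exon names in genomic order (hypothetical labels)
    constitutive: exons assumed to be present in every mRNA
    """
    isoforms = []
    for k in range(1, len(exons) + 1):
        # combinations() never reorders items, so an exon that lies 3'
        # of another in the gene can never end up 5' of it.
        for combo in combinations(exons, k):
            if all(e in combo for e in constitutive):
                isoforms.append(combo)
    return isoforms

for isoform in ordered_isoforms(["E1", "E2", "E3", "E4"], constitutive=("E1", "E4")):
    print("-".join(isoform))
```

Real cells use only a small, regulated subset of these combinations, but none of the mRNAs they make violates the ordering rule.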