Transcription Notes

The process of the synthesis of RNA in cells is called transcription. Transcription is a process where information in DNA is assembled into RNA using complementarity similar to that used in making double-stranded DNA. Mechanistically, transcription is similar to DNA replication, particularly in the use of nucleoside triphosphate substrates and the template-directed growth of nucleic acid chains in a 5' to 3' direction. The first nucleotide of the RNA chain retains the 5'-triphosphate group, but all subsequent nucleotides that are added to the growing chain only retain the alpha phosphate in the phosphodiester linkage. In a differentiated eukaryotic cell, very little of the total DNA is transcribed. Even in single-celled organisms, in which virtually all of the DNA sequences can be transcribed, far fewer than half of all genes may be transcribed at any time.

The mechanisms used to select particular genes and template strands for transcription operate largely at the levels of initiation and termination of transcription, through the actions of proteins that contact DNA in a highly site-specific manner. RNAs are modified ('processed') after synthesis in many cases, particularly in eukaryotic organisms.


Three major types of RNA are found in cells--ribosomal RNA (rRNA), transfer RNA (tRNA), and messenger RNA (mRNA). The major RNA types function in ribosome structure/function, translating the genetic code, and carrying the message to be translated, respectively. mRNA is a small percentage of the bulk of total cellular RNA (1% to 3% in bacteria). A fourth type of RNA called snRNA is present in eukaryotic cells. It functions to aid in the splicing of RNAs (see below).

RNAs in E. coli

Type            Abbreviation   Function                       Size in nucleotides
Transfer RNA    tRNA           Carries activated amino acid   about 75
Ribosomal RNA   5S rRNA        Ribosome component
                16S rRNA       Ribosome component
                23S rRNA       Ribosome component
Messenger RNA   mRNA           Codes for proteins


Bacterial genes are organized in clusters under common regulation. These clusters are called operons and are controlled by binding of proteins to specific sequences in the DNA called regulatory elements.

RNA Polymerases

RNA polymerase is an enzyme that makes RNA, using DNA as a template. RNA polymerase uses the nucleoside triphosphates, ATP, GTP, CTP, and UTP (uridine triphosphate) to make RNA. The nucleoside bases adenine, guanine, cytosine and uracil pair with the bases thymine, cytosine, guanine, and adenine, respectively, in DNA to make RNA. Like DNA polymerase, RNA polymerases catalyze polymerization of nucleotides only in the 5' to 3' direction. Unlike DNA polymerases, however, RNA polymerases do not require a primer to initiate synthesis.
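The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function name transcribe and the template sequence are made up for the example. The template strand is written 3' to 5', so the RNA comes out written 5' to 3'.

```python
# Base added to the RNA opposite each base of the DNA template strand.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3to5.upper())

template = "TACGGATTC"      # hypothetical template strand, written 3'->5'
rna = transcribe(template)
print(rna)                  # AUGCCUAAG
```

Note that the RNA has the same sequence as the nontranscribed (sense) DNA strand, with U in place of T.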

Like DNA replication and (as we shall see) protein synthesis, transcription occurs in three distinct phases: initiation, elongation, and termination. Initiation and termination signals in the DNA sequence punctuate the genetic message by directing RNA polymerase to specific genes and by specifying where transcription will start, where it will stop, and which strand will be transcribed. The signals involve instructions encoded in DNA base sequences mediated by interactions between DNA and proteins. In prokaryotes, RNA polymerase finds DNA control sequences called promoters, binds to them, unwinds a short stretch of DNA, and begins polymerization of RNA without the need for a primer to start the process. It also detects (by itself or with the assistance of other proteins) the termination sequence to stop transcription. Proteins called transcription factors that bind at or near prokaryotic (and eukaryotic) promoters interact with RNA polymerase. Some transcription factors act positively (activators) to assist RNA polymerase in beginning RNA synthesis at a particular site, whereas others act negatively (repressors) to inhibit the ability of RNA polymerase to act at that site.

A single RNA polymerase catalyzes the synthesis of all three E. coli RNA classes--mRNA, rRNA, and tRNA. This was shown in experiments with rifampicin, an antibiotic that inhibits prokaryotic RNA polymerase in vitro and blocks the synthesis of mRNA, rRNA, and tRNA in vivo. Another antibiotic that inhibits transcription is actinomycin D, which acts by binding specifically to double-stranded DNA and preventing it from acting as a template for transcription. Eukaryotes contain three distinct RNA polymerases, one each for the synthesis of the three larger rRNAs, mRNA, and small RNAs (tRNA plus the 5S species of rRNA). These are called RNA polymerases I, II, and III, respectively. The eukaryotic RNA polymerases differ in their sensitivity to inhibition by α-amanitin, a toxin from the poisonous Amanita mushroom. RNA polymerase II is inhibited at low concentrations, RNA polymerase III is inhibited at high concentrations, and RNA polymerase I is quite resistant.

The maximum rate of polymerization by the DNA polymerase III holoenzyme (about 500 to 1000 nucleotides per second) is much higher than the chain growth rate for bacterial transcription by RNA polymerase (about 50 nucleotides per second). Although there are only about 10 molecules of DNA polymerase III per E. coli cell, there are some 3000 molecules of RNA polymerase, of which half might be involved in transcription at any one time.

Replicative DNA chain growth is rapid but occurs at few sites, whereas transcription is much slower, but occurs at many sites. The result is that far more RNA accumulates in the cell than DNA. Like the DNA polymerase III holoenzyme, the action of RNA polymerase is highly processive. Once transcription of a gene has been initiated, RNA polymerase rarely, if ever, dissociates from the template until the specific signal to terminate has been reached.
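The numbers quoted above can be combined into a rough back-of-the-envelope comparison. The sketch below assumes, generously, that all 10 DNA polymerase III molecules are working at their maximum rate at the same time, and that exactly half of the RNA polymerases are active; both are simplifications made for the example.

```python
DNA_POL_III_PER_CELL = 10
DNA_POL_MAX_RATE = 1000        # nucleotides per second, upper end of the quoted range
RNA_POL_PER_CELL = 3000
RNA_POL_RATE = 50              # nucleotides per second

# Upper bound on total DNA synthesis (assumes every DNA pol III is active).
dna_max = DNA_POL_III_PER_CELL * DNA_POL_MAX_RATE

# Total RNA synthesis with half of the RNA polymerases transcribing.
active_rna_pol = RNA_POL_PER_CELL // 2
rna_total = active_rna_pol * RNA_POL_RATE

print(dna_max)                 # 10000 nucleotides per second
print(rna_total)               # 75000 nucleotides per second
```

Even with this generous estimate for replication, the cell polymerizes several times more nucleotides into RNA than into DNA each second.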

Another important difference between DNA and RNA polymerases is the accuracy with which a template is copied. With an error rate of about 10^-5, RNA polymerase is far less accurate than replicative DNA polymerase holoenzymes, although RNA polymerase is much more accurate than would be predicted from Watson-Crick base pairing alone. Recent observations suggest the existence of error-correction mechanisms. In E. coli, two proteins, called GreA and GreB, catalyze the hydrolytic cleavage of nucleotides at the 3' ends of nascent RNA molecules. These processes may be akin to 3' exonucleolytic proofreading by DNA polymerases.
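To get a feel for what an error rate of about 10^-5 means for a single transcript, here is a small calculation. It assumes that errors occur independently at each nucleotide, and the transcript length of 1000 nucleotides is a hypothetical value chosen for the example.

```python
ERROR_RATE = 1e-5            # misincorporations per nucleotide, from the text
transcript_length = 1000     # hypothetical transcript length in nucleotides

# Average number of errors expected in one transcript.
expected_errors = ERROR_RATE * transcript_length

# Probability that a transcript contains no errors at all
# (assumes each position is copied independently).
error_free_fraction = (1 - ERROR_RATE) ** transcript_length

print(f"expected errors per transcript: {expected_errors:.3f}")
print(f"fraction of error-free transcripts: {error_free_fraction:.4f}")
```

Under these assumptions roughly 99 out of 100 transcripts of this length are copied without a mistake, and a faulty RNA is only one of many copies made from the gene.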

E. coli RNA Polymerase Subunits

E. coli RNA polymerase is a multi-subunit protein. Two copies of the α subunit are present, along with one each of β, β', and σ, giving an Mr of about 450,000 for the holoenzyme. The σ subunit helps the polymerase to identify the promoter sequence in the DNA during initiation of RNA synthesis and then dissociates. The core enzyme consists of the remaining subunits (α2ββ'). Subunit β is the target for rifampicin inhibition and is the subunit with the catalytic site for chain elongation. The catalytic site of RNA polymerase resembles that of DNA polymerase, having two metal ions.

Promoters and Promoter Selection

In E. coli, rates of transcription initiation vary enormously--from about one initiation every 10 seconds for some genes to as infrequently as once per generation (30 to 60 minutes) for others. Because all genes in bacteria are transcribed by the same core enzyme, variations in promoter structure must be largely responsible for the great variation in the frequency of initiation. Variations in promoter structure represent a simple way for the cell to vary rates of transcription from different genes.

By analyzing DNA sequences ahead of genes, it is possible to identify common sequence features of promoter regions. For instance, near position -10 (position +1 is the start site of transcription), a common sequence motif is present in E. coli that is close to (or exactly) the sequence TATAAT on the sense strand (nontranscribed DNA strand). Another region of conserved nucleotide sequence is centered at nucleotide -35, with a consensus sequence of TTGACA. In general, the more closely these regions in a promoter resemble the consensus sequences, the more efficient that promoter is in initiating transcription.
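One crude way to express "resemblance to the consensus" is to count matching positions in the -35 and -10 hexamers. The sketch below does just that. The function names and the second pair of hexamers are made up for the example, and real promoter strength depends on more than this count (for instance, the spacing between the two elements).

```python
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"

def matches(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))

def promoter_score(minus_35: str, minus_10: str) -> int:
    """Total matches to both consensus elements (maximum 12)."""
    return matches(minus_35, CONSENSUS_MINUS_35) + matches(minus_10, CONSENSUS_MINUS_10)

print(promoter_score("TTGACA", "TATAAT"))   # 12, a perfect consensus promoter
print(promoter_score("TTTACA", "TATGTT"))   # 9, a hypothetical weaker promoter
```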

The σ subunit plays an important role in directing E. coli's RNA polymerase to bind to the template at the proper site for initiation--the promoter site--and to select the correct strand for transcription. The addition of σ to core polymerase reduces the affinity of the enzyme for nonpromoter sites by a factor of about 10^4, thereby increasing the enzyme's specificity for binding to promoters. In at least some cases, gene expression is regulated by having core polymerase interact with different forms of σ, which in turn direct the holoenzyme to different promoters. For example, the σ subunit with a mass of 70 kilodaltons recognizes general promoter sequences (called consensus sequences), but when the temperature rises, a 32-kilodalton σ subunit is synthesized. It directs the RNA polymerase to the promoters of genes that deal with heat shock. The promoters in both cases are in the region of DNA about 10 base pairs ahead of the starting point of polymerization by the RNA polymerase (called the transcriptional start point). These sequences are also called the -10 sequences. Other σ factors allow E. coli to respond to other environmental changes.

Promoter/Polymerase Complexes

The first step in transcription is binding of RNA polymerase to DNA, followed by migration to the promoter.

1. RNA polymerase finds promoters by a search process, in which the holoenzyme binds nonspecifically to DNA, with low affinity, and then slides along the DNA, without dissociating from it, until it reaches a promoter sequence, to which it binds with much higher affinity. The σ factor is essential for this search, because the core enzyme does not bind to promoters more tightly than to nonpromoter sites. Binding to DNA and then moving along it reduces the complexity of the search for the promoter from three dimensions to one, just as finding a house becomes simpler once you find the street upon which that house is located.

2. The initial encounter between RNA polymerase holoenzyme and a promoter generates a closed-promoter complex. Whereas DNA strands unwind later in transcription, no unwinding is detectable in a closed-promoter complex. Footprinting studies show that polymerase contacts DNA from about nucleotide -55 to -5, where +1 represents the first DNA nucleotide to be transcribed.

3. RNA polymerase unwinds about 17 base pairs of DNA, giving an open-promoter complex, so called because it binds DNA whose strands are open, or unwound. This highly temperature-dependent reaction occurs with half-times of about 15 seconds to 20 minutes, depending upon the structure of the promoter. A Mg2+-dependent isomerization next occurs, giving a modified form of the open-promoter complex with the unwound DNA region extending from -12 to +2. It is worth noting that negative supercoiling is consistent with unwinding of DNA. It should come as no surprise, therefore, that enzymes such as topoisomerase II, which introduces negative supercoils in DNA, favor transcription of many genes. Interestingly, one of the genes INHIBITED by the action of topoisomerase II is the topoisomerase II gene itself. Thus, this serves as an autoregulatory system.


After RNA polymerase has bound to a promoter and formed an open-promoter complex, the enzyme is ready to initiate synthesis of an RNA chain. One nucleoside triphosphate binding site on RNA polymerase is used during elongation. It binds any of the four common ribonucleoside triphosphates (rNTPs). Another binding site is used for initiation. It binds ATP and GTP preferentially. Thus, most mRNAs have a purine at the 5' end.

1. Chain growth begins with binding of the template-specified rNTP at the initiation-specific site of RNA polymerase.

2. The next nucleotide binds at the elongation-specific site.

3. Nucleophilic attack by the 3' hydroxyl of the first nucleotide on the α (inner) phosphorus of the second nucleotide generates the first phosphodiester bond and leaves an intact triphosphate moiety at the 5' position of the first nucleotide.

Most initiations are abortive, with release of oligonucleotides 2 to 9 residues long. It is not yet clear why this happens.

During transcription of the first 10 nucleotides, the σ subunit dissociates from the transcription complex, and the remainder of the transcription process is catalyzed by the core polymerase. Once σ has dissociated, the elongation complex becomes quite stable. Transcription, as studied in vitro, can no longer be inhibited by adding rifampicin after this point, and virtually all transcription events proceed to completion.

Unwinding and Rewinding of DNA

During elongation, the core enzyme moves along the duplex DNA template and simultaneously unwinds the DNA, exposing a single-strand template for base pairing with incoming nucleotides and with the nascent transcript (the most recently synthesized RNA). It also rewinds the template behind the 3' end of the growing RNA chain. About 18 base pairs of DNA are unwound to form a moving "transcription bubble." As one base pair becomes unwound in advance of the 3' end of the nascent RNA strand, one base pair becomes rewound near the trailing end of the RNA polymerase molecule. Only about 8 base pairs at the 3' end of the nascent transcript are hybridized to the template DNA strand; beyond that point, a structure within RNA polymerase separates the RNA from the DNA.

Irregular movement

(This section is not described in your book) RNA polymerase often advances through DNA discontinuously, holding its position for several cycles of nucleotide addition and then jumping forward by several base pairs along the template. RNA polymerase "pauses" when it reaches DNA sequences that are difficult to transcribe in vitro, often sitting at the same site for several minutes before transcription is resumed. At such sites, RNA polymerase often translocates backward, and in the process the 3' end of the nascent transcript is displaced from the catalytic site of the enzyme. When this happens, a 3' "tail" is created, which may be several nucleotides long and is not base-paired to the template, protruding downstream of the enzyme. In order for transcription to resume, the 3' end of the RNA must be positioned in the active site of the RNA polymerase. This is evidently the main function of the RNA 3' cleavage reactions catalyzed by the GreA and GreB proteins, which have been shown to stimulate a transcript cleavage activity intrinsic to the polymerase. These observations suggest that RNA polymerase generally moves forward until one of these special sequences is reached or, perhaps, until a transcription insertion error generates a DNA-RNA mispairing that weakens the hybrid and allows backtracking.

Overwinding in front of the transcription bubble (which puts in positive supercoils) is removed by the action of gyrase (also known as topoisomerase II, which puts in negative supercoils). Likewise, topoisomerase I (which relaxes negative supercoils) eliminates the underwinding (negative supercoils) behind the transcription bubble.

Transcription Termination

In bacteria, two distinct types of termination events have been identified: those that depend on the action of a protein termination factor, called ρ (rho), and those that are factor-independent.

Factor-independent termination - Sequencing the 3' ends of genes that terminate in a factor-independent manner reveals the following two structural features shared by many such genes:

1. Two symmetrical GC-rich segments in the transcript have the potential to form a stem-loop structure.

2. A downstream run of four to eight A residues.

These features suggest the following as elements of the termination mechanism:

1. RNA polymerase slows down, or pauses, when it reaches the first GC-rich segment, because the stability of G-C base pairs makes the template hard to unwind. In vitro, RNA polymerase does pause for several minutes at a GC-rich segment.

2. The pausing gives time for the complementary GC-rich parts of the nascent transcript to base-pair with one another. In the process, the downstream GC-rich segment of the transcript is displaced from its template. Hence, the complex of RNA polymerase, DNA template, and RNA is weakened. Further weakening, leading to dissociation, occurs when the A-rich segment is transcribed to give a series of AU bonds (which are very weak), linking transcript to template.

The actual mechanism of termination is more complex than just described, in part because DNA sequences both upstream and downstream of the sequence also influence termination efficiency. Moreover, not all pause sites are termination sites.
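The two structural features listed earlier (a GC-rich stem-loop followed by a run of A residues in the template, which appear as U residues in the transcript) can be searched for with a very simple scan. This is a toy sketch only: the function names, the stem and loop sizes, the GC threshold, and the test sequence are all assumptions made for the example, and as noted above, real termination efficiency depends on more than these features.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with rna to form a perfect stem."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))

def find_terminators(rna, stem=6, loop_sizes=range(3, 9), min_gc=0.6, min_u_run=4):
    """Return (stem_start, hairpin_end) for each GC-rich hairpin followed by a U run."""
    hits = []
    for start in range(len(rna) - 2 * stem):
        left = rna[start:start + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < min_gc:
            continue
        for loop in loop_sizes:
            right_start = start + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + min_u_run]
            # Perfect stem, immediately followed by the required run of U residues.
            if right == reverse_complement(left) and tail == "U" * min_u_run:
                hits.append((start, right_start + stem))
    return hits

# Hypothetical 3' end of a transcript: stem, 4-nucleotide loop, stem, U run.
transcript = "AAUCC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUAG"
print(find_terminators(transcript))   # [(5, 21)]
```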

Factor-dependent termination - Factor-dependent termination sites are less frequent than factor-independent termination sites, and the mechanism of factor-dependent termination is complex. The ρ protein, a hexamer composed of identical subunits, has been characterized as an RNA-DNA helicase and contains a nucleoside triphosphatase activity that is activated by binding to polynucleotides. Apparently ρ acts by binding to the nascent transcript at a specific site near the 3' end, when RNA polymerase has paused. Then ρ moves along the transcript toward the 3' end, with the helicase activity unwinding the 3' end of the transcript from the template (and/or the RNA polymerase molecule), thus causing it to be released.

Involvement of NusA protein - It is not clear what causes RNA polymerase to pause at ρ-dependent termination sites. However, the action of another protein, NusA, is somehow involved. The NusA protein evidently associates with RNA polymerase, and there is reason to believe that it binds at some point in transcription after the σ factor has dissociated, because the two purified proteins compete with each other for binding to core RNA polymerase.


Further insight into termination mechanisms has come from an extensively studied regulatory mechanism called attenuation (not described in your book). Attenuation occurs in some bacterial operons undergoing simultaneous transcription/translation. Pausing of the translational machinery in the process affects the stability of the RNA/DNA hybrid in the RNA polymerase and can terminate the synthesis of a nascent transcript before RNA polymerase has transcribed very far.

Processing of E. coli's tRNAs and rRNAs

Ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) are created by extensive post-transcriptional processing of larger primary transcripts (sometimes also called precursor RNA or pre-RNA) by exonucleolytic and endonucleolytic cleavages. Many of the original standard bases in the tRNAs are also changed into modified bases by post-transcriptional enzymatic action. Examples include ribothymidylate and pseudouridylate. Processing enzymes include Ribonuclease P (which generates the 5' terminus of all E. coli tRNAs) and Ribonuclease III, which excises the rRNA precursors (5S, 16S, and 23S rRNAs) by cleaving at specific sites in the primary transcript. The sugars of some rRNA nucleotides are modified.

Amino acids are attached to the 3'-ends of tRNAs by enzymes called aminoacyl-tRNA synthetases to form aminoacyl-tRNAs (charged tRNAs). There are 86 tRNAs in E. coli. Most tRNAs are about 75 nucleotides long and have extensive secondary structure (base pairing interactions) as well as tertiary structure (not, in this case, supercoiling, but additional folding in three-dimensional space). All tRNAs end with ...CCA at the 3'-end.


Eukaryotes and prokaryotes have similarities in both transcription and translation (protein synthesis), but they have significant differences as well. For instance, transcription and translation often occur simultaneously for the same gene in E. coli, because the two processes are not spatially separated. By contrast, in eukaryotes, transcription occurs exclusively in the nucleus and translation occurs in the cytosol. In addition, as noted above, eukaryotic RNAs differ from prokaryotic RNAs in being more heavily processed, as will be described below. This includes one major process (splicing) that does not occur in prokaryotes at all.

Eukaryotic RNA Synthesis

Unlike prokaryotes, which have one RNA polymerase that makes all classes of RNA molecules, eukaryotic cells have three types of RNA polymerase (called RNA pol I, RNA pol II, and RNA pol III), and each type of RNA is made by its own polymerase:

RNA polymerase I makes ribosomal RNA (rRNA)
RNA polymerase II makes messenger RNA (mRNA)
RNA polymerase III makes transfer RNA (tRNA)

The three eukaryotic RNA polymerases vary in their sensitivity to the poison α-amanitin, produced by the mushroom Amanita phalloides. RNA polymerase II is VERY sensitive to the toxin, whereas RNA polymerase III is less sensitive and RNA polymerase I is not very sensitive.

RNA Polymerase I is a complex enzyme, containing 13 subunits totaling over 600,000 daltons. It is responsible for synthesizing the large 45S pre-rRNA transcript that is later processed into mature 28S, 18S, and 5.8S ribosomal RNAs (rRNAs). At least two transcription factors are known to be required, but there is no need for an elaborate transcriptional apparatus characteristic of pol II transcription, because only a single kind of gene is transcribed.

RNA Polymerase II - All of the protein-coding genes in eukaryotes are transcribed by RNA polymerase II (pol II). This enzyme also transcribes some of the small nuclear RNAs (snRNAs) involved in splicing. Like other RNA polymerases, pol II is a complex, multisubunit enzyme, but not even its numerous subunits are sufficient to allow pol II to initiate transcription on a eukaryotic promoter. To form a minimal complex capable of initiation, at least five additional protein factors are needed. The minimal unit involves the TATA-binding protein (TBP), but in vivo formation of the complex probably always uses TFIID, a multi-subunit structure incorporating both TBP and TBP-associated factors (TAFs). RNA polymerase II is partly regulated by phosphorylation of serine and threonine residues in the carboxy-terminal domain.

RNA polymerase III (pol III) is the largest and most complex of the eukaryotic RNA polymerases. It involves 14 subunits, totaling 700,000 daltons. All of the genes it transcribes are small, they are not all translated into proteins, and their transcription is regulated by certain sequences that lie within the transcribed region. The major targets for pol III are the genes for all the tRNAs and for the 5S ribosomal RNA. Like the major ribosomal RNA genes, these small genes are present in multiple copies, but they are usually not grouped together in tandem arrays, nor are they localized in one region of the nucleus. Rather, they are scattered over the genome and throughout the nucleus.

Most of the modifications that are made on RNAs are performed on mRNAs (except some chemical modification of the bases in tRNAs) and are the subject of descriptions below.

Transcriptional Apparatus

We shall talk later in the term about control of gene expression, of which transcriptional regulation is one mechanism. Nevertheless, it is useful to understand some of the basic principles of transcriptional regulation here. As noted above, transcription starts at a sequence near a promoter. Each of the three eukaryotic RNA polymerases recognizes specific types of promoters. RNA polymerase I uses only a single type of promoter. RNA polymerase III is unusual in recognizing promoter sequences that in some genes are ahead of (in the 5' direction of) the transcriptional start site and in other genes are downstream of the transcriptional start site. RNA polymerase II promoters range from simple to complex. In each case, for all of the RNA polymerases, the promoters are on the same physical strand as the genes they control and are ahead of those genes. This property of the promoter sequence being on the same molecule as the gene it regulates is referred to as being a cis-acting element. By contrast, proteins (apart from the RNA polymerase) often bind promoter sequences and help RNA polymerase to start (or in some cases stop) transcription. These proteins are called transcription factors and are referred to as being trans-acting elements.


Like prokaryotic promoters, eukaryotic promoters are found 5' to the transcriptional start site and often contain a sequence rich in thymine and adenine called a TATA box. The TATA box is usually found between positions -30 and -100 of the transcriptional start site. Other sequence elements besides the TATA box are important for proper transcriptional control in eukaryotes. They include a so-called CAAT box and, in some cases, a GC box. These sequences vary somewhat in position between -40 and -150 of the transcriptional start site. These variable positions in eukaryotes are possible because proteins bind these sequences and help control transcription. By contrast, E. coli has conserved sequence elements positioned fairly strictly at -10 and -35 because these sequences are binding sites for RNA polymerase itself.

The TATA box is necessary for strong promoter activity. This sequence is bound by a protein called the TATA-box-binding protein (TBP), which is a small component of a much larger complex that helps regulate transcription. As noted above, transcription factors are proteins that bind sequences in the promoter and help control transcription. Transcription factors interacting with RNA polymerase II in eukaryotes have names like TFIIA, TFIIB, TFIID, TFIIE, and TFIIF. The 'TF' stands for 'transcription factor' and the 'II' refers to RNA polymerase II. Note that the 'transcription factors' can be complexes of many proteins. TBP is a saddle-shaped protein that recognizes the TATA box, causing the DNA to undergo a large conformational change, including unwinding. Once TBP has bound the TATA box, it helps recruit binding of other proteins/complexes. The order of binding of the transcription factors is TFIID, TFIIA, TFIIB, TFIIF, followed by binding of RNA polymerase II and then TFIIE. This complex of complexes is called the basal transcription apparatus. Notably, TFIIF is a helicase that helps separate strands to allow RNA polymerase II to bind. During the formation of the basal transcription complex, RNA polymerase is phosphorylated near its carboxyl end - a process required for initiation of transcription.

Other Transcription Factors

Initiation of transcription does not occur efficiently by the basal transcription apparatus alone. Other transcription factors bind to other sequence sites to allow mRNA synthesis to occur with high efficiency. For example, the transcription factor called Sp1 binds to promoters with GC boxes. The CAAT box is bound by the CCAAT-binding transcription factor called CTF or NF1.


Further control over transcription of eukaryotic genes is exerted by enhancer sequences in the DNA that are bound by still other proteins. Enhancer sequences provide a considerable amount of 'fine tuning' of transcription, being active in some cell types but not others, or active at certain times but not others. Enhancer sequences differ from promoters in that they 1) cannot activate transcription by themselves, but 2) when active, increase transcription of the genes they control. Interestingly, enhancer sequences can be located up to thousands of base pairs away from the gene they regulate. They can be found on either strand relative to the gene they regulate, and they can be 5' to, 3' to, or in some cases, within the gene they regulate.

tRNA Processing (yeast)

tRNAs are modified after they are transcribed. This is necessary because the tRNA is made initially in an RNA that is longer than the final tRNA. This involves cleavage of the 5' leader sequence, removal of a middle sequence called an intron (see splicing description below), and replacement of the 3' UU sequence by CCA. In addition, several bases are chemically modified, as well.

mRNAs - Prokaryotes vs. Eukaryotes

There are significant differences in the ways that messenger RNAs (mRNAs) for protein-coding genes are produced and processed in prokaryotic and eukaryotic cells.

Prokaryotes - Prokaryotic mRNAs are synthesized on the bacterial nucleoid in direct contact with the cytosol and are immediately available for translation. The Shine-Dalgarno sequence (which we will talk about later, under translation) near the 5' end of the mRNA binds to a site on the prokaryotic ribosomal RNA (rRNA), allowing attachment of the ribosome and initiation of translation, often even before transcription is completed.

Eukaryotes - In eukaryotes, the mRNA is produced in the nucleus and must be exported into the cytosol for translation. Furthermore, the initial product of transcription (pre-mRNA) may include sequences (called introns), which must be removed before translation can occur. There is no ribosomal attachment sequence like the Shine-Dalgarno sequence in prokaryotes. For all these reasons, eukaryotic mRNA requires extensive processing before it can be used as a protein template. This processing takes place while mRNA is still in the nucleus.

The descriptions below all refer to modifications to mRNAs in eukaryotes.

mRNA End Modifications

Capping - The first modification to mRNA occurs at the 5' end of the pre-mRNA (pre-mRNA is the term given to the raw mRNA before processing begins). A GTP residue is added in reverse orientation (that is, to form a 5' to 5' bond) and forms, together with the first two nucleotides of the chain, a structure known as a cap. The cap is "decorated" by the addition of methyl groups to the N-7 position of the guanine and to one or two sugar hydroxyl groups of the cap nucleotides. The cap structure serves to position the mRNA on the ribosome for translation. Capping occurs very early during the synthesis of eukaryotic mRNAs, even before mRNA molecules are finished being made by RNA polymerase II. Capped mRNAs are very efficiently translated by ribosomes to make proteins. Viruses, such as poliovirus, prevent capped cellular mRNAs from being translated into proteins. This enables poliovirus to take over the protein synthesizing machinery in the infected cell to make new viruses.

Polyadenylation - The 3'-ends of eukaryotic mRNAs are altered in a process called polyadenylation where about 250 A nucleotides are added to the 3' ends of mRNAs. The sequence 5'AAUAAA 3' acts as a signal to an endonuclease to cut the RNA and locate the polyA tail 11-30 nucleotides downstream. PolyA Polymerase is the enzyme responsible for adding the tail. The polyA tail appears to increase the efficiency with which translation of the message occurs and the length of the tail may be a factor in the stability of the mRNA outside of the nucleus.
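The cut-and-add sequence of events can be sketched as follows. Everything here is illustrative: the function name, the choice of a cut 20 nucleotides past the signal, the tail of exactly 250 A residues, and the toy pre-mRNA are assumptions for the example. The sketch also assumes that the 11-30 nucleotide distance is measured from the end of the AAUAAA signal.

```python
POLY_A_SIGNAL = "AAUAAA"

def polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 250) -> str:
    """Cut the RNA `offset` nucleotides past the AAUAAA signal and add a poly(A) tail."""
    if not 11 <= offset <= 30:
        raise ValueError("offset must lie in the 11-30 nucleotide window")
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA signal found")
    cut_site = signal_start + len(POLY_A_SIGNAL) + offset
    if cut_site > len(pre_mrna):
        raise ValueError("transcript ends before the chosen cleavage site")
    # Endonuclease step (discard everything past the cut), then polyA polymerase step.
    return pre_mrna[:cut_site] + "A" * tail_length

pre_mrna = "GGCUC" + "AAUAAA" + "C" * 40    # hypothetical 3' end of a pre-mRNA
mature = polyadenylate(pre_mrna)
print(len(pre_mrna), len(mature))           # 51 281
```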

RNA Editing

RNA editing is a process affecting mitochondrial mRNAs of some unicellular eukaryotes and a few other genes, such as the gene for apolipoprotein B (apo B), which is involved in lipid transport in lipoprotein complexes in the blood. RNA editing involves changes in the sequence of the mRNA after it is made. It can involve insertion or deletion of residues in messages during processing steps, or it can involve chemical modification to change one base in a sequence to a different one. Such is the case with apo B. The apo B protein is found in two forms - a 512 kD apoB-100 (in LDLs) and a 240 kD apoB-48. The larger form is made in the liver for LDL synthesis and the smaller form is made in the intestine for chylomicron synthesis. The same RNA is used to make both proteins - one edited (apoB-48) and one unedited (apoB-100). Editing occurs in intestinal cells, but not liver cells, and involves chemical alteration of a single cytosine to make a uracil. The change to a uracil creates a STOP signal for translation in the middle of the gene, giving rise to the smaller apoB-48 protein. The unedited sequence does not contain the STOP signal and translation continues much further, creating the apoB-100 protein.
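How a single C-to-U change can shorten a protein is easy to show with a toy message. The sequence below is invented for the example and is not the apo B sequence; it simply contains a CAA codon, which becomes the stop codon UAA when its C is changed to U. The function names are also made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def edit_c_to_u(mrna: str, position: int) -> str:
    """Return the mRNA with the C at `position` replaced by U."""
    if mrna[position] != "C":
        raise ValueError("editing site must be a C")
    return mrna[:position] + "U" + mrna[position + 1:]

def codons_before_stop(mrna: str) -> int:
    """Count codons read from the first nucleotide up to the first in-frame stop."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count

unedited = "AUGGCUCAAGGUUAA"            # toy message: AUG GCU CAA GGU UAA
edited = edit_c_to_u(unedited, 6)       # CAA becomes UAA

print(codons_before_stop(unedited))     # 4 codons translated
print(codons_before_stop(edited))       # 2 codons translated
```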

In trypanosomes, nearly half of the uridines in some mitochondrial mRNAs are inserted into the sequence AFTER the RNA is made. Apparently, the insertions are made by a kind of reverse splicing mechanism (see below), and only at certain points. Small RNAs, called guide RNAs, are required for the process.


Eukaryotic genes often contain sequences that are not found in the final RNA (true for mRNAs, tRNAs, and rRNAs). The process of removing the intervening sequences is called splicing. The intervening sequences that are removed in splicing are called 'introns' and the sequences that remain after splicing are called 'exons.' Splicing provides cells with a simple mechanism for 'shuffling' functional domains of proteins (see alternative splicing below). Higher eukaryotes tend to have a larger percentage of their genes containing introns than lower eukaryotes, and the introns tend to be larger as well. The pattern of intron size and usage roughly follows the evolutionary tree, but this is only a general tendency. The human titin gene has the largest number of exons (178), the longest single exon (17,106 nucleotides) and the longest coding sequence (80,781 nucleotides = 26,927 amino acids). The longest primary transcript, however, is produced by the dystrophin gene (2.4 million nucleotides).

For short transcription units, RNA splicing usually follows cleavage and polyadenylation of the 3' end of the primary transcript. But for long transcription units containing multiple exons, splicing of exons in the nascent RNA sometimes begins before transcription of the gene is complete. The location of splice sites in a pre-mRNA can be determined by comparing the sequence of genomic DNA with that of the cDNA prepared from the corresponding mRNA. Sequences that are present in the genomic DNA but absent from the cDNA represent introns. Comparison of cDNA sequences with genomic DNA sequences for a large number of different mRNAs revealed short consensus sequences at intron-exon boundaries in eukaryotic pre-mRNA; in higher organisms, a pyrimidine-rich region just upstream of the 3' splice site is also common. The only completely conserved nucleotides are the (5')GU and AG(3') at the 5' and 3' ends of the intron, respectively, and the conserved branch point A.
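
The genomic DNA versus cDNA comparison can be sketched in Python as follows. This is a toy search, not how real alignment programs work: it assumes the genomic sequence starts and ends exactly where the cDNA does, that the two agree perfectly in the exons, and that every intron begins with GT and ends with AG (the DNA equivalents of GU and AG). The function names and the sequences are invented for illustration.

```python
from functools import lru_cache

def find_introns(genomic, cdna, min_intron=4):
    """Return introns as (start, end) half-open genomic intervals, or None.

    A base is treated as exonic when it matches the next cDNA base; otherwise
    an intron is tried, but only one that starts with GT and ends with AG.
    """
    @lru_cache(maxsize=None)
    def solve(g, c):
        if c == len(cdna):
            return () if g == len(genomic) else None
        if g == len(genomic):
            return None
        # Option 1: the genomic base at g is part of an exon.
        if genomic[g] == cdna[c]:
            rest = solve(g + 1, c + 1)
            if rest is not None:
                return rest
        # Option 2: an intron begins at g (GT ... AG).
        if genomic.startswith("GT", g):
            for end in range(g + min_intron, len(genomic) + 1):
                if genomic[end - 2:end] == "AG":
                    rest = solve(end, c)
                    if rest is not None:
                        return ((g, end),) + rest
        return None

    result = solve(0, 0)
    return None if result is None else list(result)

def splice(genomic, introns):
    """Remove the given intervals and join what remains."""
    pieces, previous_end = [], 0
    for start, end in introns:
        pieces.append(genomic[previous_end:start])
        previous_end = end
    pieces.append(genomic[previous_end:])
    return "".join(pieces)

# Invented example: two exons separated by one 20-nucleotide intron.
EXON1, INTRON, EXON2 = "ATGGCCAG", "GTAAGTCTTTTCCCTTTCAG", "GATTGA"
GENOMIC = EXON1 + INTRON + EXON2
CDNA = EXON1 + EXON2
introns = find_introns(GENOMIC, CDNA)
```

Note that the second exon here begins with G, the same base that begins the intron, so sequence comparison alone cannot place the boundary; the GT...AG requirement is what settles it in this sketch.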

Errors in splicing can cause problems, but do not occur often. Problems are more likely to arise from mutations in DNA that alter splice junction sites, as is the case for some forms of beta-thalassemia, which arise from mutation of a single base to create a new splice site where none existed previously. (Note that your book talks about splicing as if it only occurs on mRNAs, but that is not right. Splicing occurs on tRNAs and rRNAs too.)

Splicing Mechanism

The splicing process begins with a pre-mRNA becoming complexed with a number of small nuclear ribonucleoprotein particles (snRNPs), which are themselves complexes of small nuclear RNAs (snRNAs) and proteins. The snRNP-pre-mRNA complex is called a spliceosome, and it is in the spliceosome that splicing occurs. snRNAs recognize and bind intron-exon splice sites by means of complementary sequences. Excision of a single intron involves assembling and disassembling a spliceosome. The sequence of reactions can be summarized as follows: first, the 2'-hydroxyl of the conserved branch point A attacks the 5' splice site, releasing the upstream exon with a free 3'-hydroxyl and forming a looped (lariat) intron intermediate; second, that 3'-hydroxyl attacks the 3' splice site, joining the two exons and releasing the intron.

After capping, poly(A) tailing, and splicing are complete, the newly formed mRNA is exported from the nucleus, almost certainly through the nuclear pores. It is then attached to ribosomes for translation.

snRNPs in Splicing - Six small U-rich RNAs are abundant in the nuclei of mammalian cells. Designated U1 through U6, these small nuclear RNAs (snRNAs) range in size from 107 to 210 nucleotides. The observation that the short consensus sequence at the 5' end of introns (CAG|GUAAGU) is complementary to a sequence near the 5' end of the snRNA called U1 led to the suggestion that snRNAs assist in the splicing reaction. The snRNAs associate in the nucleus with six to ten proteins to form small nuclear ribonucleoprotein particles (snRNPs). Some of these proteins are common to all snRNPs, and some are specific for individual snRNPs.
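
The complementarity argument can be checked with a few lines of Python. The sketch computes the sequence that would pair perfectly (antiparallel, Watson-Crick) with the 5' splice site consensus and then looks for it in the first eleven nucleotides of U1. The U1 string used here is the commonly quoted 5' end with modified bases written as plain U; treat it as an assumption and check it against a sequence database before relying on it.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence (5' to 3') that pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))

# 5' splice site consensus, CAG|GUAAGU, with the exon-intron bar removed.
splice_consensus = "CAGGUAAGU"

# Assumed 5' end of U1 snRNA (modified bases shown as U); see note above.
u1_5prime_end = "AUACUUACCUG"

perfect_partner = reverse_complement(splice_consensus)
offset = u1_5prime_end.find(perfect_partner)  # -1 would mean no match
```

A match near the very start of the snRNA is what one would expect if U1 recognizes the 5' splice site by base pairing. Real splice sites vary around the consensus, so pairing in the cell is usually less than perfect.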

It is estimated that at least one hundred proteins are involved in RNA splicing, making this process comparable in complexity to protein synthesis and initiation of transcription. Some of these splicing factors are associated with snRNPs, but others are not. Some of the proteins also exhibit sequence homologies to known RNA helicases. RNA helicases may be necessary for the base-pairing rearrangements that occur in snRNAs during the spliceosomal splicing cycle, particularly the dissociation of U4 from U6 and of U1 from the 5' splice site.

Alternative Splicing

Some gene transcripts may be spliced in different ways in different tissues of an organism or at different developmental stages. Alternative splicing of the heavy chains of immunoglobulins results in proteins that may or may not carry a hydrophobic membrane-binding domain. The protein alpha-tropomyosin is used in different kinds of contractile systems in various cell types. A single gene is transcribed, but the specific splicing patterns in different tissues provide a variety of alpha-tropomyosins. There are three positions at which alternative choices can be made for which exon to splice in. The choice of splice site appears to be determined by a cell-specific protein that interacts with the spliceosome. Alternative splicing is economical: a single gene can encode several related proteins without a corresponding increase in the size of the genome.
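
To see how quickly choices at a few positions multiply, here is a small Python sketch. The gene layout is hypothetical: the exon names are invented, and two alternatives per choice point is an assumption (the notes above say only that there are three positions where a choice is made). It simply enumerates every combination; in a real cell, each tissue would make only some of them.

```python
from itertools import product

# Hypothetical layout: fixed exons are strings, choice points are tuples
# listing the mutually exclusive exons available at that position.
LAYOUT = ["E1", ("E2a", "E2b"), "E3", ("E4a", "E4b"), "E5", ("E6a", "E6b")]

def isoforms(layout):
    """List every mature mRNA (as a dash-joined exon string) the layout allows."""
    slots = [item if isinstance(item, tuple) else (item,) for item in layout]
    return ["-".join(combination) for combination in product(*slots)]

all_isoforms = isoforms(LAYOUT)
```

With two options at each of three positions, one transcribed gene gives 2 x 2 x 2 = 8 possible messages, which is the sense in which alternative splicing saves genome space.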

Implications of Splicing

Evolutionary - Exons often coincide with protein "domains", or parts of the protein with a specific function. Splicing allows domains to be independent of each other and thus allows them to be exchanged readily over evolutionary time. This means that new types of proteins can be formed relatively easily compared to the situation in bacteria, where domains are kept in a single intact unit (the polypeptide coding sequence).

Developmental Domain Switching - Splicing permits cells to "swap" exons during differential gene expression. For example, during development, some genes are spliced one way, and then spliced a different way later (see tropomyosin expression above). Changing the way an mRNA is spliced changes the amino acid sequence in the protein made from it, so cells can, in this way, "modify" the sequence, and function, of a protein as the needs of a cell change. Splicing thus offers yet another opportunity for regulation of gene expression in eukaryotic cells besides that of transcriptional control.

Catalytic RNA

The remarkable ability of some RNAs to catalyze reactions (like enzymes) was discovered in studies of splicing and of tRNA processing. Such catalytic RNAs are known as ribozymes, and the list of catalytic RNAs has grown to include the catalytic activity in ribosomes responsible for making peptide bonds. The ribozyme involved in tRNA processing is known as ribonuclease P, and it removes nucleotides from the 5' end of the precursor molecule. Another interesting catalytic RNA activity is that involving a self-splicing intron of Tetrahymena (a ciliated protozoan). The process requires only an added guanosine residue. In the mechanism, G binds to the RNA and attacks the 5' splice site to form a phosphodiester bond with the 5' end of the intron, generating a 3' hydroxyl at the end of the upstream exon that attacks the 3' splice site. This second reaction joins the exons together and leads to release of the intron. The folding of the RNA in the intron is important for its function (not unlike the situation in enzymes where the folding of the protein is essential for its function).

Self-splicing is now known to occur in introns of other species, including yeast and other fungi, and Chlamydomonas. Group I self-splicing is mediated by a guanosine cofactor, whereas Group II self-splicing requires the 2'-hydroxyl group of an adenylate in the intron. Self-splicing resembles splicing by spliceosomes in that the first step involves attack of the 5' splice site by a ribose hydroxyl. The newly formed 3'-hydroxyl then attacks the 3' splice junction to form a phosphodiester bond with the downstream exon. Group II splicing is more closely related to spliceosome splicing and may represent an intermediate between Group I self-splicing and spliceosomal splicing.