DNA Replication / Recombination / Repair Notes


The structure of one form of DNA was discovered by James Watson and Francis Crick (using data from Rosalind Franklin) in 1953. The form of DNA they described, the B form, is the most common one found inside cells. It consists of two strands (a double helix) of nucleotide polymers oriented opposite to each other (antiparallel). The sugars and phosphates of the nucleotides are located on the "outside" of the double helix, and the nitrogenous bases are located on the "inside". The bases are arranged such that cytosines are always paired with guanines and adenines are always paired with thymines. Hydrogen bonds between the bases hold the double helix together. Adenine-thymine pairs have two hydrogen bonds between them in B DNA and guanine-cytosine pairs have three. Because the strands are complementary, the information in one strand is sufficient to specify how to make the other strand.
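The pairing rules above can be sketched in a few lines of Python. This is only an illustration of complementarity and antiparallel orientation; the function names are made up for this example and sequences are assumed to contain only the letters A, T, G, and C.

```python
# Watson-Crick pairing from the notes: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair in B DNA: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Because the strands are antiparallel, the partner is read in the
    opposite direction, so the input is reversed before pairing.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


print(complementary_strand("ATGC"))  # GCAT
print(hydrogen_bonds("ATGC"))        # 10
```

Applying `complementary_strand` twice returns the original sequence, which is another way of saying that either strand specifies the other.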

Besides B-DNA, two other forms of DNA have been widely studied: the A form (discovered by Rosalind Franklin) and the Z form (discovered by Alexander Rich's laboratory). Both the A and Z forms are favored by specific DNA sequences. The Z form is favored by stretches of alternating pyrimidine/purine nucleotides in a strand. High salt concentrations are also important for Z and A DNA formation. Conditions that favor formation of the A form include dehydrating conditions (not commonly found in cells). In addition, it is worth noting that duplex RNAs and some DNA/RNA hybrids assume the A configuration. It is also worth noting that the absence of the oxygen at position 2' of the deoxyribose in DNA is essential for its ability to assume the B form; an oxygen at that position is sterically hindered in B DNA. Note that the A and B forms of DNA have their helix arranged in a right-handed form, whereas the Z form has its helix in a left-handed configuration.

DNA's structure has several important aspects. First, the double helix of B DNA has two significant features, called the major and the minor groove. The grooves contain potential hydrogen-bond donor and acceptor sites that allow proteins to recognize and bind to specific sequences. The major groove is wider and contains more recognizable components than the minor groove does, which makes DNA-protein interactions easier. DNA-protein interactions are critical for processes such as DNA replication and transcription (synthesis of RNA from DNA).

DNA Polymerases

Enzymes that replicate DNA using a DNA template are called DNA polymerases. However, there are also enzymes that synthesize DNA using an RNA template (reverse transcriptases). Most organisms have more than one type of DNA polymerase (for example, E. coli has five DNA polymerases), but all work by the same basic rules.

1. Polymerization occurs only 5' to 3'
2. Polymerization requires a template to copy: the complementary strand.
3. Polymerization requires 4 dNTPs: dATP, dGTP, dCTP, dTTP
4. Polymerization requires a pre-existing primer from which to extend. The primer is RNA in most organisms, but it can be DNA in some organisms; very rarely the primer is a protein in the case of certain viruses only.

Some examples of DNA polymerases

Polymerase Name                       Other activities                     Other features
E. coli DNA pol I                     3'-5' exo, 5'-3' exo
E. coli DNA pol I (Klenow fragment)   3'-5' exo                            C-terminal fragment
E. coli DNA pol III                   3'-5' exo (on a separate subunit)    multimeric structure
Taq pol                               extendase (adds 3'-A overhangs)      thermostable, used in PCR
reverse transcriptase                 (ribonuclease H)                     used to make cDNA

E. coli DNA Polymerase I

DNA Polymerase I from E. coli was the first DNA polymerase characterized. There are approximately 400 molecules of the enzyme per cell. E. coli DNA polymerase I is abbreviated pol I. The enzyme is a single large protein with a molecular weight of approximately 103 kDa (103,000 grams per mole). The enzyme requires a divalent cation (Mg++) for activity and has three enzymatic activities associated with it:

1. 5'-to-3' DNA Polymerase activity
2. 3'-to-5' exonuclease (Proofreading activity)
3. 5'-to-3' exonuclease (Nick translation activity)
The locations of the three enzymatic activities within the protein are known, and it is possible to remove the 5'-to-3' exonuclease activity by using a protease to cut pol I into two fragments (see the Klenow fragment of pol I in the table above). Both the polymerization and 3'-to-5' exonuclease activities are on the large Klenow fragment of pol I, and the 5'-to-3' exonuclease activity is on the small fragment. The 3D structure of the Klenow fragment reveals a shape common to other DNA polymerases: it resembles a right hand, with fingers and a thumb to "grip" the DNA and a "palm" area where catalysis occurs. Note that DNA polymerase interacts with DNA through interactions with the minor groove.

DNA polymerase I requires dNTPs, a template DNA, and a primer from which to initiate replication. The enzyme polymerizes deoxyribonucleotides into DNA in the 5' to 3' direction using the complementary strand as a template. Newly synthesized DNA is covalently attached to the primer, but only hydrogen-bonded to the template. The template provides the specificity according to Watson-Crick base pairing. Specificity of replication is a function of shape selectivity: binding of a dNTP by the polymerase induces a conformational change in the protein, forming a tight pocket for the potential base pair. The conformational change is only possible when the correct nucleotide is bound by the enzyme. In the catalytic reaction, only the alpha phosphate (closest to the deoxyribose) of the dNTP is incorporated into newly synthesized DNA. The other two phosphates are released as pyrophosphate (and this may help to drive DNA polymerization forward).


As noted above, DNA Polymerase I (and many other, but not all DNA polymerases) contains an enzymatic activity known as a 3' to 5' exonuclease. This activity (also called proofreading) provides an important additional step of "quality control" to the replication process to reduce the likelihood of incorporation of an incorrect nucleotide during DNA synthesis. Briefly, incorporation of an incorrect nucleotide causes the DNA polymerase to stop its forward progress (stalling) to facilitate the unwinding of the strand containing the incorrect nucleotide into the exonuclease site. The exonuclease site cleaves the nucleotide off, the DNA rewinds, and the polymerization re-occurs to incorporate the correct nucleotide. Proofreading activity helps DNA polymerases improve the accuracy of the polymerization by a factor of about 1000. It is worth noting that DNA polymerases, such as the HIV reverse transcriptase, that lack the exonuclease are far more prone to mutation. In the case of HIV, the virus uses this mutational tendency to evolve drug resistance.

Winding/Unwinding DNA

For DNA polymerization to proceed, the DNA strands must be separated ahead of the newly synthesized strand. Unwinding of DNA is the function of enzymes known as helicases, which bind single-stranded DNA ahead of the region of polymerization. The unwinding they catalyze must occur ahead of the replication fork at least as fast as the rate of replication. In E. coli, this corresponds to about 1000 bases per second (100 turns of the helix per second), or a rate of unwinding of at least 6000 revolutions per minute (RPM)! To unwind DNA, helicase uses energy from hydrolysis of ATP to slide along the DNA, alternately bringing the A1 and B1 domains together and then springing them apart (not unlike the ends of an inchworm). The result of this action is that the enzyme "slides" along the single strand to which it is attached, unwinding duplex DNA ahead of it. When the sequences of a large number of known helicases are compared, seven common domains can be identified among them.
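The unwinding rate quoted above is simple arithmetic, sketched here in Python. The value of 10 base pairs per helical turn is a rounded assumption implied by the "100 turns per second" figure; the function name is illustrative only.

```python
BASES_PER_SECOND = 1000  # replication rate quoted in the notes for E. coli
BP_PER_TURN = 10         # assumption: rounded number of base pairs per turn


def unwinding_rpm(bases_per_second: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Convert a replication rate into helix revolutions per minute."""
    turns_per_second = bases_per_second / bp_per_turn
    return turns_per_second * 60  # 60 seconds per minute


print(unwinding_rpm(BASES_PER_SECOND))  # 6000.0
```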

Unwinding one region of DNA causes overwinding of the region ahead of it (see below also). Unless compensated for, such overwinding will eventually cause the DNA ahead of the replication fork to kink up and become impassable to the replication enzymes. When thinking about the winding of DNA, it is useful to remember that there are three possible states: overwound (more helical turns than relaxed DNA, and so fewer base pairs per turn), underwound (fewer helical turns, and so more base pairs per turn), and relaxed (exactly the number found in B-DNA). Note that linear DNA rests in the relaxed configuration if no external forces are applied to it. If one takes the ends of that linear DNA and joins them to each other (ligation), one creates a relaxed circle. If, instead, one takes the linear DNA, holds one end, unwinds the other end by two turns, and then forms the circle, an unwound region can be seen. This circle is underwound and will attempt to relieve the strain of underwinding by supercoiling, much like an over- or undercoiled rubber band. Note that supercoiling of the circular molecule restores the number of twists it had in the relaxed state. This is made possible by the two crossings over itself that the molecule makes (called writhes). The writhes produced by supercoiling here are written with a minus sign to indicate underwinding. If right-handed turns had been put in instead of removed, the writhes would have a positive sign. An equation describes the relationship between twists and writhes through a third quantity called the linking number (Lk). This relationship is as follows:

Lk = Tw + Wr

The linking number is the total number of times the two strands cross over each other. In a relaxed molecule, the only times the strands cross each other are in the twists of the double helix, so

25 = 25 + 0, since there is no writhing.

However, in a supercoiled molecule, the linking number is 23, since the number of writhes is -2:

23 = 25 + (-2)

Two molecules that differ from each other only in their linking numbers are called topoisomers.
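The bookkeeping in Lk = Tw + Wr can be checked with a short Python sketch that uses the same numbers as the example above (25 twists, writhe of 0 or -2). The function names are illustrative, not standard terminology.

```python
def linking_number(twist: int, writhe: int) -> int:
    """Lk = Tw + Wr, as in the equation above."""
    return twist + writhe


def are_topoisomers(lk_a: int, lk_b: int) -> bool:
    """Otherwise identical molecules are topoisomers if their Lk differs."""
    return lk_a != lk_b


relaxed = linking_number(twist=25, writhe=0)       # 25
supercoiled = linking_number(twist=25, writhe=-2)  # 23

print(relaxed, supercoiled)                   # 25 23
print(are_topoisomers(relaxed, supercoiled))  # True
```

Because Lk is fixed once a circle is closed, a change in twist must be matched by an opposite change in writhe unless a strand is cut and rejoined.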


Note that the action of the helicases on the DNA molecule described above adds twists ahead of the replication fork. If these are not relieved, the DNA molecule will writhe to relieve the stress. This is the source of the potential kinking of the DNA. To prevent this, enzymes known as topoisomerases relieve the positive twists put into the DNA by the helicases by putting in negative ones to compensate. Topoisomerases, therefore, act to change the linking number of DNA molecules by changing the twists in a DNA. Note that most DNAs in cells are kept in a slightly supercoiled state. This partly allows them to fit into a smaller space.

We will be concerned here with two types of topoisomerase - types I and II. Type I topoisomerases act to relax DNA by cleaving a strand and allowing it to swivel around the other strand until the DNA is relaxed. This process is energetically favorable and does not require input of energy. Type II topoisomerases, by contrast, use energy from ATP hydrolysis to create negative supercoils in DNA. A double stranded break is introduced in one portion of a DNA (G segment) and another portion of the same DNA (T segment) is passed through the break. By moving the T segment through the break (try it with a rubber band), negative supercoils can be introduced. Note that hydrolysis of ATP is necessary to return the complex to its original state. The bacterial topoisomerase II is also known as gyrase and is the target of antibiotics that inhibit the enzyme more than its eukaryotic counterparts. Such antibiotics include novobiocin (blocks ATP binding by gyrase) and nalidixic acid or ciprofloxacin, which interfere with the breakage and rejoining of the DNA strands. An antitumor agent, camptothecin inhibits human topoisomerase I.

Starting DNA Replication

Replication of DNA in cells starts at specific sequences called origins. In the E. coli chromosome, this region of DNA is known as the oriC locus. OriC has a length of 245 base pairs (bp) and has four repeats of a sequence that serves as a binding site for an initiation protein known as DnaA. Binding of DnaA to the origin causes the DNA to unwind slightly, allowing a helicase known as DnaB to begin unwinding the duplex. As the single-stranded DNA regions are exposed, another protein, known as SSB (single-stranded DNA binding protein), binds the DNA. This complex of proteins and DNA is known as the pre-priming complex.

As noted above, all DNA polymerases require a primer to begin replication from. For many of them, the primer comes in the form of an RNA made by an enzyme known as primase that uses a DNA template to make RNA. Such an enzyme is a type of RNA polymerase and RNA polymerases differ significantly from DNA polymerases in not requiring a primer. When the primase joins the pre-priming complex, the complex is referred to as the primosome. The RNA primer made by the primase is only a few nucleotides in length. The RNA primer must be removed later by an enzyme and in E. coli, the enzyme is the 5' to 3' exonuclease of DNA polymerase I, described above.

Leading Versus Lagging Strand Replication

Replication of the two strands of a double helix occurs by very different mechanisms on each strand. This is because DNA replication occurs ONLY in the 5' to 3' direction. Synthesis of DNA on the leading strand occurs uninterrupted, but synthesis of DNA on the lagging strand occurs in pieces as leading strand synthesis exposes more of the other strand to copy. The pieces of DNA on the lagging strand are called Okazaki fragments after the person who discovered them. Each Okazaki fragment must start with a primer and each primer must be removed by DNA polymerase I's 5' to 3' exonuclease. The gaps between the Okazaki fragments after the primers are removed are readily filled in by DNA polymerases, but an additional enzymatic activity (DNA ligase) is necessary to covalently link the fragments into one contiguous piece. A schematic representation of the mechanism of action of E. coli DNA ligase is shown in. In bacterial DNA ligase, NAD is the source of the AMP. A related DNA ligase in the bacteriophage T4 (called T4 DNA ligase) uses ATP as the source of AMP and is commonly used in biotechnology to join two different DNA molecules, forming a recombinant.

E. coli DNA Polymerase III

DNA polymerase I of E. coli, described above, has two properties that make it unfit for replicating the chromosome of the organism. First, the enzyme replicates DNA relatively slowly - only 20 nucleotides/second, much slower than the rate of 1,000 nucleotides/second measured for the replication of E. coli DNA. Second, the enzyme has a low processivity - a measure of the tendency of the enzyme to remain on the DNA as replication proceeds. Hence, DNA polymerase I jumps on and hops off the DNA with relatively high frequency. This trait is useful for fixing the gaps between Okazaki fragments, but is not very useful for CREATING the leading or lagging strand. Instead, the E. coli chromosome is replicated for the most part by DNA Polymerase III holoenzyme (a complex of several proteins). DNA polymerase III is very processive (stays on the DNA) and replicates the chromosome at about 1000 nucleotides per second. One of the subunits of the holoenzyme called Beta2 is responsible for the high processivity of the enzyme. The protein has a ring structure and it is in the middle gap where the DNA duplex is held. The protein is commonly known as a beta clamp and versions of the beta clamp are present in virtually all other cells as well, including humans. Thus, the beta clamp holds the Polymerase III to the DNA, giving it great processivity.
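A rough calculation shows why the rate difference matters. The sketch below uses the 4.8 million bp genome size quoted elsewhere in these notes and assumes two replication forks moving in opposite directions from the origin; that bidirectional assumption is not stated in the notes and is labeled in the code.

```python
GENOME_BP = 4_800_000  # E. coli genome size as quoted in these notes


def replication_minutes(rate_nt_per_s: float, forks: int = 2) -> float:
    """Estimate minutes needed to copy the whole chromosome.

    forks=2 is an assumption (bidirectional replication from one origin);
    pass forks=1 to model a single fork instead.
    """
    seconds = GENOME_BP / (rate_nt_per_s * forks)
    return seconds / 60


print(replication_minutes(1000))  # 40.0   (pol III rate)
print(replication_minutes(20))    # 2000.0 (pol I rate, about 33 hours)
```

Under these assumptions, the pol III rate copies the chromosome in well under an hour, while the pol I rate would take more than a day.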

A remarkable feature of the DNA Polymerase III holoenzyme system is that it replicates BOTH the leading AND the lagging strands at the replication fork. Note how the leading strand (red) is synthesized continuously, while the lagging strand is synthesized in pieces, each one starting with a green primer laid down by primase. By looping the lagging strand, its synthesis can be made to proceed in the same direction as that of the leading strand. Synthesis of the lagging strand takes more time than that of the leading strand, so the loop of the lagging strand can grow and shrink as leading strand synthesis speeds up and slows down. Remember that the gaps between the Okazaki fragments of the lagging strand are filled in (and the primers removed) by DNA Polymerase I, and DNA ligase connects the pieces in the final step.

Eukaryotic DNA Replication

DNA replication in eukaryotic cells is more complicated than that of prokaryotic cells. There are several reasons. First, the DNA is longer in eukaryotic cells (6 billion bp in humans versus 4.8 million in E. coli). In addition, E. coli has a single chromosome, whereas humans have 23 pairs. Human chromosomes are linear, not circular, like they are in E. coli, and this too adds complexity for replication.

To deal with the larger DNA, eukaryotes use more replication origins (about 1 per 30,000 to 300,000 bp in eukaryotes versus 1 per 4.8 million in E. coli). Coordination of all of these replication origins to ensure that each operates only once per cell division is accomplished by tying replication to the cell cycle. The coordination involves several checkpoints that control progression through the cycle. In yeast, the DNA origin sequences are called autonomously replicating sequences (ARS), and they are the target for docking of the origin recognition complex (ORC). ORC is composed of six proteins that recruit other proteins called licensing factors to form a pre-replication complex. Licensing factors are marked for destruction after forming the complex, ensuring that each is used only once. DNA helicases separate the parental DNA strands and a single-stranded DNA binding protein (replication protein A) binds to them. Replication begins with binding of polymerase alpha - an enzyme with both RNA primase and DNA polymerase activity. It adds about 20 deoxynucleotides onto the end of the primer and then a complex involving polymerase delta and a clamp (called PCNA in eukaryotic cells) binds and highly processive polymerization begins. As in E. coli, leading and lagging strand synthesis progresses until adjacent replicons meet and fuse. RNA primers are removed, gaps are filled, and the pieces are linked together by DNA ligase.
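The origin spacing quoted above implies a range for the number of origins in the human genome. The short Python sketch below works that out from the 6 billion bp figure given in these notes; it is a back-of-the-envelope estimate, not a measured count.

```python
HUMAN_GENOME_BP = 6_000_000_000  # diploid human genome size quoted in the notes


def estimated_origins(genome_bp: int, bp_per_origin: int) -> int:
    """Estimate the origin count from an average spacing between origins."""
    return genome_bp // bp_per_origin


fewest = estimated_origins(HUMAN_GENOME_BP, 300_000)  # widest spacing
most = estimated_origins(HUMAN_GENOME_BP, 30_000)     # tightest spacing

print(fewest, most)  # 20000 200000
```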


The ends of the linear eukaryotic chromosomes contain thousands of copies of a short sequence (AGGGTT in humans) that is placed there by an enzyme called telomerase. Telomeres have been of tremendous interest because they shorten as cells grow older, due to the inactivity of telomerase in most differentiated cells after birth. With each generation, the linear chromosome loses a few bases due to the inability of the replication machinery (which does NOT include telomerase) to replicate the far 5' ends of each strand. Consequently, some have referred to telomeres as a "time bomb" that may determine the lifetime of cells such that when the telomere is gone, the cell may die. One theory is that cells with longer telomeres have the potential, therefore to live longer than cells with shorter telomeres. Interestingly, two types of cells contain an active telomerase to build and lengthen telomeres. These cells are embryonic cells and tumor cells. Tumor cells may require telomerase to be able to divide uncontrollably. Normal cells, when plated in tissue culture die after a few generations, but tumor cells do not and appear able to divide forever.

Telomerase Activity

The enzyme telomerase is responsible for lengthening telomeres. It does so by carrying with it an RNA molecule that is complementary to a portion of the DNA at the 3' end of one of the strands of the telomere. The RNA component of telomerase base pairs with the 3' overhang of the telomere to provide an RNA template to be copied by the polymerase in the telomerase. The telomerase enzyme is therefore a so-called reverse transcriptase, an enzyme that copies an RNA template to make DNA. Reverse transcriptases are also found in retroviruses, such as HIV, that have an RNA genome that gets converted to DNA during their life cycle.

A single stranded overhang of the G-rich strand of a telomere (in red) invades a duplex to form a large duplex loop, thus protecting the end from damage, while marking the end of the chromosome.


DNA molecules are not fixed entities changed only by mutations arising during replication. Rather, they are capable of interacting with each other during cellular processes, such as meiosis, and, with the assistance of proteins, exchanging pieces with each other. This phenomenon is known as genetic recombination. Scientists in laboratories also cause DNA molecules to "swap pieces" with each other, and this is often called recombinant DNA technology. Though the mechanisms employed in the laboratory are different from the cellular mechanisms in most cases, the results can be the same - properties previously carried on separate DNA molecules are joined on the same DNA molecule.

Cellular recombination is important for several reasons. In meiosis, pairing and exchange occur between related sequences on paired chromosomes, helping to create genetic diversity. Though you inherited many of your traits from your parents, you also have genes that are not exactly the same as those of either parent, but rather result from mixtures of the two. Recombination is a CRITICAL mechanism for generation of billions of different antibodies. We shall talk more about that later in the term. Some viruses, such as HIV, recombine with the host chromosome as an important step in their life cycle. Cellular recombination is also exploited by biotechnologists to create genetically modified organisms.

Most (but not all) recombination that happens in cells occurs between sequences that are related to (similar in sequence to) each other. This type of recombination is called homologous recombination. Enzymes that assist in the exchange of DNA sequences are called recombinases. One way in which recombination appears to occur in cells is through what is called a Holliday junction, named after Robin Holliday, who proposed them in 1964. The process starts with formation of recombination synapse followed by cleavage of one strand of each duplex. Bonds are formed to join the separate DNAs, making a Holliday junction. These can then isomerize (reorientation of chains in the middle). Subsequent strand cleavage and rejoining results in recombined DNAs.

Recombinases and topoisomerases share structural similarities, including tyrosine residues that participate in the DNA cleavage reactions that both enzymes catalyze. In a sense, recombinases catalyze intermolecular topoisomerase-type reactions.

DNA Mutation

Mutation involves alteration of DNA sequences. Mutations can arise by a variety of mechanisms and can result in 1) substitutions, 2) insertions of extra sequence, or 3) deletions of existing sequence. Substitutions come in two types. Transition mutations involve replacement of one purine by another or one pyrimidine by another. Transversion mutations involve replacement of a purine by a pyrimidine (or vice versa). Several mechanisms can result in mutation. They include errors made by the polymerase during replication as a result of tautomerization. Chemicals too can cause mutation. A brominated form of uracil can readily tautomerize, allowing it to pair with guanine. Alteration of bases in DNA by chemical treatment, such as with nitrous acid, can result in conversion of adenine to hypoxanthine, which can pair with cytosine, whereas the original adenine base would have paired with thymine.
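The two kinds of substitution can be told apart mechanically, as the following Python sketch shows. It assumes single DNA bases written as A, G, C, or T; the function name is made up for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, mutant: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    bases = {original, mutant}
    if not bases <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be one of A, G, C, T")
    if original == mutant:
        raise ValueError("no substitution: the bases are identical")
    # A transition stays within one chemical class (purine or pyrimidine).
    if bases <= PURINES or bases <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("G", "T"))  # transversion
```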

Aflatoxin B1 is a compound from molds that can become mutagenic as a result of action of the P450 enzyme system. In this case, an epoxide is created that is very reactive with the N-7 atom of guanine, forming an adduct that can lead to a G to T transversion.

Yet another mutational mechanism (and a VERY common one among people frequenting tanning booths) is that of pyrimidine dimers (usually called thymine dimers). These are two adjacent pyrimidines in the same strand that form covalent bonds between each other as a result of exposure to ultraviolet light. Pyrimidine dimers do NOT fit well in a double helix, causing DNA replication and gene expression to be blocked until the problem is (hopefully) repaired. Pyrimidine dimers may be a major reason why people who tan tend to have more skin cancer.

DNA Repair

DNA has a special need for metabolic stability. Its information content must be transmitted virtually intact from one cell to another during cell division or reproduction of an organism. The chemical stability of DNA is maintained in the following two ways:

Error reduction systems include proofreading by DNA polymerase (see above) and the cellular repair systems below:

Direct Repair

Direct repair of damaged DNA involves conversion of the altered base back to its original form. A prime enzyme in this regard is E. coli's DNA photolyase. This particular enzyme uses light energy to undo the bonds in pyrimidine dimers. Thus, photolyase uses light energy to undo the damage caused originally by light energy.

Nucleotide Excision Repair

Nucleotide excision repair is a process whereby a damaged section of a DNA chain is cut out, or excised, followed by the action of first DNA polymerase and then DNA ligase to regenerate a covalently closed duplex at the site of the original damage. Nucleotide excision repair can also repair pyrimidine dimers. The enzyme system involved, which in E. coli includes the products of the uvrA, uvrB, and uvrC genes, acts upon a number of damaged DNA sites containing lesions that may be quite bulky. Very similar systems exist in mammalian cells and in yeast.

The three-subunit UvrABC enzyme recognizes a lesion and, with the help of ATP hydrolysis, forces DNA to bend, leading to cleavage of the damaged strand at two sites--eight nucleotides to the 5' side of the damaged site and four or five nucleotides to the 3' side. Helicase II, the product of the uvrD gene, is also required, presumably to unwind and remove the excised oligonucleotide, which is ultimately broken down by other enzymes. The end result is a gap 12 or 13 nucleotides in length, with a 3' hydroxyl group and a 5' phosphate at the ends. Polymerase and ligase action then replaces the damaged 12-mer or 13-mer with undamaged DNA.

The UvrABC enzyme is not a classical endonuclease, because it cuts at two distinct sites, so the term excinuclease has been proposed for it, denoting its role in excision repair. This system also repairs DNA damage that results when two strands covalently crosslink to each other. In this case, the two strands are repaired sequentially (one after the other) in order to preserve an intact template strand.

Recent studies of excision repair show that active genes (those undergoing transcription) are preferred substrates for excision repair, and within these genes the template DNA strand is preferentially repaired. Thus, the repair machinery is somehow directed toward sites where repair of DNA damage will do the most good. Transcription-coupled repair may initiate when a transcribing RNA polymerase stalls at the site of a DNA lesion. BRCA1, a gene associated with increased risk of breast and ovarian cancer, has been implicated in transcription-coupled excision repair.

Base Excision Repair

Base excision repair (BER) is a process that removes one or more nucleotides from a site of base damage. The process initiates with enzymatic cleavage of the glycosidic bond between the damaged base and deoxyribose. The removal of uracil from DNA by uracil-DNA N-glycosylase is an example of BER. This is an important mechanism for maintaining the integrity of DNA because cytosine spontaneously deaminates on occasion to form uracil. This can give rise to a G-U base pair. The enzyme uracil-DNA glycosylase clips the uracil from the deoxyribose, followed by breakage of the phosphodiester bond by deoxyribose phosphodiesterase. Excision of the deoxyribose sugar, followed by polymerization to insert a cytosine and then DNA ligation, completely repairs the problem.

Another enzyme involved in excision of damaged bases is the E. coli enzyme AlkA. This enzyme binds to damaged DNA and flips the base out of the double helix into the active site of the enzyme. The enzyme acts as a glycosylase (as above), clipping the damaged base from the deoxyribose followed by phosphodiester bond cleavage by deoxyribose phosphodiesterase and repair as above.

Mismatch Repair

The repair of replication errors is the best understood repair system. If DNA polymerase introduces an incorrect nucleotide and it is not corrected by the 3' exonucleolytic proofreading activity of the enzyme, the replicated DNA will contain a mismatch at that site. The error can be corrected by the process called mismatch repair. The prokaryotic mismatch correction system works by "scanning" newly replicated DNA, looking for both mismatched bases and single-base insertions or deletions. When it finds a problem, part of one strand containing the mismatched region is cut out and replaced. The mismatch repair system must reliably recognize the proper strand to repair, for if it chose randomly, it would be incorrect half the time and there would be no gain in replication accuracy. Mismatch repair enzymes can identify the newly replicated strand because it is unmethylated for a short period of time after replication. Only later, after the replication fork passes, is the DNA methylation completed. Thus, if the mismatch repair system encounters a duplex with one strand methylated and one strand unmethylated, it recognizes that the parental strand is the methylated one and uses it as the template to repair the other strand. E. coli methylates primarily at the 'A' of the sequence GATC. As a result, recognition of an unmethylated GATC as far as 1 kbp or more away from the mismatch, in either direction, can help a cell identify which one is the newly replicated strand.
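To make the role of GATC sites concrete, the Python sketch below locates every GATC in a sequence and picks the one closest to a mismatch, looking in either direction. It is a toy model of the search only, not of the Mut proteins; the example sequence and function names are invented for illustration.

```python
from typing import List, Optional


def gatc_sites(seq: str) -> List[int]:
    """Return the 0-based start index of every GATC in the sequence."""
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        start = seq.find("GATC", start + 1)
    return sites


def nearest_gatc(seq: str, mismatch_index: int) -> Optional[int]:
    """Return the GATC start closest to the mismatch, in either direction."""
    sites = gatc_sites(seq)
    if not sites:
        return None
    return min(sites, key=lambda site: abs(site - mismatch_index))


example = "TTGATCAAAAGGGATCTT"  # hypothetical sequence with two GATC sites
print(gatc_sites(example))       # [2, 12]
print(nearest_gatc(example, 9))  # 12
```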

In E. coli, mismatch repair involves action of the MutS, MutL, and MutH proteins. Similar proteins are found in humans and defects in these lead to a common form of colorectal cancer (see below). Repair by the E. coli proteins starts with binding of MutS to the mismatch site, followed by binding of MutL and MutH. MutH breaks a phosphodiester bond on the newly replicated daughter strand, followed by excision, replication, and ligation.

After the methylation system has acted on all GATC sites in the daughter strand, it is too late for the mismatch repair system to recognize the more recently synthesized DNA strand, and the system can no longer improve DNA replication fidelity. However, by operating before methylation, the mismatch repair system increases overall replication fidelity by about 100-fold, from about 1 error in 10^8 base pairs replicated to about 1 in 10^10.
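The fidelity figures can be turned into expected errors per round of replication. The sketch below assumes the error frequencies of 1 in 10^8 and 1 in 10^10 and the 4.8 million bp E. coli genome size quoted elsewhere in these notes; the names are illustrative.

```python
BASES_PER_ERROR_WITHOUT_MMR = 10**8  # about 1 error per 10^8 bp
BASES_PER_ERROR_WITH_MMR = 10**10    # about 1 error per 10^10 bp
GENOME_BP = 4_800_000                # E. coli genome size quoted in the notes


def fold_improvement(before: int, after: int) -> int:
    """How many times rarer errors become."""
    return after // before


def expected_errors(genome_bp: int, bases_per_error: int) -> float:
    """Average number of errors made while copying the genome once."""
    return genome_bp / bases_per_error


print(fold_improvement(BASES_PER_ERROR_WITHOUT_MMR, BASES_PER_ERROR_WITH_MMR))  # 100
print(expected_errors(GENOME_BP, BASES_PER_ERROR_WITHOUT_MMR))
print(expected_errors(GENOME_BP, BASES_PER_ERROR_WITH_MMR))
```

With these inputs, the expected number of errors per genome copy falls from roughly 0.05 to roughly 0.0005.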

Mutation and Cancer

Xeroderma pigmentosum (XP) is actually a family of diseases in which one or more enzymes of the excision repair pathways are deficient. The biological consequences of XP include extreme sensitivity to sunlight and a high incidence of skin cancers. In afflicted individuals, pyrimidine dimers accumulate and are only very poorly repaired. XP can be produced by a defect in the excinuclease that hydrolyzes the DNA backbone near a pyrimidine dimer (nucleotide excision). Eight other DNA repair gene defects can produce the same result. In affected humans, there is at present no known way to treat XP.

Another mutational problem known to lead to cancer is hereditary nonpolyposis colorectal cancer (HNPCC). It arises from mutations in mismatch repair genes. In humans, these genes include the hMSH2 and hMLH1 genes. These two genes are homologous to the MutS and MutL genes (above) of E. coli. Apparently, mutations that inactivate these genes lead to mutations arising throughout the genome. Over time, cell proliferation genes get altered, resulting in the characteristic uncontrollable growth of cancer cells.

Triplet Repeats

Yet another type of mutation is that found in Huntington's disease. In this neurological disorder, a long tandem array of repeats of a three-nucleotide sequence (CAG) occurs in a gene called huntingtin. Expression of the huntingtin gene produces a protein with many glutamines. Proteins with long glutamine stretches tend to aggregate and cause neurological problems. In unaffected people, between 6 and 31 repeats are normally found. Those afflicted with the disease may have 36-100 or more repeats of the sequence. In addition, the array tends to become longer with each generation. Children of a person with Huntington's disease will begin to show the disease earlier than their parents did.
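Counting the repeats in an array is straightforward to sketch in Python. The thresholds below are the ones given in these notes (6 to 31 repeats in unaffected people, 36 or more in affected people); the function names and labels are illustrative, and the code is in no way a diagnostic tool.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG array."""
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(repeats: int) -> str:
    """Compare a repeat count with the ranges quoted in the notes."""
    if repeats >= 36:
        return "range reported for affected people"
    if 6 <= repeats <= 31:
        return "range reported for unaffected people"
    return "outside the ranges given in the notes"


toy_gene = "AT" + "CAG" * 40 + "TT"  # hypothetical sequence with 40 repeats
count = longest_cag_run(toy_gene)
print(count)                         # 40
print(classify_repeat_count(count))  # range reported for affected people
```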

Ames Test

The Ames test provides a simple way to test for the tendency of materials to cause mutations. In this test, bacteria carrying a single mutation that prevents them from growing in the absence of histidine are exposed to the compound being tested and then plated on agar plates lacking histidine. Any cells growing on these plates must have suffered a mutation that restores growth without histidine. By comparing the number of colonies growing on plates without histidine for untreated cells and for treated cells, one can estimate how mutagenic a compound is. Many other variants on this simple test are available.