Photocross-linking of the NH2-terminal region of Taq MutS protein to the major groove of a heteroduplex DNA.

The MutS DNA mismatch repair protein recognizes heteroduplex DNAs containing mispaired or unpaired bases. To identify regions of MutS protein in close proximity to the heteroduplex DNA, we have utilized the photoactivated cross-linking moiety 5-iododeoxyuridine (5-IdUrd). Nucleoprotein complexes of Thermus aquaticus MutS protein bound to monosubstituted 5-IdUrd-containing heteroduplex DNAs were cross-linked with long-wavelength ultraviolet light. Positioning of the 5-IdUrd moiety at one of three positions within the DNA bulge, two nucleotides upstream or three nucleotides downstream of the unpaired base, resulted in an identical subset of cross-linked peptides as determined by proteolytic fingerprinting. The tryptic peptide cross-linked to an unpaired 5-IdUrd residue was determined by peptide sequencing to correspond to a highly conserved region spanning residues 25-49. Cross-linking to the bulge nucleotide occurred at Phe-39, indicating that this residue contacts, or is in close proximity to, the unpaired base of a heteroduplex DNA. Site-directed mutagenesis resulting in the substitution of Ala for Phe-39 reduced the affinity of the mutant protein for heteroduplex DNA by roughly 3 orders of magnitude, but had no apparent effect on its ability to dimerize, its thermostability, or its ATPase activity. These results implicate the region in the vicinity of Phe-39 as being crucial for heteroduplex DNA binding by Taq MutS protein.

DNA mismatch repair is critical for mutation avoidance in virtually all organisms. In addition to its role in the repair of base pair mismatches and insertion/deletion mutations, mismatch repair serves as a barrier to homologous recombination between evolutionarily divergent sequences (reviewed in Ref. 1). The importance of mismatch repair is highlighted in its role in cancer surveillance; defects in highly conserved components of mismatch repair have been implicated in both hereditary and sporadic tumors (reviewed in Ref. 2; see also Ref. 3). In addition, defects in mismatch repair have been implicated in the rapid evolution and spread of pathogenic virulence in enteric bacteria (4).
Recognition of mispaired or unpaired bases during mismatch repair is carried out by the family of MutS proteins whose members are found in many organisms from bacteria to man. Biochemical studies demonstrate that purified MutS proteins bind to heteroduplex DNAs containing unpaired or mispaired bases in vitro (reviewed in Refs. 1, 2, and 5); however, the molecular details of heteroduplex DNA recognition by MutS, including the identity of structural determinants involved in mismatch recognition, remain obscure.
To identify regions of MutS protein in close contact with the heteroduplex DNA, we have carried out photocross-linking of nucleoprotein complexes containing a MutS protein from Thermus aquaticus bound to a derivatized heteroduplex DNA containing 5-iododeoxyuridine (5-IdUrd), a photoactivated zero-length cross-linker (6). Taq MutS protein has a higher affinity in vitro for heteroduplexes containing insertions/deletions than for those containing a G:T mismatch and, on the basis of chemical probe experiments, appears to interact with both the major and minor grooves of a heteroduplex DNA in the immediate vicinity of an unpaired base (7,8).
Photocross-linking of DNA-protein complexes has been used to identify regions of a protein in close proximity to a bound DNA. The success of the approach hinges on three requirements: close proximity of an amino acid to the DNA (especially in the case of a zero-length cross-linking moiety), an appropriate geometry of that amino acid for cross-linking, and sufficient chemical reactivity of the potential target amino acid. For these reasons, not all amino acid residues in close contact with the DNA will be identified by cross-linking. Conversely, a cross-linked region, while close to the DNA, need not constitute part of a DNA-binding domain. Nevertheless, in the absence of a high-resolution structure, cross-linking experiments combined with mutational analysis of a candidate region can provide a starting point for identifying regions of a protein involved in DNA binding.
Photocross-linking of halogenated pyrimidines has been used to identify point contacts in DNA- and RNA-protein complexes (9-11). Specific DNA-protein photocross-linking utilizing the photoactivated 5-IdUrd cross-linking moiety has been reported by several groups. Willis et al. (6) demonstrated selective cross-linking of 5-IdUrd-substituted DNA to the Oxytricha telomere protein α-subunit following long-wavelength UV irradiation at 325 nm. Recently, we (12) and others (13) have utilized photocross-linking of a 5-IdUrd-substituted DNA to identify the DNA-binding domain of RecA protein, thereby confirming predictions based on crystallographic studies of RecA protein.
Photocross-linking with the 5-IdUrd cross-linker has several desirable characteristics. First, 5-IdUrd is nearly isosteric with thymine since the van der Waals radius of iodine is only 8% larger than that of the methyl group it replaces (6). Second, long-wavelength UV irradiation of 5-IdUrd-derivatized substrates results in specific cross-linking to 5-IdUrd while minimizing photodamage to other chromophores in the protein or DNA (6). Third, incidental cross-linking to regions of a protein not involved in DNA binding is minimized because the 5-IdUrd moiety is a zero-length cross-linker. Fourth, several amino acids can form adducts with photoactivated pyrimidines, including cysteine, serine, methionine, lysine, arginine, histidine, tryptophan, phenylalanine, and tyrosine (14).
In this paper, we show that a highly conserved region near the NH2 terminus of Taq MutS protein is cross-linked to the unpaired base of a heteroduplex DNA. Substitution of Ala for Phe-39, the point of cross-linking, results in a mutant protein that is significantly impaired in its ability to bind heteroduplex DNA, although it retains the ability to oligomerize, is thermostable, and retains functional ATPase activity. These results implicate the region in the vicinity of Phe-39 as being crucial for heteroduplex DNA binding by Taq MutS protein.

MATERIALS AND METHODS
Reagents-Taq MutS protein isolated from an Escherichia coli BL21 overproducing strain was purified to apparent homogeneity on Source 30Q and MonoQ HR 10/10 anion-exchange columns (Pharmacia Biotech Inc.) as described elsewhere (8). The concentration of protein refers to protein monomers and was based on a molar extinction coefficient of ε280 = 8.1 × 10⁴ M⁻¹ cm⁻¹ determined by amino acid analysis (8). Oligonucleotides were synthesized by automated β-cyanoethyl phosphoramidite DNA synthesis using 5-IdUrd-β-cyanoethyl phosphoramidites (Glen Research Corp.) on a Model 308B DNA synthesizer (Applied Biosystems, Inc.). DNA concentrations reflect the number of DNA molecules.
DNA Binding Assays-Gel mobility shift assays were carried out with 15 nM 32P-labeled DNA substrates and 280 nM Taq MutS protein as described previously (7). For competition studies, ~420 nM 32P-labeled monosubstituted DNA substrate Δ1-I(0) was incubated for 15 min at 37°C with 850 nM Taq MutS protein and the indicated molar excess of unlabeled substrate Δ1 over Δ1-I(0) in 10 μl of 20 mM Tris-HCl, pH 7.5, 10 mM MgCl2, and 0.1 mM DTT. Reactions were electrophoresed on 5% native polyacrylamide gels containing 7 mM MgCl2, followed by quantitation using a Fuji BAS 2000 phosphoimager.
Analytical Scale UV Irradiation, Protease Digestion, and Purification of Cross-linked Peptides-Approximately 100 nM 32P-labeled DNA substrate was preincubated for 15 min at 37°C with 440 nM Taq MutS protein (11.9 μg) in 20 mM Tris-HCl, pH 7.5, 10 mM MgCl2, and 0.1 mM DTT in a volume of 300 μl. The material was placed in a thin 1-mm quartz cuvette and irradiated in a UV Stratalinker 1800 equipped with 312-nm bulbs (Stratagene) for 4 h, resulting in the cross-linking of ~0.5 μg of MutS protein to DNA. This duration of irradiation maximized the yield of photocross-linked products while maintaining high specificity. The cross-linked sample was purified over oligo(dT)-cellulose (Boehringer Mannheim) as described below. Digestion with clostripain (Promega) was performed for 4 h with 20 μg of the protease, followed by digestion with trypsin or Staphylococcus aureus V8 protease (Promega). The trypsin digestions were performed overnight with 13 μg of modified trypsin (Promega). Prior to V8 protease digestion, samples were divided in two and diluted either with 50 mM NH4HCO3, pH 7.8, for cleavage at Glu or with 50 mM NaH2PO4 for cleavage at Glu >> Asp (15). Cleavages were carried out overnight at 37°C in the presence of 6.7 μg of S. aureus V8 protease. Finally, the cross-linked material was analyzed on urea-10% denaturing polyacrylamide gels.
Preparative UV Irradiation, Protease Digestion, and Purification of Cross-linked Peptides-45 nmol of 5-IdUrd-monosubstituted bulged DNA substrate Δ1-I(0) was incubated with 112 nmol (10 mg) of Taq MutS protein for 20 min at 37°C in 17.5 ml of buffer containing 20 mM Tris-HCl, pH 7.5, 10 mM MgCl2, and 0.1 mM DTT. Under these conditions, in excess of 85% of the DNA was bound by MutS protein. The majority of DNA was synthesized with a phosphate at the 5′-end; a small fraction was 32P-labeled by T4 polynucleotide kinase. Irradiation in a UV Stratalinker 1800 equipped with 312-nm bulbs (40 watts total) was carried out for 4 h with ventilation on a heat block to maintain the temperature between 30 and 40°C. The efficiency of cross-linking of DNA was 10%, indicating that ~1 mg of MutS protein was cross-linked to 32P-labeled DNA. The irradiated DNA-protein complexes were brought to 15 mM EDTA, 0.5 M NaCl, and 0.1% SDS and passed over a small oligo(dT)-cellulose column (1.2 g; Boehringer Mannheim) equilibrated with binding buffer 1 (40 mM Tris-HCl, pH 7.5, 0.1 mM EDTA, 0.2 mM DTT, 0.01% SDS, and 0.5 M NaCl). The column was sequentially washed with 3 volumes of binding buffer 2 (40 mM Tris-HCl, pH 7.5, 1 mM EDTA, 0.2 mM DTT, 0.1% SDS, and 0.5 M NaCl) and 3 volumes of binding buffer 1. The sample was eluted in elution buffer 1 (5 mM Tris-HCl, pH 7.5, 0.1 mM EDTA, 0.2 mM DTT, and 0.01% SDS).
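The protein quantities quoted for the analytical and preparative reactions can be cross-checked with a short calculation. The following is a minimal sketch, not part of the original protocol: the monomer molar mass is inferred from the stated equivalence of 112 nmol and 10 mg, and the function and variable names are hypothetical.

```python
# Monomer molar mass inferred from the preparative reaction (112 nmol == 10 mg).
# This is an inference from the quoted amounts, not a value stated in the text.
MONOMER_G_PER_MOL = 10e-3 / 112e-9  # roughly 8.9e4 g/mol


def protein_mass_ug(conc_nM: float, volume_ul: float,
                    g_per_mol: float = MONOMER_G_PER_MOL) -> float:
    """Mass of protein (in micrograms) present at conc_nM in volume_ul."""
    moles = conc_nM * 1e-9 * volume_ul * 1e-6
    return moles * g_per_mol * 1e6


def concentration_uM(amount_nmol: float, volume_ml: float) -> float:
    """Concentration in micromolar for amount_nmol dissolved in volume_ml."""
    return amount_nmol / volume_ml  # nmol/ml is numerically equal to micromolar


# Analytical reaction: 440 nM MutS in 300 microliters.
analytical_ug = protein_mass_ug(440, 300)

# Preparative reaction: 112 nmol MutS and 45 nmol DNA in 17.5 ml.
preparative_protein_uM = concentration_uM(112, 17.5)
preparative_dna_uM = concentration_uM(45, 17.5)
protein_to_dna_ratio = 112 / 45

print(f"analytical MutS mass: {analytical_ug:.1f} ug")
print(f"preparative MutS: {preparative_protein_uM:.1f} uM, "
      f"DNA: {preparative_dna_uM:.2f} uM, ratio {protein_to_dna_ratio:.2f}")
```

Under these assumptions the analytical reaction contains close to the 11.9 μg quoted above, and the preparative reaction uses roughly a 2.5-fold molar excess of MutS monomers over DNA.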
Samples were subjected to one cycle of clostripain digestion followed by three cycles of trypsin digestion as follows. Fractions containing 32P-labeled DNA were brought to 0.2% SDS, 3 mM DTT, and 50 mM Tris-HCl, pH 8.0, in a reaction volume of 0.2 ml. The tube was heated at 75°C for 20 min to denature the protein and cooled; 0.4 ml of clostripain buffer (20 mM Tris-HCl, pH 7.5, 2 mM CaCl2, and 2 mM DTT) was added. After 4 h of clostripain digestion with 60 μg of enzyme at 37°C, the reaction was further diluted by the addition of 1.2 ml of clostripain buffer containing 40 μg of clostripain and incubated overnight.
The sample was applied to an oligo(dT)-cellulose column (0.9 g) as described above. The cross-linked material was eluted in 1.5 ml of elution buffer 2 (5 mM Tris-HCl, pH 7.5, 0.1 mM EDTA, 0.2 mM DTT, and 1 M urea) and concentrated in a SpeedVac to ~0.1 ml. A small fraction was saved for analysis. The cross-linked material was heated at 75°C for 25 min. After denaturation, the reaction was cooled and diluted with 0.9 ml of trypsin buffer (50 mM Tris-HCl, pH 7.5, and 2 mM CaCl2). After 4 h of digestion with 80 μg of modified trypsin at 37°C, an additional 80 μg of trypsin was added, and the cleavage was continued overnight.
The sample was again purified by oligo(dT)-cellulose (0.7 g) chromatography and eluted in elution buffer 1. The buffer was adjusted to 0.2% SDS, 2 mM DTT, and 50 mM Tris-HCl, pH 8.0, in 0.1 ml. The eluted material was heated at 75°C for 20 min and cooled to room temperature. Trypsin buffer (0.45 ml) and 100 μg of trypsin were added. The digestion proceeded at 37°C overnight.
The sample was again purified by oligo(dT)-cellulose (0.6 g) chromatography, eluted in elution buffer 2, and concentrated in a SpeedVac. The final round of trypsin digestion proceeded overnight in the presence of 180 μg of modified trypsin. After purification on oligo(dT)-cellulose, the material was loaded on a urea-10% denaturing polyacrylamide gel. Radioactive material visualized by autoradiography was excised from the gel, eluted, and sent to the W. M. Keck Foundation Biotechnology Resource Laboratory at Yale University for peptide sequencing.
Site-directed Mutagenesis and F39A Purification-For mutagenesis, a fragment containing the entire coding region of Taq MutS protein was recloned from pETMutS (7) into a pET23 expression vector (Novagen). pETMutS was digested with SacII and blunt-ended. Following digestion with NdeI, the NdeI-SacII fragment containing the coding region of Taq MutS was ligated into NdeI/BamHI-digested pET23 in which the BamHI site was blunt-ended. The resulting construct, pET23MutS, contains the entire coding region of the wild-type Taq MutS protein. A single amino acid substitution, Ala for Phe-39, was introduced into the coding region of Taq MutS in the pETMutS vector using the Quick-Change mutagenesis kit (Stratagene). The DNA synthesis primers used were 5′-CTC TTC CAG GTG GGG GAC GCC TAC GAG TGC TTC GGG GAG and 5′-CTC CCC GAA GCA CTC GTA GGC GTC CCC CAC CTG GAA GAG, with the mutated nucleotides underlined. The mutagenized plasmid was verified by sequencing and linearized with NdeI and BamHI. The resulting 218-base pair NdeI-BamHI fragment containing the mutation was ligated to pET23MutS linearized with NdeI and BamHI. The resulting construct, pET23MutS-F39A, was verified by sequencing.
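The two mutagenic primers can be checked for internal consistency with a few lines of code. The sketch below is illustrative only: it takes the primer sequences as printed (with spacing and line-break hyphens removed), assumes the triplet grouping shown in the text is the reading frame, and uses the standard genetic code. The helper names are hypothetical.

```python
# Primer sequences as printed in the text, with spaces removed.
FORWARD = "CTCTTCCAGGTGGGGGACGCCTACGAGTGCTTCGGGGAG"
REVERSE = "CTCCCCGAAGCACTCGTAGGCGTCCCCCACCTGGAAGAG"

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate a DNA sequence in frame 1 using the standard code."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


peptide = translate(FORWARD)

# Assumption: the single Ala codon in the primer is the mutated position 39,
# so the first codon of the primer corresponds to residue 39 - index_of_Ala.
first_residue = 39 - peptide.index("A")
residues = {first_residue + i: aa for i, aa in enumerate(peptide)}

print(peptide)                                  # translated forward primer
print(reverse_complement(FORWARD) == REVERSE)   # primers are complementary
print({pos: residues[pos] for pos in (34, 39, 43)})
```

With this numbering the primer encodes Ala at position 39 and Phe at positions 34 and 43, which agrees with the phenylalanines identified by peptide sequencing, and the second primer is the exact reverse complement of the first.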
Both the wild-type and F39A mutant Taq MutS proteins were purified from E. coli BL21 by heating the crude lysate at 70°C, followed by passage over a MonoQ HR 10/10 column and a Hi-Load Sephacryl S-300 16/60 column (Pharmacia Biotech Inc.) as described (8). The concentration of the F39A mutant protein was determined by comparison with the wild-type Taq MutS protein after staining SDS-polyacrylamide gels with SYPRO Orange (Molecular Probes, Inc.) and quantitation on a Molecular Dynamics Storm 860 as well as by spectrophotometric measurement.

RESULTS
Heteroduplex DNA Binding by Taq MutS Protein-The photocross-linking scheme necessitated modification of heteroduplex DNA substrates containing an unpaired base, usually a thymidine. The modified heteroduplexes differ from the canonical heteroduplex (Δ1) in that the former (Δ1-I(0), Δ1-I(-2), and Δ1-I(+3)) each contain a single photoactivated 5-IdUrd cross-linking moiety at one of three different positions in the heteroduplex (Fig. 1). In addition, the modified heteroduplexes used in cross-linking experiments contain an oligo(dA)18 single-strand region at the 5′-end of the strand bearing the 5-IdUrd moiety that facilitates purification of the cross-linked species. Since the van der Waals radius of iodine is only 8% larger than that of the methyl group it replaces (6), it is unlikely that the monosubstitution of thymine by photoactive 5-IdUrd would significantly alter the structure of the MutS nucleoprotein complex.
Here, we show that Taq MutS fails to recognize either the poly(dA)18 single-strand region or a 5-IdUrd-bearing homoduplex. This result is consistent with previous observations showing that Taq MutS does not readily form complexes with single-strand DNA or DNA homoduplexes (7). The ability of Taq MutS protein to bind to each of the heteroduplex DNAs and their corresponding control homoduplexes was assessed in a gel mobility shift assay as shown in Fig. 1. As expected, Taq MutS protein readily formed a complex with the bulged Δ1 substrate (Fig. 1, lane 2) and did not bind to the perfect homoduplex AT1 (lane 3). Taq MutS protein also efficiently formed nucleoprotein complexes with 5-IdUrd-containing heteroduplexes Δ1-I(0) (lane 5), Δ1-I(-2) (lane 7), and Δ1-I(+3) (lane 9). Furthermore, these complexes had the same electrophoretic mobility as that formed with the control Δ1 substrate (lane 2). In contrast, we failed to detect any binding of Taq MutS protein to homoduplex DNA substrates lacking an unpaired base, but containing poly(dA) tails and one 5-IdUrd substitution (lanes 6, 8, and 10).
All DNA binding experiments were conducted at 37°C, and UV irradiation of nucleoprotein complexes was carried out at 30-40°C (see "Materials and Methods"). We have previously shown that the extent of DNA binding by Taq MutS and the discrimination between heteroduplex and homoduplex DNAs are unchanged over a wide temperature range from 4 to 70°C (7).
The mode of heteroduplex binding by Taq MutS protein to the 5-IdUrd-substituted Δ1-I(0) heteroduplex was assessed in competition experiments. As shown in Fig. 2, an increasing molar excess of the unsubstituted Δ1 heteroduplex effectively competes with the 32P-labeled, 5-IdUrd-substituted Δ1-I(0) heteroduplex for binding to Taq MutS protein. Quantitation of the extent of competition reveals that a 2-fold molar excess of the Δ1 heteroduplex reduces binding of Taq MutS to the derivatized Δ1-I(0) heteroduplex by 50%. From the data shown in Figs. 1 and 2, we conclude that Taq MutS protein binds specifically to heteroduplex DNAs bearing an unpaired base and that this binding is not appreciably altered in heteroduplexes containing the photocross-linking derivative.
Photocross-linking of Heteroduplex-MutS Complexes-Analytical scale photocross-linking experiments were carried out with the Δ1-I(0) heteroduplex and Taq MutS protein to ascertain that cross-linking was specific for heteroduplex-MutS complexes. DNA binding reactions between Taq MutS protein and either the Δ1-I(0) heteroduplex or the corresponding AT-I(0) homoduplex (see Fig. 1) were carried out as described above. Samples were then irradiated at 312 nm to effect cross-linking as described under "Materials and Methods," and the resulting material was visualized after SDS-polyacrylamide gel electrophoresis on 10-20% Tricine gels (Fig. 3). Cross-linking was achieved only when Taq MutS protein was incubated with the Δ1-I(0) heteroduplex and UV-irradiated and resulted in the appearance of a single cross-linked species as judged by electrophoretic mobility (Fig. 3, lane 4). The yield of cross-linked complexes was routinely 10-13%. Omission of either MutS protein or UV irradiation failed to yield any cross-linked species. As expected, no cross-linked material was detected in the case of the 5-IdUrd-substituted homoduplex (lane 8), consistent with the DNA substrate specificity of Taq MutS protein shown in Fig. 2.
Mapping of Cross-linked Peptides-To determine the optimal placement of the 5-IdUrd cross-linking group within the heteroduplex DNA and to obtain preliminary information on the nature of any cross-linked peptide, we carried out cross-linking on an analytical scale followed by peptide fingerprinting. Dimethyl sulfate footprinting experiments revealed that interactions of Taq MutS protein with the major groove of a heteroduplex are limited to several base pairs on either side of a mispaired or unpaired base (8). Correspondingly, the 5-IdUrd moiety was positioned 2 bases 5′ or 3 bases 3′ of the unpaired thymidine, Δ1-I(-2) and Δ1-I(+3), respectively, or was itself the unpaired base, Δ1-I(0) (see Fig. 1).
The 32P-labeled Δ1-I(0), Δ1-I(-2), and Δ1-I(+3) heteroduplexes were incubated with a 2-fold molar excess of Taq MutS protein, followed by UV irradiation to effect cross-linking as described under "Materials and Methods." Cross-linked complexes were purified from free MutS protein by passage over an oligo(dT) column after denaturation in the presence of 0.1% SDS and 15 mM EDTA. The addition of SDS, together with EDTA to chelate free Mg2+, reduced the chance of binding of free MutS protein to the oligo(dT) column since, as we have previously shown, DNA binding exhibits an absolute requirement for Mg2+ (7). The presence of SDS during affinity chromatography and clostripain digestion also minimized losses of cross-linked material. Cross-linked complexes were then subjected to sequential rounds of proteolysis with clostripain and S. aureus V8 proteases. At each step of proteolysis, cross-linked peptides covalently bound to heteroduplex DNAs containing single-strand oligo(dA) tails were recovered by affinity chromatography over oligo(dT) columns.
The use of DNA affinity chromatography for the purification of cross-linked peptides afforded several advantages over other methods that depend on characteristics of the peptide moiety. First, since the chromatography step is based on the recovery of heteroduplex DNAs rather than on any property of the cross-linked peptide, we minimized any bias in the recovery of cross-linked species. Second, the recovery of cross-linked material was compared at each round of purification to the recovery of uncross-linked DNAs, providing a means for monitoring selective as opposed to nonspecific losses of cross-linked peptides. Third, since Taq MutS protein exhibits insignificant binding to single-strand DNA (7), the risk of contaminating free protein and resulting peptides was lowered.
The cross-linked proteins were digested with clostripain, which cleaves on the carboxyl side of Arg, and were visualized by autoradiography after urea-denaturing polyacrylamide gel electrophoresis (Fig. 4). Each of the three heteroduplex DNAs cross-linked to a clostripain-derived peptide (denoted peptide A) was observed to have the same apparent electrophoretic mobility (Fig. 4, lanes 1, 5, and 9). Inspection of Fig. 4 reveals that the efficiency of cross-linking differed markedly among the heteroduplexes. Whereas 10-13% of the complexes were cross-linked in the case of the Δ1-I(0) substrate, <1% of the complexes were cross-linked to the Δ1-I(-2) or Δ1-I(+3) heteroduplex.
Peptide fingerprints of the cross-linked peptide obtained by sequential rounds of digestion with two additional proteases strongly suggest that an identical MutS peptide is cross-linked to each of the three heteroduplexes. The clostripain-treated peptides were digested with trypsin, which cleaves at the carboxylic side of both Arg and Lys residues, followed by digestion with S. aureus V8 protease, which cleaves after Glu and Asp residues. Trypsin digestion did not result in the appearance of any new species (Fig. 4, lanes 2, 6, and 10), suggesting that the clostripain peptides cross-linked to DNA do not contain internal lysines and are the shortest tryptic peptides cross-linked to 5-IdUrd (see below). These data also suggest that only one region of Taq MutS protein is being efficiently cross-linked to the 5-IdUrd-substituted heteroduplex DNA.
Subsequent incubation with V8 protease under conditions in which cleavage occurs only at Glu, in NH4HCO3 buffer (Fig. 4, lanes 3, 7, and 11), or under conditions in which cleavage at Glu is favored over Asp, in NaH2PO4 buffer (lanes 4, 8, and 12), resulted in the appearance of a major cross-linked product (denoted peptide A′) as well as several minor species with identical electrophoretic mobilities for all three heteroduplexes. This result indicates that the cross-linked, clostripain-derived peptide has internal Glu residues. We have routinely observed that V8 protease cleaves at Glu residues more efficiently in the presence of the phosphate buffer compared with the carbonate buffer (12). This most likely accounts for the increase in abundance of product A′ in lanes 4, 8, and 12 compared with lanes 3, 7, and 11.
Δ1-I(0) substrate is due to the following. First, the frequency of photocleavage is greater when the 5-IdUrd moiety is paired with an adenine and stacked within the DNA helix as opposed to being unpaired. Second, the presence of a bound protein in the vicinity of 5-IdUrd may inhibit DNA photocleavage.
Amino Acid Sequencing of Cross-linked Tryptic Peptides-The identity of the photocross-linked MutS peptide was unambiguously determined by amino acid sequencing. Taq MutS protein was incubated with the 32P-labeled Δ1-I(0) heteroduplex on a preparative scale and irradiated at 312 nm. As described for the analytical cross-linking shown in Fig. 3, the yield of cross-linking was ~10%. Due to the low levels of cross-linking of the Δ1-I(-2) and Δ1-I(+3) heteroduplexes described above, peptide sequencing of these complexes was not attempted. Following cross-linking of the Δ1-I(0) heteroduplex, the addition of SDS and EDTA, and passage over an oligo(dT) column to remove unbound MutS protein (see above), the cross-linked material was subjected to clostripain digestion. A small aliquot was analyzed by denaturing gel electrophoresis, revealing that the clostripain digestion was incomplete, resulting in larger, underdigested products (compare Fig. 4 (lane 1) with Fig. 5 (lane 2)). Such a result was not unexpected since, in the preparative cross-linking experiment in Fig. 5, 1 mg of cross-linked MutS protein was digested with 100 μg of clostripain, whereas in the peptide fingerprinting experiment shown in Fig. 4, 0.5 μg of cross-linked protein was digested with 20 μg of clostripain. We have also observed that the thermostable Taq MutS protein is relatively resistant to proteolytic cleavage by a variety of proteases (I. Biswas and P. Hsieh, unpublished observations).
The clostripain-treated material was repurified on an oligo(dT)-cellulose column and subjected to three additional cycles of trypsin digestion, each followed by purification on an oligo(dT)-cellulose column. Urea-denaturing polyacrylamide gel electrophoresis of the cross-linked material after the final round of trypsinization revealed the presence of three 32P-labeled species (labeled A, B, and C) as well as free 32P-labeled oligonucleotide and DNA photocleavage products (Fig. 5, lane 3).
Material from bands A-C and free DNA (Oligo) was excised from the gel and subjected to amino acid sequencing. After the extensive proteolysis and purification scheme, the final yield of all 32P-labeled material that was subjected to amino acid sequencing was 36%. More important, the yield of cross-linked DNAs corresponding to bands A-C after proteolysis and purification was 8% of all 32P-labeled material compared with 10% of all 32P-labeled material prior to proteolysis and purification. Thus, the recoveries of cross-linked and non-cross-linked DNAs were similar throughout the proteolysis and purification procedure. At each cycle of proteolytic digestion and oligo(dT) purification, the recovery of 32P-labeled material was in excess of 70%. These results suggest that our experimental scheme did not result in selective loss of the major cross-linked species.
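The recovery figures above can be related to one another with simple arithmetic. The sketch below is a rough consistency check rather than a description of how the yields were measured: the number of loss-bearing steps is an assumption (four digestion and purification cycles are used here for illustration), and the function names are hypothetical.

```python
def per_step_recovery(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean recovery per step that gives overall_yield after n_steps."""
    return overall_yield ** (1.0 / n_steps)


def relative_crosslink_recovery(frac_before: float, frac_after: float) -> float:
    """Recovery of cross-linked DNA relative to all labeled DNA."""
    return frac_after / frac_before


OVERALL_YIELD = 0.36        # final yield of all 32P-labeled material (from the text)
CROSSLINKED_BEFORE = 0.10   # cross-linked fraction before proteolysis (from the text)
CROSSLINKED_AFTER = 0.08    # cross-linked fraction after proteolysis (from the text)
ASSUMED_STEPS = 4           # assumption: one clostripain and three trypsin cycles

mean_step = per_step_recovery(OVERALL_YIELD, ASSUMED_STEPS)
relative = relative_crosslink_recovery(CROSSLINKED_BEFORE, CROSSLINKED_AFTER)
crosslinked_overall = OVERALL_YIELD * relative

print(f"mean recovery per step: {mean_step:.2f}")
print(f"cross-linked vs. total recovery: {relative:.2f}")
print(f"implied overall yield of cross-linked DNA: {crosslinked_overall:.2f}")
```

With four assumed steps the mean recovery per step is about 77%, in line with the stated recovery of more than 70% per cycle, and the cross-linked species is recovered at about 80% of the efficiency of total labeled DNA.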
Amino acid sequencing of peptide A yielded an unambiguous match with a region from the NH2 terminus of Taq MutS corresponding to amino acid residues 25-49 (Fig. 5). After 25 cycles of sequencing terminating in an Arg residue, no additional amino acid sequence was detected. As expected from the pattern of clostripain and trypsin digestion seen in Figs. 4 and 5, no Lys residues were identified in peptide A, and the deduced peptide sequence of the intact Taq MutS protein predicts that peptide A is immediately preceded by Arg-24. No amino acid was identified in cycle 15, which corresponds to Phe-39. Since Phe-34 and Phe-43 were detected in cycles 10 and 19, respectively, this strongly suggests that Phe-39 is the point of cross-linking.
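The correspondence between sequencing cycles and residue numbers follows directly from the position of the first residue of the peptide. As a small illustration (the function names are hypothetical and not part of the original analysis):

```python
PEPTIDE_A_START = 25  # first residue of peptide A, which follows Arg-24


def residue_for_cycle(cycle: int, start: int = PEPTIDE_A_START) -> int:
    """Residue number read out in a given sequencing cycle (cycle 1 = start)."""
    return start + cycle - 1


def cycle_for_residue(residue: int, start: int = PEPTIDE_A_START) -> int:
    """Sequencing cycle in which a given residue number is expected."""
    return residue - start + 1


# Cycles discussed in the text: 10, 15 (blank cycle), 19, and 25 (final cycle).
for cycle in (10, 15, 19, 25):
    print(f"cycle {cycle:2d} -> residue {residue_for_cycle(cycle)}")
```

Cycle 15 maps to residue 39 and the final cycle, 25, maps to residue 49, matching the assignments given above.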
The cross-linked material corresponding to band A migrated with the identical mobility as the clostripain-derived peptide obtained from analytical cross-linking experiments shown in Fig. 4 (band A) and represents the limit digest with trypsin. The recovery of the same cross-linked tryptic peptide after the abbreviated purification scheme in the analytical experiment and after extensive rounds of purification and proteolytic digestion in the preparative case strongly suggests that, under our cross-linking conditions, only one region of MutS protein is efficiently cross-linked to the heteroduplex DNA.
Based on electrophoretic mobilities, proteolytic precursor-product relationships, and amino acid sequencing, the peptides corresponding to bands B and C were shown to be derived from incomplete proteolysis of peptide A. First, the slower migrating B and C bands were not detected during analytical peptide mapping where the extent of proteolysis was greater (Fig. 4). Second, after the second cycle of preparative trypsin digestion, bands B and C predominated and band A was a minor species (data not shown), whereas after the third round of trypsin digestion, band A predominated at the expense of bands B and C (Fig. 5, lane 3). Third, amino acid sequencing of peptides from bands B and C indicated that they both initiate at Gly-7, consistent with trypsin cleavage at Lys-6 (7). Sequencing was carried out for 19 cycles ending at Asp-25, indicating that peptides B and C start upstream of peptide A and overlap with peptide A. The exact position of cross-linking of peptides B and C was downstream of Asp-25 and could not be determined. As expected, no amino acid sequence was obtained from the material corresponding to free oligonucleotide.
Loss of Heteroduplex Binding in a F39A Mutant Protein-Site-specific mutagenesis of the coding region of Taq MutS protein was carried out, resulting in a single amino acid substitution of Ala for Phe-39, the point of cross-linking. The wild-type and F39A mutant Taq MutS proteins were purified in parallel from E. coli.
The F39A protein exhibited the same anomalous behavior upon gel filtration on Sephacryl S-300 as the wild-type protein (Fig. 6A), yielding apparent sizes of 290 kDa for the F39A mutant compared with 280 kDa for the wild-type protein. Since the wild-type Taq MutS protein exists primarily as a dimer in solution (I. Biswas and P. Hsieh, unpublished observations), this result indicated that the substitution of Ala for Phe-39 did not significantly alter the overall structure of the protein and did not prevent oligomerization of the mutant monomers. Circular dichroism revealed that the F39A mutant protein has, to a first approximation, retained the native conformation observed for the wild-type protein and ruled out the possibility that the mutant polypeptide is grossly misfolded or denatured (data not shown).
The thermal stability of the mutant protein as well as its ATPase activity were assessed in parallel with the wild-type protein. We have previously shown that the wild-type Taq MutS protein has a thermostable ATPase activity in which ATP is hydrolyzed to ADP and Pi (7). As shown in Fig. 6B, the ATPase activity of the F39A mutant protein was very similar to that of the wild-type protein at 70°C. In addition, both the wild-type and mutant MutS proteins retained full ATPase activity after incubation for 20 min at 70°C. Thus, the substitution at Phe-39 has no significant effect on either the thermostability or the ATPase activity of MutS protein. The similar apparent size as determined by gel filtration chromatography, the similar CD spectra, and the same thermal stability and ATPase activity of the wild-type and F39A mutant proteins are wholly consistent with the absence of gross conformational changes in the mutant protein.
The ability of the F39A mutant protein to bind heteroduplex DNA was assessed in gel mobility shift assays. As shown in Fig. 6C, the wild-type Taq MutS protein readily bound to a heteroduplex DNA containing an unpaired thymidine residue. In contrast, the F39A mutant protein bound very weakly to the same heteroduplex, requiring protein concentrations in excess of 10⁻⁷ M (although the size of the DNA-F39A complex was unchanged as judged by electrophoretic mobility). Examination of the binding data indicates that substitution of Ala for Phe-39 may have lowered the relative affinity for heteroduplex DNA by some 3 orders of magnitude. The best fit of the data in Fig. 6C, assuming that at equilibrium one dimer of MutS binds to a single heteroduplex DNA, yielded apparent dissociation constants of 6 × 10⁻¹⁰ and 1 × 10⁻⁶ M for the wild-type and F39A proteins, respectively. While these are only rough approximations of the relative affinities for heteroduplex DNA, the data in Fig. 6C clearly establish that heteroduplex binding is severely impaired in the F39A mutant. In fact, the high protein concentration (>10⁻⁷ M) required for residual heteroduplex binding of the mutant protein is similar to that required for binding to perfect homoduplex DNA by the wild-type Taq MutS protein (data not shown). We were unable to detect any binding to homoduplex DNA by the F39A mutant protein under these conditions.

DISCUSSION
Photocross-linking of Taq MutS protein to a derivatized heteroduplex DNA containing a 5-IdUrd cross-linking moiety reveals that a region at the NH2 terminus of MutS is closely associated with the major groove of the heteroduplex DNA. Peptide sequencing of the limit trypsin digest of the cross-linked peptide indicates that it maps to residues 25-49, with Phe-39 being the point of cross-linking.
Substitution of Ala for Phe-39 results in a mutant protein whose relative affinity for heteroduplex DNA is roughly 3 orders of magnitude lower than that of the wild-type MutS protein. The severe deficiency in DNA binding resulting from a single amino acid change at Phe-39 is not attributable to a gross alteration in the conformation of the mutant protein. The F39A mutant protein is able to dimerize like its wild-type counterpart. In addition, the F39A mutant protein retains thermostability and an ATPase activity that is essentially unchanged from that of the wild-type MutS protein.
Taken together, these data strongly implicate the region near Phe-39 as being critical for heteroduplex binding by Taq MutS protein.
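As a rough check on the magnitude of the binding defect, the apparent dissociation constants reported in the Results (6 × 10⁻¹⁰ M for wild type and 1 × 10⁻⁶ M for F39A) can be compared under a simple single-site model in which one MutS dimer binds one heteroduplex. The sketch below is illustrative only: the function names are hypothetical, and it assumes that protein is in large excess over DNA, so that free protein concentration approximates total protein concentration. It is not the fitting procedure used for Fig. 6C.

```python
import math

# Apparent dissociation constants quoted in the text (molar).
KD_WILD_TYPE = 6e-10
KD_F39A = 1e-6


def fraction_bound(protein_conc, kd):
    """Fraction of DNA bound for single-site binding.

    Assumes (for illustration) that free protein ~ total protein,
    i.e. protein is in large excess over the DNA probe.
    """
    return protein_conc / (kd + protein_conc)


def fold_reduction(kd_mutant, kd_wild_type):
    """Ratio of dissociation constants (mutant / wild type)."""
    return kd_mutant / kd_wild_type


fold = fold_reduction(KD_F39A, KD_WILD_TYPE)
orders_of_magnitude = math.log10(fold)

# Predicted occupancy at 1e-7 M protein, the concentration above which
# residual binding by the mutant was observed.
occupancy_wt = fraction_bound(1e-7, KD_WILD_TYPE)
occupancy_f39a = fraction_bound(1e-7, KD_F39A)

print(f"fold reduction in affinity: {fold:.0f}")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
print(f"fraction bound at 1e-7 M (wild type): {occupancy_wt:.2f}")
print(f"fraction bound at 1e-7 M (F39A): {occupancy_f39a:.2f}")
```

Under these assumptions the ratio of the two constants is about 1,700-fold, a little over 3 orders of magnitude. At 10⁻⁷ M protein the model predicts near-saturating binding by the wild-type protein and less than 10% occupancy for F39A, in line with the weak binding of the mutant seen only above that concentration.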
The importance of the region in the vicinity of Phe-39 of Taq MutS protein for heteroduplex binding is supported by the extensive sequence conservation of this region among prokaryotic and eukaryotic MutS proteins. While the highest degree of conservation resides in regions near the C terminus including the Walker consensus sequence for magnesium-dependent ATP binding and a helix-turn-helix motif (7), the cross-linked region spans a number of conserved residues. Phe-39, the point of cross-linking, is itself highly conserved with two exceptions. In yeast and human MSH3 proteins, the corresponding position is a Lys residue. Interestingly, in both cases, the Lys residue is flanked on both sides by aromatic Tyr residues. Inspection of the region in the vicinity of Phe-39 reveals that it is very hydrophobic, with conserved aromatic residues at positions 27, 34, 39, 40, and 43. A hypothesis, as yet untested, is that this region is involved in hydrophobic interactions with the major groove of a heteroduplex DNA. The region shown in Fig. 7 corresponds to the smallest tryptic peptide cross-linked to the DNA; the actual region at the N terminus of MutS involved in DNA binding may be larger or smaller.
While the cross-linked region is conserved in virtually all MutS homologs, there are two notable exceptions. A search of GenBank™ data base sequences (Release 100) using BLAST for homology to the first 50 amino acids of Taq MutS protein identified all known members of the MutS family except Saccharomyces cerevisiae MSH4 and MSH5. Multiple sequence alignments of MutS family members using Pileup (Genetics Computer Group, Inc., Madison, WI) suggest that MSH4 and MSH5 may have lost the putative DNA-binding domain identified in this study since the first amino acids of MSH4 and MSH5 align at Taq MutS positions 73 and 101, respectively. Interestingly, the MSH4 and MSH5 homologs do not appear to function in mismatch repair since msh4 (17) and msh5 (18) mutants do not have a mutator phenotype. Instead, these mutants have increased levels of nondisjunction of homologous chromosomes at meiosis I resulting from decreased levels of reciprocal exchanges between homologs, suggesting that MSH4 and MSH5 proteins function in the regulation of homologous recombination. On the basis of these findings, it has been proposed that MSH4 and MSH5 have evolved to interact with intermediates of homologous recombination rather than mismatches or unpaired bases. These findings concerning MSH4 and MSH5 lend indirect support to our contention that the NH2-terminal region of MutS is involved in the recognition of heteroduplex DNA during mismatch repair.
The cross-linking data constitute independent support for interactions of MutS proteins with the major groove of the heteroduplex DNA. Methylation footprinting and interference studies of Taq MutS (8) and a human MutS homolog, GTBP (19), bound to a heteroduplex established that these MutS proteins contact the major groove of the heteroduplex in the vicinity of an unpaired or mispaired base. The 5-IdUrd cross-linking moiety used in this study is a zero-distance cross-linker positioned in the major groove of the heteroduplex. Although the geometry of the unpaired 5-IdUrd base within the nucleoprotein complex containing heteroduplex Δ1-I(0) is unknown, the fact that the identical clostripain peptide was cross-linked to heteroduplexes Δ1-I(−2) and Δ1-I(+3) in which the 5-IdUrd moieties are stacked within the duplex (20) suggests that the NH2-terminal region of Taq MutS protein is in close proximity to the major groove near an unpaired base. These data suggest that the specificity of heteroduplex DNA recognition is achieved by the interaction of MutS protein with the mispaired or unpaired base and, at a minimum, constituents residing in the major groove of flanking residues near the lesion. Interactions involving more distal regions of the heteroduplex detected by DNase I footprinting (8,21) may serve to increase the affinity of MutS protein for nonspecific DNA contacts.
Photocross-linking of Phe-39 to a heteroduplex and characterization of a mutant protein bearing a substitution of Ala for Phe-39 suggest that Phe-39 is involved in heteroduplex DNA binding by MutS protein. We note, however, that the exact role of Phe-39 in DNA binding has not been established. Phe-39 may make direct contacts with the unpaired base. If so, the low levels of cross-linking to the heteroduplexes substituted at positions −2 and +3 must involve residues other than Phe-39, a prediction we have not tested. Alternatively, despite its close proximity to the unpaired base, Phe-39 may have an indirect role in DNA binding, contributing, for example, to the conformational stability of the DNA-binding site. Finally, the identification of other residues critical for DNA binding that reside in the vicinity of Phe-39 as well as elsewhere in the MutS polypeptide awaits further study.

FIG. 7. Sequence comparison of the cross-linked NH2-terminal region of Taq MutS with other MutS homologs. Sequences were obtained from GenBank™ and were aligned using the Pileup program. The black shading denotes identical residues; the gray shading denotes highly conserved residues. The arrow corresponds to the position of cross-linking at Phe-39. Numbering refers to the Taq sequence.