The High Mobility Group Domain Protein Cmb1 of Schizosaccharomyces pombe Binds to Cytosines in Base Mismatches and Opposite Chemically Altered Guanines*

The mismatch-binding activity Cmb1 of Schizosaccharomyces pombe was enriched from wild type cells, and N-terminal sequencing enabled cloning of the respective gene. The deduced amino acid sequence of cmb1+ contains a high mobility group domain, a motif that is common to a heterogeneous family of DNA-binding proteins. In crude protein extracts of a cmb1 gene-disruption strain, specific binding to C/T, C/A, and C/Δ was abolished. Weak binding to C/C revealed the presence of a second mismatch-binding activity, Cmb2. Cmb1, enriched from S. pombe and purified from Escherichia coli, bound specifically to C/C, C/T, C/A, T/T, and C/Δ but showed little or no affinity to other mismatches and small loops. Cmb1 recognizes 1,2 GpG intrastrand cross-links, produced by the chemotherapeutic drug cisplatin, when two cytosines are opposite the cross-linked guanines but not when other bases are present. Consistently, O6-methylguanine:C but not O6-methylguanine/T lesions were bound. Thus, cytosines in mismatches and opposite chemically modified guanines are the preferred target of Cmb1 recognition. cmb1 mutant cells are more sensitive to cisplatin than wild type cells, indicating a role of Cmb1 in repair of cisplatin-induced DNA damage.

DNA can be damaged by a variety of physical and chemical agents. Most types of lesions are efficiently removed from DNA by repair mechanisms. The nucleotide excision repair (NER) system is able to correct a broad spectrum of adducts, including cyclobutane pyrimidine dimers and 6-4 photoproducts caused by ultraviolet light, intrastrand cross-links formed by cis-diamine-dichloroplatinum(II) (cisplatin), as well as O6-methylguanine (O6meG) produced by methylating agents such as N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) (1-3). The mutHLS-like mismatch-repair pathway efficiently corrects most base-base mismatches and small DNA loops, which frequently occur during replication by misincorporation of bases and strand slippage, respectively (4, 5). A defect in the human genes hMSH2 (a homologue of Escherichia coli MutS), hMLH1, hPMS1, or hPMS2 (all of which are MutL homologues) is responsible for the heritable cancer syndrome hereditary nonpolyposis colon cancer. The clinical symptoms of patients with homozygously mutated hMSH2, hMLH1, or hPMS2 genes are accompanied by microsatellite instability and mismatch repair deficiency, whereas there is no indication that hPMS1 has a role in mismatch repair (4, 6, 7). The human mismatch-binding activity hMutSα, a heterodimer of the two MutS homologous proteins hMSH2 and GTBP (8, 9), also recognizes DNA lesions containing O6meG, O4-methylthymine, and cisplatin-induced 1,2 GpG intrastrand cross-links (10). Thus, mismatch-repair proteins and NER proteins might compete in repair of some types of DNA lesions. Whereas repair mediated by the NER system normally restores wild type information, the action of mismatch-repair proteins on such lesions instead contributes to cytotoxicity (11, 12). Several high mobility group (HMG) proteins are known to bind to GpG intrastrand cross-links (13). Binding by HMG-1 shields the lesion from excision by the human excinuclease protein complex (14). Consistently, a Saccharomyces cerevisiae mutant, defective in the HMG-box gene IXR1, is more resistant to cisplatin than wild type (15).
We have identified a mismatch-binding activity in Schizosaccharomyces pombe crude extracts showing high affinity to cytosine-containing mismatches (16). Here, we describe the purification of this activity, Cmb1, and cloning of the respective gene. DNA sequencing revealed that Cmb1 contains a HMG domain. Cmb1 was purified from an E. coli overexpression strain and tested for binding to mismatches, small DNA loops, and substrates containing either GpG intrastrand cross-links or O6meG. Furthermore, we addressed the question of whether a cmb1 mutant responds differently to the chemicals cisplatin, transplatin, and MNNG.

EXPERIMENTAL PROCEDURES
DNA Binding Analyses-Binding of Cmb1 protein to DNA substrates was tested with a band shift assay. Substrates and competitor DNA used to test binding to mismatches, loops, and O6meG lesions are in the M13mp9 polylinker sequence context (16). The sequence of the platinated substrates and their unplatinated controls is shown in Fig. 6A. For platination, 1 pmol of plus strand was incubated for 24 h at 37°C in 100 μl of 10 mM Tris-HCl (pH 7.5) containing either 0.1 mM cisplatin or 0.1 mM transplatin (Sigma). Unreacted cisplatin or transplatin was removed by chromatography on Sephadex G25 (Amersham Pharmacia Biotech). The extent of platination of the oligonucleotides was checked on a 6% polyacrylamide gel and was estimated to be approximately 60-80%. Modified and unmodified strands were 5′-end-labeled by T4 polynucleotide kinase (Promega) in the presence of [γ-32P]ATP and annealed with their complementary strands as described (16). To test binding to the various substrates, 20-μl reactions were performed for 15 min at 4°C in 25 mM Tris-HCl (pH 7.5), 0.5 mM dithiothreitol, 4 mM spermidine, 0.5 mM EDTA, 10% glycerol, 0.01 mM ZnCl2, 50-200 mM NaCl in the presence of 40 fmol radiolabeled substrate. The amount of unlabeled homoduplex DNA as competitor and the source of protein samples are indicated in the text. Reaction samples were separated by electrophoresis (110 V, 4°C) in 6% nondenaturing polyacrylamide gels in 40 mM Tris-HCl (pH 7.5), 20 mM sodium acetate, 1 mM EDTA. Subsequently, gels were subjected to autoradiography.
* This work was supported by the Swiss National Science Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EBI Data Bank with accession number(s) AJ002513.
‡ Supported by a Forschungsstipendium of the Deutsche Forschungsgemeinschaft. To whom correspondence should be addressed.
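The quantities given for the band-shift reactions under "DNA Binding Analyses" can be converted into concentrations with simple arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, and the 120-fold competitor excess is the value quoted in the figure legends, not a fixed part of the protocol.

```python
# Illustrative arithmetic for a band-shift reaction (helper names are hypothetical).
FMOL = 1e-15   # mol per femtomole
PMOL = 1e-12   # mol per picomole
UL = 1e-6      # litres per microlitre

def molar_concentration(amount_mol: float, volume_l: float) -> float:
    """Return the concentration in mol/l."""
    return amount_mol / volume_l

def competitor_amount(substrate_mol: float, fold_excess: float) -> float:
    """Return the amount of unlabeled competitor (mol) for a given fold excess."""
    return substrate_mol * fold_excess

substrate = 40 * FMOL    # radiolabeled substrate per reaction (from the text)
volume = 20 * UL         # reaction volume (from the text)

substrate_nM = molar_concentration(substrate, volume) * 1e9
competitor_pmol = competitor_amount(substrate, 120) / PMOL

print(f"substrate: {substrate_nM:.1f} nM")
print(f"competitor at 120-fold excess: {competitor_pmol:.1f} pmol")
```

Under these numbers the labeled substrate is present at 2 nM, and a 120-fold excess corresponds to 4.8 pmol of unlabeled homoduplex per reaction.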
Partial Purification of Cmb1 from S. pombe and N-terminal Peptide Sequencing-The procedure for obtaining S. pombe cells for protein extracts was as described (16), except that we used Buffer A (25 mM Tris-HCl (pH 7.5, adjusted at room temperature), 150 mM NaCl, 1 mM EDTA, 0.5 mM spermidine, 0.1 mM spermine, 5 mM β-mercaptoethanol, 0.1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol) for washing and suspending cells. To estimate the degree of purity of Cmb1, protein samples from different purification steps were separated on 12-13% SDS-polyacrylamide gels by electrophoresis and analyzed after staining with silver nitrate or Coomassie Blue. For protein purification, all steps were done at 4°C. 485 g of cells (720 ml), previously frozen in liquid nitrogen and stored at −70°C, were thawed. A 400-ml chamber for a Bead beater (Biospec Products) loaded with 350 g of acid-washed glass beads (0.5 mm in diameter) was used for four consecutive steps of cell disruption. Each step included ten 30-s intervals, with at least 2 min of cooling on ice between intervals. The suspensions were pooled and centrifuged for 1 h at 32,000 rpm in a Ti45 rotor (Beckman). The 510-ml supernatant was separated from cell debris and lipids (fraction I). (NH4)2SO4 was added to a final concentration of 50%, and the precipitate was collected by 20 min of centrifugation at 13,000 × g, suspended in Buffer B150 (25 mM Tris-HCl (pH 7.5, adjusted at room temperature), 150 mM NaCl, 0.1 mM EDTA, 10% glycerol (v/v), 5 mM β-mercaptoethanol, 0.1 mM phenylmethylsulfonyl fluoride) and dialyzed twice against 5.5 liters of Buffer B150 (fraction II). The 310-ml fraction II was loaded at a flow rate of 140 ml/h on two parallel DEAE-cellulose columns (DE52, Whatman, 20 cm2 × 22 cm), equilibrated with Buffer B150. The flow-through (420 ml of fraction III) was loaded at a flow rate of 43 ml/h on a double-stranded DNA-cellulose column (Sigma, 4.5 cm2 × 20 cm), equilibrated with B150. Bound proteins were eluted by a 480-ml linear gradient from 100% B300 to 100% B700 (same buffers as B150, except that they contain 0.3 and 0.7 M NaCl, respectively). Active fractions (between 0.4 and 0.6 M) were pooled and dialyzed twice against 4 liters of Buffer B150 to give 290 ml of fraction IV. Fraction IV was loaded at a flow rate of 27 ml/h on a heparin-agarose column (type I, Sigma, 1.8 cm2 × 7 cm), equilibrated with Buffer B150. Bound proteins were eluted step-wise with 2 column volumes each of B400, B550, and B700 (buffers differ from Buffer B150 in NaCl concentrations of 0.4, 0.55, and 0.7 M, respectively). The mismatch-binding activity Cmb1 eluted at 0.55 M NaCl. A 4-ml aliquot was stored for N-terminal peptide sequencing (see below). The remainder of the fractions were pooled, dialyzed twice against 2 liters of B150 (18 ml of fraction V) and then loaded on a 1-ml MonoQ column (HR5/5, Amersham Pharmacia Biotech) at a flow rate of 0.5 ml/min. Bound proteins were eluted by 2 ml of B300 followed by a 14-ml linear gradient from 0.3 to 1 M NaCl. Cmb1 peaked at 0.4 M NaCl of the gradient but was also detected in the flow-through and in the fractions eluted by 0.3 M NaCl, indicating that the MonoQ column was overloaded. Nevertheless, the use of this column enabled the identification of the mismatch binding activity as a 22-kDa protein (Fig. 1). Active fractions were separately dialyzed against Buffer S (25 mM Tris-HCl (pH 7.5), 150 mM NaCl, 0.1 mM EDTA, 50% glycerol (v/v), 5 mM β-mercaptoethanol, 0.2 mM phenylmethylsulfonyl fluoride) and stored at −20°C (fraction VI).
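For the linear salt gradients used in this purification, the nominal NaCl concentration at a given elution volume follows from linear interpolation between the two buffers. The sketch below is a simplified model with hypothetical function names; it ignores column dead volume and mixing, so real elution positions will be shifted relative to these nominal values.

```python
# Nominal NaCl concentration along a linear gradient (simplified model).

def gradient_concentration(volume_ml: float, total_ml: float,
                           start_m: float, end_m: float) -> float:
    """NaCl concentration (M) after volume_ml of a linear gradient."""
    if not 0.0 <= volume_ml <= total_ml:
        raise ValueError("volume must lie within the gradient")
    return start_m + (end_m - start_m) * volume_ml / total_ml

def volume_at_concentration(conc_m: float, total_ml: float,
                            start_m: float, end_m: float) -> float:
    """Gradient volume (ml) at which a given concentration is reached."""
    if not min(start_m, end_m) <= conc_m <= max(start_m, end_m):
        raise ValueError("concentration lies outside the gradient")
    return (conc_m - start_m) / (end_m - start_m) * total_ml

# DNA-cellulose step from the text: 480 ml, B300 (0.3 M) to B700 (0.7 M);
# active fractions were collected between 0.4 and 0.6 M NaCl.
first_ml = volume_at_concentration(0.4, 480.0, 0.3, 0.7)
last_ml = volume_at_concentration(0.6, 480.0, 0.3, 0.7)
print(f"active window: {first_ml:.0f}-{last_ml:.0f} ml of the gradient")
```

In this idealized picture the active window of the DNA-cellulose gradient spans roughly the middle half of the 480-ml gradient volume.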
For N-terminal sequencing, a 4-ml aliquot of fraction V was concentrated to 280 μl by 75 min of centrifugation at 3600 × g in two 2-ml Centricon-10 tubes (Amicon Inc.). Proteins were separated on a 12% SDS-polyacrylamide gel, transferred to an Immobilon membrane by electroblotting, and stained with Amido Black, and the 22-kDa protein band (400 pmol) was excised. The N-terminal sequence EKNGLQKLIPPRLKTIWNQMLVETKGAGN of 29 amino acids was obtained by Johann Schaller and Urs Kämpfer (Protein Analytical Service, Institute of Biochemistry, Bern, Switzerland) using a pulsed-liquid phase Sequenator 477A (Applied Biosystems Inc.).
Polymerase Chain Reactions (PCRs)-PCRs were performed in a Perkin-Elmer DNA thermal cycler. Standard reactions of 50 μl contained 10-100 pmol of each of two primers, 0.1 mM each dNTP, and 1 unit of Taq DNA polymerase (Appligene) in a standard PCR buffer (Appligene). As templates, either 300 ng of genomic S. pombe DNA (for PCR cloning) or about 10 ng of plasmid DNA (for hybridization probes, for the construction of the cmb1::his3+ gene disruption, and for cloning of cmb1 into the E. coli expression vector pT7-7) were used. Reactions included 5 min at 94°C, followed by 30 cycles of 45 s of denaturation at 94°C, 1 min of annealing at a temperature depending on the composition of the primer (annealing temperatures are specified below for individual experiments), and 1 min of synthesis at 72°C. Finally, a 10-min extension step at 72°C was applied. Differences from this standard protocol are indicated in the text.
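The standard cycling protocol can be written down as a simple data structure, which makes the step order and the total programmed hold time explicit. This is a sketch with hypothetical class and function names; temperature ramp times of the thermal cycler are not included, and the annealing temperature is a parameter because it differs between experiments.

```python
# Sketch of the standard PCR program described in the text (names are hypothetical).
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

def standard_program(anneal_c: float, cycles: int = 30) -> List[Step]:
    """Build the step list: initial denaturation, cycles, final extension."""
    program = [Step("initial denaturation", 94.0, 5 * 60)]
    for _ in range(cycles):
        program.append(Step("denaturation", 94.0, 45))
        program.append(Step("annealing", anneal_c, 60))
        program.append(Step("synthesis", 72.0, 60))
    program.append(Step("final extension", 72.0, 10 * 60))
    return program

def hold_time_minutes(program: List[Step]) -> float:
    """Sum of programmed hold times in minutes (ramping excluded)."""
    return sum(step.seconds for step in program) / 60

program = standard_program(anneal_c=54.0)   # 54°C is one annealing temperature used in the text
print(len(program), "steps,", hold_time_minutes(program), "min of hold time")
```

With 30 cycles the programmed hold time is 97.5 min; the actual run time is longer because of ramping between temperatures.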
PCR Cloning-The 29 amino acid sequence determined from the Cmb1 protein was used to isolate a PCR clone of the cmb1+ gene from chromosomal S. pombe DNA. Four degenerate primers derived from the ends (primers CC-1a, CC-1b, and CC-3) or from a middle part (primer CC-2) of this sequence were used for PCR: primer CC-1a, 5′-CTCGGATCCAA ( Primers CC-1a, CC-1b, and CC-3 contain a BamHI restriction site and three additional bases at their 5′-ends. A PCR product of the expected size was obtained with primer pair CC-1b/CC-3 but not with CC-1a/CC-3. Amplification was performed by three initial cycles with an annealing temperature of 40°C, followed by 30 cycles with an annealing temperature of 65°C. PCR products were separated on 1.5% agarose gels. A band of the expected size (102 bp) was eluted from the gel (17) and used for reamplification with primers CC-1b and CC-3 and for PCR with primers CC-1b and CC-2. This reaction (annealing at 47°C) produced a 67-bp fragment, indicating that the 102-bp fragment contained the sequence of interest. The reamplified 102-bp DNA fragment was digested with BamHI, cloned into pUC18, and sequenced. One clone (pOL50) was used for further screening.
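Degenerate primers are needed because most amino acids are encoded by more than one codon. The following sketch, which is not the authors' primer design procedure, counts the number of possible coding sequences for stretches of the 29-residue N-terminal peptide using the standard genetic code and reports the least degenerate window of a given length. The function names and the window length of six residues are illustrative assumptions.

```python
# Codon degeneracy of peptide stretches under the standard genetic code (illustrative).
from collections import Counter
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_AA = Counter(AMINO)   # e.g. L -> 6, W -> 1

PEPTIDE = "EKNGLQKLIPPRLKTIWNQMLVETKGAGN"   # N-terminal sequence from the text

def degeneracy(peptide: str) -> int:
    """Number of distinct nucleotide sequences that can encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)

def least_degenerate_window(peptide: str, length: int):
    """Return (0-based start, window, degeneracy) of the least degenerate stretch."""
    starts = range(len(peptide) - length + 1)
    best = min(starts, key=lambda i: degeneracy(peptide[i:i + length]))
    window = peptide[best:best + length]
    return best, window, degeneracy(window)

print(degeneracy("EKNGLQK"))                  # first seven residues
print(least_degenerate_window(PEPTIDE, 6))    # six-residue window, assumed length
```

Stretches containing tryptophan or methionine, which have a single codon each, give the lowest degeneracy, which is one reason such residues are attractive anchor points for degenerate primers.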
Library Screening-The pOL50 insert was random-prime-labeled with 32P (Ready to Go DNA labeling kit, Amersham Pharmacia Biotech) and used to screen a plasmid-based library of S. pombe (18) by colony hybridization. From this library several clones with a truncated open reading frame (ORF) of cmb1+ were identified. A region containing 264 bp of the ORF was amplified by PCR with primer CC-6 (5′-CTGTTTGACGCAATGCCTCC-3′) and primer CC-7 (5′-GATCTCATCGGCAGTCAAAG-3′) (annealing at 54°C). The PCR product was radiolabeled with 32P and hybridized to ordered cosmid and P1 libraries (19, 20). Two cosmids and four P1 clones were obtained, which all map on chromosome I between ras1+ and cdc3+ (19). From one cosmid (ICRFc60H037), three fragments from the region of interest were subcloned in pUC18 (pOL54 to pOL59).
Sequence Analysis-Double-stranded plasmid DNA (pOL54 to pOL59) was sequenced by the dideoxy method using a sequencing kit (United States Biochemical Corp.). The cmb1+ ORF was determined from both strands with primers derived from pUC18 and with primers derived from the inserts (not shown). Sequences were analyzed with the Wisconsin program package, version 9 (Genetics Computer Group, Madison, WI).
Overexpression of Cmb1 in E. coli and Purification-The full-length cmb1 gene was amplified by PCR (annealing at 54°C) using Pfu polymerase (Stratagene), primer EC1 (5′-CTTGCTAGCCATATGCGTCTGTTTGACGCAATG-3′), and primer EC2 (5′-GTAGGATCCTCATCATCGAAATCCGGCTTC-3′), derived from the 5′- and 3′-ends of the coding region, respectively. A truncated cmb1 gene missing the 5′ nucleotide sequence coding for the amino acids 2-41 was amplified with primer EC3 (5′-CTTCATATGGAAAAGAATGGATTACAGAAG-3′) and primer EC2 (annealing at 55°C). Primers EC1 and EC3 contain an NdeI restriction site, and primer EC2 contains a BamHI restriction site. The PCR products were digested with NdeI/BamHI and then ligated with digested pT7-7. The ligation mixtures were transformed into E. coli strain XL1blue, and plasmids containing correct inserts were checked for mutations by sequencing. For expression of the protein, the plasmids were transformed into E. coli strain BL21/DE3. Stationary phase cultures of the expression strains were inoculated 1:40 in fresh LB medium containing 100 μg/ml ampicillin and incubated at 37°C until an A600 of about 0.7 was reached. Isopropylthio-β-D-galactoside was added to a final concentration of 1 mM. After 5 h of induction, cells were harvested by 10 min of centrifugation at 8300 × g and suspended in Buffer A. After brief freezing in liquid nitrogen, cells were stored at −70°C until further use. To prepare crude protein extracts containing Cmb1 or Cmb1Δ41, Buffer B150 was added to the thawed cells (1/3 of the total volume). Cells were disrupted by sonication on ice (10 times for 30 s, with at least 1 min of cooling on ice between rounds). After 20 min of centrifugation at 10,000 × g, the supernatant was removed (crude protein extracts, fraction I).
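The primer sequences given above can be checked computationally for the stated restriction sites, and the reading frame of primer EC3 can be compared with the N-terminal peptide sequence. The sketch below uses the standard genetic code and the well-known recognition sequences of NdeI (CATATG) and BamHI (GGATCC); the function names are hypothetical and the code is a consistency check, not part of the original procedure.

```python
# Consistency check of expression primers: restriction sites and reading frame.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order for each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}
PRIMERS = {
    "EC1": "CTTGCTAGCCATATGCGTCTGTTTGACGCAATG",
    "EC2": "GTAGGATCCTCATCATCGAAATCCGGCTTC",
    "EC3": "CTTCATATGGAAAAGAATGGATTACAGAAG",
}

def sites_in(seq: str) -> list:
    """Names of the listed restriction sites present in seq (top strand only)."""
    return sorted(name for name, site in SITES.items() if site in seq)

def translate_from_first_atg(seq: str) -> str:
    """Translate complete codons starting at the first ATG; empty if none."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
print("EC3 frame:", translate_from_first_atg(PRIMERS["EC3"]))
```

The ATG within the NdeI site of EC3 is followed in frame by codons for EKNGLQK, the first residues of the N-terminal sequence determined from the 22-kDa peptide, consistent with the described design of the truncated construct.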
The strain BL21/DE3 containing pT7-7 with the truncated cmb1 gene was used for purification of the respective protein, Cmb1Δ41. Fraction I (137 mg of protein in 28 ml) was obtained from a 750-ml E. coli culture and loaded at a flow rate of 19 ml/h on a heparin-agarose column (type I, Sigma, 1.8 cm2 × 11 cm) previously equilibrated with Buffer B150. Bound proteins were eluted by 20 ml of Buffer B300 followed by a 100-ml gradient from 300 to 800 mM NaCl. Cmb1Δ41 peaked at 580 mM. The purest fractions were collected and dialyzed in B150 to give fraction II (2.8 mg of protein in 7 ml). Fraction II was loaded at a flow rate of 8 ml/h on a P11 phosphocellulose column (Whatman, 0.64 cm2 × 4.7 cm). Bound proteins were eluted with 10 ml of B350 (Buffer B containing 350 mM NaCl), followed by a 60-ml gradient from 350 to 800 mM NaCl. Cmb1Δ41 eluted between 400 and 550 mM and peaked at 480 mM. At this step, a few faint bands of other proteins were detected in early fractions (around 400 mM) but not in the other fractions containing Cmb1Δ41. However, an additional purification step was included with later fractions. These fractions were pooled, and the NaCl concentration was adjusted to 300 mM by addition of Buffer B0 (same as Buffer B150 but without NaCl). The resulting fraction III (1.9 mg of protein in 7.3 ml) was loaded at a flow rate of 7 ml/h on a double-stranded DNA-cellulose column (United States Biochemical Corp., 0.64 cm2 × 4.7 cm), equilibrated with B300. After washing, a 40-ml gradient from 300 to 600 mM NaCl was applied. Cmb1Δ41 peaked at 400 mM. No other proteins could be detected in a silver-stained SDS-polyacrylamide gel. Cmb1Δ41 was then dialyzed against Buffer S and stored in aliquots at −20°C (fraction IV, 1.8 mg of protein in 7 ml).
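The protein amounts reported for the Cmb1Δ41 purification can be summarized as recoveries relative to the crude extract, and the molar concentration of the final stock can be estimated. The sketch below is illustrative: the function names are hypothetical, the recoveries refer to total protein and not to binding activity, and the molar concentration assumes a mass of 22 kDa and a pure preparation.

```python
# Summary of the Cmb1Δ41 purification (total protein, values from the text).
STEPS = [
    ("I   crude extract", 137.0, 28.0),     # (fraction, mg protein, volume in ml)
    ("II  heparin-agarose", 2.8, 7.0),
    ("III phosphocellulose", 1.9, 7.3),
    ("IV  DNA-cellulose", 1.8, 7.0),
]

def recovery_percent(protein_mg: float, start_mg: float) -> float:
    """Protein remaining relative to the starting fraction, in percent."""
    return 100.0 * protein_mg / start_mg

def stock_micromolar(protein_mg: float, volume_ml: float, mass_kda: float) -> float:
    """Molar concentration in micromolar; mg/ml equals g/l."""
    grams_per_litre = protein_mg / volume_ml
    return grams_per_litre / (mass_kda * 1000.0) * 1e6

start_mg = STEPS[0][1]
recoveries = [round(recovery_percent(mg, start_mg), 2) for _, mg, _ in STEPS]
for (name, mg, ml), pct in zip(STEPS, recoveries):
    print(f"{name:22s} {mg:6.1f} mg in {ml:4.1f} ml  ({pct:6.2f}% of fraction I)")

final_uM = stock_micromolar(1.8, 7.0, 22.0)   # 22 kDa is the apparent mass from the text
print(f"fraction IV stock: about {final_uM:.1f} micromolar")
```

Because Cmb1Δ41 is only a fraction of the total protein in the crude extract, the low percentage of total protein retained mainly reflects removal of E. coli proteins and does not by itself measure the loss of Cmb1Δ41.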
Physiological Tests-To test sensitivity of S. pombe cells to various chemicals, 1 × 10^7 cells derived from stationary phase cultures were incubated with shaking in 2 ml of 0.25% yeast extract; 1.5% glucose, adenine, histidine, and uracil (each 0.005%); and varying concentrations of cisplatin (0-0.8 mM), transplatin (0-3 mM), or MNNG (0-400 μg/ml). Cells were incubated with either cisplatin or transplatin for 90 min or with MNNG for 30 min. The cells were harvested, washed with 5 ml of 0.85% NaCl, pelleted again, and resuspended in 0.85% NaCl. Appropriate dilutions were plated on YEA (0.5% yeast extract; 3% glucose; and 1.5% agar supplemented with adenine, histidine, and uracil (each 0.01%)). Plates were incubated for 3 days at 30°C before scoring colonies. The experiments were carried out six times in the case of cisplatin, two times with transplatin, and six times with MNNG.
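Survival in plating assays of this kind is usually calculated from colony counts corrected for dilution and plated volume. The following sketch shows one common way to do this; it is not taken from the original procedure, the function names are hypothetical, and all colony counts, dilutions, and replicate values are invented for illustration only.

```python
# Survival calculation from colony counts (all numbers below are invented examples).
from statistics import mean, stdev

def viable_titer(colonies: int, dilution_factor: float, plated_ml: float) -> float:
    """Colony-forming units per ml of the undiluted suspension."""
    return colonies * dilution_factor / plated_ml

def survival_fraction(treated_titer: float, untreated_titer: float) -> float:
    """Fraction of cells surviving relative to the untreated control."""
    return treated_titer / untreated_titer

# Hypothetical plate counts: 0.1 ml plated per plate.
untreated = viable_titer(colonies=200, dilution_factor=1e4, plated_ml=0.1)
treated = viable_titer(colonies=120, dilution_factor=1e3, plated_ml=0.1)

fraction = survival_fraction(treated, untreated)
fold_reduction = 1.0 / fraction
print(f"survival {fraction:.3f}, i.e. about {fold_reduction:.1f}-fold reduced")

# Hypothetical replicate experiments summarized as mean and standard deviation.
replicates = [0.05, 0.06, 0.07]
print(f"mean {mean(replicates):.3f}, S.D. {stdev(replicates):.3f}")
```

Expressing survival relative to the untreated control of the same strain separates drug sensitivity from differences in plating efficiency between strains.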

RESULTS
Partial Purification of Cmb1 from S. pombe-An activity binding to cytosine-containing single-base mismatches was previously identified in S. pombe crude extracts (16). We enriched this activity, Cmb1, from extracts of an S. pombe wild type strain (fraction I) by ammonium sulfate precipitation (fraction II), followed by chromatography through DEAE-cellulose (fraction III), double-stranded DNA-cellulose (fraction IV), heparin-agarose (fraction V), and MonoQ columns (fraction VI) (see under "Experimental Procedures"). Analysis of the fractions from the last purification step by a band-shift assay revealed a strong enrichment of the mismatch-specific activity in fractions 30 and 31 (Fig. 1A). The intensity of the band shift correlated well with the intensity of a 22-kDa band visualized in a SDS-polyacrylamide gel by Coomassie Blue staining (Fig. 1B). The purest fractions also contained a second dominant protein of about 65 kDa and a few other proteins at low concentrations. These fractions were tested for their mismatch-binding ability. We found strong affinity to C/A, T/C, and C/Δ and some binding to C/C and T/T. No or weak binding was observed to homoduplex and to substrates with a T/G, A/A, G/G, or G/A mismatch (Fig. 1C).
FIG. 1 (legend, partial). [...] M indicates size marker proteins (Bio-Rad). C, mismatch binding specificity of partially purified Cmb1. MonoQ fraction 29, containing partially purified Cmb1, was tested using a 120-fold excess of unlabeled homoduplex as competitor. The NaCl concentration in the reaction was 80 mM. An activity was detected that showed high affinity to all cytosine-containing mismatches and T/T, but no or only weak binding to other mismatches and homoduplex.
Cloning of the cmb1+ Gene-400 pmol of the 22-kDa Cmb1 protein were used for N-terminal sequencing and resulted in the determination of 29 amino acids (Fig. 2A). Four degenerate primers derived from the peptide sequence of Cmb1 were used for PCR (Fig. 2A). A band of the expected size, obtained with primer pair CC-1b/CC-3, was eluted from the gel and subjected to PCR with primer pair CC-1b/CC-2. A single 67-bp band was produced, indicating that the 102-bp fragment contained the sequence of interest. Sequencing of the 102-bp fragment from five clones revealed inserts with the coding capability for the amino acid sequence determined from the Cmb1 peptide. The insert of one clone (pOL50) was used to screen a plasmid-based S. pombe library (18). Several clones were identified, but all contained only part of the open reading frame of cmb1+ at one end of the inserts (Fig. 2B). This truncated ORF (270 bp) contained 126 bp upstream and 61 bp downstream of the sequence previously determined from the PCR clones. A PCR product of this ORF was used as hybridization probe for ordered cosmid and P1 libraries (19, 20). Two cosmids and four P1 clones were obtained that all map on chromosome I between ras1+ and cdc3+ (19). From one cosmid (ICRFc60H037), three fragments from the region of interest were subcloned in pUC18 (Fig. 2C). By sequencing the 3′-flanking region of cmb1+, the arg3+ gene (22) was found to be located approximately 600 bp downstream from the ORF of cmb1+, which is consistent with the mapping of cmb1+ between ras1+ and cdc3+. Double-stranded sequencing of cmb1+ revealed an intronless ORF of 669 bp encoding a 26-kDa protein. The partially purified Cmb1 has a molecular mass of 22 kDa. Judged from its N-terminal amino acid sequence used for cloning, it represents a proteolytic product without the N-terminal 41 amino acids encoded in the ORF of cmb1+.
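The size difference between the full-length protein and the 22-kDa peptide can be checked roughly against the loss of 41 residues. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an assumption and not a value from this work; actual masses depend on the amino acid composition, so the numbers are only order-of-magnitude estimates. The function names are hypothetical.

```python
# Rough mass estimates from sequence length (rule-of-thumb average residue mass).
AVG_RESIDUE_DA = 110.0   # assumed average mass per residue, not a measured value

def codons_in_orf(orf_bp: int) -> int:
    """Number of codons in an ORF whose length must be a multiple of three."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3

def approx_mass_kda(residues: int) -> float:
    """Approximate protein mass in kDa for a given number of residues."""
    return residues * AVG_RESIDUE_DA / 1000.0

codons = codons_in_orf(669)            # ORF length from the text
lost_kda = approx_mass_kda(41)         # N-terminal residues missing from the peptide
print(codons, "codons;", f"41 residues correspond to about {lost_kda:.1f} kDa")
```

An estimated loss of roughly 4.5 kDa is in line with the reported difference between the 26-kDa full-length protein and the 22-kDa peptide.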
Cmb1 Is a HMG Domain Protein-By data base searches with the deduced amino acid sequence of cmb1+, a HMG domain was identified in the C-terminal part (Fig. 3). This DNA-binding motif is common to a large family of HMG domain proteins characterized in various eukaryotes (13, 23). Originally, HMG-1 and HMG-2 were found to be abundant nonhistone components of chromatin, migrating with high mobility in gels (28). A number of proteins were then identified that contain one to six regions of significant homology to regions of HMG-1 and HMG-2 (23, 29). These regions were named HMG domains. Some HMG domain proteins are transcription factors that recognize specific DNA sequences, whereas some are known to bind to specific DNA structures, such as four-way junctions (30, 31), intrastrand cross-links (13, 15), or intermediates of V(D)J recombination (26). Interestingly, our data base searches revealed that hPMS1, a human MutL homolog (24), also contains a HMG domain.
A hydropathy profile of the HMG domain of Cmb1 revealed that it is extremely hydrophilic, a feature common to other HMG domains (26,32). A few amino acids are conserved among sequence-specific HMG domain proteins, whereas others are present only in non-sequence-specific proteins (23). In this respect, Cmb1 belongs to the subfamily of non-sequence-specific HMG domain proteins (Fig. 3).
A Second C/C Binding Activity, Cmb2, Is Present in the cmb1 Mutant Cells-The cmb1::his3+ disruption strain OL142 was constructed as described under "Experimental Procedures." Because the haploid mutant is viable, cmb1+ has no essential function. We tested mismatch binding with crude protein extracts from the cmb1 disruption mutant. In contrast to extracts from wild type, in which strong binding to all cytosine-containing mispairs was detected (Fig. 4, lanes 2-5), no binding to C/A, C/T, or C/Δ mismatches was observed with the mutant extracts (lanes 8-10). Surprisingly, weak binding to C/C persisted (lane 7). Thus a second activity (termed Cmb2) for binding to C/C exists in S. pombe. Cmb2 does not bind to other mismatches, whereas Cmb1 binds specifically to C/C, C/A, T/C, C/Δ, and T/T (Fig. 1C).
FIG. 2. Cloning of the cmb1+ gene. A, 29 amino acids were determined from purified Cmb1 by N-terminal peptide sequencing. Four primers derived from the amino acid sequence were used for PCR (see under "Experimental Procedures"). B, a PCR fragment obtained with primers CC-1b/CC-3 was used to screen a plasmid-based library (18). Several clones containing a truncated ORF of cmb1+ were identified. The putative start codon (ATG) is indicated. The insert end, a Sau3A restriction site, is marked with GATC. The amplification product obtained with primers CC-6/CC-7 was used to identify cmb1+ on ordered cosmid/P1 filters (19, 20). C, DNA fragments were isolated from cosmid ICRFc60H037 and cloned into pUC18 to give plasmids pOL54 to pOL59. Identical fragments cloned in different orientations are shown below the chromosomal map (e.g. pOL54/pOL55). The restriction map was constructed by Southern hybridization with chromosomal DNA, with cosmid DNA, and by restriction analysis of the plasmid clones pOL54-pOL59. The left end, containing a NsiI and SalI site, is not drawn to scale. Abbreviations for restriction sites: Bg, BglII; EV, EcoRV; N, NsiI; Sa, SalI; and Sc, ScaI. During sequencing of cmb1+, the arg3+ gene was identified about 600 bp downstream. Open reading frames are indicated by black arrows.
Expression of Cmb1 in E. coli and Binding to Mismatches and Small DNA Loops-The full-length cmb1 ORF and a truncation were expressed in E. coli. The latter version, encoding a 22-kDa protein, Cmb1Δ41, corresponds to the proteolytic peptide enriched from S. pombe cells. Crude extracts of both overexpression strains were tested for mismatch binding. Both Cmb1- and Cmb1Δ41-containing extracts exhibited an activity that bound to the mismatches C/C, C/A, C/T, C/Δ, and T/T but only weakly to other mismatches and to homoduplex (data not shown). The complex was not observed when uninduced strains or a strain containing the empty expression vector were used. No differences in gel migration and mismatch specificity between Cmb1, Cmb1Δ41, and the proteolytic Cmb1 peptide enriched from S. pombe were found (data not shown). Notably, overexpression of Cmb1Δ41 was about 20 times higher than overexpression of the full-length protein.
For further binding tests, the recombinant Cmb1Δ41 protein was extensively purified (see under "Experimental Procedures"). Cmb1Δ41 was first tested for its binding ability to substrates containing base-base mismatches, single unpaired nucleotides, and small loops with two or four nucleotide insertions (Fig. 5). A strong protein-DNA complex was formed when C/C or C/Δ mismatches were present (Fig. 5, lanes 2 and 3). Weak or no interaction was found with T/Δ-, G/Δ-, A/Δ-, C2/Δ-, C4/Δ-, and T2/Δ-containing substrates (lanes 4-9). Thus, one unpaired cytosine is well recognized, but not an unpaired thymine, guanine, or adenine. On the other hand, recognition is not dependent on the presence of mispaired or unpaired cytosines per se, as two or four cytosines in a loop were only weakly bound by Cmb1Δ41 (Fig. 5, lanes 7 and 8).
Cmb1 Binds to 1,2 GpG Intrastrand Cross-links When Cytosines Are Opposite the Modified Bases-Several HMG domain proteins are known to bind to 1,2 GpG intrastrand cross-links (13, 15). We were therefore interested to learn whether Cmb1 also shows specific binding to this type of lesion (Fig. 6A). Single-stranded oligonucleotides, either platinated or not, were poor substrates (Fig. 6B, lanes 1 and 2). A complex was formed with unplatinated homoduplex (lane 3), which might indicate a preference of Cmb1 for cytosine-rich sequences. Nevertheless, binding to the cisplatinated substrate cisplatin-GpG/CpC was much stronger (lane 4). Interestingly, only weak binding was detected to cisplatinated oligonucleotides in which other bases were present instead of the cytosines opposite the 1,2 cross-linked guanines (lanes 5-8). Even the change of only one cytosine strongly affected complex formation (lane 8). Binding to the compound lesions (intrastrand cross-link combined with mismatches) was as low as, or even lower than, to homoduplex (lane 3) and to a substrate containing a G/T mismatch in the same sequence context (lane 9). No significant affinity to the trans-diamine-dichloroplatinum(II)-GpTpG/CpApC lesion was found (data not shown). This 1,3 intrastrand cross-link is produced by transplatin, an isomer of cisplatin.
FIG. 4 (legend, partial). [...] 5) and from the cmb1 mutant (lanes 6-10). A 40-fold excess of unlabeled homoduplex was included in the reactions. Specific binding to C/A, C/Δ, and C/T was not detected with cmb1 extracts (lanes 8-10). Complex formation with C/C in cmb1 extracts (lane 7) revealed the existence of an additional C/C binding protein, Cmb2 (marked by an arrow).
FIG. 5. Substrate specificity of Cmb1 protein purified from E. coli. Band-shift assays were performed to test the specificity of Cmb1 to various mismatches and small loops. The reactions contained 2.5 pmol of Cmb1Δ41 protein, 40 fmol of radiolabeled substrates as indicated, a 120-fold excess of unlabeled homoduplex as competitor, and 240 mM NaCl in a standard reaction buffer. Strong binding was only observed with the substrates C/C and C/Δ (lanes 2 and 3).
S. pombe Cells Defective in cmb1 Are Sensitive to Cisplatin-To investigate the cellular role of Cmb1 in repair of lesions produced by either cisplatin or transplatin, the cytotoxicity of these drugs was tested in wild type and the cmb1 mutant. The cmb1 strain was significantly more sensitive to cisplatin than wild type. In the presence of 0.8 mM cisplatin, cell survival was about 17-fold reduced (Fig. 6C). On the other hand, when cells were treated with transplatin at concentrations between 1 and 3 mM, no difference in survival was found between wild type and the cmb1 mutant (data not shown).
Cmb1 Binds to O6meG:C but Not to O6meG/T-The cytotoxic action of MNNG is predominantly due to methylation of guanines. O6meG can equally pair with cytosine and thymine. Thus a persisting O6meG:C site in DNA is frequently changed during replication to O6meG/T, resulting in fixation of a mutation (33). We tested binding of Cmb1 to both types of lesions. When a cytosine was present opposite the modified guanine, a complex was formed that was of comparable strength to that seen with the substrate containing a C/C mismatch (Fig. 7A, lanes 2 and 3). In contrast, binding to O6meG/T was as weak as to G:C homoduplex DNA and to a G/T-containing substrate (lanes 1, 4, and 5). When S. pombe cells, defective in cmb1, were treated with different amounts of MNNG, no difference from wild type cells was found in survival rates (Fig. 7B).

DISCUSSION
Cmb1 Is a HMG Domain Protein-We have previously described the identification of an activity in S. pombe crude extracts that efficiently binds to the cytosine-containing mismatches C/C, C/A, T/C, and C/Δ and weakly binds to T/T (16). We now report partial purification of the 22-kDa activity, Cmb1, from S. pombe wild type cells and cloning of the gene. The ORF of cmb1+ encodes a 26-kDa protein with 223 amino acids, indicating that a proteolytic peptide was enriched from S. pombe. However, we found no differences in mismatch specificity between the full-length 26-kDa Cmb1 and the 22-kDa peptide when overproduced in E. coli.
The Cmb1 protein contains an HMG domain at the C-terminal end (Fig. 3). Various HMG domain proteins are known to recognize specific DNA structures, such as V(D)J recombination signals (26), cruciform DNA (30, 31), and intrastrand cross-links (13, 15). These features suggest a role in DNA recombination and DNA repair. Interestingly, hPMS1 also contains an HMG domain (Fig. 3). hPMS1 is one of the three human MutL homologues which, if defective, cause a predisposition to the colon cancer syndrome hereditary nonpolyposis colon cancer (7, 24).
FIG. 6 (legend, partial). [...] 5-8). Also, binding to a G/T mismatch in the same sequence context (lane 9) was as low as binding to homoduplex (lane 3). Note that the cis-DDP-GpG containing substrates run more slowly in the gel than the unmodified substrates. No specific binding to the transplatinated substrate was detected (data not shown). C, cytotoxic effect of cisplatin on cell survival of S. pombe wild type (●) and cmb1 (○) strains. The experiment was carried out six times (see under "Experimental Procedures"). It should be noted that we found some variations in cell survival between different experiments. However, in all tests, cmb1 was more sensitive to cisplatin than was wild type. Points are the average of three independent experiments; bars represent S.D.
Cmb1 Binds to Cytosine-containing Mismatches-The purification of Cmb1 from S. pombe cells resulted in highly active fractions, which predominantly contained the Cmb1 protein but also a few other proteins (Fig. 1). Although the strength of mismatch binding correlated well with the concentration of the Cmb1 protein in the various fractions from the last purification step, it could not be ruled out that mismatch binding was provided by another protein in the fractions. Efforts to separate Cmb1 from the remaining S. pombe proteins by additional chromatographic steps failed (data not shown). We therefore decided to test substrate binding with recombinant Cmb1 protein purified from E. coli.
In a first step, the full-length cmb1+ ORF and a truncated version, Cmb1Δ41, which, like the proteolytic peptide purified from yeast, lacks the first 41 amino acids, were overexpressed in E. coli and tested for mismatch binding. No difference in specificity was found between the two recombinant Cmb1 peptides and Cmb1 enriched from S. pombe. In the E. coli expression strain containing the empty pT7-7 vector, the specific mismatch-binding complex was not detected. We conclude that Cmb1 is able to bind to mismatches without requiring other S. pombe proteins.
We then purified Cmb1Δ41 from E. coli cells and tested binding to mismatches and small DNA loops. Cmb1 recognizes C/C, C/A, C/T, T/T, and a C/Δ insertion but shows no or weak affinity to other types of mismatches and insertions (Fig. 5 and data not shown). Although one unpaired cytosine is strongly bound by Cmb1, two or four unpaired cytosines are poor substrates (Fig. 5). Thus, cytosine-containing base mismatches (and T/T) represent a DNA structure irregularity that is recognized by a new type of HMG domain protein. This binding pattern is different from E. coli MutS and eukaryotic MutSα, which are able to bind to almost all types of base mismatches (34-38). The difference is most pronounced for C/C mismatches, which are a poor substrate for binding and repair by E. coli and yeast MutS proteins. A human mismatch-binding activity was identified that binds to A/C, C/T, and T/T mismatches without requirement of the MutS homologue hMSH2 (39). This specificity is similar to that of Cmb1 (Fig. 1) except that C/C mismatches were poorly bound by the human activity.
A Second C/C Binding Activity, Cmb2, Exists in S. pombe-A cmb1 gene disruption strain was tested for the presence of the mismatch-binding activity. Specific binding to C/A, C/T, and C/Δ was abolished, but some binding to C/C remained (Fig. 4). We conclude that a second activity, Cmb2, exists in S. pombe that recognizes C/C but not other mismatches. Thus, C/C mismatches are recognized redundantly by Cmb1 and Cmb2.
Cmb1 Specifically Binds to 1,2 GpG Intrastrand Cross-links and to O⁶meG When Cytosines Are Opposite the Damaged Bases-1,2 GpG intrastrand cross-links, produced by cisplatin, are substrates of NER proteins, of MutS-type proteins, and of HMG domain proteins (1,10,13,15). We found that this lesion is also bound by the HMG domain protein Cmb1. Our data further suggest that binding by Cmb1 requires the cytosines opposite the cisplatinated guanines. Consistently, O⁶meG:C but not O⁶meG/T is recognized by Cmb1. Thus, like other HMG domain proteins, MutS-type proteins, and the NER complex, Cmb1 recognizes a series of DNA distortions. It is not well understood how the various recognition proteins discriminate between normally paired bases and the different types of altered DNA structures. In the case of Cmb1, the presence of cytosines in mismatches as well as in DNA lesions might be crucial for recognition. hMutSα and some HMG domain proteins are thought to confer cellular sensitivity to the drug cisplatin. Whereas binding by the HMG domain proteins HMG-1 and Ixr1 is proposed to shield cisplatin adducts from repair (14,15), it is likely that binding of Cmb1 facilitates repair, as indicated by the enhanced sensitivity of the cmb1 mutant to cisplatin. Transplatin is an isomer of cisplatin that is inactive as a chemotherapeutic drug, and cross-links produced by transplatin are not bound by HMG domain proteins (13). Consistently, an ixr1 mutant treated with transplatin showed no difference from wild type in cell survival (15). Similarly, Cmb1 does not bind to a 1,3 GpTpG intrastrand cross-link produced by transplatin, and in comparison to wild type, the cmb1 mutant is not more sensitive to transplatin (data not shown).
Although Cmb1 specifically binds to an O⁶meG:C lesion (Fig. 7A), no significant difference between wild type and the cmb1 mutant was found when cells were treated with MNNG (Fig. 7B). This may be due to efficient repair of O⁶meG by an O⁶-methylguanine-DNA methyltransferase. However, more experiments are needed to clarify whether Cmb1 has a role in removal of O⁶meG from DNA. Genetic experiments with cmb1 disruption strains revealed changes in the pattern of meiotic mismatch repair but no alteration in mitotic mismatch repair.² The observed effects on meiotic mismatch repair were rather weak. This may be due to redundancy of Cmb1 with other proteins of similar function. One candidate is Cmb2, because it is able to bind to C/C mismatches (Fig. 4). Cmb1 may act as an accessory protein that binds to chemically modified DNA and mismatches and thereby marks such sites for repair. A candidate repair pathway is the NER system, which can restore bases damaged by cisplatin and MNNG (1,3). Our recent data show that NER genes of S. pombe are involved in mismatch repair in a MutS/L-independent pathway.² The key characteristic is the repair of C/C mismatches that cannot be repaired by a MutS/L-type system (40,41).

² O. Fleck and J. Kohli, unpublished data.