Structure of Yeast Kinetochore Ndc10 DNA-binding Domain Reveals Unexpected Evolutionary Relationship to Tyrosine Recombinases*

Background: Ndc10 is a DNA-binding protein in yeast that is responsible for centromere formation.
Results: The structure of the protein unexpectedly shows that it contains a type IB topoisomerase/λ-integrase fold.
Conclusion: The structure demonstrates how the IB/Int fold has been adapted to a new cellular role.
Significance: The structure is the first example of the IB/Int fold being used in a non-catalytic protein.

We have solved the x-ray structure of the N-terminal half of the yeast kinetochore protein Ndc10 at 1.9 Å resolution. This essential protein is a key constituent of the budding yeast centromere and is required for the recruitment of the centromeric nucleosome and establishment of the kinetochore. The fold of the protein shows unexpected similarities to the tyrosine recombinase/λ-integrase family of proteins, most notably Cre, with some variation in the relative position of the subdomains. This finding offers new insights into kinetochore evolution and the adaptation of a well-studied protein fold to a novel role. By comparison with tyrosine recombinases and mutagenesis studies, we have been able to define some of the key DNA-binding motifs.

The kinetochore is the large, multiprotein assembly that serves to connect condensed sister chromatids to the mitotic spindle (1). Proteins in the kinetochore are responsible for several functions, including transmission of mechanical tension, mitotic checkpoint control, and modulation of microtubule dynamics. The kinetochore binds to a chromosomal locus known as the centromere, which in budding yeast consists of a short ~125-bp DNA sequence that is both necessary and sufficient to support kinetochore formation (2). This sequence may be subdivided into three sections, CDEI, CDEII, and CDEIII (3). CDEI binds the non-essential Cbf1 protein (4). CDEII is thought to be involved in interactions with the Cse4-containing centromeric nucleosome (5), whereas CDEIII is bound by the CBF3 complex (6) (see Fig. 1a). This ~445-kDa complex contains four essential proteins, Ndc10, Cep3, Ctf13, and Skp1. Ctf13 and Skp1 appear to fulfill regulatory roles in the assembly of the complex (7-10), whereas Cep3 and Ndc10 can directly bind DNA. Cep3 has a Gal4-type DNA-binding domain that contacts a conserved CCG triplet in CDEIII, whereas Ndc10 has been proposed to bind the centromere both in the context of the intact CBF3 complex and independently at the CDEII element (11,12). Recent work suggests that the primary function of CBF3/Ndc10 is to recruit the centromere-specific Cse4 nucleosome to DNA after S-phase, possibly by interactions with the histone chaperone Scm3 (13). Ndc10 has also been observed to relocate to the spindle midzone during anaphase, via interactions with the yeast survivin homolog, Bir1, indicating that it also plays an important role in coordination of cell division (14-17). Currently, the only proteins in the CBF3 complex for which we have structural data are Cep3 (12,18) and Skp1 (19). In this study, we describe the x-ray structure of a large fragment of Ndc10, responsible for DNA binding.

EXPERIMENTAL PROCEDURES
Molecular Biology-A DNA fragment encoding amino acid residues 1-551 (Ndc10 N-terminal domain (NTD)) of the Saccharomyces cerevisiae Ndc10 (CBF3A) gene was generated by PCR and then subcloned into a pET28a plasmid that also encodes an N-terminal SUMO tag and SUMO protease site. Site-directed mutagenesis was carried out using standard PCR-based techniques.
Protein Expression and Purification-BL21-RIL (DE3)-competent cells (Agilent Technologies) were transformed with the Ndc10 NTD expression vector and grown in LB broth to an A600 of 0.6 at 37°C. Protein expression was induced overnight at 16°C by the addition of isopropyl β-D-1-thiogalactopyranoside to a final concentration of 1 mM. Cells were harvested by centrifugation, resuspended in buffer A (60 mM Hepes, pH 7.5, 300 mM NaCl, 40 mM imidazole, 5% glycerol, 10 mM benzamidine, 1 mM PMSF, 10 mM β-mercaptoethanol), and immediately lysed by sonication in an ice bath. The cell lysate was centrifuged at 33,000 × g, and the clarified supernatant was loaded onto a pre-equilibrated His-Trap column (HisTrap FF, GE Healthcare) in buffer A. The SUMO-Ndc10 NTD protein was eluted in buffer A by a 40-500 mM imidazole gradient and immediately incubated overnight with a SUMO protease. Following removal of the tag, the protein was further purified by anion exchange chromatography (MonoQ 10/100GL, GE Healthcare) followed by size exclusion chromatography (HiLoad 16/60 Superdex 200, GE Healthcare). The purified protein was stored in buffer B (20 mM Hepes, 100 mM NaCl, 1 mM DTT). The protein purity and size were assessed by SDS-PAGE. Selenomethionine-labeled Ndc10 NTD was produced by expression in a methionine auxotroph in LeMaster minimal medium supplemented with specific amino acids as well as selenomethionine and purified as described above.
* This work was supported by a grant from Cancer Research UK.
DNA Binding Assay-Ndc10 NTD was prepared at a range of concentrations in 20 mM Hepes, pH 7.5, 50 mM NaCl, 1 mM DTT and added to 300 ng of a 1.2-kb random sequence DNA fragment. The protein/DNA mixes were incubated for 10 min at 4°C and then electrophoresed on a 0.8% agarose gel in Tris-borate-EDTA buffer. DNA was visualized using ethidium bromide under UV light.
Analytical Ultracentrifugation-Solution masses of the Ndc10 NTD and full-length Ndc10 were determined by sedimentation velocity studies. Proteins were concentrated to between 0.5 and 1 mg/ml in a buffer containing 10 mM Hepes, pH 7.5, 150 mM NaCl, 1 mM DTT. Samples were analyzed using a Beckman XL-I analytical centrifuge running at 25,000 rpm (NTD) or 20,000 rpm (full-length) in an An-60Ti rotor. Data were analyzed using the program DCDT+.
Protein Crystallization and Structure Determination-Purified Ndc10 NTD was exchanged against a buffer solution containing 20 mM Hepes (pH 7.5), 50 mM NaCl, 1 mM DTT and concentrated to 15 mg/ml. The protein was crystallized at 16°C by a hanging-drop vapor diffusion method where the sample and the crystallization buffer (200 mM potassium bromide, 100 mM sodium acetate (pH 5.5), and 10% PEG 4000) were mixed in a 1:1 (v/v) ratio. Rectangular crystals appeared overnight and were cryo-protected with 25% PEG 400 before being flash-frozen in liquid nitrogen. Diffraction data on native and selenomethionine proteins were collected at the Diamond synchrotron beamline I24. Crystals were of space group P2₁2₁2₁ with cell dimensions a = 55.7 Å, b = 87.8 Å, c = 104.6 Å. All programs were from the CCP4 suite unless otherwise stated (20). Diffraction data were indexed, integrated, and scaled using XDS (21) and SCALA (22). The structure was solved using single-wavelength anomalous dispersion (SAD) phasing as implemented in the AutoSHARP system (23), and an initial model was autotraced using BUCCANEER (24). Subsequent rounds of rebuilding in Coot (25) were iterated with refinement using REFMAC (26). Later stages of refinement were carried out against the native data to an R-factor of 19.4% (Rfree = 23.2%). The final structural model was validated using Coot and Molprobity (27) tools. In the Ramachandran plot, 98.9 and 1.1% of residues are in allowed and outlier regions, respectively. Structure figures were prepared with CCP4mg (28).
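As a rough plausibility check on the reported cell, the unit-cell volume and a Matthews-type packing estimate can be computed from the cell dimensions given above. The sketch below is illustrative only and is not part of the reported analysis. It assumes one NTD molecule per asymmetric unit (not stated in the text), uses the 64.6-kDa fragment mass quoted in the Results, relies on the fact that P2₁2₁2₁ has four asymmetric units per cell, and applies the common approximation of about 1.23 Å³ of protein volume per Da. All function names are hypothetical.

```python
def orthorhombic_cell_volume(a, b, c):
    """Volume (in cubic angstroms) of an orthorhombic cell, where all angles are 90 degrees."""
    return a * b * c


def matthews_coefficient(cell_volume, asu_per_cell, molecules_per_asu, mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per Da)."""
    return cell_volume / (asu_per_cell * molecules_per_asu * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, assuming ~1.23 cubic angstroms of protein per Da."""
    return 1.0 - 1.23 / vm


# Cell dimensions as reported in the text (angstroms).
volume = orthorhombic_cell_volume(55.7, 87.8, 104.6)

# ASSUMPTIONS: one molecule per asymmetric unit; mass of the 1-551 fragment (64.6 kDa).
vm = matthews_coefficient(volume, asu_per_cell=4, molecules_per_asu=1, mass_da=64600.0)

print(f"cell volume: {volume:.0f} A^3")
print(f"Matthews coefficient: {vm:.2f} A^3/Da")
print(f"estimated solvent fraction: {solvent_fraction(vm):.2f}")
```

If the assumption of a single molecule per asymmetric unit were wrong, the Matthews coefficient would scale inversely with the number of molecules, so the same function can be used to test alternatives.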

RESULTS
Structure Determination-Like many kinetochore proteins, the sequence of Ndc10 contains no recognizable structural motifs and does not appear to be related to any other proteins in the sequence database (29,30). The 111.9-kDa protein has 956 residues and is predicted to contain a large number of intrinsically disordered sequences, particularly in the C-terminal region. During purifications of the full-length protein, we identified a 64.6-kDa proteolytically derived fragment of Ndc10, which was determined to contain residues 1-551 of the N terminus. Both the full-length protein and this fragment were found to be competent for DNA binding against a random (non-centromeric) DNA sequence (Fig. 1b). We crystallized this fragment and determined the x-ray structure at 1.9 Å resolution using the SAD phasing technique. The final model was refined to an R-factor of 19.4% (Rfree = 23.2%) and has excellent geometry (Table 1). No clear electron density was seen for residues 1-43 or 538-551, and loops between residues 65-72, 97-108, 168-182, 257-264, and 414-423 were also disordered and not included in the final model.
Overall Features-The structure of the protein is shown as a ribbon diagram in Fig. 2a. The overall fold may be divided into two α-helical lobes, which sandwich a central β-sheet formed by residues 239-370. The N-lobe of the protein consists of a distinct antiparallel four-helix bundle, whereas the C-lobe consists of several long loops interspersed with short helices and wraps back around the core of the protein. Analysis of the electrostatic molecular surface (Fig. 2b) shows that the convex "top" of the molecule is predominantly positively charged and is also one of the most highly conserved sections of the protein (Fig. 2c).
Oligomerization-Although Ndc10 has been proposed to exist as a dimer in the intact CBF3 complex (9), the crystal packing of the isolated NTD does not suggest any biologically relevant oligomerization. To analyze this further, we determined the molecular masses of the NTD and full-length protein by analytical ultracentrifugation. The NTD was determined to have a mass of ~61 kDa, whereas the full-length protein was 222 kDa. These results are consistent with the NTD being monomeric and the full-length protein being dimeric. This strongly suggests that the C terminus of the protein (residues 551-957) is solely responsible for dimerization.
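The arithmetic behind the monomer/dimer assignment can be made explicit by dividing each measured solution mass by the corresponding sequence-derived monomer mass quoted in the text (64.6 kDa for the NTD fragment, 111.9 kDa for the full-length protein). The short sketch below is illustrative only. The function name is hypothetical, and rounding the ratio to the nearest integer is a simplifying assumption that ignores experimental uncertainty in the sedimentation analysis.

```python
def oligomeric_state(observed_kda, monomer_kda):
    """Return (mass ratio, nearest whole-number oligomeric state).

    ASSUMPTION: the nearest integer to the observed/monomer mass ratio is taken
    as the oligomeric state; no error model is applied.
    """
    ratio = observed_kda / monomer_kda
    return ratio, max(1, round(ratio))


# Masses in kDa as quoted in the text.
ntd_ratio, ntd_state = oligomeric_state(61.0, 64.6)
full_ratio, full_state = oligomeric_state(222.0, 111.9)

print(f"NTD: ratio {ntd_ratio:.2f}, state {ntd_state}")
print(f"Full-length: ratio {full_ratio:.2f}, state {full_state}")
```

With these inputs the NTD ratio falls close to 1 and the full-length ratio close to 2, matching the monomer and dimer assignments given above.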
Structural Homologies-Analysis of the fold of the Ndc10 NTD using the DALI server (31) unexpectedly revealed that it belongs to the type IB topoisomerase/λ-integrase (IB/Int) family. This large group of proteins includes viral and eukaryotic type IB topoisomerases, phage integrases, and recombinases (32-34). All these enzymes utilize a conserved nucleophilic tyrosine residue that forms a covalent enzyme-nucleic acid intermediate during cleavage of the DNA phosphodiester backbone (35). The most significant structural similarity was against Cre recombinase (DALI z-score = 8.3). A superimposition of the Ndc10 NTD and the Cre recombinase bound to duplex DNA (36) is shown in Fig. 3a. The proteins may be superimposed with a Cα r.m.s. deviation of 2.6 Å over 127 residues. As can be seen, the domain spanning residues 139-274 in Cre, which is structurally equivalent to residues 182-354 in Ndc10, forms the key protein-DNA backbone contacts. This domain includes the aforementioned β-sheet and positively charged helix.
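For readers unfamiliar with the quantity, the Cα r.m.s. deviation quoted above is the square root of the mean squared distance between structurally equivalent Cα atoms after superposition. The sketch below shows only that final calculation. It assumes the two coordinate sets are already superimposed and listed in matching order, and the coordinates are invented toy values, not taken from the Ndc10 or Cre structures. The function name is hypothetical, and this is not the procedure used by the DALI server.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (angstroms) between paired C-alpha positions.

    ASSUMPTION: both lists are already superimposed and ordered so that
    element i of coords_a is structurally equivalent to element i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three C-alpha positions, each displaced by 1 angstrom along z.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

print(f"toy RMSD: {ca_rmsd(toy_a, toy_b):.2f} A")
```

In a real comparison the list of equivalent residue pairs (127 pairs here) and the optimal rigid-body superposition are determined first by the alignment program.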
Closer analysis of the N-terminal four-helix bundle of Ndc10 revealed that it is structurally homologous to the N-terminal helical bundle of Cre but occupies a different position in the intact structure (Fig. 3, a and b). This difference is accommodated by a flexible loop that connects the domain to the rest of the protein. In Cre, this domain is required for both DNA interactions and tetramerization of the protein, suggesting that both are substantially different in Ndc10, although we cannot exclude the possibility that the domain might adopt a substantially different conformation upon DNA binding. Sequence analysis of the Ndc10 four-helix bundle shows little obvious similarity to the equivalent Cre domain, although there is weak conservation of some of the DNA-contacting basic residues. However, the surface of the Ndc10 domain is considerably less electropositive than that of Cre (data not shown), making it unclear whether it is capable of binding DNA in the same manner. Interestingly, an atomic force microscopy study on the intact CBF3 complex (37) showed that a fraction of the complexes examined bound two DNA duplexes in a non-covalent manner, giving rise to four-armed structures. These show a marked resemblance to the recombination synapses formed by Cre, which also links two duplexes to form a four-way junction. However, it is not clear whether Ndc10, when in the intact CBF3 complex, could mediate formation of such a structure or what the functional significance of this might be.
Catalytic Domain-The IB/Int family of proteins has catalytic activity, whereas none has been reported for Ndc10 or CBF3. IB/Int proteins catalyze DNA rearrangements by a phosphoryl transfer reaction. The reaction involves a nucleophilic attack on the DNA backbone by a conserved tyrosine residue to form a covalent 3′-phosphotyrosine linkage and release the free 5′-hydroxyl group. The active site is composed of several conserved residues including one tyrosine, one lysine, and two arginine residues (35,38). In known IB/Int structures, the catalytic tyrosine and the basic residues responsible for transition state stabilization are located on a short helix located in the C terminus of the protein, after the core DNA-binding fold. This helix is entirely absent in Ndc10, and no spatially equivalent residues are seen. Instead, the C terminus of the NTD loops back around the main body of the protein, on the reverse face to the putative DNA-binding residues, then continues in the opposite direction to the N-terminal four-helix bundle (Fig. 3c).
DNA Binding-Superimposition of the Cre structure on Ndc10 (Fig. 3, a and d) shows that the basic patch, which includes Arg-325, Lys-327, Lys-354, Arg-355, and Arg-356 in the two conserved Ndc10 sequence motifs RGKS and YKRR, is perfectly positioned to make contact with the DNA phosphodiester backbone (Fig. 4, a and b). We prepared versions of the Ndc10 NTD with both Arg-325 and Lys-327 or Arg-355 and Arg-356 mutated to alanine and tested the DNA binding properties (Fig. 4c). The Arg-325/Lys-327 mutations totally abolished binding, whereas it was substantially reduced in the Arg-355/Arg-356 mutant. Mutating all four residues to alanine totally eliminated binding.
Phosphorylation Sites-Kinetochore assembly is tightly regulated by several kinases, notably Aurora B (Ipl1 in budding yeast) (39). Studies have suggested that Ndc10 is a target of both Ipl1 and casein kinase 2 (CK2) (40,41). Phosphorylation of multiple serine and threonine residues influences Ndc10 localization on the anaphase spindle and targeting to the kinetochore. Two of these residues lie within the crystallized construct. Threonine 106 is disordered in the crystal but is situated on a solvent-exposed flexible loop that would be accessible to a kinase. However, serine 189 is buried deep inside the structure, and so it is hard to see how such a phosphorylation event might occur.

DISCUSSION
The data presented here provide new insights into the formation and evolution of an essential complex in the budding yeast kinetochore. Recent work (13) has led to the development of a model in which the CBF3 complex is the key positional marker of the centromere. Binding of the complex to CDEIII allows recruitment of the centromeric nucleosome upon which the assembly of the rest of the kinetochore depends. We have shown that Ndc10 will bind non-centromeric sequences, and we predict that the protein-nucleic acid interactions are likely to occur through the DNA backbone. This would suggest that binding occurs in a non-sequence-specific manner. It is possible that the centromere specificity of the CBF3 complex is entirely due to the Cep3 protein, whereas Ndc10 may increase the overall affinity for DNA. The binding of Ndc10 may also be important for local alteration of DNA structure, for example overwinding and/or bending that might be important for the nucleosome loading process.
The structure of the DNA-binding domain of Ndc10 provides the first example of the co-option of a tyrosine recombinase DNA-binding domain to the chromosome segregation apparatus, and as far as we are aware, is the first example of the fold being used in a non-catalytic role. Given the extremely weak sequence homology between Ndc10 and other members of the IB/Int family, it seems probable that the fold might also be utilized by other, as yet uncharacterized, proteins in unrelated pathways. It also demonstrates the remarkable reuse of protein domains from other pathways by the CBF3 complex. Ctf13, an F-box protein, and Skp1 are more commonly found as components of E3 ubiquitin ligases, again in catalytically active complexes, whereas in CBF3, they appear to fulfill structural functions. The Cep3 protein has made use of a sequence-specific DNA-binding domain from the Gal4 family, albeit in a very different way from the canonical transcription factor. This structure of Ndc10 shows the incorporation of yet another motif, more closely associated with DNA topology and transposition, and demonstrates the versatility of the fold. It has been proposed that the point centromere of Saccharomyces and related yeasts is a relatively recent evolutionary adaptation (42), and this scavenging of pre-existing protein motifs to build the key DNA-binding component of the centromere demonstrates how such adaptation might be accelerated.
FEBRUARY 10, 2012 • VOLUME 287 • NUMBER 7

The structure also allows us to make predictions about the likely way that DNA is bound. The conservation of basic residues involved in DNA backbone interactions suggests that Ndc10 binds duplex DNA in a manner similar to other members of the IB/Int family. However, the possible oligomerization of Ndc10 within the CBF3 complex and DNA interactions with other CBF3 proteins, notably Cep3, may well further alter the DNA structure. Our data suggest that the dimerization of Ndc10 is mediated through the C terminus of the protein, although it is unclear whether this domain is also involved in the formation of the CBF3 complex.
While this manuscript was in revision, the structure of the equivalent domain of Ndc10 from Kluyveromyces lactis was solved, bound to a 30-mer DNA duplex (43). The protein-DNA interactions seen in the complex were broadly similar to those inferred from our model based on the Cre-DNA structure. Interestingly, the authors described a putative dimerization interface in the N-terminal domain, based on crystal packing analysis. We see no similar dimerization in our structure, consistent with our hydrodynamic studies. It may be that the C terminus-induced dimerization is necessary to stabilize this interface, or alternatively, it might represent a species-specific adaptation.
Further biochemical and structural studies should hopefully resolve some of these issues.