Structural basis of human PR/SET domain 9 (PRDM9) allele C–specific recognition of its cognate DNA sequence

PRDM9 is the only mammalian gene that has been associated with speciation. The PR/SET domain 9 (PRDM9) protein is a major determinant of meiotic recombination hot spots and acts through sequence-specific DNA binding via its C2H2 zinc finger (ZF) tandem array, which is highly polymorphic within and between species. The most common human variant, PRDM9 allele A (PRDM9a), contains 13 fingers (ZF1–13). Allele C (PRDM9c) is the second-most common among African populations and differs from PRDM9a by an arginine-to-serine change (R764S) in ZF9 and by replacement of ZF11 with two other fingers, yielding 14 fingers in PRDM9c. Here we co-crystallized the six-finger fragment ZF8–13 of PRDM9c, in complex with an oligonucleotide representing a known PRDM9c-specific hot spot sequence, and compared the structure with that of a characterized PRDM9a-specific complex. There are three major differences. First, Ser764 in ZF9 allows PRDM9c to accommodate a variable base, whereas PRDM9a Arg764 recognizes a conserved guanine. Second, the two-finger expansion of ZF11 allows PRDM9c to recognize three-base-pair-longer sequences. A tryptophan in the additional ZF interacts with a conserved thymine methyl group. Third, an Arg–Asp dipeptide immediately preceding the ZF helix, conserved in two PRDM9a fingers and three PRDM9c fingers, permits adaptability to variations from a C:G base pair (G–Arg interaction) to a G:C base pair (C–Asp interaction). This Arg–Asp conformational switch allows identical ZF modules to recognize different sequences. Our findings illuminate the molecular mechanisms for flexible and conserved binding of human PRDM9 alleles to their cognate DNA sequences.

More than 40 allelic variants of human PRDM9 have been documented, which display marked differences in recombination profile and crossover frequency (12–16). Allele A of human PRDM9 (PRDM9a) is the most common form of PRDM9, found in ~86% of European and ~50% of African populations (13). PRDM9 allele C is the second most common allele in African populations, with a frequency of 12.8% (13,14). In A/C heterozygotes, PRDM9c-specific hot spots are more frequent (56% versus 44%) and more active than PRDM9a-specific hot spots (4), suggesting a partial dominance of allele C. Alleles C and A differ both by an arginine-to-serine change (R764S) in ZF9 and by replacement of ZF11 with two other fingers, resulting in an extra finger (12,13,17) (Fig. 1, A and B). In addition, allele C has one more difference from allele A, a serine-to-threonine substitution in ZF6 (4,12), although this change does not affect hot spot recognition (see "Discussion").
We previously expressed and purified ZF8–12 of PRDM9a and ZF8–13 of PRDM9c and showed that each allele binds with the highest affinity to the respective allele-specific sequence (18), consistent with PRDM9a and PRDM9c acting at entirely different hot spots (4). In addition, PRDM9c has >10-fold higher affinity for the C-specific sequence than the affinity of PRDM9a for the A-specific sequence (18), in agreement with the observation that PRDM9c is partially dominant over PRDM9a (4). This was quite surprising, considering that, of five fingers in PRDM9a or six fingers in PRDM9c, the two proteins share three identical ZFs and one nearly identical ZF with a single amino acid difference. We also studied three additional alleles (L9/L24, L13, and L20) with single amino acid differences from allele A and demonstrated that these alleles possess altered DNA-binding affinity and/or altered DNA sequence specificity (18). Here, to explore the surprisingly different specificities of these PRDM9 proteins, we present the structure of PRDM9c ZF8–13 in complex with allele-specific DNA and compare it with that of PRDM9a (18). In addition, we discuss unique features of PRDM9 that have rarely (if ever) been described for canonical ZF-DNA complexes.
In conventional C2H2 ZF proteins, each finger comprises two β strands and a helix (19). Characteristically, two histidines in the helix together with one cysteine in each β strand coordinate a zinc ion, forming a tetrahedral C2-Zn-H2 structural unit that confers rigidity to fingers. The amino acids occupying four key "canonical" positions of the helix, namely −1, 2, 3, and 6, specify a DNA target sequence of three or four adjacent DNA base pairs (20,21) (Fig. 1B, bottom). This structure-based numbering scheme refers to the position immediately before the helix (position −1) and positions within the helix (positions 2, 3, and 6). To reduce possible ambiguity, we use the first zinc-coordination His in each finger as reference position 0, with residues before this, at sequence positions −1 (blue), −4 (red), −5 (black), and −7 (green), corresponding to positions 6, 3, 2, and −1 of the structure-based numbering (compare top and bottom of Fig. 1B). The new numbering scheme (residues at positions −1, −4, and −7) corresponds to the 5′-middle-3′ of each DNA triplet element.
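The relationship between the two numbering schemes can be sketched in a few lines of Python. This is only an illustration: the dictionary restates the correspondence given in the text, whereas the function name and the choice of residue 765 as the ZF9 reference His are assumptions made here (765 is inferred from Ser 764 occupying position −1 and is not stated explicitly).

```python
# Sequence-based position (first zinc-coordinating His = 0) mapped to the
# structure-based helix position, as described in the text.
SEQ_TO_HELIX = {-1: 6, -4: 3, -5: 2, -7: -1}


def residue_number(first_his: int, seq_pos: int) -> int:
    """Residue number at a sequence-based position, given the residue
    number of the finger's first zinc-coordinating His (hypothetical helper)."""
    return first_his + seq_pos


# Assumption: Ser764 of ZF9 sits at position -1, so the reference His of
# ZF9 is taken to be residue 765 (inferred, not quoted from the paper).
ZF9_FIRST_HIS = 765

zf9_positions = {
    pos: residue_number(ZF9_FIRST_HIS, pos) for pos in (-8, -7, -5, -4, -1)
}

for pos, resnum in zf9_positions.items():
    helix = SEQ_TO_HELIX.get(pos, "outside canonical set")
    print(f"position {pos:>2}: residue {resnum}, helix position {helix}")
```

Under this assumption the offsets reproduce the ZF9 residues named elsewhere in the text (Arg 757, Asp 758, Ser 760, His 761, and Ser 764), and position −8 falls outside the four canonical positions.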
PRDM9 is unique (relative to other ZF proteins) in that the ZFs are highly repetitive and resulted from sequence duplications (Fig. 1B). This feature provides an opportunity to study identical ZFs (ZF10 and ZF12) or nearly identical fingers with only one amino acid difference (ZF9 and ZF10 at position −4) in response to target sequence variation, along with variable cross-strand base contacts by a position-specific invariant serine residue at position −5, and a "switch" mechanism by the Arg-Asp dipeptide at positions −8 and −7 for base recognition. The base-specific interaction by Arg at position −8 (corresponding to the second residue before the helix) has not been considered in the original canonical model, even in a recent large survey of the three-finger DNA-binding landscape (22).

Results
The six-finger fragment of PRDM9c was used for cocrystallization with a 20-bp oligonucleotide derived from the C consensus sequence Motif 1 (16), plus a 5′-overhanging thymine or adenine on the opposite strand. We crystallized the protein-DNA complex in space group P2₁ and determined the structure to a resolution of 2.4 Å (Table 1). The crystallographic asymmetric unit contains two protein-DNA complexes (Mol A and Mol B in Fig. 1C) and an additional protein that mediates the protein-protein interactions between the two complexes (Mol C in Fig. 1D). In the current model, we only modeled ZF8 of molecule C (Fig. 1D), as no electron density was observed for the rest of the fingers in that molecule (supplemental Fig. S1A). Interestingly, a similar observation of an additional protein molecule (with only one ordered finger) was made in the crystal structure of a designed zinc finger protein bound to DNA (23). (We do not know whether the additional molecule C is biologically relevant, but it does mediate intermolecular contacts in the crystal lattice (supplemental Fig. S1B).) The two protein-DNA complexes (Mol A and Mol B in Fig. 1C) are highly similar, with a root mean square deviation (RMSD) of <0.5 Å when comparing 145 pairs of Cα atoms (Fig. 1E). The DNA molecules are largely B-form (supplemental Table S1) and coaxially stacked, with the overhanging A and T forming a base pair with neighboring DNA molecules, thus forming a pseudocontinuous duplex in the crystal lattice. Here we will only describe the structure of complex A.
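As a rough illustration of how an RMSD such as the one quoted above is obtained, the sketch below computes the RMSD over paired Cα coordinates. It assumes that the two coordinate sets have already been optimally superimposed (the superposition step is omitted), and the function name and toy coordinates are hypothetical rather than taken from the deposited structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of
    (x, y, z) coordinates that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: two C-alpha pairs, each displaced by 1 Angstrom along z.
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"RMSD = {rmsd(mol_a, mol_b):.2f} A")
```

In practice the atom pairing and superposition are handled by the structure-alignment program; the sketch only shows the final averaging step.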
As was seen in the first structure reported for a three-finger protein Zif268 in complex with DNA (19), the six fingers of PRDM9c interact with DNA exclusively in the major groove (Fig. 1F), primarily recognizing one strand of the double-stranded DNA in a linear, polar fashion from 3′ to 5′ (magenta in Fig. 1F and bottom strand in Fig. 2A), with the corresponding protein sequence proceeding from N to C terminus (ZF8 to ZF13). To our knowledge, this is the largest native tandem ZF array whose structure has been determined in complex with an oligonucleotide, where every finger is involved in DNA sequence-specific interactions. Previously, a designed six-finger zinc finger protein, Aart, was characterized with bound DNA (24) (PDB code 2I13). Pairwise comparison of the two six-finger proteins (Aart versus PRDM9c) yielded an RMSD of ~2.6 Å over 149 pairs of Cα atoms (supplemental Fig. S2A). In addition, one duplex DNA containing two adjacent sites, bound by two copies of the three-finger Zif268, had been characterized as being equivalent to a six-finger protein (25) (PDB code 1P47). Interestingly, the "relaxed" discontinuous six-finger Zif268 structure is similar (RMSD of ~2.6 Å) to either the native six-finger PRDM9c or the designed six-finger Aart (supplemental Fig. S2B). More recently, we characterized human CTCF ZF2–7 (a six-finger array) in complex with DNA (26). In CTCF, ZF3 to ZF7 make base-specific contacts, whereas ZF2 continues to follow the major groove, but the side chains within its DNA-interacting helix are too far away to make base-specific hydrogen bonds (26). Interestingly, only five of the six fingers of CTCF align with either Aart or PRDM9c (supplemental Fig. S2, C and D).

Human PRDM9 allele C interaction with DNA
The convention that we used for numbering nucleotides and amino acids is shown in Fig. 2A. Base pairs of the crystallization oligonucleotide are numbered 0–19, with the allele C-specific consensus hot spot sequence motif as the "top" strand (Fig. 2A). Like classic C2H2 ZF proteins (20,21,27), each finger interacts with a "triplet" element consisting of three adjacent DNA base pairs. In the case of PRDM9c, it is the opposite "bottom" strand that is being recognized by the ZF array. To keep the nomenclature of the consensus sequence, ZF8 interacts with the three base pairs of 5′ sequence (ACC), ZF9 with the second triplet (CCA), ZF10 with the third triplet (GTG), ZF11 with the fourth triplet (AGC), ZF12 with the fifth triplet (GTT), and ZF13 with the 3′ sequence (GCC). Analysis of the allele C-specific consensus sequence motifs (4,16) suggested that each triplet contains one variable base (N), at the first position of the first and sixth triplets (N1 and N16), the second position of the fourth triplet (N11), or the third position of the second, third, and fifth triplets (N6, N9, and N15 in Fig. 2A). In addition, we used a web server for predicting C2H2 ZF DNA-binding specificity (28), which gave a predicted sequence for PRDM9c in general agreement with the experimentally determined allele C-specific consensus sequence motif, including five variable positions (supplemental Fig. S2). However, there are two notable and reciprocal exceptions: at nucleotide position 4, a conserved C:G base pair in the consensus motif is variable in the prediction, whereas at nucleotide position 11, a variable base in the consensus is strongly predicted to be a G:C base pair.
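The finger-to-triplet assignment described above can be written out as a short Python sketch. The top-strand string is simply the six triplets listed in the text joined together, and the variable positions are those named in the text; placing the first triplet at base pair 1 is an assumption made here (consistent with A1 belonging to triplet 1), and the function names are hypothetical.

```python
# Top-strand triplets as listed in the text, joined 5' to 3'.
TOP_STRAND = "ACC" "CCA" "GTG" "AGC" "GTT" "GCC"
FINGERS = ["ZF8", "ZF9", "ZF10", "ZF11", "ZF12", "ZF13"]
# Variable (N) base pair positions named in the text.
VARIABLE_POSITIONS = {1, 6, 9, 11, 15, 16}


def assign_triplets(seq, fingers, first_position=1):
    """Map each finger to (triplet, [base pair positions]).
    first_position=1 is an assumption about the numbering offset."""
    if len(seq) != 3 * len(fingers):
        raise ValueError("sequence length must be three times the finger count")
    assignment = {}
    for i, finger in enumerate(fingers):
        start = first_position + 3 * i
        assignment[finger] = (seq[3 * i:3 * i + 3], list(range(start, start + 3)))
    return assignment


def mask_variable(seq, variable, first_position=1):
    """Replace bases at the variable positions with N."""
    return "".join(
        "N" if first_position + i in variable else base
        for i, base in enumerate(seq)
    )


triplets = assign_triplets(TOP_STRAND, FINGERS)
masked = mask_variable(TOP_STRAND, VARIABLE_POSITIONS)
for finger, (triplet, positions) in triplets.items():
    print(finger, triplet, positions)
print(masked)
```

The masked string is only a restatement of the positions given in the text and should not be read as the published consensus motif.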
From the protein side, the side chains from specific amino acids within the N-terminal portion of each helix and the preceding loop make major groove contacts primarily with the bases of the "bottom" recognition strand. Using the first zinc-coordination His in each finger as reference position 0, residues before this, at positions −1 (blue), −4 (red), and −7 or −8 (green) lie on the protein-DNA interface and form hydrogen bonds (H-bonds) with the exposed edges of the DNA bases in the major groove (Figs. 1B and 2A). Variations at the four positions (−8, −7, −4, and −1) among zinc fingers correspond to the varied sequences each finger recognizes. For example, ZF9 and ZF10 (or ZF12) differ only at the −4 position, with His in ZF9 and Asn in ZF10 and ZF12 (red shading in Fig. 1B).

The conserved G-Arg, G-His, and A-Asn interactions
The seven conserved C:G base pairs in the consensus sequence (Fig. 2A) are recognized primarily by H-bonds between the guanines of the "bottom" recognition strand and arginine or histidine residues of ZF8, ZF9, ZF11, and ZF13. The terminal Nη1 and Nη2 groups of arginine residues 736 of ZF8, 757 of ZF9, 820 of ZF11, and 876 of ZF13 donate H-bonds to the O6 and N7 atoms of guanines at base pair positions 3, 4, 12, and 18, respectively (Fig. 2, D, E, M, and S), a bonding pattern specific to guanine (29,30). Depending on side chain rotamer conformation, the Nε2 group of histidine residues 733 of ZF8, 761 of ZF9, and 873 of ZF13 donates one H-bond to either guanine O6 or guanine N7 (Fig. 2R), and the adjacent ring Cε1 atom makes a C–H···N type of hydrogen bond (Fig. 2F) or a water-mediated interaction (Fig. 2C).
Triplets 3 and 5 of the consensus sequence include an invariant T:A base pair at positions 8 and 14, which are recognized by Asn 789 of ZF10 and Asn 845 of ZF12, respectively (Fig. 2, I and O). Juxtaposition of Asn with A is a common mechanism for recognition of this base (29), as the side chain of asparagine donates one H-bond to adenine N7 and accepts one from adenine N6. As mentioned above, ZF10 and ZF12 differ from ZF9 only at position −4; an asparagine (for A) replaces the histidine in ZF9 and changes the base preference (for G). These specific G-Arg, G-His, and A-Asn interactions involve bidentate H-bond interactions. The total of nine invariant base pairs account for half of the recognition sequence (9 of 18). Interestingly, the Arg 757–G4 interaction was predicted to involve a variable base pair (supplemental Fig. S3). This may be explained by the fact that Arg 757 is located at position −8 of ZF9, which is not part of the canonical model, and thus was not considered in the survey (28).

Interactions with variable base pairs
Interactions with the variable (N) base pairs of the consensus sequence involve water-mediated H-bonds with Asn 730 (A1:T1 of triplet 1; Fig. 2B), hydrophobic interaction with Val 817 (G11:C11 of triplet 4; Fig. 2L), and a gap in the protein-DNA interface with Asn 870 positioned too far away from the base (Fig. 2Q). The smaller serine side chains are also too far away to form interactions: Ser 764 of ZF9, Ser 792 of ZF10, and Ser 848 of ZF12 (dashed lines in Fig. 2, G, J, and P; all located at the −1 position of each finger). Serine and asparagine can each act as both an H-bond donor and acceptor, and they might accommodate alternative base pairs. The participating amino acids may, in other words, alter their conformation so as to interact with different base pairs and, in this way, intimately fit the ZF array to a variety of different sequences. For example, when a T:A base pair occurs at position 1 (A:T in the currently used DNA molecule; Fig. 2B) or 16 (currently G:C; Fig. 2Q), an A-Asn interaction could occur similar to those between asparagine residues at ZF10 and ZF12 that specify the invariant T:A base pairs at positions 8 and 14 (Fig. 2, I and O). We note that this observation (that a serine residue at the −1 position of ZF9, ZF10, or ZF12 could accommodate T, C, or A, whereas arginine at the corresponding −1 position of ZF8, ZF11, and ZF13 recognizes only G) is in agreement with a similar observation made with the designed six-finger protein Aart, where triplets containing 5′ A, C, or T of the recognition strand are typically not specified by direct interaction with the small amino acid (Ala, Ser, Val) in position −1 of the recognition helix (24).

The adaptive interaction with variable sequence by the invariant serine at position −5
Despite complete conservation of Ser at position −5 in each ZF (Fig. 1B), the way in which it interacts with DNA differs from triplet to triplet. Nevertheless, most of the serines interact with the top strand, at the nucleotide immediately before and/or the first base pair of the cognate triplet (Fig. 3A). Thus, Ser 732 of ZF8 interacts with guanine immediately before its own triplet (G0 in Fig. 3B). Ser 760 of ZF9 interacts, via a water molecule, with cytosine of its own triplet (C4 in Fig. 3C). Ser 788 of ZF10 interacts with an adenine of the previous triplet (A6 in Fig. 3D). Ser 816 of ZF11 bridges, through a water molecule, a guanine of the previous triplet (G9 in Fig. 3E) and T10 of its own triplet. Ser 844 of ZF12 hydrogen-bonds with the first guanine of its own triplet (G13 in Fig. 3F). Finally, Ser 872 of ZF13 uses the Cβ carbon atom to make a van der Waals contact with the methyl group of T15 of the previous triplet (Fig. 3G). This "adaptability" stems in part from the ability of serine to act as an H-bond donor or acceptor or both at the same time. This observation also suggests that the cross-strand contact mediated by the small amino acid (such as serine) at position −5, corresponding to position 2 of the structure-based numbering scheme (Fig. 1B, bottom), is generally not a determinant of DNA-binding specificity.

The adaptive interaction by Arg-Asp dipeptide at positions −8 and −7 to sequence variation
The Arg-Asp (RD) dipeptides at positions −8 and −7 of ZF9, ZF10, and ZF12 (shaded green in Fig. 1B) provide additional examples of the adaptability of PRDM9c to sequence variations. In ZF9, Arg 757 conforms to C4:G4 as the first base pair of its triplet (CCA) and forms bidentate H-bonds with the guanine in the bottom strand (Fig. 3C), whereas Asp 758 hydrogen-bonds with the arginine as well as a water-mediated network with the paired cytosine of the opposite strand (Fig. 3C). In ZF10 and ZF12, these same amino acids adopt different conformations and partners. Specifically, Asp 786 matches to the first base pair of its triplet (GTG) and makes an H-bond with the cytosine in the bottom strand (Fig. 2H); by doing so, the adjacent Arg 785 instead interacts with a backbone phosphate group (Fig. 3, H and I). The same conformational change of an RD dipeptide is also evident in ZF12 (Fig. 3, J and K). Thus, the RD dipeptide can accommodate changes from C:G (with G-Arg interaction) to G:C (with C-Asp interaction). We note that Asp can bind unmodified cytosine (21,26,31). The same adaptability could apply to ZF13, which contains the Arg-Asn (RN) dipeptide at positions −8 and −7 (shaded green in Fig. 1B), allowing recognition of the first base pair of a triplet when it is either C:G (G-Arg) or T:A (A-Asn) (Fig. 3L). The latter interaction would be analogous to A8–Asn 789 of ZF10 and A14–Asn 845 of ZF12 (Fig. 2, I and O).
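The RD switch described above amounts to a simple rule keyed on the first base pair of a triplet, which the following Python sketch restates. It is a lookup table summarizing the contacts reported in this section, not a predictive model; the names are hypothetical, and triplets are written as the top strand (so a leading C means a C:G pair with guanine in the bottom strand).

```python
# Contacts reported in the text for the RD dipeptide (positions -8 and -7),
# keyed by the first base pair of the triplet written as top:bottom.
RD_SWITCH = {
    "C:G": {
        "Arg": "bidentate H-bonds to the bottom-strand guanine",
        "Asp": "H-bond to Arg and water-mediated contact with the paired cytosine",
    },
    "G:C": {
        "Arg": "backbone phosphate contact",
        "Asp": "H-bond to the bottom-strand cytosine",
    },
}

# RD-containing fingers of PRDM9c and their top-strand triplets (from the text).
RD_FINGERS = {"ZF9": "CCA", "ZF10": "GTG", "ZF12": "GTT"}


def rd_contacts(triplet):
    """Return the described Arg/Asp contacts for a top-strand triplet,
    or None when the first base pair is not one discussed in the text."""
    pair = {"C": "C:G", "G": "G:C"}.get(triplet[0].upper())
    return RD_SWITCH.get(pair)


for finger, triplet in RD_FINGERS.items():
    print(finger, triplet, rd_contacts(triplet))
```

The function returns None for any other first base pair because the text describes the switch only for C:G and G:C.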

The thymine-tryptophan van der Waals interaction in ZF11
As noted earlier, the expansion of ZF11 of PRDM9a into two fingers resulted in an extra finger in PRDM9c (Fig. 1A). Allele C ZF11 possesses hydrophobic (Val) and aromatic (Trp) residues at positions −4 and −7, whereas ZF12 has all of the usual features of a regular zinc finger unit, with polar or charged residues at the DNA base-interacting positions (Fig. 1B). The indole ring of Trp 814 of ZF11 forms a π interaction with the methyl group of thymine T10 (Fig. 2K). Methyl-π interactions are also used by the zinc finger protein ZNF217 in interaction with DNA, where a tyrosine contacts a thymine methyl group (32). In contrast, the hydrophobic residue Val 817 of ZF11 allows a variable base at base pair position 11, probably due to its ability to rotate the side chain torsion angle, moving the terminal methyl groups into or out of contact (Fig. 2L). The prediction, however, strongly forecast that nucleotide position 11, which corresponds to Val 817, was a G:C base pair (supplemental Fig. S2), probably influenced by the within-finger context of amino acids immediately surrounding Val 817 (22).
Although Trp/Tyr-thymine methyl interactions have not often been described in classical native ZF-DNA complexes, aromatic (Phe, Tyr) or hydrophobic (Val, Ala) residues have been observed in the bacterial one-hybrid screen against three-finger proteins in response to selection for binding thymine (22). In addition, a negatively charged glutamate was observed interacting with the methyl groups of thymine or 5-methylcytosine in Klf4 (31) and Kaiso (33).

Structural comparison of PRDM9 A and C alleles in complex with allele-specific DNA
To gain insights from the available structural information, we compared PRDM9a-DNA and PRDM9c-DNA complex structures. In the structure of PRDM9a, the five-finger fragment ZF8–12 was used for crystallization, but the last finger (ZF12) could not be seen in the structure (18). Superimposition of the four fingers shared between the two structures (ZF8–11) yielded an RMSD of ~3 Å over 106 pairs of Cα atoms. The largest difference lies at one end of the DNA molecule, where PRDM9a has no DNA contacts and PRDM9c has two C-terminal zinc fingers (ZF12–13) bound (Fig. 4, A and B). DNA flexibility in the absence of a paired ZF unit, together with crystal-packing lattice forces involving the DNA ends, might have influenced the DNA conformation observed here.
From the protein side, two types of differences were observed between the complex structures. One difference results from the protein sequence change. As mentioned, allele C differs from A by an arginine-to-serine change, R764S, at position −1 of ZF9 (Fig. 4C) (12,13,17). Arg 764 of PRDM9a recognizes the conserved guanine of the C:G base pair in the allele A-specific consensus (18) (Fig. 4D), whereas Ser 764 of PRDM9c allows a variable nucleotide at the corresponding position that lacks specific interaction with the base (Fig. 4E). The expansion of ZF11 into two fingers allows PRDM9c to recognize a longer 18-bp consensus sequence, instead of a 15-bp sequence as in PRDM9a. The additional ZF provides an infrequently observed feature of aromatic Trp 814 contacting the thymine T10 methyl group (Fig. 2K) (see "Discussion"). These changes allow PRDM9c to bind 3-bp-longer sequences and gain additional interactions with the associated backbone phosphate groups (supplemental Fig. S4). The net result is >10-fold enhanced binding affinity (18).
The second type of difference results from the adaptability of the RD dipeptide to sequence variations. In PRDM9a, the RD dipeptide in ZF10 conforms to C:G as the first base pair of its triplet (CTA), and the Arg forms two H-bonds with the guanine (18) (Fig. 4F). In contrast, the corresponding RD residues in PRDM9c adopt different conformations and partners, owing to the substitution of the C:G base pair with G:C: the Asp hydrogen-bonds with the cytosine (Fig. 4G), whereas the Arg interacts with a backbone phosphate group (Fig. 3I). The same adaptability applies to RD in ZF9 of both alleles, RD in ZF12 of PRDM9c, and RN in ZF12 of PRDM9a and ZF13 of PRDM9c (Fig. 4C).
We note that the arginine residue of the RD dipeptide is located outside of the canonical positions of the helix. Here, we have demonstrated that an identical ZF unit (ZF10) interacts with two different sequences: TAG with PRDM9a and CAC with PRDM9c. This is not due to the ability of an amino acid in a given position (for instance, Asp at position −7) to specify different bases at the corresponding 3′ nucleotide position. Rather, it is due to the ability of the RD dipeptide to undergo conformation switching. We first observed this unique RD switch in the two-finger structure of Zfp57 (34), a related KRAB-ZF protein. We do not know whether this feature is unique to the family of KRAB-ZF proteins, which are the largest and most rapidly diversifying family of DNA-binding transcriptional regulators in mammals (35). Nevertheless, our study illustrates that (i) residue(s) outside of the canonical positions of the ZF helix contribute to the DNA-binding specificity, and (ii) an identical ZF can recognize different sequences via the RD conformational switch.

Figure 4 (legend, partial). For clarity, the protein components have been removed from the superimposition. C, the allele A-specific sequence (blue) and allele C-specific sequence (orange) are aligned to the corresponding ZF with amino acids at the −8, −7, −4, and −1 positions (see Fig. 1B, top). The expansion of one finger (ZF11) in PRDM9a to two fingers (ZF11–12) in PRDM9c is highlighted in a gray box. D, Arg 764 of allele A interacts with G6. E, Ser 764 of allele C allows a variable base at base pair position 6. F, RD dipeptide of ZF10 in allele A interacts with G7. G, RD dipeptide of ZF10 in allele C interacts with C7.

Discussion
To summarize, our results from in vitro studies of interaction between human PRDM9 C-terminal C2H2 ZF tandem array and allele-specific DNA provide a structural explanation for the sequence adaptability between the A and C alleles, the two most common alleles found in African populations (13,14). The importance of the ZF tandem array in hot spot activation is demonstrated by different alleles that activate completely different, allele-specific hot spots yet differ in the DNA-interacting ZFs, whether due to a subtle single amino acid change or expansion/deletion of an additional unit in the ZF array (12)(13)(14)(15). A recent experiment humanized the ZF array in mouse Prdm9 (by replacing the mouse ZF array with the orthologous sequence from humans), reversing the hybrid infertility between musculus and domesticus subspecies by entirely reprogramming recombination hot spots (36). Hybrid sterility is a common mechanism for preventing gene exchange between related species throughout the animal and plant kingdoms (37). Altering one Prdm9 allele in mice mimics the consequences of a newly arising allele and thus links the Prdm9 DNA-binding ZF array to its putative role in speciation. In vitro studies indicate that the PRDM9 ZF array forms a stable complex with its specific DNA sequence, with a dissociation halftime of many hours (38). In addition, each allele can bind DNA with high affinity while recognizing sequences with high variability in the consensus sequence motif (18). This property of PRDM9 can be traced to the ability of specific residues in each ZF unit to adopt alternative conformations, allowing it to establish versatile H-bonds with some bases but not with others.
The first seven fingers (ZF1–7) and the last finger (ZF13 in PRDM9a and ZF14 in PRDM9c) appear to be dispensable for binding of specific hot spot sequences. For example, allele B differs from A by a serine-to-threonine change, S680T, at position −1 of ZF6 (12) (the same change occurs in allele C). However, 88% of hot spots in a heterozygous A/B individual overlapped those in two A/A individuals, which themselves overlapped by 89%, suggesting that PRDM9b does not specify a distinct set of hot spots (4). It is evident that the residues at the base-interacting positions for these dispensable fingers are largely small amino acids (Ser/Thr) and hydrophobic or aromatic (Val/Tyr/Trp), whereas the corresponding residues for the functional fingers are large and charged/polar residues (Arg/His/Asp/Asn).
However, the apparently dispensable fingers provide a potential pool for duplication and variation when the need arises. The Trp-containing ZF11 could be considered as a duplication of identical ZF3, ZF6, or ZF7, although with the smaller threonine at position −1 replaced by a larger and charged arginine (Fig. 1B), generating a functional finger. On the other hand, alleles A and C might evolve independently; in PRDM9a, ZF11 is a duplication of ZF8, whereas in PRDM9c, ZF12 is a duplication of ZF10. The amount of variability among ZFs, as indicated by the number of unique residues at a given position comparing all ZFs in a protein, is highest in the first two and last ZFs and at ZF8 and very low in the intervening and DNA-binding ZFs of PRDM9 (unique positions highlighted in yellow in Fig. 5A). This pattern is similar to that seen for the chimpanzee ortholog, with higher variability near the ends (ZF2 and ZF14) (Fig. 5B). In contrast, the orthologs for macaque and gibbon show a different pattern, with the least variability at the two ends and at ZF8 and the highest variability between these three points (Fig. 5, C and D). This may reflect a particularly important role in DNA recognition by ZF8 in those orthologs, perhaps contributing to the presently only partially explained basis for speciation of gibbons and related primates (39). Together, our structures of PRDM9-DNA complexes provide important insights into molecular mechanisms of PRDM9 action for the diversity, flexibility, and conservation of allele-specific DNA binding across human genomes.

Experimental procedures
The gene encompassing the C-terminal ZF array of human PRDM9c (ZF8–14) was synthesized by GENEWIZ and subcloned into pGEX-6p1 vector (pXC1289). The ZF8–13 construct (pXC1505) was generated by PCR from the ZF8–14 plasmid DNA and subcloned into the pGEX-6p1 vector. The protein was expressed and purified using the protocol as described (18,40) with a few changes. First, after PreScission cleavage, the protein was not diluted to lower the salt concentration but was loaded directly at 0.5 M NaCl onto tandem HiTrap Q-SP columns (GE Healthcare). Most of the protein flowed through the Q column onto the SP column, from which it was eluted as a single peak at ~0.8 M NaCl using a linear gradient of NaCl from 0.5 to 1 M. Second, the protein was further purified on a Superdex-200 (16/60) column with the storage buffer containing 20 mM Tris (pH 7.5), 500 mM NaCl, 5% glycerol, 25 μM ZnCl2, and 0.5 mM tris(2-carboxyethyl)phosphine hydrochloride.
Fluorescence polarization was used to measure the dissociation constants (K_D), as described (18,40 S5), we used ZF8–13 for co-crystallization. Purified ZF8–13 proteins were incubated with the double-stranded DNA (Table 1), synthesized by Integrated DNA Technologies, at an equimolar ratio to a final concentration of 25 μM on ice in the storage buffer. The protein-DNA complexes were formed by dialysis against the same buffer components with 250 mM NaCl but without ZnCl2. The complex was further concentrated up to ~0.6 mM before crystallization. Crystallization conditions were screened by the sitting drop vapor diffusion method using a crystallization robot (PHOENIX, Art Robbins Instruments) and commercial screens from Hampton Research. The PHOENIX instrument mixed an aliquot of protein-DNA complex (0.2 μl) with an equal volume of crystallization mother liquor. The best-diffracting crystal used for data collection was obtained from optimized conditions of 0.1 M BisTris propane, pH 7.3, and 23% (w/v) polyethylene glycol 3350, using manually set-up hanging drop vapor diffusion at 16°C. The crystals were flash-frozen in liquid nitrogen using 20% glycerol as cryoprotectant.
X-ray diffraction data were remotely collected at the SER-CAT 22-ID beamline of the Advanced Photon Source at the Argonne National Laboratory and processed by HKL2000 (41). The autosolve module of PHENIX (42) was used for initial crystallographic phasing calculation by single-wavelength anomalous dispersion of zinc signals. The initial electron density revealed clearly visible DNA molecules, and a B-DNA model made by the "make-na server" (http://structure.usc.edu/make-na/server.html) was placed into the density manually and refined using PHENIX REFINE (43). This partial model was then used to search for the ZF proteins and place them into the density by molecular replacement using PHASER-MR (44). The structure was further refined using PHENIX, and the model was manually adjusted by COOT (45). Structure quality was analyzed and validated by the PDB validation server (46). Molecular graphics were generated using PyMOL (Schroedinger, LLC, New York).

Figure 5 (legend, partial). We note that aside from the fact that the gibbon protein is the best BlastP hit, there is no biochemical proof that it is the PRDM9. The numbers indicate the unique residues at a given position comparing all ZFs in a protein.