Lock and key binding of the HOX YPWM peptide to the PBX homeodomain.

HOX homeodomain proteins bind short core DNA sequences to control very specific developmental processes. DNA binding affinity and sequence selectivity are increased by the formation of cooperative complexes with the PBX homeodomain protein. A conserved YPWM motif in the HOX protein is necessary for cooperative binding with PBX. We have determined the structure of a PBX homeodomain bound to a 14-mer DNA duplex. A relaxation-optimized procedure was developed to measure DNA residual dipolar couplings at natural abundance in the 20-kDa binary complex. When the PBX homeodomain binds to DNA, a fourth alpha-helix is formed in the homeodomain. This helix rigidifies the DNA recognition helix of PBX and forms a hydrophobic binding site for the HOX YPWM peptide. The HOX peptide itself shows some structure in solution and suggests that the interaction between PBX and HOX is an example of "lock and key" binding. The NMR structure explains the requirement of DNA for the PBX-HOX interaction and the increased affinity of DNA binding.

The specification of segmental identity along the embryonic anteroposterior axis is largely determined by Hox genes. These genes encode transcription factors that bind DNA through a highly conserved 60-amino acid domain known as the homeodomain, which consists of three α-helices and an N-terminal arm. The third helix lies in the major groove of the DNA and, along with the N-terminal arm, is responsible for DNA recognition. Well conserved amino acid residues contact DNA and form a hydrophobic core (1).
The Hox gene cluster in Drosophila consists of eight genes, the expression of which is directly related to their location in the cluster. In mammals, four Hox clusters, A to D, encompass a total of 39 genes. The mammalian HOX proteins are classified into 13 paralog groups on the basis of their position in the gene cluster and homology to the Drosophila Hox genes (2).
Paralogs are expressed in overlapping domains and possess both similar and unique functions. Despite their highly specific in vivo activities, in vitro HOX homeodomain proteins bind to the short DNA sequence TAAT with relatively low affinity (3). The formation of cooperative DNA-binding complexes between HOX proteins and a cofactor, PBX, increases both the affinity and specificity of HOX proteins for DNA (4,5).
PBX binds to the DNA sequence 5′-TGAT-3′ (6,7) through an atypical three-amino acid loop extension (TALE)¹ homeodomain (8). The extra amino acids, which extend the loop between the first and second helices of the homeodomain (9), are necessary for cooperative DNA binding with HOX proteins (10,11), forming part of a hydrophobic binding pocket for the YPWM motif (12,13). The 15 amino acids immediately C-terminal to the PBX homeodomain are highly conserved among PBX (PBX1, -2, and -3) and related proteins (Drosophila Exd, Caenorhabditis elegans ceh-20) and increase the affinity of PBX for DNA and HOX proteins (11,14).
The minimal elements required for formation of PBX-HOX complexes are the PBX and HOX homeodomains and the HOX YPWM motif (15-19). The YPWM motif is found at a variable distance (5-50 amino acids) N-terminal to the homeodomain in paralog groups 1-8 of the mammalian Hox cluster (16). Tryptophan residues that mediate cooperative binding are also found in paralogs 9 and 10 (5,20). Peptides that encompass the HOX YPWM motif and flanking amino acids disrupt the formation of PBX-HOX heterodimers (11,16,21) and also stimulate binding of the PBX homeodomain to DNA. PBX-HOX heterodimers recognize the DNA sequence 5′-ATGATTNATNN-3′ (5,22). PBX binds to the TGAT half-site, and HOX binds to the TNAT half-site. The identities of the variable bases, N, are determined by the HOX protein (5,22,23).
We have solved the solution structure of the extended PBX homeodomain-DNA binary complex and studied the binding of a HOX-derived YPWM peptide to the complex. The structure provides insights into the role of the C-terminal extension and the PBX-YPWM motif interaction in increasing the affinity and specificity of HOX and PBX for DNA.

EXPERIMENTAL PROCEDURES
Sample Preparation-Recombinant PBX protein, chemically synthesized HOX-derived peptides, and DNA oligonucleotides (Fig. 1) were purified as described previously (9). Plasmids encoding the HOXA1 and HOXD4 peptides were constructed by polymerase chain reaction amplification using primers that incorporated stop codons and BamHI and MfeI sites to clone HOXA1 amino acids 203-221 and HOXD4 amino acids 129-147 into pGEX-6P-1 (Amersham Biosciences). The glutathione S-transferase fusion proteins were expressed in Escherichia coli BL21(DE3) cells grown on M9 minimal media supplemented with [15N]NH4Cl and [U-13C6]D-glucose as the sole nitrogen and carbon sources. The proteins were purified according to the manufacturer's protocol (24), and the peptides were cleaved from glutathione S-transferase with 50 units of PreScission Protease (Amersham Biosciences) at 4°C for 20 h. The resulting peptides were purified by reverse-phase chromatography on a C-18 column (Vydac), lyophilized, and resuspended in 20 mM sodium phosphate, pH 7.0, for NMR studies.
* This research was supported by Canadian Institutes of Health Research Grant MA14129 (to K. G.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ The abbreviations used are: TALE, three-amino acid loop extension; NOE, nuclear Overhauser effect; NOESY, nuclear Overhauser effect spectroscopy; RDC, residual dipolar coupling; ROESY, rotating frame Overhauser enhancement spectroscopy; ROEs, rotating frame Overhauser enhancements; HSQC, heteronuclear single-quantum correlation; HMQC, heteronuclear multiple quantum coherence; DMPC, 1,2-dimyristoyl-sn-glycero-3-phosphocholine; DHPC, 1,2-dihexanoyl-sn-glycero-3-phosphocholine.
NMR Spectroscopy-NMR experiments were recorded on Bruker DRX 500, Varian INOVA 750, and Varian INOVA 800 spectrometers equipped with pulsed-field gradient probes. Proton chemical shifts were referenced to internal 2,2-dimethyl-2-silapentanesulfonic acid at 0 ppm. 15N and 13C chemical shifts were referenced to the proton spectrum using the ratio of gyromagnetic moments (γN/γH = 0.10132905, γC/γH = 0.25144952). Data were processed using the Gifa program, versions 4.2 and 4.3 (25), and peak picking and assignment were carried out with XEASY (26). Resonance assignments and HN-Hα coupling constants for the PBX homeodomain and the HOXA1 peptide were obtained by standard methods. The methyl groups of valine and leucine side chains were assigned stereospecifically from a 13C-HSQC of a 10% 13C-labeled sample. DNA assignments were made from a two-dimensional 1,2-13C-filtered NOESY. Distance restraints for the binary complex were derived from a 15N-edited HSQC-NOESY, a 13C-edited HSQC-NOESY, a two-dimensional 1,2-13C-filtered NOESY, and a two-dimensional 2-13C-filtered NOESY. All NOESY spectra were recorded at 30°C with mixing times of 150 ms. Intermolecular NOEs in the ternary complex were obtained from a 15N-HSQC NOESY, a 13C-HSQC NOESY, and a two-dimensional transferred NOE experiment recorded for a sample of 4.3 mM chemically synthesized HOXA1 with 5% (mol:mol) PBX 1-78-DNA complex.
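The indirect referencing of the 15N and 13C dimensions can be illustrated with a minimal sketch. It assumes only the two frequency ratios quoted above; the function names and the 500 MHz proton frequency in the usage note are hypothetical and are not taken from the processing software used in this work.

```python
# Frequency ratios quoted in the text (heteronucleus 0 ppm / 1H 0 ppm).
RATIO_15N = 0.10132905
RATIO_13C = 0.25144952


def zero_ppm_frequency(h1_zero_mhz: float, ratio: float) -> float:
    """Return the 0 ppm frequency (MHz) of a heteronucleus.

    h1_zero_mhz is the absolute 1H frequency of the internal reference
    at 0 ppm; ratio is the heteronucleus/1H frequency ratio.
    """
    return h1_zero_mhz * ratio


def chemical_shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Convert an absolute resonance frequency (MHz) to a shift in ppm."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6
```

For a hypothetical proton reference frequency of exactly 500 MHz, `zero_ppm_frequency(500.0, RATIO_15N)` places 0 ppm in the 15N dimension near 50.66 MHz, and the same call with `RATIO_13C` places 0 ppm in the 13C dimension near 125.72 MHz.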
Residual Dipolar Coupling (RDC) Data-The 1H-15N couplings were measured for the PBX-DNA complex in isotropic solution and in two aligned media: 10 mg/ml filamentous phage at 30°C, and 5 mg/ml q = 3.0 DMPC/DHPC bicelles at 37°C. Initial values for Da and R were estimated from the RDC powder pattern (27). Final values of 13.5 and 0.6 in bicelles and 15.5 and 0.55 in phage for Da and R, respectively, were determined using a variational method (28). 1H-13C residual couplings were measured by taking twice the difference in peak position between a t1-coupled 13C-HSQC and a 13C-HMQC. DNA RDCs were recorded in the presence and absence of DMPC/DHPC bicelles in 100% D2O.
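The arithmetic behind the 1H-13C measurement can be sketched as follows. This is an illustration under stated assumptions, not the analysis script used here: peak positions are taken to be in Hz along the 13C dimension, the total splitting (J + D) is assumed to stay positive, and the function names and numbers in the usage note are hypothetical.

```python
def splitting_from_single_component(component_hz: float, decoupled_hz: float) -> float:
    """Return the one-bond 1H-13C splitting (Hz).

    component_hz: position of the single observed doublet component
                  in the t1-coupled 13C-HSQC.
    decoupled_hz: position of the same carbon in the 13C-HMQC, which
                  reports the 1H-decoupled chemical shift.
    The doublet is symmetric about the decoupled position, so the full
    splitting is twice the offset of one component.
    """
    return 2.0 * abs(component_hz - decoupled_hz)


def residual_dipolar_coupling(aligned_splitting_hz: float, isotropic_splitting_hz: float) -> float:
    """Return the RDC (Hz) as the aligned (J + D) minus the isotropic (J) splitting."""
    return aligned_splitting_hz - isotropic_splitting_hz
```

With hypothetical peak positions, a component found 86 Hz from the decoupled position in bicelles and 80 Hz from it in isotropic solution gives splittings of 172 and 160 Hz and therefore a residual coupling of 12 Hz.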
Structure Calculation-One hundred structures of the PBX-DNA complex were generated on the basis of the restraints listed in Table I. The 20 lowest energy structures were chosen to represent the solution structure of the binary complex. A total of 1180 NOEs were assigned manually, and eight iterations of ARIA (29) were used to obtain additional assignments. The starting conformation for the calculation consisted of the free PBX homeodomain (9) with the N-terminal arm and C-terminal extension modeled as extended strands and placed next to B-form DNA. Initially, Crystallography and NMR System (CNS; Yale University) was used to generate 100 structures with the DNA duplex frozen. The protein and DNA were then submitted to a second round of dynamical annealing using Cartesian dynamics, starting from the structures generated in the first round, with the dummy coordinate system for the RDCs fixed and with NOE and RDC constraints for the DNA added.

RESULTS
The Extended PBX Homeodomain Binds to DNA, Forming a Fourth α-Helix-We have determined the structure of the extended PBX homeodomain (PBX 1-78) bound to a DNA duplex containing the PBX recognition site 3′-ATGAT-5′ (Fig. 1). An ensemble of 20 structures, with a backbone root-mean-square deviation of 0.54 Å (Table I), was chosen to represent the structure of the complex in solution. The extended PBX homeodomain consists of four α-helices and an N-terminal arm (Fig. 2). The first α-helix runs from Lys-10 to Phe-20, the second from Glu-28 to Ser-38, the third from Val-42 to Lys-58, and the fourth from Ile-60 to Ala-72. The fourth α-helix forms on binding of the homeodomain to the DNA duplex and folds across the homeodomain at an angle of approximately 65° to helix three, crossing helix one.
FIG. 1. Sequences of protein, DNA, and peptide constructs. a, sequence of the PBX homeodomain. The standard homeodomain numbering is shown above the sequence and the numeric position in the full-length protein is shown below the sequence. The extra three amino acids are labeled a, b, and c, and the conserved residues of the C-terminal extension are underlined. b, nucleic acid sequence. The PBX recognition site is in bold. c, HOX-derived peptides. The YPWM motif is indicated by bold type. The numbering scheme used in this study is shown above the amino acid sequences, and the position relative to the respective homeodomains is shown below the sequences.
A number of intermolecular NOEs are observed that position PBX on the DNA duplex. The N-terminal arm lies in the minor groove of the DNA. Many NOEs are observed between its arginine and lysine side chains and the sugar protons of −G4, −T5, −A6 and +A8, as well as the H2s of −A6 and +A8. Particularly strong NOEs are observed between the δ and ε protons of Arg-5 and the H2 of −A6. Both Arg-5 and the TA base pair (TNAT) are highly conserved in all homeodomains and recognition sites (1). The PBX three-amino acid extension is accommodated in loop 1 in such a way that a conserved Tyr-25-phosphate backbone contact is preserved. The third α-helix sits in the major groove of the DNA. The methyl groups of isoleucine 54 in the third helix show strong NOEs to the sugars and bases of −A9, −A10, and −C11. Residual dipolar couplings were used in the refinement of the structure to compensate for the lack of long range NOEs in the nucleic acid.
RDCs-RDCs in the binary complex were measured in two aligned media, lipid bicelles and filamentous phage. The structure was refined using the RDCs measured in bicelle media. Sixty 1H-15N residual dipolar couplings were measured for the backbone amides of the PBX homeodomain aligned in lipid bicelles. The distribution of positive and negative RDCs reflects the orientation of the four α-helices (Fig. 3). The PBX-DNA complex aligns with the axis of the DNA duplex; the first three α-helices are roughly perpendicular to the magnetic field, and the fourth α-helix is parallel to the magnetic field. RDCs were also measured for PBX in the ternary PBX-DNA-HOX complex aligned in filamentous phage (data not shown). The orientation of both the binary and ternary complexes with respect to the external magnetic field was very similar.
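The link between bond-vector orientation and the sign of an RDC can be made concrete with the standard expression for a rhombic alignment tensor, D(θ, φ) = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ]. The sketch below is illustrative only. It assumes that θ and φ are the polar angles of the bond vector in the principal axis frame of the alignment tensor and that the bicelle values of Da and R reported under "Experimental Procedures" are in Hz. Sign conventions differ between programs and nuclei, and the function name is hypothetical.

```python
import math


def rdc_hz(theta_deg: float, phi_deg: float, da: float, r: float) -> float:
    """Return the RDC for a bond vector at polar angles (theta, phi).

    Angles are in degrees, measured in the principal axis frame of the
    alignment tensor. da is the axial component and r the rhombicity.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * r * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return da * (axial + rhombic)
```

With Da = 13.5 and R = 0.6, a bond vector along the principal z axis gives the largest coupling (2Da), whereas vectors in the perpendicular plane give couplings of opposite sign. Under this convention, amide bonds in a helix lying along z and amide bonds in helices lying across z fall at opposite ends of the RDC distribution, which is the pattern described for helix four versus helices one to three.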
Twenty-eight aromatic and H1′-C1′ residual dipolar couplings in the DNA were measured at natural 13C abundance. Interference between chemical shift anisotropy and dipolar relaxation mechanisms in the binary complex resulted in a TROSY-like effect (30) in the proton-coupled 13C-HSQC. The effect was most pronounced for the aromatic and C1′-H1′ cross-peaks, for which only the slowly relaxing component of each 13C doublet was observed. The large chemical shift anisotropy of aromatic carbons (>100 ppm) results in favorable conditions for implementation of TROSY-type experiments in the assignment of nucleic acids and aromatic residues of proteins (31-33). The transverse relaxation times of multiple quantum coherences are longer than those of the corresponding single quantum coherences. This phenomenon has also been exploited to improve sensitivity in NMR experiments on large biomolecules (33-35). A 13C-HMQC spectrum was acquired to measure 1H-13C splittings. This method showed the 1H-decoupled chemical shifts and hence allowed calculation of the 1H-13C couplings with only a single component of the doublet (Fig. 3). The 13C-HMQC experiment was more sensitive than a 1H-decoupled 13C-HSQC and yielded a more complete set of residual dipolar couplings (data not shown). Residual dipolar couplings for both the DNA and the PBX homeodomain were well satisfied in the structure calculation (Fig. 3e).
HOXA1 and HOXD4-derived Peptides Are Structured in Solution-HOX-derived YPWM peptides were studied both free and bound to the PBX-DNA complex. The peptides encompass the YPWM motif and flanking residues of HOXA1 and HOXD4 (Fig. 1). ROESY spectra were recorded at 15°C and pH 5.0 for the free peptides. Medium range ROEs were observed between the aromatic ring of residue 4 and methionine 7 in both peptides (Fig. 4). ROEs between the methyl groups of Val-9 and the aromatic rings of Phe-4 and Trp-6 are observed in HOXA1, and the side chains of Val-2 and Tyr-4 of HOXD4 are close together. At a higher temperature and pH value (30°C, pH 6.8), the amide cross-peaks in the 15N-HSQC are well dispersed, indicating that the structure is retained under these conditions (Fig. 5a).
The HOXA1 Peptide Binds to a Preformed Hydrophobic Pocket-To discriminate between signals that arise from PBX and HOXA1, samples were prepared in which either the peptide or the homeodomain was 15N- or 15N- and 13C-labeled. Formation of the ternary complex was followed by the recording of a series of 15N-HSQC spectra of the labeled polypeptide during titration with the unlabeled component. The amide cross-peaks for residues in PBX loop one and the C-terminal half of helix three broaden and shift as HOXA1 is added to the PBX-DNA complex. Residues in the C-terminal extension experience smaller changes in chemical shift (9). Amide cross-peaks broaden, and significant changes in chemical shift are observed for residues 4-10 of HOXA1 when the PBX-DNA complex is titrated into the peptide (Fig. 5). 1H-15N heteronuclear NOEs are positive for amino acids 3-12, indicating that they are structured (Fig. 5c). In contrast to the binding of PBX to DNA, which is in slow exchange on the NMR time scale, the HOXA1 peptide binds to the PBX-DNA complex in an intermediate exchange regime. The low affinity of a truncated HOXA1 peptide (residues 1-12 of the HOXA1 peptide) for the PBX-DNA complex allowed the use of transferred NOEs to study its bound conformation. A sample in which the peptide was in 20-fold molar excess over the binary complex was prepared. Under these conditions information about the bound state of the peptide is transferred to the free state and observed. The proton resonances of the side chains of Phe-4, Trp-6, and Lys-8 were broadened compared with the free peptide (Fig. 4). These side chains are involved in binding to PBX. More intramolecular NOEs are observed in a transferred NOESY spectrum than in the ROESY spectrum of the free HOXA1 peptide, indicating that it becomes more structured on binding to the PBX-DNA complex.
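The sample composition used for the transferred NOE experiment limits how much of the peptide can be bound at any instant. The following sketch solves the standard 1:1 binding equilibrium for the bound fraction of peptide. The concentrations follow the 4.3 mM peptide and 5% (mol:mol) complex given under "Experimental Procedures"; the dissociation constant is a hypothetical value chosen only for illustration, because no Kd is reported here, and the function name is likewise an assumption.

```python
import math


def bound_peptide_fraction(complex_total_mm: float, peptide_total_mm: float, kd_mm: float) -> float:
    """Return the fraction of peptide bound in a 1:1 equilibrium.

    Solves [PL]^2 - (P + L + Kd)[PL] + P*L = 0 for the physically
    meaningful (smaller) root, where P is the total PBX-DNA complex
    and L is the total peptide, all in mM.
    """
    b = complex_total_mm + peptide_total_mm + kd_mm
    bound = (b - math.sqrt(b * b - 4.0 * complex_total_mm * peptide_total_mm)) / 2.0
    return bound / peptide_total_mm
```

With 0.215 mM binary complex, 4.3 mM peptide, and a hypothetical Kd of 0.1 mM, just under 5% of the peptide is bound. The 20-fold excess caps the bound fraction at 5% regardless of affinity, so the observed spectrum is dominated by free peptide that carries NOE information from its transient bound state.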
Intermolecular NOEs between the aromatic rings of Phe-4 and Trp-6 and the methyls of Leu-23a, and between Met-7 and Ile-60, place the YPWM motif in a pocket formed by loop one and helix three, as expected from the changes in PBX chemical shifts. 15N- and 13C-HSQC-NOESY spectra of PBX in the binary and ternary complexes are very similar. Additional NOEs are observed to the backbone amide of Ser-23b, which is solvent-exposed in the binary complex. Some chemical shifts change for the amino acids around the hydrophobic binding pocket, although the NOE patterns are the same. The imino protons of base pairs 7 and 8 shift slightly when HOXA1 is titrated into the PBX-DNA complex; otherwise the DNA spectrum is unchanged. The HOXA1 peptide binds to a preformed hydrophobic pocket bordered by the TALE and helices three and four of PBX.

DISCUSSION
Our determination of the binary PBX-DNA structure demonstrates that the PBX homeodomain and the HOX YPWM motif interact via a "lock and key" mechanism. On binding to DNA, the third α-helix of the PBX homeodomain lengthens, and contact between Tyr-25 and the DNA duplex brings the loop between the first two helices closer to the third helix. This creates a hydrophobic pocket bordered by the TALE and the C terminus of the third α-helix. The fourth α-helix, which forms on binding to DNA, contributes an additional side to the pocket. The HOX YPWM motif, which is partially folded by hydrophobic interactions, inserts into this pocket. An increase in the number of NOEs observed for the HOXA1 peptide indicates it becomes more structured when it binds to PBX. No changes are observed in the conformation of the PBX homeodomain on formation of the ternary complex.
The lengthening of the third α-helix when PBX binds to DNA induces the formation of a fourth α-helix in the C-terminal extension (9,36). This α-helix folds over the homeodomain and contacts helix one, thereby reducing the mobility of the third helix. This accounts for the increased DNA-binding affinity of the extended PBX homeodomain, PBX 1-78, in comparison with the homeodomain, PBX 1-59 (11,14,15). No additional DNA contacts are formed by the C-terminal extension. PBX 1-78 also demonstrates higher affinity for HOX proteins and peptides (11,14) because of stabilization of the hydrophobic binding pocket and formation of additional PBX-HOX contacts in the C-terminal extension. Conversely, HOX-derived peptides stimulate binding of the PBX homeodomain to DNA (11,16). The third α-helix is involved in both DNA and YPWM binding interactions (9,12,13). The chemical shifts of the imino protons of base pairs 7 and 8 change slightly when the HOXA1 peptide binds to the PBX-DNA complex. No direct contacts occur between the peptide and the DNA, which suggests that the change in the environment of the base pairs is a result of the repositioning of PBX side chains. Proton resonances in the base and sugar moieties of +G7 and +A8 are among the broadest in the binary complex. The disordered side chains of Arg-53 and Arg-55 may form more stable contacts with DNA in the presence of the HOX YPWM motif.
The PBX-DNA structure was compared with the crystal structure of a PBX-DNA-HOXB1 complex (12). The conformation of the PBX homeodomain is very similar in the NMR and crystal structures (average root-mean-square deviation 1.1 Å for the backbone atoms of the three core α-helices). Differences are seen in the positions of the C-terminal half of helix three, helix four, and the extended loop. The crystal structure includes the PBX homeodomain, the HOXB1 homeodomain and YPWM motif, and a 20-base pair DNA duplex. Each homeodomain introduces a slight bend in the DNA. In comparison, a single PBX homeodomain is bound to a 14-mer DNA duplex in the NMR structure. Helix three of the PBX homeodomain lies in the major groove of the DNA. In the PBX-HOXB1 heterodimer, the N-terminal arm of the HOX homeodomain contacts the same bases from the minor groove. The slightly different conformation of the DNA results in the differences in the positions of the third and fourth helices between the NMR and crystal structures. The PBX-HOXB1 recognition site is 5′-TGATTGAT-3′. A single TGAT site was used in the NMR structure determination to avoid homodimer formation. In the NMR structure tyrosine 25 in the extended loop thus contacts a guanine close to the end of the DNA duplex rather than the thymine present in the crystal structure. Combined with the bend in the DNA introduced by the HOXB1 homeodomain in the crystal structure, this results in the shift in the position of the loop between helices one and two. No changes were observed in the conformation of the PBX-DNA complex on binding of a HOXA1-derived YPWM peptide. The differences in PBX in the NMR and crystal structures are a result of the sandwiching of the DNA between two homeodomains.
ROESY spectra of the HOXA1 and HOXD4 peptides indicate that there is some structure in the absence of PBX (Fig. 4a). Hydrophobic interactions between the aromatic ring of the tyrosine (or phenylalanine) and the conserved methionine fold the peptide. Interactions between conserved aromatic and hydrophobic residues flanking the core YPWM motif also contribute. The HOXA1 peptide becomes more structured on binding to the PBX homeodomain. Chemical shift dispersion increases in 15N-HSQC spectra and more intramolecular NOEs are observed. NMR studies at low temperature of two six-amino acid peptides, TFDWMK and LFPWMR (i.e. residues 3-8 of our HOX peptides), have also demonstrated that the YPWM motif is partially prefolded in solution (37). The conformation of the PBX-bound HOXA1 peptide is similar to the corresponding region of HOXB1 in the crystal structure of the PBX-HOXB1 heterodimer (12). The HOX YPWM motif binds to PBX through hydrophobic interactions. Contacts are observed between the conserved tryptophan ring and the side chain of Leu-23a in the extended loop in both the crystal structure and NOESY spectra of the ternary complex. Intermolecular NOEs also place the methionine and phenylalanine side chains within the hydrophobic binding pocket as in the PBX-HOXB1 heterodimer. However, the relatively weak binding of the HOXA1 peptide to the PBX-DNA complex and the broad NMR line widths indicate that the peptide experiences some conformational exchange that is not seen in the crystal structure. In the crystal, the intact HOXB1 homeodomain and linker region stabilize the HOX YPWM motif-PBX interaction.
The structures of the PBX-DNA and PBX-DNA-HOXB1 complexes show how PBX-HOX heterodimers achieve increased DNA binding affinity and different sequence selectivity (4,5) and demonstrate the importance of DNA binding by PBX for heterodimer formation. The amino acid sequences of the HOX N-terminal arm and the YPWM motif are the most important determinants of binding site preference for PBX-HOX complexes (5,23). The insertion of the HOX YPWM "key" into the hydrophobic "lock" on the surface of the PBX homeodomain tethers the HOX homeodomain to the DNA duplex, resulting in the formation of more stable contacts between the HOX N-terminal arm and DNA. Nevertheless, the formation of PBX-HOX heterodimers is not the sole determinant of the very precise in vivo functions of the different HOX proteins. PBX-HOX binding sites have been identified in several gene regulatory elements, including the Hoxb1 and Hoxb2 r4 enhancers (38-42). Adjacent to these sites are sequences recognized by PREP1 and MEIS homeodomain proteins that form trimeric complexes with HOX and PBX (42-46). Thus, the presence of a third partner may be expected to further modify protein-protein and subtle protein-DNA interactions in the DNA-bound homeodomains.