Stereochemical Determinants of C-terminal Specificity in PDZ Peptide-binding Domains

Background: PDZ-peptide binding specificities establish a complex network of protein-protein interactions in the cell.
Results: Crystal structures of multiple PDZ-peptide complexes reveal distinct mechanisms for accommodating C-terminal ligand side chains.
Conclusion: A residue in the PDZ "XΦ1GΦ2" signature sequence co-determines peptide carboxylate and C-terminal side-chain binding.
Significance: Understanding the stereochemical determinants of peptide binding leads to an improved ability to predict PDZ interaction specificity.

PDZ (PSD-95/Dlg/ZO-1) binding domains often serve as cellular traffic engineers, controlling the localization and activity of a wide variety of binding partners. As a result, they play important roles in both physiological and pathological processes. However, PDZ binding specificities overlap, allowing multiple PDZ proteins to mediate distinct effects on shared binding partners. For example, several PDZ domains bind the cystic fibrosis (CF) transmembrane conductance regulator (CFTR), an epithelial ion channel mutated in CF. Among these binding partners, the CFTR-associated ligand (CAL) facilitates post-maturational degradation of the channel and is thus a potential therapeutic target. Using iterative optimization, we previously developed a selective CAL inhibitor peptide (iCAL36). Here, we investigate the stereochemical basis of iCAL36 specificity. The crystal structure of iCAL36 in complex with the CAL PDZ domain reveals stereochemical interactions distributed along the peptide-binding cleft, despite the apparent degeneracy of the CAL binding motif. A critical selectivity determinant that distinguishes CAL from other CFTR-binding PDZ domains is the accommodation of an isoleucine residue at the C-terminal position (P0), a characteristic shared with Tax-interacting protein-1. Comparison of the structures of these two PDZ domains in complex with ligands containing P0 Leu or Ile residues reveals two distinct modes of accommodation for β-branched C-terminal side chains. Access to each mode is controlled by distinct residues in the carboxylate-binding loop. These studies provide new insights into the primary sequence determinants of binding motifs, which in turn control the scope and evolution of PDZ interactomes.

Within each cell, diverse arrays of peptide recognition domains (PRDs) contain binding sites that engage cognate peptide ligands, forming interaction networks that regulate protein trafficking, localization, and activity. The specificity of each PRD is reflected in a linear binding motif that prescribes both the identity and the spacing of the amino acids required for the interaction. In principle, such preferences are encoded by the PRD primary sequence, which determines the stereochemistry of the peptide-binding cleft and thus favors interactions with partners that share a complementary motif. Deciphering this "sequence-motif" code would enable us to identify potential interactions and to design selective inhibitor sequences in silico (e.g. Ref. 1). Unfortunately, even for highly conserved interactions, the connections between PRD sequence determinants and specific target-motif preferences often remain obscure.
This challenge is particularly acute for interactions mediated by PDZ domains, which were first characterized in the proteins PSD-95, Dlg, and ZO-1. PDZ domains are now recognized as one of the largest families of PRDs and modulate the activity of a wide variety of receptors and enzymes. Multidomain PDZ proteins also form scaffolds that regulate the assembly and composition of supramolecular complexes (2). As a result, PDZ domains play key roles in a number of human diseases, including cancer (3-5), epilepsy and pain (6, 7), and cystic fibrosis (8, 9).
Although experimental studies have characterized target preferences for a large number of PDZ domains (10, 11), for many other domains binding information is absent or available only for a limited subset of ligands. Furthermore, as for other PRDs, and despite an abundance of work investigating recognition patterns and motifs, it remains difficult to predict the binding motif of a given PDZ domain from its protein sequence or binding-site stereochemistry.
* This work was supported, in whole or in part, by National Institutes of Health
In part, this is due to the surface geometry of PDZ-peptide interactions. All PDZ domains share a conserved tertiary structure, composed of two α-helices and an antiparallel β-sheet. Canonical target peptides bind in a shallow cleft between the β2 strand and the α2 helix, forming an antiparallel β-sheet hydrogen-bonding network and allowing the ligand C-terminal residue to interact with the carboxylate-binding loop of the PDZ domain (12). This loop contains a conserved XΦ1GΦ2 sequence (Φ = hydrophobic) that mediates main-chain hydrogen bonding to the peptide carboxylate (13). As a result, most PDZ proteins bind C-terminal target sequences. In addition, this loop fixes the orientation of the C-terminal (P0) side chain so that it points into a hydrophobic pocket, imposing a primary specificity filter for peptides carrying a hydrophobic residue at the P0 position (14).
PDZ binding specificity is also strongly influenced by the residue two positions upstream of the C-terminal residue (P−2). For class I PDZ domains, an essential hydrogen bond is formed between the Oγ atom of the P−2 residue and the first residue of the α2 helix, α2-1 (His), imposing a requirement for Ser or Thr (15). Subsequent work has revealed additional preferences for many domains, leading to more differentiated motifs (11, 16-18). Nevertheless, the motifs overlap, enabling multiple PDZ proteins to bind a common target, and several domains retain highly degenerate motifs. As a result, selective targeting is a challenge.
We previously reported a peptide inhibitor of the interaction between the cystic fibrosis transmembrane conductance regulator (CFTR) and the PDZ domain of the CFTR-associated ligand (CAL). The resulting inhibitor of CAL, iCAL36, is a decameric peptide with the sequence ANSRWPTSII (8, 9). Although the Na+/H+ exchanger regulatory factor (NHERF) proteins also bind CFTR through their PDZ domains, they do not interact with iCAL36 (8, 9).
One of the earliest observations in the development of iCAL36 was that the CAL and NHERF PDZ domains exhibit different secondary preferences at the P0 position. Although each prefers a Leu residue at this position, CAL also favors either Val or Ile, whereas the NHERF domains can accommodate Phe instead.
To understand the stereochemical basis for this difference, we determined high-resolution structures of the CAL PDZ domain (CALP) in complex with either iCAL36 or a variant bearing a C-terminal Leu (iCAL36L). In addition, we co-crystallized both peptides with Tax-interacting protein-1 (TIP-1), another PDZ protein that accommodates β-branched C-terminal side chains (19) and binds iCAL36. By comparing these complexes with previously studied PDZ domains, we have identified alternative mechanisms of P0 side-chain accommodation. The resulting insights expand our understanding of sequence-motif relationships in this important class of PRDs.
Structure Determination of CALP·iCAL36. CALP·iCAL36 complexes were crystallized, and data were collected as previously described (20). Phases were determined by single isomorphous replacement with anomalous scattering (SIRAS), using a CALP·iCAL36 crystal (reservoir solution: 200 mM NaCl, 100 mM Tris, pH 7.5, 32% (w/v) PEG 8000) that had been soaked for 2 min in reservoir solution containing 250 mM 5-amino-2,4,6-triiodoisophthalic acid (I3C; lithium salt) and then transferred to cryoprotectant buffer (200 mM NaCl, 100 mM Tris, pH 7.5, 32% (w/v) PEG 8000, 20% (v/v) glycerol) prior to flash cooling in liquid nitrogen. For the anomalous data, crystal diffraction was evaluated at 100 K on beamline X6A at the National Synchrotron Light Source (NSLS) at Brookhaven National Laboratory. Oscillation data were collected at λ = 1.7463 Å over a 720° range, using 0.25° frames and an exposure time of 3 s per frame. The two data sets were scaled using SADABS (22). The data were prepared using XPREP (23), and heavy atoms were located using SHELXD (24). Phase extension and density modification were performed with SHELXE, including its autotracing algorithm (25). Further density modification, including noncrystallographic symmetry averaging, was performed using the CCP4 suite (26) and the in-house masking program ENVAT (27). Model building and refinement were performed using PHENIX (28, 29), with manual rebuilding in COOT (30) (Table 1). After refinement, model geometry was assessed using MOLPROBITY (31) and the PDB validation server (32-34). All residues lie in the favored (95.5%) or allowed (4.5%) regions of the Ramachandran plot. Coordinates and structure factors were deposited in the Protein Data Bank with accession code 4E34.
To locate the I3C compound in the electron density map of the heavy-atom substructure, we exploited the defined geometry of the iodine peaks (35). To visualize the interaction of I3C with the protein-peptide complex, a molecular replacement (MR) search of the anomalous data set was performed in PHENIX (28, 29), using a single protomer (A) of the refined CALP·iCAL36 structure as the search model. Model building, refinement, and assessment were performed as described above (Table 1). Previously calculated restraints, derived from a small-molecule crystal structure, were used in refinement of the I3C molecule (36). Because of the low occupancy of the I3C compound, density is seen only for the three iodine atoms, whose occupancies were set to 0.3. The final B-factors for the I3C iodine atoms were 67, 56, and 27 Å2.
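The idea of using the fixed iodine geometry to pick out I3C among substructure peaks can be illustrated with a short script. This is a minimal sketch, not the procedure of Ref. 35: it assumes peak positions are already available as Cartesian coordinates in Å, treats the iodine-iodine separation within I3C as roughly 6 Å (an approximate value assumed here for illustration), and ignores crystallographic symmetry, so a triangle spanning symmetry mates would be missed. The function name `find_i3c_triangles` and the tolerance are hypothetical.

```python
from itertools import combinations
from math import dist

# Approximate I-I separation within one I3C molecule (assumed value, Angstrom).
ASSUMED_I_I_DISTANCE = 6.0
# Allowed deviation of each triangle side from the assumed distance (Angstrom).
ASSUMED_TOLERANCE = 0.5


def find_i3c_triangles(peaks, target=ASSUMED_I_I_DISTANCE, tol=ASSUMED_TOLERANCE):
    """Return index triples of peaks forming a near-equilateral triangle.

    peaks: list of (x, y, z) Cartesian coordinates in Angstrom.
    A triple is accepted when all three pairwise distances lie within
    `tol` of `target`. Crystal symmetry is deliberately ignored here.
    """
    hits = []
    for i, j, k in combinations(range(len(peaks)), 3):
        sides = (
            dist(peaks[i], peaks[j]),
            dist(peaks[j], peaks[k]),
            dist(peaks[i], peaks[k]),
        )
        if all(abs(side - target) <= tol for side in sides):
            hits.append((i, j, k))
    return hits


if __name__ == "__main__":
    h = 3 * 3 ** 0.5  # height of an equilateral triangle with 6 A sides
    example_peaks = [(0, 0, 0), (20, 20, 20), (6, 0, 0), (3, h, 0), (0, 0, 10)]
    print(find_i3c_triangles(example_peaks))  # [(0, 2, 3)]
```

The exhaustive search over all triples scales cubically with the number of peaks, which is acceptable for the handful of strong sites a substructure search typically returns.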
Structure Determination of CALP·iCAL36L. The CALP·iCAL36L complex was crystallized by vapor diffusion in hanging-drop format under conditions similar to those for CALP·iCAL36 (20). The iCAL36L peptide was synthesized and purified by the Tufts Peptide Core Facility. CALP was dialyzed into crystallization buffer (10 mM HEPES, pH 7.4, 25 mM NaCl) and adjusted to a final protein concentration of 5.5 mg ml−1, and iCAL36L was added to a final concentration of 1 mM. The CALP·iCAL36L solution was mixed 1:1 with reservoir solution (total volume 4 µl), and the drop was allowed to equilibrate by vapor diffusion at 291 K. As with the CALP·iCAL36 complex, crystals appeared in 2-4 days and continued to grow for up to 10 days (20). The crystal used for diffraction was obtained using a reservoir solution of 150 mM NaCl, 100 mM Tris, pH 7.5, 33% (w/v) PEG 8000.
Prior to data collection, the crystal was transferred into cryoprotectant buffer (150 mM NaCl, 100 mM Tris, pH 7.5, 33% (w/v) PEG 8000, 20% (v/v) glycerol) and flash cooled by plunging into a liquid nitrogen bath. Crystal diffraction was evaluated at 100 K on beamline X6A (NSLS). Oscillation data were collected at λ = 1.0000 Å over a 180° range, using 0.3° frames and an exposure time of 1 s per frame. Diffraction images were processed using the XDS package (37) (Table 1). An MR search was performed with PHENIX (28, 29), using a CALP·iCAL36 protomer (A), excluding peptide, as the search model. To minimize model bias in either the peptide or the Φ1 and Φ2 residues, initial modeling was performed using a composite-omit strategy (38). As a further control for bias, the procedure was repeated using the other protomer (B) as the search model, and the solutions were compared to ensure that there were no significant differences. Model building and refinement were performed as described for the CALP·iCAL36 structure above. Final refinement statistics are shown in Table 1. Coordinates and structure factors were deposited in the Protein Data Bank with accession code 4E35.
Structure Determination of TIP-1·iCAL36L. Crystals of TIP-1·iCAL36L were obtained by the same protocol described above and elsewhere, using a solution of 5.5 mg ml−1 TIP-1 and 1 mM iCAL36L. The crystal used for data collection was obtained with a reservoir buffer of 250 mM KSCN, 100 mM MES, pH 6.0, 25% (w/v) PEG 3350.
The crystal was transferred into cryoprotectant buffer (250 mM KSCN, 100 mM MES, pH 6.0, 30% (w/v) PEG 400) prior to flash cooling in liquid nitrogen, and diffraction data were collected on beamline X6A (see above) at λ = 0.9795 Å over a 270° range, using 0.2° frames. Data were processed using the XDS package (37) (Table 1). MR was performed using PHENIX (28, 29), with a TIP-1 protomer (A) from the TIP-1·iCAL36 structure as the search model. Electron density for the P0 Leu residue confirmed the MR solution (data not shown). Model building, refinement, and assessment were performed as described above for the CALP·iCAL36 structure. Final refinement statistics are presented in Table 1. Coordinates and structure factors were deposited in the Protein Data Bank with accession code 4E3B.
FP Assays. Fluorescence polarization (FP) assays were performed as described previously (9, 21), using the following FP buffer: 150 mM NaCl, 0.02% (w/v) sodium azide, 50 mM sodium phosphate, pH 7.4, and 0.1 mM TCEP. The final protein concentration for competition binding experiments was 1.5 × Kd for CALP, TIP-1, N1P1, N1P2, N2P1, and N2P2. The following reporter peptides were used: . F* corresponds to a fluorescein group linked to the peptide N terminus via an aminohexanoic acid linker. The Kd values of the reporter peptides were calculated as previously described (21). Ki values were calculated using a SOLVER-based least-squares fitting algorithm in EXCEL (9, 21). Values for weakly interacting peptides, defined as Ki > 1000 µM, were calculated by comparison with modeled displacement isotherms (9).
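The logic of a competition FP fit can be sketched as follows. This is not the EXCEL SOLVER workbook used here; it is a minimal illustration assuming single-site competitive binding, a polarization signal that is a linear mix of free and bound reporter (no change in fluorescence brightness on binding), and a simple grid search over log10(Ki) swapped in for SOLVER. All function names and the reporter parameters in the example (`kd`, `t_tot`, `r_free`, `r_bound`) are hypothetical; only the 1.5 × Kd protein concentration comes from the text.

```python
import math


def free_protein(p_tot, t_tot, i_tot, kd, ki, iterations=60):
    """Solve the single-site competition mass balance for free protein P.

    p_tot = P + t_tot*P/(kd+P) + i_tot*P/(ki+P)
    The right-hand side increases monotonically with P, so bisection on
    [0, p_tot] converges to the unique root. Concentrations in uM.
    """
    lo, hi = 0.0, p_tot
    for _ in range(iterations):
        p = 0.5 * (lo + hi)
        total = p + t_tot * p / (kd + p) + i_tot * p / (ki + p)
        if total > p_tot:
            hi = p  # assumed free protein is too high
        else:
            lo = p
    return 0.5 * (lo + hi)


def predicted_signal(i_tot, ki, p_tot, t_tot, kd, r_free, r_bound):
    """Signal as a linear mix of free and bound reporter (assumption)."""
    p = free_protein(p_tot, t_tot, i_tot, kd, ki)
    fraction_bound = p / (kd + p)  # fraction of reporter bound to protein
    return r_free + (r_bound - r_free) * fraction_bound


def fit_ki(inhibitor_concs, signals, p_tot, t_tot, kd, r_free, r_bound,
           log10_min=-3.0, log10_max=4.0, steps=2001):
    """Least-squares Ki by grid search over log10(Ki), a stand-in for SOLVER."""
    best_ki, best_sse = None, math.inf
    for n in range(steps):
        ki = 10 ** (log10_min + (log10_max - log10_min) * n / (steps - 1))
        sse = sum(
            (predicted_signal(c, ki, p_tot, t_tot, kd, r_free, r_bound) - y) ** 2
            for c, y in zip(inhibitor_concs, signals)
        )
        if sse < best_sse:
            best_ki, best_sse = ki, sse
    return best_ki


if __name__ == "__main__":
    kd = 1.0          # hypothetical reporter Kd (uM)
    p_tot = 1.5 * kd  # protein at 1.5 x Kd, as in the text
    t_tot = 0.03      # hypothetical reporter concentration (uM)
    concs = [0, 1, 3, 10, 30, 100, 300, 1000]
    data = [predicted_signal(c, 24.0, p_tot, t_tot, kd, 0.05, 0.25) for c in concs]
    print(fit_ki(concs, data, p_tot, t_tot, kd, 0.05, 0.25))
```

Solving the full mass balance, rather than applying a Cheng-Prusoff-type correction, keeps the model valid when the protein concentration is comparable to Kd, as it is at 1.5 × Kd.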

RESULTS
Structure Determination of the CALP·iCAL36 Complex. As previously reported, although CALP·iCAL36 co-crystals diffract to 1.4-Å resolution, molecular replacement searches failed to yield a correct phase model (20). To obtain de novo phase information, we soaked CALP·iCAL36 crystals in the presence of I3C (35). This compound was recently developed as a tool to facilitate the incorporation of heavy atoms into protein crystals for phasing by anomalous scattering and/or isomorphous replacement, an attractive option because CALP contains no methionine residues for selenomethionine labeling. Soaking with I3C yielded a derivatized CALP·iCAL36 co-crystal that diffracted to 1.98-Å resolution and exhibited anomalous differences (SigANO = 2.0 in the 6.99-6.0 Å resolution shell) (37).
Initial phase determination by SIRAS using conventional protocols failed, yielding a figure of merit of 0.50 and a pseudo-free correlation coefficient of 53%. However, using the SHELXE autotracing function (25), we identified a solution with a figure of merit of 0.69 and a pseudo-free correlation coefficient of 72%. Phase quality was confirmed by the appearance of side chains that were absent from the polyalanine phasing model (Fig. 1A), as well as by clear electron density for the bound peptide (data not shown).
The asymmetric unit contains two CALP·iCAL36 complexes (protomers A and B), bridged by a single I3C molecule that forms "halogen bonds" (41) to the carbonyl oxygens of Ile315 in the A-protomer and His296 in the B-protomer (Fig. 1, B and C). Following iterative rounds of density modification, model building, and maximum likelihood refinement to 1.4-Å resolution, the final model exhibits excellent agreement with the experimental diffraction data, with Rwork and Rfree values of 0.174 and 0.197, respectively. It also conforms to standard peptide geometrical constraints (Table 1). Five solvent-exposed residues exhibit alternative side-chain conformations. The final model includes 87 residues of each CALP protomer, along with the corresponding decameric peptide ligands. Least squares superposition of the noncrystallographic symmetry-related complexes by main-chain atoms shows that they share a highly similar domain structure: 344 atoms align with a root mean square deviation (r.m.s. deviation) of 0.45 Å, and no individual atomic difference exceeds a threshold of 3 × r.m.s. deviation. The largest variation between the two chains is in the carboxylate-binding loop, at Gly300 and Ile301, the GΦ2 residues of the XΦ1GΦ2 sequence, with Cα offsets of 0.9 and 1.2 Å, respectively.
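The superposition statistics quoted above (an r.m.s. deviation over matched main-chain atoms, checked against a 3 × r.m.s. deviation outlier threshold) can be sketched as follows. This is an illustrative Kabsch least-squares fit written with NumPy, not the specific implementation used for this structure; the function names are hypothetical, and atom pairing between the two protomers is assumed to have been done beforehand.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares (Kabsch) fit of `mobile` onto `target`; both are (N, 3) arrays.

    Returns the transformed copy of `mobile`.
    """
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    a = mobile - mobile_centroid
    b = target - target_centroid
    u, _, vt = np.linalg.svd(a.T @ b)  # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return a @ rotation.T + target_centroid


def rmsd(x, y):
    """Root mean square deviation between matched (N, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((x - y) ** 2, axis=1))))


def superpose_and_flag(mobile, target, factor=3.0):
    """Superpose, then list atoms deviating by more than factor x overall r.m.s. deviation."""
    fitted = superpose(mobile, target)
    overall = rmsd(fitted, target)
    per_atom = np.linalg.norm(fitted - target, axis=1)
    outliers = np.flatnonzero(per_atom > factor * overall).tolist()
    return overall, outliers
```

In this scheme, an empty outlier list corresponds to the statement that no individual difference exceeds 3 × r.m.s. deviation; residues in the flagged list would be candidates for the kind of local variation seen at Gly300 and Ile301.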
CALP·iCAL36 Binding Stereochemistry. The resulting structure provides the first high-resolution view of CALP ligand-binding interactions. Consistent with previous NMR structural analyses of the domain (42, 43), both CALP protomers exhibit a standard PDZ tertiary structure (44), consisting of five β-strands (β1-β5) and two α-helices (α1-α2) (Fig. 1B). Comparison of the A- and B-protomers with known structures using the DALI distance-alignment matrix algorithm (45) confirmed that the 100 structures most similar to CALP are all PDZ domains. For both the A- and B-protomers, the most similar domain is the A-protomer of α-syntrophin (PDB code 1QAV), with Z-scores of 18.2 and 17.8, respectively. Superposition of 85 Cα atoms of either protomer onto a previously determined CALP NMR structure (42) also reveals close similarity, with equivalent secondary structure elements and an r.m.s. deviation of 1.4 Å. Main-chain offsets are distributed evenly throughout the entire protein, which may have confounded previous phase determination efforts by MR (20).
FIGURE 1. CALP·iCAL36 crystal structure. A, a stereo view of the SIRAS electron density map (blue mesh, contour level: 1σ) is shown with the SHELXE poly-Ala trace (orange carbons) and the final refined model (cyan carbons), clearly revealing the position of the His296 side chain. B, the I3C magic triangle (stick figure, green carbons) is bound between the A- and B-protomers (ribbon diagrams, cyan) of CALP·iCAL36. C, substructure phases reveal clear density (blue mesh; contour level: 2.5σ) with the expected interatomic distances (dashed lines) for the I3C heavy atoms that form halogen bonds with the carbonyl oxygens of His296 of the CALP B-protomer and Ile315 of the CALP A-protomer (dashed lines, distances labeled). D, residues P0 to P−5 of the peptide (stick figure, yellow carbons) interact with a shallow groove at the surface of the CALP domain (cyan). Some peptide residues in the range P−6 to P−9 form lattice contacts with a neighboring molecule (not shown). E, main-chain functional groups of the peptide (stick figure, yellow carbons) form hydrogen bonds (dashed lines) with the β2 strand of CALP, extending the antiparallel β-sheet (stick figure, cyan carbons), which also includes the adjacent β3 strand. Non-carbon atoms are colored by element (red = O, blue = N, purple = I).
In both CALP protomers, the P0 to P−5 residues bind in an extended peptide-binding groove that runs along one surface of the domain, where they engage in canonical class I interactions (Fig. 1D). Through main-chain hydrogen bonds to the CALP β2 strand, the peptide forms an additional strand of the central antiparallel β-sheet of the PDZ domain (Fig. 1E). The Ser residue at the P−2 position of iCAL36 forms a hydrogen bond with His349 of CALP, whereas the C-terminal carboxylate oxygen atoms interact with Leu299 and Gly300 in the CALP carboxylate-binding motif. Finally, the P0 side chain is buried in a pocket formed by Gly300 and Ile301, the GΦ2 pair of the carboxylate-binding motif, as well as Ile303 and Val353.
At position P−4, there is weak evidence for an alternative proline pucker in the B-protomer complex. However, when both conformers were modeled, the alternative conformation refined to an occupancy of ≤20%. Although proline pucker can influence main-chain orientation (46), in this case the effects on neighboring residues are on the scale of the maximum likelihood coordinate error (0.16 Å). Thus, our final model includes only the dominant P−4 conformation.
From the P−6 residue toward the peptide N terminus, the conformation of both peptides appears to be influenced by crystal lattice contacts, involving P−8 Asn and P−9 Ala in the A-protomer and P−6 Arg and P−8 Asn in the B-protomer. However, the P0 to P−5 positions of the two protomers are very similar (r.m.s. deviation <0.16 Å), and the stereochemical contacts of the peptide residues with the PDZ binding site are conserved.
Previous analyses indicate that up to seven ligand residues can interact with the PDZ domain, substantially extending classical motifs (11). Although the CAL binding motif shows clear preferences only at the P0 and P−2 positions, studies of CAL-binding sequences identified affinity contributions extending past the P−9 position (9). Our CALP·iCAL36 complex structure supports the idea that multiple non-motif positions along the binding cleft contribute to the interaction. For example, the Trp residue at P−5, the most N-terminal position not affected by lattice contacts, interacts with a mostly hydrophobic ledge on the CALP surface composed of His309, Gly310, and Val311 (Fig. 2A). In addition, the P−4 Pro residue interacts with a groove formed by CALP residues Gly305, His309, and His349 (Fig. 2B).
P0 Contributions to CAL/NHERF Selectivity. One of the earliest observations from peptide-array experiments was that CALP and N1P1 share a preference for a Leu residue at P0, but that only CALP binds peptides containing a P0 Ile. In a sequence derived from the C terminus of somatostatin receptor subtype 5 (SSR5), replacement of the wild-type P0 Leu with Ile weakened CALP affinity only 1.6-fold, but weakened N1P1 affinity 47-fold (9). To test whether these distinct P0 preferences depend on the overall peptide sequence, we measured binding affinities for two additional pairs of peptides, with each pair differing only at the C-terminal position. The sequence of iCAL36L (ANSRWPTSIL) corresponds to that of iCAL36 with a Leu substitution at P0. The sequence of CFTRI (TEEEVQDTRI) corresponds to the CFTR C terminus (TEEEVQDTRL) with an Ile substitution at P0. The results obtained with both pairs of sequences confirm the selectivity pattern seen with SSR5. For the CALP domain, FP competition assays show very similar binding to iCAL36L versus iCAL36, and also to CFTR versus CFTRI (Fig. 3A, Table 2). In contrast, for the NHERF1 and NHERF2 PDZ domains (N1P1, N1P2, N2P1, and N2P2), binding affinities are weakened when Ile is substituted for Leu at P0. Specifically, the Ki values are 20- to 35-fold weaker for CFTRI than for the Leu-bearing WT CFTR sequence (Table 2). Likewise, none of the NHERF domains exhibits binding to iCAL36, which carries a P0 Ile, but three of the four show detectable affinity for iCAL36L (Fig. 3B, Table 2).
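The fold differences in this section are ratios of Ki values measured in the same units, and for a comparison such as iCAL36L binding to N2P1 versus CALP (77 ± 2 µM versus 24 ± 2 µM) an uncertainty can be attached to the ratio. The sketch below assumes the two ± values are independent errors and uses first-order propagation of relative errors; the function name `fold_difference` is hypothetical and the calculation is illustrative, not part of the original analysis.

```python
import math


def fold_difference(ki_weaker, sd_weaker, ki_tighter, sd_tighter):
    """Return (ratio, sd) for ki_weaker / ki_tighter.

    Uncertainty uses first-order propagation, assuming the two errors are
    independent: (sd_r / r)**2 = (sd_a / a)**2 + (sd_b / b)**2.
    """
    ratio = ki_weaker / ki_tighter
    relative_error = math.hypot(sd_weaker / ki_weaker, sd_tighter / ki_tighter)
    return ratio, ratio * relative_error


if __name__ == "__main__":
    # iCAL36L: N2P1 (77 +/- 2 uM) versus CALP (24 +/- 2 uM), values from the text.
    r, sd = fold_difference(77.0, 2.0, 24.0, 2.0)
    print(f"{r:.1f} +/- {sd:.1f}-fold")  # 3.2 +/- 0.3-fold
```

Because relative errors add in quadrature, the less precise of the two Ki values (here CALP, at roughly 8% relative error) dominates the uncertainty of the ratio.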
Thus, in multiple sequence backgrounds, the NHERF PDZ domains exhibit a robust preference for Leu over Ile at P0, whereas CALP appears largely indifferent. These differing P0 preferences are an essential component of the selectivity of iCAL36 for CALP versus the NHERF PDZ domains. iCAL36 binds CALP >170-fold more tightly than it binds any of the four NHERF domains (9). However, if the P0 Ile is replaced with Leu, this difference is largely erased: iCAL36L binds N2P1 (Ki = 77 ± 2 µM) only ~3-fold more weakly than it binds CALP (Ki = 24 ± 2 µM).
P0 Specificity Determinants in the Carboxylate-binding Loop. Given this central contribution to target selectivity, we wished to understand the stereochemical differences that enable CALP, but not the NHERF domains, to accommodate a P0 Ile. In the N1P1·CFTR structure (PDB code 1I92, Ref. 47), in silico replacement of the C-terminal Leu with an Ile side chain leads to steric clashes of the Ile β-branch with the narrow entrance of the P0 pocket (data not shown), providing a straightforward explanation for the failure of N1P1 to bind iCAL36. However, previous structures of the CALP domain had yielded conflicting views of the P0 binding pocket. Our NMR structure (PDB code 2LOB, Ref. 42) had shown a pocket that was modestly expanded in CALP compared with N1P1, consistent with the ability of CALP to bind Ile (Fig. 4A). However, another NMR structure (PDB code 2DC2, Ref. 43) shows a CALP pocket more constricted than either.
These conflicting observations not only complicated our understanding of Ile accommodation but also suggested that the CALP pocket could be inherently flexible and thus capable of induced-fit binding. To visualize the stereochemistry of accommodation, we therefore compared the N1P1·CFTR complex with the co-crystal structure of CALP bound to iCAL36, which includes a P0 Ile side chain. Superposition of the β2 and α2 secondary structural elements that flank the peptide-binding cleft yielded an r.m.s. deviation of 0.62 Å.
For the A-protomer of the CALP·iCAL36 complex, the dimensions of the P0 binding pocket are surprisingly similar to those found in N1P1 (Fig. 4B, orange and green meshes). However, as noted above, the N1P1 pocket geometry does not appear able to accommodate a branched Ile side chain without further changes. Instead, A-protomer binding appears to involve a concerted translation of the carboxylate-binding loop and the P0 residue. Compared with its position in the N1P1·CFTR complex, the Cα atom of the Φ1 residue in the carboxylate-binding loop has moved nearly 2 Å closer to the pocket in the A-protomer structure (Fig. 4C). A corresponding displacement of the peptide carboxylate group reorients the Ile side chain within the pocket and relieves the steric clashes near the entrance (Fig. 4B).
In contrast, for the B-protomer of the CALP·iCAL36 complex, the Φ1 Cα atom shows a smaller ~1 Å displacement relative to N1P1·CFTR (Fig. 4D). However, a second difference is also observed: the Φ2 Ile residue of the protein has adopted a different rotamer (Fig. 4B, stick figures labeled Φ2). The resulting displacement of the Φ2 γ carbon expands the volume of the P0 binding pocket (Fig. 4B, yellow mesh) and enables CALP to accommodate the peptide Ile side chain despite a more modest translation of the carboxylate group.
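Side-chain rotamers, such as the alternative Φ2 Ile conformation described above, are conventionally distinguished by their χ1 torsion angle, which for Ile is defined by atoms N, CA, CB, and CG1. The sketch below computes a torsion angle from four atomic positions and assigns a coarse χ1 bin. The ±120° bin boundaries and the p/t/m labels (following common rotamer-library usage: p near +60°, t near 180°, m near −60°) are simplifying assumptions, and the function names are hypothetical; no specific χ1 values for CALP are implied.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180), IUPAC sign convention."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def chi1_bin(angle):
    """Coarse chi1 rotamer bin: 'p' (~+60), 't' (~180) or 'm' (~-60)."""
    if 0.0 <= angle < 120.0:
        return "p"
    if -120.0 <= angle < 0.0:
        return "m"
    return "t"
```

For an Ile residue, `chi1_bin(dihedral(n, ca, cb, cg1))` with coordinates read from each protomer would show whether the two Φ2 side chains fall in different bins.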
Overall, it appears that CAL can use two distinct modes to expand its P0 binding repertoire relative to that of N1P1: i) displacement of the Φ1 residue within the carboxylate-binding loop ("loop offset") and/or ii) expansion of the P0 pocket via reorientation of Φ2 Ile ("pocket expansion"). Each of these strategies involves a rearrangement of one of the two hydrophobic residues of the XΦ1GΦ2 motif.
Influence of P0 Side-Chain Identity on the Binding Site. To determine whether these distinct Φ1 and Φ2 conformations are also observed in the presence of a P0 Leu side chain, we determined the structure of the CALP domain bound to the iCAL36L peptide, using MR protocols designed to avoid model bias. Following refinement, the model exhibits Rwork and Rfree values of 0.184 and 0.198, respectively (Table 1), and, like CALP·iCAL36, contains two complexes in the asymmetric unit. To compare P0 binding modes, we aligned the iCAL36 and iCAL36L structures by pairwise superposition of the β2 and α2 secondary structural elements, which flank the peptide-binding cleft. Overall, the CALP·iCAL36 and CALP·iCAL36L structures are very similar, with an r.m.s. deviation of 0.04 Å.
Unlike the iCAL36 complex, both CALP·iCAL36L complexes in the asymmetric unit exhibit a shared Φ1 and Φ2 conformation, most similar to that seen in the CALP·iCAL36 A-protomer (Fig. 5A). Instead of an expanded pocket, both show a ~1 Å P0 offset compared with the CALP·iCAL36 B-protomer (Fig. 5B). Thus, the Φ1 and Φ2 conformations of the A-protomers appear relatively unchanged between the Leu and Ile complexes. For the B-protomers, the presence of a C-terminal Ile is associated with the two conformational changes in the binding pocket described above: a Φ2 rotamer shift that expands the P0 pocket (arrow in Fig. 5B), together with a displacement of the Φ1 main chain that repositions the peptide carboxylate and C-terminal side chain (Φ1 Cα difference in Fig. 5A).
Accommodation of a P0 Ile without Pocket Expansion. To determine the relative contributions of these two mechanisms to Ile accommodation in solution, we first considered a possible influence of the crystal lattice on the observed binding stereochemistries. In the A-protomers of both CALP·peptide complexes, a lattice hydrogen bond is formed by the main-chain carbonyl oxygen of His288, directly upstream of the 290GLGI293 motif. In contrast, in the B-protomers, no contacts are observed in the vicinity of the XΦ1GΦ2 loop: the closest symmetry-related atom is at least 6 Å away.
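The lattice-contact comparison above reduces to a closest-approach search between the loop atoms and the atoms of symmetry-related molecules. The sketch below assumes that symmetry mates have already been generated in Cartesian coordinates (for example, in a molecular graphics program) and uses a 4 Å contact cutoff chosen purely for illustration; the function names are hypothetical, and both atom lists must be non-empty.

```python
from math import dist


def closest_contact(selection, neighbours):
    """Return (distance, i, j) for the closest pair between two atom lists (Angstrom)."""
    return min(
        (dist(a, b), i, j)
        for i, a in enumerate(selection)
        for j, b in enumerate(neighbours)
    )


def has_lattice_contact(selection, neighbours, cutoff=4.0):
    """True if any atom pair lies within `cutoff` Angstrom (cutoff is an assumed value)."""
    return closest_contact(selection, neighbours)[0] <= cutoff


if __name__ == "__main__":
    loop_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]    # hypothetical loop coordinates
    symmetry_atoms = [(7.5, 0.0, 0.0), (0.0, 8.0, 0.0)]  # hypothetical symmetry mates
    print(closest_contact(loop_atoms, symmetry_atoms))   # (6.0, 1, 0)
```

Returning the atom indices along with the distance makes it easy to report which loop atom approaches which symmetry-related atom, as in the His288 carbonyl contact of the A-protomers.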
Based on these differential lattice interactions, we initially suspected that the Φ2 reorientation and associated P0 pocket expansion seen in the B-protomers were more likely to reflect the solution-state mechanism for P0 Ile accommodation by the CAL PDZ domain. This idea would be consistent with earlier proposals that the Φ2 residue is a critical determinant of P0 specificity (17).
To test whether Φ2 identity is a crucial component of P0 Ile accommodation, we investigated the binding stereochemistry of the TIP-1 PDZ domain. Like the NHERF domains, the TIP-1 PDZ domain contains a Phe side chain at the Φ2 position, in contrast to the Φ2 Ile found in CALP. Like CALP, TIP-1 also binds iCAL36, and the previously determined TIP-1·iCAL36 crystal structure (PDB code 3SFJ) superimposes well on the CALP domain, with an overall Cα r.m.s. deviation of 1.5 Å. Comparison confirms that the TIP-1 P0 pocket is not expanded relative to that of the CALP·iCAL36 A-protomer (Fig. 5C).
The TIP-1·iCAL36 complex also aligns well with the N1P1·CFTR complex (47) (r.m.s. deviation = 0.63 Å), revealing an offset of the carboxylate-binding loop and P0 residue (Fig. 5D). Thus, TIP-1 accommodates a P0 Ile by a mechanism similar to the loop offset seen in the CALP A-protomer. This result confirms that displacement of the Φ1 Cα position is an alternative strategy for accommodating β-branched P0 residues in the presence of Φ2 rotamers (CALP A-protomer) or larger side chains (TIP-1) that restrict the volume of the pocket.
The Loop Offset Is Also Observed with Ligands Containing a P0 Leu
A loop offset is not observed exclusively in the presence of a peptide P0 Ile; as described above, it is also seen in the CALP·iCAL36L complex, which contains a P0 Leu. To test whether this observation applies to the TIP-1 binding site as well, we investigated the stereochemistry of the interaction of the TIP-1 PDZ domain with the iCAL36L peptide. FP competition experiments reveal that TIP-1 binds iCAL36L (Ki = 0.36 μM) about 3-fold more tightly than it binds iCAL36 (Ki = 1.3 μM), confirming a modest preference for a P0 Leu (Fig. 3C). In parallel, we co-crystallized the TIP-1·iCAL36L complex and determined its structure by x-ray diffraction. The final structure was refined to Rwork and Rfree values of 0.187 and 0.216, respectively, with geometrical parameters as described in Table 1.
Table 2.
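The fold preference quoted above follows directly from the two Ki values. As a rough energetic scale, the ratio can also be converted into a free-energy difference; this conversion is only an illustration and assumes that each Ki approximates a dissociation constant and that the assay was run near 298 K, neither of which is stated in this section.

```latex
\frac{K_i(\mathrm{iCAL36})}{K_i(\mathrm{iCAL36L})}
  = \frac{1.3\ \mu\mathrm{M}}{0.36\ \mu\mathrm{M}} \approx 3.6,
\qquad
\Delta\Delta G \approx RT \ln 3.6
  \approx (0.593\ \mathrm{kcal\,mol^{-1}})(1.28)
  \approx 0.76\ \mathrm{kcal\,mol^{-1}}
```

Under these assumptions, the modest preference for a P0 Leu corresponds to well under 1 kcal/mol.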
The TIP-1·iCAL36L structure contains two very similar molecules in the asymmetric unit, with a Cα r.m.s. deviation of 0.16 Å for main-chain atoms between protomers, and of 0.08 Å for β2/α2 elements in comparison to the TIP-1·iCAL36 complex. In particular, the stereochemistry of the P0 binding pockets and the position of the P0 main-chain atoms within the pockets are highly conserved (Fig. 5E). Comparisons to previously published structures of TIP-1 in complex with C-terminal β-catenin (P0 Leu) or Kir2.3 (P0 Ile) peptides also reveal very similar P0 binding pockets (48, 49) (data not shown). The structure also aligns well with the N1P1·CFTR complex (47) (r.m.s. deviation = 0.64 Å). Consistent with the loop offset mechanism, however, the distance between the Φ1 Cα position of the N1P1·CFTR complex (Tyr²⁴) and the corresponding position of TIP-1·iCAL36L (Leu³⁰) is 1.7 Å.
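The r.m.s. deviations and Cα displacements quoted in this section are distances between equivalent atoms after superposition. A minimal Python sketch of both quantities is shown below. It assumes the coordinates have already been superposed (for example, by least-squares fitting on β2/α2 main-chain atoms) and that atoms are matched by list index; the function names and the coordinates in the usage lines are hypothetical and are not taken from the deposited structures.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in Å


def ca_displacement(a: Coord, b: Coord) -> float:
    """Distance (Å) between two equivalent atoms in an already-superposed frame."""
    return math.dist(a, b)


def rmsd(set_a: Sequence[Coord], set_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation (Å) over index-matched atom pairs.

    No fitting is done here: both sets must already be superposed.
    """
    if len(set_a) != len(set_b) or len(set_a) == 0:
        raise ValueError("need two non-empty atom lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(set_a, set_b))
    return math.sqrt(squared / len(set_a))


if __name__ == "__main__":
    # Hypothetical, pre-superposed Cα coordinates for two protomers.
    protomer_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    protomer_b = [(0.1, 0.0, 0.0), (3.8, 0.2, 0.0), (7.6, 0.0, 0.1)]
    print(round(rmsd(protomer_a, protomer_b), 3))             # → 0.141
    print(ca_displacement((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # → 3.0
```

Keeping superposition separate from the deviation calculation mirrors how the values above are reported: the fitting atoms (β2/α2 main chain) and the atoms compared afterward need not be the same set.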
This suggests that displacement of the carboxylate-binding loop is not simply a prerequisite for binding a P0 Ile or another β-branched ligand side chain. Instead, it most likely represents a fundamental characteristic of the TIP-1 binding site, as it does for CALP.
Taken together, all four TIP-1 and CALP protomers in complex with P0 Leu peptides, and three of the four protomers in complex with P0 Ile peptides, share a common stereochemistry for the P0 pocket. In these structures, the loop-offset strategy accommodates either a P0 Leu or a P0 Ile side chain. Because the loop offset is not specifically required by a β-branched P0 side chain such as Ile, it most likely represents a fundamental conformation of both the TIP-1 and CALP binding sites. As a result, although the Leu/Ile switch is associated with an expanded steric volume in the P0 pocket of the CALP B-protomer, such expansion is not required for accommodation.
Φ1 Residue Identity Influences the Position of the Carboxylate-binding Loop
Because the C-terminal Ile side chain can be reoriented to fit even within a restricted P0 binding pocket, we wanted to determine which stereochemical factors prevent the N1P1 domain from using the same strategy. The carboxylate-binding motif of N1P1 (GYGF) shares a Phe at the Φ2 position with TIP-1 (ILGF). However, at the Φ1 position, N1P1 contains a bulky Tyr residue, in contrast to the Leu residue found in both CAL and TIP-1. This suggested that steric clashes may prevent domains with large Φ1 residues from accessing the loop offset mechanism of P0 Ile accommodation.
To test this hypothesis, we first transformed the N1P1 carboxylate-binding loop (²³GYGF²⁶) in silico to reproduce the conformation observed in the TIP-1·iCAL36L complex (²⁸ILGF³¹). This operation causes Tyr²⁴ to clash sterically with Ile⁷⁹, the α2-8 residue, as well as with Ala⁸², Ala⁸⁵, and Val⁸⁶ (Fig. 5F). A similar set of clashes was observed when the N1P1 ²³GYGF²⁶ residues were transformed to reproduce the conformation of the ²⁹⁰GLGI²⁹³ carboxylate-binding loop of CALP·iCAL36L (data not shown). Thus, the carboxylate-binding loop of N1P1 cannot adopt the conformation seen in CALP or TIP-1 due to steric constraints at the Φ1 position.
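The clash criterion used for the in silico loop transformation (interatomic distance d < 3.4 Å, as given in the Fig. 5F legend) can be expressed as a pairwise distance screen. The sketch below is a simplification under stated assumptions: it applies a single distance cutoff rather than atom-specific van der Waals radii, it compares transformed loop atoms only against a separate set of environment atoms, and the atom labels and coordinates are hypothetical placeholders rather than N1P1 coordinates.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in Å


def find_clashes(
    moved: Dict[str, Coord],
    environment: Dict[str, Coord],
    cutoff: float = 3.4,
) -> List[Tuple[str, str, float]]:
    """List (moved_atom, environment_atom, distance) for pairs with distance < cutoff.

    Every transformed loop atom is compared with every environment atom
    (an O(n*m) screen, adequate for a short loop); closest contacts come first.
    """
    clashes = []
    for name_m, xyz_m in moved.items():
        for name_e, xyz_e in environment.items():
            d = math.dist(xyz_m, xyz_e)
            if d < cutoff:  # strict inequality, matching "d < 3.4 Å"
                clashes.append((name_m, name_e, d))
    clashes.sort(key=lambda clash: clash[2])
    return clashes


if __name__ == "__main__":
    # Hypothetical atom labels and coordinates, for illustration only.
    loop_after_transform = {"Tyr24 CZ": (0.0, 0.0, 0.0), "Gly25 CA": (10.0, 0.0, 0.0)}
    nearby = {
        "Ile79 CD1": (2.5, 0.0, 0.0),
        "Ala82 CB": (0.0, 3.0, 0.0),
        "Val86 CG1": (0.0, 0.0, 5.0),
    }
    for atom, partner, d in find_clashes(loop_after_transform, nearby):
        print(f"{atom} -- {partner}: {d:.2f} Å")
```

With these placeholder coordinates, the screen reports the Ile79 and Ala82 contacts and ignores the more distant Val86 atom, which is the kind of per-residue clash list summarized in Fig. 5F.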
Based on these observations, we extended our analysis to other Class I PDZ domain complex structures. We first analyzed carboxylate-binding loop positioning as a function of the XΦ1GΦ2 sequence and observed tight clustering for most sequence variants (Fig. 6, A and B). Although the β1-β2 loop residues upstream of the first motif position are highly variable in these comparisons, this variability appears to reflect well-determined differences in the upstream conformations rather than dynamic instability: the average B-factors for residues both upstream and downstream of XΦ1GΦ2 are consistent with stable conformations (data not shown).
The carboxylate-binding loop position also appears to be a property of each domain and is strongly determined by the XΦ1GΦ2 sequence. However, we do see slight variability among domains that contain smaller side chains at both the Φ1 and Φ2 positions (GLGI or SLGI; Fig. 6A). This may reflect flexibility in this class of domains, analogous to that seen in our CALP structures. We also see a single outlier among domains with a GFGF carboxylate-binding loop sequence: the MAGI3 PDZ3 structure (PDB code 3SOE; Fig. 6B) reveals an atypical 4.9-Å secondary structure offset that expands the overall steric volume of the peptide-binding cleft. As discussed below, this outlier also has implications for P0 motif predictions.
To dissect how the key side chains in this interaction determine loop positioning, we then grouped all structures by the identity of the P0, Φ1, or Φ2 residue. When loop position is analyzed solely by P0 residue identity (Fig. 6C), a spectrum of loop positions is seen for structures with a P0 Leu or Val, but all structures with a P0 Ile contain an aliphatic residue at Φ1, consistent with our hypothesis that Φ1 stereochemistry can co-determine P0 side-chain accommodation.
When the loops are superimposed and colored by Φ2 character (Fig. 6D), each class (aliphatic versus aromatic) shows a spectrum of carboxylate-binding loop positions with no clear pattern. This is supported by our observation that, despite a shared Φ2 Phe, TIP-1 and N1P1 exhibit a 1.3-Å offset in the Cα position of this residue upon P0 Leu binding (Fig. 5D).
In contrast, when the full set of loops is superimposed and colored using the Φ1-based scheme from Fig. 6, A and B, distinct clusters emerge (Fig. 6E). Loops bearing a Φ1 Leu side chain (Fig. 6E, cool colors) are shifted toward the entrance of the P0 binding pocket, as seen in the CALP and TIP-1 loop offset structures. Loops bearing an aromatic Φ1 side chain are instead shifted away from the entrance (Fig. 6E, warm colors), as seen in the N1P1 structure.
As a further test of the role of the carboxylate-binding loop residues in ligand motif prediction, we evaluated the ability of the Φ1 and Φ2 residues to predict P0 preferences identified for 49 PDZ domains by a high-throughput experimental analysis (see supplemental Table S4 in Ref. 11).
FIGURE 5. Loop offset is seen with both P0 Ile and Leu residues. A and B, following superposition, the Φ1GΦ2 carboxylate-binding loops (stick figures), the P0 Cα atoms (stick figures), and the pocket surfaces (mesh) of the CALP·iCAL36L complex (magenta) are closer to the A-protomer (loop offset) than to the B-protomer (pocket expansion) of the CALP·iCAL36 complex (green and yellow, respectively). The PDZ Φ1GΦ2 and the peptide P0 residues are labeled. Non-carbon main-chain atoms are colored by element (red = O, blue = N). C, following superposition, the P0 offset (stick figure) and pocket dimensions (mesh) of the TIP-1·iCAL36 complex (cyan) also more closely resemble the CALP·iCAL36 A-protomer than the B-protomer (green and yellow, respectively). D, the P0 side chain and carboxylate-binding loop (stick figures) of the TIP-1·iCAL36 complex (cyan) are offset in comparison to those of the N1P1·CFTR complex (orange). E, superposition of the TIP-1 complexes with iCAL36 (cyan) and iCAL36L (black) reveals very similar P0 carboxylate and Φ2 positions (stick figures) and binding pocket geometries (mesh) for peptides containing either Ile or Leu C-terminal side chains. F, steric clash prevents N1P1 from translating its carboxylate-binding loop as TIP-1 does. The XΦ1GΦ2 carboxylate-binding loop motif ²³GYGF²⁶ (stick figure) of N1P1 is shown before (gray) and after (orange) transformation to match the conformation of the loop in the TIP-1·iCAL36 complex (Cα trace, cyan). Post-transformation steric clashes (d < 3.4 Å) between N1P1 loop residues (orange) and nearby Ile⁷⁹, Ala⁸², Ala⁸⁵, and Val⁸⁶ residues (gray) are marked by red icons.
We first evaluated the predictive value of an aromatic Φ1 residue, which our studies suggest should disfavor a P0 Ile. Among the 49 human and Caenorhabditis elegans domains studied, 14 have an aromatic Φ1 residue. Only three of these accommodate a P0 Ile. ZO-1/TJP1 PDZ1 exhibits a weak preference for Ile; its carboxylate-binding loop sequence is GFGI, suggesting a possible Φ2 rotamer-based mechanism of P0 Ile accommodation. The other exceptions are the MAGI PDZ2 and PDZ3 domains. Fig. 6E confirms that the MAGI3 PDZ3 domain (GFGF) is an outlier: it is the only warm-colored (olive) loop that clusters with the cool-colored loops. Additional conformational changes may enable the carboxylate-binding loop of MAGI3 PDZ3 to shift, allowing the domain to accommodate a P0 Ile despite aromatic residues at both the Φ1 and Φ2 positions. Overall, in the absence of such additional conformational changes, an aromatic Φ1 residue appears to strongly disfavor binding of a P0 Ile.
We next evaluated the predictive value of an aliphatic Φ1 residue, which our studies suggest should permit accommodation of a P0 Ile even in the presence of an aromatic side chain at the Φ2 position (e.g. GLGF). Previous studies indicate that this combination should be unfavorable for Ile (17). However, of the 28 domains with a Φ2 Phe, 10 can accommodate a P0 Ile. Of those, 8 carry a small Φ1 residue, consistent with the loop-offset mechanism of accommodation seen in TIP-1 (ILGF). The other two are the MAGI3 domains. Overall, these results substantiate the importance of the Φ1 residue in positioning the P0 ligand side chain within its binding pocket.
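The Φ1/Φ2 rules evaluated above can be condensed into a toy decision procedure. The sketch below is a hypothetical illustration, not a validated predictor: the residue groupings (aromatic = F, Y, W; aliphatic or small = L, I, V, A, M) are assumptions, and the rules deliberately ignore the additional conformational changes that let MAGI3 PDZ3 (GFGF) accommodate a P0 Ile, so that domain would be misclassified as "disfavored".

```python
# Hypothetical residue classes: the text distinguishes only aromatic versus
# aliphatic/small Φ1 residues, so these sets are assumptions.
AROMATIC = frozenset("FYW")
ALIPHATIC = frozenset("LIVAM")


def residue_class(residue: str) -> str:
    """Classify a one-letter residue code as aromatic, aliphatic, or other."""
    if residue in AROMATIC:
        return "aromatic"
    if residue in ALIPHATIC:
        return "aliphatic"
    return "other"


def predict_p0_ile(motif: str) -> str:
    """Toy call on P0 Ile accommodation from an XΦ1GΦ2 motif such as 'GLGF'.

    aliphatic Φ1               -> "permitted"  (loop offset accessible)
    aromatic Φ1, aliphatic Φ2  -> "possible"   (e.g. a Φ2 rotamer, as suggested for GFGI)
    aromatic Φ1, other Φ2      -> "disfavored"
    anything else              -> "undetermined"
    """
    motif = motif.upper()
    if len(motif) != 4 or motif[2] != "G":
        raise ValueError(f"expected an XΦ1GΦ2 motif such as 'GLGF', got {motif!r}")
    phi1, phi2 = residue_class(motif[1]), residue_class(motif[3])
    if phi1 == "aliphatic":
        return "permitted"
    if phi1 == "aromatic":
        return "possible" if phi2 == "aliphatic" else "disfavored"
    return "undetermined"


if __name__ == "__main__":
    # Motif sequences listed in the Fig. 6 legend.
    for m in ["GLGF", "ILGF", "GLGI", "SLGI", "ELGF", "GFGF", "GYGF", "NYGF", "GFGI", "TFGF"]:
        print(m, residue_class(m[1]), predict_p0_ile(m))
```

The rules branch on Φ1 first because, in the analysis above, Φ1 character separates the loop clusters while Φ2 character alone does not.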

DISCUSSION
A detailed understanding of PRD binding specificity is a requirement for mapping the switchable protein-protein interactions that regulate cellular functions. Although it is relatively straightforward to identify the PRDs themselves based on sequence alignments, their short and often degenerate recognition sequences are much harder to capture by brute-force analysis of even large-scale sequence datasets (50). Several high-throughput approaches have been employed to provide experimental constraints for this problem (11, 51, 52). Nevertheless, robust biochemical detection is challenging due to the transient nature of the protein-peptide interactions; the associated affinities are frequently 1-100 μM or even weaker (12). As an alternative approach, numerous groups have developed algorithms to identify PRD specificity determinants based on the primary sequence of the domain (e.g. Refs. 53-61). Despite the considerable success of these approaches, in many cases the stereochemical basis of even dominant specificity determinants remains obscure.
A clear example is provided by the secondary P0 motif preferences of the cluster of PDZ proteins that share affinity for the C terminus of CFTR. Published NMR structures had provided conflicting models of how the CAL domain can accommodate β-branched side chains (42, 43). Our crystal structures of CALP bound to iCAL36 and iCAL36L address this apparent contradiction. They reveal that the aliphatic CALP Φ1 and Φ2 side chains facilitate binding of a β-branched C-terminal peptide Ile residue by two distinct mechanisms (Fig. 4B) that highlight the central role of these side chains in PDZ binding interactions.
In fact, PDZ domains were initially referred to as "GLGF repeats" (13). The first PDZ domain structures revealed that the GLGF sequence forms critical hydrogen bonds with the peptide main-chain carboxylate (62). In doing so, it positions the C-terminal peptide side chain within a pocket at the bottom of the cleft. As additional PDZ domain family members from multiple genera were identified, it became apparent that sequence variation at three of the four positions is ubiquitous (15, 63, 64). As a result, the GLGF loop became known as the GΦGΦ or XΦGΦ motif (44).
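Because the motif name now encodes residue classes rather than a fixed sequence, it can be written as a pattern. The sketch below scans a sequence for XΦGΦ matches using a regular-expression lookahead so that overlapping matches are reported. The membership of the Φ class (F, L, I, V, M, Y, W) is an assumption made for illustration, and because the pattern is so short it will also match many positions outside a PDZ β1-β2 loop, so in practice it would only be applied to an aligned loop region. The example sequences are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: the hydrophobic "Φ" class is taken here as F, L, I, V, M, Y, W.
PHI = "FLIVMYW"
# Zero-width lookahead with a capture group, so overlapping matches are found.
MOTIF = re.compile(rf"(?=([A-Z][{PHI}]G[{PHI}]))")


def find_xphigphi(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every XΦGΦ match in a protein sequence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical fragments, not real PDZ sequences.
    print(find_xphigphi("RIEKGLGFSIAGG"))  # → [(4, 'GLGF')]
    print(find_xphigphi("GLGLGF"))         # → [(0, 'GLGL'), (2, 'GLGF')]
```

The lookahead is used instead of a plain pattern because a plain `finditer` consumes matched characters and would miss the second, overlapping motif in a sequence such as "GLGLGF".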
A recent analysis of the effect of these sequence differences on domain binding suggested that an aromatic residue at the Φ2 position, which forms one side of the P0 binding pocket, could sterically inhibit binding of a bulky C-terminal peptide side chain (17). However, our structures show that an Ile-bearing peptide can bind to TIP-1 even in the presence of a bulky Phe side chain at Φ2 (Fig. 5C). Our analysis of a collection of Class I PDZ domain structures further indicates that the loop offset is broadly regulated by the steric volume of the Φ1 residue (Fig. 6E), which has not previously been implicated in the stereochemistry of P0 binding. Thus, the two hydrophobic residues of the XΦ1GΦ2 loop together co-determine stereochemical preferences at the C-terminal residue: taking both into account clearly improves prediction of one of the core determinants of PDZ binding specificity.
The ability to connect PDZ domain sequences directly to binding motif preferences will also help us understand the evolution of PDZ-mediated interaction networks. Interaction data suggest that PDZ target preferences are hard-wired into the PDZ domain itself (18, 65, 66). Indeed, one of the hallmarks of organismal complexity is the expansion of the number of PDZ domains and the rewiring of their interactions (66-68). Given the limitations of determining individual PDZ interactomes experimentally, the refinement of sequence-based computational specificity predictions will be critical to understanding these fundamental evolutionary processes. Accordingly, understanding the impact of site-directed mutagenesis on PDZ domain binding specificity has been the target of recent computational challenges (e.g. Refs. 55 and 60).
FIGURE 6. [...] (A and B) or clustered as a function of ligand P0 identity (C), Φ2 character (D), or Φ1 character (E). Protein and peptide Cα traces are shown in gray, following least-squares superposition based on α2/β2 secondary structure main-chain atoms. A-C and E, the carboxylate-binding loop structures are colored based on XΦ1GΦ2 sequence. A, sequences bearing an aliphatic Φ1 residue are shown in cool colors: GLGF, purple (PDB entry codes 1BE9, 1TP5, 2FNE, 2I0I, 2I0L, 2I1N, 3JXT, and 3RL7); ILGF, green (TIP-1·iCAL36, PDB entry code 3SFJ; TIP-1·iCAL36L, PDB entry codes 4E3B, 3DIW, and 3GJ9); GLGI, cyan (CALP·iCAL36, PDB entry code 4E34; CALP·iCAL36L, PDB entry code 4E35); SLGI, marine blue (PDB entry codes 1VJ6, 2FCF, and 2IWQ); ELGF, dark blue (PDB entry code 1N7T). B, sequences bearing an aromatic Φ1 residue are shown in warm colors: GFGF, olive (PDB entry codes 3QJM, 3QJN, and 3SOE); GYGF, orange (PDB entry codes 1I92, 2HE4, 2OCS, 3QGL, and 3R69); NYGF, yellow (PDB entry code 3NGH); GFGI, sand (PDB entry codes 2H2B and 2H2C); TFGF, red (PDB entry codes 2EGN and 2EGK). The MAGI3 PDZ3 structure (XΦ1GΦ2 = GFGF; PDB entry code 3SOE) represents the largest conformational outlier compared with the other structures bearing the same carboxylate-binding loop sequence (black arrow). C, the structures shown in panels A and B are sorted based on the identity of the P0 residue (Leu, Val, and Ile). Sequences containing either aromatic or aliphatic Φ1 residues are found in the P0 Leu and Val clusters (left and middle), but only structures containing an aliphatic Φ1 residue are found in the P0 Ile cluster (right). D, the complete set of carboxylate-binding loop structures colored by Φ2 character (aliphatic = cyan, aromatic = orange) shows no consistent influence of the Φ2 residue on the position of the loop. E, in contrast, when the complete set of structures is superimposed and colored as in panels A and B, the Cα atoms of the Φ1 residues (spheres) cluster by aliphatic (cool) and aromatic (warm) character (black arrow).
Critically, a more complete understanding of sequence-specificity relationships will not only enhance our ability to predict and interpret cellular interactomes but will also facilitate the design of specific PDZ domain inhibitors as therapeutic agents, even among domains with overlapping endogenous targets (6, 8, 9, 69). Other studies have shown that the conformation of the carboxylate-binding loop is not determined exclusively by the Φ1 and Φ2 residues; it is also affected by surrounding residues, including those of the β1-β2 loop, and by allosteric interactions, for example, with other PDZ domains (70-73). Thus, one way to target PDZ domains may be through small molecule-mediated allosteric regulation (69). Despite the promiscuous nature of PDZ domain binding, our data suggest that the design of efficacious and specific PDZ inhibitors can exploit the subtle, yet crucial, carboxylate-binding loop determinants that have been carefully calibrated throughout evolution.