Structural basis of galactose recognition by C-type animal lectins.

The asialoglycoprotein receptors and many other C-type (Ca2+-dependent) animal lectins specifically recognize galactose- or N-acetylgalactosamine-terminated oligosaccharides. Analogous binding specificity can be engineered into the homologous rat mannose-binding protein A by changing three amino acids and inserting a glycine-rich loop (Iobst, S. T., and Drickamer, K. (1994) J. Biol. Chem. 269, 15512-15519). Crystal structures of this mutant complexed with beta-methyl galactoside and N-acetylgalactosamine (GalNAc) reveal that as with wild-type mannose-binding proteins, the 3- and 4-OH groups of the sugar directly coordinate Ca2+ and form hydrogen bonds with amino acids that also serve as Ca2+ ligands. The different stereochemistry of the 3- and 4-OH groups in mannose and galactose, combined with a fixed Ca2+ coordination geometry, leads to different pyranose ring locations in the two cases. The glycine-rich loop provides selectivity against mannose by holding a critical tryptophan in a position optimal for packing with the apolar face of galactose but incompatible with mannose binding. The 2-acetamido substituent of GalNAc is in the vicinity of amino acid positions identified by site-directed mutagenesis (Iobst, S. T., and Drickamer, K. (1996) J. Biol. Chem. 271, 6686-6693) as being important for the formation of a GalNAc-selective binding site.

…core proteins of cartilage and other tissues and are presumed to contribute to the organization of the extracellular matrix (7).
Previous crystallographic analyses of rat mannose-binding proteins (MBPs) A and C have shown that Man-binding C-type lectins recognize their sugar ligands by forming direct coordination bonds between a Ca2+ (designated site 2) and a lone pair of electrons from each of two vicinal hydroxyl groups. These hydroxyls have the same stereochemical arrangement as the equatorial 3- and 4-OH groups of D-mannose (8,9). The Ca2+ is 8-coordinated in a pentagonal bipyramidal arrangement, with the two sugar hydroxyls bisecting one of the apical positions (8) (see Fig. 1a). In addition, the same OH groups form hydrogen bonds with amino acid side chains that are Ca2+ site 2 ligands, producing an intimately linked ternary complex of protein, Ca2+, and sugar (see Fig. 1a). Only one other contact contributes significantly to binding: an apolar van der Waals contact between a ring carbon and the Cβ of residue 189 (8,10).
Studies with derivatized sugars have shown that free 3- and 4-OH groups are essential for binding to mammalian asialoglycoprotein receptors as well as to Man-binding C-type lectins, whereas substitutions at other ring positions have little or no effect on binding (11). However, the 3- and 4-OH groups of galactose have an equatorial/axial arrangement, so the mechanisms of Gal- and Man-type ligand recognition must differ. Sequence analysis reveals that, among the Ca2+ site 2 ligands, positions equivalent to Glu193, Asn205, and Asp206 of MBP-A are highly conserved among C-type lectins regardless of specificity. In contrast, positions 185 and 187 are Glu and Asn in Man-binding family members, whereas Gal-binding C-type lectins have Gln and Asp at these positions. The Glu185→Gln/Asn187→Asp mutant of MBP-A, designated "QPD", binds galactose in preference to mannose by a factor of 3, but with relatively low affinity for either sugar (12). Position 189 of MBP-A (Fig. 1a) is not conserved among Man-binding C-type lectins but is always either Trp or Phe in Gal-binding family members. Replacing His189 of MBP-A with Trp in the QPD mutant gives "QPDW". This protein has an affinity for Gal comparable with that of natural Gal-binding C-type lectins, but it still does not discriminate well between Gal and Man (13). A glycine-rich loop is found in the major form of the rat asialoglycoprotein receptor, rat hepatic lectin-1 (RHL-1), and in other Gal-binding C-type lectins that discriminate strongly against mannose. Inserting this loop results in a mutant ("QPDWG") with galactose affinity and selectivity comparable with those of RHL-1 (13). The affinity for galactose is comparable in QPDW and QPDWG, indicating that the determinants of affinity and selectivity are somewhat distinct.
NMR measurements reveal similar modes of galactose binding by QPDWG and RHL-1 (13). This demonstrates that galactose specificity in C-type lectins is determined by a few residues and can be studied in the well-characterized MBP-A background.
Here we describe the structure of a trimeric fragment of QPDWG containing the neck and COOH-terminal CRD (14), both alone and complexed with β-methyl galactoside (βMeGal) and N-acetylgalactosamine (GalNAc). The structures reveal the molecular basis of selective galactose recognition by C-type lectins. The structure of the QPDWG-GalNAc complex is consistent with the results of site-directed mutagenesis experiments. Those experiments identified amino acid positions that contribute to the preferential binding of GalNAc over Gal by certain C-type lectins.

EXPERIMENTAL PROCEDURES
Materials-Unless otherwise specified, chemicals were obtained from J. T. Baker Inc. LB medium was obtained from Life Technologies, Inc. Guanidinium hydrochloride and isopropylthiogalactopyranoside were obtained from Boehringer Mannheim. Clostripain was obtained from Worthington Biochemical. Sepharose 6B, polyethylene glycol 8000, β-methyl galactoside, and N-acetylgalactosamine were from Sigma. Divinylsulfone was obtained from Fluka Chemical Co. 2-Methyl-2,4-pentanediol was obtained from Aldrich.
Purification of QPDWG-The QPDWG mutant of rat MBP-A was expressed in Escherichia coli as described previously (13), except that 1 mM isopropylthiogalactopyranoside was used to induce expression. The clostripain-treated fragment of QPDWG, cl-QPDWG, was purified as described for wild-type cl-MBP-A (14). The only difference was that galactose-Sepharose (prepared by the divinylsulfone method (15)) was used in all affinity chromatography steps.
Crystallization and Data Collection-Crystals of cl-QPDWG were grown at 20°C by hanging drop vapor diffusion. Equal volumes of 25 mg/ml cl-QPDWG in 10 mM CaCl2/10 mM NaCl were mixed with reservoir solution containing 12-15% polyethylene glycol 8000/100 mM Tris-HCl, pH 8.0/20 mM CaCl2/10 mM NaCl/0.02% NaN3 (solution A). Crystals appeared within 3-4 days and grew to full size (typically 0.3 × 0.3 × 0.2 mm³) in 7-10 days. Prior to data collection, the crystals were adapted stepwise to solution A plus 0, 5, 7.5, 10, 15, and 20% 2-methyl-2,4-pentanediol. Complexes with monosaccharides were prepared by including 200 mM βMeGal or GalNAc in the soaking solutions. Crystals were flash-cooled at 100 K. Diffraction data were measured on an R-AXIS II imaging plate detector mounted on a rotating copper anode operating at 4.5 kW. A total of 180° of data were collected in 1.2° oscillation scans from a single orientation and processed using DENZO and SCALEPACK (16). A data set used in the initial stages of the structure determination was obtained to 2.0 Å resolution (Rsym = 4.9%; 88.2% complete). A more complete data set was subsequently measured from another unliganded cl-QPDWG crystal and used in the final stages of refinement (see Tables I and II).
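The merging statistic Rsym quoted above is conventionally defined as Rsym = Σhkl Σi |Ii − ⟨I⟩| / Σhkl Σi Ii. The sums run over the repeated and symmetry-equivalent measurements of each unique reflection. The Python sketch below illustrates that definition on made-up intensities. It is not the SCALEPACK implementation. The helper name `r_sym` is hypothetical, and skipping singly measured reflections is an assumption, since programs differ on this point.

```python
def r_sym(observations):
    """Rsym = sum|I_i - <I>| / sum I_i over all multiply measured reflections.

    observations maps a unique reflection index (h, k, l) to the list of
    intensities measured for it (repeats and symmetry mates).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # assumption: singly measured reflections are skipped
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Made-up intensities for three unique reflections (the last is measured once).
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0], (2, 0, 0): [30.0]}
print(f"Rsym = {100 * r_sym(obs):.1f}%")  # prints: Rsym = 5.0%
```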
Structure Solution and Refinement-Crystals of cl-QPDWG are nearly isomorphous with those of wild-type cl-MBP-A. This permitted structure solution by rigid body refinement of the wild-type cl-MBP-A model (14) against the first cl-QPDWG data set. The side chains of residues 185 and 187, the entire loop from 189 to 198, Ca2+, and water molecules were omitted from the model. Temperature factors from the wild-type model were retained. All calculations were performed using X-PLOR (17). The protomers were refined as individual rigid bodies against data from 10-4 Å and then 10-2.8 Å (R = 0.369). To remove model bias, this model was subjected to simulated annealing refinement (18) starting at 3000 K, with 10% of the reflections omitted for calculation of Rfree (19). This was followed by positional and isotropic temperature factor refinement against data from 5.0-2.5 Å. Resolution-dependent weights were applied as $w = 1/[1 - 5.5\,(1/(2d) - 1/6)]^2$, where d is the Bragg spacing of the reflection. At this point, the glycine-rich loop and the omitted side chains were built using the program O (20). Water molecules were added, and positional and temperature factor refinement was carried out against data from 5-2.0 Å. Reflections from 10-2.0 Å were then included and refinement continued. This model was then subjected to rigid body refinement against the second data set from 10-2.8 Å. Several rounds of positional and isotropic temperature factor refinement followed, alternating with model adjustment (initially using data from 5-2.0 Å and then extending from 10 Å to the high resolution limit). Sugar complexes were refined by a similar strategy, starting from the same model used for refinement of the unliganded structure against its second data set. The methyl aglycon of βMeGal was not modelled in the most poorly ordered copy. Only the β anomer of GalNAc could be modelled reliably (the α:β anomer ratio of galactose is approximately 1:2 (21) but has not been reported for GalNAc).
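Read as w(d) = 1/[1 − 5.5(1/(2d) − 1/6)]², the resolution-dependent weight has the following values:

- It equals 1 at d = 3 Å.
- It rises to about 3.4 at 2.0 Å.
- It falls to about 0.54 at 5 Å.

High-resolution terms are therefore up-weighted relative to low-resolution ones. The expression diverges near d ≈ 1.43 Å, beyond the 2.0 Å limit used here. A minimal sketch that evaluates it follows. The function name is hypothetical, and this is not X-PLOR code.

```python
def resolution_weight(d):
    """Resolution-dependent weight w(d) = 1 / [1 - 5.5 (1/(2d) - 1/6)]^2.

    d is the Bragg spacing in Å; 1/(2d) equals sin(theta)/lambda.
    """
    s = 1.0 / (2.0 * d)
    denom = 1.0 - 5.5 * (s - 1.0 / 6.0)
    if denom <= 0.0:
        raise ValueError("weight undefined for d <= ~1.43 Å")
    return 1.0 / denom ** 2


for d in (5.0, 3.0, 2.5, 2.0):
    print(f"d = {d:.1f} Å  w = {resolution_weight(d):.2f}")
```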
An overall anisotropic temperature factor (22) was applied to each structure, although it significantly reduced the R and R free values only for the unliganded QPDWG model. Noncrystallographic symmetry restraints were not imposed at any point in the refinement.
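The working-set and test-set R values used throughout refinement are the same statistic, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, evaluated over disjoint sets of reflections (19). The test set (here 10% of reflections) is never used to refine the model. The Python sketch below makes two assumptions:

- Fcalc is already scaled to Fobs.
- The test set is chosen by a simple seeded random split.

The function names and the toy data are hypothetical and only illustrate the bookkeeping.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum| |Fobs| - |Fcalc| | / sum |Fobs| (amplitudes assumed on one scale)."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def choose_test_set(n_reflections, fraction=0.10, seed=0):
    """Randomly flag a fraction of reflection indices for the Rfree test set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    return set(indices[: round(fraction * n_reflections)])


def r_cryst_and_r_free(f_obs, f_calc, test_set):
    """Return (Rcryst over the working set, Rfree over the test set)."""
    work = [i for i in range(len(f_obs)) if i not in test_set]
    test = sorted(test_set)
    r_cryst = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_cryst, r_free


# Toy example: 1000 made-up amplitudes with up to +/-5% model error.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [fo * rng.uniform(0.95, 1.05) for fo in f_obs]
test = choose_test_set(len(f_obs))
print(len(test), r_cryst_and_r_free(f_obs, f_calc, test))
```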

RESULTS AND DISCUSSION
A trimeric fragment of QPDWG containing the neck and COOH-terminal CRD (14) was crystallized, and the structure was solved by molecular replacement, both alone and complexed with βMeGal and GalNAc (Tables I and II). The structures were refined to resolutions of 2.0 Å or better (Tables I and II). Apart from the His189→Trp change and the glycine-rich insertion at the carbohydrate-binding site, the structures of wild-type MBP-A and the QPDWG mutant are identical to within the coordinate error. In particular, the Ca2+ site 2 ligands of the two structures superimpose. The side chain amide nitrogen of Gln185 and the carbonyl oxygen of Asp187 of QPDWG occupy the same positions as the carbonyl oxygen of Glu185 and the amide nitrogen of Asn187 in the wild-type protein.

TABLE I
… for reflections in the working set (Rcryst) or in the test set (Rfree) (19).

TABLE II
Data collection and refinement statistics: model geometry
All residues except Lys152 of one protomer of the unliganded cl-QPDWG model, which lies in a poorly ordered turn, fall within the allowed regions of the Ramachandran plot. No significant differences in structure are observed among the different copies, except in regions of lattice contacts. The average temperature factor of the protomer with the fewest lattice contacts is higher in the uncomplexed structure than in the sugar-bound structures, but it is not clear whether this is a consequence of sugar binding. The side chain of His99 in protomer 1 of QPDWG was modeled in two conformations. The side chains of Ser90 and His99 in protomer 1 and Met103 in protomer 2 were modeled in two conformations in QPDWG + βMeGal. The side chains of His99 in protomer 1 and Ser102, Met103, and Ser129 in protomer 2 were modeled in two conformations in QPDWG + GalNAc. RMSD, root-mean-square deviation.
Despite the different stereochemistry of the 3- and 4-OH groups, the mechanism of βMeGal and GalNAc binding to QPDWG is similar to that of Man-type ligand binding to wild-type MBPs. In both cases, the full noncovalent bonding potential of the 3- and 4-OH groups is used for Ca2+ coordination and for hydrogen bonding with Ca2+ ligands (8,9) (Fig. 1, a and c). However, maintaining the pentagonal bipyramidal Ca2+ coordination geometry forces the pyranose ring into a very different orientation from that observed in mannose binding to wild-type MBPs (8,9). The apolar patch formed by the 3, 4, 5, and 6 carbons of βMeGal and GalNAc packs against the side chain of Trp189 (Fig. 1c). This interaction has been observed in all galactose-lectin interactions studied to date (23). The angle between the least squares plane through the pyranose ring of galactose and the plane of the Trp189 indole ring falls within the range found in other galactose-binding lectins (Table III). This interaction is especially noteworthy because no aromatic residues interact with the sugar ligand in Man-binding C-type lectins, which in fact make few nonpolar contacts with the sugar ligand (8,9).
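The angle in Table III is measured between the least squares plane through the galactose ring and the plane of the aromatic side chain. The dependency-free Python sketch below shows one way to compute such an angle:

- Each plane is taken as the orthogonal-regression plane of its atoms.
- The plane's normal is the eigenvector belonging to the smallest eigenvalue of the 3×3 scatter matrix, found here by Jacobi rotations.
- The angle between the two normals is folded into the range 0-90°.

The fitting method, function names, and ring coordinates are assumptions for illustration. The study does not state which program or fitting procedure was used.

```python
import math


def covariance(points):
    """3x3 scatter matrix of points about their centroid."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    d = [[p[i] - c[i] for i in range(3)] for p in points]
    return [[sum(r[i] * r[j] for r in d) for j in range(3)] for i in range(3)]


def jacobi_eigh(a, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric 3x3 matrix."""
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-24:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J: rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A: rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # V <- V J: accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(3)], v


def plane_normal(points):
    """Unit normal of the least-squares (orthogonal-regression) plane."""
    vals, vecs = jacobi_eigh(covariance(points))
    k = min(range(3), key=lambda i: vals[i])  # smallest scatter = normal direction
    n = [vecs[i][k] for i in range(3)]
    length = math.sqrt(sum(x * x for x in n))
    return [x / length for x in n]


def interplanar_angle(ring_a, ring_b):
    """Angle (0-90 degrees) between the least-squares planes of two rings."""
    n1, n2 = plane_normal(ring_a), plane_normal(ring_b)
    cos_angle = min(1.0, abs(sum(x * y for x, y in zip(n1, n2))))
    return math.degrees(math.acos(cos_angle))


# Hypothetical idealized rings: a flat hexagon and the same hexagon tilted 30 degrees.
hexagon = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
tilt = math.radians(30)
tilted = [(x, y * math.cos(tilt), y * math.sin(tilt)) for x, y, _ in hexagon]
print(round(interplanar_angle(hexagon, tilted), 1))
```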
Interaction of the Trp189 side chain with the Gly-rich loop is critical to the selectivity of QPDWG for galactose and to its discrimination against mannose. The loop is a rigid structure with a somewhat unusual conformation (Fig. 2, a and b). His192 tucks into the loop and stabilizes the structure by forming hydrogen bonds with a main chain amide and a carbonyl oxygen. Leu194 lies on the outside of the loop and packs against Ala216, thereby holding the loop down against the lower part of the protein (Fig. 2, a and b). The Cα of Gly191 packs against Trp189 and holds it in a slightly unfavorable χ2 rotamer (+60°). Modelling indicates that neither of the most favored χ2 rotamers of Trp (±90°) can be accommodated in the mutant protein, nor can other χ1 rotamers. The Gly-rich loop thus serves as a "doorstop" that prevents Trp189 from adopting a more favorable conformation. Mutagenesis data show that changes in many of the loop residues are tolerated with only small effects on galactose selectivity (13). This is consistent with the notion that the loop acts as a rigid unit that restricts the conformation of Trp189, rather than providing specific interactions with the sugar or with other residues of the protein. Mannose can be superimposed on QPDWG in the orientation observed in a Man6 oligosaccharide-MBP-A complex (8) (Fig. 1a). In that orientation, the exocyclic C6 clashes with Trp189 (Fig. 2c). Man-type ligands bind to the homologous MBP-C in an orientation reversed 180° from that shown in Fig. 1a, such that the positions of the 3- and 4-OH groups are exchanged (9). Preliminary data indicate that MBP-A can also bind monosaccharides in this manner.² In this orientation, the anomeric oxygen in the α configuration sterically clashes with Trp189. Thus, the position of Trp189 imposed by the Gly-rich loop excludes Man-type ligands from the site and explains the essential role of this loop in galactose selectivity.
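The χ2 angle of Trp is the torsion about the Cβ-Cγ bond, defined by the atoms CA, CB, CG, and CD1. The sketch below computes a signed torsion from four atomic positions using the IUPAC sign convention. With it, an observed value such as the +60° rotamer described above can be compared with the ±90° rotamers. The helper names are hypothetical. The coordinates in the usage lines are idealized values with unit bond lengths, not atoms from the QPDWG model.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, from -180 to 180."""
    b0 = _sub(p0, p1)
    axis = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(axis, axis))
    axis = [x / length for x in axis]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, axis), _dot(b2, axis)
    v = [b0[i] - d0 * axis[i] for i in range(3)]
    w = [b2[i] - d2 * axis[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


# Hypothetical idealized CA, CB, CG, CD1 positions giving chi2 = +60 degrees.
ca, cb, cg = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cd1 = (1.0, math.cos(math.radians(60)), math.sin(math.radians(60)))
chi2 = dihedral(ca, cb, cg, cd1)
print(round(chi2, 1), [round(angular_distance(chi2, r), 1) for r in (-90.0, 90.0)])
```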
No significant differences in the protein are observed between the unliganded and sugar complex structures. The sugar complexes were prepared using sugar concentrations approximately 100-fold in excess of the Kd (13), and the average temperature factors of the sugar and its liganding residues are quite similar. Thus, the sugars appear to fully occupy the binding sites. However, at the resolutions used in this study, temperature factor and occupancy are correlated, which precludes refinement of the sugar occupancy. In two of the three crystallographically independent copies (protomers 1 and 3; Table II), the glycine-rich loops have similar temperature factors in both the unliganded and complexed structures. The glycine-rich loop of protomer 2 has consistently higher temperature factors in each structure, most likely because it participates in relatively few lattice contacts. The average temperature factors of the loop in protomer 2 differ by approximately 25 Å² between the unliganded and complexed structures. Lattice contacts may immobilize the loop, so that any effect of sugar binding on loop mobility would be detectable only in the copy with no lattice contacts. However, the entire protomer 2 of the unliganded structure has significantly higher temperature factors than the equivalent protomer in the complexed structures (Table II). It therefore cannot be concluded that sugar binding significantly affects the mobility of the binding site region.
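The occupancy argument can be made quantitative for a simple 1:1 binding equilibrium, where the fractional occupancy is θ = [L]/([L] + Kd). This assumes two things:

- The free sugar concentration is essentially the total, because sugar is in large excess over binding sites.
- The crystal lattice does not alter the affinity.

At a 100-fold excess, θ = 100/101 ≈ 0.99. A minimal sketch follows; the function name is hypothetical.

```python
def fractional_occupancy(ligand, kd):
    """Fraction of sites occupied for simple 1:1 binding, theta = [L]/([L] + Kd).

    Assumes the free ligand concentration equals the total, which holds when
    the ligand is in large excess over binding sites.
    """
    return ligand / (ligand + kd)


# Only the ratio [L]/Kd matters, so express [L] in units of Kd.
for excess in (1, 10, 100):
    print(f"{excess:>3}-fold over Kd: {fractional_occupancy(excess, 1.0):.3f}")
```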
The QPDWG structures explain binding, mutagenesis, and spectroscopic data obtained from several galactose-binding mutants of MBP-A. Proton NMR spectra of βMeGal in the presence of QPDWG show upfield shifts of the H5, H6, and H6′ protons of Gal (13). These shifts are consistent with the interaction between these protons and the delocalized electron system of the Trp ring observed in the crystal structure. The line widths of the aromatic protons of Trp189 broaden upon Gal binding to QPDW, whereas in QPDWG they are broad both in the absence and in the presence of Gal. This is consistent with the notion that the Gly-rich loop immobilizes Trp189 in a position optimal for interaction with Gal (13). The proteoglycan core protein CRDs have Phe instead of Trp at position 189 and exhibit relatively poor selectivity against Man-type ligands. The corresponding MBP-A mutant QPDFG, which includes Phe189, binds Gal-type ligands only 6-fold more strongly than Man-type ligands, compared with the 40-fold selectivity for Gal-type ligands shown by QPDWG (13). These properties are explained by exclusion of Man by the six-membered portion of the Trp189 ring (Fig. 2c), which extends farther out than the side chain of Phe.
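The 6-fold (QPDFG) and 40-fold (QPDWG) preferences for Gal-type ligands can be expressed as free-energy differences, ΔΔG = RT ln(K_Gal/K_Man). The sketch below assumes 25 °C, because the assay temperature is not stated here. At that temperature, the two ratios correspond to roughly 4.4 and 9.1 kJ/mol. Replacing Phe with Trp at position 189 therefore adds about 4.7 kJ/mol of discrimination against Man in this background. The function name is hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def selectivity_ddg(fold, temperature=298.15):
    """Free-energy difference (kJ/mol) equivalent to a fold-preference in affinity."""
    return R * temperature * math.log(fold)


qpdfg = selectivity_ddg(6)    # QPDFG: 6-fold Gal over Man
qpdwg = selectivity_ddg(40)   # QPDWG: 40-fold Gal over Man
print(f"QPDFG {qpdfg:.1f}, QPDWG {qpdwg:.1f}, difference {qpdwg - qpdfg:.1f} kJ/mol")
```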
Several Gal-binding C-type lectins, including RHL-1, display a strong preference for GalNAc over Gal, whereas others do not discriminate between these two sugars. An example of the latter is the macrophage galactose receptor (MGR), and the QPDWG mutant of MBP-A mimics MGR in this respect. Site-directed mutagenesis of MGR, based on sequence comparisons with the asialoglycoprotein receptors, has identified residues in four regions of the sequence that provide selectivity for GalNAc over Gal (24). Of these, the residue at the position equivalent to Ser154 of MBP-A accounts for 20-fold of the 60-fold selectivity for GalNAc over Gal displayed by RHL-1 (24). Moreover, a histidine at the position equivalent to Thr202 of QPDWG is found in both RHL-1 and MGR. This histidine must be present for the enhancement provided by the residue at position 154 to be observed. The structure of QPDWG complexed with GalNAc shows that the 2-acetamido substituent lies in the vicinity of Thr202, which in turn lies near Ser154 (Fig. 3). This arrangement is consistent with the formation of a GalNAc-specific binding site by residues at these positions in RHL-1.
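Suppose the determinants of GalNAc selectivity are treated as independent, so that their fold-effects multiply and their free energies add. Under that assumption, a 20-fold contribution from position 154 out of a 60-fold total has two consequences:

- The remaining regions account for a 3-fold factor.
- Position 154 carries about 73% of the selectivity free energy (ln 20/ln 60).

Independence is only a bookkeeping assumption here. The requirement for His at the position equivalent to 202 shows that at least two of these sites are coupled. A minimal sketch follows; the function names are hypothetical.

```python
import math


def residual_fold(total_fold, component_fold):
    """Fold-selectivity left for other determinants if contributions multiply."""
    return total_fold / component_fold


def free_energy_fraction(total_fold, component_fold):
    """Share of the total selectivity free energy carried by one component."""
    return math.log(component_fold) / math.log(total_fold)


# RHL-1 GalNAc-over-Gal selectivity: 60-fold total, 20-fold from position 154.
print(residual_fold(60, 20), round(free_energy_fraction(60, 20), 2))
```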
² S. Park-Snyder and W. I. Weis, unpublished results.

TABLE III
Angle between galactose and aromatic amino acid side chain in galactose-binding lectins
The angle between the least squares plane through the pyranose ring of galactose and the plane of an aromatic side chain was computed for QPDWG and for one example of each galactose-binding lectin found in the Protein Data Bank (25). Similar angles are obtained if a least squares plane is computed using only the C3, C4, C5, and C6 atoms of galactose. The mean ± standard deviation was 32 ± 13°.

The present structures leave unclear how the Glu185→Gln/Asn187→Asp differences lead to specificity for Man- or Gal-type ligands. The residues in the binding sites of wild-type and mutant MBP-A superimpose closely, so it is not obvious why galactosides do not bind to wild-type MBPs in the orientation observed in the present structures. Indeed, free galactose binds to MBP-C through its 1- and 2-OH groups. This emphasizes the selectivity of the wild-type site for equatorial OH groups with the same stereochemical arrangement as the 3- and 4-OH groups of mannose (9). These OH groups are related by a 2-fold rotation axis that bisects the pyranose ring. They form hydrogen bonds with side chain carbonyl oxygen and amide nitrogen atoms, which conform approximately to this symmetry in the wild-type site but not in the QPD site (Fig. 4). Although the mechanism is not obvious, this difference in symmetry may be related to the weaker affinity of QPD for both Gal- and Man-type ligands (12). The absolute affinity of wild-type MBP-A for Man is similar to that of QPDW or QPDWG for Gal. This implies that the binding energy of Man to the wild-type Ca2+ site is greater than that of Gal to the QPD mutant site. The favorable interaction with the aromatic residue at position 189 can thus be viewed as compensating for the loss of symmetry in the mutant site, giving an affinity for Gal comparable with that of wild-type MBP-A for Man. In the absence of the glycine-rich loop, mannose is not excluded from the site. Instead, it binds with lower affinity because of the asymmetric arrangement of its hydrogen-bonding partners.

Another potential source of the different specificities of the wild-type and QPD sites is the displacement of ordered water molecules upon sugar binding. High resolution structures of MBP-C show that the 3- and 4-OH groups of Man-type ligands replace two water molecules that form the same set of hydrogen bonds and Ca2+ coordination bonds (9). Unfortunately, the amount of visible, ordered water structure in the uncomplexed QPDWG site varies among the three crystallographically independent copies, making it difficult to draw firm conclusions.
In the best-ordered site, two water molecules can be discerned that form hydrogen bonds with the Ca2+ site 2 ligands at positions 185, 187, 198, and 210, equivalent to those formed by Gal. These water molecules are in approximately the same positions as the 3- and 4-OH groups of Gal, but only one of them appears to be close enough to Ca2+ site 2 to form a coordination bond. Only one water molecule is observed in another copy, and no water molecules can be placed with confidence in the third site. Higher resolution structures of the uncomplexed QPDWG site will be required to assess whether the Ca2+ coordination number changes upon ligand binding.
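Whether a water molecule is close enough to Ca2+ site 2 to form a coordination bond is usually judged by a distance cutoff. Protein Ca2+-oxygen coordination distances are typically about 2.4 Å. The minimal sketch below lists atoms within a cutoff of the ion. The 2.8 Å cutoff, the atom labels, the coordinates, and the function name are all illustrative assumptions, not values from this study.

```python
import math


def coordinating_atoms(ion_xyz, atoms, cutoff=2.8):
    """List (label, distance) for atoms within `cutoff` Å of the ion, nearest first."""
    hits = [(label, math.dist(ion_xyz, xyz)) for label, xyz in atoms.items()]
    return sorted((h for h in hits if h[1] <= cutoff), key=lambda h: h[1])


# Hypothetical coordinates (Å) for an uncomplexed site with two ordered waters.
ca2 = (0.0, 0.0, 0.0)
site_atoms = {
    "water A": (2.45, 0.0, 0.0),
    "water B": (0.0, 3.30, 0.0),
    "ligand O (hypothetical)": (0.0, 0.0, 2.40),
}
for label, d in coordinating_atoms(ca2, site_atoms):
    print(f"{label}: {d:.2f} Å")
```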
The bound pyranose ring occupies different locations in the present structures and in the structures of wild-type MBPs complexed with Man-type ligands, and this difference is a consequence of Ca2+ coordination geometry. This observation, together with the fact that few other contacts are made with the protein, demonstrates the dominant role of Ca2+ coordination in sugar recognition by C-type lectins. The different pyranose ring locations dictated by Ca2+ coordination geometry form the basis of selective galactose recognition, through steric exclusion of Man-type ligands by Trp189 and the glycine-rich loop.
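The steric-exclusion argument rests on overlap between ligand atoms placed by superposition and atoms of Trp189. A common way to flag such overlaps is to compare each interatomic distance with the sum of the two atoms' van der Waals radii minus a tolerance. The sketch below makes the following assumptions:

- It uses Bondi radii.
- It applies a 0.4 Å tolerance.
- The atom labels and coordinates are hypothetical.

It illustrates the test only and is not the modelling procedure used in the study.

```python
import math

# Bondi van der Waals radii (Å); only the elements needed here.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52}


def find_clashes(ligand, protein, tolerance=0.4):
    """Return (ligand atom, protein atom, distance) for pairs that overlap.

    Each atom is (label, element, (x, y, z)). A pair is flagged when the
    distance is shorter than the sum of the radii minus `tolerance` Å.
    """
    clashes = []
    for lab_l, el_l, xyz_l in ligand:
        for lab_p, el_p, xyz_p in protein:
            limit = VDW_RADII[el_l] + VDW_RADII[el_p] - tolerance
            d = math.dist(xyz_l, xyz_p)
            if d < limit:
                clashes.append((lab_l, lab_p, round(d, 2)))
    return clashes


# Hypothetical superposed mannose C6 and two Trp189 ring atoms.
ligand = [("Man C6", "C", (0.0, 0.0, 0.0))]
protein = [("Trp189 CZ2", "C", (2.6, 0.0, 0.0)), ("Trp189 CE3", "C", (0.0, 4.0, 0.0))]
print(find_clashes(ligand, protein))
```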