X-ray Crystal Structure of the Human Galectin-3 Carbohydrate Recognition Domain at 2.1-Å Resolution*

Galectins are a family of lectins which share similar carbohydrate recognition domains (CRDs) and affinity for small β-galactosides, but which show significant differences in binding specificity for more complex glycoconjugates. We report here the x-ray crystal structure of the human galectin-3 CRD, in complex with lactose and N-acetyllactosamine, at 2.1-Å resolution. This structure represents the first example of a CRD determined from a galectin which does not show the canonical 2-fold symmetric dimer organization. Comparison with the published structures of galectins-1 and -2 provides an explanation for the differences in carbohydrate-binding specificity shown by galectin-3, and for the fact that it fails to form dimers by analogous CRD-CRD interactions.

Galectin-3 is a member of the galectin family of lectins defined by a conserved ~14-kDa carbohydrate recognition domain (CRD) showing affinity for β-galactosides (1,2). Abundantly expressed in a few cell types, such as macrophages and polarized epithelial cells in adults (2,3) and others during embryogenesis (4), it tends to be localized in the cytoplasm and the nucleus. Although functions for galectin-3 have been proposed in each of these subcellular locations (5-7), it is also secreted by a nonclassical pathway (8,9) and is found on the cell surface and in the extracellular matrix. There it binds and cross-links selected carbohydrate-containing ligands (10,11) and is thought to modulate cell adhesion (12-14) and cell signaling (15,16). Many groups are currently studying the roles and uses of galectin-3 in cancer, inflammation, host-pathogen interaction, and nerve injury, among others (17,18).
Intact galectin-3, but not its CRD alone, shows avidity for multivalent glycoconjugates (10,11), modulates cell adhesion (14), and induces intracellular signals (15). Thus it is thought that the N-terminal domain of galectin-3 promotes the formation of dimers or higher order oligomers, thereby permitting multivalent interactions essential for its biological activities.
We report here the x-ray crystal structure of the CRD of human galectin-3 in complex with Lac and LacNAc at 2.1-Å resolution. Previously we and others showed that galectin-1 (22,23) and galectin-2 (24) are 2-fold symmetric homodimers of the canonical 14-kDa CRD. We now show that, although the galectin-3 CRD is similar to that found in galectin-1 and galectin-2, it displays structural features which provide an explanation for the known differences in galectin-3 carbohydrate-binding specificity and mode of self-association.
Data Collection, Structure Determination, and Refinement-Five heavy atom derivatives were used to calculate the initial phases (see Table I). With the exception of dimethylmercury, the heavy atom compounds were dissolved in artificial mother liquor containing 32-35% PEG 6000, 100 mM Tris-HCl, pH 8.5, and 100-150 mM MgCl2, at concentrations of 1-3 mM, and soaked into crystals for 2-3 days. The dimethylmercury derivative was obtained through vapor diffusion. After mounting a crystal in a standard glass x-ray capillary, a small volume (~2 µl) of dimethylmercury was pipetted into the end of the capillary before sealing it with wax and epoxy. The crystal was then allowed to equilibrate for 48 h before data collection. Native and derivative data sets were collected at room temperature on a Siemens multiwire area detector with a conventional rotating anode x-ray source, and reduced using XDS (25). Heavy atom parameter refinement, phase calculations, and solvent flattening were performed using PHASES (26). An initial phase set was calculated using the isomorphous and anomalous components of the derivative data sets (Table I) in conjunction with solvent flattening. The resulting electron density map at 2.5-Å resolution was of very high quality, permitting an unambiguous chain tracing. The initial model was built using O (27). Model refinement was performed with X-PLOR (28) using the simulated annealing and conventional energy minimization protocols with F_obs > 1σ(F_obs) between 8.0- and 2.1-Å resolution. Restrained atomic temperature factors were refined using F_obs > 2σ(F_obs) between 5.0- and 2.1-Å resolution. Water molecules were selected based on difference electron density and hydrogen bond geometry and assigned an occupancy of 0.6 (29). All figures were made with SETOR (30) with the exception of Figs. 4 and 7, which were created with GRASP (31).
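The reflection cutoffs and the R factor/R free statistics used to monitor refinement can be illustrated with a short sketch. It is not the X-PLOR procedure itself. It assumes the conventional R factor definition, sum(|F_obs - F_calc|) / sum(F_obs), and a random 10% test set for R free. The function names, the dictionary layout of a reflection, and the toy amplitudes are all hypothetical.

```python
import random


def r_factor(reflections):
    """Conventional R factor: sum(|F_obs - F_calc|) / sum(F_obs)."""
    numerator = sum(abs(r["f_obs"] - r["f_calc"]) for r in reflections)
    denominator = sum(r["f_obs"] for r in reflections)
    return numerator / denominator


def apply_cutoffs(reflections, sigma_cutoff, d_min, d_max):
    """Keep reflections with F_obs > sigma_cutoff * sigma(F_obs)
    and a resolution d (in Angstrom) between d_min and d_max."""
    return [
        r for r in reflections
        if r["f_obs"] > sigma_cutoff * r["sigma"] and d_min <= r["d"] <= d_max
    ]


def split_work_free(reflections, free_fraction=0.10, seed=0):
    """Randomly set aside a fraction of reflections as the test (free) set."""
    rng = random.Random(seed)
    shuffled = list(reflections)
    rng.shuffle(shuffled)
    n_free = round(len(shuffled) * free_fraction)
    return shuffled[n_free:], shuffled[:n_free]


# Toy reflections (hypothetical values, not data from this structure).
toy = [
    {"f_obs": 100.0 + i, "f_calc": 95.0 + i, "sigma": 10.0, "d": 2.1 + 0.25 * i}
    for i in range(20)
]
toy.append({"f_obs": 5.0, "f_calc": 4.0, "sigma": 10.0, "d": 3.0})   # fails the sigma cutoff
toy.append({"f_obs": 80.0, "f_calc": 70.0, "sigma": 5.0, "d": 9.0})  # outside 8.0-2.1 Angstrom

selected = apply_cutoffs(toy, sigma_cutoff=1.0, d_min=2.1, d_max=8.0)
work, free = split_work_free(selected)
print(f"working set: {len(work)} reflections, R factor = {r_factor(work):.3f}")
print(f"test set:    {len(free)} reflections, R free   = {r_factor(free):.3f}")
```

In an actual refinement the test set is chosen once and excluded from every refinement cycle, so that R free remains an unbiased check on the model.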

RESULTS AND DISCUSSION
Structure Description-The structure of galectin-3-C has been determined in the presence of both Lac and LacNAc, and in both cases refined at 2.1-Å resolution with good geometry (Table I). Analysis using PROCHECK (32) shows that all non-proline and non-glycine residues are found in the most favored or additionally allowed regions of the Ramachandran plot. In both complexes, Leu-114 is the first residue for which electron density is observed, and hence, the first 6 residues are presumed to be disordered. Well defined electron density is observed for all other residues, including the C-terminal residue Ile-250. The two complexes are very similar to each other, and all further reference to the structure will pertain to the LacNAc complex unless otherwise indicated.
In the canonical dimeric galectins-1 and -2, β-strands F1 and S1 from each monomer extend the antiparallel β-strand interactions across the 2-fold symmetric dimer interface (S1-S1′ and F1-F1′ in Fig. 1), whereas the S1 and F1 β-strands of galectin-3-C form a solvent-exposed surface, as discussed in detail below.
Primary Carbohydrate Binding Site-As shown in Fig. 1, the Lac/LacNAc binding site is formed by β-strands S4-S6a/S6b. With S3 these β-strands define a carbohydrate-binding cassette, encoded by a single DNA exon (2,24), which is evolutionarily conserved among members of the galectin family. The amino acids making direct interaction with the bound carbohydrate are highly conserved among all galectins sequenced to date and are contained on these β-strands. The galactose moiety of Lac/LacNAc is most deeply buried in the binding site (Fig. 2); 166 Å² of its total 230-Å² surface area is buried by the protein. Its C-4 hydroxyl group plays a central role in binding, likely accepting hydrogen bonds from the highly conserved residues His-158 and Arg-162, while donating hydrogen bonds to Asn-160 and W1 (Table II). The galactose C-6 hydroxyl group also displays this cooperative hydrogen bonding pattern (33), interacting with Glu-184, Asn-174, and W3. The planar C-3, C-4, C-5, and C-6 carbon atoms of the galactose moiety are in van der Waals contact with the aromatic side chain of Trp-181 in a fashion similar to that seen in a number of other galactose and lactose binding lectins (34). The N-acetylglucosamine (GlcNAc) moiety is more solvent-exposed, with only 91 Å² buried by the protein. Only its C-3 hydroxyl group, which hydrogen bonds to Glu-184 and Arg-162, makes direct hydrogen bonds with the protein (Fig. 2 and Table II). The only other contacts involving the GlcNAc moiety are mediated through its N-acetyl group; the amide proton is hydrogen bonded through water (W2) to Glu-165, and the methyl group makes a van der Waals contact with the guanidino head group of Arg-186 (~20-Å² buried surface). Although there is a water molecule in an analogous position in the lactose complex, which hydrogen bonds to the lactose O-2 hydroxyl group, the hydrogen bond distance is much greater than that found in the LacNAc complex (Table II). The van der Waals interaction and the strength of the hydrogen bond involving the 2 position of the glucose/N-acetylglucosamine moiety represent the only significant differences between the Lac and LacNAc complexes, and presumably account for the approximately 5-fold higher binding affinity of human galectin-3 for N-acetyllactosamine over lactose (20). Interestingly, galectin-1 also shows a water-mediated hydrogen bond involving the NH of the GlcNAc moiety (22), even though there is a difference in the way in which the water molecule is hydrogen bonded to the protein. Van der Waals interactions with the N-acetyl group are also important, and like galectin-3 it shows higher affinity for LacNAc over lactose (20,21,35).
Table I footnotes: (b) Mean figure of merit after solvent flattening is indicated in parentheses. [...] where F_o is the observed structure factor amplitude and F_c is that calculated from the refined model. (d) R_free = R factor calculated using 10% of the unique reflections randomly selected and excluded from the refinement. (e) r.m.s., root mean square deviation from ideality.
The bound galactose and N-acetylglucosamine moieties are very well defined, with average temperature factors of 21 and 32 Å², respectively. The φ/ψ values for the β(1,4)-glycosidic linkage of Lac and LacNAc, respectively, are −70°/−103° and −68°/−103°, close to the calculated minima for the saccharides in solution (36).
Extended Ligand Binding Site-Examination of the solvent-exposed surfaces of galectin-3-C shows that the Lac/LacNAc binding site, formed by β-strands S4-S6, is a cleft, open at both ends (see Fig. 3). At the nonreducing end (galactose) of the bound carbohydrate the cleft is extended by residues on β-strands S1-S3, whereas at the reducing (GlcNAc) end it is open to the surrounding solution. The carbohydrate binding site is similar in galectin-1 and galectin-2, consistent with the demonstrated ability of galectins-1 and -3 to bind longer oligosaccharides such as polylactosaminoglycans (21,37-39). These lectins, however, do show differences in affinity for longer oligosaccharides, particularly those substituted on the O-3 of the nonreducing galactose moiety (21,38). Galectin-3, for example, binds GalNAcα1-3(Fucα1-2)Galβ1-4Glc with almost 100-fold higher affinity than does galectin-1 (20,21). The α-linked GalNAc moiety would be expected to interact with residues in the extended cleft formed by β-strands S1-S3. As shown in Fig. 4, although the identity and conformation of residues involved in binding the Lac/LacNAc moiety in the primary binding site of galectin-1 and galectin-3 are very similar, they differ in the vicinity of the galactose O-3. Galectin-3 has an arginine residue at position 144 which is well positioned to interact with the GalNAc moiety or other saccharide residues linked to the galactose O-3 (Fig. 4). The serine found in galectin-1 is presumably unable to make similar interactions. In addition, the bulky leucine residue at position 31 in galectin-1 is reduced to an alanine (Ala-146) in galectin-3, creating more space for O-3 substituents (Fig. 4).
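The φ/ψ values quoted earlier for the β(1,4)-glycosidic linkage are torsion (dihedral) angles, each defined by four consecutive atoms. The sketch below shows one common way to compute such an angle from Cartesian coordinates. The function name and the test coordinates are hypothetical, and which four atoms define φ and ψ is not stated in the text, so that choice is left to the reader.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180].

    Looking from p2 toward p3, the angle is positive when the p1-p2 bond
    must rotate clockwise to eclipse the p3-p4 bond.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric check with made-up points (not atoms from the structure).
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the appropriate atoms of the refined Lac and LacNAc models, a function of this kind would return the φ and ψ values that are compared with the calculated solution minima.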
Possible Site of Interaction with RNA-Mouse galectin-3 (formerly known as CBP35) has been shown to be part of the heterogeneous nuclear ribonucleoprotein complex (40,41) and may be involved in pre-mRNA splicing in vitro (5). In addition, human galectin-3 has recently been shown to directly bind RNA fragments in a gel-shift assay (42). Electrostatic potential calculations (see Fig. 3) show that the carbohydrate binding cleft of galectin-3 is flanked by a linear array of three positively charged arginine residues (Arg-186, Arg-162, and Arg-144). These residues are spaced approximately 5.3 and 7.9 Å apart, close to the phosphate repeat distance in RNA, leading to the possibility that the carbohydrate binding cleft may also be the RNA binding site. Consistent with this suggestion is the fact that the RNA splicing assay is inhibited by soluble oligosaccharides in a rank order reflecting their affinity for galectin-3 (5), even though the interaction of galectin-3 with heterogeneous nuclear ribonucleoprotein particles (41) and RNA fragments (42) appears not to be inhibited by lactose.
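The spacing between successive arginine residues is an ordinary interatomic distance and could be measured from model coordinates as sketched below. The coordinates are placeholders laid out on a line to reproduce the 5.3 and 7.9 Å spacings quoted above. They are not taken from the structure, and the text does not say which atom of each arginine was measured, so a single representative point per residue is an assumption.

```python
import math


def consecutive_distances(points):
    """Distances between successive (x, y, z) points, in the units of the input."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


# Placeholder positions in Angstrom (hypothetical, one point per residue).
arg_positions = {
    "Arg-186": (0.0, 0.0, 0.0),
    "Arg-162": (5.3, 0.0, 0.0),
    "Arg-144": (13.2, 0.0, 0.0),
}

names = list(arg_positions)
spacings = consecutive_distances(list(arg_positions.values()))
for first, second, distance in zip(names, names[1:], spacings):
    print(f"{first} to {second}: {distance:.1f} Angstrom")
```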
The Homologous Galectin-1/Galectin-2 Dimer Interface-A striking feature of galectin-3-C is that, in contrast with galectins-1 and -2, it is found to be monomeric in solution at protein concentrations of up to approximately 0.1 mM (11). For this reason it was of particular interest to examine the region in the galectin-3 CRD corresponding to the canonical galectin-1 and -2 dimer interface. As shown in Figs. 5 and 6, the galectin-1 dimer interface is a very apolar surface composed of residues Leu-4, Ala-6, Leu-9, Phe-133, Val-131, and Ile-128. In the galectin-3 CRD, the apolar nature of this interface has largely been eliminated by a reduction of the F1-S1 β-strand separation and the introduction of Tyr-118 and Tyr-247. In both cases the ring hydroxyl group of the Tyr residue points into solution, creating a much more polar exposed surface (Fig. 6). Furthermore, the canonical dimer interface is partially obstructed by residues Leu-114, Ile-115, and Val-116, the end of the galectin-3 N-terminal domain (see Fig. 5). In fact, Val-116 is the start of a conserved tripeptide sequence (V(L)PY) found in galectin-3, galectin-4, and many other galectins, but not found in galectins-1 and -2. It makes a cis-peptide linkage with the highly conserved Pro-117, which is in turn typically followed by a tyrosine residue, in this case Tyr-118, one of the two tyrosine residues responsible for the increased polarity of the canonical dimer interface. Taken together, it would appear that the region corresponding to the dimer interface in galectin-1 and galectin-2 does not serve a similar role in galectin-3.
Figure legend (fragment): The ring carbon atoms of the carbohydrate along with 29 Cα atoms selected from the core β-strands of the 6-stranded β-sheet were used in the superimposition of the two molecules. Gal and Nag label the galactose and N-acetylglucosamine residues, respectively. Residue labels include the amino acid single letter code, followed by the residue number in the order galectin-1/galectin-3. O3 and O4 label hydroxyl groups on the galactose moiety.
Although intact galectin-3 also migrates as a monomeric (30 kDa) protein by gel filtration at concentrations of up to approximately 0.1 mM, several lines of evidence suggest that it can form dimers or oligomers at higher concentrations, where it binds multivalent ligands better than monovalent ones. The latter properties appear to be largely dependent on the proline/glycine-rich N-terminal domain, although interactions with the CRD may also be important (10,11,43). In fact, analysis of the galectin-3 CRD (not shown) reveals an apolar patch in the face of the 5-stranded β-sheet which may provide a site for monomer-monomer interactions. Associations involving both the N- and C-terminal domains could easily be achieved by a parallel orientation of monomers, an arrangement seen in the collectins, soluble C-type lectins with an N-terminal collagenous triple helical oligomerization domain (44).
The CRDs of many lectin types have been found in different structural arrangements among members of their respective families. In fact, the precise structural arrangement of CRDs in multimeric lectins appears to be an important means of conferring both specificity and affinity on their interactions with multivalent carbohydrates (34). The fact that galectin-3 does not possess the canonical 2-fold symmetric dimer interface characteristic of galectins-1 and -2 suggests that the galectins are no exception. Moreover, the fact that monomeric galectin-3 is in equilibrium with higher order oligomers under appropriate conditions further suggests that in this case the valency is dynamic.