High Resolution Crystal Structures of Siglec-7

Sialic acid-binding immunoglobulin-like lectins (Siglecs) recognize sialylated glycoconjugates and play a role in cell-cell recognition. Siglec-7 is expressed on natural killer cells and displays unique ligand binding properties different from other members of the Siglec family. Here we describe the high resolution structures of the N-terminal V-set Ig-like domain of Siglec-7 in two crystal forms, at 1.75 and 1.9 Å. The latter crystal form reveals the full structure of this domain and allows us to speculate on the differential ligand binding properties displayed by members of the Siglec family. A fully ordered N-linked glycan is observed, tethered by tight interactions with symmetry-related protein molecules in the crystal. Comparison of the structure with that of sialoadhesin and a model of Siglec-9 shows that the unique preference of Siglec-7 for α(2,8)-linked disialic acid is likely to reside in the C-C′ loop, which is variable in the Siglec family. In the Siglec-7 structure, the ligand-binding pocket is occupied by a loop of a symmetry-related molecule, mimicking the interactions with sialic acid.

The Siglecs are a specialized subgroup of the Ig superfamily that share significant sequence similarity and the ability to recognize sialylated glycoconjugates (1). There are at least 11 bona fide members in humans and potentially 8 in mice, all of which are type 1 membrane proteins, containing an N-terminal sialic acid-binding V-set Ig domain and varying numbers of C2-set Ig domains (2). Apart from myelin-associated glycoprotein (MAG, Siglec-4), which is found uniquely in the nervous system, Siglecs are expressed predominantly in the hemopoietic and immune systems and appear to mediate both adhesive and signaling functions (3).
Siglecs can be divided into two subgroups based on sequence similarity in the extracellular and intracellular regions. Sialoadhesin (Siglec-1), CD22 (Siglec-2), and MAG constitute one subgroup, share ~25-30% sequence identity in the extracellular region, and have divergent cytoplasmic tails. The second subgroup is made up of the CD33-related Siglecs that include CD33 (Siglec-3) and the recently discovered human Siglecs 5-11 (reviewed in Ref. 3). These proteins share 50-80% sequence similarity and have two highly conserved tyrosine-based motifs in their cytoplasmic tails. In humans, the CD33-related Siglec genes are clustered on chromosome 19q13.3-4 and are separate from the genes encoding CD22, MAG (19q13.1), and sialoadhesin (20p). Recent studies using specific monoclonal antibodies have shown that the CD33-related Siglecs are expressed in a partially overlapping manner on all major leukocytes of the innate immune system, suggesting a role for these proteins in regulation of leukocyte function. This is further supported by the finding that human CD33-related Siglecs have two conserved immunoreceptor tyrosine-based inhibitory motif-like sequences, which in all cases studied can interact with the tyrosine phosphatases SHP-1 and SHP-2 following tyrosine phosphorylation (4-8). Recent studies of CD22 (Siglec-2) have also illustrated the importance of sialic acid recognition and specificity in triggering biological functions mediated by Siglecs (9-11).
To date, structural information on the Siglecs is limited to the V-set domain of sialoadhesin (Siglec-1) in complex with 3′-sialyllactose (12). A 1.8-Å crystal structure revealed the basis for recognition of the terminal sialic acid. This mode of recognition is likely to be common to all Siglecs. However, the structure did not give insights into the differential specificity for sialic acid linkages exhibited by the Siglecs. Furthermore, the low sequence similarity shared between sialoadhesin and the CD33-related Siglecs is an obstacle to interpreting mutagenesis data in a structural framework. It would therefore be desirable to obtain the structure of at least one member of the CD33-related subgroup, which could be used as a template for modeling other CD33-related Siglecs as well as for gaining molecular insights into sialic acid linkage recognition in the Siglec family. In this context, Siglec-7 is of particular interest since it is the major Siglec expressed by natural killer cells (13,14) and possesses a unique preference for α(2,8)-linked disialic acids over α(2,6)- and α(2,3)-linked sialic acids (15). When cross-linked at the cell surface in a redirected killing assay, Siglec-7 was capable of inhibiting the cytotoxicity toward cell targets and is therefore a potential natural killer cell inhibitory receptor (13). Similar to CD22 on B cells, glycan recognition by Siglec-7 is likely to be directly linked to its function in modulating the activation of natural killer cells.
Siglec-7 is closely related to Siglec-9 with 80% overall sequence identity (2,16). However, Siglec-9 cannot bind α(2,8)-linked disialic acids and prefers α(2,3)- and α(2,6)-linked terminal sialic acids (2,15,16). Using protein chimeras, this difference in sialic acid linkage preference was recently shown to reside in a 6-amino acid stretch within the C-C′ loop of Siglecs-7 and -9 (15). In the present study, we have obtained the first crystal structure of the Siglec-7 V-set domain, in two crystal forms refined to 1.75 and 1.9 Å. Comparison of Siglec-7 with a model of Siglec-9 shows that differences in residues in the tip of the C-C′ loop may explain the preference of Siglec-7 for α(2,8)-linked disialic acids. The Siglec-7 structure also reveals a fully ordered N-linked glycan and interesting crystallographic packing interactions within the ligand-binding site. The structure will provide a useful template for modeling other CD33-related Siglecs.

* This work was funded by Biotechnology and Biological Sciences Research Council Grant 94/B14010. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

MATERIALS AND METHODS
Molecular Cloning, Overexpression, and Purification-The Siglec-7 V-set domain was PCR-amplified from CD33-like HDPUW68 in the pSPORT vector (Invitrogen) (14) using the forward primer (5′-ACTTCTAGAGCACCTCCAACCCCAGATATG-3′) and reverse primer (5′-ACTGGATCCTTATGTCACGTTCACAGAGAGCTG-3′). The amplified gene was inserted into the pDEF mammalian expression vector via the XbaI and BamHI restriction sites (TCTAGA and GGATCC, respectively) introduced by the primers. The resulting plasmid was transfected into Chinese hamster ovary Lec1 (CHO Lec1) cells using FuGENE (Roche Pharmaceuticals). The CHO Lec1 cells stably expressing the Siglec-7 V-set domain were selected in the presence of hygromycin and cultivated in α-MEM containing 5% fetal calf serum and 1% penicillin/streptomycin mix (Invitrogen). Secreted protein was harvested and filtered through a 0.2-μm filter and then passed over an 8-ml anti-Siglec-7 polyclonal antibody column. This was prepared by coupling affinity-purified sheep anti-Siglec-7 IgG to cyanogen bromide-activated Sepharose CL-4B (Sigma) at 5 mg/ml. Protein was eluted from the antibody affinity column using 8-ml aliquots of 0.1 M glycine buffer, pH 2.5, followed by immediate neutralization with 10% 1.0 M Tris-HCl, pH 8.0. Resulting fractions were buffer-exchanged into 0.025 M Tris, pH 7.5, 0.1 M NaCl. Further purification was achieved using a Superdex 75 16/60 column on an ÄKTA Purifier system (Amersham Biosciences). Purity was assessed by SDS-PAGE and matrix-assisted laser desorption ionization time-of-flight mass spectrometry. The protein was concentrated for crystallization to 5 mg/ml using a Vivaspin 10-kDa cutoff spin concentrator (Vivascience).
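As an illustrative check that is not part of the original protocol, the following Python sketch locates the XbaI and BamHI recognition sequences in the two primers quoted above. The function name `find_sites` and the dictionary layout are hypothetical; the recognition sequences (TCTAGA for XbaI, GGATCC for BamHI) are the standard ones for these enzymes, and the forward primer is entered with its typesetting hyphen removed.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"XbaI": "TCTAGA", "BamHI": "GGATCC"}

# Primer sequences as quoted in the cloning paragraph (5' to 3'),
# with any line-break hyphen from typesetting removed.
FORWARD = "ACTTCTAGAGCACCTCCAACCCCAGATATG"
REVERSE = "ACTGGATCCTTATGTCACGTTCACAGAGAGCTG"


def find_sites(primer, sites=SITES):
    """Return {enzyme: 0-based index} for each recognition site present in primer."""
    hits = {}
    for enzyme, motif in sites.items():
        pos = primer.find(motif)
        if pos != -1:
            hits[enzyme] = pos
    return hits


if __name__ == "__main__":
    print("forward:", find_sites(FORWARD))
    print("reverse:", find_sites(REVERSE))
```

In both primers the site follows a three-nucleotide 5′ overhang (ACT), so each enzyme's recognition sequence starts at index 3.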
Crystallization and Data Collection-Crystallization experiments were carried out at 20°C using hanging drop vapor diffusion. Crystallization conditions were found using a sparse matrix sampling approach with Crystal Screens I and II from Hampton Research. A drop of 0.5 μl of protein (5 mg/ml in 25 mM Tris-HCl, pH 8.0) was mixed with 0.5 μl of reservoir solution and equilibrated against 250 μl of reservoir solution. Small crystals appeared after 2 days in Screen I conditions 9 (0.2 M ammonium acetate, 0.1 M trisodium citrate, pH 5.6, 30% polyethylene glycol 4000) and 28 (0.2 M sodium acetate trihydrate, 0.1 M sodium cacodylate, pH 6.5, 30% polyethylene glycol 8000). The initial crystals (crystal form 1, Sig7a) from condition 9 were cryoprotected using solution 9 from Cryo Screen (Hampton Research) and frozen in a nitrogen stream. Subsequently, a second crystal form (Sig7b) was found in conditions identical to those for Sig7a. These crystals were soaked in a cryoprotectant consisting of 30% polyethylene glycol monomethylether 5000, 0.1 M MES, pH 6.5, 0.2 M ammonium sulfate, and 5% 2-methyl-2,4-pentanediol and then frozen in a nitrogen stream.
X-ray data for Sig7a were collected at beamline ID14-EH2 European Synchrotron Radiation Facility (ESRF) on an ADSC Quantum4 CCD at a temperature of 100 K (see Table I). Data for Sig7b were collected at beamline ID29 (ESRF) on an ADSC Q210 CCD at a temperature of 100 K (see Table I). All data were processed and scaled with the HKL suite of programs (17).
Structure Determination and Refinement-The Sig7a structure was solved by molecular replacement with AMoRe (18) using the sialoadhesin structure (Protein Data Bank entry: 1QFO (12)) as a search model. A single solution was found with an R-factor of 0.505 (correlation coefficient = 0.337). The resulting model phases were used as input for Arp/wARP (19), which built 82 out of 127 residues. This was followed by refinement with CNS (20) interspersed with model building in O (21). Five percent of the data were set aside for calculation of Rfree (22). After several rounds of refinement, the model was incomplete. No electron density was observed for the B-C and the C-C′ loops (residues 53-57 and 68-74, respectively) and the N terminus (residues 18-23). This structure was used as a search model in AMoRe against 8-4-Å diffraction data collected on Sig7b. A single solution was found with an R-factor of 0.386 (correlation coefficient = 0.723). This was followed by refinement in CNS and manual model building with O. A well defined density for the complete glycan structure attached to Asn-105 was observed (see Fig. 1). In general, the Sig7b electron density maps, although calculated from lower resolution data, were of higher quality than those for Sig7a and allowed building of the flexible loops and extension of the N terminus, resulting in a model with a final R-factor of 0.210 (Rfree = 0.255, see Table I).
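For readers unfamiliar with the R-factor and Rfree values quoted above, the following is a minimal Python sketch of the conventional definition, R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, together with a random 5% test-set split of the kind used for Rfree. It is not the procedure implemented in CNS or AMoRe: the function names are hypothetical, the amplitudes are invented, and the scaling between observed and calculated amplitudes that refinement programs apply is omitted.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with no scaling applied."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the test (free) set.

    Returns (work_indices, free_indices). R computed on the free indices,
    which are never used in refinement, corresponds to Rfree.
    """
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


if __name__ == "__main__":
    # Toy structure-factor amplitudes, invented purely for illustration.
    f_obs = [100.0, 200.0, 300.0]
    f_calc = [90.0, 210.0, 330.0]
    print("R =", round(r_factor(f_obs, f_calc), 4))

    work, free = split_free_set(200)
    print(len(work), "working reflections,", len(free), "free reflections")
```

Because the free reflections are excluded from refinement, a large gap between R and Rfree is commonly read as a sign of overfitting; the values reported here for Sig7b (0.210 and 0.255) differ by 0.045.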
A model of Siglec-9 was produced by homology modeling with WHAT IF (23) using the Sig7b structure as a template. A model of the Siglec-7/9 interaction with disialic acid was produced through superposition of the sialoadhesin-sialyllactose structure (Protein Data Bank entry 1QFO (12)) onto the Siglec-7/9 structures followed by superposition of disialic acid (from Protein Data Bank entry 1FV2 (24)) using the coordinates for the first sialic acid.

RESULTS AND DISCUSSION
Overall Structure-The structure of the Siglec-7 V-set ligand-binding domain was solved by molecular replacement in two crystal forms (Sig7a and Sig7b) and refined to 1.75 Å (R = 0.223, Rfree = 0.249) and 1.9 Å (R = 0.210, Rfree = 0.255) resolution, respectively (Table I). The Sig7a structure is incomplete, with loops B-C, C-C′, and the 6 N-terminal residues missing. The Sig7b structure is complete from residue 18 to 144 and includes a fully ordered N-linked glycan with a truncated structure as defined by the CHO Lec1 cell line used (discussed below). All analyses described here are based on the Sig7b structure, although all figures shown here include the side chain conformation of Arg-124 (a key ligand-binding residue) as seen in the Sig7a structure. The Sig7b Ramachandran plot calculated with PROCHECK (25) shows that 98.3% of nonglycine residues are in the most favorable and additionally allowed regions. Two residues, Gln-19 and Asp-93, fall in disallowed regions; both lie in disordered regions of the map.
CHO Lec1 mammalian cells cannot synthesize complex oligosaccharides (26) due to a point mutation in N-acetylglucosaminyltransferase I (GnTI), resulting in homogeneous glycosylation with the structure Man5-GlcNAc2 (27). This is advantageous for protein crystallography studies, which can be hampered by glycan heterogeneity. The crystal structure of Sig7b reveals a complete Man5-GlcNAc2 glycan N-linked to Asn-105 that is well defined in the electron density maps (Fig. 1). This high degree of order can be attributed to crystal packing. There are 3 hydrogen bonds between the glycan and the protein and an additional 16 hydrogen bonds between the glycan and symmetry-related protein molecules (Fig. 1). Thus, a total of 19 hydrogen bonds tether the glycan within the crystal lattice.
Comparison with Sialoadhesin-Siglec-7, given its higher sequence identity with other members of the Siglec family as compared with sialoadhesin (Fig. 2), may provide a more suitable template for interpretation of structure-function relationships in the whole Siglec family. Despite the low sequence identity of Siglec-7 with sialoadhesin (Fig. 2), the structures share many common features and superpose with a root mean square deviation of 1.04 Å using 100 Cα atoms (Fig. 3). Siglec-7 shows an Ig-like fold based on a β-sandwich formed by two β-sheets consisting of strands A′GFCC′ and ABED, respectively. The G strand is split (forming G and G′), as is the A strand (to give A and A′). As with sialoadhesin, an intra-sheet disulfide is present (between Cys-46 and Cys-106), replacing the inter-sheet disulfide more commonly observed in other Ig-like folds (28). The cysteine on strand F is replaced by Phe-123, which lies 10 Å from Cys-46, farther than the 6-8-Å distance observed in other Ig domains (28). This results in a widening of the cleft between the sheets, exposing residues that may interact with ligand. The largest differences between the sialoadhesin and Siglec-7 structures occur in the flexible loops on the outside of the protein, namely the B-C and C-C′ loops, the latter of which is different in length (Figs. 2 and 3). The C-C′ loops of both Siglec structures extend away from the main body of the protein (Fig. 3).
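The root mean square deviation quoted above is the square root of the mean squared distance between equivalent Cα atoms after superposition. The Python sketch below shows only that final calculation and assumes the two coordinate sets are already superposed and paired residue by residue; the optimal-superposition step itself is not shown. The function name and toy coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the coordinates are already superposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two toy "C-alpha traces" (in Å), invented for illustration only.
    trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    trace_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(rmsd(trace_a, trace_b))
```

In the toy example every atom pair is displaced by exactly 1 Å along z, so the function returns 1.0.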
The Ligand-binding Site-Despite repeated attempts with a range of ligands, either by soaking or by co-crystallization, a crystal of a Siglec-7-ligand complex could not be obtained.
Although the structure of Sig7b was solved without a ligand in the binding site, comparison with the 3′-sialyllactose-bound sialoadhesin structure (12) provides a model of the interactions that may occur and identifies the residues essential for specificity. The ligand-binding site lies between strands A and G (Figs. 3 and 4). As discussed previously, the partial opening of the β-sandwich, caused by the absence of the inter-sheet disulfide, provides a large flat surface onto which the ligand binds in the sialoadhesin structure (12). Compared with sialoadhesin, this surface has a more basic character in Sig7b, resulting from the additional basic residues Arg-23, Arg-120, and Lys-135 (Fig. 3). The additional positive charge presented by the binding face of Siglec-7 may provide a further docking site for a negatively charged sugar, possibly explaining the Siglec-7 preference for disialylated ligands.
Structural Basis for Differential Specificity-Comparison of the sialic acid-binding sites of Siglec-7 and sialoadhesin shows that many of the residues important in the interaction with sialic acid are conserved (Figs. 2 and 4). An arginine (Arg-124), conserved in all Siglecs, interacts with the carboxyl group on the terminal sialic acid sugar (Fig. 4). Trp-132 provides a hydrophobic interaction with the glycerol moiety. Protein backbone hydrogen bonding with the glycerol and N-acetyl groups, as seen in the sialoadhesin complex, is also possible in the Siglec-7-binding site. However, there are some key differences. Sialoadhesin Trp-2 forms a hydrophobic contact with the N-acetyl methyl group. This tryptophan is replaced by Tyr-26 in Siglec-7, which could potentially hydrogen-bond with the N-acetyl carbonyl. In Siglec-9, this residue is absent. In sialoadhesin, mutation of Trp-2 revealed that the hydrophobic interaction is important for N-acetylneuraminic acid interactions (12). This could also explain why sialoadhesin cannot interact with N-glycolylneuraminic acid since the additional oxygen atom in the latter form of sialic acid would be expected to result in a steric clash (Fig. 4). Apart from sialoadhesin and MAG, all other Siglecs examined so far, including hCD22, hCD33, and Siglec-6 (29), can bind both N-acetyl- and N-glycolylneuraminic acid, suggesting that the hydrophobic contact with the N-acetylneuraminic acid may not be required for sialic acid recognition in all cases. This could also explain why Siglec-9 mediates robust sialic acid binding in the absence of an equivalent aromatic residue (2,15,16).
An important aspect of Siglec function is the differential linkage specificity displayed by members of this family. This issue was recently addressed in a domain-swapping experiment between Siglec-7 and -9 (15). Surprisingly, this experiment showed that a single stretch of 6 amino acids in the tip of the C-C′ loop could confer Siglec-9-like binding specificity on Siglec-7. The comparison of sialoadhesin with Siglec-7 shows that in the latter, the C-C′ loop is longer and extends farther away from the body of the protein, offering the potential for more interactions with additional sugars (Figs. 3 and 4). In particular, the absence of an equivalent of sialoadhesin Tyr-44 in Siglec-7 creates a larger cavity and exposes the basic Lys-75 (Fig. 4). Sequence alignment shows that the C-C′ loop is variable in the Siglec family, and it is thus possible that interactions with specific side chains that extend toward the binding site for the second sugar are responsible for binding selectivity. For Siglec-7, these residues are represented by Asn-70, Ile-72, and Lys-75, whereas the equivalent residues in Siglec-9 are Ala-66, Thr-68, and Asp-71. Thus, from a structural point of view, it is possible that the linkage specificity displayed by Siglecs is due to the interaction of side chains in the C-C′ loop with subterminal sugars.
A Symmetry-related Loop Occupies the Binding Pocket-Although Siglec-7 was crystallized in the absence of any carbohydrate ligands, the binding pocket is not empty. Expansion of the space group symmetry shows that the pocket is occupied by the A′-B loop and C terminus of a symmetry-related protein molecule (Fig. 5), burying a total surface area of 200 Å². Comparison with sialic acid binding in sialoadhesin shows that the amide of Gln-37 mimics the interaction with the sialic acid glycerol side chain (Fig. 5). Met-40 has hydrophobic interactions with the Cβ of Lys-131 and Cα of Trp-132. In addition, there are several water-mediated hydrogen bonds (Fig. 5). Although the interactions with the symmetry-related loop are not extensive, this is one of the few examples of a peptide occupying a carbohydrate-binding pocket (30) and perhaps represents a first step toward a peptide-based Siglec-7 inhibitor. Interesting examples of such peptide inhibitors have been described recently for family 18 chitinases (31) and concanavalin A (32).
Conclusions-The structure of the Siglec-7 sialic acid-binding domain is the first such report on a member of the CD33-related Siglec subset. Our results support previous predictions that sialic acid recognition by Siglecs is based on a common template and provide insights into the molecular basis of sialic acid linkage specificity exhibited by different Siglec family members. The structure presented here could be used as a template for the design of specific inhibitors, which could be used to dissect the precise role of CD33-related Siglecs in the regulation of leukocyte activation. Future studies will be aimed at elucidating the precise structural basis for sialic acid recognition by this family of mammalian lectins.