J. Biol. Chem., Vol. 280, Issue 18, 18189-18201, May 6, 2005








From the Medical Research Council Immunochemistry Unit and the Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom, and the Department of Biochemistry and Molecular Biology, Oklahoma Center for Medical Glycobiology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73104
Received for publication, December 21, 2004, and in revised form, February 7, 2005.
| ABSTRACT |
|---|
stacking interactions with sequential rings in the sugar. We have used this observation to construct a model of a protein·HA complex, which was then tested against existing experimental information and by acquisition of new NMR data sets of [13C, 15N]HA (8-mer) complexed with unlabeled protein. A major finding of this analysis was that the acetamido side chains of two GlcNAc rings fit into hydrophobic pockets on either side of the adjacent tyrosines, providing a mechanism for selecting HA over other polysaccharides. Furthermore, two basic residues have a separation that matches that of glucuronic acids in the sugar, consistent with the formation of salt bridges; NMR experiments at a range of pH values identified protein groups that titrate due to their proximity to a free carboxylate in HA. Sequence alignment and construction of homology models for all human Link modules in their HA-bound states revealed that many of these features are conserved across the superfamily, thus allowing the prediction of functionally important residues. In the case of cartilage link protein, its two Link modules were docked together (using bound HA as a guide), identifying hydrophobic residues likely to form an inter-Link module interface as well as amino acids that could be involved in supporting intermolecular interactions between link proteins and chondroitin sulfate proteoglycans. Here, we propose a mechanism for ternary complex formation that generates higher order helical structures, as may exist in cartilage aggregates.

| INTRODUCTION |
|---|
α/β-structural domain of ~100 amino acids termed a Link module (8–10). The superfamily of Link module-containing proteins (which currently has 14 members in humans) has been divided into three subgroups (Types A–C) according to the known (or predicted) size of their HA-binding domains (HABDs) (6, 7). Type A domains are formed from a single independently folding Link module and are typified by TSG-6 (the protein product of tumor necrosis factor-stimulated gene-6 (11, 12)); as described in detail below, the Link module from human TSG-6 has been extensively characterized (7, 9). Type B HABDs also consist of a single Link module, but the β-sheet structure is extended by N- and C-terminal flanking sequences that are necessary for correct folding and functional activity (10). The best characterized of these is the ~150-amino acid HABD from human CD44 (the major cell-surface receptor for HA), for which the tertiary structure has recently been determined (10); Lyve-1 (13), found on lymph vessel endothelium, is also likely to have this kind of HABD. Type C domains are composed of a contiguous pair of Link modules, and HABDs belonging to this subgroup have been described in four related link proteins (HAPLN1, HAPLN2, HAPLN3, and HAPLN4 (14)) and in the G1-domains of the chondroitin sulfate proteoglycans (CSPGs) aggrecan, versican, neurocan, and brevican (7); in some cases, an additional N-terminal Ig domain may be required to stabilize the Link module folds (15).
TSG-6 is a highly conserved ~35-kDa secreted protein composed mostly of a Link module and a CUB module (11, 12). It is not constitutively produced in healthy adult tissues, but its expression is induced in a wide variety of cell types in response to inflammatory mediators and growth factors, being associated, for example, with arthritis and blood vessel injury (16, 17). The precise roles of TSG-6 in inflammation are still unclear, although recent studies in mouse models of arthritis have revealed that its potent anti-inflammatory effects protect against cartilage matrix destruction (18–20). TSG-6 is also produced during ovulation (21, 22) and cervical ripening (23), processes that have many characteristics in common with inflammation. In this regard, TSG-6 has been shown to have a critical role in murine ovulation via its interactions with inter-α-inhibitor (24, 25) and pentraxin-3 (26), leading to stabilization of the HA-rich extracellular matrix that forms around the oocyte and that is required for fertilization in vivo (27).
Considerable progress has been made in our understanding of hyaladherins with the recent determination of high resolution structures for the Type A and B HABDs of TSG-6 (9) and CD44 (10), respectively. Comparison of the Link module structures for TSG-6 and CD44 reveals that their backbones overlay extremely well, with both the positions and relative orientations of the secondary elements being highly conserved. These comparisons now allow the structural role for each conserved residue to be understood in detail and provide a firm basis for generating accurate sequence alignments and homology models of the other superfamily members. This is particularly important in the case of the Type C HABDs, for which no tertiary structures are yet available.
The three-dimensional structure of the TSG-6 Link module (termed Link_TSG6) has been determined in both its free and HA-bound conformations (9) using identical NMR methodology; in the latter state, the structure determination was performed in the presence of an HA octasaccharide (HA8), which has been shown to be the shortest length of testicular hyaluronidase-derived oligomer that binds optimally. Analysis of the free and bound structures revealed that, upon interaction with HA, conformational changes (e.g. in a loop between strands β4 and β5) and subtle side chain rearrangements result in the opening of a shallow groove on the protein surface. This groove contains the five key HA-binding residues identified previously by site-directed mutagenesis (i.e. Lys11, Tyr12, Tyr59, Phe70, and Tyr78) (28) and Arg81, which is believed to make a salt bridge with the bound HA8 molecule on the basis of our NMR studies (9). In this regard, isothermal titration calorimetry (ITC) experiments performed over a range of NaCl concentrations indicate that Link_TSG6 makes between one and two salt bridges with HA8 (29), with the available mutagenesis data indicating that Lys11 and Arg81 are the best candidates for these interactions (9). Furthermore, the polarity and register of the HA within the binding groove were determined from the analysis of discrete changes in chemical shift perturbation caused by different lengths of HA oligomer, and a simple model showed that the groove had dimensions (~20 Å long, ~10 Å wide, and ~7 Å deep) suitable for accommodating an octasaccharide (9). However, this detailed information is insufficient to determine the exact position or conformation of an HA8 molecule within the binding groove, i.e. it does not allow de novo determination of a Link_TSG6·HA8 structure. The identification of intermolecular nuclear Overhauser effect (NOE) restraints and residual dipolar couplings in the bound sugar would clearly help overcome this problem, but collection of these data requires at least partial assignment of the hydrogen nuclei in the bound HA; unambiguous assignment of a repetitive molecule of this kind is extremely difficult, having only been fully accomplished to date for a free HA tetrasaccharide (30). In addition, the high interaction affinity of this system (5 × 10⁶ M⁻¹) (9) makes the use of transferred NOE experiments to determine the conformation of the bound ligand problematic (31–33). Because our attempts to crystallize the Link_TSG6·HA8 complex have so far been unsuccessful, a modeling approach with a firm experimental basis (33) represents a viable alternative means of providing new insights into the conformation of the HA molecule within the Link module ligand-binding groove.
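To put the quoted affinity in more familiar terms, the short sketch below converts the association constant of 5 × 10⁶ M⁻¹ into a dissociation constant and a standard binding free energy. The temperature of 25 °C is an assumption made here for illustration (it is the temperature quoted for the NMR experiments, not necessarily that of the affinity measurement), and the function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def dissociation_constant(k_a):
    """Return Kd (M) for a 1:1 interaction with association constant k_a (M^-1)."""
    return 1.0 / k_a


def binding_free_energy(k_a, temperature_k=298.15):
    """Return the standard free energy of binding in kJ/mol (dG = -RT ln Ka).

    temperature_k defaults to 25 degrees C, an assumption for this sketch.
    """
    return -R * temperature_k * math.log(k_a) / 1000.0


K_A = 5e6  # association constant quoted in the text, M^-1

print(f"Kd = {dissociation_constant(K_A) * 1e9:.0f} nM")
print(f"dG = {binding_free_energy(K_A):.1f} kJ/mol")
```

A dissociation constant in the sub-micromolar range is consistent with the statement that transferred NOE experiments, which work best for weakly bound ligands in fast exchange, are problematic for this system.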
The HA-binding groove of Link_TSG6 is noticeably rich in aromatic residues (Tyr12, Tyr59, Phe70, Tyr78, and Trp88), making it likely that stacking interactions of saccharide rings against aromatic planes (CH-π stacking) have an important role in the association with HA (9). In particular, stacking interactions between sugar rings and two tyrosines (Tyr59 and Tyr78) have been implicated because these amino acids both change conformation and acquire distinct H chemical shifts upon binding to HA8 (6.26 and 6.16 ppm in Tyr59, and 6.35 and 5.91 ppm in Tyr78); observation of distinct H shifts results from tight, asymmetric interactions against the aromatic plane of the tyrosine ring, since these rings typically have a fast "flipping" motion, leading to averaging of their chemical shift values. The ring hydroxyl group of Tyr78 also appears to make a hydrogen bond with the sugar because a slowly exchanging Hη proton was observed in the HA-bound protein that was not seen in the absence of ligand (9). In the family of 20 structures determined for Link_TSG6 in its HA-bound state (Protein Data Bank code 1o7c), the conformations of Tyr59 and Tyr78 are extremely well defined (by 50 and 30 NOEs, respectively), as evidenced by the very low root mean square deviation seen over all their heavy atoms (0.19 Å). Of particular note is the relative orientation and separation of the aromatic planes of these two residues: they lie on the same axis, with a slight twist between them, and are ~6 Å apart. Given that adjacent sugar rings in HA have an average separation of ~5–5.5 Å (34), it is reasonable to suggest that Tyr59 and Tyr78 stack with sequential saccharides in the complex. This raises the possibility of modeling an HA molecule into the open groove of Link_TSG6 (i.e. in its HA-bound conformation) based on the optimization of these sequential stacking interactions.
Here, we present a model of the Link_TSG6·HA8 complex generated by building HA8 into the Link module binding groove using the bound conformation of the protein, the polarity of HA, and potential stacking interactions. This model was then tested against previously acquired experimental information (the expected register and salt bridges) and newly collected NMR data. This modeling process has revealed the principal features underlying the specificity of HA recognition by the TSG-6 Link module. These features were further examined in light of new homology models generated for the bound conformers of other members of the human Link module superfamily and used to construct an HA-bound model of the Type C HABD from cartilage link protein (HAPLN1). This model reveals the residues most likely to be involved in stabilizing the inter-Link module interface, which are highly conserved across the HAPLN/CSPG subfamily. This has important implications for the condensation of these proteins along an individual HA chain to form ternary complexes and higher order structures.
| MATERIALS AND METHODS |
|---|
(β1→3)- and (β1→4)-linked disaccharides of HA (i.e. GlcA(β1→3)GlcNAc and GlcNAc(β1→4)GlcA, respectively) were placed in the binding groove of each protein structure in the orientation established previously (i.e. the reducing terminus over Tyr78) (9) with the planes of the carbohydrate rings held above the planes of the side chains of Tyr59 and Tyr78 at the distance expected for stacking interactions (3.5 Å). A grid search was performed over all possible orientations and glycosidic angle conformations (10° increments in φ and ψ, giving 1296 combinations) for each disaccharide. Models with low van der Waals energies (Lennard-Jones potential) were selected for both types of disaccharide, i.e. those with coplanar aromatic and saccharide rings at both Tyr59 and Tyr78. Then, with the saccharide rings held in position, the carbohydrate acetamido, carboxylate, and hydroxyl side chains were allowed to relax into conformations complementary to the protein surface by minimization of the van der Waals energy (1000 steps of steepest descents). Next, an individual saccharide ring was added to the nonreducing terminus of each disaccharide and, with the coordinates of the disaccharide fixed, allowed to explore all orientations and glycosidic angles. The lowest potential energy models (assessed by the Lennard-Jones potential) were kept, and then the whole of each trisaccharide was subjected to energy minimization (1000 steps). Another saccharide ring was added to the nonreducing terminus and allowed to search conformational space as described above; in the case of the (β1→3) models (where a GlcA was added), the GlcA carboxylate group was weakly restrained to be within 4 Å of the end of the Lys11 side chain (Nζ) to allow for the expected salt bridge (9, 28, 29); no electrostatic forces were included in this step. At this point, the nonreducing terminal GlcNAc ring of the tetrasaccharide formed from the stacked (β1→4)-linked disaccharide (i.e. GlcNAc-GlcA-GlcNAc-GlcA) was found to prefer orientations that sharply exited the binding groove, and it was therefore not possible to continue with these models (see "Results"). In contrast, the nonreducing terminal GlcA ring of the tetrasaccharide formed from the stacked (β1→3)-linked disaccharide (i.e. GlcA-GlcNAc-GlcA-GlcNAc) remained within the groove, allowing this line of modeling to be pursued further. For this tetrasaccharide, the addition of the next ring (GlcNAc) to the nonreducing terminus was performed as described above. The resulting low energy models now completely filled the binding groove, and the sugar was extended to an octamer by the addition of a monosaccharide to the nonreducing terminus and a disaccharide to the reducing terminus with the solution φH and ψH glycosidic conformations (φH 60°, H1-C1-Ox-Cx; ψH 0°, C1-Ox-Cx-Hx). Following minimization of the whole octasaccharide (10,000 steps), the lowest energy HA8-bound model for each of the 20 protein structures was retained, and the side chain of Arg81 was reconstructed in an extended conformation. A salt bridge was included between this side chain and the neighboring carboxylate group on HA (GlcA ring 7) by allowing all bonds in the Arg81 side chain and the carboxylate group to move freely for 1000 steps while restraining the carboxylate group O-6A and O-6B atoms to be within 2.3 Å of either the Hε and Hη12 protons or the Hη11 and Hη21 protons (i.e. allowing for both possible salt bridge configurations).
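The grid search and energy scoring described above can be sketched as follows. This is a minimal illustration only: the modeling software and force field parameters are not stated here, so the function names, the angle range (-180° to 170°), and the `epsilon` and `sigma` values are placeholder assumptions rather than the parameters actually used.

```python
import itertools
import math


def glycosidic_grid(step_deg=10):
    """Enumerate (phi, psi) pairs in step_deg increments over a full 360 degree range."""
    angles = range(-180, 180, step_deg)  # 36 values for a 10 degree step
    return list(itertools.product(angles, angles))


def lennard_jones(r, epsilon=0.1, sigma=3.5):
    """12-6 Lennard-Jones energy for one atom pair at distance r.

    epsilon (well depth) and sigma (zero-crossing distance) are
    illustrative placeholders, not force field values.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_lj(sugar_atoms, protein_atoms, epsilon=0.1, sigma=3.5):
    """Sum pairwise Lennard-Jones energies between two sets of (x, y, z) coordinates."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a in sugar_atoms
        for b in protein_atoms
    )


grid = glycosidic_grid()
print(len(grid))  # 36 x 36 = 1296 combinations
```

In a search of this kind, each (φ, ψ) pair would be used to rebuild the disaccharide coordinates, which are then scored with a function like `total_lj` against the fixed protein atoms, keeping the lowest energy conformers.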
Oligosaccharide and NMR Sample Preparation
High molecular mass HA (
1 MDa) uniformly labeled with 15N and/or 13C was produced in minimal medium (containing 15NH4Cl and [13C6]glucose as the sole nitrogen and carbon sources, respectively; CK Gas Products) by growth of Escherichia coli K5 transfected with recombinant HA synthase from Pasteurella multocida (37) and purified as described (9). 15N- and 13C, 15N-labeled octasaccharides were generated from this material by digestion with ovine testicular hyaluronidase and purified by ion exchange chromatography as described previously (9). Unlabeled Link_TSG6 and 15N-labeled Link_TSG6 were produced following previously described methods (9, 38). NMR samples (1.0 mM unlabeled Link_TSG6/[15N]HA8, 2.3 mM unlabeled Link_TSG6/[13C, 15N]HA8, and 1.0 mM 15N-labeled Link_TSG6 with and without 1.0 mM unlabeled HA8) were prepared from lyophilized protein and oligosaccharide reconstituted in 10% (v/v) D2O and 0.02% (w/v) NaN3 and adjusted to pH 6.0.
NMR Spectroscopy
All NMR experiments were performed at 25 °C on spectrometers operating at 500 or 600 MHz, processed with FELIX Version 2.3 (Biosym Technologies), and analyzed with XEasy. The nitrogen carrier frequency was set to the center of the amide resonances (122.5 ppm) in all 15N-edited experiments on labeled HA. The 1H, 15N heteronuclear single quantum correlation (HSQC) spectrum was recorded at 500 MHz on the 1.0 mM Link_TSG6/[15N]HA8 sample with acquisition times of 161.80 ms (t1, 15N) and 128.00 ms (t2, 1H). The 1H, 15N total correlation spectroscopy (TOCSY)-HSQC spectrum was acquired at 600 MHz on the same sample with a mixing time of 86.10 ms and acquisition times of 2.56 ms (t1, 1H), 2.62 ms (t2, 15N), and 81.92 ms (t3, 1H). The 1H-, 13C-HSQC spectrum was recorded at 600 MHz with the 13C carrier frequency set to 70 ppm and acquisition times of 58.00 ms (t1, 13C) and 81.92 ms (t2, 1H), i.e. a spectral width of 40 ppm, achieving optimal folding of the C1 and CMe resonances. The HCCH-TOCSY data set was collected at 600 MHz with a heteronuclear mixing time of 17.40 ms; acquisition times of 3.20 ms (t1, 1H), 8.70 ms (t2, 13C), and 6.00 ms (t3, 1H); and the 13C carrier frequency set to 91.2 ppm. The HNCA spectrum was acquired at 500 MHz with a constant time period of 28 ms in the 15N domain; acquisition times of 7.92 ms (t1, 15N), 21.60 ms (t2, 13C), and 128.00 ms (t3, 1H); and the 13C carrier frequency set at 42.5 ppm.
The 1H-, 15N-HSQC spectra for 15N-labeled Link_TSG6 (with unlabeled HA8) were recorded at 500 MHz at pH values from 3.5 to 7.5 in 0.25-unit increments (161.80 ms, t1, 15N; 128.00 ms, t2, 1H; 15N carrier frequency set to 119.0 ppm). The chemical shift changes of affected resonances were plotted as a function of pH, and the curves were fitted by nonlinear least-squares analysis to a one-site model, giving an estimate of the apparent pKa value of each titrating group. The 1H-, 15N-nuclear Overhauser effect spectroscopy (NOESY)-HSQC, 1H-, 13C-NOESY-HSQC, and two-dimensional NOESY spectra referred to under "Results" are those reported previously (9). The chemical shifts for free and HA8-bound Link_TSG6 have been deposited in BioMagResBank with accession numbers 6392 and 6393, respectively.
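The pKa fitting step can be illustrated with a small sketch. The exact one-site equation and fitting software are not given here, so this example assumes the usual Henderson-Hasselbalch form, in which the observed shift is a population-weighted average of the protonated and deprotonated shifts. Instead of a general-purpose optimizer, it scans pKa on a fine grid and solves the two limiting shifts by linear least squares at each grid point; all names and the synthetic data are hypothetical.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of a single titrating group that is deprotonated at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fit_one_site(ph_values, shifts, pka_min=2.0, pka_max=9.0, step=0.001):
    """Fit shift(pH) = d_acid + (d_base - d_acid) * f(pH, pKa) by least squares.

    For each trial pKa the model is linear in the two limiting shifts,
    so they are obtained in closed form; the pKa with the smallest
    residual sum of squares is returned with its limiting shifts.
    """
    n = len(ph_values)
    mean_y = sum(shifts) / n
    best = None
    for i in range(int(round((pka_max - pka_min) / step)) + 1):
        pka = pka_min + i * step
        f = [fraction_deprotonated(ph, pka) for ph in ph_values]
        mean_f = sum(f) / n
        sff = sum((x - mean_f) ** 2 for x in f)
        if sff == 0.0:
            continue
        slope = sum((x - mean_f) * (y - mean_y) for x, y in zip(f, shifts)) / sff
        intercept = mean_y - slope * mean_f
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, pka, intercept, intercept + slope)
    _, pka, d_acid, d_base = best
    return pka, d_acid, d_base


# Synthetic, noise-free titration sampled as in the text: pH 3.5 to 7.5 in 0.25 steps.
ph_points = [3.5 + 0.25 * i for i in range(17)]
synthetic = [8.10 + 0.35 * fraction_deprotonated(ph, 5.2) for ph in ph_points]
pka_fit, d_acid_fit, d_base_fit = fit_one_site(ph_points, synthetic)
print(round(pka_fit, 2), round(d_acid_fit, 2), round(d_base_fit, 2))
```

With real spectra, each affected resonance would be fitted separately, and noisy or incomplete curves would call for error estimates on the fitted pKa.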
Link Module Homology Models
The primary sequences of each of the known human Link modules (i.e. from CD44, Lyve-1, stabilin-1, stabilin-2, KIAA0527, HAPLN1, HAPLN2, HAPLN3, HAPLN4, aggrecan, brevican, neurocan, and versican) were aligned with that of Link_TSG6, taking into account both sequence identity and structural knowledge of the Link module fold; the resulting alignment is identical to that in Ref. 9, except for the second Link module of the Type C CSPG domains in which the single amino acid deletion site located between strands β5 and β6 was moved four positions toward the N terminus. This alignment was then used to generate homology models for each Link module based on the lowest energy structure of Link_TSG6 in its HA8-bound conformation, where aligned regions and side chain atoms of identical or structurally related amino acids were modeled into the Link_TSG6 coordinates (Protein Data Bank code 1o7c) (9). The remaining side chain atoms that did not have a match were initially positioned using standard internal coordinates. Residues in insertions were included in the same fashion, assigning them standard backbone (β-sheet) and side chain geometries while maintaining an unbroken peptide backbone over the site of insertion. All atoms within insertions/deletions (relative to the TSG-6 sequence) were then allowed to relax in each model, over eight designated regions (i.e. residues 6–9, 10–11, 27–30, 40–43, 53–55, 70–73, 77–83, and 84–86, numbered as for Link_TSG6), using energy minimization (1000 steps of steepest descents and conjugate gradients) while the coordinates in the remainder of the structure were kept fixed. The final stage in the modeling process was to allow all side chains throughout the structure to relax (10,000 steps of steepest descents), leading to energy-minimized homology models of the Link modules in their bound conformations. Models containing bound ligand were then constructed by inserting the coordinates of the HA8 molecule from the lowest energy HA8-bound model of Link_TSG6 (see above) into the structure file of the Link module homology models, where these were aligned over strands β3, β4, and β5.
Models of the cartilage link protein Type C HABD were generated in MOLMOL by positioning the two component Link modules (HAPLN1-1 and HAPLN1-2), both containing HA8 within their ligand-binding grooves, such that the bound HA molecules could form a continuous chain (requiring deletion of an overhanging disaccharide from either the reducing or nonreducing terminus of each oligomer as appropriate) while maintaining a distance between the C and N termini (of the first and second Link modules, respectively) that could be bridged by the intermodule linker peptide (Ser-Asn-Phe-Asn). This linker was added in an extended conformation from the C-terminal residue of HAPLN1-1 (Thr96) and allowed to relax into place using 10,000 steps of steepest descent minimization. All side chain atoms were then energy-minimized, and finally, the two bound HA hexamers were joined together and minimized. Molecular diagrams were generated using MOLMOL and RASMOL.
| RESULTS |
|---|
(β1→3)- and (β1→4)-linked sugars) against the side chains of Tyr59 and Tyr78 in the polarity previously established (i.e. with the nonreducing terminal ring over Tyr59) (9). This was done for each member of the family of the 20 lowest energy protein structures (Protein Data Bank code 1o7c), searching the full range of glycosidic linkage conformational space for both disaccharides, with the coordinates of the protein kept fixed. The resulting low energy models, in which the sugar side chains had been allowed to relax relative to the protein, were extended from the nonreducing terminus by the sequential addition of monosaccharides, which were also allowed to search conformational space. In the trisaccharide formed (GlcNAc-GlcA-GlcNAc) by the addition of GlcNAc to the (β1→3)-linked disaccharide, the new N-acetyl side chain was found to be accommodated in an open pocket at the bottom of the binding groove (pocket I), with favorable glycosidic bond angles. Significantly, this pocket is present only in the HA-bound conformation of the Link module (see below). When extended to a tetrasaccharide (GlcA-GlcNAc-GlcA-GlcNAc), the carboxyl group on the new GlcA ring adopted orientations that readily allowed a salt bridge to be formed with Lys11. This salt bridge is expected to be present on the basis of site-directed mutagenesis, ITC, and chemical shift data (9, 28, 29); the weak restraint used to position the carboxylate of GlcA within 4 Å of the end of the lysine side chain was easily satisfied in this family of models. In contrast, the trisaccharide (GlcA-GlcNAc-GlcA) formed from the (β1→4)-linked disaccharide was not well accommodated and, when extended to a 4-mer, tended to exit the binding groove vertically due to steric clashes with the protein. Therefore, this line of models could not be pursued further; the GlcNAc-GlcA-GlcNAc-GlcA tetrasaccharide was unable to find positions that would allow the formation of the salt bridge with Lys11. Thus, in this modeling approach, we have found that an HA molecule (with 4C1 ring conformations) can be extended within the Link_TSG6 binding groove only when the required sequential CH-π stacking interactions with Tyr59 and Tyr78 are made by a (β1→3)-linked disaccharide. Importantly, this register of HA is the same as that determined by our previous shift mapping studies with HA oligosaccharides of various lengths, which predicted that a GlcNAc ring was in intimate contact with Tyr78 (9). The low energy Link_TSG6·HA4 models, resulting from construction of the (β1→3)-linked disaccharide, were therefore extended to an octasaccharide, with the inclusion of a salt bridge with Arg81 (predicted from shift mapping and ITC data) (9, 29), which, like Lys11, was found to be in close proximity to a GlcA carboxylate group.
Selection of Models to Form a Family
The model complex resulting from building HA8 into the lowest energy protein structure is shown in Fig. 1A. As can be seen, the HA molecule is readily accommodated within the binding groove and lies in good contact with all key HA-binding residues (i.e. Lys11, Tyr12, Tyr59, Tyr78, and Arg81), except Phe70, the side chain of which was not precisely defined in the family of protein structures (9). In addition, the bound HA8 molecule lies within the region of the protein that displays significant chemical shift perturbations, and changes in side chain conformation, upon interaction with the ligand (9). However, the modeling approach employed was not successful for all members of the family of 20 Link_TSG6 structures, and in five of these, the HA molecule penetrated through the protein. This is not surprising because all the protein atoms were kept rigid throughout the modeling process and therefore led to steric clashes with certain less well defined amino acid side chains (in particular, Gly69 and Phe70); this lack of definition is partially a result of structure calculations for Link_TSG6 in its bound state being carried out in the absence of HA8 (9). These models, which tended to be from the higher energy protein structures, were therefore discarded. In a further four models, other steric clashes caused the HA chain to kink sharply out of the groove (Supplemental Fig. 1); pocket I was left empty in one of these, whereas in the other three cases, the 6-OH group of the GlcNAc ring was found to partially occupy the pocket. In these models, HA had very strained glycosidic angle conformations, and although the HA molecules could still form a salt bridge with Lys11 and maintained the same register as that of the model built on the lowest energy protein structure, they were considered to be outliers and were therefore discarded. 
The remaining 11 models (including the one arising from the lowest energy Link_TSG6 structure) were all very similar and could therefore be considered to form a family (Fig. 1B).
Only Five Saccharide Rings Are Required to Fill the Binding Groove
It is clear from the family of models that the binding groove is only long enough to interact with a stretch of five saccharide rings (from GlcA ring 3 at Lys11 to GlcA ring 7 at Arg81); in fact, the remaining three rings of the bound octasaccharide (one at the reducing terminus and two at the nonreducing terminus) are farther than 5 Å from the protein surface. Besides interactions with the key binding residues, we predict from the family of models that the side chains of Cys47, Val57, Ile61, Cys68, and Trp88 are also likely to be in intimate contact with the HA molecule. In this regard, all these residues displayed large chemical shift perturbations (e.g. Cys47 C
, 3.14 ppm; Ile61 C
1, 2.82 ppm; and Trp88 C
3, 0.43 ppm) and a change in conformation upon binding to HA (e.g. the plane of Trp88 pitched by
15° and yawed by
45°).
Hydrophobic Pockets Accommodate Acetamido Side Chains
Comparison of the free and bound structures determined for Link_TSG6 (9) reveals that the HA-binding groove contains two hydrophobic pockets (I and II) that are formed upon its interaction with HA8 (Fig. 2A). Pocket I results primarily from rearrangements within the β4–β5 loop (hinged on Pro60 and Gly74), which are caused by a change in conformation of the disulfide bridge Cys47–Cys68 (9). The surface of this pocket is formed by the backbone of Ala49 and Gly69, side chain atoms in Cys47, Cys48, Ile61, and Lys72, and the C
/H
atoms of Tyr59 (Fig. 2B). During the modeling procedure, the bulky and hydrophobic acetamido group of GlcNAc ring 4 was found to locate in this pocket in 15 of the 20 models (including the selected family of 11 structures). In fact, HA molecules could be extended only within the binding groove if an acetamido side chain was located in this position (see above). It should be noted that this pocket is therefore largely responsible for determining the register of HA within the binding groove. As shown in Fig. 2A, a second, more exposed pocket (II) is also present on the opposite side of the binding groove in a position that can accommodate the acetamido group on ring 6. This pocket also opens upon binding to HA due to changes in side chain orientations of Trp88, Tyr78, and Tyr59 and is principally formed by Trp88 and Val57. During the modeling, the side chain of GlcNAc ring 6 located itself in this pocket, being bound in this way in 15 of the 20 models (including all of the family of 11 selected models). Analysis of these models indicated that N-acetyl group penetrates into the pocket to different depths in each case, which is not surprising because no restraint was specifically included to localize it in the pocket.
|
φH and ψH angles across the family of 11 structures (i.e. at each of the four linkages between the five bound saccharide rings; see above) is compared with the preferred conformations predicted for HA in solution (39–41). It is clear that, considering the limitations expected for modeling HA into rigid protein conformations, the HA8 molecule has bond angles at each linkage remarkably similar to those predicted for HA in solution. Of particular interest is the observation that the relative orientations of the planes of Tyr59 and Tyr78 prefer stacking interactions to be made with both rings 5 and 6 in their solution conformation; in Fig. 3A, only 2 of the 11 models lie outside the low energy contours. Accommodation of the GlcNAc ring 4 acetamido side chain in pocket I serves to introduce twists at the glycosidic
H angle of both its (β1→3)-linkage (~90°) (Fig. 3C) and (β1→4)-linkage (~60°) (Fig. 3D), generating distributions of conformations skewed away from those predicted in solution. Also note that, with this arrangement of glycosidic conformations, both 6'-hydroxyl groups (on GlcNAc rings 4 and 6) are solvent-exposed, which, because of their high mobility relative to other groups within HA, might be expected on entropic grounds. Therefore, in these models, HA could be accommodated in the groove in a conformation that is similar to an allowed free solution conformation. A consequence of this is that most of the HA intramolecular hydrogen bonds are maintained in the models, which would also be energetically advantageous.
|
stacking interaction with HA in the model; consistent with this, Tyr12 does not show distinct H chemical shifts upon binding to HA8. It does, however, acquire a slowly exchanging Hη proton upon interaction with HA8, and as shown in Fig. 4A, this group is clearly pointing toward the octasaccharide in a favorable orientation to make a hydrogen bond (predicted to be to O-2 of ring 4). The models also suggest that the slowly exchanging Hη proton of Tyr78, seen in the HA8-bound protein (9), makes a hydrogen bond with O-4 of ring 2. The slight twist introduced into the bound HA8 molecule by accommodating the acetamido group of ring 4 in pocket I appears to direct the hydrophobic face of this saccharide ring toward Phe70 (Fig. 4A); the hydrophobic face of this ring was found to be directly opposite the side chain of Phe70 in 6 of the 11 selected models. It therefore seems plausible that Phe70 makes a stacking interaction against this ring, closing over the HA molecule in the groove and holding it in place.
Testing the Model
Chemical Shifts of Bound HA8
Samples containing 15N- or 13C, 15N-labeled HA8 bound to unlabeled Link_TSG6 were prepared for use in isotope-edited experiments, in which only the HA8 component of the complex was visible. 1H-, 15N-HSQC, 1H-, 15N-TOCSY-HSQC, and 1H-, 13C-HSQC spectra were recorded (Fig. 5), allowing the measurement of all 1H, 13C, and 15N chemical shifts within the bound HA8 molecule, which aids in the identification of intermolecular NOEs (see below). As expected for NMR spectra of such a repetitive oligosaccharide, the chemical shifts were highly degenerate, leading to extensive resonance overlap, and it is therefore currently not possible to assign the majority of these resonances to specific nuclei. Nevertheless, comparison of these spectra with published values for the free sugar (30, 42–45) allows some general conclusions to be drawn.
and arise from the α- and β-anomers of the reducing terminal GlcNAc ring (9, 45, 46), and the corresponding C1, C2, CMe, H1, H2, and H3 chemical shifts of ring 8 were measured (using the spectra shown in Fig. 5B in combination with HNCA- and HCCH-TOCSY experiments) and found to have near identity to the free solution values: 0.01 ppm difference for 1H (45) and 0.1 ppm difference for 13C (30, 44). These chemical shift data are consistent with the expectation from the model that the binding groove can only accommodate five rings, and thus, the reducing terminal ring does not contact the protein at all. The amide groups from the other three GlcNAc rings in Link_TSG6·HA8 are partially resolved due to the presence of the protein, which causes chemical shift perturbations away from the solution values, generating resonances c–g (Fig. 5A). The perturbation to resonances e–g is quite large in the proton dimension (0.4–1.1 ppm upfield), consistent with the binding of one or more N-acetyl side chains in unique environments and/or their close proximity to aromatic residues, both of which the model predicts for rings 4 and 6. However, because there are only four amide groups within the HA8 molecule (expected to give rise to five NH resonances), the presence of seven species in the 1H, 15N HSQC spectrum indicates additional complexity in the mechanism of HA binding (e.g. with regard to dynamics or multiple conformations), which is as yet not fully understood. It is relatively straightforward to assign regions of the 1H-, 13C-HSQC spectrum (Fig. 5C) to groups within a disaccharide (e.g. GlcNAc C2 and H2) by comparison with published values for the free sugar (30, 44); however, these cannot be assigned as yet to an individual ring (e.g. ring 2 or 4). Many of the observed resonances are very similar or identical to those of free HA8, which is not surprising given that only five rings are predicted from the modeling to be in direct contact with the protein surface, i.e. the three non-interacting rings would be expected to have unperturbed chemical shift values. Although the methyl region in the 1H-, 13C-HSQC spectrum has considerably more than four peaks (corroborating the observations from the 1H-, 15N-HSQC spectrum described above), it is clear that some of these show distinct chemical shift perturbations (Fig. 5C).
This indicates that certain methyl groups are bound to the protein in unique environments, such as, for instance, the hydrophobic pockets predicted from the molecular modeling.
Intermolecular NOEs. Total (100%) assignment of 1H nuclei within the bound protein has been achieved, and high resolution structures of the protein have been determined (9), thereby allowing each NOE in the NOESY spectra to be accounted for with high confidence. Furthermore, observation of all 1H chemical shifts in the bound HA8 molecule (i.e. in the HSQC spectra shown in Fig. 5) allows potential intermolecular NOEs (between Link_TSG6 and HA8) to be examined in order to determine whether an appropriate chemical shift value is present in the sugar. In this regard, 13 NOEs from protons with chemical shifts that match those observed for the bound HA8 ring protons (over the range 3.2–4.7 ppm) to amino acids within the protein ligand-binding site could not be assigned to groups within Link_TSG6 (Supplemental Table 1). Therefore, these represent extremely good candidates for intermolecular NOEs, especially because there were no remaining NOEs over this chemical shift range unaccounted for within the rest of the protein. Thus, clear intermolecular NOEs are evident, for example, to Ile61 HN (from 4.50, 3.74, and 3.55 ppm in HA8) and Tyr78 H
2 (from 3.84 and 3.74 ppm), indicating the intimate contact of these residues with the HA ring hydrogen atoms, as seen in the model. In addition, two further NOEs from 1.98 ppm to Ala49 HN and Val46 HN (both found near the bottom of pocket I) are of particular interest because they could originate only from GlcNAc methyl groups (i.e. with a modest chemical shift perturbation of 0.06 ppm from its free solution value) (42). If these are from the same methyl group in the bound HA8, which is likely because Ala49 HN and Val46 HN are close together in the structure, they would position a methyl group at the bottom of pocket I (Supplemental Fig. 2), in direct confirmation of this aspect of the model.
atoms was 3.2 Å across the family of 11 models, showing that the "restraint" of 4 Å had no influence. Thus, the protein is predicted to bind to alternate carboxylate groups on HA8 (i.e. rings 3 and 7), leaving the charge on ring 5 pointing toward the β4–β5-loop (Fig. 4B). In this regard, the 1H-, 15N-HSQC spectra of the HA8-bound protein recorded over pH 3.5–7.5 revealed that several resonances within this loop (i.e. Gly69, Gly71, and Lys72 (Fig. 6) and Asn67 and Phe70 (data not shown)) titrate at low pH, with a pKa value of 3.0–3.5, which is similar to that observed for free GlcA carboxylate groups (pKa 3.0–4.0). The titration of these amide groups cannot be accounted for by proximity to any protein carboxylate moiety (the nearest, Asp77, is 13 Å away) or by an intermediate population of both free and bound species because the chemical shifts of these and other non-titrating nuclei (e.g. Arg81 HN) indicate that the protein is still in the bound conformation. It therefore seems probable that these amino acids on the β4–β5-loop are directly reporting the titration of a carboxylate group within HA8 and thereby localize it in the complex structure; such positioning agrees very well with the location of the unneutralized charge on ring 5 predicted by the model (Fig. 4B). It should be noted that, although the side chain of Lys72 is close to this carboxylate, it is almost certainly not making a salt bridge because there are no chemical shift perturbations to the end of the Lys72 side chain upon binding HA8 and the Hε nuclei have the typical "free solution" value of 2.99 ppm in both free and bound states. Furthermore, mutation of Lys72 to Ala does not cause a reduction in affinity (28).
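The titration argument above can be illustrated with a minimal sketch, assuming a single-site Henderson-Hasselbalch model in fast exchange, in which the observed amide chemical shift is the population-weighted average of the protonated and deprotonated limiting shifts. The function names, the grid-search fitting procedure, and the synthetic data in the demonstration are illustrative assumptions; this is not the analysis used to obtain the values reported here.

```python
# Hypothetical sketch: estimate a pKa from chemical shift versus pH data,
# assuming one titrating group in fast exchange on the NMR time scale.


def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch fraction of the deprotonated (carboxylate) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def predicted_shift(ph, pka, delta_acid, delta_base):
    """Population-weighted shift (ppm) between the two limiting values."""
    f = fraction_deprotonated(ph, pka)
    return delta_acid + (delta_base - delta_acid) * f


def fit_pka(ph_values, shifts, pka_min=2.0, pka_max=6.0, step=0.01):
    """Grid search over pKa; for each candidate the limiting shifts follow
    from a linear least-squares fit, because the model is linear in them."""
    n = len(ph_values)
    mean_d = sum(shifts) / n
    best = None
    for i in range(int(round((pka_max - pka_min) / step)) + 1):
        pka = pka_min + i * step
        f = [fraction_deprotonated(ph, pka) for ph in ph_values]
        mean_f = sum(f) / n
        var_f = sum((x - mean_f) ** 2 for x in f)
        if var_f == 0.0:
            continue  # no variation in f, so the limits cannot be determined
        slope = sum((x - mean_f) * (d - mean_d) for x, d in zip(f, shifts)) / var_f
        intercept = mean_d - slope * mean_f
        sse = sum((intercept + slope * x - d) ** 2 for x, d in zip(f, shifts))
        if best is None or sse < best[0]:
            # intercept is the shift at f = 0 (acid); intercept + slope at f = 1
            best = (sse, pka, intercept, intercept + slope)
    _, pka, delta_acid, delta_base = best
    return pka, delta_acid, delta_base


if __name__ == "__main__":
    # Synthetic, noise-free data over pH 3.5-7.5 (invented values).
    phs = [3.5 + 0.5 * i for i in range(9)]
    data = [predicted_shift(p, 3.3, 8.60, 8.20) for p in phs]
    print(fit_pka(phs, data))
```

Because data over pH 3.5–7.5 sample mainly the upper half of a curve whose midpoint lies near 3.0–3.5, a pKa fitted in this way would be less tightly determined than one obtained from a complete titration.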
Basis for Homology Modeling of Link Module Superfamily Members
Recently, the structures of the Link modules of TSG-6 and CD44 have been determined at high resolution (9, 10), revealing that the backbone topology is highly conserved (as would be expected with a 32% sequence identity). Furthermore, comparison of the functions of side chain groups in these structures now permits the role of each conserved residue to be understood in detail (Supplemental Table 2). For example, the hydrophobic residues involved in forming the Link module core are well understood and highly conserved across the family, and the conservation of other particular small or charged residues arises from features such as helix caps (e.g. Thr15/Glu18 and Thr32/Gln35, numbered as in Link_TSG6), turns (e.g. Ala31 and Gly55), and the close approach of secondary elements (e.g. Gly50 allows the parallel hydrogen bonding of strands β3 and β6). This knowledge has been combined to produce a precise multiple sequence alignment for the superfamily as a whole (Fig. 7), with sequence identities to Link_TSG6 of between 30 and 40%, with the exceptions of KIA0527 (22%), stabilin-1 (47%) and stabilin-2 (43%). Therefore, this sequence alignment can be used for the generation of reliable homology models of the other Link module family members in their open (i.e. HA-bound) state based on the HA8-bound Link_TSG6 coordinates (Protein Data Bank code 1o7c) (9). Analysis of the models for the second Link module from the CSPGs (i.e. aggrecan-2, brevican-2, versican-2, and neurocan-2) and the related aggrecan-4, which constitute a subgroup of the superfamily (see Fig. 7), indicated that these overlaid on the Link_TSG6 structure with a significantly lower root mean square deviation compared with the models generated from the alignment of Blundell et al. (9), where the insertion between strands β5 and β6 was in a different position. It should be noted that all other deletions and insertions (such as the long loop in HAPLN4-1 between strands β4 and β5 and the insertion between strands β5 and β6 in the second Link module of the CSPGs) could be easily accommodated within the Link module tertiary structure. Therefore, the 23 Link module models constructed here in their ligand-bound conformations allow the HA binding capabilities of these domains to be re-evaluated and also provide a firm basis for future programs of site-directed mutagenesis.
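As a small illustration of how pairwise identities such as those quoted above can be derived from an alignment, the following sketch counts identical residues over the columns in which neither sequence has a gap. The choice of denominator is an assumption (other conventions exist and give different percentages), and the two aligned strings in the demonstration are invented toy sequences, not rows of Fig. 7.

```python
# Hypothetical sketch: percent identity between two rows of an alignment.


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues as a percentage of columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip insertions and deletions
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences (invented for illustration only).
    print(percent_identity("GVYHRE-AR", "GVFHLEGAR"))
```

Applying one convention consistently across every pair is what makes values from different family members comparable with one another.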
All of the other Link module models have structures and sequences consistent with a capability to interact with HA. From the multiple sequence alignment shown in Fig. 7 (where the putative HA-binding residues are colored red), it is clear that most functional Link modules appear to share more features in common with TSG-6 than CD44. For example, Tyr59 and Tyr78 of Link_TSG6, although not present in CD44 (or Lyve-1), are highly conserved across the superfamily as a whole. In the case of Tyr59, an equivalent tyrosine is found in 16 out of 18 of the HAPLN/CSPG Link modules, whereas the other two have a phenylalanine or histidine ring at this site; Tyr78 is absolutely conserved in the first Link modules of HAPLN/CSPGs, whereas in the second, it is replaced by Phe, Leu, or Val, i.e. aromatic or large and planar faced hydrophobic residues that could also stack against a GlcNAc ring. Furthermore, the critical residues Arg78 and Tyr79 of the CD44 HABD (50) are almost unique to this receptor, with no other examples of a basic amino acid at this position and only two Link modules having a tyrosine at a comparable sequence location (i.e. Lyve-1 and versican-2). It should also be noted that CD44 has been hypothesized to have two modes of ligand binding (10), where mode 1 is likely to be equivalent to the HA-binding site described above for TSG-6. Thus, CD44 and the related HA receptor Lyve-1 (which both have Type B HABDs) (10) appear to be outliers of the superfamily, and some caution is required when interpreting models for these proteins based on TSG-6.
As shown in Fig. 7, the residues contributing to the formation of pockets I and II (colored blue and green, respectively) are very highly conserved across the superfamily, indicating that accommodation of GlcNAc side chains in this way is likely to be a general feature of HA binding to Link modules, including those of CD44 and Lyve-1 (except that the latter does not have the hinging proline residue of pocket I). KIA0527, however, does not have most of the residues required to form pocket I and lacks the disulfide bridge that permits opening of the groove, providing further evidence that it is unlikely to be functionally active with regard to HA binding. Importantly, because the combination of pocket I and the central aromatic residue Tyr59 (which is highly conserved as described above) determines the binding register in Link_TSG6, it is to be expected that other Link modules bind HA in the same polarity and (approximate) register. Consistent with this, basic residues at positions 11 and 81 are found in many Link modules, i.e. in the same location relative to the pocket and central aromatic residue. Nevertheless, although the register and orientation of HA within the binding groove are probably conserved in most cases, the details of the molecular interaction are expected to be subtly different in each case; for example, aggrecan-1 has two additional basic amino acids in its binding site (compared with brevican-1) at positions equivalent to Lys72 and Arg81 in Link_TSG6 (Fig. 8). In this regard, although a basic residue at position 11 is very highly conserved across the superfamily (and functionally implicated in both TSG-6 and CD44) (28, 49, 50), recent mutagenesis data for neurocan indicate that, in this case, an arginine at this location does not contribute to HA binding in either of its Link modules (52).
This suggests that there is considerable diversity in the way specific Link modules interact with HA, perhaps stabilizing different conformations of the sugar as suggested previously (57), and comprehensive programs of mutagenesis are now required to determine which of the amino acids identified as potentially functional by our analysis are actually involved in binding.
An important observation from this modeling exercise was that four aromatic and two large hydrophobic residues were found at the predicted Link module-Link module interface (Fig. 9B, magenta), whereas a conserved glycine residue at the start of helix α2 in the second domain was found to greatly aid the close approach of the two modules (where a larger side chain would result in a steric clash). Because these amino acids are seen to be well conserved across the HAPLN/CSPG Link modules (Fig. 7, magenta boxes) and are not highly conserved in the Type A and B HABDs (i.e. those that have a single Link module), these sequence positions are predicted to form the interdomain interface in the other Type C HABDs.
Both component Link modules in the Type C domain are distinctly wedge-shaped, and therefore, their tandem association to form a continuous HA-binding site naturally results in the introduction of a gentle curve into the bound HA chain (Fig. 9, A and B). Interestingly, aromatic/hydrophobic residues also cluster on the free "ends" of HAPLN1 (Fig. 9B, cyan), making two additional faces ideal for interaction with other proteins; these residues are also well conserved within the HAPLN/CSPG Link modules (Fig. 7, dashed cyan boxes). In fact, during construction of the model, it was found that the opposite order of Link modules is also a reasonable organization for the structure of HAPLN1 (i.e. with the two modules in Fig. 9 (A and B) swapped around) because these hydrophobic patches come together and make an interface (although in this case, it is smaller, and some charged residues are slightly buried). This potential to form Link module-Link module interfaces on either side of an individual Link module may underlie the ability of the HAPLN proteins to aggregate with CSPGs on a single HA chain, i.e. one side of each module would form intramolecular contacts, whereas the other would form intermolecular ones. A clear consequence of this hypothesis is that they could fit together in a repeated array, and this tessellation would propagate the gentle curve introduced into the HA at each binding site, giving rise to higher order superhelical structures (Fig. 9C). This model is consistent with the finding that the Link modules of versican mediate its interaction with HAPLN1 (54); it could also potentially explain how versican alone could bind HA in a cooperative manner as has been suggested recently (53). For aggrecan, the Ig module is necessary for binding to HAPLN1 in the absence of HA (55), but this does not preclude that intermolecular Link module-Link module interactions take place in the ternary complex. 
In this regard, we predict that Ig modules could be accommodated in the inner face of the helix, where their "homotypic" interaction should further stabilize the higher order complex. However, at this point, there are no experimental data permitting the exact position of the Ig domains to be determined.
| DISCUSSION |
|---|
As mentioned above, stacking interactions between sugar rings and aromatic side chains are ubiquitous in protein·carbohydrate complexes (56–58), and three such contacts can be readily envisaged within the Link_TSG6·HA8 complex (i.e. with Tyr59, Phe70, and Tyr78). For instance, CH-π stacking interactions have been found to be partly responsible for the precise positioning of HA (for catalysis) in hyaluronate lyases from Streptococcus pneumoniae (59–62) and Streptococcus agalactiae (63, 64), and in the case of Link_TSG6, it appears that the stacking interactions with Tyr59 and Tyr78 precisely position the bound HA in a similar fashion. In addition, the model also provides insight into the molecular basis of the selectivity of Link_TSG6 for HA over other carbohydrate ligands. Specifically, two hydrophobic pockets select for the alternating GlcNAc side chains of HA, whereas two basic residues at the correct separation (20 Å) provide specificity for the GlcA groups. The binding