Structural Aspects of N-Glycosylations and the C-terminal Region in Human Glypican-1*

Background: Glypicans are a family of cell surface proteoglycans implicated in diverse cell signaling pathways. Results: We provide a description of the structure and functions of the N-glycans and C terminus of human glypican-1. Conclusion: The studies revealed the structural topology of glypicans with respect to the membrane and the protective roles of their N-glycans. Significance: Improved structural knowledge of glypican-1 helps to elucidate the functional roles of the glypicans. Glypicans are multifunctional cell surface proteoglycans involved in several important cellular signaling pathways. Glypican-1 (Gpc1) is the predominant heparan sulfate proteoglycan in the developing and adult human brain. The two N-linked glycans and the C-terminal domain that attach the core protein to the cell membrane are not resolved in the Gpc1 crystal structure. Therefore, we have studied Gpc1 using crystallography, small angle x-ray scattering, and chromatographic approaches to elucidate the composition, structure, and function of the N-glycans and the C terminus and also the topology of Gpc1 with respect to the membrane. The C terminus is shown to be highly flexible in solution, but it orients the core protein transverse to the membrane, directing a surface evolutionarily conserved in Gpc1 orthologs toward the membrane, where it may interact with signaling molecules and/or membrane receptors on the cell surface, or even the enzymes involved in heparan sulfate substitution in the Golgi apparatus. Furthermore, the N-glycans are shown to extend the protein stability and lifetime by protection against proteolysis and aggregation.

Glypiated proteins are anchored to the extracellular surface of the eukaryotic cell membrane by covalent linkage of their C termini to glycosylphosphatidylinositol (GPI) (1). Glypicans (Gpcs) are a family of glypiated extracellular proteoglycans that mainly work as co-regulators of several signaling pathways, and they are thereby involved in the control of many biological processes such as cellular division, differentiation, and morphogenesis. To date, six different Gpcs have been identified in vertebrates, Gpc1 to Gpc6, with ~25% amino acid identity within the family, two in Caenorhabditis elegans (Gpc1 and Lon-2), two in Drosophila melanogaster (Dally and Dally-like protein), and one in zebrafish (knypek). The mature forms of human Gpcs have a core protein of ~60-70 kDa in size and share a pattern of 14 conserved cysteine residues. In the C-terminal region, Gpcs share attachment sites for glycosaminoglycan (GAG) chains and a hydrophobic sequence for the addition of the GPI anchor (2).
GAGs are O-linked linear anionic hetero-polysaccharides found in all mammalian tissues in extracellular and/or intracellular environments. Their sequences are composed of repeating disaccharide units and are divided into classes depending on the disaccharide building blocks. The most widespread categories are as follows: 1) heparan sulfate (HS), which consists of repeating units of glucuronic acid (GlcUA) or iduronic acid (IdoUA) followed by N-acetylglucosamine (GlcNAc), i.e. (GlcUA/IdoUA-GlcNAc)n; 2) chondroitin sulfate (CS) and dermatan sulfate, which consist of GlcUA or IdoUA followed by N-acetylgalactosamine (GalNAc), i.e. (GlcUA/IdoUA-GalNAc)n. The biosynthesis of GAG chains is performed by membrane-bound glycosyltransferases in the Golgi apparatus and starts with addition of xylose (Xyl) to the GAG attachment serine residue by a xylosyltransferase. Then two consecutive galactose units and a GlcUA unit are transferred, forming the linkage tetrasaccharide (GlcUAβ1-3Galβ1-3Galβ1-4Xylβ1-O-Ser). Addition of the next residue is the critical factor in determining which type of GAG will be formed as follows: incorporation of GalNAc promotes CS/dermatan sulfate assembly, and addition of GlcNAc initiates HS synthesis. The GAG chain then becomes elongated by addition of the corresponding repeating disaccharides and undergoes serial modifications, including N-deacetylation, N- and O-sulfation, and epimerization, generating GAG chains with heterogeneous structures (3).
Human Gpc1 is mainly expressed in the neural and skeletal systems during development, as well as many other tissues in the adult. Gpc1 is involved in the uptake of different macromolecules such as growth factors (FGF2), viral proteins, cytokines, and polyamines (4). Gpc1 knock-out mice have significant reduction in brain size at birth (about 30%), indicating a role for Gpc1 in brain development (5). Many studies have revealed involvement of Gpc1 in the pathogenesis of neurodegenerative diseases, including Alzheimer disease and transmissible spongiform encephalopathies (6,7). Moreover, it has been shown that Gpc1 is up-regulated in many human cancer types such as glioma, pancreatic cancer, and breast cancer (8,9).
The Gpc1 protein is composed of an N-terminal core (residues 24-474) and a C-terminal GAG attachment region (residues 475-530), ending with a sequence of hydrophobic residues that link it to the GPI anchor (Fig. 1A). The Gpc1 protein is decorated with two N-linked glycans at positions Asn-79 and Asn-116 and is also substituted with three chains of HS at Ser-486, Ser-488, and Ser-490 (Fig. 1A). Previously, we determined the crystal structure of the N-glycosylated, C-terminally truncated Gpc1 core protein at 2.6 Å (PDB entry 4ACR) and showed that it has an elongated cylindrical form with dimensions 120 × 30 × 30 Å and an all α-helical fold (α1-α14) with three major loops (L1, L2, and L3) (Fig. 1B) (10). The complete disulfide pattern of the 14 conserved Cys residues across the Gpc family was also revealed in the structure. Six disulfide bonds are located near the protein N terminus in a region termed the Cys-rich lobe. This region is followed by the central lobe, stabilized by two evolutionarily conserved hydrophobic cores. Finally, the protease site lobe contains a furin protease site found in many Gpc family members. Recently, we achieved improvements of Gpc1 crystal diffraction properties by controlled crystal dehydration using a humidity control device (HC1b) (11) and generated better electron density for crystals of C-terminally truncated Gpc1 (12). Unfortunately, the 3-Å crystal structure of full-length Gpc1 (PDB entry 4AD7) contains no electron density for the C-terminal region (~53 residues) that attaches the core protein to the cell membrane. Therefore, it was not possible to predict the location and orientation of the Gpc1 core protein and its HS chains with respect to the cell surface. Previous work from our laboratory shows that Gpc1 is invariably decorated with two N-glycans, which affect the Gpc1 expression level and HS substitution (13). Interestingly, the Gpc1 protein was correctly folded in the absence of N-glycans. The crystal structure of Gpc1 displayed only the first residue of the N-linked glycans, and therefore it is still of great interest to characterize the structure and function of Gpc1 N-glycans using other techniques.
In this work, we couple the existing knowledge from protein crystallography with small angle x-ray scattering (SAXS) and other biophysical techniques to provide new insights into the structural-functional characteristics of the Gpc1 core protein in solution. Briefly, we report the highest resolution and most complete full-length Gpc1 crystal structure to date, obtained by further application of crystal dehydration to this system. Moreover, the composition and structure of the N-linked glycans have been characterized. Finally, molecular modeling based on SAXS data from various Gpc1 constructs was pursued to elucidate the spatial relationship between the cell membrane, Gpc1 core protein, and the N- and O-linked glycans. Taken together, our study provides a clearer topological picture of Gpc1 with respect to the cell membrane.

Experimental Procedures
Expression and Purification of Human Gpc1-The design of different Gpc1 variants is summarized in Table 1. All Gpc1 constructs were cloned and expressed in stable HEK293 cells as described previously (13). The conditioned medium was collected, and the proteins were purified using Ni-NTA affinity chromatography. The proteins were then dialyzed into 20 mM NaCl, 20 mM Tris, pH 8, for the C-terminally truncated protein (Gpc1-dC) and the same buffer with additional 2 mM DTT for the others before any further experiments.
Enzymatic Treatments-Deglycosylation of Gpc1-dC and the full-length protein without the HS chains (Gpc1-dHS) was carried out by growing the Gpc1-producing cells in protein-free medium containing 10 μM plant alkaloid kifunensine, which forces the cells to express proteins with N-glycans of the high mannose type that are sensitive to endoglycosidase H (EndoH) (14). Afterward, EndoH treatment (New England Biolabs) of 1 mg of native high mannose Gpc1 was carried out in 25 mM sodium phosphate buffer, pH 7.0, by overnight incubation at 37°C with 30 milliunits of the enzyme. The deglycosylation efficiency was tested by SDS-PAGE, which showed a reduction in the size of the Gpc1-dC and Gpc1-dHS core protein bands by 5 kDa after deglycosylation, producing the deglycosylated forms of the Gpc1 core proteins, Gpc1-dC-dN and Gpc1-dHS-dN, respectively. EndoH was removed from the sample by repeating the Ni-NTA purification.
Digestion of the HS chains was carried out as follows: the purified proteins were dialyzed into 10 mM HEPES and 3 mM Ca(OAc)2 buffer, pH 7.0, overnight, and then 150 milliunits of HS lyase (Seikagaku, Japan) were added to 1 mg of protein and incubated overnight at 37°C. The protein buffer was exchanged to 0.3 M NaCl, 50 mM sodium phosphate, pH 8.0, by ultrafiltration, and then incubated with DE-53 DEAE-cellulose (for anion exchange) on a rocker at 8°C for 1 h. The unbound proteins (without heparan sulfate) were released by washing the cellulose twice with 0.3 M NaCl, 50 mM sodium phosphate, pH 8, and finally the HS lyase was removed by repeating the Ni-NTA purification. HS lyase digests the HS polysaccharide chains, leaving only the tetrasaccharide linkers (GlcUAβ1-3Galβ1-3Galβ1-4Xylβ1) on the consensus serine residues of the HS attachment sites. Removal of HS chains from the wild type (Gpc1-WT) and the proteins with disrupted N-glycosites, Gpc1-N79Q and Gpc1-N116Q, produced the Gpc1-WT-dHS, Gpc1-N79Q-dHS, and Gpc1-N116Q-dHS proteins, respectively.
Protein Crystallization, Dehydration, and Structure Determination-Crystallization of Gpc1-dHS protein was performed using sitting drop vapor diffusion by mixing 2 μl of protein at 25 mg/ml with 2 μl of reservoir solution containing 14-16% PEG 6000, 0.1 M Tris-HCl, pH 8.0, and 0.2 M CaCl2, equilibrated over 0.5 ml of reservoir solution. Thin plate-like crystals of dimensions around 0.9 × 0.2 × 0.05 mm grew in 2 weeks amid a large amount of precipitated protein. Gpc1-dHS crystals were soaked in 16% PEG 6000, 0.2 M CaCl2, 0.1 M Tris, pH 8.0, and 15% ethylene glycol and then mounted on a mesh LithoLoop (Molecular Dimensions, UK) in the HC1b machine at beamline I911-3 of the MAX IV Laboratory, Lund, Sweden. The Gpc1-dHS crystals were dehydrated for a total incubation time of 50-60 min to a final relative humidity of 86-88%. The dehydrated crystals were then flash-frozen and stored in liquid nitrogen using the CATS sample changer (IRELEC, Saint-Martin-d'Hères, France), and subsequently complete diffraction data sets were collected at 100 K. Diffraction images were indexed and scaled using XDS (15) and were further processed using programs from the CCP4 package (16). The initial model was obtained by rigid body refinement of the dehydrated Gpc1-dC model (PDB code 4BWE) using REFMAC5 (17), followed by manual building in Coot (18) and rounds of refinement using phenix.refine (19). Finally, model validation was performed using Molprobity (20). PyMOL Version 1.6 (Schrödinger, LLC) was used for molecular rendering.
N-Glycan Characterization by HILIC-FLD-UPLC-The Gpc1 proteins (150 μg) were immobilized on a 10-kDa spin filter (Pall, Port Washington, NY), treated with denaturation buffer (50 μl of 50 mM DTT, 20 mM NaHCO3), and incubated at 65°C for 15 min to denature the proteins to allow for efficient cleavage of the N-glycans. After cooling to room temperature for 10 min, an iodoacetamide solution (50 mM, 50 μl per well) was added and incubated at room temperature for 30 min. Samples were spun at 12,000 rpm for 5 min, and the flow-through was discarded, followed by two washings with MilliQ water (50 μl) to remove residual amounts of denaturation reagents. N-Glycans were enzymatically released by digestion with recombinant N-glycosidase F (50 μl, 0.5 milliunits in 20 mM NaHCO3, pH 7.0) (ProZyme, Hayward, CA) and incubated at 37°C overnight. After extraction, glycans were derivatized with 2-aminobenzamide (2-AB), sodium cyanoborohydride, 30% v/v acetic acid in DMSO at 65°C for 2 h. After cooling down to room temperature for 10 min, 100 μl of acetonitrile were added to the 2-AB-labeled sample, which was loaded into a pre-conditioned Glycoworks HILIC cartridge (Waters, Milford, MA) for excess fluorophore removal. The cartridge was washed with 400 μl of 85% acetonitrile, and glycans were eluted with 100 μl of 100 mM ammonium acetate in 5% acetonitrile. The elution was repeated two more times, and the samples were concentrated to dryness in a vacuum evaporator. The samples were dissolved in acetonitrile/H2O (70:30) and analyzed by HILIC-FLD-UPLC using a 1.7-μm BEH glycan column (2.1 × 15 mm, Waters) and Waters ACQUITY UPLC I-class with fluorescence detection. The column temperature was kept at 40°C, and the flow rate set to 0.561 ml/min using a linear gradient of 50 mM ammonium formate, pH 4.4, against acetonitrile with ammonium formate from 30 to 47% over a 25-min period. An injection volume of 25 μl of sample prepared in 70% v/v acetonitrile was used throughout. Fluorescence detection was achieved using excitation and emission wavelengths of 330 and 420 nm, respectively.
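The linear gradient described above can be written out as a short calculation. The following is a minimal sketch, assuming the ramp starts at time zero and holds its end value afterward; the function names are illustrative and not part of any instrument software.

```python
def percent_aqueous(t_min, start=30.0, end=47.0, duration=25.0):
    """Percent ammonium formate phase at time t_min for a linear ramp.

    Defaults follow the text: 30 to 47% over 25 min. Holding the start
    and end values outside the ramp is an assumption of this sketch.
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


def delivered_volume_ml(t_min, flow_ml_per_min=0.561):
    """Total mobile phase volume pumped after t_min at a constant flow rate."""
    return flow_ml_per_min * t_min


midpoint = percent_aqueous(12.5)          # composition halfway through the ramp
total_volume = delivered_volume_ml(25.0)  # volume used over the whole gradient
```

The ramp corresponds to a slope of 0.68% per minute, which is one way to compare this method with gradients of different length.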
A representative HILIC-FLD-UPLC profile of Gpc1 was annotated with glucose unit (GU) values by comparison with a dextran hydrolysate ladder (21). Initial structural assignment of the glycans present in the peaks was performed by comparison of experimental data with GlycoBase (22). The Consortium for Functional Glycomics (CFG) glycan notation is used throughout (23).
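Assigning GU values amounts to mapping each peak's retention time onto the retention times of the dextran ladder. The sketch below uses piecewise linear interpolation as a simplification (calibration is often done with a polynomial or spline fit instead), and the ladder retention times are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right


def retention_to_gu(rt, ladder):
    """Interpolate a glucose unit (GU) value from a dextran ladder.

    ladder: list of (retention_time_min, glucose_units) pairs sorted by
    retention time, with at least two entries.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the ladder range")
    # Index of the ladder point just above rt, clamped to the last point.
    i = min(bisect_right(times, rt), len(ladder) - 1)
    (t0, g0), (t1, g1) = ladder[i - 1], ladder[i]
    return g0 + (g1 - g0) * (rt - t0) / (t1 - t0)


# Hypothetical ladder: retention times (min) of glucose oligomers with 4 to 7 units.
example_ladder = [(5.0, 4), (7.0, 5), (9.5, 6), (12.0, 7)]
example_gu = retention_to_gu(8.25, example_ladder)
```

Refusing to extrapolate beyond the ladder is a deliberate choice here, because GU values outside the calibrated range are unreliable.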
Analytical SEC and DLS-SEC analysis was performed at room temperature using a pre-equilibrated Superdex 200 10/300 GL column (GE Healthcare) in the standard buffers. Sample homogeneity was assessed by DLS using a Zetasizer APS DLS system (Malvern Instruments Ltd., Malvern, UK) under the same conditions as the SAXS measurements were done, and the data were analyzed using Zetasizer software version 7.03.
SAXS Data Collection and Processing-Synchrotron radiation SAXS data were collected on the ID14-3 and BM29 beamlines, ESRF, Grenoble, France (24). Water measurements (empty capillary and water) were employed as a reference for further measurements and to give preliminary estimates for the sample molecular weight using the known absolute scattering of water (I0,abs(water) = 1.632 × 10⁻² cm⁻¹, at 25°C).
Size exclusion directly in-line with SAXS (SEC-SAXS) was used to obtain scattering data from highly pure monomers of Gpc1-dHS and Gpc1-dC using the HPLC system (Viscotek GPCmax, Malvern Instruments) attached directly to the sample inlet valve of the sample changer at BM29. 100 μl of ~6 mg/ml clarified sample was loaded onto a Superdex 200 10/300 GL column (GE Healthcare) pre-equilibrated with 2 column volumes of the protein standard buffer. The protein was eluted at a flow rate of 0.5 ml/min and passed through the capillary cell, and a scattering frame was collected every 2 s with a total of 1200 frames. The EDNA pipeline (25) provided a one-dimensional profile for each frame. The individual frames were buffer-subtracted and processed using tools from the ATSAS suite (26), calculating the forward scattering I(0) and radius of gyration Rg. Frames with stable Rg values from the monomer peak scattering intensity (the second half of the peak) were merged to provide a single averaged frame corresponding to the scattering of an individual SEC-purified monomer.
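The frame-merging step can be illustrated with a short sketch. It assumes that all frames share one q-grid and are already buffer-subtracted; the 3% tolerance around the median Rg is a hypothetical criterion for "stable", not the one used in this study.

```python
from statistics import median


def merge_stable_frames(frames, rg_values, rel_tol=0.03):
    """Average the frames whose Rg lies within rel_tol of the median Rg.

    frames: list of intensity lists on a shared q-grid
    rg_values: per-frame radius of gyration (nm)
    Returns the averaged intensities and the number of frames merged.
    """
    rg_ref = median(rg_values)
    keep = [f for f, rg in zip(frames, rg_values)
            if abs(rg - rg_ref) <= rel_tol * rg_ref]
    if not keep:
        raise ValueError("no frames within the Rg tolerance")
    merged = [sum(column) / len(keep) for column in zip(*keep)]
    return merged, len(keep)


# Toy data: two consistent frames and one outlier with a larger Rg.
toy_frames = [[10.0, 5.0], [12.0, 6.0], [30.0, 20.0]]
toy_rg = [3.60, 3.62, 4.50]
merged_curve, n_merged = merge_stable_frames(toy_frames, toy_rg)
```

For scale, 1200 frames at 2 s per frame span 40 min, which corresponds to 20 ml of eluate at 0.5 ml/min.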
For concentration series measurements, the samples were dialyzed overnight against the standard buffers, and concentration series were prepared. A sample changer robot was used to load 35 μl of the clarified sample into the measurement capillary, and SAXS data were collected in flow mode. Scattering profiles of the filtered dialysis buffers were subtracted from the corresponding sample scattering profiles. Real space Rg, excluded particle (Porod) volume, and the maximum particle dimension (Dmax) were estimated from the pair distance distribution function P(r) using the program GNOM. Molecular mass was determined using the program SAXSMoW. Porod-Debye plots were employed to analyze the protein flexibility according to Rambo and Tainer (27).
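The real space Rg follows from P(r) through the standard relation Rg² = ∫r²P(r)dr / (2∫P(r)dr). The sketch below evaluates it with trapezoidal integration and checks it against the analytical P(r) of a homogeneous sphere, for which Rg² = (3/5)R². It illustrates the relation only and is not a substitute for the indirect Fourier transform that GNOM performs; the function names are illustrative.

```python
import math


def trapz(y, x):
    """Trapezoidal integral of y over the grid x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
               for i in range(len(x) - 1))


def real_space_rg(r, p):
    """Radius of gyration from a pair distance distribution function."""
    numerator = trapz([ri * ri * pi for ri, pi in zip(r, p)], r)
    denominator = 2.0 * trapz(p, r)
    return math.sqrt(numerator / denominator)


def sphere_pr(r, radius):
    """Unnormalized P(r) of a homogeneous sphere; zero beyond 2 * radius."""
    if r > 2 * radius:
        return 0.0
    x = r / radius
    return r * r * (1 - 0.75 * x + x ** 3 / 16)


# Validation on a sphere of radius 5 nm, where P(r) reaches zero at Dmax = 10 nm.
R = 5.0
grid = [i * 2 * R / 2000 for i in range(2001)]
rg_sphere = real_space_rg(grid, [sphere_pr(ri, R) for ri in grid])
```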
Ab Initio Modeling-Ab initio shape reconstruction was performed using the simulated annealing method as implemented in the DAMMIN program (28). Twenty different models were aligned and averaged, and the most typical model was generated using the DAMAVER program suite (29). To test the dependence of the SAXS shape reconstruction on methods, the program GASBOR (30) was also used to compute a set of 20 different ab initio shape envelopes using GNOM output up to 4.5 nm⁻¹ and the same Dmax and Rg values used with DAMMIN.
All-atom Modeling, MES, and EOM-All-atom models (AllosMod) of Gpc1-dC and Gpc1-dHS were generated using tools on the SaliLab web server (54). ModLoop was used for modeling of loops missing in the Gpc1 crystal structure (PDB code 4YWT) (31). No building was performed for the missing residues at the N and C termini of the core proteins. Next, static models of Gpc1-dC and Gpc1-dHS with flexible glycans were generated in MODELLER utilizing the ModLoop output. Monosialylated digalactosylated biantennary glycans (at position Asn-79) and core-fucosylated monogalactosylated biantennary complex glycans with bisecting GlcNAc (at position Asn-116), corresponding to the predominant glycoforms identified in the MS studies, were added with ideal geometries, followed by a 1 Å randomization of the all-atom coordinates, where the motions were restricted to loops and surface side chains. The generated core models were then used for further modeling jobs. Finally, a 330 K AllosMod simulation was used to generate 2000 protein conformations with rigid sugars consistent with the input structure. The ensembles of glycosylated models were fitted to the raw SAXS data using FoXS (32). The χ² value was used to evaluate the goodness of fit and to select the AllosMod models of best/poorest consistency with the SAXS data. Minimal ensemble searches (MES) with up to four conformations were tried to minimize the χ² values (33).
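The goodness-of-fit measure used to rank models can be sketched as a reduced χ² between an experimental curve and a model curve after least-squares scaling. This is a generic illustration under that assumption; it omits the hydration layer and excluded volume adjustments that SAXS fitting programs apply, and the function name and toy data are hypothetical.

```python
def chi_square(i_exp, sigma, i_model):
    """Reduced chi-square of a model curve against experimental data.

    The model is first multiplied by the scale factor that minimizes the
    error-weighted squared residuals. Returns (chi2, scale).
    """
    weights = [1.0 / (s * s) for s in sigma]
    scale = (sum(w * e * m for w, e, m in zip(weights, i_exp, i_model))
             / sum(w * m * m for w, m in zip(weights, i_model)))
    residual = sum(w * (e - scale * m) ** 2
                   for w, e, m in zip(weights, i_exp, i_model))
    return residual / (len(i_exp) - 1), scale


# Toy curves: the "experiment" is exactly twice the model, so the fit is perfect.
model_curve = [10.0, 8.0, 5.0, 2.0]
exp_curve = [2.0 * m for m in model_curve]
errors = [0.5] * 4
chi2_value, fitted_scale = chi_square(exp_curve, errors, model_curve)
```

Ranking an ensemble then reduces to evaluating this value for every conformation and sorting; values near 1 indicate agreement within the experimental errors.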
Furthermore, an ensemble containing a number of different conformations of the flexible parts was obtained by the ensemble optimization method (EOM) (34). The static Gpc1 model generated by MODELLER (residues 26-476) was used as a fixed core in RANCH to generate a large pool of 40,000 independent conformers of the N- and C-terminal residues of Gpc1-dHS with the four best orientations of N-glycans (obtained from Gpc1-dC MES results). GAJOE was used for ensemble selection by minimizing the discrepancy χ².
Rigid Body Modeling-Molecular modeling of the N and C termini was conducted using rigid body modeling as implemented in the program CORAL (26), based on the static model generated by MODELLER. N-Glycan chains with structures identical to the ones used in AllosMod runs were treated as rigid bodies with contact restraints of 7 Å to the appropriate asparagine residues Asn-79 and Asn-116. Fifty independent CORAL runs were performed by minimizing the discrepancy χ² between the theoretical scattering curve calculated from the model and the experimental data. For Gpc1-dC dimer generation, 2-fold symmetry was applied. All CORAL models were aligned in Supcomb13 (35) and analyzed in PyMOL to identify the most typical structures. The theoretical scattering amplitudes of the generated models together with their discrepancies χ² to the experimental data were calculated using CRYSOL (36).
Peptide Mass Spectrometry-In-gel trypsin digestion of Gpc1 SDS-PAGE bands followed by LC-MS/MS analysis was performed to identify protein sequences as described previously (37). Data-dependent mass spectrometry experiments were performed with an EASY LC Nano Flow high performance liquid chromatography (HPLC) system (Proxeon Biosystems, Odense, Denmark) connected to an LTQ-Orbitrap Velos Pro mass spectrometer (Thermo Fisher Scientific, Waltham, MA) equipped with a nano-Easy spray ion source (Proxeon Biosystems, Odense, Denmark). The chromatographic separation was performed at 40°C on a 15-cm (75-μm inner diameter) EASY-Spray column packed with 3-μm resin (Proxeon Biosystems, Odense, Denmark). The nano-HPLC intelligent flow control gradient was 5-20% solvent B (0.1% (v/v) formic acid, 100% (v/v) acetonitrile in water) in solvent A (0.1% (v/v) formic acid in water) for 120 min and then 20-40% for 60 min followed by an increase to 90% for 5 min. A flow rate of 300 nl/min was used through the whole gradient. An MS scan (400-1400 m/z) was recorded in the Orbitrap mass analyzer set at a resolution of 60,000 at 400 m/z, 1 × 10⁶ automatic gain control target, and 500-ms maximum ion injection time. The MS was followed by data-dependent collision-induced dissociation MS/MS scans on the eight most intense multiply charged ions in the LTQ at a 500 signal threshold, 3 m/z isolation width, 10-ms activation time at 35 normalized collision energy and dynamic exclusion enabled for 60 s. The general mass spectrometric conditions were as follows: spray voltage 2.0 kV; no sheath or auxiliary gas flow; S-lens 60%; ion transfer tube temperature 275°C. Raw data were processed by Mascot Distiller searching the SwissProt database (release, December 11, 2013, containing 541,954 entries) with an in-house Mascot database. The search parameters for the Mascot searches were as follows: taxonomy, Homo sapiens; enzyme, trypsin or chymotrypsin; variable modifications, oxidation (methylation); precursor tolerance, 20 ppm; and MS/MS fragment tolerance, 0.1 Da.
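A precursor tolerance given in ppm translates into an m/z window that scales with the ion's m/z. The following minimal sketch shows that conversion for the 20 ppm setting; the function names are illustrative and unrelated to the Mascot software.

```python
def ppm_window(mz, tol_ppm=20.0):
    """Lower and upper m/z limits for a given ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if the observed m/z deviates by at most tol_ppm from theory."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


low, high = ppm_window(1000.0)                 # window around m/z 1000
accepted = within_tolerance(1000.015, 1000.0)  # 15 ppm deviation
rejected = within_tolerance(1000.030, 1000.0)  # 30 ppm deviation
```

At m/z 1000 the window is 0.02 on either side, which is considerably tighter than the fixed 0.1 Da allowed for fragment ions.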
Gpc1-dHS Crystal Dehydration and Structure Determination-The full-length Gpc1-dHS protein crystallized in space group P2₁, containing four monomers in the asymmetric unit, with unit cell dimensions of a = 47.2, b = 169.0, c = 151.6 Å, β = 95.0°, and crystals diffracted to ~3 Å resolution (10). Similarly to crystals of Gpc1-dC, the Gpc1-dHS crystals were not isomorphous (with the c dimension varying between 148 and 155 Å) and diffracted anisotropically, as revealed by a significantly higher B factor in the c* direction than in the a* and b* directions (ΔB = 56 Å²) (39). Some parts of the structure, including the long C terminus, were disordered and not visible in the electron density map.
We have previously shown that controlled dehydration using the HC1b machine greatly improved the diffraction properties and amount of visible structure in Gpc1-dC crystals (12). Here, we investigated whether we could improve the diffraction quality of full-length Gpc1-dHS crystals using the same method, aiming to resolve at least some of the structure of the C-terminal region. We optimized the dehydration protocol for Gpc1-dHS crystals and succeeded in reproducing isomorphous dehydrated crystals with unit cell dimensions of a = 46.8, b = 166.6, c = 137.7 Å, β = 90.4°, which diffracted to 2.1 Å in the best orientation. This represents a reduction in the c axis length by 13.9 Å. Complete 2.3 Å data were collected from a crystal dehydrated to a final relative humidity of 87% with a dehydration rate of 0.5% per 200 s and total incubation time of 55 min in the humidified air stream of the HC1b machine (Table 2). The anisotropy ΔB (22 Å²) and Wilson B factor (38.7 Å²) of the new data were reduced by 61 and 27%, respectively, compared with nondehydrated crystals. These values reveal a significant improvement in lattice order and packing after dehydration, which generated much better and less noisy electron density maps, allowed the building of more complete monomers in the asymmetric unit (5% more than in 4AD7), and gave better defined side-chain density. The final Gpc1-dHS model included residues Pro-25-Asp-475 with only a few missing residues (Pro-350 to Arg-360 and Ser-408 to Asp-412). The overall model B factor fell from 74.8 to 59.1 Å² after dehydration. The backbone flexibility of Gpc1 as measured by the Cα B factors was significantly higher than average for residues close to the Asn-79 glycosylation site, the N-terminal region (including elements α1, L1, and part of α2), and parts of the protease site lobe (involving α4, α5, α11, and L2). The L3 loop was less flexible than the others, being stabilized by two disulfide bonds (Fig. 1B). Unfortunately, no additional electron density was observed for the C-terminal domain after dehydration, which confirms that the HS attachment region is highly flexible and has no unique structure in the Gpc1-dHS crystals. This raises the question whether the C terminus is intrinsically disordered or contains any locally ordered structure.
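The effect of dehydration on the lattice can be checked with simple arithmetic from the unit cell parameters quoted above. The sketch below uses the monoclinic cell volume V = a·b·c·sin β; the volume change is a derived quantity that is not reported in the text, and the variable names are illustrative.

```python
import math


def monoclinic_volume(a, b, c, beta_deg):
    """Unit cell volume (cubic angstrom) of a monoclinic cell."""
    return a * b * c * math.sin(math.radians(beta_deg))


def percent_reduction(before, after):
    """Relative decrease from before to after, in percent."""
    return 100.0 * (before - after) / before


# Cell parameters from the text (lengths in angstrom, beta in degrees).
native = dict(a=47.2, b=169.0, c=151.6, beta_deg=95.0)
dehydrated = dict(a=46.8, b=166.6, c=137.7, beta_deg=90.4)

c_shrink = native["c"] - dehydrated["c"]  # change in the c axis
volume_loss = percent_reduction(monoclinic_volume(**native),
                                monoclinic_volume(**dehydrated))
anisotropy_drop = percent_reduction(56.0, 22.0)  # delta-B before and after
```

Almost all of the volume change comes from the c axis, the same direction that showed the strongest anisotropy before dehydration.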
C-terminal Flexibility and Bioinformatics-Different intrinsic disorder predictors such as IUPred (40), PONDR-FIT (41), DISEMBL (42), POODLE (43), and DISpro indicated that the C terminus of Gpc1 (Asp-475-Thr-529) has a large tendency to be unstructured, with a slight propensity for higher order in the region from Gly-493 to Lys-506 (Fig. 1C). Furthermore, the secondary structure predictors PSIPRED (44) and JPRED (45) suggested the absence of secondary structure. However, the PROF (46), GLOBPLOT (47), and SSPro (48) predictors proposed a short low complexity sequence between Cys-494 and Ser-507. Taken together, these analyses imply that indeed most of the Gpc1 C-terminal sequence lacks tertiary structure and that the HS attachment region and the last part of the C termi-

Gpc1 proteins as cited in text | Characteristics
Gpc1-dC | C-terminally truncated Gpc1 carrying two N-glycans
Gpc1-dC-dN | EndoH-deglycosylated C-terminally truncated Gpc1
Gpc1-dHS | Full-length Gpc1 carrying two N-glycans but no HS substitution (by mutagenesis of S486A, S488A, and S490A)
Gpc1-dHS-dN | EndoH-deglycosylated full-length Gpc1
Gpc1-WT | Wild-type Gpc1 substituted with three HS chains and two N-glycans
Gpc1-WT-dHS | Gpc1-WT treated with HS lyase enzyme
Gpc1-N79Q | Gpc1 substituted with three HS chains and only one glycan at Asn-116 (by mutagenesis of N79Q)
Gpc1-N79Q-dHS | Gpc1-N79Q treated with HS lyase enzyme
Gpc1-N116Q | Gpc1 substituted with three HS chains and only one glycan at Asn-79 (by mutagenesis of N116Q)
Gpc1-N116Q-dHS | Gpc1-N116Q treated with HS lyase enzyme

N-Glycan Characterization-The structures of the N-glycans decorating the Gpc1 core protein were investigated by chromatographic approaches. The total N-glycan pools of Gpc1-WT, Gpc1-N79Q, and Gpc1-N116Q were assigned using exoglycosidase digestions and analysis on the 1.7-μm HILIC phase (Fig. 2). The resulting data confirmed the structural diversity of the glycans in all the samples. Emphasis was placed on the identification of the most abundant peaks to use in SAXS modeling (see below) rather than a comprehensive analysis of the structures of all N-glycans. The Gpc1-WT profile consisted of 44 chromatographic peaks (supplemental Table S1). The largest proportions of oligosaccharides on Gpc1-WT have core-fucosylated biantennary structures with bisecting GlcNAc carrying one galactose (FA2BG1) and N-glycan core structure with 4 N-acetylhexosamine residues (HexNAc), which represent 9.7% of the total glycan pool. Core-fucosylated biantennary structures with bisecting GlcNAc carrying two galactoses (FA2BG2), mannose-5 (M5), biantennary structure with bisecting GlcNAc carrying two galactoses and one sialic acid (A2BG2S1(6)), and finally biantennary glycan with bisecting GlcNAc and one galactose (A2BG1) accounted for 5.8, 4.6, 4.4, and 4% of the total glycan pool, respectively (supplemental Table S2).
A series of exoglycosidase digestions was performed to identify and assign N-glycan structures to particular chromatographic peaks. The undigested profile of Gpc1-WT consisted of 44 chromatographic peaks, but the majority of the peaks had less than 2% abundance (supplemental Table S1). A panel of digestions for Gpc1 N-glycans that includes the most commonly used exoglycosidase enzymes (α-sialidase, α-fucosidase, β-galactosidase, β-hexosaminidase, and α-mannosidase) is shown in Fig. 3. The digested glycans separated by HILIC have a logical movement in GU value, whereby each oligosaccharide residue can be accounted for by a constant value, depending on linkage. The GU shifts followed the consecutive removal of terminal sugar residues. After sequential digestion with exoglycosidases, we were able to assign the most abundant peaks.
After sialidase treatment (ABS digestion), biantennary structures with bisecting GlcNAc carrying two galactoses and one sialic acid (α2,6-linkage) with GU 8.30 digest back to biantennary structures with bisecting GlcNAc carrying two galactoses with GU 7.15. The digestion with NAN1 enzyme (a recombinant sialidase that removes α2,3-linked nonreducing terminal sialic acids) confirms that sialic acid is linked via an α2,6-linkage, as the addition of this enzyme does not move the peak. The peak with GU 7.65 does not move upon addition of sialidase enzymes, but upon addition of α-fucosidase, the peak moves back by 0.5 GU, suggesting that it is core-fucosylated. In addition, treatment with β-galactosidase resulted in peak movement by 1.6 GU, corresponding to two galactose residues. The most abundant peak with GU 6.86 appeared to contain two co-eluting structures. Digestion with β-galactosidase confirmed the presence of one galactose residue at the terminal end, but some proportion (30%) of the peak remained after addition of sialidase, fucosidase, galactosidase, and mannosidase, suggesting the presence of terminal HexNAc residues. Following enzymatic treatment with β-N-acetylhexosaminidase the peak changed elution position. Unfortunately, because of the specificity of the enzyme, it is not possible to confirm whether the terminal residue represents a GlcNAc or a GalNAc residue. Based on database matching, the structure could correspond to a tetra-antennary structure but also to the structure containing a LacdiNAc residue (GalNAcβ1-4GlcNAc), which seems more likely because HEK cells were previously reported to express this structural feature (49,50).
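The digestion logic described above, in which each removed residue shifts a peak by a roughly constant GU increment, can be sketched as a lookup table. The increments below are read directly from the shifts reported in this paragraph (1.15 GU for the α2,6-linked sialic acid, 0.5 GU for core fucose, and 1.6 GU for two galactoses); they depend on linkage and should be treated as values for this data set, not as general constants.

```python
# GU contribution per residue, taken from the shifts quoted in the text.
GU_INCREMENT = {
    "sialic_acid_a2_6": 1.15,  # GU 8.30 -> 7.15 after ABS sialidase
    "core_fucose": 0.5,        # shift observed after alpha-fucosidase
    "galactose": 0.8,          # 1.6 GU for two residues
}


def gu_after_digestion(gu, removed):
    """Predict the GU value after cleaving the given residues.

    removed: mapping of residue name to the number of residues cleaved.
    """
    return gu - sum(GU_INCREMENT[res] * n for res, n in removed.items())


after_sialidase = gu_after_digestion(8.30, {"sialic_acid_a2_6": 1})
after_fucosidase = gu_after_digestion(7.65, {"core_fucose": 1})
after_galactosidase = gu_after_digestion(after_fucosidase, {"galactose": 2})
```

Both the sialylated peak (GU 8.30) and the core-fucosylated peak (GU 7.65) converge on GU 7.15, consistent with a shared digalactosylated bisected biantennary core.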
The identification of GU 6.5 peak was primarily based on ␣-fucosidase digestion, as the peak did not change either its elution or migration position following enzymatic treatment, but its relative area increased due to digestion of the core-fucosylated analogue. The presence of a high mannose structure in the peak with GU 6.17 was confirmed by database matching and also with ␣-mannosidase digestion.  To identify glycans originating from each glycosylation site, the Gpc1 mutants of Gpc1-N79Q and Gpc1-N116Q were analyzed by HILIC-FLD-UPLC, allowing for identification of the most abundant structures from the Asn-116 and Asn-79 sites, respectively. Most of the structures overlapped between the two glycosylation sites; however, there were some trends specific for each one. Less complex glycans were more abundant at Asn-116 (Gpc1-N79Q), whereas more complex sialylated glycans were more prominent at Asn-79 (Gpc1-N116Q) (Fig. 2, B and C).
Biophysical Characterization of Purified Gpc1-Analytical SEC of Gpc1-dC and Gpc1-dHS displayed a main elution peak representing a Gpc1 monomer (Fig. 4A). The molecular mass of the Gpc1-dC monomer calculated from SEC (~62 kDa) was significantly smaller than that of the Gpc1-dHS monomer (~71 kDa), confirming the presence of the C-terminal domain in the dHS protein (verified by SDS-polyacrylamide gel). The homogeneity of the eluted protein fractions was assessed by DLS, without performing further concentration (Fig. 4B). The volume distribution plots of the fractions from the second half of the SEC peak of Gpc1-dC and Gpc1-dHS showed single, monodisperse peaks with a polydispersity index of 11-14% and similar patterns with Rh = 4.1 nm for Gpc1-dC and 4.25 nm for Gpc1-dHS. An estimation of the molecular mass assuming a globular protein suggested a mass of 89.4 ± 12.6 kDa for Gpc1-dC and 99.6 ± 27 kDa for Gpc1-dHS. The reason for such molecular mass overestimation can be the highly nonglobular shape of Gpc1 and/or the flexibility of the N-glycan chains decorating the core protein.
Structural Studies Using SAXS-To elucidate the structure of glycosylated Gpc1 in solution, SAXS data were collected for both the glycosylated monomeric Gpc1-dC and the Gpc1-dHS proteins using in-line SEC-SAXS (Fig. 4 and Table 3). No sign of protein aggregation was observed. The SAXS curves of Gpc1-dC and Gpc1-dHS were very similar over the whole q-range collected, except in the region between 0.8 and 1.5 nm⁻¹, producing similar Rg (~3.6 nm), but with a larger Porod volume for the full-length protein (~14% higher). The most accurate method for the estimation of the mass of the glycoproteins was provided by SAXSMOW calculation (51), with an expected uncertainty of 10%. In our hands, the SAXSMOW estimated masses of 67.0 and 74.9 kDa for Gpc1-dC and Gpc1-dHS, respectively, are consistent with monomeric proteins. Thus, the solution scattering data meet the essential requirements for extracting accurate monomer shape information.
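Rg values such as those quoted above are conventionally derived from the low-q part of a scattering curve through the Guinier approximation, ln I(q) = ln I(0) - (Rg²/3)q², valid roughly for q·Rg < 1.3. The sketch below is only an illustration of that relation on synthetic, noise-free data; the function name, the q grid, and the I(0) value are assumptions, and it is not the analysis pipeline used in this study.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I(q) = ln I0 - (Rg^2 / 3) * q^2 by ordinary least squares.

    Returns (Rg, I0). q and Rg share reciprocal units (here nm^-1 and nm).
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data: Rg = 3.6 nm (the approximate value in the text),
# I0 = 100 (arbitrary, hypothetical), q*Rg kept below 1.3.
RG_TRUE_NM = 3.6
I0_TRUE = 100.0
q_values = [0.05 + 0.01 * i for i in range(30)]  # nm^-1
intensities = [I0_TRUE * math.exp(-((RG_TRUE_NM * qi) ** 2) / 3.0) for qi in q_values]

rg_fit, i0_fit = guinier_fit(q_values, intensities)
print(f"Rg = {rg_fit:.3f} nm, I0 = {i0_fit:.2f}")
```

With real data the fit would be weighted by the experimental errors and restricted to the q-range where the approximation holds.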
To estimate the distribution of masses within the particle and its shape, the pair-distance distribution P(r) function was calculated from the scattering data using GNOM (52). The P(r) profiles of Gpc1-dC and Gpc1-dHS showed main peaks around 2.75 and 2.87 nm, respectively, with inclined distributions of vector lengths (Fig. 4C). This indicates that both proteins have an extended structure in solution, with maximum dimensions of 118.0 and 119.5 Å, respectively, consistent with the crystal structures. However, the P(r) plot of Gpc1-dHS exhibited an asymmetric fall-off accompanied by an extra shoulder around 5.8 nm that may be correlated to the C-terminal extension. The characteristic shape factor ρ (the Rg/Rh ratio) for a globular protein is ~0.774; however, when molecules deviate from globular shape to ellipsoidal, ρ increases, as Rg grows relative to Rh (53). Gpc1-dC has ρ = 0.9, whereas Gpc1-dHS has ρ = 0.87, which reveals a more elongated structure for the truncated version of Gpc1 than for the full-length one. This is consistent with the idea that the C-terminal region may extend perpendicular to the protein surface (see below). To gain insight into Gpc1 flexibility, Porod-Debye plots were calculated and displayed a loss of the plateau in the full-length protein compared with the truncated version (Fig. 4D). This implies that the scattering contrast had become more diffuse, suggesting increased flexibility in the presence of the C-terminal domain.
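The shape-factor argument can be followed numerically. For a uniform solid sphere, Rg = sqrt(3/5)·R while Rh = R, which gives the globular reference value of about 0.775. The sketch below combines that reference with the rounded Rg (~3.6 nm) and the DLS Rh values quoted earlier; because the Rg used here is the rounded figure rather than the per-sample values, the ratios are approximate and reproduce only the trend (both above the sphere value, Gpc1-dC above Gpc1-dHS). The function and variable names are illustrative.

```python
import math


def shape_factor(rg_nm, rh_nm):
    """Return the dimensionless shape factor rho = Rg / Rh."""
    return rg_nm / rh_nm


# Reference value for a uniform solid sphere: Rg = sqrt(3/5) * R and Rh = R.
SPHERE_RHO = math.sqrt(3.0 / 5.0)

# Rounded Rg from the SAXS section (assumption: same value for both proteins);
# Rh values from the DLS measurements.
RG_APPROX_NM = 3.6
rho_dc = shape_factor(RG_APPROX_NM, 4.1)    # Gpc1-dC
rho_dhs = shape_factor(RG_APPROX_NM, 4.25)  # Gpc1-dHS

print(f"sphere: {SPHERE_RHO:.3f}, Gpc1-dC: {rho_dc:.2f}, Gpc1-dHS: {rho_dhs:.2f}")
```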
The DAMMIN ab initio shape reconstruction program (28) was employed to generate 20 models of Gpc1-dC and Gpc1-dHS (Fig. 4C, insets, and Table 3) with good structural convergence, as reflected in the low normalized spatial discrepancy values following structural alignment (normalized spatial discrepancy 0.71 ± 0.02 and 0.64 ± 0.02 for Gpc1-dHS and Gpc1-dC, respectively; see Table 3). The resulting models revealed sizes and shapes consistent with the core protein substructures, in good agreement with the SAXS patterns (Fig. 4E). Although the N and C termini and the two N-glycans should contribute to the SAXS patterns, they were not resolved as features in the averaged low resolution envelopes. They would presumably appear as extra protrusions close to their attachment sites in the individual bead models, but these are lost during the averaging and filtering that generate the final conserved model. The Gpc1-dHS ab initio model showed an additional small conserved bulge that protruded ~10 Å from the middle of the protein, proximal to the last C-terminal residue visible in the crystal structure (Fig. 4C). To test the dependence of the ab initio envelopes on the program used to determine them, a different algorithm, namely GASBOR, was also tested. Both GASBOR and DAMMIN produced truncated and full-length models with similar overall shapes (data not shown).
Structural Reconstruction of the N-Glycans Decorating the Gpc1 Core Protein-To explore the N-linked glycan assembly on the Gpc1 core protein, we employed the Gpc1 crystal structure determined in this study to reconstruct the N-glycan structure using the SAXS data from Gpc1-dC (i.e. lacking the C-terminal extension). First, the structures of missing loops in Gpc1 were built and optimized using the ModLoop protocols (31). Subsequently, the predominant N-linked glycan structures and the N terminus were reconstructed using all-atom modeling (AllosMod, as described in the "Methods" section). Modeling with the most abundant glycan structures has been demonstrated to give good agreement with SAXS data (54). The predominant N-glycan structures vary between different batches of purified protein, so we assigned the structures of the predominant glycoforms from the same protein batch that was used for SAXS data collection, and these structures were used for further modeling. Monosialylated digalactosylated biantennary complex glycans (at position Asn-79) and core-fucosylated monogalactosylated biantennary complex glycans with bisecting GlcNAc (at position Asn-116) were the predominant glycoforms derived from the chromatographic data of purified Gpc1 without heparan sulfate (data not shown).
The simulated Gpc1-dC model without N-glycans gave a poor agreement with the observed SAXS data, even when the N terminus was reconstituted by modeling (Fig. 5A). In contrast, good agreement with the SAXS data was achieved with the glycosylated models; thus, the sampling of glycan conformations (accounting for ~9% of the total scattering mass) was crucial for generating accurate models. A comparison of the 10 best and poorest fitting AllosMod models of Gpc1-dC indicated an obvious difference in the glycan orientation (Fig. 5, A and B). In the 10 best models (χ² = 1.0 ± 0.05), the two glycan chains protrude outward from helix α2, and the N terminus spreads out close to the protein surface, whereas in the 10 worst fitting models (χ² = 3.99 ± 0.08), the glycan chains were localized close to the protein surface and the N terminus protruded away from the core.
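The χ² values used to rank models measure how well a theoretical scattering profile reproduces the experimental one after scaling. A minimal sketch of one common definition follows: the model is multiplied by the least-squares scale factor c, and the error-weighted squared residuals are divided by N - 1. Individual programs differ in details such as background terms and the exact normalization, so this is an assumed generic form rather than the formula implemented in AllosMod or CORAL, and the numbers are invented for illustration.

```python
def chi_square(i_exp, sigma, i_calc):
    """Reduced chi-square between experimental and scaled model intensities.

    The scale factor c minimizes sum(((I_exp - c * I_calc) / sigma)^2).
    Returns (chi2, c).
    """
    numerator = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    denominator = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = numerator / denominator
    residual = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return residual / (len(i_exp) - 1), scale


# Hypothetical intensities and errors, for illustration only.
experimental = [10.0, 8.0, 5.0, 2.0]
errors = [0.5, 0.5, 0.4, 0.3]
matching_model = [5.0, 4.0, 2.5, 1.0]     # same shape, different scale
mismatched_model = [5.0, 4.5, 3.5, 2.0]   # different shape

chi2_match, scale_match = chi_square(experimental, errors, matching_model)
chi2_mismatch, _ = chi_square(experimental, errors, mismatched_model)
print(chi2_match, scale_match, chi2_mismatch)
```

A value near 1 indicates that the residuals are comparable to the experimental errors; values well above 1, as for the worst fitting models, indicate systematic disagreement.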
Fifty CORAL runs (26) were carried out to check the Gpc1-dC model reproducibility using a different algorithm than AllosMod. The proposed CORAL models were consistent throughout different modeling runs. The five CORAL models having the best agreement with the SAXS data (χ² = 0.9 ± 0.002) had the glycan chains consistently pointing outward in similar orientations to the best models from AllosMod (data not shown). Nonetheless, AllosMod has the advantage of an all-atom modeling approach to obtain stereochemically sound models.

FIGURE 5. Molecular modeling of Gpc1 N-glycans and the N and C termini by SAXS.
A, comparison of experimental scattering data of the Gpc1-dC protein (gray dots) and the fit of the deglycosylated model (green), the AllosMod best models (red), worst models (blue), and MES models (orange), respectively. D, experimental SAXS data of the Gpc1-dHS (gray dots) and their agreement with the theoretical SAXS profiles calculated for Gpc1 truncated model without the C terminus (green), the AllosMod best (red), worst (blue), and MES models (orange), respectively. The Gpc1-dC models are shown in B and C; the corresponding Gpc1-dHS models are shown in E and F. The core protein is shown as a light pink schematic with N-glycans displayed as spheres colored with the corresponding fit color in A. All the models are docked in the ab initio gray envelope of Gpc1-dC (B and C) and Gpc1-dHS (E and F).
Because of the inherent flexibility of the N-glycan chains, MES was attempted to improve the agreement with the SAXS data by including up to four models from each ensemble (33). MES provided a modest improvement in the consistency with the experimental SAXS data over the single model (χ² = 0.87 versus 0.94 for the best single model) (Fig. 5, A and C). Both of the N-glycans were modeled in proximity to high B factor regions of the core protein crystal structure (Fig. 1B) and oriented alongside helix α2 with higher locational variability for Asn-79 glycans than for those on Asn-116. The Asn-79 glycan chains diverged widely over an area, including the N-terminal helix α1, the hydrophobic part of L1 (GFSLSDVPQA), and the beginning of the L3 loop, suggesting that they are not involved in specific interactions with the core protein. In contrast, the Asn-116 glycan chains were oriented in the vicinity of the end of L2 with smaller divergence (Fig. 5C).
Spatial Occupancy of the Gpc1 C Terminus in Solution-To elucidate the structural orientation of the C-terminal residues and to allow an estimation of the distance between Gpc1 and the cell membrane, similar modeling strategies were used to build models that agree with the Gpc1-dHS SAXS data. The best N-glycosylated truncated Gpc1 structure lacking the C-terminal residues had poor agreement with the Gpc1-dHS SAXS data (χ² = 1.73, Fig. 5D). Therefore, generating Gpc1-dHS SAXS-consistent models requires capturing the correct N and C termini and glycan orientations. The C terminus is not folded around the core protein in the best fitting models (χ² = 0.960 ± 0.007; Fig. 5E) but is rather extended toward the periphery, stretching for ~40 Å from the last helix in the crystal structure (α14). In the poorest fitting models (χ² = 4.6 ± 0.1), the C terminus was even more highly extended (>70 Å). Furthermore, the modeled positions of the N terminus and N-glycan chains in the best and worst models generated by AllosMod were highly similar in both Gpc1-dC and Gpc1-dHS, which strengthens our confidence in the interpretation of these structural features. The CORAL approach reproduced the characteristic extension of the C terminus obtained by the AllosMod method with comparable agreement with the experimental SAXS data (χ² = 0.92 ± 0.008; data not shown).
Because of the high number of variables in Gpc1-dHS modeling (N and C termini and two glycan chains), the probability of capturing a single conformation that most correctly explains the SAXS pattern is quite limited. MES was successful in improving the fit over a single model (χ² of 0.9 and 0.95, respectively). The best fitting C-terminal conformations form a compact cluster, spreading between 35 and 40 Å in a direction perpendicular to the core protein and mainly localized between the central and the protease site lobes (Fig. 5F). Furthermore, EOM modeling (34) was used to check the consistency of the MES modeling processes. The EOM ensemble of models of Gpc1-dHS showed the C terminus in the same orientation as in the MES results but with a shorter extension (between 30 and 35 Å; data not shown). The Gpc1 membrane-proximal surface, defined by this orientation of the C terminus, is composed of the conserved α14, L1, and L3 in the Cys-rich lobe and α4 and α5 in the protease-site lobe (Fig. 10). The Asn-79 glycan chains are directed perpendicular to the C terminus, whereas the Asn-116 glycans are located on the opposite surface of the protein.
Does Gpc1 Self-associate in Solution?-In the SEC experiments, when we increased the loading concentration of Gpc1-dC to ≥1.5 mg/ml, the position of the main peak moved toward a higher mass, and a small shoulder was eluted at 12.4 ml (Fig. 6A), suggesting a protein dimer. This shift in the eluted peaks might originate from various monomer/dimer mixtures in the solution. SAXS measurements of Gpc1-dC at different concentrations (0.75-6 mg/ml) showed apparent changes in the low-q region indicative of concentration-dependent self-association or oligomerization (Fig. 6B and Table 4). A small change in the SAXS curves (particularly at 6 mg/ml) beyond the Guinier region was observed when compared with Gpc1-dC SEC-SAXS data, which supports the emergence of dimer species at high concentrations. Furthermore, the measured Rh of Gpc1-dC from DLS was directly proportional to the protein concentration, with a dramatic difference between the monomer (collected from the SEC fractions) and concentrated samples (Table 5). Importantly, the intermediate samples showed a high polydispersity index, suggesting the presence of multiple species.

FIGURE 6. Comparative studies of Gpc1-dC and Gpc1-dHS at different protein concentrations.
A, analytical SEC profiles showing the concentration effect on the elution volume. One hundred μl of 0.2 mg/ml (light blue), 0.5 mg/ml (blue), 1.4 mg/ml (orange), 3.4 mg/ml (green), and 6 mg/ml (red) solutions of Gpc1-dC and 1 mg/ml (violet) and 6 mg/ml (brown) Gpc1-dHS solution were loaded onto the Superdex 200 column. SAXS curves of Gpc1-dC (B) and Gpc1-dHS (C) at protein concentrations of 0.75 mg/ml (orange), 3 mg/ml (green), and 6 mg/ml (red) in a comparison with the SAXS pattern of the equivalent monomer (blue) collected from SEC-SAXS setup. The related P(r) plots are shown in insets in B and C.
SEPTEMBER 18, 2015 • VOLUME 290 • NUMBER 38

The DLS data show that at 6 mg/ml concentration, Gpc1-dC forms a highly extended dimer with a shape factor of 1.2. Moreover, its SAXS P(r) plot showed a main peak at 2.7 nm with an extra shoulder at 5.9 nm, introduced by the dimerization (Fig. 6B, inset). Hybrid modeling by DAMMIN and CORAL (26) of the SAXS data collected at 6 mg/ml suggested an elongated architecture for the dimer in which the N terminus was predominantly extended away from the protein (Fig. 7). Gpc1-dC dimerized in the vicinity of the protease site lobe, particularly the region of α5 and α6, which is close to the location of the C terminus in the Gpc1-dHS SAXS models (Fig. 7). In contrast, little SEC peak shift or change in Rh was observed for the Gpc1-dHS samples (Fig. 6A). Furthermore, all SAXS patterns collected at different Gpc1-dHS concentrations have similar shapes, with little difference in the Guinier region and P(r) when compared with the Gpc1-dHS monomer SAXS curve (Fig. 6C). Taken together, this indicates that self-association does not occur for Gpc1-dHS in the tested concentration range of 0.75 to 6 mg/ml and suggests that the presence of the C-terminal tail prevents unwanted interactions of the core proteins.
N-Glycosylation Protects Aggregation- and Degradation-prone Regions of Gpc1-EndoH enzymatic removal of the N-glycans from Gpc1-dHS was achieved as described under "Experimental Procedures," producing the deglycosylated Gpc1-dHS-dN with a mass ~5 kDa lower than that of the glycosylated protein, as shown on an SDS-polyacrylamide gel (Fig. 8A). Importantly, SAXS measurements of Gpc1-dHS-dN showed little concentration-dependent behavior, displaying invariability in the q-range beyond the Guinier region, which argues against variation in the oligomeric structure in the concentration range 0.7-2.5 mg/ml. However, models generated by DAMMIN and CORAL using the merged SAXS data of dHS-dN revealed an elongated dimeric structure rather than a monomer (Fig. 8, B and C). Gpc1-dHS-dN dimerized through parts of the Cys-rich lobe that are covered by the Asn-79 glycans in the glycosylated Gpc1. Removing the N-glycans from Gpc1-dC resulted in complete aggregation during purification. Furthermore, the mutant N79Q showed a high propensity for aggregation after removing the HS chains (Gpc1-N79Q-dHS; data not shown). Taken together, these data suggest a vital role for Asn-79 glycans in preventing protein aggregation, at least in vitro. The solubility difference between Gpc1-N79Q-dHS and deglycosylated Gpc1-dHS might originate from the first GlcNAc residue linked to Asn-79, which is not removed by EndoH. This GlcNAc confers most of the thermal stabilization effect of the N-glycans on Gpc1, as reported previously (13), and it seems likely that it also confers solubility enhancement.
In addition to aggregation, the Gpc1-N79Q samples showed degradation on SDS-polyacrylamide gels, with a strong extra band at ~47 kDa (Fig. 9A), which indicates that the lack of Asn-79 glycosylation renders N79Q susceptible to intracellular and/or extracellular proteolytic degradation. This proteolysis site was identified by mass spectrometric analysis and shown to be close to the beginning of the L3 loop (between Lys-404 and Arg-414). This evidence reveals a role for Asn-79 glycans in protecting against degradation of Gpc1.
We used highly monodisperse preparations of purified Gpc1-WT-dHS and Gpc1-N116Q-dHS (after removal of the HS by HS lyase) for SAXS data collection at different protein concentrations (up to 3 mg/ml) (Fig. 9, A and B). The HS lyase enzyme degraded the HS chains but did not remove the tetrasaccharide linker attached to the serine. The Gpc1-WT-dHS and Gpc1-N116Q-dHS scattering profiles were smoother than for the proteins without the HS linkers and had almost identical concave patterns accompanied by negligible differences in Rg and Dmax (Table 4), without concentration-dependent self-association or aggregation at the tested concentrations. Their molecular masses and Porod volumes were highly overestimated, presumably because of the flexibility of the partially glycosylated C-terminal tails (bound to three tetrasaccharide linkers) and/or the presence of some oligomers. The pair distribution functions of Gpc1-WT-dHS and Gpc1-N116Q-dHS suggested similar shapes, with Gpc1-WT-dHS being slightly larger due to its extra Asn-116-linked glycan chain (Fig. 9C). The generated ab initio envelopes featured similar overall expanded shapes, with a larger volume for the WT samples. Attempts to model Gpc1-WT-dHS and Gpc1-N116Q-dHS using AllosMod and CORAL failed due to the complexity at the C terminus, with three flexible tetrasaccharide chains in a very confined region; thus the programs were unable to model them correctly. We concluded from comparison of Gpc1-WT-dHS and Gpc1-N116Q-dHS that no structural changes in Gpc1 (at the resolution level of SAXS) could be noticed after disrupting the Asn-116 site, and therefore that this glycan might have other functional roles.

Discussion
Glypicans are multifunctional GAG-substituted proteoglycans involved in the regulation of several cellular signaling pathways. Defects in their function lead to developmental distortions (2, 55). The regulatory activity of Gpcs is based on their ability to either inhibit or stimulate the interaction of many growth factors with their signaling receptors. GAG chains are responsible for many of the biological functions of Gpcs. However, recent studies suggest regulatory roles for Gpc core proteins, for example in mediating cell signaling by direct binding to e.g. BMP4, FGF2, Wnt, and Hedgehog (56-58).
The crystal structure of the Gpc1 core protein reveals a quite rigid, elongated single-domain α-helical fold, although flexibility is higher toward the ends. Almost identical disulfide connectivity patterns and structural similarity between the Cys-rich lobe of Gpc1 and the Cys-rich domain of the Frizzled receptors (functioning in Wnt signaling) have been reported (59), but whether this has functional relevance is still unknown.
Structural Features of the Gpc1 N-Glycans-The long α2 helix of the Gpc1 structure traverses the entire length of the protein and carries two N-glycan chains, one at each end (10). The Gpc1 N-glycan chains are not fully resolved in the crystal structure, due to intrinsic heterogeneity and flexibility. Glycoproteins are usually micro-heterogeneous, presenting a range of different N-glycans at each glycosylation site. The expression system and/or the folded protein itself can influence the N-glycan diversity, probably by affecting the substrate availability and the proximity of the N-glycosylation sites to the Golgi glycosyltransferases and glycosidases (60).
The N-glycan analyses presented here have revealed highly heterogeneous complex-type glycoforms for Gpc1, but it is not clear whether this heterogeneity has a functional impact. Characteristics of the Asn-79-linked glycans include a high incidence of core fucosylation and sialylation, presenting relatively strong electronegative charges. Based on the SAXS modeling, we found that these glycans cover the hydrophobic patches on the Cys-rich lobe, including parts of the L1 loop and the α1 and α2 helices, that would otherwise promote aggregation. Furthermore, these Asn-79 oligosaccharides seem to limit intra- and/or extracellular proteolysis of the Gpc1 core protein, specifically at the beginning of the L3 loop, extending the protein stability and lifetime, whereas removal of Asn-116 glycans does not affect the overall structural characteristics of Gpc1.
Topology of the Gpc1 Core Protein, N- and O-Glycans with Respect to the Cell Surface-The Gpc1 C-terminal region lacks significant secondary or tertiary structure and is disordered in Gpc1-dHS crystals, even after controlled crystal dehydration, which otherwise significantly improves crystalline order, resolution, and diffraction anisotropy. In view of this, we coupled our crystallographic knowledge on Gpc1 with SAXS to obtain information on the localization of the C terminus and thus the potential spatial orientations of Gpc1 relative to the cell surface. Our results show that the C-terminal domain is highly flexible and extends ~35-40 Å from the core protein. Molecular dynamics simulations have shown that the GPI anchor of many glypiated proteins is highly flexible but that it nevertheless can retain the proteins at distances between 9 and 13 Å from the cell surface, with minor impact on the proteins' degrees of freedom for movement and orientation relative to the membrane (61, 62). Accordingly, by adding the length of the GPI anchor to that of the C-terminal tail, we suggest that the Gpc1 core protein is located ~44-53 Å from the cell membrane. This distance is likely sufficient for HS assembly enzymes or even a membrane receptor candidate to interact with the membrane-proximal surface of Gpc1. It seems likely that Gpc1 "lies down" in a transverse orientation to the membrane, with the Gpc1 orthologs' evolutionarily conserved surface (L1 and L3 loops, α4, α5, and α14 helices) facing the cell surface. No evolutionary conservation has been detected for the other surfaces of Gpc1 (Fig. 10). Evolutionary conservation of these surface-exposed residues suggests that they are implicated in interaction with other macromolecules related to Gpc1 function. In the Drosophila glypican Dally-like core protein (lacking the HS chains), structure-guided mutational evidence suggested that helices α4 and α5 are important for mediating Hedgehog signaling (63), consistent with these being oriented toward the membrane. The flexibility of the C-terminal region presumably allows great freedom for Gpc1 and other glypicans to reorient to accommodate binding to receptors and other signaling molecules, generally with the participation of the HS chains.
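The membrane-distance estimate given above is a sum of two ranges: the extension of the C-terminal tail observed by SAXS and the GPI-anchor distance taken from the cited molecular dynamics work. The short sketch below makes that arithmetic explicit; the helper name is illustrative, and simple addition of the end points assumes the two segments extend along the same direction.

```python
def add_ranges(first, second):
    """Add two (min, max) distance ranges end point by end point."""
    return (first[0] + second[0], first[1] + second[1])


C_TERMINAL_TAIL_ANGSTROM = (35.0, 40.0)  # extension from the core protein (SAXS)
GPI_ANCHOR_ANGSTROM = (9.0, 13.0)        # protein-to-membrane distance (refs. 61, 62)

core_to_membrane = add_ranges(C_TERMINAL_TAIL_ANGSTROM, GPI_ANCHOR_ANGSTROM)
print(core_to_membrane)  # (44.0, 53.0)
```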
Bioinformatics tools predict a highly disordered structure for the anionic linker (10 residues) between the folded core protein and the GAG attachment site (at least 10-15 Å). Consequently, we can hypothesize that the HS attachment sites are located more than 30 Å from the membrane, and therefore they could mediate the interaction of glypicans with other cell surface proteins. Previous work has reported that the HS chains do not stabilize, and probably do not interact with, the Gpc1 core protein (38), which is consistent with the extended C-terminal structure that we observe. However, the HS chains on Gpc1 protect the protein against irreversible aggregation, probably by providing the protein additional negative charge, resulting in electrostatic repulsion.
FIGURE 9. SAXS characterization of nonglycosylated Gpc1 variants.
A, SDS-polyacrylamide gel of different expressed Gpc1 mutants with and without disruption of the glycosylation sites before and after the enzymatic removal of HS chains. B, SAXS patterns with the DAMMIN fit of Gpc1-WT-dHS (red) and Gpc1-N116Q-dHS (green), with their corresponding Guinier plots as an inset. C, intraparticle distance distribution P(r) of Gpc1-WT-dHS (red) and Gpc1-N116Q-dHS (green) with their calculated filtered averaged ab initio models.

Hence, we predict that the anionic HS chains extend from the C-terminal region in a plane initially approximately parallel to the membrane, then more divergent from it, to avoid repulsive interactions with the negatively charged phospholipid bilayer (Fig. 10). Glypican HS chains can contain 50-150 repeated disaccharides. Recent work indicates that short 12-disaccharide HS polymers are extended in solution, with a partly bent conformation, up to 10 nm in length (64). Therefore, in mature Gpc1, the HS chains could be extended to 3-10 times as long as the compact cylinder of the core protein (40-130 nm). Therefore, Gpc1 could bind to the signaling molecules directly via its core protein, through its extended HS side chains, or both. To what extent the HS chains might contribute to the conformation or stability of the C-terminal region requires further investigation.
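The HS chain-length estimate can be reproduced with a simple proportional scaling. The sketch below assumes that the extended length grows linearly with the number of disaccharides, using the upper figure of 10 nm per 12 disaccharides, and compares the result with a core protein length of about 11.8 nm (the SAXS maximum dimension of 118 Å reported for Gpc1-dC). Linear scaling and the choice of core length are assumptions made for illustration; long chains in solution may well deviate from them.

```python
REFERENCE_DISACCHARIDES = 12
REFERENCE_LENGTH_NM = 10.0   # upper estimate for a 12-disaccharide HS polymer (ref. 64)
CORE_PROTEIN_LENGTH_NM = 11.8  # assumption: SAXS maximum dimension of Gpc1-dC (118 A)


def extended_length_nm(disaccharides):
    """Estimate HS chain length assuming length scales linearly with chain size."""
    return disaccharides * REFERENCE_LENGTH_NM / REFERENCE_DISACCHARIDES


short_chain_nm = extended_length_nm(50)
long_chain_nm = extended_length_nm(150)
ratio_short = short_chain_nm / CORE_PROTEIN_LENGTH_NM
ratio_long = long_chain_nm / CORE_PROTEIN_LENGTH_NM

print(f"{short_chain_nm:.0f}-{long_chain_nm:.0f} nm, "
      f"{ratio_short:.1f}-{ratio_long:.1f} times the core length")
```

Under these assumptions the chains span roughly 42-125 nm, in line with the 40-130 nm range and the 3-10-fold ratio given in the text.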
The Asn-116-linked oligosaccharides protrude from the membrane-distal surface of the core protein, which limits their potential interactions with the lipid bilayer and the HS. In contrast, the bulky negatively charged Asn-79 glycans are directed perpendicularly to the evolutionarily conserved membrane-proximal Gpc1 surface (Fig. 10).
Membrane-proximal Surface on Gpc1 and HS Assembly-Glypiated proteins are often associated into ordered microdomains of the membrane called lipid rafts. However, it has been shown that the GPI anchor does not target the glypiated protein to the apical surface of epithelial cells, although the N-glycans do (65). Thus, it is reasonable to suggest that the evolutionarily conserved surface of the core protein and/or the Asn-79 glycans have a dual role; they assist both in targeting and associating the Gpc1 protein to the consensus lipid raft domains of the cell surface and in GAG substitution on the C-terminal domain.
Gpc1 is exclusively substituted with HS, whereas other glypicans like Gpc5 possess both HS and CS (66). Chen et al. (67) concluded that the core protein plays a vital role in directing the assembly of HS rather than CS on rat Gpc1, as expression of the GAG attachment domain without the core protein results in substitution with ~90% CS. Moreover, mutational analysis shows that the L3 loop, α14, and other nearby parts of the core protein are required for preferential HS assembly (67). Previous work and the present data both confirm that the N-glycosylations of Gpc1 are not involved in GAG class determination but do affect the amount of HS substitution and also chain elongation (13). Accordingly, parts or all of the surface-conserved elements (L1, L3, α4, α5, and α14) would be predicted to be involved in GAG class determination and synthesis by interacting with some components of the GAG biosynthetic pathway, e.g. glycosyltransferases, necessary cofactors for the enzyme activity, or some components required for trafficking the protein within the Golgi apparatus to regulate the HS substitution on Gpc1. HS biosynthesis is mediated by Golgi apparatus transmembrane glycosyltransferases of the exostosin (EXT) family, which initiate, elongate, and terminate HS backbone formation (68). Five members have been identified in mammals, including EXT1, EXT2, EXTL1, EXTL2, and EXTL3. In general, the published results so far suggest that EXTL3 works as an initiator of HS chain biosynthesis, as no HS was detected in 9-day-old mouse embryos lacking EXTL3 (69). Future work would be of importance to determine precisely how the Gpc1 core protein regulates HS class determination and assembly. This could be by searching for the Gpc1 interacting partner from the exostosin family and in particular by studying its affinity and interaction with EXTL3 as an HS initiator enzyme. Furthermore, systematic mutagenesis studies of the evolutionarily conserved surface structural elements would be helpful to map more distinctly the residues of the Gpc1 core protein that might be involved in preferential HS assembly.

FIGURE 10. Predicted topology of Gpc1 on the membrane.
A, schematic overview of the Gpc1 structure represented in rainbow colors (blue, N terminus; red, C terminus) with N-glycans as red spheres. The model is aligned with a transparent surface showing the sequence conservation of surface residues, colored as in the inlaid figures. The inset smaller figures show the structural conservation of surface-exposed residues in Gpc1 orthologs (Gpc1 human, P35052; Gpc1 mouse, Q9QZF2; Gpc1 rat, P35052; Gpc1 bovine, Q2KJ65; Gpc1 chick, P50593; Gpc1 zebrafish, Q1LXM6; Gpc1 turkey, G1MQ88; Gpc1 chimpanzee, K7BMY8) as generated using the ConSurf server (70) and presented by PyMOL as spheres colored from turquoise (variable) to purple (conserved). B, supposed spatial orientation of Gpc1 on the cell membrane. The membrane is shown as a gray lipid bilayer with an orange GPI anchor connected to the Gpc1 C terminus, which carries three HS chains (blue).