Crystal Structure of the Carbohydrate Recognition Domain of p58/ERGIC-53, a Protein Involved in Glycoprotein Export from the Endoplasmic Reticulum

p58/ERGIC-53 is an animal calcium-dependent lectin that cycles between the endoplasmic reticulum (ER) and the Golgi complex and appears to act as a cargo receptor for a subset of soluble glycoproteins exported from the ER. We have determined the crystal structure of the carbohydrate recognition domain (CRD) of p58, the rat homologue of human ERGIC-53, to 1.46 Å resolution. The fold and ligand binding site are most similar to those of leguminous lectins. The structure also resembles the CRD of the ER folding chaperone calnexin and the neurexins, a family of non-lectin proteins expressed on neurons. The CRD comprises one concave and one convex beta-sheet packed into a beta-sandwich. The ligand binding site resides in a negatively charged cleft formed by conserved residues. A large surface patch of conserved residues, with a putative role in protein-protein interactions and oligomerization, lies on the side opposite the ligand binding site. Together with previous functional data, the structure defines a new and expanding class of calcium-dependent animal lectins and provides a starting point for understanding glycoprotein sorting between the ER and the Golgi.

Secretory and membrane proteins undergo a quality control process that ensures their proper folding, oligomerization, and maturation before exit from the endoplasmic reticulum (ER) (1). This is followed by the critical step of selecting the cargo proteins to be exported from the ER to the Golgi complex via the ERGIC (ER-Golgi intermediate compartment). Export from the ER is mediated by COPII vesicles (2). Although many proteins may be incorporated into COPII vesicles by default (bulk-flow transport), export is now generally believed to be an active, regulated process in which cargo molecules are first localized to ER exit sites and then selectively incorporated into COPII vesicles by one of several mechanisms (2,3). Soluble proteins may be exported selectively through interaction with export receptors that carry motifs recognized by COPII coat components.
The p58 and ERGIC-53/MR60 proteins are the most commonly used markers for the ERGIC (4,5). p58 (4) and ERGIC-53 (5) were originally identified as proteins reacting with antibodies prepared against Golgi membrane fractions, whereas MR60 was identified as a mannose-binding protein (6). Subsequent cDNA cloning showed that ERGIC-53 (7) and MR60 (8) were identical to each other and represented the human homologue of the rat p58 protein (9).
p58/ERGIC-53/MR60 is a nonglycosylated type I transmembrane protein with a lumenal domain, a transmembrane domain, and a short cytoplasmic domain. In cells, it is present as dimers and hexamers (9,10). The lumenal domain can be divided into two subdomains, an N-terminal carbohydrate recognition domain (CRD) (residues 31-285) and a membrane-proximal α-helical coiled domain (residues 290-460) (10). The cytoplasmic tail carries a KKFF sequence at its extreme C terminus that is essential for both ER exit and retrieval from the Golgi (11), consistent with the cycling of p58/ERGIC-53/MR60 between the ER and the Golgi compartment (12).
The identification of MR60 as a mannose-binding protein (8) and mutagenesis studies of ERGIC-53 that abolished its calcium-dependent binding of mannose (13) supported the suggestion that p58/ERGIC-53 serves as a mammalian intracellular lectin (14). This proposal also rested on the similarity between ERGIC-53 and VIP-36, a membrane protein isolated from Madin-Darby canine kidney cells (15) that binds mannose and localizes to the early secretory pathway (16,17). It was proposed that p58/ERGIC-53 and VIP-36 constitute a new class of animal lectins (14). Blocking the export of ERGIC-53 from the ER impaired, but did not abolish, export of the lysosomal enzyme cathepsin C and of a cathepsin Z-related protein (18). Furthermore, ERGIC-53 could be cross-linked to the cathepsin Z-related protein. The glycan structure bound by ERGIC-53 in the ER was suggested to be a nine-mannose form (Man9), i.e. the asparagine-linked core glycan from which the three terminal glucose residues have been trimmed (19). Efficient secretion of coagulation factors V and VIII requires functional ERGIC-53 (20), and mutations in the gene encoding ERGIC-53 cause a rare hereditary bleeding disorder, combined deficiency of factors V and VIII (21). Taken together, these data strongly suggest that p58/ERGIC-53/MR60 functions as a lectin-like receptor that facilitates ER export of a subset of secretory glycoproteins (22).
As a first step toward a better understanding of how glycans on cargo molecules interact with the lectin receptor, we have determined the crystal structure of the CRD of p58. The structure reveals a link between leguminous and animal lectins and defines a new class of animal calcium-dependent lectins that function in the secretory pathway.

EXPERIMENTAL PROCEDURES
Protein Production-The CRD of rat p58/ERGIC-53 (residues 31-285) was defined by a combination of sequence alignment, secondary structure prediction, and limited proteolysis. The CRD construct (including the N-terminal signal sequence, residues 1-30, and a 6xHis tag) was produced in insect cells using a baculovirus vector and purified as described elsewhere (23). The region encompassing residues 286-478, which is absent from the construct described here, is the oligomerization domain of p58/ERGIC-53 and is required for dimerization and hexamer formation (10). Consistent with its absence, the CRD is monomeric in solution as assayed by native gel electrophoresis and gel filtration chromatography (data not shown).
Crystallization-Crystals were grown by vapor diffusion from hanging drops containing equal volumes of well solution and a 10 mg/ml protein solution in 10 mM Tris-HCl, pH 7.5, 1 mM CaCl2. The drops were equilibrated against 1 ml of well solution consisting of 100 mM Na-HEPES, pH 7.25, 1.6 M Li2SO4, and 10 mM EDTA, as described previously (23). The crystals belong to the orthorhombic space group I222, with cell dimensions a = 49.6 Å, b = 86.1 Å, and c = 128.1 Å, and contain one monomer in the asymmetric unit (23).
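As a rough consistency check on one monomer per asymmetric unit, the unit-cell volume and the Matthews coefficient, V_M = V/(Z·M), can be computed from the reported cell dimensions. The sketch below is illustrative only: the molecular weight of about 28 kDa is an assumption (roughly 255 residues at ~110 Da each, plus the tag), not a value given in the text, and the solvent fraction uses the common approximation 1 - 1.23/V_M.

```python
# Unit-cell volume and Matthews coefficient for the orthorhombic I222 cell.
# ASSUMPTION: the 28 kDa molecular weight is a hypothetical estimate,
# not a value reported in the text.

# Reported cell dimensions (Å); all angles are 90° in an orthorhombic cell.
A, B, C = 49.6, 86.1, 128.1

# I222: 4 point-group operators x 2 (body centring) = 8 asymmetric units.
ASU_PER_CELL = 8
MOLECULES_PER_ASU = 1
ASSUMED_MW_DA = 28_000.0  # hypothetical molecular weight in daltons


def orthorhombic_volume(a: float, b: float, c: float) -> float:
    """V = a * b * c for a cell whose angles are all 90 degrees (Å^3)."""
    return a * b * c


def matthews_coefficient(volume: float, z: int, mw_da: float) -> float:
    """V_M = V / (Z * MW) in Å^3/Da, where Z is molecules per unit cell."""
    return volume / (z * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content of the crystal: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    v = orthorhombic_volume(A, B, C)
    z = ASU_PER_CELL * MOLECULES_PER_ASU
    vm = matthews_coefficient(v, z, ASSUMED_MW_DA)
    print(f"V = {v:.0f} A^3; V/Z = {v / z:.0f} A^3 per molecule")
    print(f"V_M = {vm:.2f} A^3/Da; solvent fraction ~ {solvent_fraction(vm):.0%}")
```

Under this assumed molecular weight, V_M comes out near 2.4 Å³/Da, within the range typical of protein crystals, so a single monomer per asymmetric unit is plausible.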
Data Collection-X-ray data used for structure determination were collected on a MAR 300-mm image plate detector mounted on a Rigaku R200 x-ray generator operating at 50 kV and 90 mA, with the crystals held at 110 K. Crystals were transferred to a solution containing 1.2 M Li2SO4, 0.1 M Na-HEPES, pH 7.25, and 20% PEG 400, allowed to equilibrate for a few seconds, and frozen in a stream of nitrogen gas. The high-resolution data set used for refinement was collected at beamline 711 of the MAX Laboratory synchrotron radiation source (Lund, Sweden). For this data set, the crystal was transferred to mother liquor containing 20% ethylene glycol before freezing. All data were processed with DENZO and SCALEPACK (24).
Structure Determination and Refinement-The structure was solved by multiple isomorphous replacement using five heavy-atom derivatives (Table I). Heavy-atom sites were located and phases calculated with SOLVE (25), using a native data set collected at the home source as the reference. The initial electron density map, calculated at 2.7 Å, was of sufficient quality for automated model building of most residues and for phase extension to 1.46 Å with wARP (26). This was followed by multiple cycles of refinement in the Crystallography & NMR System (CNS) (27), using a maximum likelihood target and bulk solvent correction, interspersed with model building in O (28). The final cycles of refinement were performed with Refmac5 (29), which gave faster convergence and lower R and Rfree values. Crystallographic data have been deposited in the Protein Data Bank (PDB) (accession code 1GV9 for the coordinates and R1GV9SF for the structure factors).

RESULTS AND DISCUSSION
Structure Determination of p58-The crystal structure of the CRD of p58 was solved by multiple isomorphous replacement (Table I). The initial electron density map was of sufficient quality to allow tracing of most residues in the structure. The model was subsequently refined to Rwork and Rfree values of 19.1 and 21.1%, respectively. All data collection, phasing, and refinement statistics are summarized in Table I. The final model contains residues 50-277 of p58, 197 water molecules, and 2 sulfate ions. Residues 1-30 constitute the signal sequence, which is cleaved upon secretion of the protein into the medium and is therefore absent from the mature protein. Residues 31-49 and the 6xHis tag inserted between residues 34 and 35 (23) are not visible in the electron density maps. There is also no density for the final 8 residues of the construct (residues 278-285) or for residues 165-169 of the p58 sequence, which presumably form part of a flexible loop.
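Both Rwork and Rfree are computed as R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|; Rwork uses the reflections included in refinement, and Rfree uses a small test set held out from it. The minimal sketch below uses made-up amplitudes and free-set flags and assumes Fcalc is already on the same scale as Fobs; it is not the implementation in CNS or Refmac5.

```python
# Crystallographic R factors: R = sum(| |Fo| - |Fc| |) / sum(|Fo|).
# Toy data only: amplitudes and free-set flags are made up, and Fc is
# assumed to be already scaled to Fo.
from typing import List, Sequence, Tuple

F_OBS = [100.0, 80.0, 60.0, 40.0, 20.0]
F_CALC = [90.0, 85.0, 60.0, 44.0, 18.0]
IS_FREE = [False, False, False, True, True]  # True = held-out test reflection


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Agreement factor between observed and calculated amplitudes."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_free: Sequence[bool]) -> Tuple[float, float]:
    """R_work over working reflections, R_free over the test set."""
    work_o: List[float] = []
    work_c: List[float] = []
    free_o: List[float] = []
    free_c: List[float] = []
    for fo, fc, free in zip(f_obs, f_calc, is_free):
        if free:
            free_o.append(fo)
            free_c.append(fc)
        else:
            work_o.append(fo)
            work_c.append(fc)
    return r_factor(work_o, work_c), r_factor(free_o, free_c)


if __name__ == "__main__":
    r_work, r_free = r_work_r_free(F_OBS, F_CALC, IS_FREE)
    print(f"R_work = {r_work:.1%}, R_free = {r_free:.1%}")
```

Because the test reflections never drive the refinement, an Rfree only slightly above Rwork, as reported here, indicates that the model is not overfitted to the working data.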
Overall Fold-The CRD of p58 has an overall globular shape and is composed of 15 β-strands, a small α-helix, and one turn of 3₁₀ helix (Fig. 1). Two twisted antiparallel β-sheets, a seven-stranded (major) β-sheet and a six-stranded (minor) β-sheet, pack against each other to form a β-sandwich, a variation of the jelly roll fold. The N terminus starts with two short β-strands (β1a and β1b) separated by a turn of 3₁₀ helix. Because this structural motif is replaced by a single, longer strand in other lectin structures, we refer to it as β1 in p58. This is the first strand of the minor β-sheet. It is followed by a long loop and the first strand (β2) of the major β-sheet. A β-hairpin (strands β3 and β4) is inserted between β2 and the second strand of the major β-sheet (β5). From here, the chain makes an excursion into the opposite (minor) β-sheet, contributing one strand (β6), before returning to the major β-sheet to form four antiparallel strands (β7-β10). A loop followed by a two-turn α-helix is inserted between β9 and β10. The chain then crosses over to the minor β-sheet, contributing three antiparallel strands (β11-β13). The last pair of strands is split between the two sheets, with β14 in the major β-sheet and β15 in the minor β-sheet. The N and C termini of the polypeptide chain are close to each other in space.
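The topology described above can be summarized as a strand-to-sheet table. This sketch simply transcribes the text and, as the seven- and six-strand sheet sizes imply, assumes that the β3/β4 hairpin belongs to neither sheet; the labels b1, b2, ... are plain ASCII stand-ins for β1, β2, and so on.

```python
# Strand-to-sheet assignment of the p58 CRD, transcribed from the text.
# ASSUMPTION: the b3/b4 hairpin is counted in neither sheet, as implied by
# the seven-stranded (major) and six-stranded (minor) sheet sizes.
from collections import Counter
from typing import Dict

# Strands in sequence order (dicts preserve insertion order in Python 3.7+).
STRAND_SHEET: Dict[str, str] = {
    "b1": "minor",     # b1a + b1b, separated by one turn of 3_10 helix
    "b2": "major",
    "b3": "hairpin",   # b3/b4 hairpin inserted between b2 and b5
    "b4": "hairpin",
    "b5": "major",
    "b6": "minor",     # single excursion into the minor sheet
    "b7": "major",
    "b8": "major",     # the 15-residue b7-b8 loop contributes to the cleft
    "b9": "major",
    "b10": "major",    # preceded by a loop and a two-turn alpha-helix
    "b11": "minor",
    "b12": "minor",
    "b13": "minor",
    "b14": "major",
    "b15": "minor",
}


def sheet_sizes(assignment: Dict[str, str]) -> Counter:
    """Number of strands assigned to each sheet (or to the hairpin)."""
    return Counter(assignment.values())


def sheet_crossings(assignment: Dict[str, str]) -> int:
    """Switches between the major and minor sheets along the chain."""
    path = [s for s in assignment.values() if s in ("major", "minor")]
    return sum(1 for prev, nxt in zip(path, path[1:]) if prev != nxt)


if __name__ == "__main__":
    print(dict(sheet_sizes(STRAND_SHEET)), sheet_crossings(STRAND_SHEET))
```

Counting the switches between sheets along the chain (six in this transcription) is a quick check that the described topology interleaves the two sheets rather than building one sheet after the other.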
The two β-sheets are curved, giving rise to a concave surface on the side of the major β-sheet and a convex surface on the side of the minor β-sheet. A cleft is formed by the 15-residue loop between strands β7 and β8 of the major β-sheet, together with the α-helix and the loop preceding it. Cys198 (strand β10, major β-sheet) and Cys238 (strand β13, minor β-sheet) form a disulfide bond. Two peptide bonds are in the cis conformation: one between Ala128 and Asp129 at the start of strand β9, and the other between Gly62 and Pro63 at the beginning of the loop joining strands β1b and β2.
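Whether a peptide bond is cis or trans is read from its ω torsion angle, defined by the atoms Cα(i), C(i), N(i+1), and Cα(i+1): ω is close to 0° for cis and close to 180° for trans. The sketch below computes a signed dihedral angle from four points and classifies it; the coordinates are idealized toy values, not atoms from entry 1GV9, and the ±30° cutoff is an assumed convention.

```python
# Cis/trans classification of a peptide bond from its omega torsion angle,
# omega = dihedral(CA_i, C_i, N_i+1, CA_i+1).
# Toy, idealized coordinates; the 30-degree cutoff is an assumed convention.
import math
from typing import Tuple

Vec = Tuple[float, float, float]

CA_I: Vec = (-0.5, 1.4, 0.0)
C_I: Vec = (0.0, 0.0, 0.0)
N_NEXT: Vec = (1.33, 0.0, 0.0)          # toy C(i)-N(i+1) peptide bond
CA_NEXT_CIS: Vec = (1.83, 1.4, 0.0)     # same side as CA_I -> omega ~ 0
CA_NEXT_TRANS: Vec = (1.83, -1.4, 0.0)  # opposite side -> omega ~ 180


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def classify_peptide(omega: float, cutoff: float = 30.0) -> str:
    """'cis' near 0 deg, 'trans' near 180 deg, otherwise 'distorted'."""
    if abs(omega) <= cutoff:
        return "cis"
    if abs(omega) >= 180.0 - cutoff:
        return "trans"
    return "distorted"


if __name__ == "__main__":
    for label, ca_next in (("cis", CA_NEXT_CIS), ("trans", CA_NEXT_TRANS)):
        omega = dihedral(CA_I, C_I, N_NEXT, ca_next)
        print(label, round(omega, 1), classify_peptide(omega))
```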
Based on their weak homology to plant lectins, it has been proposed (14) that ERGIC-53 and VIP-36 define a new class of animal lectins in the secretory pathway. These proteins are thought to function as sorting receptors for glycoproteins exiting the ER (22). Orthologues of the ERGIC-53 gene have been found in several organisms. Moreover, two other genes, ERGIC-53-like (ERGL) and GP36b, have recently been identified as displaying significant homology to p58/ERGIC-53 (30), suggesting that their products might also act as cargo receptors for glycoproteins. The fold observed here is most likely conserved in all ERGIC-53-like proteins identified thus far, because they share significant sequence identity (25-35%) and align well over the region spanning the CRD (Fig. 2).
Structural Similarity of p58 to Other Lectins-Comparisons of this structure against the PDB database using the DALI server (32) and the program TOP (33) revealed that the CRD of p58 is structurally most similar to the leguminous lectins (Table II). Most leguminous lectins have a core structure composed of a β-sandwich with a concave face of seven β-strands and a convex face of six β-strands (Table II). Although the sequence identity between p58/ERGIC-53 and leguminous lectins is generally <20%, the cores of these structures share the same basic architecture, and the secondary structure elements of p58 superimpose well on those of the leguminous lectins. The sugar binding sites in these lectins are all located on the concave β-sheet, with most of the ligand-binding residues contributed by the loops between strands at the top of the sheet.
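Percent identity figures such as the 25-35% and <20% quoted above depend on how they are normalized. As a minimal sketch, assuming identity is counted over alignment columns in which neither sequence has a gap (other conventions divide by alignment length or by the length of the shorter sequence and give different numbers), pairwise identity can be computed from two pre-aligned sequences as follows; the example sequences are hypothetical.

```python
# Pairwise percent identity from two pre-aligned sequences.
# ASSUMPTION: identity is normalized by ungapped columns only; the example
# sequences are hypothetical, not p58/ERGIC-53 fragments.


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """100 * identical columns / columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(percent_identity("ACDEFGHIK-", "ACDQFG-IKL"))
```

In the example, two gapped columns are skipped, and 7 of the remaining 8 columns match.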
Similarity is also found between p58 and other animal proteins, including the lectins calnexin and galectin-3, as well as the ligand-binding domain of neurexin 1β (Table II). Calnexin is also a calcium-dependent lectin; it resides in the ER and is involved in quality control mechanisms (34). Despite the low sequence identity between calnexin and p58 (Table II), two aspartate residues that coordinate Ca2+ in the leguminous lectin structures are conserved and occupy similar positions in both animal lectin structures. Galectin-3 belongs to a family of calcium-independent animal lectins that are predominantly cytoplasmic (35). Compared with p58, galectin-3 has smaller β-sheets and loops, although their arrangement is similar. Neurexin 1β belongs to a family of proteins that are expressed in hundreds of isoforms in neuronal tissues and are thought to function as cell recognition molecules. Although ligands for these molecules have not yet been identified, it has been proposed that they use the same fold and binding surface as the leguminous lectins to interact with cell surface molecules (36). However, p58 and neurexin 1β superimpose poorly in the region of the putative ligand binding site, and residues thought to be involved in ligand binding in p58/ERGIC-53 show no similarity to the corresponding residues of neurexin 1β.
Evolutionary Conservation of p58/ERGIC-53-like Domains-Alignment of p58/ERGIC-53 with related sequences from different animal species reveals a high degree of sequence identity (Fig. 2). The conservation is distributed throughout the polypeptide chain. Highly conserved residues include: (i) tryptophans (Trp51, Trp78, Trp110, and Trp128) forming a hydrophobic ladder that runs through the hydrophobic core of the protein, (ii) glycines in loops between the β-strands, (iii) the two disulfide-bonded cysteines, and (iv) the proline that adopts the cis conformation in p58/ERGIC-53 and is conserved in all sequences of this domain family. The overall structure observed here is therefore likely to be conserved in other p58/ERGIC-53-like domains. Two patches of conserved residues on opposite sides of the molecule are evident, showing that conserved residues are not confined to the hydrophobic core of the protein (Fig. 3).
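Conserved positions like those listed above are typically identified by scoring each column of a multiple sequence alignment and mapping the columns back to the numbering of a reference sequence. The sketch below uses a simple score, the fraction of sequences carrying the most common residue in a column, with gaps counting against conservation. This scoring scheme, the toy alignment, and the starting residue number are assumptions for illustration, not the method used to prepare Fig. 2 or Fig. 3.

```python
# Per-column conservation in a multiple sequence alignment, mapped back to
# the residue numbering of a reference sequence.
# ASSUMPTIONS: score = fraction of sequences sharing the most common residue
# (gaps count against conservation); toy alignment and numbering.
from collections import Counter
from typing import List, Tuple

TOY_ALIGNMENT = [
    "GWCDL-W",  # reference sequence
    "GWCEL-W",
    "GFCDLKW",
]


def column_conservation(alignment: List[str], gap: str = "-") -> List[float]:
    """Fraction of all sequences carrying the most common non-gap residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        _, count = Counter(residues).most_common(1)[0]
        scores.append(count / n)
    return scores


def conserved_positions(alignment: List[str], ref_index: int = 0,
                        first_residue_number: int = 1,
                        threshold: float = 1.0,
                        gap: str = "-") -> List[Tuple[int, str]]:
    """(residue number, residue) in the reference for columns >= threshold."""
    scores = column_conservation(alignment, gap)
    number = first_residue_number
    hits = []
    for residue, score in zip(alignment[ref_index], scores):
        if residue == gap:
            continue  # reference gaps carry no residue number
        if score >= threshold:
            hits.append((number, residue))
        number += 1
    return hits


if __name__ == "__main__":
    print(column_conservation(TOY_ALIGNMENT))
    print(conserved_positions(TOY_ALIGNMENT, first_residue_number=50))
```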
Putative Ligand Binding Site-The oligomeric form of p58/ERGIC-53 binds to mannose-substituted (Man1) resins, although the mannose monosaccharide is probably not the ligand of p58 in vivo (19). Rather, p58/ERGIC-53 is thought to recognize Man9 on glycoproteins (6,22). Mutagenesis studies have shown two residues, Asp129 and Asn164, to be required for binding of ERGIC-53 to mannose-substituted resins (13). Because EDTA was present in the crystallization conditions, no Ca2+ ions are observed in the putative ligand binding site of p58, in contrast to the leguminous lectin structures. Nevertheless, the binding sites are similar (Fig. 4): Asp129 and Asp160 in p58 occupy positions similar to those of the equivalent Asp81 and Asp121 in the Lathyrus ochrus and pea lectin structures in complex with mannose (PDB codes 1RIN and 1LOB, respectively; hereafter, we refer to these two proteins by their PDB codes) (37,38). Both of these residues coordinate the Ca2+ ion, and Asp81 also binds mannose in these complexes. The peptide bond between Ala128 and Asp129 is in the cis conformation in p58, as in the leguminous lectins, where this cis bond is essential for the correct geometry of the Ca2+ binding site and for sugar binding (37).
Major differences between the binding sites of p58, 1LOB, and 1RIN include: (i) a different conformation of Asn170 in p58 compared with the equivalent Asp129 in 1RIN and 1LOB, due mostly to the disordered region between residues 165 and 169 of p58 (Fig. 1); (ii) a displacement of ~12.0 Å between the side chain of Asn164 and that of the equivalent Asn125 in 1RIN and 1LOB (Fig. 4); and (iii) substitution of Glu119 and His136, which are involved in Mn2+ coordination in the leguminous lectins, by Phe158 and Ala172 in p58, which probably renders p58 unable to bind Mn2+ ions. Of the other residues of 1LOB and 1RIN that interact with mannose, Ala210 and Glu211 superimpose well on the equivalent residues Gly259 and Gly260 of p58, whereas the third residue is part of a loop, and its position differs by about 6 Å between p58 and 1LOB and 1RIN.
Functional Implications for Cargo Recognition-Recent data indicate that ERGIC-53 acts as a cargo receptor for a subset of glycoproteins by recognizing mannose residues in their glycans (19,22). Because this process is selective, whereas glycoproteins in the ER carry essentially the same glycan structures, specific recognition of a small subset of glycoproteins most likely also depends in part on properties of the cargo proteins themselves rather than on their glycans alone. Electrostatic surface charge calculations show a marked charge polarity on the surface of p58. A deep, negatively charged pocket lies adjacent to the residues thought to be involved in calcium coordination and mannose binding (Fig. 3A). A bound Ca2+ ion would reduce the negative charge in this pocket, but electrostatic calculations suggest that the pocket would still be predominantly negative (data not shown). Sequence conservation is strong both within and around the negatively charged pocket (Fig. 3B), suggesting that this region and its electrostatic character are important for protein-protein interactions.
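The text does not state how the electrostatic calculations were performed, and surface potentials are normally computed with a Poisson-Boltzmann solver. Purely to illustrate the qualitative argument that a bound Ca2+ ion raises, but need not reverse, the negative potential of a pocket lined by carboxylates, the sketch below evaluates a plain Coulomb sum, φ = k Σ qᵢ/(ε rᵢ), with k ≈ 332 kcal·Å/(mol·e²). The charge positions, the uniform dielectric of 4, and the probe point are all hypothetical.

```python
# Toy Coulomb-sum electrostatic potential at a probe point.
# NOT a Poisson-Boltzmann calculation: no solvent, ions, or dielectric
# boundary. All charges, positions, and the dielectric are hypothetical.
import math
from typing import List, Tuple

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), approximate

Point = Tuple[float, float, float]
Charge = Tuple[Point, float]  # (position in Angstrom, charge in units of e)

# Four carboxylate-like -1 charges 4 Angstrom from the probe (hypothetical).
POCKET: List[Charge] = [
    ((4.0, 0.0, 0.0), -1.0),
    ((-4.0, 0.0, 0.0), -1.0),
    ((0.0, 4.0, 0.0), -1.0),
    ((0.0, -4.0, 0.0), -1.0),
]
CALCIUM: Charge = ((0.0, 0.0, 3.0), 2.0)  # hypothetical bound Ca2+
PROBE: Point = (0.0, 0.0, 0.0)


def coulomb_potential(point: Point, charges: List[Charge],
                      dielectric: float = 4.0) -> float:
    """phi = k * sum(q_i / r_i) / dielectric, in kcal/(mol*e)."""
    total = 0.0
    for position, q in charges:
        r = math.dist(point, position)
        if r == 0.0:
            raise ValueError("probe point coincides with a charge")
        total += q / r
    return COULOMB_K * total / dielectric


if __name__ == "__main__":
    without_ca = coulomb_potential(PROBE, POCKET)
    with_ca = coulomb_potential(PROBE, POCKET + [CALCIUM])
    print(f"without Ca2+: {without_ca:.1f}; with Ca2+: {with_ca:.1f} kcal/(mol*e)")
```

In this toy geometry the Ca2+ ion raises the potential at the probe but leaves it negative, mirroring the qualitative claim in the text; other geometries or dielectric values could change the outcome, so the numbers say nothing quantitative about p58 itself.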
On the side of p58 opposite the ligand binding site lies a surface patch made up of residues conserved only in p58/ERGIC-53 orthologues (Fig. 3D). It maps to the convex β-sheet of p58 and has a balanced charge distribution (Fig. 3C). Because it is conserved only in p58/ERGIC-53 orthologues, it may be involved in protein-protein interactions unique to p58/ERGIC-53, such as receptor-cargo binding or contacts between neighboring molecules in oligomers. Cargo proteins might use two different, opposing faces of the p58 β-sandwich for complex formation: whereas the carbohydrate moiety interacts with the negatively charged pocket on one side of p58/ERGIC-53, another region, conferring specificity for the cargo protein, could bind to the conserved surface patch on the other side. This would be reminiscent of the way calnexin engages its substrates through both its arm and lectin domains (34). p58/ERGIC-53 oligomerizes into dimers and hexamers, in contrast to other members of this family, which are monomeric. Involvement of this surface patch in oligomerization would therefore also explain its conservation only in p58/ERGIC-53 orthologues.
In summary, we have shown that the CRD of p58/ERGIC-53 is an example of the use of the plant lectin fold within the ER quality control system. The fold of the structure reported here is a versatile scaffold for carbohydrate-mediated ligand-receptor interactions. The structural conservation between the CRD of p58 and calnexin highlights the importance of quality control mechanisms for the proper maturation of proteins in the ER. Because this fold is shared by plant and animal proteins, it was likely present before the divergence of the plant and animal kingdoms. The structure presented here, together with functional data on p58/ERGIC-53, suggests that the leguminous lectin fold is also used for Ca2+-dependent recognition of carbohydrate structures in animals, reinforcing the proposal that these proteins constitute a new class of animal calcium-dependent lectins.