Crystal Structure of the Third Extracellular Domain of CD5 Reveals the Fold of a Group B Scavenger Cysteine-rich Receptor Domain

Scavenger receptor cysteine-rich (SRCR) domains are ancient protein modules widely found among cell surface and secreted proteins of the innate and adaptive immune systems, where they mediate ligand binding. We have solved the crystal structure, at 2.2 Å resolution, of SRCR domain III of CD5, a human lymphocyte receptor involved in modulating antigen-specific receptor-mediated T cell activation and differentiation signals. This first structure of a group B SRCR domain reveals the fold of this ancient protein module: a central core formed by two antiparallel β-sheets and one α-helix. It shows that the core encoded by the genes of group A and group B members of the SRCR superfamily is conserved at the protein level. The new group B structure also permits the interpretation of site-directed mutagenesis data on the binding of activated leukocyte cell adhesion molecule (ALCAM/CD166) to CD6, a lymphocyte receptor closely related to CD5.

The SRCR domain is an ancient and highly conserved protein module that defines a superfamily of soluble or membrane-bound receptors expressed by hematopoietic and non-hematopoietic cells at embryonic or adult stages (1,2). The existence of two types of SRCR domains divides the SRCR superfamily into two groups. Members of group A contain six cysteine residues and are encoded by at least two exons, whereas those of group B contain eight cysteines and are encoded by a single exon. This difference in exon-intron organization suggests that group A and B members might have evolved from different ancestral genes. In both groups all the cysteines are engaged in intradomain disulfide bridges. Group A members usually occur as multidomain mosaic proteins containing a single SRCR domain associated with other functional domains. In contrast, group B members are mostly composed of tandem repeats of SRCR domains (1,2). Members of both groups are found in many animal species, from lower invertebrates to higher vertebrates. Although no common function has been defined for SRCR superfamily members, several lines of evidence, together with the high degree of structural and phylogenetic conservation of the family, suggest an important role in basic homeostatic functions, including innate immune defense.
Human CD5 and CD6 are lymphocyte surface receptors composed of group B SRCR domains. The contiguous location and conserved structure of their genes suggest that they may have evolved by duplication of a primordial gene. They act as accessory molecules and are expressed on thymocytes, mature peripheral T lymphocytes, and a subpopulation of peripheral B lymphocytes (B-1a cells). B-1a cells have been associated with the production of autoantibodies, and this subpopulation is expanded in certain autoimmune diseases and in B-cell chronic lymphocytic leukemia (3,4).
Ligands of SRCR superfamily members have generally been difficult to identify despite their important biological functions. Several reports have proposed a role for CD5 in cognate interactions between T and B cells (5,6), based on reported interactions between CD5 and distinct B-cell surface proteins such as CD72 (7,8), gp40-80 (6,9), Ig VH framework region sequences (10,11), and gp150 (12). However, the ultimate nature of the CD5 ligand remains controversial.
In contrast, it is widely accepted that CD6 specifically binds ALCAM/CD166, a type I transmembrane protein belonging to the immunoglobulin superfamily (13,14). The ALCAM/CD166 extracellular region consists of five Ig-like domains, and its identification as a CD6 ligand has triggered extensive efforts to study this interaction. Biochemical studies have identified domain III (DIII) of CD6 and domain I (DI) of ALCAM/CD166 as responsible for the interaction between these two molecules (15,16). Although site-directed mutagenesis studies have mapped the residues responsible for this interaction (17-19), no detailed structural model of the CD6DIII-ALCAM/CD166DI interaction has been proposed, owing to the lack of structural information on the group B SRCR family.
Here we present the crystal structure of domain III of the human CD5 receptor, the first structure of a group B SRCR family member. The structure reveals the network of disulfide bridges that stabilizes this folding module, and it allows the architecture of the CD5 membrane receptor to be modeled from its domain composition. In addition, the high homology between CD5DIII and CD6DIII allows our structure to be used as a template for a CD6DIII model, which sheds light on the CD6-ALCAM/CD166 interaction.

EXPERIMENTAL PROCEDURES
CD5 Domain III Expression-Recombinant human CD5DIII was expressed using an episomal expression system in human embryonic kidney cells (HEK 293-EBNA). These cells constitutively express the Epstein-Barr virus protein EBNA-1, which allows episomal replication of the pCEP-Pu vector, a kind gift from Drs. T. Sasaki and R. Timpl (Max Planck Institute for Biochemistry, Martinsried, Germany) (20). The cDNA sequence coding for human CD5DIII was obtained by PCR amplification (12) with the primers 5′-GCCCCGCTAGCTTTCCAGCCCAAGGTGCAG-3′ (forward) and 5′-GTGGATCCTAATCCTGGCATGTGACAAAC-3′ (reverse). The forward primer incorporated an NheI restriction site (GCTAGC), whereas the reverse primer incorporated a stop codon followed by a BamHI restriction site (GGATCC). The amplified PCR products were cloned into NheI/BamHI-digested pCEP-Pu vector, in frame with the BM-40 leader sequence.
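As a quick sanity check on the primer design described above, the following Python sketch (an illustration, not the authors' procedure) locates the NheI (GCTAGC) and BamHI (GGATCC) recognition sequences in the two primers and translates the reverse complement of the reverse primer with the standard genetic code. The primer sequences are copied from the text; the helper names are hypothetical, and the reading frame is not assumed but found by trying all three.

```python
# Sanity check of the CD5DIII cloning primers (illustrative sketch).
PRIMERS = {
    "forward": "GCCCCGCTAGCTTTCCAGCCCAAGGTGCAG",
    "reverse": "GTGGATCCTAATCCTGGCATGTGACAAAC",
}

# Standard recognition sequences of the two restriction enzymes.
SITES = {"NheI": "GCTAGC", "BamHI": "GGATCC"}

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ACGT only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str) -> dict:
    """Return {enzyme: 0-based start index} for each recognition site present."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def translate_to_stop(seq: str, frame: int) -> str:
    """Translate from `frame` up to and including the first stop codon '*'."""
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
    # The reverse primer anneals to the coding strand, so read its reverse complement.
    coding = reverse_complement(PRIMERS["reverse"])
    for frame in range(3):
        print(frame, translate_to_stop(coding, frame))
```

Only frame 1 reaches a stop codon, translating to FVTCQD*: the sequence ends Cys-Gln-Asp followed by a TAG stop codon (whose final G is shared with the BamHI site), consistent with Cys-367 being the last cysteine and Asp-369 the last residue observed in the electron density.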
The resulting constructs were expressed in HEK 293-EBNA cells as described before (20). Briefly, HEK 293-EBNA cells in culture dishes were transfected using Lipofectamine and then grown in monolayers in DMEM:F-12 (1:1) plus Glutamax (Invitrogen) supplemented with fetal bovine serum (10%; Sigma), penicillin-streptomycin (100 units/ml and 100 µg/ml; Invitrogen), Geneticin (250 µg/ml; Invitrogen), and puromycin (2.5 µg/ml; Sigma). For large-scale protein production, cells in exponential growth phase were trypsinized and plated in Costar culture flasks (162 cm²). The medium was exchanged every 48 h, and when 80% of the surface was covered, fresh medium without fetal bovine serum was added. The expressed protein was collected from the culture medium 3-4 days after this change.
Every 2 days the medium was collected, pooled in 50-ml Falcon tubes, and centrifuged at 1000 × g for 5 min to remove cell debris. The clarified supernatant was transferred to a new Falcon tube, flash-frozen in liquid nitrogen, and stored at −80 °C. About 2.5-3 liters of supernatant containing around 200-300 µg/liter of protein were collected over 3-4 weeks, at which point the cells started to detach from the surface and the culture was stopped.
Protein Purification-CD5DIII was purified at 4 °C by affinity chromatography using a mouse monoclonal antibody (83-C4) specific for CD5DIII (21). Around 4 liters of supernatant from the 83-C4 mAb-producing hybridoma were precipitated by adding 50% (NH₄)₂SO₄ with gentle stirring for 30 min on ice. The mixture was then centrifuged at 6000 × g for 30 min. The pellet containing the antibody was resuspended in 50 ml of PBS (0.137 M NaCl, 0.01 M Na₂HPO₄·2H₂O, 2.7 mM KCl, 2 mM KH₂PO₄) and extensively dialyzed against PBS to remove residual (NH₄)₂SO₄. After dialysis, the sample was passed through a 0.22-µm filter (Millipore). The 83-C4 mAb was purified on a 5-ml HiTrap Protein G column (Amersham Biosciences) connected to an ÄKTAprime system. The column was equilibrated with PBS and the sample was loaded. After extensive washing with PBS, the 83-C4 mAb was eluted with 0.1 M glycine, pH 2.6, into fraction-collector tubes containing 200 µl of 1 M Tris, pH 9.0, to neutralize the eluate and avoid denaturation. Fractions containing the antibody were pooled and dialyzed against 0.1 M NaHCO₃, pH 8.3, plus 0.5 M NaCl. The typical yield was around 100 mg of 83-C4 mAb. The purified antibody was then coupled to CNBr-activated Sepharose 4B (Amersham Biosciences): the resin was washed several times with 1 mM HCl and once with coupling buffer (0.1 M NaHCO₃, pH 8.3, 0.5 M NaCl), mixed with the antibody, and incubated overnight at 4 °C on a rotating wheel. Excess 83-C4 mAb was washed out with coupling buffer, and the remaining active groups were blocked by incubation with 0.1 M Tris-HCl, pH 8.0, for 2 h. The resin was packed into a column connected to the ÄKTAprime for the purification. The supernatant from 2.5-3 liters of HEK 293-EBNA-CD5DIII culture was used for purification.
This volume was concentrated at 4 °C in an Amicon cell to 100-200 ml and then loaded onto a 10-ml 83-C4 mAb column previously equilibrated with PBS. The sample was recirculated 10 times to improve binding. The column was then washed with 10 volumes of PBS plus 0.5 M NaCl and 1% Nonidet P-40, followed by 10 volumes of PBS. Finally, the protein was eluted with 3.5 M MgCl₂ in PBS. All chromatographic steps were performed at a flow rate of 1.5 ml/min. The amount of purified protein depended on the volume of supernatant collected and ranged from 200 to 500 µg per 2.5-3 liters of culture. The elution buffer was then exchanged for 0.01 M HEPES, 0.2 M NaCl, pH 7.4, using a desalting column (Amersham Biosciences), and the purified protein was concentrated to 4 mg/ml at 4 °C using a Centricon device with a 10-kDa cut-off. Sample purity was assessed by SDS-PAGE with Coomassie and silver staining (22). Mass spectrometry of the purified CD5DIII gave the exact molecular mass of the protein. Additional experiments in the presence of a reducing agent (dithiothreitol) showed that the eight cysteines of the recombinant protein form four disulfide bridges (Cys-1-Cys-4, Cys-2-Cys-7, Cys-3-Cys-8, and Cys-5-Cys-6) (data not shown). Together with the sharp, well-dispersed signals in the one-dimensional ¹H NMR spectrum recorded on a Bruker 600 MHz spectrometer, these data indicated that the SRCR domain fold was well defined (data not shown). The production of good-quality CD5DIII in HEK 293-EBNA cells contrasts with a previous attempt to express CD5 domains in a heterologous system (Pichia pastoris) (23), in which the recombinant protein did not yield well-defined NMR spectra.
Data Collection-All data were collected at 100 K using synchrotron radiation. The CD5DIII crystals were mounted and cryoprotected with 30% glycerol. Diffraction data were collected at the X06SA beamline of the Swiss Light Source (Villigen) and recorded on a Mar225 CCD detector. Sulfur SAD data were collected with an inverse-beam strategy, so that both members of each Bijvoet pair were measured under similar conditions, maximizing the accuracy of the anomalous differences. A wavelength of 1.80 Å was used to optimize the sulfur anomalous signal. Diffraction intensities were processed with HKL2000 (24).
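The choice of a 1.80 Å wavelength can be expressed in energy terms with E = hc/λ (hc ≈ 12.398 keV·Å). The sketch below is illustrative: the sulfur K-edge energy (about 2.472 keV) is a standard tabulated value, not a number from this work. It shows that 1.80 Å corresponds to roughly 6.89 keV, still well on the high-energy side of the sulfur K edge near 5.02 Å. Moving to longer wavelengths increases the weak anomalous signal of sulfur at the cost of stronger absorption, which is why S-SAD experiments commonly use wavelengths somewhat longer than the roughly 1 Å used for routine data collection.

```python
# Convert between X-ray wavelength (Å) and photon energy (keV) using E = hc / λ.
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å
S_K_EDGE_KEV = 2.472        # approximate sulfur K absorption edge (keV)


def wavelength_to_kev(wavelength_a: float) -> float:
    """Photon energy in keV for a wavelength in Å."""
    return HC_KEV_ANGSTROM / wavelength_a


def kev_to_wavelength(energy_kev: float) -> float:
    """Wavelength in Å for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


if __name__ == "__main__":
    print(f"1.80 Å -> {wavelength_to_kev(1.80):.2f} keV")
    print(f"S K edge -> {kev_to_wavelength(S_K_EDGE_KEV):.2f} Å")
```

Running it prints 6.89 keV for the data collection wavelength and 5.02 Å for the sulfur K edge.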
Structure Determination and Refinement-Reduced intensities were used to search for the sulfur substructure. A resolution cut-off of 4.5 Å was applied during substructure solution; at this resolution the two sulfur atoms of each disulfide (about 2 Å apart) are not resolved, so the four disulfides were treated as super-sulfur atoms. Initially, five super-sulfur positions were found with SnB (25) and SHELXD (26). These positions were input into SHARP (27). After solvent flattening with SOLOMON (28), the initial 2.8 Å map, showing all the disulfides, was used for automatic model building with MAID (29). The model of CD5DIII was rebuilt and refined to 2.5 Å (PDB code 2OTT). A high-resolution (2.2 Å) data set from a trigonal crystal form was then used for refinement: the CD5DIII structure solved in the tetragonal space group served as the search model in MOLREP (30) to solve the trigonal crystal structure by molecular replacement. ARP/wARP (31) and  were applied for automatic model building and refinement (Table 1) (PDB code 2JA4).
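The super-sulfur approximation can be illustrated as follows: at a 4.5 Å cut-off, the two sulfur atoms of a disulfide (about 2.05 Å apart) merge into a single peak, so each disulfide is modeled as one scatterer at the midpoint of the S-S bond. The coordinates below are hypothetical, and the resolvability test is a deliberately crude heuristic (atoms closer than d_min are treated as unresolved), not the criterion used by SnB or SHELXD.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]


def super_sulfur(s1: Coord, s2: Coord) -> Tuple[Coord, float]:
    """Return the midpoint of two sulfur positions and their separation (Å)."""
    midpoint = tuple((a + b) / 2.0 for a, b in zip(s1, s2))
    return midpoint, math.dist(s1, s2)


def resolved(separation: float, d_min: float) -> bool:
    """Crude heuristic (an assumption): atoms closer than d_min appear as one peak."""
    return separation >= d_min


if __name__ == "__main__":
    # Hypothetical sulfur positions (Å) for one disulfide, 2.05 Å apart.
    sg_a = (0.0, 0.0, 0.0)
    sg_b = (2.05, 0.0, 0.0)
    center, sep = super_sulfur(sg_a, sg_b)
    print("super-sulfur at", center, "separation", round(sep, 2), "Å")
    print("resolved at 4.5 Å:", resolved(sep, 4.5))
    print("resolved at 1.5 Å:", resolved(sep, 1.5))
```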

RESULTS AND DISCUSSION
Overall Structure Description-The structure was determined by the sulfur SAD method at 2.8 Å resolution, refined to 2.5 Å in the same tetragonal space group, and subsequently solved by molecular replacement at 2.2 Å in a trigonal space group (Table 1, Fig. 1). The first and last residues observed in the electron density are Ala-269 and Asp-369, a slightly longer polypeptide chain than the canonical group B SRCR domain. The seven extra amino acids at the N terminus (Ala-269 to Asn-275) form a well ordered loop that connects domains II and III (Fig. 1a). CD5DIII adopts a compact, heart-shaped fold with approximate dimensions of 28 × 38 × 35 Å. The core of the structure is formed by the association of helix α1 with a curved four-stranded antiparallel β-sheet comprising β1, β2, β7, and β4 (Fig. 1, a and b). The three additional β-strands (β3, β5, and β6) form a second antiparallel β-sheet at the bottom of the domain core. Helix α2 connects the end of β4 to β5, whereas α3 joins β5 and β6 (Fig. 1, a and b).
The central hydrophobic core is formed by helix α1 packed against the concave face of the larger β-sheet and the upper part of the smaller one. In this central hub, strands β2, β7, and β3 form a triangular cavity occupied by a buried water molecule that hydrogen-bonds to all three strands (main chains of Val-289, Leu-300, and Val-363). All eight cysteines of CD5DIII are engaged in disulfide bridges: Cys-285-Cys-321 links the N terminus of β2 with the C terminus of α1; Cys-316-Cys-367 connects helix α1 to strand β7 in the middle of the antiparallel β-sheet; Cys-342-Cys-350 encloses helix α3; and Cys-301-Cys-360 joins the end of strand β3 with the N-terminal side of the final strand, β7 (Fig. 1, a and b).
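The disulfide connectivity reported from the biochemical experiments in Experimental Procedures (written with ordinal cysteine numbers, Cys-1 to Cys-8) can be cross-checked against the crystal structure. The sketch below assumes that Cys-1 to Cys-8 number the cysteines in sequence order, so that they correspond to Cys-285, 301, 316, 321, 342, 350, 360, and 367; under that assumption the two descriptions agree exactly. It also computes the mass increase expected if all four disulfides are fully reduced (two hydrogen atoms per bridge), one common way such a reduction experiment is read out; the text does not state how it was done here.

```python
# Cysteine residues of CD5DIII in sequence order, taken from the crystal structure.
CYS_RESIDUES = [285, 301, 316, 321, 342, 350, 360, 367]

# Connectivity from the biochemical analysis, as ordinal cysteine numbers.
MS_PAIRS_ORDINAL = [(1, 4), (2, 7), (3, 8), (5, 6)]

# Disulfides observed in the crystal structure, as residue numbers.
CRYSTAL_PAIRS = [(285, 321), (316, 367), (342, 350), (301, 360)]

H_AVERAGE_MASS_DA = 1.008  # average atomic mass of hydrogen (Da)


def normalize(pairs):
    """Order-independent set of cysteine pairs."""
    return {tuple(sorted(p)) for p in pairs}


def ordinal_to_residue(pairs, cys_residues):
    """Map ordinal pairs (1-based) onto residue numbers."""
    return normalize((cys_residues[a - 1], cys_residues[b - 1]) for a, b in pairs)


def reduction_mass_shift(n_disulfides: int) -> float:
    """Mass gain (Da) when n disulfides are reduced to free thiols (+2 H each)."""
    return 2 * H_AVERAGE_MASS_DA * n_disulfides


if __name__ == "__main__":
    mapped = ordinal_to_residue(MS_PAIRS_ORDINAL, CYS_RESIDUES)
    print(sorted(mapped))
    print("agrees with crystal structure:", mapped == normalize(CRYSTAL_PAIRS))
    print(f"expected shift on full reduction: +{reduction_mass_shift(4):.2f} Da")
```

The check prints True, and full reduction of the four bridges would add about 8.06 Da.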
The solvent-exposed regions of CD5DIII show a distinctive distribution of charged residues on the protein surface. Several basic residues (Lys-273, Arg-307, Arg-328, Lys-346, Lys-361, and Arg-356) form two parallel basic stripes that run across the surface from the N terminus over α1 and the back of the four-stranded antiparallel β-sheet, ending in the cavity formed by the three-stranded antiparallel β-sheet and α2 (Fig. 1c). The acidic residues are grouped at two opposite sites of the molecule. The first acidic area is formed by the loop joining β1 and β2 together with the end of β7, the strand that connects the extracellular domain to the transmembrane helix; it also extends into a cavity formed by the N terminus of β1 and one side of helix α1 (Glu-314 and Glu-318). The second acidic area is located on the back side of α2, together with the loop residues that connect the helix to β4 and β5 (Asp-331 and Asp-334).
Comparison of the SRCR Family Group A and Group B Domains-The main differences between members of the SRCR group A and B families lie in the number of cysteines engaged in disulfide bridges (Fig. 2) and in their domain composition (2). Structural information on both groups is currently scant: the structures of M2bp (35) and the hepsin SRCR domain (36) are the only representatives of group A, and our structure provides the first structural information on group B, contributing to a better understanding of this ancient superfamily. The DALI server (37) was used to search the structural database for structures with a fold similar to that of CD5DIII; this search retrieved only the structures of M2bp and hepsin. A close comparison of the three structures shows that although the root mean square deviations are substantial (CD5DIII-M2bp: 2.4 Å for 87 matching Cα atoms; CD5DIII-hepsin: 2.6 Å for 79 matching Cα atoms), the domains share a similar fold in which most secondary structure elements are conserved. The main differences are found in the connecting loops (Fig. 3). The superposition shows that the first four β-strands and the α-helix follow parallel tracks; from that point onward the structures diverge, converging again along the last β-strand. Remarkably, the buried water molecule that bridges strands β2, β7, and β3, by interacting with a main chain atom from one residue of each strand, occupies the same position in all three structures.
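DALI itself aligns structures by comparing intramolecular Cα distance matrices, but an RMSD over matched Cα atoms, such as the 2.4 and 2.6 Å values quoted above, is conventionally computed after an optimal rigid-body superposition. The sketch below performs that superposition with the Kabsch algorithm (via NumPy's SVD). It assumes the residue correspondence is already known, uses hypothetical coordinates, and is not presented as the exact procedure of the DALI server.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD (same units as the input) between matched N x 3 coordinate sets
    after optimal rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must be N x 3 arrays of matched atoms")
    # Remove translation: center both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix H = Pc^T Qc.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Hypothetical "C-alpha" coordinates (Å), not taken from any PDB entry.
    P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
    theta = np.deg2rad(30.0)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])  # same points, rotated and shifted
    print(f"{kabsch_rmsd(P, Q):.6f}")  # prints 0.000000
```

A pure rotation plus translation gives an RMSD of zero, so any nonzero value for real structures reflects genuine differences in the matched Cα positions.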
Some loops are shortened or absent in CD5 (Figs. 2 and 3); because these loops lie at the surface of the molecules, their shortening gives CD5 a somewhat more compact structure. Fig. 3, b and c, shows the two main differences, which probably arise from the different positions of the disulfides. The first is located at the back of the molecules, where the loops form a tight turn whose degree of bending varies between molecules (Fig. 3b): M2bp is the most curved, hepsin is intermediate, and CD5 is extended. All three loops start at the end of the fourth β-strand; whereas in CD5 and hepsin they lead into the fifth β-strand, in M2bp the loop continues into a disordered region. The second difference is more striking (Fig. 3c). The three structures present a long loop that protrudes from one face of the β-sheet and ends facing the opposite α-helix. This arrangement forms a depression about 18 Å long, with a small gap of 5 Å between the residues forming the tip of the loop and the α-helix. The partial deletion of the tip of the CD5DIII loop relative to the others is compensated by larger residues, so that the opening distance is conserved.
Only one of the four disulfides stabilizing the structure of CD5DIII (Cys-316-Cys-367) conserves its position in hepsin and M2bp. Although disulfides corresponding to Cys-301-Cys-360 and Cys-342-Cys-350 are present in M2bp and hepsin, their positions are not equivalent between the group A and B SRCR domains. The fourth disulfide bridge (Cys-285-Cys-321), exclusive to group B SRCR domains, constrains the position of the C terminus, which joins the extracellular domain to the membrane.
To date, the only characteristic common to all SRCR domain-containing proteins, whether classified as group A or B, is that they are extracellular. Like the Ig superfamily, this group of molecules appears to be diverse and to lack a unifying function. Independently of protein function, group A and B SRCR domains seem to have evolved toward a similar scaffold, with small variations depending on function.
Orientation of CD5 Extracellular Region with Respect to the Plasma Membrane-The crystal structure suggests how the extracellular part of the CD5 receptor could be oriented with respect to the plasma membrane. The C-terminal residue, Asp-369, is only 8 residues away from the predicted transmembrane region of CD5. This short stretch imposes strong restraints on the spatial relation between the extracellular portion of CD5 and its transmembrane helix. Therefore, strand β7 of CD5DIII, which contains its C terminus, should be oriented nearly perpendicular to the membrane plane (Fig. 4a). The proposed membrane-spanning helix consists of 22 amino acids (Fig. 1a, upper panel), a length consistent with a single transmembrane helix anchoring the receptor. Our structure also includes residues belonging to the linker and the N terminus of CD5DII, indicating that the rest of the extracellular part of CD5 is located on one side of DIII, leaving its upper part exposed. Considering the dimensions of CD5DIII, the extracellular region of the receptor should therefore protrude around 90-100 Å from the plasma membrane. This organization is similar to the arrangement of the asymmetric unit in the tetragonal crystal form.
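The length argument can be made explicit with standard geometric values: an α-helix rises about 1.5 Å per residue, and consecutive Cα atoms are about 3.8 Å apart, which bounds how far a linker can reach even if fully extended. The sketch below applies these textbook numbers; the roughly 30 Å hydrophobic thickness of a lipid bilayer is an approximate literature value, and treating the 8-residue stretch as 8 Cα-Cα steps is an assumption about how the residues are counted.

```python
HELIX_RISE_A = 1.5       # rise per residue along an α-helix axis (Å)
CA_CA_DISTANCE_A = 3.8   # distance between consecutive Cα atoms (Å)
BILAYER_CORE_A = 30.0    # approximate hydrophobic thickness of a lipid bilayer (Å)


def helix_length(n_residues: int) -> float:
    """Axial length (Å) of an ideal α-helix with n residues."""
    return n_residues * HELIX_RISE_A


def max_linker_span(n_steps: int) -> float:
    """Upper bound (Å): every Cα-Cα step pointing the same way."""
    return n_steps * CA_CA_DISTANCE_A


if __name__ == "__main__":
    tm = helix_length(22)
    print(f"22-residue helix: {tm:.1f} Å vs ~{BILAYER_CORE_A:.0f} Å bilayer core")
    print(f"8-step linker spans at most {max_linker_span(8):.1f} Å")
```

A 22-residue helix (about 33 Å) matches the bilayer core, in line with a single membrane-spanning helix, and the short linker can place Asp-369 no more than about 30 Å from the start of that helix, which is why the orientation of DIII is so tightly restrained.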
FIGURE 4. a, arrangement of the CD5 extracellular region with respect to the plasma membrane (see "Results and Discussion" for details). b, detailed view of the CD6DIII-ALCAMDI interaction. CD6 and ALCAM surfaces are colored blue and dark red, respectively; residues involved in polar contacts are colored green, and those in hydrophobic interactions orange (for a detailed scheme, see supplemental Fig. 1). The model shows a high degree of surface complementarity (see supplemental Fig. 2). c, molecular architecture of the CD6-ALCAM/CD166 model in a T cell-antigen-presenting cell interaction.
The CD6-ALCAM/CD166 Interaction-Previous studies of the CD6-ALCAM/CD166 interaction were based on the structural similarity of CD6DIII to M2bp, a group A SRCR member with an unrelated biological function (19). Using our CD5DIII structure, a typical group B SRCR domain, we produced a structural model of CD6DIII (38) based on this structural and functional homologue (39) (Fig. 2). Experimental restraints from antibody mapping and site-directed mutagenesis data on CD6 and ALCAM/CD166 (17,18) were used to dock (33) the two proteins and model their interaction. The contact surface is highly complementary, and no bad contacts were detected (Fig. 4b and supplemental Figs. 1 and 2). The ALCAM/CD166 residues identified by site-directed mutagenesis (Phe-26, Phe-40, Lys-28, Phe-43, Lys-48, Asp-54) were mainly found in the interface with CD6 (Fig. 4b), whereas some of the CD6 residues were found outside the interaction area. This could be because the antibody epitopes overlapped with the CD6 surface involved in the ALCAM/CD166 interaction, introducing considerable uncertainty into the identification of individual residues. Some of them, such as Tyr-327 and Gln-352, seem to be important for protein folding and are located away from the interaction surface. Noteworthy are the contacts of ALCAM/CD166 Lys-28 and Asp-53 with the side chains of CD6 Glu-293, Asp-291, and Ser-351, and the contacts of ALCAM/CD166 Lys-48 and Ser-50 with the side and main chains of CD6 Asn-346 and Asn-345 (Fig. 4b). The remaining ALCAM/CD166 and CD6 interface residues form hydrophobic contacts, except Ser-290 and Thr-47, which lie in the middle of the interface. The buried surface area is 695 Å², within the range typical of protein-protein associations.
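Buried surface area is usually computed from the solvent-accessible surface areas (SASA) of the two partners and of the complex. Conventions differ on whether the total or the per-partner value is reported, and the text does not state which applies to the 695 Å² figure; the definitions below are the common ones, written for the CD6DIII-ALCAM/CD166DI model:

```latex
\mathrm{BSA}_{\mathrm{total}} = \mathrm{SASA}_{\mathrm{CD6DIII}} + \mathrm{SASA}_{\mathrm{ALCAM\,DI}} - \mathrm{SASA}_{\mathrm{complex}},
\qquad
\mathrm{BSA}_{\mathrm{per\ partner}} = \tfrac{1}{2}\,\mathrm{BSA}_{\mathrm{total}}
```

If 695 Å² is the total, each partner buries about 350 Å² on average; if it is the per-partner value, the total is about 1390 Å².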
The orientation of CD6DIII with respect to the membrane (Fig. 4c) must be similar to that of its homologue CD5DIII. Therefore, the location of the other extracellular domains, DII and DI, leaves the interaction area exposed to the extracellular medium, allowing interaction with the N-terminal IgV-like domain of ALCAM/CD166. The localization of CD6 and ALCAM/CD166 at the central supramolecular activation clusters formed at the immunological synapse indicates that the distance spanned by the CD6-ALCAM/CD166 interaction must be similar to that spanned by the T-cell antigen receptor-major histocompatibility complex interaction (150 Å) (40). Otherwise, it would be physically impossible for the CD6-ALCAM/CD166 pair to redistribute and co-localize with the T-cell antigen receptor/CD3 complex (39). Our model of the CD6-ALCAM/CD166 interaction, in which the ALCAM V1 domain embraces the CD6DIII domain, is compatible with that assumption (39). The length of the CD6 receptor should be similar to that of CD5 (see previous section); thus, the interaction of CD6DIII with ALCAM/CD166 should occur at an intercellular distance of around 150 Å, with the N and C termini of both molecules in the complex positioned to link opposing cell surfaces.
CD5 and CD6 are closely related molecules involved in the modulation of antigen receptor-induced T cell activation and differentiation. Based on the first structure of a group B SRCR domain, we modeled the CD6DIII-ALCAM/CD166 association, addressing the molecular basis of T-B or T-dendritic cell interactions. The high homology between CD5 and CD6 SRCR domains (Fig. 2) suggests that a similar mode of interaction could underlie the CD5DII-IgVH association, which has been proposed to affect the maintenance and selective expansion of normal and malignant human B cells (11). The available structures of SRCR domains, together with the reported biological interactions, indicate that this ancient scaffold can be adapted to associate with other extracellular domains.