The Crystal Structure of the Plexin-Semaphorin-Integrin Domain/Hybrid Domain/I-EGF1 Segment from the Human Integrin β2 Subunit at 1.8-Å Resolution

Integrins are modular (αβ) heterodimeric proteins that mediate cell adhesion and convey signals across the plasma membrane. Interdomain motions play a key role in signal transduction by propagating structural changes through the molecule, thus controlling the activation state and adhesive properties of the integrin. We expressed a soluble fragment of the human integrin β2 subunit comprising the plexin-semaphorin-integrin (PSI) domain/hybrid domain/I-EGF1 fragment and present its crystal structure at 1.8-Å resolution. The structure reveals an elongated molecule with a rigid architecture stabilized by nine disulfide bridges. The PSI domain is located centrally and participates in the formation of extended interfaces with both the hybrid and I-EGF1 domains. The hybrid domain/PSI interface involves the burial of an Arg residue, and contacts between PSI and I-EGF1 are mainly mediated by well conserved Arg and Trp residues. Conservation of key interacting residues across the various integrin β subunit sequences suggests that our structure represents a good model for the entire integrin family. Superposition with the integrin β3 receptor in its bent conformation suggests that an articulation point is present at the linkage between its I-EGF1 and I-EGF2 modules and underlines the importance of this region for the control of integrin-mediated cell adhesion.

Integrins are heterodimeric adhesion proteins that transmit signals across the plasma membrane in both directions, thus serving as communication molecules between the cytoskeleton and the extracellular environment (1). Integrin α and β subunits associate non-covalently, forming a "head" at one extremity of the molecule that binds ligands and two leg-like extensions, each with a single-pass transmembrane helix that anchors the integrin molecule to the plasma membrane, followed by two short cytoplasmic C-terminal tails (2). The crystal structure of the ectodomain of the integrin αVβ3 revealed its modular architecture (3,4). Interaction between the β-propeller of the α subunit and the I-like domain (also referred to as βA domain) of the β subunit forms the head that contains the Arg-Gly-Asp (RGD) ligand binding site (5). In addition, the α subunit leg contains three β-sandwich domains. The β subunit leg comprises a hybrid domain, a PSI domain, and four I-EGF-like repeats followed by a β-tail domain at its C-terminal end (Fig. 1). In the αVβ3 integrin crystal structure, the I-EGF1 and I-EGF2 domains are not resolved, and only poor electron density is present for the I-EGF3 domain with a very high average temperature factor (3). Thus, both the three-dimensional structure of the I-EGF1 domain and its interactions with the flanking hybrid, PSI, and I-EGF2 domains are currently unknown. Conformational changes have been proposed to play an important role in the regulation of integrin-mediated adhesion (for a recent review, see Ref. 6). Lateral clustering of integrins in the plasma membrane is an alternative model that has been put forward to account for the increase of their binding avidity for ligands upon activation. These two models, however, are not mutually exclusive, and some coupling between these two integrin activation pathways has been suggested (7,8).
Quaternary structural changes have been compared with a "switchblade"-like motion, and a correlation between these conformers and the activation state, ligand affinity, and adhesive properties of the integrin molecules has been proposed (9-11). This model for integrin activation is thus based on collective evidence from electron microscopy (12), x-ray, and NMR studies of integrin fragments of one or several domains (9,11,13); mutational studies; and the functional effects induced by monoclonal antibodies after binding to different regions of integrin molecules (6,14). Recently the x-ray structure of the αIIbβ3 integrin head fragment in complex with ligand mimetics suggested that ligand binding may involve the outward swing of the hybrid domain with respect to the I-like (βA) domain and that the hybrid domain forms a rigid complex with the PSI domain (9). This is in agreement with previous observations from electron microscopic image analyses (12) and the exposure of antibody epitopes in the hybrid and PSI domains upon integrin activation (14-16). In the absence of a structure for the I-EGF1 and I-EGF2 domains, however, it is not clear how the movement of the hybrid-PSI complex is connected to the leg of the β subunit. To address this question, we expressed a soluble fragment of the integrin β2 subunit consisting of the PSI, hybrid, and I-EGF1 domains and present a high resolution structure of these three domains (hereafter abbreviated PHE1). The PSI domain is located centrally in the structure and participates in the formation of extended interfaces with both the hybrid and I-EGF1 domains. The structure brings an important missing piece to the picture of the extracellular portion of the integrin receptor by suggesting that these three domains act as a rigid unit and that an articulation point is located between the I-EGF1 and I-EGF2 modules of the integrin.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The cDNA coding for PHE1 was constructed from the wild type human β2 cDNA by removing the region encoding residues Lys101-Asn339, corresponding to the I-like domain insertion, and by adding a hexahistidine tag to facilitate purification followed by a stop codon after Glu460, which marks the C-terminal end of the I-EGF1 domain (17,18) (Fig. 1A). The PCR product was inserted into the pcDNA3 expression vector, and the plasmid was transfected into 293 cells. Stable cell lines were created by selection for G418 resistance. A single clone with high PHE1 expression was grown in Dulbecco's modified Eagle's culture medium (JRH Biosciences). Typically 1 liter of culture supernatant was concentrated in a stirred cell (Amicon-8200, Millipore) through a polyethersulfone membrane with a 5-kDa molecular mass cutoff to a final volume of 50 ml in buffer A (20 mM Tris-HCl, pH 8.0, 0.2 M NaCl, 10% glycerol). 1.5 ml of packed nickel-nitrilotriacetic acid magnetic agarose beads (Qiagen) in a total volume of 5 ml was added to the sample, and the mixture was incubated for 2 h at 4°C and loaded onto an Econo column (Bio-Rad). After washing three times with a total of 15 ml of buffer A, the protein was eluted with the same buffer containing 0.5 M imidazole. The eluate was concentrated in buffer A to a final volume of 1 ml, and the sample was loaded onto a Sephacryl S-100 HR column (16 mm × 60 cm) mounted on a fast protein liquid chromatography system (Amersham Biosciences). Approximately 2 mg of pure PHE1 protein (as assessed using SDS-PAGE) was obtained. The molecular mass of the sample was determined using a matrix-assisted laser desorption ionization time-of-flight mass spectrometer.
Crystallization and Data Collection-The protein was crystallized by vapor diffusion using the hanging drop method. The precipitating solution contained 18% polyethylene glycol 8000, 0.2 M calcium acetate, 0.1 M sodium cacodylate at pH 6.5. 2 µl of the protein, concentrated by ultrafiltration to 20 mg/ml in 20 mM Tris-HCl at pH 8.0, were mixed with an equal volume of the precipitating solution, and the drop was equilibrated against a reservoir containing 1 ml of the precipitating solution at 18°C. After macroseeding, crystals grew as thin plates over 2-3 days to dimensions of ~0.05 × 0.3 × 0.3 mm³. For data collection, crystals were prepared for cryocooling by soaking in the crystallization buffer to which 25% glycerol (v/v) had been added, mounted, and cooled to 100 K in a nitrogen gas stream (Oxford Cryosystems). Native diffraction intensities from a single cryocooled crystal were recorded on an ADSC charge-coupled device detector on the ID14-4 beamline at the European Synchrotron Radiation Facility (Grenoble, France). Data for a potassium tetrachloroplatinate (K2PtCl4) derivative were collected on an R-AXIS IV++ image plate detector using Cu Kα radiation from a Micromax-007 rotating anode operating at 20 mA and 40 kV. Data were processed with the programs MOSFLM and SCALA (19). Crystal parameters and data collection statistics are summarized in Table I.
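As a rough illustration of the drop setup described above, the following sketch computes the starting concentrations in the hanging drop immediately after mixing. It assumes that the drop volumes are 2 µl each (the unit is garbled in the source text) and that no evaporation has yet occurred; the helper name `mix` and the dictionary keys are hypothetical and introduced only for this example.

```python
# Initial concentrations in a hanging drop made by mixing protein and
# precipitant solutions. Component values are taken from the text; the 2 µl
# volumes are an assumption about the garbled unit in the source.

def mix(volume_a_ul, conc_a, volume_b_ul, conc_b):
    """Return per-component concentrations after mixing two solutions.

    conc_a and conc_b map component name -> concentration (each component
    keeps its own unit). A component absent from one solution counts as 0.
    """
    total = volume_a_ul + volume_b_ul
    names = sorted(set(conc_a) | set(conc_b))
    return {
        name: (conc_a.get(name, 0.0) * volume_a_ul
               + conc_b.get(name, 0.0) * volume_b_ul) / total
        for name in names
    }

protein = {"PHE1 (mg/ml)": 20.0, "Tris-HCl (mM)": 20.0}
precipitant = {
    "PEG 8000 (%)": 18.0,
    "calcium acetate (M)": 0.2,
    "sodium cacodylate (M)": 0.1,
}

# Equal volumes of protein and precipitant, as in the text.
drop = mix(2.0, protein, 2.0, precipitant)
for name, value in drop.items():
    print(f"{name}: {value:g}")
```

With equal volumes every component starts at half its stock concentration; during equilibration against the reservoir the drop then concentrates toward the reservoir composition.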
Structure Determination and Refinement-The structure was determined by the single isomorphous replacement method using anomalous scattering (SIRAS) from a K2PtCl4 derivative. Two heavy atom binding sites were located using the program SOLVE (20), and an initial map was calculated after solvent flattening. This initial map allowed the tracing of most of the main chain atoms of the hybrid and PSI domains. Phases calculated from the partial model combined with the experimental SIRAS phases were used to calculate a new map that was modified using the program DM (19). The whole procedure was iterated, and the envelope was modified. Once about 70% of the residues had been traced, the program ARP/WARP (19) was used for final phase improvement and model building. Phasing and refinement statistics are presented in Table II. The model was refined by slow cool energy minimization and B-factor refinement protocols of the program CNS (22) with the Engh and Huber force field constants (23) using the maximum likelihood amplitude target. Manual inspection and correction of the model were made using the program O (24). The free R-factor was calculated from 5% of the measured unique data, randomly chosen, that were not included in the refinement. The coordinates and structure factors have been deposited in the Protein Data Bank with code 1YUK. Surface area calculations were carried out with the program AREAIMOL (19) with a radius of the probe sphere of 1.7 Å. Figs. 2 and 4-6 were drawn using the program PyMOL (written by Dr. DeLano, available at pymol.sourceforge.net).
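The free R-factor bookkeeping described above can be sketched as follows. This is a minimal illustration, not the procedure implemented in CNS: it uses the conventional definition R = Σ||Fobs| − |Fcalc||/Σ|Fobs|, assumes that the calculated amplitudes are already on the same scale as the observed ones, and uses made-up amplitudes. The function names `r_factor` and `split_free_set` are hypothetical.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

# Toy amplitudes (made up for illustration, not data from this study).
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0] * 20
f_calc = [95.0, 84.0, 57.0, 42.0, 19.0] * 20

# 5% of the reflections are set aside and never used in refinement.
work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(len(work), len(free), round(r_work, 3), round(r_free, 3))
```

Because the free set is excluded from refinement, its R value is not biased by fitting and serves as a cross-validation statistic.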

RESULTS
Protein Expression-The integrin β2 subunit can be expressed in the absence of any integrin α subunits on COS-7 cells but can only be detected by monoclonal antibodies whose epitopes map to regions outside the I-like domain (15). This result suggests that the proper folding of the I-like (βA) domain requires the interaction with the β-propeller of the associated α subunit of the integrin head. To avoid the complexity introduced by the I-like domain, we constructed a β2 subunit with the I-like domain deleted, β2ΔI. When transfected into COS-7 cells, surface expression was detected using a panel of monoclonal antibodies (15). As expected, the β2ΔI subunit failed to associate with the αL subunit (15). The β2ΔI subunit was truncated after the I-EGF1 domain to yield the PHE1 fragment containing the PSI, hybrid, and I-EGF1 domains. A stably transfected 293 clone was obtained to express the PHE1 fragment. The protein recovered from the culture supernatant was shown to express the conformational epitopes of MEM148, 7E4, KIM202, KIM89, and H52 (Fig. 1B), suggesting that it adopts a native conformation. Mass spectrometry analysis of PHE1 revealed a single species with a molecular mass of 29,677.3 Da, a value 4,300 Da in excess of the calculated molecular mass of the polypeptide chain. After treatment with peptide N-glycosidase F, the molecular mass was reduced to the calculated value of the polypeptide, suggesting that the excess mass was due to glycosylation (Fig. 1C).
Phasing statistics (table notes, partial): ... is the calculated heavy atom structure factor for acentric and centric reflections, respectively. (e) Phasing power is the r.m.s. heavy atom structure factor divided by the r.m.s. lack of closure. (f) Mean value of figure of merit before density modification and phase combination.
Refinement statistics (table entries and notes, partial): Residues in most favored regions, 87.9%; residues in additional allowed regions, 11.5%; overall G factor (c), 0.22. (a) R factor: $R = \sum ||F_{obs}| - |F_{calc}|| / \sum |F_{obs}|$. (b) R free was calculated with 5% of reflections excluded from the whole refinement procedure. (c) G factor is the overall measure of structure quality from PROCHECK (21).
Quality of the Model-The final model refined at 1.80-Å resolution includes 58 residues of the PSI domain, 120 residues of the hybrid domain, and 34 residues of the I-EGF1 domain (Fig. 2). The 4 residues (Ala-Lys-Leu-Ser) that connect the hybrid domain to the N- and C-terminal portions of the I-like (βA) domain insertion could not be traced in the electron density maps. Mass spectrometry and SDS-PAGE analysis of dissolved protein crystals rules out proteolytic cleavage at this position. Thus, these residues are presumably mobile. Likewise 5 residues in the glycine-rich loop connecting strands βX with βA of the hybrid domain are not visible in the electron density map and were not included in the current model. A total of 219 well defined water molecules as well as two N-linked GlcNAc residues (Fig. 2A) and 2 histidine residues from the C-terminal His6 tag were placed (Fig. 1A). All residues from the model fall into favorable or allowed regions of the Ramachandran diagram (Table I).
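The mass spectrometry figures reported under Protein Expression can be turned into a simple mass balance. The sketch below uses only the two reported numbers (the measured mass and the excess over the polypeptide mass) and assumes, purely for illustration, that the excess is shared equally between two N-linked glycosylation sites. The variable names are hypothetical, and the derived values are approximate.

```python
# Mass bookkeeping for the PHE1 glycoprotein, using numbers from the text.
measured_mass_da = 29677.3      # MALDI-TOF mass of PHE1 (from the text)
reported_excess_da = 4300.0     # reported excess over the polypeptide mass
n_glycosylation_sites = 2       # assumption: both N-linked sites carry glycan

# Polypeptide mass implied by the two reported numbers (approximate).
implied_polypeptide_da = measured_mass_da - reported_excess_da

# Average glycan mass per site if the excess is shared equally (assumption).
glycan_per_site_da = reported_excess_da / n_glycosylation_sites

# Fraction of the total mass contributed by carbohydrate.
glycan_fraction = reported_excess_da / measured_mass_da

print(f"implied polypeptide mass: ~{implied_polypeptide_da:.0f} Da")
print(f"average glycan per site:  ~{glycan_per_site_da:.0f} Da")
print(f"carbohydrate fraction:    ~{glycan_fraction:.1%}")
```

An average of roughly 2 kDa per site is far larger than the single GlcNAc residue modeled at each site, which is consistent with most of each glycan being disordered in the crystal.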
Overall Architecture-The PHE1 segment adopts an elongated rod-like structure with overall dimensions of 82 × 30 × 30 Å. The structure is organized around three β-strands, βX, βA′, and βG, which form a central platform with the middle strand βA′ connecting back to strand βB in the hybrid domain, while strands βX and βG insert into the split PSI domain (see Figs. 1 and 2). The PSI domain is located centrally in the structure and makes extensive interactions with both the hybrid and I-EGF1 domains. Thus, the structure appears to be rather rigid and is further stabilized by nine disulfide bonds. A superposition of the PSI/hybrid tandem with the equivalent domains in the αIIbβ3 integrin structure (Protein Data Bank code 1TYE) gives an r.m.s. deviation of 2.2 Å for 169 equivalent Cα atoms. This illustrates that the relative orientation is well preserved between the hybrid and PSI domains of β2 and β3 integrins, an observation in agreement with the good conservation of the residues that form the interface (see below).
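For readers unfamiliar with the r.m.s. deviation quoted above, the following sketch shows how the statistic is computed over paired atoms. It assumes that the two coordinate sets have already been optimally superposed and that the atom pairing (the "equivalent Cα atoms") is known; the superposition step itself is not shown. The coordinates are made up and are not taken from entries 1YUK or 1TYE, and the function name `rmsd` is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points.

    Both inputs are sequences of (x, y, z) tuples in Å, listed in the same
    order so that item i of one set is equivalent to item i of the other.
    The sets are assumed to be superposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates for three atom pairs (illustration only).
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 2.0, 0.0), (3.8, 0.0, 2.0), (5.6, 0.0, 0.0)]
print(round(rmsd(set_a, set_b), 2))
```

Because the squared distances are averaged before the square root is taken, a few strongly deviating atoms weigh more heavily than many small deviations.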
The Hybrid Domain-The N- and C-terminal sequences flanking the I-like (βA) domain of integrin β2 (Fig. 1A) cluster in space and form the hybrid domain, which adopts a β-sandwich fold. In the PHE1 construct, the I-like (βA) domain is absent, and the 4 residues connecting to it are not resolved in the electron density map. Interestingly, despite the absence of the whole I-like (βA) domain, the β2 hybrid domain presented here superimposes well with the corresponding domain of the β3 integrin with an r.m.s. deviation of 1.6 Å for 115 equivalent Cα atoms. This suggests that the I-like (βA) domain does not influence the conformation of the hybrid domain and that other protein modules could be inserted at the same position, opening the possibility of creating chimeric receptor molecules. The loop connecting the βX and βA strands is significantly shorter in integrin β2 compared with the corresponding loops in β1, β3, β5, β6, and β7 integrins (Figs. 2 and 3) and is presumably flexible. In the β3 integrin structure, the same loop is longer and is stabilized through contacts with residues from the I-like (βA) domain (3,9).
The PSI Domain-The PSI domains of integrin β2 and β3 have similar structures (4,9) with a residual r.m.s. deviation between the two domains of 1.8 Å for 54 equivalent Cα atoms after superposition. All four disulfide bridges, including the one between the second cysteine residue (Cys11 in integrin β2) and the cysteine following the hybrid domain (Cys425), are conserved and superimposable. However, differences exist between the two molecules. The N-terminal helix α1 present in β3 is missing in integrin β2, and an additional 3₁₀ helix and an α-helix, α3a, are present (Fig. 2A). An N-linked carbohydrate structure is present at Asn28, currently modeled as a single GlcNAc residue, although there is indication of an additional branched sugar residue.
The I-EGF1 Domain-The I-EGF1 domain of the integrin β subunits was not resolved in previous structural studies, and our data provide the first view of this domain at atomic resolution. Unlike the integrin-EGF domains 2, 3, and 4 as well as those of the TIED (ten β-integrin EGF-like repeat domains) protein (17,25), which have 8 cysteine residues, the I-EGF1 domain of the β2 subunit has a total of only 6 cysteines engaged in three disulfide bonds (Figs. 3 and 4). The structure of the I-EGF3 domain of the β2 subunit was determined by NMR, and the 8 cysteines were shown to be arranged in the C1-C5, C2-C4, C3-C6, C7-C8 pattern (11). Amino acid sequence alignment shows that I-EGF1 belongs to this group of EGF domains, but with the C2-C4 disulfide pair missing, and not to the laminin EGF domains, which have the C1-C3, C2-C4, C5-C6, C7-C8 disulfide arrangement (17). A comparison with the I-EGF3 structure of the β2 integrin (11) and the I-EGF4 structure of the β3 integrin (3) reveals that the absence of the second and fourth cysteines is accompanied by a large movement of the polypeptide chain (including the basic patch formed by Arg428, Arg432, and Arg434) that projects away from the central β-sheet formed by strands β1 and β2 (Fig. 4A). This is presumably partly due to the release of the structural constraint imposed by the C2-C4 disulfide bridge in the I-EGF3 domains, which is absent in I-EGF1. Interestingly, the closest structural homologue of I-EGF1 is the EGF domain of P-selectin (26), whose polypeptide chain follows a similar path, with 33 equivalent residues superimposed with an r.m.s. deviation of 1.7 Å (Fig. 4A).
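The disulfide bookkeeping in the paragraph above can be made explicit with a small sketch. It encodes the two eight-cysteine connectivity patterns quoted in the text as sets of cysteine-position pairs, derives the I-EGF1 pattern by removing the C2-C4 pair, and counts how many pairs I-EGF1 shares with each reference pattern. The constant and function names are hypothetical and serve only this illustration.

```python
# Disulfide connectivity patterns as sets of (Ci, Cj) cysteine-position pairs.
# Positions follow the C1..C8 numbering used in the text.

I_EGF_8CYS = {(1, 5), (2, 4), (3, 6), (7, 8)}    # I-EGF3 pattern (from the text)
LAMININ_EGF = {(1, 3), (2, 4), (5, 6), (7, 8)}   # laminin EGF pattern (from the text)

def drop_pair(pattern, pair):
    """Return the pattern without one disulfide pair."""
    return {p for p in pattern if p != pair}

def cysteines(pattern):
    """Return the sorted cysteine positions that take part in the pattern."""
    return sorted({c for pair in pattern for c in pair})

# I-EGF1: the I-EGF pattern with the C2-C4 pair missing.
i_egf1 = drop_pair(I_EGF_8CYS, (2, 4))

print(sorted(i_egf1))       # remaining disulfide pairs
print(cysteines(i_egf1))    # cysteine positions still present
# Number of pairs shared with the I-EGF and laminin EGF patterns.
print(len(i_egf1 & I_EGF_8CYS), len(i_egf1 & LAMININ_EGF))
```

Under this encoding I-EGF1 retains three disulfides formed by six cysteines; all three occur in the I-EGF pattern, whereas only C7-C8 occurs in the laminin EGF pattern.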
Interaction between Domains-The interactions of the PSI domain with the hybrid and I-EGF1 domains are stabilized through the formation of extensive interfaces of 815 and 1099 Å², respectively. The interface between the PSI and hybrid domains in the integrin αIIbβ3 is similar, with a comparable buried surface area of 860 Å² and the involvement of a highly invariant Arg residue in multiple contacts with main chain atoms of the PSI domain in both cases (Fig. 5). The side chain of Arg86 (Arg93 in integrin β3 (9)) is deeply buried in the PSI/hybrid interface, forming four hydrogen bonds with main chain atoms from Gly18, Pro19, Cys21, and Pro59. A water molecule trapped in the interface mediates an additional interaction between the carbonyl oxygen of the conserved Pro87 and the PSI domain. Thus the hybrid and PSI domains are held in a rigid orientation with respect to each other in both the β2 and β3 integrins, and given the strict conservation of the Arg86 residue, the relative orientation of these two domains is likely to be conserved in all integrins (Figs. 3 and 4). The PSI and I-EGF1 domains are also found to interact extensively. A list of interactions is given in Table III. The residues involved are highly invariant among the β integrin sequences (Fig. 3), suggesting that the interface between PSI and I-EGF1 may also be well conserved. The compactness of the structure is further strengthened by the formation of the Cys11-Cys425 and Cys427-Cys445 disulfide bonds that maintain the potentially mobile linker region 423-427 of the PSI domain in close proximity with the bulk of the PSI and I-EGF1 domains, respectively (Fig. 2).

DISCUSSION
Taken together, the structural data presented here suggest that the PSI, hybrid, and I-EGF1 domains represent a rather rigid unit that may act as a lever to orientate the I-like (βA) domain with respect to the lower leg.
However, the current evidence is partly based on the close match between the hybrid/PSI domains in only two experimentally determined structures (4,9), and more data are needed to resolve this issue. In contrast to the β3 integrin, the α-helix α1 is missing in our PHE1 β2 structure. Rather, at its N terminus, the polypeptide chain of the β2 integrin PSI domain adopts an extended conformation. It would be interesting to see whether this conformation is preserved in the presence of an α subunit partner and the flanking I-EGF domains. Indeed a number of structural modifications are expected in the PSI and I-EGF1 domains in the context of an α subunit and the flanking I-EGF2 domain due to inter- and intrachain interactions. In particular, the assessment of the existence or not of a fourth β4 strand in the I-EGF1 module, which would form hydrogen bonds with the currently unpaired β3 strand (see Figs. 2 and 4), must await the structural determination of an integrin fragment comprising both the I-EGF1 and I-EGF2 domains. Although the disulfide linkages are conserved in the integrin structures determined so far (3,9,11), several pairs of cysteine residues are found in close vicinity (e.g. Cys11-Cys425 is spatially close to Cys427-Cys445, and Cys3-Cys21 is close to Cys14-Cys40; see Fig. 2). Given small adjustments in the three-dimensional structure, their sulfhydryl groups are at the right distance to form alternate disulfide bonds.
Sequence alignment figure legend (fragment): ... to the contacts between the PSI and hybrid domains are highlighted in light blue, and those between the PSI and I-EGF1 domains are in pink. The nine disulfide pairs are marked with arrows. The two N-glycosylation sites are indicated by red inverted triangles. Amino acid sequence GenBank™ accession numbers, labeled as ITB (integrin β), are as follows: β1, 124963; β2, 124966; β3, 124968; β4, 13638154; β5, 124970; β6, 13432176; β7, 124973; β8, 4504779.
Thus, we cannot completely rule out the possibility of an alternative disulfide arrangement that would introduce subtle alterations in the structure, possibly acting as relays to transduce signals. In our structure, which is devoid of the domains C-terminal to the I-EGF1, Ile455 is largely exposed to the solvent at the lower end of the molecule (Fig. 5). A hydrophobic residue is also found at the same position in integrins β1, β3, β5, and β7 (Fig. 3). It is thus probable that Ile455 of I-EGF1 makes contact with some hydrophobic residues protruding from the flanking I-EGF2 domain.
Having defined the relative orientation of the hybrid, PSI, and I-EGF1 domains, what can we infer about the probable location of the I-EGF2 fragment? We performed a superposition of the PSI/hybrid domains of the PHE1 structure with the corresponding domains resolved in the bent conformation of the integrin αVβ3 structure (4) (Fig. 6). Based on the resulting locations of the C-terminal end of the I-EGF1 domain and the N-terminal end of the I-EGF3, we observed that to join the I-EGF1 and I-EGF2 segments together, these two modules must associate in antiparallel fashion, presumably exposing the long hydrophilic stretch present between cysteines C1 (Cys461) and C2 (Cys475) of the I-EGF2 module of the β2 integrin subunit at the tip of the molecule in its bent conformation (Fig. 6). This observation is in agreement with the extended model proposed for the I-EGF2/I-EGF3 tandem based on NMR data (9). Two monoclonal antibodies that stimulate ligand binding to integrin α5β1 recognize epitopes located at the N terminus of the PSI domain, further emphasizing the importance of this region of the integrin molecule for signal transduction (16). In addition, a pathogenic hantavirus recognizes the PSI domain in its bent conformer, and this interaction was suggested to disturb vascular permeability by restricting integrin dynamics (27).
Recent reports suggest that the bent conformers of αVβ3 (28) and α5β1 (29) can interact with ligands under appropriate conditions. These data point to the existence of several integrin conformers displaying subtle variations in terms of their activation states, thus emphasizing the importance of studies aimed at defining these structural states and how they correlate with integrin activation.
In conclusion, one critical region for the transmission of structural changes with functional significance is located at the junction of the PSI, hybrid, and I-EGF1 domains, with the linkage between I-EGF1 and I-EGF2 possibly acting as a conformational switch between the head and the legs of the β subunit of the integrin receptor. Our results point to this linker as an important region for regulating integrin-mediated cell adhesion. This hypothesis can now be tested by raising antibodies directed against peptides connecting the I-EGF1 and I-EGF2 modules and assessing their ability to interfere with integrin activation.