Crystal Structure of the I Domain from Integrin α2β1*

We have determined the high resolution crystal structure of the I domain from the α-subunit of the integrin α2β1, a cell surface adhesion receptor for collagen and the human pathogen echovirus-1. The domain, as expected, adopts the dinucleotide-binding fold, and contains a metal ion-dependent adhesion site motif with bound Mg2+ at the top of the β-sheet. Comparison with the crystal structures of the leukocyte integrin I domains reveals a new helix (the C-helix) protruding from the metal ion-dependent adhesion site face of the domain which creates a groove centered on the magnesium ion. Modeling of a collagen triple helix into the groove suggests that a glutamic acid side chain from collagen can coordinate the metal ion, and that the C-helix insert is a major determinant of binding specificity. The binding site for echovirus-1 maps to a distinct surface of the α2-I domain (one edge of the β-sheet), consistent with data showing that virus and collagen binding occur by different mechanisms. Comparison with the homologous von Willebrand factor A3 domain, which also binds collagen, suggests that the two domains bind collagen in different ways.

The integrins are a family of plasma membrane proteins that transduce bidirectional signals between the cytoplasm and the extracellular matrix or other cells (1). The integrin α2β1 is expressed on a variety of cell types, serving as the collagen receptor on platelets and fibroblasts, and as both a collagen and laminin receptor on endothelial and epithelial cells (2,3). It also acts as the receptor for the human pathogen echovirus-1 (4). In common with six other integrin α-chains (α1, αD, αE, αL, αM, and αX), the α2 chain contains a 200-amino acid inserted domain, the I domain, that is homologous to the von Willebrand factor A domains (5). Recombinant α2-I domain recapitulates many of the ligand binding properties of the parent integrin (6-8). It exhibits specific binding to various fibrillar collagens, and two groups have shown that, like collagen binding to the complete receptor (9,10), binding to the I domain is cation-dependent, being supported by magnesium or manganese but not by calcium (7,11). The triple-helical structure of collagen is required for recognition by α2β1, but specific collagen sequences have not been identified (for review, see Ref. 12).
The first crystal structure of an integrin I domain, from αMβ2, showed that it adopts the dinucleotide-binding fold, with a central parallel β-sheet surrounded on both sides by α-helices (13). In this class of fold, a functional surface of the domain always lies at the C-terminal end of the β-sheet (14). In the I domain, a novel cation coordination sphere is located there, and in the αM-I domain crystal structure with bound Mg2+, a glutamate side chain from a neighboring I domain in the crystal lattice completes the octahedral coordination sphere of the metal. This led to the suggestion that the glutamate behaves as a ligand mimetic, as most integrin ligands possess a critical aspartate residue (or glutamate) as a key feature of their integrin-binding motifs, and mutation of any of the metal-coordinating side chains of the I domain (8,15-17) abolishes binding in a dominant negative fashion. This motif was therefore dubbed the metal ion-dependent adhesion site (MIDAS) (13). Apart from the highly conserved residues that directly coordinate the metal, the upper surface of the domain surrounding the MIDAS motif is highly variable, suggesting that the metal-Glu/Asp bond contributes some but not all of the binding energy, with the rest of the energy, and the specificity, arising from further interactions (ionic/polar/hydrophobic) between complementary surfaces of the integrin and ligand. In support of this notion, Huang and Springer (18) utilized mouse-human chimeras and site-specific mutagenesis to demonstrate that residues essential for the interaction of αLβ2 with intercellular adhesion molecule-1 are located on the MIDAS face surrounding the site of metal coordination. In addition, two of the epitopes for function-blocking antibodies map to the same face (19). Similarly, Rieu et al. (20) showed that residues essential for the binding of the hookworm-derived neutrophil inhibitory factor, a protein that blocks the binding of natural ligands to αMβ2, cluster around the MIDAS face of αM-I.
Crystal structures have previously been reported for the αL-I and αM-I domains with bound Mg2+ and Mn2+ (13,21-23). The crystal structure of an A-domain from von Willebrand Factor (vWF-A3) has also been solved recently (24). We now report the crystal structure of the α2-I domain, which we determined as a first step in understanding the atomic level determinants of ligand binding, and compare the structure with other A/I domains. vWF-A3 also binds collagen, suggesting that the two domains might have similar binding motifs. Our crystal structure suggests that they do not.

EXPERIMENTAL PROCEDURES
Purification and Crystallization of the α2-I Domain-Human α2-I domain (residues 140-337) was expressed as a glutathione S-transferase fusion protein in Escherichia coli, cleaved, and purified as described previously (25). The protein was next loaded onto an affinity iminodiacetic acid-Sepharose column (Pharmacia) charged with Ni2+ and
* This work was supported in part by grants from the Biotechnology and Biological Sciences Research Council and the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Data Collection and Structure Determination-A 2.5-Å data set (see Table I) was collected from a single crystal mounted and frozen in a stream of boiled-off nitrogen at 100 K using a Rigaku RU-200 x-ray generator with focusing mirrors and an RAXIS II image plate. Data were reduced with DENZO and scaled with SCALEPACK (26) with an Rmerge = 4.2%, and 95% completeness to 2.5-Å resolution. This data set was used to perform the molecular replacement calculations and the early stages of refinement. A room temperature data set was subsequently collected from an imperfect twin. As the twins of the crystal were randomly oriented and diffracted with approximately equal intensity, both triclinic lattices could be indexed and data merged, with an Rmerge = 11.2% (31% in the outer shell) and completeness of 96.5% to 1.9-Å resolution. A 7σ peak on the κ = 180° section of a self-rotation function calculated using GLRF (27) indicated the presence of two molecules in the asymmetric unit, consistent with a solvent content of 55%. Molecular replacement was performed with AMoRe (28), using the superposed structures of αM-I (Mn2+-bound conformation (21)) and αL-I domains (22) and the vWF-A3 domain (24) as a search model. The top two peaks in a cross-rotation function gave the correct orientations of the two monomers, and their relative displacement was readily determined with a translation function. Applying this solution to all protein atoms in the αM-I structure resulted in a model with good crystal packing, and an R factor and correlation coefficient of 52% and 0.24, respectively (20-6 Å), improving to 50% and 0.48 on rigid body refinement. The αM-I model was next stripped of side chains and loop regions; the remaining 114 alanine and glycine residues constituted 5 α-helices (α7 omitted) and 6 β-strands, which were refined as individual rigid bodies in XPLOR (29) using 10-2.8 Å data. The R factor at this stage was 41.6% and Rfree (calculated on 10% of reflections) was 47.9%.
A 2Fo − Fc electron density map revealed a number of new features including several side chains and the C-terminal helix α7. At this point, the refinement was extended to the resolution limit of the room temperature data set (1.9 Å). Several cycles of simulated annealing XPLOR refinement, followed by inspection of 2Fo − Fc maps and manual model building, led to a complete polypeptide trace. Water molecules were then added at the positions of Fo − Fc peaks greater than 3σ, where reasonable hydrogen bonding partners existed. After applying a bulk solvent correction, the current R factor is 19.6% for all data between 15 and 1.9 Å (Rfree = 24.8%); the model includes residues 139-339 (including 2 residues at the C terminus from the expression construct) for both molecules in the asymmetric unit, and 267 water molecules. The only residue in an unfavored region of the Ramachandran plot is Ala188. Electron density for this residue is persuasive and stabilizing hydrogen bonds exist. There are two cis proline residues, Pro158 in the βA-α1 turn and Pro307 in the α6-βF turn.
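As a hedged illustration of the statistics quoted in this section, the following Python sketch computes Rmerge from repeated intensity measurements and a conventional R factor from observed and calculated structure factor amplitudes. It is not the code used by SCALEPACK or XPLOR; the function names, the input layout, the single overall scale factor, and the deterministic work/free split are assumptions made for illustration. Rfree is the same R factor evaluated only on the reflections (10% here) that are set aside from refinement.

```python
def r_merge(measurements):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    `measurements` maps a reflection index (h, k, l) to the list of
    intensities recorded for that reflection (hypothetical input layout).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - k|Fc| | / sum |Fo|, with one overall scale k."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    # Assumption: a single scale factor k = sum|Fo| / sum|Fc|.
    k = sum(abs(fo) for fo in f_obs) / sum(abs(fc) for fc in f_calc)
    numerator = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def split_work_free(n_reflections, every=10):
    """Assign every `every`-th reflection to the free set (10% by default).

    Simplification: real free sets are normally chosen at random.
    """
    free = [i for i in range(n_reflections) if i % every == 0]
    work = [i for i in range(n_reflections) if i % every != 0]
    return work, free
```

The R factor would be computed over the work indices and Rfree over the free indices, using the same function on the two subsets.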
I-domain:Collagen Modeling-We built a collagen triple helix based on the published crystal structure (30) (PDB entry 1cag) of a collagen-like peptide (Pro-Hyp-Gly)4-Pro-Hyp-Ala-(Pro-Hyp-Gly)5. This peptide contains a single Gly to Ala substitution at the center of a 30-residue sequence which disrupts the (X-Y-Gly)n motif, creating a local untwisting of the helix. We used repeats 2 to 4 (residues 4-12) of each polypeptide chain and extended these along the helical axis to generate an ideal collagen triple helix 80 Å long. At the center of this triple helix, a proline (at the X position) or a hydroxyproline (at Y) was replaced with a glutamate in its preferred gauche(+),trans side chain conformation. The collagen model and I domain were then manually docked as rigid bodies by constraining the distance between the glutamate carboxylate and the metal ion to 2 Å (the observed bond distance in the αMβ2 I domain) and by minimizing intermolecular steric clashes (monitored using the CCP4 program CONTACT (28)). With Glu at either the X or Y position, a convincing fit could be found with only minor steric clashes. These clashes could be relieved either by trimming 3 or 4 of the collagen Pro/Hyp side chains to Ala or Gly, or by allowing a small number of side chains on the I domain to adopt alternate rotamer conformations. When the Glu was replaced by an Asp, severe steric clashes occurred between backbone atoms of the two docking partners.
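The two docking criteria described above (a fixed carboxylate-metal distance and as few steric clashes as possible) can be sketched in Python as follows. This is not the algorithm of the CCP4 program CONTACT, and the docking itself was done manually; the 3.0 Å clash cutoff, the 0.1 Å tolerance, the function names, and any coordinates passed in are assumptions chosen for illustration. Only the 2 Å target distance comes from the text.

```python
import math


def count_clashes(domain_atoms, collagen_atoms, cutoff=3.0):
    """Count intermolecular atom pairs closer than `cutoff` (in Å).

    Assumption: 3.0 Å is an illustrative clash threshold, not a value
    taken from the paper.
    """
    return sum(
        1
        for a in domain_atoms
        for b in collagen_atoms
        if math.dist(a, b) < cutoff
    )


def satisfies_metal_restraint(carboxylate_oxygen, metal, target=2.0, tolerance=0.1):
    """True if the Glu carboxylate oxygen sits ~2 Å from the metal ion."""
    return abs(math.dist(carboxylate_oxygen, metal) - target) <= tolerance


def score_pose(domain_atoms, collagen_atoms, carboxylate_oxygen, metal):
    """Return (restraint_ok, n_clashes) for one rigid-body pose."""
    return (
        satisfies_metal_restraint(carboxylate_oxygen, metal),
        count_clashes(domain_atoms, collagen_atoms),
    )
```

A pose would be preferred when the restraint holds and the clash count is low; comparing the counts for Glu and Asp models is the kind of check the text describes.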

RESULTS AND DISCUSSION
Crystal Structure of the α2-I Domain-The structure, as expected, adopts the classic dinucleotide-binding fold, with seven helices surrounding a core of five parallel β-strands and one short antiparallel β-strand (Figs. 1 and 2). Compared with the αM and αL-I domains, the most striking difference is a new turn-and-a-half of α-helix, residues 284-288, which we call the C-helix, extending from the top of strand βE and protruding from the MIDAS face (see below). There is a buried glutamic acid in place of the usual glycine in the position following the MIDAS aspartic acid (DESNS), but this is accommodated without distortion of the MIDAS motif. Space is created by a 1-residue insertion in the α3-α4 loop that wraps over the top of βA and βB, and the charge is neutralized by a salt bridge to Arg192 from βC. A buried water molecule adjacent to the MIDAS motif is closely conserved in αM-I and αL-I; it makes hydrogen bonds to the main chain carbonyl oxygen of Gly255, to the side chain hydroxyls of Thr223 and Thr253, and to the carboxylate of Glu299.
The buried Phe in αM-I and αL-I (Ile in vWF-A3) at the top of helix α7 (which becomes exposed in the "active" conformer of αM-I (21)) is replaced by a glutamic acid in the α2-I domain (Glu318). The side chain turns upwards to avoid complete burial, creating a cavity which is filled by a water molecule that is not found in αL-I, αM-I, or vWF-A3. The water molecule hydrogen bonds to the carboxylate of the MIDAS aspartate (Asp254), the main chain nitrogens of Gly284 and Tyr285, as well as the carboxylate of Glu318. A salt bridge is provided by the side chain of Arg288 from the C-helix. The orientation of the α7 helix is nevertheless very similar to that in the Mn2+-bound αM-I structure.
Sequences of α2β1 have been reported from human, cow, mouse, and pig. Within the I domains, 43 positions are not invariant. These all lie on the surface of the molecule, except for two conservative changes in the hydrophobic core (Val182-Met and Leu328-Ile in bovine). None of the changes are found on the MIDAS face, including the new C-helix.
Although the two molecules in the crystal unit cell were refined independently, the overall structures are almost identical (RMSD = 0.25 Å for main chain atoms) except for the chain termini. Both chain termini are ordered to a greater extent than in other I domain structures. The N terminus extends 5 residues before strand βA, and the C terminus 3 residues beyond helix α7. In the crystal, a disulfide bridge is formed between the N-terminal cysteine residues (Cys140) of the two molecules in the asymmetric unit, and this results in different conformations of the termini. It is unlikely that such a disulfide bond forms in vivo, and Springer (31) has predicted that Cys140 makes a disulfide bridge with a cysteine residue within the 4-1 loop of his propeller model. In one molecule, the C-terminal residues (335-337) pack into a crevice formed by the N-terminal loop (residues 138-142) and the α4-βD loop (residues 244-246). The side chain of Ile335 packs into this crevice, the main chain makes a number of β-sheet hydrogen bonds to both loops, and the N and C termini are brought into very close proximity. In the second molecule in the crystal, it appears that disulfide bond formation induces a new conformation in the N-terminal loop which squeezes the C-terminal residues out of the crevice, and we suggest that the first molecule better reflects the native conformation of the domain termini in the intact integrin.
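For readers unfamiliar with the RMSD figure quoted above, a minimal Python sketch of the calculation follows. It assumes the two molecules have already been superposed and that their main chain atoms are listed in matching order; the least-squares superposition step is omitted, and the function name and input layout are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (Å) between paired atomic coordinates.

    Assumption: the two coordinate lists are already superposed and the
    atoms are listed in the same order in both.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Restricting the input to main chain atoms (N, CA, C, O) of equivalent residues gives the kind of value reported here.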
The MIDAS Motif-The MIDAS motif (Fig. 3) binds a magnesium ion in the α2-I domain, as expected given the strict sequence conservation of the motif throughout the integrin I domains (the MIDAS motif in the vWF-A3 domain is not strictly conserved, and the vestigial motif does not bind a metal ion (24)). The metal is directly coordinated by three side chains (from residues Ser153, Ser155, and Asp254) and three water molecules, making strong bonds (2.0 ± 0.1 Å) in an octahedral arrangement. Asp151 makes hydrogen bonds to Ser153 and a water molecule (2.7-2.9 Å) but no direct bond. Thr221 does not coordinate the metal directly (Mg-OH(Thr) = 4.1 Å), but makes a hydrogen bond (2.9 Å) to one of the water molecules. This coordination is very similar to that found in the Mn2+-bound structures of αL-I and αM-I (defined as the "inactive" form by Lee et al. (21)), but different from the coordination observed in the αM-I structure with bound Mg2+ (the active form). It is the first high resolution crystal structure of the inactive conformer with bound Mg2+, and confirms the coordination predicted from the lower resolution (2.8 Å) structure of αL-I with bound Mg2+ (23). The role of the MIDAS threonine is intriguing; it is the only MIDAS residue that is absolutely critical for collagen binding to the recombinant α2-I domain (32), and it is also critical for ligand binding in the αM-I and αL-I domains (33). Only in the active, Mg2+-bound αM-I domain does the threonine coordinate the metal directly, suggesting that the threonine is required for stability of the active conformer and supporting the theory of tertiary structure change within the I domain (21). It appears that using modified protein, the requirement for cation can be circumvented (see below), but it is puzzling that the threonine remains essential under those conditions also (32). This issue will only be properly resolved by structure determination of an authentic I domain-ligand complex.
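The distinction drawn above between direct metal ligands (about 2.0 Å) and the more distant Thr221 hydroxyl (4.1 Å) can be illustrated with a short Python sketch. The coordinates below are an idealized octahedron invented for illustration, not the refined coordinates; the 2.4 Å cutoff, the atom labels, and the function names are likewise assumptions. Only the bond lengths and the list of coordinating residues come from the text.

```python
import math
from itertools import combinations


def coordinating_atoms(metal, ligands, cutoff=2.4):
    """Return {name: distance} for atoms within `cutoff` Å of the metal.

    Assumption: 2.4 Å is an illustrative upper limit for a direct bond.
    """
    return {
        name: math.dist(metal, xyz)
        for name, xyz in ligands.items()
        if math.dist(metal, xyz) <= cutoff
    }


def ligand_angles(metal, ligands):
    """Sorted ligand-metal-ligand angles in degrees, rounded to 0.1."""
    angles = []
    for a, b in combinations(ligands.values(), 2):
        va = [p - m for p, m in zip(a, metal)]
        vb = [p - m for p, m in zip(b, metal)]
        dot = sum(x * y for x, y in zip(va, vb))
        cos_t = dot / (math.hypot(*va) * math.hypot(*vb))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
        angles.append(round(math.degrees(math.acos(cos_t)), 1))
    return sorted(angles)


# Hypothetical idealized site: six ligands at 2.0 Å, Thr221 OG1 at ~4.1 Å.
MG = (0.0, 0.0, 0.0)
SITE = {
    "Ser153 OG": (2.0, 0.0, 0.0),
    "Ser155 OG": (0.0, 2.0, 0.0),
    "Asp254 OD1": (0.0, 0.0, 2.0),
    "Water 1": (-2.0, 0.0, 0.0),
    "Water 2": (0.0, -2.0, 0.0),
    "Water 3": (0.0, 0.0, -2.0),
    "Thr221 OG1": (2.9, 2.9, 0.0),
}

direct = coordinating_atoms(MG, SITE)
angles = ligand_angles(MG, {name: SITE[name] for name in direct})
```

In an ideal octahedron the six direct ligands give twelve angles of 90° and three of 180°; deviations from these values in real coordinates indicate distortion of the coordination sphere.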
Comparison with Other A/I Domains-The central core of five parallel β-strands and one short antiparallel β-strand is highly conserved among the integrin I domains and vWF-A3, with RMS deviations of 0.6-0.7 Å (Figs. 1 and 2). The βB-βC hairpin and a β-bulge at the end of βC are almost identical in all four structures. By contrast, the helices are more variable, with only helices α1 and α4 showing a general agreement of length and orientation. Helix α2 is replaced by a short turn and helix α3 extended by a turn in α2-I and vWF-A3. The C-terminal helix, α7, has a similar conformation in Mn2+-bound αM-I, α2-I, and vWF-A3, but is different in αL-I, where the helix splays out from the side of the domain, exposing a large hydrophobic crevice that is filled by a hydrophobic C-terminal sequence from another molecule in the crystal lattice (22). The conformation of helix α7 is also very different in the active, Mg2+-bound conformer of αM-I, where a functional role in propagating structural changes from the MIDAS face to the rest of the integrin has been proposed (21).
The MIDAS Face-The loops surrounding the MIDAS motif, which comprise the MIDAS face, are βA-α1, α3-α4, βD-α5, and βE-α6 (Fig. 4). These loops, except for βE-α6, have been implicated in ligand binding to the αL-I and αM-I domains by mutagenesis experiments (18-20). The loops have highly variable surface-exposed residues in all of the A/I domains, even when the main chain conformation is conserved, consistent with their being the principal determinants of ligand binding specificity. The βA-α1 loop, which includes the metal-coordinating DxSxS consensus sequence, has a 2-residue deletion in the α2-I domain, at the beginning of the α1 helix, but is otherwise conserved. The α3-α4 loop has a 1-residue insertion in the α2-I domain, which creates space for the glutamate (in DESNS), as already noted. The βD-α5 loop is similar in α2-I, αL-I, and vWF-A3, but different in αM-I, while αL-I lacks most of helix α5. The βE-α6 loop is the site of the principal insertion in the α2-I domain that creates the protruding C-helix. Its conformation is similar in αL-I and αM-I, but very different in α2-I and vWF-A3. It undergoes substantial rearrangement in the two structures of αM-I, creating an acidic pocket in the active conformer, and we suggest that this loop is a major determinant of ligand specificity.
Collagen Binding-Integrin α2β1 binds several types of fibrillar collagens (types I-VI and XI), and recombinant α2-I domain exhibits specific binding to some, but not all, of these (reviewed in Ref. 12). Two groups have shown that collagen binding to the α2-I domain is cation-dependent, being supported by Mg2+ and Mn2+ but not by Ca2+ (7,11). The MIDAS residues Asp151, Thr221, and Asp254 are all essential for collagen binding to the α2β1 integrin, and Thr221 is also critical in the recombinant α2-I domain (8,32). While the sequence motifs in collagen to which α2β1 binds have yet to be defined, the triple helical structure is required, and Asp/Glu and Arg (but not Lys) have been shown to be important for the binding of α1β1 to collagen IV (34).
The new C-helix on top of the MIDAS face creates a groove about 25 Å long and 20 Å wide centered on the metal ion, with a tyrosine residue (Tyr285) projecting prominently into the groove. Into this groove we manually docked a collagen triple helix, which was derived from the crystal structure of a collagen-like peptide (30) and modified to contain a glutamate residue to coordinate the MIDAS metal ion (Fig. 5). The shape of the groove places strong constraints on the position of the collagen helix; in particular, the projecting C-helix, the top of helix α6, and their connecting loop severely restrict rotations of the collagen about an axis parallel to the glutamate-metal bond. The amino acid side chains in natural collagen sequences would restrict rotations still further. Replacement of the glutamate by the shorter aspartate creates steric clashes in all possible orientations of the collagen. The model predicts that the following I domain residues make contact with the collagen: from the βA-α1 turn (Asn154), from the α3-α4 turn (Asp219 and Leu220), from the βD-α5 turn (Glu256 and His258), and from the C-helix, α6 and C-α6 turn (Tyr285, Asn289, Leu291, Asn295, and Lys298). The "footprint" on the collagen is about 10 residues long. Mutagenesis data (32) show that individual alanine mutants of two of these residues, Glu256 and Asn295, do not detectably affect collagen binding, but given the large number of potential contacts this may not be surprising. The epitopes for two blocking antibodies have been mapped to Asp160 and Tyr216 (8,35,36). These residues lie on the surface of the domain to the side of the MIDAS face, 12-15 Å from the MIDAS motif, consistent with their forming parts of epitopes for blocking antibodies.
Takada's group (32) have reported conflicting data on collagen binding, showing that in their system binding is cation-independent. They did, however, also show that binding was completely abrogated by alanine mutagenesis of the MIDAS residue Thr221. A possible resolution of these conflicting data has been provided by Tuckwell et al. (7), who showed that although most binding was cation-dependent, there was nonetheless a cation-independent fraction which correlated with the degree of protein modification. Takada's group modified their protein by iodination. Our crystal structure reveals a tyrosine residue on the MIDAS face very close to the metal ion, and our modeling studies suggest that this tyrosine is intimately involved in collagen binding. It is therefore plausible that iodination of this tyrosine affects I domain-collagen binding, either by providing an additional hydrophobic element to binding, or by directly inducing a conformational change to a high affinity state, abrogating the need for a metal ion.
The α1-I domain has a ligand-binding range similar but not identical to that of α2-I, binding collagen types I and III-VI, and laminins (17). The α1 and α2 sequences form a subfamily distinct from the leukocyte integrins, and the sequences can be aligned with no gaps, so that the α2-I domain crystal structure provides an excellent main chain model for α1-I. The MIDAS motif is strictly conserved in α1-I, as is the aspartic acid at the top of helix α7. The new C-helix is structurally conserved, although divergent in sequence (Tyr285 is replaced by a serine, while Leu286 is replaced by a tyrosine). The βA-α1 and βD-α5 loops are strictly conserved around the MIDAS motif, while the α3-α4 loop is divergent.
[Figure legend fragment, sequence alignment: ... (38). Lowercase letters denote a lack of structural similarity. Sequence identities with the α2-I domain are 26.7% (αM), 24.0% (αL), and 20.3% (vWF-A3). The αM and αL-I domains form a subfamily with 33.9% sequence identity. The α1-I domain sequence alignment is also shown; its structure has not been determined but is likely to be very similar to the α2-I domain.]
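The pairwise sequence identities quoted for the A/I domains (for example 26.7% between α2-I and αM-I) are percentages of matching residues over an alignment. A minimal Python sketch of one common way to compute such a figure is given below. The choice of denominator (columns where neither sequence has a gap), the case-insensitive comparison, and the function name are assumptions; the source does not state which convention was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumptions: both strings come from the same pairwise alignment (equal
    length), and letter case is ignored when comparing residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives slightly different percentages for the same alignment.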
The vWF-A1 and A3 domains have been shown to bind collagens type I and III. Binding to A3 is independent of cation, and the crystal structure of vWF-A3 does not contain a bound metal (24). In addition, mutation of the MIDAS residues does not affect collagen binding. The βE-α6 loop, which contains the C-helix insert in α2-I, is truncated in vWF-A3, so that it is even shorter than in αL-I and αM-I. The result is that the vestigial MIDAS face is quite featureless, suggesting that α2-I and vWF-A3 bind collagen in different ways.
Echovirus Binding-The α2β1 integrin is the receptor for the human pathogen echovirus-1 (4), and the α2-I domain binds directly to the virus (25). Virus binding is cation-independent (35), is not affected by mutations of the MIDAS motif (37), and does not require activation of the integrin (35). King et al. (37) have recently shown that residues 199-201 and 212-216 are involved in virus binding. These residues map to the loops flanking both ends of helix α3, forming part of a flat surface (~20 Å × 30 Å) at one end of the β-sheet, adjacent to but not overlapping the MIDAS face. This location is consistent with biochemical data suggesting that the collagen and echovirus-binding sites are distinct (35). Several monoclonal antibodies block both virus and collagen binding, supporting the idea that the binding sites are in close proximity (35). Part of the epitope for one such antibody, 5E8, has been mapped; it includes residue Tyr216 (36), on the loop between helix α3 and the MIDAS motif.
[Figure legend fragment, collagen docking model: ... (30) has been fitted into a groove on the MIDAS face, with a glutamate side chain from the collagen coordinating the metal ion. The I domain is colored according to surface charge distribution (blue positive, red negative, white neutral), as calculated with the program GRASP (39). Two orthogonal views are shown.]