The Structure of Integrin α1I Domain in Complex with a Collagen-mimetic Peptide

Background: Collagen-binding integrins bind differentially to different types of collagen.
Results: The solution structure of the integrin α1I domain in complex with a collagen-mimetic peptide was determined.
Conclusion: The integrin α1I domain binds collagen in a distinct orientation compared with α2I, but the signal transduction mechanisms appear to be conserved.
Significance: Understanding the collagen binding specificity of integrins might enable their selective modulation in disease.

We have determined the structure of the human integrin α1I domain bound to a triple-helical collagen peptide. The structure of the α1I-peptide complex was investigated using NMR, small angle x-ray scattering (SAXS), and size exclusion chromatography data, which were used to generate and validate a model of the complex with the data-driven docking program HADDOCK (High Ambiguity Driven Biomolecular Docking). The structure revealed that the α1I domain undergoes a major conformational change upon binding of the collagen peptide. This involves a large movement of the C-terminal helix of the αI domain, which has been suggested to be the mechanism by which signals are propagated in the intact integrin receptor. The structure suggests a basis for the different binding selectivity observed for the α1I and α2I domains. Mutational data identify residues that contribute to the observed conformational change. Furthermore, SAXS data suggest that at low collagen peptide concentrations the complex exists in equilibrium between 1:1 and 2:1 α1I-peptide complexes.

Integrins comprise a family of non-covalently associated heterodimeric cell surface receptors, each containing an α subunit and a β subunit. They mediate interactions between individual cells as well as interactions between cells and the extracellular matrix (ECM). These receptors transmit bidirectional signals across the cell membrane. In outside-in signaling, signals are triggered by the binding of ECM ligands to the extracellular domain of the integrin and propagate through the receptor to the intracellular domain, where they generate cellular responses. In contrast, inside-out signaling results from the binding of cytosolic molecules to the cytoplasmic domain of the integrin, which can regulate the affinity of the receptor for ligands in the ECM (1).
Four of the 24 integrins bind collagen and are categorized as the collagen-binding integrins. All four contain a β1 subunit in complex with an α1, α2, α10, or α11 subunit, forming the α1β1, α2β1, α10β1, and α11β1 receptors. These four integrins vary in tissue distribution and cellular signaling pathways, but they share a common role in mechanically supporting cell adhesion to collagen in the ECM and maintaining tissue integrity (2). Of these, α1β1 and α2β1 are the best characterized and have been reported to play important roles in cellular processes such as angiogenesis, the regulation of collagen expression by fibroblasts, T-cell activation, and collagen-induced platelet aggregation (2–4). They have also been implicated in disease processes, including inflammation and metastasis in certain cancers, making them attractive therapeutic targets (4–7).
All four collagen-binding integrins contain an inserted domain of ~200 residues, called the αI domain (αI), in the extracellular region of the α subunit. The αI domains contain the major binding site of these receptors for ECM ligands, and recombinant isolated αI domains recapitulate many of the ligand binding properties of the intact integrins (8,9). This makes the αI domains suitable models for studying specific interactions between integrins and ECM ligands. All αI domains adopt a Rossmann fold composed of five parallel β-strands and one antiparallel β-strand surrounded by six α-helices. The αI domain coordinates a divalent metal ion at a site termed the metal ion-dependent adhesion site (MIDAS), which forms part of the collagen-binding site. The metal-coordinating residues of the MIDAS are located in three loops on one surface of the αI domain (10,11).
The four collagen-binding integrins bind differentially to different types of collagen. For example, α1β1 has a higher binding affinity for collagen type IV than for collagen type I, whereas α2β1 preferentially binds collagen type I over collagen type IV (12–14). Synthetic peptides that mimic specific collagen sequences and contain repeats of the Gly-X-Y motif have been used widely to study integrin-collagen interactions. These peptides spontaneously self-assemble into the triple-helical conformation found in native collagen. Because they form homotrimers, they do not recapitulate the heterotrimeric structure of some collagens. Nonetheless, they have proved to be extremely valuable tools for probing the binding specificity and structure of collagen-binding proteins (15–18). Using such peptides, several specific recognition motifs have been identified for the α1β1 and α2β1 integrins (18–21). The recognition motifs are generally six-residue sequences flanked on either side by several GPO (glycine-proline-hydroxyproline) repeats. They include GFOGER (15), found in collagen type I, as well as GLOGEN and GROGER from collagen type III (18,21).
A crystal structure has been reported previously for the α2I domain (α2I) in complex with a triple-helical peptide containing the GFOGER motif (22). Comparison of the liganded and unliganded α2I structures revealed that significant conformational changes occur on binding (22). This structure provided the first insights into the activation mechanism of collagen-binding αI domains. The major structural changes observed on ligand binding include 1) a rearrangement at the MIDAS caused by a change in metal ion coordination involving a glutamate residue from the collagen peptide (GFOGER); 2) a 10-Å downward shift of the C-terminal helix (helix 7), which was proposed to interact with the β1 subunit of the receptor and propagate the signal toward the cytoplasmic domain (22–26); and 3) uncoiling of a one-and-a-half-turn helix (the C helix) that lies above the MIDAS in the unliganded state. The uncoiling was proposed to open up the MIDAS so that the collagen-mimetic peptide can bind. Ligand binding to α2I was accompanied by breakage of a salt bridge between Arg288, located in the C helix, and Glu318 in helix 7. Loss of this salt bridge was proposed to be partly responsible for the uncoiling of the C helix that accompanies peptide binding. A single point mutant at Glu318 (E318W) showed increased binding affinity for collagen, suggesting that this conformational change is involved in α2I activation (27,28).
Because of their sequence and structural homology and their shared ability to bind collagen, the α1I and α2I domains have been proposed to use a similar activation mechanism to transmit signals. However, no structure of an α1I-peptide complex has been available to test this hypothesis. The structure of unliganded α1I reveals a salt bridge between Arg287 and Glu317, and the structure of α1I containing the mutation E317A was reported in 2011 (29). The E317A mutant showed enhanced binding affinity for collagen, as had previously been observed following mutation of the equivalent residue in α2I (E318W), although its structure differed from that of α2I in complex with GFOGER. The C helix in α1I E317A uncoiled as in the α2I-GFOGER complex, but helix 7 was not displaced, and the metal ion at the MIDAS of α1I E317A was pentacoordinated, whereas the MIDAS in α2I-GFOGER was hexacoordinated. More recently, an NMR analysis of E317A in solution (30) showed evidence of significant conformational change in the mutant. In addition, the E317A mutation led to increased binding to collagen and increased activation of the intact α1β1 receptor. In contrast, the R287A mutation produced less pronounced effects on collagen binding and receptor activation, whereas a charge-reversed mutant, R287E/E317R, displayed a phenotype more similar to that of E317A. This led the authors to propose that 1) the role of Glu317 in stabilizing the low affinity conformation of α1I is largely independent of the salt bridge with Arg287 and may result from interactions with helix dipoles in the structure and 2) mutation of Glu317 may be sufficient to displace helix 7, in contrast to what was observed in the crystal structure (29). However, the NMR data demonstrated that the E317A mutant was conformationally flexible, and the crystal structure may represent one state of E317A.
Here, we report the solution structure of the human α1I domain in complex with a collagen-mimetic peptide. The triple-helical peptide contained a GLOGEN recognition sequence, which is a high affinity ligand of α1I (18). The binding orientation of the peptide differs from that observed for GFOGER binding to α2I, and a close comparison of the interfaces reveals differences that may account for their collagen binding preferences. Despite the difference in ligand binding orientation, the signaling mechanisms appear to be conserved. GLOGEN binding to α1I resulted in significant conformational changes, including uncoiling of the C helix, displacement of helix 7, and breakage of the Arg287-Glu317 salt bridge, consistent with what was observed for the α2I domain upon binding. The structure of the complex revealed the formation of a new salt bridge between Glu317 and Arg171 from helix 1. However, the R171A mutation had little effect on the structure of either the unliganded or the peptide-bound state, from which we infer that the Glu317-Arg171 salt bridge is not crucial for collagen binding or α1I activation. In contrast, the R287A and E317A mutations each produced significant conformational changes in the α1I domain. Analysis of the NMR data provided strong evidence that helix 7 was displaced in both mutants, suggesting that the salt bridge observed in the unliganded structure is important for maintaining the low affinity conformation. However, the two mutants showed different patterns of chemical shift perturbations (CSPs) in NMR spectra for residues at the collagen-binding site, suggesting that they affect the structure of the binding site differently. These differences may explain the different effects of the two mutations on collagen binding affinity. During the course of the investigation, we also identified and characterized a dimeric α1I-GLOGEN complex that supports the binding pose determined for the monomer.
These data together provided a more complete structural profile of the collagen-bound ␣1I domain that advances our understanding of signaling and specificity in collagen-binding integrins.

Production of α1I Domain
Uniformly 15N-labeled, 15N,13C-labeled, and 2H,15N,13C-labeled human α1I was prepared as described previously (31). For expression of unlabeled α1I, the same protocol was followed except that Luria broth (Sigma-Aldrich) was used instead of isotopically labeled minimal medium, and expression was induced with isopropyl 1-thio-β-D-galactopyranoside (1 mM; Astral Scientific, Australia) at an A600 of 0.6.

Synthesis of Collagen Peptide
A collagen peptide with the sequence Ac-GPOGPOGLOGENGPOGPOGPO-NH2 was synthesized and purified as described previously (32), except that Fmoc (N-(9-fluorenyl)methoxycarbonyl) Rink amide resin (Chem-Impex International) was used. The peptide is referred to hereafter as GLOGEN.
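The HADDOCK restraints described below cite peptide residues such as Leu9, Asn113, and Pro215. A minimal Python sketch of one numbering scheme consistent with those labels follows, assuming that the leading, middle, and trailing strands are offset by 0, 100, and 200 and that residue 1 of each strand is the N-terminal acetyl cap. The cap offset is inferred from the listed residue identities rather than stated in the text, and `decode_residue` is a hypothetical helper.

```python
from typing import Tuple

# Peptide sequence from the synthesis section, without the Ac- and -NH2 caps.
SEQUENCE = "GPOGPOGLOGENGPOGPOGPO"

# One-letter codes used in collagen peptides (O = hydroxyproline).
THREE_LETTER = {"G": "Gly", "P": "Pro", "O": "Hyp", "L": "Leu", "E": "Glu", "N": "Asn"}

# Assumed strand offsets: leading 1-99, middle 101-199, trailing 201-299.
STRANDS = {0: "leading", 100: "middle", 200: "trailing"}

# Assumption: residue 1 of each strand is the acetyl cap, so the first Gly is residue 2.
CAP_OFFSET = 1


def decode_residue(number: int) -> Tuple[str, str]:
    """Map a peptide residue number (e.g. 113) to (strand, residue type)."""
    strand_offset = (number // 100) * 100
    if strand_offset not in STRANDS:
        raise ValueError(f"no strand for residue {number}")
    index = number - strand_offset - CAP_OFFSET - 1  # 0-based index into SEQUENCE
    if not 0 <= index < len(SEQUENCE):
        raise ValueError(f"residue {number} lies outside the peptide sequence")
    return STRANDS[strand_offset], THREE_LETTER[SEQUENCE[index]]


for n in (9, 12, 13, 110, 114):
    print(n, decode_residue(n))
```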

Backbone Resonance Assignment
NMR experiments were conducted at 298 K on a Bruker Avance 800-MHz spectrometer equipped with a cryoprobe. The sample contained 15N,13C-labeled α1I (1.2 mM) and unlabeled GLOGEN peptide (2.4 mM) in NMR buffer (50 mM HEPES, pH 7.4, 50 mM NaCl, 5 mM MgCl2, and 10% 2H2O). Chemical shifts were referenced according to the method of Wishart et al. (33): 1H chemical shifts were referenced to the water peak, and 15N and 13C chemical shifts were referenced indirectly using the 15N/1H and 13C/1H gyromagnetic ratios. For the backbone resonance assignments, a two-dimensional 1H,15N HSQC spectrum and three-dimensional HNCA, HN(CO)CA, HNCACB, and HNCO spectra were acquired. Except where noted otherwise, spectra were processed using TopSpin (version 3.0, Bruker BioSpin), and analysis was carried out using SPARKY (34).
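With indirect referencing, only the 1H scale is tied to an observed signal (here, water); the 0-ppm frequencies for 15N and 13C are derived from it by fixed frequency ratios (Ξ). The sketch below illustrates the arithmetic, assuming the Ξ values recommended by Wishart et al. and IUPAC (0.251449530 for 13C relative to DSS and 0.101329118 for 15N relative to liquid NH3), together with a hypothetical water frequency and water shift. The function names are illustrative only. Because the water shift depends on temperature, it has to be chosen for the acquisition temperature.

```python
# Xi ratios: (0 ppm frequency of the heteronucleus) / (0 ppm frequency of 1H).
XI_13C = 0.251449530  # 13C, DSS reference
XI_15N = 0.101329118  # 15N, liquid NH3 reference


def zero_ppm_1h(water_freq_mhz: float, water_shift_ppm: float) -> float:
    """Absolute 1H frequency (MHz) corresponding to 0 ppm, from the water signal."""
    return water_freq_mhz / (1.0 + water_shift_ppm * 1e-6)


def zero_ppm_hetero(zero_1h_mhz: float, xi: float) -> float:
    """Absolute frequency (MHz) of 0 ppm for 13C or 15N, referenced indirectly."""
    return zero_1h_mhz * xi


def freq_to_ppm(freq_mhz: float, zero_mhz: float) -> float:
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6


def ppm_to_freq(shift_ppm: float, zero_mhz: float) -> float:
    return zero_mhz * (1.0 + shift_ppm * 1e-6)


# Hypothetical example: water observed at 800.130000 MHz, assumed to resonate at 4.77 ppm.
h0 = zero_ppm_1h(800.130000, 4.77)
n0 = zero_ppm_hetero(h0, XI_15N)
c0 = zero_ppm_hetero(h0, XI_13C)
print(f"0 ppm: 1H {h0:.6f} MHz, 15N {n0:.6f} MHz, 13C {c0:.6f} MHz")
```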

Titration of Wild-type (WT) α1I with GLOGEN Using Two-dimensional NMR
To observe GLOGEN binding to α1I, a series of two-dimensional 1H,15 15N NOE spectrum was acquired with 3 s of weak irradiation either at the center of the amide proton frequency range, to generate the heteronuclear NOE, or 10,000 Hz off-resonance, for the no-NOE control. Spectra were processed using NMRPipe (37), and the signal decay was analyzed and plotted using SPARKY (34). Theoretical T1/T2 ratios for different structural models were generated using the program HYDRONMR (38).
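The heteronuclear NOE is the ratio of peak intensities with and without 1H saturation, and T1 or T2 is obtained by fitting the decay of peak intensity against relaxation delay. The Python sketch below shows both calculations. It assumes a single-exponential decay to zero and uses a log-linear least-squares fit, a simplification of the nonlinear fitting performed by programs such as SPARKY; the delays and intensities are hypothetical.

```python
import math
from typing import Sequence, Tuple


def heteronuclear_noe(i_saturated: float, i_reference: float) -> float:
    """Steady-state 1H-15N NOE: intensity with 1H irradiation / off-resonance reference."""
    return i_saturated / i_reference


def fit_decay(delays: Sequence[float], intensities: Sequence[float]) -> Tuple[float, float]:
    """Fit I(t) = I0 * exp(-t / T) by linear least squares on ln I; returns (I0, T)."""
    if len(delays) != len(intensities) or len(delays) < 2:
        raise ValueError("need at least two (delay, intensity) pairs")
    ys = [math.log(i) for i in intensities]
    n = len(delays)
    mean_t = sum(delays) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in delays)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("intensities do not decay")
    return math.exp(mean_y - slope * mean_t), -1.0 / slope


# Hypothetical relaxation series for one amide (delays in seconds).
t1_delays = [0.02, 0.1, 0.3, 0.6, 1.0]
t1_data = [1000 * math.exp(-t / 1.2) for t in t1_delays]
t2_delays = [0.008, 0.016, 0.032, 0.048, 0.064]
t2_data = [1000 * math.exp(-t / 0.06) for t in t2_delays]

_, t1 = fit_decay(t1_delays, t1_data)
_, t2 = fit_decay(t2_delays, t2_data)
print(f"T1 = {t1:.3f} s, T2 = {t2:.3f} s, T1/T2 = {t1 / t2:.1f}")
print(f"NOE = {heteronuclear_noe(780.0, 1000.0):.2f}")
```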

Side-chain Assignments and NMR Structure Calculation
For the side-chain assignments and generation of structural restraints, a three-dimensional HBHA(CBCACO)NH spectrum was recorded on a Bruker Avance 500-MHz spectrometer fitted with a cryoprobe; three-dimensional 1H,1H NOESY-15N HSQC and 1H,1H NOESY-13C HSQC (aromatic) spectra were recorded on a Bruker Avance 800-MHz spectrometer fitted with a cryoprobe; and a three-dimensional 1H,1H NOESY-13C HSQC (aliphatic) spectrum was recorded on a Bruker Avance 900-MHz spectrometer fitted with a cryoprobe. All spectra were recorded at 298 K. For the NOESY spectra, the mixing time was 60 ms, and the 13C carrier frequency was 28 and 123.5 ppm for the aliphatic and aromatic 13C-edited NOESY spectra, respectively. CARA (Computer-aided Resonance Assignment) was used for spectral analysis. Automated structure calculation was performed using the UNIO-ATNOS/CANDID software package (39,40) in combination with the torsion angle dynamics program CYANA 3.0 (41), following the protocol described by Serrano et al. (42). Structures were initially calculated in the absence of a Mg2+ ion. The 20 conformers with the lowest target function values were analyzed to determine the site of Mg2+ binding. Based on the CSP pattern of α1I upon Mg2+ binding and the crystal structure of the α2I-collagen peptide complex, the Mg2+ ion was assigned to coordinate Ser152, Ser154, and Thr220 in our α1I structures. A Mg2+ ion was then introduced into the structures using a "pseudo-link" of 30 "pseudo-residues" extending from the C terminus. Structures were recalculated in the presence of the Mg2+ ion using UNIO-ATNOS/CANDID as described above, with the addition of backbone torsion angle restraints generated by TALOS+ (43). In addition, hydrogen bond restraints (two per bond) were added in regions of canonical secondary structure where a unique hydrogen bond donor-acceptor pair was evident from structural convergence.
The 20 structures with the lowest target function values with Mg2+ bound were then refined by Cartesian dynamics in CNS (44), after removal of the pseudo-link, to produce the final structures. The 20 lowest energy conformers with no NOE violations >0.25 Å, no bond violations >0.05 Å, and no improper or dihedral angle violations >5° were chosen to represent the solution structure of liganded α1I.
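The final selection step can be expressed as a filter followed by a ranking. This sketch assumes that violations are summarized per conformer as maximum values and that the filter is applied before ranking by energy. The `Conformer` record and the example values are hypothetical and do not represent CNS output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conformer:
    name: str
    energy: float               # final refinement energy (arbitrary units)
    max_noe_violation: float    # Å
    max_bond_violation: float   # Å
    max_angle_violation: float  # degrees (improper and dihedral)


def select_ensemble(conformers: List[Conformer], size: int = 20,
                    noe_cut: float = 0.25, bond_cut: float = 0.05,
                    angle_cut: float = 5.0) -> List[Conformer]:
    """Keep conformers with no violation above the cutoffs, then take the lowest-energy ones."""
    acceptable = [c for c in conformers
                  if c.max_noe_violation <= noe_cut
                  and c.max_bond_violation <= bond_cut
                  and c.max_angle_violation <= angle_cut]
    return sorted(acceptable, key=lambda c: c.energy)[:size]


# Hypothetical refinement output.
pool = [
    Conformer("model_01", -1520.0, 0.18, 0.01, 2.1),
    Conformer("model_02", -1610.0, 0.31, 0.01, 1.4),  # NOE violation above 0.25 Å
    Conformer("model_03", -1575.0, 0.22, 0.02, 4.8),
]
print([c.name for c in select_ensemble(pool, size=2)])
```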

HADDOCK Docking
The docking of collagen peptide to ␣1I was performed using the data-driven docking program HADDOCK (45,46).
α1I Template-The 20 lowest energy structures of α1I in its liganded state were used as the protein input templates. Restraints used in the original structure calculation (NOEs, hydrogen bonds, and dihedral angle restraints) were included in the docking process to constrain the protein in its liganded conformation. Regions of the protein with low T1/T2 ratios, low heteronuclear NOE values, and low angular order parameters (S2) predicted from chemical shifts were allowed to be fully flexible during docking. HADDOCK requires a set of ambiguous interaction restraints (AIRs) at the binding interface, divided into "active" and "passive" categories: active residues are those directly implicated in binding by experimental data, and passive residues are their near neighbors. Residues on α1I for which active AIRs were generated were selected based on NMR CSPs >0.5 ppm and a loss of NMR signal intensity >70% upon titration of α1I with GLOGEN. Residues in regions of the protein that showed fast time scale dynamics in the heteronuclear NOE experiment, and residues that were not solvent-accessible, were excluded.
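The selection of active residues combines several per-residue criteria, which the sketch below encodes. Only the 0.5-ppm CSP and 70% intensity-loss thresholds come from the text. The CSP weighting is not given, so a common combined 1H/15N form with a 15N scaling factor of 0.14 is assumed, and the heteronuclear NOE and relative solvent accessibility cutoffs used for exclusion (0.6 and 0.25) are hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Residue:
    number: int
    d_h: float       # 1H chemical shift change on binding (ppm)
    d_n: float       # 15N chemical shift change on binding (ppm)
    i_free: float    # peak intensity without peptide
    i_bound: float   # peak intensity with peptide
    het_noe: float   # 1H-15N heteronuclear NOE of the free protein
    rel_sasa: float  # relative solvent accessibility (0-1)


def weighted_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Combined 1H/15N shift change; alpha scales the 15N term (assumed weighting)."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def select_active(residues: List[Residue], csp_cut: float = 0.5, loss_cut: float = 0.70,
                  noe_cut: float = 0.6, sasa_cut: float = 0.25) -> List[int]:
    """Residue numbers passing the CSP and intensity-loss criteria and not excluded."""
    active = []
    for r in residues:
        csp = weighted_csp(r.d_h, r.d_n)
        loss = 1.0 - r.i_bound / r.i_free
        fast_dynamics = r.het_noe < noe_cut  # hypothetical exclusion threshold
        buried = r.rel_sasa < sasa_cut       # hypothetical exclusion threshold
        if csp > csp_cut and loss > loss_cut and not fast_dynamics and not buried:
            active.append(r.number)
    return active


# Hypothetical residues: one passes, one is too flexible, one shifts too little.
example = [
    Residue(153, 0.5, 3.0, 100.0, 20.0, 0.8, 0.5),
    Residue(285, 0.5, 3.0, 100.0, 20.0, 0.3, 0.5),
    Residue(200, 0.1, 0.5, 100.0, 20.0, 0.8, 0.5),
]
print(select_active(example))
```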
GLOGEN Peptide Template-For the peptide input template, the initial model was obtained by computationally modifying a crystal structure of a triple-helical collagen peptide (Protein Data Bank code 1Q7D) (47). The peptide in the crystal structure contains a GFOGER recognition motif, which was converted to GLOGEN using the mutagenesis function in PyMOL (The PyMOL Molecular Graphics System, Version 1.3, Schrödinger, LLC). Although the peptide is a homotrimer, each strand is in a unique chemical environment; the strands are referred to as the "leading," "middle," and "trailing" strands, as viewed from their N termini. To ensure that the triple-helical structure was maintained during docking, hydrogen bonds present in the crystal structure were included as input restraints. AIRs were also generated for the peptide based on previous studies showing that the glutamate residue in the GLOGEN motif is essential for binding. One of the three glutamate residues was known to coordinate the metal ion of α1I, but it was unclear whether any one of the glutamate residues in the trimeric peptide model is preferred over the other two. To predict whether α1I preferentially coordinates any one of the three strands, a preliminary docking was conducted with an ambiguous distance restraint set that allowed any of the three glutamate residues of the triple-helical peptide to coordinate the magnesium ion. Residues within 8 Å of a glutamate were defined as active, and residues between 8 and 12 Å from the glutamate were defined as passive. This length matches the diameter of the α1I binding interface and is likely to cover all residues on the peptide that could make direct interactions with α1I. Based on the lowest energy cluster from the preliminary docking, a unique peptide interface was defined, and seven active and eight passive residues were selected as the "optimized" peptide AIRs.
The active residues comprised Hyp10, Gly11, and Asn13 from the leading strand and Hyp110, Gly111, Asn113, and Gly114 from the middle strand. The passive residues comprised Gly8 and Leu9 from the leading strand; Gly108, Leu109, Pro115, and Hyp116 from the middle strand; and Pro215 and Pro216 from the trailing strand. The interaction between the leading-strand glutamate and the magnesium ion was assigned as an unambiguous distance restraint, because previous studies have confirmed that this glutamate is essential for binding (10). The docking process included a rigid body energy minimization step, which produced 1000 structures. The best 200 structures were subjected to semiflexible simulated annealing followed by a final low temperature flexible refinement in explicit water.
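The distance-based definition of peptide AIRs used in the preliminary docking can be sketched as follows. Each residue is represented here by a single hypothetical coordinate (the text does not specify whether the closest atom, Cα, or another point was used), and its distance to the nearest of the three glutamate anchors decides whether it is active (within 8 Å), passive (8 to 12 Å), or neither. `classify_peptide_residues` is an illustrative helper, not part of HADDOCK.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def classify_peptide_residues(coords: Dict[int, Point], glutamates: Sequence[Point],
                              active_cut: float = 8.0, passive_cut: float = 12.0
                              ) -> Tuple[List[int], List[int]]:
    """Split residues into active (<= active_cut) and passive (active_cut, passive_cut]
    by their distance to the nearest glutamate anchor."""
    active, passive = [], []
    for number, xyz in coords.items():
        d = min(math.dist(xyz, g) for g in glutamates)
        if d <= active_cut:
            active.append(number)
        elif d <= passive_cut:
            passive.append(number)
    return sorted(active), sorted(passive)


# Hypothetical coordinates (Å) for four residues and two glutamate anchors.
coords = {10: (0.0, 0.0, 5.0), 9: (0.0, 0.0, 10.0), 20: (0.0, 0.0, 30.0), 110: (0.0, 20.0, 0.0)}
anchors = [(0.0, 0.0, 0.0), (0.0, 25.0, 0.0)]
print(classify_peptide_residues(coords, anchors))
```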

Mutagenesis
Mutagenesis was performed using the QuikChange II mutagenesis kit (Agilent Technologies) according to the manufacturer's instructions. The pET28a plasmid containing the WT α1I insert was used as the template, with the following primers used to generate the R171A and E317A mutations: R171A: forward, 5′-AGCTTTTTTAAATGACCTTCTTGAAGCAATGGATATTGGTCCTAAACAGACA-3′; reverse, 5′-TGTCTGTTTAGGACCAATATCCATTGCTTCAAGAAGGTCATTTAAAAAAGCT-3′; E317A: forward, 5′-AAGCATTTCTTCAATGTCTCTGATGCCTTGGCTCTAGTCACCATTGTTAAA-3′; reverse, 5′-TTTAACAATGGTGACTAGAGCCAAGGCATCAGAGACATTGAAGAAATGCTT-3′. The presence of the mutations was verified by DNA sequencing. For the R287A mutant, a synthetic gene encoding α1I with the mutation in the pET28a plasmid was ordered from DNA2.0. The mutants were expressed in 15N-labeled minimal medium following the same protocol used for WT α1I.
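In the QuikChange design, the forward and reverse mutagenic primers are exact reverse complements of each other, which gives a quick consistency check on the sequences listed above. The sketch below performs that check on the joined sequences and also reports GC content; `reverse_complement` and `gc_percent` are illustrative helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Mutagenic primers from this section, written 5' -> 3'.
PRIMERS = {
    "R171A": ("AGCTTTTTTAAATGACCTTCTTGAAGCAATGGATATTGGTCCTAAACAGACA",
              "TGTCTGTTTAGGACCAATATCCATTGCTTCAAGAAGGTCATTTAAAAAAGCT"),
    "E317A": ("AAGCATTTCTTCAATGTCTCTGATGCCTTGGCTCTAGTCACCATTGTTAAA",
              "TTTAACAATGGTGACTAGAGCCAAGGCATCAGAGACATTGAAGAAATGCTT"),
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


for name, (fwd, rev) in PRIMERS.items():
    ok = reverse_complement(fwd) == rev
    print(f"{name}: {len(fwd)} nt, GC {gc_percent(fwd):.0f}%, complementary: {ok}")
```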

Analytical Size Exclusion Chromatography (SEC)
Analytical SEC was carried out using a Superdex 75 HR 10/30 column (GE Healthcare) with a bed volume of 23.6 ml on an ÄKTA™ purifier protein chromatography system. Samples (100 μl) containing α1I (10 μM) with or without GLOGEN (20 μM) were applied to the column equilibrated with 50 mM HEPES buffer, pH 7.4, 50 mM NaCl, and 5 mM MgCl2. A flow rate of 0.7 ml/min was maintained, and elution was monitored by continuous measurement of UV absorbance at 280 nm (A280). The runs were conducted at room temperature.
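The text does not state how elution volumes were interpreted, but one common way to estimate an apparent molecular mass from analytical SEC is to calibrate log10(MW) against the partition coefficient Kav = (Ve − V0)/(Vt − V0). The sketch below assumes that approach. It takes Vt as the 23.6-ml bed volume and 0.7 ml/min as the flow rate from this section, and uses a hypothetical void volume and hypothetical calibration standards.

```python
import math
from typing import List, Tuple

BED_VOLUME_ML = 23.6    # Vt, from the column description above
VOID_VOLUME_ML = 8.0    # V0: hypothetical value, normally measured with a large marker
FLOW_ML_PER_MIN = 0.7


def elution_volume(retention_min: float, flow: float = FLOW_ML_PER_MIN) -> float:
    return retention_min * flow


def k_av(ve: float, v0: float = VOID_VOLUME_ML, vt: float = BED_VOLUME_ML) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(MW) = a + b * Kav from (Ve in ml, MW in kDa) pairs."""
    xs = [k_av(ve) for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def apparent_mw(ve: float, a: float, b: float) -> float:
    return 10 ** (a + b * k_av(ve))


# Hypothetical calibration standards: (elution volume in ml, MW in kDa).
standards = [(9.6, 150.0), (11.5, 66.0), (13.4, 29.0), (15.3, 13.7)]
a, b = fit_calibration(standards)
ve = elution_volume(17.0)  # hypothetical 17-min retention time
print(f"Ve = {ve:.1f} ml, Kav = {k_av(ve):.3f}, apparent MW ~ {apparent_mw(ve, a, b):.0f} kDa")
```

Such calibrations assume globular proteins, so a rod-like triple-helical peptide or an elongated complex can elute earlier than its mass alone would predict; masses obtained this way are best treated as apparent values.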

SEC-SAXS
All SAXS data were acquired at the Australian Synchrotron SAXS/WAXS beamline; the data collection and scattering-derived parameters are described in Table 2 according to the recommendations of the International Union of Crystallography Commission on Small-Angle Scattering (48). The SAXS experiments were set up in line with gel filtration chromatography as described by Gunn et al. (49). Samples (50 μl) contain- The runs were carried out at room temperature, and the flow rate was 0.2 ml/min. SAXS15ID software was used to process the detector images as averages of 10 sequential 2-s exposures, and the data were converted to individual I(q) SAXS profiles. The scattering intensity (I) was collected over the momentum transfer (q) range of 0.011–0.620 Å⁻¹, where $q = 4\pi\sin\theta/\lambda$, $2\theta$ is the scattering angle, and $\lambda$ is the x-ray wavelength (1.0332 Å). For the final data sets, 367 and 357 data points were extracted within the ranges of 0.01–0.5 and 0.02–0.5 Å⁻¹ from the original unliganded and GLOGEN-bound α1I data sets, respectively. The SAXS profiles were analyzed using the ATSAS program suite (50). The radius of gyration (Rg) was estimated by Guinier analysis using AUTORG. The maximum dimension of the scattering particles was estimated from the pair distance distribution function, P(r), using AUTOGNOM. The volume and mass of the scattering particles were estimated using AUTOPOROD, and the ab initio shapes of the scattering molecules were estimated using DAMMIF. Fitted models of the dimeric complex were constructed using PyMOL: the template monomeric complexes (the α1I-GLOGEN HADDOCK structure and the α2I-GFOGER crystal structure (22)) were first duplicated, and the two copies were aligned based on the six-residue recognition motif, GLOGEN or GFOGER, on different strands of the triple-helical peptides. Theoretical SAXS profiles of the models were generated and fitted to the experimental SAXS curve using CRYSOL (51).
The statistical analysis of goodness of fit was performed as described by Mills et al. (52). The program OLIGOMER (53) was used to compute the proportions of monomer and dimer in both α1I and the α1I-GLOGEN complex.
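The sketch below illustrates three calculations named in this section under stated simplifications: computing q from the scattering angle with the 1.0332-Å wavelength; a Guinier fit of ln I against q² for Rg and I(0), which AUTORG automates together with the choice of fitting range; and a weighted two-component least-squares fit of an experimental curve to monomer and dimer curves, in the spirit of OLIGOMER. It is not the ATSAS implementation. The function names and the q·Rg ≤ 1.3 guideline are assumptions, and the returned fractions are relative contributions of the supplied curves, whose interpretation as volume fractions depends on how those curves are normalized.

```python
import math
from typing import Sequence, Tuple

WAVELENGTH_A = 1.0332  # x-ray wavelength (Å)


def q_from_two_theta(two_theta_rad: float, wavelength: float = WAVELENGTH_A) -> float:
    """Momentum transfer q = 4*pi*sin(theta)/lambda, in Å^-1."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength


def guinier_fit(q: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float]:
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 over the supplied points; returns (I0, Rg).
    The caller should restrict the points so that q_max * Rg stays below ~1.3."""
    xs = [qi * qi for qi in q]
    ys = [math.log(i) for i in intensity]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    if slope >= 0:
        raise ValueError("no Guinier decay in this q range")
    return math.exp(my - slope * mx), math.sqrt(-3.0 * slope)


def two_component_fractions(i_exp, sigma, i_mono, i_dimer):
    """Weighted least-squares fit I_exp ~ c1*I_mono + c2*I_dimer with c1, c2 >= 0.
    Returns (fraction_mono, fraction_dimer, reduced chi^2)."""
    w = [1.0 / s ** 2 for s in sigma]
    s11 = sum(wi * a * a for wi, a in zip(w, i_mono))
    s22 = sum(wi * b * b for wi, b in zip(w, i_dimer))
    s12 = sum(wi * a * b for wi, a, b in zip(w, i_mono, i_dimer))
    y1 = sum(wi * y * a for wi, y, a in zip(w, i_exp, i_mono))
    y2 = sum(wi * y * b for wi, y, b in zip(w, i_exp, i_dimer))
    det = s11 * s22 - s12 * s12
    c1 = (y1 * s22 - y2 * s12) / det
    c2 = (y2 * s11 - y1 * s12) / det
    if c1 < 0:    # clamp to the dimer-only solution
        c1, c2 = 0.0, y2 / s22
    elif c2 < 0:  # clamp to the monomer-only solution
        c1, c2 = y1 / s11, 0.0
    resid = sum(wi * (y - c1 * a - c2 * b) ** 2
                for wi, y, a, b in zip(w, i_exp, i_mono, i_dimer))
    dof = max(len(i_exp) - 2, 1)
    total = c1 + c2
    return c1 / total, c2 / total, resid / dof


# Hypothetical check on synthetic data: Rg = 20 Å, I0 = 100.
q_vals = [0.005 + 0.005 * k for k in range(11)]  # q_max * Rg = 1.1
i_vals = [100.0 * math.exp(-(20.0 ** 2) * qv ** 2 / 3.0) for qv in q_vals]
i0, rg = guinier_fit(q_vals, i_vals)
print(f"I0 = {i0:.1f}, Rg = {rg:.2f} Å, qmax*Rg = {q_vals[-1] * rg:.2f}")
```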

RESULTS
GLOGEN Binding to α1I Observed by NMR-The peptide used in this study contained the integrin recognition sequence GLOGEN flanked by five GPO repeats. Its ability to self-assemble into the collagen-like triple-helical conformation was confirmed using circular dichroism (data not shown). The GLOGEN motif is the most potent ligand for α1β1 among the fibrillar collagen sequences, and it inhibits the binding of α1β1 to collagen type IV with an IC50 of ~3 μM (21). Comparison of two-dimensional 1H,15N TROSY spectra of the α1I domain in the presence and absence of the GLOGEN peptide revealed extensive CSPs upon addition of excess peptide (Fig. 1). Gradual titration of α1I with the peptide revealed that at low GLOGEN:α1I ratios (<1:1), the two-dimensional 1H,15N SOFAST HMQC spectra of α1I exhibited severe line broadening for most peaks (Fig. 1C). Peak intensities increased as the peptide concentration exceeded that of α1I, and at a GLOGEN:α1I ratio of ~3:1, most of the signals were recovered (Fig. 1D). Similar broadening has previously been reported upon titration of the α2I domain with a peptide containing the GFOGER recognition motif (54). Such broadening

T1/T2 Analysis of the Stoichiometry of the Complex-The oligomeric state of the α1I-GLOGEN complex was assessed by NMR, by measuring the heteronuclear 1H,15N T1/T2 relaxation ratios for the α1I domain in the absence and presence of a 2-fold excess of GLOGEN. The experimental data were compared with theoretical data generated using HYDRONMR (38) for the isolated α1I domain (Protein Data Bank code 1QCY (55)), the isolated α2I domain (Protein Data Bank code 1AOX (56)), the monomeric GFOGER-α2I complex (Protein Data Bank code 1DZI (22)), and a model consisting of two αI domains bound to one triple-helical peptide (Fig. 2). The model of the 2:1 complex was generated manually by docking a second α2I molecule onto the structure of the GFOGER-α2I complex.
The theoretical T1/T2 ratios for the 2:1 complex were significantly higher than those for the other models owing to its size and shape. The fluctuation in the T1/T2 values reflects the orientation of the amide N-H bond vectors with respect to the principal axis frame of the rotational diffusion tensor used in the calculation. Such variation is typical of elongated ellipsoidal proteins such as the 2:1 complex. The global T1/T2 ratio measured for the α1I-GLOGEN complex agrees well with the theoretical data for the monomeric complex and is consistent with the formation of a 1:1 complex between the α1I domain and triple-helical GLOGEN at high GLOGEN:α1I ratios.
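For a rigid molecule tumbling isotropically, a global T1/T2 ratio can be converted into an approximate rotational correlation time with the widely used relation τc ≈ √(6·T1/T2 − 7)/(4πνN), where νN is the 15N Larmor frequency in Hz. The sketch below applies it for 800-MHz data, assuming an approximate 15N/1H gyromagnetic ratio magnitude of 0.1014; the T1/T2 values are hypothetical. This is an illustration of the size dependence, not the HYDRONMR calculation used in the study.

```python
import math

GAMMA_RATIO_15N_1H = 0.1014  # |gamma(15N) / gamma(1H)|, approximate


def n15_larmor_hz(proton_mhz: float) -> float:
    """Approximate 15N Larmor frequency (Hz) at a given 1H spectrometer frequency."""
    return proton_mhz * 1e6 * GAMMA_RATIO_15N_1H


def tau_c_from_ratio(t1_over_t2: float, nu_n_hz: float) -> float:
    """Isotropic correlation time (s): tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N)."""
    arg = 6.0 * t1_over_t2 - 7.0
    if arg <= 0:
        raise ValueError("T1/T2 too small for this approximation")
    return math.sqrt(arg) / (4.0 * math.pi * nu_n_hz)


nu_n = n15_larmor_hz(800.0)
for ratio in (15.0, 20.0, 30.0):  # hypothetical global T1/T2 values
    print(f"T1/T2 = {ratio:4.1f} -> tau_c ~ {tau_c_from_ratio(ratio, nu_n) * 1e9:.1f} ns")
```

For an elongated species such as the 2:1 complex, anisotropic tumbling makes T1/T2 depend on bond-vector orientation, which is why shape-aware predictions such as HYDRONMR are needed for a meaningful comparison.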
Structure of the α1I Domain Bound to GLOGEN-The structure of the α1I domain bound to GLOGEN was determined using a sample containing 15N,13C-labeled α1I (1.2 mM) in the presence of Mg2+ (5 mM) and unlabeled GLOGEN (2.4 mM). The pattern of CSPs observed upon addition of Mg2+ to α1I was consistent with the position of the metal ion in the crystal structure of the GFOGER-α2I complex (31). The coordination of Mg2+ in our structure was therefore assumed to be similar to that in the α2I complex structure. An ensemble of 20 conformers representing the structure of α1I bound to GLOGEN (from a total of 40 structures calculated) is shown in Fig. 3. Structural statistics for the ensemble are shown in Table 1. The structure adopts a typical Rossmann fold consisting of five parallel β-strands and one antiparallel β-strand surrounded by six α-helices. Comparison of the lowest energy model with the unliganded structure of α1I (Fig. 4) shows that the central β-sheet core remains similar, with a root mean square deviation of 1.1 Å over the Cα atoms. Major conformational differences were identified in two regions. The first is the C helix of the unliganded state, which becomes completely unfolded upon peptide binding. Unwinding of the short C helix extends the βE-α6 loop (residues 281–292). Very few NOEs could be assigned in the βE-α6 loop in NOESY spectra of the complex. The TALOS+-predicted angular order parameters (S2) (43) showed a clear decrease in predicted order in the βE-α6 loop of α1I upon binding to GLOGEN (Fig. 4A). This is supported by a comparison of the 1H,15N heteronuclear NOE values for α1I unliganded and in complex with GLOGEN, which decreased upon peptide binding, consistent with increased dynamics in this region in the bound state (Fig. 4B). These findings are consistent with the lower precision observed in this region of the structure of α1I (Fig. 3A).
The second region undergoing a major conformational change is the C-terminal helix (helix 7), which is displaced downward by 12 Å upon GLOGEN binding relative to its position in the unliganded structure. In the unliganded state, helix 7 is linked to the C helix by a salt bridge between Arg287 and Glu317. This salt bridge is broken in the complex with GLOGEN. In its new position in the complex, Glu317 is positioned to form a salt bridge with Arg171 from helix 1 (Fig. 4F). To investigate the importance of these salt bridges in regulating the conformational changes, single point mutants with R171A, R287A, and E317A substitutions were studied (Fig. 5).
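The geometric quantities discussed in the last two paragraphs (a Cα RMSD, the displacement of helix 7, and the presence or absence of a salt bridge) can be computed as sketched below. The sketch assumes that the two structures have already been superposed, for example on the β-sheet core, which it does not do itself. It uses a 4.0-Å distance between charged side-chain atoms as the salt-bridge criterion, a common convention rather than one stated in the text. All coordinates are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]

# Charged side-chain atoms usually considered for salt bridges (PDB atom names).
CHARGED_ATOMS = {
    "ASP": ("OD1", "OD2"),
    "GLU": ("OE1", "OE2"),
    "ARG": ("NE", "NH1", "NH2"),
    "LYS": ("NZ",),
}


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD between matched atoms of two structures that are ALREADY superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def centroid_shift(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance moved by the centroid of a segment (e.g. helix 7) between two superposed states."""
    return math.dist(centroid(a), centroid(b))


def salt_bridge(res1: str, atoms1: Dict[str, Point], res2: str, atoms2: Dict[str, Point],
                cutoff: float = 4.0) -> Tuple[bool, float]:
    """Shortest distance between the charged side-chain atoms of two residues,
    and whether it is within the cutoff."""
    pts1 = [xyz for name, xyz in atoms1.items() if name in CHARGED_ATOMS[res1]]
    pts2 = [xyz for name, xyz in atoms2.items() if name in CHARGED_ATOMS[res2]]
    d = min(math.dist(p, q) for p, q in product(pts1, pts2))
    return d <= cutoff, d


# Hypothetical Glu/Arg side-chain coordinates (Å).
glu = {"CD": (0.0, 0.0, 0.0), "OE1": (0.6, 1.0, 0.0), "OE2": (0.6, -1.0, 0.0)}
arg = {"NE": (5.8, 0.0, 0.0), "NH1": (3.7, 1.0, 0.0), "NH2": (3.7, -1.0, 0.0)}
print(salt_bridge("GLU", glu, "ARG", arg))
```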
The 1H,15N TROSY spectra of the R171A mutant (275 μM) in the absence and presence of GLOGEN (500 μM) were highly similar to the spectra of the unliganded and peptide-bound states of WT α1I, respectively. In each case, chemical shift changes were observed only for residues close to the mutation site (Fig. 5, A and B), suggesting that the mutation had little impact on the structure of either the unliganded or the peptide-bound state of α1I.
In contrast, the 1H,15N TROSY spectrum of R287A (200 μM) in the absence of GLOGEN showed substantial differences from the spectrum of unliganded WT α1I. Comparison with the spectrum of GLOGEN-bound WT α1I, however, showed a much higher degree of similarity (Fig. 5, C and D). In particular, a strong peak in the spectrum of unliganded R287A coincides with the peak corresponding to the C-terminal residue, Ile331, in GLOGEN-bound WT α1I (Fig. 5G). The chemical shift and intensity of this peak suggest that this cross-peak in the spectrum of the R287A mutant also represents the C-terminal residue, implying that helix 7 is displaced in the R287A mutant in the absence of peptide. Addition of GLOGEN (400 μM) to R287A induced relatively minor chemical shift changes, and the resulting spectrum also resembles that of GLOGEN-bound WT α1I. Using the 15N,1H assignments of GLOGEN-bound WT α1I, we identified the residues most strongly perturbed by the R287A mutation in the bound state. The majority of perturbed residues are located close to the mutation site or to residues within or adjacent to the MIDAS (Fig. 6).

FIGURE 4. Effects of GLOGEN binding on α1I. A, TALOS+-predicted S2 values for unliganded (red) and GLOGEN-bound (black) α1I based on the NMR chemical shifts. S2, the angular order parameter, describes the rigidity of a residue as a measure of its backbone conformational entropy; it ranges from 0 for a completely disordered residue to 1 for a rigid residue. Significant drops in the βE-α6 loop (uncoiled from the C helix) and at the C terminus in the bound state indicate a gain in flexibility in these regions of α1I upon GLOGEN binding. B, 1H,15N NOE measurements for unliganded (red) and GLOGEN-bound (black) α1I. The low values for residues in the βE-α6 loop and at the C terminus are consistent with increased flexibility in the presence of GLOGEN. C, percentage of signal intensity lost in the 1H,15N TROSY spectrum as a consequence of peptide binding. Residues with cross-peaks that lost more than 70% intensity are highlighted in red. D, weighted chemical shift perturbations measured in the TROSY spectra. The regions most significantly influenced by peptide binding are highlighted with gray bars across the other panels; they include the three MIDAS loops, the βE-α6 loop, and helix 7, which is distant from the MIDAS but was displaced downward by 12 Å in the structure. E and F, aligned structures of unliganded (red; Protein Data Bank code 1QCY) and GLOGEN-bound (black) α1I domain with these regions highlighted. E shows a zoomed view of the C helix, which is uncoiled in the liganded structure. The spheres represent the Mg2+ ions. In F, the salt bridge formed by Glu317 with Arg287 in the unliganded state (red) or with Arg171 in the liganded state (black) is indicated by a black dotted line.

The spectrum of the unliganded R287A mutant showed a greater degree of similarity to the peptide-bound state of WT α1I, suggesting that the unbound state of the R287A mutant adopts a conformation more similar to the activated state of WT α1I. E, unliganded WT (red) and E317A (blue) α1I. F, GLOGEN-bound WT α1I (black) and unliganded E317A α1I (blue). The spectrum of unliganded E317A was also more similar to the spectrum of peptide-bound WT α1I, suggesting that E317A also adopts a similar conformation. G and H, zoomed regions of the TROSY spectra highlighting residue Ile331 in unliganded WT α1I (red), GLOGEN-bound WT α1I (black), and unliganded R287A (blue) (G) or E317A (blue) (H). NMR spectra were acquired at 800 MHz and 293 K.

DECEMBER 27, 2013 • VOLUME 288 • NUMBER 52

JOURNAL OF BIOLOGICAL CHEMISTRY 36803
As with R287A, the ¹H,¹⁵N TROSY spectrum of E317A in the absence of GLOGEN differed significantly from that of unliganded WT α1I and appeared more similar to that of GLOGEN-bound WT α1I (Fig. 5, E and F). The spectrum of E317A also contains a strong peak whose chemical shift coincides with that of the C-terminal residue (Ile331) in the GLOGEN-bound spectrum of WT α1I, suggesting that helix 7 in E317A is displaced (Fig. 5H). Addition of GLOGEN (400 μM) to E317A caused relatively minor changes to the spectrum. However, the pattern of perturbations relative to WT α1I differed from that observed with R287A. Fewer residues adjacent to the MIDAS were perturbed in the spectrum of E317A; instead, most of the perturbed residues were located in helix 1, the βE-α6 loop (the uncoiled C helix), and strand βF (Fig. 6).
Mapping the Binding Interface-The CSPs and chemical exchange broadening observed in the two-dimensional ¹H,¹⁵N TROSY spectrum of WT α1I upon GLOGEN binding were used to map the binding interface of the complex. CSPs and exchange broadening arise for residues whose chemical environment changes upon binding, which can result both from direct interaction with the ligand and from indirect effects such as conformational change. The analysis revealed that 22 residues lost more than 70% of their signal intensity on peptide binding. These residues were mostly located in the three MIDAS loops, loop βE-α6, and helix 6. CSPs, on the other hand, mapped both to the same regions and to residues in helix 7 (Fig. 4, C and D).
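Both NMR criteria used here can be computed per residue from matched peak lists of the free and bound spectra. The following is a minimal Python sketch, not the authors' processing pipeline. It assumes the commonly used weighting Δδ = sqrt(ΔδH² + (0.14·ΔδN)²) for combining ¹H and ¹⁵N shifts (the exact weighting behind Fig. 4D is not stated in this section), assumes the two spectra are scaled comparably, and scores a peak that disappears entirely as a 100% intensity loss. The `Peak` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Peak:
    """One assigned 1H,15N cross-peak: shifts in ppm, intensity in arbitrary units."""
    h: float
    n: float
    intensity: float

def weighted_csp(free: Peak, bound: Peak, n_scale: float = 0.14) -> float:
    """Combined 1H/15N shift change; n_scale (assumed 0.14) down-weights the wider 15N range."""
    d_h = bound.h - free.h
    d_n = bound.n - free.n
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)

def intensity_loss_percent(free: Peak, bound: Peak) -> float:
    """Percentage of cross-peak intensity lost on binding (100 = peak vanished)."""
    return 100.0 * (1.0 - bound.intensity / free.intensity)

def map_interface(free: dict, bound: dict, loss_cutoff: float = 70.0):
    """Per-residue (CSP, % loss) and the residues losing more than loss_cutoff percent.

    A residue with no peak in the bound spectrum (broadened beyond detection)
    is scored as a 100% loss with an undefined CSP (None).
    """
    results = {}
    for res, p_free in free.items():
        p_bound = bound.get(res)
        if p_bound is None:
            results[res] = (None, 100.0)
        else:
            results[res] = (weighted_csp(p_free, p_bound),
                            intensity_loss_percent(p_free, p_bound))
    broadened = sorted(res for res, (_, loss) in results.items() if loss > loss_cutoff)
    return results, broadened
```

Keying both dictionaries by residue label keeps the comparison tied to the assignments, so residues whose peaks vanish on binding are still reported rather than silently dropped.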
Structure of the α1I-GLOGEN Complex-A structure of the α1I-GLOGEN complex was generated using the data-driven docking program HADDOCK (45, 46) (Protein Data Bank code 2M32). Based on the selection criteria described above, residues in helix 7 were excluded from the definition of the binding interface: they showed no broadening, so their CSPs were attributed to the conformational change in this region rather than to a direct interaction with GLOGEN. Residues 281-292 of the βE-α6 loop were also excluded because they showed fast time scale dynamics in the heteronuclear NOE experiment and were therefore deemed disordered in the bound state. The residues of α1I defined as active for the HADDOCK calculation comprised the surface-exposed residues with CSP > 0.5 ppm as well as residues with a loss of signal intensity > 70%, namely Ser154, Ile155, Tyr156, Arg218, Gln219, Glu255, and His257. Only Asn153 satisfied the criteria for a passive residue. These ambiguous interaction restraints (AIRs) were used as input for the HADDOCK calculation; no intermolecular NOEs were included. Analysis of the 200 final structures by HADDOCK score (45, 46) produced a cluster of 10 structures, all of which were among the 20 best-scoring structures. This cluster was selected to represent the structure of the triple-helical GLOGEN peptide bound to α1I (Fig. 7), and its lowest energy member was taken as the representative model of the complex. It shows the peptide binding along a "trench" above the MIDAS (Fig. 7C). Consistent with the α2I-GFOGER complex crystal structure (22), α1I binds exclusively to the leading (cyan) and middle (orange) strands (Fig. 7F); the trailing strand (green) makes no interaction with α1I. The binding orientation of the peptide, however, differs from the α2I crystal structure, in which GFOGER sits along the edge of the trench (Fig. 7, compare C and D) (22). A more detailed discussion and comparison of the two structures is presented below.
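The active-residue selection described above amounts to a simple filter over per-residue data. This sketch assumes weighted CSPs (ppm), intensity losses (%), and relative solvent accessibilities (0 to 1, from any solvent accessibility calculation) are already available. The 15% accessibility cutoff used as the definition of "surface-exposed" is an assumption, as is applying it to both criteria (the text can be read either way). The excluded ranges are the ones named in the text: residues 281-292 and helix 7 (Glu317-Glu329). Passive-residue selection is not reproduced because its criteria are not given in this section; `select_active` is a hypothetical helper, not part of HADDOCK.

```python
# Residues the text excludes from the interface: the flexible βE-α6 loop
# (281-292) and helix 7 (Glu317-Glu329, as numbered in the Discussion).
EXCLUDED = set(range(281, 293)) | set(range(317, 330))

def select_active(csp, loss, rel_sasa, excluded=EXCLUDED,
                  csp_cutoff=0.5, loss_cutoff=70.0, sasa_cutoff=0.15):
    """Residue numbers passing the CSP or intensity-loss criterion.

    csp: {residue: weighted CSP in ppm}; loss: {residue: % intensity lost};
    rel_sasa: {residue: relative solvent accessibility, 0-1}. The 0.15
    accessibility cutoff is an assumed definition of "surface-exposed".
    """
    active = []
    for res in sorted(set(csp) | set(loss)):
        if res in excluded:
            continue  # attributed to conformational change or disorder, not contact
        if rel_sasa.get(res, 0.0) <= sasa_cutoff:
            continue  # buried residues are unlikely to contact the peptide directly
        if csp.get(res, 0.0) > csp_cutoff or loss.get(res, 0.0) > loss_cutoff:
            active.append(res)
    return active
```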
Analysis of the Complex at Low GLOGEN:α1I Ratios-Analytical SEC and SAXS were used to investigate potential causes of the broadening observed in the ¹H,¹⁵N TROSY spectrum of α1I at lower peptide concentrations. The SEC elution profile of a sample containing α1I and GLOGEN peptide showed two distinct peaks. The first eluted at a volume similar to unliganded α1I (11.62 ml), whereas the second, inferred to correspond to the α1I-peptide complex, eluted considerably earlier (10.02 ml) (Fig. 8). An elution volume more than 1.5 ml smaller indicated that the complex had a much higher molecular mass than the unliganded protein. Comparison with the elution volumes of proteins of known molecular mass on the same column (footnote 3) suggested that this peak corresponds to a species much larger than the expected 1:1 GLOGEN-bound α1I species (molecular mass, 27.4 kDa).
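Mass estimates from analytical SEC usually rely on an approximately linear relation between log(molecular mass) and elution volume for globular calibration standards. The sketch below assumes such a log-linear calibration; the elution volumes of the albumin and ovalbumin standards on this column are not given here, so the function takes the standards as input rather than hard-coding values. Because SEC separates by hydrodynamic size rather than mass, an elongated species such as a rigid triple helix elutes earlier than a globular protein of the same mass, so such an estimate can overstate the mass of a peptide-containing complex. `mass_from_elution` is a hypothetical helper.

```python
import math

def mass_from_elution(ve, standards):
    """Estimate molecular mass (kDa) from elution volume (ml) by log-linear calibration.

    standards: list of (elution_volume_ml, mass_kDa) for globular reference proteins.
    Fits log10(mass) = a + b * Ve by ordinary least squares and evaluates it at ve.
    """
    if len(standards) < 2:
        raise ValueError("need at least two calibration standards")
    xs = [v for v, _ in standards]
    ys = [math.log10(m) for _, m in standards]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx          # negative: larger species elute earlier
    intercept = y_mean - slope * x_mean
    return 10 ** (intercept + slope * ve)
```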
Synchrotron SAXS data were acquired for α1I in the absence and presence of GLOGEN (Fig. 9 and Table 2), with the SAXS measurements performed in line with SEC. The sample containing the mixture of α1I and GLOGEN eluted earlier than the unliganded α1I sample, indicating a larger species in the mixture. Porod analysis of the scattering data gave an estimated molecular mass of 40.7 ± 0.5 kDa for the α1I-GLOGEN complex and 19.2 ± 0.2 kDa for unliganded α1I. Although SAXS-derived parameters such as Rg and volume were stable across much of the peak (varying by less than 2%) and consistent with a complex containing two α1I molecules, the small bed volume column did not have sufficient resolution to separate the larger species cleanly from later eluting species such as the 1:1 α1I-GLOGEN complex or free α1I. Indeed, singular value decomposition analysis using SvdPlot in PRIMUS (53) of the entire elution, comprising 11 SAXS data sets from across the peak, suggested a mixture of at least three species.
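Singular value decomposition estimates how many independent scattering components are needed to describe a series of frames: frames from a single species form a rank-1 matrix, so each additional significant singular value suggests another species. The sketch below is illustrative only and is not SvdPlot. It counts singular values above a relative threshold, and the threshold, the Guinier-type synthetic profiles I(q) = exp(-(q·Rg)²/3), the radii, and the noise level are all assumptions chosen for the demonstration; with real data, significance has to be judged against the actual noise level.

```python
import numpy as np

def count_components(curves: np.ndarray, rel_threshold: float = 1e-4) -> int:
    """Number of singular values above rel_threshold times the largest one.

    curves: array of shape (n_frames, n_q), one SAXS profile per row.
    """
    s = np.linalg.svd(curves, compute_uv=False)
    return int(np.sum(s > rel_threshold * s[0]))

# Synthetic example: 11 frames across an elution peak, each a different mix
# of three hypothetical species with Guinier-type profiles.
rng = np.random.default_rng(0)
q = np.linspace(0.005, 0.3, 200)
basis = np.stack([np.exp(-(q * rg) ** 2 / 3.0) for rg in (15.0, 30.0, 60.0)])
weights = rng.uniform(0.1, 1.0, size=(11, 3))        # per-frame amounts of each species
frames = weights @ basis + rng.normal(scale=1e-6, size=(11, q.size))
n_species = count_components(frames)
```

In this synthetic case the count recovers the three species used to build the frames; a single-species series gives one significant singular value.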
The Rg values, distance distribution functions P(r), and calculated maximum dimensions (Dmax) (Fig. 9) also suggested the presence of a GLOGEN-bound α1I species that was considerably larger and more elongated than either unliganded α1I or a putative 1:1 α1I-GLOGEN complex. We therefore postulated that a complex of two α1I molecules bound to a single homotrimeric peptide may have formed. In principle, the α1I domain can bind to any two of the three strands of the peptide, giving three different combinations of two α1I domains bound to one GLOGEN triple helix (2:1 complex). These three models were built using the lowest energy HADDOCK model as a template. Two of the three combinations showed significant steric clashes and were excluded from further investigation (data not shown). Although the SAXS data were likely to come from a mixture, the Porod mass estimate suggested that the volume fraction of any smaller species was reasonably small (~10%). Ab initio shape reconstruction from the SAXS data was therefore performed, and an average-filtered shape envelope was generated. Although the normalized spatial discrepancy (NSD) of the shape envelopes was low (0.711 ± 0.067), the DAMMIF reconstructions gave a relatively high reduced χ² statistic (χν²) of 2.13 ± 0.02. This appears to be largely due to the detector count statistics of the synchrotron SAXS data underestimating the true uncertainty, but it may also reflect the inability of a single shape to describe adequately the scattering from a mixture.

FIGURE 7. … Only the middle and leading strands showed specific interactions with α1I. Note that the asparagine from the leading strand is accommodated in a surface pocket next to the Mg²⁺ ion (E).
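The explanation offered for the DAMMIF χν² of 2.13 can be made concrete: for a model that describes the data perfectly, χν² is close to 1 only if the reported errors match the true noise, and if the errors are underestimated by a factor k, χν² rises to about k². A value of 2.13 would therefore correspond to errors underestimated by about √2.13 ≈ 1.46, if that were the only cause. The sketch uses synthetic data and a generic χν² with one fitted scale factor; it is an illustration under these assumptions, not the exact statistic computed by DAMMIF or CRYSOL.

```python
import numpy as np

def reduced_chi2(i_exp, sigma, i_model, n_params=1):
    """Reduced chi-square of a model profile against data, after optimal scaling.

    The model is scaled by the weighted least-squares factor c (one fitted
    parameter) before the residuals are normalized by the reported errors.
    """
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_exp * i_model) / np.sum(w * i_model ** 2)
    resid = (i_exp - c * i_model) / sigma
    return float(np.sum(resid ** 2) / (i_exp.size - n_params))

# Synthetic profile with known noise; the "model" is the noise-free truth.
rng = np.random.default_rng(1)
q = np.linspace(0.01, 0.3, 2000)
true_profile = np.exp(-(q * 30.0) ** 2 / 3.0) + 0.01
sigma_true = 0.01 * np.ones_like(q)
data = true_profile + rng.normal(scale=sigma_true)

chi_ok = reduced_chi2(data, sigma_true, true_profile)              # correct errors: about 1
chi_under = reduced_chi2(data, sigma_true / 1.46, true_profile)    # errors too small by 1.46x
```

Scaling every error bar by the same factor leaves the fitted scale c unchanged, so the two values differ by exactly 1.46² ≈ 2.13.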

The HADDOCK-derived 2:1 model shows good shape correspondence with the SAXS envelope (Fig. 9). Despite the overall shape similarity between the HADDOCK-derived model and the average-filtered shape (normalized spatial discrepancy of 0.94), the theoretical scattering profile calculated from this model with CRYSOL (51) fitted the SAXS data relatively poorly (χν² = 5.3). There are several plausible reasons for this. Again, detector count statistics and the presence of other species (e.g. free peptide, unbound α1I, and monomeric 1:1 complex) would have contributed to the high value. In addition, the GPO repeats at the tails of the GLOGEN peptides in the HADDOCK model fit particularly poorly within the envelope, suggesting that they may be flexible. To address the issue of mixed species, OLIGOMER (53) was used to estimate the scattering contribution of each species present. The best fit was obtained for 88% 2:1 complex, 7% 1:1 complex, and 5% free peptide. The χν² for this volume fraction composition was 3.1, a statistically significant improvement in goodness of fit (F = 1.71; P(F) < 3 × 10⁻⁷) over the 2:1 complex alone or any other combination of species tested. Further improvements in goodness of fit could likely be obtained by allowing for flexibility in the GLOGEN peptide tails or in the βE-α6 loop and helix 7, and/or for some variability in the relative orientation of the α1I domains in the 2:1 complex. Nonetheless, in relative terms, the HADDOCK-derived 2:1 model gives a statistically significant improvement in fit over every other 2:1 model tested, including the corresponding 2:1 complex based on the GFOGER-α2I crystal structure, suggesting that the difference in peptide binding orientation between the α1I and α2I complexes is genuine.
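Mixture analysis of the kind performed with OLIGOMER amounts to fitting the measured profile as a non-negative combination of the profiles calculated for each candidate species. The sketch below uses `scipy.optimize.nnls` on synthetic Guinier-type profiles (the radii of gyration, fractions, and noise are invented for the example) and normalizes the fitted weights to sum to 1, a simplification of OLIGOMER's volume fractions. The reported F statistic is the ratio of the two reduced χ² values, 5.3/3.1 ≈ 1.71; its P value depends on the numbers of degrees of freedom, which are not given in this section, so `f_test` takes them as inputs. Treating the χ² ratio as F-distributed is itself an approximation, since both fits use the same data.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import f as f_dist

def fit_mixture(i_exp, sigma, components):
    """Non-negative least-squares fit of data by a sum of component profiles.

    components: array (n_species, n_q) of calculated profiles, one per species.
    Returns (fractions, reduced_chi2), with fractions normalized to sum to 1.
    """
    a = (components / sigma).T          # weight each q point by 1/sigma
    b = i_exp / sigma
    weights, rnorm = nnls(a, b)         # rnorm = ||a @ weights - b||
    dof = i_exp.size - len(components)
    return weights / weights.sum(), rnorm ** 2 / dof

def f_test(chi2_worse, dof_worse, chi2_better, dof_better):
    """Ratio of reduced chi-square values and its upper-tail F probability."""
    ratio = chi2_worse / chi2_better
    return ratio, f_dist.sf(ratio, dof_worse, dof_better)

# Synthetic data: three hypothetical species (2:1 complex, 1:1 complex, peptide).
rng = np.random.default_rng(2)
q = np.linspace(0.01, 0.4, 500)
profiles = np.stack([np.exp(-(q * rg) ** 2 / 3.0) for rg in (45.0, 25.0, 12.0)])
sigma = 0.0005 * np.ones_like(q)
data = np.array([0.88, 0.07, 0.05]) @ profiles + rng.normal(scale=sigma)
fractions, chi2 = fit_mixture(data, sigma, profiles)
```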

DISCUSSION
We have determined the structure of the α1I domain bound to a triple-helical peptide that mimics its natural ligand, collagen. The structure reveals that binding is accompanied by movement of the MIDAS loops, major conformational changes in the C helix (residues Leu282-Gly288), which becomes uncoiled in the bound state, and a displacement of the C-terminal helix (helix 7), which moves by ~12 Å relative to its position in the unliganded state. These structural rearrangements are consistent with those observed in the structure of α2I bound to GFOGER. Further validation of the current structure comes from a recent analysis of the dynamics of α1I in complex with collagen type IV using hydrogen-deuterium exchange mass spectrometry (57). That study found increased conformational mobility in the region between residues 283 and 290, corresponding to loop βE-α6 (including the uncoiled C helix), and in the region between residues 318 and 332, which corresponds to helix 7. The intervening elements between these two regions also showed moderate conformational changes compared with unliganded α1I. Our NMR results agree: T1/T2 relaxation (Fig. 2), heteronuclear NOE experiments (Fig. 4B), the S² values predicted from chemical shifts (Fig. 4A), and the NMR structures themselves (Fig. 3A) all indicate conformational changes and increased backbone dynamics in these regions of α1I upon GLOGEN binding.

FIGURE 8. Elution profile of α1I in the absence (red) and presence (black) of GLOGEN on an analytical size exclusion column. In the red trace, the peak at 11.68 ml represents unliganded α1I, whereas the peak at 10.02 ml observed in the presence of GLOGEN corresponds to a much larger species. Comparison of the elution volume with those of albumin (67 kDa) and ovalbumin (44 kDa) indicated that the complex is likely to have a molecular mass greater than the 27 kDa expected for a 1:1 GLOGEN-α1I complex.

FIGURE 9. SAXS model of α1I and the GLOGEN-α1I complex. A, SAXS intensity versus q for the samples containing α1I alone (red) and α1I with GLOGEN (black). Error bars indicate ±1 S.D. B, Porod analysis of the data acquired for the samples containing α1I alone (red) and α1I with GLOGEN (black). C, alignment of the model of the dimeric α1I-GLOGEN complex with the ab initio SAXS envelope. The model consists of two α1I molecules (red) and one triple-helical peptide (black). The two α1I molecules were oriented so that the Mg²⁺ ions at the MIDAS are coordinated by glutamate residues from two of the three peptide strands. The SAXS envelope (gray mesh) has a slightly bent but elongated shape that fits the dimeric model well, although it does not accommodate the two ends of the peptide. AU, arbitrary units.
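The ~12 Å displacement of helix 7 is a comparison between two aligned structures. This section does not say exactly how it was measured; one common approach, sketched here as an assumption, is to superpose the bound structure onto the unliganded one using Cα atoms of the unchanged core (Kabsch algorithm) and then measure how far the centroid of the segment of interest has moved. Which residues count as "core" is a hypothetical choice left to the caller; the same function would apply to the ~4.5 Å shift of strand βF.

```python
import numpy as np

def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation and translation that best superpose mobile onto target (N x 3 each).

    Kabsch algorithm: center both sets, take the SVD of their covariance matrix,
    and flip the last axis if needed so the result is a rotation, not a reflection.
    """
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    h = (mobile - mob_c).T @ (target - tgt_c)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    shift = tgt_c - mob_c @ rot.T
    return rot, shift

def segment_displacement(free_ca, bound_ca, core_idx, segment_idx) -> float:
    """Centroid shift (Å) of one segment after superposing bound onto free via the core."""
    rot, shift = kabsch(bound_ca[core_idx], free_ca[core_idx])
    bound_fit = bound_ca @ rot.T + shift          # apply the core-based fit to all atoms
    moved = bound_fit[segment_idx].mean(axis=0) - free_ca[segment_idx].mean(axis=0)
    return float(np.linalg.norm(moved))
```

Fitting only on the core prevents the moving helix from biasing the superposition, which would otherwise shrink the apparent displacement.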
The current structure provides a basis for investigating how collagen binding, which is accompanied by subtle changes close to the MIDAS, leads to the large conformational changes thought to propagate signaling. In the peptide-bound state, the MIDAS rearrangement is associated with a change in the side-chain conformation of Tyr285, which rotates away from its original position and exposes an accessible binding site for collagen or collagen peptides. The C helix unfolds, and the resulting βE-α6 loop, which is highly flexible according to the NMR relaxation data, bends away from the collagen peptide so that it is not directly involved in the binding interaction. Peptide binding also causes a kink in helix 6 (residues Glu293-Ala303), which results in a small displacement (~4.5 Å) of the adjacent short β-strand (βF) and the large downward movement of helix 7 (residues Glu317-Glu329). This downward movement of helix 7 has been postulated to be key for integrin signaling and mediates communication of the α1I domain with the βI domain of the integrin β subunit (58). A new salt bridge between Glu317 and Arg171 was observed in the bound state, linking the displaced helix 7 to helix 1. To test whether this salt bridge is essential for stabilizing the active state of α1I, we made the R171A mutation and examined its effect on the conformation of unliganded and liganded α1I using ¹H,¹⁵N TROSY NMR. The similarity of the WT and R171A spectra in both the unliganded and GLOGEN-bound states, however, suggests that the salt bridge is not crucial for collagen binding or for α1I activation.
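Whether a Glu-Arg pair such as Glu317-Arg171 forms a salt bridge in a given model is usually decided geometrically. This sketch assumes the common convention of calling a salt bridge when any carboxylate oxygen lies within 4.0 Å of any basic side-chain nitrogen; the cutoff is a widely used but arbitrary choice, the atom names follow standard PDB nomenclature, and the function names are hypothetical. The coordinates in the usage below are invented for illustration.

```python
import numpy as np

# Side-chain atoms (standard PDB names) commonly used for acidic/basic ion-pair checks.
ACIDIC_O = {"GLU": ("OE1", "OE2"), "ASP": ("OD1", "OD2")}
BASIC_N = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",)}

def charged_atoms(res_name: str, atoms: dict, table: dict) -> np.ndarray:
    """Coordinates of the charged side-chain atoms of one residue (missing atoms skipped)."""
    return np.array([atoms[name] for name in table[res_name] if name in atoms], dtype=float)

def min_ion_pair_distance(acid_xyz: np.ndarray, base_xyz: np.ndarray) -> float:
    """Smallest O...N distance between two residues, over all atom pairs."""
    diffs = acid_xyz[:, None, :] - base_xyz[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).min())

def is_salt_bridge(acid_xyz, base_xyz, cutoff: float = 4.0) -> bool:
    """Call a salt bridge if any carboxylate O lies within cutoff (Å) of any basic N."""
    return min_ion_pair_distance(acid_xyz, base_xyz) <= cutoff

# Usage with invented coordinates (Å) for a Glu and an Arg side chain.
glu = {"OE1": (0.0, 0.0, 0.0), "OE2": (1.2, 0.7, 0.0)}
arg = {"NE": (5.0, 0.0, 0.0), "NH1": (3.5, 0.5, 0.0), "NH2": (4.8, 1.5, 0.0)}
acid = charged_atoms("GLU", glu, ACIDIC_O)
base = charged_atoms("ARG", arg, BASIC_N)
```

Applied across an ensemble of NMR or docking models, the same check gives the fraction of models in which a given ion pair is formed.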
In contrast, both the R287A and E317A mutations caused large perturbations in the ¹H,¹⁵N TROSY spectra of the unliganded proteins relative to the spectrum of WT α1I. For E317A, the perturbations we observed are consistent with a previous study (30), in which the authors proposed that E317A represents an activated conformation of α1I because mutation of Glu317 enhanced signaling, as shown by increased ERK activation and down-regulation of collagen synthesis. They suggested that the E317A mutation leads to a downward displacement of helix 7 not by disrupting the salt bridge with Arg287 but through the loss of a favorable monopole-helix dipole interaction between the negative charge of Glu317 and the partial positive charge of the helix 7 dipole (30). In support of this interpretation, the R287A mutation in the same study did not produce similar activation of the intact receptor.
Our results do not support this interpretation. We acquired ¹H,¹⁵N TROSY spectra for both mutants, and with the benefit of full assignments for WT α1I, these spectra provide strong evidence that helix 7 is displaced in both cases. Analysis of the spectra also shows, however, that the two mutations have different effects on residues at the collagen-binding site of α1I (Fig. 6). We therefore believe that the differences observed in the previous study may have resulted from the mutations altering the collagen binding affinity of the two proteins.
The HADDOCK model of α1I in complex with GLOGEN shows that the peptide binds along a surface trench above the MIDAS on α1I (Fig. 7C). In contrast, in the structure of α2I in complex with GFOGER, the peptide binds at the edge of a similar trench (Fig. 7D) (22). Our SAXS data support the HADDOCK model of α1I-GLOGEN, whereas a model based on the α2I-GFOGER crystal structure showed a very poor fit to the SAXS envelope. Analysis of the binding interfaces gives some indication of the structural basis for these differences. First, the asparagine residue of the leading-strand GLOGEN motif sits in an acidic pocket formed by Asp253 and Gly254 of α1I (Fig. 7E). Replacing the asparagine with a bulkier arginine could create a steric clash in that pocket, which may account for the observation that GLOGEN binds more potently to the α1I domain than GFOGER. Second, sequence comparison shows that Ala218 in α1I corresponds to Asp219 in α2I. In the structure of the α2I-GFOGER complex, Asp219 is located at the edge of the trench and forms a salt bridge with one of the arginine residues in the GFOGER motif. Because α1I has Ala218 at this position, a similar interaction cannot form in the α1I complex, which may contribute both to the peptide binding preferences of the two αI domains (15, 21) and to the subtle change in binding orientation of the peptides between the two complexes.

TABLE 2. SAXS data collection and scattering-derived parameters
† Reported for the averaged SAXS data where Porod volume estimates from successive blocks of 5 × 2-s exposures did not vary by more than 1.5% from the mean value.
‡ Calculated for two α1 monomers bound to a GLOGEN triple helix.
N/A, not applicable.

Structural Analysis of Integrin α1I Domain-Collagen Complex
Homotrimeric triple-helical peptides have been widely used as collagen mimetics and are excellent tools for studying the binding of collagen to collagen receptors. They have been especially helpful in identifying specific recognition motifs for collagen-binding integrins (17). A homotrimeric peptide potentially offers three equivalent binding sites for integrin αI domains. Therefore, where the individual strands in the peptide are in the appropriate orientation for αI binding and there is no steric hindrance, more than one strand in the same peptide should in theory be able to engage a separate αI molecule simultaneously. Here, our size exclusion and SAXS results provide evidence for a dimeric complex of α1I bound to GLOGEN. To our knowledge, this is the first report of a dimeric complex between an αI domain and a triple-helical collagen peptide, but we believe that the SEC result presented by Lambert et al. (54) may show the same phenomenon for α2I upon peptide binding, although it was explained there as an artifact of the elution of the rigid, elongated helical peptide. For α1I bound to GLOGEN, T1/T2 NMR relaxation data confirmed the formation of a 1:1 monomeric complex at higher peptide concentrations. Based on these observations, we propose that in the presence of GLOGEN, α1I exists in a dynamic equilibrium between an unliganded state, a dimeric complex state ((α1I)2-GLOGEN), and a monomeric complex state (α1I-GLOGEN). When the GLOGEN concentration is low relative to α1I, the dimeric complex dominates, but as the peptide:α1I ratio increases, the equilibrium shifts toward the monomeric complex. However, because collagen is abundant in the ECM and αI domains recognize heterotrimeric collagen sequences, we believe that this dimeric complex is unlikely to dominate in vivo and may not be biologically significant.
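The proposed shift from a dimeric to a monomeric complex as the peptide:α1I ratio rises follows from simple mass action. The sketch below assumes a minimal two-step scheme, P + L ⇌ PL (association constant K1) and P + PL ⇌ P2L (K2), where P is α1I and L the GLOGEN triple helix, with statistical site-counting factors folded into K1 and K2. The constants are hypothetical, not measured values. The free-protein concentration is found by bisection, which is safe because total bound protein increases monotonically with free protein.

```python
def species(p_tot, l_tot, k1, k2, tol=1e-12):
    """Equilibrium concentrations for P + L <-> PL (K1) and P + PL <-> P2L (K2).

    Concentrations and 1/K share the same (arbitrary) units. Solves the protein
    mass balance P + PL + 2*P2L = p_tot for free P by bisection; the ligand
    balance L + PL + P2L = l_tot holds exactly by construction.
    """
    def free_ligand(p):
        return l_tot / (1.0 + k1 * p + k1 * k2 * p * p)

    def total_protein(p):
        l = free_ligand(p)
        return p + k1 * p * l + 2.0 * k1 * k2 * p * p * l

    lo, hi = 0.0, p_tot
    while hi - lo > tol * max(p_tot, 1.0):
        mid = 0.5 * (lo + hi)
        if total_protein(mid) < p_tot:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    l = free_ligand(p)
    return {"P": p, "L": l, "PL": k1 * p * l, "P2L": k1 * k2 * p * p * l}

# Hypothetical constants (K1 = K2 = 1 per uM) at a fixed 100 uM alpha1I.
low_peptide = species(100.0, 10.0, 1.0, 1.0)     # peptide:alpha1I = 0.1
high_peptide = species(100.0, 500.0, 1.0, 1.0)   # peptide:alpha1I = 5
```

With these illustrative constants, the 2:1 species dominates the bound forms at the low ratio and the 1:1 species at the high ratio, the qualitative behavior proposed above; real populations would depend on the actual binding constants.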

CONCLUSION
The structure of α1I bound to GLOGEN suggests that the signaling mechanism of this receptor is similar to that reported previously for α2I. However, the mode of peptide binding to α1I appears subtly different from that observed with α2I, and the current structure provides a rationale for the observed differences in binding specificity between α1I and α2I.