The Solution Structure of EMILIN1 Globular C1q Domain Reveals a Disordered Insertion Necessary for Interaction with the α4β1 Integrin

The extracellular matrix protein EMILIN1 (elastin microfibril interface located protein 1) is implicated in maintaining blood pressure homeostasis via the N-terminal elastin microfibril interface domain and in trophoblast invasion of the uterine wall via the globular C1q (gC1q) domain. Here, we describe the first NMR-based homology model structure of the human 52-kDa homotrimer of the EMILIN1 gC1q domain. In contrast to all of the gC1q (crystal) structures solved to date, the 10-stranded β-sandwich fold of the gC1q domain is reduced to nine β strands with a consequent increase in the size of the central cavity lumen. An unstructured loop, resulting from an insertion unique to EMILIN1 and EMILIN2 family members and located at the trimer apex upstream of the missing strand, specifically engages the α4β1 integrin. Using both Jurkat T and EA.hy926 endothelial cells as well as site-directed mutagenesis, we demonstrate that the ability of α4β1 integrins to recognize the trimeric EMILIN1 gC1q domain mainly depends on a single glutamic acid residue (Glu933). Static and flow adhesion of T cells and haptotactic migration of endothelial cells on gC1q are fully dependent on this residue. Thus, EMILIN1 gC1q-α4β1 represents a unique ligand/receptor system, with a requirement for a 3-fold arrangement of the interaction site.

EMILIN1 (elastin microfibril interface located protein 1) is a secreted extracellular matrix multidomain glycoprotein (1,2). It is characterized by a unique arrangement of structural domains, including the elastin microfibril interface domain at the N terminus, an α-helical domain predicted to form a coiled-coil structure in the central part of the molecule, a short collagenous sequence, and a region homologous to the globular domain of C1q (gC1q domain) at the C-terminal end (3,4). Although the role of the coiled-coil region has not yet been elucidated, it has been demonstrated that EMILIN1 interacts with pro-transforming growth factor-β (5) through the elastin microfibril interface domain (6). EMILIN1 deficiency causes systemic arterial hypertension; when expressed at physiological levels, EMILIN1 binds pro-transforming growth factor-β and prevents its maturation by protein convertases (5). Thus, EMILIN1, favorably located at the subendothelium of blood vessels, is a new specific antagonist of transforming growth factor-β, and the function of this constituent of elastic tissues is linked to the pathogenesis of hypertension. The C-terminal gC1q domain is involved in the oligomerization of EMILIN1 (7), in cell adhesion and migration via interaction with the α4β1 integrin (8), and in trophoblast invasion (9).
The gC1q signature is found in a variety of proteins, and the essential features of the specific structure-function relationship were recognized with the elucidation of the crystal structure of the homotrimeric gC1q domain of mouse ACRP30 (adipocyte complement-related protein of 30 kDa) (10). This structure suggested a structural and evolutionary link between the tumor necrosis factor and the gC1q domains and led to the recognition of a gC1q/tumor necrosis factor superfamily. Three other gC1q structures, namely those of human collagen X (11), mouse collagen VIII (12), and human complement C1q (13), have been solved by x-ray crystallography. Each gC1q domain adopts a 10-stranded β-sandwich fold with a jelly roll topology, consisting of two five-stranded β-sheets (A, A′, H, C, F and B′, B, G, D, E), each made of antiparallel strands. Besides sharing the folding topology, the human complement C1q, ACRP30, collagen X, and collagen α1VIII gC1q domains also have similar oligomerization motifs derived from the noncovalent association of three polypeptide chains to form trimeric globular domains. Homotrimeric domains are formed by ACRP30, collagen X, and collagen α1VIII gC1q domains, whereas the association of three different chains gives rise to a heterotrimer in the complement C1q protein.
The EMILIN1 gC1q domain shares similar residues with the gC1q domains of several other extracellular matrix and nonextracellular matrix components. Based on the homotrimeric model of ACRP30 gC1q crystal structure (10), it was supposed that the EMILIN1 gC1q domain could also trimerize via several hydrophobic residues. This was confirmed, first by biochemical (3,7) and later by preliminary structural evidence (14).
So far, among the C1q superfamily members, EMILIN1 gC1q is the only one capable of interacting with α4β1 integrin, a molecule, predominantly expressed on the surface of hemopoietic cells, that serves as a receptor to allow adhesion to the blood vessel wall (15,16) or extracellular matrix constituents (17). The interaction between α4β1 and EMILIN1 gC1q is particularly efficient because even very low ligand concentrations provide very strong adhesion (8) and migration (9).
In the present study, we report the NMR solution structure of the EMILIN1 gC1q homotrimer domain and demonstrate that the α4β1 binding site is located in a flexible loop at the apex of the trimeric structure and that it is absolutely dependent on the Glu933 residue. The data provide a new understanding of how this ligand/receptor pair interacts to mediate α4β1 integrin-dependent haptotactic cell migration as well as cell adhesion, under both static and flow conditions, to subendothelial EMILIN1.

EXPERIMENTAL PROCEDURES
Sample Preparation-The recombinant human EMILIN1 gC1q domain spanning residues from 846 to the end of the published human EMILIN1 sequence (3) was expressed as a His6-tagged protein and purified as previously described (7). This recombinant product forms a stable trimeric protein with an acidic isoelectric point (pI 5.32) and a molecular mass of 51,624 Da. The increase of 1,151 Da compared with the value formerly published (14) arises from a correction of the primary sequence used for calculating the molecular mass, made after nucleotide sequence controls and mass spectrometry measurements. The protein was dissolved in 20 mM phosphate, 100 mM NaCl, 0.02% (w/v) NaN3, 5% (v/v) D2O, pH 7.5, yielding a clear sample that proved stable for several months. A sample at 1.8 mM protomer concentration (0.6 mM homotrimer concentration) of uniformly 13C- and 15N-labeled, 80% 2H-labeled protein in this buffer was used for the majority of the NMR experiments. All of the labeled protein samples were from Asla Biotech.
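As a quick consistency check on the sample figures above, the following minimal Python sketch converts the protomer concentration into the trimer concentration and an approximate mass concentration. The function and variable names are illustrative and do not come from the original work, and complete trimerization is assumed.

```python
# Illustrative arithmetic only: names are hypothetical, numbers are from the text.
TRIMER_MASS_DA = 51_624      # molecular mass of the homotrimer (Da)
PROTOMER_CONC_MM = 1.8       # protomer concentration (mM)
CHAINS_PER_TRIMER = 3


def trimer_conc_mm(protomer_conc_mm: float, chains: int = CHAINS_PER_TRIMER) -> float:
    """Trimer concentration, assuming every chain is assembled into a trimer."""
    return protomer_conc_mm / chains


def mass_conc_mg_per_ml(conc_mm: float, mass_da: float) -> float:
    """mmol/l times g/mol gives mg/l; divide by 1000 to obtain mg/ml."""
    return conc_mm * mass_da / 1000.0


trimer_mm = trimer_conc_mm(PROTOMER_CONC_MM)
protomer_mass_da = TRIMER_MASS_DA / CHAINS_PER_TRIMER
print(f"trimer: {trimer_mm:.2f} mM, protomer mass: {protomer_mass_da:.0f} Da, "
      f"protein: {mass_conc_mg_per_ml(trimer_mm, TRIMER_MASS_DA):.1f} mg/ml")
```

The same mass concentration is obtained from the protomer concentration and the protomer mass, which is a simple way to verify that the two quoted concentrations describe the same sample.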
Multidimensional NMR Spectroscopy-NMR experiments were performed at 37°C on home-built/GE-Omega spectrometers operating at 500, 600, or 750 MHz (proton resonance frequency) at Oxford and on a Bruker Avance 500 NMR system at Udine. Backbone sequential assignment was achieved using six three-dimensional triple resonance experiments: [15N, 1 (18,19). For each of these experiments the corresponding 1H-13C overall projections were also collected. Typically, two-dimensional 1H-13C correlation spectra were obtained over a 13C sweep width of 28-60 ppm, with 512 t1 increments of 1024 complex data points and 32-96 scans/t1. Three-dimensional spectra were typically acquired with 64 × 48 × 512 complex data points and 32-96 scans/FID (free induction decay). Proton NOE connectivities were identified from the analysis of a three-dimensional 15N NOE spectroscopy heteronuclear single quantum coherence spectrum (20) recorded at 600 MHz (1H frequency) with a mixing time of 150 ms and acquisition times of 24.9 ms (t1, 1H), 8.8 ms (t2, 15N), and 57.3 ms (t3, 1H) for a three-dimensional matrix of 220 × 18 × 512 complex data points. Two-dimensional 15N{1H} NOE measurements (21) were also performed at 500 MHz and 37°C, with a 5-s relaxation or saturation delay, 160 t1 increments of 1024 complex data points, and 480 scans/t1. The data were processed and analyzed with Felix (Accelrys) and Sparky 3 (footnote 5). Original sizes (15N and 13C indirect dimensions) were typically increased by linear prediction. Zero filling followed by apodization with shifted sine bell functions was applied prior to Fourier transformation. Proton chemical shifts were referenced to internal dioxane (3.750 ppm), whereas 13C and 15N chemical shifts were referenced indirectly from the hydrogen reference frequency by the heteronuclear absolute frequency ratios.
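The acquisition times and matrix sizes quoted for the three-dimensional 15N NOE spectroscopy experiment imply a spectral width in each dimension. The sketch below works this out under the simple assumption that acquisition time equals the number of complex points divided by the spectral width. The 15N/1H frequency ratio used to convert Hz to ppm is the commonly tabulated value and is an assumption here; the text does not state which ratio was used.

```python
# Sketch: spectral widths implied by the quoted acquisition times.
# Assumes t_acq = n_complex / spectral_width (no first-point delay corrections).
XI_15N = 0.101329118   # commonly tabulated 15N/1H frequency ratio (assumed value)
H1_MHZ = 600.0         # proton frequency quoted in the text
N15_MHZ = H1_MHZ * XI_15N


def spectral_width_hz(n_complex: int, acq_time_s: float) -> float:
    """Spectral width that gives acq_time_s for n_complex sampled points."""
    return n_complex / acq_time_s


def hz_to_ppm(width_hz: float, carrier_mhz: float) -> float:
    """A width in Hz divided by the carrier frequency in MHz is a width in ppm."""
    return width_hz / carrier_mhz


# (complex points, acquisition time in s, carrier in MHz) for each dimension
dimensions = {
    "t1 (1H)": (220, 24.9e-3, H1_MHZ),
    "t2 (15N)": (18, 8.8e-3, N15_MHZ),
    "t3 (1H)": (512, 57.3e-3, H1_MHZ),
}

for name, (points, acq_time, carrier) in dimensions.items():
    width = spectral_width_hz(points, acq_time)
    print(f"{name}: {width:.0f} Hz, {hz_to_ppm(width, carrier):.1f} ppm")
```

Under these assumptions both proton dimensions come out close to 15 ppm wide and the 15N dimension close to 34 ppm, which is a plausible internal check on the reported parameters rather than a statement of the actual spectrometer settings.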
Homology Modeling and Refinement-The homology model for the EMILIN1 gC1q domain was built after alignment with ClustalW (22), including 13 human proteins of the C1q protein family. The A chain of ACRP30 protein (ACRP30-A) (10) was recognized as the best sequence matching the EMILIN1 gC1q domain (supplemental Table S1), based on the pairwise alignment score and the number of matched amino acids, and used as a template for the modeling. Overall, 20% sequence identity and 31.3% sequence similarity were found on comparing the query protein and the template, after introducing in the query sequence six gaps, two short 3- and 4-residue insertions, and one long 12-residue insertion. Using the ACRP30-A homologous protein, a model for a monomer of the EMILIN1 gC1q domain was built with the program GeneMine. The symmetric trimer structure was assembled by superposition of three identical monomers onto the three subunits of the template, followed by regularization and minimization. The homology model refinement protocol followed a low temperature simulated annealing strategy described by Chou et al. (23), implemented in the program Xplor-NIH (24). The molecular dynamics algorithm of Xplor-NIH, besides containing conditional energy term penalties to enforce experimental constraints, includes additional potential energy terms specifically developed for NMR structure determination that bias the conformational space sampling in favor of regions that are physically accessible, thus improving the performance of the structure calculation. For our purposes, a crucial feature of the Xplor-NIH program was the possibility of including a dipolar energy term to exploit the experimental residual dipolar coupling (RDC) information measured in weakly aligned media to derive orientational restraints for structure determination (25,26). For the analysis of EMILIN1 gC1q domain in anisotropic phase, a uniformly compressed polyacrylamide gel system was
Footnote 5: T. D. Goddard and D. G. Kneller, personal communication.
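For orientation, the dependence of an RDC on bond-vector orientation that underlies such a dipolar energy term is usually written in the general form below. This is the standard textbook expression, given here as background and not quoted from the original text; the symbols are introduced for illustration only.

```latex
% D_a: axial component of the alignment tensor; R: rhombicity;
% (\theta, \varphi): polar angles of the internuclear vector in the
% principal axis frame of the alignment tensor.
D(\theta,\varphi) \;=\; D_a \left[ \left( 3\cos^{2}\theta - 1 \right)
      + \tfrac{3}{2}\, R \,\sin^{2}\theta \,\cos 2\varphi \right]
```

Because each measured coupling depends only on the orientation of its internuclear vector relative to a common alignment frame, RDCs restrain the relative orientation of structural elements regardless of the distance between them.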
Site-directed Mutagenesis-Two sets of alanine mutants were prepared. One comprised a series of mutations in the most conserved residues of the EMILIN1 gC1q domain, including F852A, F886A, Y894A, L956A, F981A, and L986A. The second set comprised a series of mutants in the unstructured segment unique to EMILIN1 and EMILIN2. The mutations were E930A, (L932A/E933A), E933A, (G945A/G948A), F950A, and F950Y. The codon mutations within the DNA sequence of EMILIN1 gC1q were generated by site-directed mutagenesis using the overlapping PCR approach (28). Briefly, in a first PCR the primer carrying the desired mutation was used in combination with 5′- and 3′-flanking primers to generate two overlapping fragments using, as template, the wild type EMILIN1 gC1q coding sequence inserted in the expression vector pQE-30 (Qiagen) between the BamHI and KpnI restriction sites. The overlapping fragments were gel-purified and used as templates in a two-step PCR consisting of 12 elongation cycles, in which the overlapping regions work as primers, and addition of the 5′- and 3′-flanking primers carrying BamHI and KpnI restriction sites, respectively, followed by 25 amplification cycles. The final products were purified on Wizard SV columns (Promega), cut with BamHI and KpnI, and ligated in pQE-30 cut with the same enzymes. Ligation products were transformed in MI5 bacterial strain. Four deletion mutants, del(Leu932-Phe950), del(Pro942-Phe950), del(Tyr927-Gly945), and del(Leu932-Gly945), were generated following the same protocol. The expression constructs containing mutant sequences were verified by DNA sequencing (Beckman 2000). The oligonucleotides were purchased from Sigma Genosys.
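The mutant names used above follow the usual convention of wild type residue, sequence position, and replacement residue. As a minimal sketch of how such names map onto a sequence, the Python helpers below apply point mutations and deletions to a fragment numbered in the full-length protein frame. The helper names are hypothetical, and the four-residue fragment GLEN (residues 931-934, as given in the Results) is used only as a toy input.

```python
import re


def apply_point_mutation(seq: str, mutation: str, first_residue: int) -> str:
    """Apply a mutation such as 'E933A' to seq, whose first residue is numbered first_residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {mutation}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_residue
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} lies outside the sequence")
    if seq[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {seq[index]}")
    return seq[:index] + replacement + seq[index + 1:]


def apply_deletion(seq: str, start: int, end: int, first_residue: int) -> str:
    """Delete residues start..end (inclusive, full-length numbering) from seq."""
    i, j = start - first_residue, end - first_residue
    if not 0 <= i <= j < len(seq):
        raise ValueError("deletion range lies outside the sequence")
    return seq[:i] + seq[j + 1:]


fragment = "GLEN"  # toy input: residues 931-934
single = apply_point_mutation(fragment, "E933A", first_residue=931)
double = apply_point_mutation(single, "L932A", first_residue=931)
print(single, double)
```

Checking the wild type residue before substituting catches numbering offsets early, which matters here because the recombinant construct carries extra N-terminal residues from the tag and cloning strategy.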
Recombinant Protein Purification and SDS-PAGE Analysis-Recombinant proteins were expressed as His6-tagged proteins and extracted under native conditions as previously described (7). Briefly, 100 ml of liquid culture grown to an A600 nm of 0.6 was induced with 2 mM isopropyl-1-thio-D-galactopyranoside for 3 h at 37°C. The culture was then centrifuged at 4,000 × g for 20 min, and the cell pellet was resuspended in sonication buffer (50 mM sodium phosphate, pH 8.0, 0.3 M NaCl) at 5 volumes/g of wet weight. The samples were frozen in a dry ice/ethanol bath, thawed in cold H2O, incubated 1 h with 1 mg/ml lysozyme, and sonicated on ice (1-min bursts/1-min cooling/200-300 watts). The cell lysate was centrifuged at 10,000 × g for 20 min; supernatant and pellet were loaded on an SDS-PAGE gel to check the respective amounts of soluble and insoluble fractions. The purification of the soluble mutants was performed by affinity chromatography on nickel-nitrilotriacetic acid resin (Qiagen). When resistance to denaturation by heating in the presence of 1% SDS was assayed, the samples were incubated in sample buffer (20 mM Tris-HCl, pH 7.6, 100 mM NaCl, 25 mM EDTA, 1% SDS, 2 mM phenylmethanesulfonyl fluoride, 5 mM N-ethylmaleimide) for 1 h at different temperatures just before gel loading.
Cell Adhesion Assay-The quantitative cell adhesion assay used in this study is based on centrifugation and has been extensively described (29,30). Human Jurkat T cell leukemia cells or endothelial EA.hy926 cells, which display a number of features characteristic of vascular endothelial cells (31), were labeled with the vital fluorochrome calcein AM (Molecular Probes Inc., Eugene, OR) for 15 min at 37°C and then aliquoted into the bottom miniplate assemblies. In experiments aimed at examining the effects of blocking antibodies, i.e. 1HG8 against EMILIN1 gC1q (8) and rabbit polyclonal against EMILIN1 (7), the various antibodies were added directly to the wells just before plating the cells. The relative numbers of cells bound to the substrate (i.e. remaining in the wells of the bottom miniplates) and of cells that failed to bind to the substrate (i.e. remaining in the wells of the top miniplates) were estimated by top/bottom fluorescence detection in a computer-interfaced GENios Plus microplate fluorometer (TECAN Italy) (30).
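The readout described above compares fluorescence from cells retained in the bottom wells with fluorescence from cells recovered in the top wells. A minimal sketch of the corresponding calculation is given below; the function name and the example signal values are hypothetical, and background subtraction or any calibration used in the cited protocol (30) is not modeled.

```python
def percent_bound(bottom_signal: float, top_signal: float) -> float:
    """Percentage of cells bound to the substrate from bottom/top fluorescence readings."""
    if bottom_signal < 0 or top_signal < 0:
        raise ValueError("fluorescence readings must be non-negative")
    total = bottom_signal + top_signal
    if total == 0:
        raise ValueError("no fluorescence signal in either well")
    return 100.0 * bottom_signal / total


# Hypothetical readings in arbitrary fluorescence units
print(percent_bound(bottom_signal=750.0, top_signal=250.0))
```

Expressing adhesion as a fraction of the total signal in each well pair makes the result independent of small differences in the number of cells plated per well.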
Cell Motility Assay-Haptotactic-like migration of EA.hy926 cells in response to wt Glu933 or mutant E933A substrates was assessed by fluorescence-assisted transmigration invasion and motility assay (30-32). The procedure is based on the use of Transwell-like inserts carrying fluorescence-shielding porous polyethylene terephthalate membranes (polycarbonate-like material with 8-μm pores) and HTS FluoroBlok™ inserts (Becton-Dickinson, Falcon, Milan, Italy). The membranes were coated (20 μg/ml in 50 μl) on the underside in bicarbonate buffer at 4°C overnight and blocked with 1% bovine serum albumin for 1 h at room temperature. The cells were fluorescently tagged with the lipophilic dye DiI (Molecular Probes) used at a final concentration of 5 μg/ml for 10-15 min at 37°C. The cells were washed, resuspended, and then added to the upper side of the inserts (2 × 10^5 cells/insert), and migratory behavior was monitored at different time intervals using the computer-interfaced SPECTRAFluor Plus instrument (TECAN Italy).
Perfusion Experiments-Proteins were coated on glass coverslips (24 × 50 mm) that represented the lower surface of a modified Richardson's parallel flow chamber mounted on an inverted microscope; a silicone rubber gasket determined the flow path height (125 μm) between the glass coverslip and the upper plate. The chamber was assembled and filled with phosphate-buffered saline, pH 7.4. A syringe pump (Harvard Apparatus Inc., Boston, MA) and silicone tubing connections were used to aspirate the Jurkat cell suspension through the chamber at a shear rate of 20 s−1. The perfusion system was mounted on an inverted microscope equipped with epifluorescent illumination (Diaphot-TMD; Nikon) and an intensified CCD video camera (C-2400-87, Hamamatsu Photonics). The images were captured at a sampling rate of 25 frames/s during the experiment, and image analysis was performed using recently developed algorithms and programs (33) based on the MATLAB Image tool (The MathWorks, Inc.).
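The 20 s−1 figure has the units of a wall shear rate. For a parallel-plate chamber, the wall shear rate is commonly estimated as 6Q/(w h^2), where Q is the volumetric flow, w the channel width, and h the channel height. The sketch below uses this standard relation to estimate the pump flow that would correspond to 20 s−1 at the stated height of 125 μm. The channel width is not given in the text, so the 10-mm value is purely an assumption for illustration.

```python
def wall_shear_rate(flow_m3_per_s: float, width_m: float, height_m: float) -> float:
    """Wall shear rate (1/s) in a parallel-plate channel: 6Q / (w * h^2)."""
    return 6.0 * flow_m3_per_s / (width_m * height_m ** 2)


def flow_for_shear_rate(shear_rate_per_s: float, width_m: float, height_m: float) -> float:
    """Volumetric flow (m^3/s) needed to reach a given wall shear rate."""
    return shear_rate_per_s * width_m * height_m ** 2 / 6.0


HEIGHT_M = 125e-6          # flow path height from the text
ASSUMED_WIDTH_M = 10e-3    # hypothetical channel width, not stated in the text

flow = flow_for_shear_rate(20.0, ASSUMED_WIDTH_M, HEIGHT_M)
flow_ul_per_min = flow * 1e9 * 60.0   # 1 m^3 = 1e9 microliters
print(f"{flow_ul_per_min:.2f} microliters/min for the assumed width")
```

Because the required flow scales linearly with width and with the square of the height, the gasket thickness is the dominant geometric parameter when setting a target shear rate.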

RESULTS
NMR Structure of the EMILIN1 gC1q Domain-The recombinant EMILIN1 gC1q domain exists in solution as a stable trimeric protein formed by three identical polypeptide chains of 162 amino acids (150 from the natural domain sequence, 7 from the N-terminal fused histidine tag and initial Met residue, and 5 derived from the cloning strategy) with an overall molecular mass of 51,624 Da (3). The 3-fold symmetry, expected from homotrimeric assembly, makes NMR analysis viable, albeit very challenging. We determined the three-dimensional solution structure of the EMILIN1 gC1q domain using a hybrid approach that involved homology modeling and structure refinement with experimental restraints obtained by NMR spectroscopy. For additional details of the three-dimensional solution structure refer to supplemental Table S2.
EMILIN1 gC1q Structure and α4β1 Interaction Site. JULY 4, 2008 • VOLUME 283 • NUMBER 27 • JOURNAL OF BIOLOGICAL CHEMISTRY
First, we acquired high resolution heteronuclear three-dimensional spectra for the sequential assignment of the backbone resonances using a uniformly 13C, 15N, 80% 2H-labeled protein sample. To overcome fast signal decay problems, TROSY-type pulse schemes uniquely suited for the analysis of large molecules at high magnetic field were employed (34). Overall, the assignment process used six three-dimensional triple resonance experiments together with six two-dimensional 1H-13C high resolution spectra to remove ambiguities (14). 13Cα and 13Cβ resonances were unambiguously identified for all of the 150 amino acids in the protein; the initial 12-residue segment containing the histidine tag was excluded. Assignment percentages of 98 and 97% were obtained for 13C′ and 1HN/15N nuclei, respectively (chemical shifts have been deposited in the BioMagResBank under the BMRB accession number 5882).
From the available chemical shifts, a possible location of secondary structure elements for the EMILIN1 gC1q domain was rapidly derived using three different NMR approaches as summarized in Fig. 1: Chemical Shift Index (35), TALOS predictions for backbone torsion angles φ and ψ (36), and identification of slow exchanging amide protons. The predicted secondary structure elements were mostly confirmed in the final refined three-dimensional structure (Protein Data Bank code 2OII) that, according to our data, is composed of nine β-strands and one unstructured region. Following the secondary structure numbering of homologous proteins (10), the latter unstructured stretch reads Tyr927-Gly945, whereas the identified gC1q β-strands are: A (Ala851-Ala855), A′ (Arg870-Val871), B′ (Tyr879-Asp880), B (Val885-Thr887), C (Gly892-Ala898), D (Glu909-Leu912), E (Asp923-Ser924), G (Thr961-Asp965), H (Phe981-Gly988). The flexible character of the unstructured segment Tyr927-Gly945, suggested by the sharpness of the corresponding NMR signals, was confirmed by 15N{1H} NOE measurements, which exhibited local enhancements ranging between −90 and −110%, as opposed to an average value of −15% observed for all the other residues but the N- and C-terminal ones (not shown).
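The contrast between the loop and the rest of the domain in the 15N{1H} NOE data lends itself to a simple per-residue classification. The following sketch flags contiguous stretches of residues whose NOE enhancement falls below a cutoff. The cutoff of −50%, the minimum run length, and the per-residue values are all assumptions chosen for illustration; the input is synthetic and only mimics the ranges quoted above, not the measured data.

```python
def flexible_segments(noe_percent: dict[int, float],
                      threshold: float = -50.0,
                      min_len: int = 3) -> list[tuple[int, int]]:
    """Return (first, last) residue ranges whose NOE enhancement (%) is below threshold."""
    segments = []
    run = []  # residue numbers in the current contiguous flexible run
    for residue in sorted(noe_percent):
        is_flexible = noe_percent[residue] < threshold
        if is_flexible and (not run or residue == run[-1] + 1):
            run.append(residue)
            continue
        # The run ended (rigid residue or numbering gap): keep it if long enough.
        if len(run) >= min_len:
            segments.append((run[0], run[-1]))
        run = [residue] if is_flexible else []
    if len(run) >= min_len:
        segments.append((run[0], run[-1]))
    return segments


# Synthetic example: a rigid background with one strongly negative stretch.
synthetic = {residue: -15.0 for residue in range(920, 952)}
synthetic.update({residue: -100.0 for residue in range(927, 946)})
print(flexible_segments(synthetic))
```

Requiring a minimum run length avoids flagging isolated residues, which is a conservative choice when individual NOE values are noisy.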
The definition of the tertiary and quaternary structure for the EMILIN1 gC1q domain was accomplished by means of a refinement of the regularized homology model with heteronuclear dipolar couplings measured in anisotropic gel phase, following a strategy proposed by Chou et al. (23). Although the number of one-bond RDCs that were measured for NH vectors (1D_NH), C′N vectors (1D_C′N), and C′Cα vectors (1D_C′Cα) did not reach the requirement of three values/residue (23) (supplemental Table S2), further information was added in the refinement data set from NOEs and from backbone dihedral angle restraints that proved consistent with the homology model (in addition to those consistent with TALOS). The starting homology model for the trimeric EMILIN1 gC1q assembly was obtained by using the ACRP30 chain A crystal structure (10) as template protein. The quaternary structure of this model is formed by three identical subunits arranged with a 3-fold symmetry axis. Each monomer in the trimer shows a β-sandwich folding topology containing 10 segments of β-strands and one long unstructured region spanning residues Gly931-Ser951. This region could not be modeled by homology because it corresponds to a long insertion that is unique to EMILIN1 and EMILIN2 (6). Attempts to model the same segment by sequence similarity with other proteins in the Protein Data Bank were unsuccessful, and the region was always classified either as "disordered" or "unstructured" by various secondary structure prediction servers (Psipred, Sam, and Jufo).
Tertiary and Quaternary Structure Analysis-The lowest RDC restraint energy structure of the trimeric EMILIN1 gC1q domain obtained after the two stages of the simulated annealing refinement is represented in Fig. 2 and supplemental Table S2. Each subunit is composed of nine β-strands arranged in two sheets giving a β-sandwich folding topology. The association between monomers mainly occurs through the buried strands A, A′, H, C belonging to sheet 1, and the unstructured region spanning residues Leu947-Pro955, whereas strands B, B′, D, E, and G constitute the external second sheet. The hydrophobic core of the EMILIN1 gC1q monomer is formed mainly by the side chains of residues Phe852, Tyr879, Phe886, Tyr894, Leu956, Val962, and Gly983 (Fig. 2C), all belonging to the group of most conserved residues in the C1q family (4). Notably, the corresponding side chains of the homologous protein ACRP30 (chain A) have similar orientations, which highlights a conservative overall packing of the cores in the two molecules. Thus, the refined model maintains the same folding topology as the starting homology model, although some changes are observed: (i) loops, which were not restrained during the simulated annealing, show some deviations from the starting conformation; (ii) small variations are found in the length of β-strand elements; and (iii) segment Ile953-Leu956, originally classified as β-strand (strand F), becomes unstructured. This region, however, still remains in contact with the adjacent subunit, participating in the trimer interface, as confirmed also by the reduced local mobility assessed by heteronuclear NOE measurements (not shown).
The most relevant changes that occur between the structures before and after refinement are in the strand F region, where comparison shows that: (i) in the initial regularized homology model, strand E (Ala920-Gly925, sheet 2) is joined to strand F (Ile953-Leu956, sheet 1) by a long unstructured loop (Tyr927-Gly945); (ii) in the refined model, strand F melts away so that the secondary structure of the whole segment comprises a β-strand (E), a long unstructured loop (Tyr927-Gly945), and an interfacial region with no secondary structure classification (F). Although shorter in length, strand E keeps the same position before and after the refinement. The location of region F also remains invariant, despite the fact that the new φ and ψ angles no longer define a β strand. Region F, however, is still involved in intersubunit interactions, as observed in the structure prior to refinement. In other words, despite the loss of local secondary structure, region F is still expected to be rigid. In contrast, the region Tyr927-Gly945 was classified as flexible and disordered in every analysis performed, in agreement with the 15N{1H} NOE evidence. Therefore, we simply excluded that segment from the structure refinement, consistent with the absence of any corresponding experimental restraints. The Tyr927-Gly945 loop is located mostly in the apical part of the trimer in the initial regularized model, prior to simulated annealing refinement. Because, both before and after refinement, we know precisely the location and orientation of the preceding strand E and of the following strand/molten strand F, we infer that the position of the loop ends should remain invariant in the final structure. Region Tyr927-Gly945 appears as a long flexible loop located by the apex of the molecule, possibly available for interactions with ligands/receptors of EMILIN1 gC1q.
The quaternary structure consists of three subunits that are mostly identical, apart from very small variations intrinsically arising during molecular dynamics simulations. The interface contacts between subunits are mainly hydrophobic and located near the base of the trimer, the top being still slightly open (~150 Å2) to form a cavity ~16 Å deep. These contacts involve the following strand and adjacent segment residues: strand A′ (Arg870-Leu873), strand A (Ala851-Ala855), strand H (Phe981-Gly988), strand C (Arg893-Ala898), and region F (Leu947-Leu956). Among the 30 residues that establish intersubunit interactions, only two are charged (Arg870 and Arg893) and located at opposite ends of an extended hydrophobic surface. The presence of hydrophobic contacts near the base of the trimer has been highlighted for other members of the C1q family and apparently is a common feature of these molecules (collagen X, ACRP30, and collagen VIII). In contrast, the occurrence of hydrophilic interactions at the apex of the trimeric adduct, seen in collagen X, ACRP30, and collagen VIII, cannot be confirmed in our structure, which is relatively loose in that area.
By comparing the association of the monomers in the initially regularized homology model and in the final refined structure, a definite increase of the size of the central cavity encompassed by the three subunits is observed, probably because of topology changes undergone by each monomer during the refinement to optimize the molecular packing. The buried surface upon trimerization calculated with a probe radius of 1.4 Å is 3,128 Å2, a value that indicates a weaker intermonomer interaction with respect to the tight associations that lead to burying at least 5,000 Å2 in ACRP30 (5,320 Å2) and complement C1q protein (5,490 Å2) and up to 7,360 Å2 in collagen X.
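To put the buried surface values quoted above on a common scale, the short sketch below expresses the EMILIN1 gC1q value as a fraction of each of the other trimers. The numbers are those given in the text; the dictionary and function names are illustrative only.

```python
# Buried surface upon trimerization (square angstroms), as quoted in the text.
BURIED_SURFACE_A2 = {
    "EMILIN1 gC1q": 3128,
    "ACRP30": 5320,
    "complement C1q": 5490,
    "collagen X": 7360,
}


def emilin1_fraction_of(reference: str) -> float:
    """EMILIN1 gC1q buried surface divided by that of the named reference trimer."""
    return BURIED_SURFACE_A2["EMILIN1 gC1q"] / BURIED_SURFACE_A2[reference]


for name in ("ACRP30", "complement C1q", "collagen X"):
    print(f"EMILIN1 gC1q buries {100 * emilin1_fraction_of(name):.0f}% of the {name} value")
```

On these figures the EMILIN1 gC1q trimer buries somewhat more than half of the surface buried by ACRP30 or complement C1q and less than half of that buried by collagen X.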
Structure-Function Relationship-The available solution structure for the EMILIN1 gC1q domain allows the identification of some key residues in the molecule that might be crucial for the native structure formation and function. In addition, based on the correlation between native structure and function, it is also possible to predict functional roles for specific amino acids. These hypotheses have been tested here by functional assay of recombinant mutants of the EMILIN1 gC1q domain with respect to the wild type protein. Fig. 3 reports some of the results from site-directed mutagenesis experiments (additional data are available in supplemental Table S3).
As expected, introduction of single amino acid mutations (Fig. 3) for any of the hydrophobic core residues of the monomeric domain hampered the formation of the native conformation and resulted in insoluble recombinant proteins confined in bacterial inclusion bodies. Mutations to Ala were introduced at Phe852, Phe886, Gly892, Tyr894, Leu956, Phe981, and Leu986, and the resulting proteins were analyzed by SDS-PAGE to detect the expressed products either in the supernatant or following extraction from the cellular pellet. Compared with wild type EMILIN1 gC1q that was soluble to a large extent, none of the mutated proteins appeared soluble (supplemental Fig. S4). Changes in the side chains of those conserved residues might greatly impair correct folding and hence the solubility of the EMILIN1 gC1q trimeric assembly. In addition, the region preceding and including the F strand, despite the loss of its β-structure, is functionally relevant for correct trimer formation, as indicated by the finding that the point mutation L956A greatly affects solubility of the protein, as does deletion of preceding segments. Intriguingly, the mutation F950A did not affect solubility but strongly modified trimer stability in 1% SDS because all of the protein was monomeric already at 0°C (Fig. 3B). The

Site-Although α4β1 integrin is known to mediate cell adhesion and migration on an EMILIN1 substrate via the gC1q domain (8,9), no EMILIN1 specific site recognized by this integrin has yet been identified. The finding of a unique structural characteristic, i.e. the presence of a flexible and disordered region in EMILIN1 compared with all the other resolved gC1q structures, prompted us to investigate the possible functional role of the Tyr927-Gly945 segment.
Thus a number of deletion and single point mutants were designed (Fig. 3A), expressed as recombinant fragments in a bacterial system, purified under native conditions, and monitored by SDS-PAGE (Fig. 3B). The two deletion mutants del(Leu932-Phe950) and del(Pro942-Phe950), which lack, respectively, part of the unstructured segment Tyr927-Gly945 and five of the residues joining the unstructured loop to the molten F region, were almost completely retained in bacterial inclusion bodies, highlighting the key role of the "molten" F strand for correct molecular folding. In contrast, the deletion mutant del(Tyr927-Gly945), lacking the entire unstructured loop, was fully soluble; the slight reduction in thermostability observed is most likely a consequence of the loss of conformational entropy (Fig. 3B). This result is consistent with the NMR analysis that indicated the presence of an unstructured loop spanning the region Tyr927-Gly945, which does not appear to be involved in monomer folding or trimer formation.
By comparing the sequence of this region with the known α4β1 binding sequences of fibronectin, vascular cell adhesion molecule 1, and thrombospondin 1 and 2 (37-39), we identified a potential EMILIN1 gC1q integrin recognition site corresponding to the amino acids Leu932-Glu933, i.e. the core of EMILIN1 gC1q fragment 931-934 (GLEN) matching an analogous fragment in vascular cell adhesion molecule 1 (KLEK) (Fig. 3D). These core residues belong to the stretch EXLE, which is not only shared between human EMILIN1 and EMILIN2 gC1q domains but is also highly conserved in the orthologs from several distant species (Fig. 3C). Thus, the L932A/E933A double mutant was designed together with the single mutant involving the acidic residue, i.e. E933A. To exclude the involvement of other acidic residues located nearby, a shorter deletion mutant del(Leu932-Gly945) and two more single mutants, E930A and E939A, were generated and analyzed. Finally, the importance of the residues in the region between the unstructured loop and the molten F strand was assessed by expression of the double mutant G945A/G948A. All of the designed single and double-point mutants proved soluble and had the same thermal stability as the wild type EMILIN1 gC1q, suggesting correct folding. An example of E933A and del(Tyr927-Gly945) solubility compared with wild type EMILIN1 gC1q is shown in Fig. 3B.
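The motif comparison described above can be sketched in a few lines of Python. The example locates the EXLE pattern (X being any residue) in a short fragment and reports the positions at which two aligned fragments share the same residue. The fragment EGLEN for residues 930-934 is inferred from the residue names and numbers given in the text (Glu930, Gly931, Leu932, Glu933, Asn934); the function names are hypothetical, and the ungapped, position-by-position comparison is a simplification.

```python
import re

EXLE = re.compile(r"E.LE")  # Glu, any residue, Leu, Glu


def find_motif(seq: str, first_residue: int, pattern: re.Pattern = EXLE) -> list[tuple[int, str]]:
    """Return (start position in full-length numbering, matched text) for each non-overlapping hit."""
    return [(m.start() + first_residue, m.group()) for m in pattern.finditer(seq)]


def shared_positions(a: str, b: str) -> str:
    """Mark identical residues between two equal-length, ungapped fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return "".join(x if x == y else "-" for x, y in zip(a, b))


print(find_motif("EGLEN", first_residue=930))
print(shared_positions("GLEN", "KLEK"))
```

The shared Leu-Glu core returned by the second call corresponds to the two residues targeted by the L932A/E933A double mutant.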
The different point-mutated proteins and the shortest deletion mutant del(Leu932-Gly945) were then assayed in standard cell adhesion assays. Jurkat cells, which express high levels of the α4β1 integrin, attached very efficiently in a dose-response assay to wild type and mutated EMILIN1 gC1q, with the exception of del(Leu932-Gly945), E933A, L932A/E933A, and wild type EMILIN1 gC1q denatured in 8 M urea (Fig. 4A). The cells did not adhere to the nonfunctional mutants even at high concentrations of plated ligand (up to 10 µg/ml). Thus, amino acid substitution mutants revealed that the conserved residue Glu933 is essential for binding of EMILIN1 gC1q to integrin α4β1, whereas single mutations of several residues close to Glu933 do not affect the interaction between the two molecules. When Glu933 was substituted with an aspartic acid, cell anchorage was still very efficient, especially at the lowest doses of plated ligand. Finally, to exclude the possibility that local spatial alterations caused by mutations could affect the cell binding functionality of the specific EMILIN1 gC1q mutant, the function blocking 1H2G8 monoclonal antibody and a rabbit polyclonal antibody against EMILIN1 gC1q were used in enzyme-linked immunosorbent assays (... Table S3).
Figure legend (fragment): ... (in A and B). C shows the amino acids forming the monomer hydrophobic core. The drawings were prepared using Open-Source PyMOL (DeLano Scientific LLC, South San Francisco, CA) and MOLMOL (45).
The Glu933 Residue Is Required for Haptotaxis and for Cell Adhesion under Flow Conditions-As a further demonstration of the relevance of residue Glu933, EA.hy926 human endothelial cells, which express lower levels of α4β1 integrin, were used in cell adhesion and haptotactic migration assays. The results demonstrated that mutation of the Glu933 residue fully impaired cell adhesion (Fig. 4, B and C) as well as cell haptotaxis (Fig. 4D).
Because static adhesion and haptotaxis experiments showed that cells are highly dependent on Glu933 for full functionality, we next mimicked cell flow in blood vessels to determine whether firm arrest of Jurkat T cells mediated by α4β1 was also related to residue Glu933. Under flow conditions at a shear rate of 20 s⁻¹, almost no Jurkat cells had arrested on E933A-gC1q-coated surfaces up to 4 min of flow. In marked contrast, up to 160 cells/mm² were firmly arrested on wt gC1q-coated surfaces (Fig. 5).

DISCUSSION
To match structural evidence and functional inference, we have described the structural basis for the interaction between integrin α4β1 and the globular C1q domain of EMILIN1, a protein with important roles in regulating blood pressure homeostasis (5), cell adhesion (8), and cell migration (9). The interaction restraints were dictated by the solution structure of the isolated EMILIN1 gC1q domain trimer, as determined by NMR spectroscopy. The quaternary structure of the EMILIN1 gC1q assembly identified, for each monomer, a unique unstructured loop, mostly located at the apex of the homotrimer, that mediates the interaction with α4β1. This loop is not present in any of the already determined gC1q structures. Sequence alignments suggest that a corresponding loop should occur in the analogous domain of the cognate human protein EMILIN2 (6) as well as in the paralogues from other organisms. Unique among α4β1 ligands, static adhesion and adhesion under flow of T cells, as well as haptotactic migration of endothelial cells, were fully dependent on a single acidic residue, Glu933.
Determining the structure of the EMILIN1 gC1q trimer, i.e. the monomeric subunit conformation and the homotrimeric arrangement of an adduct with a molecular mass of ~52 kDa, required the application of several different experimental and modeling approaches that led to the structure depicted in Fig. 2, which represents the refined geometry with the lowest deviation from the experimental restraints. The quaternary structure of the complex is formed by three subunits that are mostly identical, apart from very minor differences inherently associated with the molecular dynamics simulations. The original trimeric homology model, built by superimposing the backbone of three identical monomers onto the template backbone derived from the quaternary structure of the homologous protein ACRP30, did not satisfactorily match the experimental data. In contrast to all the gC1q structures solved to date, i.e. ACRP30 (10), collagen X (11), mouse collagen VIII (12), and human complement C1q (13), the EMILIN1 gC1q conformation, while conserving the typical jelly roll topology of two β-sheets made of antiparallel strands, is formed by a nine-stranded β-sandwich fold instead of a ten-stranded one. The segment Thr946-Leu956 of the EMILIN1 gC1q sequence, corresponding to most of the missing β-strand of the canonical motif, i.e. strand F, appears to deviate from the straight path of an extended conformation and break into a hairpin-like arrangement because of a reverse turn at residues Ser951 and Leu952. This geometry leads to a substantial increase in the size of the central cavity encompassed by the three subunits (Fig. 6). The loss of secondary structure for one of the two edge strands at the interface between each subunit pair of the EMILIN1 gC1q trimer (E and F) has already been observed in ACRP30 (10), with significant departures from a regular β-extended conformation at strand E of a single subunit within the trimer.
The edge strand melting, as originally referred to, could reflect stringency limits on the number of strands that are necessary to assemble the trimer. It has been recognized that a number of highly conserved hydrophobic residues, buried in the core of the gC1q monomer structure, are fundamental for correct folding and trimer formation. Several single-point mutation experiments confirm, also for EMILIN1 gC1q, the essential role of those residues (in particular, Phe852, Phe886, Gly892, Tyr894, Leu956, Phe981, and Leu986); the corresponding expressed recombinant proteins precipitated in inclusion bodies. Mutagenesis of residue Phe950, however, results in the successful isolation of a soluble F950A protein. This residue is located in the intersubunit cavity, precisely between the flexible and disordered region Tyr927-Gly945 and the diverging (βI-like) turn preceding the molten F strand remnant, i.e. fragment Ile953-Leu956. Therefore the hydrophobic Phe950 side chain, expected to establish interface interactions between monomers, does not appear essential for correct folding because its replacement does not affect EMILIN1 gC1q domain solubility, a prerequisite for successful assembly. However, residue Phe950 exerts a key role in trimer stability, as inferred from the inability to detect the trimeric form in the presence of 1% SDS even at 0°C (Fig. 3B) and the failure of specific recognition by monoclonal antibody 1H2G8 and a rabbit polyclonal antibody.
The complete loss of the cell adhesive function displayed by the EMILIN1 gC1q mutant devoid of segment Leu932-Gly945 indicates that the integrin binding capacity resides entirely in that region, a statistically disordered loop according to NMR evidence. Solubility and thermostability of this deletion mutant, and of the del(Tyr927-Gly945) mutant, are analogous to those of the full-length EMILIN1 gC1q protein, indicating that region Tyr927-Gly945 is not involved in the folding and assembly of the domain, a conclusion further reinforced by a monoclonal antibody recognition assay. The limited decrease in melting temperature observed with the deletion mutants could reflect destabilization caused by conformational entropy loss, which would further confirm the NMR-based structural picture of a flexible loop. The mutagenesis experiments clearly demonstrate that the interaction between α4β1 and EMILIN1 gC1q is highly specific, because the del(Leu932-Gly945) mutation dramatically impairs cell binding. The conformational characteristics of fragment Tyr927-Gly945 provide a further example of the functional relevance of disordered regions in proteins (40). The structures of several polypeptides with integrin-binding capability have been determined; all of them display exposed aspartic or glutamic acid residues that are critical for integrin recognition and are generally located in mobile loops protruding from the main core of the whole ligand (41,42). Although the aspartic acid-based sequences (e.g. RGD, LDV, KGD, RTD, and KQAGD) bind to the majority of integrins (43), other integrins, including all β2 integrins, prefer ligands containing glutamic acid-based motifs (44). Thus, we focused our attention on the three Glu residues occurring in the unstructured segment Tyr927-Gly945: Glu930, Glu933, and Glu939. Notably, Glu930 and Glu933 are completely conserved among all known EMILIN1 and EMILIN2 sequences, from fishes to mammals (Fig. 3C).
The E933A mutation fully inhibited Jurkat T cell and EA.hy926 endothelial cell adhesion and haptotactic migration. This loss of activity was not due to a global folding change of the protein, as shown by biochemical and immunorecognition evidence. Therefore, Glu933 appears to play a crucial role in α4β1 integrin-mediated interaction with the EMILIN1 gC1q domain.
Figure legend (fragment): The cavity lumen is shadowed, with an upper entrance that is outlined by the ideal triangle identified by residues Gly926 (red, ~19 Å apart) of each subunit. The drawing was prepared using Open-Source PyMOL (DeLano Scientific LLC).
Furthermore, the present study suggests a potential structural basis for mediating firm arrest and attachment to subendothelial EMILIN1 gC1q. The shear flow rates used in our experiments likely mimic physiological inflammatory conditions in microvessels, when microangectasias induced by inflammation reduce blood flow to these rates. T cell trafficking from blood into peripheral tissues occurs through a sequential process that involves tethering, rolling, firm adhesion, and transendothelial migration. The α4β1 integrin has been implicated in all of these phases of trafficking. When inflammation occurs, EMILIN1 gC1q might be exposed at sites of endothelial cells in postcapillary venules, and activated T lymphocytes, expressing high affinity integrins, might be capable of tethering and arrest. This is the first report of an integrin-binding site located on a homotrimeric assembly that could employ a different mode of integrin engagement.
Because the NMR structure locates residue Glu933 in a flexible loop structure at the apex of the EMILIN1 gC1q domain assembled into a symmetric trimer, once the stoichiometry of three integrin molecules for a single gC1q trimer is ruled out for steric reasons, the interaction pattern with α4β1 might well entail binding of the three Glu933 carboxylate groups to integrin via metal ion(s). This model for the interaction geometry pattern presents a testable hypothesis to characterize the functional modulation of the recognition event for this unique integrin-binding site located on a homotrimeric assembly of a functional ligand.