Solution Structure of Factor I-like Modules from Complement C7 Reveals a Pair of Follistatin Domains in Compact Pseudosymmetric Arrangement*

Factor I-like modules (FIMs) of complement proteins C6, C7, and factor I participate in protein-protein interactions critical to the progress of a complement-mediated immune response to infections and other trauma. For instance, the carboxyl-terminal FIM pair of C7 (C7-FIMs) binds to the C345C domain of C5 and its activated product, C5b, during self-assembly of the cytolytic membrane-attack complex. FIMs share sequence similarity with follistatin domains (FDs) of known three-dimensional structure, suggesting that FIM structures could be reliably modeled. However, conflicting disulfide maps, inconsistent orientations of subdomains within FDs, and the presence of binding partners in all FD structures led us to determine the three-dimensional structure of C7-FIMs by NMR spectroscopy. The solution structure reveals that each FIM within C7 contains a small amino-terminal FOLN subdomain connected to a larger carboxyl-terminal KAZAL domain. The open arrangement of the subdomains within FIMs resembles that of first FDs within structures of tandem FDs but differs from the more compact subdomain arrangement of second or third FDs. Unexpectedly, the two C7-FIMs pack closely together with an approximate 2-fold rotational symmetry that is rarely seen in module pairs and has not been observed in FD-containing proteins. Interfaces between subdomains and between modules include numerous hydrophobic and electrostatic contributions, suggesting that this is a physiologically relevant conformation that persists in the context of the parent protein. Similar interfaces were predicted in a homology-based model of the C6-FIM pair. The C7-FIM structures also facilitated construction of a model of the single FIM of factor I.

The membrane attack complex (MAC) is the terminal product of the complement cascade and is therefore a fundamental component of mammalian innate immunity. The formation of this multi-protein complex is triggered by proteolytic cleavage of complement component C5. This is followed swiftly by a remarkable, although little understood, self-assembly process involving multiple sequential protein-protein recognition events. MAC assembly culminates in the formation of a pore traversing the targeted cell membrane (1). Accumulation of multiple MACs in a membrane triggers cell-dependent responses and may result in cell lysis (2). The key to progress in understanding MAC formation will be three-dimensional structural information for each of its component proteins, namely C5b, C6, C7, C8, and C9.
Classical, alternative, and lectin pathways of complement activation converge at a step in which C5 is cleaved to release activated C5b. Immediately following C5b formation, C6 and C7 bind sequentially; the C5b6 complex is soluble and relatively stable (3), but soluble C5b67 has a brief half-life and is proposed to attach rapidly to target membrane surfaces (4,5). Subsequently, C8 binds to the nascent complex, inserting into the target membrane and causing disruptive rearrangements of the lipid bilayer. Finally, the mature MAC, C5b6789n, forms by recruitment of between 10 and 16 copies of C9 that insert in the membrane to form the pore. Notably, once C5b is generated, MAC assembly requires no additional enzymatic triggers; this implies that individual components encompass highly specific, complementary binding sites that become exposed during MAC formation.
Complement proteins C6, C7, C8 (α and β subunits), and C9 comprise the "MAC family" (Fig. 1a) (6). Family members share, in addition to a large central membrane attack complex perforin domain (7-9), several tandemly arranged, cysteine-rich modules of fewer than 80 amino acid residues each. These smaller modules include thrombospondin type I (10), low density lipoprotein receptor class A (11), and modules similar in sequence to epidermal growth factor (Fig. 1a). C6 and C7 each contain an additional four modules at their carboxyl termini: two ~60-residue complement control protein modules (12,13), followed by two cysteine-rich modules composed of ~75 residues each; these are the factor I-like modules (FIMs) (also known as factor I membrane attack complex domains (14,15)), so named because of their apparent relatedness to an amino-terminal domain of complement factor I (fI) (Fig. 1b).
Latent C5 was shown, in vitro, to bind reversibly to both C6 and C7 prior to activation. These interactions are distinct from and precede irreversible binding of C6 and subsequently C7 to C5b (18). It is hypothesized that the C56 and C57 preactivation complexes ensure that C6 and C7 are maintained proximal to C5 in the plasma. This may be significant because activated C5b is labile (19,20), hence swift assembly of C5b67 is advantageous. Within this preactivation complex, critical interactions occur between the carboxyl-terminal C345C domain of C5, C5-C345C (21), and the carboxyl-terminal FIM pair of both C6 and C7 (22,23). The involvement of these domains in MAC formation was demonstrated using recombinant proteins, where either C7-FIMs or C5-C345C inhibited the binding of C7 to C5b6 and inhibited complement-mediated erythrocyte lysis (23). The FIMs of C6, however, although shown to promote MAC assembly, do not appear to be essential for MAC formation (22). C7-FIMs have a stronger affinity than C6-FIMs for C5-C345C, suggesting that C7-FIMs may displace C6-FIMs during MAC assembly (23). Thus, interactions between C5-C345C and FIMs are key to the early assembly of MAC, and their structural basis is an important target of investigations.
The structure of the C5-C345C domain is well established (24,25); however, there has been no three-dimensional structural information available for any of the FIMs or for any other domains within C6 or C7. The closely related FIM within fI has been postulated to resemble a follistatin domain (26). Intriguingly, however, disulfide mapping of human C6 isolated from plasma appeared to exclude that possibility (27). The three-dimensional arrangement of the neighboring FIMs, and the extent of interactions between them, have also been a mystery.
We previously described a protein construct comprising the carboxyl-terminal pair of FIMs from human C7 (18), which folds homogeneously and binds to C5 in surface plasmon resonance assays. Here we report the solution structure of this consecutive pair of FIMs. This new structure reveals that, despite previous evidence to the contrary, each FIM adopts a follistatin-like fold, and the two FIMs are intimately associated to form a homodimer-like, pseudosymmetrical carboxyl terminus of C7. This work, therefore, serendipitously provides the first published structure of a follistatin-domain pair in the absence of ligand and suggests that conformational changes within FIM pairs accompany ligand binding. Novel structures of the FIMs from both C6 and fI have been modeled based upon our NMR-derived solution structure of the C7-FIMs.

EXPERIMENTAL PROCEDURES
C7-FIMs, encompassing residues Asn693 to Gln843 of C7 with an amino-terminal His6 tag, was cloned previously (18), and the recombinant protein was expressed in the Origami B strain of Escherichia coli as previously described (18,28). The His6-tagged protein was captured using a Co2+ affinity column; the His6 tag was then cleaved with thrombin, and noncleaved material and the cleaved His6 tag were removed via a re-pass down the Co2+ affinity column. The cleaved protein was purified by reverse phase high performance liquid chromatography (Supelco Discovery BIO Wide Pore C8 column; Supelco Inc., PA), eluting with an acetonitrile gradient in the presence of 0.1% (v/v) trifluoroacetic acid, before lyophilization and resuspension into NMR buffer. Characterization of the purified protein was carried out using polyacrylamide gel electrophoresis and electrospray ionization-Fourier transform ion cyclotron resonance mass spectrometry.

FIGURE 1. a, the MAC family of proteins aligned, domain-wise, with C6. b, the domain structure of fI. The heavy chain contains the amino-terminal domains and the light chain comprises a serine protease domain. An intramolecular disulfide bond between light and heavy chain (Cys309-Cys435) and a proposed interdomain disulfide between the amino-terminal region and first low density lipoprotein domain (Cys15-Cys237) are shown as diagonal lines. The domains were defined using the SMART data base (16,17). TSP, thrombospondin type 1; LDL, low density lipoprotein receptor type A; MACPF, membrane attack complex perforin domain; EGF, epidermal growth factor; CCP, complement control protein; FIM, factor I-like module; CD5, CD5-like; SP, serine protease domain.
NMR sample conditions were optimized by analysis of two-dimensional 15N,1H HSQC spectra at a range of protein concentrations (0.1-1.0 mM), salt concentrations (0-150 mM), buffers (Arg-Glu (29) and phosphate), temperatures (10-60°C), and pH values (3.0-7.0). Final optimal sample conditions were a protein concentration of 300 μM in 20 mM potassium phosphate, pH 6.5, 25°C, with 10% (v/v) 2H2O. A standard suite of NMR experiments was implemented to obtain nearly complete resonance assignment (28). The spectra were processed using the Azara suite of programs (provided by Wayne Boucher and the Department of Biochemistry, University of Cambridge, Cambridge, UK), and resonance assignment was carried out using the Analysis package (30). Assignment of the NOE spectroscopy data was carried out using a combination of 10% manual peak assignment and 90% automated assignment within the structure calculation software CYANA 2.1 (31,32).
All of the proline residues were defined as trans on the basis of chemical shifts and NOEs between Hα(Pro n−1) and Hδ(Pro n). All 18 cysteines were inferred to be disulfide-bonded on the basis of accurate mass spectrometry. Therefore, although no specific disulfide linkages were initially incorporated, all of the cysteine residues were defined as in the oxidized state in the structure calculations. The incidence of hydrogen bonds between backbone amides and carboxyl groups was determined on the basis of amide exchange retardation as observed in 15N,1H HSQC spectra collected 1 h after the protein was transferred to 99.9% (v/v) deuterated buffer. The hydrogen bond acceptor of the protected amide was then inferred from the corresponding network of NOEs. The CYANA calculation comprised a seven-cycle routine using combined automated NOE assignment and structure determination. The upper limit distance constraints generated were transferred into the crystallography and NMR system (33,34) using the program Format Converter within the CCPN suite (30) to include ambiguous disulfide restraints and to perform a structure refinement using explicit water. Structure calculations proceeded iteratively with inclusion of hydrogen bonds, ambiguous disulfide bridges, dihedral restraints generated using TALOS (35), and, finally, unambiguous disulfide bridges. An ensemble of 25 structures was calculated based on the converged set from 100 structures calculated. In addition, a further set of structures was calculated without disulfide bonds Cys773-Cys782 and Cys776-Cys789 explicitly constrained; this yielded 23 converged structures (from 100) of differing disulfide linkages in this region, although, as expected, the overall structure was not affected. All 48 converged structures were deposited in the Protein Data Bank (36) under code 2WCY. NMR relaxation data were assessed from 15N T1 and T2 (37,38) and the 1H,15N HSQC heteronuclear NOE experiment (38).
The quality of the data and structures was analyzed using Whatif (39), and the Ramachandran statistics were checked using PROCHECK (40,41).
The three-dimensional structure of C6-FIMs was modeled using, as a template, the structure of C7-FIMs closest to that of the mean from the ensemble generated in structure calculations. The single FIM in fI (fI-FIM) was modeled using the carboxyl-terminal FIM (C7-FIM2) from the closest-to-mean structure of C7-FIMs as the template. C6-FIMs and fI-FIM share 32 and 25% sequence identity with their templates, respectively. The optimal alignment between the target and template sequences was achieved using an initial multiple sequence alignment of homologous protein sequences from a range of differing genera employing the program PROMALS3D (54). Related sequences for C6-FIMs were first identified via the NPS@ server (55) with a BLAST (56) search against the UniProt data base (57). Because the FIM from fI exhibits a lower degree of sequence similarity to its template sequence, in this case a remote hidden Markov model search methodology (58) was employed to identify related sequences prior to alignment (data not shown). The resulting target-template alignments were manually refined from the multiple sequence alignment to place gaps optimally, guided by the positioning of predicted and identified secondary structure elements. Twenty models were built for each protein using the program Modeller 9v6 (59), and the one with the lowest objective function score was selected as the representative model. The quality of the models was assessed using PROCHECK v3.5.4 (40,41) and ProQ (60).
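The two bookkeeping steps in this modeling protocol, computing percent identity from a target-template alignment and choosing the model with the lowest objective function score, can be sketched as follows. This is an illustrative sketch only: the sequences and scores are invented, the function names are hypothetical, and identity is computed here over columns where neither sequence has a gap, which is one of several conventions and not necessarily the one behind the 32 and 25% figures quoted above.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def pick_representative(scores: dict) -> str:
    """Return the name of the model with the lowest objective function score."""
    return min(scores, key=scores.get)


# Toy alignment (hypothetical sequences, not taken from C6, C7, or fI).
target = "CKAG-WTCE"
template = "CRAGDWSCE"
identity = percent_identity(target, template)

# Hypothetical scores for three of twenty models.
model_scores = {"model_01": 1523.4, "model_02": 1498.7, "model_03": 1610.2}
best = pick_representative(model_scores)
```

In the toy alignment, eight columns are gap-free and six of them match, so the identity evaluates to 75%.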

RESULTS
Protein Expression-The recombinant protein fragment C7-FIMs, encompassing residues Asn693 to Gln843 of human C7 with an amino-terminal His6 tag (and where Gln843 is the carboxyl-terminal residue of the complete C7 protein), was expressed as a soluble, folded protein in the Origami B strain of E. coli. The affinity tag was removed by thrombin, leaving four non-native residues (Gly-Ser-His-Met) at the new amino terminus. Protein expression levels in rich or isotopically enriched media were typically 4 mg liter−1, yielding sufficient high quality protein to enable a solution state structure determination of C7-FIMs by NMR spectroscopy. Protein purity and the presence of nine disulfide bonds were confirmed by mass spectrometry (28).
FIGURE 4 (caption fragment). [...] (left panel) are exposed, with hydrophobic surfaces in olive, acidic in red, and basic in blue (calculated using MolSurfer) (43,44) with key residues labeled. c, the electrostatic surface map of C7-FIMs (Thr696-Gln843) (calculated using APBS), showing key residues contributing to the highly basic surface of FIM1 and the highly acidic surface of FIM2. d, the lipophilic surface of C7-FIMs (calculated using SYBYL6.9) reveals an area of hydrophobicity on the surface of FIM1, with contributing residues labeled. e, mapping of surface-exposed residues that are conserved (yellow) and strictly conserved (orange) within the C7 family shows that Lys707, Met717, Tyr719, and Glu720 are clustered in a region on FIM1 that may be a potential binding site.

NMR-derived Structure of C7-FIMs-The 15N- and 15N,13C-labeled samples of C7-FIMs yielded high quality NMR spectra, thus permitting assignment of 90% of 1H, 15N, and 13C nuclei (28). The missing backbone assignments arise from the four non-native amino-terminal residues and three stretches of residues within FIM2 (Cys773-Gly774, Cys776-Trp779, and Asp783-Lys788). There was an absence of observable backbone resonances corresponding to these regions despite exploration of a variety of protein concentrations, ionic strengths, temperatures, pH values, and field strengths. All eight proline residues were judged to be trans on the basis of chemical shift differences δCβ − δCγ and NOEs between Hδ(Pro n) and Hα(Pro n−1). Initial NMR structures were determined using 2702 NOE-derived restraints. A total of 154 φ and ψ dihedral angle restraints were deduced using TALOS (35) and incorporated into subsequent calculations. Early structure calculations containing only NOE data confirmed 28 hydrogen bonds that had been inferred by deuterium exchange; these were incorporated into the next round of calculations. Seven of the nine disulfide bonds inferred from mass spectrometry were clearly identified by supporting NOEs (often Cβ-Cβ) and by the proximity of pairs of cysteines in these initially calculated [...] (62). The ensemble of 25 structures with lowest NOE-derived energies converged well overall, as may be judged from backbone overlays and from the root mean square deviations (RMSDs) of the Cα coordinates and the heavy atoms (Fig. 2, a and b, and Table 1). The initial residues of FIM2 (Cys773, Gly774, Pro777, and Leu778), which lie adjacent to the intermodular linker, form a solvent-exposed loop that did not converge during structure calculations because of the low number of NOEs observed in this region; this is reflected in the high RMSDs of the relevant Cα coordinates (Fig. 2b). Relaxation experiments (T1, T2, and 1H,15N NOE) identified residues Ala767-Ala772 as undergoing motion faster than the overall molecular tumbling (Fig. 2b). These residues are located in the linker region connecting the two modules, indicating a significant amount of flexibility here; this is reflected by the lack of convergence observed in this region, depicted in the overlay (Fig. 2a). The surface-exposed loop within the β-hairpin (see below) of FIM2, Asp783-Ser787, is also likely to be very flexible; again there is a dearth of detectable NMR signals for these residues, indicative of conformational movement on the intermediate timescale (milliseconds). However, because of the lack of observed resonances, this dynamic behavior could not be confirmed by NMR relaxation data. The relaxation data also identified two solvent-exposed residues within FIM1 and FIM2 that may be considered mobile: Asn754 within FIM1 and Arg792 within FIM2.
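Flagging fast-moving residues from heteronuclear NOE values amounts to a simple threshold test, which can be sketched as below. The 0.6 cutoff is the threshold quoted for this data set; everything else is an assumption made for illustration, including the function name, the residue numbering, and the NOE values, none of which are measured data.

```python
HETNOE_CUTOFF = 0.6  # threshold quoted in the text; all values below are invented


def flag_mobile(hetnoe: dict, cutoff: float = HETNOE_CUTOFF) -> list:
    """Return the residue numbers whose heteronuclear NOE is below the cutoff, sorted."""
    return sorted(res for res, value in hetnoe.items() if value < cutoff)


# Hypothetical residue numbers and NOE values, for illustration only.
example_hetnoe = {10: 0.78, 11: 0.41, 12: 0.35, 13: 0.22, 14: 0.48, 15: 0.74}
mobile = flag_mobile(example_hetnoe)
```

Residues without an observable resonance simply do not appear in such a table, which is why loops that give no detectable signal cannot be classified this way.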
Although the relaxation data indicate flexibility of the linker between the two modules, i.e. residues Ala767-Ala772, heteronuclear NOE values for the residues within each module exceed the 0.6 threshold, showing that the modules are relatively immobile on the NMR timescale; this is consistent with their intimate mutual association and the relatively low overall RMSD for the ensemble. Extensive interaction between the two FIMs is supported by a plethora of intermodular NOEs (supplemental [...]).

Description of Structure of C7-FIMs-The NMR-derived solution structure of C7-FIMs reveals that FIM1 (residues 701-767) and FIM2 (residues 772-841) are highly similar, with an RMSD of 1.5 Å for the Cα coordinates of 57 structurally aligned residues using the program MultiProt (53), and that they are related by an axis of 2-fold rotational pseudosymmetry such that the molecule resembles a homodimer (Fig. 3a).
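The RMSD figure quoted for the two modules is, at its final step, a root mean square of distances between paired Cα atoms. A minimal sketch of that step is given below, under the assumption that the two coordinate sets have already been superposed and paired residue by residue; the structural alignment and superposition performed by MultiProt are not reproduced here, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes the two sets are already superposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å: every paired atom is displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(set_a, set_b)
```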
Each FIM possesses two distinct subdomains connected by a 3₁₀-helix [...] Within the sheet, B3 lies between strands B4 and B5, whereas H1 makes the right-handed crossover between B4 and B5. The carboxyl terminus of the module, which is also the carboxyl terminus of the full-length C7 in the case of FIM2, is a short extended region with no canonical secondary structure that is anchored within the carboxyl-terminal subdomain by a disulfide bridge (Cys728-Cys763 in FIM1 and Cys805-Cys838 in FIM2) to B3. Interestingly, the architecture of the carboxyl-terminal subdomain reveals the same topology as the KAZAL domain (66), a common serine protease inhibitor motif. The two solvent-exposed residues identified as mobile by the relaxation data occur between structured regions: Asn754 occurs in the surface-exposed loop between helix H1 and strand B5 of FIM1; and Arg792 is partially solvent-accessible (20% exposed) and is situated at the interface between the FOLN and KAZAL subdomains of FIM2.
The apparent dyad axis of pseudosymmetry passes between the carboxyl termini of B3 of FIM1 (i.e. B3 FIM1) and B3 FIM2, the amino termini of H1 FIM1 and H1 FIM2, and the linkers between subdomains (Fig. 3a). From a perspective orthogonal to this axis, the KAZAL domains appear as "wings" open at an angle of 60° attached to a "body" composed from the FOLN domains and the linkers between the subdomains (Fig. 3a). The intimate interface between the modules is supported by both electrostatic and hydrophobic interactions (Fig. 4, a and b), including module-bridging hydrogen bonds and three salt bridges (Arg704-Glu800, Lys716-Glu800, and Asp726-Arg824) (Fig. 4a), as well as aromatic-to-cysteine and aromatic-to-aromatic interactions (Fig. 4b).
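One way to put a number on an approximate 2-fold axis is to take the rotation matrix that superposes one module onto the other and read off its rotation angle, which is 180° for an exact dyad. The sketch below assumes such a 3x3 rotation matrix is already available from a superposition program; the function names, the example matrices, and the 15° tolerance are illustrative assumptions and are not taken from the analysis reported here.

```python
import math


def rotation_angle_degrees(matrix):
    """Rotation angle of a 3x3 rotation matrix, from trace = 1 + 2*cos(theta)."""
    trace = matrix[0][0] + matrix[1][1] + matrix[2][2]
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def is_pseudo_twofold(matrix, tolerance_deg=15.0):
    """True if the rotation is within tolerance_deg of 180 degrees (assumed cutoff)."""
    return abs(rotation_angle_degrees(matrix) - 180.0) <= tolerance_deg


# Exact 2-fold rotation about the z axis.
exact_twofold = [[-1.0, 0.0, 0.0],
                 [0.0, -1.0, 0.0],
                 [0.0, 0.0, 1.0]]

# Hypothetical near-dyad: a 170 degree rotation about the z axis.
c, s = math.cos(math.radians(170.0)), math.sin(math.radians(170.0))
near_twofold = [[c, -s, 0.0],
                [s, c, 0.0],
                [0.0, 0.0, 1.0]]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The angle alone says nothing about the screw translation along the axis, so a full symmetry analysis would also need to examine the translational component.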
The key hydrophobic residues at the interface are Trp705, Pro723, Leu725, Val742, and Leu757 of FIM1 and Leu778, Phe802, Ile804, Val832, Ile835, and Pro837 of FIM2. In particular, the aromatic ring of Phe802 interacts with that of Trp705 and is also involved in a π-interaction with the sulfur of Cys743. The key residues involved in electrostatic interactions at the interface are Arg704, Lys716, His746, and Arg760 from FIM1 and Glu800, Glu799, Glu817, and Arg824 from FIM2.
Within each FIM, the β-hairpin of the FOLN subdomain projects away from H1 in the KAZAL subdomain at an angle of about 70° (Fig. 3b). Confidence in this orientation of the FOLN subdomain with respect to that of KAZAL is high because of extensive interactions between subdomains that are supported by NOEs between B1 in each FOLN domain and H1 in each KAZAL domain (supplemental Fig. S2). In particular, hydrophobic interactions observed between a highly conserved tryptophan in B1 (Trp705 in FIM1 and Trp779 in FIM2) and a side chain in the middle of H1 (Val747 in FIM1 and Leu823 in FIM2) are corroborated by a network of NOEs between Trp705 in B1 FIM1 and residues in H1 FIM1 or between Leu778 and Trp779 in B1 FIM2 and residues in H1 FIM2.
The surface potentials of the two individual FIMs from C7 clearly show complementarity in charge with a basic surface on FIM1 and an acidic one on FIM2, both within the interface and on the exposed surfaces. The surface-exposed residues that account for the basicity of FIM1 are Lys701, Arg704, Lys707, Arg712, Arg733, Lys744, and Arg753; in contrast, the acidic nature of the surface of FIM2 is due to several surface-exposed glutamate side chains, namely Glu785, Glu798, Glu799, Glu800, Glu807, Glu812, and Glu841 (Fig. 4c). The clustering of some of these charged residues leads to a cleft of negative charge created by Glu798, Glu799, and Glu800. The electropositive surface of FIM1 is interrupted by a patch of hydrophobicity involving residues Met717, Tyr719, Leu748, and Tyr757 that is exposed in the FIM pair (Fig. 4d). Moreover, we have identified a cluster of residues exposed on the surface of FIM1 that are strictly conserved throughout the C7 family, namely Lys707, Met717, Tyr719, and Glu720 (Fig. 4e and supplemental Fig. S3).

FIGURE 5. C6-FIMs homology model. a, cartoon representation of the model of C6-FIMs obtained by homology modeling based upon the solution structure of C7-FIMs, with FIM1 in red, FIM2 in blue, and the side chains of conserved residues as sticks. b, sequence alignment of C6-FIMs and C7-FIMs; shaded residues are conserved with colored residues highlighting those found at the interdomain interface. c, the buried intermodular interfaces are exposed, with hydrophobic surfaces in olive, acidic in red, and basic in blue (calculated using MolSurfer) (43,44) with key residues labeled. d, the electrostatic surface map of C6-FIMs (calculated using APBS) showing key residues contributing to the highly acidic surface of FIM1 and the highly basic surface of FIM2. e, the lipophilic surface of C6-FIMs (calculated using SYBYL6.9) reveals an area of hydrophobicity leading into a hydrophobic cleft between FIM1 and FIM2; key residues are indicated. f, surface exposed conserved (yellow) and strictly conserved (orange) residues within the C6 family.

JULY 17, 2009 • VOLUME 284 • NUMBER 29 • JOURNAL OF BIOLOGICAL CHEMISTRY
Models of C6-FIMs and fI-FIM Built by Homology with C7-FIMs-The closest-to-mean structure of C7-FIMs from the ensemble of converged NMR-derived structures was used as a template for homology modeling of C6-FIMs; C6 and C7 share 32% sequence identity in the FIMs region. This exercise reveals that the two FIMs of C6 can adopt the same pseudosymmetric closed arrangement that is seen in C7, with five β strands and one α helix arranged in the same FOLN-KAZAL architecture as C7-FIMs. The 3₁₀-helix present in C7-FIMs is well defined only in C6-FIM2 (Pro885-Gln887) (Fig. 5a). The C6-FIMs structure is sustained by extensive ionic and hydrophobic intermodular interactions (Fig. 5, b and c) not dissimilar to those observed in C7. For example, two salt bridges are conserved: Lys716-Glu800 in C7 corresponds to Asp792-Lys890 in C6, and Asp726-Arg824 in C7 corresponds to Glu798-Arg918 in C6; a third, nonconserved intermodular salt bridge is present in each structure, Arg704-Glu800 in C7 and Asp839-Arg918 in C6. Furthermore, the orientations of FOLN and KAZAL subdomains are likely to be very similar in C6 on the basis of conserved hydrophobic and ionic interactions at the interface between subdomains (Fig. 5c). In addition, all of the cysteines are conserved, as are several surface-exposed residues. A significant difference between C7-FIMs and C6-FIMs is the presence of an additional 15 amino acids in the linker region between FIM1 and FIM2 of C6 (residues 843-857, LEWGLERTRLSSNST), within which a stretch of three residues (Arg849-Arg851) was weakly predicted to form a β-strand by PSIPRED (67). This region in the model of C6 was deemed to be of "low confidence" and hence was excluded from further structural analysis.
Although C6-FIMs and C7-FIMs share a common architecture, the individual FIMs are opposite in overall charge; the FIM1 of C6 is electronegative (resulting from Glu783, Asp804, Asp806, Asp809, and Glu821), whereas that of C7 is electropositive, and C6-FIM2 is electropositive (because of Lys859, Lys890, Lys907, Arg922, and Lys923), whereas C7-FIM2 is electronegative (Fig. 5d). Some exposed residues on the surface of C6-FIMs are conserved within the C6 family, including Gly776 and Gly863, which are required for the turn preceding B1 in each FIM, and Glu790 and Glu925 (Fig. 5f and supplemental Fig. S3). A hydrophobic strip, involving residues Phe818, Leu819, Phe831, Leu832, Leu927, and Leu933, is predicted to form on the surface at the FIM1-FIM2 junction (Fig. 5e); however, this might actually be buried by the C6 insertion in the linker region, which could not be modeled.
There is less similarity between the sequences of C7-FIMs and fI-FIM; therefore the approach used to generate a model of the structure of fI-FIM (Fig. 6a) necessarily differed from that used to create the model of C6-FIMs. Initial fold recognition searches (68) for the fI-FIM predict that it adopts a follistatin-like fold. Indeed, the fI-FIM has previously been modeled based upon a FD template from follistatin (Protein Data Bank code 1LR7 (65)) of 23% sequence identity (69). The higher sequence similarity between the fI-FIM and the newly solved C7-FIMs (34 and 25% identity) (supplemental Fig. S4) now affords the opportunity to build a more reliable model. The comparison of this newly created fI-FIM model with that reported previously can be found in supplemental Table S1.
Even though the fI-FIM is more similar to C7-FIM1 (34% identity) than C7-FIM2 (25%), upon constructing and evaluating numerous models based upon each or both templates together, the use of C7-FIM2 as the template was deemed more appropriate in light of significantly better evaluation statistics for the resulting models (comparison data not shown). This may be, in part, due to the conservation of an additional disulfide between fI-FIM and C7-FIM2, which is absent in C7-FIM1. Based upon the conservation of residues between subdomains, it is likely that, within fI-FIM, the FOLN is arranged in a similar manner with respect to the KAZAL. From structure validity checks (70) on this model, the final β-strand, B5, was identified as the area with lowest confidence; indeed, this strand of fI is poorly predicted by PSIPRED. It is possible that the subsequent domain within the intact fI protein influences the structure of this region such that it cannot be modeled with confidence on the basis of the carboxyl-terminal C7-FIM2. Like C7-FIM1, fI-FIM has a highly basic surface (Fig. 6b). In addition, a prominent area of lipophilicity (from side chains of Leu63, Tyr65, Phe82, Leu91, and Leu94) surrounds a cavity in the surface (Fig. 6c). These residues are conserved in an alignment of fI orthologues (Fig. 6d and supplemental Fig. S3), suggesting that they comprise an important interaction site for fI.

DISCUSSION
The remarkably rapid and precise process of MAC self-assembly is central to the capacity of the complement system to counter invasion by aggressively infectious microorganisms while minimizing damage to host tissue. The efficiency of this process depends upon interactions of C6 and C7 with C5 both before and after its activation by C5 convertases. A useful first step toward a full molecular description of the early steps in MAC formation is to elucidate the three-dimensional structures of FIM pairs at the carboxyl termini of C6 and C7. These module pairs play critical roles in formation of both pre- and post-activation complexes of C5b, C6, and C7. Although the single FIM of fI was previously modeled on the basis of suspected homology with FDs (69), the C6-FIMs and, by inference, C7-FIMs were reported to have a disulfide bonding pattern inconsistent with membership of the follistatin family (27).
FIMs Are a Type of Follistatin Domain-Our new NMR-derived solution structure, however, shows unambiguously that the topology of each FIM from C7 closely resembles that of FDs in crystal structures (Fig. 7a). Moreover, this new structure has enabled modeling of a reliable structure of the C6-FIMs by homology, confirming that these are also highly similar to FDs. Both FIMs and FDs are composed from an amino-terminal FOLN subdomain (70) linked to a carboxyl-terminal KAZAL subdomain (66). Indeed, the KAZAL subdomain within each C7-FIM overlays well with KAZAL subdomains within each of the 16 known examples of FD structures (supplemental Table S2). The intact, individual C7-FIMs were subjected to a pair-wise comparison with each of the set of 16 available FDs, revealing that C7-FIMs are most similar to the FD from SPARC, a matricellular regulatory protein (Protein Data Bank code 1BMO (71) and 2V53 (72)), with a 12.9% sequence identity and RMSD of 3.05 Å over 62 residues for FIM1 and, for FIM2, a 20.6% sequence identity and RMSD of 2.83 Å over 63 residues (supplemental Table S2). Disulfide bonding patterns within C7-FIMs and FDs are identical (1-3, 2-4, 5-9, 6-8, and 7-10 for the 10 cysteines of each module), and there is similarity in positions of secondary structural elements (Fig. 7b). The disulfide bonding pattern, which was unambiguously determined in the case of FIM1 of C7 and was partially inferred in the case of FIM2, is different to that previously mapped for C6-FIMs (1-3, 2-9, 4-7, 5-10, and 6-8 for the 10 cysteines of each module) by multiple enzyme digests and Edman degradations of a C6 sample isolated from human plasma (27); conversely, the previously reported disulfide pattern is inconsistent with our structure of C6-FIMs modeled on its close homologue C7-FIMs. Indeed, the disulfide linkages previously proposed for C6 would collapse the structure of the FOLN and KAZAL motifs.
This discrepancy may reflect the complexity of the challenge to chemically elucidate the disulfide map of proteins with a high cysteine content, particularly when, as is the case here, several cysteines are in close proximity within the sequence.
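The difference between the two disulfide maps can be made explicit by treating each map as a set of unordered cysteine pairs and comparing the sets. The sketch below uses the two patterns exactly as quoted in the text; the function and variable names are hypothetical and chosen only for illustration.

```python
def parse_map(text):
    """Parse a string such as '1-3, 2-4' into a set of unordered cysteine-index pairs."""
    pairs = set()
    for item in text.split(","):
        first, second = item.strip().split("-")
        pairs.add(frozenset((int(first), int(second))))
    return pairs


def as_sorted_pairs(pairs):
    """Convert a set of frozenset pairs into a sorted list of (low, high) tuples."""
    return sorted(tuple(sorted(pair)) for pair in pairs)


# Patterns as quoted in the text (cysteines numbered 1-10 within each module).
FD_LIKE = parse_map("1-3, 2-4, 5-9, 6-8, 7-10")      # C7-FIMs and FDs
PREVIOUS_C6 = parse_map("1-3, 2-9, 4-7, 5-10, 6-8")  # earlier chemical map of C6 (27)

shared = as_sorted_pairs(FD_LIKE & PREVIOUS_C6)
only_fd_like = as_sorted_pairs(FD_LIKE - PREVIOUS_C6)
only_previous = as_sorted_pairs(PREVIOUS_C6 - FD_LIKE)

# Sanity check: each map pairs every one of the 10 cysteines exactly once.
for disulfide_map in (FD_LIKE, PREVIOUS_C6):
    used = sorted(index for pair in disulfide_map for index in pair)
    assert used == list(range(1, 11))
```

Only the 1-3 and 6-8 linkages are common to both maps; the remaining three pairs differ, involving cysteines 2, 4, 5, 7, 9, and 10.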
In addition to the conserved cysteines, several residues are broadly conserved across the FIMs and FDs in our structure-based alignment. These include several fully or partially exposed hydrophobic residues that potentially interact with other domains or proteins, namely Val714 in B2 FIM1 (Val790 in B2 FIM2), Leu757 in B5 FIM1 (Val832 in B5 FIM2), and Tyr755 (70% buried) in B5 FIM1 (Ile830 in B5 FIM2) (Fig. 7b). Another conserved FIM residue of potential functional significance lies in H1 (His749 in FIM1 and Arg824 in FIM2); the equivalent residue in the second FDs of follistatin (Arg192) and follistatin-like protein (Arg199) is required for the binding of activin and similar ligands (73,74). Although many structural features are conserved between FIMs and FDs, there are some significant differences. The FOLN subdomain within the FDs commonly contains a long loop of seven or more amino acid residues connecting the two β-strands that comprise the β-hairpin, whereas this loop is reduced to two and four residues in FIM1 and FIM2, respectively. In follistatin, this loop provides a binding site for heparin (65), and it is also implicated in ligand binding in other FDs (71).

FIGURE 6. Structure and features of the model of fI-FIM. a, cartoon representation of fI-FIM modeled on the FIM2 of C7 within the structure of the pair of C7-FIMs, in the same orientation as C7-FIM2 in Fig. 2a. b, the electrostatic map of fI-FIM (calculated using APBS) shows that the basic surface is extensive. c, residues contributing to a large area of hydrophobicity surrounding a cavity on one face of the module are indicated on the lipophilic surface of fI-FIM (calculated using MOLCAD). d, surface-exposed conserved (yellow) and strictly conserved (orange) residues within the fI family reveal a cluster of residues on the FOLN subdomain.
In addition, the C7-FIMs have an insertion between B3 and B4 within the KAZAL subdomain, effectively lengthening strand B3; in FIM1, this insertion also extends the interstrand loop by two basic amino acid residues. The region connecting FOLN and KAZAL subdomains also differs between FIMs and FDs. In FIMs, but not in FDs, this stretch of residues is characterized by STRIDE as a 3 10 -helix (H0) (45).
Analysis of all 16 FD crystal structures reveals two classes of alternative orientations of the FOLN subdomain relative to the KAZAL subdomain; the orientations are stabilized by differing interactions between the subdomains. In the more commonly occurring type I FDs, an elongated conformation results from close proximity of B1 and H1 (Fig. 7b). On the other hand, in type II FDs, a more compact conformation results from the proximity of the B1-B2 loop to the H1-B5 loop (Fig. 7c). It is notable that, in the four crystal structures of multiple FD proteins, the first FD in each chain is type I, whereas subsequent FDs are type II. Both FIMs of C7 (and of C6) may be classified as type I FDs. Likewise, the homology model of fI-FIM based on the C7-FIM is consistent with a type I FD, although its structure deviates from those of the C6- and C7-FIMs. It is possibly not a coincidence that fI possesses a serine protease domain (Fig. 1c), whereas the KAZAL subdomain of its FIM resembles a fold adopted by many serine protease inhibitors. Indeed, the orientation of the fI-FIM FOLN and KAZAL subdomains ensures that this potential serine protease inhibitory site remains accessible. Thus, the model supports previous suggestions that fI may be auto-regulatory (15).
The FIM Pair in Comparison with Multiple FD Structures-There is a dramatic difference between the closely packed, highly symmetrical arrangement of the pair of C7-FIMs observed in the current study and the organization of tandem FDs within previously determined multiple FD crystal structures. In all of the latter structures, adjacent FDs adopt an open, elongated arrangement with a large tilt relative to their neighbors (Fig. 7c). Moreover, none of the previous multiple FD structures exhibits a symmetrical arrangement of neighboring FDs. The buried surface area between the two modules in FIMs (16% of the total surface area) is roughly two to three times that of adjacent FDs (5-9%), and the length of the long axis of a FD pair is roughly double that of the equivalent axis of FIMs (supplemental Fig. S5). The intimate association of FIMs in our structure and their unusual homodimer-like appearance are evidence that this is not an artifact of the truncated construct but is a good representation of FIMs in the context of intact C7. To the best of our knowledge, such an orientation has not previously been observed for a tandem module pair.
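The buried-surface comparison above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes that the buried fraction is defined as the solvent-accessible surface lost on pairing divided by the summed surface of the two isolated modules, which may not be the exact definition used in this work. The function name and the surface areas passed to it are hypothetical and were chosen solely to reproduce a 16% value; only the percentages 16, 5, and 9 come from the text.

```python
def buried_fraction(sasa_a: float, sasa_b: float, sasa_pair: float) -> float:
    """Fraction of the summed isolated-module surface that is buried on pairing.

    Assumed definition (not taken from the paper):
        (SASA_A + SASA_B - SASA_pair) / (SASA_A + SASA_B)
    """
    total_isolated = sasa_a + sasa_b
    if total_isolated <= 0:
        raise ValueError("surface areas must be positive")
    return (total_isolated - sasa_pair) / total_isolated


# Hypothetical areas (in square angstroms) chosen only to give 16% burial.
fim_fraction = buried_fraction(3000.0, 3000.0, 5040.0)

# Percentages quoted in the text: 16% for the FIM pair, 5-9% for adjacent FDs.
ratio_vs_most_buried_fd = 0.16 / 0.09   # about 1.8
ratio_vs_least_buried_fd = 0.16 / 0.05  # about 3.2

print(f"FIM buried fraction: {fim_fraction:.2f}")
print(f"Ratio to FD range: {ratio_vs_most_buried_fd:.1f} to {ratio_vs_least_buried_fd:.1f}")
```

With the quoted percentages, the ratio spans about 1.8 to 3.2, in line with the "two to three times" stated in the text.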
It is noteworthy, however, that each of the reported crystal structures features tandem FDs bound to ligand, most commonly activin, whereas the C7-FIMs structure was solved in the apo-state. In each complex, ligand-binding sites are distributed over two or more FDs (73,74,75), and the interdomain arrangement is critical for complex formation; important binding sites would be buried if the FDs adopted the closed arrangement of the C7-FIM pair. Indeed, the possibility that the three FDs of follistatin form a far more compact shape in the absence of ligand was mooted previously (76), based on investigation of a follistatin splice variant. This variant contains additional residues at its carboxyl terminus that interact with (and occlude) a heparin-binding site on the amino terminus of its three consecutive FDs. It is important to consider, however, that the linkers between FDs may not be long enough for them to adopt the arrangement of FIMs observed in apo-C7. In C6 and C7 the inter-FIM linkers (defined as residues lying between the tenth cysteine of FIM1 and the first cysteine of FIM2) are 24 and 9 residues, respectively, whereas linkers between FDs range from two to six residues among published structures. It is also possible that the lack, until now, of any structural information for apo-forms of multiple FDs may be due to a flexible organization of FDs that makes them more difficult to crystallize.
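The linker definition given above (residues lying between the tenth cysteine of the first module and the first cysteine of the second) can be expressed as a simple count. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be a one-letter sequence that begins within FIM1 and contains no cysteines before it, and ten cysteines per module is taken as a parameter. The toy sequence is synthetic and is not the C7 sequence.

```python
def inter_module_linker_length(sequence: str, cysteines_per_module: int = 10) -> int:
    """Count residues between the last cysteine of module 1 and the first of module 2.

    Assumes the sequence starts within module 1 and that every cysteine
    up to index `cysteines_per_module` belongs to module 1.
    """
    cys_positions = [i for i, residue in enumerate(sequence) if residue == "C"]
    if len(cys_positions) <= cysteines_per_module:
        raise ValueError("sequence does not contain a cysteine from a second module")
    last_cys_module1 = cys_positions[cysteines_per_module - 1]
    first_cys_module2 = cys_positions[cysteines_per_module]
    # Residues strictly between the two cysteines.
    return first_cys_module2 - last_cys_module1 - 1


# Synthetic two-module sequence: ten cysteines, a nine-residue linker, ten more cysteines.
toy_pair = "AC" * 10 + "G" * 9 + "CA" * 10
toy_linker = inter_module_linker_length(toy_pair)
print(toy_linker)
```

Applied to real module-pair sequences, a count of this kind would be the basis for comparing the inter-FIM linkers with the shorter inter-FD linkers.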
Proposed Sites for Interaction between FIMs and C5-C345C-Previous work established that C7-FIMs, and to a lesser extent C6-FIMs, bind to the C345C domain of C5b (18,22), which appears to be flexibly attached to the rest of C5b according to crystallographic studies (25). Further studies showed that C7-FIM2 in isolation was not sufficient for binding, supporting the suggestion of a C345C contact site that is spread across both FIMs (24). The structure of C5-C345C suggested that the exposed face of its α-subdomain could be a site for interaction with C6- and C7-FIMs. This part of the C5-C345C structure exhibits electronegative and hydrophobic features not seen in the C3- or C4-C345C domains, neither of which binds to C6 or C7. Of particular interest is a patch of five negatively charged side chains next to exposed hydrophobic side chains near the carboxyl terminus of C5b. It is thus noteworthy that FIM1 exposes a predominantly electropositive face, including the conserved lysine at position 707, within the context of C7-FIMs that is interrupted by a patch of hydrophobic residues (Met 717 , Tyr 719 , Leu 748 , and Tyr 757 ), including some that are strictly conserved in C7 orthologues (Met 717 and Tyr 719 ). The complementary nature of these two regions, on C5-C345C and C7-FIMs, means they are obvious candidates for a future mutagenesis study.

[Figure legend fragment] ... (71), and 2V53 (72). Where more than one FD was present in the Protein Data Bank, each is consecutively numbered. d, the relative orientation of KAZAL and FOLN subdomains is highly conserved within a group comprised of single FDs and the first FD of a tandem repeat (cyan for FDs and red and blue for FIM1 and FIM2, respectively). We refer to these FDs as type I FDs. In type II FDs (yellow) the two subdomains are in a more compact conformation.
A further region of potential interest in the C7-FIMs structure is the "top" face as viewed from the perspective of Fig. 2a, bounded by the two symmetrically disposed FOLN subdomains that are tilted away from one another. It consists of a large concave surface across which runs a cleft that is lined with negative charges. This composite FIM1/FIM2 face is distal to the linkage with preceding modules within C7 and could well be exposed at the carboxyl-terminal extremity of the intact protein. Thus, it is well positioned for protein-protein interactions.
By analogy with the notion that FDs undergo a closed-to-open transition upon ligand binding, an alternative possibility presents itself, namely that the "closed" FIMs of C6 and C7 may open up during MAC self-assembly. In this respect, it is interesting that a conserved, function-critical arginine of second FDs is also conserved (Arg 824 ) in an alignment of C7-FIM2s and intriguingly forms a salt bridge from FIM2 to FIM1 of C7. Such a major conformational change would carry an enthalpic cost in terms of breaking the intermodular interface, but this could be compensated for by favorable interactions with another protein in the nascent complex. The resultant activation energy barrier would be consistent with a safety mechanism designed to avoid inappropriately triggering the irreversible and potentially harmful process of MAC assembly.