Functional Insights from the Structure of the Multifunctional C345C Domain of C5 of Complement*

The complement protein C5 initiates assembly of the membrane attack complex. This remarkable process results in lysis of target cells and is fundamental to mammalian defense against infection. The 150-amino acid residue domain at the C terminus of C5 (C5-C345C) is pivotal to C5 function. It interacts with enzymes that convert C5 to C5b, the first step in the assembly of the membrane attack complex; it also binds to the membrane attack complex components C6 and C7 with high affinity. Here a recombinant version of this C5-C345C domain is shown to adopt the oligosaccharide/oligonucleotide binding fold, with two helices packed against a five-stranded β-barrel. The structure is compared with those from the netrin-like module family that have a similar fold. Residues critical to the interaction with C5-convertase cluster on a mobile, hydrophobic inter-strand loop that protrudes from the open face of the β-barrel. The opposite, helix-dominated face of C5-C345C carries a pair of exposed hydrophobic side chains adjacent to a striking negatively charged patch, consistent with affinity for positively charged factor I modules in C6 and C7. Modeling of homologous domains from complement proteins C3 and C4, which do not participate in membrane attack complex assembly, suggests that this provisionally identified C6/C7-interacting face is indeed specific to C5.

Expression of the segment of C5 corresponding to its C345C domain (14) followed by analysis using CD and NMR confirmed that these amino acid residues fold to form a compact three-dimensional structure (15). Furthermore, C5-C345C, unlike the C345C domain of C3, is able to bind to both C6 and C7 in surface plasmon resonance (SPR)-based assays (14). In further work, C5-C345C was shown to inhibit recruitment of C7 by C5bC6 through an interaction between C5-C345C and the pair of factor I membrane attack complex (FIMAC) domains, also called factor I modules (FIMs), at the C terminus of C7 (16). Thus the C5-C345C domain provides at least part of the interacting surface between C5b and C7 in formation of the MAC. The C345C domain also harbors a region that interacts with the C5 convertase (17), although the cleavage site itself lies some 800 residues away toward the N terminus of the α chain of C5.
Although the fold of C5-C345C might be anticipated to resemble the fold of the NTR module from PCOLCE-1 (9), the sequence identity is low (see Fig. 1A) and the disulfide bonding patterns differ: 1-4, 2-5, and 3-6 in the PCOLCE-1 NTR module compared with 1-3, 2-6, and 4-5 in the C3 equivalent (and therefore by inference in the C5 example). The C5-C345C sequence is longer and contains fewer prolines (147 residues including three prolines) compared with the PCOLCE-1 NTR module sequence (119 residues including 11 prolines). An experimentally determined three-dimensional structure of C5-C345C would therefore represent an important advance in understanding the basis, at atomic resolution, for the early steps of MAC assembly.
Here we report the use of solution NMR to solve the structure of C5-C345C. We thus provide the first new structural information for the C3/C4/C5 family of proteins since the structures of the C3d and C4d fragments were solved (18,19) and, in the case of C5, since the anaphylatoxic C5a fragment structure was determined in 1989 (6). The new structure allows the construction of useful models of the C345C domains from C3 and C4. The positions within the structure of residues previously identified as being functionally critical and the location of surface patches likely to be involved in protein-protein interactions are now revealed.

EXPERIMENTAL PROCEDURES
Protein Preparation-pET15b vectors encoding the amino acid residues of C5 from Ala 1512 to the C-terminal residue Cys 1658 (both with and without the point mutation F1613A) were constructed as described previously (14). The isotopically enriched recombinant proteins were overexpressed in the Escherichia coli strain Origami (Novagen, Madison, WI) and purified as described previously (14). For NMR studies, 15 N- and 15 N, 13 C-labeled protein samples (0.5-1.0 mM) were prepared in buffer containing 20 mM sodium phosphate, 100 mM NaCl, 5 μM EDTA, 0.02% NaN 3 , pH 6.0, in 95% H 2 O, 5% D 2 O.
Binding Studies-Affinities of the recombinant wild-type and F1613A versions of C5-C345C for C6 and C7 were measured using SPR as described previously (14).
NMR Spectroscopy-NMR spectra were acquired on Bruker AVANCE 600- and 800-MHz and Varian INOVA 600- and 800-MHz spectrometers, using 5-mm triple resonance probes equipped with pulsed-field gradients. Spectra were processed using the AZARA package (provided by W. Boucher, University of Cambridge), using maximum entropy processing of F1 and F2 dimensions of the three-dimensional experiments, and resonance assignment was achieved using ANSIG as described previously (15). Distance restraints for the structure calculation were derived from the following three complementary NOE spectroscopy (NOESY) experiments: a 15 N-edited NOESY-HSQC and two 13 C-edited NOESY-HSQCs, one in H 2 O buffer and one in D 2 O buffer. All mixing times were 100 ms.
Slowly exchanging amide protons were identified by the detection of 26 NH resonances in a 15 N-HSQC spectrum recorded 1 month after exchanging a protein sample into D 2 O buffer. Hydrogen bond acceptors for most of these slowly exchanging protons were identified using the refined initial structures. Distance restraints corresponding to hydrogen bonds were only introduced following identification of the supporting characteristic NOEs.
15 N Relaxation Measurements-15 N T 1 and T 2 relaxation times were measured by the method of Kay (20). The pulse sequence was modified according to Grzesiek and Bax (21) to keep the water magnetization on the z axis during the T 1 period.  8, 111.7, and 126.5 ms were employed for T 2 measurements. The T 1 and T 2 relaxation times were calculated by nonlinear least squares fitting. In each case, the spectrum corresponding to one of the relaxation delay values was re-collected to allow an estimation of the experimental error of the measured peak intensities. For the 1 H-15 N HSQC heteronuclear NOE experiment (21), a saturation experiment and a reference experiment were recorded with a relaxation delay of 5 s, of which 3 s was used for 1 H saturation in the 1 H-saturated experiment.
Structure Calculation-Wherever possible, resonances in the NOESY spectra were assigned unambiguously. Otherwise a set of two or more assignment possibilities was assigned on the basis of chemical shifts using the Connect program within AZARA. Peak intensities were converted into four distance categories of 0-2.7, 0-3.3, 0-5.0, and 0-6.0 Å. A total of 3544 distance restraints were generated from the three NOESY-HSQCs, of which 2609 were unambiguous and nondegenerate.
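The conversion of peak intensities into distance categories can be illustrated with a minimal sketch. The four upper bounds are those stated above; the relative-intensity cutoffs and the function name are hypothetical choices made only for illustration, because the thresholds actually used are not given here.

```python
# Upper distance bounds (in angstroms) for the four categories named in the text.
UPPER_BOUNDS = (2.7, 3.3, 5.0, 6.0)

# Hypothetical relative-intensity cutoffs (fraction of the strongest peak).
# These are illustrative assumptions, not values taken from the paper.
CUTOFFS = (0.5, 0.2, 0.05)


def distance_category(intensity, max_intensity, cutoffs=CUTOFFS):
    """Return the (lower, upper) distance bounds assigned to one NOE peak."""
    if intensity <= 0 or max_intensity <= 0:
        raise ValueError("intensities must be positive")
    relative = intensity / max_intensity
    # Stronger peaks correspond to shorter distances, so test the tightest
    # category first and fall through to the loosest one.
    for cutoff, upper in zip(cutoffs, UPPER_BOUNDS):
        if relative >= cutoff:
            return (0.0, upper)
    return (0.0, UPPER_BOUNDS[-1])
```

Every category shares a lower bound of zero, so a weak peak never forces two protons apart; it only loosens the upper limit.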
The NOE-derived distance restraints were used as input for the structure calculations using CNS-Solve (22). At a later stage, additional distance restraints representing the 26 inferred hydrogen bonds were added to the restraints list. The simulated annealing protocol employed the PARALLHD force field, with the nonbonded energy function of PROLSQ (23) and included active swapping of pro-chiral centers. For the initial structure calculations, the six cysteines were defined as being in the oxidized state. In the absence of information from experimental disulfide mapping, however, no covalent linkages between sulfur atoms were initially defined in the molecular structure file in order to avoid bias. At a later stage, and based on the initial structure calculations, two disulfide bonds, Cys 1514 -Cys 1588 and Cys 1535 -Cys 1658 , were added. There was a lack of NOE-based evidence to support the formation of a disulfide between the remaining pair of cysteines, Cys 1636 and Cys 1639 . This arose, at least in part, from a paucity of assignments for nuclei in this region of the sequence. No covalent linkage was therefore defined between these residues. As the calculations progressed, the ambiguously assigned distance restraints were "filtered" iteratively to eliminate assignment possibilities contributing less than 1% to the total NOE, and redundant restraints (duplicates) were also removed. A total of 100 structures were calculated from which a representative ensemble of 40 structures, with the lowest NOE-derived energies, was selected. The quality of the ensemble of structures was checked with PROCHECK (24). The NOE-derived distance restraints used for the structure calculations and the coordinates of the ensemble of 40 structures of C5-C345C have been deposited in the Protein Data Bank under accession number 1XWE.
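The filtering and selection steps can be sketched as follows. This is not the CNS-Solve implementation. It assumes, as is conventional for ambiguous distance restraints, that each assignment possibility contributes to the NOE in proportion to the inverse sixth power of its distance in the current structures; the function names and data layout are hypothetical.

```python
def filter_assignments(distances, min_fraction=0.01):
    """Drop assignment possibilities that contribute too little to an NOE.

    distances: dict mapping an assignment label to its distance (angstroms)
    in the current structures. Contributions are assumed to scale as d**-6.
    """
    weights = {label: d ** -6 for label, d in distances.items()}
    total = sum(weights.values())
    return {
        label: distances[label]
        for label, weight in weights.items()
        if weight / total >= min_fraction
    }


def select_ensemble(noe_energies, n_keep):
    """Return the indices of the n_keep structures with the lowest NOE energy."""
    ranked = sorted(range(len(noe_energies)), key=lambda i: noe_energies[i])
    return ranked[:n_keep]
```

With 100 calculated structures, `select_ensemble(energies, 40)` would return the indices of the 40 retained models, ordered from lowest energy upward.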
Modeling C345C Domains of C3 and C4-Modeling of the C345C domains of C3 and C4 was undertaken based on the lowest NOE energy NMR-derived structure of C5-C345C using the program Modeller release 7, version 7 (25). The alignments between the target sequences of human C3 and C4 C345C domains and the template structure were based on initial multiple sequence alignments of C3-, C4-, and C5-C345C domain sequences from various organisms from the SwissProt (26,27) and the GenBank™ nonredundant databases, using the program MUSCLE (28,29). The multiple sequence alignment (Fig. 1B) was manually edited to ensure the most plausible alignment of conserved amino acid residues and of secondary structure elements as predicted by PsiPred (30) between the target and template. The three putative disulfide bridges and the longer predicted C-terminal α-helix of both the C3 and C4 C345C domains were restrained during model building. Twenty models were generated in each case, and the one with the lowest objective function score (25) was selected as the representative model. The representative model structures were protonated using the program REDUCE (31) and were checked for valid stereochemistry using PROCHECK (24).

FIG. 1. [...] (38). The extent of average secondary structure is represented above the sequences (solid, strand; hatched, helix), whereas residues within the C5 secondary structure are boxed. Buried (>95%) residues and surface-exposed (>30%) hydrophobic residues are indicated by filled squares and open squares, respectively. The disulfide linkages of C5 are shown. B, multiple sequence alignment of C345C from C3, C4, and C5 in a range of species. An initial multiple sequence alignment was derived using the program MUSCLE (28, 29) and edited manually. Residues in C5-C345C that are >95% buried are indicated by yellow highlighting. White letters against black indicate hydrophobic residues that are >30% surface-exposed. Gaps in chicken and rat sequences of C5 correspond to sequence information either missing from the data base or erroneously deposited/translated.

Recombinant F1613A Mutant Binds C6/C7-The protein fragment, C5-C345C (residues Ala 1512 to the C-terminal Cys 1658 of human C5), with an N-terminal His tag was overexpressed in the E. coli strain Origami. The use of a bacterial expression system facilitated isotopic enrichment, and the Origami strain was selected because its oxidizing intracellular environment is conducive to formation of disulfide bonds (32). After thrombin cleavage of the His tag, four extra residues (Gly-Ser-His-Met) remained at the N terminus of the C5-C345C sequence. Protein expression levels in rich media were typically 4 mg/liter but only 0.5 mg/liter in Martek 9-labeled media. Yields were improved 4-5-fold using a construct with the point mutation F1613A. To assess any structural perturbations that might be introduced by such a mutation, 15 N, 1 H-HSQC spectra of 15 N-labeled wild-type and F1613A C5-C345C samples were compared (data not shown). Nearly all the resonances coincide.
Significant chemical shift differences were noted only for those peaks corresponding to residues located close in sequence to the mutation, namely Ile 1609 -Tyr 1617 ; of these only Asn 1612 , Phe/Ala 1613 , and Ser 1614 show major differences. This observation indicates that the F1613A mutant of C5-C345C has a near identical structure to that of the native domain. To assay for functional activity, binding to C6 and C7 was measured (Fig. 2). As may be judged from the SPR-derived binding parameters (Table I), the affinities of the F1613A mutant for both these MAC components are similar or identical to those of the wild-type domain. Given its higher expression levels, the mutant was therefore used in the subsequent structural studies.
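As a consistency check on the SPR parameters, an equilibrium dissociation constant can be recomputed from a pair of rate constants as K_D = k_off / k_on. The sketch below rests on an assumption: the column headings of Table I are not reproduced in this text, so the values are taken to be, for C6 and then for C7, K_D in nM followed by the association rate (M^-1 s^-1) and the dissociation rate (s^-1). The function and variable names are hypothetical.

```python
def dissociation_constant_nM(k_on, k_off):
    """Equilibrium dissociation constant in nM from k_on (1/(M s)) and k_off (1/s)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on * 1e9


# Rate constants as listed in Table I, under the assumed column order.
RATES = {
    ("WT", "C6"): (2e4, 2e-4),
    ("WT", "C7"): (3e4, 9e-5),
    ("F1613A", "C6"): (2e4, 2e-4),
    ("F1613A", "C7"): (3e4, 6e-5),
}

# Recompute K_D for every protein/ligand pair.
RECOMPUTED = {pair: dissociation_constant_nM(*rates) for pair, rates in RATES.items()}
```

Under this reading the recomputed values (about 10, 3, 10, and 2 nM) agree with the tabulated 10, 3, 9, and 2 to within the rounding of the rate constants, which supports the assumed column order but does not prove it.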
The Solution Structure of C5-C345C Is Solved-The 15 N and 15 N, 13 C-labeled samples of C5-C345C yielded high quality NMR spectra, thus permitting the assignment of nearly all of the 15 N, 13 C, and 1 H nuclei (15). Only a few assignments were made for Ser 1637 , Ser 1638 , and the four non-native residues at the N terminus because they all gave rise to few detectable resonances. Several assignments for aromatic side chain atoms were missing, mainly due to overlapping signals; these were Tyr 1543 (Cε and Hε), Tyr 1611 (Cε and Hε), Phe 1556 (Cζ and Hζ), Phe 1615 (Cε, Hε, Cζ, and Hζ), Phe 1642 (Cε and Hε), and Phe 1654 (Cε, Cζ, and Hζ). Tyr 1541 is unusual in that its Hδ and its Hε nuclei have nondegenerate chemical shifts; a strong chemical exchange peak between the resonances of Hδ1 and Hδ2 and between Hε1 and Hε2 indicates restricted rotation of its aromatic side chain (subsequently, the structure reveals that this side chain is indeed well buried within the core of the protein). The Hα atom of Leu 1521 exhibits an unusually low chemical shift of 1.58 ppm (cf. the average shift of 4.32 ppm). All three proline residues are in the trans conformation as evidenced by the differences in the chemical shifts, δCβ - δCγ, of 4.03, 5.00, and 4.88 ppm for Pro 1537 , Pro 1620 , and Pro 1631 , respectively (δCβ - δCγ is 4.51 ± 1.17 ppm for trans and 9.64 ± 1.27 ppm for cis (33)), as well as strong NOE cross-peaks between the Hδs of the prolines and the Hα of the preceding residues.
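The proline assignment can be made explicit with a short sketch that compares each measured shift difference with the reference distributions quoted above (4.51 ± 1.17 ppm for trans, 9.64 ± 1.27 ppm for cis). Picking the isomer with the smaller number of standard deviations from its mean is a simplification adopted here for illustration, and the function name is hypothetical.

```python
# Reference mean and standard deviation (ppm) of the C-beta minus C-gamma
# shift difference for each proline isomer, as quoted in the text (33).
TRANS = (4.51, 1.17)
CIS = (9.64, 1.27)


def classify_proline(delta_cb_minus_cg):
    """Return 'trans' or 'cis' for one proline C-beta/C-gamma shift difference (ppm)."""
    z_trans = abs(delta_cb_minus_cg - TRANS[0]) / TRANS[1]
    z_cis = abs(delta_cb_minus_cg - CIS[0]) / CIS[1]
    return "trans" if z_trans <= z_cis else "cis"


# Measured differences for the three prolines of C5-C345C.
MEASURED = {"Pro1537": 4.03, "Pro1620": 5.00, "Pro1631": 4.88}
CALLS = {residue: classify_proline(value) for residue, value in MEASURED.items()}
```

All three measured values lie within half a standard deviation of the trans mean and more than three standard deviations from the cis mean.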
Subsequently, a structure calculation was undertaken using a total of 3544 NMR-derived distance restraints as detailed in Table II. Two disulfide bonds (Cys 1514 -Cys 1588 and Cys 1535 -Cys 1658 ) were added only after NOE-based calculations had established beyond a doubt the proximity and orientation of the contributing cysteine side chains. A third potential disulfide was not invoked because, although the remaining two cysteine residues are close in space, there is insufficient NOE-derived evidence to judge whether their side chains are appropriately juxtaposed. Similarly, distance restraints based on 26 inferred inter-β-strand H-bonds were not added until the later stages of the structure calculation.
A total of 40 structures, selected on the basis of lowest NOE-derived energy from 100 calculated ones, converged well in most regions as may be judged from a backbone overlay (Fig. 3A) and the values for r.m.s.d. in Table II. The r.m.s.d. of the Cα coordinates of the 40 selected structures from those of the mean structure are plotted in Fig. 4A as a function of residue number and compared with the distribution of 1 H-1 H NOEs (Fig. 4B). Significantly fewer than average NOEs are exhibited by two stretches of residues within the sequence (Ile 1609 -Phe 1615 and Thr 1635 -Cys 1639 ) and by the N-terminal residues Ala 1512 and Asp 1513 . This is reflected in the elevated r.m.s.d. values of their Cα atoms and is also evident from inspection of the overlay in Fig. 3A. In the case of Ser 1637 and Ser 1638 , the aforementioned lack of detectable amide signals would account in part for the dearth of NOEs.
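A per-residue r.m.s.d. profile of this kind can be computed as in the following minimal sketch. It assumes that the models have already been superposed (the superposition step itself is not shown) and that each model is supplied as a list of Cα coordinates, one per residue; the function name and data layout are hypothetical.

```python
import math


def per_residue_rmsd(ensemble):
    """R.m.s.d. of each residue's C-alpha position from its ensemble mean.

    ensemble: list of models, each a list of (x, y, z) tuples, one per residue.
    All models must already be superposed on a common frame.
    """
    n_models = len(ensemble)
    n_residues = len(ensemble[0])
    profile = []
    for i in range(n_residues):
        # Mean position of residue i over all models.
        mean = [sum(model[i][k] for model in ensemble) / n_models for k in range(3)]
        # Mean squared deviation of residue i from that mean position.
        msd = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3))
            for model in ensemble
        ) / n_models
        profile.append(math.sqrt(msd))
    return profile
```

Residues that are poorly restrained by NOEs, such as those in the DE loop, would appear as peaks in the returned profile.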
TABLE I. The χ² values for the fits were 2.5, 11, 0.7, and 11 for the WT/C6, WT/C7, mutant/C6, and mutant/C7 sensorgrams, respectively. All data were obtained on different sensor chips.
WT C5-C345C | 10 | 2 × 10^4 | 2 × 10^-4 | 3 | 3 × 10^4 | 9 × 10^-5
F1613A C5-C345C | 9 | 2 × 10^4 | 2 × 10^-4 | 2 | 3 × 10^4 | 6 × 10^-5

Description of the Structure-For the purposes of the description below, and unless stated otherwise, a residue is designated as belonging to an α-helix or β-strand in C5-C345C if it is so defined in the majority of the 40 members of the ensemble according to the Kabsch and Sander (34) criteria, as implemented in MolMol (35). Two views of the fold of the closest-to-the-mean C5-C345C structure are shown in Fig. 3B. The core of the structure is an OB-class fold that is most easily thought of as two orthogonal three-stranded, antiparallel twisted β-sheets composed from strands A C -B-C and strands A N -D-E, where the superscripts N and C denote the N- and the C-terminal halves of strand A (Tyr 1541 -Val 1552 ). There are two adjacent helices as follows: a short one (helix-1, Arg 1530 -Ala 1534 ) composed of residues from near the N terminus of the module, and a longer and irregular one (helix-2, Leu 1643 -Leu 1655 ) close to the C terminus of the module (and of the full-length protein). The two helices are tilted with respect to one another but are essentially aligned with, and lie against, the convex face of the A N -D-E sheet. Strand B (Val 1557 -Lys 1568 ) extends beyond the A C -B-C sheet so that its C-terminal part participates in a four-stranded antiparallel sheet B C -A N -D-E. In a small proportion of calculated structures, there are two segments to strand E, E 1 (Ile 1618 -Pro 1620 ) and E 2 (Trp 1626 -Tyr 1629 ), interrupted by coil. Strand E 1 , which is assigned (within MolMol) in only a few structures, forms a small parallel β-sheet with strand C (Glu 1579 -Lys 1584 ). In all of the C5-C345C structures, there is potential for H-bonds between the CO of Tyr 1619 and the NH of Thr 1581 , and between the NH of residue Tyr 1619 and CO of Ile 1583 , thus completing the hydrogen bond network that forms the barrel-like structure. Strand E 2 , which appears in all structures, is antiparallel to strand D (Gln 1598 -Gly 1603 ). Thus the barrel has a "closed" side made up from the β-strands, and a more "open" side (to the right of the view in the left-hand panel of Fig. 3B) occupied by Tyr 1619 and the residues prior to E 2 . The N-terminal segment of the domain runs above one end of the barrel, from Cys 1514 to the top of helix-1. Cys 1514 is disulfide-linked to Cys 1588 , which is located in the long CD loop; the CD loop crosses over the otherwise open end of the barrel from the A C -B-C sheet to the A N -D-E sheet (Fig. 3B). The 15 N T 1 /T 2 ratios (Fig. 4C) in some residues of the N-terminal segment (but not in the CD loop) suggest chemical exchange (i.e. microsecond to millisecond time scale conformational fluctuations), whereas in both the N-terminal segment and the CD loop there is also some evidence of rapid (i.e. nanosecond and faster) motion from the heteronuclear NOE plot (Fig. 4D).
At the bottom of the short helix-1 the transition to the long strand A contains Cys 1535 , which is linked to Cys 1658 , the C-terminal residue of the module. The BC loop caps off the other end of the barrel and corresponds to a dip in the heteronuclear NOE plot consistent with rapid motion, but there is no evidence of chemical exchange among these residues. Following strand D, there is a 14-residue loop that in a few members of the ensemble contains antiparallel β-strands from Leu 1607 -Ile 1609 and Arg 1616 -Ile 1618 . This long DE loop protrudes prominently from the open side of the barrel-like structure, opposite to the pair of helices (Fig. 3B, left-hand panel). Although locally defined by 1 H, 1 H NOEs, the position of the tip of the loop relative to the remainder of the module is not experimentally defined as indicated by the very high r.m.s.d. values for these residues (Fig. 4A). The residues concerned (1610-1614) are highly mobile on both fast and chemical exchange time scales. After strand E 2 , a stretch of 13 residues makes the transition to the top of helix-2. From residue 1634 onward, this region (which includes the sequence Cys 1636 -Ser-Ser-Cys 1639 ) is mobile on slow and fast time scales, is ill-defined by 1 H, 1 H NOEs, and forms a bulge above helix-2 (evident in Fig. 3B, right-hand panel).
Examination of the ensemble of calculated structures indicates that helix-2 is not a straight, regular ␣-helix over its entire length, a situation reminiscent of the equivalent regions of TIMP-1 and -2 (discussed below). Therefore, no inferred hydrogen bond-based restraints were used in this region so as to ensure the calculation was not biased. In some members of the ensemble, the first and last residues are classified as turns; and more significantly, in nearly half of the structures the ␣-helix (as assigned in MolMol) is broken by a bend or turn toward the middle. This is reflected in the broken nature of the helix as drawn by PyMol for the closest-to-the-mean structure (Fig. 3B).
The pattern of disulfide bond formation thus agrees with that predicted on the basis of disulfide mapping in C3 (36). The first Cys (1514) is disulfide-bonded to the third Cys (1588); and the second Cys (1535) is linked to the sixth Cys (1658). The third disulfide, involving the remaining fourth and fifth Cys residues (1636 and 1639), has presumably formed because biochemical analysis suggested that no free sulfhydryl groups are present in C5-C345C, but its presence could not be supported by the NOE data. It is possible that this bond exists only transiently due to the constraints placed by there being only two residues between these two cysteines.
Of 36 residues in C5-C345C that are >95% buried (on average in the ensemble) (see Fig. 1), three are likely to be charged in the solution conditions used. The alkyl chain of Lys 1584 is buried, but its amino group is exposed to solvent. However, Arg 1530 and Glu 1628 are completely buried and proximal, indicating the likelihood of an unusual ion pair or salt bridge connecting helix-1 with strand E of the barrel. Ala 1534 of helix-1, a stack of four residues located along one side of the second helical region (Phe 1642 , Leu 1646 , Phe 1649 , and Ile 1653 ), two buried residues from the start of strand A (Ile 1539 and Ala 1542 ), two residues from strand D (Leu 1600 and Met 1602 ), and Pro 1631 from beyond strand E are all deeply buried in a hydrophobic core between the helices and the barrel along with the Arg/Glu salt bridge. Of the remaining deeply buried residues, all contribute to the hydrophobic core of the barrel.
Most of the solvent-exposed (>30%) hydrophobic residues lie in the DE loop, whereas Phe 1654 and Leu 1655 are exposed near the C terminus. Adjacent to this exposed pair of side chains is a patch of negatively charged side chains (glutamates 1528, 1648, and 1651; aspartates 1647 and 1652) that dominate the electrostatic surface of C5-C345C (see below). This surface (to the left in the left-hand panel of Fig. 3B) is likely to be exposed in full-length C5 because it is distal to the N terminus of the C345C domain.
1A. This work therefore confirms that the C345C domain of C5 is an example of an NTR module. For the purposes of further discussion, all four domains represented in Fig. 5 (and the equivalent TIMP-1 domain) will henceforth be referred to as NTR modules. Many of the buried residues of C5-C345C that lie in strands are conserved or conservatively substituted in the other modules. These are drawn in Fig. 5; examples include the following: the first, second, third, fifth, and seventh residues of strand A; the first, third, fifth, and seventh residues of strand B; residues in positions 2-6 of strand D; and three residues (equivalent to Tyr 1619 , Leu 1621 , and Ile 1627 ) in the strand E region. On the other hand, many of the side chains that make up the hydrophobic core between the β-barrel-like (β-) subdomain and the helix-rich (α-) subdomain are not well conserved, consistent with a higher degree of structural divergence in the helical subdomain. The buried partners that comprise the putative salt bridge, Arg 1530 and Glu 1628 of C5, are replaced by hydrophobic residues in the other domains. Many of the exposed hydrophobic residues of C5-C345C lie in insertions, and examples include Leu 1523 , Pro 1537 , and Val 1573 (that lies in the BC loop and is one of the most exposed residues in the protein) and five residues (including Ala 1613 that replaces the wild-type Phe) in the DE protuberance.
The conspicuous tandem pair of exposed hydrophobic residues near the C terminus (Phe 1654 and Leu 1655 ) is not conserved.
As would be expected from the conservation of buried residues, the β-subdomains of the known NTR module structures are minor variations on a common theme of an OB-fold. In TIMPs, strand E has two segments. In the NTR module of PCOLCE-1, both strands D and E have two segments. In these cases, strand E 1 is antiparallel to the C-terminal segment of strand D and runs parallel with strand C for a short way. In the agrin NTR module, strand E consists of only one segment, equivalent to E 1 , that is sandwiched between strand C (with which it is parallel) and strand D (antiparallel). Thus C5-C345C is the only NTR module in which there is no evidence for a small mixed sheet formed by part of strand D, strand E 1 , and part of strand C, and although there is some hydrogen bonding between residues in equivalent parts of the sequence, the NOEs do not support regular secondary structure. The overall effect is that this side of the barrel is more open in C5-C345C (Fig. 5), and the DE loop is probably more flexible than in the other NTR modules.
In both the TIMP and PCOLCE-1 NTR modules, two disulfides staple the N-terminal region to the rest of the β-subdomain; the first disulfide connects the N-terminal segment to the crossover CD loop, and the second connects it either to the short sequence that interrupts strand E (in TIMP) or to the CD loop (in PCOLCE-1). The agrin NTR module and C5-C345C each have only one cysteine in this N-terminal region, which in both cases is disulfide-linked to the CD loop. Compared with C5-C345C, the other NTR modules have shorter N-terminal segments that take a more direct course to reach helix-1. None of them has an equivalent to the exposed leucine (1523) of C5. In TIMP, the N-terminal stretch of the NTR module corresponds to the N terminus of the full-length protein and harbors its metalloproteinase-binding site (11). Given its very different conformation, it seems unlikely that the N-terminal segment of C5-C345C plays a comparable role.
TIMP-2 lacks the prominent loop seen in C5 between strands D and E 1 , but its A and B strands are more extended. The PCOLCE-1 NTR module also lacks the long loop between strands D and E that forms such a prominent feature of C5-C345C. Between strands B and C of the agrin NTR module, there are two turns of ␣-helix, absent in C5-C345C and the other NTR modules, that lie across one end of the barrel. As in C5, the agrin loop between strands D and E is extended, although it is not as long or as prominent as in C5.
In all the NTR modules there are helical regions, packed against the convex surface of the A-D-E sheet, forming the α-subdomain. The NTR modules of TIMP-2 and agrin are at the respective N termini of the full-length proteins; in the case of TIMP-2, the α-subdomain abuts against the C-terminal domain. By contrast, in PCOLCE-1 and C5-C345C, the NTR modules are located at the C terminus and one surface of the α-subdomain is likely exposed to solvent. The more N-terminal helix of each of the four NTR modules is structurally conserved, and with the exception of agrin, a disulfide connects a cysteine near the C terminus of helix-1 to a cysteine at or near the C terminus of the module. There is greater structural variation elsewhere in the α-subdomain. PCOLCE-1 and agrin lack the long sequence between strand E and helix-2 that in C5-C345C comprises the prominent bulge above helix-2. In TIMP-2, the C-terminal portion of the NTR module is composed of a helical turn and two separate segments of α-helix; this is reminiscent of the equivalent region of C5 in which there is also evidence of an interrupted α-helix. Helix-2 of PCOLCE-1 is the shortest among this set of modules. In both PCOLCE-1 and agrin, the C-terminal helix is straighter and more regular compared with helix-2 of C5-C345C with no evidence of any interruptions.

FIG. 3. Solution structure of C5-C345C. A, stereo-view (cross-eyed) showing backbone overlay of 40 lowest NOE energy structures from a total of 100 calculated. Structures are overlaid on all backbone atoms except for those in residues prior to Cys 1514 , 1610-1615, and 1635-1639. B, two orthogonal views of a PyMol schematic (www.pymol.org), with secondary structure assigned by the standard settings within PyMol, of the closest-to-mean structure from the ensemble in A, with annotated strands, loops, helices and cysteine residues (drawn as sticks). The schematics are colored sequentially from blue (N terminus) to red (C terminus).

FIG. 5. Schematics were drawn using PyMol to represent the closest-to-the-mean structure of C5-C345C, the NTR module of PCOLCE-1, and the N-terminal domains of TIMP-2 and agrin. Structures were superposed using the program MULTI-PROT (39) to achieve equivalent views (the same one as in the left-hand panel of Fig. 3B). The side chains of residues that are conserved, or conservatively replaced, among this set of proteins are drawn. The structures are colored sequentially from blue (N terminus) to red (C terminus).
Comparison with Other C345C Domains-The C345C domains of C3 (residues 1518 -1663) and C4 (residues 1595-1744) are both 26% identical to C5-C345C (Fig. 1B). The Arg/Glu salt bridge is conserved as are many other buried residues. Exceptions include the following: Ala 1540 at the start of strand A that is replaced by an Asp or a Glu in C3 and C4, respectively; Val 1545 that is conserved in C4 but replaced with a Thr or a Gly in some examples of C3; Val 1557 at the beginning of strand B is replaced with Arg in C4 and an Asp in most examples of C3; Ile 1580 in strand C is replaced by Arg in C3 and C4. Exposed hydrophobic side chains that are conserved include the following: Pro 1537 (in an insertion relative to the non-C3/C4/C5 NTR modules) and Leu 1607 , which is also a large hydrophobic residue in most examples of C3 in the alignment but is replaced by Ser in rat and mouse C4 (and is not conserved in the non-C3/C4/C5 NTR modules). Finally, the pair of hydrophobic residues toward the bottom of helix-2 are well conserved in C5 and most examples of C3 but not in C4 (nor in the NTR modules of TIMP, PCOLCE-1, and agrin). The most conspicuous examples of exposed hydrophobic residues that are peculiar to C5 lie in the DE protuberance, which has a pronounced hydrophobic character.
A further comparison of the C3, C4, and C5 modules was made on the basis of homology-modeled three-dimensional structures of the C3 and C4 examples (Fig. 6). Obvious structural differences arise from the DE insertion of C5-C345C and the insertions in the sequences of C3 and C4 between the fourth and fifth Cys residues, prior to helix-2. According to the secondary structure prediction program PsiPred (30), helix-2 is strongly predicted to begin well before the fifth Cys of C3 and C4 (this region forms a turn but is not classified as a helix in the majority of the C5-C345C NMR-derived structures) and to extend to the C terminus. In human C3, the beginning of the predicted helix and the preceding region (following strand E 2 ) are rich in negatively charged residues (seven within a 10-residue stretch), and this feature dominates the electrostatic representation of the C3-C345C surface (Fig. 6). By contrast, the equivalent region of the human C5-C345C surface is neutral, whereas in human C4 it is positively charged (Fig. 6). In all three proteins, the middle part of helix-2 is negatively charged, with human C5 having the most charge and human C4 the least. Another feature common to C3, C4, and C5 is that the open face of the β-subdomain is neutral or positively charged.
Interpretation of Mutagenesis Data-In previous work, the DE region of C5-C345C had been investigated as a site of possible functional significance on the basis that it is close to an "indel" (indels are evolutionary insertions or deletions of amino acid residues that result in length polymorphisms among members of a protein family). Deletion of the putative insertion Ser 1623 and Leu 1624 resulted in significant loss of hemolytic activity (40% of wild type, with normal expression levels) in full-length C5 (37). In light of the three-dimensional structure it seems likely that such a deletion, just prior to strand E2, would affect the structural integrity of the β-subdomain or at least disrupt the open side of the barrel and the DE loop. Substitution of Leu 1607 -Tyr 1611 by the sequence DFWGE resulted in loss of all detectable hemolytic activity (37). In this case, the original sequence corresponds to a poorly structured region of the DE loop with no buried side chains, and the substitution would be unlikely to disrupt the structure of the domain. These observations therefore implicate the DE loop in function.
Subsequently, alanine substitutions of residues from Gly 1603 to Pro 1621 were carried out in full-length C5 (17). Most substitutions had little or no effect on the hemolytic activity of C5 or its susceptibility to proteolytic cleavage. From the structure it can be seen that Tyr 1619 is >95% buried within the β-subdomain, yet replacement with alanine resulted in only ~50% loss of activity. This implies that the structural framework provided by the β-barrel is able to accommodate such a substitution, which is intriguing because a bulky hydrophobic residue is conserved at this position in the other NTR modules of known structure. Substitution of Ile 1609 or Tyr 1611 by alanine would not be expected to disrupt any local structure within the DE protuberance because their side chains are exposed, and indeed these mutations had little effect on hemolytic activity or proteolytic susceptibility. Substitution of the exposed Lys 1610 , on the other hand, produced mutant C5 molecules with both low hemolytic activity and decreased sensitivity to proteolytic activation. This is consistent with the pentapeptide substitution experiment and pinpoints Lys 1610 as the functionally critical residue in that peptide.
Substitution of Phe 1613 and Phe 1615 also perturbed hemolytic activity and decreased proteolytic susceptibility to the classical pathway convertase (but not to cobra venom factor, which is able to cleave both C3 and C5 and therefore presumably has a different recognition mechanism). Comparison of HSQC spectra for the wild-type and F1613A versions of C5-C345C showed that this mutation has no nonlocal structural effects, and indeed the F1613A mutant was used for structure determination in the current study. Phe 1615 is also exposed, and mutation to Ala would be equally unlikely to disrupt structure. Therefore these mutagenesis results identify three exposed side chains (1610, 1613, and 1615) as being specifically involved in an interaction, either with the convertase or within the full-length C5, that is critical for function. It is striking that these three residues, whose Ala substitutions cause 80-90% loss of C5 activity, are at the tip of the DE extension and that their side chains, as well as those of two of the four residues whose substitutions cause 50% loss of activity (Arg 1616 and Tyr 1617 ), are located on the same side of the protuberance.
In the absence of a three-dimensional structure for C3, C4, or C5, the physical distance of the C345C module from the cleavage site (some 800 residues in terms of primary structure) is unknown. From the structure of the module, however, it can be seen that the DE loop exposes hydrophobic side chains (including those of the two critical Phe residues) and lies close to the N terminus of the module. Just prior to Cys 1514 in the C5 sequence is Cys 1509 , which is (by extrapolation from disulfide mapping in C3) disulfide-linked to Cys 848 . Thus the C345C domain is likely to be closely coupled to further structured domains, and the DE extension could therefore be buried in the interface between the C345C domain and the remainder of the C5 protein. In that case mutations of the DE loop might disrupt the arrangement of domains within the full-length protein and exert their functional influence indirectly. Arguing against this, however, is the observation that a peptide extending from Lys 1604 to Arg 1616 inhibited complement hemolytic activity and activation of C5 by the classical pathway C5 convertase (but not by cobra venom factor) (17). Furthermore, the consequences for inhibitory activity of alanine substitution within the peptide reflected the results of alanine-scanning mutagenesis in C5. The peptide studies are therefore more consistent with a direct interaction between the DE extension and the classical pathway convertase.
C5-C345C, but not the equivalent domain from C3, binds reversibly to C6 and C7, with a preference for C7 (14) (and Fig. 2), and the interaction with C7, but not C6, appears to be essential for the nonreversible formation of the MAC.² The F1613A mutant used in the current structure determination retained the ability to bind C6 and C7 (Table I); indeed, none of several DE loop mutations in C5 influenced binding to C6 (17). Therefore, the C6/C7-binding site of C5-C345C is likely to lie elsewhere. Another set of mutants² was therefore constructed in which deletions or insertions were made in the N-terminal region, the AB, BC, and CD loops, and the Cys-Ser-Ser-Cys bulge. These changes removed several of the exposed hydrophobic side chains evident in the C5-C345C structure (Fig. 1A). Nonetheless, all the mutants in this set showed full affinity for C6 and C7.² One region that has not so far been explored by mutagenesis, however, is that made up from the exposed face of the α-subdomain. This part of the structure exhibits unexpected structural differences from the other NTR modules and diversity in terms of chemical character among C3, C4, and C5. As described above (see Fig. 6), in C5-C345C the region has electronegative and hydrophobic features not seen in the C4 C345C domain. The C345C domain of C3 has many more Asp and Glu residues than either the C4 or C5 equivalents and consequently has an overall electronegative character, but C3 lacks the patch of five negatively charged side chains that occurs next to the exposed hydrophobic side chains near the C terminus of C5. C5-C345C has been shown (16) to bind to the FIMAC domain pair of C7. C7-FIMAC-II has been expressed alone and found not to interact with C5-C345C, thus implicating FIMAC-I in the interaction.
There is no experimental structural information for FIMAC domains, but it is noteworthy that the pI of C7-FIMAC-I is 9.5 (for C7-FIMAC-II it is 4.6), consistent with a strong ionic component to the recognition of C7-FIMAC-I by C5. In contrast, the pI values of the C6-FIMACs I and II are 7.8 and 9.2, respectively. If ionic interactions also dominate the C5-C6 interaction then, in the case of C6, it is FIMAC-II that more likely binds to C5-C345C. Hence, nonequivalent FIMACs in C6 and C7 could mediate binding to C5, or the C5-C6 interaction could be of a different nature to the C5-C7 one. Either situation is compatible with the observation that binding of C5 to C7, but not to C6, is essential for MAC formation.²
Conclusions-This work confirms that the multifunctional C-terminal 150 residues of C5 constitute an independently folding unit that is an example of an NTR module. NTR modules are compactly folded and globular, containing a β-subdomain consisting of an OB-fold barrel, against which is packed a structurally divergent α-subdomain. In the C5 NTR module, residues at the tip of a prominent mobile hydrophobic loop between β-strands D and E enhance proteolytic activation of C5, probably via a direct interaction with the convertase enzyme. Many NTR modules appear to have a functional association with proteinases, but comparison of the structures does not suggest similarities in mechanism. A region not yet explored by mutagenesis that is a good candidate for a C6/C7-interacting surface is located on the exposed side of the α-subdomain. In the context of full-length C5b, this electronegative region, which is far in space from the N terminus of this C-terminal module, is likely to be accessible and available for binding to the electropositive FIMAC domains of C6 and C7.