NMR Solution Structure of Domain 1 of Human Annexin I Shows an Autonomous Folding Unit*

Annexins are excellent models for studying the folding mechanisms of multidomain proteins because they have four to eight homologous helical domains with low sequence identity but high similarity in fold. The structure of an isolated domain 1 of human annexin I has been determined by NMR spectroscopy. The sequential assignments of the 1H, 13C, and 15N resonances of the isolated domain 1 were established by multinuclear, multidimensional NMR spectroscopy. The solution structure of the isolated domain 1 was derived from 1,099 experimental NMR restraints using a hybrid distance geometry-simulated annealing protocol. The root mean square deviation of the ensemble of 20 refined conformers that represent the structure from the mean coordinate set derived from them was 0.57 ± 0.14 Å and 1.11 ± 0.19 Å for the backbone atoms and all heavy atoms, respectively. The NMR structure of the isolated domain 1 could be superimposed with a root mean square deviation of 1.36 Å for all backbone atoms with the corresponding part of the crystal structure of a truncated human annexin I containing all four domains, indicating that the structure of the isolated domain 1 is highly similar to that when it is folded together with the other three domains. The result suggests that in contrast to isolated domain 2, which is largely unfolded in solution, isolated domain 1 constitutes an autonomous folding unit and interdomain interactions may play critical roles in the folding of annexin I.

Most proteins in nature are large multidomain proteins (1). While a great deal of knowledge on the folding properties of small single-domain proteins has been acquired (2), our understanding of the folding of multidomain proteins is still poor. It has been suggested that the domains of large proteins fold independently and subsequently assemble to form the native structures (3-5).
Annexins are a large family of ubiquitous proteins that bind to phospholipids in the presence of calcium ions (6,7). Although their physiological functions are not clear, these proteins are implicated in many important cellular processes (8) such as exocytosis (9,10) and ion channeling (11). All annexins contain four homologous repeats of ~70 residues (see Fig. 1, a and c) and a variable N terminus, with the exception of annexin VI, which has four additional repeats. The crystal structures of annexins I, II, III, IV, V, VI, VII, and XII have been determined (12). As revealed by x-ray crystallography, each repeat forms a compact domain consisting of five helices. All the domains are highly similar in structure, as illustrated in Fig. 1b with the four domains of annexin I. The four domains of each annexin are arranged in a planar-cyclic manner with domain 4 in contact with domain 1, as depicted in Fig. 1a. Domains 1 and 4 as well as domains 2 and 3 have many tight hydrophobic contacts, mainly involving helices B and E, constituting two two-domain modules. The interactions between the two modules are mostly hydrophilic via helices A and B of domains 2 and 4, forming a central hydrophilic channel.
With their well defined domains and simple, elegant structure, annexins are excellent models for studying the folding mechanisms of multidomain proteins. Using synthetic peptides and more recently recombinant peptides, Sanson and co-workers (13-16) have been systematically studying the folding properties of domain 2 of human annexin I. They have clearly shown, with CD and NMR, that isolated domain 2 of annexin I is largely unfolded in aqueous solution (15). A preliminary study on the folding properties of domain 1 has also been reported (17).
Our approach to dissecting the folding mechanism of annexin I is to compare the folding properties of the intact protein and the four isolated domains. We have expressed the entire annexin I and the four individual domains in Escherichia coli. Using multidimensional NMR techniques, we have determined the solution structure of domain 1 (residues 14-86, according to the numbering of the crystal structure of an N-terminal truncated human annexin I (18)). The NMR structure of the isolated domain 1 is highly similar to the corresponding part of the crystal structure of a truncated human annexin I containing all four domains (18). The result shows that in contrast to isolated domain 2, which is largely unfolded in solution, isolated domain 1 constitutes an autonomous folding unit. Comparative structural analysis suggests that interdomain interactions may play critical roles in the folding of annexin I.

EXPERIMENTAL PROCEDURES
Materials-The E. coli clone containing the cDNA encoding human annexin I was purchased from ATCC (ATCC number 65114, deposited by Joel Ernst). The expression vector pET-17b was purchased from Novagen. The DNA sequencing kit was obtained from United States Biochemical. Enzymes for recombinant DNA experiments were purchased from Life Technologies, Inc. or New England Biolabs. 15NH4Cl and [13C6]-D-glucose were purchased from ISOTEC. Other chemicals were analytical or reagent grade from commercial sources.
Cloning-The amino acid sequence of domain 1 of human annexin I is shown in Fig. 1c. The portion of human annexin I cDNA that encodes domain 1 was cloned into the expression vector pET-17b by polymerase chain reaction and other standard recombinant DNA techniques. The primers used for the polymerase chain reaction cloning were 5'-GGAATTCCATATGACCTTCAATCCATCCTCG-3' (forward) and 5'-CCGGATCCTTATTTTAGCAGAGCTAAAACAAC-3' (reverse). The correct amino acid sequence was verified by double-stranded DNA sequencing of the DNA insert in the expression construct pET-17b-ANX1D1.
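The two primers quoted above can be inspected with a few lines of Python. This is only an illustrative sketch and not part of the reported cloning procedure. It assumes that the forward primer is read in frame from its first ATG and that the reverse complement of the reverse primer is read in frame from its first base. The text does not name the restriction enzymes used, so the motif search below only reports that the primers contain sequences matching the standard NdeI (CATATG) and BamHI (GGATCC) recognition sites. All function names are hypothetical.

```python
# Illustrative check of the PCR primers quoted in the text (not the authors' code).

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

FORWARD = "GGAATTCCATATGACCTTCAATCCATCCTCG"    # from the text
REVERSE = "CCGGATCCTTATTTTAGCAGAGCTAAAACAAC"   # from the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; stop after the first stop codon."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# Assumption: the reading frame starts at the first ATG of the forward primer.
n_terminal = translate(FORWARD[FORWARD.find("ATG"):])

# Assumption: the coding strand at the 3' end is the reverse complement of the
# reverse primer, read in frame from its first base.
c_terminal = translate(reverse_complement(REVERSE))

has_ndei_motif = "CATATG" in FORWARD   # NdeI recognition sequence
has_bamhi_motif = "GGATCC" in REVERSE  # BamHI recognition sequence

print(n_terminal, c_terminal, has_ndei_motif, has_bamhi_motif)
```

Under these assumptions the forward primer encodes Met followed by the first residues of the construct, and the reverse primer places a stop codon directly after the last coded residue.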
Expression and Purification-Unlabeled protein was produced by growing the E. coli strain BL21(DE3) containing the expression construct pET-17b-ANX1D1 in LB media in the presence of 100 μg/ml ampicillin at 37°C without IPTG (Footnote 1) induction. Uniformly 15N-labeled protein was produced by growing the same expression strain in M9 media with 15NH4Cl as the sole nitrogen source, and uniformly 15N/13C-labeled protein was produced in M9 media with 15NH4Cl and [13C6]-D-glucose as the sole nitrogen and carbon sources. Protein production in the M9 media was induced by addition of IPTG to a final concentration of 0.4 mM when the cultures reached an A600 of ~1.0. The culture was incubated for 4 more h after addition of IPTG. The bacterial cells were harvested by centrifugation and suspended in buffer A (40 mM acetate, pH 5.3). The bacterial suspension was sonicated on ice and centrifuged (27,000 × g) at 4°C for 30 min. The supernatant was applied to a CM-cellulose column equilibrated with buffer A. The column was washed with buffer A until the A280 of the eluent was less than 0.05. The column was eluted with a linear NaCl gradient (0-500 mM in buffer A), and elution was monitored by A280 and 15% SDS-PAGE. The fractions containing domain 1 of annexin I were pooled and concentrated in an Amicon ultrafiltration cell using a YM3 membrane. The protein preparations were >95% pure as judged by SDS-PAGE. Isotopically labeled proteins were further purified on a Sephadex G-50 column.
Footnote 1: The abbreviations used are: IPTG, isopropyl-1-thio-β-D-galactopyranoside; DQF-COSY, double quantum filtered correlation spectroscopy; HSQC, heteronuclear single quantum coherence; NOE, nuclear Overhauser effect; NOESY, NOE spectroscopy; PAGE, polyacrylamide gel electrophoresis; RMSD, root mean square deviation; TOCSY, total correlation spectroscopy; HCCH-TOCSY, proton-carbon-carbon-proton correlation using carbon TOCSY; HNCACB, amide proton to nitrogen to α/β carbon correlation; CBCA(CO)NH, α/β proton to α/β carbon (via carbonyl carbon) to nitrogen to amide proton correlation.
Fig. 1 (legend, partial): [...] Only the helices of each domain were used for the structural alignment. c, sequence alignment of the four domains. The numbering is according to the crystal structure of the truncated annexin I (18). The hydrophobic core residues are shown in yellow and other conserved residues are in blue. Panels a and b were generated using the program MOLMOL (43).
[...] analyzed with the program PIPP (36). Briefly, solvent suppression was improved by convolution of time domain data (37). The data size in each indirectly detected dimension of the three-dimensional data was extended by backward-forward linear prediction (38). A 45°-shifted sine bell and single zero-filling were generally applied before Fourier transformation in each dimension.
Derivation of Structural Restraints-Approximate interproton distance restraints were derived from sequentially assigned NOEs. NOE cross-peaks between aliphatic protons were picked from the homonuclear two-dimensional NOESY spectrum, and those involving amide protons were from the three-dimensional 1H-15N NOESY-HSQC spectrum. The NOE intensities obtained by the program PIPP were converted into approximate interproton distances by normalizing them against the calibrated intensities of NOE peaks between backbone amide protons (d_NN) within the identified α-helices. The upper limits of the interproton distances were calibrated according to the equation $r_a = r_b (V_b / V_a)^{1/6}$, where V_a and V_b were the NOE intensities and r_a and r_b were the distances. The distance bounds were then set to 1.8-2.7 Å (1.8-2.9 Å for NOE cross-peaks involving amide protons), 1.8-3.3 Å (1.8-3.5 Å for NOE cross-peaks involving amide protons), and 1.8-5.0 Å corresponding to strong, medium, and weak NOEs, respectively. Pseudoatom corrections were made for nonstereospecifically assigned methylene and methyl resonances (39). An additional 0.5 Å was added to the upper bounds for methyl protons.
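The calibration step can be sketched in Python as follows. The sketch assumes the usual r^-6 dependence of NOE intensity on interproton distance, so that r_a = r_b (V_b / V_a)^(1/6). The reference distance of 2.8 Å for sequential d_NN contacts in an α-helix and the cutoffs used to sort estimated distances into strong, medium, and weak classes are assumptions made for illustration; the text gives only the resulting bounds. Pseudoatom corrections are not modeled, and the function names are hypothetical.

```python
# Minimal sketch of NOE intensity-to-distance calibration (illustrative only).

REFERENCE_DNN = 2.8   # Å; assumed d_NN(i, i+1) reference distance in an α-helix
LOWER_BOUND = 1.8     # Å; lower bound quoted in the text
STRONG_CUTOFF = 2.7   # Å; assumed class boundary (taken from the strong upper bound)
MEDIUM_CUTOFF = 3.3   # Å; assumed class boundary (taken from the medium upper bound)


def calibrated_distance(v_a, v_b, r_b=REFERENCE_DNN):
    """Estimate r_a from intensity v_a using a reference peak (v_b, r_b)."""
    if v_a <= 0 or v_b <= 0:
        raise ValueError("NOE intensities must be positive")
    return r_b * (v_b / v_a) ** (1.0 / 6.0)


def distance_bounds(r_a, involves_amide=False, methyl=False):
    """Map an estimated distance onto the (lower, upper) bounds quoted in the text."""
    if r_a <= STRONG_CUTOFF:
        upper = 2.9 if involves_amide else 2.7
    elif r_a <= MEDIUM_CUTOFF:
        upper = 3.5 if involves_amide else 3.3
    else:
        upper = 5.0
    if methyl:
        upper += 0.5  # extra allowance for methyl protons, as stated in the text
    return LOWER_BOUND, upper


# Example: a peak 64 times more intense than the reference corresponds to half
# the reference distance, because 64 ** (1/6) == 2.
r = calibrated_distance(v_a=64.0, v_b=1.0)
print(round(r, 3), distance_bounds(r))
```

Because of the sixth-root dependence, large errors in measured intensity translate into small errors in distance, which is why loose three-class bounds are usually sufficient.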
Structure Calculation-NMR structures were calculated with a hybrid distance geometry-simulated annealing protocol (40) using the program X-PLOR (Version 3.1) (41) on an SGI Indigo II workstation. A square-well potential function with a force constant of 50 kcal mol^-1 Å^-2 was applied for the distance restraints. The X-PLOR F_repel function was used to simulate van der Waals interactions, with atomic radii set to 0.80 times their CHARMM values (42) and a force constant of 4.0 kcal mol^-1 Å^-4. A total of 50 structures were generated using this protocol. The structures were inspected with the programs MOLMOL (43) and QUANTA96 (Molecular Simulations) and analyzed by PROCHECK-NMR (Version 3.4.4) (44,45). An iterative strategy was used for the structure refinement. In each round of structure refinement, newly computed NMR structures were employed to assign more NOE restraints, to correct wrong assignments, and to loosen the NOE distance bounds where spectral overlap was inferred. Then another round of structure refinement was carried out with the modified NMR restraints. All structures converged after several rounds of such refinement. An ensemble of 20 structures was selected according to their best fit to the experimental NMR restraints and the low values of their total energies.
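The two energy terms named above can be illustrated with a short Python sketch. It is not X-PLOR code. The square-well term is written in its common harmonic-outside-the-bounds form, and the repulsive term in the commonly quoted quartic form k (max(0, (s R_min)^2 - R^2))^2, which is consistent with the Å^-4 units of the force constant; both functional forms are assumptions here, while the force constants and the 0.80 radius scale are taken from the text. Function names are hypothetical.

```python
# Illustrative sketch of the restraint and repulsion energy terms (not X-PLOR code).

K_NOE = 50.0        # kcal mol^-1 Å^-2, from the text
K_REPEL = 4.0       # kcal mol^-1 Å^-4, from the text
RADIUS_SCALE = 0.80  # scale applied to the CHARMM radii, from the text


def square_well_energy(distance, lower, upper, k=K_NOE):
    """Zero inside [lower, upper]; harmonic penalty outside (assumed form)."""
    if distance > upper:
        return k * (distance - upper) ** 2
    if distance < lower:
        return k * (lower - distance) ** 2
    return 0.0


def repel_energy(distance, r_min, k=K_REPEL, scale=RADIUS_SCALE):
    """Quartic soft repulsion (assumed form); zero once atoms are farther apart than scale * r_min."""
    overlap = (scale * r_min) ** 2 - distance ** 2
    return k * max(0.0, overlap) ** 2


def violation(distance, lower, upper):
    """Distance by which a restraint is violated, in Å (0.0 if satisfied)."""
    return max(0.0, distance - upper, lower - distance)


# Example: a strong NOE restraint (1.8-2.7 Å) evaluated at 3.0 Å.
print(square_well_energy(3.0, 1.8, 2.7), violation(3.0, 1.8, 2.7))
```

The flat bottom of the well reflects the fact that NOE-derived distances are only approximate: any distance inside the bounds is treated as equally consistent with the data.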

RESULTS
Total sequential resonance assignments of the isolated domain 1 were achieved by the combined analysis of two-dimensional and three-dimensional NMR data, including three-dimensional HNCACB, CBCA(CO)NH, and HCCH-TOCSY. The sequential assignments of the backbone and side-chain amide resonances are shown in Fig. 2. Stereospecific assignments were made for about 70% of β-methylene protons and the methyl groups of valine and leucine residues based on qualitative estimations of 3J_αβ constants from the DQF-COSY spectrum in conjunction with the NOE data (46).
A total of 1,099 structurally useful distance restraints were obtained from the analyses of the homonuclear two-dimensional NOESY (D2O) and three-dimensional 1H-15N NOESY-HSQC spectra (Table I), 707 of which were medium and long range NOEs. On average, each residue had ~15 NOE restraints. A superposition of 20 calculated structures with no NOE restraint violations above 0.5 Å is shown in Fig. 3a. The statistics of the structures are summarized in Table I. The precision of the structures (RMSD of the ensemble of the 20 NMR structures from its mean coordinates) was 0.57 Å for the backbone (N, Cα, C', O) and 1.11 Å for all heavy atoms. The distribution of the average backbone RMSDs is shown in Fig. 4a. The structure of domain 1 consists of five α-helices: helix A, residues 5-15; helix B, residues 22-30; helix C, residues 34-47; helix D, residues 52-58; and helix E, residues 63-70 (numbering according to the isolated domain 1). Helices A, B, D, and E are assembled in a bundle with two nearly parallel helix-loop-helix motifs. Helix C lies approximately perpendicular to the helical bundle with one end close to the N terminus and the other to the C terminus of domain 1.
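The precision figure quoted above (RMSD of each conformer from the mean coordinates, reported as mean ± spread) can be computed as in the following Python sketch. It assumes that the conformers have already been superimposed on a common frame, so no fitting step is included, and it assumes that the ± value is a sample standard deviation; neither detail is stated in the text. The toy coordinates and function names are hypothetical.

```python
# Illustrative sketch of ensemble precision: RMSD of each model from the mean coordinates.
# Assumption: all models are already superimposed (no least-squares fitting is done here).
import math
from statistics import mean, stdev


def mean_coordinates(ensemble):
    """Average each atom's (x, y, z) over all models."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][axis] for model in ensemble) / n_models for axis in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_precision(ensemble):
    """Return (mean, sample standard deviation) of per-model RMSDs to the mean structure."""
    average = mean_coordinates(ensemble)
    values = [rmsd(model, average) for model in ensemble]
    return mean(values), stdev(values)


# Toy ensemble: two models of two atoms, displaced by 2 Å along z.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
print(ensemble_precision(toy))
```

In practice the backbone and heavy-atom values differ only in which atoms are passed in; the same routine applies to both selections.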

DISCUSSION
Comparison with the Crystal Structure of Human Annexin I-The structure of a truncated human annexin I has been determined by x-ray crystallography in the presence of 10 mM CaCl2 (18). The truncated annexin I lacks the N-terminal 32 residues but has all four domains intact (Fig. 1a). Six calcium ions are found to bind to the truncated annexin I, two each in domains 1 and 4 and one each in domains 2 and 3. The solution structure of the isolated domain 1 is highly similar to the corresponding part of the crystal structure of the truncated annexin I containing all four domains. Thus, the minimized average NMR structure of the isolated domain 1 can be superimposed very well on the corresponding x-ray structure, as shown in Fig. 3b. There are differences of 1-2 residues in the lengths of some helices, but the five helices are assembled in the same way. The distribution of the average backbone RMSDs of the ensemble of the 20 NMR structures from the corresponding x-ray structure is shown in Fig. 4b. The largest differences are found at the N terminus and in the AB loop. It should be noted [...] that constitutes the second site is essentially the same as that found in the crystal structure, probably because the second site has lower affinity for Ca2+ than the first site.
Implications for Protein Folding-As described earlier, the four domains of annexin I are highly homologous in structure when folded together (Fig. 1, a and b). The hydrophobic cores are highly conserved among all annexin domains. Surprisingly, isolated domain 2 is largely unfolded in aqueous solution and thus is not an independent folding unit (15). Its helical content is less than 25% compared with ~80% when the domain is folded together with the rest of the protein. In contrast to domain 2, our work presented here clearly demonstrates that the isolated domain 1 is fully folded in solution with little change in structure from that in the native state, and thus constitutes an autonomous folding unit. The results raise the interesting question of why domains with high sequence and structural homology exhibit totally different folding behaviors.
The failure of the isolated domain 2 to form its native structure is likely because of the removal of the interdomain interactions that exist in the whole protein. As mentioned earlier, according to the crystal structure of annexin I (18), domains 2 and 3 form a modular structure with many hydrophobic interactions, and so do domains 1 and 4. Because the isolated domain 1 folds without its hydrophobic partner, domain 4, it is unlikely that the removal of the hydrophobic contacts with domain 3 is the cause of the folding failure of the isolated domain 2. By elimination, then, the removal of the interactions with domain 4 may be the cause of the failure of the isolated domain 2 to fold to its native structure. Indeed, there are many interactions between domain 2 and domain 4, as shown in Fig. 5. This explanation is supported by the NMR studies of the isolated domain 2 and its component helices A and B (14,15).
It has been shown by NMR that a stable nonnative N-terminal cap, with the sequence F91 D92 A93 D94 E95 L96 (numbering according to the crystal structure of the truncated annexin I), is formed in helix A in a peptide fragment containing helices A and B of domain 2 (14). With the carboxyl groups of Asp-92 and Glu-95 hydrogen-bonded to their reciprocal backbone amides and many hydrophobic contacts between Phe-91 and Leu-96, it is a canonical N-terminal cap (47,48). Furthermore, the nonnative cap persists in isolated domain 2 (15,16). It has been suggested that the nonnative N-terminal cap serves as a very potent initiation site for folding (14). However, it may be more likely that the formation of the nonnative N-terminal cap prevents the isolated domain 2 from reaching the native state, for two reasons, although its role in the folding of the entire annexin I is not known. First, it disrupts a pair of hydrogen bonds between the carboxyl group of Asp-92 and the guanidinium group of Arg-117 that help to lock helices A and B in place (18) (Fig. 5). The breakage of these hydrogen bonds also makes it possible for Arg-117 to form nonnative salt bridges as found in the isolated domain 2 (16). Second, as shown in Fig. 5, in the native structure, Leu-96 is roughly at the center of the hydrophobic core. It is surrounded by as many as seven core residues: Met-100 from helix A; Leu-110, Ile-113, and Ile-114 from helix B; Ile-125 and Tyr-129 from helix C; and Leu-137 from helix D. On the other hand, the side-chains of Phe-91 and Leu-96 are >10 Å apart. Thus, the nonnative hydrophobic interactions between Phe-91 and Leu-96 in the isolated domain may not only take the side-chain of Leu-96 out of the hydrophobic core but also disrupt the packing of the other hydrophobic core residues. The nonnative conformation of the isolated domain 2, however, may not necessarily have a lower energy than the native conformation. The nonnative N-terminal cap may act as a kinetic trap that keeps the isolated domain 2 from reaching the native structure.
Why does the nonnative N-terminal cap form in the isolated domain 2? The separation of domain 2 from the rest of the protein has two structural consequences that may bear on the formation of the nonnative N-terminal cap, as shown in Fig. 5. First, [...] cluster of negatively charged residues without positively charged partners, including Glu-95, Asp-106, Glu-107, Asp-108, and Glu-112. The carboxyl group of Glu-95 is ~6.7 Å away from that of Asp-106 and ~7.1 Å away from that of Glu-112. It is likely that the negative charge potential generated by the cluster of acidic residues pushes away the carboxyl group of Glu-95 so that it forms a hydrogen bond to the backbone amide of Asp-92. Second, Phe-91 is almost completely buried in the whole protein but its side-chain becomes mostly exposed to solvent in the isolated domain 2. Thus, Phe-91 in the isolated domain 2 seeks hydrophobic partners, and it finds Leu-96. It is noted that Phe-91 and Glu-95 are replaced by a serine and an alanine, respectively, in domain 1 (Fig. 1c). Therefore, the nonnative N-terminal cap is unlikely to form in the folding process of the isolated domain 1. The hypothesis may be tested by replacing Phe-91 and Glu-95 of domain 2 with the corresponding amino acids of domain 1 by site-directed mutagenesis. Refolding at a higher salt concentration may also help the isolated domain 2 to reach the native conformation by reducing the effects of the negative charges of the cluster of acidic residues and strengthening the hydrophobic interactions to drive formation of the hydrophobic core.
For multidomain proteins, the formation of a native structure requires not only the correct folding of each domain but also the appropriate assembly of the domains via interdomain interactions. However, little is known about the roles of interdomain interactions during the folding process. As discussed above, interdomain interactions may play a critical role in the folding of domain 2 of annexin I. It is interesting to note that among the four domains of annexin I, only domain 1 is folded and soluble when expressed in E. coli. Domain 2 is soluble but largely unfolded. Expression of separated domains 3 and 4 in E. coli results in inclusion bodies (data not shown). It has been reported that domain 3 is easily degraded, but domain 4 forms inclusion bodies when expressed as fusion proteins of glutathione transferase (17). It appears that only domain 1 is an autonomous folding unit, although it is not known at present whether domains 3 and 4 can be solubilized and refolded to their native structures. As described earlier, annexin I is composed of two modules. One module consists of domains 1 and 4, and the other domains 2 and 3. Each module has a hydrophobic interface between its constituents. The two modules are assembled with mostly hydrophilic interactions between domains 2 and 4. It is tempting to speculate that folding of annexin I follows a sequential process with domain 1 as an autonomous initial folding unit. The folded structure of domain 1 facilitates the folding of domain 4 through the hydrophobic interface. Then, the hydrogen bonds and hydrophobic interactions between domains 4 and 2 help domain 2 to get rid of the nonnative cap and reach the native structure. Domain 2, in turn, assists the folding of domain 3 through many hydrophobic interdomain interactions. This proposal can be tested by systematic studies of the folding properties of the entire protein and separated domains of annexin I.