The Structure of the Cys-rich Terminal Domain of Hydra Minicollagen, Which Is Involved in Disulfide Networks of the Nematocyst Wall

The minicollagens found in the nematocysts of Hydra constitute a family of invertebrate collagens with unusual properties. They share a common modular architecture with a central collagen sequence ranging from 14 to 16 Gly-X-Y repeats flanked by polyproline/hydroxyproline stretches and short terminal domains that show a conserved cysteine pattern (CXXXCXXXCXXXCXXXCC). The minicollagen cysteine-rich domains are believed to function in a switch of the disulfide connectivity from intra- to intermolecular bonds during maturation of the capsule wall. The solution structure of the C-terminal fragment including a minicollagen cysteine-rich

Minicollagens of nematocysts in Hydra, corals, and other cnidaria are very unusual proteins with structural properties not shared by other invertebrate or vertebrate collagens. From both the N and C terminus of the collagen triple helix emerge three polyproline-II-type helices, which consist of 5-23 proline or hydroxyproline residues (1-3). Each of the polyproline-II-type helices is terminated by a small Cys-rich domain termed the minicollagen Cys-rich domain (MCRD). The N- and C-terminal MCRDs are homologous and share the cysteine pattern CXXXCXXXCXXXCXXXCC. A small propeptide region preceding the N-terminal MCRD is cleaved off during expression, and mature minicollagen has a rather symmetrical appearance with closely similar structural elements at both ends. This bipolar nature of minicollagen is unique among known collagens (4-7) and suggests a special function.
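The cysteine pattern quoted above can be written as a regular expression for scanning sequences. The following Python sketch is illustrative only: the function name is hypothetical, the spacer residues X are assumed never to be cysteine, and the test string is an invented toy sequence, not a minicollagen sequence.

```python
import re

# Pattern from the text: C XXX C XXX C XXX C XXX CC, where X is any residue.
# Assumption: X is never Cys, so each spacer is written as [^C]{3}.
MCRD_PATTERN = re.compile(r"C[^C]{3}C[^C]{3}C[^C]{3}C[^C]{3}CC")


def find_mcrd_cores(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched segment) for every pattern hit."""
    return [(m.start(), m.group()) for m in MCRD_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequence used only to exercise the pattern.
toy_sequence = "PPACAAACAAACAPACAAACCKK"
print(find_mcrd_cores(toy_sequence))
```

A spacing that deviates from three residues anywhere in the motif would not be matched by this strict form of the pattern.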
A unique function of minicollagen is also suggested by its restricted appearance in the capsule wall of nematocysts. Nematocysts are complex explosive organelles, which basically consist of a capsule, an inverted tubule armed with spines, and an operculum. The tubule is connected to the capsule wall and is twisted in many turns inside the osmotically charged capsule matrix. Following stimulation, the internal tube is expelled, the osmotic pressure is released, and the capsule contents including toxins are released at the tubule end. This specialized form of exocytosis proceeds with ultrafast rates and accelerations comparable to those of a fired bullet (8).
The capsule wall resists more than 150 atmospheres of osmotic pressure in the charged state. In Hydra nematocysts it consists mainly of two proteins, minicollagen and nematocyst outer wall antigen (NOWA) (9-11). The two proteins can be dissolved from capsule preparations only under reducing conditions. Even before the amino acid sequence of minicollagens was known, it was found that these collagens form disulfide cross-linked polymers that are insoluble in SDS but readily soluble in the presence of a reducing agent (12,13). Following the discovery of a family of minicollagens (3) and recombinant expression of minicollagen-1 (9) it was found that the proteins are expressed in a soluble precursor form present in the endoplasmic reticulum and post-Golgi vacuoles in Hydra. They are converted to the disulfide-linked assembly form of the nematocyst wall upon wall compaction, during which a dense and well defined capsule wall is formed (9). The morphological changes, a loss of accessibility to antibodies against minicollagen-1, and a parallel loss of solubility under non-reducing conditions suggested a close link between disulfide polymerization and the condensation of wall proteins. Both processes provide an explanation for the unusually high tensile strength of the mature nematocyst wall.
Minicollagen-1 of Hydra recombinantly expressed in mammalian cells contains intrachain disulfide bonds in its MCRDs but no interchain disulfide cross-links (9). The trimeric collagen molecules dissociated into single chains when heated to 45°C under non-reducing conditions (10). Reduction did not influence this transition temperature, indicating that only the collagen domain is responsible for trimerization. Recombinant minicollagen-1 was found to form aggregates in electron micrographs, but no disulfide bridges were formed spontaneously under in vitro conditions (9). This material behaved like the precursor form in vivo and could be solubilized by SDS or other denaturants under non-reducing conditions. It was therefore concluded that disulfide isomerases or other external parameters are required for a disulfide reshuffling process by which internal disulfide bridges are converted to intermolecular links (14).
More recent results suggest that the structural function of minicollagens in wall hardening is complemented by NOWA (11). Altogether 10 domains were identified in this protein, namely an SCP domain (see Smart data base, smart.embl-heidelberg.de, smart00198), a C-type lectin domain (CTLD, smart00034), and eight C-terminal domains with homology to the MCRDs. In particular, the pattern of six cysteines is shared by the corresponding domains in NOWA and minicollagen. The presence of homologous Cys-rich domains in both proteins suggested a joint function in disulfide-mediated polymerization. Supportive evidence was obtained by the analysis of breakdown products of native nematocyst capsules after limited sonication without reduction. Minicollagen and NOWA formed disulfide cross-linked units that, like the entire capsule, were readily dissolved under even mild reducing conditions. Disulfide reshuffling processes and the formation of disulfide-linked complexes are common mechanisms in the extracellular space, and some recently explored systems may be referenced (15-19). The minicollagen/NOWA system may stand as a prototype for the controlled formation of a highly stable matrix layer by disulfide linkage. To understand the assembly mechanism, the three-dimensional structure of the MCRDs involved is needed. The core of the MCRD consists of only 20 residues with 6 closely positioned Cys residues. At the start of this study it was not known whether these very small domains fold autonomously or are stable only as part of a larger structure. The disulfide connections were also not known and could not be explored by limited proteolysis because of the lack of suitable cleavage sites. These problems were approached by solving the structure of the C-terminal Cys-rich domain of minicollagen-1 by NMR spectroscopy. A new fold was found, with disulfide connections between Cys 2 and Cys 18, Cys 6 and Cys 14, and Cys 10 and Cys 19.
Oxidative folding occurs at high rates in the presence of an oxidized glutathione/reduced glutathione redox buffer with formation of almost exclusively one single isomer. A highly conserved Pro residue located between Cys 10 and Cys 14 induces and probably directs correct disulfide connections. The presented structure suggests likely candidates for the disulfide reshuffling that is thought to be a key reaction during nematocyst morphogenesis.
The lyophilized purified peptide was dissolved at a concentration of 0.02 mM in 25 mM Tris/HCl, pH 6.0, containing 100 mM NaCl. Folding was allowed to proceed under N2 for 72-96 h at 4°C. Oxidation was induced by changing the pH to 8.0-8.1 with saturated Tris and the addition of reduced and oxidized glutathione (10:1 molar ratio) to a final concentration of 1 mM. The solution was exposed to air at 4°C for 4-10 days. Oxidation was stopped by the addition of trifluoroacetic acid to give a pH of 1.3. The oxidized peptide was purified by preparative HPLC. The yield of oxidized peptide was 103 mg (16% of theoretical yield). Mass spectrometry of the oxidized peptide showed a molecular mass of 2585.1 Da.

NMR Data Collection
Structure 1-NMR experiments for conformational analysis were carried out at 283 K on Bruker DRX500, DMX750, and DX900 spectrometers using a 3 mM sample of the peptide dissolved in a H2O/D2O (9:1) mixture at pH 3.5. Resonance assignments were performed according to the method of Wüthrich (38). 173 experimental interproton distance constraints were extracted from two-dimensional nuclear Overhauser effect spectroscopy (39) experiments with mixing times between 75 and 200 ms. Five hydrogen bonds were identified from temperature shifts and 1H/2H exchange. Acceptor carbonyl groups were identified in initial structure calculations. The nuclear Overhauser effect intensities were converted into interproton distance constraints using the classifications of very strong, 1.7-2.3 Å; strong, 2.2-2.8 Å; medium, 2.6-3.4 Å; weak, 3.0-4.0 Å; very weak, 3.2-4.8 Å, and the distances of pseudo atoms were corrected as described by Wüthrich (38). Distance geometry and molecular dynamics-simulated annealing calculations were performed with the INSIGHTII 98.0 software package (Accelrys, San Diego, CA) on Silicon Graphics O2 R5000 computers (SGI, Mountain View, CA) as described recently (25). In brief, one hundred structures were generated by distance geometry and refined with molecular dynamics-simulated annealing steps. The experimental constraints were applied at every stage of the calculations. The coordinates and structural restraints have been deposited in the Brookhaven Protein Data Bank under accession number 1SOP.
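The conversion of nuclear Overhauser effect classes into distance ranges described above can be illustrated with a small lookup table. This Python sketch only restates the five classes from the text; the function name and the violation measure are hypothetical illustrations and do not represent the INSIGHTII calculation itself.

```python
# Distance ranges in Å for each NOE intensity class, as listed in the text.
NOE_BOUNDS = {
    "very strong": (1.7, 2.3),
    "strong": (2.2, 2.8),
    "medium": (2.6, 3.4),
    "weak": (3.0, 4.0),
    "very weak": (3.2, 4.8),
}


def restraint_violation(noe_class: str, distance: float) -> float:
    """Return how far (in Å) a model distance lies outside its allowed range.

    Hypothetical helper: 0.0 means the interproton distance satisfies the
    constraint; a positive value is the distance to the nearest bound.
    """
    lower, upper = NOE_BOUNDS[noe_class]
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


# Example: a 3.0 Å interproton distance satisfies a "medium" constraint,
# whereas 4.5 Å exceeds the upper bound of a "weak" constraint.
print(restraint_violation("medium", 3.0), restraint_violation("weak", 4.5))
```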
Structure 2-The chemically synthesized C-terminal MCRD of minicollagen-1 was purified after oxidation according to purification procedure 2. The determination of the NMR structure (PDB accession number 1SP7) was carried out in 5 mM sodium phosphate buffer, pH 6.5, at 15°C by using homonuclear and heteronuclear techniques and information from weak alignment (44). Structure representations were generated with MOLMOL (43).
Analytical Ultracentrifugation-Sedimentation equilibrium experiments were performed on a Beckman Optima XL-A analytical ultracentrifuge (Beckman Instruments) equipped with 12-mm Epon double-sector cells in an An-60 Ti rotor. The MCRD was analyzed in 5 mM Tris buffer, pH 7, either without salt or with 100 mM NaCl at 20°C. The peptide concentrations were adjusted to 0.2-0.8 mM. Sedimentation equilibrium scans were carried out at 48,000 rpm. Molecular masses were evaluated from ln A versus r² plots, where A is the absorbance at 277 nm and r is the distance from the rotor center. A partial specific volume of 0.73 ml/g was used for all calculations.
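For an ideal single species at sedimentation equilibrium, the slope of the ln A versus r² plot is related to the molar mass by M = 2RT/((1 − v̄ρ)ω²) · d(ln A)/d(r²). The following Python sketch shows this evaluation under stated assumptions: the rotor speed, temperature, and partial specific volume are taken from the text, whereas the solvent density of 1.0 g/ml, the function names, and the synthetic absorbance profile are assumptions introduced for illustration.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def fit_slope(radii_cm, absorbances):
    """Least-squares slope of ln(A) against r^2, in cm^-2."""
    xs = [r * r for r in radii_cm]
    ys = [math.log(a) for a in absorbances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def molar_mass(slope_cm2, rpm, temp_k, vbar=0.73, rho=1.0):
    """Molar mass in g/mol from the ln(A) vs r^2 slope.

    vbar (ml/g) is the value given in the text; rho (g/ml) is an assumed
    solvent density, not a value reported in the text.
    """
    omega = 2.0 * math.pi * rpm / 60.0   # angular velocity, rad/s
    slope_m2 = slope_cm2 * 1.0e4         # cm^-2 -> m^-2
    buoyancy = 1.0 - vbar * rho          # dimensionless buoyancy factor
    mass_kg_per_mol = 2.0 * R_GAS * temp_k * slope_m2 / (buoyancy * omega ** 2)
    return 1000.0 * mass_kg_per_mol


# Synthetic, noise-free profile (NOT measured data) with a known slope.
radii = [6.80, 6.85, 6.90, 6.95, 7.00]
true_slope = 0.36
absorb = [0.1 * math.exp(true_slope * (r * r - radii[0] ** 2)) for r in radii]

mass = molar_mass(fit_slope(radii, absorb), rpm=48000, temp_k=293.15)
print(f"apparent molar mass: {mass:.0f} g/mol")
```

A linear ln A versus r² plot is itself a diagnostic in this treatment: upward curvature would point to association or heterogeneity.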

The MCRD Constitutes a Conserved Sequence Module in Nematocyst Minicollagens and in NOWA-The sequence of minicollagen-1 with domain indications and the alignment of cysteine-rich domains from different minicollagens and NOWA are represented in Fig. 1. Minicollagen-1 consists of an N-terminal MCRD, an N-terminal polyproline region, a central collagen sequence, and a shorter C-terminal polyproline stretch followed by a C-terminal MCRD (Fig. 1A). The preceding propeptide is cleaved off during the recombinant expression of minicollagen-1 and probably also in Hydra (9). As already mentioned in the Introduction, the overall sequence homology of the Cys-rich domains is not very high, but the cysteine pattern is identical for all minicollagens and for the eight C-terminal domains of NOWA, with the only variation being in the number of residues spacing the first two Cys residues. The sequence of the C-terminal minicollagen-1 MCRD, which has been investigated in the present study, is underlined, and the numbering of residues corresponds to the synthetic peptides used in this study (Fig. 1B). Besides the cysteines there is only one conserved residue, which is Pro 12 (Fig. 1B, shown in purple).
Peptide Synthesis and Oxidation-As the MCRD occurs in different molecular contexts, at the N- and C-terminal extensions of minicollagen molecules as well as eight times repeated at the C terminus of NOWA, we speculated that it might constitute an isolated domain with the capacity of independent folding. Peptide synthesis was carried out for the C-terminal MCRD of minicollagen-1 starting with the last proline residue of the C-terminal polyproline stretch and including the charged C terminus of the full-length molecule. Disulfide bonds were formed in the presence of a redox buffer at a peptide concentration of 100 μM to avoid aggregation by intermolecular disulfide bonds (see "Experimental Procedures"). The final product showed a single peak in mass spectrometric analysis, with the reduced and oxidized MCRDs differing in molecular mass by 6 Da, thereby strongly indicating the formation of three intramolecular disulfide bonds (Table I). The mass spectrometric analysis thus showed all disulfide bonds to be intramolecular. Analytical ultracentrifugation confirmed the absence of significant aggregation or multimerization in solutions from 0.2 to 0.8 mM total peptide concentration (Table I), which was further supported by 1HN NMR T2 relaxation times of more than 100 ms at 25°C, indicative of the prevalence of a monomeric state in solutions of 1.6 mM MCRD.
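The 6-Da difference reported above follows from simple arithmetic: each disulfide bond formed from two thiol groups removes two hydrogen atoms. The Python sketch below makes this explicit; the function names are hypothetical, and the average atomic mass of hydrogen (1.008 Da) is the only constant used.

```python
H_MASS = 1.008  # average atomic mass of hydrogen, Da


def disulfide_mass_shift(n_disulfides: int) -> float:
    """Mass lost (Da) when n disulfide bonds form; each bond removes 2 H."""
    return 2 * H_MASS * n_disulfides


def disulfides_from_shift(delta_da: float) -> int:
    """Estimate the number of disulfide bonds from a reduced/oxidized mass difference."""
    return round(delta_da / (2 * H_MASS))


# A 6-Da difference between reduced and oxidized peptide corresponds to
# three disulfide bonds, that is, all six cysteines are oxidized.
print(disulfides_from_shift(6.0), disulfide_mass_shift(3))
```

The mass shift alone counts disulfide bonds; the intramolecular assignment rests in addition on the product appearing at the monomer mass, since a disulfide-linked dimer would be detected at roughly twice that value.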
The NMR Structure of the MCRD-The solution structure of the MCRD was determined independently from two separate peptide preparations (see "Experimental Procedures"). Both structures show an identical tightly packed globular fold (Figs. 2 and 3), which consists of a short N-terminal α-helix between Val 5 and Gln 9 followed by an inverse γ-turn (Gln 9-Val 11), a type I β-turn (Val 11-Cys 14), and a type III β-turn (Pro 15-Cys 18). Thus cysteines 6, 10, 14, and 18 are directly located in turns, whereas cysteine 2 is located in a proline-rich N-terminal sequence, and cysteine 19 is oriented presumably by the β-III turn and a C-terminal proline. All proline residues in the MCRD are in trans conformation as evidenced by specific nuclear Overhauser effects.
The only conserved residue, Pro 12 (Fig. 1B), imposes a β-I turn topology on residues 11-14 because of its fixed φ angle of −60°.
FIG. 1 (legend fragment). MColac, minicollagen of Acropora cervicornis (1); MColap, minicollagen of Acropora palmata (1); N and C, N- and C-terminal. The sequence of the Cys-rich region of NOWA in Hydra (NWh) starts with repeat 1 and terminates with repeat 8. Residues in MCol1hC are numbered starting at the proline preceding the first cysteine, and the same numbering was used in the NMR structures. The highly conserved cysteine residues are marked in red. Proline in position 12, which is conserved with two exceptions, is marked in purple. The sequence of the synthesized and investigated peptide is underlined.
(HN), and Pro 15 (O) and Cys 18 (HN) (Fig. 3), respectively, as derived both from the calculated structures and from the slow 1H/2H exchange upon lyophilization of MCRD and redissolving in 2H2O. The positively charged C terminus of Lys 22, Arg 23, and Lys 24 is flexibly disordered and does not contribute to the MCRD structure (Fig. 2).
The first disulfide bond, Cys 2-Cys 18, clasps the N and C termini of the domain, whereas the Cys 6-Cys 14 bond connects the N-terminal α-helix to the type I β-turn starting with Val 11, thus forming the core of the MCRD structure. The third disulfide bond, Cys 10-Cys 19, is more exposed to the C-terminal surface of the domain and constrains the polypeptide backbone into two consecutive turns. The more surface-exposed N- and C-terminal disulfide bridges represent the most likely candidates for intermolecular disulfide exchange reactions. Complete reduction of the disulfide bonds with an excess of Tris(2-carboxyethyl)phosphine hydrochloride at pH 7.5 results in complete unfolding, with no local conformational preferences retained, as assessed by 1H NMR.
Reoxidation of reduced MCRD with an excess of oxidized glutathione at pH 7.5 proceeds extremely fast and is completed within 2 h as observed by 1H NMR spectroscopy (Fig. 4). This is presumably because of the small domain size and the effect of Pro 12 upon disulfide bridge formation (see Discussion). Fluorescence spectroscopy allows the refolding to be monitored through the quenching of Tyr 17 fluorescence by a nearby disulfide bridge and yields a folding half-time of 1.5 h at pH 8 upon oxygen saturation of the solution, in agreement with NMR data (not shown).
Model of Trimeric Minicollagen-1-It has already been mentioned that minicollagen-1 under native conditions appears to be a non-covalent trimer. A schematic representation of the trimeric minicollagen molecule including the connections of the N- and C-terminal MCRDs to the polyproline type II helices is shown in Fig. 5. A model of trimeric minicollagen-1 was proposed earlier (9) in which the MCRD was assumed to have a linear structure. After elucidation of the structure of the MCRD we are now able to draw a schematic model to scale with the known dimensions of the collagen triple helix and the polyproline-II helix (4). An open question is the geometry of the connections between the collagen and polyproline parts of the molecule and between MCRDs and polyprolines, respectively. The angles and flexibility at the junction sequence APLP in the N terminus, at the single Ala spacer in the C terminus (Fig. 1, shown in black), and at any Pro residue that can potentially adopt the cis conformation are not known. The lack of association between MCRDs even at relatively high concentrations suggests that they also do not associate in minicollagen without the help of an isomerase or another catalytic system. Polyproline and polyhydroxyproline-II helices are also known to be monomeric. For these reasons the polyproline arms with their MCRD heads are displayed as non-interacting entities in Fig. 5. This is consistent with previous biochemical data showing that minicollagen-1 trimers are not disulfide-cross-linked and that the triple helix is stabilized solely by the collagen sequence composition (10).
The ways in which MCRDs are linked to the main molecule are shown schematically (Fig. 5, red arrows) in the upper (N-terminal) and the lower (C-terminal) part of Fig. 5. The polyproline-II helices approach the MCRDs from the N terminus for the C-terminal domain and from the C terminus for the N-terminal domain. The connections between the polyproline-II helices and the MCRDs might alter the accessibility of particular disulfide bridges in the MCRD. As can be seen from the comparison of Figs. 3 and 5, in the N-terminal MCRD the Cys 10-Cys 19 disulfide is less exposed. The opposite situation is observed in the C terminus, where the polyproline-II stretch hides the Cys 2-Cys 18 bridge. Accordingly the suggested candidates for intermolecular disulfide exchange are Cys 2-Cys 18 for the N-terminal and Cys 10-Cys 19 for the C-terminal MCRD.

DISCUSSION
Small Cys-rich domains are widely distributed building blocks of extracellular proteins in essentially all phyla including plants and bacteria. The most abundant domain type is the epidermal growth factor domain (EGF, smart00181). Including its variants EGFca (smart00179) and EGF-like (smart0001) several thousands of different EGF-domains are known in proteins of different functions. In most cases EGF domains are arranged in arrays with other domains. Laminin (20) and fibrillin (21) are two of very many examples. Many Cys-rich domains, however, also exist as single autonomous proteins, and the epidermal growth factor domain is a well known example. This fact provides a bridge to the numerous low molar mass Cys-rich proteins, which are also products of larger precursor forms but express their function as toxins, antimicrobial peptides, or other small bioactive agents. Variations of sequences, cysteine patterns, and three-dimensional structures are, however, very large for this diverse class of small proteins, and clear homologies exist for small groups only (22). Small Cys-rich peptides with antimicrobial or toxic functions are part of the innate immunity and defense system of invertebrate animals. They are often stored in secretory granules and released in response to parasites via exocytosis. Nematocyst discharge in cnidarians represents a specialized form of exocytosis from a giant post-Golgi vesicle. The appearance of a structural motif related to Cys-rich peptide toxins in proteins involved in the formation of a wall polymer, which is associated with the nematocyst membrane, might hint at a phylogenetic link between this group of defensive molecules and the evolution of the nematocyst.
The common feature of small Cys-rich domains is the prevalence of disulfide bridges in structure stabilization. This is already suggested by the large fraction of cysteines, which is 4/22 for gomesin (23), about 6/40 for EGF domains, and 8/42 for crambin (24) and hellethionin (25), as compared with 6/20 for the Cys-rich domain of minicollagen-1. The latter domain has one of the highest ratios of cysteines to total residues known so far. Comparably dense cysteine patterns are only described for small conotoxin peptides like PnIVA and PnIVB (26). The peptide whose structure was elucidated in the present work is four residues longer than the essential core. The structure shows that the first proline and the last three residues are randomly oriented and probably not required for correct oxidative folding. The remaining structure is well defined, and the data demonstrate formation of a single topoisomer under optimized oxidative conditions.
In the reduced state no conformational preferences were found in the MCRD by NMR. Similar observations were reported for many other Cys-rich domains underlining the importance of disulfide bonds for stabilization. As shown for tachyplesin mutational replacements of Cys residues by alanine lead to a loss of structure (27). Interestingly the global fold of tachyplesin was rescued by hydrophobic interactions between pairs of tyrosines or phenylalanines when cysteines were replaced by these residues (27).
Although the equilibrium structure of MCRD is predominantly determined by the covalent interactions between Cys residues, kinetic intermediates must be responsible for correct disulfide pairings. A folded-precursor mechanism (28,29,36) and a quasi-stochastic mechanism (30,31) were proposed for other monomeric proteins. True pathways are probably between these two extremes. For the Cys-rich domain of minicollagen, formation of a type I β-turn by Pro 12 and hydrogen bonding between Cys 14 (HN) and Val 11 (O) is most likely a very important intermediate step. As can be seen from the structure (Figs. 2 and 3) this bend and the γ-turn bring the cysteines into close vicinity for the correct intramolecular disulfide bridging.
It is remarkable that a few non-covalent interactions in the MCRD of minicollagen and in many other short peptides lead to correct connectivities between several Cys residues, which statistically could also interact in very many different intra- and intermolecular modes. To establish the disulfide connections of MCRD, pairwise bridges between adjacent Cys residues have to be prevented. The three intervening amino acid residues found in MCRDs render pairwise bridges unfavorable (32), and this feature is most likely essential for proper folding.
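The statistical argument can be made concrete for the purely intramolecular case. The following Python sketch enumerates every way of pairing the six cysteines of the MCRD and then discards pairings that contain a bridge between cysteines that are neighbors in the sequence. Reading "adjacent" as "consecutive in the order of cysteines along the chain" is an interpretation adopted here for illustration, and the function names are hypothetical.

```python
# Cysteine positions and native disulfide bonds as reported for the MCRD.
CYS_POSITIONS = [2, 6, 10, 14, 18, 19]
NATIVE = {(2, 18), (6, 14), (10, 19)}


def pairings(items):
    """Yield every way to split a list into unordered pairs (perfect matchings)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def has_neighbor_bridge(pairing, positions):
    """True if any bond links two cysteines that are consecutive in the sequence."""
    index = {pos: i for i, pos in enumerate(positions)}
    return any(abs(index[a] - index[b]) == 1 for a, b in pairing)


all_pairings = list(pairings(CYS_POSITIONS))
allowed = [p for p in all_pairings if not has_neighbor_bridge(p, CYS_POSITIONS)]
native_is_allowed = any(set(p) == NATIVE for p in allowed)
print(len(all_pairings), len(allowed), native_is_allowed)
```

Under this reading, six cysteines can form 15 fully oxidized intramolecular isomers, of which 5 remain once bridges between neighboring cysteines are excluded, and the native connectivity is among them. Intermolecular combinations, which the text also mentions, are not counted here.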
In small disulfide-rich proteins identical folds are found in homologous domains in contrast to large variations in non-cysteine residues. Striking examples are the ~200 solved structures of different EGF domains, which exhibit the same fold, although sequence identities of non-cysteine residues are difficult to detect (Smart data base, key words EGF and structure). This high conservation of structure leads us to assume that the related Cys-rich repeats in the minicollagen of corals and in NOWA proteins have the same global fold as the MCRD of minicollagen-1 of Hydra, which was investigated in this study. Conservation of residues is low (Fig. 1) and would not allow the prediction of structure identities for conventional globular proteins (33).
Earlier data on the interaction of minicollagens and NOWA in the nematocyst wall conclusively demonstrated a switch from a soluble state of both proteins with all disulfides internally linked to an insoluble dense polymeric state with intermolecular disulfide bonds (see Introduction). Analogous assembly processes involving disulfide reshuffling were reported for virus capsid proteins (19), collagen IV (34), and other systems (17,18). The only cysteines in minicollagen-1 are located in the MCRDs. Consequently a disulfide reshuffling interaction was proposed between these domains. The data also suggested corresponding interactions between the Cys-rich domains of minicollagen and NOWA. In vivo the situation may be rather complex because of possible interactions between N- and C-terminal domains of different minicollagens in the same organism. At each end of a minicollagen three MCRDs are present (Fig. 5). This opens the possibility of a simultaneous interaction between minicollagens and NOWA.
Extensive searches of the data base failed to reveal MCRD-like domains in non-nematocyst proteins. The domain type seems to be highly specialized for disulfide-mediated cross-linking of the nematocyst wall. For several organisms with nematocyst organelles minicollagen and NOWA genes have been found, but the sequencing of cnidaria genes is still too incomplete to allow a general conclusion. Interestingly, in the minicollagen of corals (AF507373.1) Cys residues frequently occur after glycines in the collagenous part of the sequence. Cysteine residues that interlink collagen chains of the same molecule are located at the ends of a triple helix to form a stable disulfide knot (35,36) or in interruptions of the regular Gly-X-Y repeat (37). Cys residues in the X position of collagens are highly unusual and are most likely used to link collagen molecules to each other. It may be speculated that the nematocyst walls of corals are cross-linked by disulfide bridges between collagen triple helices in addition to those between MCRDs, thus providing a higher tensile strength.
Future work will focus on the mechanism of the disulfide reshuffling process using isolated Cys-rich domains of minicollagen and NOWA. A likely possibility is the interaction between N- and C-terminal domains of minicollagens. In vivo, the disulfide exchange reaction probably is catalyzed by disulfide isomerases or other helper proteins, which have to be defined prior to in vitro studies. Our working hypothesis is that only one or two of the three disulfide bonds will be reshuffled from intra- to intermolecular. Likely candidates are the bonds Cys 2-Cys 18 and Cys 10-Cys 19, which are both surface-exposed.