Tough Tendons

The primary structure of the α-chain of preCol-D (molecular mass = 80 kDa), a tanned collagenous protein predominating in the distal portion of the byssal threads of the mussel Mytilus edulis, was deduced from cDNA to encode an unprecedented natural block copolymer with three major domain types: a central collagen domain flanked by fibroin-like domains and followed by histidine-rich termini. The fibroin-like domains have sequence motifs that strongly resemble the crystalline polyalanine-rich and amorphous glycine-rich regions of spider dragline silk fibroins. The terminal regions resemble the histidine-rich domains of a variety of metal-binding proteins. The silk domains may toughen the collagen by increasing its strength and extensibility. PreCol-D expression is limited to the mussel foot, which contains a longitudinal gradient of preCol-D mRNA. This gradient increases linearly in the proximal-to-distal direction and reaches a maximum just before the distal depression of the foot.

Marine mussels Mytilus edulis produce byssal threads to attach to solid surfaces in the turbulent intertidal zone. Each thread is stiff at one end and extensible at the other. Molecular gradients correlated with a stiff-to-extensible mechanical gradation exist along the length of each thread (1,2). Byssal threads (Fig. 1A) are often compared with tendons on the basis of their construction from anisotropically oriented bundles of collagen fibrils (3). Indeed, the distal portion of byssus and tendon share a roughly similar tensile strength and initial stiffness (4), but the similarity ends there. Byssal threads are at least five times more extensible and five times tougher than Achilles tendon (5). Moreover, unlike tendon, byssal threads have a nonperiodic microstructure and shrinkage and melting temperatures in excess of 90°C (6). Understanding how byssal collagen differs from other collagens is a critical first step to appreciating its role in the material performance of byssus.
The characterization of byssal collagens has long been thwarted by the intractable cross-linked nature of byssus. Two independent lines of evidence, however, have hinted at an unusual possibility: that byssal collagen may be endowed with some qualities of a silk fibroin-like structure. The first evidence is based on wide-angle fiber x-ray diffraction of byssal threads of Mytilus edulis. The distal part consistently exhibits a typical collagen diffraction pattern with the notable addition of strong equatorial reflections at 4.5-4.6 Å and nonaxial arcs at 3.7-3.8 Å (2, 7). These prompted Rudall (7) to suggest the presence of a β-phase in the byssal collagen or an additional β-protein in the distal portion of byssus.
In the second line of evidence, byssal threads of M. edulis were subjected to acid extraction coupled with extensive pepsinization. This approach solubilized two collagenous fragments, Col-P and Col-D (both apparently homotrimers), predominating at the proximal and distal end of each thread, respectively (1). Precursors of these fragments, preCol-P (apparent molecular mass 95 kDa) and preCol-D (apparent molecular mass 97 kDa), were identified in foot extracts using specific polyclonal antibodies. The amino acid composition for the N- and C-extensions in preCol-D resembled silk fibroins in containing glycine plus alanine levels in excess of 60 mol % (1). Here we report on a cDNA-deduced primary sequence of α-preCol-D, a natural block copolymer with collagen, silk fibroin, and putative cross-linking domains.

EXPERIMENTAL PROCEDURES
Partial Characterization of PreCol-D-Common mussels M. edulis were collected at Union, Maine, or Lewes, Delaware. Feet were amputated from freshly shucked mussels and frozen over dry ice. The distal half of approximately 50 mussel feet was dissected and extracted with one of the following: 5% v/v acetic acid at 4°C with protease inhibitors (1), 5% acetic acid without inhibitors, or 5% acetic acid with pepsin at 15°C for 24 h at protein-to-enzyme (wt/wt) ratios of 20:1 or 10⁴:1. PreCol-D or preCol-D-derived proteins in each extraction were identified by Western blotting using specific polyclonal antibodies following SDS-polyacrylamide gel electrophoresis (1). The first extraction recovers intact preCol-D. Its amino acid composition and N-terminal sequence were determined directly from protein bands electroblotted to polyvinylidene difluoride membranes (1). The protein, however, was intractable to further purification. The second extraction contains a stable, partially degraded intermediate of preCol-D (apparent molecular mass 70 kDa), which was purified by chromatography on Sephadex G-200 (eluted with 5% acetic acid), followed by C-4 reversed phase HPLC (250 × 7-mm column; Applied Biosystems, Foster City, CA) eluted with a gradient (28-48% acetonitrile and 0.1% trifluoroacetic acid) (1). The third extraction contains pepsin-resistant fragments of preCol-D recovered from digests. Fragments were purified by C-4 HPLC using a linear acetonitrile gradient (42-56% over 60 min) with 0.1% v/v trifluoroacetic acid and partially sequenced. Pepsinization gives rise to enzyme concentration-dependent collagenous fragments Col-D (20:1), Col-D′ (10⁴:1), and a noncollagenous peptide NC-D′ (10⁴:1) (see Table I). Finally, several peptides were produced by clostridial collagenase digestion of Col-D at a protein-to-enzyme ratio of 100:1. These were purified by C-18 reversed phase HPLC using a linear gradient of acetonitrile (0-22% over 50 min) with 0.1% trifluoroacetic acid and sequenced (1,8).
One additional extraction of Col-D was carried out on byssal threads. These (1 g wet weight) were harvested within 6 h of secretion by mussels maintained in running seawater held at 12°C. Col-D was liberated by pepsin digestion of mechanically disrupted threads and electroblotted to polyvinylidene difluoride membranes following separation on SDS-polyacrylamide gel electrophoresis (1). The blotted protein was subjected to N-terminal sequencing by automated Edman degradation (8).
* Supported by Grants from the National Institutes of Health (DE 10042) and the Office of Naval Research (N00014-96-1-0812 and N00014-96-1-1205) (to J. H. W.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF029249.
Peptide masses were determined by matrix-assisted laser desorption-ionization time-of-flight mass spectrometry using a PerSeptive Biosystems Voyager model in the positive ion mode with delayed extraction. Peptides and proteins were co-crystallized on sample plates at a molar ratio of 10³:1 sinapinic acid to protein. About 250 shots were summed for a total irradiance of 0.6 to 2.0 μJ at 337 nm in vacuo (1 × 10⁻⁷ torr). Accelerating voltage, grid voltage, and guide wire voltage were 25,000 V, 90%, and 0.30%, respectively.
Screening the cDNA Library for PreCol-D-A cDNA library prepared from the foot of M. edulis (ZAP expression vector, Stratagene, La Jolla, CA) was screened by PCR with a degenerate antisense primer 5′-TAYTGYTTRTTDCCRTT-3′ (synthesized by Operon Technologies, Alameda, CA) based on a sequence (NGNKQY) near the N terminus and a vector-specific universal sense primer Bkr 5′-ACAGGAAACAGCTATGACCTTG-3′ (Stratagene). The PCR reaction was carried out in 50 μL of 1× universal buffer with 10 pmol of each primer, 250 μmol of each dNTP, 3 μL of the cDNA library (4.4 × 10⁸ plaque-forming units), and 2.5 units of Taq polymerase (Stratagene) for 30 cycles on a Robocycler (Stratagene). Each cycle consisted of 30 s at 94°C, 30 s at 50°C, and 30 s at 72°C, with a final extension of 1 min. Thus, a 201-bp product including a 79-bp insert was obtained. The products were subcloned into a PCR TA vector (TA Cloning Kit, Invitrogen, Carlsbad, CA) and transformed into a competent bacterial host (Invitrogen) for amplification, purification, and further analysis. The insert encodes the signal peptide and N terminus of preCol-D.
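The relationship between the peptide NGNKQY and the degenerate antisense primer can be reproduced with a short sketch. It assumes the standard genetic code and IUPAC ambiguity codes; the function names are illustrative and are not taken from the paper.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (the complement of the base set it covers).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Degenerate codons (standard genetic code) for the residues in NGNKQY only.
DEGENERATE_CODON = {"N": "AAY", "G": "GGN", "K": "AAR", "Q": "CAR", "Y": "TAY"}


def back_translate(peptide):
    """Return the degenerate sense-strand sequence encoding the peptide."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide)


def reverse_complement(seq):
    """Return the reverse complement of an IUPAC-coded sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def degeneracy(seq):
    """Number of distinct plain sequences represented by a degenerate primer."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def is_covered_by(primer, template):
    """True if, position by position, the primer allows only bases the template allows."""
    return len(primer) == len(template) and all(
        set(IUPAC[p]) <= set(IUPAC[t]) for p, t in zip(primer, template)
    )


sense = back_translate("NGNKQY")
antisense = reverse_complement(sense)
paper_primer = "TAYTGYTTRTTDCCRTT"  # primer sequence as printed in the text

print(sense)                                       # degenerate sense strand
print(antisense)                                   # full 18-nt degenerate antisense
print(degeneracy(paper_primer))                    # distinct oligonucleotides in the mix
print(is_covered_by(paper_primer, antisense[1:]))  # compare after dropping the 5' base
```

Under these assumptions the printed primer corresponds to the full antisense sequence with its first (5′) base omitted and with D in place of N at the position complementary to the Gly wobble base. This is consistent with the text's remark that NGNKQY gives the least degenerate primer set, although the reasons for those two choices are not stated in the source.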
A DNA probe was synthesized with the 79-bp insert as template by PCR incorporation (primers: sense 5′-TCTACAAACTCCTGACCGTG-3′ for bases 11-30 and antisense 5′-TACTGTTTGTTGCCGTTATA-3′ for bases 69-89) of nonradioactive digoxigenin-labeled dUTP using DIG PCR probe synthesis and detection kits (Boehringer Mannheim). The M. edulis foot cDNA library was screened with the digoxigenin-labeled probe, and positive clones were converted into a plasmid (pBK-CMV) in the presence of helper phage (Stratagene). The plasmids were isolated by alkaline lysis and purified by anion exchange chromatography using plasmid minikits (Qiagen, Santa Clarita, CA). Plasmids were subjected to restriction digestion analysis to determine the insert size. Nested deletions of the full-length clone were made using the Nested Deletion Kit (Pharmacia Biotech Inc.). These were sequenced using a dideoxy termination kit (Perkin-Elmer). Sequence analysis was performed with commercial software (Sequencher 3.0, Gene Codes Corp., Ann Arbor, MI).
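As a consistency check on the primers quoted above, the following sketch tests whether the specific antisense probe primer begins with one of the sequences encoded by the degenerate primer used in the initial PCR screen. The helper name `matches` is illustrative; only the primer sequences themselves come from the text.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(specific, degenerate):
    """True if the plain sequence is one of those encoded by the degenerate one."""
    return len(specific) == len(degenerate) and all(
        s in IUPAC[d] for s, d in zip(specific, degenerate)
    )


degenerate_primer = "TAYTGYTTRTTDCCRTT"   # degenerate antisense primer from the screen
probe_sense = "TCTACAAACTCCTGACCGTG"      # probe sense primer
probe_antisense = "TACTGTTTGTTGCCGTTATA"  # probe antisense primer

# Compare the 5' end of the specific primer with the degenerate primer.
prefix = probe_antisense[: len(degenerate_primer)]
print(matches(prefix, degenerate_primer))
print(len(probe_sense), len(probe_antisense))
```

The first 17 bases of the specific antisense primer are one member of the degenerate pool, as expected if it was designed from the sequenced insert. Both probe primers are 20 nt long; the quoted span of bases 69-89 would cover 21 positions if both ends are inclusive, a small discrepancy that the source does not resolve.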
Reverse Transcriptase-PCR for Tissue-specific mRNA Expression-Mussel feet were excised from locally collected mussels, quickly frozen with liquid nitrogen, and cut into five sections of equal thickness along the length of the foot. Total RNA was extracted from each section and from nonfoot tissue (9). A 1-μg aliquot was reverse transcribed to cDNA with random hexamers using a first strand cDNA synthesis kit (Life Technologies, Inc.). One-tenth of the first strand reaction was PCR-amplified using gene-specific primers (sense, 5′-ATCACCCTTGTCATTGAAGAC-3′ for bases 2005-2025; and antisense, 5′-ACTTCCATGTCCTGTTACGGT-3′) for preCol-D. A unique pair of primers (sense, 5′-ACAGGACCAGAAGTTGCAGAATT-3′; and antisense, 5′-TCCTACATTGAGTACTCCGAATGG-3′ for bases 2583-2604) was used for preCol-P. rRNA served as a positive control using EukA and EukB primers (10). Amplifications were done for 35 cycles of 30 s at 94°C for denaturation, 30 s at 55°C for annealing, and 90 s at 72°C for extension, with a final 5-min extension. The following controls were included: reverse transcriptase was omitted to test for DNA contamination in RNA extracts; first strand template was omitted to test for primer-dimer formation; control RNA (Life Technologies) was used to verify reverse transcriptase activity; and the plasmid containing preCol-D was used to verify PCR activity. The identity of PCR products was determined by restriction digests and DNA sequencing.

RESULTS
A number of amino acid sequences were determined from intact and protease-digested preCol-D to establish a basis for comparing cDNA sequences with the protein isolated from the foot as well as from byssal threads. The digests included clostridial collagenase (Fig. 2A) and two different concentrations of pepsin (Fig. 2, B and C). Peptides were separated by reversed phase HPLC and sequenced where possible by automated Edman degradation. These sequences are summarized in Table I. Notably, the N terminus of preCol-D contains 4 residues of 3,4-dihydroxyphenylalanine (DOPA), a posttranslational modification of tyrosine that it shares with a number of byssal precursor proteins (11). It is reassuring to find that the N-terminal sequences of foot- and byssus-derived Col-D are similar though not identical, owing to the broad substrate specificity of pepsin (Table I).
Of the sequences in Table I, NGNKQY near the N terminus of α-preCol-D codes for the least degenerate set of synthetic oligonucleotide primers. With this primer, a 79-bp cDNA fragment was amplified by PCR using a cDNA library of the mussel foot as template. The cDNA library was then screened using the digoxigenin-labeled 79-bp fragment as a probe. Of the positive clones, the largest one was 2870 bp. Further analysis indicated that this contained the full-length sequence of α-preCol-D with an open reading frame of 2766 nucleotides encoding 922 amino acids (GenBank™ accession no. AF029249) with a calculated mass of 80 kDa. The discrepancy between this mass and the apparent mass of 97 kDa is largely due to the anomalous mobility of isolated α-preCol-D on SDS-polyacrylamide gel electrophoresis. A reexamination of byssal precursors using matrix-assisted laser desorption ionization mass spectrometry with time-of-flight indicates 79-80 kDa to be the better mass estimate for α-preCol-D (data not shown). The loss of about 2.2 kDa by removal of the signal peptide is somewhat compensated by the extensive hydroxylation of prolines in the mature protein (1). There is no evidence to support the presence of glycosylation in preCol-D. A mass [M + H⁺]⁺ = m/z of 47 kDa for the pepsin-resistant collagen domain, Col-D, was also determined by matrix-assisted laser desorption ionization time-of-flight.
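The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only values quoted in the text plus one outside assumption, the mass added by a single proline hydroxylation (one oxygen atom, about 15.995 Da); the variable names are illustrative.

```python
orf_nt = 2766               # open reading frame length (nucleotides), from the text
residues = 922              # encoded amino acids, from the text
calc_mass_da = 80_000       # calculated mass (~80 kDa), from the text
apparent_mass_da = 97_000   # apparent mass on SDS-PAGE, from the text
signal_peptide_da = 2_200   # approximate mass lost with the signal peptide, from the text
HYDROXYL_DA = 15.995        # assumption: mass gain per Pro -> Hyp (one oxygen atom)

# The ORF length should be exactly three nucleotides per encoded residue.
codons_match = orf_nt == 3 * residues

# Mean residue mass; a low value is expected for a Gly- and Ala-rich protein.
mean_residue_mass = calc_mass_da / residues

# How far SDS-PAGE overestimates the calculated mass.
sds_page_overestimate = apparent_mass_da / calc_mass_da - 1

# Hydroxylations that would be needed to offset the signal peptide loss completely.
hyp_for_full_offset = signal_peptide_da / HYDROXYL_DA

print(codons_match)
print(round(mean_residue_mass, 1))
print(round(100 * sds_page_overestimate, 1))
print(round(hyp_for_full_offset, 1))
```

The last figure is only an upper bound for illustration: the text says the loss is "somewhat compensated", so the actual number of hydroxyprolines is presumably smaller and is not given in this section.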
Reverse transcriptase-PCR amplification of foot and nonfoot mRNAs using α-preCol-D gene-specific primers demonstrates that the expression of preCol-D is foot-specific (Fig. 3). Moreover, by comparison to 18 S RNA as a control (Fig. 3B), the presence of an α-preCol-D mRNA gradient within the foot is indicated. Maximal levels are detectable just proximal to the distal depression near the foot tip and decrease linearly up to the base of the foot (Fig. 3A). The gradient for preCol-P runs in the opposite direction (Fig. 3C) but does not appear to extend as far as the foot tip. Since these results parallel an earlier study of preCol-D protein distribution in the foot determined by Western blotting (1), the gradients of the α-preCols would thus appear to be transcriptionally regulated.
The primary structure of preCol-D as deduced from cDNA is unusual in that four distinctive domains are rather symmetrically disposed about a central collagenous domain (Figs. 1B and 4). The collagenous domain (45.5 kDa) constitutes more than half of the total mass and contains about 175 Gly-X-Y tripeptide repeats that are reminiscent of type III collagens in having enhanced levels of Gly. There are 26 incidences of Gly in the X position (mostly GGP) and 2 in the Y position (Fig. 4). GGX triplets have a demonstrated destabilizing effect on the triple helical structure while perhaps imparting greater chain flexibility to affected regions (12). The Tyr-405 and Cys-424 residues of α-preCol-D are also unusual for collagens. In addition, the first third of the collagenous domain has four breaks in the continuity of the Gly-X-Y repeats. The first is a Gly substitution [SGP], and the other three involve deletions of one or two residues in the Gly-X-Y triplet, i.e. GTq, qqP, and qPA. These breaks do not compromise the resistance of the collagen domain to degradation during extensive pepsinization if T < 10°C, suggesting the presence of stable bends or kinks in the triple helical structure of collagen. This is reminiscent of another kinked collagenous protein, C1q, which is a heterotrimer in which one chain has no discontinuity, another has an Ala for Gly substitution, and the third is missing Gly-X from the triplet (13). α-PreCol-D has the latter two features on the same chain. A hydrophobic and acidic patch of about 20 residues punctuates the end of the collagen domain and is followed by a Pro-rich (10%) hinge region that is 44 residues long (Fig. 4).
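The kind of tally reported above (Gly in the X or Y position, and triplets that break the Gly-X-Y pattern) can be sketched as follows. The input is a short made-up sequence for illustration, not the preCol-D sequence, and the function name and return format are assumptions.

```python
def scan_gxy(seq):
    """Split a sequence into consecutive triplets and classify them against Gly-X-Y."""
    usable = len(seq) - len(seq) % 3  # ignore any incomplete trailing triplet
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    canonical = [t for t in triplets if t[0] == "G"]
    return {
        "triplets": len(triplets),
        # Gly in the X or Y position, counted only in triplets that start with Gly.
        "gly_in_x": sum(t[1] == "G" for t in canonical),
        "gly_in_y": sum(t[2] == "G" for t in canonical),
        # Triplets whose first residue is not Gly, with their triplet index.
        "breaks": [(i, t) for i, t in enumerate(triplets) if t[0] != "G"],
    }


# Hypothetical toy sequence: GPP GGP GAP SGP GGP GPG
toy = "GPPGGPGAPSGPGGPGPG"
result = scan_gxy(toy)
print(result)
```

This fixed-frame scan only handles substitutions such as SGP. A deletion of one or two residues, like the other three breaks described in the text, shifts the reading frame, so every downstream triplet would be flagged; a fuller analysis would need to re-register the frame after each deletion.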
Flanking the collagen domain on both sides are silk fibroin-like domains, so named because they mimic at least two of the distinctive features of the primary sequence of spider dragline silk of Nephila clavipes (14) and Araneus diadematus (15) (Figs. 4 and 5): polyalanine runs and GGX repeats. Many of the polyalanine runs in spider silk dragline proteins are present as microcrystalline antiparallel β-sheets (16), whereas the GGX regions are variously modeled as amorphous (17) or nonperiodic lattice (18). Recent ¹³C 2D NMR spin diffusion experiments suggest that the glycine-rich segments of spider dragline silk occur as 3₁ helices (19). Whatever the case is, the result is a microcomposite made up of a relatively unstructured or unstably structured network reinforced by a stiff crystalline filler (17). There are 7 polyalanine runs in preCol-D that extend for 11 to 14 residues (Fig. 4); 2 of these are on the amino side and 5 are on the carboxy side. These are slightly more substituted than the polyalanine runs in silk fibroin, particularly by bulky Arg and Gln residues (Fig. 5). Side chain bulkiness, however, is not necessarily a problem in β-sheet formation since intersheet spacings as large as 10 Å occur in group IV and V silks (21). Another fibroin-like structure is the GAGA-rich sequence (residues 783 to 801) immediately preceding the C-terminal polyalanine runs in α-preCol-D (Fig. 4). This resembles not only spider silk ADF-2 (Fig. 5) but also silkworm silk fibroin sequences (21). As noted previously (18), none of these primary sequences has a natural propensity to form β-sheet structures in isotropic solution. Instead, β-pleated sheet formation is induced by mechanical shear, i.e. the spinning of silk (22) and perhaps cold-drawing in byssal thread formation (5).
His-rich regions are located at both ends of preCol-D (Fig. 4). The only noncollagenous Tyr and Lys residues are also found here. The His-rich domains comprise about 60 residues at both termini, but at the C terminus, there is some overlap with the silk fibroin domain. His represents nearly 20 mol % of the composition or 1 in 5 residues in the His-rich domain and frequently occurs in tandem with Ala or Gly. The N terminus, for example, contains eight repeats of AH. His-rich domains appear in several other proteins including high molecular weight kininogen (23), plasmodial histidine-rich protein (24), blood histidine-rich glycoprotein HRG (25), and trematode eggshell precursors (26). Sequence homology is limited to short repeats, e.g. AH compared with AHH repeats in plasmodial histidine-rich protein, and HG compared with similar repeats in the kininogen and eggshell precursors. The function of the His-rich tracts is in almost all known cases connected with metal binding, e.g. zinc in kininogen and heme iron in the plasmodial protein and HRG. Moreover, His-rich nereid jaw proteins are high in complexed copper or zinc (27), and poly-(His) tracts are routinely added to recombinant proteins for facile recovery by metalloprecipitation or metalloligand affinity chromatography (28,29). Of the transition metals detected in M. edulis byssus, zinc levels are consistently the highest at about 1 mg/g of dry weight (30).

DISCUSSION
Qin and Waite (1) demonstrated that byssal threads contain two major proteins, preCol-D and preCol-P, that are distributed in complementary gradients, i.e. in moving along the length of each thread from the adhesive plaque to the stem, preCol-D decreases whereas preCol-P increases. These gradients appear to be transcriptionally programmed and prefabricated in the foot.
Recent characterization of the complete primary structure of preCol-P (31) reveals that the two proteins have a strikingly similar domain structure with one notable exception: the silk fibroin flanking domains in preCol-D are replaced by elastin-like flanks in preCol-P. This raises some critical questions about their in situ assembly and molecular maturation. For example, given the symmetric disposition of domains in both, is the axial alignment of the preCols N to N, N to C, or random during assembly? Are microfibrils formed only from preCol homopolymers, i.e. poly(preCol-D), or do heteropolymers occur? If the deletions in the Gly-X-Y sequence induce kinks or bends in the collagenous portion, are all the bends in consecutive preCols in the same (cis) or opposite (trans) orientations? Finally, what cross-links stabilize preCol microfibrils? Only the last question can be speculated on at this time. Aldimine-derived cross-links typical of collagen are not detectable in byssus (32). The collagenous portion of Col-D may not be cross-linked because it is liberated by pepsinization of byssus. Flanking silk fibroin domains lack the functional side chains to form any known cross-links. Potential cross-linking sites, however, abound in the His-rich domains, where two types of cross-links can be envisaged. One is a Michael-type addition product between DOPA-quinone and His or Lys (33,34). The presence of peptidyl-DOPA in preCol-D and catechol oxidase in byssus (35) makes formation of these likely upon maturation. Another type of cross-link is an intermolecular bis- or tris-histidinyl zinc complex of the sort found at the active site of carbonic anhydrase or zinc metalloproteases (36). Organometallic cross-links have the advantage of being formed immediately upon secretion and of being reversible.
The results reported here are unusual in several respects: preCol-D is the first known protein to contain both collagenous and silk fibroin-like domains. Indeed, this and another report (37) on molluscan shell matrix proteins challenge the traditional view that silk fibroins are the exclusive domain of arthropods (20). In addition, the tandem array of collagenous and fibroin domains is remarkably reminiscent of synthetic block copolymer designs with their hard and soft segments (Fig. 1B) (38). In this case, extensible and tough "blocks" of frame silk (strain <0.4; energy to break >50 × 10⁶ J m⁻³) could be used to bolster the poorer material performance of the collagen (strain <0.1; energy to break <5 × 10⁶ J m⁻³) (39). Future studies would profit from a closer scrutiny of the mechanism of preCol fibrillogenesis as well as the interplay of collagen, elastin, and silk segments in determining the unique mechanical properties of byssal threads.