Correlation between Fibroin Amino Acid Sequence and Physical Silk Properties

The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), Mediterranean flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of the GSSAASAA sequence, stretches of GXZ tripeptides where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of the GX dipeptide or of alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite their different repeat structure, the silks of G. mellonella and E. kuehniella exhibit tensile strength similar to that of the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2–3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats.

Silks have long attracted attention as biodegradable fibers of considerable strength, elasticity, and durability. X-ray diffraction studies revealed that the fibers contain crystallites, interpreted to be stacks of β-sheets formed by regular amino acid repeats in the core protein called fibroin (1). The highly organized crystalline regions were suggested to confer the toughness and the interspersed "amorphous" regions the elasticity of the silk fiber. The distance between β-sheets in the silk crystallites of Bombyx mori (family Bombycidae) corresponded to hydrogen bridges between side chains of glycine and alanine residues, whereas in the silk of Antheraea pernyi (family Saturniidae) it corresponded to interactions between alanine-alanine residues (2,3). These conclusions were consistent with the total amino acid composition of the respective silk.
Modern protein and gene analyses characterized the structure of silk in detail (4). It was confirmed that the fiber core consists primarily of a large (200-500 kDa) protein, designated heavy chain fibroin (H-fibroin), that is associated with two small proteins, light chain fibroin and P25, in most Lepidoptera. Several sericins enveloping the fiber core serve as glue for the cocoon construction, and a few additional peptides seem to confer protective functions. Most of the physical fiber properties appear to be determined by the amino acid sequence of the H-fibroin, which is for the most part composed of repeats. The repeats in B. mori H-fibroin are dominated by iterations of a GAGAGS motif (5,6), and a string of alanines occurs at the end of each of the four types of repeats in Antheraea H-fibroin (7,8).
On the basis of x-ray diffraction data (1), we expected to find polyalanine repeats in the H-fibroin of the waxmoth Galleria mellonella (family Pyralidae), but gene analysis revealed a more complex amino acid arrangement (9). The overall gene structure and the nucleotide sequences at the 5′ and 3′ gene ends were similar to the H-fibroin genes of B. mori (6), A. pernyi (7), and Antheraea yamamai (8), but the major internal sequence was unique. About 95% of the gene encoded amino acid repeats arranged hierarchically into 10-12 large assemblies. The remarkable homogeneity of the repeat length and composition contrasted with repeat variability in the H-fibroins of Bombyx and Antheraea (6-8). Repeat regularity, absence of concatenations of short motifs, and relatively high content of amino acid residues with bulky side chains indicated that organization of the H-fibroin polymer in G. mellonella differs from that in Bombyx and Antheraea (9).
The silk of G. mellonella was reported to be considerably more extensible and somewhat stronger than the silks of B. mori and A. pernyi (10,11). These properties are probably related to the mode of silk use in G. mellonella. The caterpillars of B. mori and A. pernyi spin silk only for cocoon construction at the end of larval life, and high tensile strength is the major functional requirement of the fiber. From an early larval stage, the caterpillars of G. mellonella, which develop in bee colonies, spin silky tubes that protect them against detection and killing by the bees. They continuously renew and enlarge the tubes, devouring some of the old silk to utilize amino acids that are in short supply in their comb diet. Before pupation, the larva can descend on a newly spun fiber to leave the bee colony in search of a suitable place for cocoon construction. The silk produced by G. mellonella must therefore be extensible enough to allow widening of the tube when the larva turns around and at the same time strong enough to hold the larval weight. It was deduced that the uniformity of H-fibroin repeats underlies the excellent mechanical properties of G. mellonella silk (9).
G. mellonella is a member of the Galleriinae subfamily of the Pyralidae moth family. In the present study we examine the correlation between the regularity of H-fibroin repeats and the silk properties in two pyralids that belong to the subfamily Phycitinae (12). Larvae of the Mediterranean flour moth, Ephestia kuehniella, which typically attack ground cereal products, spend most of their life in silk tubes that probably provide some protection against parasitoids and reduce water loss. Larvae of the Indian meal moth, Plodia interpunctella, seem to be less dependent on protective silk production because they typically develop inside nuts, seeds, and dried fruits. From the comparison of the three species we conclude that silk strength in pyralids is correlated with the regularity of H-fibroin repeats. We also conjecture that the partially recycled silk of pyralids serves as a temporary and readily accessible depot of essential and energy-rich amino acids.

EXPERIMENTAL PROCEDURES
Insect Culture and Silk Gland Dissection-Cultures of the greater waxmoth, G. mellonella L. (Pyralidae: Galleriinae), and the Mediterranean flour moth, E. kuehniella Zeller (Pyralidae: Phycitinae), were maintained in our Institute as described previously (13,14). The colony of P. interpunctella (Hübner) (Pyralidae: Phycitinae) was raised on a cereal diet supplemented with ground hazel nuts, sucrose, milk powder, and brewer's yeast (4.5:1:1:1:1), and the colony was kindly provided by Dr. B. Darvas of the Plant Protection Institute, Budapest, Hungary. To obtain silk glands for molecular analyses, fully grown larvae were anesthetized in water and dissected under saline. About 20 larvae were used for each silk gland preparation in G. mellonella (body weight ~180 mg) and 50 larvae in E. kuehniella (body weight ~35 mg) and P. interpunctella (body weight ~20 mg). Dissected glands were frozen in liquid nitrogen and stored at −80°C.
PCR Primers-Conserved regions of known H-fibroin proteins and H-fibroin genes (4) were used to design primers. Sequence similarities between G. mellonella H-fibroin and several tags of E. kuehniella H-fibroin cDNA allowed us to use semi-degenerate primers in an early stage of our research. Commonly used overlapping forward primers E1 and E2 correspond to the MRVTTFV and FVILCCA parts of the signal peptide, and the forward primer F1 and reverse primer R1 correspond to the YEED motif in the N-terminal non-repetitive fibroin sequence in G. mellonella H-fibroin (9). The VIVI motif, which is typical for the internal repeats, was used to design forward and reverse primers F2 and R2. The forward primer F3 was based on the GAGNI motif present in some of the E. kuehniella tags, and reverse primer R3 was based on the RRQF(L)VVK sequence conserved in the non-repetitive C-terminal region of G. mellonella and B. mori (4). Final sequence verification was carried out with specific primers.
Amplification of H-fibroin Sequences from the Genomic DNA-Larval carcasses left after silk gland dissection were used as a source of genomic DNA. About 1 g of tissue was crushed in a glass mortar under liquid nitrogen and homogenized in 10 ml of lysis buffer (0.1 M NaCl, 0.05 M EDTA, 0.5% SDS, and 0.01 M Tris, pH 7.5). Genomic DNA was extracted in a standard way with phenol/ethanol precipitation, treated with RNase A and proteinase K for 30 min at 37°C, purified with chloroform, and stored as an ethanol precipitate. PCR was performed with 40 ng of DNA in a 25-µl volume. Typically, initial denaturation at 94°C for 1 min was followed by amplification in 35 cycles of 30 s at 94°C, 25 s at 51°C, and 90 s at 72°C, and termination with a final extension at 72°C for 10 min.
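As a planning aid, the cycling program described above can be written down as data and its nominal run time added up. The following is only a sketch: the names `Step` and `program_seconds` are hypothetical, and ramp times between temperatures are ignored, so an actual thermocycler run would take somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds


def program_seconds(initial, cycle, n_cycles, final):
    """Nominal hold time of a PCR program, ignoring temperature ramps."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


# Genomic DNA program with the values stated in the text.
genomic_initial = Step(94, 60)
genomic_cycle = [Step(94, 30), Step(51, 25), Step(72, 90)]
genomic_final = Step(72, 600)

total = program_seconds(genomic_initial, genomic_cycle, 35, genomic_final)
print(f"nominal run time: {total} s ({total / 60:.1f} min)")
```

The same helper can be reused for the other programs in this section by swapping in their annealing temperature and extension times.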
Reverse Transcription-PCR (RT-PCR)-Total RNA was isolated from whole silk glands of last instar larvae using the LiCl/urea method. First strand cDNA was prepared from 3 µg of total RNA in a 20-µl reaction mixture containing 200 units of SuperScript II, 1× First-Strand buffer, 0.01 M dithiothreitol, 500 µM dNTP, 40 units of RNaseOUT ribonuclease inhibitor (all reagents from Invitrogen), and 20 pmol of oligo(dT) adapter primer designed in our laboratory and referred to as Trikant (5′-TGAGCAAGTTCAGCCTGGTTA(T)19-3′). The reverse transcription reaction was carried out at 42°C for 50 min followed by heat inactivation at 70°C for 15 min. The resulting cDNA was diluted 10-fold, and 50-ng aliquots were taken as template for PCR amplification of partial H-fibroin sequences. A typical PCR procedure included 1 min at 94°C, 35 cycles of 30 s at 94°C, 25 s at 53°C, and 90 s at 72°C, followed by 10 min at 72°C for a final extension. The primers were either both derived from the H-fibroin sequence or the Trikant adapter without the poly(T) tail was used as the reverse primer.
Inverse PCR-Inverse PCR was performed according to the protocol of Willis et al. (16). High molecular weight DNA was digested with ClaI restrictase (Invitrogen), and 0.3 µg was used for self-ligation (T4 DNA ligase, Promega) at 15°C overnight to generate circular DNA fragments. The first round PCR was performed with reverse primer iR1 complementary to the nucleotide sequence 64-82 (Fig. 1) and forward primer iF1 corresponding to positions 202-223 in the intron (GenBank accession number AY253534). One µl of the product mixture was used for the second round of nested PCR with reverse primer iR2, which matches the region 32-52 nt (Fig. 1), and forward primer iF2, which corresponds to the intron sequence 323-344 nt (GenBank accession number AY253534). The cycling conditions for both rounds were identical: 94°C for 1 min; 35 cycles of 94°C for 30 s, 58°C for 25 s, and 72°C for 3 min; and 72°C for 7 min.
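The Trikant adapter primer used for RT-PCR is given in shorthand, with the poly(T) tail written as (T)19. A minimal sketch of how such shorthand can be expanded, and how the tail-less adapter relates to the sequence expected on the opposite strand, is shown below. The helper names `expand_primer` and `reverse_complement` are hypothetical and are not taken from the original work.

```python
import re


def expand_primer(shorthand):
    """Expand runs written as (X)n, e.g. 'ACG(T)3' -> 'ACGTTT'."""
    return re.sub(
        r"\(([ACGT]+)\)(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        shorthand,
    )


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


TAIL_LENGTH = 19  # number of T residues in the poly(T) tail, from the text

trikant = expand_primer("TGAGCAAGTTCAGCCTGGTTA(T)19")
adapter = trikant[:-TAIL_LENGTH]  # adapter without the poly(T) tail

print(len(trikant), trikant)
print(len(adapter), adapter)
print(reverse_complement(adapter))
```

In a product primed with Trikant, the reverse complement of the adapter would be expected immediately downstream of the poly(A) stretch on the sense strand.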
Nucleotide Sequencing and Computational Analysis of the Sequence Data-PCR products were separated on an agarose gel, extracted (Qiagen kit), ligated into the pGEM-T Easy vector (Promega), and sequenced using M13 reverse and forward primers in an ABI Prism sequencer (PerkinElmer Life Sciences model 310). Alignments of nucleotides and amino acid residues were carried out using the Clustal algorithm as implemented by the MegAlign program of the Lasergene package (DNASTAR). In addition to the newly generated sequences, thorough analysis was applied to the E. kuehniella tags deposited in GenBank. The conceptual translations of aligned tags revealed irregularities that could be traced to false frameshifts, evidently resulting from software interpretation of strong sequencer signals as nucleotide doublings. Some of these errors could be resolved and the continuity of the reading frames restored (Fig. 3).
Measurements of Silk Fiber Strength-Silk fibers were obtained from fully grown larvae descending, while spinning, on a glass slide slanted to about 30°. The diameter of fibers was measured under a Zeiss Axioplan II microscope equipped with Nomarski optics. Fibers several cm long were individually attached with Scotch tape between a weight placed on a laboratory balance and a string connected over a fine pulley to a slowly rotating motor. The fiber was pulled at a constant speed of 37 µm/s, and the force to which it was exposed was read on the balance as a reduction of the original weight load. The measurements were terminated when the fiber was pulled apart.

Identification of H-fibroin Genes-To obtain H-fibroin gene sequences from E. kuehniella and P. interpunctella, initial RT-PCR and genomic DNA PCR were performed with forward primers E1 and E2 in all combinations with reverse primers R1 and R2. The dominant RT-PCR products obtained with the E1/R1 pair were about 1000 nt long in E. kuehniella and 750 nt in P. interpunctella. The similarity of their sequences with the G. mellonella H-fibroin gene (9) left no doubt that they represented 5′ ends of H-fibroin cDNAs of the respective species. The PCR with genomic DNA template and primers E1 and R2 resulted in amplification of multiple products ranging in size from about 500 to 2500 nt. Most of them were cloned and sequenced. The results showed that the first 42 nt at the 5′ end of the genomic DNA fragments were identical with the 5′ area of the RT-PCR products in both species. The following 660 (E. kuehniella) or 721 nt (P. interpunctella) were present only in the genomic DNA, whereas the terminal parts of the genomic fragments again matched the RT-PCR products. It was obvious that the first 42 nt represented the first exon, the 660 or 721 nt an intron, and the remainder was part of the second exon.
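The reasoning used here (a genomic fragment that matches the cDNA at both ends but carries an extra internal block) can be sketched in a few lines. The function `locate_intron` and the toy sequences below are hypothetical illustrations, not the authors' software or real H-fibroin sequence. The sketch assumes a single intron and fragments that start and end at the same positions, and it prefers a split with the canonical GT...AG borders when the boundary is ambiguous.

```python
def locate_intron(cdna, genomic):
    """Return (exon1_length, intron_sequence) for a single intron, or None.

    Assumes both sequences cover the same region and differ only by one
    block that is present in the genomic DNA and absent from the cDNA.
    """
    intron_len = len(genomic) - len(cdna)
    if intron_len <= 0:
        return None
    # Collect every split point k at which removing intron_len bases
    # from the genomic sequence reproduces the cDNA exactly.
    candidates = [
        k for k in range(len(cdna) + 1)
        if genomic[:k] == cdna[:k] and genomic[k + intron_len:] == cdna[k:]
    ]
    if not candidates:
        return None
    # Prefer a split whose excised block has GT...AG borders.
    for k in candidates:
        intron = genomic[k:k + intron_len]
        if intron.startswith("GT") and intron.endswith("AG"):
            return k, intron
    k = candidates[0]
    return k, genomic[k:k + intron_len]


# Invented toy sequences, much shorter than the real 42-nt exon and 660-nt intron.
exon1 = "ATGAGAGTCACA"
intron = "GTAAGTTTTATATTTTTAG"
exon2 = "ACCTTCGTGATC"

result = locate_intron(exon1 + exon2, exon1 + intron + exon2)
print(result)
```

With real data the comparison would normally be done on an alignment that tolerates sequencing errors; exact string matching is used here only to keep the idea visible.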
The non-coding upstream region of the H-fibroin gene was identified in E. kuehniella by means of inverse PCR. A ClaI restriction site, which was found in the intron (in a position later identified as 474-479 nt from the transcription start), was taken as the 3′ limit of the amplified sequence. A fragment of 800 nt was amplified by nested PCR, and its sequence proved that it contained the 5′ end and upstream flanking region of the H-fibroin gene. Sequence homology with H-fibroin genes of other species (4) allowed recognition of a transcription start site 28 nt upstream from the initiation codon. The 3′ end of the fragment matched the first exon and part of the intron. As expected, the fragment was terminated with ClaI sites at both ends.
The major aim of our research was to identify DNA sequences encoding amino acid repeats that are involved in H-fibroin polymerization. To this end, PCR on genomic templates was performed with the semi-degenerate reverse primer R2 and specific forward primers based on the already known non-repetitive region of the second exon. A "ladder" of PCR products ranging from 600 to 2400 nucleotides was obtained in both E. kuehniella and P. interpunctella, reflecting the presence of repeats containing sequence complementary to the R2 primer. Five products of different sizes were isolated from each species and sequenced from both ends. They proved to be identical at their 5′ ends and similar at the 3′ ends, but differed by the number of repeats that made up the major central part of each product. In addition, species-specific patterns of repeats could be recognized. The repeats were extremely uniform in both length (342 nt) and structure in E. kuehniella (GenBank accession number AY253534), whereas rather random assemblies of several types of short motifs were found in P. interpunctella (accession number AY253533). We regarded the identified sequences as typical for the repetitive region of the respective H-fibroin gene and refrained from searching for longer PCR fragments, especially as the extreme homogeneity of nucleotide repeats and lack of suitable restriction sites would make cloning of long repetitive fragments very difficult.
The arrangement and length of repeats found in E. kuehniella were slightly different from the previously disclosed cDNA tags. To test whether the differences were due to repeat variation in different gene sections, we set out to examine cDNAs derived from the 3′ end of the gene. The first strand cDNA was synthesized from total silk gland RNA with the aid of the F2 forward primer and the Trikant reverse primer. The PCR product contained 762 nt, of which the last 19 nt represented a poly(A) tail. In other PCR runs with the F3 primer, we obtained an overlapping fragment nearly 750 nt long that also terminated with a poly(A) tail.
The Structure of H-fibroin Gene in Pyralid Moths-Alignment of the presently obtained DNA sequences with the H-fibroin gene of G. mellonella (9) facilitated interpretation of the new data. Of 3148 nt identified at the 5′ end of the H-fibroin gene in E. kuehniella (GenBank accession number AY253534), the first 255 nt represent the non-transcribed upstream region, which is followed by a 28-nt leader and 42-nt coding sequence of the first exon, a 660-nt intron, and 2163 nt from the proximal part of the second exon. The sequenced 3′ end (GenBank accession number AY253535) includes 733 nt, of which 471 nt encode amino acid repeats, 177 nt the non-repetitive H-fibroin C terminus, and 85 nt represent the 3′ non-translated region terminated with a polyadenine chain. Analysis of the H-fibroin 5′ end beginning with the translation start site in P. interpunctella revealed a first exon sequence of 42 nt, an intron of 721 nt, and 1668 nt of the second exon (GenBank accession number AY253533).
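The segment lengths quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the dictionary keys are hypothetical labels chosen for readability.

```python
# Segment lengths in nucleotides, as reported in the text.
ek_5prime = {
    "upstream": 255,
    "leader": 28,
    "exon1_coding": 42,
    "intron": 660,
    "exon2_partial": 2163,
}
ek_3prime = {"repeats": 471, "c_terminus": 177, "utr_3prime": 85}
pi_5prime = {"exon1_coding": 42, "intron": 721, "exon2_partial": 1668}

ek_5prime_total = sum(ek_5prime.values())
ek_3prime_total = sum(ek_3prime.values())
pi_5prime_total = sum(pi_5prime.values())

# Coding nucleotides at the 5' end of the E. kuehniella gene and the
# number of complete codons they contain.
ek_coding_nt = ek_5prime["exon1_coding"] + ek_5prime["exon2_partial"]
ek_codons, ek_remainder = divmod(ek_coding_nt, 3)

print("E. kuehniella 5' fragment:", ek_5prime_total, "nt")
print("E. kuehniella 3' fragment:", ek_3prime_total, "nt")
print("P. interpunctella 5' fragment:", pi_5prime_total, "nt")
print("E. kuehniella 5' coding region:", ek_coding_nt, "nt,", ek_codons, "codons")
```

The E. kuehniella totals reproduce the 3148 nt and 733 nt given in the text, and the 5′ coding region corresponds to a whole number of codons.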
The sequence alignment disclosed several homologous regions in the fibroin genes of pyralid moths (Fig. 1). Sequence similarities in the 5′ gene end and adjacent 5′-flanking region extend to position −55. Distinct homology occurs between positions −12 and −55 and is particularly high in the area of the TATA box (−26 to −31 in E. kuehniella and −27 to −32 in G. mellonella). An alternative TATA box, which is present in G. mellonella at −47 to −51, is also associated with a small area of homology. Homology of the leader sequence between G. mellonella and E. kuehniella surpasses 70%.
The first exon and the initial 6 nt of the intron are identical in all three pyralid species (Fig. 1A). The introns are alike in having high contents of adenine and thymine, but homology is restricted to the exon/intron and intron/exon boundaries demarcated by the classical GT/AG borders. The intron in G. mellonella H-fibroin is unusually long due to insertion of a repetitive DNA element, Gm1 (9). The beginning of the second exon exhibits a high degree of interspecies homology; the first 326 nt in G. mellonella, 323 nt in E. kuehniella, and 320 nt in P. interpunctella can be aligned with just a few gaps that represent occasional deletions or insertions (Fig. 1). The homology, which is particularly high between E. kuehniella and P. interpunctella, is interrupted by an insertion of 201 nt in G. mellonella and 69 nt in E. kuehniella just before the start of the repetitive sequence. The initial stretch of 90 nt of the repetitive region is clearly similar in all three species; then the G. mellonella sequence diversifies and interspecies similarity is lost. The partly conserved region of 139 nt, which encodes a portion of an amino acid repeat designated A-type (see below), reappears in the repetitive nucleotide sequence in the subsequent parts of the gene (data not shown) in all species. Most of the repetitive sequence, however, is species-specific.
DNA repeats were shown to be highly homogeneous throughout the H-fibroin gene in G. mellonella (9). Repeat homogeneity has now been found in E. kuehniella (GenBank accession numbers AY253534 and AY253535) and to a lesser extent in P. interpunctella (GenBank accession number AY253533). For G. mellonella and E. kuehniella it has been demonstrated, and for P. interpunctella it is assumed, that virtually identical repetitive DNA blocks occur at both the 5′ and 3′ ends of the gene, suggesting that they occupy the entire central gene region.
At the non-repetitive 3′ end, about 150 nt of the coding region and 80 nt of the non-translated tail contain DNA blocks that are the same in G. mellonella and E. kuehniella. Very high homology between the two compared species occurs in the last 84 nt before the stop codon (Fig. 1B). The polyadenylation signal is localized closer to the stop codon in E. kuehniella than in G. mellonella. The poly(A) tail is added to the transcript about 70 nt after the polyadenylation signal in the former, and more than 110 nt in the latter species.
Deduced Amino Acid Sequence of the Analyzed H-fibroins-Analyzed gene sequences allowed us to deduce 1468 amino acid residues from the N terminus of G. mellonella H-fibroin, 735 in E. kuehniella, and 508 in P. interpunctella. The initial stretch of 18 residues, which complies with the rules characterizing the signal peptide (17), is nearly identical in all three species (Fig. 2A). The following sequence is diversified. Residues 19-28 diverge in E. kuehniella, but the following region of 95 residues is 97% identical in this species and P. interpunctella. The region includes seven areas that are also conserved in G. mellonella. These areas contain acidic (Asp and Glu) and basic (Arg and Lys) residues in a neutral environment of amino acids with bulky side chains, such as Val, Leu, Ile, Asn, Pro, and Tyr (Fig. 2A). The high homology between E. kuehniella and P. interpunctella is disrupted by an insertion of 23 residues just before the start of the repetitive sequence in the former species. An insertion of 67 residues is present before the start of repeats in G. mellonella. The insertions have relatively high Pro content in both species but are dissimilar otherwise. Ratios between the charged, neutral polar, and hydrophobic residues in the non-repetitive part of the H-fibroin are similar in the three species (Table I). The physiological significance of the high representation (22.8-27.2%) of charged residues is unknown. We speculate that it may control the process of polymerization by preventing adherence of hydrated H-fibroin molecules during their synthesis in the endoplasmic reticulum, transport in secretory vesicles, and storage in the silk gland lumen.
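Percentages of the kind summarized in Table I can be computed from a one-letter protein sequence by counting residues per class. The sketch below is illustrative only. It groups residues as in the Table I legend, and it assigns residues that the legend does not mention (for example His and Met) to a separate "other" class, which is an assumption of this example. The test sequence is simply a concatenation of three motifs quoted in the text, not a real H-fibroin fragment.

```python
from collections import Counter

# Residue classes following the Table I legend (one-letter codes).
RESIDUE_CLASSES = {
    "acidic": "DE",
    "basic": "KR",
    "polar": "NCQSTY",
    "neutral": "GP",
    "hydrophobic": "AILFWV",
}


def class_composition(seq):
    """Per cent content of each residue class in a protein sequence."""
    lookup = {
        aa: name
        for name, letters in RESIDUE_CLASSES.items()
        for aa in letters
    }
    counts = Counter(lookup.get(aa, "other") for aa in seq.upper())
    names = list(RESIDUE_CLASSES) + ["other"]
    return {name: 100.0 * counts[name] / len(seq) for name in names}


# Motifs quoted in the text, joined only to have something to count.
example = "GSSAASAA" + "PVIVIEE" + "GALLSGAAG"
composition = class_composition(example)
for name, percent in composition.items():
    print(f"{name:12s} {percent:5.1f}%")
```

Summing the "acidic" and "basic" entries gives the share of charged residues discussed above.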
The major central part of the H-fibroin consists of repeats that differ from the non-repetitive N terminus by a lower proportion of the charged residues and increased content of the neutral and hydrophobic residues that account for about 60-75% of the sequence (Table I). In all species the repetitive region begins with a repeat designated A type, which includes a homologous zone encompassing 42 (E. kuehniella and P. interpunctella) or 43 (G. mellonella) residues and a species-specific terminal part (see below). The zone contains a central core composed of Pro, Val, Ile, and Glu, such as the PVIVIEE sequence in E. kuehniella, which is flanked by crystalline-like regions made of Ala, Ser, and Gly (Fig. 2A). At the end of the zone, Gly alternates with bulky residues such as Leu, Pro, Asn, and Tyr. The degree of zone homology between the three species is shown in the alignment of sequences at the beginning of the repetitive regions (Fig. 2A). The match ranges from 47% between G. mellonella and P. interpunctella and 52% between E. kuehniella and G. mellonella, to 69% between E. kuehniella and P. interpunctella.
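Match percentages such as those quoted for the homologous zone are typically derived from a pairwise alignment by counting identical columns. How gaps were treated in the original analysis is not stated, so the sketch below makes an explicit assumption: a column with a gap in one sequence counts as a mismatch, and columns with gaps in both sequences are skipped. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Per cent of compared alignment columns holding identical residues.

    Assumption of this sketch: a gap opposite a residue is a mismatch;
    columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Invented ten-column alignment with one substitution and one gap.
toy_a = "PVIVIEE-GS"
toy_b = "PVLVIEEAGS"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Other conventions, such as dividing by the length of the shorter sequence, would give slightly different values for the same alignment.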
Except for the homologous zone, which is reiterated at certain intervals, and the scattered short motifs of not more than five residues, the repetitive part of H-fibroin is dissimilar among the three compared species. This can be seen from the alignment of the terminal parts of the repetitive regions (Fig. 2B). The established amino acid sequences at the H-fibroin C termini include 375 residues of the repetitive and 69 residues of the non-repetitive regions in G. mellonella and 161 and 55 residues in E. kuehniella. Similarities in the repetitive region are restricted to short motifs that are dispersed in an otherwise dissimilar sequence. A stretch of about 35 residues following immediately after the repeats is even more diversified. Only the last 28 residues are highly conserved, including positions of the three cysteines that have been shown to form an intramolecular disulfide bridge and to provide linkage to the light chain fibroin in B. mori (18,19).
Characteristics of Amino Acid Repeats-Several types of repeats can be recognized, but all of them can be viewed as species-specific extensions of a zone of 42-43 residues that is conserved, albeit with some modifications, in the three species. Recognition of several repeat types accentuates conservation and reiteration of this zone that alternates with other amino acid tracts. Repeats designated type A include the homologous zone and a distal species-specific section. Interspecies sequence homology in this section and in the other repeats is restricted to short motifs that probably represent vestiges of an ancestral H-fibroin molecule. The sequence of B repeats resembles and is probably derived by duplication from the distal portion of the A type repeat of the given species (Fig. 3). Additional types of repeats were found only in P. interpunctella and represent combinations of shorter motifs.
In the H-fibroin of G. mellonella, A repeats of 63 residues alternate regularly with shorter repeats, designated B1 (43 residues) and B2 (18 residues). There are actually two subtypes of A differing in a few residues. One is associated with B1 and the other with B2. Because each A repeat is always followed by a B type, the repetitions (AB1) and (AB2), which are 106 and 81 residues long, can be regarded as two versions of one long repeat. Repeat uniformity rests on the highly conserved length and composition of the encoding DNA blocks. About 12 assemblies AB1AB1AB1AB2(AB2)AB2 make up about 95% of the waxmoth H-fibroin molecule (9).

FIG. 2. ... (P.i.). The sequences are aligned to maximize their mutual matching and to reflect homologies in the corresponding nucleotide sequences (Fig. 1). Amino acid residues in matching positions are printed on a gray background. A, H-fibroin N termini with signal peptides printed in italics. The beginning of the repetitive regions is marked with an arrow, and sequences regarded as the homologous part of the A-type repeat are underlined. B, sequences of C termini numbered backward from the last amino acid residue. Trailing ends of the repetitive region (underlined) contain similar amino acid motifs, and the blocks of the last 28 residues are nearly identical in the two species. Cysteine residues are double-underlined; for B. mori it has been shown that Cys in position −22 provides linkage to light chain fibroin, whereas the penultimate and last cysteine residues form an intramolecular bridge (18).

TABLE I. Representation of different types of amino acid residues in H-fibroins
Per cent contents of acidic (Asp and Glu), basic (Lys and Arg), polar (Asn, Cys, Gln, Ser, Thr, and Tyr), neutral (Gly and Pro), and hydrophobic (Ala, Ile, Leu, Phe, Trp, and Val) residues in the N-terminal non-repetitive (signal peptide excluded) and the internal repetitive protein regions.

Two types of A repeats can be distinguished in E. kuehniella. In A1, the species-specific sequence succeeding the homologous zone includes a PYGPN motif, a crystalline region such as GSAASSAAASGAAGAGG, and a stretch of bulky residues alternating with Gly. The sequence is typically a 36-mer, but in a few cases it is truncated (in one case to 17 residues) or extended (to 39 residues). A2 differs from A1 by a few residue replacements in the homologous zone and by a longer unique sequence that consists of 49 residues. The first half and the last quarter of this extension contain doublets of bulky residues interspaced with Gly, and the third quarter is "crystalline-like," being composed of Ala, Ser, and Gly. Each B repeat begins with a nonapeptide GALLSGAAG and continues with a 42-mer matching the second half of the A1 repeat.
The regularity of repeats varies within the repetitive realm of E. kuehniella H-fibroin. The beginning of repetitions is characterized by regular combinations of highly conserved repeats, A1 and B, which make a repetitive unit of 72 + 41 = 113 residues (Fig. 3B). There are only a few regular variations in the sequence of nearly 600 residues that were identified. Pro, Ala, Ala, and Leu in positions 37, 50, 56, and 68 from the start of A1, and Val and Leu in positions 10 and 13 from the start of B, alternate with Gln, Ser, Ser, and Ile, and with Ala and Val, respectively (Fig. 3A). Previously reported sequences of randomly amplified H-fibroin cDNAs and the newly established sequence of the H-fibroin 3′ end (GenBank accession number AY253535) indicate that the regularity of repeats is disturbed in the central and terminal parts of H-fibroin. The A1 repeat was found in doublets or in conjunction with A2, and neither the sequence nor length of either of them was fully conserved. One highly conserved B repeat occurs close to the end of the repetitive region that has an A2BA1A1BA1 arrangement.

FIG. 3. Amino acid repeats in the H-fibroin regions analyzed in G. mellonella (A), sample sequence taken from Ref. 9, E. kuehniella (B), and P. interpunctella (C). Sequences are broken into repeats that are aligned to accentuate similarities. Several types of repeats can be recognized, but all can be viewed as extensions of repeat A that is characterized by a sequence conserved in all species (shown on a gray background). The B repeats seem to be derived from the distal parts of the A type and C repeats from a combination of the proximal and distal parts; possible origin of the D and E types is hard to trace. Residues in the exactly known positions are numbered from the translation start site. For E. kuehniella (B), internal sequences 1, 2, 4, 5, and 8 (their order is arbitrary) were deduced from GenBank entries BG695831, BG695832, AY033494, BG695835, and BG695828, respectively; residues identified after correction of the original nucleotide sequence (see "Experimental Procedures") are printed in lowercase. Repeats close to the C terminus (marked with t) span positions 379 and 56 upstream from the H-fibroin end.

The H-fibroin of P. interpunctella is characterized by more repeat types and lower repeat regularity in comparison with the two other species (Fig. 3C). Three types of A repeats were defined based on the amino acid composition of the homologous zone; only one of these types was found in more than one copy. The terminal unique section of all A repeats is short and for the most part composed of bulky residues and Gly. Concatenations of triplets consisting of Gly, a hydrophobic and a polar residue, are also typical for the terminal part of repeat C, the major parts of repeats B and D, and the entire repeat E. "Crystalline" regions of 7-12 small residues (Ala and some Ser and Gly) are present in the N-terminal parts of repeats B and C.
Physical Properties of the Silk Fibers-Silk fibers spun by fully grown larvae at the stage of wandering were used to measure physical silk properties. Microscopic observations confirmed that the fibers were composed of two threads, each secreted by one of the paired glands and glued together by a layer of sericins. Diameters of the two threads were measured in native silks, and the cross-section surfaces were calculated under the approximation that the threads are cylinders. The sum of the values obtained for the threads was taken as the profile surface of the whole silk fiber. The fibers of pyralids were one order of magnitude thinner than the fiber of B. mori (Table II). Silk fiber diameters were comparable in G. mellonella and E. kuehniella, whereas the fiber of P. interpunctella was considerably thinner (Table II).
The tensile strength and extensibility were measured on individual fibers, and the values were expressed per m² of the profile surface. Measurements of B. mori silk were made for comparison. The silk fibers of E. kuehniella and B. mori withstood the highest pulling force (Table II). The silk of G. mellonella was slightly weaker and that of P. interpunctella considerably weaker. It should be noted that our values for B. mori and G. mellonella were 4.1- and 6.8-fold lower, respectively, than reported previously (11,10). The difference may be due to the use of other measuring equipment and the different origin of the silk fibers. The published data are based on measurements of well dried fibers obtained from cocoons, which had been stored for an unknown length of time, whereas we used freshly produced fibers. In addition, the fiber pulling strain exerted by a cocoon-spinning larva is probably less than the pulling tension imposed by the body weight of a larva slipping on the smooth slanted plane in our experiments. However, because we used identical methods for fiber acquisition and measurements, the data in Table II provide reliable comparative values that can be correlated with the H-fibroin amino acid sequence.
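The normalization described above can be illustrated with a minimal sketch. It assumes that each of the two threads is a cylinder with a circular cross-section, that tensile strength is the breaking force divided by the summed cross-sections, and that extensibility is the relative elongation at break (a common definition, not stated explicitly in the text). The function names and all input numbers are hypothetical and are not values from Table II.

```python
import math


def profile_surface_m2(thread_diameters_m):
    """Summed circular cross-sections of the threads (cylinder approximation)."""
    return sum(math.pi * (d / 2.0) ** 2 for d in thread_diameters_m)


def tensile_strength_n_per_m2(breaking_force_n, thread_diameters_m):
    """Breaking force expressed per m^2 of the profile surface."""
    return breaking_force_n / profile_surface_m2(thread_diameters_m)


def extensibility_percent(initial_length_m, length_at_break_m):
    """Relative elongation at break, in percent (assumed definition)."""
    return 100.0 * (length_at_break_m - initial_length_m) / initial_length_m


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    diameters = [2.0e-6, 2.0e-6]   # two threads, 2 micrometers each
    force = 1.0e-3                 # breaking force in newtons
    print(profile_surface_m2(diameters))
    print(tensile_strength_n_per_m2(force, diameters))
    print(extensibility_percent(0.010, 0.020))
```

Because the area enters as the square of the diameter, a small error in the measured thread diameter has a roughly doubled relative effect on the computed strength, which is one reason why comparisons are safest within a single measuring setup.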

DISCUSSION
Physiological Demands on H-fibroin Structure-Construction of a cocoon in which the larva pupates is a general function of lepidopteran silk. The cocoon must be firm to resist mechanical damage, durable to withstand harsh weather conditions, non-palatable to resist attacks by animals, and immune to microbial degradation. High tensile strength is the major mechanical requirement imposed on the cocoon silk. In the case of Pyralidae and some other groups of Lepidoptera, silk is also used to protect larvae that live in resilient silky tubes. Such silk must be strong to hold the weight of the larva when it hangs from a silk fiber and extensible to allow expansion of the tube when the larva turns around. The results of physical measurements are consistent with these deductions. The silks of B. mori and A. pernyi, which are used exclusively for cocoons, were reported to have a tensile strength of 7.4 × 10⁸ and 5.8 × 10⁸ newton·m⁻², and extensibility of 24 and 35%, respectively (11). By contrast, the silk of G. mellonella, which is spun both for larval tubes and cocoon, is 7.5 × 10⁸ newton·m⁻² strong and 101% extensible (10).
Selection for physical properties has not been the only driving force in the evolution of H-fibroin. Insects are unable to synthesize the amino acids Arg, His, Ile, Leu, Lys, Met, and Thr and possibly also Phe, Tyr, and Trp (20). These essential amino acids must be obtained in the diet or supplied by symbiotic microorganisms. Because both of these resources may be limited, it is not surprising that the content of essential amino acids is low in the H-fibroin of species employing silk solely for spinning cocoons, which are eventually discarded. The deduced content of essential amino acids in the H-fibroin of B. mori is less than 5% and in A. pernyi less than 13%. The latter value would be considerably lower if Tyr were not regarded as essential.
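The content of essential amino acids can be computed from a deduced sequence as sketched below. The sketch assumes one-letter residue codes and takes the essential set (Arg, His, Ile, Leu, Lys, Met, Thr) and the possibly essential set (Phe, Tyr, Trp) from the paragraph above; the function name is hypothetical. The example input is a short stretch of E. kuehniella H-fibroin quoted elsewhere in this paper, used only to show the calculation, so the result is not a whole-protein value.

```python
ESSENTIAL = set("RHILKMT")        # Arg, His, Ile, Leu, Lys, Met, Thr
POSSIBLY_ESSENTIAL = set("FYW")   # Phe, Tyr, Trp


def essential_fraction(seq, include_aromatic=False):
    """Fraction of residues in seq that belong to the essential set."""
    if not seq:
        return 0.0
    wanted = (ESSENTIAL | POSSIBLY_ESSENTIAL) if include_aromatic else ESSENTIAL
    return sum(residue in wanted for residue in seq) / len(seq)


if __name__ == "__main__":
    stretch = "SGEDKLVRTFVIETDASGNEVIY"  # short illustrative stretch only
    print(essential_fraction(stretch))
    print(essential_fraction(stretch, include_aromatic=True))
```

The include_aromatic switch mirrors the remark that the A. pernyi value would be considerably lower if Tyr were not regarded as essential.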
Silk used for the construction of larval tubes in the pyralid moths is partly recycled. A larva typically extends the tube to a wider dimension on one end and devours the opposite narrow end. Under these circumstances, good digestibility of the silk is desirable and selection pressure against the use of essential amino acids is reduced. The content of essential amino acids in the analyzed H-fibroins of pyralids actually amounts to about 17%, and the representation of non-essential but bulky residues (Pro, Val, Glu, Gln, Asp, and Asn), which are a rich energy source, is equally high. Our previous investigations revealed that silk digestibility is controlled by proteinase inhibitors, and possibly other specific proteins, which are absent in the tube silk but are added to the silk synthesized for cocoon spinning (21,22).
Characteristic Structural Features of Pyralid H-fibroin-The non-repetitive N and C termini resemble the corresponding parts of H-fibroins in other Lepidoptera (4). The composition and arrangement of the repetitive region, however, are unique. The repeats are characterized by a high representation of Ser and bulky residues, whereas the contents of Gly and Ala are low in comparison to B. mori (Table III). The presence of a homologous zone in the repetitive sequence is the most distinct feature of pyralid H-fibroins. It must be emphasized that G. mellonella represents a subfamily that is remote from the Phycitinae to which E. kuehniella and P. interpunctella belong. These taxonomic relations are consistent with the degree of similarity in the non-repetitive N-terminal H-fibroin region (Fig. 2A) and justify generalizations of our data for the whole Pyralidae family.
Homologous zones of 42-43 residues are mutually separated by 61 or 36 residues in G. mellonella, usually 71 residues but occasionally as few as 11 residues in E. kuehniella, and 5-86 residues in P. interpunctella (Fig. 3). The regularity of the intercalated sequences allows the distinction of several types of repeats (the homologous zone is part of the A repeats). Repeat structures are species-specific, but certain common features can be recognized. The repeats in pyralid H-fibroin are principally multiples of three amino acid residues and contain regions built of tripeptides combining Gly with bulky residues, typically one with a hydrophobic and one with a polar side chain. In G. mellonella, there are also triplets with two Gly residues, such as GGL or GLG. The assemblies GLN or GPY are less frequent in this species but dominate certain repeat sections in E. kuehniella and P. interpunctella. Reiterations of the same triplet are rare; GLN, GLG, and GPY occasionally occur in doublets, and a string of four GLN was found in an extended and modified form of the A₁ repeat in E. kuehniella (Fig. 3).
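The triplet composition described above can be examined with a short sketch. The residue classes HYDROPHOBIC and POLAR are assumptions introduced for illustration (the paper does not list them), as are the function names; the triplets GLN, GPY, GGL, and GLG are taken from the text.

```python
# Assumed residue classes (one-letter codes); not defined in the paper.
HYDROPHOBIC = set("AVLIPFMW")
POLAR = set("STNQYDEKRHC")


def classify_triplet(triplet):
    """Sort a tripeptide into the categories discussed in the text."""
    others = [r for r in triplet if r != "G"]
    n_gly = 3 - len(others)
    if n_gly == 1 and any(r in HYDROPHOBIC for r in others) \
            and any(r in POLAR for r in others):
        return "Gly + hydrophobic + polar"
    if n_gly == 2:
        return "two Gly"
    return "other"


def split_triplets(seq, frame=0):
    """Cut seq into consecutive tripeptides starting at the given frame."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def longest_reiteration(triplets):
    """Length of the longest run of identical consecutive triplets."""
    best = run = 0
    previous = None
    for t in triplets:
        run = run + 1 if t == previous else 1
        best = max(best, run)
        previous = t
    return best


if __name__ == "__main__":
    for t in ("GLN", "GPY", "GGL", "GLG"):
        print(t, classify_triplet(t))
    print(longest_reiteration(split_triplets("GLNGLNGLNGLN")))
```

The frame argument matters in practice: a repeat read in the wrong frame yields triplets that no longer start with Gly, so all three frames may need to be tried on a real sequence.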
A short chain of alanines is conserved at the end of the homologous zone, in E. kuehniella in the sequence SSAAAAASSSSSG, for example. Interestingly, strings of several Ala and Ser residues dispersed in a variable sequence were detected in a silk protein cDNA in Euagrus, a species of the basal-most clade of spiders (23). The other crystalline motifs in pyralid H-fibroin, which occur at the beginning of the homologous zone, in the terminal part of A repeats, and in the other repeat types, contain at most triplets of alanines. For example, the sequence GAGGSSGSSAASGASG links the end of B₁ repeats with the start of A repeats in G. mellonella, and GSAASSAAASGAAGAGG is in B repeats of E. kuehniella. Various crystalline motifs occupy about 39% of the repetitive H-fibroin region in G. mellonella and P. interpunctella, and around 53% in E. kuehniella. They are distributed within sequences rich in bulky residues, being 9, 20, and 39 residues apart in G. mellonella, mostly 10 but rarely up to 30 residues in E. kuehniella, and 9-30 residues in P. interpunctella.
The absence of concatenations of simple motifs and the short length of polyalanine chains distinguish H-fibroin of pyralid moths from the H-fibroins of Saturniidae (7,8) and Bombycidae (24), the only other lepidopteran families for which data are available. Fibroins of many spider silks also contain polyalanine chains (25-27) or reiteration of simple Gly-rich motifs (28). The situation in Pyralidae is exceptional. Distant similarity to the structure and distribution of their crystalline motifs can be found only in the minor ampullate spider silk (28,29).
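One way to locate candidate crystalline motifs and estimate the share of a sequence they occupy is sketched below. It assumes that a crystalline motif is an uninterrupted run of small residues (Ala, Ser, Gly) of at least min_len residues; this threshold and the function names are assumptions, and the procedure is not necessarily the one used to obtain the percentages given above. The test sequence is a hypothetical concatenation of motifs quoted in this paper.

```python
import re


def small_residue_runs(seq, min_len=7):
    """Return (start, run) for each stretch of A/S/G of at least min_len residues."""
    pattern = re.compile(r"[ASG]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


def crystalline_fraction(seq, min_len=7):
    """Share of seq covered by runs of small residues."""
    if not seq:
        return 0.0
    covered = sum(len(run) for _, run in small_residue_runs(seq, min_len))
    return covered / len(seq)


def gaps_between_runs(seq, min_len=7):
    """Number of residues separating consecutive runs."""
    runs = small_residue_runs(seq, min_len)
    return [nxt[0] - (cur[0] + len(cur[1])) for cur, nxt in zip(runs, runs[1:])]


if __name__ == "__main__":
    # Hypothetical toy sequence assembled from motifs quoted in the text.
    toy = "PVIVIEEN" + "SSAAAAASSSSSG" + "PVIVIEE"
    print(small_residue_runs(toy))
    print(crystalline_fraction(toy))
```

The result depends strongly on min_len and on whether isolated bulky residues inside a motif are tolerated, so values obtained this way should be treated as rough estimates.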
Possible Origin of H-fibroin Repeats in Lepidoptera-Homologies of all known H-fibroins in their non-repetitive terminal regions reveal their monophyletic origin (4). It is reasonable to assume that H-fibroins evolved by preferential amplification and modification of certain motifs and subsequent homogenization of the emerging repeats. The diversity of H-fibroin structure in recent species suggests that H-fibroin evolution diverged into several pathways compatible with fiber polymerization. Diversification was promoted by deviations in the functional requirements of the silk, by nutritional factors, and possibly also by DNA structure constraints that seem to restrict the use of certain isocodons (30,31). The restriction has led to a certain balance in isocodon frequency that characterizes insect families and subfamilies (Table IV). The sequence of the homologous zone in the pyralid H-fibroin indicates a possible course of evolution of the repetitive fibroin region.
A striking feature of the non-repetitive N-terminal part of pyralid H-fibroins is the spacing of charged residues, especially the distribution of six glutamates, each of them close to a valine or a similar residue and 10 or 15 positions apart from the next glutamate (Fig. 2). Conservation of five such groupings in the H-fibroins of Antheraea and Bombyx (32,33) indicates their functional importance. We suggest that they represent vestiges of an unknown molecule ancestral to the H-fibroin and that a sequence similar to the stretch SGEDKLVRTFVIETDASGNEVIY, which is present in positions 129-151 in E. kuehniella H-fibroin (Fig. 2), might have been at the root of repeat genesis. We imagine that mutations and selection for polymerization fostered extension of the initial SG dipeptide to the crystalline region where the homologous zones begin (Fig. 3). Changes in single nucleotides would convert codons for Arg, Glu, and Asp to those encoding Gly and Ala. The central part of the sequence stretch, with adjacent charged and hydrophobic residues, is conserved in the homologous zone of the examined pyralids in several variations, such as GEVIVIDD and PAPVIVIED in G. mellonella, SAPVIVIEEN and GPVVIIEDN in E. kuehniella, and GPVVVIEEN in P. interpunctella. The subsequent crystalline motif, which is rich in Gly, Ala, and Ser, and the tail end of the homologous zone, which contains bulky residues, can also be traced to the non-repetitive sequence under discussion.
The homologous zone must have evolved before the family Pyralidae split into subfamilies. Further evolution apparently involved multiplication of the terminal tripeptides to full-length A repeats. Their partial duplication gave rise to the B repeats. Concerted evolution driven by unequal crossing-over and gene conversion (34) homogenized repeat sequences within the gene. Homogeneity of the relatively long and complex repeats, which lack concatenations of single residues or oligopeptides, distinguishes pyralid H-fibroins.
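The single-nucleotide conversions mentioned above can be enumerated with a short sketch based on the standard genetic code. The function name is hypothetical, and the sketch only lists which residues are reachable by one substitution; it says nothing about mutation rates or codon usage.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_products(residue):
    """Residues reachable from any codon of `residue` by one nucleotide change."""
    reachable = set()
    for codon, aa in CODON_TABLE.items():
        if aa != residue:
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    reachable.add(CODON_TABLE[mutant])
    reachable.discard(residue)  # ignore synonymous changes
    return reachable


if __name__ == "__main__":
    for residue in "RED":  # Arg, Glu, Asp
        products = single_step_products(residue)
        print(residue, sorted(products & set("GA")))
```

In this enumeration, Glu and Asp codons reach both Gly and Ala codons in one step, whereas Arg codons reach Gly codons but not Ala codons.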
In contrast to Pyralidae, the evolution of H-fibroin in the suprafamily Bombycoidea, represented by genera Antheraea (family Saturniidae) and Bombyx (Bombycidae), was driven by tandem duplication of short motifs (35), leading to catenations of alanine or GlyX dipeptides. Most of Antheraea H-fibroin is built of four types of alternating repeats, which are 20-40 residues long and include chains of 13 alanine residues (7,8). The repeats in B. mori H-fibroin consist of GlyX iterations in which X is Ala in 65% of cases, Ser in 23%, and Tyr in 9% (24). The evolution of H-fibroin in Bombycoidea was convergent to that of some spider silks that also contain runs of polyalanines or poly(GlyX) motifs (23).
Two Mechanisms of H-fibroin Tallying for Polymerization-Despite profound differences in the structure of H-fibroins, the caterpillars of Pyralidae, Bombycidae, and Saturniidae produce silk fibers of similar properties. It is generally accepted that H-fibroin polymerization depends on proper alignment of amino acid residues that cross-link peptidic chains with hydrogen bridges. We propose that there are two structural mechanisms facilitating apposition of the interacting amino acid motifs. The "bombycoid" type (Bombycoidea include Saturniidae and Bombycidae) is characterized by strings of alanines or oligopeptides built from 2 to 6 simple residues. The length of these crystalline alanine chains and oligopeptidic assemblies varies along the H-fibroin molecule (36), apparently due to mismatched recombination and DNA polymerase slippage that is likely to occur in long repetitive DNA sequences (37). This is associated with great allelic variation in H-fibroin length, which varies by 15% in B. mori (38). The negligible impact of such variations on the silk fiber has also been shown for spiders (39). This length flexibility is possible because any fraction of the crystalline concatenations can form β-sheets and length unification is unnecessary for polymerization. The multiple repetitions ensure that some of the building units (Ala in Antheraea, GlyX in Bombyx) are always placed in positions appropriate for β-sheet formation.
Pyralid H-fibroins contain only short chains of alanines and no catenations similar to the GlyX multiplication in Bombyx. The expansion of such crystalline units in pyralid evolution was probably constrained by selection for fiber extensibility and possibly also by H-fibroin utilization for the storage of essential and energy-rich amino acids. The evolution of pyralid H-fibroins led to homogeneity of long and complex repeats, which include several types of orderly spaced motifs. The regularity of homologous motifs, which occur in aligned H-fibroin molecules in mutually matching intervals, ensures that correct apposition of a single pair of interacting motifs implies orderly alignment and interaction of long repetitive sequences. Therefore, it can be expected that registered crystalline regions will form β-sheets. Conservation of other parts of repeats, such as tripeptides combining Gly with a hydrophobic and a polar residue, and the sequences of hydrophobic and charged residues in the center of the homologous zones suggest that they are also involved in polymerization; however, the nature of their interactions is unknown. We assume that they confer on the silk properties other than strength. For example, silk elasticity has been traditionally ascribed to the non-crystalline, amorphous protein regions (15), such as those occupying most of pyralid H-fibroins.
Correlations between Repeat Regularity and the Silk Strength-Silk fiber tensile strength is believed to depend largely on the presence of β-pleated crystallites, but nothing is known about the control of their size. The H-fibroin of B. mori with more than 10 repetitions of the GAGAGS motif in each of about 60 reiterated units should contain huge crystals rendering the silk extremely rigid and much stronger than the silk of pyralids. However, presumably small and isolated crystallites in the H-fibroins of G. mellonella and E. kuehniella seem to provide the same tensile strength. A plausible explanation is that the amorphous regions in pyralid H-fibroin contribute to the fiber strength and that there are innate limits to the crystal growth in B. mori H-fibroin. Examination of the pyralid H-fibroins confirms that fiber strength is not strictly correlated with the abundance of crystalline units. Their proportion in the H-fibroin repeats is about 39% in both G. mellonella and P. interpunctella, but the tensile strength of H-fibroin is much higher in the former species. It appears that strength is correlated with repeat regularity, which is high in G. mellonella and degraded in P. interpunctella. Repeat heterogeneity is associated with irregular spacing and thereby disturbed alignment of the motifs whose interactions underlie H-fibroin polymerization. Imprecise apposition of short motifs reduces the number of possible interactions, and the resulting polymer is flimsy. We propose that fiber strength in pyralid H-fibroin depends on the uniformity of H-fibroin repeats.