Complete Sequence of the 24-mer Hemocyanin of the Tarantula Eurypelma californicum

Hemocyanins are large oligomeric respiratory proteins found in many arthropods and molluscs. The hemocyanin of the tarantula Eurypelma californicum is a 24-mer protein complex with a molecular mass of 1,726,459 Da that consists of seven different polypeptides (a–g), each occupying a distinct position within the native molecule. Here we report the complete molecular structure of the E. californicum hemocyanin as deduced from the corresponding cDNAs. This represents the first complex arthropod hemocyanin to be completely sequenced. The different subunits display 52–66% amino acid sequence identity. Within the subunits, the central domain, which bears the active center with the copper-binding sites A and B, displays the highest degree of identity. Using a homology modeling approach, the putative three-dimensional structure of individual subunits was deduced and compared. Phylogenetic analyses suggest that differentiation of the individual subunits occurred 400–550 million years ago. The hemocyanin of the stemline Chelicerata was probably a hexamer built up of six distinct subunit types a, b/c, d, e, f, and g, whereas that of the early Arachnida was originally a 24-mer that emerged after the differentiation of subunits b and c.

Hemocyanins are large allosteric multisubunit protein complexes representing one of three classes of respiratory proteins in animals. They are freely dissolved in the hemolymph of arthropods and molluscs and constitute the predominant group of hemolymph proteins in these taxa. Arthropod and mollusc hemocyanins differ substantially concerning subunit size, composition, quaternary structure, and evolutionary origin (1,2). Despite these marked differences, oxygen binding by hemocyanins of both phyla is mediated by a pair of copper atoms that are coordinated by six histidine residues (2,3).
Arthropod hemocyanins have proven to be an excellent model for studying multisubunit protein complexes by biochemical and biophysical approaches (1,2,4). These proteins are composed of heterogeneous subunits in the 75-kDa range that combine to form either a regular cubic single hexamer (1 × 6) or multiple hexamers (2–8 × 6), depending upon the species or physiological conditions (5). Full biological activity requires a specific arrangement of the hexamers within the native complex and that each subunit type occupies a distinct position within the hexamer (6–8). Up to now, the primary structures of several hemocyanin subunits from Chelicerata and Crustacea have been determined (9–11). In addition, detailed structural models of hemocyanin subunits from the crustacean Panulirus interruptus (12,13) and the chelicerate Limulus polyphemus (14) have been established based on x-ray diffraction studies. Each hemocyanin subunit is folded into three domains: a highly conserved central domain 2, which provides the binuclear copper site (CuA, CuB) for binding one oxygen molecule, and the N-terminal domain 1 and C-terminal domain 3, which are both less stringently conserved among arthropod hemocyanins (9–11). Moreover, analysis of the exon/intron architecture of one hemocyanin gene supports the proposed protein domain structure (15).
The hemocyanin of the North American tarantula Eurypelma californicum is a native 24-mer protein complex consisting of two identical dodecamers with an estimated total molecular mass of about 1800 kDa (5, 16–18). Formation of the 24-mer complex requires the aggregation of seven different subunit types in a constant stoichiometric ratio, with four copies each of subunits a, d, e, f, and g, and two copies each of subunits b and c (5,19). Each subunit type has unique immunological and physico-chemical properties but similar oxygen-binding behavior (20,21). Previous work has revealed the amino acid sequences of subunits a, d, and e by conventional protein sequencing methods or by cloning the corresponding cDNAs (22–26). Understanding how structural differentiation and adaptive evolution led to individual subunits obtaining distinct roles requires analysis of the whole primary structure. Here we report the cDNA cloning of subunits b, c, d, f, and g. The availability of the deduced sequence data, together with the previously identified a, d, and e subunits, has permitted us to investigate the intra-molecular evolution of the tarantula hemocyanin and to build a three-dimensional model of the 24-mer protein.

EXPERIMENTAL PROCEDURES
Animals-The tarantula E. californicum was purchased from Carolina Biological Supply. Animals were kept under standard conditions as described (27).
Cloning and Sequencing of Hemocyanin cDNAs-Total RNA was prepared from E. californicum hearts following induction of hemopoiesis as described before (26). Oligo(dT)-primed cDNA libraries were established in phage λgt10 and probed with two degenerate oligonucleotides designed according to the highly conserved amino acid sequences of the CuA (HHWHWH)- and CuB (HNWGHV)-binding sites of E. californicum hemocyanin subunits a and e (26). The cDNAs of positive clones were cloned into pUC19 or pBluescript and sequenced. Missing 5′-terminal regions of the cDNAs were obtained by 5′-rapid amplification of cDNA ends (Life Technologies, Inc.) using 1 μg of E. californicum heart total RNA and three nested oligonucleotide primers specific for each subunit. Sequences were obtained either directly from the polymerase chain reaction products or after cloning into the vector pGEM-T Easy (Promega).
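The choice of the two copper-site motifs as probe templates can be illustrated by counting how many distinct DNA sequences could encode each peptide. The sketch below is only an illustration under the standard genetic code; the actual probe sequences and their lengths are not given in the text, and the function name `degeneracy` is hypothetical.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNT[residue]
    return total


# Motifs quoted in the text for the CuA- and CuB-binding sites.
for site, motif in (("CuA", "HHWHWH"), ("CuB", "HNWGHV")):
    print(site, motif, degeneracy(motif))
```

Histidine and tryptophan have few codons, so a full back-translation of HHWHWH needs only 16 oligonucleotide variants and HNWGHV needs 128, which is one reason such motifs are convenient for degenerate probes.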
Sequence Analysis and Phylogenetic Studies-The programs provided by the Software Package 8.0 from the Genetics Computer Group (Madison, WI) were used for sequence analysis. Sequences were aligned using PILEUP and imported into GeneDoc 2.5 for further manipulation. The alignment is available from the authors upon request. The PHYLIP 3.6 software package was used for phylogenetic analyses (29). Distances between pairs of protein sequences were calculated and corrected for multiple changes according to Dayhoff's empirical PAM 001 matrix (30) with the PROTDIST program. Phylogenetic trees were constructed either by the neighbor joining method or by the maximum parsimony method implemented in the PROTPARS program. The reliability of the trees was tested by bootstrap analysis (31) with 100 replications (SEQBOOT program). To estimate the divergence times, the PAM matrix was imported into the Microsoft EXCEL 97 spreadsheet program (32). A linearized tree that corresponds to the phylogeny of the chelicerate hemocyanins was calculated on the basis that Merostomata and Arachnida separated about 450 MYA¹ (33). The confidence limits were estimated using the observed standard deviation of the protein distances.
Protein Structure Modeling-Homology models of the subunits were built on the basis of the three-dimensional structure of L. polyphemus hemocyanin subunit II (Protein Data Bank entry 1LLA; Ref. 14) by satisfaction of spatial restraints as implemented in the program MODELLER 4 (34) and examined for quality by using several structure analysis methods (35,36). A tentative model of the four-hexamer hemocyanin was composed using the distances, angles, and other parameters as deduced from electron microscopic studies of E. californicum hemocyanin (7). The spatial arrangement of the seven subunit types was assigned on the basis of immunoelectron microscopy studies and reassembly experiments (6,37).
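The bootstrap step described above amounts to resampling alignment columns with replacement before each tree is rebuilt. The following is a minimal sketch of that idea, not a reimplementation of SEQBOOT or PROTDIST: the toy alignment and all function names are hypothetical, and the distance shown is the uncorrected fraction of differing sites rather than a PAM-corrected distance.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def p_distance(seq1, seq2):
    """Uncorrected fraction of differing residues, skipping gapped columns."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical toy alignment; real input would be the aligned subunits.
toy = {"subA": "HHWHWHLVYP", "subB": "HHWHWHVVYA", "subC": "HNWGHV-IYP"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]

# Spread of one pairwise distance across the 100 pseudo-replicates.
distances = [p_distance(rep["subA"], rep["subB"]) for rep in replicates]
print(min(distances), max(distances))
```

In a full analysis a tree would be built from every replicate, and the bootstrap value of a node would be the percentage of replicate trees that contain it.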

RESULTS AND DISCUSSION
Cloning of Tarantula Hemocyanin cDNAs-In the tarantula E. californicum, hemocyanin is synthesized in hemocytes attached to the inner heart wall, and hemopoiesis is markedly induced after bleeding (38). To establish cDNA libraries enriched for hemocyanin mRNA, total RNA was prepared from tarantulas that had been bled 5 days before RNA isolation (26). From this RNA a cDNA library containing 5.4 × 10⁵ independent clones was constructed in phage λgt10. To identify hemocyanin-specific cDNAs, the library was screened with degenerate oligonucleotide probes derived from the amino acid sequences of the CuA- and CuB-binding sites. These sites are known to be the most highly conserved regions among the heterogeneous hemocyanin subunits of different arthropods (9–11, 14). Thirty-six positive clones were isolated that contained cDNA inserts of 0.8–2.0 kilobases in size. Sequence analysis revealed that these cDNAs contained open reading frames coding for seven different polypeptides, each encompassing the CuA- and CuB-binding sites. Eleven of the 36 cDNAs were found to code for the known hemocyanin subunits a (24,26), d (22), and e (23,26). Comparison with partial amino acid sequences of different hemocyanin subunits that had been determined before (37–41) allowed the unambiguous assignment of the remaining open reading frames to subunits b, c, f, and g. Because all these cDNAs represented N-terminally truncated versions missing between 30 and 500 base pairs at the 5′ end, the corresponding full-length cDNAs were obtained by a polymerase chain reaction-based approach. The sequences of the full-length cDNAs have been deposited in the EMBL/GenBank™ database (Table I).
Sequence Analysis-The full-length cDNAs cover the complete coding regions for the different subunits together with 11–77 base pairs of the respective 5′-untranslated regions and the complete 3′-untranslated regions comprising the polyadenylation signal AATAAA and the poly(A) tail. The open reading frames translate into polypeptides of 624 to 631 amino acids with calculated molecular masses ranging from 71.5 to 72.4 kDa (Table I). This corresponds well with the apparent molecular mass of 67–76 kDa (17). The complete 24-mer hemocyanin has a total molecular mass of 1,726,459 Da. The aligned complete amino acid sequences are presented in Fig. 1. Pairwise sequence comparison of single subunits revealed that 52–65% of amino acid positions are identical. A total of 202 amino acids (32%) are strictly conserved throughout all seven subunits (Table II and Fig. 1). Arthropod hemocyanins are clearly divided into three structural domains (9, 12–14). Most sequence variation was found in the N-terminal domain 1 and in the C-terminal domain 3, which show 21 and 27% identity, respectively. In contrast, the central domain 2, which contains the copper-binding sites A and B, is strikingly conserved, exhibiting 45% sequence identity.
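The total mass follows from the subunit stoichiometry given in the introduction (four copies each of a, d, e, f, and g; two copies each of b and c). The sketch below checks that bookkeeping. The individual subunit masses of Table I are not reproduced in this text, so only the copy numbers and the total come from the paper; the helper `complex_mass` and the uniform 72,000 Da input are hypothetical.

```python
# Copy number of each subunit type in the native 24-mer (from the text).
COPIES = {"a": 4, "b": 2, "c": 2, "d": 4, "e": 4, "f": 4, "g": 4}
TOTAL_MASS_DA = 1_726_459  # total mass of the 24-mer quoted in the text


def complex_mass(subunit_masses_da):
    """Mass of the complex: sum of copy number times subunit mass."""
    return sum(COPIES[name] * mass for name, mass in subunit_masses_da.items())


n_subunits = sum(COPIES.values())
mean_mass_da = TOTAL_MASS_DA / n_subunits

print(n_subunits)           # 24
print(round(mean_mass_da))  # 71936

# Hypothetical input: every subunit set to 72,000 Da, for illustration only.
print(complex_mass({name: 72_000 for name in COPIES}))  # 1728000
```

The mean of about 71.9 kDa per subunit lies inside the 71.5 to 72.4 kDa range reported for the individual polypeptides, which is what the stoichiometry predicts.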
Although hemocyanins are extracellular proteins, no signal peptides for transmembrane transport have been found in the subunit sequences. The available amino acid sequences from the N-terminal ends of the native polypeptides corroborate this result and show only the removal of the first (initiator) methionine (22–24). This is consistent with the observation that in chelicerates hemocyanin is synthesized by free ribosomes and subsequently released by cell rupture (38). Thus, the proteins do not pass through the Golgi apparatus, and despite the presence of several putative N-glycosylation sites (NX(T/S)) in the primary structure of all seven subunits, no carbohydrate moiety was detected in the native tarantula hemocyanin (1).
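Putative N-glycosylation sites of the NX(T/S) type can be located with a simple pattern search. The sketch below is one possible way to do this; the example sequence and the function name are hypothetical, and the exclusion of proline at the X position follows the common sequon convention, which is an assumption not stated in the text.

```python
import re

# Lookahead so that overlapping motifs (e.g. "NNTT") are all reported.
# Assumption: X may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 1-based positions of the Asn in each N-X-(S/T) motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


# Hypothetical peptide, not taken from any hemocyanin subunit.
example = "MNGTAANPSLLNVSNNTT"
print(find_sequons(example))  # [2, 12, 15, 16]
```

A hit from such a search marks only a potential site; as noted above, the tarantula subunits carry several of these motifs and yet no carbohydrate was detected.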
Hemocyanin Structure-The hemocyanin subunits of E. californicum display high sequence identity (between 52 and 60%) to hemocyanin subunit II of the horseshoe crab L. polyphemus (LpoHcII). The crystal structure of LpoHcII at 0.22 nm resolution has been elucidated by x-ray studies (14). Thus, the likely three-dimensional structures of the tarantula hemocyanin subunits can be deduced by protein modeling using LpoHcII as a template (Fig. 2). The modeling process was straightforward. The models show only minor differences, all of which are located within the predicted loop regions. Root mean square deviation values between 0.32 and 0.44 Å were calculated between the seven models. The positions of the amino acids conserved within all subunits were superimposed on the model of subunit a (Fig. 2). As expected, we found strong conservation within and around the copper-binding sites, most notably in the histidine-bearing α-helices 2.1 and 2.2 (CuA) and 2.5 and 2.6 (CuB). These structural similarities are the most likely explanation for the virtually identical oxygen binding properties of the different subunits (20,21). The cysteines forming the two disulfide bridges present in LpoHcII, which stabilize domain 3 by connecting β-sheet 3K and α-helix 3.5, are also conserved in the E. californicum subunits. There is little conservation at the surface of the predicted subunit. This result is not unexpected, because specific subunit-subunit contact regions are required for the self-assembly process (19). Only 14 amino acids are conserved within all phenoloxidases, hemocyanins, and hexamerins (Fig. 1). Because these proteins perform very different functions (11, 42–44), these residues are most likely maintained because of constraints imposed on the structure of the subunit. Indeed, we found among these invariant residues five prolines and two glycines that are involved in the formation of turn structures. Other turns, most notably adjacent to the central α-helices 2.1 and 2.6, are formed by conserved aspartate, threonine, and arginine residues. The central core with the copper-binding sites is stabilized by a salt bridge between Arg 255, C-terminal to β-sheet 2C, and Asp 259, adjacent to α-helix 2.6 (cf. Fig. 1).
FIG. 1. … Table I for accession numbers); II, subunit II of L. polyphemus (SWISSPROT accession number P04253).
¹ The abbreviation used is: MYA, million years ago.
The models of the seven subunits were used to construct a tentative model of the native 24-mer E. californicum hemocyanin (Fig. 3). The subunits were arranged as deduced by electron microscopic images (7), and the positions of the subunits are displayed as deduced from immunoelectron microscopy (19). In fact, the model displayed in Fig. 3 is indistinguishable from the electron microscopic pictures of the native hemocyanin molecules.
Hemocyanin Evolution-The sequences of the seven subunits of E. californicum hemocyanin were aligned with three additional known hemocyanin sequences of Chelicerata and with those of the Crustacea. Pairwise comparisons show that the hemocyanin subunits of the Chelicerata are less conserved (≤47.2% distance or 0.7540 corrected amino acid changes per site) than those of the Crustacea (≤42.9% or 0.6250 changes per site), which reflects the more ancient divergence of the chelicerate hemocyanins (see below). For phylogenetic inference, the available prophenoloxidase sequences of Crustacea and insects were included in the alignment. Prophenoloxidase-like proteins are the most likely ancestors of arthropod hemocyanins (11), and thus the contemporary arthropod prophenoloxidase was considered as the outgroup. Phylogenetic trees were constructed assuming maximum parsimony or with the neighbor joining method. In all types of analysis, the hemocyanin subunits of the Chelicerata form a single, well supported clade that is strictly separated from that of the crustacean proteins (Fig. 4A). This indicates that the subunits individually diversified in the subphyla after the Crustacea and Chelicerata diverged in the Cambrian or in an earlier period (45,46).
FIG. 2. Model of E. californicum hemocyanin subunit. The stereo view of the structure of subunit a was deduced by comparative modeling using the L. polyphemus hemocyanin subunit II (14). Positions strictly conserved in all E. californicum subunits are displayed in red, the copper atoms are displayed in dark blue, the coordinating histidines are in green, and the disulfide bridges are in yellow. The three domains are differentially shaded: the first domain is gray, the second domain is light blue, and the third domain is dark blue.
FIG. 3. … (7), and the subunits were arranged as deduced from immunoelectron microscopy (6,37). The subunit types are displayed in different colors: a, green; b, gray; c, brown; d, yellow; e, pink; f, blue; g, red.
Diversification of the Chelicerate Hemocyanin Subunits-Within the clade of the chelicerate hemocyanins, the association between the E. californicum subunits b and c is strongly supported (100%). Moreover, the close relationship of these two subunits, as well as their early branching from the other chelicerate hemocyanins, is supported by comparative immunological studies (5,47). Subunits b and c play an important role in the formation of the four-hexamer complex. They form a stable heterodimer under conditions in which the hemocyanin complex is partially dissociated, and they contribute to inter-hexameric contact regions (37) (Fig. 3). From this it is likely that the ancestor of the tarantula hemocyanin and, therefore, the ancestral hemocyanin of the Arachnida stemline was a hexamer composed of a, b/c, d, e, f, and g-like subunits, with each subunit occupying a distinct position in the hemocyanin structure. The formation of the four-hexamer protein was then possible after gene duplication and emergence of distinct b and c type subunits. Although the bootstrap support is not very strong (40% to 60% in different analyses), both parsimony and neighbor joining analysis suggest that the branch leading to b and c diverged first; therefore, these proteins are basal to all other tarantula hemocyanin subunits and phylogenetically the closest to the phenoloxidase outgroup (Fig. 4A). This is noteworthy because b and c are the only two tarantula hemocyanin subunits that exhibit a significant pseudophenoloxidase activity (42). This phenomenon may be interpreted as a remnant of the origin of the hemocyanins from prophenoloxidases (11).
There is good statistical evidence (94% bootstrap support) for a close relationship of the tarantula subunit a with subunit II of the horseshoe crab L. polyphemus, and the orthology of these subunits is supported by immunological data (47). These studies also suggested common epitopes in the e and g subunits on the one hand and in d and f on the other hand (5,28,47). However, this conclusion could not be confirmed by the sequence data. Although there is moderate support for a common clade of subunits d, e, f, and g (68%), the resolution of the relationships among these and the other chelicerate subunits is very poor, indicating a lack of phylogenetic signal, which may be interpreted as the result of a rapid diversification from a single primordial chelicerate hemocyanin. Phylogenetic analysis using a DNA alignment of the hemocyanin coding sequences did not perform better but showed essentially the same topology of the tree (not shown).
A Time Scale of Chelicerate Hemocyanin Evolution-Limulus subunit II and tarantula subunit a are orthologous proteins that most likely diverged at the time of speciation of the Merostomata and Arachnida. These taxa split between 420 and 480 MYA (33). Under the assumption of a divergence time of about 450 MYA, a mean amino acid substitution rate of (6.4 ± 0.4) × 10⁻¹⁰ per site per year was inferred from the distance data set. The earliest divergence of the chelicerate hemocyanin subunits (i.e. b/c versus others) was then estimated to have occurred about 555 ± 37 MYA (Fig. 4B), probably in the stemline of the Chelicerata. Immunological data suggest that the inter-hexameric contact subunits b and c split within the Arachnida but before the Araneae and the Scorpiones diverged (47) in the Silurian period (33). We calculate the time of the b-c split to be about 412 ± 27 MYA. This date is consistent with the available fossil record of chelicerate evolution (33) and indicates that the chelicerate hemocyanin evolved in an at least approximately clock-like manner. The other E. californicum subunit types diversified about 474 ± 45 MYA. Thus, the mutations that differentiate the hemocyanin subunit types occurred more than 400 MYA, but apparently no distinct subunit emerged in the tarantula stemline after this date. A 24-mer built by seven different subunits appears to be the original quaternary structure of the Arachnida hemocyanins, and other subunit compositions of hemocyanins in several non-orthognathan taxa of the Arachnida should be interpreted as secondary rearrangements (5).
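The clock calculation above can be sketched as follows. This is a hedged illustration, not the authors' spreadsheet: it assumes the usual convention that a pairwise distance d accumulates along both lineages, so that rate = d / (2T) and T = d / (2 × rate). The rate and dates come from the text; the distances printed are back-calculated from them and are not data from the paper, and all function names are hypothetical.

```python
def substitution_rate(distance: float, divergence_years: float) -> float:
    """Rate per site per year, assuming both lineages accumulate changes."""
    return distance / (2.0 * divergence_years)


def divergence_time(distance: float, rate: float) -> float:
    """Divergence time in years implied by a pairwise distance and a rate."""
    return distance / (2.0 * rate)


def implied_distance(divergence_years: float, rate: float) -> float:
    """Pairwise distance expected after a given divergence time."""
    return 2.0 * rate * divergence_years


RATE = 6.4e-10             # substitutions per site per year (from the text)
CALIBRATION_YEARS = 450e6  # assumed Merostomata-Arachnida split (from the text)

# Distance between the calibration pair implied by the quoted rate.
calibration_distance = implied_distance(CALIBRATION_YEARS, RATE)
print(round(calibration_distance, 4))  # 0.576

# Round trip: the implied distance recovers the calibration date.
print(round(divergence_time(calibration_distance, RATE) / 1e6))  # 450

# Mean distance that would correspond to the 555 MYA estimate.
print(round(implied_distance(555e6, RATE), 4))  # 0.7104
```

Under this convention the 555 MYA estimate corresponds to a mean corrected distance of about 0.71 changes per site, which stays below the maximum of 0.7540 reported for the chelicerate subunits.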
In summary, E. californicum hemocyanin is a true molecular fossil with a subunit organization that has been conserved for approximately half a billion years. The oxygen binding properties of the different subunits have remained constant (20,21), as reflected by the conservation of the copper-binding sites. Nevertheless, the ancient differentiation suggests particular roles for each of the subunits within the native multihexameric hemocyanin, related either to the cooperative function or to the structure of this protein (6–8).
FIG. 4. Phylogenetic relationship of the chelicerate hemocyanins. A, the simplified tree was deduced by neighbor joining analysis based on the alignment of the amino acid sequences. The arthropod phenoloxidases are considered as the outgroup (11). The numbers at the nodes are the statistical confidence estimates computed by the bootstrap procedure (31) using the neighbor joining method. The bar represents 0.1 PAM distance (30). B, a time scale of the evolution of the chelicerate hemocyanins. The linearized tree was obtained on the basis of corrected protein distance data (PAM 001 matrix) (32). The divergence times were estimated as described. At the bottom of the figure, the arrow indicates the Merostomata-Arachnida split. AauHc6, Androctonus australis hemocyanin 6 (P80476); TtrHcA, Tachypleus tridentatus hemocyanin α (9); CmaHc6, Cancer magister hemocyanin subunit 6 (AAA96966); PinHcA, P. interruptus hemocyanin subunit a (P04254); PinHcB, P. interruptus hemocyanin subunit b (P10787); PinHcC, P. interruptus hemocyanin subunit c (S21221); PvuHc, Palinurus vulgaris hemocyanin (P80888); PvaHc, Penaeus vannamei hemocyanin (S55387).