Globin and globin gene structure of the nerve myoglobin of Aphrodite aculeata.

The globin of the nerve cord of the polychaete annelid Aphrodite aculeata was isolated and purified to homogeneity. The native molecule has a pI of 6.3 and acts as a dimer of two identical Mr 15,644.5 polypeptide chains as determined by electrospray mass spectrometry. It has an average affinity for oxygen (P50 = 1.24 torr) resulting from fast association (kon = 170 × 10⁶ M⁻¹·s⁻¹) and dissociation rates (koff = 360 s⁻¹). The partial primary structure of this nerve globin was determined at the protein level and completed and confirmed by translation of the cDNA sequence. The globin chain has 150 amino acid residues and a calculated Mr of 15,602.69, strongly suggesting that the amino terminus is acetylated. The absence of a leader sequence and the lack of Cys at the positions NA2 and H9, which are needed for the formation of the high Mr complexes found in extracellular annelid globins, classify the Aphrodite globin with the cellular globin species. The Aphrodite nerve globin is unlikely to represent a separate globin family, as cDNA-derived primers detect globin messenger RNA in muscle, gut, and pharynx tissue as well. The gene encoding this globin species is interrupted by a single intron, inserted at position G7.0. Comparison to other globin gene structures strongly suggests that introns can be lost independently, rather than simultaneously as a result of a single conversion event as suggested previously (Lewin, R. (1984) Science 226, 328).

Hemoglobin is widely distributed in ganglia and the nerve cord of invertebrates in many phyla. However, this "nerve globin" is by no means common and can be present or absent in closely related species (2-4). This erratic occurrence has led to the speculation that nerve globins might constitute a distinct globin subfamily, possibly with an as yet unknown function. In molluscs and annelids, it mainly occurs intracellularly at millimolar concentration in glial cells surrounding the nerve cord (5-7). Kraus and Colacino (8) clearly demonstrated that neural excitability in Tellina alternata (Bivalvia) is completely dependent on the oxygen stored on the globin in the nerves.
In molluscs and annelids, Mb-like tissue globins occur in addition to Hb-like molecules, which are found either intracellularly in circulating "coelomocytes" or extracellularly, dissolved in the hemolymph. The molecular architecture of the latter forms is highly variable, but all are built up from Mr ~17,000 globin chains (4,9).
Previously, no nerve globins have been characterized in detail. Therefore it is unclear whether they are a unique, novel globin type selectively expressed in nerve tissue or a normal "myoglobin" type also occurring in other tissues and overexpressed in nerve tissue.
In the ventral nerve cord of the marine polychaete Aphrodite aculeata, a monomeric Hb of Mr ~17,000 with a hyperbolic oxygen dissociation curve (P50 = 1.1 mm Hg) is present (2,3). These characteristics are similar to those of the nerve globin of the mollusc Aplysia and strongly resemble those of a myoglobin-type molecule.
Vertebrate and plant globin genes contain, respectively, two introns (in the B- and G-helix) and three introns (in the B-, E-, and G-helix). The intron insertion position is conserved at B12.2, E15.0, and G7.0. The three intron/four exon pattern of plants is proposed to be ancestral and all other globin gene structures would be derived mainly by intron loss (1). Several invertebrate and protozoan globin genes have been characterized and it has become clear that the intron/exon pattern is less conserved than originally expected (10,11). For example, the intron/exon pattern of the globin gene of the annelid Lumbricus terrestris is the same as in vertebrates, whereas that of some nematode globin genes is plant-like. However, at least five different insertion positions for the central (E-helix) intron are documented in nonvertebrates. Therefore the evolution of the intron pattern in the globin gene family has become a subject of debate (10-15).
Here we describe the kinetics of ligand binding of purified Aphrodite nerve globin, and the primary structure of the protein as determined by protein and cDNA sequence analysis. The structure of the gene encoding the globin polypeptide was determined as well.

MATERIALS AND METHODS
Purification of Aphrodite Nerve Globin-Live specimens of A. aculeata were collected in the North Sea or obtained from the University Biological Supply Millport, Scotland. The brilliant red ventral nerve

* This work was supported by the Belgian National Science Foundation (NFWO) (to J. V. who is a Research Director and also acknowledged for Grants 20023.94 (to J. V. and L. M.) and G.2133.94 and 9.0008.93 (to E. L. E.)). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) U46754.
Protein Sequencing-Heme extraction was performed by acid acetone precipitation. Globin samples were carboxymethylated, maleylated, and cleaved separately with trypsin and endoproteinase Asp-N (19). The resulting peptide mixtures were separated by RP-HPLC using a Vydac C4 column developed with 0.1% trifluoroacetic acid/CH3CN. Peptides were sequenced on an ABI 471-B sequencer operated as recommended by the manufacturer. The globin sequence was partially reconstructed from relevant peptides using sperm whale myoglobin as a template (20,21).
Mass Determination-Electrospray (ES) mass spectra were recorded on a VG Quattro II triple quadrupole mass spectrometer (VG Manchester, UK) equipped with a Kontron HPLC system (Kontron Instruments, Milan, Italy) consisting of a 325 pump, a HPLC autosampler 465, and a 332 HPLC detector. Tuning of the instrument was done by injecting a 20 pmol/μl solution of myoglobin. The electrospray carrier solvent used was CH3CN/H2O (50/50, v/v) containing 0.1% HCOOH. Samples were dissolved in the same solvent and injected via a Rheodyne loop injector of 100 μl. The flow rate of the carrier solvent was 40 μl. The capillary voltage was set at +3.81 kV. The source temperature was 80°C. The flow rates of the nebulizing gas and the drying gas were 200 l/h and 350 l/h, respectively. The cone voltage was 25 V. The mass spectrometer was scanned from m/z 500 to 1500. Spectra were recorded in the multichannel acquisition mode and by averaging 6 scans. The relative molecular mass (Mr) was calculated using MassLynx software.
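To illustrate how a multiply charged electrospray series relates to a single Mr, the following minimal sketch applies the standard relation m/z = (M + z·H)/z for protonated ions. It is not a description of the MassLynx algorithm; the function names, the proton mass constant, and the use of the reported Mr of 15,644.5 as input are assumptions made for illustration.

```python
PROTON_MASS = 1.00728  # mass of a proton in Da (assumed charge carrier)


def mz_of_ion(mass, charge):
    """m/z of a protein of the given mass carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def charge_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Charge z of the higher-m/z peak, given the neighbouring peak at z + 1."""
    return round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))


def mass_from_peak(mz_value, charge):
    """Neutral mass recovered from one peak once its charge is known."""
    return charge * (mz_value - PROTON_MASS)


def charges_in_scan_range(mass, mz_min=500.0, mz_max=1500.0):
    """Charge states whose m/z falls inside the scanned window."""
    return [z for z in range(1, 101) if mz_min <= mz_of_ion(mass, z) <= mz_max]


if __name__ == "__main__":
    mr = 15644.5  # Mr reported for the nerve globin chain
    visible = charges_in_scan_range(mr)
    print("charge states in the m/z 500 to 1500 window:", visible[0], "to", visible[-1])
    z = charge_from_adjacent_peaks(mz_of_ion(mr, 15), mz_of_ion(mr, 16))
    print("charge assigned to the first peak:", z)
    print("recovered mass:", round(mass_from_peak(mz_of_ion(mr, 15), z), 2))
```

Each pair of adjacent peaks gives an independent mass estimate, which is why averaging over the series can yield an uncertainty well below 1 mass unit.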
Construction of Degenerate Oligonucleotide Primers-Two primers were designed based on the obtained protein sequence data. Primer APH1 (CAYGGNGCNAARTTYATGGA), a 20-mer with 128-fold redundancy, corresponds to the sense strand predicted by the peptide fragment HGAKFME. Primer APH2 (GTRTTRTANACYTTNGTCCA), also a 20-mer with 128-fold redundancy, corresponds to the antisense strand predicted by the peptide fragment TNYVKTW (Fig. 4).
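The stated redundancy of each primer can be checked by multiplying the number of bases allowed at every position under the IUPAC nucleotide ambiguity code. The sketch below does this for APH1 and APH2; the function name and lookup table are illustrative and are not taken from the original methods.

```python
from math import prod

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC_BASES[base]) for base in primer.upper())


if __name__ == "__main__":
    primers = {
        "APH1": "CAYGGNGCNAARTTYATGGA",
        "APH2": "GTRTTRTANACYTTNGTCCA",
    }
    for name, sequence in primers.items():
        print(name, len(sequence), "nt,", degeneracy(sequence), "sequences")
```

Both primers contain two fourfold (N) and three twofold (R or Y) positions, giving 4 × 4 × 2 × 2 × 2 = 128.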
cDNA Sequencing-mRNA was isolated from different tissues with a Fast Track mRNA isolation Kit (Invitrogen). Reverse transcriptase-PCR was carried out with the Stratagene kit. First strand cDNA was synthesized using an oligo(dT) primer. A PCR reaction was then performed using the degenerate primers. The PCR was carried out for 30 cycles of 94°C for 1 min, 50°C for 1 min, and 72°C for 1 min with Taq polymerase. The amplified fragment was blunt-end cloned into pBluescript KS+, recombinants were confirmed by PCR, and the DNA sequence was obtained from double-stranded templates using T7 polymerase (Pharmacia Biotech Inc.) (22). Specific primers APH3 and APH4 were derived from the obtained DNA sequence (Fig. 5). APH3 and the oligo(dT) primer were used in a PCR reaction to obtain the 3′ end of the cDNA.
The 5′ end was obtained with a 5′ rapid amplification of cDNA ends (RACE) experiment (Life Technologies, Inc.) (23). First strand cDNA was synthesized with APH4 (specific antisense primer). A poly(C) tail was added to the 3′ end of the cDNA with terminal deoxynucleotidyl transferase. The 5′ end was then amplified using an oligo(dG) adaptor and the specific nested primer APH6 in a PCR of 30 cycles, each consisting of 1 min at 94°C, 1 min at 55°C, and 2 min at 72°C. The sequence was determined as described earlier.
Genomic DNA-Isolated gDNA was used as a template in a PCR with primers APH3 and APH8 and in a PCR with primers APH5 and APH10. Fragments were cloned and sequenced as described earlier.
Binding Kinetics-Ligand rebinding kinetics were measured after photolysis with 10-ns laser pulses at 532 nm (Quantel, France). Samples were 10 μM in protein and equilibrated under 1 atm oxygen (1.2 mM at 20°C) or 1 atm CO (1 mM at 20°C). The kinetics were detected at 436 nm, which is near the deoxy peak absorbance (Fig. 2).
For the ligand dissociation rates, the ligand replacement method was employed. For oxygen displacement by CO, samples were prepared under a mixed CO/oxygen atmosphere. For nearly equal partial pressures of the two gases, the CO form is dominant, due to the higher affinity. After photodissociation, a large fraction of exposed hemes will bind oxygen since it has the higher on-rate. A second phase on a much slower time scale is the replacement of oxygen by CO to return to the original stable CO form. With the previously determined on-rates for oxygen and CO, one can simulate the kinetics to determine the oxygen off-rate.
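The slow replacement phase can be related to the oxygen off-rate with the usual steady-state expression for competitive ligand replacement, r = koff(O2) · kon(CO)[CO] / (kon(O2)[O2] + kon(CO)[CO]). The sketch below uses this closed form in place of a full kinetic simulation, and it assumes that both ligands are in large excess over heme and that CO dissociation is negligible on this time scale. The function names, the CO on-rate, and the gas concentrations in the usage example are hypothetical values chosen only for illustration.

```python
def replacement_rate(k_off_o2, k_on_o2, conc_o2, k_on_co, conc_co):
    """Observed rate (s^-1) of the slow phase in which CO replaces bound O2.

    Rates are in M^-1 s^-1 (on) and s^-1 (off); concentrations are in M.
    """
    o2_flux = k_on_o2 * conc_o2  # pseudo-first-order O2 rebinding rate
    co_flux = k_on_co * conc_co  # pseudo-first-order CO binding rate
    return k_off_o2 * co_flux / (o2_flux + co_flux)


def o2_off_rate(observed_rate, k_on_o2, conc_o2, k_on_co, conc_co):
    """Invert the expression above to estimate the O2 off-rate."""
    o2_flux = k_on_o2 * conc_o2
    co_flux = k_on_co * conc_co
    return observed_rate * (o2_flux + co_flux) / co_flux


if __name__ == "__main__":
    # kon(O2) and koff(O2) are the values quoted in the abstract;
    # the CO on-rate and both gas concentrations are hypothetical.
    rate = replacement_rate(360.0, 170e6, 0.6e-3, 20e6, 0.5e-3)
    print("observed replacement rate (s^-1):", round(rate, 2))
    print("recovered koff(O2) (s^-1):",
          round(o2_off_rate(rate, 170e6, 0.6e-3, 20e6, 0.5e-3), 1))
```

Because oxygen rebinds much faster than CO binds, the observed replacement rate is far smaller than the true off-rate, which is why the on-rates must be known beforehand.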
For the CO dissociation rate, an aliquot of the CO sample was injected into a cuvette containing buffer with 1-5 mM potassium ferricyanide. The kinetics of the CO to Met transition were followed by measuring the absorption spectra every 3 s (initially) with an HP8453 diode array spectrophotometer.
Structural and Evolutionary Analysis of Globin Sequence-The primary structure of Aphrodite nerve globin was aligned with relevant vertebrate and 145 nonvertebrate globins by means of existing templates (20,21,24). Penalty scores were calculated manually.
Based on this alignment, phylogenetic trees were constructed using the TREECON software (25).

RESULTS AND DISCUSSION
Isolation and Characterization of Nerve Hemoglobin-The nerve globin of Aphrodite was purified by ammonium sulfate precipitation and gel permeation chromatography as described under "Materials and Methods." This results in a preparation essentially free of contaminants as judged by one-dimensional SDS-PAGE (Fig. 1).
The purified nerve globin shows absorption spectra in the oxy, deoxy, cyano-Met, and carbon monoxide forms essentially as reported previously (2). The spectrum of the deoxy derivative displays a shoulder at 550 nm, next to the maximum at 568 nm, confirming the earlier observations (2). Cytochrome b type spectral characteristics (maxima at 528 and 558 nm) as reported for the deoxy form of Spisula solidissima nerve globin (26) are absent (Fig. 2). The apparent Mr of the native protein as determined by HPLC permeation chromatography was 31,600 ± 2,000. Mr estimation by SDS-PAGE yielded 15,500 ± 400. After purification, extraction of the heme group, carboxymethylation, and maleylation, the nerve globin was subjected to RP-HPLC and a single globin chain was detected (Fig. 3A). Two-dimensional SDS-PAGE of this fraction revealed the presence of a single protein spot (Fig. 3B). Thus we conclude that the native protein is a homodimer. Similar conclusions were reached for the nerve globin of the bivalve species T. alternata and S. solidissima (3,7). Interestingly, Wittenberg et al. (2) calculated an Mr of 15,600 ± 1,000 for native Aphrodite globin from s20,w = 1.7. Concentration-dependent association/dissociation might be the cause of this discrepancy.
Determination of the Primary Structure-Amino-terminal sequencing of approximately 1 nmol of globin yielded no phenylthiohydantoin-derivative signal, suggesting that the polypeptide was inaccessible to Edman degradation. Sequence analysis of overlapping internal fragments permitted the reconstruction of most of the polypeptide sequence as well as the construction of specific primers to amplify globin cDNA (Fig. 4). Full-length cDNA was isolated as described, and the sequence is presented in Fig. 5. It encompasses the entire coding region and confirms the amino acid sequence determined by Edman degradation. The initiation codon is preceded by at least 137 bases of untranslated sequence. No secretory leader sequence is present. This provides evidence that this globin is intracellular and of the tissue or Mb type. The open reading frame extends for 150 codons and is followed by a 427-base long 3′-untranslated region ending with a polyadenylated tail. A TATA box and a polyadenylation signal are present. The Mr calculated for the deduced polypeptide is 15,602.69. Using ESMS, an Mr of 15,644.5 ± 0.48 was determined. Acetylation of the amino terminus replaces a hydrogen with an acetyl group (mass 43.03), a net increase of about 42.0, which is close to the difference of 41.81 ± 0.48 between measured and calculated Mr. Hence, we conclude that the amino terminus is likely blocked by an acetyl group. Modified amino termini are not uncommon in globins in general. N-Acetylated termini are found in nonvertebrate globins located intracellularly in erythrocytes or muscles but never in extracellular globins (Table I). The meaning of this difference is unclear.
The protein and cDNA derived sequence of the Aphrodite nerve globin is confirmed by the primary structure derived from the globin gene sequence (Fig. 5).
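The mass argument for N-terminal acetylation can be laid out as a short calculation. The sketch below compares the difference between the measured and calculated Mr with the net mass change of acetylation, taken here as C2H2O (about 42.04 using standard average atomic masses). The atomic masses and the function name are assumptions introduced for illustration and do not come from the original analysis.

```python
# Standard average atomic masses (assumed values, in Da).
CARBON, HYDROGEN, OXYGEN = 12.011, 1.008, 15.999

# Net change when an acetyl group (C2H3O) replaces one hydrogen: C2H2O.
ACETYLATION_SHIFT = 2 * CARBON + 2 * HYDROGEN + OXYGEN


def mass_difference(measured, calculated):
    """Difference between measured and sequence-derived Mr."""
    return measured - calculated


if __name__ == "__main__":
    measured_mr = 15644.5      # ESMS value reported in the text
    calculated_mr = 15602.69   # value calculated from the deduced sequence
    uncertainty = 0.48         # reported uncertainty of the ESMS value

    diff = mass_difference(measured_mr, calculated_mr)
    residual = abs(diff - ACETYLATION_SHIFT)
    print("observed difference:", round(diff, 2))
    print("expected shift for acetylation:", round(ACETYLATION_SHIFT, 2))
    print("within reported uncertainty:", residual <= uncertainty)
```

Under these assumptions the observed difference falls within the reported uncertainty of the shift expected for a single acetyl group.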

Structural and Phylogenetic Aspects of Deduced Amino Acid Sequence-The alignment of the Aphrodite myoglobin sequence with the globin fold is unambiguous: (i) by the exclusion of polar residues from 33 out of the 33 invariant nonpolar sites (27), (ii) by the alignment of Pro-C2, which determines the folding of the BC corner, (iii) by the presence of the conserved heme-linked His(F8), and (iv) by the presence of the invariant Phe(CD1) (Fig. 6).
The Aphrodite nerve globin thus aligned matches both vertebrate and nonvertebrate globin templates quite well (20,21). This is illustrated by the low penalty scores and indicates that all major determinants of the globin fold are conserved (Table II).
Comparison with 145 nonvertebrate globin sequences reveals the highest similarity (31-32%) with the polymeric globins of the Glycera group (24,28). All helical segments identified in sperm whale Mb and in the monomeric globin MII of Glycera (29) are also predicted to be present in the Aphrodite nerve globin. In addition, a 7-residue long D-helix, typical of β-type globin chains, is present. The distal heme ligand is provided by His as in most globins, including the polymeric Glycera globins. The presence of Leu at that position in Glycera globins MII and MIV is exceptional (28). Most heme contacts are conserved, although some are unusual. Lys(CD4) and Arg(F7) replace the common residues Tyr or Phe, respectively. Similar changes were also seen in the globin species present in the perienteric fluid of Ascaris, and as these positions coincide with surface crevices, it was suggested that both basic amino acids could extend their polar groups into the solvent (30). Lys(CD4) and Lys(E10) might also form salt bridges with the heme propionates. The occupation of E11 by Phe is shared only with the Calyptogena soyoae and Lucina pectinata hemoglobins, two globins having a heme environment exceptionally rich in aromatic amino acids (31-33).
Abbreviations used in the table: Phys, P. catodon; Glyc, G. dibranchiata; Tyl, T. heterochaetus; Lum, L. terrestris; Tub, T. tubifex.
An evolutionary tree representing all sequenced annelid globin chains is depicted in Fig. 7. The Urechis caupo (Echiura), Lamellibrachia sp. (Vestimentifera), and Oligobrachia sp. (Pogonophora) sequences were also included because they represent related phyla of coelomate worms. This tree clearly shows that there are two distinct clusters of globin chains. All sequences grouped in the upper cluster represent globin chains that form the giant polymeric globins (Mr > 10⁶) of the extracellular fluid. They all have Cys occupying the positions NA2 and H9, these residues being required for aggregation of the constituent chains into giant polymeric complexes (4,21,24). The lower cluster of globin chains is formed by intracellular globins: the Aphrodite nerve globin, and the globins from Glycera and Urechis, present in coelomocytes. Conceivably, a gene duplication of an ancestral globin gene initiated the separate evolution of the intra- and extracellular lines of globin evolution well before the divergence of the species mentioned.
Globin Expression in Different Tissues-Using nerve globin-specific primers, the presence of nerve globin mRNA in different tissues was analyzed by PCR. Fig. 8 clearly illustrates that globin mRNA can be detected in nerve, longitudinal muscle, gut, and pharynx tissue. The presence of the mRNA, however, does not guarantee the presence of the functional molecule. Apart from the brilliant red nerve tissue, only the pharyngeal tissue is slightly pink, and spectral analysis reveals the presence of an Hb-like molecule. Although we cannot entirely exclude the possibility that one or more isoforms might be detected by this experiment, we feel it more likely that the same nerve globin species is expressed in other tissues as well. This suggests that the nerve globin of Aphrodite should be considered a normal Mb with a novel expression site.
Functional Characteristics-The kon and koff values for O2 and CO of the Aphrodite Mb were determined (Table III). The molecule shows rapid ligand rebinding. The oxygen and CO association rates are faster than those for whale or horse Mb, but are typical of legHb. The oxygen association rates are among the highest observed, approaching the diffusion limit. The dissociation rate for oxygen is also relatively high, resulting in an average oxygen affinity: lower than that of legHb or Glycera globin, but higher than that of whale Mb. The rate of autoxidation of 0.05/h at 37°C is typical for an Hb or Mb with this oxygen affinity, indicating no stability problems. Based on its intracellular location, its abundance, and its functional characteristics, the Aphrodite nerve globin most likely functions as an oxygen store. Similar functions are found in the gastrotrich Neodasys (34).
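The link between the rate constants and the oxygen affinity can be illustrated with a short calculation: the equilibrium dissociation constant is koff/kon, and dividing it by the oxygen solubility per torr converts it to P50. The sketch below uses the kon and koff quoted in the abstract and the solubility of 1.2 mM per atm at 20°C given under "Binding Kinetics". The function name is illustrative, and the choice of solubility is an assumption that directly affects the result.

```python
TORR_PER_ATM = 760.0


def p50_torr(k_on, k_off, solubility_molar_per_atm):
    """Half-saturation pressure (torr) from on- and off-rates.

    k_on is in M^-1 s^-1, k_off in s^-1, and the solubility in M per atm.
    """
    dissociation_constant = k_off / k_on  # free O2 (M) at half saturation
    molar_per_torr = solubility_molar_per_atm / TORR_PER_ATM
    return dissociation_constant / molar_per_torr


if __name__ == "__main__":
    p50 = p50_torr(k_on=170e6, k_off=360.0, solubility_molar_per_atm=1.2e-3)
    print("estimated P50 (torr):", round(p50, 2))
```

With these inputs the estimate is about 1.3 torr, close to the reported P50 of 1.24 torr; a slightly higher assumed solubility would bring the two values together.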
Gene Structure-Using cDNA derived primers in PCR reactions, two overlapping, genomic DNA fragments were amplified covering the entire globin gene. They were cloned and partially sequenced.
Comparison of the globin gDNA with the cDNA reveals that the coding sequence is interrupted by a single 2.5-kilobase intron between bases 330 and 331 (Fig. 5). The standard splicing donor and acceptor sequences are present.
Based on the alignment of Fig. 6 it can be concluded that the intron insertion position is at the conserved position G7.0.
The structure of the Aphrodite globin gene is unusual. As mentioned before, globin genes mainly contain both the B- and G-helix introns together or neither of them. This intron configuration (B12.2 and G7.0; Fig. 9) occurs, for example, in the gene coding for the extracellular hemoglobin of the annelid Lumbricus (12). The single G-helix intron in Aphrodite strongly suggests that individual introns can be lost independently, rather than simultaneously through a single conversion event as suggested by Lewin (1). The presence of a single B-helix intron in the second domain of the Pseudoterranova globin gene supports this view (35,36).