Fetal globin expression in New World monkeys.

Reverse phase chromatography of the globin chains of adult, newborn, and fetal erythrocytes from three species of New World monkeys (Cebus apella, Aotus azarae, and Callithrix jacchus) representing three of the seven platyrrhine clades showed that gamma-globin expression was fetal in these animals. The globins were identified by a combination of chemical sequencing and mass spectrometric analysis. Since gamma-globin expression is fetal in the other major simian branch, the catarrhines, but embryonic in prosimian primates and nonprimate placental mammals, the evolution of fetal recruitment can now be assigned to the period between the simian-prosimian divergence (55 million years ago) and the platyrrhine-catarrhine divergence (35 million years ago). The gamma-globin gene underwent tandem duplication during the same evolutionary epoch, in accord with a model that suggests that the downstream duplicated gamma-gene (gamma2) was free to acquire the mutations necessary for fetal recruitment. Mass spectrometric analysis of tryptic digests of the gamma-globins verified the amino acid sequences deduced from genomic sequencing. Detailed analysis of high performance liquid chromatography and matrix-assisted laser desorption/ionization mass spectrometry data showed that gamma2-globin in Cebus was expressed to a far greater extent than gamma1-globin, supporting inferences drawn from a study of the promoter sequences. A "pre-gamma"-globin was observed in C. apella and shown to be primarily the glutathionyl adduct. The other species, A. azarae and C. jacchus, also express only one gamma-globin polypeptide. This work provides biochemical evidence of an evolutionary trend in the platyrrhines to alter the duplicated gamma-globin gene locus so that only one gamma-globin polypeptide is expressed.

γ, Aγ, δ, and β. These genes are expressed at defined times during ontogeny. In humans (1), the ε-globin gene is embryonic, expressed during the first 20% of gestation, after which it is down-regulated. The γ-genes are up-regulated at this time and are expressed for the remaining 80% of prenatal development. After birth, γ-gene expression declines as the adult δ- and β-globins become the predominant forms. The mechanism of the developmental switch from one globin gene to the next is of fundamental importance for an understanding of stage-specific gene expression. There is a possible clinical benefit as well, since an ability to maintain or revive expression of the γ-globin gene can be a route to treatment of the severe clinical conditions associated with abnormal adult β-globins, such as β-thalassemia and sickle cell anemia.
An evolutionary approach has been productive in identifying important promoter control regions. Alignment of promoter regions from globin genes with similar developmental timing has revealed conserved sequences important for transcriptional control (2-4). More recently, a detailed comparison of differences near these conserved regions in genes that are expressed either embryonically or fetally has identified base changes associated with alternative developmental timing (5). Examination of globin gene promoter structure and expression in the descendants of the primate ancestor will be particularly informative, since fetal recruitment of γ-globin occurred in this lineage (Fig. 1). The galago, a prosimian primate, has a β-globin gene cluster that differs in two significant ways from the human (2). First, the galago possesses only one γ-globin gene. Second, expression of the single γ-globin gene is embryonic, rising and falling simultaneously with ε-globin expression during the first 30% of gestation. The galago δ- and β-globins are the functional globins for the remaining two-thirds of prenatal development. Thus, γ-gene expression was delayed to fetal life at some point after the divergence of the prosimian and simian lineages, about 55 million years ago.
Studies of humans and other catarrhine primates have shown that γ-globin expression is fetal in these species, but few data are available for the other major simian group, the platyrrhines (New World monkeys). A single study has reported the presence of a globin in a newborn marmoset with an electrophoretic mobility different from the adult form, although its identification as a γ-globin was based only on the amino acid analysis of one cyanogen bromide fragment (6). We have now examined globin expression during fetal life in a group of platyrrhine primates. Together with the earlier studies of catarrhine globin gene expression, these results place the emergence of fetal recruitment in the same time period, 55-35 million years ago, as the duplication of the γ-globin gene (7). In addition, evidence is presented that in the platyrrhine lineage, there is a trend toward a reduction of the number of functioning γ-genes from two to one. This reduction is associated with divergent promoter regions (8), a large scale deletion (9), and crossing over events (10).

MATERIALS AND METHODS
Blood samples were obtained from a total of 15 animals of the following species: Cebus apella (brown capped capuchin), mother 1 with fetus, mother 2 with fetus, mother 3 with a newborn and two fetuses; Callithrix jacchus (common marmoset), one mother and fetus, two other fetuses from unrelated mothers; Aotus azarae (owl monkey), one mother, together with one fetus and one newborn. The fetuses were obtained from spontaneous miscarriages occurring after one-half to two-thirds of gestational time.
Chromatography-Reverse phase HPLC was carried out essentially as described by Shelton et al. (11) using a Vydac C4 column on a Varian ISO 9001 apparatus. Quantitation of the amounts of the various globins was obtained by integration of the peak areas monitored at 214 nm. Globin polypeptides were purified for chemical analysis by pooling peaks from four or five runs and rechromatographing. The purified samples were lyophilized.
Hemoglobin A2 was isolated on DE52 cellulose as described by Huisman (12). Peak fractions were dialyzed against ammonium bicarbonate, lyophilized, and dissolved in 0.1% trifluoroacetic acid for HPLC analysis.
Amino acid sequencing was done with an Applied Biosystems gas phase sequencing apparatus. When required, the globins were cleaved with cyanogen bromide directly on the glass filter support used in the sequencer, and sequencing was performed after washing the filter.
Mass Spectrometric Analysis-Analyses by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) were performed with a PerSeptive Biosystems Voyager Elite mass spectrometer equipped with a nitrogen laser (λ = 337 nm). Data were acquired in the linear mode of time-of-flight operation with a 50-MHz transient recorder. All experiments were performed using α-cyano-4-hydroxycinnamic acid (Aldrich) as the matrix. Saturated matrix solutions were prepared in a 1:1 solution of acetonitrile/aqueous 0.1% trifluoroacetic acid. To prepare the sample, 1 μl of the globin or tryptic digest was added to 1 μl of the matrix solution and applied to a stainless-steel sample plate. The mixture was allowed to air dry before being introduced into the mass spectrometer.
Tryptic Fragment Analysis-Tryptic digestions were carried out in 400 mM Tris-Cl, pH 8.0, 6 M guanidine, 40 mM CaCl2; samples were first heated to 90°C for 15 min. Sequencing grade trypsin was then added at a protein/enzyme ratio of approximately 20:1 (w/w). The mixture was then incubated at 37°C in a water bath for 24 h. Digestion mixtures were analyzed by MALDI-MS without further fractionation or purification.
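The predicted peptide list against which such a digest is compared can be generated in silico. The following is a minimal sketch, assuming the commonly used trypsin rule (cleavage C-terminal to Lys or Arg, but not before Pro); the function name and the toy sequence are hypothetical and are not taken from the globin sequences analyzed here.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence after K or R, unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence for illustration only (not a real globin).
toy_sequence = "VHLTAEEKAAVRPGK"
toy_peptides = tryptic_digest(toy_sequence)
print(toy_peptides)
```

In this toy case the Arg-Pro bond is left intact, so only two peptides are produced. Predicted masses for each peptide would then be compared with the observed MALDI peaks.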
TCEP Reduction-Protein samples were mixed with 50 volumes of 10 mM tris[2-carboxyethyl]phosphine hydrochloride (TCEP, Pierce) and incubated at 37°C for 30 min.
Genomic Sequencing-The γ-globin region of C. jacchus was obtained by long polymerase chain reaction as described elsewhere (10).

RESULTS
The results of chromatographic separation of the globin chains are shown in Fig. 2 with the peak identifications. The amino acid sequences for C. apella α- and β-globins are known (13), as well as genomic sequences for the γ1- and γ2-globin regions (10), and this species was selected for detailed analysis and verification of the peak assignments. The C. apella γ-globin amino acid sequences are identical to the previously published Cebus albifrons sequences (8). For convenience, the major HPLC peaks are labeled A-E on the fetal C. apella sample, which contains the relevant globin chains.
α- and β-Globin-Peak A was identified as α-globin because of its constant presence in all blood samples. Moreover, the N-terminal sequence of peak A from C. apella was VHS-PAEEK, identical with the known α-globin sequence (13). Peak C is β-globin, with an N-terminal sequence of VHLTAEEK, in agreement with the published β-globin sequence (13). It was found in adult samples at an approximate 1:1 ratio with α-globin.
Peak B eluted at the same position as human "pre-β"-globin, which has been described as a mixture of post-translationally modified β-globin polypeptides (14), sometimes including an adduct with GSH. Edman degradation of C. apella peak B yielded the β-globin N-terminal sequence, and MALDI-MS found two major polypeptides with masses of 16,013 Da (the predicted β-globin molecular mass) and 16,318 Da, the mass expected for a β-globin-GSH adduct. Phosphine reduction of the pre-β-globin to remove the glutathionyl residue eliminated the 16,318 peak, which was shifted downward to be coincident with the 16,013 mass peak (data not shown). Thus, the HPLC peak B was a C. apella pre-β-globin, consisting largely of glutathionyl-β-globin.
δ-Globin-A late eluting minor peak, seen only in the adult and newborn samples, was identified as δ-globin. Hemoglobin A2 was prepared by DEAE chromatography of undenatured hemoglobin. Upon reverse phase HPLC, the late-eluting putative δ-globin was recovered, together with α-globin. In addition, Edman degradation of the putative δ-chain yielded the N-terminal sequence VHLTGEEKAAVA. Although the δ-globin sequence for C. apella is not known, inspection of the known sequences of δ-globins from New World monkeys (Fig. 3) indicates that all have alanine at position 12, which distinguishes them from the otherwise nearly identical β-chains.
γ-Globins-Peaks D and E are candidate γ-globins because of their restriction to fetal samples. N-terminal sequencing of the purified peaks D and E was unsuccessful. This was not unexpected, since the deduced amino acid sequences of the C. apella γ-globins (10) (identical to C. albifrons (8)) predict an N-terminal serine. Such serines are normally acetylated in globins (for review, see Ref. 1).
The Major γ-Globin (Peak E)-For positive identification, advantage was taken of the two methionines of the γ sequence. Cyanogen bromide cleavage was performed on the larger peak E polypeptide, still bound to the glass filter used as a support for N-terminal sequencing. Thirteen cycles of Edman degradation yielded two amino acids at each cycle as follows: GV, NA, PG, KV, VA, KS, AA, HL, GG, AS, KR, VY, LH. These are precisely those expected from the γ-globin sequences (Fig. 4).
These cyanogen bromide fragment sequences do not distinguish which of the two C. apella γ-chains was present. C. apella has two γ-globin genes (10), but γ1 and γ2 differ in only one amino acid residue (Fig. 4), position 116 (V in γ1, I in γ2). There is no convenient cleavage site in the vicinity of this residue, precluding identification by Edman sequencing. Therefore, MALDI-MS was used to determine the molecular weight of the intact globin (Fig. 5). Within experimental error (±0.1%), the measured mass [M + H]+ at m/z 15,852 was the same as the predicted value for the γ2 gene product, which is 15,805 plus a mass of 42 for N-terminal acetylation. The identification was confirmed by mass spectrometric analysis of a tryptic digest of C. apella HPLC fraction E (Fig. 5), which showed peaks corresponding to the expected [M + H]+ ions of peptides for the γ2 sequence (Table I). Nearly the entire amino acid sequence predicted by the genomic analysis (10) was verified by this mass analysis, with the exception of residues 61-83, which produce a number of tryptic fragments too small to be detected in the MALDI spectra (i.e. below mass 500, where matrix-related peaks predominate). Most importantly, the fragment containing residues 106-121 with the single amino acid difference between γ1 and γ2 at position 116 has a predicted molecular mass of 1684 for γ2 and 1670 for γ1. Only the 1684 mass peptide was found, showing that the larger γ-globin peak E was γ2-globin.
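The mass arithmetic behind this assignment can be checked in a few lines. The sketch below uses only the values quoted above; the variable and function names are illustrative, and the 0.1% tolerance is applied relative to the predicted mass, which is an assumption about how the stated experimental error is to be read.

```python
def within_tolerance(observed: float, predicted: float, rel_tol: float = 0.001) -> bool:
    """True if observed lies within rel_tol (0.1% by default) of predicted."""
    return abs(observed - predicted) <= rel_tol * predicted


# Intact chain: unmodified gamma2 mass plus N-terminal acetylation.
gamma2_unmodified = 15805
acetyl_shift = 42
predicted_intact = gamma2_unmodified + acetyl_shift
observed_intact = 15852
intact_ok = within_tolerance(observed_intact, predicted_intact)

# Diagnostic tryptic peptide: Ile (gamma2) versus Val (gamma1) at position 116.
peptide_gamma2 = 1684
peptide_gamma1 = 1670
val_to_ile_shift = peptide_gamma2 - peptide_gamma1  # one CH2 group

print(predicted_intact, intact_ok, val_to_ile_shift)
```

The intact mass alone cannot separate γ1 from γ2, because a 14 Da difference is smaller than the 0.1% window (about 16 Da at this mass); the peptide-level measurement is what resolves the two.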
The Minor γ-Globin (Peak D)-The smaller HPLC peak D was largely the GSH adduct of γ2-globin. MALDI spectra of HPLC peak D always found two species with molecular masses at 16,162 and approximately 15,840 (Fig. 6). The larger molecular ion is close to the predicted mass for glutathionyl-γ2-globin, with [M + H]+ at m/z = 16,157 Da (15,852 for γ2-globin plus 305 for glutathionyl). Phosphine reduction, which cleaves the disulfide linkage to glutathione, shifted the 16,162 mass peak to 15,842, the γ2-globin mass, verifying the assignment (Fig. 6, inset). MALDI-MS of a tryptic digest of the minor γ-globin after TCEP reduction was indistinguishable from that of HPLC peak E (data not shown). Thus, HPLC peak D is a "pre-γ2"-globin.
It is likely that the glutathionyl-globin adducts in the pre-β and pre-γ fractions formed during the relatively long period required to transport the samples from Brazil to Detroit. Normally, intraerythrocytic human β-globin has no GSH adduct (14,15), but significant amounts can form during storage (16,17).
Quantitation of γ1-globin Polypeptide-After treatment of the HPLC peak D with TCEP and reapplication to reverse phase HPLC, about 75% of the material was found to elute at the E peak position, indicating that most of the D peak was the GSH adduct of γ2-globin. The precise identity of the 25% of peak D that survives reduction is uncertain. Incomplete reduction by TCEP was ruled out by MALDI-MS of the tryptic digest of reduced peak D, in which no fragments characteristic of the glutathionyl adduct were seen. There are at least two other possibilities. Schroeder et al. (14) found that human pre-β contained a post-translationally modified β-globin of uncertain chemical structure. Thus, the non-reducible component of HPLC peak D may represent a similar modification of γ2-globin. Alternatively, the residual material may represent a low level of expression of γ1-globin, although no peptide fragment characteristic of γ1-globin was observed in the MALDI-MS of a tryptic digest of HPLC peak D. An upper limit for the amount of γ1 expression can be estimated in the following way. Integration of the HPLC peaks (Fig. 2a) showed that the minor γ-globin peak D is 15% of the total γ-globins, and, in turn, MALDI-MS of D found that at least 75% was the glutathionyl derivative of γ2-globin. Thus, γ1 can contribute no more than about 4% (25% of 15%) of the expressed γ-globin, and C. apella γ2-globin is expressed at levels that are at least 20 times greater than γ1.
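The upper-limit estimate is a two-step product and can be reproduced as follows. This is only a restatement of the arithmetic in the text under its stated assumption that all non-reducible material in peak D could be γ1-globin; the variable names are illustrative.

```python
# Fractions quoted in the text.
peak_d_fraction_of_gamma = 0.15   # peak D as a share of total gamma-globin
reducible_fraction_of_d = 0.75    # share of peak D that is glutathionyl-gamma2

# Worst case: everything in peak D that is not the GSH adduct is gamma1.
gamma1_upper_limit = peak_d_fraction_of_gamma * (1 - reducible_fraction_of_d)

# Corresponding lower bound on the gamma2:gamma1 expression ratio.
gamma2_to_gamma1_min_ratio = (1 - gamma1_upper_limit) / gamma1_upper_limit

print(f"gamma1 upper limit: {gamma1_upper_limit:.2%}")
print(f"minimum gamma2:gamma1 ratio: {gamma2_to_gamma1_min_ratio:.1f}")
```

The unrounded bound is 3.75%, which the text rounds to 4%; either figure gives a γ2:γ1 ratio above 20.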
Other Species-The chromatographic separation of globins from A. azarae and C. jacchus is shown in Fig. 2, b and c. The globins were identified by comparison with C. apella and by the timing of their appearance in fetal or adult life. MALDI-MS analysis of tryptic digests of purified γ-globin from A. azarae (data not shown) verified the amino acid sequence deduced from genomic DNA sequencing (10). Note the presence of two fetal globin HPLC peaks in A. azarae, which has only one γ-globin gene (10). In this instance, the smaller peak can only be a "pre-γ" resulting from a post-translational modification. Insufficient sample prevented direct chemical proof of this point, however.
C. jacchus, the common marmoset, has two γ-globin genes. The genomic coding sequence of the two genes and the deduced amino acid sequence are shown in Fig. 7. Both γ-genes code for identical polypeptides. There is a single base difference between the genes in exon 2, but the two codons are synonymous and the amino acid is Phe in both chains. MALDI-MS of the intact HPLC-purified γ-chain found a single polypeptide, and a tryptic digest yielded fragments consistent with the deduced amino acid sequence (data not shown). A "pre-γ"-globin is also present in C. jacchus.
Timing of Globin Expression-All of the fetuses examined were post-embryonic. No embryonic globin chains, including ε-globin, were observed with HPLC, although elution conditions were varied extensively in an attempt to detect additional polypeptides. More importantly, in all samples, the amount of adult α-chain was equivalent to the sum of the β- and γ-chains, indicating that embryonic globin was not present. It is very significant, therefore, that in all of the fetal erythrocytes examined, substantial amounts of γ-globin polypeptides were seen (Table II), demonstrating that γ-globin expression is fetal in these platyrrhine species, as it is in human and other catarrhines (6,18). Thus, the evolutionary events that resulted in γ-gene expression during fetal life predate the catarrhine-platyrrhine divergence.

FIG. 4. The γ-globin sequences of C. apella (10), and the predicted peptides obtained from cyanogen bromide cleavage. The bold I indicates the position of the single sequence difference between γ1 and γ2. This residue is V in γ1.
There were significant amounts of β-globin in all the fetal samples examined, amounting to as much as 75% of the β-type globin in A. azarae and 54% in C. apella, and no fetal hemoglobin was detected in the newborns. This contrasts with human infants, who have large amounts of γ-polypeptides in the peripheral circulation for a number of months after birth. In the only well studied non-human catarrhine, the timing of γ- and β-globin expression resembles that in humans (18). These observations suggest that the timing of the γ- to β-globin switch is later in catarrhine than in platyrrhine species.

DISCUSSION
The data presented here, combining HPLC and mass spectroscopic analysis, establish that γ-globin expression occurs in fetal life in platyrrhines, the New World monkeys. Since it is known that humans (1) and other catarrhines (6,18) exhibit fetal expression of γ-globin, the evolutionary emergence of fetal expression can now be placed earlier than the catarrhine-platyrrhine divergence (35 million years ago). Because γ-globin expression is embryonic in prosimians (2), the appearance of fetal expression occurred after the prosimian-simian divergence (55 million years ago). Thus, the sequence changes in the β-globin cluster that led to fetal recruitment of γ-globin occurred between 55 and 35 million years ago. Fitch et al. (7) have shown that the duplication of the stem simian γ-gene took place in the same time period. They suggested that the duplication of the γ-gene supplied a redundant γ-gene copy which was then free to collect the regulatory and structural sequence changes required for fetal expression and function. The finding that γ-gene duplication and fetal recruitment are closely linked in evolutionary time is in accord with this hypothesis.
TABLE I Predicted and observed tryptic peptide masses for Cebus γ-globins
The expected tryptic peptides and their masses were deduced from the DNA sequences of the Cebus γ-globins (10). All of the predicted peptides are identical for γ1 and γ2, except for the 105-120 peptide, which contains the single amino acid difference. CaE yielded only the 1685 mass fragment expected for γ2.
Fragment | Predicted peptides from γ sequence | MH+ (mass) predicted | MH+ (mass) observed in CaE

A second important point was the observation that Cebus γ2-globin was expressed at much higher levels than γ1. This supplies experimental proof for inferences drawn from genomic data. There are numerous nucleotide differences between the γ1 and the γ2 promoters of C. apella, notably an aberrant proximal CCAAT box sequence, CCAAC, in the γ1 promoter (10). The same promoter differences were found in another Cebus species, C. albifrons (8). Studies of globin expression have assigned a prominent role to the proximal CCAAT box (19,20), and it was predicted that the presence of a noncanonical CCAAT sequence (CCAAC) would markedly down-regulate the γ1 gene (5,21). The measurements of the expressed globin polypeptides presented here verify this hypothesis, since the ratio of γ2 to γ1 expression was at least 20:1 in C. apella.
The very low expression of γ1 in Cebus species is a manifestation of a striking evolutionary trend in platyrrhines toward reduction of two functional γ-genes to one (8-10, 22). The platyrrhine genera can be grouped in seven major clades (23), which can in turn be grouped into the families of Atelidae, Pitheciidae, and Cebidae (24). In the four genera of the Atelidae clade, a large deletion in γ1 renders it nonfunctional (9). The Pitheciidae family contains two clades. In one, represented by Callicebus species, the proximal CCAAT box of the γ1 promoter has the same disabling base change (CCAAt → CCAAc) noted in the Cebus γ1 promoter (10). In Pitheciini, the other major clade of this family, the base change CCaAT to CCgAT is found in the proximal CCAAT box of the γ1 promoter (10). The remaining four clades comprise the Cebidae family. In Cebus species, as we show here, only γ2-globin is expressed at observable levels. In each of the two clades represented by A. azarae and Saimiri species, only a single γ-globin gene exists, but in contrast with Atelidae, the mechanism was an unequal crossover event that has led to the formation of a single hybrid γ-gene from the original duplicated pair (10). Only C. jacchus, representing the seventh clade, has two γ-genes which are free of disabling promoter mutations (10), but code for identical polypeptides. It is unclear why significant expression from both γ-globin genes has been preserved in the catarrhines while only one γ-globin polypeptide is expressed in the New World monkeys.
The single hybrid γ-globin gene in A. azarae has the promoter and 5′-flanking regions of a γ1 gene. The fact that this hybrid gene is expressed at much higher levels than γ1 of C. apella (but less than γ2; Table II) may be related to the fact that the proximal CCAAT box of A. azarae, although noncanonical (CcAAT → CaAAT), is not the same variant as the C. apella γ1 promoter (CCAAt → CCAAc).
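The proximal CCAAT box variants described above differ from the canonical motif at different positions, which a short comparison makes explicit. The sketch below is illustrative only: the dictionary labels and the helper name are hypothetical, and the motifs are exactly those quoted in the text.

```python
CANONICAL_CCAAT = "CCAAT"

# Proximal CCAAT box variants quoted in the text (labels are illustrative).
variants = {
    "Cebus / Callicebus gamma1": "CCAAC",
    "Pitheciini gamma1": "CCGAT",
    "A. azarae hybrid gamma": "CAAAT",
}


def mismatches(motif: str, reference: str = CANONICAL_CCAAT) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, observed base) for each difference."""
    return [
        (position, ref_base, obs_base)
        for position, (ref_base, obs_base) in enumerate(zip(reference, motif), start=1)
        if ref_base != obs_base
    ]


variant_changes = {label: mismatches(motif) for label, motif in variants.items()}
for label, changes in variant_changes.items():
    print(label, changes)
```

Each variant carries a single substitution, at position 5, 3, and 2 respectively, consistent with the suggestion that different substitutions in the box need not have the same effect on expression.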
The occurrence of gene duplication and fetal recruitment in the same evolutionary epoch suggests a number of possible scenarios. 1) Fetal recruitment might have preceded duplication. 2) Gene duplication might have occurred first, producing a redundant gene for fetal recruitment, with subsequent gene conversions conferring fetal characteristics on the second γ-gene. Conversions between the γ-globin genes have been frequent (25-27), but the first explanation cannot be ruled out at the present time. The second model makes the interesting prediction that at the early stages of fetal recruitment, one of the duplicated γ-genes would have remained embryonic while the other was evolving toward fetal expression. It is possible that in some platyrrhine species, the γ1-globin gene may be expressed in embryonic life and thus represent a survival of this situation. A test of this suggestion will require analysis of hematopoietic tissues from early embryos of these species.
A method known as phylogenetic footprinting (2-4) identifies promoter sequence regions that have been conserved throughout mammalian evolution and are therefore likely to be functionally significant. Recently, "differential" phylogenetic footprinting was introduced, which pinpoints lineage-specific base changes near the conserved promoter elements. This analysis (5) suggests that at least one of the changes associated with fetal recruitment is an alteration in sequences immediately upstream from the proximal CCAAT box of the γ-gene.
The changes, which occurred in the stem anthropoid ancestor, seriously impair the binding of several trans factors (G1, G2, G3, G4) which bind avidly to prosimian γ promoter sequences and to all ε-gene promoters, i.e. to all embryonically expressed genes, but not to γ-genes that are fetally expressed. It remains to be determined whether these changes are necessary and sufficient to explain fetal expression of the γ-globin genes. Promoter sequence changes are not the only possible mechanism which might have facilitated fetal recruitment. The ε- to γ-gene distance is 6 kb in the prosimian galago, but the simian γ duplication increased the distance between ε and the downstream γ2-gene to 11 kb. It is possible that this increased distance prevents the LCR from interacting with the ε-promoter and the γ2 promoter at the same time. This has been proposed as one of the events that allowed the duplicated γ2 gene to escape the requirement for expression during the embryonic period (22).
The demonstration that γ-globin expression is fetal in the New World monkeys will allow the inclusion of the known promoter sequences (8,10,21) from these species in the sequence comparisons and phylogenetic tests that aim to define the elements associated with fetal recruitment more precisely. Another valuable comparison will be the alignment of platyrrhine γ-gene promoters with catarrhine sequences, with the aim of detecting those differences in the promoters that are associated with the delay of γ-gene inactivation into postnatal life in humans.