Evolution of a Fetal Expression Pattern via cis Changes near the γ Globin Gene*

One basis for the evolution of organisms is the acquisition of new temporal and spatial domains of gene expression. Such novel expression domains could be generated either by cis sequence changes that alter the complement of trans-acting regulators binding to control elements or by changes in the expression patterns of one or more of the regulatory (trans) factors themselves. The γ globin gene is a prime example of a gene that has undergone a distinct change in temporal expression at a defined time in evolution. Approximately 35–55 million years ago, the previously embryonic γ gene acquired a fetal expression pattern. This change occurred in a simian primate ancestor after the separation of simian and prosimian primates but before the further separation of the major simian lineages; thus, the (prosimian) galago γ gene retains the ancestral embryonic expression pattern, whereas the (simian) human γ gene is fetal. This analysis of galago and human γ genes in transgenic mice demonstrates that cis changes in sequences within a 4.0-kilobase region surrounding the γ gene were responsible for the evolution of a novel fetal expression pattern in the γ globin genes of simian primates.

Reconstruction of the evolutionary history of the mammalian β globin gene cluster indicates that in the common ancestor of marsupial and placental mammals (135 MYA),¹ two β-like globin genes existed, each with a different temporal expression pattern (1). This two-gene cluster, 5′-ε-β-3′, persists in present day marsupials; the ε gene is expressed in embryonic life, whereas the β gene is active postembryonically (1,2). In early placental mammals, however, prior to the mammalian radiation (80-100 MYA), a β duplication produced two postembryonic genes (δ and β), and ε duplications produced three embryonic genes (ε, γ, and η) (3,4). Although gene duplications, gene inactivations, gene deletions, and even whole locus duplications or triplications have further modified the β globin loci of all present day eutherian mammals, a clear relationship to this ancestral five-member cluster can still be appreciated (3). Moreover, these five genes have, for the most part, retained their ancestral programs of stage specificity; the ε gene is embryonic, and the β gene is postembryonic in all extant eutherian mammals. However, some lineage-specific alterations in the temporal expression of globin genes have occurred. An important example is the γ globin gene; originally embryonic in its expression pattern, the γ gene was recently recruited to be a fetal gene in anthropoid (simian) primates (5).
All simian γ genes studied to date are expressed fetally, whereas prosimian γ genes retain the embryonic pattern characteristic of other (nonprimate) eutherian mammals (4-7). Therefore, acquisition of fetal specificity can be traced to a relatively narrow evolutionary window, after the separation of simian (catarrhine and platyrrhine) from prosimian (galagos and lemurs) primates but prior to the divergence of platyrrhine (New World Monkeys) from catarrhine (Old World Monkeys, apes, and human) primates (4,5). Two other events can be traced to this same evolutionary window: duplication of the γ gene and a burst of base substitutions that occurred both in the promoter region of the γ globin gene and in coding regions (5). Amino acid substitutions resulting from the coding region changes led to loss of 2,3-diphosphoglycerate binding ability, resulting in a fetal hemoglobin molecule that could bind oxygen with increased affinity and thus facilitate the transfer of oxygen from mother to fetus. Both the promoter and coding region base substitutions were subsequently fixed during further evolution of platyrrhine and catarrhine primates. This pattern of accelerated base substitution followed by decelerated rates of substitution in the same regions has been considered indicative of the spreading and subsequent preservation of adaptive substitutions (8-10).
Although these three molecular events (fetal recruitment, gene duplication, and the burst of promoter substitutions) cannot be temporally ordered based on current phylogenetic evidence, one possible scenario is that the duplication of the γ gene provided a redundant substrate for the accumulation of base changes that altered γ stage specificity (6,11). Once such base changes had accumulated, perhaps in the duplicate γ gene at greater distance from the LCR (Aγ or γ2), they may have been selected for and subsequently transferred to the other gene (Gγ or γ1) by gene conversion. Indeed, evidence exists for gene conversions of this polarity (11,12). Implicit in this hypothetical scenario is the assumption that cis mechanisms were responsible for the fetal recruitment of the γ gene, an assumption that has not been definitively tested. To examine this question directly, transgenic mice were generated in which expression of the galago γ (embryonic) and the human γ (fetal) genes could be compared. Analysis of the stage-specific expression of these two γ genes in the mice reveals distinctly different patterns; galago γ gene expression is embryonic and is silenced in the fetal liver, whereas human γ gene activity peaks in fetal life. These different expression patterns must therefore be directed by cis differences in the γ gene fragments.

EXPERIMENTAL PROCEDURES
DNA Fragments Used in Transgene Construction-The galago γ fragment used spans sequences 10508-14995 of GenBank entry M73981 of the Galago crassicaudatus β globin cluster. The human γ fragment corresponds to sequences 38084-42140 of the human globin cluster (GenBank HUMHBB) and includes the Aγ 3′ enhancer region (13). The galago γ gene contains several small insertions not present in the human gene, accounting for the slight size difference in these two fragments (4057 and 4487 bp for the human and galago γ gene, respectively). Human ε sequences and HS3 sequences used for both constructs correspond to HUMHBB coordinates 3267-5172 (HS3) and 17841-21241 (ε).
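The fragment sizes quoted above follow from simple arithmetic on the GenBank coordinates. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes the coordinates are 1-based with both endpoints included. Under that assumption the human γ fragment comes out at 4057 bp, as stated, whereas the galago fragment comes out at 4488 bp; the quoted 4487 bp corresponds to a plain end-minus-start difference.

```python
# Coordinates as quoted in the text; treating them as 1-based and
# inclusive is an assumption made for this illustration.
FRAGMENTS = {
    "galago gamma (M73981)": (10508, 14995),
    "human A-gamma (HUMHBB)": (38084, 42140),
    "human HS3 (HUMHBB)": (3267, 5172),
    "human epsilon (HUMHBB)": (17841, 21241),
}


def span_length(start, end, inclusive=True):
    """Return the length in bp of a coordinate span.

    With inclusive=True both endpoints are counted (end - start + 1);
    with inclusive=False the plain difference (end - start) is returned.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + (1 if inclusive else 0)


# Inclusive length of every fragment named in the text.
lengths = {name: span_length(*coords) for name, coords in FRAGMENTS.items()}

if __name__ == "__main__":
    for name, bp in lengths.items():
        print(f"{name}: {bp} bp ({bp / 1000:.1f} kb)")
```

Keeping the counting convention as an explicit parameter makes it easy to see which convention reproduces each figure given in the text.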
Transgenic Mice-Insert DNA was purified away from vector sequences prior to injection of the constructs for production of transgenic mice. Purified DNA fragments were microinjected into F2 hybrid zygotes from C57BL/6J × SJL/J parents at a concentration of 2-3 ng/μl. Injections were done by the Transgenic Animal Model Core in the University of Michigan Biomedical Core Research Facility. All procedures using mice were approved by the University of Michigan Committee on Use and Care of Animals, and all work was conducted in accord with the principles and procedures outlined in the National Institutes of Health Guidelines for the Use and Care of Experimental Animals. Four founder animals were identified for each construct and were mated to CD-1 females to acquire F1 males that could be used in breeding for all experimental time points. Timed matings were done to obtain F2 (in some cases, F3) conceptuses for S1 analysis.
DNA Analysis-DNA for polymerase chain reaction and Southern analysis was purified from tails of founders or F1 and from the heads of F2 or F3 fetal and embryonic conceptuses. Polymerase chain reaction primers consisted of 5′-AGCTGCTGCAGTCAAAGTCGAATGCAGCTG and 5′-TCCATCCATTTCTACCATTTCTTTCTCCTA and detected the boundary between the upstream ε region and HS3. For determination of copy number and transgene integrity, Southern blots were probed with a 0.4-kb HindIII/BamHI fragment corresponding to the 5′ end of the 1.9-kb HindIII HS3 fragment used in both constructs.
RNA Analysis-RNA was extracted from 10.5-day yolk sacs, or from fetal liver of 12.5-, 14.5-, and 16.5-day conceptuses (the morning on which the plug was detected was considered day 0.5). Tissues were dissected and immediately frozen in liquid nitrogen prior to processing. Isolation of RNA was accomplished using Trizol (Life Technologies, Inc.) according to the manufacturer's directions. RNA was quantitated spectrophotometrically and was analyzed on agarose gels to assess integrity. To quantitate mRNA levels, S1 nuclease protection was used according to published protocols (14). S1 nuclease probes for the detection of human and mouse mRNAs were kindly provided by Dr. Timothy Ley and have been described earlier (14,15). The galago γ S1 probe corresponded to a 435-bp XbaI/BamHI genomic fragment labeled at the BamHI site in exon 2. The protected fragment was 204 bp. Quantitation of the signals from S1 analysis was accomplished using a PhosphorImager with ImageQuant software.

RESULTS
Two related constructs (of structure HS3-ε-γ) were introduced into transgenic mice (Fig. 1A). The degree of homology of the two γ fragments used is illustrated in Fig. 1B. The native galago β-like globin cluster contains a single γ gene, whereas the human cluster contains two γ genes (5). The human Aγ (γ2) gene, including its 3′ enhancer (13), was used in these constructs. The galago γ gene contains sequences similar to the human Aγ enhancer as indicated in the matrix plot of homology shown in Fig. 1B; however, the regulatory function, if any, of this region of the galago γ gene has never been tested.
LCR sequences are necessary for the high-level expression of human transgenes in the murine background (16), but single DNaseI hypersensitive sites (HS) within the LCR can also confer this property (17-20). Because it had been demonstrated that HS3 could impart high-level, copy number-dependent expression to a human transgene, a 1.9-kb HindIII fragment spanning this region was included in both constructs (17,19). In addition, earlier data indicated that of all of the hypersensitive sites, HS3 may be uniquely able to drive γ expression in the fetal liver (20). To provide a standard against which to compare expression of the human and galago γ genes, the human ε gene (−2000 to +1780) was also included in both constructs. Earlier studies had shown that this ε fragment is expressed in the embryonic yolk sac and silenced autonomously in the fetal liver (21,22).
Transgene copy number (Table I) and integrity (not shown) were assessed in Southern blots of tail DNA. For each construct, transgenic males from four independent lines were bred to obtain embryonic and fetal tissues. Table I summarizes the copy number-corrected expression levels for ε and γ transgenes in all eight lines examined relative to total mouse α chains. S1 nuclease analysis of ε and γ expression in a representative transgenic line carrying each type of γ gene is shown in Fig. 2.
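The text does not spell out the formula behind the copy number-corrected values in Table I. One common convention, shown here purely as an assumption, divides the transgene signal by the transgene copy number and the reference signal by the number of endogenous reference genes, then expresses the ratio as a percentage. The function name, the default of four mouse α genes per diploid genome, and the input numbers are all hypothetical illustrations, not values from this study.

```python
# Assumption for this sketch: a diploid mouse cell carries four
# alpha-globin genes (two per haploid genome).
MOUSE_ALPHA_GENE_COPIES = 4


def per_copy_expression(transgene_signal, transgene_copies,
                        mouse_alpha_signal,
                        endogenous_copies=MOUSE_ALPHA_GENE_COPIES):
    """Transgene signal per transgene copy, as a percentage of the
    mouse alpha signal per endogenous alpha gene copy."""
    if transgene_copies <= 0 or endogenous_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if mouse_alpha_signal <= 0:
        raise ValueError("reference signal must be positive")
    per_transgene = transgene_signal / transgene_copies
    per_endogenous = mouse_alpha_signal / endogenous_copies
    return 100.0 * per_transgene / per_endogenous


if __name__ == "__main__":
    # Made-up PhosphorImager counts for a two-copy line.
    print(per_copy_expression(50.0, 2, 400.0))
```

Expressing both signals per gene copy is what allows lines with different transgene copy numbers to be compared on one scale.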
All eight transgenic lines expressed both the ε and γ transgenes. However, line to line variation in transgene expression level was observed, most likely due to position effects. Thus, expression was not copy number-dependent despite the fact that both constructs contained the region of HS3 recently shown to possess dominant chromatin opening function (23). Significant position effects with HS3Aγ transgenes (but missing the Aγ enhancer) have also been observed by others (24). Interestingly, all four HS3-ε-gal γ lines and three of four HS3-ε-hum γ lines exhibit an inverse relationship between copy number and expression (Table I). This pattern has been observed previously with HS2-containing constructs (25), but the significance of this phenomenon is presently unclear.
Although expression levels varied, patterns of transgene expression during development were highly reproducible for each gene as illustrated in Fig. 3, where expression at each time point is plotted relative to the 10.5-day expression level (which is taken as 100%). In both HS3-ε-hum γ and HS3-ε-gal γ lines, the human ε gene was expressed at high levels in the embryonic yolk sac (day 10.5 and 12.5) and was significantly repressed in 14.5- and 16.5-day fetal livers (Figs. 2 and 3). The yolk sac portion (10.5 and 12.5 days) of the ε expression curves in HS3-ε-hum γ lines was somewhat more variable than that seen in HS3-ε-gal γ lines. However, the well known variability in the timing of development of conceptuses even within the same litter makes it difficult to determine if these differences are significant. Nevertheless, the fetal portion (14.5 and 16.5 days) of the ε expression curves was identical in mice carrying both constructs; the human ε gene was silenced in fetal life.
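The normalization described for Fig. 3, in which each time point is expressed relative to the 10.5-day level taken as 100%, can be sketched as follows. This is a minimal illustration; the function name and the example values are hypothetical and are not data from this study.

```python
def relative_to_baseline(levels, baseline_day=10.5):
    """Express each time point as a percentage of the baseline-day level.

    `levels` maps gestational day to an expression measurement in any
    consistent unit; the baseline day itself maps to 100.0.
    """
    if baseline_day not in levels:
        raise KeyError(f"no measurement for baseline day {baseline_day}")
    base = levels[baseline_day]
    if base <= 0:
        raise ValueError("baseline expression must be positive")
    return {day: 100.0 * value / base for day, value in levels.items()}


# Made-up measurements shaped like an embryonic pattern (high early,
# low in fetal liver); not values from the paper.
example_levels = {10.5: 8.0, 12.5: 6.0, 14.5: 1.0, 16.5: 0.5}
example_curve = relative_to_baseline(example_levels)

if __name__ == "__main__":
    for day in sorted(example_curve):
        print(f"day {day}: {example_curve[day]:.2f}% of day-10.5 level")
```

Rescaling each line to its own 10.5-day level removes line to line differences in absolute expression, so that only the shape of the developmental curve is compared.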
In contrast, the two γ genes exhibited distinctly different expression patterns in the fetal liver (Figs. 2 and 3). The galago γ gene was expressed at highest levels in embryonic life and silenced along with the human ε gene by 14.5 days, mimicking the embryonic pattern characteristic of the galago (5). Interestingly, the developmental expression curves for human ε and galago γ in each line were nearly superimposed, suggesting that the two genes were coordinately silenced. In contrast, the human γ gene was not coordinately silenced with ε; rather, expression peaked in 14.5-day fetal livers and declined at 16.5 days. Although expression curves were somewhat variable in shape, considerable γ expression was still observed at 16.5 days, a pattern distinctly different from that seen for the galago γ gene.
Comparison of the ε and γ expression levels (Table I) reveals that in HS3-ε-hum γ lines, γ gene expression was greater than ε gene expression (average 1.4-fold). This pattern (ε < γ) has been seen by others when larger constructs containing the human ε and γ genes were studied in the mouse (26,27). In contrast, in HS3-ε-gal γ lines, human ε expression was greater than galago γ expression (3.3-, 6.4-, 35.7-, and 2.8-fold for the four lines).

DISCUSSION
These data indicate that the characteristic embryonic expression pattern of the galago γ gene can be recapitulated in the transgenic mouse. It has also been demonstrated that globin genes from the chicken (28) and frog (29) are expressed in the mouse background in temporal patterns similar to those expected on the basis of in vivo patterns. Together, these studies attest to the broad evolutionary conservation of cis and trans regulators of globin gene expression.
The work presented here demonstrates that human and galago γ transgenes exhibit different developmental expression patterns when linked to the same portion of the LCR and when placed in the same microenvironment (mouse fetal liver). The divergent expression patterns of the two genes must therefore be due to differences in DNA sequence (cis elements) within the 4.0-kb fragment that contains the γ gene. Thus, fetal recruitment of the simian γ gene was (at least in part, see below) a cis-mediated event. Moreover, this result confirms that cis signals for stage-specific globin gene expression must reside near the genes, not within the LCR. Earlier studies of human transgene expression in the absence of LCR sequences also support this conclusion (30).
The data also eliminate distance from the LCR as a determinant per se of the differences in stage-specific gene expression of these two genes (31,32). The physical distance between the γ gene(s) and the LCR in the intact β-like globin loci of human and galago differs significantly; the single galago γ gene is 13.5 kb from the 3′ end of the LCR (the HS1 core), whereas the human Gγ and Aγ genes are 21 and 26 kb away, respectively. In the constructs studied here, both γ genes were equidistant from HS3; however, their characteristic expression patterns were preserved. It is nevertheless possible that the increased distance of the duplicated human γ gene from the LCR may have played a permissive role in the initial evolution of a new fetal expression pattern (6,11).
The conclusion that a fetal liver trans environment that is permissive for γ expression had already evolved prior to the mammalian radiation is supported by data presented here and elsewhere (24,26,27,33). This does not imply that the mouse fetal liver environment is identical to that of the human; differences may exist in the relative balance of trans factors that would result in some distinct patterns of regulation in each species. Indeed, when the human γ gene is placed in the context of the entire β-like globin locus, it seems to be silenced at an earlier developmental time in the mouse fetal liver than in the human fetal liver (26,27). Regardless of these differences, the data presented here indicate clearly that cis differences exist between galago and human γ genes that result in the generation of distinct patterns of expression in the fetal mouse liver; the galago γ gene is silenced, whereas human γ gene expression peaks in this stage.
Interestingly, in several independent lines carrying the HS3-ε-gal γ construct, the kinetics of galago γ and human ε gene silencing after embryonic life were nearly identical. Such coordinate regulation could be a consequence of lineage restriction. That is, both genes may be expressed at high levels only in yolk sac-derived "primitive" erythrocytes and not in fetal liver-derived "definitive" erythrocytes. Whether there are actually two different stem cell lineages that contribute progeny to primitive and definitive lineages is still a matter of some debate, but recent identification of an intraembryonic source of long-term repopulating hematopoietic cells suggests that this is likely (reviewed in Ref. 34). Coordinate regulation of human ε and galago γ genes could be achieved by the presence of silencers that act on both genes in definitive cells or by the absence of primitive activators in definitive cells. Alternatively, lineage-specific changes in chromatin structure may explain the coordinate silencing of these two genes. The human γ gene but not the human ε or galago γ genes may contain elements that allow it to be expressed in the progressively heterochromatic environment of the definitive cell.
In order for the simian γ gene to complete the transition from an exclusively embryonically expressed gene (the galago γ pattern) to a primarily fetally expressed gene (the human γ pattern), a second anthropoid-specific change is required: reduction of embryonic expression levels. This could have been accomplished by cis alterations that created binding site(s) for anthropoid-specific embryonic repressor(s) or by trans changes (loss of an embryonic activator of the γ gene specifically in anthropoid primates). Both scenarios imply that the trans environment of the mouse yolk sac must differ from that of the human and other anthropoid primates. Because γ globin gene expression has only been studied in relatively few anthropoid primates, it is possible that further analysis will reveal a species in which the γ gene is expressed at high levels in both embryonic and fetal life.
The constructs described here should facilitate the identification of the specific cis sequence change(s) that mediated fetal γ expression, information that will likely reveal the molecular mechanisms responsible for acquisition of this new temporal expression domain. Several possible mechanisms exist, and a few candidate cis elements have already been identified. First, nucleotide changes could have resulted in the loss of fetal-specific repressor binding site(s) in the ancestral simian γ gene; a region near the proximal CCAAT box shows anthropoid-specific base changes that reduce the binding of a complex of putative fetal repressor proteins (35). Second, base changes could have generated simian-specific activator motif(s); anthropoid-specific changes in the −1086 region alter a YY1 binding site that appears to be important for the activation of γ in the fetal stage.² Third, the gain of a binding site for a fetal stage selector protein (SSP; Ref. 36) may have given the γ gene a competitive edge over the β gene in fetal life. In the −50 region of the human γ promoter, several anthropoid-specific nucleotides comprise a binding site for SSP; the SSP site is absent in the galago γ gene (36). Finally, fetal γ expression could have arisen via acquisition of a new interaction between the γ promoter and the LCR that is stable in the fetal stage. In this regard, it is of interest that in HS3-ε-gal γ lines, ε > γ and the two genes are coordinately silenced, but in HS3-ε-hum γ lines, γ > ε and silencing is not coordinate. Establishment of a strong LCR contact that is stable in fetal life would not only accomplish the fetal recruitment of γ but could conceivably force a delay in the expression of β via competitive mechanisms. Indeed, it has been demonstrated that the galago β gene is activated in early fetal life, whereas human β gene activation occurs at birth (5).
Identification of the exact cis sequences that mediated the different expression patterns of the γ genes observed in this study is likely to further our understanding of the molecular mechanisms that control the evolution of novel stage-specific expression domains as well as the regulation of hemoglobin switching.