The Pem homeobox gene. Androgen-dependent and -independent promoters and tissue-specific alternative RNA splicing.

The Pem gene encodes an atypical homeodomain protein, distantly related to Prd/Pax family members, that we demonstrate is regulated in a complex transcriptional and post-transcriptional manner. We show that the rat Pem genomic structure includes three 5′-untranslated (5′-UT) exons and four coding exons, three of which encode the homeodomain. Several alternatively spliced transcripts were identified, including one that skips an internal coding exon, enabling this mRNA to express a novel form of the Pem protein. Other alternatively spliced mRNAs were characterized that possess different 5′-UT regions, including a muscle-specific transcript. The different 5′-UT termini present in Pem transcripts conferred different levels of translatability in vitro. Two promoters containing multiple transcription initiation sites were identified: a distal promoter (Pd) in the first 5′-UT exon and a proximal promoter (Pp) located in the “intron” upstream of the first coding exon. The Pd was active in placenta, ovary, tumor cell lines, and to a lesser extent in skeletal muscle. In contrast, transcripts from the Pp were only detectable in testis and epididymis and were only expressed in epididymis in the presence of testosterone. To our knowledge no transcription factors have previously been identified that exhibit androgen-dependent expression in the epididymis.

Androgens are of paramount importance to spermatogenesis in the testis and sperm maturation in the epididymis. Testosterone alone maintains spermatogenesis in gonadotropin-deficient animals, including hypophysectomized rats and mutant hypogonadal mice (1-3). Evidence suggests that testosterone drives spermatogenesis by acting on Sertoli cells and peritubular cells, both of which express androgen receptors (4,5). Sertoli cells perform numerous functions critical for spermatogenesis by virtue of their intimate contact with differentiating germ cells within the seminiferous tubule. Androgens have been known to be critical for epididymal function since early in this century. They regulate the proliferation and differentiation of somatic cells in the epididymis and control the micro-environment of the maturing spermatozoa by regulating the synthesis of adhesion proteins in the epididymis, the secretion of proteins into the luminal fluid that are in contact with the spermatozoa, and the transport of ions and small organic molecules across the epididymal epithelium (6,7). Secreted proteins under androgen control in the epididymis include steroid-metabolizing enzymes, polyamine synthesis enzymes, detoxification enzymes, oxidation-reduction enzymes, hydrolases, and proteases (6-8).
The transcription factors that orchestrate androgen-dependent events in the testis and epididymis have not been identified, although the androgen receptor clearly plays a major role in such responses. Homeobox transcription factors are candidates to regulate spermatogenesis and sperm maturation, since they are known to regulate many other developmental events. The distinguishing feature of homeobox proteins is a conserved DNA-binding motif 60 amino acids in length, termed a homeodomain. The homeodomain is composed of three α-helices; sequence specificity is conferred by key residues in the third helix that direct binding to base contacts in the major groove of DNA. The best understood homeobox proteins are those encoded by the hox/hom, prd/pax, and POU gene families (9). Studies in Drosophila melanogaster, Xenopus laevis, and mice have shown that members of these classical homeobox gene families are required for discrete events during development. For example, studies in null mutant mice have demonstrated that the Pax-6 gene activates a regulatory cascade necessary for eye development (10), the Oct-2 POU homeobox gene promotes late stages of B-cell maturation (11), and Hox genes specify axial identity during embryogenesis (12).
Homeobox transcription factor genes have been shown to be expressed in the male reproductive system, but none have been shown to be androgen-regulated. Many of these homeobox genes are expressed in germ cells of the testis. For example, the POU homeobox gene sperm-1 is expressed transiently prior to meiosis in germ cells (13), and Hoxa-4 is expressed specifically in postmeiotic germ cells (14,15). Hoxb-4 is expressed by both germ cells and somatic cells in the testis, while Hoxd-4 is expressed by Leydig cells (15). Both Hoxb-4 and Hoxd-4 are expressed by other adult organs besides testis (16,17). Little is known about the expression pattern of transcription factors in the epididymis. To our knowledge the only transcription factor genes identified as expressed in the epididymis are the homeobox gene Pax-2 (18) and the ETS-like transcription factor PEA3 (19).
In a search for developmentally regulated genes, we used the subtraction hybridization technique to isolate several cDNAs corresponding to novel genes (20), including the homeobox gene, Pem (21,22). The Pem homeodomain shares modest sequence identity with prd/pax homeodomains (22, 23), but its primary amino acid sequence is sufficiently unique to warrant classification in a different subfamily. The Pem gene is expressed in a unique pattern during embryogenesis. Tissues that contribute to the extraembryonic compartment express mouse Pem; the gene remains highly expressed in the placenta and yolk sac until term (21,24). The in vivo expression pattern of Pem in endodermal tissue is mimicked in pluripotent stem cell lines that differentiate in vitro; F9 embryonal carcinoma stem cells induced to differentiate into either the visceral or endodermal lineage up-regulate Pem mRNA expression (22) and accumulate Pem protein specifically in the outer layer of cells that possess characteristics of extraembryonic endoderm (24,25). Pem gene expression is also dramatically up-regulated in normal diploid embryonic stem cells induced to differentiate in vitro (22), although it is not known which specific differentiated cell types activate Pem gene expression.
In this communication, we report that Pem gene expression is not restricted to embryogenesis. We show that in prepubertal and adult rats, the Pem gene is specifically transcribed in both male and female reproductive tissue and to a lesser extent in skeletal muscle. Transcript analyses revealed that Pem transcripts are derived from two promoters and undergo complex alternative splicing events that are regulated in a tissue-specific manner. The alternative splicing events alter both the 5′-UT and coding regions of Pem mRNA. We demonstrate androgen-dependent expression of Pem transcripts from the promoter used exclusively by male reproductive tissue. The complex and androgen-dependent pattern of Pem expression by alternative promoter usage and alternative splicing has important implications for the possible role of the Pem homeobox gene in development.

Isolation and Characterization of Rat Pem cDNA and Genomic Clones-To obtain rat Pem cDNA clones, a mouse Pem cDNA probe was used to screen 5 × 10⁵ plaques (26) from a Rat-1 fibroblast cDNA library in ZAP (kindly provided by David Pribnow). Southern blot analysis revealed that 31 independent Rat-1 cDNA clones contained inserts that strongly hybridized with the mouse Pem probe. Many of the cDNA clones were sequenced at their 5′ termini to determine the approximate site of transcriptional initiation (see "Results"). DNA sequence analysis was performed by the dideoxy method according to the manufacturer's instructions (U.S. Biochemical Corp.).
Three genomic libraries and a P1 rat genomic library were screened to obtain rat Pem genomic clones that correspond to the rat Pem cDNAs that we had isolated. We were not able to isolate any clones for a functional rat Pem gene, but instead isolated several independent copies of a rat Pem pseudogene. We therefore concluded that sequences within or near the functional rat Pem gene might interfere with proportional representation in genomic libraries, and we used the "long PCR" approach as an alternative strategy to obtain rat Pem genomic clones. PCR was performed with rat genomic DNA as a template for 35 cycles with the XL-PCR kit according to the manufacturer's instructions (Perkin-Elmer) under the following conditions for each segment: 94°C for 1 min, 52°C for 2 min, and 68°C for 7 min. Oligonucleotides (oligos) corresponding to middle (primer A) and 3′ (primer B) portions of the rat Pem cDNA were used for PCR amplification (see Fig. 1 for location of primers). A 3.5-kilobase fragment (clone 1) was amplified, subcloned, and sequenced. The exon sequences in clone 1 were identical to the sequences present in the rat Pem cDNA clones. Next, primers corresponding to 5′ (oligo C) and middle (oligo D) cDNA sequences were used for long PCR amplification. Sequence analysis of the resulting fragment (clone 2) revealed that it overlapped clone 1 by 57 nucleotides, as expected. Together, clones 1 and 2 comprised the entire rat Pem coding region and all intervening introns.
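The overlap check between two PCR clones can be illustrated with a short sketch. The following Python functions are an illustration only, not the procedure used here: they assume both clones are given on the same strand in 5′ to 3′ order, the function names are hypothetical, and the sequences in the usage note are invented rather than taken from the rat Pem gene.

```python
def overlap_length(upstream, downstream, min_overlap=1):
    """Length of the longest suffix of `upstream` that equals a prefix of `downstream`.

    Returns 0 when no overlap of at least `min_overlap` nucleotides exists.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest_possible = min(len(upstream), len(downstream))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    return 0


def merge_clones(upstream, downstream, min_overlap=1):
    """Join two overlapping clones into one contiguous sequence."""
    k = overlap_length(upstream, downstream, min_overlap)
    if k == 0:
        raise ValueError("clones do not overlap")
    # Keep the shared region once: append only the non-overlapping tail.
    return upstream.upper() + downstream.upper()[k:]
```

With the invented inputs "AAACCCGGG" (upstream clone) and "CCGGGTTT" (downstream clone), overlap_length reports a 5-nt overlap and merge_clones returns "AAACCCGGGTTT". For real clones, a 57-nt overlap would be confirmed in the same way.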
The genomic organization of rat Pem was confirmed by analysis of two other subcloned PCR products (clones 3 and 4) derived from other primer sets. Clone 3 was generated using an oligonucleotide that included the start codon ATG (oligo E) in combination with oligo B. This 4-kilobase PCR product was subcloned and shown to possess restriction sites predicted from sequence analysis of clones 1 and 2. Clone 4 was a 5′ PCR product that was generated using an intron 1 oligo (oligo F) that bound fortuitously to a region in the third exon. Sequence analysis of this 0.5-kilobase PCR product showed that it contained the sequences predicted from clone 2.
To obtain sequences upstream of exon 2, we employed the method of Siebert et al. (27). The "Rat Promoter Finder Kit" used for this upstream walking (a generous gift from Clontech) was used according to the manufacturer's instructions, except that the rTth DNA polymerase XL (from Perkin-Elmer) was used in place of the enzyme provided by this kit. The primary PCR was performed with primer AP1 and oligo F for 7 cycles under the following cycle parameters: 94°C for 5 s and 68°C for 4 min. The secondary PCR was done with primer AP2 and oligo G for 40 cycles under the following cycle parameters: 94°C for 5 s and 63°C for 4 min. Sequence analysis of the subcloned PCR product (clone 5) showed it contained an intron (IVS1) and exon sequences (exons 1 and 2) identical to known rat Pem cDNA sequences.
RNA Isolation and RNase Protection Analysis-Total RNA from tissues was prepared as described previously by either guanidinium isothiocyanate lysis and centrifugation over a cesium chloride cushion (28) or by a single-step acid guanidinium thiocyanate/phenol/chloroform extraction (29). For RNase protection analysis, we prepared [32P]UTP-labeled RNA probes with T7, T3, or SP6 RNA polymerase. The probes used for RNase protection analysis contained the following exon and intron sequences: In some experiments, a glyceraldehyde-3-phosphate dehydrogenase probe was included in the annealing reaction as a positive control. For probe synthesis, we used the in vitro transcription protocol as described (30). Probes were purified in an 8 M urea, 6% polyacrylamide denaturing gel. After exposure to film, the appropriately sized bands were excised from the gel and placed in individual Eppendorf tubes. The gel slices were mashed with an RNase-free pestle in 100 μl of diethylpyrocarbonate-treated water. To each sample, 600 μl of proteinase K-containing solution (0.3 M NaCl, 0.5% SDS, 10 mM Tris (pH 7.5), 200 μg/ml proteinase K, and 20 μg/ml tRNA) was added, vortexed, and incubated at 37°C for 15 min. After vortexing and pulse-spin, the suspended probe was filtered through a 0.45-μm filter (Acrodisc), followed by passing 200 μl more proteinase K solution through the filter to increase the recovery of probe. Each sample was then extracted with 200 μl of phenol/chloroform. One microliter was used to determine radioactive counts per minute, and the rest was ethanol-precipitated and stored at −70°C.
RNase protection analyses were performed as described (30), with minor modifications. Briefly, sample RNA or tRNA (negative control) was co-precipitated with the appropriate gel-purified [32P]UTP-labeled probes. The pellet was resuspended in 30 μl of annealing buffer (40 mM PIPES (pH 6.4), 0.4 M NaCl, 1 mM EDTA, 80% formamide) and allowed to hybridize overnight at 44°C. Unhybridized RNA was digested with RNase A and RNase T1 for 20 min at 37°C at concentrations of 25 μg/ml and 5 μg/ml, respectively, unless otherwise noted. RNases were then removed by treatment with proteinase K and extraction with phenol/chloroform/isoamyl alcohol. After ethanol precipitation, the RNA pellet was resuspended in 90% formamide loading buffer, denatured at 85°C, and electrophoresed in an 8 M urea, 6% polyacrylamide gel. A set of RNA size markers generated from the Century ladder template (Ambion) was included in all gels.
Primer Extension Analysis-Primer extension was carried out essentially as described by McKnight et al. (31) using total cellular RNA (30 μg) from rat placenta and 32P-end-labeled primers. The labeled oligo (2 ng) and the RNA mixture were ethanol-precipitated and resuspended in a total volume of 18 μl of water and incubated on ice for 5 min with intermittent vortexing, followed by the addition of 2 μl of 10× annealing buffer (3 M NaCl, 0.4 M Tricine (pH 8.0), and 1 mM EDTA), brief vortexing, and incubation at 65°C for 10 min. Following the annealing reaction, the tubes were transferred to the temperature of the extension reaction (42, 46, or 52°C). To each tube the following was added: 4 μl of 10× extension buffer (1 M Tris (pH 8.3), 120 mM MgCl2, 100 mM dithiothreitol), 0.8 μl of 25 mM dNTPs, 14 μl of double-distilled H2O, and 5 units of either avian myeloblastosis virus (for 42°C) or Moloney murine leukemia virus reverse transcriptase (for 46 and 52°C). After a 60-min extension reaction, the template RNA was degraded by incubation with 1 μl of RNase A (stock 10 mg/ml) at 37°C for 1 h. The sample was then precipitated by the addition of 132 μl of stop mixture (2.5 M NH4OAc, 10 mM EDTA) and 500 μl of ethanol. The products were separated by electrophoresis in an 8 M urea, 6% polyacrylamide gel.
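The final composition of the extension reaction follows from the listed volumes by simple dilution arithmetic. The sketch below is illustrative only. The volume of reverse transcriptase is not stated in the text, so the value of 1.2 μl (giving a 40-μl reaction) is an assumption, and the function names are hypothetical.

```python
# Volumes in microliters, taken from the protocol text.
ANNEALING_MIX_UL = 18.0      # primer + RNA resuspended in water
ANNEALING_BUFFER_UL = 2.0    # 10x annealing buffer
EXTENSION_BUFFER_UL = 4.0    # 10x extension buffer
DNTP_UL = 0.8                # 25 mM dNTP stock
WATER_UL = 14.0              # double-distilled water
ENZYME_UL = 1.2              # ASSUMPTION: enzyme volume is not given in the text


def total_volume_ul(enzyme_ul=ENZYME_UL):
    """Total extension reaction volume for a given enzyme volume."""
    return (ANNEALING_MIX_UL + ANNEALING_BUFFER_UL + EXTENSION_BUFFER_UL
            + DNTP_UL + WATER_UL + enzyme_ul)


def final_concentration(stock_conc, added_ul, total_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock_conc * added_ul / total_ul


def extension_reaction_summary(enzyme_ul=ENZYME_UL):
    """Final concentrations (mM) of the main components of the reaction."""
    total = total_volume_ul(enzyme_ul)
    return {
        "total_ul": total,
        # Components of the 10x extension buffer (stocks 1 M, 120 mM, 100 mM).
        "Tris_mM": final_concentration(1000.0, EXTENSION_BUFFER_UL, total),
        "MgCl2_mM": final_concentration(120.0, EXTENSION_BUFFER_UL, total),
        "DTT_mM": final_concentration(100.0, EXTENSION_BUFFER_UL, total),
        "dNTP_mM": final_concentration(25.0, DNTP_UL, total),
        # Salt carried over from the 10x annealing buffer (stock 3 M NaCl).
        "NaCl_mM": final_concentration(3000.0, ANNEALING_BUFFER_UL, total),
    }
```

Under the assumed 40-μl total, the extension buffer is diluted to 1× (100 mM Tris, 12 mM MgCl2, 10 mM dithiothreitol), the dNTPs to 0.5 mM, and the NaCl carried over from the annealing step to 150 mM. A different enzyme volume shifts these values slightly.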
Reverse Transcriptase-PCR (RT-PCR) and 5′-Rapid Amplification of cDNA Ends (5′-RACE)-RT-PCR was performed as described (32) using total cellular RNA (1 μg) from adult rat epididymis and skeletal muscle. 5′-RACE was performed according to the manufacturer's instructions (Life Technologies, Inc.). In brief, cDNA was generated using a primer complementary to a region within exon 4, the cDNA was dC-tailed with terminal transferase, and then PCR was performed using oligo D and an "anchor primer" complementary to the C-tail.
In Vitro Transcription and Translation-Three 147C3-based plasmids were prepared that contained precisely the same rat Pem open reading frame preceded by the 5′-UT region in A-, M-, and T-transcripts (Fig. 2). The length of the Pem 5′-UT region in the A-, M-, and T-transcripts was 93 nt, 138 nt, and 109 nt, respectively. The plasmids were linearized with HindIII, and RNA was synthesized in vitro according to the manufacturer's instructions (Promega Corp.). The RNA was translated in vitro using [35S]methionine and reticulocyte lysates in a 25-μl reaction for 1 h at 30°C according to the manufacturer's instructions (Promega), and the products were analyzed by SDS-polyacrylamide gel electrophoresis.
Animals-Untreated, sham-operated, and hypophysectomized Sprague-Dawley rats were obtained from Charles River Laboratories. Animals were housed in the Oregon Health Sciences University animal care facility and cared for according to approved protocols. Hypophysectomized animals received 5% glucose water ad libitum. Animals were killed by CO2 asphyxiation, and organs were immediately removed, homogenized, and frozen at −70°C until RNA was extracted. The effectiveness of hypophysectomy was determined by assessing testosterone levels in serum with a standard chromatographic procedure (33). For the testosterone implant experiments, hypophysectomized rats (12 days post-treatment) were anesthetized with isoflurane at one atmosphere, and the implants (silastic tubing 3 cm long filled with testosterone propionate) were placed subcutaneously along the upper back and neck in collaboration with Dr. John Resko (Oregon Health Sciences University), who has shown that these implants generate a serum concentration of 4 ng/ml testosterone. The levels of testosterone and dihydrotestosterone (DHT) were determined in contralateral epididymides (weighed, homogenized in phosphate-buffered saline, and frozen at −70°C until analysis). The sham-operated animals had >6 pg of DHT/mg of tissue, hypophysectomized animals had 1 pg of DHT/mg of tissue, and all testosterone-treated animals had 3-4 pg of DHT/mg of tissue (assayed 2-8 days after introduction of the implants). All androgen assays were done in the laboratory of Dr. David Hess at the Oregon Regional Primate Research Center (Beaverton, OR).

RESULTS
The Pem Gene-As described under "Materials and Methods," use of the long PCR method permitted isolation of several overlapping DNA fragments that corresponded to the entire rat Pem gene. Sequence analysis and comparison of these sequences with the known mouse and rat Pem cDNA sequences (Ref. 22 and see below) allowed us to deduce the genomic organization of the Pem gene (Figs. 1 and 2). The exon/intron splice junctions in the Pem gene conform to the consensus sequences for 5′ (CAGGTRAGT) and 3′ (YnNYAG) splice sites (GT and AG are the invariant dinucleotides at the termini of the intron consensus sequences). The homeodomain region of the Pem gene is interrupted by two introns, positioned precisely in the same location as in the D. melanogaster prd class homeobox gene aristaless (al) (34). The location of the second intron interrupting the Pem homeodomain (IVS5) is identical to the location of the intron in the homeodomain region of several other prd/pax class homeobox genes, including the gsc, S8, otx, unc-4, ceh-8, and ceh-10 genes, but is in a different position than the introns in most other known homeobox genes (9). This provides further support that the Pem gene is a distant relative of the Prd/Pax homeobox gene sub-family. We previously showed that the Pem homeodomain exhibits up to 35% sequence identity with prd/pax family member homeodomains (35).
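A comparison of a splice junction with these consensus sequences can be sketched as follows. This is a minimal illustration, not the analysis performed here. It assumes the standard IUPAC degenerate codes (R = A or G, Y = C or T, N = any base) and an arbitrary minimum of eight pyrimidines for the Yn tract. The function names are hypothetical and the test sequences are invented rather than taken from the rat Pem gene.

```python
import re

# IUPAC degenerate nucleotide codes that appear in the consensus strings.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


# 5' splice site: last three exon bases (CAG) plus first six intron bases
# (GTRAGT); the intron begins with the invariant GT.
DONOR = re.compile(consensus_to_regex("CAGGTRAGT"))

# 3' splice site: pyrimidine tract, then N, Y, and the invariant AG at the
# intron end. The minimum tract length of 8 is an illustrative assumption.
ACCEPTOR = re.compile("[CT]{8,}" + consensus_to_regex("NYAG") + "$")


def donor_mismatches(junction):
    """Count positions where a 9-nt junction differs from CAGGTRAGT."""
    if len(junction) != 9:
        raise ValueError("expected 3 exon bases plus 6 intron bases")
    return sum(
        re.fullmatch(IUPAC[code], base) is None
        for code, base in zip("CAGGTRAGT", junction.upper())
    )


def has_invariant_dinucleotides(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")
```

Counting mismatches, rather than requiring an exact match, reflects the fact that real splice sites usually deviate from the consensus at one or more positions while retaining the invariant GT and AG.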
In the studies described below, we demonstrate the usage of two promoter regions in the Pem gene and show that transcripts derived from these promoters undergo alternative splicing events. Fig. 2 summarizes the results of these studies. Below, we will first provide evidence for the existence of the proximal promoter (Pp) and then show its androgen-dependent regulation and developmental expression pattern in male reproductive tissue. Then we will define and analyze the distal promoter (Pd), which is primarily expressed in ovary and placenta, and to a lesser extent in male reproductive tissue and skeletal muscle. Throughout the analysis, we also make use of two cell lines that transcribe the Pem gene from the Pd: the Rat-1 immortalized fibroblast cell line and the MCA8994 rat hepatoma cell line. We previously showed that immortalized and tumor cell lines from multiple cell lineages express the Pem gene (21).
A Proximal Promoter Active in Epididymis and Testis-Pem transcripts derived from the Pp were first identified from epididymis RNA by the PCR-based approach 5′-RACE. Sequencing of subcloned 5′-RACE products revealed that the 5′ termini extended to several sites within the intron upstream of exon 3 (positions −59, −86, −94, −117, and −125 nt in Fig. 1). The different lengths of these termini could be due to multiple transcription start sites within the Pp, or they could have resulted from incomplete cDNA synthesis by reverse transcriptase.
RNase protection analysis was employed to determine whether multiple transcription start sites were present in the Pp. Fig. 3 shows the results with probe A, which contains the Pp region. Epididymis RNA protected four major fragments (bands 1-4) that correspond to transcriptional initiation sites at the following approximate positions relative to the initiator ATG: −126, −109, −75, and −68. Different ribonuclease A and T1 concentrations (over a 4-fold range) did not affect the migration of bands 1-4 (data not shown), so these multiple bands do not represent partial ribonuclease cleavage fragments, and instead correspond to multiple sites of transcriptional initiation. Use of multiple transcriptional initiation sites is typical for mammalian promoters that lack an upstream TATA box, and indeed, no TATA box is present upstream of the initiation sites in the Pp (Fig. 1). In order to determine if any other transcription start sites are used in epididymis further upstream within the intron, we also used a probe that contained all of intron 2 (probe B). No additional bands were obtained with this probe (data not shown), indicating that there are no other transcription start sites within the Pp.
The transcription initiation sites within the Pp that we defined in epididymis were also used in testis (see below) but were not active in placenta, ovary, skeletal muscle, or two immortalized cell lines that express the Pem gene, Rat-1 and MCA8994. Instead, these tissues and cell lines expressed Pd-derived transcripts, and thus they protected band 5 after annealing with probe A (Fig. 3B and data not shown). Placental RNA not only protected band 5 but also a much less abundant fragment that was slightly larger than band 1. The origin of this band is not known; it does not appear to represent transcription from the Pp region, since another probe that included the Pp region (probe C) failed to detect Pp-derived transcripts in placenta (data not shown).
FIG. 4. Pem regulation in the epididymis: androgen-dependent expression from the Pp. A, schematic diagram showing the region encompassed by probe C and the regions of the probe protected by RNA from different cellular sources. B, RNase protection analysis using probe C and total cellular RNA from MCA8994 cells (5 μg) or epididymides (20 μg) obtained from hypophysectomized and sham-treated rats as described under "Materials and Methods." The MCA8994 rat hepatoma cell line, like Rat-1 cells, serves as a positive control for Pd-derived transcripts (band 2); the origin of the minor bands above and below band 2 is not known. Band 1 encompasses ~300-350-nt fragments (derived from multiple transcription start sites in the Pp) and band 2 is ~220 nt. C, RNase protection analysis performed as in panel B except that a glyceraldehyde-3-phosphate dehydrogenase probe was also included in each annealing reaction to show the amount of RNA annealed and loaded.
The Proximal Promoter Is Androgen-dependent-Since the functional competence of the epididymis depends on the presence of androgens (6-8), we tested whether transcripts derived from the Pp depended on testosterone for expression. A probe that included the Pp region (probe C) was used for this analysis (Fig. 4A). Hypophysectomy caused a precipitous drop in the levels of Pp-derived transcripts (Fig. 4, B and C, band 1 in lanes labeled HPX), whereas animals that underwent sham treatment maintained Pp-derived transcript expression in the epididymis. Introduction of exogenous testosterone in hypophysectomized animals restored expression of these transcripts (lanes labeled HPX1+T and HPX2+T in Fig. 4C represent two different animals). Testosterone induced expression from the Pp after 2 days of treatment (Fig. 4C); expression was maintained for at least 8 days in other animals tested (data not shown). These mRNAs derived from the Pp were designated "T-transcripts" (Fig. 2), since they are inducible by testosterone.
Developmental Shift from the Distal Promoter to the Testosterone-dependent Promoter-Testis accumulated three major Pem mRNAs represented by bands 1, 2, and 3 in Fig. 5B. Band 1 corresponds to A-transcript, derived from the Pd, the predominant transcript also expressed by placenta, ovary, and muscle (Fig. 2, and see below). Band 2 represents an mRNA that also appears to be transcribed from the Pd, but its size suggests that it may have undergone an alternative splicing event between exons 1 and 2 (although other explanations are possible). Band 3 represents T-transcripts derived from the Pp.
The ratio of transcripts derived from the Pp and Pd varied at different developmental stages in the testis. At early time points after birth (days 5-33), the predominant transcripts were derived from the Pd (Fig. 5B, bands 1 and 2). T-transcripts (band 3) became more prominent by day 44 and later accumulated to levels similar to those of Pd-derived transcripts (days 78 and 104). The developmental shift observed with a Pd probe was confirmed by analyses with a Pp probe (compare the ratio of band 1 with band 2 on day 21 with day 78 (Fig. 5D)). Note that the epididymis expressed much higher levels of T-transcripts than did the testis: even young animals (23 and 30 days old) accumulated high levels of T-transcripts in epididymis (Fig. 5D).
Multiple Transcription Initiation Sites Used by the Distal Promoter-The sites of transcriptional initiation from the Pd were analyzed by three different approaches: 1) sequence analysis of the 5′ termini of cDNA clones; 2) RNase protection analysis; and 3) primer extension analysis. The Rat-1 fibroblast cell line was chosen for cDNA library screening since it expresses high levels of Pem transcripts, as do many other immortalized and tumor cell lines (21). Examination of the 5′ termini of 15 independent Rat-1 Pem cDNA clones showed a range of termini that were clustered in exon 1 (5-29 nt of exon 1 were included in these cDNA clones).
RNase protection analysis was performed with a probe complementary to this putative promoter region in exon 1 (probe E; Fig. 6A). This analysis was done with RNA from the Rat-1 cell line and placental tissue, since they both express high levels of Pem transcripts from the Pd. Fig. 6B shows that multiple bands were protected by RNA from the Rat-1 cell line and placenta. The lengths of the major bands (labeled 1-5) correspond to transcription start sites at positions between −40 and −18 relative to the exon 1/intron 1 border.
FIG. 5. Developmental regulation of Pp and Pd usage in the testis. A, schematic diagram showing the region encompassed by probe D and the regions of the probe protected by the three transcripts expressed by testis. The testis-specific transcript (middle mRNA shown) is predicted to be generated by alternative splicing of the first intron (depicted by the dotted line) based on the size of the protected band (labeled 2), but this was not determined definitively. B, RNase protection analyses using probe D and total cellular RNA from epididymis (30 μg) or testis (20 μg) derived from rats of the ages shown. Note that the autoradiographic exposure time for the epididymis lane was 10 times shorter than for testis. When exposed for an equal time, it was evident that the level of Pd-derived transcripts (band 1) is similar in epididymis and testis. Band 1 is ~300 nt, band 2 is ~270 nt, and band 3 is ~205 nt. C, schematic diagram showing the region encompassed by probe C and the regions of the probe protected by Pem transcripts in testis. D, RNase protection analysis using probe C and total cellular RNA from Rat-1 cells (5 μg), epididymis (15 μg), testis (20 and 50 μg for left and right panels, respectively), or tRNA (20 μg). Epididymides and testes were derived from animals of the ages shown. Band 1 encompasses ~300-350-nt fragments (derived from multiple transcription start sites in the Pp) and band 2 is ~220 nt.
The multiple bands did not represent partial RNase cleavage, since alteration of RNase A and T1 concentrations did not alter any of the major bands, although two new bands appeared at higher RNase concentrations (Fig. 6C). The existence of multiple transcription initiation sites at these positions was also shown with two other probes that include this putative promoter region (probes F and G; see "Materials and Methods"). Analyses with probes F and G showed that ovary and testis contained Pem mRNA with the same 5′ termini as did placenta and Rat-1 cells (data not shown).
Primer extension analysis was performed to determine definitively whether transcription initiated at several sites in exon 1, as suggested by RNase protection analysis, or whether transcription initiated from an exon further upstream than exon 1. Fig. 6D shows that primer extension analysis with oligonucleotide H yielded products that indicated transcription initiates at several sites between −40 and −22 relative to the exon 1/intron 1 border. This is in agreement with the transcription initiation sites predicted by RNase protection analysis (bands 1-4). Similar results were obtained when extension by reverse transcriptase was performed at 42, 46, or 52°C or when another oligo was used for the analysis (oligo G; data not shown). We conclude that multiple transcription initiation sites span a region of approximately 20 nt in exon 1 of the rat Pem gene. These initiation sites are used in placenta, ovary, and testis, resulting in the generation of Pem transcripts possessing 5′ termini of slightly different lengths. No TATA box is present upstream of the multiple start sites in the Pd (Fig. 1).
A 5′-UT Exon Uniquely Included in Skeletal Muscle-RNase protection analysis revealed that the Pd was not only transcriptionally active in reproductive tissue but also in skeletal muscle. We examined the 5′ termini of muscle Pem transcripts by the 5′-RACE method. Sequence analysis of the subcloned 5′-RACE products from muscle showed that the 5′ termini of these products were in the Pd in exon 1. Surprisingly, two of the six subcloned PCR products possessed a novel 45-nt sequence inserted in the 5′-UT region between exons 1 and 2. This sequence corresponded to a 45-nt sequence that we identified in genomic DNA between exons 1 and 2 that is flanked by canonical 5′ and 3′ splice sites (Figs. 1 and 2). Thus, this novel sequence is an alternatively spliced exon (termed the M exon) that is included in Pem transcripts in skeletal muscle (Fig. 2). The M exon possesses two initiator AUG codons (Fig. 1). The first AUG is followed by a termination codon five codons downstream. The second AUG is in-frame with the Pem protein reading frame and would dictate a seven-amino acid N-terminal extension to the Pem protein. However, the sequences surrounding both AUGs exhibit poor matches with the Kozak consensus sequence GCCRCCAUGG (36). They possess neither the critically important purine at position −3 nor the G at position +4.
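The two positions singled out above can be checked programmatically. The sketch below is illustrative only and is not the analysis used here. It numbers the A of the AUG as +1, so position −3 lies three nucleotides upstream and position +4 immediately follows the AUG. The "strong", "adequate", and "weak" labels are a common informal convention adopted here as an assumption, the function name is hypothetical, and the test sequences are invented rather than taken from the M exon.

```python
def kozak_context(mrna, aug_index):
    """Evaluate positions -3 and +4 around the AUG at `aug_index` (0-based).

    The A of the AUG is position +1, so -3 is three nucleotides upstream
    and +4 is the nucleotide immediately after the AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    # Positions that fall outside the sequence are reported as None.
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None

    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"

    if purine_at_minus3 and g_at_plus4:
        label = "strong"
    elif purine_at_minus3 or g_at_plus4:
        label = "adequate"
    else:
        label = "weak"

    return {"minus3": minus3, "plus4": plus4, "context": label}
```

For the full consensus GCCACCAUGG (AUG at index 6) the function reports a strong context, whereas an AUG with a pyrimidine at −3 and no G at +4, the situation described for both M exon AUGs, is classified as weak.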
To examine the regulation of M exon inclusion, a Pem probe containing the M exon (probe G) was prepared for RNase protection analysis (Fig. 7A). Fig. 7B shows that skeletal muscle RNA protected two bands (1 and 2), which represent M exon+ and M exon− transcripts, respectively. It is not known if the M exon+ transcripts are present in a specific subpopulation of cells in muscle or if they are present in all cell types in this tissue. No difference in the relative expression levels of the M+ and M− transcripts was noted in skeletal muscle from male and female animals (Fig. 7B). In contrast to skeletal muscle, we could not detect M+ transcripts in any other tissues or cell lines tested, including placenta, epididymis, testis, ovary, and Rat-1 cells (Fig. 7B and data not shown). Thus, the inclusion of the M exon by alternative splicing appears to be regulated in a tissue-specific manner.
Alternative Splicing of 5′-UT Exons in Placenta and Rat-1 Cells. Further alternatively spliced Pem mRNAs were revealed by sequence analysis of Rat-1 cDNA clones. Although most Rat-1 cDNA clones corresponded to A-transcript (Fig. 2), two of the cDNA clones were derived from alternatively spliced mRNAs, termed B- and C-transcripts (Fig. 2). B-transcript is derived by use of an alternative splice acceptor in IVS1 (Fig. 1), which results in the inclusion of 38 nt from IVS1. RNase protection analysis with a probe prepared from this variant cDNA (probe H) showed that the B-transcript was expressed at about 5-fold lower levels than the A-transcript in placenta and the Rat-1 and MCA8994 cell lines (data not shown). The C-transcript was derived by use of an alternative splice acceptor in IVS2 (Fig. 1), resulting in the inclusion of 61 nt from IVS2. RNase protection analysis showed that Rat-1 cells and placenta expressed C-transcript (Fig. 7, C and D). The B- and C-transcripts encode the same Pem protein as the more abundant A-transcript, since the inclusion of additional 5′-UT sequences in the B- and C-transcripts did not introduce an initiator AUG upstream of the initiator AUG in exon 3. The regulatory significance of the alternative splice acceptors in IVS1 and IVS2 is not known.
FIG. 6. Multiple transcriptional initiation sites in the distal promoter (Pd). A, schematic diagram of the first three exons in mature Pem mRNA, the region encompassed by the probe used for RNase protection analysis, and the approximate location of transcription initiation sites. B, RNase protection analysis of total cellular RNA (20 μg) or tRNA (20 μg) annealed with probe E (contains 49 nt of exon 1), followed by incubation with RNase A (25 μg/ml) and RNase T1 (4 μg/ml) as described under "Materials and Methods." The average lengths of bands 1-5, as determined in three independent gels, were 110, 113, 119, 123, and 131 nt, respectively. C, as in panel B except that 3 μg of placental RNA was annealed, and the concentrations of RNase used were as shown. D, primer extension analysis of total cellular placental RNA (30 μg) using primer H. The nucleotide positions of the extension products (relative to the exon 1/intron 1 border) were determined by their migration relative to sequencing ladders in the adjacent lanes. Extension reactions performed in parallel with tRNA gave no visible products at these positions.
An Exon-skipped Transcript Encodes a Novel Protein. Transcripts that skip exon 4 (ΔE4 transcripts) were revealed by sequence analysis of RT-PCR products generated from epididymis RNA. These ΔE4 transcripts originated from either the Pp or the Pd, as shown by sequence analysis of subcloned epididymal RT-PCR products generated using any of the following oligo combinations: C + B, J + I, or K + L (Fig. 1). RNase protection analysis with a probe that spanned exon 4 and the adjacent exons (probe J; Fig. 8A) showed that ΔE4 transcripts were expressed not only in epididymis but also in placenta and Rat-1 cells (Fig. 8B). In vitro translation of the ΔE4 transcript generated a smaller protein (Pem-E) than that translated from a normally spliced Pem transcript (Fig. 8C). Pem-E shares the first 26 amino acids of the amino terminus with the known Pem protein but contains 55 novel amino acids in the carboxyl terminus (Fig. 8D). Thus, Pem-E would lack the homeodomain (present in the carboxyl region of Pem) but would contain instead the most highly conserved region of the Pem protein that is present in the amino terminus (35).
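A protein that shares its amino terminus with the normal product but ends in novel residues is the outcome expected when a skipped exon has a length that is not a multiple of 3, so that the downstream exon is read in a shifted frame. The sketch below illustrates that general principle with toy sequences. The exon sequences, their lengths, and the helper `translate` are hypothetical and chosen only for illustration; the actual length of Pem exon 4 is not stated in this section.

```python
from itertools import product

# Standard genetic code, built in UCAG order (third base varies fastest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first nucleotide until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy exons (NOT Pem sequences).
exon_up = "AUGGCUGAA"          # 9 nt, starts with the initiator AUG
exon_skipped = "GGUCA"         # 5 nt: length is not a multiple of 3
exon_down = "UCCGAUUUAAGCUGA"  # read in a different frame when exon_skipped is absent

full_protein = translate(exon_up + exon_skipped + exon_down)
skipped_protein = translate(exon_up + exon_down)
```

In this toy case the two products share the residues encoded by the upstream exon and then diverge completely, mirroring the arrangement reported for Pem-E (a shared amino terminus followed by novel carboxyl-terminal residues).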
Translational Regulation by Alternative 5′-UT Regions. We compared the translatability of three alternatively spliced Pem transcripts that differed only in their 5′-UT region: 1) A-transcript; 2) M-transcript expressed exclusively in muscle; and 3) T-transcript expressed exclusively in testis and epididymis. The three different RNAs were generated in vitro and quantitated both by optical density and by visual inspection of 2-fold serial dilutions in agarose gels, and then equal amounts of the RNAs were translated in vitro using reticulocyte lysates. Multiple experiments with independent RNA samples demonstrated that T-transcript was translated less efficiently than A-transcript (4-fold lower average translation rate; Fig. 9). M-transcript was also translated less efficiently than A-transcript, although the average reduction was only 2-fold. We conclude that unique sequences present in the 5′-UT regions of these three Pem transcripts can alter the rate of translation in vitro.

DISCUSSION
In this report, we characterized the genomic structure of the Pem gene, defined two promoters used in a tissue-specific manner, and demonstrated that Pem transcripts undergo alternative RNA splicing events. We showed that an androgen-dependent promoter, Pp, is used exclusively in male reproductive tissue, while the other promoter, Pd, is expressed in female reproductive tissue and at low levels in skeletal muscle (Fig. 2). We found that several different modes of splicing regulation are exerted on Pem transcripts: 1) alternative exon inclusion; 2) alternative exon skipping; and 3) alternative splice acceptor usage (Fig. 2).
We showed that mRNAs transcribed from the Pp (T-transcripts) require androgens for expression (Fig. 4), which is likely to explain why the Pp is active in testis and epididymis and is not used detectably in placenta, muscle, or ovary. The temporal pattern of T-transcript expression during development differed in testis and epididymis. In prepubertal animals, T-transcript levels were very high in epididymis but low in testis. T-transcript levels were high as early as day 23 post partum in epididymis, while in testis T-transcripts remained barely detectable until day 44 and only reached levels similar to those of Pd-derived transcripts at later developmental times (Fig. 5). The explanation for why T-transcripts are regulated differently in testis and epididymis is not known. Since we showed that T-transcript expression in epididymis requires testosterone, it is likely that the early postnatal expression of this transcript in epididymis is due to the known presence of androgens in the lumen of the epididymis at this developmental time point (37,38). Less clear is why T-transcript levels are so low in testis. Expression of T-transcripts in the testis in vivo requires testosterone, based on experiments in EDS-treated rats,² but the available androgens in the testis may be insufficient to trigger strong expression. Androgen-binding protein, which is secreted by Sertoli cells and considered to act as an "androgen sink" in the testis and as an androgen-carrier protein to the epididymis, is known to be expressed at very high levels in rats (50-100-fold higher than in mice; Ref. 39) and thus may depress Pp transcription in rat testis. The specific androgens that are present in the testis at different developmental time points may also influence Pem expression. For example, the decline in the intratesticular levels of 5α-reductase after day 40 (40) would cause a switch in the ratio of dihydrotestosterone to testosterone and thus may influence transcription from the Pp.
The Pem gene is unusual in that it contains two promoter regions and three 5′-UT exons upstream of the coding region. As a result of alternative promoter usage and alternative splicing of these 5′-UT exons, Pem transcripts possess at least five different 5′ termini (the number of variants is even greater if one considers the multiple transcriptional initiation sites within both the Pp and the Pd). Since the use of some of these different 5′-UT sequences is regulated in a tissue- or androgen-dependent manner, it is tempting to speculate that they function in a regulatory capacity. We tested the effect of different Pem 5′-UT termini on translatability in vitro (Fig. 9) and found that Pem transcripts from the Pp (T-transcripts) were translated less efficiently than transcripts from the Pd (A-transcripts). Perhaps the male reproductive cell types that express Pp-derived transcripts down-regulate the level of Pem protein that is translated because deleterious effects would be caused by Pem protein overexpression. In skeletal muscle, a unique 5′-UT exon (the M exon) is included in Pem transcripts that is excluded in all other tissues (Fig. 7). We found that inclusion of the M exon depressed translation somewhat (Fig. 9), but since the effect was not dramatic, the M exon may regulate events other than translation. For example, it is known that 5′-UT sequences can regulate mRNA stability (45). It will be of interest to determine if the M exon plays a role in the dramatic induction of Pem transcripts in 10T1/2 mesenchymal stem cells when they commit to the muscle cell lineage (22). Secondary structure analysis by computer suggested that the different 5′ termini present in A-, M-, and T-transcripts possess different secondary structures that may be responsible for the different rates of translation.³ Tissue-specific factors may be present in vivo that differentially bind to these secondary structure regions and thereby regulate the translation rate of Pem mRNAs that possess these different 5′ termini. Although many studies have demonstrated that translation is highly regulated in germ cells (46,47), little is known about translational regulation in somatic cells of the testis and epididymis, where Pem transcripts are expressed.⁴
We identified an alternatively spliced transcript (ΔE4) that encodes a novel form of the Pem protein (Pem-E). Many transcription factors, including homeobox transcription factors, are known to be expressed as multiple isoforms as a result of tissue-specific alternative RNA splicing (41). Pem-E shares the amino terminus with the classical Pem protein but lacks the homeodomain and thus may not bind DNA. Since the amino-terminal region of Pem is, by far, the most conserved region of this protein based on comparison of the primary amino acid sequence of mouse and rat Pem (35), this region may possess functional attributes. For example, the amino-terminal region may serve as a binding interface that permits Pem and Pem-E to bind other proteins. Many homeobox proteins have been shown to use amino acids outside of the homeodomain to interact with other transcription factors, including other homeobox proteins (9,42,43). The importance of regions outside of the homeodomain for biological function is underscored by a recent study showing that a mutant Ftz protein completely lacking the homeodomain correctly regulates downstream target genes in vivo, probably because it is still able to bind to other transcription factors (44). Pem-E may act as an inhibitor protein that competes with classical Pem for interaction with another transcription factor; by virtue of its inability to bind DNA, it would exert a dominant negative effect. By analogy, the Id inhibitor protein possesses a helix-loop-helix motif and thus can dimerize with other helix-loop-helix proteins, such as myoD, but because Id lacks a DNA-binding domain it prevents these interacting helix-loop-helix proteins from activating the transcription of downstream target genes (41).
Most known examples of alternative transcriptional and post-transcriptional events in male reproductive tissue occur in the germ cells (47). For example, the c-mos, c-abl, pim-1, cytochrome c, cyclin D3, superoxide dismutase, hoxa-4, proopiomelanocortin, and SRY genes use alternative promoters in germ cells of the testis that differ from the promoters used in somatic cells (48-55). One hypothesis to explain the common usage of alternative promoters in germ cells is that it results from the changes in the chromatin structure needed to produce spermatozoa. This restructuring would not occur in somatic cells of the testis and epididymis, and thus transcriptional regulation unique to these tissues is not necessarily expected. The Pem gene is expressed by somatic cells of the testis and epididymis,⁴ and thus it will be of interest to determine how and why it is regulated in such a complex manner at both the transcriptional and post-transcriptional level.
Since the Pem gene encodes a homeodomain-containing protein, it is reasonable to suppose that the Pem protein is a transcription factor that regulates specific events during male gametogenesis. The finding that the Pem gene depends on androgen for expression in the epididymis suggests that, in turn, Pem may regulate androgen-dependent events in the epididymis. To our knowledge no transcription factors have previously been shown to be androgen-regulated in the epididymis. The homeobox transcription factor, Pax-2, is clearly regulated by a distinct mechanism, since it is expressed in the epididymis of tfm mice, which lack androgen receptors (18). Several candidate downstream genes are known to require androgens for expression in the epididymis (directly or indirectly) and thus may be regulated by Pem, including those encoding 5α-reductase, carboxypeptidase metalloprotein D/E (AEG, CRISP-1), the retinol binding protein B/C (ESPI), the glutathione peroxidase-like protein GPX, the glutamyltranspeptidase GGT, nerve growth factor, and E-cadherin (7,8). The epididymis has multiple functions, many of which depend on the presence of androgens and thus may be regulated by Pem: (i) induction of spermatozoa motility capability, (ii) spermatozoa membrane alterations that permit fertilization competence, (iii) changes in the methylation status of spermatozoa genes, and (iv) spermatozoa storage (6-8, 56). Since the Pem gene is specifically expressed in the distal corpus/proximal cauda portion of the epididymis,⁴ the site where spermatozoa gain motility capability and membrane alterations necessary for fertilization competence (57,58), it will be of interest to determine whether Pem regulates these final maturation events.