Complex Transcription and Splicing of Odorant Receptor Genes*

Human major histocompatibility (human leucocyte antigen (HLA)) complex-linked odorant receptor (OR) genes are among the best characterized OR genes in the human genome. In addition to their functions as odorant receptors in olfactory epithelium, they have been suggested to play a role in the fertilization process. Here, we report the first in-depth analysis of their expression and regulation within testicular tissue. Sixteen HLA-linked OR and three non-HLA-linked OR were analyzed. One OR gene (hs6M1-16, in positive transcriptional orientation) exhibited six different transcriptional start sites combined with extensive alternative splicing within the 5′-untranslated region, the coding exon, and the 3′-untranslated region. Long distance splicing, exon sharing, and premature polyadenylation

Odorant receptor (OR) 1 genes constitute the largest multigene family in vertebrates. More than 900 members, distributed in 24 clusters on nearly all chromosomes, were uncovered after sequence analysis of the human genome (1,2). However, it appears that fewer than 350 OR genes (~40%) have an intact, undisrupted open reading frame (2) coding for all seven transmembrane (TM) domains that are characteristic for these G-protein-coupled receptors (3,4). In addition, there is a high degree of OR polymorphism producing a variable functional repertoire with genes that are nonfunctional in one haplotype and functional in another (5). Despite these interindividual differences, the coding regions of vertebrate OR genes usually have a length of ~1 kb and are not interrupted by introns, although some expressed sequence tag data suggest that splicing within the predicted open reading frame may occur, producing a protein with only five transmembrane domains (6). The frequency of this type of splicing in OR genes is currently unknown. A number of 5′-untranslated exons located up to 11 kb upstream have been found by computational prediction of the genomic organization of human and mouse OR genes (7-9). Expressed sequence tag analysis has even demonstrated a 5′-untranslated exon to exist ~64 kb upstream of the coding exon (6). Support for this long distance splicing has also been provided by studies with transgenic mice revealing that genomic elements essential for transcription and axonal target recognition may be more than 50 kb upstream of the coding exon (10). Most of the 5′-untranslated exons of OR described so far have been found by shared sequence identity in a limited region upstream of the coding exon (7-9). Additional exons that do not share identity with these exons may exist further upstream. Knowledge about such 5′-untranslated exons is of special relevance, because tissue-dependent alternative usage of these exons has already been demonstrated for one OR (11).
3′-Untranslated exons of OR genes have not been described so far.
OR genes are expressed not only in the olfactory epithelium but also in numerous other organs (12-15), suggestive of additional functions. In the testes of several mammalian species including man, at least 50 OR genes are transcribed that could be involved in sperm development, sperm competition, chemotaxis, or interaction between sperm and oocyte (16-18). The involvement of OR in path finding has already been demonstrated for axon guidance of olfactory sensory neurons (OSN) (19,20). For testicularly expressed OR, an involvement in self/non-self discrimination has been suggested that could have developed to favor fertilization of female gametes by genetically different male germ cells (21).
However, next to nothing is known about the transcriptional control of OR genes. Only a single OR is expressed by a given OSN in a monoallelic fashion (22), probably to avoid the necessity of signal integration or scoring. It is enigmatic how this monoallelic expression mode is achieved for hundreds of OR genes present in a single OSN and whether it is implemented also in nonolfactory tissues. Interestingly, the analysis of promoter regions of the odorant transduction pathway components and other marker proteins of mature OSN has so far failed to reveal a general understanding of transcriptional control mechanisms. However, a consensus sequence, the Olf-1 site, that binds a transcription factor (early B-cell factors) expressed solely in OSN and early B-cells (23) has been identified. The importance of this site for OR expression has, however, been questioned, because mice with an early B-cell factors null mutation displayed a profound B-cell deficit but exhibited a morphologically normal olfactory epithelium, an expressed olfactory marker protein, and the OR-specific G-protein G(olf) (24). Recently, two additional transcription factors (O/E-2 and O/E-3) were identified that interact with the same DNA-binding site. They were found to be transcribed only in OSN, but their relevance for OR expression awaits confirmation (25). Experimental evidence is also lacking for OLF-1 sites within the putative promoter region of some OR genes belonging to the chromosome 17p13.3 cluster (7). Likewise, the comparison of paralogous OR genes in man and mouse (6, 26-29) has not yet provided any evidence for common regulatory features shared among OR genes from different species.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™
Our study addresses these problems by carrying out a detailed analysis of the transcriptional status of HLA-linked OR genes. We have employed testicular tissue for these studies to explore the function of these receptors in reproduction. We describe here that testicular OR gene transcripts are generated by a highly unorthodox combination of complex co- and post-transcriptional events, including long distance and intracoding exon splicing, exon sharing, and premature polyadenylation.

EXPERIMENTAL PROCEDURES
Nomenclature for OR Genes-The Human Nomenclature Committee has not yet agreed on a common nomenclature for OR genes. 2 Therefore, we have used the original nomenclature under which these OR genes were first published (5,30).
Rapid Amplification of cDNA Ends (RACE)-Gene-specific primers were located directly downstream of the start codon or in the third transmembrane domain for the 5′ rapid amplification of cDNA ends (RACE). Initially, a "pool" PCR was conducted containing the anchor primer 1 (AP1, 10 pmol), the gene-specific primers of all analyzed OR (gene-specific primer 1, 10 pmol each; Table I), 0.5 ng of anchored cDNA (Marathon-Ready™ cDNA; Clontech), 2.5 units of Taq (Takara), and dATP, dCTP, dGTP, and dTTP (0.2 mM each) in a 50-µl reaction. The following parameters were used: 94°C for 30 s, 63°C for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. Subsequently, nested gene-specific PCRs were conducted using 0.5 µl of PCR product from the pool PCR, the respective nested gene-specific primer (gene-specific primer 2; Table II), and the nested anchor primer AP2 with the following parameters: 94°C for 30 s, X°C (Table II, Temperature column) for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. This nested PCR strategy eliminated much of the nonspecific background. In experiments with cDNA from freshly prepared testicular RNA (see below), the anchor primers AP1 and AP2 were replaced by SMART-TAG-1 and SMART-TAG-2, respectively (Tables III and IV).
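The cycling conditions above can be written down as a small data structure, which makes it easy to compare the pool and nested programs and to estimate run time. The following is only an illustrative sketch: the names `Step`, `program_seconds`, and `nested_cycle` are hypothetical, ramp times between temperatures are ignored, and the 60°C annealing temperature in the nested example is a placeholder for the gene-specific "X°C" values of Table II, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermocycler program."""
    temp_c: float
    seconds: int


def program_seconds(cycle, n_cycles, final_extension):
    """Total programmed hold time in seconds (ramping is ignored)."""
    return n_cycles * sum(step.seconds for step in cycle) + final_extension.seconds


def nested_cycle(annealing_temp_c):
    """Nested PCR cycle; the annealing temperature is gene specific."""
    return [Step(94, 30), Step(annealing_temp_c, 30), Step(72, 40)]


# Pool PCR for 5' RACE as stated in the text: 94/63/72 degrees C, 35 cycles.
POOL_CYCLE = [Step(94, 30), Step(63, 30), Step(72, 40)]
FINAL_EXTENSION = Step(72, 240)  # 72 degrees C for 4 min
N_CYCLES = 35

pool_total_s = program_seconds(POOL_CYCLE, N_CYCLES, FINAL_EXTENSION)
# 60 degrees C is a placeholder assumption, not a value from Table II.
nested_total_s = program_seconds(nested_cycle(60), N_CYCLES, FINAL_EXTENSION)

print(f"pool PCR hold time: {pool_total_s / 60:.1f} min")
print(f"nested PCR hold time: {nested_total_s / 60:.1f} min")
```

Because only the annealing temperature differs between the pool and nested programs, both have the same programmed hold time of a little over one hour.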
To analyze coding region and 3′-UTR, primers located directly upstream of the start codon or in the last third of the coding region were used (Tables III-V). A first pool PCR was conducted containing AP1 or SMART-TAG-1 (10 pmol) and the gene-specific primers (gene-specific primer 1; Table III; 10 pmol each). The following parameters were used: 94°C for 30 s, 58°C for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. Subsequently, nested gene-specific PCRs were carried out using 0.5 µl of PCR product from the pool PCR, the respective nested gene-specific primer (gene-specific primer 2; Table IV), and the nested anchor primer AP2 or SMART-TAG-2 with the following parameters: 94°C for 30 s, X°C (Table IV, Temperature column) for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. For PCRs with primers located in the last third of the coding region, the following parameters were used: 94°C for 30 s, X°C (Table V, Temperature column) for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. Using the RACE results, gene-specific primers located in the 5′- and 3′-UTR were designed to facilitate the amplification of full-length cDNA (Table VI). PCR was conducted with the following parameters: 94°C for 30 s, X°C (Table VI, Temperature column) for 30 s, and 72°C for 40 s for 35 cycles and finally 72°C for 4 min. PCR products were analyzed by gel electrophoresis (10% acrylamide) and visualized with UV light after staining with ethidium bromide.
To verify the identity of the RACE products, Southern analysis was conducted with probes specific for the OR genes of interest. The probes used to confirm the 5′ RACE products covered the first 100-300 bp of the coding region and 100-200 bp of the 5′-UTR.
Isolation of Total RNA and cDNA Synthesis-To preserve the RNA, human testis obtained from the Urologische Klinik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, was immersed in RNALater (Ambion) directly after surgery. Total RNA was purified by RNAgents Total RNA Isolation System (Promega) according to the manufacturer's recommendations. RNA quality was assessed by gel electrophoresis. First-strand cDNA was synthesized from 5 µg of total RNA with an oligo(dT) primer (Oligo-dT-SMART) using SuperScript™ II reverse transcriptase (Invitrogen) according to the manufacturer's recommendations. cDNA quality was assessed by the amplification of housekeeping genes (Table VII; primer and conditions were from Ref. 31).
Cloning and Sequencing-RACE reactions containing OR-specific products were purified with CHROMA SPIN-TE 100 columns and cloned into pCRII using the TOPO TA cloning kit (Invitrogen). Before sequencing, recombinant clones were identified by blue/white screening and confirmed by PCR with the respective RACE primers. Inserts of positive clones were amplified with vector-specific primers (M13APfor and M13Aprev;
Promoter Search-An analysis of putative promoter regions was undertaken using the TRANSFAC database in conjunction with the MatInspector program (32), and an in silico promoter search was performed using two programs, Promoter Inspector (33) and Eponine (34).
Functional Promoter Assay-The region of interest was cloned into the pGL3 basic vector (Promega) in front of the firefly luciferase reporter gene using restriction sites BglII and XhoI. All of the resulting constructs were verified by sequencing, and plasmid DNA was purified prior to transfection. Human embryonic kidney cells (HEK293) (35) and Odora cells (36) were cultured in Dulbecco's modified Eagle's medium (Invitrogen) with 10% fetal calf serum, 100 units/ml penicillin, and 100 µg/ml streptomycin. 2 × 10^5 cells grown to 40-80% confluence were transfected using the manufacturer's protocol for SuperFect reagent (Qiagen). Luciferase activity was determined on 100 µl of the cells in medium with 100 µl of Bright-Glo reagent (Promega). After a 2-min incubation to allow complete cell lysis, the samples were placed in a luminometer (Berthold Sirius), and readings were taken. Activity values were normalized to the average activity of the pGL3 control vector (which contains SV40 promoter and enhancer sequences) after subtracting the background luminescence activity (medium without cells plus Bright-Glo reagent).
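The normalization described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the authors' actual calculation: it assumes that the background reading is subtracted from both the sample and the control readings before the ratio is formed, and the function name `relative_luminescence` and all numeric readings are hypothetical.

```python
from statistics import mean


def relative_luminescence(sample, control_readings, background):
    """Percent activity relative to the mean background-corrected control.

    Assumption: background (medium without cells plus reagent) is subtracted
    from the sample and from the averaged control vector readings.
    """
    control = mean(control_readings) - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (sample - background) / control


# Hypothetical luminometer readings in arbitrary units.
background = 100
control_vector = [1100, 1100]  # pGL3 control vector, two experiments
construct = 350                # a test construct

print(relative_luminescence(construct, control_vector, background))
```

With these made-up readings the construct reaches 25% of the control vector activity. A reading at or below background gives a value at or below zero.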

RESULTS
Genomic Organization of HLA-linked OR Genes-The 5′- and 3′-UTRs of 16 HLA-linked and three unlinked OR genes were analyzed by RACE using an anchored testis cDNA library (Marathon-Ready™ cDNA, Clontech). To improve specificity, a second nested amplification was conducted prior to cloning and sequencing.
Analysis of 5′-Untranslated Regions-Initially, gene-specific 5′ RACE primers were located directly downstream of the start codon. This proximal position was chosen to facilitate access to the very 5′ end of the transcripts. Because some G-protein-coupled receptors have been shown to be functional without the first two TM domains (37), additional gene-specific primers located in the third TM domain were designed to also take account of a transcriptional start site further downstream. After amplification of the 5′-UTR, the RACE reactions were examined by Southern analysis with OR-specific probes (Fig. 1), before cloning and sequencing of individual amplicons. OR-specific sequences were analyzed with the BLAST algorithm (38), revealing the genomic organization of the respective OR genes. Subsequently, splice donor and acceptor sites were verified. Despite two nested RACE reactions, a high background of false positive clones was obtained, especially for weakly expressed genes. The precise data are summarized in www.charite.de/immungenetik/ORexpression/exon-table.xls.
5′ RACE was successful for almost all of the analyzed HLA-linked OR. No positives were detected for hs6M1-4P, -13P, -14P, and -19P, which are, at least in some haplotypes, likely to be pseudogenes. 5′-UTRs of varying sizes but without additional exons were found for hs6M1-1, -3, -6, -10, -15, -17, and -20. In contrast, additional 5′ exons were uncovered for hs6M1-12, -16, -18, -21, and -27. 3 To compare HLA-linked and unlinked OR genes, 5′ RACE was also conducted for three OR genes located on other chromosomes. hs19M1-4 was not expressed in testis, hs17M1-20 was expressed but revealed no 5′ exon, and hs7M1-2 was expressed and showed two 5′ exons.
For hs6M1-12, the most centromeric gene of the HLA-linked OR cluster, one additional 5′ exon (A) was identified ~1.5 kb upstream of the start codon in three unrelated clones, and no additional 5′ exon was found in four unrelated clones. 3
hs6M1-21, -27, -18, and -16 genes are located in a genomic region of about 110 kb (Fig. 2) and show a very unusual genomic organization. hs6M1-21, the most telomeric gene, exhibited several 5′-untranslated exons and shared some of these exons (exon L, exon M, and exon S; Fig. 2) 3 with hs6M1-27 and hs6M1-18. The first exon of hs6M1-21, observed in all three splice variants, was situated more than 100 kb upstream of the start codon. Variant 1 comprised only exon L, whereas variant 2 showed in addition to exon L another exon (M) ~98 kb upstream of the initiator ATG codon. Variant 3 was found to contain exon S, situated ~50 kb upstream of the start codon, and exon L (Fig. 2). 3 The coding region of hs6M1-27 was located ~20 kb centromeric to the coding region of hs6M1-21. For this gene, two alternatively spliced transcripts were found: variant 1 contained three 5′-untranslated exons: exon L (~80 kb upstream of the hs6M1-27 start codon), exon M, and exon S. All three exons were also identified for hs6M1-21. The second variant of hs6M1-27 contained only exon N (Fig. 2), 3 which resides ~75 kb upstream of the start codon and was not found in any other OR transcript. The coding region of hs6M1-18 (Fig. 2) resides ~75 kb centromeric of hs6M1-21, in the same transcriptional orientation as hs6M1-21 and -27. Three alternatively spliced variants of this gene were found. The first exhibited four 5′-untranslated exons, again starting with exon L, which in this case was ~30 kb upstream of the coding region. The other three exons (exons O, P, and Q) were situated within the first 4 kb upstream of the start codon. The second variant contained only exon L, and the third variant contained only exon R, 600 bp distal to the start codon.
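The approximate distances quoted above for the shared exon L can be cross-checked with simple arithmetic. The sketch below is a rough consistency check only. It assumes that the three genes and exon L lie colinearly on the chromosome, that all distances are measured between the same reference points, and that the rounded kb values from the text can be added directly; the variable names are hypothetical.

```python
# Approximate values taken from the text, in kb.
# Distance of each coding region centromeric of the hs6M1-21 coding region.
offset_from_hs6m1_21 = {"hs6M1-27": 20, "hs6M1-18": 75}
# Distance of exon L upstream of each gene's start codon.
exon_l_upstream = {"hs6M1-27": 80, "hs6M1-18": 30}

# Implied distance of exon L upstream of the hs6M1-21 start codon,
# calculated independently from each of the two other genes.
implied_exon_l_from_21 = {
    gene: offset_from_hs6m1_21[gene] + exon_l_upstream[gene]
    for gene in offset_from_hs6m1_21
}

for gene, distance in implied_exon_l_from_21.items():
    print(f"via {gene}: exon L ~{distance} kb upstream of hs6M1-21")
```

Both routes place exon L roughly 100 to 105 kb upstream of hs6M1-21, which agrees with the statement that the first exon of hs6M1-21 lies more than 100 kb upstream of its start codon, given the rounding of the quoted values.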
Within the introns of hs6M1-18, -21, and -27, the three OR genes hs6M1-17, -19P, and -20 were located in opposite transcriptional orientation (Fig. 2). RACE products were found for hs6M1-17 and -20, but they contained no 5Ј-untranslated exons.
hs6M1-16, ~35 kb centromeric of hs6M1-18, shares the transcriptional orientation with hs6M1-17 and -20. Fourteen alternatively spliced hs6M1-16 variants were found. They contained different combinations of up to four untranslated 5′ exons from a total of seven (Figs. 2 and 3). Interestingly, the most telomeric hs6M1-16 exon (B) was only 81 bp upstream of the most centromeric exon (L) of hs6M1-21, -27, and -18 (Fig. 2). Further variation in the splicing of hs6M1-16 appeared to have been generated by using a long or short version of exon D. 3 For each variant, all of the splice sites, including the two forms of exon D, were confirmed to be consensus splice donors and acceptors.
FIG. 2. Diagram showing the genomic organization of hs6M1-21, -27, -20, -18, -17, and -16. hs6M1-19P was not found to be expressed.
Premature Polyadenylation-To analyze the coding region and the 3′-UTR of the HLA-linked OR genes, RACE was conducted with gene-specific primers located directly upstream of the start codon. Only OR genes that could be successfully amplified in the previous 5′ RACE experiments were considered (Table III). After pool and nested PCR, the products were cloned and sequenced. Using testis cDNA, specific products could be obtained for only three of the 12 analyzed genes (hs6M1-16, -18, and -21). Additionally, all of the RACE products with the exception of one transcript from hs6M1-16 and -18 showed premature polyadenylation. To exclude the possibility that these findings were artifacts of the anchored cDNA library (Marathon; Clontech), mRNA was also freshly isolated from human testis and reverse transcribed using a tagged oligo(dT) primer.
After pool and nested PCR with this cDNA and the respective tag primers (SMART-TAG-1 and -2), products were obtained for hs6M1-16 and -21 with premature polyadenylation at the same positions as with the anchored cDNA library. However, no transcripts were observed for hs6M1-18 with the SMART-tagged cDNA. For hs6M1-21, eight different RACE products from the Marathon library and three from the second cDNA source were found. They exhibited premature polyadenylation within the coding region at three defined positions (Fig. 4), although no polyadenylation consensus signals could be detected. Interestingly, the hs6M1-16 transcript with the most downstream premature polyadenylation site (position 596) had lost 234 bp of the coding region, including the premature polyadenylation sites of the other analyzed transcripts and the genuine start codon. Because splice donor and acceptor sites were found at the edges of the missing material, this clone is best explained by an additional splicing event. Additionally, one transcript from hs6M1-16 (Fig. 5, variant 11) was found without premature polyadenylation, but again, the other premature polyadenylation sites were lost because of a splicing event removing the first 221 bp of the coding region.
Analysis of the 3′-Untranslated Regions-Further oligonucleotides located in the last third of the coding region were designed to analyze the 3′-UTR of hs6M1-21, -27, -18, and -16. Three alternatively spliced variants were observed for hs6M1-16. The first transcript (variant 8) contained exon K, the second (variant 9) exon J, and the third (variant 10) exon I as well as exon J (Fig. 5). 3 These exons were located within 1.6 kb of the stop codon. A 3′ unspliced variant (11) was also found. Only the most downstream exon (K) contained a typical polyadenylation signal (40) (Fig. 5).
Two unspliced 3′ variants differing only in length were identified for hs6M1-27 (variants 3 and 4; Fig. 6). 3 Variant 4 contained the polyadenylation consensus signal 24 bp upstream of the poly(A) tail. The 3′-UTR of the only hs6M1-18 3′ variant was 845 bp in size and unspliced, 3 and a polyadenylation signal was found 17 bp upstream of the poly(A) tail. However, no specific 3′ RACE products from hs6M1-21 could be obtained.
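A search for the polyadenylation consensus signal upstream of a poly(A) tail, as reported above, can be sketched as a simple string scan. The example is illustrative only: the function name `polya_signal_distances` is hypothetical, the input is assumed to be the cDNA sequence ending immediately before the poly(A) tail, and the distance is counted here from the last base of the hexamer to the end of the sequence, which may differ from the convention used for the 24-bp and 17-bp values in the text. The test sequence is invented.

```python
def polya_signal_distances(utr, signal="AATAAA"):
    """Return the distance (bp) from each signal occurrence to the 3' end.

    `utr` is the sequence ending directly before the poly(A) tail.
    Distance = number of bases between the end of the hexamer and the
    poly(A) addition site (an assumed convention).
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA notation
    distances = []
    start = utr.find(signal)
    while start != -1:
        distances.append(len(utr) - (start + len(signal)))
        start = utr.find(signal, start + 1)  # allow overlapping matches
    return distances


# Invented 3' end: a consensus hexamer followed by 24 bases.
example_utr = "GCGC" + "AATAAA" + "C" * 24
print(polya_signal_distances(example_utr))

# A sequence without the hexamer yields an empty list, the situation
# described for the premature polyadenylation sites.
print(polya_signal_distances("GCGCGCGCGC"))
```

In mammalian mRNAs the hexamer is typically found some 10 to 30 nucleotides upstream of the cleavage site, so returned distances can be filtered against that window if desired.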
Analysis of the Coding Regions-To amplify transcripts that contained the complete coding region, gene-specific primers located in the 5′- and 3′-UTR were designed. Five variants of hs6M1-16 were detected (Fig. 5, variants 11-15). Variant 14 contained the complete coding region and a spliced 3′-UTR (exon J), whereas variant 15 revealed an unspliced 3′-UTR. The 3′ ends of variants 12 and 13 resembled those of 14 and 15 (Fig. 5) but lacked the first 234 bp of the coding region and 19 bp of the 5′-UTR, removing the first methionine of the presumed intact OR and suggesting that this transcript produced a protein of only 238 amino acids as opposed to the 316 amino acids of the full-length OR. Variant 11 had lost 221 bp of the coding region and showed an unspliced 3′-UTR. This could lead to the same truncated protein as described for variants 12 and 13. For all of the transcripts mentioned, consensus splice sites were found. hs6M1-27 yielded two different variants: variant 6 contained the complete coding region (Fig. 6), whereas variant 5 lacked the sequence between bp 412 and 683 of the coding region. These 272 bp must have been removed by splicing to give rise to a frameshift that resulted in a premature termination codon 20 codons after the splice site. In contrast, all of the products obtained for hs6M1-18 were found to be unspliced.
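The numbers in this paragraph can be verified with a short calculation. The sketch below is only a worked check under stated assumptions: positions 412 and 683 are taken as inclusive coordinates, and the 238-amino acid product is assumed to result from translation resuming in frame directly after the 234 deleted base pairs, which the text does not state explicitly. The function name `deletion_effect` is hypothetical.

```python
def deletion_effect(first_bp, last_bp):
    """Return (deleted length in bp, True if the deletion shifts the frame).

    Coordinates are treated as inclusive positions in the coding region.
    """
    length = last_bp - first_bp + 1
    return length, length % 3 != 0


# hs6M1-27 variant 5: sequence between bp 412 and 683 removed.
hs6m1_27_variant5 = deletion_effect(412, 683)
print(hs6m1_27_variant5)

# hs6M1-16 variants 12 and 13: the first 234 bp of the coding region are lost.
FULL_LENGTH_AA = 316
lost_bp = 234
lost_codons, remainder = divmod(lost_bp, 3)
truncated_aa = FULL_LENGTH_AA - lost_codons
print(lost_codons, remainder, truncated_aa)
```

A deletion of 272 bp is not a multiple of three and therefore shifts the reading frame, whereas 234 bp correspond to exactly 78 codons, so 316 minus 78 gives the 238 amino acids quoted for the truncated protein.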
Search for Regulatory Regions-The TRANSFAC database, accessed by MatInspector, was used to search for transcription factor-binding sites within the 500-bp region containing exons B and C of hs6M1-16 and exon L of hs6M1-18, -21, and -27. A number of matches were observed (Fig. 2), but the OLF-1 binding site was not included. The detailed analysis of the region between exons B and L revealed significant matches for transcription factors belonging to three different groups: forkhead-related activators, SRY-related factors, and AP1 transcription factors (Table VIII). Their binding motifs are all common within the genome. Therefore, the FastM (41) program was used, which allows a model of a putative promoter region to be developed through predicting two binding sites, their strand orientation, their sequential order, and the allowed distance between binding sites. However, nothing distinctive about this collection of transcription factor-binding sites could be discerned.
The sequence between exon C of hs6M1-16 and exon L of hs6M1-18, -21, and -27 (Fig. 2) was then compared against other regions of the human genome, using the BLAST program, to see whether this is a unique sequence or whether it exists in other OR clusters. The analysis revealed that the first half (positions 126-247; Fig. 2) of this sequence was unique. However, the second half (positions 248-346) of the sequence was found to be similar to several other regions within the genome. One of these similar sequences (with a 69% shared base pair identity) was located in an OR cluster on chromosome 11q12.2, but because this region of the genome is currently unfinished, further work is needed to confirm whether this shared sequence is located in a similar regulatory region in the chromosome 11q12.2 cluster.
To further investigate the putative promoter region between hs6M1-16 and hs6M1-18, -21, and -27, a functional analysis of the region was carried out. This involved cloning the candidate promoter region (positions 126-346; Fig. 2) into the luciferase reporter vector pGL3 in both the forward and reverse orientation and transfecting the resulting constructs and the respective control vectors into two cell types. Because a suitable cell line derived from a seminoma was unavailable, two others were employed. First, Odora cells, derived from rat OSN where OR could be expected to be expressed, were transfected. Because hs6M1-27 was also found to be expressed in the kidney (results not shown), human embryonic kidney cells (HEK293) were also used. After transfection, both sets of cells were assayed for luminescence. In both cases, positive controls revealed strong signals, but no promoter activity was found for the putative promoter region. In the case of OLFOP(F) (the region of interest in the forward orientation), cell luminescence was in several independent experiments below that observed for cells without constructs, and OLFOP(R) (the region of interest in the reverse orientation) revealed almost no activity (Table IX).
In addition, two promoter prediction programs were employed to analyze the major and the minor HLA-linked OR clusters, but no regulatory regions were predicted for any of the OR genes (Fig. 7), although promoters of several other genes were predicted accurately.

DISCUSSION
Major histocompatibility complex (MHC)-linked OR genes are of particular interest because of the possible involvement of their products in processes connected with MHC-dependent sperm selection and mate choice (16,18,21,30,42,43). The study of transcripts from human MHC-linked OR genes described here had three principal aims: (i) to obtain an estimate of the degree to which these genes are expressed in testis, (ii) to elucidate their genomic organization and transcription start sites, and (iii) to provide an in-depth analysis of potential promoter sites.
Testicular Expression of HLA-linked OR Genes-Over the last 10 years, more and more evidence has accumulated suggesting that OR are not exclusively expressed in OSNs (44). This has led to the hypothesis that OR also receive and transduce signals within numerous nonolfactory tissues (15). OR could be involved in various aspects of sperm biology (12, 13, 16-18, 21, 45) or in detecting or creating area codes that are important in embryogenesis (15,46). The huge number of different OR species appears exceptionally well suited to meet these requirements, and the detection of OR transcripts in embryonic (47) as well as many adult tissues (12, 48) provides further supporting evidence. In this study, we demonstrate for the first time that HLA-linked OR genes are also expressed in testis.
FIG. 4. Alignments of premature polyadenylated transcripts of hs6M1-16, -18, and -21. Products generated with the Clontech library are shown above the line; products generated with freshly prepared cDNA are shown below. Mismatches directly before the poly(A) tail are probably artifacts resulting from the anchored poly(T) primer (Table VII). For clarity, 303 bp of the third product of hs6M1-16 have been omitted.
Only a relatively small fraction (~5%) of the entire repertoire of non-MHC-linked mammalian OR is known to be transcribed in testis (12,16,17,45). Interestingly, nearly all of these testis-expressed OR are potentially functional, amounting to ~15% of the human OR with open reading frames (2). In contrast, more than 85% of the analyzed HLA-linked OR genes with open reading frame are expressed in this tissue. This discrepancy might in part be explained by sensitivity differences of the methods used. Additionally, interindividual differences in the expression of single OR genes could be responsible as well, because of independent tissue donors in the respective studies. Therefore, it remains to be seen whether the observed expression differences between HLA-linked and non-HLA-linked OR are functionally relevant. The functional significance of testicular OR expression has been questioned repeatedly (49), in particular in connection with the seemingly "promiscuous" expression of various genes that serve no obvious function within the testis.
[Figure legend fragment: ... (Table IV) are labeled with Primer, and amplifications using the anchor primer SMART-TAG-2 (Table VII) ... Fig. 2 (bottom panel).]
[Table fragment (accession, transcription factor, start position): T00027, AP1 transcription factor, 259; T00027, AP1 transcription factor, 272; T00027, AP1 transcription factor, 279; T01470, Ikaros, lymphoid-specific transcription factor.]
However, several lines of evidence argue in favor of the functionality of OR in testis: (i) promiscuous gene expression is also a feature of certain thymic medullary cells, where it serves a purpose within negative T-cell selection (50); (ii) certain OR have been detected on spermatozoa (13), proving testicular translation of OR transcripts; (iii) the expression levels of OR genes within the testis appear to be comparable with those in the main olfactory epithelium (7) 4; and (iv) the G-protein used by OR for signal transduction is expressed in testis (45), and spermatozoa seem to be endowed also with proteins involved in olfactory desensitization (51), implying that testicular OR expression is a meaningful event. Specific ligands are known only for very few OR (52-55), and these odorants are without exception volatile, favoring their interaction with OR residing on cells of the main olfactory epithelium. However, because OR also play a major role in axonal targeting of olfactory neurons to specific glomeruli in the olfactory bulb (19, 56-58), it may well be that OR interact with nonvolatile molecules as well, e.g. soluble molecules. This might be an essential requirement for the directed movement of spermatozoa (12,13,16,18,21,45,59). Therefore, OR may be considered as likely further examples of multifunctional proteins, such as MHC class I molecules that fulfill completely different roles within the immune system and particular areas of the brain (60).
Transcripts and Genomic Organization of OR Genes-In most of the analyzed cases, olfactory neurons of the main olfactory epithelium express only one OR gene/cell in a monoallelic fashion (22,61). In other tissues, the mode of expression is unknown. To analyze the underlying regulatory mechanism, exact knowledge of the transcriptional start site is a minimal prerequisite. In testis, the transcripts of six genes (the HLA-linked hs6M1-12, -16, -18, -21, and -27 and the non-HLA-linked hs7M1-2) contain 5′-untranslated exons, whereas seven genes (hs6M1-1, -3, -6, -10, -17, and -20 and the HLA-unlinked hs17M1-20) are characterized by transcripts without 5′-UTR exons. 3 Interestingly, all but one (hs6M1-12) of the analyzed HLA-linked OR genes with 5′-untranslated exons are clustered within a region of ~110 kb (Figs. 2 and 7). The exception, hs6M1-12, shows a pronounced similarity to hs6M1-16, an OR gene within the 110-kb cluster (62). This similarity is most prominent in the coding region (>90%) but extends ~5 kb upstream, suggesting a common ancestor for both genes. Nevertheless, no homologous 5′ exons are found for hs6M1-12 and -16. The only 5′ exon found for hs6M1-12 resides in a region that has probably been inserted after duplication of an ancestral OR gene. An analogous situation can be observed for the seven upstream exons of hs6M1-16, which are located mostly in regions with, at best, low similarity to hs6M1-12. This implies that both genes are characterized by unique regulatory elements.
Although the analysis of the human expressed sequence tag database TIGR has shown that one-third of all genes exhibit alternative splicing and 80% are spliced in the 5′-UTR (63), the complexity found for the 5′-UTR of hs6M1-16 is remarkable and, to our knowledge, unprecedented for OR loci (Fig. 3). Five of the seven upstream exons and the coding exon itself are used as transcriptional starts, suggesting the existence of six different promoters. Taking into account that OR could be expressed in different tissues, it does not appear surprising that different promoters are needed for flexible transcriptional regulation. This is exemplified by FGF1 (64) and other genes.
The OR genes hs6M1-18, -27, and -21 give rise to very unorthodox transcripts. Most of these share a common first 5′ exon, suggesting a conjoint regulation by the same promoter. This seems to resemble the situation of the two human zinc finger genes PEG3 and ZIM2 (65); seven of the 11 exons of ZIM2 are located upstream of and shared with PEG3. The distances between the respective zinc finger and OR genes are remarkably similar (∼30 kb), but there are two major differences: In addition to this first shared exon, transcription of hs6M1-18 and -27 starts at further sites (exons N and R; Fig. 2),³ and the analysis of additional tissues and transcripts may uncover even more exons and transcription start sites. Furthermore, the introns of hs6M1-18, -27, and -21 contain three OR genes (hs6M1-17, -19P, and -20) in opposite transcriptional orientation, of which hs6M1-17 and -20 were demonstrated to be transcribed in testis. As a consequence, the mRNA for the most telomeric gene of this transcriptional unit (hs6M1-21) is expected to have lost numerous untranslated as well as five complete OR coding exons (those of hs6M1-17, -18, -19P, -20, and -27), in total more than 100 kb (Fig. 2). The large intron sizes are also highly unusual, because other OR genes exhibit distances from 1.3 to 11.1 kb between the first 5′ noncoding exon and the coding region (7, 11, 26-28). hs6M1-32, a member of the telomeric subcluster of HLA-linked OR genes (Fig. 7), is another notable exception to the rule; its 5′-UTR is ∼64 kb in length and, like hs6M1-18, -27, and -21, it also splices around other OR genes (hs6M1-10 and -33P) (6).

⁴ A. Ziegler, A. Volz, and K. Zatloukal, unpublished results.

TABLE IX
Relative luminescence
Shown is the relative luminescence (%) after transfection of various constructs into (i) the rat olfactory sensory neuron cell line Odora, and (ii) the human embryonic kidney cell line HEK293, calculated by comparison of samples from two experiments against an average value of luminescence of cells with the control vector (two experiments). The samples were: cells (only), cells + basic vector (without promoter or enhancer sequence), cells + promoter vector (SV40 promoter without enhancer), cells + control vector (with SV40 promoter and enhancer), cells + OLFOP(F) (basic vector + putative odorant receptor promoter sequence in the forward orientation), and cells + OLFOP(R) (basic vector + putative odorant receptor promoter sequence in the reverse orientation).

Numerous hs6M1-16, -18, and -21 transcripts exhibit premature polyadenylation at defined sites within the coding region without detectable poly(A) signals. They match genomic sequence without any frameshifts or nonsense mutations. Because the same premature polyadenylation sites are present within freshly prepared testis cDNA and A/T-rich sequence stretches are missing at the respective places, cDNA library artifacts can be excluded as an explanation. Most other genes that use more than one polyadenylation site possess the common polyadenylation signal (AAUAAA) only at the most 3′ site (66), because this strong signal would otherwise repress the production of full-length transcripts. This configuration is also exhibited by hs6M1-16 and -27, where only the most distal exon contains the consensus polyadenylation signal. In general, splicing in the 3′-flanking region seems to be a rare phenomenon, which is in line with an unspliced 3′-UTR of hs6M1-18 and -27. On the other hand, the 3′ exons of hs6M1-16 (Fig. 5) demonstrate for the first time that OR genes or G-protein-coupled receptors exhibit a spliced 3′-UTR.
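The check for a consensus polyadenylation signal near a polyadenylation site can be illustrated with a minimal sketch. It is not the procedure used in this study: the function names, the toy sequence, and the 40-nucleotide search window are assumptions chosen for illustration only. The signal AAUAAA is searched in its DNA form, AATAAA.

```python
def find_signal(seq: str, signal: str = "AATAAA") -> list[int]:
    """Return 0-based start positions of the hexamer in a cDNA/DNA sequence."""
    seq = seq.upper().replace("U", "T")        # accept RNA or DNA notation
    signal = signal.upper().replace("U", "T")
    n = len(signal)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == signal]


def signal_upstream_of(seq: str, site: int, window: int = 40,
                       signal: str = "AATAAA") -> bool:
    """True if the hexamer lies completely within `window` nt upstream of `site`.

    `site` is the 0-based position at which the poly(A) tail begins.
    The window size is an assumption, not a value taken from the text.
    """
    start = max(0, site - window)
    return bool(find_signal(seq[start:site], signal))


# Hypothetical toy transcript: one consensus signal starting at position 10.
toy = "GCGCGCGCGC" + "AATAAA" + "GCGCGCGCGCGCGCGCGCGC"

print(find_signal(toy))                # positions of the consensus hexamer
print(signal_upstream_of(toy, 36))     # site at the 3' end of the toy sequence
print(signal_upstream_of(toy, 10))     # internal site with no signal upstream
```

In this toy case only the distal site is preceded by the hexamer, which mirrors the configuration described for hs6M1-16 and -27; a premature site without a detectable signal would return `False`.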
Only 81 bp upstream of the shared first exon (L) of hs6M1-18, -27, and -21, transcription of some variants of hs6M1-16 initiates in the opposite direction (Fig. 2), suggesting that this small region contains a bidirectional promoter. Examples of bidirectional promoters in the human genome are common (67), but no bidirectional promoter controlling more than two genes has been described so far. This configuration could also explain why more than one OR gene has rarely been found to be expressed in a given OSN (68). However, neither an extensive in silico analysis, including the search for transcription factor-binding sites (32) and sequence similarities between OR clusters, nor experimental efforts using reporter gene constructs prove the existence of such a shared promoter region (Fig. 2 and Table IX). The analysis of the major and the minor HLA-linked OR clusters using two promoter prediction programs, Eponine and Promoter Inspector, also provides no hint for the existence of "classical" promoters (Fig. 7). Nevertheless, a shared promoter for hs6M1-16, -18, -21, and -27 may exist, but its presence remains elusive.
This lack of convincing promoter motifs has also been observed in studies of other OR gene clusters, both in human and mouse (7, 26-28). The central question of how an individual cell "decides" which OR gene to transcribe and how to suppress transcription of all others thus remains unsolved. Three mechanisms have been invoked for some years to account for this remarkable selectivity (reviewed by Ref. 69): (i) genomic rearrangements similar to those employed by B- and T-cells to generate immunoglobulins and T-cell receptors, respectively; (ii) gene-specific assemblies of transcription factors acting on gene-specific regulatory motifs; and (iii) involvement of "locus control regions" regulating the transcription of OR genes within clusters. There is either no evidence for such mechanisms (in particular, gene rearrangements; see also Ref. 28), or they present considerable conceptual difficulties together with lack of evidence (transcription factor assemblies and locus control regions). However, the concept of only a single OR transcription complex that stably associates with a single OR control region (Ref. 27; reviewed in Ref. 70) within a given OSN would provide a simple solution for the complex problem of gene- and allele-specific transcription.
The published OR coding regions of higher eukaryotic species like lamprey, teleosts, and mammals are all intronless, but for Caenorhabditis elegans and Drosophila melanogaster introns within the coding region were described (71). In this respect, HLA-linked OR seem to be no exception, if the coding region is analyzed only at the genomic level. However, our experiments reveal for hs6M1-16 and -27 that part of the coding region may be removed by splicing (Figs. 5 and 6). In the case of hs6M1-27, this intracoding region splicing leads to premature termination because of a frameshift. The resulting protein would maximally contain the first three TM domains and is probably not functional on its own, because the pocket thought to accommodate a ligand is not present (3). In the case of hs6M1-16, two very similar splicing products (Fig. 5) are observed that could result in an OR without the first two TM domains. This N-terminally truncated OR would start like intact OR with an extracellular domain. For chemokine receptors, which are also G-protein-coupled 7-TM receptors, it has been demonstrated that the five distal TM regions may be sufficient for signal transduction (37). Because 10 of the HLA-linked OR genes comprise the respective splice site as well as the alternative start codon at position 79 (6), one may expect a biological function also for the truncated proteins. Hypothetically, differently truncated OR such as hs6M1-27 and -16 might even complement each other functionally. Provided that cells existed that transcribe two or more OR genes simultaneously, the resulting combinatorial potential between different truncated OR fragments could increase the number of receptor specificities considerably, and the presence of numerous OR pseudogenes within the human genome (72) could gain new relevance. hs6M1-14P, for example, was considered a pseudogene solely because of the missing conventional initiator ATG (6), but again, the alternative start codon at position 79 could give rise to a 5-TM fragment. In addition, OR gene polymorphism is expected to increase the complexity even further (5, 44). Therefore, the accumulation of mutations during primate evolution does not necessarily reduce the number of OR specificities as postulated (73) but might have led to new functional properties of OR. Differential splicing of OR transcripts creating functional species from seemingly nonfunctional genes could be another example for the economical use of resources that has been suggested by the finding of only 30,000-40,000 genes instead of some earlier predictions of up to 100,000 genes within the human genome (39, 74).

FIG. 7. Diagram showing the major and minor clusters of OR genes located in the extended human MHC (oriented in a telomere to centromere direction). The minor cluster is located ∼1200 kb telomeric to the major cluster. OR genes within the region are marked by black arrows and are numbered. For clarity, the preceding hs6M1- as well as the P indicating pseudogenes have been omitted. Other genes referred to within the text are indicated. For an enlarged diagram of this region showing all gene designations, refer to Ref. 6. The lower part of the diagram shows the results from the two promoter prediction programs, Promoter Inspector (used at default settings) and Eponine (used at four different thresholds as indicated). Eponine predicts only four transcriptional start sites upstream of OR genes at a threshold of 0.9990, but the predicted site in front of hs6M1-21 does not match the experimental data (Fig. 2). On the other hand, Promoter Inspector does not predict any promoters that can be considered to regulate the transcription of OR genes. The positions of CpG islands are indicated on the bottom line.
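The frameshift argument made for hs6M1-27 in the text (removal of part of the coding region by splicing, followed by premature termination) can be illustrated with a minimal sketch. The sequence below is a hypothetical toy coding region, not OR sequence, and the function names and removed coordinates are assumptions for illustration; only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate from position 0 until the first stop codon or the sequence end."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":                 # stop codon terminates translation
            break
        protein.append(aa)
    return "".join(protein)


def splice_out(cds: str, start: int, end: int) -> str:
    """Remove the half-open interval [start, end) from the coding sequence."""
    return cds[:start] + cds[end:]


# Hypothetical toy coding region: ATG GCC CAG TTA AGG TTT GGC TGA
toy_cds = "ATGGCCCAGTTAAGGTTTGGCTGA"

full = translate(toy_cds)                          # unspliced product
shifted = translate(splice_out(toy_cds, 6, 10))    # 4 nt removed: frameshift
in_frame = translate(splice_out(toy_cds, 6, 9))    # 3 nt removed: frame kept

print(full, shifted, in_frame)
```

Removing a segment whose length is not a multiple of three changes the reading frame downstream of the junction, so a stop codon is usually encountered early; removing a multiple of three deletes residues but keeps the frame.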