Structure of the m4 cholinergic muscarinic receptor gene and its promoter.

Cholinergic muscarinic receptor genes are members of the G-protein-coupled receptor gene superfamily. In this study we describe the structure of the gene and promoter of the rat m4 muscarinic receptor gene. A rat cosmid clone containing the coding region of the m4 gene and 25 kilobases of upstream sequence was isolated. This clone directed expression of the rat m4 gene when introduced into IMR32 cells, a human neuroblastoma line that expresses m4, but did not drive expression when introduced into Chinese hamster ovary cells, a line that does not express the m4 gene. S1 nuclease mapping, modified 5′-rapid amplification of cDNA ends, and polymerase chain reaction analysis of rat cosmid DNA and cDNA showed that the gene consists of a 2.6-kilobase coding exon, extending 34 base pairs (bp) upstream from the initiating ATG, separated from a 460-493-bp noncoding exon by a 4.8-kilobase intron. DNA sequence analysis shows that the noncoding exon is GC-rich and that the promoter lacks TATA and CAAT boxes but has several consensus sequences for enhancer elements, including five Sp-1 binding sites, one AP-2 site, one AP-3 binding site, and two E-boxes within the proximal 600 bp. A reporter construct consisting of 1440 bp of flanking DNA and 80 bp of the first exon cloned into a luciferase reporter plasmid drove cell-specific expression in transient transfection assays. Removal of 1088 bp from the 5′ end of this construct resulted in expression in non-m4-expressing cell lines, suggesting that there is a repressor element in this region.

Signal transduction in the nervous system is largely conducted by activation of G-protein-coupled receptors (GPRs) (1). GPRs are encoded by one of the most diverse gene families in the mammalian genome, accounting for as much as 1% of the entire genome. Most, if not all, of these gene products have a unique distribution within the nervous system, and it is therefore of interest to determine how these expression profiles are brought about, i.e. what determines the receptor repertoire of individual neurones. For the several hundred members of the G-protein-coupled receptor family, very little is known about gene structure or the transcriptional control regions that produce this tightly controlled expression (see Ref. 1). Muscarinic receptors mediate many of the effects of acetylcholine in the central and autonomic nervous systems and are found in both neuronal and effector tissues. Five muscarinic receptor genes have been identified, m1-m5 (2-4), each of which has a unique pharmacological profile and pattern of expression within the central nervous system (5-7). The function of the muscarinic receptors within the central nervous system is poorly understood, although they have been implicated in arousal (8), the maintenance and acquisition of short-term memory (9), and psychomotor control (10).
The present study focuses on the m4 muscarinic receptor gene and represents the first report of the gene structure of a member of the muscarinic receptor gene family. The m4 gene is expressed mainly in telencephalic regions of the central nervous system (6), autonomic ganglia (11,12), and lung (13,14). Activation of m4 receptors can lead to closing of N-type voltage-sensitive Ca2+ channels, activation of K+ channels, and inhibition of adenylyl cyclase. Each of the muscarinic receptor genes has its open reading frame encoded entirely within one exon, and each is thought to contain at least one further upstream exon carrying the 5′-untranslated sequences.
To determine the mechanisms involved in the initiation and maintenance of expression of these receptors, a cosmid clone containing the entire m4 receptor gene and flanking regions has been isolated. Several parallel strategies have been employed to map the upstream elements of this gene and to identify regulatory elements involved in driving cell-specific expression.
Nuclear Run-on Assay-Nuclei from PC12 and CHO cells were isolated according to Current Protocols (16) and stored at −70°C prior to use. For the transcription run-on, a 200-µl aliquot of frozen nuclei was thawed and labeled with [α-32P]UTP (3000 Ci/mmol, DuPont NEN) according to Current Protocols (16). After labeling, DNA was digested using RNase-free DNase (Promega) and protein was digested with proteinase K (Promega). Total and incorporated counts were determined, and the probe was hybridized to slot blots in a hybridization buffer containing 10 mM Tris-Cl (pH 7.4), 0.2% SDS, 10 mM EDTA, 300 mM NaCl, 1× Denhardt's reagent, and 200 µg/ml tRNA at 65°C for 48 h. The blots were washed at a final stringency of 0.1× SSC, 0.1% SDS at 65°C, and unhybridized RNA was removed in 2× SSC, 10 µg/ml RNase A at 37°C for 30 min. The blots were exposed to x-ray film at −70°C for 1 week.
RNA Isolation and PCR Analysis-RNA was isolated using the acid phenol/guanidinium thiocyanate method of Chomczynski and Sacchi (17). mRNA was purified using oligo(dT) (Invitrogen). Northern blots were prepared by electrophoresis through formaldehyde agarose gels and electroblotting onto nylon membranes (GeneScreen).
5′-RACE-5′-RACE was carried out using a modified version of the procedure of Frohman et al. (19). PC12 RNA was reverse transcribed using an m4-specific primer, 863a (5′-GCTGGACTCATTGGAAGTGTCCT-3′). First-strand cDNA was then purified and tailed with dATP prior to amplification using a (dT)17-adapter primer, RACE-1 (5′-GACTCGAGTCGACATCGATTTTTTTTTTTTTTTTT-3′), and an m4-specific primer, Rm4 63a (5′-ACTGTCTCCAGGTGGTTGTGGGCT-3′). PCR cycling conditions were 95°C/30 s, 58°C/30 s, and 72°C/90 s for 30 cycles. One-tenth of the amplified product was used in a second amplification under the same conditions but with a set of nested primers internal to the m4-specific product. This second set consisted of an m4-specific primer, Rm4 10a (5′-GTCACCAGGCGCACAGACTGATTGGCTGAGCTGCCATTGACAGGCGTG-3′), and the adapter primer RACE-2 (5′-GACTCGAGTCGACATCG-3′). [α-32P]dATP (6000 Ci/mmol, DuPont NEN) was also included, to a final concentration of 10 mM, to internally label the amplified product. After 30 cycles of PCR, labeled RACE products were purified by spinning through a G-50 column. The RACE probe was denatured by boiling for 5 min and hybridized to Southern blots containing DNA fragments of the cosmid clone R3-6. PCR mapping of the upstream exon and exon/intron boundaries was carried out using the primers indicated in Fig. 1.
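As a quick check on the primer set above, the printed sequences can be tabulated programmatically. The following is a minimal Python sketch, not part of the original protocol: it assumes that the hyphens inside the printed RACE-1, RACE-2, and Rm4 10a sequences are line-break artifacts, and the helper names (`clean`, `gc_fraction`, `find_sites`) are illustrative. It reports length and GC fraction (relevant given the GC-rich templates discussed under Results), confirms that the nested adapter primer RACE-2 is identical to the 5′ end of RACE-1, and scans RACE-1 for XhoI, SalI, and ClaI recognition sequences.

```python
import re

# Sequences as printed in the Methods, 5'->3'; hyphens are assumed to be
# line-break artifacts and are stripped by clean().
PRIMERS = {
    "863a": "GCTGGACTCATTGGAAGTGTCCT",
    "RACE-1": "GACTCGAGTCGA-CATCGATTTTTTTTTTTTTTTTT",
    "Rm4 63a": "ACTGTCTCCAGGTGGTTGTGGGCT",
    "Rm4 10a": "GTCACCAGGCGCACAGACTGATTGGCTGAGCTGCCATTGAC-AGGCGTG",
    "RACE-2": "GACTCGAGTC-GACATCG",
}

# Recognition sequences (palindromic, so scanning one strand is enough).
SITES = {"XhoI": "CTCGAG", "SalI": "GTCGAC", "ClaI": "ATCGAT"}


def clean(seq: str) -> str:
    """Keep only nucleotide letters and upper-case them."""
    return re.sub(r"[^ACGTacgt]", "", seq).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G + C in the cleaned sequence."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def find_sites(seq: str) -> dict:
    """0-based start positions of each recognition sequence (overlaps allowed)."""
    s = clean(seq)
    hits = {}
    for enzyme, site in SITES.items():
        starts = [m.start() for m in re.finditer(f"(?={site})", s)]
        if starts:
            hits[enzyme] = starts
    return hits


for name, raw in PRIMERS.items():
    s = clean(raw)
    print(f"{name:8s} {len(s):3d} nt  GC {gc_fraction(s):.0%}")

race1, race2 = clean(PRIMERS["RACE-1"]), clean(PRIMERS["RACE-2"])
print(race1.startswith(race2))  # True: RACE-2 is the 5' end of RACE-1
print(find_sites(race1))        # {'XhoI': [2], 'SalI': [7], 'ClaI': [13]}
```

Scanning a single strand is sufficient here only because all three recognition sequences are palindromic; a non-palindromic motif would also need a reverse-complement scan.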
DNA Transfections-Cesium chloride-banded DNA or Qiagen column-purified DNA was transfected into cells using calcium phosphate precipitation as described by Chen and Okayama (20) or with Lipofectamine (Bethesda Research Laboratories). For transient assays, cells were harvested 48 h after transfection, and cell extracts were prepared as described below. Stable transformants were produced by co-transfection of the cosmid clone R3-6 with a simian virus 40 (SV40)-driven neomycin resistance gene, followed by selection in the presence of 400 µg/ml G418. Single clones were amplified, and RNA was prepared to test for rat m4 expression. For reporter gene assays, NG108-15, CHO, and 3T3 cells were transfected using 0.8 µg of luciferase plasmid DNA, 0.2 µg of a standard plasmid consisting of a cytomegalovirus promoter-driven β-galactosidase gene (pCMVβ; Clontech), and 5 µl of Lipofectamine per 35-mm dish. These conditions gave transfection efficiencies of 5-50% as judged by β-galactosidase histochemistry. Two to three days after transfection, cells were harvested into Reporter Lysis Buffer (Promega), and the lysate was freeze/thawed and stored at −80°C. Luciferase measurements were carried out using the Promega luciferase assay system according to the manufacturer's instructions in a Turner TD-20e luminometer. β-Galactosidase measurements were carried out using an o-nitrophenyl-β-D-galactopyranoside assay (21) and used to normalize luciferase results.

S1 Nuclease Mapping-S1 nuclease mapping was carried out by a slightly modified version of the procedure of Sambrook et al. (21). Plasmid DNA (5 µg) was cut with a suitable restriction enzyme, added to 100 µg of either PC12 or CHO total RNA, precipitated, and dissolved in 30 µl of hybridization buffer (90% formamide, 400 mM NaCl, 40 mM PIPES (pH 6.4), and 1 mM EDTA). The nucleic acids were denatured at 85°C for 10 min and hybridized overnight at 50°C before digestion with S1 nuclease. Samples were electrophoresed on a 2% agarose gel and blotted onto a nylon membrane (GeneScreen). The blot was then hybridized with fragments of the R3-6 clone to identify the size and location of the protected exonic domains.
Nested Deletions and DNA Sequencing-The 5.0-kb BamHI fragment believed to contain the transcription initiation site of the m4 gene was subcloned into the plasmid pGEM-7Zf(+) (Promega), and nested deletions were then produced using exonuclease III and an Erase-a-Base kit (Promega). Individual recombinant plasmids were isolated and used for DNA sequencing. PCR products were first purified using a Centricon 100 column and then sequenced with the fmol DNA sequencing kit (Promega) according to the manufacturer's instructions. Plasmid DNA was sequenced with a Sequenase kit from U. S. Biochemical Corp. or by cycle sequencing using Taq polymerase and dye terminators on an Applied Biosystems automated sequencer.
Reporter Plasmid Construction-Reporter plasmids were constructed using a modified version of pGL3-Basic (Promega). Two primers containing KpnI linkers, Rm4x15s and Rm4x18s, were used in conjunction with Rm4x6a, which contains a BglII linker, to amplify 1520- and 432-bp fragments of the m4 promoter, respectively. These were cloned into KpnI/BglII-cut pGL3-Basic to generate the pGL3 −1440/+80 and pGL3 −352/+80 constructs.
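The fragment sizes follow directly from the construct names once a numbering convention is fixed. A minimal Python sketch, assuming the common promoter convention in which +1 is the transcription start site and there is no position 0 (the text does not state its convention, but this is the one that reproduces the 1520- and 432-bp fragments). Construct names are written with ASCII hyphens, and the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_construct(name: str) -> Tuple[int, int]:
    """Pull the (5' end, 3' end) coordinates out of a name like 'pGL3 -1440/+80'."""
    m = re.search(r"([+-]\d+)/([+-]\d+)", name)
    if m is None:
        raise ValueError(f"no coordinates in {name!r}")
    return int(m.group(1)), int(m.group(2))


def span_length(start: int, end: int) -> int:
    """Bases from start to end inclusive, assuming +1 = start site and no position 0."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must lie upstream of end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the jump from -1 to +1 skips the nonexistent position 0
    return length


def deleted_region(long_name: str, short_name: str) -> Tuple[int, int]:
    """5' region present in the longer construct but missing from the shorter one."""
    long_start, _ = parse_construct(long_name)
    short_start, _ = parse_construct(short_name)
    end = short_start - 1
    if end == 0:
        end = -1  # no position 0 in this numbering
    return long_start, end


for name in ("pGL3 -1440/+80", "pGL3 -352/+80"):
    print(name, span_length(*parse_construct(name)), "bp")  # 1520 bp, then 432 bp

region = deleted_region("pGL3 -1440/+80", "pGL3 -352/+80")
print(region, span_length(*region), "bp")  # (-1440, -353) 1088 bp
```

The same arithmetic gives the 1088 bp removed in the shorter construct, which is the figure quoted in the abstract.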

RESULTS
Characterization of Cosmid R3-6-Screening of approximately 1 × 10⁶ recombinants in a rat cosmid library with DNA probes specific for the m4 gene yielded three overlapping clones, each of which contained at least some of the m4 coding region. Two of the clones contained only a truncated m4 coding region and no 5′-flanking DNA, whereas the third, R3-6 (Fig. 1), contained the entire coding region of the m4 gene and approximately 25 kb of 5′-flanking DNA. Further analysis was therefore confined to R3-6. Our initial approach was to determine whether R3-6 contained a functional m4 transcriptional unit. To test this, the R3-6 cosmid was used to produce stable transformants of two cell lines: IMR32, a human neuroblastoma that expresses the m4 gene, and CHO, which does not (Fig. 2). PCR analysis with species-specific primers was then performed on RNA extracted from these transformants to determine whether R3-6 contained enough information to drive expression of the rat m4 gene. This analysis (Fig. 2) shows that R3-6 drives m4 gene expression in IMR32 but not CHO cells, indicating that the cosmid contains the transcription start site of the m4 gene and at least some of the transcriptional regulatory sequences that drive cell type-specific m4 expression. Although in principle this type of analysis could be conducted on transiently transfected cells, in practice we found it impossible to completely digest the transfected cosmid DNA present in the nucleic acid extracts using either exonucleases or endonucleases. Under these conditions, it was not possible to determine whether PCR products were derived from cDNA or from cosmid DNA.
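The lane logic behind these expression calls can be written down explicitly. A minimal Python sketch, assuming a simple present/absent scoring of each PCR lane; the class and function names are hypothetical, and the lane results encoded below are those reported for the six IMR32 clones in the Fig. 2 legend (no product from any DNase-treated RNA lane; RT-PCR products only for clones 3 and 4).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloneLanes:
    """PCR results for one stably transfected clone (True = band present)."""
    dnased_rna: bool                        # DNase-treated RNA, no reverse transcription
    rt_dnased_rna: bool                     # reverse-transcribed DNase-treated RNA
    loading_control: Optional[bool] = None  # e.g. an HPRT RT-PCR, if one was run


def call_expression(lanes: CloneLanes) -> str:
    # A band without reverse transcription means DNA survived DNase digestion,
    # so an RT-PCR band from that sample could come from the cosmid itself.
    if lanes.dnased_rna:
        return "uninterpretable"
    if lanes.rt_dnased_rna:
        return "expressed"
    # A missing band only counts as "not detected" if the cDNA itself was amplifiable.
    if lanes.loading_control is False:
        return "uninterpretable"
    return "not detected"


# IMR32 clones 1-6 as described in the Fig. 2 legend.
imr32 = {
    clone: CloneLanes(dnased_rna=False, rt_dnased_rna=clone in (3, 4))
    for clone in range(1, 7)
}
calls = {clone: call_expression(lanes) for clone, lanes in imr32.items()}
expressing = sorted(c for c, call in calls.items() if call == "expressed")
print(expressing)  # [3, 4]
```

The `loading_control` field mirrors the role of the hypoxanthine-guanine phosphoribosyl transferase primers used for the CHO clones: without such a control, the absence of an m4 band cannot be distinguished from a failed reverse transcription.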
Identification of the Transcription Start Site-Several independent approaches were then used to determine the position of the transcription start site and the location of exonic sequences within the R3-6 clone. A full-length cDNA for the m4 receptor has proven difficult to isolate despite various strategies, including screening of 5′-STRETCH cDNA libraries (Clontech), none of which yielded any cDNA sequence upstream of the initiating ATG; this difficulty was also noted in the original work describing isolation of the rat m4 cDNA (3). To determine the size of the untranslated exon of the m4 gene, a series of primer extension analyses was carried out using mRNA from PC12 cells; however, even with a variety of primers and extension conditions, no specific products were obtained. Positive controls using primers to reverse transcribe the 5′ end of the tyrosine hydroxylase transcript consistently generated specific bands of the expected length (data not shown). Problems with primer extension analysis have previously been noted for a number of genes in this family, including the D1A gene, and have generally been ascribed to the presence of GC-rich sequences within the 5′ noncoding regions (22).
To determine which regions of the R3-6 cosmid are transcribed, a nuclear run-on analysis was performed using nuclei isolated from PC12 cells, which were shown to express the rat m4 gene (Fig. 2). The results of the nuclear run-on analysis suggest that there may be two transcription units within the R3-6 clone (Fig. 3). The 4.5-kb BamHI fragment containing the coding region and the 5.0-kb BamHI fragment upstream of the coding region both show hybridization signals, suggesting that both fragments contain exonic sequences. The 3.0-kb fragment does not show any hybridization, indicating that it does not contain transcribed sequences; repeat experiments also failed to show any hybridization with this fragment.
These data suggest that transcription of the m4 gene begins within the 5.0-kb fragment. The hybridization signal obtained for the 14.3-kb fragment is likely to be due either to a separate transcription unit, as there is no transcription through the 3.0-kb fragment, or to hybridization of repetitive sequences of the kind found in intronic regions (23).

… (1-6). PCR amplification using rat m4-specific primers was performed as described under "Materials and Methods" on RNA, DNased RNA, and reverse transcribed DNased RNA, and the products were run on a 2% agarose gel. Each of the numbered lanes corresponds to an individual clone. − and + represent negative and positive controls for the PCR, respectively. Amplified products from RNA lanes 1, 2, 4, and 5 derive from genomic DNA and demonstrate that the rat sequences are present within the genome of these clones. The genomic DNA is efficiently removed by DNase treatment (DNased RNA, lanes 1-6), and no amplified product is obtained from the DNased RNA alone. Reverse transcription and subsequent PCR show that 2 of the clones (reverse transcribed DNased RNA, lanes 3 and 4) express the rat m4 mRNA. C, characterization of 7 stably transfected CHO clones (1-7). A PCR analysis on reverse transcribed DNased RNA was performed using m4-specific primers and primers to the constitutively expressed enzyme hypoxanthine-guanine phosphoribosyl transferase, to demonstrate that the lack of an amplified product with the m4 primers was not due to a lack of reverse transcribed RNA. − and + represent negative and positive controls for the PCR reaction.
A Northern blot analysis of total RNA isolated from CHO and PC12 cells, probed with subcloned regions of the cosmid, was performed to eliminate the possibility that any hybridization was due to intronic sequences. The results are shown in Fig. 4, where both the 5.0- and 4.5-kb BamHI fragments show hybridization signals consistent with their containing exonic material. The sequences tested within the 14.3-kb fragment also hybridize with PC12 RNA but not with CHO RNA, indicating that this hybridization is likely to be due to exonic sequences rather than repetitive elements. These upstream transcribed sequences give a more intense hybridization signal than the 5.0- and 4.5-kb fragments, which hybridize to the m4 transcript; this is consistent with the interpretation that they hybridize to a transcript other than m4.
Both the nuclear run-on and the Northern blot analyses suggest that the transcription start site of the m4 gene lies within the 5.0-kb BamHI fragment (Figs. 3 and 4). To map the position of exonic sequences within this fragment more precisely, we used the 5′-RACE procedure (19). Hybridization analysis of 5′-RACE products using nested oligonucleotides as probes revealed specific RACE products consistent with at least a further 200 bp of exonic sequence. However, all attempts to clone the 5′-RACE product yielded deleted clones, and the use of severely rec− host cells such as SURE (Stratagene) did not stabilize the clone. Hybridization and sequencing analysis revealed a consistent deletion 5 bp upstream of the initiating methionine. Interestingly, the original cloning of the rat m4 cDNA did not produce a full-length clone either: all clones initiated 6 bases downstream of the first base of the open reading frame (3). We therefore produced an internally labeled RACE product that could be used as a probe in Southern analysis of the R3-6 clone. Because the probe is derived from mature mRNA, this analysis should identify only those sequences in R3-6 that are exonic. Amplified DNA was produced by two consecutive PCR amplifications, with [α-32P]dATP incorporated in the second amplification as outlined under "Materials and Methods." When analyzed on a 2% agarose gel, the radiolabel from two probes appeared to be incorporated into a smear of DNA between 100 and 1500 bp, with two specific bands at approximately 330 and 560 bp (Fig. 5A). This heterogeneity is expected with an inherently degenerate amplification procedure such as 5′-RACE. The RACE probe was first hybridized to a slot blot containing DNA fragments derived from the R3-6 clone (Fig. 5B).
These data showed that two fragments, the 5.0- and 4.5-kb BamHI fragments, hybridized with the RACE probe, indicating that both contain exonic sequences, consistent with the previous results. The 4.5-kb BamHI fragment hybridizes because of the coding-exon sequences it contains. The 5.0-kb BamHI fragment, located 0.5-5.5 kb 5′ of the ATG initiation codon, hybridizes because it must also contain exonic sequences of the m4 gene. Further upstream fragments also appear to give weak hybridization signals (Fig. 5B), but these are due to background, as no hybridization is seen on a Southern blot of these fragments after separation on an agarose gel (Fig. 5D).

FIG. 3. Nuclear run-on assay of PC12 nuclei. 500 ng of each DNA fragment was immobilized on a nylon membrane, hybridized with the nuclear run-on probe, washed, and exposed to x-ray film for 1 week. The hybridization signal for each fragment is shown in the top panel. Also shown is the hybridization obtained with a negative control plasmid, ptkAGPT, and two positive control plasmids, p-tubulin (containing the coding region of the α-tubulin gene) and pleu (containing the cDNA for the tRNA that recognizes leucine). The open box represents a portion of the R3-6 clone spanning a region between 30 kb 5′ to 2…
To identify more precisely which sequences in the 5.0-kb BamHI fragment hybridized, and thus contained exonic material, a 9.8-kb KpnI fragment containing 1.6 kb of the coding exon at its 3′ end and 8.2 kb of sequence 5′ to the coding exon was subcloned into pGEM-7Zf(+) (Fig. 5D). This construct was cut with various restriction enzymes, and the DNA fragments were separated on an agarose gel, transferred to a nylon membrane, and hybridized with the RACE probe (Fig. 5D). The 1-kb PstI fragment and the larger hybridizing fragments (>5 kb) seen in the HindIII, BamHI/SalI, and HindIII/SalI lanes are all due to hybridization with sequences in the coding exon. The smallest hybridizing upstream fragment is the 1.2-kb HindIII/SalI fragment, suggesting that the upstream exon, or exons, lie within this fragment. However, further exonic sequences upstream of this fragment cannot be excluded, because the RACE probe may not be fully extended to the 5′ end of the m4 mRNA.
To determine the size and number of the upstream exons, an S1 nuclease analysis was performed. Two fragments of DNA from the R3-6 clone were used in independent hybridizations: the 9.8-kb KpnI fragment (see Fig. 5D), which contains part of the coding exon and upstream sequences of the m4 gene, and the 5.0-kb BamHI fragment (see Fig. 5D), which is completely contained within the 9.8-kb KpnI fragment and contains upstream sequences of the m4 gene but none of the coding exon. Using two fragments allows protected sequences resulting from hybridization with m4 mRNA to be distinguished from artifacts, because fragments protected by upstream exons will be the same size for both probes. In addition, comparing RNA from PC12 cells (which express the m4 gene) with RNA from CHO cells (which do not) aids in identifying m4-specific protected fragments.
Protected fragments are present in the S1 nuclease analysis of PC12 RNA with both DNA fragments (Fig. 6, lanes B and C), but none are seen with CHO RNA (Fig. 6, lane A), consistent with the protected fragments resulting from hybridization of the DNA with m4 mRNA. Both DNA fragments show protected sequences larger than the 1.3-kb DNA size marker, which are most likely the result of DNA:DNA hybridization between complementary strands of the plasmids. The 9.8-kb KpnI fragment shows a protected sequence of approximately 950 bases. This fragment is likely to be due to hybridization with the m4 coding exon, as it is of the correct size and no equivalent protected fragment is seen for the 5.0-kb BamHI fragment, which contains no sequence from the m4 coding exon.
Both the 5.0-kb BamHI and 9.8-kb KpnI fragments produce protected fragments of 460 bases, suggesting that this is the size of the upstream exonic material. This estimate is in good agreement with the labeled 560-base 5′-RACE product (see Fig. 5A): in addition to upstream exonic material, the RACE product contains approximately 80 bases of coding exon and primer sequence, so the 560-bp product corresponds to approximately 480 bp of upstream exon.
The noncoding exon and flanking sequences were subcloned and sequenced. Oligonucleotides designed from these sequence data were used in PCR analysis of m4 cDNA and genomic DNA to map the size and boundaries of the noncoding exon. One of these oligonucleotides, Rm4x2s, produced a specific 163-bp product when used with a primer derived from the coding exon, Rm4 63a. Sequencing of this PCR product showed that the coding exon extends 34 bp upstream of the initiating ATG. Examination of the corresponding genomic sequence identified a consensus splice acceptor site and indicated that a 4.8-kb intron separates the two exons. PCR with a further upstream oligonucleotide, Rm4x4s, and Rm4x2a gave a 400-bp product with either cDNA or plasmid DNA as template, indicating that the intervening sequence is a single contiguous exon. This region proved recalcitrant to standard PCR analysis. After trying several combinations of thermostable polymerases and denaturing agents, we obtained amplification with Vent polymerase (New England Biolabs) in 10% formamide and a correspondingly lowered annealing temperature of 55°C; a similar combination of conditions has been reported previously for amplification of problematic templates (24). Further upstream primers, Rm4x18s, Rm4x810s, and Rm4x10s, used with Rm4x4a gave products of the appropriate size with plasmid DNA (279, 139, and 72 bp, respectively) but no products with cDNA as template. Hence, there is no further exonic DNA upstream of base −27, the most 3′ base of primer Rm4x10s. This is consistent with hybridization analyses, which demonstrated an absence of exonic or intronic sequences in any genomic fragment upstream of the HindIII site 35 bp 5′ to Rm4x4.
Sequence Analysis of the Upstream Region of the m4 Gene-Sequencing of the noncoding exon and 0.6 kb of 5′-flanking sequence reveals several GC-rich domains and the absence of either a TATA or a CAAT box. There are, however, Sp-1, AP-2, and AP-3 consensus elements, an RE1 site, and two E-boxes. Notably absent are any cAMP response element or AP-1 consensus elements. An inverted repeat is present between +21 and +49: CGGAcagCCCCCACcCACCCCCcggAGGC.
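The uppercase letters in the printed repeat mark the bases that pair up, and which positions pair can be checked directly. A minimal Python sketch, assuming the hyphen in the printed sequence is a line-break artifact (removing it leaves exactly the 29 bases expected for +21 to +49); the function names are illustrative. Reading the sequence as printed, the uppercase arms match when the sequence is reversed on the same strand (a mirror-type repeat) rather than being reverse complements of each other, and the mismatched positions are exactly the lowercase spacer bases. The sketch checks sequence symmetry only, not function.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def mirror_mismatches(seq: str) -> list:
    """0-based positions i where base i differs from base n-1-i on the same strand."""
    s = seq.upper()
    n = len(s)
    return [i for i in range(n) if s[i] != s[n - 1 - i]]


# As printed for +21..+49, with the line-break hyphen removed.
REPEAT = "CGGAcagCCCCCACcCACCCCCcggAGGC"
FIRST_POSITION = 21

print(len(REPEAT))  # 29, i.e. +21..+49 inclusive
mismatches = mirror_mismatches(REPEAT)
print([FIRST_POSITION + i for i in mismatches])  # [25, 26, 27, 43, 44, 45]
print(reverse_complement(REPEAT) == REPEAT.upper())  # False
```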
Characterization of Promoter Activity-Promoter constructs containing either 1520 or 432 bp of the m4 promoter region cloned upstream of the luciferase gene of a modified pGL3 vector (Promega) were transfected into three different cell lines and assayed for luciferase expression (Fig. 7). The pGL3 −1440/+80 construct drives luciferase expression in the NG108 cell line, but no luciferase expression is seen in CHO or 3T3 cells; this construct thus appears capable of driving cell-specific expression. The pGL3 −352/+80 construct gives the same level of expression in NG108 cells as pGL3 −1440/+80 but 7- and 10-fold higher expression in CHO and 3T3 cells, respectively. This suggests that repressor elements are present between positions −1440 and −352 of the m4 promoter, removal of which leads to transcription in both expressing and non-expressing cells.

DISCUSSION
The control of receptor gene expression underlies the specificity of cell communication and is hence of paramount importance to the development and function of an organism. By far the greatest molecular diversity is provided by the GPR gene family, whose membership exceeds 1000, yet little is known of the mechanisms regulating transcription of these genes. In this study, we have focused on the structure of the m4 muscarinic receptor gene and its promoter as a step toward identifying genomic elements responsible for controlling m4 gene expression. The work began with the isolation and characterization of a cosmid clone containing the entire m4 gene and 25 kb of upstream sequence. This clone, R3-6, contains sufficient information to drive expression of the rat m4 gene when introduced into IMR32 cells, a cell line that endogenously expresses m4. This expression is cell-specific, as the m4 gene is not expressed when R3-6 is introduced into CHO cells, a cell line that does not express the m4 gene.
Our strategy of initially using the whole cosmid in these transfection studies was intended to facilitate future searches for transcriptional control elements necessary for m4 gene expression, even if they are intronic or intragenic. The latter possibility has many precedents, particularly among neural genes such as Thy-1 (29), nestin (30), GAP43 (31), and neurone-specific enolase (32), all of which require downstream sequences to generate appropriate expression patterns in transgenic mice.
A combination of nuclease protection and PCR analyses of cDNA and genomic DNA demonstrated that the m4 gene is encoded by a short 460-bp noncoding exon separated from a 2.6-kb coding exon by a 4.8-kb intron. This structure of one coding and one noncoding exon is similar to that of several other members of the GPR gene family, including the C5a (33), α1B-adrenergic (34), and D1A dopaminergic (35) receptor genes. Although the 5′-flanking region of the gene is not particularly GC-rich (55% over the proximal 600 bp), the noncoding exon has several domains of 80% GC. This is likely to be why no full-length m4 cDNA has been isolated from cDNA libraries and why primer extension failed to produce any specific products. These GC-rich regions were also problematic for PCR analysis: of the various strategies attempted, only the combination of Vent polymerase and formamide gave specific products with primer pairs spanning this region. The presence of such GC-rich regions and of inverted or direct repeats also impeded isolation of full-length α1B-adrenergic (36), D4 dopaminergic (37), and α2 nicotinic (38) cDNAs.
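A sliding-window scan is one simple way to locate GC-rich domains such as those described above. A minimal Python sketch, assuming a fixed window and an 80% threshold; the window size behind the reported 80% domains is not stated, and the toy sequence is invented for illustration rather than taken from the m4 exon. The function names are hypothetical.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(0-based start, GC fraction) for each full window along seq."""
    s = seq.upper()
    if not 0 < window <= len(s) or step <= 0:
        raise ValueError("need 0 < window <= len(seq) and step > 0")
    out = []
    for start in range(0, len(s) - window + 1, step):
        chunk = s[start:start + window]
        out.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return out


def gc_rich_starts(seq: str, window: int, step: int, threshold: float = 0.8) -> List[int]:
    """Start positions of windows whose GC fraction reaches the threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc >= threshold]


# Toy sequence only (not m4 sequence): a GC block flanked by AT repeats.
toy = "AT" * 5 + "GCGCGCGCGG" + "AT" * 5
print(gc_windows(toy, window=10, step=5))
print(gc_rich_starts(toy, window=10, step=5))  # [10]
```

A whole-region average (such as 55% over 600 bp) can hide short GC-rich stretches; a window small relative to the region is what exposes them.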
Sequencing of the proximal 600 bp of the promoter revealed an absence of any TATA or CAAT boxes; this feature is shared with the majority, but not all, of the other GPR genes that have been examined, including the α1B- (36) and β1-adrenergic (39), D1A (22,35) and D2 dopaminergic (40), 5HT1c serotoninergic (41), SSTR1 somatostatin (42), and NPY-1 (43) receptor promoters. All of these genes have tissue-specific expression patterns, and it is becoming increasingly clear that the lack of TATA and CAAT boxes can no longer be considered a hallmark of housekeeping genes. Several potential transcription factor binding sites were identified, including three Sp-1 sites, an AP-2 site, an AP-3 site, an RE1 site, and two E-boxes. An inverted RE1 silencing element is present between −574 and −550. This region shares 20 of 24 bases with the silencer element originally identified in the SCG10 (25) and type II Na+ channel (26) genes and subsequently in the dopamine β-hydroxylase (27) and synapsin I (28) genes. Deletion of 1088 bp of the promoter (Fig. 7, pGL3 −352/+80) resulted in increased reporter gene expression in CHO and 3T3 cells, suggesting that the m4 gene may be under the control of negative regulators that repress transcription in non-expressing cells. A good candidate for this regulation is the RE1 element located within this 1088-bp fragment. We are currently unable to compare directly the promoter strength of the pGL3 −352/+80 construct between NG108 and CHO or 3T3 cells, because the control SV40 promoter used in these experiments appears to be more active in CHO and 3T3 cells than in NG108 cells. No cAMP response elements were found in this region, implying that the m4 gene (or at least this construct) is not under the control of cAMP.
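Two checks underlie this paragraph: that the putative RE1 at −574 to −550 falls inside the 5′ region absent from pGL3 −352/+80 (−1440 to −353, derived from the two construct names), and that it matches the SCG10 silencer at 20 of 24 positions. A minimal Python sketch of both, assuming an ungapped alignment for the identity count; the two 24-mers are placeholders, not the real m4 or SCG10 sequences, which are not given in this section, and the function names are hypothetical.

```python
from typing import Tuple


def ungapped_identity(a: str, b: str) -> Tuple[int, int]:
    """(identical positions, alignment length) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches, len(a)


def contains(outer: Tuple[int, int], inner: Tuple[int, int]) -> bool:
    """True if interval inner = (start, end) lies within outer (both inclusive)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


RE1_M4 = (-574, -550)      # coordinates given in the text
DELETED = (-1440, -353)    # present in pGL3 -1440/+80, absent from pGL3 -352/+80
print(contains(DELETED, RE1_M4))  # True

# Placeholder 24-mers only, NOT the m4 or SCG10 sequences.
a = "ACGTACGTACGTACGTACGTACGT"
b = "TCGTACCTACGTTCGTACCTACGT"
print(ungapped_identity(a, b))  # (20, 24)
```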
Agonist-induced down-regulation of muscarinic receptors has been shown to be mimicked by activation of protein kinase C (44); furthermore, agonist activation of muscarinic receptors induces changes in muscarinic receptor mRNA levels in chick heart (45) and neuroblastoma cells (18), but it is not known whether this regulation is mediated at the level of transcription. If it were, the AP-2 site might be involved in mediating such actions. The presence of two E-boxes in the proximal promoter raises the possibility that transcription of the m4 gene is influenced by basic helix-loop-helix transcriptional activators. The inverted repeat between +21 and +49 may play a role in regulating transcription, as such sequences have been shown to be important in directing specific expression of other genes, including the neurofilament heavy chain promoter (46). Whether any of these motifs plays a functional role in regulating transcriptional activity, however, awaits determination by deletion and DNA footprint analysis.
Few other GPR promoters have been assessed functionally; those that have been examined include the D1A (22) and D2 dopamine (40), LH (47), and C5a (33) receptor promoters. Promoter strength of these constructs ranges from 2-fold over basal for C5a (33) to 27-fold over basal for D1A (22). These variations in promoter strength do not appear to be simple reflections of endogenous mRNA levels, so this wide variation in apparent promoter strength may result from the absence of some of the elements required for regulated expression of each gene. There are numerous cases of short promoter constructs that generate apparent cell specificity in transient transfection assays but subsequently fail to recapitulate appropriate spatiotemporal patterns of expression in transgenic mice, such as the nicotinic α-receptor subunit gene, where 110 bp of proximal promoter drives myotube-specific expression in vitro (47) but fails to generate any reporter gene expression in transgenic mice (48). Conversely, some constructs fail to demonstrate cell specificity in transfections yet generate tissue- and stage-specific expression in transgenic mice, such as the neurofilament light gene, where a 1.6-kb promoter fragment drives expression in both neuronal and non-neuronal cells in vitro but apparently drives cell-specific expression in transgenic mice (49). Although the transient transfection assays reported in the present study suggest that pGL3 −1440/+80 has elements required for cell-specific expression, rigorous testing of this hypothesis can only be carried out using a transgenic animal model.

FIG. 7. Constructs used in transfection assays. All data are normalized to β-galactosidase expression driven by cotransfected pCMVβ (Clontech). Numbers represent the percentage of expression relative to pGL3 control when transfected into NG108-15, CHO, or 3T3 cells. Data represent the mean of at least two independent experiments, each performed in triplicate. Values varied by less than 20%.
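The normalization described in the Fig. 7 legend can be sketched as follows: each luciferase reading is divided by the β-galactosidase activity from the cotransfected pCMVβ in the same extract, the ratios are averaged over replicates, and the mean is expressed as a percentage of the pGL3 control. This is a minimal Python sketch under stated assumptions: the readings are invented for illustration, only one experiment is shown, and using the coefficient of variation to express "varied by less than 20%" is an assumption, since the legend does not say which spread statistic was used.

```python
from statistics import mean, stdev
from typing import List, Sequence


def normalize(luc: Sequence[float], bgal: Sequence[float]) -> List[float]:
    """Luciferase / beta-galactosidase ratio for each well."""
    if len(luc) != len(bgal):
        raise ValueError("need one beta-gal reading per luciferase reading")
    return [l / b for l, b in zip(luc, bgal)]


def percent_of_control(sample: Sequence[float], control: Sequence[float]) -> float:
    """Mean normalized activity of a construct as a percentage of the control plasmid."""
    return 100.0 * mean(sample) / mean(control)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical triplicate readings (arbitrary units) for one cell line.
control = normalize([1000, 1100, 900], [1.0, 1.0, 1.0])    # pGL3 control
construct = normalize([400, 440, 360], [2.0, 2.0, 2.0])    # a promoter construct

pct = percent_of_control(construct, control)
cv = coefficient_of_variation(construct)
print(f"{pct:.1f}% of pGL3 control, CV {cv:.0%}")  # 20.0% of pGL3 control, CV 10%
```

Normalizing each well before averaging corrects for well-to-well differences in transfection efficiency, which the text reports ranged from 5 to 50%.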