Structure and Function of Human Prepro-orexin Gene*

Orexin-A and -B are recently identified potent orexigenic peptides that are derived from the same precursor peptide and are highly specifically localized in neurons located in the lateral hypothalamic area, a region classically implicated in feeding behavior. We cloned the whole length of the human prepro-orexin gene and corresponding cDNA. The human prepro-orexin mRNA was predicted to encode a 131-residue precursor peptide (prepro-orexin). The human prepro-orexin gene consists of two exons and one intron distributed over 1432 base pairs. The 143-base pair first exon includes the 5′-untranslated region and a small part of the coding region that encodes the first seven residues of the secretory signal sequence. The second exon contains the remaining portion of the open reading frame and the 3′-untranslated region. A 3.2-kilobase pair fragment of the 5′-upstream region of the cloned human prepro-orexin gene is sufficient to direct the expression of the Escherichia coli β-galactosidase (lacZ) gene in transgenic mice to neurons in the lateral hypothalamic area and adjacent regions. The lacZ-positive neurons were positively stained with anti-orexin antibody but not with anti-melanin-concentrating hormone antibody. These findings suggest that this genomic fragment contains all the necessary elements for appropriate expression of the gene and will be useful for the targeted expression of exogenous genes in orexin-containing neurons. These mice might also be useful for examining the molecular mechanisms by which orexin gene expression is regulated.

Orexins (orexin-A and -B) are neuropeptides that were identified as endogenous ligands for an orphan G-protein-coupled receptor, which was originally found as an expressed sequence tag from human brain (1). Orexin-A and -B are derived from the same precursor peptide (prepro-orexin) by proteolytic processing. They bind and activate two closely related G-protein-coupled receptors, termed OX1 and OX2 receptors. The OX1 receptor is selective for orexin-A, whereas the OX2 receptor is a nonselective receptor for both orexin-A and -B. Prepro-orexin mRNA and immunoreactive orexin-A are highly specifically localized in neurons within and around the lateral hypothalamic area (LHA) in the adult rat brain, a region implicated in feeding behavior (2-4). Orexin-containing neurons diffusely innervate the entire brain, including monosynaptic projections to various regions of the cerebral cortex, limbic system, and brain stem (5,6). Orexins stimulate food consumption when administered intracerebroventricularly (1). Orexin gene expression is upregulated upon fasting, suggesting the existence of molecular mechanisms that control orexin gene expression according to the nutritional status of the animal (1).
Expression of the orexin gene is highly restricted to neurons located in the LHA, indicating the existence of molecular mechanisms by which orexin gene transcription is highly specifically performed by the distinct population of neurons in these areas (1,5,7). Radiation hybrid mapping showed that the human prepro-orexin gene is located at human chromosome 17q21 (1). We have already reported that human prepro-orexin mRNA is also detected exclusively in the hypothalamus/subthalamic regions (1). The mechanisms by which orexin gene expression is highly restricted to the distinct populations of neurons in these regions are of interest.
As the first step toward unveiling these mechanisms, we cloned fragments of the human prepro-orexin gene and its corresponding cDNA to determine their complete primary structures. One way to study the physiological roles of the orexin neuronal system would be to examine the consequences of expression of exogenous genes in orexin-producing neurons of transgenic mice, thereby manipulating the cellular environment in vivo. However, such studies require the use of an appropriate gene promoter to direct gene expression to orexin-producing neurons. The human prepro-orexin gene promoter is a good candidate for targeting gene expression to orexin-producing neurons.
From these points of view, we made a prepro-orexin-lacZ fusion gene and tested it in transgenic mice to identify a DNA fragment containing all the necessary elements for appropriate orexin expression. This approach would also be useful to examine the mechanism by which orexin gene expression is highly restricted to the LHA and adjacent regions.

EXPERIMENTAL PROCEDURES
Cloning of Human Prepro-orexin Gene-Because we found that the full-length rat orexin cDNA, which contains CTG triplet repeats (encoding the oligo-leucine stretch in the signal sequence), tends to cross-hybridize with a number of unrelated genes, we used a 0.29-kb segment of rat cDNA encoding Gln33-Ser128 of prepro-orexin as a probe (1). Approximately one million clones from a human genomic library (CLONTECH) were screened by plaque hybridization with this cDNA probe. We isolated several clones, and one of the longest clones,

* This work was supported in part by grants from the University of Tsukuba Project Research and by the Ministry of Education, Science, and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF118885.
Genomic Southern Blot Analysis-A membrane that contains human genomic DNA digested with designated restriction enzymes was purchased from CLONTECH (Human GENO-BLOT; catalog number 7700-1). The 544-bp PstI fragment of the human prepro-orexin gene was labeled by the random priming method with [α-32P]dCTP to a specific activity of 5 × 10^8 cpm/mg and used as a probe. The membrane was incubated at 65°C in a solution containing 1 M NaCl, 1% SDS, 150 mg/ml salmon sperm DNA, and 5 ng/ml probe. The membrane was finally washed in a solution containing 0.1× SSC (1× SSC = 0.15 M NaCl and 0.015 M sodium citrate) and 0.1% SDS at 50°C and subjected to autoradiography at −80°C for 48 h.
Primer Extension Analysis-An infrared dye (IRD41)-labeled primer, 5′-GTAGCCGGGAAAGGAGATGTCTGTGGTGG-3′, which is complementary to positions 72 to 100 of the human prepro-orexin gene, was hybridized to 1 μg of human whole brain poly(A+) RNA (purchased from CLONTECH) in a solution containing 80% formamide, 400 mM NaCl, 10 mM EDTA, and 40 mM PIPES (pH 6.4) at 30°C. The hybridized RNA/primer was precipitated with ethanol and then subjected to reverse transcription with avian myeloblastosis virus reverse transcriptase. The same primer was also used for the dideoxy sequencing reaction with Thermo Sequenase (Amersham Pharmacia Biotech) using ghLig72-7 DNA as a template. The samples were analyzed with a 4000L DNA sequencer (LI-COR).
Cloning of Human Prepro-orexin cDNA-Human cDNA encoding prepro-orexin was cloned by the 3′-rapid amplification of cDNA ends method using human hypothalamus Marathon-ready cDNA (CLONTECH; catalog number 7429-1) as a template following the manufacturer's instructions. We made an oligonucleotide, 5′-ACATCTCCTTTCCCGGCTACCCCAC-3′, which corresponds to nucleotides 81-105 of the human prepro-orexin gene (5′-untranslated region of human prepro-orexin mRNA), and used it as a specific primer. We performed the polymerase chain reaction using this primer and an adaptor primer (AP-1 primer; CLONTECH) under the following conditions: 30 cycles at 94°C for 1 min, 60°C for 1 min, and 70°C for 1 min. A discrete 0.6-kb product containing the correct cDNA sequence was obtained and subcloned into the pCR2.1 vector (Invitrogen). The cDNA sequence was determined by sequencing the products obtained from three independent 3′-rapid amplification of cDNA ends reactions with a DNA sequencer (LI-COR IR4200) using the T7 primer and the M13 reverse primer.
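The two primers described above can be cross-checked against each other. The following minimal sketch uses only the primer sequences and gene coordinates quoted in the text: it reverse-complements the primer extension primer (complementary to positions 72-100) to recover the sense strand and compares it with the 3′-RACE primer (positions 81-105) over the shared positions 81-100. The function and variable names are illustrative and not part of the original work, and the hyphen inside the printed RACE primer is assumed to be a line-break artifact.

```python
# Primer sequences as quoted in the text, written 5'->3'.
EXT_PRIMER = "GTAGCCGGGAAAGGAGATGTCTGTGGTGG"  # antisense, gene positions 72-100
RACE_PRIMER = "ACATCTCCTTTCCCGGCTACCCCAC"     # sense, gene positions 81-105

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_window(seq: str, seq_start: int, start: int, end: int) -> str:
    """Slice gene positions start..end (1-based, inclusive) out of a
    sense-strand sequence whose first base sits at gene position seq_start."""
    return seq[start - seq_start : end - seq_start + 1]


# Sense-strand sequence of positions 72-100, recovered from the antisense primer.
sense_72_100 = reverse_complement(EXT_PRIMER)

# Both primers cover positions 81-100, so the two slices should be identical.
overlap_from_ext = sense_window(sense_72_100, 72, 81, 100)
overlap_from_race = sense_window(RACE_PRIMER, 81, 81, 100)
primers_agree = overlap_from_ext == overlap_from_race

print(len(EXT_PRIMER), len(RACE_PRIMER), primers_agree)
```

The primer lengths (29 and 25 nucleotides) match the stated coordinate ranges, and the two primers agree over their 20-nucleotide overlap.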
Transgenic Mice with Human Prepro-orexin-nlacZ Fusion Gene-We performed the polymerase chain reaction using the cloned human prepro-orexin gene as a template with two oligonucleotide primers, 5′-GCAGCGGCCATTCCTTGG-3′ and 5′-AAGTCGACGGTGTCTGGCGCTCAGGGTG-3′. The product corresponded to positions −118 to 122 of the human prepro-orexin gene shown in Fig. 2 and had an artificial SalI site at the 3′-end. We digested this fragment with PstI and SalI and ligated it to the 1.6-kb PstI fragment of the cloned prepro-orexin gene (from −1.7 kb to 72 bp shown in Fig. 2). This DNA fragment was then ligated to the BamHI-PstI (−3.15 to −1.7 kb) fragment of the gene. The resulting DNA fragment, which has a 3.15-kb 5′-flanking region and the whole length of the 5′-noncoding region, was used as the promoter that directs expression of the cloned Escherichia coli lacZ gene with an inserted SV40 T antigen nuclear localization signal (nlacZ), derived from pnlacF (8). The 3′-end of nlacZ was ligated to a murine protamine-1 (mPrm1) gene fragment (from +95 relative to the transcription start site to +625), which includes part of exon 1 and all of intron 1 and exon 2, including the polyadenylation signal and site.
The resulting gene fragment, free of vector sequence, was isolated and injected into fertilized mouse eggs to generate transgenic founder animals. The presence and copy number of the transgene were determined by tail blot (9). Animals from these lines were examined by the β-galactosidase histochemical technique to assess transgene expression in the tissues.
LacZ Histochemical and Immunohistochemical Staining-The mice were perfused via the heart with phosphate-buffered saline followed by 4% paraformaldehyde in 0.1 M phosphate buffer (pH 7.3). Tissue fragments were further fixed for 60 min at 4°C in the same fixative buffer. They were then rinsed three times with a solution containing 0.1 M phosphate buffer (pH 7.3), 2 mM MgCl2, 0.01% sodium deoxycholate, and 0.02% Nonidet P-40. The staining reaction was then performed by incubating the tissue fragments for 16-24 h at 37°C in a solution containing 0.1 M phosphate buffer (pH 7.5), 2 mM MgCl2, 0.01% sodium deoxycholate, 0.02% Nonidet P-40, 1 mg/ml X-gal, 5 mM K3Fe(CN)6, and 5 mM K4Fe(CN)6. Postfixation was performed for 24-48 h in 10% formalin. Tissues were dehydrated and embedded in paraffin for sectioning.
For immunohistochemical staining, coronal sections of brain (40 μm) were incubated for 35 min in 0.6% hydrogen peroxide to eliminate endogenous peroxidase activity. Sections were rinsed in phosphate buffer and incubated for 30 min in Tris-buffered saline containing 3% normal goat serum and 0.25% Triton X-100. Thereafter, sections were incubated with rabbit polyclonal anti-orexin antibody (1) or anti-melanin-concentrating hormone (MCH) antibody diluted 1/2000 in Tris-buffered saline containing 1% normal goat serum and 0.25% Triton X-100 overnight at 4°C. The primary antibody was localized with the avidin-biotin system (Vector Laboratories). Bound peroxidase was visualized by incubating sections with 0.01 M imidazole acetate buffer containing 0.05% 3,3′-diaminobenzidine tetrahydrochloride and 0.005% hydrogen peroxide.

Structure and Sequence of Human Prepro-orexin Gene and Its Transcript-By screening a human genomic library (constructed in EMBL3 SP6/T7 vector) with a rat prepro-orexin cDNA probe, we cloned a recombinant phage clone, ghLig72-7, which has an approximately 18-kb insert that contains the whole length of the human prepro-orexin gene. We subcloned the 4.9-kb BamHI fragment of the ghLig72-7 insert into the pBluescript SKII(+) vector (Stratagene), which we termed pghLig72Bam, and subjected it to further analyses. The structural organization of the human prepro-orexin gene and mRNA is shown in Fig. 1. We also cloned a cDNA for human prepro-orexin by 3′-rapid amplification of cDNA ends to determine the sequence of prepro-orexin mRNA.
To locate the transcription initiation sites, we performed primer extension analysis using poly(A+) RNA from whole human brain or control yeast tRNA. Primer extension analysis with a 29-mer oligonucleotide primer complementary to the 5′-untranslated region (positions 23 to 51 upstream from the putative translation initiation site) revealed that the main transcription initiation site is at nucleotide 1 in Fig. 2. A potential site for the promoter region was predicted to start at position −291 by the BCM Gene Finder web site. This program also ignores the TATAAA sequence in position 5-10.
The restriction enzyme map and the structural organization of the human prepro-orexin gene are schematically shown in Fig. 1. The complete nucleotide sequence of the structural gene and the 5′- and 3′-flanking regions determined from the results of primer extension analysis and the cDNA sequence are shown in Fig. 2. The pghLig72Bam insert contains a 3149-bp 5′-flanking region, a 1432-bp structural gene, and a 364-bp 3′-flanking region. The structural gene consists of two exons (143 and 473 bp) and one intron (816 bp). Southern blots of human genomic DNA were probed with the 544-bp PstI fragment of ghLig72-7, which contains a part of intron 1 and exon 2. Single EcoRI (>23 kb), HindIII (7.0 kb), BamHI (4.9 kb), and PstI (0.55 kb) fragments of the genomic DNA hybridized with the probe, indicating that the cloned DNA is an authentic copy of the genomic DNA (Fig. 3).
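The segment lengths reported above are mutually consistent, which can be confirmed with simple arithmetic. The short sketch below uses only the lengths quoted in the text; the variable names are illustrative.

```python
# Segment lengths in base pairs, as reported in the text.
EXON_1, INTRON_1, EXON_2 = 143, 816, 473
FLANK_5, FLANK_3 = 3149, 364

# Exons plus intron should reproduce the 1432-bp structural gene.
structural_gene = EXON_1 + INTRON_1 + EXON_2

# The two exons alone give the transcript length without the poly(A) tail.
mrna_length = EXON_1 + EXON_2

# Flanking regions plus the structural gene give the size of the subcloned insert,
# which should be close to the 4.9-kb BamHI fragment.
insert_length = FLANK_5 + structural_gene + FLANK_3

print(structural_gene, mrna_length, insert_length)
```

The sums come to 1432 bp for the structural gene, 616 nucleotides for the spliced transcript, and 4945 bp for the insert, in agreement with the 4.9-kb BamHI fragment.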
A BLAST (blastn) search of the GenBank data bases with the sequence presented in Fig. 2 failed to find a significantly similar sequence, except for several highly repetitive elements of the primate Alu family (nucleotides −2414 to −2143, −1916 to −1640, −1627 to −1343, −1292 to −1004, and −885 to −613) (searched with the CENSOR Web Server). These regions show 64-85% nucleotide identity with the consensus sequence of the human Alu repeat.
Comparison of the gene and cDNA sequences, together with the results from the primer extension analysis (Fig. 4), suggests that human prepro-orexin mRNA, which is 616 nucleotides long excluding the poly(A+) tail, is encoded by two exons distributed over 1432 base pairs of the human genome (Figs. 1 and 2). The 5′-most ATG codon of the cDNA (nucleotides 123-125) was preceded by an in-frame stop codon (TGA; nucleotides 108-110), and the sequence around this initiation codon conformed well to Kozak's rules (11). The open reading frame starting with this ATG encodes a 131-residue polypeptide, human prepro-orexin (Fig. 2). The 5′-untranslated region and the first 7 residues of the secretory signal sequence correspond to the 143-bp first exon. The 473-bp second exon contains the remaining portion of the open reading frame and the 102-bp 3′-untranslated region. Thus, the remaining portion of the signal sequence (residues 8-33) and pro-orexin are encoded in exon 2 (Fig. 2).
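The coordinates given in this paragraph can be checked for internal consistency. The sketch below takes the ATG position, the upstream stop codon position, and the exon 1 length from the text and derives the length of the 5′-untranslated region, the number of codons carried by exon 1, and whether the upstream TGA shares the reading frame of the ATG. It assumes that position 1 is the transcription start site, as stated for Fig. 2; the variable names are illustrative.

```python
# Coordinates quoted in the text (1-based, relative to the transcription start).
ATG_START = 123        # first base of the initiation codon
STOP_START = 108       # first base of the upstream TGA
EXON_1_LENGTH = 143    # bp
SIGNAL_PEPTIDE_RESIDUES = 33

# Everything upstream of the ATG within the transcript is 5'-untranslated.
utr5_length = ATG_START - 1

# Coding nucleotides in exon 1, converted to whole codons.
exon1_coding_nt = EXON_1_LENGTH - utr5_length
exon1_codons, remainder = divmod(exon1_coding_nt, 3)

# The upstream stop codon is in frame if its offset from the ATG is a multiple of 3.
stop_in_frame = (ATG_START - STOP_START) % 3 == 0

# Signal-sequence residues left for exon 2 (residues 8-33).
signal_residues_exon2 = SIGNAL_PEPTIDE_RESIDUES - exon1_codons

print(utr5_length, exon1_codons, remainder, stop_in_frame, signal_residues_exon2)
```

Under these assumptions the 5′-untranslated region is 122 nucleotides, exon 1 carries exactly 7 codons with no leftover bases (so the exon boundary would fall between codons 7 and 8), and the upstream TGA is in frame with the ATG.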
Structure and Sequence of Human Prepro-orexin-The first 33 amino acids of human prepro-orexin exhibited characteristics of a secretory signal sequence: a hydrophobic core followed by residues with small polar side chains (12). The SignalP Server web site predicted that Ala33-Gln34 was the most likely site for signal sequence cleavage. The orexin-A sequence starts with Gln34, which is presumably cyclized enzymatically into the N-terminal pyroglutamyl residue by transamidation (13,14). Thus, the mature peptide directly follows the signal peptide cleavage site. The last residue of the mature peptide is followed by Gly67, which presumably serves as an NH2 donor for C-terminal amidation by the sequential actions of peptidylglycine monooxygenase and peptidylamidoglycolate lyase (15,16). As expected, Gly67 is followed by a pair of basic amino acid residues, Lys68-Arg69, which constitutes a recognition site for prohormone convertases (17). The last residue of orexin-B, Met96, is again followed by Gly-Arg-Arg. These observations suggest that human orexin-A and -B are also C-terminally amidated like their counterparts in the rodent. The predicted human orexin-A sequence was identical to rodent/bovine orexin-A. Human orexin-B had two amino acid substitutions compared with the rodent sequence. Overall, the human prepro-orexin sequence was 83% identical to its rat counterpart (1). The majority of amino acid substitutions were found in the C-terminal part of the precursor, which appears unlikely to encode another bioactive peptide.
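The residue numbers given for the orexin-A processing sites imply the length of the mature peptide. The sketch below derives it from the positions quoted in the text (signal cleavage after residue 33, Gly67 as the amidation donor, Lys68-Arg69 as the dibasic site); the variable names are illustrative, and orexin-B is left out because its start position is not stated here.

```python
# Residue positions in human prepro-orexin, as quoted in the text.
SIGNAL_CLEAVAGE_AFTER = 33   # cleavage between Ala33 and Gln34
OREXIN_A_START = 34          # Gln34, first residue of orexin-A
AMIDATION_GLY = 67           # Gly67, directly follows the last orexin-A residue
DIBASIC_SITE = (68, 69)      # Lys68-Arg69

# The signal peptide spans residues 1..33.
signal_peptide_length = SIGNAL_CLEAVAGE_AFTER

# Orexin-A runs from Gln34 up to the residue just before Gly67.
orexin_a_end = AMIDATION_GLY - 1
orexin_a_length = orexin_a_end - OREXIN_A_START + 1

# The mature peptide starts immediately after the signal peptide, and the
# dibasic site immediately follows the amidation donor.
starts_after_signal = OREXIN_A_START == SIGNAL_CLEAVAGE_AFTER + 1
dibasic_follows_gly = DIBASIC_SITE[0] == AMIDATION_GLY + 1

print(signal_peptide_length, orexin_a_end, orexin_a_length)
```

With these positions, orexin-A spans residues 34-66, that is, 33 residues flanked directly by the signal peptide on one side and the Gly-Lys-Arg processing motif on the other.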
Expression of β-Galactosidase Gene in Transgenic Mice-A fragment of the human prepro-orexin gene, which contains a 3.15-kb 5′-flanking region and the whole length of the 5′-noncoding region of exon 1, was fused to the modified E. coli lacZ gene, which has an SV40 T antigen nuclear localization signal (nlacZ) (8) (Fig. 1). We generated transgenic mice using this construct as a transgene. A total of six lines bearing the transgene were examined, and lacZ expression could be detected in three of these (Table I).
The lacZ staining showed that the transgene was highly specifically expressed in the LHA/subthalamic region in two lines (Table I), suggesting that this fragment contains all the necessary elements for appropriate expression in these regions (Fig. 5). We observed that the neurons expressing the transgene were only a subset of the neurons expected to stain. Generally, only 30-50% of neurons containing immunoreactive orexin were stained by the β-galactosidase histochemical technique. All of the lacZ-positive neurons in the LHA contained immunoreactive orexins (Fig. 5).
Line D5 showed ectopic expression of lacZ in several regions that do not express orexins, including the arcuate nucleus, periventricular nucleus, and preoptic nuclei (Table I). This ectopic expression was only observed in line D5 and might be because of a positional effect. In lines A3 and J2, we observed only the eutopic expression in the LHA and no ectopic expression of the transgene (Table I).
We also examined whether MCH-containing neurons, which are also known to be exclusively located in the LHA and adjacent regions, express the transgene. As shown in Fig. 5, MCH-positive neurons did not show lacZ activity, suggesting that this promoter does not direct expression to the MCH neurons. We could not observe any positive lacZ staining in tissues outside the brain.

DISCUSSION
Recent studies have identified several neuropeptide and receptor systems in the hypothalamus that are critical in the regulation of body weight (18). The LHA has long been considered essential in regulating food intake and body weight, because cell-specific lesions of this region can result in decreased food intake and body weight (2), and this region contains glucose-sensing neurons (4).
We recently identified a family of neuropeptides, orexins, which are localized exclusively in neurons in the LHA. Orexins increase food intake when administered intracerebroventricularly. The neurons containing orexins (orexin neurons) diffusely innervate the entire brain, including monosynaptic projections to the cerebral cortex, limbic system, and brain stem (5-7). Therefore, orexin neurons may be ideally positioned to regulate cognitive, motivational, emotional, and autonomic aspects of food intake and body weight regulation.
Orexin neurons have been shown to be highly specifically localized within and around the LHA in rodents and humans (1, 5-7). These observations suggest the existence of molecular mechanisms by which orexin expression is highly restricted to distinct populations of neurons in these regions. To examine these mechanisms, we first determined the structure of the human prepro-orexin gene and its transcript.
The human prepro-orexin mRNA was predicted to encode a 131-residue precursor peptide. The human prepro-orexin gene consists of two exons and one intron distributed over 1432 bp (Figs. 1 and 2). The 143-bp exon 1 includes the 5′-untranslated region and the coding region that encodes the first 7 residues of the secretory signal sequence. Exon 2 contains the remaining portion of the open reading frame and the 3′-untranslated region. The predicted human orexin-A sequence was identical to rodent/bovine/porcine orexin-A. Human orexin-B had two amino acid substitutions compared with the rodent sequence and one amino acid substitution compared with the porcine sequence (Fig. 2). Considering the difference in species, the structures of both orexins are strikingly conserved. This suggests that orexins have important physiological roles.
To evaluate the function of the cloned human prepro-orexin gene fragment, we generated transgenic mice using a human prepro-orexin-nlacZ fusion gene. This construct uses the human prepro-orexin gene fragment, which contains the 3,149-bp 5′-flanking region and the 122-bp 5′-noncoding region of exon 1, to drive the nlacZ reporter gene (Fig. 1). We found that this gene fragment is sufficient to direct expression of the reporter gene in the LHA and adjacent regions (Table I, Fig. 5). This finding suggests that this genomic fragment contains all the necessary elements for appropriate expression of the gene. However, we observed that the neurons expressing the transgene were only a subset of the orexin-containing neurons (Fig. 5). Generally, only 30-50% of neurons containing immunoreactive orexins were stained by β-galactosidase histochemistry. This phenomenon has also been described for hsp68-lacZ and dopamine β-hydroxylase-lacZ transgenes and is referred to as incomplete penetrance (8,19).
On the other hand, neurons containing MCH, which are also known to be localized exclusively in the LHA, did not express lacZ in the transgenic mice (Fig. 5). This observation suggests that this promoter specifically directs expression to orexin neurons and that orexin neurons and MCH neurons are distinct neuronal populations.
Transgenic line D5 showed ectopic expression in several regions, which may be because of a positional effect (Table I). Alternatively, there may be other necessary elements to ensure the proper gene expression in other genomic regions. In this case, ectopic expression would be eliminated by the inclusion of additional DNA for the endogenous gene, leading to valuable insights into the regulatory mechanism of the endogenous gene.
In any case, the 3.2-kb human prepro-orexin promoter we used in this study was sufficient to direct expression of the exogenous gene in orexin-producing neurons in transgenic mice. Therefore, this promoter might be useful to examine the consequences of the expression of exogenous molecules in orexin neurons of transgenic mice, thereby manipulating the cellular environment in vivo. This genomic fragment will also be useful for the targeted ablation of orexin neurons by using it as a promoter that drives toxin expression (20).
Prepro-orexin mRNA was shown to be up-regulated under fasting conditions, indicating that these neurons somehow sense the nutritional status of the animal. We have found that orexin gene expression is influenced by plasma glucose and leptin levels. Human prepro-orexin nlacZ transgenic mice will also be useful to examine the molecular mechanisms by which orexin gene expression is regulated.