Three vha Genes Encode Proteolipids of Caenorhabditis elegans Vacuolar-type ATPase
GENE STRUCTURES AND PREFERENTIAL EXPRESSION IN AN H-SHAPED EXCRETORY CELL AND RECTAL CELLS*

The proteolipids of the vacuolar-type H+-ATPase (V-ATPase) are major components of the integral membrane sector. The vha-1 and vha-2 (vacuolar-type H+-ATPase) genes in Caenorhabditis elegans encode putative 16-kDa proteolipids and are tandemly located on chromosome III. The vha-2 gene has three exons, whereas vha-1 has no introns. The deduced amino acid sequences of the two genes exhibit about 60% identity with the homologues from yeast, mouse, and cow. The mRNAs of both vha genes are trans-spliced to spliced leaders, suggesting that these genes constitute a polycistronic transcriptional unit. The vha-4 gene consists of four exons and is very similar to the yeast VMA16 gene that codes for the 23-kDa proteolipid. This is the first example of three distinct V-ATPase proteolipids being identified in higher eukaryotes. Northern blot and transgenic analyses show that the three vha genes may be highly expressed in the H-shaped excretory cell, rectum, and a pair of cells posterior to the anus. These results suggest that the V-ATPase activity may be important for exporting toxic compounds or metabolic wastes in this organism.

formed from the 16-kDa proteolipid and other subunits (3,10). The proteolipid is a major component of Vo, and chemical modification with N,N′-dicyclohexylcarbodiimide and directed mutagenesis experiments have identified a glutamate residue essential for proton transport (2,3,11,12). cDNA clones encoding the proteolipids have been obtained from a number of species (13-17). In Saccharomyces cerevisiae, two genes (VMA3 and VMA11) encode 16-kDa proteolipids, and both gene products are essential for a functional V-ATPase (18-20). Furthermore, a 23-kDa proteolipid encoded by VMA16 was also found to be required for function (12). Up to now, multiple proteolipid isoforms have not been found in higher eukaryotes.
From the contiguous sequence of Caenorhabditis elegans chromosome III, a putative proteolipid gene has been identified (21), but the open reading frame of 302 amino acid residues is much longer than expected based on homologous proteins from other organisms. We have reassessed the C. elegans genomic sequence and surmised that the chromosome III locus actually contains two 16-kDa proteolipid genes. In this study, we demonstrate the existence of the two genes, named vha-1 and vha-2, which are tandemly located as a polycistronic unit. The vha-1 and vha-2 gene products are homologous to those of other organisms. In addition, a third proteolipid gene, vha-4, was identified on chromosome II, and its 23-kDa proteolipid product shares a high degree of homology with the yeast VMA16 protein. The promoters of the three vha genes, together with that of the V1 sector B subunit, are predominantly active in an H-shaped excretory cell of the adult worm.

EXPERIMENTAL PROCEDURES
General Maintenance of Worm Strains-Wild-type Bristol N2 was cultured and maintained as described (22). Animals were transformed using the selectable marker plasmid, pRF4 (23).
Sequencing cDNA Clones-yk185d8, yk100g12, and yk167f7 (lambda ZAP II cDNA clones of vha-1, vha-2, and vha-4, respectively) were kindly provided by Y. Kohara and were converted to recombinant plasmids using the Rapid Excision kit (Stratagene). The resulting plasmid pCVA-1 carried the cDNA from vha-1, pCV10 carried the cDNA from vha-2, and pCVC-1 carried the cDNA from vha-4. Nucleotide sequences were determined using the Dye Terminator DNA sequencing kit (Applied Biosystems). The nucleotide sequence data reported in this paper will appear in the DDBJ, EMBL, and GenBank nucleotide sequence data bases with the following accession numbers: vha-1, AB000917; vha-2, AB000918; and vha-4, AB000919.
Amplification of the 5′ Ends of the vha Transcripts-Total RNA from C. elegans was prepared from a liquid culture of mixed growth stages using TRIzol LS reagent (Life Technologies, Inc.). After first-strand cDNA was synthesized with SuperScript II reverse transcriptase (Life Technologies, Inc.), PCR was performed with the following cycle conditions: 30 s at 94°C, 30 s at 61°C, and 2 min at 68°C for 30 cycles with SL primers (SL1 and SL2, equivalent to the C. elegans spliced leader sequences) plus gene-specific primers (vha-1, 5′-cttcactgatgatatcggcg-3′; vha-2, 5′-cggaaatccgttaagacttgg-3′; or vha-4, 5′-gatcccgttgtgaagattcc-3′).
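As a quick sanity check on the gene-specific primers listed above, the following minimal sketch computes each primer's length, G+C count, a rough melting temperature, and reverse complement. The Wallace rule used here (2°C per A/T, 4°C per G/C) is only a crude approximation chosen for illustration and is not stated to be the method used in this study; the function names are hypothetical.

```python
# Gene-specific primers as printed in the text (5'->3').
PRIMERS = {
    "vha-1": "cttcactgatgatatcggcg",
    "vha-2": "cggaaatccgttaagacttgg",
    "vha-4": "gatcccgttgtgaagattcc",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    s = seq.lower()
    return s.count("g") + s.count("c")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is used only as a back-of-the-envelope
    estimate; it ignores salt and primer concentration.
    """
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA sequence."""
    comp = {"a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(comp[base] for base in reversed(seq.lower()))


for gene, primer in PRIMERS.items():
    print(gene, len(primer), gc_count(primer), wallace_tm(primer),
          reverse_complement(primer))
```

Under this approximation the three primers come out at 60-62°C, which is at least in the neighborhood of the 61°C annealing step quoted in the cycle conditions.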
Construction of the GFP Reporter Plasmids-To construct the translational GFP fusion genes, vha-1::GFP, vha-2::GFP, and vha-4::GFP, genomic fragments that included the upstream region and sequences encoding the first two amino acids of each vha gene were subcloned in-frame into the GFP (S65C mutation) reporter vectors 2 (24,25). The 6.2-kb BamHI fragment, which includes all of the vha-1 and vha-2 genes, was taken from cosmid R10E11 and ligated into pBluescript II to make subclone pCV-F. To create the vha-1::GFP fusion plasmid pCV01, the 1.3-kb BamHI to SalI fragment of pCV-F was inserted into pPD95.67. To create the vha-2::GFP fusion plasmid pCV012, the 2.2-kb BamHI to SplI fragment was ligated into pPD95.70. In the case of the vha-4::GFP fusion, the 5.2-kb BamHI to EagI fragment of cosmid T01H3, which includes a part of the vha-4 gene, was inserted into pPD95.67 to create pCVC01. To make the GFP fusion with the putative V-ATPase B subunit gene F20B6, the 3.2-kb HincII fragment, which includes the upstream and coding sequences for the first four amino acid residues, was taken from genomic clone F20B6 and ligated into pPD95.70.
Northern Blot Analysis-Total RNA was electrophoresed on a 1.5% agarose, 6% formaldehyde gel and transferred to a Hybond-N+ membrane (Amersham Life Science, Inc.). Probes were digested from cDNA clones (vha-1, bp +6 to +281; vha-2, bp +5 to +283; or vha-4, bp +21 to +359 (numbering from the first letter of the initiation codon)) and 32P-labeled using the Random Primed DNA labeling kit (Boehringer Mannheim). Hybridizations were carried out using the QuikHyb solution (Stratagene).

RESULTS
Presence of Two Genes for the Proteolipids of the C. elegans V-ATPase-Analysis of the contiguous sequence of chromosome III led to the identification of a gene in genomic clone R10E11 that was believed to encode a V-ATPase proteolipid-like protein (21). Interestingly, the open reading frame encoded a protein of 302 amino acids that is much longer than known V-ATPase proteolipids. For this reason, we reanalyzed the genomic sequence and found reasonable evidence that the chromosomal segment actually contains two genes very similar to each other (Table I). Both genes encode V-ATPase 16-kDa proteolipids with significant similarity to those known from other organisms. To verify this possibility, Northern blot analysis was carried out using probes that would differentiate between the putative genes. The two probes hybridized with distinct transcripts of different lengths (Fig. 1). These results indicate that the genomic clone R10E11 carries two genes, both of which are transcribed in vivo. The two genes were named vha-1 (0.8-kb transcript) and vha-2 (0.9- and 1.0-kb transcripts).
A Single Polycistronic Transcription Unit of vha-1 and vha-2-C. elegans genes are often transcribed in clusters, and polycistronic RNAs are processed to individual mRNAs by trans-splicing (26). One of two spliced leaders, SL1 or SL2, is attached to the 5′ end of almost all processed transcripts (27).
To assess whether the vha-1 and vha-2 genes are transcribed together, RT-PCR was carried out using primers specific for spliced leaders. The vha-1 mRNA was amplified by RT-PCR only with the SL1 primer (Fig. 2, lanes 1 and 2), whereas RT-PCR products of vha-2 were obtained using either the SL1 or the SL2 primer (Fig. 2, lanes 3 and 4). Thus, the vha-1 mRNA was trans-spliced to SL1 exclusively, whereas the vha-2 mRNA was spliced to a mixture of SL1 and SL2. The presence of the spliced leaders on both mRNAs, in addition to the structure of the gene cluster (Table I), suggests that the two genes are transcribed as a single polycistronic unit. The vha-1 mRNA exclusively received SL1, implying that vha-1 is the upstream gene of this cluster because SL2 is known to be specific for trans-splicing to the downstream genes of a cluster (26,27).
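The inference drawn from the spliced-leader RT-PCR can be written as a small decision rule. The sketch below encodes only the rule stated in the text (SL2 is specific to downstream genes of a cluster); the function name and return labels are hypothetical, and the rule is a simplification rather than a general classifier of C. elegans operon structure.

```python
def infer_operon_position(sl1_product: bool, sl2_product: bool) -> str:
    """Interpret SL1/SL2 RT-PCR results for one transcript.

    Assumption: SL2 trans-splicing marks a downstream gene of a
    polycistronic cluster, as stated in the text (26,27). SL1 alone is
    compatible with an upstream gene of a cluster (or a gene outside one).
    """
    if sl2_product:
        return "downstream gene of a cluster"
    if sl1_product:
        return "upstream gene of a cluster or stand-alone gene"
    return "no trans-spliced product detected"


# RT-PCR outcomes reported in the text (Fig. 2).
results = {
    "vha-1": {"sl1_product": True, "sl2_product": False},
    "vha-2": {"sl1_product": True, "sl2_product": True},
}

for gene, outcome in results.items():
    print(gene, "->", infer_operon_position(**outcome))
```

Applied to the reported results, the rule places vha-1 upstream and vha-2 downstream, matching the order proposed for the cluster.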
vha-1 and vha-2 Genes Code for the 16-kDa Proteolipids-The Expressed Sequence Tag data base of C. elegans provides preliminary sequencing data for terminal regions of randomly isolated cDNA clones. 3 We searched for clones corresponding to vha-1 and vha-2 in the Expressed Sequence Tag data base and identified cDNA clones yk185d8 and yk100g12 as putative vha-1 and vha-2 gene transcripts, respectively. Upon complete sequencing of the two clones, we found that neither clone contained a spliced leader, indicating that they lacked their 5′-terminal regions. To obtain clones of the 5′ regions, RT-PCR was carried out as described above. Sequencing of the resulting products confirmed that the initial cDNA clone for vha-1 lacked 31 bp of the 5′ region, including the spliced leader, and that for vha-2 lacked 21 bp. The entire vha-1 cDNA was 830 bp (not including polyadenylation) with an open reading frame for a 169-amino acid polypeptide, whereas the vha-2 cDNA was 983 bp with an open reading frame for 161 residues. Comparison of the genomic and cDNA sequences revealed that vha-1 is an intron-less gene and vha-2 consists of three exons (Fig. 3A).
FIG. 1. Northern blot analyses of vha-1, vha-2, and vha-4. 15 μg of total RNA from mixed stage populations was electrophoresed, blotted, and probed with a portion of the coding sequence from each gene as described under "Experimental Procedures." The sizes of transcripts were estimated from RNA standards run on the same gel. Lanes 1, 2, and 3 were probed for vha-1, vha-2, and vha-4 gene transcripts, respectively. Arrowheads indicate positions of transcripts.
Homology of 16-kDa Proteolipids of C. elegans and Other Sources-Consistent with the known V-ATPase proteolipids, the vha-1 and vha-2 gene products are hydrophobic and possess four putative transmembrane domains. The proteins share 60% identity with each other and exhibit 55-67% similarity with 16-kDa proteolipids of yeast, mouse, and cow (Fig. 3B). Especially noteworthy is the high degree of sequence conservation in the fourth transmembrane segment, which includes the putative N,N′-dicyclohexylcarbodiimide-reactive glutamic acid (Glu-153 in Vha-1 and Glu-145 in Vha-2). This segment shares about 80% identity, and all remaining residues are conservatively substituted.
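Percent identity figures such as those quoted above are computed from a pairwise alignment. The following minimal sketch shows one common convention (identical residues divided by aligned, non-gap columns); it is not stated to be the procedure used in this study, other denominators are also in use, and the two short sequences are made-up placeholders rather than Vha-1 or Vha-2 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment and therefore
    have equal length, with `gap` marking alignment gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "GLAVGE-AL"
toy_b = "GLSVGEMAI"
print(percent_identity(toy_a, toy_b))  # 6 matches over 8 ungapped columns
```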
vha-4 Codes for the 23-kDa Proteolipid-The C. elegans genome project 4 predicted that the T01H3 cosmid clone contains another potential proteolipid gene, T01H3.1, whose product is similar to the yeast VMA16 protein (12). Based on the Expressed Sequence Tag data base of C. elegans, T01H3.1 was believed to be covered by five lambda cDNA clones. We sequenced both terminal regions of all cDNA inserts and selected the longest clone, yk167f7, for further study. The full sequence was determined, and the 1002-bp insert was found to contain an open reading frame for a 214-amino acid polypeptide with five putative transmembrane segments but without a spliced leader. This gene, named vha-4, contains three introns and maps to chromosome II.
For analysis of the 5′ upstream region of the vha-4 mRNA, RT-PCR was performed using primers specific for SL1 and SL2. Only the SL1 spliced leader sequence was found associated with vha-4 mRNA, suggesting that the vha-4 gene is the first gene of a polycistronic unit (Fig. 4A). SL1 was located four nucleotides upstream of the initiation methionine codon.
Homology of 23-kDa Proteolipid of C. elegans and Other Sources-As shown in Fig. 4B, comparison of deduced amino acid sequences found that the Vha-4 protein exhibited 52% identity with the yeast VMA16 protein (12) and 64% identity with a human homologue. 5 ... the three proteolipids. This residue was shown in the yeast V-ATPase to be critical for function (12).
FIG. 3. ... The genomic fragment between the BamHI and the SalI or SplI sites after the first two codons of each vha gene was fused with the GFP gene for testing expression (plasmids pCV01 for vha-1 and pCV012 for vha-2; see "Experimental Procedures"). B, multiple alignment of 16-kDa proteolipids of nematode, yeast, cow, and mouse. The deduced amino acid sequences from yeast Vma3 (18,19), Vma11 (20), cow (17), mouse (15), and nematode (Vha-1 and Vha-2) were aligned for maximal homology. Boxes represent amino acid residues conserved in all four species. The asterisk denotes the putative N,N′-dicyclohexylcarbodiimide-reactive glutamate residue. Putative transmembrane domains (I-IV) were defined by hydropathy analysis.
Characterization of Three vha Transcripts-To determine the sizes of the vha-1, vha-2, and vha-4 mRNAs, total RNA prepared from populations of mixed stages was hybridized with the 5′ region of each cDNA. Single mRNA bands of about 0.8 kb corresponding to vha-1 and 1.0 kb corresponding to vha-4 were found and were consistent with the sizes of their cDNA clones (Fig. 1, lanes 1 and 3). On the other hand, two transcripts of 0.9 kb and 1.0 kb were detected by a vha-2-specific probe (Fig. 1, lane 2). The size of the longer RNA agreed well with that of the vha-2 cDNA isolated above. To assess whether the vha-2 gene contains two polyadenylation sites, RT-PCR was carried out using a primer including an oligo(dT) sequence. Two amplified products with about a 100-bp difference were obtained. Sequence analysis showed that the polyadenylation site of the shorter RT-PCR product was located 135 bp upstream of that of the longer one, which was identical to the isolated vha-2 cDNA clone. These results indicate that the vha-2 gene has two transcripts with different lengths. A cDNA clone corresponding exactly to the shorter sequence product was obtained, confirming that vha-2 is transcribed as two different forms.
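The relationship between the two vha-2 transcripts can be checked with simple arithmetic. The sketch below derives the expected length of the shorter cDNA from the 983-bp vha-2 cDNA and the 135-bp offset between the two polyadenylation sites, both taken from the text; the function name is hypothetical. Northern estimates (0.9 and 1.0 kb) also include poly(A) tails of unknown length, so they are not expected to match these cDNA lengths exactly.

```python
def shorter_transcript_length(long_bp: int, site_offset_bp: int) -> int:
    """Length of the transcript ending at the upstream polyadenylation site.

    long_bp: length of the longer cDNA (bp).
    site_offset_bp: distance between the two polyadenylation sites (bp).
    """
    if not 0 <= site_offset_bp < long_bp:
        raise ValueError("offset must be non-negative and shorter than the cDNA")
    return long_bp - site_offset_bp


VHA2_LONG_CDNA_BP = 983      # vha-2 cDNA length reported in the text
POLYA_SITE_OFFSET_BP = 135   # upstream site position relative to the downstream site

short_bp = shorter_transcript_length(VHA2_LONG_CDNA_BP, POLYA_SITE_OFFSET_BP)
print(short_bp)  # 848
```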
The amounts of the vha-1 and vha-2 mRNAs isolated were roughly the same, strengthening the notion that these two genes are transcribed as a single polycistronic unit. The amount of vha-4 mRNA was 5-10-fold lower than those of vha-1 and vha-2, similar to the situation in yeast, where the VMA16 protein is found in relatively low abundance (12).
Preferential Expression of the Three vha Genes in an Excretory Cell of C. elegans-The V-ATPase genes are expected to be housekeeping genes because every cell has vacuole-related organelles. In addition, V-ATPase is strongly expressed in specific tissues (28-30). To determine in which C. elegans cells the vha genes were most strongly expressed, three translational fusion genes, vha-1::GFP, vha-2::GFP, and vha-4::GFP, were constructed by inserting upstream sequences plus the first two codons of the vha genes into GFP vectors. Although the GFP fusion protein contains a nuclear localization signal, the protein is known to leak to the cytoplasm. 2 All three GFP fusion genes were strongly expressed in the large mononuclear cell with bilateral excretory canals extending along the length of the body (Fig. 5, A-E). The cell body forming a bridge between the two lateral canals is positioned on the ventral epidermal ridge slightly posterior to the nerve ring. This cell is called the H-shaped excretory cell and is believed to function in toxin and metabolic waste excretion and osmoregulation (31-34). In addition to the excretory cell, the three fusion proteins were detected in cells of the rectum (Fig. 5, B, C, and F) and a pair of cells with parallel orientation posterior to the anus (Fig. 5, C and G). No signals were detected without the control regions of the vha genes. In fact, the three fusion proteins were observed to have indistinguishable expression patterns. These results indicate that the vha promoters are strongly active in the H-shaped excretory cell, rectum, and a pair of cells posterior to the anus and imply that C. elegans V-ATPase may be important for the excretory function.
FIG. 4. ... The 5.2-kb genomic fragment between the upstream BamHI site and the coding region EagI site, including the first two amino acids of vha-4, was fused with the GFP gene for testing the expression pattern (pCVC01). The direction of transcription for the T01H3.4 and T01H3.5 genes is opposite to vha-4 and T01H3.2. B, multiple alignment of 23-kDa proteolipids of nematode, yeast, and human. The deduced amino acid sequences from yeast Vma16 (12), human, 5 and nematode (Vha-4) were aligned for maximal homology. Boxes represent amino acid residues conserved in all three species. The asterisk denotes the glutamate residue that is essential for yeast V-ATPase activity (12). Putative transmembrane domains (I-V) were defined by hydropathy analysis.
To test whether the V1 sector was also highly expressed in the same cells as the proteolipids, GFP was translationally fused with the V1 sector B subunit gene. The GFP fusion of the B subunit was preferentially expressed in the H-shaped cell, indicating that the V-ATPase exists as a functional enzyme in the H-shaped excretory cell.

DISCUSSION
We have identified three distinct C. elegans genes that putatively encode the V-ATPase proteolipid. All were transcriptionally active and were expressed in cells involved in excretion. The vha-1 and vha-2 genes form a polycistronic unit on chromosome III. The proteins encoded by the two genes share 60% identity with each other and are highly similar to the 16-kDa proteolipids of yeast (19,20,35), Manduca sexta (13), Drosophila melanogaster (14), mouse (15), cow (17), and human (36). In yeast, VMA3 and VMA11 encode different 16-kDa proteolipids, both of which are essential for V-ATPase activity (19,20,35). Since the vha-1 and vha-2 gene products are equally similar to the VMA3 and VMA11 proteins, we could not draw definite conclusions about the correspondence of the two yeast genes to their C. elegans counterparts. Despite their similarities, vha-1 and vha-2 could not rescue the growth defect of the VMA3 or VMA11 null mutants, 6 which were conditionally lethal on neutral pH plates (35). A third proteolipid gene, vha-4, was found on chromosome II and codes for a 23-kDa proteolipid homologous to the yeast VMA16 protein, suggesting that vha-4 is a functional counterpart of VMA16.
The discovery of the two genes, vha-1 and vha-2, for the 16-kDa proteolipid and the vha-4 gene for the 23-kDa proteolipid is the first example in higher eukaryotes of three distinct V-ATPase proteolipid genes. In the lower eukaryote S. cerevisiae, all three proteolipid genes (VMA3, VMA11, and VMA16) are essential for function. An important question is whether the three C. elegans vha gene products are all required for V-ATPase function. In this regard, the GFP reporter gene experiments found that the three proteolipids have identical expression patterns, strongly suggesting that all three isoforms are necessary (Fig. 5). A caveat is that each proteolipid may be used in the Vo sectors of different organelles.
The vha-1 and vha-4 gene transcripts had the sizes expected from their cDNA clones. In contrast, two transcripts that differ in the length of their 3′-untranslated regions were detected for the vha-2 gene. By analogy to the two tissue-specific transcripts of the human V1 sector B subunit (37), it is tempting to hypothesize that transcription variants of vha-2 may be expressed in different cells.
The GFP gene under control of the vha-1, vha-2, or vha-4 regulatory sequences was expressed in the H-shaped excretory cell, rectum, and a pair of cells near the anus. We could not rule out the possibility that the expression patterns resulted from the overproduction of the GFP fusion proteins because transgenic animals were carrying the expression plasmid as an extrachromosomal array. This did not appear to affect the tissue-specific expression because the same expression patterns were observed for the GFP fusions under control of the upstream regions of three different vha genes. It is reasonable to assume from these results that the signals of the GFP fusion proteins are closely related to the distribution of C. elegans V-ATPase. Interestingly, the C. elegans P-glycoprotein, Pgp-3, was predominantly expressed in the H-shaped excretory cell (34). Furthermore, the pgp-3 deletion mutant was sensitive to both colchicine and chloroquine (34), suggesting that the P-glycoprotein functions in exporting toxic compounds. Because of the similar distribution of the vha genes, we suggest that the proton electrochemical potential generated by V-ATPase may also be required for exporting toxic compounds or metabolic wastes.
6 T. Oka, R. Yamamoto, and M. Futai, unpublished observation.
The mechanisms and functions of V-ATPase have been investigated extensively and are well understood at the molecular level (3,38); however, the roles of the enzyme in the development and behavior of higher eukaryotes remain uncertain. C. elegans is a model organism well suited to addressing such problems with genetic approaches. The present study is the initial step in elucidating the functional roles of V-ATPase and acidic organelles in development and behavior.
Acknowledgments-We thank Dr. Andy Fire for the GFP reporter plasmids, Dr. Alan Coulson for the F20B6, R10E11, and T01H3 cosmid clones, Dr. James Kramer for plasmid pRF4, Drs. Shouhei Mitani and Kiyoshi Kita for wild-type Bristol N2, and Dr. Yuji Kohara for the C. elegans cDNA clones (yk100g12, yk167f7, and yk185d8). We give special thanks to Drs. Ryugo Hirata and Yasuhiro Anraku for providing us with the yeast strains and many excellent suggestions, Dr. Jun Takeda for allowing us to view his manuscript before publication, and Dr. Robert Nakamoto for his critical reading of this manuscript. We also thank Drs. Makoto Koga, Ken-ichi Ogura, and Yasumi Oshima for their significant comments and helpful suggestions.