OC-2, a Novel Mammalian Member of the ONECUT Class of Homeodomain Transcription Factors Whose Function in Liver Partially Overlaps with That of Hepatocyte Nuclear Factor-6

Transcription factors of the ONECUT class, whose prototype is hepatocyte nuclear factor (HNF)-6, are characterized by the presence of a single cut domain and by a peculiar homeodomain (Lannoy, V. J., Bürglin, T. R., Rousseau, G. G., and Lemaigre, F. P. (1998) J. Biol. Chem. 273, 13552–13562). We report here the identification and characterization of human OC-2, the second mammalian member of this class. The OC-2 gene is located on human chromosome 18. The distribution of OC-2 mRNA in humans is tissue-restricted, the strongest expression being detected in the liver and skin. The amino acid sequence of OC-2 contains several regions of high similarity to HNF-6. The recognition properties of OC-2 for binding sites present in regulatory regions of liver-expressed genes differ from, but overlap with, those of HNF-6. Like HNF-6, OC-2 stimulates transcription of the hnf-3β gene in transient transfection experiments, suggesting that OC-2 participates in the network of transcription factors required for liver differentiation and metabolism.

The identification of transcription factors has provided insight into the mechanisms of the control of gene expression. Some of these factors contain a DNA-binding region called the homeodomain. The homeodomain proteins are evolutionarily conserved and play an important role in cell differentiation and in morphogenesis (1-4). Several homeodomain proteins contain a second type of DNA-binding domain. This is the case for the proteins of the CUT superclass, in which the second DNA-binding domain is called cut because it was initially described in the Drosophila CUT protein (5-7). The CUT superclass comprises three classes (8). The CUX class, whose members have three cut domains, includes the Drosophila CUT protein and its mammalian homologs, namely human CDP, rat CDP-2, dog CLOX, and mouse CUX and CUX-2. The SATB class, whose members have two cut domains, includes the human homeodomain proteins called matrix attachment region-binding proteins or SATB (special AT-rich binding) proteins. A third class, called ONECUT because its members have a single cut domain, was identified (8) thanks to the cloning of rat hepatocyte nuclear factor-6 (HNF-6) (9). The ONECUT class includes mammalian HNF-6 and four Caenorhabditis elegans cDNAs or open reading frames (ORFs) (8).
The proteins of the ONECUT class are characterized not only by their single cut domain, but also by a homeodomain with a peculiar amino acid composition. Homeodomains, which are 60 residues long, are organized in three α-helices (for a review on homeodomain-DNA interactions, see Refs. 10-12). The third helix, called the recognition helix, contacts the DNA and is crucial for sequence-specific binding. Within this helix, residue 48 is part of a hydrophobic core. Whereas residue 48 is a tryptophan in all other known homeodomains, it is a phenylalanine in the ONECUT proteins. Residue 50 is also located in the recognition helix. Mutations at this position often lead to changes in the sequence specificity of DNA binding (13-15). This is consistent with crystallographic data demonstrating that the amino acid at position 50 is in contact with bases (for a review, see Ref. 16). In the ONECUT proteins, a methionine is found at position 50. This amino acid is never found at this position in other homeodomains.
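The two diagnostic positions described above can be expressed as a simple check. The following Python sketch is only an illustration: the function names are hypothetical, and the test strings are synthetic placeholders (not real homeodomain sequences) in which only positions 48 and 50 carry meaning. The glutamine used at position 50 of the second placeholder is an arbitrary example of a non-methionine residue.

```python
def homeodomain_signature(homeodomain: str) -> dict:
    """Report residues 48 and 50 (1-based numbering) of a 60-residue homeodomain."""
    if len(homeodomain) != 60:
        raise ValueError("a homeodomain is expected to be 60 residues long")
    res48 = homeodomain[47]  # position 48, part of the hydrophobic core
    res50 = homeodomain[49]  # position 50, in contact with bases
    return {
        "res48": res48,
        "res50": res50,
        # ONECUT signature as described in the text: Phe-48 and Met-50
        "onecut_like": res48 == "F" and res50 == "M",
    }


def make_placeholder(res48: str, res50: str) -> str:
    """Build a synthetic 60-residue string (hypothetical, for illustration only)."""
    residues = ["X"] * 60
    residues[47] = res48
    residues[49] = res50
    return "".join(residues)


if __name__ == "__main__":
    print(homeodomain_signature(make_placeholder("F", "M")))  # ONECUT-like
    print(homeodomain_signature(make_placeholder("W", "Q")))  # Trp-48, not ONECUT-like
```

A real analysis would first have to locate and align the homeodomain within the full-length protein, which this sketch does not attempt.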
Our previous experiments showed that HNF-6 can bind to a number of DNA sites, which differ slightly in terms of sequence, and that the cut domain is required for binding of HNF-6 to all the sites tested (8). These experiments ascribed a dual role to the peculiar homeodomain of HNF-6. They showed that this domain is involved in DNA binding, but only for a subset of the sites recognized by HNF-6. They also showed that the homeodomain is involved in transcriptional activation, but only of those genes for which the binding of HNF-6 does not require its homeodomain. By mutational analysis, we demonstrated that phenylalanine 48 and methionine 50 play a role in this transcriptional activating function of the HNF-6 homeodomain (8). This work also indicated that the linker region between the cut domain and homeodomain of HNF-6 is important for DNA binding (8). We identified two rat isoforms (HNF-6α and HNF-6β) that originate from the same gene by alternative splicing (17). HNF-6β (491 residues) is identical to HNF-6α (465 residues), except that it contains an insert of 26 amino acids in the linker region. These two isoforms differ in DNA binding specificity and kinetics (8).
As mentioned above, the ONECUT class contains several C. elegans members, but only one mammalian member, namely HNF-6. The DNA-binding domains of these C. elegans proteins display DNA binding properties similar to those of HNF-6 (8). This indicates that these properties have been evolutionarily conserved and are therefore important for basic regulatory processes that are common to nematodes and mammals. The existence of several members of the ONECUT class in C. elegans prompted us to search for mammalian members of this class that are distinct from HNF-6. We describe here a new member, which we call OC-2, of the ONECUT class.

EXPERIMENTAL PROCEDURES
Cloning of OC-2 cDNA. A 193-base pair (bp)-long OC-2 probe was synthesized using the PCR primers FM2.1 and FM2.2 and the IMAGE clone 566080 as template. This clone contains an expressed sequence tag (GenBank™ accession number AA121823) derived from a human fetal retina cDNA library. The OC-2 probe was used to screen a HeLa cell cDNA library (kindly provided by J.-M. Garnier) prepared in Zap II (Stratagene) by hybridization at 42°C in 6× SSC and 50% formamide. Filters were washed at 55°C in 3× SSC, and positive clones were isolated according to the supplier's instructions. Clones lacking the 5′-end of the cDNA were obtained. To find this 5′-end, a human genomic EMBL3 library was screened with a probe containing the OC-2 cut domain. This probe was obtained by PCR using primers VLIIA and VLIIB and a HeLa cell OC-2 cDNA clone as template. A 4.5-kilobase pair-long EcoRI-EcoRI fragment was obtained from a purified genomic clone and subcloned in pBluescript KS+ (Stratagene). A cDNA containing a full-length coding sequence was constructed by fusing 585 bp of the genomic subclone (nucleotides 1-585 in Fig. 1A) to 1069 bp of a HeLa cell OC-2 cDNA clone (nucleotides 586-1654 in Fig. 1A). The genomic portion used to construct the fully coding cDNA is expressed as mRNA, as confirmed by RT-PCR with liver or skin RNA. These RT-PCR experiments were performed with the Titan RT-PCR system (Boehringer Mannheim) in the presence of 2 M betaine using primers FM2.9 and FM2.10. The sequence of the two DNA strands was determined by automated DNA sequencing using the dideoxy chain termination method. DNA and protein sequence analysis was performed using the GCG software package (Genetics Computer Group, Inc., University of Wisconsin, Madison, WI). To identify predicted structurally conserved regions, multiple sequence alignments were performed using the Match-Box algorithm with the BLOSUM62 matrix (18, 19).
Radiation Hybrid Mapping. The chromosomal location of the human OC-2 gene was characterized by radiation hybrid mapping using the GeneBridge 4 Radiation Hybrid panel (Research Genetics). This panel of 93 human × hamster somatic cell hybrids was screened by PCR (35 cycles at 94°C for 1 min, 57°C for 1 min, and 72°C for 1 min) with primers FM2.1 and FM2.2A in 50-µl incubations, and the products were visualized on an ethidium bromide-stained agarose gel. The data were analyzed using the Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research radiation hybrid map.
RT-PCR. To determine the tissue distribution of OC-2 mRNA, RT-PCR was performed (GeneAmp kit, Perkin-Elmer) with 500 ng of total RNA isolated from human tissues. The integrity of the RNA preparations was tested by amplification of a 610-bp-long β-actin cDNA fragment with primers bACT5 and bACT3. OC-2 cDNA (193-bp-long fragment) was amplified with primers FM2.1 and FM2.2. HNF-6 cDNA (387-bp-long fragment) was amplified with primers HNF6.5 and HNF6.6. Primers FM2.2 and HNF6.6 were used in the reverse transcription step. The specificity of the amplified products was verified in Southern blotting experiments as described (20) using, as 32P-labeled probes, PCR fragments obtained with primers HOM1 and HOM2 for the identification of OC-2 products and with primers HNF6.3 and HNF6.4 for the identification of HNF-6 products.
FIG. 2. The tissue distribution of OC-2 and HNF-6 was determined by RT-PCR, and amplified products were identified by Southern blotting. As a control for RNA integrity, β-actin mRNA was amplified by RT-PCR, and the products were visualized on a nondenaturing acrylamide gel stained with ethidium bromide. Ki, kidney; Lu, lung; Mu, skeletal muscle; Br, brain; Sp, spleen; Te, testis; He, heart; My, myometrium; Li, liver; Bl, urinary bladder; Sk, skin; PBL, peripheral blood lymphocytes.
FIG. 3. DNA binding specificity of OC-2 and HNF-6 as determined by electrophoretic mobility shift assay. Extracts from nontransfected (nt.) COS-7 cells or from COS-7 cells transfected with HNF-6 or OC-2 expression vectors were used as a source of proteins, as indicated above the lanes. The radioactive probes used are indicated below. The unlabeled competitor oligonucleotides used in the binding reaction are indicated above.

RESULTS AND DISCUSSION
Identification of OC-2. During a search for HNF-6-related proteins in GenBank™ using the Basic Local Alignment Search Tool (BLAST) (24), we found an expressed sequence tag (GenBank™ accession number AA121823) from a fetal human retina cDNA library that contains a partial cDNA showing significant similarity to the HNF-6 homeobox. Two primers (FM2.1 and FM2.2; see "Experimental Procedures") whose sequences were based on the expressed sequence tag were used to screen, by PCR, two cDNA libraries, one from adult human retina and one from HeLa cells. Identical 193-bp-long bands were seen with the two libraries, yielding a probe that was used to screen the HeLa cell cDNA library by phage plaque hybridization. Six independent clones were obtained. All these clones lacked the 5′-end of the coding sequence, but contained nearly identical 5′-extremities. This probably reflects the presence of secondary structures in the mRNA that blocked reverse transcription during the preparation of the library. Rapid amplification of cDNA ends/PCR failed to amplify the missing portion, which turned out to be GC-rich (see below and Fig. 1A). To clone the 5′-end, we therefore screened a human genomic DNA library with a probe encompassing nucleotides 1027-1214 (Fig. 1A). An additional 5′-sequence was obtained (nucleotides 1-585 in Fig. 1A), and RT-PCR with high processivity polymerase on total RNA from human liver and skin confirmed that this sequence is expressed as mRNA. The fully coding cDNA and the derived 485-amino acid-long sequence are shown in Fig. 1A. The ATG codon at nucleotides 88-90 lies within a perfect Kozak consensus sequence, strongly suggesting that it codes for the initiator methionine. Sequence comparisons showed the presence of a single cut domain (residues 314-379) and a homeodomain (residues 407-466). The homeodomain was typical of the ONECUT class, with a phenylalanine at position 48 and a methionine at position 50.
We therefore named the novel protein OC-2 (for ONECUT-2). Fig. 1B shows an alignment of the amino acid sequences of OC-2 and HNF-6. This alignment shows that the sequences of the cut domain and homeodomain of OC-2 are 97 and 87% identical to those of HNF-6, respectively, and that HNF-6 and OC-2 display 92% amino acid sequence identity over the 157 residues (boxed in Fig. 1B) that encompass the entire DNA-binding region. There are additional regions of similarity between OC-2 and HNF-6 (Fig. 1B), such as a serine-rich tail of 19 residues in OC-2 and of 21 residues in HNF-6. The linker between the cut domain and homeodomain has the same length (27 residues) in OC-2 and HNF-6α, and the amino acid sequence of this linker is quasi-identical (83%) in both proteins. A search by RT-PCR with RNA from tissues expressing OC-2 (see below) and primers located in the cut domain and homeodomain failed to provide evidence for an OC-2 isoform containing an alternatively spliced insert between these domains, as is the case for HNF-6β (data not shown). As in HNF-6, a polyhistidine tract occurs in OC-2 at about the same distance upstream of the cut domain. A sequence of 14 consecutive glycines (residues 22-35) is present in OC-2. Such stretches of identical amino acids have been found in many homeodomain proteins (25) and may be functionally important. Indeed, alterations of a polyalanine stretch in the HOXD13 protein cause polydactyly (26). Yet another conserved region (92% identity between OC-2 and HNF-6) is a serine/threonine- and proline-rich sequence (STP box) of 24 residues located just upstream of the polyhistidine tract. Our experiments on HNF-6α suggest that this STP box plays a role in transcriptional activation. Fig. 1B also shows an alignment of the amino acid sequences of OC-2 and HNF-6α with that of the C. elegans ONECUT gene product closest to these mammalian proteins, namely R07D10.x (an ORF found on cosmid C17H12). One sees that not only the homeodomain and the cut domain, but also the STP box, have been conserved in the ONECUT class since the nematodes.
FIG. 4. OC-2 is a transcription factor that stimulates transcription of the hnf-3β gene. FTO-2B cells were transiently transfected with the empty expression vector pXJ42 (black bars) as a control or with an OC-2 (shaded bars) or HNF-6 (white bars) expression vector. The cotransfected firefly luciferase reporter constructs pHNF3β(6×)-luc, pHNF3β-luc, pHNF3βmut-luc, and pHNF4-0.7-luc were used as indicated. Firefly luciferase values were normalized for Renilla luciferase values from the internal control plasmid pRL138. Data are expressed as -fold stimulation of reporter gene activity by OC-2 or HNF-6 (means ± S.E., n = 3) and are normalized for the reporter activity in the presence of empty expression vector. With the empty expression vector, the absolute activity of pHNF3βmut-luc was 2-fold lower than that of pHNF3β-luc, as shown by Landry et al. (22).
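The residue coordinates reported above for OC-2 can be cross-checked with a little arithmetic, and the percent identities quoted for the alignment follow from a simple column count. The Python sketch below is illustrative only. The `Region` class and helper names are hypothetical, the domain boundaries are those stated in the text, and the convention of skipping gapped columns when computing identity is an assumption (the text does not say which convention was used). The two short strings passed to `percent_identity` are made-up toy sequences, not OC-2 or HNF-6 data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PROTEIN_LENGTH = 485                        # derived OC-2 sequence length
POLYGLY = Region("polyglycine stretch", 22, 35)
CUT = Region("cut domain", 314, 379)
HOMEO = Region("homeodomain", 407, 466)

# Segments implied by the stated boundaries (derived here, not quoted as coordinates)
LINKER = Region("linker", CUT.end + 1, HOMEO.start - 1)
TAIL = Region("C-terminal tail", HOMEO.end + 1, PROTEIN_LENGTH)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over alignment columns without a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    for region in (POLYGLY, CUT, LINKER, HOMEO, TAIL):
        print(f"{region.name}: residues {region.start}-{region.end}, {region.length} aa")
    # Toy example only: 5 identical columns out of 6
    print(f"{percent_identity('ACDEFG', 'ACDQFG'):.1f}% identity")
```

With these boundaries the homeodomain comes out at 60 residues and the linker at 27, in agreement with the text. The 19 residues that follow the homeodomain match the length given for the serine-rich tail, assuming that tail is indeed the segment C-terminal to the homeodomain.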
Recent data bank searches showed that the human genome contains sequences that code for additional putative ONECUT proteins very similar to HNF-6 and OC-2. The amino acid sequences of two ORFs, one present in fosmid F37502 (GenBank™ accession number AC004755) and one present in cosmid F21967 (GenBank™ accession number AC005256), are shown in Fig. 1B. Fosmid F37502 and cosmid F21967 contain sequences present on the same chromosome (human chromosome 19) and separated by a 1-kilobase pair gap. The distance between the two ORFs shown in Fig. 1B is 19 kilobase pairs. The ORF of fosmid F37502 contains a typical ONECUT homeodomain with a phenylalanine at position 48 and a methionine at position 50. The ORF of cosmid F21967 contains a cut domain that is quasi-identical to those of the bona fide ONECUT proteins. It also contains an STP box and a polyglycine stretch. It could be that cosmid F21967 and fosmid F37502 each contain one-half of a single ONECUT gene whose organization resembles that of the hnf-6 gene, namely with a very long intronic region between the cut domain and homeodomain (17). In this case, the two corresponding ORFs shown in Fig. 1B would belong to two parts of the same OC-3 protein. If these ORFs do not correspond to pseudogenes and if these ORFs are expressed as peptides, these peptides very likely correspond to ONECUT protein(s) different from HNF-6 and OC-2.
Chromosomal Localization of OC-2. The chromosomal localization of OC-2 in humans was determined by radiation hybrid mapping. PCR amplifications were performed, and the data were analyzed as indicated under "Experimental Procedures." This showed that the OC-2 gene is located on chromosome 18, 10.31 cR3000 telomeric from marker WI-8740 and 9.2 cR3000 centromeric from marker CHLC.GATA30B03 (lod score > 3.0). Human HNF-6 is located on chromosome 15q21.1-21.2 (27), and, as noted above, the putative human OC-3 gene is located on chromosome 19. These data show that in humans, the genes coding for the homeodomain proteins of the ONECUT class are not located in a cluster.
OC-2 Gene Expression Is Tissue-restricted. To determine the tissue distribution of OC-2, we screened, by RT-PCR, 12 human tissues for expression of its mRNA. The tissue distribution of HNF-6 mRNA, not yet studied in humans, was analyzed in parallel. The PCR products were subjected to Southern blot analysis using overlapping radioactive PCR probes. As a control, β-actin mRNA was amplified from the same RNA preparations. The data (Fig. 2) show that OC-2 mRNA is abundant in the liver and skin. Lower amounts were found in the testis, brain (occipital cortex), and urinary bladder. The kidney yielded a very weak signal. The expression of HNF-6 mRNA was strongest in the liver and clearly detectable in the testis and skin. We conclude that OC-2 has a tissue-restricted pattern of expression that differs from, but overlaps with, that of HNF-6.
OC-2 and HNF-6 Have Distinct but Overlapping DNA Binding and Transcriptional Activation Properties. The remarkable amino acid sequence similarity of the two DNA-binding domains of OC-2 and HNF-6α and of the linker region between these domains suggested that the two proteins display similar DNA binding specificities. HNF-6 binds to two types of sequences. Binding to the first type, exemplified by a probe derived from the transthyretin gene (TTR probe), requires both the cut domain and homeodomain of HNF-6. Binding to the second type of sequence, exemplified by a probe derived from the hnf-3β gene (HNF-3β probe), requires only the cut domain of HNF-6 (8). We therefore tested, by electrophoretic mobility shift assay, whether OC-2 binds to these two probes. This was the case (Fig. 3, lanes 3 and 8 versus lanes 1 and 6). Binding of OC-2 was specific, as shown in incubations containing an excess of unlabeled probe (Fig. 3, lanes 4 and 9). Cross-competitions suggested that OC-2 binds with higher affinity to the HNF-3β probe than to the TTR probe (Fig. 3, lanes 5 and 10). In contrast, HNF-6α bound equally well to the two probes, as shown in Fig. 3 (lanes 2 and 7) and by our earlier results (8).
HNF-6 regulates the transcription of genes involved in glucose metabolism (9, 32). As shown in Fig. 3 (lanes 14-19), we found that OC-2 and HNF-6α bind to the same cis-acting sequences in the pepck (PEPCK probe) and pfk-2 (GRU probe) genes. Like HNF-6, OC-2 is therefore expected to play a role in the regulation of liver gluconeogenesis and glycolysis. Experiments on dedifferentiated hepatoma lines and on embryoid bodies provided evidence for a network of liver-enriched transcription factors that is required for hepatocyte differentiation and morphogenesis (33-38). HNF-3β is an essential component of this network. It stimulates expression of hnf-4. On the other hand, HNF-4 activates transcription of the hnf-1 gene. Our earlier experiments indicated that HNF-6 binds to, and controls the expression of, the hnf-3β and hnf-4 genes (22). We therefore determined whether OC-2 binds to the same sites as HNF-6 in these two genes. While OC-2 bound to the same HNF-3β probe as HNF-6 (see above), OC-2 did not bind to the HNF-6-binding site derived from the hnf-4 gene promoter (HNF-4 probe) (Fig. 3, lanes 11-13). The discrimination of OC-2 between the HNF-3β and HNF-4 probes is similar to that described earlier for HNF-6β (8). However, OC-2 binds to a probe (PEPCK) to which HNF-6β binds very poorly. Therefore, OC-2 has a DNA binding specificity that differs from, but overlaps with, that of HNF-6α and HNF-6β (Table I).
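The statement that the binding specificity of OC-2 "differs from, but overlaps with" that of HNF-6α can be made concrete with set operations on the probes each factor bound. The Python sketch below is only an illustration built from the binding results described in the text (it does not reproduce Table I). Binding is reduced to a yes/no call, so differences in affinity, such as the preference of OC-2 for the HNF-3β probe over the TTR probe, are deliberately ignored. Binding of HNF-6α to the HNF-4 probe is an assumption inferred from the description of that probe as an HNF-6-binding site. The name `compare_binding` is hypothetical.

```python
# Probes bound in the mobility shift assays, as described in the text (yes/no only).
BINDING = {
    "OC-2": {"TTR", "HNF-3beta", "PEPCK", "GRU"},
    # HNF-4 probe binding by HNF-6alpha is inferred, see the lead-in above.
    "HNF-6alpha": {"TTR", "HNF-3beta", "PEPCK", "GRU", "HNF-4"},
}


def compare_binding(factor_a: str, factor_b: str):
    """Return (shared probes, probes bound only by a, probes bound only by b)."""
    a, b = BINDING[factor_a], BINDING[factor_b]
    return sorted(a & b), sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    shared, only_oc2, only_hnf6 = compare_binding("OC-2", "HNF-6alpha")
    print("bound by both:", shared)
    print("bound only by OC-2:", only_oc2)
    print("bound only by HNF-6alpha:", only_hnf6)
```

In this simplified view the two factors share four probes and differ only at the HNF-4 probe.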
To determine whether OC-2 might be involved in the network of liver transcription factors by controlling transcription of the hnf-3β gene, rat hepatoma FTO-2B cells were cotransfected with an OC-2 expression vector and a construct containing the luciferase reporter under the control of six copies of the HNF-6/OC-2 site found in the hnf-3β gene. Cells similarly transfected, but with HNF-6α instead of OC-2, were examined in parallel. As shown in Fig. 4A, OC-2 stimulated the reporter construct 50-fold and was as effective as HNF-6α. To compare the activities of HNF-6α and OC-2 on the wild-type hnf-3β promoter, we cotransfected, in FTO-2B cells, their expression vectors with a reporter construct containing the hnf-3β gene promoter (pHNF3β-luc) or the same promoter in which the HNF-6/OC-2 site has been destroyed by mutation (pHNF3βmut-luc). Fig. 4B shows that overexpression of OC-2 or HNF-6α stimulated transcription from the wild-type promoter 2-3-fold, but was without effect on the mutated promoter. Consistent with earlier results (22), the mutated promoter displayed a 50% reduction in basal activity compared with the wild-type promoter, probably as a result of the loss of action of endogenous ONECUT transcription factors. Finally, as expected from the lack of binding of OC-2 to the probe derived from the hnf-4 gene (Fig. 3), overexpression of OC-2 did not activate the hnf-4 promoter (pHNF4-0.7-luc) (Fig. 4B) in transfection experiments, whereas HNF-6α did. We conclude from these transfection experiments that OC-2 stimulates transcription of the hnf-3β gene. Both HNF-6 and OC-2 might therefore be key players in the network of liver transcription factors, each controlling a different set of genes.
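The -fold stimulation values discussed above rest on a two-step normalization: firefly luciferase activity is divided by Renilla luciferase activity from the internal control, and the resulting ratio is divided by the ratio obtained with the empty expression vector. The Python sketch below illustrates that calculation under stated assumptions. All readings are invented for illustration, the function name `fold_stimulation` is hypothetical, and computing the standard error across three replicate fold values is an assumption about how the means ± S.E. were derived.

```python
from math import sqrt
from statistics import mean, stdev


def fold_stimulation(factor_wells, empty_vector_wells):
    """Return (mean fold stimulation, standard error of the mean).

    Each argument is a list of (firefly, renilla) readings, one pair per replicate.
    """
    # Step 1: normalize firefly for Renilla (transfection efficiency control)
    baseline = mean([firefly / renilla for firefly, renilla in empty_vector_wells])
    # Step 2: express each replicate relative to the empty-vector baseline
    folds = [(firefly / renilla) / baseline for firefly, renilla in factor_wells]
    sem = stdev(folds) / sqrt(len(folds))
    return mean(folds), sem


if __name__ == "__main__":
    # Hypothetical readings (arbitrary light units), n = 3 replicates each
    empty_vector = [(100.0, 50.0), (120.0, 60.0), (90.0, 45.0)]
    with_factor = [(5000.0, 50.0), (6000.0, 60.0), (4000.0, 50.0)]
    fold, sem = fold_stimulation(with_factor, empty_vector)
    print(f"fold stimulation: {fold:.1f} +/- {sem:.1f}")
```

Normalizing to the empty-vector condition for each reporter separately is what allows a reporter with lower basal activity, such as the mutated promoter, to be compared with the wild-type promoter on the same scale.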
Conclusions. Human OC-2 is a mammalian member of the ONECUT class of homeodomain transcription factors whose sequence is very similar to that of HNF-6. On the basis of their respective tissue distribution, DNA binding specificities, and transcriptional activation properties, HNF-6 and OC-2 have different but overlapping functions. The expression of OC-2 in the retina and its chromosomal localization raise the question of the involvement of OC-2 in CORD1 (cone-rod dystrophy 1). In liver, OC-2 is expected to participate in the network of transcription factors that regulates differentiation and morphogenesis.