Cotranscription and Intergenic Splicing of Human Galactose-1-phosphate Uridylyltransferase and Interleukin-11 Receptor α-Chain Genes Generate a Fusion mRNA in Normal Cells
IMPLICATION FOR THE PRODUCTION OF MULTIDOMAIN PROTEINS DURING EVOLUTION*

In the past 10 years, much attention has been focused on transcription preinitiation complex formation as a target for regulating gene expression, and other targets such as transcription termination complex assembly have been less intensively investigated. We established the existence of poly(A) site choice and fusion splicing of two adjacent genes, galactose-1-phosphate uridylyltransferase (GALT) and interleukin-11 receptor α-chain (IL-11Rα), in normal human cells. This 16-kilobase (kb) transcription unit contains two promoters (the first one is constitutive, and the second one, 8 kb downstream, is highly regulated) and two cleavage/polyadenylation signals separated by 12 kb. The promoter from the GALT gene yields two mRNAs, a 1.4-kb mRNA encoding GALT and a 3-kb fusion mRNA when the first poly(A) site is spliced out and the second poly(A) site is used. The 3-kb mRNA codes for a fusion protein of unknown function, containing part of the GALT protein and the entire IL-11Rα protein. The GALT promoter/IL-11Rα poly(A) transcript results from leaky termination and alternative splicing. This feature of RNA polymerase (pol) II transcription, which contrasts with efficient RNA pol I and pol III termination, may be involved, together with chromosome rearrangements, in the generation of fusion proteins with multiple domains and would have major evolutionary implications in terms of natural processes to generate novel proteins with common motifs. Our results, together with the accumulation of genomic information, will stimulate new considerations and experiments in gene expression studies.

* This work was supported by INSERM, CNRS, Association pour la Recherche contre le Cancer, the European Union Biotechnology Research Program, Association Française contre les Myopathies, Groupement de Recherche et d’Etude sur les Génomes, and Boehringer Mannheim GmbH (Mannheim, Germany). The chromosome-specific gene library LL09NC01 used in this work was constructed at the Biomedical Sciences Division, Lawrence Livermore National Laboratory, Livermore, CA 94550 under the auspices of the National Laboratory Gene Library Project sponsored by the U.S. Department of Energy. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§ These two authors contributed equally to this work.
‖ Supported by a fellowship from the University Paris VII. To whom correspondence should be addressed: Lab. de Génétique Moléculaire de la Neurotransmission et des Processus Neurodégénératifs, CNRS UMR9923, Bat. CERVI, Hôpital de la Pitié Salpétrière, 83 bld. de l’Hôpital, 75013 Paris, France. Fax: 33-1-42-17-75-33; E-mail: pitiot@infobiogen.fr.
** Supported by a La Ligue contre le Cancer fellowship.
1 The abbreviations used are: GALT, galactose-1-phosphate uridylyltransferase; IL-11Rα, interleukin-11 receptor α-chain; kb, kilobase(s); RT-PCR, reverse transcription-polymerase chain reaction; pol, polymerase.
THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 273, No. 26, Issue of June 26, pp. 16005–16010, 1998. © 1998 by The American Society for Biochemistry and Molecular Biology, Inc. Printed in U.S.A.

Genome projects are underway in various model organisms, and sequencing of large genomic regions has already begun. Along with physical mapping data, the comparison of genomic sequences with expressed sequences obtained by sequencing total cDNA libraries will identify the relative positions of genes within large genomic regions (1). Several large-scale DNA studies have pointed out that many regions contain a high density of genes with related or unrelated physiological functions (2). A role for genomic disposition seems to be attested by the existence of gene clusters conserved during evolution, such as the homeobox gene clusters in animals and plants (3), the β-globin loci in mammals (4), and the cholinergic locus in animals (5). Furthermore, the identification of families of proteins suggests the involvement of common ancestral genes, duplications, mutations, and genomic rearrangements during evolution. An additional complexity is the association within a single protein of structural motifs derived from different families of proteins (6). However, the mechanism by which such hybrid molecules are generated during evolution is not understood. Whether gene disposition in the genome participates in the building of new proteins remains to be addressed.
In this study, we provide evidence that two adjacent human genes coding for unrelated proteins can be considered as a single transcription unit. The unexpected transcription unit is formed upon cotranscription and fusion splicing of genes encoding GALT(1) and IL-11Rα. GALT is a key soluble enzyme in the Leloir pathway for the conversion of galactose to glucose (7). Impairment of GALT results in galactosemia (8–10). The human GALT gene is organized into 11 coding exons spanning 4 kb of genomic DNA (11). A single 1.4-kb mRNA has been identified, and the corresponding cDNA encoding a 43-kDa protein has been cloned. GALT enzyme activity has been detected in all tissues examined, with different levels of activity at different stages of development and in different tissues (8). The GALT gene has been mapped to human chromosome 9p13 (12). The human IL-11Rα gene has also been assigned to chromosome 9p13 (13, 14). IL-11Rα is a member of the hematopoietin receptor superfamily (15–17). This chain of 48 kDa (18), together with the common transducing chain gp130, forms the high affinity receptor for the hematopoietic growth factor, interleukin-11 (19–21). Human IL-11Rα mRNA expression is restricted to hematopoietic and osteoblastic bone compartments (17). The human gene spans 8 kb. It is composed of 13 exons, and its intron-exon organization is consistent with the genomic structure pattern observed for the hematopoietin receptor superfamily (22).
The main significance of our results is twofold: they call for a reevaluation of gene disposition in gene expression studies, and they suggest that fusions such as GALT/IL-11Rα may be a means of generating new multidomain proteins during evolution.

EXPERIMENTAL PROCEDURES
Genomic Cloning-The GALT and IL-11Rα locus was isolated using a copy of the chromosome 9-specific cosmid library LL09NC01, which was constructed by Dr. J. Allmeman (Biomedical Sciences Division, Lawrence Livermore National Laboratory, Livermore, CA). The screening of orderly spotted colony filters was performed using standard hybridization procedures for Hybond N+ membranes (Amersham Pharmacia Biotech). Positive clones were tested to ensure that they were single colonies. Clones were grown in LB medium containing 20 μg/ml kanamycin. DNA was prepared using a plasmid DNA preparation kit (QIAGEN Inc.). One μg of DNA was digested for 2 h at 37°C with EcoRI or HindIII or both. The DNA was subjected to pulsed-field gel electrophoresis on a CHEF apparatus (Bio-Rad) in 1% agarose and 0.5× Tris borate/EDTA at 140 V for 10 h and a pulse time of 10 s. DNA samples were depurinated and alkali-transferred to nylon membranes (Amersham Pharmacia Biotech). Oligonucleotides (Gal1 (GALT exon 1) and B9E (IL-11Rα exon 2), see sequences below; B23 (GALT exon 10), GCAGCTGCCAATGGTTCCAG; B31 (IL-11Rα exon 1), TAGCTGGTGAGAGGAAGTCC; B22 (IL-11Rα exon 7), CATGCCCACAGGATCCCCTA; and B14 (IL-11Rα exon 13), GCTGAAAGGTGCTTGTACCTCT) were phosphorylated with [γ-32P]ATP and T4 kinase, hybridized at 45°C to Hybond N+ filters according to the manufacturer's recommended conditions, and washed in 0.1× SSC and 0.1% SDS at 45°C for 30 min.
Screening of cDNA Library-A human placenta cDNA library (provided by B. Lethé, Ludwig Institute for Cancer Research, Brussels, Belgium) containing inserts with an average size of 2.5 kb was screened using the human IL-11Rα probe following the experimental procedure of Chérel et al. (17). Three positive clones were isolated and analyzed by restriction mapping. One of these clones, A11, was fully sequenced.
RNase Protection Assays-PCR was performed with primers FG6 (5′-CTGGAACCATTGGCAGCTGC) and B4. The PCR product was purified and ligated into pNoTA/T7, resulting in pNoFG6-B4, and sequenced on both strands. The antisense RNA probe was synthesized using T7 RNA polymerase in the presence of [α-32P]UTP (800 Ci/mmol) using pNoFG6-B4 DNA as the template. The labeled RNA probe was treated with DNase I and purified by electrophoresis on an 8 M urea and 5% polyacrylamide gel. RNase protection assays were performed according to the manufacturer's instructions (RPAII kit, Ambion Inc., Austin, TX). Gels were used to expose x-ray film overnight, and autoradiographic signals were quantified on a PhosphorImager (Molecular Dynamics, Inc., Sunnyvale, CA).
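As a reading aid, the following minimal Python sketch checks how the probe primer relates to the hybridization oligonucleotides listed under Genomic Cloning. Comparing the two sequences given in the text suggests that FG6 is the reverse complement of B23 (GALT exon 10). This relationship is inferred from the sequences and is not stated explicitly by the authors, and the helper name `reverse_complement` is introduced here for illustration only.

```python
# Translation table mapping each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from the Experimental Procedures text.
B23 = "GCAGCTGCCAATGGTTCCAG"  # GALT exon 10 hybridization oligonucleotide
FG6 = "CTGGAACCATTGGCAGCTGC"  # GALT exon 10 primer used for the probe template

# The two oligonucleotides cover the same 20 positions on opposite strands.
print(reverse_complement(B23) == FG6)  # True
```

The same helper could be applied to any of the other listed oligonucleotides when designing a complementary primer.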
Cell Culture-The human histiocytic lymphoma cell line U937, the chronic myelogenous leukemia cell line K562, the osteogenic sarcoma cell line SAOS-2, the human myeloma cell line RPMI 8226, the human bone marrow metastatic neuroblastoma cell line SKNSH, and COS-7 monkey kidney cells were purchased from American Type Culture Collection (Rockville, MD). All cell lines except COS-7 cells were cultured in RPMI 1640 medium containing 10% fetal calf serum and 2 mM glutamine. COS-7 cells were grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum and 2 mM glutamine. Ba/F3 cells (provided by J.-F. Moreau, CNRS UMR5540, Bordeaux, France) were maintained in RPMI 1640 medium, 10% fetal calf serum, 2 mM glutamine, and 10% WEHI-3-conditioned medium as a source of IL-3 (24). The Kit 225 cell line (obtained from D. A. Cantrell, ICRF, London, United Kingdom) was cultured in RPMI 1640 medium containing 10% fetal calf serum, 20 ng/ml IL-2, and 2 mM glutamine. The T cell clones, derived from the skin of a patient suffering from acute graft-versus-host disease after allogeneic bone marrow transplantation from an HLA-mismatched related donor, were kindly provided by H. Vié (INSERM U463) (25).
Immunoblot Analysis-Confluent COS-7 cells (48 h after transfection) were scraped, pelleted, washed with phosphate-buffered saline (pH 7.5), and stored at −70°C. Cells were thawed at 0°C; resuspended in buffer containing 10 mM Tris-HCl (pH 7.5), 0.25 M sucrose, 2 mM MgCl2, and 0.1 mg/ml leupeptin; homogenized in a Dounce homogenizer; and centrifuged for 20 min at 2500 rpm at 4°C. The clarified supernatant (12,000 rpm, 20 min) was centrifuged again at 48,000 rpm for 45 min at 4°C to obtain the cytosolic protein lysate in a final volume of 500 μl. The cell pellet was resuspended in 200 μl of buffer containing 10 mM Tris-HCl (pH 7.5) and 0.25 M sucrose and centrifuged twice (12,000 rpm, 20 min), and the membrane proteins were resuspended in 60 μl of the same buffer. Protein concentration was determined by the BCA method (Pierce) with bovine serum albumin used as a standard. Membrane or cytosolic proteins (130 μg/lane) were resolved on a 7.5% SDS-polyacrylamide gel, transferred to polyvinylidene difluoride membranes (Millipore Corp., Bedford, MA), and probed with a rabbit polyclonal antibody (1:100) directed against a 20-amino acid peptide of murine IL-11R (Santa Cruz Biotechnology, Santa Cruz, CA). Detection of the antibody-antigen complex was achieved by the enhanced chemiluminescence procedure (ECL kit, Boehringer). Blots were used to expose X-Omat films (Eastman Kodak Co.) for 1 min to visualize immunoreactive bands.
GALT Activity Assay-Confluent COS-7 cells (48 h after transfection) were scraped, pelleted, and washed with phosphate-buffered saline (pH 7.5). Cells were lysed in 10 mM Tris/glycine buffer (pH 8.5) and 10 mM dithiothreitol by freezing and thawing once, followed by sonication for 2 min and centrifugation at 2000 rpm for 20 min at 4°C. GALT activity was measured by the consumption of UDP-Glu according to published procedures (27). Curves were fitted using linear regression (Mathematica software, Wolfram Research Europe Ltd., Oxfordshire, United Kingdom).

RESULTS
Identification of GALTΔ11/IL-11Rα cDNA-The human IL-11Rα gene has been widely studied, but its transcription start site and the 5′-untranslated region of the mRNA remained to be determined. To identify a cDNA containing the 5′-untranslated region, a probe corresponding to the human IL-11Rα cDNA was used to screen a placenta cDNA library. Three positive clones were obtained. The longest one, clone A11, was analyzed. Its sequence codes for the entire IL-11Rα protein. Surprisingly, this sequence was found to be fused to a part of the GALT cDNA sequence. Comparison of the A11 sequence with the GALT and IL-11Rα genomic sequences suggested that the mRNA corresponding to the A11 cDNA is the result of a particular splice event occurring between GALT exon 10 and IL-11Rα exon 2 (Fig. 1, inset). The absence of the last exon of GALT (exon 11) led to deletion of the 26 C-terminal amino acids of GALT in the putative fusion protein and eliminated the stop codon of GALT. Similarly, the absence of the 5′-untranslated region of IL-11Rα (exon 1) in A11, which could have introduced a stop codon or changed the reading frame, kept the nucleotide sequence in frame so that it also encoded a complete IL-11Rα protein. The 3′-end of GALT exon 10 was linked to IL-11Rα exon 2, which began in frame with the initiation codon. Thus, identification of the A11 cDNA suggests that transcripts encoding a GALTΔ11/IL-11Rα fusion protein, derived from the GALT and IL-11Rα proteins, may exist in human cells. However, at this stage, the possibility that the fusion resulted from a cloning artifact could not be eliminated.
Genomic Organization of GALT/IL-11Rα Locus-Mapping of both the GALT and IL-11Rα genes to human chromosome 9p13 suggested that the mRNA corresponding to the A11 cDNA resulted from cotranscription and intergenic splicing of the GALT and IL-11Rα genes. To investigate this possibility, a cosmid library from human chromosome 9, LL09NC01, was screened with the A11 probe. The nine positive clones were further characterized. Hybridization of positive clone DNA with the labeled oligonucleotides Gal1 (GALT exon 1) and B9E (IL-11Rα exon 2) revealed that three clones contained both genes. Pulsed-field gel electrophoresis restriction mapping of the clones and hybridization with oligonucleotides B23, B31, B22, and B14 as well as primers within the T3 and T7 Lawrist 16 vector arms revealed that the two genes were separated by 4 kb, giving a total genomic locus of 16 kb (Fig. 2). This supported the notion that the mRNA coding for the fusion protein GALTΔ11/IL-11Rα could be generated through a single process of RNA pol II transcription.
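The distances quoted for the locus can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that each promoter sits at the 5′-end of its gene and each poly(A) signal at the 3′-end, and it uses the approximate gene spans given in the text (4 kb for GALT, 8 kb for IL-11Rα, 4 kb between them). The variable names are introduced here and do not come from the paper.

```python
# Approximate sizes in kb, as reported in the text.
GALT_SPAN = 4       # GALT gene
INTERGENIC = 4      # distance separating the two genes
IL11RA_SPAN = 8     # IL-11R alpha gene

# Assumed layout: promoter at the 5' end and poly(A) signal at the 3' end
# of each gene, with the GALT promoter taken as position 0.
galt_promoter = 0
galt_poly_a = galt_promoter + GALT_SPAN
il11ra_promoter = galt_poly_a + INTERGENIC
il11ra_poly_a = il11ra_promoter + IL11RA_SPAN

locus_length = il11ra_poly_a - galt_promoter        # whole transcription unit
promoter_spacing = il11ra_promoter - galt_promoter  # between the two promoters
poly_a_spacing = il11ra_poly_a - galt_poly_a        # between the two poly(A) signals

print(locus_length, promoter_spacing, poly_a_spacing)  # 16 8 12
```

Under these assumptions the layout reproduces the 16-kb locus, the 8-kb spacing between promoters, and the 12-kb spacing between cleavage/polyadenylation signals.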
Analysis of GALTΔ11/IL-11Rα Intergenic Transcripts in Normal Tissues and Cell Lines-To further confirm the existence of the intergenic transcripts, RT-PCRs were performed using primers designed to amplify the mRNA sequence spanning the junction region of the potential GALTΔ11/IL-11Rα intergenic transcript. Primers corresponding to the start codon portion of GALT (Gal1; Fig. 1) and IL-11Rα exon 3 (B4; Fig. 1) resulted in amplification of a 1203-base pair product, detectable by UV illumination and confirmed by Southern blotting, in five different cell types and in four different tissues. In normal tissues, the intergenic transcripts were present at high levels in fetal bone marrow and at low levels in brain, liver, and small intestine. In cultured cells, the fusion transcript was highly expressed in T cell clones; moderately expressed in osteosarcoma SAOS-2 cells, neuroblastoma cell line SKNSH, and myeloma cell line RPMI 8226; and weakly expressed in histiocytic lymphoma cell line U937 (Fig. 3). The nucleotide sequences of RT-PCR products were consistent with previous findings in the human placental cDNA clone A11. This demonstrated that the fusion mRNA was produced in normal tissues and cell lines and that the A11 cDNA resulted neither from a particular mutation or genomic rearrangement in the individual from whom the placenta cDNA library was generated nor from a cloning artifact.
Expression Analysis of GALTΔ11/IL-11Rα Intergenic Transcript by RNase Protection Assay-RNase protection analyses were performed to investigate whether the intergenic transcript is expressed as a bona fide mRNA species in human cells (Fig. 4). An antisense RNA probe (366 nucleotides) was synthesized from a DNA template constructed by PCR using primer FG6 from GALT exon 10 and primer B4 from IL-11Rα exon 3 as indicated in Fig. 1. We expected to detect three fragments protected by this probe: the intergenic mRNA should protect a 266-nucleotide fragment, the IL-11Rα mRNA should protect a 142-nucleotide fragment, and the GALT mRNA should protect a 124-nucleotide fragment (Fig. 4B). RNase protection analysis detected intergenic and GALT mRNAs in six different T cell clones (LT1-LT6) and in the osteosarcoma SAOS-2 cell line. IL-11Rα mRNA was detectable only in two T cell clones (LT3 and LT4) and in SAOS-2 cells (Fig. 4A). These results were consistent with RT-PCR analysis.
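The expected fragment sizes follow from how much of the probe each mRNA can base-pair with. The minimal sketch below illustrates that logic using the lengths reported in the text. It assumes that the 124-nucleotide portion derives from GALT exon 10 and the 142-nucleotide portion from the IL-11Rα exons, and that the remainder of the 366-nucleotide probe does not hybridize to any of the three mRNAs (for example, vector-derived sequence); that last point is an assumption, not a statement from the paper.

```python
PROBE_LENGTH = 366  # full-length antisense probe, nucleotides
GALT_PART = 124     # probe segment complementary to GALT exon 10
IL11RA_PART = 142   # probe segment complementary to IL-11R alpha sequence


def protected_fragments(galt_part: int, il11ra_part: int) -> dict:
    """Return the fragment length protected by each mRNA species."""
    return {
        # The fusion mRNA pairs with both segments across the splice junction.
        "intergenic": galt_part + il11ra_part,
        # Each single-gene mRNA pairs with only its own segment.
        "IL-11Ralpha": il11ra_part,
        "GALT": galt_part,
    }


fragments = protected_fragments(GALT_PART, IL11RA_PART)
# Probe sequence not accounted for by any protected fragment (assumed
# to be non-hybridizing and therefore digested by RNase).
unprotected = PROBE_LENGTH - fragments["intergenic"]

print(fragments)    # {'intergenic': 266, 'IL-11Ralpha': 142, 'GALT': 124}
print(unprotected)  # 100
```

Because the intergenic fragment is the sum of the two single-gene fragments, the three species can be resolved on the same gel and quantified from one lane.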
Relative quantification showed that the GALTΔ11/IL-11Rα transcripts in T cell clones represented ~4% of GALT transcripts (average of six clones) and 150% of IL-11Rα transcripts (average of two clones). In SAOS-2 cells, intergenic transcripts comprised <3% of GALT transcripts and 10% of IL-11Rα transcripts. We used the GALT-protected fragment signal as an internal control of RNA quality and quantity; the level of the GALT-protected fragment in total RNA from K562 cells was too low to evaluate other protected transcripts.
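The two percentages reported for each cell type also imply a rough ratio between IL-11Rα and GALT transcripts. The sketch below works through that arithmetic as an illustration only. It treats the reported averages as if they came from the same samples, which is an approximation, since the T cell values are averaged over different numbers of clones, and the SAOS-2 figure is an upper bound because the fusion level is given as less than 3% of GALT.

```python
def implied_ratio(fusion_over_galt: float, fusion_over_il11ra: float) -> float:
    """Return IL-11R alpha transcripts as a fraction of GALT transcripts.

    If fusion = a * GALT and fusion = b * IL11RA, then IL11RA / GALT = a / b.
    """
    return fusion_over_galt / fusion_over_il11ra


# T cell clones: fusion is ~4% of GALT and 150% of IL-11R alpha.
t_cell = implied_ratio(0.04, 1.50)

# SAOS-2: fusion is <3% of GALT and 10% of IL-11R alpha (upper bound).
saos2_upper = implied_ratio(0.03, 0.10)

print(round(t_cell * 100, 1))       # 2.7 (percent of GALT)
print(round(saos2_upper * 100, 1))  # 30.0 (percent of GALT, upper bound)
```

On this rough estimate, IL-11Rα mRNA is far less abundant than GALT mRNA in T cell clones, which is in line with the low, tissue-restricted expression of IL-11Rα described in the text.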
We thus showed that, in human cells, RNA pol II can cotranscribe two independent genes and that a processed mRNA is produced from the resulting pre-mRNA via an intergenic splicing event. The fusion mRNA is produced at low levels, but at levels higher than those usually described for illegitimate transcription (28). Having established this property of RNA pol II transcription, we next examined whether the phenomenon could play a role at the protein level.
Analysis of Fusion Protein Products-Based on their sequence, GALTΔ11/IL-11Rα transcripts may encode a fusion protein with multiple domains, the first consisting of the GALTΔ11 product followed by the signal peptide of IL-11Rα and the mature IL-11Rα chain with its Ig-like, cytokine receptor-like, transmembrane, and cytoplasmic domains. This hybrid molecule should contain two hydrophobic regions, raising questions about its folding and anchorage to the membrane. To examine these points, we performed Western blot analysis on total protein lysates (100–500 μg) from cells expressing fusion mRNA (T cell clones, T cell lymphoma-derived cell line Kit 225, and osteosarcoma cell line SAOS-2) and from COS-7 cells transfected or not with pKCSRα-IL-11Rα or pKCSRα-GALTΔ11/IL-11Rα. Immunoblotting with an antibody raised against an N-terminal peptide from murine IL-11Rα detected only an 85-kDa protein in COS-7 cells expressing GALTΔ11/IL-11Rα (data not shown). Additional experiments with membrane or cytosolic protein lysates revealed that the 85-kDa protein is present in the membranous fraction of COS-7 cells expressing GALTΔ11/IL-11Rα and that a doublet of 48 and 50 kDa is present in the membranous fraction of COS-7 cells expressing IL-11Rα (Fig. 5).
These apparent sizes are consistent with the predicted molecular mass of the protein encoded by the GALTΔ11/IL-11Rα transcript (85 kDa) and our previous Western blot data using cell membrane lysates from Ba/F3 cells expressing IL-11Rα (18). We checked that cell membrane lysates were equally loaded on the gel by immunoblotting with a high affinity-purified antiserum specific for the α-subunit of Gi2 (29) (data not shown).
These results demonstrate that the fusion mRNA is translatable to an 85-kDa GALTΔ11/IL-11Rα fusion protein that is produced as a membrane-associated protein detectable only in transfected cells. They also show that the fusion mRNA produces an 85-kDa protein that is not further processed by endoproteolytic cleavage of the signal peptide of IL-11Rα to generate a shorter protein that could be a functional IL-11Rα.
Activity of Fusion Protein-We then investigated whether this fusion protein had both GALT and IL-11Rα activities. GALT activity of the fusion protein was unlikely because a stop codon mutation in the GALT gene, E340X (14 amino acids less than GALTΔ11), was reported in a galactosemic patient (30). This was confirmed by measuring GALT activity in whole cell lysates from COS-7 cells transfected with pKCSRα-GALTΔ11/IL-11Rα, pKCSRα-GALT, pKCSRα-GALTΔ11, or pKCSRα-IL-11Rα (Fig. 6). We detected no transferase activity in cells producing GALTΔ11 or GALTΔ11/IL-11Rα protein, whereas we did detect activity in cells producing GALT. Control experiments were performed with cells producing IL-11Rα.
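The comparison between GALTΔ11 and the E340X truncation can be checked with residue counts. The sketch below is illustrative: the 26-residue deletion and the E340X position come from the text, whereas the full length of human GALT (379 amino acids) is not given in the paper and is supplied here as an outside assumption.

```python
GALT_FULL_LENGTH = 379    # assumption: human GALT length, not stated in the text
DELETED_C_TERMINAL = 26   # residues lost when exon 11 is absent (from the text)
E340X_STOP_CODON = 340    # codon converted to a stop in the patient allele

# GALT portion retained in the fusion protein.
galt_delta11_length = GALT_FULL_LENGTH - DELETED_C_TERMINAL

# A stop at codon 340 leaves residues 1-339.
e340x_length = E340X_STOP_CODON - 1

difference = galt_delta11_length - e340x_length
print(galt_delta11_length, e340x_length, difference)  # 353 339 14
```

With that assumed length, the E340X product is 14 residues shorter than GALTΔ11, matching the figure quoted in the text. This is a comparison between two truncations of similar size and does not by itself establish that GALTΔ11 is inactive, which is why the activity was measured directly.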
Similarly, the question arose as to whether the GALTΔ11/IL-11Rα protein would form a functional IL-11 receptor with gp130. We investigated this using the same approach we used to show that Ba/F3 cells producing both components of the IL-11 receptor complex, i.e. IL-11Rα and gp130, become IL-11-dependent (18). Experiments were performed using GALTΔ11/IL-11Rα instead of IL-11Rα. We repeated this experiment three times but were unable to obtain any IL-11-dependent cotransfected Ba/F3 cells (data not shown), suggesting that the fusion protein is not functional as a co-receptor.
In conclusion, the results presented here strongly support the existence of a fusion mRNA that results from the cotranscription of two distinct genes and codes for a putative fusion protein. However, our experiments detected neither the protein in nontransfected cells nor any activity in transfected cells.

DISCUSSION
The study of the IL-11Rα gene led us to identify a cDNA clone containing part of the GALT mRNA sequence linked in phase with the full IL-11Rα coding sequence. Based on RT-PCR and RNase mapping experiments, we established that significant amounts of the corresponding GALTΔ11/IL-11Rα fusion mRNA are present in normal human cells. These results demonstrate that RNA pol II complexes may cotranscribe two nearby unrelated genes in the human genome. Thus, two mRNAs are generated from the GALT promoter, a 1.4-kb mRNA encoding GALT and a 3-kb fusion mRNA encoding part of GALT and the entire IL-11Rα chain. The existence of the fusion transcript is probably due to leaky cleavage/polyadenylation at the GALT poly(A) site and an alternative splicing event between the GALT exon 10 donor splice site and the IL-11Rα exon 2 acceptor splice site. The biological significance of the fusion mRNA in human tissues remains to be established.
The formation of fusion proteins by readthrough mechanisms is a recurrent phenomenon in the life cycle of several viruses. Such proteins arise from translational readthrough of at least the stop codon of the first cistron (31, 32). Transcription termination readthrough generating polycistronic mRNA is also frequent (33–35), but the first cistron is the only one to be translated in the absence of translational readthrough, as demonstrated for the paramyxoviruses (36). This transcriptional readthrough could be mediated by cellular and viral factors (37, 38) and could participate in the regulation of gene expression (38–40) as well as in the formation of fusion proteins. Furthermore, retroviruses seem to evolve using transcriptional readthrough to integrate cellular sequences close to the provirus (41). Notably, with respect to mRNA formation, the structure of the transcription unit we identified is similar to that of the adenovirus major late promoter/E3 late transcription unit. Among the numerous mRNAs initiated at the major late promoter, some splice out the exons containing the poly(A) signal located upstream of the E3 promoter and are spliced to E3 gene exon 2 (42). However, it has not been reported that this process could generate a new fusion protein from two independent genes as described in this study. Furthermore, here we report transcriptional readthrough in the human genome. Up to now, the only data supporting such a mechanism in eukaryotic genomes came from the identification of a fusion transcript between the two genes MDS1 and EVI1, occurring in normal human tissues (43). In that work, quantitative experiments and analysis of putative fusion products were not addressed. By analogy with viruses, we hypothesized a role for the cotranscription process in eukaryotic genomes. We first addressed this question at the protein level.
Transfection experiments showed that this fusion mRNA is translated into an 85-kDa protein that is associated with the cell membrane and devoid of both expected biological activities; nevertheless, in nontransfected cells where the mRNA is present, such as T lymphocytes and T lymphoma, the hybrid protein is undetectable by Western blot analyses. It is important to consider that IL-11Rα is also undetectable by Western blotting in IL-11-responsive cells and that, to date, the receptor chain has been identified only by Scatchard plot analysis (44). In the same way, the fusion protein could be expressed at extremely low levels and have a novel function different from GALT and IL-11Rα biological activities. Another possibility is that the hybrid protein is not active and corresponds to an intermediary step in the generation of a new protein. Indeed, this hybrid molecule contains two internal hydrophobic domains, raising the possibility that some nascent proteins fail to fold correctly and are rapidly degraded in the endoplasmic reticulum (45) before being targeted to the cell membrane. This view is supported by the lower expression of the fusion protein compared with that of the IL-11Rα protein, observed in three different transfection experiments. The fusion mRNA would then represent an exploratory event for evolution where a new protein is created and tested. It would be of interest to check whether this phenomenon is general in eukaryotic genomes, where recent physical mapping data indicate that many regions contain a high density of genes. If this is the case, one could speculate about whether there is a "second reading" of a genome where fusion proteins from independent genes are produced at low levels in cells. This reservoir of function could then be selected under particular conditions.
In that case, inefficient RNA pol II termination, which contrasts with efficient RNA pol I and III termination, may have been selected during evolution, together with chromosome rearrangements, as a mechanism for generating new multidomain proteins. Furthermore, a single mutational event at the acceptor splice site or at the poly(A) signal of the last exon of the first gene could generate large amounts of a hybrid protein.
Another possibility is that the fusion mRNA has a function by itself, not implicating the production of a fusion protein. We can hypothesize that transcript elongation by RNA pol II could generate transcriptional interference with the IL-11Rα promoter. The GALT gene, which is strongly expressed in all cells, is located upstream from the IL-11Rα gene, which is expressed at low levels with strong tissue specificity. This generates a paradoxical situation in which RNA pol II complexes are continually progressing along the IL-11Rα promoter in all cell types, but very little, if any, IL-11Rα mRNA is produced. The peculiarity of RNA pol II termination could down-regulate IL-11Rα-specific transcription. Other studies demonstrated negative effects of one promoter on another for RNA pol I (46) and for RNA pol II (47, 48). The same phenomenon appears to be used by some viral genomes. Thus, the relative positions of genes in the genome could be an important element of gene regulation.
In conclusion, we have demonstrated the production of a fusion GALTΔ11/IL-11Rα mRNA in human tissues and cell lines by a mechanism of cotranscription and intergenic splicing. This mRNA is translatable into a fusion protein (GALTΔ11/IL-11Rα) that is associated with the cell membrane. The proximity of the GALT and IL-11Rα genes and the existence of a fusion mRNA open the way for new investigations of galactosemic patients with chromosome 9p13 deletions. More generally, our results suggest that gene disposition throughout the genome, as well as transcription termination efficiency, has to be considered in gene expression studies. Cotranscription of flanking units may not be limited to the GALT/IL-11Rα locus and may be of general importance.