The Human Rh50 Glycoprotein Gene

The Rh (Rhesus) protein family comprises Rh50 glycoprotein and Rh30 polypeptides, which form a complex essential for Rh antigen expression and erythrocyte membrane integrity. This article describes the structural organization of the Rh50 gene and identification of its associated splicing defect causing Rhnull disease. The Rh50 gene, which maps at chromosome 6p11–21.1, has an exon/intron structure nearly identical to Rh30 genes, which map at 1p34–36. Of the 10 exons assigned, conservation of size and sequence is confined mainly to the region from exons 2 to 9, suggesting that RH50 and RH30 were formed as two separate genetic loci from a common ancestor via a transchromosomal insertion event. The available information on the structure of RH50 facilitated the search for candidate mutations underlying the Rh deficiency syndrome, an autosomal recessive disorder characterized by mild to moderate chronic hemolytic anemia and spherostomatocytosis. In one patient with the Rhnull disease of regulator type, a shortened Rh50 transcript lacking the sequence of exon 7 was detected, while no abnormality was found in transcripts encoding Rh30 polypeptides and Rh-related CD47 glycoprotein. Amplification and sequencing of the genomic region spanning exon 7 revealed a G → A transition in the invariant GT motif of the donor splice site in both Rh50 alleles. This splicing mutation caused not only a total skipping of exon 7 but also a frameshift and premature chain termination. Thus, the deduced translation product contained 351 instead of 409 amino acids, with an entirely different C-terminal sequence following Thr315. These results identify the donor splicing defect, for the first time, as a loss-of-function mutation at the RH50 locus and pinpoint the importance of the C-terminal region of Rh50 in Rh complex formation via protein-protein interactions.

Apart from being a structural unit of Rh antigen expression, the Rh50 and Rh30 proteins appear to possess some hitherto undefined roles essential for the function and integrity of plasma membranes. This proposal is highlighted primarily by the occurrence of Rh deficiency syndrome, a rare autosomal recessive disorder characterized by a chronic hemolytic anemia of varying severity, a hereditary spherostomatocytosis, and multiple membrane abnormalities (1)(2)(3). The Rh deficiency syndrome exists in two conditions in which a complete absence of all Rh antigens defines the Rh null status and a barely detectable presence defines the Rh mod phenotype (11,12). Both conditions exhibit an absence or weakened expression of several other membrane glycoproteins or associated antigens, including Rh50, CD47, LW, Duffy (Fy5), and glycophorin B (GPB for SsU) (1)(2)(3). Therefore, the Rh deficiency syndrome can be regarded as a disorder of impaired protein-protein interactions.
As shown by family studies, Rh deficiency is almost invariably associated with consanguinity and can occur on different genetic backgrounds (11,12). The amorph type of Rh null is thought to arise by silencing mutations at the RH30 locus encoding RhD and RhCE polypeptides, but its underlying molecular defect remains to be determined (13)(14)(15). In contrast, the regulator Rh null and Rh mod phenotypes are considered to result from suppressor or "modifier" mutations independent of the RH30 locus (16). The genuine interaction of Rh50 with Rh30 proteins in Rh complex formation points to the RH50 locus as a primary candidate responsible for the suppressor forms of Rh deficiency. To facilitate the identification of such suppressor mutations, the organization of the Rh50 gene has now been delineated. Here, I describe the exon/intron structure of the Rh50 gene and identification of its associated splicing defect as a loss-of-function mutation in one Rh null patient. The findings reported herein correlate the disease phenotype with an impaired Rh complex formation and provide evidence for the importance of the C-terminal region of Rh50 in protein-protein interactions.

EXPERIMENTAL PROCEDURES
Blood Samples-Blood samples from normal human blood donors with RhD-positive (RhD+) and RhD-negative (RhD−) phenotypes (defined by DCe/DCe and dce/dce genotypes) were used as controls. The Rh null blood sample was obtained from a Japanese patient (T. T.). Preliminary studies showed that the propositus was a homozygote for the regulator type of Rh null disease and no Rh antigen was detectable by serologic testing. Furthermore, Southern blot analysis demonstrated that the RH30 locus was grossly intact without apparent gene deletion or rearrangement (15).
* This work was supported in part by National Institutes of Health Grant HL54459. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™ ...
Nucleic Acid Isolation and Southern Analysis-Total RNA was isolated from reticulocyte polysomes using the differential cell lysis method (17), followed by extraction with the Trizol reagent (Life Technologies, Inc.). Genomic DNA was prepared from leukocyte pellets, as described previously (18). Southern blot analysis was performed using Rh50, Rh30, and CD47 cDNA probes generated with gene-specific primers (see below) and labeled with [α-32P]dCTP (NEN Life Science Products).
Characterization of Exon/Intron Structure of the Rh50 Gene-To determine the structural organization of the Rh50 gene, genomic DNA from a normal person was digested separately with restriction endonucleases EcoRV, HincII, PvuII, SmaI, SspI, and StuI. The total digests of each restriction enzyme were ligated to the same adaptor to generate a genomic library using the Marathon amplification kit (CLONTECH). The exon and its adjacent intron sequences were then amplified in two steps using the Taq DNA polymerase chain reaction (PCR) (19). The first step employed the adaptor primer (AP1) and a Rh50 gene-specific primer (GSP1), whereas in the second step, nested AP2 and GSP2 were used. The resultant products were analyzed by agarose gel and sequenced after purification by 5% polyacrylamide gel electrophoresis. When new sequence information became available, new primers were designed for further bidirectional walking (Table I).
FIG. 1 (partial caption). ... (Table I). Shown below is a representative 1.8% agarose gel electrophoresis of sequenced genomic products. The exon (E)/intron (IVS) content and restriction enzyme usage of amplified fragments are indicated. Note that lanes 8 and 13 are products amplified directly from total genomic DNA, which encompass the whole intron 4 and intron 6, respectively. Bands seen in other lanes each were obtained by two rounds of PCR using AP1+GSP1 and AP2+GSP2 (Table I). The size of coding sequence for each exon (in base pairs) is shown; exon 1 is counted from ATG and exon 10 ends before the TAA codon (marked by asterisks).
Direct Nucleotide Sequencing and Sequence Analysis-All amplified cDNA and genomic DNA products were purified by native 5% polyacrylamide gel electrophoresis and sequenced with either amplimers or nested primers. Nucleotide sequence determination was carried out using fluorescent dye-tagged chain terminators on an automated DNA sequencer (model 373A, Applied Biosystems). The resultant nucleotide sequences were analyzed by the DNASIS program (Hitachi), and the deduced amino acid sequences were assessed for hydropathy character using the Kyte-Doolittle plotting method (22).
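The hydropathy assessment named above can be illustrated with a minimal sketch of a Kyte-Doolittle sliding-window profile. The scale values are the published Kyte-Doolittle indices; the function name `hydropathy_profile`, the window size, and the toy peptide are assumptions made for illustration only and are not taken from this study (the peptide is not an Rh50 sequence, and the DNASIS settings actually used are not stated).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence.

    The default window of 19 residues is an assumption (a size commonly
    used when looking for membrane-spanning segments).
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Hypothetical toy peptide: a hydrophobic stretch followed by an acidic one.
toy_peptide = "LLLLLLLLLLDDDDDDDDDD"
profile = hydropathy_profile(toy_peptide, window=5)
```

In such a profile, a sustained run of high positive values suggests a candidate transmembrane segment, whereas a stretch without a continuous hydrophobic run scores low.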

RESULTS
Organization of Rh50 Gene and Comparison with Rh30 Gene-To delineate the structural organization of the Rh50 gene, a bidirectional walking approach was taken to retrieve unknown sequences (Fig. 1A). Forty synthetic primers that cover various coding sequences (Table I) were used in combination to amplify the adaptor-ligated, restriction enzyme-specific genomic libraries. Fig. 1A shows a representative panel of the resultant Rh50 gene products, each spanning a unique exon/intron junction. They range in size from several hundred base pairs (bp) to several kilobase pairs, depending on the distribution of restriction sites. Sequencing of these amplified products revealed the features of the Rh50 gene and confirmed no coamplification from the related Rh30 genes.
The translated sequence of Rh50 was found to be distributed in 10 exons whose size ranges from 15 (exon 10) to 184 bp (exon 2) (Fig. 1B). This global organization is strikingly similar to that of the Rh30 genes (23, 24) and is essentially conserved in the Rh50 homologues from the mouse and Caenorhabditis elegans (Footnote 2). Comparison of Rh50 with Rh30 showed that their sequence homology is confined mainly to exons 2-9, whereas their 5′ or 3′ regions share little or no sequence similarity. The size of all internal exons except exons 7 and 8 was conserved, and exon 2 of Rh50 was missing codon AGT for Ser99, which is present in Rh30 genes (5-7). Thus, Rh50 and Rh30 show the same assignment of exon/intron junctions except for a difference in their exon 7/exon 8 boundaries (Fig. 1B). The 5′ region of Rh50 has several putative cis-acting elements (Fig. 2), including the TATA boxes that are absent from the proximal promoter of both RhD and RhCE (23,24). Multiple transcription initiation sites were identified between the two Ets elements. This mapping result was consistent with the assignment of ATG initiation codon noted in bone marrow Rh50 mRNAs (8), although the genomic sequence indicated a potential occurrence of another in-frame ATG codon upstream (nt −96 to −94) (Fig. 2). A detailed study of the Rh50 gene, including the mapping of its introns and dissection of its promoter activity and transcription initiation sites, will be described elsewhere (Footnote 3).
a Suffixes "a" and "s" denote antisense (reverse) and sense (forward) primers, whereas numbers 1 and 2 indicate GSP1 and GSP2, respectively. E10 primers are all located in the 3′-untranslated region downstream of TAA stop codon.

b All nucleotide positions are counted from the first base of ATG initiation codon (Fig. 2).

FIG. 2. Nucleotide sequence of the 5′ portion of the Rh50 gene. The 5′ region and exon sequences are shown by uppercase letters and the intron 1 sequence by lowercase letters. For brevity, the intron 1 sequence, which is greater than 15 kilobase pairs in size, is omitted (shown by dots). Putative cis-acting motifs in the promoter are marked and underlined. Note that TATA boxes are not found in the promoter of either the RhD or the RhCE gene (23,24). Note also that there is a strong strand asymmetry in the region. Multiple transcription initiation sites occur between the two putative Ets binding sites (see Footnote 3). The first position of ATG codon assigned for translation initiation of the erythroid-specific Rh50 protein (8) is denoted. The encoded amino acids of exon 1 and exon 2 (partial) are shown below the nucleotide sequence.
Sequence of Splice Sites and Exon/Intron Junctions in the Rh50 Gene-Fig. 3 schematically shows the nucleotide sequence of splice sites as well as the structure of exon/intron junctions in the Rh50 gene. All the 5′ donor and 3′ acceptor splice sites conform to the "GT-AG" rule and possess the consensus splicing signals (25). Of the 10 exons identified, only exon 6 is symmetrical, having intraexon codons GTT (Val270) and ACT (Thr315) at its 5′ and 3′ ends, respectively, whereas the other exons have either one or two split interexon codons (Fig. 3). One potential consequence of this type of exon/intron arrangement is that skipping of any single internal exon, except exon 6, during the splicing of Rh50 pre-mRNA would result in a shift in open reading frame and, therefore, alter the encoded amino acid sequence downstream of the skipped exon.
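The reading-frame argument above reduces to a divisibility check: skipping an internal exon preserves the downstream frame only if the exon length is a multiple of three. The following is a minimal sketch under stated assumptions. The exon 2 (184 bp) and exon 7 (122 bp) sizes are those reported in the text; the exon 6 size is derived here from its reported boundary codons (Val270 through Thr315, i.e. 46 whole codons) rather than quoted from the text, and the function name is hypothetical.

```python
def skip_preserves_frame(exon_length_bp: int) -> bool:
    """True if removing an exon of this length leaves the reading frame intact."""
    return exon_length_bp % 3 == 0


# Exon sizes in base pairs. Exon 6 is DERIVED (assumption): 46 codons x 3 nt.
exon_lengths = {
    "exon 2": 184,
    "exon 6": (315 - 270 + 1) * 3,
    "exon 7": 122,
}

frame_after_skip = {
    name: skip_preserves_frame(length) for name, length in exon_lengths.items()
}

for name, preserved in frame_after_skip.items():
    outcome = "in-frame deletion" if preserved else "frameshift"
    print(f"{name}: {exon_lengths[name]} bp -> {outcome}")
```

Under these figures only exon 6 would be removed without shifting the frame, which agrees with the statement that it is the sole symmetrical exon.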
Expression of Rh50, Rh30, and CD47 mRNAs in Normal and Rh null Cells-To identify the molecular defect underlying the Rh null disease, the expression of candidate genes encoding the Rh50, Rh30, and CD47 proteins was characterized by RT-PCR and nucleotide sequencing. The full-length cDNA of Rh30 or CD47 was readily detectable in normal and Rh null erythroid cells (gels not shown), indicating a comparable expression of the corresponding mRNA. Sequencing showed that the Rh30 or CD47 cDNA from Rh null was normal and that the Rh30 cDNA contained both RhD and RhCe, indicating that the patient was a DCe/DCe homozygote. Definition of this Rh genotype by transcript analysis was in full agreement with the result of DNA typing by SphI polymorphisms (15). These data showed that the RH30 or CD47 locus itself is not responsible for the disease phenotype.
However, RT-PCR analysis of Rh50 gene expression in erythroid cells revealed an important difference between the normal and Rh null patient. Although there was no apparent change in size of the 5′ portion of Rh50 cDNA encompassing exons 1-5, the 3′ portion of Rh50 cDNA encompassing exons 4-10 always showed a truncation in the Rh null patient (Fig. 4A). This finding indicated that the Rh50 mRNA from Rh null could be an aberrantly spliced form lacking a portion of the 3′ sequence. Indeed, sequencing showed that the 122-bp sequence of exon 7 was excluded from the truncated cDNA, resulting in the connection of exon 6 to exon 8 (Fig. 4B). To determine whether the skipping was complete or partial, a 3′ RACE reaction was carried out using 7s and 3′-UTR primers. A cDNA product of expected size (376 bp) was found in normal controls but not in the Rh null patient (Fig. 4C), indicating that no splicing of exon 7 occurred for the Rh50 primary transcript. Further studies showed that this exon skipping was not seen in 15 normal subjects nor in other Rh null patients examined; thus, it could not be a constitutive splicing or regulated alternative splicing event.
Identification of Rh null-associated Donor Splice Site Mutation in Rh50 Gene-The complete absence of exon 7 associated with Rh50 cDNA suggested strongly that either a splicing defect or a genomic deletion was present in the cognate gene. To define the nature of the underlying mutation, amplification from Rh null genomic DNA of a segment encompassing exon 7 of the Rh50 gene was attempted. A fragment of 354 bp in size was detected, excluding the possibility of gene deletion. Sequencing of this fragment on both strands led to the identification of a single G → A mutation in the invariant GT element (+1 position) of the 5′ donor splice site attached to exon 7 (Fig. 5A). Sequencing of other exon/intron junctions amplified with intron-specific primers (data not shown) confirmed this mutation to be the only structural alteration in the Rh50 gene.
Because the mutation abolished a PmlI restriction site (CAC↓GTG) (Fig. 3) and introduced a novel NlaIII site (↓CATG), a direct diagnostic assay was performed on amplified exon 7-containing fragments. The two enzymes showed an opposite cleavage pattern in normal and Rh null fragments (Fig. 5B), confirming the mutation at the splicing junction. To demonstrate that loss of the PmlI site was not caused by spurious PCR mutations, a Southern blot of native genomic DNAs was hybridized with a probe spanning the exon 7/intron 7 junction. As shown, the PmlI-specific band was seen in normal but not in Rh null (Fig. 5C). Given the observation of no dosage reduction in RH50, these results confirmed that the patient is homozygous for the G → A splicing mutation. Such a genotype assignment is consistent with the inheritance of Rh null syndrome in an autosomal recessive fashion.
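The opposite cleavage pattern of the two enzymes follows directly from the recognition sequences, as the following minimal sketch shows. It assumes that the six bases of the PmlI site straddle the exon 7/intron 7 junction, so that the mutated G is the fourth base of CACGTG, as implied by a +1 donor-site change that turns CACGTG into CACATG. Only those six bases are modeled; the flanking genomic sequence is not reproduced, and the helper names are hypothetical.

```python
PMLI_SITE = "CACGTG"   # PmlI recognition sequence
NLAIII_SITE = "CATG"   # NlaIII recognition sequence


def substitute(sequence: str, index: int, new_base: str) -> str:
    """Return a copy of the sequence with one base replaced (0-based index)."""
    return sequence[:index] + new_base + sequence[index + 1:]


def diagnostic_sites(sequence: str) -> dict[str, bool]:
    """Report which of the two diagnostic recognition sites occur in the sequence."""
    return {
        "PmlI": PMLI_SITE in sequence,
        "NlaIII": NLAIII_SITE in sequence,
    }


# Assumed junction context: exon 7 ends ...CAC, intron 7 begins GTG...
wild_type_junction = "CACGTG"
# G -> A at the +1 position of the donor splice site (0-based index 3).
mutant_junction = substitute(wild_type_junction, 3, "A")

wild_type_pattern = diagnostic_sites(wild_type_junction)
mutant_pattern = diagnostic_sites(mutant_junction)
```

A single base change thus removes one site and creates the other, so a fragment from a homozygous patient should resist PmlI and gain an NlaIII cut, while the normal fragment behaves in the opposite way.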
Deduced Primary Sequence and Predicted Membrane Topology of Rh50 Mutant Protein-To gain information on the primary structure of Rh50 glycoprotein, the Rh null-associated Rh50 cDNAs were sequenced to completion. Compared with normal Rh50, no point mutation other than an absence of the sequence encoded by exon 7 was observed in the Rh null patient (Fig. 6A). Because exon 7 is asymmetric in codon distribution at the 5′ side (Fig. 3), its complete skipping and the subsequent joining of exon 6 with exon 8 inevitably resulted in an open reading frame shifting (Fig. 6A). In turn, the deduced translation product would be truncated and prematurely terminated, containing only 351 amino acid residues. This includes the loss of 41 amino acid residues encoded in exon 7 and gain of an entirely new sequence of 36 residues following Thr315 (Fig. 6A).
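The residue counts quoted above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in this article (409 residues in the wild-type protein, Thr315 as the last shared residue, 36 novel residues, a 122-bp exon 7); the variable names are hypothetical, and counting a split codon as one "touched" codon is an interpretive assumption.

```python
import math

WILD_TYPE_LENGTH = 409      # residues in normal Rh50
LAST_SHARED_RESIDUE = 315   # Thr315, encoded at the 3' end of exon 6
NOVEL_TAIL_LENGTH = 36      # out-of-frame residues before the premature stop
EXON7_LENGTH_BP = 122       # skipped exon

# Length of the deduced mutant protein.
mutant_length = LAST_SHARED_RESIDUE + NOVEL_TAIL_LENGTH

# Wild-type residues downstream of Thr315 that are replaced by the novel tail.
wild_type_residues_replaced = WILD_TYPE_LENGTH - LAST_SHARED_RESIDUE

# Codons that carry at least one exon 7 nucleotide (assumption: a split codon counts once).
codons_touching_exon7 = math.ceil(EXON7_LENGTH_BP / 3)

# Nucleotides left over after whole codons; a nonzero value means a frameshift.
frame_offset = EXON7_LENGTH_BP % 3

print(mutant_length, wild_type_residues_replaced, codons_touching_exon7, frame_offset)
```

The totals are consistent with the reported 351-residue product and the loss of 41 residues encoded in exon 7.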
Compared with the wild-type Rh50 protein (8), hydropathy plot analysis of the mutant form suggested two possible alterations in membrane organization of the C-terminal region. The inherent frameshift and premature termination further eliminate the last TM domain, and the resulting new sequence would face the cytoplasmic side due to lack of a continuous stretch of hydrophobic residues. Apparently, loss of a normal C-terminal portion of the Rh50 protein is the major cause for the perturbation of Rh complex formation in the Rh null erythrocyte.

DISCUSSION
Rh50 glycoprotein is a critical coexpressor of Rh30 polypeptides, the carriers of erythrocyte Rh antigens (1)(2)(3)(4). Here, the exon/intron structure of the Rh50 gene has been delineated, which should facilitate identification of mutations underlying the suppressor forms of Rh deficiency syndrome. A homology-based approach coupled with bidirectional walking revealed that Rh50 is a single copy gene with 10 exons and has a global organization strikingly similar to its related Rh30 members (23,24). Both the structural conservation and sequence homology of the two genes are confined mainly to exons 2-9, while their 5′ and 3′ regions, including the promoter and untranslated sequences, share little or no similarity. Since Rh50 and Rh30 genes are located on different chromosomes (5-8), these findings suggest that the two genetic loci might be formed by a rare transchromosomal insertion event. Our recent studies suggest that Rh50 and Rh30 genes originated from a common ancestor and were linked to each other following their initial duplication; later, one was translocated and diverged as the independent locus on a separate chromosome (Footnote 4). Comparative analysis of the Rh50 and Rh30 gene orthologues in lower organisms should help decipher the evolutionary pathway ultimately leading to the establishment of two genetic loci encoding the Rh family proteins in Homo sapiens.
The extreme rareness, recessive nature, and consanguineous background of Rh deficiency syndrome (11,12) point to a heterogeneous spectrum of the underlying mechanisms. At present, the molecular defect at RH30 locus responsible for the amorph type of Rh null remains unknown (13)(14)(15). Nevertheless, several lines of evidence suggest that the RH50 locus is the prime target of suppressor mutations resulting in the regulator Rh null disease. (i) Rh50 is thought to directly interact with Rh30, and the deficiency of the two proteins in the plasma membrane occurs in parallel (9,26). (ii) Despite a close link of Rh null with absence or deficiency in GPB, Duffy, or LW, the erythrocytes lacking these glycoproteins per se exhibit no change in the Rh antigen expression and no apparent perturbations in membrane physiology and cell morphology (27)(28)(29)(30). Presumably these proteins are casually associated components not essential for the interaction and membrane assembly of Rh family proteins. (iii) Although CD47 is also reduced in Rh null state, its low level of expression is restricted to erythroid cells but not to other hematopoietic cells (31,32), suggesting that CD47 deficiency occurs as the consequence of, rather than the cause of, the defect in Rh complex formation. (iv) More recently, two small DNA deletions causing frameshift in the Rh50 gene have been found to be associated with the regulator Rh null phenotype in unrelated patients (16).
FIG. 4. Analysis of Rh50 transcript expression in normal and Rh null erythroid cells. RT-PCR analysis of Rh50 transcript was carried out using 3′-UTR primer for cDNA synthesis and two pairs of amplimers for cDNA amplification. The location, direction, and designation of primers with respect to the structure of Rh50 are specified. A, agarose gel electrophoresis of amplified Rh50 cDNA products from RhD+, RhD−, and Rh null. The size of segment 4s-10a from Rh null is smaller than that of controls, indicating a deletion in the region spanning exons 5-10. Note that the Rh null lanes were overloaded. B, nucleotide profiles of the exon/exon boundary associated with exon skipping. Exon boundary is indicated by a vertical arrow. In normal, exon 6 is joined to exon 7, whereas in Rh null exon 7 is absent, resulting in exon 6 to exon 8 connection. C, 3′ RACE assay for the functional splicing of exon 7 in Rh50 pre-mRNAs. A primer anchored in exon 7, 7s, was coupled with 3′-UTR primer for 3′ RACE reaction. The expected cDNA product of 376 bp is clearly seen in control lanes but not the Rh null lane, confirming a complete exclusion of exon 7 from the latter.
Our previous studies showed that this Rh null patient had a grossly intact RH30 locus occurring in the form of DCe/DCe haplotype combination (15). The present study confirmed this assignment and showed further that the RH30 locus gave rise to expression of both RhD and RhCe transcripts with sequences identical to that from normal subjects. These results, together with the identification of a normal CD47 gene, exclude the involvement of mutations of RH30 or CD47 locus in this Rh null patient. However, transcript analysis showed that there was no expression in the Rh null cells of any full-length form of Rh50 mRNAs except the shortened one specifically lacking the sequence of exon 7. Genomic sequencing revealed the occurrence of a homozygous G → A mutation in the invariant GT element of 5′ donor splice site as the only alteration in the Rh50 gene. These findings establish the pre-mRNA splicing defect, for the first time, as the suppressor mutation of RH50 leading to a loss-of-function phenotype characteristic of the regulator form of Rh null disease.
Mutations in the GT and AG motifs of the donor and acceptor splice sites, the cis-acting elements essential for pre-mRNA splicing (33), portray an important mechanism for the origin of human genetic diseases (34). The donor splice site mutation described here has caused a complete skipping of exon 7 from the mature form of Rh50 mRNA in the Rh null patient. Significantly, such a splicing event not only excluded a coding sequence for 41 amino acids but resulted in a frameshift after the codon for Thr315 and a premature chain termination after the codon for Ile351. Therefore, the deduced Rh50 mutant protein contains only 351 amino acids, including a stretch of 36 new residues at the C terminus. Correlation of these primary changes with regulator Rh null disease provides new insight regarding how different mutations might act as suppressors to disrupt or modify the protein-protein interactions that dictate the Rh complex formation.
FIG. 5 (partial caption). ... NlaIII enzymes. The 6s-7a fragment was cleavable with PmlI in normal but not Rh null. In contrast, the 7s-7a′ fragment of Rh null has an extra NlaIII site. C, Southern blot analysis of native genomic DNAs from normal person and Rh null patient. Genomic DNAs were digested with the enzymes indicated and hybridized with an exon 7/intron 7 junction probe. The PmlI cleavable fragments are seen in normal persons but not in Rh null patient.
Prior studies suggested that there may be a direct contact between Rh50 and Rh30 via their N-terminal sequences (9, 10). Nevertheless, additional interacting sites are likely to be present in the Rh protein complex. For the Rh50 mutant reported here, its only difference from the wild-type lies C-terminal to the 10th putative TM domain (Fig. 6). This suggests that the C-terminal half may also participate in the interaction directly and/or confer required conformation to stabilize that interaction. In support of this notion, we have identified in unrelated Rh null patients several missense mutations that are clustered in the C-terminal region of the Rh50 protein (Footnote 4). It is of further interest to note that such mutations all target the TM domains in the C-terminal half that are conserved in the Rh50 homologues from the mouse to C. elegans. Currently, little is known about how the disruption of the Rh protein complex causes the multiple facets of structural and functional abnormalities in the Rh-deficient erythrocytes. There is also a lack of general information regarding the involvement and coordination of possible intracellular factor(s) in the functioning of the Rh membrane complex. A full description of Rh null disease mutations and assessment of their phenotypic effects in model systems, such as C. elegans, should lead to a better understanding of the membrane assembly and structure/function relations of the Rh family of proteins.