Molecular Basis of the Globoside-deficient Pk Blood Group Phenotype

The biochemistry and molecular genetics underlying the related carbohydrate blood group antigens P, Pk, and LKE in the GLOB collection and P1 in the P blood group system are complex and not fully understood. Individuals with the rare but clinically important erythrocyte phenotypes P1k and P2k lack the capability to synthesize the P antigen, identified as globoside, the cellular receptor for parvo-B19 virus and some P-fimbriated Escherichia coli. As in the ABO system, naturally occurring antibodies, anti-P of the IgM and IgG class with hemolytic and cytotoxic capacity, are formed. To define the molecular basis of the Pk phenotype we analyzed the full coding region of a candidate gene reported in 1998 as a member of the 3-β-galactosyltransferase family but later shown to possess UDP-N-acetylgalactosamine:globotriaosylceramide 3-β-N-acetylgalactosaminyltransferase or globoside synthase activity. Homozygosity for different nonsense mutations (C202→T and 538insA) resulting in premature stop codons was found in blood samples from two individuals of the P2k phenotype. Two individuals with P1k and P2k phenotypes were homozygous for missense mutations causing amino acid substitutions (E266A or G271R) in a highly conserved region of the enzymatically active carboxyl-terminal domain in the transferase. We conclude that crucial mutations in the globoside synthase gene cause the Pk phenotype.

The Pk blood group phenotype was recognized in 1959 by Matson et al. (1) because of antibodies in the serum of a patient whose erythrocytes were shown to lack an antigen related to but not identical with previously discovered P-related antigens. Further studies later showed the P antigen to be globotetraosylceramide (Gb4Cer, globoside, GalNAcβ3Galα4Galβ4GlcCer),¹ the most abundant neutral glycolipid in the erythrocyte membrane (2) as well as in other mesodermally derived tissues (3,4). Later it was found that human fibroblasts with the Pk phenotype were deficient in β-N-acetylgalactosaminyltransferase activity (5).
With only a few exceptions the molecular bases of the major human blood group antigens have been determined and the genes responsible for erythroid expression of the various phenotypes cloned and characterized. However, the biochemistry and molecular genetics behind the carbohydrate-based P1 antigen in the P blood group system (numerical nomenclature 003001 according to the International Society of Blood Transfusion, ISBT)² and the P, Pk, and LKE antigens in the GLOB collection (ISBT no. 209001, 209002, and 209003) are complex and not yet fully understood (6). Fig. 1 shows the biosynthetic route to these glycosphingolipid antigens by the sequential addition of monosaccharide residues to ceramide by different glycosyltransferases (7). The key enzyme for initiation of globo-series glycolipid synthesis, UDP-galactose:lactosylceramide 4-α-galactosyltransferase (Gb3Cer/Pk/CD77 synthase, α4Gal-T), was independently cloned by three research teams in the year 2000 (8-10). Mutations in this gene were shown to constitute the molecular basis of the p phenotype in which all antigens of the P blood group system and GLOB collection are lacking on the cell surfaces (8,11). Attempts to find polymorphisms in the α4Gal-T gene correlating with P1+/P1− status were fruitless, however (8). Therefore, it is still unclear if another gene codes for P1 synthase (also an α4Gal-T) or if other mechanisms exist.
Cells of the p phenotype, which lack P1/P/Pk antigens; of the P2k phenotype, which lack P1/P antigens; and of the P1k phenotype, which lack the P antigen are of principal interest because of potent naturally occurring antibodies that are regularly present in plasma. As in the ABO blood group system, antibodies of the IgM and IgG class (anti-PP1Pk, anti-P1P or anti-P) are made against the missing blood group antigens. While the frequency of the p phenotype in general has been estimated at 5.8 per million (14) it is considerably higher in certain populations, especially in Northern Sweden (15). The frequency of the Pk phenotype is not known but it is generally accepted to be even rarer than p. The frequencies are possibly slightly higher in Finland and Japan but not as marked as for p in Northern Sweden (16).

* This work was supported in part by the Swedish Research Council, the Medical Faculty at Lund University, the Lund University Hospital Donation Funds, the Claes Högman SAGMAN-stipendium, and Tore Nilson's Fund for Medical Research. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF494103, AF494104, AF494105, AF494106.
¶ To whom correspondence should be addressed: Blood Centre, University Hospital, SE-221 85, Lund, Sweden. Tel.: 46-46-173210; Fax: 46-46-173226; E-mail: Martin_L.Olsson@transfumed.lu.se.
¹ The abbreviations used are: Gb4Cer, globotetraosylceramide/globoside/GalNAcβ3Galα4Galβ4GlcCer; Gb3Cer, globotriaosylceramide, Galα4Galβ4GlcCer; α4Gal-T, 4-α-galactosyltransferase, Gb3Cer/Pk/CD77 synthase; β3GalNAc-T1, 3-β-N-acetylgalactosaminyltransferase, Gb4Cer/globoside synthase; β3Gal-T1 to T5, 3-β-galactosyltransferase 1 to 5; ISBT, International Society of Blood Transfusion; PCR-SSP, polymerase chain reaction with sequence-specific primers; nt, nucleotide.
² Blood group antigens, phenotypes, and antibodies are designated according to the nomenclature recommended by the ISBT working party on terminology for red cell surface antigens.
There are several reasons for the scientific and medical attention spent investigating patients with these rare phenotypes (17). P-related antibodies are implicated in hemolytic transfusion reactions if random antigen-positive blood is transfused. The P, P1, and Pk antigens are well developed on fetal erythrocytes, but in most cases only relatively mild hemolytic disease of the newborn due to anti-PP1Pk or anti-P has been reported (16). Women of the p and Pk phenotypes suffer a higher frequency of spontaneous abortion than normal, a phenomenon that is most likely due to the IgG anti-P component (18). The placenta expresses high levels of P and Pk antigen (as opposed to the early fetus itself) and has been suggested as the prime target for these antibodies (19). Another anti-P-mediated disease, usually seen in children following a viral infection, is paroxysmal cold hemoglobinuria in which P-positive erythrocytes are lysed by a transient auto-anti-P (20).
Globoside/P antigen has also been identified as a cellular receptor for parvo-B19 virus (21), which causes erythema infectiosum, also known as fifth disease, in children; the infection is sometimes complicated by severe aplastic anemia. Complete viral replication and subsequent cell lysis are limited to early erythroid precursor cells expressing the globoside receptor (22). Individuals lacking the receptor appear to be naturally resistant to this infection (23). Finally, it has also been shown that some P-fimbriated E. coli express globoside-binding molecules (PapF and PapG) at the tips of their pili (24), a finding with possible implications for their uropathogenicity.
Because of the relationship to many human pathogens and diseases the interest in glycolipid-based blood group antigens remains high. In this study we tested the hypothesis that the reported β3GalNAc-T1/globoside synthase (12) is indeed the enzyme responsible for synthesis of P blood group antigen. By analyzing the coding DNA sequence of the corresponding gene we show here for the first time that mutations capable of abolishing enzyme function are present in individuals with the Pk phenotype.

EXPERIMENTAL PROCEDURES
Blood Samples-Four blood samples of the Pk phenotype (one P1k and three P2k) were obtained from the liquid nitrogen in-house collection of test cells at the International Blood Group Reference Laboratory (IBGRL). The phenotype of the cells and presence of anti-P in serum were confirmed by standard serological methods at IBGRL and in two cases also by other reference laboratories. For screening of the detected missense mutations, blood samples from apparently healthy random blood donors of mixed European descent were used.
DNA Preparation-DNA was prepared from EDTA or acid-citrate-dextrose blood at the Blood Centre in Lund by a salting out method modified from Miller et al. (25) and dissolved in H2O at a concentration of 100 ng/μl.
PCR Amplification and DNA Sequencing-Oligonucleotide primers were synthesized by DNA Technology ApS (Aarhus, Denmark). For each reaction 5 pmol of primers P-(-6)-F and P-1015-R or P-123-F and P-891-R (Table I) were mixed with 100 ng of genomic DNA, 2 nmol of each dNTP, 2% glycerol, 1% cresol red, and 0.5 units of AmpliTaq Gold (PerkinElmer/Roche Molecular Biochemicals) in the buffer supplied. The final reaction volume was 11 μl. Thermocycling was undertaken in a GeneAmp PCR system 2400 (PerkinElmer Cetus). Initial denaturation at 96°C for 7 min was followed by 35 cycles at 94°C for 30 s, 64°C for 30 s, 72°C for 1 min, and a final extension for 2 min. PCR products were excised from 3% agarose gels (Seakem, FMC Bioproducts, Rockland, ME) stained with ethidium bromide (0.56 mg/liter gel, Sigma Chemicals) and purified using the QiaQuick gel extraction kit (Qiagen GmbH, Hilden, Germany). The Big Dye Terminator Cycle sequencing kit (Applied Biosystems Group) and an ABI PRISM 310 Genetic Analyser (Applied Biosystems Group) were used for direct DNA sequencing according to the manufacturer's instructions. Besides the PCR primers, internal primers (Table I) were used as sequencing primers. To avoid artifacts, sequencing was performed on both strands using independently obtained PCR fragments. Sequence analysis was performed with SeqEd software 1.03 (Applied Biosystems Group).
Table I (fragment): … GAGCCAGGAGGTGGGTTTGCC (c). Footnotes: (a) Primer used for amplification of PCR fragments for sequencing. (b) Primer used as sequencing primer. (c) Primer used for PCR-SSP to screen for presence of and confirm homozygosity for the two missense mutations.
Sequence-specific Primer PCR (PCR-SSP)-PCR with allele-specific primers designed to detect the two missense mutations found in the study was performed. For A797→C detection 2.5 pmol of primers P-797C-null-F and P-1015-R were used together with 0.5 pmol of internal control primers JK-F3L and JK-R3L, modified from Ref. 26, with PCR conditions as above. For G811→A detection 7.5 pmol of primers P-811A-null-F and P-1015-R were used instead.
For confirmation of homozygosity, consensus primers P-797A-F and P-811G-F replaced P-797C-null-F and P-811A-null-F at 5 and 10 pmol per reaction, respectively. The annealing temperature for the latter reaction was increased to 65°C for optimal specificity.
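As a rough aid to reading the amplification protocol above, the stated reagent amounts can be converted to final concentrations in the 11 μl reaction. The sketch below is only illustrative: it assumes that "5 pmol of primers" means 5 pmol of each primer, and the function and variable names are hypothetical rather than taken from the paper.

```python
# Illustrative conversion of the stated PCR reagent amounts to final concentrations.
# Assumption (not stated explicitly in the text): 5 pmol refers to EACH primer.

def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Return concentration in micromolar; pmol per microliter equals micromol per liter."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul

REACTION_VOLUME_UL = 11.0  # final reaction volume given in the protocol

primer_um = micromolar(5.0, REACTION_VOLUME_UL)       # 5 pmol of each primer (assumed)
dntp_um = micromolar(2000.0, REACTION_VOLUME_UL)      # 2 nmol = 2000 pmol of each dNTP
template_ng_per_ul = 100.0 / REACTION_VOLUME_UL       # 100 ng of genomic DNA

print(f"primer: {primer_um:.2f} uM each")
print(f"dNTP: {dntp_um:.0f} uM each")
print(f"template: {template_ng_per_ul:.1f} ng/ul")
```

Under these assumptions each primer is present at a little under 0.5 μM and each dNTP at a little under 200 μM.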

RESULTS
PCR Amplification and DNA Sequencing-Using the oligonucleotide primers shown in Table I the entire coding region of the β3GalNAc-T1 gene was amplified in one fragment in samples from donors with the phenotype P1k (n = 1, English), P2k (n = 3, Arabian, French, Finnish), or common GLOB collection/P blood group phenotypes (n = 2, Swedish). Sequence analysis of the amplified gene fragments was performed and compared with sequences deposited in GenBank™: AB050855 (12) and Y15062 (13). The Swedish random donors had β3GalNAc-T1 gene sequences identical to the consensus GenBank™ entries.
However, in two of the P2k samples homozygosity for two distinct nonsense mutations resulting in premature stop codons was detected. The Finnish sample was homozygous for C202→T resulting in an immediate stop codon following residue 67. The Arabian sample was homozygous for a single adenosine insertion at nucleotide (nt) 537-538 (AG→AAG), here designated 538insA. This insertion causes a frameshift from amino acid 180 and a premature stop at codon 182 (consensus-Arg180-His181-Stop). The remaining two samples were homozygous for two different missense mutations. In the English P1k sample glycine at residue 271 is replaced by arginine because of G811→A, whereas glutamic acid is changed to alanine at residue 266 because of an A797→C substitution in the French P2k sample. DNA sequencing chromatograms visualizing the detected sequences from wild-type and Pk donors are found in Fig. 2. A schematic representation of the open reading frames in the consensus and variant alleles of the β3GalNAc-T1 gene is shown in Fig. 3.
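The nucleotide and residue numbers quoted above can be cross-checked with simple codon arithmetic. The following sketch assumes that nucleotide numbering starts at the A of the initiation codon (nt 1) and that codon 1 is the initiator methionine; the function names are hypothetical.

```python
# Map a coding-sequence nucleotide position to its codon number and position within the codon.
# Assumption: nt 1 is the A of the ATG start codon, so codon n covers nt 3n-2 .. 3n.

def codon_number(nt: int) -> int:
    """Return the 1-based codon that contains coding nucleotide `nt`."""
    if nt < 1:
        raise ValueError("coding positions are 1-based")
    return (nt - 1) // 3 + 1

def position_in_codon(nt: int) -> int:
    """Return 1, 2 or 3: the position of `nt` inside its codon."""
    if nt < 1:
        raise ValueError("coding positions are 1-based")
    return (nt - 1) % 3 + 1

for label, nt in [("C202->T", 202), ("538insA", 538), ("A797->C", 797), ("G811->A", 811)]:
    print(f"{label}: codon {codon_number(nt)}, codon position {position_in_codon(nt)}")
```

With this numbering, nt 202 falls in codon 68 (a stop there leaves 67 residues), nt 538 in codon 180, nt 797 in codon 266, and nt 811 in codon 271, in agreement with the residue numbers given in the text.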
A comparison of glycosyltransferase genes with significant homology revealed that both of the missense mutations found involve charged versus non-charged residues in a highly conserved region of the carboxyl-terminal globular domain of the transferase (Fig. 4). No other deviations from the β3GalNAc-T1 consensus sequence were encountered in any of the analyzed samples. No mRNA studies could be undertaken due to lack of fresh samples.
Confirmation of Homozygosity-Because no other polymorphisms were detected in the β3GalNAc-T1 gene we ensured that both alleles were amplified for sequence analysis by using two independent PCR primer pairs, P-(-6)-F and P-1015-R or P-123-F and P-891-R. In two of the samples the presence of missense mutations and absence of consensus sequence were confirmed with genomic DNA and sequence-specific primers. Representative electrophoretograms are shown in Fig. 5.
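The logic of confirming homozygosity with paired mutation-specific and consensus-specific reactions can be summarized as a small decision rule. This is a minimal sketch of how such band patterns are commonly interpreted, not the authors' documented procedure; it assumes that the internal control fragment must be visible in both reactions for a result to be valid, and all names are hypothetical.

```python
# Interpret a pair of PCR-SSP reactions (mutation-specific and consensus-specific)
# for one nucleotide position. Hypothetical helper; band presence is given as booleans.

def call_ssp_genotype(mutant_band: bool, consensus_band: bool,
                      control_in_mutant_rxn: bool, control_in_consensus_rxn: bool) -> str:
    # Without the internal control band a missing specific band cannot be trusted.
    if not (control_in_mutant_rxn and control_in_consensus_rxn):
        return "invalid"
    if mutant_band and consensus_band:
        return "heterozygous"
    if mutant_band:
        return "homozygous mutant"
    if consensus_band:
        return "homozygous consensus"
    # Controls amplified but neither allele-specific primer did.
    return "no call"

# Pattern expected for a donor homozygous for a missense mutation:
print(call_ssp_genotype(True, False, True, True))
# Pattern expected for a random donor of common phenotype:
print(call_ssp_genotype(False, True, True, True))
```

The "no call" branch covers the case in which the controls amplify but neither allele-specific primer does, which would suggest a different sequence at the primer site.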
Screening for Missense Mutations by Sequence-specific Primer PCR (PCR-SSP)-DNA samples of 220 mixed European blood donors were screened for the two missense mutations found during the study to exclude that they constitute common polymorphisms in the gene. Neither mutation was detected in any of the 440 alleles analyzed. Representative results outlining the screening procedure are shown in Fig. 5.
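A negative screen of this size bounds, but does not exclude, a low allele frequency in the donor population. As a hedged illustration that is not part of the original analysis, the sketch below computes the exact one-sided upper confidence limit for a proportion when zero events are seen in n independent observations, here treating the 440 alleles as independent binomial draws.

```python
# Upper confidence limit for an allele frequency when 0 of n alleles carry the mutation.
# Assumption: alleles are independent binomial samples from one population.

def upper_limit_zero_observed(n_alleles: int, confidence: float = 0.95) -> float:
    """Solve (1 - p)**n = 1 - confidence for p."""
    if n_alleles < 1:
        raise ValueError("need at least one allele")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_alleles)

n_alleles = 2 * 220  # 220 diploid donors screened
limit = upper_limit_zero_observed(n_alleles)
print(f"95% upper limit: {limit:.4f}")
print(f"rule-of-three approximation: {3 / n_alleles:.4f}")
```

Under these assumptions the screen is compatible with an allele frequency of up to roughly 0.7% for each mutation, which supports the conclusion that neither is a common polymorphism.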

DISCUSSION
The data presented in this report show that mutations in the recently cloned β3GalNAc-T1 gene are associated with the clinically important Pk phenotype in which synthesis of the globoside/P antigen cannot take place. The four samples analyzed were homozygous for rare alleles, which is not unexpected considering the low frequency of this phenotype in all populations. Such homozygosity is most probably due to consanguineous marriages, as previously shown in at least 7 of 33 studied propositi of the Pk phenotype (16). As is often the case with rare phenotypes when propositi from different ethnic groups and/or geographic areas are studied, the molecular background for the samples in this study was quite heterogeneous. Similar data have been obtained for almost all rare blood group phenotypes with sporadic cases in most populations. Some of these variant phenotypes have a higher frequency in geographically isolated populations. A slightly increased frequency of the Pk phenotype has been noted in Japanese and Finns (16). It is therefore possible that the allele with C202→T encountered in the Finnish sample is a founder gene responsible for the Pk phenotype also in other Finnish cases, similar to the situation with the rare Jk3-negative blood group phenotype in Finns (27).

Figure legend (fragment): … (Table I) are shown as filled and open arrowheads above the bars, respectively. However, the primer targeted against the 3′-untranslated region of the gene was used for both purposes but is filled.
In two of the investigated samples nonsense mutations at nt 202 or 538 truncate the protein to 20 and 55% of its native length, respectively, thereby making any retention of enzymatic activity impossible because the enzymatically active domain is in the carboxyl-terminal portion. This agrees well with other blood group-related glycosyltransferases that lose activity if severely truncated. For example, the common O allele (O1) in the ABO blood group system has a single guanosine deletion at nt 261 causing a reading frameshift that truncates any translated product at residue 117 (28), leaving no resemblance to A or B transferase after residue 86 (thus 25% of the amino acid sequence remains intact). Other examples are inactive forms of the FUT2 gene product, the 2-α-fucosyltransferase of the H blood group system, which are truncated at 55 and 61% of the full reading frame (29).
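The truncation percentages quoted above follow from dividing the number of residues remaining before the stop codon by the full protein length. The sketch below reproduces that arithmetic; note that the full length of 331 residues is an assumption brought in for illustration and is not stated in this text, while the residue counts 67 and 181 come from the Results.

```python
# Fraction of the native protein that remains after a premature stop codon.
# ASSUMPTION: full-length globoside synthase has 331 residues (not given in the text).

ASSUMED_FULL_LENGTH = 331

def percent_remaining(residues_before_stop: int, full_length: int = ASSUMED_FULL_LENGTH) -> float:
    """Return the percentage of the native length that would be translated."""
    if not 0 <= residues_before_stop <= full_length:
        raise ValueError("residue count must lie between 0 and the full length")
    return 100.0 * residues_before_stop / full_length

finnish = percent_remaining(67)    # C202->T: stop codon immediately after residue 67
arabian = percent_remaining(181)   # 538insA: stop at codon 182, so 181 residues translated

print(f"C202->T: {finnish:.1f}% of assumed native length")
print(f"538insA: {arabian:.1f}% of assumed native length")
```

With the assumed length, the two values round to 20 and 55%, matching the figures given in the text.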
The other two Pk samples had missense mutations changing amino acids in the functionally active globular domain in the carboxyl-terminal portion of the transferase (30). By comparing the amino acid sequences encoded by homologous genes (Fig. 4) it was shown that the mutated residues are either invariant or highly conserved among the sequences compared, suggesting their importance for the function of β3GalNAc-T. Furthermore, they are located in an evolutionarily conserved cluster of amino acids found in glycosyltransferases belonging to different functional families (β3Gal-T, β3GalNAc-T, and β3GlcNAc-T) in humans and also in a Drosophila homologue, suggestive of importance for this region across species boundaries and enzyme specificities. Although the variant proteins were not expressed and proven non-functional by enzyme analysis, the data presented make it highly probable that both the E266A change from a large negatively charged to a small hydrophobic residue and the G271R change from a small non-charged to a large positively charged residue cause enough disturbance in this well conserved region to render the protein inactive. Again, analogy can be taken from the ABO system where the full-length O2 protein is rendered non-functional by a G268R change (31,32) at a conserved residue proposed to be exposed in the substrate-binding pocket of the 3-α-galactosyltransferase family (33).
Furthermore, it has been shown unambiguously (14) that the P1 antigen (putative gene located on chromosome 22) is inherited independently from the P antigen (globoside synthase gene located on chromosome 3). Thus, there is no reason to believe that G271R and E266A are specific for the P1k and P2k phenotypes, respectively. The frequency of the P1 antigen among Caucasians is ~75%. Any correlation of G811→A (G271R) to P1+ status was effectively ruled out by PCR-SSP screening of the 220 samples, none of which were positive for the mutation while the majority was P1+.

FIG. 4. Multiple amino acid sequence alignment of homologous glycosyltransferases based on ClustalW plots presented in previously published papers (12, 13, 34, 35). Sequences from the enzymatically active globular domain of the carboxyl-terminal portion in the transferases corresponding to amino acids surrounding the residues mutated in two of the Pk samples are shown. Numbering on the left and right refers to the first and last amino acid shown on each line, respectively. The sequences are from human β3GalNAc-T1 (12) originally designated β3Gal-T3 (13), human β3Gal-T1, 2, and 4 (13), human β3Gal-T5 (34), human β3GlcNAc-T with poly-N-acetyl-lactosamine synthase activity also previously designated β3Gn-T (35), a homologous gene, GT9, which was initially published as β3Gn-T (35), and a Drosophila melanogaster gene homologue to the human β3Gal-T gene family designated Brainiac (36). The two upper lines show the sequences from the English and French Pk individuals. Hyphens represent sequence identity to β3GalNAc-T1. Light gray, dark gray, and black boxes symbolize identity between 4-5, 6-7, or all 8 of the compared sequences, respectively.

FIG. 5. … (Table I) and separated on 3% agarose gels. The upper (nt 797) and lower (nt 811) panels show screening with primers amplifying the mutated (left) or consensus (right) sequence. Samples from individuals with the following phenotypes are shown: lane 1, Pk phenotype donor homozygous for the A797→C (upper) or G811→A (lower) mutations; lane 2, Pk phenotype donor negative for the tested mutation; lanes 3 and 4, random donors of common phenotype; lane 5, H2O contamination control. M is the molecular size marker φX174 RF DNA HaeIII (Invitrogen). Filled and thin arrows indicate the globoside synthase-specific fragments (sizes given in base pairs) and the JK blood group gene-derived control (952 bp) DNA fragments, respectively.
The P antigen may now be extracted from the GLOB collection and promoted to constitute a blood group system of its own according to the inclusion criteria set up by the ISBT working party on terminology for red cell surface antigens. The rule is that a blood group system has to be defined by a single genetic locus or possibly two (or three) closely linked loci so that the antigens included in that system are localized to the same (type of) molecule. Based on the data presented here the P antigen is now tied to a defined genetic locus different from other blood group loci and the molecular basis of the null (P1k and P2k) phenotypes of the system has been identified.