THE RISE AND FALL OF THE CHEMOATTRACTANT RECEPTOR GPR33

Chemokine and chemoattractant receptors are members of the large superfamily of G protein-coupled receptors (GPCRs) and control leukocyte chemotaxis. In addition to their physiological role, several chemokine and chemoattractant receptors, such as CCR5 and Duffy, have been directly associated with pathogen entry. GPR33 is an orphan chemoattractant GPCR that was previously identified as a pseudogene in humans. GPR33 evolved in mammals about 125-190 million years ago. The cloning and analysis of more than 120 mammalian GPR33 orthologs from 16 of 18 eutherian orders revealed an inactivation of this chemoattractant GPCR not only in humans, but also in several great ape and rodent species. Intriguingly, in all ape and some rodent species where the inactivation occurred, samples harbored both pseudogene and intact gene variants. The analysis of over 1200 human individuals representing all major linguistic groups revealed that the intact allele of GPR33 is still present in the human population. Estimates of the age of the human alleles suggest inactivation in the past 1 million years. Similarly, analysis of more than 120 wild-caught gray rats (Rattus norvegicus) revealed that inactivation of GPR33 is fixed worldwide and occurred less than 0.7 million years ago. The coincidental inactivation and its fixation in several species of distantly related mammalian orders suggest a selective pressure on this chemoattractant receptor gene.


INTRODUCTION
Recruitment of cells involved in the cellular defense against pathogens is mediated by an armada of cell surface proteins, chemokines, complement factors and leukotrienes. The receptors that mediate the action of these factors all belong to the large family of G protein-coupled receptors (GPCRs), and constitute within this family structural classes sharing sequence similarities and common transduction pathways (1). In addition to the well-known repertoire of CC and CXC chemokine receptors, there are several orphan GPCRs that share structural similarities with chemokine and other chemoattractant receptors and that are currently under intensive investigation. Less attention, however, has been directed to pseudogenes of chemoattractant receptors, among them GPR33, which was identified as a pseudogene in human but as an intact gene in mouse (2). The human GPR33 gene contains a premature stop codon within the coding sequence of the second intracellular loop but no other obvious structural defects, suggesting a recent inactivation of the receptor. Its close structural relation to ChemR23, a chemokine receptor for chemerin and SIV co-receptor (3,4), to N-formyl-peptide (fMLP) receptors and to other chemokine receptors focused our attention on the potential role of GPR33 in the cellular defense that was lost during human evolution.
Screening of all vertebrate classes showed that GPR33 arose together with the first mammals. The analysis of GPR33 orthologs revealed an almost simultaneous inactivation in humans, several apes and several rodent species about 0.5-1 million years ago. Our data provide evidence for a selective pressure to inactivate this chemoattractant GPCR. However, a small fraction of the human population still harbors an intact GPR33 allele. Our finding may be of great medical relevance because mutational inactivation of other chemokine GPCRs, such as CCR5 and Duffy, is associated with resistance to infections by HIV-1 and by P. vivax, respectively (5-7).

EXPERIMENTAL PROCEDURES
Cloning of GPR33 orthologs - To identify GPR33 sequences in other vertebrates, genomic DNA samples were prepared from tissue or peripheral blood mononuclear cells of various species or were kindly provided by several other labs (Suppl. Table S1). Tissue samples were digested in lysis buffer (50 mM Tris/HCl, pH 7.5, 100 mM EDTA, 100 mM NaCl, 1% SDS, 0.5 mg/ml proteinase K) and incubated at 55 °C for 18 hours. DNA was purified by phenol/chloroform extraction and ethanol precipitation. Based on the human and mouse GPR33 sequences (2), sets of degenerate primer pairs (Suppl. Table S2) were used to amplify GPR33-specific sequences. PCR reactions were performed with Taq polymerase under variable annealing and elongation conditions. Specific PCR products were directly sequenced and/or subcloned into the pCR2.1-TOPO vector (Invitrogen, La Jolla, CA) for sequencing. In cases of heterozygosity, alleles were separated by subcloning and subsequent sequencing.
Sequencing reactions were performed on PCR products with a dye-terminator cycle sequencing kit (Applied Biosystems) on an ABI 3700 automated sequencer (Applied Biosystems). Based on considerable sequence similarities in the 5'- and 3'-untranslated regions (UTRs) of GPR33 genes, primers were designed (Suppl. Table S2) that allowed the identification of sequences encoding the N and C termini of mammalian GPR33 orthologs.
GPR33 sequence analyses in human, chimpanzee, bonobo and rat samples - To screen large sample sets for the presence of the TGA allelic variant we established a restriction analysis. Because the TGA codon 140 and its flanking sequence do not contain a suitable site for restriction analysis, we designed a one-base mismatch antisense primer (DdeI-AS 5'-AATGCTGGAAGCCCAGCGCGGGGcTC-3') which introduces a new DdeI restriction site in the PCR product if the TGA codon 140 is present. Together with a sense primer (DdeI-S 5'-CTGGAACTTTGGAACTGCCTTGTGC-3'), the PCR yielded a 166-bp fragment. The reactions were initiated with denaturation at 92 °C for 1 min, followed by 35 cycles of denaturation at 92 °C for 20 s, annealing at 55 °C for 20 s and elongation at 72 °C for 20 s. A final extension step was performed at 72 °C for 10 min. PCR products were treated with DdeI (37 °C overnight) and separated in a 3% agarose gel. In the case of the TGA allele, the 166-bp PCR product was cleaved into two fragments (139 bp and 27 bp). Uncut fragments were always sequenced. This DdeI restriction analysis was performed for 1217 human individuals (for details see Suppl. Table S3). The DNA sample collection comprised most samples of the Centre d'Étude du Polymorphisme Humain (CEPH) panel, a panel representing all major linguistic groups (DNA panel from ref. 8), 45 individuals from Papua New Guinea (kindly provided by Mark Stoneking) and 21 Yoruba individuals (International HapMap Project).
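The readout of the DdeI assay can be sketched as a small genotype-calling rule. This is only an illustration: the fragment sizes (166, 139 and 27 bp) are taken from the text, whereas the function name, the returned labels and the decision to key on the 139-bp band alone (on the assumption that the 27-bp fragment may be hard to resolve on a gel) are hypothetical choices made for this sketch.

```python
# Illustrative sketch only: band sizes are from the text, names are hypothetical.
UNCUT_BP = 166            # full-length PCR product (DdeI-resistant allele)
CUT_LARGE_BP = 139        # larger DdeI fragment of the TGA (Stop140) allele
CUT_SMALL_BP = 27         # smaller DdeI fragment of the TGA (Stop140) allele

# The two cleavage products must add up to the uncut product.
assert CUT_LARGE_BP + CUT_SMALL_BP == UNCUT_BP


def call_codon140_genotype(band_sizes_bp):
    """Classify a DdeI digest from the observed band sizes (in bp).

    Assumption: the 139-bp band is used as the marker for cleavage,
    because the 27-bp band may be too faint to score reliably.
    """
    bands = set(band_sizes_bp)
    has_uncut = UNCUT_BP in bands
    has_cut = CUT_LARGE_BP in bands

    if has_cut and not has_uncut:
        return "Stop140/Stop140"
    if has_cut and has_uncut:
        return "heterozygous: Stop140 / DdeI-resistant (sequence to confirm)"
    if has_uncut:
        return "homozygous DdeI-resistant (sequence to confirm)"
    return "no call"


if __name__ == "__main__":
    print(call_codon140_genotype([139, 27]))
    print(call_codon140_genotype([166, 139, 27]))
    print(call_codon140_genotype([166]))
```

The "sequence to confirm" labels mirror the statement that uncut fragments were always sequenced; the assay by itself only shows that the DdeI site is absent, not which codon is present.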
To search for individual variations we sequenced the GPR33 coding region in 85 humans (DNA panel from ref. 8). PCR fragments were amplified from genomic DNA samples with the primers hu-32 (5'-ACTGTTTCTCACTCCACAGGTC-3') and hu-33 (5'-CTATTGGTATAATTGACCAAGTGC-3'). Genomic DNA (100 ng) was used in PCR reactions (50 µl) with primers (10 pmol each), standard buffer (Perkin Elmer), dNTP (200 µM) and Taq polymerase (1 U; Perkin Elmer). The reactions were initiated with denaturation at 94 °C for 3 min, followed by 35 cycles of denaturation at 94 °C for 45 s, annealing at 60 °C for 45 s and elongation at 72 °C for 2 min. A final extension step was performed at 72 °C for 10 min. PCR products were separated in a 1% agarose gel, purified using a PCR Product Purification Kit (Qiagen) and sequenced.
For dating purposes, several kb of the 5'- and 3'-flanking regions of human, chimpanzee, bonobo, R. rattus, and R. norvegicus GPR33 were amplified by PCR and directly sequenced (Suppl. Tables S4 and S5). Similarly, the coding region of the human GPR33 and approx. 4 kb of the 3'-flanking region were sequenced from 21 Yoruba individuals (International HapMap Project) and analyzed for signatures of positive selection using Tajima's D test (9) and Fu and Li's D test (10).
To analyze the allelic variance in rat GPR33, genomic DNA was prepared from 114 wild-caught R. norvegicus from different parts of Germany (Nordrhein-Westfalen, Mecklenburg-Vorpommern, Berlin-Brandenburg), 5 wild-caught R. norvegicus from Russia (Siberia), 6 wild-caught R. norvegicus from the United States (Alaska), 2 wild-caught R. norvegicus from Japan, 2 wild-caught R. rattus from Germany (Berlin-Brandenburg), and 4 wild-caught R. rattus from Paraguay. GPR33 coding regions were amplified by PCR and sequenced.
Sequence analyses and age estimates - Nucleotide and amino acid sequence alignments were made with the Clustal X program (11). Maximum likelihood trees reconstructed with the PHYLIP software package (12) and the neighbor-joining method (13) gave essentially similar results. Bootstrap replications were conducted to assess the reliability of the trees.
The expected age of a selected allele at a given population frequency is a well studied problem. However, in this case, we only know the sample frequency of the derived allele, not the population frequency. Griffiths (14) investigated various properties of a derived allele given the sample frequency under a range of models. The approach makes use of a diffusion process, which is the large population limit of a wide range of random mating, constant population size models. The derived allele appears in the population at an arbitrarily small frequency, after a single mutational event.
We have implemented his approach to evaluate the expected age of an allele given its sample frequency under a model of genic selection and under a neutral model. Specifically, Griffiths (14) derives the joint probability density function of the population frequency of the allele and its age given the sample frequency. From this, we computed the expected age of an allele for a given sample frequency by numerically integrating the conditional expectation of the age of the allele given a population frequency over the marginal distribution of the population frequency conditional on the sample frequency. This approach yields the expected age scaled in units of 2N_e generations, where N_e is the effective population size of the species. Thus, an estimate of N_e and of the generation time is required to translate the age into years. We used a generation time of 25 years for humans and 20 years for chimpanzee and orangutan, and the estimates of N_e reported in Table 2. Since nothing is known about the effective population size of siamang, we omitted this species from the analysis.
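The final unit conversion described above (from diffusion time to years) is simple enough to sketch. The function below is not the Griffiths integration itself, only the rescaling step; the function name is hypothetical, and the effective population size used in the example (10,000) is a placeholder assumption for illustration, not a value taken from Table 2.

```python
def scaled_age_to_years(age_in_2ne_generations, effective_population_size,
                        generation_time_years):
    """Convert an allele age given in units of 2*N_e generations into years.

    years = age * 2 * N_e * generation time
    """
    generations = age_in_2ne_generations * 2 * effective_population_size
    return generations * generation_time_years


if __name__ == "__main__":
    # Placeholder inputs: N_e = 10,000 is an assumption for illustration only.
    # The generation times (25 and 20 years) are those stated in the text.
    human_years = scaled_age_to_years(1.0, 10_000, 25)
    chimp_years = scaled_age_to_years(1.0, 10_000, 20)
    print(f"human: {human_years:,.0f} years")
    print(f"chimpanzee: {chimp_years:,.0f} years")
```

Because the result is linear in both N_e and the generation time, any uncertainty in those two inputs carries over proportionally to the age in years.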
To obtain a second estimate of the age of the allele, assuming neutrality, we first used the program PHASE v2.1 to infer haplotypes from genotypes, using default parameters and multiple runs to check the accuracy of the results (15). We then used the GENETREE program to estimate the age of the mutation (16). The program finds the maximum likelihood estimate of the age of the allele for a given population mutation rate, θ, through neutral coalescent simulations conditional on the full polymorphism data set. The age was evaluated for Watterson's estimate of θ, based on the number of segregating sites in the sample (17).
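For readers unfamiliar with Watterson's estimator, a minimal sketch follows. It uses the standard definition, θ_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1), where S is the number of segregating sites and n the number of sampled sequences. The function name and the example inputs are hypothetical; the number of segregating sites in the actual data set is not given in the text.

```python
def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's estimate of theta for the whole sequenced region.

    theta_W = S / a_n, where a_n is the (n-1)th harmonic number.
    """
    if sample_size < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n


if __name__ == "__main__":
    # Hypothetical inputs for illustration: 12 segregating sites in a sample
    # of 42 chromosomes (21 diploid individuals).
    print(round(watterson_theta(12, 42), 3))
```

Dividing the result by the length of the region in bp gives a per-site value that can be compared with nucleotide diversity.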
Construction of wild-type and mutant GPR33, cell culture, transfection and functional assays - Full-length GPR33 sequences were inserted into the mammalian expression vector pcDps. The human and mouse GPR33 were tagged with an N-terminal hemagglutinin (HA) and a C-terminal FLAG epitope. The GPR33 mutation (A125D) was introduced into the HA-tagged version of the murine GPR33 using a PCR-based site-directed mutagenesis and restriction fragment replacement strategy. The identity of the various constructs and the correctness of all PCR-derived sequences were confirmed by restriction analysis and sequencing. COS-7 cells were grown in DMEM supplemented with 10% fetal bovine serum, 100 U/ml penicillin, and 100 µg/ml streptomycin at 37°C in a humidified 7% CO2 incubator. LipofectAMINE (Invitrogen) was used for transient transfection of COS-7 cells. For this, cells were split into 12-well plates (1.5 x 10^5 cells/well) and transfected with a total amount of 1 µg of plasmid DNA/well. To measure inositol phosphate (IP) formation, transfected COS-7 cells were incubated with 2 µCi/ml of myo-[3H]inositol (18.6 Ci/mmol, NEN) for 18 hours. Thereafter, cells were washed once with serum-free DMEM containing 10 mM LiCl followed by incubation for one hour at 37°C. Intracellular IP levels were determined by anion-exchange chromatography as described (18).
ELISA and immunofluorescence studies - To estimate cell surface expression of receptors carrying an N-terminal HA-tag, we used an indirect cellular ELISA (19). Immunofluorescence studies were carried out to examine the subcellular distribution of the human GPR33. COS-7 cells were transferred into 6-well plates containing sterilized glass cover slips and transfected. Approximately 48 h later, cells were fixed, permeabilized with 0.1% Triton X-100 in PBS, and probed with a monoclonal anti-HA antibody (10 µg 12CA5/ml in PBS) and an affinity-purified rabbit anti-FLAG antibody (10 µg/ml in PBS). The primary antibodies were detected using species-specific TRITC- and FITC-labeled secondary antibodies (Sigma). Fluorescence images were obtained with a confocal laser-scanning microscope (LSM 510, Carl Zeiss Jena, Jena, Germany).

GPR33 pseudogenes are polymorphic in human and several ape species
Here, we followed the evolutionary trail of an orphan chemoattractant GPCR, GPR33, which was originally identified as a pseudogene in humans (2). The human GPR33 gene contains a stop codon within the coding sequence of the second intracellular loop which leads to a premature termination of translation (see Suppl. Fig. S1). To investigate if the inactivation of GPR33 is recent in primate evolution, we amplified and sequenced the GPR33 orthologs in the great apes and gibbons. Surprisingly, different inactivating nonsense mutations were found together with intact variants in our samples of chimpanzees, orangutans and siamangs (Fig. 1, Table 1). The positions of most stop codons within the GPR33 genes were different from the one (Stop140) truncating the human GPR33. However, the same Stop140 mutation as in humans was also identified in two chimpanzee individuals (Table  1). Since a polymorphism in the ancestor of humans and chimpanzees is highly unlikely to persist for over 5 million years (Myr) (20), this is likely to be the result of two mutations at the same site. Two observations strengthen this interpretation: first, the site is at a CpG, a dinucleotide known to experience an elevated mutation rate (21). Second, an allelic variant at another site (silent C to T transition codon 305) is in complete linkage disequilibrium with the Stop140 mutation in the chimpanzee sample, but is absent from the human sample. Thus, it appears that the Stop140 mutation was introduced independently in the human and chimpanzee lineages in the last 5 Myr after the human-chimpanzee split (20). An extensive sequence analysis of 17 additional primates did not reveal any other GPR33 orthologs with an obvious gene inactivation (Fig. 1), suggesting that GPR33 pseudogenization is restricted to hominoids among primates.
Because the Stop140 was introduced in the human GPR33 after speciation, we determined the Stop140 allele frequency in 1217 individuals representing all major linguistic groups by DdeI restriction analysis (see Experimental Procedures). The inactive allele (TGA, Stop140) was present in all parts of the world at high frequency (Fig. 2, Table 1). Thus, it is likely that GPR33 inactivation occurred before the last migration out of Africa about 50,000-100,000 years ago (20). Strikingly, we identified individuals which were heterozygous (n=50) or homozygous (n=1) for an allele resistant to DdeI cleavage. In all latter individuals the entire GPR33 coding region was sequenced. Sequencing revealed the Stop140 allele and the ancestral sequence of codon 140 (CGA, Arg140, intact receptor protein) in heterozygous individuals. One individual was homozygous for the ancestral allele. Figure 2 shows the geographic distribution of the GPR33 pseudogene and the intact allele which was found with a 2.1% frequency in 2434 alleles screened (Table 1).
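The reported 2.1% frequency follows directly from the genotype counts given above, as the short check below shows. The function name is hypothetical; the counts (50 heterozygotes, 1 homozygote, 1217 individuals) are those stated in the text.

```python
def allele_frequency(n_heterozygous, n_homozygous, n_individuals):
    """Frequency of an allele in a diploid sample from genotype counts."""
    allele_copies = n_heterozygous + 2 * n_homozygous
    total_alleles = 2 * n_individuals
    return allele_copies / total_alleles


if __name__ == "__main__":
    # 50 heterozygous and 1 homozygous carriers of the intact (Arg140) allele
    # among 1217 individuals, i.e. 52 copies in 2434 alleles.
    freq = allele_frequency(50, 1, 1217)
    print(f"{freq:.2%}")
```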

GPR33 pseudogenes predominantly occur in rodents and primates but not in other mammals
To find out if GPR33 inactivation is unique to hominoids, we set out to identify GPR33 orthologs from all vertebrate classes (mammals, birds, reptiles, amphibians, fishes) by an extensive PCR-based ortholog cloning strategy (22). GPR33 orthologs were found in eutherians and marsupials (Suppl. Table S1) but not in other vertebrate clades tested. Database search of the available fish genomes (zebrafish, fugu, tetraodon) and the chicken genome revealed no putative GPR33 homologues. The absence of GPR33 in amphibians, birds and reptiles implies that the receptor is less than ~310 Myr old (23). All attempts to amplify GPR33 sequence from two monotremes, platypus and echidna, failed. This in turn implies that GPR33 arose 125-190 Myr ago (24, 25) or alternatively, was lost in the monotreme lineage.
Next, we analyzed GPR33 orthologs from 121 species representing 16 out of a total of 18 existing eutherian orders (26). GPR33 pseudogenes generated by frame-shifting deletions or insertions were found in two gerbil species, the laboratory rat, the Syrian hamster, the European mole and, as a polymorphic stop mutation, in the yellow-toothed cavy. In other ordinal clades no obvious GPR33 inactivation was found. It has been estimated that about 65% of all inactivating mutations naturally found in GPCRs are missense mutations, which can only be identified by functional testing (27). The other third is obvious because of stop codons, deletions or insertions. Taking into account that 5 of 25 rodent species and 4 of 27 primate species present an apparent inactivation of GPR33, we would have expected to see at least one obvious inactivation in Carnivora (19 species) and Cetartiodactyla (20 species) if GPR33 pseudogenes were as frequent in these orders as in rodents and primates (p<0.01, Fisher's Exact Test, comparing Carnivora and Cetartiodactyla to primates and rodents).
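The comparison above can be reproduced with a small pure-Python Fisher's exact test. This is a sketch under a stated assumption: the text does not spell out the contingency table, so the pooled 2x2 layout used here (9 of 52 rodent and primate species with an apparent pseudogene versus 0 of 39 Carnivora and Cetartiodactyla species) is our reading of the counts given. Function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, row1_total, row2_total, col1_total):
    """Probability of k 'hits' in row 1 with all table margins fixed."""
    return (comb(row1_total, k) * comb(row2_total, col1_total - k)
            / comb(row1_total + row2_total, col1_total))


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided_p, two_sided_p). The one-sided p is the probability
    of a or more hits in row 1; the two-sided p sums all tables that are at
    most as probable as the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_min, k_max = max(0, col1 - row2), min(col1, row1)
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    one_sided = sum(hypergeom_pmf(k, row1, row2, col1)
                    for k in range(a, k_max + 1))
    two_sided = sum(p for p in (hypergeom_pmf(k, row1, row2, col1)
                                for k in range(k_min, k_max + 1))
                    if p <= p_observed * (1 + 1e-9))
    return one_sided, two_sided


if __name__ == "__main__":
    # Assumed pooled table: rows = (rodents + primates, Carnivora +
    # Cetartiodactyla), columns = (apparent pseudogene, no obvious defect).
    one_sided, two_sided = fisher_exact(9, 43, 0, 39)
    print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Under this pooled layout both the one-sided and the two-sided p-values fall below 0.01, in line with the threshold reported in the text.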

Age estimation of GPR33 pseudogenization in hominoids and rats
GPR33 pseudogenization events in hominoids and rats took place after divergence from their closest living relative. To further delimit the time point of GPR33 inactivation in great apes and humans, we sequenced the 5' and 3' non-coding genomic regions flanking intact and pseudogene GPR33 alleles of human, chimpanzee and bonobo. Comparison of ~5.4 kb of homologous genomic sequence revealed an average of 9.6 mutations/1,000 bp between chimpanzee and human alleles (Suppl. Table S6). Assuming that half of the mutations occurred in each of the human and chimpanzee lineages, and a split of both species 5 Myr ago, this gives an average mutation rate of about 10^-9 per bp per year, similar to previous studies (20). The human intact and pseudogene haplotypes differ from each other by 2.2 substitutions/1,000 bp. Thus, by comparison to human-chimpanzee divergence, we would estimate the inactivation of GPR33 in humans to be less than 1.1 Myr old (Table 2). Similarly, the fact that the bonobo GPR33 allele does not carry one of these stop mutations implies an inactivation in chimpanzees <800,000 years ago (28). This is consistent with our finding of 1.1-1.3 substitutions per kb between intact and pseudogene alleles in chimpanzees, while the pairwise divergence between bonobo and chimpanzee is 2.6 to 3.5 substitutions per kb (Suppl. Table S6).
We can also estimate the age of the disruption mutation from its frequency in the sample. If we assume the pseudogene allele is neutral, we estimate an age of ~1 Myr in humans and a very similar time in other ape species (Table 2). A second approach to estimate the age of a neutral allele led to similar results (Table 2). If instead we assume that it is weakly favored, the age could be as recent as ~50,000 years in humans. In general, the estimate of the age decreases with increasing strength of selection.
Sequence analysis of GPR33 in ten laboratory rats (R. norvegicus) revealed an inactivation caused by a 14-bp deletion. In the black rat (R. rattus), which diverged from R. norvegicus ~5.5 Myr ago (29), GPR33 is intact (five analyzed individuals from Germany and Paraguay). Interestingly, we identified a second GPR33 allelic variant with a 14-bp insertion in wild-caught R. norvegicus, which exactly matches the deleted portion of the first allele discovered (Fig. 3). This unique finding suggests a simultaneous inactivation of both alleles in R. norvegicus by an unequal meiotic crossing-over event leading to a reciprocal 14-bp deletion and duplication. We took advantage of this finding to date the pseudogenization event by sequencing 4 kb of the 5' and 3' noncoding genomic regions flanking both R. norvegicus alleles and the R. rattus allele (Suppl. Table S6). Based on a comparison of the sequence differences between both R. norvegicus alleles and the divergence from R. rattus, we estimate that the inactivation occurred less than 0.7 Myr ago. Similar to the human GPR33 pseudogene, GPR33 inactivation is probably fixed worldwide in R. norvegicus populations as indicated by the presence of both pseudogene alleles in 127 wild-caught R. norvegicus analyzed from Germany, Alaska, Russia and Japan.
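The divergence-based dating of the human alleles can be retraced with a few lines of arithmetic. The sketch below assumes a simple molecular clock in which differences between two sequences accumulate equally along both lineages; the function names are hypothetical, and the inputs (9.6 and 2.2 differences per 1,000 bp, 5 Myr split) are the values stated in the text.

```python
def per_lineage_rate(divergence_per_bp, split_time_years):
    """Substitution rate per bp per year along a single lineage.

    Half of the pairwise divergence is attributed to each lineage.
    """
    return divergence_per_bp / 2 / split_time_years


def haplotype_split_age(pairwise_diff_per_bp, rate_per_bp_per_year):
    """Time since two haplotypes shared a common ancestor, in years."""
    return pairwise_diff_per_bp / 2 / rate_per_bp_per_year


if __name__ == "__main__":
    human_chimp_divergence = 9.6 / 1000      # differences per bp
    intact_vs_pseudogene = 2.2 / 1000        # differences per bp
    split_time = 5e6                         # years

    rate = per_lineage_rate(human_chimp_divergence, split_time)
    print(f"rate: {rate:.2e} per bp per year")

    # With the unrounded rate and with the rounded rate of 1e-9.
    print(f"age (unrounded rate): {haplotype_split_age(intact_vs_pseudogene, rate) / 1e6:.2f} Myr")
    print(f"age (rate = 1e-9):    {haplotype_split_age(intact_vs_pseudogene, 1e-9) / 1e6:.2f} Myr")
```

The age is an upper bound on the inactivation because the two haplotypes may have diverged before the stop mutation arose on one of them; the unrounded rate gives a marginally larger value than the rounded rate of 10^-9.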

GPR33 has structural and functional features of a chemokine/chemoattractant GPCR
To gain more functional information, we set out to clarify the tissue expression pattern and signal transduction of GPR33. First, we analyzed the expression pattern of the intact GPR33 in mouse tissue by RT-PCR (for details see Supplemental Materials). The mouse GPR33 mRNA is predominantly expressed in lung, spleen, and testis (Fig. 4A), similar to the patterns found for several fMLP and chemokine receptors (30, 31). Consistent with a predominant expression in immunologically relevant tissues, GPR33 transcripts were identified in RAW 264.7 cells, a murine macrophage cell line (data not shown). The persistence of expression in humans and rats may also be indicative of a recent gene inactivation. To date, only a few pseudogenes have been found to be expressed (32). As found for the mouse GPR33, transcripts of the GPR33 pseudogene of R. norvegicus were amplified from spleen cDNA and their identity was verified by sequencing (for details see Supplemental Materials). Similarly, the human GPR33 pseudogene is still transcribed in spleen and lung but also in heart, liver, kidney, pancreas, thymus, gonads, and leukocytes (Fig. 4B and Supplemental Materials).
Most chemoattractant and chemokine-like GPCRs mediate their signal transduction via coupling to Gi/o proteins, which inhibit adenylyl cyclases and, therefore, decrease intracellular cAMP levels (1). According to the current model of GPCR function, receptor overexpression can result in constitutive activation of signaling pathways. Thus, coupling abilities of several receptors, including "orphan" receptors, have been characterized by overexpression in the absence of agonist (22).
As a member of the rhodopsin-like GPCR family, GPR33 presents common scaffold residues such as the Asp-Arg-Tyr (DRY) motif located at the transmembrane domain 3/intracellular loop 2 (TMD3/IL2) transition. The crystal structure of rhodopsin suggests that the acidic residue in the DRY motif forms a salt bridge to Arg and probably acts as a proton acceptor during receptor activation (33). Interestingly, all GPR33 orthologs cloned from the genus Mus display an ARY motif, whereas all other GPR33 orthologs contain the classical DRY motif at the TMD3/IL2 transition (Fig. 5A). There are numerous reports showing that Ala mutation of the acidic residue in the DRY motif results in constitutive activity of many receptors (34). To test the functional consequence of the Asp to Ala change in murine GPR33 orthologs we utilized second messenger assays. As shown in Fig. 5B, basal activity of the human ADP receptor (P2Y12), which served as positive control (22), and of the mouse GPR33 were ~5-fold and ~3-fold increased, respectively, whereas GFP-transfected or human GPR33-transfected cells displayed no significant changes in basal IP levels. To test whether the high basal activity of the murine GPR33 is indeed due to the mutational change in the DRY motif, Ala125 in the mouse ARY motif was reverted to Asp. Replacement of Ala125 by an Asp residue resulted in a decrease of basal receptor activity (Fig. 5B). Plasma membrane expression of Ala125Asp was almost unchanged (110 ± 14 %) when compared with the wild-type murine GPR33 (100 %) as determined by an indirect cellular ELISA. These data indicate that the decreased constitutive activity of Ala125Asp is indeed caused by stabilization of the inactive receptor conformation and not by a reduction of the receptor number at the cell surface.

Potential evolutionary modes of GPR33 inactivation
Although the existence of GPR33 provides an advantage as reflected by its presence in all mammalian orders, it was recently inactivated in humans, as well as in a subset of ape and rodent species. Two scenarios have to be discussed that may explain GPR33 inactivation: GPR33 inactivation occurred due to a loss of constraint or due to positive selection.
The "loss of constraint" scenario may occur due to 1.) loss of the endogenous agonist for GPR33, 2.) loss of an exogenous agonist, and 3.) the development of functional redundancy and mechanisms which compensate for loss of GPR33 function. Following gene inactivation for one of these reasons, neutral drift can lead to fixation of the inactive allele. However, scenarios 1 and 3 are very unlikely because they are not compatible with an almost simultaneous inactivation of GPR33 in several unrelated species but not all mammalian orders. The loss of a potential exogenous/environmental factor (point 2) which somehow participates in GPR33 function is more suitable to explain the temporally synchronized pseudogenization in several unrelated species. For example, most GPCR pseudogenes are found among mammalian odorant receptors (35). It is assumed that the repertoire of functional and non-functional olfactory receptors is shaped by the odour composition of the environment and the development of compensatory mechanisms (36). The disappearance of an odorant releases the specific olfactory GPCR from its constraint. Similarly, loss of a potential exogenous factor which interacts with GPR33 should also release the constraint on this GPCR. Such a mechanism would at least account for the simultaneous inactivation in several unrelated mammals. However, GPR33 inactivation is restricted to only a small subset of apes and rodents, and the gene is kept intact in most mammalian orders which share the environment with those species harbouring a GPR33 pseudogene. Still, we cannot rule out the theoretical possibility that a potential exogenous/environmental factor was specific to only humans, great apes, and some rodents. Even if one follows the argument that there was a loss of constraint only in a distinct subset of species, it appears very unlikely that drift led to fixation of different nonsense mutations in different GPR33 orthologs in the past one Myr. This is because stop mutations are rare events.
This is reflected by the fact that only 8% of all inactivating mutations in GPCRs are stop codons but almost 65% are missense mutations (27). This calculation does not include all missense mutations which do not interfere with proper receptor function. So, one would expect at least 8 missense mutations per stop mutation. By contrast, only 1-3 missense mutations became fixed after speciation in GPR33 orthologs of great apes but 3 out of 5 orthologs contain a stop codon. This also implies that there are several inactivating missense mutations which escape detection unless orthologs have been functionally tested.
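The "at least 8 missense mutations per stop mutation" figure is the ratio of the two percentages quoted above. The one-line check below makes the arithmetic explicit; the function name is hypothetical and the inputs (65% and 8%) are the values cited from ref. 27 in the text.

```python
def expected_missense_per_stop(fraction_missense, fraction_stop):
    """Expected number of inactivating missense mutations per stop mutation."""
    return fraction_missense / fraction_stop


if __name__ == "__main__":
    ratio = expected_missense_per_stop(0.65, 0.08)
    print(f"about {ratio:.1f} inactivating missense mutations per stop mutation")
```

The ratio is a lower bound for missense mutations in general, since it counts only inactivating missense changes and ignores those that leave receptor function intact.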
In a second scenario GPR33 pseudogenization provides some advantage and, therefore, is positively selected. A likely cause of chemokine receptor inactivation by selection is its interplay with an exogenous factor. For example, the chemokine receptor CCR5 and the Duffy antigen, both members of the superfamily of GPCRs, act as co-receptors for cell entry of pathogens: CCR5 for the internalization of HIV-1 and Duffy for entry of Plasmodium vivax. Mutational inactivation of CCR5 and Duffy is associated with resistance to infections by HIV-1 (5, 6) and by P. vivax (7), respectively. However, HIV-1 has not infected humans long enough to account for the 10 % frequency of the inactive allele (CCR5-∆32) in the European population. Other causes, such as plague and smallpox, have been proposed to have selected for the null allele (37). The expression of GPR33 in macrophages, lung and spleen (see Fig. 4), its relation to fMLP and chemokine receptors, and its coupling to Gi/o protein (see Fig. 5B) all suggest that GPR33 most likely functions as a chemoattractant receptor. Unfortunately, all our attempts to identify the endogenous agonist by functional screening of potential ligands, fractionated tissue and bacterial extracts have failed so far. Despite the close structural relation of GPR33 to co-receptors for retroviruses, the introduction of HIV into the human population occurred too recently to explain the frequency of GPR33 inactivation. Additionally, GPR33 inactivation was also found in non-primate species, which are usually not hosts of hominoid-tropic retroviruses. In contrast to GPR33, the inactive CCR5 and Duffy allelic variants display a strong geographic and ethnic restriction to Eurasia and Central Africa, respectively. Therefore, it is more likely that the inactivation of GPR33 conferred a selective advantage to hosts exposed to another, probably older and more widespread, pathogen.
Assuming positive selection of GPR33 pseudogenes, one has to search for signatures which may support this scenario. The recent fixation of a beneficial substitution at one site leads to a reduction in variation and a high proportion of rare alleles at linked, neutrally evolving sites (38). To address this point we sequenced the GPR33 coding region (~1 kb) and 3.8 kb of the 3' untranslated region from 21 Yoruba individuals. For reference purposes the same region was analyzed in a Philippine individual (homozygous for the ancient CGA allele) and in chimpanzee. Sequence data analysis revealed neither of the expected signatures of recent selection (Suppl. Table S7). The diversity level is π=13.3 x 10^-4, which is not unusually low by comparison to π values (mean π=9.3 ± 4.2 x 10^-4) estimated in 24 African-Americans for 217 genic regions (http://pga.gs.washington.edu/summary_stats.html) (39). Moreover, commonly used summaries of the allele frequencies, such as Tajima's D (D=-0.121) and Fu and Li's D (D=1.01), show no marked departures from the expectations of the standard neutral model, nor were the D values unusual by comparison to values obtained in the same survey of African-Americans. Taken together, the polymorphism data do not suggest a recent selective sweep. However, theoretical investigations suggest that if the fixation event occurred prior to 100,000 years ago, the signature will have weakened considerably and may no longer be detectable (40).
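As background for the D statistic quoted above, the following is a minimal sketch of Tajima's D using the standard published formula. It takes the number of sequences n, the number of segregating sites S and the mean number of pairwise differences k (for the whole region, not per site). The function name and the example inputs are hypothetical; the number of segregating sites in the Yoruba sample is not given in the text, so the reported D=-0.121 is not recomputed here.

```python
from math import sqrt


def tajimas_d(num_sequences, num_segregating_sites, mean_pairwise_diff):
    """Tajima's D from n, S and the mean number of pairwise differences."""
    n, s, k = num_sequences, num_segregating_sites, mean_pairwise_diff
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if s < 1:
        raise ValueError("Tajima's D is undefined without segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical inputs for illustration: 42 chromosomes, 20 segregating
    # sites, and two alternative values of the mean pairwise difference.
    print(round(tajimas_d(42, 20, 3.0), 3))   # excess of rare variants, D < 0
    print(round(tajimas_d(42, 20, 6.0), 3))   # excess of common variants, D > 0
```

A recent selective sweep is expected to leave an excess of rare variants and hence a clearly negative D, which is why a value close to zero is read as showing no departure from neutrality.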

Conclusion
After introduction of the chemokine receptor GPR33 into the mammalian genome 125-190 Myr ago, this receptor underwent pseudogenization in humans, other hominoids and some rodents. This process is still ongoing, as reflected by the polymorphic coexistence of intact alleles and pseudogenes, e.g. in a worldwide sample of humans. Simultaneous pseudogenization through loss of constraint and neutral drift, leading to fixation of pseudogenes in several species, appears to be very unlikely in less than one Myr. Given the fact that inactivation occurred independently and at similar times in several species of unrelated orders, the GPR33 inactivation is likely due to selection. A likely cause of GPR33 inactivation by selection is its interplay with an exogenous factor, e.g. a rodent- and hominoid-tropic pathogen. If so, the finding that an appreciable fraction of humans still harbor an intact copy of the gene may have important medical implications.

Acknowledgments
We would like to thank the numerous contributors for the species samples (Suppl.

Supporting Information
Suppl. Fig. S1
Suppl. Tables S1-S7

Table 3 and the approximate geographic origin of each sample group is indicated. Sample groups in which only the Stop140 allele was found are marked in blue. Sample groups in which both alleles, the wild-type and Stop140, were found are marked in red.

Fig. 3. An unequal meiotic crossover event caused inactivation of GPR33 in R. norvegicus.
Sequence analysis of the GPR33 ortholog from R. norvegicus revealed two alleles. Allele A contains a 14-bp deletion and allele B a 14-bp insertion, which represents the deleted portion of allele A. Sequences were compared with GPR33 orthologs from Mus musculus and R. rattus. Numbering refers to the GPR33 coding sequence from R. rattus. Positions that differ from the sequence of R. rattus are boxed.

It has been demonstrated that replacement of the four C-terminal amino acids of Gαq with the corresponding Gαi residues (referred to as Gα∆6qi4myr) confers the ability to stimulate the PLC-β pathway onto Gi-coupled receptors (41). Therefore, receptor constructs were co-expressed with the chimeric Gα∆6qi4myr protein in COS-7 cells and IP assays were performed as described (Experimental Procedures). As controls, plasmids encoding GFP, the human P2Y12 and the human GPR33 pseudogene were co-transfected with Gα∆6qi4myr. Basal IP formation is expressed as fold over basal levels of GFP-transfected cells (347 ± 8.6 cpm/well). Data are presented as means ± SEM of three independent experiments, each carried out in triplicate.
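The normalization described in this legend (fold over GFP basal, mean ± SEM of three experiments done in triplicate) can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, the cpm values are invented, and it assumes that each experiment is normalized to its own GFP control before averaging, which the legend does not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev


def fold_over_basal(receptor_cpm, gfp_cpm):
    """Return (mean fold, SEM) across independent experiments.

    Each argument is a list of experiments; each experiment is a list of
    replicate wells in cpm/well. Assumption: every experiment is normalized
    to the mean of its own GFP control wells.
    """
    folds = [mean(r) / mean(g) for r, g in zip(receptor_cpm, gfp_cpm)]
    sem = stdev(folds) / sqrt(len(folds))
    return mean(folds), sem


# Invented example values (cpm/well), three experiments in triplicate.
gfp = [[340, 350, 351], [338, 349, 357], [345, 347, 346]]
receptor = [[690, 705, 699], [720, 688, 701], [680, 702, 694]]
fold, sem = fold_over_basal(receptor, gfp)
print(f"fold over basal: {fold:.2f} +/- {sem:.2f}")
```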

Species (number of individuals investigated); Allele 1 - Allele 2; Number of individuals
Species: Human (1217), Chimpanzee (9), Orangutan (4), Siamang (3)
Genotypes: Arg140 - Arg140, Arg140 - Stop140, Stop140 - Stop140, Trp94 - Stop94
Polymorphic GPR33 inactivation (bold) was found in humans (see Fig. 2 and Suppl.

(20). We assume that the generation time for humans is 25 years (43), while for chimpanzees and orangutans it is 20 years.
b Assuming that the selective advantage of the favored allele is 0.005.
c The entire coding region of GPR33 was sequenced from 85 individuals of a human genomic DNA panel used in (8). Except for the wild-type and the Stop140 alleles, no further sequence variation or evidence of recombination was observed within the coding region of GPR33. Age estimates were based on these sequencing results.
d There is also a distinct gene disruption mutation present twice in the sample. We estimated the expected age of a selected allele (Stop39/Arg140) ignoring interference between them.
e The two estimates are for the two positions at which the mutation could occur in the genealogy. Please note that the GENETREE approach assumes no recombination. While there is no evidence for recombination in the coding region of GPR33, this does not preclude that recombination occurred in the history of the sample. Thus, the results of this analysis should be interpreted with care.
f Estimate of the allele age on the basis of mutation rates (see text and Suppl.

Nonsense mutation of GPR33 causes truncation of the receptor protein
Because all GPR33 pseudogenes found in humans and apes were due to stop codons, we asked whether these stop codons indeed terminate translation of the receptor proteins. An increasing number of studies have shown that not all stop codons are equal and that some allow leaky translation, mainly in plants and yeast (44-46). To test whether the stop codon within the human GPR33 pseudogene leads to a truncated protein, the receptor was N- and C-terminally epitope tagged with an HA epitope and a FLAG epitope, respectively. Immunofluorescence studies revealed the presence of the N-terminal epitope, but the C terminus could not be detected (Suppl. Fig. S1). Similar results were obtained with the pseudogenes of chimpanzee and orangutan (data not shown). Interestingly, aminoglycoside antibiotics are able to suppress premature stop codons, thereby permitting protein translation to continue to the normal end of the gene. This phenomenon is most likely due to the interaction of the aminoglycosides with ribosomes, which reduces the usual stringency of codon-anticodon pairing (47). Recent in vivo studies have shown, for example, that aminoglycoside antibiotics can suppress premature stop codons in the cystic fibrosis transmembrane regulator, dystrophin and V2 vasopressin receptor genes (48-50). Similarly, COS-7 cells transiently transfected with the human GPR33 regained the ability to produce a full-length receptor following incubation with the aminoglycoside geneticin (Suppl. Fig. S1).
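The effect of a nonsense mutation on the reading frame can be illustrated with a short Python sketch that reports the first in-frame stop codon of a coding sequence. The sequences below are hypothetical toy constructs, not GPR33, and the CGA to TGA change is used only as an assumed example of how an arginine codon can become a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 1-based codon number of the first in-frame stop, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3 + 1
    return None


# Toy coding sequences (hypothetical): codon 4 is CGA (Arg) in the intact
# version and TGA (stop) in the mutant version.
intact = "ATG" + "GCT" * 2 + "CGA" + "GCT" * 2 + "TAA"
mutant = "ATG" + "GCT" * 2 + "TGA" + "GCT" * 2 + "TAA"
print("intact stops at codon", first_in_frame_stop(intact))
print("mutant stops at codon", first_in_frame_stop(mutant))
```

In the mutant toy sequence translation ends at the premature stop, so any C-terminal tag encoded downstream would not be produced, which mirrors the logic of the N- and C-terminal epitope experiment.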

Human and rat GPR33 pseudogenes are expressed
To gain information beyond structural relations, we set out to clarify the expression pattern of GPR33 in mouse tissue by RT-PCR. The mouse GPR33 is a single-copy gene on chromosome 12 (12B3), and the structure of its mRNA was unknown. With primers derived from the coding region, genomic contamination of the cDNA could give false-positive RT-PCR results, so we first analyzed the structure of the mouse GPR33 transcript. To this end, 5' rapid amplification of cDNA ends (RACE) PCR was performed with a cDNA library from mouse spleen (Clontech, Palo Alto, CA). PCR fragments were cloned and sequenced. In several clones the coding region of the mouse GPR33 was joined at its 5' end to a noncoding DNA sequence that matched the genomic mouse database sequence (Acc. number NT_039551) about 3.6 kb upstream of the start ATG of GPR33. The intron of the 5'-UTR is flanked by a classical consensus sequence. The sequence of the GPR33 5'-UTR was verified by direct genomic sequencing. For RT-PCR, a primer pair was chosen that flanks the intron, so that a 413-bp PCR product is obtained only when the exons are properly spliced. As shown in Fig. 4A, GPR33 is mainly expressed in mouse spleen, lung and testis. Several murine cell lines were then analyzed for GPR33 expression by RT-PCR. Consistent with a predominant expression in immunologically relevant tissues, GPR33 transcripts were identified in RAW 264.7 cells, a murine macrophage cell line.
Next, we asked whether GPR33 pseudogenes are also transcribed. Comparison of the rat genomic sequence with the mouse GPR33 locus revealed high sequence similarity in the coding region and the 5'-UTR. As for the mouse, primers were designed to amplify a 284-bp fragment from spleen cDNA of R. norvegicus. As found for the mouse GPR33, a specific product was amplified and its identity was verified by sequencing.
Except for the coding sequence and the splice acceptor site (5' of the translation start ATG), no similarities are obvious when the 5' regions of the mouse and human GPR33 loci are compared. Therefore, primers derived from the coding region had to be used in the human RT-PCR. As shown in Fig. 4B, the human GPR33 transcript was highly expressed in spleen and lung but was also found in heart, liver, kidney, pancreas, thymus, gonads, and leucocytes. The tissue panel was controlled for genomic contamination by using the identical antisense primer together with a sense primer directly upstream of the putative splice acceptor site. A specific PCR product was amplified from human genomic DNA as a control, but no sample of the cDNA panel yielded such a product, indicating the absence of genomic contamination. Sequence analysis of rat and human pseudogene cDNA revealed no evidence for exon skipping or intergenic splicing, which may produce fusion proteins with neighboring genes as found, e.g., for P2Y11
Suppl. Figure S1.