The Human CC Chemokine Receptor 5 (CCR5) Gene

Human CC chemokine receptor 5 (CCR5) mediates the activation of cells by the chemokines macrophage inflammatory protein-1α, macrophage inflammatory protein-1β, and RANTES, and serves as a fusion cofactor for macrophage-tropic strains of human immunodeficiency virus type 1. To understand the molecular mechanisms that regulate human CCR5 gene expression, we initiated studies to determine its genomic and mRNA organization. Previous studies have identified a single CCR5 mRNA isoform whose open reading frame is intronless. We now report the following novel findings. 1) Complex alternative splicing and multiple transcription start sites give rise to several distinct CCR5 transcripts that differ in their 5′-untranslated regions (UTR). 2) The gene is organized into four exons and two introns. Exons 2 and 3 are not interrupted by an intron. Exon 4 and portions of exon 3 are shared by all isoforms. Exon 4 contains the open reading frame, 11 nucleotides of the 5′-UTR, and the complete 3′-UTR. 3) The transcripts appear to be initiated from two distinct promoters: an upstream promoter (PU), upstream of exon 1, and a downstream promoter (PD) that includes the “intronic” region between exons 1 and 3. 4) PU and PD lack canonical TATA or CAAT motifs and are AT-rich. 5) PD demonstrated strong constitutive promoter activity, whereas PU was a weak promoter in all three leukocyte cell environments tested (THP-1, Jurkat, and K562). 6) We provide evidence for polymorphisms in the noncoding sequences, including the regulatory regions and 5′-UTRs. The structure of CCR5 was strikingly reminiscent of the overall structure of other chemokine/chemoattractant receptors, underscoring an important evolutionarily conserved function for a prototypical gene structure.
This is the first description of functional promoters for any CC chemokine receptor gene, and we speculate that the complex pattern of splicing events and dual promoter usage may function as a versatile mechanism to create diversity and flexibility in the regulation of CCR5 expression.

CC chemokine receptor 5 (CCR5), a receptor for the CC chemokines macrophage inflammatory protein-1α, macrophage inflammatory protein-1β, and RANTES (1-3), also serves as a fusion cofactor for the entry of macrophage-tropic strains of HIV-1 (4-8). The level of CCR5 cell surface expression may have a direct influence on the relative ease with which an individual acquires HIV-1 infection (9-12): individuals homozygous for a 32-bp deletion (denoted ΔCCR5) in the open reading frame (ORF) do not express the protein on the cell surface and are relatively resistant to developing HIV-1 infection. In contrast, individuals who display the CCR5/ΔCCR5 genotype can develop HIV-1 infection; however, their progression to AIDS may be slower. Interestingly, in individuals who display the CCR5/CCR5 genotype, the cell surface expression of CCR5 can be highly variable (13); however, whether this heterogeneity in protein expression also correlates with differences in HIV-1 infection/transmission in vivo is not known. These observations suggest that a therapeutic or preventive strategy based on targeting CCR5 cell surface expression could potentially be quite beneficial. Toward this end, we have initiated studies to define the structural organization of CCR5 and molecular factors that regulate its expression.
Phylogenetic analysis of the G-protein coupled receptor (GPCR) superfamily indicates that replication of a progenitor gene may have given rise to clusters of evolutionarily related receptor genes (14,15). Two such GPCR clusters are members of the chemokine receptor subclass, and receptors for the classical chemoattractants, such as the N-formyl peptide receptor. To date, the complete mRNA and genomic organization of only a limited number of human chemokine receptors has been described (16-20); however, a comparison of their structural organization with that of the receptors for the classical chemoattractants reveals some striking similarities (21-24). 1) Their ORFs are usually intronless or contain a single intron interrupting the amino-terminal coding region, as is the case for the C5a receptor (21). 2) Their 5′-untranslated regions (UTR) can have a surprisingly complex organization. Unlike most GPCRs, the 5′-UTRs for these genes reside on multiple exons, and alternative splicing may generate multiple mRNA isoforms. 3) Splicing of the untranslated exons to form the mature transcripts occurs at a common 3′-splice junction that is a short distance upstream of the start of translation. Thus, the transcription and translation start sites can be separated by long intervening sequences. 4) Although they are products of distinct genes, they tend to be physically clustered on a single chromosome (14,15,19,21,25). For example, CCR5 and several other CCRs co-localize on chromosome 3p21.3-p24 (25), whereas several of the chemoattractant receptors co-localize to 19q13.3 (21). These similarities suggest that despite coding for receptors that have diverse ligand-receptor relationships, these two subclasses of receptors have retained a remarkably conserved structural organization.
Some of these prototypical structural features also appear to be true for human CCR5. First, a partial length gene (1376 bp) has been cloned and it has an intronless CCR5 ORF (Ref. 1; position 240 to 1298). Second, cDNA clones for CCR5 have been cloned and reported by two groups (2,3). Comparison of the partial CCR5 sequence with that of the cDNA clones, and restriction mapping of P1 clones, suggests the presence of a single ~1.9-kb intron between positions -11 and -12 relative to the start of translation (1-3). To delineate the full extent of the 5′-UTR of human CCR5, Raport et al. (3) also performed 5′-RACE (5′-rapid amplification of cDNA ends) on human spleen cDNA, and by this method the longest 5′-UTR identified was 54 nucleotides in length. The cDNA clone reported by Raport et al. (3) also contains a poly(A) tail, suggesting a full-length 3′-end. Nevertheless, the exact location of the remainder of the reported CCR5 5′-UTR sequence on the gene and the nature of the cis-acting elements are not known.
Expression of CCR5 at the mRNA level suggests that CCR5 may contain tissue-specific cis-acting elements. An ~4-kb human CCR5 transcript has been observed in several human cell lines, and in human thymus, spleen, small intestine, and peripheral blood leukocytes (1-4). Combadiere et al. (2) have shown that human CCR5 transcripts are present in primary adherent monocytes but are absent from the primary neutrophils and eosinophils. Carroll et al. (26) have reported recently that human unstimulated CD4+ cells do not express CCR5 mRNA. However, CD4+ cells activated by phytohemagglutinin (PHA)/IL-2 expressed CCR5 mRNA, whereas those costimulated with immobilized antibodies to CD3/CD28 did not. Both unstimulated CD4+ cells and CD4+ cells costimulated with CD3/CD28 were resistant to infection by macrophage-tropic strains of HIV-1 in vitro, whereas phytohemagglutinin/interleukin-2 activated CD4+ cells could be infected, further highlighting the importance of understanding the molecular mechanisms that regulate CCR5 expression.
The present study represents a first step toward addressing some of the issues outlined above. In contrast to previous reports, we demonstrate that the mRNA structure of human CCR5 is not monomorphic. Instead, transcript analysis by 5′-RACE and RT-PCR (reverse transcriptase-polymerase chain reaction) revealed complex alternative splicing patterns in the 5′-UTRs of CCR5: alternative splicing of four exons that span ~6 kb of CCR5 gives rise to multiple CCR5 transcripts that differ in their 5′-UTRs. Although the generation of multiple CCR5 transcripts has no effect on the protein sequence of CCR5, it does have consequences for the regulation of the gene: we demonstrate that CCR5 transcription is regulated by at least two promoters, and we ascribe an important role to the 5′-UTR and intron sequences in regulating CCR5 expression. In this report we also provide evidence that the regulatory sequences and noncoding exons of CCR5 are polymorphic.

EXPERIMENTAL PROCEDURES
Cells and Cell Culture-After obtaining informed consent, normal adult donors were pretreated with granulocyte colony stimulating factor (Amgen; 10 μg/kg body weight, subcutaneously) for 5 days, and then their low density cells in the peripheral blood were collected by apheresis. These cells were enriched for CD34+ progenitor cells by positive selection, using the Ceprate SC column (CellPro, Bothell, WA). The purified CD34+ cells were differentiated into dendritic cells by culturing them in a cytokine mixture for 7 days. The cytokine mixture contained stem cell factor (20 ng/ml), granulocyte macrophage colony stimulating factor (20 ng/ml), and tumor necrosis factor (TNF)-α (2 ng/ml; R & D Systems). The culture conditions were similar to those described previously (27), and included Iscove's modified Dulbecco's medium and 20% fetal calf serum (Life Technologies). We confirmed that the cells derived from the cytokine-stimulated CD34+ cells had a dendritic cell phenotype by two independent criteria (data not shown): first, by fluorescence-activated cell sorter they expressed a high percentage of cell surface markers characteristic of dendritic cells; and second, dendritic cells pulsed with tetanus toxoid and purified-protein derivative stimulated the proliferation of autologous T cells. Density gradient Ficoll centrifugation was used to isolate peripheral blood mononuclear cells from whole blood and the cells obtained from the apheresis flowthrough. Monocytes were isolated from peripheral blood mononuclear cells by plastic adherence for 6 h. CD4+ cells were purified by positive selection using the Ceprate LC4 column (CellPro). To prepare activated CD4+ T lymphocytes, resting CD4+ T lymphocytes were stimulated with irradiated autologous dendritic cells. Lymphocytes, monocytes, and peripheral blood mononuclear cells were cultured in RPMI and 10% fetal bovine serum.
The apheresis protocol was approved by the Institutional Review Board of the University of Texas Health Science Center at San Antonio, TX.
RNA Extraction-Total RNA was extracted from human leukocytes, including dendritic cells and the cell lines (THP-1 and Jurkat), using commercially purchased reagents according to instructions of the manufacturer (Trizol; Life Technologies).
5′-RACE and RT-PCR-The template for 5′-RACE was total RNA (1 μg) isolated from dendritic cells. For RT-PCR, the template was 1 μg of total RNA isolated from human leukocytes, including dendritic cells. We used a 5′-RACE kit (5′-RACE System, Life Technologies) as per instructions of the manufacturer. The sequences of the reverse and forward primers (primary and nested) corresponded to the amino terminus of the CCR5 ORF and the anchor primer, respectively. The 5′-RACE products were subcloned into pBlueScript II SK(+) and the nucleotide sequence was determined on both strands. To confirm the sequence composition of the 5′-RACE products we performed RT-PCR reactions on the aforementioned RNA templates. A reverse primer complementary to the amino terminus of CCR5 was first extended by avian myeloblastosis virus reverse transcriptase (Invitrogen), and then PCR was performed with a forward primer that was specific to the 5′-most unique sequence segment identified by 5′-RACE and a reverse primer specific to the CCR5 ORF; semi-nested PCR was then performed on this PCR template with primers specific to the 5′-UTR (Fig. 1B). The RT-PCR products were subcloned into pBlueScript II SK(+) and sequenced. Oligonucleotides were synthesized by the Advanced DNA Technology Unit, University of Texas Health Science Center at San Antonio, TX. DNA sequence analysis was performed by the dideoxy method according to the manufacturer's instructions (U. S. Biochemical Corp.) and also by the Dye Terminator Cycle Sequencing method using an automated fluorescent sequencer (Applied Biosystem 373).
Characterization of CCR5 Gene-The genomic region upstream of the 5′-UTR sequence reported by Raport et al. (3) was cloned by using the Human PromoterFinder Kit (CLONTECH) according to the manufacturer's protocols. The forward and reverse primers (primary and nested) were complementary to the adaptor ligated to the genomic DNA fragments in each library, and the 5′-UTR sequence reported by Raport et al. (3), respectively. A series of overlapping genomic DNA amplification products were also generated using PCR primer sets specific to the following regions: 1) 5′-UTR and amino terminus of the ORF; 2) amino terminus and the intracellular carboxyl-tail of the ORF (28); and 3) the intracellular carboxyl-tail of the ORF and a reverse primer whose 3′ terminus is immediately upstream of the polyadenylation signal sequence AAATAA in the 3′-UTR. The PCR amplification products were subcloned into pBlueScript II SK(+) and the nucleotide sequence was obtained for both strands. Nucleotide sequences were analyzed by algorithms in the GCG software (BLAST, FASTA, BestFit) and GeneWorks (IntelliGenetics, CA). The promoter sequences were analyzed for the presence of potential transcription factor binding sites by the SIGSCAN (http://bimas.dcrt.nih.gov/molbio/signal; Ref. 29) and MatInspector (http://transfac.gbf-braunschweig.de/TRANSFAC/; Ref. 30) programs.
Construction of CCR5 Promoter Constructs-Convenient restriction endonuclease sites and/or PCR were used to create a series of gene fragments of varying lengths from different regions of CCR5, and they were cloned into the promoterless pGL3-Basic vector (Promega) upstream of the firefly luciferase gene. Nucleotide fidelity was confirmed by sequencing.

Transient Transfection of Cell Lines and Luciferase Assays-The cell lines (K562, Jurkat, and THP-1) were obtained from ATCC (Rockville, MD). The promoter constructs were transfected into the cell lines as described previously (18). Transfection efficiency was normalized by cotransfecting either the promoterless vector pGL3-Basic or the CCR5 promoter constructs with 0.5 μg of renilla luciferase vector, pRL-CMV (Promega). Forty hours post-transfection the cells were pelleted, washed in Dulbecco's phosphate-buffered saline, and lysed in 1× passive lysis buffer (Promega). The firefly and renilla luciferase activities were determined according to the manufacturer's instructions (Dual-Luciferase Reporter Assay System, Promega) in a luminometer (Turner TD-20/20). In initial experiments, we determined that the protein concentration in the cell lysates, as measured by the Bradford method, was comparable between and within experiments. The "relative luciferase activity" reported is derived from: (firefly luciferase activity of CCR5 promoter construct ÷ renilla luciferase activity of co-transfected pRL-CMV) ÷ (firefly luciferase activity of promoterless vector pGL3-Basic ÷ renilla luciferase activity of co-transfected pRL-CMV).
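The normalization described above can be sketched in a few lines of Python. This is a minimal illustration of the stated ratio only; the function name, argument names, and the luminometer readings in the usage line are hypothetical and are not taken from the study.

```python
def relative_luciferase_activity(firefly_construct, renilla_construct,
                                 firefly_basic, renilla_basic):
    """Ratio of renilla-normalized firefly signal (promoter construct)
    to renilla-normalized firefly signal (promoterless pGL3-Basic).

    All four arguments are raw luminometer readings (hypothetical names).
    """
    # A zero or negative denominator would make the ratio meaningless.
    if renilla_construct <= 0 or renilla_basic <= 0 or firefly_basic <= 0:
        raise ValueError("denominator readings must be positive")
    # Step 1: correct each firefly reading for transfection efficiency.
    normalized_construct = firefly_construct / renilla_construct
    normalized_basic = firefly_basic / renilla_basic
    # Step 2: express construct activity relative to the promoterless vector.
    return normalized_construct / normalized_basic


# Hypothetical readings, for illustration only.
example = relative_luciferase_activity(5000.0, 250.0, 400.0, 200.0)
print(example)
```

A value near 1 would indicate activity indistinguishable from the promoterless vector under this normalization.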

RESULTS
Heterogeneity in the 5′-UTR of Human CCR5 mRNA-A single CCR5 mRNA isoform that contains a 5′-UTR of 54 nucleotides in length has been reported (3). Since alternative splicing in the 5′-UTRs appears to be a feature common to several human chemokine and chemoattractant receptors (18,22,24), we hypothesized that this might also be true for CCR5. To test this hypothesis, we designed a strategy that involved 5′-RACE and RT-PCR techniques, and probed the diversity in the CCR5 mRNA structure in several primary human cell types and the human cell lines THP-1 and Jurkat. By this strategy, we identified PCR products of ~100 to ~350 bp in length from human dendritic cells, suggesting the possibility of novel 5′-UTR sequences. These PCR products were subcloned, and a schematic illustration of their sequence composition is shown in Fig. 1A. Based on sequence analysis and criteria outlined below, we have arbitrarily segregated these cDNA clones into two categories, representing either "full-length" or "truncated" CCR5 transcripts.
The two full-length CCR5 transcripts, designated as CCR5A and CCR5B, shared three sequence segments but differed by the presence or absence of a 235-bp sequence segment in the 5′-UTR (Fig. 1). As demonstrated later, these sequence segments were identified on CCR5, and based on their location on the gene they were designated as exons 1-4; exon 2 corresponded to the 235-bp sequence segment that is unique to CCR5A. Exons 1, 3, and 4 were common to both CCR5A and CCR5B, and the ORF, 11 bp of the 5′-UTR, and the 3′-UTR resided in exon 4 (Fig. 1A).
FIG. 1. ... The ORF resides in exon 4 and also contains 11 bp of the 5′-UTR and the entire 3′-UTR. Dotted lines represent gaps in the ORF and 3′-UTR. The transcripts that contained exon 1 sequence are designated as full-length transcripts, whereas the individual transcripts that lacked exon 1 were designated as truncated isoforms. Individual names were not assigned to the truncated isoforms. The 5′ terminus of each truncated isoform identified, relative to its position in CCR5A, is indicated by an open circle. The 5′ terminus of the longest reported 5′-UTR is denoted by an asterisk (3). B, nucleotide sequence of the 5′-UTR of human CCR5A and its comparison with the 5′-UTRs of mouse and rat CCR5 cDNAs. The CCR5A sequence shown was obtained by RT-PCR from human dendritic cells (the first nucleotide is indicated as +1). Sequence analysis of the identical region from cDNAs obtained from THP-1 cells, and the CCR5B isoform from an unrelated normal donor, revealed polymorphisms which are shown in Fig. 6. The GenBank accession numbers for the mouse and rat cDNAs are D83648 and Y12009, respectively. The asterisk above the CCR5A sequence denotes the 5′ terminus of the cDNA clone published by Raport et al. (3). The closed circles represent the 5′ termini of the truncated isoforms identified in this study and correspond to the open circles shown in panel A. Exon names are left-justified above their 5′ termini. The translation initiation codon is indicated in bold letters and the derived amino acid is indicated by a single-letter code below the nucleotide sequence. The upstream AUGs are boxed. Vertical lines indicate nucleotide sequence identity, dots indicate gaps introduced to optimize the sequence alignment. The arrows depict the location and orientation of the final set of primers used in the RT-PCR assay whose results are shown in Fig. 2.
The cDNA clones that lacked sequences corresponding to the 5′-most unique sequence segment, i.e. exon 1, were arbitrarily classified as truncated CCR5 mRNA isoforms. The 5′ termini of the truncated clones relative to their position on CCR5A are shown in Fig. 1. It should be emphasized that the truncated CCR5 transcripts could also represent incomplete cDNA synthesis by the reverse transcriptase. However, two findings suggest that this may not be the case. 1) From a single RT-PCR, we cloned products whose lengths were significantly longer than the truncated transcripts (Fig. 1A). 2) Except in a single instance, several clones had identical 5′ termini, suggesting that they may represent transcripts that originate from distinct transcription start sites. It should also be noted that the presence of additional CCR5 isoforms that may have unique 5′-noncoding exons or novel splice patterns cannot be excluded.
The cDNA sequence reported by Raport et al. (3) lacked in-frame stop codons in the 5′-UTR, raising the possibility of a longer CCR5 ORF initiated at an upstream methionine. In-frame stop codons were identified 26 and 12 amino acids upstream of the currently assigned translation initiation codon in CCR5A and CCR5B, respectively. None of the upstream in-frame amino acids was a methionine, excluding the possibility of a longer transcript that could encode a protein isoform with an amino-terminal extension. Interestingly, four upstream AUG triplets were found in the 5′-UTR of both CCR5A and CCR5B, but they were followed by downstream termination codons, and the two longest minicistrons were 9 and 15 amino acids in length.
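The search for upstream AUG triplets and the minicistrons they open can be illustrated with a short Python sketch. It is not the procedure used in the study: the function name is hypothetical, the input is a toy sequence rather than the CCR5 5′-UTR, and the sketch assumes that a minicistron must terminate at an in-frame stop codon within the supplied sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_minicistrons(utr):
    """Return (start_index, length_in_amino_acids) for every ATG in `utr`
    that is followed by an in-frame stop codon inside `utr`.

    Indices are 0-based; the length counts the initiator methionine and
    excludes the stop codon.
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA alphabets
    results = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, in frame with this ATG, until a stop is found.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                results.append((start, (pos - start) // 3))
                break
    return results


# Toy sequence (not CCR5): one ATG followed by AAA and a TAA stop codon.
print(upstream_minicistrons("CCATGAAATAAGG"))
```

An ATG with no in-frame stop inside the supplied sequence is skipped here, which matches the case the text is concerned with only if the whole 5′-UTR up to the authentic start codon is passed in.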
The 5′-UTR sequences of the full-length and truncated CCR5 transcripts appeared to be highly conserved in evolution, as GenBank data base analysis revealed strong sequence homology with the 5′-UTRs of mouse and rat CCR5 cDNAs (Fig. 1B). The 5′ termini of the 5′-UTRs of mouse and rat cDNAs reside in a region that corresponds to exon 2 of human CCR5A. Whether additional upstream mRNA sequences exist in these two species is not known. It is interesting that 12 bp upstream of the translation start site, all the human CCR5 cDNA clones had a 4-bp insertion (CCCC) relative to the mouse and rat cDNAs.
Tissue Distribution of Human CCR5 mRNA Isoforms-All the CCR5 cDNA clones identified contained exon 4 and portions of exon 3, and the additional length contributed by exons 1 and/or 2 to CCR5A or CCR5B was not substantial. This implied two points. First, the proportion of transcripts in human cell types that are either full-length or truncated cannot be readily ascertained by size differences on Northern blots. Second, since CCR5A and CCR5B can be differentiated only by the presence or absence of exon 2, an RT-PCR strategy could be designed to evaluate exon usage in different human leukocyte populations. However, the latter strategy would not be helpful in defining the relative abundance of the truncated transcripts, as portions of exon 3 are common to all isoforms. To illustrate the first point, when we used a probe that corresponded to exon 1, an ~4.0-kb hybridizing band was visualized in human poly(A)+ mRNA derived from bone marrow, peripheral blood mononuclear cells, thymus, lymph node, and spleen (data not shown), and corresponded to the transcript size seen in the identical tissues hybridized with an ORF/3′-UTR probe (3).
The second point is illustrated in Fig. 2, which demonstrates the splicing patterns, i.e. exon usage, of CCR5 mRNA. In these RT-PCR experiments, total RNA derived from the primary human cell types shown in Fig. 2, and from the THP-1 and Jurkat cell lines, was used as the PCR template (the location of the final set of RT-PCR primers is shown in Fig. 1B). In these experiments, we observed two bands in these cell types (Fig. 2, and data not shown for THP-1 and Jurkat cell lines). To confirm the exon composition of the ethidium bromide-stained PCR products, we subcloned the two bands that were amplified from dendritic cells and the THP-1 cell line. Sequence analysis revealed that the upper and lower bands in Fig. 2 corresponded to isoforms that contained exons 1 + 2 + 3 (CCR5A) or exons 1 + 3 (CCR5B), respectively. It should be noted that this analysis is qualitative, and although minor variations in the proportion of the transcripts containing these exons were observed, there was no clear pattern of tissue-specific utilization of either CCR5A or CCR5B.
The Human CCR5 Gene-Using PCR we amplified, cloned, and sequenced overlapping fragments of human CCR5 that together comprised an ~8-kb contiguous stretch of CCR5. The 5′-UTR sequences detected by 5′-RACE and RT-PCR, and the cDNA sequence reported by Raport et al. (3), were identified on this genomic contig (Fig. 3A). This genomic contig spanned 8035 bp, originated ~1.9 kb upstream of exon 1, and terminated immediately upstream of the polyadenylation signal (Fig. 3A). The gene is organized into four exons and two introns (Figs. 3A and 4A). Both introns interrupt the 5′-UTR. Interestingly, exons 2 and 3 are contiguous and are not interrupted by an intron. The exon/intron splice junctions in CCR5 conform to the consensus sequences for 5′-(CAGGTRAGT) and 3′-(YnNYAG) splice sites (the invariant dinucleotides at the termini of the intron consensus sequences are underlined; Fig. 3A). Interestingly, a region upstream of exon 1 had strong sequence homology (~89%) with sequences in the 3′-flanking region of CCR5 (Fig. 3B).
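How a candidate junction can be compared with the splice-site consensus quoted above is sketched below in Python. The sketch assumes standard IUPAC codes (R = A or G, Y = C or T, N = any base) and a 9-nucleotide donor window made of the last three exon bases followed by the first six intron bases; the function names and toy sequences are hypothetical and are not CCR5 sequence.

```python
# IUPAC codes needed for the donor consensus CAGGTRAGT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}
DONOR_CONSENSUS = "CAGGTRAGT"  # 3 exon nt + 6 intron nt


def obeys_gt_ag_rule(intron):
    """True if the intron begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def donor_match_count(window):
    """Number of positions (0-9) at which a 9-nt window agrees with the
    5' splice-site consensus CAGGTRAGT."""
    window = window.upper()
    if len(window) != len(DONOR_CONSENSUS):
        raise ValueError("window must be 9 nucleotides long")
    return sum(base in IUPAC[code]
               for base, code in zip(window, DONOR_CONSENSUS))


# Toy examples (not CCR5 sequence).
print(obeys_gt_ag_rule("gtaagtttttcag"))
print(donor_match_count("AAGGTGAGA"))
```

Counting matches rather than demanding an exact match reflects the fact that real splice sites usually deviate from the consensus at one or more positions while keeping the invariant GT and AG dinucleotides.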
We compared the 5′- and 3′-flanking regions of CCR5 with sequences deposited in GenBank (updated July 31, 1997). This analysis revealed identity or close homology between the CCR5 sequences that we characterized and two unpublished gene sequences that were submitted while this work was in progress. 1) The entire 8035-bp sequence that we cloned was colinear with a portion of a human genomic DNA contig sequenced as part of the Advanced Genome Sequence Analysis Course, Cold Spring Harbor Laboratory, NY (GenBank accession number U95626); this unpublished contig is 143,068 bp in length and, in addition to CCR5, it contains CCR2A, CCR2B, and an orphan chemokine receptor gene. Our CCR5 sequence ends just proximal to the polyadenylation signal. However, alignment of our sequence contig with the sequences contained in GenBank accession U95626 revealed that the nucleotides that follow the end of our clone are identical to the polyadenylation signal sequence AAATAA. 2) A 227-bp sequence that is upstream of the Macaca mulatta CCR5 ORF (GenBank accession number U77672) had a high degree of homology with the region that corresponds to intron 2 of human CCR5.
FIG. 2. ... An ethidium bromide-stained gel of the RT-PCR products obtained from total RNA derived from human leukocytes is shown. The forward and reverse primers were specific to exons 1 and 3, respectively (Fig. 1B). The RNA templates used in the RT-PCR are shown on the top, and the abbreviations used are: P, peripheral blood mononuclear cells; L, lymphocytes; M, monocytes; DC, CD34+ progenitor cell-derived dendritic cells; AC, activated CD4+ T cells. The lengths of the two PCR products obtained are indicated at the left of the panel. A single PCR product of ~800 bp was detected when human genomic DNA was amplified with the identical primers, suggesting that the RNA templates used to perform RT-PCR were free of genomic DNA contamination (data not shown). Each RT-PCR included a negative control that lacked the cDNA template (data not shown).
The 5′- and 3′-flanking sequences reported previously by Samson et al. (1) were 239 and 78 bp in length, respectively, and identical sequences were found in the CCR5 that we characterized. A region in intron 2 also had strong sequence homology with Alu repeats (Fig. 4).
The exact location of the exon/intron boundary between intron 2 and exon 4 in human CCR5 appears to be conserved in mouse. Comparison of the mouse CCR5 cDNA and genomic sequences (GenBank accession numbers D83648 and U68565) revealed an intron between positions -11 and -12 upstream of the translation start codon, a position that is identical for intron 2 in human CCR5. Interestingly, the 554-bp mouse intron sequence had no homology with human CCR5 sequences, whereas the 5′-UTRs of human and mouse CCR5 are highly conserved (Fig. 1B).
Evolutionary Conservation in the mRNA and Genomic Structure of Human CCR5 with That of Other Human Chemokine/Chemoattractant Receptors-The mRNA and gene organization of human CCR5 is remarkably similar to that described for several other human chemokine and chemoattractant receptors (16-18, 20, 22, 24), suggesting a selective evolutionary pressure for these receptors to retain a conserved gene architecture (Fig. 4). It should be appreciated that, to date, the gene and mRNA structures (human) of only one CCR, CCR2 (20), two CXCRs, CXCR1 and CXCR2 (18,19), and the Duffy antigen receptor for chemokines (DARC; 16, 17) have been described. Furthermore, the functional promoters for only two human chemokine receptors, CXCR1 and CXCR2, have been described (18). As described below, we have characterized two promoters for CCR5, designated as PU and PD, and their locations are noted in Fig. 4. Interestingly, as is the case for the promoters for CXCR2 (18) and the platelet-activating factor receptor gene (22,23), the two CCR5 promoters are also tandemly arranged on the gene. Another feature that is common to both CCR5 and CXCR2 is that they contain exon-exon units that are uninterrupted by an intron. For example, exon 2 of CCR5A resides in the "intronic" region for CCR5B, and exon 5 of the CXCR2-3 isoform resides in the intronic region for the CXCR2-1, -2, and -4 isoforms (Fig. 4).
FIG. 3. Human CCR5 sequence. A, exon sequence is in uppercase, introns and the 5′-flanking region are in lowercase. ORF sequences are shown in uppercase boldface letters and the derived amino acids are indicated by a single-letter code below the first nucleotide of each codon. The asterisk denotes the stop codon. Exon and intron names are left-justified above their 5′ termini. Double underline indicates terminal dinucleotides for the introns defined by each of the CCR5 RNA splice variants shown in Fig. 1. Note that all these dinucleotides obey the GT/AG rule and in one instance the AG dinucleotide resides within the exon defined by another splice variant. The 5′-most expressed nucleotide has arbitrarily been designated as nucleotide +1 (see also Fig. 1B). Gaps are denoted by serial dots. The length of the sequence not shown is indicated in the gaps. The pyrimidine-rich sequences are overlined by a straight line. Sequences that conform favorably to the indicated transcription factor DNA binding elements are overlined by a straight arrow; the direction of the arrow indicates the 5′ → 3′ orientation of the putative binding site. The inverted L-shaped arrows delimit a region with strong homology (in reverse orientation) to a region in the 3′-UTR of CCR5 (panel B). Uppercase italicized sequence in the 3′-end of CCR5 represents the polyadenylation signal sequence (AAATAA). Alignment with the sequences contained in GenBank accession number U95626 revealed that this signal sequence is immediately contiguous with the CCR5 contig that we cloned. B, sequence alignment depicting the high degree of homology between a short region in the 5′- and 3′-flanking sequences of CCR5. Note that the 3′-flanking sequence is in the reverse complement orientation.

Molecular Dissection of Functional Promoters for CCR5-The genomic region upstream of exon 1 should potentially contain the cis-acting elements important in the promoter activity of CCR5A and CCR5B. We therefore constructed CCR5-firefly luciferase chimeric plasmids from portions of the gene upstream of exon 1, designated as pA1-4 (Fig. 5, upper panel). We tested the ability of these promoter constructs to drive the expression of the reporter gene (firefly luciferase) in the following cell lines: 1) THP-1, a human monocytic leukemia cell line, a surrogate for monocytes; 2) K562, a human chronic myelogenous leukemia cell line, a surrogate for undifferentiated hemopoietic cells; and 3) Jurkat, which is a human T cell leukemia cell line. To correct for differences in transfection efficiency, we co-transfected the promoter constructs and the promoterless vector pGL3-Basic with pRL-CMV, a construct that contains the renilla luciferase gene downstream of a CMV promoter. Lysates prepared from cells transfected with constructs pA1-4 exhibited weak luciferase activity (Fig. 5). This genomic region upstream of exon 1, which has weak promoter activity, is designated as the upstream promoter (PU).
FIG. 4. ... Angled dashed lines indicate splicing patterns. gt indicates the GT dinucleotides immediately 3′ to the 3′ terminus of each indicated exon; ag indicates the AG dinucleotides immediately 5′ to the 5′ terminus of each indicated exon. For the truncated isoforms, the gt/ag dinucleotides for only the longest truncated isoform are shown. Also shown is a summary of the CCR5 promoter analysis illustrated in Fig. 5; the arrows demarcate the regions corresponding to the upstream (PU) and downstream (PD) promoters characterized in this study (Fig. 5).
Because a large number of 5′-RACE clones terminated either in exon 3 or at the 3′-end of exon 2 (Fig. 1A), we hypothesized that these transcripts may represent distinct isoforms that are initiated because of the usage of an alternative promoter. To test this hypothesis, we constructed the series of promoter constructs shown in the lower panel of Fig. 5. It should be noted that in some instances these constructs contain portions of PU, intron 1, and exon 2, and that the distal end of each of these constructs resides within exon 3.
In contrast to PU, the region upstream of exon 3, designated the downstream promoter (PD), had strong luciferase activity in all three cell lines tested (Fig. 5, lower panel). Maximal promoter activity was consistently observed in lysates from K562 cells, especially those transfected with pB3 and pB4. The promoter activity of these two constructs in K562 cells was ~8-10-fold higher than that detected in cells transfected with pB1, pB2, or pB5. The increase in luciferase activity in THP-1 and Jurkat cells transfected with pB3 and pB4 was not as prominent as that observed for these two promoter constructs in K562 cells. Relative to pB3 and pB4, the construct pB5 exhibited weak promoter activity. This finding suggests that the sequences between pB4 and pB5 may contain important cis-acting elements for CCR5 promoter activity. It is important to note that, since all the PD constructs contain all or portions of exon 2, it is likely that this noncoding exon plays an important role in modulating gene expression.
Analysis of the PU and PD Sequences. It is important to appreciate that because of the complex genomic and mRNA organization of CCR5, it is difficult to unambiguously assign certain regions of CCR5 as an exon, intron, or promoter. Notwithstanding this caveat, PU and PD lacked canonical TATA and CCAAT motifs. However, in PD there was a nonconsensus TATA box (TTTATA; Fig. 3A). Unlike most TATA-less promoters, which have a high GC content, PU and PD were GC poor. The overall G + C content of PU and PD was ~46 and ~40%, respectively. We identified several pyrimidine-rich segments in both PU and PD. Pyrimidine-rich sequences have been observed in the proposed promoter for DARC (17), and in several other genes that are abundantly expressed in myeloid cells, including N-formyl peptide receptor (FPR) (24). PU and PD contained consensus sequences for several transcription factor DNA-binding sites (e.g. AP-1, Oct-1, PuF, PU.1, and NF-κB-like). The PU.1 transcription factor has been found to be important in the promoter activity of several genes expressed in myeloid cells, including the monocyte colony-stimulating factor and CD11b genes (31). Multiple binding sites for GATA-1, an important transcription factor in the development of hematopoietic cells (31), and for Sp1 were also noted.
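The G + C content quoted above is a simple base-composition percentage. The sketch below shows one way such a value could be computed. The function name and the example fragment are hypothetical, and the actual PU and PD sequences (Fig. 3A) are not reproduced here.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


# Hypothetical AT-rich fragment, for illustration only (not taken from CCR5).
fragment = "ATGCATAT"
print(f"G + C content: {gc_percent(fragment):.1f}%")
```

Ambiguous symbols such as N are ignored rather than counted in the denominator, which is one of several reasonable conventions.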
FIG. 5 (legend, continued). Note the difference in the scale for the relative luciferase activity for the upper and lower panels. Also, the scale for the relative luciferase activity in the lower panel for promoter constructs transfected into K562 cells is different from that used for THP-1 and Jurkat cells. The weak luciferase activity exhibited by the promoter constructs pA1-4 was not attributable to low transfection efficiency because, first, the co-transfected pRL-CMV vector directed high levels of renilla luciferase, and second, after demonstrating the functional activity of the constructs pB1-5, we repeated some of the experiments with pA1-4, using pB3 or pB4 as a positive control (data not shown). In these repeat experiments, pB3 and pB4 directed high firefly luciferase expression levels, whereas constructs pA1-4 had luciferase activities similar to those shown in the upper panel (data not shown).
Polymorphisms in CCR5 Noncoding Sequences. We aligned the nucleotide sequences of the CCR5 gene that we cloned with the gene sequences in GenBank accession number U95626 and with the sequences of the cDNA clones derived by RT-PCR and 5′-RACE. This alignment revealed extensive nucleotide differences in the noncoding sequences of the gene. The relative positions of the nucleotide substitutions, deletions, or insertions detected in the 5′-noncoding sequences are shown in Fig. 6. Differences in the 3′-flanking regions of the two gene sequences were also noted (data not shown). The nucleotide differences noted in the cDNAs obtained from the nonrelated donors and the THP-1 cell line were not random, as sequencing of multiple cDNA clones identified differences only at those positions where the two gene sequences diverged. This also suggests that these differences were probably not due to mutations introduced by the Taq polymerase. Sequence analysis of the genomic region upstream of exon 3 in 5 additional unrelated donors revealed polymorphic changes at the same and/or additional nucleotide positions (data not shown).

DISCUSSION
In this report, we have identified novel CCR5 transcripts, defined their splicing patterns, and determined the organization of CCR5. We also illustrate the striking conservation in gene structure of CCR5 and related chemokine/chemoattractant receptors. We also provide the first description of functional promoters for any CC chemokine receptor gene. With regard to the molecular nature of the cis-acting elements that regulate the constitutive CCR5 expression in human leukocytes, a complex picture is emerging, one which may involve alternative promoter usage with regulatory elements residing on both sides of the 5′-most exon, implicating an important role for intronic and 5′-UTR sequences. In addition, we provide evidence for the presence of polymorphic nucleotides in the noncoding sequences of CCR5.
It is likely that a single gene encoding multiple transcripts allows for genetic parsimony while maximizing the mechanisms by which gene expression can be modulated (32). We speculate that the full-length and truncated transcripts are initiated from PU and PD, respectively, and that those initiated from PU undergo alternative splicing, giving rise to CCR5-A and -B. The number of truncated isoforms may be even greater if one considers the possibility of additional transcription start sites within PD. Nevertheless, as alluded to earlier, it is important to emphasize that distinguishing whether these truncated isoforms are transcribed in vivo or merely represent premature termination of cDNA synthesis by the reverse transcriptase is difficult.
The structural similarities in the gene and mRNA organization of CCR5 and several other chemokine/chemoattractant receptor genes underscore an important evolutionarily conserved function for this prototypical gene structure, the propensity for alternatively spliced isoforms, and the usage of multiple promoters. It is likely that these receptors arose from an initial gene duplication event, with subsequent tandem duplication of an ancestral gene on chromosome 3p giving rise to several CCRs. It should be noted that in addition to these two GPCR subclasses, alternative splicing within the 5′-UTR has been described for a few other human GPCR genes (33, 34).
From an evolutionary perspective, it is intriguing that in addition to their ORFs, the 5′-UTRs of mouse, rat, and human CCR5 share strong sequence homology. To date, murine homologues for CCR1-5 have been cloned (reviewed in Ref. 35). The 5′-UTR sequences for murine CCR1 are not available in GenBank; nevertheless, unlike the strong interspecies homology of the 5′-UTRs of CCR5 (Fig. 1B), the 5′-UTRs of mouse and human CCR2, CCR3, and CCR4 do not share significant sequence homology (data not shown). These observations point toward a selective pressure for both mouse and human CCR5 to retain similar noncoding exons, which, at least in humans, may participate in CCR5 gene regulation.
It is likely that CCR5 regulation may occur at many levels (14, 15). As is the case for other GPCRs, the cell surface expression of CCR5 may be regulated at the protein level, over the short term, through mechanisms such as receptor internalization, sequestration, and desensitization. Longer term regulation of these receptors is likely to be achieved through regulation of the rate of transcription of the gene, stability of the mRNA, and translation efficiency, and there is increasing evidence that the sequences in the 5′- and 3′-UTRs may influence these processes (36).
We will discuss two possible mechanisms by which the 5′-UTRs of CCR5 may regulate gene expression. First, the 5′-UTRs of CCR5-A and -B have several structural features that may exert a negative effect on the efficiency of translation. Kozak has examined factors in the 5′-UTRs that promote efficient translation (37, 38), which include the observations that: 1) most eukaryotic mRNAs have a short 5′-UTR, and 2) there are no AUGs upstream of the translation initiation site of the major ORF. Both CCR5A and CCR5B, the two full-length transcripts, have relatively long 5′-UTRs, and they belong to the unusual class of mRNAs (<10% of vertebrate RNAs characterized) that contain AUG triplets upstream of the AUG that initiates the major ORF (Fig. 1B). The presence of translation initiation codons followed immediately by termination codons creates short upstream ORFs in the 5′-UTR. As reported in other gene systems (39, 40), these short upstream ORFs could lead to reduced protein output through a mechanism of abortive translation. For example, a product of a short upstream ORF encoding a 19-amino acid leader peptide inhibits the translation of the β2-adrenergic receptor (40). Since some of the truncated isoforms lack short upstream ORFs, it is conceivable that preferential initiation of transcripts from PD may represent a potential mechanism by which CCR5 expression is modulated, as this would bypass the possible inhibitory effects of the upstream minicistrons.
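The notion of an upstream AUG followed in frame by a termination codon can be made concrete with a short sketch. It scans a 5′-UTR for every AUG and reports those that reach an in-frame stop codon before the end of the UTR. The function name and the example sequences are hypothetical and are not taken from the CCR5 transcripts, and this is only an illustration of the concept, not the analysis performed in this study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_upstream_orfs(utr: str):
    """Return (start, end, sense_codons) for each AUG in the UTR that is followed
    by an in-frame stop codon lying entirely within the UTR.

    start and end are 0-based half-open coordinates (end includes the stop codon);
    sense_codons counts the AUG and every codon before the stop.
    Upstream AUGs with no in-frame stop inside the UTR are not reported.
    """
    rna = utr.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the AUG until the first in-frame stop.
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, (pos - start) // 3))
                break
    return orfs


# Hypothetical 5'-UTR in which an AUG is followed immediately by a stop codon.
print(find_upstream_orfs("GGAUGUAACC"))
```

A transcript for which this returns an empty list would, under the scanning model discussed above, lack the short upstream ORFs that might otherwise reduce translation of the major ORF.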
FIG. 6. Polymorphisms in the regulatory sequences and noncoding exons of CCR5. The alignment of the following nucleotide sequences is shown: 1) CCR5 gene isolated in this report (top line); 2) the sequences in GenBank (accession number U95626) that are co-linear with the sequences determined in this report; 3) partial CCR5B cDNA clone (source: dendritic cells) from a normal donor; 4) partial CCR5A cDNA clone (source: dendritic cells) from an unrelated second donor; 5) partial CCR5A cDNA clone from THP-1 cells. The nucleotide numbers are derived from Fig. 3A. Serial dots denote gaps introduced. Boxes denote insertions or deletions. PU denotes upstream promoter.
A second mechanism includes the possibility that differences in the secondary structures of the 5′-UTRs of the distinct CCR5 transcripts may influence translation efficiency. It is known that a Gibbs free energy of formation (ΔG) of less than −50 kcal/mol can impair the passage of the ribosomal 40 S subunits as they scan from the cap site (41). Algorithms developed by Dr. M. Zuker (Ref. 42 and http://www.ibc.wustl.edu/~zuker/rna) were used to analyze the 5′-UTRs of CCR5A and CCR5B for their tendency to form secondary structure. These algorithms predict that the ΔG values of CCR5A and CCR5B are −69.5 and −48.7 kcal/mol, respectively, suggesting that, relative to CCR5B, CCR5A has a higher propensity to form a very stable structure.
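The comparison made above amounts to checking each predicted free energy against the −50 kcal/mol value cited from Ref. 41. A minimal sketch of that check follows, using the two ΔG values reported in the text. The function and variable names are hypothetical, and treating −50 kcal/mol as a sharp cutoff is a simplifying assumption made only for illustration.

```python
# Threshold cited in the text (Ref. 41): structures with a free energy of formation
# below this value can impair scanning by the 40 S ribosomal subunit.
SCANNING_THRESHOLD_KCAL_PER_MOL = -50.0

# Predicted free energies of the 5'-UTRs as reported in the text (kcal/mol).
predicted_delta_g = {"CCR5A": -69.5, "CCR5B": -48.7}


def may_impair_scanning(delta_g: float, threshold: float = SCANNING_THRESHOLD_KCAL_PER_MOL) -> bool:
    """True when the predicted structure is more stable (more negative) than the threshold."""
    return delta_g < threshold


flags = {name: may_impair_scanning(dg) for name, dg in predicted_delta_g.items()}
for name, flagged in flags.items():
    print(name, predicted_delta_g[name], "below threshold" if flagged else "above threshold")
```

On these values only the CCR5A 5′-UTR falls below the threshold, although CCR5B lies close to it, so the distinction should be read as one of degree.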
We have identified two CCR5 promoter regions that were active in all three cellular environments tested: PU, a weak promoter that resides proximal to exon 1, and PD, a stronger promoter that is located upstream of exon 3. It is conceivable that regions further upstream of exon 1, or constructs shorter than those that we tested, may support strong promoter activity for PU. We ascribe an important role to the region between +429 and +634 in regulating CCR5 expression. Although consensus sequences representing binding sites for transcription factors such as Oct-1 and GR-β are present within this region, future studies will be required to determine the precise cis-acting elements that confer this activity. It should be noted that several of the constructs designed to test PD had intron 1 and exon 2 sequences, implicating an important function for these two regions in the regulation of CCR5. An important role for intronic sequences in the regulation of several genes has been described, including for CXCR2 (18).
The promoter sequences of CCR5 have two interesting features. First, a region in PU has sequence homology to a region in the 3′-UTR, the significance of which, if any, remains unclear. Second, characteristic of several GPCRs, neither PU nor PD had classical TATA or CCAAT motifs, although PD does contain a nonconsensus TATAA box. Most genes that are TATA-deficient can be divided into two classes on the basis of their upstream GC content (43). GC-rich promoters, found primarily in housekeeping genes, are very complex and prevalent; they contain several binding sites for the ubiquitous trans-activating Sp1 protein and have several transcription start sites. In contrast, the remainder of the genes that are TATA-deficient and are not GC rich tend to be regulated during differentiation or development; many of their promoters are not constitutively active and initiate at only one or a few very tightly clustered start sites. The AT-rich composition of the CCR5 promoters, PU and PD, suggests that they belong to the latter class of promoters. However, in contrast to this subclass of TATA-deficient promoters, PU and PD appear to be constitutively active and possibly initiate transcription at several start sites, and there is no conclusive evidence, to date, to suggest that CCR5 requires strict activation and inactivation during cellular differentiation and development.
It is clear from the study of several diverse gene systems that alternative promoter usage resulting in alternative transcripts is an important evolutionary mechanism to create diversity in the regulatory control of gene expression (reviewed in Ref. 32). In these systems, alternative promoter usage has been shown to be an important transcriptional mechanism for regulating tissue- or cell type-specific expression, the level of expression, developmental stage-specific (temporal) expression, the specific capacity to respond to particular cellular or metabolic conditions, or the translational efficiency of the mRNA. Which, if any, of these mechanisms is operative in CCR5 is not known; however, several possible scenarios for CCR5 can be envisaged. It is possible that the level of CCR5 expression is regulated at the transcriptional level by the usage of promoters of different strengths, such as the ones we have described. Whether the observed differences in transcription efficiency between PU and PD, or among the PD constructs in the different cell environments, result in differential expression at the protein level is unknown.
Although the protein encoded by the different CCR5 transcripts is likely to be identical in different cell types, the transcripts may be regulated differentially in these cell types by various extracellular signals, such as cytokines or chemokines. To test this latter possibility, we determined in preliminary experiments whether stimulation alters the constitutive promoter activity of a single promoter construct (pB3). The promoter activity of pB3 in Jurkat cells stimulated with phytohemagglutinin, phytohemagglutinin and phorbol myristic acid, ionomycin and phorbol myristic acid, or CD3/CD28 was similar to that observed in unstimulated Jurkat cells transfected with pB3 (n = 3; data not shown). Similarly, the cell lysates of THP-1 cells transfected with pB3 and stimulated with lipopolysaccharide, TNF-α, interleukin-6, and interferon-γ exhibited promoter activities similar to those of cell lysates from unstimulated THP-1 cells transfected with pB3 (n = 3; data not shown). Additional studies will be required to investigate whether these stimuli alter the promoter activity of the other reporter constructs used in this study.
Several polymorphisms have been described in the CCR5 ORF (10-12, 44). We now provide evidence for polymorphisms in the flanking regions of CCR5; however, it should be emphasized that the significance of this finding, if any, in HIV-1 infection is not known. Several studies have clearly demonstrated that genes can be polymorphic not only in their coding regions, but also in important cis-regulatory sequences (45-53). Furthermore, transcriptional mutants may profoundly affect the promoter strengths of particular alleles by altering the affinity of regulatory proteins for these elements, and in some instances a single nucleotide change in a critical regulatory region can result in up to 1 order of magnitude difference in the transcriptional activity of two otherwise identical promoters. As discussed below, this, in turn, can have a profound effect on protein synthesis.
One of the most striking examples of transcriptional mutants affecting protein synthesis came in the wake of the cloning of the human β-globin gene nearly 20 years ago, where, in addition to mutations in the coding region, single mutations in the regulatory regions were shown to decrease the amount of β-globin produced by red cells, leading to the blood disorder called β-thalassemia (52). It is interesting that, to date, over 300 β-thalassemia alleles have been discovered, including 12 transcriptional mutants, which account for the molecular basis of the marked heterogeneity of the β-thalassemia syndrome. Transcriptional mutants that lead to an increase in protein expression have also been described. For example, studies have linked the variant allele of the TNF-α gene, referred to as TNF-2, to increased serum levels of TNF-α and a poor prognosis for several infections, such as malaria (53). Thus, it is conceivable that the polymorphisms in the regulatory regions of CCR5 may, in part, explain the observed variability in CCR5 expression in individuals who display the CCR5/CCR5 genotype (13, 54), and may therefore influence the clinical outcome of HIV-1 infection.
In summary, the studies reported here should prove useful as a basis for further studies that will focus on analyzing the differential role of the CCR5 mRNAs, if any, in protein expression, and on delineating the protein factors and DNA sequences that are specifically responsible for transcription of CCR5 at physiologic sites. Additional studies will be required to determine the functional importance of the polymorphisms, if any, in the noncoding regions of CCR5 that we have described. These future studies should not only provide insights into the evolutionary rationale for such a complex system of multiple transcripts directed by different regulatory regions, but may also spawn new ideas regarding mechanisms by which CCR5 gene expression can be down-regulated in disease states such as HIV-1 infection.