The Human gC1qR/p32 Gene, C1qBP: GENOMIC ORGANIZATION AND PROMOTER ANALYSIS

gC1qR is a ubiquitously expressed cell protein that interacts with the globular heads of C1q (gC1q) and many other ligands. In this study, the 7.8-kilobase pair (kb) human gC1qR/p32 (C1qBP) gene was cloned and found to consist of 6 exons and 5 introns. Analysis of a 1.3-kb DNA fragment at the 5′-flanking region of this gene revealed the presence of multiple TATA, CCAAT, and Sp1 binding sites. Luciferase reporter assays performed in different human cell lines demonstrated that the reporter gene was ubiquitously driven by this 1.3-kb fragment. Subsequent 5′ and 3′ deletion of this fragment confined promoter elements to within 400 base pairs (bp) upstream of the translational start site. Because the removal of the 8-bp consensus TATATATA at −399 to −406 and CCAAT at −410 to −414 did not significantly affect the transcription efficiency of the promoter, GC-rich sequences between this TATA box and the translation start site may be very important for the promoter activity of the C1qBP gene. One of seven GC-rich sequences

gC1qR is a biologically important, widely distributed, multiligand-binding and multifunctional protein (1). Numerous reports have claimed that gC1qR and its homologue could be isolated or identified in various cellular compartments, including plasma membrane, cytoplasm, mitochondria, and nucleus. gC1qR isolated from the plasma membrane was originally characterized as a high affinity C1q-binding protein (2), and later many reports showed that gC1qR could interact with several proteins of the intrinsic coagulation/bradykinin-forming cascade, including high molecular weight kininogen (3), Factor XII (4), fibrinogen (5), and multimeric vitronectin (6). Conversely, intracellular gC1qR was shown to interact with and subsequently down-regulate the surface expression of the α1B-adrenergic receptor (7) as well as to bind to the kinase domain of protein kinase C and thus prevent its substrate phosphorylation activity (8). In addition, gC1qR was also reported to bind to a nuclear splicing factor, SF2, and to many viral proteins, including HIV-1 Tat (9) and Rev (10), core protein V of adenovirus (11), EBNA-1 of the Epstein-Barr virus (12), and open reading frame P of the herpes simplex virus (13), implying that gC1qR may play a role in virus-host interaction.
The full-length cDNA of gC1qR encodes a pre-pro-protein of 282 residues from which a 73-residue-long N-terminal segment is removed by site-specific cleavage to generate the mature gC1qR (2). It was shown that the fusion of residues 1-81 or 1-33 of the pre-pro-protein to the N terminus of the green fluorescent protein directed the fusion protein to mitochondria (14). However, the findings of Dedio et al. (14) do not exclude the possibility that gC1qR, like many other proteins, could be exported from the mitochondria by an unknown mechanism (15). This possibility is supported by a recent report showing that anti-gC1qR monoclonal antibody can reverse the antiproliferation effects of hepatitis C virus core antigen on activated T cells (16).
The human C1qBP gene was assigned to human chromosome 17p13.3 (17). A high degree of amino acid identity exists between the human, rat, and mouse gC1qR cDNA sequences (18). In this study, the full-length gene of human gC1qR was cloned, and its exon-intron boundaries were revealed. Furthermore, the transcription start site and its promoter elements were also mapped and characterized.

MATERIALS AND METHODS
Screening of the Human Genomic Library-A human genomic library in bacteriophage EMBL3 was purchased from CLONTECH (cat. no. HL1067J). The human cDNA of the C1qBP gene was used as the probe for library screening (2). The cDNA insert was released from its vector and gel-purified before radiolabeling with [32P]dATP by the random priming method. Positive plaques were picked, replated, and rescreened until single positive plaques were picked. The insert in a positive plaque was digested with various restriction enzymes, and the positive fragments were mapped by Southern blot analysis. Six overlapping subclones (Fig. 1) were obtained by inserting the positive fragments into the plasmid pBluescript. Each subclone was sequenced by the primer walking method using the ABI Prism 310 Genetic Analyzer (Applied Biosystems, Hong Kong Limited, Hong Kong).
Sequence Assembly and Analysis-Sequence data were assembled and analyzed by DNA processing software including MAC DNASIS (Hitachi, Japan) and DNA Strider (Christian Marck, Service de Biochimie, Department de Biologie, Institut de Recherche Fondamentale, CEA, Paris, France). Promoter analysis was performed using MatInspector software (version 2.2) (19) obtained from the German Research Center for Biotechnology, Braunschweig, D-38124, Germany. A transcription factor data base (TRANSFAC, version 4.0) was employed for the search of promoter elements (20).
Construction of Expression Plasmids for Promoter Assay-Various DNA fragments of the 5′-upstream region of the human gC1qR gene (C1qBP) were obtained by polymerase chain reaction using subclone gI as the template. Restriction sites BglII or SacI were added to the 5′-end of the primers (Table I) so that the polymerase chain reaction products could easily be subcloned into the corresponding restriction sites in the dual-luciferase reporter vector, pGL3-Basic (Promega, Hong Kong). A nested family of 5′ and 3′ deletion clones (Table II) was generated in this manner (Fig. 4).
Cell Culture, Transient Transfection, and Promoter Assays-The human cell lines PANC-1 (ATCC no. CRL-1469), MDA-MB-231 (ATCC no. HTB-26), SVG P12 (ATCC no. CRL-8621), HuTu80 (ATCC no. HTB-40), and 293-EBNA (Invitrogen, no. R620-07) were selected for transient transfection studies. SVG P12 and HuTu80 cells were cultured in minimum essential medium (Life Technologies, Inc.); PANC-1, MDA-MB-231, and 293-EBNA cells were cultured in high glucose Dulbecco's modified Eagle's medium (Life Technologies, Inc.). All culture media were supplemented with 10% (v/v) fetal bovine serum (50 ml in 500 ml of medium) and 1× antibiotic-antimycotic mixture (Life Technologies, Inc.), and cells were maintained at 37°C with 5% CO2. Plasmid DNA used for transfection was prepared using the Quantum prep Kit (Biorad, Hong Kong) and treated with phenol-chloroform before transfection. For transient transfection, cells were seeded onto 12-well plates (Costar, Corning Glass Inc.) at a density of 1.0 × 10⁵ cells/well. Co-transfection was performed 19 h later using the LipofectAMINE Plus® Reagent (Life Technologies, Inc.) according to the manufacturer's protocol. The molar ratio of the recombinant plasmid to be assayed and the internal control pRL-SV40 plasmid (Promega, Hong Kong) was kept at 1:1 in all transfections. Immediately after transfection, the cells were incubated in medium without serum and antibiotics at 37°C with 5% CO2 for 3.5 h before changing to complete medium. The cells were lysed at 39 h post-transfection by washing the cells twice with 1× phosphate-buffered saline followed by the addition of reporter passive lysis buffer (Promega, Hong Kong). The dual-luciferase reporter assay was performed according to the manufacturer's protocol (Dual-Luciferase reporter assay kit, Promega, Hong Kong) using a luminometer (Lumat LB-9507, EG&G) as the measuring apparatus.
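The normalization implied by the dual-luciferase design above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in this study: it assumes that each well yields one firefly reading (from the pGL3 construct) and one Renilla reading (from the pRL-SV40 internal control), and that activity is reported relative to the promoterless vector. The function names and all numerical readings are hypothetical.

```python
from statistics import mean


def relative_luciferase_activity(firefly, renilla):
    """Divide each firefly reading by the Renilla reading from the same well.

    The Renilla signal from the co-transfected internal control corrects for
    well-to-well differences in transfection efficiency and cell number.
    """
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same number of wells")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_vector(construct_ratios, vector_ratios):
    """Mean normalized activity of a construct relative to the empty vector."""
    return mean(construct_ratios) / mean(vector_ratios)


# Hypothetical luminometer readings (arbitrary light units), NOT data from this study.
construct = relative_luciferase_activity([5000.0, 5200.0, 4800.0], [100.0, 104.0, 96.0])
vector = relative_luciferase_activity([50.0, 55.0, 45.0], [100.0, 110.0, 90.0])
print(fold_over_vector(construct, vector))
```

Reporting the ratio rather than the raw firefly signal is what allows constructs transfected into different wells, or into different cell lines, to be compared on a common scale.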
RNA Isolation and Primer Extension Analysis-Total cellular RNA was isolated from human cell line MDA-MB-231 by TRIzol reagent (Life Technologies, Inc.). Poly(A)+ RNA was extracted from 2 mg of total RNA using the Poly(A)Tract mRNA isolation kit (Promega). Labeling of the primer and primer extension reaction was performed using 1 μg of poly(A)+ RNA with 10 pmol of the antisense primer G1A2Bgl following the instructions for the Primer Extension System (Promega). 4 μl of the primer extension reaction product was analyzed on a 6% (w/v) acrylamide, 7 M urea sequencing gel. The size of the primer extension product was determined by comparison with a DNA sequence ladder generated with the same oligonucleotide primer using ERES3 plasmid as a template.
Gel Mobility Shift Assays-The human cell line PANC-1 was selected for gel shift assays. Cells were cultured to exponential phase as mentioned above and harvested. End-labeling of double-stranded oligos (Table I) was done using the Ready-To-Go T4 polynucleotide kinase labeling kit and [γ-32P]ATP (5000 Ci/nmol) (Amersham Pharmacia Biotech). Nuclear extract preparation and gel shift assays were performed as described previously (21,22).

RESULTS
Genomic Organization of the Human C1qBP Gene-Fig. 1 is a schematic representation of the genomic organization of the gene, which, including its 5′- and 3′-flanking regions, spans about 7.8 kb. From the first codon of the initiation methionine to the stop codon of the gene, the gene spans 6055 bp (Fig. 2). By alignment of the cDNA and the genomic sequences, intron-exon boundaries were defined. There are 6 exons and 5 introns in the C1qBP gene. The sizes of the exons range from 94 (exon 3) to 232 bp (exon 1), and those of the introns range from 128 (intron 5) to 3156 bp (intron 2). Amino acid codons are split by introns 1 and 2 at the junctions of their adjacent exons (Table III). A poly(A) signal is located 369 bp from the stop codon. The entire genomic sequence has been deposited in GenBank™ under accession no. AF338439.
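The statement that codons are split at exon junctions can be checked arithmetically from coding exon lengths. The sketch below is illustrative only. It assumes that the 232 bp reported for exon 1 is counted from the ATG, which is consistent with exon 1 encoding the first 77 residues (231 nucleotides) plus one base of the next codon. The second exon length passed in the example is a hypothetical placeholder, since only the sizes of exons 1 and 3 are given in the text.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of the intron that follows each exon except the last.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron interrupts a codon after its first base.
    Phase 2: the intron interrupts a codon after its second base.
    Lengths must count coding nucleotides only, starting at the ATG.
    """
    phases = []
    coding_total = 0
    for length in coding_exon_lengths[:-1]:
        coding_total += length
        phases.append(coding_total % 3)
    return phases


# 232 is the exon 1 size from the text; 100 is a HYPOTHETICAL placeholder
# for a following exon and is not a value reported in this study.
print(intron_phases([232, 100]))
```

A nonzero phase for intron 1 agrees with the observation that a codon is split at the exon 1/exon 2 junction.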
TABLE I (fragment). 5′ attaGAGCTC CTGCCCTTGAGGATG 3′, −1314 to −1300, Pro200Sac
a Four bases and a restriction site were added at the 5′ end.
b All primers were designed from their GC with respect to their location in the 5′-flanking region of the human gC1qBP gene. "S" indicates sense primers, and "A" is the antisense version of its sense primer. L8S and L8A, which do not contain GC-rich sequences, were designed as negative controls.

Characterization of the 5′-Flanking Region of the Human C1qBP Gene-A 1.3-kb nucleotide sequence located upstream of the ATG initiation codon of the gene was analyzed (Fig. 3) using the TRANSFAC transcription factor data base. Putative promoter and enhancer elements including TATA boxes, CCAAT boxes, octamers, Sp1 binding sites, GATA sequences, E-boxes, and AP elements were identified by the software MatInspector. There are four putative TATA boxes (−806 to −811, −614 to −617, −446 to −449, and −399 to −406) and three putative CCAAT boxes (−1033 to −1037, −460 to −463, and −410 to −414). The 8-bp consensus TATA box with the sequence TATATATA located at −399 to −406 is the longest TATA element found in the region, and it is in close proximity to a CCAAT box located at −410 to −414. Sp1 binding GC-rich motifs are also found throughout the 5′-flanking region of the gene at positions −1309 to −1314, −959 to −965, −535 to −540, −519 to −524, −337 to −342, −177 to −182, −148 to −154, −139 to −144, −111 to −118, −83 to −90, and −51 to −61.
A variety of consensus elements for transcription factors are also present, including an octamer site for the homeobox domain factor Oct-1 (−28 to −34) and several recognition sites for the putative zinc finger transcription factor GATA, which is expressed at high levels in the pancreas and gut-derived cells (23). Other putative sites for enhancer elements AP1, AP2, AP3, and AP4, HSF (heat-shock factor) (24), NFAT (nuclear factor of activated T cells) (25), AML transcription factor (26), and c-Ets-1 transcription factor (27) are also identified in the 5′-flanking region.
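A simplified version of this kind of motif search can be sketched with regular expressions. This is not the weight-matrix method implemented in MatInspector; the three patterns below are reduced consensus strings chosen for illustration, the function name is hypothetical, and the test sequence is invented rather than taken from the C1qBP 5′-flanking region. Positions are reported relative to the ATG, with the base immediately upstream of the start codon numbered −1.

```python
import re

# Simplified consensus patterns (an assumption for illustration; real tools
# score matches against position weight matrices instead).
MOTIFS = {
    "TATA": "TATA[AT]A",
    "CCAAT": "CCAAT",
    "GC box": "GGGCGG|CCGCCC",  # Sp1-type GC box on either strand
}


def scan_upstream(upstream, motifs=MOTIFS):
    """Find motif hits in a sequence that ends immediately before the ATG.

    Returns (name, start, end, matched_text) tuples sorted by start, where
    start and end are negative positions and the last base is -1.
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for name, pattern in motifs.items():
        # A lookahead wrapper lets overlapping matches be reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start(1) - n      # index 0 maps to -n
            end = m.end(1) - 1 - n      # index n-1 maps to -1
            hits.append((name, start, end, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# HYPOTHETICAL 30-bp upstream sequence, not the real promoter.
toy_upstream = "AACCAATGGTATATATACCGGGCGGTTAGC"
for hit in scan_upstream(toy_upstream):
    print(hit)
```

Exact-string matching of this kind over-reports short motifs such as CCAAT, which is one reason the sites listed above are described as putative until tested functionally.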
Determination of the Transcription Start Sites-A primer extension analysis was performed to determine the transcription start site of the gene. As shown in Fig. 4, three regions were detected with stronger intensities, and each region contained a number of individual start sites. These regions are the major transcription start sites. The farthest extension band was observed at 49 bp upstream of the ATG translation initiation codon and is an adenine residue. This nucleotide is positioned just downstream of an Sp1 binding site found in the promoter region. All of these regions lie within 50 bp upstream of the translation initiation codon.
Mapping of the Human gC1qR Gene (C1qBP) Promoter Element in PANC-1 Cells-To localize the essential promoter region for human gC1qR gene expression, several 5′ and 3′ deletion clones were constructed by cloning various restriction fragments produced by polymerase chain reaction into a reporter vector, pGL3-Basic. Because these fragments were cloned upstream of a luciferase reporter gene, transcriptional activities of the promoter-luciferase cartridge could be studied by transfecting various clones into the human cell line PANC-1 (Fig. 5). It was found that the essential region for promoting transcription is very close to the start site of translation. By comparing luciferase activities among constructs ERES2 to ERES7, and between construct ERES7 and constructs ERES-2H and ERES-2P, the essential promoter region could be mapped to a region spanning the translation start site to −364 bp. This observation is supported by the fact that no significant difference in the luciferase activities was observed upon deletion of the 5′ sequence from −364 to −1319 (ERES7). However, in the two constructs in which the 3′ region was deleted, the luciferase activities were completely abolished. Because no 5′ deletion studies were carried out beyond −364, we cannot finely map the essential promoter elements in the −1 to −365 region upstream of the translation start site. Experimental data also showed that inverting the 1.3-kb 5′-flanking region (ERES1) greatly decreased the transcription activities.
Deletion Analysis on the Other Human Cell Lines-Three constructs, ERES2, ERES7, and ERES-2P, were transfected into the other human cell lines to test the cell specificity of the promoter element as well as to map the element. All four cell lines, MDA-MB-231, SVG P12, 293-EBNA, and HuTu80, showed enhanced levels of luciferase activities for constructs ERES2 and ERES7, ranging from 10- to 100-fold depending on the cell line (Fig. 6). However, similar to the results obtained in PANC-1, deletion of the 3′ region of the promoter (ERES-2P) completely abolished the promoter-reporter luciferase activities in all four of these cell lines. It was surprising that the removal of the TATA box at −399 to −406 and the CCAAT box at −410 to −414 in ERES7 did not significantly affect the transcription efficiency of the promoter. This finding implied that there are strong promoter elements in the −1 to −364 region of the 5′-flanking region.
FIG. 3. Analysis of the 5′-flanking region of the human C1qBP gene. The first nucleotide of the translation start codon ATG (bolded) is designated as +1 and the preceding nucleotide as −1. All potential transcription factor binding sites, promoters, and elements that may be responsible for transcription control are underlined and labeled below the line.
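The numbering convention used for the 5′-flanking region (the A of the ATG is +1, the preceding nucleotide is −1, and there is no position 0) is easy to get wrong by one when extracting deletion fragments from a sequence file. The following sketch shows one way to convert between that convention and 0-based string indices. The function names and the short example sequence are hypothetical.

```python
def index_to_position(index, atg_index):
    """Convert a 0-based sequence index to a position relative to the ATG.

    The A of the ATG is +1 and the base before it is -1; position 0 is skipped.
    """
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def position_to_index(position, atg_index):
    """Convert an ATG-relative position back to a 0-based sequence index."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return atg_index + position - 1 if position > 0 else atg_index + position


def upstream_fragment(seq, atg_index, distal):
    """Return the bases from a negative position `distal` through -1 inclusive."""
    if distal >= 0:
        raise ValueError("distal must be a negative upstream position")
    return seq[position_to_index(distal, atg_index):atg_index]


# HYPOTHETICAL sequence for illustration; the ATG begins at index 10.
toy = "CCGGGCGGAGATGCTG"
atg = toy.index("ATG")
print(index_to_position(atg, atg), index_to_position(atg - 1, atg))
print(upstream_fragment(toy, atg, -5))
```

With this convention, a fragment described as spanning −364 to −1 contains exactly 364 bases.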
Interaction between DNA-specific Nuclear Protein Factors and Promoter Elements-In the −1 to −364 region, seven GC-rich sequences with high homology to the consensus Sp1 binding site were identified (Fig. 3). To test whether there is any nuclear factor interacting with these GC-rich sequences, gel shift assays (Fig. 7A) were performed with nuclear extracts of PANC-1 cells. Seven pairs of synthetic oligonucleotides in both sense and antisense orientations (Table I) were designed based on these GC-rich motifs. Of the eight oligo pairs (including a negative control that is not GC-rich), only the L6 probe formed DNA-protein complexes with the nuclear extract. In the competitive assay, a decreased intensity of the hybridization bands was observed (Fig. 7B) when the concentrations of the unlabeled L6 probe were 10- to 100-fold that of the labeled L6 probe. When an unrelated DNA probe was used for the competition assay (Fig. 7B), one (complex 1) of the four bands also showed a reduced signal, indicating that complex 1 is a nonspecific signal. To reveal the identity of the complexes, monoclonal anti-Sp1 antibodies (Research Diagnostics Inc.) and bovine serum albumin were mixed with the complexes in a supershift assay. It was observed that the DNA-protein complex 4 was shifted only by anti-Sp1 antibodies but not by bovine serum albumin, indicating that complex 4 is a complex between oligonucleotide L6 and Sp1 (Fig. 8).

DISCUSSION
Like the mouse gene, the human gC1qBP gene contains six exons and five introns, and the sizes of the exons do not differ much except for exon 6 (28). Exon 6 in the human C1qBP gene is about 100 bp larger than in the mouse gene because of a longer 3′-untranslated region defined by the polyadenylation sites. Even though the sizes of the exons in the human and mouse genes are almost the same, three of the introns of the human C1qBP gene have expanded sizes in comparison with their mouse counterparts. The size of intron 2 in the human and mouse genes is 3156 and 2071 bp, respectively, which makes the human gene approximately 1 kb larger than the mouse gene.
The promoter elements of the human gene were identified by computer analysis and transfection studies. Several TATA boxes and CCAAT boxes were found in the 1.3-kb 5′-flanking region. The TATA box is a crucial positioning component of the core promoter and is usually located about 25 bp upstream of the transcription start site; it constitutes the only upstream promoter element that has a relatively fixed location with respect to the transcription start site. The CCAAT boxes are often located close to the TATA boxes, but they can function at distances that vary considerably from the transcription start site in either orientation. Mutation studies have suggested that the CCAAT box plays a strong role in determining the efficiency of the promoter, and its inclusion increases promoter strength.
The transcription start sites are usually located downstream in close proximity to the TATA elements in the majority of promoters. However, the transcription start sites (−49 to −29) of the human C1qBP gene are around 350 bp from the closest TATA element (−399) found in the 5′-flanking region of the gene. The 5′-flanking region of the human gene contains an 8-bp consensus TATA box with the sequence TATATATA at −399 to −406 and a CCAAT box at −410 to −414, whereas in the mouse gene, the TATA box closest to the ATG start codon is located approximately at −306 to −309 (29). However, in the promoter studies, the construct ERES7, which did not carry any TATA or CCAAT boxes, still gave more than 90% transcription efficiency, indicating that none of the TATA boxes or CCAAT boxes was essential for the transcription of luciferase mRNA in the cell lines under test. Instead, deletion studies carried out in all cell lines showed complete abolishment of the promoter-reporter activity when the 364 bp adjacent to the initiation methionine was deleted (ERES-2P). This 364-bp promoter sequence, similar to the large stretch of GC-rich sequence in the mouse gene, especially from nucleotides −1 to −200 (75% G + C content) (28), contains seven GC-rich Sp1 sites, which are frequently linked to the transcriptional control of genes lacking a functional TATA box. As no TATA element was found to be essential for the activity of the gC1qBP promoter, Sp1 sites in the upstream region close to the transcription start site are thought to be very important for the promoter activity of the gene. In fact, one of these seven GC-rich Sp1 sites (−96 to −76) was found to bind specifically to PANC-1 nuclear proteins in gel mobility shift assays, and one of these nuclear factors was further proved to be Sp1 binding factor in supershift assays employing anti-Sp1 antibodies.
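A G + C content such as the 75% figure cited above is the fraction of G and C bases in the region considered. The sketch below shows the calculation for a whole sequence and for windows along it. The function names are hypothetical, and the example sequences are invented and are not drawn from the human or mouse promoter.

```python
def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start_index, gc_fraction) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (i, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


# HYPOTHETICAL sequences for illustration only.
print(gc_fraction("GGCGCCAT"))
print(windowed_gc("GGCCAATT", window=4, step=4))
```

Scanning in windows makes it possible to see whether GC richness is concentrated near the translation start site, as described for the −1 to −200 region, or spread evenly across the fragment.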
These results show that binding of Sp1 to the Sp1 binding site located at around 80 bp upstream of the translation initiation codon may play an important role in transcription control in human
The three-dimensional structure of gC1qR was revealed by x-ray crystallography. The mature protein molecule has one N-terminal α-helix followed by seven consecutive antiparallel β-strands and two C-terminal α-helices (29). Three molecules form a doughnut-shaped quaternary structure with an internal channel 10 Å in diameter. By aligning the exon boundaries and the three-dimensional structure, it is observed that: exon 1 encodes the first 77 amino acid residues containing the mitochondria-targeting sequence (14); exon 2 encodes the N-terminal α-helix and the first two β-strands; exon 3 encodes β-strand 3; exon 4 encodes β-strands 4 and 5; exon 5 encodes β-strands 6 and 7; and the last exon encodes the C-terminal α-helices. As gC1qR was shown to be a multiligand-binding protein, it would be interesting to map the binding sites on gC1qR to various ligands. More recently, gC1qR was found to function as a receptor for the internalin B (InlB) invasion protein of Listeria monocytogenes (30) and to bind to protein A of Staphylococcus aureus (31). Another report showed that the hepatitis C virus core antigen can interact with gC1qR, and the anti-proliferation effects of hepatitis C virus core antigen on activated T cells can be reversed by anti-gC1qR monoclonal antibodies (16). All of these reports further suggest that gC1qR is a biologically important, widely distributed, multiligand-binding and multifunctional protein (1).
FIG. 7. Gel-shift assays. 8 μg of PANC-1 nuclear extracts were used for each individual reaction. A, gel shift was observed only in the L6 DNA probe, corresponding to the sequence −96 to −76 of the human C1qBP gene. B, a competition assay was performed using 8 μg of the PANC-1 nuclear extracts in the presence of an increasing concentration of unlabeled DNA probe (0-, 10-, and 100-fold). The formation of DNA-protein complexes was gradually inhibited only by the specific oligo but not by an unrelated oligo. In the negative control experiments (−ve) using an unrelated oligo, only the formation of DNA-protein complex 1 was inhibited, but the formation of the other three complexes was not disturbed; this showed that DNA-protein complex 1 was an artifact.
FIG. 8. Supershift assay was performed using the L6 oligo as a probe. When 4 μg of anti-Sp1 antibodies were added to the mixture, the band of DNA-protein complex 4 shifted upward from its original position (as shown when no antibody was added), whereas the other three bands were unaffected. DNA-protein complex 4 should be a complex of oligo L6 and the transcription factor Sp1. Bovine serum albumin, used as the negative control in this assay, was unable to displace DNA-protein complex 4.