Functional and Structural Profiling of the Human Thrombopoietin Gene Promoter*

Human thrombopoietin (TPO) is involved in cardiovascular disease as it regulates megakaryocyte development and enhances platelet adhesion/aggregation. The THPO promoter structure is still controversial. By reverse transcription-PCR, we confirm that THPO transcription is initiated, in a cell line-dependent manner, at two alternative promoters, which we newly designated P1a and P1. We subsequently electrophoretically scanned and resequenced these portions in 95 and 57 patients with cardiovascular disease, respectively, and identified seven variants (–1450/del58bp, C-920T [rs2855306], A-622G, C-413T [rs885838], C+5A, G+115A, and C+135T). After subcloning of 1032 bp of THPO P1 in pGL3-basic vector, five molecular haplotypes (MolHaps1–5) were observed, [A–622-C–413-C+5-G+115; wild type (wt)], [A–622-T–413-C+5-G+115], [G–622-T–413-C+5-G+115], [A–622-C–413-A+5-G+115], and [A–622-C–413-C+5-A+115], and analyzed in reporter gene assays in HEK293T and HepG2 cells. MolHaps 2, 4, and 5 were significantly more active than wt (all p values ≤0.01) in HEK293T cells, whereas MolHap3 exerted a substantial loss of promoter activity (p < 0.0001 in HEK293T and p < 0.01 in HepG2, compared with wt). Electrophoretic mobility shift assays revealed that A-622G and C-413T individually differed from MolHaps in their DNA-protein interaction patterns. Supershift and chromatin immunoprecipitation assays identified CCAAT/enhancer-binding protein δ as the binding protein exclusively for the –622A allelic portion.

The human glycoprotein hormone thrombopoietin (TPO), also known as myeloproliferative leukemia virus ligand (c-Mpl), is an important growth factor for the megakaryocyte lineage, controlling its proliferation and differentiation (1,2). TPO affects platelet function by stimulating platelet aggregation via phosphoinositide-3 kinase (3), by reducing the threshold levels of ADP, collagen, or thrombin that mature platelets require for aggregation (4,5), and by stimulating platelet adhesion (6). In TPO-deficient mice platelet counts are reduced by ~90% compared with wild type mice (wt) (7). Circulating TPO levels are strongly influenced by inflammatory processes, including the presence of coronary atherosclerosis (8). Although predominantly expressed in liver cells, TPO is also generated by kidney cells and bone marrow stroma as well as smooth muscle cells (9). Recently, a genome-wide linkage analysis in a large Asian kindred proposed TPO as a candidate for platelet count variation (10). The THPO locus has been mapped to chromosome 3q26.3-q27. Three groups independently reported that THPO comprises 6 exons and 5 introns spanning more than 6.2 kilobases (kb) (11-13), whereas Chang et al. (14) proposed that it consists of 7 exons and 6 introns spanning 8 kb. Thus, the 5′-regulatory region consists of two alternative promoters, which we designated P1a and the intronic promoter P1, with multiple transcription initiation sites; 10% of all transcripts are synthesized from P1a and 90% originate from P1 (15,16). Given the controversy with respect to the THPO structure, we analyzed the two alternative promoter regions, identified 7 exons instead of 6, and propose an alternative nomenclature for THPO architecture.
Furthermore, we systematically scanned the promoter region P1 in 190 chromosomes of high risk myocardial infarction (MI) patients for genetic variation/single-nucleotide polymorphisms (SNPs) and resequenced this promoter portion in an additional set of 114 chromosomes of patients with cardiovascular disease (CVD) including MI and essential hypertension. We then performed molecular functional analyses of the identified promoter variations in single-allelic assays as well as for the molecular haplotypes (MolHaps), determined by individual subcloning procedures in the 57 CVD patients. Subsequently, in-depth reporter gene and band shift assays were performed in two different cell lines under three different stimulatory regimes.

* This work has been supported by grants from the German Ministry for Edu-

EXPERIMENTAL PROCEDURES
Study Populations-The current investigation was based on the Etude Cas-Témoins de l'Infarctus du Myocarde (ECTIM) and Münster Molecular Functional Profiling for Mechanism Detection (MolProMD) studies. ECTIM is a study of patients with MI (n = 1000) from several regions covered by World Health Organization MONICA (MONItoring trends and determinants in CArdiovascular disease) registers and controls (n = 1000) representative of each geographic area. The ECTIM Study design was originally described in 1992 (17). Genetic informed consent was obtained from all study subjects. Ninety-five high risk MI patients (positive family history for CVD, MI ≤ 55 years of age) from the ECTIM study were selected for a preliminary scan of the THPO promoter region. An additional sample of 57 CVD patients recruited in Münster hospitals (ongoing recruitment) as well as 35 healthy controls was used for molecular haplotyping and further functional profiling studies. The Münster MolProMD study is an ongoing study of patients with CVD (MI, essential hypertension) and their families aimed at studying molecular genetic mechanisms of CVD. The study was approved by the ethics committee of the Medical Faculty, Westfälische Wilhelms-University of Münster, and written informed consent was obtained from all study subjects.
Scanning of the THPO Promoter P1 for Genetic Variants-Genomic DNA was prepared from white blood cells using either phenol-chloroform extraction or a salting out method (18). From the published sequences of the THPO gene (accession number D32046), 15 overlapping fragments <300 bp in length from 95 MI patients (190 alleles) of the ECTIM study were amplified to cover 2084 bp of the upstream region. Specific amplification protocols can be obtained at the GeneCanvas INSERM website.
Single-strand Conformation Polymorphism analysis was performed as previously described (19). DNA from patients presenting different migration patterns on the polyacrylamide gels was then sequenced twice (both DNA strands with sense and antisense primers) using an automated sequencing device (ABI Prism 377, PerkinElmer Life Sciences).
Resequencing of the THPO Promoter P1 and Identification of Molecular Haplotypes (MolHaps)-To identify THPO promoter MolHaps, 1032 bp of the promoter region P1 of genomic DNA from 57 CVD patients from the Münster study were amplified and subcloned into vector pCRII-TOPO (Invitrogen) and sequenced twice as described above.
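The assignment of a sequenced clone to a MolHap can be sketched as a simple lookup. The following Python snippet is only an illustration and not the procedure used in this study; the allele combinations are those listed in the abstract, whereas the names `SITES`, `MOLHAPS`, and `classify_clone` are hypothetical.

```python
# Illustrative sketch only: map the alleles read from one subcloned strand
# to the molecular haplotypes (MolHaps1-5) defined in the abstract.
# All identifiers below are hypothetical helper names.

# Variant positions relative to the first nucleotide of exon 1.
SITES = ("-622", "-413", "+5", "+115")

# Allele combinations at (-622, -413, +5, +115) for MolHaps1-5.
MOLHAPS = {
    ("A", "C", "C", "G"): "MolHap1",  # wild type (wt)
    ("A", "T", "C", "G"): "MolHap2",
    ("G", "T", "C", "G"): "MolHap3",
    ("A", "C", "A", "G"): "MolHap4",
    ("A", "C", "C", "A"): "MolHap5",
}


def classify_clone(alleles: dict) -> str:
    """Return the MolHap name for one clone, or 'unclassified'.

    `alleles` maps each site label in SITES to the base observed
    on the sequenced strand, e.g. {"-622": "A", ...}.
    """
    key = tuple(alleles[site].upper() for site in SITES)
    return MOLHAPS.get(key, "unclassified")


if __name__ == "__main__":
    clone = {"-622": "G", "-413": "T", "+5": "C", "+115": "G"}
    print(classify_clone(clone))
```

Because each clone carries a single strand, the four alleles are read in phase, which is what distinguishes a molecular haplotype from a statistically inferred one.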
Isolation of Total RNA and Generation of cDNA-Total RNA from cells was isolated with the RNeasy Mini kit (Qiagen, Hilden, Germany) following the manufacturer's protocol. Of the total RNA, 5 μg were used for the generation of cDNA, performed with the First Strand cDNA Synthesis kit (Fermentas, St. Leon-Rot, Germany).
Generation of Reporter Gene Vectors-Promoter constructs were generated using human genomic DNA from patients bearing either the –413C or T allele and the –622A or –622G allele as template. All positions refer to the first nucleotide of exon 1 (accession number U17071). The sense primer was located at position –902, 5′-upstream of exon 1 (5′-AGAGAGCCTGAGGAAGTTCTG-3′), and the antisense primer was located within exon 1 (5′-ACTCACCGGGTGGAGAAGG-3′). Amplicons were subcloned into the luciferase reporter gene vector pGL3-basic (Promega, Mannheim, Germany), generating the reporter vectors pGL3-P1_1032 containing the –413C and –622A allele, pGL3-P1-413T containing the T allele, and pGL3-P1-413T/–622G containing the T allele of C-413T and the G allele of A-622G. The presence and correctness of the sequences were confirmed by automated sequencing as described above.
Terminal Deoxynucleotidyltransferase Labeling and Annealing-All oligonucleotides for electrophoretic mobility shift assay (EMSA) (supplemental Table 1) were synthesized at a minimum coupling efficiency of >98.5% and purified twice by high performance liquid chromatography (IBA, Goettingen, Germany). Single-stranded DNA was labeled using the biotin 3′-end DNA labeling kit (Thermo Fischer, Bonn, Germany) with Biotin-11-dUTP, Biotin-16-ddUTP (Roche Applied Science), or [γ-32P]ATP following the manufacturer's protocol. To generate double-stranded labeled oligonucleotides for EMSA, 2 pmol each of sense and antisense oligonucleotides were mixed in 100 mmol/liter NaCl, heated at 95°C for 5 min, and then allowed to cool down to RT overnight.
Extraction of Nuclear Proteins-Nuclear proteins were harvested by a modified procedure of the protocol published by Schreiber et al. (20). Intactness of nuclear protein extracts was ascertained by SDS-PAGE and Coomassie Blue staining.
EMSA-EMSAs were performed using the LightShift chemiluminescent EMSA kit (Thermo Fischer). For each reaction 5 μg of nuclear protein extract were mixed with 500 ng of presheared poly(dI·dC)/poly(dA·dT) (Amersham Biosciences) as nonspecific competitor plus, where applicable, a 200-fold molar excess of unlabeled oligonucleotide (8-10 pmol/reaction) as specific competitor and incubated with binding buffer (20 mmol/liter MgCl2, 240 mmol/liter KCl, 40 mmol/liter HEPES/KOH pH 7.9, 5 mmol/liter spermidine (Sigma), 20% Ficoll) for 5 min at RT. DNA concentration in lanes with no specific competitor was adjusted with the corresponding amount of poly(dI·dC)/poly(dA·dT). After the addition of 40-50 fmol of the labeled THPO probe bearing one allele, reactions were left for 20 min at RT or 45 min on ice. For supershift reactions, 0.2 μg of the chosen antibody was added and incubated for an additional 15 min at RT. Gels were run as a 6% native PAGE in 0.5× or 0.25× Tris-buffered EDTA, blotted, and visualized using the Chemiluminescent Nucleic Acid Detection Module (Thermo Fischer).
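The stated competitor amount follows directly from the probe amount and the molar excess: 40-50 fmol of labeled probe at a 200-fold excess corresponds to 8-10 pmol of unlabeled oligonucleotide. A minimal Python sketch of this unit conversion is given below; the function name `competitor_pmol` is a hypothetical helper and not part of any kit protocol.

```python
# Worked arithmetic for the specific competitor in an EMSA reaction.
# `competitor_pmol` is a hypothetical helper name used for illustration.


def competitor_pmol(probe_fmol: float, fold_excess: float) -> float:
    """Amount of unlabeled competitor (pmol) for a given labeled probe
    amount (fmol) and molar excess. 1 pmol = 1000 fmol."""
    return probe_fmol * fold_excess / 1000.0


if __name__ == "__main__":
    for probe in (40, 50):
        print(probe, "fmol probe ->", competitor_pmol(probe, 200), "pmol competitor")
```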
Chromatin Immunoprecipitation (ChIP) Assay and Real-time PCR-ChIP was performed as described previously (21,22). Approximately 10⁸ cells were fixed by adding formaldehyde to a final concentration of 1% and incubated with modest shaking for 30 min at RT. Thereafter, cells were washed twice with cold phosphate-buffered saline (Sigma). The pellet was resuspended and lysed, and nuclei were isolated and sonicated until the chromatin had an average length of ~500-1500 bp. After centrifugation, the supernatant was incubated with 3 μg of antibody against CCAAT/enhancer binding protein α (C/EBPα; #2295, Cell Signaling, Frankfurt, Germany), C/EBPβ (#3082, Cell Signaling), and C/EBPδ (#2318, Cell Signaling) overnight at 4°C for immunoprecipitation. The following day 10 μl of magnetic protein-G beads (Invitrogen) were added and further incubated at 4°C for 1-3 h. After appropriate washing, the antibody-transcription factor-DNA complex was eluted from the beads, formaldehyde cross-links were subsequently reversed, and proteins were digested with proteinase K at 67°C overnight. DNA was extracted by phenol/chloroform/isoamyl alcohol, washed with 75% ethanol, resuspended in water, digested with the restriction enzyme TsoI (Fermentas) to shorten the DNA fragments, and used for PCR and real-time PCR. Primers for the promoter variant A-622G were 5′-GGACAGAGACTGTGGGGAG-3′ (sense) and 5′-GACCTCCCCCGCAAATAC-3′ (antisense). Real-time PCR was performed using the Platinum SYBR Green qPCR SuperMix-UDG with ROX (Invitrogen) and a 7500 ABI-prism sequence detection system (Applied Biosystems). Values were normalized against glyceraldehyde-3-phosphate dehydrogenase. Semiquantitative PCR was followed by densitometric analyses using NIH Image J 1.40 (rsb.info.nih.gov/ij/download.html).

In Silico Analyses of Putative Transcription Factor Binding Sites and CpG Islands-Portions of about 50 bp (~25 bp flanking either side of the polymorphic positions) were subjected to computer-aided analyses using the Alibaba2.1 net-based search tool (23). For the detection of CpG islands, 1188 bp of THPO promoter region P1a were analyzed using the "CpG island searcher" tool.
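Preparing the allele-specific input sequences for such a binding site search can be sketched as follows. This is a minimal illustration under stated assumptions: a 0-based index for the polymorphic position, a flank of 25 bp on either side, and the hypothetical helper names `flank_window` and `allele_windows`; it does not reproduce Alibaba2.1 itself.

```python
# Illustrative sketch: extract a ~50-bp window (25 bp on either side of a
# polymorphic position) for each allele, as input for a TFBS search tool.
# Function names and the 0-based indexing are assumptions for illustration.
from typing import Dict, Iterable


def flank_window(seq: str, snp_index: int, flank: int = 25) -> str:
    """Return the SNP base plus up to `flank` bases on either side."""
    if not 0 <= snp_index < len(seq):
        raise IndexError("snp_index lies outside the sequence")
    start = max(0, snp_index - flank)
    end = min(len(seq), snp_index + flank + 1)
    return seq[start:end]


def allele_windows(seq: str, snp_index: int, alleles: Iterable[str],
                   flank: int = 25) -> Dict[str, str]:
    """Build one window per allele by substituting the base at snp_index."""
    windows = {}
    for allele in alleles:
        variant = seq[:snp_index] + allele + seq[snp_index + 1:]
        windows[allele] = flank_window(variant, snp_index, flank)
    return windows


if __name__ == "__main__":
    # Toy sequence, not the THPO promoter.
    toy = "A" * 30 + "C" + "G" * 30
    for allele, window in allele_windows(toy, 30, "CT").items():
        print(allele, len(window), window)
```

Submitting both windows separately makes it straightforward to see which predicted sites are shared by both alleles and which are allele-specific.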

RESULTS
Transcription of THPO Is Initiated at Exon 1a and Exon 1-To analyze the promoter selection of THPO in HEK293T and HepG2, we designed sense primers within exon 1a (E1aS) and 1 (E1S) and antisense primers within exons 2 (E2AS) and 3 (E3AS) for reverse transcription-PCR (Fig. 1B). We detected THPO in both cell lines (Fig. 1B) and under all stimulatory regimes, without stimulation and with 10⁻⁸ mol/liter PMA and 0.5 mmol/liter 8-Br-cAMP. THPO transcription initiated from promoter P1a was much more prominent in the embryonic cell line HEK293T than in the adult liver cell line HepG2 except for cAMP-stimulated cells.
Analyses of the THPO Promoter Region P1a-We designed two pairs of primers flanking P1a and P1, directly sequenced, and subsequently subcloned genomic DNA of 57 MI patients to determine MolHaps of four promoter variants of promoter P1 (Fig. 2). The P1a region was amplified from positions –664 to –1653 (from the first nucleotide of exon 1; accession number U17071). As this area was GC-rich, we subsequently performed a CpG island analysis. CpG islands are defined as regions of DNA (≥200 bp) with a GC content >50% and a ratio of observed CpG to expected CpG (ObsCpG/ExpCpG) of ≥0.6 (24). We analyzed 1188 bp of THPO promoter region P1a from positions 120601 to 121680 (accession number AC078797) and observed a GC content of 65.7% with an ObsCpG/ExpCpG ratio of 0.688, strongly indicating the presence of CpG islands.
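The three criteria can be checked for any sequence with a few lines of Python. The sketch below is not the "CpG island searcher" tool; it assumes the commonly used definition ObsCpG/ExpCpG = (number of CpG × N) / (number of C × number of G), with N the sequence length, and the function names `cpg_stats` and `meets_cpg_island_criteria` are hypothetical.

```python
# Illustrative sketch of the CpG island criteria stated in the text:
# length >= 200 bp, GC content > 50%, Obs CpG / Exp CpG >= 0.6.
# Assumption: Exp CpG = (count of C * count of G) / sequence length.


def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, and observed/expected CpG ratio."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c_count + g_count) / n if n else 0.0
    expected = (c_count * g_count) / n if n else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def meets_cpg_island_criteria(seq: str, min_len: int = 200,
                              min_gc: float = 0.5,
                              min_ratio: float = 0.6) -> bool:
    """Apply the three thresholds to the whole sequence."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] >= min_ratio)


if __name__ == "__main__":
    # Toy sequences, not the THPO promoter.
    print(cpg_stats("CG" * 100))
    print(meets_cpg_island_criteria("AT" * 100))
```

Dedicated tools typically slide a window along the sequence and merge qualifying windows; this sketch evaluates only the sequence as a whole, which is how the single GC content and ratio values above are reported.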
Identification of Genetic Variants in the THPO Promoter and Determination of Molecular Haplotypes-By the combined scanning approach using PCR-single-strand conformation polymorphism analysis in 95 MI patients from the ECTIM study and resequencing of the THPO promoter P1 in an additional set of 57 CVD patients from the MolProMD Study, we identified 6 SNPs and a 58-bp deletion variant (Fig. 1A). Three variants were located in the 5′-flanking region (C-920T [rs2855306], A-622G, and C-413T [rs885838]) and three in the 5′-untranslated region (C+5A, G+115A, and C+135T). The 58-bp deletion identified in 4 CVD patients is located between positions –1450 and –1507 (–1450/del58bp; accession number U17071), 137 bp upstream of exon 1a.
Computational Prediction of Putative Transcription Factor Binding Sites (TFBSs) in the THPO Promoter P1-Analyses of A-622G, C-413T, C+5A, and G+115A with the net-based program Alibaba2.1 predicted several TFBSs, some of which were altered by the presence of different alleles (Fig. 3). The fragment containing the A-622G site displays sequence similarity to binding sites for members of the C/EBP family, particularly C/EBPδ and C/EBPα, which belong to the basic leucine zipper protein (bZIP) transcription factor family. This SNP is also located in a binding site for Sp1, but the position of this binding site is displaced by 2 bp by the G allele. For the G allele, the sequence shows similarity to a binding site for a member of the Ets family, c-Ets-1, which is involved in the transcriptional activation of different genes such as stromelysin 1, collagenase 1, or urokinase type plasminogen activator (25). A core binding site for PEA3, a member of the Ets family of transcription factors, was identified by visual inspection and is only present for –622G. The sequence containing C-413T displays similarity to the consensus sequence of a member of the nuclear factor 1 (NF-1) family, and the T allele represents a putative binding site for C/EBP. The consensus sequence for this transcription factor additionally shows sequence similarity to both alleles of C+5A and G+115A, which are both located in exon 1. The +5C allele is located in a binding site for the thyroid hormone receptor (T3R). Both alleles of G+115A show sequence similarity to binding sites for members of the cAMP response element-binding protein (CREBP) family, more precisely CREBP1, which has a leucine zipper structure (26). For the A allele, we also found a putative binding site for members of the PU.1 transcription factor family and GATA-binding protein 1, a zinc finger transcription factor. The members of the PU.1 family belong to a divergent subclass of the Ets transcription factor family (27).
Transfection Experiments of the THPO Promoter P1-To investigate the identified genetic variants A-622G, C-413T, C+5A, and G+115A for their effect on promoter activity, DNA fragments corresponding to the five MolHaps were inserted into pGL3-basic (Promega). Transient transfection experiments were performed in HEK293T and HepG2 cells, because these cell lines expressed THPO endogenously. In HEK293T, the activity of MolHap4 was significantly increased under basic conditions (p < 0.0001) and under both stimulatory conditions, 10⁻⁸ mol/liter PMA (p = 0.0124) and 0.5 mmol/liter 8-Br-cAMP (p = 0.0136) (Fig. 4A), which was not observed in HepG2 cells under stimulatory conditions. Under basic conditions MolHap4 activity was increased compared with wt (p = 0.013) (Fig. 4B). MolHap5 also showed a significantly increased transcriptional activity compared with MolHap1 (wt) in HEK293T cells under all stimulatory regimes (basic conditions, p < 0.0001; PMA, p < 0.0001; cAMP, p < 0.0001; Fig. 4A), which was not observed in HepG2 cells under basic conditions. Stimulation with PMA and cAMP, however, resulted in a slightly increased transcriptional activity (PMA, p = 0.02; cAMP, p = 0.033) (Fig. 4B). Because MolHap4 occurred only once in the MolProMD and twice in the ECTIM sample and MolHap5 only once in MolProMD and ECTIM, they were not further studied. In HEK293T cells, under basic and stimulatory conditions MolHap2 was significantly more transcriptionally active than wt (basic conditions, p = 0.0036; PMA, p = 0.0014; cAMP, p = 0.0004) (Fig. 4A). In HepG2 cells there was no significant difference in transcriptional activity of MolHap1 and 2 (Fig. 4B). Conversely, MolHap3 displayed a significant abrogation in promoter activity in both cell lines compared with wt: 1) in HEK293T (basic conditions, p < 0.0001; PMA, p < 0.0001; cAMP, p = 0.0003) (Fig. 4A) and 2) in HepG2 (basic conditions, p < 0.0001; PMA, p = 0.001; cAMP, p = 0.034) (Fig. 4B).
C-920T did not show any allelic differences in promoter activity in transient transfection assays using HEK293T and HepG2 cells (data not shown) and was, therefore, not further investigated.
EMSAs for A-622G and C-413T with Nuclear Extracts from HEK293T and HepG2-Investigation of A-622G within the THPO promoter P1 by EMSA resulted in an altered DNA-protein binding pattern with nuclear proteins from HEK293T ( [...] PMA stimulation (Fig. 5A, arrow) and another faster migrating non-allele-specific band (Fig. 5A, single asterisk). Conversely, the interaction with HepG2 nuclear proteins showed an A allele-specific band (Fig. 5B, arrow), suggesting different expression patterns of transcription factors in the two different cell lines. Supershift assays with a C/EBPδ antibody resulted in a specific shift of the A allele with nuclear proteins from HEK293T (Fig. 5C). This finding could be confirmed by ChIP assays, which identified C/EBPδ as one of the binding factors at position –622 (Fig. 5D). Nuclear extracts from HepG2 cells did not lead to any specific supershift, indicating that C/EBPδ does not bind in this cell line or is not present in a sufficient amount. Despite the in silico prediction of an Ets-binding site for the –622G allele, bandshift assays using antibodies against Ets-1/-2 as well as antibodies against Sp1, C/EBPα, and C/EBPβ failed to interact with nuclear proteins from HEK293T or HepG2 (data not shown).

FIGURE 3. For this in silico analysis, the net-based program Alibaba2.1, which uses the data base Transfac 7.0, was used. A-622G resides in a putative Sp1 site and is adjacent to binding sites for the family of C/EBP proteins. For the –622G allele, a core binding site for PEA3, a member of the Ets-family of transcription factors, and c-Ets-1 was identified. For the polymorphic site at –413, two putative NF-1 sites are predicted, whereas the T allele-containing promoter may additionally bind C/EBPα (as for C+5A). Putative binding sites for CRE-BP1 and C/EBPα are predicted for the polymorphic site at position +115; the A allele also resides in binding sites for PU.1 and GATA-1. The wt sequence is shown in black, and the polymorphic site is in light gray; the variant sequence is shown in underlined black letters; TFBSs predicted for both alleles are labeled in black with an asterisk, allele-specific TFBSs are given in the same shading as the respective SNP.
EMSA for C-413T resulted in specific interaction with nuclear proteins from HEK293T and HepG2 for both alleles upon basic and stimulatory conditions (Fig. 6, A and B), whereas the binding of the T allele seemed to be much stronger with nuclear extracts from HEK293T. Competition with consensus sites for transcription factors Sp1, NF-1, and C/EBP resulted in a specific competition of both alleles and both cell lines with the NF-1 probe (Fig. 6,  A and B). The Sp1 and the C/EBP consensus sites did not compete any DNA-protein interaction. These results indicate that NF-1, but neither SP1 nor a member of the C/EBP family, is involved in THPO promoter binding in this region.
Supershift assays with an Sp1 antibody as well as EMSA experiments with the Sp1 binding domain instead of nuclear proteins did not result in any shift, indicating that Sp1 is either not involved in DNA binding or not present at sufficiently high levels to be detected (data not shown). Supershift assays with anti-NF-1 antibody were also negative.

FIGURE 5. Sequence harboring A-622G is bound by nuclear proteins from HEK293T and HepG2 cells. A, EMSAs were performed with 5 μg of nuclear extracts from HEK293T cells without (basic conditions), with 10⁻⁸ mol/liter PMA, and with 0.5 mmol/liter 8-Br-cAMP stimulation. Two 31-bp oligonucleotides served as probes bearing either the –622A or the –622G allele and were competed with the same unlabeled probe in 200-fold excess. Under all stimulatory conditions, one specific (non-allelic) band appeared that was competed (single asterisk). In addition, a slower migrating G allele-specific band appeared under all stimulatory conditions (arrow). B, EMSAs were performed as described above with nuclear extracts from HepG2 cells without (basic conditions), with 10⁻⁸ mol/liter PMA, and with 0.5 mmol/liter 8-Br-cAMP stimulation. Under all stimulatory conditions, one A allele-specific band appeared that could be competed (arrow). C, supershift assays were performed with 5 μg of nuclear extracts from HEK293T cells without (basic conditions), with 10⁻⁸ mol/liter PMA, and with 0.5 mmol/liter 8-Br-cAMP stimulation. Because in silico analyses of putative TFBSs for A-622G predicted a C/EBPδ binding site, supershift assays were performed with antibodies against this member of the C/EBP family. Two 31-bp oligonucleotides with either –622A or –622G served as probes and were competed with the same unlabeled probe. The addition of nuclear extract and antibody resulted in an additional, competable band specific for the –622A allele (single asterisk). D, ChIP assays showed that C/EBPδ is bound to the –622 site in HEK293T cells. Antibodies against C/EBPδ precipitated significantly more –622 probe as quantified by densitometry (upper and middle panel) and real-time PCR (bottom panel) compared with the remaining C/EBP family members. Band intensities were normalized to the input DNA.
EMSAs for MolHaps1-3 with Nuclear Extracts from HEK293T and HepG2-Because –622G only occurred together with –413T, we investigated the interaction of a fragment that contained both SNPs in one strand with nuclear extracts from HEK293T and HepG2 by EMSA. For that purpose, we amplified 249-bp fragments by PCR that represented MolHaps1-3. All three labeled haplotypes were competed either with the unlabeled MolHap probes or specifically with individual double-stranded 31-bp oligonucleotides with either –622A/G or –413T/C alleles. The addition of nuclear proteins from HEK293T resulted in the presence of two specific bands (Fig. 7A). One band (Fig. 7A, single asterisk) was specifically competed by the full-length homologous amplicon or by the oligonucleotide bearing the –413C or T allele but failed to interact with the oligonucleotide bearing the –622A or –622G allele. A slower migrating band appeared in probes bearing the major –622A allele (Fig. 7A, double asterisks) but was undetected in the probe with the G allele. This slower band was significantly more prominent when probed with MolHap2. The band was specifically competed by the cold probe alone but did not disappear upon competition with either SNP. This indicates that the MolHap sequence regions are specifically interacting with nuclear proteins and that this DNA-protein interaction depends on the presence of both variants in a strand. The same experiment with nuclear extracts from HepG2 showed a similar result (Fig. 7B). There was also a slow migrating specific band that was competed with the full-length amplicon but not with either single SNP, and it did not appear in the presence of the –622G allele.

DISCUSSION
In the wake of non-hypothesis-driven genome-wide approaches (nk-profiling) (28,29), which offer the opportunity of detection of genomic loci associated with complex genetic disease, specific and accurate molecular functional profiling of genetic variation in any gene region is warranted. In the present study we analyzed the transcriptional organization of the human THPO and propose a revised nomenclature for its molecular architecture, taking into account previous and partly contradictory reports. We confirm that THPO consists of seven exons and six introns and that transcription is initiated at two alternative promoters. The groups who reported that THPO consisted of 6 exons and 5 introns obtained their results by RNase protection and 5′-rapid amplification of cDNA ends assays; those experiments were performed with adult liver cell lines and revealed a single promoter region with multiple transcription initiation sites (12,13). In the HepG2 cells used in our experiments, transcription was nearly exclusively started at the downstream promoter, leading to transcripts missing upstream exonic sequences, backing the above-mentioned finding, which led to the OMIM nomenclature of exons 1-6 (accession number D32046). On the other hand, Chang et al. (14) reported that THPO contains seven exons and six introns. In embryonic HEK293T cells we demonstrated that transcription is also strongly driven by an upstream promoter region located 5′ of an additional, noncoding exon. Analogous to the nomenclature of the human prolactin gene (30,31), we suggest designating this exon as 1a (alternative) and retaining the OMIM nomenclature of exon 1-6 for the remaining structure. The respective promoter regions, which we characterized in the present work, are consequently assigned promoter P1a, upstream of exon 1a, and promoter P1, upstream of exon 1.

FIGURE 6. Sequence harboring –413C is allele-specifically bound by nuclear proteins from HEK293T and HepG2 cells. A, EMSAs were performed as mentioned above but with probes labeled with [γ-32P]ATP with nuclear extracts from HEK293T cells without (basic conditions), with 10⁻⁸ mol/liter PMA, and 0.5 mmol/liter 8-Br-cAMP stimulation. Two 31-bp oligonucleotides with either –413C or –413T served as probes and were competed with the same unlabeled probe and oligos bearing the consensus site of the transcription factors Sp1, NF-1, and C/EBP in 200-fold excess. The addition of the nuclear extract to the labeled oligonucleotides resulted in a specific band under all stimulatory conditions (arrow) that could be competed with the cold probe and the NF-1 consensus oligonucleotide (C NF1). Binding of the T allele was more pronounced compared with the C allele. B, EMSAs were performed as mentioned above with nuclear extracts from HepG2 cells without (basic conditions), with 10⁻⁸ mol/liter PMA, and 0.5 mmol/liter 8-Br-cAMP stimulation. Two 31-bp oligonucleotides with either –413C or –413T served as probes and were competed with the same unlabeled probe and oligos bearing the consensus site of the transcription factors Sp1, NF-1, and C/EBP in 200-fold excess. The addition of the nuclear extract to the labeled oligonucleotides resulted in the presence of two specific bands (arrows), which were competed with the cold probe alone and the NF-1 consensus oligonucleotide (C NF1).
To identify the molecular basis for this differential promoter usage, we first subcloned and sequenced 1188 bp of promoter P1a and exon 1a of 57 patients with CVD. The THPO promoter P1a is very GC-rich and contains several CpG islands, rendering it a possible target for methylation with activating or inactivating capacities. Because methylation events often represent long-lived regulatory procedures, it is conceivable that transcription from P1a might be repressed by methylation during ontogeny or due to differentiation, as observed in HepG2.
P1 is the major, if not the only, promoter utilized in HepG2 cells and is probably responsible for THPO expression in the adult individual. This is in good agreement with results from Ghilardi et al. (15) reporting that in human liver 90% of the transcripts originate from P1. The intronic promoter is located between exons 1a and 1; thus, the region responsible for its transcriptional regulatory potency is clearly defined. By subcloning and resequencing patient DNA, we identified the common MolHaps1-3, [A–622-C–413-C+5-G+115], [A–622-T–413-C+5-G+115], and [G–622-T–413-C+5-G+115], respectively. As MolHaps4 and 5, even if functional, were observed only twice or once in 57 patients, respectively, we focused our analyses on MolHaps1-3, which were defined by allelic constellations representing the distal promoter variant sites A-622G and C-413T. Transient transfection of these haplotypic strands in the context of the full-length P1 revealed a strong transcriptional activity in both HEK293T and HepG2, indicating that both variants reside in functional promoter regions.
Furthermore, we were able to demonstrate that single genetic and molecular haplotypic variation in the THPO promoter P1 have distinct and profound functional consequences for transcription as well as DNA-protein interactions as identified by appropriate promoter analyses and bandshift assays. MolHap3 led to a significant loss in transcriptional activity in both cell lines under basic and stimulation regimes including potent inflammatory (cAMP) and cell differentiation (PMA) factors compared with the wt promoter. In HEK293T cells, but not in HepG2 cells, MolHap2 was significantly more transcriptionally active than the wt sequence. To investigate whether these obviously functional sequences interacted with transcription factors, we performed in silico analyses, which revealed that A-622G is located within a consensus binding site for CCAAT box/enhancer-binding proteins, a family of proteins with either activating or inhibiting (or both) nature (C/EBPα, -β, -γ, -δ, -ε, -ζ) (32). The physiological activity of C/EBP transcription factors within a physiological cellular situation depends on the actual availability of these factors, their interaction with other partners, and their activation status. After having shown in EMSAs that specific DNA-protein interaction occurred, supershift experiments revealed that C/EBPδ specifically binds to the –622A allele in HEK293T cells irrespective of the stimulatory regime applied. We were able to confirm a specific C/EBPδ binding at this site by ChIP assays. Despite the in silico prediction of an Ets-binding site for the –622G allele, bandshift assays using antibodies against Ets-1/-2 as well as antibodies against Sp1, C/EBPα, and C/EBPβ failed to interact with nuclear proteins from HEK293T or HepG2.

FIGURE 7. MolHap sequences (–622G and –413T) interact differentially with nuclear proteins from HEK293T and HepG2 cells. A, EMSAs were performed with 5 μg of nuclear extracts from HEK293T cells without stimulation. 249-bp probes representing MolHaps1-3 were competed with 200-fold excess of either the very probe or specifically by individual double-strand 31-bp oligonucleotides harboring either the –622 or the –413 variant. Binding of nuclear proteins resulted in two specific bands. One band (single asterisks) was specifically competed by the full-length homologous amplicon or by the oligonucleotide bearing either the –413C or T alleles but failed to interact with the oligonucleotide bearing the –622A or –622G alleles. A slower migrating band (double asterisks) appeared in probes bearing the major A allele at position –622 but was undetectable in the probe presenting the G at that position (MolHap3). This slower band was more prominent when probed with MolHap2. The band was specifically competed by the cold probe alone but did not disappear upon competition with either isolated SNP. B, EMSAs were performed as described above with nuclear extracts from HepG2 cells without stimulation. Binding of nuclear proteins resulted in one specific band (single asterisks) that appeared in probes bearing the major A allele at position –622 but was undetectable in the probe presenting the G at that position (MolHap3). The band was specifically competed by the cold probe alone but did not disappear upon competition with either isolated SNP.
By competing the C-413T variant with consensus sites of Sp1, NF-1, and C/EBP, we observed that both alleles bind NF-1 in both cell lines. The –413T-containing promoter, a potential binding site for C/EBPα and NF-1, displayed significantly enhanced promoter activity compared with the wt sequence in HEK293T cells, whereas in the MolHap3 constellation comprising –622G the promoter activity was substantially decreased in both cell lines. This suggests that –622G may have a key role as a repressor allele. Moreover, using EMSAs, we demonstrated that nuclear proteins specifically interact with the region covering the A-622G and C-413T polymorphic sites and that this interaction was more prominent in the presence of the –413T allele in HEK293T cells and absent in the presence of the –622G allele in both cell lines. This indicates a loss of protein binding, which may explain the decreased transcriptional activity of MolHap3 in transfection experiments. We did not see any differences in protein binding between PMA- and cAMP-stimulated cells. In the present analysis, we demonstrated that protein binding patterns differed between promoter fragments containing single variants and those containing MolHaps. Indeed, in the embryonic cell line HEK293T and for all three MolHaps we identified distinct and specific binding patterns, which were compatible with the entire sequence fragment as well as C-413T alone. This gives rise to the notion that some DNA-protein interactions were specific exclusively in the presence of both alleles simultaneously in the promoter strand. In the adult liver cell line HepG2 we could not identify this specific DNA-protein interaction, which indicates a different protein binding pattern in these two cell lines. This clearly differed from single interactions involving C/EBPδ, as the –622A-containing sequence was not able to compete the DNA-protein binding of MolHaps.
In silico analyses might fail to accurately predict protein binding for MolHap constellations because they cannot integrate the impact of protein-protein interactions on the recruitment of factors to a transcriptional module. Spectrometric approaches, however, might provide a valuable tool to identify and quantify all proteins involved in promoter binding complexes and elucidate the kinetics of their assembly.
Interestingly, the THPO locus on chromosome 3q26-27 has recently been linked with bipolar disorders (29), and a significant association of THPO rs10513797 (intron 2) with type 1 diabetes mellitus and rheumatoid arthritis has been shown (see the Wellcome Trust Case Control Consortium; other disease phenotypes may emerge in future genome-wide studies). As the (mostly) tagging SNPs used in genome-wide studies are only markers in linkage disequilibrium with functional variants, it would be interesting for these consortia to test whether our functional MolHaps are associated with these latter disease phenotypes.
In conclusion, in the present analyses we functionally and structurally characterized the alternative promoter P1a and propose to designate the most distal exon as 1a. Within P1, we identified three common functional MolHaps, which substantially influenced its transcriptional activity. We provide strong evidence that both A-622G and C-413T are functional, as they reside in an active region of the THPO promoter P1. Further investigations should identify associations of MolHaps with TPO mRNA levels in appropriate cells extracted from typed individuals as well as with differential gene expression profiles. Additionally, coexpression experiments and RNA-mediated interference approaches may help to elucidate the interaction of different transcription factors within the THPO promoter. With respect to haplotype-specific protein binding, protein trapping technologies might help to identify candidate proteins that act in concert to regulate THPO transcription.