Transcriptional regulation of the gene coding for human protein C.

The promoter for the gene coding for human protein C has been characterized as to nucleotide sequences that regulate the synthesis of mRNA. The major transcription start site was found 65 nucleotides upstream from the first intron/exon boundary along with two minor sites. Functional characterization of 1528 base pairs at the 5'-end of the gene was then carried out by chloramphenicol acetyltransferase reporter assays, protection from DNase I digestion, and electrophoretic mobility shift assays employing HepG2 and HeLa cells. One of the upstream regions (nucleotides -25 to +9) contained binding sites for at least two different transcription factors, including a hepatic nuclear factor 1-binding site (-10 to +9) and two overlapping and oppositely oriented hepatic nuclear factor 3-binding sites (-25 to -11). A second major region (PCE1) (+12 to +30) appeared to be a unique, liver-specific regulatory sequence. An Sp1-binding site in exon I (+58 to +65) was also recognized by cotransfection experiments with an Sp1 expression plasmid. Specific mutations in these promoter elements reduced transcriptional activity and abolished the binding of hepatic nuclear proteins. Finally, a strong silencer element (PCS1) (between -162 and -82) and two possible liver-specific enhancer regions (PCE2 and PCE3), which interact coordinately with the promoter elements, were also found (between -1462 and -162).

Protein C is a vitamin K-dependent zymogen of a serine protease that is present in plasma (1,2). The active form, called activated protein C, regulates the blood coagulation cascade by inactivating activated factors V and VIII through minor proteolysis (3). Protein C is synthesized in hepatocytes as a single-chain precursor, which undergoes processing steps to give rise to a two-chain molecule held together by a disulfide bond. Additional post-translational modifications include carboxylation of 12 amino-terminal glutamic acid residues (4), hydroxylation of an aspartic acid residue (5,6), and glycosylation of several amino acid residues (7). The two-chain form is converted to activated protein C by thrombin in the presence of thrombomodulin through cleavage of a 12-residue peptide from the amino terminus of the heavy chain (2,8). Protein C (together with its cofactor, protein S), antithrombin III, and tissue factor pathway inhibitor represent major independent pathways for the regulation of blood coagulation. A deficiency of protein C constitutes a risk factor for venous thrombosis as well as other thrombotic disorders (9,10).
A large number of mutations have been found in the genes from patients with protein C deficiency, including several in the 5′-flanking region of the gene. Recently, activated protein C resistance with a factor V Leyden mutation has been identified as a highly prevalent risk factor for thrombotic disease (11). However, individuals with a single genetic defect, such as protein C deficiency or activated protein C resistance, can be asymptomatic. Combined genetic defects often lead to a much higher thrombotic risk and support the concept that hereditary thrombophilia is often a multigenic disease (12,13).
The gene for protein C is ~11 kb in length and contains nine exons (14,15). It is located on chromosome 2q13-q14. The gene shares significant organizational similarity with the genes coding for the other vitamin K-dependent proteins that circulate in blood. However, the steady-state mRNA levels in liver and the concentrations of these proteins in plasma differ significantly.
A comparison of the 5′-flanking sequences of the protein C gene with those of the genes coding for the other vitamin K-dependent coagulation proteins indicates a significant DNA sequence divergence. Nevertheless, transcriptional regulation of these genes has certain common features. In this investigation, a number of positive elements as well as a negative element are identified that regulate the gene coding for human protein C. The data demonstrate that transcriptional activity of the TATA-less protein C promoter is largely dependent upon sequences surrounding the transcription initiation sites. Three liver-specific promoter regions are identified, including contiguous binding sites for HNF1 and HNF3 and a unique regulatory element (designated PCE1) present in exon I. Regions responsible for positive and negative regulation in the upstream enhancer region are also defined.

EXPERIMENTAL PROCEDURES
Sequencing of the 5′-Flanking Region of the Protein C Gene-A phage clone (pC6) was isolated previously by screening a human genomic DNA library with a radiolabeled cDNA probe for human protein C (14). A 4.4-kb EcoRI fragment containing the 5′-flanking region of the protein C gene was removed from phage pC6 and inserted into the XbaI site of pCAT-0 at the 5′-end of the CAT gene to yield plasmid pPC-4.4kb. The 5′-end sequence of the protein C gene was completely sequenced on both strands using synthetic primers employing the dideoxy terminator method of Sanger et al. (16).
Rapid Amplification of 5′-cDNA Ends-To analyze the 5′-end of the protein C gene, the liver 5′-RACE-Ready cDNA (CLONTECH) was amplified with the anchor primer (5′-CTGGTTCGGCCCACCTCTGAAGGTTCCAGAATCGATAG-3′) and two nested antisense primers (RPC2, 5′-ACGTAGCTGCCGTAGCCGTCGAAGTCGACG-3′; and RPC7, 5′-ACCGTCGACGTGCTTGGACCAGAAGGCCAG-3′) designed from the protein C cDNA sequence. Human liver 5′-RACE-Ready cDNA prepared from a 33-year-old Caucasian female donor is an uncloned cDNA library with a unique single-stranded anchor oligonucleotide (5′-CACGAATTCACTATCGATTCTGGAACCTTCAGAGG-NH3-3′) attached to the 3′-ends of the cDNAs. The PCR products were purified and cloned into pCRII vector (Invitrogen) for sequencing of the 5′-cDNA ends.
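The relationship between the anchor primer and the anchor oligonucleotide can be checked computationally. The following is a minimal sketch, assuming the two sequences are exactly as printed in this section and are read as continuous strings; the function names are illustrative and are not part of any published protocol. It reports how many nucleotides at the 3′-end of the anchor primer can base-pair with the 3′-end of the anchor oligonucleotide.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_overlap(primer: str, target: str) -> int:
    """Length of the longest 3' suffix of `primer` that equals the start of
    the reverse complement of `target`, i.e. that pairs with the 3'-end of
    `target`. Returns 0 if there is no such overlap."""
    rc = reverse_complement(target)
    for k in range(min(len(primer), len(rc)), 0, -1):
        if primer[-k:] == rc[:k]:
            return k
    return 0


# Sequences as quoted in the text (5'->3').
ANCHOR_PRIMER = "CTGGTTCGGCCCACCTCTGAAGGTTCCAGAATCGATAG"
ANCHOR_OLIGO = "CACGAATTCACTATCGATTCTGGAACCTTCAGAGG"

overlap = three_prime_overlap(ANCHOR_PRIMER, ANCHOR_OLIGO)
print(overlap)  # 25
```

Under these assumptions, the last 25 nucleotides of the anchor primer are complementary to the last 25 nucleotides of the anchor oligonucleotide, which is consistent with the primer annealing to the anchored 3′-ends of the cDNAs.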
Construction of Plasmids-Plasmid pCAT-0 was purchased from Promega, and plasmid pSVβ was from CLONTECH. Plasmid pSV2-CAT was constructed as described previously (17). A 1482-bp fragment containing 1462 bp upstream and 20 bp downstream of the transcription initiation site of the protein C gene was obtained by amplification of pC6 using the polymerase chain reaction technique. The resulting DNA fragment was cloned into the XbaI site of pCAT-0, generating the pPC-1482 construct. A 107-bp StuI-XbaI fragment containing the exon I sequence of the protein C gene was obtained by amplification of pC6 using the PCR technique and was inserted into the StuI and XbaI sites of pPC-1482 to yield the pPC-1528 construct. Deletion constructs from plasmids pPC-1.5kb and pPCE-1.5kb were obtained by the exonuclease III unidirectional deletion method, by digestion of convenient restriction sites, or by PCR techniques followed by ligation reactions. The sequences of these constructs were completely verified by dideoxy sequencing.
Mutations in the promoter-binding sites were generated in plasmids pPC-1482 and pPC-1528 using oligonucleotide-directed mutagenesis and the polymerase chain reaction technique, respectively (18). Overlapping oligonucleotides with mutations were used as primers, and pPC-1482 and pPC-1528 were used as templates. The oligonucleotides used for sequencing primers, PCR primers, and EMSAs were synthesized on an Applied Biosystems Model 380B DNA synthesizer.
Cell Culture and Transfections-Human hepatoma cells (HepG2) were cultured in Ham's F-12 medium supplemented with L-glutamine, antibiotics (penicillin, streptomycin, and neomycin), and 5% fetal calf serum. HeLa cells were cultured in minimal Eagle's medium supplemented with L-glutamine, antibiotics, 1% nonessential amino acids, 1% sodium pyruvate, and 10% fetal calf serum. Both cell lines were maintained in a 5% CO2 atmosphere at 37°C. Plasmid DNA (5 μg) was cotransfected with pSVβ (5 μg), which served as an internal control, into cultured cells by the calcium phosphate precipitation technique (19). Each transfection was repeated at least three times.
CAT and β-Galactosidase Assays-CAT activity was measured by the method of Gorman et al. (20,21), and β-galactosidase activity by the method of Herbomel et al. (22). CAT activities were normalized to β-galactosidase activities to correct for differences in transfection efficiency and cell concentrations.
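The normalization described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the construct names and raw readings below are hypothetical placeholders (arbitrary units), not data from this study, and the function names are introduced here for illustration. Each CAT reading is divided by the β-galactosidase reading from the same transfection, and the result is then expressed as a percentage of a chosen reference construct.

```python
from typing import Dict, Tuple


def normalized_cat(cat: float, beta_gal: float) -> float:
    """CAT activity corrected for transfection efficiency (CAT / beta-gal)."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return cat / beta_gal


def relative_activity(
    samples: Dict[str, Tuple[float, float]], reference: str
) -> Dict[str, float]:
    """Express each normalized CAT activity as a percentage of `reference`."""
    ref = normalized_cat(*samples[reference])
    return {
        name: 100.0 * normalized_cat(cat, bgal) / ref
        for name, (cat, bgal) in samples.items()
    }


# Hypothetical (CAT, beta-gal) readings; NOT measurements from this study.
readings = {
    "reference": (200.0, 4.0),
    "deletion_A": (50.0, 2.0),
    "deletion_B": (30.0, 6.0),
}

result = relative_activity(readings, "reference")
print(result)  # {'reference': 100.0, 'deletion_A': 50.0, 'deletion_B': 10.0}
```

Dividing by the internal control first means that a construct transfected with twice the efficiency does not appear twice as active.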
DNase I Footprint Assays-DNase I footprint assays were performed in a 30-μl reaction volume containing 20 mM Hepes, pH 7.9, 25 mM KCl, 4 mM MgCl2, 4 mM spermidine, 0.5 mM EDTA, 12 ng of pUC18 DNA, and 3 μg of poly(dI-dC). Crude HepG2 nuclear extracts (containing 60 μg of total protein) were added and incubated for 10 min at room temperature. End-labeled DNA fragments (1-2 ng) were added, and incubation was continued for an additional 10 min at room temperature. The reaction mixtures were then digested with freshly diluted DNase I for 2 min. Digestion was stopped by adding 70 μl of stop solution (20 mM Tris-HCl, pH 7.5, 20 mM EDTA). The DNA was then extracted with phenol/chloroform, precipitated with ethanol, and analyzed on 6% polyacrylamide gels containing 7 M urea.

RESULTS
Sequence of the 5′-End of the Gene Coding for Human Protein C-More than 1500 bp from the 5′-noncoding region of the gene coding for human protein C was isolated from phage pC6 (14) and sequenced (Fig. 1). The sequence from −617 to +66 was identical to that previously reported by Foster et al. (14), while the additional sequence from −1462 to −618 was established in the present study. The bases were numbered relative to the major transcription initiation site (see below).
Transcription Initiation Site-Since the cellular level of the protein C mRNA is very low, rapid amplification of the 5′-cDNA ends was performed to identify the transcription start site(s). Two antisense primers (RPC2 and RPC7) designed from the protein C cDNA sequence and an anchor primer to the oligonucleotide-anchored 3′-ends of the cDNA library were used to amplify the 5′-region of the protein C mRNA employing a human liver cDNA library. PCR products were analyzed by 2% agarose gel electrophoresis, and the amplification product, which appeared as a band of ~400 bp, was isolated and cloned for sequencing. Sequencing of the resulting transcripts revealed several transcription initiation sites with ~80% of the transcripts starting at an A located 65 nucleotides upstream from the first intron/exon boundary or 1515 bp upstream from the translation start codon (AUG). This A was designated as +1 in the DNA sequence shown in Fig. 1. Another 18% of the transcripts started at −7 and 2% at +13. The +13 initiation site corresponds to the site assigned previously (15). Also, a slightly different splice site was observed in the first intron/exon II junction (+1497, +1498, AG) compared with that reported earlier (+1493, +1494, AG) (15).
Transcriptional Regulation of the Human Protein C Gene-To characterize sequences responsible for transcriptional regulation from the 5′-end of the gene, a 1482-bp segment including 20 bp from exon I (−1462 to +20) and a 1528-bp segment including 66 bp from exon I (−1462 to +66) were linked to a promoterless CAT reporter gene in plasmid pCAT-0. The resulting constructs, pPC-1482 and pPC-1528, were transfected into human hepatoma HepG2 and HeLa cells, and transient reporter gene expression was monitored by measuring CAT activity in the cell extracts. To correct for differences in cell number and DNA transfection efficiencies, the cells were cotransfected with a reference plasmid carrying the β-galactosidase gene under the control of the SV40 early promoter-enhancer. After transfection, CAT activities were measured and normalized to β-galactosidase activity. When plasmid pPC-1482, containing the 5′-flanking region of the protein C gene from −1462 to +20, was transfected into HepG2 cells, low but detectable levels of CAT activity were produced. Plasmid pPC-1528, containing the 5′-flanking region from −1462 to +66, resulted in CAT activity 10-fold higher than that obtained with pPC-1482. Only background levels of activity were observed with both plasmids in HeLa cells. These results indicate that the 5′-flanking sequence with the 20-bp exon I sequence is sufficient to direct basal liver-specific gene expression, but the inclusion of the entire exon I sequence (66 bp) results in high level and liver-specific expression of the gene.
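The construct sizes quoted in this section follow from the coordinate system, which can be made explicit in a few lines. This is a sketch that assumes the usual convention in which positions are numbered relative to +1 with no position 0 (so −1 is immediately followed by +1); that convention is inferred from the stated sizes rather than spelled out in the text, and the function name is illustrative.

```python
def segment_length(start: int, end: int) -> int:
    """Length in bp of the inclusive segment start..end, where positions are
    numbered relative to the transcription start site (+1) and there is no
    position 0."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not be downstream of end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the segment crosses -1/+1, where no base is numbered 0
    return length


print(segment_length(-1462, 20))  # 1482, the insert of pPC-1482
print(segment_length(-1462, 66))  # 1528, the insert of pPC-1528
print(segment_length(58, 65))     # 8, the Sp1 consensus in exon I
```

The same function gives 34 bp for the −25 to +9 region and 19 bp for the +12 to +30 region (PCE1).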
A series of deletion constructs were then generated from plasmid pPC-1528 (Fig. 2A) and tested for activity in HepG2 and HeLa cells (Fig. 2B). A deletion from −1462 to −723 resulted in a 30% reduction in activity in HepG2 cells, and a further deletion to −162 caused another 30% reduction in activity. This suggested the presence of at least two enhancer regions (PCE3 and PCE2) between −1462 and −162 in the promoter. These reductions in activity, resulting from the deletions from −1462 to −162, were not observed in the absence of the full exon I sequence (data not shown). This suggests that the function of these enhancer elements depends upon the initial assembly of the initiation complex on the protein C promoter. Further stepwise deletions from −162 to −82 resulted in an increase of ~4-fold in reporter gene activity, indicating the presence of a strong silencer element (PCS1) in this region. Deletion of the sequence from −82 to −42 resulted in a small but reproducible decrease in activity. Finally, a precipitous reduction in expression occurred upon deletion of the sequence from −42 to +66 (PCE1). These experiments indicate that one or more promoter elements are located from −42 to +66 and are functionally responsible for high efficiency transcription. This region also contains an Sp1 consensus sequence (+58 to +65) that may play a role in this activity (see below).
Deletion constructs were also transfected into HeLa cells to determine further which region(s) was responsible for directing the liver-specific expression of the protein C gene. Plasmid pPC−42-66, containing the region from −42 to +66, exhibited much higher CAT expression than the promoterless pCAT-0 plasmid in HepG2 cells (Fig. 2B). In contrast, little increase was observed in HeLa cells. These data indicate that the region from −42 to +66 contains strong liver-specific elements as well as other regulatory elements. Small increments with plasmid pPC−82-66 and a decrease in CAT activity with plasmid pPC−162-66 were observed in HepG2 and HeLa cells. Furthermore, the exon I-dependent enhancer activity in the region from −1462 to −162 was not observed in HeLa cells, suggesting that the enhancer elements are liver-specific regulatory sequences. Further investigation is needed to define the function of these enhancer elements.
DNase I Footprint Analyses-Nuclear proteins from HepG2 cells protected two distinct sequence areas in the promoter region from −42 to +58 (Fig. 3, A and B). Sense strand footprints were designated FP-I (+12 to +30) and FP-II (−25 to +9), as shown in Fig. 3C. Footprints on the antisense strand were very similar and showed only very minor differences.
These regions were not protected by the HeLa cell nuclear extract, as shown in Fig. 3A (lane 5). Comparison of the sequences of the FP-II region with known consensus binding sites revealed that this region contained sequences characteristic of binding sites for several liver-specific transcription factors. The downstream half-sequence of the FP-II region (−10 to +9) was homologous to an HNF1 recognition site (23), whereas the upstream half-sequence (−25 to −11) included overlapping HNF3-binding sites homologous to two oppositely oriented TGTTT motifs (24), shown in Table I. Four point mutations have been reported in this region in patients with low protein C levels. These occur at −20 (A → G), −15 (T → A), +3 (C → T) (25,26), and −2 (T → C) (27). The mutations at +3 and −2 occur in the HNF1 site, and those at −15 and −20 in the HNF3 sites (Fig. 3C). The FP-I region, from +12 to +30, designated PCE1, was located within exon I, immediately downstream from the initiation sites. No homology has yet been found for this sequence when compared with recognition sequences for other known transcription factors.
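Locating oppositely oriented copies of a short motif such as TGTTT amounts to searching for the motif and for its reverse complement on the same strand. The sketch below illustrates the idea; because the promoter sequence (Table I) is not reproduced in this section, the example sequence is a hypothetical placeholder and not the protein C promoter, and the function names are introduced here for illustration only.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based index, strand) for every exact occurrence of `motif`.
    '+' means the motif reads 5'->3' on the given strand; '-' means its
    reverse complement does, i.e. the motif lies on the opposite strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:  # elif avoids double-counting palindromic motifs
            hits.append((i, "-"))
    return hits


# Hypothetical sequence for illustration; NOT the protein C promoter.
example = "CCTGTTTGAAACAGG"
print(find_motif_both_strands(example, "TGTTT"))  # [(2, '+'), (8, '-')]
```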
To evaluate the effect of the mutations on transcription of the gene in the regions identified by DNase I footprinting, the pPC-1528 construct was mutated and transfected into HepG2 cells. A mutation in the HNF1-binding site at +3 (C → T) or −2 (T → C) reduced the reporter gene activity by 90 and 85%, respectively, while a mutation in the HNF3 site at −20 (A → G) reduced the activity by 82%. Finally, mutations in the PCE1 site at +23 and +24 (GG → AA) reduced the activity by 84% compared with the wild-type promoter. These results demonstrate that specific mutations in these protein-binding sites for HNF1, HNF3, and PCE1 greatly impair the promoter for the protein C gene.
Characterization of the Promoter by EMSA-Synthetic oligonucleotides used in EMSAs (Table I) were designed for the regulatory sites in the proximal promoter region for the normal protein C gene as well as for known mutations in genes of patients with coagulation disorders. With the HepG2 nuclear extract, all three wild-type double-stranded oligonucleotides (PCP1, PCP2, and PCE1; shown in Table I) designed from footprinted areas bound nuclear proteins, which were visualized as bands with retarded mobility (lanes 1 in Fig. 4, panels A-C). These DNA-protein complexes were competed and eliminated by a 20- or 200-fold molar excess of the corresponding unlabeled binding site oligonucleotide (lanes 2 and 3 in Fig. 4, panels A-C), indicating that the interactions of nuclear proteins with these binding sites are sequence-specific. When these sequences were tested with HeLa or fibroblast nuclear extract, little if any protein binding occurred (data not shown). These results, together with the footprint and CAT reporter functional assays, indicate that proteins bound to these regulatory sequences are liver-specific transcription factors.
As shown in Fig. 4A, DNA-protein complex formation by the PC(HNF1) oligonucleotide and HepG2 nuclear proteins was not influenced by the addition of mutated oligonucleotides (PC(HNF1,m1) and PC(HNF1,m2)) (lanes 4-7), but was competed and abolished by 20- and 200-fold molar excesses of unlabeled HNF1 consensus oligonucleotide (lanes 8 and 9). Furthermore, 32P-labeled PC(HNF1) sequences that were mutated (PC(HNF1,m1) and PC(HNF1,m2)) were also unable to bind hepatic nuclear proteins (Fig. 4A, lanes 10 and 11). Finally, the retarded bands formed by the oligonucleotide containing the HNF1 consensus sequence and HepG2 nuclear proteins were competed and eliminated completely by a 200-fold molar excess of HNF1 and PC(HNF1) oligonucleotides (Fig. 4A, lanes 12-14). These results clearly indicate that a nuclear protein(s) binds to an HNF1 site in the promoter of protein C and that a single base mutation at +3 (C → T) or −2 (T → C) abolishes this binding.

FIG. 2. Transient expression of CAT activities by deletion constructs transfected into HepG2 and HeLa cells.
A, a series of PC-CAT fusion constructs containing varying lengths of the protein C 5′-end sequences were transfected into HepG2 and HeLa cells. B, shown are CAT activities expressed by deletion constructs. CAT activity of pPC-1528 was arbitrarily defined as 100% in HepG2 cells and used as a reference to normalize the CAT activity data of other constructs.

FIG. 3. DNase I footprint analyses.
A, sense strand of the protein C promoter region. A DNA fragment containing the protein C promoter (from −42 to +58) was labeled at the 3′-end of the sense strand and was subjected to DNase I digestion in the absence (lane 2) and presence of HepG2 nuclear extracts (lanes 3 and 4) and HeLa nuclear extracts (lane 5). A purine-specific sequence marker (G + A; lane 1) was obtained by Maxam-Gilbert sequencing of the end-labeled fragment (55). B, antisense strand. Lane 1, G + A sequence marker; lane 2, without nuclear extracts; lane 3, with HepG2 nuclear extracts. Brackets indicate regions that are protected from DNase I digestion. C, sequence of the protected regions identified as FP-I and FP-II. Naturally occurring mutations are indicated (↑), as are transcription start sites (*).

The PC(HNF3) oligonucleotide was also bound to HepG2 nuclear protein, but with low affinity. However, this DNA-protein complex was competed and eliminated by a 20- or 200-fold molar excess of unlabeled PC(HNF3) oligonucleotide (Fig. 4B, lanes 6 and 7). Unlabeled mutated PCP2 oligonucleotide (PC(HNF3,m1)) was unable to compete and abolish DNA-protein complex formation (Fig. 4B, lanes 4 and 5). Also, the 32P-labeled PC(HNF3,m1) oligonucleotide was unable to bind hepatic nuclear proteins (Fig. 4B, lane 8). These results demonstrate that this site is an HNF3-binding site and that a single base mutation at −20 (A → G) can abolish its binding to hepatic proteins.
As shown in Fig. 4C, two retarded bands were formed when the PCE1 oligonucleotide designed from the FP-I region was incubated with HepG2 nuclear extract. Oligonucleotides designed from consensus sequences for the most abundant known hepatic transcription factors, including HNF1 (23), HNF3 and HNF4 (28), HNF5 (29), and C/EBP (30), were unable to compete and eliminate the retarded complexes formed by the PCE1 oligonucleotide (Fig. 4C, lanes 4-13), indicating that this element is a unique and specific sequence. Also, the unlabeled mutated sequence (PCE1,m1) was unable to compete and eliminate the DNA-protein complexes (Fig. 4D, lanes 4 and 5). The 32P-labeled PCE1,m1 oligonucleotide also failed to bind hepatic nuclear proteins (Fig. 4D, lane 6). These two base mutations were located at +23 and +24 (GG → AA).
Characterization of a Potential Sp1 Site-Sequences spanning from +58 to +65 (CCCGCCCC) contain an Sp1 consensus sequence (31) that has been examined by reporter gene assays. The pPC-1528 construct was mutated in the Sp1-binding site at +63 (C → T), transfected into HepG2 cells, and compared with pPC-1528. The single base substitution employed in these experiments occurs in the proximal Sp1 site of the human low density lipoprotein receptor promoter, resulting in a heterozygous familial hypercholesterolemia (32). The mutation in the protein C promoter reduced the activity of pPC-1528 by only 20%. Cotransfection of pPC−42-66 containing the Sp1-binding site with an Sp1 expression plasmid (33) resulted in a 1.6-fold increase in activity, while no effect was observed with the pPC−42-34 construct lacking the Sp1 site. The small increase may be due in part to the relatively high level of Sp1 already present in HepG2 cells. The potential Sp1 site was further examined by EMSAs. An oligonucleotide from this region (+53 to +70; PC(Sp1)) bound to HepG2 nuclear proteins and formed retarded bands (Fig. 4E, lane 1). These bands were competed and abolished by a 20- or 200-fold excess of unlabeled PC(Sp1) or Sp1 consensus oligonucleotide, respectively (Fig. 4E, lanes 2, 3, 6, and 7). Unlabeled mutated oligonucleotide (PC(Sp1,m1)) was unable to compete and abolish DNA-protein complex formation (Fig. 4E, lanes 4 and 5). Also, the 32P-labeled PC(Sp1,m1) oligonucleotide was unable to bind hepatic nuclear proteins (Fig. 4E, lane 8). These data are consistent with the proposal that the nucleotide sequence in the region from +53 to +70 may contain an active Sp1-binding site and that a single base mutation at +63 (C → T) inhibits its binding to hepatic proteins.
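The position of the +63 substitution within the Sp1 consensus can be worked out directly from the coordinates given above. The following is a small sketch, assuming the eight bases CCCGCCCC occupy +58 to +65 in the order written; the helper name is hypothetical, and the function is deliberately restricted to positive coordinates so that the absence of a position 0 does not matter.

```python
def apply_point_mutation(seq: str, first_pos: int, pos: int,
                         ref: str, alt: str) -> str:
    """Return `seq` with the base at coordinate `pos` changed from `ref` to
    `alt`. `first_pos` is the coordinate of seq[0]; only positive
    (downstream of +1) coordinates are supported."""
    if first_pos < 1 or pos < first_pos:
        raise ValueError("coordinates must be positive and inside the sequence")
    index = pos - first_pos
    if index >= len(seq):
        raise ValueError("position lies beyond the end of the sequence")
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at +{pos}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


SP1_SITE = "CCCGCCCC"  # +58 to +65, as quoted in the text

mutant = apply_point_mutation(SP1_SITE, first_pos=58, pos=63, ref="C", alt="T")
print(mutant)  # CCCGCTCC
```

Checking the reference base before substituting guards against off-by-one errors in the coordinate mapping.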

DISCUSSION
This study has demonstrated that ~1.5 kb of DNA from the 5′-flanking region and 66 bp from the noncoding exon I sequences of the protein C gene contain sufficient information for high level expression of the gene in HepG2 cells. The data also indicate that the protein C promoter consists of at least three liver-specific regulatory elements and one general element that drives the high level, liver-specific expression of the gene. These elements include HNF1, HNF3, and PCE1 as well as a potential Sp1-binding site, all of which are located in the region surrounding the transcription initiation site.
HNF1α, a homeodomain transcription factor, has been reported to be a major transactivator of numerous liver-specific genes and is also an activator of the protein C promoter (27). Cotransfection with HNF1α induced a 1.5-fold transactivation in the wild-type promoter and a 0.8-fold transactivation in a mutated promoter. These data are consistent with the present experiments showing that the HNF1-binding site is important for basal level transcription of the gene. Whether other factors of the HNF1 family participate in the transactivation of the protein C gene is not yet clear.
The binding affinity of hepatic nuclear protein(s) for the two HNF3 sites in the protein C promoter was quite low. Recently, it has been shown that cotransfection experiments with an HNF3 expression plasmid and the wild-type protein C promoter resulted in a 4-5-fold increase in promoter activity in HepG2 cells (34). HNF3-binding sites have been identified as essential cis-acting elements in the promoters and enhancers of several liver-specific genes. However, the transactivation by HNF3 of an HNF3-dependent minimal promoter was relatively low since it did not exceed 4-5-fold. In contrast, HNF1-dependent promoters show a >100-fold increase when excess HNF1 is present. Several laboratories have reported that an important role of HNF3 could be to cooperate with other factors bound to contiguous DNA elements, such as the glucocorticoid-responsive enhancer (29), the nuclear factor 1 element (24), the HNF4/ARP1/COUP-TF family-binding site (35), and the HNF1 element (36). The low binding of hepatic protein to the HNF3 site in the protein C gene may also be due to the absence of accessory proteins or sequences. Another proposed role for HNF3 involves the transition of chromatin from an inactive to an active conformation (37). In the case of the gene for protein C, binding of HNF3 to the PC(HNF3) sequence may contribute to opening the chromatin at or near the protein C promoter, therefore making it available for subsequent HNF1 binding and transcriptional activation.
Deletion analysis from the 3′-end revealed that the PCE1 site, a unique and liver-specific regulatory sequence, was the principal element for high efficiency transcription. Mutations at +23 and +24 (GG → AA) in this element abolished its binding to hepatic proteins and greatly decreased its transcriptional activity. Mutational analysis, cotransfection experiments, and EMSAs also showed a potential Sp1-binding site in exon I downstream from the PCE1 element. Sp1-binding sites have been demonstrated as an important regulatory element at transcription initiation sites for many TATA-less promoters, including the gene for factor VII (38). It is believed that the preinitiation complex is assembled around the multiple initiation sites directed by the tightly clustered regulatory elements in the proximal promoter region of the protein C gene. Any disruption in the promoter sequences surrounding the transcription initiation sites impairs the assembly of the preinitiation complex, causing a reduction in transcription efficiency.
Furthermore, this promoter region is similar to the 80-bp enhancer region described for the prothrombin gene in which the HNF1-binding site is flanked on the 3′-side by Sp1 sequences (Fig. 5).
cis-Acting elements located upstream from the promoter can also modulate the promoter activity. The upstream −162/−82 fragment decreases the activity of the strongly active pPC−82-66 construct. This reduction, observed in HepG2 cells as well as in HeLa cells, may be due to a silencer element interacting with ubiquitous factors or to other effects, such as steric hindrance exerted on promoter elements. A possible HNF4-like element from −131 to −116 is also located in this region. Polymorphism in this region has been described to affect plasma protein C levels in the population (39). Further work is needed to elucidate the role of negative regulation in protein C gene expression.
One particular feature of the protein C gene among the vitamin K-dependent genes is that it contains an additional short noncoding exon I sequence upstream from the translation start codon (AUG), separated by a 1463-bp intron sequence. The gene coding for the only other vitamin K-dependent anticoagulant factor, protein S, has also been postulated to contain an additional noncoding exon I sequence since two transcripts have been observed (40). The participation of intron sequences in regulating protein C gene expression is currently under investigation.
It is common in many TATA-less promoters that transcription initiates from a cluster of sites surrounding +1. Each of the initiation sites found in the protein C gene was surrounded by pyrimidine-rich sequences, as is characteristic of most initiation sites of other genes. The +1 and −7 sites were present in the HNF1-binding site, whereas the +12 site was located in the PCE1 site (Fig. 3C). Several promoters, generally but not necessarily lacking the TATA-box, have an initiator element that can replace or reinforce the role of the TATA sequence in directing the location of a transcription start site. These initiator elements have recently been grouped into families based upon sequence homology (41). Sequences surrounding the multiple initiation sites of the protein C gene, however, were not homologous to any other known initiator sequence(s). It is unclear at this point whether an unidentified initiator element in the protein C gene or the clustered regulatory elements initiate the assembly of the transcription apparatus. This is very similar to the factor IX gene, where the promoter is characterized by a tight cluster of regulatory elements surrounding the transcription initiation site (42). Mutations found to date in the 5′-flanking region of the factor IX gene in patients with factor IX deficiency (hemophilia B Leyden phenotype) were all located in this tight cluster called the Leyden-specific region from −40 to +20 in the 5′-end sequence. This is also comparable to the protein C gene, in which naturally occurring mutations in the 5′-region were located from −20 to +3. Characterization of the protein C promoter led to the understanding of genetic disorders caused by known and possible additional mutations occurring in the 5′-region in patients with type I protein C deficiency.
In addition to protein C, there are six other vitamin K-dependent glycoproteins that circulate in blood, including factors VII, IX, and X, prothrombin, protein S, and protein Z. The genes for these vitamin K-dependent proteins share significant organizational similarity and have evolved from a common ancestral gene (43). It is also noted that five of these genes (not including the protein S and protein Z genes, the regulation of which has not been studied) are regulated by TATA-less promoters. Furthermore, transcriptional regulation of these genes shares certain common features (Fig. 5). The factor VII gene, which is located on chromosome 13q34-qter (44), is regulated by two promoter elements, the FVIIP1 site containing an HNF4-binding site and the FVIIP2-binding site present in a GC-rich sequence that binds hepatic specific factors as well as the ubiquitous transcription factor Sp1 (38). In addition, two silencer elements were located upstream of the promoter region. The factor IX gene, which is located on chromosome Xq26-27, is regulated by the presence of liver-specific cis-acting elements that interact with the liver-enriched transcription factors C/EBP and HNF4 (45,46) and with the liver-specific transcription factors nuclear factor 1-like liver-specific protein and D-site-binding protein (DBP) (42). The factor IX gene may be hormonally regulated since the deficiency in hemophilia B Leyden, which is caused by mutations in the Leyden-specific regulatory region of the factor IX gene, can be partially overcome following puberty or by the administration of testosterone (47). In rat liver, DBP, which also recognizes some of the same cis-elements as C/EBP, was not expressed until puberty (48).
Reporter gene studies and DNA-protein binding assays with factor IX promoter sequences containing hemophilia B Leyden mutations of the C/EBP-binding site suggest that DBP may enhance C/EBP binding and transcription of the factor IX gene due to a synergistic interaction between C/EBP and DBP (49). Hence, the hormonal regulation of the factor IX gene is probably due to the induction of DBP expression during puberty rather than the presence of an androgen-responsive element in the factor IX gene. The factor X gene is located 2.8 kb downstream from the factor VII gene on chromosome 13q34-qter. The factor X gene is regulated by three positive regulatory regions (FXP1, FXP2, and FXP3 sites) and a negative element that blocks the transcriptional activity toward the upstream factor VII gene (17,50). Transfection in HepG2 and human fibroblast cells suggests that the FXP1 and FXP3 sites interact with liver-specific trans-activating factors, while the FXP2 site interacts with ubiquitous transcription factors. Furthermore, the FXP1 site contains a 22-bp sequence similar to the consensus recognition site for the liver-specific transcription factor HNF4. This HNF4-binding element in the FXP1 site has a 6-bp core sequence (CTTTGC) that is also present in the HNF4-binding element present in the factor VII and factor IX promoters. The prothrombin gene is located on chromosome 11p11-q12 (51). It contains a weak promoter immediately before the transcription initiation site and a liver-specific enhancer sequence located 860-940 nucleotides from the transcription initiation site. The latter region apparently interacts with HNF1 and is flanked on the 3'-side by a GC-rich sequence that is similar to an Sp1-binding site and is essential for enhancer activity (52-54). The 10-base pair GC-rich sequence shares 90% sequence identity with the Sp1-binding site present in the factor VII promoter. As shown in Fig. 5, there are a number of common regulatory units shared by the vitamin K-dependent proteins as well as unique sequences that regulate the individual proteins.
FIG. 5. Schematic comparison of the known transcription regulatory sites present in the genes coding for the human vitamin K-dependent coagulation proteins. Red inverted triangles correspond to silencer or repressor elements. Upright triangles correspond to positive regulatory elements. Black triangles indicate elements with no known homologous sequences. Other elements are labeled according to their corresponding transcription factors. FVII, factor VII; FIX, factor IX; FX, factor X; PC, protein C; NF-1, nuclear factor 1.