Characterization of the human cyclin-dependent kinase 2 gene. Promoter analysis and gene structure.

Cyclin-dependent kinase 2 is a serine/threonine protein kinase essential for progression of the mammalian cell cycle from G1 to S phase. CDK2 mRNA has been shown to be induced by serum in several cultured cell types. Therefore, we set out to identify elements that regulate the transcription of the human CDK2 gene and to characterize its structure. This paper describes the cloning of a ~2.4-kilobase pair genomic DNA fragment from the upstream region of the human CDK2 gene. This fragment contains five transcription initiation sites within a 72-nucleotide stretch. A 200-base pair sub-fragment that confers 70% of maximal basal promoter activity was shown to contain two synergistically acting Sp1 sites. However, a much larger DNA fragment containing ~1.7 kilobase pairs of upstream sequence is required for induction of promoter activity following serum stimulation. The intron/exon boundaries of seven exons in this gene were also identified, and this information will be useful for analyzing genomic abnormalities associated with CDK2.

Cyclin-dependent kinases (CDKs) are the catalytic subunits of a family of serine/threonine protein kinase complexes that are also composed of a cyclin regulatory subunit (1-3). Most members of the CDK family are involved in regulating the progression of the eukaryotic cell cycle at various stages throughout G1, S, G2, and M phases (4). Other CDKs are involved in regulation of other processes in the cell, including phosphate metabolism (5) and transcription (6,7).
CDK2 is a member of the CDK family whose activity is restricted to the G1/S phase of the cell cycle. Several experiments demonstrated that CDK2 is essential for mammalian cell cycle progression; micro-injection of antibodies directed against CDK2 blocked the progression of human diploid fibroblasts into S phase (8,9), and overexpression of a CDK2 dominant negative mutant in human osteosarcoma cells had a similar effect (10).
CDK2 is subject to an elaborate series of post-translational modifications. Although CDK2 has no kinase activity by itself, kinase activity is conferred by association of CDK2 with a regulatory subunit, cyclin A or cyclin E, and by phosphorylation of Thr-160. Conversely, CDK2 activity is repressed by phosphorylation of Thr-14 or Tyr-15. Another layer of complexity is added to the regulatory scheme by CDK inhibitory proteins that can bind to CDK2 and inhibit the activity of the cyclin-kinase complex (4).
While much attention has been given to the post-translational regulation of CDK2, we and others have found that CDK2 is also regulated at the transcriptional level. Horiguchi-Yamada et al. (11) reported a 3-fold increase in CDK2 mRNA in HL60 cells following stimulation with the phorbol ester 12-O-tetradecanoylphorbol 13-acetate. Other groups (12) had similar findings with serum-stimulated human keratinocytes and human lung fibroblasts. Tanguay et al. (13) found induction of CDK2 expression in primary B lymphocytes following anti-IgM stimulation. These data suggest that transcriptional regulation of CDK2 could be important in the transition of cells from G1 to S phase.
Our interest in CDK2 transcriptional regulation originated from our observation that CDK2 protein is undetectable by immunohistochemistry in sections of normal rat carotid arteries but is rapidly induced in smooth muscle cells of rat carotid arteries after balloon injury (14). This manuscript reports the cloning of the human genomic DNA upstream of the coding region of CDK2. Most (70%) of the basal transcriptional activity of this promoter was localized to a 210-base pair (bp) fragment. Two Sp1 sites in this region were shown to contribute cooperatively to this transcriptional activity. The serum-induced activity of the promoter is located in a ~1.7-kilobase pair (kb) region starting 680 bp upstream of the most proximal transcription initiation site.

MATERIALS AND METHODS
Plasmids and Constructs-pGL2-Basic (Promega) was used to generate luciferase reporter gene constructs. pCMV/SEAP (Tropix), which contains the secreted alkaline phosphatase (SEAP) gene driven by the cytomegalovirus (CMV) promoter, was used in cotransfection experiments. pBR-β-Puromycin, a plasmid expressing the puromycin resistance gene driven by the β-actin promoter, was a kind gift of L. Lee of the S. N. Cohen Lab (Stanford University) and was used to generate stably transfected cell lines by cotransfection. DSC34 was generated by cloning a ~2.4-kb AvrII-PstI fragment (Fig. 1, fragment B) from inverse PCR-amplified fragment A into pUC19 (New England Biolabs) digested with PstI and XbaI. DSC36 was generated by cloning a blunt-ended ~2.4-kb PstI-Asp718 fragment of DSC34 into HindIII-digested, blunt-ended pGL2-Basic, such that the CDK2 promoter directs transcription away from the luciferase gene. DSC37 was constructed the same way as DSC36, except that the CDK2 promoter directs transcription toward the luciferase gene. DSC40 was generated from DSC37 by deleting from the BamHI site in the insert to a BglII site in the poly-cloning region of pGL2-Basic. DSC40Δ4-1, DSC40Δ6-3, DSC40Δ9-17, DSC40Δ10-10, and DSC40Δ10-16 were generated by exonuclease III/mung bean nuclease deletions (15) using NheI/SacI-digested DSC40. The end points of the deletions were determined by sequencing. DSC42 was generated from DSC37 by deletion of a BglII-Eco47III fragment. DSC51 was generated from DSC40 by deleting an Eco47III-Bsp120I fragment. DSC67 and DSC68 were generated from DSC40Δ9-17 by site-directed mutagenesis (see below).
PCR Amplifications-The positions of the 5′-end of all primers are
The nucleotide sequence reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number U50730.
RNase Protection Assays-The 250-bp Eco47III-PstI fragment from DSC34 was cloned into pT7T318U (Pharmacia Biotech Inc.) digested with HincII and PstI. The resulting plasmid was linearized with EcoRI and transcribed in vitro with T3 RNA polymerase in the presence of [α-32P]ATP to generate an antisense probe. Ribonuclease protection was performed as described (19) using RNA isolated from human umbilical vein cells (ATCC), according to Chirgwin et al. (20). Yeast RNA (Sigma) was used as a negative control. The size of the protected products was determined from a sequencing ladder run alongside the samples.
DNase I Protection Assays-Protection assays were carried out as described (21), using purified Sp1 protein (Promega). An XmaI fragment from DSC40 (for Fig. 5, panel A) and a Bsp120I fragment from DSC68 and DSC69 (for Fig. 5, panels B and C) were radiolabeled using [γ-32P]ATP and T4 polynucleotide kinase. These labeled fragments were digested with PvuII (for a fragment derived from DSC40) or BglI (for fragments derived from DSC68 or DSC69) to obtain fragments exclusively labeled at the 5′-end of the bottom strand. A primer (5′-CCGGGTCGGGATGGAACG-3′) starting at the 5′-end of the XmaI fragment was used to generate a parallel sequencing ladder.
Site-directed Mutagenesis-DSC40Δ9-17 was mutagenized using the U.S.E. Mutagenesis Kit from Pharmacia with the following oligonucleotides: 5′-TTTCCCTGGCTCCGAACCAGGC-3′ and 5′-CACCAGAGGCCCCGAACTGCTTCCCGCGTTT-3′, which are the Sp1 mutagenic oligonucleotides, and 5′-CATCGGTCGATGGATCCAGAC-3′, which was used to mutate the SalI site in the vector (mutated nucleotides are underlined). The SalI site change was used to enrich and screen for mutated plasmids. Mutagenesis was verified by sequence analysis.
Cell Culture Methods-All tissue culture reagents were purchased from Life Technologies, Inc., except where indicated. NIH3T3 cells were grown in Dulbecco's modified essential medium containing 10% calf serum, 100 units/ml penicillin G, and 100 μg/ml streptomycin. For serum stimulation experiments, cells were serum starved by growing them for 72 h in Dulbecco's modified essential medium containing 0.5% calf serum. Cells were stimulated with growth medium containing 10 ng/ml basic fibroblast growth factor and 1 ng/ml epidermal growth factor. Cells were transfected using LipofectAMINE (Life Technologies, Inc.) according to the manufacturer's instructions. For transient assays, 1.5 μg of the luciferase-expressing plasmids were cotransfected with 0.5 μg of pCMV/SEAP. Conditioned media were collected 24 h after transfection and assayed for SEAP using the Phospha-Light kit (Tropix). Cells were harvested and assayed for luciferase activity using the Luciferase Assay System (Promega). Transient transfections were repeated independently at least twice, in duplicate. Stable cell lines were generated by cotransfecting 1.8 μg of a luciferase-expressing construct and 0.2 μg of pBR-β-Puromycin, a plasmid expressing the puromycin resistance gene. Resistant cells were selected with puromycin (2 μg/ml, Sigma) 24 h after transfection. After 5-10 days of selection, single resistant colonies were isolated and expanded.

RESULTS
Cloning Genomic Upstream Sequences of the Human CDK2 Gene-Inverse PCR (17) was employed to clone a 4.2-kb genomic DNA fragment upstream of the known cDNA sequence corresponding to a BglII-BclI fragment (Fig. 1, fragment A). Sequence analysis revealed that this fragment contains an intron, located just upstream of the first BglII site in the coding region. A 2.4-kb AvrII-PstI subclone (fragment B), containing only exon sequences upstream of the translational start site, was used for subsequent transcription analyses. The cloned inverse PCR product was shown to be part of the CDK2 gene by identifying the sequence junction of the published CDK2 cDNA and the upstream sequence. In addition, in situ hybridization (data not shown) mapped the cloned CDK2 upstream sequence to the chromosomal locus 12q13, consistent with a previously published report (22).
Sequencing the Upstream Region of the Human CDK2 Gene and Mapping the 5′-End of the mRNA-The nucleotide sequence 1.1 kb upstream of the ATG translation initiation codon was determined from both strands as shown in Fig. 2. To determine the transcription start site, a ribonuclease protection assay was performed using RNA isolated from human umbilical vein cells and an in vitro transcribed RNA probe extending from the PstI site just upstream of the translation initiation codon to the Eco47III site 250 bp upstream (Fig. 3). Five transcription start sites were identified. The most downstream site was designated as nucleotide +1 in Fig. 2. Three transcription start sites are clustered at positions +1, −5, and −9. Two additional sites are located at positions −33 and −71. The −33 site maps close to the 5′-end of the longest published human CDK2 cDNA (23), which was isolated from HeLa cells, whereas the −9 start site maps close to the 5′-end of a different cDNA clone (15) that was also isolated from HeLa cells. No consensus TATA box was identified upstream of any of the transcription start sites, nor was one identified anywhere else in the sequenced upstream region. Putative transcription factor binding sites were identified using manual scanning and the TFD database (24) in conjunction with the MacPattern program (25) (see Fig. 2). Two consensus Sp1 elements were found to lie in proximity to the two upstream transcription start sites. Sp1 is known to guide initiation in some TATA-less promoters (26), and so we hypothesized that these elements might be functionally important in the human CDK2 gene. A binding site for YY1, another factor known to determine the sites of initiation in some TATA-less promoters (26), was also identified upstream of the three transcription start sites clustered at positions +1, −5, and −9.
Other putative transcription factor binding sites identified in the upstream region of the human CDK2 gene include multiple AP-2, E2F, and p53 binding sites as well as single sites for AP-1, c-myb, oct, HiNF-A, and NFY/CTF, a CCAAT box binding factor.
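The kind of pattern search used to flag putative binding sites can be illustrated with a short script. This is a minimal sketch only, not the TFD/MacPattern procedure used in this study: it assumes the GC-box core GGGCGG as a stand-in for an Sp1 consensus, scans both strands for exact matches, and runs on a made-up sequence. The function names and the example sequence are hypothetical.

```python
# Sketch: exact-match scan of both DNA strands for a short motif.
# Assumption: "GGGCGG" (GC-box core) stands in for an Sp1 consensus;
# the study itself used the TFD database with MacPattern.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) hits for motif on both strands.

    A "-" hit means the reverse complement of the motif occurs on the
    given (top) strand at that start position.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    example = "AAGGGCGGTTCCGCCCAA"  # made-up sequence, not from Fig. 2
    print(find_motif(example, "GGGCGG"))
```

Exact matching is deliberately strict; a real search would normally tolerate degenerate positions, which is one reason candidate sites still need functional tests such as footprinting and mutagenesis.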
Functional Analysis of the Basal Activity of the CDK2 Promoter-The CDK2 promoter region was analyzed by transient transfection of luciferase reporter gene constructs into NIH3T3 cells. Luciferase activity was corrected for differences in transfection efficiency by cotransfection with a plasmid expressing the SEAP gene driven by the CMV promoter (see "Materials and Methods"). Deletion analysis of the CDK2 promoter (Fig. 4) revealed that a 210-bp fragment containing 100 bp upstream of the most proximal transcription start site (DSC40Δ9-17) contains the required elements for approximately 70% of the promoter activity that is generated by a full-length construct (DSC37). A further deletion to nucleotide −15 (DSC40Δ10-10) reduced the activity to less than 3% of that generated by the full-length construct (DSC37). This activity was similar to the background activity generated by the vector alone (pGL2-Basic). An internal deletion that removes all the transcriptional start sites (DSC51) also had no promoter activity above background, as did a reporter construct containing the full-length sequence in the reverse orientation (DSC36).
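The normalization described above (luciferase signal divided by the cotransfected SEAP signal, then expressed relative to the full-length construct) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, construct labels, and readings are hypothetical and are not data from this study.

```python
# Sketch: correct luciferase readings for transfection efficiency using a
# cotransfected SEAP control, then express each construct as a percentage
# of a reference construct. All readings below are made up.


def normalized_activity(luciferase, seap):
    """Luciferase signal divided by the SEAP signal from the same well."""
    if seap <= 0:
        raise ValueError("SEAP reading must be positive")
    return luciferase / seap


def percent_of_reference(readings, reference):
    """Map construct name -> activity as a percentage of the reference.

    readings: dict of name -> (luciferase, seap) raw readings.
    reference: name of the construct defined as 100%.
    """
    ref = normalized_activity(*readings[reference])
    return {
        name: 100.0 * normalized_activity(luc, seap) / ref
        for name, (luc, seap) in readings.items()
    }


if __name__ == "__main__":
    hypothetical = {
        "full_length": (1000.0, 50.0),
        "deletion": (700.0, 50.0),
        "vector_only": (20.0, 40.0),
    }
    for name, pct in percent_of_reference(hypothetical, "full_length").items():
        print(f"{name}: {pct:.1f}% of full_length")
```

Dividing by the SEAP signal before comparing constructs means that a well with poor transfection efficiency is not mistaken for a weak promoter.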
DNase I protection analysis of the region contained in DSC40 using HeLa nuclear extracts identified two protected regions, each of which contained Sp1-like binding sequences (data not shown). To test the importance of these Sp1 sites, a conserved GG sequence in each of the Sp1 sites was independently mutated to AA. A DNase I protection assay (Fig. 5A) demonstrated that the wild type DNA fragment was protected by purified Sp1 protein from DNase I digestion at two distinct regions (I and II); these regions were the same as those detected with HeLa nuclear extract. Mutating each of these Sp1 sites individually resulted in loss of protection in the mutated Sp1 site but did not affect Sp1 binding to the remaining wild type Sp1 site (Fig. 5, B and C). Transient transfection of NIH3T3 cells with constructs analogous to DSC40Δ9-17, except containing mutations in either one of the Sp1 sites (Fig. 6, DSC67 and DSC68), generated luciferase activity that was less than 25% of the activity generated by the full-length CDK2 promoter construct (DSC37), or approximately 30% of that generated by DSC40Δ9-17. These results indicate that each of these Sp1 sites contributes to the observed transcription activity. Moreover, they suggest that these sites act synergistically to generate transcriptional activity that is greater than the sum of the activities each site can generate by itself.
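The synergy argument reduces to simple arithmetic, which the sketch below makes explicit. It rests on a simplifying assumption made here for illustration: a construct with one Sp1 site mutated is taken to report the activity of the remaining intact site alone. The inputs are the rounded figures quoted above (each single mutant at about 30% of DSC40Δ9-17, set to 100%); the function name and the optional background term are hypothetical.

```python
# Sketch: compare the activity with both sites intact against the sum of the
# activities attributed to each site alone. A positive excess means the
# combined effect is more than additive (synergistic).
# Assumption: a single-site mutant reports the remaining site's activity.


def synergy_excess(both_sites, first_site_only, second_site_only, background=0.0):
    """Return combined activity minus the additive expectation.

    All activities are in the same units (here, percent of the wild-type
    DSC40-delta-9-17 construct); background is subtracted from each term.
    """
    combined = both_sites - background
    additive = (first_site_only - background) + (second_site_only - background)
    return combined - additive


if __name__ == "__main__":
    # Rounded values from the text: wild type = 100%, each mutant ~30%.
    excess = synergy_excess(100.0, 30.0, 30.0)
    print(f"excess over additive expectation: {excess:.0f} percentage points")
```

With these rounded values the two sites account for about 60% when added separately, so roughly 40 percentage points of the wild-type activity appear only when both sites are intact.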
Analysis of Serum-induced Activity of CDK2 Promoter-To analyze the serum inducibility of the cloned CDK2 promoter region, stably transfected NIH3T3 cell lines expressing luciferase from CDK2 promoter deletion derivatives were established. Cells were serum starved for 72 h prior to being exposed to serum and growth factors (Fig. 7). Luciferase activity increased 3-fold 12 h after serum stimulation of cells stably expressing the full-length construct (DSC37). In contrast, no induction by serum was observed with cells stably expressing DSC40, which exhibits full basal activity, but is about 1.7 kb shorter than DSC37. The same results were obtained with two independently isolated cell lines stably expressing the same constructs (data not shown).
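Fold induction in such a time course is the activity at each time point divided by the activity of the serum-starved cells at time zero. A minimal sketch follows; the function name and the readings are hypothetical, chosen only so that the 12-h value reproduces the 3-fold increase described above.

```python
# Sketch: fold induction relative to the serum-starved baseline (time 0).
# The readings are made up for illustration; they are not data from Fig. 7.


def fold_induction(time_course, baseline_time=0):
    """Map hours after stimulation -> activity relative to the baseline.

    time_course: dict of hours -> luciferase activity (any consistent unit).
    """
    baseline = time_course[baseline_time]
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    return {hours: value / baseline for hours, value in sorted(time_course.items())}


if __name__ == "__main__":
    hypothetical = {0: 200.0, 6: 300.0, 12: 600.0}
    for hours, fold in fold_induction(hypothetical).items():
        print(f"{hours:>2} h: {fold:.1f}-fold")
```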
CDK2 Gene Structure-PCR amplifications with pairs of primers that overlap most of the published human CDK2 cDNA sequence were used to determine the intron/exon junctions of this gene. Human genomic DNA and total human RNA were used as amplification substrates. Fragments from DNA amplification that were larger than the respective fragments amplified from RNA were cloned. Each cloned fragment was sequenced from both ends until an exon/intron boundary was reached. Seven exons were identified, and their positions are indicated in Fig. 8.
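The screening logic described above (a genomic amplicon that is larger than the corresponding RNA-derived amplicon indicates that the primer pair spans at least one intron) can be sketched as follows. The function name, primer-pair labels, sizes, and the optional size tolerance are hypothetical and serve only to illustrate the comparison.

```python
# Sketch: flag primer pairs whose genomic PCR product is larger than the
# product amplified from RNA, and report the extra (intronic) length.
# Sizes are in base pairs and are made up for illustration.


def pairs_spanning_introns(amplicons, tolerance_bp=0):
    """Return name -> extra genomic length for pairs that span an intron.

    amplicons: dict of primer-pair name -> (genomic_bp, rna_derived_bp).
    tolerance_bp: size difference ignored as gel-sizing error (assumption).
    """
    spanning = {}
    for name, (genomic_bp, rna_bp) in amplicons.items():
        extra = genomic_bp - rna_bp
        if extra > tolerance_bp:
            spanning[name] = extra
    return spanning


if __name__ == "__main__":
    hypothetical = {"pair_A": (850, 300), "pair_B": (300, 300)}
    print(pairs_spanning_introns(hypothetical))
```

The size difference gives only the total intron length between the two primers; the exact exon/intron boundaries still come from sequencing the cloned fragments, as described above.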

DISCUSSION
In this study, we have cloned and sequenced the upstream region of the human CDK2 gene and determined the transcription start sites for this gene by ribonuclease protection assay. Five transcription start sites spread over a 72-bp region were identified (Fig. 3). No consensus TATA box was identified in the entire upstream sequence. Thus, this promoter falls into the category of TATA-less promoters, similar to all other cell cycle genes analyzed to date, including cdc2 (27), cyclin A (28), cyclin D1 (29,30), cyclin D2 and cyclin D3 (31), as well as Xenopus laevis cdk2 (32). A YY1 box, which in some TATA-less promoters is responsible for determining the transcription start site (26), is present just upstream from the three start sites located at positions +1, −5, and −9. An Sp1 site was identified upstream of each of the remaining transcription start sites (−33 and −71), suggesting that these Sp1 regions may be responsible for localizing the start of transcription at these sites (26). Other putative transcription factor binding sites were also identified (Fig. 2). The presence of a c-Myb binding site is intriguing since c-myb was shown to transactivate the closely related human CDC2 gene (33). This could indicate that a transcription factor that positively regulates a G2 event, like CDC2 induction, might also regulate a G1 event such as CDK2 induction. Two putative p53 binding sites were identified within 200 bp of the 3′ or most proximal transcription start site. p53 is a known tumor suppressor gene that has been postulated to be involved in induction of cell cycle arrest. It would be perplexing if p53 induced CDK2, since this induction would most likely result in an accelerated cell cycle rather than a cell cycle arrest. Interestingly, a p53 site was also identified in the promoter region of the cyclin A gene, a regulatory partner of CDK2 (28). Further investigation of the possible involvement of p53 in CDK2 regulation is required.
FIG. 4. Deletion analysis of CDK2 promoter activity. Luciferase constructs are depicted on the left side. The 5′-end of each construct relative to the proximal transcription start site is indicated. Luciferase activity was divided by the activity of the cotransfected SEAP-expressing construct to correct for differences in transfection efficiency (see "Materials and Methods") and is expressed as a percentage of DSC37 activity. Bars represent standard errors of the mean.
Functional analysis of the promoter region revealed that a construct (DSC40Δ9-17) that contains DNA extending from nucleotide −100 to +108 is sufficient for strong basal promoter activity (about 30% of the SV40 early promoter, data not shown). DNase I footprint analysis of the CDK2 upstream region with HeLa nuclear extract (data not shown) revealed only two protected regions, both of which are Sp1-like sites, contained within the DSC40Δ9-17 clone. Further analysis indicated that these sites in fact bind purified Sp1 protein (Fig. 5). Furthermore, individually mutating each site abolished the DNase I protection only in the mutated site but not in the adjacent wild type site. This information indicates that Sp1 can bind to each of these sites in an independent fashion. The transcriptional activity of reporter gene constructs equivalent to DSC40Δ9-17, but with individually mutated Sp1 sites (Fig. 6, DSC67 and DSC68), was less than 25% of the activity generated by the full-length CDK2 promoter construct (DSC37) and approximately 30% of that generated by DSC40Δ9-17. This suggests that each of these Sp1 sites contributes to the basal activity of the CDK2 promoter. It also suggests that their combined effect is synergistic, since both sites together generate transcriptional activity that is greater than the sum of the activities generated by each site independently.
The level of CDK2 mRNA induction following stimulation of quiescent cells was reported to be 2-3-fold (11-13). Our attempts to detect this low level of serum-induced promoter activity using a transient transfection cell culture system produced ambiguous results, presumably because there is plasmid loss over time, and this loss masks the serum-induced promoter activity of the retained plasmids. To overcome this problem, NIH3T3 cell lines stably expressing the luciferase enzyme driven by various CDK2 promoter constructs were established. The basal luciferase activity of the cell lines in this study was comparable; however, only cells that contained about 2.4 kb of the upstream region of the CDK2 gene (DSC37) were induced by serum. The level of induction following serum starvation and maximal growth factor stimulation was about 3-fold, as was expected from the published literature and our own unpublished observations. The next longest deletion derivative, DSC40, which expressed full basal promoter activity in a transient transfection assay, was not induced by serum and growth factor stimulation. These data suggest that the information needed for serum induction resides in a ~1.7-kb segment, which starts 682 nucleotides upstream of the most proximal transcription start site.
We found that the human CDK2 gene is made up of at least seven exons. However, our characterization would not detect exons located 3′ to position 1295 in the published cDNA sequence (15). All the intervening sequences that were identified are contained within the coding region of the gene. Exon I is longer in CDK2 than in the characterized CDC2 genes (27,34) and is conserved in X. laevis cdk2 (32). Other differences between the CDK2 and the CDC2 gene structure include two additional introns located at amino acids 105 and 196 of the human CDK2 gene that are not present in the Schizosaccharomyces pombe CDC2 gene. The CDK2 gene structure and sequence information published here may be useful for designing primers to investigate possible CDK2 gene mutations and rearrangements. Although CDK2 has not been implicated in oncogenic transformation, one of its regulatory partners, cyclin A, has been implicated in human hepatocellular carcinoma (35), and its other regulatory partner, cyclin E, has been shown to accelerate G1 progression if overexpressed (36). It is thus plausible that CDK2 mutations might play a role in malignancy and may prove worthwhile targets for exploration of genetic instability in tumors.
In summary, the elements required for basal expression and serum induction of the human CDK2 promoter were localized to a ~2.4-kb fragment. Basal level expression of the CDK2 promoter is fully contained within 290 bp upstream of the most proximal transcription start site (DSC40Δ6-3), and approximately 70% of the activity can be generated by a 200-bp fragment containing only 100 bp upstream of the most proximal transcription start site. Two Sp1 DNA binding sites identified in this region synergistically contribute most of the basal promoter activity of this region. The elements required for serum inducibility lie about 700 bp further upstream and are contained in a ~1.7-kb fragment. Multiple sites with homology to known transcription factor binding sites are located in the promoter region of the human CDK2 gene. Further analysis of these sites and their corresponding transcription factors is necessary for a more complete understanding of the transcriptional regulation of this gene.
FIG. 8. Exon map of the human CDK2 gene. Boxes correspond in length to the exon size. The end of exon VII was not determined. Nucleotides are numbered relative to the most proximal transcription initiation site. Below the map, the exon/intron boundaries are aligned with each other and with the consensus splice acceptor and splice donor sequences. 100% conserved nucleotides are underlined.