Organization of the chick CDC37 gene.

CDC37 and the chaperone protein, Hsp90, form a complex that binds to several kinases, resulting in stabilization and promotion of their activity. CDC37 also binds DNA and glycosaminoglycans in a sequence-specific manner. In this study, we further characterize chick CDC37 and examine the organization of the CDC37 gene. Chick CDC37 is an approximately 50-kDa protein encoded by an mRNA of approximately 1.7 kilobases. The CDC37 gene is approximately 8.5 kilobases and contains 8 exons and 7 introns of various sizes. The presumptive promoter and 5'-flanking regions contain an E2 box and consensus binding sites for SP1, for the S8 homeodomain protein, and for two zinc finger clusters within the myeloid progenitor transcription factor, MZF1. Particularly striking is an approximately 470-base pair region composed of a highly repetitive 10-11-base pair sequence, (T/C)gCTAT(A/G)GGG(A/T) (where g represents the additional G present in the 11-base pair sequence). This region includes 15 copies of the sequence, TATGGGGA, which conforms to the DNA consensus sequence recognized by one of the zinc finger clusters in MZF1. These findings emphasize the potential importance of CDC37 in regulation of cellular behavior during tissue development and reorganization.

Genetic studies have shown that CDC37 is essential for the START event in yeast (1,2), but other studies have shown that CDC37 is probably required in G2/M as well as G1 (3,4). Vertebrate CDC37 has been cloned from several species (5-8), and, recently, it was shown to be identical to p50 (8,9). Initially, p50 was characterized as a component of a ternary complex with the chaperone protein, Hsp90, and p60src kinase (10,11). Based on these and other studies, it was suggested that the p50-Hsp90-p60src kinase complex may mediate trafficking of p60src kinase to the plasma membrane, and that similar events occur with several other kinases (11-14). Further work showed that the Drosophila homologs of CDC37 and Hsp90 are involved in signal transduction in the sevenless pathway (3). It is now known that CDC37 and Hsp90 form a complex whose interaction with various protein kinases, e.g. Cdk4, p60src kinase, casein kinase II, MPS1 kinase, and Raf-1, is required for optimal activity (8, 14-17). The CDC37-Hsp90 complex may stabilize these enzymes or enable their correct folding, although there is some evidence that CDC37 can act independently as a chaperone (16). Thus, CDC37 most likely plays several important roles in intracellular signaling, especially with respect to the cell cycle.
In addition to the chaperone interactions described above, CDC37 binds DNA in a sequence-specific manner at a site that is also recognized by the retinoblastoma gene product, pRB (18). CDC37 and pRB also bind specifically to each other, suggesting that CDC37-pRB-DNA interactions may be involved in cell cycle regulation (19).
We have shown that CDC37 contains two binding motifs for the polysaccharide, hyaluronan, and that CDC37 binds hyaluronan and related glycosaminoglycans: chondroitin sulfate, heparan sulfate, and heparin (5). Although these glycosaminoglycans are usually associated with extracellular matrices, they are also present at the cell surface (20) and within the cytoplasm and nucleus of several cell types (21-23). Although their intracellular functions are not yet established, it is noteworthy that a specific subpopulation of heparan sulfate is targeted to the nucleus of hepatoma cells (21,24) and that heparin inhibits Fos- and Jun-induced transcription events in vitro and in situ within the nucleus (25). The latter events may occur via competition with DNA binding (18,25). Another possibility is that nascent hyaluronan, attached to hyaluronan synthase at the inner face of the plasma membrane (26), interacts with CDC37 at G2/M, since published evidence indicates that hyaluronan synthesis may be required for completion of mitosis (27).
Because of the apparent importance of CDC37 as a regulatory protein, we have further characterized chick CDC37 and analyzed the organization of its gene, the first CDC37 gene for which this has been done.

EXPERIMENTAL PROCEDURES
Northern Blot Analysis-For RNA preparation, limb and brain tissues from chick embryos at various stages of development were dissected, frozen immediately on dry ice, and stored at −80°C. Total RNA was extracted with Trizol reagent (Life Technologies, Inc.) according to the manufacturer's instructions. Poly(A)+ RNA was prepared by oligo(dT)-cellulose chromatography. Samples of the poly(A)+ preparations were subjected to electrophoresis in formaldehyde gels and transferred to nitrocellulose membranes. The probe used for hybridization was labeled with [32P]dCTP using a random priming DNA labeling kit (Boehringer Mannheim). Hybridization and washes were done under standard high stringency conditions (28).
* This work was supported by National Institutes of Health Grants DE05838 and HD23681 (to B. P. T.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™
After reverse transcription, a Marathon cDNA Amplification Kit was used according to the manufacturer's instructions (CLONTECH) for 3′- and 5′-RACE. The primer, RS1, used for 3′-RACE was 5′-CTGCTAATTACCTGGTCATCTGG-3′; this primer corresponds to bases 314-336 in our partial cDNA for CDC37, NG-13 (5), and is underlined in Fig. 1 at residues 675-697. The primer, RAS1, used for 5′-RACE was 5′-GCCAACTCCAGGATGAACTGCAT-3′; this is the reverse complement of bases 403-425 of NG-13 (5) and bases 764-786, underlined in Fig. 1. Unique bands obtained by 3′- and 5′-RACE were purified and ligated into the pCRII vector (Invitrogen) for transformation, cloning, and sequencing. The 3′- and 5′-RACE products were fused by PCR, and the resulting full-length cDNA was cloned into the same vector for sequencing. To obtain as much of the 5′-UTR sequence as possible, the primer PAS1 (Fig. 1) was also used in 5′-RACE. The size of 5′-RACE products was determined by agarose gel electrophoresis, transfer to nitrocellulose, and hybridization with the probe, GS1 (Fig. 1). Another sample was then electrophoresed on agarose for cloning and sequencing of the products.
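The primer relationships described above can be checked with a few lines of Python. This is only an illustrative sketch: the primer sequences and base numbers are taken from the text, whereas the function and variable names are hypothetical and were not part of the original work.

```python
# Sketch: reverse complement and numbering offset for the RACE primers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


RS1 = "CTGCTAATTACCTGGTCATCTGG"   # 3'-RACE primer (sense strand), from the text
RAS1 = "GCCAACTCCAGGATGAACTGCAT"  # 5'-RACE primer (antisense), from the text

# Sense-strand sequence to which RAS1 anneals (bases 403-425 of NG-13).
ras1_target = reverse_complement(RAS1)

# Primer coordinates in NG-13 numbering and in Fig. 1 numbering (from the text).
ng13 = {"RS1": (314, 336), "RAS1": (403, 425)}
fig1 = {"RS1": (675, 697), "RAS1": (764, 786)}

# Both primers should be shifted by the same amount between the two numberings.
offsets = {name: fig1[name][0] - ng13[name][0] for name in ng13}

for name, (start, end) in ng13.items():
    print(name, "length", end - start + 1, "offset", offsets[name])
print("RAS1 anneals to sense sequence:", ras1_target)
```

Both primers are 23 bases long and map with the same offset between the NG-13 and Fig. 1 numbering, which is what would be expected if the full-length cDNA extends NG-13 at its 5′ end without internal insertions.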
Determination of Gene Organization by PCR-Primers GS1 and GAS1 (shaded in Fig. 1), which correspond to the extreme 5′ and 3′ ends, respectively, of the cDNA extended by 5′- and 3′-RACE, were used in PCR with chick genomic DNA (CLONTECH) as template. Elongase (Life Technologies, Inc.) was employed according to the manufacturer's instructions to enable accurate amplification of long PCR products. The amplified genomic DNA product was cloned into the pCR-Script vector (Stratagene). Boundaries between exons and introns were determined by PCR and sequencing, using primers complementary to sequences at various positions along the cDNA.
To determine genomic sequences upstream of the cDNA sequence, inverse PCR (30) was used. Aliquots of chick genomic DNA were digested with BamHI, EcoRV, EcoRI, or HincII restriction enzymes. Each digest was diluted to 2 ng DNA/ml and ligated with DNA ligase (Life Technologies, Inc.). The circularized DNA digests were then used as templates for PCR with two primers (PAS1 and PS1, underlined in Fig. 1) corresponding to sequences in the first exon of the CDC37 gene.
Sequencing-After ligating into the pCRII or pCR-Script vector and transforming bacteria, the nucleotide sequences of selected cloned inserts were determined by the double-stranded DNA/dideoxy chain termination method (31) using a Sequenase 2.0 kit (U. S. Biochemical Corp.).
Antibody to CDC37 Fusion Protein-The partial length cDNA for CDC37, NG-13 (5), was ligated into the pGEX-2T vector (Pharmacia Biotech Inc.) and used to transform competent Escherichia coli DH5α; the transformed cells were grown and induced with 0.2 mM isopropyl-1-thio-β-D-galactopyranoside at 37°C. Fusion protein was purified on an affinity column of glutathione-Sepharose following the manufacturer's instructions (Pharmacia), followed by preparative SDS-PAGE and electro-elution of the protein from the gel. Polyclonal antibody was raised in rabbits (HTI Bioproducts, Inc.), and the antiserum was purified by antigen affinity chromatography.
Western Blot Analysis of Chick Embryo Fibroblasts-Chick embryo fibroblasts (line CL-29 from American Type Culture Collection) were cultured in Dulbecco's modified Eagle's medium plus 10% fetal bovine serum and 1% antibiotics. At confluence, the cells were washed with PBS and lysed with extraction buffer (0.05 M Tris-HCl, pH 7.5, 0.5 M NaCl, 1% Nonidet P-40, 0.1% SDS, 5 μM leupeptin, 5 μM pepstatin, 5 ng/ml aprotinin, 2.5 mM phenylmethylsulfonyl fluoride). Insoluble material was removed by centrifugation, and the lysate was subjected to SDS-PAGE, transferred to polyvinylidene difluoride membrane, and probed with the purified polyclonal antibody or with the monoclonal antibody, IVd4, used for immunoscreening (5).

RESULTS AND DISCUSSION
Cloning of Full-length cDNA for Chick CDC37-In a previous study (5) we obtained a partial cDNA for CDC37, termed NG-13, by immunoscreening of a chick embryo heart cDNA library. By Northern blot analysis, the size of the chick CDC37 mRNA was found to be ~1.7 kb (5). Using 3′- and 5′-RACE with primers RS1 and RAS1 (Fig. 1), followed by PCR-mediated fusion of the two products, we have now obtained a larger chick CDC37 cDNA of ~1.6 kb. To extend the 5′-UTR sequence as far as possible, we also performed 5′-RACE with a primer, PAS1, situated at the beginning of the open reading frame. After electrophoresis through agarose, the products of this reaction were transferred to nitrocellulose membrane and then hybridized with a probe just upstream of the PAS1 primer, i.e. GS1 (Fig. 1), to identify all products. A single band of ~130 bases was obtained. On cloning and sequencing, this band was found to contain several products of similar size but differing in length by a few nucleotides at their 5′ ends; the sequence of the longest product is given as the 5′ terminus of the cDNA in Fig. 1. Based on these approaches, we conclude that this sequence includes virtually all of the 5′-UTR.
The full-length cDNA described above contains an open reading frame of 1179 bases, encoding a 393-amino acid polypeptide with a predicted molecular mass of ~45 kDa (Fig. 1). The first ATG codon conforms to the Kozak consensus for the translation initiation site; in addition, the presence of an in-frame stop codon immediately upstream of this ATG (Fig. 1) is consistent with this conclusion. The 3′-UTR contains the consensus polyadenylation signal, AATAAA.
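The relationship between the open reading frame length and the polypeptide size can be illustrated with a short calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this study, so the result is only a rough estimate and not the composition-based value quoted above.

```python
# Sketch: codon count and a rough mass estimate from the ORF length.
ORF_BASES = 1179        # open reading frame length, from the text
AVG_RESIDUE_DA = 110.0  # ASSUMPTION: rule-of-thumb average residue mass

# A complete reading frame must be a whole number of codons.
assert ORF_BASES % 3 == 0
codons = ORF_BASES // 3

# Crude mass estimate in kDa; a composition-based calculation will differ.
approx_mass_kda = codons * AVG_RESIDUE_DA / 1000

print(f"{codons} codons, roughly {approx_mass_kda:.0f} kDa")
```

The 1179 bases correspond exactly to 393 codons, and the crude estimate falls within a few kilodaltons of the predicted ~45 kDa.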
Analysis of Primary Structure of Chick CDC37-The amino acid sequence of chick CDC37 has 84% identity and 91% similarity to that of human, and 82% identity and 90% similarity to that of mouse (7,8) (Fig. 2). These are the vertebrate homologues of yeast and Drosophila CDC37, previously characterized by genetic methods (2,3).
A novel observation previously made in our laboratory was that the amino acid sequence of chick CDC37 includes two consensus motifs for hyaluronan binding (5). We also demonstrated directly that recombinant chick CDC37 binds hyaluronan and other related glycosaminoglycans (5). Comparison with the human and mouse sequences reveals that these hyaluronan-binding motifs are conserved among vertebrate species (amino acid residues 163-173 and 268-276 in Fig. 2). Of these motifs, the latter is the classic B(X7)B motif (where B is arginine or lysine and X is any non-acidic amino acid); the former is B(X8)BB, which would be expected to have equivalent activity (32).
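The two motif definitions given above translate directly into pattern searches. The following sketch is an illustration only: the function name and the toy peptide are hypothetical, and the chick CDC37 sequence itself (Fig. 2) is not reproduced here.

```python
import re

# X = any of the 20 standard residues except the acidic ones (D, E).
NON_ACIDIC = "ACFGHIKLMNPQRSTVWY"

# B = arginine (R) or lysine (K), as defined in the text.
MOTIFS = {
    "B(X7)B": f"[RK][{NON_ACIDIC}]{{7}}[RK]",
    "B(X8)BB": f"[RK][{NON_ACIDIC}]{{8}}[RK]{{2}}",
}


def find_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched peptide) for every hit.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(f"(?=({pattern}))", protein.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Hypothetical toy peptide, NOT the chick CDC37 sequence.
toy = "EEDKALGSTVLRDDE"
for hit in find_motifs(toy):
    print(hit)
```

A B(X7)B hit spans 9 residues and a B(X8)BB hit spans 11, which agrees with the lengths of the two conserved segments cited above (residues 268-276 and 163-173, respectively).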
Other motifs are also found in human and mouse, as well as in the chick amino acid sequence. At least one putative tyrosine phosphorylation site is present in CDC37 and is highly conserved across species (amino acid residues 173-180 in Fig. 2). This is consistent with the finding that CDC37 is phosphorylated (33,34). The pRB-binding motif (LVCEE) noted previously (19) is also highly conserved (residues 187-191 in Fig. 2).
Expression of CDC37 in Chick-We performed Western blots on chick embryo fibroblast extracts with both the monoclonal antibody, IVd4 (5), and a polyclonal antibody raised against bacterially expressed recombinant CDC37 protein. In both cases, the major protein recognized by the antibody is ~50 kDa in size (Fig. 3), agreeing well with the size of the calculated product of the open reading frame. Expression of the recombinant protein in bacteria or in vertebrate cells transfected with the full-length CDC37 cDNA also yields a protein of ~50 kDa (data not shown).
We used the full-length composite cDNA as a probe in Northern analyses of mRNA obtained from different stages of chick limb and brain development, loading a constant amount of mRNA from each stage. In both tissues, maximum expression of the ~1.7-kb mRNA was reached at about 6 days of development, followed by a gradual decrease until hatching (Fig. 4). As expected, in situ hybridization and immunohistochemistry revealed wide distribution of CDC37 in morphogenetically active tissues throughout the chick embryo (data not shown).
Organization of the CDC37 Gene-Southern analysis has shown that the CDC37 gene is present as a single copy in several vertebrate genomes (6), and we have confirmed this result with the chick. Since no further genomic analysis has been reported, we isolated a CDC37 genomic clone to characterize its organization. We obtained the clone by PCR, using chick genomic DNA with primers corresponding to the 5′ and 3′ ends of chick CDC37 cDNA (Fig. 1). The amplified product, ~8.5 kb in length, was then ligated into a plasmid vector and used for further analyses.
Using this genomic clone in PCRs with numerous primer pairs corresponding to progressively more 3′ regions of the cDNA, we mapped the positions of intron/exon boundaries within the gene. These products were cloned and sequenced. Eight exons and seven introns were found (Fig. 5A); in each case, the sequences at these boundaries complied with the AG...GT consensus sequences for splicing sites (Fig. 5B). The sizes of introns were determined by PCR using primers from exon regions flanking each intron, and were found to range from ~0.1 to ~2 kb (Fig. 5A).
Inverse PCR (30) was used to obtain sequences upstream of the cDNA sequence shown in Fig. 1; chick genomic DNA was used as template with a primer pair corresponding to sequences at the 5′ end of the open reading frame of the chick CDC37 cDNA (Fig. 1). The upstream sequence of ~2 kb obtained in this way is shown in Fig. 6A. The accuracy of this sequence was confirmed with several primer pairs from within this sequence and exon 1, using genomic DNA as a template. The region between nucleotide residues −1115 and −645 upstream of the translation start codon (shaded in Fig. 6A) consists of a highly repetitive 10-11-bp sequence, (T/C)gCTAT(A/G)GGG(A/T) (where g represents the additional G present in the 11-bp sequence). Several other motifs, discussed later, are present in this upstream region. The size of our full-length cDNA is ~1.6 kb (Fig. 1). Taking into account a poly(A) tail of ~100 bp, the size of this cDNA agrees very well with the size of mRNA observed in Northern blots, i.e. ~1.7 kb (Fig. 4). These observations imply that the major transcription initiation site is very close to the 5′ end of the cDNA as shown in Fig. 1. As discussed above, this was confirmed by extending the 5′ end of the cDNA as far as possible by 5′-RACE using the primer, PAS1, at the beginning of the open reading frame. The products were detected in Southern blots with a probe just 5′ of the translation start codon, then cloned and sequenced to obtain the longest 5′ sequence. Thus, we conclude that the 5′ sequence shown in Fig. 1 is very close to the major transcription start site.
We investigated whether there might be additional transcription start sites. For this, we used RT-PCR with sense primers (S2, S3, and S4) corresponding to various sites upstream of the ATG start codon (and a control primer, S1, just 3′ of the start codon) with a common antisense primer within exon 2 (primer AS1) (see Fig. 6, A and B); as a positive control, we also used genomic DNA as a PCR template with the same primers. The position of the antisense primer in exon 2 was chosen so that the expected RT-PCR products from mRNA would differ greatly in size from the products of any contaminating genomic DNA that might be in the mRNA preparation, since the ~1.5-kb intron 1 would then be included. As expected, we obtained a strong band using primer S1 (Fig. 6C, lane 5). However, we also obtained weak bands of expected sizes with primers S2 and S3, which are upstream of the major initiation site (Fig. 6C, lanes 3 and 4). No band was obtained by RT-PCR with primer S4 (Fig. 6C, lane 2). The most likely explanation of these findings is that an additional, minor transcription start site occurs between primers S3 and S4, i.e. at ~250 bp upstream of the major start site (Fig. 6A); however, no additional mRNA corresponding to this has been observed in Northern blots.
FIG. 4. Northern blot of RNA from chick embryo limb and brain. Lanes 1-5, mRNA from chick embryo brains at 4, 5, 6, 9, and 10 days of development, respectively; lanes 6-9, mRNA from chick embryo brains at 4, 5, 6, and 10 days of development, respectively. A band at ~1.7 kb (large arrowhead) was obtained in each case. Arrows indicate the positions of 18 S and 28 S ribosomal RNA. Glyceraldehyde-3-phosphate dehydrogenase (small arrowhead) was used as a measure of loading.
FIG. 5. Organization of exons and introns in the CDC37 gene. A, arrangement and sizes of exons and introns. B, nucleotide sequences immediately adjacent to each exon (exon sequences in uppercase; intron sequences in lowercase); the 5′ ag and 3′ gt consensus sequences for splicing are underlined. The precise sequence at the 5′ end of exon 1 is not known; the 5′ sequence shown is that of the most 5′ primer used in these experiments and corresponds to nucleotides 81-86 in the cDNA sequence (Fig. 1).
The CDC37 Promoter Region-As described above, we obtained a sequence of ~2 kb upstream of the translation start codon. This sequence includes a 5′-UTR of ~100 bp and an additional ~1900 bp upstream of the 5′-UTR (Fig. 6A). No TATA or CAAT boxes occur at appropriate positions, but there is an SP1 site at residues −513 to −501.
An 11-bp DNA consensus sequence for binding the S8 homeodomain protein, a homeobox gene product expressed in specific regions of mesenchyme in the embryo, has been characterized. The DNA sequence to which the S8 homeodomain binds is AN(C/T)(C/T)AATTA(A/G)C, residues 3-9 being of particular importance (35). Two putative S8 homeodomain recognition sites are present in the CDC37 promoter region at residues −1966 to −1956 and −1941 to −1931. These two sequences are TATTAATTAGC and TGCTAATTAGT, respectively; both contain all critical components of the S8 consensus sequence, including the central nucleotides at positions 3-9 and the ATTA motif, which is the core sequence essential for DNA binding to most homeodomain proteins (36).
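The comparison of the two putative sites with the S8 consensus can be made explicit position by position. The sketch below encodes the consensus AN(C/T)(C/T)AATTA(A/G)C as a list of allowed bases; the sequences and coordinates are from the text, while the function and variable names are hypothetical.

```python
# Sketch: position-by-position comparison with the S8 consensus.
# One entry per position: the set of bases allowed at that position.
S8_CONSENSUS = ["A", "ACGT", "CT", "CT", "A", "A", "T", "T", "A", "AG", "C"]
CORE_POSITIONS = range(3, 10)  # positions 3-9 (1-based), emphasized in the text


def mismatches(site: str, consensus=S8_CONSENSUS) -> list[int]:
    """Return the 1-based positions at which `site` departs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        pos
        for pos, (base, allowed) in enumerate(zip(site.upper(), consensus), start=1)
        if base not in allowed
    ]


# Putative sites and their promoter coordinates, from the text.
sites = {"-1966 to -1956": "TATTAATTAGC", "-1941 to -1931": "TGCTAATTAGT"}

for coords, site in sites.items():
    bad = mismatches(site)
    core_intact = not any(pos in CORE_POSITIONS for pos in bad)
    print(coords, site, "mismatches:", bad, "core 3-9 intact:", core_intact)
```

Under this encoding, both sites match the consensus at every one of positions 3-9 and contain ATTA; the only departures are at the flanking positions 1 and 11, outside the region described as critical.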
An E2 box, CACCTG, is present in CDC37 between residues −1244 and −1239. Several basic helix-loop-helix activator proteins bind to E2 box-containing sequences (37). However, E2 box repressors appear to act competitively by binding to sites that overlap with those of E2 activators and include at least part of the E2 box sequence. For example, δEF1, a zinc finger and homeodomain-containing protein thought to be important in developmental gene regulation, is a repressor of E2 box action (38). The binding site consensus sequence for one of the zinc finger clusters in δEF1 includes CACCT and consensus flanking sequences that include those found in CDC37, TCCCACCTGAG (residues −1247 to −1237; flanking sequences in bold). Thus, the CDC37 promoter could conceivably include a site involved in E2 box activation or repression.
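The coordinates quoted for the E2 box and for the longer δEF1-like site can be cross-checked against each other. This is a minimal sketch using only the sequence and residue numbers given above; the variable names are hypothetical.

```python
# Sketch: locate the E2 box within the flanking sequence and convert
# its position to promoter coordinates.
FLANKED_SITE = "TCCCACCTGAG"  # residues -1247 to -1237, from the text
SITE_START = -1247            # coordinate of the first base of FLANKED_SITE
E2_BOX = "CACCTG"
DELTA_EF1_CORE = "CACCT"      # core of the zinc finger consensus, from the text

index = FLANKED_SITE.find(E2_BOX)  # 0-based offset of the E2 box
e2_start = SITE_START + index
e2_end = e2_start + len(E2_BOX) - 1

print(f"E2 box spans {e2_start} to {e2_end}")
print("delta-EF1 core present:", DELTA_EF1_CORE in FLANKED_SITE)
```

The E2 box falls at −1244 to −1239 within the 11-bp site, consistent with the residue numbers given for both elements.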
Of particular interest is the region of the CDC37 promoter between residues −1115 and −645, which has approximately 40 repeats of the 10-11-bp consensus sequence, (T/C)gCTAT(A/G)GGG(A/T) (bold letters indicate the most common alternative nucleotides). Within 15 of these repeats is the motif, TATGGGGA, which closely conforms to one of the DNA consensus sequences, AGTGGGGA (GGGGA being most critical), recognized by the myeloid zinc finger protein, MZF1 (39). MZF1 is a transcription factor that plays an important role in differentiation of myeloid progenitor cells. MZF1 contains two clusters of zinc fingers that bind independently to two DNA consensus sequences with G-rich cores. The N-terminal cluster binds to the GGGGA sequence (39). CDC37 also contains a cis sequence virtually identical to that used for binding the MZF1 C-terminal zinc finger cluster, i.e. GGNGAGGGGGAA (39). This second putative MZF1 motif lies between residues −1676 and −1665 of the CDC37 gene and has the sequence, GGGGGGGGGGAA.
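The repeat consensus and the MZF1 comparisons described above can be expressed as a pattern search and a mismatch count. The sketch below is illustrative only: the toy sequence is hypothetical (two concatenated repeat units, not the actual promoter sequence of Fig. 6A), and the helper names are not from the original work.

```python
import re

# (T/C)gCTAT(A/G)GGG(A/T), with the optional g giving the 11-bp form.
REPEAT = re.compile(r"[TC]G?CTAT[AG]GGG[AT]")
MZF1_LIKE = "TATGGGGA"  # motif found within 15 of the repeats, per the text


def mismatch_positions(site: str, consensus: str) -> list[int]:
    """1-based positions where site differs from consensus (N matches any base)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        pos
        for pos, (s, c) in enumerate(zip(site, consensus), start=1)
        if c != "N" and s != c
    ]


# Hypothetical toy sequence: one 10-bp and one 11-bp repeat unit.
toy = "TCTATGGGGA" + "CGCTATAGGGT"
repeats = REPEAT.findall(toy)
with_mzf1 = [unit for unit in repeats if MZF1_LIKE in unit]
print(len(repeats), "repeats,", len(with_mzf1), "containing", MZF1_LIKE)

# Comparison of the CDC37 motifs with the two MZF1 consensus sequences.
print(mismatch_positions("TATGGGGA", "AGTGGGGA"))
print(mismatch_positions("GGGGGGGGGGAA", "GGNGAGGGGGAA"))
```

Under this simple comparison, TATGGGGA differs from AGTGGGGA only at its first two positions, leaving the GGGGA core intact, and the second motif differs from GGNGAGGGGGAA at a single position.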
Genetic and biochemical studies have shown that CDC37 forms a complex with Hsp90 and that this complex stabilizes several protein kinases that are critical for signal transduction and cell division (7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17), events that are central to embryonic development and tissue remodeling. The S8 homeodomain, E2 box, and MZF1 sequences discussed above are all important in regulating gene expression during tissue and organ development. Thus, the presence of putative binding sites for such transcription factors within the promoter region of the CDC37 gene further supports the idea that CDC37 is important in cellular behavior and in morphogenesis. Future promoter function studies will determine which of these sequences are active in regulation of CDC37 expression.