Domain architecture of a high mobility group A-type bacterial transcriptional factor.

Myxococcus xanthus transcriptional factor CarD participates in carotenogenesis and fruiting body formation. It is the only reported prokaryotic protein having adjacent "AT-hook" DNA-binding and acidic regions characteristic of eukaryotic high mobility group A (HMGA) proteins. The latter are small, unstructured, nonhistone nuclear proteins that function as architectural factors to remodel DNA and chromatin structure and modulate various DNA binding activities. We find CarD to be predominantly dimeric with two stable domains: (a) an N-terminal domain of defined secondary and tertiary structure which is absent in eukaryotic HMGA proteins; (b) a C-terminal domain formed by the acidic and AT-hook segments and lacking defined structure. CarD, like HMGA proteins, binds specifically to the minor-groove of AT-rich DNA present in two appropriately spaced tracts. As in HMGA proteins, casein kinase II can phosphorylate the CarD acidic region, and this dramatically decreases the DNA binding affinity of CarD. The acidic region, in addition to modulating DNA binding, confers structural stability to CarD. We discuss how the structural and functional plasticity arising from domain organization in CarD could be linked to its role as a general transcriptional factor in M. xanthus.

Transcription, replication, recombination, and repair are mediated by the assembly of specific nucleoprotein complexes. One essential consequence of this nucleoprotein complex formation is to confer local flexibility to the inherently stiff DNA molecule (1). Various factors guide the specific assembly of these complexes, including proteins referred to as architectural factors because they remodel DNA by specific or nonspecific DNA binding (2). Given their fundamental importance, architectural proteins are found in phages, prokaryotes, and eukaryotes. Examples in phages and bacteria include the proteins HU, H-NS, IHF, and phage φ29 p6 (see Ref. 3 and references therein). In eukaryotes, in addition to histones, the abundant nonhistone chromosomal proteins of the HMG family constitute an important group of architectural transcription factors that regulate gene expression and are implicated in a variety of cellular functions (4-6).
The subfamily of HMGA proteins (previously HMGI(Y) (7)) is characterized by the presence of multiple repeats of a conserved RGRP sequence (the "AT-hook" motif) embedded in a less conserved cluster of basic and proline residues (8). The most extensively studied, the mammalian HMGA isoforms, are small proteins (≤107 residues; ~12 kDa) with three AT-hooks lying between a highly acidic C-terminal stretch of about 15 residues and a short N-terminal region of less than 25 residues of variable sequence (Ref. 9; Fig. 1A). The AT-hooks bind specifically to the narrow minor groove of AT-rich sequences 4-8 base pairs in length occurring in at least two or three appropriately spaced tracts (10,11). The unstructured AT-hooks adopt a defined conformation on binding DNA (12), and their DNA binding specificity is modulated by the acidic domain (13). Protein conformational changes and decreases in DNA binding affinity are also brought about by phosphorylation by a number of kinases including CKII, Cdc2 kinase, mitogen-activated protein kinase, and protein kinase C (14-18). This provokes fluctuations in intracellular protein stability and in DNA binding, thereby fine-tuning their regulatory functions in vivo (16,18).
The first and, to our knowledge, only prokaryotic protein with multiple AT-hooks and a flanking highly acidic region is the 316-amino acid (34 kDa) product of gene carD in the bacterium Myxococcus xanthus (19). Protein CarD is involved in regulating at least two distinct processes in M. xanthus (20). It is required for the expression of two different sets of genes that form part of a complex network regulating light-induced carotenogenesis, "the light regulon" (reviewed in Ref. 21). In addition, CarD also participates in the starvation-induced formation of fruiting bodies where thousands of cells cluster and subsequently undergo cellular differentiation to myxospores (reviewed in Ref. 22). Mutations in carD prevent fruiting body formation and block the expression of several developmentally activated genes (20). Thus CarD, like mammalian HMGA proteins, is recruited by distinct gene regulatory circuits in vivo.
In contrast to mammalian HMGA proteins, CarD has one more AT-hook, and its significantly longer acidic region is situated to the N rather than to the C terminus of the AT-hook region (Fig. 1B). Moreover, the segment containing the acidic and AT-hook portions of CarD is preceded by a considerably larger N-terminal stretch of around 180 amino acids, whose equivalent is absent in eukaryotic HMGA proteins. This N-terminal segment of unknown function includes in its sequence a stretch of heptad repeats of the type found in leucine zipper coiled-coils (19). Several CKII phosphorylation sites are predicted in the acidic and N-terminal regions of CarD, as are some protein kinase C sites (Fig. 1). Equivalent kinases have not thus far been specifically assigned in M. xanthus, but it does contain several eukaryotic-like serine/threonine protein kinases (23) and at least one functionally linked phosphatase, which have been proposed to act in concert in the developmental cycle of the bacterium (24).
The focus of the present study is to characterize the structural and functional domain organization in CarD. We have mapped out the domain architecture in CarD by means of biochemical and spectroscopic analyses of the pure protein and several specifically designed fragments, and have probed these for oligomerization, DNA binding, and phosphorylation. The properties thus dissected in CarD are discussed using the reported properties of the considerably smaller mammalian HMGA1a as a benchmark. Based on our findings we hypothesize on how the modular organization in CarD affords the structural stability and malleability that may underlie its regulatory roles in M. xanthus.

EXPERIMENTAL PROCEDURES
Cloning of carD and Its Fragments into Expression Vectors-Several constructs were generated for the production of CarD and the following truncated versions: CD-(1-104), CD-(183-316), CD-(225-316), CD-(247-316), CD-(Δ181-223), and CD-(1-215), the numbers referring to amino acid positions in CarD. The expression vector pET11b was used for production of untagged proteins and pET15b for production of His6-tagged ones (25). CarD and the indicated fragments, with the exception of CD-(1-215), were obtained by polymerase chain reaction amplification and cloned into the NdeI-BamHI sites of the vectors. The NdeI site introduces a non-native Met for bacterial expression of fragments CD-(183-316), CD-(225-316), and CD-(247-316). The procedure to obtain a construct expressing CD-(1-215) was as follows. pET11-carD was cut with BglII-BamHI, and the fragment containing carD was digested with HinfI (which cuts at a position corresponding to the codons for residues 215 and 216), filled in with Klenow, and cut with NdeI. This fragment was cloned into pET11b that had first been cut with BamHI, filled in with Klenow, and then cut with NdeI. The resulting ligation creates a TGA stop codon immediately following the codon for CarD residue 215. pET11b-HMGA1a (human) was constructed from plasmid pET15b-HMGA1a (generously provided by Prof. T. Maniatis, Harvard University; Ref. 26).
Overexpression and Purification of CarD and Its Fragments-Escherichia coli BL21(DE3) cells freshly transformed with each construct were cultured overnight in LB-ampicillin medium. After dilution into fresh LB-ampicillin, the cultures were grown at 37°C to an A600 of 0.6-0.8 and induced with 0.4 mM isopropyl-β-D-thiogalactoside for 5 h at 37°C or overnight at 25°C. Protein expression in whole cell extracts was checked by centrifuging 1 ml of induced culture (14,000 rpm in a Microfuge) and lysing the cell pellet by boiling in SDS-loading buffer for analysis by SDS-PAGE (27) and Western blotting using anti-CarD polyclonal and monoclonal antibodies (see below). To check solubility of the expressed protein, the cell pellet (obtained as above) was suspended in buffer A (50 mM Tris, 2 mM EDTA, 5 mM β-mercaptoethanol, pH 7.5) containing 200 mM NaCl, sonicated, and centrifuged, and the supernatant and pellet were separately analyzed by SDS-PAGE.
Proteins were purified in ice-cold conditions. Pelleted cells from 1-liter induced cultures (4-5 g of cell wet weight) were lysed by grinding with alumina (2 g/g of cell pellet) in the presence of 1 mM phenylmethylsulfonyl fluoride and benzamidine. Alumina and cell debris were eliminated by centrifugation of the ground cell paste after suspension in 25-30 ml of buffer A containing 1 M NaCl and 1 mM each of phenylmethylsulfonyl fluoride and benzamidine. The resulting supernatant was mixed with polyethyleneimine to 0.3% final concentration to precipitate out DNA. Expressed proteins were recovered from the polyethyleneimine supernatant by ammonium sulfate precipitation at levels determined in pilot experiments: 50% for CarD, His6-CD-(1-104), and CD-(1-215), and 65% for CD-(183-316). CarD, CD-(183-316), and HMGA1a were purified off phosphocellulose and then by HPLC off a MonoS ion-exchange column (AKTA, Amersham Pharmacia Biotech). CD-(1-215) was purified off DEAE-Sephadex and then by HPLC off a MonoQ column. CD-(1-104) was expressed with the His6 metal-affinity tag and purified from inclusion bodies employing TALON metal affinity resin and the accompanying purification protocol (CLONTECH, Palo Alto, CA). Protein and fragment identities were confirmed by N-terminal amino acid sequencing (Applied Biosystems Procise-494 Protein Sequencer) and matrix-assisted laser desorption ionization mass spectrometry. Concentrations of CarD, CD-(1-104), and CD-(1-215) were determined from the 280-nm absorbance arising from the Trp and Tyr present, with ε280 (M⁻¹ cm⁻¹) = 8,480 for CarD and CD-(1-215), and 6,990 for CD-(1-104) (28). The absorbance at 205 nm (29) was used for CD-(183-316), and that at 220 nm for HMGA1a (ε220 = 74,000 M⁻¹ cm⁻¹; Ref. 30).
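The absorbance-based concentration estimates follow the Beer-Lambert relation A = ε·c·l. The short Python sketch below illustrates the arithmetic with the extinction coefficients quoted above; the absorbance reading, the 1-cm path length, and the function name are hypothetical and serve only as an example.

```python
# Beer-Lambert estimate of protein concentration from the 280-nm absorbance.
# Extinction coefficients (M^-1 cm^-1) are the values quoted in the text.
EPSILON_280 = {
    "CarD": 8480.0,
    "CD-(1-215)": 8480.0,
    "CD-(1-104)": 6990.0,
}


def molar_concentration(absorbance, epsilon, path_cm=1.0):
    """Return the concentration in mol/l from A = epsilon * c * l."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


# Hypothetical reading: A280 = 0.212 for CarD in a 1-cm cuvette (assumed).
carD_uM = molar_concentration(0.212, EPSILON_280["CarD"]) * 1e6
print(f"CarD concentration: {carD_uM:.1f} uM")
```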
Monoclonal and Polyclonal Anti-CarD Antibodies-Anti-CarD rabbit polyclonal and mouse monoclonal antibodies were obtained using standard procedures (31). Immunization was performed with CarD excised off SDS-PAGE gels reversibly stained with Zn-imidazole negative staining (32). Epitope mapping of the polyclonal and monoclonal anti-CarD antibodies was done by enzyme-linked immunosorbent assay and Western blotting (ECL™ kit from Amersham Pharmacia Biotech).
Limited Proteolysis-Subtilisin Carlsberg, proteinase K, papain, chymotrypsin, and trypsin (Sigma) protease stocks were stored at −70°C as aliquots of 5 μg/μl in 50 mM Tris, pH 7.5, 1 mM dithiothreitol. In pilot experiments at 28, 30, and 37°C, protease was added to 240 μl of purified protein (50 μg/μl) in buffer A containing 0.2 M NaCl at 1:500 or 1:1000 (w/w) protein:protease. 40-μl aliquots were removed at 0, 5, 10, 15, 20, 30, 45, 60, and 90 min, the proteolysis was quenched with 1 μl each of 1 M phenylmethylsulfonyl fluoride and benzamidine, and the samples were then analyzed by SDS-PAGE. For fragment identification, aliquots of a 15- or 45-min subtilisin digest at 28°C were run on separate SDS-PAGE gels for Coomassie Blue staining, and for electrotransfer to nitrocellulose for Western analysis using anti-CarD antibodies or to an Immobilon-P SQ membrane (Millipore, Bedford, MA) for N-terminal sequencing. Proteolytic fragments for mass spectrometry were purified by reverse-phase HPLC, or by identifying the band in the SDS-PAGE gel by reversible Zn-imidazole negative staining (32), excising it, and then leaching it off as described by Cohen and Chait (33). Intrinsic subtilisin activities at given salt concentrations were estimated from the hydrolysis of 1 mM TAME by subtilisin at 12.5 μg/ml in 3 ml of total reaction volume (34). Hydrolysis rates were calculated from the initial linear increase in absorbance at 247 nm (ΔA247/min) monitored in a Kontron UVIKON 940 spectrophotometer equipped with a stirrer unit and a constant-temperature circulating water bath.
Analytical Size-exclusion Chromatography-Analytical HPLC size-exclusion data were obtained at room temperature using a Superdex-200 column equilibrated with buffer A with 200 mM NaCl, sufficient to minimize nonspecific interactions with the column matrix. Column calibration was done using vitamin B12 (1.355 kDa), cytochrome c (12.4 kDa), carbonic anhydrase (29 kDa), ovalbumin (43 kDa), bovine serum albumin (66 kDa), yeast alcohol dehydrogenase (150 kDa), and β-amylase (200 kDa) (all from Sigma). 100-μl samples of CarD or each of its fragments at 10-100 μM were injected at 0.4 ml/min, and the elution was tracked by absorbance at 280, 235, and 220 nm. Void (Vo) and total (Vt) bed volumes were determined using blue dextran (2000 kDa; Sigma) and vitamin B12, respectively. Elution volumes, Ve, were assigned for CarD and each fragment in distinct runs by verifying peak identities by Coomassie-stained SDS-PAGE and Western blotting. Stokes radii, RS (in nm), for the standards were obtained from Potschka (35). The following calibration curves were generated from the data for the standards employing SigmaPlot (Jandel Scientific) with correlation coefficients ≥0.99 in each case: $\log M_r = 7.91 - 0.23\,V_e$ and $K_{av}$ = [...]. These were then used to estimate the apparent Mr and RS for CarD and each fragment (36).
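The use of the Mr calibration line can be sketched in Python as follows. This is an illustration only: it assumes Ve is expressed in ml (the units are not stated here), it uses the standard definition Kav = (Ve − Vo)/(Vt − Vo), and the column volumes in the example are hypothetical. The calibration relating Kav to RS is not reproduced because its fitted form is not given.

```python
import math


def k_av(v_e, v_o, v_t):
    """Standard partition coefficient, (Ve - Vo) / (Vt - Vo)."""
    if v_t <= v_o:
        raise ValueError("total volume must exceed void volume")
    return (v_e - v_o) / (v_t - v_o)


def apparent_mr(v_e):
    """Apparent Mr from the calibration line log Mr = 7.91 - 0.23 Ve."""
    return 10 ** (7.91 - 0.23 * v_e)


def elution_volume(mr):
    """Inverse of the calibration line: Ve expected for a given apparent Mr."""
    return (7.91 - math.log10(mr)) / 0.23


# Hypothetical column: Vo = 8.0 ml, Vt = 24.0 ml (assumed values).
ve = elution_volume(129000.0)  # Ve corresponding to an apparent Mr of 129,000
print(f"Ve = {ve:.2f} ml, Kav = {k_av(ve, 8.0, 24.0):.3f}")
```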
Chemical Cross-linking-Chemical cross-linking was examined with glutaraldehyde, DSS (Pierce Chemical Co.), or Ni-GGH (37). 2-5 μM pure protein in 200 mM NaCl, 50 mM phosphate buffer, pH 7.5, was treated with freshly prepared 5 mM glutaraldehyde (in water), 25 mM DSS (in Me2SO), or 2 mM Ni-GGH to final concentrations of 1, 2.5, and 1 mM, respectively, and incubated for 1 h at 30°C. Total reaction volumes were 100 μl. Cross-linking was quenched with SDS-PAGE gel-loading buffer (150 mM Tris final concentration), and the products were analyzed by SDS-PAGE and Western blotting.
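As a worked example of the dilutions implied above, the stock volume needed in each reaction follows from C1·V1 = C2·V2. The Python sketch below assumes the final concentrations refer to the full reaction volume; the function name is illustrative only.

```python
def stock_volume(stock_mM, final_mM, total_volume=100.0):
    """Volume of stock (same units as total_volume) giving final_mM after dilution."""
    if stock_mM <= 0 or final_mM < 0 or final_mM > stock_mM:
        raise ValueError("final concentration must lie between 0 and the stock value")
    return final_mM * total_volume / stock_mM


# (stock mM, final mM) pairs taken from the text; volumes are per 100-unit reaction.
REAGENTS = {
    "glutaraldehyde": (5.0, 1.0),
    "DSS": (25.0, 2.5),
    "Ni-GGH": (2.0, 1.0),
}

for name, (stock, final) in REAGENTS.items():
    print(f"{name}: {stock_volume(stock, final):.0f} of 100 volume units")
```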
Analytical Ultracentrifugation-Sedimentation equilibrium measurements were done in a Beckman Optima XL-A analytical ultracentrifuge using a Ti60 rotor and six-sector Epon charcoal centerpieces with 12-mm optical path length. 70-μl samples in 100 mM phosphate buffer, pH 7.4, containing 200 mM NaCl and 0.1 mM β-mercaptoethanol, were centrifuged to equilibrium at 13,000, 15,000, 18,000, or 25,000 rpm at 20°C. Radial scans were acquired at 2-h intervals by monitoring at wavelengths between 220 and 280 nm, until successive scans were superimposable, indicating equilibrium. 10 μM CarD, 25 μM CD-(1-215), and 200-300 μM CD-(183-316) and HMGA1a were used, and a 50 μM CarD sample in buffer containing 1 M NaCl was also examined. The apparent weight-average molecular masses (Mr) were determined by fitting data (using the program EQASSOC, Beckman) to the equation for an ideal solution containing a single species as described elsewhere (38,39). Partial specific volumes, v (in ml/g), calculated from the amino acid compositions (40), were set to 0.732 for CarD, 0.720 for CD-(183-316), 0.733 for CD-(1-215), and 0.718 for HMGA1a.
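For an ideal single species at sedimentation equilibrium, the concentration profile is c(r) = c(r0) exp[Mr(1 − vρ)ω²(r² − r0²)/(2RT)], so that ln c is linear in r² with a slope proportional to Mr. The Python sketch below illustrates this relation on synthetic, noise-free data. It is not the EQASSOC procedure: the solvent density (1.0 g/ml), the radial range, and the log-linear fit without a baseline term are simplifying assumptions made only for this example.

```python
import math

R_GAS = 8.314e7  # gas constant in erg mol^-1 K^-1 (cgs units)


def angular_velocity(rpm):
    """Rotor speed converted to rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def ideal_profile(r_cm, c_ref, r_ref_cm, mr, vbar, rho, rpm, temp_k):
    """Concentration at radius r for one ideal species at equilibrium."""
    k = mr * (1.0 - vbar * rho) * angular_velocity(rpm) ** 2 / (2.0 * R_GAS * temp_k)
    return c_ref * math.exp(k * (r_cm ** 2 - r_ref_cm ** 2))


def fit_mr(radii_cm, signal, vbar, rho, rpm, temp_k):
    """Least-squares slope of ln(signal) versus r^2, converted to Mr."""
    xs = [r * r for r in radii_cm]
    ys = [math.log(s) for s in signal]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope * 2.0 * R_GAS * temp_k / ((1.0 - vbar * rho) * angular_velocity(rpm) ** 2)


# Synthetic scan: Mr = 56,000, v = 0.732 ml/g, 15,000 rpm, 20 C (293.15 K).
# Solvent density 1.0 g/ml and radii of 6.90-7.10 cm are assumed values.
RADII = [6.90 + 0.01 * i for i in range(21)]
SCAN = [ideal_profile(r, 0.1, 6.90, 56000.0, 0.732, 1.0, 15000.0, 293.15) for r in RADII]
RECOVERED_MR = fit_mr(RADII, SCAN, 0.732, 1.0, 15000.0, 293.15)
print(round(RECOVERED_MR))
```

With real scans, systematic (nonrandom) residuals around such a single-species fit are what signal the presence of more than one species.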
CD and Fluorescence Spectroscopy-CD spectra were recorded in a Jasco-810 spectropolarimeter coupled to a Neslab temperature control unit, using 0.2-nm steps at a scan speed of 20 nm/min and a 4-s time constant and averaged over 5 scans. A Hitachi F-4500 spectrofluorimeter equipped with a stirrer unit and a constant temperature circulating water bath was used for fluorescence spectra. Sample excitation was at 295 or 280 nm for a slit width of 2.5 nm, and the emission spectra, averaged over 2 scans, were recorded between 300 and 400 nm for a slit width of 10 nm at 240 nm/min and a 2-s response time. 10-30 μM protein and a 1-mm path length cuvette were employed for far-UV CD, while for fluorescence these were, respectively, 1-2.5 μM and 1 cm. All spectra were recorded at 25°C in 200 mM NaCl, 100 mM phosphate buffer, pH 7.4. Fluorescence spectra in denatured conditions were obtained in buffer containing 6 M guanidinium hydrochloride (Ultrapure from ICN Biomedicals, OH).
CKII Phosphorylation Assays-CKII phosphorylation of purified CarD or fragments was examined using rat liver (Promega) or recombinant human CKII (New England BioLabs). 0.3-1 μM protein was treated with 0.375 units of CKII and 1 μCi of [γ-32P]ATP or [γ-32P]GTP in CKII buffer (200 mM NaCl, 25 mM Tris, pH 7.4, 10 mM MgCl2, 100 μM ATP) or DNA-binding buffer for at least 30 min at 30°C. Unincorporated [γ-32P]NTP was removed by passing the reaction through a Sephadex G-50 column, and the sample was then examined by SDS-PAGE followed by autoradiography.

RESULTS
Limited Proteolysis of CarD Indicates Relatively Stable N- and C-terminal Domains-Unstructured or partially structured regions of native proteins are more accessible and so more susceptible to protease action than compact structured domains. The latter are revealed on limited proteolysis of purified protein as the relatively resistant fragments visualized in SDS-PAGE gels (41). Domain organization of CarD was assessed by treatment of the protein with broad specificity proteases such as subtilisin and proteinase K. Fig. 2A shows the proteolytic cleavage pattern from limited proteolysis using subtilisin. (A similar pattern was observed with proteinase K.) Three discrete bands (numbered 1-3 in Fig. 2A) and a group of bands (4 and 5 in Fig. 2A) were detectable 15 min after the initiation of proteolysis. Fragments corresponding to the group 4/5 persisted even after 45 min. Bands 1-5 were characterized by N-terminal sequencing, mass spectrometry, and Western blot analysis using anti-CarD polyclonal and monoclonal antibodies of known epitope specificities (see below). The analysis, summarized in Fig. 2B, indicated the following. Four fragments could be identified from bands 4 and 5, which are the ones most resistant to proteolysis: a fragment corresponding to approximately the first 100 N-terminal residues, a fragment beginning at residue 157 and including the entire acidic region, and fragments which contain all or most of the AT-hook region. Bands 2 and 3 are two discrete, intense bands generated in the early steps of proteolysis. The fragments corresponding to these begin around residues 151 (band 2) or 185 (band 3) and span all of the C terminus of CarD. Thus the complete acidic AT-hook segment appears to constitute a relatively stable domain. The segment from residues 100 to 151 is highly susceptible to proteolysis and could constitute a loosely structured part of the protein.
Expression of CarD Fragments Suggests a Protein-stabilizing Role for the Acidic Region through Interactions with the AT-hooks-Structural stability is an important determinant of proteolytic susceptibility and so of intracellular degradation in E. coli (42). Consequently, fragments corresponding to stably folded domains are usually expressed to higher levels than those with little or no defined structure. This provides a means to assess whether a given protein segment constitutes an independently folded domain, and can be used to corroborate limited proteolysis data. We have done so for CarD by checking the expression of the following fragments: (i) the N-terminal region spanning residues 1-104, CD-(1-104); (ii) the acidic and basic AT-hook regions alone, CD-(183-316); (iii) the segment containing all four AT-hooks, CD-(225-316), or just the last three C-terminal ones, CD-(247-316). These fragments were chosen taking into account the results of the limited proteolysis experiments described above. All these fragments were expressed off plasmids constructed as described under "Experimental Procedures," and none has, as its penultimate N-terminal amino acid, one that would confer a short half-life in bacteria (43). Fig. 3 shows the protein expression patterns for total extracts from cells in which expression of CarD (lane 3) or one of its fragments (lanes 4-9) was induced under equivalent conditions of growth and induction times. Like CarD, CD-(1-104) and CD-(183-316) were expressed at levels visually detectable by Coomassie Blue staining (shown boxed in lanes 4 and 7, respectively, in Fig. 3A), but not CD-(225-316) or CD-(247-316) (lanes 5 and 6, respectively). An unambiguous identification of the above overexpressed bands was obtained from Western blots using polyclonal and monoclonal antibodies generated against purified CarD. The results obtained with polyclonal anti-CarD antibodies and with one of the two monoclonal antibodies generated are shown in Fig. 3, B and C, respectively. Polyclonal anti-CarD antibodies detected every one of the above fragments, including CD-(225-316) and CD-(247-316), which were not apparent in Coomassie-stained gels. This was also the case with the anti-CarD monoclonal antibody shown in Fig. 3C, except for the N-terminal fragment, which is not detected by this antibody (epitope specificities are summarized in Fig. 3D). The relative intensities of the bands in Western blots paralleled the expression levels inferred from Fig. 3A: the observed bands for CD-(1-104) and CD-(183-316) in the Western blots were quite intense, but were barely perceptible for CD-(225-316) or CD-(247-316).
CarD has an apparent molecular weight, Mr, of 41,000 in SDS-PAGE (Figs. 2 or 3), higher than the value of 33,900 calculated from sequence or determined by mass spectrometry. Similar anomalous mobilities in SDS-PAGE are observed for mammalian and insect HMGA proteins (13,44), as well as for some other highly charged proteins. CD-(225-316), CD-(247-316), and CD-(183-316), all of which contain the AT-hook region, also run anomalously in SDS-PAGE with apparent Mr 8,000-10,000 higher than their true values of 9,600, 7,300, and 14,300, respectively, but CD-(1-104) (true Mr = 11,700) does not exhibit this anomalous behavior. The non-native Met introduced for bacterial expression of CD-(183-316) is retained, and thrombin-cleaved His6-CD-(1-104) has the expected N-terminal GSH. The molecular masses of 14.25 kDa for CD-(183-316) and 11.7 kDa for CD-(1-104) determined by mass spectrometry match the calculated values, confirming that both were purified as the full-length proteins.
The results for limited proteolysis of the whole protein are thus in accord with the stable expression observed for fragments CD-(1-104) and CD-(183-316), but in apparent contrast to the low expression of CD-(225-316) or CD-(247-316). If the latter two fragments containing only the AT-hooks are devoid of defined structure, as has been reported for human HMGA1a (12), they would be expected to be more susceptible to intracellular proteolytic degradation and so poorly expressed (42), as is actually observed. Consequently, the stable expression of CD-(183-316) implies that the acidic region, when simultaneously present, is sufficient to stabilize the basic AT-hooks, which by themselves are not stable. In the context of limited proteolysis of the whole protein, the coexisting acidic region could help to stabilize, intra- or intermolecularly, the AT-hook regions, thereby accounting for their apparent proteolytic resistance. Any interactions between the basic AT-hook region and the adjacent acidic region are most likely electrostatic in origin. In accord with this, we have observed in SDS-PAGE that bands corresponding to the acidic AT-hook fragments generated by limited subtilisin proteolysis of CarD are weaker in intensity at higher salt (0.9 M NaCl) than at lower salt (0.08 M NaCl), where the intrinsic activity of subtilisin is only ~5% lower (data not shown). Added support for the stabilizing role of the acidic region also comes from our observation that the fragment CD-(Δ181-223), which lacks only the acidic segment of CarD, was poorly expressed (lane 8, Fig. 3, A-C). The construct CD-(1-215), containing most of the acidic region but lacking the entire AT-hook region, was more stably expressed (lane 9, Fig. 3, A-C). Thus the observed expression for CD-(Δ181-223) and CD-(1-215) suggests that the presence of the acidic region is required for CarD stability, whereas the protein can be stably expressed in the absence of the AT-hook segment.
We purified CD-(1-215) and verified that its N-terminal sequence matched that in CarD, and that its molecular mass from mass spectrometry corresponded to that calculated from the sequence (23.39 kDa).
Tests for CarD Oligomerization-Distinct segments of CarD may interact with one another as discussed above, and sequence analysis of CarD suggested leucine zipper-type heptad repeats between residues 120 and 141 (19). Moreover, the considerably smaller mammalian HMGA1a has been reported to oligomerize in solution, and to interact with other proteins (6,46,47). Consequently, we tested CarD for oligomerization.
The oligomeric nature of proteins, their shapes, and their sizes can be assessed by analytical gel-filtration HPLC (36). Fig. 4A summarizes the results of analyzing CarD, CD-(1-104), CD-(183-316), and CD-(1-215) as well as human HMGA1a in a Superdex-200 HPLC gel filtration column equilibrated with buffer at 0.2 M NaCl. Each of these proteins eluted as a single symmetrical peak of sharpness comparable to the standards, and no additional peaks were detected. This indicates either a homogeneously populated conformation or different conformations exchanging rapidly relative to their mobilities in the column (48). The apparent Mr values estimated from these data are indicated in Fig. 4A. CD-(1-104) appears to be a globular monomer with apparent Mr close to the expected monomer value. On the other hand, CarD, CD-(183-316), CD-(1-215), and HMGA1a elute with apparent Mr considerably higher than their expected monomer values. (The highly charged CarD, CD-(183-316), and HMGA1a, when examined in 1 M NaCl, exhibited essentially the same elution behavior as at 0.2 M NaCl; data not shown.) The apparent molecular mass (in kDa) of 47 for CD-(1-215) is that expected for a dimer, whereas values of 129, 70, and 37 for CarD, CD-(183-316), and HMGA1a, respectively, suggest even higher order oligomers. Alternatively, the slower mobilities could reflect extended molecular shapes (49). To distinguish between these possibilities, CarD and its fragments were further examined by chemical cross-linking and analytical ultracentrifugation.
Chemical cross-linking of CarD and its fragments was examined using two reagents that cross-link primary amino groups, glutaraldehyde and DSS. Oxidative Ni-GGH cross-linking which has been proposed to involve aromatic amino acids (37) was also investigated. With all three cross-linking agents, dimers and to a lower extent higher order oligomers were observed for CarD and CD-(1-215). For CD-(183-316), as for HMGA1a, dimers were observed with glutaraldehyde and DSS but not with Ni-GGH possibly because both polypeptides lack aromatic amino acids. Defined cross-linked products could not be observed with CD-(1-104), consistent with its monomeric nature as suggested by analytical gel filtration. Fig. 4B shows representative cross-linking data obtained for the fragments CD-(1-215) and CD-(183-316). In all cases, the noncross-linked form appeared as an intense band even at the highest protein or cross-linker concentrations that we employed. Therefore, CarD and all its fragments except CD-(1-104) are present as an equilibrium distribution of monomers, dimers, and possibly higher order oligomers according to cross-linking data.
As a final diagnostic of oligomerization, we performed sedimentation equilibrium experiments of CarD, CD-(1-215), CD-(183-316), and HMGA1a by analytical ultracentrifugation (38,39). For each of these, the observed sedimentation equilibrium gradients were fit to the equation that describes an ideal single-component situation to obtain the apparent weight-average molecular mass, Mr. Fig. 4C shows such an analysis for CarD in 200 mM NaCl buffer, which yields a best-fit Mr = 56,000 ± 3,000 at 15,000 rpm, with small although not randomly scattered residuals around the fit. The latter is indicative of the presence of several species (38,39). For comparison, Fig. 4C also shows deviations from experimental data when the ideal single-component fits were performed with Mr fixed to the calculated monomer or dimer value. Similar results were obtained with CarD at 13,000 rpm (best-fit Mr = 63,000 ± 3,000), and also for a 5-fold higher protein concentration in 1 M NaCl buffer (best-fit Mr = 54,000 ± 2,000). The best-fit Mr values for CarD are close to the dimer value, thus suggesting that the protein exists as a largely dimeric form in equilibrium with monomeric and possibly some higher order forms. The same conclusion appears to hold for CD-(1-215): Mr of 48,000 ± 1,000 at 13,000 rpm or 45,000 ± 2,000 at 15,000 rpm, close to the calculated dimer value, but with small and nonrandomly scattered residuals around the fit. By contrast, Mr was 19,000 ± 2,000 for CD-(183-316) and 14,000 ± 1,000 for HMGA1a at 25,000 rpm, and the residuals around the fit were small and randomly scattered even at the highest protein concentrations examined (200-300 μM), consistent with the predominant species being the monomer.
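One simple way to read these numbers is as the ratio of the best-fit Mr to the monomer mass: a value near 1 points to monomers, and a value approaching 2 to a largely dimeric population. The Python sketch below tabulates this ratio using the monomer masses quoted elsewhere in the text (33,900 for CarD, 23,390 for CD-(1-215), and 14,250 for CD-(183-316)). The ratio is an illustrative summary and not part of the fitting procedure; HMGA1a is omitted because its exact monomer mass is not stated here.

```python
# Monomer masses (Da) as quoted in the text.
MONOMER_MASS = {
    "CarD": 33900.0,
    "CD-(1-215)": 23390.0,
    "CD-(183-316)": 14250.0,
}

# Best-fit weight-average Mr from sedimentation equilibrium, keyed by rotor speed (rpm).
BEST_FIT_MR = {
    "CarD": {13000: 63000.0, 15000: 56000.0},
    "CD-(1-215)": {13000: 48000.0, 15000: 45000.0},
    "CD-(183-316)": {25000: 19000.0},
}


def mass_ratio(protein, rpm):
    """Best-fit Mr divided by the monomer mass for one protein and rotor speed."""
    return BEST_FIT_MR[protein][rpm] / MONOMER_MASS[protein]


for protein, fits in BEST_FIT_MR.items():
    for rpm in fits:
        print(f"{protein} at {rpm} rpm: {mass_ratio(protein, rpm):.2f} x monomer")
```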
Mr and the Stokes radii, RS (in nm), determined from gel filtration data for each protein may be used to estimate its approximate frictional ratio (f/f0), which is an indicator of particle shape: $f/f_0 = R_S / [3 v M_r/(4\pi N)]^{1/3}$, where v is the partial specific volume of the protein (listed under "Experimental Procedures") and N is Avogadro's number (50,51). Gel filtration data for CarD, CD-(183-316), CD-(1-215), and HMGA1a provided RS (in nm) of 3.63, 3.15, 2.89, and 2.72, respectively. Estimates for f/f0 obtained were ~1.4 for CarD and ~1.2 for CD-(1-215), both within or close to the range observed for the standards cytochrome c, bovine serum albumin, and yeast alcohol dehydrogenase: f/f0 = 1.09, 1.30, and 1.28, respectively (50,51). By contrast, CD-(183-316) and HMGA1a, with f/f0 of 1.8 and 1.7, respectively, appear to deviate significantly from a spherical shape and are probably elongated molecules, in accord with their slower gel filtration mobilities.
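The frictional ratio compares the Stokes radius with the radius of an anhydrous sphere of the same mass and partial specific volume, f/f0 = RS/[3vMr/(4πN)]^(1/3). A minimal Python sketch of the calculation follows. It assumes that the Mr entered into the formula is the sedimentation equilibrium estimate (the text does not state which Mr was used); with that assumption the quoted values of about 1.4, 1.2, 1.8, and 1.7 are reproduced.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def frictional_ratio(rs_nm, mr, vbar_ml_per_g):
    """f/f0 = RS / R0, with R0 the radius of the equivalent anhydrous sphere."""
    volume_cm3 = vbar_ml_per_g * mr / AVOGADRO          # volume per molecule
    r0_cm = (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return (rs_nm * 1e-7) / r0_cm                       # 1 nm = 1e-7 cm


# (RS in nm, Mr, v in ml/g); the Mr values are an assumption (see lead-in).
PROTEINS = {
    "CarD": (3.63, 56000.0, 0.732),
    "CD-(1-215)": (2.89, 48000.0, 0.733),
    "CD-(183-316)": (3.15, 19000.0, 0.720),
    "HMGA1a": (2.72, 14000.0, 0.718),
}

for name, (rs, mr, vbar) in PROTEINS.items():
    print(f"{name}: f/f0 = {frictional_ratio(rs, mr, vbar):.2f}")
```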
In summary, gel filtration, chemical cross-linking, and analytical ultracentrifugation data together suggest that CarD and its fragment lacking the AT-hook, CD-(1-215), are predominantly dimers, its N-terminal fragment CD-(1-104) forms a compact monomeric domain, and the C-terminal acidic AT-hook segment, CD-(183-316), is largely monomeric and elongated, as is HMGA1a. The compact nature of the CD-(1-215) dimer suggests that the acidic region in this molecule (and possibly in CarD) may not be in an extended conformation, but that it may be so in CD-(183-316) or HMGA1a. The above data also hint that residues within the 104-215 segment may be involved in dimerization.

CarD Is Structurally Well Defined at Its N Terminus but Random at Its C Terminus-The presence of protein secondary and tertiary structure is readily assessed by CD and intrinsic fluorescence of the purified protein (52). In far-UV CD spectra, α-helices are characterized by two minima at 222 and 208 nm and a maximum at 192 nm, β-sheets by a weaker and broader minimum around 215 nm and a maximum at 198 nm, and random coils by an intense minimum at 198 nm (53). The far-UV CD spectra of CarD and its N-terminal fragments CD-(1-215) and CD-(1-104) exhibit the characteristic minima for α-helical and β-sheet conformations. CD-(183-316), however, has a far-UV CD spectrum expected for random coils, as does HMGA1a (Fig. 5). Thus, defined α-helical and/or β-sheet secondary structural elements in CarD are confined to its N-terminal region, and the C terminus is randomly structured. These observations are in qualitative accord with the secondary structure predictions using the PHD algorithm (54). The far-UV CD spectrum of CD-(1-215) resembles that of whole CarD, and, in terms of their −[Θ] values at 222 nm, the two have similar helix contents. −[Θ]222 is smaller for CD-(1-104). The region between residues 104 and 215 may therefore be intrinsically more helical, or it may be that oligomerization in CarD and CD-(1-215) drives additional folding in these molecules relative to CD-(1-104) (55).
A Trp (Trp 92 ) and one or two Tyr (Tyr 18 (Fig. 5B). In all three, the maximum is red-shifted to 354 nm in denaturing 6 M guanidinium hydrochloride solutions with about a 20% loss in intensity for CarD and CD-(1-215), and a slight increase in intensity for CD-(1-104). Therefore the environment of Trp in these proteins in the native state differs from that in the denatured, solvent-exposed forms and points to the existence of defined tertiary structure (52). As inferred from their similar intrinsic Trp fluorescence behavior, the tertiary fold is probably similar in native CarD and CD-(1-215) but varies somewhat in CD-(1-104).
CarD and Its C-terminal Fragment Share the DNA-binding Specificity of Human HMGA1a-HMGA1a binds specifically to two appropriately spaced AT-rich tracts present in the −77 to −37 region just upstream of the interferon-β promoter, the interferon response element or IRE (26). EMSA analysis shows that CarD and its fragment CD-(183-316) containing the acidic AT-hook segment bind specifically to this 40-base pair IRE fragment, and that the binding is competed away by poly(dA-dT) or poly(dI-dC) but not poly(dG-dC) (Fig. 6A). This is identical to the behavior reported for human HMGA1a (10), and also shown in Fig. 6A. The minor grooves of G-C, A-T, and I-C base pairs differ solely in the 2-amino group present in G, which is replaced by a hydrogen in A or I. As a consequence, I-C resembles G-C in the major groove and A-T in the minor groove. The ability of poly(dI-dC) to compete as well as poly(dA-dT) and far more effectively than poly(dG-dC) has therefore been used as evidence for the binding of HMGA1a to the minor groove of AT-rich DNA (10). Thus CarD and CD-(183-316), like HMGA1a, exhibit minor groove DNA binding specificity. For the same solution conditions HMGA1a binds to IRE with a KD ≈ 40 nM (56). As judged by the concentrations of CarD and CD-(183-316) required for EMSA analysis, the specific DNA binding affinity for the acidic AT-hook fragment of CarD is slightly lower than for HMGA1a, whereas it may be as much as an order of magnitude weaker for CarD (Fig. 6A).
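To give a feel for what an order-of-magnitude difference in affinity means at the protein concentrations used in EMSA, the following minimal sketch evaluates a simple 1:1 binding isotherm. It rests on assumptions that are not part of the experiments above: a single independent site, protein in large excess over the DNA probe, and no cooperativity. The 40 nM value is the HMGA1a KD quoted above; the 400 nM value is purely hypothetical, chosen only to illustrate a 10-fold weaker binder, and is not a measured KD for CarD. The function name fraction_bound is likewise introduced here for illustration.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of probe bound for simple 1:1 binding, protein in excess.

    Assumes a single independent site and no cooperativity (illustrative only).
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein concentration must be >= 0 and KD must be > 0")
    return protein_nM / (protein_nM + kd_nM)


KD_HMGA1A_NM = 40.0          # KD for HMGA1a binding to IRE, as quoted in the text
KD_TENFOLD_WEAKER_NM = 400.0  # hypothetical value: "an order of magnitude weaker"

if __name__ == "__main__":
    print("protein (nM)  bound (KD=40 nM)  bound (KD=400 nM)")
    for conc in (10.0, 40.0, 100.0, 400.0, 1000.0):
        tight = fraction_bound(conc, KD_HMGA1A_NM)
        weak = fraction_bound(conc, KD_TENFOLD_WEAKER_NM)
        print(f"{conc:12.0f}  {tight:16.2f}  {weak:17.2f}")
```

Under this simplified model, half-maximal binding occurs when the protein concentration equals KD, so a 10-fold weaker binder needs roughly 10-fold more protein to produce a comparable band shift, which is the kind of comparison made qualitatively from the EMSA titrations.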
CarD is essential for the expression of the light-inducible carQRS operon, a key gene cluster in M. xanthus carotenogenesis (19). Two 5′-GGAAA-3′ repeats 5 base pairs apart in the −88 to −55 promoter upstream region of this operon have been suggested to constitute a CarD-binding site (19). A double-stranded oligonucleotide probe containing this site (QRS) is bound by CarD, CD-(183-316), and HMGA1a, and the binding was competed away by poly(dA-dT) or poly(dI-dC) but not by poly(dG-dC) (data not shown), as was observed with probe IRE. This reiterates the minor-groove binding specificity of CarD. In EMSA analysis, specific binding to probe QRS required 3-5-fold higher protein concentrations relative to those used with probe IRE. This indicates that the binding affinity for probe QRS is less than for probe IRE. Consistent with this, binding to probe QRS is competed away far more effectively by IRE than by QRS in competition binding assays (Fig. 6B). It has been reported that optimal site-specific binding of AT-hooks requires a minimum of two appropriately spaced tracts of AT-rich sequences at least 4 base pairs in length (10,11). The lower affinity for probe QRS relative to probe IRE may therefore be related to its two AT-rich stretches being only 3 base pairs long. Thus the proposed CarD-binding site in M. xanthus is not optimized for maximal binding affinity, and the possible consequences of this will be examined under "Discussion."
CKII Phosphorylates CarD Leading to Decreased DNA-binding Affinity-Mammalian HMGA proteins are phosphorylated by CKII kinase (14-18). Protein sequence analysis predicts the presence of such sites in the acidic and N-terminal regions of CarD (Fig. 1). We tested this by examining CKII phosphorylation of CarD in vitro. Fig. 7A shows that CarD, CD-(183-316), and CD-(1-215) but not CD-(1-104) are phosphorylated in vitro by CKII. The CKII-phosphorylation sites thus map to the acidic region of CarD as in the case of HMGA proteins.
Typical substrates for CKII are short, unstructured peptides (~10 residues long) containing the required phosphorylation sites. CKII phosphorylation of the acidic region is therefore consistent with its relatively open structure, whereas in the more compactly structured CD-(1-104) the putative CKII sites appear to be inaccessible.
Phosphorylation of HMGA proteins causes marked decreases in their DNA-binding affinities (16-18). Fig. 7B shows that CKII phosphorylation of CarD or CD-(183-316) also dramatically decreases DNA binding affinity. HMGA-DNA binding is highly dependent on ionic conditions (16), which implies large coulombic contributions to DNA binding (55). CKII phosphorylation of the acidic region would increase its negative charge. This, it may be argued, would boost coulombic repulsions from the DNA backbone and thereby diminish DNA binding affinity. The simultaneous enhancement expected for any favorable electrostatic interactions between the acidic region and the basic AT-hooks would also lead to the latter being further sequestered from DNA binding.
DISCUSSION
Our analysis reveals that CarD consists of two relatively stable domains: (i) a C-terminal HMGA-like region that consists of all of the acidic and AT-hook regions and spans residues 183 to 316 at the C-terminal end of the molecule; (ii) an N-terminal domain of about 100 residues that is absent in eukaryotic HMGA proteins. The two domains are linked by a segment whose stretch between residues 105 and 155 is quite susceptible to proteolysis and is thus likely to be a flexible linker region.
The CarD HMGA-like Domain-CD-(183-316) corresponds to the HMGA-like domain in CarD. It shares all the attributes of its eukaryotic counterparts. It consists of adjacent highly acidic and basic regions although juxtaposed differently. CD-(183-316) lacks defined structure as does HMGA1a (12). Both appear to be largely monomeric, although there may be intermolecular interactions that occur in a highly transient fashion and explain the experimentally observed chemical cross-linking. CD-(183-316) is the seat of DNA binding in CarD and is akin to HMGA1a in its minor-groove binding specificity. Consistent with large electrostatic contributions to DNA binding, the affinity is lowered dramatically by CKII-phosphorylation that is localized to the acidic portion of CD-(183-316) (and CarD) as in HMGA proteins.
The lack of intrinsic structure in CD-(183-316) or HMGA1a is not unexpected since both are composed mostly of Pro and highly charged residues and have few hydrophobic residues. These characteristics are unfavorable for the formation of defined structural elements like α-helices and β-sheets or a stable compact core (57-59). Absence of a defined structure usually correlates with low intracellular stability because of greater susceptibility to intracellular proteases (42). Sequences such as PEST, if present, also predispose a protein to intracellular proteolysis (60,61), but these are absent in CarD or HMGA proteins. Our results show that the presence of the acidic region is required for stable expression of whole CarD or its basic AT-hook region. This leads us to propose that an important role for the acidic region in the protein architecture is that of stabilizing the randomly structured AT-hooks. The resultant acidic AT-hook interactions would necessarily affect DNA binding by the AT-hook both by sequestering the latter from DNA, as well as by charge repulsions between the acidic region and the DNA backbone as reasoned earlier. These inferences on the roles of the acidic region in protein stability and DNA binding very likely carry over to the members of the eukaryotic HMGA protein family, all of which have the acidic region (9,13).
An intrinsic lack of structure has been argued to confer on a protein the inherent flexibility and structural plasticity required in fine-tuning its regulatory or signaling functions (61,62). Indeed, the ability to tweak both protein stability and conformation of the intrinsically unstructured HMGA proteins by covalent modifications such as phosphorylation or acetylation underlies their participation in diverse biological processes from transcription to recombination (18,62). The C-terminal HMGA-like part of CarD would be similarly amenable to such conformational, and hence functional, alterations. Moreover, like HMGA proteins, CarD is also a multifunctional regulator, being involved in the distinct processes of carotenogenesis and multicellular development in M. xanthus (20). Work currently in progress in this laboratory has also underscored the involvement of CarD in processes other than carotenogenesis and fruiting body development. 2 An array of signaling processes is known to be involved in M. xanthus development, including a number of eukaryotic-like serine/threonine protein kinases and functionally linked phosphatases. However, the identification of specific phosphorylatable substrates has remained elusive (23,24). CarD would therefore be an attractive candidate given its involvement in M. xanthus development, as are HMGA proteins in the eukaryotic cell cycle and development, and based on the analogies between the two proteins that we have enumerated.
According to hydrodynamic experiments, CarD and its fragment lacking the AT-hooks exist largely as dimers in equilibrium with monomers, and these dimers can be chemically cross-linked with a variety of agents. Hydrodynamic data suggest that CD-(183-316) and HMGA1a are elongated monomeric species, but they seem capable of being cross-linked by bifunctional amino-reactive agents. Although this may be a cross-linking artifact caused by the presence of a large number of lysines, it may also reflect transient association. The latter explanation would be compatible with our earlier inference of intra- or intermolecular interactions of the acidic region with the AT-hooks that make them resistant to proteolysis. Although we have been unable to provide additional experimental evidence for this, protein-protein interactions between different HMGA molecules or of HMGA with other protein molecules have been invoked to explain the highly cooperative enhanceosome assembly (13,26,46,47,56,62). The inherent difficulty in pinpointing the mechanism of HMGA cooperativity has been noted before and attributed to, among other factors, its lack of defined structure (56). The latter imposes technical challenges which would also apply to CD-(183-316), and to this segment in CarD.
The CarD N-terminal Domain-CD-(1-215), which lacks the AT-hooks, appears to be both compact and dimeric. By contrast, CD-(1-104) appears to be a compact monomer. This would suggest that dimerization in CarD is mediated by a stretch or stretches between the N-terminal and acidic domains, and this would include the heptad leucine zipper-type repeats present between residues 120 and 141. This region is quite susceptible to proteolysis, is predicted to have low coiled-coil forming probability by available programs, and contains two Pro residues that are not common to leucine zippers. As a consequence, attempts to obtain purified CarD fragments containing this region and lacking segments N- or C-terminal to it in the CarD sequence have not been successful thus far. Hence the actual dimerization segment remains to be specified.
The CarD N-terminal region, as in fragments CD-(1-104) and CD-(1-215), constitutes a stable, compact domain. It appears to be the part that contains most of the secondary and tertiary structural elements of the protein based on our spectroscopic data. A specific function has yet to be attributed to this domain, which is unique to CarD and absent in its eukaryotic HMGA counterparts. Sequence analysis indicates that the region corresponding to CD-(1-104) shares significant homology with an ~200-residue segment of a family of bacterial proteins referred to as transcription repair coupling factors or TRCFs, where they constitute the RNA polymerase interacting module (63,64). TRCFs stimulate the repair of lesions in the transcribed strand by interacting with RNA polymerase (63,65). The sequences of E. coli and Bacillus subtilis TRCFs involved in RNA polymerase-binding share 32% identity, and this appears to be sufficient for interaction with heterologous RNA polymerases (66). The CarD N-terminal domain is, respectively, 25 and 26% identical in sequence to the E. coli and B. subtilis TRCF RNA polymerase-binding modules, and could conceivably be involved in interactions with RNA polymerase.
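Percent identity figures such as those quoted above depend on the alignment used and on which positions are counted. The following minimal sketch shows one common convention: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name percent_identity and the short toy sequences are hypothetical and are not CarD or TRCF sequences; the convention and alignment program actually used for the values above are not specified here, so this is an illustration rather than a reproduction of that analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    Other conventions divide by alignment length or by the shorter sequence;
    this denominator is an assumption made for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences): 6 gap-free columns, 4 identical.
    toy_a = "MKT-LLV"
    toy_b = "MRTALIV"
    print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the denominator differs between conventions, identity values in the 25-32% range are best compared only when computed the same way over the same aligned region.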
TRCFs appear to bind to both the holo and apo forms of RNA polymerase, indicating that the σ subunit does not interfere with the binding (63). In the case of the CarD-dependent activation of the carQRS operon in M. xanthus, genetic analyses have revealed that the CarQ gene product is also essential (67,68). The amino acid sequence of CarQ revealed that it may be a member of the extracytoplasmic function subfamily of RNA polymerase σ-factors (69). Whether CarD interacts with CarQ or with the RNA polymerase needs to be experimentally determined and is beyond the scope of the present study. Nevertheless, drawing from parallels with HMGA proteins, where specificity and affinity are both enhanced by the highly cooperative assembly of transcriptional complexes (62), it is tempting to speculate that CarD may interact with any of CarQ, RNA polymerase, or other yet-to-be-discovered factors in assembling an enhanceosome-like complex in the vicinity of the carQRS promoter region. Our experimental data demonstrate that CarD does exhibit HMGA-like minor-groove binding to a specific AT-rich sequence upstream of the −35 region of the carQRS promoter, albeit with a lower binding affinity relative to CD-(183-316) or HMGA, or to the HMGA IRE-binding site in eukaryotic DNA. Both the specificity and affinity of CarD could be enhanced by interactions with additional factors, and by the fact that the search for its specific AT-rich binding sites would be facilitated by the highly GC-rich nature (67.5% GC) of M. xanthus DNA (70). Moreover, interactions of CarD with itself and with other proteins could also serve to maintain and modulate the intracellular levels of this protein, which has regions with considerable lack of defined structure (61).
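The argument that a GC-rich genome narrows the search for AT-rich sites can be made roughly quantitative. The sketch below does two things under loudly simplified assumptions: it estimates how often a window of consecutive A/T bases would arise by chance if bases were independent and drawn at the genomic composition (real genomes are not random in this sense), and it scans a sequence for uninterrupted A/T runs of a minimum length. The function names, and the use of N as a placeholder for the five spacer bases whose identity is not given in the text, are introduced here for illustration only.

```python
from typing import List, Tuple


def prob_all_at(window: int, gc_fraction: float) -> float:
    """Chance that `window` consecutive bases are all A or T.

    Assumes independent bases at a fixed composition (a crude null model).
    """
    if window < 1 or not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("window must be >= 1 and gc_fraction within [0, 1]")
    return (1.0 - gc_fraction) ** window


def at_tracts(seq: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (0-based start, tract) for each maximal A/T run >= min_len."""
    tracts: List[Tuple[int, str]] = []
    upper = seq.upper()
    run_start = None
    # A trailing non-A/T sentinel closes any run that reaches the end.
    for i, base in enumerate(upper + "N"):
        if base in "AT":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                tracts.append((run_start, upper[run_start:i]))
            run_start = None
    return tracts


if __name__ == "__main__":
    # 67.5% GC is the M. xanthus value quoted in the text; 50% is a reference.
    for gc in (0.675, 0.5):
        print(f"GC={gc:.3f}: P(4-bp A/T window) = {prob_all_at(4, gc):.4f}")

    # Two GGAAA repeats 5 bp apart; N marks the unspecified spacer bases
    # (placeholder, treated here as non-A/T).
    qrs_core = "GGAAANNNNNGGAAA"
    print("tracts >= 4 bp:", at_tracts(qrs_core, 4))
    print("tracts >= 3 bp:", at_tracts(qrs_core, 3))
```

In this null model a 4-base pair A/T window is several-fold rarer at 67.5% GC than at 50% GC, which is in line with the suggestion that AT-rich target sites stand out against the M. xanthus genomic background. The scan also reflects the earlier point that the GGAAA repeats contribute A/T runs of only 3 base pairs, below the 4-base pair length reported for optimal AT-hook binding.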
Based on the structural and functional information generated in this study, we are currently examining the existence of other DNA-binding sites and protein factors that interact with CarD, using a battery of techniques including two-hybrid analysis, co-immunoprecipitation, and the effects in vivo of specifically truncated fragments.