The amino-terminal domain of human STAT4. Overproduction, purification, and biophysical characterization.

The multifunctional signal transducer and activator of transcription (STAT) proteins relay signals from the cell membrane to the nucleus in response to cytokines and growth factors. STAT4 becomes activated when cells are treated with interleukin-12, a key cytokine regulator of cell-mediated immunity. Upon activation, dimers of STAT4 bind cooperatively to tandem interferon-gamma activation sequences (GAS elements) near the interferon-gamma gene and stimulate its transcription. The amino-terminal domain of STAT4 (STAT4(1-124)) is required for cooperative binding interactions between STAT4 dimers and activation of interferon-gamma transcription in response to interleukin-12. We have overproduced this domain of human STAT4 (hSTAT4(1-124)) in Escherichia coli and purified it to homogeneity for structural studies. The circular dichroism spectrum of hSTAT4(1-124) indicates that it has a well ordered conformation in solution. The translational diffusion constant of hSTAT4(1-124) was determined by nuclear magnetic resonance methods and found to be consistent with that of a dimer. The rotational correlation time (τc) of hSTAT4(1-124) was estimated from 15N relaxation to be 16 ns; this value is consistent with a 29-kDa dimeric protein. These results, together with the number of signals observed in the two-dimensional 1H-15N heteronuclear single quantum coherence spectrum of uniformly 15N-labeled protein, indicate that hSTAT4(1-124) forms a stable, symmetric homodimer in solution. Cooperativity in native STAT4 probably results from a similar or identical interaction between the amino-terminal domains of adjacent dimers bound to DNA.

Cytokine binding induces receptor dimerization, which brings cytoplasmic, receptor-associated JAKs into apposition and enables them to self-activate by reciprocal transphosphorylation. The activated kinases phosphorylate a distal tyrosine on the cytoplasmic tail of the receptor, which can then be recognized by the SH2 domain of a specific STAT protein. Upon association with the receptor, a tyrosine residue near the carboxyl terminus of the STAT protein is phosphorylated by the JAK. Now activated, the STAT protein can form homo- or heterodimers in which the phosphotyrosine of one partner binds to the SH2 domain of the other. The STAT dimers then migrate to the nucleus, where they participate in transcriptional activation by binding to specific DNA sequences, termed interferon-γ activation sequence (GAS) elements.
Thus far, only seven different STATs and four JAK family members have been identified (2), which raises the question of how transcriptional specificity is achieved in cytokine signal transduction: that is, how can a relatively small number of JAKs and STATs elicit distinct responses to a much larger number of cytokines and growth factors, particularly when all but one of the STATs appear to bind preferentially to the same DNA sequence (2). Recently, Xu et al. (5) characterized authentic binding sites for STATs 1, 4, 5, and 6 within the first intron of the interferon-γ (IFN-γ) gene by DNase I footprinting. Remarkably, these experiments revealed that rather than binding to the same sites, as might have been expected, STATs 1, 4, and 5 each bound to a distinct pattern of adjacent sites, none of which bears a close resemblance to the high-affinity consensus sequence identified by the random selection method. Instead, these binding sites consist of tandem arrays of imperfect GAS elements that are separated by 10 base pairs, or about one turn of the helix in B-form DNA. The binding of STAT4 to these tandem sites is cooperative in nature; simultaneous occupancy of multiple sites is required to achieve a stable association with the DNA. The amino-terminal 124 residues of STAT4 are essential for cooperative binding to these adjacent low-affinity sites, but not for its ability to be phosphorylated, to dimerize, or to bind to a single high-affinity site (5). Xu et al. (5) proposed that this cooperativity results from a direct interaction between the amino-terminal domains of STAT dimers bound to adjacent sites on DNA. A similar result has been obtained with STAT1 (6), and it seems likely that the other STATs also use their amino-terminal domains for cooperative binding (2).
STAT4 is activated in response to interleukin-12 (IL-12), which plays a primary role in the development of T helper 1 (TH1) cells and in the induction of organ-specific autoimmune diseases (7,8). Cooperative binding of STAT4 dimers to adjacent low-affinity GAS sites is required for transcriptional activation of IFN-γ (5), which is thought to mediate many of the effects of IL-12 (9). Therefore, a small molecule that can bind to the amino-terminal domain of STAT4 and prevent its self-association might be an effective immunosuppressant. To facilitate the discovery of such a therapeutic agent, we have undertaken an effort to determine the three-dimensional structure of this domain in solution and elucidate the molecular details of its self-association. To this end, we have overproduced the amino-terminal domain of human STAT4 (hSTAT4(1-124)) in Escherichia coli and purified it to homogeneity. Furthermore, we have characterized this fragment of STAT4 by circular dichroism (CD) and nuclear magnetic resonance (NMR) techniques and report that it has a well ordered conformation in solution that is amenable to structure determination.

MATERIALS AND METHODS
Plasmid Expression Vector-The segment of a human STAT4 cDNA clone that corresponds to amino acid residues 1-124 was amplified by the polymerase chain reaction (PCR), using primers PE-11 (5′-TATTATCATGAGTCAGTGGAATCAAGTCCAACAG-3′) and PE-12 (5′-ATTATAAGCTTGGATCCTTACTGGACAGGCATGTTGGCTGCAGCCAAaATaCgaCgCTCTTCCCT-3′). The restriction sites within these primers that were used to cleave the PCR fragment prior to ligation are BspHI (TCATGA) in PE-11 and BamHI (GGATCC) in PE-12. Mismatches that were introduced within the PE-12 primer to change rarely used arginine and isoleucine codons to more common ones are indicated in lowercase type (see text for discussion). PCR amplification was performed with Pfu polymerase under the conditions recommended by the supplier (Stratagene). The PCR regimen entailed 25 cycles in a Perkin-Elmer Thermal Cycler 2400, with each cycle consisting of 30 s each at 94, 55, and 72°C. The PCR fragment was cleaved with BspHI and BamHI and then ligated with the NcoI/BamHI vector backbone of pET3d (10) to construct pDW474. The nucleotide sequence of the STAT4 DNA in pDW474 was confirmed experimentally.
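The primer design described above can be checked with a short script. The following is a minimal sketch, not part of the original methods: it locates the BspHI and BamHI recognition sequences in the two primers, translates the coding portion of each, and lists the codons in PE-12 that carry lowercase (mismatched) bases. It assumes that hyphens and spaces in the printed primer sequences are line-break artifacts, and that the 3′ end of PE-12 falls on a codon boundary (the in-frame TAA stop immediately upstream of the BamHI site supports this reading). All function and variable names are illustrative.

```python
# Primers as given in the text, with line-break hyphens/spaces removed (assumption).
PE11 = "TATTATCATGAGTCAGTGGAATCAAGTCCAACAG"
PE12 = "ATTATAAGCTTGGATCCTTACTGGACAGGCATGTTGGCTGCAGCCAAaATaCgaCgCTCTTCCCT"

# Recognition sequences of the two enzymes used to cut the PCR product.
SITES = {"BspHI": "TCATGA", "BamHI": "GGATCC"}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Case-preserving complement so the lowercase mismatches stay visible.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement, keeping upper/lowercase as written."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_until_stop(seq):
    """Split seq into in-frame codons, stopping after the first stop codon."""
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codons.append(codon)
        if CODON_TABLE[codon.upper()] == "*":
            break
    return codons


def translate(codons):
    """Translate a list of codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c.upper()] for c in codons)


# Positions (0-based) of the restriction sites within each primer.
bsphi_pos = PE11.upper().find(SITES["BspHI"])
bamhi_pos = PE12.upper().find(SITES["BamHI"])

# Forward primer: translate from the first ATG.
start = PE11.upper().find("ATG")
n_term = translate(codons_until_stop(PE11[start:]))

# Reverse primer: the coding strand is its reverse complement.
c_term_codons = codons_until_stop(reverse_complement(PE12))
c_term = translate(c_term_codons)

# Codons that contain at least one lowercase (mismatched) base.
changed = [(c, CODON_TABLE[c.upper()]) for c in c_term_codons if c != c.upper()]

print("BspHI at", bsphi_pos, "in PE-11; BamHI at", bamhi_pos, "in PE-12")
print("N-terminal residues encoded by PE-11:", n_term)
print("C-terminal residues encoded by PE-12:", c_term)
print("Codons carrying mismatches:", changed)
```

Under these assumptions the mismatches fall in two consecutive arginine codons and one isoleucine codon, in line with the description of PE-12, and the BspHI site overlaps the ATG start codon, which is what makes its overhang compatible with the NcoI end of the vector.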
Protein Expression-Cells from single, drug-resistant colonies of E. coli BL21/DE3 containing pDW474 and pDC952 were grown to saturation in LB broth (11) supplemented with 100 μg/ml ampicillin and 30 μg/ml chloramphenicol at 37°C. The saturated cultures were diluted 100-fold in the same medium and grown in shake-flasks to mid-log phase (A600 = 0.5-0.7), at which time isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 1 mM. After 3 h, the cells were recovered by centrifugation. 15N-Labeled hSTAT4(1-124) was expressed in M9 minimal medium (11) containing 1.5 g/L of 15NH4Cl and 2.6 g/L of 15N-Celtone (Martek Biosciences Corp.).
Protein Purification-E. coli cells containing hSTAT4(1-124) were resuspended in 0.01 culture volumes of lysis buffer (50 mM Tris (pH 8.0), 150 mM NaCl, 1 mM EDTA) and disrupted by three successive passes through a French press at 4°C. The soluble extract was prepared by centrifuging the disrupted cell suspension at 14,000 × g for 15 min at 4°C. To remove nucleic acids and some acidic proteins from the soluble extract, polyethyleneimine (PEI) was added to a final concentration of 0.15%. After the mixture was kept on ice for 5 min, the precipitate was pelleted by centrifugation as above. Residual PEI was removed by a two-step ammonium sulfate precipitation at concentrations of 20 and 60%. The pellet from the 60% ammonium sulfate precipitation was resuspended in 50 mM Tris (pH 8.0), 1 mM EDTA, 5 mM dithiothreitol (DTT) (10 ml/liter of culture volume) and dialyzed extensively against the same buffer at 4°C. The dialyzed material was clarified by centrifugation at 14,000 × g for 10 min and then passed over an anion exchange column (Q-Sepharose, 1.6 × 10 cm, Amersham Pharmacia Biotech) at 4°C. The bound protein was eluted from the column with a linear gradient of 0-1 M NaCl in 50 mM Tris (pH 8.0) and 5 mM DTT; hSTAT4(1-124) eluted at approximately 250 mM NaCl. Fractions containing hSTAT4(1-124) were pooled and dialyzed against 25 mM sodium phosphate, 12 mM sodium formate, 12 mM sodium acetate (pH 5.25), 5 mM DTT at 4°C. This material was then passed over a cation exchange column (SP-Sepharose, 1.6 × 10 cm, Amersham Pharmacia Biotech), and the bound protein was eluted from the column with a linear gradient of 0-1 M NaCl in 25 mM sodium phosphate, 12 mM sodium formate, 12 mM sodium acetate (pH 5.25), 5 mM DTT. Again, hSTAT4(1-124) eluted at approximately 250 mM NaCl. The fractions containing hSTAT4(1-124) were pooled, concentrated to at least 2 mg/ml, and dialyzed against 20 mM sodium acetate (pH 5.25), 25 mM NaCl, 5 mM DTT at 4°C. The final polishing step was performed on a preparative-grade gel filtration column (Superdex 75, 1.6 × 60 cm, Amersham Pharmacia Biotech) equilibrated with the same buffer.
CD Spectroscopy-The CD spectrum was recorded on a JASCO 720 spectropolarimeter. The protein sample was 23 μM in 20 mM sodium acetate-d3 (pH 5.3), 50 mM NaCl, 5 mM DTT. Spectra were recorded from 195 to 350 nm using a 0.1-mm path length demountable cell (Uvonics) at 22°C. Ellipticity was calculated per residue.
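For readers unfamiliar with the per-residue normalization, the following is a minimal sketch of the standard conversion from observed ellipticity (in millidegrees) to mean residue ellipticity. The concentration and path length are taken from the text; the residue count of 123 assumes the initiator methionine is absent, and the observed ellipticity value is purely hypothetical because raw readings are not reported here. Some conventions divide by the number of peptide bonds rather than residues.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity.

    Result is in deg cm^2 dmol^-1 per residue:
        [theta] = theta_mdeg / (10 * c[mol/L] * l[cm] * N)
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


CONC_M = 23e-6    # 23 uM protein, from the text
PATH_CM = 0.01    # 0.1-mm cell, from the text
N_RES = 123       # assumption: residues 2-124 (initiator Met removed)

theta_obs_mdeg = -8.0  # hypothetical reading at 222 nm, for illustration only

mre = mean_residue_ellipticity(theta_obs_mdeg, CONC_M, PATH_CM, N_RES)
print(f"Mean residue ellipticity: {mre:.0f} deg cm^2 dmol^-1")
```

Normalizing per residue makes spectra of proteins with different lengths and concentrations directly comparable, which is what allows a helix content to be estimated from the signal at 222 nm.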
NMR Spectroscopy-All NMR spectra were acquired on a Varian Unity-plus 600 spectrometer equipped with a Z-spec triple resonance, pulsed-field gradient probe (Nalorac Corp., Martinez, CA) at 25°C. Self-diffusion measurements were made using the water-sLED experiment as described (12,13), with a diffusion time of 134 ms, a recycle delay of 10 s, and 128 transients for each of the 25 values of the pulsed-field gradient ranging from 4 to 42 G/cm. The 1H-15N HSQC spectrum was acquired using coherence selection (14) and flipback pulses (15).

RESULTS
Overproduction of hSTAT4(1-124) in E. coli-Xu et al. (5) produced amino acids 1-124 of human STAT4 as a hexahistidine-tagged polypeptide in E. coli and demonstrated that this fragment could competitively inhibit the cooperative binding of STAT4 homodimers to adjacent low-affinity GAS sites. This result indicates that the isolated domain is capable of folding into an "active" conformation in the E. coli cytoplasm. We elected to express the same fragment of human STAT4 in an untagged (native) form to alleviate any concern that the polyhistidine tag might reduce the solubility or otherwise alter the behavior of the protein in solution. Accordingly, residues 1-124 were amplified from a cDNA clone by PCR and inserted into the bacteriophage T7 promoter vector pET3d (10), as described under "Materials and Methods." We noted that this interval of the human STAT4 cDNA contains four arginine codons that are rarely used in E. coli (AGG and AGA), including two consecutive ones. Several studies have demonstrated that these arginine codons, which are frequently present in eukaryotic cDNAs, can significantly impair the yield of a recombinant protein in E. coli (17,18) or even cause misincorporation of amino acids (19), particularly when they occur in tandem. Fortuitously, the pair of consecutive arginine codons in STAT4 was close enough to the carboxyl terminus of this domain that they could be mutated to alternative codons simply by incorporating the appropriate mismatches into the PCR primer that was used to amplify the coding sequence for ligation with the expression vector. Another way to circumvent the problems associated with rare arginine codons is to overproduce the cognate tRNA (the product of the argU gene) on a compatible plasmid vector (17,19). Thus, as a precautionary measure, cells also contained the argU plasmid pDC952 (20).

Fig. 1. [...] overproducing hSTAT4(1-124). Cell pellets were resuspended in lysis buffer (see "Materials and Methods") and disrupted by sonication. Total protein samples were prepared by combining equal volumes of the disrupted cell suspensions and 2× sample buffer concentrate (28). Soluble protein samples were prepared in a similar fashion after the insoluble material was pelleted by centrifugation at 14,000 × g. Samples of total and soluble protein were separated on an 18% Tris-glycine gel (Novex, San Diego, CA) and stained with Coomassie Brilliant Blue. Lanes: 1, total protein from induced BL21/DE3 (pDC952) cells harboring pET3d; 2, total protein from uninduced BL21/DE3 (pDC952) cells harboring pDW474; 3, total protein from induced BL21/DE3 (pDC952) cells harboring pDW474; 4, soluble protein from induced BL21/DE3 (pDC952) cells harboring pDW474; 5, molecular size standards (kilodaltons).
E. coli BL21/DE3 cells containing the hSTAT4(1-124) expression vector (pDW474) and pDC952 were grown to mid-log phase in LB broth at 37°C and induced for several hours with isopropyl-1-thio-β-D-galactopyranoside, after which samples of the total and soluble intracellular proteins were prepared and analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE); the results are shown in Fig. 1. Samples of the total intracellular protein from uninduced BL21/DE3 (pDW474 + pDC952) cells and from induced BL21/DE3 (pET3d + pDC952) cells were also prepared as controls. The results indicate that upon induction with isopropyl-1-thio-β-D-galactopyranoside, hSTAT4(1-124) accumulates to comprise approximately 30% of the total intracellular protein ([...]ing the unmodified pET3d expression vector (lane 1). No difference in the yield of hSTAT4(1-124) was observed in the absence of pDC952 (data not shown).
Purification of hSTAT4(1-124)-We used a combination of bulk fractionation methods and column chromatography to purify the recombinant hSTAT4(1-124) to homogeneity from an E. coli cell extract. The purity was monitored by SDS-PAGE after each step (Fig. 2A). The cells were lysed with a French press, and then the nucleic acids and some endogenous proteins were precipitated by the addition of PEI. Next, additional E. coli proteins were precipitated by adding ammonium sulfate to a final concentration of 20% (w/v), which is the highest concentration that did not precipitate any of the hSTAT4(1-124). After the insoluble debris was removed by centrifugation, the ammonium sulfate concentration was increased to 60% (w/v), at which point nearly all of the hSTAT4(1-124) was recovered in the pellet. Some contaminating proteins remained in the supernatant at this point, which was discarded. Following bulk fractionation with PEI and ammonium sulfate, the hSTAT4(1-124) was already approximately 50% pure, as gauged by densitometric scanning of SDS-polyacrylamide gels (Fig. 2A, lane 5).
After the residual ammonium sulfate was removed by dialysis, the dialysate was applied to an anion exchange column. hSTAT4(1-124) bound to the column and was eluted with a NaCl gradient. Next, because the isoelectric point of hSTAT4(1-124) is close to neutral, we were also able to find conditions under which the protein would bind reversibly to a cation exchange column. After both forms of ion exchange chromatography were used in succession, hSTAT4(1-124) was free of detectable contaminants except for a multimeric form of the protein that results from an intermolecular disulfide bond. These cross-linked multimers were readily removed by gel filtration.
The final preparation of hSTAT4(1-124) is extremely pure, yielding just a single band on a silver-stained SDS gel (Fig. 2B, lane 2). Liquid chromatography electrospray mass spectrometry analysis revealed that this material consists of a single component with a molecular weight of 14,473 (data not shown). The predicted molecular weight of hSTAT4(1-124) without the amino-terminal methionine residue is 14,472. Because the initiator methionine of hSTAT4(1-124) is followed by a serine, we expected that it would be removed posttranslationally by E. coli methionine aminopeptidase (21). This conjecture was confirmed by amino-terminal sequencing (data not shown). Finally, we note that our preparation of hSTAT4(1-124) appears as a single, sharp band on an isoelectric focusing gel (Fig. 2C). The measured isoelectric point is approximately 6.7, which is only slightly higher than the predicted value of 6.5. Hence, by a variety of rigorous criteria, hSTAT4(1-124) produced in E. coli and purified as described appears to be a chemically homogeneous protein.
CD Study-The secondary structure of hSTAT4(1-124) is predicted (22) to be composed entirely of α-helices and connecting loops, with no β-structure (Fig. 3A). As shown in Fig. 3B, the far-UV CD spectrum of hSTAT4(1-124) corroborates this prediction. The CD spectrum exhibits a strong peak of negative ellipticity at 222 nm, which is characteristic of a high degree of α-helical character and further serves to demonstrate that hSTAT4(1-124) has a well ordered structure in solution. Quantitative analysis of the spectrum yields an estimate of ~75% α-helix (23), which is in good agreement with the predicted secondary structure (Fig. 3A).
NMR Analysis-Because it was proposed that the amino-terminal domain of STAT4 mediates cooperative DNA binding through self-association, we sought to determine whether the isolated domain exists as a dimer in solution. Initially, we used NMR pulsed-field gradient self-diffusion measurements (12) as an experimental tool for this purpose. The data obtained from these experiments are summarized in Fig. 4. The self-diffusion coefficient, D_S, of 0.86 × 10⁻⁶ cm² s⁻¹ corresponds to a dimer for hSTAT4(1-124), based on comparisons with two monomeric proteins of similar size (lysozyme, 14.1 kDa, D_S = 1.04 × 10⁻⁶ cm² s⁻¹; NusB, 15.7 kDa, D_S = 1.01 × 10⁻⁶ cm² s⁻¹) and a dimeric protein that is slightly larger than the hSTAT4(1-124) dimer (IL-10, 37.4 kDa dimer, D_S = 0.82 × 10⁻⁶ cm² s⁻¹) (12,13).
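The comparison with reference proteins can be made semi-quantitative with a simple scaling argument. The sketch below is an illustration rather than the analysis used in this work: it assumes roughly spherical, similarly hydrated proteins, for which the Stokes-Einstein relation implies that the diffusion coefficient scales as the inverse cube root of molecular mass. The monomer mass of 14.47 kDa is taken from the mass spectrometry result; the function name is hypothetical.

```python
def scaled_diffusion(d_ref, mass_ref_kda, mass_kda):
    """Scale a reference diffusion coefficient to another mass.

    Assumes compact spheres of equal density, so D is proportional
    to M**(-1/3) (Stokes-Einstein with radius ~ M**(1/3)).
    """
    return d_ref * (mass_ref_kda / mass_kda) ** (1.0 / 3.0)


MONOMER_KDA = 14.47   # hSTAT4(1-124) without the initiator Met
D_OBS = 0.86          # measured value, in units of 1e-6 cm^2/s

# Monomeric reference proteins quoted in the text: (mass in kDa, D in 1e-6 cm^2/s)
REFERENCES = {"lysozyme": (14.1, 1.04), "NusB": (15.7, 1.01)}

predictions = {}
for name, (mass, d_ref) in REFERENCES.items():
    d_monomer = scaled_diffusion(d_ref, mass, MONOMER_KDA)
    d_dimer = scaled_diffusion(d_ref, mass, 2 * MONOMER_KDA)
    predictions[name] = (d_monomer, d_dimer)
    print(f"{name}: monomer {d_monomer:.2f}, dimer {d_dimer:.2f}, observed {D_OBS:.2f}")
```

With either reference protein, the value expected for a dimer lies much closer to the measured coefficient than the value expected for a monomer, which is the qualitative conclusion drawn in the text. The residual difference is within what shape and hydration effects could plausibly account for under this crude model.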
A second assessment of the quaternary structure of hSTAT4(1-124) was made based on 15N relaxation measurements, utilizing a sample of uniformly 15N-labeled hSTAT4(1-124) prepared and purified as described under "Materials and Methods." The rotational correlation time was estimated from the 15N T1/T2 ratio (24), which yielded a value of 16 ns at 25°C (data not shown). The estimated correlation time clearly indicates that the protein is not monomeric when compared with values of 9.3 ns for ribonuclease H at 27°C (17.6 kDa) (25) or 10.7 ns for human immunodeficiency virus protease at 27°C (21 kDa) (26) and is in the proper range for a dimeric system with an effective size of 29 kDa. A recent detailed analysis of a two-domain protein, which is similar in size to the hSTAT4(1-124) dimer, determined a value of 13.1 ns for the correlation time at 40°C (27). This value would be expected to be longer at 25°C, which is consistent with what we observed for hSTAT4(1-124).
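Two of the steps in this argument can be illustrated numerically. The sketch below is hedged on several points: it uses a widely quoted slow-tumbling approximation relating the 15N T1/T2 ratio to the correlation time, which may differ from the exact procedure of ref. 24; it takes 60.8 MHz as the approximate 15N frequency on a 600-MHz spectrometer; and it rescales the 40°C literature value to 25°C using the Stokes-Einstein-Debye proportionality (correlation time proportional to viscosity over temperature) with approximate handbook viscosities for pure water. No measured T1 or T2 values are reported in the text, so the code instead asks which ratio would correspond to 16 ns.

```python
import math

NU_N_HZ = 60.8e6  # approximate 15N Larmor frequency at 600 MHz (1H)


def tau_c_from_ratio(t1_over_t2, nu_n_hz=NU_N_HZ):
    """Approximate correlation time (s) from the 15N T1/T2 ratio.

    Slow-tumbling approximation:
        tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N)
    """
    return math.sqrt(6.0 * t1_over_t2 - 7.0) / (4.0 * math.pi * nu_n_hz)


def ratio_from_tau_c(tau_c_s, nu_n_hz=NU_N_HZ):
    """Invert the approximation above to get the implied T1/T2 ratio."""
    x = 4.0 * math.pi * nu_n_hz * tau_c_s
    return (x * x + 7.0) / 6.0


def rescale_tau_c(tau_c, temp_from_k, visc_from, temp_to_k, visc_to):
    """Rescale tau_c between temperatures, assuming tau_c ~ viscosity / T."""
    return tau_c * (visc_to / visc_from) * (temp_from_k / temp_to_k)


# T1/T2 ratio that would correspond to the reported 16 ns.
implied_ratio = ratio_from_tau_c(16e-9)

# Rescale the 13.1 ns (40 C) literature value to 25 C.
# Viscosities of pure water in mPa*s (approximate, assumed here).
tau_25_ns = rescale_tau_c(13.1, 313.15, 0.653, 298.15, 0.890)

print(f"T1/T2 implied by 16 ns: {implied_ratio:.1f}")
print(f"13.1 ns at 40 C rescaled to 25 C: {tau_25_ns:.1f} ns")
```

Under these assumptions, the rescaled literature value comes out somewhat longer than 13.1 ns and in the same range as the 16 ns measured for hSTAT4(1-124), which supports the qualitative statement made above. The calculation ignores buffer composition and protein shape, so it should be read as an order-of-magnitude check only.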
To characterize the structure of this domain in greater detail, we have examined its two-dimensional 1H-15N HSQC NMR spectrum (Fig. 5). This spectrum reports a resonance for each HN-N pair that does not undergo rapid exchange of the HN with the solvent H2O, including side chain NH2 groups. There are approximately 110 readily discernible backbone amide signals present in this spectrum; the remaining expected resonances cannot be unambiguously counted because of spectral overlap. Because this count is close to the number expected for one copy of the polypeptide chain rather than two, the dimer must be symmetric, giving rise to a single set of equivalent resonances for each monomer. Any disruption of the symmetry would lead to a significant increase in the number of observed resonances.

DISCUSSION
Although it is not required for binding to single, high-affinity GAS sites, the amino-terminal domain of STAT4 is essential for cooperative binding to tandem arrays of low-affinity sites within the IFN-γ gene and for the stimulation of its transcription in response to IL-12 (5). Like the IFN-γ gene, many genes probably utilize tandem arrays of low-affinity sites for activation. If so, then the amino-terminal domains of the STATs may be attractive targets for therapeutic agents that seek to attenuate cytokine signal transduction.
The experiments reported here indicate that hSTAT4(1-124) has a well ordered conformation in solution that is amenable to structure determination by heteronuclear NMR spectroscopy. Furthermore, they reveal that hSTAT4(1-124) forms a stable, symmetric homodimer at high micromolar concentrations. The existence of a dimeric form of hSTAT4(1-124) is consistent with current models of transcriptional activation, which postulate that cooperative binding of STATs to adjacent low-affinity GAS sites involves a physical interaction between their amino-terminal domains (2,5,6).
But does the hSTAT4(1-124) dimer that we observe in solution at high micromolar concentrations have any physiological relevance? It has not been possible to detect a similar interaction between the amino-terminal domains of native (i.e. full-length) STATs in solution (22). Rather, such an association is evident only when pairs of STAT dimers are bound cooperatively to adjacent sites on DNA (5,6). It is not known whether the failure to detect this interaction in the absence of DNA is because of its inherently weak nature or whether the higher order structural organization of the STATs somehow precludes an intermolecular association between amino-terminal domains in solution. Because of its unusually low extinction coefficient, it will be difficult to determine the equilibrium dissociation constant for the hSTAT4(1-124) dimer. Using the technique of dynamic light scattering, however, we have been able to show that hSTAT4(1-124) is unquestionably dimeric at 10 μM, the lowest concentration at which we can obtain reliable data with our instrument (data not shown). Thus, the actual dissociation constant is probably on the order of 1 μM or less. This value is several orders of magnitude lower than the protein concentration in the NMR experiments and not much greater than what is generally regarded as a physiologically meaningful concentration for an intracellular protein. Besides, in the presence of DNA, which forms a bridge between two pairs of STAT dimers, the local concentration of the amino-terminal domains is probably much greater than 10 μM. For these reasons, we do not believe that the hSTAT4(1-124) dimer is merely an artifact that arises at unnaturally high protein concentrations.
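The link between the observation at 10 μM and the inferred bound on the dissociation constant can be illustrated with the mass-action expression for a simple monomer-dimer equilibrium. This is a minimal sketch under the assumption of a two-state equilibrium with Kd = [M]²/[D] and total protein expressed in monomer units; the trial Kd values and the 500 μM concentration are hypothetical inputs chosen for illustration, not measured quantities.

```python
import math


def dimer_fraction(total_um, kd_um):
    """Fraction of protein chains present in dimers at equilibrium.

    Solves Kd = [M]^2 / [D] with total = [M] + 2[D] for the free
    monomer concentration [M]; all concentrations are in uM.
    """
    monomer = kd_um * (math.sqrt(1.0 + 8.0 * total_um / kd_um) - 1.0) / 4.0
    return 1.0 - monomer / total_um


# Hypothetical dissociation constants (uM) evaluated at two total
# concentrations: 10 uM (light scattering) and 500 uM (illustrative
# stand-in for a high micromolar NMR sample).
for kd in (0.1, 1.0, 10.0, 100.0):
    f10 = dimer_fraction(10.0, kd)
    f500 = dimer_fraction(500.0, kd)
    print(f"Kd = {kd:6.1f} uM: {f10:.0%} dimer at 10 uM, {f500:.0%} at 500 uM")
```

In this model, a protein that is predominantly dimeric at 10 μM must have a dissociation constant at or below the low micromolar range, because substantially larger values would leave most chains monomeric at that concentration. That is the reasoning behind the estimate given above.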
If, as we expect, the structure of the hSTAT4(1-124) dimer is physiologically relevant, then this information could be exploited to design compounds that will bind to the monomeric form of the amino-terminal domain and prevent its self-association. In this regard, it is interesting to note that almost all of the amino acid side chains that compose the dimer interface in the crystal structure of murine STAT4(1-124) are not conserved among the different STATs (22). This suggests that it should be possible to develop compounds of this type with high specificity for a particular STAT.

Spectra were processed using NMRPipe (29). Data were processed using time-domain solvent subtraction in t2 and were linear-predicted, zero-filled, and multiplied by a 90°-shifted squared sine-bell before Fourier transformation in t1.