HIV-1 Tat Is a Natively Unfolded Protein

Tat (transactivator of transcription) is a small RNA-binding protein that plays a central role in the regulation of human immunodeficiency virus type 1 replication and in approaches to treating latently infected cells. Its interactions with a wide variety of both intracellular and extracellular molecules are well documented. A molecular understanding of the multitude of Tat activities requires a determination of its structure and interactions with cellular and viral partners. To increase the dispersion of NMR signals and permit dynamics analysis by multinuclear NMR spectroscopy, we have prepared uniformly 15N- and 15N/13C-labeled Tat-(1–72) protein. The cysteine-rich protein is unambiguously reduced at pH 4.1, and NMR chemical shifts and coupling constants suggest that it exists in a random coil conformation. Line broadening and multiple peaks in the Cys-rich and core regions suggest that transient folding occurs in two of the five sequence domains. NMR relaxation parameters were measured and analyzed by spectral density and Lipari-Szabo approaches, both confirming the lack of structure throughout the length of the molecule. The absence of a fixed conformation and the observation of fast dynamics are consistent with the ability of Tat protein to interact with a wide variety of proteins and nucleic acid and support the concept of a natively unfolded protein.


Human immunodeficiency virus type 1 (HIV-1)-infected resting CD4+ memory T-cells form a persistent viral reservoir that is a major barrier to curing HIV-1 infection although they do not permit viral replication (1). A small, RNA-binding protein, Tat (transactivator of transcription) (2), plays a central role in the regulation of HIV-1 replication and in potential approaches to treating latently infected cells. During transcription of the viral DNA, RNA polymerase II stalls as a result of binding the negative transcription elongation factor, resulting in the production of prematurely terminated transcripts that may include the tat message (3). Following translation, the Tat protein is transported from the cytoplasm into the nucleus, where it binds a stem-loop structure (transactivation response element (TAR)) formed by the first 59 nucleotides of the HIV-1 RNA (4). Tat stimulates elongation of full-length transcripts by recruiting the positive transcription elongation factor b, a complex of a regulatory cyclin and cyclin-dependent kinase 9, that phosphorylates the C-terminal domain of RNA polymerase II, components of negative transcription elongation factor, and the transcription elongation factor Spt5 (5-7). Recent results suggest that Tat activates positive transcription elongation factor b by displacing Hexim1 (hexamethylene bisacetamide-inducible protein 1) from its cyclin T1 binding site (8) and that the affinity of the Tat-cyclin T1-cyclin-dependent kinase 9 complex for TAR is regulated through Tat acetylation by histone acetyltransferases (9,10). The absence of Tat and low levels of cyclin-dependent kinase 9 and cyclin T1 in resting CD4+ T-cells are both implicated in HIV-1 latency (1). Tat may also be involved in derepression of heterochromatin, transcription initiation (11), and reverse transcription (12).
In addition to its intracellular activities, the literature contains a plethora of reports of extracellular Tat activities, including general cytotoxicity, that may contribute to immune suppression (13,14) and the development of HIV-dementia (15).
A molecular understanding of Tat activity requires a determination of its structure and interactions with cellular and viral partners (16,17). HIV-1 Tat is a 101-residue protein encoded by two exons (3,18). The first exon defines amino acids 1-72 that encompass an acidic and proline-rich N terminus (amino acids 1-21), a cysteine-rich region (amino acids 22-37), a core (amino acids 38-47), a basic region (amino acids 48-57), and a Gln-rich segment (amino acids 58-72) (19); it activates transcription with the same proficiency as the full-length protein (3,20-22). Residues 1-24 form the co-activator and acetyltransferase CREB-binding protein KIX domain binding site (17). Cyclin T1 is thought to interact with the Cys-rich region of Tat (23), and mutation of any one of six of the seven Cys residues results in inactivation (3). The end of the Cys-rich region and the core are involved in mitochondrial apoptosis of bystander noninfected cells through their ability to bind tubulin and prevent its depolymerization (24). The basic region is important for TAR binding (25) and nuclear localization, and the segment between Tyr-47 and Arg-57 has been used to transport a large variety of materials, including proteins, DNA, drugs, imaging agents, liposomes, and nanoparticles across cell and nuclear membranes (26). The Gln-rich region has been implicated in mitochondrial apoptosis of T-cells (27). The second tat exon defines residues 73-101 and includes an RGD motif that may mediate Tat binding to cell surface integrins (28). Despite numerous attempts, the biological function of the second exon-encoded polypeptide has been difficult to determine (12,22).
The Tat amino acid sequence has a low overall hydrophobicity and a high net positive charge, and analyses by several algorithms (29,30) suggest that it is a natively unfolded protein (31, 32) with a possible folding nucleus between residues 42 and 75; CD spectropolarimetry confirms the lack of secondary structure (17). Proteome analysis suggests that about 33% of eukaryotic proteins are natively unfolded or contain long stretches of unfolded polypeptide, whereas unfolded proteins are much less abundant among the archaea and eubacteria (5%) (33). One advantage that unfolded proteins have over folded proteins is that they provide a much larger surface area, enabling multiple interactions with other molecules (34). The most common types of natively unfolded protein in yeast are transcription factors and transcription regulators as well as kinases (35), and DNA is the most common target bound by unfolded proteins. A common mechanism of action for natively unfolded proteins involves partial or complete folding upon interaction with a binding partner (31,32). Areas of residual structure in unfolded proteins can be critical to forming the initial site of interaction with a binding partner (32), and delineation of those sites is required for the design of drugs and antibodies that interfere with binding.
There have been several attempts to determine solution conformations of Tat and its segments both alone and in complexes. 1H NMR spectroscopy and molecular dynamics simulations suggested that Tat-(1-86) (Z-variant) forms condensed domains encompassing the core and Gln-rich regions, whereas the basic and Cys-rich regions were found to be highly flexible at pH 6.3 under reducing conditions (36). In a model of the 87-residue Tat Mal protein at pH 4.5 under oxidizing conditions, the N-terminal Trp-11 forms a hydrophobic core through interactions with Phe-38 and Tyr-47 (37). The basic region is in an extended conformation, and the Cys-rich region contains β-turns; α-helix is found in the Gln-rich segment. A low resolution, globular conformation with some flexible segments (particularly the basic region) was deduced for 13Cα-Gly-labeled synthetic Tat-(1-86) (Bru) at pH 4.5, in the absence of reducing agents (38). An oxidized Tendamistat-Tat-(1-37) fusion protein showed multiple conformations with some evidence of helicity in the Cys-rich region (20-33) at pH 3.5 (39). A fusion protein consisting of the activation domain from the unrelated equine infectious anemia virus and Tat-(48-57) showed high helical content in the basic domain by NMR spectroscopy and CD (40).
There have also been several studies of Tat fragments in complex with TAR RNA, mainly focusing on the conformation of TAR (41-45). NMR spectroscopy suggested a conformational change in Tat-  in the region of Gly-42 and Gly-44 upon binding to TAR (44). 1H NMR was also used to show that Tat-(46-55), acetylated at Lys-50, is bound in an extended conformation to the bromodomain of p300/CREB-binding protein-associated factor, a histone acetyltransferase transcriptional coactivator (10). CD spectra suggested the possibility of a conformational change in Tat-(1-86) upon binding to the KIX domain of CREB-binding protein (17). 15N NMR relaxation measurements showed that Tat-(47-58) becomes slightly more ordered on binding heparin (46), and CD studies of overlapping peptide fragments suggested that the most flexible regions are those that are adjacent to the basic region (47).
Determining the mechanisms by which proteins attain their biologically active three-dimensional structures is a fundamental goal of biology, with implications ranging from molecular recognition to understanding amyloid diseases (48). Multinuclear NMR spectroscopy is yielding insights into the folding process by characterizing unfolded proteins and equilibrium intermediates. The detection of residual native and nonnative structures in strongly denatured proteins suggests that they are important in seeding the early steps of protein folding (49-56). Molten globule states of proteins have also been investigated by NMR and provide insight into the nature of the folding pathway and the structure of the transition state (57,58). Thus, although NMR spectroscopy is not ideally suited to kinetic studies of protein folding, it is incomparable in its ability to provide detailed information about the structures and dynamics of unfolded and partially folded proteins (59).
1H NMR spectra of Tat are very difficult to assign due to poor dispersion of the resonances (36-39). In addition, fully and partially oxidized material is likely to be structurally heterogeneous, including the formation of disulfide cross-linked oligomers often observed on polyacrylamide gels, which adds to the difficulty of structural characterization. To increase the dispersion of the NMR signals and permit dynamics analysis by multinuclear NMR spectroscopy, we have prepared uniformly 15N- and 15N/13C-labeled Tat-(1-72) protein.

EXPERIMENTAL PROCEDURES
Preparation of Histidine-tagged Tat-The detailed Tat-(1-72) expression and purification protocol will be published elsewhere. Briefly, an Escherichia coli codon-optimized synthetic exon 1 tat gene contained in pSV2tat72 was obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, National Institutes of Health, from Dr. Alan Frankel (60) and cloned into pET28b(+) (Novagen, Madison, WI), enabling protein expression with an N-terminal His6 segment and T7 epitope that adds 20 residues to the 72-residue protein. Protein expression was from E. coli BL21(DE3)pLysS cells (Novagen, Madison, WI) in M9 minimal medium supplemented with either 15NH4Cl, or 15NH4Cl and U-13C6-D-glucose (Cambridge Isotope Laboratories Inc., Andover, MA). The expression protocol was modified from recently published methods (61) that reduce the consumption of isotopically labeled ingredients. The protein was purified by elution from a Talon™ (Cobalt-Superflow) metal affinity resin (Clontech, Palo Alto, CA) in a gravity flow column in 50 mM sodium acetate buffer, 6 M guanidine HCl, and 10 mM Tris(2-carboxyethyl)phosphine (Sigma) adjusted to pH 4.0 with sodium hydroxide. The eluted protein was pooled and serially dialyzed against degassed 0.1, 0.05, and 0.01 M ammonium acetate at pH 3 for ~6 h each. A final dialysis was done against degassed water for 4 h. The dialysate was then frozen and freeze-dried.
NMR Sample Preparation-Freeze-dried protein was dissolved in degassed buffer containing 50 mM acetate-d4, 20 mM MES buffer, 80 µM sodium sulfite, 0.02% sodium azide, and 5% D2O, pH 4.0. The resulting protein solution had a measured pH of 4.1, uncorrected for isotope effects. The NMR tube was purged with argon for 15 min, and the dissolved protein was then added to it under an argon atmosphere. The NMR tube cap was sealed with Teflon tape (Dupont Dow Elastomers LLC, Wilmington, DE). The final protein concentration was about 1 mM. The purity of the protein was confirmed by matrix-assisted laser desorption ionization mass spectrometry. The pH titration was done by dissolving separate protein samples in degassed buffers as described above, with subsequent verification of the pH.
NMR Backbone Assignments-All backbone assignment experiments for the 92-residue 13C/15N-labeled histidine-tagged Tat-(1-72) were done on a 600-MHz Varian INOVA spectrometer equipped with a triple resonance probe head at 20.2 °C, using standard Varian BioPack pulse sequences (62-67) (see Table 1). The NMR probe was calibrated with methanol (68), and all spectra were processed with NMRPipe (69). Spectra were apodized using a squared cosine bell function, zero-filled to twice the data set size, and linear predicted (forward-backward with eight prediction coefficients) prior to Fourier transformation. The dimensions of the resulting processed data sets were 4096 × 1024 for the 1H/15N HSQC experiment and 2048 × 256 × 128 for all three-dimensional experiments. The pulse sequences used are sensitivity-enhanced (with the exception of the HNHA experiment) and use gradients for coherence selection and water suppression (70). Radiation damping was suppressed with a water flip-back pulse (1.42 ms). 15N decoupling during acquisition was done using the WALTZ-16 sequence (71) with a 7.196-kHz field strength. 1H chemical shifts were referenced to the water signal that resonates 4.821 ppm from 2,2-dimethyl-2-silapentane sulfonate at 293 K (68). 15N and 13C referencing were done indirectly relative to 2,2-dimethyl-2-silapentane sulfonate as recommended (72).
Random coil values were subtracted from the measured chemical shifts after correction for sequence effects as described in Ref. 73. The measured 3JHNHα values were not corrected for relaxation effects and are likely to be underestimated by 5-10% (68). The Δ3JHNHα values were calculated by subtracting the COIL values reported in Ref. 74 from the measured values.
NMR Relaxation Measurements-NMR relaxation data were collected on both 15N-labeled and 13C/15N-labeled His-tagged Tat-(1-72) on a Varian INOVA 600-MHz spectrometer at the University of Manitoba and on a Varian INOVA 800-MHz spectrometer at the University of Alberta (NANUC) with triple resonance probe heads at 20 °C, using BioPack pulse sequences (70,75,76). Cross-peak intensities were measured as peak heights. Spectra were processed with NMRPipe (69), which was also used to fit the relaxation data to two-parameter decays. The errors in the relaxation rates were calculated using the signal/noise ratios of the individual peaks and the fits of the data to the decays. Duplicate measurements were made to verify the error estimates.
A total of nine data sets were acquired to obtain R1 relaxation rates using relaxation delays of 0, 50, 100, 250, 500, 1000, 1500, 3000, and 4000 ms. R1ρ measurements were made with eight data sets with spin lock times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The 15N spin lock continuous wave field strength for the R1ρ relaxation experiments was 1.5 kHz, with 90° pulse lengths of 166.755 and 125.029 µs for the 600 and 800 MHz fields, respectively. The R1ρ measurements were corrected for offset from the carrier, using the measured R1 values as described in Ref. 76. The peaks at the outer edges of the spectra required correction by less than 10%. R2 measurements were done at 600 MHz only and with Carr-Purcell-Meiboom-Gill times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The R1 and R1ρ data were acquired using four transients, whereas eight transients were collected for the R2 experiments; the postacquisition relaxation delay was 5 s. Data collected at 600 MHz were 2048 × 256 complex points with SW[1H] = 10 ppm and SW[15N] = 24 ppm. The steady-state 1H-15N NOE values were obtained from ratios of peak heights from experiments with (INOE) and without (InoNOE) saturation of the protons for 5 s at the beginning of the experiment. The heteronuclear NOE values were then obtained from (INOE - InoNOE)/InoNOE. The spectra were acquired with 32 transients, a 5-s relaxation delay, and the same resolution as in the R1 and R1ρ experiments.
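The two data reductions described above, a two-parameter decay fit and the heteronuclear NOE ratio, can be illustrated with a minimal Python sketch. It is not the NMRPipe fitting routine used in this work: a log-linear least-squares fit stands in for it, and the function names and the synthetic peak heights are hypothetical.

```python
import math

def fit_two_parameter_decay(delays_s, peak_heights):
    """Fit I(t) = I0 * exp(-R * t) by linear least squares on ln(I).

    Assumption: a log-linear fit stands in for the nonlinear fit a
    processing package would normally perform.
    """
    xs = list(delays_s)
    ys = [math.log(h) for h in peak_heights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (I0, rate in s^-1)

def heteronuclear_noe(i_noe, i_no_noe):
    """NOE as defined in the text: (I_NOE - I_noNOE) / I_noNOE."""
    return (i_noe - i_no_noe) / i_no_noe

# Spin lock times from the text (30-240 ms), converted to seconds.
spin_lock_s = [0.030, 0.060, 0.090, 0.120, 0.150, 0.180, 0.210, 0.240]
# Synthetic, noise-free peak heights for a hypothetical rate of 3.3 s^-1.
heights = [100.0 * math.exp(-3.3 * t) for t in spin_lock_s]
i0, rate = fit_two_parameter_decay(spin_lock_s, heights)
```

With real spectra the peak heights carry noise, so the log-linear fit weights late, weak points too heavily; a weighted or nonlinear fit would be the more careful choice.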
Water suppression was achieved through the use of gradients to select for the 15
Relaxation Data Analysis-The measurement of NMR relaxation rates provides a window on protein dynamics over a broad range of time scales: 15N longitudinal (R1), transverse (R2), rotating frame (R1ρ), and heteronuclear cross-relaxation (contained in the NOE) rates are sensitive to dynamics on the picosecond to nanosecond time scales, and R2 and R1ρ can also be sensitive to conformational exchange (Rex) on the millisecond to microsecond time scales. The equations relating the macroscopic rates of relaxation (Rx) to the values of the spectral density of motions (J) at the nuclear spin transition frequencies (ω) were given by Abragam (78) but can be approximated as follows (79,80).
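For reference, a standard form of these relations from the 15N relaxation literature is sketched here, written with constants A and C to match the definitions that follow. The prefactor A, and the use of the NOE as the fractional intensity change defined under "NMR Relaxation Measurements," are assumptions and are not necessarily the exact expressions of Refs. 79 and 80.

```latex
\begin{align}
R_1 &= A\left[J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H+\omega_N)\right] + C\,J(\omega_N) \tag{1}\\
R_2 &= \frac{A}{2}\left[4J(0) + J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H) + 6J(\omega_H+\omega_N)\right]
      + \frac{C}{6}\left[4J(0) + 3J(\omega_N)\right] + R_{ex} \tag{2}\\
\mathrm{NOE} &= \frac{\gamma_H}{\gamma_N}\,\frac{A}{R_1}\left[6J(\omega_H+\omega_N) - J(\omega_H-\omega_N)\right] \tag{3}
\end{align}
% Assumed dipolar prefactor:
\[
A = \left(\frac{\mu_0}{4\pi}\right)^{2}\frac{\hbar^{2}\gamma_H^{2}\gamma_N^{2}}{4\,r_{NH}^{6}}
\]
```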
and C = (Δ²ωN²)/3, where μ0 is the permeability constant of free space (4π × 10⁻⁷ kg·m·s⁻²·A⁻²), γH is the proton magnetogyric ratio (2.68 × 10⁸ rad·s⁻¹·T⁻¹), γN is the magnetogyric ratio of 15N (-2.71 × 10⁷ rad·s⁻¹·T⁻¹), rNH is the proton-nitrogen internuclear separation (104 pm), Δ is the difference between the parallel and perpendicular components of the 15N chemical shift tensor (-163 ppm), and ħ is Planck's constant divided by 2π (1.05 × 10⁻³⁴ J·s). Note that R2 and R1ρ are determined by the same combination of spectral density values as long as the 15N spin lock is on resonance for all spins (81). An exact solution of Equations 1-3, which contain six unknowns, is not possible, but because the spectral density is relatively flat at high frequencies, the number of variables can be reduced to three by estimating the spectral density values near J(ωH ± ωN), as described by Farrow et al. (82). Briefly, J(0.87ωH) is determined from Equation 3, and J(0.921ωH) and J(0.955ωH) are calculated directly from it using the assumption that J(ω) ∝ 1/ω². One advantage to measuring R1ρ is that, in contrast to R2, contributions from conformational exchange are eliminated (Rex ≈ 0) as long as the nitrogen carrier is placed on resonance and the spin lock power is sufficiently high (51,83,84). This allows a direct calculation of J(ωN) and J(0) (strictly J(ωe), the magnitude of the effective field/frequency in the presence of the spin lock) from the measured relaxation rates and steady-state NOE. Uncertainties in the spectral densities were determined by repeating the calculations 500 times using the S.D. values of the NMR measurements and Monte Carlo methods to generate a normal distribution and are described in more detail in Ref. 85. The calculations were done using the program Mathematica 5.0 and its simulated annealing protocol (86); statistical analyses were done with the program JMP IN 5.1.
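The reduced spectral density calculation and its Monte Carlo error estimate can be sketched in a few lines of Python. This is a closed-form illustration under stated assumptions, not the Mathematica simulated annealing protocol used here: it assumes the standard dipolar and chemical shift anisotropy prefactors built from the constants quoted above, Rex = 0, and the J(ω) ∝ 1/ω² scaling. The function names and the input rates in the usage lines are hypothetical.

```python
import math
import random
import statistics

# Physical constants as quoted in the text (SI units).
MU0_OVER_4PI = 1e-7      # mu0 / (4*pi)
HBAR = 1.05e-34          # J*s
GAMMA_H = 2.68e8         # rad s^-1 T^-1
GAMMA_N = -2.71e7        # rad s^-1 T^-1
R_NH = 104e-12           # m
DELTA_CSA = -163e-6      # 15N shift anisotropy (ppm as a fraction)

def reduced_spectral_densities(r1, r2, noe, proton_mhz):
    """Return (J(0), J(wN), J(0.87wH)) in s/rad.

    r2 may be R2 or an on-resonance R1rho; noe is the fractional
    intensity change (I_NOE - I_noNOE) / I_noNOE used in the text.
    Assumes Rex = 0 and J(w) proportional to 1/w^2 at high frequency.
    """
    omega_h = 2.0 * math.pi * proton_mhz * 1e6
    omega_n = omega_h * abs(GAMMA_N / GAMMA_H)
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3) ** 2
    c2 = (DELTA_CSA * omega_n) ** 2 / 3.0
    sigma = noe * r1 * GAMMA_N / GAMMA_H          # cross-relaxation rate
    j_high = 4.0 * sigma / (5.0 * d2)             # J(0.87 wH)
    j_0921 = j_high * (0.87 / 0.921) ** 2         # J(0.921 wH)
    j_0955 = j_high * (0.87 / 0.955) ** 2         # J(0.955 wH)
    j_n = (r1 - (d2 / 4.0) * 7.0 * j_0921) / (3.0 * d2 / 4.0 + c2)
    j_0 = (r2 - (d2 / 8.0) * (3.0 * j_n + 13.0 * j_0955)
           - (c2 / 2.0) * j_n) / (d2 / 2.0 + 2.0 * c2 / 3.0)
    return j_0, j_n, j_high

def monte_carlo_sd(r1, r2, noe, sds, proton_mhz, n_trials=500, seed=0):
    """Standard deviations of the three J values from normally
    distributed perturbations of (r1, r2, noe) with the given S.D.s."""
    rng = random.Random(seed)
    draws = [
        reduced_spectral_densities(
            rng.gauss(r1, sds[0]), rng.gauss(r2, sds[1]),
            rng.gauss(noe, sds[2]), proton_mhz)
        for _ in range(n_trials)
    ]
    return tuple(statistics.stdev(column) for column in zip(*draws))

# Hypothetical inputs for one residue at 600 MHz (illustrative only).
j0, jn, jh = reduced_spectral_densities(1.5, 3.26, -1.27, 600.0)
j_sds = monte_carlo_sd(1.5, 3.26, -1.27, (0.05, 0.10, 0.05), 600.0)
```

Writing the 1/ω² scaling factors out explicitly, rather than folding them into fixed numerical coefficients, keeps each assumption visible and easy to change.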
Relaxation measurements were done at two fields to permit a finer mapping of the spectral density and more specifically to test the assumptions inherent in the reduced spectral density analysis. In addition, since Rex scales with the square of the applied magnetic field, it is possible to determine the contribution of Rex to Equation 2 by measuring relaxation parameters at two fields. Thus, Rex and J(0) values were calculated from the relaxation measurements at 600 and 800 MHz as described in Ref. 82. The R1ρ relaxation data were modeled by assuming that the effect of its neighbors (j) on the correlation time of a residue (i) decreases exponentially as the distance from the residue increases, an approach first described in Ref. 51. In this model (Equation 4), R1ρint is an intrinsic residue relaxation rate, N is the length of the polypeptide, V is the residue molecular volume (87), and λ is the persistence length of the polypeptide in residues.
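The behavior of such a neighbor-weighted model can be illustrated with a short Python sketch. The functional form used here, an intrinsic rate multiplied by a sum of weighted exponentials over all residues, is an assumption modeled on the description above and may differ in detail from Equation 4; the function name, the weights, and the parameter values are hypothetical.

```python
import math

def segmental_relaxation_profile(weights, r_int, persistence_length):
    """Predicted rate for each residue i of a chain of N residues:

        R(i) = r_int * sum_j weights[j] * exp(-|i - j| / persistence_length)

    weights[j] = 1 for every j gives the uniform polymer; relative
    residue volumes give a volume-weighted variant (assumed form).
    """
    n = len(weights)
    return [
        r_int * sum(
            weights[j] * math.exp(-abs(i - j) / persistence_length)
            for j in range(n)
        )
        for i in range(n)
    ]

# Uniform 92-residue chain with illustrative parameters.
uniform = segmental_relaxation_profile([1.0] * 92, r_int=0.2,
                                       persistence_length=7.0)
```

For the uniform chain the profile is a flattened bell: residues near the middle have neighbors on both sides and so relax faster than residues at the termini.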
A different solution to Equations 1-3 was proposed by Lipari and Szabo (88), who derived a simplified spectral density function J(ω)LS on the assumption that global molecular reorientation (τc) and fast internal motions (τe) are stochastically uncorrelated (89). The Lipari-Szabo "model-free" spectral density reduces the number of unknown parameters in Equations 1-3 to three: S², the square of the generalized order parameter, which indicates the degree of spatial freedom of the internal motion; τc, the global rotational correlation time for molecular reorientation; and τe, the effective internal rotational correlation time, which is related to both the amplitude and the rate of internal motion. The separability of internal and overall dynamics is questionable for a random coil polymer, but comparisons of the Lipari-Szabo parameters with those obtained for other folded and unfolded proteins can be instructive. We have implemented the data analysis approach developed by Schurr et al. (90), in which all three Lipari-Szabo parameters are optimized for each residue individually, since this is reported to provide a significantly better fit to the NMR data (90). The errors in the Lipari-Szabo parameters were determined by Monte Carlo analysis as described above for the spectral density analysis, except that only 100 points were calculated (85).
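As an illustration, the commonly quoted form of the Lipari-Szabo spectral density can be written as a short Python function. The 2/5 normalization is one widely used convention and is an assumption here, as are the function name and the parameter values in the usage lines.

```python
def lipari_szabo_j(omega, s2, tau_c, tau_e):
    """Model-free spectral density (assumed 2/5 normalization):

        J(w) = (2/5) * [ S2 * tau_c / (1 + (w * tau_c)^2)
                         + (1 - S2) * tau / (1 + (w * tau)^2) ]

    with 1/tau = 1/tau_c + 1/tau_e. Times in s, omega in rad/s.
    """
    tau = tau_c * tau_e / (tau_c + tau_e)
    return 0.4 * (
        s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
        + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    )

# Illustrative parameters: a flexible residue with a low order parameter.
j_zero = lipari_szabo_j(0.0, s2=0.3, tau_c=3e-9, tau_e=100e-12)
j_522 = lipari_szabo_j(2 * 3.141592653589793 * 522e6,
                       s2=0.3, tau_c=3e-9, tau_e=100e-12)
```

When S² = 1 the second term vanishes and the function reduces to a single Lorentzian governed by τc alone, the rigid-body limit.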

RESULTS AND DISCUSSION
NMR Resonance Assignments-Sequential assignments of 1HN, 15N, 13C′, 13Cα, 1Hα, and 13Cβ resonances were done entirely with three-dimensional heteronuclear triple resonance experiments that use one- and two-bond scalar couplings to connect the atoms (68). The experiments take advantage of the comparatively wide chemical shift dispersions of 15N, 13C′, and 1HN resonances in unfolded proteins (91,92). All of the backbone resonances were sequentially assigned, except for Met-1, Met-21, Phe-52, Phe-58, Arg-77, and Pro-78. The Met-1 amino and Gly-2 amide protons exchange too rapidly to be observed (see below). Resonances from Phe-52/58 could be assigned to residue type only, because they are both preceded by weak Cys resonances. Arg-77 and Pro-78 are part of the difficult sequence Arg-Arg-Arg-Pro-Pro and could not be unambiguously sequentially assigned. Of the assigned Pro resonances, all but one have 13Cβ chemical shifts characteristic of the trans peptide. Pro-38 13Cβ shifts are outside the canonical chemical shifts for the trans configuration but are nearer the trans than the cis configuration. As an example, parts of the HN(CA)CO and HNCACB experiments used for the backbone assignment of residues Gln-83 to Leu-89 are shown in Fig. 1. The assignments are listed in Table 1 of the supplemental data.
A 1H/15N HSQC spectrum of Tat is shown in Fig. 2 and contains three notable features. First, the cross-peaks are separated into regions typical of the residue random coil positions (Gly, Ser/Thr, other backbone amides), and the chemical shift dispersions of 15N (20 ppm) and 1HN (1.1 ppm) resonances are virtually identical to those of strongly denatured ubiquitin (93). Second, the peaks exhibit a range of intensities, with nearly all of the weak and medium intensity cross-peaks falling in the sequence between Cys-47 and Leu-63. This suggests that this region of the protein is under- Third, the spectrum contains several more, mostly weak, cross-peaks than can be accounted for by the backbone and side chain atoms of a 92-residue protein. Sixteen of the additional resonances have been sequentially assigned (indicated with the designation "a" in supplemental Table 1S), some were assignable to amino acid type, and some could not be unambiguously assigned. Eleven of the assigned minor resonances fall in the sequence between Cys-45 and Arg-69. One example of the multiplicity of cross-peaks is shown for the single Trp at position 31 (see Fig. 2a, inset), which exhibits one strong and two weaker side chain indole amine cross-peaks. The Trp is preceded by a Pro, so one of the two minor peaks could arise from the cis-peptide bond isomer. Another possible explanation is that some of the minor peaks are due to the presence of minor amounts of oxidized Cys residues that are either unassigned or unobserved. Another interesting example is Gly-64, which exhibits two amide cross-peaks of approximately equal intensity (Fig. 2). The nearest Pro to Gly-64 is separated from it by 14 residues, ruling out cis-trans Pro isomerism as an explanation of the peak multiplicity. This suggests that some segments of the reduced, monomeric Tat protein exist in multiple conformations that are in slow equilibrium on the chemical shift time scale (milliseconds to seconds).
In the case of Gly-64, the two resonances have comparable intensity, suggesting equal populations of two conformers, whereas in many other cases one resonance is significantly more intense than those arising from alternate conformers, suggesting one dominant conformer and minor alternates. Gly-64 is located between Leu and the β-branched Ile, and steric crowding could locally restrict the dynamics of the Gly amide. Since the intensities of the duplicate peaks vary across the sequence, the populated conformers most likely arise from local interactions, as expected in an unfolded protein.
Chemical Shifts and 3JHNHα-The NMR chemical shift is a sensitive indicator of conformation, and assignment of backbone chemical shifts permits an analysis of secondary structure by comparison with random coil values corrected for local sequence effects (73,94). Consensus multinuclear (13C′, 13Cα, 13Cβ, and 1Hα) chemical shift indexing (95) suggests that all segments of the reduced protein at pH 4.1 exist in a random coil conformation (data not shown). Examination of the individual chemical shift indexing plots shown in Fig. 3 indicates that a majority of the resonances are within the random coil range and that rarely are there more than three consecutive resonances in the α-helix or β-sheet chemical shift ranges. However, among the 1HN, 13Cα, and 1Hα resonances, there appears to be a slight weighting of the conformations toward the α-helix, the most consistent classification being for the segment around Glu-29. Unlike some other denatured proteins (50,51,96), there is less evidence of a tendency to the β-sheet conformation, perhaps because of a paucity of hydrophobic β-branched amino acids in Tat-(1-72) (two Ile and four Val). However, there is a slight suggestion of this in the region between Val-56 and Ile-59. The conclusions based on chemical shift indexing are supported by the uncorrected 3JHNHα measurements, which are all in the range of 5.5-7.1 Hz with a mean value of 6.71 Hz, characteristic of unfolded molecules (74). The results are shown in Fig. 3g, in which the differences of the measured values from random coil values are presented. They show that the entire polypeptide is undergoing rapid sampling of the α-helix and β-sheet regions of Ramachandran space with a slight preference for the α-helix in most segments.
NMR Relaxation-Despite the relatively poor chemical shift dispersion in the NMR spectra, relaxation data could be measured for 64 of 84 (60 of 84) non-Pro resonances at 600 MHz (800 MHz). As indicated in Fig. 4a, the steady-state heteronuclear NOEs measured at 600 and 800 MHz exhibit a relatively featureless, flattened bell-shaped variation with amino acid sequence, as expected for an unfolded protein (97). The NOEs range from -3.3 (-2.6) to -0.60 (-0.41) with mean values of -1.27 (-0.933) at 600 MHz (800 MHz). For comparison, an average NOE of about -0.2 is observed for several folded proteins with similar lengths of polypeptide chain (98-100). The more negative NOE values for Tat indicate much less restricted dynamics on the nanosecond to picosecond time scales than for the folded proteins. The ends of Tat exhibit the most negative values, indicative of faster dynamics; the values increase gradually away from the C terminus, whereas the increase away from the N terminus is steeper. Significant deviations from the average values are observed for Thr-43 and Lys-61-Ala-62.
The R1 measurements show a similar bell-shaped profile with two notable features (Fig. 4b). The  (99,101). The slower relaxation in Tat indicates a shorter rotational correlation time and faster dynamics on the nanosecond to picosecond time scale than for a folded protein. Similar to the NOE values, the R1 rates decline near the ends of the protein more steeply at the N terminus than the C terminus. The lowest rates, apart from the termini, are found in the segment connecting the Cys-rich and basic regions, between Thr-60 and Ser-66, and suggest fast dynamics there.
The rotating frame longitudinal relaxation rates (R1ρ) measured for Tat at 600 and 800 MHz are plotted in Fig. 4c. The R1ρ rates range from 1.5 s⁻¹ (1.3 s⁻¹) to 5.9 s⁻¹ (7.2 s⁻¹) with mean values of 3.26 s⁻¹ (3.29 s⁻¹) at 600 MHz (800 MHz). The R2 rates, measured at 600 MHz (supplemental data), range from 1.6 to 7.1 s⁻¹, with an average value of 3.5 s⁻¹. The differences between the R2 and the R1ρ rates presumably arise from contributions to the former from slow conformational exchange, but hydrogen exchange could also contribute to the higher R2 values. In general, low R1ρ and R2 measurements indicate unrestricted fast dynamics, whereas high values suggest restricted fast dynamics and possible contributions from slow conformational exchange (100). In folded proteins of similar length to Tat, the R2 values in the absence of exchange are on the order of 8 s⁻¹ (99). The low R1ρ and R2 values and the negative NOE values measured for Tat indicate large amplitude fluctuations on the picosecond-nanosecond time scale characteristic of a random coil conformation.
The R1ρ relaxation data at 600 MHz were fit to Equation 4, in which the influence of neighboring residues is modeled as a decaying exponential (51). The flattened, bell-shaped solid curve shows the behavior predicted for a random-coil polymer of uniform composition (Fig. 4d). The dashed line shows the variation in relaxation when residue contributions are weighted by residue volume (87). Although a number of individual residues deviate from the volume-weighted model, overall the theoretical line follows the data fairly closely and is a much better fit than the uniform polymer model. The minima in the model correspond mainly to the small, flexible residues Gly, Ala, and Ser, whereas the maxima are found at the positions of Trp, Arg, and Lys. In contrast to some other applications of this model to denatured proteins (51,102,103), there are no obvious regions with large positive deviations from the theoretical curve, further evidence that reduced Tat-(1-72) at pH 4.1 is predominantly a random coil. The segment from Pro-23 to Pro-38 contains five prolines, and the measured R1ρ values for most residues in this region are greater than the calculated ones. This suggests that the prolines restrict dynamics on the millisecond to microsecond time scale, stiffening the backbone in this region. In the C terminus, from residue 60 onward, the measured values generally fall below the calculated ones, suggesting greater flexibility in this region of the molecule. One exception is the high value for Gly-64, suggesting restricted motion and slow exchange at this position (102).
One final observation is that the region of the protein spanning residues 45-60 (Cys-rich region and core) contains the fewest dynamics measurements. This is because the peak intensities in this region are low. Since the largest number of assigned minor peaks is also found in this segment (supplemental data), this supports the suggestion that some residues in this segment undergo slow conformational exchange and are the most likely sites of folding nuclei.
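The segmental-motion model used for the fit described above can be sketched numerically. The Python sketch below assumes the commonly used form of such a model, in which the rate at residue i is an intrinsic rate multiplied by a sum of exp(-|i - j|/λ) over all residues j, with an optional weight proportional to residue volume. The function names, the toy sequence, the intrinsic rate, the persistence length λ, the normalization by mean volume, and the volume table are illustrative assumptions; they are not the parameters or the exact weighting used for Tat.

```python
import math

# Approximate residue volumes in cubic angstroms (illustrative values only).
RESIDUE_VOLUME = {"G": 60.1, "A": 88.6, "S": 89.0, "P": 112.7,
                  "K": 168.6, "R": 173.4, "W": 227.8}


def uniform_profile(n_res, r_int, lam):
    """Rate per residue for a random-coil polymer of uniform composition."""
    return [
        r_int * sum(math.exp(-abs(i - j) / lam) for j in range(n_res))
        for i in range(n_res)
    ]


def volume_weighted_profile(seq, r_int, lam):
    """Same model, with each residue's contribution scaled by its relative volume."""
    n_res = len(seq)
    mean_vol = sum(RESIDUE_VOLUME[aa] for aa in seq) / n_res
    weights = [RESIDUE_VOLUME[aa] / mean_vol for aa in seq]
    return [
        r_int * sum(weights[j] * math.exp(-abs(i - j) / lam) for j in range(n_res))
        for i in range(n_res)
    ]


if __name__ == "__main__":
    toy_seq = "GSGSGAAPKRWRKPAAGSGS"  # hypothetical sequence, not Tat
    flat = uniform_profile(len(toy_seq), r_int=0.2, lam=7.0)
    weighted = volume_weighted_profile(toy_seq, r_int=0.2, lam=7.0)
    for aa, u, w in zip(toy_seq, flat, weighted):
        print(f"{aa}  uniform={u:.2f}  weighted={w:.2f}")
```

In this sketch the uniform profile is bell-shaped simply because central residues have more neighbors within range, while the weighted profile dips near clusters of small residues and rises near bulky ones, which is the qualitative behavior described for the two curves in Fig. 4d.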
Spectral Density Values-The values of the spectral density of atomic motions measured at 0, 61, 81, 522, and 696 MHz are plotted in Fig. 5. The high frequency values make a small, relatively uniform contribution to the relaxation across the sequence except at the N terminus, where a significant increase in high frequency motions is observed for the first 10 residues at 522 MHz (Fig. 5a). This is in contrast to urea-unfolded apomyoglobin, in which minima in J(0.87ωH) plots correspond to maxima in buried surface area in the folded protein (102), suggesting that hydrophobic interactions persist even in 8 M urea. The lack of definition in the J(522) and J(696) plots for reduced Tat-(1-72) suggests that no residual structure forms at pH 4.1.
The spectral density profiles at 61 and 81 MHz are highly similar and do show some variation with sequence (Fig. 5b). The ends of the protein, together with residues Lys-61 to Leu-63 and Thr-43, exhibit the smallest contributions at midfrequencies. Interestingly, in the acid-denatured state of apomyoglobin, maxima in buried surface area correlate weakly with maxima in the J(ωN) plot (104), suggesting that J(ωN) is sensitive to formation of transient folding nuclei. The smaller J(ωN) values in Tat near Lys-61 and at the termini suggest more flexibility in these regions. However, there is little evidence of transient folding in the J(ωN) data for Tat-(1-72).
The low frequency spectral densities cover a wider range of values but also contain the highest levels of error in comparison with the values calculated at high frequencies (Fig. 5c). This arises from the uncertainties involved in the T1ρ measurements. The most notable feature is a peak in slow motions centered at the His6 affinity tag. There are also less well defined peaks in the proline-rich region and in the basic region. The Cys-rich segment contains the fewest measurements, and they are associated with some of the largest errors. The errors arise from the weak peak intensities in this region of the protein, probably indicating the presence of conformational exchange in this region.
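For readers who want to see how spectral densities at 0, ωN, and 0.87ωH are obtained from relaxation rates, the following Python sketch applies the standard reduced spectral density mapping relations for a backbone 15N-1H pair. It is a sketch under stated assumptions, not the authors' own calculation: it takes a longitudinal rate R1, a transverse rate R2, and a heteronuclear NOE as inputs, and it assumes an N-H bond length of 1.02 Å and a 15N chemical shift anisotropy of -160 ppm. The function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T m / A
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752e8         # rad s^-1 T^-1 (1H)
GAMMA_N = -2.7126e7        # rad s^-1 T^-1 (15N)
R_NH = 1.02e-10            # m, assumed N-H bond length
CSA = -160e-6              # assumed 15N chemical shift anisotropy


def dipolar_csa_constants(field_mhz):
    """Return d^2 and c^2 for a given 1H Larmor frequency in MHz."""
    omega_h = 2.0 * math.pi * field_mhz * 1.0e6
    omega_n = omega_h * GAMMA_N / GAMMA_H
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH**3) ** 2
    c2 = (omega_n * CSA) ** 2 / 3.0
    return d2, c2


def reduced_spectral_densities(r1, r2, noe, field_mhz):
    """Return J(0), J(wN), J(0.87wH) in s/rad from R1, R2 (s^-1) and the NOE."""
    d2, c2 = dipolar_csa_constants(field_mhz)
    sigma = r1 * (noe - 1.0) * GAMMA_N / GAMMA_H   # cross-relaxation rate
    j_h = 4.0 * sigma / (5.0 * d2)
    j_n = (4.0 * r1 - 5.0 * sigma) / (3.0 * d2 + 4.0 * c2)
    j_0 = (6.0 * r2 - 3.0 * r1 - 2.72 * sigma) / (3.0 * d2 + 4.0 * c2)
    return j_0, j_n, j_h


if __name__ == "__main__":
    # Hypothetical input values, for illustration only.
    j_0, j_n, j_h = reduced_spectral_densities(r1=1.7, r2=3.5, noe=-0.3, field_mhz=600.0)
    print(f"J(0)={j_0:.3e}  J(wN)={j_n:.3e}  J(0.87wH)={j_h:.3e}")
```

Because J(0) depends on the difference between a transverse and a longitudinal rate, uncertainties in the transverse measurement propagate directly into the low frequency term, which is consistent with the larger errors noted for that panel.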
Model-free Analysis-Lipari-Szabo model-free analysis of the relaxation measurements is shown in Fig. 6 and in the supplemental data. Calculations were done both separately for the data collected at each field (supplemental data) and for the two fields combined (Fig. 6), and both with and without Rex terms. The following observations were made in all analyses. In general, the Rex contributions were less than 1 s⁻¹ for most residues, the largest value being 2.5 s⁻¹. The average S² values were 0.55 and 0.50 for calculations with and without Rex, respectively. These results are very similar to analyses of relaxation data in other unfolded proteins, where order parameters in the range of 0.4-0.6 were determined (82,105). Although the most ordered part of the protein is found in the His6 affinity tag, some of the highest errors in the analysis are found here. This region also exhibited the highest R2 values, suggesting that accounting for conformational exchange or hydrogen exchange, not detectable by R1ρ, may improve the analysis. The τc values show a slight bell-shaped variation with sequence (supplemental data), the correlation times at the ends of the protein being smaller than in the center. Furthermore, the average τc values (2.1 and 3.3 ns, calculated with and without Rex, respectively) are barely a factor of 10 greater than the average τe values (0.19 and 0.25 ns), and in some cases the errors in the internal and overall correlation times overlap. Both of these observations support the notion that HIV-1 Tat-(1-72) exists in a random coil conformation in which there is no clear separation of internal and overall rotational correlation times (97).
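The significance of the small gap between the overall and internal correlation times can be illustrated with the original Lipari-Szabo spectral density function. The Python sketch below evaluates that function using the average values quoted above for the fit with Rex (S² = 0.55, τc = 2.1 ns, τe = 0.19 ns). The function name is hypothetical, and the sketch assumes the simple two-parameter form rather than any extended model that may have been used in the actual fits.

```python
import math


def lipari_szabo_j(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density (s/rad) at angular frequency omega (rad/s)."""
    tau = tau_c * tau_e / (tau_c + tau_e)   # effective internal correlation time
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


if __name__ == "__main__":
    s2, tau_c, tau_e = 0.55, 2.1e-9, 0.19e-9   # averages quoted in the text
    print(f"tau_c / tau_e = {tau_c / tau_e:.1f}")
    for mhz in (0.0, 61.0, 522.0):
        omega = 2.0 * math.pi * mhz * 1.0e6
        print(f"J({mhz:.0f} MHz) = {lipari_szabo_j(omega, s2, tau_c, tau_e):.3e} s/rad")
```

The model-free separation into an order parameter and two correlation times is cleanest when τe is much shorter than τc; a ratio of roughly 10, as found here, is the situation the text identifies as lacking a clear separation.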
Effect of pH-The NMR samples used for the assignment and relaxation analysis of Tat were stable for over 1 year. Fig. 7 shows the effects of pH on the 1H/15N HSQC spectrum of Tat-(1-72). The most obvious result is an overall reduction of cross-peak intensities. Fast hydrogen exchange with water might account for this, because exchange at rates greater than 10³ s⁻¹ will result in the loss of signal intensity from chemical exchange line broadening. To determine if the observed losses are a result of the increasing rates of hydrogen exchange, the theoretical hydrogen exchange rates in unstructured Tat at various pH values were calculated, taking into account the nearest neighbor inductive and steric effects (106) (not shown). For unstructured Tat, the exchange rates between pH 3.3 and 5.8 are predicted to increase on the order of 250-fold. The calculated exchange rates at pH 5.8 range from 0.91 s⁻¹ for Gln-92 to 3.8 × 10³ s⁻¹ for Gly-2, and it is likely that for many of the peaks, the intensity loss is attributable to rapid hydrogen exchange. For example, the histidines in the affinity tag and the Cys residues are predicted to be the fastest exchanging amides, and their cross-peaks are lost early in the pH titration. In these cases, the loss of cross-peaks from the spectra is thus indirect evidence that a residue is not involved in a stable, folded conformation. However, detailed analysis of the peaks shows that hydrogen exchange alone cannot explain all of the peak heights (e.g. Thr-40 and Gly-64 are predicted to exchange slowly, yet they disappear early in the titration (Fig. 7)). These results suggest that some cross-peaks lose intensity because of the development of local conformations that are in intermediate exchange on the millisecond to microsecond time scale, as observed in the molten globules of two other proteins (58).
FIGURE 7. Effect of pH on the 1H/15N HSQC spectrum of Tat-(1-72) at 293 K, observed at pH 3.3 (a), pH 4.1 (b), and pH 5.8 (c). All samples are ~1 mM and were obtained from a single expression/purification. Each spectrum was collected with 32 transients, 2048 × 256 complex points, and sweep widths of 10 ppm in F2 (1H) and 24 ppm in F1 (15N).
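The pH dependence of amide hydrogen exchange discussed above can be illustrated with the usual three-term rate expression for an unstructured amide: an acid-catalyzed term, a base-catalyzed term, and a pH-independent water term. The Python sketch below is only an illustration of that scaling. The rate constants and the pKw value are placeholder assumptions, not the sequence-corrected values used in the calculation for Tat, and the function names are hypothetical.

```python
import math


def exchange_terms(ph, log_ka, log_kb, log_kw, pkw):
    """Return the (acid, base, water) contributions to the exchange rate."""
    acid = 10.0 ** (log_ka - ph)            # kA * [H+]
    base = 10.0 ** (log_kb + ph - pkw)      # kB * [OH-]
    water = 10.0 ** log_kw                  # pH-independent term
    return acid, base, water


def exchange_rate(ph, log_ka, log_kb, log_kw, pkw):
    """Total exchange rate, in the time units of the supplied constants."""
    return math.fsum(exchange_terms(ph, log_ka, log_kb, log_kw, pkw))


if __name__ == "__main__":
    # Placeholder constants for illustration; not taken from the paper.
    params = dict(log_ka=1.6, log_kb=10.0, log_kw=-1.5, pkw=14.2)
    low = exchange_rate(3.3, **params)
    high = exchange_rate(5.8, **params)
    print(f"fold increase from pH 3.3 to 5.8: {high / low:.0f}")
    print(f"pure base catalysis would give: {10 ** 2.5:.0f}")
```

A purely base-catalyzed process would speed up by 10^2.5, about 316-fold, over these 2.5 pH units. The acid and water terms matter most near the low end of the range, so the overall increase is somewhat smaller than that limit.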
Conclusions-Natively unfolded proteins have been classified into two categories, depending on whether they exist in a random coil conformation or in a condensed premolten globule (32). Measurement of both static and dynamic multinuclear NMR parameters shows that Tat-(1-72) exists predominantly in a random coil conformation at pH 4.1. However, multinuclear NMR has also uncovered evidence for multiple backbone conformations mainly, but not exclusively, in the Cys-rich region and core. Possible origins of the minor cross-peaks include cis-trans proline peptide isomerization, minor Cys oxidation, and multiple conformers in slow equilibrium. The multiplicity of some peaks in the spectra, together with broadened peaks and the changes in peak intensity as a function of pH, suggests that the Cys-rich and core regions form transiently stabilized structures at acidic and neutral pH. The present results are pertinent to Tat interactions with intracellular binding partners, such as cyclin T1, that are expected to encounter only reduced Tat in the intracellular environment. Cyclin T1 probably recognizes the transiently stabilized structure that forms in the Cys-rich region of Tat. Furthermore, the affinity of Tat for the loop region of TAR is greatly increased by interaction with cyclin T1, suggesting binding-induced folding (107), another feature of natively unfolded proteins. Multinuclear NMR is likely to be of value in determining the structures of complexes of Tat and its interaction partners. Finally, there is considerable interest in developing a Tat vaccine (70) based on the presence of Tat antibodies in HIV-1-infected individuals who are long term nonprogressors to AIDS. The antibodies raised against oxidized protein putatively recognize conformational epitopes, suggesting that at neutral pH parts of the protein exist in a stable conformation.
The present dynamics analysis suggests that the most likely region to fold is the Cys-rich region, and the formation of disulfide bonds could stabilize local structure there. However, the high positive charge density, paucity of hydrophobic residues, and present dynamics analysis suggest that the remainder of the protein is unlikely to form a stable conformation even at neutral pH.