Visualizing a specific contact in the HIV-1 Tat protein fragment and trans-activation responsive region RNA complex by photocross-linking.

Replication of human immunodeficiency virus type 1 (HIV-1) requires specific interactions of Tat protein with the trans-activation responsive region (TAR) RNA, a stem-loop structure containing two helical stem regions separated by a trinucleotide bulge. The Tat protein contains a basic RNA-binding region (amino acids 49-57) located in the carboxyl-terminal half of the protein, and peptides containing this basic domain of Tat protein can bind TAR RNA with high affinities. We synthesized a 31-amino acid Tat fragment (amino acids 42-72), containing the basic region and part of the flanking regulatory core domain, that formed a specific complex with TAR RNA. Upon UV irradiation (254 nm), this Tat fragment cross-linked covalently with TAR RNA. Sites of cross-links were determined on both the TAR RNA and Tat protein fragment by RNA and protein sequencing, respectively. These results revealed that guanosine 26 of TAR RNA was cross-linked with tyrosine 47 of the Tat peptide. Our results provide the first physical evidence for a direct amino acid-base contact in the Tat-TAR complex. Recently, the orientation of Tat-(42-72) was determined in our laboratory using a psoralen-Tat-(42-72) conjugate (Wang, Z., and Rana, T. M. (1995) J. Am. Chem. Soc. 117, 5438-5444). On the basis of our findings, we suggest a model in which Tat binds to TAR RNA by inserting the basic recognition sequence into the major groove with an orientation where lysine 41 in the core domain of Tat contacts the lower stem and Tyr47 is close to G26 of TAR RNA. Knowledge of the orientation of Tat and details of other interactions with TAR RNA in the Tat-TAR complex has significant implications for understanding gene regulation in HIV-1.

RNA-protein interactions are vital for many regulatory processes, especially in gene regulation, where proteins interact specifically with binding sites found within RNA transcripts. RNA molecules can fold into extensive structures containing regions of double-stranded duplex, hairpins, internal loops, bulged bases, and pseudoknotted structures (1,2). Due to the complexity of RNA structure, the rules governing sequence-specific RNA-protein recognition are not well understood. Recent structural studies have demonstrated that RNA-binding proteins interact with RNA in both the minor and major grooves. For example, two tRNA synthetases (alanine and glutamine) interact with the acceptor stems of their cognate tRNAs in the minor grooves (3,4). Major groove recognition takes place between aspartyl-tRNA synthetase and its cognate tRNAs at a site of local distortion in the RNA helix (5). Bulge loops or bulges (unpaired nucleotides on one strand of a duplex) in RNA helices are potentially important in tertiary folding of RNA and in providing sites for specific RNA-protein interactions, as illustrated by TFIIIA of Xenopus (6) and the coat protein of phage R17 (7). In a recent report, interactions between U1 small nuclear RNA and the N-terminal domain of the human U1A protein were mapped by multidimensional heteronuclear NMR studies (8). These studies showed that protein-RNA contacts occur at the single-stranded apical loop of the hairpin and also in the major groove of the helical stem at neighboring U-G and U-U non-Watson-Crick base pairs (8). The crystal structure of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin also revealed that the loop sequence (AUUGCAC) interacts with the surface of the four-stranded β-sheet (9). On the basis of NMR data, it has been shown that TAR RNA in HIV-1 changes its conformation upon arginine binding (10,11).
All of these studies suggest that the diversity of RNA structures plays a central role in their specific recognition by proteins.
The promoter of the human immunodeficiency virus type 1 (HIV-1), located in the U3 region of the viral long terminal repeat, is an inducible promoter that can be stimulated by the trans-activator protein, Tat (12). As in other lentiviruses, Tat protein is essential for trans-activation of viral gene expression (13-16). In the absence of Tat, most of the viral transcripts terminate prematurely, producing short RNA molecules ranging in size from 60 to 80 nucleotides. Jeang et al. (17) reported that integrated HIV-1 promoters did not show a high rate of abortive transcription. Nonetheless, HIV-1 proviruses and integrated long terminal repeats respond efficiently to Tat (17). The Tat protein is a small, cysteine-rich nuclear protein containing 86 amino acids and comprising three important functional domains. HIV-1 Tat protein acts by binding to the TAR (trans-activation responsive) RNA element, a 59-base stem-loop structure located at the 5′-ends of all nascent HIV-1 transcripts (18-22). Upon binding to the TAR RNA sequence, Tat causes a substantial increase in transcript levels (23-27). The increased efficiency in transcription may result from preventing premature termination of the transcriptional elongation complex (28) or from enhancing initiation of transcription (29). TAR RNA was originally localized to nucleotides +1 to +80 within the viral long terminal repeat (18). Subsequent deletion studies have established that the region from +19 to +42 incorporates the minimal domain that is both necessary and sufficient for Tat responsiveness in vivo (21,30,31). As shown in Fig. 5, the TAR RNA contains a six-nucleotide loop and a three-nucleotide pyrimidine bulge, which separates two helical stem regions (18,21,22,25). The trinucleotide bulge is essential for high affinity and specific binding of the Tat protein (32,33).
The Tat protein contains a basic RNA-binding region (amino acids 49-57) located in the carboxyl-terminal half of the protein (19, 34-37). Peptides containing the basic domain (residues 49-57) of Tat protein can bind TAR RNA with high affinities (36, 38-44). We used a 31-amino acid Tat fragment (amino acids 42-72) to form a specific complex with TAR RNA. Upon UV irradiation, this Tat fragment formed a covalent cross-link with TAR RNA. Sites of cross-links were determined on both the TAR RNA and Tat protein fragment by RNA and protein sequencing, respectively, which revealed that Tyr47 of Tat is close to G26 of TAR RNA. Our results provide the first physical evidence for a direct amino acid-base contact in the Tat-TAR complex.

Oligonucleotide Synthesis
DNAs-All DNAs were synthesized on an Applied Biosystems ABI 392 DNA/RNA synthesizer. The template strand encodes the sequence for the TAR RNA wild type. The top strand is a short piece of DNA complementary to the 3′-end of all template DNAs having the sequence 5′-TAATACGACTCACTATAG-3′. DNA was deprotected in NH4OH at 55°C for 8 h and then dried in a Savant Speedvac. The samples were resuspended in sample loading buffer and were purified on 8 M urea-20% acrylamide denaturing gels, 450 × 0.8 mm. Gels were run for 4 h at 30 W until bromphenol blue tracking dye was 5 cm from the bottom of the gel. DNAs were visualized by UV shadowing, excised from the gel, and eluted in 50 mM Tris, 50 mM boric acid, 1 mM EDTA, and 0.5 M sodium acetate. DNAs were ethanol-precipitated and resuspended in diethyl pyrocarbonate-treated water. Concentration of DNAs was determined by measuring absorbance at 260 nm in a Shimadzu UV spectrophotometer. Samples were stored at -20°C.
RNAs-RNAs were prepared in vitro by transcription from synthetic DNA templates by T7 polymerase (45). The template strand of DNA was annealed to an equimolar amount of top strand DNA, and transcriptions were carried out in transcription buffer and 4.0 mM NTPs at 37°C for 2-4 h. For reactions containing 8.0 pmol of template DNA, 40-60 units of T7 polymerase were used. Reactions were stopped by adding an equal volume of sample loading buffer. RNAs were purified by electrophoresis on an 8 M urea, 20% polyacrylamide gel as described above. The sequence of RNAs was determined by base hydrolysis and nuclease digestion. RNA, starting with 300 pmol, was 5′-end-labeled with 32P to a specific activity of 2000 × 10^4 cpm/pmol after dephosphorylation with calf intestinal alkaline phosphatase (Promega), as described (46). The labeled RNAs were purified by electrophoresis on an 8 M urea, 20% polyacrylamide gel or by a Sep-Pak C18 cartridge (Waters, Millipore Corp.).

UV Irradiation of RNA-Peptide Complex
A typical cross-linking reaction mixture (15 μl) contained 0.25 μM labeled RNA, 1.9 μM Tat peptide, 80 μg/ml bovine serum albumin, 25 mM Tris (pH 7.4), and 100 mM NaCl. RNAs in H2O were heated at 75°C for 2 min and slowly cooled to room temperature. The reaction mixtures were incubated at 25°C for 30 min and cooled on ice before UV irradiation. The UV irradiation was conducted on ice for 10 min (254 nm; 1280 ergs/mm²/s) in a Rayonet RPR 100 photochemical reactor (Southern New England Ultraviolet Inc.). After UV irradiation, 2 μl of 20 mg/ml yeast tRNA (to remove Tat peptide from non-cross-linked RNA-peptide complexes) and 10 μl of sample loading buffer were added. The mixtures were heated at 75°C for 1 min before being loaded on an 8 M urea, 20% polyacrylamide gel. After electrophoresis, the gels were directly autoradiographed. Efficiencies of cross-linking were determined by PhosphorImager analysis (Molecular Dynamics).
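The amounts implied by this reaction mixture can be worked out with a few lines of arithmetic. The sketch below is only an illustration: it assumes the concentrations are read as micromolar (0.25 μM RNA, 1.9 μM peptide) in a 15 μl volume, and the helper name `picomoles` is hypothetical rather than anything from the original protocol.

```python
# Illustrative arithmetic for the cross-linking mixture described above.
# Assumption: concentrations are micromolar and the volume is in microliters.

def picomoles(conc_uM: float, volume_ul: float) -> float:
    """Return the amount in pmol; 1 uM x 1 ul equals 1 pmol."""
    return conc_uM * volume_ul

REACTION_VOLUME_UL = 15.0

rna_pmol = picomoles(0.25, REACTION_VOLUME_UL)      # labeled TAR RNA
peptide_pmol = picomoles(1.9, REACTION_VOLUME_UL)   # Tat peptide
molar_excess = peptide_pmol / rna_pmol               # peptide per RNA

# 80 ug/ml BSA in 15 ul: (ug/ml) * ul / 1000 gives ug
bsa_ug = 80.0 * REACTION_VOLUME_UL / 1000.0

print(f"RNA: {rna_pmol:.2f} pmol, peptide: {peptide_pmol:.1f} pmol")
print(f"peptide:RNA molar ratio = {molar_excess:.1f}")
print(f"BSA: {bsa_ug:.1f} ug")
```

Under these assumptions the peptide is present in a several-fold molar excess over RNA, which is the regime in which complex formation would be expected to be nearly complete before irradiation.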

Base Hydrolysis of RNA Cross-linked with Peptide
5′-end-labeled cross-linked RNA-peptide complexes were prepared as described above. After electrophoresis, the band corresponding to the cross-linked RNA-peptide complex, identified by autoradiography, was excised and eluted into the TBE buffer at 4°C overnight. The samples were then ethanol-precipitated and dried in a Savant Speedvac. Approximately 2 pmol of 5′-end-labeled RNA-peptide complex was partially hydrolyzed in the presence of 50 mM sodium carbonate (pH 9.2) for 20 min at 85°C. Sizes were assigned to individual bands by comparison with the migration of RNA digestion products produced by RNase T1 and Bacillus cereus RNase (Pharmacia Biotech Inc.).
respectively. Under these conditions, electrophoretic mobility shift assays revealed (data not shown) only one slow migrating RNA-peptide complex, indicating the absence of other nonspecific RNA-peptide complex formation (44,48). TAR RNA labeled at the 5′-end with 32P was incubated with the Tat peptide for 20 min in 25 mM Tris (pH 7.4) and 100 mM NaCl and UV-irradiated with 254-nm light (see "Experimental Procedures"). Products of the photoreaction were analyzed by denaturing 8 M urea-polyacrylamide gel electrophoresis (Fig. 1). Irradiation of the RNA-peptide complex yields a new band with electrophoretic mobility less than that of TAR RNA (lane 4). Both the peptide and UV irradiation are required for the formation of a cross-linked RNA-protein complex. This is evidenced by the fact that no cross-linked products are observed when RNA is irradiated in the absence of Tat peptide (lane 2) or incubated with peptide in the dark without UV irradiation (lane 3). Further control experiments showed that no cross-linking was observed when RNA and peptide were irradiated separately and then mixed (lane 5). Digestion of the RNA-peptide cross-link with Proteinase K (5 units for 30 min at 37°C) resulted in an RNA species with mobility similar to TAR RNA (lane 6).
The products of irradiation were also analyzed on a denaturing SDS-15% polyacrylamide gel. Again, a photoproduct with electrophoretic mobility less than that of TAR RNA was observed that was dependent on the presence of RNA and peptide (data not shown). The photoproduct yield is ~5% as determined by PhosphorImager analysis. Since the cross-linked RNA-peptide complex is stable to alkaline pH (9.5), high temperature (85°C), and denaturing conditions (8 M urea, 2% SDS), we conclude that a covalent bond is formed between TAR RNA and the peptide during the cross-linking reaction.

Purification of Cross-linked Complex
Dependence of the Cross-linking Reaction on the Concentration of Peptide and Time of Irradiation-Formation of the RNA-peptide photocross-link was dependent on the concentration of Tat peptide. Here, as in Fig. 1, the major cross-link product (XL1) has slightly lower electrophoretic mobility than TAR RNA. In Fig. 2, the efficiency of XL1 formation increased as the peptide concentration was raised from 0.13 μM to 1.25 μM. At peptide concentrations higher than 1.25 μM, a second minor cross-linked product (XL2) with a lower electrophoretic mobility than that of XL1 was observed.
The photocross-linking reaction between the Tat peptide and TAR RNA was also dependent on time of irradiation. The yields of cross-linked RNA-peptide complex were increased with an increase in time of irradiation (Fig. 3). In this experiment, similar to that shown in Fig. 2, extended time of irradiation also resulted in the formation of XL2 at 30 and 40 min. This second minor photoproduct could be the result of nonspecific binding of the peptide to RNA (at higher concentrations of peptide) or nonspecific association of photodamaged RNA and peptide after longer irradiation times. Further characterization of this minor photoproduct was not carried out in this study.
Specificity of the Cross-link Formation-Specificity of the cross-linking reaction was established by competition experiments. Cross-linking reactions were performed in a 15-μl volume containing 0.25 μM 32P-5′-end-labeled TAR RNA, 1.9 μM Tat peptide, 80 μg/ml bovine serum albumin, 25 mM Tris-HCl (pH 7.4), 100 mM NaCl, and up to 10 μM unlabeled competitor RNA. Cross-linked products were separated on 8 M urea-20% acrylamide gels and quantitated by PhosphorImager analysis. Fig. 4 shows that cross-linking was inhibited by the addition of unlabeled wild-type TAR RNA and not by a mutant TAR RNA lacking the trinucleotide bulge. Additional control experiments showed that cross-linking did not occur between a mutant TAR RNA without the trinucleotide bulge and Tat-(42-72) (data not shown). Therefore, we conclude that formation of a specific RNA-protein complex between TAR RNA and Tat peptide is necessary for photocross-linking.
Guanosine 26 in TAR RNA Cross-links to Tat-Mapping of the cross-link site on TAR RNA to single nucleotide resolution was carried out by partial alkaline digestion of gel-purified RNA-protein cross-link XL1. Fragment sizes were determined by comparison with RNA oligonucleotides of defined sequence and length generated by digesting RNA with RNases T1 and B. cereus. Base hydrolysis of RNA and cross-linked RNA-peptide complex generates a ladder of RNA degradation products. Bands of cross-linked RNA-peptide complexes migrate more slowly than corresponding free RNA (Fig. 5A, lanes 2 and 3). Base hydrolysis of the 5′-end-labeled cross-linked complex (Fig. 5A, lane 3) results in an RNA ladder in which all fragments up to uridine 25 are resolved. There is an obvious gap in the hydrolysis ladder after U25, indicating that the fragments above uridine 25 from the 5′-end are linked to the Tat peptide (Fig. 5A, lane 3). A standard base hydrolysis ladder was observed for 5′-end-labeled TAR RNA showing sensitivity to base hydrolysis at all positions, including U25 in the sequence (Fig. 5A, lane 2). Thus, we conclude that guanosine 26 of TAR RNA is the site at which cross-linking occurs (Fig. 5B).
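The reasoning behind the ladder-gap assignment can be sketched in code. This is a minimal illustration, not the analysis used in the study: it assumes each normally migrating band of the 5′-end-labeled ladder is recorded by the number of its 3′-terminal nucleotide, and the function name `crosslink_site` and the example band list are hypothetical. Any 5′-labeled fragment that extends through the cross-linked nucleotide carries the peptide and is shifted out of the ladder, so the first missing position marks the cross-link.

```python
# Sketch: locate a cross-link from the gap in a 5'-end-labeled hydrolysis ladder.
# Assumption: bands are recorded as the nucleotide number of each fragment's
# 3' end, and only bands migrating at their normal position are listed.

def crosslink_site(resolved_3prime_ends):
    """Return the first nucleotide position missing from the ladder.

    Fragments ending before the cross-linked nucleotide migrate normally;
    fragments that include it are retarded by the attached peptide, so the
    ladder stops one position before the cross-link.
    """
    resolved = set(resolved_3prime_ends)
    if not resolved:
        raise ValueError("no resolved bands supplied")
    position = min(resolved)
    while position in resolved:
        position += 1
    return position

# Example values taken from the text: bands labeled from C19 through U25.
ladder = list(range(19, 26))
site = crosslink_site(ladder)
print(f"cross-linked nucleotide: position {site}")
```

With the bands described in the text, the first unresolved position is 26, which matches the assignment of guanosine 26.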
Cross-linking Occurs at Tyr47 of Tat-To identify the amino acid(s) of Tat that are involved in specific cross-linking with TAR RNA, the cross-linked RNA-peptide complex (XL1) was prepared on a preparative scale (see "Experimental Procedures"), purified from non-cross-linked TAR RNA by ion-exchange fast protein liquid chromatography, and digested with trypsin. The tryptic digest products were purified on 8 M urea, 20% acrylamide denaturing gels and visualized by autoradiography. We recovered ~100 pmol quantities of a tryptic fragment of XL1 and subjected it to N-terminal sequencing. The amino acid sequencing data showed that it had a sequence of Ala-Leu-Gly-Ile-Ser-X-Gly-Arg-Lys-Lys. This sequence corresponds to the sequence encompassing amino acids 42-51 in HIV-1 Tat protein (Fig. 6). X at the 6th position represents a nonstandard amino acid instead of tyrosine 47 of the Tat peptide. Thus, cross-linking occurs at tyrosine 47 of the Tat peptide.

DISCUSSION
Ultraviolet-induced cross-linking of RNA to proteins is a widely used technique to study in vitro and in vivo RNA-protein interactions (49-51). UV irradiation with sufficient intensity generates a highly reactive species of RNA, which reacts with protein and organic molecules involved in making direct contacts with RNA (52,53). To identify specific RNA-protein contacts, we irradiated the TAR RNA and Tat-(42-72) protein complex with UV light and observed the formation of a covalent bond between RNA and protein. Formation of this covalently cross-linked product was dependent on the concentration of Tat peptide and irradiation time (Figs. 2 and 3). Our competition and control experiments showed that a specific RNA-protein complex formation between TAR RNA and Tat fragment was necessary for photocross-linking reactions (Fig. 4).
To locate the cross-link sites in TAR RNA and the Tat peptide, we prepared the RNA-protein cross-link on a preparative scale, purified the cross-link, and analyzed it by RNA and protein sequencing. Alkaline hydrolysis of 5′-end-labeled cross-links indicated that a single nucleotide, G26, in TAR RNA was involved in covalent interaction (Fig. 5, A and B). The absence of bands in the hydrolysis ladder after U25 from the 5′-end of RNA indicates that the RNA fragments after U25 are covalently linked to the Tat peptide and migrate more slowly to create a gap in the standard hydrolysis ladder. Our results clearly demonstrate that cross-linking occurs at G26 in TAR RNA.
FIG. 5. The sequence of TAR RNA from C19 to U25 is labeled. A gap in the sequence is obvious after the U25 residue, indicating that G26 is the cross-linked base. B, sequence and secondary structure of wild-type TAR RNA used in this study. TAR RNA spans the minimal sequences that are required for Tat responsiveness in vivo (21) and for in vitro binding of Tat-derived peptides (38). Wild-type TAR contains two non-wild-type base pairs to increase transcription by T7 RNA polymerase. U25 represents the nucleotide at which the hydrolysis of the 5′-end-labeled cross-linked RNA-peptide complex was stopped. The arrow indicates the location of guanosine 26, which is the cross-linked base in TAR RNA (shown in boldface).

Peptide sequencing on a tryptic fragment of the cross-link complex was accomplished by Edman degradation chemistry. The sequencing data indicate that cross-linking occurred at Tyr47 of the Tat peptide. As shown in Fig. 6, peptide sequencing identified a nonstandard amino acid X at the 6th position of the cross-linked peptide, Ala-Leu-Gly-Ile-Ser-X-Gly-Arg-Lys-Lys. This sequence corresponds to the region encompassing amino acids 42-51 in HIV-1 Tat protein (Fig. 6). The nonstandard amino acid most likely corresponds to a photomodified tyrosine. Sequencing of proteins by Edman degradation chemistry requires unmodified amino and carbonyl groups in the backbone of the peptide. Evidence that the Edman sequencing reaction was able to continue through Tyr47 indicates that cross-linking does not occur at these locations (54). Therefore, we conclude that the aromatic side chain or Cα atom in the peptide backbone of Tyr47 is involved in the covalent cross-link formation with TAR RNA.
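The step from the Edman read to a residue number is simple index arithmetic, sketched below. This is an illustration under stated assumptions: the reference segment for residues 42-51 is inferred from the text (the read matches that region with X in place of Tyr47), and the function name `modified_residues` is hypothetical.

```python
# Sketch: map an unidentified Edman cycle ("X") back to a residue number.
# Assumption: the reference segment below is Tat residues 42-51 as implied by
# the text, with tyrosine at position 47.

REFERENCE_START = 42
REFERENCE = ["Ala", "Leu", "Gly", "Ile", "Ser", "Tyr", "Gly", "Arg", "Lys", "Lys"]

def modified_residues(read, reference, start, unknown="X"):
    """Return (residue number, expected residue) for each unidentified cycle.

    Raises ValueError if any identified cycle disagrees with the reference,
    since the alignment would then be unreliable.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    hits = []
    for offset, (observed, expected) in enumerate(zip(read, reference)):
        if observed == unknown:
            hits.append((start + offset, expected))
        elif observed != expected:
            raise ValueError(f"mismatch at residue {start + offset}")
    return hits

edman_read = "Ala-Leu-Gly-Ile-Ser-X-Gly-Arg-Lys-Lys".split("-")
result = modified_residues(edman_read, REFERENCE, REFERENCE_START)
print(result)
```

The sixth cycle of a peptide that begins at residue 42 corresponds to residue 42 + 5 = 47, which is the tyrosine named in the text.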
It has been shown by a number of groups that Tat-derived peptides that contain the basic arginine-rich region of Tat are able to form in vitro complexes with TAR RNA (36, 38-44). Recently, Churcher et al. (44) published a detailed comparative study arguing that Tat peptides can mimic the binding affinity and specificity of Tat protein. Results from that study showed that the addition of amino acid residues from the core region of the Tat protein to the arginine-rich domain-containing peptides increased binding specificities (44). To achieve specific RNA binding by a Tat fragment, we used a Tat peptide that contained an RNA-binding domain and six amino acids from the core domain of the Tat protein. In this report, our cross-linking results have established that this Tat-(42-72) peptide forms a specific covalent photocross-link to TAR RNA where Tyr47 of the peptide contacts G26 of the RNA.
What is the biological relevance of these findings? A number of studies showed that the immediate stem nucleotide base pairs flanking the bulge region of TAR RNA are required for Tat binding and trans-activation (44,55,56). During a detailed mutational analysis of TAR RNA, it was reported that a change of the G26-C39 base pair to a C26-G39 base pair resulted in only 12% trans-activation by HIV-1 Tat (56). These reports strongly support our finding that G26 is directly involved in sequence-specific recognition and trans-activation by HIV-1 Tat protein.
However, Tat protein mutants where Tyr47 was substituted with Ala or His were functional for trans-activation (57,58). These data raise the possibility that Tyr47 is not essential for RNA recognition and that the cross-link formation between Tyr47 and G26 could be the result of close proximity and favorable photochemistry. To address this question, we carried out cross-linking experiments with a Tat fragment lacking Tyr47, Tat-(48-72), which binds TAR RNA with high affinities (38,39,44). UV irradiation of TAR RNA complexed with Tat-(48-72) did not yield any specific RNA-protein cross-link products (data not shown). These results support our model of Tat-TAR interactions where the basic recognition sequence of Tat is located in the major groove of TAR RNA, bringing Tyr47 in close vicinity of G26 (Fig. 7). The cross-link formation between G26 and Tyr47 is likely the result of close proximity, favorable orientation, and photoreactivity of tyrosine.
How does Tat interact with TAR RNA? Several lines of evidence suggest that Tat protein contacts TAR RNA in a widened major groove. In a recent study from our laboratory, we used a rhodium complex, bis(phenanthroline)(phenanthrenequinone diimine)-rhodium(III) (Rh(phen)2phi3+), to probe the effect of bulge bases on the major groove width in TAR RNA (59). This metal complex does not bind double helical RNA or unstructured single-stranded regions of RNA. Instead, sites of tertiary interaction that are open in the major groove and accessible to stacking are targeted by the complex through photoactivated cleavage (60). The sites targeted by the rhodium complex have been mapped to single nucleotide resolution on wild-type TAR RNA and on several mutants of the TAR RNA containing different numbers of mismatch bases in the bulge region (59). A strong cleavage at residues C39 and U40 was observed on the wild-type TAR RNA and in mutant TAR RNA containing two mismatch bases in the bulge. No cleavage at C39 and U40 was observed in a bulgeless TAR RNA and in a one-base bulge TAR RNA. Our studies establish two important factors involved in Tat-TAR recognition. (i) There is a correlation between major groove opening and Tat binding. At least a two-base bulge is required for major groove widening and other conformational changes to facilitate Tat binding. This cannot be accomplished by a single base bulge. (ii) The Tat fragment Tat-(42-72) occupies the major groove of TAR RNA and abolishes access of the rhodium complex. On the basis of chemical modification and gel mobility studies, a similar model was suggested earlier by Weeks and Crothers (55). Last, Hamy et al. (61) carried out site-specific modifications of functional groups on TAR RNA and showed that Tat forms multiple specific hydrogen bonds to a series of dispersed sites displayed in the major groove.
To determine the relative orientation of the nucleic acid and protein in the Tat-TAR complex, we have devised a new method based on psoralen photochemistry (48). We synthesized a 31-amino acid fragment containing the arginine-rich RNA-binding domain of Tat (residues 42-72) and chemically attached a psoralen at the amino terminus. Upon near ultraviolet irradiation (360 nm), this synthetic psoralen peptide cross-linked to a single site in the TAR RNA sequence. The RNA-protein complex was purified, and the cross-link site on TAR RNA was determined by chemical and primer extension analyses. Our results show that the amino terminus of Tat-(42-72) contacts, or is close to, uridine 42 in the lower stem of TAR RNA (48).
On the basis of the above studies, we suggest a model in which Tat binds to TAR RNA by inserting the basic recognition sequence into the enlarged major groove with an orientation where lysine 41 in the core domain of Tat contacts the lower stem and Tyr47 is close to G26 of TAR RNA (Fig. 7). These findings are intriguing and suggest a possible mechanism of RNA recognition by Tat.

FIG. 7. The TAR RNA structure is based on NMR data (63). Ribbon structure of TAR RNA is shown in five dark lines. The basic region of Tat-(47-57) is represented as a barrel positioned in the wide major groove, and the N-terminal region containing Tat-(42-46) is drawn as a line. Tyrosine 47 is shown directly above the G26 of TAR RNA (indicated in black) to demonstrate a close proximity between Tyr47 and G26. As determined by psoralen-Tat cross-linking experiments, the amino terminus of Tat-(42-72) contacts, or is close to, uridine 42 in the lower stem of TAR RNA (48); the amino terminus of the peptide is labeled as NH2, and its proximal base, U42 of TAR RNA, is indicated in black. Structures of TAR RNA were visualized using Insight II software on an IRIS work station.