Presteady-state Analysis of Avian Sarcoma Virus Integrase

Integrase catalyzes insertion of a retroviral genome into the host chromosome. After reverse transcription, integrase binds specifically to the ends of the duplex retroviral DNA, endonucleolytically cleaves two nucleotides from each 3′-end (the processing activity), and inserts these ends into the host DNA (the joining activity) in a concerted manner. In first-turnover experiments with synapsed DNA substrates, we observed a novel splicing activity that resembles an integrase joining reaction but uses unprocessed ends. This splicing reaction showed an initial exponential phase (k splicing = 0.02 s−1) of product formation and generated products macroscopically indistinguishable from those created by the processing and joining activities, thus bringing into question methods previously used to quantitate these reactions in a time regime where multiple turnovers of the enzyme have occurred. With a presteady-state assay, however, we were able to distinguish between different pathways that led to formation of identical products. Furthermore, the splicing reaction allowed characterization of substrate binding and specificity. Although integrase requires only a 3′ hydroxyl with respect to nucleophiles derived from DNA, it specifically favors the cognate sequence CATT as the electrophile. These experimental results support a two-site “switching” model for binding and catalysis of all three integrase activities.

After infection, retroviruses create a linear DNA copy of their RNA genome that, through the strand-transfer mechanism of reverse transcriptase, places the U3 region of the LTR sequence at one terminus and the U5 region of the LTR sequence at the other terminus (1). Retroviral replication is dependent on the viral protein integrase catalyzing the recombination of the viral DNA genome into the host genomic DNA. Integrase binds to the two blunt-ended viral LTRs, hydrolyzes the terminal two nucleotides to expose a recessed 3′-OH of the conserved CA dinucleotide at each of the ends (the processing activity), and inserts these "processed" ends into the host DNA (the joining activity) at sites separated by a virus-specific stagger of six base pairs for avian sarcoma virus (ASV). The location of the insertion is nearly random as there is little sequence specificity for the site of recombination within the host genome (2-4). The processing and joining activities are biochemically similar in that both use a hydroxyl group as the nucleophile in an endonucleolytic cleavage. In the case of the processing activity (3′-dinucleotide removal), the enzyme is specific in its choice of electrophile (the cognate CATT), whereas the nucleophile (the processed CA-OH) is specified in the joining reaction (strand transfer). Both the ends-processing and joining activities have been reproduced in vitro with purified recombinant integrase and oligonucleotides whose sequences are derived from the retroviral U3 and U5 LTR sequences (5-7). Detailed examination of the processing activity in vitro has revealed that integrase requires the physiologically relevant configuration of both U3 and U5 ends to be bound concurrently for maximal efficiency of processing catalysis, with the U3 sequence undergoing both the cleavage and recombination reactions earlier than the U5 sequence (8).
Structural investigations of integrase suggest that the enzyme possesses three structural domains: 1) an N-terminal domain characterized by a zinc-stabilized helix-turn-helix, 2) a central core domain with a D,D(35)E motif, and 3) a C-terminal domain with structure resembling a Src homology 3 (SH3) domain (for review, see Refs. 9-11). Although there is much evidence that active integrase functions as a multimer (12-15), it is still undetermined how the three domains interact with each other within a single integrase monomer or a multimer integrase-DNA complex. Wang et al. (16) have recently reported the structural solution of a two-domain fragment of human immunodeficiency virus-1 integrase that suggests a dimer of dimers resembling Tn5 transposase. However, the actual oligomeric state and geometric arrangement of integrase monomers within the active DNA-protein complex remain unconfirmed, and the detailed catalytic mechanism of the coordinated cleavage of four different DNA segments, resulting in the concerted insertion of the two LTR ends, remains unsolved. Two general models of the organization of the multimeric complex necessary to catalyze the insertion of the two ends of DNA have been proposed (17-19). In one model, separate integrase molecules bind each of the two processed viral DNA ends with two separate integrase molecules binding host DNA to prepare the two host phosphates for attack. In the other model, each integrase monomer contains separate binding sites for host and viral DNA, and a single active site catalyzes both processing and joining reactions, thus requiring the active integrase complex to only be a dimer. An unanswered question with both of the models, however, is the manner in which the LTR ends, which serve as electrophiles in the processing reaction, come to reside in the active site as nucleophiles in the joining reaction (where the host DNA is the electrophile).
This question becomes increasingly complex when one considers the possible combinations of nucleophiles recognized by integrase (20) and the fact that there must be specific and nonspecific DNA binding sites.
In the course of undertaking a presteady-state investigation of the processing reaction using synapsed-end substrates, we developed an assay that features a preincubation step to form DNA-protein complexes before the initiation of catalytic activity. Along with analysis of product formation during the first enzymatic turnover, the presteady-state assay enabled direct comparison of the reactivity of substrates while minimizing the complications involved in the quantitation of enzymatic activity. Using this assay for the processing reaction with synapsed-end substrates modeled after those of Kukolj and Skalka (5), we report here the discovery of a novel splicing reaction with products nearly indistinguishable from those of the processing and joining reactions. The identification of the splicing reaction resolved complications with the accurate quantitation of enzymatic activity. Additionally, the splicing reaction was used as a tool to gain significant insight into the structure-function relationship in the mechanism of sequence recognition by integrase. Specifically, the splicing reaction allowed us to probe the selection of nucleophiles and electrophiles by the structurally defined, sequence-specific binding sites of integrase. The results from these experiments allow us to propose a model for the configuration of the nucleophile and electrophile within the enzyme active site that satisfies the varying specificity requirements of the three integrase-catalyzed reactions.
For clarity, the remainder of this report will refer to the 3′-end dinucleotide TT-trimming activity as the "processing" reaction, the subsequent sequence-dependent insertion of processed ends into a double-stranded DNA target as the "joining" reaction, and this novel activity with synapsed substrates as the "splicing" reaction.

EXPERIMENTAL PROCEDURES
Reagents and Buffers-Except where noted, all buffers were made with reagent grade chemicals and Milli-Q Plus (Millipore, Bedford, MA) purified distilled-deionized water. Urea, SDS, and dithiothreitol were Ultrapure grade obtained from U. S. Biochemical Corp. Kanamycin sulfate was obtained from Amresco (Solon, OH). EDTA, HEPES, and ammonium sulfate were SigmaUltra grade obtained from Sigma. Spectroscopic grade glycerol was from Aldrich. Isopropyl β-D-thiogalactopyranoside was from Fermentas (Hanover, MD). Buffer A is 50 mM Tris-HCl, pH 8.0, 10% sucrose (w/v). Buffer B is 50 mM Tris-HCl, pH 7.5, 10% glycerol (v/v). When NaCl is added to either buffer A or B, the resulting buffer is denoted with an A or B followed by the NaCl concentration in mM. Storage buffer is 50 mM Na-HEPES, pH 7.5, 500 mM NaCl, 40% spectroscopic grade glycerol. 5% polyethyleneimine (PEI) was made from 50% (w/v) PEI (Sigma), and the pH was adjusted to 7.5 with concentrated HCl. All buffer stock solutions were filtered through a 0.2-µm polyethersulfone filter (Nalgene, Rochester, NY).
Enzymes and Proteins-Restriction enzymes and T4 DNA ligase were obtained either from Fermentas (Hanover, MD) or Invitrogen. T4 polynucleotide kinase and acetylated bovine serum albumin were obtained from U. S. Biochemical Corp. Lysozyme was obtained from Sigma. ASV integrase was purified as described under "Purification of Integrase."
Synthetic Oligodeoxyribonucleotides-Oligodeoxyribonucleotides were synthesized by the Center for Gene Research and Biotechnology Central Services Laboratory (Oregon State University) and purified by denaturing PAGE (20 or 13% acrylamide, 8 M urea in TBE) as previously described (21). Reversed-polarity oligodeoxyribonucleotides were synthesized using 5′-β-cyanoethyl phosphoramidites (Glen Research, Sterling, VA). Concentrations were determined spectrophotometrically in Tris-EDTA using the calculated extinction coefficients at 260 nm (22) listed in Table I. Radiolabeled oligodeoxyribonucleotides were 5′-end-labeled except where otherwise specified and are designated by an asterisk (*). 5′-Radiolabeled oligodeoxyribonucleotides were prepared using 3 units of T4 polynucleotide kinase, 70 µCi of [γ-³²P]ATP (Amersham Biosciences), and 50 pmol of DNA in 10 µl using the manufacturer's reaction buffer. After incubation at 37°C for 15 min, reactions were quenched by the addition of 10 µl of 500 mM EDTA. Complementary strands of DNA were then added at equimolar amounts and annealed by heating to 100°C followed by slow cooling to room temperature over a period of 15-20 min. Kinase was removed by extraction with phenol:chloroform:isoamyl alcohol (25:24:1) followed by a chloroform-only extraction. Unincorporated nucleotides and residual organic material were removed by purification through a Bio-Spin-6 micro-column (Bio-Rad). Final yield and purity of radiolabeled substrates were determined by thin-layer chromatography on PEI cellulose plates developed in a mobile phase of 0.75 M LiCl, 1.25 M formic acid, 40% ethanol.
Under these conditions, labeled DNA remains at the origin, whereas unincorporated nucleotides are eluted by the mobile phase. Accurate quantitation of DNA yield is achieved by comparing the amount of radioactivity bound at the origin for samples obtained before and after purification steps. Typically, 87-90% of the DNA is recovered with complete removal of unincorporated nucleotides. In addition, comparison of the radioactivity in unincorporated label and DNA in the pre-purification sample shows that the efficiency of the labeling reaction exceeds 90% under these conditions.
The naming convention used for annealed DNA substrates is as follows: 1) strands with sequences derived from the U5 and U3 ends of the ASV genome are designated with a "5" and "3," respectively, 2) strands of duplex DNA containing the ASV integrase cognate sequence, CATT, are designated with a "t," 3) strands containing the complementary GTAA sequence are designated with a "b," 4) synapsed strands are designated with the length of the tether within parentheses, 5) duplex names consist of a list of the names of all annealed oligodeoxyribonucleotide strands separated by slashes (/), and 6) sequences labeled with ³²P are represented in the text with an asterisk (*) at the beginning or end to denote 5′- or 3′-end radiolabeling, respectively.
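The naming convention above can be read mechanically. The following is a minimal sketch in Python, not part of the original methods, that splits a duplex name such as *5t/5b(2)3t/3b into its strands and reports the radiolabel position and tether length. The `Strand` class and `parse_duplex` function are hypothetical names introduced here for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strand:
    name: str              # strand name with the label asterisk removed
    label: Optional[str]   # "5'" or "3'" if radiolabeled, otherwise None
    tether: Optional[int]  # tether length for synapsed strands, otherwise None


def parse_duplex(duplex: str) -> List[Strand]:
    """Split a duplex name on slashes and interpret each strand name."""
    strands = []
    for part in duplex.split("/"):
        part = part.strip()
        label = None
        if part.startswith("*"):      # leading asterisk: 5'-end label
            label, part = "5'", part[1:]
        elif part.endswith("*"):      # trailing asterisk: 3'-end label
            label, part = "3'", part[:-1]
        match = re.search(r"\((\d+)\)", part)  # tether length in parentheses
        tether = int(match.group(1)) if match else None
        strands.append(Strand(part, label, tether))
    return strands


if __name__ == "__main__":
    for strand in parse_duplex("*5t/5b(2)3t/3b"):
        print(strand)
```

For *5t/5b(2)3t/3b this yields three strands: 5t carrying a 5′ label, the synapsed strand 5b(2)3t with a two-nucleotide tether, and the unlabeled 3b.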
Bacterial Strains and Plasmid DNA-Plasmid pRC23(IN) in Escherichia coli MC1061 has been described (23). The translated sequence of this construct differs from the sequence recorded for the Schmidt-Ruppin B strain in GenBank (accession number AF052428). These differences have been attributed to strain variance (footnote 2). Because of the possible importance of one of these differences, a glutamic acid at position 256 was altered to match the lysine of the reported sequence by PCR site-directed mutagenesis. The altered gene was subsequently subcloned into pET24a (Novagen, Madison, WI) to produce the recombinant plasmid, pET24a(IN). Overproduction of integrase from this construct was obtained in E. coli BL21(DE3) (Novagen) by using isopropyl β-D-thiogalactopyranoside induction at 25°C. Although recombinant integrase from either variant is biologically competent (footnote 2), in presteady-state in vitro assays, integrase with Lys-256 produced 2-fold greater processing and splicing burst amplitudes (data not shown).
Protein Expression-E. coli BL21(DE3)/pET24a(IN) was grown in Luria-Bertani broth with 30 µg/ml kanamycin, 20 mM HEPES, pH 7.5, and 6 drops/liter Sigma antifoam B at 25°C to A₆₀₀ ≈ 0.8 (~4 × 10⁸ cells/ml) and induced with 1 mM isopropyl β-D-thiogalactopyranoside, 0.5 mM ZnCl₂, and additional Sigma antifoam B. The cells were then grown for 3 h at 25°C, harvested by centrifugation, rinsed in buffer A containing 100 mM NaCl (A100), and recentrifuged, and the pellet was stored at −80°C. The average yield of wet weight cell paste ranged from 4 to 5 g/liter of culture. The crude post-induction cell lysate was analyzed by SDS-PAGE stained with Coomassie Blue.
Purification of Integrase-All purification steps were performed at 4°C or on ice except where noted. Frozen cells (35 g) were thawed and completely resuspended in 80 ml of buffer A20 plus 1 ml of protease inhibitor cocktail (Sigma, P8849), 5 mg of carboxypeptidase inhibitor from potato tuber (Sigma), and 300 µg/ml lysozyme. The suspension was incubated for 25 min at room temperature with constant stirring, placed on ice, and adjusted to 0.52 M NaCl, 1 mM ZnCl₂, 5 mM dithiothreitol. An additional 1 ml of protease inhibitor mixture and 5 mg of carboxypeptidase inhibitor were added, and the suspension was stirred for an additional 20 min. The lysate was sonicated for 20 min (20-W output, Fisher Scientific 60 Sonic Dismembrator) to reduce viscosity followed by centrifugation at 43,700 × g for 60 min to remove cellular debris, yielding 90 ml of cleared lysate. The cleared lysate was adjusted to 100 mM NaCl by the addition of buffer B with no salt (B0). Integrase was precipitated by the addition of 5% PEI while stirring to a final concentration of 0.05% (v/v). Stirring was continued for an additional 20 min, and the precipitate was collected by centrifugation at 15,300 × g for 20 min. The pellet was washed by homogenization using a Dounce homogenizer with 100 ml of buffer B with 350 mM NaCl (B350), and the pellet was recovered by centrifugation at 15,300 × g for 15 min. The pellet was then extracted twice using a Dounce homogenizer with 150 ml of buffer B500 containing 3 mM dithiothreitol and 50 nM ZnCl₂ followed by centrifugation at 15,300 × g for 20 min each. The combined 300 ml of "PEI eluate" was diluted to 350 mM NaCl with buffer B0 and loaded at a flow rate of 5 ml/min onto a 38-ml Macro-Prep High S (Bio-Rad) column (6.4 × 2.6 cm) pre-equilibrated in B350. The column was washed to baseline with 3-4 column volumes of buffer B500, and integrase was eluted with a 225-ml linear NaCl gradient from 0.5 to 1.5 M NaCl in buffer B at the same flow rate.
Integrase was eluted as a broad peak centered around 1.1 M NaCl. Fractions from 900 mM to 1.4 M NaCl were pooled. The front portion of the elution peak, corresponding to an NaCl concentration lower than 900 mM, was excluded from the pooled eluate to avoid a nuclease contaminant. Integrase was precipitated with ammonium sulfate at 56.4% saturation and redissolved in 20 ml of storage buffer. The concentrated protein was dialyzed against 3 changes of 500 ml of storage buffer. The final integrase concentration in storage buffer after a 45-min clearing spin at 43,700 × g typically ranged from 130 to 200 µM as determined spectrophotometrically using ε₂₈₀ = 59,940 M⁻¹ cm⁻¹. This value of ε₂₈₀ was calculated based on the presence of 2 Tyr, 10 Trp, and 4 Cys (24). UV spectra of integrase obtained in 0-8 M urea showed no changes in A₂₈₀ upon unfolding of the protein. Protein was stored at −80°C in 0.3-1-ml aliquots. The protocol yielded 2-2.5 mg of purified integrase (>99% homogeneity as judged by SDS-PAGE stained with Coomassie Blue)/g of wet cells. The purified pooled fractions, when assayed for both single- and double-stranded nuclease activity, showed less than 5% nonspecific degradation of labeled DNA at incubation times in excess of 60 min, which exceeds the longest time points used in our experiments.
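As a check on the extinction coefficient quoted above, the sketch below sums per-residue contributions for the stated composition (10 Trp, 2 Tyr, 4 Cys) and converts an absorbance reading to a molar concentration with the Beer-Lambert law. The per-residue values (5690, 1280, and 120 M⁻¹ cm⁻¹) are an assumption: they are commonly used values for this type of calculation and reproduce the reported 59,940 M⁻¹ cm⁻¹, but the text itself does not list them. The function names are hypothetical.

```python
# Per-residue molar absorptivities at 280 nm, in M^-1 cm^-1.
# ASSUMPTION: these values are not given in the text; they are commonly used
# values that reproduce the reported total of 59,940 M^-1 cm^-1.
EPS_TRP = 5690
EPS_TYR = 1280
EPS_CYS = 120


def epsilon_280(n_trp: int, n_tyr: int, n_cys: int) -> int:
    """Sum the residue contributions to the extinction coefficient at 280 nm."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cys * EPS_CYS


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    eps = epsilon_280(n_trp=10, n_tyr=2, n_cys=4)
    print(eps)  # 59940
    # Hypothetical reading: A280 = 0.60 for a diluted sample in a 1-cm cuvette.
    print(concentration_molar(0.60, eps))
```

The absorbance in the usage lines is illustrative only; stock solutions in the 130-200 µM range would need dilution before measurement in a 1-cm cuvette.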
Presteady-state Burst Assay-Standard reaction mixtures (100 µl) contained 10 mM Na-HEPES, pH 7.5, 4% glycerol, 20 mM Tris, pH 8.0, 10 mM 2-mercaptoethanol, 0.050 mg/ml acetylated bovine serum albumin, and 130 mM NaCl. Integrase was first preincubated with radiolabeled oligonucleotide substrates on ice for 30 min and then warmed briefly for 2 min at 37°C immediately before the start of the assay. A complete range of preincubation (0-2 h) and warm-up times (0-15 min) was tested to ensure the mixtures had achieved equilibrium while minimizing enzyme degradation. Reactions were initiated by the addition of 37°C MnCl₂ to 5 mM. Although there is uncertainty in the literature with regard to the physiological divalent metal cofactor for integrase, Mn²⁺ was used because it has been observed that retroviral integrases exhibit maximal activity in standard assays with this divalent cation (20). Additionally, no activity was detected with ASV integrase when Mg²⁺ was used at physiological concentrations. A range of MnCl₂ concentrations (0-20 mM) was examined to optimize enzymatic activity. At time points ranging from 4 s to 30 min, aliquots of the reaction mixture were withdrawn and added to a quenching solution (8 M urea, 0.25 M EDTA, 20% sucrose) in a 2:1 quench:reaction mix ratio. Generally, seven time points were taken in the first 30 s of the reaction with seven more time points in the ensuing 29 min. The reaction products were then analyzed by denaturing sequencing PAGE (20% acrylamide, 8 M urea, TBE) using a Sequi-Gen GT 38 × 39 cm apparatus (Bio-Rad). Bands in the gel were visualized by using a Molecular Dynamics PhosphorImager (Amersham Biosciences), and the intensity of DNA bands was quantitated using ImageQuant software (Amersham Biosciences).
Quantitation-The intensity of each product band, I_i(t), at each time, t, was first normalized with respect to the sum of intensities in the starting substrate band, I_0(t), plus all product bands according to Equation 1.
This normalized product fraction, F_i,norm, was then corrected for background intensity present at t = 0 for the ith band and renormalized for background intensities of all product bands to obtain the final corrected product fraction, F_i,corr, according to Equation 2.
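Equations 1 and 2 themselves do not appear in this text, so the following Python sketch is only one reading of the verbal description. The normalization follows the wording directly (each product band divided by the sum of the substrate band and all product bands). The background correction is an assumed form: the fraction at t = 0 is subtracted, and the result is divided by one minus the summed t = 0 fractions of all product bands. The function names and the example intensities are hypothetical.

```python
from typing import List, Sequence


def normalized_fractions(substrate: float, products: Sequence[float]) -> List[float]:
    """Normalize each product band by the substrate band plus all product bands."""
    total = substrate + sum(products)
    return [p / total for p in products]


def background_corrected(f_norm_t: Sequence[float],
                         f_norm_0: Sequence[float]) -> List[float]:
    """ASSUMED form of the background correction (the published equation is
    not reproduced in the text): subtract each band's t = 0 fraction and
    renormalize by the signal remaining after all t = 0 backgrounds."""
    remaining = 1.0 - sum(f_norm_0)
    return [(ft - f0) / remaining for ft, f0 in zip(f_norm_t, f_norm_0)]


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) at t = 0 and at time t.
    f0 = normalized_fractions(substrate=98.0, products=[1.0, 1.0])
    ft = normalized_fractions(substrate=90.0, products=[6.0, 4.0])
    print(background_corrected(ft, f0))
```

Multiplying a corrected fraction by the total substrate concentration converts it to a product concentration, which is the quantity plotted against time in a burst experiment.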
The number of exponential phases observable for any particular experiment increased with the length of the time regime under examination (8). Accordingly, the resulting time courses were fitted to Equation 3 consisting of n exponential terms, with amplitudes A_i and apparent rate constants k_burst,i, plus a linear term with an apparent rate constant k_lin to fit to the linear portion of the ensuing exponential phase.
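The model described for Equation 3 can be sketched as a sum of rising exponentials plus a linear term. The code below is a minimal illustration under the assumption that each exponential term has the usual burst form A_i(1 − exp(−k_burst,i·t)). The published equation is not reproduced in this text, and `burst_model` is a hypothetical name. Only evaluation of the model is shown, not the non-linear least squares fit.

```python
import math
from typing import Sequence


def burst_model(t: float,
                amplitudes: Sequence[float],
                k_bursts: Sequence[float],
                k_lin: float) -> float:
    """Product formed at time t: n exponential burst phases plus a linear phase.

    ASSUMED functional form: sum_i A_i * (1 - exp(-k_burst_i * t)) + k_lin * t.
    """
    burst = sum(a * (1.0 - math.exp(-k * t))
                for a, k in zip(amplitudes, k_bursts))
    return burst + k_lin * t


if __name__ == "__main__":
    # Amplitude (5.9 nM) and rate constant (0.2 s^-1) are the values reported
    # for the processing burst; the linear rate is a hypothetical placeholder.
    for t in (0.0, 4.0, 10.0, 30.0):
        print(t, burst_model(t, [5.9], [0.2], k_lin=0.001))
```

With a single exponential phase and no linear term, the product reaches half of the burst amplitude at t = ln 2 / k_burst, which is about 3.5 s for k_burst = 0.2 s⁻¹.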
Non-linear least squares fittings were performed using Kaleidagraph software (Synergy, Redding, PA).

Preparation and Purification of Splicing Product for Maxam-Gilbert Sequencing-A 300-µl reaction was performed under standard conditions with 3 µM integrase and 3 µM *5t/5b(2)3t/3b or *5t-2/5b(2)3t/3b (for oligodeoxyribonucleotide definitions see Table I) and allowed to proceed for 6.5 min. The reaction was stopped with quench solution, and the products were separated by denaturing sequencing PAGE in a 20% acrylamide, 8 M urea gel. The product bands were recorded on Kodak X-Omat film, and the largest product band was excised from the gel and isolated with an Elutrap electroelution device (Schleicher & Schuell). Maxam-Gilbert sequencing was performed on the purified product DNA according to the published protocol (25).
Double-filter Binding-Nitrocellulose filter binding experiments were performed with only minor modifications to the double-filter method previously described (26). Standard assay buffer without Mn²⁺ was used for binding studies. Integrase was first incubated on ice for 30 min with radiolabeled oligonucleotide substrates. After incubation, the solution was filtered through a combination of nitrocellulose (BA-S83) and DEAE (1 NA-45 or 3 DE-81) membranes (Schleicher & Schuell or Whatman, Maidstone, England) by using a 96-well Bio-Dot microfiltration apparatus (Bio-Rad). Immediately after filtration of the reaction solution, the wells of the apparatus were rinsed twice with an equal volume of ice-cold buffer. The filters were imaged by using a Molecular Dynamics PhosphorImager (Amersham Biosciences) and quantitated as described (26). For experiments using DE-81 membranes, the radioactivity of all three membranes was quantified and summed.
Footnote 2: R. Katz, personal communication.
Fig. 1A shows results from a typical first-turnover experiment performed with a synapsed DNA substrate, *5t/5b(2)3t/3b, as detected by electrophoresis on an 8 M urea, 20% acrylamide sequencing gel. This substrate contains DNA sequences from the U3 and U5 ends of the viral LTR organized with the cognate CATT sites held together by a two-nucleotide single-stranded tether in a head-to-head configuration (5). The substrate was radiolabeled at 5t, the 21-mer DNA strand containing the U5 sequence. This 21-mer terminates at the 3′-end with the processing cognate sequence, CATT, which is endonucleolytically cleaved by integrase at the −2 position to exponentially yield an expected 19-mer as the major product at 0.2 s⁻¹ (Fig. 1B) plus a dinucleotide TT product, which is silent in this assay (27-29). The 5.9 nM burst amplitude reflects only 1.2% of total integrase monomers.
Although the concentrations used (500 nM integrase, 500 nM DNA) were known not to be saturating, the propensity of integrase to aggregate at higher concentrations precluded experiments under saturating conditions (8). In addition, a minor 18-mer side product was also observed. Because the ratio between the two products was approximately constant, the 18-mer was likely produced in parallel to the 19-mer major product rather than from the degradation of the 19-mer. This 18-mer is consistent with "near miss" processing at the −3 position observed with Mn²⁺ as the metal cofactor as reported (28, 30).

Presteady-state Assay Reveals Splicing of Synapsed Substrate by ASV Integrase-
Unexpectedly, a third product appeared that was larger than the starting 21-mer. Isolation from a semi-preparative scale assay reaction and sequencing (Fig. 2A) showed that this product was a 46-mer derived from the direct splicing of an unprocessed *5t 21-mer into a 5b(2)3t 44-mer strand at its internal cognate CATT site (Fig. 2C). Interestingly, the electrophilic site of splicing occurred at the expected processing position for the U3 cognate sequence, CATT, of 5b(2)3t.
To determine whether this splicing reaction maintains the same sequence specificity as the processing reaction or if the position of attack resulted from the opportunistic placement of the 3′-OH of *5t as a convenient nucleophile near the processing site of 5b(2)3t, an alternate substrate, *5t-2/5b(2)3t/3b, with a slightly different geometry was assayed, and the large product was isolated and its nature examined. In place of the radiolabeled *5t 21-mer previously used, this substrate contained a radiolabeled 19-mer, *5t-2, lacking the terminal TT dinucleotide of *5t. This substitution placed the attacking 3′-OH two base pairs farther away from the CATT of 5b(2)3t, where splicing was observed with the 21-mer. If the observed splicing into 5b(2)3t at the cognate CATT site by *5t was simply a result of spatial constraints, then the site of attack by the 19-mer *5t-2 would be expected to change in accordance with the distance change. Conversely, if splicing into the cognate CATT was sequence-specific, then *5t-2 would attack at the same location as *5t despite being two nucleotides farther removed from the site of splicing. Because *5t-2 was identical in sequence to the processing product 19-mer, no bands corresponding to smaller products were expected or observed. However, a product larger than the starting material was again detected (data not shown). Sequencing showed this product to be a 44-mer (Fig. 2B), resulting from the attack of the terminal 3′-CA-OH of *5t-2 on 5b(2)3t at the identical position, between the A and T, of the internal cognate sequence (Fig. 2D). Surprisingly, Kukolj and Skalka (5) did not observe products consistent with the splicing reaction for a substrate with spacing similar to *5t-2. At present, no obvious rationale can be provided for this apparent discrepancy.
Fig. 2 legend (partial): ... (5t-2) products. DNA in the large product bands was isolated and purified from a 390-s activity assay at 3 µM integrase, 3 µM *5t/5b(2)3t/3b (A) or *5t-2/5b(2)3t/3b (B), 100 mM NaCl. Maxam-Gilbert sequencing of the product identified the position of nucleophilic attack by the unmodified 3′-OHs to be between the A and T of the internal cognate CATT for both *5t/5b(2)3t/3b (C) and *5t-2/5b(2)3t/3b (D).
The 44- and 46-mer products result from nucleophilic attack by a 3′-OH from two different terminal sequences, CA and CATT, respectively, at an apparently sequence-specific electrophilic site despite the fact that the two substrates placed the attacking 3′-OH nucleophiles at different distances from the site of splicing. The relative efficiencies of the two reactions were measured in first-turnover experiments to investigate this phenomenon quantitatively (Fig. 3). Formation of splicing products in both reactions showed identical initial exponential phases with an exponential product formation rate of 0.02 s⁻¹ in the early time regime. Interestingly, the amplitudes and, more importantly, the rates of product formation of the initial exponential phase, representing the first turnover of catalytically competent pre-formed complexes, were comparable for both substrates despite the possible difference in the location of the nucleophiles relative to the site of splicing. The cognate sequence specificity of the electrophile binding site alone appears to be sufficient to determine the specificity of the splicing reaction.
Sequence Requirements of the Splicing Reaction-Two mutant single-site synapsed substrates were synthesized by changing either the U5 or U3 cognate CATT sequence to GCAA to examine nucleophile versus electrophile selectivity. The substrate *m5t/m5b(2)3t/3b contained a radiolabel on the 21-nucleotide-long m5t strand, in which the terminal CATT sequence of the 5t strand was substituted with GCAA. A complementary mutation was made on the DNA strand of the substrate containing the tether, m5b(2)3t, to maintain base pairing. The internal U3-derived cognate sequence, where splicing occurs, was unmodified. Alternatively, the substrate *5t/5b(2)m3t/m3b was synthesized by making the CATT to GCAA substitution at the internal cognate sequence, derived from the U3 LTR, while leaving the sequence at the U5-derived terminus unaltered. Fig. 4 shows the results from assays, at low (100 mM, A) and high (400 mM, B) NaCl concentrations, with the unmodified dual-site substrate along with those of the two mutant single-site substrates. The single-site substrate *m5t/m5b(2)3t/3b (circles), comprising a nonspecific splicing nucleophile and a cognate splicing electrophile, was observed to have significantly higher reactivity than the conventional dual-site substrate *5t/5b(2)3t/3b (squares). In contrast, the single-site substrate with a cognate splicing nucleophile and a nonspecific splicing electrophile sequence, *5t/5b(2)m3t/m3b (diamonds), had barely measurable splicing activity. Fig. 4B also shows that the amounts of splicing product observed in the presteady-state assays are greatly increased at 400 mM NaCl. Although the rate constant for the initial appearance of these products was smaller, the actual rate of product formation when expressed as mole of product formed/s was similar. This suggests that the observed reduction in rate constant may merely be a reflection of the increase in amplitude.
Consistent with this result, preliminary experiments beyond the scope of this report indicate that the inhibitory effects due to the aggregation of integrase into higher assembly states can be partially reversed by the addition of salt (8).
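The distinction drawn above between the rate constant and the rate of product formation can be made explicit for a single exponential phase. The expression below is an illustrative sketch rather than an equation from the original analysis; A denotes the burst amplitude, k the apparent rate constant, and v_0 (a symbol introduced here) the initial rate of product formation.

```latex
\[
[P](t) = A\left(1 - e^{-kt}\right),
\qquad
v_0 = \left.\frac{d[P]}{dt}\right|_{t=0} = A\,k
\]
```

Under this reading, an increase in A accompanied by a proportional decrease in k leaves the product A·k, and hence the initial rate in moles of product per second, unchanged.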
Because the choice of nucleophile used in the splicing reaction can be CATT-OH, a processed CA-OH (Fig. 3), or the non-cognate GCAA-OH, these results showed that the nucleophile binding site on integrase is not sequence-specific. The cognate CATT sequence was, however, required at the electrophile site in order for efficient splicing to occur. Given the lack of discrimination of the enzyme regarding selection of a nucleophile in these reactions, the 5′-OH in spatial proximity to the splicing site was also examined for reactivity. This hydroxyl is at the 5′ terminus of the 21-mer strand, 3b, annealed to the 44-mer 5b(2)3t that contains the internal CATT splicing site. In an experiment where 3b* was 3′-end-radiolabeled using [α-³²P]TTP and wild type T7 DNA polymerase, no new bands representing any reaction products were observed under the conditions tested (data not shown). This result indicated that although any 3′-OH can act as a nucleophile, the 5′-OH is non-reactive.
Effect of Synapsed Ends on Equilibrium Binding-The equilibrium binding of integrase to these substrates was investigated by the nitrocellulose double-filter binding method (26). In addition to the usual dual-site synapsed substrate *5t/5b(2)3t/3b, the single-site 20-mer substrate *3t/3b and the single-site …
The parameter n, akin to the Hill coefficient, is a direct measure of the apparent cooperativity of binding and provides a lower limit for the minimal number of binding sites required for a fit. The two single-site substrates titrated identically despite their difference in length (Fig. 5). By comparison, the dual-site synapsed substrate showed both an increase in binding affinity and an increase in apparent cooperativity of binding as indicated by a shift to the left and an increase in the steepness of transition from n = 1.6 ± 0.06 to 2.0 ± 0.08. In the absence of an independent measure of the assembly state of integrase, these titration data alone were insufficient to define a specific model for binding. However, these results were indicative of a change in binding modes by integrase due to the presence of two cognate sites on the DNA substrate. In contrast, when only a single cognate site was present on a synapsed substrate, binding was identical to that of two different substrate molecules despite a significant difference in their sizes.
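The effect of the cooperativity parameter n on the steepness of a titration can be illustrated with a generic Hill-type binding function. This is a sketch under stated assumptions: the fitting equation used for Fig. 5 is not reproduced in this text, so the standard Hill form is assumed, and the half-saturation concentration `k_half` is a hypothetical parameter whose values are not reported here. Only the two n values (1.6 and 2.0) come from the text.

```python
def hill_fraction_bound(conc: float, k_half: float, n: float) -> float:
    """Fraction of DNA bound at a given protein concentration.

    ASSUMED generic Hill form: conc**n / (k_half**n + conc**n).
    """
    return conc ** n / (k_half ** n + conc ** n)


if __name__ == "__main__":
    k_half = 1.0  # hypothetical half-saturation point, arbitrary units
    for conc in (0.25, 0.5, 1.0, 2.0, 4.0):
        single_site = hill_fraction_bound(conc, k_half, n=1.6)
        dual_site = hill_fraction_bound(conc, k_half, n=2.0)
        print(conc, round(single_site, 3), round(dual_site, 3))
```

With the midpoint held fixed, the larger n gives less binding below the midpoint and more above it, which is the steeper transition described for the dual-site substrate. The reported leftward shift would correspond to a smaller `k_half` in this form.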
Intramolecular Versus Intermolecular-Theoretically, the splicing reaction could have occurred either intermolecularly, between two different synapsed substrates, or intramolecularly, with the *5t strand splicing into the CATT of the 5b(2)3t annealed to it (Fig. 6A). To address this issue, an integrase activity assay was performed using an equimolar solution of radiolabeled "preprocessed" *5t-2/5b(2)3t/3b and unlabeled full-length 5t/5b(2)3t/3b. The rationale for the assay stems from the assumption that both the 5t and the 5b(2)3t strands can serve as electrophiles because they both contain a cognate sequence CATT. Therefore, in an intermolecular splicing reaction, the 3′-OH from a given 19-mer *5t-2 strand would have the choice of splicing into either the internal CATT of 5b(2)3t (Fig. 6A, Intermolecular option I) or the terminal CATT-OH of 5t (Fig. 6A, Intermolecular option II) to yield products with expected sizes of 44 and 21 nucleotides, respectively. Although a substantial amount of 44-mers was observed in the assay, no 21-mer products were detected (Fig. 6B).
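The expected product sizes follow from simple strand arithmetic, sketched below. Two points are assumptions rather than statements from this section: that the attacking strand becomes joined to the portion of the target on the T side of the A|T bond, and that 19 nucleotides of the 44-mer are released as the CA-OH fragment (inferred from the 19-mer product described for this substrate). All names are illustrative.

```python
def spliced_strand_length(nucleophile_len, target_len, released_len):
    """Length of the strand made when a 3'-OH attacks the A|T bond of CATT.

    Assumes the attacking strand is joined to the part of the target on the
    T side of the bond, while the part ending in CA-OH (released_len) leaves.
    """
    if not 0 < released_len < target_len:
        raise ValueError("the scissile bond must lie inside the target strand")
    return nucleophile_len + (target_len - released_len)


PREPROCESSED_5T = 19    # *5t-2, the radiolabeled nucleophile strand
FULL_LENGTH_5T = 21     # 5t, ending in the terminal CATT-OH
TETHERED_44MER = 44     # 5b(2)3t, carrying the internal CATT
RELEASED_FRAGMENT = 19  # CA-OH fragment length (assumed, see lead-in)

# Option I: attack on the internal CATT of the 44-mer.
option_i = spliced_strand_length(PREPROCESSED_5T, TETHERED_44MER, RELEASED_FRAGMENT)
# Option II: attack on the terminal CATT-OH of full-length 5t (joins to TT).
option_ii = spliced_strand_length(PREPROCESSED_5T, FULL_LENGTH_5T, RELEASED_FRAGMENT)
```

With these inputs the two options give 44 and 21 nucleotides, matching the sizes quoted above; the 21-mer is the product diagnostic of option II.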
Processing Versus Splicing of the Internal Cognate Site-Assays in which the synapsed 5b(2)3t* 44-mer was radiolabeled yielded radiolabeled 19-mer product from either the splicing or the processing reactions since both are targeted at the identical phosphate bond between the A and T of the internal CATT cognate sequence. To determine the relative contributions of the two reactions to the total 19-mer products observed, a first-turnover experiment was performed with 5t/5b(2)3t*/3b. The radiolabel was placed exclusively on the synapsed 5b(2)3t* 44-mer, and time points were collected within the first turnover. In the presteady state, the 19-mer products from the splicing and processing reactions were expected to resolve into two distinct exponential phases with characteristic rate constants of 0.02 and 0.20 s−1, respectively. Fig. 8A shows the results of such an experiment. The best fit for the amount of 19-mer products formed over time revealed a single exponential phase with an amplitude of 5.8 nM and an apparent rate constant of 0.02 s−1, which is characteristic of the splicing reaction. No exponential phase with the faster rate constant of 0.20 s−1, characteristic of the processing reaction, was observed. Furthermore, this exponential phase was, within error, identical in both rate of product formation and amplitude to that observed for production of the 46-mer splicing product in experiments performed with *5t/5b(2)3t/3b (Fig. 8B). This result shows that the internal CATT cognate site is not processed as a terminal cognate site in a hydrolysis reaction but, instead, undergoes the splicing reaction exclusively.
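To see why the two pathways should be resolvable in the first turnover, the following sketch compares a splicing-only time course with a hypothetical mixed time course. The rate constants and the 5.8 nM amplitude come from the text; the equal split of amplitude between processing and splicing in the mixed case is purely an assumption for illustration.

```python
import math

K_SPLICING = 0.02    # s^-1, from the text
K_PROCESSING = 0.20  # s^-1, from the text
TOTAL_AMPLITUDE_NM = 5.8  # fitted amplitude of the observed single phase


def exponential_phase(t, amplitude, k):
    """Product formed at time t (s) in a single exponential phase."""
    return amplitude * (1.0 - math.exp(-k * t))


def splicing_only(t):
    """Time course if all 19-mer arises from splicing (the observed case)."""
    return exponential_phase(t, TOTAL_AMPLITUDE_NM, K_SPLICING)


def hypothetical_mixture(t, fraction_processing=0.5):
    """Time course if part of the amplitude arose from processing.

    fraction_processing is an assumed value, not a measured one.
    """
    fast = exponential_phase(t, TOTAL_AMPLITUDE_NM * fraction_processing, K_PROCESSING)
    slow = exponential_phase(t, TOTAL_AMPLITUDE_NM * (1.0 - fraction_processing), K_SPLICING)
    return fast + slow


early_splicing_only = splicing_only(5.0)   # about 0.55 nM
early_mixture = hypothetical_mixture(5.0)  # about 2.1 nM
```

At 5 s the hypothetical mixture would already show several-fold more product than the splicing-only curve, so a processing contribution of this size would appear as a distinct fast phase rather than being hidden in the fit.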

Presteady-state Analysis of a Novel Splicing Reaction-Integrase catalyzes two distinct reactions in vivo, a 3′ dinucleotide trimming or processing reaction and a joining reaction whereby the processed viral 3′-ends become integrated into the host genome. Kukolj and Skalka (5) originally designed synapsed-end substrates to improve integrase processing in vitro. The rationale for these substrates, where two cognate sites are tethered in a head-to-head fashion, was to facilitate the assembly and binding of an active complex with both LTR ends bound to integrase. Under the steady-state assay conditions reported, these substrates gave enhanced product formation relative to short oligodeoxynucleotide duplexes containing only single ends.
Using similarly designed synapsed-end substrates in our presteady-state studies of the integrase-catalyzed processing reaction, we discovered a novel intramolecular splicing reaction resembling an integration reaction that occurs without prior requisite processing. This splicing reaction is characterized by the following features. 1) The sequence of the DNA strand containing the 3′-OH nucleophile is not critical; however, a 5′-OH cannot replace the 3′-OH or H2O; 2) a CATT cognate site is preferentially targeted as the electrophile in the reaction; 3) splicing occurs intramolecularly with respect to the synapsed substrates to yield a long hairpin product plus a short fragment, which is structurally indistinguishable from a processing product; and 4) splicing occurs as an exponential phase with a characteristic rate constant of 0.020 s−1, which is 10 times slower than processing (k processing = 0.20 s−1).
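For orientation, the two rate constants can be converted into half-times of the exponential phase. This is a standard first-order relationship applied to the reported values, not a calculation taken from the paper.

```latex
t_{1/2} = \frac{\ln 2}{k}, \qquad
t_{1/2}^{\mathrm{splicing}} = \frac{0.693}{0.020\ \mathrm{s^{-1}}} \approx 35\ \mathrm{s}, \qquad
t_{1/2}^{\mathrm{processing}} = \frac{0.693}{0.20\ \mathrm{s^{-1}}} \approx 3.5\ \mathrm{s}
```

The 10-fold difference in rate constant therefore corresponds to a 10-fold difference in the time needed to reach half of the exponential amplitude.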
Although the splicing reaction appears to resemble the joining activity of integrase, the sequence specificity for a cognate CATT as the electrophile represents a significant mechanistic difference between the two. The integrase-catalyzed joining reaction shows very little sequence specificity with regard to target site selection (4,31), although it does seem to show a structural preference, e.g. for bent DNA (32) or cruciform stem-loops (31). In contrast, the splicing activity of integrase is specific for the cognate sequence CATT such that splicing preferentially occurs between the A and T of this sequence. If the splicing reaction were representative of the true joining reaction, then selective integration at CATT sites in the host genome would be expected. This has not been observed. We have examined the sequence of putative "hot" sites (33) and were not able to correlate sites of integration with the existence of CATT sequences (data not shown).
On the other hand, the preference for a cognate CATT as the electrophile is shared with the processing reaction, the difference being that in the splicing reaction, a 3′-OH from DNA serves as the nucleophile, a role played by H2O in the processing reaction. However, even in this aspect, the two reactions are similar. The promiscuity regarding the choice of nucleophiles in the processing reaction is well documented; non-water nucleophiles ranging from glycerol to serine and threonine (29) and even the 3′-OH of the leaving dinucleotide (18) have been observed. In the case of the splicing reaction, we have additionally shown that the DNA sequence attached 5′ to the attacking 3′-OH is irrelevant. Thus, the splicing reaction may in fact represent a "rogue reaction" closely akin to processing, at least with regard to the specificity requirements of the electrophile and nucleophile binding sites.
An unexplained feature of the splicing reaction is the observation that no processing, as defined by H2O cleavage, occurred at the internal CATT site. It would be expected that the internal CATT is a viable site for processing as previous studies (29) have shown that processing can occur at cognate sites with extended 3′-tails. Perhaps exclusive splicing arises from strong competition due to the adventitious proximity of the 3′-OH inherent in the design of the substrate. Additionally, it has been observed that ASV integrase distorts the linear DNA ends before the processing reaction and that fraying of the DNA ends leads to an increase in processing activity (34). It is possible that the design of the synapsed-end substrates is such that the internal cognate site is not as susceptible to fraying.
Implications for Previous Steady-state Analysis with Synapsed-end Substrates-The presteady-state results illustrate the problems inherent in the "long" reaction time assays routinely favored. Products found after multiple exponential phases of product formation represent the time-averaged accumulation of products formed from multiple catalytic events. Assays that rely on the quantitation of products at a single point or a series of points in a longer time regime assume a single congruent reaction pathway for the generation of each product. In addition, the comparison between the accumulation of two different products in this time regime represents an inaccurate characterization of the reactivity of the enzyme with respect to the two reactions. This is especially true for reactions such as those catalyzed by integrase, when an exponential formation of product is apparent before reaching steady state. The initial exponential phase represents the actual reactivity of the enzyme with respect to both binding specificity and catalytic rate. In addition, the initial exponential phase arises as a direct consequence of a slow post-chemistry step in the reaction mechanism and generally reflects a slow rate-limiting product release step (35). In these instances, a faster steady-state turnover typically reflects only a faster rate of product release (21, 36-39). A more complete discussion of the use of the initial exponential phase to assess substrate reactivity is contained in the second paper of this series (8).
Analysis of the exponential rates and amplitudes characteristic of the initial exponential phase, arising from both true processing and the rogue splicing reaction, provided the basis for the discovery of exclusive splicing activity at the internal CATT site. In contrast, previous experiments relying on measurements made in a time regime where the enzyme had undergone multiple exponential phases of product formation (5) erroneously attributed all the cleavage products at this site to a processing reaction and, as a result, incorrectly estimated the extent of processing seen with these substrates. Furthermore, under these assay conditions, the splicing reaction has almost twice the linear turnover rate when compared with the processing reaction despite having a similar initial exponential amplitude with a 10-fold slower exponential rate of product formation. Therefore, the net accumulation of splicing products after multiple exponential phases of product formation, mistakenly quantified as processing products, does not represent an accurate measure of the relative extent of the two reactions but may merely reflect the fact that the splicing product is bound less tightly by the enzyme than the processed product.
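The way long-time product accumulation can invert the true order of reactivity is easy to reproduce numerically. The sketch below assumes a simple burst form, P(t) = A(1 − exp(−kt)) + vt, with the exponential rate constants from the text, an equal amplitude of 5.8 nM for both reactions (the text says only that the amplitudes are similar), and a splicing linear rate exactly twice that of processing (the text says "almost twice"). The absolute linear rate is a hypothetical placeholder because no value is given in this section.

```python
import math


def burst_product(t, amplitude, k_exp, v_linear):
    """Product (nM) at time t (s): exponential burst plus linear turnover."""
    return amplitude * (1.0 - math.exp(-k_exp * t)) + v_linear * t


AMPLITUDE_NM = 5.8       # assumed equal for both reactions
K_PROCESSING = 0.20      # s^-1, from the text
K_SPLICING = 0.02        # s^-1, from the text
V_PROCESSING = 0.005     # nM/s, hypothetical placeholder value
V_SPLICING = 2.0 * V_PROCESSING  # "almost twice", rounded to 2 here


def processing_product(t):
    return burst_product(t, AMPLITUDE_NM, K_PROCESSING, V_PROCESSING)


def splicing_product(t):
    return burst_product(t, AMPLITUDE_NM, K_SPLICING, V_SPLICING)


# Within the first turnover, processing is clearly ahead ...
early = (processing_product(10.0), splicing_product(10.0))
# ... but after an hour the splicing product dominates.
late = (processing_product(3600.0), splicing_product(3600.0))
```

With these assumed parameters, a single late time point would rank splicing above processing even though its chemistry is 10-fold slower, which is the kind of misreading the presteady-state analysis avoids.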
Although the design of the original synapsed-end substrate introduced complications with respect to quantitation of integrase activity, the binding data reported here do support the hypothesis that the tethering of two cognate sequences facilitates assembly of an integrase-DNA complex. These data show a higher level of cooperativity and tighter binding for the synapsed substrate versus either a single-ended untethered substrate (3t/3b) or a synapsed substrate with only one cognate CATT present. In addition, the faster 0.2 s−1 burst rate constant for the processing reaction was observed only using substrates that contained two cognate sites. A more ideal design of synapsed substrates, therefore, requires both cognate CATTs to be situated at the 3′-termini of their respective strands. This requirement is readily accommodated by the incorporation of a 5′-5′ linkage in the tether segment, as found in the reverse polarity substrate, 5t/5b(2)3b/3t. Mechanistic studies using this new class of synapsed substrates are presented in the accompanying paper of this series (8).
Similarities to V(D)J Recombinases-Interestingly, the splicing reaction of integrase closely resembles a reaction catalyzed by the V(D)J recombinases RAG1 and RAG2 (40-43). This site-specific recombination requires a 3′-OH at the end of the attacking strand for joining, uses this 3′-OH as the nucleophile in an intramolecular attack at a sequence-specific site to form a hairpin, requires divalent metal ions, and is independent of ATP (42). The parallel observed between these two enzyme-catalyzed transesterification reactions provides the first functional evidence in support of the hypothesis that V(D)J recombinases belong to the same transposase structural superfamily that includes human immunodeficiency virus integrase, ASV integrase, and MuA (42), and would therefore predict a strong structural similarity between these enzymes and retroviral integrases.
A Model for Functional Switching of Binding Sites-The characterization of the splicing reaction also provides significant insights into the structure-function relationship of integrase. In particular, it highlights the important distinction between structurally defined binding sites such as the cognate sequence recognition site, where CATT is bound, versus functionally defined sites, such as the electrophile or the nucleophile binding sites. Our results show that in both the splicing and the processing reaction, only the electrophile is bound in the structurally fixed sequence-specific CATT binding site, whereas the nucleophilic 3′-OH occupies a nonspecific binding site. On the other hand, in the true joining reaction, this relationship between functional and structural sites is reversed. Namely, nonspecific DNA serves as the electrophile, whereas sequence-specific processed-end CA-OHs act as nucleophiles.
FIG. 9. A two-site model for the structure-function switch in integrase. The sequence-specific binding site is depicted at the top of each reaction, and the sequence-nonspecific site is depicted at the bottom of each reaction. Whether a particular site is poised to bind an electrophile or nucleophile in the reaction is denoted by a solid box or a dashed box, respectively, and corresponds to the protonation state of an acid-base pair in this hypothetical "simplest case" mechanism.
In an extreme case, this structure-function switch would require having different sets of active sites for the processing reaction and the joining reaction. In such a model, the processing would take place in a structurally coupled cognate site-specific electrophile binding site. After processing, the viral ends would then be transferred to a nearby cognate site-specific nucleophile binding site. In addition, joining would also require the binding of target host DNA in a third site, one where nonspecific DNA is bound as the electrophile. The host DNA cannot bind in the electrophile site used for processing, since it is cognate-sequence specific.
The current results suggest the possibility of a model that requires only two structurally fixed sites per retroviral DNA end, a sequence-specific site and a sequence-nonspecific site. In contrast to the more complex model, the functionalities of these two sites are not fixed but would switch during the course of catalysis. This is similar to classical acid-base mechanisms for phosphodiester hydrolysis proposed for ribonuclease (44) and alkaline phosphatase (45). In the initial processing state, the constellation of catalytic groups in the sequence-specific site is poised to activate the bound DNA as an electrophile, whereas the sequence-nonspecific site functions as a nucleophile-activating site. A switch in the functionality of the two sites accompanies the catalysis of the processing reaction. This switch obviates the need to translocate the processed DNA to a separate nucleophile-activating site. At the same time, the coupled switch would create an electrophile-activating site in the nonspecific binding site in preparation for binding nonspecific target DNA.
The model makes chemical sense, as illustrated in Fig. 9 for a minimal constellation consisting of an acid-base pair. In this hypothetical scenario, a base in the nonspecific site activates H2O as the nucleophile, whereas a corresponding acid in the sequence-specific site activates the phosphate backbone of the bound CATT sequence as the electrophile. In the course of the reaction, the base in the nonspecific site becomes protonated, and the acid in the specific site is deprotonated. This change in the protonation state of the catalytic acid and base in the two sites effectively switches the functionality of the two structural sites. As a result, the processed ends of the viral DNA automatically become bound in the nucleophile site without any need for translocation into a separate site.
This switching model predicts that the processing and joining reactions should be chronologically coupled because the correct association of functional and structural sites for the joining reaction requires prior catalysis of the processing reaction. This may be the basis for the contrast between the observed sequence specificity of the splicing reaction and the relative lack of sequence specificity of the joining reaction. In this context, the hybrid nature of this splicing reaction, with specificity akin to the processing reaction but functionality related to the joining reaction, reflects the fact that integrase exists predominantly in a form where the sequence-specific site is in an electrophile-activating configuration. Thus, integrase is generally poised to bind a cognate sequence as the target of nucleophilic attack and use any convenient functional hydroxyl, the spatially proximal 3′-OH of an unprocessed DNA strand in the case of the synapsed-end DNA molecules, as the nucleophile. The result is catalysis of the observed splicing reaction with these synapsed-end substrates.
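The logic of the two-site switching model can be restated as a small state machine. This is only a schematic restatement of the proposal above, with no chemistry modeled; the class and function names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ELECTROPHILE_ACTIVATING = "electrophile-activating"
    NUCLEOPHILE_ACTIVATING = "nucleophile-activating"


@dataclass(frozen=True)
class IntegraseSites:
    """Functional roles of the two structurally fixed sites."""
    specific_site: Role      # binds the cognate CATT end
    nonspecific_site: Role   # binds H2O, a 3'-OH, or target DNA

    def after_processing(self):
        """Exchange roles, mirroring the swap in protonation states."""
        return IntegraseSites(self.nonspecific_site, self.specific_site)


def allowed_reactions(sites):
    """Reactions consistent with the current assignment of roles."""
    if sites.specific_site is Role.ELECTROPHILE_ACTIVATING:
        # Cognate CATT is the electrophile; H2O or a nearby 3'-OH attacks.
        return ("processing", "splicing")
    # Processed CA-OH end is the nucleophile; nonspecific DNA is the target.
    return ("joining",)


# Predominant form of the enzyme proposed in the text.
resting = IntegraseSites(Role.ELECTROPHILE_ACTIVATING, Role.NUCLEOPHILE_ACTIVATING)
switched = resting.after_processing()
```

In this restatement, joining becomes available only after `after_processing()` has been applied, which is the chronological coupling the model predicts, and splicing is available only in the resting state, consistent with its processing-like specificity.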