Structural Insights into the Assembly of the Adeno-associated Virus Type 2 Rep68 Protein on the Integration Site AAVS1

Background: Rep68 catalyzes site-specific integration into the chromosome 19 AAVS1 site.
Results: Rep68 forms a heptameric complex on the AAVS1 site.
Conclusion: Assembly requires the cooperative interaction of all functional domains and a minimum of two GCTC repeats.
Significance: These results provide insights into the first step of the site-specific integration reaction.

Adeno-associated virus (AAV) is the only eukaryotic virus with the property of establishing latency by integrating site-specifically into the human genome. The integration site known as AAVS1 is located in chromosome 19 and contains multiple GCTC repeats that are recognized by the AAV non-structural Rep proteins. These proteins are multifunctional, with an N-terminal origin-binding domain (OBD) and a helicase domain joined together by a short linker. As a first step to understand the process of site-specific integration, we proceeded to characterize the recognition and assembly of Rep68 onto the AAVS1 site. We first determined the x-ray structure of AAV-2 Rep68 OBD in complex with the AAVS1 DNA site. Specificity is achieved through the interaction of a glycine-rich loop that binds the major groove and an α-helix that interacts with a downstream minor groove on the same face of the DNA. Although the structure shows a complex with three OBD molecules bound to the AAVS1 site, we show by using analytical ultracentrifugation and electron microscopy that the full-length Rep68 forms a heptameric complex. Moreover, we determined that a minimum of two direct repeats is required to form a stable complex and to melt DNA. Finally, we show that although the individual domains bind DNA poorly, complex assembly requires oligomerization and cooperation among the OBD, helicase, and linker domains.


Adeno-associated virus (AAV) is the only known eukaryotic virus that can establish latency by integrating its genome site-specifically into the host genome (1-4). The integration site known as AAVS1 is located within a 4-kb region in human chromosome 19 at 19q13.4 (3-5). This site is within a regulatory region that controls the myosin-binding subunit 85 (MBS85), also known as protein phosphatase 1 regulatory protein (PPP1R12C) (6). Latency can also arise by episomal persistence, and both mechanisms happen in the absence of a helper virus such as adenovirus (7,8). The AAVS1 site resembles a specific region found in the palindromic sequences known as inverted terminal repeats (ITR) at both ends of the AAV genome (9-13). The ITRs span a region of 146 nucleotides that fold into a T-shaped hairpin structure with the stem of the hairpin containing the Rep-binding site (RBS) and the terminal resolution sequence (trs); the latter undergoes a strand-specific nicking reaction that is essential for DNA replication and integration (14-17). The RBS is made up of multiple repeats of the tetranucleotide sequence 5′-GCTC-3′ or small variations of this sequence. Interestingly, most of the AAV serotypes have three contiguous repeats flanked by pseudo-repeats, a feature that is shared with AAVS1 (18). The site-specific integration process is contingent on the presence of the large regulatory proteins Rep78/Rep68, the AAVS1 site, and a cis-acting viral DNA sequence (1,17,19,20). Earlier studies reported that for the AAVS1 region a minimum sequence of 33 bp containing the RBS and trs sequences is essential for site-specific integration (16,17,21). Furthermore, a 16-bp sequence from the p5 promoter known as the P5 integration efficiency element (P5IEE) was identified as the minimal viral cis-acting element able to mediate site-specific integration (22,23). However, the mechanistic details of how AAV Rep78/Rep68 proteins drive this mechanism are not known.
The requirement of site-specific nicking of the AAVS1 trs site suggests that some of the steps resemble the terminal resolution reaction during AAV DNA replication. This process is a variation of the mechanism used by DNA relaxases during DNA conjugation (20,24,25). The first step in site-specific integration is the assembly of a Rep78/Rep68-DNA complex guided by the recognition of GCTC repeats through the N-terminal origin-binding domain (OBD). The crystal structure of AAV5 OBD bound to a 26-bp AAV5 RBS illustrated for the first time the structural determinants of repeat recognition (26). Each OBD molecule interacts with two contiguous repeats using two secondary structure motifs as follows: a loop contacts the major groove bases of one repeat while an α-helix interacts with a downstream repeat through the minor groove. Thus, two contiguous OBD molecules share a repeat, with one molecule interacting through the major groove and the second molecule binding in the minor groove on the opposite face of the DNA. Whether this DNA recognition mode is conserved in other AAV Rep serotypes and in the AAVS1 site is not known due to the lack of structural information on a more representative member, such as AAV2. Intriguingly, differences in serotype specificity for ITRs have been reported such that Rep proteins are functional only when binding to their own serotype ITR (27,28). Whether the difference in specificity is due to the number and/or arrangement of repeats found in the origin of replication of the different AAV serotypes has not been established. In addition, the effect of the helicase domain in the overall formation of the initial complex in the context of Rep78/Rep68 proteins is not known. To understand the mechanism of Rep68 assembly on the AAVS1 site and the role that each of the AAV Rep68 functional domains plays in this process, we first determined the crystal structures of AAV2 OBD in complex with integration site AAVS1.
Next, we characterized the complex of Rep68 bound to a 41-mer AAVS1 site and performed binding studies with each of the Rep68 individual functional domains. Our results show that AAV2 OBD has a similar binding mode to AAV5; however, the longer recognition loop in AAV2 makes more extended contacts with the major groove owing to the inherent flexibility conferred by several glycine residues. We determined that Rep68 forms a heptameric complex with the AAVS1 site and that high binding affinity for AAVS1 is due to the cooperative binding and assembly of the OBD, linker, and helicase domains.

Protein Expression and Purification
OBD Proteins-The DNA region encoding amino acids 1-208 and 1-224 from adeno-associated virus type 2 (AAV2) (GenBank™ AAC03774.1) was cloned into pET-15b (Novagen) using restriction sites NdeI and XhoI. The residue Cys-151 was mutated to serine as it was found to produce disulfide bonds and inhibit crystallization. Residue Tyr-156 was mutated to phenylalanine to eliminate any potential nuclease activity. The OBDN208 was overexpressed by growth in Escherichia coli strain BL21 pLysS at 37°C in Luria-Bertani (LB) broth until reaching an absorbance of 0.6. Isopropyl β-D-thiogalactopyranoside was added to a final concentration of 1 mM. Cells were harvested after 5 h and stored at −80°C. The cell pellet was resuspended in binding buffer (25 mM Tris-HCl, 500 mM NaCl, 10 mM imidazole, 10% glycerol, 1 mM TCEP, pH 7.9) and lysed by sonication. OBD was purified on a nickel-nitrilotriacetic acid column (Qiagen) using step gradients of 75 and 125 mM imidazole to wash off proteins bound nonspecifically to the column and was eluted with 300 mM imidazole. Protein was loaded onto a Hi-Load desalting column (GE Healthcare) to exchange it into thrombin buffer (25 mM Tris-HCl, 200 mM NaCl, 10% glycerol, pH 8.0). The hexa-histidine tag was cleaved by addition of thrombin (1 unit/mg) and removed by passing the sample through a nickel-nitrilotriacetic acid column. Untagged OBD was collected from the flow-through, concentrated, and further purified by gel filtration on a Hi-Load 16/60 Superdex 75 column (GE Healthcare) previously equilibrated with GF buffer (30 mM Tris-HCl, pH 7.5, 150 mM NaCl). The protein was concentrated to ~12 mg/ml using Millipore Centricon (10-kDa cutoff) prior to crystallization.
Rep68 Proteins-All mutant proteins were generated using the pHisRep68/15b plasmid, which contains the AAV2 Rep68 ORF subcloned in vector pET-15b (Novagen). The Rep68 octlink construct was generated by substitution of residues 206-224 of AAV2 Rep68 with the mouse Oct-1 linker residues 328-346 (GenBank™ CAA49791) using the gene synthesis services from GenScript. Proteins were expressed in E. coli BL21(DE3) cells (Novagen) and purified as described previously (30). The size exclusion buffer contains 25 mM Tris-HCl, pH 8.0, 200 mM NaCl, and 2 mM TCEP. In brief, cell pellets were lysed in Ni-Buffer A (20 mM Tris-HCl, pH 7.9 at 4°C, 500 mM NaCl, 5 mM imidazole, 10% glycerol, 0.2% CHAPS, and 1 mM TCEP) and purified using a nickel column. The hexa-histidine tag was removed by PreScission protease, and Rep68 was further purified by gel filtration chromatography using a HiLoad Superdex 200 16/60 column (GE Healthcare) and size exclusion buffer. Rep68 WT and mutant proteins were concentrated to 10 mg/ml, flash-frozen in liquid N2, and kept at −80°C.

Fluorescence Anisotropy DNA Binding Assay
Binding assays were performed using 5 nM fluorescein-labeled 41-mer AAVS1 DNA. Rep68 constructs at different concentrations were mixed with DNA at a final volume of 200 µl using the following buffer: 25 mM HEPES, pH 7.0, 200 mM NaCl, 1 mM TCEP. Fluorescence readings were taken on a PC1 fluorimeter (ISS, Inc.) with excitation and emission filters at 492 and 528 nm, respectively. Tubes were equilibrated at 20°C for 20 min before measurement. Each anisotropy point is the average of 10 measurements. Anisotropy is calculated as the ratio of the difference between vertical and horizontal emission intensities over the total normalized intensity. The fraction of DNA bound (B) was calculated using Equation 1,
B = ([A]x − [A]DNA) / ([A]final − [A]DNA) (Eq. 1)
where [A]x represents the anisotropy measured at protein concentration X, [A]DNA is the anisotropy of free fluorescent DNA, and [A]final is the anisotropy at saturation. To obtain overall apparent binding constants that could be used to compare different mutants, data were fitted to a single binding site model with a Hill coefficient. This model was selected because sufficient details of the assembly process are not yet known. Fitting was performed using the program Prism 6™ (GraphPad). Each experiment was done in triplicate.
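The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Prism workflow: the anisotropy expression is the standard steady-state form (I_V − I_H)/(I_V + 2·I_H) without an instrument correction factor, the fraction bound is the normalization implied by the definitions of [A]x, [A]DNA, and [A]final, and the coarse grid search (function name `fit_hill`, the grids, and the synthetic data are all hypothetical) merely stands in for nonlinear regression.

```python
def anisotropy(i_vertical, i_horizontal):
    """Steady-state anisotropy: (I_V - I_H) / (I_V + 2 * I_H)."""
    return (i_vertical - i_horizontal) / (i_vertical + 2.0 * i_horizontal)


def fraction_bound(a_x, a_dna, a_final):
    """Fraction of DNA bound, normalizing anisotropy between free and saturated DNA."""
    return (a_x - a_dna) / (a_final - a_dna)


def hill_fraction(conc, kd_app, h):
    """Single-site binding model with Hill coefficient h."""
    return conc ** h / (kd_app ** h + conc ** h)


def fit_hill(concs, fractions, kd_grid, h_grid):
    """Coarse least-squares grid search (a stand-in for nonlinear regression)."""
    best = None
    for kd in kd_grid:
        for h in h_grid:
            sse = sum((hill_fraction(c, kd, h) - f) ** 2
                      for c, f in zip(concs, fractions))
            if best is None or sse < best[0]:
                best = (sse, kd, h)
    return best[1], best[2]


# Synthetic, noise-free data (hypothetical): protein concentrations in nM.
concs = [10, 30, 100, 300, 1000, 3000]
fractions = [hill_fraction(c, 128.0, 1.5) for c in concs]

kd_grid = [float(k) for k in range(50, 501, 2)]        # 50, 52, ..., 500 nM
h_grid = [round(1.0 + 0.1 * i, 1) for i in range(11)]  # 1.0, 1.1, ..., 2.0

kd_fit, h_fit = fit_hill(concs, fractions, kd_grid, h_grid)
print(kd_fit, h_fit)
```

With real, noisy triplicate data a proper nonlinear least-squares routine with confidence intervals would be the better choice; the grid search is used here only to keep the sketch dependency-free.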

Analytical Ultracentrifugation
Sedimentation velocity experiments were carried out using a Beckman Optima XL-I analytical ultracentrifuge (Beckman Coulter, Inc.) equipped with both four- and eight-hole rotors. Samples (2, 5, 10, 20, and 40 µM) were loaded in the cells, using in all cases size exclusion buffer. Samples were centrifuged in two-sector carbon-filled Epon centerpieces at 20°C. Sectors were loaded with 420 µl of sample volume. Typically, 200 or more scans were collected at 5-min intervals at 25,000 rpm. Sedimentation profiles were collected using both UV absorption (280 nm) and Rayleigh interference optical systems. Results were analyzed using both SEDFIT and SEDPHAT (31,32).

Crystallization and X-ray Structure Determination
The oligonucleotides used for crystallization were purchased from Integrated DNA Technologies, Inc. as follows: site A, 5′-CTCGGCGCTCGCTCGCTCGCT-3′ and 5′-GAGCGAGCGAGCGAGCGCCGA-3′; site B, 5′-GCGCTCGCTCGCTCGCTGGCC-3′ and 5′-CGCCCAGCGAGCGAGCGAGCG-3′. DNA was purified on a Mono Q-5/50GL column. The purified DNA was desalted, lyophilized, and resuspended in TE buffer (10 mM Tris-HCl, pH 8.0, 40 mM NaCl, 1 mM EDTA). The oligonucleotides were mixed in 1:1 molar ratio, heated to 90°C for 5 min, and cooled slowly to room temperature. Double-stranded DNA and OBD were mixed in 1:2 (complex A) and 1:3 (complex B) DNA/protein molar ratios, respectively. These complexes were concentrated to a final protein concentration of about 18 mg/ml. The buffer was exchanged during the concentration process to 30 mM Tris-HCl, pH 7.3, and 40 mM NaCl. All crystallization experiments were carried out using the hanging- and sitting-drop methods with commercially available screening kits at 4°C. Crystals of complex A grew from 3-µl hanging drops after 2-3 weeks. The best crystals were obtained from reservoir solution containing 50 mM sodium citrate, pH 4.3, 5.75-7.25% PEG-6K, and 0.2 M LiCl. Crystals of complex B grew from 6-µl sitting drops where the reservoir solution was 100 mM sodium phosphate/citrate buffer, pH 4.2, 5-7.5% PEG-3K, and 0.2 M NaCl. Both crystal forms were cryo-protected in their corresponding reservoir solution by adding ethylene glycol to 25% before flash-freezing in liquid nitrogen. Diffraction data were collected at the National Synchrotron Light Source at Brookhaven National Laboratory beamline X6a. The data were processed with the program HKL2000 (33), and the structure was solved by molecular replacement using the program PHENIX. We used the structure of the AAV2 OBD as a search model (Protein Data Bank code 4ZO0). Model building was carried out using PHENIX (34) and manual building using the program COOT (35).
The buried surface area was determined by calculating the difference in solvent-accessible area (ASA), as shown in Equation 2. The individual solvent-accessible areas were calculated using Chimera (36). Figures were generated by PyMOL (37), DOG (38), and Adobe Photoshop.
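As an illustration of an ASA-difference calculation, the sketch below uses one commonly used definition of buried surface area, ASA(protein) + ASA(DNA) − ASA(complex). This form is an assumption; the exact expression and any normalization (for example, per monomer or per partner) used in Equation 2 may differ. The function name and the input areas are hypothetical and are not values from this study.

```python
def buried_surface_area(asa_protein, asa_dna, asa_complex):
    """Total area (in square angstroms) buried on complex formation.

    ASSUMPTION: uses the common definition
    ASA(protein) + ASA(DNA) - ASA(complex); the paper's Equation 2
    may apply a different normalization.
    """
    if min(asa_protein, asa_dna, asa_complex) < 0:
        raise ValueError("solvent-accessible areas must be non-negative")
    return asa_protein + asa_dna - asa_complex


# Hypothetical areas for illustration only (not measured values).
total_buried = buried_surface_area(9500.0, 7200.0, 16100.0)
print(total_buried)
```

The individual areas would come from a surface calculation on the isolated protein, the isolated DNA, and the complex, each computed with the same probe radius.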

Transmission Electron Microscopy Analysis
Protein samples at 0.1 mg/ml were adsorbed directly onto carbon-coated copper grids. Following negative staining with 0.75% (w/v) uranyl formate, samples were visualized in an electron microscope Tecnai F20 operated at 200 kV, and images were collected at a magnification of ×50,000 under low dose conditions on a Gatan 4k × 4k CCD camera. Particle windowing, two-dimensional alignment, and classification reconstruction were carried out with EMAN2. The entire process followed the default settings of this image-processing software for eight iterative alignments. The two-dimensional averages were obtained from a final set of 520 particles.

Results
To characterize the assembly of Rep68 on the integration site AAVS1, we utilized a three-pronged strategy. First, we determined the x-ray structure of AAV-2 OBD bound to AAVS1 DNA to define the structural details of specific repeat recognition. Next, we determined the stoichiometry of the Rep68-AAVS1 complex by sedimentation velocity and electron microscopy. Finally, we carried out a detailed analysis of the role of each domain in DNA binding and the minimal number of repeats that are needed to form a stable complex.
Overview of Structure Determination-We have determined the structure of two AAV2 OBD-AAVS1 complexes with different stoichiometries. Complex A has a 2:1 OBD/AAVS1 ratio, and complex B has three molecules of OBD per AAVS1 site. The AAV2 OBD used in crystallization spans amino acids 1-208 of Rep78/Rep68 (Fig. 1A). Both were crystallized with 21-mer AAVS1 DNA sites containing a core of three direct GCTC repeats but differing in the number of residues upstream and downstream of this core (Fig. 1B). The stoichiometry of the complexes parallels the protein-DNA ratios used during crystallization. Complex A crystallized in space group C2 with two complexes per asymmetric unit, and complex B crystals belong to P2₁ with one complex in the asymmetric unit. The structures were solved by molecular replacement using the structure of AAV2-OBD as a search model (Protein Data Bank code 4ZO0). Structures of the complexes were refined to 2.5 and 2.6 Å, respectively; refinement statistics are shown in Table 1. These structures may represent snapshots of consecutive binding events as the DNA is titrated with higher concentrations of the protein (Fig. 1, C and D).
Overall Structure of OBD-AAVS1 Complex-Complex A can be superimposed onto complex B with an overall root mean square deviation of 1.14 Å for two OBD molecules and 17 bp of DNA. Although both complexes crystallize in different space groups, the overall interactions with DNA are identical, thus confirming the validity of the specific contacts between OBD and DNA. Each OBD molecule interacts with two consecutive GCTC repeats where the DNA recognition loop (LDB) interacts with the major groove of the first repeat, and α-helix D interacts with the second repeat through the minor groove (Fig. 2). This particular binding mode establishes that multiple OBD molecules will share repeats such that an OBD molecule interacting through the major groove will face a second OBD molecule binding upstream through the minor groove on the opposite face of DNA (Fig. 2). The OBDs dock onto DNA in a head to tail orientation spiraling around the DNA axis by ~144°. The AAVS1 site contains an RBS with three perfect 5′-GCTC-3′ repeats and two "imperfect" repeats with the sequence 5′-CGGC-3′ and 5′-GCTG-3′ located upstream and downstream of the perfect repeats, respectively (Fig. 1B). Two of the OBDs in complexes A and B bind the same repeats, and the third OBD in complex B binds the final GCTC repeat and the downstream pseudo-repeat. This type of binding mode positions the C-terminal ends of the OBD molecules pointing upstream of the RBS site where the helicase domains would be positioned in the full-length Rep68/Rep78 proteins.
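The repeat architecture described above can be checked directly on a sequence. The following minimal Python sketch locates the perfect GCTC repeats in the site A top-strand oligonucleotide listed under crystallization and tests whether they are contiguous; the helper names `find_repeats` and `is_tandem` are hypothetical and introduced only for illustration.

```python
import re


def find_repeats(seq, motif="GCTC"):
    """Return 0-based start positions of every occurrence of motif in seq."""
    pattern = "(?=" + re.escape(motif) + ")"  # lookahead also finds overlapping hits
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def is_tandem(positions, motif_len=4):
    """True when consecutive occurrences are spaced exactly one motif length apart."""
    return all(b - a == motif_len for a, b in zip(positions, positions[1:]))


# Site A top strand as given in the crystallization methods.
site_a_top = "CTCGGCGCTCGCTCGCTCGCT"

positions = find_repeats(site_a_top)
print(positions)             # [6, 10, 14]
print(is_tandem(positions))  # True
```

The three hits are spaced four nucleotides apart, that is, three direct repeats, and the four nucleotides immediately upstream of the first hit (positions 2-5) read CGGC, matching the upstream imperfect repeat described in the text.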
Sequence-specific Recognition of GCTC Repeats in AAVS1-Two OBD molecules binding to contiguous repeats cooperate in recognizing a full GCTC repeat sequence (Fig. 3). Four residues generate all specific contacts with the bases as follows: three are located in the DNA recognition loop LDB of one OBD molecule (Arg-138, Ala-141, and Gly-142), and the upstream OBD molecule contacts DNA via Arg-107 (Fig. 3A). For instance, for the recognition of the central repeat (Fig. 1B, repeat 2), residues from the LDB loop specify the sequence GCT with Arg-138 making hydrogen bonds with guanine 7 via O6 and also with the guanine 15′ of the second base pair (Fig. 3, A and B). Cytosine 8 complementary to guanine 15′ is recognized by the main chain carbonyl of Ala-141 (Fig. 3A, bottom panel). The recognition of the third base pair (T:A) is provided via Gly-142 whose main chain carbonyl makes a bidentate interaction with adenine 14′. Finally, Arg-107 of the upstream OBD makes a bidentate interaction with thymine 9 and cytosine 10. Thus, of the eight possible nucleotides in the GCTC sequence, only two do not directly contact the protein and are located at the ends of the tetranucleotide repeat (Fig. 3). The OBD-DNA-specific contacts are stabilized by 6-8 phosphate-backbone contacts per OBD molecule. Residues that provide these interactions are also located in loop LDB and αD elements. There are no direct contacts between the different OBD molecules, and whether the binding of multiple molecules is a cooperative event mediated by DNA remains to be determined. The DNA is essentially B-form with no major distortions in any of its geometric parameters as determined by the program 3DNA (39, 40).

FIGURE 1. [...] Green sequence represents the trs site. C, structure of complex A with a 2:1 stoichiometry with OBD1 (green) binding to repeats 1 and 2; OBD2 (pale green) binding to repeats 2 and 3. D, structure of complex B with a 3:1 stoichiometry showing OBD1 (green) binding to repeats 1 and 2; OBD2 (pale green) binding to repeats 2 and 3, and OBD3 (cyan) binding to repeat 3 and pseudo-repeat.

Recognition of GCTC Repeats by Different AAV Rep Serotypes-Although the principles of repeat recognition were shown by the structure of the AAV5 OBD-RBS complex, there are significant differences between AAV5 and the majority of the other AAV serotypes, particularly in the recognition loop LDB (Fig. 4C). In AAV2 and the majority of AAV Rep serotypes, this region spans ~10 residues and contains the sequence RNGAGGG. In contrast, AAV5 has a shorter recognition loop with the sequence K-KGGA. Within these sequences, the residues that interact with DNA bases are those discussed above (Fig. 3B). Using the structure of AAV5 OBD bound to the RBS element from the AAV5 ITR, we compare its recognition of the GCTC repeats with AAV2 OBD (26). Although the two structures superimpose with a root mean square deviation of ~1 Å, their respective repeat DNAs "sit" differently with respect to the superimposed OBD domains. The AAV5 RBS is translated up the helical axis by ~1-1.5 Å, and the conformation of the recognition loops moves up accordingly (Fig. 4A). The fact that both OBDs make similar specific interactions with DNA despite differences in their relative positions can be attributed to the flexibility of the glycine-rich recognition loop that can accommodate the conformational requirements needed to fit into the major groove (Fig. 4B). The conformation of the recognition loop follows the DNA backbone, making significant van der Waals contacts with its ribose groups. This is more pronounced in the larger AAV2 loop (Fig. 4B). In AAV5, Lys-137 plays the equivalent role of residue Arg-138 in AAV2. It interacts with the first guanine of the repeat. However, in contrast to Arg-138, Lys-137 does not make any interactions with the guanine in the second base pair of the repeat. In AAV5, residue Lys-138 interacts with cytosine through its main chain carbonyl, similarly to Ala-141 in AAV2.
Finally, the main chain carbonyls of AAV5 Gly-139 and AAV2 Gly-142 play equivalent roles in recognizing adenine in the third base pair of the repeat (Fig. 4A). Thus, with the exception of Lys-137 and Arg-138, most of the interactions are made through main chain atoms. Consequently, sequence variations in the loop that allow sufficient conformational flexibility to dock into the major groove are one of the main requirements to make specific contacts (Fig. 4C).
Rep68 Forms a Heptameric Complex on AAVS1-Recognition of the GCTC repeats is only one of the events required for the assembly of Rep68 on the AAVS1 DNA site; however, not much is known about the nature of the final complex. To start the characterization of the Rep68-AAVS1 complex, we first used analytical gel filtration. The elution profile shows two peaks as follows: one eluting at ~12.5 ml that represents a Rep68-AAVS1 complex, and the second eluting peak represents excess DNA (Fig. 5A). The purified Rep68-AAVS1 complex is stable even at a concentration of 500 mM NaCl (data not shown). The elution profile of the complex differs drastically from that of apo-Rep68, which shows the presence of multiple species (41). To further characterize this complex, we used sedimentation velocity. Experiments were performed at 20°C using the fluorescein-labeled 42-mer AAVS1 DNA site at a concentration of 2 µM. Data were collected at 492 nm to detect species containing DNA. The sedimentation profile shows a single peak sedimenting at 12.5 S (s20,w). The presence of a single peak allowed us to fit the data to a single ideal species model to estimate the molecular mass. We obtained a value of 438 kDa, close to the theoretical value for a heptameric complex (453 kDa) (Fig. 5B). A similar estimate was obtained by sedimentation equilibrium (Fig. 5C). To further confirm these results, we used negative-stain transmission electron microscopy to obtain structural information by single-particle reconstruction. Fig. 5D shows a representative electron microscopic image of a negatively stained Rep68-AAVS1 complex. A reference-free two-dimensional alignment without imposing any symmetry was carried out that clearly shows the presence of seven-member ring structures (Fig. 5E). Thus, Rep68 forms a heptameric complex when bound to the AAVS1 site.
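The step from a fitted molecular mass to a stoichiometry can be made explicit with a short Python sketch. The measured (438 kDa) and theoretical (453 kDa) values are taken from the text; the monomer mass (61 kDa) and DNA mass (26 kDa) are illustrative assumptions chosen only so that seven monomers plus one DNA sum to 453 kDa, and the function names are hypothetical.

```python
def estimate_copies(m_complex_kda, m_monomer_kda, m_dna_kda):
    """Protein copies implied by a complex mass, assuming one DNA per complex."""
    return (m_complex_kda - m_dna_kda) / m_monomer_kda


def percent_deviation(measured, theoretical):
    """Signed deviation of the measured mass from the theoretical mass, in percent."""
    return 100.0 * (measured - theoretical) / theoretical


# ASSUMPTIONS (not stated in the text): approximate masses chosen so that
# 7 * monomer + DNA reproduces the quoted theoretical value of 453 kDa.
M_MONOMER_KDA = 61.0
M_DNA_KDA = 26.0

copies = estimate_copies(438.0, M_MONOMER_KDA, M_DNA_KDA)
print(round(copies))                              # 7
print(round(percent_deviation(438.0, 453.0), 1))  # -3.3
```

Under these assumptions the measured mass is about 3% below the heptamer value and roughly 0.75 monomer masses above a hexamer, which is why an independent check such as the electron microscopy class averages is useful.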

Minimum of Two Direct Repeats Is Required for Stable Rep68-RBS Complex-To determine the minimal number of repeats that are required to obtain a stable complex, we calculated the binding affinity and stoichiometry of Rep68 to DNA sites with one, two, and three GCTC repeats. Binding isotherms were determined using fluorescence anisotropy measurements on 41-mer DNA sites labeled at the 5′ end with carboxyfluorescein. Data were fitted using a single site model with a Hill coefficient. Fig. 6 shows that Rep68 binds to the AAVS1 with a KD,app of 128 nM (Fig. 6A). Sites with two and three repeats have affinities of ~140 nM, which is close to the binding constant for the AAVS1 site. Moreover, the h values obtained from the fitting range from 1.2 to 1.7, implying positive cooperativity during complex formation. In contrast, the single repeat site increases the apparent dissociation constant to ~400 nM. To determine whether Rep68 forms the same complex on these sites, we performed sedimentation velocity. Fig. 6B shows the comparison of the sedimentation velocity profiles of the mutant sites with the AAVS1 complex. Whereas 3rpt and 2rpt form complexes that sediment at 12.5 S, as does the AAVS1 complex, the 1rpt site does not. Thus, Rep68 needs at least two GCTC repeats to assemble into a stable heptameric complex.
To determine whether the formation of a stable complex translates into efficient DNA melting, we tested the ability of Rep68 to melt the AAVS1 mutant DNA sites with varying numbers of GCTC repeats using the strand-displacement assay (42). The method consists of the generation of fluorescein-labeled single-stranded DNA upon addition of ATP and magnesium. The results show that although the AAVS1, 3rpt, and 2rpt sites are melted, Rep68 is not able to melt the 1rpt site (Fig. 6C).
Rep68 DNA Binding Requires the Cooperative Interaction of OBD, Helicase, and Linker-AAV Rep68 has two DNA binding domains that play a role in the overall affinity for the AAVS1 site. The helicase domain binds DNA nonspecifically, whereas the specificity for RBS is provided by the OBD. To determine the contributions of each of the domains for binding the AAVS1 site, we determined the binding affinities of Rep68, OBD, and the helicase domain (Rep40). At the experimental conditions tested, we could not measure the binding constant of OBD because it never reached saturation (Fig. 7B). Under the same conditions, Rep40 binds with an apparent dissociation constant of ~22 µM (Fig. 7C). In previous studies, we and others determined that the linker is important for the oligomerization of Rep68 (30,43). To determine the role of the linker in DNA binding, we generated the construct OBDL spanning residues 1-224 that includes the OBD and the entire linker region. We calculated the binding affinity for this construct and obtained a KD,app of ~50 µM (Fig. 7D). These results suggest that the linker provides additional contacts with DNA, increasing the affinity of the minimal OBD domain. To determine the role of the linker region in the context of Rep68, we determined the affinity of Rep68 octlink, a mutant that has 18 residues from the Oct-1 protein instead of Rep68 residues 206-224. We determined previously that this mutant protein behaves as a monomer in solution (30). Rep68 octlink has a KD,app of 29 µM (Fig. 7E). Moreover, in a recent study, we determined that mutation of linker residue proline 214 to alanine affects the ability of Rep68 to oligomerize and bind DNA (44). To test the effect of this mutation on the melting activity of Rep68, we performed the strand displacement reaction and showed that this mutant is unable to melt DNA (Fig. 7F). Taken together, our data show that the two functional domains plus the linker participate in DNA binding and act cooperatively to bind DNA. Moreover, our data suggest that oligomerization of Rep68 on DNA drives much of the binding affinity.

FIGURE 4. Comparison of AAV2 and AAV5 binding to GCTC repeats. A, superposition of structurally aligned structures of AAV2-OBD (pale green) and AAV5 OBD (red) bound to AAVS1 and AAV5 origin RBS. Equivalent residues Arg-138 (AAV2) and Lys-137 (AAV5) are represented as sticks. B, differences in the conformation of loops LDB between AAV2 (pale green) and AAV5 (red) OBDs. AAV2 loop LDB makes more extensive contacts with the DNA backbone due to its larger size; however, the conformation of the second half of the loops is very similar. C, sequence and structural alignment of AAV serotypes from β2 to αE that include residues involved in specific DNA contacts (red) and backbone DNA contacts (blue).

NOVEMBER 13, 2015 • VOLUME 290 • NUMBER 46

Discussion
Our studies illustrate the binding mode that AAV Rep68 uses to assemble at the AAVS1 integration site. The x-ray structure of AAV2 OBD bound to the AAVS1 site expands our knowledge of the structural determinants that govern binding of AAV Rep proteins to GCTC repeats. Specific recognition is achieved through the interaction of residues in flexible loop LDB and α-helix D docking into contiguous major and minor grooves, respectively. The residues in helix D contacting the DNA bases are conserved in all AAV serotypes. In contrast, the nature of loop LDB varies only in AAV5 and AAV8, both in sequence and length (Fig. 4C). However, because many of the interactions of loop LDB with the DNA bases involve backbone atoms, it appears that the only prerequisite is to have enough backbone flexibility to dock into the major groove. This is provided by the high content of glycine residues found in the loop (Fig. 4C). Thus, even though in AAV5 the loop is shorter by a couple of residues, its inherent flexibility allows it to dock into the major groove where main chain groups from glycine and alanine make hydrogen bond contacts with DNA. A puzzling case is the AAV8 Rep with a longer LDB loop but no basic residues to interact with DNA, particularly no equivalent residue for Arg-138. Instead, three hydrophobic residues (alanine, valine, and methionine) are found in the loop together with a proline (Fig. 4C). It is not clear how these changes still allow recognition of the GCTC repeat. Future structural studies are needed to answer this question. Our results show that high affinity and specific binding of Rep68 to DNA containing RBS sites require the concerted and cooperative binding of its two domains (OBD, helicase) and the linker. Thus, although the OBD provides the specificity to bind the GCTC repeats, affinity arises from the helicase domain and oligomerization during binding. This is in agreement with previous studies showing that binding affinity to the minimal RBS sequence (28 bp) is significantly lower than to larger DNA sites with additional upstream sequences (45).

FIGURE 5. [...] Fig. 1B. The profile shows two peaks with the faster eluting peak corresponding to the Rep68-DNA complex, and the second peak corresponding to the excess free-DNA. mAU, milliabsorbance units. B, purified complex was analyzed by sedimentation velocity using a fluorescein-labeled AAVS1 DNA. Data were collected at 492 nm, and data were analyzed using the program Sedfit. Estimated molecular weight corresponds to a heptameric complex. C, equilibrium sedimentation of Rep68-AAVS1 was carried out in the same conditions as B and was fitted as a single sedimenting species model. The distribution of the residuals of the fit is shown. D, EM image of negatively stained Rep68-AAVS1 complex. The rings are easily recognized in the raw images. E, representative two-dimensional averages showing the heptameric complex.
The low affinity of the OBD for DNA can be understood by its peculiar binding mode. Analysis of the OBD-AAVS1 interface shows that the total buried surface area upon complex formation per monomer is only 268 Å², one of the smallest for a specific protein-DNA complex. The number of residues per base contact is also at the lower end of specific DNA-binding proteins with only four residues contacting the DNA bases. These properties resemble those of proteins that bind DNA non-specifically (46). Moreover, the number of phosphate contacts is sparse, reflecting the low DNA binding affinity of the OBD.
Our data suggest that oligomerization of Rep68 is a prerequisite to bind DNA with high affinity. Binding studies with the oligomerization-deficient mutant Rep68 octlink show that this protein binds DNA poorly despite having all the functional side chains to make DNA contacts (Fig. 7E). The same results were obtained with a Rep68 P214A mutant that is mostly monomeric in solution (44). Nevertheless, because of steric constraints, there is a limit to the number of Rep68 molecules binding to multiple GCTC repeats at the RBS site. A model of five Rep68 molecules binding to the AAVS1 site based on the AAV5 OBD-RBS x-ray structure shows that more than three Rep68 molecules bound to the RBS simultaneously will start to clash with each other. Thus, a Rep68 molecule bound to the first repeat will clash with the helicase domain of a Rep68 molecule binding downstream in the fourth repeat (Fig. 8A). We propose a model where a Rep68 oligomer scans DNA until it recognizes the RBS site. Assembly of the final heptameric complex is accomplished by addition of further molecules guiding the helicase domain (Fig. 8B). This process may be similar to the assembly of the simian virus 40 large T antigen, where only one direct repeat is needed to assemble the final complex (47). These facts raise the question of why there are multiple repeats at the origin of replication in all AAV serotypes and the AAVS1 site. Part of the answer lies in the binding mode of the AAV OBD domain that requires two repeats to bind DNA specifically. However, the presence of multiple identical binding sites in regulatory DNA sequences such as enhancers and promoters is a general feature known as homotypic site clustering (48-51).
Thus, in addition to promoting cooperative interactions and producing a non-linear response to protein concentration, the occurrence of multiple repeats increases the local concentration of the protein by facilitating lateral diffusion on DNA (52-55). Consequently, the number of repeats found in the RBS sites is not an indicator of the number of Rep molecules recruited to DNA.

The heptameric Rep68-AAVS1 complex is different from other published reports that show a variety of Rep oligomers such as a hexamer of Rep78 on the AAV-2 ITR site (56), a Rep68 octamer bound to a single-stranded poly(dT) DNA (57), and the pentameric complex observed in the AAV5 OBD-RBS structure (26). This variety of complexes reflects the versatility of AAV Rep proteins, whose assembly is modulated by the DNA substrate they bind. However, the pentameric complex obtained in the x-ray structure of the AAV5 OBD-RBS complex shows the effect of the saturation of all possible repeats at the high concentrations used in the crystallization conditions and does not reflect the real stoichiometry that occurs in the context of the full-length proteins. Thus, we hypothesize that in order to bind DNA with specificity and high affinity, Rep68 binds the AAVS1 site as an oligomer, at least a dimer, that subsequently assembles to form the final complex. However, it is possible that ATP binding and hydrolysis may lead to formation of a different stoichiometry reflecting the dynamic processes occurring at the time of assembly such as melting and nicking of DNA. These questions will need to be answered by future studies that are currently in progress in our laboratory.
Author Contributions-C. R. E. designed the study, analyzed data, and wrote the paper. F. N. M. expressed and purified the AAV2 protein and crystallized, solved, and refined the x-ray structures. F. Z. P. purified Rep68, performed analytical ultracentrifugation, collected and analyzed EM data, and performed melting assays. C. B. purified Rep68 and Rep68 mutants and performed binding assays. J. W. B. II performed analytical ultracentrifugation and analyzed the data. All authors analyzed the results and approved the final version of the manuscript.