Structure of the functional domain of φ29 replication organizer: insights into oligomerization and DNA binding

The Bacillus subtilis phage phi29-encoded membrane protein p16.7 is one of the few proteins involved in prokaryotic membrane-associated DNA replication that has been characterized at a functional and biochemical level. In this work we have determined both the solution and crystal structures of its dimeric functional domain, p16.7C. Although the secondary structure of p16.7C is remarkably similar to that of the DNA binding homeodomain, present in proteins belonging to a large family of eukaryotic transcription factors, the tertiary structures of p16.7C and homeodomains are fundamentally different. In fact, p16.7C defines a novel dimeric six-helical fold. We also show that p16.7C can form multimers in solution and that this feature is a key factor for efficient DNA binding. Moreover, a combination of NMR and x-ray approaches, combined with functional analyses of mutants, revealed that multimerization of p16.7C dimers is mediated by a large protein surface that is characterized by a striking self-complementarity. Finally, the structural analyses of the p16.7C dimer and oligomers provide important clues about how protein multimerization and DNA binding are coupled.

Despite extensive studies on DNA replication, relatively little is known about its in vivo organisation, which, in prokaryotes, occurs at the cell membrane (1-4). The well-studied B. subtilis phage φ29 (5) is one of the few systems for which this fundamental process has been investigated. The genome of φ29 consists of a linear double-stranded DNA (dsDNA) that contains a terminal protein covalently linked at each 5' end. Initiation of φ29 DNA replication, as well as that of most other linear genomes containing a terminal protein attached to their DNA ends, occurs via a so-called protein-primed mechanism (5-7). The φ29 genome encodes most, if not all, proteins required for phage DNA replication, making φ29 an attractive system to study membrane-associated DNA replication.
The early-expressed φ29 gene 16.7, which is conserved in all φ29-related phages studied so far, encodes a well characterized membrane protein, p16.7 (130 amino acids), that plays an important role in membrane-associated φ29 DNA replication (5,8,9). It contains an N-terminal transmembrane domain which is responsible for membrane localization (9) (a schematic organization of protein p16.7 is shown in Fig. 1a).
Analyses of a soluble variant lacking the N-terminal membrane anchor, p16.7A, revealed that it has affinity for both single-stranded (ss) and dsDNA, as well as for the φ29 terminal protein. Moreover, p16.7A, which is a dimer in solution, can form multimers, especially upon DNA binding, and multimerization is important for the mode by which it binds DNA (9-11). Recently, it was shown that the dimerization and DNA binding activities of p16.7 are confined to the C-terminal half of the protein, p16.7C (12).
Here, we show that p16.7C is able to form multimers in solution and that this process is enhanced in the presence of DNA. To gain insight into the multiple features and functions of p16.7C we determined its solution and crystal structures. In addition, the protein oligomers have been analyzed by both X-ray and NMR methods. The structural features of the monomer, dimer and oligomers and their implications for DNA binding are discussed.
Site-directed mutagenesis of p16.7C.-Site-directed mutations in gene 16.7C were obtained by polymerase chain reaction using the QuikChange Site-Directed Mutagenesis Kit (Stratagene). Plasmid pET-16.7C (12) was used as template DNA.
Overexpression and purification of p16.7C and its derivatives.-Protein p16.7C and its derivatives were overexpressed and purified using a Ni2+-NTA resin column as described before (9,12). Protein p16.7Cb was obtained by digestion of protein p16.7C with the thrombin endoprotease using the Thrombin Cleavage Capture Kit (Novagen, Merck Biosciences, Darmstadt, Germany) as described before (12). To obtain 15N- and 13C/15N-labelled p16.7C proteins, cells were grown in U-15N and U-13C/15N Bio-Express medium, respectively (Cambridge Isotope Laboratories, Andover, MA, USA).
Cross-linking assays.-Cross-linking reactions using 10 µM of protein and disuccinimidyl suberate (DSS) as cross-linking agent were performed as described before (9). After cross-linking, proteins were precipitated upon the addition of 1 volume of ice-cold 20% (w/v) trichloroacetic acid and, after resuspension, analysed by PAGE in the presence of SDS. The proteins were visualised by Coomassie blue staining.
Gel mobility shift assays.-Gel retardation assays were performed as described before (11).
Nuclease digestion assays.-Nuclease digestion assays were performed using end-labelled 297 bp DNA fragments corresponding to the right end of the φ29 genome. Labelled fragments were either used directly (dsDNA) or after heat denaturation (ssDNA). Micrococcal nuclease digestions of the ssDNA nucleoprotein complexes were performed as described before (11). DNase I reactions contained, in 20 µl, besides the end-labelled dsDNA fragment and the indicated amount of protein, 25 mM Tris-HCl (pH 7.5) and 10 mM MgCl2. Binding reactions were incubated for 10 min at 37 °C before 0.05 U of DNase I (Promega, Madison, USA) was added. Digestion was allowed to proceed for 2 min at 37 °C, after which the reaction was stopped by adding EDTA to a final concentration of 20 mM. For both the micrococcal nuclease and DNase I digestions, a phenol/chloroform extraction step was performed before the DNA was precipitated with ethanol in the presence of 15 µg linear polyacrylamide as carrier. Next, the resuspended DNA was analysed in denaturing 6% polyacrylamide gels. Finally, gels were dried and subjected to autoradiography.
[...] 15N-labelled and 13C/15N double-labelled molecules. Thus, HNCA, HNCO and HN(CO)CA spectra were employed for backbone assignment. The side-chain assignments were completed with 3D HCCH-TOCSY experiments. NOE distance restraints were obtained from 15N- or 13C-edited 3D-NOESY spectra. In addition, 2D-NOESY, 13C-selected-12C-filtered experiments were performed on a heterolabelled dimer to analyze the interprotein contacts. This protein sample was generated by mixing equimolar amounts of unlabelled and 13C/15N double-labelled protein to a global protein concentration of 100 µM in 200 mM NaCl, 1 mM DTT, 10 mM phosphate buffer, pH 5.0. The sample was incubated at 40 °C for 24 h and then concentrated for the NMR analysis.
Upper limits for proton-proton distances were obtained from NOESY cross-peak intensities at three mixing times: 50, 75, and 150 ms. Cross peaks were classified as strong, medium, and weak, corresponding to upper limits of 2.5, 3.5, and 5.0 Å, respectively. The lower limit for proton-proton distances was set as the sum of the van der Waals radii of the protons. Structure calculations were performed using the program DYANA (18). A set of 2180 constraints (204 interprotein) was used in the final round of calculations. The thirty best DYANA structures in terms of target function were submitted to a simulated annealing protocol with the AMBER 5.0 package using the Cornell et al. force field (19).
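The classification of cross peaks into distance restraints can be sketched as a simple lookup. This is a minimal illustration and not the DYANA input format: the function name, the atom labels and the 1.8 Å lower limit used in the example are hypothetical, since the text defines the lower limit only as the sum of the proton van der Waals radii and does not quote a value.

```python
# Upper distance limits (in Å) for the three NOE classes named in the text.
NOE_UPPER_LIMIT = {"strong": 2.5, "medium": 3.5, "weak": 5.0}


def noe_restraint(atom_i, atom_j, noe_class, lower_limit):
    """Return (atom_i, atom_j, lower, upper) for one classified cross peak.

    lower_limit is supplied by the caller; the source text does not give
    a numerical value for it.
    """
    upper = NOE_UPPER_LIMIT[noe_class]
    if lower_limit >= upper:
        raise ValueError("lower limit must be below the upper limit")
    return (atom_i, atom_j, lower_limit, upper)


# Hypothetical cross peak: labels and the 1.8 Å lower limit are illustrative only.
print(noe_restraint("H_i", "H_j", "weak", lower_limit=1.8))
```

Keeping the class-to-limit table separate from the peak list makes it easy to see how a change in calibration would propagate to every restraint.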
Protein oligomerization was detected employing DOSY (20) experiments. The average diffusion coefficients of the protein were determined at two different protein concentrations (350 µM and 3500 µM) and two different pH values (pH 5.0 and pH 7.0) in 200 mM NaCl, 10 mM sodium phosphate buffer. In order to get structural information about the protein oligomers, 2D-NOESY experiments were also carried out with these samples. In addition, 13C-HSQC and 15N-HSQC spectra were recorded at high and low protein concentration. The high protein concentration sample was prepared by mixing unlabelled with 13C/15N double-labelled protein (4:1 ratio) to a final protein concentration of 3500 µM. Finally, 2D-NOESY, 13C-selected-12C-filtered experiments were performed on this sample to analyze the interprotein contacts responsible for oligomerization.
Protein crystallization and data collection.-Crystallization experiments were carried out with a p16.7C solution at 10 mg/ml containing 50 mM phosphate buffer pH 7.5, 200 mM NaCl. Protein crystallization was achieved by dialysis of the protein solution against a solution containing 5 mM phosphate buffer pH 7.5, 75 mM NaCl. Prismatic crystals (0.05 × 0.05 × 0.2 mm) appeared after two or three days of incubation at 4 °C. Crystals were successively transferred to a series of solutions with increasing percentages of glycerol (from 5% to 35%), mounted in a fiber loop and frozen at 100 K in a nitrogen stream. X-ray diffraction data were collected on a CCD detector using the ESRF Grenoble synchrotron radiation source at a wavelength of 0.92 Å at the BM16 beam-line. Diffraction data were processed using MOSFLM (21) (Table I).
Structure determination and refinement.-As mentioned above, the X-ray structure of p16.7C was solved by molecular replacement using the coordinates of an ensemble of 10 preliminary NMR models [AMoRe (22)]. The preliminary electron density map was improved by a density averaging protocol performed with DM (23). The averaged density map was good enough to manually refine the NMR model from residues 66a to 86b. This model was then refined using the simulated annealing routine of CNS (24). Several cycles of restrained refinement with REFMAC5 (25) and iterative model building with O (26) were carried out. Water structure was also modeled. Calculations were performed using CCP4 programs (27). The final model was refined by iterative maximum likelihood positional and translation, libration and screw-rotation displacement (TLS) refinement (28). The stereochemistry of the model was verified with PROCHECK (29). Ribbon figures were produced using MOLSCRIPT (30) and RASTER (28). The accessible surface area of the p16.7C dimer and protomer was calculated with the program "naccess" from the LIGPLOT package (31).

The functional domain of protein p16.7 can form multimers.
Gel retardation studies demonstrated that the C-terminal half of protein p16.7 (p16.7C, containing p16.7 residues 63-130) has ss and dsDNA binding activities (12). These experiments were performed with protein p16.7Cb, the derivative of p16.7C lacking its N-terminal (His6)-tag, to exclude possible effects of this positively charged region on DNA binding. Although these experiments unequivocally demonstrated DNA binding activity, they did not provide insight into the ability of p16.7Cb to form multimers or the mode of DNA binding, which can be deduced by nuclease digestion analyses of the nucleoprotein complexes. Therefore, increasing amounts of p16.7Cb were incubated with 5'-labelled ss or dsDNA probes. Next, the nucleoprotein complexes formed with ss and dsDNA were subjected to micrococcal nuclease and DNase I digestion, respectively, after which the fragments were fractionated through denaturing polyacrylamide gels (see Figs. 1b and c). In the absence of p16.7Cb, nearly all the end-labelled ssDNA fragment was degraded into small oligonucleotides (Fig. 1b, lane 6). In the presence of p16.7Cb in the range of 5 to 20 µM, however, the amount of small oligonucleotides decreased, giving rise to a variety of larger ssDNA digestion products (Fig. 1b, lanes 1-3). At higher p16.7Cb concentrations, hardly any degradation products were observed (lanes 4-5), indicating that the ssDNA was (almost) completely protected from micrococcal nuclease attack. Together, these results confirm that p16.7Cb binds to ssDNA and, importantly, provide strong evidence that, at elevated concentrations, p16.7Cb forms multimers upon ssDNA binding, leading to the generation of a continuous array of protein protecting the entire DNA fragment from micrococcal nuclease attack. Similar results were obtained when dsDNA was used (Fig. 1c).
Thus, whereas a typical DNase I digestion pattern was observed in the absence or in the presence of a low amount (5 µM) of p16.7Cb (lanes 1 and 2, respectively), full protection of the dsDNA fragment was seen at higher p16.7Cb concentrations (lanes 3 and 4). In contrast to the analyses with ssDNA, though, no intermediate levels of nuclease protection were observed with dsDNA. This might indicate that multimer formation of p16.7Cb is enhanced upon binding to dsDNA, although this difference may also be due to different digestion characteristics of DNase I versus micrococcal nuclease. In vitro cross-linking carried out in the presence of dsDNA followed by Western blot analyses confirmed the ability of p16.7Cb to form multimers (Fig. 1d). Similar results were obtained using ss instead of dsDNA or using p16.7C (not shown).

Structure determination of p16.7C dimer
Analytical ultracentrifugation experiments showed that p16.7C was mainly in its dimeric form at the temperature (30-40 °C), pH (5-7), ionic strength (200 mM NaCl) and concentration (0.4-0.8 mM) conditions employed for the NMR studies described below. Full backbone and side-chain assignment for p16.7C (supplementary Table I) was obtained using standard 2D and 3D NMR techniques on 15N- and 13C/15N-labelled samples. In order to distinguish intra- from intermolecular NOEs, half-filtered NOESY experiments (Fig. 2a) were carried out on a heterolabelled dimer obtained by mixing equivalent amounts of unlabelled and 13C/15N-labelled protein. A summary of the experimental constraints employed and the characterisation of the final NMR ensemble are shown in Table I.
In the course of the NMR studies, diffraction-quality crystals of the protein were obtained in a dialysis membrane. These conditions were then used as a starting point for crystallographic studies. A preliminary NMR-derived ensemble of ten models was initially employed in the X-ray structural determination of p16.7C by molecular replacement at 2.9 Å resolution (see Table II and Materials and Methods). The RMSD values for superimposition show that the NMR and crystal structures of the p16.7C dimer are very similar (0.87 and 1.04 Å for the monomer and dimer backbone superimposition, respectively, in ordered regions, i.e. residues 68-125). The combined NMR and X-ray structures not only provide validation of one another but also give a more complete picture of the structure and dynamics of p16.7C than either structure alone.
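For readers unfamiliar with how such superimposition RMSD values are obtained, the following sketch computes the RMSD between two matched coordinate sets after optimal rigid-body superposition with the Kabsch algorithm. It is an illustration only: the text does not state which program was used for the superimposition, the function name is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove the translation by centring both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P0.T).T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

In practice P and Q would hold the backbone atoms of the ordered residues from the two structures, listed in the same order.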

Architecture of the Dimer
The solution and crystallographic 3D structures of the p16.7C dimer are shown in Figures 2b and 2c [...] Fig. 1b). The secondary and tertiary structure of each monomer is stabilised by formation of a hydrophobic core resulting from the packing of the three helices.
According to both NMR and X-ray data, p16.7C forms a symmetric dimer that corresponds to a novel six-helical fold. Indeed, none of the proteins present in the Brookhaven Protein Data Bank exhibits high structural homology with p16.7C according to DALI (32) and SCOP (33). The two monomers, related by a non-crystallographic two-fold symmetry, are held together by a combination of extensive hydrogen bonding and hydrophobic and electrostatic interactions. In the following description of the main intermolecular contacts, letters a and b will be employed to specify the polypeptide chain only when essential to avoid ambiguities. The primary dimer interface is formed by helices H3a/H3b and the extended C-terminal region of both monomers, which are oriented in an antiparallel fashion. Thus, the extended C-terminal region of each protein packs against helices H1 and H3 of the other, being involved in both hydrophobic and polar intermolecular contacts. L124 of each monomer seems especially relevant for dimer stability: it is totally buried in a hydrophobic patch formed by Y79, L78, and L106 of the other monomer. In addition, there are polar intermolecular contacts between R113/K122, S88a/S88b and R85/Y115. Recent studies have highlighted the influence of CH-π interactions involving proline and aromatic side-chains on protein stability (34). In p16.7C the P87a and P87b side-chains, 4 Å apart at the dimer interface, pack against the indole rings of W116 from both polypeptides (W116a and W116b, supplementary Fig. 2). This particular arrangement is likely to make a significant contribution to the stability of the complex.
It has been shown that nonspecific binding of proteins to polyelectrolytes such as DNA is purely electrostatic, and can be quantitatively explained in terms of competitive condensed counterion displacement from DNA by polycationic regions of the protein (35). In this sense, the distribution of basic residues on the surface of a protein can provide important clues regarding nucleic acid binding sites. Thus, the surface electrostatic distribution of the p16.7C dimer was calculated to determine the most probable DNA binding site location. Figure 3 shows that the negatively charged residues are clustered at the surface defined by helices H1/H2. In contrast, the opposite side of the dimer exhibits a moderate positive potential and therefore constitutes the most likely DNA binding site. Indeed, this protein region, defined by helices H3a/H3b plus the C-terminal regions (residues 122-130), contains several basic residues (K122a/K122b, K123a/K123b, R126a/R126b and K130a/K130b). The eight positive charges available for protein-DNA interactions seem low in comparison with those observed in other nonspecific DNA binding proteins (36). Nevertheless, it is important to mention that protein oligomerization may multiply the DNA recognition surface and therefore enhance DNA binding. This view agrees with the apparent cooperative binding observed for p16.7C (see Fig. 1).

Structural basis for p16.7C oligomerization
Recent studies have shown that p16.7 multimerization is a prominent feature in its DNA binding mode (11). Here we have demonstrated that p16.7C retains the capacity to form multimers, implying that the C-terminal half of p16.7 contains a region(s) responsible for p16.7C dimer-dimer interaction. p16.7C dimers form a fibre around a crystallographic three-fold screw axis in the crystals obtained (Fig. 4a). This structural organisation may reflect the multimerization mode required for efficient DNA binding and therefore we initiated our studies on the protein surface involved in DNA-binding induced multimerization by examining the interdimeric p16.7C contacts observed in the crystal.
The interdimeric contacts observed in the p16.7C crystal are not essential for DNA-binding induced multimerization.-The interactions between two neighbouring p16.7C dimers in the crystal (see Fig. 4b) involve two salt bridges (E72a/R98b and the symmetry-related pair) and six hydrogen bonds (R98a/V95b, N67a/N96b, C71a/Q100b and the symmetry-related pairs) and cover 881 Å2 of the dimer surface (11% of the total solvent-accessible area). Interestingly, the E72 side-chain is buried in the hydrophobic interface formed by helices H1, H2 and H3, a rather unusual orientation for a polar residue. Probably, the poor solvation of E72 confers extra stability to its interaction with R98. Consequently, the two salt bridges seem to be especially relevant for the stability of the complex.
To study whether the interdimeric p16.7C contacts observed in the crystal reflect the multimerisation mode involved in efficient DNA binding, we constructed and purified two p16.7C mutants, pE72Q and pR98W, and analyzed their features. In E72Q, the salt bridges observed in the wild-type protein would be replaced by simple polar interactions between a charged and a neutral side-chain. In R98W, the interaction between residues 72 and 98 would be totally absent. In addition, the large steric volume of the tryptophan side-chain is expected to disrupt most of the polar interactions detected at this multimerization interface. To ensure that the mutations introduced do not affect dimerisation, these mutants were subjected in parallel with p16.7C to in vitro cross-linking analysis using the bifunctional cross-linking agent disuccinimidyl suberate (DSS). The results, presented in Figure 5a, show that similar amounts of cross-linked dimers were obtained for each protein, demonstrating that the mutations do not have major effects on dimerisation. Gel retardation and footprinting assays showed that the DNA binding characteristics of both mutants were similar to those of p16.7C (not shown). Finally, Western blot analysis of in vitro DSS cross-linked samples in the presence of DNA conclusively showed that both mutants retained their ability to form multimers (Fig. 5b). Together, these results indicate that residues E72 and R98 are not crucial for the DNA-binding induced multimerization of p16.7C.
Determination of the p16.7C surface involved in oligomerization by NMR.-Solution studies were then undertaken to get information about the p16.7C multimerization mode. First, DOSY (20) experiments were performed at low (350 µM) and high (3500 µM) protein concentration and two pH values (pH 5.0 and 7.0). Significant differences in the diffusion coefficient of the protein were detected between the high and low protein concentration NMR samples (ΔlogD = 0.17 at pH 7.0 and 0.1 at pH 5.0), indicating a clear difference in the oligomeric state of p16.7C. Second, 2D NOESY and 15N/13C HSQC NMR spectra were recorded with the 350 and 3500 µM protein NMR samples. Interestingly, despite the expected general broadening of the resonances at high protein concentration due to the limited oligomerization occurring under these conditions, clear sequence-dependent differences were detected (these were more evident at pH 5.0, when the extent of oligomerization is lower, according to the diffusion coefficient). Thus, several proton signals in N83, Y108, Y115 and Q112 were strongly affected by line broadening and almost disappeared (some examples are shown in supplementary Figs. 3 and 4b). Moreover, chemical shift changes (> 0.025 ppm) between the high and low concentration samples were measured for some aliphatic proton signals in residues N83, R85, Q97, Y108, Q112, Y115, E119, K123 and Y125 (some examples are shown in supplementary Fig. 4a and b). Interestingly, they are located at the same extended surface of the protein dimer (see Fig. 6a [...] Fig. 6). All these contacts (shown in Fig. 6b and listed in supplementary Table II) were in agreement with the pattern of concentration-dependent line broadening and chemical shift changes described above. For example, the Y125 aromatic side-chain shows clear NOEs with the methyl group of T111. The two residues are more than 21 Å apart in the dimer structure, and these NOEs were not detected in the diluted protein sample.
Similarly, clear intermolecular contacts were observed between N83/Y115, L99/Y125, and Y125/Y108. Interestingly, all the identified residues involved in interdimeric contacts fall at either side of the p16.7C dimer. Altogether, these data indicate the existence of a multimerization interface that is different from the one detected by X-ray crystallography and which involves larger surface areas.
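To give a feeling for the size of the DOSY effect, the reported differences in logD can be converted into ratios of diffusion coefficients. The sketch below makes two assumptions that are not stated in the text: that ΔlogD is the base-10 logarithm of D at low concentration minus that at high concentration, and, for the second function, that the diffusing species behave as compact spheres so that D scales with the inverse cube root of the mass. The resulting mass ratio is therefore only a rough, population-averaged indication that also ignores any viscosity change at high protein concentration.

```python
def diffusion_ratio(delta_log_d):
    """D(low concentration) / D(high concentration) from a difference in log10 D."""
    return 10.0 ** delta_log_d


def apparent_mass_ratio(delta_log_d):
    """Apparent mass ratio, assuming compact spheres (D proportional to M**(-1/3))."""
    return diffusion_ratio(delta_log_d) ** 3


# Values of delta log D quoted in the text for the two pH conditions.
for ph, dlog in ((7.0, 0.17), (5.0, 0.10)):
    print(f"pH {ph}: D ratio {diffusion_ratio(dlog):.2f}, "
          f"apparent mass ratio {apparent_mass_ratio(dlog):.1f}")
```

Under these assumptions the diffusion coefficient drops by a factor of about 1.5 at pH 7.0 and about 1.3 at pH 5.0, which is in line with the limited extent of oligomerization described in the text.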
Most interestingly, the addition of dsDNA (A14T14) to a low p16.7C concentration sample (350 µM) at pH 7.0 produced remarkably similar changes in the 15N-HSQC spectrum to those observed at high p16.7C concentration (3500 µM) in the absence of DNA. According to cross-linking and band-shift experiments, DNA binding promotes protein oligomerization. Thus, the observed line broadening of p16.7C signals in the presence of dsDNA is likely to result from a combination of two different association processes, DNA binding and protein oligomerization. Figure 7 shows a comparison between two regions of a p16.7C 15N-HSQC spectrum under three different conditions: low (middle frames) and high (lower frames) protein concentration without dsDNA, and low protein concentration in the presence of stoichiometric amounts of dsDNA (upper frames). It can be observed that the NH signals that are most affected by increasing the p16.7C concentration (such as Y115, Y108, Q112 or the N83 side-chain) are also the ones most altered at low protein concentration in the presence of DNA. In contrast, several residues involved in the interdimer contacts observed in the crystallographic fibre, such as the side chains of Q100 or N67, remain totally unaffected in both cases. These results strongly suggest that DNA binding promotes the same multimerization process detected in solution without DNA at much higher p16.7C concentration.
An NMR-based model of a p16.7C tetramer was calculated by employing the experimental sets previously measured for the dimeric species and 30 additional interprotein NOE-derived constraints to define the relative orientation of both p16.7C dimers within the complex (see Fig. 8a and supplementary Fig. 7). A particularly striking feature of this model is the self-complementary shape of the p16.7C dimer. It can be observed that the two most internal helices H3 plus the extended 122-127 regions belonging to separate p16.7C dimers are aligned in an antiparallel fashion (crossing angle of 0°). The overall orientation of the two protein dimers within the tetramer differs by 35°. Various experimentally detected interdimeric interactions are apparent from the model. Thus, (i) the aromatic side-chain of Y125 presents hydrophobic contacts with Y108 and the methyl group of T111, (ii) helix H3 and loop I would participate in a number of polar interprotein interactions involving residues N83, R85, E119 and Y115, and (iii) the helix H2 solvent-exposed surface might present several polar contacts with helix H1.
A model for a larger multimer was obtained by simply extending this oligomerization mode in both directions (Fig. 8b). Due to the 35° twist present at each step, the protein oligomer would define a large helical structure with approximately 10 dimers per turn. The inner part of the helix is characterized by a positive electrostatic potential, and its shape and charge complementarity with the dsDNA surface strongly suggest that this region of the supramolecular complex participates in DNA recognition. This organization implies that the coiled-coil and transmembrane domains, present in the full-length p16.7 protein, would be exposed to the outer face of the helix. It seems clear that such a configuration would be compatible with attachment to the membrane only for protein oligomers of limited size (up to four p16.7 dimers).
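The helical repeat follows directly from the per-step twist, as the short calculation below illustrates. The function names are hypothetical, and the only input taken from the text is the 35-degree twist between consecutive dimers.

```python
def dimers_per_turn(twist_deg):
    """Number of dimers needed to complete one 360-degree turn of the helix."""
    return 360.0 / twist_deg


def cumulative_rotation(n_dimers, twist_deg):
    """Rotation (degrees) between the first and last dimer of an n-dimer oligomer."""
    return (n_dimers - 1) * twist_deg


print(round(dimers_per_turn(35.0), 1))   # 10.3 dimers per turn
print(cumulative_rotation(4, 35.0))      # 105.0 degrees across four dimers
```

With this twist, an oligomer of four dimers spans only about 105 degrees of rotation, which illustrates why short oligomers can keep their N-terminal regions pointing towards the same side while a complete turn cannot.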

DISCUSSION
The φ29 dimeric membrane protein p16.7 has been shown to be involved in the organization of membrane-associated φ29 DNA replication (8,9). This small protein (130 residues) presents a remarkable diversity of activities. First, it binds nonspecifically to both ss and dsDNA (10-12). Second, p16.7 can form multimers, both in vitro and in vivo, and it has been shown that multimerization is crucial for its DNA binding mode (11). Third, it has affinity for the φ29 terminal protein. Interestingly, the C-terminal half of protein p16.7 (p16.7C) is responsible for protein dimerization and contains the φ29 terminal protein and DNA binding capacities (12). Here, we have shown that protein p16.7C is also capable of forming multimers. Altogether, these results indicate that p16.7C constitutes the functional domain of p16.7. This conclusion is in line with the observation that the C-terminal half of all known p16.7 homologues has a higher level of conservation than the N-terminal half (5).
To gain insight into its various features and functions we have determined the solution and crystal structures of p16.7C. The primary sequence of p16.7C has homology with the DNA binding homeodomains (around 20 and 40% of identity and similarity, respectively), which are present in a large family of eukaryotic transcription factors. In particular, several residues highly conserved in most homeodomains and critical for either structure or function are also conserved in p16.7C. In addition, computer-assisted analyses suggested that the secondary structure of p16.7C may be similar to that of homeodomains (12). Determination of the structure indeed reveals that the secondary structures of p16.7C and homeodomains are very similar. Moreover, as in homeodomains, the secondary and tertiary structures are stabilized by the formation of a well-defined hydrophobic core, resulting from the packing of the three helices. Nevertheless, the spatial organisation of the three α-helices in p16.7C is fundamentally different from that of homeodomains (37) and, in contrast to them, p16.7C forms stable dimers. Thus, despite its similarities with homeodomains regarding the primary and secondary structures, the functional domain of p16.7 defines a novel dimeric six-helical fold.
The positively charged surface formed by the H3 helices and the extended regions that follow them probably constitutes the DNA binding site. Considering that native p16.7 is anchored at the membrane by its N-terminus, this putative binding surface of p16.7 will be directed towards the center of the cell, and thus is compatible with a role in organizing φ29 DNA replication at the membrane of the infected cell.
Dimers of p16.7C are arranged as a protein fibre in the crystals obtained. Mutagenesis studies, however, show that this organization is not crucial for DNA recognition. Nevertheless, given the large variety of different activities displayed by p16.7 and the apparent strength of the interactions that promote fibre formation, a possible biological function for these contacts cannot be ruled out.
The overall structural features of p16.7C oligomers in solution have been determined by NMR methods. Interestingly, the oligomers display a striking charge and shape complementarity with dsDNA, which is in agreement with a role in promoting binding of p16.7C to dsDNA. The fact that the same multimerization mode was also detected at low p16.7C concentration in the presence of dsDNA strongly supports this view.
The observed protein oligomers present a rather low stability, as shown by analytical ultracentrifugation experiments. However, under natural conditions, compartmentalization of native p16.7 by attachment of its N-terminal membrane anchor to the membrane (9) restricts its translational diffusion to two dimensions and increases its local concentration, and hence is expected to stimulate multimerization. Local protein concentration will also increase upon DNA binding. Moreover, DNA binding may further enhance multimerization by restricting p16.7C flexibility and/or by introducing subtle changes in the p16.7C dimer.
It is worth mentioning that, although feasible in vitro, the helical arrangement resulting from extensive protein oligomerization is unlikely to be compatible with the anchorage to the bacterial membrane for the full-length p16.7 protein. Fluorescence microscopy showed that, especially at early infection times, strong foci of p16.7 signals were obtained throughout the membrane (8). No indication, however, was obtained that p16.7 would form ring or helical structures at the membrane. Rather, the dispersed patches suggest clustering of limiting amounts of p16.7 molecules at the membrane.
Based on all this information we propose the following model for the protein-DNA complex. The structural organisation of p16.7C dimers precludes deep penetration of the H3 helices, located at the probable DNA-binding surface, into the major groove of dsDNA in the manner observed for homeodomains or prokaryotic helix-turn-helix proteins, implying that p16.7C binds dsDNA in a fundamentally different way. A protein-DNA complex compatible with the structural features of the p16.7C dimer and oligomers can be built, however, by spanning the binding surface of the protein dimer above the major or minor groove of canonical B-form DNA, with helices H3a/H3b plus the extended C-terminal regions sitting on the polyphosphate backbone. This proposed DNA binding mode (supplementary Fig. 8), which is highly similar to that of the so-called MADS boxes (38,39), is attractive because it allows interaction of the arginine and lysine side chains, located at the putative binding surface, with the antiparallel phosphate backbone that forms the groove edges of the dsDNA, which may explain the lack of sequence specificity of p16.7C. Moreover, this model agrees with several other features determined for p16.7C. First, the moderate number of basic residues located at the probable p16.7C DNA binding surface explains why p16.7C oligomerization is required for efficient dsDNA recognition. Second, this binding mode is compatible with the multimerization mode detected in solution, permitting consecutive dimers of p16.7C within the oligomer to bind to adjacent DNA regions. Third, because the DNA binding surface is extended in the protein oligomer, the protein-DNA interactions will be multiplied, which explains the observed non-linear increase of in vitro DNA binding with respect to protein concentration and also the observed DNA-enhanced multimer formation of p16.7C.
In fact, the protein concentration-dependent effects on DNA binding, observed in gel retardation and nuclease protection assays, strongly indicate that p16.7C binds DNA in a cooperative manner. A cooperative way of DNA binding is observed for most proteins that act transiently during DNA replication, and p16.7 is believed to fall into this class of proteins.
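The non-linear dependence of binding on protein concentration can be illustrated with a simple Hill model. The following is only a minimal sketch: the Hill equation itself, the half-saturation constant `K_HALF_UM` and the Hill coefficients are assumptions chosen for illustration and were not fitted to, or derived from, the data in this work. Only the protein concentrations (5 to 80 µM) are taken from the nuclease protection assays.

```python
def hill_fraction_bound(protein_um, k_half_um, hill_n):
    """Fraction of DNA sites occupied under a Hill model.

    protein_um : free protein concentration (µM)
    k_half_um  : concentration giving half-maximal binding (µM, hypothetical)
    hill_n     : Hill coefficient; 1 = non-cooperative, >1 = positive cooperativity
    """
    p = protein_um ** hill_n
    return p / (k_half_um ** hill_n + p)


# Protein concentrations used in the nuclease assays (µM).
CONCENTRATIONS_UM = [5, 10, 20, 40, 80]
# Hypothetical half-saturation concentration, chosen only for illustration.
K_HALF_UM = 20.0

if __name__ == "__main__":
    for hill_n in (1.0, 2.0):
        print(f"Hill coefficient n = {hill_n}")
        for conc in CONCENTRATIONS_UM:
            theta = hill_fraction_bound(conc, K_HALF_UM, hill_n)
            print(f"  {conc:>3} µM -> fraction bound {theta:.3f}")
```

With a Hill coefficient above 1, doubling the protein concentration in the low range more than doubles the fraction of DNA bound, whereas a non-cooperative model (n = 1) always gives less than a two-fold increase. This is the qualitative signature that distinguishes cooperative from independent binding in titration experiments.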

Accession Numbers
The coordinates corresponding to the NMR and X-ray structures of the p16.7C dimer have been deposited in the PDB (XXXX and XXXX codes, respectively).

Figure 1.-The functional domain of p16.7 can form multimers. a) Schematic representation of the organization of protein p16.7 (130 residues). TM, transmembrane region. b-c) Nuclease treatment of nucleoprotein complexes formed by p16.7Cb. The end-labelled 297 bp right end φ29 DNA fragment was used directly (c) or after heat denaturation (b) in nuclease digestion assays. The labelled probes were preincubated either in the absence or presence of increasing amounts of p16.7Cb. After complex formation, samples were treated with micrococcal nuclease (b) or DNase I (c) as described in Materials and Methods, and the DNA products fractionated through polyacrylamide gels under denaturing conditions. The digestion pattern observed in Fig. 1b is the consequence of preferential cleavage by micrococcal nuclease at AT-rich regions (11,39). Digestion of the ssDNA fragment without protein during a shorter period of time or using lower micrococcal nuclease concentrations gave patterns highly similar to those obtained in the presence of limited amounts of p16.7Cb. Protein p16.7Cb concentrations used in (b) were 5, 10, 20, 40 and 80 µM; those in (c) were 5, 10 and 20 µM. d) Protein p16.7Cb was cross-linked in vitro in the presence of dsDNA using the cross-linking agent DSS. After cross-linking, the sample was subjected to SDS-PAGE and Western blot analyses using polyclonal antibodies against p16.7.

The average maximum violation for the upper distance constraints was 0.27 Å (average number 4.1). For the r.m.s. deviations between pairs of structures, the two values correspond to superimposition of the monomeric (residues 68 to 125) and dimeric (residues 68a to 125a and 68b to 125b) structures.
a) Schematic representation of the main interprotein contacts between p16.7C dimers deduced from the analysis of 2D-NMR spectra at high protein concentration. b) NMR-derived model for the p16.7C tetramer. An ensemble of 15 structures is shown. These data show that 30 experimental constraints are enough to define the relative orientation of the two p16.7C dimers within the supramolecular complex.