Structural analysis of Bacillus subtilis SPP1 phage helicase loader protein G39P.

The Bacillus subtilis SPP1 phage-encoded protein G39P is a loader and inhibitor of the phage G40P replicative helicase involved in the initiation of DNA replication. We have carried out a full x-ray crystallographic and preliminary NMR analysis of G39P and functional studies of the protein, including assays for helicase binding by a number of truncated mutant forms, in an effort to improve our understanding of how it both interacts with the helicase and with the phage replisome organizer, G38P. Our structural analyses reveal that G39P has a completely unexpected bipartite structure comprising a folded N-terminal domain and an essentially unfolded C-terminal domain. Although G39P has been shown to bind its G40P target with a 6:6 stoichiometry, our crystal structure and other biophysical characterization data reveal that the protein probably exists predominantly as a monomer in solution. The G39P protein is proteolytically sensitive, and our binding assays show that the C-terminal domain is essential for helicase interaction and that removal of just the 14 C-terminal residues abolishes interaction with the helicase in vitro. We propose a number of possible scenarios in which the flexibility of the C-terminal domain of G39P and its proteolytic sensitivity may have important roles for the function of G39P in vivo that are consistent with other data on SPP1 phage DNA replication.

The initiation of DNA replication is a key step in the life cycle of all cells, and as such its careful and precise control is essential. Studies of prokaryotic systems, centered mainly on Escherichia coli and its extrachromosomal elements, have identified the following three key stages involved in DNA replication initiation: first, the recognition of the DNA replication origin and initial melting of the DNA strands; second, the recruitment of the replication machinery to the origin; and third, the remodeling of a replication complex to trigger the transition from a stable origin-bound complex to a mobile replication machine (1).
Initiation of DNA replication in E. coli proceeds via a sequence of events involving a replicon-specific recognition of oriC by DnaA and the loading of the replicative helicase DnaB by DnaC following the local melting of the DNA in an A + T-rich region (1-3). Initiation of θ-type DNA replication in many different extrachromosomal elements follows a similar central scheme but can differ in the requirements for host-encoded components and their remodeling (see Ref. 3). The best characterized example of initiation of DNA replication within a Gram-positive bacterial environment is that of the Bacillus subtilis phage SPP1. Initiation of SPP1 replication requires the phage-encoded products of genes 38, 39, and 40 (G38P, G39P, and G40P) in addition to the host DNA polymerase III and DnaG primase (4). G38P (a monomer with a predicted molecular mass of 29,997 Da) acts as a close functional equivalent to DnaA (although the proteins share no sequence similarity) and is the replisome organizer of the SPP1 system. The G38P protein specifically interacts with its cognate site present in multiple copies at the phage replication origins (oriL and oriR). This interaction occurs in the absence of ATP and is thought to induce the local unwinding of the adjacent A + T-rich sequence present within oriL to initiate θ-type DNA replication (5-7). G40P is a DnaB-like helicase and as such is a ring-shaped hexamer, capable of unwinding duplex DNA with a 5′ to 3′ polarity in a reaction fueled by nucleotide 5′-triphosphate hydrolysis (6,8). G39P is predominantly a monomer (molecular mass 14,610 Da) when free in solution and forms a specific interaction with ATP-activated G40P that inactivates the ssDNA binding, ATPase, and unwinding activities of the helicase. Targeting of G40P by G39P to the G38P-bound oriL then functions to activate G40P upon delivery (6).
It is believed that G40P, in the form of the G39P-G40P-ATP complex, is delivered to G38P-bound oriL via the specific protein-protein interaction of the helicase-bound G39P with the G38P bound at oriL. These interactions result in the formation of an unstable nucleoprotein oriL-G38P·G39P·G40P-ATP intermediate, with subsequent release of G38P/G39P heterodimers that leaves the ATP-activated G40P complex to bind the melted origin region (6). Uncomplexed G38P remains bound to oriL, and the G40P helicase is free to interact with DnaG and the τ subunit of both DNA polymerases and begin DNA unwinding (6,9,10). The action of G39P protein is similar to that of the bacteriophage λ gene P helicase-loading protein, but λ P requires an elaborate remodeling for freeing the helicase from the helicase/helicase-loader complex (1,3). Furthermore, the action of G39P protein is quite distinct from that of the bacteriophage T4 gene 59 helicase-loading protein, whose functions combine those of G38P and G39P and which also has a role in both replication and recombination (11,12).
As the G39P protein exists in a variety of oligomeric forms including monomers, hetero-oligomers with G40P (probably in a 6:6 ratio of G39P:G40P-ATP), and heterodimers with G38P, the protein must possess a fold capable of allowing it to form a variety of specific interactions depending upon its local environment and the exact stage of the DNA replication process (4,6). We have carried out a structural investigation of G39P to get a better understanding of the way in which G39P can interact specifically with both G38P and G40P and thus act as a key component of the system. Here we present the first crystal structure and a preliminary NMR analysis of G39P alongside functional analyses of deletion mutants. It is the first structure of a prokaryotic helicase loader protein involved in θ-type DNA replication that also functions as an inhibitor of helicase function.

EXPERIMENTAL PROCEDURES
Crystal Structure Determination-The details of wt G39P and G39P112 variant constructs, their overexpression, purification, and crystallization are described elsewhere. Following purification, the wt protein was crystallized from ammonium phosphate, and preliminary x-ray diffraction data were collected at room temperature on home laboratory x-ray sources. These data suggested that the crystals belong to space group P6₁22 (or P6₅22) with cell dimensions a = b = 105.3 Å, c = 47.4 Å, and diffract to ~3.5 Å. However, the reproducibility of these crystals was very poor, and an analysis by mass spectrometry of freshly purified protein and the crystals themselves revealed multiple fragments derived from proteolytic cleavage at the C terminus of the protein. The proteolysis terminated markedly after the final 14 residues of G39P had been removed, and thus gene 39 was engineered to produce a mutant construct that coded for a truncated protein comprising only the first 112 of 126 residues of G39P. The G39P112 protein and its selenomethionine-incorporated form were prepared in a similar manner to wild type protein but crystallized from ammonium sulfate in space group P2₁2₁2₁ with cell dimensions a = 85.6 Å, b = 89.7 Å, c = 47.6 Å. The crystals have three monomers in the asymmetric unit and an assumed solvent content of 47% based on a V_M (Matthews coefficient) of 2.3 Å³ Da⁻¹. The SeMet G39P112 crystals were used in a subsequent multiwavelength anomalous dispersion experiment in which data were collected using a Mar Research 345 imaging plate scanner at the European Synchrotron Radiation Facility on station BM30 using inverse beam geometry to collect Friedel pairs. The data for each wavelength were processed individually and scaled in such a way as to preserve anomalous signal using the HKL Suite of programs (13). The data processing statistics are shown in Table I. The positions of six selenium atoms were found using the program SOLVE (14).
These positions were then refined, and initial phases were calculated in the program ML-PHARE (15) following the pseudo-MIR procedure (16). Phasing statistics are shown in Table I.
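The Matthews coefficient and solvent content quoted for the P2₁2₁2₁ crystal form can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' calculation: it assumes four asymmetric units per cell (the number of general positions in P2₁2₁2₁), approximates the G39P112 mass by scaling the wt mass of 14,610 Da by chain length (an assumption, since the truncated mass is not stated in the text), and uses the common approximation that the solvent fraction is 1 - 1.23/V_M.

```python
# Unit-cell edges (in Angstrom) of the orthorhombic G39P112 crystal form, from the text.
A, B, C = 85.6, 89.7, 47.6
ASU_PER_CELL = 4        # general positions in space group P2(1)2(1)2(1)
MONOMERS_PER_ASU = 3    # from the text
WT_MASS_DA = 14610.0    # wt G39P, 126 residues (from the text)

# ASSUMPTION: the mass of the 112-residue variant is estimated by scaling
# the wt mass by chain length; the exact value is not given in the text.
G39P112_MASS_DA = WT_MASS_DA * 112 / 126


def matthews_coefficient(a, b, c, n_molecules, mass_da):
    """Return V_M (A^3 per Da) for a cell whose angles are all 90 degrees."""
    return (a * b * c) / (n_molecules * mass_da)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M using 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


v_m = matthews_coefficient(A, B, C, ASU_PER_CELL * MONOMERS_PER_ASU, G39P112_MASS_DA)
solvent = solvent_fraction(v_m)
print(f"V_M = {v_m:.2f} A^3/Da, solvent fraction = {solvent:.2f}")
```

Under these assumptions the result lands close to the quoted 2.3 Å³ Da⁻¹ and 47%; the small difference reflects the estimated mass and rounding.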
An electron density map was calculated at 3.0 Å and subsequently improved by solvent flattening and histogram matching with the program DM (17). This map was of good quality with readily identifiable regions of secondary structure. Following a preliminary trace of the secondary structure, non-crystallographic symmetry operators were determined for the three monomers found in the asymmetric unit, and the map was averaged and phase-extended to 2.4 Å using DM. The model fitted to the resultant map was submitted to refinement using the program REFMAC (18). Iterative cycles of phase combination (of the partial structure phases with those from the multiwavelength anomalous dispersion experiment), model building, and refinement were used to construct a model with good stereochemistry that accounted for residues 1-67 in each subunit. In the latter stages, refinement was performed using individual isotropic B-factors, translation, libration, and screw tensor parameterization (19), and loose non-crystallographic symmetry restraints. Maps that had neither been solvent-flattened nor had non-crystallographic symmetry operators applied were examined to check for possible errors in the assignment of solvent boundary, accidental flattening of protein density, or use of inappropriate restraints, but there was no indication that this had happened. The positions of the six selenium atoms correlated with the locations of the methionine residues in the N-terminal portions of each monomer. The refinement statistics are presented in Table I.
NMR Analysis-Protein samples at concentrations of 1-2 mM in 20 mM phosphate buffer, pH 6.5, and temperatures ranging from 25 to 55 °C were used. Both one- and two-dimensional ¹H-¹H experiments were recorded as described previously (20), using a Bruker DRX 500 spectrometer. Data were processed using FELIX (Molecular Simulations Inc.).
Gel Filtration and Analytical Ultracentrifugation Analyses-The G39P protein used in the gel filtration experiments was prepared and analyzed as described elsewhere (6).
In the analytical ultracentrifugation sedimentation velocity analysis, 0.42-ml samples of protein at 1 mg ml⁻¹ were centrifuged in 1.20-cm path length, two-sector aluminum centerpiece cells with sapphire windows in a four-place An-60 Ti analytical rotor running in a Beckman Optima XL-I analytical ultracentrifuge at 50,000 rpm at 16 °C. Changes in solute concentration were detected by Rayleigh interference and 280 nm absorbance scans. Results were analyzed by g(s*) analysis (21) using the program DCDT+ version 1.13 (22).
Limited Proteolysis Assays-For proteolytic studies, pure wtG39P or N-terminal His-tagged variants (6 μM) were prepared in phosphate buffer (NaH₂PO₄/Na₂HPO₄, pH 7.5, 0.5 mM dithiothreitol, 5% glycerol) containing 50 mM NaCl and 1 mM phenylmethylsulfonyl fluoride and then incubated with proteinase K (62 ng/reaction) for increasing time intervals (0.5, 1, 2, 5, and 10 min) at 37 °C. An aliquot of the mixture was then removed, and the reaction was stopped by addition of stop buffer (50 mM Tris-HCl, pH 7.5, 400 mM glycine, 3% 2-mercaptoethanol, 2% SDS, 10% glycerol), before the products were loaded onto a 15% SDS-PAGE gel. The signal was quantified using a PhosphorImager. The 1-min proteinase K incubation reaction mixture was dialyzed against water and subjected to matrix-assisted laser desorption ionization/time of flight mass spectrometry. The N-terminal His-tagged G39P variant was incubated with proteinase K (62 ng/reaction) for 1 min at 37 °C. The reaction mixture was loaded onto a Ni-NTA column, and the column was washed in phosphate buffer containing 5 mM imidazole before elution with phosphate buffer containing 250 mM imidazole and subsequent analysis of the eluate on a 15% SDS-PAGE gel.
Deletion Mutant Assays-The B. subtilis SPP1 wt phage was routinely propagated in B. subtilis strain YB886 (sup⁰) and the conditional lethal mutants SPP1sus53 and SPP1sus22 in BG295 (sup3) strain. Phage stocks had titers of 1.0-5.0 × 10¹⁰ plaque-forming units/ml when plated under permissive conditions. Reversion frequencies were not higher than 10⁻⁵. SPP1 wt, SPP1sus53, and SPP1sus22 were used to infect B. subtilis YB886 cells bearing plasmid-borne gene 39 or gene 39-112, and manipulations followed the standard procedures described for SPP1 (23).
For the affinity chromatography assay, gene 39 mutants were constructed that encoded for His-tagged, truncated variants of the wt protein.
In separate experiments, each protein variant was loaded onto a Ni-NTA-agarose column (2 μg of protein per 20 μl of matrix). G40P (2 μg) was then loaded onto the column, and the binding ability of the G39P variant was assessed by elution using imidazole (250 mM) followed by SDS-PAGE analysis.

RESULTS
Crystal Structure of G39P-The x-ray crystallographic analysis of G39P has revealed a completely unexpected bipartite structure for the protein that is made even more striking given its comparatively small size (126 residues in the wt protein). In the final model fitted to a map at 2.4-Å resolution, residues 1-67 for each of the three copies of the protein in the asymmetric unit (a.u.) were present, and there was a total of 41 solvent molecules. There was no interpretable electron density for residues 68-112 at the C terminus of each subunit, and the N-terminal domain was sufficient to make all the necessary crystal packing contacts. The final model R-factor is 0.20 with a corresponding value for R_free of 0.23, which strongly supports the proposal that all of the ordered scattering matter has been reasonably modeled at this resolution.
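For readers less familiar with the statistic, the R-factor quoted above is the conventional residual R = Σ||F_obs| - |F_calc|| / Σ|F_obs|, and R_free is the same quantity evaluated over reflections held out of refinement. A minimal sketch of the definition follows; the amplitudes are hypothetical toy values, not data from this study, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fo| - |Fc| |) / sum(|Fo|) over paired amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# HYPOTHETICAL toy amplitudes, for illustration only (not measured data).
toy_obs = [100.0, 50.0, 25.0, 25.0]
toy_calc = [90.0, 55.0, 30.0, 20.0]

# Numerator: 10 + 5 + 5 + 5 = 25; denominator: 200.
toy_r = r_factor(toy_obs, toy_calc)
print(f"R = {toy_r:.3f}")
```

Applying the same function to the working set and to the held-out test set would give R and R_free, respectively; a small gap between the two, as reported here, is usually read as a sign that the model has not been overfitted.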
Thus the non-complexed G39P protein in vitro would appear to consist of two distinct domains as follows: a fully folded 67-residue N-terminal domain, and a C-terminal domain that has only limited fold (see NMR analysis below). Each of the three copies of the G39P monomer in the a.u. has essentially the same fold for the N-terminal domain, which is composed of four helices. There are three α-helices as follows: αA (residues 3-16); αB (residues 26-39); αC (residues 42-55), plus there is a very short 3₁₀ helix D (residues 62-65) comprising little more than one turn. The helices can be described as two approximately parallel pairs (αA/αC and αB/D) that cross at an angle of about 70° (Fig. 1). The structure of the bacteriophage T4 gene 59 helicase-loading protein also has two α-helical domains, but its N-terminal domain shows a strong structural similarity to the high mobility group family proteins (11), which is not seen in the N-terminal domain of G39P.
The G39P112 protein crystallized with three independent monomers in the a.u., but they adopted an arrangement around a non-crystallographic 6₁ screw axis parallel to the crystallographic c axis (Fig. 2). When viewed along the direction of the c axis, the disordered C-terminal domains are located on the exterior of the helical arrays of monomers formed by the screw axes. The cavities in the crystal lattice are clearly sufficient to accommodate the C-terminal domains of the G39P112 variant, as supported by mass spectrometric analysis of the crystal used in the structure determination that confirmed the presence of intact variant (data not shown). However, it would seem that the extra bulk provided by the 14 C-terminal residues of the intact wt protein necessitates an alternative crystal packing arrangement that appears to be less stable as judged by the noticeably poorer crystal reproducibility and diffraction quality. The inter-monomer contacts are made predominantly between residues immediately following helices αA and αC and those preceding helices αB and D. The residues involved are both polar and hydrophobic, and the interface includes two completely buried water molecules. Pairwise superposition of the α-carbon positions of each of the monomers gives root mean square deviation values of 0.2 to 0.3 Å. Approximately 1200 Å² or 30% of the surface area of each monomer is buried in the interfaces with other monomers in the non-crystallographic 6₁ helix and a further ~10% on average in contacts with other subunits in the crystal lattice. The missing polypeptide chain in the electron density map, extending from residue Lys-67, is directed toward a large cavity in the crystalline lattice (Fig. 2) where it adopts a flexible, mainly unfolded state (see NMR analysis below). Thus at least in the crystal lattice, the G39P112 protein appears to exist as a monomer.
Calculations of the electrostatic surface potential of the folded N-terminal domain of G39P reveal a somewhat negatively charged surface overall. However, there is a notable localized, highly negative patch on the surface formed by residues at the N terminus of helix αA and the N terminus and loop preceding helix αC that also lies adjacent to the last observed residue in the map, Lys-67. The distribution of charge is even more striking when one examines the helical packing of the G39P monomers in the crystal, which reveals the helical array to have a predominantly negatively charged outer surface with the uncharged or positive surface mostly buried in inter-monomer contacts or close to the helical axis (Fig. 2). The unobserved, flexible C-terminal domain may modify the apparent surface charge, but the calculated pI is 4.9 for the C-terminal 59 residues of G39P (the calculated pI for the N-terminal 67 residues is 5.2) and thus might suggest a generally negatively charged surface. Apart from the slight imbalance in positively and negatively charged residues that leads to the acidic pI, the C-terminal domain of G39P does not show a particularly abnormal distribution of residue type.
Molecular replacement attempts to determine the structure of the wt G39P protein using the lower resolution data collected to 3.5 Å from the seemingly related P6₁22/P6₅22 crystal form are ongoing but have so far been unsuccessful.
Analysis of Internal Mobility-In order to determine to what extent the disorder observed for the C-terminal half of the molecule reflected conformational heterogeneity in solution rather than disorder within the crystal, the ¹H NMR behavior of G39P was investigated. One- and two-dimensional ¹H-¹H TOCSY experiments were recorded on samples of both wt G39P and G39P112 mutant protein forms (Fig. 3). Spectra recorded at room temperature for both forms of the protein revealed two domains with very different degrees of motion, as revealed by differential NMR relaxation rates. For the C-terminal ~50 residues, i.e. about half the size of G39P, NMR relaxation rates are slow and are thus dominated by internal mobility far in excess of the overall rotation of the protein. Resonances from these residues show little chemical shift dispersion away from their random coil values, indicative of conformational averaging, and high intensity cross-peaks in TOCSY spectra, as illustrated by the correlations between the aromatic ring protons of Phe-76 and Tyr-80 in Fig. 3B. The intensity of the primary amide cross-peaks of glutamines 84, 90, 104, and 107 and asparagines 99 and 110 is also apparent in Fig. 3B. Backbone amide proton resonances from the mobile C-terminal domain, upon heating the sample, are severely attenuated in intensity by solvent exchange, following saturation of the water resonance (Fig. 3A). This demonstrates that there is weak or no hydrogen bonding involving this part of the protein backbone other than to solvent molecules.
The resonances from the N-terminal residues, at room temperature, have far lower intensity than would be expected for a protein of around 13-15 kDa, indicative of a self-association process under the conditions of the NMR experiments. Upon heating, resonances from this region sharpen markedly, indicating the thermal dissociation of the aggregate (Fig. 3A). In contrast to the resonances corresponding to the mobile region, these resonances display the chemical shift dispersion of a normally folded domain and include, for example, the easily identifiable indole NH of Trp-33 located at the N terminus of helix αB and found within the hydrophobic core of the crystal structure of the N-terminal domain.
The NMR experiments indicate that in solution the protein behaves as a two-domain entity. One domain, corresponding to approximately the 67 N-terminal residues observed in the electron density map from the x-ray experiment, exists in the fold determined above, although at room temperature it is involved in a thermally labile, self-association process. The second domain, corresponding to the remaining 60 C-terminal residues that are not observed in the electron density map, has rapid internal motion and no well defined and stable fold involving immobilized side chains. A detailed comparison of the spectra from the intact wt and truncated forms of G39P revealed no major difference between the two forms. Our findings strongly support the idea that the disorder observed in the crystal structure for the C-terminal region was not a result of the truncation of the protein nor was it merely some form of crystal artifact but reflected an underlying flexibility that may be closely related to the function of the protein.
Analysis of G39P Oligomeric State-In order to examine further the apparent difference between the oligomeric state of G39P as observed in the crystal and that reported previously in solution (6), the wt protein and the G39P112 mutant were subjected both to gel filtration and analytical ultracentrifugation analyses.
The gel filtration studies suggested that under the conditions tested (40 mM Tris-HCl, pH 8.0, containing 100 mM NaCl at 4 and 25 °C) both the wt G39P and the G39P112 mutant apparently exist largely as a dimer when in the micromolar concentration range (Fig. 4) and as an equilibrium between dimer and monomer when in the nanomolar concentration range (data not shown). However, the NMR data above reveal a rapid equilibrium in solution between dissociated and aggregated states of the protein and emphasize the need for a more cautious interpretation of the results of gel filtration experiments, which are necessarily carried out over much longer time scales. The observation of a species approximating to the size of a dimer might actually arise from the rapid interchange between the monomeric and aggregated forms of the protein and is further complicated by an increase in the hydrodynamic radius arising from the flexible C-terminal domain. Gel filtration of G39P samples subjected to protein cross-linking in the micromolar concentration range apparently revealed monomers, dimers, trimers, and higher order oligomers, but interpretation of these results carries the same caveats as described for the non-cross-linked gel filtration analysis with respect to the rapid equilibration between the aggregated states of the protein and its hydrodynamic radius.
In the analytical ultracentrifugation experiments, samples of the wt G39P and G39P112 mutant proteins were subjected to sedimentation velocity measurements under similar solvent conditions to those used in the gel filtration experiment but at both acidic and basic pH values (10 mM BisTris-HCl, pH 6.0, or Tris-HCl, pH 8.0, 160 mM KCl) and a protein concentration in the micromolar range (Fig. 5). The average mass calculated from both absorbance and interference scans of the solute front was ~12.0 kDa for wt G39P and 12.4 kDa for G39P112, which fits well with a monomer form. However, the plots of the rate of change of the concentration (dc/dt) versus the sedimentation coefficient S look slightly irregular for the wt G39P sample at pH 8.0, and a better fit can be made to a mixture of monomer and dimer. It is therefore reasonable to conclude that there is an equilibrium between monomer and higher order oligomer forms that favors the monomeric species under these conditions.
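The reasoning that the sedimentation-derived masses "fit well with a monomer form" can be made explicit by comparing each observed average mass with integer multiples of the monomer mass. The sketch below is illustrative only: the wt monomer mass (14,610 Da) is taken from the text, whereas the G39P112 monomer mass is an assumption obtained by scaling the wt mass by chain length, and the helper name `nearest_oligomer` is hypothetical.

```python
WT_MONOMER_KDA = 14.610  # wt G39P monomer mass, from the text

# ASSUMPTION: mass of the 112-residue variant estimated by length scaling.
G39P112_MONOMER_KDA = WT_MONOMER_KDA * 112 / 126


def nearest_oligomer(observed_kda, monomer_kda, max_n=6):
    """Return the subunit count n (1..max_n) whose mass n * monomer is closest."""
    return min(range(1, max_n + 1),
               key=lambda n: abs(observed_kda - n * monomer_kda))


# Average masses reported from the sedimentation velocity analysis.
wt_state = nearest_oligomer(12.0, WT_MONOMER_KDA)
truncated_state = nearest_oligomer(12.4, G39P112_MONOMER_KDA)
print(wt_state, truncated_state)
```

A nearest-integer comparison of this kind ignores mixtures, so it cannot capture the monomer-dimer equilibrium suggested by the irregular dc/dt profile at pH 8.0; it only shows that the average mass is far closer to one subunit than to two.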
Limited Proteolysis Study of G39P-An investigation of the general susceptibility of G39P to proteolytic degradation was performed using proteinase K and both wtG39P and an N-terminal His-tagged variant. The analysis revealed a much more pronounced sensitivity to proteolytic cleavage in the C-terminal half of the protein that resulted in fragments corresponding to residues 1-79, 1-87, 1-90, 1-94, and 1-106 (Fig. 6). The identity of the fragments was confirmed by the correspondence of the molecular weights determined by mass spectrometry and by retention on a Ni-NTA column of equivalent fragments (as assessed by SDS-PAGE) from the N-terminal His-tagged variant.
Genetic and Biochemical Analysis of Deletion Mutants-A series of SPP1 gene 39 mutants was studied in complementation assays, and their expressed protein products corresponding to fragments of G39P were examined in vitro to test their ability to interact with G40P-ATPγS and inhibit its helicase, ATPase, or ssDNA binding activities (6). The results of these studies can be analyzed in the light of the domain structure of G39P revealed by this study.
The gene 39 mutant, G39112, encoding the truncated protein corresponding to residues 1-112, was established in a plasmid-borne system, and this was used to test its ability to complement the defect of the SPP1sus53 conditional lethal mutant. The SPP1sus53 mutant allele has a suppressible mutation at the eighth codon of gene 39, and although a plasmid-borne wt 39 gene fully complements the defect of SPP1sus53, leading to a phage yield indistinguishable from the titer of wt SPP1, the plasmid-borne gene 39-112 mutant is unable to do so. This is consistent with the fact that an SPP1 conditional lethal mutant with a suppressible mutation at codon 103 of gene 39 (SPP1sus22) has been isolated previously (4) and suggests that G39P112 is inactive as a loader and/or inhibitor of the G40P helicase in vivo.
The full-length wt G39P protein is able to interact with G40P-ATPγS and to inhibit all three associated activities (ssDNA binding, ATPase, and helicase activity (6)). Assays have been performed on fragments of G39P and have shown that the G39P112 variant can neither exert a negative effect on G40P activities nor compete out the wt protein from the G39P-G40P-ATP complex (data not shown). A series of G39P truncated variants with N termini deleted up to residue 73 still show interaction with G40P-ATP as assessed by affinity chromatography assay, but a variant consisting of the N-terminal residues 1-68 does not (Fig. 7).

DISCUSSION
Our structural analysis has shown that the G39P protein has two domains: a stably folded 67-residue N-terminal domain and a highly flexible and largely unfolded 59-residue C-terminal domain. This correlates well with our biochemical observations that suggest a bipartite nature for G39P in which the C-terminal domain has been implicated as the fragment of the protein responsible for the interaction with the G40P helicase. The difference in the folding behavior of the two domains of the protein is striking and unexpected and may also indicate some functional significance.
Multifunctional, multidomain proteins are common in biology, but examples as small as the G39P protein are rarer. There is an increasing body of evidence to suggest that "natively unfolded" proteins are quite common in vitro and possibly in vivo (24) and that some can adopt more structured forms only in the presence of partner or target molecules or other ligands (25). Many of these "natively unfolded" proteins have been implicated in disease states such as various forms of cancer, Alzheimer's and Parkinson's diseases, and myotonic dystrophy (24), and their unfolded nature has often been linked to their pathological effects. The presence of ordered domains coupled to other largely unfolded domains, as observed for the N- and C-terminal domains in G39P, also has a precedent in other structural studies, and in many of these cases a functional significance has been assigned to the flexibility of the domains (26-28). Studies on the Salmonella typhimurium regulatory protein FlgM suggest that this protein is intrinsically unstructured when in dilute solution in vitro (26). However, the C-terminal domain of FlgM is observed to adopt a more structured form either when its partner molecule, the RNA polymerase σ-factor σ²⁸, is added (26), when it is in vivo, or when in vitro conditions are adjusted to match more closely those found in living cells (29). Thus, it is possible that the C-terminal domain of G39P may possess more structure under in vivo conditions, even in the absence of partner molecules, than we have observed in our experiments. However, it might also be the case that the C-terminal domain of G39P is inherently flexible and has little structure when not in a complex to enable the optimal interaction of G39P with its G40P helicase partner.
This interaction has an apparent 6:6 G39P:G40P stoichiometry (6), and the flexibility may be essential for the correct formation of a hexameric arrangement of G39P on the surface of G40P that is still accessible for interaction with origin bound G38P. Indeed G39P may need to bind potentially to a number of monomer forms of the G40P within the hexamer as these may vary depending upon either the relative conformations of the helicase subunits or the state of loading of ATP nucleotide or its hydrolyzed products as observed for the T7 gene 4 helicase (30). Use of a largely unfolded state to bind a variety of targets has been observed previously (31), and the cyclin-dependent kinase inhibitor, p21, has a completely unfolded native state that is suggested to enhance its ability to bind multiple protein targets. Another possible reason for maintaining a flexible C-terminal domain in G39P could be to enable inactivation of the protein by rapid proteolytic degradation (25). We have observed that removal of just the C-terminal 14 residues impairs G40P binding. Within the cell, random unregulated DNA binding and unwinding by G40P-ATP would be deleterious, and hence some control and targeting of its function is required through the combined action of G38P and G39P to ensure loading at the origin of replication. However, once replication has commenced and the replication machinery moved on from the origin, problems can arise from DNA damage, and the replication fork can stall with release of the replicative machinery and the requirement for subsequent reloading of these components after damage repair. Recently, it has been shown that the loading of G40P at any stalled replication fork by the SPP1 phage-encoded G35P protein can lead to replication fork reactivation (32). At this point, binding of G40P by G39P could be harmful, and thus the levels of G39P might need to be kept low either through its interaction with other factors such as G38P or by its degradation. 
Indeed, G39P accumulates very fast after phage infection, reaches a plateau at minute 5, remains constant up to minute 18, and returns to initial basal levels after minute 20, whereas G40P accumulates with similar kinetics to those of G39P but its levels remain constant until phage lysis (33). The reason for the apparently abrupt stop in the proteolysis of G39P after the removal of the 14 C-terminal residues under the experimental conditions used prior to crystallization is still under investigation, as there is no obvious protease target site at this point. Our proteolytic degradation experiments reveal that the extent of protease sensitivity corresponds well with a more structured N-terminal domain and a less structured C-terminal domain, although the presence of substantial amounts of discrete fragments during the initial stages of proteolysis may argue for some limited structure in the C-terminal domain. Recent studies (34) on the structure of the E. coli protein DnaC that loads the replicative helicase DnaB also suggest that it is unusually flexible when free in solution. This prompts the proposal that a high level of structural flexibility might be a recurring theme in domains of loader proteins involved in θ-type replication that interact with replicative helicases.
The flexibility of the C-terminal domain may also be intrinsic to the ability of the protein to bind G38P and act as a linker in the transfer of the G40P helicase onto its ssDNA target. Upon transfer of the G40P-ATP to the DNA, the G39P dissociates as a heterodimer with G38P (6). The predicted pI of G38P is 9.0, and hence it is likely to have a positive electrostatic surface consistent with its DNA binding function, but this feature might also be important in the interaction with G39P given its overall negative surface charge distribution (see Fig. 2).
Our analyses of the oligomeric state of G39P when free in solution suggest that it is most likely in a monomeric form at the sub-micromolar concentrations found in vivo but that it can form higher order species or aggregates as the local concentration increases. The crystal structure implies that the oligomerization of the monomers probably does not proceed via the initial formation of 2-fold rotationally symmetric dimers but rather the gradual building up of larger species via growing chains of monomers that have 6-fold symmetry potential. Indeed the primary function of the N-terminal domain of G39P may be oligomerization for presentation of the C-terminal domain to partner proteins, but we are currently investigating further possible roles in the interaction with G38P.
Thus, this first crystal structure for a helicase loader/inhibitor protein involved in θ-type DNA replication has revealed an unexpected, highly plastic, bipartite structure that has developed to fulfill multiple interaction functions and to ensure the critical loading of the replicative DNA helicase on the DNA origin of replication.