Understanding DNA replication by the bacteriophage T4 replisome

The T4 replisome has provided a unique opportunity to investigate the intricacies of DNA replication. We present a comprehensive review of this system focusing on the following: its 8-protein composition, their individual and synergistic activities, and assembly in vitro and in vivo into a replisome capable of coordinated leading/lagging strand DNA synthesis. We conclude with a brief comparison with other replisomes with emphasis on how coordinated DNA replication is achieved.

Enterobacteria phage T4 infects Escherichia coli bacteria. Its genome is 170 kb long (1) and encodes 289 proteins. During infection, the DNA genome, packaged within the icosahedral head of the virus, passes through the hollow tail into the bacterial cell. The rate of DNA replication in the cell is 400-700 nucleotides s⁻¹ (2) with a mutation rate per base pair of only 7 × 10⁻⁸ (3). This efficient and highly accurate replication system is the subject of this review.
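To give a feel for these numbers, the following minimal sketch converts them into a copying time and an expected error count per genome. It assumes, purely for illustration, that a single fork copies the whole genome end to end and that the quoted mutation rate applies per base pair per round of replication; the function and variable names are ours.

```python
# Back-of-the-envelope figures from the numbers quoted above.
GENOME_BP = 170_000             # T4 genome length (170 kb)
RATE_NT_PER_S = (400.0, 700.0)  # in vivo replication rate range
MUTATION_RATE_PER_BP = 7e-8     # mutation rate per base pair

def single_fork_time_s(genome_bp: int, rate_nt_per_s: float) -> float:
    """Seconds for one fork to copy the genome (assumption: a single fork)."""
    return genome_bp / rate_nt_per_s

slowest_s = single_fork_time_s(GENOME_BP, RATE_NT_PER_S[0])  # 425 s
fastest_s = single_fork_time_s(GENOME_BP, RATE_NT_PER_S[1])  # about 243 s

# Assumption: the rate is per base pair per round of replication.
expected_mutations = GENOME_BP * MUTATION_RATE_PER_BP        # about 0.012

print(f"single-fork copying time: {fastest_s:.0f}-{slowest_s:.0f} s")
print(f"expected mutations per genome copy: {expected_mutations:.4f}")
```

Under these assumptions a single fork would need roughly 4-7 min per genome, with on the order of one mutation per hundred genome copies.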
In vitro reconstitution of a T4 replication system capable of leading- and lagging-strand synthesis on a duplex DNA substrate requires a minimum of seven proteins: the DNA polymerase (gp43); the ssDNA-binding protein (gp32); the clamp loader (gp44/62); the clamp (gp45); the helicase (gp41); and the primase (gp61). The helicase loader (gp59) accelerates the reconstitution of the replication system but is not essential once the system is assembled. The numbers in parentheses are the T4 gene designations. The brilliant biochemistry of the Alberts group and, in parallel, the Nossal laboratory established the basis for functional and structural characterization of the system (4-7). An attractive reason for studying the T4 system is the strong similarity between it and the less accessible eukaryotic replication complexes (8).

Polymerase (gp43)
A first step in understanding the contributions of individual proteins to the functional properties of the complex is the elucidation of their individual properties. Kinetic schemes for the 5′-3′ polymerase and 3′-5′ exonuclease activities of gp43 were determined by pre-steady-state kinetic methods and fit by computer simulation (9). The minimal kinetic scheme for the action of gp43 on a model duplex is depicted in Fig. 1.
Incorporation of a single correct base follows a minimal five-step kinetic sequence with an observed rate constant for single nucleotide incorporation of >400 s⁻¹, which is assigned to the chemical step and is close to the rate observed in the cell. Thus, the accessory proteins do not increase the rate of the polymerization reaction per se. The dissociation rate of gp43 from the duplex sets the observed steady-state velocity. The polymerization process is further distinguished by a high degree of dNTP-binding discrimination (up to 300-fold) in the ternary reactive complex.
gp43 exhibits an active 3′-5′ exonuclease cleavage rate of 100 s⁻¹. Note that a kinetic barrier protects a properly base-paired 3′ terminus from excision, as illustrated by a partitioning between the polymerase site and the exonuclease site biased 23:1 in favor of the polymerase site for correct base pairing. In the case of a mismatched 3′ terminus, this partitioning drops to 5:1, markedly favoring excision. The distance between the two active sites has been estimated as 2-3 nucleotides (10). The 2.8-Å resolution structure of the bacteriophage RB69 gp43 (63% identical in amino acid sequence to its homolog from T4 phage) as well as a 2.6-Å structure with primer/template and nucleotide bound are central for examining protein-protein interactions that maintain the replisome (11). A ribbon representation of the enzyme showing gp43 and its active sites can be found in supplemental Fig. S1. The revealed palm domain contains the three conserved carboxylates (Asp-411, Asp-621, and Asp-623) implicated in catalyzing the nucleotidyl transfer reaction. The conversion of Asp-219 in the exonuclease domain to Ala-219 generates an enzyme that is devoid of exonuclease activity but retains unchanged polymerase activity (12). The distance between the polymerase and exonuclease sites in the crystal structure can be spanned by a 4-base oligonucleotide or a three-nucleotide duplex DNA (13).
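The partitioning ratios can be restated as the fraction of primer termini found at the exonuclease site. The sketch below does this under the assumption that a ratio of r:1 means r termini at the polymerase site for every one at the exonuclease site; the helper name is hypothetical and the calculation is illustrative rather than part of the published kinetic scheme.

```python
def exo_fraction(pol_to_exo_ratio: float) -> float:
    """Fraction of termini at the exonuclease site for a pol:exo ratio of r:1."""
    return 1.0 / (pol_to_exo_ratio + 1.0)

matched = exo_fraction(23.0)     # correct base pair at the 3' terminus, 1/24
mismatched = exo_fraction(5.0)   # mismatched 3' terminus, 1/6

# How much more often a mismatched terminus visits the exonuclease site.
enrichment = mismatched / matched

print(f"matched: {matched:.1%} at the exonuclease site")
print(f"mismatched: {mismatched:.1%} at the exonuclease site")
print(f"enrichment: {enrichment:.1f}-fold")
```

On this reading, a mismatch raises exonuclease-site occupancy from about 4% to about 17%, roughly a 4-fold shift toward excision.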

Holoenzyme (HE)
In the absence of accessory proteins, gp43 has limited ability to extend DNA templates (2). Numerous physical studies and processivity assays found that gp44/62, gp45, and gp43 in the presence of ATP formed an active replication complex capable of extending large primed circular ssDNA templates (M13 or ΦX174) or polyoligonucleotides (14-18). Experiments on end-blocked linear primer/template substrates established that the core HE consisted of a complex of gp45 and gp43 with the gp44/62 acting catalytically to load gp45 but not acting in its unloading (19-21). The molecular basis for the increased processivity of the HE was revealed by the structure of gp45 that, like other processivity factors for the bacterial and eukaryotic DNA polymerases, is a highly symmetrical, three-subunit, ring-shaped structure through which duplex DNA can be threaded (supplemental Fig. S2) (22).
Investigations of the solution structure of gp45 by various physical methods found a cooperative assembly of the monomers into an open complex composed of two closed subunit interfaces with a third subunit interface separated by a distance of 35-38 Å (23). This intriguing finding was further scrutinized by time-resolved fluorescence spectroscopy of gp45 labeled across its three subunit interfaces with a pair of dyes capable of FRET. gp45 was found to exist in two forms in solution with 67% in a closed state and 33% with a gap between two subunits of 42 Å (24). The gap is sufficiently large to permit clamp loading to form a functional replication complex without the gp44/62 (25).
Interactions between gp45 and gp43 were first uncovered by deleting the last six C-terminal amino acids of gp43. This mutant, which retained all the kinetic parameters of the parent, does not form an HE (26). A solution model of the HE bound to DNA (supplemental Fig. S3) (27) was created through studies with the following: (i) a fluorescently labeled peptide based on the gp43 C-terminal residues as well as an analogous peptide cross-linker; (ii) FRET-based stopped-flow measurements (gp45 was labeled in multiple positions on opposite sides of the subunit interface) that tracked the kinetics of HE formation (28,29); and (iii) a crystal structure of the C-terminal peptide bound to gp45 (13). The model shows the C terminus of gp43 inserted into the open subunit interface of gp45.

Assembly/disassembly of the HE
Before discussing how gp44/62 solves the topological problem of opening and closing gp45 on DNA, it is instructive to view the structure of gp44/62. The general organization of the clamp loader consists of one copy of gp62 and four copies of gp44. The architecture of a gp44/62·gp45·DNA complex with gp44/62 bound to an open gp45 has been solved by X-ray crystallography (supplemental Fig. S4) (30).
Germane to integrating this structure with function is the proposed reaction cycle for loading gp45 by gp44/62 followed by the latter's dissociation from the DNA. In the presence of ATP bound to the four subunits of gp44, gp44/62 binds, opens gp45, and loads gp45 onto the duplex (Fig. 2, steps 1-4). ATP hydrolysis is associated with both loading and departure of gp44/62 but not with gp43 binding (31). Hydrolysis of ATP promotes closure of gp45 and ejection of gp44/62 (Fig. 2, steps 4-10). All of these steps are associated with large conformational changes in the four gp44 subunits (supplemental Fig. S5) (30,32). The stoichiometry of ATP hydrolysis by gp44/62 has been measured by pulse-chase kinetics; however, the numbers vary from 2 to 4 ATPs per turnover cycle, apparently because of differences in the quenching procedures (33-35).
Besides loading gp45 onto duplex DNA, gp44/62 acts as a chaperone to escort gp43 to its binding site on the DNA duplex, consistent with the finding that gp43 binds to the same face of gp45 as gp44/62 (36). Moreover, the formation of the HE can occur through one of four pathways as illustrated in supplemental Fig. S6 (37,38).
The dissociation of the HE from the DNA duplex is governed by the dissociation of gp45 subunits into monomers at a rate of (3.3 ± 0.6) × 10⁻³ s⁻¹ as measured using a FRET signal engineered across the subunit interface (39). Unexpectedly, gp43 in the HE was found to exchange with an unbound gp43 in solution (40). Neither ATP hydrolysis nor the presence of gp44/62 was required for the exchange. Two possible models for the exchange process are shown in supplemental Fig. S7 (40).
gp32 is often considered part of the HE because it stimulates replisome processivity and the rate at which gp43 traverses helical regions of the DNA by melting out adventitious secondary structure (41). It is essential for leading- and lagging-strand synthesis in vitro (42). Its crystal structure revealed an ssDNA-binding cleft with a positively charged surface parallel to a series of hydrophobic pockets conferring sequence independence and high discrimination against duplex DNA (43). Consequently, the protein may slide along the ssDNA, although its cooperative binding favors complete coverage of the ssDNA.
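The clamp dissociation rate constant is easier to picture as a half-life. The short sketch below assumes simple first-order decay, so that t½ = ln 2 / k, and propagates the quoted uncertainty by evaluating the two extremes; the names are illustrative.

```python
import math

K_DISSOC_PER_S = 3.3e-3   # gp45 subunit dissociation rate constant
K_ERROR_PER_S = 0.6e-3    # quoted uncertainty

def half_life_s(rate_constant_per_s: float) -> float:
    """Half-life of a first-order process (assumption: single-exponential decay)."""
    return math.log(2) / rate_constant_per_s

central_s = half_life_s(K_DISSOC_PER_S)                  # about 210 s
longest_s = half_life_s(K_DISSOC_PER_S - K_ERROR_PER_S)  # about 257 s
shortest_s = half_life_s(K_DISSOC_PER_S + K_ERROR_PER_S) # about 178 s

print(f"HE half-life on DNA: {central_s:.0f} s "
      f"(range {shortest_s:.0f}-{longest_s:.0f} s)")
```

Under this assumption the holoenzyme persists on the duplex for roughly 3-4 min before the clamp falls apart.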

The primosome (helicase gp41/primase gp61)
The gp41·gp61 complex exhibits both helicase and primase activities (44,45). The preferred substrate for gp41 is a replication fork with single-stranded extensions of >29 nucleotides on both strands of the fork duplex, consistent with gp41 interacting with both the leading and lagging strands (46). Unwinding requires ATP or GTP hydrolysis and proceeds at a rate of 30 bp/s (47). gp61 stimulates gp41 unwinding less than 2-fold by facilitating its binding to the ssDNA (46). At physiological concentrations, gp41 exists primarily as a dimer, but the binding of ATP/GTP or ATPγS/GTPγS drives the assembly of the dimers into an asymmetric hexameric ring complex (supplemental Fig. S8) (48). Electron microscopy images of gp41 support open and closed forms of the ring (49). gp41 is highly processive in the presence of the six other replication proteins (excluding gp59) with an association half-life of ~11 min (50), sufficiently long to accomplish the replication of the entire T4 genome, implying that gp41 also has an accelerated rate in the presence of the other replication proteins.
gp61 generates the pentameric ribonucleotide primers required to initiate Okazaki fragment synthesis (51,52). The biologically relevant primers with the sequence pppApC(pN)₃ are the products of the gp41·gp61 complex; in the absence of gp41, gp61 can generate dimers as well as products greater than five nucleotides (52). The primase activity is greatly stimulated by complexation with gp41. In fact, gp41·gp61 complexes on templates coated with gp32 exhibit a physiologically relevant priming rate of about 1 primer s⁻¹ to provide sufficient primers for lagging-strand synthesis given the rate of replication (53). The stoichiometry of gp61 binding to a gp41 hexamer has been reported as 1:1 (54), 6:1 (53,55), or 3:1. The last stoichiometry measurement was done with single-molecule photobleaching and is more definitive and reflective of physiological conditions. Moreover, the variability of this stoichiometry probably arises from the dissociative rather than processive nature of gp61 (56), and it is likely that only one gp61 is scanning the ssDNA to synthesize a primer at a given time. The primosome synthesizes far more primers than needed, with only ~25% being utilized for Okazaki fragment synthesis (57).
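Two of the arguments above are arithmetic and can be checked with a short sketch. It compares the distance gp41 could cover within one association half-life at its isolated unwinding rate versus at the lower bound of the in vivo replication rate (400 nt/s), and it estimates the mean spacing of primers that are actually used. The combination of these figures, the single-half-life simplification, and all names are our illustrative assumptions.

```python
GENOME_BP = 170_000           # T4 genome length
GP41_HALF_LIFE_S = 11 * 60    # association half-life of gp41 (~11 min)
GP41_ALONE_BP_PER_S = 30.0    # unwinding rate of gp41 by itself
FORK_RATE_NT_PER_S = 400.0    # lower bound of the in vivo replication rate
PRIMERS_PER_S = 1.0           # priming rate of gp41-gp61 on gp32-coated ssDNA
PRIMER_USE_FRACTION = 0.25    # ~25% of primers are used

# Distance covered during one half-life at each rate.
alone_bp = GP41_ALONE_BP_PER_S * GP41_HALF_LIFE_S     # 19,800 bp
coupled_bp = FORK_RATE_NT_PER_S * GP41_HALF_LIFE_S    # 264,000 bp

# Assumption: used primers are spread evenly in time.
used_primer_interval_s = 1.0 / (PRIMERS_PER_S * PRIMER_USE_FRACTION)
mean_fragment_nt = FORK_RATE_NT_PER_S * used_primer_interval_s

print(f"gp41 alone: {alone_bp:,.0f} bp per half-life")
print(f"at fork rate: {coupled_bp:,.0f} bp per half-life")
print(f"rough mean Okazaki fragment: {mean_fragment_nt:,.0f} nt")
```

At 30 bp/s a single half-life covers only about 20 kb, far short of the 170-kb genome, whereas at the fork rate it exceeds the genome length, which is the basis for inferring an accelerated helicase within the replisome.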

Assembling the replisome
Replication initiates from R- and D-loops for origin-dependent and recombination-dependent replication, respectively (58). Origins of replication facilitate the formation of RNA primers within the R-loop to start leading-strand DNA synthesis, implying that it is primed by the gp41·gp61 complex or coupled with gp61-dependent lagging-strand synthesis. Recombination-dependent replication begins with the strand-invasion reaction that creates D-loops, with the invading 3′-end of the DNA being used to prime leading-strand synthesis following loading of a gp41·gp61 complex and gp59 on the displaced strand of the D-loop. gp41 and gp59 form a 1:1 hexameric complex with the lagging strand of the replication fork passing through the center of the ring-shaped helicase (47). Loading of gp41 onto gp32 that coats ssDNA exposed during replication initiation is facilitated by gp59 that destabilizes the interaction between gp32 and ssDNA through a direct contact with gp32 (59). At least two to three gp32 proteins (binding-site size of 8 nucleotides each) must be released for loading of gp41 (binding-site size of 12-20 nucleotides per hexamer) (60,61), and indeed, gp32 promotes gp59 oligomerization (62). In turn, direct interactions between the C-terminal peptides of gp59 and those of gp41 promote the latter's assembly (63). A plausible mechanism revealed by cross-linking experiments is depicted in supplemental Fig. S9. The hexameric nature of gp59 when bound to forked DNA was definitively confirmed by single-molecule photobleaching (64). The stepwise assembly of the primosome was then traced by single-molecule FRET microscopy leading to the sequence illustrated in Fig. 3 (65).
The leading-strand HE can readily assemble on the DNA fork as the primer strand becomes available. What prevents premature synthesis before the primosome is in place? Mutations in gp59, which have a deleterious effect on origin-dependent DNA replication in vivo, suggest a gatekeeper role for it (66). In vitro studies found a complex formed between gp59 and the leading-strand gp43, whose structure was modeled (supplemental Fig. S10) (67). Both the synthesis and exonuclease activities of gp43 are inhibited. Single-molecule FRET microscopy showed this complex was "unlocked" by the addition of gp41 followed by gp61 to form a functional primosome and subsequently a fully active replisome (supplemental Fig. S11) (68). The "unlocking" stems from the loss of gp59 contacts with the replication fork (69) leading to its loss from the active replisome (68). We have summarized the known interactions between the proteins within the replisome in supplemental Fig. S12.
The in vitro gp41 unwinding rate is ~10-fold less than the replication rates observed in vivo and in vitro. Likewise, an independent HE is very inefficient at strand displacement synthesis, proceeding at a rate of about 1 nt/s (70). However, no physical interaction was found between the two to account for their rate of replication when both are present at a replication fork, leading to the postulate of a functional coupling that depended on interactions modulated by the DNA replication fork (71). In this model, the trailing HE traps the ssDNA product of the gp41 unwinding activity, preventing the separated strands from reannealing and causing back slippage of gp41, and thereby increases the unwinding rate. From investigations with magnetic tweezers, a collaborative model was postulated where gp41 prevents the HE from stalling and in turn HE blocks gp41 slippage so both activities are stimulated through the DNA fork (key data and instrumentation are shown in supplemental Fig. S13) (70,72). Consequently the coupled replication of gp41 and gp43 manifests a rate of 300-400 bp/s in accord with the rates for replication fork movement noted above.
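The size of the functional coupling effect follows directly from the rates quoted in this review. The sketch below computes the fold stimulation of each partner, taking 30 bp/s for gp41 alone, about 1 nt/s for strand displacement by the HE alone, and 300-400 bp/s for the coupled system; the variable names are illustrative.

```python
GP41_ALONE_BP_PER_S = 30.0        # isolated helicase unwinding rate
HE_ALONE_NT_PER_S = 1.0           # strand displacement synthesis by HE alone
COUPLED_BP_PER_S = (300.0, 400.0) # coupled gp41 + HE rate range

# Fold stimulation of each partner at the low and high ends of the range.
helicase_fold = tuple(rate / GP41_ALONE_BP_PER_S for rate in COUPLED_BP_PER_S)
polymerase_fold = tuple(rate / HE_ALONE_NT_PER_S for rate in COUPLED_BP_PER_S)

print(f"helicase stimulated {helicase_fold[0]:.0f}- to {helicase_fold[1]:.1f}-fold")
print(f"holoenzyme stimulated {polymerase_fold[0]:.0f}- to {polymerase_fold[1]:.0f}-fold")
```

The helicase gains roughly an order of magnitude, in line with the ~10-fold figure in the text, while the holoenzyme gains two orders of magnitude or more.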

Coordinated leading- and lagging-strand replication
DNA synthesis in vivo is tightly coordinated with the synthesis of both strands completed simultaneously despite the continuous replication of the leading strand and the discontinuous replication of the lagging strand. How is this achieved?
As first steps in understanding leading- and lagging-strand DNA synthesis, reconstitution in vitro of an active replisome was achieved on a model replication fork or a minicircle substrate (42,73). The latter allows for the quantitation of synthesis of each strand (supplemental Fig. S14). The synthesis was initiated with all eight proteins and was tightly coordinated. With a two-hybrid system based on the phage C1 repressor, a homodimerization domain was established in the 400-600-amino acid region of gp43 (42) and then narrowed by cross-linking to Cys-507 specifically (74). The physical tethering of the two gp43s in a replisome necessitates the formation of a replication loop (5) visually observed in electron micrographs (75).
Experiments showed that the activity of coordinated synthesis by a reconstituted replisome was highly resistant to dilution provided the buffer was supplemented with gp45, gp44/62, and gp32 (76). More sensitive trapping experiments additionally revealed that gp61 dissociated as well (56). As noted earlier, in the presence of excess gp43 in solution, the gp43s in the replisome will exchange but not impede the processivity of the replisome (40). Thus, only gp41 and the gp43s do not dissociate from the replisome within lifetimes sufficiently long to permit processive duplication of the entire T4 genome.
Central to the synthesis of Okazaki fragments on the lagging strand is the need to accommodate gp41 and gp61 translocating in opposite directions (5′ to 3′ unwinding versus 3′ to 5′ primer synthesis). Three possible scenarios can be visualized (supplemental Fig. S15): pausing both gp41 and gp61; disassembly of the primosome to synthesize a primer forming a pppRNA·gp61 complex while gp41 continues unwinding; and having the primosome remain intact forming a priming loop with the unwound DNA. With magnetic tweezers experiments, two mechanisms were observed: disassembly of the primosome to form pppRNA·gp61 complexes and priming loop formation with an intact primosome. No pausing was found (77). Primosome disassembly during primer synthesis has important ramifications for the discontinuous lagging-strand synthesis.
Lagging-strand synthesis requires transient release of gp43 from the DNA template upon Okazaki fragment completion, yet it remains associated with the replisome during recycling to initiate a new fragment (76,78). Recycling can be triggered by the lagging-strand gp43 colliding with the end of the previous Okazaki fragment that accelerates the transient dissociation, i.e. the collision mechanism (5,79,80). However, the size and number of the Okazaki fragments can be manipulated by varying the activity of gp61, the gp45 and gp44/62 levels, and the rate of synthesis by the lagging-strand gp43 to create a pattern of gapped Okazaki fragments, i.e. the signaling mechanism (57,81). The cumulative events of primer synthesis and gp43 recycling to initiate another Okazaki fragment compete with the advance of the leading-strand HE to increase the separation of the two HEs and potentially disrupt coordinated replication by the replisome. How then is replication coordination maintained?
Figure 3. Assembly mechanism of the T4 lagging-strand primosome on forked DNA. The gp32 protein binds to forked DNA with either subsequent or concurrent binding of gp59. Subsequently, gp41 binds to gp59 and is loaded onto the forked DNA in the presence of nucleotide. ATP hydrolysis is required for gp41 to displace gp32 and gp59, either directly or by translocation. The gp61 protein then binds and interacts closely with gp41 on forked DNA. In the absence of gp32 and gp59, both gp41 and gp61 bind to forked DNA.
Recognizing that only 20-30% of the primers produced by gp61 are actually utilized for Okazaki fragment synthesis, many unused primers in the form of pppRNA·gp61 complexes could build up ahead of the lagging-strand HE. Indeed, the collision with such complexes was found to trigger early termination of Okazaki fragment synthesis (82). Consequently, the mechanism for dissociation of gp43 to recycle is always collision either with the previous Okazaki fragment or an unused pppRNA·gp61 complex. These signaling and collision mechanisms are illustrated in Fig. 4 (82) and accommodate the above factors that affect the signaling pathway.
A model derived from all the available kinetic data (82) reproduces the observed distribution of Okazaki fragments (83). Several important conclusions may be drawn from the modeling. First, all Okazaki fragments originate from primers synthesized by the looping mechanism. Second, the interplay between the recycling gp43 binding to an already available pppRNA·gp45·gp44/62 complex (<1 s) and a median time of ~6 s for the formation of a pppRNA·gp45·gp44/62 complex provides a mechanism for a semi-random distribution of Okazaki fragment lengths in our simulation. Finally, the lagging-strand gp43 will recycle to bind newly synthesized primers a mean distance of 400 nt from gp41 as a result of the brief lifetime of a naked primer and the slightly longer lifetime of a pppRNA·gp45·gp44/62 complex. Thus, signaling alone is sufficient to keep the leading- and lagging-strand synthesis coordinated even though the two gp43s are moving at the same mean velocity.

Comparison with three DNA replication systems
We conclude with a comparison of three DNA replication systems focusing on how they coordinate leading- and lagging-strand synthesis.
The bacteriophage T7 replication system has been the subject of a very recent comprehensive review (84). The replisome and its constituent proteins are depicted in supplemental Fig. S16. The slowest event in coordinated replication stems from the collection of steps involved with primer synthesis (a short tetraribonucleotide) requiring some 6-12 s. In contrast, the polymerase (gp5) with bound processivity factor, thioredoxin, extends primers at a rate of 114-122 bp/s. Importantly, this is also the rate of duplex unwinding by the helicase (gp4) in this gp4·gp5 complex that is accelerated some 13-fold over that of gp4 alone. Recall a similar phenomenon was noted for the T4 helicase in a gp41·HE complex (72).
This large difference between the rate of priming versus the rate of the gp4·gp5 complex advance would result in the loss of coordination between the syntheses of the two strands (85). This problem can be resolved by pausing duplex unwinding and leading-strand synthesis, by having the lagging-strand gp5 advance at a faster rate than the leading-strand gp5, or by having more frequent gp5 recycling and Okazaki fragment initiation via the signaling mechanism. An initial report found pausing of the gp4 and leading-strand synthesis (86); a later report found no pausing but an ~30% faster rate for lagging-strand synthesis versus leading-strand synthesis (87). Given the observations of concomitant primer synthesis and efficient interprotein transfer of the primer to gp5, the faster copying of the lagging-strand template ensures synthesis of both strands remains coordinated.
The T7 system, however, also exhibits both signaling and collision mechanisms in the synthesis of Okazaki fragments. The signal for premature release of gp5 appears to be primer synthesis followed by replication loop release before completion of the previous Okazaki fragment (88) rather than polymerase release upon collision with unused primer·primase complexes because the helicase and primase activities are properties of a single polypeptide.
An unusual characteristic is the binding of additional units of gp5 to gp4 (up to six gp5s theoretically can bind to the six C termini of hexameric gp4), which offers a switching mechanism enabling exchange of the synthesizing lagging-strand polymerase and the non-synthesizing polymerase reservoir (89). This exchange could also contribute to the early termination of Okazaki fragments as a signal along with primer synthesis. Both the faster synthesis rate of the lagging-strand gp5 and the added benefit of the premature recycling of gp5 via the switching mechanism act together to retain coordination of DNA replication by the T7 replisome.
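The scale of the T7 timing problem, and of the proposed remedy, can be illustrated with a toy calculation. It assumes that the fork keeps advancing at 114-122 bp/s for the full 6-12 s of priming, and that a lagging-strand polymerase running 30% faster closes the resulting gap at 0.3 times the fork rate, so the catch-up time is simply the delay divided by 0.3. This deliberately ignores concomitant primer synthesis and is not a model taken from the cited reports; the names are hypothetical.

```python
PRIMING_TIME_S = (6.0, 12.0)         # duration of primer-related steps
FORK_RATE_BP_PER_S = (114.0, 122.0)  # gp4-gp5 complex advance rate
LAGGING_SPEEDUP = 0.30               # lagging strand copied ~30% faster

def lead_distance_bp(rate_bp_per_s: float, delay_s: float) -> float:
    """Distance the fork gains while the lagging strand waits for a primer."""
    return rate_bp_per_s * delay_s

def catch_up_time_s(delay_s: float, speedup: float) -> float:
    """Time to close the gap when the lagging polymerase is (1 + speedup) times faster."""
    return delay_s / speedup

min_gap_bp = lead_distance_bp(FORK_RATE_BP_PER_S[0], PRIMING_TIME_S[0])
max_gap_bp = lead_distance_bp(FORK_RATE_BP_PER_S[1], PRIMING_TIME_S[1])
min_catch_up_s = catch_up_time_s(PRIMING_TIME_S[0], LAGGING_SPEEDUP)
max_catch_up_s = catch_up_time_s(PRIMING_TIME_S[1], LAGGING_SPEEDUP)

print(f"gap opened during priming: {min_gap_bp:.0f}-{max_gap_bp:.0f} bp")
print(f"catch-up time at +30%: {min_catch_up_s:.0f}-{max_catch_up_s:.0f} s")
```

In this simplified picture, each priming event would open a gap of roughly 0.7-1.5 kb that a 30% faster polymerase needs some 20-40 s of synthesis to close.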
The Escherichia coli replisome has also been the subject of a recent review (90). The replisome with its more complex constituent proteins is exhibited in supplemental Fig. S17. Moreover, the E. coli system features the helicase (DnaB) and primase (DnaG) activities residing on separate proteins like the T4 replisome. The processive coupling of DNA synthesis on both template strands faces the same timing issues noted above. The slowest event surrounds priming of the lagging-strand polymerase. Like T4, the primer synthesis activity of DnaG is accelerated ~5,000-fold by the presence of DnaB, reducing the time for this part of the process to ~0.2 s. Deposition of primers then occurs at the physiological 1-2-s interval similar to the T4 replisome (91).
The leading-strand polymerase complexed to β-clamp and in the presence of DnaB in single-molecule experiments shows bursts of synthesis at 500-600 nt/s and mean rates of about 350 nt/s independent of DnaG activity (92,93). Again, the activity of DnaB is stimulated by the presence of the leading-strand HE; by itself, the DnaB unwinds duplex DNA at a rate of only 84-86 bp/s (93). Consequently, a displacement of about 350-700 bp between the two polymerases can be imagined as a consequence of each Okazaki fragment synthesized (90).
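The displacement estimate can be reproduced from the quoted rates. The sketch below multiplies the mean leading-strand rate (about 350 nt/s) by the 1-2-s primer deposition interval quoted for E. coli, and also computes the apparent stimulation of DnaB relative to its isolated unwinding rate; treating the displacement as simply rate times interval is our reading of the argument, and the names are illustrative.

```python
MEAN_LEADING_RATE_NT_PER_S = 350.0   # mean leading-strand synthesis rate
PRIMING_INTERVAL_S = (1.0, 2.0)      # primer deposition interval
DNAB_ALONE_BP_PER_S = (84.0, 86.0)   # unwinding rate of DnaB by itself

# Distance the leading polymerase advances during one priming interval.
displacement_bp = tuple(MEAN_LEADING_RATE_NT_PER_S * t for t in PRIMING_INTERVAL_S)

# Fold stimulation of DnaB when coupled to the leading-strand holoenzyme.
dnab_fold_low = MEAN_LEADING_RATE_NT_PER_S / DNAB_ALONE_BP_PER_S[1]
dnab_fold_high = MEAN_LEADING_RATE_NT_PER_S / DNAB_ALONE_BP_PER_S[0]

print(f"displacement per Okazaki fragment: "
      f"{displacement_bp[0]:.0f}-{displacement_bp[1]:.0f} bp")
print(f"DnaB stimulated about {dnab_fold_low:.1f}- to {dnab_fold_high:.1f}-fold")
```

The product recovers the 350-700 bp range, and the coupled helicase runs about 4-fold faster than DnaB alone.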
The question again is how the lagging-strand polymerase is recycled after Okazaki fragment synthesis. Testing for the collision mechanism found it might act 40% of the time (94). However, recent reports contend the polymerase recycling time for the collision mechanism is greater than 2 min and would lead to lengthening of successive Okazaki fragments contrary to observation (95). An additional hypothesis is that the torsional strain generated by a replisome with two interconnected polymerases is responsible for polymerase recycling (96), but a direct measure of the force required to break the connection would bolster this proposal. Consequently, the literature appears to favor a more conventional signaling mechanism (93,97).
The signal that triggers lagging-strand polymerase recycling has been ascribed to the synthesis of a primer (97), consistent with an earlier observation that the frequency of primer synthesis and the efficiency of primer utilization control Okazaki fragment size (98). Pertinent to our discussion, however, is that primer utilization varies from 70 to 95% and that DnaG acts distributively (99) like gp61. This raises the untested possibility that the recycling of the E. coli lagging-strand polymerase might also be triggered by collision with an unused primer or pppRNA·DnaG complex similar to the T4 system.
Recent single-molecule experiments found the replication velocity of the leading- and lagging-strand polymerases to be equivalent, but punctuated with stops/starts to change replication rates (93). Consequently, their behavior would not be coordinated but stochastic. Because there is no difference in velocity to compensate for the time required for various events related to Okazaki fragment synthesis, one can question whether the stochastic rate fluctuations can be coordinated to achieve the DNA replication necessary for a long genome. A further complicating factor is the observation that both in vivo and in vitro studies indicate that two polymerases may function on the lagging strand, which is not taken into account in the single-molecule experiments. So the issue of replisome coordination or lack thereof is somewhat unsettled.
Much remains to be done on eukaryotic replisomes whose constituent proteins are more numerous and complex. For example, there are two distinct, multisubunit polymerases: one for leading- (ε) and one for lagging-strand replication (δ). The fact that the primase is both an RNA and DNA polymerase is another prominent difference. An understanding of the present status of the eukaryotic replication machine and a brief comparison with bacterial replisomes have appeared recently (100). Many questions that have been answered for T4, T7, and E. coli are only now being explored for eukaryotic replisomes. One wonders to what extent similarities will be universal and whether the differences between phage, bacteria, and eukaryote replisomes are merely evolutionary nuances with the key features of the mechanistic solution to DNA replication already solved by primitive organisms.