The Chromosome Replication Machinery of the Archaeon Sulfolobus solfataricus

In the three domains of life, the archaea, bacteria, and eukarya, there are two general lineages of DNA replication proteins: the bacterial and the eukaryal/archaeal lineages. The hyperthermophilic archaeon Sulfolobus solfataricus provides an attractive model for biochemical study of DNA replication. Its relative simplicity in both genomic and biochemical contexts, together with high protein thermostability, has already provided insight into the function of the more complex yet homologous molecules of the eukaryotic domain. Here, we provide an overview of recent insights into the functioning of the chromosome replication machinery of S. solfataricus, focusing on some of the relatively well characterized core components that act at the DNA replication fork.

The archaea constitute a domain of prokaryotic organisms that occupy the most diverse range of environments of the three domains of life. Although the archaea are well established to be phylogenetically separable from the bacterial and eukaryal domains (1, 2), a fascinating evolutionary relationship between the core information processing genes of the archaea and eukarya has come to light as a result of detailed genomic analyses over the past decade (3,4).
Much of our knowledge of fundamental DNA replication mechanisms has been established through studies of the bacterium Escherichia coli and its plasmids and bacteriophages (5,6). This provides a firm basis for study of replication in the other domains, but the detailed mechanisms of protein function are likely to be different in archaea and eukaryotes. Although it may be initially surprising, certain archaea provide attractive model systems for DNA replication studies because they contain a simplified, yet homologous, version of the core eukaryotic DNA replication machinery and are amenable to laboratory study. In addition, the archaea provide fascinating insights into the evolution and diversity of DNA replication mechanisms, helping pinpoint important parts of the machinery that may be targeted for drug development. Hyperthermophilic Sulfolobus species from the Crenarchaeal kingdom are becoming a popular choice for researchers for a number of reasons: they contain arguably the most eukaryotic-like replication machinery of the well known archaea, they can be grown simply in the laboratory in aerobic conditions with solid or liquid media (at ~75-80°C and pH 2-3), their proteins exhibit high thermostability, and they can harbor numerous viruses and plasmids from which experimental systems are developing (7).

Initiation of Chromosome Replication
Many archaea contain a single circular chromosome. However, in contrast to circular bacterial chromosomes that contain a single initiation site (origin) for DNA replication, it was recently discovered that three origins (oriC1, oriC2, and oriC3) are active in the circular Sulfolobus solfataricus chromosome, analogous to the multi-origin arrangement of linear eukaryotic chromosomes (8,9). Therefore, the problem of replicating DNA in a reasonable time appears to have been solved in two ways during evolution, independent of chromosome circularity: bacteria with fast DNA synthesis rates can initiate multiple temporally overlapping rounds of replication from a unique origin, whereas eukaryotes and some archaea have acquired multiple origins.
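The benefit of multiple origins can be illustrated with a little arithmetic. The sketch below is a minimal model, not a description of the measured S. solfataricus system: it assumes that all origins fire at the same moment, that two forks leave each origin at a constant and equal rate, and that replication finishes when the forks meet in the longest inter-origin gap. The function name, the 3,000,000-bp chromosome length, the origin positions, and the fork rate of 100 bp/s are all hypothetical values chosen for illustration.

```python
def replication_time(genome_len, origins, fork_rate):
    """Seconds needed to copy a circular chromosome.

    Assumptions (illustrative only): every origin fires at t = 0,
    forks move bidirectionally at `fork_rate` bp/s, and converging
    forks meet midway between neighbouring origins.
    """
    if genome_len <= 0 or fork_rate <= 0 or not origins:
        raise ValueError("need a positive length, a positive rate and at least one origin")
    pos = sorted(set(p % genome_len for p in origins))
    if len(pos) == 1:
        # One origin: the two forks share the whole circle.
        gaps = [genome_len]
    else:
        # Distance from each origin to the next one around the circle.
        gaps = [(pos[(i + 1) % len(pos)] - pos[i]) % genome_len
                for i in range(len(pos))]
    # The largest gap is closed by two forks approaching each other.
    return max(gaps) / (2 * fork_rate)


if __name__ == "__main__":
    L = 3_000_000  # hypothetical chromosome length in bp
    print(replication_time(L, [0], 100))                     # 15000.0
    print(replication_time(L, [0, 1_000_000, 2_000_000], 100))  # 5000.0
    print(replication_time(L, [0, 500_000, 2_000_000], 100))    # 7500.0
```

Under these assumptions, three evenly spaced origins cut the replication time to one-third of the single-origin value, and unevenly spaced origins give a smaller gain because the longest gap sets the finishing time.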
During initiation of replication in all species, initiator proteins recognize origin DNA. This is then followed by the recruitment of additional proteins, the localized unwinding of origin DNA in preparation for DNA synthesis on both template strands, and the recruitment of the DNA replication machinery for the elongation stage of replication. Almost all archaea contain at least one homolog of the eukaryotic initiator proteins Orc1 and Cdc6 that are involved in establishing the pre-replicative complex (pre-RC). S. solfataricus has three Orc1/Cdc6 homologs, designated Cdc6-1, Cdc6-2, and Cdc6-3 (10). All three proteins have sequence similarity to both Cdc6 and Orc1, hinting at a possible combined functionality of the S. solfataricus proteins. Origins oriC1 and oriC2 are closely linked to the genes encoding Cdc6-1 and Cdc6-3, respectively.
A role of the Orc1/Cdc6 proteins in defining the S. solfataricus origins was demonstrated by DNA footprinting experiments showing that both oriC1 and oriC2 contain several "origin recognition boxes" (ORBs), sequences within the origin DNA that specifically interact with the Cdc6 proteins. ORB and related mini-ORB elements are conserved among known archaeal origins (9,11). Interestingly, there was a different arrangement of ORBs at oriC1 and oriC2, and the binding characteristics of the Orc1/Cdc6 proteins varied considerably in both position and affinity. As is the case for all well defined replication origins, AT-rich sequences exist within the S. solfataricus origins; these may facilitate origin unwinding.
The Orc1/Cdc6 proteins are members of the AAA+ protein family (ATPases associated with various cellular activities) (12,13). In addition to the N-terminal AAA+ domain, Orc1/Cdc6 proteins also contain a winged-helix DNA-binding motif toward the C terminus. This overall arrangement of AAA+ and DNA-binding domains is conserved among initiators from all three domains of life (14). Three-dimensional structures of the Orc1/Cdc6 protein from the Crenarchaeon Aeropyrum pernix suggest that the AAA+ domain may regulate DNA binding by an ATP-mediated stabilization of the DNA binding conformation of the protein (15), possibly explaining the observation that ATP is required for effective DNA binding by eukaryotic Cdc6 (16). However, as it appears that neither ATP binding nor hydrolysis is required for DNA binding by Sulfolobus Orc1/Cdc6s (9), ATP hydrolysis may be required for unwinding of origin DNA or to establish a later stage in the pre-RC. Interestingly, the yeast origin recognition complex requires ATPase activity in cooperation with Cdc6 for the development of the mature pre-RC at an origin (17).
Like many Orc1/Cdc6 proteins, the purified S. solfataricus proteins have an autophosphorylation activity in the absence of DNA, suggestive of another level of regulation (18,19). The role of autophosphorylation in vivo is undefined, but it may be involved in the regulation of the pre-RC. Thus, despite some initial clues, the question of what events trigger initiation of replication in archaea is unresolved and largely unexplored. It is also still unclear how initiation proteins mediate localized unwinding of DNA or how they recruit proteins involved in the elongation phase of replication.
The last major step in establishing the pre-RC is recruitment of the helicase molecules, to provide DNA unwinding activity at the DNA replication fork. Helicase recruitment is achieved through the combined activity of the initiator proteins and other accessory factors of the pre-RC, although as mentioned above, the precise molecular mechanisms have not been defined in the archaea or eukarya. By analogy to the partially characterized E. coli system, this may be an active process involving specialized helicase loader proteins that accompany the helicase. It is possible that Orc1/Cdc6 plays a direct role in helicase loading, since archaeal Orc1/Cdc6 has been reported to increase DNA binding of the likely replicative helicase, MCM (mini-chromosome maintenance), and inhibit its DNA unwinding activity (19-21). However, the situation is still unclear in S. solfataricus, and interactions between Orc1/Cdc6s and the helicase seem to vary among archaea, suggesting that additional factors may be involved in many cases.

The Chromosome Replication Machinery
After establishment of the pre-RC and receipt of signals marking the start of S phase of the cell cycle, further recruitment events complete the building of mature DNA replication machines ("replisomes") for bidirectional DNA replication from each origin. The basic components of a replisome are the DNA replication fork itself, together with the helicase, and the polymerases and processivity factors required to synthesize DNA on each template strand. A comparison of chromosome replication proteins of the three domains of life is given in Table 1.
Helicase: Helicases unwind the helical structures of nucleic acids. The MCM proteins were originally identified as being required for mini-chromosome maintenance in yeast (see Ref. 22), and homologs in both eukaryotes and archaea almost certainly act as the primary replicative helicases. Like the bacterial replicative helicases, the MCM proteins appear to act as ring-shaped oligomers (usually hexamers or double hexamers) fuelled by NTP hydrolysis. The helicase ring encircles DNA, providing a topologically stable interaction that allows efficient translocation and unwinding. However, it is unclear whether the MCM helicases encircle single- or double-stranded DNA during unwinding in vivo, and the basic mechanism of DNA unwinding remains elusive. A number of models have been proposed, including the single-strand steric-exclusion model, the rotary pump model, the double-pump side-extrusion model, and the "plowshare" model (reviewed in detail recently in Ref. 23).
In eukaryotes, six different MCM subunits (MCM2-7) are required for replication fork progression (24). The complexity of the enzyme and its apparent requirement for many additional factors has hampered functional analysis in vitro. However, in many archaea, including S. solfataricus, there is only one MCM homolog. Archaeal MCM proteins form primarily hexamers and double hexamers in solution, although other quaternary arrangements have also been detected (25,26). The purified proteins exhibit helicase activity in vitro, opening the door to studies of the MCM mechanism.
The best characterized archaeal MCM protein is MthMCM from Methanothermobacter thermautotrophicus. Its most active form as a helicase is a dodecamer composed of two head-to-head orientated hexamers (27). This is consistent with a model where MthMCM acts as a double pump for unwinding during bidirectional DNA replication, whereby the encircled double-stranded DNA is pumped into the center of the enzyme through the central channel by both apposed hexamers, and single strands are extruded through gaps in the hexamer-hexamer interface. Persuasive evidence for such a mode of action is available for the T-antigen (Tag) helicase of the eukaryotic SV40 virus, which displays a strikingly similar quaternary arrangement to MthMCM (25, 28-30).
Purified S. solfataricus MCM forms a stable hexamer and shows clear helicase activity (31,32). SsoMCM activity in vitro requires a 3′ single-stranded sequence to facilitate helicase loading onto the DNA substrate, indicating that it has a 3′ to 5′ polarity of translocation. However, it is still unknown whether SsoMCM encircles single- or double-stranded DNA during unwinding in vivo. Fluorescence resonance energy transfer experiments using labeled DNA and SsoMCM indicate that SsoMCM is orientated on DNA so that the larger C-terminal AAA+ motor domain faces toward the direction of translocation and the N-terminal domain faces away (32).
Homology modeling based on the known structures of MthMCM and Tag allowed the identification of two candidate β-hairpins that project in toward the central channel of SsoMCM. Mutation of positively charged residues on either or both of these structures demonstrated their role in DNA binding and helicase activity. Interestingly, the AAA+ motor domain hairpin mutant exhibited only mild DNA binding defects yet was a non-functional helicase (32). As it is part of the motor domain, it was proposed that the hairpin may play a role as a paddle-like structure that, together with conformational changes brought about by NTP hydrolysis, helps propel the helicase along the DNA. A similar proposition has been made for the action of these structures in the distantly related Tag helicase (33).
Primase: After initial DNA unwinding at an origin of replication, the primase, a DNA-directed RNA polymerase, produces short complementary oligonucleotides of RNA on the template strands so that the DNA polymerase can start DNA synthesis during the elongation phase of replication. The "leading" strand (overall 5′ to 3′ synthesis) only requires one priming event in principle, whereas the "lagging" strand (3′ to 5′ overall synthesis, actually constructed in sections by numerous 5′ to 3′ syntheses) requires a primer for each of the sections (or "Okazaki" fragments).
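The difference in priming demand between the two strands can be made concrete with a short calculation. The sketch below assumes one primer for the whole leading strand and one primer per Okazaki fragment on the lagging strand, with all fragments of equal length. The function name and the fragment length used in the example (100 nucleotides) are hypothetical values for illustration, not measurements from S. solfataricus.

```python
import math


def priming_events(template_len, okazaki_len):
    """Return (leading, lagging) primer counts for one replication fork.

    Assumptions (illustrative only): the leading strand is primed once,
    and the lagging strand needs one primer per Okazaki fragment of
    fixed length `okazaki_len`.
    """
    if template_len <= 0 or okazaki_len <= 0:
        raise ValueError("lengths must be positive")
    leading = 1
    # A partial final fragment still needs its own primer.
    lagging = math.ceil(template_len / okazaki_len)
    return leading, lagging


if __name__ == "__main__":
    print(priming_events(1_000_000, 100))  # (1, 10000)
    print(priming_events(250, 100))        # (1, 3)
```

Even with these rough numbers, the lagging strand requires thousands of priming events per fork, which is why the primase must remain closely associated with the advancing replication machinery.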
The S. solfataricus primase is homologous to eukaryal primases, although only the "core" subunits (PriS and PriL) appear to be present in archaea. There is also a bacterial-like primase homolog (DnaG) of unknown function in S. solfataricus (10). Intriguingly, this is associated with the RNA degradation machinery (34). Biochemical studies of the primase (35, 36) reveal a remarkable diversity of function in vitro, including both RNA and DNA primase activity and 3′-terminal deoxynucleotidyl transferase activity. It is unknown which of these activities are utilized in vivo. However, since the catalytic active site of PriS is related to the active sites of Pol-X family repair polymerases, which are absent from S. solfataricus, it might turn out that PriS fulfils roles in DNA repair (37).
The structure of the heterodimeric S. solfataricus primase was recently solved by x-ray crystallography (38), providing the first views of a eukaryal/archaeal primase heterodimer. The S. solfataricus primase catalytic subunit, PriS, forms a tight association with the regulatory subunit, PriL, via a conserved dimerization interface. Structure-guided mutational analysis defined regions of the protein involved in dimerization and substrate binding. Also, two regions in the PriSL heterodimer were identified as being important for regulation of RNA product length and overall catalytic activity. These mutational analyses were consistent with the proposed model for the enzyme-substrate complex, whereby the cashew nut-shaped primase cradles the DNA-RNA hybrid so that the regulatory subunit helps measure the RNA product length and bind the template DNA during RNA synthesis (Fig. 1). Further work on the primase should address how it is linked to the rest of the replisome during replication.
Polymerases and Processivity Factors: DNA polymerases synthesize DNA using a primed DNA template and deoxynucleotides, while their associated processivity factors tether the polymerase to the DNA. S. solfataricus contains three B-family DNA polymerases, Pols B1, B2, and B3, that are related to the polymerases in the primary eukaryotic replisome. (Polymerases are categorized into A, B, C, D, X, and Y families (39).) In addition, S. solfataricus contains a well characterized Y-family polymerase (Dpo4) that participates in DNA lesion bypass synthesis (see Refs. 40-42).
Little is known about the S. solfataricus B-family polymerases in vivo, although the presence of more than one type of B-family polymerase suggests that they may have different roles in chromosome replication. Different polymerases might be active on the leading and lagging strands, as observed for eukaryotes and some bacteria. Pol B3 is the homolog of the well known "Pfu" polymerase from Pyrococcus furiosus and, compared with humans, is homologous to the catalytic subunit of Pol δ that participates in leading strand synthesis (see Refs. 5 and 39).
The crystal structure of Pol B1 was recently solved, demonstrating a similar structural arrangement to other B-family polymerases that contain an N-terminal 3′ to 5′ "proofreading" exonuclease domain and a C-terminal "right-hand" 5′ to 3′ polymerase domain (43). The structure highlighted specific features of the enzyme that are likely to have specific roles for hyperthermophiles. For example, it was recently confirmed that part of the N-terminal domain of Pol B1 possesses the archaeal template-scanning motif that recognizes and pauses at deaminated bases (e.g. uracil from cytosine) in the template strand, three to four nucleotides ahead of the nascent strand end (44-46). The enzyme presumably waits for repair of the DNA before proceeding, thereby preventing mutations that might otherwise be common in organisms from high temperature environments where base deamination is more prevalent. It will be interesting to discover the role of each of the three B-family polymerases in S. solfataricus chromosome replication and repair and the implications of the above pausing mechanism in the context of the entire replisome.
Isolated polymerases have only a low affinity for DNA, resulting in frequent dissociation during synthesis. To increase processivity within the replisome, the polymerase is coupled to a specialized sliding clamp that encircles DNA, but does not bind tightly to it, and effectively tethers the polymerase to the DNA. The two sliding clamps at each replication fork are positioned behind the polymerases on each strand, such that they encircle the newly formed double-stranded DNA. All sliding clamps are donut-shaped, although they differ in primary sequence and quaternary arrangement. For example, the E. coli sliding clamp (or β subunit/clamp) is a homodimer, whereas the eukaryal and archaeal sliding clamps (called PCNA) are trimers (5).
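The read-ahead pausing described above for Pol B1 can be pictured with a toy model. The sketch below is only an illustration of the logic, not a model of the enzyme's kinetics: it copies a template one base at a time and stalls as soon as a uracil lies within a fixed window ahead of the nascent strand end. The function name, the default window of four nucleotides (chosen from the three to four reported), the decision to write the template in the order it is copied, and the example sequences are all assumptions made for illustration.

```python
# Watson-Crick partners for template bases; uracil is listed only for
# completeness, because the model stalls before it is ever copied.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def extend_until_stall(template, read_ahead=4):
    """Copy `template` (written 3'->5', so index 0 is copied first).

    Returns (nascent_strand, stalled). The toy polymerase stalls when a
    uracil sits within `read_ahead` template positions ahead of the
    nascent strand end. `read_ahead=4` is an illustrative assumption.
    """
    nascent = []
    while len(nascent) < len(template):
        n = len(nascent)
        # Template positions 1..read_ahead ahead of the nascent end.
        window = template[n:n + read_ahead]
        if "U" in window:
            return "".join(nascent), True
        nascent.append(PAIR[template[n]])
    return "".join(nascent), False


if __name__ == "__main__":
    # Uracil at index 8: synthesis stops after 5 bases, 4 short of it.
    print(extend_until_stall("GATTACAGUCCA"))  # ('CTAAT', True)
    # No uracil: the template is copied in full.
    print(extend_until_stall("GATTACA"))       # ('CTAATGT', False)
```

In the first example the nascent strand ends opposite template index 4 while the uracil sits at index 8, so the stall occurs four nucleotides ahead of the strand end, leaving the lesion uncopied.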
Uniquely, there are three PCNA homologs in S. solfataricus that form a heterotrimer (see Table 1), which appears to be constructed via strong interactions between PCNA1 and 2, and a significantly weaker interaction between the PCNA1-2 dimer and PCNA3, suggesting that reversible binding of PCNA3 could play an important part in the DNA loading process (47). Dionne et al. (47,48) also demonstrated interactions between the various PCNA subunits and other components of the S. solfataricus DNA replication machinery, reflecting known interactions between eukaryotic homologs. The binary interactions observed included PCNA2 with Pol B1, PCNA1 with the FEN1 nuclease, PCNA3 with DNA ligase I, and PCNA3 with uracil DNA glycosylase. DNA ligase I and FEN1 participate in the replacement of RNA primers with DNA during completion of lagging-strand replication, whereas uracil DNA glycosylase removes uracil from DNA in repair processes. Since DNA ligase I and FEN1 also have roles in DNA repair, PCNA is probably utilized in a range of DNA processing events. All of the PCNA-interacting factors were found to contain short highly conserved sequence motifs that were important for the interactions (47,49). Finally, it was demonstrated that the PCNA heterotrimer, but not individual PCNA1, -2, or -3 monomers, stimulates the activity of its partners, consistent with its role as a replication processivity factor.
During DNA replication in vivo, the sliding clamp requires a specialized clamp loader complex to open and shut the ring around double-stranded DNA whenever DNA synthesis is initiated at a primer. Since lagging strand synthesis requires frequent reloading of the polymerase and sliding clamp, the clamp loader complex is an integral component of the replisome. Indeed, it is considered to be a major organizing factor for the replisome (5). Most clamp loader complexes are composed of five or more subunits (Table 1). The S. solfataricus clamp loader, named replication factor C (RFC), has an apparently simpler organization than both the E. coli and eukaryotic complexes, although the subunits are homologous to some of the eukaryotic proteins. It has a single large subunit (RFCL) and four identical small subunits (RFCS). Dionne et al. (47) determined protein-protein interactions between these subunits and the PCNA subunits, which predicted a model for clamp loading wherein the RFCS tetramer component of RFC associates with the PCNA1-2 dimer, and the RFCL component associates with PCNA3. A hinge-like action between RFCL and the RFCS tetramer then engulfs double-stranded DNA and transiently locks down on the DNA, leaving the PCNA trimer in place around the DNA. More recently, an electron microscopic structure of an RFC·PCNA·DNA complex from P. furiosus has suggested that the RFC small subunits form an open ring ("horseshoe-like") structure that mirrors the PCNA ring in the complex (50,51). It was proposed that this represents an intermediate in a "spring-washer" mechanism for PCNA loading in which the PCNA trimer is disrupted at one interface by RFC and opened, perpendicular to the plane of the ring, to allow passage of double-stranded DNA.

Putting it All Together: Future Challenges
While some progress has been made in determining the form and function of individual Sulfolobus DNA replication proteins, it is clear that considerable challenges lie ahead. Both replication initiation and replication fork assemblies are hugely complex nucleoprotein machines, with highly defined architectures and regulation. Structural studies of the clamp loader/sliding clamp interaction, for example, have begun to provide information on the level of cooperation that must exist within entire chromosome replication machines. Efforts must now be made to reconstitute higher order systems for the molecular dissection of these fundamentally important and conserved processes in the archaeal and eukaryotic domains. It is hoped that use of the simplified archaeal DNA replication systems, including that of Sulfolobus, will begin to shed light on the intricate complexities of both archaeal and eukaryotic DNA replication.

Fig. 1. The S. solfataricus primase heterodimer with a modeled RNA-DNA duplex (38). The PriS and PriL subunits are shown in cyan and magenta, respectively (derived from PDB file 1ZT2 using PyMOL). The catalytic residues Asp101, Asp103, and Asp235 of PriS are shown in red. The zinc-binding stem is shown in slate blue with a Zn2+ ion indicated by the gray sphere. An RNA-DNA duplex is modeled with the growing RNA chain shown in green and DNA template in yellow.