Functional domains of a geminivirus replication protein.

Tomato golden mosaic virus, a member of the geminivirus family, has a single-stranded DNA genome that is replicated and transcribed in infected plant cells through the concerted action of viral and host factors. One viral protein, AL1, contributes to both processes by binding to a directly repeated, double-stranded DNA sequence located in the overlapping (+) strand origin of replication and AL1 promoter. The AL1 protein, which occurs as a multimeric complex in solution, also catalyzes DNA cleavage during initiation of rolling circle replication. To identify the tomato golden mosaic virus AL1 domains that mediate protein oligomerization, DNA binding, and DNA cleavage, a series of truncated AL1 proteins were produced in a baculovirus expression system and assayed for each activity. These experiments localized the AL1 oligomerization domain between amino acids 121 and 181, the DNA binding domain between amino acids 1 and 181, and the DNA cleavage domain between amino acids 1 and 120. Deletion of the first 29 amino acids of AL1 abolished DNA binding and DNA cleavage, demonstrating that an intact N terminus is required for both activities. The observation that the DNA binding domain includes the oligomerization domain suggested that AL1-AL1 protein interaction may be a prerequisite for DNA binding but not for DNA cleavage. The significance of these results for AL1 function during geminivirus replication and transcription is discussed.

Geminiviruses are plant DNA viruses characterized by their single-stranded genomes and their double icosahedral particle morphology (for review, see Ref. 1). They replicate their small genomes through double-stranded DNA intermediates in the nuclei of infected plant cells using a rolling circle mechanism (2-4). These properties make geminiviruses unusual among plant viruses, most of which have RNA genomes and/or replication intermediates. Geminiviruses encode only a few proteins for their replication and depend on the host DNA replication machinery. Thus, geminiviruses represent excellent model systems for studying DNA replication mechanisms in plant cells.
Geminivirus genomes consist of either one or two circular DNA components. Each component contains divergent transcription units separated by a 5′-intergenic region. The intergenic region of all geminiviruses includes a hairpin motif with a conserved AT-rich loop sequence that contains the initiation site for (+) strand DNA replication (3, 5-7). A directly repeated sequence upstream from the hairpin is bound by the viral replication protein AL1 and is required for virus-specific DNA replication (8) of tomato golden mosaic virus (TGMV) 1 and bean golden mosaic virus (BGMV). Similar motifs are found in a number of related geminivirus genomes (9), but their functional significance is not known. AL1 is the only viral protein required for replication of all geminiviruses (10,11). (For some geminiviruses, the AL1 homologue is designated C1.) Many geminiviruses also encode a second protein, AL3, which greatly enhances replication (10,12).
The AL1 protein plays key roles in viral DNA replication and transcription. AL1 confers virus-specific recognition of its cognate origin of replication (8) and initiates (+) strand DNA replication (6,7). It represses its own expression at the level of transcription (13) and can enhance transcription of late genes of some geminiviruses (14). In addition, AL1 induces expression of a host DNA synthesis protein, proliferating cell nuclear antigen, in nondividing plant cells (15). Multiple biochemical activities have been described for AL1/C1 proteins. The AL1 proteins of TGMV and BGMV bind double-stranded viral DNA in a site- and virus-specific manner (16). Single-stranded DNA binding activity has also been reported for TGMV AL1 (17). The AL1/C1 proteins from tomato yellow leaf curl virus (TYLCV), wheat dwarf virus, and TGMV cleave the (+) DNA strand in the conserved loop sequence of the hairpin (6,7). Covalent cross-linking of TYLCV C1 to the 5′-end of the cleaved DNA has also been detected (6). In addition, ATPase and GTPase activity has been demonstrated for TYLCV C1 (18). Last, multiple protein interactions have been detected for AL1/C1 proteins. TGMV AL1 interacts with itself, AL3, and RRB1, a maize retinoblastoma homologue (19). 2 Interactions between wheat dwarf virus C1 and retinoblastoma proteins from human and maize have also been reported (21-24).
Recent experiments have begun to identify the functional domains of AL1. The first 211 amino acids of TYLCV C1 are sufficient to confer site-specific DNA cleavage in vitro (25). This region contains three amino acid motifs that are conserved among all geminivirus AL1/C1 proteins and many rolling circle initiator proteins from other systems (26,27). The third motif includes a conserved tyrosine residue that functions in the DNA cleavage and joining reaction and mediates covalent linkage to the 5′-end of nicked DNA (28). The C terminus of AL1/C1 contains a fourth conserved motif, a P-loop, that is found in many NTP binding proteins (29). Mutation of a lysine residue in the P-loop of TYLCV C1 reduced or abolished ATPase activity of the protein (18,30). Mutations in the third DNA cleavage motif and the NTP binding sequence also interfered with geminivirus replication in vivo (18,30).
Genetic experiments using chimeric AL1 proteins mapped virus-specific origin recognition to the N-terminal third of AL1/C1 in TYLCV (31), beet curly top virus (32), TGMV, and BGMV. 3 However, the chimeric studies with TGMV and BGMV showed that interaction between the N terminus of AL1 and the cognate DNA binding motif is only part of the requirements for virus-specific origin recognition in vivo. No biochemical studies have addressed the protein domain involved in AL1-DNA binding. There is also no information regarding the various AL1 protein-protein interaction domains. In this article, we identified the domains of TGMV AL1 that mediate protein oligomerization, DNA binding, and DNA cleavage.
Coding sequences for N-terminal truncated AL1 proteins (Fig. 1A) were constructed by inserting an SphI linker into repaired restriction sites at TGMV A positions 2442 (SalI) and 2059 (NcoI) to create in-frame start codons. An SphI linker was also inserted into a repaired HindIII site of pMON27025 to make pNSB448. The truncated AL1 open reading frames were subcloned as SphI-EcoRI and SphI-BamHI fragments into the same sites of pNSB448 to give pNSB516 (AL1 121-352) and pNSB469 (AL1 182-352), respectively.
Engineered restriction sites and an endogenous NcoI site at TGMV A position 2059 were used to create open reading frames for GST-AL1 fusion proteins lacking N-terminal AL1 sequences. The AL1 coding sequence was modified at TGMV A positions 2516-2517 using the primer 5′-GTAATTGAGAAAGTACTTCTTCTTTGGAC to introduce an ScaI site and create pNSB162 (34). A BstBI site was also introduced in the AL1 coding sequence by mutating TGMV A position 2404 using the primer 5′-GGCAGCAGTATTTTCCTTCGAACTGAATAAGC to make pNSB428. The ScaI-BamHI and BstBI-BamHI fragments from pNSB162 and pNSB428, respectively, were repaired with Escherichia coli DNA polymerase (Klenow fragment) and fused in frame with the GST coding region of pNSB314 at an SmaI site. The resulting plasmids encoded GST-AL1 121-352 (pNSB564) and GST-AL1 66-352 (pNSB563). A trimmed SacI-SmaI fragment from pNSB310 containing the GST coding region was inserted into a repaired NdeI site of pNSB516 to give pNSB547, encoding GST-AL1 182-352.
Expression, Purification, and Oligomerization of AL1 Proteins-Recombinant proteins were produced in Sf9 cells using a baculovirus expression system according to published protocols (7,19). The GST-AL1 fusion proteins were purified by glutathione affinity chromatography (7) and analyzed in vitro for various AL1 activities. Aliquots (3 µg) of the purified proteins were fractionated by SDS-polyacrylamide gel electrophoresis and visualized by staining with Coomassie Brilliant Blue dye.
Protein extracts from Sf9 insect cells co-expressing authentic and GST-AL1 fusion proteins were also assayed for AL1 oligomerization by co-purification on glutathione-Sepharose (19). Co-purification was monitored by SDS-polyacrylamide gel electrophoresis followed by transfer to a nitrocellulose membrane (Schleicher & Schuell) and immunoblotting using the ECL detection system (Amersham Corp.). Primary antibodies were rabbit polyclonal anti-GST (Upstate Biotechnology Inc.) and anti-AL1 antisera (19).
The relative molecular masses of full-length AL1 (AL1 1-352) and the C-terminal truncated proteins AL1 1-120 and AL1 1-181 were determined by size exclusion chromatography of extracts from insect cells infected with the corresponding recombinant baculoviruses. Extracts were prepared by mixing cells for 30 min at 4°C in column buffer (50 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.15 M NaCl, and 1 mM dithiothreitol) supplemented with protease inhibitors (19). The extracts were clarified by centrifugation for 1 h at 100,000 × g. Approximately 0.2 mg of protein (1 mg/ml) was applied to a 50 × 1-cm column of Sepharose CL-6B in column buffer, chromatographed at 0.2 ml/min, and eluted as 0.5-ml fractions. To determine the elution positions of the various AL1 proteins, 75 µl of each fraction was analyzed by SDS-polyacrylamide gel electrophoresis and immunoblotting with anti-AL1 serum as described above. The column was calibrated with protein molecular weight markers (Sigma) individually diluted with column buffer. Ve values of protein standards were determined by monitoring the column effluent at A280. The V0 was determined from the elution volume of blue dextran. Relative molecular masses of AL1 proteins were estimated from linear regression analysis of Ve/V0 versus the logarithm of the molecular masses of the protein standards.
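The calibration step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ordinary least-squares fit of Ve/V0 against log10(molecular mass); the function names and the three standards listed are hypothetical placeholders chosen so the arithmetic is easy to follow, not the Sigma markers or elution values used in this study.

```python
import math

def fit_calibration(standards):
    """Least-squares fit of Ve/V0 = slope * log10(mass_kDa) + intercept.

    standards: list of (mass_kDa, ve_over_v0) pairs for the marker proteins.
    """
    xs = [math.log10(mass) for mass, _ in standards]
    ys = [ratio for _, ratio in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_mass(ve_over_v0, slope, intercept):
    """Invert the calibration line to get an apparent mass in kDa."""
    return 10 ** ((ve_over_v0 - intercept) / slope)

# Hypothetical placeholder standards (mass in kDa, Ve/V0); NOT measured data.
PLACEHOLDER_STANDARDS = [(10.0, 2.5), (100.0, 2.0), (1000.0, 1.5)]

slope, intercept = fit_calibration(PLACEHOLDER_STANDARDS)
apparent_mass = estimate_mass(2.0, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} apparent mass={apparent_mass:.1f} kDa")
```

With real data, the list of standards would hold the measured Ve/V0 of each marker, and `estimate_mass` would be called with the Ve/V0 of the fraction in which each AL1 protein peaks on the immunoblot.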
In Vitro Assays for AL1 Function-DNA electrophoretic mobility shift assays and DNA cleavage assays were performed as described previously (7). For the binding assays, an 83-base pair EcoRI fragment containing the AL1-DNA binding motif (TGMV A positions 28-84) was isolated from pNSB378 and 3′-end-labeled using Klenow and [α-32P]dATP. The radiolabeled DNA was incubated with purified GST-AL1 fusion proteins for 1 h at room temperature. DNA and protein concentrations are provided in the figure legends. The bound and free probes were resolved on 1% agarose gels, dried on Whatman DE-81 paper, and analyzed by autoradiography.
For DNA cleavage assays, a single-stranded oligonucleotide (5′-GTTTAATATTACCGGATGGCCGC) corresponding to the loop and right side of the hairpin structure in the TGMV (+) strand origin was 5′-end-labeled using polynucleotide kinase and [γ-32P]ATP. Approximately 5000 cpm of labeled DNA was incubated with 100 ng of purified GST-AL1 fusion protein in 10 µl of cleavage buffer (25 mM Tris-HCl, pH 7.5, 75 mM NaCl, 5 mM MgCl2, 2.5 mM EDTA, and 2.5 mM dithiothreitol) for 30 min at 37°C. The reactions were terminated by adding 6 µl of gel loading buffer (95% formamide, 20 mM EDTA, and 0.05% bromphenol blue) and heating to 90°C for 2 min. The reaction products were resolved on 15% polyacrylamide denaturing gels.
ATPase assays were performed essentially as described by Desbiez et al. (18). Approximately 300 ng of GST-AL1 fusion proteins were incubated for 30 min at 37°C in a buffer containing 25 mM Tris-HCl, pH 7.5, 20 mM NaCl, 2 mM MgCl2, 0.01% Triton X-100, 40 µM ATP, and 110 fmol [γ-32P]ATP. Free phosphate was extracted and measured according to the protocol described by Seto-Young and Perlin (35) with the following modifications. The reaction was stopped with 3 volumes of 5% ammonium molybdate in 2 N H2SO4, and free phosphate was extracted with an equal volume of n-butanol. Radioactivity in a 50-µl aliquot was measured by liquid scintillation.
AL1 oligomerization was further examined by size exclusion chromatography through Sepharose CL-6B. The chromatographic properties of AL1 1-352 and AL1 1-181, both of which are predicted to oligomerize, were compared with those of AL1 1-120, which lacks the oligomerization domain. The column was calibrated with seven globular proteins of known molecular masses from 12.4 to 660 kDa. Linear regression analysis demonstrated a linear relationship between the elution volumes of the protein standards and the logarithms of their reported molecular masses. Immunoblot analysis indicated that AL1 1-352, AL1 1-181, and AL1 1-120 eluted with apparent molecular masses of 318, 156, and 13.7 kDa (Fig. 2B), respectively, whereas their predicted monomeric molecular masses are 40.5, 20.8, and 14.0 kDa. Thus, the elution profiles of AL1 1-352 and AL1 1-181 demonstrated that they form large protein complexes but that AL1 1-120 occurs as a monomer in solution, corroborating the glutathione affinity chromatography data (Fig. 2A).
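The comparison between apparent and predicted monomeric masses reduces to a simple ratio, sketched below using the values reported in the preceding paragraph. The dictionary and function names are illustrative only, and the ratio should be read as a rough indicator of complex size rather than a subunit count, because complexes in crude extracts may contain other proteins.

```python
# Apparent (gel filtration) and predicted monomeric masses in kDa,
# taken from the values reported in the text above.
MASSES_KDA = {
    "AL1 1-352": (318.0, 40.5),
    "AL1 1-181": (156.0, 20.8),
    "AL1 1-120": (13.7, 14.0),
}

def apparent_to_monomer_ratio(apparent_kda, monomer_kda):
    """Ratio of apparent mass to predicted monomeric mass."""
    return apparent_kda / monomer_kda

ratios = {
    name: apparent_to_monomer_ratio(apparent, monomer)
    for name, (apparent, monomer) in MASSES_KDA.items()
}

for name, ratio in ratios.items():
    print(f"{name}: apparent/monomer = {ratio:.2f}")
```

A ratio near 1 is what a monomer would give, whereas the two proteins that retain amino acids 121-181 give ratios between 7 and 8.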
The AL1-DNA Binding Domain-AL1 binds specifically to a directly repeated DNA sequence in the 5′-intergenic region of the TGMV genome (8,33). We used purified GST fusion proteins truncated at AL1 amino acids 213, 181, and 120 (Fig. 1B) to map the C-terminal boundary of the AL1-DNA binding domain in vitro. GST-AL1 1-352, the GST-AL1 truncations, or GST alone were expressed in insect cells and purified by binding to glutathione resin. The affinity-purified proteins were pure or highly enriched, as determined by Coomassie Brilliant Blue staining of SDS-polyacrylamide gels (Fig. 3, lanes 1-5).
Three amino acid motifs proposed to be part of the DNA cleavage domain are conserved in the N termini of geminivirus AL1/C1 proteins (Fig. 1A) and in the initiator proteins from other rolling circle replication systems (26). GST-AL1 29-352, GST-AL1 66-352, and GST-AL1 182-352, which sequentially delete each motif, were assayed for DNA cleavage activity (Fig. 6B). GST-AL1 1-352 specifically cleaved DNA containing the origin nick site (Fig. 6B, lane 1), whereas all three truncated proteins were deficient for DNA cleavage activity (Fig. 6B, lanes 2-4). Purified GST alone did not cleave DNA (Fig. 6B, lane 5), indicating that the product resulted specifically from AL1 activity. These results demonstrated that sequences in the first 28 amino acids of AL1, which contain motif I, are essential for DNA cleavage.
The N-terminal truncation, GST-AL1 182-352, was deficient for AL1-AL1 interaction and DNA cleavage and did not contain the domain responsible for DNA binding. To verify that GST-AL1 182-352 was properly folded, we assayed for ATP hydrolysis, because previous studies showed that ATPase and GTPase activity is located in the C terminus of TYLCV C1 (18). Equivalent amounts of GST-AL1 1-352 and GST-AL1 182-352 hydrolyzed 80 and 50% of the radiolabeled ATP, respectively, whereas GST showed background levels of free phosphate (data not shown). Thus, all of the truncated GST-AL1 proteins possessed at least one of the activities described for AL1.
DISCUSSION
Small DNA viruses with their limited coding capacities frequently specify proteins that have multiple roles during infection. The range of activities and the complexity of multifunctional viral proteins are best exemplified by SV40 large T antigen, which is involved in replication, transcription, and host induction (36). Recent studies established that the TGMV AL1 protein displays a similar range of functions during geminivirus infection (37,38). To begin to understand the organization of the AL1 protein and how the different activities are coordinated, we mapped the functional domains for TGMV AL1 oligomerization, DNA binding, and DNA cleavage. Our experiments showed that all three functions are mediated by overlapping domains in the N terminus of the AL1 protein.
We mapped the AL1 oligomerization domain by examining the capacities of truncated proteins to co-purify with GST-AL1 1-352. In this assay, AL1 1-181, AL1 1-213, and AL1 121-352 interacted with GST-AL1 1-352. The only region common to all three proteins is from amino acids 121 to 181. Two proteins, AL1 1-120 and AL1 182-352, which lacked this sequence, failed to co-fractionate with GST-AL1 1-352. However, GST fusions of AL1 1-120 and AL1 182-352 were active for DNA cleavage and ATP hydrolysis, respectively, indicating that the truncated proteins were properly folded and that the loss of protein interaction was due to deletion of sequences required for AL1 oligomerization. This conclusion was further supported by gel filtration data showing that the apparent and predicted monomeric molecular masses of AL1 1-120 are equivalent, consistent with it occurring as a single subunit in solution. In contrast, the apparent molecular masses of AL1 1-352 and AL1 1-181 complexes are approximately eight times greater than their predicted monomeric masses. However, the precise stoichiometry of the AL1 subunits could not be determined, because the complexes may have included other AL1-interacting proteins in the crude extracts. Together, these data demonstrated that native AL1 is oligomeric and that amino acids 121-181 are required and sufficient for AL1 oligomerization.
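The mapping logic used here, taking the region shared by every interacting truncation and checking that the non-interacting truncations lack it, can be written as an interval intersection. The sketch below is illustrative: the function names are hypothetical, and the N-terminal truncation is entered as amino acids 121-352, following the construct description in the experimental section.

```python
def common_region(fragments):
    """Return the (start, end) residues shared by all fragments, or None."""
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None

def overlaps(fragment, region):
    """True if a fragment contains at least one residue of the region."""
    return fragment[0] <= region[1] and region[0] <= fragment[1]

# Truncations that co-purified with GST-AL1 1-352 (residue ranges, inclusive).
INTERACTING = [(1, 181), (1, 213), (121, 352)]
# Truncations that failed to co-purify.
NON_INTERACTING = [(1, 120), (182, 352)]

region = common_region(INTERACTING)
print("shared region:", region)
print("absent from non-interacting proteins:",
      all(not overlaps(f, region) for f in NON_INTERACTING))
```

The same two functions apply to any deletion series in which an activity is scored as present or absent for each construct.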
Truncated GST-AL1 proteins were used in electrophoretic mobility shift assays to map the TGMV AL1-DNA binding domain. The failure of GST-AL1 29-352 to bind DNA demonstrated that the first 28 amino acids of AL1 are essential for protein-DNA interactions. The loss of DNA binding activity by GST-AL1 1-120 but not GST-AL1 1-181 located the C-terminal boundary of the DNA binding domain between amino acids 121 and 181. These results showed that the functional domain for DNA binding is between AL1 amino acids 1 and 181. Hong and Stanley (39) reported that the first 57 amino acids of the C1 protein of African cassava mosaic virus (ACMV) are sufficient to repress C1 expression in tobacco protoplasts and proposed that the DNA binding domain of ACMV C1 is located in this region. In similar studies, we found that deletion of as little as 39 amino acids from the TGMV AL1 C terminus abrogated transcriptional regulation, 4 indicating that the DNA binding domain of TGMV AL1 is not the only requirement for repression in vivo. One potential explanation for the observed differences between the TGMV and ACMV proteins may be that the putative ACMV C1 binding site does not contain directly repeated motifs such as those found in the TGMV AL1 binding site. Thus, TGMV AL1 and ACMV C1 may contact their respective promoters differently and may repress transcription through different mechanisms.
AL1 recognition of the (+) strand origin is essential for virus-specific DNA replication (8). Chimeric virus studies showed that the N-terminal third of C1 confers virus-specific replication to closely related strains of TYLCV (31) or beet curly top virus (32). Replication studies using chimeric origins and AL1 expression cassettes established that amino acids 1-116 of TGMV and BGMV AL1 specifically recognize the repeated DNA binding motifs in their respective (+) strand origins in vivo. 3 In contrast, we showed that AL1 amino acids 1-181 are necessary for DNA binding in vitro. The additional sequences between amino acids 121 and 181 required for in vitro DNA binding may contribute essential DNA contacts that are conserved between TGMV and BGMV. Alternatively, AL1 oligomerization, which has been mapped to amino acids 121-181, may be a prerequisite for AL1-DNA binding. Chimeric studies only reveal amino acid differences involved in AL1-DNA interactions and, therefore, cannot distinguish between these possibilities.
DNA binding proteins frequently interact with DNA as dimeric or multimeric complexes (40,41). Two observations support the idea that TGMV AL1 binds DNA as a multimer. First, the TGMV AL1 binding site contains a repeated motif, such that two AL1 subunits could interact simultaneously with the site. Protein dimer interactions with directly repeated sequences have been described for α-2 protein (42) and HAP1 (43). Second, electrophoretic mobility shift assays suggested that AL1 binds DNA as a large multimeric complex, with AL1-DNA complexes failing to enter polyacrylamide gels and only being resolved on agarose gels. Binding experiments using circularly permuted DNA fragments indicated that it is unlikely that the low electrophoretic mobility of the AL1-DNA complex is due to unusual DNA structure or bending. 5 Several assays failed to determine the stoichiometry of the AL1-DNA complexes. Electrophoretic mobility shift assays with full-length and truncated AL1 proteins bound to DNA did not distinguish heterodimer formation (data not shown). In addition, fusion to GST, which dimerizes with itself (44), or addition of AL1 antibodies did not restore DNA binding activity to AL1 1-120. 5 Based on these results, we think that it is unlikely that fusion to a heterologous protein interaction domain will restore DNA binding activity to AL1 1-120 and that a different strategy will be necessary to address the relationship between DNA binding and oligomerization.
AL1 initiates rolling circle replication by introducing a nick into the (+) strand origin of the viral DNA. Heyraud-Nitschke et al. (25) showed that the first 211 amino acids of the TYLCV C1 protein possess DNA cleavage activity. Our results showed that the first 120 amino acids of TGMV AL1 specifically cleave single-stranded DNA containing the (+) strand origin in vitro and that AL1 oligomerization and DNA binding were not prerequisites for cleavage of a single-stranded DNA in vitro. However, AL1-DNA binding may be required for cleavage of the double-stranded viral genome during rolling circle replication in vivo. Three motifs in the N termini of all geminivirus AL1/C1 proteins are also conserved among initiator proteins from other rolling circle systems (26,27). Motif I (FLTY) is located between amino acids 16 and 19. Motif II (HLH) is a putative metal binding site consisting of two histidines within a region of bulky hydrophobic residues. Motif III includes a highly conserved tyrosine residue that is required for DNA cleavage and ligation by TYLCV C1 (28). Although the role of motif I is unknown, deletion of the N-terminal 28 amino acids of TGMV AL1 abolished DNA cleavage activity, suggesting that this conserved element may be essential for DNA cleavage. The loss of DNA cleavage activity by GST-AL1 29-352 precluded any conclusions about motifs II or III based on other N-terminal truncations of AL1.
The AL1/C1 protein sequences from 17 dicot-infecting geminiviruses were compared using the EMBL Predict program (45) to determine whether the N terminus of AL1 contains any conserved structural motifs that might contribute to DNA binding, DNA cleavage, or oligomerization. This analysis revealed two sets of α-helices that are predicted with greater than 80% probability (Fig. 7A). Helices 1 and 2 are between TGMV AL1 amino acids 25-52 in the overlapping DNA binding and cleavage domains and might be involved in these activities. The sequences of both helices show a high degree of homology among different geminiviruses (Fig. 7B), especially helix 2, which is conserved at 9 of 11 positions and displays a strong amphipathic character (Fig. 7C). Most known DNA binding motifs include α-helical regions that recognize and contact DNA (46). However, the AL1 N terminus shows no obvious homology to the α-helical motifs of basic/helix-loop-helix, ets, homeodomain, zinc finger, or basic/leucine zipper proteins (for review, see Refs. 46-48). AL1 helices 1 and 2, which are separated by a 5-amino acid loop, most resemble the helix-turn-helix motif, but our sequence comparison failed to uncover a nearby third helix characteristic of most helix-turn-helix DNA binding domains (49). The second set of predicted α-helices is located between TGMV AL1 amino acids 131 and 152 in the overlapping AL1-DNA binding and oligomerization domains (Fig. 7A). Several classes of DNA binding proteins, including members of the basic/helix-loop-helix, homeodomain, and basic/leucine zipper families, use α-helices for dimerization as well as DNA contacts (46,50,51). The significance of these predicted structures in DNA binding and/or protein interactions is being investigated.
The oligomerization, DNA binding, and DNA cleavage domains are located in the N-terminal half of AL1, whereas very little is known about the C terminus of the protein. To date, the only biochemical activity that has been attributed to the AL1 C terminus is ATP and GTP hydrolysis (18). We found that deletion of only 39 amino acids from the TGMV AL1 C terminus abolished DNA replication and repression in vivo, 5 further demonstrating the functional importance of this region. We also observed that deletion of the C-terminal 139 amino acids of AL1 enhanced DNA binding activity approximately 4-fold in vitro, suggesting that a negative effector of DNA binding may be located in this region. Many transcription factors contain regions that inhibit their DNA binding activity unless complexed with other proteins or co-factors (43,52,53). The AL1 C terminus may also mediate interaction with other proteins, single-stranded DNA binding, nuclear localization, and/or attachment to the nuclear matrix. We are continuing to map the functional domains of TGMV AL1 to gain a more complete understanding of AL1 structure and function. We are also constructing a series of site-directed mutations in the N terminus of TGMV AL1 to address the functional significance of the predicted α-helices and the relationship between DNA binding and protein oligomerization.