Phage φ29 DNA Replication Organizer Membrane Protein p16.7 Contains a Coiled Coil and a Dimeric, Homeodomain-related, Functional Domain*

The Bacillus subtilis phage φ29-encoded membrane protein p16.7 is one of the few proteins known to be involved in prokaryotic membrane-associated DNA replication. Protein p16.7 contains an N-terminal transmembrane domain responsible for membrane localization. A soluble variant lacking the N-terminal membrane anchor, p16.7A, forms dimers in solution, binds to DNA, and has affinity for the φ29 terminal protein. Here we show that the soluble N-terminal half of p16.7A can form a dimeric coiled coil. However, a second domain, located in the C-terminal half of the protein, has been characterized as being the main domain responsible for p16.7 dimerization. This 70-residue C-terminal domain, named p16.7C, also constitutes the functional part of the protein as it binds to DNA and terminal protein. Sequence alignments, secondary structure predictions, and spectroscopic analyses suggest that p16.7C is evolutionarily related to DNA binding homeodomains, present in many eukaryotic transcriptional regulator proteins. Based on the results, a structural model of p16.7 is presented.

In the case of bacteria, replication of the chromosome, plasmids, and infecting phages occurs at the cell membrane (for review see Ref. 1). Besides acting as a scaffold and inherently compartmentalizing DNA replication, membranes are also likely to play an important role in the organization and function of the DNA replication complex (for recent reviews see Refs. 2 and 3). Nevertheless, very little is known about proteins participating in these processes. We started to investigate this fundamental process by using the Bacillus subtilis phage φ29, one of the best studied phages (for review see Ref. 4), as a model system. The genome of φ29 encodes most, if not all, proteins required for phage DNA replication, and detailed knowledge is available on in vitro φ29 DNA replication. This makes φ29 an attractive system to study membrane-associated DNA replication.
The genome of φ29 consists of a linear double-stranded DNA (dsDNA)¹ that contains a terminal protein (TP) covalently linked at each 5′ end. Initiation of φ29 DNA replication occurs via a so-called protein-primed mechanism (4-6). First, the TP-containing DNA ends are recognized by a φ29 DNA polymerase/TP heterodimer. Then, after a transition step, these two proteins dissociate, and the DNA polymerase continues processive elongation, which is coupled to strand displacement, until replication of the nascent DNA strand is completed. Consequently, the DNA replication intermediates can contain extremely long stretches of ssDNA. Ivarie and Pène (7) provided evidence that early expressed φ29 protein(s) are required for membrane-associated φ29 DNA replication. Gene 16.7, present in an early expressed operon, is conserved in all φ29-related phages studied so far (4). Previously, we have shown (4,8,9) that p16.7 (130 amino acids) is involved in the organization of membrane-associated φ29 DNA replication. Protein p16.7 is a membrane protein, and its N-terminal transmembrane domain is responsible for membrane localization (8). Studies using a soluble variant in which the membrane anchor is replaced by a histidine tag, p16.7A, revealed (i) that it is a dimer in solution, (ii) that it has both single-stranded (ss) and dsDNA binding activity, and (iii) that it has affinity for the φ29 TP (8-10).
Here we determine the regions responsible for p16.7A dimerization and identify the functional DNA and TP binding domain. We provide evidence that this functional domain may be evolutionarily related to eukaryotic homeodomains.

EXPERIMENTAL PROCEDURES
Bacterial Strains and Growth Conditions-Escherichia coli strains JM109 (11) and BL21(DE3)pLysS (12) and B. subtilis strain 110NA (13) were used for cloning and overexpression of proteins. Chloramphenicol, kanamycin, and ampicillin were added to cultures and plates at final concentrations of 10, 20, and 100 μg/ml, respectively.
DNA Techniques-All DNA manipulations were carried out according to Sambrook et al. (14). [γ-32P]ATP (3000 Ci/mmol) was obtained from Amersham Biosciences. PCRs to amplify the N- and C-terminal 16.7 regions were done using proofreading-proficient Vent DNA polymerase (New England Biolabs, Beverly, MA) as described previously (9). The purified PCR products were digested with NdeI and NotI, and cloned into the pET-28b(+) vector (Novagen) digested with the same enzymes, resulting in vectors pET-16.7N and pET-16.7C. Site-directed mutants of 16.7A were obtained using the QuikChange™ site-directed mutagenesis kit (Stratagene). Mutant gene 16.7A4 was constructed by stepwise introduction of mutations changing a Leu into an Arg codon using appropriate complementary oligonucleotides and plasmid pUSH16.7A (8) as starting template DNA.
Overexpression and Purification of p16.7A and Its Derivatives and of TP-Protein p16.7A and all its derivatives, except p16.7Cb, contain an N-terminal His6 tag (see Fig. 1A). Therefore, these proteins were purified from the corresponding E. coli strains in which they were overexpressed using Ni2+-NTA resin columns as described before (8). Briefly, overnight cultures of E. coli harboring the appropriate plasmid were diluted 100-fold in fresh prewarmed LB medium and grown to an A600 of 0.6-0.7. Expression was induced by adding isopropyl 1-thio-β-D-galactopyranoside to a final concentration of 1 mM and was allowed to proceed for 2 h. Cells were harvested by centrifugation and stored at −70°C until further use. Frozen cells were thawed at 4°C and ground with twice their weight of alumina powder (Merck) for 20 min. The slurry was resuspended in buffer A (50 mM Tris-HCl, pH 7.5, 0.5 M NaCl, 1 mM EDTA, 5% (v/v) glycerol, and 7 mM β-mercaptoethanol) using 4 volumes per g of cells. To remove the alumina and intact cells, the mixture was centrifuged at 2500 × g. The pellet was resuspended in 2 volumes of buffer A and centrifuged again as before. The pooled supernatants were next centrifuged for 15 min at 15,000 × g to pellet the insoluble proteins. After dialyzing the soluble fraction against buffer B (50 mM NaPO4 buffer, pH 7.8), the cell extract was centrifuged again for 15 min at 15,000 × g, and the supernatant was subsequently passed twice over a Ni2+-NTA resin column equilibrated in buffer B. The column was then washed with 10 column volumes of buffer B containing 20, 30, 40, and 45 mM imidazole. The recombinant protein was eluted with 5 ml of buffer B containing 200 mM imidazole. Next, the eluate was dialyzed against buffer A containing 200 mM NaCl and 50% glycerol, and the protein was stored at −70°C in aliquots. Wild type TP was purified from B. subtilis 110NA cells harboring the TP expression plasmid pPR54w3 (15) essentially as described before (16).
Protein p16.7Cb was obtained by digestion of protein p16.7C with thrombin using the thrombin cleavage capture kit (Novagen). Complete cleavage of p16.7C with thrombin, carried out according to the manual, was confirmed by SDS-PAGE. The biotinylated thrombin used to cleave p16.7C was then removed from the sample by using the streptavidin-agarose column provided with the kit, and SDS-PAGE was used to verify the removal of thrombin. Next, the sample was passed over a Ni2+-NTA resin column to remove the cleaved histidine tag and possible trace amounts of undigested p16.7C. Finally, the sample containing purified p16.7Cb was dialyzed against buffer A containing 200 mM NaCl and 50% glycerol and was stored at −70°C in aliquots. Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry analyses of the intact protein p16.7Cb gave a single peak corresponding to 8461.6 Da (average molecular weight), which corresponds well with the calculated molecular mass of 8457.6 Da if thrombin had cleaved p16.7C at its expected site indicated in Fig. 1A. Moreover, the peptide map obtained after in-gel tryptic digestion of p16.7Cb contained a fragment with a molecular mass of 674.3 Da, which matches the calculated molecular weight of the "GSHMDK" peptide corresponding to the expected N-terminal tryptic fragment. Finally, the sequence of this peptide was confirmed by RP-LC/ESI-IT mass spectrometry analysis.
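The mass check on the "GSHMDK" peptide can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, standard monoisotopic residue masses are assumed, and the text does not state whether the reported 674.3 Da refers to the neutral peptide or to the singly protonated ion typically observed in MALDI-TOF spectra, so both values are computed.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N and C termini)
PROTON = 1.00728   # mass of the charge carrier in [M+H]+


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONOISOTOPIC[aa] for aa in sequence) + WATER


def mh_plus(sequence: str) -> float:
    """m/z of the singly protonated peptide, [M+H]+."""
    return peptide_mass(sequence) + PROTON


if __name__ == "__main__":
    seq = "GSHMDK"  # expected N-terminal tryptic fragment of p16.7Cb
    print(f"neutral mass: {peptide_mass(seq):.2f} Da")
    print(f"[M+H]+      : {mh_plus(seq):.2f}")
```

Under these assumptions the singly protonated value is the one that agrees with the reported 674.3.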
In Vitro Cross-linking and Gel Mobility Shift Assays-Cross-linking reactions using disuccinimidyl suberate (DSS) as cross-linking agent and gel retardation assays were performed as described before (8,9).
CD Spectroscopy and Dissociation/Unfolding Equilibrium Analyses-CD measurements were carried out using a Jasco-600 spectropolarimeter equipped with a NESLAB RTE-100 temperature control unit interfaced to a computer. The recorded far-UV spectra were the average of three to five scans obtained at a rate of 50 nm/min, a response time of 2 s, and a bandwidth of 1 nm. Samples were prepared in 50 mM phosphate buffer, pH 7.5, and 250 mM NaCl, at the indicated concentrations. The temperature was kept constant at 25, 15, or 4°C, and each sample was allowed to reach chemical and thermal equilibrium. The percent helicity of the protein was calculated by using the following equation (17): $\%\ \mathrm{helicity} = \theta_{R} / (\theta_{RH}\,(1 - k/n))$, where $\theta_{R}$ is the molar ellipticity per residue at 222 nm; $\theta_{RH}$ is the molar ellipticity per residue at 222 nm for a 100% α-helical protein, taken as −34,500 degrees·cm²/dmol; $k$ is a constant factor ($k = 2.57$ for $\lambda = 222$ nm); and $n$ is the number of peptide bonds in the protein. Chemical denaturation equilibrium experiments were carried out by measuring the ellipticity at 222 nm of p16.7N solutions at 548 μM in 23 mM phosphate buffer, pH 7.5, and 117 mM NaCl, containing different concentrations of GdmHCl (from 0 to 3.5 M), and using 0.1-mm path length cells. The temperature was kept constant at 4°C, and each sample was allowed to reach chemical and thermal equilibrium. The data were fitted to a folded dimer-to-unfolded monomer, two-state transition. The equations described in Mateu (18) and the program Kaleidagraph (Abelbeck Software) were used.
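As an illustration of the helicity equation, the following Python sketch evaluates it for a given ellipticity at 222 nm. The function and constant names are hypothetical, and the factor of 100 is an assumption added here to express the ratio as a percentage. A second helper converts a percent helicity into an approximate number of helical residues.

```python
THETA_100_HELIX = -34500.0  # deg·cm^2/dmol, value taken for a 100% helical protein
K_222 = 2.57                # chain-length constant at 222 nm


def percent_helicity(theta_222: float, n_peptide_bonds: int,
                     theta_h: float = THETA_100_HELIX,
                     k: float = K_222) -> float:
    """Percent helicity from the molar ellipticity per residue at 222 nm.

    Implements theta_R / (theta_RH * (1 - k/n)); multiplied by 100
    (assumption) so that the result is a percentage.
    """
    if n_peptide_bonds <= k:
        raise ValueError("n must exceed k for the length correction to be valid")
    return 100.0 * theta_222 / (theta_h * (1.0 - k / n_peptide_bonds))


def helical_residues(percent: float, n_residues: int) -> float:
    """Approximate number of residues in helical conformation."""
    return percent / 100.0 * n_residues


if __name__ == "__main__":
    n_res = 72                      # length of p16.7N given in the text
    theta = -14600.0                # hypothetical measured ellipticity
    pct = percent_helicity(theta, n_res - 1)
    print(f"{pct:.1f}% helical, ~{helical_residues(pct, n_res):.0f} residues")
```

The ellipticity used in the usage lines is a made-up input, not a measured value from this work.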
Analytical Size-exclusion Chromatography-A calibrated Superdex 75 HR 10/30 fast protein liquid chromatography column (Amersham Biosciences) was used. The chromatographic process was carried out at constant room temperature (23°C) in 50 mM Tris-HCl buffer, pH 7.5, 300 mM NaCl, 1 mM EDTA, and 1 mM dithiothreitol at 1 ml/min. For an estimation of dissociation constants, analytical gel filtration experiments were carried out as described (19,20).
Partial Proteolytic Digestion of p16.7A-Proteolytic digestion was performed in a final volume of 25 μl, containing 50 mM Tris-HCl, pH 7.5, 1 mM dithiothreitol, 10% (v/v) glycerol, 9 μg of purified p16.7A protein, and the indicated amount (ng) of proteinase K. After incubation for 1 min at 30°C, proteolysis was stopped by addition of 4× loading buffer after which the samples were subjected to SDS-PAGE and Coomassie Blue staining.
Mass Spectrometry Analyses-Protein bands were cut from Coomassie Blue-stained SDS-polyacrylamide gels, minimizing the amount of polyacrylamide gel. Each band was digested "in situ" with trypsin as described before (21). A small aliquot of the trypsin digestion supernatant (0.5 μl) was analyzed directly by MALDI-TOF mass spectrometry using an Autoflex instrument (Bruker Daltonic, Bremen, Germany) equipped with a reflector and employing 2,5-dihydroxybenzoic acid as matrix and an Anchor-Chip surface target (Bruker Daltonic, Bremen, Germany). The experimentally obtained tryptic peptide maps were assigned by comparing their masses with the calculated ones obtained after theoretical tryptic digestion. The assignments were verified by analyzing the various peptides by reverse phase-liquid chromatography coupled to mass spectrometry (RP-LC/MS). For this, an electrospray ion trap (ESI-IT) mass spectrometer (model Deca-XP; Thermo-Finnigan, San José, CA) and a ThermoHypersil (0.18 × 150 mm) C18 column were used. The digests were dried in a speedvac and resuspended in 0.5% acetic acid (dissolved in water) before they were loaded onto the RP-LC/MS system and subjected to an organic gradient. The solvents used were as follows: 0.5% acetic acid in water (aqueous solvent) and 80% acetonitrile in water (organic solvent) at a flow rate of 1.5 μl/min using a micro-spray "metal needle-kit" (Thermo-Finnigan, San José, CA) interface. The analyzer was programmed to isolate and fragment the masses of interest. The obtained MS/MS spectra from each of the peptides were analyzed by assigning the fragments to each of the candidate sequences after calculating the series of theoretical fragmentations according to the nomenclature of the ion series described before (22).
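A theoretical tryptic digestion of the kind used for peptide-map assignment can be sketched in Python as follows. This is an illustrative assumption rather than the software used in this work: it applies the common rule that trypsin cleaves C-terminal to Lys or Arg except when the next residue is Pro, the function name is hypothetical, and the example sequence is a toy input.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return theoretical tryptic peptides of a protein sequence.

    Cleaves after K or R unless the following residue is P, then
    optionally joins neighbouring pieces to model missed cleavages.
    """
    pieces = []
    start = 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder

    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    toy = "GSHMDKAARPLK"  # toy sequence, not the p16.7 sequence
    print(tryptic_digest(toy))
    print(tryptic_digest(toy, missed_cleavages=1))
```

The masses of the resulting peptides can then be compared with the experimentally observed peaks to assign each one.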
Molecular Modeling and Computer Analysis-The secondary structure of p16.7C was predicted using different programs including SAM-T02, PredictProtein, Psi-Pred, and Jpred. The sequence of p16.7C was screened against the protein data base for similarity to other DNA-binding proteins using the three-dimensional PSSM program (URL, sbg.bio.ic.ac.uk/3dpssm/). The three-dimensional model of the C-terminal half of p16.7 was built with the Swiss Model program (URL, expasy.org/swissmod/SWISS-MODEL.html) (23-25).

RESULTS
The p16.7 Region Spanning Residues 21-68 Is Able to Dimerize as a Low Affinity Coiled Coil-Protein p16.7, in which the N-terminal transmembrane domain has been replaced by a histidine tag (Fig. 1A, p16.7A), is a dimer in solution (8). Analyses of the p16.7A sequence revealed that the region spanning amino acid residues ~25-60 (residue numbering refers to the complete p16.7 sequence) has a high probability to form a coiled coil (8). A coiled coil consists of at least two amphipathic α-helices that are wound into a superhelix having a hydrophobic interface (26). The protein sequences of coiled coils are characterized by a heptad repeat of amino acids, denoted a-g. Residues at positions "a" and "d," which are located at the same face of the helix, are predominantly hydrophobic and form the helix-helix interface. A helical wheel showing the amphipathic character of this p16.7 region is presented in Fig. 1B. The following reasons made this region of p16.7 the prime candidate to be responsible for p16.7 dimerization. First, coiled coils are ubiquitous dimerization domains found in a wide range of structural and regulatory proteins. Second, no other region was detected by analyses of the p16.7 protein sequence that a priori would hint at a possible involvement in dimerization. To test the possibility that the putative coiled-coil region is responsible for p16.7 dimerization, plasmids pET-16.7N and pET-16.7N4 were constructed and used to purify proteins p16.7N and p16.7N4 (Fig. 1A). Protein p16.7N encompasses the wild type sequence of the predicted p16.7 coiled-coil region. Protein p16.7N4 contains the same region but includes four Leu (Leu-36, Leu-39, Leu-50, and Leu-53) to Arg substitutions that would completely disrupt the hydrophobic face of the putative α-helix (Fig. 1B) and therefore would be unable to form a coiled-coil-mediated dimer.
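The heptad bookkeeping behind this design can be sketched in Python. The register used here is an assumption made for illustration: Leu-36 is placed at position "a", which is the only assignment under which all four mutated leucines fall on "a" or "d" positions given their spacing. The function name is hypothetical.

```python
HEPTAD = "abcdefg"


def heptad_position(residue: int, anchor_residue: int,
                    anchor_position: str = "a") -> str:
    """Heptad letter (a-g) of a residue, given one residue of known register.

    Assumes an uninterrupted heptad repeat between the anchor and the
    residue of interest (no stutters or skips).
    """
    offset = (residue - anchor_residue + HEPTAD.index(anchor_position)) % 7
    return HEPTAD[offset]


if __name__ == "__main__":
    mutated = (36, 39, 50, 53)  # Leu residues replaced by Arg in p16.7N4
    for res in mutated:
        # Register anchored on Leu-36 = "a" (assumption, see lead-in).
        print(f"Leu-{res}: position {heptad_position(res, 36)}")
```

With this register the four substitutions alternate between "a" and "d" positions across two consecutive heptads, in line with the stated aim of disrupting the hydrophobic face.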
To determine their oligomeric state in solution, proteins p16.7N and p16.7N4 were subjected to in vitro cross-linking (Fig. 2A). Whereas a band with a molecular weight corresponding to a dimer was observed in the case of the wild type p16.7N protein after DSS treatment, no cross-linked species were observed for the mutant p16.7N4 protein. These results indicate that, contrary to mutant protein p16.7N4, the wild type protein p16.7N is able to dimerize in solution. The secondary structure of the wild type and mutant proteins was analyzed by far-UV CD spectroscopy. The CD spectra of p16.7N and p16.7N4 at low and intermediate concentrations (15 and 75 μM) at 25°C were characteristic of a random coil (not shown). However, the spectrum of protein p16.7N obtained at high concentration (590 μM) revealed a substantial helical content at 25°C, which increased progressively at lower temperatures (Fig. 3A). In sharp contrast, the CD spectrum of the mutant p16.7N4 remained characteristic of a random coil at high concentration (590 μM), even at 4°C (Fig. 3A). The estimated helical contents of the proteins at the various conditions tested are given in Table I. A maximum helical content of 44% was obtained for p16.7N at 590 μM and 4°C. Because p16.7N is 72 amino acids long, about 31 amino acids must be contained in a helical structure under these conditions, which is consistent with the sequence-based prediction of a coiled coil spanning residues 29-57 of p16.7 (see above and Fig. 1). The equilibrium between an essentially nonstructured state and a partly helical conformation of p16.7N and its dependence on protein concentration indicate that the transition corresponds to a coupled folding-dimerization process. Dimerization of p16.7N is a very low affinity process, however, because the dimeric form is favored only at high protein concentrations and low temperatures. Moreover, the presence of an isodichroic point for the CD spectra obtained at different temperatures is consistent with a two-state transition between unfolded monomer and folded dimer, with no stable intermediates (27,28). Finally, the observation that mutant p16.7N4 was unstructured under all conditions tested further supports that dimerization of p16.7N proceeds through formation of a coiled coil.
The dissociation/unfolding equilibrium of p16.7N was quantitated in chemical dissociation/denaturation experiments. Samples of p16.7N (548 μM) were treated with increasing concentrations of GdmHCl, and the ellipticity of the samples at 222 nm was obtained from the corresponding CD spectra (Fig. 3B). Because of the very low dimerization affinity, the equilibrium was studied at 4°C. The data fitted well a two-state monomer-dimer equilibrium with a free energy difference in the absence of denaturant $\Delta G_u^{\mathrm{H_2O}} = 5.7 \pm 0.6$ kcal/mol in the standard (1 M) state at 4°C, and an m value (the increase in $\Delta G_u$ upon increasing the denaturant concentration by 1 M), $m = -2.7 \pm 0.7$ kcal/mol·M. The value of $\Delta G_u^{\mathrm{H_2O}}$ corresponds to an equilibrium dissociation/unfolding constant $K_u = 33$ μM at 4°C.
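The relation between the fitted free energy and the dissociation/unfolding constant can be checked numerically. The sketch below is a minimal two-state model written for illustration, not the fitting procedure of Ref. 18: it assumes a linear dependence of the free energy on denaturant concentration, the equilibrium N2 ⇌ 2U with K_u = [U]²/[N2], and protein concentration expressed in monomer units. All function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)


def delta_g(dg_water: float, m_value: float, denaturant_molar: float) -> float:
    """Linear extrapolation: dG_u = dG_u(H2O) + m * [denaturant]."""
    return dg_water + m_value * denaturant_molar


def k_unfold(dg: float, temperature_k: float) -> float:
    """Equilibrium constant (M) from a free energy in kcal/mol (1 M standard state)."""
    return math.exp(-dg / (R_KCAL * temperature_k))


def fraction_unfolded(k_u: float, total_monomer_molar: float) -> float:
    """Fraction of chains present as unfolded monomer for N2 <-> 2U.

    Solves 2*Pt*f**2 + K*f - K = 0 for f, with Pt in monomer units.
    """
    pt = total_monomer_molar
    return (-k_u + math.sqrt(k_u ** 2 + 8.0 * k_u * pt)) / (4.0 * pt)


if __name__ == "__main__":
    temperature = 277.15          # 4 °C
    dg_h2o, m = 5.7, -2.7         # values reported in the text
    protein = 548e-6              # M, monomer units
    print(f"K_u in water: {k_unfold(dg_h2o, temperature) * 1e6:.0f} uM")
    for gdm in (0.0, 1.0, 2.0, 3.0):
        k = k_unfold(delta_g(dg_h2o, m, gdm), temperature)
        print(f"{gdm:.1f} M GdmHCl: fraction unfolded = "
              f"{fraction_unfolded(k, protein):.2f}")
```

Because the equilibrium is bimolecular, the fraction unfolded depends on the total protein concentration as well as on the denaturant, which is why the concentration has to be supplied explicitly.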
In summary, the results clearly indicate that an isolated protein containing amino acid residues 21-68 of p16.7 is able to form, albeit with low affinity, a dimeric coiled coil of about 30 residues in length, in a coupled folding and association process. Fig. 3C is described below.
Protein p16.7C Forms High Affinity Homodimers-To study whether the coiled coil of p16.7A is the main dimerization determinant, a mutant 16.7A gene was constructed encoding a protein, p16.7A4, that contains the same four Leu to Arg substitutions present in p16.7N4 (Fig. 1A). In vitro DSS cross-linking analysis of p16.7A and p16.7A4 (Fig. 2B) revealed two interesting features. First, the amount of cross-linked dimer species was much higher for these proteins as compared with that obtained with p16.7N at the same protein concentration, which suggests that p16.7A and p16.7A4 dimerize much more readily than p16.7N. Second, similar amounts of dimers were obtained for p16.7A and p16.7A4 after DSS cross-linking, indicating that the mutant protein p16.7A4 does form dimers in solution, despite having a disrupted coiled-coil region (see also below).
The oligomerization state of p16.7A (14.5 kDa) and p16.7A4 (14.65 kDa) was confirmed by analytical gel filtration. Both proteins behaved essentially as dimers, even at low concentrations. Their respective apparent molecular masses were 35.5 and 36.3 kDa, close to the calculated values (29.0 and 29.3 kDa, respectively). Analytical gel filtration was also used to estimate the order of magnitude of the dissociation constant. The elution volumes at the extremely low protein concentrations where the monomeric form would predominate could not be determined because of the very low signal-to-noise ratio. Thus, only the final part of the association curve could be traced. Nevertheless, a very good fit to a dimer-monomer equilibrium was obtained (Fig. 4). Also, the fit for p16.7A yielded an apparent molecular mass of 13.5 kDa for the monomeric form, which is very close to the calculated value (14.5 kDa). The dissociation constant thus obtained for p16.7A was about 20 nM. The actual value could be even somewhat lower, because of the possible dilution of the sample in small zone elution experiments (20). Most interestingly, the dissociation constant of p16.7A4 was also in the nanomolar range. These results can be summarized as follows. First, the dissociation constant obtained for p16.7A was more than 3 orders of magnitude lower than that of p16.7N. Second, no significant change in the dissociation constant was observed for p16.7A4 with respect to p16.7A, despite the fact that this variant could not dimerize through the coiled-coil domain. This constitutes strong evidence that a region other than the coiled coil is responsible for the high p16.7A dimerization affinity.
To study whether the isolated C-terminal half of p16.7 is able to dimerize, plasmid pET-16.7C was constructed and used to purify protein p16.7C (Fig. 1A). In vitro cross-linking analyses (Fig. 2C) and analytical gel filtration studies (Table II) demonstrated that p16.7C forms dimers in solution. The values obtained for the weight average partition coefficient corresponded very well with that calculated for the dimeric form (Table II) and remained constant over the protein concentration range tested (from 50 down to 0.1 μM). As no significant amount of the monomeric form could be detected at concentrations as low as 100 nM, the dissociation constant cannot be higher than about 10 nM (at 25°C). This value is in the same order of magnitude as that of p16.7A and p16.7A4 at the same temperature and is more than 3 orders of magnitude lower than the dissociation constant of p16.7N, even though the latter value was determined at a lower temperature (4°C).
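The comparison of dissociation constants can be made concrete with a small Python sketch. It assumes a simple dimer-monomer equilibrium, D ⇌ 2M with K_d = [M]²/[D], and total protein concentration in monomer units; it ignores the sample dilution that occurs on the column, and the function names are hypothetical.

```python
import math


def monomer_fraction(kd_molar: float, total_monomer_molar: float) -> float:
    """Fraction of chains present as monomer for D <-> 2M.

    Solves 2*Pt*f**2 + Kd*f - Kd = 0 for f, with Pt in monomer units.
    """
    pt = total_monomer_molar
    return (-kd_molar + math.sqrt(kd_molar ** 2 + 8.0 * kd_molar * pt)) / (4.0 * pt)


def orders_of_magnitude(kd_weak: float, kd_tight: float) -> float:
    """log10 ratio between a weak and a tight dissociation constant."""
    return math.log10(kd_weak / kd_tight)


if __name__ == "__main__":
    kd_p167n = 33e-6   # M, p16.7N at 4 °C (from the denaturation analysis)
    kd_p167a = 20e-9   # M, p16.7A (from gel filtration)
    print(f"difference: {orders_of_magnitude(kd_p167n, kd_p167a):.1f} orders of magnitude")
    for conc in (50e-6, 1e-6, 0.1e-6):
        print(f"{conc * 1e6:5.1f} uM total: "
              f"p16.7A-like monomer fraction = {monomer_fraction(kd_p167a, conc):.2f}, "
              f"p16.7N-like = {monomer_fraction(kd_p167n, conc):.2f}")
```

Scanning the concentration range in this way shows why a tight dimer stays essentially dimeric over the tested range whereas a weak one would dissociate almost completely.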
The Coiled Coil Is Formed in the Context of p16.7A-The results described above show that the coiled-coil-mediated dimerization of p16.7N is a very low affinity process. High affinity dimerization of p16.7A through the C-terminal halves of the protein will restrict the mobility and orientation of the two N-terminal regions, which would increase the frequency of productive collisions and thereby shift the equilibrium of the N-terminal regions toward association in a parallel coiled-coil structure. As a consequence, the coiled coil might be formed at lower concentrations and higher temperature in the complete p16.7A protein than in its isolated form present in p16.7N. To test this possibility, the helical content of p16.7A and p16.7A4 was estimated by CD. The calculated helical contents of p16.7A and p16.7A4 (at 15 μM and 25°C) were 59 and 27%, respectively (Fig. 3C and Table I). The reduction in helical content in p16.7A4 (~39 residues) corresponds roughly to the size of the coiled-coil region that would be disrupted by the mutations introduced. Thus, the N-terminal coiled coil is formed in the context of p16.7A, even at a low concentration and at 25°C.
The Coiled Coil and the C-terminal Region Are Separated by a Protease-sensitive Linker-The results described above show that the isolated p16.7 regions spanning residues 21-68 (present in p16.7N), and 63-130 (present in p16.7C) contain a low and high affinity dimerization domain, respectively, and indicate that each region constitutes a separate domain. To verify this latter assumption and to study whether a protease-sensitive linker connects these two domains, p16.7A was subjected to partial proteolysis using the nonspecific cleavage site protease, proteinase K (see "Experimental Procedures"). Partial digestion of p16.7A gave two major proteolytic fragments with apparent molecular masses of ~7.5 and ~7 kDa (Fig. 5A), which, for simplicity, will be referred to as proteinase K fragment A and B, respectively. To gain insight into their nature, the tryptic peptide maps of these two proteolytic fragments were compared with the tryptic map of proteinase K-untreated p16.7A. Therefore, the two proteolytic p16.7A fragments as well as the complete p16.7A protein were digested in situ with trypsin, and the resulting peptides were subsequently analyzed by MALDI-TOF-type mass spectrometry. The results of these analyses are schematically presented in Fig. 5B. Comparison of the experimentally obtained peptide masses for p16.7A with the theoretically predicted ones after trypsin digestion of p16.7A allowed us to assign unambiguously eight tryptic peptides, which together covered almost the entire p16.7A protein.
Only three regions, ranging between four and eight residues, were not detected. The expected peptide sequence of the tryptic fragments was verified by RP-LC/ESI-IT mass spectrometry (not shown). Most interestingly, high levels of the tryptic fragments 5-7, which map to the C-terminal region (down to p16.7 residue 65), were detected in proteinase K fragment B. Reciprocally, high levels of the N-terminally located tryptic fragments 2 and 3 (up to p16.7 residue 60) were detected in proteinase K fragment A. Thus, partial proteolysis of protein p16.7A with proteinase K generated two main products corresponding to the N-terminal region encompassing the coiled-coil region, and the C-terminal region containing the high affinity dimerization domain, demonstrating that these are two separate structural modules that are connected by a short proteinase K-sensitive linker region. The inference that proteinase K cleaved preferentially in the p16.7 region spanning residues 61-65 (corresponding to the protein sequence "SIDK") is further supported by the observation that fragment 4 was only detected in the proteinase K-untreated p16.7A sample. Finally, these results demonstrate that the proteins p16.7N and p16.7C encompass the complete coiled-coil and C-terminal domain, respectively.
The C-terminal Region Constitutes the Functional Domain of Protein p16.7-Protein p16.7A can bind both ssDNA and dsDNA and has affinity for the φ29-encoded TP (9, 10). We have tested whether these functional properties are specifically associated with the p16.7 C-terminal region. To exclude the possibility that the positively charged histidine tag of p16.7C (Fig. 1A) might be (partly) responsible for DNA binding or for the interaction with TP, protein p16.7C was cleaved with thrombin, and the resulting purified protein, p16.7Cb (Figs. 1A and 6A), was used. Mass spectrometry analyses of p16.7Cb confirmed that cleavage of the histidine tag had occurred at the correct place (see "Experimental Procedures" for details). In vitro DSS cross-linking experiments were performed to determine whether p16.7Cb (8.5 kDa) has affinity for TP (31 kDa). The cross-linked samples were subjected to SDS-PAGE followed by Western blot analysis using polyclonal antibodies against TP (Figs. 1A and 6B). Similar to results obtained with p16.7A (9), besides signals corresponding to TP monomers and dimers, an additional signal was detected when the sample contained both TP and p16.7Cb. This band, which has an apparent molecular weight corresponding to a p16.7Cb/TP heterodimer (~40 kDa), was also detected when the blot was stripped and re-used with antibodies against p16.7, confirming that it contains both TP and p16.7Cb (not shown). Thus, the C-terminal half of p16.7 is sufficient to interact with TP.
FIG. 4. Dissociation curve of p16.7A obtained by zonal analytical gel filtration. The weight average partition coefficient at each protein concentration is plotted (19,20). The curve was fitted to a dimer-monomer equilibrium as described (19).
TABLE II footnotes. V_t is the total column volume, and V_0 the void volume (20). Experimental values were obtained in analytical gel filtration experiments. Theoretical values were deduced from the true molecular weight of the protein in dimeric or monomeric form. c, Frontal elution was used.
Gel retardation studies were performed to determine whether p16.7Cb has ssDNA and dsDNA binding capacity. Thus, the 297-bp right end fragment of the φ29 genome was end-labeled and used directly or after heat denaturation in gel mobility shift assays. The results (Fig. 6, C and D) show that p16.7Cb and p16.7A have similar DNA binding capacities. Together, these results demonstrate that the basic functional properties of p16.7, binding to DNA and to φ29 TP, reside in the C-terminal region of the protein.
The Functional Domain of Protein p16.7 Has a Helical Structure and May Be Evolutionarily Related to Eukaryotic Homeodomains-A sequence-based homology search was carried out for the functional domain of p16.7. Most interestingly, the sequences with the highest similarity (around 20 and 40% of identity and similarity, respectively) corresponded to DNA binding homeodomains, which are present in a large family of eukaryotic transcription factors (29,30). Homeodomains contain about 60 amino acid residues and are composed of three α-helices that are folded into a tight globular structure (31). An alignment between the C-terminal region of protein p16.7 and a consensus sequence based on 346 homeodomain sequences (Fig. 7) shows that various residues highly conserved in most homeodomains, and critical for either structure or function, are also conserved in the functional domain of p16.7. For instance, homeodomains contain four highly conserved residues (Trp-48, Phe-49, Asn-51, and Arg-53) in their DNA recognition helix III. Of these, Arg-53 is conserved in p16.7 (corresponding to Arg-113). In addition, the aromatic homeodomain residue Trp-48 corresponds to the aromatic tyrosine residue 108 in p16.7, and the apolar homeodomain residue Phe-49 corresponds to p16.7 residue isoleucine 109. Moreover, the invariant homeodomain residue Leu-16 is conserved in p16.7 (corresponding to Leu-76). In addition to sequence similarity, p16.7C and homeodomains are also similar in their secondary structure, as deduced from the following observations. First, various secondary structure prediction programs invariably predicted three α-helical segments within the C-terminal domain of p16.7, the position and length of which corresponded neatly to those present in homeodomains (Fig. 7). Second, experimental evidence that is consistent with the above predictions was obtained by CD and NMR spectroscopy. The CD spectrum of p16.7C indicates a helical content of about 40% (Fig. 3C and Table I), very similar to that derived from the secondary structure predictions and also similar to that found in homeodomains. In addition, preliminary analyses by NMR spectroscopy of p16.7C confirm the helical content estimated by CD and the predicted positions of the helical segments in the sequence.² The homeodomain of the human Pbx1 protein, for which the three-dimensional structure has been solved (32,33), is among the homeodomains sharing the highest level of similarity with the C-terminal region of p16.7 (Fig. 7). Based on the structure of the Pbx1 homeodomain, a tentative homology-based model for the tertiary structure of the functional domain of p16.7 was constructed (Fig. 8; see "Discussion").
FIG. 5. Protein p16.7A contains two structurally separate domains as assessed by partial proteolysis. A, Coomassie-stained SDS-PAGE of proteolytic fragments resulting from digestion of p16.7A (9 μg) for 1 min at 30°C with the indicated amounts of proteinase K (see "Experimental Procedures"). C, protein standards; 0, undigested protein p16.7A (14.5 kDa). The two major proteolytic fragments of ~7.5 and ~7 kDa are indicated as fragment A and B, respectively. B, schematic overview of MALDI-TOF-type mass spectrometry analysis of the "in gel" trypsin digestion-generated products of intact p16.7A and those of the proteinase K-generated proteolytic fragments A and B. The positions of the theoretical trypsin cleavage sites and those that were actually cleaved by trypsin are indicated with black and red arrows, respectively, below p16.7A, which is depicted as a thick black bar. The eight tryptic fragments (numbered 1-8), detected by analysis of intact trypsin-digested p16.7A and their position with respect to the protein p16.7A sequence, are indicated with thin bars. The experimentally obtained molecular mass of all fragments deviated less than 0.5 Da from the calculated one, and their expected sequence was verified by RP-LC/ESI-IT mass spectrometry analysis (not shown). The following small tryptic fragments were not detected: KKQEAR (located between fragments 1 and 2), VVQR (located between fragments 2 and 3), and KLYRGSLK (extreme C-terminal fragment). The tryptic fragments corresponding to the N- and C-terminal part of p16.7A are shown in red and blue, respectively. Although trace amounts of the C-terminal tryptic fragments were detected in the sample containing fragment A and, vice versa, N-terminal tryptic fragments in the fragment B sample, there was at least a 10-fold difference in the relative level of the respective peptides between both samples. Because of their similar molecular weights, minor amounts of one fragment might have contaminated the other one during excision of the gel. Fragment 4, shown in black, was only detected in the intact p16.7A processed sample. The SIDK sequence located between the N-terminal fragment 3 and C-terminal fragment 5 is shown enlarged at the top. The p16.7 coiled-coil (red) and the C-terminal containing region (blue), present in proteins p16.7N and p16.7C, respectively, are indicated above the bar representing p16.7A.
FIG. 6. The C-terminal half of p16.7 constitutes the functional domain. A, protein p16.7C was digested with thrombin, and the resulting p16.7Cb protein was purified as described under "Experimental Procedures." Purified p16.7C and p16.7Cb were subjected to SDS-PAGE and Coomassie Blue staining. The calculated molecular masses of p16.7C and p16.7Cb are 10.3 and 8.4 kDa, respectively. B, protein p16.7Cb interacts with the TP as assessed by in vitro cross-linking. Samples containing TP (2.5 μM) with or without protein p16.7Cb (2.5 μM) were treated with DSS after which they were subjected to SDS-PAGE followed by Western blotting using antibodies against TP. The monomer and dimer position of TP, and that of the heterodimer, are indicated. C and D, protein p16.7Cb is able to bind both ssDNA and dsDNA. Gel mobility assays were used to study the p16.7Cb binding activity to dsDNA (C) and ssDNA (D). Protein p16.7A was included as internal control. The 297-bp φ29 right fragment, labeled at its 5′ end, was incubated directly (C) or after heat denaturation (D) in the absence or presence of increasing amounts (0.37, 0.75, 1.…
FIG. 7. … (34). Numbers between slashes indicate the amino acid position from the N terminus of the proteins. The three amino acid insertion at the C terminus of homeodomain helix I of the Pbx1 homeodomain (residues LSN) is labeled abc. Residues of the C-terminal half of p16.7 or the Pbx1 homeodomain are indicated with an asterisk when identical to one of the six most frequent amino acids occupying the corresponding homeodomain residue. Vertical bars and colons indicate residues that are identical or conserved, respectively, with the consensus homeodomain sequence. The yellow-boxed regions indicate α-helices. For Pbx1 homeodomain the α-helices are based on the three-dimensional structure (32,33). The indicated α-helices in the consensus sequence are a composite derived from the structures of the Antp, en, and MATα2 homeodomains (34). The indicated α-helices for p16.7C are based on the Sam-t02-dssp-ehl secondary structure prediction algorithm and agree with preliminary NMR data.² Residues enclosed in blue reflect hydrophobic residues that are important for formation of the hydrophobic core of homeodomains, and thus largely responsible for its tertiary structure.

DISCUSSION
Here we show that an isolated protein containing the predicted p16.7 coiled-coil region can form a dimeric coiled coil but with a very low affinity. Most interestingly, p16.7C, lacking the coiled-coil region, forms high affinity homodimers.
Thus, the main dimerization interface in p16.7A is not located in the N-terminal domain but in the C-terminal domain. However, despite the low dimerization affinity of p16.7N, the coiled coil is formed in p16.7A. Dimerization of p16.7 through the C-terminal region will restrict the mobility and orientation of the two N-terminal segments, thus increasing the frequency of productive collisions and shifting the equilibrium toward association in a parallel coiled-coil structure. In the native protein, p16.7, the membrane anchor domain may restrict even more the relative mobility and orientation of the polypeptide chain and, hence, further facilitate formation of a coiled-coil.
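The equilibrium argument above can be illustrated with a textbook two-state treatment. This is only a sketch under simplifying assumptions, not a model fitted to the data: the symbols K_d (dissociation constant of the isolated coiled coil), C_t (total chain concentration), and C_eff (effective local concentration of one N-terminal segment relative to the other once the C-terminal domains hold the chains together) are introduced here for illustration, and no values for them are given in the text.

```latex
% Untethered case (isolated p16.7N): two free chains M associate into a dimer D.
2\,M \rightleftharpoons D, \qquad
K_d = \frac{[M]^{2}}{[D]}, \qquad
C_t = [M] + 2\,[D]

% Fraction of chains present as dimer, obtained by solving for [M]:
[M] = \frac{-K_d + \sqrt{K_d^{2} + 8\,K_d\,C_t}}{4}, \qquad
f_{\mathrm{dimer}} = 1 - \frac{[M]}{C_t}

% Tethered case (p16.7A): coiled-coil formation becomes effectively
% unimolecular, with C_eff taking the role of the free chain concentration.
f_{\mathrm{coil}} = \frac{C_{\mathrm{eff}}}{C_{\mathrm{eff}} + K_d}
```

Under these assumptions, a coiled coil with a K_d well above the experimental C_t is mostly monomeric on its own, yet is largely formed whenever tethering makes C_eff exceed K_d. This is the qualitative behavior described for p16.7N compared with p16.7A.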
The coiled-coil region, although not the primary dimerization domain, may be structurally relevant in several ways. It may be important for positioning the functional C-terminal domain at a certain distance from the cell membrane. Alternatively or additionally, the solvent-exposed residues of the coiled coil may be involved in interactions with other proteins.
The low dimerization affinity of the p16.7 coiled-coil domain is probably a consequence of the two Arg-46 residues, which in the dimeric, parallel coiled coil are predicted to lie near each other on the hydrophobic face of the helices (Fig. 1B) and might therefore cause a mutual electrostatic repulsion. Protein p16.7 homologues are found in all φ29-related phages studied so far (4,8). All of them are predicted to have a modular structure similar to that of p16.7, including a coiled-coil domain that invariably contains a charged residue at the position corresponding to Arg-46 of φ29 p16.7. Thus, it is tempting to propose that the low dimerization affinity of the p16.7 coiled-coil domain may constitute an evolutionarily selected trait.
The results clearly show that the C-terminal domain is responsible for the high affinity dimerization of p16.7. At present, the residues of the C-terminal domain involved in the dimerization interface are unknown. However, a likely candidate is a solvent-exposed hydrophobic patch formed mainly by residues belonging to helix I in the predicted model of this domain (Fig. 8). Proteolytic analyses demonstrated that the coiled-coil region and the high affinity C-terminal part of p16.7 are two separate structural domains, which are separated by a proteinase K-sensitive linker. Protein p16.7C is able to interact with φ29 TP, ssDNA, and dsDNA, and thus constitutes the essential functional domain of p16.7. This conclusion is also supported by the observation that, within the φ29 family of phages, the C-terminal half of the p16.7 homologues has a higher level of conservation when compared with the N-terminal half (4).
A combination of experimental results and theoretical predictions allows us to propose that the C-terminal functional domain of p16.7 is evolutionarily related to a typical eukaryotic DNA binding domain, the homeodomain, present in hox proteins (34). This evidence includes the following: (i) a sequence similarity of about 40% with homeodomains (Fig. 7) (no other protein in the data bases scored a higher identity); (ii) the conservation of many residues that are also conserved in homeodomains and that are critical for their structure and/or function; (iii) the predicted and experimentally confirmed helical content and position of the three helices found in all homeodomains; and (iv) the homology-based generation of a structurally sound model for the approximate tertiary structure of the C-terminal domain of p16.7 based on the Pbx1 homeodomain x-ray structure (Fig. 8).
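The residue correspondences cited as evidence (Leu-16/Leu-76, Trp-48/Tyr-108, Phe-49/Ile-109, and Arg-53/Arg-113) can be tabulated to make the numbering relationship explicit. The following is a minimal sketch in Python. The names PAIRS, numbering_offsets, and identical_positions are hypothetical and introduced only for illustration. The constant offset it reports holds for these four stated pairs and is not a claim about the whole alignment, which may contain gaps elsewhere.

```python
# Correspondences stated in the text:
# homeodomain position -> (homeodomain residue, p16.7 position, p16.7 residue)
PAIRS = {
    16: ("Leu", 76, "Leu"),
    48: ("Trp", 108, "Tyr"),
    49: ("Phe", 109, "Ile"),
    53: ("Arg", 113, "Arg"),
}


def numbering_offsets(pairs):
    """Return p16.7 position minus homeodomain position for each pair."""
    return {hd_pos: p167_pos - hd_pos for hd_pos, (_, p167_pos, _) in pairs.items()}


def identical_positions(pairs):
    """Return homeodomain positions where both proteins carry the same residue."""
    return sorted(hd_pos for hd_pos, (hd_res, _, p_res) in pairs.items() if hd_res == p_res)


if __name__ == "__main__":
    print(numbering_offsets(PAIRS))
    print(identical_positions(PAIRS))
```

For the four pairs listed, the offset is the same at every position, and two of the four positions (16 and 53) are identical while the other two are conservative substitutions, as described in the text.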
Despite structural similarities there are also differences between p16.7C and homeodomains, which probably are related to their distinct functions. Whereas homeodomain proteins are transcriptional regulators playing critical roles in eukaryotic development and body plan formation, protein p16.7 is involved in membrane-associated organization of phage φ29 DNA replication. Thus, although nearly all homeodomains bind exclusively dsDNA in a sequence-specific way, p16.7C is able to bind both dsDNA and ssDNA without apparent sequence specificity. These differences may be due to the oligomeric state of these proteins. Protein p16.7 functions as a homodimer. On the other hand, homeodomain proteins usually bind dsDNA with high affinity and specificity as a heterodimer consisting of two homeodomain-containing proteins (35). DNA binding of homeodomains involves residues that make nonspecific interactions with the sugar phosphate backbone and other residues that make specific interactions with the bases (34). The fact that homeodomain residues involved in nonspecific DNA binding are preferentially conserved in p16.7C may also explain, at least in part, the lack of sequence specificity of p16.7. Finally, the functional differences between homeodomains and p16.7C could also be related to some difference in the tertiary structure.
Similar to φ29, the genome of the eukaryotic adenoviruses consists of a linear dsDNA having a TP at each DNA 5′ end and replicating via a protein-primed mechanism (for reviews see Refs. 6 and 36). The cellular transcription factor Oct1 plays an important role in initiation of adenovirus DNA replication by recruiting the DNA polymerase-TP heterodimer complex to the origin of replication. This recruitment occurs through specific contacts between the Oct1 homeodomain and the adenovirus TP (37). The fact that the homeodomain-like C-terminal region of p16.7 interacts with the φ29 TP reflects a remarkable structure-function parallelism between Oct1 and p16.7.
In conclusion, the results presented here, together with those obtained previously (8-10), show that dimeric p16.7 is composed of three clearly distinguishable domains as follows: (i) a 20-amino acid-long N-terminal transmembrane domain that serves as an anchor for the viral replication components to be located at specific sites within the cell; (ii) an intermediate, ~30-amino acid-long coiled-coil domain, which could serve as a rigid spacer between the membrane and the functional domain, and/or to interact with other protein(s); and (iii) a functional C-terminal domain that may be evolutionarily related to homeodomains, which binds TP and DNA and acts, in addition, as the main determinant for p16.7 dimerization. This modular organization would effectively allow p16.7 to contribute to the organization of the φ29 DNA-replication complex at the membrane of the infected cell.

FIG. 8. Homology-based model for the tertiary structure of p16.7C. The model is based on the tertiary structure of the homeodomain of the human Pbx1 protein (32). The prediction was done using the Swiss-Model program of the Expasy Molecular Biology Server. The C-terminal domain of p16.7 and the Pbx1 homeodomain are illustrated in yellow and violet, respectively. The model superimposed very well with the Pbx1 homeodomain structure, except in the loop between helices I and II, where the Pbx1 has a three-amino acid insertion. The Ramachandran plot of the model showed only two residues in a forbidden region, and the number of steric clashes between atoms was limited. The polar-apolar distribution was found reasonable, except that two polar side chains, especially arginine 36, were predicted to be buried. However, arginine 36 could form a buried salt bridge with glutamic 10 in the model.