The DUF328 family member YaaA is a DNA-binding protein with a novel fold

DUF328 family proteins are present in many prokaryotes; however, their molecular activities are unknown. The Escherichia coli DUF328 protein YaaA is a member of the OxyR regulon and is protective against oxidative stress. Because uncharacterized proteins involved in prokaryotic oxidative stress response are rare, we sought to learn more about the DUF328 family. Using comparative genomics, we found a robust association between the DUF328 family and genes involved in DNA recombination and the oxidative stress response. In some proteins, DUF328 domains are fused to other domains involved in DNA binding, recombination, and repair. Cofitness analysis indicates that DUF328 family genes associate with recombination-mediated DNA repair pathways, particularly the RecFOR pathway. Purified recombinant YaaA binds to dsDNA, duplex DNA containing bubbles of unpaired nucleotides, and Holliday junction constructs in vitro with dissociation equilibrium constants of 200-300 nM. YaaA binds DNA with positive cooperativity, forming multiple shifted species in electrophoretic mobility shift assays. The 1.65-Å resolution X-ray crystal structure of YaaA reveals that the protein possesses a new fold that we name the cantaloupe fold. YaaA has a positively charged cleft and a helix-hairpin-helix DNA-binding motif found in other DNA repair enzymes. Our results demonstrate that YaaA is a new type of DNA-binding protein associated with the oxidative stress response and that this molecular function is likely conserved in other DUF328 family members.

Bacteria confront diverse environmental stresses and mount defensive responses to ensure survival and growth. The accumulation of reactive oxygen species is a common stressor, causing cellular oxidative stress and eliciting a robust and multifaceted response in bacteria. In Escherichia coli, one of the primary sensors for oxidative stress is the OxyR transcription factor, which regulates the expression of ~20-40 genes in response to oxidation of its regulatory cysteine residues (1,2). Hydrogen peroxide is the dominant OxyR effector and therefore most of the genes controlled by OxyR are involved in the peroxide stress response (3). Although many of these downstream peroxide-responsive proteins have been extensively characterized in bacteria, some are surprisingly poorly understood.
One peroxide-responsive protein whose function is unclear is E. coli YaaA (YaaA Ec, locus b0006), a member of the DUF328/UPF0246 family. YaaA Ec is up-regulated in response to increased H2O2 levels (1), and it decreases intracellular Fe2+ levels in E. coli by an unknown mechanism (4). Fe2+ is cytotoxic in the presence of hydrogen peroxide because of the production of highly reactive hydroxyl radicals via Fenton chemistry (5). Therefore, the tight control of Fe2+ is important during peroxide stress and helps explain why YaaA Ec is under the control of the OxyR regulon.
Gene expression array analysis showed that the transcription level of yaaA in E. coli is reduced in stationary phase and in anaerobic conditions and increased during recovery in LB broth from a stationary phase inoculum (6). This observation suggests that YaaA is important for the exponential phase of aerobic growth. In addition, the transcription levels of yaaA, yaaJ (a putative transport protein next to yaaA), and ten genes (fhuA, fhuF, fiu, cirA, entC, entB, exbD, fecI, fepB, and fepD) that are responsible for iron acquisition were considerably lower in exponential phase growth of an rpoS mutant compared with WT E. coli MG1655 (7), indicating that RpoS regulates YaaA in E. coli. The connection between iron, peroxide, and YaaA was bolstered by the recent report that a yaaA mutant of Klebsiella pneumoniae has impaired survival in the presence of H2O2 in iron-replete culture conditions, whereas there is no difference in survival of WT and yaaA mutant K. pneumoniae in oxidative stress under low-iron conditions (8).
Bioinformatic analyses of the DUF328 family have indicated a link between these proteins and nucleic acid metabolism, particularly to the metabolism of the 7-deazaguanosine-modified nucleosides (9). A DUF328 domain is found in DpdC proteins involved in the formation of 2'-deoxy-7-amido-7-deazaguanosine (dADG) from 2'-deoxy-7-cyano-7-deazaguanosine (dPreQ0) in vivo (10). The 7-deazaguanosine derivatives dADG and dPreQ0 are recently discovered DNA modifications encoded by the dpd cluster found in a diverse set of bacteria. Of the 11 genes found in the Salmonella serovar Montevideo dpd cluster (dpdA-dpdK), dpdC probably encodes an enzyme that hydrolyzes dPreQ0 to dADG (10). E. coli has the queuosine-tRNA pathway but no dpd genes, suggesting that YaaA Ec is likely to have a distinct function. A further cellular connection between YaaA Ec and DNA repair was reported by Liu et al. (4), who observed enhanced mutation rates in ΔyaaA E. coli and severe growth defects for ΔyaaA recA56 and ΔyaaA polA1 double mutant strains that had intact oxidative stress response enzymes when grown in an aerobic atmosphere. RecA is a ssDNA-binding protein that plays an important role in DNA recombination and repair (11). The recA56 mutant is defective in ATP-regulated formation of nucleoprotein filaments and can partially impair recombination by WT RecA (12). PolA (DNA polymerase I) is an E. coli DNA polymerase that is involved primarily in DNA repair; the polA1 mutation results in increased mutagenesis and diminished base mismatch repair (13,14). The ΔyaaA recA56 and ΔyaaA polA1 double mutants displayed growth defects in aerobic growth conditions that otherwise required a severely impaired oxidative stress response to be observed in the ΔyaaA single mutant E. coli (4). In aggregate, these results suggest a functional link between YaaA Ec, oxidative stress, and DNA repair.
Despite a growing body of data about the role of YaaA and other DUF328 proteins in bacterial oxidative stress defense, little is known about any DUF328 protein at the molecular level, including their structures or functions.
Here, we provide a direct molecular connection between the DUF328 family, specifically YaaA Ec , with DNA maintenance and the oxidative stress response in bacteria. Bioinformatic analysis shows a strong association between DUF328 proteins and genes involved in DNA recombination and oxidative stress response. We find that YaaA Ec directly binds to DNA in vitro, and the 1.65-Å resolution X-ray crystal structure of YaaA Ec shows that the protein possesses a new fold with a positively charged cleft and a helix-hairpin-helix (HhH) DNA-binding motif found in other proteins that bind DNA in a nonsequence-specific manner. As the first member of the DUF328 family to be structurally characterized, YaaA Ec establishes a new protein fold family that we propose be named the cantaloupe fold. Our results show that the DUF328 family comprises DNA-binding proteins that play important roles in DNA maintenance during oxidative stress, assigning a molecular function to this family of proteins.

Results
Comparative genomic analyses show that DUF328 genes are widespread and are fused or physically clustered with genes involved in oxidative stress and DNA repair
YaaA Ec is a member of the DUF328 (IPR005583/H2O2_YaaD/PF03883) family. Analysis of the taxonomic distribution of members of this family shows that it is widespread in bacteria (Fig. 1), with homologs found in ~45.0% (10,565/23,458) of the reference organisms in the Genome Taxonomy Database (GTDB) (15). Phyla that harbor the most DUF328 family members are Campylobacterota (~87.4%, 222/254), Actinobacteriota (~83.9%, 2617/3118), Proteobacteria (~59.1%, 4512/7630), and Bacteroidota (~55.3%, 1573/2843), where the values in parentheses indicate the fraction of total reference organisms that contain DUF328 members. In contrast, DUF328 members are much less widespread in archaea, with homologs found in only ~1.3% (31/2,392) of the reference organisms in the GTDB (Fig. S1). Genes with related functions often cluster together in bacterial genomes. To capture gene neighborhood associations of the DUF328 family, we performed a sequence similarity network (SSN) analysis combined with genome neighborhood network (GNN) analysis (16,17). Using a stringent alignment score threshold of 90, YaaA Ec (UniProt ID P0A8I3) is partitioned into the largest cluster (cluster 1) that contains no sequence with an associated gene ontology term (Fig. S2), indicating that YaaA Ec is not closely associated with any sequence of annotated function.
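The taxonomic fractions quoted above can be recomputed directly from the genome counts given in parentheses. The following minimal Python sketch performs only that arithmetic; the dictionary layout and the function name are illustrative choices and are not part of the original analysis.

```python
# Counts quoted in the text: (reference genomes with a DUF328 homolog,
# total reference genomes in GTDB for that group).
counts = {
    "Bacteria (all)": (10565, 23458),
    "Campylobacterota": (222, 254),
    "Actinobacteriota": (2617, 3118),
    "Proteobacteria": (4512, 7630),
    "Bacteroidota": (1573, 2843),
    "Archaea (all)": (31, 2392),
}


def percent_with_homolog(hits, total):
    """Percentage of reference genomes carrying a homolog, to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 1)


# Illustrative helper output: one percentage per taxonomic group.
percentages = {
    taxon: percent_with_homolog(hits, total)
    for taxon, (hits, total) in counts.items()
}

for taxon, pct in percentages.items():
    print(f"{taxon}: {pct}%")
```

Recomputing the values this way is a quick consistency check between the reported counts and the rounded percentages.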
The GNN analysis of genes encoding proteins from the major DUF328/PF03883 SSN clusters shows extensive clustering with genes involved in oxidative stress response (Figs. S3-S7). Of the nine extracted neighborhoods, four contain genes encoding alkyl hydroperoxide reductase/thiol-specific antioxidant proteins of the AhpC family (PF00578). Three contain genes encoding rubrerythrin, a nonheme iron-containing metalloprotein involved in oxidative stress tolerance in anaerobic bacteria (PF02915). Two contain genes encoding oxidative dealkylation repair proteins of the AlkB family (PF13640) and members of the DUF3501 family (PF12007), previously linked to oxidative stress (18). No other functional area is as highly represented among the DUF328 GNN proteins, although several RNA metabolism and translation proteins are also present (PF00270, PF04073, PF00587, PF00849, PF10150, PF14489, and PF14819).
GNN analysis links members of the DUF328 family to oxidative stress, and analysis of Rosetta stone-type protein domain fusions (19) indicates an association between DUF328 proteins and DNA maintenance and repair. Several DUF328 proteins are fused to domains from the GIY-YIG endonuclease superfamily (IPR000305) (Fig. 2). Nucleases of the GIY-YIG family are involved in DNA repair and recombination, transfer of mobile genetic elements, and restriction of incoming foreign DNA (20-22). The GIY-YIG proteins are so-named because they contain a domain of typically ~100 amino acids with two short motifs, "GIY" and "YIG", in their N-terminal regions. Additionally, DUF328 domains are found fused to multiple recombinase domains, a Zn2+-β ribbon domain, a PadR helix-turn-helix motif, and an AlbA_2 DNA-binding domain (Fig. 2), underscoring the connection between DUF328 and DNA binding and repair. The gene fusion and neighborhood analyses corroborate the function of DUF328 proteins in oxidative stress first reported for YaaA Ec (4) but add a link to DNA repair that was not obvious from previous studies.

DUF328 genes show strong cofitness with the RecFOR pathway for DNA repair
A measure of the joint contribution of two or more genes to organism survival is given by the cofitness, which is the Pearson correlation coefficient of their contributions to organismal fitness under different physiological conditions. Cofitness analysis derived from an extensive Tn-Seq analysis of 32 organisms in dozens of conditions (23) further supports an association between DUF328 genes, DNA recombination, and hydrogen peroxide detoxification. In the case of E. coli, the top cofitness associations are with peptidoglycan synthesis and cell division genes (alr, ftsN, and amiA) with scores between 0.45 and 0.48, and then with a few DNA repair genes such as recG and uvrD (scores of 0.44 and 0.43, respectively). However, none of these scores were extremely high. The highest cofitness scores (>0.75) were found in other species, indicating that DUF328 genes are likely to function in the same pathway with the DNA replication and repair protein coding gene recF in Shewanella sp. ANA-3, with catalase/peroxidase coding genes in Acidovorax sp. GW101-3H11, and with unknown genes in Pseudomonas syringae pv. syringae B728a ΔmexB and Dechlorosoma suillum PS (Table 1). The strong conserved cofitness (cofitness >0.6 and ortholog cofitness >0.6) indicates functional relationships of DUF328 genes with genes coding DNA repair and recombination proteins, such as recA and recN (Table 1). DUF328 genes from at least seven bacteria show strong conserved cofitness (ortholog cofitness >0.6 in Table 1 and Table S1) with components of the RecFOR pathway that initiates recombination-mediated DNA repair by processing ssDNA gaps and loading RecA onto the recombinogenic ends (24-26). The highest ortholog cofitness scores (>0.75) include genes encoding the DNA replication and repair protein recF in Caulobacter crescentus and Pseudomonas fluorescens FW300-N2E2 (Table S1).
Several other genes in the RecFOR pathway (recA, recJ, recN, recO, and recR) also display high cofitness with DUF328 genes, pointing to a strong joint contribution to organism survival (Table 1 and Table S1). The recurrent cofitness association with the exonuclease I gene (sbcB) further corroborates the association between DUF328 proteins and ssDNA involved in recombination. Prokaryotic exonuclease I possesses a 3'-5' single-stranded DNA exonuclease activity that is stimulated by ssDNA-binding protein and plays an important role in DNA repair (27). Also recurring in the DUF328 cofitness analysis is catalase, which defends against H2O2 stress by converting H2O2 to H2O and O2, reinforcing the connection between DUF328 proteins and peroxide stress.
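As a concrete illustration of the cofitness definition given at the start of this section, the sketch below computes a Pearson correlation coefficient between two per-condition fitness profiles. The fitness values are invented for illustration and are not taken from the Tn-Seq dataset (23); the function and variable names are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-condition fitness values (one entry per growth condition).
duf328_fitness = [-0.1, -2.0, 0.2, -1.5, 0.0, -0.3]
recF_fitness = [0.0, -1.8, 0.1, -1.2, 0.1, -0.4]

cofitness = pearson(duf328_fitness, recF_fitness)
print(f"cofitness = {cofitness:.2f}")
```

Two genes whose mutants lose fitness in the same conditions give a value near 1, which is why high cofitness is read as evidence of a shared or intersecting pathway.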

Recombinant YaaA Ec binds DNA with nanomolar affinity
Consistent with strong bioinformatic evidence connecting DUF328 family proteins to DNA maintenance and repair, YaaA Ec is associated with a large amount of nucleic acid during purification of the recombinant protein from E. coli. We determined that the associated nucleic acid was DNA based on its selective degradation by DNaseI. Ultimately, nearly all of the DNA was removed from YaaA Ec by hydroxyapatite chromatography (see "Experimental procedures"), but strong anion exchange chromatography with a Q resin could not separate YaaA Ec from the DNA. The persistence of DNA in association with YaaA Ec suggests direct and tight binding.
YaaA Ec binding to DNA was measured using EMSA with a variety of defined DNA structures. YaaA Ec binds to double-stranded, bulge-containing, and Holliday junction DNA with comparable dissociation constants (KD) of ~200-300 nM (Fig. 3). YaaA Ec also binds ssDNA, although with lower apparent affinity (Fig. 3A). Unlike many other DNA-binding proteins, YaaA Ec shows no strong preference for specific DNA structures. Multiple shifted DNA bands are observed at higher concentrations of YaaA Ec in the EMSA, suggesting that these DNA constructs are capable of binding multiple copies of YaaA Ec (Fig. 3, A-C). The fraction of the DNA duplex with a 12-nt bubble bound by YaaA Ec was well-fitted by a cooperative binding model, giving a KD = 265 nM and a Hill coefficient of ~3.1 (Fig. 3D). All the EMSAs show evidence of positive cooperativity in YaaA Ec binding, and several distinct bands are seen at higher concentrations of YaaA Ec. Either multiple YaaA Ec molecules can bind a single DNA molecule directly, or one YaaA Ec binds to the DNA fragment and additional YaaA Ec molecules interact with the bound YaaA Ec in the YaaA Ec-DNA complex. Our data do not discriminate between these two possibilities, although the Hill coefficient of 3.1 suggests a cooperative YaaA Ec recruitment mechanism.
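A cooperative binding model of the kind referred to above is commonly written as the Hill equation, in which the fraction of DNA bound is [P]^n / (KD^n + [P]^n). The sketch below evaluates that equation and recovers KD and n from a synthetic titration using a linearized Hill plot. The functional form is an assumption, because the text does not specify the fitting procedure, and real EMSA data would usually be fit by nonlinear regression. The protein concentrations are invented; only KD = 265 nM and n = 3.1 come from the text.

```python
import math


def hill_fraction_bound(protein_nM, kd_nM, n):
    """Fraction of DNA bound under the Hill model: [P]^n / (KD^n + [P]^n)."""
    return protein_nM ** n / (kd_nM ** n + protein_nM ** n)


def fit_hill(protein_nM, fraction_bound):
    """Estimate (KD, n) by least squares on the Hill plot.

    Uses log10(theta / (1 - theta)) = n * log10[P] - n * log10(KD),
    so the slope is n and KD follows from the intercept.
    """
    xs, ys = [], []
    for p, theta in zip(protein_nM, fraction_bound):
        # Points at exactly 0 or 1 cannot be placed on a Hill plot.
        if p > 0 and 0.0 < theta < 1.0:
            xs.append(math.log10(p))
            ys.append(math.log10(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    intercept = mean_y - n * mean_x
    kd = 10 ** (-intercept / n)
    return kd, n


# Synthetic, noise-free titration generated from the reported parameters.
concs_nM = [50, 100, 150, 200, 265, 350, 500, 800]
theta = [hill_fraction_bound(c, 265.0, 3.1) for c in concs_nM]
kd_fit, n_fit = fit_hill(concs_nM, theta)
print(f"KD = {kd_fit:.0f} nM, Hill coefficient = {n_fit:.1f}")
```

A Hill coefficient well above 1 makes the binding curve sharply sigmoidal, which is the quantitative signature of the positive cooperativity described in the text.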
YaaA Ec role is PreQ0-independent
DUF328 domains are present in DpdC proteins that are predicted dPreQ0 nitrile hydratases (10), and DUF328 genes physically cluster with queF genes encoding PreQ0 reductase (Fig. 1B). A parsimonious prediction is that YaaA Ec may be involved in the recognition and repair of PreQ0 that has been incorporated into DNA by mistake. Indeed, tRNA guanine transglycosylase (TGT), normally involved in the synthesis of queuosine in tRNA, can also insert the PreQ0 derivative PreQ1 in DNA in vitro if the thymine base is replaced with uracil (28). The presence of uracil in DNA does occur and is usually corrected by a specific repair machinery (29). However, the addition of exogenous PreQ0 did not affect growth of the ΔyaaA mutant in LB (Fig. S8). In addition, the growth defect caused by the deletion of yaaA in the Hpx- background (4) seemed to be improved by the addition of PreQ0 and exacerbated by the deletion of the queD gene involved in the PreQ0 synthesis pathway. Even if the difficulty of working with the Hpx- strain places this last result within the margin of error (Fig. S9), it does not fit with a role of YaaA in repairing potential misincorporations of PreQ0 in DNA. Additional studies in which uracil levels are increased and PreQ0 levels are measured in DNA would, however, be needed to rule out this hypothesis completely.

YaaA Ec possesses a new fold
We determined the X-ray crystal structure of YaaA Ec to 1.65-Å resolution using single-wavelength anomalous diffraction (SAD) phasing of the selenomethionine (SeMet)-substituted protein. YaaA Ec is a single-domain protein possessing a new fold comprising 12 α-helices and 14 β-strands with a core defined by a three-stranded parallel β-sheet (Fig. 4, A and B). Several of the secondary structural elements are short (e.g. αD, αG, αI, αJ, and β2, β3, β4, β8, β9) and may be differently classified by various secondary structure-detection algorithms. Overall, YaaA Ec is wedge-shaped with an apical cleft, resembling a slice of cantaloupe (Fig. S10). Helices αB and αC compose a HhH DNA-binding motif (see below) that is positioned opposite a β-strand motif comprising β11-14. The β-strand motif has an unusual abundance of solvent-exposed aromatic amino acids and several lysine residues. The cleft region that lies between the HhH and β-strand motifs is ~20 Å wide and is rich in basic residues, resulting in a highly positive electrostatic potential as calculated by the Adaptive Poisson-Boltzmann Solver (APBS) (30) (Fig. 4C). Residues in the cleft region display an approximate dyad symmetry (Fig. 4D) and are among the most highly conserved residues in the DUF328 family, including a KKARG motif (residues 209-213) that binds a chloride ion from the crystallization buffer. The positive electrostatic potential of the cleft, the rough dyad symmetry of several conserved residues in the region, and a width that matches the diameter of B-form DNA make this cleft a plausible contact surface for DNA.
YaaA Ec has multiple extended stretches of polypeptides adopting nonstandard structures from residues 6-22, 67-78, and 123-135, interrupted by a short β-strand from residues 10-12. These distinctive and atypical polypeptide structures define a broad area on the exterior of YaaA Ec that includes part of the cleft and also penetrate into the core of the protein (Fig. 5A). These regions are not loops as typically defined because they are not confined to the surface of the protein and meander through the core of the protein structure. Unlike loops, these atypically structured residues make extensive contacts with neighboring residues and are well-ordered judging from both the quality of the electron density (Fig. S11) and their low refined atomic displacement parameters (ADPs). The majority of the peptide atoms in these nonstandard secondary structural regions make hydrogen bonds with solvent or nearby amino acid sidechains rather than other peptides, again differentiating them from peptide groups in standard secondary structural elements such as α-helices and β-strands. Among these unusual contacts, a chain of Tyr-mediated hydrogen bonds extends across the conserved core of the region. This Tyr-rich hydrogen bond network includes a highly conserved SGXYG motif (SGLYG, residues 112-116, in YaaA Ec) that is located at a sharp turn between β-strands in the core of YaaA Ec (Fig. 5B) and makes extended contacts with residues 67-72 and 124-131. The high degree of conservation (Fig. S12) and clear structural importance of the SGXYG motif indicates that the surrounding unusually

YaaA contains a HhH DNA-binding motif
Although YaaA Ec possesses a new fold, the Phyre2 fold recognition server (31) identifies an HhH motif from residues 35-66 that is found in several DNA-binding proteins (Fig. 6A).  (37), and mitochondrial transcription elongation factor 2 (PDB 5OL9) (38), among others. In the structures that include bound DNA (6SXB and 2BGW), the HhH motif directly contacts the minor groove of DNA, suggesting that this is a possible DNA-binding mode for YaaA Ec as well (Fig. 6B). A preference for binding the minor groove is consistent with the lack of sequence-specific DNA binding by HhH domains (39). The putative DNA-binding regions in the apical cleft of YaaA Ec, including parts of the HhH motif, show the highest sequence conservation in the DUF328 family and strongly support the functional significance of these regions (Fig. S13).

Discussion
In this study we show that DUF328 proteins, including the E. coli representative YaaA Ec, are DNA-binding proteins involved in the oxidative stress response. Prior work from the Imlay group (4) showed that YaaA Ec is important for regulating ferrous iron (Fe2+) levels in oxidatively stressed E. coli. Fe2+ is dangerous in the presence of H2O2 because Fenton chemistry can generate the highly reactive hydroxyl radical, which is indiscriminately destructive (40). The hydroxyl radical is thought to result in bacterial death predominantly through DNA damage (41). Although our results do not explain how YaaA Ec regulates Fe2+ levels, they demonstrate that a key molecular activity of YaaA Ec is DNA binding, indicating that it plays an important role in DNA maintenance and repair under oxidative stress conditions. This is consistent with prior observations that yaaA deletion resulted in a mutator phenotype and a filamentous E. coli cell morphology (4), which is a common cellular manifestation of extensive DNA damage in bacteria (42). Moreover, the multiple and strong comparative genomic connections between YaaA Ec homologs and DNA repair suggest that this function is likely conserved throughout the DUF328 family.
Cofitness analysis reveals a strong connection between DUF328 proteins and the RecFOR pathway of single-stranded gap DNA repair. Although this indicates a connection between DUF328 proteins and DNA repair, it does not require that YaaA Ec operate in the RecFOR pathway. Cofitness indicates that DUF328 and RecFOR genes make joint contributions to organism survival, which may be because they operate in the same pathway or, alternatively, because they operate in different but functionally intersecting pathways. The other major DNA repair pathway in bacteria is the RecBCD pathway, which primarily targets and repairs double-stranded breaks in DNA (43). Although genes in this pathway did not associate with DUF328 genes in our comparative genomics study, it is possible that YaaA is involved in this or other DNA-maintenance processes that partially overlap with the RecFOR pathway (44). A circumstantial argument against DUF328 proteins being direct participants in the RecFOR pathway is that this DNA repair pathway is present in archaea but DUF328 proteins are rare in that domain. Moreover, defects in the RecFOR pathway in E. coli result in phenotypes that do not depend on oxidative stress, whereas a phenotype is seen in ΔyaaA E. coli only in oxidative stress conditions (4). However, the selective protective role of YaaA Ec during oxidative stress may be driven by its increased expression under oxidative stress, rather than specificity for repair of oxidative DNA damage. Although the details of DUF328 proteins' role in DNA protection during oxidative stress remain to be determined, cofitness analysis indicates a probable contribution from the RecFOR pathway in diverse bacteria.
The crystal structure of YaaA Ec reveals a new fold for the DUF328 family (45). Given its resemblance to a slice of cantaloupe (Fig. S10), we propose that this be called the cantaloupe fold. This fold features a distinctive abundance of structured peptide stretches that are neither α-helix nor β-strand and an apical cleft demarcated on one end by a HhH DNA-binding domain and on the other by a β-strand motif. HhH domains are nonsequence-specific DNA-binding modules that are commonly found in proteins that digest, synthesize, or repair DNA (39). The apical cleft is highly basic, enriched in conserved residues, and a plausible site for DNA binding. It is unclear if DNA could simultaneously bind both regions and whether such bound DNA would be partially unwound or otherwise structurally perturbed. The minor groove-binding preference of HhH motifs suggests that at least some portion of the bound DNA should be a B-form double helix. The presence of two candidate DNA-binding sites may explain the presence of multiple shifted bands at higher concentrations of protein in the EMSA of YaaA Ec-DNA complexes (Fig. 3). These multiple bands indicate a YaaA Ec:DNA binding stoichiometry of greater than 1:1 at higher YaaA Ec concentrations, consistent with multiple YaaA Ec proteins binding cooperatively to the DNA. The Hill coefficient of 3.1 for YaaA Ec binding indicates strong positively cooperative DNA binding, suggesting either that protein-protein interactions enhance YaaA Ec affinity for DNA or that binding-induced perturbations to DNA structure recruit additional YaaA Ec molecules. Determining how YaaA Ec interacts with DNA will be an important future direction for elucidating the molecular basis of DNA recognition by this new protein fold.
DpdC proteins are members of the DUF328 family that possess nitrile hydratase activity involved in PreQ0 metabolism (9,10). The involvement of DpdC proteins in PreQ0-containing DNA is broadly consistent with our results connecting YaaA Ec to DNA repair, although we find that YaaA Ec is not involved in queuosine metabolism in E. coli. Regardless, the nitrile hydratase activity of DpdC proteins demonstrates that the DUF328 cantaloupe fold can support enzymatic activity, leaving open the possibility that YaaA Ec may have an undiscovered enzymatic function. An important avenue for future research will be to determine the mechanism by which YaaA Ec (and other DUF328) proteins regulate Fe2+ levels to protect cells from oxidative stressors (4), including whether YaaA Ec simply binds DNA or acts on it as a substrate. This report provides a molecular activity for YaaA Ec and other DUF328 proteins that will inform additional research into the protective mechanisms of this new family and fold class of DNA-binding protein.
Figure 6. YaaA Ec has a HhH DNA-binding motif. A, a ribbon diagram of YaaA Ec with the region that is structurally conserved in other DNA-binding proteins colored magenta. The classical HhH motif is defined by αB and αC (labeled). B, a superposition of YaaA Ec (gray, blue) with the HhH motif (yellow) and bound DNA from XPF-ERCC1 endonuclease (PDB 6SXB). This illustrates one potential DNA-binding mode for YaaA Ec and other DUF328 proteins.

Experimental procedures
Bioinformatics analyses
The protein sequences were retrieved from the NCBI Protein Database using the following accession numbers: YaaA Ec, UniProt ID P0A8I3; DpdC, AJQ72467.1. Protein domain analysis was performed using HHpred in the MPI Bioinformatics Toolkit (46,47) against the Pfam database (48) using default parameters.
SSNs were generated with the Enzyme Function Initiative suite of webtools (16,17). SSNs were visualized using Cytoscape (52). The parameters used for the generation of the DUF328/IPR005583/PF03883 SSNs were as follows: for the whole family SSN, the input method "FASTA" (option C) was used, using YaaA Ec and IPR005583, UniRef90 with the minimum length filter of 100 and maximum length filter of 500. The alignment score threshold was 90. Sequences that share 90% or more identity are collapsed together to a single node to reduce the complexity for visualization. The obtained SSN was subjected to Enzyme Function Initiative Genome Neighborhood Tool analysis to obtain genome neighborhood diagrams.
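The role of the alignment score threshold can be illustrated with a toy network. The sketch below assumes that an edge is kept when the pairwise alignment score is at least the threshold and that clusters are the resulting connected components. The scores and all identifiers other than P0A8I3 are hypothetical, and the actual networks were built with the Enzyme Function Initiative tools, not with this code.

```python
from collections import defaultdict


def ssn_clusters(nodes, pair_scores, threshold):
    """Group nodes into connected components of the thresholded network.

    pair_scores maps (node_a, node_b) to a pairwise alignment score; only
    pairs scoring at or above the threshold are treated as edges.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find lookup with path compression.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    # Largest cluster first, mirroring the usual "cluster 1" numbering.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Hypothetical sequences and scores; P0A8I3 is the UniProt ID of YaaA Ec.
nodes = ["P0A8I3", "seqA", "seqB", "seqC", "seqD"]
pair_scores = {
    ("P0A8I3", "seqA"): 120,
    ("seqA", "seqB"): 95,
    ("seqB", "seqC"): 60,   # below the threshold, so no edge is drawn
    ("seqC", "seqD"): 110,
}

clusters = ssn_clusters(nodes, pair_scores, threshold=90)
for index, members in enumerate(clusters, start=1):
    print(f"cluster {index}: {sorted(members)}")
```

Raising the threshold removes weaker edges and splits the network into smaller, more closely related clusters, which is why a stringent value isolates sequences that are not closely related to any annotated protein.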

Bacterial strains
All strains used in this study are listed in Table 2. E. coli strains were routinely grown in LB media (tryptone 10 g/liter, yeast extract 5 g/liter, and NaCl 5 g/liter) at 37°C. When antibiotic selection was required, media were supplemented with 100 μg/ml ampicillin, 25 μg/ml chloramphenicol, or 50 μg/ml kanamycin. PreQ0 was purchased from Ark Pharm.
Growth studies were adapted from those previously described (4). Anaerobic overnight cultures were diluted into anaerobic LB and grown for four to five generations to early log phase (optical density at 600 nm (A600) of 0.15 to 0.20). The cultures were then diluted into aerobic LB of the same composition to an A600 of ~0.005. All cultures were grown at 37°C. LB medium was made 1 day prior to culturing and stored in the dark to avoid the photochemical generation of H2O2. It was transferred immediately after autoclaving to an anaerobic chamber (BACTRON anaerobic chamber), where it was stored under an atmosphere of 5% CO2, 10% H2, and 85% N2 overnight or longer prior to use.

Expression and purification of YaaA Ec
The YaaA Ec gene (yaaA:b0006) was PCR-amplified from E. coli XL1-blue genomic DNA using Platinum II Taq DNA polymerase (Thermo Fisher Scientific) and primers that introduced 5' NdeI and 3' XhoI restriction sites. The YaaA Ec gene was subcloned between the NdeI and XhoI sites of pET15b (Novagen) and sequence-verified by dideoxy DNA sequencing. The validated YaaA Ec-pET15b construct was transformed by heat shock into chemically competent BL21(DE3) (Novagen) E. coli for protein expression. This construct expresses YaaA Ec with a thrombin-cleavable N-terminal hexahistidine tag, although the tag was difficult to remove from the purified protein with thrombin and was retained in the final purified protein. BL21(DE3) cells containing YaaA Ec-pET15b were grown in LB medium with 100 μg/ml ampicillin at 37°C with shaking at 270 rpm to an A600 of 0.2-0.3, at which point the culture was transferred to 20°C and incubated with shaking at 150 rpm for an additional 2 h. YaaA Ec expression was induced with the addition of isopropyl β-D-1-thiogalactopyranoside (Calbiochem) to a final concentration of 0.2 mM, and the culture was incubated at 20°C with shaking overnight. Chloramphenicol was added to a final concentration of 100 μg/ml 2 h prior to harvest to enhance protein solubility (56). Cells were harvested by centrifugation and cell pellets were frozen in liquid N2 and stored at -80°C.
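When a gene is cloned between NdeI and XhoI sites, it is common practice to confirm that the insert itself lacks internal copies of either site. The sketch below performs such a scan using the standard recognition sequences (NdeI, CATATG; XhoI, CTCGAG). The sequences are short made-up strings rather than the yaaA gene or the actual primers, which are not given in the text, and the text does not state that this check was performed.

```python
# Standard recognition sequences; both are palindromic, so scanning one
# strand is sufficient to find sites on either strand.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of site in sequence."""
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def internal_sites(insert_sequence):
    """Map each enzyme name to the positions of its site within the insert."""
    return {name: find_sites(insert_sequence, site) for name, site in SITES.items()}


# Hypothetical toy insert and amplicon (NOT the yaaA sequence).
toy_insert = "AAACCCGGGTTT"
toy_amplicon = "GC" + SITES["NdeI"] + toy_insert + SITES["XhoI"] + "GC"

print(internal_sites(toy_insert))
print(find_sites(toy_amplicon, SITES["NdeI"]), find_sites(toy_amplicon, SITES["XhoI"]))
```

An insert that returns empty lists for both enzymes can be digested at the primer-introduced sites without being cut internally.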
Recombinant hexahistidine-tagged YaaA Ec was purified using nickel-nitrilotriacetic acid metal affinity chromatography as previously described (57). Briefly, the cell pellet was lysed by the addition of lysozyme to a final concentration of 1 mg/ml followed by sonication. The crude lysate was clarified by centrifugation at 12,000 × g and the supernatant was incubated with HIS-select nickel-nitrilotriacetic acid resin (Sigma-Aldrich), washed with wash buffer (25 mM HEPES, pH 7.5, 300 mM NaCl, and 25 mM imidazole), and eluted with elution buffer (25 mM HEPES, pH 7.5, 300 mM NaCl, and 250 mM imidazole). Fractions containing the YaaA Ec protein were determined by Coomassie-stained SDS-PAGE and the purest fractions were pooled. Despite the presence of a thrombin cleavage site downstream of the N-terminal hexahistidine tag, incubation of the purified protein with thrombin did not efficiently cleave the tag. Therefore, the final protein retains the tag and the thrombin cleavage site, adding the N-terminal sequence MGSSHHHHHHSSGLVPRGSH before the first methionine of YaaA Ec . YaaA Ec co-purified with large amounts of nucleic acid that was identified as DNA based on its sensitivity to DNaseI on GelRed (Biotium)-stained agarose gel electrophoresis. The contaminant DNA was present at ~0.1 mg/mg of protein based on the 260 nm/280 nm absorbance ratio and could not be effectively removed by passage over a High Q anion exchange column (Bio-Rad). The large majority of the DNA could be removed using hydroxyapatite chromatography (ceramic hydroxyapatite Type II resin, Bio-Rad) with gradient elution from 20 to 500 mM KPO4 buffer, pH 7.2, over 10 column volumes. A small amount of residual nucleic acid remained in YaaA Ec even after hydroxyapatite chromatography as determined by GelRed-stained agarose gel electrophoresis of the purified protein.
After hydroxyapatite chromatography, the final YaaA Ec protein was dialyzed into storage buffer (25 mM Tris-HCl, pH 8.8, and 100 mM KCl) and was concentrated to 28 mg/ml (ε280 = 25,900 M−1 cm−1) with a 10-kDa molecular weight cutoff regenerated cellulose membrane spin concentrator at 4°C (Millipore). Before storage, 10 mM EDTA (final concentration) was added to enhance protein stability and the protein was flash-frozen in 50-200-μl aliquots in liquid N2 and stored at −80°C until needed.
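The conversion from a measured absorbance at 280 nm to a protein concentration follows the Beer-Lambert law and can be sketched as below. Only the extinction coefficient comes from the text. The molecular weight, absorbance reading, and dilution are hypothetical placeholders, so the tagged protein's actual mass would need to be substituted.

```python
EPSILON_280 = 25_900.0  # M^-1 cm^-1, extinction coefficient stated in the text

# Hypothetical placeholder: the molecular weight of tagged YaaA Ec is not
# given in this section and must be supplied by the user.
EXAMPLE_MW_DA = 30_000.0


def molar_concentration(a280: float, path_cm: float = 1.0,
                        epsilon: float = EPSILON_280) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/liter."""
    return a280 / (epsilon * path_cm)


def mg_per_ml(a280: float, mw_da: float, dilution: float = 1.0,
              path_cm: float = 1.0) -> float:
    """Concentration of the undiluted stock in mg/ml.

    mol/liter multiplied by g/mol gives g/liter, which equals mg/ml.
    """
    return molar_concentration(a280, path_cm) * dilution * mw_da


# Illustrative reading: a 100-fold diluted sample with A280 = 0.259.
stock = mg_per_ml(0.259, EXAMPLE_MW_DA, dilution=100.0)
print(f"estimated stock concentration: {stock:.1f} mg/ml")
```

Because the preparation retained a small amount of DNA, which also absorbs at 280 nm, such an estimate should be treated as approximate.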

YaaA Ec DNA-binding assay
Synthetic Holliday junction (X-0), single-stranded, and double-stranded oligonucleotide substrates were generated as previously described (58,59). DNA-binding assays were performed by incubating YaaA Ec protein with 4 pg of 32P-labeled oligonucleotide substrates in DNA-binding buffer (30 mM HEPES, pH 7.5, 1 mM DTT, and 100 μg/ml BSA) on ice for 15 min. The protein-DNA complexes were analyzed on 5% native polyacrylamide gels. To compare YaaA Ec binding to circular DNA, blunt-end linear DNA, or linear DNA with single-stranded overhangs, 0.5 μg of pUC19 plasmid (Thermo Fisher Scientific, SD00661) with or without SmaI or SalI digestion was incubated with the indicated amount of YaaA Ec in DNA-binding buffer on ice for 30 min prior to separation of protein-DNA complexes on 0.6% agarose gels. ImageJ (60) was used to quantify the band intensities to determine the relative amount of DNA that was free or in complex with YaaA Ec . These fractional binding values were plotted as a function of total YaaA Ec concentration and fit to a single-site binding model with positive cooperativity in Prism (GraphPad) to determine the KD and the Hill coefficient. DNA binding was assayed for multiple independent preparations of recombinant YaaA Ec to ensure consistency.
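The cooperative binding model used for such fits can be illustrated with a short sketch. Note that this is not the Prism procedure: Prism performs nonlinear regression, whereas the example below swaps in the simpler linearized Hill plot, ln(f/(1 − f)) = n ln[P] − n ln KD, solved by ordinary least squares. The titration values are synthetic, and the KD of 250 nM and Hill coefficient of 2 are illustrative choices, not measured data.

```python
import math


def hill_fraction(protein_nm: float, kd_nm: float, hill_n: float) -> float:
    """Fraction of DNA bound under the Hill model: P^n / (KD^n + P^n)."""
    return protein_nm ** hill_n / (kd_nm ** hill_n + protein_nm ** hill_n)


def fit_hill(protein_nm, fraction_bound):
    """Estimate (KD, Hill coefficient) by least squares on the Hill plot.

    Uses ln(f / (1 - f)) = n * ln[P] - n * ln(KD). Points with f <= 0 or
    f >= 1 are undefined under this transform and are skipped.
    """
    points = [
        (math.log(p), math.log(f / (1.0 - f)))
        for p, f in zip(protein_nm, fraction_bound)
        if p > 0 and 0.0 < f < 1.0
    ]
    if len(points) < 2:
        raise ValueError("need at least two usable titration points")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("protein concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    hill_n = sxy / sxx                      # slope of the Hill plot
    intercept = mean_y - hill_n * mean_x    # equals -n * ln(KD)
    kd_nm = math.exp(-intercept / hill_n)
    return kd_nm, hill_n


# Synthetic titration (illustrative values, not data from the paper).
concs = [50.0, 100.0, 200.0, 400.0, 800.0]
fractions = [hill_fraction(c, 250.0, 2.0) for c in concs]
kd_fit, n_fit = fit_hill(concs, fractions)
print(f"KD = {kd_fit:.1f} nM, Hill coefficient = {n_fit:.2f}")
```

A Hill coefficient greater than 1 indicates positive cooperativity. With real densitometry data, the linearized fit weights points near 0 and 1 heavily, which is one reason nonlinear regression is usually preferred.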

X-ray crystallographic structure determination of YaaA Ec
Crystallization conditions for purified hexahistidine-tagged YaaA Ec at 28 mg/ml in storage buffer were screened using commercial sparse matrix screens in sitting-drop 96-well plates. Protein and reservoir solutions were dispensed in 0.5-μl drops using a Gryphon liquid handling robot (Art Robbins Instruments). Initial needle-shaped crystals were optimized by manual screening using sitting-drop vapor equilibration and improved crystals grew in 100 mM NaCl, 100 mM sodium citrate, pH 4.6, 100 mM Na2HPO4, 75 mM NaH2PO4, and 15% PEG 8000. Further optimization with the Hampton additive screen identified 3% benzamidine-HCl as improving crystal morphology and size. An ordered benzamidine molecule makes hydrogen bonds to two Pro73 residues related by crystallographic symmetry in the final structure. In addition, the phenyl ring of benzamidine participates in cation-π interactions with two symmetry-related Arg77 residues, explaining why this additive aided the growth of diffraction-quality crystals.
Because YaaA Ec has no homologs of known structure, initial electron density maps were calculated by experimental phasing using SeMet SAD. The pET15b-YaaA Ec expression construct was transformed into the methionine auxotroph E. coli strain B834(DE3) (Novagen) by heat shock of chemically competent cells. YaaA Ec was expressed in M9 minimal medium supplemented with 42 mg/liter of each L-amino acid except methionine and cysteine, 125 mg/liter each of adenine, guanosine, thymine, and uracil, 4 mg/liter thiamin, 4 mg/liter D-biotin, and 30 mg/liter L-selenomethionine (Acros Organics). SeMet-YaaA Ec was purified as described above and crystallized in the same condition as the native protein. Diffraction-quality crystals of SeMet-YaaA Ec were cryoprotected by serial transfer and brief soaking in reservoir solution supplemented with ethylene glycol in 5% increments to a final concentration of 15%. Crystals were mounted in nylon loops and cryocooled by rapid immersion in liquid nitrogen.
YaaA Ec crystallized in space group P2₁2₁2₁ with two protein chains in the asymmetric unit. Diffraction data extending to a 1.65-Å resolution were collected from a SeMet-YaaA Ec crystal measuring 350 × 100 × 100 μm at beam line 7-1 of the Stanford Synchrotron Radiation Lightsource using the oscillation method. The incident X-rays were tuned to the K-edge of selenium (0.9788 Å) to maximize anomalous signal from the six SeMet residues in YaaA Ec . Inverse beam geometry was not used. Data were processed using HKL2000 (61), and significant anomalous signal was extended to a 2.45-Å resolution using a CCanom cutoff of 0.15 (62) as determined by Aimless (63) in the CCP4 suite (64). See Table S2 for data statistics.
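For orientation, the stated wavelength can be related to photon energy and to the scattering angle at the resolution limit. The sketch below uses E = hc/λ and Bragg's law, λ = 2d sin θ; the function names are hypothetical, and the calculation is a convenience for the reader rather than part of the data-processing pipeline described above.

```python
import math

HC_EV_ANGSTROM = 12398.4198  # Planck constant times speed of light, in eV*Angstrom


def wavelength_to_energy_ev(wavelength_a: float) -> float:
    """Photon energy in eV for a wavelength in Angstroms (E = hc / lambda)."""
    return HC_EV_ANGSTROM / wavelength_a


def energy_ev_to_wavelength(energy_ev: float) -> float:
    """Wavelength in Angstroms for a photon energy in eV."""
    return HC_EV_ANGSTROM / energy_ev


def two_theta_deg(d_spacing_a: float, wavelength_a: float) -> float:
    """Scattering angle 2*theta in degrees from Bragg's law, lambda = 2 d sin(theta)."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if sin_theta > 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


wavelength = 0.9788  # Angstroms, from the text
print(f"photon energy: {wavelength_to_energy_ev(wavelength) / 1000.0:.3f} keV")
print(f"2-theta at 1.65 A: {two_theta_deg(1.65, wavelength):.1f} degrees")
print(f"2-theta at 2.45 A: {two_theta_deg(2.45, wavelength):.1f} degrees")
```

The wavelength of 0.9788 Å corresponds to a photon energy of roughly 12.67 keV.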
SAD phasing using unmerged input reflections and local scaling was performed in PHENIX (65). The figure of merit for the initial experimental SAD phases was 0.39, and these phases were improved by density modification prior to autobuilding in PHENIX (66,67). The initial autobuilt model was manually improved in Coot (68), including addition of ordered waters. Five residues of the uncleaved N-terminal tag were ordered in the crystal and thus included in the model. Refinement was performed in PHENIX using a maximum likelihood target function based on anomalous amplitudes (i.e. Bijvoet mates were kept separate), optimization of the X-ray/stereochemical and ADP weights, and translation-libration-screw treatment of the ADPs (66). Benzamidine was modeled into unambiguous electron density and mediates a key crystal-packing contact, explaining why it was an effective crystallization additive. Final model validations were performed using Coot (68) and MolProbity (69). The final model has excellent stereochemical and clashscore statistics, with an overall MolProbity score of 0.95 (100th percentile). See Table S2 for model statistics.

YaaA Ec structural analysis and display
The surface electrostatic potential of YaaA Ec was calculated using APBS (30, 70), with PDB2PQR (71) used for atomic partial charge and radius assignments. Default values of the solvent (78) and protein (2) dielectric constants, probe radius (1.4 Å), and temperature (298.15 K) were used. Structural figures were made with UCSF Chimera (72) and POVScript+ (73). Sequence conservation was mapped onto the structure of YaaA Ec using the ConSurf server (74).

Data availability
Data for Fig. 1, Fig. 2