The Crystal Structure of TREX1 Explains the 3′ Nucleotide Specificity and Reveals a Polyproline II Helix for Protein Partnering*

The TREX1 enzyme processes DNA ends as the major 3′ → 5′ exonuclease activity in human cells. Mutations in the TREX1 gene are an underlying cause of the neurological brain disease Aicardi-Goutières syndrome implicating TREX1 dysfunction in an aberrant immune response. TREX1 action during apoptosis likely prevents autoimmune reaction to DNA that would otherwise persist. To understand the impact of TREX1 mutations identified in patients with Aicardi-Goutières syndrome on structure and activity we determined the x-ray crystal structure of the dimeric mouse TREX1 protein in substrate and product complexes containing single-stranded DNA and deoxyadenosine monophosphate, respectively. The structures show the specific interactions between the bound nucleotides and the residues lining the binding pocket of the 3′ terminal nucleotide within the enzyme active site that account for specificity, and provide the molecular basis for understanding mutations that lead to disease. Three mutant forms of TREX1 protein identified in patients with Aicardi-Goutières syndrome were prepared and the measured activities show that these specific mutations reduce enzyme activity by 4–35,000-fold. The structure also reveals an 8-amino acid polyproline II helix within the TREX1 enzyme that suggests a mechanism for interactions of this exonuclease with other protein complexes.

Processing of DNA ends is an important step in many DNA metabolic pathways such as replication, repair, and recombination. The 3′ → 5′ exonucleases play a critical role in correcting fragmented, modified, mispaired, or even normal nucleotides to generate 3′ termini suitable for downstream events. The drastic consequences that result from impaired 3′ exonuclease activities underscore the importance of these enzymes for cell survival. Proofreading of DNA synthesis by 3′ exonucleases is one of the major determinants of mutagenesis and genome stability, and cells lacking this ability show a high incidence of cancers (1-3) (for review, see Ref. 4). Cells with defects in proteins containing 3′ exonuclease activity, such as the Werner syndrome protein, MRE11, APE1, and p53 proteins, display chromosomal instability, cell cycle checkpoint defects, and sensitivity to ionizing radiation (5-9).
The major 3′ → 5′ exonuclease activity detected in human cell extracts is catalyzed by the TREX1 enzyme. The genes encoding the TREX1 and closely related TREX2 proteins have been identified and cloned (10,11), and the recombinant proteins confirm the robust catalytic nature of these enzymes (12,13). Amino acid sequence analysis reveals that the TREX proteins belong to the DnaQ family of 3′ → 5′ exonucleases, a structurally conserved group of exonucleases that spans Archaea and bacteria to humans and includes such proteins as the exonuclease domains of Werner syndrome protein, the bacterial ε subunit of DNA polymerase III (ε subunit), and exonuclease I (Exo I) (14-17). A hallmark of the DnaQ family exonucleases is three conserved sequence motifs known as Exo I, II, and III. These motifs contain four conserved acidic residues that participate in coordination of divalent metal ions required for catalysis. The TREX exonucleases are members of a DnaQ family subset that contain a His rather than a Tyr in the ExoIII motif, referred to as the ExoIIIε motif (18-20).
Although the mammalian TREX1 and TREX2 proteins share about 40% amino acid sequence identity, there are distinct structural differences between the two that point to different biological roles for these proteins. The TREX1 protein contains a C-terminal domain of about 75 amino acids that is not present in the TREX2 protein. The amino acid sequences of the C-terminal domains of TREX1 proteins from different mammalian species are moderately conserved, but have no sequence identity to other proteins in the available data base. Additionally, the TREX1 amino acid sequence reveals the presence of a nonrepetitive proline-rich region that is also not present in the TREX2 protein. Furthermore, the TREX2 enzymes contain a conserved DNA binding loop positioned adjacent to the active site that has a sequence distinct from the corresponding loop in the TREX1 enzymes.
The non-processive autonomous nature of the TREX enzymes suggested that these proteins might serve a proofreading function for one of the multiple human DNA polymerases (10). However, Trex1−/− mice show no increase in spontaneous mutation rates but rather display dramatically reduced survival and develop inflammatory myocarditis, indicating a previously unrecognized cellular role for this enzyme (21). Subsequent work has shown that mutations in the human TREX1 gene at the TREX1/AGS1 locus cause Aicardi-Goutières syndrome (22), and a genetic mapping study has further shown that the AGS1 locus overlaps with a locus for chilblain lupus, a form of cutaneous lupus erythematosus (23). These data implicate TREX1 mutations in Aicardi-Goutières syndrome and in systemic lupus erythematosus (22,24,25), consistent with the clinical overlap of these disorders, whose pathogenesis is likely related to the accumulation of non-processed DNA replication and repair intermediates and a subsequent aberrant immune response. Recently, it was also discovered that the TREX1 protein, but not TREX2, participates in the SET complex and acts to rapidly degrade 3′ ends of nicked DNA during granzyme A-mediated cell death (26). TREX1 was shown to function in concert with the NM23-H1 nuclease, pp32, HMG-2, and APE1 proteins and to directly associate with the SET protein. Unlike NM23-H1, however, the TREX1 exonuclease activity does not seem to be inhibited by its interaction with the SET protein, and the TREX1 protein is not a substrate for granzyme A protease, as are the SET, APE1, and HMG-2 proteins.
The crystal structure of the human TREX2 protein revealed the unique dimeric nature of the TREX family exonucleases and provided the initial picture of a human DnaQ family member (27). Symmetry in the dimer positions the active sites of each monomer on opposite edges, providing open access for DNA interactions, and suggested a mechanism for its non-processive catalysis. However, the TREX2 structure was determined in the absence of divalent metal ions or nucleotides, which prevented a complete understanding of how these highly efficient enzymes interact with DNA to facilitate hydrolytic cleavage. In an effort to better understand the biological and exonucleolytic function of this important family of enzymes, we have determined the x-ray structures of the TREX1 protein in a substrate complex with ssDNA and calcium ions, as well as a product complex containing 2′-deoxyadenosine monophosphate (dAMP) and manganese ions, both to a resolution of 2.1 Å. The structures reveal important protein-nucleotide interactions that participate in defining the specificity of 3′ terminal nucleotide recognition, binding, and cleavage. Additionally, the structure of the non-repetitive proline-rich region within TREX1 suggests a potential mechanism for interactions with other proteins.

EXPERIMENTAL PROCEDURES
Protein Expression, Purification, and Crystallization-A gene fragment encoding amino acids 1-242 of the human or mouse TREX1 protein was expressed as a fusion with the maltose-binding protein (MBP) using a modified pMAL-C2 vector (New England Biolabs). The vector was modified to encode a polyhistidine sequence on the N terminus of MBP and to encode the rhinovirus 3C protease recognition site between the mbp and TREX1 genes. The plasmid was transformed into the BL21 Star (DE3) strain of Escherichia coli (Invitrogen) for overexpression. The cells were grown to an A600 of ~0.5 at 37°C and quickly cooled on ice to 17°C. After induction with 1 mM isopropyl β-D-thiogalactopyranoside, the cells were allowed to grow for 15 h at 17°C. The MBP-mTREX1 fusion protein was purified by affinity chromatography using nickel-nitrilotriacetic acid resin (Qiagen). The MBP protein was removed from the fusion by treatment with PreScission Protease (GE Healthcare) at 4°C for 20 h. The mTREX1ΔC protein was separated from the MBP by cation exchange chromatography and dialyzed into 20 mM Tris-Cl, pH 7.5, and 100 mM NaCl. Protein was concentrated to 5 mg/ml and frozen at −80°C until needed. All TREX1 mutant plasmid constructs were produced using a PCR site-directed mutagenesis strategy (44), and the constructs were confirmed by DNA sequencing.
Protein Crystallization and X-ray Data Collection-The protein was crystallized using the sitting drop vapor diffusion technique. Substrate or product complex was formed by mixing protein with ssDNA (GACG) in a molar ratio of 1:2 or dAMP in a 1:3 ratio. 2 μl of protein complex at 5 mg/ml was mixed with an equal volume of reservoir solution and placed on a bridge above the 500-ml reservoir solution. The ssDNA complex was crystallized using 22% PEG 3350, 6% 2-methyl-2,4-pentanediol, 50 mM HEPES, pH 7.2, 50 mM NaCl, and 2 mM CaCl₂, and the dAMP complex was crystallized using 22% PEG 3350, 100 mM MES, pH 6.5, 2 mM MnCl₂, and 500 μM nucleotide monophosphate. Crystallization experiments were carried out at 20°C, and crystals appeared within 2 days. Prior to data collection, crystals were soaked in reservoir solution containing 15% glycerol for 2 min in preparation for freezing. Crystals were then mounted in a nylon loop and flash frozen in liquid nitrogen. Crystals of the TREX1-dAMP complex belong to space group R3 with unit cell dimensions a = b = 119.7 Å, c = 83.3 Å, α = β = 90°, γ = 120°. Crystals of the TREX1-ssDNA complex belong to space group P2₁ with unit cell dimensions a = 64.8, b = 57.1, c = 68.5 Å, α = γ = 90°, β = 107.5° (Table 1). Two TREX1 molecules occupy the asymmetric unit of each crystal form (V_M = 2.3 Å³/Da for the dAMP complex and 2.1 Å³/Da for the ssDNA complex).
Phasing and Refinement-X-ray data were collected using Cu Kα radiation on a MicroMax 007 generator and a Saturn 92 CCD detector (Rigaku). Intensity data were processed with the program d*TREK (28). Phases for the data were obtained by molecular replacement using the program PHASER (29) and a monomer of the human TREX2 protein (Protein Data Bank code 1Y97) as a search model (42% identity) (27). The TREX1-dAMP-Mn and TREX1-ssDNA-Ca models were built using the program COOT (30), the structures were refined without non-crystallographic symmetry restraints using the programs CNS and Refmac5 (31,32), and the validity of the refined structures was confirmed by simulated annealing-omit procedures (31). The change in the free R-factor was monitored at each step in refinement, and stereochemical parameters were inspected with the programs PROCHECK (33) and ERRAT (34). The models converged with a final R-factor of 20.5% (R_free = 25.1%) for the TREX1-ssDNA complex and 20.4% (R_free = 24.8%) for the TREX1-dAMP complex using all observed x-ray data measurements in the resolution range 50-2.1 Å. A Ramachandran plot shows that greater than 91% of all residues in the models have φ and ψ angles in the most preferred regions, with no residues in the disallowed regions. Coordinates have been deposited for mTREX1-ssDNA-Ca (2OA8) and mTREX1-dAMP-Mn (2IOC) with the indicated accession codes.

RESULTS AND DISCUSSION
Structure of the TREX1 Protein-We crystallized a truncated form of the mouse TREX1 protein that lacks 72 amino acids at the C terminus (mTREX1ΔC) in complexes with ssDNA-Ca²⁺ and dAMP-Mn²⁺. Full-length recombinant TREX1 protein is insoluble and prone to proteolytic degradation under a variety of expression and purification conditions. The truncated TREX1 protein is able to form stable dimers and possesses enzymatic activity nearly identical to that of the full-length protein. Manganese has been shown to be able to substitute for magnesium in the nucleolytic activity of TREX1 (12), whereas calcium inhibits enzymatic activity. The structures were determined by molecular replacement using the human TREX2 protein as a search model (Fig. 1, Table 1). The correct solutions each provided an initial R-factor of about 45% after placement of the two TREX1 monomers and a round of rigid body refinement, with good electron density for most of the residues in the asymmetric unit. Density for the nucleotides and two metal ions (not in the search model) was also present in the active sites of both structures (Fig. 2). The position and identity of the metal ions were confirmed by calculation of an anomalous difference map that showed only the density for the metal ions at a contour level of 5σ in each case. The crystallographic models of the TREX1 dimers have been refined to a final R-factor of 20.5% (R_free = 25.1%) and 20.4% (R_free = 24.8%) for the ssDNA complex and dAMP complex, respectively, using all x-ray data to a resolution limit of 2.1 Å (Table 1).
The TREX1 structure shows the dimeric nature of the exonuclease (Fig. 1). Each TREX1 monomer consists of a mixed α/β-fold with 5 antiparallel β-strands surrounded by 9 α-helices, closely resembling in structure the ε subunit and Exo I proteins, both members of the DnaQ family of 3′ exonucleases (SCOP data base) (35). The TREX1 monomers interact with each other along the outermost β-strand (β3) to form an extended, central antiparallel β-sheet that stretches through the length of the dimer, as seen in the TREX2 protein structure. TREX1 has an extensive dimer interface that involves 1650 Å² of buried surface area from each monomer (~15% of monomer surface), with stabilizing contributions from hydrogen bonds, van der Waals contacts, and hydrophobic interactions.
DNA Binding and Substrate Specificity-The TREX1 protein was crystallized in both substrate and product complexes. The ssDNA is bound in the TREX1 active site mostly through a combination of sequence-independent hydrogen bonding and hydrophobic interactions (Fig. 3a). The phosphate oxygens of the 3′ nucleotide contribute to the coordination of two divalent metal ions along with carboxylate oxygens of Asp18, Glu20, Asp130, and Asp200 and several water molecules. A superposition of the substrate and product complexes shows minimal change in the protein structure between the two complexes (root mean square deviation = 0.6 Å). The 3′ nucleotide of the ssDNA is bound in the same orientation as the dAMP in the product complex, with only a minor change in orientation of the phosphate group of the dAMP (Fig. 3b), as similarly seen in the substrate and product complexes of the exonuclease domain of the E. coli DNA polymerase I protein (36).
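The root mean square deviation quoted above summarizes how far equivalent atoms lie from one another once the two complexes are superposed. As a minimal sketch, assuming the two coordinate sets have already been superposed and are listed in the same atom order, the calculation can be written as follows. The function name and the toy coordinates are illustrative only and are not taken from the deposited models.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates in angstroms.

    Assumes the two sets are already superposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example (hypothetical coordinates): every atom displaced by 0.6 angstroms along x.
substrate = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
product = [(x + 0.6, y, z) for x, y, z in substrate]
print(round(rmsd(substrate, product), 3))
```

In practice the superposition itself (for example a least-squares fit over Cα atoms) would be carried out first by a structure analysis program; this sketch covers only the final averaging step.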
The structures of the substrate and product complexes reveal a nucleotide binding pocket formed by strand β1 and helix α2 within the active site. This pocket interacts with the base, deoxyribose sugar, and phosphate moieties of the 3′ end of DNA, which in turn provides substrate specificity and spatial positioning of terminal nucleotides for efficient catalysis (Fig. 3a). The adenine base of the bound nucleotide [...] These specific interactions within the nucleotide binding pocket are significant because they explain the inability of the TREX1 enzyme to hydrolyze DNA that has modified 3′ termini. DNA containing a phosphate, phosphoglycolate, or tyrosyl residue at the 3′ terminus is resistant to digestion by the TREX1 enzyme (37) because such a bulky modification sterically hinders the proper positioning of the nucleotide in the binding pocket and thereby prevents catalysis. Likewise, an abasic nucleotide is resistant to hydrolysis by TREX enzymes (data not shown), presumably due to the missing interactions of the base within the cleft that help anchor the terminal nucleotide in the correct orientation for hydrolysis.
Efficient hydrolysis of 3′ nucleotides by the TREX enzymes seems to be, in part, a consequence of their ability to interact efficiently with DNA. The apparent high affinity of TREX1 and TREX2 for DNA, compared with other DnaQ exonucleases such as ε (38,39), has been attributed to a flexible loop between helices α6 and α7, adjacent to the active site, that contains conserved arginine residues (12,27). In the human TREX2 protein the loop contains 3 arginines shown to contribute to tight DNA binding (Arg163, Arg165, and Arg167). Mutation of any of the individual arginines to alanine increases the apparent Km value for DNA of TREX2 by about 5-fold and as much as 100-fold when all three are mutated. By contrast, the corresponding loop in TREX1 has only a single conserved arginine residue (Arg174), and yet the apparent Km value of TREX1 for single-stranded DNA (~10 nM) is about 10-fold lower than that of the TREX2 protein (~100 nM), with nearly identical kcat values measured for the two enzymes (~12 s⁻¹) (12). There are several possible explanations for this apparent paradox. The first is that there are other residues outside the α6-α7 loop in the TREX1 protein contributing to DNA binding. A comparison of the TREX1 and TREX2 protein structures shows that TREX1 utilizes residue Arg128, located at the edge of the active site, to participate in DNA binding through hydrogen bond interactions with the O6 and N7 of the 5′ guanine base, which is rotated in a syn conformation (Fig. 3a). Although the arginine is interacting specifically with the guanine, it could potentially make similar hydrogen bond interactions with the exocyclic oxygen or amine of any base, and might contribute to destabilizing double-stranded DNA to provide single-stranded substrate for the enzyme active site. The corresponding residue in the TREX2 protein is Asp121, which has a much shorter side chain and therefore is unable to participate in similar interactions with the DNA.
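The comparison of kinetic constants above can be made concrete with a short calculation. The sketch below assumes simple Michaelis-Menten behavior and uses the approximate values quoted in the text (apparent Km of about 10 nM for TREX1 and about 100 nM for TREX2, kcat of about 12 s⁻¹ for both). The function names and the 5 nM substrate concentration are illustrative choices, not values from this study.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/Km in M^-1 s^-1."""
    return kcat_per_s / km_molar

def turnover_rate(kcat_per_s, km_molar, substrate_molar):
    """Michaelis-Menten rate per active site (s^-1) at a given substrate concentration."""
    return kcat_per_s * substrate_molar / (km_molar + substrate_molar)

NM = 1e-9  # one nanomolar expressed in molar

# Approximate values quoted in the text.
trex1 = {"kcat": 12.0, "km": 10 * NM}
trex2 = {"kcat": 12.0, "km": 100 * NM}

eff1 = catalytic_efficiency(trex1["kcat"], trex1["km"])
eff2 = catalytic_efficiency(trex2["kcat"], trex2["km"])
print(f"TREX1 kcat/Km: {eff1:.1e} M^-1 s^-1")
print(f"TREX2 kcat/Km: {eff2:.1e} M^-1 s^-1")
print(f"Efficiency ratio TREX1/TREX2: {eff1 / eff2:.1f}")

# Hypothetical substrate concentration chosen for illustration only.
s = 5 * NM
rate1 = turnover_rate(trex1["kcat"], trex1["km"], s)
rate2 = turnover_rate(trex2["kcat"], trex2["km"], s)
print(f"Rate at 5 nM DNA: TREX1 {rate1:.2f} s^-1, TREX2 {rate2:.2f} s^-1")
```

Because the kcat values are nearly identical, the roughly 10-fold lower apparent Km of TREX1 carries over directly into a roughly 10-fold higher kcat/Km, and the difference between the two enzymes is largest at DNA concentrations well below either Km.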
An additional explanation for the apparent differences between TREX1 and TREX2 in DNA binding affinity might result from entropic differences during the DNA binding event. The DNA binding loop in TREX2 was proposed to be flexible based on the lack of electron density in the structure for 9 residues including the 3 arginines (27). Similarly, in the TREX1 structure much of the same loop is disordered; however, the proximal part of the loop containing the conserved Arg174 is present in the structure. If DNA binding requires an ordering of this loop, particularly the region containing the arginine residues, then the entropic cost of binding might be lower for TREX1 as compared with TREX2.
TREX1 Protein Interactions-The discovery that the TREX1 enzyme participates in the SET complex provides a framework for beginning to understand TREX1 protein interactions (26). An interesting feature of the TREX1 protein structure is a polyproline II helix (PPII) formed by a non-repetitive proline-rich region that contains six prolines within an 8-amino acid stretch (Fig. 4). The number of crystal structures of proteins containing PPII helices with more than five Cα atoms is very low, presumably because they are often flexible regions that are hard to crystallize (40). The left-handed, three-sided helix, composed of residues 54-62 in mTREX1, is situated near the dimer interface. Symmetry places the PPII helix of the other monomer about 20 Å away on the opposite edge of the same face of the dimer (Fig. 4b). Among the TREX enzymes and DnaQ exonucleases this motif is unique to the TREX1 protein, with the corresponding region in the TREX2 enzyme being a β-hairpin. Repetitive proline-rich sequences are found in many proteins and are widely thought to function as docking sites for signaling modules such as Src homology 3, WW (named for a conserved Trp-Trp motif), and Enabled/VASP homology (EVH1) domains (41). These interactions are often relatively weak, requiring more than one proline-rich region for stable binding. The positioning of the two PPII helices on the same face of the dimer appears ideal for allowing interactions with two of these small interacting modules without occluding the active sites and is likely to be a key mode of protein-protein interaction for the TREX1 protein. Direct interactions between the TREX1 and SET proteins might indicate that the SET protein contains one or more of the proline-rich region interacting domains.
Another important protein interaction for the TREX proteins is the dimer interface. Based on conserved residues at the interface, it has been proposed that the dimeric structure of all TREX exonucleases is conserved among species that express this enzyme (27). This suggests that the dimeric nature of the protein plays a critical role in its biological or enzymatic function. Trex1 null mice display pathological changes in lymphoid organs that suggest a role for this enzyme in processing of DNA termini during a subset of V(D)J recombination events, receptor editing, or class switch recombination. The need for simultaneous processing of two DNA termini prior to end joining or other DNA recombinational/repair events might have necessitated the evolution of a dimeric exonuclease. This structure of the TREX1-DNA complex with active sites on opposite sides of the dimer shows how the enzyme can accommodate the simultaneous processing of two 3′ ends.
TREX1 and Disease-The recent findings by Crow et al. (22) demonstrating that TREX1-inactivating mutations cause Aicardi-Goutières syndrome (AGS) prompted our investigation into the effects of these specific mutations on TREX1 activity. Five distinct mutations within the TREX1 gene were identified as the underlying cause of AGS (22), and a form of systemic lupus erythematosus, familial chilblain lupus, is also known to map to the same genetic locus (23). AGS is a severe neurological brain disease that shares many symptoms with Cree encephalopathy, microcephaly, intracranial calcification syndrome, and systemic lupus erythematosus (42). AGS symptoms also closely mimic those of in utero viral infections (43). Two of the identified mutations in the TREX1 gene produce truncated or frame-shifted proteins at amino acids 164 and 20, respectively, which are presumably non-functional, whereas three others are point mutations resulting in Arg114 to His (R114H), Val201 to Asp (V201D), and an insertion of Asp at residue 201 (D201ins). The Arg114 and Val201 residues are conserved in all known TREX1 sequences.

TABLE 2 Activities of TREX1 enzymes
Activities for human (h) and mouse (m) enzymes were determined from reactions containing the appropriate concentrations of TREX1 between 7.6 and 311,000 pM and 50 nM 5′-labeled 30-mer ssDNA. Products were separated on urea-polyacrylamide gels and quantified as described previously (27). D201ins mutation is the insertion of an aspartate at amino acid 201 (see text for details).
Columns: Enzyme, Activity.

To provide insight into the molecular bases for this disease, we used our mouse TREX1 structure in conjunction with biochemical assays of both human and mouse TREX1 enzymes to understand the functional consequences of these mutations on the [...] S1). Relative to wild-type enzyme, the D201ins mutation had the most dramatic change in enzymatic activity with a 35,000-fold decrease for the human protein (100,000-fold for mouse), whereas the mutation of V201D had only a 4-fold reduction in activity (3-fold for the mouse). Val201 is located adjacent to the active site in the helix that contains two conserved catalytic residues, His195 and Asp200 (Fig. 5). The resulting difference in the activities of V201D and D201ins is likely due to the fact that the insertion mutation acutely disrupts the helix and prevents proper positioning of the catalytic residues for nucleotide hydrolysis, whereas the substitution mutation is better accommodated by the helix. Mutation of R114H results in a 50-fold reduction of the human TREX1 exonuclease activity when compared with wild-type enzyme (40-fold for the mouse). Residue Arg114 lies at the TREX1 dimer interface, where it hydrogen bonds with two backbone carbonyl oxygens of the opposing monomer. Mutation to histidine would disrupt these hydrogen bonds and locally destabilize the dimer interface. Although the R114H mutation is relatively distant from the active site (>15 Å), the reduction in activity by this mutation may be a further indication that the dimeric nature of the TREX1 enzyme is required for catalytic competency. The extremely broad range in catalytic activities among these three mutant forms of TREX1 makes it surprising that all yield similar human pathologies. The exonuclease activity measured in vitro for the TREX1 V201D mutation indicates that a 4-fold reduction in TREX1 activity is sufficient to elicit the severe neurological AGS phenotype in the homozygous individual. This contrasts with the 35,000-fold reduction in activity of the TREX1 D201ins mutation, indicating an essentially inactive TREX1 dimer.
These mutational data suggest that the high level of activity observed for this enzyme (10,12,27) has been biologically fine-tuned and that even minor changes in activity can have dramatic physiological consequences. Alternatively, multiple mechanisms must be considered to underlie the clinical phenotypes observed in patients carrying various TREX1 mutations, which might disrupt catalytic activity or perhaps the ability of TREX1 to interact appropriately with additional cellular partners, translating to an essential loss in activity.
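To put the fold reductions discussed above on a common scale, they can be converted to residual activity relative to wild type. The sketch below uses only the fold reductions quoted in the text for the human and mouse enzymes; the function name and dictionary layout are illustrative.

```python
def residual_activity_percent(fold_reduction):
    """Residual activity as a percentage of wild type for a given fold reduction."""
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be positive")
    return 100.0 / fold_reduction

# Fold reductions relative to wild type as quoted in the text: (human, mouse).
fold_reductions = {
    "V201D": (4, 3),
    "R114H": (50, 40),
    "D201ins": (35_000, 100_000),
}

for mutant, (human, mouse) in fold_reductions.items():
    print(f"{mutant}: human {residual_activity_percent(human):.4g}% of wild type, "
          f"mouse {residual_activity_percent(mouse):.4g}% of wild type")
```

On this scale the human V201D enzyme retains a quarter of wild-type activity, whereas the D201ins enzyme retains less than 0.003%, which illustrates the breadth of the range noted in the text.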