Structural Basis of Substrate-binding Specificity of Human Arylamine N-Acetyltransferases*

The human arylamine N-acetyltransferases NAT1 and NAT2 play an important role in the biotransformation of a plethora of aromatic amine and hydrazine drugs. They are also able to participate in the bioactivation of several known carcinogens. Each of these enzymes is genetically variable in human populations, and polymorphisms in NAT genes have been associated with various cancers. Here we have solved the high resolution crystal structures of human NAT1 and NAT2, including NAT1 in complex with the irreversible inhibitor 2-bromoacetanilide, a NAT1 active site mutant, and NAT2 in complex with CoA, and have refined them to 1.7-, 1.8-, and 1.9-Å resolution, respectively. The crystal structures reveal novel structural features unique to human NATs and provide insights into the structural basis of the substrate specificity and genetic polymorphism of these enzymes.

Arylamine N-acetyltransferases (NATs, EC 2.3.1.5)² catalyze the acetyl-CoA-dependent N-acetylation of primary aromatic amines and hydrazines, as well as the O-acetylation of their N-hydroxylated metabolites, thereby influencing the biological activity and toxicity of this class of chemicals (1-3). NAT enzyme activity therefore plays an important role in determining the duration of action and pharmacokinetics of aromatic amine-containing drugs used in clinical therapy, as well as in influencing the balance between detoxification and metabolic activation of aromatic amine procarcinogens (4). In the latter regard, N-acetylation reactions are considered to be protective, because the resulting arylacetamide derivatives are chemically stable, whereas O-acetylation of hydroxylamines produces acetoxy esters that can spontaneously decompose to electrophilic, DNA-binding nitrenium ions (5). Thus a better understanding of the structural features of NATs that contribute to their relative abilities to catalyze such reactions for various amine substrates may be of considerable predictive value in optimizing drug development and biomedical toxicology.
Two NATs, NAT1 and NAT2, have been annotated in the human genome. Each of these enzymes is genetically variable in human populations, with variable activity of NAT2 being responsible for the classic acetylation polymorphism that was discovered over half a century ago with the advent of isoniazid therapy for tuberculosis (6-8). In addition to isoniazid, the disposition of several other therapeutically useful drugs is affected by defective NAT2 function, including hydralazine, phenelzine, procainamide, and some of the sulfonamide antibacterials. NAT1 is highly homologous to NAT2 yet kinetically distinct, such that certain aromatic amines (such as isoniazid and sulfamethazine) are preferentially acetylated by NAT2, whereas others (such as p-aminosalicylic acid and p-aminobenzoic acid) are selective substrates for NAT1. NAT1 is also genetically variable in human populations. Numerous epidemiological studies have reported associations between both NAT1 and NAT2 variation and the occurrence of cancers related to exposure to aromatic amines (such as 4-aminobiphenyl and 2-naphthylamine) in the environment, although in many instances such findings have been contradictory.
NAT enzyme orthologues are also present in most (but not all) mammalian species, as well as in prokaryotes. Although the crystal structures of some prokaryotic NAT enzymes have been reported, eukaryotic NATs have so far eluded any attempts at the solution of a crystal structure. Here we present high resolution crystal structures of both human NATs, including NAT1 in complex with the irreversible inhibitor 2-bromoacetanilide, a site-directed NAT1 mutant, NAT1_F125S, and a NAT2-CoA complex. By comparing our structures with known prokaryotic structures, we provide evidence for novel structural features of human NATs that are absent in bacterial enzymes, including an insertion that produces a significant difference in the structure of the carboxyl terminus of the eukaryotic enzymes. We also provide the first complete picture of the distinct substrate selectivity of human NAT1 and NAT2, as well as key features of the naturally occurring variants of NAT1 and NAT2 that may explain the structural basis of their defective function.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-DNA fragments encoding human NAT1 and NAT2 were amplified by PCR and subcloned into modified cloning vectors (pET28a-LIC and pET28a-MHL, respectively), downstream of the polyhistidine coding region. Human NAT1 and NAT2 proteins were expressed in Escherichia coli BL21(DE3) codon plus RIL strain (Stratagene). E. coli cells carrying the NAT1_F125S mutant expression plasmid were grown in M9 minimal medium supplemented with 50 mg/liter kanamycin, induced with isopropyl-1-thio-β-D-galactopyranoside at a final concentration of 1 mM in the presence of 50 mg/liter selenomethionine, and incubated overnight at 15°C. Cells carrying wild-type NAT1 and NAT2 expression plasmids were grown at 37°C in Terrific Broth, induced by addition of isopropyl-1-thio-β-D-galactopyranoside, and incubated overnight at 15°C.
For purification of NAT1_F125S, the cells were harvested by centrifugation and lysed by passing through a microfluidizer (Microfluidics Corp.). The lysate was loaded onto a HiTrap Chelating column (Amersham Biosciences) charged with Ni²⁺. The column was washed with 10 column volumes of 20 mM Tris-HCl buffer, pH 8.0, containing 500 mM NaCl, 50 mM imidazole, and 5% glycerol, and the protein was eluted with elution buffer (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 250 mM imidazole, 5% glycerol). Eluted protein was treated with iodoacetamide at room temperature for 30 min. The protein was then loaded onto a Superdex 200 column (26 × 600 mm, Amersham Biosciences) equilibrated with 20 mM Tris-HCl buffer, pH 8.0, and 150 mM NaCl. Thrombin (Sigma) was added to combined fractions containing NAT1 to remove the His tag. The protein was further purified to homogeneity by ion-exchange chromatography. Wild-type NAT1 was purified as described above, except that the eluted protein from the nickel column was treated with 2-bromoacetanilide. For purification of NAT2, the eluate from the nickel column was dialyzed against 20 mM Tris-HCl, pH 8.0, 250 mM NaCl, 5% glycerol in the presence of tobacco etch virus protease. The protein was further purified to homogeneity by ion-exchange chromatography.
Crystallization and Data Collection-Purified NAT1_F125S was crystallized using the hanging drop vapor diffusion method at 20°C by mixing 1 μl of the protein solution with 1 μl of reservoir solution containing 26% polyethylene glycol 4000, 0.2 M sodium acetate, 0.1 M Tris-HCl, pH 7.8. Wild-type NAT1 protein was crystallized by the same method in a solution containing 30% polyethylene glycol 4000, 0.2 M sodium acetate, 0.1 M Tris-HCl, pH 8.5. Purified NAT2 protein was complexed with CoA at a 1:10 molar ratio of protein to CoA and crystallized using the sitting drop vapor diffusion method by mixing 1 μl of protein solution with 1 μl of reservoir solution containing 2.5 M ammonium sulfate, 0.1 M Tris-HCl, pH 8.5.
Diffraction data for crystallographic model refinement were collected as shown in Table 1. For an isomorphous crystal of the NAT1_F125S, data sets at wavelengths of 0.9791 and 0.9795 Å were also collected at the National Synchrotron Light Source, beamline X25. All diffraction images were reduced to structure factor amplitudes using the HKL2000 software suite (9).
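The two wavelengths quoted for the NAT1_F125S data sets can be related to photon energy through E = hc/λ. The short Python sketch below is only an illustration of that conversion; the function name and constant name are hypothetical, and the text does not state which wavelength served as peak or inflection.

```python
# Convert an X-ray wavelength in Å to photon energy in keV using E = hc / λ.
HC_KEV_ANGSTROM = 12.39842  # h·c expressed in keV·Å


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy (keV) for a wavelength given in Å."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # Wavelengths reported for the selenomethionine-labeled crystal.
    for wavelength in (0.9791, 0.9795):
        energy = wavelength_to_energy_kev(wavelength)
        print(f"{wavelength:.4f} Å corresponds to {energy:.3f} keV")
```

Both values fall close to the selenium K absorption edge (about 12.66 keV), as would be expected for anomalous diffraction data from a selenomethionine derivative.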
Structure Determination and Refinement-The selenium substructure of NAT1_F125S was solved by multiple wavelength anomalous diffraction (10) with the program SHELXD (11), and phasing was performed using SHELXE (12). The structures of the NAT1-acetanilide and NAT2-CoA complexes were solved by molecular replacement with the program PHASER (13) and a nearly refined model of NAT1_F125S as a search template. Initial model building was carried out with ARP/wARP (14). Restrained refinement, geometric validation, and manual rebuilding were iterated for several cycles using REFMAC (15), MOLPROBITY (16), and COOT (17), respectively.
Docking Simulations-The modeling studies presented in this report were conducted as follows. The structure of NAT1 with the covalently bound 2-bromoacetanilide inhibitor was used after the removal of the acetanilide molecule and restoration of the terminal thiol group of Cys-68. Similarly, the CoA cofactor was removed from the NAT2 structure. Hydrogen atoms were added to both NAT1 and NAT2 and subsequently minimized (keeping the heavy atoms frozen) using the AMBER 8.0 …
Figure legend (sequence alignment): Identical residues are colored red in the alignment. Secondary structure elements of human NATs are assigned by the PROCHECK program (35) and are shown above the sequences and labeled: helices are shown as cylinders and strands as arrow bars. Residues of the catalytic triad are labeled with stars; the residues interacting with CoA in NAT2 are labeled with red triangles; the residues involved in substrate binding are labeled with blue triangles. The alignment was generated using ClustalW (36) and adjusted by hand.

RESULTS AND DISCUSSION
Overall Fold-To elucidate the structural determinants that govern substrate-binding specificity we aimed to solve several crystal structures of NAT1 and NAT2. NAT1 and NAT2 were expressed in Escherichia coli and successfully purified to high protein purity and homogeneity, which was critical to obtain high quality crystals. First we solved the crystal structure of the NAT1_F125S mutant by using multiple wavelength anomalous diffraction phasing and refined it to 1.7-Å resolution. We used the alkylated enzyme to improve crystal quality. Only one cysteine residue was alkylated in our experiment, as was confirmed by mass spectrometry (data not shown). The catalytic cysteine Cys-68 was found to be alkylated during the refinement of the crystal structure (see below and Fig. 5). We then solved wild-type NAT1 with covalently bound 2-bromoacetanilide and the NAT2-CoA complex by molecular replacement and refined them to 1.8- and 1.9-Å resolution, respectively (Table 1). The NAT1-acetanilide complex was crystallized in space group P2₁ with one protein molecule per asymmetric unit, whereas the NAT2-CoA complex was crystallized in space group P4₃2₁2 with two protein molecules per asymmetric unit. The presence of two molecules of NAT2-CoA in the asymmetric unit is likely a consequence of its crystallization in high salt (2.5 M ammonium sulfate), leading to tight crystal packing. Gel-filtration profiles for NAT2 suggest that the functional protein exists in monomeric form in solution.
The three-dimensional structures of NAT1 and NAT2 are similar to each other, and the Cα atoms of residues 1-290 can be superimposed with an r.m.s.d. of 0.7 Å. The fold of human NAT1 and NAT2 largely resembles the overall structure of their prokaryotic orthologues (Fig. 1), which is traditionally described in terms of three domains (20).
The amino-terminal domain (residues 1-83) consists of five helices (α1-α5) and one short β-strand between helices α2 and α3. The second domain comprises residues 85-192 and consists of nine β-strands (β2-β11 with two short helices α6 and α7). These two domains connect through the α-helical interdomain (helices α8-α10) to the third domain, which has four anti-parallel β-strands (β12-β15) and helix α11. Helix α11 precedes a stretch of well resolved residues that lead across the protein molecule's surface into a buried carboxyl terminus (residue 290). In human NATs, residues 167 and 183 bracket a 17-residue insertion, which is absent in the structures of prokaryotic NATs (Figs. 1 and 2). The insertion presented an obstacle in previous modeling attempts (21, 22) but is well resolved in the electron density of our crystal structures (supplemental Fig. S1). The function of the insertion in human NATs is not fully understood. However, a recent study has shown that the insertion contributes to the stability of human NATs (22). The first few residues of this insertion form a short β-strand (β10) and extend the previously mentioned 4-stranded anti-parallel β-sheet (domain III) by parallel alignment to its fourth strand, β15. The insertion also acts as a lid to bury the carboxyl terminus of the protein and makes several contacts with residues from the β14 and β15 strands and from the β14-β15, β9-β10, and α6-β14 loops (not shown), and thus contributes to the overall stability of the protein. Interestingly, the structures of NAT1 and NAT2 show a significant modification in the arrangement of their carboxyl-terminal residues when compared with prokaryotic NATs. Beginning at residue 274, the carboxyl-terminal helix observed in prokaryotic NATs is dissolved ("stretched out"). Instead, the carboxyl terminus extends deep within the folded human NATs, into close proximity to the buried catalytic triad (Fig. 1), where it is capable of interacting with CoA (Fig. 3A), as described below.
The human NAT1 and NAT2 fold shows no similarity to the GCN5-related N-acetyltransferase (23) and amino(α)-terminal acetyltransferase superfamilies (the crystal structure of NAT13 is available now, PDB code 2OB0).³ Previous studies of bacterial NATs revealed a cysteine protease-like catalytic core (20). The Cys-His-Asp triad (Cys-68, His-107, and Asp-122 in human NATs) is conserved among prokaryotic and eukaryotic NATs (Fig. 2). An all-atom superimposition of the relevant residues in Mycobacterium smegmatis (Cys-70, His-110, and Asp-127) with those in human NAT1 and NAT2 (Cys-68, His-107, and Asp-122) yields an r.m.s.d. of 0.2 Å, whereas the overall structures align with an r.m.s.d. of 1.6 Å for all atoms. Importantly, the position of the catalytic residues is identical in all three of our crystal structures. The arrangement of these residues is maintained by a complex network of non-bonding, polar interactions, involving residues that are highly conserved among eukaryotic and prokaryotic NATs (Fig. 3B).
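The r.m.s.d. values quoted in this section summarize how far matched atoms lie from one another after superimposition. As a minimal sketch, assuming the two coordinate sets have already been optimally superimposed (the least-squares fitting step itself is not shown), the quantity can be computed in Python as follows. The coordinates in the example are hypothetical and are not taken from the deposited structures.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Root mean square deviation (Å) between two matched, pre-superimposed atom lists."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("atom lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(atoms_a, atoms_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(atoms_a))


if __name__ == "__main__":
    # Hypothetical coordinates: every atom in the second set is shifted 0.2 Å along x.
    reference = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    shifted = [(x + 0.2, y, z) for (x, y, z) in reference]
    print(f"r.m.s.d. = {rmsd(reference, shifted):.2f} Å")
```

Because the sum runs over whichever atoms are supplied, the same function covers a Cα-only comparison or an all-atom comparison of a few residues, which is why the triad and whole-structure values above can differ so much.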
Substrate-binding Site-The crystal structure of NAT2 in complex with CoA contains two protein molecules per asymmetric unit that can be superimposed with an r.m.s.d. of 0.3 Å for Cα atoms. Each of the two polypeptide chains binds a CoA molecule in an equivalent position (Fig. 3A). Although the structures show no obvious similarity to acetyl-CoA-dependent N-acetyltransferases of the GCN5-related superfamily, the CoA molecules bind to the enzyme in a conformation similar to that found in GCN5-related acetyltransferases (23). The bound CoA is bent and wraps around the surface of the protein, positioning the sulfhydryl group of CoA in the cleft between the α-helical interdomain and the β-barrel (domain II). The β-mercaptoethylamine and pantothenate moieties are located ~12 Å deep inside the cleft, placing the cofactor's sulfhydryl group close to that of the catalytic cysteine (Cys-68), with a sulfur-sulfur distance of approximately 2.74 Å (Fig. 4).
Hydrophobic residues Phe-37, Phe-93, Leu-209, Phe-217, and Leu-288 make an extensive set of van der Waals contacts throughout most of the length of the pantetheine arm of CoA and thus play a major role in orienting the reactive sulfhydryl group for acetyl transfer (Figs. 3A and 4). In addition, the carbonyl group of the pantothenate moiety establishes a hydrogen bond with the nitrogen atom of Ser-216. No other direct or water-mediated hydrogen bonds are found between the pantetheine arm of CoA and the enzyme.
The pyrophosphate group of CoA makes a series of direct hydrogen bonds with the sequential amide nitrogens of residues Thr-103 and Gly-104 in the β3-β4 loop and with the hydroxyl groups of Tyr-208 and Thr-214 (equivalent to Ser-214 in NAT1) from helix α9 and the α9-α10 loop, respectively (Figs. 3A and 4). The crystal structure of the NAT2-CoA complex does not support previous indications that the pyrophosphate loop ("P-loop") starts at the Gly-126 residue of Salmonella typhimurium and lies proximal to the active site (20, 24).
The N6 of the adenine ring of CoA forms a single hydrogen bond with the side chain of Ser-287. The adenine ring also makes contacts with hydrophobic residues Pro-97 and Val-98. In NAT1, phenylalanine is in position 287, suggesting a mainly hydrophobic interaction of the adenine ring of CoA with NAT1 and NAT2.
Superimposition of the NAT2-CoA complex and NAT1_F125S structures shows that CoA binding induces slight structural rearrangements that are confined to the β3-β4 loop ("P-loop"). This loop moves by 1.8 Å toward the center of the cleft upon CoA binding.
The active site of NAT1 is clearly defined by the crystal structure of wild-type NAT1, which reacted with the irreversible inhibitor 2-bromoacetanilide, leading to acetanilide bound to the sulfur atom of the catalytic cysteine (Cys-68). The aromatic ring of the acetanilide molecule mimics substrates of NAT1. It stacks with the hydrophobic surface of the substrate-binding site formed by the aromatic ring of Phe-125 and the side chain of Val-93 (Fig. 5A). It also interacts with the side chains of other hydrophobic residues (Ile-106 and Phe-217). Phe-217 is highly conserved in many NATs, whereas residues Val-93, Ile-106, and Phe-125 in human NAT1 are replaced by Phe, Val, and Ser in NAT2. The hydrophobic environment is required for the binding of substrates. Our structure of the NAT1_F125S mutant with acetamide covalently bound to Cys-68 revealed a similar environment except for the substitution of Phe-125 with serine (Fig. 5B). As has been described previously (25), this substitution virtually abolishes the preference of NAT1 for p-aminosalicylic acid (PAS) over sulfamethazine (SMZ) by reducing its affinity for the smaller PAS and increasing its affinity for the bulkier SMZ. As a result, the substrate selectivity of NAT1_F125S resembles that of wild-type NAT2.
³ A. Plontnikov, manuscript in preparation.
Substrate Binding Specificity of Human NATs-With high resolution crystal structures of both NATs in hand, we attempted to rationalize the dependence of NAT substrate selectivity on the nature of key residues. We modeled the substrates for NATs (PAS for NAT1 and SMZ for NAT2) into the active sites (Fig. 5, C-F). Inspection of the substrate-binding sites of both NAT1 and NAT2 revealed key features that provide clues to their respective selectivities for binding.
First, the substrate binding pocket in NAT1 is smaller (162 Å³) than that of NAT2 (257 Å³). This is a consequence of two key residue substitutions at positions 127 and 129, namely Arg-127 and Tyr-129 in NAT1, whereas in NAT2, serine residues occupy these positions (Fig. 5, C and E). The effect of having two bulky groups protruding into the substrate-binding site is significant, in essence reducing the volume of the NAT1 pocket by almost 40% compared with that of NAT2 (Fig. 5, D and F). The second major substitution takes place at position 93, where NAT1 exhibits a valine and NAT2 a phenylalanine (Fig. 5, C and E). This substitution has the effect of introducing a "lip" in the van der Waals surface of the binding pocket of NAT2, making it more selective for substrates that can accommodate this feature (Fig. 5, D and F). The F125S mutation does not change the volume of the substrate binding pocket to any significant measure, but rather alters the extent of the hydrophobic interactions with the pocket surface at this position. Docking studies suggested a clear preference for substrate types as a function of the specific enzyme. Aminobenzyl compounds were identified to bind preferably to NAT1. In NAT1, the amino group of PAS makes hydrogen bond contacts with His-107 and the backbone carbonyl of Ile-106. The acid group of PAS was directed toward Arg-127, also making hydrogen bond interactions. The aromatic ring of PAS presented π-stacking interactions with the phenyl group of Phe-125 (Fig. 5D), much like the pose of the co-crystallized acetanilide fragment discussed above (Fig. 5A). In contrast, the sulfonamide class of compounds bound selectively to NAT2. This largely resulted from the presence of Phe-93 in NAT2, whose side chain introduces a bump in the van der Waals surface of this enzyme's pocket (Fig. 5F).
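The "almost 40%" figure follows directly from the two pocket volumes quoted above. The arithmetic is spelled out in the small Python sketch below; the function and constant names are illustrative only.

```python
# Pocket volumes quoted in the text, in cubic ångströms.
NAT1_POCKET_VOLUME = 162.0
NAT2_POCKET_VOLUME = 257.0


def relative_reduction(smaller: float, larger: float) -> float:
    """Fractional reduction of `smaller` relative to `larger`."""
    if larger <= 0 or smaller < 0:
        raise ValueError("volumes must be positive")
    return (larger - smaller) / larger


if __name__ == "__main__":
    difference = NAT2_POCKET_VOLUME - NAT1_POCKET_VOLUME
    fraction = relative_reduction(NAT1_POCKET_VOLUME, NAT2_POCKET_VOLUME)
    print(f"Difference: {difference:.0f} Å^3")
    print(f"NAT1 pocket is {fraction:.1%} smaller than the NAT2 pocket")
```

The difference of 95 Å³ corresponds to a reduction of about 37% relative to the NAT2 pocket.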
The sulfonamide series in the library of virtual substrates, by virtue of their hinge-like structure, were identified by the docking experiments as the set of compounds that can accommodate this substitution and capitalize on the most binding interactions. The results of these docking simulations are consistent with enzyme kinetic studies of wild-type and site-directed mutant forms of NAT1 and NAT2.⁴
Structural Consequences of NAT Polymorphism-Human NATs display considerable person-to-person variation in activity due to genetic polymorphism. In NAT1, there are 29 variant alleles that are associated with 11 amino acid alterations, whereas in NAT2, there are 32 variant alleles associated with alterations of 12 amino acids (Fig. 6). A complete list of the SNPs in both NAT1 and NAT2 can be obtained on-line (Dept. of Pharmacology and Toxicology, University of Louisville School of Medicine). Polymorphisms in NAT genes have been associated with a variety of drug-induced toxicities as well as cancer. Connections have been established between the NAT2 acetylator phenotype and certain cancers such as cancer of the urinary bladder (8, 26-30).
The crystal structures of NAT1 and NAT2 provide a rationale for the altered function of the NAT variants. Most of the human NAT variants have altered residues that are conserved between the two proteins, with a few exceptions (Fig. 6B). The positions of the point mutations in NAT1 and NAT2 are scattered throughout the entire primary structure, but with a majority of the substitution sites (Arg-117, Arg-166, Val-149, Ser-214, Met-205, and Glu-261 in NAT1; Ile-114, Glu-167, Arg-197, Lys-268, Lys-282, and Gly-286 in NAT2) located on the surfaces of the proteins and thus exposed to the cellular environment (Fig. 6A). Reduction in the enzymatic activity of several of these NAT variants is believed to be due to increased ubiquitinylation of the mutant proteins and their degradation by the 26S proteasome, because activity reduction correlates with reduced protein level, while the transcription rate of the NAT genes and the enzyme kinetics are unchanged (4, 31).
The crystal structures of NAT1 and NAT2 show that a few mutations are involved in intramolecular polar or hydrophobic interactions (R64W, E167Q, R187Q, and D251V in NAT1; R64Q or R64W, D122N, L137F, and Q145P in NAT2) (Fig. 7). In NAT1 and NAT2, Arg-64 is located on the α4-α5 loop and makes hydrogen bond interactions with Glu-38 and Asn-41 (Fig. 7A). Substitutions of Arg-64 with Trp or Gln will destroy these interactions. The bulkier side chain of tryptophan will clash with the surrounding residues, thereby changing the local protein structure. Also, the surface-exposed hydrophobic side chain of tryptophan may lead to aggregation of the protein. This structural information provides the basis for previous observations that the recombinant protein product of NAT1*17 (R64W) was found to be insoluble when expressed in E. coli and forms microaggregates in vivo (31).
The R187Q (NAT1*14B) alloenzyme exhibits slow acetylation activity due to a substantially lower level of the expressed protein (4). Arg-187 forms a hydrogen bond with Glu-182, which is located in the 17-residue insertion unique to human NAT1 and NAT2 (Fig. 7B) and required for the stability of the human NATs (22). Therefore, the R187Q substitution most likely decreases the stability of the protein and thereby lowers the protein level. However, R187Q also decreases the affinity of NAT1 for PAS by ~15-fold (32), suggesting that this mutation also alters the active site topology.
The D251V (NAT1*22) substitution is located on strand β15, and Asp-251 makes contacts with Arg-242 and Asn-245 through hydrogen bonding (Fig. 7C). In D251V, changing the charged residue (Asp-251) to a hydrophobic residue (Val) would break the interactions and result in destabilization of the protein. This is in agreement with biochemical data that showed low catalytic activity associated with reduced protein expression in bacterial (33) and yeast (4) systems, which is apparently due to decreased stability of the protein.
⁴ D. Grant, manuscript in preparation.
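For readers who wish to keep track of the interactions discussed for the three NAT1 variants above, the text can be condensed into a small lookup table. The Python sketch below is merely one way of organizing those statements; the class and function names are hypothetical, and the entries restate only what is described in this section.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VariantContacts:
    """Hydrogen-bond partners of the wild-type residue that a substitution disrupts."""
    allele: str
    lost_partners: Tuple[str, ...]


# Entries restate the interactions described in the text for NAT1 (Fig. 7, A-C).
NAT1_VARIANTS: Dict[str, VariantContacts] = {
    "R64W": VariantContacts("NAT1*17", ("Glu-38", "Asn-41")),
    "R187Q": VariantContacts("NAT1*14B", ("Glu-182",)),
    "D251V": VariantContacts("NAT1*22", ("Arg-242", "Asn-245")),
}


def lost_hydrogen_bond_partners(substitution: str) -> Tuple[str, ...]:
    """Return the partners whose hydrogen bonds are lost for a given NAT1 substitution."""
    if substitution not in NAT1_VARIANTS:
        raise KeyError(f"no structural annotation recorded for {substitution!r}")
    return NAT1_VARIANTS[substitution].lost_partners


if __name__ == "__main__":
    for name, info in NAT1_VARIANTS.items():
        partners = ", ".join(info.lost_partners)
        print(f"{name} ({info.allele}): loses contacts with {partners}")
```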