The Crystal Structure of Rv1347c, a Putative Antibiotic Resistance Protein from Mycobacterium tuberculosis, Reveals a GCN5-related Fold and Suggests an Alternative Function in Siderophore Biosynthesis*

Mycobacterium tuberculosis, the cause of tuberculosis, is a devastating human pathogen. The emergence of multidrug resistance in recent years has prompted a search for new drug targets and for a better understanding of mechanisms of resistance. Here we focus on the gene product of an open reading frame from M. tuberculosis, Rv1347c, which is annotated as a putative aminoglycoside N-acetyltransferase. The Rv1347c protein does not show this activity, however, and we show from its crystal structure, coupled with functional and bioinformatic data, that its most likely role is in the biosynthesis of mycobactin, the M. tuberculosis siderophore. The crystal structure of Rv1347c was determined by multiwavelength anomalous diffraction phasing from selenomethionine-substituted protein and refined at 2.2 Å resolution (R = 0.227, Rfree = 0.257). The protein is monomeric, with a fold that places it in the GCN5-related N-acetyltransferase (GNAT) family of acyltransferases. Features of the structure are an acyl-CoA binding site that is shared with other GNAT family members and an adjacent hydrophobic channel leading to the surface that could accommodate long-chain acyl groups. Modeling the postulated substrate, the Nε-hydroxylysine side chain of mycobactin, into the acceptor substrate binding groove identifies two residues at the active site, His130 and Asp168, that have putative roles in substrate binding and catalysis.

Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), is the world's most devastating pathogen, responsible for 2-3 million deaths annually (1). Two features of this very slow-growing organism make it particularly difficult to combat. First, it has a thick, waxy cell wall, rich in unusual lipids (2), that makes it impermeable to many drugs. Second, when engulfed by macrophages, it can switch its metabolism and remain in a latent or persistent state inside granulomas in the lung (3,4). This latent state can last for many years (5), until the organism is reactivated, for example when the immune system becomes compromised. Current estimates are that one-third of the world's population is infected and that the incidence of active TB is rising, in particular as a result of synergy with the HIV/AIDS pandemic. Although effective anti-TB drugs exist, treatment regimens require a combination of two to three drugs administered for at least 6 months.
The emergence in recent years of strains of M. tuberculosis that are resistant to all of the current front-line drugs (6) presents a new threat, paralleling the rise in resistance to antibiotics across the whole spectrum of infectious disease (7). The publication of the complete genome sequence for the H37Rv strain of M. tuberculosis (8) presents new opportunities, both for understanding, at a molecular level, the factors that contribute to antibiotic resistance and for identifying genes whose protein products have potential importance as targets for the design of new anti-TB drugs. At the same time, the problems of functional annotation are considerable; a large proportion of the gene products of the M. tuberculosis genome are still of unknown or uncertain function, some anticipated pathways cannot be traced in their entirety, and many presently unknown pathways are likely to exist.
Several open reading frames (ORFs) in the M. tuberculosis genome have been annotated as antibiotic resistance genes. These include two putative aminoglycoside 3′-phosphotransferases (APHs) (Rv3225c and Rv3817) and three putative aminoglycoside N-acetyltransferases (AACs) (Rv0133, Rv0262c, and Rv1347c). The aminoglycosides, which include streptomycin, the first chemotherapeutic agent to be effective against M. tuberculosis, typically have a three-ring structure comprising one highly substituted aminocyclitol ring linked to a modified ribose, which is in turn linked to N-acetylglucosamine. The APH enzymes inactivate aminoglycoside antibiotics by ATP-dependent phosphorylation of a target oxygen atom, and the AAC enzymes by CoA-dependent acetylation of an amino group (9).

A confounding factor in the annotation of these ORFs, however, is that both the APHs and the AACs belong to wider families of enzymes with diverse functions. The APHs are structurally homologous with protein kinases (10) and possess weak protein kinase activity (11). The AACs belong to a large family of N-acetyltransferases that includes enzymes that acetylate histones and other amino-containing substrates, as well as aminoglycosides (12,13). Sequence identity within these families is generally low, making definitive identification difficult, and substrate specificity can be relatively broad. Thus, for example, the gene product of Rv0262c has been shown to carry out the in vitro acetylation of aminoglycosides with either 2′-amino or 2′-hydroxyl substituents, but biochemical data and inferences from the crystal structure suggest that the "true" physiological substrate could be a substituted glucosamine derivative such as mycothiol (14,15).

* This work was supported by the Health Research Council of New Zealand and was performed as part of the International TB Structural Genomics Consortium (www.doe-mbi.ucla.edu/TB/). The work at Lawrence Livermore National Laboratory (LLNL) was funded under the NIH P50 grant GM62410. Lawrence Livermore National Laboratory is operated by the University of California for the United States Department of Energy under Contract W-7405-ENG-48.
Here we report the three-dimensional structure of the product of the M. tuberculosis ORF Rv1347c, determined by x-ray crystallography at 2.2 Å resolution. This gene product has been annotated as a possible aminoglycoside 6′-N-acetyltransferase, although recent in vitro biochemical assays have failed to demonstrate this activity (16). Intriguingly, Rv1347c has been found to be essential for the growth of M. tuberculosis in a genome-scale transposon mutagenesis analysis (17), suggesting that it could have some other, essential, function. The structure determined here shows that Rv1347c is a member of the GCN5-related N-acetyltransferase (GNAT) family of enzymes (13), which includes the AACs (18). Detailed analysis of the structure, however, combined with bioinformatic analysis and modeling, leads us to suggest an alternative role in siderophore biosynthesis. We propose that Rv1347c functions in the acylation of one or both of the Nε-hydroxylysine arms of mycobactin, the essential iron chelator produced by M. tuberculosis, and further identify several key residues at the active site and a hydrophobic channel that can accommodate a long-chain acyl group.

EXPERIMENTAL PROCEDURES
Overexpression and Purification-The gene coding for Rv1347c was amplified by PCR from genomic DNA and cloned into a modified pET42a vector (Novagen) with an incorporated rTEV cleavage site. Rv1347c was expressed in the Escherichia coli strain BL21(DE3) as a C-terminal glutathione S-transferase fusion protein. To increase the yield of soluble fusion protein, each 1-liter culture was grown at 37°C until an A600 of 0.7 was reached, at which point the temperature was reduced to 25°C. The temperature was further reduced to 18°C once an A600 of 1.2 was attained, and expression was then induced with 0.1 mM isopropyl β-D-thiogalactopyranoside at an A600 of 1.5. Expression was allowed to continue overnight.
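The staged cool-down described above can be summarized as a simple decision rule. The Python sketch below encodes only the A600 thresholds and temperatures given in the text; the function name `expression_step` and its return format are hypothetical, chosen for illustration, and do not represent software used in this work.

```python
from typing import Tuple


def expression_step(a600: float) -> Tuple[int, bool]:
    """Return (temperature in degrees C, induce_now) for a culture at the given A600.

    Thresholds and temperatures follow the protocol in the text; this helper
    itself is a hypothetical illustration.
    """
    if a600 < 0.7:
        return 37, False   # initial growth at 37 degrees C
    if a600 < 1.2:
        return 25, False   # first temperature drop once A600 reaches 0.7
    if a600 < 1.5:
        return 18, False   # second drop once A600 reaches 1.2
    # induce with 0.1 mM IPTG (isopropyl beta-D-thiogalactopyranoside), express overnight
    return 18, True


for reading in (0.3, 0.7, 1.2, 1.5):
    print(reading, expression_step(reading))
```

Lowering the temperature before induction slows expression, which is the usual rationale for this kind of schedule when a fusion protein tends to become insoluble.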
Cells were harvested at 6,000 × g for 15 min at 4°C and resuspended in precooled lysis buffer (20 mM HEPES, pH 8.0, 300 mM NaCl, 5 mM β-mercaptoethanol) containing 10 mM benzamidine and 1 mM phenylmethylsulfonyl fluoride. The fusion protein was extracted from the cells by ultrasonication and batch-purified on pre-equilibrated glutathione-Sepharose 4B resin (Amersham Biosciences) at 4°C for 1.5 h. Following three washes in cold lysis buffer (minus benzamidine and phenylmethylsulfonyl fluoride), the resin was resuspended in 10 ml of the same buffer, and EDTA was added to a final concentration of 0.5 mM. The resin was then incubated at 4°C overnight with 0.2 mg of rTEV protease (Invitrogen), releasing Rv1347c into the soluble fraction. The resin was removed using a 0.2-µm filter, and Rv1347c was separated from the poly-His-tagged rTEV by passage over a nickel-charged HiTrap chelating column (Amersham Biosciences) pre-equilibrated in lysis buffer. Soluble Rv1347c was recovered in the flow-through fraction.
The flow-through fraction was concentrated using an Amicon stirred cell (membrane cut-off 10 kDa) and loaded onto a Superdex 75 FPLC column (Amersham Biosciences) pre-equilibrated in running buffer (20 mM HEPES, pH 8.0, 300 mM NaCl, 5 mM β-mercaptoethanol, 0.01% NaN3). Rv1347c eluted at a volume consistent with a molecular mass of ~24 kDa. Dynamic light scattering (DynaPro, Protein Solutions) showed a monodisperse sample with a Cp/RH ratio of 18.2% and a molecular mass of 28 kDa, consistent with a monomeric species in solution. The protein was concentrated to ~15 mg ml⁻¹ and stored frozen in small aliquots at −80°C.
Selenomethionine-incorporated (SeMet) protein was produced via inhibition of the methionine metabolism pathway (19) and purified as above. After concentration to ~15 mg ml⁻¹, tris(2-carboxyethyl)phosphine hydrochloride (adjusted to pH 8.0) was added to a final concentration of 2 mM to help prevent oxidation of the SeMet. Incorporation of SeMet was assayed by ion spray mass spectrometry, which confirmed the incorporation of five SeMet residues.
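The mass spectrometric check works because each selenium atom that replaces a methionine sulfur adds roughly 47 Da. The sketch below estimates the expected mass increase and back-calculates the number of substituted residues from an observed shift. It assumes complete substitution at each site and uses standard average atomic masses (S ≈ 32.06, Se ≈ 78.971); the function names are hypothetical and no measured masses from this study are implied.

```python
# Average atomic masses (Da); standard atomic weights, rounded.
S_MASS = 32.06
SE_MASS = 78.971
DELTA_PER_SEMET = SE_MASS - S_MASS  # mass gain per Met -> SeMet substitution (~46.9 Da)


def semet_mass_shift(n_semet: int) -> float:
    """Expected mass increase (Da) when n_semet Met residues are fully replaced by SeMet."""
    return n_semet * DELTA_PER_SEMET


def estimate_semet_count(observed_shift_da: float) -> int:
    """Nearest whole number of SeMet residues consistent with an observed mass shift."""
    return round(observed_shift_da / DELTA_PER_SEMET)


print(f"{semet_mass_shift(5):.1f} Da")  # five SeMet residues per chain, as reported
```

With five Met per chain, of which four are ordered (the N-terminal Met is disordered), eight molecules per asymmetric unit give the 32 expected selenium sites quoted under "Structure Determination and Refinement."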
Crystallization and Data Collection-Initial crystallization conditions were identified by the Crystallization Facility of the Mycobacterium Tuberculosis Structural Genomics Consortium (www.doe-mbi.ucla.edu/TB/) at Lawrence Livermore National Laboratory using CRYSTOOL random screening (20) and were readily reproducible in our laboratory. Crystals were grown by vapor diffusion in hanging drops by mixing equal volumes of the native or SeMet-incorporated protein with crystallization buffer consisting of 25-27% methoxypolyethylene glycol 5000, 0.1 M Tris-HCl, pH 6.5, and 0.8% β-octyl glucoside (BOG). The crystals belong to space group P212121 with cell dimensions a = 75.85 Å, b = 77.39 Å, c = 297.60 Å, with eight molecules per asymmetric unit, giving a Matthews coefficient VM of 2.4 Å³/Da (51% solvent).
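The Matthews coefficient quoted above is the unit-cell volume divided by the total protein mass in the cell, VM = V / (Z · M), where Z is the number of molecules per cell. Space group P212121 has four asymmetric units per cell, so with eight molecules per asymmetric unit Z = 32. The monomer mass used by the authors is not stated; the sketch below assumes about 23 kDa (roughly 210 residues at an average of ~110 Da each), which gives VM ≈ 2.4 Å³/Da. Function names are hypothetical.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume (cubic angstroms) for an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(cell_volume: float, asu_per_cell: int,
                         molecules_per_asu: int, molecule_mass_da: float) -> float:
    """VM in cubic angstroms per dalton: cell volume / total protein mass in the cell."""
    z = asu_per_cell * molecules_per_asu  # molecules per unit cell
    return cell_volume / (z * molecule_mass_da)


volume = orthorhombic_cell_volume(75.85, 77.39, 297.60)
# P212121 has 4 asymmetric units per cell; the 23 kDa monomer mass is an assumption.
vm = matthews_coefficient(volume, asu_per_cell=4, molecules_per_asu=8, molecule_mass_da=23_000)
print(f"V = {volume:.3e} A^3, VM = {vm:.2f} A^3/Da")
```

Because VM scales inversely with the assumed mass, the exact value (and any solvent fraction derived from it) shifts slightly with the mass chosen.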
For data collection, the crystals were flash-cooled in a nitrogen stream at 100 K after stepwise addition of polyethylene glycol 400 to the crystal drops to a final concentration of 5%. A native data set to 2.15 Å resolution was collected in-house using Cu Kα radiation from a Rigaku RU300 rotating anode generator equipped with Osmic mirror optics and a Mar345 image plate detector. Multiwavelength anomalous diffraction data to 2.25 Å resolution were collected at three wavelengths using an ADSC Quantum 4 CCD detector on Beamline 9-1 at the Stanford Synchrotron Radiation Laboratory. All data were reduced and scaled using DENZO and SCALEPACK (21). Details of data collection and processing statistics are given in Table I.
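MAD experiments are planned in terms of X-ray energy near the selenium K absorption edge (about 12.66 keV), whereas diffraction geometry is expressed in wavelength. The sketch below shows the conversion λ (Å) = hc/E ≈ 12.398/E (keV), together with Bragg's law, d = λ / (2 sin θ). The three wavelengths actually used on Beamline 9-1 are not given in this section, so the edge energy here is a textbook value rather than a reported setting, and the helper names are hypothetical.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant x speed of light, in keV*angstrom


def wavelength_from_energy(energy_kev: float) -> float:
    """X-ray wavelength (angstrom) for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def two_theta_for_resolution(d_spacing: float, wavelength: float) -> float:
    """Scattering angle 2-theta (degrees) at which spacing d diffracts (Bragg's law)."""
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("resolution not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


se_edge = wavelength_from_energy(12.658)                # approximate Se K-edge energy
cu_two_theta = two_theta_for_resolution(2.15, 1.5418)   # Cu K-alpha, 2.15 A native data
print(f"Se K edge ~ {se_edge:.4f} A; 2.15 A with Cu K-alpha needs 2theta ~ {cu_two_theta:.1f} deg")
```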
Structure Determination and Refinement-The selenium sites were found using SOLVE (22), which located 30 of the expected 32 sites (excluding the N-terminal Met residues, which are disordered in all eight molecules). The initial phases, which gave a figure of merit of 0.66 for data to 2.5 Å resolution, were improved by solvent flattening and 8-fold non-crystallographic symmetry averaging as implemented in RESOLVE (23), giving a final figure of merit of 0.72. Initial tracing of the polypeptide chain was performed using MAID (24), and the side chains were placed manually using the program O (25). Refinement was carried out using CNS (26), with Rfree validation used to monitor the progress of refinement. The final model contains 1,590 of the 1,680 residues expected in the asymmetric unit, together with 650 water molecules and three BOG molecules, which were found in equivalent positions in three of the eight molecules (D, E, and G) in the asymmetric unit. The final R and Rfree values are 0.227 and 0.257, respectively, with root mean square deviations from standard geometry of 0.009 Å for bond lengths and 1.7° for bond angles. The residues that are modeled in the eight … (27), and only two residues (0.1% of total) in disallowed regions.
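The R and Rfree values quoted above use the same agreement index, R = Σ| |Fo| − |Fc| | / Σ|Fo|, evaluated on two disjoint sets of reflections: the working set used in refinement and a small test set held out from it. A minimal sketch follows. It assumes Fc is already on the scale of Fo (refinement programs fit an overall scale factor first) and uses toy amplitudes, not data from this structure; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    is_free: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by their test-set flag and return (R_work, R_free)."""
    work: List[Tuple[float, float]] = []
    test: List[Tuple[float, float]] = []
    for fo, fc, free in zip(f_obs, f_calc, is_free):
        (test if free else work).append((fo, fc))
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes; the last reflection is flagged for the free (test) set.
fo = [100.0, 200.0, 150.0, 80.0]
fc = [90.0, 210.0, 150.0, 100.0]
flags = [False, False, False, True]
r_work, r_free = r_work_and_free(fo, fc, flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Rfree is computed only from reflections excluded from refinement, which is why it serves as a check against overfitting the model to the working data.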

RESULTS
Crystal Structure Determination-The ORF Rv1347c, which encodes a polypeptide of 210 amino acid residues, was cloned and overexpressed in E. coli, and the protein was purified and crystallized. The crystal structure was solved in its apo form by multiwavelength anomalous diffraction methods (28) using selenomethionine-substituted protein and was refined at 2.2 Å resolution to a final R factor of 0.227 (Rfree = 0.257) (Table II). The asymmetric unit of the crystal contains eight independent molecules. To investigate possible oligomerization, we analyzed the interfaces between neighboring molecules using the Protein-Protein Interaction Server (www.biochem.ucl.ac.uk/bsm/PP/server), based on principles described by Jones and Thornton (29). The largest interface buries only 477 Å² (4.5%) of the total accessible surface of the molecule, a value typical of intermolecular crystal contacts, strongly suggesting that the protein is monomeric in solution. This is consistent with gel filtration and dynamic light scattering data (data not shown), which also indicate a monomeric species.
Molecular Structure-The Rv1347c monomer is folded into a single domain based on a central β-sheet with helices packed against both faces of the sheet (Fig. 1). The most striking feature of the structure, which is characteristic of all acyltransferases of the GNAT family (see below), is that the β-sheet is divided into two halves that diverge in the center to create a cleft that serves as a conserved binding site for the acyl-CoA cofactor (13). The N-terminal four strands, β1-β4, form an antiparallel β-sheet that abuts a C-terminal three-stranded antiparallel β-sheet comprising strands β5-β7. Strands β4 and β5 run parallel, joined by hydrogen bonding at their N-terminal ends but diverging halfway along. In other GNAT family members, this divergence has been attributed to the presence of a conserved β-bulge in strand β4 that gives an accentuated twist to this strand. Rv1347c does not have this bulge, however, yet β4 from Rv1347c aligns perfectly with the β4 strands of the other family members apart from the deletion of one residue from the middle of the strand. This suggests that the β-bulge may have more to do with the details of substrate binding and catalysis than with the polypeptide conformation of the GNAT scaffold. The single residue, His130, that replaces the two β-bulge residues of the other proteins is invariant in all the closest homologs of Rv1347c and seems likely to have a key active site role (see below).
One face of the central β-sheet has three helices packed against it, α1, α2, and α3, following the nomenclature of Modis and Wierenga (30). These three helices form the connection between strands β1 and β2 and, together with the 16-residue β3-β4 loop (residues 110-125), the short β6-β7 loop, and part of a long N-terminal extension, enclose a cavity above the central β-sheet that we propose to be the acceptor substrate binding site. The other face of the β-sheet has two helices packed against it: the long α4 helix connecting strands β4 and β5, which is a conserved feature of all GCN5 family enzymes, and the shorter α5 helix joining strands β5 and β6. Prior to strand β1, the N-terminal portion of the polypeptide wraps around the periphery of the molecule, largely in extended form except for a short 3₁₀-helix, and contributes a loop, residues 16-20, that helps enclose the proposed binding site for the acceptor substrate. The overall topology resembles a left-handed glove, with the N-terminal half of the β-sheet and helices α1-α3 representing the palm and fingers of the hand and the C-terminal half of the β-sheet representing the thumb. The cleft formed between the "thumb" and the "palm" is the putative binding site for acetyl-CoA, and adjacent to this, in the hollow of the palm, is the proposed substrate binding site.
The crystal structure contains several additional pieces of continuous density that cannot be accounted for by the polypeptide chain (Fig. 2). The most prominent of these is an extended ribbon of density that is found associated with each of the eight molecules in the asymmetric unit. Although not equally well defined in each case, it can be modeled as a molecule of the detergent BOG, which was used in crystallization, and has been fully refined as such in three of the eight molecules. The extended octyl chain inserts into a hydrophobic cleft on the "back" side of the molecule, in contact with residues Gly96, Trp98, Leu106, Ile133, Phe143, Leu147, and Ile151 from strands β2, β3, and β4 and helix α4. The glucosyl ring resides on the surface between the β2-β3 and β4-α4 loops, in contact with Trp98, Thr101, Asp135, and Lys138. A second, less well defined ribbon of density follows another hydrophobic channel leading to the acyl-CoA binding site (discussed later).
Sequence and Structural Comparisons-Searches of the current sequence databases with BLAST (31) reveal many homologous sequences, reflecting the widespread occurrence of proteins from the GNAT family. The top BLAST hits, which are almost exclusively from bacteria apart from a few fungal representatives, include many proteins that are annotated as "conserved hypotheticals." Perhaps significantly, however, more than half of the top 30 hits are to various bacterial homologues of IucB, a CoA-dependent Nε-hydroxylysine N-acetyltransferase from the biosynthetic pathway for the siderophore aerobactin (32). Sequence identity between Rv1347c and these proteins is ~25% on a pairwise basis, with only nine residues completely conserved (Fig. 3). Four of these conserved residues, Asp126, Gly128, His130, and Pro169 in Rv1347c, together with Asp168, which changes only to Glu, map to strands β4 and β5, around the region of the proposed active site.
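Pairwise identity and complete conservation, the two measures quoted above, are both read off a multiple sequence alignment. The sketch below uses one common convention (identity counted over columns where neither sequence has a gap); other denominators are also in use, and the convention behind the ~25% figure is not stated here. The three aligned sequences are invented toy data, not Rv1347c or IucB sequences, and the function names are hypothetical.

```python
from typing import List, Sequence

GAP = "-"


def pairwise_identity(seq1: str, seq2: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != GAP and b != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def conserved_columns(alignment: Sequence[str]) -> List[int]:
    """1-based alignment columns in which every sequence has the same non-gap residue."""
    columns = []
    for index, column in enumerate(zip(*alignment), start=1):
        if column[0] != GAP and all(residue == column[0] for residue in column):
            columns.append(index)
    return columns


# Hypothetical toy alignment (equal-length, gapped strings).
toy_alignment = ["MKDLGHWP", "MRDLGH-P", "AKDIGHYP"]
print(conserved_columns(toy_alignment))                                  # [3, 5, 6, 8]
print(round(pairwise_identity(toy_alignment[0], toy_alignment[1]), 1))   # 85.7
```

Adding more divergent sequences can only shrink the set of fully conserved columns, which is why a family with ~25% pairwise identity retains so few invariant residues.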
Cofactor Binding-All structurally characterized acyltransferases of the GNAT family share the same binding site for the acyl-CoA cofactor, in which the pantetheine moiety is wedged between the diverging strands β4 and β5, with the thioacyl group close to the point of divergence. Binding depends on three main features: hydrogen bonding of the pantetheine amide groups with main chain groups on strand β4, a hydrophobic pocket for the dimethyl moiety, and interactions of the diphosphate moiety with main chain NH groups from the β4-α4 connection and the N terminus of helix α4. The latter interaction accounts for one of the most conserved sequence motifs in GNAT enzymes, designated motif A (12). The preponderance of main chain interactions accounts for the remarkably consistent CoA conformation found in GNAT enzymes (13), despite low sequence identity.
Attempts to co-crystallize Rv1347c with acetyl-CoA have not been successful, but the pantetheine moiety can be modeled into the Rv1347c structure in a straightforward way, as in Fig. 2, based on the other GNAT family members. The two pantetheine amide groups hydrogen bond to peptide C=O and NH groups from strand β4; in Rv1347c these are the C=O of residue 131 and the NH of residue 133. The side chain of Asn173 could also provide an additional hydrogen bond, either directly (after a small conformational adjustment) or via a bridging water; this is not a conserved interaction in GNAT enzymes but is found in all cases where an Asn residue fills this position. As expected, the dimethyl group is adjacent to hydrophobic side chains from Ile133 and Val139. It is not possible to model the cofactor reliably beyond this point, however, because of conformational and sequence differences in the β4-α4 connection. The sequence KVNRGFGPL (residues 138-146) approximates the GNAT motif A consensus Q/RxxGxGxxL, which corresponds to the β4-α4 connection and diphosphate binding motif in other GNAT enzymes (13). However, a peptide flip between residues 139 and 140 removes one potential peptide NH interaction, and the proline residues Pro145 and Pro149 disrupt the beginning of helix α4. The closest model for Rv1347c in this region is probably N-myristoyltransferase (39), which also has a proline (Pro184) at a position analogous to Pro145 and differs from the other GNAT enzymes in the way it binds the diphosphate and adenosyl groups. In Rv1347c, the β4-α4 loop region has high B values (60-80 Å²) in five of the eight molecules, suggesting that it may undergo conformational adjustment in response to cofactor binding. For these reasons we conclude that it is unrealistic to attempt to model the latter parts of the cofactor into the Rv1347c structure.

FIG. 2. Non-protein electron density in the Rv1347c structure. The density, from a simulated annealing, NCS-averaged Fo − Fc "omit" map contoured at 3σ, follows two hydrophobic channels leading from the acyl-CoA binding site and is shown as a gray cloud. This density, which is present in all eight molecules in the asymmetric unit, is attributed to bound β-octyl glucoside molecules. BOG has been modeled into the left-hand channel (B), but it is the right-hand channel (A) that is the proposed site for the acyl chain of a substrate acyl-CoA molecule. A patch of density at the molecular surface (bottom left) represents the binding site for the BOG head group; the density leading from this into channel B is not continuous in this averaged map, however, and BOG has only been modeled into three of the eight molecules. In this stereo figure, the pantetheine arm of CoA is modeled into the conserved GNAT CoA binding site, as described under "Cofactor Binding" under "Results."
The thioacyl group of the cofactor sits at the point of divergence of the two β-strands, β4 and β5. Here, two features stand out. First, the β-bulge in strand β4, which is conserved in all other structurally characterized GNAT enzymes, is not found in Rv1347c. The effect of the β-bulge in most other GNAT enzymes is to direct two consecutive peptide NH groups toward the acyl oxygen of the acyl-CoA substrate; in Rv1347c, however, only the peptide NH of Ala131 can hydrogen bond to the acyl oxygen. Second, two highly conserved residues, His130 and Asp168, are located at this point; His130 is invariant in all homologues of Rv1347c, while Asp168 is replaced only by Glu. We conclude that these two residues are involved in catalysis and/or substrate specificity. His130 is located precisely where the middle of the β-bulge is in the other GNAT enzymes, and Asp168, which is hydrogen-bonded to His130, is adjacent to another invariant residue, Pro169.
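Statements that one group "can hydrogen bond" to another rest on donor-acceptor distances measured in the model. The sketch below shows that geometric test in its simplest form: it applies only a distance cutoff (3.5 Å, a commonly used upper limit) and ignores angles. The coordinates are hypothetical, not taken from the Rv1347c model, and the function names are illustrative.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]


def distance(p: Coord, q: Coord) -> float:
    """Euclidean distance between two atom positions (angstrom)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def could_hydrogen_bond(donor: Coord, acceptor: Coord, cutoff: float = 3.5) -> bool:
    """Distance-only criterion: donor and acceptor within the cutoff (default 3.5 angstrom)."""
    return distance(donor, acceptor) <= cutoff


# Hypothetical coordinates (angstrom): a backbone NH nitrogen and two candidate oxygens.
backbone_n = (0.0, 0.0, 0.0)
acyl_o = (2.9, 0.0, 0.0)
distant_o = (3.0, 2.0, 1.0)
print(distance(backbone_n, acyl_o), could_hydrogen_bond(backbone_n, acyl_o))
print(round(distance(backbone_n, distant_o), 2), could_hydrogen_bond(backbone_n, distant_o))
```

A fuller check would also require a reasonable donor-H-acceptor angle, which needs hydrogen positions that are not resolved at 2.2 Å and must be placed geometrically.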
The acyl methyl group in GNAT acetyl-CoA complexes is directed into a hydrophobic pocket. In Rv1347c this pocket develops into two hydrophobic channels (Fig. 2), starting at Phe167 and extending 10-12 Å toward the protein surface. Both channels contain ribbons of non-protein electron density that probably arise from partial-occupancy BOG molecules and indicate potential locations for a long-chain acyl group attached to CoA. The more favorably oriented of these channels (A in Fig. 2) passes between helices α4 and α5, its walls formed by Val152, Pro149, Leu148, and Pro145 from α4, Leu179, Cys180, and Ala183 from α5, and Leu203. This channel could accommodate a CoA acyl chain of at least 8 carbons in length and seems to be a specific feature of Rv1347c. It does not exist in the other GNAT enzymes, where helices α4 and α5 are 2-3 Å closer and make contact through side chains, and it does not correspond to the hydrophobic groove that binds the 14-carbon acyl group in the myristoyl-CoA complex of N-myristoyltransferase (39); the latter is blocked by side chains in Rv1347c.
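Whether a channel of a given length can hold an acyl chain can be estimated from the maximum extended length of a saturated hydrocarbon chain. The sketch below uses Tanford's empirical estimate, l_max ≈ 1.54 + 1.265 n Å for a chain of n carbon atoms. This formula is an assumption introduced here for illustration, not the method used by the authors, and it ignores the thioester linkage and any chain bending; the function names are hypothetical.

```python
def tanford_max_length(n_carbons: int) -> float:
    """Approximate fully extended length (angstrom) of a saturated chain with n carbons
    (Tanford's estimate: 1.54 + 1.265 * n)."""
    return 1.54 + 1.265 * n_carbons


def max_carbons_in_channel(channel_length: float) -> int:
    """Largest whole number of carbons whose extended length fits the channel."""
    if channel_length < 1.54:
        return 0
    return int((channel_length - 1.54) // 1.265)


for n in (2, 8, 14):
    print(n, round(tanford_max_length(n), 2))
print(max_carbons_in_channel(10.0), max_carbons_in_channel(12.0))
```

By this estimate an extended eight-carbon chain spans about 11.7 Å, of the same order as the 10-12 Å channel; the estimate applies to fully extended chains only.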
Acceptor Substrate Binding Site-By analogy with other GNAT family enzymes, the binding site for the acceptor substrate is predicted to be a deep groove, about 7-8 Å wide, flanked by residues 68-73 (the α2-α3 loop) on one side, and residues 195-196 (from the C-terminal β6-β7 loop) and 18-19 (from the N-terminal region) on the other (Fig. 4). This is on the opposite face of the β-sheet from the proposed acyl channel and is topologically equivalent to the acceptor substrate binding site in aminoglycoside complexes of AAC(6′)-Iy (34) and AAC(2′)-Ic (15), the substrate complex of GNA1 (37), and the bisubstrate analog complex of AANAT (40). Although the groove in Rv1347c would fit an aminoglycoside substrate, as judged by superposition of the AAC(6′)-Iy complex and slight reorientation of the aminoglycoside, its chemical character appears unfavorable. In AAC(6′)-Iy, where the groove is formed between two monomers, one sugar ring stacks between Trp22 from one monomer and Tyr66 from the other, but the predominant feature is the high negative potential from a number of acidic residues (34); the same is true in AAC(2′)-Ic (15). In Rv1347c, in contrast, the groove contains three arginine residues (Arg19, Arg172, and Arg196) and only one acidic residue.
Catalytic Site-Biochemical evidence suggests that acyl transfer in the GNAT family occurs by direct nucleophilic attack of the amino acceptor group on the thioacyl carbon, the weakness of the thioester linkage then leading to breakage of the S-C bond (13). For nucleophilic attack to occur, the amino nitrogen must be uncharged. Depending on the pKa of this group, a nearby general base may or may not be required for deprotonation, acting either directly or via intervening water molecules that link to a more distant base (34,40). In Rv1347c, the two conserved residues His130 and Asp168 have their side chains oriented upward into the acceptor substrate binding groove, and either would be well positioned to act as a general base. In many, but not all, GNAT enzymes a tyrosine residue is positioned to protonate the thiolate anion after collapse of the tetrahedral intermediate. No equivalent tyrosine is present in Rv1347c. The side chain of Thr176 is, however, well placed to play a similar role, either directly or through a bridging water molecule, as is proposed for AAC(6′)-Iy (34). Our modeling of CoA binding suggests that helix α5 may move closer to the cofactor to enable Asn173 to hydrogen bond to a phosphate oxygen; if this happens, Thr176 would be brought close (~3.5 Å) to the thiolate sulfur.

DISCUSSION
The crystal structure of Rv1347c shows clearly that it belongs to the GCN5-related family of N-acyltransferases known as the GNAT family. Enzymes of this family are functionally diverse and share only low levels of sequence identity (12), making functional annotation difficult. Many use acetyl-CoA as their cofactor, transferring the acetyl group to a range of acceptor substrates, including lysine residues on histones, the amino groups on aminoglycoside antibiotics, and a wide variety of small molecules such as serotonin.
However, acyl groups larger than acetyl may also be transferred, such as the 14-carbon myristoyl group in the case of N-myristoyltransferase (39). M. tuberculosis has particularly rich lipid chemistry, associated with the synthesis and processing of its complex, waxy cell wall (2), and a great variety of acyl-CoAs must be available as potential substrates.
Although Rv1347c was originally annotated as an aminoglycoside 6′-N-acetyltransferase, there seems little doubt that this annotation is wrong. The putative acceptor substrate binding groove appears unfavorable for binding aminoglycoside antibiotics, with their high positive charge; in known aminoglycoside-modifying enzymes the binding site is invariably marked by strong negative potential, whereas the groove in Rv1347c is much less negative and contains three arginine residues. Moreover, we note that (i) clinical resistance to aminoglycosides in M. tuberculosis has been shown to be due to mutations in the 16S rRNA gene or the gene encoding the S12 ribosomal protein (41), rather than to enzymatic inactivation, and (ii) assays of the in vitro activities of two putative APHs (Rv3225c and Rv3817) and two putative AACs (Rv1347c and Rv0262c) showed that only Rv0262c, the previously characterized AAC(2′)-Ic, had any significant aminoglycoside-modifying activity (16). Even for the latter, it has been suggested that the in vivo function may instead be in mycothiol biosynthesis (15).
What, then, is the biological role of the gene product of Rv1347c? Several strong indications point to an involvement in the biosynthesis of mycobactin, the siderophore that is essential for iron acquisition by M. tuberculosis. Both bioinformatic analysis (42) and microarray experiments (43) show that expression of Rv1347c is under the control of the iron-dependent regulator IdeR, through which it is repressed by iron. The closest amino acid sequence homologues of Rv1347c in other genomes are all either uncharacterized or code for the protein IucB, which functions in the biosynthesis of the siderophore aerobactin in strains of E. coli and many other bacteria (32). Moreover, the phylogenetic profile of Rv1347c, describing its distribution across 80 bacterial genomes, has all its closest matches among other siderophore biosynthesis proteins. This profile was determined using an improved version (see www.cs.auckland.ac.nz/~yhua033) of a method first proposed by Pellegrini et al. (44).
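In the phylogenetic profile approach of Pellegrini et al., each protein is represented by a binary vector recording its presence (1) or absence (0) in each of a set of genomes, and proteins with similar vectors are inferred to be functionally linked. The improved version used here is not described in this paper, so the sketch below shows only the basic idea, ranking profiles by Hamming distance. The gene names and eight-genome profiles are hypothetical toy data (the profile discussed above spans 80 genomes), and the function names are illustrative.

```python
from typing import Dict, List


def hamming(profile_a: str, profile_b: str) -> int:
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def closest_profiles(query: str, profiles: Dict[str, str], k: int = 3) -> List[str]:
    """Names of the k profiles nearest to the query (ties keep input order)."""
    return sorted(profiles, key=lambda name: hamming(query, profiles[name]))[:k]


# Hypothetical profiles over 8 genomes ("1" = homologue present, "0" = absent).
toy_profiles = {
    "geneA": "11010011",
    "geneB": "00101100",
    "geneC": "11010111",
    "geneD": "11110011",
}
print(closest_profiles("11010011", toy_profiles))  # ['geneA', 'geneC', 'geneD']
```

Genes whose products act in the same pathway tend to be gained and lost together, so a query protein whose nearest profiles all belong to siderophore biosynthesis is a candidate member of that pathway.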
A role in mycobactin biosynthesis would explain why Rv1347c is one of the genes found to be essential for the growth of M. tuberculosis in a genome-wide mutational analysis (17); the mycobactin biosynthetic pathway has previously been shown to be essential for growth in macrophages (45). We propose that the specific biochemical function of Rv1347c parallels that of the IucB protein, its closest homologue, which catalyzes the CoA-dependent N-acylation of the Nε-hydroxylysine arms of the siderophore aerobactin (32). Mycobactin (Fig. 5) also possesses two Nε-hydroxylysine moieties, one of which is cyclized after acetylation (46) and the other of which is acylated and can bear a range of acyl groups in different species (47,48). Variations of the latter acyl group result in two predominant forms of mycobactin, one with a longer, hydrophobic acyl arm, and another with a shorter, more soluble acyl arm (water-soluble mycobactin, often referred to as carboxymycobactin) (46,48). Interestingly, although most of the genes implicated in mycobactin biosynthesis have been identified and associated with proposed biochemical steps in the pathway (46), neither the enzyme(s) responsible for acylating the Nε-hydroxylysine arms nor the precise substrates involved are known.
The Rv1347c gene is flanked in the M. tuberculosis genome by other iron-dependent genes under the control of the same regulator, IdeR, including genes for a putative acyl carrier protein (Rv1344), an acyl-CoA synthase (Rv1345, fadD33), and an acyl-CoA dehydrogenase (Rv1346, fadE14). Transposon mutagenesis in Mycobacterium smegmatis has indicated a direct role for the Rv1345 gene product in mycobactin biosynthesis and implied that this cluster acts, together with an unidentified acyltransferase, to generate the correct side chains on the siderophore (49). We propose that the adjacent gene, Rv1347c, codes for this acyltransferase.
To explore the hypothesis that the true substrate for the Rv1347c gene product is one of the Nε-hydroxylysine (NHL) side chains of mycobactin, we placed an NHL moiety into the acceptor substrate binding groove and modeled the tetrahedral intermediate that would be formed when NHL attacks the thioacyl carbon (Fig. 6). In this intermediate, the negatively charged oxygen is directed toward the main chain NH of residue 131, 2.9 Å away, and the acyl group is directed toward Phe167 and the hydrophobic channel. Importantly, the Nε-hydroxyl group can form a hydrogen bond (2.5 Å) with the Nδ1 atom of His130, and the NHL nitrogen is about 3.3 Å from Asp168 Oδ1. This strongly indicates functional roles for these two residues, which are among the few residues conserved between Rv1347c and all IucB proteins; we conclude that His130 provides specific recognition of the hydroxyl group on NHL side chains, and that Asp168 (which is replaced only by Glu) is the base that ensures deprotonation of the attacking nitrogen. Our proposed location for the NHL binding site, in which its α-carbon lies between Trp69 and Tyr71 on one side and Arg19 and Arg196 on the other, also corresponds to the location of the aminoglycoside substrate in AAC(6′)-Iy (34) and of the substrate portion of the bisubstrate analog in AANAT (40), further validating this model.
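The requirement stated under "Catalytic Site," that the attacking nitrogen be uncharged, can be made quantitative with the Henderson-Hasselbalch equation: for an amine whose conjugate acid has a given pKa, the unprotonated fraction at a given pH is 1 / (1 + 10^(pKa − pH)). The sketch below evaluates this expression. The pKa values and the pH used are illustrative only, since the pKa of the NHL nitrogen in the enzyme is not reported here, and the function name is hypothetical.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Fraction of an amine present in the neutral, nucleophilic form at a given pH.

    From Henderson-Hasselbalch for BH+ <=> B + H+:
    [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative pKa values only (not measured for Rv1347c or its substrate).
for pka in (6.0, 7.5, 10.5):
    print(pka, f"{fraction_unprotonated(pka, ph=7.5):.4f}")
```

The neutral fraction falls tenfold for each pH unit by which the pKa exceeds the pH, which is why a general base such as the proposed Asp168 matters most when the acceptor nitrogen is strongly basic.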
The nature of the acyl group(s) that can be transferred by Rv1347c is unknown, given that M. tuberculosis can synthesize mycobactins with several different acyl groups attached to the NHL side chain. The predominant mycobactins are a membrane-associated form with a long, hydrophobic acyl chain of 18-20 carbons on the NHL arm and a soluble form with 5-9 carbons (47). The hydrophobic tunnel adjacent to the CoA binding site could accommodate such chains nicely. Moreover, functional studies on the Rv1347c gene product further support the notion that acyl groups longer than acetyl are transferred. These studies failed to demonstrate any aminoglycoside N-acetyltransferase activity but did demonstrate thioesterase activity with numerous acyl-CoAs, with a preference for longer acyl chains (16). Because one component of N-acyl transfer involves cleavage of the thioester bond, thioesterase activity is consistent with an N-acyl transfer function, provided that an appropriate acceptor substrate is bound. The fact that larger acyl-CoA substrates are hydrolyzed is consistent with our structural and modeling results and with the proposed role in mycobactin biosynthesis.

FIG. 6. … (38). His130, proposed to function in substrate recognition, and Asp168, the proposed catalytic base, are shown interacting with the NHL moiety. A longer acyl chain, in place of the acetyl group shown here, would extend down into a hydrophobic channel, past Phe167 (see "Cofactor Binding" under "Results").