A recently evolved diflavin-containing monomeric nitrate reductase is responsible for highly efficient bacterial nitrate assimilation

Nitrate is one of the major inorganic nitrogen sources for microbes. Many bacterial and archaeal lineages have the capacity to express assimilatory nitrate reductase (NAS), which catalyzes the rate-limiting reduction of nitrate to nitrite. Although a nitrate assimilatory pathway in mycobacteria has been proposed and validated physiologically and genetically, the putative NAS enzyme has yet to be identified. Here, we report the characterization of a novel NAS encoded by Mycolicibacterium smegmatis Msmeg_4206, designated NasN, which differs from the canonical NASs in its structure, electron transfer mechanism, enzymatic properties, and phylogenetic distribution. Using sequence analysis and biochemical characterization, we found that NasN is an NADPH-dependent, diflavin-containing monomeric enzyme composed of a canonical molybdopterin cofactor-binding catalytic domain and an FMN–FAD/NAD-binding, electron-receiving/transferring domain, making it unique among all previously reported hetero-oligomeric NASs. Genetic studies revealed that NasN is essential for aerobic M. smegmatis growth on nitrate as the sole nitrogen source and that the global transcriptional regulator GlnR regulates nasN expression. Moreover, unlike the NADH-dependent heterodimeric NAS enzyme, NasN efficiently supports bacterial growth under nitrate-limiting conditions, likely due to its significantly greater catalytic activity and oxygen tolerance. Results from a phylogenetic analysis suggested that the nasN gene is more recently evolved than those encoding other NASs and that its distribution is limited mainly to Actinobacteria and Proteobacteria. We observed that among mycobacterial species, most fast-growing environmental mycobacteria carry nasN, but that it is largely lacking in slow-growing pathogenic mycobacteria because of multiple independent genomic deletion events along their evolution.

In addition to being an important nitrogen source for plants, nitrate (NO3−) is universally used as an inorganic nitrogen source for microorganisms, particularly for soil bacteria such as actinomycetes, as well as marine bacteria such as cyanobacteria, due to fierce competition for limited nutrients in the natural environment (1). In prokaryotes, nitrate is reduced to ammonia (NH3) through two sequential steps, with the first committed step catalyzed by nitrate reductase (NaR). The assimilatory NaR identified in bacteria, abbreviated hereafter as NAS, is cytoplasmic and reduces nitrate to nitrite (NO2−) (2). Then, nitrite is further reduced by assimilatory nitrite reductase to ammonia, which is subsequently incorporated into the "organic nitrogen pool," generally glutamate and/or glutamine, via either glutamate dehydrogenase or glutamine synthetase-glutamate synthase routes (3).
The holoenzymes of all prokaryotic NASs characterized to date are highly evolutionarily conserved with respect to their catalytic subunit, which contains a canonical nitrate reduction catalytic domain (CA domain), consisting of three binding subdomains bound with a [4Fe-4S] cluster, a molybdopterin, and a bis-molybdopterin guanine dinucleotide, a form of molybdopterin cofactor (MoCo) (Fig. 1). However, these NAS enzymes are significantly different with respect to the intermolecular transfer of electrons derived from their respective electron donors to the catalytic sites. Therefore, the canonical prokaryotic NASs are categorized based on the physiological electron donors utilized (2,4,5), i.e. the flavodoxin- or ferredoxin-dependent NAS (generally NarB) and the NADH-dependent NAS that requires an additional electron transfer subunit.
Nitrate assimilation has been studied in mycobacteria for more than 50 years (14), including both pathogenic and saprophytic mycobacterial species like Mycobacterium tuberculosis and Mycolicibacterium smegmatis (formerly Mycobacterium smegmatis (15)). A recent report suggests that the membrane-bound respiratory NaR (NarGHJI) of M. tuberculosis functions in both nitrate respiration and nitrate assimilation (16), establishing a unique strategy for nitrate assimilation in the slow-growing pathogenic mycobacteria. In contrast to the suggested function of M. tuberculosis NarGHJI, the proposed function of M. smegmatis NarGHJI in nitrate respiration has yet to be experimentally proven (17), and its function in nitrate assimilation has been explicitly disproved (18,19). In addition, the narB homolog in the M. smegmatis genome has been identified as a nitrate assimilation-unrelated gene (18,19). Therefore, the NAS enzyme responsible for the essential nitrate assimilation function in M. smegmatis remains unclear.
In this study, we have identified that M. smegmatis NAS is encoded by the Msmeg_4206 gene and is essential for M. smegmatis to assimilate nitrate under aerobic conditions. It is characterized as an NADPH-dependent monomeric NAS containing a canonical CA domain and an FMN-FAD/NAD electron-receiving/transferring domain (DF domain) identified in the diflavin reductase family, and it is designated NasN. NasN is structurally unique among all of the currently known NASs. Functionally, it has significantly greater NAD(P)H-dependent enzyme activity and is more oxygen-tolerant than the NADH-dependent heterodimeric NAS enzyme. Based upon the extremely narrow distribution of nasN-containing species across three domains of life, mainly in some environmental bacteria, as well as the uniform phylogenetic clustering of nucleotide sequences homologous to nasN, it is reasonable to infer that this enzyme evolved more recently than other NASs. Unlike their fast-growing environmental ancestors, most of the slow-growing pathogenic mycobacteria have lost nasN via a few independent genomic deletion events.

Msmeg_4206 gene of M. smegmatis encodes a functional NAS essential for bacterial nitrate assimilation
Previous genetic studies showed that neither the narGHJI- nor the narB-encoded putative NaR enzyme was responsible for M. smegmatis nitrate assimilation and suggested that the proposed NAS ought to be another MoCo-containing enzyme (18,19). Indeed, the D806_RS20790 gene encoding a MoCo-containing NaR was annotated in the genome of M. smegmatis strain MKD8 (20). However, in the genome of the reference strain mc²155 sequenced by TIGR (NCBI GenBank accession number NC_008596.1), this protein-coding sequence (Msmeg_4206) was annotated as a 4055-bp pseudogene due to a frameshift mutation (deletion of nucleotide G1850 in the gene). In this study, we examined whether Msmeg_4206 functions in nitrate assimilation. DNA sequencing did not show the G1850-deletion mutation in Msmeg_4206 in the genome of our mc²155 strain, nor in the genomes of any other M. smegmatis strains (RRID:SCR_002474, identifier 1026). Importantly, the M. smegmatis Msmeg_4206 null mutant strain (mc²155ΔnasN) was clearly defective in nitrate assimilation for growth and was readily complemented by the prototype Msmeg_4206, but not by the G1850-deleted mutant (Fig. 2, A and B). These results suggest that Msmeg_4206 is involved in nitrate assimilation in M. smegmatis.
We further measured the nitrogen utilization of the Msmeg_4206-null mutant and the WT strain in nitrogen-defined minimal media to probe the role of Msmeg_4206 during nitrate assimilation. These two strains grew at almost the same rate, with comparable consumption rates, on either nitrite or ammonia as the sole nitrogen source (Fig. S1, C-F). However, the disruption of Msmeg_4206 completely abolished growth of the resulting strain in minimal medium with nitrate as the sole nitrogen source (Fig. S1A), and the concentration of nitrate in the culture medium for the M. smegmatis Msmeg_4206 null mutant was almost unchanged over the culturing period (Fig. S1B). This growth defect and the inability of the Msmeg_4206-null mutant to utilize nitrate in the nitrate-containing minimal medium were rescued when a copy of the prototype Msmeg_4206 was introduced into the mutant (Fig. S1, A and B), suggesting that the disruption of Msmeg_4206 only affects the utilization of nitrate during nitrate assimilation. Moreover, supplementation with glutamine restored the growth of the Msmeg_4206-null mutant in the nitrate-containing minimal medium (Fig. S1G), although the mutant remained unable to assimilate nitrate (Fig. S1H), further indicating that the disruption of Msmeg_4206 only affects the reduction of nitrate during nitrate assimilation. Unlike Msmeg_4206, the disruption of narG did not affect the growth of the resulting strain in medium with nitrate as the sole nitrogen source (Fig. 2A), consistent with previous findings that M. smegmatis NarGHJI does not function in nitrate assimilation (18,19). Collectively, Msmeg_4206 is genetically shown to encode the sole functional NAS enzyme catalyzing the first step of nitrate assimilation in vivo for M. smegmatis growth under aerobic conditions, and thus it is designated nasN.

GlnR directly controls the transcription of nasN in M. smegmatis
GlnR is a global transcription regulator for genes related to nitrogen metabolism in high GC Gram-positive actinobacteria (21,22) and is essential for mycobacterial nitrate/nitrite assimilation (16,23,24). Previous transcriptomic studies have indicated that the expression of nasN is strongly reduced in the M. smegmatis glnR-null mutant strain, suggesting positive regulation by GlnR (23,24). We found that M. smegmatis GlnR is involved in nitrate assimilation, as shown by the growth defect of the M. smegmatis glnR-null mutant (mc²155ΔglnR) in the nitrate-containing minimal medium (Fig. 3A). To test whether GlnR directly regulates nasN, we performed an electrophoretic mobility shift assay (EMSA) with a nasN promoter region and recombinant M. smegmatis GlnR. Increasing concentrations of purified GlnR protein resulted in shifting bands (Fig. 3B), indicating that GlnR can directly and specifically bind the nasN promoter region in vitro. In addition, a 48-nucleotide GlnR-protected DNA sequence in the promoter region was precisely determined by the DNase I footprinting assay (Fig. 3C). Two conserved GlnR boxes (a1-b1 and a2-b2) were identified in the GlnR-protected region (Fig. 3D), consistent with the GlnR-binding consensus sequences characterized in actinomycetes (21). Together, these data suggest that nasN belongs to the GlnR regulon for nitrogen metabolism.

NasN confers a growth advantage to M. smegmatis under nitrate-limiting conditions compared with that of the canonical NADH-dependent NAS
To investigate whether NasN is physiologically different from the canonical NASs, such as A. mediterranei NasAC, an NADH-dependent heterodimeric NAS (see Fig. 1), we compared the growth phenotype of the complemented strains expressing His-tagged M. smegmatis NasN or A. mediterranei NasA using mc²155ΔnasN as the host strain. The complemented strains were constructed with a constitutive hsp60 promoter to rule out interference caused by GlnR-mediated regulation. When incubated in minimal medium in the presence of 10 mM nitrate as the sole nitrogen source, both A. mediterranei nasAC and M. smegmatis nasN were able to fully restore the growth of the nasN-null mutant to the WT level (Fig. 2B). A. mediterranei nasA alone could not support the growth of the nasN-null mutant on nitrate as the sole nitrogen source (Fig. 2B), which is consistent with a previous result showing that A. mediterranei nasA fails to complement the nitrate assimilation defect of the Streptomyces coelicolor nasA-null mutant (9). These results suggest that NasN is a functional homolog of NasAC.
We further measured the growth of the complemented strains in minimal liquid medium containing high (10 mM) or low (0.5-2 mM) concentrations of nitrate as the sole nitrogen source. As shown in Fig. 2C, there was no significant difference in the growth rate between these two complemented strains when incubated in 10 mM nitrate-containing minimal medium. However, when they were inoculated in the minimal medium containing low concentrations of nitrate (0.5-2 mM), the nasAC-complemented strain showed a growth delay, i.e. an increase in the culturing time to reach the maximum cell density, compared with that of the nasN-complemented strain (Fig.  2D). Strikingly, compared with the nasN-complemented strain, the duration of the lag phase for the nasAC-complemented strain became longer when the concentration of nitrate used in the medium decreased. These results suggest that NasN confers a growth advantage to bacteria over that of NasAC under nitrate-limiting conditions.
We further performed Western blot analysis to determine the concentrations of these two NAS proteins in vivo. The tested strains were incubated in minimal medium with 1 mM ammonia as the sole nitrogen source at 37°C for 36 h, which ensured that all tested strains would grow to an OD600 of ~1.5

Mycobacterial monomeric assimilatory nitrate reductase NasN
(~2 × 10^8 cells/ml) when ammonia in culture media was completely depleted (confirmed by measuring the ammonia concentration in the culture). Then, 1 mM nitrate was added to the culture for further incubation at 37°C (Fig. 2E). It is clear that the addition of nitrate supported the further growth of all tested strains to OD600 of ~3.0 (~2 × 10^9 cells/ml), in contrast to no nitrate addition. However, compared with the nasN-complemented strain and the WT strain, the nasAC-complemented strain exhibited a 12-h growth delay (Fig. 2E) with a 2-fold slower growth rate (Fig. 2F) and a significantly lower consumption rate of nitrate (Fig. 2G) during the period of further incubation. As shown in Fig. 2H, after the shift in nitrogen source, the expression levels of NasAC and NasN at various time points were similar to each other. We also measured the growth rate at a lower temperature (30°C), as this is the optimum temperature for A. mediterranei NasAC (9). However, a similar growth delay during the further incubation period was observed in the nasAC-complemented strain, along with a slower nitrate consumption rate (Fig. S2, A-C). These results indicate that although both NasN and the hetero-oligomeric canonical NAS (represented by NasAC) play the same role in nitrate assimilation in vivo, the NasN-mediated nitrate reduction seems more efficient physiologically, conferring a growth advantage to bacteria under nitrate-limiting conditions.

NasN is an NADPH-dependent, diflavin-containing monomeric NAS
Sequence similarity searching of the NasN protein by HMMER (25) revealed that its N-terminal region (734 aa) contains the highly conserved CA domain required for nitrate reduction, sharing 53.2, 44.9, and 47.1% similarity (calculated with the BLOSUM62 matrix) with the catalytic subunits of three characterized NASs from S. elongatus, A. mediterranei, and Bacillus subtilis, respectively (Fig. S3, A and B). Unlike the canonical catalytic subunit, NasN has an additional C-terminal region (617 aa) that contains a unique FMN-FAD/NAD domain (Fig. S4), which is known to function in receiving/transferring electrons from NAD(P)H to the final physiological acceptor in the diflavin reductase family (26,27) and is hereafter designated as the DF domain. Moreover, the typical [2Fe-2S] cluster-binding domain contained in the canonical NASs is not identified in NasN. The purified His-tagged NasN protein (148.9 kDa) heterologously expressed in Escherichia coli showed a single band at ~150 kDa on SDS-PAGE (Fig. 4A). When subjected to analytical size-exclusion chromatography, an apparent molecular mass of ~147 kDa was observed for this protein (Fig. 4B), suggesting that the native form of NasN in solution is a monomer. The UV-visible spectrum of purified NasN showed absorbance maxima at 378 and 455 nm (solid line in Fig. 4C), which is typical for diflavin reductases and is ascribed to the DF domain (28,29). However, the holoprotein exhibited no apparent spectral features typical of iron-sulfur proteins. Thus, we attempted to overexpress the N-terminal region of NasN to prevent the possible spectral interference between flavins and iron-sulfur clusters, as described previously (30-32). The truncated NasN (N-NasN) was coexpressed and purified with a His-tag to near-homogeneity (Fig. 4A). Following anaerobic reconstitution of the iron-sulfur cluster, N-NasN showed a brown color with a UV-visible absorbance maximum at ~410 nm (Fig. 4C), which is characteristic of the presence of the iron-sulfur cluster. The reconstituted N-NasN was determined to contain 3.7 ± 0.6 mol iron and 3.8 ± 0.2 mol sulfur per mol of N-NasN monomers, and the reconstituted
We measured the cytoplasmic NaR activities of the crude cell extracts from M. smegmatis WT, narG-null, and nasN-null mutant strains under anaerobic conditions and found that there was no detectable activity in any of the cytoplasmic fractions of these three strains when reduced methyl viologen (MV, an artificial chemical reductant) or NADH was used as an electron donor (Fig. 4D). However, when NADPH was used as an electron donor, both cytoplasmic fractions from M. smegmatis WT and narG-null mutant showed similar detectable NaR activity, but not that of the nasN-null mutant (Fig. 4D), suggesting that NasN rather than NarGHJI is most likely an active cytoplasmic NaR present in M. smegmatis during aerobic growth and that it is likely a novel NADPH-dependent NaR. In addition, MV-dependent NaR activity of the purified NasN was readily detected under anaerobic assay conditions (Fig. 4E), confirming that NasN is biochemically active in vitro. Among the physiological electron donors tested for the purified NasN, NADPH was preferred over NADH, as shown by a 26-fold greater activity with NADPH relative to NADH (Fig. 4E), further indicating that NasN is a novel NADPH-dependent NaR.
NasN is a monomeric enzyme lacking the typical [2Fe-2S] cluster involved in electron transfer in the canonical NASs. It is likely that the C-terminal DF domain of NasN functions in electron transfer with a preference for NADPH as its electron donor, which is typical of the diflavin reductases (26,27,33). Therefore, we measured the NaR activity of purified NasN in the presence of FMN and/or FAD and found that both flavins could significantly enhance the NADPH-dependent activity of NasN (Fig. 4F). An ~10-fold higher activity was achieved by adding an appropriate amount of FMN and FAD to the assay mixture, compared with that without the addition of flavins. This result supports the proposition that the DF domain plays a crucial role in electron transfer for NasN-mediated nitrate reduction. In addition, it further confirms that NasN is an NADPH-dependent, diflavin-containing monomeric NAS. All this is consistent with its unique primary structure, domain organization, and biochemical properties, which are distinct from any of the previously characterized NASs. NasN is thus a novel prokaryotic NAS.
The biochemical properties of NasN were further characterized. NasN was shown to be oxygen-sensitive, as air-exposure of the purified enzyme resulted in a significant decrease in both NADPH- and NADH-dependent NaR activities (Fig. 4E), and thus all further experiments were performed under anaerobic assay conditions. NasN exhibited a maximum reaction rate at pH 7.5, 30°C in 100 mM phosphate buffer (Fig. S5, A and B). Although the activity of NasN rapidly declined at 40°C, the enzyme was stable at temperatures below 30°C, retaining over 85% activity after incubation for 10 min (Fig. S5C).

NasN exhibits greater NAD(P)H-dependent NaR activity and tolerance to oxygen than the canonical NADH-dependent NAS
The steady-state kinetics of NasN against its nitrate substrate fit well to the Michaelis-Menten model, with Km = 12 μM for nitrate, Vmax = 860 nmol/min/mg, and a kcat value of 2100 min−1 (Fig. 4G). Although the Km value for nitrate of NasN is 2-fold greater than that of cyanobacterial Fd-NarB (~6 μM) (8), it is much less than those of previously reported NADH-dependent NASs with values ranging from 17 to 950 μM (12,13), indicating a stronger binding affinity of NasN to nitrate than that of the canonical NADH-dependent NASs.
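As a rough illustration of what these constants imply, the following minimal Python sketch evaluates the Michaelis-Menten rate law with the values quoted above. It assumes the nitrate Km is expressed in micromolar units; the function name and the example nitrate concentrations are hypothetical choices made for illustration and are not part of the original analysis.

```python
def michaelis_menten_rate(substrate_um, vmax, km_um):
    """Return the initial rate v = Vmax * [S] / (Km + [S])."""
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    if km_um <= 0:
        raise ValueError("Km must be positive")
    return vmax * substrate_um / (km_um + substrate_um)


# Values quoted in the text; Km is assumed to be in micromolar units.
KM_NITRATE_UM = 12.0  # Km for nitrate
VMAX = 860.0          # Vmax in nmol/min/mg

if __name__ == "__main__":
    # Hypothetical nitrate concentrations chosen for illustration:
    # the Km itself, 0.5 mM, and 10 mM.
    for nitrate_um in (12.0, 500.0, 10000.0):
        rate = michaelis_menten_rate(nitrate_um, VMAX, KM_NITRATE_UM)
        print(f"{nitrate_um:8.0f} uM nitrate: {rate:6.1f} nmol/min/mg "
              f"({100 * rate / VMAX:.1f}% of Vmax)")
```

By definition the rate at a nitrate concentration equal to Km is half of Vmax. The sketch covers only the isolated enzyme; rates in vivo also depend on nitrate uptake and reductant supply, which it ignores.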
Figure 4 legend (fragment): ... and ferritin (440 kDa) standards are indicated by the arrows. C, UV-visible absorption spectra of the reconstituted N-NasN, compared with the reconstituted holoprotein. D, specific NaR activities in cytoplasmic fractions of the crude cell extracts prepared anaerobically from M. smegmatis WT strain and mc²155ΔnarG and mc²155ΔnasN strains. The strains were grown aerobically at 37°C in MPLN medium containing 10 mM NaNO3, except that 5 mM L-glutamine was used for the cultivation of mc²155ΔnasN. Reactions were conducted with electron donors of 400 μM NADPH or NADH, or 150 μM reduced MV at pH 7.5, 30°C under anaerobic assay conditions as described under "Experimental procedures." E, NasN activities employing different electron donors with or without pre-incubation in the air. For the air-exposed samples, the purified NasN was pre-incubated in the air for 1 h at 4°C. The activities were expressed as relative to the highest activity (100%).

Table 1 legend (fragment): ... and mc²155ΔnasN::His-nasA Ame c strains, respectively. The strains were grown aerobically at 37°C in MPLN minimal medium containing 10 mM NaNO3, except that 5 mM L-glutamine was used for the cultivation of the mc²155ΔnasN::His-nasA Ame c strain. An anti-His tag monoclonal antibody-based ELISA method was used to determine the concentrations of His-tagged proteins in enzyme samples. Reactions were conducted as described under "Experimental procedures." Data are expressed as the mean ± S.D. of three independent experiments. ND means not determined.

The biochemical properties of NasN were compared with those of a canonical NADH-dependent NAS (heterodimeric A. mediterranei NasAC), by quantitatively determining the NaR activities of cytoplasmic fractions of the crude cell extracts from the complemented strains. Under standard anaerobic assay conditions, the NaR activity present in the NasAC-containing cytoplasmic fraction using either NADPH or NADH as an electron donor was readily observed, whereas no NaR activity could be detected in the NasA-containing cytoplasmic fraction (Table 1), demonstrating that the electron receiving/transferring via FAD/NAD and [2Fe-2S] cluster is indispensable for NAD(P)H-dependent nitrate reduction catalyzed by the canonical NADH-dependent NASs in vitro. In contrast, the NasN-containing cytoplasmic fraction exhibited higher NAD(P)H-dependent NaR activity than that of NasAC-containing cytoplasmic fraction. The NADPH-dependent activity of NasN was especially high with an ~380-fold greater value than that of NasAC (Table 1). The kcat values of NasN were ~450- and ~2-fold higher than those of NasAC when using NADPH and NADH as an electron donor, respectively, resulting in ~890- and ~1.5-fold greater catalytic efficiencies (kcat/Km) of NasN than those of NasAC. Therefore, NasN is more efficient than NasAC in NAD(P)H-dependent nitrate reduction in vitro. In contrast, when testing the NaR activity employing MV as an electron donor, NasN showed a much lower activity and catalytic efficiency than did NasAC (  (36) in the presence of oxygen.
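Because catalytic efficiency is defined as kcat/Km, the fold differences in kcat and in kcat/Km quoted above jointly imply a ratio of apparent Km values between the two enzymes. The short Python sketch below works through that arithmetic. It is a back-of-the-envelope check, not a reproduction of Table 1: the fold values are the rounded approximations given in the text, and the function name is a hypothetical helper introduced here.

```python
def implied_km_ratio(efficiency_fold, kcat_fold):
    """Return Km(NasAC) / Km(NasN) implied by two reported fold changes.

    With efficiency = kcat / Km for each enzyme:
        eff_NasN / eff_NasAC = (kcat_NasN / kcat_NasAC) * (Km_NasAC / Km_NasN)
    so the Km ratio is the efficiency fold divided by the kcat fold.
    """
    if kcat_fold <= 0:
        raise ValueError("kcat fold change must be positive")
    return efficiency_fold / kcat_fold


# Approximate fold changes (NasN relative to NasAC) quoted in the text.
reported = {
    "NADPH": {"kcat_fold": 450.0, "efficiency_fold": 890.0},
    "NADH": {"kcat_fold": 2.0, "efficiency_fold": 1.5},
}

if __name__ == "__main__":
    for donor, folds in reported.items():
        ratio = implied_km_ratio(folds["efficiency_fold"], folds["kcat_fold"])
        print(f"{donor}: implied Km(NasAC)/Km(NasN) = {ratio:.2f}")
```

A ratio above 1 means the apparent Km of NasN is lower than that of NasAC under that electron donor, and a ratio below 1 means the opposite. Since the inputs are rounded, the outputs should be read as rough estimates only.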
To determine whether the oxidation of the [4Fe-4S] cluster can affect the enzyme activity of NASs, the NaR activities of air-exposed NasN- or NasAC-containing cytoplasmic fractions were measured. We found that the air-exposure of these anaerobically prepared cytoplasmic fractions resulted in a time-dependent decrease of NAD(P)H-dependent NaR activity, whereas no significant decrease of NaR activity was observed when cytoplasmic fractions were kept anaerobic (Fig. 4, I and J). This is consistent with previous findings that the oxidation of [4Fe-4S] cluster-containing proteins results in protein instability (38-40). It is worth mentioning that for NasAC both NADPH- and NADH-dependent activities declined rapidly in air within 30 min, whereas for NasN, although its NADH-dependent activity was lost rapidly within 1 h, 25% of its NADPH-dependent NaR activity was retained after air-exposure for 4 h (Fig. 4, I and J). This is consistent with the above assay employing the purified NasN (Fig. 4E). No significant decrease in MV-dependent NaR activity was observed in any of the enzymes tested after air-exposure (Fig. 4H). Based on this result, it may be assumed that MV-mediated electron transfer is independent of the [4Fe-4S] cluster; however, another scenario could be that the strong reducing capacity of MV reconstituted the oxidized [4Fe-4S] cluster to make it reactive. Nonetheless, these results highlight an essential role of the N-terminal [4Fe-4S] cluster in transferring electrons to the MoCo catalytic cavity during NAD(P)H-dependent nitrate reduction by NASs and demonstrate that the NAD(P)H-dependent NAS enzymes are generally sensitive to oxygen. Moreover, the NADPH-dependent, diflavin-containing monomeric NasN has been shown to be more tolerant to oxygen oxidation than the NADH-dependent hetero-oligomeric NAS enzymes that utilize their [2Fe-2S] cluster for electron transfer, such as A. mediterranei NasAC tested in this study.

NasN evolved more recently than other NASs and experienced multiple independent losses in the slow-growing pathogenic mycobacteria evolving from their fast-growing ancestors
To characterize the phylogenetic distribution of nasN in all organisms, a total of 7007 genomes were obtained via tBLASTn search and protein domain analysis, with multiple coexisting or singleton status of nucleotide sequences homologous to nasN (the CA, DF, or intact nasN homologs) from 6307 species (unique NCBI taxonomy ID) belonging to the domains of Archaea (267 species), Bacteria (5668 species), and Eukaryota (372 species) (Fig. 5A and Table S1).
In contrast to the widely distributed canonical NASs (represented by the CA homologs) identified in 267 species of Archaea, 4838 species of Bacteria, and 3 species of Eukaryota (filamentous fungi, i.e. Aspergillus oryzae, Aspergillus glaucus, and Leptosphaeria biglobosa), the intact nasN homologs are distributed merely in a limited number of bacterial species (579 species), wherein 210 species have both the intact nasN and CA homologs and 369 species only contain the intact nasN homologs (Fig. 5A and Table S1). In particular, the intact nasN homologs are mainly present in the phyla Actinobacteria and Proteobacteria and are concentrated in very few classes, with 325 species in the class of Actinobacteria and 231 species in three classes of Proteobacteria (22 from Alphaproteobacteria, 128 from Betaproteobacteria, and 81 from Gammaproteobacteria). The remaining intact nasN homologs are identified in the phyla Firmicutes (12 species), Acidobacteria (5 species), Verrucomicrobia (5 species), and Deinococcus-Thermus (1 species). Notably, most of the intact nasN homologs (78%) are present in 11 genera, which contain at least 10 nasN-containing species (Fig. 5A). Intriguingly, among the 369 nasN-containing species that do not contain CA homologs, some environmental bacteria such as Rhodococcus species (41,42) of Actinobacteria, as well as Acidobacterium ailaaui (43,44) and Granulicella mallensis (44,45) of Acidobacteria, are known to possess assimilatory nitrate reductase activity, strongly supporting the existence of NasN as an alternative pathway, in addition to the canonical NASs, for nitrate assimilation.
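The species counts reported in the two preceding paragraphs can be cross-checked with simple arithmetic. The following Python sketch does so using only the numbers stated in the text; the dictionary layout and the function name are illustrative assumptions and do not reflect how the original survey was performed.

```python
# Counts copied from the text.
SPECIES_BY_DOMAIN = {"Archaea": 267, "Bacteria": 5668, "Eukaryota": 372}
TOTAL_SPECIES = 6307

NASN_SPECIES_TOTAL = 579
NASN_WITH_CA = 210   # species carrying both intact nasN and CA homologs
NASN_ONLY = 369      # species carrying only intact nasN homologs

ACTINOBACTERIA_CLASS = 325
PROTEOBACTERIA_CLASSES = {
    "Alphaproteobacteria": 22,
    "Betaproteobacteria": 128,
    "Gammaproteobacteria": 81,
}
PROTEOBACTERIA_TOTAL = 231
OTHER_PHYLA = {
    "Firmicutes": 12,
    "Acidobacteria": 5,
    "Verrucomicrobia": 5,
    "Deinococcus-Thermus": 1,
}


def check_tallies():
    """Return a dict mapping each consistency check to True or False."""
    by_taxon = (ACTINOBACTERIA_CLASS
                + sum(PROTEOBACTERIA_CLASSES.values())
                + sum(OTHER_PHYLA.values()))
    return {
        "domains sum to total species":
            sum(SPECIES_BY_DOMAIN.values()) == TOTAL_SPECIES,
        "nasN-with-CA plus nasN-only equals nasN total":
            NASN_WITH_CA + NASN_ONLY == NASN_SPECIES_TOTAL,
        "Proteobacteria classes sum to stated total":
            sum(PROTEOBACTERIA_CLASSES.values()) == PROTEOBACTERIA_TOTAL,
        "taxonomic breakdown equals nasN total":
            by_taxon == NASN_SPECIES_TOTAL,
    }


if __name__ == "__main__":
    for name, ok in check_tallies().items():
        print(f"{'OK ' if ok else 'FAIL'} {name}")
```

All four checks pass with the numbers as stated, so the taxonomic breakdown accounts for every one of the 579 nasN-containing species.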
The nucleotide sequences homologous to nasN, encoding the N-terminal CA domain or C-terminal DF domain of NasN, can be individually identified in many canonical NAS enzymes and diflavin reductases, respectively (2,13,26). The wide distribution of the CA homologs in many bacteria and archaea species (Fig. 5A, bluish shading) suggests that the canonical NAS containing only the CA domain in the catalytic subunit (see Fig. 1) is distributed most prevalently, and it may be the earliest NAS of all prokaryotes. Three phylogenetic trees were constructed based on nucleotide sequences homologous to nasN, respectively (Fig. 5, B-D). The significant similarity in both clustering and phylogenetic branching of the nasN-derived CA and DF homologs in their corresponding trees (Fig. 5, B and C) indicates a similar evolutionary history or even a coevolution event shared by these two homologs. Moreover, nasN-derived CA and DF homologs are distributed as narrowly as is the intact nasN (Fig. 5, B-D). Based on the integration of their similar evolutionary histories and incongruent distribution, one may hypothesize that nasN evolved via the fusion of the homologous nucleotide sequences for CA and DF domains relatively late in the history of evolution. This resembles the evolution of some diflavin reductase family proteins, such as the cytochrome P450 reductase CPRBM3, formed by the fusion of the DF domain to their final physiological electron acceptors (see Fig. 1) (26,46).
We noticed that the intact nasN homolog is absent in 23 of 153 mycobacterial species with available genome sequences (Fig. 6, middle panel, and Fig. S6). Intriguingly, most of the nasN-lacking mycobacteria are slow-growing pathogenic species, such as all members of the tuberculosis-related M. tuberculosis complex (MTBC) and the leprosy-related Mycobacterium leprae group, as well as a few members of the opportunistic infection-related Mycolicibacter terrae complex (MTC). Even the three fast-growing species lacking the nasN gene are all conditional pathogens, i.e. the goat infection-related Mycolicibacter algericum (47) and the human infection-related Mycolicibacter thermoresistibile and Mycolicibacter insubricum (48,49).
To further delineate these losses, which are found exclusively in pathogenic mycobacteria, we analyzed the nasN-related neighboring genomic regions, up to 100 genes in length, from the nasN-lacking species and their nasN-containing close relatives (Fig. 6, right panel). Although no known insertion sequences (50) were found within these regions, the differences in the genes remaining after the respective nasN-related deletion events indicate that most of these events likely occurred independently. By comparing the nasN-related neighboring regions between the closely related genomes, especially in the members of MTBC and MTC, at least eight deletion events can be observed (Fig. 6, right panel). Among these events, all of the nasN-lacking mycobacterial species in events 1-4 have highly similar remaining genes in the nasN-related genomic regions after nasN deletion occurred, indicating that they share the same gene loss pattern. We observed that the gene loss patterns of events 5-8 were different from those of events 1-4 (Fig. 6, right panel). Thus, the dispersed independent loss of nasN along the tree of mycobacterial evolution revealed various mechanisms of nasN deletion. In conclusion, our analysis suggests that mycobacterial nasN deletion occurred repeatedly and independently at least several times along the evolutionary pathway and that all of the nasN-lacking species are pathogenic (some conditional) and mostly slow-growing.

Discussion
The NADPH-dependent, diflavin-containing monomeric NasN characterized in this study represents a novel type of NAS, which differs from the canonical NASs in its structure, electron-transferring mechanism, enzymatic properties, and phylogenetic distribution. The holoenzyme of the canonical NAS enzymes is hetero-oligomeric in nature, composed of one conserved catalytic subunit and at least one electron transfer subunit (which may be shared with the nitrite reductase) or one electron donor subunit (ferredoxin or flavodoxin) (Fig. 1), wherein the [2Fe-2S] cluster universally plays a crucial role in intermolecular electron transfer for nitrate reduction (8,13). In contrast, although the N-terminal region of NasN contains a canonical CA domain, it is NADPH-dependent and lacks the [2Fe-2S] cluster-binding domain. Instead, NasN contains a unique DF domain at its C terminus, which may mediate the intra- and inter-molecular electron transfer from NAD(P)H to its CA domain. Both the physiological and the accompanying biochemical data (Table 1 and Figs. 2, D-H, and 4, H-J) indicate that the diflavin-containing monomeric NasN is advantageous over the canonical heterodimeric NasAC with regard to growth at low concentrations of nitrate in vivo and tolerance to oxygen. Considering that fusion of the DF domain with its catalytic subunit increased the catalytic activity of a cytochrome P450 system (46,51), the better performance of the monomeric NasN lacking the [2Fe-2S] cluster provides additional evidence for the higher efficiency of the direct electron-receiving/transferring mechanism mediated by the DF domain.
MV is the most commonly used artificial electron donor for in vitro NaR activity assays in many previous studies (6, 8-10, 12, 13). However, so far there have been few reports detecting the NaR activity of bacterial NASs employing their physiological electron donors. This technical obstacle might be attributed to the previous assumption that the canonical NADH-dependent NASs cannot employ NADPH as their physiological electron donor, even though they all theoretically bear the binding sites for both NADH and NADPH (9-13). Our data, however, indicated that the lack of reported NADPH-dependent NaR activity in vitro may instead be due to oxidation of the enzyme samples during preparation (Fig. 4, I and J). In other words, air exposure of NAS enzymes during their preparation and activity assay could cause the originally low activity of the oxygen-sensitive NADH-dependent NAS to become undetectable. Another possible factor that should be mentioned is the presence of large quantities of membrane-bound oxidases in the crude cell extract, which continuously consume the reductive NAD(P)H (52). In our experiments, an excess supply of NAD(P)H as the physiological electron donors was shown to be essential for a successful assay, particularly when the cytoplasmic fraction of the crude cell extract was used as the crude enzyme.

The gray-shaded region of the tree with labels is used to represent the genera with the total number of species containing the intact nasN homolog, except the genera whose number of nasN-containing species was less than 10. B and C, phylogenetic trees were constructed based on the nucleotide sequences of CA (9218) and DF (4230) homologs individually, which were extracted and deduplicated from genome sequences at NCBI. The original sequence sources, including intact nasN or other homologs (CA or DF homologs), are illustrated with the colors of the circular ring outside the tree. The colorized end points of the branches also represent the sequences that originated from the intact nasN homologs. D, phylogenetic tree of the intact nasN homologs (549 deduplicated nucleotide sequences). The details of the nucleotide sequences used are listed in Table S2.

Mycobacterial monomeric assimilatory nitrate reductase NasN
Employing our optimized assay system, NasN not only demonstrated a high level of NADPH-dependent NaR activity in both crude cell extracts and the purified form but also was able to use NADH as an electron donor in vitro for catalysis, although the resulting activity was much lower than the NADPH-dependent NaR activity (Table 1 and Fig. 4, E-J). We also found that, under the same assay conditions, there was detectable NADPH-dependent NaR activity present in the NasAC-containing cytoplasmic fraction (Table 1), confirming that the FAD/NAD domain of the NADH-dependent NAS functions in NADPH binding and electron transfer. These results also show that the NADPH-dependent NasN differs from the NADH-dependent NAS in its preferred physiological electron donor, i.e. NADPH and NADH, respectively.

Figure 6. Loss of nasN in mycobacterial species during evolution. The maximum-likelihood phylogenetic tree was constructed based on the 16S rRNA gene sequences from 153 genome-sequenced mycobacterial species listed by the LPSN database, with G. bronchialis as an outgroup (Fig. S6). A pruned tree (middle panel) with 70 selected mycobacterial species is shown. All the nasN-lacking species have been listed, including the 23 confirmed species (shaded in gray) and 5 unconfirmed species (shaded in green), due to the possible dramatic genome reduction and rearrangement or their nasN-related segments located at the ends of the sequenced contigs. The five newly proposed mycobacterial clades (15)

Generally, taxonomic distribution combined with phylogenetic analysis has been adopted for inferring evolutionary events on a large time scale (53-55). The hypotheses concerning the origin of NasN, including CA-DF fusion and its late emergence in bacterial evolution, are jointly supported by the extremely narrow distribution of the intact nasN homologs, limited to a few taxa of the phyla Actinobacteria and Proteobacteria (Fig. 5, A and D), and by the restricted nucleotide diversity of NasN-related CA and DF homologs, respectively (Fig. 5, B and C). Furthermore, according to the biosample metadata of nasN-containing bacterial species, most of these bacteria (mainly belonging to Actinobacteria and Proteobacteria) are isolated from the environment, such as soil and water (Table S2), implying that the high efficiency of NasN-mediated nitrate assimilation in environmental bacteria might impose strong selection favoring the expansion of the nasN-encoding population in the corresponding species. The incongruent distribution of nasN between the fast-growing and slow-growing mycobacteria (Fig. 6 and Fig. S6) coincides well with the hypothesis that the slow-growing mycobacteria originated from the ancestral fast-growing environmental mycobacteria (56-58). The nasN-containing bacteria may have a growth advantage by efficiently assimilating nitrate when competing with other bacteria under nitrate-limiting conditions, as indicated by our in vivo growth studies (Fig. 2E) and by the observation that most of the nasN-containing bacteria are isolated from the environment (Table S2). Therefore, these widely distributed and multiple independent nasN-loss events, mainly in the species of slow-growing pathogenic mycobacteria, seem unlikely to be evolutionarily random. In fact, many intracellular bacterial pathogens such as M. tuberculosis have evolved contrasting lifestyles within the host cell for persistence in the host and for infecting new hosts, rather than the inorganic nitrogen source-dependent rapid growth physiology (59). In addition, M. tuberculosis has evolved multiple strategies to utilize organic nitrogen sources derived from its hosts rather than the rare nitrate (60).
NarGHJI of M. tuberculosis strongly reduces nitrate under both anaerobic and aerobic conditions (16,61,62) and thus is the sole NaR responsible for both nitrate assimilation and respiration (16). This enzyme is also found to be involved in the adaptation of M. tuberculosis to multiple intracellular stresses (60). In contrast, the NarGHJI homolog in M. smegmatis has no detectable activity in a rapid anaerobic dormancy model (61) and is believed to exhibit weak activity in a hypoxia dormancy model induced by slow oxygen consumption (17). Furthermore, our data confirm that M. smegmatis expresses NasN rather than NarGHJI for highly efficient nitrate assimilation under aerobic conditions (Fig. 4D). These major differences in in vivo nitrate metabolism between the pathogenic and environmental mycobacteria may inform the traditional research strategy of employing M. smegmatis as a laboratory model for M. tuberculosis physiology studies, particularly by helping to select more appropriate model systems for nitrate metabolism, both assimilatory and respiratory.

Bacterial strains, plasmids, and culture conditions
The strains and plasmids used in this study are summarized in Table S3. E. coli strains were grown aerobically at 37°C in Luria-Bertani (LB) broth. M. smegmatis strains were grown aerobically at 37°C in Middlebrook 7H9 broth or 7H10 agar plates supplemented with 0.2% v/v glycerol and 0.05% v/v Tween 80. A. mediterranei U32 was grown aerobically at 37°C in Bennet medium as described previously (9). To analyze nitrogen assimilation in M. smegmatis strains, a nitrogen-free modified M. phlei medium (63) 4 ), to which different nitrogen sources were added when needed: NaNO3 (0.5-10 mM), NaNO2 (1 mM), NH4Cl (1-10 mM), or L-glutamine (5 mM), respectively. When required, antibiotics were used at the following concentrations: kanamycin, 25 and 50 μg/ml; hygromycin, 50 and 100 μg/ml for M. smegmatis and E. coli, respectively. All chemicals used in this study were obtained from Sigma-Aldrich.

Construction of M. smegmatis mutant and complemented strains
The nasN gene (Msmeg_4206) was disrupted in M. smegmatis mc²155 and replaced with the hygromycin-resistance cassette by allelic exchange as described previously (64). Briefly, two fragments of 1.2 and 1.0 kb containing the upstream and downstream regions of nasN, respectively, were amplified from M. smegmatis mc²155 genomic DNA by PCR. The primers for gene knockout are listed in Table S4. The PCR products digested with HindIII/NheI and AvrII/AflII were subsequently inserted into the corresponding sites of the pYUB854 vector, resulting in the knockout plasmid pYUB854-nasN. Then, mc²155 competent cells were transformed and screened for hygromycin-resistant colonies and further verified by PCR and DNA sequencing. M. smegmatis mc²155ΔnarG and mc²155ΔglnR mutant strains were constructed in the same way as the mc²155ΔnasN mutant strain (Fig. S7, A-F).
For complementation experiments, the M. smegmatis nasN gene together with its native promoter was cloned into an integrating vector pMV306. pMV306-nasN Msm was used as the template for site-directed mutagenesis of nasN, yielding plasmid pMV306-nasNdelG Msm with a deletion of nucleotide G1850 in nasN. To constitutively express proteins in M. smegmatis, pMV306H was constructed by inserting the hsp60 promoter from pMV261 into pMV306. For the quantitative measurement of the NaR activity, His-tagged M. smegmatis nasN, A. mediterranei nasAC, and nasA were inserted into pMV306H, respectively. The complemented recombinants were selected by hygromycin and kanamycin resistance after introducing the DNA sequencing-verified plasmids into the mc²155ΔnasN strain.

Measurement of growth and nitrogen utilization
M. smegmatis strains were grown aerobically at 37°C in 7H9 broth with shaking at 180 rpm overnight until the culture reached the stationary growth phase. The overnight cultures were washed twice with the nitrogen-free MPLN medium by centrifuging at 3000 × g for 10 min and then adjusted by using MPLN medium to an OD600 of ~1.0 (corresponding to ~1 × 10⁸ cells/ml) as the seeding cultures. For the spot dilution assay, the inocula were prepared by 10-fold serial dilutions. Two microliters of dilutions ranging from 10⁻¹ to 10⁻⁵ were spotted onto MPLN agar plates containing 10 mM NaNO3 and then incubated at 37°C for 4 days. For the growth phenotype test, the seeding cultures were inoculated (1:200, v/v) into 50 ml of 7H9 broth or liquid MPLN medium containing different nitrogen sources. Then the cultures were incubated aerobically at 37°C with shaking at 100 rpm. The growth of these cultures was monitored by measuring OD600, and the growth rates were determined by regression analysis. Nitrate and nitrite concentrations in cultures were measured by using the nitrite/nitrate assay kit (Roche Diagnostics, Germany) with a detection limit of 0.02 mg/liter nitrate/nitrite. Ammonia and glutamine concentrations in cultures were measured by using the ammonia assay kit and the glutamine/glutamate determination kit (Sigma-Aldrich) with detection limits of 0.2 mg/liter ammonia and 0.02 mg/liter glutamine, respectively.
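
The text states only that growth rates were determined by regression analysis. As a minimal sketch, assuming this means a least-squares fit of ln(OD600) against time over the exponential phase (an assumption, since the regression model is not specified), the calculation could look as follows. The function name and the readings are hypothetical and are not data from the study.

```python
import math

def growth_rate(times_h, od600):
    """Return the slope of ln(OD600) versus time (per hour) by least squares."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired time/OD600 readings")
    log_od = [math.log(od) for od in od600]
    mean_t = sum(times_h) / len(times_h)
    mean_y = sum(log_od) / len(log_od)
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    return sxy / sxx

# Synthetic exponential-phase readings (illustrative only), generated with
# a specific growth rate of 0.2 per hour.
times_h = [0, 3, 6, 9, 12]
od600 = [0.05 * math.exp(0.2 * t) for t in times_h]

mu = growth_rate(times_h, od600)        # specific growth rate, per hour
doubling_time_h = math.log(2) / mu      # doubling time in hours
```

Only readings from the exponential phase should be passed in; lag-phase or stationary-phase points would bias the slope.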

Enzyme preparation and protein concentration determination
To purify M. smegmatis NasN, the full-length nasN gene was PCR-amplified from M. smegmatis mc²155 genomic DNA using the primers listed in Table S4 and cloned into the pET28b(+) vector. The recombinant plasmid pET-nasN was transformed into E. coli BL21(DE3) for expressing N-terminal His-tagged NasN protein. The recombinant E. coli cells were grown aerobically at 37°C in LB broth supplemented with 50 μg/ml kanamycin and trace elements solution as mentioned above. Protein expression was induced by adding isopropyl β-D-thiogalactoside into the culture to a final concentration of 0.5 mM when the cell density reached an OD600 of 0.6-0.8. After continuous cultivation for an additional 12 h at 16°C, the cells were harvested by centrifugation at 5000 × g for 15 min at 4°C and resuspended in a lysis buffer (pH 7.4) containing 50 mM sodium phosphate, 0.5 M NaCl, 10% v/v glycerol, 20 mM imidazole, and 1 mM phenylmethylsulfonyl fluoride (PMSF). It should be noted that all operations were carried out under anaerobic conditions from initial sample preparation to the final analysis of the purified protein. Cells were disrupted by sonication on ice followed by centrifugation at 15,000 × g for 60 min to remove cell debris and the insoluble fraction. Then the supernatant was loaded onto a Ni-Sepharose Fast Flow column (GE Healthcare, Sweden) pre-equilibrated with lysis buffer. After washing with lysis buffer supplemented with 50 mM imidazole, recombinant NasN protein was eluted with lysis buffer supplemented with 300 mM imidazole and 1 mM DTT. Fractions containing NasN were combined and concentrated using a 10-kDa cutoff ultrafilter (Merck, Ireland) and then loaded onto a Hiload 16/60 Superdex 200 prep grade column on an AKTA FPLC system (GE Healthcare, Sweden) for further purification by size-exclusion chromatography at a flow rate of 1 ml/min and eluted with elution buffer, pH 7.2, containing 50 mM phosphate, 0.15 M NaCl, 5% v/v glycerol.
The purified protein was concentrated to ~5 mg/ml in the elution buffer by ultrafiltration and stored in small working aliquots at -70°C. For gel-filtration analysis, the gel-filtration calibration kits (GE Healthcare, UK) were used. These standard proteins, including aprotinin (6.5 kDa), ovalbumin (44 kDa), conalbumin (75 kDa), aldolase (158 kDa), and ferritin (440 kDa), were dissolved in the same buffer as purified NasN protein and loaded onto the column with the same volume. The apparent molecular mass and the oligomeric state of NasN were determined from its elution volume relative to those of the standard proteins plotted against the logarithm of their molecular masses. The N-terminal region of NasN (N-NasN) was prepared in a similar way as the holoprotein.
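
The calibration described above can be sketched as a straight-line fit of elution volume against log10 of the standards' molecular masses, which is then inverted for the sample. This is an illustrative sketch only: the molecular masses are those listed in the text, whereas every elution volume below is a hypothetical placeholder, as are the function names.

```python
import math

# Molecular masses (kDa) of the calibration standards named in the text.
STANDARDS_KDA = {
    "aprotinin": 6.5,
    "ovalbumin": 44.0,
    "conalbumin": 75.0,
    "aldolase": 158.0,
    "ferritin": 440.0,
}

# Hypothetical elution volumes (ml); placeholders, not measured values.
ELUTION_ML = {
    "aprotinin": 95.0,
    "ovalbumin": 78.0,
    "conalbumin": 73.0,
    "aldolase": 66.0,
    "ferritin": 57.0,
}

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate(standards_kda, elution_ml):
    """Fit elution volume as a linear function of log10(molecular mass)."""
    names = sorted(standards_kda)
    log_mass = [math.log10(standards_kda[n]) for n in names]
    volumes = [elution_ml[n] for n in names]
    return fit_line(log_mass, volumes)

def apparent_mass_kda(sample_elution_ml, slope, intercept):
    """Invert the calibration line to estimate an apparent mass in kDa."""
    return 10 ** ((sample_elution_ml - intercept) / slope)

slope, intercept = calibrate(STANDARDS_KDA, ELUTION_ML)
# 70.0 ml is a hypothetical sample elution volume used for illustration.
sample_mass_kda = apparent_mass_kda(70.0, slope, intercept)
```

Comparing the apparent mass with the mass predicted from the sequence is what distinguishes a monomer from an oligomer.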
For the preparation of cytoplasmic fractions, M. smegmatis strains were grown aerobically at 37°C in MPLN medium containing 10 mM nitrate and trace elements solution as mentioned above, to which 5 mM L-glutamine was added when needed. The cells of late-log phase cultures with an OD600 of 0.8-1.0 were harvested by centrifugation at 5000 × g for 10 min at 4°C, washed, resuspended using 100 mM sodium phosphate buffer (pH 7.5) containing 10% v/v glycerol and 1 mM PMSF, and sonicated under anaerobic conditions, and then the cytoplasmic cell fractions were collected as the crude enzyme extracts after centrifugation at 16,000 × g for 1 h at 4°C.
The Bradford method was used to determine the concentration of the purified recombinant NasN protein and the total protein concentration of cytoplasmic fractions of the M. smegmatis strains. The concentration of His-tagged protein in cytoplasmic fractions was determined by ELISA using an anti-His tag mAb, with recombinant NasN protein as a standard. Primary antibodies against His-tag (1:2000 dilution; mouse monoclonal, M20001, Abmart, Shanghai, China), GroEL (1:5000 dilution; mouse monoclonal, ab20519, Abcam), and HRP-conjugated goat anti-mouse secondary antibody (1:5000 dilution; M21001, Abmart, Shanghai, China) were used.

Cofactor analysis
The UV-visible absorbance spectrum (anaerobic, sealed cuvette) was obtained using a NanoDrop ND1000 spectrophotometer (Thermo Fisher Scientific). To increase the [4Fe-4S] cluster content of the protein, an iron-sulfur cluster reconstitution was performed, as described previously (65). Iron and acid-labile sulfide contents were measured as described by Pierik et al. (66).

Nitrate reductase activity assay
Reaction conditions, including temperature, pH, cofactor, and amount of enzyme, were optimized. Standard NAD(P)H-dependent NaR activity was assayed at 30°C for 10 min under anaerobic assay conditions in a reaction mixture of a 0.5-ml volume that included 100 mM phosphate buffer (pH 7.5), 10 mM NaNO3, and appropriate amounts of purified or crude enzymes, in the presence of electron donors of 400 μM NADPH or 400 μM NADH, plus 25 μM FMN and 25 μM FAD. For the determination of MV-dependent NaR activity, 150 μM MV and 12 mM dithionite solution (Na2S2O4/NaHCO3) were added to the assay mixture to start the reaction under anaerobic conditions. After 10 min of reaction time, the reaction was stopped by vigorous stirring under aerobic conditions until the blue color had disappeared in the reaction mixture. The concentration of accumulated nitrite was determined by using the nitrite/nitrate assay kit (Roche Diagnostics, Germany). NaR-specific activity is expressed as nanomoles of nitrite/min/mg of proteins (nmol/min/mg). For NaR kinetics experiments, the initial velocities of nitrate reduction were plotted against nitrate concentrations and analyzed using nonlinear regression to the Michaelis-Menten equation using GraphPad Prism 5.0 program.
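
The kinetic analysis was performed in GraphPad Prism; the sketch below is not that implementation. It illustrates the same idea, nonlinear least-squares fitting of v = Vmax·[S]/(Km + [S]), using a damped Gauss-Newton iteration written in plain Python. The function names, starting guesses, and the synthetic data set are all assumptions made for illustration.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)

def fit_michaelis_menten(substrate, velocity, max_iter=200, tol=1e-12):
    """Estimate (Vmax, Km) by damped Gauss-Newton least squares."""
    def sum_sq(vmax, km):
        return sum((v - michaelis_menten(s, vmax, km)) ** 2
                   for s, v in zip(substrate, velocity))

    # Starting guesses: largest observed velocity and a mid-range concentration.
    vmax = max(velocity)
    km = sorted(substrate)[len(substrate) // 2]
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(substrate, velocity):
            d_vmax = s / (km + s)                # partial derivative w.r.t. Vmax
            d_km = -vmax * s / (km + s) ** 2     # partial derivative w.r.t. Km
            resid = v - michaelis_menten(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * resid
            b2 += d_km * resid
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        # Halve the step until the fit does not get worse and Km stays positive.
        current = sum_sq(vmax, km)
        scale, accepted = 1.0, False
        while scale > 1e-10:
            trial_vmax = vmax + scale * step_vmax
            trial_km = km + scale * step_km
            if trial_km > 0 and sum_sq(trial_vmax, trial_km) <= current:
                accepted = True
                break
            scale /= 2
        if not accepted:
            break
        vmax, km = trial_vmax, trial_km
        if abs(scale * step_vmax) < tol and abs(scale * step_km) < tol:
            break
    return vmax, km

# Synthetic, noise-free data (illustrative only): Vmax = 100, Km = 0.5 mM.
nitrate_mM = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
velocity = [michaelis_menten(s, 100.0, 0.5) for s in nitrate_mM]

vmax_fit, km_fit = fit_michaelis_menten(nitrate_mM, velocity)
```

With real measurements the residuals are nonzero, so the fitted values should be reported together with their uncertainty, which a dedicated statistics package provides.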

EMSA and DNase I footprinting assay
M. smegmatis GlnR (Msmeg_5784) was expressed and purified as described by Lin et al. (67). The 294-bp promoter region and 302-bp coding region of M. smegmatis nasN were amplified by PCR and cloned into the HincII site of the pUC18H vector (Tolo Biotech., Shanghai, China), using the primers listed in Table S4. Then the obtained plasmids were verified by DNA sequencing and used as the templates for preparation of the EMSA probes by PCR using primers 6-carboxyfluorescein (FAM)-labeled M13F (-47) and M13R (-48). To investigate the binding of GlnR to the labeled probes, EMSA was performed as described previously (21). Briefly, in a 20-μl volume reaction system, 50 ng of FAM-labeled probe was incubated at 30°C with varying amounts of purified GlnR protein for 30 min in a binding buffer containing 50 mM Tris-HCl (pH 8.0), 100 mM KCl, 2.5 mM MgCl2, 1 mM DTT, 10% v/v glycerol, and 100 g/l sheared salmon sperm DNA. The FAM-labeled probe from the coding region of nasN was used as a negative control. After electrophoresis, gels were scanned directly with an ImageQuant™ LAS 4000 imaging system (GE Healthcare, UK).
The DNase I footprinting assay was carried out by Tolo Biotech., following the same procedure and conditions as described previously (68). Specifically, the labeled probe was prepared in the same way as described above in the procedure of EMSA. Then, 400 ng of the FAM-labeled probe was incubated with different amounts of recombinant GlnR in a total volume of 40 μl. After digestion and purification, the samples were loaded into an ABI 3130 sequencer and analyzed with PeakScanner software version 1.0 (Applied Biosystems).

Bioinformatics
The nucleotide sequences homologous to nasN were obtained from the NCBI nucleotide database and the RefSeq database (Bacteria and Archaea) by tBLASTn analysis (e-value below 0.01, maximum target sequences: 20,000) using the complete sequence of M. smegmatis nasN as a query. The tBLASTn result was further filtered to exclude false-positive matches, i.e. hits whose aligned Pfam domains (69) were inconsistent with the domains of M. smegmatis NasN. Each of the remaining sequences was assigned to one of three kinds of nasN homolog (CA, DF, or intact nasN) based on its annotated domains. To reveal the evolution pattern of nasN, these nucleotide sequences were deduplicated and then aligned with MAFFT version 7 (70). The generated multiple sequence alignment was used to construct a phylogenetic tree by FastTree version 2 (71). The unique taxonomic unit (taxonomy ID) of the genome containing a homologous sequence was retrieved and summarized into the taxonomic unit level. The NCBI taxonomy tree was constructed based on all retrieved taxonomy IDs using ETE 3 (72) and the taxdump database of NCBI and visualized with GraPhlAn (73).
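
The text does not say how the deduplication step was carried out. A minimal sketch, assuming that exact, case-insensitive sequence identity defines a duplicate and that the first occurrence is kept, is shown below; the function names and the three example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def deduplicate(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, sequence))
    return unique

# Hypothetical input: seq2 repeats seq1 in lower case, seq3 differs by one base.
example = """>seq1
ATGC
GGTA
>seq2
atgcggta
>seq3
ATGCGGTT
"""

unique_records = deduplicate(parse_fasta(example))
```

The deduplicated records would then be written back to a FASTA file and passed to the alignment and tree-building programs named in the text.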
The complete 16S rRNA gene sequences of 153 genome-sequenced mycobacterial species listed by the LPSN database (74) were collected from the NCBI database and trimmed to ~1400 nucleotides corresponding to positions 53-1460 of the E. coli 16S rRNA gene sequence, as described previously (58). Then, the trimmed sequences were aligned using MAFFT version 7 (70) and used to construct the phylogenetic tree by RAxML version 8 (75). The locus_tags of the 16S rRNA gene sequences used are summarized in Table S5.

Statistical analysis
All data were expressed as means ± S.D. where appropriate. Student's t test (unpaired, two-tailed) and one-way ANOVA with Tukey corrections were performed for statistical analysis by using the GraphPad Prism 5.0 program.
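
As an illustration of the summary statistics named above, the following sketch computes a mean ± S.D. and the unpaired Student's t statistic with pooled variance, using only the Python standard library. It is not the GraphPad Prism procedure used in the study, and the two groups of values are hypothetical. Converting the statistic to a two-tailed p value requires the t distribution, which the standard library does not provide, so that step is left to a statistics package.

```python
import math
import statistics

def mean_sd(values):
    """Return the mean and the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def student_t(group_a, group_b):
    """Unpaired Student's t statistic (equal variances assumed) and its df."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    t_stat = diff / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, na + nb - 2

# Hypothetical triplicate measurements (illustrative values only).
group_a = [10.0, 12.0, 14.0]
group_b = [4.0, 6.0, 8.0]

mean_a, sd_a = mean_sd(group_a)
t_stat, df = student_t(group_a, group_b)
```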