Leveraging a large microbial strain collection for natural product discovery

Throughout history, natural products have significantly contributed to the discovery of novel chemistry, drug leads, and tool molecules to probe and address complex challenges in biology and medicine. Recent microbial genome sequencing efforts have uncovered many microbial biosynthetic gene clusters without an associated natural product. This means that the natural products isolated to date do not fully reflect the biosynthetic potential of microbial strains. This observation has rejuvenated the natural product community and inspired a return to microbial strain collections. Mining large microbial strain collections with the most current technologies in genome sequencing, bioinformatics, and high-throughput screening techniques presents new opportunities in natural product discovery. In this review, we report on the newly expanded microbial strain collection at The Scripps Research Institute, which represents one of the largest and most diverse strain collections in the world. Two complementary approaches, i.e. structure-centric and function-centric, are presented here to showcase how to leverage a large microbial strain collection for natural product discovery and to address challenges and harness opportunities for future efforts. Highlighted examples include the discovery of alternative producers of known natural products with superior growth characteristics and high titers, novel analogs of privileged scaffolds, novel natural products, and new activities of known and new natural products. We anticipate that this large microbial strain collection will facilitate the discovery of new natural products for many applications.

Natural products have been exquisitely tailored via evolution to elicit potent and unique biological activities, rendering them unrivaled in structural complexity and diversity. Accordingly, natural products have served as drug leads for the pharmaceutical industry, affording antibiotics and anticancer compounds and greatly improving the quality of life for humanity. Of the FDA-approved small molecule therapeutics, 67% of anti-infective and 83% of anti-cancer drugs are natural products, natural product derivatives, or compounds inspired by natural products (1,2).
Bacteria and fungi are prolific sources of natural products. Among the half-million natural products known to date, ~70,000 are of bacterial and fungal origin, of which approximately half (~33,500) have had their bioactivities identified (3,4). Of the 70,000 total, ~30,000 are from fungi and ~40,000 have been isolated from bacteria, of which about half (~20,000) are from actinobacteria (4).
Over the last century, many microbial strains have been collected around the world for natural product discovery. The largest commercial strain collections include the American Type Culture Collection (ATCC) and the DSMZ-German collection. ATCC houses 18,000 bacterial strains from over 750 genera and over 110,000 fungal strains from over 1,500 genera (https://www.atcc.org), whereas the DSMZ houses over 24,000 bacterial and 2,571 fungal strains (https://www.dsmz.de). Strategies for prioritization have become increasingly important due to the large number of microbial strains available. Genome sequence data are now generally regarded as a very useful starting point for prioritization. Over the last 2 decades, high-throughput capabilities and lower costs of DNA sequencing have enabled genome sequencing of many microbial strains. There are currently over 1,000 fungal and 20,000 actinobacterial whole-genome sequences publicly available from the Joint Genome Institute (JGI, https://genome.jgi.doe.gov/portal) and the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov).
One of the most surprising findings from microbial genome sequencing is that only a very small fraction of the biosynthetic potential of Nature has been realized. On average, a bacterial genome harbors ~30 biosynthetic gene clusters (BGCs), potentially encoding the production of 30 natural products (5). These numbers are even higher for fungal strains. According to the Atlas of Biosynthetic Gene Clusters within the Integrated Microbial Genomes system (IMG-ABC) database (as of August 15, 2019), there are a total of 1,178,352 BGCs, of which 809,297 BGCs are from bacteria; however, only 2,487 BGCs have been associated with characterized natural products (6). Although the annotated BGCs certainly contain redundancies, the number of natural products that remain undiscovered is nonetheless immense.
This work was supported by National Institutes of Health Postdoctoral Fellowships GM133114 (to A. D. S.) and GM128345 (to C. N. T.). This is manuscript no. 29884 from The Scripps Research Institute. The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Microbial strain collection at The Scripps Research Institute
During the 1990s, many pharmaceutical companies eliminated their natural product discovery programs in pursuit of high-throughput screening (HTS) and combinatorial libraries. As a result, microbial natural product discovery shifted to small companies and academic institutes. In 2011, the Natural Product Library Initiative (NPLI) at The Scripps Research Institute (TSRI) was launched; the goal was to build a microbial strain collection and to develop technologies to construct a natural product library (NPL) consisting of crude extracts, partially purified fractions, and pure natural products, thus complementing the small molecule library at TSRI for HTS. Central to the NPLI is the microbial strain collection, including both our in-house collection and the historic pharma collection that Pfizer contributed to TSRI in 2018. More information on the TSRI collection can be found at https://www.scripps.edu/shen/NPLI/npliattsri.html. This large microbial strain collection has enabled natural product discovery efforts that led to the scaffold- and function-directed discovery of alternative producers of known natural products with superior growth characteristics and high titers, novel analogs of privileged scaffolds, novel natural products, and new activities of known and new natural products.
The newly expanded NPL builds on previous successes and will undoubtedly accelerate natural product research conducted at TSRI. The microbial strain collection at TSRI currently contains a total of 217,352 bacterial and fungal strains, of which 62,328 are actinobacteria, 14,465 other bacteria, 92,225 fungi, and 48,334 unidentified (bacteria or fungi) (Fig. 1A). These strains were isolated over the last eight decades with the majority acquired between 1940 and 2010 (Fig. 1C). The wide time range of collection allows for capture of chemical diversity based on evolution and environmental cues, which change over time and are impossible to reproduce in laboratory settings today. Geographically, these strains were isolated from 109 different countries (Fig. 1D) with factors such as climate and ecology that further increase the chemical and biological diversity. Although these statistics, as well as taxonomic data, are incomplete, the fraction of the strains whose information is available provides a glimpse into the vast spatial, temporal, and taxonomic diversity of the entire collection (Fig. 1).
The actinobacterial strains are very diverse in terms of taxonomy, geographical origin, and collection time (Fig. 1D).
The potential number of natural products awaiting discovery from the microbial strain collection at TSRI is immense. Estimated on the assumption of ~30 BGCs per strain, the strains in the collection could encode more than 6 million BGCs, i.e. the potential of producing more than 6 million natural products. In reference to the ~70,000 microbial natural products known to date, the current number of known natural products is only ~1% of this value, leaving millions of compounds to be discovered. Of course, there will be redundancies with many strains producing the same natural products or structurally similar congeners. However, these redundancies are unlikely to fundamentally reduce the total number of natural products waiting for discovery. In fact, redundancies in the forms of alternative producers or producers of congeners could allow for systematic study of gene regulation of natural product biosynthesis and production, on both the pathway-specific and host-genome wide levels. Furthermore, these redundancies can be used to exploit the evolution of natural product biosynthetic pathways in Nature for combinatorial biosynthesis.
A fraction of the strain collection has also been cultured in multiple media, under varying fermentation conditions, to capture the natural product diversity as a part of the NPL for HTS. The current NPL contains 42,612 crude extracts from 8,020 actinobacteria and 136,123 crude extracts from 12,060 fungi. Of these strains cultured, 3,025 actinobacteria and 2,498 fungi have been selected for large-scale fermentation, the crude extracts of which have been subjected to C-18 chromatography, affording 32,832 and 24,980 partially purified fractions, respectively (Fig. 1B). Again, the cultured ~20,000 strains could produce up to ~600,000 natural products (based on the ~30 BGCs per strain estimation). Even if only 10% of the natural product diversity is captured within the crude extracts and partially purified fractions, there could be ~60,000 natural products waiting to be discovered by HTS against emerging biology.
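The estimates in the two preceding paragraphs follow from simple multiplication, and the arithmetic can be reproduced as a minimal sketch. The strain counts and the ~30 BGCs per strain average are taken from the text; the variable and function names are illustrative only, and the result is an upper bound that ignores the redundancies discussed above.

```python
# Back-of-the-envelope estimate of biosynthetic potential, using only
# figures quoted in the text. The ~30 BGCs per strain average and the 10%
# capture scenario are the text's assumptions, not measured values.

BGCS_PER_STRAIN = 30           # average assumed in the text
KNOWN_MICROBIAL_NPS = 70_000   # known bacterial and fungal natural products

# Strain counts for the whole collection (Fig. 1A in the text).
collection = {
    "actinobacteria": 62_328,
    "other bacteria": 14_465,
    "fungi": 92_225,
    "unidentified": 48_334,
}

# Strains already cultured for the natural product library (NPL).
npl_cultured = {"actinobacteria": 8_020, "fungi": 12_060}


def estimated_bgcs(strain_counts, bgcs_per_strain=BGCS_PER_STRAIN):
    """Upper-bound BGC estimate: total strains times the assumed average."""
    return sum(strain_counts.values()) * bgcs_per_strain


total_strains = sum(collection.values())
collection_bgcs = estimated_bgcs(collection)
known_fraction = KNOWN_MICROBIAL_NPS / collection_bgcs
npl_bgcs = estimated_bgcs(npl_cultured)
npl_captured = npl_bgcs // 10  # the "even if only 10%" scenario

print(f"strains in collection:      {total_strains:,}")
print(f"estimated BGCs, collection: {collection_bgcs:,}")
print(f"known products / estimate:  {known_fraction:.1%}")
print(f"estimated BGCs, NPL:        {npl_bgcs:,}")
print(f"10% captured in NPL:        {npl_captured:,}")
```

With these inputs the collection-wide estimate is about 6.5 million BGCs and the NPL estimate about 600,000, in line with the rounded figures quoted in the text.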
Detailed herein are examples from our current efforts showcasing two complementary strategies to leverage the microbial strain collection at TSRI for natural product discovery. Structure-centric approaches rely on genomics and bioinformatics, whereas function-centric approaches rely on the innate biological activity of the natural products.

Structure-centric approaches
JBC REVIEWS: Leveraging strain collection for natural products
Advances in DNA sequencing and bioinformatics, along with an exponential increase in available microbial genome sequence data, have highlighted the biosynthetic potential of Nature and allow for targeted natural product discovery via genome mining. Correlation of a natural product structure to its BGC has enabled the identification of alternative producing strains for the natural product of interest, as well as the targeted discovery of natural products based on their structural features, such as scaffold or key functional groups.
Because the strain collection at TSRI has not been sequenced, we developed a real-time PCR (RT-PCR) high-throughput screening method to identify strains harboring BGCs encoding the biosynthesis of a given class of natural products (7). Representative hits are then subjected to whole-genome sequencing. This method has been successfully utilized to screen the strain collection for three families of natural products, platensimycin/platencin (PTM/PTN), enediynes, and leinamycin (LNM) (Figs. 2 and 3).
PTM and PTN are highly sought-after natural products with unique, but related, structures, as well as unprecedented activity inhibiting bacterial fatty acid biosynthesis (12,13). PTM and PTN both contain the same 3-amino-4-hydroxybenzoic acid moiety connected via an amide bond to distinct diterpenoid-derived ketolides. PTM was discovered from Streptomyces platensis MA7327, originally isolated from South Africa (8), whereas PTN was discovered from S. platensis MA7339, which was isolated from Spain (9). We subsequently identified the PTM and PTN BGCs from S. platensis MA7327 and MA7339, respectively, and discovered that the BGC encoding PTM biosynthesis in the S. platensis MA7327 strain included all necessary genes for PTN production and an additional five genes encoding a "PTM" cassette (14). The S. platensis MA7327 genome therefore contains a PTM-PTN dual BGC, encoding the proteins necessary for the biosynthesis of both PTM and PTN (10). The wild-type S. platensis MA7327 and MA7339 strains produced PTM and PTN in trace quantities (~1-4 mg/liter). Although we were successful in generating the first generation of PTM and PTN overproducers by inactivating the pathway-specific negative regulator PtmR1 or PtnR1 (11,12), the resultant mutant strains failed to sporulate and suffered from inferior growth characteristics. As a result, we were unable to further manipulate the PTM and PTN biosynthetic machinery for titer improvement and structural diversity.
Figure 1 (legend, continued). These statistics are based on 20% of the actinobacteria and 41% of fungi from the total collection, for which collection locations could be traced. E, taxonomic information for actinobacteria is as follows: Actinomadura, 4.9%; Actinoplanes, 6.4%; Microbispora, 2.6%; Micromonospora, 9.7%; Micropolyspora, 1.4%; Nocardia, 6.7%; Rhodococcus, 0.9%; Streptomyces, 55.2%; Streptosporangium, 1.1%; and the 79 other genera, 11.1%. Taxonomic information for fungi is as follows: Acremonium, 1.3%; Candida, 5.5%; Chaetomium, 1.6%; Fusarium, 4.4%; Mortierella, 1.1%; Mucor, 3.1%; Penicillium, 9.9%; Rhizopus, 1.2%; Trichoderma, 1.2%; and the 2,665 other genera, 70.7%. These statistics are based on 20% of the actinobacteria and 66% of fungi from the total collection for which some forms of the taxonomic data could be traced.
We next turned to the strain collection at TSRI and applied our RT-PCR strain prioritization strategy to search for alternative producers (10). Using primers specific for the terpene cyclase genes in PTM and PTN BGCs (i.e. ptmT4, ptmT2, ptmT1, and ptmT3), 1,911 actinobacteria were screened, affording six strains as potential PTM and PTN producers, all differing in geographical origin, morphology, and growth characteristics (Fig. 3A) (8). Representative hits were subjected to whole-genome sequencing, which confirmed that they were all PTM-PTN dual producers. The genetic amenability was confirmed by inactivation of the PtmR1 negative regulator to afford mutant strains that overproduced PTM and PTN. Although the PTM-PTN dual BGCs are highly homologous, with DNA sequence identity varying between 90 and 99%, these strains produce PTM with varying titers, with the highest titer reaching 1.6 g/liter upon fermentation optimization, which is more than a 1,000-fold improvement from the original MA7327 and MA7339 strains (11-14). Identification of alternative producers with high titers and genetic amenability allowed for detailed biosynthetic studies, as well as isolation of sufficient quantities of PTM to enable medicinal chemistry for structural diversity and further biological evaluation. This example highlights how a single genotype, i.e. a BGC, can exhibit different phenotypes depending on the genomic context.
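The prioritization logic of such a screen can be sketched in a few lines. The legend to Fig. 3A describes hits as strains containing all four cyclase genes, so the sketch below keeps only strains that test positive for every target. Only the four gene names come from the text; the strain identifiers, the example results, and the function name are invented for illustration and do not represent the actual screening data or software.

```python
# Hypothetical sketch: keep only strains whose PCR screen was positive for
# every targeted gene. Strain names and results below are invented for
# illustration; only the four gene names come from the text.

TARGET_GENES = frozenset({"ptmT1", "ptmT2", "ptmT3", "ptmT4"})


def prioritize(screen_results, targets=TARGET_GENES):
    """Return the strain IDs positive for all target genes, sorted by name."""
    return sorted(
        strain
        for strain, positive_genes in screen_results.items()
        if targets <= set(positive_genes)  # subset test: every target found
    )


# Invented example data: strain ID -> genes that amplified in the screen.
example_results = {
    "strain_A": {"ptmT1", "ptmT2", "ptmT3", "ptmT4"},
    "strain_B": {"ptmT1", "ptmT3"},
    "strain_C": set(),
    "strain_D": {"ptmT4", "ptmT2", "ptmT1", "ptmT3"},
}

hits = prioritize(example_results)
print(hits)  # ['strain_A', 'strain_D']
```

Requiring all targets rather than any one of them trades sensitivity for specificity: a strain carrying a partial or divergent cluster would be missed, but fewer false positives reach whole-genome sequencing.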
The enediyne class of natural products is known for its unique pharmacophore consisting of two alkynes in conjugation with a double bond or incipient double bond within either a 9- or 10-membered macrocycle (15,16). The 9-membered enediynes are found as a chromoprotein complex, whereas the 10-membered enediynes are discrete natural products. This privileged scaffold gives rise to extremely potent antitumor activity via DNA damage (Fig. 3B). Currently there are 13 members of the enediyne family of natural products. Comparative characterization of the known 9- and 10-membered enediyne BGCs led to the identification of a conserved set of genes implicated in the biosynthesis of the enediyne core. This set of genes, known as the enediyne polyketide synthase (PKS) gene cassette, consists of E3, E4, E5, E, and E10.
C-1027, first isolated from Streptomyces globisporus in 1993 (17), has been studied as a model for 9-membered enediyne biosynthesis and mechanism of action (18). Additionally, C-1027 is undergoing phase II clinical trials, requiring a reliable and sufficient supply of the material to support further development. Detailed study into the regulation of the C-1027 BGC in S. globisporus has enabled the construction of C-1027 overproducing recombinant strains by the combination of inactivating the negative regulator SgcR and overproducing the positive regulator SgcR1. The resultant recombinant strains produced C-1027 with a titer of 466 mg/liter, representing a 6-fold increase over the WT (74 mg/liter) (19,20).
Adopting the same RT-PCR strain prioritization strategy, we screened 3,400 actinobacteria strains from the TSRI strain collection, using degenerate primers designed to target the enediyne PKS cassette. This effort led to the identification of 81 strains harboring the enediyne PKS cassettes. Further bioinformatic analysis grouped these strains into 28 distinct clades. From these, 31 representatives were subjected to whole-genome sequencing, confirming that each of them contained a distinct enediyne BGC (21, 22).
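As a rough worked comparison, the hit rates of the two RT-PCR screens described so far can be computed from the numbers quoted in the text (6 hits from 1,911 strains for PTM/PTN, and 81 hits from 3,400 strains for the enediyne PKS cassette). This is a minimal sketch; the dictionary labels and the function name are illustrative, and the two screens used different primers and different subsets of the collection, so the rates are not directly comparable measures of abundance.

```python
# Hit rates of the two RT-PCR screens, using only numbers quoted in the text.
# Labels and function names are illustrative.

screens = {
    "PTM/PTN (terpene cyclase primers)": (1_911, 6),
    "enediyne (PKS cassette primers)": (3_400, 81),
}


def hit_rate(screened, hits):
    """Fraction of screened strains that scored as hits."""
    if screened <= 0:
        raise ValueError("number of screened strains must be positive")
    return hits / screened


for name, (screened, hits) in screens.items():
    print(f"{name}: {hits}/{screened:,} = {hit_rate(screened, hits):.2%}")
```

The two screens return roughly 0.3% and 2.4% of the strains tested, which illustrates why a collection of this size is needed to find even a handful of producers of a given scaffold.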

Figure 2. Schematic representation of how to leverage a microbial strain collection for natural product discovery by two complementary approaches.
A, structure-centric approach utilizes genomic information from the strain collection, along with bioinformatics, to prioritize privileged strains based on a unique pharmacophore or scaffold of the target natural products. B, function-centric approach utilizes biological activity, via HTS against targeted biology, to prioritize privileged strains based on unique targets or mechanism of action. Upon identification of these privileged strains, correlation of the targeted natural products or biology to specific BGCs and exploitation of these BGCs via enabling technology, such as cluster activation or heterologous expression, allow for the characterization of novel natural products, alternative producers of known natural products, and novel enzymes for combinatorial biosynthesis and biocatalysis.
One of the 28 clades contained four strains, Streptomyces sp. CB02366, CB00657, CB02329, and CB03608, together with the original C-1027 producer S. globisporus, indicating that they were alternative C-1027 producers (21,22). Close examination of the five strains within this clade showed that they were from distinct geographic locations, were morphologically different, and harbored the C-1027 BGC with high DNA sequence identity (83-99%) (23). Remarkably, under the optimized conditions for C-1027 production, developed for the engineered C-1027 overproducer from the WT S. globisporus, the four alternative producers produced C-1027 with titers up to 900 mg/liter, which is ~12-fold higher than the original S. globisporus WT (19,20). This example demonstrates once again that identical or similar BGCs can be found from distinct geographical locations. Within the varying genomic context, strains harboring similar BGCs can exhibit different phenotypes and produce the desired natural products in evolutionarily optimized high titers.
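The fold improvements quoted for C-1027 can be checked directly from the titers given in the text (74 mg/liter for the WT, 466 mg/liter for the engineered overproducer, and up to 900 mg/liter for the alternative producers). The following is a minimal sketch of that arithmetic; the function and variable names are illustrative.

```python
# Fold change in C-1027 titer relative to the S. globisporus WT, using the
# titers quoted in the text (all in mg/liter). Names are illustrative.


def fold_change(titer, reference_titer):
    """Ratio of a titer to a reference titer (same units for both)."""
    if reference_titer <= 0:
        raise ValueError("reference titer must be positive")
    return titer / reference_titer


WT_TITER = 74.0            # S. globisporus wild type
ENGINEERED_TITER = 466.0   # SgcR-inactivated, SgcR1-overproducing strain
ALTERNATIVE_TITER = 900.0  # best alternative producer, optimized conditions

print(f"engineered vs WT:  {fold_change(ENGINEERED_TITER, WT_TITER):.1f}-fold")
print(f"alternative vs WT: {fold_change(ALTERNATIVE_TITER, WT_TITER):.1f}-fold")
```

The ratios come out at about 6.3 and 12.2, consistent with the 6-fold and ~12-fold increases stated in the text.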
In addition to alternative producers of C-1027, this study also enabled the engineered production of the anthraquinone-fused enediyne family of natural products. As of 2015, there were two known anthraquinone-fused enediyne family members, dynemicin (DYN) from Micromonospora chersina (24) and uncialamycin (UCM) from Streptomyces uncialis (25). Studies of this class of enediyne natural products were hampered by the low titers from both native strains, along with genetic intractability for DYN in M. chersina and the inability to produce UCM in submerged fermentation of S. uncialis. To overcome these challenges, we set out to search for strains encoding BGCs of novel anthraquinone-fused enediynes, with the ultimate goal of developing them into platform strains. Such a platform can be utilized for the engineered production of the entire anthraquinone-fused enediyne family of natural products.
Figure 3. Examples of structure-centric approaches to leverage the microbial strain collection at TSRI for natural product discovery. A, PTM and PTN contain unique diterpenoid scaffolds, the biosynthesis of which is encoded by terpene cyclases, T1, T2, T3, and T4. An RT-PCR screen of a fraction of the actinobacterial strains in the collection prioritized strains containing all four cyclase genes, resulting in the identification of alternative PTM and PTN producers with high titers and superior growth characteristics. B, enediyne natural products contain a unique pharmacophore, consisting of two triple bonds in conjugation with a double bond or incipient double bond, the biosynthesis of which is encoded by the enediyne PKS gene cassette, E, E10, E3, E4, and E5. An RT-PCR screen of a fraction of the actinobacterial strains in the collection prioritized strains containing the enediyne PKS gene cassette, resulting in the identification of alternative C-1027 producers with high titers and new anthraquinone-fused enediynes, the TNMs. The TNM producer, with its high TNM titers and superior growth characteristics, has enabled the development of a platform strain for engineered biosynthesis and production of the anthraquinone-fused family of enediyne natural products. C, LNM contains a unique sulfur-containing heterocycle, where the sulfur incorporation chemistry is encoded by a DUF-SH didomain. An RT-PCR screen of a fraction of the actinobacterial strains in the collection prioritized strains containing the DUF-SH didomain, resulting in the identification of a family of modular biosynthetic pathways that exemplify how Nature does combinatorial biosynthesis for the LNM family of natural products.
Genome mining of the 31 sequenced representative strains identified Streptomyces sp. CB03234 as a potential anthraquinone-fused enediyne producer (21). We also mined all of the sequenced bacterial genomes publicly available at JGI and NCBI to complement our own strain collection, identifying Micromonospora yangpuensis as another potential anthraquinone-fused enediyne producer (26). Fermentation of Streptomyces sp. CB03234 afforded tiancimycin A (TNM), whereas M. yangpuensis afforded yangpumicin A (YPM) (26), two new anthraquinone-fused enediynes. Although YPM was from a genetically intractable Micromonospora, similar to DYN, the TNM-producing CB03234 strain proved to be an ideal platform strain with titers 10-300-fold higher than DYN, UCM, and YPM (27). Moreover, Streptomyces sp. CB03234 was shown to be genetically amenable. Extensive manipulation of the TNM biosynthetic machinery in Streptomyces sp. CB03234 resulted in characterization of six additional TNM family members, thus allowing for the proposal of a unified anthraquinone-fused biosynthetic pathway common to DYN, UCM, YPM, and TNM (28). Identification of Streptomyces sp. CB03234 as a novel anthraquinone-fused enediyne producer has overcome the challenges associated with other strains, such as low titer and genetic intractability, thus providing a platform strain enabling further biosynthetic and biological investigations and engineered biosynthesis.
LNM was originally isolated from Streptomyces atroolivaceus S-140 in 1989 (29,30) and is a hybrid peptide-polyketide natural product (31). LNM contains a 1,3-dioxo-1,2-dithiolane heterocycle that causes the molecule to react with DNA in a reductive environment containing thiols (32). Since its debut ~30 years ago, no LNM congeners have been discovered. We have also not been very successful in generating LNM analogs by engineering the LNM biosynthetic machinery, despite exhaustive efforts. The biosynthetic origin of the sulfur atom at C-3 within the heterocycle has been linked to a unique PKS didomain (domain of unknown function-sulfhydralase, DUF-SH) (33,34) within the LNM BGC (35). Again, adopting the same RT-PCR strain prioritization strategy, we designed PCR primers according to the DUF-SH sequence and screened ~5,000 actinobacteria from the TSRI strain collection, as well as virtually screened the ~48,780 bacterial genomes publicly available at JGI and NCBI (as of March 2017) (36). This screen led to the discovery of 49 new LNM-type BGCs, which fell into 17 distinct clades upon phylogenetic analysis, and representatives from each clade were subjected to whole-genome sequencing. Importantly, this allowed access to an alternative producer of LNM, as well as new scaffolds within the LNM family of natural products for the first time since LNM's discovery three decades ago. Upon fermentation optimization of the prioritized strains, two new members of the LNM family of natural products were isolated, guangnanmycin and weishanamycin.
Strikingly, upon close examination of the hybrid NRPS-PKS and PKS genes in each of the 17 LNM-like BGC clades, it was apparent that Nature is the ultimate combinatorial biosynthetic chemist in maximizing natural product structural diversity (36). The predicted scaffolds of the LNM family of natural products are similar but differ in structural features, such as starter and extender unit selection by the hybrid NRPS-PKS and PKS, and β-alkylation and α/β-modifications by the rest of the biosynthetic machinery. The fact that LNM-type biosynthetic machinery divergently evolved to yield distinct yet structurally related hybrid peptide-polyketides is a great inspiration to the field of combinatorial biosynthesis and synthetic biology. As more natural product scaffolds are studied within the context of large microbial strain collections, more combinatorial biosynthesis examples are likely to be discovered.

Function-centric approaches
Function-centric approaches, unlike structure-centric ones, rely on the biological function of a compound in a structure-agnostic manner. This allows for the discovery of completely novel scaffolds, whose hallmark features for their BGCs have yet to be defined, or natural products, whose bioactivity has not been linked to a single pharmacophore and/or scaffold. Enabling technologies in assay development, such as multiplexing and HTS, have allowed rapid advances in these approaches.
HTS platforms that screen for cytotoxicity have been highly successful throughout history, and these types of approaches still hold value for large microbial strain collections, such as the NPL at TSRI. When the NPL was screened against cancer cell lines, an extract from the endophytic Streptomyces sp. YIM56209 exhibited considerable cytotoxicity (37). Bioassay-guided dereplication led to the discovery of two novel bafilomycins (Fig. 4A) and nine known congeners of this natural product family. Bafilomycins exhibit a wide range of bioactivities. They hold promise as antitumor agents (38) and are important chemical probes to study V-ATPase enzymes (39).
Lymphatic filariasis, caused by the parasitic nematodes Brugia malayi and Wuchereria bancrofti, represents a worldwide health crisis. Discovery of effective macrofilaricides, drugs that can kill adult female worms, is a top priority (40). This is especially relevant due to the realization that the parasites have started to become resistant to the well-distributed standard-of-care medications ivermectin and albendazole. Structural characterization of the first parasitic asparaginyl-tRNA synthetase (AsnRS), compared with the well-characterized human AsnRS, revealed new opportunities to target parasitic AsnRS, offering an orthogonal mechanism of action to ivermectin and albendazole (41). Accordingly, extracts from the NPL were screened for AsnRS-inhibitory activity, and activity-guided dereplication led to the eventual discovery of several natural product classes that showed promising in vitro activities (42,43), including the tirandamycins (TAMs) (44). Two TAMs identified in this study were previously known (TAMs A and B), and three were new (TAMs E, F, and G, see Fig. 4B). Despite the highly similar structures of the TAMs, only TAM B exhibited potent inhibitory activity against B. malayi AsnRS. In addition, TAM B was shown to have 10-fold selectivity for B. malayi AsnRS over human AsnRS and killed the adult B. malayi parasite very efficiently. Therefore, TAM B represents a new lead scaffold to discover and develop antifilarial drugs (44).
Breakthroughs in assay development have expanded our ability to screen for many desirable mechanisms of action. Eukaryotic protein translation has emerged as a promising target for cancer chemotherapy. Development of HTS for this phenotype (45,46) enabled the screening of the NPL, leading to the discovery of Streptomyces spp. YIM56132 and YIM56141, alternative producers of actiphenol and cycloheximide (Fig. 4C) (47). Follow-up studies in the newly discovered producing strains allowed biosynthetic connections between these important molecules to be made, specifically the realization that actiphenol is a biosynthetic intermediate en route to cycloheximide (48).
High-content image-based HTS has recently become a valuable method for phenotypic screening of small molecule libraries. This type of HTS uses a high-content image readout to determine the biological activity of a compound, allowing complex phenotypes to be rapidly screened. We first subjected the pure natural product collection in our NPL to a high-content image-based screen for anti-Wolbachia activity (49,50), leading to the identification of the kirromycins as a lead scaffold (Fig. 4D) (51). Impressively, the three kirromycins identified all depleted Wolbachia in Drosophila cells in vitro with IC50 values in the nanomolar range, whereas doxycycline, a registered drug with anti-Wolbachia activity, showed lower activity with an IC50 of ~150 nM. Furthermore, the kirromycins eliminated the Wolbachia endosymbiont in Brugia pahangi ovaries ex vivo with higher efficiency (65-90%) at 1 μM than that of doxycycline (50%) (51). This suggests that kirromycin is an effective lead scaffold, further exploration of which could potentially lead to the development of novel antifilarial drugs.
Figure 4. Examples of function-centric approaches to leverage the microbial strain collection at TSRI for natural product discovery. Selected strains from the collection are fermented in diverse media to make the crude extracts, and upon HPLC analysis, the most chemically diverse conditions are selected, scaled up, and subjected to chromatography to afford the partially purified fractions. The current NPL consists of crude extracts, partially purified fractions, and pure natural products (also see Fig. 1B) and has been subjected to HTS, against emerging biology, for natural product discovery. A, NPL was screened for cytotoxicity and prolactin-initiated phosphorylation of ERK1/2. Bioassay-guided dereplication of the active hits led to the isolation of two new bafilomycin congeners, along with nine known ones. B, NPL was screened for parasitic AsnRS inhibition. Bioassay-guided dereplication of the hits identified TAMs as potent and selective AsnRS inhibitors, three of which were new congeners, and two were previously characterized. C, NPL was screened for inhibition of protein translation initiation via a high-throughput bicistronic mRNA translation assay. Bioassay-guided dereplication of the hits led to the identification of actiphenol and cycloheximide as potent inhibitors, importantly, from the same Streptomyces strain, shedding new insights into their biosynthesis. D, pure natural product collection of the NPL was subjected to a high-content image-based screening to discover inhibitors of Wolbachia via a Drosophila infection model, resulting in the identification of kirromycin as a potent and specific inhibitor of Wolbachia. E, partially purified fraction collection of the NPL was subjected to a high-content image-based screen to search for inhibitors of Cryptosporidium via a JW18 infection model, leading to the discovery of the herbicidins as a promising scaffold for anti-Cryptosporidium drug development.
The herbicidins were discovered to have selective anti-Cryptosporidium parvum activity using a similar image-based HTS approach (52). Discovery of anti-Cryptosporidium compounds has been greatly hampered by difficulties in working with this intractable parasite. Upon subjecting the partially purified fraction collection of the NPL to our newly developed image-based assay, six herbicidins were discovered, one of which was new (Fig. 4E) (53). Five of the six herbicidins showed moderate anti-Cryptosporidium activity, with herbicidin K exhibiting comparable activity to the FDA-approved drug nitazoxanide. In addition, herbicidin K showed no toxicity to human cells, highlighting the promise of the herbicidin scaffold for anti-Cryptosporidium drug development (53). This final set of examples demonstrates the value of function-centric approaches, whereby previously characterized natural products can be connected with novel bioactivity.

Conclusions and future perspectives
Natural products have been at the forefront of drug discovery, with about one-third of all FDA-approved drugs being natural products and/or derivatives thereof (2). Advances in microbial genomics and bioinformatics have shed new light on the great biosynthetic potential of Nature, rejuvenating the natural product community and paving the way for a new Golden Age of natural product discovery (54).
Currently, there are numerous challenges for natural product discovery on this path to the new Golden Age. First, consolidation of large microbial strain collections and the construction of a centralized natural product BGC data bank that the community can use efficiently need to be realized. Current resources such as antiSMASH and MIBiG have already revolutionized the field; therefore, efforts to expand these by orders of magnitude will be even more revolutionary. Subsequent prioritization of the most promising strains for targeted natural product discovery, based on new enabling technology capable of linking genes to structures to functions, is necessary. Finally, activation of "silent" BGCs at scale and speed, whether in a native or heterologous model host, and bottlenecks in natural product isolation, purification, and structural elucidation must be addressed.
These challenges can be overcome by increasing the amount of genetic information available by sequencing large strain collections, such as the microbial strain collection at TSRI. The vast amount of data acquired, ~6 million BGCs, will allow the establishment of a natural product BGC data bank with >210,000 genomes compared with the ~20,000 actinobacterial and fungal genomes currently available. This would not only enable the correlation of genes to structures to functions but would also allow for systematic study of BGCs and how they interact with their host.
Moreover, the NPL consists of extracts and fractions made from more than 20,000 actinobacterial and fungal strains, harboring great biodiversity (up to 600,000 natural products based on the 30 BGCs per strain estimation). Combined with genome mining, prioritization of these extracts based on predicted natural product structures will greatly improve the efficiency of HTS and increase hit rates, as well as streamline downstream dereplication of the active natural products. The vast potential that a large microbial strain collection may offer can be fully realized for the natural product community by increasing our knowledge of the biosynthetic potential, via sequencing and bioinformatics, and by correlating the resulting structures to their biological functions.