Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria

There is a pressing need for new therapeutics to combat multidrug- and carbapenem-resistant bacterial pathogens. This challenge prompted us to use a long short-term memory (LSTM) language model to understand the underlying grammar, i.e. the arrangement and frequencies of amino acid residues, in known antimicrobial peptide sequences. According to the output of our LSTM network, we synthesized 10 peptides and tested them against known bacterial pathogens. All of these peptides displayed broad-spectrum antimicrobial activity, validating our LSTM-based peptide design approach. Our two most effective antimicrobial peptides displayed activity against multidrug-resistant clinical isolates of Escherichia coli, Acinetobacter baumannii, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus, and coagulase-negative staphylococci strains. High activity against extended-spectrum β-lactamase, methicillin-resistant S. aureus, and carbapenem-resistant strains was also observed. Our peptides selectively interacted with and disrupted bacterial cell membranes and caused secondary gene-regulatory effects. Initial structural characterization revealed that our most effective peptide appeared to be well folded. We conclude that our LSTM-based peptide design approach appears to have correctly deciphered the underlying grammar of antimicrobial peptide sequences, as demonstrated by the experimentally observed efficacy of our designed peptides.

also been used to generate antimicrobial peptides. Despite the potential of machine-learning approaches, no computationally designed peptide has progressed beyond the early stages of experimental validation. Early computational design approaches were hindered due to the small number of antimicrobial peptides characterized and due to limitations in the algorithms used at the time. The rapid growth of antimicrobial peptide databases, and the maturation of language-based models, specifically LSTM networks, therefore provided the impetus for designing a new generation of synthetic antimicrobial peptides.
Here, antimicrobial peptide design has been cast as a computational language-modeling problem. Antimicrobial peptide sequences are treated as words in a language with a 20-letter alphabet. A long short-term memory (LSTM) model (30) is used to learn the arrangement and frequencies of amino acid residues within a peptide, which is analogous to the grammar of a language. Our model was used to generate 10 antimicrobial peptide sequences, which were synthesized and experimentally characterized; one lead (NN2_0018) displayed promising in vitro and in vivo antimicrobial properties.
The primary focus of this work is on the characterization of NN2_0018 as a lead molecule. NN2_0018 inhibits ESBL, methicillin-resistant, and carbapenem-resistant clinical isolates in vitro and also demonstrated in vivo activity against carbapenem-resistant Acinetobacter baumannii in a mouse model of peritoneal infection. Furthermore, NN2_0018 displayed no mortality, hepatotoxicity, or nephrotoxicity at therapeutic doses. Circular dichroism, nuclear magnetic resonance (NMR), scanning electron microscopy, fluorescence microscopy, and microarray gene expression experiments shed light on the structure and mechanisms of action of NN2_0018. Secondarily, our results also show that the rational design of antimicrobial peptides is possible.

Antimicrobial peptide design using LSTM language models
In this work, antimicrobial peptide design was cast as a language-modeling problem. As an analogy, antimicrobial peptides can be thought of as words in a language, created from 20 letters corresponding to the 20 natural amino acid residues. The grammar of the language model is therefore the frequency and placement of amino acid residues. Given a sequence of amino acids $x_1, x_2, \ldots, x_{i-1}$, the language model attempts to predict the probability distribution over the amino acid vocabulary for the next amino acid in the sequence, $x_i$. A probability distribution function of the form $P(x_i \mid x_{<i})$ is learned, where $x_{<i}$ refers to the sequence of residues before $x_i$ ($x_1$ to $x_{i-1}$).
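As a minimal sketch of how such a conditional distribution is used, the log-probability of a whole peptide is the sum of the log-probabilities of each residue given its prefix (the chain rule). The function `uniform_next_residue` below is a hypothetical placeholder standing in for a trained model; it is not the LSTM described in this work.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural residues


def uniform_next_residue(prefix):
    """Hypothetical stand-in for a trained model.

    Returns P(x_i | x_<i) as a dict over the 20-residue vocabulary.
    A real language model would condition on `prefix`; this one does not.
    """
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def sequence_log_prob(seq, next_residue=uniform_next_residue):
    """Chain rule: log P(x_1..x_n) = sum_i log P(x_i | x_<i)."""
    total = 0.0
    for i, aa in enumerate(seq):
        dist = next_residue(seq[:i])  # distribution given the residues so far
        total += math.log(dist[aa])
    return total
```

Swapping `uniform_next_residue` for a trained model's prediction function would leave the rest of the calculation unchanged.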
The LSTM model was trained on known antimicrobial sequences from the YADAMP (yet another database of antimicrobial peptides) database (9). As of September 2015, the YADAMP web server contained 2525 manually annotated sequences with their corresponding minimum inhibitory concentration (MIC) values. 1011 sequences (supporting Dataset S1, lstm.train) had a sequence length of ≤30 residues and were chosen as input for our LSTM algorithm. Sequences of >30 residues may form tertiary structures. Unlike a simple helical pattern, structural motifs that fold into complex patterns in three-dimensional (3D) space may not be properly captured by an algorithm optimized for deciphering sequential grammar. These sequences were therefore eliminated from our dataset. Our LSTM model generated 30,832 peptide sequences (supporting Dataset S1, lstm.sample). 17,390 sequences remained after removing sequences of >30 residues and redundant sequences using ClustalW (31).
Further filtering to select for positively charged amphiphilic peptides resulted in a dataset containing 6415 peptides (supporting Dataset S1, bilstm.out). For charge, we selected peptides possessing ≥4 positively charged residues (lysine, arginine, and histidine). For amphiphilicity, we used a simple index ($H^*$) to rapidly predict amphiphilicity for a large peptide sequence database. A peptide sequence was converted into a helical wheel projection on a two-dimensional (2D) polar coordinate plane $(r, \theta)$, with neighboring residues positioned 100° apart. Given a peptide sequence $S$ composed of residues $\{r_1, r_2, \ldots, r_N\}$, we define $C_\theta \subset S$ as the subset of residues that occur in the semicircle beginning at angle $\theta$ and extending in the anticlockwise direction. If $\mathbb{P}$ denotes the set of all polar residues, the score can be calculated using Equation 1.
Note that 0.5 and 2 are scaling terms to re-scale the $H^{*\prime}$ value from the range 0.5–1 to an $H^*$ value in the range 0–1 (where 0 denotes no amphiphilicity and 1 denotes perfect amphiphilicity). Helices possessing $H^*$ values of ≥0.33 (also containing ≥4 positively charged residues) were selected for further scoring using a Bi-LSTM (bi-directional long short-term memory) regression model.
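The filtering step can be illustrated with a short sketch. Equation 1 is not reproduced in this text, so the score below is only one plausible reading of the description: for each semicircle orientation, count the fraction of residues that are polar inside the semicircle or nonpolar outside it, take the best orientation as $H^{*\prime}$, and rescale with the 0.5 and 2 terms. The polar residue set `POLAR`, the 1° scan over orientations, and all function names are assumptions made for illustration.

```python
POLAR = set("KRHDENQSTY")  # assumed polar set; the set used in the paper is not given here
POSITIVE = set("KRH")      # lysine, arginine, histidine (as stated in the text)


def positive_count(seq):
    """Number of positively charged residues in the sequence."""
    return sum(aa in POSITIVE for aa in seq)


def h_star(seq, step_deg=100.0, n_theta=360):
    """Assumed amphiphilicity index in [0, 1] from a helical wheel projection."""
    # Consecutive residues sit 100 degrees apart on the wheel.
    angles = [(i * step_deg) % 360.0 for i in range(len(seq))]
    best = 0.0
    for k in range(n_theta):
        theta = k * 360.0 / n_theta
        correct = 0
        for aa, ang in zip(seq, angles):
            # Semicircle starting at theta, extending anticlockwise by 180 degrees.
            in_semicircle = ((ang - theta) % 360.0) < 180.0
            if in_semicircle == (aa in POLAR):
                correct += 1
        best = max(best, correct / len(seq))
    # `best` plays the role of H*' (range 0.5-1); rescale to 0-1.
    return 2.0 * (best - 0.5)


def passes_filter(seq):
    """Charge and amphiphilicity thresholds quoted in the text."""
    return positive_count(seq) >= 4 and h_star(seq) >= 0.33
```

Because opposite semicircles partition the wheel, the best orientation always places at least half of the residues correctly under this reading, which is consistent with the stated 0.5–1 range of the unscaled value.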
A regression model was trained on 501 sequences (supporting Dataset S1, blstm.train) from the YADAMP database with available MIC data against Escherichia coli. It should be noted that our algorithm was not trained on toxicity data, as databases containing a sufficient quantity of toxicity data do not yet exist. MIC data were normalized to the range 0–1, where lower values correspond to lower MICs. To generate the vector representation of a sequence, a Bi-LSTM was used. The Bi-LSTM model utilizes two LSTMs (one that operates on the peptide sequence in the forward direction and one that operates on the sequence in reverse). As in the LSTM language model, the residues are fed in one at a time. Our Bi-LSTM model ranked 6415 sequences based on their predicted antimicrobial activity (supporting Dataset S1, bilstm.out).
From these 6415 sequences, the best 10 sequences (NN2_0018 to NN2_0055 (1)) possessing the lowest Bi-LSTM scores were chosen for synthesis and experimental evaluation. These 10 sequences possessed Bi-LSTM scores ranging from 0.004135 to 0.010283. Three sequences (NN2_R0002, NN2_R0039, and NN2_R0048) possessing the poorest (highest) Bi-LSTM scores were also chosen for synthesis and experimentation, to act as negative controls. These three sequences possessed Bi-LSTM scores ranging from 0.551616 to 0.999483.
A workflow describing the algorithm's internal steps along with all inputs and outputs is provided in Fig. 1. An animation illustrating all the stages of this workflow is also provided (https://youtu.be/buMGrOprDsI). Each of the stages is elaborated further in supporting Dataset S1, documentation.pdf. Our LSTM and Bi-LSTM algorithms are implemented in the Lua programming language, relying on the Torch machine-learning library. The code for each of these stages and the entire pipeline have been uploaded to the GitHub repository (https://github.com/Tushar-N/amp-lm).

Antimicrobial susceptibility testing of designed peptides
Using the residue-level LSTM language algorithm previously described, we synthesized and experimentally characterized 10 peptides possessing the lowest Bi-LSTM scores (Table 1). These peptides were assayed for antimicrobial activity using a broth microdilution method especially designed for cationic antimicrobial peptides (32), using peptide concentrations ranging from 0.25 to 128 µg/ml. 30 cultures were chosen for antimicrobial susceptibility testing based on their diversity and clinical relevance. Gram-positive, Gram-negative, fungal, and mycobacterial organisms were tested. Most cultures were acquired from the Microbial Type Culture Collection (MTCC, Chandigarh, India). Minimum inhibitory concentrations for all peptides and all cultures are provided in Table 2 (micromolar values are provided in Table S1). Designed peptides were scored based on the number of cultures they inhibited with the lowest MIC, as compared with the MICs of all other designed peptides (Equation 2).
Here, X represents a matrix containing MIC values. M represents rows containing MIC values for a particular organism. N represents columns containing MIC values belonging to a particular peptide. Note that multiple minimum MIC values can occur along a given row. Using Equation 2, the two best performing peptides were identified as NN2_0050 and NN2_0018, with peptide scores of 15 and 10, respectively.
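The scoring rule described above can be sketched as follows. Equation 2 itself is not reproduced in this text, so this is an interpretation of the prose: a peptide earns one point for every organism (row) in which its MIC equals the row minimum, with ties awarding a point to each tied peptide. Representing "no inhibition in the tested range" as infinity, and skipping rows that no peptide inhibited, are assumptions of this sketch.

```python
import math


def peptide_scores(mic):
    """Score peptides by how often they hold the lowest MIC.

    mic: dict mapping organism -> dict mapping peptide -> MIC value.
         Use math.inf where no inhibition was observed (an assumption here).
    Returns a dict mapping peptide -> score.
    """
    scores = {}
    for row in mic.values():
        for peptide in row:
            scores.setdefault(peptide, 0)
        lowest = min(row.values())
        if math.isinf(lowest):
            continue  # no peptide inhibited this organism; nobody scores
        for peptide, value in row.items():
            if value == lowest:  # ties give a point to every tied peptide
                scores[peptide] += 1
    return scores
```

For example, with hypothetical values `{"org1": {"A": 4, "B": 8}, "org2": {"A": 16, "B": 16}}`, peptide A scores on both organisms and peptide B scores only on the tied row.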
Two sets of control experiments were performed (Table 1). First, the MIC values of four peptides (NN2_0018_shuf1, NN2_0018_shuf2, NN2_0050_shuf1, and NN2_0050_shuf2) possessing shuffled sequences were obtained for E. coli (K12 MG1655). In all cases, the MIC values for unshuffled peptides were lower than their shuffled counterparts (Table 1), indicating that the grammar of NN2_0050 and NN2_0018 is critical for their efficacy. Second, we synthesized three peptides (NN2_R0002, NN2_R0039, and NN2_R0048) possessing high Bi-LSTM scores predicted to poorly inhibit E. coli. All three peptides displayed MIC values ≥128 µg/ml against E. coli, confirming that the Bi-LSTM algorithm could differentiate between effective and ineffective sequences. The MIC values of these seven control peptides were compared with those of NN2_0018 to NN2_0055. For E. coli, the control peptides displayed significantly higher MIC values (p = 0.029, Table 1). These results indicate that the Bi-LSTM algorithm was successfully trained using E. coli MIC values.

Antimicrobial susceptibility testing of our peptides against MDR clinical isolates
Antimicrobial susceptibility testing for NN2_0050 and NN2_0018 was performed against 61 recent clinical isolates obtained from MS Ramaiah Medical College, Bangalore, India (Table S2). Most isolates obtained displayed multidrug resistance (ESBL, methicillin resistance, and carbapenem resistance). Most isolates displayed mucoid morphologies not conducive to absorbance-based growth estimation. Therefore, minimum bactericidal concentration (MBC) for all cultures was assayed using the modified Resazurin protocol, as described under "Experimental procedures." NN2_0050 displayed greater activity against Gram-negative organisms. For NN2_0050, the Gram-negative MBC50 was found to be 4 µg/ml (2 µM), and the Gram-positive MBC50 was found to be 32 µg/ml (15.91 µM) (Table 3). NN2_0018 displayed slightly better activity against Gram-positive organisms. For NN2_0018, the Gram-negative MBC50 was found to be 16 µg/ml (8.88 µM), and the Gram-positive MBC50 was found to be 8 µg/ml (4.44 µM) (Table 4). However, NN2_0018 was found to possess greater activity against Klebsiella pneumoniae strains. Interestingly, NN2_0018 inhibited all 17 Staphylococcus aureus and CoNS strains tested with an MBC50 ≤16 µg/ml (8.88 µM). All MRSA and methicillin-resistant CoNS isolates tested were also inhibited. A summary of all MBC values and conventional antibiotic resistance results for individual organisms is provided in Table S2.

In vitro and in vivo toxicity determination for designed peptides
Cytotoxicity experiments for NN2_0018 and NN2_0050 were performed on HeLa and HaCaT cells using the MTT assay. For HaCaT cells, both NN2_0018 and NN2_0050 were found to possess negligible cytotoxicity (IC50 >128 µg/ml) in the concentration range tested (Fig. 2A). For HeLa cells, NN2_0018 displayed similar characteristics (IC50 >128 µg/ml). However, NN2_0050 possessed an IC50 <64 µg/ml (31.82 µM) (Fig. 2B). These results indicate that NN2_0050 lacks specificity for prokaryotic cells and may possess cross-reactivity against other eukaryotic tissues.
Based on the encouraging in vitro toxicity results obtained for NN2_0018, in vivo toxicity experiments were performed using 6–8-week-old BALB/c mice. Both female and male mice were used to account for gender differences in peptide toxicity.

NN2_0018 concentrations ranging from 32 to 256 µg/g (17.79 to 142.28 µmol/g) mouse body weight were tested. Six mice per cohort were used for each concentration, including a vehicle control (30 mice per gender). NN2_0018 suspended in buffer (20% DMSO, 80% saline) was injected intraperitoneally, and all mice were monitored for 7 days post-injection. All mice survived for 7 days post-injection at NN2_0018 concentrations up to 64 µg/g (35.57 µmol/g) (Fig. 2, C and D). Significant mortality was observed at 256 µg/g (142.28 µmol/g), with only 33% of female and male mice surviving for 7 days. Using linear interpolation, the LD50 of NN2_0018 was calculated to be 213 µg/g (118.38 µmol/g) for females and 224 µg/g (124.50 µmol/g) for males.
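The interpolation step can be sketched as below: the LD50 is the dose at which a straight line between the two doses bracketing 50% mortality crosses 0.5. The mortality fraction at 256 µg/g (4 of 6 mice) follows from the 33% survival stated above; the mortality fractions at 128 µg/g used in the usage note are assumptions chosen for illustration, since they are not stated in this passage. The function name is hypothetical.

```python
def ld50_linear(dose_lo, mort_lo, dose_hi, mort_hi):
    """Linearly interpolate the dose giving 50% mortality.

    dose_lo, dose_hi: the two doses bracketing 50% mortality.
    mort_lo, mort_hi: mortality fractions (0-1) observed at those doses.
    """
    if not (mort_lo <= 0.5 <= mort_hi) or mort_lo == mort_hi:
        raise ValueError("doses must bracket 50% mortality")
    frac = (0.5 - mort_lo) / (mort_hi - mort_lo)
    return dose_lo + frac * (dose_hi - dose_lo)
```

Under the assumed inputs, `ld50_linear(128, 1/6, 256, 4/6)` gives roughly 213 µg/g and `ld50_linear(128, 0, 256, 4/6)` gives 224 µg/g, which is consistent with the values reported for females and males.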
Blood tests were performed to determine whether NN2_0018 displays any hepatotoxic and nephrotoxic effects at 64 µg/g (35.57 µmol/g). These tests were performed on both female and male mice to account for gender differences in peptide toxicity. Four cohorts of six mice each (female-untreated, female-treated, male-untreated, and male-treated) were prepared.

Blood, liver, and kidney samples were extracted 24 h post-treatment. Blood urea nitrogen (kidneys, Fig. 2E) and aspartate aminotransferase (liver, Fig. 2F) concentrations for untreated versus treated cohorts displayed no significant differences, indicating that NN2_0018 does not possess acute hepatotoxic or nephrotoxic effects at 64 µg/g (35.57 µmol/g) for both female and male BALB/c mice.
Histopathological examination of liver and kidney sections stained with hematoxylin and eosin confirmed these findings. Liver sections displayed no necrosis or lipid vacuolation associated with liver damage (Fig. 3). Similarly, kidney sections from all four cohorts displayed no marked injuries. Renal tubules and glomeruli appeared intact. Characteristic cast formation, tubule dilation, or cytoplasmic vacuolation associated with drug-induced kidney damage was not detected (33,34).
Survival experiments, blood tests, and histopathological tissue examination all indicate that NN2_0018 possesses no significant in vivo toxicity up to 64 µg/g (35.57 µmol/g), and therefore it has the potential for systemic use.

NN2_0018 clears carbapenem-resistant A. baumannii infections in vivo
NN2_0018 efficacy against A. baumannii (P1270) was assayed using the mouse peritoneal model of infection. Four cohorts of 6–8-week-old BALB/c mice (female) were infected through a peritoneal injection of $3.70 \times 10^{6}$ cfu of A. baumannii suspended in saline. Cohort 1 was euthanized at 0.5 h post-infection. A peritoneal cfu count was performed on this cohort to determine the pathogenic load at the start of treatment. Treatment began 0.5 h post-infection. Cohort 2 was treated with solvent (20% DMSO, 80% saline) to act as a sham control. Cohort 3 was treated with 13.33 µg/g (34.76 µmol/g) meropenem (a carbapenem-class drug) suspended in saline, corresponding to the Food and Drug Administration's (https://www.accessdata.fda.gov/drugsatfda_docs/label/2016/050706s037lbl.pdf) recommended dose for a 75-kg adult (35). Cohort 4 was treated with 64 µg/g (35.57 µmol/g) NN2_0018 in solvent (20% DMSO, 80% saline). Cohorts 2-4 were euthanized 4.5 h post-infection (4 h post-treatment), and peritoneal cfu counts were performed for all mice. This experimental setup is illustrated in Fig. 4A. Statistical analyses were performed using a one-way ANOVA. Globally, the one-way ANOVA yielded a p value of $5.83 \times 10^{-6}$, indicating that there were statistically significant differences between the means of these cohorts. To determine which cohorts possessed significantly different means, further statistical testing was performed using pairwise Tukey's HSD (honest significant difference) tests. Pairwise Tukey's HSD tests possessing p < 0.05 are reported.
These results indicate that both sham and meropenem treatment are ineffective at reducing peritoneal cfu loads of A. baumannii (P1270).
In contrast, NN2_0018 was found to significantly lower peritoneal A. baumannii (P1270) loads for Cohort 4 (Fig. 4B), in comparison with both sham-treated Cohort 2 (p = 0.0002) and meropenem-treated Cohort 3 (p = 0.0001). The mean peritoneal cfu load for NN2_0018-treated Cohort 4 was $3.53 \times 10^{6}$ cfu, in comparison with the sham-treated and meropenem-treated peritoneal cfu loads of $2.11 \times 10^{8}$ and $1.98 \times 10^{8}$ cfu, respectively. These results indicate that NN2_0018 is more effective at reducing carbapenem-resistant A. baumannii loads in vivo than both sham and meropenem treatment.
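The size of this effect can be expressed as a log10 reduction in mean cfu load relative to a comparison cohort. The helper below is an illustrative sketch, not part of the reported analysis; its name is hypothetical, and it uses only the mean loads quoted above.

```python
import math


def log10_reduction(comparison_cfu, treated_cfu):
    """log10 of the ratio between a comparison load and a treated load."""
    return math.log10(comparison_cfu / treated_cfu)


# Mean peritoneal loads quoted in the text (cfu).
vs_sham = log10_reduction(2.11e8, 3.53e6)
vs_meropenem = log10_reduction(1.98e8, 3.53e6)
```

Both ratios correspond to a reduction of a little under 2 log10 units, that is, a mean load roughly 55- to 60-fold lower in the NN2_0018-treated cohort.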

SEM visualization of peptide-induced membrane disruption
Table 4. Distribution of MBCs of NN2_0018 for clinical isolates. MBC values tested range from 0.25 to 128 µg/ml (0.14 to 71.14 µM).

Membrane disruptions were studied using scanning electron microscopy experiments. We chose E. coli (K12 MG1655) and Staphylococcus hemolyticus (MTCC 3383) as model Gram-negative and Gram-positive organisms, respectively. NN2_0050 and NN2_0018 were chosen for these experiments. Fig. 5, A-C, depicts the effect of NN2_0050 and NN2_0018 on E. coli cellular morphology.
In the absence of the antimicrobial peptides, E. coli cells display typical morphological characteristics, remaining turgid, smooth, and cylindrically shaped (Fig. 5A). The addition of NN2_0018 dramatically alters E. coli cellular morphology. E. coli cells appeared highly ridged and flattened (Fig. 5B), implying a substantial loss of cytoplasmic contents through membrane rupture.
The addition of NN2_0050 produced similar morphological changes. E. coli cells appeared flaccid and highly ridged along most of their surface (Fig. 5C). Direct evidence of membrane rupture was observed. The leakage of cellular contents from lysed cells can be observed in the top-left region of Fig. 5C.
Although S. hemolyticus was observed to be susceptible to NN2_0050 and NN2_0018, it displayed little morphological change upon peptide addition. Untreated S. hemolyticus cells display typical morphological characteristics, remaining turgid, smooth, and spherical (Fig. 5D). The addition of NN2_0018 produced no morphological changes (Fig. 5E). Similarly, the addition of NN2_0050 brought about no morphological changes (Fig. 5F). Low magnification (×10,000) SEM images for all samples are provided in Fig. S1, A-F.
We hypothesized that the thicker, peptidoglycan-rich Gram-positive cell wall prevented the observation of large-scale morphological disruptions for S. hemolyticus. Therefore, we removed the cell wall via induced protoplast formation using benzylpenicillin (36). Protoplast formation was confirmed through Gram staining (Fig. S2, A and B). Untreated S. hemolyticus protoplasts retained a smooth, spherical shape despite completely lacking a cell wall (Fig. 5G). S. hemolyticus protoplasts treated with NN2_0018 displayed membrane perforations and minor blebbing (Fig. 5H). Protoplasts treated with NN2_0050 displayed similar perforations and more prominent cell-membrane blebbing (Fig. 5I). Low magnification (×20,000) SEM images for all S. hemolyticus protoplasts are provided in Fig. S3, A-C. These observations confirm that both Gram-positive and Gram-negative cells are susceptible to membrane disruption induced by peptides NN2_0018 and NN2_0050.

Figure 2 (caption fragment). Inset, Kaplan-Meier plot for mouse survival (BALB/c, female) at 128 µg/g (71.14 µmol/g) and 256 µg/g (142.28 µmol/g) NN2_0018. D, in vivo systemic toxicity assay for NN2_0018 using a BALB/c mouse model (male). Inset, Kaplan-Meier plot for mouse survival (BALB/c, male) at 128 µg/g (71.14 µmol/g) and 256 µg/g (142.28 µmol/g) NN2_0018. Note that six mice were used for each NN2_0018 concentration tested (including a buffer-only vehicle control). E and F, blood tests performed for both male and female BALB/c mice to determine toxicity at therapeutic NN2_0018 doses. In all cases, treated cohorts were injected with 64 µg/g (35.57 µmol/g) NN2_0018 in buffer and incubated for 24 h before blood extraction. p values (in green) were calculated using the Welch two-sample t test. E, blood urea nitrogen assay. F, aspartate aminotransferase assay. For all cases, peptide concentration units are expressed as micrograms of peptide per g of mouse body weight (or micromoles of peptide per g of mouse body weight).

Peptide localization within bacterial cell membranes
Peptide localization experiments were performed by observing FITC-labeled peptides using confocal microscopy. E. coli (K12 MG1655) and S. hemolyticus (MTCC 3383) were incubated with FITC-labeled NN2_0018 and NN2_0050 and counterstained with DAPI (nucleic acid staining) and Nile red (lipid/cell membrane staining). All images were acquired using a ×63 oil immersion lens. For clarity, a representative region for all images was chosen and further magnified digitally at ×3. All original images can be found in supporting Dataset S2.
NN2_0018 was observed to colocalize with Nile red (Fig. 6A), confirming peptide localization in the cell membrane. Both E. coli and S. hemolyticus display similar colocalization characteristics. NN2_0050 was also observed to localize predominantly within the cell membrane for both E. coli and S. hemolyticus (Fig. 6B).
Confocal microscopy confirmed that NN2_0050 causes large-scale S. hemolyticus cell membrane disruption. In Fig. 6B (Nile red/FITC-peptide), S. hemolyticus cell membranes appeared shrunken and distorted, in contrast to the large, spherical membranes visualized in Fig. 6A (Nile red/FITC-peptide). SEM experiments revealed that NN2_0050 was able to penetrate the peptidoglycan layer without causing any disruptions. Both experimental approaches therefore indicate that NN2_0050 ultimately localizes in the cell membrane, causing large-scale disruptions. These disruptions remained contained within the unperturbed peptidoglycan-rich cell wall.
Pearson's correlation was used to quantify colocalization for all combined images in Fig. 6. Initially, K-means clustering was performed, partitioning image pixels into two clusters (cell and background). For the cell body, pixel-pixel intensity correlations for all combinations of stain channels were calculated using Pearson's correlation. Higher correlation values represented better stain colocalization. In all cases, Nile red/FITC-peptide displayed the highest correlation, confirming that our designed peptides colocalize within membranes (Fig. 6C).
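A minimal sketch of this two-step procedure for a single pair of channels is given below. It assumes flattened pixel intensity lists of equal length, uses a one-dimensional two-cluster K-means on the summed intensity to separate cell from background, and then computes Pearson's correlation over the cell pixels only. The choice of summed intensity as the clustering feature, and all function names, are assumptions; the original analysis may have differed in both respects.

```python
import math


def two_means_threshold(values, iters=50):
    """1-D K-means with k=2; returns the boundary between the two clusters."""
    lo, hi = min(values), max(values)  # initial centroids
    for _ in range(iters):
        mid = (lo + hi) / 2.0  # nearest-centroid assignment boundary
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def colocalization(channel_a, channel_b):
    """Pearson's r over pixels assigned to the brighter ('cell') cluster."""
    total = [a + b for a, b in zip(channel_a, channel_b)]
    threshold = two_means_threshold(total)
    cell = [(a, b) for a, b, t in zip(channel_a, channel_b, total) if t > threshold]
    xs, ys = zip(*cell)
    return pearson(xs, ys)
```

Repeating `colocalization` for each pair of stain channels would give the set of coefficients compared in the text. The sketch does not handle degenerate images, such as those with uniform intensity.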

Differential E. coli gene expression upon NN2_0018 challenge
Two replicates of a carbapenem-resistant E. coli clinical isolate (P1645ec) were challenged with NN2_0018 at half-MBC concentrations (4 µg/ml, 2.22 µM). Two control replicates grown under identical conditions but lacking NN2_0018 challenge were also prepared. RNA expression levels of all four samples were compared using an Agilent comparative genomic hybridization (CGH) microarray platform. Differentially expressed genes (DEGs) showing at least 1.319-fold ($2^{0.4}$) up-regulation or down-regulation were identified between NN2_0018-challenged and unchallenged samples. A ClueGO comprehensive enrichment analysis was performed on these DEGs to classify genes into functional groups. Our ClueGO classification resulted in a total of 74 up-regulated and 15 down-regulated genes, classified into 15 functional groups (Fig. 7A and Table S3) and seven functional groups (Fig. 7B and Table S4), respectively. Genes from these functional groups were then individually annotated based on a literature survey (Table 5).
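The fold-change cutoff can be illustrated with a short sketch. A threshold of $2^{0.4}$ on the expression ratio is equivalent to an absolute log2 fold change of at least 0.4. The function below assumes that a single summary expression value per condition is available for each gene; how replicates were combined is not described here, and the function name is hypothetical.

```python
import math

LOG2_THRESHOLD = 0.4  # 2**0.4 is approximately 1.319-fold


def classify_deg(challenged, unchallenged, threshold=LOG2_THRESHOLD):
    """Label a gene by its log2 fold change (challenged / unchallenged)."""
    log2_fc = math.log2(challenged / unchallenged)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"
```

With this rule, a gene expressed at twice the control level is labeled "up", one at half the control level is labeled "down", and a 10% increase is labeled "unchanged".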
Three genes associated with virulence factors were up-regulated: phoB, cyaA, and ihfB. phoB, part of the phosphate regulon, is required for virulence expression across multiple organisms (37). cyaA (bifunctional hemolysin/adenylate cyclase) is responsible for respiratory tract colonization (38–40). ihfB (integration host factor) is also known to regulate virulence gene expression across multiple organisms (41,42). Pathogens up-regulate virulence factor gene expression in response to stress (37), and these genes can therefore be considered as part of the nonspecific stress response.
Two genes associated with cell membrane integrity were up-regulated. ompR, a transcriptional regulator of major outer membrane protein genes, was up-regulated. Outer membrane proteins maintain lipid asymmetry in the outer membrane, serving both a structural role and preventing cellular entry of toxins (43). cfa (cyclopropane-fatty-acyl-phospholipid synthase) was also up-regulated. Cyclopropane fatty acids are known to stabilize membranes by decreasing mobility and increasing lipid bilayer packing tightness (44–47). These results indicate that ompR and cfa may be up-regulated to compensate for NN2_0018-induced membrane disruption.
Five genes associated with electron transport were up-regulated: erpA (essential respiratory protein), frdB, frdC, and frdD (fumarate reductase complex), and cydA (cytochrome bd-I ubiquinol oxidase subunit 1). erpA remains essential in the presence of oxygen or alternative electron acceptors (48). cydA is a terminal oxidase that predominates under low aeration (49). Fumarate reductase acts as a terminal electron acceptor during anaerobic respiration only, accepting electrons from complex I via naphthoquinones (50). The up-regulation of anaerobic electron transport components implies that oxygen uptake by, or electron transport to, cytochrome c oxidase has been inhibited. It is conceivable that NN2_0018 inhibits electron transport chain complexes II to IV (cytochrome c oxidase) or terminal oxygen uptake, forcing the up-regulation of anaerobic electron transport chain components.

Ten genes associated with carbohydrate degradation were up-regulated. These genes were associated with glycolysis (deoC, dmlA, gapA, and yeaD), the pentose phosphate pathway (pgl and tktB), glycogen metabolism (glgP and glgS), galactose metabolism (galT), and trehalose metabolism (otsB). The up-regulation of these genes may be a response to increasing cellular energy demands, potentially due to the energetic demands of stress responses and to compensate for decreased oxidative electron transport.

also down-regulated. The reasons for the differential expression of these eight genes, along with 19 other DEGs, are not apparent and merit further investigation. All DEGs and pathway alterations constituting the E. coli response to NN2_0018 challenge are depicted in Fig. 8.

Preliminary structural characterization of NN2_0018
Circular dichroism experiments revealed that NN2_0018 adopts a random-coil configuration in water, indicating that NN2_0018 remains disordered outside the cellular environment. However, spectra displaying α-helical characteristics (minima at 222 nm and 208 nm) were recorded in apolar solvents such as methanol, 15 mM dodecylphosphocholine (DPC) micelles, and 40% trifluoroethanol (Fig. 9A), indicating that NN2_0018 adopts an α-helical conformation upon interacting with environments mimicking the bacterial cell membrane. The 1D NMR spectrum of NN2_0018 in both deuterated DPC micelles and deuterated methanol possessed well resolved chemical shifts in the amide region (Fig. 9B), indicative of a well-folded peptide. The vast majority of α-protons in NN2_0018 resonate at chemical shifts below 4.5 ppm, indicating that the peptide adopts an α-helical conformation in both solvents. The NOESY NMR spectrum of NN2_0018 possessed several cross-peaks in the amide (7-9 ppm) region for both deuterated methanol (Fig. 9C) and DPC micelles (Fig. 9D). These cross-peaks indicate the spatial proximity of amide protons of adjacent residues (i → i + 1, i → i + 2), characteristic of α-helices, confirming that NN2_0018 adopts an α-helical structure in apolar environments. The cross-peaks are more numerous and possess greater intensities in DPC micelles, indicating that NN2_0018 displays greater helical structure in DPC. Further structural characterization of NN2_0018 is in progress.

Figure 6. Confocal microscopy experiments performed on E. coli (K12 MG1655) and S. hemolyticus (MTCC 3383) cells, using FITC-labeled peptides NN2_0018 and NN2_0050.
A, E. coli and S. hemolyticus treated with FITC-labeled NN2_0018. In both cases, NN2_0018 was found to colocalize with Nile red, indicating a strong membrane-binding preference. B, E. coli and S. hemolyticus treated with FITC-labeled NN2_0050. Again, NN2_0050 was found to colocalize with Nile red, indicating a membrane-binding preference. Here, membrane destabilization is apparent for S. hemolyticus. 8 µg/ml (4.44 µM) NN2_0018 and 8 µg/ml (4 µM) NN2_0050 were used. All images have been captured using a ×63 oil immersion objective. For clarity, all images have been digitally magnified by an additional ×3. The scale bar (gray, bottom right) represents 2 µm. C, Pearson's correlation coefficients for all stain combinations (DAPI/FITC-peptide/Nile red). Higher correlations denote better stain colocalization. In all cases, Nile red/FITC-peptide stains showed the highest correlation values.

Discussion
The emergence of MDR pathogens poses a grave public health problem. Of particular concern is the emergence of carbapenem-resistant pathogens, as such pathogens are difficult to treat and result in poor clinical outcomes. There is therefore an urgent need for new antimicrobial compounds to address proliferating drug resistance. In this work, we have implemented an LSTM model to understand and design antimicrobial peptides. Our model correctly understood the underlying grammar of antimicrobial peptide sequences, as demonstrated by the broad-spectrum antimicrobial activity of all our designed peptides. Our two best peptide designs (NN2_0050 and NN2_ 0018) were found to display activity against MDR clinical isolates, including carbapenem-resistant and methicillin-resistant organisms.
Toxicity has hindered past efforts aimed at developing systemic therapeutic peptides. For example, gramicidin S and melittin (51) possess high hemolytic activities. Encouragingly, NN2_0018 displayed minimal toxicity at bactericidal concentrations when tested in vitro against the HaCaT and HeLa cell lines and in vivo against BALB/c mice. Furthermore, NN2_0018 displayed in vivo efficacy against carbapenem-resistant A. baumannii. Our selection of carbapenem-resistant A. baumannii for in vivo testing was motivated by its Priority-1 classification (52) as a critical pathogen for the development of new drugs. Because of both efficacy against MDR clinical isolates and low in vivo toxicity, the algorithms and peptides described in this work represent a significant advancement over previous language models (28). Such models produced peptides that failed to display sufficient efficacy (only 4/40 peptides possessed MICs ≤64 µg/ml against E. coli).

Figure 7 (caption fragment). Node size corresponds to the number of mapped genes for a particular functional group. Edges represent shared genes between two functional groups. Edge thickness is proportional to the number of shared genes. Nodes containing genes that are subsets of other nodes have labels colored gray.
Figure 8 (caption fragment). Up-regulated and down-regulated genes observed directly from microarray data are colored green and red, respectively. NN2_0018 targets inferred but not directly observed from microarray data are shaded as red lines. For clarity, most individual DEGs are not shown in favor of depicting pathways/functions. Genes associated with each pathway/function can be found in Table 5. DEGs whose down-regulation could not be rationalized are depicted with gray edges.
We further investigated the mechanisms of action of our best designs, and we concluded that their antimicrobial activity is primarily due to direct membrane interaction and disruption, with secondary systemic effects. Peptide localization into membranes was observed using confocal microscopy. Peptide-induced membrane disruptions involving prominent blebbing and exudation of cellular contents were observed using SEM. Microarray gene expression analysis revealed that E. coli responds to NN2_0018 challenge through stress responses as well as pathway-specific responses. Anaerobic electron transport proteins were found to be up-regulated, implying that NN2_0018 hinders oxidative electron transport. Different antimicrobial peptides have been shown to elicit unique bacterial gene expression responses (53), therefore implying that the responses characterized in this study may be specific to NN2_0018 challenge.
NN2_0018 appeared α-helical and well-folded in a micellar environment. Structural elucidation and structure-function analyses are important future steps in the characterization of NN2_0018. In particular, the mechanisms responsible for the differential activity of NN2_0018 and NN2_0050 for Gram-negative and Gram-positive organisms deserve further investigation. The mechanism of action of antimicrobial peptides does not depend on a specific molecular target. Instead, an entire cellular component (the cell membrane) is disrupted, which makes the development of resistance against them difficult. Ultimately, our experimentally validated LSTM algorithms and peptides may help design new peptide-based antibiotics. Such antibiotics are needed to counter the ever-increasing problem of multiple drug resistance.

NN2_0018 adopts an α-helical conformation in apolar solvents (methanol, 15 mM dodecylphosphocholine micelles, and 40% trifluoroethanol). B, 1D NMR spectrum of NN2_0018 acquired in deuterated dodecylphosphocholine micelles (blue) and deuterated methanol (red). *, well-resolved chemical shift dispersion in the amide region indicates proper folding. **, α-protons appear below 4.5 ppm as well-resolved peaks, indicating α-helical structure. Chemical shift dispersion was more pronounced in dodecylphosphocholine micelles, indicating greater helical content. C, NOESY NMR spectrum of NN2_0018 acquired in deuterated methanol. D, NOESY NMR spectrum of NN2_0018 acquired in 15 mM deuterated dodecylphosphocholine micelles. Cross-peaks in the amide (7-9 ppm) region are indicative of α-helical structures. Positive contours are colored blue, and negative contours are colored orange. In all cases, 1 mM NN2_0018 was used.

Antimicrobial susceptibility assays
The MIC of a given peptide for a given organism was determined using the microwell dilution method, as described by Wiegand et al. (32) (Protocol E). This protocol was optimized for determining the MIC of cationic antimicrobial peptides. Briefly, the protocol is as follows: 2-fold dilutions of the peptide were created in a sterile 96-well polypropylene plate. Ten peptide concentrations were used, ranging from 256 to 0.5 µg/ml. Each well ultimately contained the peptide diluted in Mueller Hinton (MH) broth (Sigma: 70192-100G), as well as the culture being assayed. At this stage, before inoculation, each well contained 50 µl of peptide in Mueller Hinton broth.
Cultures to be assayed were grown in Mueller Hinton broth and incubated overnight at 37°C under shaker conditions of 180 rpm. The culture was diluted to 10^8 cfu/ml by comparing the absorbance at 600 nm with that of the McFarland 0.5 standard. A further 1:100 dilution was performed using Mueller Hinton broth, reducing the number of colony-forming units to 10^6 cfu/ml. Spread plating was used to confirm the expected colony count. Each of the 10 wells described previously was inoculated with 50 µl of this culture, resulting in a final inoculum of 5 × 10^5 cfu/ml (5 × 10^4 cfu/well). Note that the addition of 50 µl of culture simultaneously caused a 2-fold dilution of the peptide, altering the peptide concentration range to 128 to 0.25 µg/ml.
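The dilution arithmetic in this protocol can be checked with a short Python sketch. It is only an illustration of the numbers stated above (a 10-point 2-fold series, 50 µl of peptide plus 50 µl of a 10^6 cfu/ml culture); the function and variable names are our own and are not part of the published protocol.

```python
def twofold_series(top, n):
    """Return n concentrations, each half of the one before, starting at top."""
    return [top / 2 ** i for i in range(n)]

# Peptide concentrations as plated (µg/ml), before any culture is added.
plated = twofold_series(256.0, 10)

# Adding 50 µl of culture to 50 µl of peptide halves every concentration.
peptide_ul, culture_ul = 50.0, 50.0
total_ul = peptide_ul + culture_ul
final = [c * peptide_ul / total_ul for c in plated]

# Inoculum delivered by 50 µl of a 1e6 cfu/ml culture.
culture_cfu_per_ml = 1e6
cfu_per_well = culture_cfu_per_ml * culture_ul / 1000.0
cfu_per_ml_in_well = cfu_per_well * 1000.0 / total_ul

print(plated[0], plated[-1])            # highest and lowest plated concentration
print(final[0], final[-1])              # range after inoculation
print(cfu_per_well, cfu_per_ml_in_well)
```

The same inoculum can therefore be quoted per well or per millilitre, depending on which volume is taken as the reference.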
Two control experiments were performed: a growth control was created by inoculating 5 × 10^5 cfu/ml of the culture in 100 µl of Mueller Hinton broth. A sterility control was created containing 100 µl of Mueller Hinton broth in the absence of peptide or culture. These plates were covered with other sterile polypropylene plates acting as lids to prevent contamination. These plates were incubated at 37°C for 24 h. Growth was determined by measuring the absorbance at 600 nm for each well. The MIC for a given peptide and a given organism was the first peptide concentration that completely inhibited growth (reading along a peptide concentration range of 128 to 0.25 µg/ml). Some organisms displayed mucoid or plaque morphologies, which made the estimation of growth through absorbance inaccurate. For such organisms, protocol E was modified to include resazurin (54, 55). Resazurin is a weakly fluorescent dye that is irreversibly reduced to fluorescent resorufin in proportion to aerobic respiration. Using this modified protocol, cultures in 96-well polypropylene plates were incubated at 37°C for 12 h. 30 µl of a 0.02% (w/v) aqueous resazurin solution was then pipetted into each well. Further incubation was performed at 37°C for 12 h. Growth was estimated based on fluorescence measurements (excitation, 530 nm, and emission, 590 nm, reported as arbitrary fluorescence units). The percentage growth in each well was estimated based on Equation 3, and wells containing ≤5% growth were considered to display peptide bactericidal activity.
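As a minimal sketch of how an MIC could be read from one row of absorbance values, the Python function below returns the lowest concentration at which no growth is seen, provided all higher concentrations also show no growth. The blank-corrected cutoff of 0.05 absorbance units and the example readings are hypothetical; the text does not state the threshold that was used.

```python
def read_mic(concentrations, a600, blank, cutoff=0.05):
    """Return the lowest concentration with no growth, scanning from the top.

    A well counts as 'no growth' when its blank-corrected A600 is at or below
    `cutoff` (a hypothetical threshold). Returns None if even the highest
    concentration shows growth.
    """
    wells = sorted(zip(concentrations, a600), reverse=True)
    mic = None
    for conc, reading in wells:
        if reading - blank > cutoff:
            break          # first well with growth: stop scanning
        mic = conc         # still inhibited at this concentration
    return mic

# Hypothetical readings for the 128 to 0.25 µg/ml range.
concs = [128, 64, 32, 16, 8, 4, 2, 1, 0.5, 0.25]
reads = [0.05, 0.05, 0.06, 0.05, 0.07, 0.41, 0.62, 0.70, 0.71, 0.73]
print(read_mic(concs, reads, blank=0.05))
```

Scanning from the highest concentration downward keeps an isolated clear well below the growth boundary from being reported as the MIC.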

Cell culture and cytotoxicity assay
HeLa and HaCaT cells were grown in Dulbecco's modified Eagle's medium. The medium was supplemented with 10% fetal bovine serum, penicillin, streptomycin, and gentamicin. Cells were grown in serum-containing growth media until they reached 80-90% confluence. These cells were later used for the cytotoxicity assay.
The cytotoxicity of our peptides was evaluated using the MTT assay. Approximately 1 × 10^4 cells per well were seeded into polystyrene 96-well plates with 200 µl of medium. These plates were incubated at 37°C for 12 h (5% CO2), after which they were exposed to various concentrations of peptides and incubated at 37°C for 24 h (5% CO2). MTT was added to each well at a final concentration of 0.5 mg/ml. The plates were then incubated at 37°C for 4 h (5% CO2). After the supernatant was aspirated, 150 µl of dimethyl sulfoxide (DMSO) was added to each well and incubated at 37°C for 10 min (5% CO2). Absorbance measurements were performed at 570 nm using the Multi-Mode Microplate Reader (Biotek). Results were reported in the form of percentage growth, which was the growth of peptide-treated cells relative to untreated cells cultured under identical conditions. Five replicates were performed for all peptide concentrations, from which the mean percentage growth and standard deviation were calculated.
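The percentage-growth summary can be illustrated with the following Python sketch. It assumes that each treated replicate is divided by the mean absorbance of the untreated wells and that the sample standard deviation is reported; neither detail, nor any blank subtraction, is specified in the text, and the absorbance values are invented.

```python
from statistics import mean, stdev

def percent_growth(treated_a570, untreated_a570):
    """Mean and sample SD of growth (%) relative to untreated cells."""
    control = mean(untreated_a570)
    values = [100.0 * a / control for a in treated_a570]
    return mean(values), stdev(values)

# Hypothetical A570 readings for five replicates at one peptide concentration.
treated = [0.8, 0.9, 1.0, 1.1, 1.2]
untreated = [1.0, 1.0, 1.0, 1.0, 1.0]
avg, sd = percent_growth(treated, untreated)
print(round(avg, 1), round(sd, 1))
```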

Peptide in vivo toxicity experiments using a mouse model
BALB/c mice (male and female), 6-8 weeks old and weighing ~20 g, were used as in vivo models to determine peptide toxicity. Toxicity was determined by injecting peptide suspended in buffer (20% DMSO, 80% saline) intraperitoneally and monitoring all mice for 7 days while recording all deaths. Vehicle controls consisting of buffer-only injections were also performed. All mice were euthanized via ketamine overdose at the end of the experiment. LD50 values were then calculated using linear interpolation.
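One way to obtain an LD50 by linear interpolation is sketched below in Python. It assumes interpolation on a linear dose scale between the two tested doses that bracket 50% mortality; the text does not say whether a linear or logarithmic dose axis was used, and the doses and mortality fractions here are hypothetical.

```python
def ld50_linear(doses, mortality):
    """Interpolate the dose giving 50% mortality between bracketing doses.

    `mortality` holds the fraction of animals that died at each dose.
    Returns None when no pair of adjacent doses brackets 50%.
    """
    points = sorted(zip(doses, mortality))
    for (d0, m0), (d1, m1) in zip(points, points[1:]):
        if m0 == 0.5:
            return d0
        if m0 < 0.5 <= m1:
            return d0 + (0.5 - m0) * (d1 - d0) / (m1 - m0)
    return None

# Hypothetical dose groups and observed mortality fractions.
doses = [50, 100, 200, 400]
mortality = [0.0, 0.25, 0.75, 1.0]
print(ld50_linear(doses, mortality))
```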
Blood tests (blood urea nitrogen and aspartate aminotransferase) and histopathological tests (hematoxylin-eosin staining of liver and kidney sections) were performed to determine the hepatotoxic and nephrotoxic properties of our peptides at therapeutic doses. Four cohorts of mice were used for these tests. Cohort 1 consisted of untreated BALB/c mice (female). Cohort 2 consisted of BALB/c mice (female) treated with a single dose of 64 µg/g (35.57 nmol/g) NN2_0018 in buffer. Cohort 3 consisted of untreated BALB/c mice (male). Cohort 4 consisted of BALB/c mice (male) treated with a single dose of 64 µg/g (35.57 nmol/g) NN2_0018 in buffer. All mice were maintained for 24 h post-injection and then anesthetized using a terminal dose of ketamine. Blood was extracted immediately via cardiac puncture, whereas liver and kidney tissue samples were extracted post mortem.
All mice were housed in the Central Animal Facility, IISc, with feed and water provided ad libitum. All animal experiments described in this work were approved by the Institutional Animal Ethics Committee, IISc (Project No. CAF/Ethics/550/2017).

Peptide in vivo efficacy experiments using a mouse peritoneal model of infection
Experiments studying the peritoneal cfu clearance abilities of NN2_0018 were performed using 6-8-week-old BALB/c mice (female) weighing ~20 g infected with A. baumannii (P1270). A glycerol stock of A. baumannii (P1270) stored at −80°C was thawed and inoculated into 10 ml of MH broth with 8 µg/ml (20.86 µM) meropenem to preserve the carbapenem-resistant phenotype. This culture was incubated at 37°C for 24 h. This culture was diluted to 1.5 × 10^8 cfu/ml in saline using a McFarland 0.5 standard and was further diluted in saline to a final concentration of 1.85 × 10^7 cfu/ml. cfu counts were retrospectively confirmed by plating and colony counting. 200 µl of this suspension (3.7 × 10^6 cfu) was injected intraperitoneally into four cohorts of BALB/c mice containing eight mice per cohort. A description of the experiments performed on each cohort is provided in Fig. 4.
Mice were euthanized using a CO2 overdose. Peritoneal washes were performed by injecting 5 ml of chilled saline into the peritoneum and gently massaging and extracting the peritoneal fluid. Serial dilutions in saline and plating on Mueller-Hinton agar containing 8 µg/ml (20.86 µM) meropenem were performed immediately. Colony counting was then performed to calculate peritoneal cfu loads.
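The inoculum size and the back-calculation of peritoneal cfu loads from colony counts can be illustrated with a short Python sketch. The injected volume, suspension density, and 5-ml wash volume come from the text; the plated volume, dilution factor, and colony count in the example are hypothetical, as is the assumption that the load is scaled to the whole wash volume.

```python
def inoculum_cfu(cfu_per_ml, volume_ul):
    """Total cfu delivered in `volume_ul` of a suspension at `cfu_per_ml`."""
    return cfu_per_ml * volume_ul / 1000.0

def peritoneal_load(colonies, dilution_factor, plated_ml, wash_ml=5.0):
    """Estimate total cfu recovered in the peritoneal wash from one plate.

    colonies / plated_ml gives cfu/ml of the diluted sample; multiplying by
    the dilution factor gives cfu/ml of wash fluid, scaled to the wash volume.
    """
    cfu_per_ml = colonies * dilution_factor / plated_ml
    return cfu_per_ml * wash_ml

# Inoculum stated in the text: 200 µl of a 1.85e7 cfu/ml suspension.
print(inoculum_cfu(1.85e7, 200))

# Hypothetical plate: 42 colonies from 0.1 ml of a 1:1000 dilution.
print(peritoneal_load(42, 1000, 0.1))
```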
For the duration of all experiments, mice were housed in the Central Animal Facility (CAF, IISc), and they were provided with pellet feed and water ad libitum.

SEM experiments
A 1-ml bacterial culture was incubated overnight at 37°C/180 rpm in Mueller Hinton broth and then centrifuged at 6000 rpm for 10 min. The pellet was resuspended in sterile phosphate-buffered saline (PBS), and the A600 was adjusted to 0.3-0.4. This resuspension was divided into two 500-µl aliquots (test/control). Peptide was added to the test aliquot at a final concentration of 128 µg/ml. The control aliquot did not contain any peptide. Both aliquots were incubated at 37°C for 2 h/180 rpm, then centrifuged at 6000 rpm for 10 min, and resuspended in 25 µl of PBS. 10 µl of each resuspension was pipetted onto a clean glass coverslip and air dried for 1 h. Air-dried samples were immersed in a 2.5% (w/v) glutaraldehyde solution made in PBS and incubated for 24 h under ambient conditions. Postincubation, these samples were washed three times with distilled water to remove traces of glutaraldehyde. Samples were immersed in 30, 50, 75, 85, 95, and 100% alcohol/water gradients for 3 min each for dehydration. The sample was dried in a hot-air oven at 70°C for 4 h. A 10-nm gold coating was applied to the sample (attached to an aluminum stub) using the Quorum Q150R ES sputter coater. SEM experiments were performed using the Carl Zeiss Ultra 55 field emission scanning electron microscope (FESEM, mono). Samples were analyzed using an extra-high tension voltage of 5 kV and using magnifications ranging from ×10,000 to ×50,000.

Generation of bacterial protoplasts
S. hemolyticus protoplasts were generated using a standard protocol (36). Briefly, S. hemolyticus was inoculated into 10 ml of Mueller Hinton broth and incubated overnight at 37°C/180 rpm (culture A). 3 ml of this culture was directly inoculated into 10 ml of Mueller Hinton broth containing 5% sucrose, 0.1% MgSO4, and 100 units/ml benzylpenicillin. This culture was incubated at 37°C for 2 h/180 rpm (culture B). Protoplast generation was confirmed by Gram staining culture A and culture B. Untreated culture A was expected to stain Gram-positive. Treated culture B, lacking a cell wall, was expected to stain Gram-negative.

Confocal microscopy experiments
channels, pixel-intensity values were compared using Pearson's correlation. Higher correlation values were considered to represent better stain colocalization. Python scripts for image analysis are provided in Dataset S2.
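For readers without access to Dataset S2, the idea of scoring colocalization by Pearson's correlation of pixel intensities can be sketched as follows. This is an independent, minimal illustration and not the authors' script; the function names and the tiny 2 × 2 example images are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant channel has no defined correlation")
    return sxy / sqrt(sxx * syy)

def colocalization(channel_a, channel_b):
    """Correlate two single-channel images given as nested lists of pixels."""
    flat_a = [p for row in channel_a for p in row]
    flat_b = [p for row in channel_b for p in row]
    return pearson(flat_a, flat_b)

# Hypothetical pixel intensities for a FITC-peptide and a Nile red channel.
fitc = [[10, 200], [30, 180]]
nile_red = [[12, 190], [35, 170]]
print(round(colocalization(fitc, nile_red), 3))
```

A value near 1 indicates that bright pixels in one channel coincide with bright pixels in the other, which is the sense in which higher correlation denotes better colocalization.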

Preparation of samples for microarray analysis
A carbapenem-resistant clinical E. coli isolate (P1645ec) was chosen for differential gene expression experiments upon exposure to peptide NN2_0018. Two replicates of E. coli (P1645ec) were incubated at 37°C for 24 h in 10 ml of MH media, supplemented with NN2_0018 at half-MBC concentrations (4 µg/ml, 2.22 µM). Two control replicates of E. coli (P1645ec) incubated at 37°C for 24 h in 10 ml of MH media, but without NN2_0018, were also grown. All four samples were also supplemented with 8 µg/ml (20.86 µM) meropenem to maintain their carbapenem-resistant phenotype. After incubation, all four samples were centrifuged at 6000 rpm for 10 min, and the pellets were flash-frozen using liquid nitrogen. RNA extraction, quality control, and E. coli microarray mRNA hybridization experiments using an Agilent CGH platform were performed by Genotypic, India, using an E. coli 8 × 15,000 array.

Functional gene set enrichment analysis
Microarray data preprocessing was performed in R, using the limma package available through Bioconductor. The median signal and background intensities were extracted using the read.maimages() function (56). Signal intensities were background-adjusted using the normexp() function (56). Background-adjusted signals were then log2-transformed and quantile-normalized to make the intensities consistent across each array (56).
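The two final preprocessing steps (log2 transformation followed by quantile normalization) can be illustrated with a small Python sketch. The authors worked in R with limma; the code below is not limma's implementation, breaks ties naively by position, and uses made-up intensities.

```python
from math import log2

def log2_transform(arrays):
    """Apply log2 to every background-adjusted intensity."""
    return [[log2(v) for v in array] for array in arrays]

def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    `arrays` is a list of equal-length lists, one list per microarray.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])   # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for three probes on two arrays.
raw = [[32, 8, 16], [4, 64, 2]]
print(quantile_normalize(log2_transform(raw)))
```

After normalization every array contains the same set of values, so differences between arrays reflect probe rank rather than overall intensity.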
Differential analysis was performed for NN2_0018-treated E. coli (P1645ec) with respect to an untreated E. coli control. Genes with a 1.319 (2^0.4)-fold change (up/down-regulation) and with a false discovery rate (FDR)-corrected p value calculated using the Benjamini and Hochberg method (57) less than or equal to 0.05 were considered as significantly differentially expressed genes (DEGs). This analysis revealed a significant up-regulation of 145 genes and a significant down-regulation of 26 genes. These DEGs were considered for downstream functional enrichment analysis through classification into functional groups using ClueGO comprehensive enrichment analysis. ClueGO 2.2.5 (58) is a Cytoscape 3.2 plugin. ClueGO attempts to classify genes into different functional groups. For example, genes occurring in the same metabolic pathways, the same subcellular locations, or acting upon the same substrate/product would be classified under the same functional group. ClueGO functional groups are composed of 3023 biological processes, 280 cellular components, 2591 molecular functions, and 105 pathway resources derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (59). ClueGO further integrates a list of identified gene ontology (GO) terms and pathways and organizes them into functionally grouped networks. These networks depict the biological relationship between the pathways and gene ontologies. ClueGO possessed sufficient information for 111 genes of the 145 up-regulated genes previously described. Further pruning to remove redundancy resulted in a list of 79 genes. Similarly, ClueGO possessed sufficient information for 20 genes of the 26 down-regulated genes previously described. Further pruning to remove redundancy resulted in a list of 14 genes.
To infer statistically significant functional groups, we used two-sided (enrichment/depletion) hypergeometric distribution tests, with an FDR-corrected p value ≤0.05 using Bonferroni adjustment for the terms and the groups created by ClueGO. To reduce redundancy in the GO term categories, the fusion option was applied with the score set to 0.4. Further redundancy was manually identified and corrected.
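The DEG criteria (a fold change of at least 2^0.4 in either direction and a Benjamini-Hochberg-adjusted p value ≤0.05) can be sketched in Python as below. The analysis itself was run in R with limma; this code only illustrates the thresholding, treats the fold-change cutoff as inclusive (an assumption), and uses hypothetical gene names and values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def call_degs(genes, log2_fc, pvalues, lfc_cutoff=0.4, fdr_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs."""
    adj = benjamini_hochberg(pvalues)
    up = [g for g, l, q in zip(genes, log2_fc, adj)
          if q <= fdr_cutoff and l >= lfc_cutoff]
    down = [g for g, l, q in zip(genes, log2_fc, adj)
            if q <= fdr_cutoff and l <= -lfc_cutoff]
    return up, down

# Hypothetical genes with log2 fold changes and raw p values.
genes = ["geneA", "geneB", "geneC", "geneD"]
lfc = [1.2, -0.9, 0.1, -0.2]
pvals = [0.001, 0.002, 0.0005, 0.9]
print(call_degs(genes, lfc, pvals))
```

A log2 fold change of 0.4 corresponds to the 1.319-fold threshold quoted in the text, since 2^0.4 ≈ 1.3195.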
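The statistical idea behind the enrichment step can be sketched in Python: a hypergeometric test asks how surprising it is to find k genes from a functional group among n DEGs, given K group members in a background of N genes. The sketch reports the enrichment and depletion tails separately and applies a plain Bonferroni adjustment; ClueGO's exact two-sided definition and its background gene set are not described in the text, so all numbers below are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n genes from N, of which K belong to the group."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def tail_pvalues(k, N, K, n):
    """Return (enrichment, depletion) p values: P(X >= k) and P(X <= k)."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    if not lo <= k <= hi:
        raise ValueError("k is outside the possible range of overlaps")
    enrichment = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    depletion = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return enrichment, depletion

def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical case: 8 of 79 DEGs fall in a 40-gene group, background of 4000.
enrich, deplete = tail_pvalues(8, 4000, 40, 79)
print(enrich, deplete)
print(bonferroni([enrich, 0.5, 0.03]))
```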

Circular dichroism experiments
The Jasco J-810 spectrophotometer was used to perform all circular dichroism (CD) experiments. Samples were loaded into a quartz cuvette with a sample volume of 300 µl and a path length of 1 mm. Far-ultraviolet spectra were collected at a wavelength range of 200-250 nm. All spectra were collected at a 3-nm bandwidth and a 4-s response time. All spectra were collected three times, averaged, and corrected for buffer spectrum.

NMR experiments
NMR spectra were collected on the Agilent 600-MHz spectrometer using a triple-resonance z-gradient cryogenic probe. DPC (D38) micelles were used as a membrane-mimicking cosolvent. 1 mM peptide was suspended in excess DPC (15 mM DPC in 90% H2O/10% D2O; 15:1 DPC/peptide ratio) for all experiments. The 1D 1H NMR spectrum was acquired with 16,384 complex points and 64 scans. The 1H,1H-NOESY (nuclear Overhauser effect spectroscopy) spectrum was acquired with 4096 complex points in the directly acquired dimension, 1024 complex points in the indirectly acquired dimension, and 32 scans. The spectra were processed with the program NMRPipe (60).

Table 5). M. M. designed, performed, and analyzed HeLa cell line toxicity experiments (Fig. 2B). D. C. coordinated the study, planned experiments, and provided resources. N. C. coordinated the study, planned experiments, and provided resources. All authors reviewed the results and approved the final version of the manuscript.