The unfoldase ClpC1 of Mycobacterium tuberculosis regulates the expression of a distinct subset of proteins having intrinsically disordered termini

The human pathogen Mycobacterium tuberculosis (Mtb) harbors a well-orchestrated Clp (caseinolytic protease) proteolytic machinery consisting of two oligomeric segments: a barrel-shaped heterotetradecameric protease core comprising the ClpP1 and ClpP2 subunits, and hexameric ring-like ATP-dependent unfoldases composed of ClpX or ClpC1. The roles of the ClpP1P2 protease subunits are well-established in Mtb, but the potential roles of the associated unfoldases, such as ClpC1, remain elusive. Using a CRISPR interference-mediated gene silencing approach, here we demonstrate that clpC1 is indispensable for the extracellular growth of Mtb and for its survival in macrophages. The results from isobaric tags for relative and absolute quantitation (iTRAQ)-based quantitative proteomic experiments with clpC1- and clpP2-depleted Mtb cells suggested that the ClpC1P1P2 complex critically maintains the homeostasis of various growth-essential proteins in Mtb, several of which contain intrinsically disordered regions at their termini. We show that the Clp

Stringent control of protein expression in a pathogen under regular growth conditions, as well as in response to extracellular stimuli during infection, is an important determinant of its virulence (1). Proteostasis, the process of maintaining homeostasis of cellular proteins, relies primarily on the synthesis, folding, trafficking, and degradation of proteins (2). Most intracellular proteins constantly undergo degradation and resynthesis. The overall rates of synthesis and degradation determine the lifetime of a protein in the cell, which varies from a few minutes to days (3). In addition, although certain proteins can be overexpressed by multiple orders of magnitude, many proteins are dosage-sensitive, and their overexpression beyond a threshold is detrimental for bacterial growth and must be kept in check by the respective protease machinery (4).
Proteolysis is an essential quality control system of the cell that maintains the level of cellular proteins through various proteases. Although general proteolysis removes misfolded or damaged proteins, regulated proteolysis operates under specific signals. In the cell, proteostasis requires the concerted activity of ATPases associated with diverse cellular activities (AAA+) and proteases. Typically, the multisubunit protease complex consists of the proteolytic component and regulatory ATPases. The ATPases chiefly perform the maintenance function (5), whereas proteases act as final executioners in the life cycle of proteins (6). Known cases of such compartmentalized systems, such as Clp (caseinolytic protease), Lon protease, or the 20S proteasome, reveal that the executory nature of proteases is governed by their fundamental architecture, a multisubunit chambered structure with a central pore, whereas associated ATPases such as ClpX, ClpA, ClpC1, or the 19S proteasome, commonly known as unfoldases, recognize, unfold, and translocate the prospective substrate proteins into the proteolytic chamber (7). Hence, studying these diverse proteins will be pivotal for understanding the overall protein homeostasis in a cell.
Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis in humans, ranks among the most dreaded and successful human pathogens, silently infecting a large share of the world's population (8). Recent estimates of the absolute concentrations of the annotated proteins in Mtb reveal that 90% are scarcely expressed within a range of 2 orders of magnitude (9). These observations indicate that a large fraction of the mycobacterial proteome is tightly regulated. Complex evolution and years of adaptation have empowered Mtb with a unique proteostasis network. Although mycobacteria are endowed with a range of chaperones such as GroEL1, GroEL2, DnaK, DnaJ, GrpE, and ClpB that help proteins retain their native conformation, the presence of Clp, FtsH, and proteasome-like machinery ensures the removal of erroneous or terminally exhausted proteins. Although FtsH, a membrane protease, and the Pup-proteasome system play crucial roles in mycobacterial survival during various stresses and infection, respectively, the Clp machinery is the standalone cytosolic protease driving proteolysis under normal physiological conditions (10). In Mtb, the Clp protease consists of two proteolytic subunits, ClpP1 and ClpP2, and an unfoldase, ClpX (class II) or ClpC1 (class I; orthologous to ClpA), which together constitute the multisubunit chambered protein complex (11). ClpP1 and ClpP2 are known to form a functional heterocomplex of 14 subunits, with 7 subunits from each protein, whereas the Clp ATPases constitute a homohexameric chamber that attaches specifically to the ClpP2 face of the ClpP1P2 tetradecameric complex (12). Functionally, Clp ATPases are thought to recognize and associate with the putative substrate, unfold it, and subsequently translocate it into the ClpP1P2 proteolytic chamber (13). A whole-cell proteomic study with a conditional clpP1P2 knockdown strain has established the essential nature of these proteins in Mtb physiology and intracellular survival (14). Unfortunately, no such characterization has yet been attempted to delineate the function of the associated unfoldases in the pathophysiology of Mtb.
Although the transposon-based TRASH screen predicts the clpC1 gene to be indispensable (15), the fundamental cause of its essentiality in Mtb is not understood. Moreover, its homologue is functionally redundant in other bacteria such as Escherichia coli (16). Hence, it is critical to experimentally determine the requirement of the ClpC1P1P2 machinery to better characterize the role of ClpC1 in mycobacteria. Using the CRISPR interference (CRISPRi) approach of gene silencing (17), here we establish that, akin to clpP1 and clpP2, clpC1 is essential across different mycobacterial species, including the pathogenic Mtb H37Rv, the nonpathogenic Mtb H37Ra, and the fast-growing soil bacterium Mycobacterium smegmatis (Msm), during extracellular growth as well as in macrophages. Global protein expression profiling by quantitative proteomics indicates that the ClpC1P1P2 protein complex acts on a distinct set of proteins that are involved in a plethora of essential cellular functions, including central metabolism and cell wall biosynthesis. Although comparative sequence analysis identifies known signatures of the Clp machinery in a small subset of candidate proteins, ~85% of these sequences are enriched with disorder-promoting residues at either the N or the C terminus, suggesting a possible role for terminal disordered regions in their recognition by the Clp protease. Finally, we show that the small heat shock protein Hsp20, which is highly accumulated in the clpC1(-) and clpP2(-) strains of Mtb, is a dosage-sensitive protein, and that its expression is maintained in Mtb exclusively by the ClpC1P1P2 proteolytic machinery through recognition of its disordered C-terminal region.

ClpC1 unfoldase is essential for Mtb proliferation
To understand the requirement of ClpC1, we constructed CRISPRi-based knockdown strains of Mtb H37Rv, conditionally depleted for clpC1 (annotated as clpC1(-)) in an anhydrotetracycline (ATc)-dependent manner, essentially as described earlier (17). As a positive control, clpP2 was also targeted using the same approach (annotated as clpP2(-)). The ATc-dependent nature of silencing enabled us to fine-tune the dosage of target genes with minimal intervention. Treatment of the respective knockdown strains with 50 ng/ml ATc for 7 days resulted in 70-80% suppression of clpP2 and clpC1, respectively, compared with their levels in the empty vector control (Fig. 1, A and B). Next, we analyzed the growth profile of the clpC1(-) strain and compared it with that of the clpP2(-) strain by determining A600 as well as bacterial colony-forming units (CFU) over a period of 2 weeks. As presented in Fig. 1C, cultures of the control strain containing the empty vector grow to full turbidity and reach an A600 of 3.5 by 14 days, whereas bacteria depleted of either of these genes were unable to grow beyond an A600 of 0.30. Furthermore, CFU estimation indicates that the clpC1(-) and clpP2(-) strains proliferate in the synthetic culture medium, albeit at a much slower rate compared with the empty-plasmid control (Fig. 1, D and E). Similar results were obtained with suppression of orthologous genes in the slow-growing avirulent strain Mtb H37Ra (Fig. S1A) or in fast-growing soil-borne Msm (Fig. S1B). We thus conclude that the ClpC1 unfoldase is required for bacterial replication across different mycobacterial species, irrespective of their growth rate or virulence.

ClpC1 is vital for the intracellular growth of Mtb
The extreme dependence of Mtb on the ClpC1 protein for its extracellular growth demonstrates a critical requirement of ClpC1 in the management of core metabolic processes. To further examine the requirement of ClpC1 for intracellular growth of Mtb, THP1 human monocyte-derived macrophages were infected with mid-log cultures of control and clpC1(-) strains at an MOI of 1:2 for 4 h. Subsequently, growth of mycobacteria was monitored by CFU plating of macrophage lysates at different intervals. As shown in Fig. 2A, the knockdown strain exhibits poor growth in THP1 macrophages compared with the control bacteria. Although the control Mtb cells show a substantial increase in bacterial counts on days 5 and 7 post-infection (3.92, 4.03, 4.03, 4.80, and 5.25 log10 CFU/well on 0, 1, 3, 5, and 7 days after infection, respectively), growth of clpC1(-) within macrophages is essentially stalled (3.79, 3.88, 4.09, 4.13, and 4.24 log10 CFU/well on 0, 1, 3, 5, and 7 days after infection, respectively). A similar level of attenuation in intracellular growth was observed with the clpP2(-) strain of Mtb H37Rv. Infection of THP1 with bacteria depleted of clpP2 resulted in complete cessation of growth because bacterial CFU did not increase by more than 5-8% of the initial CFU counts of 3.87 log10 CFU/well on day 0 (Fig. 2B). Based on these results, we conclude that ClpC1 and ClpP1P2 are indispensable for intracellular survival of Mtb in host macrophages.
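The log10 CFU values quoted above can be converted into net fold changes to make the difference between the strains explicit. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the study, and only the numbers are taken from the text.

```python
# log10 CFU/well on days 0, 1, 3, 5 and 7 after infection, as quoted in the text.
LOG10_CFU = {
    "control": [3.92, 4.03, 4.03, 4.80, 5.25],
    "clpC1(-)": [3.79, 3.88, 4.09, 4.13, 4.24],
}


def net_fold_change(log10_counts):
    """Fold change in CFU between the first and the last time point."""
    return 10 ** (log10_counts[-1] - log10_counts[0])


for strain, counts in LOG10_CFU.items():
    print(f"{strain}: {net_fold_change(counts):.1f}-fold over 7 days")
```

On these figures the control expands roughly 21-fold over 7 days, whereas the clpC1(-) strain stays below 3-fold.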

Demystifying the functional essentiality of ClpC1 in Mtb
In general, any compartmentalized degradation machinery works by combining the activity of a protease with that of an assisting unfoldase for effective proteolysis (11). The fundamental role of the associated ATPase is to provide specificity by recognizing, linearizing, and translocating the corresponding substrate to the protease chamber. The physiological essentiality of the ClpC1 ATPase may be determined by the kind of substrates that depend on this protein for their appropriate regulation. Therefore, it becomes essential to define the prospective substrates of the ClpC1P1P2 machinery to fully decipher the workings of the Clp complex in mycobacteria.
Possible substrates of the ClpXP and ClpAP machinery have been identified in E. coli using a ClpP trap mutant (18). However, a similar approach did not work with Mtb because the clpP1 and clpP2 genes are essential for growth and loss of clpP2 could not be complemented by expressing the S110A trap mutant (data not shown). Alternatively, we followed an approach wherein all proteins that accumulate in the clpC1 knockdown strain were identified. We speculate that candidate proteins that also show increased abundance in the separate clpP2(-) strain may correspond to putative substrates of ClpC1P1P2. To achieve this objective, a global proteomic approach of iTRAQ-based LC-MS was implemented, as described under "Experimental procedures". As summarized in Fig. 3A, whole-cell lysates were prepared after 4 days of ATc treatment from control, clpC1(-), and clpP2(-) strains, and equal amounts of protein from each of these strains were subjected to tryptic digestion and labeling with four-plex isobaric tags followed by LC-MS. The time point was carefully chosen so that there is no major growth defect among strains. The iTRAQ data were obtained from two biological replicates with quantitation of ~45% of total Mtb proteins based on at least two peptides in both sets (Fig. 3B and Data Set S1). Further analysis reveals that 317 proteins in clpC1(-) and 359 proteins in clpP2(-) strains of Mtb H37Rv exhibit ≥1.5-fold accumulation in both biological replicates. Importantly, 69% of the proteins accumulated in clpC1(-) (n = 219) are common to clpP2(-) (Fig. 3C). Apart from these, several proteins exhibit down-regulation upon depletion of clp genes; we anticipate that these are indirectly regulated at the level of either transcription or translation (Data Set S1).
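The selection logic described above (≥1.5-fold accumulation in both biological replicates, followed by intersection of the two knockdown strains) can be sketched as follows. This is an illustrative outline only, not the analysis pipeline used in the study: the function name, the data layout, and all protein IDs and ratios are hypothetical.

```python
def accumulated(replicates, threshold=1.5):
    """Return proteins whose knockdown/control ratio is >= threshold in every replicate.

    `replicates` is a non-empty list of dicts mapping protein ID -> ratio.
    A protein missing from any replicate is excluded.
    """
    hits = [
        {protein for protein, ratio in rep.items() if ratio >= threshold}
        for rep in replicates
    ]
    return set.intersection(*hits)


# Hypothetical ratios for illustration only; not data from the study.
clpc1_reps = [
    {"protA": 18.0, "protB": 1.8, "protC": 1.6, "protD": 0.7},
    {"protA": 19.0, "protB": 1.6, "protC": 1.2, "protD": 0.8},
]
clpp2_reps = [
    {"protA": 40.0, "protB": 2.0, "protC": 1.7, "protE": 1.9},
    {"protA": 42.0, "protB": 1.5, "protC": 1.9, "protE": 1.6},
]

clpc1_hits = accumulated(clpc1_reps)
clpp2_hits = accumulated(clpp2_reps)
candidates = clpc1_hits & clpp2_hits  # putative ClpC1P1P2 substrates
print(sorted(candidates))

# Consistency check on the counts reported in the text: 219 of 317 proteins.
print(f"{219 / 317:.0%}")
```

Requiring the threshold in every replicate, rather than on the replicate mean, is the stricter reading of "in both the biological replicates" and is the one adopted in this sketch.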
Furthermore, we observed that 37% of the proteins accumulated in both clpC1(-) and clpP2(-) strains are involved in intermediary metabolism and respiration, 24% regulate lipid metabolism and cell wall and cell processes, whereas 27% are of unknown function (Table 1 and Fig. 3D). Remarkably, 38% of the proteins (n = 82) that are modulated in the above knockdown strains are essential for in vitro growth of Mtb (Table 1). Our results show that proteins mainly involved in energy production (AtpG, AtpA, AtpH, AtpD, Tkt, Ndh, Pgi, Gap, AceE, Pyk, PckA, GltA2, Gdh, Rv0248c, NadB, HemL, DlaT, LpdC, MenB, and EttA), amino acid metabolism (ThrC, ThrS, TrpS, GcvB, AspC, AlaS, HisC1, LysA, BkdB, IlvB1, GlnE, SerC, PheA, and GlyA1), protein homeostasis and transport (Rv3780, PrcB, Ffh, SecA1, SecA2, DnaJ1, DnaJ2, GrpE, ClpP1, and ClpB), the Esx secretion system (EspE, EccA3, EccC3, EccD3, EspG3, and EccC5), cell wall biosynthesis (FabG1, GlgB, GlgE, InhA, AccD5, AccD6, and FtsH), transcription (SigA, CarD, SigE, MoxR1, and RbpA), DNA synthesis (PolA, NrdE, and GuaB2), and redox balance (TrxB2, SahH, and Mrp) are under regulation by the Clp machinery, and their perturbation might account for the apparent lethality of the respective knockdown strains (Table 1).

Figure 3. … H37Rv depleted with clpC1 and clpP2. A, schematic of the workflow for iTRAQ analysis. Lysates were prepared from respective knockdown strains after 4 days of depletion with 50 ng/ml ATc, which were subsequently labeled with four-plex iTRAQ labels followed by LC-MS analysis, as described in the text. B and C, approximately 50% of total Mtb proteins (n = 1827) were common in the two different iTRAQ experiments (B), which comprise 219 proteins showing differential abundance (by ≥1.5-fold) after depletion of both clpC1 and clpP2 (C). D, functional categorization of ClpC1P1P2-regulated proteins in Mtb. Shown is the percentage distribution of various accumulated substrates under different functional categories. Functional classification of proteins accumulated in clpC1(-) and clpP2(-) strains was performed using the Mycobrowser database (RRID:SCR_018242).
Table 1
List of proteins differentially accumulated in Mtb H37Rv in response to depletion of clpP2 and clpC1
Functional characterization of Mtb-ClpC1 unfoldase

ClpC1-regulated proteins exhibit disordered ends
In bacteria such as E. coli, a signature motif LAA is identified by the ClpC1 homologue ClpA at either the N or the C terminus of the putative substrate (19). In addition, the ClpC1 orthologue ClpA follows the N-end rule by recognizing four N-terminal residues (Tyr, Phe, Trp, and Leu) that serve as primary N-end degrons. The ClpS adaptor binds these residues and delivers attached substrates to the AAA+ ClpAP protease for degradation (20). Importantly, the ClpA unfoldase also recognizes untagged proteins such as casein, which is inherently unstructured (21), or those that are chemically denatured (22). A careful analysis of each of the 219 protein sequences for the presence of known motifs within the 15-aa region at either the N or the C terminus reveals that 28 proteins responding to clpC1 and clpP2 knockdown exhibit the presence of N-end degrons; these include three proteins, namely End, LppZ, and Rv3722c, which also contain the characteristic LAA motif at the N terminus, and an RNA polymerase-binding transcription factor, CarD, containing LAA at the C terminus. In addition, eight proteins exhibit the LAA sequence exclusively at the C terminus (Table 2). Remarkably, we fail to identify such signature sequences in a large proportion (n = 183) of ClpC1P1P2-regulated proteins.
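A terminal-window scan of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used in the study: the function name is hypothetical, the example sequence is made up, and reading the N-end degron at position 1 (or position 2 when the initiator Met is still present) is an assumption, since the text does not specify how degron residues were positioned.

```python
N_DEGRON_RESIDUES = set("YFWL")  # Tyr, Phe, Trp, Leu, as listed in the text
WINDOW = 15                      # terminal region length used in the text


def scan_termini(sequence, motif="LAA", window=WINDOW):
    """Flag the LAA motif in either terminal window and an N-end degron residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_window, c_window = seq[:window], seq[-window:]
    # Assumption: the degron is read at position 1, or at position 2 when the
    # sequence still carries its initiator Met.
    first = seq[1] if seq[0] == "M" and len(seq) > 1 else seq[0]
    return {
        "motif_n_term": motif in n_window,
        "motif_c_term": motif in c_window,
        "n_degron": first in N_DEGRON_RESIDUES,
    }


# Made-up sequence, for illustration only.
toy = "MLKT" + "G" * 20 + "SPELAAKR"
print(scan_termini(toy))
```

For the toy sequence the scan reports an LAA motif in the C-terminal window only, plus a Leu directly after the initiator Met.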
Because the presence of disorder-promoting residues (DPRs) (Pro, Arg, Gly, Gln, Ser, Glu, Lys, and Ala) contributes to proteolysis (23, 24), we further examined whether ClpC1P1P2-dependent proteins exhibit altered abundance of DPRs. Analysis of the 219 proteins accumulated in both clpC1(-) and clpP2(-) strains reveals that 186 proteins are indeed enriched with ≥50% DPRs in the terminal 15-aa region. Moreover, 127 of these contain ≤20% order-promoting residues (ORs) (Cys, Trp, Tyr, Phe, Ile, Leu, Val, and Asn) at the respective ends, indicating a disordered conformation of their terminal regions (Fig. 4A and Data Set S2). It is noteworthy that 22 proteins are enriched with DPRs at both termini (Fig. 4B). Overall, analysis of ClpC1-dependent proteins suggests that the disordered conformation might be critical for ClpC1P1P2-mediated proteolysis.
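The residue-composition criterion can be expressed as a short calculation over the terminal 15-aa windows. The sketch below uses the DPR and OR residue sets listed above; the function names are hypothetical, the example peptides are made up, and treating the OR cutoff as "at most 20%" is an assumption about how the 20% figure is applied.

```python
DPR = set("PRGQSEKA")  # disorder-promoting: Pro, Arg, Gly, Gln, Ser, Glu, Lys, Ala
OPR = set("CWYFILVN")  # order-promoting: Cys, Trp, Tyr, Phe, Ile, Leu, Val, Asn


def residue_fraction(segment, residues):
    """Fraction of positions in `segment` occupied by any of `residues`."""
    return sum(aa in residues for aa in segment) / len(segment)


def is_disordered_terminus(segment, min_dpr=0.5, max_opr=0.2):
    """Apply the >=50% DPR and (assumed) <=20% OR cutoffs to one terminal window."""
    return (
        residue_fraction(segment, DPR) >= min_dpr
        and residue_fraction(segment, OPR) <= max_opr
    )


def classify_termini(sequence, window=15):
    """Evaluate the N- and C-terminal windows of a non-empty protein sequence."""
    seq = sequence.upper()
    return {
        "N": is_disordered_terminus(seq[:window]),
        "C": is_disordered_terminus(seq[-window:]),
    }


# Made-up 15-mers: 11 DPRs + 2 ORs + 2 neutral residues, and an all-OR peptide.
print(classify_termini("PRGQSEKAPRGCWDT"))
print(classify_termini("CWYFILVNCWYFILV"))
```

The first peptide mirrors the composition reported for the Hsp20 C terminus (11 of 15 disorder-promoting and 2 order-promoting residues), which passes both cutoffs.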

Involvement of ClpC1P1P2 in post-transcriptional regulation of a small heat shock protein, Hsp20
Subsequent to the identification of ClpC1-regulated proteins by quantitative MS, we shortlisted one of the most abundant proteins in the clpC1(-) and clpP2(-) strains, the small heat shock protein Hsp20, for further validation. Hsp20 is a unique protein because it shows altered expression in response to a variety of stresses (25); however, its regulation is not well understood. Our quantitative proteomics results suggest that Hsp20 is primarily dependent on ClpC1P1P2 for its post-transcriptional regulation. To examine whether Hsp20 is indeed regulated at the post-transcriptional level, we first determined the change in the expression levels of hsp20 transcripts and compared it with Hsp20 protein levels in the respective clpC1(-) and clpP2(-) strains of Mtb H37Rv. We found that suppression of the clpP2 gene results in minor induction of hsp20 transcripts by 3.5 ± 0.35-fold, as determined by quantitative RT-PCR, whereas no change in hsp20 expression was observed in clpC1(-) (Fig. 5A). This is in sharp contrast with the levels of Hsp20 protein, which exhibits significant up-regulation in both the clpP2(-) (41.5 ± 5.4-fold) and clpC1(-) (18.4 ± 5.2-fold) strains, as estimated by iTRAQ LC-MS (Fig. 5A). To further validate these results, we performed anti-Hsp20 immunoblotting of whole-cell lysates prepared from control, clpC1(-), and clpP2(-) strains of Mtb H37Rv. Surprisingly, the Hsp20 level was below the detection limit in the control, whereas intense signals with anti-Hsp20 antibodies were obtained with both clpP2(-) and clpC1(-) lysates (Fig. 5B). Notably, expression of an unrelated protein, YidC, remained unaltered in all three strains (Fig. 5B), thus ruling out a difference in sample loading; equal loading was also verified by staining of the immunoblot with Ponceau S dye (Fig. 5B). Taken together, these observations corroborate the MS results and indicate that expression of Hsp20 is strictly under the control of ClpP1P2 and ClpC1 at the post-transcriptional level.
Table 2
Analysis of ClpC1-specific signature motifs in proteins accumulated in response to depletion of clpP2 and clpC1
Signature motifs recognized by ClpC1 are used to find similar sequences in the N- and the C-terminal 15-aa regions of proteins accumulated in the respective knockdown strains of Mtb H37Rv. The positions of the respective motifs in the candidate proteins are shown in bold type.
Columns: S. No.; Accession; Rv no.; Gene ID; Description; In vitro essentiality; clpP2(-)/control; clpC1(-)

Hsp20 is a dosage-sensitive protein which is regulated by ClpC1P1P2 via its free C-terminal sequence
Small heat shock proteins (sHsp) are generally the foremost responders to multiple stresses and hence are stringently regulated (25). Hsp20, also annotated as Acr-2 (α-crystallin-2) in Mtb, is a heat shock-induced small chaperonin protein with a molecular mass of 18 kDa harboring a well-conserved and characteristic α-crystallin domain. It is highly up-regulated transcriptionally via various two-component regulators under different stress conditions (26). To understand the underlying mechanism of regulation of Hsp20 protein by ClpC1P1P2, we first sought to determine whether accessory proteins such as ClpS, which acts as a ClpC1 adapter (27), or SmpB, which is involved in ssrA tagging, are directly involved. Expression of clpS and smpB was depleted in cells using the CRISPRi approach, and the level of Hsp20 was determined by anti-Hsp20 immunoblotting in the control, clpS(-), and smpB(-) strains. Fig. S2 shows that despite >80% depletion of these genes, expression of Hsp20 remained unaltered, which rules out the involvement of ClpS or SmpB in regulation of Hsp20. The recent crystal structure of the Hsp20 homologue from M. marinum shows the polydisperse nature of this protein, with a dodecameric structure made up of multiple dimer subunits in vitro. Likewise, nano-electrospray MS and cryo-EM studies reveal that Hsp20 of Mtb forms a wide variety of homooligomeric complexes made up of dimeric and tetrameric building blocks (28). There is evidence suggesting that multimeric structures can act as holdases by creating different binding scaffolds (29). In line with these observations, we also found a higher oligomeric state of the purified Mtb Hsp20 protein by size-exclusion chromatography (Fig. 5C). Notably, multiple conformations of purified Hsp20 were also visible on a Coomassie Brilliant Blue-stained denaturing polyacrylamide gel, as well as by anti-Hsp20 immunoblotting (Fig. 5C). Based on the available crystal structure of the M. marinum protein, we modeled Mtb Hsp20, which suggests a characteristic solvent-exposed floppy C terminus in Hsp20 dodecamers (Fig. 5D, inset) akin to numerous Acr types of sHsp proteins (30). Moreover, 11 of 15 amino acids at the C terminus of Hsp20 are disorder-promoting in nature, and only 2 amino acids belong to the order-promoting category (Data Set S2).
Because the C terminus of sHsp also contributes to the formation of higher oligomers (28), we argue that although purified Hsp20 is inherently very stable, in vivo it is readily degraded. Because the degradation of multimeric Hsp20 would require disassembly followed by unfolding, we hypothesized that ClpC1, being the unfoldase, is responsible for the recognition, unfolding, and degradation of Hsp20 via its unstructured C terminus. To test this hypothesis, we first overexpressed hsp20 transcripts harboring different tags at the 5′- and 3′-ends, thus shielding the N- and the C-terminal sequences of the protein, respectively, and tested its expression by immunoblotting. Remarkably, in the WT bacteria we did not find expression of N-ter cMyc-tagged Hsp20 having a free C terminus, despite a very high level of induction of its transcripts (Fig. 5E). In contrast, intense signals of cMyc-Hsp20 expression were obtained in cells lacking ClpP2 or ClpC1 (Fig. S3). Further, when the C terminus of Hsp20 was blocked by the addition of either a cMyc or a FLAG tag or by linking with GFP, expression of the recombinant protein became visible even in the WT cells (Fig. 5F and Fig. S4), thus indicating the involvement of the free C terminus in regulation of Hsp20 expression.
Figure 4. … clpC1(-) and clpP2(-) strains. The 219 proteins differentially accumulated in clpC1(-) and clpP2(-) strains were further categorized into 186 proteins enriched with ≥50% DPRs at the terminal 15-aa region. Moreover, 127 of these contain ≤20% ORs, indicating the possibility of an unstructured region at the termini. B, list of proteins accumulated in clpC1(-) and clpP2(-) strains that exhibit ≥50% intrinsically disordered region and ≤20% OR at both the N and the C termini.
Figure 5. Dosage sensitivity of Hsp20 is mediated by ClpC1P1P2 via recognition of its C terminus. A, expression analysis of hsp20. Comparative analysis of hsp20 transcripts and its protein levels in the individual knockdown strains of Mtb H37Rv reveals relatively significant accumulation of Hsp20 protein in the clpC1(-) and clpP2(-) strains compared with its expression in the control strain. Transcripts were quantitated by real-time PCR, whereas protein levels were obtained from iTRAQ-based LC-MS studies. B, expression analysis of Hsp20 by immunoblotting. Assessment of Hsp20 expression levels in control and different knockdown strains of Mtb H37Rv by anti-Hsp20 immunoblotting further corroborates its post-transcriptional regulation by ClpC1 and ClpP2 (middle). The upper portion of the blot was cut and probed with anti-YidC, which served as control (top); levels of YidC remain constant, indicating equal loading of samples. Comparable loading of samples was also confirmed by Ponceau S staining of the blot (bottom). C, conformational analysis of purified Hsp20. Evaluation of purified Hsp20 on a Coomassie Brilliant Blue-stained denaturing polyacrylamide gel shows multiple protein bands migrating at higher molecular masses, indicating polydisperse conformation, which was also confirmed by size-exclusion chromatography, as well as by anti-Hsp20 immunoblotting of different fractions. D, homology modeling of Mtb Hsp20. Homology modeling by SWISS-MODEL reveals a dodecameric conformation with a distinct solvent-exposed floppy region at the C terminus (see inset). E and F, in vivo expression analysis of Hsp20. Despite showing significant overexpression of mRNA transcripts following ATc treatment, N-ter cMyc-tagged Hsp20 (cMyc-Hsp20) with a free C terminus fails to show expression by anti-cMyc immunoblotting in WT M. smegmatis (E). In contrast, its derivative with a C-ter cMyc tag (Hsp20-cMyc) exhibits remarkable overexpression upon incubation with ATc (F). The upper portions of the same blots in E and F were probed with anti-PknB antibody as control for validating equal loading of samples. The values in graphs represent the means ± S.D. from multiple experiments. The p value was determined by Student's t test.

The disordered C-terminal end of Hsp20 is critical for interaction with ClpC1 and subsequent dosage sensitivity
Based on the above findings, we further assessed whether the C-terminal disordered region of Hsp20 is required for its recognition by the ClpC1 unfoldase. Ten amino acid residues were removed from the extreme C terminus of Hsp20, and expression of the truncated protein was examined in WT bacteria harboring a functional Clp protease complex. As shown in Fig. 6A, although full-length Hsp20 was not detectably expressed, the C-terminal truncation restored its expression in the WT bacteria to a level similar to that in cells depleted of clpP2 (Fig. 6A). To examine the underlying cause of the differential expression of full-length and truncated Hsp20, we sought to analyze whether Hsp20 directly interacts with ClpC1 and whether its C terminus is required for this association. To this end, the interaction of WT Hsp20 and its -10 derivative (lacking the C-terminal 10 aa) with ClpC1 was examined by biolayer interferometry (BLI), as described under "Experimental procedures". Importantly, the C-ter truncated derivative of Hsp20 exhibits an equal ratio of dodecamer and dimer conformations (Fig. S5). Purified ClpC1 of Mtb was immobilized to saturation (2.1 nm) on an amine-reactive second generation biosensor, which was then incubated with buffer containing different concentrations of Hsp20 proteins varying from 0.25 to 2.0 µM. As shown in Fig. 6B, full-length Hsp20 binds ClpC1 in a dose-dependent manner, with an affinity constant (KD) of 1.5 × 10⁻⁷ M. Notably, removal of the C-terminal 10 aa from Hsp20 resulted in a complete loss of binding, with both dimer (Fig. 6C) and dodecamer (Fig. 6D) conformations, even at relatively higher concentrations.
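To put the reported affinity in context, the expected equilibrium occupancy of immobilized ClpC1 can be estimated at the two ends of the analyte range. The sketch below assumes a simple 1:1 binding model, which the text does not state, and takes the analyte range as 0.25 to 2.0 µM (an assumption about units); the function name is illustrative.

```python
KD_M = 1.5e-7  # affinity constant reported for full-length Hsp20 and ClpC1


def fraction_bound(conc_m, kd_m=KD_M):
    """Equilibrium occupancy for a 1:1 binding model: C / (C + KD)."""
    if conc_m < 0 or kd_m <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return conc_m / (conc_m + kd_m)


# End points of the analyte range, taken here as micromolar (assumption).
for conc_um in (0.25, 2.0):
    occupancy = fraction_bound(conc_um * 1e-6)
    print(f"{conc_um} uM Hsp20 -> {occupancy:.1%} of ClpC1 sites occupied")
```

Under these assumptions the tested concentrations lie above the KD, so the absence of any signal for the truncated protein across the same range is consistent with a loss of binding rather than a modest drop in affinity.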

Discussion
The grandeur of any cellular system lies in its ability to control the flow of genetic information. Phenotypic representation of such control is evident from the expression pattern of various dosage-sensitive proteins (31). Therefore, cellular machineries that maintain protein stoichiometry by regular calibration become essential for growth. In mycobacteria, the Clp machinery plays a leading role in maintaining the overall protein burden (14) and therefore must be tightly regulated. Remarkably, the core protease subunits are assisted by associated unfoldases; however, the role of these unfoldases in Mtb has not previously been investigated. In this study we aimed at decoding the putative nonredundant role of the AAA+ ClpP1P2 partner ClpC1.
Figure 6. Role of C-terminal region of Hsp20 on its regulation by ClpC1. A, expression analysis of full-length and truncated Hsp20. Although full-length protein is expressed only in the clpP2(-) strain and not in the WT bacteria, deletion of the terminal 10 aa from the C terminus restores expression of cMyc-Hsp20 in the WT cells similar to its level in the clpP2(-) strain. B-D, association of Hsp20 with ClpC1 requires C-terminal sequence. Kinetics of ClpC1 association with full-length dodecamer (B) and C-terminal truncated dimer (C) or dodecamer (D) Hsp20 by BLI-Octet indicates the requirement of C-terminal sequence in the recognition of Hsp20 by ClpC1. The results are representative of three independent experiments.
Growth analysis of control and knockdown strains reveals that cells depleted of clpC1 and clpP2 are unable to proliferate, both in the synthetic culture medium and inside the host macrophages, thus implicating the utmost requirement of the ClpC1P1P2 machinery in maintaining the homeostasis of proteins that are critical for mycobacterial pathophysiology. To further identify proteins that are regulated by ClpC1 in Mtb, we followed a whole-cell proteomic approach. Because perturbation of clpC1 would affect many proteins that might be indirectly regulated, we anticipate that a majority of those proteins, if not all, showing accumulation simultaneously in both the clpC1(-) and clpP2(-) strains might be considered specific substrates of the ClpC1P1P2 machinery. Such proteins that are proteolyzed inside the cell via recognition, unfolding, and degradation by ClpC1 must bear the necessary degron. A careful analysis, however, identifies the characteristic ClpC1 signatures only in a minor fraction of proteins (Table 2). There is emerging evidence suggesting that the extent of disorder is an important determinant of a protein's half-life (t1/2) (32, 33). Protein unfolding is critical for translocation of the Clp substrates into the proteolytic chamber (34). Importantly, many proteins such as casein or antitoxins that are considered specific substrates of ClpC1 exhibit a prevalence of intrinsically disordered regions (35). It is noteworthy that casein, which is specifically recognized by ClpC1, does not contain signature motifs such as an Ala-rich sequence or any of the four N-end degrons. Rather, structurally it is an intrinsically unstructured protein (21) having a strong tendency to form globe-shaped surfactant-like micelles because of self-association in an aqueous environment (36). Intrigued by these observations, we speculate that apart from showing sequence specificity, the ClpC1 unfoldase can readily identify proteins that are unstructured. The ability of ClpA, and not of ClpX, to recognize and facilitate the degradation of nonsubstrate proteins such as GFP or rhodanese that are chemically denatured (22) further supports the above hypothesis. In our study, ~58% of proteins accumulated in the clpC1(-) and clpP2(-) strains of Mtb exhibit a preponderance of disordered residues at the termini (Data Set S2), whereas only 13% of proteins exhibit N-end degrons and 5% contain Ala-rich sequences (Table 2).
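The percentages quoted above follow from the protein counts given in the Results, as the short check below shows. The variable and function names are illustrative, and the count of 12 LAA-containing proteins (3 + 1 + 8) is a reading of the Results text, not a figure stated explicitly by the source.

```python
TOTAL = 219  # proteins accumulated in both knockdown strains

COUNTS = {
    "DPR-enriched terminus (>=50% DPR)": 186,
    "DPR-enriched with low OR content": 127,
    "N-end degron": 28,
    # Assumption: 3 (N-terminal LAA) + 1 (CarD) + 8 (C-terminal only).
    "LAA motif": 3 + 1 + 8,
}


def percent(count, total=TOTAL):
    """Share of the candidate set, rounded to the nearest whole percent."""
    return round(100 * count / total)


for label, count in COUNTS.items():
    print(f"{label}: {count}/{TOTAL} = {percent(count)}%")

# Proteins carrying neither an N-end degron nor an exclusively C-terminal LAA.
print(TOTAL - 28 - 8)
```

The counts reproduce the ~85%, ~58%, 13%, and 5% figures as well as the 183 proteins without a known signature.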
Hsp20 is one such example that does not bear any canonical sequence for recognition by ClpC1 unfoldase but is rapidly degraded by the ClpC1P1P2 machinery in vivo. Additionally, it lacks an N-end degron, and therefore depletion of either smpB or clpS had no effect on its expression (Fig. S2). Indeed, our study shows that the unstructured region at the free C terminus serves as a signature for recognition of Hsp20 by ClpC1 and its subsequent degradation by Clp protease (Figs. 5 and 6). Importantly, the C-terminal region of Hsp20 is crucial for the formation of higher oligomers. Displacement of the unstructured region to the inside by incorporating additional sequences at the extreme C terminus protects the Hsp20 protein from degradation, possibly because of the loss of its association with ClpC1.
Various AAA+ proteins, including a homologue of ClpA, the ClpB disaggregase, recognize unfolded or aggregated proteins by specifically binding to exposed hydrophobic residues and prevent the accumulation of misfolded proteins that are otherwise toxic for the cell (37). There are multiple reports on the transcriptional regulation of hsp20; however, overexpression of hsp20 in our study suggests a rapid degradation of its translated product in the WT mycobacteria. Remarkably, Hsp20 exhibits increased expression in response to bedaquiline treatment (38), as well as upon exposure to heat and detergent (26), suggesting its conditional requirement in mycobacteria. The structure of Hsp20 shows that the C-terminal amino acids are solvent-exposed despite the presence of a highly compact multimeric structure formed by the rest of the polypeptide chain (Fig. 5D). By virtue of their characteristic structure with multiple substrate-binding sites, Hsp20-like proteins act as holdases. However, the holdase-like activity is irrelevant during steady-state growth and is essentially required only under adverse conditions to help proteins retain their conformation and function (29). Holdases act by acquiring a characteristic architecture that promotes sequence-nonspecific association with morphed proteins and prevents their irreversible aggregation (39). The constant presence of such proteins under normal physiological conditions would lead to irrational and nonspecific interactions, resulting in disruption of the cellular milieu. Therefore, it becomes essential for candidate proteins to become free from holdases in the absence of stress. In an unpublished study, we observed that Hsp20 indeed associates with ~60 proteins, further highlighting its highly interacting nature. Overall, results from this study further corroborate that the interaction promiscuity of a protein could be an important determinant of its dosage sensitivity (4).
Although we have established ClpC1's role in maintaining the homeostasis of such dosage-sensitive proteins, it remains to be determined how it confers substrate specificity. As summarized in Fig. 7, it appears that ClpC1 might recognize a characteristic conformation that serves as a degron. Although degrons offer the initiation site for interaction with the unfoldase, further processing of the substrate is an energy-driven process that requires unfolding of the rest of the polypeptide chain by ClpC1. The unique N-terminal engagement loops of ClpC1 linearize and translocate the substrates into the proteolytic chamber, leading to proteolysis. Remarkably, several immunogenic proteins of Mtb that are rich in DPRs, such as the PE-PPE and PE-PGRS families of proteins, remain unaffected by the depletion of ClpC1. Recently, it was shown that although individually these proteins are disordered, they tend to form protein complexes that are highly ordered. It was proposed that such disorder-order structural dynamics is employed as a strategy by Mtb to elicit a pro-pathogen response that is favorable for infection (40). Taken together, these findings and our study suggest that disorder-to-order transition could be critical in determining the fate of a protein in vivo. The length of the initiation site, post-translational modification, protein-protein interaction, occupancy by a low-complexity region in the disordered stretch, and the overall conformation of the unstructured region, whether collapsed or expanded, are some of the determinants of a protein's shelf life in the cell (41). Further studies are warranted to investigate whether these features are also critical for recognition of the substrate proteins by ClpC1 and their subsequent proteolysis by the Clp proteolytic machinery in Mtb.

Culturing of mycobacteria
The study involved Mtb H37Rv, Mtb H37Ra, and Msm mc²155, which were kindly provided by Dr. William Bishai (Johns Hopkins University, Baltimore, MD, USA) and Dr. William Jacobs (Albert Einstein College of Medicine, Bronx, NY, USA), respectively. Culturing of mycobacteria was performed in Middlebrook 7H9 broth or 7H11 agar supplemented with 1× OADS (0.054 g/liter oleic acid, 5 g/liter BSA V, 2 g/liter dextrose, and 0.81 g/liter sodium chloride) along with 0.02% tyloxapol and 0.5% glycerol. Growth was obtained on 7H11 agar plates after incubation at 37°C, and broth cultures were grown at 37°C with shaking at 200 rpm. For culturing mycobacteria, kanamycin and hygromycin were used at concentrations of 25 and 50 µg/ml, respectively, whereas for E. coli kanamycin and hygromycin were used at 50 and 150 µg/ml, respectively, whenever needed.

Construction of knockdown strains using CRISPRi
To achieve the repression of Rv3596c (clpC1), Rv2461 (clpP2), Rv1331 (clpS), and Rv3100c (smpB) genes, a pair of complementary oligonucleotides specific to the target ORFs near the 5′-end were synthesized, annealed, and cloned in pGrna at AflII-AclI sites, as previously described (17). The recombinant pGrna plasmid containing gene-specific guide sequences (Table 3) was transformed into dCas9-expressing Mtb H37Rv or Mtb H37Ra to generate KanR-HygR knockdown strains, namely clpC1(−), clpP2(−), clpS(−), and smpB(−), respectively. Similarly, individual knockdown strains were created in Msm by targeting MSMEG_6091 (clpC1) and MSMEG_4672 (clpP2), respectively, with a minor modification. Briefly, gene-specific guide sequences (Table 3) were created by annealing complementary oligonucleotides that were cloned in pGrna plasmid at AflII-AclI sites. The DNA fragment containing the tetracycline-inducible promoter along with the guide sequence was excised from the respective recombinant pGrna clones by NheI-NotI digestion and cloned in pTetInt-dcas9 at XbaI-NotI sites. Subsequently, the recombinant plasmids pTetInt-dcas9::gRNA_clpC1 and pTetInt-dcas9::gRNA_clpP2 were electroporated in WT Msm to create KanR clpC1(−) and clpP2(−) strains, respectively. Suppression was achieved by treatment of bacterial cultures with 50 ng/ml ATc for 4 days in Mtb and for 24 h in Msm (unless indicated otherwise).

Figure 7. Schematic representation of a model depicting critical requirement of terminal disordered region for recognition of substrate by ClpC1. A and B, terminal disordered region serves as recognition signal for effective engagement, unfolding, and degradation of the prospective ClpC1 substrate. A, the solvent-exposed C-terminal residues in Hsp20, a model substrate of ClpC1, provide a unique conformation that facilitates engagement of Hsp20 with the N-terminal region of ClpC1 followed by unfolding and introduction to the protease chamber. B, in the absence of the C-ter tail, the Hsp20 is unable to interact with ClpC1, which prevents its proteolysis by Clp machinery.

RNA extraction and quantitative real time RT-PCR
Extraction of total RNA was performed using RNAiso plus reagent (Takara Bio Inc.) according to the manufacturer's instructions. First-strand cDNA was synthesized with 500 ng of total RNA after DNase I treatment (Thermo Fisher) using random hexamer primers and Superscript III RT (Invitrogen Life Technologies), as per the manufacturer's recommendations. PCR was performed with 50 ng of cDNAs and SYBR Green PCR Master Mix (Applied Biosystems) with the help of gene-specific primer pairs (Table 3) amplifying the ~200-bp region near the 5′-end. Real-time quantification was carried out using the ABI 7500 Fast real-time PCR system (Applied Biosystems) as instructed by the manufacturer. The expression levels of different genes in the test strain were estimated relative to their expression in control after normalizing with the change in expression level of the housekeeping gene sigA as reported previously (17).
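The normalization described above can be illustrated with a short calculation. The sketch below assumes the commonly used 2^(−ΔΔCt) approach with sigA as the reference gene; the text does not state the exact formula, so the method, the function name, and the Ct values are illustrative assumptions rather than the authors' procedure.

```python
def relative_expression(ct_target_test, ct_ref_test,
                        ct_target_ctrl, ct_ref_ctrl,
                        efficiency=2.0):
    """Fold change of a target gene in the test strain relative to control.

    Assumes the 2^(-ddCt) method: each target Ct is first normalized to the
    reference gene (sigA here) in the same sample, and the normalized values
    are then compared between test and control.
    """
    delta_test = ct_target_test - ct_ref_test   # dCt in the knockdown strain
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # dCt in the control strain
    ddct = delta_test - delta_ctrl
    return efficiency ** (-ddct)


# Hypothetical Ct values: the target crosses threshold 3 cycles later in the
# knockdown strain while sigA is unchanged.
fold = relative_expression(ct_target_test=26.0, ct_ref_test=18.0,
                           ct_target_ctrl=23.0, ct_ref_ctrl=18.0)
print(f"relative expression: {fold:.3f}")  # 0.125, i.e. an 8-fold reduction
```

A value below 1 indicates repression of the target in the test strain; the `efficiency` parameter can be lowered from 2.0 if the primer pair amplifies less than perfectly.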

Cloning, expression, and protein purification
To achieve expression of N-terminal His6-tagged full-length and C-terminally truncated (−10 aa) Hsp20 and full-length ClpC1 in E. coli, respective ORFs were PCR-amplified from Mtb genome and cloned in pET28b plasmid (Invitrogen). Sequences of oligonucleotides used for amplification of hsp20 and clpC1 are listed in Table 3. Expression of N-terminal His6-tagged proteins was obtained in E. coli BL21 following overnight induction with 0.5 mM isopropyl β-D-thiogalactopyranoside at 18°C. The recombinant proteins were subsequently purified from E. coli by using nickel-nitrilotriacetic acid affinity chromatography followed by dialysis in a storage buffer containing 50 mM Tris-HCl, pH 8.0, 100 mM NaCl, and 10% glycerol. The dialyzed proteins were stored at −80°C for further use.

Size-exclusion chromatography
Superdex 200 Increase 10/300 GL (GE Healthcare) was used to perform analytical gel filtration on the AKTA purifier (GE Healthcare) with a flow rate of 0.5 ml/min using buffer containing 100 mM Tris-HCl, pH 8.0, and 150 mM NaCl. Known size-exclusion chromatography standards (Sigma-Aldrich) were used to determine the void volume and to construct the standard curve for determination of the molecular mass of the proteins. 100-200 µg of purified protein was used in a single run.
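A standard curve of this kind is usually a straight-line fit of log10(molecular mass) against elution volume. The sketch below shows that calculation under this assumption; the function names are hypothetical and the calibration points are made-up numbers chosen to lie on an exact line, not the values of the Sigma-Aldrich standards.

```python
import math


def fit_calibration(standards):
    """Least-squares fit of log10(mass in kDa) against elution volume (ml).

    standards: list of (elution_volume_ml, mass_kda) tuples.
    Returns (slope, intercept) of the calibration line.
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mass) for _, mass in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mass(elution_volume_ml, slope, intercept):
    """Interpolate the apparent molecular mass (kDa) from the fitted line."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Hypothetical standards (elution volume in ml, mass in kDa).
standards = [(10.0, 1000.0), (12.0, 100.0), (14.0, 10.0)]
slope, intercept = fit_calibration(standards)
print(f"apparent mass at 13.0 ml: {estimate_mass(13.0, slope, intercept):.1f} kDa")
```

Comparing the apparent mass of a peak with the monomer mass gives an estimate of the oligomeric state, which is how a dimer and a dodecamer would be distinguished on such a column.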

Analysis of proteins by iTRAQ-based LC-MS approach
Synthesis of peptides. For peptide synthesis, 100 µg of whole-cell extracts (WCEs) from different strains of Mtb H37Rv was cleaned by acetone precipitation. The pellets were suspended in 50 mM triethyl ammonium bicarbonate, pH 8.0, and denatured by adding 0.1% SDS. Subsequently, 5 mM Tris-(2-carboxyethyl)phosphine was added to each sample and incubated at 60°C for 1 h to reduce the proteins followed by
Labeling of peptides. For iTRAQ labeling, peptides from two biological replicates were incubated with four-plex iTRAQ labeling reagents (AB-SCIEX), according to the manufacturer's recommendations. After labeling, the reaction was quenched by diluting the labeled samples in two volumes of water. The resultant labeled peptides from all three strains were mixed together and dried under vacuum.
Cation-exchange fractionation of labeled peptides. The labeled peptide mixture was fractionated on a PerkinElmer Flexar HPLC system using an Agilent Zorbax strong cation-exchange column (2.1 × 150 mm) having 5 µm particle size. The samples were dissolved in 80 µl of strong cation-exchange loading buffer (8 mM ammonium formate and 30% acetonitrile (ACN)), and peptides were eluted using an automatic fraction collector with a linear gradient of ammonium formate (10-500 mM) at the rate of 300 µl/min, with fractions collected every minute for ~40 min. The eluted fractions (~40 fractions) were vacuum-dried and reconstituted just before the second-dimensional separation on a C18 column.
LC-MS analysis of cation-exchange iTRAQ-labeled peptides. The cation-exchange fractions were reconstituted in 15 µl of 2% ACN and 0.1% formic acid in water and centrifuged at 10,000 × g for 5 min. 5 µl of clear supernatant was loaded onto a Cap-Trap C18 trap cartridge (Michrom Bioresources Inc.) and desalted for 10 min at the rate of 10 µl/min using 2% ACN and 0.05% TFA in water on an Eksigent NanoLC 400. The labeled peptides were separated on a Chromolith Caprod RP-18e HR capillary column (150 × 0.1 mm; Merck Millipore) using a linear gradient of buffer B (98% ACN and 0.05% TFA in water) in buffer A (2% ACN and 0.05% TFA in water). The peptides were eluted at the rate of 1.5 µl/min and directed to the 5600 TF for MS and MS/MS analysis. MS spectra were acquired from 350 to 1250 Da, and peptides were selected for MS/MS fragmentation using IDA criteria. In brief, the 25 most intense peaks were fragmented using collision-induced dissociation with iTRAQ-specific rolling collision energy in each cycle. The MS/MS spectra were acquired from 100 to 1600 Da.
Peptide identification and quantitation. Identification as well as quantitation of peptides and proteins was performed by ProteinPilot software, version 5.0.1 (AB SCIEX) using the Paragon algorithm as the search engine. MS/MS spectra were searched against the Mtb (strain ATCC 25618/H37Rv) reference proteome (UP00001584) downloaded (February 2019) from UniProt (RRID:SCR_018666) having 3993 protein entries in FASTA format and incorporated in the ProteinPilot database for search. Trypsin was used as the proteolytic enzyme with cleavage at arginine and lysine residues. Protein identification was performed allowing a single missed cleavage, whereas peptides without missed cleavage were used for quantification. Cys alkylation was used as a fixed modification, whereas oxidation at methionine and deamidation at Asn and Gln were used as variable modifications for search against the database. The iTRAQ four-plex peptide-labeling quantitation method was used for the quantitation of peptides and proteins. Background and bias corrections were applied to the search results. Tolerance levels for MS and MS/MS were 0.1 and 0.01 Da, respectively.
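The enzyme rule and the missed-cleavage setting described above can be pictured with a small in-silico digest. The following is only a sketch of the general idea, assuming cleavage after every Lys or Arg with no proline exception; it is not how the Paragon algorithm generates candidates internally, and the function name and example sequence are hypothetical.

```python
def tryptic_peptides(sequence, missed_cleavages=1):
    """Return peptides from a simple in-silico tryptic digest.

    Cleaves after every K or R (no proline rule, an assumption made for
    brevity) and then joins up to `missed_cleavages` adjacent fragments.
    """
    # Fully cleaved fragments.
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Peptides spanning 0..missed_cleavages internal cleavage sites.
    peptides = []
    for n_missed in range(missed_cleavages + 1):
        for i in range(len(fragments) - n_missed):
            peptides.append("".join(fragments[i:i + n_missed + 1]))
    return peptides


# Hypothetical sequence, for illustration only.
print(tryptic_peptides("MKTAYRAK", missed_cleavages=0))
# ['MK', 'TAYR', 'AK']
print(tryptic_peptides("MKTAYRAK", missed_cleavages=1))
# ['MK', 'TAYR', 'AK', 'MKTAYR', 'TAYRAK']
```

In this picture, the identification step would consider the list generated with `missed_cleavages=1`, whereas quantification would be restricted to the fully cleaved peptides from `missed_cleavages=0`.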
To estimate the false discovery rate (FDR), a decoy database search strategy was used. The FDR is defined as the percentage of decoy proteins identified against the total protein identification. The FDR was calculated by searching the spectra against the Mtb H37Rv database. Proteins identified at 1% FDR were used for quantitation. The peptide selection criteria for relative quantitation were as follows: only peptides unique to a given protein were considered for relative quantification, excluding those common to other isoforms or proteins of the same family. The proteins were identified on the basis of having at least two peptides with an ion score above 95% confidence.
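The FDR definition given above (decoy identifications as a percentage of total identifications) can be written out as a short calculation. The sketch below applies that definition to a ranked hit list and keeps the largest prefix that stays within the threshold; the ranking-and-cutoff procedure, the function names, and the example hits are illustrative assumptions and do not reproduce ProteinPilot's own FDR analysis.

```python
def false_discovery_rate(n_decoy, n_total):
    """FDR as the percentage of decoy identifications among all identifications."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_decoy / n_total


def filter_at_fdr(hits, max_fdr_percent=1.0):
    """Return target protein IDs accepted at the given FDR threshold.

    hits: list of (protein_id, score, is_decoy) tuples.
    Hits are ranked by descending score, and the longest ranked prefix whose
    cumulative FDR does not exceed the threshold is accepted.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    cutoff = 0
    n_decoy = 0
    for position, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            n_decoy += 1
        if false_discovery_rate(n_decoy, position) <= max_fdr_percent:
            cutoff = position
    return [pid for pid, _, is_decoy in ranked[:cutoff] if not is_decoy]


# Hypothetical hits: (protein ID, score, is_decoy).
hits = [("P1", 9.0, False), ("P2", 8.0, False),
        ("D1", 7.0, True), ("P3", 6.0, False)]
print(filter_at_fdr(hits, max_fdr_percent=1.0))   # ['P1', 'P2']
print(filter_at_fdr(hits, max_fdr_percent=30.0))  # ['P1', 'P2', 'P3']
```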

Immunoblotting
Whole-cell extracts of mycobacteria were prepared by bead-beating lysis in 1× PBS, and immunoblotting was performed with 30 µg of bacterial WCEs, unless specifically mentioned. Antibodies against Hsp20 were commercially raised in rabbit using a specific peptide, which was selected based on immunogenicity score (Genscript). Accordingly, we used the VDKDVNVELDPGQP peptide for raising anti-Hsp20 antibodies. An additional Cys residue was added at the N terminus of the peptide for KLH conjugation. Purified IgGs were subsequently used for immunoblotting of WCEs as described below.
For immunoblotting, protein transfer onto nitrocellulose membrane was performed at a constant voltage of 25 V for 15 min using a semidry transfer apparatus (Bio-Rad), followed by blocking with 5% nonfat dried milk in PBS containing 0.05% Tween 20 (PBST) (blocking buffer) for 1 h at room temperature. The membrane was incubated overnight with a 1:5000 dilution of anti-Hsp20 antibodies in blocking buffer at 4°C. After extensive washing with PBST three to five times, the immunoblot was incubated with horseradish peroxidase-conjugated anti-rabbit IgG at 1:10,000 dilution in blocking buffer for 1 h at room temperature followed by washing to remove unbound antibodies. The signals were obtained using SuperSignal West Femto maximum sensitivity substrate (Thermo Fisher). Immunoblotting experiments with anti-YidC and anti-PknB were performed as described earlier (42), whereas those with anti-FLAG (F3165, Sigma-Aldrich) or anti-cMyc (M4439, Sigma-Aldrich) were conducted according to the manufacturer's specifications.

Overexpression of Hsp20 in mycobacteria
For overexpression of N-terminally tagged cMyc-Hsp20 in mycobacteria, full-length and C-terminally truncated (−10 aa) derivatives of hsp20 were PCR-amplified from Mtb genome using the primer pairs mentioned in Table 3. Subsequently, the PCR amplicons were subjected to restriction digestion followed by cloning at the NdeI and HindIII sites in an E. coli-mycobacterial shuttle plasmid pTetR (17). The resulting plasmids were termed pTetR-Hsp20 and pTetR-Hsp(−10 aa), respectively. Subsequently, the complementary oligonucleotides bearing cMyc-specific sequence were annealed and inserted in-frame at the 5′-end of the gene in these plasmids, resulting in pTetR-cMyc-Hsp20 and pTetR-cMyc-Hsp20(−10 aa). Similarly, for overexpression of C-terminally tagged Hsp20 in mycobacteria, the hsp20 ORF lacking the stop codon was amplified from Mtb genome and cloned in pTetR followed by in-frame insertion of full-length gfp sequence or complementary oligonucleotides bearing cMyc or FLAG tag-specific sequences at the C terminus; the resulting plasmids were annotated as pTetR-hsp20-gfp, pTetR-hsp20-cMyc, and pTetR-hsp20-FLAG, respectively. Induction of N- and C-terminally tagged Hsp20 was obtained in WT as well as knockdown strains of Msm transformed with the respective plasmids, by treatment with 50 ng/ml ATc for 24 h.

Macrophage infection
For macrophage infection, THP1 monocytes were seeded in 24-well plates at a density of 2 × 10⁵ cells/well in the presence of 50 nM phorbol myristate acetate for differentiation. After 24 h of differentiation, phorbol myristate acetate was removed, and the cells were infected with control, clpC1(−), and clpP2(−) strains at an MOI of 1:2 (macrophage:bacteria). After 4 h of incubation, the cells were washed with prewarmed PBS and replenished with RPMI with 10% FBS containing amikacin (200 µg/ml) for 1 h to kill extracellular bacteria. Subsequently the cells were washed and maintained in RPMI-FBS medium containing 200 ng/ml ATc throughout the experiment. At different time points, the cells were harvested by using PBS + 0.1% Triton X-100 and incubated on ice for 5 min to release intracellular bacteria. The cell debris was removed by centrifugation, and bacterial supernatant was serially diluted in PBS and spread on 7H11 agar plates for CFU estimation after 4 weeks of incubation at 37°C.
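CFU estimation from serially diluted lysates reduces to a simple back-calculation from the colony count, the dilution plated, and the volume spread. A minimal sketch follows; the function name, the plated volume, and the colony count are hypothetical values chosen for illustration and are not taken from the experiment.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate CFU/ml of the undiluted lysate.

    colonies: colonies counted on the plate
    dilution_factor: total fold dilution of the plated sample (e.g. 1000 for 10^-3)
    plated_volume_ml: volume spread on the plate, in ml
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return colonies * dilution_factor / plated_volume_ml


# Hypothetical plate: 42 colonies from 0.1 ml of a 10^-3 dilution.
titer = cfu_per_ml(colonies=42, dilution_factor=1000, plated_volume_ml=0.1)
print(f"{titer:.2e} CFU/ml")  # 4.20e+05 CFU/ml
```

Comparing such titers across time points for the control and knockdown strains gives the intracellular survival curve.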

Protein-protein interaction studies
Optical interference-based biolayer interferometry (BLI) from the Octet system by ForteBIO was used to ascertain the interaction between Mtb ClpC1 and Hsp20. Briefly, 40 µg/ml ClpC1 was dialyzed in 10 mM sodium acetate, pH 4.5, followed by immobilization onto the amine reactive group sensor (second generation) up to a level of 2.1 nm. Different concentrations of ligand (purified Hsp20) in a buffer containing 100 mM Tris-HCl, pH 8.0, and 150 mM NaCl were used to acquire a differential graded response. The binding constant was calculated as per the standard steps of baseline (60 s), association (180 s), and dissociation (180 s). The baseline was set using analyte buffer as a control.
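The association and dissociation phases of such a sensorgram are often interpreted with a simple 1:1 binding model. The sketch below writes out that model as an assumption; the text does not specify which fitting model was used, and the rate constants, Rmax, and concentration shown are hypothetical values, not measured ones.

```python
import math


def equilibrium_constant(ka, kd):
    """Equilibrium dissociation constant KD (M) from ka (1/M/s) and kd (1/s)."""
    return kd / ka


def association_response(t, conc, ka, kd, rmax):
    """Sensor response during association for a 1:1 model.

    R(t) = Req * (1 - exp(-k_obs * t)), with k_obs = ka*conc + kd and
    Req = rmax * ka*conc / k_obs.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """Sensor response t seconds after switching to buffer: R(t) = r0 * exp(-kd*t)."""
    return r0 * math.exp(-kd * t)


# Hypothetical parameters for illustration only.
ka, kd, rmax = 1.0e5, 1.0e-3, 1.0   # 1/M/s, 1/s, nm
conc = 100e-9                        # 100 nM of the protein in solution
r_end = association_response(180.0, conc, ka, kd, rmax)   # end of 180 s association
r_after = dissociation_response(180.0, r_end, kd)         # end of 180 s dissociation
print(f"KD = {equilibrium_constant(ka, kd):.1e} M")
print(f"response after association: {r_end:.3f} nm, after dissociation: {r_after:.3f} nm")
```

Fitting these two expressions to curves recorded at several concentrations yields ka and kd, and their ratio gives the binding constant.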

Data availability
Proteome data have been deposited in the PRIDE database under accession code PXD013589. All remaining data are contained within the article.