Programmed Heterogeneity: Epigenetic Mechanisms in Bacteria

Contrary to the traditional view that bacterial populations are clonal, single-cell analysis reveals that phenotypic heterogeneity is common in bacteria. Formation of distinct bacterial lineages appears to be frequent during adaptation to harsh environments, including the colonization of animals by bacterial pathogens. Formation of bacterial subpopulations is often controlled by epigenetic mechanisms that generate inheritable phenotypic diversity without altering the DNA sequence. Such mechanisms are diverse, ranging from relatively simple feedback loops to complex self-perpetuating DNA methylation patterns.

The term "epigenesis" was introduced into contemporary biology by Conrad Waddington, a visionary British embryologist, to describe how cell lineages are formed during the development of multicellular eukaryotes (1,2). During differentiation of eukaryotic tissues, genetically identical cells diversify into distinct lineages by inheritable changes in gene expression without loss or alteration of the DNA sequence. Many decades after Waddington, a universally accepted definition of epigenetics is still lacking. However, a tentative definition may be that epigenetics is the study of cell lineage formation by non-mutational mechanisms.
Most textbooks and reviews on epigenetic gene regulation concern only eukaryotes. One reason may be the enormous success of eukaryotic epigenetics and its implications for human disease. In addition, bacteria have been traditionally viewed as clonal populations of genetically identical cells with phenotypes merely reflecting their genetic constitution. This view is, however, naïve. Certain bacterial genera undergo complex developmental programs that involve cell differentiation. Spore formation by Bacillus subtilis (3), differentiation of Rhizobium into nitrogen-fixing bacteroids (4), asymmetric cell division in Caulobacter (5), formation of fruiting bodies by Myxococcus (6), heterocyst formation in cyanobacteria (7), and biofilm formation in many bacterial species (8,9) are well known examples of bacterial development. In all of these phenomena, bacterial cells with distinct morphological and physiological properties are formed while the genome DNA sequence remains intact.
Formation of phenotypically distinct cells in populations made of genetically identical bacteria is not restricted to developmental programs. In the last few decades, the introduction of single-cell analysis in bacteriology has revealed many examples of subpopulation formation. For instance, clonal populations of bacteria can sometimes bifurcate into two distinct states, a phenomenon known as bistability (10,11). Reversible bistability, traditionally known as phase variation, is also common (12). Transition at high frequency between two or more phenotypic states (13) can occur through mutations at genomic repeat sequences (14,15) or via site-specific recombination (16-19). In other cases, bistability and phase variation are controlled by epigenetic mechanisms with strikingly different levels of complexity, from the propagation of simple feedback loops to the formation of DNA methylation patterns reminiscent of chromatin modification in eukaryotic cells (20-22).
Subpopulation formation can often be observed in the laboratory. However, it may be especially relevant in natural environments, either as an adaptive strategy (e.g. to evade the immune system and other host defenses during bacterial infection) or as a bet-hedging strategy that may facilitate survival if environmental changes occur (23). Relevant examples of phenotypic heterogeneity in natural environments are the formation of "persisters" (dormant bacterial cells resistant to antibiotics) (24,25), the formation of lineages during Salmonella colonization of animals (26-28), and the bistable expression of extracellular matrix genes during biofilm formation by B. subtilis (9).
Even though subpopulation formation can be seen as the execution of intrinsic bacterial programs, it often involves stochastic events. For instance, random fluctuations in gene expression, a phenomenon known as "noise," can establish cell-to-cell differences in an isogenic population of bacteria (29). These quantitative differences can become qualitative (30) in the sense that expression above a critical threshold will provide a distinct signal, and expression below the threshold will provide a different signal (21,31). Propagation of these signals by feedback loops enables the formation of epigenetic lineages (Fig. 1).

Formation of Epigenetic Lineages by a Positive Feedback Loop
Bistable gene expression occurs when a unimodal pattern of gene expression becomes bimodal, bifurcating into two distinct patterns (10,32). Bistability can be generated either by a positive feedback loop or by a double-negative feedback loop (22,33). A classical example of bistability generated by a positive feedback loop was described in the Escherichia coli lac operon (34). When added at high concentrations, the gratuitous inducer isopropyl β-D-thiogalactopyranoside (IPTG) fully derepresses the lac operon. At low concentrations, however, IPTG is unable to induce a naïve (uninduced) culture. Nevertheless, if a fully induced culture is transferred to medium containing low concentrations of IPTG, a subpopulation of cells is able to maintain the fully induced state (34). Maintenance occurs because fully induced cells have a high level of β-galactoside permease in their membrane. The permease transports IPTG, providing a high internal concentration of inducer, which maintains full induction (32,34). The positive feedback loop in this system is that a high level of permease is required to concentrate IPTG in the cell, and high internal IPTG levels are required for high levels of permease synthesis (34). In other cells, however, a decrease in the internal concentration of inducer will reduce permease synthesis, which in turn will cause further reduction in the internal concentration of IPTG, driving the cell toward the repressed state via binding of the LacI repressor. The overall consequence is that a fully induced population bifurcates into two bistable states: fully induced and uninduced (repressed) (32-34).
Errors made during transcription can also provide signals for epigenetic switching in the E. coli lac operon (35). An increased error rate during transcription, caused either by mutations that reduce transcription fidelity of RNA polymerase or by the absence of transcription fidelity factors GreA and GreB, increases switching of the lac operon from the off state (uninduced) to the on state (induced) (35). The interpretation is that errors in lacI mRNA synthesis cause a transient decrease in the Lac repressor level, which permits switching to the on state (35,36). Note that an uninduced E. coli cell contains ~10 molecules of the Lac repressor, an amount small enough to make the system noisy and therefore metastable. Perturbation of this delicate equilibrium by transcription inaccuracy can switch the system to the on state. Even though the decrease in the Lac repressor concentration is transient, synthesis of permease will generate a positive feedback loop that will maintain the on state in certain cells (34). Lac bistability is not observed in cells containing a 10-fold higher Lac repressor level, consistent with the hypothesis that switching occurs only under conditions in which repressor levels are subsaturating.
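The permease feedback loop described above for the lac operon can be illustrated with a minimal numerical sketch. The model is a toy and does not reproduce measured lac kinetics: the permease level x (arbitrary units) is synthesized at a small basal rate plus a sigmoidal term that depends on x itself, and is lost by first-order dilution. The function names, the Hill-type form, and all parameter values are assumptions chosen only so that two stable states coexist at one external inducer concentration.

```python
def permease_rate(x, basal=0.05, vmax=2.0, k=1.0, hill=2):
    """Net rate of change of permease level x (arbitrary units).

    synthesis = basal + vmax * x^hill / (k^hill + x^hill)   (positive feedback)
    loss      = x                                           (dilution/decay)
    All parameter values are illustrative assumptions, not measurements.
    """
    feedback = vmax * x**hill / (k**hill + x**hill)
    return basal + feedback - x


def steady_state(x0, dt=0.01, steps=6000):
    """Integrate the toy model with forward Euler, starting from level x0."""
    x = x0
    for _ in range(steps):
        x += permease_rate(x) * dt
    return x


naive = steady_state(0.0)     # culture that was never induced
induced = steady_state(2.0)   # culture that was previously fully induced
print(f"naive culture settles at   {naive:.3f}")
print(f"induced culture settles at {induced:.3f}")
```

Under these assumed parameters, a culture that starts uninduced remains near the low state, whereas a culture that starts fully induced remains near the high state. This history dependence corresponds to the behavior described for low IPTG concentrations.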
Another classical example of bistability occurs in B. subtilis. Upon entry into stationary phase, a fraction of B. subtilis cells acquire the capacity to take up DNA, a phenomenon known as competence (10). A crucial factor for competence development is accumulation of ComK, which activates genes required for DNA uptake as well as the comK gene itself (37). During exponential growth, ComK is synthesized but degraded. When the culture approaches stationary phase, a quorum sensing-related factor stabilizes ComK (38,39). At that moment, a competition is initiated between several repressors and ComK for binding to regulatory regions of the comK promoter (40,41). Binding of ComK initiates a positive feedback loop, leading to increased synthesis of ComK and subsequent transcription of competence genes. Binding of the repressors inhibits comK expression and prevents competence. A crucial property for bifurcation of the population into two subpopulations is that the level of ComK in individual cells fluctuates, generating stochastic noise. When the ComK level reaches a threshold in a B. subtilis cell, a quantitative difference becomes qualitative: the ComK positive feedback loop will be activated, and competence will develop (42-44). Development of competence thus occurs in cells that undergo a small but critical increase in ComK concentration (Fig. 2). In turn, comK will be repressed in cells in which the ComK level remains below the threshold, and they will not develop competence (Fig. 2) (43).
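The conversion of a quantitative difference into a qualitative one can be sketched as follows. This is an illustrative toy model, not a fitted description of comK regulation: cell-to-cell noise is represented by drawing each cell's initial ComK level from a lognormal distribution, and each cell then follows the same deterministic autoactivation dynamics. The function names, the distribution, and every parameter value are assumptions made for illustration.

```python
import math
import random


def comk_rate(x, basal=0.05, vmax=2.0):
    """Toy ComK dynamics: basal synthesis + autoactivation - degradation."""
    return basal + vmax * x**2 / (1.0 + x**2) - x


def final_level(x0, dt=0.02, steps=2500):
    """Follow one cell (forward Euler) from its initial ComK level x0."""
    x = x0
    for _ in range(steps):
        x += comk_rate(x) * dt
    return x


def competent_fraction(n_cells=500, seed=1):
    """Fraction of cells that end in the high-ComK (competent) state."""
    rng = random.Random(seed)
    competent = 0
    for _ in range(n_cells):
        # Noise: each cell starts at a different ComK level
        # (lognormal with median 0.4 and sigma 0.5, both assumed).
        x0 = rng.lognormvariate(math.log(0.4), 0.5)
        if final_level(x0) > 1.0:
            competent += 1
    return competent / n_cells


fraction = competent_fraction()
print(f"competent fraction: {fraction:.2f}")
```

In this sketch only the minority of cells whose initial level lies above the unstable threshold of the feedback loop reach the high state, so a unimodal starting distribution yields two subpopulations.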

Formation of Epigenetic Lineages by a Double-negative Feedback Loop
Infection of E. coli by bacteriophage λ can follow two developmental programs. One is lysis of the bacterial cell; the other is lysogeny, a symbiosis-like association in which the phage enters a dormant state. Although the lysis/lysogeny decision is influenced by the physiological state of the cell and by environmental factors, the fate of individual infections is unpredictable and may be considered stochastic (33,45,46). Phage λ has two repressors, cI and Cro, each of which represses expression of the other. At the onset of infection, both repressors are produced, and the lysis/lysogeny decision may be viewed as a repressor race: the repressor that first occupies specific regulatory sites in DNA will repress synthesis of its antagonist (45). If the winner is Cro, synthesis of cI will be repressed, and λ will lyse the host cell (Fig. 2). If the winner is cI, synthesis of Cro will be repressed, and λ will lysogenize the cell (Fig. 2) (45). Note that the outcomes of a positive feedback loop and a double-negative feedback loop are analogous (22,33). In the case of λ, preventing the synthesis of Cro by cI is equivalent to positive autoregulation of cI and vice versa.
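The repressor race can be illustrated with a minimal mutual-repression model. The sketch below is a generic symmetric toggle, not a quantitative model of phage λ: each repressor is synthesized at a rate that falls as the level of its antagonist rises and is lost by first-order decay. The function name, the Hill-type repression term, and the parameter values (alpha, hill) are assumptions chosen so that two stable outcomes exist.

```python
def repressor_race(ci0, cro0, alpha=4.0, hill=2, dt=0.01, steps=6000):
    """Integrate a symmetric double-negative feedback loop (forward Euler).

    d(cI)/dt  = alpha / (1 + Cro^hill) - cI
    d(Cro)/dt = alpha / (1 + cI^hill)  - Cro
    Levels are in arbitrary units; parameters are illustrative assumptions.
    """
    ci, cro = ci0, cro0
    for _ in range(steps):
        d_ci = alpha / (1.0 + cro**hill) - ci
        d_cro = alpha / (1.0 + ci**hill) - cro
        ci += d_ci * dt
        cro += d_cro * dt
    return ci, cro


# A small initial lead decides the outcome.
ci_lead = repressor_race(1.1, 1.0)   # cI slightly ahead
cro_lead = repressor_race(1.0, 1.1)  # Cro slightly ahead
print("cI ahead  -> cI=%.2f, Cro=%.2f" % ci_lead)
print("Cro ahead -> cI=%.2f, Cro=%.2f" % cro_lead)
```

In this toy model a small early advantage for one repressor is amplified until its antagonist is almost fully repressed, which mirrors the equivalence between double-negative feedback and positive autoregulation noted above.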

Phase Variation via DNA Methylation Patterns
A common epigenetic mechanism to regulate switches involves the formation of DNA methylation patterns (47,48). This occurs when a methylation sequence on DNA overlaps the binding site for a protein, and methylation of that sequence is blocked (49,50). For example, most GATC sites in the E. coli chromosome are fully methylated except for a short time following DNA replication, in which they are hemimethylated. However, a few sites are stably unmethylated due to binding of proteins at sites that overlap or are adjacent to a GATC site, competing with Dam for binding and blocking methylation (47,48,51). Two such GATC sites in the pap (pyelonephritis-associated pili) operon of uropathogenic E. coli orchestrate Pap pilus phase variation (52,53). The core of the Pap switch consists of two sets of binding sites, 1-3 and 4-6, within the pap promoter region for the global regulator known as the leucine-responsive regulatory protein, Lrp (54). Lrp appears to be predominantly a tetramer of dimers (octamer), with three Lrp dimers binding to three pap sites, leaving one dimer unbound (Fig. 3) (55-57).
A GATC site is present within site 2 (GATCprox) and site 5 (GATCdist); methylation of these sites affects Lrp binding, as discussed below. Lrp binds cooperatively to a set of three pap sites, but occupancy of all six Lrp sites occurs infrequently due to a mutual exclusion mechanism that requires negative DNA writhe (supercoils) (58). Lrp binding to sites 1-3 (Fig. 3, red boxes) blocks methylation of GATCprox and also blocks pap transcription because the RNA polymerase σ70-binding site is in this region (Fig. 3A) (59). In contrast, binding of Lrp to sites 4-6 (Fig. 3, green boxes) blocks methylation of GATCdist and helps to activate pap transcription (60). The role of Lrp in activating transcription may be to bend DNA, facilitating binding of catabolite gene activator protein to the RNA polymerase α-subunit (61).
Transition from the phase off state to the phase on state requires two pap-encoded regulators, PapI and PapB. PapI increases the affinity of Lrp for pap sites 2 and 5 via an ACGATC sequence present in each site (52,56,62). PapB, the product of the first gene of the pap operon, binds near the papI promoter and activates papI transcription, forming a positive feedback loop (Fig. 3D, dashed arrow) (63). Methylation of GATCprox is required for the off-to-on transition because it lowers the affinity of PapI/Lrp for site 2, increasing the probability that PapI/Lrp will bind to sites 4-6 and initiate transition to the on phase (58). For this to occur, Lrp bound at sites 1-3 in off phase cells must dissociate to allow methylation of GATCprox by Dam. This likely occurs as the replication fork passes through the pap regulatory region, and a hemimethylated GATCdist site is generated (Fig. 3B). The affinity of PapI/Lrp for hemimethylated pap sites 4-6 is significantly higher than for the fully methylated DNA (52,56). If PapI/Lrp binds to site 5 before Dam methylates the daughter strand, cooperative binding of Lrp/PapI to sites 4-6 will occur to initiate transition to the phase on state. Evidence indicates that a dimer of Lrp and a monomer of PapI bind to pap site 5 (56). This transition is also dependent on dissociation of Lrp from sites 1-3 and methylation of GATCprox: increasing the off rate (k_dis) of Lrp at sites 1-3 increases the off-to-on rate (64).
Dam methylase is highly processive, such that ~130 Dam molecules can efficiently methylate ~20,000 genomic GATC sites (65). Thus, when Dam methylates GATCprox, it should have a high propensity to methylate the adjacent GATCdist site before dissociating from DNA. This would block PapI/Lrp binding to site 5 and block transition to the on phase (60). Recent work has shown that the presence of a poly(A) tract 5′ to the two pap GATC sites decreases the processivity of methylation by reducing the rate of methyl transfer (k_chem) (66). This may be necessary to allow PapI/Lrp to compete with Dam for access to hemimethylated GATCdist sites following DNA replication (67).
The phase on-to-off transition, which occurs at an ~100-fold higher rate than the off-to-on transition (47), has not been analyzed in detail. Following DNA replication, cells in the phase on state contain a hemimethylated GATCprox site and a fully unmethylated GATCdist site. If Dam methylates GATCdist, binding of PapI/Lrp will be inhibited, providing an opportunity for Lrp binding at sites 1-3 due to release of mutual exclusion. Notably, binding of Lrp to site 2 is unaffected by methylation of GATCprox (52); therefore, the key step must be competition of Dam and PapI/Lrp for binding at site 5. Formation of the phase off DNA methylation pattern requires two rounds of DNA replication to convert a fully methylated GATCprox site to a fully unmethylated site.
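The consequence of the ~100-fold rate asymmetry for population composition can be worked out with a two-state model. The sketch below assumes that switching is a memoryless process with constant per-generation probabilities and that on and off cells grow equally well. The 100-fold ratio comes from the text, but the absolute off-to-on value (1e-4 per cell per generation) is a hypothetical placeholder, and the function names are illustrative.

```python
def on_fraction_steady_state(k_off_to_on, k_on_to_off):
    """Steady-state fraction of phase on cells in a two-state switch."""
    return k_off_to_on / (k_off_to_on + k_on_to_off)


def on_fraction_after(generations, k_off_to_on, k_on_to_off, start_on=0.0):
    """Iterate the on fraction generation by generation."""
    on = start_on
    for _ in range(generations):
        on = on + (1.0 - on) * k_off_to_on - on * k_on_to_off
    return on


K_OFF_TO_ON = 1e-4                  # hypothetical value, per cell per generation
K_ON_TO_OFF = 100 * K_OFF_TO_ON     # ~100-fold higher, as stated in the text

steady = on_fraction_steady_state(K_OFF_TO_ON, K_ON_TO_OFF)
from_all_off = on_fraction_after(2000, K_OFF_TO_ON, K_ON_TO_OFF, start_on=0.0)
from_all_on = on_fraction_after(2000, K_OFF_TO_ON, K_ON_TO_OFF, start_on=1.0)
print(f"steady-state on fraction: {steady:.4f}")
```

Under these assumptions the steady-state on fraction is 1/101, or about 1% of cells, whatever the starting composition of the population. The absolute rates affect only how quickly that fraction is approached.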
The on and off pap transcription states are each self-perpetuating and heritable. In the off state, GATCdist is fully methylated, preventing PapI/Lrp binding to sites 4-6 (Fig. 3A). Conversely, in the on state, PapI expression is high due to the PapB positive regulatory feedback, and GATCprox is fully methylated, preventing PapI/Lrp binding to sites 1-3 (Fig. 3D). In addition, it is likely that both the off and on states are stabilized by mutual exclusion (58).
The pap switch is modulated by additional transcription factors that are environmentally responsive, including H-NS, RimJ, and CpxR. Transcription of pap is blocked at 23°C by H-NS, which binds to the pap regulatory region and blocks GATC methylation (68). H-NS also modulates Pap switching at 37°C in response to high osmolarity and other environmental conditions (69,70). This may occur by altering PapI/Lrp binding to pap regulatory sites, but the mechanistic details are unknown. RimJ, which acetylates ribosomal protein S5, inhibits transition to the on state in response to temperature and other environmental conditions by an unknown mechanism (71). The CpxAR two-component regulatory system responds to cell envelope stress by phosphorylation of CpxR. Phosphorylated CpxR binds specifically to the pap regulatory region, competes with Lrp, and blocks pap transcription, which may protect cells from further cell envelope damage (72-74).

Other Switches Regulated by DNA Methylation Patterns
Many methylation-dependent phase variation systems have been identified since the initial discovery of the Pap system. Some of these systems, such as foo, clp, and pef, which all encode pili, are designed similarly to the pap switch (75-77). Remarkably, the latter two systems have a reversed architecture in which the PapI homologs ClpI and PefI act as negative regulators. Other methylation-controlled switches use DNA-binding proteins other than Lrp, including OxyR and Fur. The best characterized system is agn43, which controls the expression of antigen 43 (78,79), an outer membrane protein that plays a role in biofilm formation and pathogenesis (80,81). OxyR binds three GATC sites in the agn43 regulatory region. Binding of OxyR blocks methylation of the three GATC sites and inhibits agn43 transcription, forming the off phase. Transition to the on phase occurs following DNA replication if Dam can methylate both strands of the three GATC sites before OxyR rebinds to the sites (50,82,83). Notably, the poly(A) tracts adjacent to the GATC sequences in pap and its relatives are not present in agn43, and thus, Dam should processively methylate the three agn43 GATC sites if they are not bound by OxyR (84). The on-to-off switch can occur after DNA replication, when the three GATC sites are hemimethylated. OxyR has a higher affinity for agn43 DNA containing hemimethylated GATC sites versus fully methylated GATC sites (84,85). Thus, if OxyR binds to the GATC region before Dam fully methylates the GATC sites, a phase off intermediate state will ensue, and after one more round of replication to convert the hemimethylated GATC sites to fully unmethylated sites, the phase off transition will be complete. On-to-off transition is affected by the local concentration of OxyR; the addition of three or more OxyR-binding sites upstream of agn43 biases cells toward the off phase (84).
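The dependence of the on-to-off transition on local OxyR concentration can be sketched as a race between two events at hemimethylated agn43 DNA. The model below is a simplifying assumption, not a measured mechanism: OxyR binding and Dam methylation are treated as independent processes with exponentially distributed waiting times, and the OxyR binding rate is taken to be proportional to its local concentration. The function names, the rate constants, and the concentration units are hypothetical.

```python
import random


def p_oxyr_wins(oxyr_conc, k_bind=1.0, k_meth=1.0):
    """Probability that OxyR binds before Dam completes methylation.

    For two competing exponential processes, the chance that the first
    one fires earlier is rate_1 / (rate_1 + rate_2).
    """
    bind_rate = k_bind * oxyr_conc
    return bind_rate / (bind_rate + k_meth)


def simulate_race(oxyr_conc, n=20000, k_bind=1.0, k_meth=1.0, seed=0):
    """Monte Carlo check of the closed-form probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        t_bind = rng.expovariate(k_bind * oxyr_conc)  # time until OxyR binds
        t_meth = rng.expovariate(k_meth)              # time until Dam methylates
        if t_bind < t_meth:
            wins += 1
    return wins / n


for conc in (0.5, 1.0, 4.0):  # arbitrary concentration units
    print(f"[OxyR]={conc}: analytic {p_oxyr_wins(conc):.3f}, "
          f"simulated {simulate_race(conc):.3f}")
```

In this sketch, raising the local OxyR concentration increases the probability that OxyR wins the race, which agrees qualitatively with the bias toward the off phase observed when additional OxyR-binding sites are present.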
A number of phase variation switches appear to be regulated by mechanisms reminiscent of agn43. These include the gtr switch of bacteriophage P22 (86) and the chromosomal switch locus STM2209-STM2208 (opvAB) (87). Both switches control modification of cell surface lipopolysaccharide in Salmonella, and both are regulated by OxyR and Dam. In enteroaggregative E. coli, the sci1 type VI secretion system is controlled by a phase switch in which the iron regulatory protein Fur blocks Dam methylation of sci1 GATC sites, forming phase off and on methylation patterns (88).

Phasevarions: Formation of Epigenetic Lineages by Phase Variation of DNA Methylase Synthesis
Certain restriction-modification systems show phase variation, and a common mechanism for switching between off and on states is expansion and contraction of nucleotide repeats (89). Phase variation of restriction-modification systems may generate subpopulations of bacterial cells differing in their susceptibility to phage infection and in their ability to acquire foreign DNA. In addition, DNA adenine methylation by certain phase-variable restriction-modification systems regulates expression of specific genes (90). These systems, known as "phasevarions," conserve their restriction-modification activity but have additionally acquired epigenetic regulatory capacity (91,92). In some phasevarions, the gene encoding the restriction enzyme is inactivated by mutation, whereas the modification gene (mod) remains active. Hence, in these mutant type III restriction-modification systems, the Mod enzyme is a functional analog of solitary methyltransferases (e.g. Dam).
In the human pathogens Haemophilus influenzae, Neisseria meningitidis, and Neisseria gonorrhoeae, DNA adenine methylation by Mod enzymes has been shown to regulate gene expression, and the loci under Mod control include genes with roles in envelope structure, virulence, and stress responses (92). Because synthesis of Mod DNA methylase is phase-variable, isogenic populations contain two types of bacterial cells. One subpopulation contains N6-methyladenine in the genome, whereas the other subpopulation does not. As a consequence, each lineage shows a distinct pattern of gene expression that affects all DNA methylation-sensitive loci.
Whereas individual phase variation systems, such as pap and agn43, generate heterogeneity of a single phenotypic trait, cell lineages under phasevarion control differ in multiple phenotypic traits. The capacity of phasevarions to generate bacterial lineages may be further extended in bacterial species that contain multiple mod alleles, each with slightly different DNA-binding domains (92). Independent switching in the synthesis of several Mod proteins can be expected to generate multiple gene expression patterns, thus increasing the phenotypic heterogeneity of the population.
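The combinatorial effect of independent switching can be made explicit with a short enumeration. The sketch assumes that each mod allele switches independently between OFF and ON, so that n alleles give 2^n possible methylation states. The allele labels (mod1, mod2, mod3) and the function name are hypothetical placeholders, not names of real loci.

```python
from itertools import product


def methylation_states(mod_alleles):
    """List every ON/OFF combination for independently switching mod alleles."""
    return [
        dict(zip(mod_alleles, combo))
        for combo in product(("OFF", "ON"), repeat=len(mod_alleles))
    ]


states = methylation_states(["mod1", "mod2", "mod3"])  # hypothetical labels
for state in states:
    print(state)
print(f"{len(states)} possible lineages")  # 2**3 = 8
```

Each combination corresponds to a potential lineage with its own methylation-dependent expression pattern, so the number of possible lineages doubles with every additional independently switching allele.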

Hierarchical Epigenetic Networks
Phase variation of certain genetic loci causes bistable expression of other genes, extending phenotypic heterogeneity to cell functions encoded outside the phase variation locus. An example of this kind occurs in the Salmonella enterica std operon, which encodes fimbriae for attachment to the intestinal mucosa (93). Transcription of std is controlled by a LysR-like regulator known as HdfR and by two products of the std operon, StdE and StdF (94). Production of Std fimbriae in isogenic populations of Salmonella is subject to phase variation; the switching mechanism remains to be deciphered. However, it is well established that the StdE and StdF gene products regulate expression of genes outside the std operon, including the cluster of virulence genes known as Salmonella pathogenicity island 1, SPI-1 (95). Because SPI-1 expression is prevented by StdE/StdF, cells that produce Std fimbriae do not synthesize the SPI-1-encoded apparatus and vice versa (95). One may thus predict that phase variation of the std operon in the animal intestine will split Salmonella populations into two lineages, one able to invade the intestinal mucosa (causing acute disease) and one able to attach to the intestinal epithelium (causing latent infection). Depending on the host physiological conditions and the host response, one of the two subpopulations will be able to colonize the animal, whereas the other will be eliminated. Whatever the outcome, bet-hedging will increase the chances that a fraction of the Salmonella population survives. This model fits well with the view that colonization of animals by Salmonella involves subpopulation formation at several stages (26-28), and the same may be true for other human pathogens (25,96,97). Subpopulations may differ in their susceptibility to antibacterial drugs, thus explaining why certain bacterial infections are difficult or impossible to eradicate.