Flexibility and Disorder in Gene Regulation: LacI/GalR and Hox Proteins

To modulate transcription, a variety of input signals must be sensed by genetic regulatory proteins. In these proteins, flexibility and disorder are emerging as common themes. Prokaryotic regulators generally have short, flexible segments, whereas eukaryotic regulators have extended regions that lack predicted secondary structure (intrinsic disorder). Two examples illustrate the impact of flexibility and disorder on gene regulation: the prokaryotic LacI/GalR family, with detailed information from studies on LacI, and the eukaryotic family of Hox proteins, with specific insights from investigations of Ultrabithorax (Ubx). The widespread importance of structural disorder in gene regulatory proteins may derive from the need for flexibility in signal response and, particularly in eukaryotes, in protein partner selection.


LacI/GalR Proteins
The LacI/GalR family of transcription regulators comprises >4000 homologs, all of which are found exclusively in bacteria (10,12,13). The common structure of this family is a homodimer that contains one DNA-binding site and two binding sites for small-molecule, allosteric ligands (10). Some members form tetramers by a variety of mechanisms, whereas other homologs bind heteroproteins as part of the regulatory cycle (10). Fig. 1 (A-C) shows the tetrameric structure for the paradigmatic lactose repressor protein (LacI), which we use here to provide an overview of the flexible regions required for transcription regulation by LacI/GalR homologs.
First, a flexible linker connects the DNA- and ligand-binding domains (Fig. 1, A-C) (14,15). In ~60% of LacI/GalR homologs (13), this linker includes a conserved motif that forms a "hinge helix" in known structures. The side chains of the hinge helices interact with the minor groove at the center of the two DNA half-sites, bending the operator by ~45° (Fig. 1B) (14-17). In this complex, various linker side chains form specific, hydrophobic interactions with operator DNA; thus, the linker-DNA interactions appear to be critical for recognizing specific LacI/GalR operator sequences (14-17). For LacI, the hinge helices remain compact (and presumably folded) even when the LacI-operator complex is bound to its allosteric ligand, inducer IPTG (18). However, when bound to nonspecific DNA, the NMR structure of LacI DNA-binding domains/linkers shows that the hinge helix is unfolded (16). In the absence of any DNA, both NMR and small angle x-ray scattering of full-length LacI show high mobility for the N-terminal DNA-binding domain that accompanies unfolding of the hinge helix (18,19).
The second flexible region in the LacI/GalR proteins is a three-stranded "pivot" between the N- and C-subdomains of the regulatory domains (20,21). Changes at this pivot occur when small, allosteric ligands bind the regulatory domain. Binding therefore alters the juxtaposition of the N-subdomains, which "pulls" the hinge helices and provides a key mechanism for altering their orientation and contacts to DNA (14,21-23).
The third flexible region is unique to Escherichia coli LacI. This protein has an additional C-terminal sequence that comprises the highly stable tetramerization domain (Fig. 1, B and C). Flexible linkers join the tetramerization domain to the regulatory domain, allowing the angle between the two dimers to vary (18,24,25). For this region, freedom of motion is essential for DNA looping and is discussed further below.
The sequences and roles of these flexible regions vary significantly among LacI/GalR homologs to generate functional diversity (reviewed in Ref. 10). For example, differences in the pivot and N-subdomain interface can lead to alternative regulatory outcomes. LacI is inducible: binding its natural allosteric effector reduces DNA affinity and hence relieves repression of downstream genes (Fig. 1C). In contrast, PurR is repressible: binding its allosteric ligand enhances DNA binding and repression (15). In addition, for ~40% of homologs, the ~18-amino acid linker that connects the core domain to the DNA-binding domain appears to be completely disordered, lacking a hinge helix (13). Similar to eukaryotic intrinsically disordered proteins (26), the linker sequence in these proteins has a high density of charge and/or prolines, although the specific positions vary (Fig. 1D). In these homologs, disorder in the linker appears to have arisen to facilitate binding DNA operators with varied spacing between half-sites (Fig. 1E) (13,27).
Of the homologs with disordered linkers, E. coli CytR is the best studied. For high affinity DNA binding, CytR requires cooperative binding of flanking catabolite repressor proteins (CRPs) (10,28). The unfolded linkers in CytR allow its two N-terminal DNA-binding domains to bind operators with varied half-site spacing (Fig. 1E) (28). Notably, the disordered linkers do not propagate allosteric information to the DNA-binding domains as found for LacI. Instead, the conformational change precludes simultaneous binding to catabolite repressor protein and target DNA (29).
The range of functional differences among LacI/GalR family members illustrates how sequence changes in flexible protein regions can introduce functional variation without affecting the overall fold.

Hox Proteins
Within multicellular organisms, the family of Hox transcription regulators specifies the identities of many tissues (30,31). Each Hox homolog regulates a different set of target genes during development to specify cellular position within the organism (e.g. various head or cardiac substructures) and to determine cellular function (30). All Hox proteins contain (i) a conserved DNA-binding domain ("homeodomain") (32-35) and (ii) a hexapeptide motif that mediates interactions with the Exd/Pbx class of Hox co-factors (Fig. 2A) (31). Hox proteins also contain transcription activation and repression domains that influence functional specificity (e.g. Ref. 36). Large regions of the Hox proteins are intrinsically disordered, as reflected by sequence analyses, striking protease sensitivity, and challenges in protein purification (32) (Fig. 2B). Unlike the LacI/GalR homologs, both domain organization and the locations of regulatory sites (e.g. phosphorylation and splicing sites) vary considerably among Hox family proteins (Fig. 2A) (33,37-40).
Fig. 1 legend (fragment): [...] (14). Note that the hinge helix and N-terminal DNA-binding domain are not resolved in the presence of inducer, presumably due to flexibility that arises from hinge helix unfolding; the DNA-binding domain may also become less structured. D and E, linker sequence variation and DNA sites for subsets of LacI/GalR homologs that contain (top) and lack (bottom) the YPAL motif (13). In YPAL homologs, structures show that amino acids in positions P+1 through L+2 fold into an α helix (14,15,17). YPAL homologs recognize operators with contiguous DNA half-sites (panel E, top). Homologs that lack the YPAL motif (e.g. E. coli CytR) bind DNA with half-sites that are more widely spaced (panel E, bottom) (13,27). When examined individually, the sequences of non-YPAL linkers resemble those of intrinsically disordered proteins (13). Logos were created with WebLogo (95).
In all Hox proteins, the 60-amino acid DNA-binding homeodomain accounts for only a small fraction of the total sequence (Fig. 2A). Homeodomains contain three helices, the third of which binds the DNA major groove and is stabilized by the other two helices (34). At its N terminus, the homeodomain contains a dynamic, disordered "N-terminal arm" of 9 amino acids. In DNA-bound homeodomains, the N-terminal arm interacts with both bases and backbone phosphates in the DNA minor groove (34,35). Although the N-terminal arm never adopts a regular secondary structure in this complex, DNA interactions restrict its motion (35). The disordered N-terminal arm facilitates DNA sequence recognition by detecting small, sequence-specific variations in the phosphate positions (35,41). Finally, the N-terminal arm can also influence contacts between Helix 3 and the major groove (42). Both theoretical and experimental results reveal that binding affinity is highly influenced by the disordered N-terminal arm (e.g. Ref. 43).
One of the best-studied Hox proteins is Ultrabithorax (Ubx) from Drosophila melanogaster (Fig. 2B). The Ubx transcription activation domain is glycine-rich (33% versus 7% natural abundance generally in proteins), including 13 glycine residues in a row; not surprisingly, this region is extremely disordered (32,37). Genetic studies have identified numerous DNA sequences that are bound by Ubx in vivo. Biochemical studies of Ubx, one of the few full-length Hox proteins that have been purified, have provided a structure of its DNA-bound homeodomain and identified regions of Ubx that regulate DNA binding (30,32,34,44).
Most Hox proteins, including Ubx, have DNA target sequences that contain a 5′-TAAT-3′ sequence (5′-ATTA-3′ on the complementary strand) (Fig. 2C) (46,47). Despite the short length of this sequence, Ubx binds specific sites with high affinity (32,47). Disordered regions outside the homeodomain can profoundly impact DNA binding and sequence selection, providing an effective mechanism to diversify binding (32,44). As a consequence, full-length Ubx in vivo binds alternative DNA sequences with a much wider array of affinities than does the isolated Ubx homeodomain (44).
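To make concrete how little information a four-base core carries, the following minimal sketch scans both strands of a DNA sequence for the 5′-TAAT-3′ core. The function names and the example sequence are hypothetical and chosen only for illustration; the estimate of one match per 256 positions per strand assumes equal base frequencies and is not a measured genomic value.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_core_sites(seq: str, core: str = "TAAT") -> list[tuple[int, str]]:
    """List (0-based position, strand) for every occurrence of the core.

    A match to the reverse complement of the core on the given strand
    corresponds to the core itself on the opposite strand ("-").
    """
    seq = seq.upper()
    rc_core = reverse_complement(core)
    hits = []
    for i in range(len(seq) - len(core) + 1):
        window = seq[i:i + len(core)]
        if window == core:
            hits.append((i, "+"))
        if window == rc_core:
            hits.append((i, "-"))
    return hits


# Hypothetical example sequence, not taken from any real Ubx target.
example = "GGCTAATGGCCATTACG"
print(find_core_sites(example))  # prints [(3, '+'), (11, '-')]

# Assuming equal base frequencies, a 4-bp core is expected about once per
# 4**4 = 256 positions on each strand.
print(1 / 4 ** 4)
```

Under that uniform-base assumption, core matches would be very frequent in any genome, which is in keeping with the point made above that regions outside the homeodomain contribute to sequence selection.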

Flexibility Enables the Search for DNA-binding Sites
All transcription regulators must recognize their specific cognate DNA sequence among myriad nonspecific sites (48). The strategies used for this process are similar for prokaryotes and eukaryotes, although the latter environment is further complicated by the presence and packing of nucleosomes (49). Nevertheless, all regulatory proteins carry out this task more rapidly than predicted for diffusional search (50). For both prokaryotes and eukaryotes, combinations of sliding, hopping, intersegment transfer (brachiation), and looping yield the most efficient search process (51,52). As discussed further below, protein flexibility is key to several of these processes. Discerning the modes of transfer can be complex, giving rise to divergent views on search mechanisms (e.g. Ref. 52).

Sliding
Once a protein associates with nonspecific DNA, sliding reduces the dimensionality of the search and thereby enhances association rates for specific sites (Fig. 3A) (48,50,53). As a specific example from prokaryotes, in vivo experiments with LacI indicate that (i) sliding distances are ~45 bp before dissociation from DNA, consistent with theoretical analysis (52), and (ii) obstruction by other DNA-bound proteins occurs (54). The flexibility of the LacI hinge helices appears to be critical to the sliding process because these domains are unfolded when complexed with nonspecific DNA but folded in the operator-bound form in NMR studies (16).
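One simple way to picture sliding is as an unbiased one-dimensional random walk that ends when the protein dissociates. The sketch below is a toy lattice model, not the model used in the cited studies: the per-step dissociation probability `p_off` is a hypothetical parameter, set here so that the root-mean-square displacement is on the order of 45 bp, and the way "sliding distance" is defined experimentally may differ from this measure.

```python
import math
import random


def slide_once(p_off: float, rng: random.Random) -> int:
    """Simulate one sliding episode and return the net displacement in bp.

    Each step moves the protein 1 bp left or right with equal probability;
    after each step the protein dissociates with probability p_off (> 0).
    """
    position = 0
    while True:
        position += 1 if rng.random() < 0.5 else -1
        if rng.random() < p_off:
            return position


def rms_sliding_distance(p_off: float, episodes: int = 1000, seed: int = 0) -> float:
    """Root-mean-square displacement over many independent episodes."""
    rng = random.Random(seed)
    total = sum(slide_once(p_off, rng) ** 2 for _ in range(episodes))
    return math.sqrt(total / episodes)


# Hypothetical per-step dissociation probability (illustration only).
p_off = 1 / 2000
simulated = rms_sliding_distance(p_off)
# For this walk the mean number of steps is 1 / p_off, so the expected
# mean-square displacement is also 1 / p_off.
expected = math.sqrt(1 / p_off)
print(f"simulated rms = {simulated:.1f} bp, sqrt(1/p_off) = {expected:.1f} bp")
```

In this toy model the distance scanned grows only as the square root of the number of steps, which is one reason that sliding is combined with the other transfer modes described in this section.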
Despite the presence of chromatin structure, sliding is also effective in eukaryotes. For Hox homeodomains, the disordered N-terminal arms play key roles in sliding, with the length and charge of this region driving sliding dynamics (55). Electrostatic interactions dominate binding in the nonspecific complex (51,53), although the orientation and mode of homeodomain-nonspecific DNA interaction are otherwise similar to the specific complex (unlike other transcription factor families; e.g. Ref. 56).

Hopping
In this mode of transfer, proteins bind to DNA, dissociate, and then rebind DNA at another site (Fig. 3B) (53). The length of the "hop" may be quite short or can cover long distances (49). For both prokaryotic and eukaryotic transcription factors, hopping appears to increase the speed of the search process (51,53). In addition, hopping provides a mechanism for some eukaryotic transcription factors to bypass nucleosomes when sliding along DNA (49).

Intersegment Transfer/Brachiation
Intersegment transfer, also called "brachiation" (using appendages to swing from object to object), allows movement from one DNA segment to the next (Fig. 3C) (55). This mechanism is distinct from hopping and is more prominent at high concentrations of DNA, as found in vivo (53). Intersegment transfer facilitates searches over long stretches of DNA because regions that are far in sequence space can be close in cellular space (as occurs via extensive packing in many eukaryotic systems) (57). This mechanism requires that two segments of DNA be simultaneously bound by protein. Hence, at least two DNA-binding interfaces are needed on the protein, and sufficient protein flexibility is required (58). The two interfaces can be provided by multimeric assembly, by multiple DNA-binding domains within a monomer, or by monomers with a single, bipartite DNA-binding domain.
For tetrameric LacI, the two dimers provide the requisite two binding interfaces, and flexibility in the segments that link the regulatory domains to the C-terminal tetramerization domain allows variation in dimer orientation (18,24,25). For the homeodomain, the intrinsically disordered N-terminal arm, which binds the minor groove, and the third helix, which binds the major groove, provide the two protein-DNA interfaces (55).
This type of interaction accelerated the rate of target recognition by the HoxD9 homeodomain by more than 3 orders of magnitude (59). Thus, the flexibility of the N-terminal arm plus the flexible "joint" between the N-terminal arm and the helical portion of the homeodomain play a critical role in enhancing the rate of searching.

Looping
DNA looping occurs when regulatory proteins or their complexes simultaneously bind two DNA sites (Fig. 3D). Transient looping may occur during brachiation/intersegment transfer, but stable loops persist and impact transcription (60,61). For example, LacI looped complexes are significantly more stable than LacI bound at a single site (62). In eukaryotes, looping can place enhancers and promoters in direct physical contact (61). Loop formation is influenced by DNA sequence and/or the presence of ancillary proteins (61,63).
Fig. 3 legend (fragment): [...] and reassociate with the same or a different DNA in a "hopping" process. C, intersegment transfer and brachiation allow proteins with multiple binding sites to move from one DNA segment to another via transient contacts to both DNA strands. This type of movement can be accomplished by an oligomeric protein with two DNA-binding sites (e.g. LacI tetramer, left) or by a monomer with two separate DNA-binding regions within a single domain (e.g. Ubx homeodomain, right, green with flexible N-terminal arm in gold oval). D, stable loops can be formed when two DNA-binding domains simultaneously form specific complexes at DNA target sites. The two sites can be separated by stretches of DNA that vary widely in length. Prokaryotic looping (left) generally involves single proteins (e.g. LacI tetramer) or a protein assisted by a nearby DNA bending protein (e.g. two GalR dimers and the bending protein HU) (60,65). In contrast, multi-protein complexes at eukaryotic promoters (right) can be highly complex and comprise multiple loops of varying stability that can encompass up to ~10^6 bp (as in, for example, Ref. 97).
For E. coli tetrameric LacI, distances between target operator-binding sites can vary from hundreds to more than a thousand base pairs (62,64). The natural lac operon has a spacing of ~400 bp between operators O1 and O2 and ~100 bp between O1 and O3 (64). The distances between binding sites, as well as their relative rotation around the DNA helix, can greatly alter transcription (64). In addition, protein flexibility is critical to forming looped structures (24). For tetrameric LacI binding to two operators, the two dimers adopt an "open" conformation (i.e. the angle between the two dimers is increased relative to the crystal structure (18)). Chemically cross-linking LacI N termini across two dimers limits dimer-dimer mobility and precludes looping (24). An alternative approach to effect looping is utilized by the homolog GalR, which forms highly stable loops with the assistance of protein HU to facilitate DNA bending (65).
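The dependence of looping on the relative rotation of two operators around the DNA helix, noted above, can be illustrated with a short calculation. This is a minimal sketch that assumes a helical repeat of 10.5 bp per turn (a typical value for relaxed B-form DNA; supercoiling in vivo can shift it) and a hypothetical 45° tolerance for calling two sites "in phase". The spacings are invented for illustration, and the sketch ignores DNA and protein flexibility.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA, in bp per turn


def helical_phase_degrees(spacing_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation (0 to <360 degrees) between two sites spacing_bp apart."""
    return (spacing_bp % bp_per_turn) / bp_per_turn * 360.0


def same_face(spacing_bp: float, tolerance_deg: float = 45.0) -> bool:
    """True if the two sites lie within tolerance_deg of the same helical face."""
    phase = helical_phase_degrees(spacing_bp)
    return min(phase, 360.0 - phase) <= tolerance_deg


# Hypothetical center-to-center spacings, for illustration only.
for spacing in (84.0, 89.25, 94.5):
    print(spacing, round(helical_phase_degrees(spacing), 1), same_face(spacing))
```

With these assumptions, spacings of 84 and 94.5 bp (8 and 9 full turns) place both sites on the same face of the helix, whereas 89.25 bp (8.5 turns) places them on opposite faces, so a loop at that spacing would require additional twisting of the DNA or additional flexibility in the protein.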
The substantial intrinsic disorder found in eukaryotic regulatory proteins greatly facilitates loop formation. Many eukaryotic transcription regulators, including the Hox proteins, bind to clusters of DNA sites (66). Both side-to-side cooperative Hox binding to Hox-site clusters and back-to-back Hox-Hox interactions between two clusters can enable looped structures (Fig. 2C) (66). Hox proteins can either form loops themselves or recruit large protein complexes, such as the polycomb group proteins, the cohesin complex, and the condensin complex, to bridge distant DNA sequences (67) (Fig. 3D). In addition, Ubx binds other transcription factors that have their own DNA-binding sites near those of Ubx target DNA sequences (45,68). This arrangement provides opportunities for creating combinatorial loops that are sensitive to cellular conditions and allow response to cell-signaling stimuli. Importantly, the intrinsically disordered regions of Ubx are required for these heterologous protein interactions (69).

Regulatory Mechanisms Exploit Protein Flexibility/Disorder
Transcription regulation often requires that regulatory proteins alter their DNA binding in response to external signals. Both the LacI/GalR and the Hox proteins utilize flexibility and disorder to transmit this incoming information to the DNA-binding domain.

Allosteric Communication in the LacI/GalR Proteins
Fig. 4 legend (fragment): [...] (14). The flexibility of these regions is critical to strong transcription repression and allosteric regulation. B, the flexible interface between the LacI/GalR linkers and regulatory domains can facilitate allosteric response to multiple ligands (79). The LacI DNA-binding domain/linker can be fused to the regulatory domains of other homologs (shown as schematic dimers) to create functional chimeric repressors (red bars in graph) with intact allosteric response to small effector ligands (blue bars). Note that the "LLHP" chimera has the opposite allosteric response of LacI (79). DEL control indicates the activity of reporter enzyme in the absence of repressor. C, regions within Ubx important for modulating its DNA binding. The upper red bracket indicates the Ubx region that contains phosphorylation sites (38). Blue brackets below indicate sequences that interact with Ubx partner proteins (69). Yellow regions are intrinsically disordered (32). The gold and brown striped region is both spliced and disordered (32,38). The conserved hexapeptide motif important for Exd interaction is dark gray, and the HD is black. D, example structures for three families of proteins that interact with Ubx. For the six partners of the DNA/RNA-binding three-helical bundle family, the engrailed homeodomain is shown (PDB file 1ENH) (98); for the five partners of the α-α superhelix family, β-catenin is shown (PDB file 1QZ7) (99); and for the six partners of the zinc finger C2H2/C2H2 family, Zif268 is shown (PDB file 1A1I) (100). Intrinsic disorder of Ubx regions may be key for recognizing this wide range of partners (69).
Effector binding to LacI/GalR proteins impacts several flexible regions. For E. coli LacI, structures of free, DNA-bound, and IPTG-bound protein (14,70), along with molecular dynamics simulations (23,71,72), have been used to study these adaptable regions. The largest changes are found in the linker region and in the N-subdomain interface of LacI (14,23,70). Inducer binding alters the juxtaposition of the LacI N-subdomains to bring them into closer contact (Fig. 4A) (73). The required flexibility in this region has been explored via mutagenesis (74). A key residue is Lys-84, which is buried within the otherwise apolar interface between the N-subdomains and changes position between the bound and unbound structures (14). When Lys-84 was substituted with Leu or Ala, the allosteric response was diminished to ≤10-fold (as compared with >10^4-fold for wild type), the kinetics of inducer binding were greatly slowed, and protein stability was significantly enhanced (74,75).
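To put the difference between a 10-fold and a 10^4-fold allosteric response on an energy scale, the fold changes can be converted with the standard relation ΔΔG = RT ln(fold change). The sketch below is illustrative only: it assumes that the reported allosteric response can be read as a ratio of operator-binding affinities in the absence and presence of inducer, which the text above does not state explicitly, and it assumes a temperature of 298.15 K. The function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change_to_ddg(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold change in affinity."""
    return R_KCAL * temperature_k * math.log(fold_change)


# Lower bound for wild type, because the response is >10^4-fold.
wild_type = fold_change_to_ddg(1e4)
# Upper bound for the Lys-84 variants, because the response is <=10-fold.
variant = fold_change_to_ddg(10)

print(f"wild type: > {wild_type:.1f} kcal/mol")
print(f"Lys-84 variants: <= {variant:.1f} kcal/mol")
```

Under these assumptions, the wild-type response corresponds to more than about 5.5 kcal/mol of coupling between inducer binding and operator binding, and the Lys-84 variants retain at most about 1.4 kcal/mol.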
The motion at the N-subdomain also alters the linker/hinge helix of LacI. The point of closest approach between the two linkers of a dimer is the side chain of Val-52. When this residue was mutated to cysteine, a disulfide bond could be formed that blocked allosteric response to inducer binding (76). Other substitutions at position 52 showed that extrinsic interactions, such as interactions with operator DNA, had more influence on LacI function and allosteric response than did the intrinsic propensity of amino acids for folding the hinge helix (77). The length of the linker region is also important. When 1-3 Glu residues were inserted after the hinge helix, LacI showed progressive decreases in DNA binding affinity and allosteric response (78). Thus, this flexible linker region must be precisely positioned (i) to allow communication between the DNA-binding and regulatory domains and (ii) to align the DNA-binding domains within each dimer.
Nevertheless, linker flexibility facilitates tolerance to significant sequence diversity (Fig. 4B). In fact, fully functional hybrids were created by fusing the LacI DNA-binding domain/linker to regulatory domains from other homologs. Each chimera has the DNA binding specificity of LacI, ligand binding of the parent regulatory domain, and allosteric response defined by the regulatory domain (79). Thus, the interface between the linker and regulatory domains is highly adaptable.

Hox Regulatory Mechanisms
Prokaryotic repressors are generally designed to respond to a limited number of signals, often only one. In contrast, eukaryotic Hox proteins integrate multiple input signals to generate highly specific outcomes unique to the tissue and organism (80). Further, these proteins must differentiate a plethora of DNA sites with both cellular and tissue specificity (30). To that end, many Hox proteins have several splice isoforms (e.g. Refs. 44 and 81), a variety of modification sites (e.g. phosphorylation) (38,82,83), and a number of protein partners (Fig. 4C) (31,45,68). These regulatory mechanisms are frequently used to diversify the functions of transcription factors (84). Although these processes typically occur within intrinsically disordered regions, their locations vary among Hox proteins.
In Ubx, all of the regulatory processes are associated with intrinsically disordered regions that also regulate DNA binding (32,38,44). To provide an example in each category: (i) when Ubx interacts with partner protein DIP1 via these disordered regions, Ubx transcription activation is precluded in vivo (68); (ii) the conserved hexapeptide, which alters DNA binding specificity, and the homeodomain are connected by a disordered linker that varies from 7 to 50 amino acids in length in alternatively spliced isoforms, with the result that Ubx splicing isoforms regulate different genes and construct different tissues in vivo (39,40,85); and (iii) Ubx is phosphorylated within the disordered region of the transcription activation domain in a tissue-specific manner, suggesting a regulatory function (37,38).
The disordered regions also mediate Ubx binding to a variety of heteroprotein partners, a critical element in Hox protein function (45,68,69,86). Hox proteins bind to components of the transcription machinery (45,87), as well as to other specific transcription factors, to facilitate Hox regulation of the correct subset of genes in different tissues (Fig. 4D) (45,88,89). For Ubx partners identified by yeast two-hybrid methods, two key elements have emerged: (i) binding to many of these partners requires the disordered regions within Ubx and (ii) partners can be classified into specific "folds" (69). Indeed, of the selected topologies, three folds include at least five Ubx partners, jointly representing more than half of known Ubx partner proteins (Fig. 4D). Different structural families preferentially bind different disordered segments and splice isoforms of Ubx (69).
These regulatory mechanisms can influence one another (80). For example, alternative splicing impacts Hox binding to other proteins (39,69). Likewise, phosphorylation of Hox proteins can impact protein interactions and cooperative DNA binding (90). Thus, regions that exhibit intrinsic disorder have the potential to integrate multiple sources of information to regulate and coordinate Hox functions.
Interestingly, the various disordered regions of Ubx can be distorted to allow formation of biomaterials (91). Deleting the disordered regions precludes self-assembly (92,93). Two consequences of intrinsic disorder have the potential to make these materials commercially useful: (i) Ubx fibers are remarkably strong and extensible (91,92) and (ii) the disordered regions allow fiber formation to accommodate a wide range of other proteins fused to the Ubx sequence (94).

Conclusion
Although prokaryotic and eukaryotic proteins exhibit many unique features, flexibility has emerged as key to transcription regulation in both kingdoms. This feature of proteins permits regulatory proteins to adapt to varied spacing among DNA-binding sites and to engage multiple mechanisms of searching for and binding to DNA target sites. Flexibility allows the variety of protein interactions required to construct complex DNA structures such as loops, either by direct binding or through interactions with other proteins. Finally, flexibility and, in some cases, extensive disorder are required for regulation of transcription factor function through allosteric ligand binding, protein sequence alterations (splicing and/or posttranslational modifications), and/or protein-protein interactions. The multiple modes by which flexibility enables transcription regulation generate both diverse and highly effective mechanisms for an organism to respond to a varied local cellular environment as well as features essential for the development and function of multicellular organisms.
Acknowledgments: We express our appreciation for molecular models derived from simulations provided by Justin Drake and B. Montgomery Pettitt, University of Texas Medical Branch-Galveston.