Dancing Protein Clouds: The Strange Biology and Chaotic Physics of Intrinsically Disordered Proteins*

Biologically active but floppy proteins represent a new reality of modern protein science. These intrinsically disordered proteins (IDPs) and hybrid proteins containing ordered and intrinsically disordered protein regions (IDPRs) constitute a noticeable part of any given proteome. Functionally, they complement ordered proteins, and their conformational flexibility and structural plasticity allow them to perform impossible tricks and be engaged in biological activities that are inaccessible to well folded proteins with their unique structures. The major goals of this minireview are to show that, despite their simplified amino acid sequences, IDPs/IDPRs are complex entities often resembling chaotic systems, are structurally and functionally heterogeneous, and can be considered an important part of the structure-function continuum. Furthermore, IDPs/IDPRs are everywhere, and are ubiquitously engaged in various interactions characterized by a wide spectrum of binding scenarios and an even wider spectrum of structural and functional outputs.

"Dancing protein clouds" is a joke from a time when the newly born field of protein intrinsic disorder was trying to find an appropriate term to describe biologically active proteins without unique structures. The need for a specific term was determined by the clear recognition that those structureless functional proteins were fundamentally different from the "normal" globular proteins that used information encoded in their amino acid sequences to fold into specific, aperiodic crystal-like structures needed for specific biological functions. Fig. 1 reflects these attempts by representing different terms used in literature to describe such "strange" or "abnormal" proteins and shows that the "dancing protein clouds" expression is formed by superimposing "dancing proteins" and "protein clouds" descriptors. Although the phrase "dancing protein clouds" sounds like a parody, the term actually has deep meanings. The presence of a unique structure in a given protein means that when one would look at the sample containing this protein, s/he would find that all protein molecules are alike, that the structure of an individual molecule barely changes over time, and that the ensemble-averaged (or time-averaged) structure is identical, or at least very similar, to the structures of all individual protein molecules in that sample. In other words, if one would overlay all those individual structures, a crisp and clear image would be generated, similar to those found in the Protein Data Bank (PDB), and this ensemble-averaged structure would not change much over time. On the other hand, the lack of a unique structure in a given protein would create a highly dynamic ensemble, members of which would possess very different structures at any given moment, and the structure of any given molecule would change dramatically over time (therefore the "dancing protein" analogy). 
If one would try to overlay all those structures, all those dancing protein molecules, a cloudlike, fuzzy entity would be generated, and the shape of this cloud would not be static, dramatically changing with time and in response to subtle environmental perturbations. Therefore, the dancing protein cloud analogy describes the structure and functionality of an intrinsically disordered protein (IDP) or an intrinsically disordered protein region (IDPR), which are both known to be strongly dependent on environmental conditions and can change dramatically due to subtle disturbances. To some extent, an IDP resembles a ball in unstable equilibrium at the top of a hill, which has limitless possibilities for movement and would follow very different trajectories, depending on how it is disturbed and pushed away from the top of the hill. Therefore, this dancing protein cloud is a chaotic entity that illustrates a dynamical system with high sensitivity to initial conditions, and thereby can be considered a subject of chaos theory. Fig. 2 provides further illustration of the remarkable similarity between the dynamic conformational behavior of an IDP (neuroligin cytoplasmic domain (1)) and the behavior of a simple chaotic system known as the Lorenz attractor (2-4). The Lorenz attractor serves as a classic example of a non-linear dynamic system; it was developed as a simplified mathematical model of atmospheric convection that considers two-dimensional flow of a fluid subject to differences in temperature and gravity (2,3). Fig. 2 shows that, similar to the Lorenz attractor, an IDP neither converges to a steady state nor diverges to infinity, but stays in a limited but chaotically defined region.
* This work was supported in part by Russian Science Foundation RSCF Grant
Because the IDP structure-function universe is vast and is a subject of multiple recent reviews, this minireview describes only a few subjectively chosen aspects of disorder. 
In my view, the chosen topics represent some of the hot spots of the modern disorder-related research.
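The sensitivity to initial conditions invoked in the Lorenz attractor analogy can be made concrete with a short numerical sketch. The code below is only an illustration, not the procedure used to produce Fig. 2: the parameter values (sigma = 10, rho = 28, beta = 8/3), the fourth-order Runge-Kutta integrator, the step size, and the starting points are all assumptions chosen for the example.

```python
# Minimal sketch: two Lorenz trajectories that start 1e-8 apart.
# Parameter values, integrator, step size, and start points are assumptions.

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three coupled Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(start, steps=4000, dt=0.01):
    """Return the list of states visited from the given starting point."""
    states = [start]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt))
    return states


def separation(a, b):
    """Euclidean distance between two states."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


run_a = trajectory((1.0, 1.0, 1.0))
run_b = trajectory((1.0 + 1e-8, 1.0, 1.0))  # tiny perturbation of x only

distances = [separation(p, q) for p, q in zip(run_a, run_b)]
largest_coordinate = max(abs(c) for state in run_a for c in state)

print("initial separation:", distances[0])
print("largest separation:", max(distances))
print("largest coordinate reached:", largest_coordinate)
```

With these settings the two runs should drift apart by many orders of magnitude while every coordinate stays bounded, which is the "neither converges nor diverges" behavior described above.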

Complexity from Simplicity: Peculiarities of Amino Acid Sequences of IDPs/IDPRs
To fold or not to fold? For a protein, the answer to this almost Hamletian question is scripted in its amino acid sequence (5-10). Early analysis of extended IDPs, which are (almost completely) unfolded at physiologic conditions, revealed that these proteins do not have stable structures due to the presence of numerous uncompensated charged groups and a low content of hydrophobic amino acid residues. This indicates that the combination of low mean hydropathy and relatively high net charge represents an important prerequisite for the absence of compact structure in proteins under physiological conditions (10). At a more detailed level, IDPs/IDPRs are significantly depleted in order-promoting residues (Cys, Trp, Tyr, Phe, Ile, Leu, Val, and Asn), and are instead enriched in disorder-promoting residues (Pro, Arg, Gly, Gln, Ser, Glu, Lys, and Ala) (6, 9, 11-13). In other words, amino acid sequences of IDPs and IDPRs are simpler than those of ordered proteins and domains and have smaller information volume. However, this sequence simplicity is translated into a vastly extended sequence space and related structural complexity. In fact, due to the removal of restrictions posed by the need to gain ordered structure spontaneously, the sequence space of IDPs/IDPRs is noticeably greater than that of foldable ordered proteins and domains (14). Furthermore, although many IDPs/IDPRs can (at least partially) fold upon binding to their partners, their folding code, i.e. the ability to spontaneously gain a unique biologically active structure, is noticeably reduced. Although a portion of such folding code is missing for IDPs, it can be supplemented by their binding partner(s) (14). Curiously, because different binding partners could provide drastically different complementary parts of a folding code, an IDP/IDPR can fold differently upon binding to different partners (15,16). 
Based on these considerations, it has been hypothesized that IDPs/IDPRs should be considered as "edge of chaos" systems that operate in the boundary between order and complete randomness or chaos, i.e. in the region where the complexity is maximal (14).
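The sequence features discussed above (mean hydropathy, mean net charge, and the content of order- and disorder-promoting residues) can be computed directly from a one-letter amino acid sequence. The following is a minimal sketch under stated assumptions: the Kyte-Doolittle hydropathy scale rescaled to the 0-1 range is used as the hydropathy measure, His is treated as neutral, and the two toy sequences are hypothetical. The function name `sequence_profile` is introduced here purely for illustration.

```python
# Minimal sketch: composition-based descriptors of a protein sequence.
# Assumptions: Kyte-Doolittle hydropathy rescaled to 0..1; Lys/Arg = +1,
# Asp/Glu = -1, His neutral; the example sequences are hypothetical.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("CWYFILVN")
DISORDER_PROMOTING = set("PRGQSEKA")


def sequence_profile(sequence):
    """Return mean hydropathy, mean net charge, and class fractions."""
    residues = [r for r in sequence.upper() if r in KYTE_DOOLITTLE]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard residues")

    # Rescale each Kyte-Doolittle value from [-4.5, 4.5] to [0, 1].
    hydropathy = sum((KYTE_DOOLITTLE[r] + 4.5) / 9.0 for r in residues) / n
    positive = sum(r in "KR" for r in residues)
    negative = sum(r in "DE" for r in residues)

    return {
        "mean_hydropathy": hydropathy,
        "mean_net_charge": abs(positive - negative) / n,
        "order_promoting": sum(r in ORDER_PROMOTING for r in residues) / n,
        "disorder_promoting": sum(r in DISORDER_PROMOTING for r in residues) / n,
    }


charged_toy = sequence_profile("PEKSEEKPSEEKGSE")      # hypothetical IDP-like
hydrophobic_toy = sequence_profile("VILFWCVIYLFNIVL")  # hypothetical core-like

for name, profile in (("charged", charged_toy), ("hydrophobic", hydrophobic_toy)):
    print(name, {key: round(value, 3) for key, value in profile.items()})
```

A sequence that combines a low `mean_hydropathy` with a high `mean_net_charge` falls on the side of the charge-hydropathy space associated with extended disorder; where exactly the boundary lies is deliberately left out of this sketch.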

Structure-Function Continuum of Proteins: Structural Heterogeneity and Functional Multifariousness of IDPs
IDPs and IDPRs are heterogeneous and fractal at multiple levels. Globally, they can be compact or extended, and their major structural components can be heterogeneous as well. IDPs/IDPRs can therefore be described as different combinations of foldons (independent foldable units of a protein), inducible foldons (disordered regions that can fold at least in part due to the interaction with binding partners), non-foldons (non-foldable protein regions), semi-foldons (regions that are always in a semi-folded state), and unfoldons (regions that undergo an order-to-disorder transition to become functional) (14). This structural heterogeneity defines the multifariousness of disorder-based functions and suggests that, instead of being based on the one protein-one structure-one function concept that describes the functionality of ordered proteins and domains, there is a structure-function continuum (14), where for a given protein biological functions may arise from a specific disordered form, from inter-conversion between disordered forms, and from disorder-to-order or order-to-disorder transitions (6). As a result, IDPs/IDPRs can do a lot (9, 17-19), and this "a lot" complements the functions traditionally ascribed to ordered proteins (6,9,18).

Useful Decorations: Endless Increase in Functionality with Posttranslational Modifications
The functions of many proteins, especially those lacking unique structures, are modulated, controlled, and extended by various posttranslational modifications (PTMs) that range from enzymatic cleavage of peptide bonds to covalent additions of particular chemical groups, lipids, carbohydrates, or even entire proteins to amino acid side chains. Although DNA typically encodes 20 primary amino acids, proteins contain more than 140 different residues because of various PTMs that can occur at any stage of the protein life (but always after protein biosynthesis), which extends the range of amino acid structures and properties, thereby diversifying the possible structures and functions of proteins (20). It is believed that as many as 300 PTMs can occur physiologically (21). Some PTMs are readily reversible, with the tightly controlled interplay between modifying and demodifying enzymes being used for rapid and economical control of their functions (22). Therefore, it is not surprising that as much as 5% of the genomes of higher eukaryotes is expected to encode PTM-related enzymes (21). Among the most common PTMs are: specific cleavage of precursor proteins; formation of disulfide bonds; covalent addition/removal of low molecular weight groups; and covalent attachment of large biological molecules, as seen in ubiquitination and SUMOylation (23). Furthermore, some proteins require multiple different PTMs for their function. For such multi-PTM proteins, modified sites in proteins can not only mediate individual functions, but can also function together to fine-tune molecular interactions and to modulate overall protein activity and stability (24).
Although all amino acids can be subjected to PTMs, PTMs are usually found at the side chains that act as either strong (Cys, Ser, Thr, Tyr, Lys, His, Arg, Asp, and Glu) or weak (Met, Asn, and Gln) nucleophiles, whereas the remaining residues (Pro, Gly, Leu, Ile, Val, Ala, Trp, and Phe) are rarely involved in the covalent modifications of their side chains (20,22). Curiously, there is a significant overlap between the sets of modifiable and disorder-promoting residues (with the noticeable exception of Cys and Tyr), whereas the majority of non-modifiable residues (except for Pro) are order-promoting. In agreement with these general observations, phosphorylation (18,25,26), acetylation, acylation, protease digestion, methylation, ubiquitination, and some other PTMs were shown to preferentially occur in IDPRs (18,25,27,28). Therefore, IDPRs serve as carriers of PTMs, with disorder being especially important for regions undergoing multiple PTM events (29).
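The overlap between modifiable and disorder-promoting residues can be checked with simple set arithmetic on the residue lists given in this minireview. This is only a bookkeeping sketch: the rarely modified class is taken here as Pro, Gly, Leu, Ile, Val, Ala, Trp, and Phe (an assumption that makes the three classes cover all 20 residues exactly once), and the variable names are illustrative.

```python
# Minimal sketch: set overlaps between PTM-prone and disorder/order-promoting
# residues, using the residue lists quoted in the text.
# Assumption: the rarely modified class contains Leu, so the three
# nucleophile classes together cover all 20 standard residues.

STRONG_NUCLEOPHILES = {"Cys", "Ser", "Thr", "Tyr", "Lys", "His", "Arg", "Asp", "Glu"}
WEAK_NUCLEOPHILES = {"Met", "Asn", "Gln"}
RARELY_MODIFIED = {"Pro", "Gly", "Leu", "Ile", "Val", "Ala", "Trp", "Phe"}

DISORDER_PROMOTING = {"Pro", "Arg", "Gly", "Gln", "Ser", "Glu", "Lys", "Ala"}
ORDER_PROMOTING = {"Cys", "Trp", "Tyr", "Phe", "Ile", "Leu", "Val", "Asn"}

modifiable = STRONG_NUCLEOPHILES | WEAK_NUCLEOPHILES
all_residues = modifiable | RARELY_MODIFIED

# Residues that are both frequently modified and disorder-promoting.
modifiable_and_disordered = sorted(modifiable & DISORDER_PROMOTING)

# Residues that are both rarely modified and order-promoting.
rarely_modified_and_ordered = sorted(RARELY_MODIFIED & ORDER_PROMOTING)

print("classes cover", len(all_residues), "residues")
print("modifiable and disorder-promoting:", modifiable_and_disordered)
print("rarely modified and order-promoting:", rarely_modified_and_ordered)
```

With these lists, five of the eight disorder-promoting residues fall into the modifiable class, and five of the eight rarely modified residues are order-promoting.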

Binding Promiscuity: Never-nude Proteins
IDPs and IDPRs are promiscuous binders, able to participate in interactions with various binding partners via the one-to-many and many-to-one binding scenarios (6,9,19,22), and many hub proteins (i.e. proteins that contain numerous links within the protein-protein interaction networks) are IDPs (30-35). It seems that many IDPs/IDPRs are always involved in interaction. Such always-complexed, never-nude proteins are never alone, instead invariably and habitually interacting with partners that could be different at different time points or at different cellular locations.

Polyvalent Interactions: Polybivalent Scaffolds, Polyvalent Wrappers, and Everything in Between
Polyvalent interactions refer to the simultaneous binding of multiple binding sites of one protein to multiple receptors on another protein (36). There are at least two different interaction modes that define the formation of such polyvalent complexes, namely semi-static and dynamic (37). One of the illustrative examples of such interactions is given by IDP/IDPR wrapping around the binding partner, which produces a polyvalent complex, where several disjoint ordered segments of an IDP/IDPR bind to disjoint and spatially distant binding sites on the surface of an ordered partner (38). In such complexes, ordered segments of flexible wrappers are connected by flexible linkers and have almost no intramolecular contacts, instead forming very intensive intermolecular contacts with a binding partner (38). Another important illustration of the disorder-based polyvalent interactions is given by bivalent and polybivalent scaffolds (39). Here, two monovalent IDPs can form a bivalent macromolecular scaffold via interaction with a dimeric protein with two symmetrical binding sites or by self-association (39). Because such bivalent scaffolds can bind additional bivalent ligands, a polybivalent scaffold that promotes subsequent self-association and/or higher order organization of the IDP components can be created (39).
In addition to the aforementioned semi-static and fuzzy polyvalent complexes, IDPs can be involved in the formation of highly dynamic shuffle complexes illustrated by the tri-partite interaction of the intrinsically disordered C-terminal distal tail of the human Na+/H+ exchanger 1 (hNHE1cdt) with the inactive extracellular signal-regulated kinase-2 (iaERK2). Here, several hNHE1cdt binding sites are involved in concomitant, noncooperative, tri-partite interaction with iaERK2, where the hNHE1cdt sites do not affect each other and do not cooperate to increase the overall affinity, but "shuffle" dynamically, being sometimes off, sometimes on, thereby functioning similarly to holding a hot potato, by analogy to the 40-year-old hot potato hypothesis proposed by Perham (41) to describe channeling of substrates and intermediates in multienzyme complexes.
FIGURE 2 (legend, partial). ... (4), which is defined by a set of three nonlinear interdependent equations that were originally defined to describe the weather (2,3). This variable changes stochastically, and its stochastic changes clearly resemble the time dependence of the FRET efficiency describing the conformational dynamics of the neuroligin cytoplasmic domain. In fact, similar to IDPs, this system is extremely sensitive to initial conditions (butterfly effect). C, the phase-space representation of the behavior of the variable in the Lorenz attractor (4). Here, the variable is plotted against its rate of change, generating the characteristic loops reflecting the presence of a strange attractor (2,3). Note the resemblance of the shape of this plot to a butterfly. Such a shape indicates that the trajectories of the chaotic system converge onto an infinitely complicated shape, known as an attractor. Trajectories on this attractor that start close together diverge rapidly as time passes, but remain confined to the attractor (4).

Binding-induced Folding and Unfolding Transitions
As a result of interaction with specific binding partners, IDPs/IDPRs can fold permanently or transiently. Some IDPs can adopt differently folded conformations when bound to different partners. Other IDPs preserve significant disorder in their bound states. These various possibilities are schematically represented in Fig. 3 and are briefly discussed below.

Binding-induced Folding: Molecular Glue, Molecular Mortar, and Molecular Epoxy
Many IDPs/IDPRs are known to undergo function-induced disorder-to-order transitions (18,25,42). The degree of this binding-induced folding varies in different systems, giving rise to the broad structural and functional heterogeneity of the resulting complexes. One extreme case is given by IDPs serving as molecular glue/mortar (18). For example, in the structure of the Haloarcula marismortui ribosome, ribosomal proteins use their IDPRs as molecular mortar to fill the gaps and cracks between the rRNA loops (43). On the other hand, many protein complexes are formed via the mutual folding associated with the two-state complexation mechanism, where the protomers, being intrinsically disordered in their uncomplexed form, have to undergo at least partial binding-induced folding upon complex formation (44-46). Because in such cases unbound forms of both protomers are disordered or "liquid" and because these complexes rigidify after binding, such systems can be described as molecular epoxy.

Binding-induced Transient Folding: Disorder-controlled Dynamic "On-Off" Switches
In addition to the formation of stable or static complexes via the molecular mortar/epoxy mechanisms, IDPs/IDPRs serve as a crucial foundation for the dynamic signaling interactions of the "on-off" switch type because they can bind partners with both high specificity and low affinity (47). This means that the regulatory interactions can be specific and also can be easily dispersed (9,48). Obviously, this disorder-controlled "on-off" switch represents a cornerstone of signaling, where turning a signal off is as important as turning it on (6,48).

Binding-induced Folding Divergence: Morphing Shape-changers
The intrinsic plasticity of IDPs/IDPRs allows them to adopt different structures upon binding to different partners (5, 30, 42, 49-51). This possibility is illustrated by the morphing MoRFs concept. MoRFs are molecular recognition features that are defined as short, interaction-prone, (partially) structured fragments of IDPs and IDPRs that easily undergo disorder-to-order transitions upon binding to globular partners (52-54). Structurally, MoRFs are classified according to their structures in the bound state, where α-MoRFs form α-helices, β-MoRFs form β-strands, and ι-MoRFs form structures without a regular pattern of backbone hydrogen bonds (52,54). In its bound state, a MoRF constitutes a short, contiguous, (partially) structured segment fitted into a groove at the surface of the ordered partner. Morphing MoRFs correspond to a subset of MoRFs characterized by the polymorphism of their bound states, where a bound region adopts completely different geometries in the rigidified structures induced by the binding to its partner, depending on the nature of the bound partner (15,16,19,55,56).

Binding and Non-folding: Stochastic Machines
Various disorder-related activities do not directly involve coupled binding and folding of IDPs/IDPRs, but rather are dependent on the flexibility, pliability, and plasticity of their backbone. These are so-called entropic chain activities, as they rely entirely on an extended random-coil conformation of a polypeptide that maintains flexibility while carrying out function (9,69). Obviously, such entropic chain activities can be found not only in standalone IDPs, but also in some dynamic signaling complexes (70). Fig. 4 shows a founding member of this class of protein machines, which includes the axis inhibition (axin) protein responsible for the colocalization of β-catenin, casein kinase Iα (CKI-α), and glycogen synthase kinase 3β (GSK3β), resulting in the formation of a highly dynamic complex crucial for the Wnt signaling pathway (70). In this stochastic machine, β-catenin, CKI-α, and GSK3β bind to distant sites of the very long IDPR of axin to form a complex consisting of structured domains connected by long flexible linkers (55, 70-72). This stochastic machine works not by coordinated conformational changes, but by stochastic, uncoordinated movements of the long disordered linkers (flexible arms), which are in constant chaotic motion but eventually enable productive collisions of β-catenin with kinases, leading to phosphorylation of this protein (70). Because the human proteome contains hundreds of axin-like proteins, it has been hypothesized that the "stochastic machine" represents a common mechanism of action of disordered protein complexes (70).

Binding-induced Unfolding: Awakening Dormant Disorder
Recent studies indicated that a path to function from disorder to order is not a one-way street, but a bidirectional road. In fact, functions of some ordered proteins rely on local or even global functional unfolding, which is induced and transient in nature (73). This conditional (74) or transient disorder (75), also known as cryptic or dormant disorder, functional induced unfolding (73), and regulated unfolding (76), can be awakened by a wide spectrum of environmental factors, such as changes in pH, temperature, the redox potential, mechanical force, or light exposure, or via specific interactions of a protein with its environment, such as interactions with membranes, ligands, other proteins, nucleic acids, or various posttranslational modifications, or the release of autoinhibition (73). The function-related changes in these conditionally disordered proteins are induced by transient alterations in their environment or by modification of their structures, and they are reversed as soon as the environment is restored or the modification is removed (73). Therefore, the concept of ordered proteins with dormant functional disorder (i.e. proteins that have unique structures but are not functional unless (partially) unfolded) challenges the viewpoint that intrinsic disorder could be biologically irrelevant because IDPs/IDPRs have to undergo disorder-to-order transitions either during their functions or to become functional, and being bound, they are not too different from "normal" ordered proteins.

Pliable Complexes: Disorder for Internal and External Uses
As it follows from the aforementioned observations, intrinsic disorder plays a number of important roles in organization, maintenance, and control of protein complexes (37). It has also been emphasized that protein complexes utilize two different types of functional disorder: internal, i.e. disorder used for assembly, movement, and functional regulation of the different parts of a given complex; and external, i.e. disorder used by a complex for interaction with the external regulators (37). Irrespective of this internal/external classification, intrinsic disorder has three global functional implications in protein complexes, where it is used for structural, functional, and regulatory purposes (37).

Insuperable Attraction: Membraneless Organelles
On the opposite side of all the disorder-based protein complexes briefly described in the previous sections, IDPs and hybrid proteins with long IDPRs can be involved in the formation of various membraneless organelles via the intracellular liquid-liquid phase separations (77). As follows from their name, these membraneless organelles are devoid of membranes, with their components being directly involved in contact with the surrounding nucleoplasm or cytoplasm (78,79). These cytoplasmic and nuclear organelles are highly dynamic entities and are formed due to the colocalization of molecules at high concentrations within a small cellular or nuclear microdomain (78,79), leading to the intracellular phase transitions (77,80), which may be triggered by changes in the concentration of IDPs, changes in the concentrations of specific small molecules or salts, or changes in the pH and/or temperature of the solution. The formation can be further regulated by the various posttranslational modifications and alternative splicing of the phase-forming proteins, or by the binding of these proteins to some definite partners (77). Importantly, said formation is not typically accompanied by significant structural changes in the assembling proteins (81).
FIGURE 4 (legend, partial). Axin is shown with color variation to make its pathway easier to follow. Ordered RGS (for regulator of G protein signaling) and DIX (for DIshevelled and aXin) domains are located at the N and C termini of axin, respectively. The dashed line corresponds approximately to the location of the Gly295-Ala500 disordered segment (60). Axin binds to CKI-α (at two separate sites), to GSK3β, and also to β-catenin. Because the β-catenin binding site of axin is located between the GSK3β and CKI-α interaction sites, and because the two binding sites with CKI-α may lead to the formation of a loop, β-catenin becomes close to both kinases. Hence, the formation of this β-catenin destruction complex pulls all the proteins together, and substantially raises their local concentrations. Because the phosphorylation sites are in a disordered region of β-catenin and because the various binding sites are all in a long disordered region in axin, random motions of these flexible regions can readily bring about the substrate-enzyme collisions needed for function. Reproduced with permission from Ref. 70.

Evolution of Intrinsic Disorder: Back to the Future
Numerous computational studies have revealed that IDPs/IDPRs are more common in eukaryotes than in less complex organisms (82-90). Furthermore, regions of eukaryotic mRNA affected by alternative splicing often code for IDPRs, suggesting that the invention of this intrinsic disorder/alternative splicing duet was an evolutionary breakthrough that eventually generated multicellular organisms (91). The high abundance of IDPs/IDPRs in eukaryotes and the fact that the process of alternative splicing is found almost exclusively in eukaryotes seem to suggest that intrinsic disorder represents a relatively recent evolutionary creation. However, at the early stages of the evolution of life on Earth, where the first polypeptides originated in the primordial soup, the probability for these primitive proteins to have unique structures was extremely low. This hypothesis is further supported by the temporal order of addition of the amino acid-coding trinucleotides to the genetic code: Gly/Ala, Val/Asp, Pro, Ser, Glu/Leu, Thr, Arg, Asn, Lys, Gln, Ile, Cys, His, Phe, Met, Tyr, and Trp (92). Importantly, many of the early amino acids (e.g. Gly, Asp, Glu, Pro, and Ser) are disorder-promoting, whereas codons encoding the major order-promoting residues (Cys, Trp, Tyr, and Phe) were a later addition to the genetic code (93). Therefore, it is very likely that the primordial polypeptides were intrinsically disordered, and thus, the global evolution of intrinsic disorder is characterized by a wavy pattern, where highly disordered primordial proteins were first substituted by highly ordered enzymes that evolved to catalyze the production of different compounds crucial for the independent existence of the first cellular organisms, and then the protein intrinsic disorder was reinvented at subsequent evolutionary steps, leading to the development of more complex organisms (93).

Natural Abundance of IDPs: Where Do You Not Find Disorder?
Recent years have witnessed a dramatic change in the understanding of the natural abundance of IDPs/IDPRs from some obscure, rare, and easily countable exceptions (in the early days), to the currently accepted abundant normality, where the prevalence of IDPs/IDPRs in various proteomes and biological processes is a well recognized reality (82-90). In other words, protein science transitioned from searching for "Where can you find disorder?" to "Where is disorder not present?" Because disorder is almost incompatible with catalytic and transport functions, the answer seems to be straightforward: enzymes, transport, and transmembrane proteins should be mostly ordered. However, besides doing their catalytic and transport jobs, these last outposts of order need to be controlled and regulated. Furthermore, they can also control and regulate other proteins. Thus, it is not surprising that in addition to their ordered domains, many enzymes, transport, and membrane proteins have numerous, and often long, functional IDPRs (94-98). Furthermore, it has been recently hypothesized that even membrane-embedded domains of multipass transmembrane proteins could be disordered (40). In this model, disordered membrane proteins are suggested to have fully formed secondary structure, but little tertiary structure, and the sequence signature for disorder in membrane proteins is likely to be reversed, with disordered transmembrane proteins being more hydrophobic than their folded counterparts (40). Therefore, it seems that disorder is here, there, and everywhere.

Concluding Remarks: Moving Means Alive!
Although gaining structural and functional information about IDPs and IDPRs is a challenge, because they do not typically "freeze" while their "pictures are taken," a decade and a half of intensive studies on these proteins revealed a number of unique features related to their sequence organization, structural heterogeneity, conformational properties, natural abundance, cellular distribution, functional repertoire, regulation, interactability, involvement in the pathogenesis of various diseases, etc. However, the mass of data produced so far represents just a small tip of a colossal iceberg, and disorder-related research continues to awe researchers on a regular basis. More discoveries and breakthroughs are expected in the future due to the elaboration of novel experimental and computational tools for focused studies of these intriguing members of the protein universe.