TAR DNA-binding protein 43 (TDP-43) liquid–liquid phase separation is mediated by just a few aromatic residues

Eukaryotic cells contain distinct organelles, but not all of these compartments are enclosed by membranes. Some intrinsically disordered proteins mediate membraneless organelle formation through liquid–liquid phase separation (LLPS). LLPS facilitates many biological functions such as regulating RNA stability and ribonucleoprotein assembly, and disruption of LLPS pathways has been implicated in several diseases. Proteins exhibiting LLPS typically have low sequence complexity and specific repeat motifs. These motifs promote multivalent connections with other molecules and the formation of higher-order oligomers, and their removal usually prevents LLPS. However, the intrinsically disordered C-terminal domain of TAR DNA-binding protein 43 (TDP-43), a protein involved in motor neuron disease and dementia, lacks a dominant LLPS motif, and how this domain forms condensates is unclear. Using extensive mutagenesis of TDP-43, we demonstrate here that three tryptophan residues and, to a lesser extent, four other aromatic residues are most important for TDP-43 to undergo LLPS. Our results also suggest that only a few residues may be required for TDP-43 LLPS because the α-helical segment (spanning ∼20 residues) in the middle part of the C-terminal domain tends to self-assemble, reducing the number of motifs required for forming a multivalent connection. Our finding that a self-associating α-helical element with a few key residues regulates condensate formation highlights a different type of LLPS involving intrinsically disordered regions. The C-terminal domain of TDP-43 contains ∼50 disease-related mutations, with no clear physicochemical link between them. We propose that they may disrupt LLPS indirectly by interfering with the key residues identified here.

Eukaryotic cells have different compartments (1), but not all of these are enclosed by a lipid membrane. Membraneless organelles, such as the stress granules that regulate RNA stability or the nucleoli that regulate ribonucleoprotein assembly (2), are also biologically important. Both in vitro and in-cell experiments have shown that some of the proteins in membraneless organelles can reversibly form condensates through liquid-liquid phase separation (LLPS) (reviewed in Refs. 3-7). Multivalency (the presence of multiple binding sites) is critical for LLPS, as highlighted by the importance of tandem-repeat Src homology 3 domains and polyproline-rich motifs for signal transduction in actin regulation (8). Intrinsically disordered proteins (IDPs) with low sequence complexity, repeated sequence motifs, and structural plasticity facilitate multivalent interactions, as has been shown for several RNA-binding proteins, including hnRNPs (9-12), FUS (11, 13-16), and Whi3 (17). These RNA-binding proteins maintain protein homeostasis in cells by storing RNA inside membraneless organelles, whereas it has also been suggested that their impaired or irreversible aggregation may lead to proteinaceous diseases (9, 13). Signal transduction and RNA storage are not the only functions of the condensates: the protein BuGz facilitates the assembly of microtubules (18); SPD-5 is a key scaffold protein in the nucleation of microtubules in pericentriolar material (19); gene silencing has been shown to involve the LLPS of heterochromatin proteins (20, 21); the protein Tau, implicated in Alzheimer's disease, has been shown to form condensates (22); and the yeast prion protein Sup35 undergoes LLPS in response to cellular stress (23). Our previous study also suggests that the LLPS of galectin-3's low-complexity N-terminal domain is important for the formation of extracellular galectin-glycan lattices (24).
A number of amino acid motifs in IDPs with low sequence complexity are known to facilitate LLPS: for instance, tyrosines flanked with glycine or serine in FUS protein (11), the large number of FG and RG dipeptide repeats in Ddx4 (25), arginine-rich dipeptide repeats in C9orf72 (26), or the prevalence of negatively charged and aromatic/hydrophobic residues in the disordered domain of nephrin (27). Nucleoporins also interact multivalently with transport factors through their FG repeats, hinting at a similar tendency to form condensates (28, 29). In these proteins, the formation of multivalent connections is facilitated by the presence of dozens of these motifs. In contrast, there is no dominant LLPS motif in the intrinsically disordered C-terminal domain of the TAR DNA-binding protein of 43 kDa (TDP-43; Fig. 1), which also forms condensates (9, 30-32): this ∼160-residue domain contains only six positively and three negatively charged amino acids, four typical (G/S)-(F/Y)-(G/S) LLPS motifs, and three sparsely distributed FG repeats (Fig. 1A). How this intrinsically disordered domain forms condensates using relatively few LLPS motifs is an open question. One of the main differences between TDP-43 and other IDP LLPS systems is its central α-helical element (residues ∼320-340) (30, 31, 33), which is critical for forming condensates (30, 32). This α-helix assists intermolecular self-association (30, 31), and we have shown in a previous study (31) that LLPS is driven by hydrophobicity and inhibited by electrostatic repulsion. Although we demonstrated that removing a hydrophobic tryptophan in this α-helix severely disrupts LLPS (31), we did not untangle the network through which TDP-43 molecules make multivalent contacts using these sparse LLPS motifs. Here, we use mutagenesis to systematically investigate the effects on LLPS of seven aromatic residues that precede or follow glycine or serine (Fig. 1).
We show that the most important elements for TDP-43 LLPS are the three tryptophans in this domain, especially Trp-334, whereas other aromatic motifs contribute to a lesser extent. This mechanism, involving just a few repeated motifs and an α-helical assembly center, is different from those previously described for LLPS and provides a new perspective on the disease-associated mutations of TDP-43.
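The selection criterion used here, aromatic residues that precede or follow a glycine or serine, can be expressed as a simple sequence scan. The following is a minimal Python sketch of such a scan; the function name, the residue numbering offset, and the toy sequence are hypothetical and are not the TDP-43 sequence or the authors' procedure.

```python
import re

# An aromatic residue (F, Y, or W) that either follows or precedes G or S.
AROMATIC_NEXT_TO_GS = re.compile(r"(?<=[GS])[FYW]|[FYW](?=[GS])")


def find_aromatic_motifs(sequence, first_residue_number=1):
    """Return (residue number, residue) for aromatics adjacent to G or S.

    first_residue_number is the number assigned to sequence[0]; it is an
    illustrative parameter for constructs that do not start at residue 1.
    """
    return [
        (match.start() + first_residue_number, match.group())
        for match in AROMATIC_NEXT_TO_GS.finditer(sequence.upper())
    ]


# Hypothetical toy sequence, used only to exercise the scan.
TOY_SEQUENCE = "AGFGNQSWGAAYLMGSYS"
hits = find_aromatic_motifs(TOY_SEQUENCE)
print(hits)
```

In the toy sequence, the tyrosine flanked by alanine and leucine is skipped because neither neighbor is glycine or serine.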

TDP-43 liquid-liquid phase separation
In addition to its functions in gene regulation and mRNA transport (34), TDP-43 has been identified as the main disease protein in the biopsies of amyotrophic lateral sclerosis (ALS) patients (35). Like many other ALS-associated RNA-binding proteins such as FUS, some hnRNPs, and TIA-1 (36), TDP-43 has been shown to undergo LLPS (9). It has been suggested, furthermore, that the disruption of LLPS increases the pathological fibrilization of these proteins (36). The mechanism governing LLPS for TDP-43 is therefore important for its pathogenicity. Cell-based studies have demonstrated that TDP-43 droplet-like properties are inducible under certain conditions (32, 37). Conicella et al. (30) have also characterized in detail the LLPS properties of the C-terminal domain of TDP-43 in vitro. They reported that this domain only undergoes LLPS in the presence of ions or RNA molecules (30), whereas with a slightly different construct and buffer conditions, we induced LLPS of TDP-43 at low temperatures in the absence of salt or RNA molecules (31). We measured the turbidity of the WT sample (the optical density at 600 nm, OD600 nm, Fig. 2A) and recorded micrographs of the condensates at different static temperatures (Fig. 2B) to confirm the occurrence of LLPS. We also collected time-lapse micrographs from low to high and then back to low temperatures to confirm the reversibility of the process (Fig. 2C and supporting Movie S1). Although turbidity depends on both the number and the size of the particles, the TDP-43 condensates were all found to be around 1 μm in diameter (Fig. 2B and supporting Fig. S1), in agreement with the observation of Molliex et al. (9). This situation differs from the spread of condensate sizes usually observed for LLPS proteins. For TDP-43, therefore, turbidity reflects only the number of particles present. We thus used the turbidity of the sample, supported by micrographs, to indicate the presence of LLPS.
We measured the turbidity and collected microscopic images of the ALS-related mutants Q331K (net charge increased from +3 to +4) and A315E (net charge reduced from +3 to +2), artificial W334G (removal of a hydrophobic residue), the Q331K variant in the presence of NaCl (screening of charge-charge interactions), and the WT in the presence of urea or 1,6-hexanediol (disruption of hydrophobic interactions) to confirm that LLPS occurs as a competition between

The three key tryptophans critical for LLPS
In our previous study, we demonstrated that replacing hydrophobic Trp-334 with glycine disrupts LLPS despite favorable conditions, i.e. high protein concentration (40 μM), high NaCl concentration (300 mM), and low temperature (5°C) (31). We also noted that LLPS still occurs for W334G when the protein concentration is greater than 100 μM. (Note that all the ALS-associated mutants we have studied and the WT precipitate rapidly when the protein concentration is higher than 40 μM under our standard buffer conditions: pH 6.5, 10 mM phosphate buffer.) Increasing the protein concentration increases the chance for the protein molecules to interact with one another and thus compensates for the loss of the attraction from Trp-334 driven by the hydrophobic interaction. A single residue, however, is unlikely to disrupt all the multivalent contacts that contribute to forming the higher-order assembly unless this assembly is divalent. Trp-334 follows a serine and precedes a glycine residue, which is reminiscent of a recognized LLPS motif: tyrosine or phenylalanine flanked by glycine or serine (10, 11, 14). It is noteworthy that there are only three tryptophans in the C-terminal domain, and all three present this motif (Trp-334, Trp-385, and Trp-412, purple bars in Fig. 1). Conicella et al. (30) performed intermolecular paramagnetic relaxation enhancement studies using a nitroxide spin-label introduced at residue 317 and observed enhanced NMR relaxation rates from the central α-helix of one molecule to that of another and between the middle α-helix and residues 382-385 (containing Trp-385). Although this was not mentioned in the original study, the relaxation rates of residues ∼400-412 were also increased, indicating contacts between the spin label at position 317 and the region around Trp-412.
Furthermore, tryptophans have been shown to initiate the refolding of a denatured protein in acidic urea (38), which also suggests that tryptophans may initiate higher-order intermolecular assembly.
In light of these studies, our hypothesis was that the three tryptophans are involved in the initiation of LLPS and may form multivalent connections by themselves. To understand the importance of each tryptophan for LLPS, we created all possible tryptophan-to-glycine constructs and measured the turbidity and recorded micrographs of the corresponding samples under conditions favoring condensate formation. At 5°C and a protein concentration of 20 μM, clear evidence of LLPS was only observed for the WT sample (Fig. 3A). In the presence of 100 mM NaCl at the same protein concentration (Fig. 3, B and D, and supporting Fig. S1), the only constructs for which increased turbidity and condensates in the micrographs were observed were two single tryptophan-replaced (Δ1W) variants: W385G and W412G. When we increased the protein concentration to 100 μM but without salt, all three Δ1W variants showed signs of LLPS, with W385G and W412G having a stronger phase separation tendency than W334G (Fig. 3, C and D, and supporting Fig. S1). Under these conditions, condensates and increased turbidity were observed for all three double-tryptophan (Δ2W) mutants, more so for the W385G/W412G variant than the W334G/W385G and W334G/W412G variants. There were almost no detectable signs of LLPS for the triple-tryptophan (Δ3W) mutant, W334G/W385G/W412G. When salt was added at this high protein concentration, the W385G and W412G samples

TDP-43 LLPS mediated by a few key residues
precipitated immediately (Fig. S3) and the Δ2W variants followed a similar trend to the one shown in Fig. 3C, namely that the strongest signs of LLPS were observed for the variant with Trp-334 retained (W385G/W412G). The turbidity of the Δ3W sample was insignificant. We also performed the experiments "reversibly," to demonstrate that the samples with zero optical density do still contain protein molecules. We centrifuged 100 μM samples at 15,000 × g at 5°C for 5 min, with a higher protein concentration in the supernatant indicating a lesser propensity for LLPS. The results are consistent (Fig. S4).
These results indicate that Trp-334 is the most important of the three tryptophans for LLPS, as its removal severely reduces the LLPS tendency of the corresponding WT and disease-related variants under the conditions considered here (Figs. 2D and 3B). Only when the protein concentration is increased to 100 μM are condensates observed for the W334G variant. The 20 μM samples of the W385G and W412G mutants are more prone to LLPS than W334G (Fig. 3, B-D), reinforcing the interpretation that Trp-334 is more important than Trp-385 or Trp-412 for LLPS. The Δ2W variants with Trp-334 removed (W334G/W385G and W334G/W412G) show much less of a tendency toward LLPS than the W385G/W412G variant, also in agreement with this interpretation. Although we do not have NMR chemical shift assignments for all these tryptophan mutants, we have shown previously that the W334G mutation, which is within the α-helical region, has little effect on its α-helical propensity (31). The good overlap between the NMR spectra of the W334G and Δ3W variants (Fig. 3F), especially for cross-peaks from residues in the α-helix, indicates that their conformations are similar.
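The set of "all possible tryptophan-to-glycine constructs" is small enough to enumerate explicitly: three single, three double, and one triple variant. The Python sketch below lists them using the ΔnW labels from the text; the function name and the output layout are illustrative assumptions rather than part of the experimental design.

```python
from itertools import combinations

# Positions of the three tryptophans in the C-terminal domain (from the text).
TRYPTOPHANS = (334, 385, 412)


def trp_to_gly_constructs(positions=TRYPTOPHANS):
    """Group every Trp-to-Gly variant by the number of tryptophans replaced."""
    constructs = {}
    for n_replaced in range(1, len(positions) + 1):
        label = f"Δ{n_replaced}W"
        constructs[label] = [
            "/".join(f"W{position}G" for position in combo)
            for combo in combinations(positions, n_replaced)
        ]
    return constructs


constructs = trp_to_gly_constructs()
for label, variants in constructs.items():
    print(label, variants)
```

Together with the WT, this gives eight constructs, one for each subset of the three tryptophans.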

The LLPS network between tryptophans and other motif residues
Because alanine is generally regarded as being as hydrophobic as tryptophan (39) or more so (40, 41), and has a greater α-helical propensity (42), we replaced Trp-334 with alanine to investigate the role of hydrophobicity and secondary structure in TDP-43's LLPS. (Note that we did not insert phenylalanine or tyrosine because (G/S)-(F/Y)-(G/S) is also an LLPS motif.) Samples of the W334A (unaltered or increased hydrophobicity) variant were found to have a similar turbidity to those of the W334G (reduced hydrophobicity) variant under all conditions, suggesting that Trp-334 is involved in intermolecular multivalent linking (Fig. 4).
The Δ2W construct with Trp-334 retained (W385G/W412G) still shows signs of LLPS when the protein concentration is high (100 μM, Fig. 3, C and D), indicating that residues other than the three tryptophans are involved in the formation of condensates (i.e. it is not simply a trivalent network that drives LLPS). Accordingly, we introduced glycines to replace the single tyrosine and three phenylalanine residues in (G/S)-(F/Y)-(G/S) motifs in the W385G/W412G variant, leaving only Trp-334 in place (i.e. Δ2W.Δ3F.Δ1Y in the nomenclature introduced above). No increase in turbidity was observed for this construct under any conditions, nor were any condensates observed in the micrographs (Fig. 4), as was the case for the Δ3W variant. These results for the Δ2W (W385G/W412G) and Δ2W.Δ3F.Δ1Y constructs indicate that the corresponding tyrosine and phenylalanine residues are also involved in the formation of condensates. However, their contribution is
Supporting Fig. S2 shows that mutating Trp-334 has no effect on the structural propensity of the α-helical region. Because all the other mutations investigated here affect residues outside the α-helical region, one can assume that the α-helical propensity is likewise unchanged in these variants. This would also be consistent with our studies of ALS-associated mutants: G298S, Q331K, M337V (31), A315E in this study (Fig. S2), G294V, G294A, and A315T variants (Fig. S5), and data from other groups (30, 33).

NMR spectroscopy indicates that self-association is enhanced in the presence of Trp-334
In our previous work, we showed using NMR peak intensity ratios and chemical shift perturbations between 40 and 20 μM samples that the α-helical region (residues 320-340) of the C-terminal domain of TDP-43 self-associates. The NMR peak intensity ratio in the α-helical region is less than would be expected from the change in protein concentration, indicating a shift in equilibrium from the monomeric to the self-associated state, as shown also by the chemical shift differences (31). These two NMR parameters are more difficult to measure for the mutants studied here because they are less prone to undergo LLPS, so we increased the concentration ratio from two to five (i.e. we compared 100 and 20 μM samples). We collected NMR spectra at 15°C, a temperature at which no condensate is observed but self-association still occurs (31). For W334G, self-association of the α-helix between protein molecules still occurs: the signal ratios for the α-helical region are slightly less than the expected five-to-one ratio and there are small changes in chemical shift between the two concentrations (upper panels in Fig. 5, A and B, and supporting Fig. S6A). Replacing Trp-334 with hydrophobic alanine does not recover self-association (middle panels in Fig. 5, A and B, and supporting Fig. S6B). These results for the W334G and W334A variants imply that the self-association tendency of the α-helical element is independent of the effects of Trp-334. On the other hand, removing all the aromatic residues in LLPS motifs except Trp-334 (Δ2W.Δ3F.Δ1Y) leads to more self-association than in the variants without Trp-334 (bottom panels in Fig. 5, A-C). This indicates that Trp-334 strongly enhances the self-association of the α-helix. However, the Δ2W.Δ3F.Δ1Y variant does not form condensates (Fig. 4C) because even though it self-associates more, multivalent contacts are still required for the protein to undergo LLPS.
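The reasoning behind the intensity-ratio analysis is that, without self-association, each peak should scale with protein concentration, so a five-fold concentration increase should give a ratio near five; residues whose ratio falls clearly below that value are candidates for involvement in self-association. A minimal Python sketch of this comparison follows. The function names, the 10% tolerance, and the intensities are hypothetical illustrations and not measured data or the authors' analysis script.

```python
def intensity_ratios(high_conc, low_conc):
    """Per-residue peak intensity ratio (high / low concentration sample)."""
    return {
        residue: high_conc[residue] / low_conc[residue]
        for residue in high_conc
        if residue in low_conc and low_conc[residue] > 0
    }


def attenuated_residues(ratios, expected_ratio, tolerance=0.1):
    """Residues whose ratio is below expected_ratio by more than tolerance.

    tolerance is a fractional margin (an assumed value, not from the paper).
    """
    cutoff = expected_ratio * (1.0 - tolerance)
    return sorted(residue for residue, ratio in ratios.items() if ratio < cutoff)


# Hypothetical peak intensities for 100 and 20 µM samples (expected ratio 5).
high = {310: 50.0, 325: 36.0, 330: 33.0, 360: 49.0}
low = {310: 10.0, 325: 10.0, 330: 10.0, 360: 10.0}

ratios = intensity_ratios(high, low)
flagged = attenuated_residues(ratios, expected_ratio=5.0)
print(flagged)
```

In this made-up example only the two residues inside the 320-340 window fall below the cutoff.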

Discussion
Protein multivalency is crucial for the assembly of higher-order oligomers (8). Intrinsically disordered regions favor higher-order assembly, but those with a low sequence complexity are more likely to be multivalent (11). These simple sequences often contain repeated patterns that act as alternative contact sites. For example, clustered blocks of positively and negatively charged residues are critical for the LLPS of Ddx4 (25), and (G/S)-(F/Y)-(G/S) motifs have been identified as important for LLPS in FUS protein (11). On the other hand, in the case of complex coacervation for the C-terminal domain of nephrin, the total charge composition is more important than the primary sequence (27), suggesting that hybrid connections between charged and aromatic/hydrophobic residues are involved. Systematically removing these patterns or motifs gradually reduces the LLPS tendency of the corresponding constructs, as shown in a recent study in which any consecutive 5 of the 27 tyrosines in an IDP were shown to be of equal importance for phase separation (16). Of the 25 tyrosine or phenylalanine residues in hnRNP A2 on the other hand,
about 11, clustered in a specific block, have been shown to be more important than the others for LLPS (12). The mechanism of LLPS for the C-terminal domain of TDP-43 is different, however, because only a small number of residues are involved. The most important of these is Trp-334, followed by Trp-385 and Trp-412. When the latter two tryptophans are removed, LLPS still occurs at 5°C but only at high protein and high salt concentrations, whereas it does not when all the aromatic residues in LLPS motifs other than Trp-334 are removed, suggesting that these aromatic residues are involved in LLPS but to a lesser extent.

Why is LLPS controlled by just a few residues in TDP-43?
Multivalent connections in IDPs that undergo LLPS typically involve a large number of different motifs (11, 25) or specific types of residues (27) distributed throughout the amino acid sequence. For the C-terminal domain of TDP-43, however, LLPS is driven by just three tryptophans with minor contributions from one tyrosine and three phenylalanines. One potential reason why LLPS is controlled by fewer motifs in the C-terminal domain of TDP-43 is the presence of an α-helix. This element spans roughly 20 residues in the center of the domain (Fig. 1) and is involved in intermolecular interactions (30, 31). It is highly conserved (32) and mediates pre-mRNA splicing through interactions with other hnRNP proteins (43-45). Deleting this α-helical region or reducing its secondary structure propensity by point mutations or by inserting random sequences (30, 32) prevents LLPS. Moreover, on the basis of model polyalanine and polyglutamine systems, Polling et al. (46) suggest that polyalanine-formed α-helices can promote self-assembly. They also suggest that α-helix-driven clustering may facilitate the nucleation of amyloid fibrils when, as for polyadenylate-binding nuclear protein 1, the fibril-promoting region is outside the polyalanine stretch. The C-terminal domain of TDP-43 has a similar sequence arrangement: the QN-rich domain that follows the α-helix (Fig. 1) is known to be involved in the protein's aggregation (47-49). Trp-334 in this α-helix may enhance the intrinsic tendency toward self-assembly of the helical element (Fig. 5), and this enhanced intermolecular interaction may thus facilitate LLPS. There is less self-association in the absence of the α-helix, so many more LLPS motifs would be required to sufficiently increase the chance of intermolecular contacts via weak electrostatic interactions, hydrophobicity, and/or translational diffusion. A large number of repeated LLPS motifs may be an evolutionary advantage (25, 50). However, by bringing the molecules closer together, the α-helix reduces the number of motifs required to form higher-order assemblies.
It has been shown for several proteins that a reduced tendency toward LLPS increases the likelihood of pathological aggregation (9, 13). There are around 50 ALS-associated variants of the C-terminal domain of TDP-43 but no clear physicochemical link between the corresponding mutations. It has recently been reported that the phosphorylation of FUS protein close to LLPS-related tyrosine sites may cause disease (15, 16). Several of the ALS-associated mutations of TDP-43 (black stars in Fig. 1A) and the Ser-409 and Ser-410 phosphorylation sites (51, 52) are also close to the LLPS motifs we identified here. Studying the effect of these ALS mutations on LLPS motifs may offer an alternative avenue toward understanding the cause of this disease and others. Furthermore, the fact that the LLPS of this intrinsically disordered domain is controlled by just a few residues may explain the unusual droplet form of TDP-43 (9).

Protein expression and purification
The constructs of the C-terminal domain of TDP-43 (residues 266-414) were prepared using a His6 tag as described previously (31, 53). This purification tag has no effect on the α-helical propensity or the LLPS tendency of the domain (31). Most mutants were created using designed primers (Table S1). The Δ2W.Δ3F.Δ1Y construct was created by whole gene synthesis and the Δ1W.Δ3F.Δ1Y and Δ0W.Δ3F.Δ1Y constructs were created with adapted primers (Table S1). All constructs were verified by DNA sequencing. The same protein expression, purification, and sample quality control protocols were used as described in our previous publication (31). In short, the overexpressed protein was extracted from inclusion bodies using 8 M urea and purified using a nickel-charged immobilized metal-ion affinity chromatography column (Qiagen, Inc.) and then a C4 reverse phase column (Thermo Scientific, Inc.) using an HPLC system. The purified sample was lyophilized for storage, and then dissolved in 10 mM phosphate buffer at pH 6.5 for experiments. The protein concentration was determined using the Beer-Lambert law by measuring the absorbance at 280 nm using a NanoDrop UV-visible spectrometer (Thermo Scientific, Inc.) with the appropriate extinction coefficients (Table 1). The extinction coefficients were calculated based on the primary sequence using the web server ExPASy (54).
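The Beer-Lambert law relates absorbance to concentration as A = ε·c·l, so the concentration follows as c = A/(ε·l). A minimal Python sketch of this conversion is given below; the function name, the default 1 cm path length, and the numerical values in the example are assumptions for illustration and are not taken from Table 1.

```python
def concentration_micromolar(a280, extinction_coefficient, path_length_cm=1.0):
    """Protein concentration in µM from absorbance at 280 nm.

    extinction_coefficient is in M^-1 cm^-1; path_length_cm is the optical
    path (assumed 1 cm here, i.e. a path-length-normalized reading).
    """
    if extinction_coefficient <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (extinction_coefficient * path_length_cm)
    return molar * 1e6


# Hypothetical values: A280 = 0.4 with ε = 20,000 M^-1 cm^-1.
example_um = concentration_micromolar(0.4, 20000.0)
print(f"{example_um:.1f} µM")
```

Because tryptophan dominates the absorbance at 280 nm, each tryptophan-replaced variant needs its own extinction coefficient in this calculation.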

Turbidity measurements
The turbidity of the protein samples was quantified by measuring light transmittance at 600 nm using a JASCO V550 UV-visible spectrophotometer. The temperature of the spectrophotometer was controlled using a water bath. For each measurement, the sample was left to equilibrate in the temperature-controlled water bath for 5 min. Ten scans were accumulated for each measurement and measurements at each temperature point were repeated three times to estimate the associated errors. The data are reported as mean ± S.D. The samples were all left to return to room temperature after each measurement to confirm the reversibility of the LLPS process.
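Transmittance and optical density are related by OD = −log10(T), with T expressed as a fraction. The Python sketch below converts triplicate transmittance readings to OD600 and summarizes them as mean and standard deviation. The function names and the readings are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import math
import statistics


def optical_density(transmittance_percent):
    """Apparent optical density from percent transmittance: -log10(T/100)."""
    if not 0 < transmittance_percent <= 100:
        raise ValueError("transmittance must be in the interval (0, 100]")
    return -math.log10(transmittance_percent / 100.0)


def summarize_turbidity(transmittances_percent):
    """Mean and sample standard deviation of OD over repeated measurements."""
    ods = [optical_density(t) for t in transmittances_percent]
    return statistics.mean(ods), statistics.stdev(ods)


# Hypothetical triplicate readings at one temperature point.
mean_od, sd_od = summarize_turbidity([50.0, 52.0, 48.0])
print(f"OD600 = {mean_od:.3f} ± {sd_od:.3f}")
```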

Microscopy
The micrographs were collected using an Olympus BX51 microscope with a ×40 long working distance objective lens. The images were recorded with a Zeiss AxioCam MRm camera. The protein samples were placed on a thermostatic microscope stage (THMS600, Linkam Scientific Inc.) and were equilibrated for 5 min at each temperature before recording the micrographs.
For the time-lapse micrographs, the samples were put into the sample chamber pre-equilibrated at 0°C. Once the number of condensates ceased to increase, the temperature of the chamber was changed to 25°C and the first micrograph was recorded (time zero). When all the condensates had disappeared, the temperature was immediately changed to 0°C. Micrographs were collected every 10 s (Fig. 2C shows two images recorded 40 s apart, and supporting Movie S1 shows the time-lapse movie).
¹⁵N-edited heteronuclear single-quantum coherence (HSQC) spectra were recorded using the standard pulse sequence and the WATERGATE scheme to suppress solvent signal (55, 56). To confirm the quality and integrity of the samples, one-dimensional proton spectra were recorded with an improved WATERGATE solvent saturation scheme (57) before and after all NMR experiments. Standard chemical shift assignment experiments were recorded with nonuniform sampling schemes (58, 59). All spectra were recorded on a Bruker AVIII 600 MHz spectrometer with a cryogenic probe. The data were processed using NMRPipe (60). The database of random-coil shifts of Kjaergaard et al. (61) was used for secondary chemical shift analysis. The procedure used to analyze the peak intensities has been described in detail previously (31). Briefly, the nonlinear line shape modeling function in NMRPipe was applied to all the HSQC spectra, with a Lorentzian-to-Gaussian window function. The averaged chemical shift difference (Δδav) was calculated from ΔδH and ΔδN, the chemical shift differences between two ¹H-¹⁵N HSQC spectra for the amide proton and the nitrogen, respectively.
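An averaged chemical shift difference of this kind combines the proton and nitrogen differences into a single per-residue number. A minimal Python sketch of one widely used form follows; the root-sum-of-squares form and the nitrogen scaling factor `n_weight` are assumptions for illustration and are not necessarily the expression used in this work.

```python
import math


def combined_shift_difference(delta_h, delta_n, n_weight=0.2):
    """Combine 1H and 15N shift differences (ppm) into one value.

    n_weight down-scales the nitrogen difference to account for the larger
    15N chemical shift range; 0.2 is an assumed, commonly quoted choice.
    """
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


# Hypothetical differences between spectra at two concentrations.
example = combined_shift_difference(delta_h=0.03, delta_n=0.2)
print(f"{example:.3f} ppm")
```

Whatever weighting is chosen, it should be applied identically to every variant so that per-residue values remain comparable between constructs.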

Concentration measurements after centrifugation
Different 100 μM protein samples were prepared in 1.5-ml Eppendorf tubes. The samples were centrifuged at 15,000 × g at 5°C for 5 min. The concentration of the supernatants was measured using a NanoDrop spectrometer. The experiments were repeated three times for each variant. The results are reported as mean ± S.D.

Table 1 Extinction coefficients and 1% absorbance used to determine protein concentrations
The extinction coefficient is at 280 nm measured in water. The values were determined based on the primary sequence using the ProtParam function (web.expasy.org/protparam/) on the ExPASy server.
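ProtParam estimates the extinction coefficient at 280 nm in water from the amino acid composition, using per-residue contributions of 5500 M⁻¹ cm⁻¹ for tryptophan, 1490 for tyrosine, and 125 for each cystine. The Python sketch below applies that sum; the function name and the short test peptides are hypothetical and do not reproduce the values in Table 1.

```python
# Per-residue contributions at 280 nm in water (M^-1 cm^-1), as used by ProtParam.
EPSILON_TRP = 5500
EPSILON_TYR = 1490
EPSILON_CYSTINE = 125


def extinction_coefficient_280(sequence, n_cystines=0):
    """Estimate ε(280 nm) from the Trp and Tyr counts and the cystine count."""
    seq = sequence.upper()
    return (
        EPSILON_TRP * seq.count("W")
        + EPSILON_TYR * seq.count("Y")
        + EPSILON_CYSTINE * n_cystines
    )


# Hypothetical peptides: replacing one tryptophan with glycine lowers ε by 5500.
with_trp = extinction_coefficient_280("GWGSYGW")
without_trp = extinction_coefficient_280("GGGSYGW")
print(with_trp, without_trp)
```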