The intrinsically disordered N-terminal domain of galectin-3 dynamically mediates multisite self-association of the protein through fuzzy interactions

Galectins are a family of lectins that bind β-galactosides through their conserved carbohydrate recognition domain (CRD) and can induce aggregation with glycoproteins or glycolipids on the cell surface and thereby regulate cell activation, migration, adhesion, and signaling. Galectin-3 has an intrinsically disordered N-terminal domain and a canonical CRD. Unlike the other 14 known galectins in mammalian cells, which have dimeric or tandem-repeated CRDs enabling multivalency for various functions, galectin-3 is monomeric, and its functional multivalency therefore is somewhat of a mystery. Here, we used NMR spectroscopy, mutagenesis, small-angle X-ray scattering, and computational modeling to study the self-association–related multivalency of galectin-3 at the residue-specific level. We show that the disordered N-terminal domain (residues ∼20–100) interacts with itself and with a part of the CRD not involved in carbohydrate recognition (β-strands 7–9; residues ∼200–220), forming a fuzzy complex via inter- and intramolecular interactions, mainly through hydrophobicity. These fuzzy interactions are characteristic of intrinsically disordered proteins to achieve liquid–liquid phase separation, and we demonstrated that galectin-3 can also undergo liquid–liquid phase separation. We propose that galectin-3 may achieve multivalency through this multisite self-association mechanism facilitated by fuzzy interactions.

NTD and the CRD interact in a "fuzzy" complex manner, inter- and intramolecularly, through both NTD-NTD and NTD-CRD interactions.

The self-association of galectin-3 is concentration-dependent
We investigated galectin-3 self-association at a residue-specific level by comparing the ¹⁵N-¹H HSQC spectra of 40–400 μM protein samples. Several cross-peaks moved as the protein concentration was increased (Fig. 1D). The largest changes occurred for peaks assigned to residues ∼200–220, in β-strands 7, 8, and 9 (Fig. 1E), which are on the reverse side with respect to the carbohydrate-binding site (Fig. 1B). Accounting for the change in concentration alone, the peak intensity ratio between the 40 and 400 μM spectra should be ∼0.1, but the mean ratios for the NTD and CRD are ∼0.13 and ∼0.25, respectively (Fig. 1F). This is because the corresponding peaks are weaker and/or broader in the spectrum of the higher-concentration sample. The chemical shift changes and intensity ratios suggest that galectin-3 self-associates at higher protein concentrations, in agreement with previous observations (18).
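The comparison between expected and observed intensity ratios above is simple arithmetic, and it can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the only numbers used are the concentrations and mean ratios quoted in the text.

```python
def expected_intensity_ratio(c_low_uM, c_high_uM):
    """Ratio expected if peak intensity scaled with concentration alone."""
    return c_low_uM / c_high_uM


def excess_over_expected(observed_ratio, expected_ratio):
    """How many times larger the observed ratio is than the expected one."""
    return observed_ratio / expected_ratio


expected = expected_intensity_ratio(40.0, 400.0)
for domain, observed in (("NTD", 0.13), ("CRD", 0.25)):
    fold = excess_over_expected(observed, expected)
    print(f"{domain}: observed {observed:.2f}, expected {expected:.2f}, {fold:.1f}x")
```

A ratio above the expected value means that the peaks in the 400 μM spectrum are weaker than the tenfold concentration increase alone would predict.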

NMR dynamics studies confirm the self-association
We used NMR spin relaxation experiments, which are widely used to study the dynamical properties of biomolecules, to further characterize the self-association. The overall tumbling rate, diffusion tensor, and internal motions of a molecule on the pico- to nanosecond timescale can be derived from the results of spin relaxation spectroscopy. The relaxation data are also indicative of chemical exchange due to micro- to millisecond timescale motions (23–25). We collected the longitudinal relaxation rates (R₁), transverse relaxation rates (R₂), and heteronuclear NOEs (hetNOEs) at different protein concentrations. The residues in the NTD have higher R₁ values, lower R₂ values, and lower hetNOEs than those in the CRD because the intrinsically disordered region is more dynamic (Fig. 2, A-C). More importantly, the values obtained for R₂ and R₁ are higher and lower, respectively, for the higher-concentration sample, which is generally interpreted as indicating equilibrium exchange between the monomeric form and a larger molecule (26–31). The hetNOEs vary little with concentration (Fig. 2C) because hetNOEs are relatively insensitive to the overall tumbling rate of a molecule (23–25). The same concentration-dependent trend in the ¹³C transverse relaxation experiments and the flat Carr-Purcell-Meiboom-Gill relaxation dispersion (32, 33) profiles indicate that the increase in ¹⁵N R₂ is not due to chemical exchange (supplemental Fig. S1).
Model-free analysis with one or two types of motions can be applied to interpret the dynamics of a folded protein (34). This analysis may be applied to intrinsically disordered systems with additional motional modes by increasing the number of parameters for mathematical fitting, but more experimental data are required (35). It is possible to describe the dynamics of full-length galectin-3 using other types of motion, but this is beyond the scope of the current study. Because our analyses were mainly focused on the CRD part, we deduced the dynamic properties of full-length galectin-3 from the data collected for the CRD. We used the program Tensor2 (36) to calculate the generalized order parameters (S²) and internal correlation times of the amide N-H bond vectors (τ_i) using the R₁, R₂, and hetNOE data measured in the 40 μM sample (34). Most of the S² values are >0.8, and the τ_i values are <200 ps (Fig. 2, D and E), indicating that the N-H bond vectors move on a timescale shorter than the overall tumbling rate of the molecule. These results are similar to those of an NMR dynamics study of the CRD only (37). Under such criteria, the overall tumbling rate of the CRD can be calculated directly using Equation 2 (see "Experimental procedures"). We averaged the relaxation rate constants (Fig. 2 (F and G) and supplemental Table S1) and used first-order extrapolation to estimate the corresponding values at zero protein concentration (31). The extrapolated R₁ and R₂ values give a rotational correlation time for the CRD of ∼11.76 ns in the presence of the NTD. We then compared this approximation with the theoretical correlation time. The rotational diffusion of a rigid domain in a protein with an intrinsically disordered tail is hindered because of the presence of disordered peptides (38). The HYCUD algorithm has been used to predict the rotational tumbling dynamics of proteins with flexible linkers (39, 40) and of intrinsically disordered proteins (41).
Using the same approach for the CRD in the presence of the NTD yields a correlation time of 11.84 ± 2.34 ns (supplemental Table S1 and Fig. 2H). This value agrees well with the one calculated using relaxation rates extrapolated to zero protein concentration (11.76 ns), confirming that 40 μM galectin-3 is mainly monomeric. Accordingly, for all experiments, we used the data of the 40 μM sample as a monomeric reference to avoid the intrinsic effect on protein dynamics caused by different lengths of the construct, buffer conditions, and working temperatures.
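The first-order extrapolation to zero protein concentration used in this section amounts to a straight-line fit of the averaged relaxation rate against concentration, with the intercept taken as the monomer value. The sketch below assumes ordinary least squares. The concentrations and rates are invented for illustration and are not the values in supplemental Table S1.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_to_zero(concentrations_uM, rates):
    """First-order (linear) extrapolation of a relaxation rate to zero concentration."""
    _, intercept = linear_fit(concentrations_uM, rates)
    return intercept


# Hypothetical averaged CRD R2 values (s^-1) at four protein concentrations (uM).
conc = [40.0, 100.0, 200.0, 400.0]
r2 = [16.2, 17.0, 18.4, 21.1]
r2_zero = extrapolate_to_zero(conc, r2)
print(f"R2 extrapolated to zero concentration: {r2_zero:.2f} s^-1")
```

The same function applies unchanged to R₁. The two intercepts are then the inputs to the correlation-time calculation described under "Experimental procedures".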

The intrinsically disordered NTD interacts with the CRD in a fuzzy complex manner
We applied the same NMR approach to NTD-only and CRD-only constructs (supplemental Figs. S2 and S3). The absence of any variations in chemical shifts, HSQC peak intensity ratios, and R₂ rates confirms that the CRD alone does not self-associate, whereas the corresponding changes for the NTD alone show that it does (5, 6). Therefore, although the CRD in the full-length construct shows more pronounced chemical shift differences between protein concentrations (Fig. 1, D-F), these changes depend on the presence of the NTD.
Figure 3 (caption, partial). [...] with residues 200–220 shown in orange (P). The frequently contacted and negatively charged area is circled in yellow in both panels. Q, a graphical explanation of the peak movements observed for the galectin-3 constructs (using residue Ala-216 as an example). The peak corresponding to what would be observed if this site were fully occupied by the NTD (the bound state, not observed experimentally) is shown in purple. The intermediate positions of the peaks observed for the full-length protein (in red) show that the corresponding residues are in fast exchange between the free and bound states. The more contact there is between the NTD and the CRD (e.g. at higher protein concentrations), the greater the population of the bound state is and the closer the experimental peak is to the bound position. The opposite is true when there is less contact between the two domains (e.g. shorter constructs). Error bars, S.D.
We then systematically truncated the NTD to identify which segment is essential for self-association. The construct without the first 30 residues (Δ1–30) shows the most obvious decrease in the R₂ difference (ΔR₂, gray bars in Fig. 3 (A and B); the averaged ΔR₂ values for the CRD of the full-length and Δ1–30 constructs are 11.7 ± 1.9 and 4.0 ± 0.9 s⁻¹, respectively (supplemental Table S2)). The ΔR₂ values of Δ1–40 are similar to those of Δ1–30, whereas for Δ1–40 to Δ1–100, the ΔR₂ values gradually decrease (Fig. 3 (C-F); 4.0 ± 0.7, 3.6 ± 0.6, 2.1 ± 0.4, and 1.5 ± 0.3 s⁻¹). Similarly, the chemical shift differences between two concentrations for residues ∼200–220 are smaller for the shorter constructs (Fig. 3I). These results indicate that all of these 10-residue-based segments (from 20 to 100) are involved to a certain level in self-association, probably because of the repeated PGAX motif (Fig. 1A; there is no such motif between residues 30 and 40, and this may explain the similar ΔR₂ values between the Δ1–30 and Δ1–40 constructs). Furthermore, the HSQC peaks from residues 200–220 shift systematically with the number of residues truncated (constructs Δ1–30 to Δ1–100; Fig. [...] these residues (positions 200–220) are in the fast exchange regime between a free state (as in the CRD-only construct) and a fully occupied state (Fig. 3Q), thus suggesting that there is interaction between the NTD and residues 200–220. The black peak (the free state) in Fig. 3Q is observed when the surface of residues 200–220 is not in contact with the NTD (i.e. when the CRD is studied in isolation). The concentration dependence experiments and those on the systematically truncated constructs show that the more contact there is between the NTD and residues 200–220, the further the corresponding peaks are from their positions in the CRD-only spectrum (Fig. 3Q). In the 40 μM samples, intermolecular self-association is unlikely (Fig. 2), so these differences must be caused by intramolecular contacts between the NTD and CRD ("intra-NC"). In the 400 μM samples, these differences reflect intermolecular ("inter-NC") and intra-NC contacts. These results also suggest that many NTD sites can bind to a single CRD region, a typical characteristic of fuzzy interactions (42). Consistent with this fuzzy interaction model, SAXS data collected from samples with a low protein concentration (Fig. 4, A and B) show that full-length galectin-3 is smaller than a computationally modeled random NTD ensemble (radius of gyration of 28.92 versus 34.61 Å; Fig. 4, C and D); a model with restrained spatial sampling is closer in size (27.66 Å; Fig. 4E) to the experimentally determined value (see "Experimental procedures" for details of the analysis).
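In the fast exchange regime, the observed chemical shift is the population-weighted average of the free and bound shifts, so the peak position reports on the bound fraction. The sketch below illustrates that relationship. The shift values and populations are hypothetical, and because the bound-state peak is not observed experimentally, the bound shift used here is purely an assumed number.

```python
def observed_shift(delta_free, delta_bound, p_bound):
    """Fast-exchange shift (ppm): population-weighted average of the two states."""
    return (1.0 - p_bound) * delta_free + p_bound * delta_bound


def bound_fraction(delta_obs, delta_free, delta_bound):
    """Invert the weighted average to recover the bound-state population."""
    return (delta_obs - delta_free) / (delta_bound - delta_free)


# Hypothetical 1H shifts (ppm) for one CRD residue; not taken from the paper.
free, bound = 8.20, 8.40
cases = (
    ("short construct", 0.1),
    ("full length, low concentration", 0.3),
    ("full length, high concentration", 0.5),
)
for label, p in cases:
    print(f"{label}: {observed_shift(free, bound, p):.3f} ppm")
```

With these assumed populations, more NTD contact moves the peak monotonically from the free position toward the bound one, which is the behavior summarized in Fig. 3Q.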
We also noticed that the constructs with the first 10 or 20 residues removed (Δ1–10 and Δ1–20) behave differently from the others, with similar or slightly higher ΔR₂ values (9.4 ± 1.4 and 12.8 ± 1.4 s⁻¹ (Fig. 3 (G and H)) compared with 11.7 ± 1.9 s⁻¹ (Fig. 3A)) and chemical shift perturbations similar to those of full-length galectin-3 (Fig. 3J). The HSQC peaks from residues 200–220 in the Δ1–10 and Δ1–20 constructs are further away from their positions in the CRD-only spectrum than they are in full-length galectin-3 at both concentrations (Fig. 3 (M and N) and supplemental Fig. S4). This is the opposite trend from the one shown in Fig. 3 (K and L) and suggests that the first 20 residues hinder the interaction between the NTD and CRD. The only two charged residues in the NTD (Asp-3 and Asp-9, negatively charged) may induce electrostatic repulsion that interferes with N-C interactions. Indeed, modeling (using APBS (Adaptive Poisson-Boltzmann Solver) (43)) the surface charge distribution of the CRD reveals a negatively charged region that corresponds partly to the surface of residues 200–220 (Fig. 3 (O and P), circled in yellow). In addition, the presence of five polar residues between positions 11 and 20 (two serines, two asparagines, and one glutamine) may also hinder contact between the NTD and CRD, because their interaction is mainly driven by hydrophobicity (see below). The different behaviors following the deletion of the first 20 residues may relate to its biological importance; the secretion of galectin-3 is blocked when the first 11 amino acids are removed (44), and a phosphorylated Ser-6 (increasing the total negative charge) is required for its anti-apoptotic activity (45).

Paramagnetic relaxation enhancement experiments show both intermolecular N-N and N-C interactions
Figure 4 (caption, partial). [...] that the radius of gyration (R_g) of galectin-3 is ∼28.92 Å. The range chosen for the fitting is within the qR_g limit (1.28). This experimental R_g is smaller than that of a non-restrained ensemble (R_g = 34.61 Å) (D) but close to that of an ensemble with restrained spatial sampling (R_g = 27.66 Å) (E). A.U., arbitrary units.
To investigate the role of NTD-NTD (supplemental Fig. S2) interactions in full-length galectin-3, we recorded HSQC spectra of ¹⁵N-labeled wild-type protein mixed with an equal amount of NMR-inactive (¹⁴N) protein with an oxidized (1-oxyl-2,2,5,5-tetramethyl-Δ3-pyrroline-3-methyl) methanethiosulfonate (MTSL) label at four different sites (A10C, A31C, A49C, and A100C) in the NTD and one (I250C) in the CRD (Fig. 5A) at high and low protein concentrations (I_ox). We also recorded HSQC spectra of samples with the same concentration of the wild-type or reduced MTSL protein for normalization (I_red). The signal ratios (I_ox/I_red) between the samples with oxidized and reduced MTSL are shown in Fig. 5 (B-F). For the 40 μM samples (gray bars), the I_ox/I_red ratio is close to one for all residues, indicating negligible intermolecular interaction at this concentration (also confirming that MTSL did not introduce an extra intermolecular effect). In the high-concentration samples (black bars in Fig. 5 (B-F)), the signals from the NTD are weaker when labeled with oxidized MTSL on the NTD, suggesting that the NTDs interact intermolecularly ("inter-NN" interactions) (Fig. 5, B-E). A broad range of intensity bleaching in the NTD at these four well-separated spin-label sites suggests that the contact is fuzzy between the NTDs. In the CRD, on the contrary, the peaks whose signal is damped the most come from residues opposite to the carbohydrate-binding site (β-strands 2, 7, 8, 9, and 11) in the A31C, A49C, and A100C instances (the level of bleached intensity is mapped onto the crystal structure in Fig. 5 (H-J)), indicating the presence of inter-NC contacts as well (Figs. 1D and 3). The MTSL label on the A10C mutant has less contact with the CRD (Fig. 5G), in agreement with our conclusion that the first 20 residues inhibit contact between the NTD and CRD (Fig. 3, M-P). Nevertheless, this part of the NTD is still involved in the inter-NN interactions, as shown in Fig. 5B. The MTSL label on the CRD (I250C; β-strand 11) bleaches a broad range of intensity in the NTD, consistent with the model that the NTD contacts the CRD fuzzily and that the first ∼20 residues have less contact with this domain (Fig. 5F). Furthermore, no obvious intensity bleaching in the CRD indicates that intermolecular interaction between the CRDs is negligible (Fig. 5, F and K).
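The I_ox/I_red ratios discussed in this section are per-residue quotients of peak intensities. The sketch below computes such a ratio with first-order error propagation, assuming uncorrelated intensity errors. The intensities, their uncertainties, and the 0.8 cutoff for calling a peak bleached are hypothetical choices for illustration, not values from this study.

```python
import math


def pre_ratio(i_ox, err_ox, i_red, err_red):
    """Return I_ox/I_red and its propagated uncertainty (uncorrelated errors)."""
    ratio = i_ox / i_red
    rel_err = math.sqrt((err_ox / i_ox) ** 2 + (err_red / i_red) ** 2)
    return ratio, ratio * rel_err


# Hypothetical intensities: residue -> (I_ox, err_ox, I_red, err_red)
peaks = {
    45: (5.2e5, 2.0e4, 9.8e5, 2.0e4),
    150: (9.5e5, 2.0e4, 9.7e5, 2.0e4),
    210: (6.1e5, 2.0e4, 9.9e5, 2.0e4),
}
threshold = 0.8  # assumed cutoff, not from the paper
for residue, values in sorted(peaks.items()):
    ratio, err = pre_ratio(*values)
    flag = "bleached" if ratio < threshold else "unaffected"
    print(f"residue {residue}: I_ox/I_red = {ratio:.2f} +/- {err:.2f} ({flag})")
```

A ratio near one, as reported for the 40 μM samples, means that the spin label on the unlabeled partner does not come close to that residue.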

Hydrophobicity is the main force for the self-association
To investigate the driving force mediating the observed interactions, we tested the dynamics of the protein in different buffer conditions. Galectin-3 still self-associates when the concentrations of salt (100 mM NaCl) or ligand (250 mM lactose) are high (Fig. 6 (A-C); with averaged ΔR₂ values of 11.7 ± 1.9, 13.6 ± 4.1, and 17.3 ± 3.4 s⁻¹, respectively). In the presence of 0.8 M urea, on the contrary, the ΔR₂ values between high- and low-protein concentration samples are substantially reduced (11.7 ± 1.9 s⁻¹ versus 7.0 ± 1.4 s⁻¹; Fig. 6D and supplemental Table S2); the HSQC peak intensity ratios are also closer to the molar ratio (0.1; Fig. 6H). In addition, the peaks for residues 200–220 appear at positions closer to those observed in the CRD-only spectrum for both protein concentrations (Fig. 6I). These results indicate that hydrophobicity is the driving force for self-association because urea is known to disrupt the hydrophobic effect (46). We tried to increase the urea concentration (up to 4 M) to entirely disrupt these interactions, but the resulting spectra were of poor quality, with the protein having transitioned into the so-called "molten globule" (47) state (the purple spectrum in supplemental Fig. S6A). Nevertheless, the peaks in Fig. 6I move closer to the positions of the CRD-only construct from 0.8 to 4 M urea (supplemental Fig. S6, B-E), indicating weakened interactions between the NTD and CRD. We also compared ΔR₂ values at different temperatures because the hydrophobic interaction is enhanced at higher temperatures (48). In keeping with our interpretation, the ΔR₂ values are smaller in the low-temperature experiments, indicating a weaker self-association (Fig. 6J). The results from these temperature-dependent experiments are in agreement with those obtained using pulsed-field gradient NMR to measure the diffusion coefficient of self-associated galectin-3 (18).

The fuzzy interaction between the NTD and CRD
Unlike other members of the galectin family, galectin-3 is monomeric in solution with only one carbohydrate recognition site. How this molecule achieves its functional multivalency and which domain mediates self-association are subjects of much debate (9, 12, 14–18). Mayo and co-workers' NMR studies (18, 22) provide the opportunity to investigate this protein at a residue-specific level and offer the chance to close these debates. We recorded HSQC spectra at different galectin-3 concentrations to confirm the findings of Ippel et al. (18) that the chemical shifts of the residues around 200–220 are concentration-dependent (Fig. 1D) and that the intensity ratio between different concentrations is not as expected (Fig. 1E). Accordingly, Ippel et al. (18) concluded that galectin-3 self-association occurs in the CRD rather than the NTD. This contradicts our findings that the NTD interacts with the CRD through a many-to-one binding, inter- or intramolecularly (Figs. 3 and 5). The chemical shift perturbation is undetected in the NTD, probably because every contact to one site of the CRD is dispersed to many sites of the NTD.
We used NMR spin dynamics experiments to characterize galectin-3 self-association (Fig. 2). The results obtained for NTD-truncated constructs prove that the intramolecular interaction between the NTD and the CRD is fuzzy in character (Fig. 3); no single site of the NTD is critical for its interaction with the CRD. SAXS analysis and computational modeling support this fuzzy interaction model (Fig. 4). PRE experiments show both intermolecular NTD-NTD and NTD-CRD interactions (Fig. 5). The similar PRE patterns obtained with the spin labels placed at four different sites are also consistent with fuzzy interactions between the NTDs (Fig. 5). The interaction can be disrupted using a mild concentration of urea or enhanced by increasing the temperature, suggesting that self-association is driven by hydrophobicity (Fig. 6). A model that summarizes these interactions is illustrated in Fig. 7.

The fuzzy interaction and functional implications may be linked through the liquid-liquid phase separation property of galectin-3
The low sequence complexity of the structurally disordered NTD is reminiscent of recently reported proteins whose low-complexity domains mediate their liquid-liquid phase separation behavior in various cellular functions (49–56). Indeed, we found that the NTD (as does a phosphomimetic mutant S6E) undergoes temperature-dependent phase separation (supplemental Fig. S7). As discussed in the recent review by Wu and Fuxreiter (57), fuzzy contact promotes reversible and dynamic assembly via transient and direct interactions and increases interaction affinities. The fuzzy interactions between the NTD and CRD or between the NTD and NTD might thus assist the formation of a higher-order assembly (i.e. may favor liquid-liquid phase separation) in galectin-3. The multivalency that allows galectin-3 to form a galectin-glycan lattice may stem from its ability to undergo liquid-liquid phase separation. Regarding the pentameric model for galectin-3, we note that the situation in which galectin-3 precipitated in the study by Ahmad et al. (9) is similar to those in which proteins coacervated in the presence of counterions or nucleic acids (58). The phase-separated form observed here may be an alternative explanation for this protein's multivalency. These results do not discredit the pentamer model, however; in fact, liquid-liquid phase separation, in bringing the proteins together, may also assist the formation of a pentamer.

Conclusion
Our results show that galectin-3 self-associates via inter- and intramolecular NTD-CRD interactions and intermolecular NTD-NTD contacts. To the best of our knowledge, this is the first time these three types of interactions have been shown to occur in a fuzzy manner and to be driven by hydrophobicity. The fact that galectin-3 undergoes liquid-liquid phase separation sheds new light on the protein's function in its self-associated form. Our study also provides new insight into how galectin-3 self-associates when it binds to glycoconjugates, such as those present on the cell surface, resulting in aggregation of these glycoconjugates or the formation of galectin lattices.

DNA constructs
The plasmid containing a hexahistidine-tagged small ubiquitin-like modifier protein (His₆-SUMO) in a pHD vector and the protease His₆-Ulp1(403–621) were provided as a gift by Dr. T. F. Wang (Institute of Molecular Biology, Academia Sinica); the DNA construct design and protein purification protocol were adapted from Lee et al. (59). The full-length human galectin-3 gene was appended to the His₆-SUMO tag using SfoI (at the 5′-end) and XhoI (at the 3′-end) cutting sites and designed primers (see supplemental Table S3). The fusion construct (His₆-SUMO-Gal3) was used as a basis for the following construct design. cDNA coding for the NTD alone was created by inserting a stop codon at the site corresponding to residue number 113. Truncated constructs (Δ1–10, Δ1–20, etc.) were designed based on the FastCloning method (60) with designed primers (supplemental Table S3). Cysteine mutations were introduced using appropriate primers (supplemental Table S3). All of the resulting plasmids were fully sequenced.
Figure caption (partial). [...] The remaining surface figures with NTDs of different lengths are shown in the same orientation, in white. At low protein concentrations, only intramolecular interactions between the NTD and the CRD occur. The shorter the construct is, the lower the chances are that the NTD interacts with the CRD (Fig. 3, K and Q). At high concentrations, there are both intra- and intermolecular contacts between the NTD and residues 200–220 of the CRD (Figs. 1D and 3Q). Furthermore, the PRE experiments show that there are also intermolecular NTD-NTD contacts (Fig. 5). These contacts occur less frequently when the constructs are shorter, as evidenced by smaller chemical shift perturbations (Fig. 3I).

Protein expression and purification
The protein expression scheme for the fusion construct is the same as our protocol for histidine-tagged proteins described in a previous publication (61). The supernatant of the cell lysate was filtered (0.45 μm) and loaded onto a nickel-charged immobilized metal-ion affinity chromatography (IMAC) column (Qiagen, Inc.). The column was washed using 10 column volumes of 50 mM Tris-HCl with 300 mM NaCl at pH 7.5, and the bound protein was eluted using 5 column volumes of the same buffer with an additional 500 mM imidazole. Imidazole was removed using a PD-10 column (GE Healthcare). His₆-Ulp1(403–621) protease was added to the protein solution to a final concentration of 30 μM and left at 4°C for 2 h to cleave the His₆-SUMO tag from galectin-3. The protease-digested solution was loaded onto a nickel-charged IMAC column, and the flow-through was collected (supplemental Fig. S8A). Protein purity was checked using SDS-polyacrylamide gels (supplemental Fig. S8B). The collected protein was also loaded onto a G75 gel-filtration column (GE Healthcare) with an FPLC system (supplemental Fig. S8C) or a PD-10 column to switch the buffer to a phosphate buffer (20 mM) at pH 6.8. The purified sample was frozen with liquid nitrogen and stored at −80°C until needed. Protease inhibitor (Roche Applied Science) was added before each experiment.
Our purification protocol differs from most of those published previously (5, 18), in which galectin-3 is captured by lactosylated beads and eluted using a high concentration of lactose (250–500 mM). In those instances, extensive dilution is required to avoid any spurious effects from trace amounts of lactose, or a small amount of lactose (e.g. 25 mM) was added to saturate its effect (18).

NMR experiments
All NMR experiments were performed at 303 K, unless otherwise stated, on Bruker AVIII 850- or 600-MHz spectrometers, both equipped with a TCI cryogenic probe. The ¹H-¹⁵N HSQC spectra were collected using a standard pulse sequence with WATERGATE solvent suppression (62, 63). The spin relaxation experiments were measured using a standard pulse sequence (23, 64) with delay times of 17, 34, 51, 68, 85, and 102 ms to determine the ¹⁵N R₂ and 100, 200, 300, 600, 800, and 1000 ms to determine the ¹⁵N R₁. Peak intensities were fitted to exponential decays with a Monte Carlo procedure to estimate fitting error. All dynamics data were collected in an interleaved manner with an interscan delay of 3 s.
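The exponential fitting with Monte Carlo error estimation can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the fit is done by linear least squares on the logarithm of the intensities (a stand-in for a nonlinear exponential fit), the noise is assumed Gaussian with a known standard deviation, and the intensities are synthetic. Only the R₂ delay times come from the text.

```python
import math
import random


def fit_rate(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R * t) by least squares on ln(I); returns (R, I0)."""
    n = len(delays_s)
    logs = [math.log(i) for i in intensities]
    mean_t = sum(delays_s) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in delays_s)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(delays_s, logs))
    slope = stl / stt
    return -slope, math.exp(mean_l - slope * mean_t)


def monte_carlo_error(delays_s, intensities, noise, n_trials=500, seed=0):
    """S.D. of the fitted rate over noise-perturbed copies of the fitted curve."""
    rng = random.Random(seed)
    rate, i0 = fit_rate(delays_s, intensities)
    model = [i0 * math.exp(-rate * t) for t in delays_s]
    trials = []
    for _ in range(n_trials):
        # Clamp to a tiny positive value so the logarithm stays defined.
        noisy = [max(m + rng.gauss(0.0, noise), 1e-9) for m in model]
        trials.append(fit_rate(delays_s, noisy)[0])
    mean = sum(trials) / n_trials
    return math.sqrt(sum((r - mean) ** 2 for r in trials) / (n_trials - 1))


# R2 delays from the text, in seconds; the decay itself is synthetic.
r2_delays = [0.017, 0.034, 0.051, 0.068, 0.085, 0.102]
true_r2 = 15.0  # s^-1, hypothetical
synthetic = [1000.0 * math.exp(-true_r2 * t) for t in r2_delays]
r2, _ = fit_rate(r2_delays, synthetic)
r2_err = monte_carlo_error(r2_delays, synthetic, noise=10.0)
print(f"R2 = {r2:.2f} +/- {r2_err:.2f} s^-1")
```

The log-linear fit weights late, low-intensity points more heavily than a direct exponential fit would, which is one reason a nonlinear fit is usually preferred for real data.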
All NMR data were processed using NMRPipe (65) and analyzed with SPARKY (66). Peak intensities and their errors were measured using non-linear line-shape analysis in NMRPipe (65). The peak intensities from a particular spectrum were normalized to the corresponding number of scans for calculating their ratio with standard error propagation (67). The average chemical shift perturbation for a given peak was calculated from Δδ_H and Δδ_N, the chemical shift differences in the proton and nitrogen dimensions, respectively, between two HSQC peaks (68).
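The exact weighting used to combine the proton and nitrogen shift differences is not reproduced in this text. A common choice is a Euclidean distance in which the nitrogen difference is scaled down by a factor α. The sketch below assumes that form. Both the functional form and the default α = 0.14 are assumptions for illustration, not necessarily what was used in this study, and the example shifts are hypothetical.

```python
import math


def chemical_shift_perturbation(d_h_ppm, d_n_ppm, alpha=0.14):
    """Weighted 1H/15N shift distance; alpha (assumed) scales the 15N difference."""
    return math.sqrt(0.5 * (d_h_ppm ** 2 + (alpha * d_n_ppm) ** 2))


# Hypothetical shift differences (ppm) between two HSQC peaks: (d_H, d_N).
examples = {"residue A": (0.02, 0.10), "residue B": (0.05, 0.40)}
for name, (d_h, d_n) in examples.items():
    print(f"{name}: {chemical_shift_perturbation(d_h, d_n):.4f} ppm")
```

The scaling factor compensates for the much wider chemical shift range of ¹⁵N relative to ¹H, so that neither dimension dominates the combined value.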

MTSL labeling
The nitroxide spin-label MTSL (Toronto Research Chemicals) was attached to the thiol group of cysteine mutations. Using the same purification protocol as described above, after the His₆-SUMO-Gal3 cysteine mutant was eluted from the IMAC column, 5 mM tris(2-carboxyethyl)phosphine or 10 mM DTT was added to reduce the thiol group. After the tris(2-carboxyethyl)phosphine or DTT was diluted 10 times, the sample was concentrated and loaded onto a desalting PD-10 column to remove the reducing agent. The MTSL was added immediately to the flow-through of the desalting column to a final concentration of ∼600 μM (∼10 times the molar amount of protein). After reacting for 30 min, the MTSL was washed out using the PD-10 column. The MTSL-labeling efficiency was checked using Ellman's reagent (69, 70). Protease Ulp1(403–621) was then used to separate His₆-SUMO from MTSL-labeled galectin-3, and the subsequent purification steps were identical to those described above. The reduced-MTSL-labeled sample was prepared by adding 3 μl of ascorbic acid from a stock solution to the NMR sample (with a final concentration 2 times that of the protein); alternatively, a wild-type sample at the same concentration was used as the reference. The MTSL label cannot react with the native cysteine at position 173 (in the middle of β-strand 5) because this cysteine's thiol group is buried in the CRD (confirmed using Ellman's reagent (69, 70)). We used the same MTSL-labeling procedure on ¹⁵N-labeled wild-type galectin-3. The identical HSQC spectrum and intensity profile also show that the native cysteine cannot be spin-labeled. Furthermore, the similar HSQC spectrum from a reduced-MTSL-labeled sample confirms that no extra interaction is introduced by the MTSL (supplemental Fig. S9). The NMR intensity ratios between the reduced- and oxidized-MTSL samples were mapped onto the crystal structure (Protein Data Bank code 2NMO) using an in-house written script.

Dynamics analysis
The squared generalized order parameters (S²) and internal correlation times of the amide N-H vectors (τ_i) in the folded CRD were determined using the program Tensor2 (36).
The rotational correlation time (τ_c) of the well-structured CRD was calculated using Equation 2 (23),
\[ \tau_c = \frac{1}{4\pi\nu_N}\sqrt{6\,\frac{R_2}{R_1} - 7} \qquad (2) \]
where ν_N is the ¹⁵N resonance frequency in hertz. The values used for R₁ and R₂ were the means of the corresponding values for residues 120–200 and 200–250, and the associated S.D. values were used as error estimates. The error for τ_c was calculated using the standard error propagation procedure for functions of several variables (67), namely the following,
\[ \delta\tau_c = \sqrt{\left(\frac{\partial \tau_c}{\partial R_1}\,\delta R_1\right)^2 + \left(\frac{\partial \tau_c}{\partial R_2}\,\delta R_2\right)^2} \]
where δR₁ and δR₂ are the errors derived from the experimental data.
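The correlation-time calculation and its error propagation can be sketched numerically as below. The sketch assumes the widely used R₂/R₁ approximation, τ_c = sqrt(6R₂/R₁ − 7)/(4πν_N), and propagates uncorrelated errors through its analytic partial derivatives. The rates, their uncertainties, and the ¹⁵N frequency of 60.8 MHz (roughly what a 600-MHz spectrometer gives) are illustrative inputs, not the values in supplemental Table S1.

```python
import math


def tau_c(r1, r2, nu_n_hz):
    """Rotational correlation time (s) from R2/R1; requires 6 * R2 / R1 > 7."""
    return math.sqrt(6.0 * r2 / r1 - 7.0) / (4.0 * math.pi * nu_n_hz)


def tau_c_error(r1, r2, d_r1, d_r2, nu_n_hz):
    """First-order propagation of the R1 and R2 uncertainties into tau_c."""
    root = math.sqrt(6.0 * r2 / r1 - 7.0)
    pref = 1.0 / (4.0 * math.pi * nu_n_hz)
    d_dr1 = -pref * 3.0 * r2 / (r1 ** 2 * root)  # partial derivative w.r.t. R1
    d_dr2 = pref * 3.0 / (r1 * root)             # partial derivative w.r.t. R2
    return math.sqrt((d_dr1 * d_r1) ** 2 + (d_dr2 * d_r2) ** 2)


# Hypothetical mean rates (s^-1) and standard deviations for the CRD.
r1_mean, r1_sd = 1.0, 0.05
r2_mean, r2_sd = 16.0, 1.5
nu_n = 60.8e6  # Hz, assumed 15N frequency
t = tau_c(r1_mean, r2_mean, nu_n)
dt = tau_c_error(r1_mean, r2_mean, r1_sd, r2_sd, nu_n)
print(f"tau_c = {t * 1e9:.2f} +/- {dt * 1e9:.2f} ns")
```

Because the formula depends only on the ratio R₂/R₁, a larger R₂ or a smaller R₁ at higher protein concentration both translate into a longer apparent correlation time.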
The theoretical τ_c of the CRD in full-length galectin-3 was predicted using the program HYCUD (39), starting from a structural model of the CRD taken from the Protein Data Bank (accession code 2NMO). The disordered domain was constructed using flexible-MECCANO (71–73), a program that builds up random coil structures using amino acid-specific conformational potentials. Two thousand conformers were created for this HYCUD analysis. The segmental size for the disordered domain was set to 10 amino acids in HYCUD. The program HYDROPRO (74) was used in conjunction with HYCUD to predict the hydrodynamic properties of the solute. The radius of the primary elements was set to 2.9 Å (74). The viscosity of the solvent was set to 0.8 mPa·s, the viscosity of water at 303 K (75).

SAXS experiments and analysis
SAXS data were collected at the 23A SWAXS end-station equipped with an on-line size exclusion-HPLC system (Agilent chromatographic system 1260 series) at the National Synchrotron Radiation Research Center (NSRRC), Taiwan (76). The sample solution (100 μl of 10 mg ml⁻¹; corresponding to the low protein concentration instances after HPLC dilution) was injected into the HPLC column with a flow rate of 0.35 ml min⁻¹ and directed through a quartz capillary (2.0-mm diameter) thermostatted at 303 K for simultaneous SAXS and UV-visible absorption measurements at the same sample position with orthogonal incidences. SAXS data were collected with 1 data frame per 30 s using a Pilatus 1M-F area detector. Buffer solutions were measured under identical conditions for background scattering subtraction. With 15-keV X-rays (wavelength = 0.8266 Å) and a sample-to-detector distance of 3.17 m, the scattering vector q, defined by 4πλ⁻¹ sin θ with scattering angle 2θ, covered 0.007–0.25 Å⁻¹. Data were evaluated for radiation damage, background subtraction quality, and sample concentration effects, and well-overlapped SAXS profiles collected over the sample elution peak of HPLC were integrated for improved data statistics.
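The relation between photon energy, wavelength, and the scattering vector quoted above can be checked with a few lines of arithmetic. In this sketch the function names are hypothetical. The conversion constant hc ≈ 12.39842 keV·Å is a standard physical value, and the q limits are the ones stated in the text.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV * Angstrom


def wavelength_angstrom(energy_kev):
    """X-ray wavelength (Angstrom) for a given photon energy (keV)."""
    return HC_KEV_ANGSTROM / energy_kev


def q_from_two_theta(two_theta_rad, wavelength):
    """Scattering vector q = 4 * pi * sin(theta) / lambda, for scattering angle 2*theta."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength


def two_theta_from_q(q, wavelength):
    """Scattering angle 2*theta (radians) corresponding to a given q."""
    return 2.0 * math.asin(q * wavelength / (4.0 * math.pi))


lam = wavelength_angstrom(15.0)
print(f"wavelength at 15 keV: {lam:.4f} Angstrom")
for q in (0.007, 0.25):
    angle = math.degrees(two_theta_from_q(q, lam))
    print(f"q = {q} 1/Angstrom -> 2*theta = {angle:.3f} degrees")
```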
The data were analyzed using the ATSAS software package (77). The scattering intensity I as a function of the scattering vector q (Å⁻¹) was replotted as q²·I(q) versus q, forming Kratky plots (78), to demonstrate the degree of compactness of the sample. The scattering data were also replotted using the Guinier approximation (ln[I(q)] versus q²) (78),
\[ \ln I(q) = \ln I(0) - \frac{R_g^{2}}{3}\,q^{2} \]
where R_g is the radius of gyration of the molecule, which can be obtained by linear regression (Fig. 4C). The Guinier approximation is valid only for very small angles (i.e. for q < 1.3/R_g) (78).
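The Guinier analysis reduces to a straight-line fit of ln I(q) against q², with R_g obtained from the slope. The sketch below uses plain least squares on a synthetic, noise-free profile. It is not the ATSAS implementation, and the assumed R_g, I(0), and q grid are chosen only to exercise the code.

```python
import math


def guinier_fit(q_values, intensities):
    """Linear regression of ln I(q) on q^2; returns (Rg, I0)."""
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    return math.sqrt(-3.0 * slope), math.exp(mean_y - slope * mean_x)


# Synthetic profile generated from an assumed Rg (Angstrom) and I(0).
true_rg, true_i0 = 28.92, 100.0
q_grid = [0.007 + 0.002 * k for k in range(19)]  # 0.007 to 0.043 1/Angstrom
profile = [true_i0 * math.exp(-((q * true_rg) ** 2) / 3.0) for q in q_grid]
rg, i0 = guinier_fit(q_grid, profile)
in_range = max(q_grid) * rg < 1.3  # validity check for the Guinier region
print(f"Rg = {rg:.2f} Angstrom, I(0) = {i0:.1f}, q*Rg within limit: {in_range}")
```

With real data the fitting range is usually chosen iteratively, because the validity limit q < 1.3/R_g depends on the R_g being estimated.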

Modeling of conformational ensembles
Ten thousand randomized NTD conformers attached to the CRD were generated using flexible-MECCANO (71–73). The R_g was calculated based on the root-mean-squared distances of all of the Cα atoms from the center of mass of each conformer,
\[ R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left|\mathbf{r}_i - \mathbf{r}_{\mathrm{cm}}\right|^{2}} \]
where N is the total number of the Cα atoms, the vector r_i represents the coordinates of atom i, and r_cm is the center of mass of the conformer. Among these 10,000 conformers, around 400 having at least one pair of Cα atoms <8 Å apart, one from residues 20–100 and the other from residues 200–220, were selected as the ensemble representing restricted spatial sampling.
Author contributions: [...] L. conceived the study, wrote the paper, and obtained funding. J. H. conceived the study, collected the data, analyzed the data, wrote the paper, and obtained funding.