Poly-Small Ubiquitin-like Modifier (PolySUMO)-binding Proteins Identified through a String Search*

Background: Sumoylation is recognized by proteins with SUMO-interacting motifs.
Results: SUMO-interacting motifs were identified through a computational string search and validated in SUMO binding assays.
Conclusion: Arkadia, FLASH, C5orf25, and SOBP all contain clustered SIMs.
Significance: These proteins contain distinct SUMO binding structures responsible for the recognition of diverse forms of sumoylation.

Polysumoylation is a crucial cellular response to stresses against genomic integrity or proteostasis. Like the small ubiquitin-like modifier (SUMO)-targeted ubiquitin ligase RNF4, proteins with clustered SUMO-interacting motifs (SIMs) can be important signal transducers downstream of polysumoylation. To identify novel polySUMO-binding proteins, we conducted a computational string search with a custom Python script. We found clustered SIMs in another RING domain protein Arkadia/RNF111. Detailed biochemical analysis of the Arkadia SIMs revealed that dominant SIMs in a SIM cluster often contain a pentameric VIDLT ((V/I/L/F/Y)(V/I)DLT) core sequence that is also found in the SIMs in PIAS family E3s and is likely the best-fitted structure for SUMO recognition. This idea led to the identification of additional novel SIM clusters in FLASH/CASP8AP2, C5orf25, and SOBP/JXC1. We suggest that the clustered SIMs in these proteins form distinct SUMO binding domains to recognize diverse forms of protein sumoylation.

Protein sumoylation is a covalent modification on lysine residues by small ubiquitin-like modifier (SUMO) family proteins. It occurs through a biochemical pathway similar to ubiquitylation (ubiquitination), symbolized as an E1-E2-E3 catalytic cascade. Like ubiquitylation, sumoylation modifies many proteins in diverse cellular processes yet has distinct functional consequences (1, 2). Sumoylation promotes a protein-protein interaction through the physical contact between SUMO and the SUMO-interacting motif (SIM), a β-strand consisting of hydrophobic (in the pattern of (V/I/L)(V/I/L)X(V/I/L), (V/I/L)X(V/I/L)(V/I/L), or (V/I/L)X(V/I/L)X(V/I/L)) and acidic residues (3-7). As all SIMs are believed to dock onto the same site on SUMO, they constitute a single class of SUMO binding motif. This is in contrast to the multitude of structures recognizing ubiquitin at several surface sites (8).
SUMO-interacting motifs regulate the activity and specificity of sumoylation and are found in both the catalytic cascade (e.g. the PIAS family SUMO ligases) and the substrates (e.g. Bloom's syndrome helicase) (4, 7, 9-11). The other prominent role of SIMs is to transduce the biological effects of sumoylation (3). For example, upon monosumoylation of human thymine DNA glycosylase, an intramolecular interaction between its SIM and the conjugated SUMO causes an allosteric inhibition of the enzyme's activity (12-14). A SIM in promyelocytic leukemia protein (PML) may facilitate the formation of PML nuclear bodies by promoting the self-assembly of sumoylated PML (15, 16). The SUMO-modified PML bodies in turn recruit other SIM-containing proteins such as Daxx, thymine DNA glycosylase, and RNF4 (14, 17-20). The SUMO-SIM interaction thus is crucial for the functional consequences of protein sumoylation.
To explore how the SUMO signal is recognized through SUMO-binding proteins, we previously identified a family of SUMO-targeted ubiquitin ligases (STUbLs), including RNF4 and its fission yeast homologs, Rfp1 and Rfp2 (21). Interestingly, RNF4 family proteins all contain multiple SIMs positioned closely as a cluster. This has raised the possibility that clustered SIMs are attuned to specifically recognize polySUMO chains (20,22). Although the significance of polysumoylation has been recognized, its function is yet to be well defined (23). Through RNF4-family STUbLs, polySUMO may be a signal for ubiquitylation and proteasome-dependent degradation (18,20,24,25). Likewise, additional SUMO-binding proteins with clustered SIMs may exist and mediate biological effects of polysumoylation other than protein turnover.
We learned from our work that SIM-containing proteins defy a common homology-based identification; although BLAST analysis revealed the similarity between Rfp2 and the mammalian RNF4 based on the sequences of their C-terminal RING domains, it failed to highlight the homology in their SIMs even though this is obvious to the naked eye (21). However, we reasoned from this experience that SIMs could still be computationally identified through a simple motif scan. Similar approaches have been successful in the identification of consensus phosphorylation sites (hence the substrates) of protein kinases (26) as well as other short linear motifs for protein-protein interaction (27). In fact, an in silico search with a limited scope has revealed atypical SIMs similar to the one in CoREST1 (in the pattern of (V/I/L)X(V/I/L)X(V/I/L)) (6). Here we report our discovery of novel SUMO-binding proteins through a systematic computational string search. In particular, we have searched for proteins containing multiple SIMs with sequence similarity to those in RNF4. We identified four mammalian proteins, Arkadia/RNF111, FLASH/CASP8AP2, C5orf25, and SOBP/JXC1, all containing clustered SIMs.

EXPERIMENTAL PROCEDURES
Reagents-Human and mouse Arkadia cDNAs (BC060862 and BC069835, respectively), human C5orf25 cDNA (BC037298), and mouse sine oculis-binding protein (SOBP) cDNA (BC059851) were obtained from the Mammalian Gene Collection and were further PCR-amplified for subcloning into appropriate expression vectors as indicated in the paper. Human FLASH cDNA was a gift from Prof. Odd Gabrielsen (University of Oslo). cDNAs of RNF4 and fission yeast Rfp1 and Rfp2 have been described previously (21). Mutants were generated through PCR amplification (Phusion, New England Biolabs) or PCR-based site-directed mutagenesis (QuikChange, Agilent). Recombinant tri-SUMO proteins (FLAG-3xSUMO1 and FLAG-3xSUMO2) are the fusion of an N-terminal FLAG-tagged, full-length SUMO moiety followed by two copies of a SUMO ΔN mutant (SUMO1 ΔN17 or SUMO2 ΔN11, respectively) and were purified as His8 fusion proteins. Anti-GST/anti-GFP bivalent antiserum was raised in rabbit against a GST-GFP fusion protein. Anti-FLAG (M2) and anti-panSUMO antibodies were purchased from Sigma and Abgent, respectively. Poly-SUMO2 chains were purchased from BostonBiochem, Inc. All recombinant proteins were expressed in Escherichia coli strain BL21 (DE3) and purified through affinity purification using a His or GST tag following routine protocols. Human embryonic kidney 293T cells were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum.
String Search-A naïve string search script was written in Python 3 (see supplemental Scripts S1 and S2) to collect protein sequences that each contain more than one SIM-like motif. Reference proteome sequence files were retrieved from NCBI or UniProt. Using (V/I)(V/I)(D/E)(V/I/L)(T/D/E) and (V/I)(V/I)(V/I/L)(V/I/L)(D/E) as query sequences (as in supplemental Script S1), we obtained about 1500 single hits and about 80 multiple hits (supplemental Table S1). Using (V/I/L/F/Y)(V/I)DLT as the query sequence (supplemental Script S2), we obtained about 200 total hits (supplemental Table S2). Both lists include redundant records or splicing isoforms.
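The search described above can be sketched as a regular-expression scan over a FASTA file. This is only a minimal illustration and not the supplemental Scripts S1 and S2: the function names, the toy sequences, and the decision to report overlapping matches are all assumptions made for the example.

```python
import re

# The two degenerate query sequences from the text, written as regular
# expressions. A zero-width lookahead is used so that overlapping
# matches are all reported (an assumption of this sketch).
QUERIES = [
    re.compile(r"(?=([VI][VI][DE][VIL][TDE]))"),
    re.compile(r"(?=([VI][VI][VIL][VIL][DE]))"),
]


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_motifs(sequence, patterns=QUERIES):
    """Return sorted, de-duplicated (1-based start, motif) hits."""
    hits = set()
    for pattern in patterns:
        for match in pattern.finditer(sequence):
            hits.add((match.start() + 1, match.group(1)))
    return sorted(hits)


def multiple_hits(fasta_text, min_hits=2):
    """Map each header to its hits, keeping proteins with >= min_hits."""
    result = {}
    for header, seq in parse_fasta(fasta_text):
        hits = find_motifs(seq)
        if len(hits) >= min_hits:
            result[header] = hits
    return result


# Toy input: hypothetical sequences, for illustration only.
EXAMPLE = """>toy_two_motifs
MKAAEEVVVIEDSGGSGGSPEEVVDLTNDS
>toy_one_motif
MKAAGGSVVDLTGGS
>toy_none
MKAAGGSGGSGGS
"""

if __name__ == "__main__":
    for name, hits in multiple_hits(EXAMPLE).items():
        print(name, hits)
```

In this sketch a protein is reported only when it carries at least two matches, which corresponds to the "multiple hits" list; setting `min_hits=1` would correspond to collecting single hits as well.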
In Vitro GST Pulldown Assay-Glutathione-agarose beads (10 µl bed volume) were mixed with 10 µg of GST fusion protein in 800 µl of binding buffer (50 mM Tris-HCl, pH 7.5, 400 mM NaCl, 1% Nonidet P-40, 0.1% sodium deoxycholate, 0.5% BSA) at 4°C for 30 min followed by the addition of 20 µg of His-FLAG-3xSUMO1 (or 3xSUMO2). The mixture was incubated at 4°C for 1 h. The beads were washed three times with the binding buffer without BSA and boiled in SDS-PAGE sample buffer. The bead-bound GST fusion proteins and FLAG-3xSUMO were visualized with anti-GST and anti-FLAG immunoblots.
Immunoprecipitation-293T cells growing in 60-mm dishes were transfected using the calcium phosphate precipitation method. For all immunoprecipitation assays the cells were treated with bortezomib (100 nM) for 16 h before cell lysis. Two days after transfection, total cell lysates were prepared with 600 µl of lysis buffer per 60-mm dish (50 mM Tris-HCl, pH 8.0, 400 mM NaCl, 0.5% Nonidet P-40, 0.1% sodium deoxycholate, with protease inhibitor mixture, and 2.5 mg/ml of N-ethylmaleimide) assisted with sonication bursts (3 × 10 s using a microtip on a Branson Sonifier 450 with output control set at 1.5). Anti-GFP immunoprecipitation was carried out by incubating the cleared lysate with 2 µl of anti-GFP serum for 1 h at 4°C followed by another 1-h incubation with protein A-agarose beads. For anti-FLAG immunoprecipitation, a 2-h incubation with anti-FLAG (M2)-conjugated agarose beads (Sigma) was applied instead. The immune complex-bound beads were washed three times with the lysis buffer and analyzed through SDS-PAGE and immunoblotting.

String Search Identified Proteins Containing Multiple Putative SIMs-We conducted our string search in the form of a bioinformatics exercise; we wrote a Python script to run a "naïve string search" against a reference proteome data set, such as an NCBI RefSeq file for human proteins (Fig. 1 and supplemental Script S1). As the consensus sequence of a SIM would match a very large number of proteins, we only aimed for proteins with SIM-like sequences resembling those in RNF4. In particular, considering that all known SIM structures are β-strands, we noticed that SIM2, -3, and -4, but not SIM1 in RNF4, favor a β-strand conformation as predicted by PELE, a collection of secondary structure prediction algorithms hosted on Biology Workbench (supplemental Fig. S1A). Prompted by this observation and also by the experimental evidence arguing for a dispensable SIM1 (20), we restricted our search criterion to sequences based on those of SIM2, -3, and -4 of RNF4 and thus conducted the string search using the degenerate sequences (V/I)(V/I)(D/E)(V/I/L)(T/D/E) and (V/I)(V/I)(V/I/L)(V/I/L)(D/E) as a start. This search resulted in ~80 proteins with at least two matches. Besides RNF4 itself and the hypothetical product of an RNF4 pseudogene (Fig. 1 and supplemental Table S1), the list contained three previously reported SIM-containing proteins: chromatin assembly factor 1 subunit A (CAF1p150/CHAF1A), Smad-interacting zinc finger protein 1 (SIZN1, also known as zinc finger CCHC domain containing protein 12 or ZCCHC12), and androgen receptor-interacting protein 4 (ARIP4, or RAD54-like protein 2) (supplemental Table S1 and Fig. S1B) (28-30). Although our algorithm did not automatically detect clustered SIMs, a list of 80 proteins was short enough for us to visually inspect each protein. As a result, we identified Arkadia/RNF111 with closely located SIM-like motifs as well as a C-terminal RING domain, making it a potential novel SUMO-targeted ubiquitin ligase (Fig. 2A). Notably, secondary structure prediction by PELE strongly suggested that all the SIM-like motifs in Arkadia form β-strands, consistent with the idea that SIMs are acidic β-strands (supplemental Fig. S1A).
Arkadia Has a SUMO Binding Domain with Three SIMs-Our computational search identified two motifs, 300 VVVIE 304 and 382 VVDLT 386 in human Arkadia. Both motifs are surrounded by acidic residues as in a typical SIM. Interestingly, further inspection of the sequence in this region revealed two additional motifs, 274 EEDLFV 279 and 325 EVEIVTV 331 ( Fig. 2A). The first motif, 274 EEDLFV 279 , is also composed of hydrophobic and acidic residues, but it is not similar to any known SIMs and is not strictly conserved among Arkadia homologs. The second motif, 325 EVEIVTV 331 , which would not have been identified by the string search, partially resembles the SIM1 in RNF4. Sequence alignment indicates that 300 VVVIE 304 , 325 EVEIVTV 331 , and 382 VVDLT 386 are identical among all Arkadia proteins found in vertebrates and hence are designated as SIM1, SIM2, and SIM3. The motif 274 EEDLFV 279 upstream of SIM1 is designated as SIM0 ( Fig. 2A).
To validate the SUMO binding activity of this region in Arkadia, we carried out a number of protein binding assays. We first isolated an Arkadia fragment that covers the predicted SIM1-SIM3 (283-415) and conducted a GST pulldown against a linear tri-SUMO2 protein, which serves as a polySUMO mimetic. Specifically, to determine whether the SUMO binding was SIM-mediated, mutations were introduced in each SIM-like motif individually and in all four possible combinations (sim1 m, sim2 m, sim3 m, sim12 m, sim13 m, sim23 m, and sim123 m) (Fig. 2, B and C). We found that among the three motifs, SIM3 is the most critical, as mutating SIM3 resulted in more significant loss of SUMO binding than mutating SIM1 or SIM2 (Fig. 2C, lane 4). In contrast, mutating SIM2 alone appeared to have no obvious impact on SUMO binding. The contribution of SIM1 to SUMO binding is intermediate between SIM2 and SIM3. Mutating SIM1 and SIM3 together totally abolished SUMO binding as did mutation of all three SIMs (Fig. 2C, lanes 6 and 8). Moreover, we found no difference between the fragment covering SIM1-3 (283-415) and a longer one including the upstream SIM0 (258-415) (Fig. 2C, compare lanes 1 to 9 and lanes 4 to 10), indicating that SIM0 is not functionally significant. In conclusion, Arkadia contains a SIM cluster that recognizes polySUMO chains. Hereinafter, we refer to this unique SUMO-binding structure as a SUMO binding domain (SBD).
Although the precise role of SUMO1 in polySUMO chain formation is still being investigated (31, 32), recent identification of an inverted sumoylation consensus motif, (D/E)XK, raised the possibility that SUMO1-containing or SUMO1-exclusive polySUMO chains can form in vivo with Lys-17 (as in 14 GDKKEG 19) of SUMO1 serving as the branching site (33). To determine whether the Arkadia SBD has differential affinity toward SUMO isoforms (SUMO1 versus SUMO2/3), we compared linear tri-SUMO1 with linear tri-SUMO2 in parallel binding assays (Fig. 2, D and E). We found that both the Arkadia SBD and the full-length RNF4 protein associated with either tri-SUMO protein equally well (Fig. 2D), suggesting that the SBDs from Arkadia and RNF4 are not selective against the two major SUMO isoforms. Moreover, when the same set of Arkadia SIM mutants was tested against tri-SUMO1, we obtained a binding profile nearly identical to that with tri-SUMO2 (compare Fig. 2, C and E). Therefore, among the three SIMs in Arkadia SBD, the same hierarchy of SUMO affinity exists regardless of SUMO isoforms, with SIM3 being the most critical and SIM2 being the weakest SIM for both SUMO1 and SUMO2.
In additional assays, we confirmed that the Arkadia SBD could specifically associate (a) with a ladder of authentic polySUMO2 chains in vitro (Fig. 2F), (b) with both EGFP-SUMO1 and EGFP-SUMO2 monomeric fusion proteins in a bead halo binding assay (supplemental Fig. S2A) (34), (c) with a smear of high molecular weight SUMO conjugates in cultured cells (supplemental Fig. S2B), and (d) with SUMO1 in a yeast two-hybrid assay (supplemental Fig. S2C), all in a SIM1/3-dependent manner. Finally, to ascertain the SUMO binding activity of full-length Arkadia proteins, we immunoprecipitated FLAG-tagged, full-length Arkadia from 293T cells coexpressing an HA-tagged SUMO1 or SUMO2 and found that high molecular weight SUMO1 or SUMO2 conjugates specifically associated with wild-type Arkadia and that this association was dependent on an intact SIM cluster but independent of the RING domain (Fig. 2G). We conclude that Arkadia is a SUMO-binding protein capable of recognizing both SUMO1 and SUMO2 through its SUMO binding domain.
Dominant SIMs in an SBD Are Formed by the VIDLT Motif-Our mutational analysis of the Arkadia SBD suggests that individual SIMs in a SIM cluster contribute differentially to SUMO binding (Fig. 2, C and E, and supplemental Fig. S2C). In fact, other clustered SIMs that we encountered previously, i.e. the ones in RNF4 and its fission yeast homologs Rfp1 and Rfp2, also display a two-tiered hierarchy in SUMO binding (supplemental Fig. S3) (20). The SIM3 in Arkadia, SIM2 and SIM3 together in RNF4, SIM2 in Rfp1, and SIM1 in Rfp2 are all essential for their SBDs to interact with SUMO and can provide significant SUMO binding affinity in the absence of the rest of the SIMs in each SBD. We thus call them "dominant SIMs." In contrast, the rest of the SIMs in each SBD all play an "accessory" role, as their mutation caused minimal or no reduction of SUMO binding (Fig. 2C and supplemental Figs. S2C and S3). We conclude that the two-tiered SUMO affinity is a common phenomenon with clustered SIMs; each SUMO binding domain contains at least one dominant SIM that is more critical for SUMO binding than the rest of the SIMs. We suggest that the combined SUMO recognition by dominant and accessory SIMs provides an optimal avidity effect for targeting complex polySUMO structures.
FIGURE 2. Validation of SUMO-interacting motifs in Arkadia/RNF111. A, shown is the domain structure of Arkadia (top) and an alignment of the predicted SUMO binding regions of Arkadia proteins from mouse (Mus musculus NP_291082), human (Homo sapiens NP_060080), frog (Xenopus tropicalis NP_001072805), chicken (Gallus gallus NP_001186680), and zebrafish (Danio rerio XP_001922708). Solid line rectangles highlight the two motifs (SIM1 and SIM3) matching the initial search string; dashed line rectangles highlight two motifs (SIM0 and SIM2) found through visual inspection. B, shown is a list of Arkadia mutants used for testing SUMO binding. Mutation in each SIM replaced three core hydrophobic residues (Val, Ile, or Leu) with Ala. C, shown is the contribution of Arkadia SIMs to SUMO binding. Various GST fusion proteins as indicated were tested for their association with a FLAG-tagged tri-SUMO2 (FLAG-3xSUMO2) in a GST pulldown assay. The glutathione-agarose bead-bound proteins were detected with anti-GST and anti-FLAG antibodies in an immunoblot. The GST-RNF4 and its sim23Δ, an internal deletion mutant lacking both SIM2 and 3, were used as positive and negative controls, respectively. D, shown is a comparison of the interaction of Arkadia and RNF4 with two major SUMO isoforms. A GST pulldown assay was used to compare the association of Arkadia SBD or RNF4 with either tri-SUMO1 or tri-SUMO2. E, shown is the contribution of Arkadia SIMs to SUMO binding. Various GST fusion proteins as indicated were tested for their association with a FLAG-tagged tri-SUMO1 (FLAG-3xSUMO1) in a GST pulldown assay conducted essentially as in C. IB, immunoblot. F, shown is SIM-dependent association of Arkadia with polySUMO chains. The GST pulldown assay was carried out essentially as in C but against the polySUMO2 ladder. The bead-bound proteins were analyzed with anti-panSUMO or anti-GST immunoblots as indicated. Detectable polySUMO2 species are labeled with the calculated length of SUMO2 units. G, shown is SIM-dependent association between Arkadia and high molecular weight sumoylated proteins in cultured cells detected through coimmunoprecipitation. FLAG-tagged full-length Arkadia in the indicated forms (WT; sim, sim13 m; CS, RING domain mutant C971S) were coexpressed with HA-tagged SUMO1 or SUMO2 in 293T cells. Proteins in the anti-FLAG immune complex and in the cell lysate were analyzed in a 4-12% Bis-Tris SDS-PAGE followed by immunoblotting using anti-FLAG and anti-HA antibodies as indicated.
Most dominant SIMs (except the SIM2 in Rfp1) contain a VIDLT core motif similar to the singular SIMs in PIAS family SUMO E3 ligases (Fig. 3A) (21), indicating that 1) such a high degree of sequence conservation implies a requirement for specific structural features in a dominant SIM and 2) the SUMO affinity of individual SIMs in an SBD largely relies on their particular sequence. These notions prompted us to conduct a more detailed mutational analysis to further ascertain the contribution of each residue of 382 VVDLT 386 in Arkadia SIM3. We replaced each of the five residues with Ala and found that indeed all five residues are essential for SUMO binding; in contrast, Val-387, the following C-terminal hydrophobic residue, is dispensable (Fig. 3B). Our observation thus complements that of Yuan Chen and co-workers (35), where they show that only a small set of amino acids is allowed in the PIAS1 SIM for high affinity SUMO binding. Together our results indicate that a precise structure is needed for maximum SUMO affinity of the VIDLT-type SIMs, which accepts only limited perturbations.
To further determine the prevalence of this structural rigidity among the VIDLT-type SIMs and to dissect the contribution of VIDLT residues to the recognition of different SUMO isoforms, we introduced more single-residue mutations in Arkadia SIM3; we replaced the 382 VVDLT 386 pentamer with YVDLT, VIDLT, VLDLT, VVELT, VVNLT, VVDIT, VVDLS, and VVDLV. Some of the substitutions, such as Val to Leu, Asp to Glu, or Leu to Ile, etc., are subtle and would otherwise be considered homologous in common sequence comparisons. We tested these mutants in parallel assays for their physical interaction with tri-SUMO1 and tri-SUMO2. We found that except for YVDLT and VIDLT, all other variants showed significant loss of affinity toward both SUMO1 and SUMO2, again indicating that a precise structure is needed for maximum SUMO affinity of the VIDLT-type SIMs (Fig. 3, C-E). Nevertheless, we did observe that certain mutations at Leu-385 and Thr-386 preserved significant residual affinity toward SUMO1 but not SUMO2 (Fig. 3, C and D, lanes 15-17). We suggest that this reflects a subtle difference between SUMO1 and SUMO2 in the shape of their SIM docking sites at the atomic level, which accounts for the selectivity between certain SIMs (of lesser SUMO affinity) and the different SUMO isoforms (5,35). It is also remarkable that the change from VVDLT to VIDLT enhanced SUMO binding (Fig. 3, C and D, comparing lanes 3  with 7; Fig. 3E), suggesting VIDLT is indeed the most fitted SIM.
Together, our data suggest that not all SIMs are identical in the spectrum of SUMO affinity and the VIDLT-type SIMs are perhaps best fitted for SUMO binding (Fig. 3F). Within the VIDLT pentamer, structural flexibility is allowed at the first position, where a bulkier hydrophobic side chain, such as Phe, Tyr, or Leu, in addition to Val or Ile, may also confer full affinity toward SUMO; in the second position, only Ile or Val is allowed; the rest of the pentamer, i.e. DLT, is essential (Fig. 3F).
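The sequence rule summarized above can be written as a short pattern check. The sketch below is illustrative only: it tests whether a pentamer belongs to the (V/I/L/F/Y)(V/I)DLT sequence space and does not predict binding affinity. The names `VIDLT_TYPE`, `is_vidlt_type`, and `classify` are hypothetical; the pentamers are the Arkadia SIM3 variants listed in the text.

```python
import re

# Refined core pattern from the text: (V/I/L/F/Y)(V/I)DLT.
VIDLT_TYPE = re.compile(r"[VILFY][VI]DLT")

# Arkadia SIM3 (wild type) and the single-residue variants listed in the text.
WILD_TYPE = "VVDLT"
VARIANTS = ["YVDLT", "VIDLT", "VLDLT", "VVELT",
            "VVNLT", "VVDIT", "VVDLS", "VVDLV"]


def is_vidlt_type(pentamer):
    """Return True if the whole pentamer fits (V/I/L/F/Y)(V/I)DLT."""
    return VIDLT_TYPE.fullmatch(pentamer) is not None


def classify(pentamers):
    """Map each pentamer to True/False for pattern membership."""
    return {p: is_vidlt_type(p) for p in pentamers}


if __name__ == "__main__":
    for pentamer, ok in classify([WILD_TYPE] + VARIANTS).items():
        print(pentamer, "matches" if ok else "does not match")
```

Under this pattern, only the wild type, YVDLT, and VIDLT match, which agrees with the variants reported to retain full SUMO affinity in the binding assays.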
Novel SUMO-binding Proteins with Clustered VIDLT-type SIMs-Our data, together with the recent findings by Yuan Chen and co-workers (35), strongly argue that aromatic residues Phe or Tyr may also be allowed in the first position of a pentameric VIDLT-type SIM even though they are rarely seen in reported SIMs. We thus ran another focused string search for proteins containing multiple (V/I/L/F/Y)(V/I)DLT motifs (Fig. 4A, supplemental Script S2 and Table S2). This iteration led to only four hits: RNF4, human FLICE-associated huge protein (FLASH, also known as caspase 8-associated protein 2, CASP8AP2), uncharacterized chromosome 5 open reading frame 25 (C5orf25), and sine oculis-binding protein (SOBP, also known as Jackson circler protein 1, JXC1) (Fig. 4A and supplemental Fig. S5). As with the first string search, our algorithm did not automatically detect clustered SIMs, yet strikingly, the predicted SIMs are all clustered in these four proteins (Fig. 4B and Fig. 5). Thus, both the sequence and the clustered presence of these motifs are highly indicative of important SUMO-binding structures.
In human FLASH, the string search identified three closely spaced motifs, 1683 YVDLT 1687, 1737 FIDLT 1741, and 1794 YIDLT 1798, designated as SIM1, SIM3, and SIM4 (Fig. 4B). We also designated 1700 FIEVT 1704 as SIM2 and predicted that SIM2 would serve as a negative control in our validation even though it is similar to the other three perfectly matched motifs. We validated the SUMO binding of these putative SIMs by carrying out SUMO binding assays against tri-SUMO2 using a GST fusion of FLASH-(1661-1815). We found that the wild-type fragment bound to SUMO as robustly as the Arkadia SUMO binding domain. Except for SIM2, mutating SIM1, -3, or -4 alone caused a reduction of SUMO binding to a certain degree, with sim3 m resulting in the most severe loss followed by sim4 m (Fig. 4C). Next, we tested triple-SIM mutants with only one SIM left intact and found that all triple-SIM mutants showed diminished SUMO association (Fig. 4D), among which sim124 m, with an intact SIM3, preserved the most significant amount of residual SUMO affinity, just as predicted by the binding result with single-SIM mutants. Moreover, sim134 m and sim234 m are incapable of SUMO binding as is the quadruple mutant sim1-4 m, indicating that SIM3 and SIM4 together contribute to the majority of the SUMO binding affinity of FLASH-(1661-1815), whereas SIM2 is indeed insignificant as expected. This is further confirmed by the result of two double-SIM mutants, sim12 m and sim34 m, with the former retaining robust SUMO binding, whereas the latter was completely inactive (Fig. 4D). Last, a slightly larger FLASH fragment (1581-1884) containing the SUMO binding domain could specifically associate with high molecular weight SUMO1 or SUMO2 conjugates in cultured cells in a SIM1/3/4-dependent manner (Fig. 4E). We thus identify FLASH as a novel SUMO-binding protein.

Clustered SIMs Found by String Search
Its SBD contains a cluster of three functional SIMs, all with a distinct signature led by the aromatic residues Phe or Tyr.
As with the Arkadia SIM3, we also examined the structural rigidity of the most critical SIM, SIM3 (1737 FIDLT 1741) in FLASH by testing single-residue mutations (at a smaller scale, though); we replaced the FIDLT pentamer with VIDLT, FIELT, FIDIT, and FIDLS in a sim1 m background (with SIM4 kept intact for a proper dynamic range in detecting the SUMO binding). Consistently, we found that whereas the VIDLT mutant was fully functional, the other three mutations resulted in a significantly diminished SIM3 (Fig. 4F). Our data suggest that FIDLT indeed belongs to the VIDLT-type SIMs with a highly conserved atomic structure and confirm that motifs with the (V/I/L/F/Y)(V/I)DLT sequence represent SIMs with high SUMO affinity.
(77). B, all the five core residues in Arkadia SIM3 (VVDLT) are essential for SUMO2 binding. The SIM3 residues Val-382, Val-383, Asp-384, Leu-385, Thr-386, and Val-387 were each replaced with Ala in a sim12 m background as indicated. Binding assays were conducted as described in Fig. 2C. IB, immunoblots. C, a rigid structure of VIDLT-type SIMs allowing minimum variations for full strength SUMO1 binding. Additional Arkadia SIM3 mutants, V382Y, V383I, V383L, D384E, D384N, L385I, T386S, and T386V were generated in a sim12 m background as indicated. Binding assays were performed as described in Fig. 2C. Examples of purified GST fusion proteins are shown in supplemental Fig. S4. D, a rigid structure of VIDLT-type SIMs allowing minimum variations for full strength SUMO2 binding. Binding assays were performed in parallel to C. E, shown is a summary of single-residue mutations tested in C and D. Mutated residues are indicated in bold; mutations that retained wild-type affinity for SUMO are highlighted. Quantitative densitometry obtained from C and D measures the bound tri-SUMO (FLAG-3xSUMO1 or -3xSUMO2) to the respective GST fusion proteins and is shown as the percentage relative to that of wild-type SIM3. F, shown is a diagram indicating a sequence-dependent SUMO affinity of all putative SIMs (upper panel) and the limited sequence space for the VIDLT-type SIMs (bottom panel).
In C5orf25, we designated the two identified motifs, 26 FIDLT 30 and 45 VIDLT 49, as SIM1 and SIM2 (Fig. 5A) and tested the GST fusion proteins of C5orf25-(1-74) and its sim1 m, sim2 m, and sim12 m mutants in the tri-SUMO2 binding assays. We found that SIM1 and SIM2 contributed equally and together are essential for SUMO binding (Fig. 5B). Consistently, the association of full-length C5orf25 with high molecular weight SUMO1 or SUMO2 conjugates in 293T cells required the simultaneous presence of its SIM1 and SIM2, suggesting that C5orf25 contains an N-terminal SBD with two VIDLT-type SIMs (Fig. 5C).
In SOBP, we isolated a 77-amino acid fragment (SOBP 600-676) containing the predicted SIM1 (620 VVDLT 624) and SIM2 (653 VIDLT 657) and its mutant forms (sim12 m, sim1 m, and sim2 m) as GST fusion proteins (Fig. 5D). In the SUMO binding assay, essentially parallel to previous ones, we found that the wild-type fragment bound to tri-SUMO in a similar fashion to the Arkadia SBD. Unlike C5orf25, here the two SOBP SIMs appeared to function differently; whereas the simultaneous mutation of both SIM1 and SIM2 abolished SUMO binding, mutating SIM2 resulted in a more significant reduction of SUMO association than mutating SIM1 (Fig. 5E). Together, we conclude that a functional SBD with two VIDLT-type SIMs exists in both C5orf25 and SOBP.
DECEMBER 7, 2012 • VOLUME 287 • NUMBER 50
In conclusion, a detailed structural analysis of the Arkadia SBD has led us to the computational identification of additional novel SBDs in FLASH, C5orf25, and SOBP. All three SBDs contain clustered VIDLT-type SIMs, indicating that these SBDs, like the ones in RNF4 and Arkadia, are crucial SUMO binding structures responsible for the recognition of diverse forms of sumoylation (Fig. 5F).
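As noted above, the string search itself did not detect clustering automatically; clusters were recognized by inspection. One possible way to automate that step, offered here purely as an assumption and not as part of the supplemental scripts, is to group motif start positions that lie within a maximum gap of each other. The 100-residue `max_gap` is an arbitrary illustrative threshold, and the function names are hypothetical; the start positions are those reported in the text.

```python
def cluster_hits(starts, max_gap=100):
    """Group motif start positions into clusters.

    Two consecutive hits join the same cluster when their start
    positions differ by at most max_gap residues.
    """
    clusters = []
    for pos in sorted(starts):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def clustered(starts, max_gap=100, min_size=2):
    """Return only the clusters containing at least min_size hits."""
    return [c for c in cluster_hits(starts, max_gap) if len(c) >= min_size]


# Motif start positions reported in the text (string-search hits only).
REPORTED = {
    "Arkadia": [300, 382],
    "FLASH": [1683, 1737, 1794],
    "C5orf25": [26, 45],
    "SOBP": [620, 653],
}

if __name__ == "__main__":
    for protein, starts in REPORTED.items():
        print(protein, clustered(starts))
```

With this threshold each of the four proteins yields a single cluster; a stricter or looser `max_gap` would need to be chosen against experimentally validated SBDs.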

DISCUSSION
SIM Prediction-We have successfully identified novel SUMO-binding proteins using a simple computational search based on a single criterion; that is, the sequence similarity to existing SIMs. As all known structures of SUMO-bound SIMs are ␤-strands (5,7,9,12), we have used secondary structure prediction to add confidence to our SIM search (supplemental Fig. S1A). Consistently, we note that besides Val and Ile, Thr is also frequently associated with known SIMs, most likely because as ␤-branched amino acids, Thr, Val, and Ile all have the greatest propensity to form a ␤-strand (36,37). Therefore, for future improvements, secondary structure prediction algorithms may be combined with the string search to achieve a more accurate SIM prediction. A more comprehensive protocol for computational SIM prediction has also been described recently (38). In either case, it may be difficult to balance accuracy and thoroughness in such a motif scan. For example, without filtering, certain giant proteins such as Titin were found in our search presumably because of the abundance of ␤-strands in their structure. As another example, the two SIM-like motifs found in small CTD phosphatase 3 (supplemental Table S1) are buried inside the protein according to its crystal structure (PDB 2HHL) (39) and are, therefore, unlikely to be exposed for SUMO interaction unless the protein is unfolded. Above all, a SIM prediction is more meaningful when closely tied to careful experimental validation.
Various Flavors of SIM Sequences-The three SIMs in Arkadia come in different flavors: SIM1 (VVVIE) resembles RNF4 SIM4 (VVIVD), SIM2 (VEIVTV) is similar to RNF4 SIM1 (IEL-VET), and SIM3 (VVDLT) is almost identical to both SIM2 (IVDLT) and SIM3 (VVDLT) in RNF4. Arkadia SIM1 and SIM3 also match the previous designation of SIM subtypes, SIM-a and SIM-b, respectively, whereas SIM2 appears to be rather a bidirectional SIM than a pure SIM-r (40,41). Differences in sequence may contribute to the selectivity of individual SIMs toward different SUMO isoforms (5,35). Consistently, some of our Arkadia SIM3 mutants showed better residual interaction with SUMO1 than with SUMO2 (Fig. 3). On the other hand, with the intact Arkadia or RNF4 SBD, we did not observe any binding preference toward either SUMO1 or SUMO2 (Figs. 2 and 3, and supplemental Fig. S2A), indicating that the Arkadia SIMs together can recognize both SUMO1 and SUMO2-conjugated proteins. Moreover, the dominant SIMs in both RNF4 and Arkadia are the VIDLT-type (or the SIM-b subtype) SIMs that are also found in the PIAS SIMs, which display a high affinity to both SUMO1 and SUMO2/3 (Figs. 2 and 3) (5, 35), again suggesting that clustered SIMs with a VIDLT-type SIM can potently target both SUMO1 and SUMO2/3 isoforms.
It is striking that the VIDLT pentamer maintains a rigid structure; even a subtle variation would lower its affinity for SUMO, except for the first position where it can accommodate aromatic residues Phe and Tyr (Fig. 3) (35). This indicates that not all SIMs (or the permutations of SIM consensus sequence) are equal; some of our single-residue mutations in Arkadia SIM3 resulted in dramatic reduction of SUMO binding yet still matched the SIM consensus sequence. We believe that the VIDLT-type SIMs make the best fitted structure for the interaction with all SUMO isoforms and are likely preserved as a key feature of the sumoylation system during evolution (35). Likewise, chemical mimetics of VIDLT may disrupt overall SUMO-SIM interaction and impair the activity of sumoylation essential for living cells, e.g. during their recovery from genotoxic stress (37). We further speculate that the dominant SIMs drive a "zipping-up" process during the recognition between clustered SIMs and multiple SUMO moieties, which likely initiates in a dynamic manner before adopting a stable configuration.
... and its mutants as indicated were purified as GST fusion proteins and tested for their association with FLAG-3xSUMO2. Analogous to the mutational analysis with Arkadia shown in Fig. 2B, single SIM mutations (sim1m, sim2m, sim3m, sim4m) were generated by replacing all three core hydrophobic residues (Tyr/Phe, Val/Ile, and Leu) with Ala in each of the four putative SIMs. Binding assays were carried out as in Fig. 2C. The GST-Arkadia-(283-415) and its sim123m mutant were used as positive and negative controls, respectively. Dual color immunoblot (anti-GST and anti-FLAG) on a single membrane was processed through the Odyssey Infrared Imaging System. Quantitative densitometry is shown in the bar graph below as the ratio of each FLAG-3xSUMO2 band to its corresponding GST band. IB, immunoblot. D, shown is the contribution of individual SIMs in the FLASH SUMO binding domain. The FLASH mutants with only one (sim234m, sim134m, sim124m, sim123m) or none (sim1-4m) of the four SIMs left intact as well as two double SIM mutants (sim12m, sim34m) were tested for SUMO binding as in C. Examples of purified GST-fusion proteins are shown in supplemental Fig. S4. E, shown is the SIM-dependent association between the FLASH SBD and high molecular weight sumoylated proteins in cultured cells. EGFP-tagged fragments of FLASH (residues 1581-1884, either wild-type or sim134m mutant) were coexpressed with HA-tagged SUMO1 or SUMO2 in 293T cells as indicated. Immunoprecipitation (IP) was carried out with anti-GFP antiserum and protein A-agarose beads. Proteins in the precipitated immune complex and in the cell lysate were analyzed in an 8% Tris-glycine SDS-PAGE followed by immunoblotting using anti-GFP and anti-HA antibodies as indicated. The stacking gel was preserved during immunoblotting in order to detect high molecular weight SUMO conjugates that could not enter the resolving gel. vect, EGFP alone; sim, sim134m. *, it is unclear why the sim134m fragment showed a different migration (as also seen with some other SIM mutants in this study); this may reflect a conformational change due to the loss of intact SIMs. F, a precise structure of FLASH SIM3 allows minimum variations for full strength SUMO binding. Single-residue replacements in FLASH SIM3, F1737V, D1739E, L1740I, and T1741S, were generated as indicated in a sim1m background. Binding assays were performed as in C and D.
Significance of Clustered SIMs-The tandem array of SIMs in RNF4 family proteins is thought to specifically recognize polySUMO chains (20). It is worth noting that the spacing from SIM1 to SIM2 in Rfp1 or Rfp2 is ~20 amino acids because a similar spacing is also found between the tandem ubiquitin-interacting motifs in a number of polyubiquitin-binding proteins (42). This is also about the same distance between RNF4's SIM1 and SIM3 or its SIM2 and SIM4 and also between the SIM1 and SIM2 in C5orf25. This particular spacing may contribute to the optimal recognition of polySUMO chains, as with the tandem ubiquitin-interacting motifs for polyubiquitin chains (42,43). Although a structural model is lacking for the interaction between a SIM cluster and a polySUMO chain, one can speculate that two adjacent SUMO units form physical contacts with either the SIM1/3 or the SIM2/4 pair in a stable RNF4-polySUMO chain complex.
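The spacing argument above can be illustrated with a short Python sketch. It assumes that spacing is measured from the start of one motif to the start of another and that "about 20 amino acids" can be expressed as a target with a tolerance; the function name, the tolerance of 3 residues, and the start positions in the example are hypothetical and are not taken from any of the proteins discussed here.

```python
from itertools import combinations


def spaced_pairs(starts, target=20, tolerance=3):
    """Return (i, j, spacing) for motif pairs spaced within target +/- tolerance.

    `starts` holds motif start positions in ascending order; i and j are
    1-based motif indices (SIM1, SIM2, ...). Defaults are illustrative only.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(starts, start=1), 2):
        spacing = b - a
        if abs(spacing - target) <= tolerance:
            pairs.append((i, j, spacing))
    return pairs


if __name__ == "__main__":
    # Hypothetical start positions for four clustered motifs.
    print(spaced_pairs([10, 20, 31, 41]))
```

With these made-up positions, only the first/third and second/fourth motifs fall in the ~20-residue range, mirroring the kind of SIM1/3 and SIM2/4 pairing discussed for RNF4.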
Besides polySUMO interaction, clustered SIMs may also target other forms of sumoylation, such as a protein or protein complex that is monosumoylated at multiple sites (Fig. 5F). The SBDs in Arkadia and FLASH may be particularly fitted for this kind of SUMO binding, where the SIMs are further separated and presumably more flexible. The avidity effect due to multiple SIM-SUMO contacts may thus provide sufficient affinity and specificity for the formation of sumoylation-regulated protein complexes, which would be distinct from those involving a singular SIM and often a secondary, SUMO-independent physical contact (44,45). Furthermore, it is also conceivable that clustered SIMs form a folded β-sheet (Fig. 5F). It then follows that there may exist a dynamic balance between a folded, standalone β-sheet and an open, less ordered conformation with multiple extended SIMs in complex with poly/multi-SUMO. Given its unique structure, we propose that a SIM cluster or a tandem array of closely located SIMs be designated as an SBD. More detailed structural and functional characterization will be needed for us to understand how SIM-containing proteins interpret the SUMO signal.
Potential Biological Function of SBDs-Arkadia is known to act as a RING domain ubiquitin ligase to promote transcriptional activation downstream of the TGFβ signaling pathway, especially during early embryonic development in vertebrates (46-49). Arkadia may also influence the activity of other signaling pathways, as Arkadia was found to interact with Axin1, a canonical component of the Wnt pathway (50). The identification of Arkadia as a SUMO-binding protein in the TGFβ pathway raises the possibility that Arkadia may interact with a sumoylated protein specific to this signaling pathway, such as Smad4 or SnoN (51)(52)(53)(54), or with a poly- or multisumoylated transcriptional repressor complex (6,55). We will describe in a separate paper that the combined activity of SUMO binding and RING domains in Arkadia provides a novel SUMO-targeted ubiquitin ligase specific to the TGFβ pathway (footnote 4).
The SUMO binding domain of FLASH is in a region previously found to interact with the tandem death effector domains (DEDs) of procaspase-8 and was named "DED-related domain" (DRD) (56). However, the sequence similarity between FLASH and CED-4/Apaf-1, its alleged functional homologs, has been disputed (57). Thus, the precise role of FLASH in apoptosis is still unclear. We note that Ubc9, the SUMO-conjugating enzyme (E2), was identified in addition to FLASH in the initial yeast two-hybrid screen against the procaspase-8 DEDs and that the FLASH SBD itself has also been shown to form a yeast two-hybrid interaction with Ubc9 and PIAS1 (56,58,59). It may not be coincidental that both SUMO1 itself (also known as Sentrin) and TTRAP/TDP2 and Daxx, two SIM-containing proteins, were initially found as death domain-binding proteins also through yeast two-hybrid screens against death receptors Fas/CD95 or TNFR/CD40 (5,19,60-63).
As both death domain and DED belong to the death domain superfamily and form oligomers (78), perhaps using an oligomerization domain as bait in a yeast two-hybrid screen is prone to hitting the SUMO pathway components and SUMO-binding proteins. This raises an intriguing possibility that protein oligomerization triggers sumoylation and may also explain the early identification of Ubc9 (UBE2I) and SUMO1 from a two-hybrid screen against Rad52, a protein also forming homotypic polymers (64-67). More recently, FLASH was shown to localize to Cajal bodies and function in transcriptional control or RNA processing (68-71). It is conceivable that the SBD in FLASH contributes to sumoylation-regulated assembly of nuclear complexes such as Cajal bodies or is responsible for the communication between Cajal bodies and sumoylated nuclear entities such as PML bodies (3,70,72).
SOBP is a nuclear zinc finger protein whose molecular functions are largely unknown. Recessive mutations of the mouse Sobp gene cause defective patterning of sensory epithelium and deformed organ of Corti in the inner ear, resulting in a deafness phenotype known as Jackson circular (jc) (73). An SOBP truncating mutation lying immediately after the two SIMs has also been associated with defects in the development of the nervous system in humans (74). SOBP may act as a critical transcription factor for neural development in mammals. Sumoylation is well known for its role in transcriptional repression through modification of certain transcription factors and SIM-mediated assembly of nuclear protein complexes (55). Future studies will resolve how sumoylation regulates the function of SOBP through its SUMO binding domain.
4 H. Sun, Y. Liu, and T. Hunter, manuscript in preparation.
B, validation of the C5orf25 SUMO binding domain. C5orf25-(1-74) and its mutants as indicated were purified as GST fusion proteins and tested for their association with FLAG-3xSUMO2. Binding assays were carried out as in Fig. 2C. The GST-FLASH-(1660-1885) and its sim1-4m mutant were used as positive and negative controls, respectively. Examples of purified GST-fusion proteins are shown in supplemental Fig. S4. IB, immunoblots. C, SIM-dependent association between C5orf25 and high molecular weight sumoylated proteins in cultured cells detected through coimmunoprecipitation. FLAG-tagged full-length C5orf25 in indicated forms (WT; sim, sim12m) was coexpressed with HA-tagged SUMO1 or SUMO2 in 293T cells. Immunoprecipitation was carried out with anti-FLAG (M2)-conjugated agarose beads. Proteins in the precipitated immune complex and in the cell lysate were analyzed in an 8% Tris-glycine SDS-PAGE followed by immunoblotting using anti-FLAG and anti-HA antibodies as indicated. The stacking gel was preserved during immunoblotting to detect high molecular weight SUMO conjugates that could not enter the resolving gel. D, shown are predicted SIMs in SOBP. Top panel, the relative position of predicted SIMs in SOBP; middle panel, the sequence of SOBP residues 600-676 with the two SIMs (VVDLT starting at residue 620 and VIDLT starting at residue 653) highlighted; bottom panel, SIM mutants of SOBP were generated by replacing three hydrophobic residues in each or both of the two putative SIMs (sim1m, V620A/V621A/L623A; sim2m, V653A/I654A/L656A). E, SOBP SIMs show differential contributions to SUMO binding. Wild-type and mutant forms of SOBP fragment 600-676 as indicated were purified as GST fusion proteins. The binding assay was performed essentially as described in Fig. 2C. GST-Arkadia-(258-415) and its sim3m mutant (see Fig. 2C) were included as positive and negative controls. F, a diagram illustrating diverse forms of protein sumoylation that can be recognized by SUMO binding domains containing clustered SIMs.
In summary, we have found a number of SUMO-binding proteins with unique SIM compositions through a data mining approach. In contrast to most SUMO-binding proteins, which often have a singular, stand-alone SIM, the proteins described here all contain SUMO binding domains with clustered SIMs and are likely crucial components for SUMO recognition in diverse biological processes.