In Vivo Identification of Sumoylation Sites by a Signature Tag and Cysteine-targeted Affinity Purification

Small ubiquitin-like modifier (SUMO) is conjugated to its substrates via an enzymatic cascade consisting of three enzymes, E1, E2, and E3. The active site of the E2 enzyme, Ubc9, recognizes the substrate through binding to a consensus tetrapeptide ΨKXE. However, recent proteomics studies suggested that a considerable part of sumoylation occurs on non-consensus sites. Current unbiased sumoylation site identification techniques typically require high stoichiometry in vitro sumoylation, mass spectrometry, and complex data analysis. To facilitate in vivo analysis, we have designed a mass spectrometric method based on an engineered human SUMO-1 construct that creates a signature tag on SUMO substrates. This construct enables affinity purification by covalent binding to cysteine residues in LysC/trypsin-cleaved peptides and site identification by diglycyl lysine tagging of sumoylation sites. As a proof of concept, site-specific and substrate-unbiased in vivo sumoylation analysis of HeLa cells was performed. We identified 14 sumoylation sites, including well known sites, such as Lys524 of RanGAP1, and novel non-consensus sites. Only 3 of the 14 sites matched consensus sites, supporting the emerging view that non-consensus sumoylation is a common event in live cells. Six of the non-consensus sites had a nearby SUMO interaction motif (SIM), which emphasizes the role of SIM in non-consensus sumoylation. Nevertheless, the lack of nearby SIM residues among the remaining non-consensus sites indicates that there are also other specificity determinants of non-consensus sumoylation. The method we have developed proved to be a useful tool for sumoylation studies and will facilitate identification of novel SUMO substrates containing both consensus and non-consensus sites.

Sumoylation is a post-translational modification that consists of covalent conjugation of the small ubiquitin-like modifier (SUMO) to substrate proteins and results in altered activity of the substrate. Sumoylation influences a plethora of cellular processes, including transcriptional regulation of gene expression and genome integrity (1). SUMO conjugation involves an enzymatic cascade employing three enzymes: activating E1, conjugating E2, and ligating E3 (2). Because the active site of the single E2, Ubc9, recognizes the substrate through binding to a consensus tetrapeptide ΨKXE (Ψ, a hydrophobic amino acid; K, the target lysine; X, any amino acid; and E, glutamic acid), SUMO acceptor sites have to date been identified predominantly through mutagenesis of target lysine residues in consensus tetrapeptides (3). Recent proteomics studies by us and others have shown that a considerable proportion of sumoylated proteins do not contain the consensus sites (4-6) and are therefore unreachable by the conventional mutagenesis approach. Furthermore, a model for non-consensus SUMO targeting has been proposed, where SUMO-Ubc9 thioester is recruited by a SUMO interaction motif (SIM) located on the substrate (7). However, the mechanisms for targeting non-consensus substrates remain largely unknown. Thus, novel tools that allow unbiased identification of sumoylation sites are urgently needed.
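The consensus rule described above can be expressed as a simple sequence scan. The following Python sketch flags lysines that sit in a ΨKXE context. The set of residues counted as hydrophobic (Ψ) and the toy sequence are assumptions made for illustration only; the text specifies no more than "a hydrophobic amino acid".

```python
import re

# Assumption: residues treated as hydrophobic (Ψ). The source does not list them.
HYDROPHOBIC = "VILMF"

# A lookahead is used so that overlapping motifs are all reported.
CONSENSUS = re.compile(rf"(?=[{HYDROPHOBIC}]K[A-Z]E)")


def consensus_lysines(sequence: str) -> list[int]:
    """Return 1-based positions of lysines found in a ΨKXE context."""
    # m.start() is the 0-based index of Ψ, so the lysine is at m.start() + 1
    # (0-based), i.e. m.start() + 2 in 1-based numbering.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any protein in this study.
toy = "MALKSEQKAAVKDE"
print(consensus_lysines(toy))  # lysines 4 (LKSE) and 12 (VKDE); lysine 8 is skipped
```

A scan of this kind finds only consensus sites, which is exactly the limitation of the mutagenesis-driven approach discussed above: lysines outside a ΨKXE context are never proposed as candidates.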
A proteomics approach, such as tryptic digestion of proteins followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and database search, has often been used for identification of post-translational modification sites in an unbiased manner (8-10). In the case of sumoylation, site identification has relied upon the high stoichiometry obtained by in vitro sumoylation and target-specific data inspection (11-13). Tryptic digestion of mammalian sumoylated proteins results in peptides with a long side chain on the target lysine residues (see Fig. 1A). The commonly used proteomics techniques are not readily applicable for identification of these branched peptides, which show highly complicated fragmentation patterns in MS/MS analyses when compared with linear peptides. One way to circumvent this problem is a threonine-to-arginine substitution (T95R) near the C terminus of human SUMO-1 (14). The arginine residue leads to tryptic cleavage of the long side chain, whereas two glycine residues remain on the lysine residue as a signature tag (+114.0429 Da), which marks the sumoylation site. This approach is still challenging due to poor detection sensitivity and the lack of specific purification techniques for the peptides containing the sumoylation sites (sumoylated peptides) in complex tryptic digests. To facilitate the in vivo identification of sumoylation sites, we have developed a novel combined strategy using a further engineered human SUMO-1 construct. In addition to the easily detectable signature tag, this strategy also provides enrichment capability for sumoylated peptides. Application of our substrate-unbiased approach to HeLa cells resulted in identification of both consensus and non-consensus sumoylation sites, providing novel insights into SUMO target recognition.
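The mass of the signature tag follows directly from the elemental composition of two glycine residues. As a minimal check, the Python sketch below recomputes it from standard monoisotopic atomic masses; the variable and function names are illustrative and not part of the original work.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of one glycine residue (glycine minus water): C2H3NO.
GLY_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}


def monoisotopic_mass(formula: dict) -> float:
    """Sum the monoisotopic masses of all atoms in an elemental formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# The signature tag is two glycine residues left on the target lysine.
diglycine_shift = 2 * monoisotopic_mass(GLY_RESIDUE)
print(f"{diglycine_shift:.4f} Da")  # 114.0429 Da
```

This is the mass shift that a database search must allow as a variable modification on lysine in order to report sumoylation sites.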

EXPERIMENTAL PROCEDURES
Plasmid Constructs, Cell Culture and Transfection, Western Blotting, His Purification of Sumoylated Proteins, and Verification of Novel Sumoylation Site-The experimental procedures are described in the supplemental Experimental Procedures.
LysC Digestion-The purified sumoylated proteins (92 µg in 470 µl) were reduced for 1 h at 37°C by the addition of 20 mM dithiothreitol (DTT; one sample volume), basified by the addition of 200 mM Tris base (one sample volume), and then digested overnight (~20 h) at 37°C by the addition of LysC (4.6 µg; Wako) in H2O (one sample volume). The resulting solution consisted of 50 mM Tris, 25% phosphate-buffered saline, 2 M urea, 5 mM DTT (pH ~9). To quench the digestion, the solution was acidified with 10% trifluoroacetic acid to a final concentration of 1%. The resulting LysC digest was desalted with three OMIX C18 pipette tips (Varian) according to the instructions of the manufacturer. Briefly, an aliquot of the digest was loaded onto an OMIX tip that had been pretreated with acetonitrile (ACN; 2 × 100 µl) and 1% trifluoroacetic acid (2 × 100 µl). After washing the OMIX tip with 1% trifluoroacetic acid (2 × 100 µl), the LysC digest was eluted with 0.1% trifluoroacetic acid, 80% ACN (100 µl). The three eluates were combined in a 1.5-ml tube and then evaporated.
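The final buffer composition stated above follows from adding three reagents, each in one sample volume, to the sample, which gives a 4-fold dilution of every component. The short Python sketch below reproduces that arithmetic. The helper name is hypothetical, and the composition of the starting sample buffer (8 M urea in undiluted phosphate-buffered saline) is only inferred here from the stated final values, not taken from the text.

```python
def final_concentration(stock: float, stock_volumes: float = 1.0,
                        total_volumes: float = 4.0) -> float:
    """Concentration after diluting `stock_volumes` of stock into `total_volumes`."""
    return stock * stock_volumes / total_volumes


# Sample + DTT + Tris base + LysC in water: four equal volumes in total.
dtt_mM = final_concentration(20.0)    # 20 mM DTT stock
tris_mM = final_concentration(200.0)  # 200 mM Tris base stock

# Inferred, not stated: 2 M urea after a 4-fold dilution implies 8 M in the sample.
implied_sample_urea_M = 2.0 * 4.0

# Substrate-to-enzyme ratio (w/w): 92 µg protein to 4.6 µg LysC.
substrate_to_enzyme = 92.0 / 4.6

print(f"DTT {dtt_mM:.0f} mM, Tris {tris_mM:.0f} mM, "
      f"sample urea {implied_sample_urea_M:.0f} M, "
      f"protein:LysC about {substrate_to_enzyme:.0f}:1")
```

The computed 5 mM DTT and 50 mM Tris agree with the composition given in the text, and the protease is used at roughly 1:20 (w/w) relative to the substrate.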
Enrichment of Cysteine-tagged Sumoylated Peptides-The enrichment method using Thiopropyl-Sepharose 6B beads (GE Healthcare) was developed based on previous reports by Liu et al. (15,16). The desalted LysC digest was reduced for 1 h at 37°C by the addition of 5 mM DTT in a Tris-HCl/EDTA buffer (50 mM Tris-HCl, 1 mM EDTA, pH 7.4; 10 µl). To capture cysteine-containing peptides, the reduced digest was shaken for 1 h with the Thiopropyl-Sepharose 6B beads that had been pre-equilibrated and suspended in the Tris-HCl/EDTA buffer (~50 µl of beads in 100 µl). After spinning the tube, unbound peptides were removed with the supernatant. The beads were washed by rotation for 10 min with the Tris-HCl/EDTA buffer (500 µl), 2 M NaCl (500 µl), 0.1% trifluoroacetic acid, 80% ACN (500 µl), and then the Tris-HCl/EDTA buffer (500 µl). After removal of the supernatant, the beads were shaken for 1 h at 37°C with modified trypsin (920 ng; Promega) in the Tris-HCl/EDTA buffer (50 µl) and then incubated at 37°C overnight (~20 h) to perform on-bead digestion. The resulting tryptic digest, which was expected to contain sumoylated peptides, was collected as the supernatant; the beads were further eluted by shaking for 10 min with the Tris-HCl/EDTA buffer (2 × 50 µl), and the eluates were combined in a tube. Although the tryptic digest was not expected to include cysteine-containing peptides, this digest was reduced for 1 h at 37°C by the addition of 100 mM DTT in 50 mM Tris-HCl buffer (pH ~9; 15 µl) and then alkylated for 30 min at room temperature in the dark by the addition of 500 mM iodoacetamide in the Tris-HCl buffer (15 µl). The alkylation was quenched by the addition of 500 mM DTT in the Tris-HCl buffer (15 µl) to prevent side reactions from excess iodoacetamide (17). After acidification with 10% trifluoroacetic acid (15 µl), the treated tryptic digest was desalted according to Rappsilber et al. (18) with slight modification.
Briefly, a C18 microcolumn was made with three pieces of Empore C18 disk (3M) packed into a 200-µl pipette tip. The tryptic digest was loaded onto the C18 microcolumn that had been pretreated with ACN (50 µl) and 0.1% formic acid (50 µl). After washing the column with 0.1% formic acid (5 × 50 µl), the tryptic digest was eluted with 0.1% formic acid, 80% ACN (50 µl) and then evaporated. After reconstitution with 0.1% formic acid, an aliquot (8.5% v/v) and another aliquot (85% v/v) of the desalted tryptic digest were evaporated for LC-MS/MS analysis and fractionation, respectively.
LC-MS/MS Analysis-Each half of the samples was analyzed by LC-MS/MS using either a QSTAR Pulsar hybrid quadrupole time-of-flight tandem mass spectrometer (Applied Biosystems) or an LTQ Orbitrap XL hybrid linear ion trap-orbitrap mass spectrometer (Thermo Fisher Scientific). The detailed procedures are described in the supplemental Experimental Procedures.
MS/MS spectra of sumoylated peptides suggested with a Mascot expectation value less than 0.05 were inspected manually for identification. The QSTAR MS/MS spectra were centroided before the manual inspection. When a sumoylated peptide was identified with only one of the mass spectrometers, the identical peptide suggested with the other mass spectrometer was inspected manually even if its expectation value was not less than 0.05. A sumoylated peptide containing a known sumoylation site was manually inspected even though its expectation value was high. Sumoylated peptides that showed slightly high expectation values (i.e. a Mascot score >15) were considered candidates and inspected manually. One of the candidate peptides, which showed well annotatable fragment ions and a reasonable fragmentation pattern, was subjected to biological validation.
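The selection rules above amount to a small decision procedure applied to each Mascot hit before manual inspection. The Python sketch below is one possible reading of those rules, not the procedure actually used. The class, field names, and example values are hypothetical, and the final "not inspected" outcome is an assumption, since the text does not say how the remaining hits were handled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A database-search hit for a putative sumoylated peptide (illustrative fields)."""
    peptide: str
    expectation: float        # Mascot expectation value
    score: float              # Mascot ion score
    known_site: bool = False  # peptide carries a previously reported sumoylation site


def triage(hit: Hit, identified_on_other_instrument: bool = False) -> str:
    """Classify a hit according to the inspection rules described in the text."""
    if hit.expectation < 0.05:
        return "inspect"
    # Rescue rules: same peptide identified on the other instrument, or a known site.
    if identified_on_other_instrument or hit.known_site:
        return "inspect"
    if hit.score > 15:
        return "candidate"
    return "not inspected"  # assumption: the source does not describe this case


# Hypothetical peptides and values, for illustration only.
print(triage(Hit("PEPTIDEK", expectation=0.01, score=40)))  # inspect
print(triage(Hit("PEPTIDER", expectation=0.30, score=18)))  # candidate
```

Every branch except the last still ends in manual inspection of the spectrum, so the thresholds only decide which spectra a person looks at, not which identifications are accepted.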

RESULTS AND DISCUSSION
Enrichment Method for Sumoylated Peptides-To facilitate in vivo identification of sumoylation sites of human proteins, we have developed an enrichment method based on a modified, cysteine-tagged form of SUMO-1 (Fig. 1B). Two proteases are used. LysC digestion provides a cysteine-containing branched peptide that can be specifically retained by covalent binding to a thiol-specific resin (Thiopropyl-Sepharose 6B, GE Healthcare), and trypsin releases the target peptide from the immobilized cysteine tag. As a result, peptides containing an extra mass of two glycine residues on a lysine residue are obtained, allowing identification of sumoylation sites by mass spectrometry. To create cysteine-tagged SUMO-1 (SUMO-1C), we mutated several residues of His-SUMO-1 and tested these mutants using human PARP-1 as a model substrate (4,20,21). PARP-1 is a stress-inducible SUMO substrate, and for practical reasons, we used a DNA-binding mutant PARP-1 H53R that is moderately sumoylated also under non-stress conditions (footnote 6). Mutation of some residues, i.e. Ile88 and Glu93, in SUMO-1 resulted in a loss or severe impairment of PARP-1 sumoylation, whereas mutation of others left its sumoylation intact. The tolerated mutations were combined to create His-SUMO-1C containing C52S, H75K, V87K, V90C, Q92C, and T95R, which still retained the sumoylation capability and is marked with X in Fig. 1C.
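The two-protease logic can be illustrated with a toy in silico digest. The Python sketch below rests on simplified cleavage rules that are assumptions for illustration: LysC cleaves after every unmodified lysine, trypsin after every unmodified lysine or arginine, and a modified lysine (written as lowercase `k`) is never cleaved. The sequence is hypothetical, and neither the branched SUMO-1C remnant nor proline effects are modeled.

```python
import re


def cleave(sequence: str, residues: str) -> list[str]:
    """Split a sequence C-terminal to each of the given residues.

    Matching is case-sensitive, so a lowercase 'k' (modified lysine)
    is left uncleaved.
    """
    pieces = re.split(rf"(?<=[{residues}])", sequence)
    return [piece for piece in pieces if piece]


# Hypothetical substrate stretch with a sumoylated lysine ('k').
substrate = "AGRSkDEFKLMR"

lysc_peptides = cleave(substrate, "K")      # first protease: after K only
tryptic_peptides = cleave(substrate, "KR")  # second protease: after K and R

print(lysc_peptides)     # ['AGRSkDEFK', 'LMR']
print(tryptic_peptides)  # ['AGR', 'SkDEFK', 'LMR']
```

In this toy case the modified lysine stays internal to its peptide after both digestions, which is why the diglycine-modified residue appears as an internal lysine in the final tryptic peptide.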
6 H. A. Blomster, unpublished observations.
FIGURE 1. Cysteine-tagged human SUMO-1 for enrichment of sumoylated peptides. A, tryptic digestion of mammalian sumoylated peptides results in peptides with a long side chain on the target lysine residues. B, identification of sumoylation sites using SUMO-1C. Altered residues are marked with asterisks, and the SUMO target lysine is shown in bold. The peptide of interest is different between subsequent purification steps and is indicated with blue color. Left panel, LysC cleavage results in a release of the cysteine tag from the rest of SUMO-1C. The substrate is also digested, but the sumoylated lysine is not cleaved. Middle panel, using Thiopropyl-Sepharose, the cysteine-containing peptides are covalently retained. Right panel, target peptides are eluted with trypsin, and the diglycine (GG)-modified internal lysine is identified as a sumoylation site by LC-MS/MS. C, construction of SUMO-1C. PARP-1 H53R was expressed together with WT SUMO-1 or the indicated mature forms of SUMO-1 mutants in HeLa cells and blotted against the Myc tag to detect sumoylation. SUMO-1 I88K,T95R has lost its ability to conjugate to PARP-1, whereas other mutants, which had no effect on sumoylation, were combined to create His-SUMO-1C, marked with an X. The arrow indicates the sumoylated form of PARP-1. Molecular masses in kDa are indicated on the left side of the blot. Hsc70 was used as a loading control. WB, Western blot.
Identification of Human Sumoylation Sites in Vivo-The applicability of our enrichment method was tested without substrate overexpression using the following experimental setup. The His-SUMO-1C construct was expressed in HeLa cells followed by metal affinity purification under denaturing conditions and removal of unconjugated His-SUMO-1C by a 30-kDa regenerated cellulose cut-off membrane (supplemental Fig. 2). Purified sumoylated proteins were subjected to the cysteine tag peptide enrichment as described above.
The resulting samples were fractionated with an SCX microcolumn and analyzed by LC-MS/MS with QSTAR Pulsar and LTQ Orbitrap XL mass spectrometers. The obtained MS/MS data were searched using Mascot against the Swiss-Prot human database, with variable modifications including the diglycyl lysine residue as a sumoylation site.
We found 14 sumoylated peptides, which were derived from 12 substrate proteins (Table 1, supplemental Tables 1 and 2). According to the Swiss-Prot database, 7 of the 12 substrate proteins were assigned to the nuclear or nuclear membrane compartment, 2 were unassigned, and 3 were assigned to reside outside the nucleus, in membranes, or even cellular projections. Among the sumoylated peptides, 3 and 11 were sumoylated on consensus and non-consensus sites, respectively. This result supports our previously presented hypothesis that SUMO substrate recognition mechanisms are more versatile than earlier anticipated and that numerous SUMO substrates are targeted by as yet unknown mechanisms, independent of the consensus tetrapeptide (4). The identified sites included two well known consensus sites, Lys524 of Ran GTPase-activating protein 1 (RanGAP1) and Lys11 of SUMO-2 (22,23), and two non-consensus sites, Lys7 of SUMO-1 and Lys779 of transcription intermediary factor-1β (TIF-1β) (13,24) (Table 1). SIM sequences were found in close proximity to 6 of the 11 non-consensus sites (data not shown). Our results are consistent with the model of SIM-dependent non-consensus targeting (7), and we propose that individual non-consensus sumoylation sites, at least on small proteins, could be searched for by mutating SIM-like residues. The lack of SIM residues near the remaining identified non-consensus sites indicates that there are also other, yet undefined, specificity determinants that govern the targeting of non-consensus sumoylation. Hundreds of SUMO substrates have been reported in recent proteomics studies (4,5). However, without verified sumoylation site assignments, insights into non-consensus sumoylation such as those described here remain beyond reach.
As examples of MS/MS spectra inspected manually, we present Lys524 of RanGAP1 obtained with LTQ Orbitrap XL and Lys198 of cytoskeleton-associated protein 2-like (CKAP2L) obtained with QSTAR Pulsar (Fig. 2, A and B). The remaining MS/MS spectra are found in supplemental Fig. 3. Low energy collision-induced dissociation of LTQ and QSTAR provided losses of glycine residue(s) from the sumoylation sites (Fig. 2, supplemental Fig. 3). The losses accompanied dominant fragment ions and especially precursor-related ions. When compared with the QSTAR Pulsar, the modern LTQ Orbitrap XL provided a larger number of sumoylated peptides due to its superior sensitivity and precursor mass accuracy (Table 1). LTQ MS/MS used in this study provided many fragment ions, which should result in good Mascot scores for identification. However, due to the relatively low mass accuracy and resolution provided by LTQ MS/MS, annotation of fragment ions with charge states was ambiguous in some cases (e.g. an ion at m/z 502.4 in Fig. 2A), and the MS/MS spectra of sumoylated peptides were hard to analyze manually.
Confirmation of CKAP2L Novel Sumoylation Site-To our knowledge, sumoylation of CKAP2L has not been reported previously. By manual inspection of the MS/MS spectrum obtained with QSTAR (Fig. 2B), most of the predominant ion peaks were annotated as peptide fragment ions. Because QSTAR MS/MS provides isotope-resolved ion peaks, charge states of the fragment ions could be verified. Intense fragment ions, y6, y8, and their doubly charged forms, corresponded to the proline-induced fragment ions. Diglycine modification of the ε-amino group of the peptide N-terminal lysine residue apparently induced b1 and a1 ions instead of the b2 and a2 ions that are frequently observed by QSTAR MS/MS. An ion observed at m/z 115.08 corresponds to fragment GG (theoretical m/z 115.05). Because none of the ion peaks matched the threonine-containing fragment ions with neutral loss of H3PO4, this peptide should be phosphorylated at the tyrosine residue (25). The novel site of CKAP2L was confirmed by expressing HA-CKAP2L WT and HA-CKAP2L K198R together with WT SUMO-1 in HeLa cells. After HA immunoprecipitation and Western blotting against SUMO-1, a complex pattern of several sumoylated bands corresponding to several sumoylation sites of HA-CKAP2L WT was detected (Fig. 2C, supplemental Fig. 1). The finding that two sumoylated bands marked with asterisks were lost in the HA-CKAP2L K198R mutant, i.e. the upper band on top of a smear and the lower band on top of another band (Fig. 2C), demonstrates that CKAP2L is sumoylated on Lys198 and that our method can be used to identify novel non-consensus sites of sumoylation.
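The theoretical m/z quoted for the GG fragment can be reproduced with a short calculation. The Python sketch below assumes that the fragment is a singly protonated, b-type Gly-Gly ion, which is an interpretation consistent with the quoted value rather than something stated in the text. The names are illustrative.

```python
GLY_RESIDUE = 57.02146  # monoisotopic mass of a glycine residue (C2H3NO), Da
PROTON = 1.00728        # mass of a proton, Da


def b_ion_mz(residue_masses: list, charge: int = 1) -> float:
    """m/z of a b-type fragment: sum of residue masses plus one proton per charge."""
    return (sum(residue_masses) + charge * PROTON) / charge


theoretical = b_ion_mz([GLY_RESIDUE, GLY_RESIDUE])
observed = 115.08  # value reported in the text

error_da = observed - theoretical
error_ppm = error_da / theoretical * 1e6

print(f"theoretical m/z {theoretical:.2f}")  # theoretical m/z 115.05
print(f"error {error_da:.3f} Da ({error_ppm:.0f} ppm)")
```

Under these assumptions the observed peak lies about 0.03 Da above the theoretical value. Whether that deviation is acceptable depends on the instrument calibration, which the text does not report.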
Recently, Hsiao et al. (6) reported an informatics tool to facilitate identification of endogenous sumoylated peptides, which are branched with the long side chains, i.e. C termini of SUMOs (Fig. 1A). This tool linearizes expected sumoylated peptides in silico to simplify the MS/MS database search. However, the peptide identification was achieved mainly through detection of predominant fragment ions originating from the C termini of SUMOs, and sufficient numbers of the fragment ions from the substrate peptides, required for confident identification of the sumoylation sites, were not observed (6). Our enrichment strategy using the removable cysteine tag considerably facilitates the in vivo identification of sumoylation sites. Similar to the recent advances in phosphopeptide enrichment (25-28) that have significantly promoted phosphoproteomics studies, the concept of sumoylated peptide enrichment is likely to expand the field of site-specific SUMO proteomics.
Taken together, the results of our study validate a useful tool for sumoylation studies that already in this study gave novel insight into non-consensus sumoylation motifs. As the method is readily available for a broad spectrum of research environments, it will certainly be helpful in unraveling SUMO-targeting mechanisms and facilitate identification of novel SUMO consensus and non-consensus sites.