Dynamic and differential in vivo modifications of the isoform HMGA1a and HMGA1b chromatin proteins.

Most naturally occurring mammalian cancers and immortalized tissue culture cell lines share a common characteristic, the overexpression of full-length HMGA1 (high mobility group A1) proteins. The HMGA1 protooncogene codes for two closely related isoform proteins, HMGA1a and HMGA1b, and causes cancerous cellular transformation when overexpressed in either transgenic mice or "normal" cultured cell lines. Previous work has suggested that the in vivo types and patterns of the HMGA1 post-translational modifications (PTMs) differ between normal and malignant cells. The present study focuses on the important question of whether HMGA1a and HMGA1b proteins isolated from the same cell type have identical or different PTM patterns and also whether these isoform patterns differ between non-malignant and malignant cells. Two independent mass spectrometry methods were used to identify the types of PTMs found on specific amino acid residues on the endogenous HMGA1a and HMGA1b proteins isolated from a non-metastatic human mammary epithelial cell line, MCF-7, and a malignant metastatic cell line derived from MCF-7 cells that overexpressed the transgenic HMGA1a protein. Although some of the PTMs were the same on both the HMGA1a and HMGA1b proteins isolated from a given cell type, many other modifications were present on one but not the other isoform. Furthermore, we demonstrate that both HMGA1 isoforms are di-methylated on arginine and lysine residues. Most importantly, however, the PTM patterns on the endogenous HMGA1a and HMGA1b proteins isolated from non-metastatic and metastatic cells were consistently different, suggesting that the isoforms likely exhibit differences in their biological functions/activities in these cell types.

Overexpression of the HMGA1 gene (formerly known as HMGIY (1)) is such a consistent feature of tumors that it has been suggested to be a "diagnostic" biochemical marker of both neoplastic transformation and cancer progression (2-8). Elevated levels of HMGA1 gene products and HMGA1 proteins have been observed in almost every cancer type investigated and are correlated with increasing degrees of malignancy and metastatic potential of a number of cancer types (6,9,10).
Additionally, it was demonstrated that overexpression of HMGA1 proteins in "normal" rat 1a cells and a human breast adenocarcinoma cell line caused neoplastic transformation and malignant metastatic progression of these cells, respectively (11,12). Together, these results demonstrate that overexpression of full-length HMGA1 proteins in tumor cells is extremely widespread and biologically important.
The HMGA1a (formerly known as HMG-I) and HMGA1b (formerly known as HMG-Y) isoform proteins are architectural transcription factors that are derived from alternatively spliced mRNA transcripts coded for by the HMGA1 gene located on human chromosome 6 (locus 6p21), with HMGA1b (95 amino acids; ~10.6 kDa) containing an internal 11-amino acid deletion compared with HMGA1a (106 amino acids; ~11.5 kDa) (13). Both isoforms contain three independent DNA-binding regions, called AT-hook motifs, that bind to the DNA minor groove and preferentially recognize the structure of short stretches of AT-rich sequence (14-16). However, the internal 11-amino acid deletion changes the spacing between AT-hooks I and II in the HMGA1b protein compared with that in the HMGA1a isoform.
Originally, the HMGA1a and -A1b proteins were thought to possess the same biological activities/functions because the binding of recombinant proteins (which lack secondary biochemical modifications) to DNA substrates appeared to be similar for both isoforms. The first suggestion that HMGA1a and -A1b proteins might have different biological functions in vivo came from the work of Banks et al. (17), which demonstrated that post-translational modifications (PTMs) of the HMGA1a and HMGA1b isoforms affect their mode of DNA-protein interactions. Additionally, different PTM patterns were detected in vivo on HMGA1a and -A1b proteins isolated from different tissue culture cell types (17). Direct confirmation that HMGA1a and HMGA1b could indeed have different functions in vivo was subsequently obtained when one or the other of these isoforms was overexpressed as a tetracycline-regulated transgene in human MCF-7 mammary epithelial tumor cells (12). These experiments demonstrated that artificially induced overexpression of the HMGA1b transgene in the non-metastatic MCF-7 cells caused them to progress much more rapidly to a metastatic and highly malignant phenotype than did induced overexpression of the transgenic HMGA1a isoform. Moreover, oligonucleotide microarray studies revealed that the gene expression profile in MCF-7 cells overexpressing transgenic HMGA1a was significantly different from the profile in cells overexpressing transgenic HMGA1b. Western blot analysis of proteins being expressed in these cells further confirmed the microarray results, thus demonstrating that the HMGA1a and HMGA1b isoforms differentially regulate specific genes in the transgenic MCF-7 mammary epithelial cells (12). Clearly, there are distinct functional differences between the HMGA1a and -A1b isoforms in vivo, but more research is required to determine whether PTMs of the proteins play a role in these observed biological differences.
In vivo phosphorylation of HMGA1 was first discovered in Ehrlich ascites cells (18) and Friend erythroleukemic cells (19) where, along with histone H1, they are among the most highly phosphorylated chromatin proteins in the nucleus. Further research also demonstrated that HMGA1 proteins are phosphorylated in a cell cycle-dependent manner, specifically during metaphase (20,21). These discoveries led researchers to identify HMGA1 phosphorylation by specific kinases such as cyclin-dependent kinase 2 (22-24), casein kinase II (25), protein kinase C (17,26), and homeodomain-interacting protein kinase-2 (27). In addition, other PTMs of HMGA1, such as acetylation by cAMP-response element-binding protein binding protein and p300/CBP-associated factors (28,29) and protein methylation (17,30,31), have been discovered. More recently, di-methylations of both arginine and lysine residues were discovered on the HMGA1a protein that may correlate with the metastatic potential of the cells (32). It has also been demonstrated that HMGA1 proteins are biochemically modified in vivo for specific purposes, for example, enhanceosome formation/destabilization (28,29) and participation in apoptosis (30,31,33). Nevertheless, the present knowledge of the types and patterns of in vivo PTMs found on HMGA1 proteins in different cell types and in the same cells under different physiological conditions is quite limited and must be better characterized before the biological significance of such modifications can be fully and critically assessed.
An important unanswered question in cell biology is whether differences in the types and patterns of in vivo PTMs present on closely related isoform proteins differentially influence the biological function/activity of these proteins in cells. For example, a complete characterization of the types and sites of PTMs present on the HMGA1a and -A1b proteins found in the same cell type has not yet been conducted. Likewise, no systematic assessment has been made of the types and sites of PTMs found on these closely related isoform proteins in cells that have been experimentally induced to exhibit markedly different phenotypic characteristics to gain insight into the molecular modifications of the proteins relating to cellular phenotype. In the present study, we employed mass spectrometry techniques to examine systematically the in vivo PTMs found on the endogenous HMGA1a and -A1b proteins in cells under varying sets of conditions. We focused on the differential in vivo PTMs found on the closely related HMGA1a and HMGA1b proteins isolated from two genetically matched MCF-7 lines of human mammary epithelial cells ("MCF-7/Tet-Off" and HA7C) whose origins and biochemical and phenotypic characteristics have been described in detail previously (12). The parental MCF-7/Tet-Off cell line expresses very low levels of the endogenous HMGA1 protein, does not readily grow in soft agar, and does not form tumors in nude mice, which is characteristic of normal mammary epithelial cells (12). In contrast, the HA7C cell line (which was originally derived from the MCF-7/Tet-Off cell line by stable transfection) is induced to overexpress transgenic HMGA1a protein when the cells are grown in the absence of tetracycline. 
As a consequence of this induced overexpression of transgenic HMGA1a protein (~40 times the amount of the endogenous protein and well within the concentration range found in naturally occurring human cancers), the HA7C cells acquire the ability to grow in soft agarose, form primary tumors when injected into nude mice, and exhibit a moderately metastatic and invasive phenotype (12).
In contrast to previous studies that identified PTMs on only the most abundant HMGA1 protein species present in cells, an important technical advance in the present investigation was the inclusion of all of the different in vivo modified forms of the endogenous HMGA1 proteins found in a given cell type for analysis by mass spectrometry. Our results demonstrate that the endogenous HMGA1 proteins within a given cell type represent a complex and heterogeneous population, with the HMGA1a and HMGA1b proteins exhibiting a much higher level of in vivo PTMs than has been reported previously. Most importantly, we report the first observed differential in vivo modifications on the HMGA1a and HMGA1b proteins isolated from the same cell type, and we demonstrate that the PTM patterns found on these isoforms differ in non-metastatic and metastatic MCF-7 cells. Furthermore, complex patterns of PTMs were observed on all three of the AT-hook regions of the HMGA1a and -A1b proteins that appeared to differ between the isoforms present within the same cell type as well as between non-metastatic and metastatic cells. These findings provide the first comprehensive comparison of the PTM patterns found on the HMGA1 isoforms in mammalian cells.

MATERIALS AND METHODS
Cell Lines and Cell Culture Methods-The cell lines utilized in this study were MCF-7/Tet-Off (Clontech, catalog number C30071), which is a non-metastatic line of MCF-7 human breast adenocarcinoma cells, and HA7C, a derivative line of MCF-7/Tet-Off cells that has been stably transfected with an expression vector containing a full-length human HMGA1a cDNA (clone 7C) (34) in which transcription is controlled by a tetracycline-regulated promoter (12). In HA7C cells, the N terminus of the HMGA1a cDNA protein-coding region is fused, in-frame, with a hemagglutinin peptide tag so that the chimeric transgenic protein produced by the vector can be distinguished and separated from the native, endogenous HMGA1a protein. Both the parental MCF-7/Tet-Off and the HA7C cell lines were grown and maintained as described previously (12). Tetracycline was not added to the medium of the HA7C cells in order to maximize the transgenic HMGA1a protein overexpression. The cells were harvested when confluent and frozen at -70°C until needed.
Protein Isolation, Purification, and Detection-Recombinant human (rh) HMGA1 proteins were extracted from Escherichia coli BL21 DE3 pLysS cells (Stratagene, La Jolla, CA) with diluted (5%) trichloroacetic acid and were purified by reverse phase high performance liquid chromatography (RP-HPLC) techniques, as described previously (35). Briefly, following acid extraction, the HMGA1 proteins were purified by utilizing an RP-HPLC C4 Microsorb analytical column with a linear acetonitrile, 0.2% trifluoroacetic acid gradient from 10 to 25% acetonitrile. Further purification of the HMGA1 proteins was accomplished with a C18 Dynamax analytical RP-HPLC column with a linear acetonitrile, 0.2% trifluoroacetic acid gradient from 5 to 23% acetonitrile. RP-HPLC isolated HMGA1 protein purity was assessed by SDS-PAGE following standard protocols (34) and verified by MALDI MS. Endogenous HMGA1 proteins were also isolated from both of the MCF-7 cell lines by employing these conditions.
Detection of Endogenous HMGA1 Proteins Isolated from MCF-7 Cells-To confirm the presence of HMGA1 proteins extracted from both the parental and the transgenic MCF-7 cell lines and from RP-HPLC-purified protein fractions, SDS-PAGE and Western blot analysis using a polyclonal anti-HMGA1 antibody were employed following published protocols (35). Additionally, SDS-PAGE was used to assess the purity and concentration (within the nanogram to microgram range) of endogenous, in vivo modified HMGA1 proteins using known concentrations of pure rhHMGA1 proteins as reference standards (data not shown).
Enzyme Digestions of the HMGA1 Proteins-Purified native HMGA1 proteins from each of the experimental cell lines were separated into two equal fractions of 0.5 mg and lyophilized. One of the samples was dephosphorylated with calf intestinal phosphatase (CIP) (Roche Applied Science), and the reactions (3 μg of in vivo modified HMGA1, 30 units of CIP) were carried out at 37°C for 12 h in the reaction buffer supplied by the manufacturer. The dephosphorylated HMGA1 proteins were purified from the CIP by trichloroacetic acid precipitation (35). Both the native and dephosphorylated samples were then individually lyophilized, reconstituted, and divided into two equal fractions. Sequencing grade proteinase Arg-C (Sigma) or sequencing grade trypsin (Promega, Madison, WI) (50:1 mass ratio of protein to enzyme) in 50 mM ammonium bicarbonate (pH 8.0) at 37°C was used to digest the samples. Tryptic digestions were carried out for 4 h, whereas Arg-C digestions were carried out for 10 h; both reaction times were determined empirically. Partial tryptic digestions were essential because complete cleavage of HMGA1 resulted in small peptide fragments that were difficult to analyze by mass spectrometry. Trifluoroacetic acid was added to a final concentration of 2% to terminate the reactions. The samples were lyophilized, and the peptides were redissolved in H2O.
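The effect of partial digestion on the resulting fragment population can be illustrated with a minimal in silico sketch. It is not part of the original protocol: the function name `cleave`, the toy sequence, and the simplified cleavage rules (Arg-C after every arginine; trypsin after lysine or arginine unless followed by proline) are assumptions made for illustration, and the toy sequence is not the HMGA1 sequence.

```python
import re

# Hypothetical toy sequence, NOT the HMGA1 sequence.
TOY_SEQUENCE = "SEKRPGRGKPAAR"

def cleave(sequence, enzyme, max_missed=1):
    """Return (start, end, peptide) tuples, 1-based inclusive positions.

    Simplified rules (assumptions): Arg-C cuts after every R;
    trypsin cuts after K or R unless the next residue is P.
    max_missed is the number of missed cleavage sites allowed per peptide.
    """
    if enzyme == "arg-c":
        pattern = r"R"
    elif enzyme == "trypsin":
        pattern = r"[KR](?!P)"
    else:
        raise ValueError(f"unknown enzyme: {enzyme}")
    sites = [m.end() for m in re.finditer(pattern, sequence)]
    # Fragment boundaries: sequence start, internal cut sites, sequence end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptides.append((bounds[i] + 1, bounds[j], sequence[bounds[i]:bounds[j]]))
    return peptides

complete_argc = cleave(TOY_SEQUENCE, "arg-c", max_missed=0)
partial_argc = cleave(TOY_SEQUENCE, "arg-c", max_missed=1)
complete_trypsin = cleave(TOY_SEQUENCE, "trypsin", max_missed=0)
```

Allowing missed cleavages adds longer, overlapping fragments to the list, which is the property that makes overlapping peptide maps possible in this kind of analysis.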
MALDI Mass Spectrometry-A PerSeptive Biosystems Voyager DE-RP MALDI time-of-flight (TOF) mass spectrometer (Framingham, MA) was used to analyze the HMGA1 digestions in the linear positive ion mode according to published protocols (17,36,37). TOF was measured over 256 laser pulses and averaged into a single spectrum. Saturated matrix solution consisting of 3,5-dimethoxy-4-hydroxycinnamic acid, 50% acetonitrile, 0.2% trifluoroacetic acid was mixed with full-length rhHMGA1 and purified in vivo modified HMGA1 proteins. The HMGA1 protein digestions were then mixed with α-cyano-4-hydroxycinnamic acid matrix solution and calibration mixture 3 (Cal Mix) and analyzed by MALDI-TOF MS. Both matrices were purchased from Sigma. To maximize peak detection, three different scans were carried out on every protein/peptide sample.
MALDI-TOF MS Calibration-The MALDI-TOF mass spectrometer was internally calibrated using standards purchased from PE Biosystems (Foster City, CA). Briefly, the best peak definition for the range of analysis (400-5000 mass/charge (m/z)) needed for the rhHMGA1a peptide samples was achieved with the instrument settings of noise filter "1" and Gaussian smooth "23." Internal calibrations were performed using the 379.35 m/z α-cyano-4-hydroxycinnamic acid matrix dimer and the 5734.59 m/z bovine insulin peaks, and these conditions were chosen for the first "rough" internal calibration. An additional fine-tuned calibration of each spectrum was accomplished by calibrating on multiple unmodified HMGA1 peptides resulting from either tryptic or Arg-C digestions.
Data Analysis-Data Explorer version 5.1 (PE Biosystems) was used to analyze m/z ratios for proteins and peptides. The analysis strategies were as follows. Prior to analysis, known peptide peaks resulting from autodigestion of trypsin (prospector.ucsf.edu/ucsfhtml4.0/misc/trypsin.htm) and potential human keratin contamination masses identified by the EXPASY FindPept tool (www.expasy.ch) were removed from each tryptic spectrum. Autodigestion products and human keratin peaks in the Arg-C enzymatic digestions were also identified with the FindPept tool and removed from the corresponding spectra. The remaining peak masses were then copied from the Data Explorer program into an Excel (Microsoft, Redmond, WA) spreadsheet on both the vertical axis and the horizontal axis to produce a two-dimensional matrix. Each cell off the axes of the spreadsheet matrix contained a formula that calculated mass differences between adjacent spectrum peaks. Peaks separated from an adjacent peak by a mass difference in the 21.5- to 22.5-Da range, corresponding to possible sodium adducts, were removed from further analysis. Peak masses remaining after removal of the possible contaminating human keratin peaks, trypsin/Arg-C peptide peaks, and sodium adducts were further analyzed with the EXPASY FindMod tool (www.expasy.ch), the Protein Prospector MS-Digest program (www.prospector.ucsf.edu), and manual mass calculations. The average mass values of 14.03, 42.08, and 79.98 were used for the analysis of the potential PTMs methylation, acetylation, and phosphorylation, respectively. Only peaks whose mass error fell within our criterion of ±150 ppm were analyzed further.
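The spreadsheet steps described above can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the procedure actually used: the peak list is invented, the function names are hypothetical, and the sodium-adduct check compares every pair of peaks (a superset of the adjacent-peak comparison described in the text). Only the 21.5-22.5 Da window and the ±150 ppm criterion are taken from the text.

```python
from itertools import combinations

SODIUM_WINDOW = (21.5, 22.5)  # Da, window quoted in the text
PPM_TOLERANCE = 150.0         # ppm, criterion quoted in the text

def difference_matrix(peaks):
    """Square matrix of mass differences (column peak minus row peak)."""
    return [[col - row for col in peaks] for row in peaks]

def flag_sodium_adducts(peaks):
    """Return peaks lying 21.5-22.5 Da above some other peak."""
    low, high = SODIUM_WINDOW
    adducts = set()
    for lighter, heavier in combinations(sorted(peaks), 2):
        if low <= heavier - lighter <= high:
            adducts.add(heavier)
    return adducts

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def within_tolerance(observed, theoretical, tol=PPM_TOLERANCE):
    return abs(ppm_error(observed, theoretical)) <= tol

# Hypothetical peak list, for illustration only.
peaks = [1000.50, 1022.48, 1500.80, 2000.10]
adducts = flag_sodium_adducts(peaks)
remaining = [p for p in peaks if p not in adducts]
```

In this made-up list the peak at 1022.48 sits 21.98 Da above the peak at 1000.50 and is therefore set aside as a possible sodium adduct before any PTM assignment is attempted.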
Electrospray Ionization Tandem Mass Spectrometry-HMGA1 peptides from enzymatic digestions were separated by using an Agilent 1100 capillary LC system with a 40-cm capillary column (150 μm inner diameter × 360 μm outer diameter, Polymicro Technologies, Phoenix, AZ) packed with 5-μm C18 particles (PoroS 20R2, Applied Biosystems, Foster City, CA). Peptide elution was achieved at a flow rate of 1.8 μl/min using water, acetonitrile, 0.1% acetic acid, 0.01% trifluoroacetic acid with a linear gradient from 10 to 60% acetonitrile. The capillary column flow was infused directly into a Thermo-Finnigan LCQ Deca XP ion trap mass spectrometer. The mass spectrometer duty-cycle length was optimized to include a single full MS scan followed by three MS/MS scans on the three most intense parental masses (determined by Caliber® software in real time) from the single parent ion full scan. Dynamic mass exclusion windows of 3 min were used. MS/MS spectra for all samples were measured with an overall mass/charge window of 400-2000 m/z.
Analysis of HMGA1 Peptides-Resulting tandem mass spectra were analyzed by SEQUEST® (Bioworks 2.0, ThermoFinnigan) (38-42), which compares experimental spectra with predicted idealized mass spectra generated from a database of protein sequences. These idealized spectra are weighted largely with "b" and "y" fragment ions, i.e. fragmentation at the peptide bond from the N and C termini, respectively. The peptide mass tolerance was set to 1.0; the fragment ion tolerance was set to 0.0; and trypsin and Arg-C enzyme rules were applied during SEQUEST® analysis. Single and double dynamic modifications representing phosphorylation, methylation, and acetylation were used to analyze the MS/MS spectra against a database containing primarily HMG proteins and other chromatin proteins. Extensive manual analysis of singly and doubly ionized spectra was conducted (43) to validate and complete the analysis begun with SEQUEST®.
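To make the "b" and "y" fragment ions concrete, the following sketch computes singly charged b- and y-ion m/z values for a short peptide with optional modifications. It does not reproduce SEQUEST® scoring. The function names, the toy peptide, and the use of standard monoisotopic residue and modification masses are assumptions for illustration; the text does not state which mass type was used in the search.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
MODS = {"phospho": 79.96633, "acetyl": 42.01057, "methyl": 14.01565}

def residue_masses(peptide, mods=None):
    """Residue masses with modifications; mods maps 1-based position -> names."""
    mods = mods or {}
    return [
        MONO[aa] + sum(MODS[name] for name in mods.get(pos, ()))
        for pos, aa in enumerate(peptide, start=1)
    ]

def b_ions(peptide, mods=None):
    """Singly charged b1..b(n-1) m/z values (N-terminal fragments)."""
    total, ions = 0.0, []
    for mass in residue_masses(peptide, mods)[:-1]:
        total += mass
        ions.append(total + PROTON)
    return ions

def y_ions(peptide, mods=None):
    """Singly charged y1..y(n-1) m/z values (C-terminal fragments)."""
    total, ions = 0.0, []
    for mass in reversed(residue_masses(peptide, mods)[1:]):
        total += mass
        ions.append(total + WATER + PROTON)
    return ions

PEPTIDE = "SESSSK"  # hypothetical toy peptide
plain_b = b_ions(PEPTIDE)
phospho_b = b_ions(PEPTIDE, {3: ("phospho",)})  # phosphoryl group on residue 3
```

A modification on residue 3 leaves b1 and b2 unchanged and shifts b3 and every later b ion by the mass of the modification, which is the kind of pattern that allows a modification to be localized to a single residue.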

Diverse Post-translationally Modified HMGA1 Isoform Protein Populations within Human MCF-7 Mammary Epithelial Cells-As illustrated in Fig. 1, the endogenous HMGA1a and HMGA1b proteins were isolated from the MCF-7/Tet-Off cells and scanned with MALDI-TOF MS to examine their PTMs. The most prominent peaks in the in vivo HMGA1 comprehensive profiles generally correspond to the proteins with a single acetylation, most likely N-terminal acetylation. Furthermore, HMGA1a (11.54-12.46 kDa) and HMGA1b (10.59-11.36 kDa) PTMs are observed over large mass ranges (Fig. 1). The array of different masses observed in the MALDI-TOF MS spectra of these protein preparations indicates that both the HMGA1a and -A1b proteins exist in cells as a heterogeneous population of many different biochemically modified forms. Furthermore, their different mass ranges suggest that some HMGA1a proteins are more highly modified in vivo than the HMGA1b isoforms.
FIG. 3. MALDI spectra of Arg-C-digested HMGA1b proteins. Mass peaks from unmodified HMGA1b Arg-C digest peaks are labeled with letters, and modified HMGA1b Arg-C digest peaks are labeled with numbers. The insets are magnified segments of the main MALDI spectrum enlarged to better visualize minor mass peaks. A, spectrum of Arg-C-digested recombinant human HMGA1b protein. B, spectrum of Arg-C peptides derived from in vivo modified HMGA1b proteins purified from MCF-7/Tet-Off cell lines. Information on positively identified peaks, indicated by letters or numbers, is presented in Table I. Peaks identified by mass analysis as coming from human keratin proteins are indicated by K followed by a number. Unlabeled peaks are potential partial cleavage products of either the HMGA1b protein or the Arg-C enzyme, or fall outside the selection criterion range, and therefore were not positively assigned.
In Vivo PTMs of Full-length HMGA1 Proteins-RP-HPLC separations were performed to isolate all of the HMGA1a and HMGA1b chromatography fractions from each other for subsequent MS analysis (data not shown). It should be stressed that the HMGA1 proteins were purified to near-homogeneity by RP-HPLC and identified by Western blot analysis using specific anti-HMGA1 antibodies prior to analysis (35). Therefore, the wide range of masses observed in the spectra results from different in vivo biochemical modifications of the proteins rather than from spurious protein contaminations (17). MALDI-TOF MS spectra of these highly purified HMGA1a (Fig. 2A) and HMGA1b (Fig. 2C) samples isolated from HA7C cells demonstrated that most of the in vivo modified proteins were indeed successfully separated from each other by using RP-HPLC techniques (compare Fig. 1 with Fig. 2, A and C). Furthermore, some PTMs that occur on the proteins were tentatively assigned by subtracting the masses of unmodified or N-terminally acetylated HMGA1 proteins from larger mass peaks in the spectra. Phosphorylated, acetylated, and methylated proteins were tentatively identified for HMGA1a, but a distinct mass peak containing all three modifications was not detected (Fig. 2A). By mass analysis, HMGA1b proteins containing phosphoryl, acetyl, and methyl groups were tentatively identified (Fig. 2A). Likewise, an HMGA1b protein that contained no N-terminal acetylation was identified. Even though some types of PTMs could be identified by analysis of endogenous full-length HMGA1 proteins, additional methods had to be used to verify the potential PTMs observed as well as to tentatively assign them to specific amino acid residues.
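The subtraction approach described above amounts to asking which combinations of methyl, acetyl, and phosphoryl groups could account for an observed mass difference. A minimal sketch follows; the function name, the cap on the number of groups, the example masses, and the choice to express the error in ppm of the observed mass are all assumptions, whereas the three mass increments are the average values quoted under "Data Analysis."

```python
from itertools import product

# Average mass increments quoted in the Data Analysis section (Da).
PTM_MASSES = {"methyl": 14.03, "acetyl": 42.08, "phospho": 79.98}

def candidate_ptm_sets(observed, unmodified, max_each=3, tol_ppm=150.0):
    """List PTM count combinations that explain observed - unmodified.

    max_each caps the count of each PTM type (an arbitrary assumption);
    the error is expressed in ppm of the observed mass.
    """
    delta = observed - unmodified
    names = list(PTM_MASSES)
    hits = []
    for counts in product(range(max_each + 1), repeat=len(names)):
        if not any(counts):
            continue  # skip the unmodified case
        added = sum(n * PTM_MASSES[name] for n, name in zip(counts, names))
        error_ppm = abs(delta - added) / observed * 1e6
        if error_ppm <= tol_ppm:
            hits.append(dict(zip(names, counts)))
    return hits

# Hypothetical masses: an unmodified protein plus one acetyl and one phosphoryl group.
unmodified = 10000.0
observed = unmodified + 42.08 + 79.98
hits = candidate_ptm_sets(observed, unmodified)
```

With these hypothetical numbers the search returns two candidates, one acetyl plus one phosphoryl group and three methyl groups plus one phosphoryl group, because three methyl increments are nearly isobaric with one acetyl increment. This illustrates why assignments made from intact-protein masses alone can only be tentative.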
In Vitro Dephosphorylation of HMGA1 Proteins-The in vitro removal of phosphoryl groups by CIP resulted in a significant mass range shift toward the less modified forms of HMGA1 (Fig. 2, B and D). The spectrum of protein masses observed after dephosphorylation demonstrated that a significant amount of HMGA1 phosphorylation occurred in vivo. However, most of the in vivo HMGA1 proteins were also modified with other moieties. Furthermore, by subtraction of HMGA1 unmodified mass peaks from some of the new mass peaks observed after dephosphorylation, other modification types were identified on some of the new peaks. A di-acetylated and dimethylated HMGA1a protein isoform was tentatively identified, along with an unmodified form (no N-terminal acetylation) of HMGA1a (Fig. 2B). In addition, a di-acetylated form of HMGA1b was observed, with positive assignment of other potentially acetylated and methylated forms not unambiguously identified, even though we detected various peaks in the spectrum (Fig. 2D).
Analysis of full-length HMGA1 proteins purified from the cells provided some information on the types of PTMs that were present on the proteins. To identify the specific modifications and which amino acids were modified in vivo, however, we digested native HMGA1 and dephosphorylated HMGA1 proteins with two different proteolytic enzymes, and we analyzed the digestion products by MALDI-TOF MS. The results were organized into overlapping peptide maps that allowed us to positively identify many of the amino acids of HMGA1 that were biochemically modified in vivo.
MALDI Analysis of Enzymatic Digestions of the HMGA1b Protein Isolated from MCF-7/Tet-Off Cells-The MALDI-TOF MS spectrum of Arg-C-digested native HMGA1b exhibited a much more complex cleavage pattern than that of the unmodified recombinant human HMGA1b protein (Fig. 3). Nevertheless, native HMGA1b peptide mass peak "7" (Fig. 3B) and rhHMGA1b peptide mass peak "g" (Fig. 3A) correspond to the same Arg-C peptide fragment, unmodified HMGA1b amino acids 1-29 (Table I). Thus, the more complex pattern of peptides observed for the Arg-C-digested native protein isolated from MCF-7/Tet-Off cells is a consequence of the in vivo secondary biochemical modifications of the protein. For example, the HMGA1b peptide corresponding to amino acids 62-95 is completely unmodified in the Arg-C rhHMGA1b digest (Fig. 3A, peak h) but contains one or two phosphoryl groups in the endogenous HMGA1b Arg-C digestion (Fig. 3B, peaks 11 and 12). Similarly, peaks "3" and "4" demonstrate that the same peptide fragment (corresponding to amino acids 30-46) can be completely unmodified or contain an array of modifications, such as methyl and acetyl groups (Fig. 3B).
Dephosphorylation of Native HMGA1b Verifies Types of PTMs-Confirmation of phosphoryl groups on peptide fragments was achieved by enzymatic dephosphorylation of native HMGA1b isolated from HA7C cells prior to digestion by the proteolytic enzymes. Verification of phosphorylated peptides is necessary because the negative charges on phosphoryl groups suppress mass spectrometry ionization (44). The dephosphorylated HMGA1b spectra also confirmed which peptides were acetylated and methylated by analyzing the spectra for nonphosphorylated peptide peaks and mass peaks that arose from dephosphorylation of the peptides. For example, HMGA1b from HA7C cells digested with the Arg-C proteolytic enzyme illustrated that the peptide 73-95 contains two phosphoryl groups and an acetyl group (Fig. 4A). When the native HMGA1b sample was dephosphorylated, the two phosphoryl groups were removed, which shifted the peak 160 Da (Fig. 4B), confirming the presence of two phosphoryl moieties on the peptide. Furthermore, these results also verified that amino acids 73-95 contain an acetyl group and that more than one modification can occur simultaneously on the same peptide fragment. Unfortunately, at the present time no enzymes are known that selectively remove acetyl or methyl groups from the HMGA1 proteins that could be used for additional verification of these types of modifications.
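The arithmetic behind this verification is simple: each phosphoryl group contributes about 79.98 Da (the average value quoted under "Data Analysis"), so removing two groups should shift a peak by roughly 160 Da. The sketch below estimates the number of phosphoryl groups from a native/dephosphorylated peak pair. The function name, the 0.5-Da acceptance window, and the example masses are assumptions for illustration.

```python
PHOSPHO_MASS = 79.98  # Da per phosphoryl group, average value quoted in the text

def phosphoryl_count(native_mass, dephosphorylated_mass, tol_da=0.5):
    """Estimate how many phosphoryl groups explain a CIP-induced mass shift.

    Returns None when the shift is not close to a whole number of groups.
    tol_da is an arbitrary acceptance window (assumption).
    """
    shift = native_mass - dephosphorylated_mass
    count = round(shift / PHOSPHO_MASS)
    residual = shift - count * PHOSPHO_MASS
    if count >= 0 and abs(residual) <= tol_da:
        return count
    return None

# Hypothetical peak pair separated by 160 Da.
groups = phosphoryl_count(2700.0, 2540.0)
unexplained = phosphoryl_count(2700.0, 2600.0)
```

A 160-Da shift is consistent with two phosphoryl groups (2 × 79.98 = 159.96 Da), whereas a 100-Da shift does not correspond to a whole number of groups and is reported as unexplained.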
Different PTMs Detected between the HMGA1a and -A1b Protein Isoforms-Comparisons were also performed between the complex MALDI-TOF MS spectra obtained from enzymatic digestions of the native HMGA1a and HMGA1b proteins isolated from the same cells. For example, spectra of Arg-C digestion of these two isoform proteins isolated from the HA7C cell line revealed that they exhibit many different peaks and also a differential cleavage pattern by the enzyme (Fig. 5). Differentially modified peptides were detected in these spectra. For example, analyses revealed that the N-terminal peptides of HMGA1a Arg-C digestions were all unmodified in the spectrum shown in Fig. 5A (peaks 4, 7, and 11; Table II, lower section), whereas the N-terminal peptides from HMGA1b Arg-C digestions (Fig. 5B) were found to be modified with phosphoryl, acetyl, and methyl moieties (peaks k and l; Table II, top section). Additionally, peaks "a," "c," "d," "e," "f," "h," "n," and "o" of HMGA1b (Fig. 5B) all encompass the A1b splice region, so these peaks would not be observed in the spectra of HMGA1a Arg-C digestion (Fig. 5A). Furthermore, the C-terminal regions of the two proteins exhibited different modifications in the spectra shown in Table II. The C-terminal peptides from the Arg-C digestions of HMGA1a from the same cell line are unmodified or contain two acetyl groups on the lysine residues (peaks 5, 6, and 9; Table II). However, the C-terminal peptides from the Arg-C digestion of the native HMGA1b purified from HA7C cells are acetylated, phosphorylated, and methylated (peaks i and m; Table II). These results clearly demonstrate the complexity of in vivo HMGA1 PTMs and also illustrate the dynamic nature of these modifications. Likewise, they show that the same peptide fragments, derived from isoform proteins in the same cells, exhibit different PTM patterns ranging from completely unmodified to the same peptide possessing multiple modifications. It is important to note, however, that from these two spectra alone (Fig. 5), we achieved 100% peptide sequence coverage of both HMGA1 isoforms isolated from the HA7C cells (Table II).
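Sequence coverage of this kind can be computed from the residue ranges of the identified peptides. The sketch below is illustrative only: the function name is hypothetical, and the fragment list combines three HMGA1b ranges mentioned earlier in the text (1-29, 30-46, and 62-95) with one made-up range (47-61) added to close the gap; it is not the peptide list of Table II.

```python
def sequence_coverage(length, fragments):
    """Percentage of residues covered by (start, end) ranges, 1-based inclusive."""
    covered = set()
    for start, end in fragments:
        covered.update(range(start, end + 1))
    covered &= set(range(1, length + 1))  # ignore positions outside the protein
    return 100.0 * len(covered) / length

HMGA1B_LENGTH = 95  # residues, as stated in the introduction

# Ranges 1-29, 30-46, and 62-95 appear in the text; 47-61 is hypothetical.
full_map = [(1, 29), (30, 46), (47, 61), (62, 95)]
gapped_map = [(1, 29), (30, 46), (62, 95)]

full_coverage = sequence_coverage(HMGA1B_LENGTH, full_map)
gapped_coverage = sequence_coverage(HMGA1B_LENGTH, gapped_map)
```

Overlapping fragments are counted once because coverage is tracked per residue, so partial-digestion products that share residues do not inflate the figure.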
Peptide Mapping the PTMs of HMGA1 Proteins-To identify which amino acids of the HMGA1 proteins are post-translationally modified, we formulated peptide maps for all of the mass peaks that met the exclusion criterion of less than ±150 ppm mass error. For example, peptide maps from Arg-C digestions of HMGA1a and HMGA1b proteins purified from MCF-7/Tet-Off cells are shown in Fig. 6. MALDI-TOF MS detected the N- and C-terminal peptides of HMGA1a (Fig. 6A), whereas numerous peptide fragments in the center of the HMGA1b proteins (Fig. 6B) were preferentially identified by MALDI-TOF MS. This trend appears independent of how highly modified the peptides are, as illustrated by the peak containing five phosphorylations, an acetylation, and two methylations on the N terminus of HMGA1a (Fig. 6A). The reason(s) for this marked asymmetric peptide distribution between the isoform proteins is unknown and is currently under investigation.
... Table II. Sodium adducts are indicated with Na+; water losses are indicated with -H2O, and possible human keratin peaks identified by mass analysis are indicated by K followed by a number.
The peptide maps shown in Fig. 6 also confirm that the in vivo modifications of HMGA1 proteins are dynamic. Comparisons between the two HMGA1 protein isoform peptide maps illustrate a wide range of modifications that can occur over a specific peptide region. The N-terminal regions of both protein isoforms can exist in forms ranging from completely unmodified to species containing a variety of secondary modifications including phosphoryl, acetyl, and methyl moieties on one peptide fragment (Fig. 6). This pattern of complexity is observed throughout the length of the HMGA1 proteins and indicates that not just one post-translationally modified form of a protein is present in vivo but that a whole range of biochemically modified proteins co-exist in the MCF-7 cell lines examined. This observation with peptide fragments is further supported by the spectra of intact native proteins isolated directly from cells (Fig. 1) as well as the MALDI-TOF MS spectra of the highly purified RP-HPLC fractions of native HMGA1 proteins (Fig. 2).

Confirmation of Secondary Modifications on Native HMGA1a by Electrospray Ionization MS/MS-Independent confirmation of the complex PTMs that were observed by MALDI-TOF MS was accomplished with ESI-MS/MS. For example, Fig. 7 shows the ESI-MS/MS spectrum of one N-terminal fragment of an Arg-C-digested native HMGA1a from HA7C cells that was initially analyzed by the SEQUEST® program and further evaluated by manual analysis to verify the b and y ions assigned by this program and also to identify unassigned peptide peaks. The results of these combined analyses, which identify the sites and types of PTMs present on this particular peptide, are schematically shown in Fig. 7, upper right-hand corner. The b1-4 ion mass corresponds to the HMGA1a peptide fragment 1-4 containing two phosphorylated serine residues, and the y2-19 and y2-21 ion masses further demonstrated that serines 3 and 4 are phosphorylated. Furthermore, only the y1-19 ion (-H2O) was identified and not a y1-19 ion that contained a phosphoryl group, thus demonstrating that serine 5 was not phosphorylated. These analyses unambiguously identified two phosphoryl groups on serine residues 3 and 4 that could not be assigned by MALDI-TOF MS analysis alone because of the abundance of serine residues within this particular HMGA1a peptide. Additionally, this peptide contained an acetyl group on lysine 14 that was also identified by MALDI-TOF MS (Fig. 8), whereas another acetyl moiety, not detected by MALDI-TOF MS, was assigned by MS/MS to lysine 6 on the N-terminal HMGA1a peptide (Figs. 7 and 8). Data from other ESI-MS/MS analyses (data not shown) also verified many of the other PTMs that we had already identified by MALDI MS (Table III and Fig. 8).
Differential PTMs of the HMGA1 Proteins-Peptide maps derived from the MALDI-TOF MS results for the HMGA1a and HMGA1b isoform proteins isolated from the non-metastatic MCF-7/Tet-Off and the metastatic HA7C cell lines are illustrated in Fig. 8. Differential in vivo modifications between the HMGA1 protein isoforms were detected within and between each of these cell lines. For example, three di-methylations were identified on the third AT-hook of HMGA1a from MCF-7/Tet-Off cells, whereas only one methyl group was found across this same region in HMGA1b from the same cell line. Likewise, a methyl group was detected on lysine 81 of HMGA1b proteins isolated from MCF-7/Tet-Off cells, whereas the same in vivo modification was not detected on the corresponding HMGA1a amino acid residue (Fig. 8). In addition, we detected an in vivo dimethylation of arginine 44 of HMGA1b in the MCF-7/Tet-Off cell line but not in the HMGA1a proteins from the same cell line (Fig. 8). Furthermore, similar differential modifications between the HMGA1 isoform proteins were also identified on proteins isolated from the same HA7C cells. For example, lysine 14 of HMGA1a is acetylated in the HA7C cell line; however, the HMGA1b protein lacks this in vivo modification at lysine 14 within the same cells (Fig. 8). Likewise, threonine 20 is phosphorylated in vivo on the HMGA1a protein from the HA7C cells but not on the HMGA1b protein. Most interestingly, the pattern of in vivo phosphorylation of threonine 20 is reversed between the HMGA1 isoforms in the MCF-7/Tet-Off cell line (Fig. 8).
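
The fragment-ion logic used in the ESI-MS/MS analysis above, in which b and y ions that do or do not carry a phosphoryl group narrow down the modified serines, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the SEQUEST® procedure or the manual analysis performed in this study: the peptide `AESSSK` is a hypothetical toy sequence, the function names are invented for this example, only singly charged ions are considered, and neutral losses are ignored. The residue, water, and proton masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the residues in the toy peptide.
RESIDUE = {"A": 71.03711, "E": 129.04259, "S": 87.03203, "K": 128.09496}
WATER, PROTON, PHOSPHO = 18.010565, 1.007276, 79.966331


def fragment_ions(seq, phospho_sites):
    """Return singly charged b and y ion m/z values as two dicts keyed by ion number.

    phospho_sites holds 1-based residue positions carrying a phosphoryl group.
    """
    masses = [RESIDUE[aa] + (PHOSPHO if i in phospho_sites else 0.0)
              for i, aa in enumerate(seq, start=1)]
    n = len(seq)
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}               # N-terminal fragments
    y = {j: sum(masses[n - j:]) + WATER + PROTON for j in range(1, n)}   # C-terminal fragments
    return b, y


def discriminating_ions(seq, sites_a, sites_b, tol=0.01):
    """List the b and y ion numbers whose m/z differs between two site hypotheses."""
    b_a, y_a = fragment_ions(seq, sites_a)
    b_b, y_b = fragment_ions(seq, sites_b)
    diff_b = [i for i in b_a if abs(b_a[i] - b_b[i]) > tol]
    diff_y = [j for j in y_a if abs(y_a[j] - y_b[j]) > tol]
    return diff_b, diff_y


# Hypothetical question: are serines 3 and 4 phosphorylated, or serines 4 and 5?
print(discriminating_ions("AESSSK", {3, 4}, {4, 5}))
```

Only the fragment ions returned by `discriminating_ions` can distinguish the two hypotheses; every other ion has the same mass under both. This is why a serine-rich peptide may need MS/MS to localize phosphoryl groups whose number is already known from the intact peptide mass.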
Differential modifications were also observed between the HMGA1 isoforms purified from MCF-7 cell lines with different phenotypic characteristics, such as non-metastatic (e.g. MCF-7/Tet-Off) versus metastatic (e.g. HA7C) cells. For example, we were not able to positively identify phosphorylation of the C-terminal tail of HMGA1 isoform proteins isolated from the non-metastatic MCF-7/Tet-Off cells, although we could easily verify that two phosphorylations occur among serine residues 99, 101, and 102 (HMGA1a sequence numbering) of HMGA1 proteins isolated from the metastatic HA7C cell line (Fig. 8). These sites could potentially be modified in the MCF-7/Tet-Off cells as inferred from the ESI-MS/MS results (Table III), but a peptide fragment small enough for MALDI-TOF MS localization of phosphoryl groups to these specific residues was not obtained in our analyses. Additionally, more methyl groups were detected in the region of the third AT-hook of HMGA1a isolated from MCF-7/Tet-Off cells than on this region of the HMGA1a protein isolated from the HA7C cell line (Fig. 8). It is important to reiterate, however, that heterogeneous populations of HMGA1 isoform proteins co-exist in both the MCF-7/Tet-Off and HA7C cell lines, and even though a modification is assigned to a specific amino acid, not all the HMGA1 proteins from a given cell line will contain the identified PTM on that residue. Therefore, some of the HMGA1 proteins in the population will contain the in vivo modification, whereas others may not. These analyses therefore provide an informative "snapshot" of the dynamic nature of the in vivo PTMs that are occurring on the HMGA1 proteins in living cells.
FIG. 6. An Arg-C peptide fragment map was generated from HMGA1 peptide fragments of proteins isolated from MCF-7/Tet-Off cells. The peptides were identified from multiple spectra, and the peak masses used to generate this peptide fragment map are listed in Table I, bottom section (and in the supplemental tables), obtained from the three other MALDI-TOF MS scans conducted on each of the Arg-C digestions of the highly purified HMGA1 proteins from MCF-7/Tet-Off cells. Each positively identified peptide fragment is illustrated with a bar, and the PTMs identified on that fragment are presented above the bar. Phosphorylations are indicated by P; acetylations are indicated by A; methylations are indicated by M, and fragments that are unmodified are indicated by No Mod. The bracketed region on the HMGA1a protein sequence corresponds to the HMGA1b splice site, the 11 amino acids that are removed to generate the HMGA1b isoform.

DISCUSSION
This is the first study to demonstrate that the HMGA1 isoform proteins are differentially and post-translationally modified in vivo, both within the same cell type and between genetically related cells exhibiting different expressed phenotypes. It is also the first to show that both HMGA1a and HMGA1b proteins exhibit dynamic patterns of PTMs in vivo. Previous work on HMGA1a demonstrated that this isoform protein has a variety of in vivo PTMs that varied between non-metastatic and metastatic cells (32). In that study, however, the PTMs present on the HMGA1b isoform protein were not examined, thus potentially overlooking important differences between the modifications that occur on closely related isoform proteins that might be related to the metastatic potential of cells. That such differences in PTM patterns on these isoform proteins, should they be identified, might be biologically important is supported by the previous demonstration that the HMGA1a and HMGA1b proteins, when overexpressed in non-metastatic MCF-7 cells, not only exhibit markedly different efficiencies in promoting cellular metastasis but also differentially regulate the transcriptional expression of numerous genes associated with tumor progression and increased malignancy (12). Given that a number of studies have shown that both the substrate binding characteristics and biological functions of HMGA1 proteins are regulated by secondary modifications (9, 17, 28-33), it seems quite likely that at least some of the different PTMs observed on the HMGA1 isoform proteins in non-metastatic and metastatic cells in this study serve as molecular controls for the differential biological effects of these proteins. Even though the actual biological functions/activities of these complex PTM patterns are unknown, the present findings lay the molecular foundation for rational approaches to investigate whether or not they play a role in cancer.
FIG. 7. Ion fragmentation of an in vivo modified HMGA1a N-terminal parental peptide isolated from MCF-7/Tet-Off cells. An HMGA1a parental peptide fragment (corresponding to amino acid residues 1-23 with a mass of 2740.04) with two phosphoryl and two acetyl groups was selected for further analysis by ESI trap mass spectrometry. The fragmentation pattern shown in the figure was first analyzed by SEQUEST® software to select the b and y fragment ions (i.e. fragmentation at the peptide bond from the N and C termini, respectively, at the amide linkages) and to match these ions to idealized mass spectra generated from a database of protein sequences. The idealized match, shown in the inset (upper right-hand corner), displays the PTMs on the sequence. Serine residues that are phosphorylated are indicated with *, and lysine residues that are acetylated are indicated with #. The b and y ions that were identified from the fragmentation of the parental peptide are each indicated with a separate symbol. The b+1, b+2, y+1, and y+2 ions arise from loss of charge as a result of fragmentation of doubly charged parental peptide. The original spectrum resulting from the fragmentation was further analyzed by manual mass calculations to verify the b and y ions identified by SEQUEST® and to subsequently identify peaks that were not assigned by this program. Ions that corresponded to water losses (-H2O), phosphoryl losses (-H2PO4), ammonia losses (-NH3), and internal cleavages (IC) could not be identified by SEQUEST® and were therefore identified by manual mass analysis. Under the fragmentation conditions employed, both water and phosphoryl groups are known to be lost from serine and threonine residues.
Differential in Vivo PTMs of the HMGA1 Protein Isoforms-To date, only one other study has examined the differential PTMs between the HMGA1 protein isoforms (17). That investigation focused primarily on the different extents of in vivo phosphorylation that occur on the HMGA1a (also known as HMG-I) and HMGA1b (also known as HMG-Y) proteins in non-malignant and malignant cells, and although its authors noted that the isoform proteins in both types of cells could also be modified by either acetylation or methylation, the precise sites of such modifications were not determined. Another important difference between this previous study and the present work is that here we have conducted a comprehensive analysis of all of the heterogeneous in vivo modified forms of the HMGA1 isoforms present in cells rather than just analyzing PTMs that occur on the major fractions of HMGA1 proteins that elute from RP-HPLC columns. In addition, by a combination of MALDI-TOF MS and ESI-MS/MS analyses, we unambiguously identified many of the specific sites of phosphorylation, acetylation, and methylation on the HMGA1a and HMGA1b proteins, assignments that were previously not possible when employing MALDI MS techniques alone.
Of potentially great biological significance is the discovery that differential modifications occur on all three of the AT-hook motifs of the HMGA1a and HMGA1b isoforms isolated from both the non-metastatic and metastatic cell lines (Fig. 8). More importantly, mono- and di-methylations, which have only recently been discovered for the HMGA1 proteins (17, 30-32), appear to be the most abundant modifications on the AT-hook regions. As mentioned earlier, these are the peptide domains of the proteins that specifically bind to the minor groove of AT-rich DNA sequences, and previous work has demonstrated that in vivo biochemical modifications of the AT-hook peptides alter the interaction of HMGA1 proteins with both DNA and chromatin substrates (17). Therefore, it is reasonable to suspect that the complex in vivo PTM patterns on the AT-hooks observed here will have a significant differential effect on the gene regulatory activities of the HMGA1a and HMGA1b proteins in cells. Likewise, it is notable that the peptide region located between the second (II) and third (III) AT-hooks of HMGA1a contains considerably larger numbers and types of PTMs than does the corresponding region of the HMGA1b isoform in both MCF-7/Tet-Off and HA7C cells (Fig. 8). Because this is the peptide domain of HMGA1 proteins that physically interacts with the greatest number of protein partners, most of which are transcription factors (6,9,45,46), PTMs of specific residues within the region between AT-hooks II and III are prime candidates for selectively controlling interactions between HMGA1 and other proteins as part of the gene regulatory process. It is provocative to consider that the new biochemical modifications identified in this study might be related to both the differential expression of genes that are known to be controlled by either the HMGA1a or HMGA1b proteins and also the differential effects of these isoforms in promoting neoplastic progression of MCF-7 cells (45).
FIG. 8. The experimentally identified and biochemically modified amino acid residues are indicated by boldface and underlined letters corresponding to the amino acid modified in the sequence. The first two protein sequences correspond to the HMGA1a and HMGA1b proteins purified from MCF-7/Tet-Off cell lines, and the last two sequences correspond to the HMGA1a and HMGA1b proteins purified from HA7C cell lines. Sites of specific phosphorylations are indicated by lightning bolts, specific acetylations by hexagons, and specific methylations by trapezoids above the stick diagrams of the proteins. The numbers and types of PTMs that have been identified on the HMGA1 proteins by MALDI-TOF MS, but whose exact residue assignments have not been made, are indicated by bars below the amino acid sequences. Every tenth amino acid residue of both the HMGA1a and HMGA1b sequences is indicated by • above the protein sequences.
The Complexity of in Vivo PTMs on HMGA1 Proteins-Numerous earlier studies have examined in vivo post-translational modifications on the HMGA1 proteins (17, 19, 26, 28-30, 47-49), and many of the same PTMs documented in these previous investigations have also been identified here. However, in contrast to these prior investigations, our present study demonstrates the dynamic nature of PTMs found on the HMGA1 isoforms within the same cell. Analysis of highly purified HMGA1 proteins isolated from a given cell type revealed that the proteins exist as a heterogeneous population of post-translationally modified forms, from unmodified to extremely modified proteins. Most interestingly, and in contrast to several previous reports (25,30,33,50), we did not observe constitutive phosphorylation of the serine residues in the acidic tail region of the HMGA1 isoforms in either the MCF-7/Tet-Off or HA7C cells (Fig. 8). This apparent discrepancy could be partially explained by differences in the types of cells analyzed in these different studies. However, another equally plausible contributing factor is the fact that most in vivo protein phosphorylations are known to be dynamic and reversible. In contrast to earlier investigations that usually analyzed modifications on only the most abundant protein species present, in the current study we examined the PTMs found on the entire heterogeneous populations of HMGA1 proteins found within cells. Therefore, because in vivo phosphorylations are in a dynamic state of flux, it is unlikely that all of the individual proteins within these populations will be phosphorylated in the C-terminal tail region at any given time. Additional support for the idea that dynamic biochemical modifications are a general feature of many cells is the observation that HMGA1 proteins isolated from other human cell types also exist as heterogeneous populations with complex and variable PTM patterns (data not shown).
Relationship of Nuclear Protein Modifications to Cancer-It is important to place the current observations on the complex changes in PTMs on HMGA1 proteins between non-metastatic MCF-7/Tet-Off and metastatic HA7C cells in the context of previous findings relating to nuclear protein biochemical modifications and cancer. As already noted, reports exist of correlations between changes in PTM patterns on HMGA1 proteins and the malignant phenotype of cells (17,32), including a recent demonstration that many neoplastic cell lines exhibit increased amounts of mono-methylated HMGA1a reaching up to 50% of the total cellular protein (48). Furthermore, similar to the findings in this study, differential PTMs of another protein related to cancer, p53, have been directly linked to tumorigenesis (51). But perhaps the most extensively studied changes in protein PTMs associated with cancer are those relating to epigenetic alterations in the patterns or "code" of covalent modifications found on the nucleosomal core histones (reviewed in Refs. 52 and 53). For example, well documented differential changes in histone modifications, such as methylations and acetylations, have been shown to control expression of certain genes in cancerous cells (53-56), and differential histone PTMs have been demonstrated by mass spectrometry between acute myeloid and chronic lymphocytic leukemias (57).
Given the extensive epigenetic alterations in PTM patterns on nuclear proteins, particularly histones, found in normal and cancerous cells, a possible caveat exists concerning how to interpret the biological significance of the PTM variations on HMGA1 proteins found in non-metastatic and metastatic MCF-7 cells. If these PTM variations simply reflect an epiphenomenon (resulting from forced overexpression of transgenic HMGA1 in the MCF-7/Tet-Off cells) that induces heterogeneous modification changes on all/most nuclear proteins (particularly other HMG proteins), they may not be functionally relevant to cellular metastatic progression. Although this possibility cannot yet be ruled out, control experiments involving MALDI-TOF analysis of full-length, acid-soluble nuclear proteins isolated from non-metastatic and metastatic MCF-7 cells indicate that a number of these proteins do not change their overall levels of post-translational modifications in these cell types (see supplemental Fig. 1). More importantly, similar MS analyses of highly purified, full-length HMGN1 (also known as HMG-14; a member of a separate HMG protein family that is also highly modified in vivo; see Ref. 58) demonstrate that the modifications on this closely related nuclear protein are not significantly different in non-metastatic and metastatic cells (see supplemental Fig. 2). Taken together, these data strongly argue against the likelihood that the changes in PTM patterns observed on HMGA1 proteins between the non-metastatic MCF-7/Tet-Off and the metastatic MCF-7/HA7C cells are an indirect, global epigenetic consequence of transgenic HMGA1a protein overexpression inducing heterogeneous PTM changes on all modifiable nuclear chromatin proteins.
In conclusion, this study represents the most comprehensive characterization of the in vivo PTMs found on the HMGA1a and HMGA1b proteins reported so far. We demonstrate that these two isoform proteins are differentially modified in vivo both within the same cell type and also in cells exhibiting different metastatic phenotypes. The biological function/significance of the many new sites and types of differential modifications on the HMGA1a and HMGA1b proteins is presently unknown. However, the findings reported here lay the foundation for future efforts to determine whether some of these modification differences are, as suspected, causally related to tumor progression.