Comprehensive Mass Spectrometric Mapping of the Hydroxylated Amino Acid Residues of the α1(V) Collagen Chain

Background: α1(V) is an extensively modified collagen chain important in disease.
Results: Comprehensive mapping of α1(V) post-translational modifications reveals unexpectedly large numbers of X-position hydroxyprolines in Gly-X-Y amino acid triplets.
Conclusion: The unexpected abundance of X-position hydroxyprolines suggests a mechanism for differential modification of collagen properties.
Significance: Positions, numbers, and occupancy of modified sites can provide insights into α1(V) biological properties.

Aberrant expression of the type V collagen α1(V) chain can underlie the connective tissue disorder classic Ehlers-Danlos syndrome, and autoimmune responses against the α1(V) chain are linked to lung transplant rejection and atherosclerosis. The α1(V) collagenous COL1 domain is thought to contain greater numbers of post-translational modifications (PTMs) than do similar domains of other fibrillar collagen chains, PTMs consisting of hydroxylated prolines and lysines, the latter of which can be glycosylated. These types of PTMs can contribute to epitopes that underlie immune responses against collagens, and the high level of PTMs may contribute to the unique biological properties of the α1(V) chain. Here we use high resolution mass spectrometry to map such PTMs in bovine placental α1(V) and human recombinant pro-α1(V) procollagen chains. Findings include the locations of those PTMs that vary and those PTMs that are invariant between these α1(V) chains from widely divergent sources. Notably, an unexpectedly large number of hydroxyproline residues were mapped to the X-positions of Gly-X-Y triplets, contrary to expectations based on previous amino acid analyses of hydrolyzed α1(V) chains from various tissues. We attribute this difference to the ability of tandem mass spectrometry coupled to nanoflow chromatographic separations to detect lower-level PTM combinations with superior sensitivity and specificity. The data are consistent with the presence of a relatively large number of 3-hydroxyproline sites with less than 100% occupancy, suggesting a previously unknown mechanism for the differential modification of α1(V) chain and type V collagen properties.

It is generally accepted that α1(V)2α2(V) heterotrimers are incorporated into growing fibrils of the more abundant collagen type I and are involved in regulating the geometry and properties of the resulting type I/V heterotypic fibrils (5,6). Thus, mutations in the genes encoding either α1(V) or α2(V) chains result in the human connective tissue disorder classic Ehlers-Danlos syndrome, characterized by collagen I fibrils of abnormal shapes and diameters and deficient tensile strength (7,8). More recently, it has been demonstrated that anti-col(V) autoimmune responses can underlie chronic lung transplant rejection in both humans (9) and animal models (10) and that pre-transplant col(V)-specific autoimmunity is also a significant risk factor for primary graft dysfunction, the leading cause of early morbidity and mortality after lung transplantation (11,12). Col(V) autoimmunity has also been identified as a consistent feature in both late stage human coronary artery disease and a mouse model of atherosclerosis (13). In both human lung transplant rejection and coronary artery disease, immune responses have been shown to be specific to the α1(V) chain, with an absence of such responses to the α2(V) chain (9,13).
Recent studies investigating PTMs on collagenous proteins have utilized tandem mass spectrometry (MSn) on low resolution, low mass accuracy ion trap mass spectrometers or mass measurements of large peptides using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) MS (14, 30-32). However, performing MSn on instruments with low mass accuracy or relying on MS measurements of peptides that have not been subjected to MSn makes confident peptide identification and unambiguous PTM site localization challenging. The analysis of collagenous proteins by MS is complicated by the high degree of modification and the large number of proline residues. The former leads to generation of isobaric, but differentially modified, peptides that often co-elute, producing chimeric spectra that are difficult to interpret without the aid of high mass accuracy and abundant product ions from MSn. The latter hampers MSn sequencing efforts due to the propensity of Pro/Hyp to preferentially cleave during collisional activation (the "proline effect") (33,34). This can limit the generation of the ladder of b- and y-type product ions most commonly used by database search algorithms for peptide identification and PTM localization of spectra generated by collision-activated dissociation (33,35-37).
In the current study we used a state-of-the-art proteomics workflow centered on high resolution MS and nanoflow chromatographic separations to precisely map hydroxylation and glycosylation modifications in bovine placental α1(V) and human recombinant pro-α1(V) procollagen chains. Instead of relying on a single enzyme or chemical treatment to produce peptides amenable to MS, we used five different proteases (individually or sequentially) to maximize protein sequence coverage and facilitate PTM discovery. Peptides were subsequently separated by nanoflow liquid chromatography and introduced into an electron transfer dissociation (ETD)-enabled hybrid dual cell linear ion trap-orbitrap mass spectrometer by electrospray ionization. The orbitrap recorded the masses of eluting peptides with low ppm mass accuracy (<5 ppm), affording confident peptide identification and localization of PTMs. The hybrid orbitrap mass spectrometer permitted the use of several dissociation techniques: resonant excitation collision-activated dissociation, beam-type collision-activated dissociation (higher energy collisional dissociation (HCD)), and ETD (38,39). The availability of multiple dissociation techniques and the use of MS2 and MS3 for peptide interrogation often allowed us to pinpoint the exact residue(s) carrying the individual PTMs. Such comprehensive, experiment-based PTM localization has not previously been achieved for collagenous proteins. Recent analyses of collagenous proteins by MS relied heavily on a priori biological knowledge to assess prolyl hydroxylation (i.e. PTM assignment based upon hydroxylation motifs) (30,32) rather than PTM assignment based upon localizing fragments from MSn spectra.
We report comprehensive mapping of all PTMs involving hydroxylated residues on bovine placenta α1(V) and human recombinant pro-α1(V) collagen chains and provide manually verified mass spectral evidence for each modified site. Our analyses reveal PTMs that vary or are invariant between the bovine tissue α1(V) and human recombinant pro-α1(V) chains and also reveal all hydroxylated residue PTMs in pro-α1(V) sequences NH2-terminal to the COL1 major collagenous domain. We also identify an unexpectedly large number of Hyp residues in the X-position of Gly-X-Y triplets, attributed to our ability to identify modified peptides present in low stoichiometric amounts that go undetected by classic amino acid analyses or Edman sequencing. Another unexpected and striking finding was the discovery of X-position Hyp residues in the unusual contexts of Gly-Hyp-Val and Gly-Hyp-Ala triplets in the bovine placental α1(V) COL1 domain. Findings presented herein may aid in characterizing and locating α1(V) autoimmune epitopes and may provide further insights into col(V) function.

EXPERIMENTAL PROCEDURES
Preparation of Human Pro-α1(V) Procollagen and Bovine α1(V) Collagen Chains-Human recombinant pro-α1(V) homotrimers were produced via expression from a modified pCEP-Pu vector in 293-EBNA human embryonic kidney cells (Invitrogen) followed by dialysis of conditioned media against 50 mM Tris-HCl, pH 8.6, buffer containing 0.1 mM phenylmethylsulfonyl fluoride, 1 mM N-ethylmaleimide, 0.1 mM p-aminobenzoic acid, and 5 mM EDTA for low salt precipitation of collagen chains as previously described (40). Bovine α1(V) chains were prepared essentially as previously described (41). Briefly, minced and washed amnion stripped off placenta from the area close to attachment of the umbilicus was suspended in 0.5 M acetic acid, 0.2 M NaCl and digested with pepsin at 4°C. Col(V), which is soluble in 0.7 M NaCl and precipitates in 1.2 M NaCl, was purified from supernatants via differential NaCl precipitation (42). α1(V) chains were separated from α2(V) chains via chromatography on diethylaminoethyl cellulose.
Amino Acid Analysis-Bovine α1(V) collagen chains were hydrolyzed in 6 N HCl, 1% phenol at 110°C for 48 h and analyzed by means of a Hitachi L 8800 A amino acid analyzer at the University of California-Davis Molecular Structure Facility.
Digestion of Pro-α1(V) and α1(V) Samples for Mass Spectrometry-Pro-α1(V) and α1(V) samples were desalted using 50 mg tC18 SepPak cartridges (Waters Corp., Milford, MA). Eluates were dried down and resuspended in digestion buffer (see below) optimized for each protease. Cysteine residues were reduced and alkylated by incubation in 5 mM dithiothreitol for 45 min at 37°C followed by a 30-min incubation at room temperature in 15 mM iodoacetamide in the dark. Alkylation was quenched by a 15-min incubation in 5 mM dithiothreitol at room temperature. This was followed by a multiple protease approach, utilized to maximize protein sequence coverage (45), which involved addition of five different proteases (individually or sequentially) (see the details in the supplemental Experimental Procedures).
Liquid Chromatography Electrospray Tandem Mass Spectrometry (LC-MS/MS)-A Waters nanoACQUITY UPLC and autosampler (Waters Corp.) were used to load samples onto a fused-silica capillary precolumn (75-μm inner diameter × 360-μm outer diameter). Precolumns with cast chemical frits were slurry-packed to 5 cm in length with a stationary phase consisting of 5-μm diameter, 100 Å pore size C18 particles (Magic C18AQ, Michrom Bioresources, Inc., Auburn CA) (46). Reversed-phase LC separation was achieved across a 13-cm-long, 50-μm inner diameter × 360-μm outer diameter fused-silica analytical column packed with the same C18 stationary phase. An electrospray ionization emitter was integrated into the analytical column by use of a laser puller (Sutter Instrument Co., P-2000, Novato, CA). An estimated 1-4 μg of protein digest was loaded onto the precolumn and chromatographed over a 60-min linear gradient at 300 nl/min (2 to 30% B, buffer A (0.2% formic acid in water); buffer B (0.2% formic acid in acetonitrile)). Eluate was introduced into the mass spectrometer via electrospray ionization (+2.0 kV), and peptide cations were subjected to tandem mass spectrometry using an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) enabled for ETD (39,47,48). Typical experiments consisted of MS1 analysis in the Orbitrap mass analyzer using a resolving power of 60,000 followed by 10 data-dependent HCD MS2 events. Product ion mass analysis was also conducted in the Orbitrap using resolving powers of 7,500-15,000. Precursor and product ion mass error was typically <10 ppm. Similar experiments were conducted using ETD, where charge state-dependent ETD activation times were utilized. To obtain additional sequence information of glycosylated peptides, an additional set of LC analyses was conducted using an MS3 mass spectrometry method. For MS3 analyses, three data-dependent collision-induced dissociation or ETD MS2 events were executed followed by data-dependent HCD MS3 events, where the most intense peak in each MS2 spectrum was dissociated. For all experiments, an automatic gain control target value of 1,000,000 charges was used for MS1. An automatic gain control target of 50,000 charges was used for both MS2 and MS3. Precursors were dynamically excluded from data-dependent MS2 for 30-60 s. A precursor isolation width of 2-3 m/z was typically used.
Proteomic Data Analysis-Spectral reduction from raw data was performed by DTA Generator, a program available in the COMPASS software suite (freely available online) (49). The ETD preprocessing option was used to remove known neutral loss peaks from ETD spectra (50,51). The OMSSA (Open Mass Spectrometry Search Algorithm) search algorithm was used for database correlation (52). Spectra were searched against a concatenated target-decoy version of human pro-α1(V) or bovine α1(V) sequence databases (53). Several variable PTMs were considered: hydroxylation of proline and lysine residues (+15.9949 Da, monoisotopic mass), glycosylation of lysine (monosaccharide, +178.0473 Da; disaccharide, +340.0995 Da), and glycosylation-associated neutral losses upon activation. Finally, oxidation of methionine (+15.9949 Da) residues was specified as a variable modification, and carbamidomethylation of cysteine (+57.0215 Da) residues was searched as a fixed modification. A mass tolerance of ±5 Da from the average mass was used for precursors, whereas a mass tolerance of ±0.01 Da from the monoisotopic mass was used for product ions (38). Spectra for all the modified sites were manually validated (see details in the supplemental Experimental Procedures) and are provided in supplemental Spectra S1 for bovine and supplemental Spectra S2 for human.
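The mass shifts and ppm criteria quoted above can be illustrated with a short calculation. The following is a minimal sketch only, not part of the COMPASS or OMSSA software; the function names, the 5 ppm default, and the example masses are hypothetical and chosen for illustration, whereas the shift values are those listed in the search parameters.

```python
# Monoisotopic mass shifts (Da) as listed in the search parameters above.
PTM_SHIFTS = {
    "hydroxy": 15.9949,          # Pro/Lys hydroxylation (also Met oxidation)
    "gal_hyl": 178.0473,         # Lys -> galactosyl-hydroxylysine
    "glc_gal_hyl": 340.0995,     # Lys -> glucosylgalactosyl-hydroxylysine
    "carbamidomethyl": 57.0215,  # fixed Cys modification
}


def modified_mass(unmodified_mass, ptm_counts):
    """Add the listed shift for each PTM (name -> count) to a peptide mass."""
    return unmodified_mass + sum(PTM_SHIFTS[name] * n for name, n in ptm_counts.items())


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=5.0):
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peptide of mass 1500.0000 Da carrying two hydroxylations.
    theoretical = modified_mass(1500.0, {"hydroxy": 2})
    print(round(theoretical, 4))
    print(within_tolerance(theoretical + 0.003, theoretical))
```

A 0.003 Da deviation on a peptide of roughly 1500 Da corresponds to about 2 ppm, which is why high mass accuracy sharply restricts the PTM combinations compatible with an observed precursor.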

PTMs of Bovine Placental α1(V) Chains-Chromatographically purified α1(V) chains extracted with acetic acid and pepsin from bovine placenta were subjected to tandem mass spectrometry (MSn) to map the positions of hydroxylated residues and of galactosylhydroxylysyl (Gal-Hyl) and glucosylgalactosylhydroxylysyl (Glc-Gal-Hyl) residues. Treatment of fibrillar collagens with pepsin removes NH2- and COOH-terminal nontriple helical sequences, leaving only the pepsin-resistant major triple helical COL1 domain. At the time this study began, bovine α1(V) cDNA sequences were unavailable, and only a small portion (C-propeptide sequences and about the COOH-terminal third of COL1 sequences) of bovine α1(V) coding sequences was available from annotated bovine genome databases. Thus, full-length bovine α1(V) cDNA sequences were generated as described under "Experimental Procedures," for comparison to MS data. At the time of submission of this study, database annotated genomic sequences were mostly complete but still had 29 amino acid differences compared with the cDNA sequences reported here (GenBank accession number JQ611730). The cDNA sequences reported here have been validated via comparison of MS analyses of bovine placental α1(V) protein.
MSn of AspN, GluC, chymotrypsin, ArgC, and trypsin-generated peptides of bovine placental α1(V) chains produced 94% sequence coverage and identified the positions of 106 Hyp residues in the Y-position of Gly-X-Y triplet repeats and 22 Hyp residues in the X-position of Gly-Hyp-Hyp triplet repeats (Fig. 1). Y-position Pro residues in Gly-X-Y triplets are hydroxylated to 4-Hyp residues by the enzyme prolyl 4-hydroxylase, for which the minimum substrate appears to be the tripeptide X-Pro-Gly (54). Thus, the 106 Y-position Hyp residues reported here are likely 4-Hyp. Gly-Pro-Hyp triplets are thought to constitute the substrate sequence for P3Hs, of which there are three (55,56). We conclude the 22 Hyp residues in the X-position of Gly-Hyp-Hyp triplet repeats in the COL1 domain of bovine placental α1(V) chain are likely to be 3-Hyp residues. Unexpectedly, Hyp residues were detected in the X-positions of a Gly-X-Val triplet and a Gly-X-Ala triplet (residues 509 and 587, respectively) (Figs. 1 and 2). As prolyl 4-hydroxylase is thought to hydroxylate only Y-position Pro residues and as P3H is thought to hydroxylate only Pro residues within Gly-Pro-Hyp triplets, it is not clear which enzyme has hydroxylated these sites or whether these hydroxylated prolines are 3-Hyp or 4-Hyp residues. Our analysis also identified the positions of 3 Hyl residues and 34 Glc-Gal-Hyl residues (Fig. 1), Hyl residues linked to the disaccharide glucosylgalactose.
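The assignment of a hydroxylated Pro to the X- or Y-position follows directly from its offset within the Gly-X-Y repeat. The sketch below illustrates that bookkeeping under a stated assumption: the collagenous domain is numbered from 1, begins with Gly, and contains no interruptions of the triplet repeat. The function names and the toy sequence are hypothetical and are not taken from the bovine or human sequences.

```python
def triplet_position(index, gly_index=1):
    """Return 'Gly', 'X', or 'Y' for a 1-based residue index.

    Assumes uninterrupted Gly-X-Y repeats with a Gly at residue gly_index.
    """
    return ("Gly", "X", "Y")[(index - gly_index) % 3]


def classify_hyp_sites(sequence, hyp_indices):
    """Map each hydroxylated Pro (1-based index) to (position, enclosing triplet)."""
    sites = {}
    for i in hyp_indices:
        if sequence[i - 1] != "P":
            raise ValueError(f"residue {i} is not Pro")
        start = (i - 1) - ((i - 1) % 3)  # 0-based index of the triplet's Gly
        sites[i] = (triplet_position(i), sequence[start:start + 3])
    return sites


if __name__ == "__main__":
    toy = "GPPGPVGPAGEP"  # hypothetical sequence, for illustration only
    for index, (position, triplet) in classify_hyp_sites(toy, [2, 3, 5, 8, 12]).items():
        print(index, position, triplet)
```

Under the assumed numbering, residues 509 and 587 both fall in X-positions, in agreement with the assignments described above; a real sequence with interruptions in the repeat would require tracking the triplet register explicitly.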
Hydroxylated Residues and Glycosylated Hyl Residues of Human Recombinant Pro-α1(V) Chains-A number of previous studies on the biology of pro-α1(V) collagen chains have been performed on human recombinant pro-α1(V) collagen chains produced in 293-HEK (human embryonic kidney) cells (40,57-62). As previous partial characterizations of fibrillar collagen chain PTMs have shown tissue- and cell type-specific variations in distributions of 3-Hyp residues (30-32), we mapped the hydroxylated residues and saccharide-bound Hyl residues of the experimentally important human recombinant full-length pro-α1(V) collagen chains produced in 293-HEK cells. This study of procollagen chains also allowed characterization of hydroxylated amino acid residues in sequences NH2-terminal to the COL1 domain and in COOH-propeptide sequences, both of which are lost in pepsin extraction of α1(V) chains from tissues.
MSn of AspN, GluC, chymotrypsin, ArgC, and trypsin-generated peptides of human recombinant pro-α1(V) collagen chains produced 90% sequence coverage of the entire protein minus signal peptide sequences and 96% of the COL1 domain and identified the positions of 98 Hyp residues in the Y-position of Gly-X-Y triplet repeats and 9 Hyp residues in the X-position of Gly-Hyp-Hyp triplet repeats in the COL1 domain (Fig. 3). As described above, Hyp residues in the Y-position of Gly-X-Y repeats or in the X-position of Gly-Hyp-Hyp triplet repeats are likely 4- and 3-Hyp residues, respectively. Interestingly, within pro-α1(V) sequences NH2-terminal to the COL1 domain, MS analysis mapped six Hyp residues in the Y-position of Gly-X-Y triplets, one Hyp residue in the X-position of a Gly-Hyp-Hyp triplet, one Hyl, and two Glc-Gal-Hyl residues. Five of the Y-position Hyp residues exist within a hypothetical (44) short, interrupted collagenous (COL2) subdomain (Fig. 3), consistent with the likelihood that this subdomain actually forms a triple helix, as the latter would be stabilized by these five 4-Hyp residues. Possible functional significance of the sixth 4-Hyp, which lies upstream of this hypothetical collagenous domain, is unclear. All NH2-terminal domain Glc-Gal-Hyl, Hyl, and X-position Hyp residues lie within the COL2 subdomain, suggestive of functional roles for these residues within this subdomain as well. Supportive of our findings, the single COL2 domain Hyl mapped here was previously identified by MS analysis as an "NH2 telopeptide" Hyl residue involved in covalent cross-linking of α1(V) chains to α1(XI) chains in bovine cartilage (4). MSn analysis here of the human recombinant pro-α1(V) chain also localized the positions of 6 Hyl, 1 Gal-Hyl, and 22 Glc-Gal-Hyl residues and 1 residue found as both Gal-Hyl and Glc-Gal-Hyl within the COL1 domain.
It seems unlikely that species-specific differences in α1(V) sequences contributed much to differences in numbers and placement of Y-position Hyp residues, as bovine and human α1(V) COL1 domain sequences are 99.3% identical, differing by only 7 primary amino acids, none of which is a Pro (Figs. 1 and 3, orange residues). Modifications could not be localized for Y-position Pro residues 642, 714, 903, or 909 in either human recombinant pro-α1(V) or bovine placenta α1(V) chains.
Peptides were prepared by trypsin digestion of bovine placenta α1(V) collagen chains. Mass error from the expected product ion monoisotopic mass is typically less than 5 ppm. y14+4OH fragment ion and internal fragment ions (PGPV-28+2OH, PGPV+2OH, and PGPVGA+2OH) confirm that one Hyp is in the Gly-X-Val triplet in the upper spectrum (A). y11+OH, y12+OH, and y14+2OH fragment ions confirm that one Hyp is in the Gly-X-Ala triplet in the lower spectrum (B). OH, one hydroxylation.
Expression of Prolyl 3-Hydroxylases in 293-HEK Cells-Differential 3-hydroxylation of prolines at some, but not other, sites in clade A fibrillar collagen chain COL1 domains can be due to differential expression of the enzyme prolyl 3-hydroxylase 2 (P3H2) in different cell types and tissues (31). To test whether the absence of 3-Hyp residues at some sites in human recombinant pro-α1(V) chains that had been 3-hydroxylated in α1(V) chains from bovine placenta might be due to deficiency in levels of P3H2, we tested for P3H2 expression in 293-HEK cells. We also tested for expression of the other two prolyl 3-hydroxylases, P3H1 and P3H3, and for CRTAP and PPIB (peptidyl prolyl cis-trans isomerase B), which are components of the prolyl 3-hydroxylation complex (63). As can be seen (Fig. 4), all P3Hs are clearly expressed in 293-HEK cells at levels similar to those of a cell line previously shown positive for expression of all three enzymes (43). CRTAP and peptidyl prolyl cis-trans isomerase B expression levels were readily detectable as well. These findings thus suggest that the decreased numbers of 3-Hyp residues in human recombinant pro-α1(V) chains, as compared with the tissue form of α1(V) chain obtained from bovine placenta, are unlikely to be due to decreased levels of P3Hs or other prolyl 3-hydroxylation complex components in 293-HEK cells. Rather, the reduction in 3-Hyp residues in the recombinant pro-α1(V) chains is likely due to some other variable(s). It is possible that even normal levels of endogenous enzymes are insufficient to fully modify the overexpressed pro-α1(V) chains produced in 293-HEK cells, perhaps contributing to some of the differences in PTM levels noted between the human recombinant pro-α1(V) and bovine placenta α1(V) chains. However, the distribution of X-position Hyp residues in the recombinant pro-α1(V) chains suggests that reduced numbers of such residues in these chains may involve the changed kinetics with which pro-α1(V) chains are incorporated into pro-α1(V)3 homotrimers, the form in which they were produced in 293-HEK cells for this study, compared with the kinetics of incorporation of pro-α1(V) chains into pro-α1(V)2pro-α2(V) heterotrimers, the predominant form in which they are found in tissues (see "Discussion").
Gly-Hyp-Hyp X-position Hyp, 1 Gly-Hyp-Val, 1 Gly-Hyp-Ala, 1 Gly-Hyp-Thr, 7 Hyl, and 23 Glc-Gal-Hyl residues were identified. Red, hydroxylated, non-glycosylated residues; green, Glc-Gal-Hyl residues; dark blue, Gal-Hyl residue; purple, residue found as both Glc-Gal-Hyl and Gal-Hyl. Underlined, hydroxylated/glycosylated residues mapped in previous studies (4,32). Sequences not identified in the course of MS analysis are in light blue. The seven COL1 amino acid residues that differ between human and bovine are orange. *, 12 Y-position Pro residues hydroxylated in human or bovine α1(V) COL1 domain but not the other. #, Hyl residues identified by Wu et al. (4) as being involved in interchain covalent cross-links. ^, Lys residues hydroxylated in bovine placenta α1(V) but not in human recombinant pro-α1(V). A vertical arrow marks the site of proteolytic removal of the signal peptide, as previously determined by Edman degradation NH2-terminal amino acid sequencing (40) and as confirmed by MS analysis in the present study. The brackets indicate the limits of the small COL2 hypothetical triple helical domain, in which interruptions of Gly-X-Y repeats are underlined. Amino acid residues NH2-terminal to the COL1 domain are numbered 1 to 558, starting with the initial Met residue of the signal peptide. Amino acid residues of the COL1 domain are numbered 1 to 1014, for easy comparison to the similarly numbered residues of the bovine α1(V) COL1 domain in Fig. 1. Amino acids of the COOH-propeptide are shown for the sake of completeness but are not numbered, due to lack of identified hydroxylated residues.

A High Mass Accuracy, High Resolution Tandem Mass Spectrometric Approach for Analysis of Collagenous Proteins-Study of the PTMs of collagens has primarily involved traditional amino acid analysis, a methodology that determines amino acid compositions by hydrolysis of purified proteins/peptides followed by measurement of their individual amino acid constituents. Ordinarily, such analyses yield quantitative measurements of amino acid abundance, but not sequence information or the location of PTMs. More recently, MS has been employed to investigate PTMs in collagens (14, 30-32). Henkel and Dreisewerd (14) utilized ultraviolet MALDI MS to analyze fetal calfskin collagens I, III, and V, chemically cleaved by cyanogen bromide (CNBr) to generate a limited number of peptides with distinct masses. PTMs were deduced for each CNBr peptide by comparing the difference between experimental and theoretical masses, the mass difference representing the mass of the modification(s). However, MSn was not performed to confirm sequence identity or to locate the positions of PTMs. This approach provided valuable information on the total PTM state, but without the use of MSn, specific residues carrying the PTMs remained unknown. Eyre and coworkers have characterized collagen peptides by performing MS2 using a low resolution, low mass accuracy ion trap mass spectrometer (30-32). They identified several novel sites of prolyl 3-hydroxylation and proposed a biological function for these PTMs in collagen. However, many of the modifications were not experimentally localized by MS2. Because the acquired tandem mass spectra lacked the requisite information for localization, assignment of Pro hydroxylation was often inferred from known collagen motifs. Because of the highly modified nature of collagenous proteins, we contend that PTM mapping via MS should require spectral evidence in the form of localizing fragments to have the highest confidence in the validity of sequence-specific modifications.
Our strategy, founded on a high resolution MS platform, allowed characterization of the α1(V) chain with unparalleled sensitivity and PTM localization precision. The enabling technology was an ETD-enabled LTQ Orbitrap Velos hybrid mass spectrometer exhibiting low ppm mass accuracy, high resolution MS, and MSn capabilities combined with multiple options for performing peptide dissociation (47). We began by using multiple proteases to generate a diverse pool of peptides amenable to MSn (45). The combination of mass accuracy and a capability to perform multistage activation using three different dissociation techniques often allowed us to pinpoint the exact residue(s) carrying individual PTMs. However, we were not always able to unambiguously define the site of localization. Thus, we classified PTM assignments into 1 of 3 categories: 1) localized (MSn product ions support PTM assignment to a specific residue), 2) unlocalized (insufficient product ions to unambiguously assign a PTM to a specific residue), or 3) pseudolocalized (insufficient product ions to unambiguously assign a PTM to a specific residue, but localization can be inferred from known collagen modification motifs) (supplemental Tables S1 and S2). Note that hydroxylation modification was only considered for Pro and Lys residues and that glycosylation (mono- and disaccharide) modifications were only considered for Hyl residues (15). The numbers of PTMs reported under the "Results" and "Discussion" are exclusively for localized sites. α1(V) and pro-α1(V) chains were cleaved using five different individual proteases or by applying two proteases sequentially. Trypsin, the most common protease used for MS experiments, has substantially reduced cleavage rates for Lys and Arg residues COOH-terminal to Pro, thus posing a problem for collagens, which have high Pro/Hyp content (64,65). Such is also the case with chymotrypsin and Glu-C. However, Arg-C will cleave at Arg residues (also at Lys, but at lower rates) adjacent to Pro residues. Peptides were separated by on-line nanoflow reversed-phase liquid chromatography, where peptides with identical primary sequences differing only in degree of hydroxylation eluted across a wide elution window, whereas peptide isomers differing only in the position of hydroxylation typically coeluted and were cofragmented to produce chimeric MSn spectra (Fig. 5). Glycopeptides and short (<~7 residues), highly modified peptides often eluted early during the chromatographic gradient, demonstrating poor and inconsistent retention. A direct injection style of sample loading was employed to enable analysis of highly polar and poorly retained peptides rather than the vented-style trapping normally used for rapid sample loading/concentrating (66).
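The number of coeluting positional isomers that can contribute to a chimeric spectrum grows combinatorially with the Pro content of a peptide. The sketch below enumerates the candidate isomers for a given hydroxylation count; it is an illustration only, the function name is hypothetical, and the toy peptide is not taken from the α1(V) sequence.

```python
from itertools import combinations


def hydroxylation_isomers(peptide, n_hydroxyl, targets="P"):
    """List every placement of n_hydroxyl hydroxyl groups on the target residues.

    Positions are 1-based. All returned placements share one precursor mass.
    """
    sites = [i for i, aa in enumerate(peptide, start=1) if aa in targets]
    return list(combinations(sites, n_hydroxyl))


if __name__ == "__main__":
    toy = "GPPGPVGAPGK"  # hypothetical peptide with four Pro residues
    for placement in hydroxylation_isomers(toy, 2):
        print(placement)
```

For this toy peptide, two hydroxyl groups can be placed on four Pro residues in six ways, all isobaric, which is why precursor mass alone cannot assign the modified residues.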
Because of high α1(V) Pro content and decreased efficacy of ETD-induced cleavage of Pro N-Cα bonds, ETD MS2 analyses occasionally did not provide sufficient numbers of sequence-informative product ions to permit PTM localization (67). To complement ETD data, data were also acquired utilizing HCD collected at different collision energies. Low HCD energies favor the generation of moderate mass fragment ions, whereas high HCD energies favor the production of low m/z fragments. Low mass fragments enabled screening of y1-y3, b1-b3, and immonium ions, improving the ability to localize PTMs residing on peptide termini. The high α1(V) Pro content also resulted in an unusually large number of internal fragments (multiple peptide backbone cleavages of a single ion) in spectra collected at moderate and high HCD energies (35). Many of these internal fragments were useful for PTM localization in the absence of localizing product ions produced by a single backbone cleavage event (supplemental Spectra S1 and S2). Despite our best efforts using multiple dissociation techniques and MS3, neutral loss of glycosylation was still common. In these cases neutral loss product ions were used to aid in localization, as reported for phosphorylation localization (supplemental Spectra S1 and S2) (68).
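The role of localizing fragments can be made concrete by computing the y-ion series for two positional isomers and noting which ions differ. This is a minimal sketch under stated assumptions: singly charged y ions only, standard monoisotopic residue masses, and a hypothetical toy peptide; it is not the validation procedure used for the supplemental spectra.

```python
# Standard monoisotopic residue masses (Da) for the residues in the toy peptide.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "P": 97.05276,
                "V": 99.06841, "R": 156.10111}
WATER = 18.01056
PROTON = 1.00728
HYDROXYL = 15.9949  # hydroxylation shift


def y_ions(peptide, hydroxylated):
    """Singly charged y-ion m/z values, y1 first; hydroxylated holds 1-based positions."""
    ions = []
    mass = WATER + PROTON
    for pos in range(len(peptide), 0, -1):  # walk from the COOH terminus
        mass += RESIDUE_MASS[peptide[pos - 1]]
        if pos in hydroxylated:
            mass += HYDROXYL
        ions.append(mass)
    return ions


def distinguishing_y_ions(peptide, sites_a, sites_b, tol=0.01):
    """Return the y-ion numbers whose m/z differs between two isomers by more than tol."""
    pairs = zip(y_ions(peptide, sites_a), y_ions(peptide, sites_b))
    return [k for k, (a, b) in enumerate(pairs, start=1) if abs(a - b) > tol]


if __name__ == "__main__":
    toy = "GPPGPVGAR"  # hypothetical peptide; Pro at residues 2, 3, and 5
    # Isomer A: Hyp at residue 3 (Y-position); isomer B: Hyp at residue 5 (X of Gly-X-Val).
    print(distinguishing_y_ions(toy, {3}, {5}))
```

In this toy case only y5 and y6 separate the two isomers, while the intact precursor masses are identical. If neither ion is observed, the site would remain unlocalized in the sense defined above, which is where internal fragments can supply the missing evidence.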

Comparison of the COL1 PTMs of Bovine Placental α1(V) and Human Recombinant Pro-α1(V) Chains to Each Other and to Previous Findings-Previous amino acid analyses of α1(V) chains from human skin (15) or placenta (69) or produced in 293-HEK cells (61) produced estimates of 4-Hyp content ranging from ~106 to ~111 per 1000 amino acids. The 106 and 98 Y-position Hyp residues directly localized here by MSn for the COL1 domains of bovine placenta α1(V) and human recombinant pro-α1(V) chains, respectively, are in the same range as these numbers, consistent with identification of the Y-position Hyp residues mapped here as 4-Hyp residues. Additional Y-position Hyp residues were either not directly localized or are likely located on COL1 proteolytic fragments not recovered here by MSn, as such fragments contain five additional Y-position Pro residues for both the bovine and human. Eighty-six Y-position Hyp residues mapped to identical positions in the bovine and recombinant human COL1 domains (Figs. 1 and 2), suggesting that these Hyp residues may be relatively invariant in α1(V) chains from various sources. Of the 12 Y-position Hyp residues that differ between the bovine α1(V) and human recombinant pro-α1(V) samples, only 2 are found in the latter but not the bovine. Thus, Y-position Hyp residues found in the human recombinant pro-α1(V) COL1 domain are for the most part a subset of those found in the bovine α1(V) chain. Differences in Y-position Hyp residues between the two COL1 domains may be due to tissue-/cell type-specific differences in modifying enzymes, although the pro-α1(V) chains studied here were synthesized as overexpressed pro-α1(V)3 homotrimers rather than as the endogenous pro-α1(V)2pro-α2(V) heterotrimers commonly found in tissues, which might also affect levels and placement of Hyp residues.
Previous estimates of the amino acid composition of α1(V) chains from human skin (15) suggested α1(V) to be more similar to nonfibrillar basement membrane collagen IV chains than to other fibrillar collagen chains in possessing a high content of Hyl residues, the majority of which are glycosylated. Specifically, human skin α1(V) chains (15) were estimated to comprise ~39 hydroxylysines, 29 in the form of Glc-Gal-Hyl residues and 5 in the form of Gal-Hyl residues, suggesting Glc-Gal and Gal to be frequently and infrequently occurring PTMs, respectively. Henkel and Dreisewerd (14), employing MALDI MS analysis of fetal calf skin α1(V) chains, estimated that most, if not all, Hyl residues are likely glycosylated, with ~87% predicted to be Glc-Gal-Hyl. However, they lacked MSⁿ sequence data to support their assignments of PTMs, making it difficult to confidently determine the nature of saccharide species (e.g. two residues with monosaccharide modifications and one residue with a disaccharide modification share the same mass such that these PTMs cannot be distinguished based solely on analysis of intact peptide masses). Rhodes and Miller (69) employed amino acid analysis to estimate human placenta α1(V) chains to contain 35 Hyls but did not analyze glycosylation. Here, we directly localized 37 hydroxylysines in bovine placenta α1(V) chains, similar in number to the 34-39 hydroxylysines previously detected by amino acid analysis in α1(V) chains from bovine placenta and uterus (70). Furthermore, we identified 34 of the hydroxylysines mapped here as Glc-Gal-Hyl residues. Thus, overall numbers of Hyl and Glc-Gal-Hyl residues mapped here in bovine α1(V) chains accord well with previous estimates of amino acid content.
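The intact-mass ambiguity noted above can be made concrete with a short enumeration. The sketch below assumes a hypothetical peptide with two modifiable Lys residues and lists every combination of modification states that yields the same total mass shift. The mass increments are standard monoisotopic values (oxygen for hydroxylation, C6H10O5 for each hexose), stored as integer microdaltons so that the comparison is exact; the function and variable names are illustrative only.

```python
from itertools import product

# Monoisotopic mass shifts relative to unmodified Lys, in microdaltons.
OXYGEN = 15_994_915    # hydroxylation (Lys -> Hyl)
HEXOSE = 162_052_824   # one hexose residue (C6H10O5)

STATES = {
    "Lys": 0,
    "Hyl": OXYGEN,
    "Gal-Hyl": OXYGEN + HEXOSE,
    "Glc-Gal-Hyl": OXYGEN + 2 * HEXOSE,
}


def isobaric_groups(n_lys: int) -> dict:
    """Map total mass shift -> set of state combinations sharing that shift.

    Combinations are treated as unordered; only shifts reachable by more
    than one distinct combination are returned.
    """
    groups = {}
    for combo in product(STATES, repeat=n_lys):
        shift = sum(STATES[state] for state in combo)
        groups.setdefault(shift, set()).add(tuple(sorted(combo)))
    return {shift: combos for shift, combos in groups.items() if len(combos) > 1}


groups = isobaric_groups(2)
for shift, combos in groups.items():
    print(f"{shift / 1e6:.6f} Da: {sorted(combos)}")
```

For two sites, the only collision is between two Gal-Hyl residues and one Glc-Gal-Hyl plus one unglycosylated Hyl, which is why fragment-level sequence data are needed to tell them apart.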
Our estimates of Gal-Hyl modification of bovine placenta α1(V) may differ from those of previous reports because 1) extremely low levels of Gal-Hyl modifications were beyond our limit of detection, 2) we exclusively reported localized modifications, disregarding peptides with multiple potential glycosylation sites that could not be localized to a specific residue, and/or 3) previous reports did not attempt to localize modifications and may have overestimated Gal-Hyl numbers.
We found the COL1 domain of human recombinant pro-α1(V) chains produced in 293-HEK cells to contain 30 hydroxylysines: 22 as Glc-Gal-Hyl residues, 1 as Gal-Hyl, and 1 as both Glc-Gal-Hyl and Gal-Hyl. These hydroxylated Lys residues are a subset of those found in bovine α1(V), except for Hyl¹⁵⁰, which is not hydroxylated in the bovine chain (Figs. 1 and 3). Thus, 29 hydroxylated Lys residues found in α1(V) COL1 domains from two very different sources may be relatively invariant in α1(V) chains. The reduced numbers of Hyl and Glc-Gal-Hyl residues in the human recombinant protein suggest that 293-HEK cells, which produce little or no endogenous extracellular matrix proteins (61), may possess reduced levels of the multifunctional enzyme lysyl hydroxylase 3. This enzyme has lysyl hydroxylase activity as well as collagen galactosyltransferase and glucosyltransferase activities that allow it to glycosylate Hyl residues hydroxylated by itself and by other lysyl hydroxylase isoforms (18). However, even normal levels of enzymes might be insufficient to fully modify the overexpressed recombinant pro-α1(V) chains. Interestingly, COL1 residue 84 was detected here as both a Glc-Gal-Hyl and a Gal-Hyl residue, showing glycosylation at this site to be dynamic, with partial occupancy by both Glc-Gal and Gal saccharides. Our direct localization of 30 COL1 Hyl residues within the human recombinant pro-α1(V) supersedes a previous amino acid analysis estimate of 6 COL1 Hyl residues for human recombinant pro-α1(V) chains produced in 293-HEK cells (61) and is more congruent with the observed efficient secretion of recombinant pro-α1(V)₃ homotrimers from 293-HEK cells (40,61), as threshold levels of glycosylated Hyl residues are necessary for efficient secretion of at least some collagenous molecules (18).
Previous amino acid analysis of α1(V) chains from human placenta (69) or human skin (15) estimated 3-Hyp content at four or ten 3-Hyp residues, respectively, per 1000 amino acids, levels higher than those detected by similar analyses of the major fibrillar collagen chains (71-74). The nine Hyp residues mapped here to the X-positions of Gly-Hyp-Hyp repeats in the COL1 domain of human recombinant pro-α1(V) chains fall within this range (15,69). However, the 22 X-position Hyp residues mapped here within the bovine placenta α1(V) chain lie beyond this range and are instead reminiscent of the range of 10 to 20 3-Hyp residues previously estimated by amino acid analysis to lie within the collagen IV chains of some tissues (75,76).
Using low resolution MS² and assignments based on known collagen motifs (see above), Eyre and co-workers (32) recently mapped three X-position Hyp residues in α1(V) chains from human bone to positions 434 and 665 and to a site variously identified by them as 692 or 695 of the COL1 domain. They also predicted the tissue-specific occurrence of 3-Hyp residues within Gly-Pro-Pro repeats at the COOH termini of the COL1 domains of α1(V) and other fibrillar collagen chains, based on mapping of such residues to Gly-Pro-Pro repeats at the COL1 COOH termini of α1(I) and α2(I) chains from tendon but not from skin or bone (30). Here, we confirm the presence of X-position Hyp residues at sites 434, 665, and 695 and in all three COL1 COOH-terminal Gly-Pro-Pro repeats (sites 1004, 1007, and 1010) in bovine placenta α1(V) chains (supplemental Spectra 1), whereas human recombinant pro-α1(V) chains contained an X-position Hyp residue only at site 434. Thus, we have confirmed the positions previously identified or predicted for six α1(V) COL1 X-position Hyp residues and found one of these (residue 434) to be, thus far, invariable. In addition, we identified the positions of 15 additional X-position Hyp residues in bovine placental α1(V) chains. The nine human recombinant pro-α1(V) COL1 X-position Hyp sites are also found in the bovine α1(V) chain and provide confirmatory evidence for Hyp occupancy of these nine X-position sites. Interestingly, COL1 X-position Hyp sites in human recombinant pro-α1(V) chains are limited to the NH₂-terminal half of the domain. Although the reason for this is unknown, it may relate to the COOH- to NH₂-terminal direction of triple helix formation and to the ability of prolyl 3-hydroxylases to modify unfolded but not triple helical procollagen chains.
Thus, because the recombinant pro-α1(V) chains are produced as pro-α1(V)₃ homotrimers, which form triple helices apparently faster than do pro-α1(V)₂pro-α2(V) heterotrimers (77), prolyl 3-hydroxylases may be particularly limited in the ability to hydroxylate residues in the COOH-terminal portions of these recombinant chains. In fact, the NH₂-terminal COL1 distribution of X-position Hyp residues in recombinant pro-α1(V)₃ homotrimers is consistent with the suggestion that 3-Hyp content may be related to the speed of triple helix formation (55) but may be inconsistent with the suggestion (31) that 3-hydroxylation of fibrillar collagens begins at the COOH terminus and diminishes in the more NH₂-terminal portion of the COL1 domain. Other PTMs were not limited to the NH₂-terminal portion of recombinant pro-α1(V) COL1 domains, perhaps reflecting differences in the kinetics of prolyl 3-, prolyl 4-, and lysyl hydroxylases.
We report here the unexpected finding of X-position Hyp residues in the context of a Gly-X-Val and a Gly-X-Ala triplet in the bovine placenta α1(V) COL1 domain. It is unclear at this time whether these Hyp residues might be 3-Hyp or 4-Hyp, whether their appearance represents a lapse in fidelity of at least one of the prolyl hydroxylases, or whether such sites represent previously unknown substrates at which hydroxylation serves specific functions. Supportive of our findings, Pro hydroxylation in a Gly-Pro-Ala motif has previously been reported for bovine-derived collagen I (78). It should be noted that although X-position 4-Hyp residues are thought to destabilize triple helices when in the context of Gly-Hyp-Pro triplets (79), X-position 4-Hyp residues do not necessarily destabilize the triple helix when in the context of triplets that lack Pro or Hyp in the Y-position (80). Thus, the significance of the unexpected finding of X-position Hyp in these non-canonical sites remains to be determined.
Numbers of Hyl, Glc-Gal-Hyl, and Y-position Hyp residues mapped here by MSⁿ in the bovine placenta α1(V) chain are consistent with numbers of 4-Hyp, Hyl, and Glc-Gal-Hyl residues in estimates of the amino acid compositions of hydrolyzed α1(V) chains from various tissues. In contrast, the number of X-position Hyp and likely 3-Hyp residues detected here by tandem MS in Gly-Hyp-Hyp triplets in bovine placenta α1(V) chains is markedly higher than numbers of 3-Hyp residues previously estimated by amino acid analyses. Importantly, the numbers of PTMs predicted by early amino acid analyses were based on a presumption of 100% occupancy at a fixed number of sites. Thus, although congruence in numbers of PTMs predicted by previous amino acid analyses and the MSⁿ localization results presented here suggests that this presumption holds true for most α1(V) PTMs, this does not appear to be the case for 3-Hyp residues. Rather, the large difference between the numbers of 3-Hyp predicted by previous amino acid analyses and the number of X-position Hyp residues in Gly-Hyp-Hyp triplets mapped here by high resolution MS is best explained by the concept of a relatively large number of 3-Hyp sites that have less than 100% occupancy in a population of α1(V) chains from a given tissue. This conclusion is supported by amino acid analysis performed on bovine α1(V) chain samples used in this study (Table 1), which estimated overall Hyp levels of these samples to be similar to, and not higher than, Hyp levels estimated for α1(V) chains in earlier amino acid analysis studies (70). It is not surprising that, with the improved sensitivity and specificity afforded by our MS-driven proteomics workflow, we have identified numerous previously undiscovered PTMs, including those present at relatively low stoichiometric abundances.
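The distinction between the two kinds of measurement can be sketched numerically. Amino acid analysis of a hydrolyzed sample reports the average number of modified residues per chain, which is the sum of the fractional occupancies over all sites, whereas MS mapping counts every site with detectable occupancy. The occupancy values, site indices, and function name below are hypothetical and chosen only to show how a large number of mapped sites can coexist with a much lower average; they are not measurements from this study.

```python
def summarize_sites(occupancies: dict, detection_limit: float = 0.0):
    """Return (number of sites detectable by mapping, mean modified residues per chain).

    occupancies maps a site label to the fraction of chains modified at
    that site (0.0 to 1.0). A site counts as detectable when its occupancy
    exceeds detection_limit.
    """
    for site, occ in occupancies.items():
        if not 0.0 <= occ <= 1.0:
            raise ValueError(f"occupancy for site {site} must be within [0, 1]")
    n_sites = sum(1 for occ in occupancies.values() if occ > detection_limit)
    mean_per_chain = sum(occupancies.values())
    return n_sites, mean_per_chain


# Hypothetical population: 22 sites, 4 fully occupied and 18 occupied in
# a quarter of chains. Keys are arbitrary indices, not COL1 positions.
occupancies = {i: 1.0 for i in range(4)}
occupancies.update({i: 0.25 for i in range(4, 22)})

n_sites, mean_per_chain = summarize_sites(occupancies)
print(n_sites, mean_per_chain)
```

In this toy population, mapping would report 22 sites while a composition-based estimate would correspond to 8.5 residues per chain, illustrating why the two approaches need not agree when occupancy is partial.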
Interestingly, 3-Hyp residues point away from the triple helix (81), implying roles in protein-protein interactions as the bases of biological function for these residues. The presence of numerous sites that can be differentially modified by P3Hs in a given tissue suggests a previously unknown mechanism for dynamic modification of the functions and intermolecular interactions of α1(V) chains and col(V) and perhaps other collagen types as well.