Up-regulation of collagen proteins in colorectal liver metastasis compared with normal liver tissue

Changes to extracellular matrix (ECM) structures are linked to tumor cell proliferation and metastasis. We previously reported that naturally occurring peptides of collagen type I are elevated in urine of patients with colorectal liver metastasis (CRLM). In the present study, we took an MS-based proteomic approach to identify specific collagen types that are up-regulated in CRLM tissues compared with healthy, adjacent liver tissues from the same patients. We found that 19 of 22 collagen-α chains are significantly up-regulated (p < 0.05) in CRLM tissues compared with the healthy tissues. At least four collagen-α chains were absent or had low expression in healthy colon and adjacent tissues, but were highly abundant in both colorectal cancer (CRC) and CRLM tissues. This expression pattern was also observed for six noncollagen colon-specific proteins, two of which (CDH17 and PPP1R1B/DARPP-32) had not previously been linked to CRLM. Furthermore, we observed CRLM-associated up-regulation of 16 proteins (of 20 associated proteins identified) known to be required for collagen synthesis, indicating increased collagen production in CRLM. Immunohistochemistry validated that collagen type XII is significantly up-regulated in CRLM. The results of this study indicate that most collagen isoforms are up-regulated in CRLM compared with healthy tissues, most likely as a result of increased collagen production in the metastatic cells. Our findings provide further insight into morphological changes in the ECM in CRLM and help explain the finding of tumor metastasis-associated proteins and peptides in urine, suggesting their utility as metastasis biomarkers.

The collagen protein family is well studied, with 8000+ publications annually over the last 5 years (based on the PubMed index, July 2018). Despite this large number of publications, there is still a need to explore collagen and its functions. A deeper understanding of collagen is expected to contribute significantly to important fields such as cancer, fibrosis, and regenerative medicine (1).
It is well known that loss of the normal structure of the extracellular matrix (ECM), including its major component collagen, is associated with enhanced tumor proliferation (2,3). Tumor proliferation changes the function of collagen (4,5), and these changes depend on changes in the levels of individual collagens, as shown by Nyström et al. (6) for colorectal cancer (CRC). These changes have been suggested to play a role in chemoresistance and increased cell proliferation, creating a vicious circle. Naba et al. (7) showed differences in ECM proteins in colorectal liver metastasis (CRLM) compared with healthy liver in an underpowered sample set (n = 3). However, a sufficiently powered study regarding the concentration of specific collagen types in CRLM is lacking.
As a result of CRLM, an up-regulation of collagen can be observed in urine (8,9) and plasma (10). Increased excretion of collagen in urine and plasma may be explained by a higher rate of tissue remodeling and turnover, as matrix metalloproteinases (MMPs) (11-13) are differently expressed in CRLM and CRC compared with healthy liver and colon tissue, respectively (14-16). However, the up-regulation in CRLM of proteins related to collagen production has not been shown.
We aim to study the changes in collagen induced by CRLM with MS in a well-powered data set. We determined which specific collagen types are altered, whether proteins in the collagen turnover pathway are altered, and whether we can define a specific protein footprint of the primary tumor in the metastasis.

Patient characteristics
Patients with CRLM had a median age of 69.2 years (interquartile range (IQR) 60.5-74.7 years) and were mostly male (66.7%). Patients had a median of 1 (maximum 7) tumor, being moderately differentiated in 28 of 30 patients. The CRLM tumor had a median size of 2.4 cm (IQR 1.4-3.5 cm). The primary tumor was located in the colon (47%), rectum (43%), and sigmoid (10%). Patients with CRC had a median overall age of 71.1 years (IQR 54.8-80.9 years). The patients with liver fibrosis (FIB; ranging from moderate to severe fibrosis) had a median overall age of 59 years (IQR 58-59 years) and were all male. These patients developed fibrosis due to a combination of different causes: viral hepatitis (3 of 5), HCC (5 of 5), steatosis (1 of 5), and steatohepatitis (2 of 5). Additional information is found in supporting information S4.

The authors declare that they have no conflicts of interest with the contents of this article. This article contains supplemental information S1-S8. The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE archive with the data set identifier PXD008383. To whom correspondence should be addressed. Tel.: 31-107038069; E-mail: t.luider@erasmusmc.nl.

Data assessment
Normalization. Normalized values obtained from healthy adjacent liver tissue (control) and CRLM tissue did not differ significantly (p = 0.58).
Permutation. In the true data set of control versus CRLM tissue, 5812 of the 20,635 peptides (28.1%) were significantly different. In the permuted data, an average of 631 (3%) peptides were significantly different. The true data set contained more significant peptides than the average of the permuted data plus twice the S.D. (5812 ≫ ~631; p < 0.05).

Proteomics
Top 10 down- and up-regulated proteins. In total, 2817 proteins were identified in CRLM and control tissue (see supporting information S8). The 10 proteins with the highest absolute number of down- and up-regulated peptides are shown in Table 1. Many down-regulated proteins were present in liver-specific pathways (Table 1). Furthermore, seven of the top 10 up-regulated proteins in CRLM tissue originated from ECM.
Collagen in CRLM. We then analyzed the number of differentially expressed collagen peptides and collagen-α chains. In CRLM tissue, 1137 collagen peptides were identified, of which 819 peptides (72%) were up-regulated and 55 (4.9%) were down-regulated. Overall, we observed an increased collagen level in CRLM versus control tissue (p < 0.0001, linear fold change 1.67). Furthermore, 19 of the 22 collagen-α chains were significantly up-regulated (Table 2). COL12A1 was the most differentially expressed collagen-α chain (2-fold change, 86% of identified peptides (n = 50) significantly up-regulated). An unsupervised clustering based on all collagen peptides separated both groups, except for four samples (CRLM-5, -9, -10, and -25). The four samples (CRLM-8, -12, -19, and -21) containing up to 50% control tissue were evenly clustered within the CRLM group (Fig. 1). CRLM-10 had much lower MS intensities for collagen peptides than the rest of the CRLM samples. However, in the normalization process, this sample remained within the normal range of samples.
Collagen turnover-related proteins. The observed up-regulation of collagen in CRLM may be due to changes in the normal turnover of collagen in the liver. To test this hypothesis, 65 different proteins included in collagen turnover pathways were retrieved from the "reactome pathway knowledgebase" (17,18). Twenty-four of these 65 proteins were identified, 17 (~68%) of which were significantly up-regulated in CRLM tissue. Sixteen of the 17 up-regulated proteins are included in the collagen synthesis and/or fibril-forming pathways (Table 3). Supporting information S5 contains additional information.
CRC protein signature in CRLM. Next, we studied the presence of a correlation between collagen levels in CRLM and CRC tissue. For COL10A1, COL12A1, COL14A1, and COL15A1, an absence or low number of unique peptides was observed in healthy colon and healthy liver tissue, but a higher number of unique peptides was present in CRC and CRLM tissues (supporting information S6), being most apparent for COL12A1.
Furthermore, the presence of proteins specific for colon tissue in CRLM and control tissues was studied. Elevated expression of 165 genes was found in colon tissue, and in CRLM tissue, 25 corresponding proteins were identified (Table 4). Supporting information S7 contains additional information. Six proteins matched the set criteria and were considered unique for colon tissue versus control liver tissue and were found to be significantly higher in CRLM tissue than control tissue.

Collagen in colorectal liver metastasis
Immunohistochemistry. MS data were cross-validated by IHC staining of COL12A1 in CRLM and control tissue. CRLM stroma tissue stained positive for COL12A1 (Fig. 2, a and b), except for three tissues (CRLM-19, -20, and -28; Fig. 2c). Control tissue did not show positive staining (Fig. 2d). The three negatively stained CRLM tissue sections did cluster with the rest of the CRLM samples by proteomics (Fig. 1). Based on summed peptide ranking, the COL12A1 values of the three negative IHC tissues were above 3 × S.D. of the control, and 0, 9, and 18 peptides, respectively, were identified per sample, indicating that by proteomics, these samples can be detected correctly. The IHC staining of COL12A1 is present in the ECM. However, the staining was not equally distributed in the ECM of the CRLM tissue. Furthermore, studying tissue sections to determine IHC scoring revealed distortions in the CRLM morphology compared with the control tissue (Fig. 2). In addition, CRC and healthy colon tissue (control C) were stained. CRC tissues showed, similar to CRLM, staining of the ECM for COL12A1. Staining was also observed in the nuclei of epithelial cells in healthy colon tissue by COL12A1 antibody NBP1-88062 (Fig. 2g). The other COL12A1 antibody (HPA009143) did not stain the epithelial cells but, in addition to the collagen present in tumor stroma, stained smooth muscle tissue (Fig. 2h). FIB liver tissue was stained by IHC for COL12A1, and no positive staining was observed (Fig. 2f).

Table 2 footnotes:
a The number of peptides identified per collagen-α chain.
b The number of peptides with a fold change >0 or <0 and p < 0.05. The fold change is based on log10 values.
c Indicates the p value at the protein level.
d Indicates the true fold change at the protein level (not the log10 value).
e These proteins are identified with one peptide, and their identification is considered as less reliable compared with proteins identified with two or more peptides.

Figure 1. Unsupervised clustering at the patient level (horizontal) and collagen peptide level (vertical).
The yellow and red regions at the top indicate control liver and CRLM tissue, respectively. The color scheme in the cluster analysis itself is from yellow (low) to red (high).

Discussion
In this study, we focused on the presence of collagen in CRLM, and we showed by using proteomics that we could distinguish all 30 CRLM samples correctly from normal liver tissue at a relatively detailed protein level. Twenty-two of the 44 known collagen-α chains were observed in CRLM and/or control tissue (healthy adjacent liver tissue). The remaining 22 chains were not identified, possibly due to low concentration or absence in the liver.
Previously, Naba et al. (7) showed the presence of specific collagen-α chains in colon, CRC, liver, and CRLM tissue in a small sample set (n = 3) including a pooled control (n = 2). Naba et al. (7) observed more collagen types, although they performed additional enrichment and extensive fractionation. The authors indicated that validation needs to be performed in a larger sample set, which was confirmed by our power calculation with an estimation of (minimum) 25 samples/group. In the present study, we analyzed matched control and CRLM tissue from 30 individuals, providing a well-powered data set. An up-regulation of collagen peptides (p < 0.0001) in CRLM tissue compared with control tissue was observed as well as up-regulation of individual collagen-α chains. Three collagen-α chains (COL8A1, COL16A1, and COL18A1) were not significantly affected. COL18A1 is mainly involved in the development of the eye (19), central nervous system, and liver structures (20,21). Musso et al. (22) demonstrated that COL18A1 production is not increased in CRLM compared with liver tissue, and it may therefore be considered as a negative control. The low number (two or fewer) of identified peptides for COL4A4, COL5A3, COL7A1, COL8A1, COL15A1, and COL16A1 indicates relatively low levels in CRLM.
COL12A1 MS data were cross-validated with IHC, and both techniques showed significant up-regulation of COL12A1. During scoring of the tissues, no abnormalities were observed in healthy adjacent liver tissue. Nevertheless, the presence of molecular abnormalities cannot be ruled out with certainty. We tried to exclude direct tumor effects by taking control tissue that was at least at a distance of 1 cm away from the tumor. Many articles referring to premetastatic niches have been published (27-30). A premetastatic niche is formed by interaction of the potential metastasis site with proteins excreted from the primary tumor, creating a niche that is favorable for the growth of a metastasis. We were not able to visualize COL12A1 in three CRLM tissue sections with IHC, although increased COL12A1 levels were detected with MS. In CRC, COL12A1 was mainly present in stroma, analogous to CRLM.
However, in colon tissue, it was surprisingly observed inside the nuclei of epithelial cells, with repeated staining. We could not validate the staining of the nuclei of the epithelial cells with another antibody (HPA009143). This indicates that staining of the nuclei is most likely false positive. This antibody (HPA009143), which stains COL12A1, also showed cross-reactivity; in this case, smooth muscle tissue stained false positively. Both antibodies showed exactly the same staining for collagen structures, and they showed different additional cross-reactivity (epithelial cell nuclei and smooth muscle tissue). The shotgun proteomics data allowed us to search for specific proteins produced by CRLM, which originate from the primary colon tumor tissue. First, we looked into collagen types that were expressed higher in CRC with respect to colon tissue and were also more abundant in CRLM with respect to liver tissue. Four collagen types matched these criteria, COL10A1, COL12A1, COL14A1, and COL15A1, and the abundance was strongest for COL12A1. We hypothesize that these four collagen types visible in CRLM are a sign of origin of the primary tumor and highly specific for CRLM. We cannot exclude the possibility that fibrotic tissue is triggered by the metastatic cells to produce, for example, COL12A1; however, we observed in five fibrotic tissue sections that COL12A1 is not present in fibrotic tissue itself. Collagen type X (31) and type XII (32) are described as being up-regulated in CRC; however, collagen type XIV and XV are not described. Collagen type X, XII, XIV, and XV have not been described in the literature in relation to CRLM.

Table 3. An overview of the identified proteins involved in collagen synthesis and degradation
The listed proteins are involved in three different pathways, annotated as follows: F, assembly of collagen fibrils and other multimeric structures; B, collagen biosynthesis and modifying enzymes; D, collagen degradation. Green, up-regulated; red, down-regulated.
i Number of peptides identified per protein.
ii Number of peptides that are significantly different and have a fold change >0 or <0. The fold change is based on log10 values.
iii p values calculated over the summed ranked peptides followed by the fold change. The fold change shown is a true value (not a log10 value).
iv These proteins are identified with one peptide, and their identification is considered as less reliable compared with proteins identified with two or more peptides.
Colon-specific proteins were selected from the Protein Atlas, and after applying the selection criteria, six proteins remained. These remaining proteins (CDH17, KRT20, CEACAM5, GPA33, MUC13, and PPP1R1B/DARPP-32) are colon-specific and were significantly present in the CRLM tissue. Previous reports state that KRT20 (33), MUC13 (7,34), CEACAM5 (35), and GPA33 (36,37) are differently expressed in CRLM tissue, whereas CDH17 and PPP1R1B/DARPP-32 have not been described previously. It is likely that CDH17 is involved in cell organization and stimulates tumor proliferation (38), whereas PPP1R1B/DARPP-32 is highly expressed in CRC tissue and is a predictor for metastasis (39). Furthermore, six of the 10 top down-regulated proteins are related to liver-specific processes. The relative loss of proteins involved in liver processes in the tumor relates to a known decrease of hepatocytes in tumor tissue (40). The top 10 up-regulated proteins included collagen and two other ECM proteins, fibrillin-1 (FBN1) and fibronectin (FN1), suggesting that the ECM composition strongly deviates from normal.
We conclude that collagen is up-regulated in CRLM compared with control tissue and that specific collagen types (COL10A1, COL12A1, COL14A1, and COL15A1) from the primary tumor can be detected in the CRLM tissue as well. The collagen changes found in CRC and CRLM may reflect changes of the ECM related to tumor proliferation and seeding into the liver. These data may help to define new biomarkers to be used in CRLM detection after treatment of the primary tumor or its metastases.

Experimental design and statistical rationale
This study was approved by the Erasmus MC review board (MEC-2007-088), and we have worked according to the Declaration of Helsinki. We identified collagen types that are differently present in paired (n = 30) CRLM tissue compared with normal adjacent liver tissue (control). The presence of colon- or CRC-specific proteins in CRLM was studied by comparing CRLM, control, healthy adjacent colon (n = 5), and CRC tissue (n = 5). Fibrotic tissue (n = 5) was analyzed to determine whether COL12A1 up-regulation is a general liver process or caused by metastatic cells. CRLM, control liver, CRC, and colon tissue were analyzed by IHC (COL12A1 antibody: NBP1-88062) and MS. FIB tissue was analyzed by IHC (COL12A1 antibody: NBP1-88062). Colon tissue was additionally stained with a second COL12A1 antibody (HPA009143) to verify the unexpected staining of epithelial nuclei.
Sample sizes of the CRLM and control tissue were based on a power analysis (α = 0.05, β = 0.20). Mean and S.D. used for the power analysis were calculated on the overall data of log-transformed significantly up-regulated collagen peptides in CRLM and control tissue of five patients (control mean = 5.00, CRLM mean = 5.94, pooled S.D. = 1.18). The power analysis indicated that a sample size of at least 25 samples/group was needed to detect the differences observed in the subset. No replicate measurements were performed; we assumed the biological variation to be much larger than the technical variation, as has also been described in literature for the technique used (41).
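The arithmetic behind such a power analysis can be sketched as follows. This is a minimal illustration, assuming a two-sided comparison of two group means under a normal approximation; the text does not state which formula or software was used, and the function name `samples_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(mean_control, mean_case, pooled_sd, alpha=0.05, beta=0.20):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means (an assumed formula, not the authors')."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for power = 1 - beta
    effect_size = abs(mean_case - mean_control) / pooled_sd
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2


# Pilot values reported in the text (log-transformed collagen peptide data).
n_required = samples_per_group(5.00, 5.94, 1.18)
n_rounded = ceil(n_required)  # round up to whole samples
```

With the pilot values from the text, this approximation gives a little under 25 samples per group, which rounds up to 25 and is in line with the reported minimum.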
Quality of the LC-MS/MS runs was monitored by measuring samples first on a test system to identify incomplete digestions and determine the presence of other components influencing the chromatography. Furthermore, after every 10 samples, a quality control sample was measured containing a set of peptides spread over the whole gradient to monitor possible retention time shifts or loss of intensity in the mass spectrometer. The peptides showed a retention time variation that remained within 0.2 min.
Data were assessed by analysis of the normalization by a t test, and background was assessed by permutation testing (see "Data assessment"). Proteins included in collagen turnover pathways were analyzed. Statistics are described under "Data analysis" and "Statistical analysis." Mass spectrometry data were orthogonally cross-validated by IHC on the most distinctive collagen type.

Sample selection criteria
All samples were provided by the Department of Pathology, Erasmus MC, and examined (by a gastrointestinal pathologist) for the presence of healthy and tumor tissue in the same section. Four CRLM sections were included that contained up to 50% control tissue (CRLM-8, -12, -19, and -21); the other sections were free of control tissue. Control liver sections were demonstrated to be free of tumor tissue by standard histological examination and were taken at a minimum distance of 1 cm away from the tumor. All CRC sections included contained nontumorous colon tissue (by standard histological examination). All samples were included in the data analysis. Tissue sections and patient information were handled according to the Federation of Dutch Medical Societies (47). Therefore, no approval of the Medical Research and Ethics Committee was required.

tion Systems Peroxidase/DAB, rabbit/mouse was obtained from Dako (Heverlee, Belgium). Tris-EDTA for heat-induced antigen retrieval was obtained from Klinipath (Duiven, The Netherlands). Other chemicals were obtained from Sigma-Aldrich (Zwijndrecht, The Netherlands).

IHC staining
Formalin-fixed paraffin-embedded tissue sections were analyzed by MS and IHC. Consecutive tissue sections (6-μm thickness) were cut. For IHC, tissue sections were deparaffinized and rehydrated, followed by inhibition of endogenous peroxidases. Prior to primary antibody labeling, heat-induced antigen retrieval was used. Samples were labeled with a primary antibody against COL12A1, followed by secondary labeling, visualized by incubation with DAB, and counterstained with hematoxylin. Tissue sections were scored by a gastrointestinal pathologist (MD). Tissues were scored according to the following criteria: 0, no staining; 0.5, small focal staining; 1, focal staining; 1.5, few focal staining areas; 2, several focal staining areas; 2.5, staining more than 40% and less than 50%; 3, >50% staining. The full IHC protocol is available in supporting information S1.

Sample treatment for MS
Tissue sections for MS analysis were placed in an Eppendorf cup; one tissue section was analyzed per sample, irrespective of the size of the resection material. Injection volumes were normalized based on the UV absorbance measured on a test system. The tissue sections were deparaffinized and rehydrated, followed by removal of formalin cross-links. Formalin crosslinks were removed by incubation with a solution of Tris and RapiGest (42), followed by reduction, alkylation, and overnight trypsin digestion. Trypsin cleaves exclusively C-terminal to lysine or arginine unless proline is located at the C-terminal side (43). The full protocol for sample treatment for MS analysis and nano-LC and MS settings are available in supporting information S1. Mass spectrometry data were made publicly accessible via the PRIDE archive, accession number PXD008383.

Data analysis
MGF peak list files were extracted from raw files by ProteoWizard (version 3.0.9166). MGF files were searched using the Mascot search engine (version 2.3.2, Matrix Science Inc., London, UK) and the UniProt/Swiss-Prot database (20,194 entries), as described by Singh et al. (44). In short, the database search used a mass tolerance of 10 ppm for the peptide mass and 0.5 Da for the fragment mass. A maximum of 4 missed cleavages was allowed. Hydroxylation (+16 Da) of proline, lysine, and methionine were included as variable modifications, and carbamidomethylation (+57 Da) of cysteine as a fixed modification. Mascot search results were further analyzed by Scaffold (version 4.6.2, Proteome Software, Portland, OR). In Scaffold, protein confidence levels were set to a 1% false discovery rate (FDR), at least 1 peptide/protein, and a 1% FDR at the peptide level. FDRs were estimated by inclusion of a decoy database search generated by Mascot. Subsequently, the Scaffold data were imported into Progenesis QI (version 4, Nonlinear Dynamics, Newcastle-upon-Tyne, UK) to align the LC runs.
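To make the 10 ppm peptide mass tolerance concrete, the relative error between an observed and a theoretical mass can be computed as below. This is only an illustrative sketch of the ppm definition; it does not reproduce how the search engine applies the tolerance internally, and both function names are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance_ppm=10.0):
    """True if the observed mass lies within the ppm window around the theoretical mass."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm
```

For a peptide of 1000 Da, a 10 ppm window corresponds to about ±0.01 Da, so the absolute window widens in proportion to the peptide mass.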
Features not matching the selection criteria (charge between +2 and +5 and at least two isotopes) were excluded from further analysis. Identifications were imported from Scaffold into Progenesis QI, followed by export of the normalized abundance to Excel 2010 (Microsoft, Redmond, WA). Duplicate feature intensities were summed. Data were further processed with Excel, GraphPad Prism (version 5.01, GraphPad Software, Inc., La Jolla, CA), and R (version 3.3.1, R Consortium, Vienna, Austria). Prior to log10 transformation, a relatively small value of 10 was added so that missing values could be included in further data analysis. The log10-transformed data were used for statistics, under the assumption that the data were normally distributed after log transformation. Protein significance was determined by ranked peptide values and summation of all peptides per protein, followed by an unequal-variance t test. In all following comparisons, an unequal-variance t test was used unless stated otherwise. p values < 0.05 were considered significant unless stated otherwise.
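The preprocessing steps described above (summing duplicate features, adding an offset of 10, log10 transformation, and an unequal-variance comparison) can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the peptide ranking and per-protein summation are omitted, only the Welch t statistic and its degrees of freedom are computed (not the p value), and all function names are hypothetical.

```python
from collections import defaultdict
from math import log10, sqrt
from statistics import mean, variance

OFFSET = 10.0  # small value added before log10 so that missing (zero) values are retained


def sum_duplicates(features):
    """Sum intensities of duplicate features; `features` is an iterable of (id, intensity)."""
    totals = defaultdict(float)
    for feature_id, intensity in features:
        totals[feature_id] += intensity
    return dict(totals)


def log_transform(intensity, offset=OFFSET):
    """log10 of the offset-shifted intensity."""
    return log10(intensity + offset)


def welch_t(group_a, group_b):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df
```

Adding the offset means a missing value maps to log10(10) = 1 instead of being undefined, so such peptides stay in the comparison.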

Data assessment
CRLM and control MS data were assessed; this included the analysis of normalization for injection volumes, permutation testing, and intra- and inter-collagen-α chain correlations. To exclude an a priori bias in the data, normalization was tested by summation of all individual peptide values per patient, followed by testing for significance between CRLM and control tissue.
To exclude the possibility that results were found by chance in large data sets, permutation testing (n = 1000 iterations) was performed using R. The R script has been added to supporting information S2. Briefly, the data were randomly divided into two groups at the peptide level; significant differences between the two groups were determined using the Wilcoxon signed-rank test. Significant p values were summed per permutation, and the log10 was taken. It was assumed that the log10 summed significant p values were normally distributed. If the true data set value was greater than the average value of the permutation test plus twice the S.D. (p < 0.05), then a significant difference was assumed.
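The decision rule described above can be sketched in a few lines. This is a generic illustration, not the R script from supporting information S2: the per-permutation statistic is passed in as a caller-supplied function rather than the Wilcoxon-based summary used in the study, the random split into two equal halves is an assumption, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def permuted_statistics(values, statistic, n_iterations=1000, seed=0):
    """Randomly split `values` into two halves n_iterations times and
    collect statistic(first_half, second_half) for each split."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iterations):
        shuffled = list(values)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        results.append(statistic(shuffled[:half], shuffled[half:]))
    return results


def exceeds_background(true_value, permuted_values, n_sd=2.0):
    """True if the observed value exceeds the permutation mean plus n_sd standard deviations."""
    threshold = mean(permuted_values) + n_sd * stdev(permuted_values)
    return true_value > threshold
```

The threshold of mean plus twice the S.D. relies on the assumption, stated in the text, that the permuted summary values are approximately normally distributed.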

Statistical analysis
An unsupervised cluster analysis was performed with R on ranked peptides. The R code is available in the supporting information S3.
The observed up-regulation of collagen can be due to a disturbance in its normal turnover. To test this hypothesis, proteins included in collagen turnover pathways were retrieved from the reactome pathway knowledgebase (17,18). Three pathways present in the reactome pathway knowledgebase are related to collagen production and degradation: "collagen biosynthesis and modifying enzymes," "assembly of collagen fibrils and other multimeric structures," and "collagen degradation." Although not included in the reactome pathway knowledgebase, Xaa-Pro dipeptidase (PEPD) is involved in collagen degradation. PEPD cleaves Xaa-Pro and Xaa-Hyp, but not Pro-Pro, and is essential for the final degradation of collagen. PEPD was added to the retrieved list of proteins from the reactome pathway knowledgebase.

The CRC protein signature in CRLM was determined at two levels: collagen-specific and the general proteome. At the collagen level, the total unique peptide count was used as a measure of abundance. Proteins were considered as "hitchhiked" from the colon to CRLM if they matched the following criteria: 1) they were marked in the Human Protein Atlas (www.proteinatlas.org) (45) as tissue-enriched, group-enriched, or tissue-enhanced in colon tissue; 2) they had an intensity-based absolute quantification (iBAQ) value at the protein level for colon tissue but not for liver tissue; and 3) the RNA expression level was 100-fold lower in the liver versus colon tissue, and five peptides or more were identified per protein. iBAQ and RNA expression levels were taken from https://www.proteomicsdb.org (46).
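The three selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the record layout, field names, and function name are hypothetical, and reading "100-fold lower" as "at least 100-fold lower" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Human Protein Atlas categories accepted under criterion 1.
COLON_CATEGORIES = {"tissue-enriched", "group-enriched", "tissue-enhanced"}


@dataclass
class ProteinEvidence:
    hpa_colon_category: Optional[str]  # Human Protein Atlas annotation for colon, if any
    ibaq_colon: Optional[float]        # iBAQ value in colon tissue (None if absent)
    ibaq_liver: Optional[float]        # iBAQ value in liver tissue (None if absent)
    rna_colon: float                   # RNA expression level in colon
    rna_liver: float                   # RNA expression level in liver
    n_peptides: int                    # peptides identified for this protein


def is_hitchhiked(p, min_fold=100.0, min_peptides=5):
    """Apply the three selection criteria described in the text."""
    in_atlas = p.hpa_colon_category in COLON_CATEGORIES                  # criterion 1
    ibaq_colon_only = p.ibaq_colon is not None and p.ibaq_liver is None  # criterion 2
    rna_depleted = p.rna_colon > 0 and p.rna_liver * min_fold <= p.rna_colon  # criterion 3
    enough_peptides = p.n_peptides >= min_peptides                       # criterion 3
    return in_atlas and ibaq_colon_only and rna_depleted and enough_peptides
```

A protein must pass every criterion to be retained, which is why only a small subset of the colon-elevated candidates remains after filtering.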