Elucidation of proteostasis defects caused by osteogenesis imperfecta mutations in the collagen-α2(I) C-propeptide domain

Intracellular collagen assembly begins with the oxidative folding of ~30-kDa C-terminal propeptide (C-Pro) domains. Folded C-Pro domains then template the formation of triple helices between appropriate partner strands. Numerous C-Pro missense variants that disrupt or delay triple-helix formation are known to cause disease, but our understanding of the specific proteostasis defects introduced by these variants remains immature. Moreover, it is unclear whether recognition and quality control of misfolded C-Pro domains is mediated by recognizing stalled assembly of triple-helical domains or by direct engagement of the C-Pro itself. Here, we integrate biochemical and cellular approaches to illuminate the proteostasis defects associated with osteogenesis imperfecta-causing mutations within the collagen-α2(I) C-Pro domain. We first show that “C-Pro-only” constructs recapitulate key aspects of the behavior of full-length Colα2(I) constructs. Of the variants studied, perhaps the most severe assembly defects are associated with C1163R C-Proα2(I), which is incapable of forming stable trimers and is retained within cells. We find that the presence or absence of an unassembled triple-helical …

disulfide bond that covalently links the assembled monomers to each other (9,21). The cysteine residues are important for stability and were also recently shown to be critical for controlling proper heterotrimer formation (10).
These biochemical and structural insights provide a framework to understand how C-Pro mutations can induce collagenopathies. Currently, at least thirty missense mutations in either C-Proα1(I) or C-Proα2(I) are known to cause OI (22). Some of these mutations disrupt the key cysteine residues, whereas others cause defects for less obvious reasons. In early studies, it was observed that several variants delay secretion or cause collagen-I overmodification [e.g. C1299W, G1272V, and T1431I in C-Proα1(I) and D1315V and G1176V in C-Proα2(I)] (19,22,23). Others appeared to greatly reduce procollagen secretion and cause extensive formation of Colα1(I) homotrimers [e.g. C1163R in C-Proα2(I)] (19). More recently, studies of C-Proα(I) variants in primary patient cells confirmed that the resulting misfolding can cause aberrant collagen-I trafficking (18).
Despite this progress, much remains unknown regarding C-Pro proteostasis. Well-appreciated practical challenges associated with biochemical characterization of collagen folding, quality control, and chaperone interactions (24) make it difficult to characterize defective proteostasis at high resolution in the context of full-length collagen-I constructs containing disease-causing C-Pro mutations. Many variants have never been subjected to biochemical characterization. Moreover, it is unclear whether quality control mechanisms directly recognize misfolded C-Pro domains or instead require the presence of a full-length, triple-helix domain-containing collagen molecule stalled at the assembly or folding stage. Along these lines, considerable data are now available regarding how the ER proteostasis network engages full-length collagen-I (24,25), but which components of that network are specifically involved in C-Pro domain proteostasis remains unclear.
We sought to obtain a more comprehensive understanding of how collagen-I folding and assembly is disrupted by disease-causing C-Pro mutations and of how the cell responds to C-Pro misfolding. We focused on C-Proα2(I) variants because, unlike C-Proα1(I), only a single copy of C-Proα2(I) is included in any given assembled heterotrimer. This stoichiometry simplifies analysis, because collagen-I heterotrimers in autosomal dominant patients must then contain either one or no mutant C-Proα2(I) domains; there can be no cases of normal assemblies that contain both a mutant and a WT monomer. We created a set of full-length and C-Pro-only Colα2(I) constructs spanning a range of OI-causing C-Pro mutations (Figs. 1B-E), including mutations that add or delete Cys residues (e.g. Y1263C [22] and C1163R [19]) and mutations that alter charge or could otherwise be consequential for protein conformation/folding (e.g. P1182R [22] and G1176V [19]).
Figure 1. … [18]). The two α1(I) chains are colored light and dark pink, the α2(I) chain is colored cyan, and the Ca2+ ion is colored green. Locations of exemplary osteogenesis imperfecta (OI)-causing amino acid substitutions in C-Proα2(I) studied here are highlighted in yellow. B-E, close-up views of the locations of OI-causing mutations with the WT residue shown. Y1263C with the proximal disulfide bond (B), C1163R and its disulfide bond with residue C1195 (C), P1182R and the neighboring Ca2+ ion (D), and G1176V (E).
We began by showing that these C-Pro-only constructs recapitulate the behavior of the full-length collagen-I constructs, indicating that the presence versus the absence of unassembled or slowly assembling triple-helical domains has little impact on the trafficking of these proteins, which is instead driven mainly by the misbehaving C-Pro domain itself. Arguably, the most severe assembly defect was associated with C1163R C-Proα2(I), which was also the only variant completely retained in cells. We next applied a suite of biochemical and imaging approaches to compare the folding, trafficking, and quality control behavior of WT versus C1163R C-Proα2(I). Finally, we used MS-based proteomics to illuminate how the ER proteostasis network differentially engages normal versus misfolding collagen-I C-Pro domains. Our results provide fresh insights into the molecular determinants of collagen-I proteostasis and will inform continued efforts to resolve dysregulated collagen proteostasis in disease.

Expression and analysis of OI-causing C-Proα2(I) variants
We began by developing a panel of constructs encoding missense variants of Colα2(I) within the C-Pro domain that are known to cause autosomal-dominant OI (Figs. 1B-E). We chose Y1263C and C1163R C-Proα2(I) (Figs. 1B and C), because they either introduce an extra Cys residue or remove one of the Cys residues present in the WT protein (19,22). The intricate network of cysteine residues in the C-Pro domain assists both monomer folding and proper assembly of 2:1 Colα1(I):Colα2(I) heterotrimers (10,11,26). Disruption of this network via adding or removing a Cys residue might be expected to be particularly deleterious. We selected P1182R (Fig. 1D), because this mutation not only removes a conformationally constrained Pro residue but also introduces an extra positive charge within the Ca2+-binding region of the protein (22). Ca2+ binding was recently shown to be critical for dynamic, noncovalent assembly of C-Proα(I) heterotrimers. Finally, we employed the G1176V variant (Fig. 1E), a relatively more conservative mutation that was previously proposed to disrupt triple-helix formation (19). Overmodified Colα2(I) chains were observed in both the medium and cell layer of G1176V Colα2(I)-expressing primary patient fibroblasts (19).
We designed constructs in which full-length Colα2(I) variants were FLAG tagged at their N termini, whereas the corresponding WT Colα1(I) constructs were HA tagged (Fig. 2A). Given the lack of high-quality collagen-I antibodies, this approach enables robust differential detection of the various constructs. We previously showed that such N-terminal tags do not disrupt normal collagen-I assembly (24). We cotransfected WT or mutant full-length Colα2(I) plasmids along with the WT, full-length Colα1(I) plasmid in HEK293T cells. We selected HEK293T cells as an expression platform, because this cell line does not produce endogenous collagen-I, which could otherwise convolute our analyses, and because it was particularly convenient for our downstream studies focusing entirely on the C-Pro domain itself. As a control, cells cotransfected with Colα1(I) and RFP instead of Colα1(I) and Colα2(I) were included in each sample set.
Figure 2. … Full-length collagen is known to display anomalous mobility on SDS-PAGE gels (48). C, diagram showing Colα1(I) and Colα2(I) C-Pro-only constructs. D, immunoblots showing intracellular expression and secretion of WT C-Proα1(I) and WT or OI-causing mutants of C-Proα2(I) from the C-Pro-only constructs. HEK293T cells were transiently transfected with WT Colα1(I) and WT or mutant Colα2(I) full-length constructs or WT C-Proα1(I) and WT or mutant C-Proα2(I) constructs, as indicated. Cotransfection of Colα1(I) or C-Proα1(I) and RFP instead of Colα2(I) or C-Proα2(I) was used as a control.
We analyzed the resulting lysate and medium samples by immunoblotting of reducing SDS-PAGE gels. We observed that, just like full-length WT Colα1(I) and Colα2(I), all the full-length Colα2(I) OI variants were robustly expressed in cell lysates (Fig. 2B, top). In contrast, analysis of the corresponding medium samples revealed that, whereas all the other full-length Colα2(I) OI variants were secreted at high levels, full-length C1163R Colα2(I) was intracellularly retained (Fig. 2B, bottom). We note that the full-length WT Colα1(I) that was cotransfected with full-length C1163R Colα2(I) was secreted normally into the medium despite C1163R Colα2(I) retention, consistent with the notion that this mutation may prevent collagen-I heterotrimer assembly without interfering with Colα1(I) homotrimer production (19). Altogether, these observations are consistent with prior efforts to characterize collagen-I production by patient fibroblasts expressing the same variants.
We next asked whether these behaviors would be recapitulated by C-Pro-only constructs, which are much more amenable to detailed biochemical characterization. In particular, we questioned whether the presence or absence of a triple-helical domain was critical for the observed results. To test this possibility, we prepared C-Pro-only constructs lacking collagen-I triple-helical domains such that HA epitope-tagged C-Proα2(I) WT and OI variants could be cotransfected alongside WT FLAG epitope-tagged C-Proα1(I) (Fig. 2C). We included a 2×-XTEN linker (27) between the FLAG epitope tag and the WT C-Proα1(I) domain to improve separation on SDS-PAGE gels and thereby facilitate differential detection of C-Proα2(I) versus C-Proα1(I), which have very similar molecular weights. Upon cotransfection of these plasmids (and/or an RFP control), we observed strong similarity to the behavior of the full-length constructs. In particular, C1163R C-Proα2(I) was still intracellularly retained, whereas all the other variants were robustly secreted (Fig. 2D).

Characterization of folding and assembly defects induced by OI-causing C-Proα2(I) variants
A key advantage of C-Pro-only constructs is that they can be much more easily subjected to detailed biochemical analysis than full-length constructs. We next made use of this feature to understand, at higher resolution, the consequences of these OI-causing C-Proα2(I) mutations for collagen-I folding and assembly. We used immunoblotting of nonreducing SDS-PAGE gels to analyze C-Pro heterotrimer assemblies in media obtained from cells cotransfected with WT C-Proα1(I) and either WT or OI-causing C-Proα2(I) variants. We found that WT C-Proα2(I) and two of the secreted OI-causing variants (G1176V and Y1263C) were able to form disulfide-linked heterotrimers with C-Proα1(I) (see Fig. 3A for nonreducing gels and Fig. S1 for reducing gels). In contrast, the other secreted variant, P1182R, did not form disulfide-linked heterotrimers with C-Proα1(I). Interestingly, instead of forming heterotrimers with C-Proα1(I), P1182R C-Proα2(I) formed disulfide-linked homodimers and homooligomers that could be dissociated under reducing conditions and appeared to be well-defined rather than general protein aggregates. This observation is consistent with the notion that the P1182R amino acid substitution derails collagen-I proteostasis by interfering with Ca2+ binding (22), which we recently showed is a key early noncovalent assembly step on the pathway to formation of heterotrimers with C-Proα1(I) (10).
Figure 3. … Fig. S1. G1176V and Y1263C formed disulfide-linked heterotrimers with WT C-Proα1(I). P1182R C-Proα2(I) failed to assemble into disulfide-linked heterotrimers, instead forming homooligomers mediated by incorrect disulfide bonds. As was also observed in Fig. 2, C1163R C-Proα2(I) was not secreted. B, immunoblot assessing intracellular interactions between WT C-Proα1(I) and WT or C1163R C-Proα2(I). WT or C1163R C-Proα2(I) was immunoprecipitated from cell lysates using anti-HA beads. The blot was then probed for coimmunoprecipitation of WT C-Proα1(I). By this assay, WT C-Proα2(I) robustly and stably assembled with WT C-Proα1(I), whereas the C1163R C-Proα2(I) variant was unable to assemble with WT C-Proα1(I) to any significant extent. HEK293T cells were transiently transfected with WT C-Proα1(I) and WT or mutant C-Proα2(I) constructs, as indicated. Cotransfection of C-Proα1(I) and RFP instead of C-Proα2(I) was used as a control.
C1163R C-Proα2(I) was not secreted into the medium, so the analysis in Fig. 3A could not reveal whether this OI-causing variant assembles with C-Proα1(I). However, immunoprecipitation of intracellular WT versus C1163R C-Proα2(I) from cells also expressing WT C-Proα1(I) revealed that C1163R C-Proα2(I) was indeed assembly defective (Fig. 3B), failing to trimerize with C-Proα1(I) to any detectable extent.

OI-causing C-Proα2(I) variants do not cause acute ER stress
We next asked whether these OI-causing Colα2(I) collagen variants cause ER stress, as would be indicated by activation of the unfolded protein response (UPR). qPCR analysis revealed that transcript levels for the UPR targets (28-30) BiP, ERdj4, Grp94, CHOP, HYOU1, Sec24D, and Gadd34 were not significantly increased by expression of any of the full-length WT and OI variants of Colα2(I) studied, including the ER-retained C1163R variant and the aggregation-prone, misassembling P1182R variant (Fig. S2; thapsigargin treatment was used as a positive control for UPR activation). Similar results were also observed in cells transfected with the C-Pro-only versions of WT or OI-causing Colα2(I) (Fig. S3).
These observations are consistent with prior work indicating that collagen-I mutations, even severe disease-causing mutations, rarely induce prototypical ER stress (31-35). This general failure to induce a prototypical ER stress response raises the question of whether the ER proteostasis network is capable of differentially recognizing even very disruptive collagen-I variants.

C1163R C-Proα2(I) traffics abnormally and accumulates in the ER
We next sought to gain greater insight into the failed folding and assembly of the most severely defective of these C-Proα2(I) mutations, the C1163R variant. We first employed confocal microscopy to assess the trafficking of C1163R C-Proα2(I). We used antibodies against the ER marker protein-disulfide isomerase (PDI), the cis-Golgi marker GM130, and the lysosomal membrane marker LAMP1 to evaluate the localization of WT versus C1163R C-Proα2(I) upon cotransfection with WT C-Proα1(I) in HEK293T cells. A mAb against C-Proα1(I) was also used to track the subcellular location of that strand.
Consistent with its robust secretion, we observed that WT C-Proα2(I) trafficked extensively to the Golgi apparatus and was only weakly detected in the ER at steady state, as shown by its minimal colocalization with the ER marker PDI versus its extensive colocalization with the cis-Golgi marker GM130 (Fig. 4A). In addition, the majority of WT C-Proα2(I) strands were found to colocalize with WT C-Proα1(I), consistent with successful heterotrimer assembly. This intracellular trafficking pattern is commonly observed for endogenous full-length, WT collagen-I (36-39).
In contrast, we observed that C1163R C-Proα2(I) was mostly retained in the ER, with only a very small proportion of this variant observed in the Golgi (Fig. 4B). These data are consistent with recognition of misfolded C1163R C-Proα2(I) by the ER quality control machinery, leading to strong intracellular retention. Notably, only a small proportion of the C1163R C-Proα2(I) colocalized with WT C-Proα1(I), likely reflecting the inability of this OI-causing Colα2(I) variant to form a heterotrimer with its WT Colα1(I) counterpart. This result is also consistent with our finding (Fig. 2D) that expression of the C1163R C-Proα2(I) variant did not prevent normal trafficking of WT C-Proα1(I).
We next used the lysosomal membrane marker LAMP1 to assess whether C1163R C-Proα2(I) was targeted for degradation through the lysosomal pathway. We observed that only a small (and similar) proportion of either WT or C1163R C-Proα2(I) colocalized with LAMP1-positive vesicles, suggesting that autophagy is not extensively involved in the degradation of accumulated, misfolded C-Proα2(I) polypeptides (Fig. S4).

C1163R C-Pro variant fails to assemble with WT C-Proα1(I) and is instead targeted for ER-associated degradation
We turned to pulse-chase analysis to better understand the fate of intracellularly retained C1163R C-Proα2(I). We began by cotransfecting WT C-Proα1(I) along with WT C-Proα2(I). Transfected cells were then metabolically labeled with a brief pulse of 35S-Cys/Met-containing media, followed by a chase period in unlabeled media (Fig. 5A). RFP-transfected samples were included with each sample set to control for background signal during the subsequent analysis. We harvested cell lysate and medium samples for analysis every 30 min. To enrich the HA epitope-tagged C-Proα2(I), each sample was then immunoprecipitated using HA antibody-conjugated agarose beads. Eluted medium samples were digested with a combination of enzymes (PNGase-F and O-glycosidase) to remove posttranslational modifications on the C-Pro domains that would otherwise complicate SDS-PAGE separation and downstream analysis.
Analysis of the resulting autoradiographs revealed that the WT C-Proα2(I) signal steadily decreased in the lysate over time, whereas the corresponding medium signal increased, consistent with a normal secretion time course (Fig. 5B). WT C-Proα1(I) that was coimmunoprecipitated with C-Proα2(I) followed a similar trend in both lysate and medium over time. Treatment with the proteasome inhibitor MG-132 indicated little to no proteasomal degradation of WT C-Proα(I) during the experiment.
In contrast, pulse-chase analysis of C1163R C-Proα2(I) cotransfected with WT C-Proα1(I) led to detection of protein only in the cell lysates (Fig. 5C). To ensure that the lack of signal in the medium was not simply because of a secretion delay, we extended the time points further and still observed no medium signal even after six hours. Interestingly, the corresponding lysate samples indicated that most of the C1163R C-Proα2(I) was cleared from cells within six hours despite not being secreted. We hypothesized that the intracellularly retained C1163R C-Proα2(I) variant might be directed to ER-associated degradation (ERAD). Indeed, treatment with 15 µM MG-132 at the latest time point (6 h) rescued a substantial fraction of the intracellular signal, consistent with clearance of C1163R C-Proα2(I) by ERAD (Fig. 5D). Notably, C-Proα1(I) was never detectably coimmunoprecipitated with C1163R C-Proα2(I) during this experiment, further confirming that this variant of C-Proα2(I) failed to assemble with C-Proα1(I) throughout its lifetime in the cell.

MS-based interactomics reveal how the ER proteostasis network differentially engages WT versus C1163R C-Proα2(I)
We next sought to understand how the ER proteostasis network engages WT versus C1163R C-Proα2(I). We began by defining the interactome of the WT C-Pro, which has not previously been established. Cells were cotransfected with RFP and WT C-Proα1(I) to create a negative control or cotransfected with WT C-Proα1(I) and WT C-Proα2(I) for the experimental sample. Following our previously developed protocol for coimmunoprecipitation of full-length collagen-I and its interactors (24,40), we then briefly treated intact cells with the cell-permeable, lysine-reactive, reversible crosslinker dithiobis(succinimidyl propionate) to immortalize transient interactions between C-Proα2(I) and the proteostasis network. Samples were then immunoprecipitated using anti-HA agarose beads, followed by repeated washes and elution using a denaturing, nonreducing buffer. Eluted protein samples were then further processed for LC-MS/MS analysis (see Experimental procedures; two biological replicates of both the negative control and experimental sample were analyzed, with an additional technical replicate of the experimental sample). The resulting spectral count data for all identified proteins in these samples are provided in Table S1.
Figure 4. … with the ER luminal chaperone PDI but strong colocalization with the cis-Golgi matrix protein GM130 and WT C-Proα1(I). B, confocal immunofluorescence microscopy showed extensive colocalization of the C1163R C-Proα2(I) variant with the PDI ER marker but limited colocalization with WT C-Proα1(I) or the cis-Golgi marker GM130. HEK293T cells were transiently transfected with WT C-Proα1(I) and WT or mutant C-Proα2(I) constructs, as indicated, prior to fixing, staining with the indicated antibodies, and preparation of slides for confocal imaging. In all images, insets represent selected fields magnified 11.01× as well as their overlays. Scale bar, 20 µm.
In any given experimental replicate, 15-27% of the tryptic peptides identified belonged to the 247-residue WT C-Proα2(I) protein, indicating that the bait was strongly enriched. A total of 110 proteins were identified across all negative-control and experimental samples. In Table 1, we present the results for all the high-confidence WT C-Proα2(I) interactors, as defined by meeting the following criteria: 1) absent from one or both of the negative-control replicates; 2) at least two spectral counts in at least one experimental sample; and 3) presence in at least two independent biological replicates of the experimental sample.
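The three criteria above amount to a simple per-protein filter on spectral counts. The following Python is a minimal sketch of that logic, not the analysis code used in this study: the function name `is_high_confidence`, the argument layout, and the example counts are all hypothetical, and each experimental sample is assumed to be labeled with the biological replicate it derives from so that a technical replicate does not count as an independent observation.

```python
from typing import Sequence


def is_high_confidence(
    control_counts: Sequence[int],
    experimental_counts: Sequence[int],
    bio_replicate_ids: Sequence[int],
) -> bool:
    """Apply the three stated criteria to one protein (hypothetical helper).

    control_counts: spectral counts in each negative-control replicate.
    experimental_counts: spectral counts in each experimental sample,
        including any technical replicates.
    bio_replicate_ids: biological replicate label for each experimental sample.
    """
    if len(experimental_counts) != len(bio_replicate_ids):
        raise ValueError("one replicate label is needed per experimental sample")

    # Criterion 1: absent from one or both negative-control replicates.
    absent_from_a_control = any(count == 0 for count in control_counts)

    # Criterion 2: at least two spectral counts in at least one experimental sample.
    enough_counts = any(count >= 2 for count in experimental_counts)

    # Criterion 3: detected in at least two independent biological replicates.
    detected_replicates = {
        rep_id
        for count, rep_id in zip(experimental_counts, bio_replicate_ids)
        if count > 0
    }

    return absent_from_a_control and enough_counts and len(detected_replicates) >= 2
```

For example, with two biological replicates and one technical replicate of the second (labels `[1, 2, 2]`), a protein with control counts `[0, 0]` and experimental counts `[5, 3, 4]` passes, whereas one detected only in the two samples derived from biological replicate 2 does not.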
A number of features indicate the reliability of the resulting WT C-Proα2(I) interactome. First, based on semiquantitative spectral counting, the most abundant C-Proα2(I) interactor is C-Proα1(I). Second, 75% of the identified interactors are known to be localized to the secretory pathway. Third, other than C-Proα1(I), virtually all the secretory pathway interactors are components of the ER proteostasis network and, based on their known functions, are likely to interact with and assist the folding of this protein. Fourth, most of the putative interactors were previously shown to interact with full-length collagen-I (24). Further, only a subset of the full-length collagen-I interactors were identified. Notably missing are the peptidyl prolyl isomerases, the triple-helix-specific chaperone HSP47, and a vast array of triple-helix-modifying enzymes. The absence of these proteins from the C-Pro-only interactome should be expected, as those interactors specifically support triple-helix maturation. Instead, for C-Proα2(I) we observe engagement by PDIs, Hsp40/70/90 chaperones that typically assist folding of globular proteins, and components of the ER's lectin-based proteostasis network. Because the C-Pro domain is a globular, disulfide-rich, N-glycosylated protein, its folding would reasonably depend specifically on assistance from these components of the ER proteostasis network.
With the first analysis of how the ER proteostasis network engages the collagen C-Pro domain in hand, we turned our attention to the misfolding, OI-causing C1163R C-Proα2(I) variant. Given that we had observed intracellular retention of C1163R C-Proα2(I) and targeting to ERAD, we hypothesized that a comparative analysis of the WT versus C1163R C-Proα2(I) interactomes would reveal differential engagement of the disease-causing variant by the ER proteostasis network.
Samples from cells cotransfected with WT C-Proα1(I) and WT C-Proα2(I) or cotransfected with WT C-Proα1(I) and C1163R C-Proα2(I) (two biological replicates and one additional technical replicate of each sample) were immunoprecipitated and processed by following the same procedure as that used for establishing the WT C-Proα2(I) interactome. Following LC-MS/MS analysis, protein identification, and quantitation of spectral counts (see Table S2 for the complete data set), we set out to establish the differences in these two interactomes.
In Table 2, we present the results for proteins that interact differentially with WT versus C1163R C-Proα2(I) based on the following criteria: 1) absence from both the negative-control samples from Table S1; 2) >2-fold enrichment in bait-normalized spectral counts in at least two biological replicates of either the WT or C1163R C-Proα2(I) samples; and 3) at least two spectral counts present in at least one of the enriched samples.
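These differential criteria can likewise be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to generate Table 2: the `Replicate` container, the function names, the pairing of WT and C1163R samples within a replicate, the specific normalization (scaling C1163R prey counts by the ratio of WT to C1163R bait counts in that replicate), and the handling of zero counts are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replicate:
    """Spectral counts for one protein in one paired replicate (hypothetical layout)."""
    wt_prey: int    # prey counts in the WT C-Pro alpha2(I) pulldown
    mut_prey: int   # prey counts in the C1163R pulldown
    wt_bait: int    # bait counts in the WT pulldown
    mut_bait: int   # bait counts in the C1163R pulldown


def fold_enrichment(rep: Replicate) -> float:
    """Bait-normalized C1163R/WT ratio for one replicate (assumed normalization)."""
    if rep.mut_bait <= 0:
        raise ValueError("bait must be detected to normalize")
    mut_normalized = rep.mut_prey * rep.wt_bait / rep.mut_bait
    if rep.wt_prey == 0:
        # Assumption: absent from WT but present with C1163R counts as enriched.
        return math.inf if mut_normalized > 0 else 1.0
    return mut_normalized / rep.wt_prey


def classify(
    control_counts: Sequence[int],
    replicates: Sequence[Replicate],
    threshold: float = 2.0,
) -> Optional[str]:
    """Return "C1163R", "WT", or None for a non-differential protein."""
    # Criterion 1: absent from both negative-control samples.
    if any(count > 0 for count in control_counts):
        return None

    folds = [fold_enrichment(rep) for rep in replicates]
    mut_enriched = [r for r, f in zip(replicates, folds) if f > threshold]
    wt_enriched = [r for r, f in zip(replicates, folds) if f < 1.0 / threshold]

    # Criteria 2 and 3: >2-fold in at least two replicates, with at least two
    # raw spectral counts in at least one of the enriched samples.
    if len(mut_enriched) >= 2 and any(r.mut_prey >= 2 for r in mut_enriched):
        return "C1163R"
    if len(wt_enriched) >= 2 and any(r.wt_prey >= 2 for r in wt_enriched):
        return "WT"
    return None
```

With made-up counts, a chaperone seen at 2 and 3 counts with WT but 12 and 10 counts with C1163R (at comparable bait levels) would be classified as `"C1163R"`, whereas a partner seen at 50 and 40 counts with WT but 1 and 0 counts with C1163R would be classified as `"WT"`.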
Three key features of this comparative analysis support the reliability of the results. First, >85% of the proteins shown to interact more with C1163R C-Proα2(I) than with WT C-Proα2(I) are known to be localized to the secretory pathway. Second, as expected based on our results in Fig. 3B, the comparative proteomic analysis shows that C-Proα1(I) interacts much more strongly with WT than with C1163R C-Proα2(I). Third, we used immunoprecipitation and Western blotting to corroborate several of the proteins identified as preferentially engaging C1163R C-Proα2(I). All three (PDIA3, PDIA6, and BiP) were shown to interact with C1163R more than with WT C-Proα2(I) (Fig. 6).
The results (Table 2) provide a compelling picture of greatly enhanced engagement of the misfolding C1163R C-Proα2(I) variant by the ER proteostasis network. Of particular note, the ER's HSP40/70/90 and lectin-based chaperone and quality control systems differentially recognize the misfolding variant. Further, the PDI machinery interacts much more extensively with C1163R C-Proα2(I). This observation is consistent with the notion that the loss of a key cysteine disrupts the C-Pro disulfide network in a particularly deleterious manner. The outcome of this enhanced proteostasis network engagement is the complete intracellular retention coupled with effective quality control of C1163R C-Proα2(I).
Table 2 footnotes. a, see Table S2 for complete, unfiltered data set. The common contaminants ribonucleoprotein/ribosomal proteins, histones, tubulin, and keratin are not included in this table. b, spectral counts shown for C1163R C-Proα2(I) interactors are normalized based on the amount of C1163R versus WT C-Proα2(I) spectral counts observed in a given replicate. c, spectral counts shown for C-Proα2(I) are not normalized.

Discussion
The critical first step in the production of functional fibrillar collagens is the folding and proper assembly of C-Pro domains (41). However, the proteostasis network mechanisms that specifically engage the C-Pro domain, the molecular consequences of disease-causing C-Pro mutations, and how the cell recognizes and responds to misfolded C-Pro domains have all remained ill-defined (28). Here, we studied the folding, misfolding, processing, and interactome of collagen-I C-Pro domains in cells using a biochemically amenable system in which C-Pro domains were expressed in the absence of their associated triple-helical domains. This C-Pro-only system provides a valuable context to elucidate how selected OI-causing C-Pro domain mutations disrupt collagen-I assembly and proteostasis.
We first showed that the behavior of WT versus mutant C-Pro-only constructs recapitulated the processing of the corresponding full-length collagen-I constructs. Comparison of the medium and lysate samples of mutant C-Pro constructs with the corresponding full-length constructs confirmed similar secretion and retention patterns. It is noteworthy that, whereas C-Pro domain mutations have been shown to be targeted to degradation pathways, it was previously unclear whether the quality control system directly recognizes and engages the misfolded C-Pro domain itself or rather recognizes unfolded triple-helical domains whose assembly is stalled by the presence of a misfolded C-Pro (42,43). Our data indicate that, at least for the mutations studied here, the presence of a triple-helical domain is not an important driver of the behavior of the mutant collagens. Misfolded C1163R C-Proα2(I) is still retained in the absence of the triple-helical domain, whereas misfolded P1182R Colα2(I) is still secreted when the triple-helical domain is present. This observation suggests that cells have limited capacity to specifically recognize triple-helix domain defects.
We characterized the consequences of four OI-causing C-Proα2(I) mutations reported to result in type I OI: specifically, mutations resulting in the G1176V, Y1263C, P1182R, and C1163R amino acid substitutions. The G1176V and Y1263C C-Proα2(I) variants formed disulfide-linked heterotrimers with C-Proα1(I) and did not significantly impact collagen-I secretion. These results are consistent with prior observations that these variants only modestly delay collagen-I production in patient cells and result in a mild OI phenotype (19,22).
On the other hand, the P1182R and C1163R C-Proα2(I) variants caused much more substantive issues from the perspective of collagen proteostasis. P1182R C-Proα2(I) was secreted at high levels but did not detectably assemble with C-Proα1(I) at all. Stable formation of collagen-I heterotrimers depends on dynamic noncovalent assembly of various trimeric species in a Ca2+-mediated process, followed by covalent immortalization of appropriately assembled heterotrimers using interstrand disulfide bonds (10). Disruption of Ca2+ binding, which is a plausible consequence of positioning a positively charged arginine residue near the Ca2+-binding site (Fig. 1D), therefore would be expected to prevent the assembly of disulfide-linked heterotrimers. Interestingly, we found that this mutation resulted in the exclusive formation of what clearly must be badly misassembled, disulfide-linked homodimers and homooligomers of P1182R C-Proα2(I). This defect did not prevent secretion of P1182R C-Proα2(I), consistent with observations by us here and others elsewhere that the P1182R substitution does not substantially impede the secretion of full-length Colα2(I) (22). Intriguingly, the P1182R amino acid substitution causes type IV OI with moderate phenotypes rather than causing a severe form of OI (22), despite the severe misassembly into homooligomers. A likely explanation is twofold: 1) P1182R oligomers are well secreted, avoiding risks such as severe ER stress or cellular dysfunction, and 2) because P1182R does not significantly assemble with Colα1(I), the simultaneous presence of WT Colα2(I) in patient cells enables the formation of normal collagen-I heterotrimers; P1182R-driven homooligomers may not be stable or extensively deposited into extracellular matrices, although they may still be disruptive.
The C1163R C-Proα2(I) amino acid substitution proved similarly problematic from a protein-folding and assembly perspective. The loss of C1163 disrupts a critical intrachain disulfide bond in C-Proα2(I). We found that the resulting protein was recognized as misfolded by the ER proteostasis network, and secretion was entirely prevented. Notably, the disease phenotype caused by the C1163R C-Proα2(I) mutation is a quite mild type I OI (19), despite the severe folding defect. Our data suggest three factors that contribute. 1) Very little, or none, of the C1163R C-Proα2(I) variant escapes quality control to be secreted, avoiding extracellular matrix defects. 2) Unlike the case for many intracellularly retained triple-helical variants (32,42), the cell robustly directs retained C1163R to ERAD, avoiding extensive buildup in the ER and consequent ER dysfunction or stress. 3) Very little, or none, of the C1163R Colα2(I) assembles with Colα1(I), instead being quality controlled and degraded. Therefore, in heterozygous patients, WT Colα2(I) can still assemble normally with Colα1(I) to form heterotrimers. In this regard, it is notable that Colα1(I) homotrimers were also produced by C1163R mutant patient cells, likely reflecting the deficiency in Colα2(I) caused by the misfolding mutation (19). Our data also help to explain why the C1163R Colα2(I) mutation is not as deleterious from a pathology perspective as the P1182R Colα2(I) mutation. For example, P1182R patients suffer from bone fractures, whereas C1163R patients typically do not. The ER proteostasis network can prevent secretion of C1163R Colα2(I) and direct it to degradation. In contrast, P1182R Colα2(I) misassembles into homooligomers, escapes quality control, and is secreted into the extracellular matrix, where it may introduce some structural deficiencies.
Figure 6. Confirmation of MS-based interactome data showing how the ER proteostasis network differentially engages WT versus misfolded C-Proα2(I). Shown is an immunoblot assessing intracellular interactions of WT or C1163R C-Proα2(I), expanded to include additional interactors beyond the C-Proα1(I) shown in Fig. 3B. WT or C1163R C-Proα2(I) was immunoprecipitated from cell lysates using anti-HA beads. The blot was then probed for coimmunoprecipitation of BiP, PDIA3, and PDIA6, validating increased interaction of these ER proteostasis network components with the misfolding C1163R variant. Cotransfection of C-Proα1(I) and RFP instead of C-Proα2(I) was used as a control.
Intrigued by the efficient ER retention and clearance observed with the C1163R C-Proa2(I) variant, we turned to MS to establish how the ER proteostasis network engages WT C-Pro domains and to identify how the ER chaperone and quality control system specifically recognizes misfolding C1163R C-Proa2(I), retains it intracellularly, and directs it to ERAD. Our proteomic studies revealed increased interaction of C1163R C-Proa2(I), compared with WT, with the PDI family (e.g. PDIA3, PDIA4, and PDIA6), as well as with general ER chaperones (e.g. Hyou1, BiP, and ERdj3). The PDIs are known to promote proper disulfide bond formation and to engage clients with improperly assembled disulfides. The increased interaction of C1163R C-Proa2(I) with these proteostasis network components highlights the impact of disulfide bond network disruption. The increased interaction with the HSP40/70/90 system likely reflects the mechanism of targeting to ERAD.
In conclusion, these results highlight the potential of the biochemically amenable C-Pro-only system to provide molecular-level insights into how cells handle collagen proteostasis defects. The assembly, pulse-chase, and interactome experiments are all challenging to perform at this level of resolution using the full-length protein, because of the inherent difficulties of working with GC-rich and repetitive genes, performing MS-grade immunoprecipitations of collagen, and obtaining high-quality immunoblotting data. Thus, this approach sets the stage not just for fundamental understanding of collagen folding but also for future studies aimed at identifying whether and how we can resculpt cellular proteostasis networks to better address collagen folding defects (35). Modulation of the cellular proteostasis network is an emerging and promising strategy for targeting other protein misfolding-related diseases (44). Improving the proteostasis network's ability to identify and successfully fold mutant collagen-I strands, or to prevent their misfolding, could prove beneficial in therapeutic settings. Studies such as the one presented here, which characterize proteostasis defects in detail and identify how the proteostasis network attempts to address them, are a critical first step.

Experimental procedures
Plasmids
Plasmids encoding procollagen-a1(I) and procollagen-a2(I) with the preprotrypsin signal sequence upstream of an HA or FLAG epitope tag, respectively, were described previously (24). Plasmids (pcDNA3.1) encoding FLAG- and HA-tagged C-Proa1(I) and C-Proa2(I), respectively, were prepared by PCR amplifying the C-Pro domains spanning from the endogenous C-proteinase cleavage site (45) to the C terminus of procollagen using primers that incorporated the NotI and EcoRV restriction enzyme cut sites for insertion downstream of a preprotrypsin signal sequence and the indicated HA or FLAG tag. The FLAG-tagged C-Proa1(I) construct was then further modified to incorporate a 2×-XTEN linker by inserting annealed oligonucleotides into the NotI site. The additional amino acids introduced by this linker enabled separate detection of C-Proa1(I) versus C-Proa2(I) on autoradiography gels from pulse-chase experiments. OI variants were introduced by site-directed mutagenesis using the QuikChange II XL kit from Agilent Technologies and the primers listed in Table S3. Complete open reading frames for the WT constructs are provided in the supporting information.

Cell culture and transfections
HEK293T cells (ATCC) were cultured in Dulbecco's modification of Eagle's medium (DMEM; Corning) supplemented with 15% fetal bovine serum (FBS; Corning), 100 IU penicillin (Corning), 100 μg/ml streptomycin (Corning), and 2 mM L-glutamine (Corning) at 37°C in a humidified 5% CO2 atmosphere. Cells were regularly tested for mycoplasma contamination using the Agilent MycoSensor PCR assay kit. Transient transfections of full-length and C-Pro domain-encoding collagen-I and RFP-encoding plasmids were performed using Lipofectamine 3000 (Thermo Fisher Scientific). For all experiments, medium was changed to fresh DMEM supplemented with 50 μM L-ascorbate (Amresco) 24 h posttransfection. Media and lysates were harvested 24-48 h posttransfection for analysis. Cells were harvested, washed with 1× PBS, and then lysed at 4°C in a buffer containing 50 mM Tris-HCl at pH 7.5, 150 mM sodium chloride, 1 mM EDTA, 1.5 mM magnesium chloride, 1% Triton X-100 (Integra), and protease inhibitor tablets (Sigma). Each experiment was performed in biological triplicate.
Immunoprecipitations
At 24 h posttransfection with the indicated constructs in a 70% confluent 10-cm dish, HEK293T cells were washed with 1× PBS, diluted to 10 ml with 1× PBS, and treated with a final concentration of 100 μM dithiobis(succinimidyl propionate) (Thermo Fisher) for 30 min. The reaction was quenched by treatment with 1 ml of 1 M Tris buffer at pH 8.0 for 10 min. Cell pellets were collected by centrifugation and then lysed by treatment for 20 min with lysis buffer (see above) at 4°C. Cell debris was removed by centrifugation at 10,000 × g for 15 min. Protein concentration in the supernatant was quantified using a Bradford assay (Bio-Rad). For each sample, 1 mg of protein was combined with 60 μl of HA-antibody-conjugated agarose bead slurry (Sigma) and diluted to 1 ml in lysis buffer. Samples were mixed end-over-end at 4°C overnight and centrifuged at 1500 × g for 5 min at 4°C, and then the supernatant was removed and the beads were washed three times with lysis buffer. Next, the beads were boiled in 100 μl of 6% SDS in 300 mM Tris at pH 6.8 for 30 min to elute the proteins. After spinning samples at 2000 × g, the eluent was carefully separated from the beads for further analysis.
Samples were harvested at the indicated time points. Medium was collected, spun down at 1500 × g to remove debris, and then added to HA-antibody agarose beads (30 μl; Sigma). Cells were washed with 1× PBS and lysed for 20 min in lysis buffer (see above). Lysates were spun down at 10,000 × g for 15 min to remove cell debris, and then the supernatant was added to HA-antibody agarose beads. All immunoprecipitations were allowed to incubate on an end-over-end rotator at 4°C overnight. The following day, the supernatant was removed and the beads were washed three times with radioimmunoprecipitation assay buffer containing 50 mM Tris at pH 7.5, 150 mM sodium chloride, 1% Triton X-100, 0.5% DOC, 0.1% SDS, and 1 mM EDTA.
Immunoisolates were eluted by boiling in 6× Laemmli buffer lacking SDS for 15 min and then spun down to separate the eluent from the beads. WT medium samples were treated with a combination of PNGase-F, neuraminidase, and O-glycosidase (NEB), and digestions were performed according to the manufacturer's instructions. Samples were denatured by boiling in 6× Laemmli buffer (300 mM aqueous Tris at pH 6.8, 15% glycerol, 6% SDS, and 10% [w/v] bromphenol blue) supplemented with 167 mM 1,4-DTT. Eluted samples were separated on homemade 15% SDS-PAGE gels. Gels were dried, exposed to a phosphorimager plate, and then imaged on a Typhoon imager. Band intensities were quantified using ImageQuant TL (GE Healthcare). Experiments were performed in biological triplicate with standard deviation shown.
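As an illustration of how band intensities exported from an image quantification program might be summarized across biological triplicates, the following Python sketch computes a fraction-secreted value per replicate and reports the mean and sample standard deviation. The intensity values, the function names, and the definition of fraction secreted as medium / (medium + lysate) are assumptions made for this example and are not taken from the experiments described here.

```python
from statistics import mean, stdev


def fraction_secreted(medium, lysate):
    """Return medium / (medium + lysate) for one replicate (assumed metric)."""
    total = medium + lysate
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return medium / total


def summarize(replicates):
    """Mean and sample SD of fraction secreted over (medium, lysate) pairs."""
    fractions = [fraction_secreted(m, l) for m, l in replicates]
    return mean(fractions), stdev(fractions)


# Hypothetical band intensities (arbitrary units), three biological replicates.
example = [(600.0, 400.0), (500.0, 500.0), (700.0, 300.0)]
avg, sd = summarize(example)
print(f"fraction secreted = {avg:.2f} +/- {sd:.2f}")
```

The sample standard deviation (n - 1 denominator) is used here because three replicates are treated as a sample from a larger population; a different convention would change only the `stdev` call.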

Quantitative RT-PCR
Transfected cells were harvested and washed with PBS at 4°C. Total RNA was extracted using the Omega RNA purification kit. RNA concentrations were quantified and normalized to 1 μg total RNA for cDNA reverse transcription. cDNA was synthesized in a Bio-Rad Thermocycler using the Applied Biosystems reverse transcriptase cDNA kit. Kapa BioSystems SYBR Fast qPCR master mix, appropriate primers (Sigma Aldrich), and cDNA were used for amplification in a LightCycler 480 II real-time PCR instrument. Primer sequences are shown in Table S4. Primer integrity was assessed by thermal melt to ensure homogeneity. Transcripts were normalized to the housekeeping gene RPLP2. Standard deviation of n = 3 is shown in plots.
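Normalization to a housekeeping gene is commonly carried out with the comparative Ct (2^-ΔΔCt) calculation. The Python sketch below shows that calculation under the assumption of roughly 100% primer efficiency. The text above does not state which quantification model was used, so the method, the function name, and all Ct values here are illustrative assumptions.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by 2^-ddCt: target vs. reference gene, sample vs. control."""
    d_ct_sample = ct_target - ct_ref            # e.g. target minus RPLP2
    d_ct_control = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical Ct values: (target, reference) for three biological replicates.
samples = [(22.0, 18.0), (23.0, 18.0), (21.0, 18.0)]
control = (24.0, 18.0)  # hypothetical control sample (target, reference)

folds = [relative_expression(t, r, *control) for t, r in samples]
print(f"fold change = {mean(folds):.2f} +/- {stdev(folds):.2f} (n = {len(folds)})")
```

If primer efficiencies differ appreciably from 100%, an efficiency-corrected model would be more appropriate than the fixed base of 2 used in this sketch.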

Confocal microscopy
Transfected HEK293T cells (4 × 10^4 cells) suspended in complete DMEM were plated in a 24-well plate on poly-D-lysine-coated coverslips (Chemglass Life Sciences) and incubated for 48 h at 37°C in a humidified 5% CO2 atmosphere. Culture medium was removed and the coverslips were carefully washed with PBS. Cells were fixed with 4% formaldehyde (Mallinckrodt) for 3 h at 4°C and then permeabilized with 0.1% Triton X-100 in PBS for another 30 min at room temperature. Coverslips were then incubated for 1 h at room temperature in a blocking buffer containing 1% BSA in TBS at pH 7.5. Double labeling was performed by incubating coverslips overnight at 4°C in TBS (5% BSA, 0.01% sodium azide) containing mouse anti-HA (1:200; Abcam, ab9110) and then mouse anti-PDI (1:200; Abcam, ab2792), anti-LAMP1 (1:200; Abcam), rabbit anti-Col1A1 (1:200; Sigma, HPA008405), or mouse anti-GM130 (1:500; BD Biosciences, 610822). Secondary antibodies (1:1000), Alexa Fluor 488-conjugated anti-mouse (Invitrogen) or Alexa Fluor 568-conjugated anti-rabbit (Invitrogen), were then applied to the coverslips for 2 h at room temperature. Coverslips were rinsed at least 3× with TBS after each incubation. Nuclei were stained with DAPI (1 μM; Invitrogen) for 15 min at room temperature. After final washes (3× with PBS), the coverslips were mounted with ProLong (Thermo) to prevent photobleaching. Negative controls for nonspecific binding of the secondary antibodies, obtained by omitting primary antibodies from the staining protocol, were included for each experiment. Images were acquired on a Zeiss AxioVert200M microscope with a 63× oil immersion objective and a Yokogawa CSU-22 spinning disk confocal head with a Borealis modification (Spectral Applied Research/Andor) and a Hamamatsu ORCA-ER charge-coupled device camera. The MetaMorph software package (Molecular Devices) was used to control the hardware and image acquisition. The excitation lasers used to capture the images were 405 nm, 488 nm, and 561 nm.
Image processing was performed using ImageJ (NIH).

MS-based interactome analyses
Eluted C-Pro samples obtained from immunoprecipitations (as described above) were precipitated by vortexing with 450 μl of MeOH. 150 μl of CHCl3 was then added and the sample was vortexed again. Finally, 450 μl of water was added and samples were vortexed and then centrifuged at 10,000 × g for 3 min. The upper aqueous phase was carefully removed, whereas the white precipitate at the solvent interface was preserved. The collected precipitate was then washed 3× with 0.5 ml of MeOH. The washed pellet was dried using a SpeedVac and then resuspended in an aqueous solution containing 8 M urea, 50 mM ammonium bicarbonate, and 10 mM DTT. The samples were incubated in a 56°C water bath for 45 min, cooled for 2 min at room temperature, and then incubated with 55 mM iodoacetamide for 1 h in the dark. Samples were next incubated with 1 μg of sequencing-grade trypsin (Promega) overnight at room temperature. Proteolyzed samples were acidified to a final concentration of 5% formic acid and desalted on C18 stage tips. After eluting tryptic peptides from C18 stage tips, the samples were dried by SpeedVac and then resuspended in 0.1% formic acid.
Samples were injected onto an EASY-nLC 1000 nanopump system connected to a Thermo Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer and analyzed using the LC-MS/MS parameters specified in Tables S1 and S2. For database searching, the five samples used for the WT interactome analysis (Table S1) and the six samples used for the comparative interactome analysis (Table S2) were analyzed separately to obtain accurate statistics. Tandem mass spectra were extracted, and charge states were deconvoluted using Proteome Discoverer v2.3. Deisotoping was not performed. Samples were analyzed using Sequest (Thermo Fisher Scientific, San Jose, CA, USA; version IseNode in Proteome Discoverer 2.3.0.523). Sequest was used to search Uniprot_Human, updated 1 December 2019 with 20,533 entries, containing common contaminant proteins and assuming fully tryptic peptides with at most 2 missed cleavages. Sequest was searched with a fragment ion mass tolerance of 0.020 Da and a parent ion tolerance of 10.0 ppm. Carbamidomethylation of cysteine was specified in Sequest as a fixed modification. Oxidation of methionine and acetylation of the N terminus were specified in Sequest as variable modifications. For protein identification, Scaffold (Scaffold_4.10.0, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at >6.0% probability to achieve a false discovery rate of <1.0% by the Scaffold Local false discovery rate algorithm. Protein identifications were accepted if they could be established at >97.0% probability to achieve a false discovery rate of <1.0% and contained at least two identified peptides in at least one sample. Protein probabilities were assigned by the Protein Prophet algorithm (47). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
See Table S5 for detailed results of these analyses.
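The protein-level acceptance criteria stated above (probability greater than 97.0% and at least two identified peptides in at least one sample) can be expressed as a simple filter. The Python sketch below applies those two thresholds to a hypothetical list of identifications. The `ProteinID` record, its field names, and the example entries are assumptions for illustration and do not reflect Scaffold's actual export format.

```python
from dataclasses import dataclass


@dataclass
class ProteinID:
    accession: str             # hypothetical identifier
    probability: float         # protein probability, in percent
    peptides_per_sample: list  # identified peptide counts, one per sample


def accepted(protein, min_prob=97.0, min_peptides=2):
    """True if probability exceeds min_prob and any sample has enough peptides."""
    best = max(protein.peptides_per_sample, default=0)
    return protein.probability > min_prob and best >= min_peptides


def filter_ids(proteins):
    """Keep only the identifications that satisfy both criteria."""
    return [p for p in proteins if accepted(p)]


# Hypothetical identifications; accessions and values are made up.
candidates = [
    ProteinID("PROT_A", 99.0, [3, 0, 1]),  # passes both criteria
    ProteinID("PROT_B", 97.0, [5, 5, 5]),  # fails: probability not above 97.0
    ProteinID("PROT_C", 99.9, [1, 1, 1]),  # fails: never two peptides in a sample
]
print([p.accession for p in filter_ids(candidates)])
```

The strict inequality on probability mirrors the "greater than" wording of the criterion; the peptide count is checked per sample rather than summed, matching "in at least one sample".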

Data availability
All the proteomic data are available on MassIVE with accession number MSV000085340.