Structure of a collagen VI α3 chain VWA domain array: adaptability and functional implications of myopathy-causing mutations

Collagen VI is a ubiquitous heterotrimeric protein of the extracellular matrix (ECM) that plays an essential role in the proper maintenance of skeletal muscle. Mutations in collagen VI lead to a spectrum of congenital myopathies, from the mild Bethlem Myopathy to the severe Ullrich Congenital Muscular Dystrophy.


INTRODUCTION
Collagen VI is a ubiquitously expressed beaded microfibril forming collagen and serves to anchor large structures such as blood vessels and nerves to the surrounding ECM and to basement membranes (1)(2)(3)(4). It is often found associated with other ECM components, such as decorin and biglycan.
These link the collagen VI microfibrils to matrix adaptor proteins, such as matrilins (5). Collagen VI has been implicated in a wide range of physiological processes, from tumor growth and metastasis through to macrophage recruitment (6). Its role in the skeletal muscle is the best characterised, as defects in collagen VI manifest as congenital myopathies (7). The hallmark conditions are Bethlem Myopathy (BM) (8,9) at the milder end of the spectrum and Ullrich Congenital Muscular Dystrophy (UCMD) at the more severe end (10,11).
Collagen VI was long thought to be composed of only three distinct polypeptide chains, the shorter α1 and α2 chains and the longer α3 chain. However, three alternative long chains, α4, α5 and α6, were subsequently discovered and sometimes replace the α3 chain (12,13). In humans, the COL6A4 gene coding for the α4 chain is inactivated by a large pericentric inversion (12). Two features make collagen VI unique within the collagen protein family. First, it contains only a short collagenous domain and was therefore named "short chain collagen" in early studies (14,15). Indeed, collagen VI, in particular the long chains, mainly consists of tandem arrays of von Willebrand factor type A (VWA) domains (Fig. 1). These are well-known protein-protein interaction modules of approximately 200 amino acid residues. They adopt a Rossmann fold of a doubly wound, open twisted central β-sheet surrounded by six α-helices (16). For example, the six murine collagen VI chains contain 46 different VWA domains, and collagen VI is therefore the prototypical VWA domain containing multiprotein assembly. Second, even before secretion, collagen VI forms very large complexes of about the size of a ribosome. The α1, the α2 and one of the long chains (α3 in most tissues) form a heterotrimeric triple helical monomer (17). To form larger complexes, two monomers assemble in an antiparallel manner into a dimer by forming a triple helical, segmented, twisted supercoil (18). Two dimers then assemble laterally to form a tetramer. Several disulphide bridges stabilize the complex (1). The final secreted collagen VI tetramer, when made up of the α1α2α3 chains, contains up to 72 VWA domains. After secretion, intercalating, overlapping end-to-end association of tetramers leads to the formation of the beaded microfibril, a process which is accompanied by proteolytic processing of the globular C-terminal part of the α3 chain (19,20).
Due to its size and heterogeneity, structural studies of collagen VI at atomic resolution are challenging. A nanostructure of the bead regions from isolated α1α2α3 chain containing collagen VI microfibrils was determined using cryo-TEM (21). The beads have a compact hollow head region with four lobes, probably containing the C-terminal VWA domains from the three α chains. The head is connected by an intermediate region to two tail regions, most likely containing the N1 VWA domains of all three α chains. However, the tandem array of VWA domains from the N-terminus of the long α3 chain is lacking from the structure (21). Nevertheless, small angle X-ray scattering (SAXS) revealed a C-shaped form of the N-terminal VWA domain tandem array of the α3 chain (20), a shape also found for the homologous α4, α5 and α6 chains (22). Structural information at high resolution for the VWA domain array is lacking, even though the X-ray crystal structure of the single N5 VWA domain of the α3 chain revealed a C-terminal linker extension that allows the tandem arrangement of multiple VWA domains (23).
As much of collagen VI is made up of the N-terminal tandem VWA domain array of the α3 chain, insight into its structure is required to understand collagen VI self-assembly into microfibrils and integration into the extracellular matrix. Further, such information will help in studies aimed at revealing the pathomechanisms underpinning collagen VI myopathies. Although mutations which affect the triple helix are the most common (24), several mutations in the N- and C-terminal VWA domains have also been shown to cause disease (7), and the N2 VWA domain of the α3 chain in particular harbours several pathogenic point mutations (25)(26)(27)(28). Here, we determine the structure of the collagen VI α3 N2 domain by X-ray crystallography and study the effects of introducing BM and UCMD patient mutations (Fig. 1) on the secretion efficiency of the single domains in cell culture. We investigate longer double and quadruple arrays of α3 chain derived VWA domains (N5-N4 and N6-N3) by SAXS to assess the structural heterogeneity and the modular adaptability of the protein in solution. We also study the flexibility of N6-N3 by single particle EM.

RESULTS
The X-ray crystal structure of the collagen VI α3 N2 VWA domain reveals a pronounced structural conservation among N-terminal α3 chain VWA domains.
The N2 domain located within the N-terminal region of the α3 chain carries several myopathy-causing mutations. To date, the location of the affected amino acid residues could only be estimated by homology modelling (29), and conclusions on the consequences of mutations for the pathomechanism lack an experimentally determined structural basis. Therefore, the X-ray crystal structure of the human collagen VI α3 N2 domain was determined at a resolution of approximately 2.3 Å, refined to Rwork/Rfree of 25.8/20.0% (Table 1). The data collected were anisotropic, with resolution estimates between 2.20 and 2.58 Å along different axes. This explains the relatively high Rmerge values in the outer shell (2.27-2.20 Å). The asymmetric unit contains two independent copies of the VWA domain, which are virtually identical (all-atom RMSD of 0.23 Å) even in the loop sections, which may indicate a certain structural rigidity. The structure of the human collagen VI α3 N2 domain (Fig. 2A) is typical of the VWA domain family, adopting an alternating α/β Rossmann fold made up of a central twisted core of five parallel (β1, β2, β4, β5, β6) and one anti-parallel (β3) β-strands surrounded by six α-helices. The sequence of the N2 domain does not indicate a metal ion-dependent adhesion site (MIDAS) motif, which is often present in VWA domains (30). Overall, the structure of the N2 domain is very similar to that of the α3 N5 domain (PDB 4IGI (23)) (Fig. 2B), with a Cα RMSD of 0.90 Å, even though the domains share only 31% sequence identity. Remnants of the C-terminal linker (as seen in Protein Data Bank entry 4IGI), which would otherwise connect to the N1 domain, were not resolved, suggesting a level of structural heterogeneity in this region of the protein.
Besides the unresolved C-terminus, the major difference between the two structures is the short loop/turn in the N2 domain between β2 and α2 with the sequence 1670YEDGD1674. This is partly explained by the difference in primary structure in the N5 domain (1055LDVGPD1060), which is slightly longer and seems to allow different hydrogen bonding with Lys1636 at the N-terminus and Lys1699 in the α2 helix of the N2 domain (Fig. 2C).
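The RMSD values quoted above compare equivalent atoms of two structures after optimal rigid-body superposition. The following is a minimal sketch of such a calculation using the Kabsch algorithm; it is an illustration only and not the software or procedure used in this study. The function name kabsch_rmsd is hypothetical, and the sketch assumes that the two coordinate arrays already contain matched atoms (for example Cα atoms of structurally aligned residues) in the same order.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumption: row i of P and row i of Q are equivalent atoms.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translational component by centring on the centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and compute the root-mean-square deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Applied to two copies of the same coordinates that differ only by a rotation and a translation, the function returns a value close to zero; any remaining deviation reflects genuine structural differences between the matched atoms.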
N2 VWA domains harbouring BM and UCMD mutations exhibit different self-association propensities and cellular secretion levels.
The X-ray crystal structure of the N2 domain allows exact mapping of pathogenic mutations (Fig. 3A) and thereby gives clues to pathomechanisms. The mutations R1660C, D1674N, G1679E and L1726R were selected from the literature or from the Leiden Open Variation Database (LOVD, https://www.lovd.nl/), using the criteria that they were i) likely pathogenic and/or ii) had supporting data from patient material indicating pathogenicity (25)(26)(27)(28). The mutations cover a spectrum of prototypic locations within a VWA domain. G1679E is located in the central hydrophobic β sheet, L1726R in an α helix facing the hydrophobic core, and R1660C and D1674N are surface exposed in an α helix and a loop, respectively. As the recombinant expression of heterotrimeric collagen VI has not yet been achieved, only the mutated single VWA domains were expressed in HEK-293 EBNA cells to study their impact on expression and secretion (Fig. 3B, C).
The wild-type N2 domain is expressed and secreted and is well-resolved by SDS-PAGE under non-reducing (Fig. 3B) and reducing (Fig. 3C) (Fig. 3C). The R1660C mutant shows qualitatively more of this dimer compared to D1674N and the native N2, which shows only trace levels. The surface exposed location of the R1660C mutation likely increases the propensity of the domain to form disulphide bonded dimers under oxidizing conditions. However, the dimerization observed in D1674N and the trace dimerization in the native N2 domain can only be explained by the formation of a disulphide bond between two unpaired cysteine residues (C1825) located within the unstructured C-terminal tail. Strikingly, the G1679E (lane 4) and L1726R (lane 5) mutant domains are expressed but retained within the cell, as shown by the strong band in the cell lysate and the lack of detectable material in the supernatant (Fig. 3B, C).
The double VWA domain construct N5-N4 samples a definable set of structural states.
Although X-ray structures of single VWA domains may help to understand their particular function, e.g. by identifying binding surfaces, information about how VWA domains are connected and arranged relative to each other in space is also needed. Therefore, attempts at crystallizing tandem double domains were made. The variant N5-N4 was chosen as a prototype because the N5 domain, including parts of the linker to N4, has already been crystallized (23). However, the attempts to crystallize N5-N4 were unsuccessful. Therefore, in-line size exclusion chromatography SAXS (SEC-SAXS) (31) combined with parallel multi-angle laser light scattering (MALLS) was used to obtain the structural parameters of N5-N4, including the molecular weight (MW), radius of gyration (Rg), maximum particle dimension (Dmax), and probable real-space atom-pair distance distribution, or p(r) profile. The SAXS results are summarized in Figures 4 and 5 and in Supporting information, Figures 1 and 2. The majority of the protein sample elutes from the gel filtration column as a monomer with an average MW of 43 kDa (expected MW from the N5-N4 amino acid sequence = 44.3 kDa) and an Rg of 2.90 ± 0.1 nm (Fig. 4A). The resulting p(r) profile yields a highly skewed distribution of real-space distances, indicating a rather elongated particle. The profile extends to a maximum dimension Dmax of ca. 10 nm and is dominated by a maximum at r = 2.6 nm and a well-defined secondary shoulder at 5 nm (Fig. 4B). The former feature points to the average cross-section of the particle. The latter feature is likely to be caused by contributions from scattering pair distances arising from well-defined, spatially separated domains. Low-resolution models of the protein in solution were calculated using ab initio modelling. The DAMMIN (32) ab initio model of the N5-N4 domain pair and the corresponding fit (χ² = 1.03; CorMap (33) p = 0.85) are shown in Fig. 4C.
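The quality of a model fit to SAXS data is reported here as a χ² value, where values near 1 indicate that the model reproduces the data within the experimental errors. As a minimal sketch, assuming the commonly used reduced form χ² = 1/(N-1) Σ[(Iexp - c·Icalc)/σ]² with a least-squares scale factor c, the discrepancy can be computed as follows. The function name scaled_chi2 is hypothetical, and the exact normalisation applied by the modelling programs cited in this study is an assumption of this example.

```python
def scaled_chi2(i_exp, sigma, i_calc):
    """Reduced chi-square between experimental and model intensities.

    i_exp, sigma and i_calc are equal-length sequences sampled at the same
    q values. The model curve is scaled by the least-squares factor c before
    the discrepancy is evaluated. Returns (chi2, c).
    """
    n = len(i_exp)
    if n < 2 or len(sigma) != n or len(i_calc) != n:
        raise ValueError("inputs must have equal length of at least 2")
    if any(s <= 0 for s in sigma):
        raise ValueError("all experimental errors must be positive")

    # Least-squares scale factor that minimises the weighted residuals.
    num = sum(e * m / s**2 for e, s, m in zip(i_exp, sigma, i_calc))
    den = sum(m * m / s**2 for s, m in zip(sigma, i_calc))
    if den == 0:
        raise ValueError("model intensities must not all be zero")
    c = num / den

    # Weighted sum of squared residuals, normalised by N - 1.
    chi2 = sum(((e - c * m) / s) ** 2
               for e, s, m in zip(i_exp, sigma, i_calc)) / (n - 1)
    return chi2, c
```

A model curve that differs from the data only by a constant factor gives χ² = 0, whereas systematic deviations larger than the errors give values well above 1.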
It appears that the N5-N4 monomer forms an extended dumbbell with two compact globular lobes. Overall, the double domain construct has dimensions of approximately 4 x 4 x 9.5 nm, while each lobe is approximately 4 x 4 x 4 nm, so that each is of sufficient volume and shape to accommodate a single compact VWA domain (Fig. 4C). The X-ray crystal structure of the N5 domain has been solved (PDB: 4IGI); however, a high-resolution atomistic model of the N4 domain is not available. Therefore, a homology model of N5-N4 was generated with I-TASSER (34) using the amino acid sequence of N5-N4 as input (Supporting information, Fig. 2). The top-scoring homology model generated by the I-TASSER calculations has structural similarities to the double VWA domain of the proximal thread matrix protein 1 (PTMP1) isolated from a mussel (PDB: 4CN8) (35) (Fig. 5). The N5-N4 I-TASSER homology model does not agree with the SAXS data (χ² = 18, CorMap p = 0, Supporting information, Fig. 2). Of note, the Rg of the I-TASSER model (2.4 nm) is far too small compared to that obtained from the experimental SAXS data (2.9 nm), indicating that the VWA domains of the predicted model are not sufficiently separated. To obtain a model of N5-N4 that fits the SAXS data, the X-ray crystal structure of N5 was combined with the I-TASSER-built N4 domain, and a dummy amino acid linker was generated to connect the two domains. Using the program BUNCH (36), the spatial positioning of the N5 and N4 VWA domains was refined against the SAXS data together with optimization of the linker conformation (Fig. 5). BUNCH was run several times (at least 10) to produce a cohort of rigid-body refined atomistic structures of N5-N4 that fit the SAXS data (χ² = 1.0, CorMap p = 0.8; Fig. 5A). The separation of the N5 and N4 domains in the refined BUNCH model agrees with the volume (mass) distribution of the ab initio dummy atom bead model.
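The link between a too-small Rg and insufficiently separated domains can be made explicit with the parallel-axis relation for two rigid bodies, Rg² = f1·Rg1² + f2·Rg2² + f1·f2·d², where f1 and f2 are the mass fractions and d is the distance between the domain centres of mass. The sketch below is a simplified illustration that ignores the linker and the hydration shell; the function names and the single-domain Rg of 1.6 nm used in the usage note are assumptions of this example, not values from this study.

```python
import math


def two_domain_rg(rg1, rg2, separation, mass_fraction1=0.5):
    """Rg of two rigid domains whose centres of mass are `separation` apart."""
    if not 0.0 < mass_fraction1 < 1.0:
        raise ValueError("mass_fraction1 must lie strictly between 0 and 1")
    f1 = mass_fraction1
    f2 = 1.0 - f1
    return math.sqrt(f1 * rg1**2 + f2 * rg2**2 + f1 * f2 * separation**2)


def separation_from_rg(rg_total, rg1, rg2, mass_fraction1=0.5):
    """Invert the parallel-axis relation to estimate the domain separation."""
    if not 0.0 < mass_fraction1 < 1.0:
        raise ValueError("mass_fraction1 must lie strictly between 0 and 1")
    f1 = mass_fraction1
    f2 = 1.0 - f1
    d_squared = (rg_total**2 - f1 * rg1**2 - f2 * rg2**2) / (f1 * f2)
    if d_squared < 0:
        raise ValueError("rg_total is smaller than the domains alone allow")
    return math.sqrt(d_squared)


# Hypothetical single-domain Rg (nm); an assumption for illustration only.
DOMAIN_RG = 1.6
d_model = separation_from_rg(2.4, DOMAIN_RG, DOMAIN_RG)   # homology model Rg
d_saxs = separation_from_rg(2.9, DOMAIN_RG, DOMAIN_RG)    # experimental Rg
```

Under these assumptions, and with two domains of equal mass, the homology model corresponds to a centre-to-centre distance of about 3.6 nm and the experimental Rg to about 4.8 nm, which illustrates why the compact predicted model cannot reproduce the scattering data.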
The linker connecting the N5 and N4 domains is not hyper-extended, but does allow the N5 and N4 domains to be distinctly separated from each other (Fig. 5D). In addition, the general positioning of the N5 and N4 domains across the BUNCH model cohort is consistent, suggesting that the protein does not sample highly diverse structural states, although it cannot be excluded that the domains may undergo a 'twisting' motion relative to the long axis of the protein (Fig. 5A middle). An additional investigation to assess the structural heterogeneity of N5-N4 was performed using the ensemble optimization method, EOM (37,38). The results of the EOM analysis are shown in Figure 5B, C, D. Briefly, EOM takes the atomistic structures of the N5 and N4 domains and treats these as rigid bodies, while the linker between the domains (Fig. 8D) is treated as a flexible random chain. EOM then generates a cohort of 10 000 structures incorporating the random linker conformations and calculates the resulting Rg and Dmax distributions of this initial pool of structures. Using a genetic algorithm, EOM finds the best sub-set of ensemble states to represent the experimental SAXS profile. Therefore, by comparing the Rg and Dmax distributions of the refined pool that fits the SAXS data with those of the initial pool, it is possible to assess whether the protein samples highly diverse states (if so, the random and refined pool distribution widths will be similar), or whether more extended or more compact ensembles best represent the conformation(s) of the protein in solution (Fig. 5B). For N5-N4, the final EOM refined ensemble (χ² = 1.04, CorMap p = 0.18) has significantly narrower Rg and Dmax distributions than the initial pool, suggesting that the protein is sampling a limited set of conformational states.
In addition, the distributions shift to lower values of Rg and Dmax, indicating that the linker connecting the domains tends towards a more compact structure (i.e., does not sample hyper-extended conformations). The resulting bi-modal nature of the Rg distribution may suggest the co-existence of two states of N5-N4. The state where N5 and N4 are spatially separated, as observed from the BUNCH modelling, exists together with a less frequent compacted state, where the N5 and N4 domains are closer together, similar to the predicted I-TASSER/mussel PTMP1 model. With respect to the volume fractions of these two states, EOM generates a refined ensemble in which 75% of the structures are in the BUNCH-like extended conformation and 25% are in a more compact PTMP1-like conformation (Fig. 5D). In combination, the BUNCH and EOM analyses suggest that the linker between the N5 and N4 VWA domains contributes to limiting the conformational sampling of the double domain variant, the overall structural states of which are well-defined.
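For a mixture of conformers of the same mass, the apparent Rg seen by SAXS is the root of the volume-fraction-weighted mean of the squared Rg values of the individual states. The sketch below illustrates this for a two-state ensemble with the 75%/25% populations described above. The function name ensemble_rg is hypothetical, and the Rg of 3.0 nm assigned to the extended state is an assumption chosen for illustration; the 2.4 nm value for the compact state is borrowed from the I-TASSER model.

```python
import math


def ensemble_rg(rg_values, volume_fractions):
    """Apparent Rg of a mixture of equal-mass conformers.

    Computes sqrt(sum(w_i * Rg_i^2) / sum(w_i)); the weights are
    normalised, so they need not add up to exactly 1.
    """
    if len(rg_values) == 0 or len(rg_values) != len(volume_fractions):
        raise ValueError("need equally long, non-empty inputs")
    total = sum(volume_fractions)
    if total <= 0:
        raise ValueError("volume fractions must sum to a positive value")
    mean_square = sum(w * rg**2
                      for rg, w in zip(rg_values, volume_fractions)) / total
    return math.sqrt(mean_square)


# Extended state: 3.0 nm (hypothetical). Compact state: 2.4 nm (model Rg).
apparent_rg = ensemble_rg([3.0, 2.4], [0.75, 0.25])
```

With these illustrative inputs the apparent Rg is approximately 2.86 nm, showing how a minor compact population lowers the ensemble-averaged Rg only slightly relative to the dominant extended state.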
The quadruple VWA domain construct, N6-N3, samples a diverse conformational ensemble and is structurally heterogeneous.
The four-domain N6-N3 tandem construct was also subjected to SEC-SAXS analysis with parallel MALLS measurements (Fig. 6). The MW correlation through the gel filtration peak of N6-N3 is shown in Figure 6A, with estimates falling within a narrow range of 85.2-90.1 kDa and an average of 88.5 kDa. The corresponding concentration-independent MW obtained from the SAXS data was evaluated at 92 kDa. Combined, the results indicate that N6-N3 elutes from the column as a monomer (expected MW = 93 kDa). The p(r) profile of N6-N3 is shown in Figure 6B. The Rg determined from the p(r), 4.25 ± 0.1 nm, is consistent with the Guinier Rg estimate of 4.1 nm (Supporting information, Table 3 and Supporting information, Fig. 3). Overall, the p(r) profile has two defined maxima at r = 2.8 and 5.2 nm, and the anisotropic distribution of distances, which extends to a Dmax of 16 nm, suggests that the protein has an extended/modular domain organization. Additional analysis of the SAXS data shows that the scattering profile is highly ambiguous, meaning that a number of shape topologies fit the scattering data (Supporting information, Table 3 (39)). Therefore, ab initio modelling using DAMMIN and the alternative program GASBOR (40) was run several times, and the individual results were clustered into shape categories. The final low-resolution representations of N6-N3 are presented in Figure [...]. Atomistic models of N6-N3 were developed by combining the crystal structure of N5 with I-TASSER homology models of N6, N4 and N3 and performing both BUNCH and CORAL (41) rigid body modelling, taking into account the missing linkers between the domains (Fig. 8D). The spatial positioning of the N6, N5, N4 and N3 VWA domains was refined in parallel against two SAXS datasets: the N6-N3 data and the data from the double-domain N5-N4 construct.
Both refinement programs were run several times (at least 30) to produce a very limited set of atomistic structures that fit both the N6-N3 SAXS data (χ² = 1.02-1.07, CorMap p = 0.093-0.55) and the N5-N4 data (χ² = 1.03-1.12, CorMap p = 0.001-0.18). Examples of the BUNCH and CORAL models of N6-N3 are displayed in Figure 7B. In general, the N6, N5 and N4 domains appear to cluster to form a 'trefoil-shaped head' while the N3 domain extends out into solution. Of note, it is difficult to ascribe one specific conformation to the protein, as the domains, and especially N3, sample different conformational states, although these states are on average consistent with the mass distribution seen in the ab initio models. As the above analysis suggests that N6-N3 is structurally heterogeneous, EOM was performed against the single N6-N3 dataset, and the results are summarized in Figure 8. The refined ensemble that fits the SAXS data (χ² = 0.98, CorMap p = 0.9) can be represented by models in which the trefoil arrangement of N6-N4 is, in general, relatively preserved across the ensemble, although it is noted that the N-terminal N6 domain may adopt different, but spatially close, positions relative to the N5-N4 'core' (as is also indicated in the BUNCH/CORAL modelling). The apparent clustering of the N6-N4 modules is in contrast to the N3 domain, which appears to sample markedly different spatial positions relative to the trefoil head. For example, 44% of the volume fraction of states within the ensemble have a spatially separated N3 domain at the end of an extended linker, while 55% show the N3 domain proximal to the adjacent N4 domain (Fig. 8A). In order to corroborate the structural heterogeneity observed in the SAXS experiments, the quadruple VWA domain construct N6-N3 was directly visualized at domain resolution using negative-stain electron microscopy (EM).
Indeed, the raw micrographs highlight the structural heterogeneity of the four-domain N6-N3 tandem construct (Fig. 9a). To investigate the underlying conformational ensembles, we analyzed the data using both deterministic and stochastic image classification approaches. In close agreement with the SAXS data, both classification methodologies independently revealed a multitude of shape topologies, ranging from extended 'pearls-on-a-string' conformations with a length of up to 16 nm to compacted, square conformations with a diameter of circa 10 nm (Fig. 9b). While an assignment of the individual domains is not possible due to their almost identical dimensions, most topologies observed in the class averages, corresponding to circa 90 percent of all classified particles, agree with the SAXS data assignment of a 'trefoil-shaped head region' with a more mobile fourth domain (Supporting information Video 1). Independent of the classification methodology used, a subset of classes corresponding to circa 10 percent of all classified particles shows only three resolved domains (Supporting information Figure 5). Inspection of the individual particle images constituting these classes revealed that this is likely due to occlusion of the fourth domain by the other three domains. The combined SAXS and negative-stain EM results demonstrate that structural heterogeneity and conformational sampling are intrinsic to the modular VWA domain architecture of the collagen VI α3 chain. The combined modelling results, encompassing N5-N4 and N6-N3, suggest that different levels of inter-domain linker flexibility, combined with possible transient/frustrated interactions between the individual VWA modules, endow the collagen VI α3 chain with a propensity towards structural adaptability.

DISCUSSION
Collagen VI is a very large, microfibril forming multi-chain ECM protein. The three classical chains carry many VWA domains both in their N- and C-terminal regions, with the α3 chain containing the most (20,21,42). Here, we analysed structural features of the large tandem array of VWA domains at the N-terminus of the α3 chain. Our work focussed on the myopathy mutation-prone N2 VWA domain and on the central domains N6 to N3 that are most likely relevant for the proposed flexibility of the N-terminus of the α3 chain (21). These may be required for microfibril formation (43) or for the ability to bind other extracellular matrix macromolecules, e.g. biglycan and decorin (5) or heparin and hyaluronan (44).

The location of point mutations in the structure of the N2 VWA domain determines how the local fold is affected and indicates the severity of the resulting myopathy
Outside of the triple helical domains, collagen VI is highly polymorphic (45), and for the N1-N10 domains of the α3 chain 257 missense mutations of uncertain clinical significance are listed in the ClinVar database. It is therefore difficult to discriminate between polymorphisms and disease-causing missense mutations. Since exome or whole genome sequencing has become a standard tool for the diagnosis of collagen VI related myopathies, this is an emerging problem. In particular, the lack of structural information makes it difficult to predict the consequences of changes in the amino acid sequence for the structure and thereby the function of the protein. The N2 domain of the α3 chain, on the boundary between the triple helical core region and the extended array of N-terminal VWA domains, harbours several missense mutations that have been proposed to be pathogenic or likely pathogenic (25)(26)(27)(28). Notably, previous attempts to crystallize the α3 N2 domain (29) were unsuccessful. This is reminiscent of attempts to recombinantly express matrilin VWA domains, which are similar to those in collagen VI, and maintain them in a soluble state for downstream characterisation (46). Our initial attempts with the N2 domain also did not yield crystals. However, when we alkylated the free thiol group in the C-terminal linker region, crystals were obtained, indicating that disulphide bond formation interferes with crystallization. The alkylation will most likely not alter the structure of the N2 domain because, like the previously crystallized N5 domain (23), it does not contain a disulphide bond that connects the N- and C-terminal ends. Interestingly, the comparison of the tertiary structures of the N2 and N5 VWA domains revealed that, although the domains share only 31% sequence identity, this is sufficient to maintain the conserved structure (Fig. 2B), perhaps implying a functional conservation of VWA domains in the N-terminal array of the α3 chain.
The low identity of the amino acid sequences and thereby of the nucleotide sequences may be necessary to avoid homologous recombination events especially as each of the N2 to N10 VWA domains is encoded on a single approximately 600 bp long exon. While the C-terminal extension could not be resolved in the crystal structure of the N2 domain, incidental SAXS analysis could model a C-terminal extension comprising the residues corresponding to the linker region between the N2 and the N1 domain (data not shown, but refer to the Small Angle Scattering Biological Data Bank, SASBDB, entry SASDJJ4).
The missense mutations occurring in the N2 VWA domain include two dominant mutations linked to BM, p.G1679E (25,29) and p.L1726R (26), one mutation linked to an intermediate phenotype, p.R1660C (28) and one dominant UCMD associated mutation, p.D1674N (27). The exact localization of the mutations revealed by the N2 domain X-ray structure in concert with recombinant expression of single mutated N2 domains enabled us to further pinpoint the pathomechanisms. The BM causing G1679E mutation was found in a Dutch pedigree and segregated to 19 affected members (25). Fibroblasts from these patients do not show a significantly altered secretion of the collagen VI chains. However, they had a 20-30% loss of wild type N2 domain epitopes indicating either unfolding of N2 or degradation of the mutated chain (29). As predicted (29), the mutation is indeed located in the β2 strand of the central β sheet and thereby has a detrimental effect on the fold (29). This explains why only trace amounts of the single G1679E N2 domain could be purified from cell culture supernatant of transfected HEK-293 EBNA cells (29) and why the G1679E mutation caused the intracellular retention of the N2 domain (Fig. 3B, C). Nevertheless, based on published data it is likely that a mutated full length α3 chain containing an unfolded N2 domain can be secreted (25). However, assembly and microfibril formation have not been studied in patient fibroblasts and on sections. As haploinsufficiency of collagen VI is not associated with a clinical phenotype (47)(48)(49), perhaps binding sites located on the N2 domain that are necessary for collagen VI assembly or for binding to other ECM molecules are affected. The other dominant BM mutation, L1726R, has been studied in patient fibroblasts and shown to have no consequences on assembly and microfibril formation (26) and most likely represents the same pathomechanism. As for G1679E, the single L1726R N2 mutant domain is not secreted (Fig. 3B, C). 
The location of the mutation on the amphipathic α helix 3 points to a disturbed fold of the central hydrophobic core (Fig. 3A) and therefore that a specific loss of function of the N2 domain in collagen VI microfibrils carrying the mutation is the common disease-causing effect.
The R1660C mutation occurred in a compound heterozygous patient with an intermediate BM/UCMD phenotype (28). On the paternal allele he carried a premature termination codon, resulting in nonsense mediated decay of the resulting mRNA. The loss of function mutation on the paternal allele and the R1660C mutation on the maternal allele led to a reduced level of collagen VI in the matrix of cultured fibroblasts, as well as a highly perturbed morphology of the collagen VI matrix (28). Interestingly, the mother showed signs of an undifferentiated connective tissue dysplasia including soft scoliosis of the thoracolumbar region, skin hyperlaxity in the elbow joint area and hypermobility in the interphalangeal joints of the hands and feet (28). In addition, fibroblast cultures from the mother showed more intracellular accumulation and a somewhat lower level of an extracellular deposition of collagen VI (28) indicating a mild dominant effect of the mutation. The R1660C mutation is found on the surface of the domain (Fig.  3A). Unlike the BM causing mutations G1679E and L1726R that destroy the fold of the central hydrophobic core and result in intracellular retention, the R1660C N2 domain is secreted and, under non-reducing conditions, dimerization is observed in the cell culture supernatant (Fig. 3B). However, although not so pronounced, the wild-type N2 domain already forms dimers, most likely through the formation of a disulphide bridge between the cysteine residues in the C-terminal linker region (Fig. 3B). Interestingly, for the R1660C N2 domain this dimerization is not observed in the intracellular fraction. Instead, a faster migrating band occurs which is not present under reducing conditions (Fig. 3B, C), indicating that the fold of the faster migrating species is maintained by an intramolecular disulphide bridge between the novel cysteine and the cysteine in the N2 domain linker, leading to a more compact form of the protein (Fig. 3B). 
Compared to the mutations that affect the general fold of the N2 domain, the introduction of a cysteine residue on an exposed surface could present a novel site for aberrant disulphide shuffling within the collagen VI protein or between collagen VI and other ECM proteins.
The D1674N mutation was originally found to occur in one out of 79 patients with UCMD and did not appear in healthy controls (7). However, the genomic background of the single UCMD patient harbouring the D1674N mutation is complex, as the compound heterozygous individual also carried an R1395Q mutation in the α3 chain N4 domain and was homozygous for an R876S change in the C2 domain of the α2 chain, causing a highly deleterious UCMD phenotype (27,50). Therefore, the contribution of the D1674N mutation is difficult to assess. The recombinant expression of the N2 domain carrying the D1674N mutation does not reveal any differences compared to the wild type, neither in secretion nor in the migration pattern on SDS-PAGE (Fig. 3B, C). The position of the mutation on the surface of the domain, coupled with the change from a negatively charged aspartic acid to an asparagine, could affect intra- or intermolecular interactions through either the gain or the loss of an interaction surface (Fig. 3A). The consequences of this mutation can be compared to those of the R1064Q mutation, occurring in the α3 chain N5 domain, where the fold and the migration of the recombinant protein were comparable to wild type (23). Therefore, the lack of a significant effect of the D1674N mutation on the expression and secretion of the single N2 VWA domain is unsurprising. Rather, the data indicate that the severe phenotype observed in the patient is due to the other attendant mutations and not to the D1674N variant. Of course, the D1674N mutation may, while not being disease causing, contribute to the phenotype as a modifier compounding the effect of other mutations in the mode of closely spaced multiple mutations (CSMM), which act in a coordinated fashion (51). The difficulty that the D1674N and R1064Q mutations highlight is the complexity of distinguishing genuine disease-causing variants from single nucleotide polymorphisms, which commonly occur in the genes encoding collagen VI (27).
While it is important to understand the molecular consequences of mutations in the VWA domains, it is still difficult to dissect the effect these have on the larger α3 chain, or indeed on the collagen VI molecule as a whole. The limitations of studying point mutations in recombinantly expressed domains must be considered when investigating their effect, as catastrophic outcomes for the recombinant domain may be well tolerated in the context of the longer chain (26). Nevertheless, studying the consequences of mutations for the folding of single VWA domains is helpful for predicting pathomechanisms when patient fibroblasts are not available.
Our characterisation of single point mutations in the α3 chain N2 domain offers important insights into the consequences of perturbations to the VWA domain fold. Based on these data, the design of chemical chaperones to assist in the folding of core-based mutations (G1679E, L1726R) could follow. Moreover, as the VWA domains of the α3 chain are encoded on exons with phase 1 boundaries (52), siRNA- or oligonucleotide-based knockdown or skipping of the regions coding for the VWA domains with surface mutations could be used to ameliorate the effect of aberrant interaction processes that arise from these mutations.

The structural heterogeneity of the modular VWA α3 chain could provide a mechanism for adapting to binding interfaces and modulating stiffness in tissues
The distal VWA domains of the α3 chain, upstream of N2, are rarely implicated in the pathogenesis of collagen VI myopathies (7). Instead, their roles are in the assembly of collagen VI (43) and its interactions with other ECM macromolecules, such as von Willebrand factor and heparin (44,53). Our results show why this region has resisted interrogation using high-resolution structural methods (20)(21)(22): when combined in tandem, the VWA modules spanning the N6-N3 array (although encompassing an apparent region of increased spatial consistency across N5-N4) are not, overall, rigidly well-defined, but sample a cohort of positions to form an ensemble of states in solution. The flexibility of the N6-N3 array was also directly observed in negative-stain EM, confirming that structural heterogeneity is inherent to this part of collagen VI.
In retrospect, the inability to crystallize the tandem VWA domains described here (despite our repeated attempts) provides a hint toward this structural heterogeneity inherent to the protein. Tandem VWA domains occur also in vitrin, cochlin and AMACO, but high-resolution structural information is not available. Each of the three VWA domains in the von Willebrand factor has been crystallized separately, but not as tandem multiple-domain constructs. Of note, flexibility between the VWA domains in the von Willebrand factor may be a prerequisite for the shear-induced unfolding of the central A2 domain, making this domain accessible for ADAMTS13 cleavage (54). Interestingly, a tandem of two VWA domains from the proximal thread matrix protein 1 of the mussel byssus has been crystallized and the structure solved (55). Here the VWA domains are connected by a two-β-stranded linker that is further stabilized by disulphide bonds, yielding a novel structural arrangement. However, the short inter-domain linkers spanning the N9-N2 array of the collagen VI α3 chain do not contain such cysteines that are otherwise present between the N1-N2 domains or in the related α4, α5 and α6 chains (Fig. 8D). It may be that the presence of cysteine residues, and the formation of subsequent intramolecular disulphide bonds, goes toward restricting conformational variability (23). This variability could be a contributing factor for the differential temporospatial expression of the four long chains of collagen VI. As the N6-N3 portions of the α3 chain have the capacity to sample multiple states, which long chain is expressed at what time could reflect a mechanism for modulating the stiffness (22) or binding site availability of the collagen VI microfibrils in a given tissue (42). For example, the α3 chain N9-N2 region has been implicated in the binding of heparin and hyaluronan (44), von Willebrand factor (53) and decorin and biglycan (5).
In particular, the N8 domain has been shown to be an interaction site for von Willebrand factor (53). The alternative splicing of this domain changes the responsiveness of the collagen VI protein toward von Willebrand factor interactions, allowing the modulation of the signal cascade. Therefore, mutations in the N-terminal α3 chain VWA region may affect binding events either directly, via removal of key binding determinants and/or domain destabilization, or indirectly, by altering the conformational sampling of the protein.
The intrinsic structural heterogeneity and modular architecture of the α3 VWA array and its apparently dynamic nature could provide a mechanism for stiffness modulation in tissues and provide an 'adaptable avidity' for intra- or intermolecular interaction surfaces necessary for interacting with diverse binding partners.
The conformations of the single (N2) and tandem (N5-N4, N6-N3) α3 chain N-terminal VWA domains presented here raise interesting avenues for further investigation of this region in both wild type and myopathic conditions. Such studies could ascertain to what extent conformational sampling is requisite for the correct functioning of collagen VI. In addition, it is important to dissect how pathogenic mutations, which affect the local structure of the VWA domain, result in catastrophic effects on the collagen VI protein as a whole. Such mechanisms could encompass loss of binding sites used for self-assembly or for adaptor proteins within the collagen VI interactome, or simply lead to a decreased level of available collagen VI. It would be informative to identify binding surfaces involved in intra- or intermolecular interactions so as to understand the role of collagen VI both in normal tissue function and in myopathic processes.

PCR amplification of the collagen VI α3 chain VWA domains
The N2, N5-N4 and N6-N3 constructs were obtained using PCR and human/murine cDNAs as templates. Primers were designed with overhanging tags for downstream In-Fusion cloning into In-Fusion-enabled vectors (Table 1). All high throughput PCR amplification was performed at the Oxford Protein Production Facility (OPPF), Rutherford Appleton Laboratories, Oxfordshire, UK. All steps were performed in 96 well PCR plate format. For the production of the mutation containing constructs (α3 N2 R1660C, D1674N, G1679E, L1726R), the insert was designed and ordered as a gene block from Integrated DNA Technologies (IDT; https://eu.idtdna.com/pages) and used as PCR template at a concentration of 20ng/µl with the appropriate primers. Sequences were verified by Sanger sequencing.

Recombinant expression and purification of collagen VI α3 chain VWA domains
Isolated DNA plasmids were used to transform BL21 Rosetta / BL21 Rosetta pLacI protein expression bacteria. A 10ml preculture was used to inoculate each 1L LB flask, and 1mM IPTG was added to induce expression of the recombinant protein. The cultures were then incubated at 28°C for 18 hours and harvested by centrifugation. The cell pellet was lysed by sonication and PMSF added to a final concentration of 10mM. The crude lysate was then centrifuged to separate the soluble intracellular material from the cell debris. As the N2 domain of the α3 chain contains a single cysteine, purification of this domain also included a reduction and alkylation step. 5mM dithiothreitol was added, followed after 30 minutes by iodoacetamide to a final concentration of 15mM. An affinity chromatography column was prepared with Streptactin Sepharose resin. The resin was washed with TBS and the supernatant of the cell lysate applied. The column was again washed with TBS and bound protein eluted with 2.5mM d-desthiobiotin. All constructs contained either a thrombin (N5-N4) or 3C protease (N2, N6-N3) cleavage site immediately downstream of the strep tag. Thrombin was incubated with N5-N4 at a dilution of 1 unit/1µg protein in 20mM Tris-HCl, 150mM NaCl, 5mM CaCl2, pH 7.4. The N2 construct was incubated with PreScission Protease (commercial 3C protease) at a dilution of 1 unit/100µg protein in 20mM Tris-HCl, 150mM NaCl, 1mM DTT, 1mM EDTA, pH 8.0. The solution was dialysed and reapplied to the Streptactin Sepharose column, where cleaved protein was eluted in the flow-through. Excess buffer was removed by ultrafiltration. For final purification an ÄKTA purifier FPLC system was used along with a Superdex 75 Increase 10/300 column (N2, N5-N4) or Superdex 200 Increase 10/300 column (N6-N3). The fractions contained within the UV280 peak were analysed by SDS-PAGE, pooled and concentrated by ultrafiltration.
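The enzyme-to-substrate ratios quoted above translate directly into the number of protease units needed for a given digest. The following is a minimal sketch of that arithmetic; the function name and the 500µg protein amount are illustrative assumptions and do not come from the protocol.

```python
# µg of substrate protein per unit of protease, as stated in the protocol
UG_PROTEIN_PER_UNIT = {
    "thrombin": 1.0,   # 1 unit per 1 µg protein (N5-N4)
    "3C": 100.0,       # 1 unit per 100 µg protein (N2)
}

def units_required(protease: str, protein_ug: float) -> float:
    """Return the protease units needed to digest protein_ug of substrate."""
    return protein_ug / UG_PROTEIN_PER_UNIT[protease]

protein_ug = 500.0  # hypothetical amount of purified construct
thrombin_units = units_required("thrombin", protein_ug)
prescission_units = units_required("3C", protein_ug)

print(f"thrombin: {thrombin_units:.0f} units, 3C protease: {prescission_units:.0f} units")
```

The hundred-fold difference between the two ratios means that the same amount of substrate needs far fewer units of 3C protease than of thrombin.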

Transient transfection of HEK-293 EBNA cells
HEK-293 EBNA cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) with 5% v/v fetal calf serum and 1% v/v penicillin and streptomycin. 300,000 cells/well were plated into a 6 well plate and incubated at 37°C/5% CO2 for 18 hours, followed by washing with PBS and addition of fresh media. The transfection solution was prepared by incubating 500-1000 ng of the plasmids encoding the recombinant proteins with 97µl of DMEM and 3µl FuGENE Lipo-Transfection Reagent for 15 minutes and was then pipetted onto the media layer of the cells. 24 hours post transfection the supernatant was collected, centrifuged and treated with 1x Complete protease inhibitor. To harvest the cells, RIPA buffer was added and the cell layer scraped from the plate. The cells were extracted at 4°C overnight with end over end turning. After 16 hours, the cells were sonicated at 30% amplitude for 10 seconds and centrifuged, and the supernatant was collected.

Sodium dodecyl sulfate polyacrylamide gel electrophoresis and western blotting
Samples were incubated with Laemmli sample buffer with or without 5% β-mercaptoethanol, loaded into the wells and electrophoresis carried out. Proteins were detected either by staining with Coomassie Brilliant Blue R250 or by western blotting. The proteins were electrophoretically transferred to nitrocellulose. The nitrocellulose membrane was incubated in 5% (w/v) milk powder in TBS; a primary antibody directed against the Strep tag was then added, followed by a horseradish peroxidase-conjugated secondary antibody. The membrane was then incubated with ECL solution and exposed on X-ray film.

Establishment of crystallography trials
Proteins were prepared for crystallography trials by centrifugation at 13,000 rpm at 4°C for 15 minutes to remove air bubbles. The Mosquito crystallisation robot (TTP Labtech) was initialised and a pre-set vapour diffusion sitting drop programme was loaded. The protein was mixed with buffer from the reservoirs of the commercial Morpheus crystallography screen to yield final buffer:protein ratios of 1:1, 2:1 and 1:2. Once complete, the plate was removed from the robot, sealed with an adhesive plastic seal and stored at room temperature. Plates were checked weekly with a microscope to detect crystal formation.
The data were processed in P1 using the DIALS/DUI package (56) and the Pointless-Aimless-cTruncate (57) pipeline from within CCP4 (58). The phases were determined by molecular replacement with a VWA domain from collagen VI solved earlier (PDB: 4IGI) as the search model using PHASER (59). The initial model was optimised by various build-and-refine cycles using COOT (60) and phenix.refine (61). Further analyses and pictures were made using the program Chimera (62). The final model and data were deposited at the PDBe under the accession code 6SNK.

SAXS measurements and data reduction
SAXS experiments were carried out at the European Molecular Biology Laboratory bioSAXS-P12 beam line (DESY, Hamburg, Germany) (63). SAXS data collection (I(s) vs s, where s = 4π sinθ/λ, 2θ is the scattering angle and λ the X-ray wavelength; 0.124 nm, 10 keV) was performed at 20°C using coupled size-exclusion chromatography (gel filtration) with a split-flow in-parallel MALLS/RI detector (64).
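As a brief illustration of the momentum transfer definition s = 4π sinθ/λ, the sketch below converts the 10 keV photon energy into a wavelength and evaluates s for one scattering angle. The function names and the example angle of 2θ = 1° are illustrative assumptions and are not taken from the beam line software.

```python
import math

# Planck constant times speed of light, expressed in keV·nm (rounded CODATA value)
HC_KEV_NM = 1.23984198

def wavelength_nm(energy_kev: float) -> float:
    """X-ray wavelength in nm for a photon energy given in keV."""
    return HC_KEV_NM / energy_kev

def momentum_transfer(two_theta_deg: float, lam_nm: float) -> float:
    """s = 4*pi*sin(theta)/lambda in nm^-1, where 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / lam_nm

lam = wavelength_nm(10.0)                 # about 0.124 nm, as quoted in the text
s_example = momentum_transfer(1.0, lam)   # hypothetical scattering angle 2*theta = 1 degree

print(f"lambda = {lam:.3f} nm, s = {s_example:.3f} nm^-1")
```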
The MALLS/RI data were measured at 25°C using a Wyatt Mini-Dawn TREOS with inbuilt quasi elastic light scattering (QELS) module coupled to an OptiLab T-Rex refractometer for protein concentration determination. The MALLS system was calibrated relative to the scattering from toluene and, in combination with concentration estimates obtained from refractive index (RI), was used to evaluate the MW distribution of species eluting from the gel filtration column. The dn/dc of N5-N4 was set at 0.185 ml g⁻¹ while that of N6-N3 was taken as 0.1831 ml g⁻¹ (calculated using SEDFIT (65)). The molecular weight estimates from MALLS/RI and the hydrodynamic radius, RH, derived from QELS were determined using Wyatt ASTRA7 software.
All samples underwent 'slow defrosting' from snap-frozen -80°C aliquots; first at -20°C for 2 hours followed by thawing on ice. The sample injection volumes and load concentrations were: N5-N4, 55 µl at 14mg/ml and N6-N3, 75µl at 7mg/ml. A GE-Healthcare S75 Increase (N5-N4) or S200 Increase 10/300 column (N6-N3) was used to effect the separation of each monomer variant. The columns were equilibrated in 20 mM TRIS, pH 7.4, 150 mM NaCl, 3% v/v glycerol at a flow rate of 0.6 ml min⁻¹. Glycerol was added to reduce the effects of X-ray radiation damage to the protein sample (66). Automated sample injection and data collection were controlled using the BECQUEREL beam line control software (67). The SAXS intensities were measured as 2400 x 1 s individual X-ray exposures, from the continuously-flowing column eluent, using a Pilatus 6M 2D-area detector. The 2D-to-1D data reduction, i.e., radial averaging of the data to produce 1D I(s) vs s profiles, was performed using the SASFLOW pipeline incorporating RADAVER from the ATSAS 2.8 suite of software tools (68,69). The 2400 individual frames obtained for each gel filtration-SAXS run were processed using CHROMIXS (70). Only those scaled individual SAXS data frames with a consistent Rg through the SEC-elution peak after the subtraction of an appropriate solvent blank (as evaluated from the Guinier approximation; ln I(s) vs s² for sRg < 1.3) were used to generate the final averaged SAXS profiles. All SAXS data-data comparisons and data-model fits were assessed using the reduced χ² test and the Correlation Map, or CORMAP (33), with the p-value significance threshold set at α = 0.01. The ambiguity of each dataset, with respect to ab initio shape restoration, was assessed using AMBIMETER (39), while the maximum useful data range (smax) and number of Shannon channels were assessed using SHANNUM (71).
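The Guinier approximation used for frame selection states that, at low angles, ln I(s) = ln I(0) - (Rg²/3)·s², so a straight-line fit of ln I(s) against s² yields Rg from the slope. The sketch below fits synthetic, noise-free data restricted to sRg < 1.3. The Rg and I(0) values are hypothetical, and this plain least-squares routine is only an illustration, not the algorithm implemented in CHROMIXS or other ATSAS tools.

```python
import math

def guinier_fit(s, intensity):
    """Least-squares line through ln I(s) vs s^2; returns (Rg, I0)."""
    x = [si ** 2 for si in s]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rg = math.sqrt(-3.0 * slope)      # slope = -Rg^2 / 3
    return rg, math.exp(intercept)    # intercept = ln I(0)

# Synthetic Guinier curve with hypothetical parameters (Rg in nm, s in nm^-1)
RG_TRUE, I0_TRUE = 3.0, 100.0
s_all = [0.01 * k for k in range(1, 101)]
i_all = [I0_TRUE * math.exp(-((si * RG_TRUE) ** 2) / 3.0) for si in s_all]

# Keep only the Guinier region, s*Rg < 1.3. The known Rg is used here for
# simplicity; with real data the limit is refined iteratively.
pairs = [(si, ii) for si, ii in zip(s_all, i_all) if si * RG_TRUE < 1.3]
s_fit = [p[0] for p in pairs]
i_fit = [p[1] for p in pairs]

rg_est, i0_est = guinier_fit(s_fit, i_fit)
print(f"Rg = {rg_est:.3f} nm, I(0) = {i0_est:.2f}")
```

With experimental data, frames across the elution peak would each be fitted in this way and only those with a consistent Rg retained for averaging.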
The indirect inverse Fourier transformations of the resulting data for each construct to generate the probable frequency of scattering-pair real-space distances (p(r) profiles) were calculated using GNOM (72). The calculation of p(r) and all subsequent modelling was performed within the SHANNUM smax limit. Ab initio modelling was performed using GASBOR (40) and DAMMIN (32), while atomistic rigid-body modelling, incorporating the crystal structure of the N5 and, where relevant, the I-TASSER (34) generated homology models of N6, N4 and N3, was performed using BUNCH (36) and CORAL (41), and EOM (37,38) was used for ensemble analysis. For the ab initio modelling of N6-N3, and due to the ambiguity of the SAXS data, DAMMIN and GASBOR modelling was performed multiple times (at least 16-30 times) and, from the individual model cohorts, representative clusters of related spatially-aligned low-resolution structures were calculated using DAMCLUST (41).

Negative-stain EM
For negative-stain electron microscopy, affinity purified N6-N3 (0.17 mg/ml) was freshly diluted 1:100 with TBS. Samples were applied to freshly glow discharged holey carbon grids with an additional 2 nm thin continuous carbon layer (Quantifoil R2/1 + 2nm, copper 200 mesh), and stained using uranyl formate (0.75 % w/v). For this, 4.5 µl of the diluted sample were incubated on the grid, then blotted to a thin layer by hand. To remove unbound protein, the grid was washed three times with 15 µl TBS, immediately blotting to a thin liquid layer by hand. The protein was then stained by brief incubation in a droplet of uranyl formate solution, immediate blotting, and incubation in a fresh droplet of uranyl formate for 20 s. Excess liquid was removed by blotting, and the remaining thin layer of stain rapidly dried by a laminar flow of air.
Transmission electron microscopic images were acquired on a JEOL JEM-2200FS microscope operating at 200 kV, equipped with a 4k x 4k TemCam F-416 camera. Alignment and astigmatism correction were performed manually using the microscope's user interface, then images were acquired using low-dose condition settings in SerialEM (73). In total 146 images were collected at a nominal magnification of 50,000x corresponding to a calibrated pixel size of 2.05 Å pixel⁻¹ at a defocus range from 0.02 µm to 3.00 µm.
Data were processed using the Sphire package (74). After defocus estimation and CTF-based quality control, assuming an amplitude contrast of 80 %, 146 images were processed by the Phosaurus-network general model using a threshold of 0.1 and a particle size of 140 pixels, corresponding to an object size of approximately 290 Å, yielding 43,246 particle coordinates. While the choice of a non-restrictive threshold yielded many false positive coordinates, as judged by visual inspection, it ensured that all N6-N3 particles were selected despite apparent shape heterogeneity. Particle images were extracted with a box size of 208 pixels, and classified at a down-sampled pixel size of 4.95 Å using the ISAC algorithm specifying 50 particles per class. For visualization, particle images were recomputed at full pixel size with full CTF correction using the beautifier algorithm. In parallel, particle coordinates were imported into the Relion package and processed independently (75). After defocus estimation and CTF-based quality control, assuming an amplitude contrast of 80 %, 46,214 particle images were extracted based on the imported coordinates, and classified into 100 classes.
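The pixel and Ångström values quoted for particle picking and classification are linked by simple unit conversions, sketched below. The variable names are illustrative, and the down-sampled box size is derived here only from the stated numbers; the box size actually used by ISAC may have been rounded differently.

```python
PIXEL_SIZE_A = 2.05          # calibrated pixel size, Å per pixel (from the text)
PARTICLE_SIZE_PX = 140       # particle size used for picking, in pixels
BOX_SIZE_PX = 208            # extraction box size, in pixels
DOWNSAMPLED_PIXEL_A = 4.95   # pixel size used for ISAC classification, Å per pixel

particle_size_a = PARTICLE_SIZE_PX * PIXEL_SIZE_A        # object size in Å
box_size_a = BOX_SIZE_PX * PIXEL_SIZE_A                  # box edge in Å
binning_factor = DOWNSAMPLED_PIXEL_A / PIXEL_SIZE_A      # down-sampling factor
box_size_binned_px = box_size_a / DOWNSAMPLED_PIXEL_A    # box edge in down-sampled pixels

print(f"particle size: {particle_size_a:.0f} Å")
print(f"box edge: {box_size_a:.1f} Å")
print(f"binning factor: {binning_factor:.2f}")
print(f"box edge after down-sampling: {box_size_binned_px:.1f} pixels")
```

The 140 pixel picking size corresponds to 287 Å, consistent with the object size of approximately 290 Å stated above.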
Data availability: X-ray data for N2 have been deposited in the Protein Data Bank in Europe with accession code 6SNK. SAXS data and models for N5-N4, N6-N3 and N2 have been deposited in the Small Angle Scattering Biological Data Bank (SASBDB) under the accession codes SASDEY9, SASDEZ9, and SASDJJ4, respectively. All other data are contained within the manuscript and the supporting information.

Conflict of interest:
The authors declare that they have no conflicts of interest with the contents of this article.
iNEXT (grant number 653706), funded by the Horizon 2020 program of the European Union (PID: 2861) using the Structural Audit programme and an iNEXT proposal ID 5602 (HSD and RW).

(A) The MALLS, cm⁻¹ (blue) and dRI, ml g⁻¹ (red) traces obtained for the gel filtration elution peak of N5-N4. The MW correlation through the N5-N4 peak is shown as a black line that spans 42.5-43.2 kDa (MW average = 43 kDa). (B) The real-space scattering-pair distance distribution, or p(r) profile, of N5-N4 calculated from the SAXS data (shown in D, measured as scattering intensity I(s) vs s, where s = 4π sinθ/λ; 2θ is the scattering angle and λ the X-ray wavelength). (C) DAMMIN generated ab initio models for the N5-N4 protein showing an individual low-resolution structure (cyan). Ten individual DAMMIN models were spatially aligned and averaged, with the result shown here in green. (D) The fit to the SAXS data of the individual ab initio model is also shown (blue line).

The fit to the SAXS data of a selected BUNCH-refined rigid-body model to the N6-N3 data as well as the N5-N4 'core' of the model to the N5-N4 data; Right: corresponding spatial alignment between BUNCH and CORAL generated N6-N3 structures. (B) The three individual BUNCH and CORAL models (N5-N4 blue, N6 and N3 in cyan) that fit both the N6-N3 and N5-N4 SAXS datasets. Note, although the N6-N4 domain region of CORAL model (iii) appears different to BUNCH (i) and CORAL (ii) this depends on how the models are aligned (whether N3 projects away from, or into, the plane). Note: For clarity, the N5-N4 SAXS data shown in panel A have been scaled by 0.1 on the I(s) axis.