First Structural Glimpse of CCN3 and CCN5 Multifunctional Signaling Regulators Elucidated by Small Angle X-ray Scattering

The CCN (cyr61, ctgf, nov) proteins (CCN1–6) are an important family of matricellular regulatory factors involved in internal and external cell signaling. They are central to essential biological processes such as adhesion, proliferation, angiogenesis, tumorigenesis, wound healing, and modulation of the extracellular matrix. They possess a highly conserved modular structure with four distinct modules that interact with a wide range of regulatory proteins and ligands. However, little is known about their structure, although their biological functions appear to require cooperation between individual modules. Here we present for the first time structural determinants of two CCN family members, CCN3 and CCN5 (expressed in Escherichia coli), using small angle x-ray scattering. The results describe the overall molecular shape and a possible general three-dimensional modular arrangement for CCN proteins. These data provide unequivocal insight into the nature of CCN proteins in solution and thus into their structure-function relationships.

cancer, is more pronounced (11–13). However, the biochemical basis of these functions, and importantly the underlying protein-protein interactions, are not understood. CCN3, like all CCN proteins, is involved in many biological processes, including cancer (14), regulation of angiogenesis (15), and cell adhesion (16). CCN5 was first identified as WISP-2 (Wnt-1-inducible signaling pathway protein 2) and has been heavily implicated in several types of cancer, most importantly breast (17) and colon (7) cancers, as well as in bone turnover and chondrogenesis (18).
The complexity of CCN behavior is due to their large repertoire of ligands. These include the integrins, which are considered to be the "functional receptors of the CCN family"; the small cysteine knot growth factors (vascular endothelial growth factor (VEGF), transforming growth factor (TGF), and bone morphogenetic protein (BMP)); extracellular matrix proteins (collagen, fibulin, and Notch); heparan sulfate proteoglycans; LDL receptor-related protein 1; and insulin-like growth factor (11,12,19). In some cases, the module to which these ligands bind is known. VEGF isoforms bind to domains 3 and 4 of CCN2 (20), and domain 2 of CCN3 binds BMP2 (21). Aggrecan binds to the N terminus of CCN2 (either domain 1 or domain 2) (22), and domains 3 and 4 facilitate binding to heparan sulfate proteoglycans (23) and fibulin (16). For some functions only a single domain is needed, but for others multiple domains acting in concert are required (24). This raises the question of how the shape of the molecule influences its biological function.
The CCN proteins, like many extracellular matrix (ECM) proteins, are constructed from a series of distinct domains (25) that have evolved through exon shuffling (6,26,27). The building blocks of the CCN family are: 1) an insulin-like growth factor binding protein (IGFBP) domain; 2) a von Willebrand factor type C repeat (VWC); 3) a thrombospondin type I repeat (TSP); and 4) a carboxyl-terminal cysteine knot-containing domain (CT) (6). A schematic diagram is shown in Fig. 1. These four domains have been shown to be discrete structural entities and can be produced as individual truncates that retain biological activity (24). The CCN family is highly conserved at the primary structure level, with ~30–50% amino acid identity (and 40–60% similarity) (28,29). CCN5 is an exception in that it lacks the CT domain but retains the first three highly conserved domains, and CCN6 lacks four normally conserved cysteine residues in the VWC domain. Protease-susceptible linkers separate the domains, and the N- and C-terminal halves are separated by a longer linker region that varies in both length and composition between the proteins. These linker regions can be cleaved by a range of matrix metalloproteases (MMP-1, MMP-2, MMP-3, MMP-7, MMP-9, and MMP-13) (30) and by other proteases such as elastase and plasmin (30–32). Currently, no structural information is available for the CCN proteins. The high number of cysteines in each protein (~10% by mass) makes obtaining pure material difficult (12), and previous attempts to purify these proteins have resulted in insoluble aggregates (33,34). There has been some progress, with microgram quantities of active CCN2 produced in bacteria (35). Production of individual domains or truncated sections has been more successful (35–38), although it has not led to a better understanding of the structure of the CCN proteins or their individual domains.
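Identity and similarity percentages such as those quoted above depend on the alignment used and on how gapped positions are counted. As a minimal sketch, assuming two sequences that have already been aligned (the toy strings below are hypothetical, not CCN sequences, and this is not necessarily the procedure used in refs. 28 and 29), percent identity over ungapped columns can be computed as:

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped, so the
    denominator is the number of ungapped aligned positions (one of
    several conventions in common use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 4 of the 5 ungapped columns are identical.
print(percent_identity("ACDE-G", "ACDQKG"))
```

Percent similarity would additionally require a definition of conservative substitutions (for example, positive scores in a substitution matrix such as BLOSUM62), which this sketch does not attempt.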
Although the structure of the individual domains can be hypothesized through homology modeling of related ECM domains (1), such modeling gives no clues as to the overall shape of the molecule or any interdomain interactions that may occur. It has been hypothesized that the proteins exist as a globular bundle until positive or negative regulators act upon them, either unfolding them to expose their active domains or cleaving them at their protease-susceptible linkers into biologically active truncated fragments (9). Given the high degree of sequence identity between the proteins, domain-domain interactions are likely to be one of the factors that control the subtle differences in their behavior and binding (1,2). In the case of CCN2, domain cooperativity has already been seen to influence ligand binding: domain 4 binds VEGF121, whereas both domains 3 and 4 are required to bind the longer VEGF165 (20). Experiments by Kubota et al. (24) also demonstrated that certain biological effects were seen only when multiple domains were involved. Movement about linker regions has been shown to be important in other mosaic proteins (39), and the secondary structure, if any, of the linker regions can also affect interdomain communication (40).
To begin to understand how structure can lead to function, we have expressed and purified CCN3 and CCN5 in high yields. We have determined their low resolution solution structures using small angle x-ray scattering, giving a first look at these structurally uncharacterized members of the CCN protein family.

FIGURE 1. … The conserved cysteine bonds are also shown in each domain, with CCN6 lacking two of these in the VWC domain. CCN5 can also be seen to lack the CT domain (6). B, a schematic of the recombinant (r) constructs of CCN3 and CCN5 with an N-terminal Trx tag, an N-terminal hexa-histidine affinity tag, and a factor Xa cleavage site.

FIGURE 2. Circular dichroism data. CD spectra of CCN3 (gray triangles) and CCN5 (black squares) are shown. CD spectra were collected on a Jasco-J600 between 190 and 290 nm. Analysis in the program JFIT suggested that ~50% of the protein was composed of β-strands, ~40% of random coil, and ~10% of α-helix. The large proportion of β-strand and coil is consistent with the homology modeling, and the small proportion of α-helix likely comes from the Trx tag.
The pET32-rCCN vector was then transformed into Rosetta-Gami2 cells (Merck Biosciences) for expression.
FIGURE 3. Scattering curves for CCN3 and CCN5. Top, CCN3. A, the scattering curve (blue circles), with the fit of the ab initio model calculated using the program DAMMIN (45) shown in red. B, the radius of gyration (Rg), calculated using the Guinier approximation (61), was 57.7 Å. C, the interatomic distance distribution function (P(r)) gave a maximum dimension of 170 Å. These values were calculated in GNOM (44). Bottom, CCN5. A, the scattering curve (blue circles), with the fit of the ab initio model calculated using DAMMIN (45) shown in red. B, the Rg, calculated using the Guinier approximation (61), was 48.8 Å. C, P(r) gave a maximum dimension of 160 Å. These values were calculated in GNOM (44).

Expression and Purification-Single colonies were used to inoculate 10 ml of LB supplemented with 100 μg/ml ampicillin, which were grown overnight at 37°C. This culture was then used to inoculate 1 liter of LB supplemented with 100 μg/ml ampicillin and 35 μg/ml chloramphenicol, which was grown for ~8 h at 37°C before expression was induced with 0.2 mM isopropyl-1-thio-β-D-galactopyranoside. After ~16 h at 20°C, cells were harvested by centrifugation. Cell pellets were frozen overnight at −20°C before being resuspended in lysis buffer (25 mM Hepes, pH 7.3, 500 mM NaCl, 1 mM PMSF, EDTA-free protease inhibitor tablet (Roche Applied Science), 10% v/v BugBuster, Benzonase, and rLysozyme (Merck Biosciences)), incubated for ~20 min at room temperature, and lysed using a French press. Lysed cells were spun at 24,000 rpm for ~30 min, and the inclusion bodies were collected. Inclusion bodies were resuspended in ~20 ml of solubilization buffer (25 mM Hepes, pH 7.3, 500 mM NaCl, 1 mM PMSF, 7.2 M urea) before being applied to a 1-ml His-Trap crude column (GE Healthcare) equilibrated in 25 mM Hepes, pH 7.3, 500 mM NaCl, 20 mM imidazole, and 7.2 M urea.
The CCN proteins were eluted in a single step with elution buffer (as above, but with 0.5 M imidazole). The partially purified CCN proteins were added dropwise to ~100 volumes of refolding buffer (50 mM Tris, pH 8.2, 250 mM NaCl, 10 mM KCl, 0.05% PEG3350) and left at 4°C for >24 h. The refolded CCN proteins were then subjected to a two-step purification on an ÄKTA Express (GE Healthcare): Ni2+ affinity chromatography on a column equilibrated with refolding buffer and eluted with an ascending imidazole gradient (0–500 mM), followed by gel filtration on a Superdex 75 column in 25 mM Tris, pH 8.2, 200 mM NaCl. Proteins were then concentrated for biophysical studies, and their concentration was determined from the absorbance at 280 nm. Purity was checked by SDS-PAGE and by Western blot using anti-His antibodies targeting the N-terminal His tag. Samples were stored at 4°C.
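Two simple calculations underlie this protocol: the Beer-Lambert law used to convert A280 into a protein concentration, and the dilution of the denaturant during dropwise refolding. The sketch below assumes the Pace et al. extinction coefficients (5500 M⁻¹ cm⁻¹ per Trp, 1490 per Tyr, and 125 per disulfide-bonded cystine, the values used by tools such as ExPASy ProtParam); the paper does not state which coefficient was used, and the function names and inputs are illustrative only.

```python
def extinction_coefficient_280(sequence: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) estimated from sequence.

    Uses the Pace et al. (1995) residue values: Trp 5500, Tyr 1490,
    and 125 per disulfide-bonded cystine (assumed, not from the paper).
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def concentration_mg_per_ml(a280: float, epsilon: float, mass_da: float,
                            path_cm: float = 1.0) -> float:
    """Beer-Lambert: A = epsilon * c * l, so c (mol/L) = A / (epsilon * l).

    Multiplying mol/L by the molar mass (g/mol) gives g/L, which equals mg/ml.
    """
    molar = a280 / (epsilon * path_cm)
    return molar * mass_da


def diluted_concentration(stock: float, sample_volume: float,
                          diluent_volume: float) -> float:
    """Concentration of a solute after adding a sample to a volume of diluent."""
    return stock * sample_volume / (sample_volume + diluent_volume)


# Urea carried over when 1 volume of 7.2 M urea eluate is added to ~100 volumes
# of urea-free refolding buffer.
residual_urea_m = diluted_concentration(7.2, 1.0, 100.0)
print(f"residual urea: {residual_urea_m * 1000:.0f} mM")
```

The same dilution leaves the 0.5 M imidazole of the elution buffer at roughly 5 mM in the refolding mixture.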
Circular Dichroism (CD)-CD data were recorded on a Jasco-J600 spectropolarimeter at 25°C from 190 to 290 nm at a scanning speed of 50 nm/min using a quartz cuvette with a 5-mm path length. For both samples, eight scans were acquired and averaged, and a buffer blank (50 mM Tris, pH 8.2, 200 mM NaCl) was subtracted. Secondary structure was estimated using the program JFIT (41). CD data for CCN3 and CCN5 are shown in Fig. 2.
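Before secondary structure estimation, averaged CD spectra are usually blank-subtracted and normalized to mean residue ellipticity so that spectra from proteins of different size and concentration can be compared. A minimal NumPy sketch, assuming raw spectra in millidegrees on a common wavelength grid; it does not reproduce the JFIT deconvolution itself, and the mean residue weight, concentration, and ellipticities in the example are hypothetical.

```python
import numpy as np


def mean_residue_weight(mass_da: float, n_residues: int) -> float:
    """Mean residue weight: molecular mass divided by the number of peptide bonds (N - 1)."""
    return mass_da / (n_residues - 1)


def mean_residue_ellipticity(sample_scans, blank_scans, mrw: float,
                             path_cm: float, conc_mg_ml: float) -> np.ndarray:
    """Average repeat scans, subtract the buffer blank, and convert to [theta]_MRW.

    Input ellipticities are in millidegrees; the result is in deg cm^2 dmol^-1,
    using [theta] = theta(mdeg) * MRW / (10 * l(cm) * c(mg/ml)).
    """
    sample = np.mean(np.asarray(sample_scans, dtype=float), axis=0)  # average scans
    blank = np.mean(np.asarray(blank_scans, dtype=float), axis=0)
    theta_mdeg = sample - blank
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_ml)


# Hypothetical example: eight identical scans at three wavelengths in a 0.5-cm cuvette.
scans = [[-12.0, -8.0, 1.0]] * 8
blank = [[-0.5, -0.5, 0.0]] * 8
mre = mean_residue_ellipticity(scans, blank, mrw=110.0, path_cm=0.5, conc_mg_ml=0.2)
print(mre)
```

Working with the ratio of millidegrees to mg/ml gives the same number as the textbook form with degrees and g/ml, because both factors of 1000 cancel.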
Dynamic Light Scattering-Dynamic light scattering analysis was performed on a Malvern Instruments machine at 20°C. Protein concentration was ~1 mg/ml in 25 mM Tris, pH 8.2, 200 mM NaCl, and samples were filtered through a 0.22-μm filter to remove dust and particulates. Both the CCN3 and the CCN5 samples were pure and monodisperse.
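Dynamic light scattering measures a translational diffusion coefficient D, which instrument software converts to a hydrodynamic radius through the Stokes-Einstein relation R_h = k_B T / (6 π η D). A minimal sketch, assuming the viscosity of water at 20°C (about 1.002 mPa·s) as an approximation for the Tris/NaCl buffer; the diffusion coefficient in the example is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value


def hydrodynamic_radius_m(diffusion_m2_per_s: float, temperature_c: float = 20.0,
                          viscosity_pa_s: float = 1.002e-3) -> float:
    """Stokes-Einstein radius (m) of the sphere that diffuses with coefficient D.

    The default viscosity is that of pure water at 20 C (an assumption for the buffer).
    """
    temperature_k = temperature_c + 273.15
    return BOLTZMANN_J_PER_K * temperature_k / (
        6.0 * math.pi * viscosity_pa_s * diffusion_m2_per_s)


r_h = hydrodynamic_radius_m(6.0e-11)  # hypothetical D of 6e-11 m^2/s
print(f"R_h = {r_h * 1e9:.2f} nm")
```

For an elongated molecule, R_h is the radius of an equivalent sphere, so DLS alone reports on monodispersity rather than on shape.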
Small Angle X-ray Scattering Data Collection-Small angle x-ray scattering experiments were performed at beamline I22 at the Diamond Light Source (Didcot, Oxfordshire, UK) at 20°C using a RAPID 2D detector. Scattering data were collected at 3.5 mg/ml for CCN3 and at 5 and 3.5 mg/ml for CCN5 using 300-s exposures, and no radiation damage was observed. A silver behenate standard was used to calibrate the q axis (42), and a 5 mg/ml lysozyme (14.3 kDa) solution in sample buffer (25 mM Tris, pH 8.2, 200 mM NaCl) was used as a molecular mass standard to calculate the masses of the CCN samples.
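Mass estimation against a protein standard relies on the forward scattering I(0) being proportional to concentration times molecular mass when buffer, cell geometry, exposure, and scattering contrast are comparable. A minimal sketch, assuming I(0) values on a common scale for the sample and the lysozyme standard (for example, the I(0) reported by GNOM); the intensities below are hypothetical, not the measured data, and such estimates are approximate.

```python
def mass_from_forward_scattering(i0_sample: float, conc_sample_mg_ml: float,
                                 i0_standard: float, conc_standard_mg_ml: float,
                                 mass_standard_kda: float = 14.3) -> float:
    """Molecular mass (kDa) from concentration-normalized I(0) relative to a standard.

    Assumes the sample and standard have similar partial specific volumes
    (scattering contrast) and were measured under identical conditions.
    The default standard mass is that of lysozyme (14.3 kDa).
    """
    ratio = (i0_sample / conc_sample_mg_ml) / (i0_standard / conc_standard_mg_ml)
    return mass_standard_kda * ratio


# Hypothetical I(0) values: sample at 3.5 mg/ml, lysozyme standard at 5 mg/ml.
print(round(mass_from_forward_scattering(24.5, 3.5, 10.0, 5.0), 2))
```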
Data were analyzed and processed using PRIMUS (43). The indirect Fourier transformation program GNOM (44) was used to generate the pair distribution function of the samples and to calculate the radius of gyration (Rg) and the maximum particle dimension (Dmax). The scattering data for CCN3 and CCN5 are shown in Fig. 3. Ab initio modeling was performed using the program DAMMIN (45). Twenty independent DAMMIN runs were averaged using the program DAMAVER (46). Visualization of the molecular envelope and manual docking of the homology models of the CCN domains were performed in PyMOL (62). A summary of the data collection details is provided in Table 1.
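The Guinier approximation, ln I(q) ≈ ln I(0) − q²Rg²/3, holds at small q, so Rg follows from the slope of ln I(q) against q². The sketch below fits that line with NumPy and iteratively narrows the fitting range to q·Rg ≤ 1.3, a limit usually applied to globular particles (a lower limit is often recommended for elongated ones). It illustrates the method rather than the PRIMUS implementation, and it is checked here against synthetic data only.

```python
import numpy as np


def guinier_rg(q, intensity, q_max, qrg_limit=1.3, max_iter=20):
    """Estimate Rg and I(0) from a linear fit of ln I(q) versus q^2.

    q and Rg must be in reciprocal length units (e.g. 1/Angstrom and Angstrom).
    Starting from points with q <= q_max, the fit range is repeatedly reset
    to q <= qrg_limit / Rg until it stops changing.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    for _ in range(max_iter):
        mask = (q <= q_max) & (intensity > 0)  # the logarithm needs positive intensities
        if mask.sum() < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; check data or q range")
        rg = np.sqrt(-3.0 * slope)  # slope = -Rg^2 / 3
        new_q_max = qrg_limit / rg
        if np.isclose(new_q_max, q_max):
            break
        q_max = new_q_max
    return float(rg), float(np.exp(intercept))


# Synthetic check: an ideal Guinier curve for Rg = 50 Angstrom and I(0) = 100.
q = np.linspace(0.002, 0.06, 300)
intensity = 100.0 * np.exp(-(q * 50.0) ** 2 / 3.0)
rg, i0 = guinier_rg(q, intensity, q_max=0.02)
print(f"Rg = {rg:.1f} A, I(0) = {i0:.1f}")
```

Real data add noise and possible aggregation, so in practice the linearity of the Guinier plot is inspected before the fitted Rg is trusted.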

RESULTS AND DISCUSSION
Expression and Purification-Both CCN3 and CCN5 were successfully expressed in recombinant form and purified (>95% purity) to high concentration (47). In both cases, the proteins were constructed in pET32 with an N-terminal thioredoxin (Trx) tag and hexa-histidine tag. The histidine tag was used for affinity purification, and the thioredoxin tag has previously been shown to aid disulfide bond formation (35). For CCN3, an ~30-kDa breakdown product was observed and was removed using a high molecular weight cut-off filter. This fragment, comprising the C-terminal region, was also observed in preparations of CCN3 by Perbal et al. (16) and Thibout et al. (36). Despite the presence of the factor Xa cleavage site, the Trx tag was left intact on both proteins because the cleaved proteins precipitated at high concentrations.
Biophysical Experiments-Circular dichroism spectra of CCN3 and CCN5 (Fig. 2) indicated ~50% β-strand and ~40% random coil. The small helical content (~10%) is likely due to the attached Trx tag, which contains α-helices (Protein Data Bank (PDB) ID 1KEB) (48). The high proportion of β-strand and random coil agrees with homology modeling (1) and with secondary structure prediction software, which suggest that domains 1–3 are composed entirely of β-strands and disulfide-stabilized random coil and that only the CT domain may contain a small proportion of α-helix. For CCN5, which lacks the CT domain, all of the observed helical content is therefore likely due to the α-helices of the Trx tag.
There have been suggestions that the CCN proteins may exist as dimers or higher order oligomers in solution through their VWC and CT domains (10). However, the molecular masses calculated from the distance distribution functions (P(r)) were 50.2 kDa for CCN3 and 38.1 kDa for CCN5, close to the masses of the fusion proteins calculated from sequence (51.2 and 39.9 kDa, respectively), indicating that both proteins are monomeric in solution. The monomeric state was also supported by the monodisperse peaks seen in the dynamic light scattering experiments, in which both samples appeared >99% pure.

Table 1 footnotes:
a The distance distribution function (P(r)) from GNOM was used to calculate the radius of gyration (Rg) and the maximum dimension (Dmax).
b The discrepancy (χ²) between experimental and ab initio data was calculated by the program DAMMIN. DAMAVER was used to average 20 models and gave the listed value for the normalized spatial discrepancy (NSD).

Ab Initio Modeling-Given the complete absence of structural data for the CCN molecules, ab initio modeling was performed using the program DAMMIN (45), with 20 DAMMIN runs averaged using the program DAMAVER (46). The resulting surface envelope models suggest that both proteins exist as extended, not globular, molecules in solution (Fig. 4A), in contrast to the globular hypothesis proposed by Perbal (9). Although a long, extended, flexible structure might not be expected of a stable protein, it is consistent with the known structures of many other multidomain ECM proteins, which exhibit similarly long, extended, and highly flexible structures (25). Fibronectin forms a long extended array of small repeats (49), as does fibrillin (50), and all current data on the related ECM modulator thrombospondin (51) also support a long extended molecule, which would allow easy docking of ligands or the simultaneous interaction of multiple domains with the same ligand.
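The distinction between extended and globular shapes can be made concrete with a bead model like those produced by DAMMIN. For N equal beads, Rg² = (1/2N²) Σᵢ Σⱼ rᵢⱼ², and Dmax is the largest bead-bead distance, so elongating a model raises Dmax faster than Rg. For reference, a solid sphere has Dmax/Rg = 2/√(3/5) ≈ 2.58 and a thin rod approaches √12 ≈ 3.46; the values in Fig. 3 give about 2.95 for CCN3 (170/57.7) and 3.28 for CCN5 (160/48.8), although this ratio is only a rough shape indicator. A minimal NumPy sketch, assuming bead coordinates are supplied by the reader (for example, parsed from a DAMMIN output file with a parser of their choice):

```python
import numpy as np


def bead_model_rg_dmax(coords):
    """Radius of gyration and maximum dimension of a model of equal-weight beads.

    Uses Rg^2 = sum_ij r_ij^2 / (2 N^2), which equals the mean squared
    distance of the beads from their centroid.
    """
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]  # all pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(xyz)
    rg = np.sqrt((dist ** 2).sum() / (2.0 * n * n))
    return float(rg), float(dist.max())


def pair_distance_histogram(coords, bin_width=5.0):
    """Histogram of bead-bead distances: a crude, unsmoothed analogue of P(r)."""
    xyz = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(xyz), k=1)  # each pair counted once
    dist = np.linalg.norm(xyz[i] - xyz[j], axis=1)
    edges = np.arange(0.0, dist.max() + bin_width, bin_width)
    counts, edges = np.histogram(dist, bins=edges)
    return counts, edges


# A straight line of 5 beads spaced 1 unit apart: Rg = sqrt(2), Dmax = 4.
line = [[float(k), 0.0, 0.0] for k in range(5)]
print(bead_model_rg_dmax(line))
```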
The long linker region separating the N- and C-terminal halves of the molecule allows a great deal of flexibility and could accommodate large domain movements, allowing multiple domains to act in synergy when binding ligands. This is seen for domains 3 and 4 of CCN2, which bind VEGF165 simultaneously, as reported by Inoki et al. (20), and in the study by Kubota et al. (24) using the four independent domains of CCN2. In addition to allowing flexible binding of multiple domains to ligands, the extended molecule leaves the protease-susceptible sites that link the domains accessible, which may explain why biologically active truncates are so prevalent among the CCN family (52,53). These proteases include several matrix metalloproteases (MMP-1, MMP-2, MMP-3, MMP-7, MMP-9, and MMP-13) that target the long linker region connecting the N- and C-terminal halves of the molecule, whereas other proteases such as elastase and plasmin target the shorter linkers between domains 1 and 2 and between domains 3 and 4 (30–32).
Although there is no direct structural information about the CCN proteins, it has been possible to build partial homology models of the individual domains (Fig. 4B) (1, 2). These models were built using homologous domains from other ECM proteins; they retain the structural features and conserved motifs of those templates and agree with the CD data, which suggest that the CCN proteins are composed predominantly of β-sheet and random coil. The IGFBP domain is a flat, two-lobed domain with the active site cleft between the lobes: the N-terminal lobe is primarily random coil stabilized by a ladder-like arrangement of disulfide bonds, and an anti-parallel β-sheet forms the C-terminal half. For CCN3 and CCN5, the IGFBP domain was modeled on the NMR structure of IGFBP4 (PDB 1DSP) (54). The VWC domain is related to fibronectin (55); its N-terminal half has two small β-sheets stabilized by disulfides, and its C-terminal half is stabilized by three disulfides without any major secondary structure elements. The NMR structure of the VWC domain from collagen IIa (PDB 1U5M) was used to generate the homology models (55). The TSP domain forms a three-stranded anti-parallel sheet with layers of cysteines, tryptophans, and arginines similar to that of thrombospondin (56), and the structure of the TSP domain from the malaria TRAP protein (PDB 2BBX) was used as the template for both CCN3 and CCN5 (57). Finally, the CT domain has a cysteine knot similar to that found in growth factors such as VEGF, TGF, and BMPs (58). The CT domain of the CCN proteins may differ structurally outside the cysteine knot, as only a partial model of the CCN3 CT domain could be built using the structure of TGF-β1 (PDB 1KLA) (59) as a template (1,2).
With only partial homology models and an absence of structural data, we were unable to use rigid body modeling programs such as SASREF (60); however, using molecular graphics programs, we were able to manually dock the modeled domains (1, 2) into the surface envelope produced by DAMMIN (45). In both cases, the ab initio surface envelope comfortably accommodated the Trx tag and the functional domains of CCN3 and CCN5, with each domain aligned so that its N and C termini were in approximately the correct orientation to link to the next domain. The space between the domains also left enough freedom to comfortably accommodate the unstructured linker regions between them.
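Manual docking was judged visually here. One simple numerical check that could complement it, offered as a hypothetical sketch rather than the authors' procedure, is the fraction of docked atoms that fall within a bead radius of any bead of the averaged DAMMIN envelope. Coordinates and the bead radius are assumed inputs in a common frame, and the brute-force distance matrix is only suitable for modest model sizes.

```python
import numpy as np


def fraction_inside_envelope(atom_coords, bead_coords, bead_radius):
    """Fraction of atoms lying within bead_radius of at least one envelope bead.

    Hypothetical fit metric for manually docked domains; all coordinates
    must share the same frame and units (e.g. Angstrom).
    """
    atoms = np.asarray(atom_coords, dtype=float)
    beads = np.asarray(bead_coords, dtype=float)
    # Squared distance from every atom to every bead: shape (n_atoms, n_beads).
    d2 = ((atoms[:, None, :] - beads[None, :, :]) ** 2).sum(axis=-1)
    inside = d2.min(axis=1) <= bead_radius ** 2
    return float(inside.mean())


# Toy example: one bead at the origin with radius 2; two of the four atoms are inside.
atoms = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
print(fraction_inside_envelope(atoms, [[0.0, 0.0, 0.0]], 2.0))
```

For full-size models, a k-d tree (for example `scipy.spatial.cKDTree.query`) returns nearest-bead distances without building the full atom-by-bead matrix.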
Given the high degree of homology (6) between CCN1-4 and CCN6, it is likely that this long extended flexible structure with each domain readily accessible for binding will hold true for every member of the CCN family. Although CCN5 lacks the CT domain that is responsible for many of the biological functions in CCN1-3 (9, 10, 12), CCN5 is still biologically active (13) and seems to share the same extended structure, suggesting that it is the interaction with other domains that is responsible for its biological activity and that subtle changes in the structure could be responsible for the observed activity.
Conclusion-Small angle x-ray scattering data have now given us our first insight into the structure of the full-length CCN proteins. We can now hypothesize that they exist in the ECM as long, extended scaffolds that can move and flex to allow multiple domains to bind a ligand, with these domains working in concert to pass signals on to the cell. Combined with higher resolution structural techniques, this first glimpse of the CCN structure may soon begin to answer important questions about how CCN proteins recognize their ligands. In addition, further scattering experiments on CCN-ligand complexes may allow us to observe the flexible nature of CCN binding.