Model structure of decorin and implications for collagen fibrillogenesis.

The three-dimensional structure of human decorin, a secreted proteoglycan involved in the regulation of collagen fibrillogenesis and cellular growth, has been modeled based on the crystal structure of the porcine ribonuclease inhibitor. Both proteins contain leucine-rich repeats and share 18% identical residues. This model structure of decorin has an arch shape with the single glycosaminoglycan chain and the three N-linked oligosaccharides located on the same side of the molecule. Decorin was modeled as binding to a polar sequence of collagen type I found in the d band. The inner concave surface is the appropriate size and shape to accommodate only one collagen triple helix of ∼3 nm in length. The binding of one collagen triple helix to decorin is proposed to play a major role in the formation of the staggered arrangement of collagen molecules within the microfibrils by preventing lateral fusion of collagen molecules.

Decorin belongs to a growing family of structurally related proteoglycans, grouped as the small leucine-rich proteoglycans (SLRPs), whose major functional roles include regulation of collagen fibrillogenesis, modulation of growth factor activity, and regulation of cellular growth (1). Comparison of human (2), bovine (3), avian (4), and murine (5) decorin has allowed a better understanding of its structure. The mature protein is highly conserved across species and consists of a central domain harboring ten leucine-rich repeats (LRRs) flanked by disulfide-bonded terminal sequences. The amino terminus contains a single attachment site for either chondroitin or dermatan sulfate, whereas the central domain harbors three attachment sites for N-linked oligosaccharides. The sequence homologies are most pronounced in the central domain, and it is most likely that the high affinity binding site(s) (Kd in the nanomolar range) of decorin for various fibrillar collagens, including types I and II (6), III (7), and VI (8), is located in this central region. Indeed, results from three independent laboratories that have used recombinant decorin (9), decorin peptides (10), or biglycan/decorin chimeric proteins (11) have demonstrated that LRRs 4-6 contain the high affinity binding site(s) for collagen type I. The binding of decorin to fibrillar collagen has important biological consequences in vivo, as recently demonstrated by the phenotype of mice lacking the decorin gene. In these mutant animals, disruption of the decorin gene leads to skin fragility and abnormal collagen morphology, characterized by uncontrolled lateral fusion of fibrils. Additional evidence that LRRs are potent mediators of specific homophilic and heterophilic interactions is provided by the observation that a single LRR in the nerve growth factor receptor TrkA can bind its ligand with nanomolar affinity and can mediate neurotrophin selectivity (12).
The crystal structure of porcine ribonuclease inhibitor (RI), a leucine-rich protein with structural homology to the SLRPs, has recently been elucidated at 2.5 Å resolution (13), as has the structure of its complex with RNase (14). The entire RI is made up of 15 tandem LRRs with alternating short β-strands and α-helices. These units form a superhelix and are arranged so that the β-strands and α-helices are parallel to a common axis, thereby forming a horseshoe structure with the β-sheets lining the concave surface of the molecule and the α-helices flanking the convex face (13). This unusual shape, together with the vast exposure of the concave face to the solvent and its conformational flexibility, may explain why LRRs are utilized to achieve strong protein-protein interactions (1,14).
Comparative modeling approaches capable of predicting the three-dimensional structure of a given polypeptide from sequence similarities with known structures have been previously used to investigate ligand binding to specific proteins and receptors, to solve experimental structures in solution or in the crystal state, and to design protein engineering experiments (15). In the present investigation, we modeled the three-dimensional structure of human decorin based on the crystal structure of the porcine RI using the molecular mechanics method AMMP, which uses all atoms without a distance cut-off (16). The decorin model is arch-shaped, and the concave surface can accommodate one triple helix of collagen type I. Decorin proteoglycan is proposed to bind in the 0.6-D (1 D = 67 nm) gap occurring between staggered collagen molecules; this arrangement could account for the proposed binding sites and the functional role of decorin in regulating collagen fibrillogenesis.

EXPERIMENTAL PROCEDURES
The crystal structure of the RI (13) was used to build the model of decorin. The alignment of the amino acid sequences of human decorin and the RI was made by first aligning the LRRs. Then insertions and deletions were positioned on the basis of the secondary and tertiary structure of the ribonuclease inhibitor. The program AMMP (16) was run on a Dell PC (133 MHz Pentium) in Windows to build and minimize the model of decorin. The Windows 95 implementation of AMMP is comparable in speed to a Unix workstation. The procedure has been optimized so that minimization of the crystal structure of a protein results in atomic positions that are within the experimental errors in protein crystal structures (17,18). The molecular mechanics calculations used the UFF potential set with modified atomic parameters as listed previously (18). The force constants for the planarity terms of the carbonyl and aromatic carbon atoms were increased from 6 kcal/(mol-Å) to 150 and 100 kcal/(mol-Å), respectively. This change significantly improved the agreement between the calculated normal mode frequencies and observed IR absorptions for formaldehyde and benzene. Interatomic distance restraints were generated for the backbone and side chain atoms expected to have similar positions in the RI and decorin from the sequence alignment. These restraints were expressed as a split harmonic potential similar to that used for nuclear Overhauser effect data. The inserted residues were built with ideal geometry but not in self-avoiding positions with conjugate gradients. Then the decorin molecule was optimized using four-dimensional embedding as described before (17). Two embedding passes were used, first with the distance restraints and then without. The output of the last pass was then further optimized with conjugate gradients to improve the final geometry. 
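The distance restraints described above can be illustrated with a short sketch. The form shown here, with zero penalty inside a lower/upper distance window and a harmonic penalty outside it with a separate force constant on each side, is one common reading of a "split harmonic" restraint of the kind used for nuclear Overhauser effect data. The function name, the bounds, and the force constants are illustrative assumptions and are not taken from AMMP.

```python
def split_harmonic(d, lower, upper, k_low=1.0, k_high=1.0):
    """Penalty for an interatomic distance d (in Å).

    Hypothetical illustration, not the AMMP implementation:
    no penalty while lower <= d <= upper, and a harmonic penalty
    outside that window with independent force constants.
    """
    if d < lower:
        # Atoms closer than the template allows.
        return k_low * (lower - d) ** 2
    if d > upper:
        # Atoms farther apart than the template allows.
        return k_high * (d - upper) ** 2
    return 0.0


# Example: a restraint window of 4-6 Å around a template distance of 5 Å.
inside = split_harmonic(5.0, 4.0, 6.0)
too_close = split_harmonic(3.0, 4.0, 6.0)
too_far = split_harmonic(8.0, 4.0, 6.0, k_high=2.0)
```

Separate force constants on the two sides allow a restraint to tolerate atoms that are too far apart more (or less) readily than atoms that are too close, which is the usual motivation for the split form.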
A model for collagen type I (19) was modified to have the sequence of residues 174-186, KGEAGPEGARGPE, because this region is the center of the decorin binding site near the amino terminus of collagen. The collagen triple helix structure was optimized as described previously (17,18) and was manually positioned within the concave surface of the decorin model. Rigid body searches of the collagen were made to locate the minimum energy complex with decorin. The rigid body searches were performed with the quaternion/simplex search in AMMP. Finally, the decorin-collagen complex was minimized. Several complexes of similar energy were obtained with the collagen triple helix threaded through the decorin "arch" shape. The complex in which the collagen interacted with LRRs 3 and 4 of decorin was chosen because this position agrees with other experimental data.
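In a rigid body search of this kind, each trial pose of the ligand is commonly described by a quaternion (orientation) and a translation vector, and an optimizer such as the simplex method varies those seven numbers. The sketch below shows only how such a pose would be applied to a set of coordinates. The energy function and the simplex step are omitted, and all names are hypothetical and do not reflect AMMP's internal code.

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotation_matrix(q):
    """Return the 3x3 rotation matrix for quaternion q = (w, x, y, z)."""
    w, x, y, z = normalize(q)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)),
        (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)),
        (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)),
    )


def apply_pose(coords, q, t):
    """Rotate every point by q, then translate by t (rigid body move)."""
    r = rotation_matrix(q)
    return [
        tuple(sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in coords
    ]


# Example: rotate one point by 90 degrees about the z axis, no translation.
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
moved = apply_pose([(1.0, 0.0, 0.0)], quarter_turn_z, (0.0, 0.0, 0.0))
```

Normalizing the quaternion inside `rotation_matrix` keeps the move a pure rotation even when an optimizer proposes components that are not of unit length.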

RESULTS AND DISCUSSION
Model Structure of Human Decorin-The structure of human decorin has been predicted by analogy to the crystal structure of the porcine RI (13). The structural motif of the LRR in the RI consists of 27-29 residues that form a β-strand connected to an α-helix. The 15 LRRs of the RI form a horseshoe-shaped structure with the inner concave surface formed by curved β-sheets and the outer convex surface by the α-helices. The leucine-rich repeats of decorin are shorter than in the RI, with an average length of 24 residues. Therefore, the β/α structural motif was shortened by reducing the length of the helix and the connecting region by 4-6 residues (Fig. 1). Similar deletions were used to build a model of the LRRs of the thyrotropin receptor (20).
The model structure of decorin consists of an arch shape with an inner concave surface formed by the curved β-sheet and the outer convex surface formed by α-helices (Fig. 2, A and B) as in the crystal structure of the RI (13). The overall dimensions are 6.5 nm (the distance between the two arms) × 4.5 nm (the distance between the base of the arch and the apex) × 3 nm (overall thickness). In support of our model, a preparation of mixed SLRPs from bovine sclera, containing decorin and fibromodulin/lumican proteoglycans, when examined by rotary shadowing electron microscopy, revealed horseshoe-shaped images (21), resembling our arch-shaped model. As expected for dry preparations of proteoglycans, the dimensions were somewhat smaller (6 × 3.8 × 2 nm) than those predicted by our model. The RI dimensions are larger (7 × 6.2 × 3.2 nm) than those of decorin, as predicted from the greater number of LRRs.
The α-helical structure in the decorin model is not fully conserved in the LRR3 helical region, because decorin has two proline residues. However, the RI also has proline in the middle of two helices. The conformation of regions with insertions or deletions is very difficult to predict accurately (17,22,23), but the overall architecture of proteins and common ligand binding sites can often be predicted with relatively high accuracy (17,22). Fortunately, the conserved residues of the LRR are located in the β-strands and amino terminus of the helix, and their conformation is essentially unchanged in this model structure of decorin. These results support the view that other SLRPs would fold in a manner similar to porcine RI and human decorin. This is likely due to the relatively large number of tandem repeats of leucine-rich modules, which would force the two parallel sets of α-helices and β-strands into a curved shape. Our model of decorin forms a structure somewhat more open than the horseshoe of the RI, which would allow a greater access of ligand proteins to the inner concave surface and thus a greater capacity of forming favorable contact points with other proteins.
Interaction of a Collagen Triple Helix with Decorin-The overall shape of the decorin molecule immediately suggested that collagen molecules would bind on the inner concave surface. It was clear that only a single collagen triple helix would be able to fit well inside the curved surface of decorin (Fig. 2). The inner diameter of the decorin model is about 2.5 nm, which provides a good complementary surface for the 1.5 nm diameter rod formed by the collagen triple helix (24). Interestingly, there are several polar and charged side chains exposed on the concave surface of decorin (Fig. 2C), and therefore, decorin would be expected to bind preferentially to a more polar region of the collagen molecule. Although the binding of decorin to the d band of collagen has been proposed to occur near its carboxyl end, at residues 886-891 of collagen (21,25), a recent study using isolated collagen molecules and purified decorin has shown convincingly that the binding occurs preferentially near the amino-terminal end of collagen (26). The precise mapping of the decorin-binding site was possible because procollagen molecules harboring a globular C-propeptide were also tested and revealed the molecular polarity. Specifically, the major binding site, which accounts for ~60% of decorin/collagen associations, occurs between collagen residues 158 and 198 and corresponds exactly to the d band of collagen quarter stagger (26). This region of collagen has many charged residues, and the central stretch of 13 amino acid residues (residues 174-186, KGEAGPEGARGPE), which harbors 2 basic and 3 acidic residues, was selected for modeling. Thus, the collagen triple helix will have 6 basic and 9 acidic residues in this region (Fig. 2D). Different models of collagen suggested that different combinations of these charged residues would form salt bridges that interconnect the three strands in the triple helix.
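The residue counts quoted for the modeled collagen segment can be checked with a few lines of code. This sketch assumes that Lys and Arg are counted as basic and Asp and Glu as acidic (histidine is not treated as charged), and that all three chains of the modeled triple helix carry the same 13-residue sequence, as the totals of 6 and 9 in the text imply. The variable and function names are illustrative.

```python
PEPTIDE = "KGEAGPEGARGPE"  # collagen residues 174-186, from the text
BASIC = set("KR")          # assumption: Lys and Arg only
ACIDIC = set("DE")         # assumption: Asp and Glu only


def charge_counts(seq):
    """Return (number of basic residues, number of acidic residues)."""
    basic = sum(1 for aa in seq if aa in BASIC)
    acidic = sum(1 for aa in seq if aa in ACIDIC)
    return basic, acidic


basic, acidic = charge_counts(PEPTIDE)

# Three identical chains in the modeled triple helix (assumption).
triple_helix_counts = (3 * basic, 3 * acidic)
```

With these definitions the single chain gives 2 basic and 3 acidic residues, and the triple helix gives 6 and 9, matching the numbers stated above.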
This region of collagen is predicted to form ionic and polar interactions with the residues on the concave surface of decorin. Decorin has 10 basic and 11 acidic residues facing the inner cavity of the C-shaped molecule (Fig. 2C). Accordingly, complementarity of charge would enhance binding between collagen and decorin. The model shows collagen interacting most closely with LRRs 3 and 4 (Fig. 2, E and F), in agreement with experimental data based on direct interaction between decorin peptides or decorin/biglycan chimeric proteins and collagen type I (10,11). This predicted decorin-binding sequence, located in the d band of collagen type I, is also present with somewhat lower fidelity in collagen types II, III, and V and is highly conserved in murine, avian, bovine, and human collagens. This is an important point insofar as decorin association with the d band of fibrillar collagens occurs in all species so far examined (27).
An interesting feature of the decorin model is that all the carbohydrate moieties are positioned on one side of the arch-shaped molecule (Fig. 2E). The single dermatan/chondroitin sulfate chain located on Ser7 is at one edge of the decorin molecule, so that the glycosaminoglycan side chain is relatively free to protrude away from decorin in different directions. That is, the linear, highly charged glycosaminoglycan chain can align orthogonally or parallel to the major axis of the collagen fibril, as previously observed in ultrastructural studies using cationic dyes (28). The three N-linked oligosaccharides, at Asn184, Asn228, and Asn275, all lie on the same surface of decorin, so that the glycosylation will increase the thickness of the arch-shaped decorin molecule.
Fig. 2 (legend fragment): ... (26). Basic residues are in blue, acidic residues are in red, and the other residues are in yellow. E is a model of the decorin complexed with a triple helix of collagen in a space-filling representation with decorin in green and collagen in yellow. The glycosaminoglycan attachment site on Ser7 is in orange, and the N-linked oligosaccharide attachment sites are in purple. F is a model of the decorin/collagen complex in a view perpendicular to that of E. About 10 residues of collagen fit inside the cavity of the decorin molecule.
Decorin Could Act as a Spacer During Lateral Assembly of Collagen Molecules-The collagen molecule consists of a triple helical arrangement that forms a 300-nm-long flexible rod, ~1.5 nm in diameter (24,29). There is ample evidence that three α-chains with a repeating Gly-X-Y sequence staggered by one residue relative to one another are required to make a collagen molecule (29). An individual α-chain is not stable as a helix. These collagen molecules aggregate in parallel into a fibril, a D-staggered array (1 D = 67 nm, or 234-residue spacings) that overlaps about three-quarters of the length of the molecule (Fig. 3). This aggregation gives rise to the characteristic D periodicity of fibrils in which each D period is subdivisible into an overlap zone and a more loosely packed gap zone (26). The triple helix molecules are in contact with each other for about three-quarters of their length with a staggered spacing (gap zone) of 0.6 D, ~40 nm, between molecules. Type I collagen triple helices are further stabilized by cross-linking of their lysine residues to produce the four-dimensional stagger (29). The decorin molecule is proposed to enclose one triple helix of collagen in the gap between molecules at the d band, located at ~0.8 D from the amino terminus. Decorin binding to a fibril of five collagen triple helices is shown in Fig. 3. A secondary binding site, located at ~1.6 D from the amino terminus (26), could also interact with the decorin molecule, thus further stabilizing this interaction. Only one triple helix would fit inside the arch-shaped decorin, and the fibril could not form with continuous molecules at this position because decorin would block the aggregation of the collagen molecules. Decorin is proposed to bind with the major collagen-binding site at the concave surface and contact the secondary site on an adjacent collagen molecule with one of the terminal ends, perhaps via its carboxyl end as proposed before (9,10). In any event, the presence of decorin at these strategic sites would assist in the correct positioning of the collagen molecules within the staggered conformation of the fibril. Interactions with other SLRPs and enzymatic cross-linking of triple helices would represent additional aids for proper collagen fibrillogenesis in vivo.
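The D-period arithmetic behind the spacer argument can be reproduced directly from the constants given in the text (1 D = 67 nm, or 234 residues; molecule length 300 nm). The sketch below assumes that residue numbers are counted from the amino-terminal end of the triple helix and that the axial rise per residue is uniform. The function names are illustrative.

```python
D_NM = 67.0           # one D period in nm (from the text)
RESIDUES_PER_D = 234  # residues spanned by one D period (from the text)
MOLECULE_NM = 300.0   # length of the collagen molecule in nm (from the text)


def d_to_nm(d_units):
    """Convert a length in D periods to nanometers."""
    return d_units * D_NM


def residue_to_d(residue_number):
    """Axial position of a residue in D periods (assumes uniform rise)."""
    return residue_number / RESIDUES_PER_D


def nm_to_residues(length_nm):
    """Number of residues spanning a given axial length."""
    return length_nm * RESIDUES_PER_D / D_NM


gap_nm = d_to_nm(0.6)                  # gap zone length
molecule_in_d = MOLECULE_NM / D_NM     # molecule length in D periods
site_start_d = residue_to_d(174)       # start of the modeled binding stretch
site_end_d = residue_to_d(186)         # end of the modeled binding stretch
residues_in_arch = nm_to_residues(3.0) # residues covered by ~3 nm of helix
```

Under these assumptions the gap zone is about 40 nm, the molecule spans roughly 4.5 D, residues 174-186 fall at about 0.74-0.79 D (consistent with the ~0.8 D position of the d band), and a ~3 nm stretch of helix corresponds to about 10 residues.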
In conclusion, the model of the decorin/collagen complex predicts a tight fit between these two matrix proteins at the molecular level and provides a molecular explanation and a refined topology for the observed decorin-binding sites on collagen. The binding of decorin to one collagen triple helix within the gap zone and a second collagen molecule in the staggered arrangement will promote formation of the correct fibril and prevent incorrect addition/fusion of collagen in the gap.
Fig. 3 (legend fragment): Decorin is the shaded vertical rectangle and is proposed to interact with a primary sequence located at ~0.8 D from the amino-terminal end (N) of collagen type I (26). The model shows the possibility of decorin interacting also with a secondary site located at ~1.6 D from the amino terminus. The decorin-binding sites are indicated by solid rectangles on the collagen. The gap and overlap are approximately 0.6 and 0.4 D, respectively. The bottom panel is a schematic cross-sectional view of decorin interacting with two collagen triple helices (solid circles) within a fibril. One collagen molecule lies within the decorin arch and another interacts with one arm of the arch. The dotted circle represents the gap region.