NMR Shows Hydrophobic Interactions Replace Glycine Packing in the Triple Helix at a Natural Break in the (Gly-X-Y)n Repeat

Little is known about the structural consequences of the more than 20 breaks in the (Gly-X-Y)n repeating sequence found in the long triple helix domain of basement membrane type IV collagen. NMR triple resonance studies of doubly labeled residues within a set of collagen model peptides provide distance and dihedral angle restraints that allow determination of model structures of both a standard triple helix and of a triple helix with a break in solution. Although the standard triple helix cannot continue when Gly is not every third residue, the NMR data support rod-like molecules that have standard triple-helical structures on both sides of a well defined and highly localized perturbation. The GAAVM break region may be described as a “pseudo triple helix,” because it preserves the standard one-residue stagger of the triple helix but introduces hydrophobic interactions at the position normally occupied by the much smaller and hydrogen-bonded Gly residue of the repeating (Gly-X-Y)n sequence. This structure provides a rationale for the consensus presence of hydrophobic residues in breaks of similar length and defines a novel variant of a triple helix that could be involved in recognition.

Although the collagen triple helix can be considered as one of the most well defined protein motifs, there are still surprises and undefined molecular features. All non-fibrillar collagens contain sites where the repeating Gly-X-Y pattern is interrupted, and evidence suggests that these breaks are of functional importance, playing a role in molecular or higher order structure, or serving as recognition sites for interactions (1,2). The structural consequences of a break for the triple helix are not understood.
The unique triple helix protein motif found in collagen has three extended polyproline II-like chains supercoiled about a common axis. The typical triple helix is characterized by a (Gly-X-Y)n repeating sequence. The three chains are staggered by one residue, and Gly residues must be present as every third residue in each chain so that the three chains can pack very tightly, burying the Gly residues and forming Gly N-H...C=O (X) hydrogen bonds (3-5). The abundant fibrillar collagens, e.g. types I, II, and III, maintain this precise Gly-X-Y repeat throughout their ~1000-residue triple helix domain, and the replacement of even one Gly by another residue results in a pathological condition (1). But many non-fibrillar collagens have been identified, and these all contain one or more breaks in the repeating Gly-X-Y pattern (1). The non-fibrillar type IV collagen is found in basement membranes of all multicellular animals, and its network architecture contributes to the essential function of basement membranes in providing mechanical support to cells, serving as semi-permeable barriers between tissues, and providing signals for differentiation (6-9). In type IV collagen, there are more than 20 breaks in the ~1350-residue-long triple helix of the form Gly-X-Y-(AA)n-Gly-X-Y, and the lengths of the breaks range from common short breaks where n = 1, to long breaks where n = 26 (10,11). To begin to define the effects that small breaks have on triple helix structure, NMR studies are presented here on a model peptide of a type IV collagen break where n = 4.
Peptide models can provide an approach to characterizing the effects of breaks on triple helix structure and stability (12,13). Our laboratories recently reported that a 30-mer homotrimer peptide model for a natural break found in the α5(IV) chain of type IV collagen, GPOGAAVMGPOGPO (residues 386-399, where O is used as a single letter code for hydroxyproline (Hyp)), forms a stable trimeric structure, with a relatively small decrease in stability compared with the triple helical structure of a homologous peptide with Gly as every third residue, GPOGAAGVMGPO (13). The classic triple helix structure cannot continue when Gly is not present as every third residue, and biophysical studies indicate that the break decreases the triple helix content, destabilizes the triple helix by 10°C, and reduces the enthalpy substantially. Although NMR experiments reveal a non-random conformation within the break, it was not clear whether there is a way to continue a modified triple helix through this GAAVM region, or whether a new local structure, such as a β-bend, is introduced.
Here NMR triple resonance experiments are presented on a set of doubly labeled peptides with and without the GAAVM break (Table 1), allowing the assignment and tracking of individual chains, the identification of intra- and intermolecular NOEs, and the determination of dihedral angles through ³J(HNHα) coupling constants. These NMR parameters are used as restraints in energy minimization of molecular models derived from high resolution x-ray structures of several homologous peptides, to allow the first visualization of the solution molecular conformation of a standard triple helix and of a triple helix with a naturally occurring type IV collagen break. The break in the type IV collagen sequence leads to a highly localized distortion of the triple helix. The GAAVM region has a well defined conformation, which maintains the one-residue stagger but introduces hydrophobic packing of three Val residues at the center of the molecule, which would be normally occupied by Gly residues.

EXPERIMENTAL PROCEDURES
Sample Preparation-Peptides Ac-(GPO)4GAAGVM(GPO)4GY-CONH2, designated as the GAAGVM peptide, and Ac-(GPO)4GAAVM(GPO)4GY-CONH2, designated as the GAAVM peptide, were synthesized by Tufts University Core Facility (Boston, MA), as previously described (13). The peptides were made with selectively ¹³C/¹⁵N doubly labeled residues (underlined below): residues Gly13, Ala14, Ala15, Gly16, Val17, and Gly25 were labeled in peptide GAAGVM; residues Gly13, Ala14, Ala15, Val16, and Gly24 were labeled in the peptide GAAVM. Peptides were purified using a Waters XTerra Prep C18 column on an Amersham Biosciences fast-protein liquid chromatography system, and the identity of the peptides was confirmed by matrix-assisted laser desorption ionization mass spectrometry.
(22-24) with mixing times of 30-80 ms were performed at different temperatures from 15-25°C. Short mixing times (30 ms) in NOESY-HSQC were used to eliminate spin diffusion, and the data at various temperatures were used to help resolve overlapped resonances. Three-dimensional HNHA experiments (25) were performed to measure homonuclear ³J(HNHα) coupling constants at 25°C, with an H-H coupling period of 25 ms. The correction factor for the ³J(HNHα) coupling constants was obtained by performing T1zz measurements (26) of amide protons on a ¹⁵N-labeled GAAVM sample at 25°C. All data were processed using the FELIX 2004 software package (MSI, San Diego, CA) and/or NMRPipe (39) and analyzed with FELIX 2004 or NMRView (27). In the acquisition dimensions for all experiments, a solvent suppression filter was applied to the data prior to apodization with a 90° sine-bell window function. The data were subsequently zero-filled to 1024 complex points and Fourier transformed.
The t1 and t2 dimensions in all three-dimensional experiments were increased 1.5 times by forward-backward linear prediction (28), multiplied by a sine-bell window function, zero-filled to 256 complex points, and Fourier transformed. The final three-dimensional data for each experiment included 256 × 256 × 512 real points.
In the ³J(HNHα) experiments, the cross peak to diagonal peak volume ratios were taken to calculate the apparent coupling constants (25). Experimental error was calculated based on the experimental uncertainty in volume measurement. The experimental uncertainty was estimated as the standard deviation of the volume integration in the regions free of signals. The average value and the measurement error of the apparent ³J(HNHα) coupling constants were then calculated based on the maximum and minimum J coupling constants from the maximum and minimum volume ratio of cross/diagonal peaks with experimental uncertainty included. From the separate T1zz measurement (26) an average T1sel value of 84 ms was found for the amide protons of the ¹⁵N-labeled GAAVM peptide. Equation 1, below (29), was solved to determine accurate values of ³J(HNHα), and the correction factor of 1.16 was determined. The corrected J coupling constants of doubly labeled GAAGVM and GAAVM peptides were obtained by multiplying the apparent coupling by 1.16.
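The conversion from peak volumes to corrected coupling constants described above can be sketched as follows. This is a minimal illustration, not the authors' processing script. It assumes the usual quantitative J-correlation relation |V_cross/V_diag| = tan²(πJT), with T the H-H coupling period (taken here to be the 25 ms quoted in the text, which is an assumption about how that period is defined), and it takes the average and error as the midpoint and half-range of the J values obtained from the extreme volume ratios. All function and parameter names are hypothetical.

```python
import math

def apparent_j(ratio, period_s):
    """Apparent 3J (Hz) from |cross/diagonal| volume ratio.

    Assumes ratio = tan^2(pi * J * period_s); this relation is an
    assumption of the sketch, not quoted from the text.
    """
    return math.atan(math.sqrt(ratio)) / (math.pi * period_s)

def corrected_j(cross, diag, sigma, period_s=0.025, factor=1.16):
    """Return (corrected J, error) in Hz.

    cross, diag: peak volumes; sigma: volume uncertainty (noise std. dev.).
    The extreme ratios use +/- sigma on both volumes (requires diag > sigma).
    The factor 1.16 is the correction factor stated in the text.
    """
    cross, diag = abs(cross), abs(diag)
    r_max = (cross + sigma) / (diag - sigma)
    r_min = max(cross - sigma, 0.0) / (diag + sigma)
    j_max = apparent_j(r_max, period_s)
    j_min = apparent_j(r_min, period_s)
    j_mean = 0.5 * (j_max + j_min)   # midpoint of the extreme values
    j_err = 0.5 * (j_max - j_min)    # half-range as measurement error
    return factor * j_mean, factor * j_err
```

With zero noise, a volume ratio that corresponds to an apparent 5 Hz coupling is scaled to 5.8 Hz by the 1.16 correction factor.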

Generation of NOE Contact Map and Molecular Modeling-A computer model structure of GAAGVM was obtained based on the crystal structure of T3-785 (PDB ID: 1BKV) (30) and built using the Molecular Operating Environment 2005.06 (Chemical Computing Group Inc., Montreal, Canada). The GIT-GAR-GLA residues in T3-785 were replaced with the residues GPO-GAA-GVM. The structure was solvated with a standard MOE Water Soak procedure and energy-minimized from residue 6 to residue 21 using the Amber99 all-atom force field (31) to a root mean square gradient of 0.05. This structure was used to generate the background of the NOE map of a standard triple helix motif. Hydrogen atoms were added to the model structure of GAAGVM using the REDUCE program (32). The predicted background map was generated by calculating NH-H distances equal to or smaller than 5 Å and classifying these as NH-NH, NH-Hα, and NH-side chains (Hβ, Hγ, and Hδ). NOE contact maps for peptides GAAGVM and GAAVM were made from observed NH-H NOEs in the three-dimensional ¹H-¹⁵N NOESY-HSQC experiment and classified as NH-NH, NH-Hα, and NH-side chains (Hβ, Hγ, and Hδ).
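The distance-based prediction of the background contact map can be illustrated with a short sketch. It is not the procedure used to produce Fig. 3. It assumes protons are supplied as (chain, residue, atom name, coordinates) tuples with PDB-style names ("H" for the backbone amide proton, "HA..." for Hα, anything else treated as side chain), and the coordinates in the toy input are invented purely to exercise the code.

```python
import math

def classify(atom_name):
    """Map a partner proton name to the contact classes used in the text."""
    if atom_name == "H":
        return "NH-NH"
    if atom_name.startswith("HA"):
        return "NH-Halpha"
    return "NH-side chain"

def contact_map(protons, cutoff=5.0):
    """List predicted contacts from every amide proton to protons within cutoff (in Å).

    protons: list of (chain, residue_number, atom_name, (x, y, z)).
    Returns tuples (amide (chain, residue), partner (chain, residue, atom),
    class, distance). Residues without an amide proton (Pro, Hyp) simply
    contribute no rows.
    """
    contacts = []
    for i, (chain, res, name, xyz) in enumerate(protons):
        if name != "H":
            continue
        for k, (p_chain, p_res, p_name, p_xyz) in enumerate(protons):
            if k == i:
                continue
            d = math.dist(xyz, p_xyz)
            if d <= cutoff:
                contacts.append(((chain, res), (p_chain, p_res, p_name),
                                 classify(p_name), d))
    return contacts

# Hypothetical coordinates, for illustration only.
toy_protons = [
    (1, 13, "H", (0.0, 0.0, 0.0)),
    (2, 13, "HA2", (3.0, 0.0, 0.0)),
    (2, 13, "H", (0.0, 3.5, 0.0)),
    (1, 14, "HB1", (0.0, 0.0, 6.0)),
]
toy_contacts = contact_map(toy_protons)
```

Each amide proton defines one row of the map, so a close NH-NH pair appears twice, once from each side, which is how the asymmetry between rows can show up in such a map.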
A refined model of GAAGVM was generated by applying the angle restraints from the ³J(HNHα) values, with varying restraint ranges, in the energy minimization. Back calculation of ³J(HNHα) values and back calculation of an NOE map were used to select the model that best fit the experimental data. A model of GAAVM was generated based on the x-ray crystal structure of Hyp (PDB ID: 1EI8) (33). Residues GPO-GP were replaced with GAA-VM. The resulting model was energy-minimized with ³J(HNHα) coupling restraints similarly to the GAAGVM peptide minimization. Energy minimization is very sensitive to the initial starting structure and to the ranges of the restraints; therefore, a number of different input structures with varying restraint ranges were used. Back calculation of ³J(HNHα) values and NOEs indicated that most structures were not consistent with all of the experimental data. A representative structure that was consistent with experimental ³J(HNHα) values and all ¹H-¹H NOEs was selected for GAAGVM and GAAVM.
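The back-calculation step used to screen candidate models can be sketched as below. This is an illustrative outline under stated assumptions rather than the authors' protocol: it uses the parameterized Karplus coefficients quoted later in the paper (6.51, −1.76, 1.6), flags a residue when the back-calculated coupling differs from experiment by more than the measurement error, and uses made-up φ angles and couplings as input. The function names are hypothetical.

```python
import math

def karplus(phi_deg, a=6.51, b=-1.76, c=1.60):
    """Back-calculate 3J(HNHa) in Hz from the backbone phi angle in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

def violations(model_phi, j_exp, j_err):
    """Residues whose back-calculated J falls outside experiment +/- error.

    model_phi: {residue label: phi in degrees} taken from a candidate model.
    j_exp, j_err: experimental couplings and their errors (Hz), same keys.
    """
    return [res for res, phi in model_phi.items()
            if abs(karplus(phi) - j_exp[res]) > j_err[res]]

# Hypothetical inputs, not measured values.
j_exp = {"Ala14": 5.0}
j_err = {"Ala14": 0.5}
model_ok = {"Ala14": -67.0}
model_bad = {"Ala14": -120.0}
```

A model with no flagged residues would pass this filter and could then be checked against the NOE contact map in the same way.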

NMR Chain Assignments and Chemical Shift Differences between GAAVM and GAAGVM Peptides-For both peptides, all trimer resonances could be assigned to specific chains of the triple helix as indicated in the heteronuclear single quantum coherence (HSQC) spectrum by the superscripted number (Fig. 1). The sequential assignment is derived from the triple resonance experiments, and the chain stagger is derived from NOE experiments that define interchain interactions.
To obtain the sequence-specific assignments, a series of triple resonance experiments were performed on the control GAAGVM peptide, where each residue in the GAAGV segment of the peptide is doubly labeled (Table 1). For the GAAGVM peptide, traditional triple resonance experiments, including HNCO (16)/HN(CA)CO (17,18), HNCA (16), and HNCACB (20)/CBCA(CO)NH (21) experiments were performed (Table 2A). Unambiguous correlations were obtained for residues Gly13-Ala14, Ala15-Gly16, and Gly16-Val17. The resonances arising from the Ala14 residue were highly overlapped in the proton and carbon dimensions, which is likely to be a result of the similarity in environment of the two Ala residues in the GAAG sequence. To resolve the three Ala14-Ala15 connectivities, an additional (5,3)D HACACONHN/HACA,CONHN GFT experiment (14,15) was used and successfully sequentially correlated the chemical shifts of the ¹³C′-¹³Cα-¹Hα moieties of residue i − 1 and the NH group of residue i for Ala14-Ala15. For the GAAVM peptide, complete sequential assignments were obtained for the doubly labeled GAAV segment from a single pair of HA(CA)NH/HA(CACO)NH (19) experiments (Table 2B), because there was no chemical shift overlap in the NH or Hα dimensions. Assignments were confirmed by the HNCA (16) experiment (Table 2B). NOE experiments were used to obtain interchain distances, which led to the identification of the three chains in terms of their 1-residue stagger. For example, NOEs observed between ³Gly13 NH (chain 3) and ¹Ala15 Hα (chain 1) in GAAGVM and between ¹A14 NH and ²G13 Hα in GAAVM indicated the relationship between chains and allowed identification of the leading (chain 1), middle (chain 2), and trailing (chain 3) chains of the peptides. Chain-specific assignments of peaks set the stage for using distance and dihedral angle constraints in structure determination.
The HSQC spectrum of the GAAGVM peptide gives a typical pattern expected for the triple helix. Each labeled residue shows at least one monomer peak and one or more trimer peaks, consistent with the presence of the triple helix structure. Gly25 shows only a single trimer resonance due to the repetitive Gly-Pro-Hyp environment (13), whereas residues Gly13, Ala14, Ala15, Gly16, and Val17 all have multiple trimer peaks (two peaks for Gly16 and three peaks for all the other residues) due to the non-repeating sequence environment (13,34,35). Ala14 and Ala15 have similar chemical shifts to each other, which may reflect their similar environments in the GAAG sequence. The HSQC spectrum of the GAAVM peptide again shows three trimer peaks as well as monomer peaks for each labeled residue. This indicates that a well defined structure with three nonequivalent chains is present in the region where the Gly-X-Y pattern is broken. Gly24, within the Gly-Pro-Hyp repeating region at the C terminus, shows only a single trimer resonance at the typical triple helix position similar to Gly25 above in GAAGVM, whereas three trimer peaks are seen for residues Gly13, Ala14
Prediction of Short Range Distances in a Classic Triple Helical Conformation from the GAAGVM Model Structure-A predicted contact map in a classic triple helical conformation is generated from a GAAGVM model structure. The high resolution structures of a number of collagen-like peptides have been solved by x-ray crystallography (30, 36-38), and these structures have confirmed and provided molecular coordinates for the triple-helical structure. A model structure of the GAAGVM peptide was generated by molecular modeling using the crystal structure of the peptide T3-785 ((POG)3-ITGARGLAG-(POG)4 sequence) (30), because it contains a central sequence with no imino acids surrounded by Gly-Pro-Hyp (GPO) triplets.
The central GIT-GAR-GLA residues were replaced with the residues GPO-GAA-GVM, and the structure was energy-minimized (see "Experimental Procedures" for details). The cross-section of this minimized GAAGVM model structure shows the NH groups of the Gly residues are located near the center of the triple helix and form hydrogen bonds with the CO of the X residue of a neighboring chain (Fig. 2). The NH vectors of the Y residues point outward toward the solvent, while the NH vectors of the X residues point more tangentially toward a neighboring chain within the molecule, making a water-bridged hydrogen bond with the C=O of the Gly of the neighboring chain.
A predicted contact map is generated from the computer model structure for the GAAGVM peptide and allows computation of distances that are within 5 Å for the backbone NH to backbone NH, Hα, and side-chain protons in this structure (Fig. 3A; shaded squares indicate that there is one or more predicted contact within 5 Å, and circles indicate the individual predicted NOE). A number of salient features can be seen from the contact map and include: (a) Intrachain sequential distances (indicated in gray highlighted boxes) are not symmetric. The NH atoms of residue i (indicated by ʲNH(i), where j refers to the chain number and i refers to the residue) have a larger number of NOEs to the NH, Hα, and side chains of residue i − 1 than to the NH, Hα, and side chains of residue i + 1 (individual predicted NOEs are indicated by circles, which are superimposed on the gray highlighted boxes). (b) Interchain short distances directly across the chain (indicated in yellow highlighted boxes) can be seen for all residues primarily from ʲNH(i) to ʲ⁻¹NH(i + 1) or to ʲ⁻¹Hα(i + 1). NOEs are also predicted for the ʲNH(i) to ʲ⁺¹NH(i − 1). These short distances arise directly from the one residue stagger of the triple helix. (c) There are additional interchain contacts between Gly residues, where the corresponding Gly residues in the three chains are packed in the center in a staggered manner (Fig. 2B). This staggered packing results in additional interchain ʲNH(i) to ʲ⁺¹NH(i) and ʲ⁺¹Hα(i), and ʲNH(i) to ʲ⁻¹NH(i) and ʲ⁻¹Hα(i) contacts. Additionally, unique NOEs between Gly ʲNH(i) to ʲ⁻²NH(i + 3) and ʲ⁻²Hα(i + 3), and between Gly ʲNH(i) to ʲ⁺²NH(i − 3) and ʲ⁺²Hα(i − 3) are predicted and would correspond to NOEs between ³Gly13 to ¹Gly16, and ¹Gly16 to ³Gly13.

TABLE 2 (table body not recovered; surviving column headings: HA(CA)NH/HA(CACO)NH, HNCA). Footnotes: a, In GFT (5,3)D experiments 1-4 represent CO+CA+HA, CO−CA+HA, CO+CA−HA, and CO−CA−HA, respectively. b, Sequential correlations that can be obtained from the experiment are marked with "X."
Experimental NOE Observations in GAAVM and GAAGVM-A contact map (Fig. 3B) was generated for the experimental NOE data of the GAAGVM peptide, now using circles to denote the experimental values and highlighted squares of intramolecular (gray) and intermolecular (yellow) NOEs to represent predictions from the GAAGVM model structure (Fig. 3A). The expected intrachain NOEs and some additional interchain NOEs consistent with the close packing of the central Gly and the 1-residue stagger between chains ( 2 G 13 NH- 3 16 , the interchain NOEs were not observed. Despite the overlap it is possible to determine the triple helix stagger from the interchain NOEs and to confirm that the solution conformation of the GAAGVM peptide is consistent with a typical triple helix similar to the model structure. A contact map diagram was constructed for the experimental NOE data for GAAVM to compare the solution conformation of the peptide with the break to the predicted intra- and intermolecular contacts in the GAAGVM model, which are again represented as highlighted squares (Fig. 3C, highlighted boxes are taken from Fig. 3A). The experimental NOEs of GAAVM, represented as circles, were placed on the background of the predicted NOEs from the GAAGVM model and show a 1-residue stagger of the triple helix throughout the GAAVM break even though a standard triple helix is not possible due to the absence of the Gly residue. The diagnostic interchain NH-NH and NH-Hα NOEs between the three chains of the Gly13 residues (¹G13 NH to ²G13 NH, ¹G13 NH to ²G13 Hα, ²G13 NH to ¹G13 Hα, and ²G13 NH to ³G13 NH) are seen as well as interchain backbone NOEs from ¹Ala14 to ²Gly13 and ²Ala15 to ³Ala14 supporting the one residue stagger. In addition, new interchain NOEs between the three chains of the Val16 residues at the interruption site as well as between the Val16 residues and the backbone protons of residues Gly13 and Ala14 are observed in the GAAVM peptide.
These new NOEs observed for Val16 at the break are similar to the predicted NOEs for Gly16 in the Gly-X-Y repeating sequence of the GAAGVM model. This suggests that the amide group of Val16 is packing into the center of the triple helix in this break region, attempting to mimic the features of the much smaller Gly residue found at the same position as the Val in the GAAGVM peptide.
Determination of J Coupling Constants-Additional NMR measurements were carried out to obtain ³J(HNHα) coupling constants for both the GAAGVM and GAAVM peptides to compare the range of dihedral angles that can be adopted in the break with those in a standard triple helix with a Gly-X-Y repeat (Fig. 4). For GAAGVM the ³J(HNHα) coupling constant values of Ala14, Ala15, and one of the Hα of Gly13, Gly16, and Gly24 are relatively uniform in the range of 4-5 Hz. Val appears to have a slightly larger value in the range of 6-7 Hz. Typically ³J(HNHα) values of ~5 Hz pose a multiple minimum problem, and experimental determination of a 5-Hz coupling constant generates four possible φ angles from the parameterized Karplus equation (25), ³J(HNHα) = 6.51 cos²(φ − 60°) − 1.76 cos(φ − 60°) + 1.6. One of the solution sets gives φ values of ~−67°, which is consistent with average Gly φ dihedral angles of −69.1° for T3-785, suggesting that the angles for the labeled residues in the GAAGVM peptide are consistent with the angles found in the polyproline II (PPII) conformation (30). In contrast, the GAAVM peptide does not have uniform coupling constants along the entire peptide chain; values of Gly13 and Ala14 as well as the value of Gly24 at the C-terminal end are very similar to those in GAAGVM. The coupling constant values of Ala15 and Val16 in the interrupted peptide are now in the range of 8.5-9.5
Molecular Modeling of the GAAGVM and GAAVM Peptides-The angle restraints from ³J(HNHα) measurements were used to refine the model of GAAGVM derived from an x-ray crystal structure as described above. The refined model did not differ significantly from the original model, indicating that the original model derived from x-ray structure is in good agreement with NMR (Fig. 5, A and B). In the GAAGVM model, all φ, ψ angles fall into the PPII region of the Ramachandran plot, consistent with a well-behaved triple helix (Fig. 5C).
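The multiple-minimum problem noted for the Karplus relation in the J coupling section can be made concrete with a short numerical sketch. It treats the Karplus expression as a quadratic in cos(φ − 60°) and returns every φ in (−180°, 180°] compatible with a given coupling. The coefficients are those quoted in the text; the function name and the rounding to 0.1° are choices made for this illustration only.

```python
import math

def phi_solutions(j_hz, a=6.51, b=-1.76, c=1.60):
    """All phi angles (degrees) satisfying J = a*cos^2(t) + b*cos(t) + c, t = phi - 60."""
    disc = b * b - 4.0 * a * (c - j_hz)
    if disc < 0:
        return []  # coupling lies outside the range the curve can reach
    solutions = set()
    for sign in (1.0, -1.0):
        x = (-b + sign * math.sqrt(disc)) / (2.0 * a)  # candidate cos(t)
        if -1.0 <= x <= 1.0:
            theta = math.degrees(math.acos(x))
            for t in (theta, -theta):                  # cos is even in t
                phi = (t + 60.0 + 180.0) % 360.0 - 180.0
                solutions.add(round(phi, 1))
    return sorted(solutions)

five_hz = phi_solutions(5.0)
```

For a 5-Hz coupling the sketch yields four candidate angles, one of them close to −67°, in line with the statement in the text; distinguishing among the candidates requires additional information such as the NOE pattern or a starting structure.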
The agreement between the model obtained from an x-ray crystal structure and the NMR solution data for the GAAGVM peptide established a strategy to use NMR data to define the conformation in solution for a GAAVM break. For this peptide, a model was generated based on a recent x-ray structure with a break in the Gly-X-Y repeating sequence (33). Our peptide differs from the peptide solved by x-ray in having a 4-residue break compared with a 1-residue break and in having a natural type IV collagen sequence compared with the repeating (POG)n sequence. The central GPOGP sequence of the crystal structure peptide (POG)2-POGPOGPG-(POG)5 was replaced by GAAVM, and the structure was energy-minimized incorporating angle restraints from experimental GAAVM NMR ³J(HNHα) couplings. The structure obtained gives good agreement with NOE NMR data.
The principal features of the model are that it preserves the 1-residue stagger between the 3 chains and introduces close packing of Val near the central axis (Fig. 5, D and E). However, the larger Val residues are unable to fit into the center like the small Gly residues. The three Val residues form a small localized hydrophobic core in GAAVM (Fig. 5E). As a result of localized changes in the dihedral angles of eight residues (Fig. 5F), the three Val can be placed in this central location, maintaining the rodlike structure without creating a bulge. The GAAVM region may be denoted as a "pseudo-triple helix," because it retains the 1-residue stagger and the close packing of the 3 chains and differs only in the dihedral angles of a small number of residues. The absence of one residue in GAAVM versus GAAGVM creates a loss of axial register of the triple helix on both sides of the break, as seen in the crystal structure for peptides with breaks (5,33).
The Ramachandran plot of the GAAVM model indicates a local disruption of the PPII dihedral angles relative to the model of GAAGVM. The Val16 dihedral angles are the most perturbed with all three chains falling outside of the PPII range (Fig. 5F). In addition, the dihedral angles of ²Ala15 and ³Ala15 (just N-terminal to Val16), ²Ala14, ¹Met17, and ¹Gly18 all are outside the PPII region. The highly localized nature of the perturbation is recognized by the observation that only 8 residues in the 3 chains are altered from standard collagen dihedral angles, suggesting that the well defined deformation in structure is primarily absorbed by a subset of residues in the break so that one can optimize the extent of normal triple helix structure on both sides of the break.
A number of lines of argument support this GAAVM model obtained by molecular modeling with NMR J-coupling restraints. Back calculation of the ³J(HNHα) and NMR distances from the model shown in Fig. 5D and comparison to the experimentally observed ³J(HNHα) and distances indicate that the proposed model has a good fit to the experimental data (data not shown). Other models tested did not show good agreement with the experimental data. Therefore, although the GAAVM model is not unique, it provides a representative example of the structure that can be adopted by the peptide containing the break that fits the experimental data, and, as described below, it explains the observed consensus sequence of 4-residue breaks. The NMR data on the GAAVM peptide in solution are consistent with the rod-like nature of the triple helices and the highly localized nature of the structural perturbation of previous x-ray structures on two peptides with small breaks (5,33). The increased conformational flexibility at the break site reported for the crystal structure is consistent with the faster hydrogen exchange rates at Gly13 in GAAGVM versus GAAVM (Ref. 13 and data not shown).

FIGURE 5 (legend, partial). B and E, space filling model of the cross-section view from the N terminus to the C terminus of GAAGVM (B) and GAAVM (E). Views show that the GAAVM region adopts a "pseudo-triple helix," which preserves the 1-residue stagger between the 3 chains and introduces close packing and hydrophobic interactions of the three Val residues at the site of the break. Residues are color-coded similarly to the enlarged ribbon diagrams. The AGV segment in GAAGVM is colored as Ala15 (dark blue), Gly16 (yellow), and Val17 (red), and the AVM segment in GAAVM is colored as Ala15 (dark blue), Val16 (red), and Met17 (light blue). C and F, Ramachandran plots for model structures of GAAGVM (C) and GAAVM (F). To highlight the central region of the peptides, only residues 6-21 in GAAGVM and residues 6-17 in GAAVM are plotted and shown in black triangles. Ramachandran contour map for Pro residues is shown in the background, and typical secondary structures are indicated (α, α-helix; β, β-sheet; and PPII, polyproline II and collagen).
Biological Implications-The GAAVM break studied here represents one of the most common types of break in type IV collagen and other non-fibrillar collagens, with 4 non-Gly residues in a row, Gly-X-Y-Gly-AA1-AA2-AA3-AA4-Gly-X-Y. The consensus pattern for such breaks (denoted as G4G) includes a small residue at the AA2 position and a hydrophobic residue at the AA3 position (13). The GAAVM break contains the consensus sequence, with an Ala residue at the AA2 position and a Val residue at the AA3 position. The model structure derived from NMR data suggests that the placement of a hydrophobic residue at the AA3 position of the break promotes novel hydrophobic packing near the central axis of the superhelix, whereas the presence of a smaller AA2 residue is required just before the hydrophobic residue to permit the distortion required for this hydrophobic packing. The presence of a small hydrophobic core formed by the Val residues at position AA3 suggests that hydrophobic stabilization can partly compensate for the Gly packing normally found at the same position and helps explain why this and similar breaks can be present in non-fibrillar collagens with melting temperatures similar to those of fibrillar collagens.
The structural consequences of the GAAVM break serve as a prototype for a typical break of this kind in non-fibrillar collagens, which are homotrimers, such as type VII collagen in anchoring fibrils and type X collagen in hypertrophic cartilage (1). The G4G breaks in these homotrimer collagens are predicted to resemble the locally perturbed triple helix structure reported here for the model peptides. Other non-fibrillar collagens, such as type IV collagen in basement membranes, are heterotrimeric, consisting of two or three distinct chain types. For type IV collagen molecules, a G4G break in one chain is sometimes opposite a G4G break in the other chains, giving a structure similar to that seen in the model peptide. However, in most cases the G4G break in one type IV collagen chain is opposite another kind of break or an uninterrupted sequence in the other chains, and it is not clear whether such "mixed breaks" will have larger or different effects on the triple helix structure.