Characterization of Collagenous Peptides Bound to Lysyl Hydroxylase Isoforms*

Lysyl hydroxylase (LH, EC 1.14.11.4) is the enzyme catalyzing the formation of hydroxylysyl residues in collagens and other proteins with collagenous domains. Although lower species, such as Caenorhabditis elegans, have only one LH orthologue, LH activity in higher species, such as human, rat, and mouse, is present in three molecules, LH1, LH2, and LH3, encoded by three different genes. In addition, LH2 is present in two alternatively spliced forms (LH2a, LH2b). To understand the functions of the four molecular forms of LH in vertebrates, we analyzed differences in the binding and hydroxylation of various collagenous peptides by the LH isoforms. Synthetic peptides nine amino acids long, immobilized on Pepspots, were used for the binding analysis, and an activity assay was used to measure hydroxylation. Our data with 727 collagenous peptides indicated that a positive charge on the peptide and specific amino acid residues in close proximity to the lysyl residues in the collagenous sequences are the key factors promoting peptide binding to the LH isoforms. The data suggest that the LH binding site is not a deep hydrophobic pocket but is open and hydrophilic, with acidic amino acids playing an important role in the binding. The data do not indicate strict sequence specificity for the LH isoforms but do show a clear preference for some sequences to be bound and hydroxylated by a certain isoform.

Hydroxylysyl residues are post-translational modifications of proteins found mainly in collagens, the components of connective tissues (1-5). Hydroxylysyl residues are found in the Y positions of repeating X-Y-Gly sequences of collagenous polypeptide chains. In addition, some other sequences in the non-helical amino-terminal region of fibrillar collagens contain hydroxylysyl residues but lack the repeating collagenous sequence. The amount of hydroxylysyl residues among the different collagen types varies from 6 residues per 1000 amino acids in type III collagen to about 70 residues per 1000 amino acids for type VI collagen. Hydroxylysyl residues are also present in some noncollagenous proteins, such as acetylcholinesterase, complement C1q, macrophage scavenger receptor, etc., all of them, however, having a collagenous domain in their structure (1-5).
Hydroxylysyl residues serve as attachment sites for the carbohydrates galactose and glucosylgalactose (6). The function of these carbohydrate moieties, which are unique to collagenous structures, is unknown. Hydroxylysyl residues also have an important function in collagen cross-link formation in vivo, which assembles fibril-forming collagen molecules into stable supramolecular structures (2,7,8). Two alternative pathways exist for cross-link formation, one based on lysine residues and the other on hydroxylysyl residues. The lysyl-associated pathway predominates in skin, cornea, sclera, and other soft tissues, whereas more stable cross-links are formed by the hydroxylysyl-derived pathway, which predominates in weight-bearing and mineralized tissues such as bone, cartilage, ligament, and tendons. Embryonic skin and major connective tissues of the body also contain hydroxylysyl-linked cross-links (2,7,8).
Lysyl hydroxylase (LH, EC 1.14.11.4) is the enzyme that catalyzes the formation of hydroxylysyl residues in collagens and other proteins with collagen-like amino acid sequences (1-5). The reaction occurs as a post-translational event before triple helix formation in the endoplasmic reticulum. The reaction requires ferrous ion, 2-oxoglutarate, molecular oxygen, and ascorbate. The amino acid sequence required for the LH reaction is an X-Lys-Gly sequence, a triplet repeated many times in collagenous proteins. There is evidence that in some cases glycine in the above triplet can be replaced by some other amino acids (1,2,9). The interaction of the amino acid sequence with LH is influenced by peptide chain length, the effect of which is seen mainly in the Km, which decreases with increasing chain length, whereas the Vmax of the reaction is unaffected (1,10,11). Furthermore, the triple helical or other type of conformation of the peptide prevents lysine hydroxylation (1,12).
Lysyl hydroxylase activity in human, mouse, and rat tissues is present in three different molecules, LH1, LH2, and LH3, originating from three different genes (1, 13-21, 44). Our phylogenetic data suggest that LH1 and LH2 are more closely related and produced by a more recent duplication event than the less closely related LH3 (18). LH2 is present in two alternatively spliced forms (LH2a and LH2b); LH2a excludes exon 13A from the LH2 transcript (22,23). LH3 differs from LH1 and LH2 in that it is multifunctional and able to catalyze, in addition to lysyl hydroxylation, the sugar transfer reactions that form the consecutive steps in the formation of glucosylgalactosylhydroxylysyl residues (24-25). Only one isoform of LH is present in lower species such as Caenorhabditis elegans (26). This ancestral C. elegans LH is also able to glycosylate hydroxylysyl residues and, thus, is functionally similar to LH3 (25,27).
To better understand the evolutionary development of the lysyl hydroxylase isoforms and to obtain information about the functions of the three isoforms in vivo, we compared the interactions of LH1, LH2, and LH3 with different collagenous peptides. Furthermore, we analyzed the change in activity and peptide binding when LH2 was changed via alternative splicing of one exon, from LH2a (a short form) to LH2b (a long form). We have studied the characteristics of the peptides bound to the LH isoforms and the ability of each isoform to hydroxylate the peptides. Our data revealed that there was no absolute sequence requirement of the collagenous peptide substrate for the LH1, LH2, or LH3 isoforms, but there was a clear preference for some sequences to be bound and hydroxylated by a certain isoform.

EXPERIMENTAL PROCEDURES
Production and Purification of LH Isoforms-All recombinant LH isoforms were expressed using the baculovirus transfer vector pFastBacI in the BAC-TO-BAC™ expression system (Invitrogen). The vector was modified to contain the human LH1 signal peptide, a His6 tag, and a BamHI site for insertion of the desired cDNA (12,14). The construct of human LH1 covered the nucleotides from 255 to 2405 (13). The constructs of human LH2a and LH2b covered the nucleotides from 76 to 2267 (14,22), with LH2b containing the alternatively spliced exon sequence between nucleotides 1500 and 1501 (22,23). The construct of human LH3 covered the nucleotides from 289 to 2455 (15). The vectors were transfected into Sf9 cells. For Pepspots analysis the His-tagged recombinant proteins were purified on a nickel-nitrilotriacetic acid-agarose (Qiagen) column as described elsewhere (14,24,25). For activity assays the crude supernatant of the Sf9 cells was used as the enzyme source.
Activity Measurements-LH activity was assayed by the hydroxylation-coupled decarboxylation of 2-oxo[1-14C]glutarate (28) using a synthetic peptide as a substrate. The specific activity of 2-oxoglutarate was 74 × 10^5 dpm/mol. The synthetic peptides used as substrates in the LH reaction were synthesized at the Department of Biochemistry, University of Oulu, using an ABI 433A synthesizer and an Fmoc deprotection strategy. The peptides were purified by reverse-phase chromatography, and the quality of the peptides was assessed by mass spectrometry.
Preparation of Peptide-scanning Membrane (Pepspots)-The amino acid sequences (Pepspots sets) chosen for peptide scanning analysis were obtained from collagen polypeptide chains of types I-XIX (29). The sequences were examined for occurrences of X-Lys-Gly, and for each instance found, a subsequence of nine amino acids was selected, including three amino acids on either side of the X-Lys-Gly. The peptides were synthesized as spots (30) onto polyethylene glycol-derivatized cellulose membranes (AIMS Scientific Products, Braunschweig, Germany; www.aims-scientific-products.de) using the peptide-scanning instrument AutoSpot Robot ASP222 (Abimed Analysen-Technik, Langenfeld, Germany).
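The window selection described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the Pepspots sets; the function name `xkg_windows`, the one-letter sequence input, and the decision to skip motifs too close to a chain end are assumptions made for illustration, and the sketch does not check that the motif lies in the collagenous X-Y-Gly reading frame.

```python
def xkg_windows(sequence, flank=3):
    """Return nine-residue windows around each X-Lys-Gly occurrence.

    Hypothetical helper: takes a one-letter amino acid string and, for each
    Lys followed by Gly, returns the X-Lys-Gly triplet plus `flank` residues
    on either side (3 + 3 + 3 = 9 residues, Lys at position 5).
    """
    seq = sequence.upper()
    windows = []
    # i indexes the Lys; X sits at i - 1 and Gly at i + 1.
    for i in range(1, len(seq) - 1):
        if seq[i] == "K" and seq[i + 1] == "G":
            start = i - 1 - flank
            end = i + 2 + flank
            # Assumption: motifs without full flanks are skipped.
            if start >= 0 and end <= len(seq):
                windows.append(seq[start:end])
    return windows


if __name__ == "__main__":
    # Toy sequence, not taken from a real collagen chain.
    print(xkg_windows("GPAGERGPKGDTGAP"))
```

With this layout the lysine always falls in position 5 and the following glycine in position 6 of the window, which matches the position numbering used in the binding analysis.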
Peptide-scanning Assay-For interaction studies, the membrane was blocked for 2 h at room temperature with 5% dried milk in 20 mM Tris-HCl, pH 7.6, 137 mM NaCl, and 0.1% Tween 20 and subsequently incubated with a baculovirus-expressed and nickel column-purified LH isoform at a concentration of 5 μg/ml in the presence of 50 mM Tris-HCl buffer, pH 7.8, 50 mM Fe2SO4, and 1 mM ascorbate for 1 h at 37°C. Unbound protein was removed by washing two times with Tris-buffered saline. The bound LH isoform was electroblotted onto a polyvinylidene difluoride membrane (Millipore) using a Semiphor transfer unit (Hoefer) as recommended by Jerini Peptide Technologies GmbH and visualized on x-ray film (Eastman Kodak Co.) according to the protocol of ECL™ Western blotting (Amersham Biosciences) using the monoclonal anti-polyhistidine clone His-1 (Sigma) and horseradish peroxidase-goat anti-mouse IgG conjugate (Zymed Laboratories Inc.).
Bayesian Analysis-A Bayesian approach (31) was taken to analyze the Pepspot data. We wanted to assess the qualities of the peptides that would promote or hinder binding. The binding probability of each set of peptides carrying a certain quality was compared against the binding probability of peptides that were not in the set. Because of the limited amount of data, the exact values of binding probabilities were not calculated, but instead, the binding probabilities were treated as random variables. Their probability distributions were calculated based on our prior estimates and the data. The binding probability of a set of peptides was defined as the probability that a Pepspot binding test gives a positive result for a peptide sampled randomly from the set. The probability of sampling a particular peptide sequence from a set of peptide sequences equals the probability that evolution would produce such a sequence in a uniformly random position in the collagen sequence given that the collagen subsequence is known to be in the set. The selection of the sequences for the Pepspots set was modeled as this kind of sampling.
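A comparison of this kind can be sketched in Python as follows. The sketch is only an illustration under stated assumptions: it treats each binding probability as a Beta-distributed random variable with a uniform Beta(1, 1) prior, whereas the priors actually used are not specified in the text, and it estimates the probability that one set binds better than another by Monte Carlo sampling. All function names are hypothetical.

```python
import random


def posterior_samples(bound, total, n=20000, prior=(1.0, 1.0), rng=None):
    """Draw samples from the Beta posterior of a binding probability.

    Assumption: Beta(1, 1) (uniform) prior; the original priors are unknown.
    """
    rng = rng or random.Random(0)
    a = prior[0] + bound            # prior successes + observed binders
    b = prior[1] + (total - bound)  # prior failures + observed non-binders
    return [rng.betavariate(a, b) for _ in range(n)]


def quantile(samples, q):
    """Empirical quantile of a list of samples (simple order statistic)."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def prob_higher(bound_a, total_a, bound_b, total_b, n=20000, seed=0):
    """Estimate P(binding probability of set A > that of set B)."""
    rng = random.Random(seed)
    sa = posterior_samples(bound_a, total_a, n, rng=rng)
    sb = posterior_samples(bound_b, total_b, n, rng=rng)
    return sum(x > y for x, y in zip(sa, sb)) / n


if __name__ == "__main__":
    # Toy counts for a subset versus all other peptides (illustrative only).
    print(prob_higher(40, 50, 100, 400))
    s = posterior_samples(196, 727)
    print(quantile(s, 0.025), quantile(s, 0.5), quantile(s, 0.975))
```

Under these assumptions, a subset would be flagged as promoting binding when `prob_higher` exceeds 0.95, and the 2.5%, 50%, and 97.5% quantiles give a median with a 95% interval of the kind plotted as bars and error bars.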
Sammon Map of Sequences-To find more qualities of the peptides that might be related to binding, a visualization tool called Sammon mapping (32,33) was used to map the Pepspot peptides onto a two-dimensional map in such a way that the distances on the Sammon map approximated calculated pair-wise sequence dissimilarities. Similar peptides were therefore positioned close to each other, and dissimilar peptides were placed farther apart. The sequence dissimilarity (WAC dissimilarity) was calculated as a sum over the WAC amino acid dissimilarities of the corresponding positions in the two sequences. The WAC similarity matrix (34), which is based on physicochemical properties of the microenvironments created by amino acids, was chosen over matrices based on phylogeny because it was thought that the peptide sequences may have evolved at least partially independently. The WAC dissimilarity used here was derived from the WAC similarity as 14 minus WAC similarity, where 14 is the value of the WAC self-similarity. The G3PCX (35) evolutionary algorithm was used to optimize the Sammon map.
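The dissimilarity measure and the quantity a Sammon map minimizes can be sketched in Python. This is an illustration only: `toy_similarity` is a placeholder and does not contain the real WAC matrix values, and the G3PCX optimizer is not reproduced; the sketch only evaluates Sammon's stress for a given two-dimensional layout.

```python
import math

SELF_SIMILARITY = 14  # WAC self-similarity value stated in the text


def toy_similarity(a, b):
    """Placeholder similarity, NOT the real WAC matrix (assumption)."""
    return SELF_SIMILARITY if a == b else 0


def sequence_dissimilarity(s1, s2, similarity=toy_similarity):
    """Sum over positions of (14 - similarity) for two equal-length peptides."""
    if len(s1) != len(s2):
        raise ValueError("peptides must have equal length")
    return sum(SELF_SIMILARITY - similarity(a, b) for a, b in zip(s1, s2))


def sammon_stress(dissim, coords):
    """Sammon's stress for map coordinates given target dissimilarities.

    E = (1 / sum d*_ij) * sum (d*_ij - d_ij)^2 / d*_ij over pairs i < j,
    where d*_ij is the target dissimilarity and d_ij the map distance.
    """
    total = 0.0
    error = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            target = dissim[i][j]
            if target == 0:
                continue  # identical sequences carry no weight here
            d = math.dist(coords[i], coords[j])
            total += target
            error += (target - d) ** 2 / target
    return error / total if total else 0.0


if __name__ == "__main__":
    peptides = ["ERGPKGDTG", "ERGPRGDTG", "PAGEKGDRG"]
    dissim = [[sequence_dissimilarity(p, q) for q in peptides] for p in peptides]
    print(dissim)
    print(sammon_stress(dissim, [(0.0, 0.0), (14.0, 0.0), (0.0, 70.0)]))
```

An optimizer, evolutionary or otherwise, would then adjust the coordinates to reduce this stress, so that small dissimilarities are reproduced more faithfully than large ones.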
CD Spectra-The secondary structure of the peptides that were employed in the activity measurements was studied with CD spectroscopy. The CD spectra were recorded with a Jasco J-715 spectropolarimeter equipped with a microprocessor for spectral accumulation and data manipulation. All measurements were made in 1-mm path length quartz cells. The peptide concentration was 0.5 mg/ml, and the measurements were performed at room temperature (+22°C). The ellipticity was converted to the mean residue molar ellipticity. Secondary structure percentages of peptides were calculated from the circular dichroism spectra using the CONTIN/LL method (36) with the SP37A reference set of 37 soluble proteins (37), as provided by the CDPro software package (38).
Theoretical Peptide Structures-It is suspected that peptides binding to LH have a similar fold (11). This was investigated by solving theoretical structures of the Pepspot peptides in water and by comparing the structural data to the Pepspot binding data. The structures were solved using the biased probability Monte Carlo global optimization procedure with distance dependent electrostatics, as provided by the ICM software package (39). The carboxyl end negative charge was protonated to account for the effects of the peptides being bound at their carboxyl end to the membrane via a polyethylene glycol spacer molecule.
Theoretical Peptide Structures and Binding Data-Various approaches were used to find correlations between the peptide structures and the Pepspot binding data. As a structural dissimilarity measure, the root mean square deviation of the coordinates of the corresponding backbone and Lys-5 heavy atoms was calculated between all Pepspot peptide pairs. A Sammon map of the peptides was created from the calculated root mean square deviation data and colored according to the Pepspot binding. The dependence of binding probability on peptide solvation free energy was analyzed statistically. A large negative solvation free energy suggests that the peptide prefers an aqueous environment over a hydrophobic one. The solvation free energy was calculated using ICM software. Previous research (11) suggested a bent structure with lysine in the middle as a criterion for hydroxylation. To assess this, a secondary structure analysis (40) was carried out on the three-dimensional structures.

RESULTS
We have synthesized nine-amino acid collagenous peptides having lysyl residues at the Y position of repeating X-Y-Gly sequences. The sequences were obtained from polypeptide chains of type I-XIX collagens (29) and comprised altogether 727 different sequences. The peptides represent potential in vivo substrates for the lysyl hydroxylase isoforms. As shown in Table I, most of the peptide sequences were obtained from collagen types IV, V, VI, VII, IX, and XI. The synthetic peptides were immobilized onto a membrane, forming Pepspots, and the binding of the LH isoforms to the Pepspots was recognized via a His-tag signal at the amino terminus of the bound enzyme. Subsequently, we performed a statistical analysis of our Pepspots data and studied the characteristics of peptides bound to Pepspots and the peptides that failed to bind. We analyzed the effects of the net charge, the physicochemical properties, and the position of individual amino acids in the sequence on the binding of the peptide to the enzyme. Some of the peptides were also chosen for in vitro activity assays.
Binding of Collagenous Peptides to LH1-LH1 bound to 196 peptides derived from various collagen types, the majority from types XI, XII, XV, and XVII (Table I). As shown in Fig. 1, the net charge of the peptide strongly influenced the binding. A net charge of 0 or less was associated with a low binding probability, a net charge of + or −1 was associated with a moderate binding probability, and a net charge of 2 or greater was associated with a high binding probability. Our data also indicated that some amino acids in certain positions promoted binding to the isoforms, whereas other amino acids inhibited binding (Table II). Lysine and arginine in position 2 as well as glutamate in position 4, histidine and aspartate in position 7, and arginine and lysine in position 8 promoted peptide binding to LH1, whereas glycine in position 1, alanine, aspartate, and proline in position 2, glutamine and isoleucine in position 4, glutamate in position 7, and alanine, glutamate, and proline in position 8 reduced the binding. Peptide binding was visualized on a two-dimensional Sammon map based on the WAC similarity matrix, which reflects physicochemical properties of the microenvironments created by amino acids (Fig. 2a).
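As an illustration of how a net charge can be read from the amino acid sequence alone, the following Python sketch counts charged side chains. The charge assignments are assumptions made for this example (Lys and Arg as +1, Asp and Glu as −1, His and the peptide termini treated as neutral), and the three classes follow the low, moderate, and high grouping above, with the moderate class taken here as a net charge of +1.

```python
# Assumed side-chain charges at near-neutral pH; His and termini ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide):
    """Net charge of a peptide from its one-letter sequence alone."""
    return sum(CHARGE.get(residue, 0) for residue in peptide.upper())


def charge_class(peptide):
    """Map net charge onto the low / moderate / high binding classes.

    Assumption: 0 or less -> low, +1 -> moderate, +2 or more -> high.
    """
    q = net_charge(peptide)
    if q <= 0:
        return "low"
    if q == 1:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Toy XYGXKGXYG peptides, not taken from the Pepspots set.
    for pep in ("ERGPKGDTG", "PAGPKGAPG", "PKGPKGPRG"):
        print(pep, net_charge(pep), charge_class(pep))
```

A different convention, for example counting histidine as partially charged, would shift some peptides between classes, so the thresholds here should be read as a sketch rather than as the exact rule behind Fig. 1.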
Binding of the Peptides to LH2a and LH2b-As seen in Table I, 195 peptides bound to LH2a, and 226 bound to LH2b. These were derived from several collagen types (Table I) but predominantly from types VI, VII, and XVII in the case of LH2a and from types XI and XII in the case of LH2b. The net charge of the peptide had a significant effect on the binding to LH2a and LH2b, as seen with LH1 (Fig. 1). There were differences between LH2a and LH2b at net charges of 1 and 2, however. Sammon map visualization of peptide binding was also carried out (Fig. 2, b and c) as for LH1.
Analysis of the individual amino acids of the peptide sequence indicated with high confidence that valine in position 1, arginine and lysine in position 2, lysine and leucine in position 4, aspartate in position 7, and arginine and lysine in position 8 promoted binding to LH2a, whereas alanine in position 2, alanine and glutamate in position 4, glutamate in position 7, and alanine, methionine, and proline in position 8 reduced the binding (Table II). The results for LH2a were somewhat different from those obtained for LH2b. In the case of LH2b, lysine and arginine in positions 1 and 2, alanine in position 4, arginine, histidine, isoleucine, lysine, and serine in position 7, and arginine and lysine in position 8 promoted peptide binding, whereas glutamate in position 1, aspartate in position 2, glutamine and isoleucine in position 4, aspartate and glutamate in position 7, and aspartate, glutamate, glutamine, and proline in position 8 reduced the binding.
Binding of the Collagenous Peptides to LH3-A total of 242 peptides bound to LH3. The bound peptides were derived mostly from collagen types VI, XI, and XII (Table I). The net charge of the peptide was again an important determinant for binding to LH3 (Fig. 1). Visualization of the peptide binding is shown in Fig. 2d. Our data indicated that arginine and lysine in position 1, alanine and threonine in position 4, lysine, serine, tyrosine, and histidine in position 7, and arginine and lysine in position 8 promoted peptide binding, whereas glutamate in position 1, aspartate in position 2, glutamate and aspartate in position 7, and glutamate, aspartate, glutamine, and proline in position 8 inhibited the binding (Table II).
Some of the Peptides Bound to Only One Isoform-There was some overlap in the binding of peptides to the LH isoforms; 46 of the peptides bound to all isoforms (LH1, LH2a, LH2b, LH3), 6 bound to LH1 and LH3, 22 bound to LH2a and LH3, 33 bound to LH2b and LH3, 6 bound to LH2a and LH2b, 19 bound to LH1 and LH2a, and 30 bound to LH1 and LH2b. There were some peptides, however, that bound to only one LH isoform. We found 21 sequences that bound only to LH1, 45 that bound only to LH2a, 32 that bound only to LH2b, and 40 that bound only to LH3.
Amino Acids Present in Collagenous Sequences in Vivo-We constructed a nine-amino acid sequence figure from amino acids present in the collagenous sequences (Fig. 3) according to the following constraints. Gly was constrained in position 6 to the maximal score, and the size of the other amino acids represents their abundance in the other positions. The sequences were collected by computer from collagen types I to XIX. All collagen α-chain sequences were examined for the occurrence of short sequence motifs such as XYG (Fig. 3a1), XRG (Fig. 3a2), and XKG (Fig. 3a3). For each motif found, a nine-residue subsequence was determined around that position. Finally, the relative frequencies of the amino acids in each position were calculated. As seen in Fig. 3a1, proline is the most abundant amino acid in all triplets in the X and Y positions. The other common amino acids in the X positions are leucine, glutamate, and alanine, whereas in the Y position the most common amino acids are lysine, arginine, alanine, and glutamine. The most abundant amino acids preceding the lysyl residue in the Y position are proline, glutamate, leucine, alanine, aspartate, and glutamine (Fig. 3a3). The most abundant amino acids following the lysine triplet are glutamate, aspartate, alanine, serine, and proline in the X position and proline, lysine, arginine, and alanine in the Y position. Our hand-picked experimental set of lysine-containing sequences (Fig. 3a4) corresponded well to the set of sequences found by computer, although ~400 occurrences of XKG motifs were missing from the experimental set.
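The per-position frequency calculation described above can be sketched in a few lines of Python. The function name `position_frequencies` and the toy peptides are hypothetical and serve only to show the counting step; drawing letter heights proportional to the frequencies is not covered.

```python
from collections import Counter


def position_frequencies(peptides):
    """Relative amino acid frequencies at each position of aligned peptides.

    Returns one dict per position mapping residue -> fraction of peptides.
    """
    if not peptides:
        return []
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        freqs.append({aa: c / len(peptides) for aa, c in counts.items()})
    return freqs


if __name__ == "__main__":
    # Toy XYGXKGXYG subsequences (illustrative, not real collagen data).
    toy = ["PAGEKGDRG", "LPGPKGEPG"]
    for position, table in enumerate(position_frequencies(toy), start=1):
        print(position, table)
```

In such a table the lysine in position 5 and the glycine in position 6 have a frequency of 1 by construction, so the informative variation lies in the flanking positions.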
Analysis of Peptides Bound to Different LH Isoforms Based on Amino Acid Sequences-As seen in Fig. 2, the WAC distance-based Sammon maps showed no absolute sequence specificity for any of the four LH isoforms. However, a detailed analysis of the Sammon maps (Fig. 2, areas marked by 1-5) revealed some differences between the isoforms. As seen in Fig. 4, in certain areas of the Sammon map the binding probability was not identical for all the isoforms. An analysis of sequences in those areas (Fig. 3b, 1-5) showed that area 1, where LH1 has a low binding probability, has an increased frequency of proline in position 2, leucine in position 4, and glutamine in position 7. Area 2, in which LH2a has a low binding probability, has glutamate at position 7 in most peptides and an increased frequency of proline in position 4. Area 3 gives a moderate binding probability for all isoforms, somewhat higher for LH1 and LH2a, and has a positively charged amino acid in position 2, mostly glutamate in position 4, and at a high probability a negatively charged amino acid at position 7. Area 4, in which all isoforms except LH2a have a high binding probability, also has mostly glutamate in position 4, but position 7 is rarely occupied by an acidic residue; proline has a somewhat increased frequency in position 8. Area 5 has a somewhat high binding probability for LH3, and its sequences are marked by an increase in alanine in positions 2, 7, and 8 and proline in position 4, replacing the charged residues normally occupying these positions.
Theoretical Peptide Structures and Binding Data-It was found that the free energy of solvation of a peptide into water correlates well with binding (Fig. 5). With all isoforms, a low solvation free energy, favorable for an aqueous environment, is associated with a high binding probability, and a higher solvation free energy is associated with a low binding probability. Because it was suspected that the solvation free energy might be directly explained by the peptide net charge, a scatter plot of the two properties (Fig. 6) was created. The solvation free energy was found to be strongly dependent on peptide net charge (Fig. 6), with both positively and negatively charged peptides preferring an aqueous environment. No increase in binding was found (not shown) for peptides with a hydrogen-bonded turn over Lys-5, as identified in the secondary structure analysis. The Sammon map based on structural root mean square deviation dissimilarities (not shown) showed no significant aggregation of binding or non-binding peptides.

TABLE I (fragment; column headings and the rows after collagen type X are not preserved)
Collagen type
I       34   21    9   29   35
II      19   26   32   21   32
III     18   17   22   11   11
IV     263   25   28   27   33
V       57   23   25   26   35
VI      42   33   33   33   45
VII     46   37   37   22   28
VIII    20    0   20   35   35
IX      46   17   17   33   24
X       22    9   23

FIG. 1. Pepspot binding probability as function of peptide net charge and LH isoform. The vertical bars represent binding probability medians. Error bars represent 2.5 and 97.5% quantile intervals, enclosing the true binding probability at a 95% probability. Net charge was determined from the amino acid sequence alone.

TABLE II
Comparison of the binding probability of XYGXKGXYG peptides, which have a specific amino acid (table rows) in a particular position (table columns), to that of other peptides
Each position column is subdivided into four columns, corresponding to LH1, LH2a, LH2b, and LH3. The XYGXKGXYG peptides are divided into subsets each having a specific amino acid in a particular position. The binding probability of each subset is compared to the binding probability of all other XYGXKGXYG peptides. For some subsets the probability that the binding probability was higher than that of others was >95%, and for some, that it was lower. These cases have been marked with plus (+) or minus (−), respectively. Furthermore, for some it was possible to say that at 95% probability the difference was greater than a certain number of % units. These numbers are given in the table.
Activity Assay to Test Bound Peptides as Substrates for LH Isoforms-An activity assay was used to determine whether the peptides bound to the LH isoforms can be hydroxylated by the enzymes, i.e. serve as substrates in the reaction. The conditions for the activity assay differ from those used in the Pepspots analysis, however. One of the cosubstrates, 2-oxoglutarate, was not present in the solution in the Pepspot analysis so as to prevent hydroxylation and subsequent release of the enzyme. Furthermore, the peptides were immobilized on the Pepspots, whereas they were in solution in the activity assay. For these reasons, the binding of individual peptides in the activity assay (Km values in Table III) was not correlated with how they bound on the Pepspots. This was analyzed by comparing the corresponding Km values between binding and non-binding peptides in the Pepspots data. The result is understandable because, for individual peptides, even a small variation, e.g. a structural one, can cause a different result in either assay. With enough data a statistical analysis can overcome this variation, but for practical reasons, the number of activity assays could not be increased to such a high level. We were able to determine, however, in our activity assay whether the peptides were bound to the active site of the enzyme and served as substrates for the LH isoforms. We selected 28 peptides from Pepspots representing all types of sequences for in vitro measurements, and these peptides were tested with all four LH isoforms; altogether there were 112 measurements. The data revealed that in 57 measurements the result of the activity assay agreed with the Pepspots data. Furthermore, in 50 measurements, the isoform was able to hydroxylate the peptide, although the Pepspot analysis was negative. As seen from these data, the activity assay was more sensitive than the Pepspots binding analysis.
This was as expected and indicated that our screen by Pepspots had picked about half of the peptides compared with the data obtained by the activity assay. There were only a few peptides, 5 of 112 analyses, that showed binding in the Pepspots analysis but did not serve as substrates in the activity assay. These were probably peptides that bound to the enzyme at a site other than in the active site, or they may represent peptide inhibitors.
The Km and Vmax values of the different peptides for the LH isoforms are listed in Table III. The Vmax data indicated differences in the capacity of the various isoforms to hydroxylate different peptides. Interestingly, none of the negatively charged peptides tested were hydroxylated by LH1, whereas most of them were substrates for the other isoforms. The peptides with a positive charge all served as substrates in the reaction, which agrees with the Pepspot binding data. The presence of one extra exon in LH2 (LH2b in comparison with LH2a) caused a change in the behavior of LH2, as also seen in our binding assay. LH2a had a different ability to hydroxylate collagenous peptides compared with LH2b. Furthermore, changing one of the amino acids or the order of two amino acids in the peptide (compare L26 with L32 and L22 with L23) affected the ability of the peptide to be hydroxylated by the LH isoforms. In addition, increasing the amount of NaCl in the activity assay caused a remarkable reduction of the activity in most cases, in agreement with our finding that the net charge of the peptide is a key determinant for binding (Table IV).
Circular Dichroism Spectra-The secondary structures of the peptides used in the activity assays were obtained from CD spectra. Three typical CD spectra are shown in Fig. 7. The secondary structure percentages (Table V) as interpreted from the CD spectra showed that all the analyzed peptides had a largely unordered structure, 28-50%, with the β-sheet being the most prevalent, 24-48%, of the ordered structural motifs. Polyproline II was present only in minor amounts, 11-15%, as were turns, 10-16%. None of the secondary structure percentages was found to correlate with Km or Vmax values.

DISCUSSION
Collagens are a large family of extracellular proteins encoded by more than 40 genes in vertebrates; in nematodes such as C. elegans, the number of collagen genes exceeds 150 (1-5, 41). A large number of post-translational modifications are associated with collagen biosynthesis. Lysyl hydroxylase is the enzyme catalyzing one of the post-translational modifications, hydroxylation of lysyl residues (1-5). It is not known why vertebrates, like human, mouse, and rat (13-21, 44), have three LH isoforms, whereas lower species like C. elegans (26) have only one orthologue for lysyl hydroxylase. Our earlier data indicate that all three isoforms are expressed in the same cells, although at different levels, when analyzed in various cultured human cell lines (42). mRNA analysis data suggest that the expression of LH1, LH2, and the α subunit of prolyl 4-hydroxylase, another post-translational enzyme of collagen biosynthesis (1-5), is associated with collagen biosynthesis, showing similar co-regulation and statistically significant correlations with each other and with total collagen synthesis (42). The data argue against collagen-type specificity in LH1 and LH2 reactions. A different behavior was found for LH3. LH3 did not show any correlation with the other LH isoforms or with specific collagen types or total collagens (39). We have recently found that LH3 has additional functions, i.e. galactosylation of hydroxylysyl residues and glucosylation of galactosylhydroxylysyl residues (24-25, 27), although the extent of these reactions in vivo is still under investigation (43). Recent reports suggest LH2b to be a telopeptide lysyl hydroxylase (21,45). There are no data so far to indicate the substrate requirements for the different LH isoforms. It is not known if the amino acid sequences surrounding the individual lysine residues determine which LH isoform catalyzes the hydroxylation reaction.

FIG. 3. a, amino acid frequencies. The height of each letter is proportional to the frequency of the amino acid in a particular position. Collagen α-chain sequences were examined for motifs (white) such as G in 10,134 occurrences (1), RG in 863 occurrences (2), and KG in 1,132 occurrences (3), and a 9-residue sub-sequence was taken from the vicinity of each instance of a motif found. The frequencies were calculated from these sets of sequences. 4, the hand-picked PepSpot set of 727 peptides is also included. The RG motif is used for comparison because of the structural similarity of arginine and lysine. Some interesting differences between 1 and 2 and between 2 and 3 are indicated (black) in 2 and 3, respectively. b, amino acid frequencies of sequences of Sammon map areas 1-5 (see Fig. 2). 1, area 1; 2, area 2; 3, area 3; 4, area 4; 5, area 5. The most important differences from the general KG motif are indicated (black).
We have used peptide scanning to screen the binding of in vivo occurring collagenous sequences to the LH isoforms. Our aim was to screen a large number of peptides, to assess the roles of the LH isoforms in hydroxylating different collagen types, and to see if the amino acid sequences in close proximity to the lysyl residues in the Y position of X-Y-Gly sequences influence binding to the enzyme. This is the first study to analyze how the amino acid sequence of the peptide affects the binding of the peptide to the LH isoforms. For practical reasons we have limited the peptide size to nine amino acids, although it is probable that other amino acids farther from the lysine residue may also affect peptide binding in vivo. Our data showed Pepspots to be a useful assay for screening a large number of peptide candidates. Our activity assays suggest that many peptides bound to the active site of the enzyme. As seen from our data, there was no absolute sequence specificity for the binding of peptides to LH1, LH2a, LH2b, and LH3. There was an overlap of the binding of many peptides to the isoforms, but there were some peptides that preferentially bound to certain LH isoforms. These data confirm our earlier finding (42) that LH1, LH2a, LH2b, and LH3 lack collagen-type specificity. There was a tendency, however, for the isoforms to favor some collagen types; LH1, for instance, binds peptides of type XI and XV collagens, and LH3 binds peptides of type VI, XI, and XII collagens.
We analyzed the frequencies of amino acids in sequences of collagenous polypeptides in vivo. Our data revealed proline as the most frequent amino acid in the X and Y positions of repeating X-Y-Gly sequences. If lysine was in the Y position of the general X-Y-Gly triplet (see Fig. 3a, 1 and 3), the frequency of charged amino acids was higher in the triplet after the lysine-containing triplet. We also compared the lysine-containing sub-sequence with the sub-sequence in which another basic amino acid, arginine, was substituted for the lysine (Fig. 3a2) to determine whether the sub-sequence charge was the main determinant for other amino acids in the sequence evolution. As seen in the figures, the sequences remained quite similar. The comparison revealed, however, that evolution had to some extent changed the amino acid sequence surrounding the lysyl residues; one possible purpose of this might be to regulate the hydroxylation of the lysyl residues in the sequence. The data in this study indicated that the net charge of the peptide is a very important determinant for the binding to all isoforms, a net charge of +1 or higher giving better binding than a net charge of 0 or less. The alternative splicing that occurs in LH2 changed the peptide binding properties of LH2, suggesting either that the alternatively spliced exon participates in the peptide binding to the active site or that it changes the three-dimensional structure of the active site and thereby alters the binding properties. Individual amino acids in our peptide sequences also influenced the binding to the different isoforms. These included lysine in position 2, histidine in position 7, and arginine and lysine in position 8, which promoted the binding to all isoforms. Furthermore, glutamate in position 7 and proline in position 8 prevented peptide binding to all isoforms. Some amino acids in the sequence facilitated the binding to certain LH isoforms. 
For example, alanine in position 4 promoted binding to LH2b and LH3 but not to LH2a. The data clearly indicated that the amino acids surrounding the lysyl residue have an effect on the binding of the peptide to LH.
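The two kinds of determinants discussed above, net charge and position-specific residues, can be made concrete with a minimal sketch. The charge convention is an assumption made for illustration: Lys and Arg count as +1, Asp and Glu as -1, histidine and the peptide termini are ignored. The position rules are the ones listed in the text for all isoforms; the function and variable names are hypothetical, and the sketch annotates a peptide rather than predicting binding.

```python
POSITIVE = set("KR")   # assumed +1 each
NEGATIVE = set("DE")   # assumed -1 each

# (position, residue) pairs, positions counted from 1, as listed in the text.
PROMOTING = {(2, "K"), (7, "H"), (8, "R"), (8, "K")}
PREVENTING = {(7, "E"), (8, "P")}


def net_charge(peptide):
    """Side-chain net charge under the simple convention stated above."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in peptide)


def positional_flags(peptide):
    """List residues reported to promote or prevent binding to all isoforms."""
    pairs = list(enumerate(peptide, start=1))
    promoting = [p for p in pairs if p in PROMOTING]
    preventing = [p for p in pairs if p in PREVENTING]
    return promoting, preventing


# Two hypothetical nine-residue peptides, for illustration only.
for peptide in ("GKGAKGHRG", "PPGAKGEPG"):
    promoting, preventing = positional_flags(peptide)
    print(peptide, net_charge(peptide), promoting, preventing)
```

Under these assumptions the first peptide carries a net charge of +3 and three promoting residues, whereas the second has a net charge of 0 and both preventing residues.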
We calculated theoretical structures for the peptides in an aqueous environment. Comparison of these data to data from the binding assay revealed no structural motifs that would have affected the binding. Similar conclusions were reached by comparing the CD data with the values obtained from the activity assays. However, we found that peptides with a stronger preference for an aqueous environment bound LH more often, with similar results for all isoforms. These peptides typically had a high net charge, which most probably is what caused the high binding. The finding also suggests that the LH binding site might be hydrophilic, i.e. not a deep hydrophobic pocket. It was found earlier (1, 10) that a 99-amino acid cyanogen bromide fragment from the α1 chain of rat skin collagen containing two Ala-Lys-Gly sequences was a particularly poor substrate for LH (LH1). We used 11 Ala-Lys-Gly-containing peptides originating from the human α1 chain of type I collagen in our Pepspots screening (Table I). Our Pepspot data revealed that only two of the peptides bound to LH1, whereas nine remained unbound, in agreement with the earlier data (1, 10). None of the Ala-Lys-Gly peptides bound to LH2a, whereas five of the peptides bound to LH2b and six did not. LH3 showed the same binding pattern as LH2b.
LH1 is known to be involved in the heritable connective tissue disorder, Ehlers-Danlos syndrome type VI (OMIM 225400 (46, 47)). Numerous mutations of LH1 have been characterized in patients with this syndrome. Muscular hypotonia, kyphoscoliosis, marfanoid habitus, fragile eyeballs, osteopenia, and occasionally vascular fragility are symptoms associated with the disorder. A recent study suggests that mutations of LH2 cause Bruck syndrome, and an increased expression level of LH2b is associated with systemic sclerosis (45). No mutations or disorders associated with LH3 have been reported so far. It is interesting to note that the reports of the LH1 mutations (48) indicate differences in under-hydroxylation between bone type I and cartilage type II collagens, type I being more dramatically affected than type II collagen. These data suggest that LH1 preferentially hydroxylates lysyl residues of type I collagen in bone. Similar results were obtained when the cross-linked peptides originating from type I or type II collagens, which were secreted into the urine of the patient with Ehlers-Danlos syndrome VI, were analyzed (49). Our observations from mouse tissues (18) revealed that there are clear differences in the tissue-specific expression of the LH isoforms. LH1 is very highly expressed, for instance, in the heart, liver, lung, skeletal muscle, and kidney, LH2 in the heart, lung, and kidney, and LH3 in the heart, lung, liver, and testis. The tissue-specific expression of the LH isoforms partially explains the localization of enzyme defects to certain tissues, as seen in the patients with Ehlers-Danlos syndrome type VI (46, 47). 
Based on the data of the present study, it is probable that the lysyl residues in the proteins of patients with Ehlers-Danlos syndrome VI can be hydroxylated to some extent by the other isoform(s), but the extent of this hydroxylation depends on the amount of the other LH isoforms in the tissue as well as on the amino acid sequence surrounding the lysyl residue to be hydroxylated.

CONCLUSIONS

This is the first study to screen a large number of collagenous peptides for possible interaction with the LH isoforms. Pepspots was useful as a binding assay, enabling the simultaneous screening of a large number of peptides, although it was not as sensitive as the activity assay in the screening. A two-dimensional Sammon map was used to visualize the different types of peptides bound to each LH isoform. Our data with 727 collagenous peptides indicated that the net charge of the peptide and specific amino acids in close proximity to the lysyl residues in the collagenous sequence were key determinants in the binding and the hydroxylation of lysyl residues of X-Lys-Gly sequences, a positive charge promoting both events. There was no absolute sequence specificity for the binding of the peptides to LH1, LH2a, LH2b, and LH3. These enzymes were able to hydroxylate many sequences from different collagen types, but some peptides were preferentially hydroxylated by a specific isoform.
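For readers unfamiliar with the Sammon map mentioned above, the following sketch computes the quantity that such a map minimizes, the Sammon stress, which compares pairwise distances in the original feature space with those in the low-dimensional projection. How the peptides were encoded as feature vectors is not described in this section, so the points used here are purely hypothetical, and the function shows only the stress calculation, not the optimization.

```python
import math
from itertools import combinations


def sammon_stress(high_points, low_points):
    """Sammon stress between an original and a projected configuration.

    stress = (1 / sum d*_ij) * sum (d*_ij - d_ij)^2 / d*_ij over pairs i < j,
    where d*_ij are distances in the original space and d_ij distances in
    the projection. Assumes all original points are distinct (d*_ij > 0).
    """
    total = 0.0
    weighted_error = 0.0
    for i, j in combinations(range(len(high_points)), 2):
        d_high = math.dist(high_points[i], high_points[j])
        d_low = math.dist(low_points[i], low_points[j])
        total += d_high
        weighted_error += (d_high - d_low) ** 2 / d_high
    return weighted_error / total


# Hypothetical example: two points 5 units apart are drawn 4 units apart.
print(sammon_stress([(0, 0), (3, 4)], [(0,), (4,)]))
```

A stress of zero means that the projection preserves every pairwise distance; because each error term is divided by the original distance, small original distances are weighted most heavily, which is why similar peptides tend to stay close together on the map.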