Evidence for 4-Hydroxyproline in Viral Proteins

4-Hydroxyproline, the characteristic amino acid of collagens and collagen-like proteins in animals, is also found in certain proline-rich proteins in plants but has been believed to be absent from viral and bacterial proteins. We report here on the cloning and characterization from a eukaryotic algal virus, Paramecium bursaria Chlorella virus-1, of a 242-residue polypeptide, which shows distinct sequence similarity to the C-terminal half of the catalytic α subunits of animal prolyl 4-hydroxylases. The recombinant polypeptide, expressed in Escherichia coli, was found to be a soluble monomer and to hydroxylate both (Pro-Pro-Gly)10 and poly(L-proline), the standard substrates of animal and plant prolyl 4-hydroxylases, respectively. Synthetic peptides such as (Pro-Ala-Pro-Lys)n, (Ser-Pro-Lys-Pro-Pro)5, and (Pro-Glu-Pro-Pro-Ala)5 corresponding to proline-rich repeats coded by the viral genome also served as substrates. (Pro-Ala-Pro-Lys)10 was a particularly good substrate, with a Km of 20 μM. The prolines in both positions in this repeat were hydroxylated, those preceding the alanines being hydroxylated more efficiently. The data strongly suggest that P. bursaria Chlorella virus-1 expresses proteins in which many prolines become hydroxylated to 4-hydroxyproline by a novel viral prolyl 4-hydroxylase.

The formation of 4-hydroxyproline is catalyzed by prolyl 4-hydroxylases that act on proline residues in peptide linkages. The vertebrate enzymes are 240-kDa α2β2 tetramers, in which the catalytic sites are located in the α subunits and the β subunits are identical to the enzyme and chaperone protein disulfide isomerase. They require Fe2+, 2-oxoglutarate, O2, and ascorbate and hydroxylate -X-Pro-Gly- sequences (for reviews, see Refs. 5 and 6). Prolyl 4-hydroxylases from higher plants may resemble the vertebrate enzymes in their structure (7), whereas prolyl 4-hydroxylases from multicellular and unicellular green algae are 60-kDa monomers (8,9). Plant prolyl 4-hydroxylases require the same cosubstrates as the animal enzymes, but they differ from the latter in that they hydroxylate proline residues in poly(L-proline) and poly(L-proline)-like sequences, while the repeating -X-Pro-Gly- triplets are either very poor substrates or not hydroxylated at all (2,8).
We report here that the genome of Paramecium bursaria Chlorella virus-1 (PBCV-1; Refs. 10 and 11) encodes a 242-amino acid polypeptide that shows a distinct amino acid sequence similarity to the C-terminal half of the catalytic α subunits of animal prolyl 4-hydroxylases. In addition, the genome contains many open reading frames for proteins with proline-rich repeats. The recombinant viral polypeptide, expressed in Escherichia coli, was found to be a soluble monomer and to hydroxylate (Pro-Pro-Gly)10, poly(L-proline), and several synthetic peptides corresponding to proline-rich repeats coded by the viral genome. The data strongly suggest that PBCV-1 expresses proteins in which a number of proline residues become hydroxylated by a viral prolyl 4-hydroxylase with many unique properties. Thus the occurrence of 4-hydroxyproline in proteins is probably not restricted to certain animal and plant proteins.

EXPERIMENTAL PROCEDURES
Identification of the PBCV-1 Prolyl 4-Hydroxylase-like Polypeptide-A sequence homology search in GenBank™ using the Basic Local Alignment Search Tool (12) indicated the presence in the PBCV-1 genome (accession number U42580) of an open reading frame encoding a 242-amino acid polypeptide that showed a similarity to the C-terminal half of the human prolyl 4-hydroxylase α(I) subunit (13). This amino acid sequence was aligned with those of the α(I) and α(II) subunits of human type I and type II prolyl 4-hydroxylases (13,14) and the α subunits of the Caenorhabditis elegans (15) and Drosophila melanogaster (16) prolyl 4-hydroxylases by the ClustalW method (17). The cleavage site of the signal peptide was predicted using the computational parameters of von Heijne (18).
Cloning and Expression in E. coli of the PBCV-1 Prolyl 4-Hydroxylase-like Polypeptide-PCR primers 5′-CGCGCATATGGAGGGGTTTGAAACCAGCGAT-3′ and 5′-CGCGCTCGAGTCATTTAACAGCACGGATCCATT-3′ were synthesized based on the viral DNA sequence and used to obtain a 621-base pair PCR product flanked by NdeI and XhoI restriction sites from the viral genomic DNA. This PCR product, coding for the amino acids Glu-36 to Lys-242 of the viral prolyl 4-hydroxylase-like polypeptide, was cloned into the NdeI-XhoI-digested pET15b expression vector (Novagen), and the sequence was verified in an automated DNA sequencer (Applied Biosystems).
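The primer design described above can be checked with a short script. The following Python sketch is illustrative only: it assumes that the hyphens inside the printed primer sequences are line-break artifacts, it uses the standard NdeI (CATATG) and XhoI (CTCGAG) recognition sequences, and the helper names are hypothetical. It looks for each restriction site in its primer and computes the number of base pairs needed to encode residues Glu-36 to Lys-242.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

# Primers as printed in the text, with the internal hyphens removed
# (assumed here to be line-break artifacts of the printed page).
FORWARD = "CGCGCATATGGAGGGGTTTGAAACCAGCGAT"
REVERSE = "CGCGCTCGAGTCATTTAACAGCACGGATCCATT"


def site_position(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme site in the primer, or -1."""
    return primer.find(SITES[enzyme])


def coding_length_bp(first_residue: int, last_residue: int) -> int:
    """Base pairs needed to encode residues first..last, inclusive."""
    return 3 * (last_residue - first_residue + 1)


if __name__ == "__main__":
    print("NdeI site in forward primer at index", site_position(FORWARD, "NdeI"))
    print("XhoI site in reverse primer at index", site_position(REVERSE, "XhoI"))
    print("Glu-36 to Lys-242 coding length:", coding_length_bp(36, 242), "bp")
```

Both sites sit directly after the four-nucleotide CGCG clamp at the 5′ end of each primer, and the 207 codons for Glu-36 to Lys-242 account for the 621 base pairs quoted for the PCR product.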
The expression plasmid was transformed into the E. coli BL21(DE3) strain (Novagen). The cells were grown at 37°C to an optical density of 0.55 at 600 nm and incubated at 28°C for 30 min, and expression was induced by the addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to 0.8 mM. The cells were harvested 3 h after induction, suspended in a 0.05 volume of a solution of 5 mM imidazole, 0.5 M NaCl, and 20 mM Tris, pH 7.9, sonicated until the sample was no longer viscous, and centrifuged at 38,000 × g for 30 min, and the soluble and insoluble fractions were analyzed by 12% SDS-PAGE.
* This work was supported by grants from the Health Sciences Council of the Academy of Finland and from FibroGen Inc. (South San Francisco, CA) and traveling grants from the Swedish Institute and the Swedish Chemical Society (to M. E.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 [...]
Protein Purification-The recombinant PBCV-1 polypeptide was purified by applying the soluble fraction of the cell lysate to a Ni2+-chelate affinity column (Invitrogen); the unbound material was removed by washing with a solution of 60 mM imidazole, 0.5 M NaCl, and 20 mM Tris, pH 7.9; and the recombinant polypeptide was eluted by increasing the imidazole concentration to 0.5 M. The fractions were analyzed by 12% SDS-PAGE, and those containing the polypeptide were pooled and concentrated with Macrosep 10K concentrators (Filtron). The apparent molecular weight of the purified protein was estimated by applying it to a calibrated HiLoad 16/60 Superdex S-200 column (Amersham Pharmacia Biotech) equilibrated and eluted with 0.3 M NaCl, 50 mM sodium phosphate buffer, pH 7.0.
Assays-Prolyl 4-hydroxylase activity was assayed by a method based on the hydroxylation-coupled decarboxylation of 2-oxo[1-14C]glutarate (19). In some experiments the (Pro-Ala-Pro-Lys)5 substrate was purified from the reaction mixture by reverse phase HPLC, hydrolyzed using the manual gas-phase hydrolysis method, and analyzed in an Applied Biosystems 421A amino acid analyzer. N-terminal sequencing of the purified (Pro-Ala-Pro-Lys)5 peptide was performed in an Applied Biosystems 477A pulse-liquid protein sequencer. Km and Vmax values were determined as described previously (20).

The PBCV-1 Genome Encodes a Prolyl 4-Hydroxylase-like Polypeptide-A sequence homology search indicated that the genome of PBCV-1 (Refs. 10 and 11; GenBank™ accession number U42580) contains an open reading frame encoding a 242-amino acid polypeptide that shows a distinct sequence similarity to the C-terminal half of the catalytic α subunits of prolyl 4-hydroxylases from various animal sources (Fig. 1). A putative signal sequence is located at its N terminus, the most likely first amino acid of the processed viral polypeptide being glutamate (Fig. 1), based on the computational parameters of von Heijne (18). Thus the length of the signal peptide is probably 32 residues and that of the processed polypeptide 210 amino acids. The sequence of the processed viral polypeptide is 20% identical to residues 294-504 in the 517-residue α subunit of human type I prolyl 4-hydroxylase (13) and 15-23% identical to the corresponding residues in the α subunits of the human type II prolyl 4-hydroxylase (14) and the C. elegans (15) and D. melanogaster (16) prolyl 4-hydroxylases (Fig. 1). The two histidines and one aspartate that bind the Fe2+ atom at the catalytic site (20-22) and the lysine that binds the C-5 carboxyl group of the 2-oxoglutarate (20) are all conserved in the PBCV-1 sequence (His-152, Asp-154, His-221, and Lys-231 in Fig. 1). Because the corresponding residue in all other 2-oxoglutarate dioxygenases, including the closely related enzyme lysyl hydroxylase (23), is an arginine (21, 24, 25), the conserved lysine led us to regard it as possible that the viral polypeptide is a prolyl 4-hydroxylase. The fifth critical residue at the catalytic site of the vertebrate prolyl 4-hydroxylases, a histidine that is probably involved in the binding of the C-1 carboxyl group of 2-oxoglutarate to the Fe2+ atom and in the decarboxylation of this cosubstrate (20), is replaced in the PBCV-1 sequence, as in the Drosophila α subunit sequence, by an arginine (Arg-239 in Fig. 1). However, the PBCV-1 sequence shows no similarity to the peptide substrate binding domain present between residues 140 and 240 in the α subunits of animal prolyl 4-hydroxylases (26).
The Recombinant PBCV-1 Prolyl 4-Hydroxylase-like Polypeptide Is a Soluble Monomer-To express the viral polypeptide in E. coli, the PBCV-1 DNA sequence coding for amino acids Glu-36 to Lys-242 was synthesized by PCR, cloned into the pET-15b vector with an N-terminal histidine tag, and transformed into the BL21(DE3) host strain. Expression of the polypeptide was induced with IPTG, and the cells were incubated at 28°C for 3 h. The cells were then harvested, suspended in a Tris-HCl buffer, pH 7.9, containing 5 mM imidazole, and sonicated, and the soluble and insoluble fractions were analyzed by 12% SDS-PAGE and Coomassie Blue staining (Fig. 2, lanes 2 and 3). The expressed recombinant polypeptide was mainly found in the soluble fraction (Fig. 2, lane 2) and could be purified using a Ni2+-chelate affinity column and imidazole elution (Fig. 2, lane 4). Gel filtration in a calibrated Superdex S-200 column indicated that the recombinant polypeptide had an apparent molecular weight of about 30,000 (details not shown). As the calculated molecular weight of the recombinant polypeptide with the N-terminal histidine tag and the thrombin cleavage site is 27,195, the recombinant polypeptide was apparently a monomer.
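The reasoning behind the monomer assignment can be made explicit by comparing the apparent molecular weight from gel filtration with the calculated molecular weight. The Python sketch below is only an illustration of that arithmetic; the function name is hypothetical, and rounding the ratio to the nearest whole number is a simplifying assumption that ignores the effect of molecular shape on gel filtration.

```python
def estimated_subunits(apparent_mw: float, calculated_mw: float) -> int:
    """Nearest whole number of subunits implied by the two molecular weights."""
    return max(1, round(apparent_mw / calculated_mw))


APPARENT_MW = 30_000    # gel filtration estimate quoted in the text
CALCULATED_MW = 27_195  # calculated for the His-tagged recombinant polypeptide

if __name__ == "__main__":
    ratio = APPARENT_MW / CALCULATED_MW
    print(f"apparent/calculated = {ratio:.2f}")
    print("estimated subunits:", estimated_subunits(APPARENT_MW, CALCULATED_MW))
```

A ratio close to 1 corresponds to a single polypeptide chain, whereas a dimer would be expected to elute at roughly twice the calculated molecular weight.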
The Recombinant PBCV-1 Polypeptide Hydroxylates Both (Pro-Pro-Gly)10 and Poly(L-proline)-To study whether the vi- [...] (19). When 0.5 mg/ml of (Pro-Pro-Gly)10 was used as the peptide substrate, the amount of 14CO2 generated was 5450 cpm, whereas various negative controls gave less than 500 cpm. Poly(L-proline), Mr 40,000, a competitive inhibitor of animal prolyl 4-hydroxylases (5, 6), also acted as a substrate, giving 5850 cpm under the above conditions. The pH optimum of the hydroxylation reaction was 7.0 (details not shown).
The viral enzyme, like the animal and plant prolyl 4-hydroxylases, required Fe2+, 2-oxoglutarate, O2, and ascorbate (details not shown). The Km values for the cosubstrates Fe2+, 2-oxoglutarate, and ascorbate were very similar to those of human type I prolyl 4-hydroxylase (Table I), suggesting that the cofactor binding sites of these enzymes may be similar. However, the Km value of the viral enzyme for the peptide substrate (Pro-Pro-Gly)10 was about 150-fold higher than that of the human type I enzyme (Table I), and the Km values for poly(L-proline), Mr 13,000 and 40,000 (Table I), were also much higher than those of 23 and 7 μM reported for poly(L-proline), Mr 7,000 and 31,000, with the prolyl 4-hydroxylase from the unicellular green alga Chlamydomonas reinhardii (8) or 10 μM for poly(L-proline), Mr 7,000, with the prolyl 4-hydroxylase from the multicellular green alga Volvox carteri (9).
The Viral Enzyme Hydroxylates Peptides Corresponding to Proline-rich Repeats Coded by the Viral Genome-The PBCV-1 genome contains many open reading frames coding for proline-rich repeats. These include (Pro-Ala-Pro-Lys)n, in which n is up to 26, (Ser-Pro-Lys-Pro-Pro)20, (Pro-Glu-Pro-Pro-Ala)9, (Ser-Thr-Lys-Pro-Pro)11, and (Glu-Pro-Ser-Pro-Glu-Pro)5. Synthetic peptides (Ser-Pro-Lys-Pro-Pro)5, (Pro-Glu-Pro-Pro-Ala)5, Lys-Pro-Ala, Pro-Ala-Pro-Lys, and (Pro-Ala-Pro-Lys)n, where n = 2-10, were therefore tested as substrates for the recombinant PBCV-1 polypeptide. All these peptides were found to serve as substrates, their Km values ranging from 20 to 8600 μM (Table II). The Vmax values for (Pro-Ala-Pro-Lys)n, where n = 3-10, were identical within the range of experimental error (Table II), and these values were also essentially identical to those for poly(L-proline), Mr 13,000 and 40,000, and for (Pro-Pro-Gly)10 determined in the same experiments (details not shown), whereas the Vmax for (Pro-Ala-Pro-Lys)2 was about 40% of these values, that for (Ser-Pro-Lys-Pro-Pro)5 about 15%, and those for (Pro-Glu-Pro-Pro-Ala)5, Pro-Ala-Pro-Lys, and Lys-Pro-Ala were even lower (Table II). Thus the best substrate among those tested when considering both Km and Vmax was (Pro-Ala-Pro-Lys)10. The generation of 4-hydroxyproline in the (Pro-Ala-Pro-Lys)5 peptide was verified by amino acid analysis of the peptide purified from the hydroxylation reaction mixture by reverse phase HPLC (details not shown).
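One way to consider Km and Vmax together is to rank the substrates by the ratio Vmax/Km, which governs the Michaelis-Menten rate at low substrate concentrations. The Python sketch below is an illustration under stated assumptions, not the analysis used in the study: it takes the Km (μM) and Vmax values that survive from Table II, treats the Vmax units as arbitrary because they are not stated in the text, omits the one row whose peptide name is truncated in the source, and uses hypothetical function names.

```python
# Km (uM) and Vmax (units not stated in the text) for the Table II
# peptides whose names are legible in the source.
TABLE_II = {
    "(Pro-Glu-Pro-Pro-Ala)5": (1000, 400),
    "Lys-Pro-Ala": (8600, 200),
    "Pro-Ala-Pro-Lys": (4800, 400),
    "(Pro-Ala-Pro-Lys)2": (950, 4600),
    "(Pro-Ala-Pro-Lys)3": (310, 13500),
    "(Pro-Ala-Pro-Lys)5": (50, 10300),
    "(Pro-Ala-Pro-Lys)10": (20, 11900),
}


def michaelis_menten_rate(substrate_um: float, km_um: float, vmax: float) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_um / (km_um + substrate_um)


def vmax_over_km(km_um: float, vmax: float) -> float:
    """Ratio that determines the rate when [S] is much lower than Km."""
    return vmax / km_um


# Peptides ordered from the highest to the lowest Vmax/Km ratio.
RANKED = sorted(TABLE_II, key=lambda p: vmax_over_km(*TABLE_II[p]), reverse=True)

if __name__ == "__main__":
    for peptide in RANKED:
        km, vmax = TABLE_II[peptide]
        print(f"{peptide:24s} Vmax/Km = {vmax_over_km(km, vmax):8.2f}")
```

With these values (Pro-Ala-Pro-Lys)10 has the highest ratio and Lys-Pro-Ala the lowest, in line with the conclusion that (Pro-Ala-Pro-Lys)10 is the best of the substrates tested.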
The substrate requirements of the viral enzyme thus differed distinctly from those of both animal and plant prolyl 4-hydroxylases. The hydroxylation of (Pro-Pro-Gly)10 is a property similar to that of animal prolyl 4-hydroxylases. Although the Km of 2900 μM is much higher than the Km values of 20 and 100 μM of the human type I and type II enzymes (26), 20 μM of the C. elegans enzyme (27), and 260 μM of the D. melanogaster enzyme (16), the Vmax of the viral enzyme for (Pro-Pro-Gly)10 was similar to its Vmax values for poly(L-proline) and the best polypeptide substrates. Some plant prolyl 4-hydroxylases also hydroxylate (Pro-Pro-Gly)10, but only at a very low rate (8). The hydroxylation of poly(L-proline) is a property of plant prolyl 4-hydroxylases (2), whereas poly(L-proline) is a competitive inhibitor of the animal enzymes (6), but the Km values of the viral enzyme for poly(L-proline) were more than 1 order of magnitude higher than those reported for plant enzymes (2,8,9). The best peptide substrates of the viral enzyme, (Pro-Ala-Pro-Lys)10 and (Ser-Pro-Lys-Pro-Pro)5, correspond to sequences coded by the viral genome. The Km values for the authentic viral polypeptides may be even lower, as the Km values decreased with an increase in the chain length of the substrates and as the actual viral repeat sequences range up to (Pro-Ala-Pro-Lys)26 and (Ser-Pro-Lys-Pro-Pro)20.
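The comparison of Km values for (Pro-Pro-Gly)10 can be expressed as fold differences. The following Python sketch simply divides the viral Km by each of the values quoted in the paragraph above; the variable and function names are hypothetical, and the values are those given in the text in μM.

```python
# Km of the viral enzyme for (Pro-Pro-Gly)10, in uM, as quoted in the text.
VIRAL_KM_UM = 2900.0

# Km values of the animal enzymes for (Pro-Pro-Gly)10, in uM.
ANIMAL_KM_UM = {
    "human type I": 20.0,
    "human type II": 100.0,
    "C. elegans": 20.0,
    "D. melanogaster": 260.0,
}


def fold_higher(viral_km: float, other_km: float) -> float:
    """How many times higher the viral Km is than the comparison Km."""
    return viral_km / other_km


if __name__ == "__main__":
    for enzyme, km in ANIMAL_KM_UM.items():
        print(f"{enzyme:16s} {fold_higher(VIRAL_KM_UM, km):6.1f}-fold")
```

The ratio against the human type I enzyme is 145, which agrees with the figure of about 150-fold given earlier.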
Prolines in Both Positions of the -Pro-Ala-Pro-Lys- Repeat Are Hydroxylated by the Viral Prolyl 4-Hydroxylase-In order to study whether prolines in both positions of the -Pro-Ala-Pro-Lys- repeat are hydroxylated, (Pro-Ala-Pro-Lys)5 was allowed to react with the viral prolyl 4-hydroxylase under conditions that gave a high but incomplete extent of hydroxylation. The peptide was then purified from the reaction mixture and subjected to amino acid sequencing. Prolines in both positions of the repeat were found to be hydroxylated, but those preceding alanines were hydroxylated more readily, except in the extreme N-terminal -Pro-Ala-Pro-Lys- repeat (Fig. 3). The highest extents of hydroxylation were seen with prolines in the second and third repeat (Fig. 3). Interestingly, the pattern of hydroxylation of (Pro-Ala-Pro-Lys)5 with the viral prolyl 4-hydroxylase was found to be distinctly different from that of the hydroxylation of the 5 or 10 -Pro-Pro-Gly- triplets in (Pro-Pro-Gly)5 or (Pro-Pro-Gly)10 by the vertebrate enzyme (28,29). The latter hydroxylates its substrates asymmetrically, so that the 4th or 9th triplet from the N-terminal end, respectively, is hydroxylated more readily than any other (28,29), whereas no such asymmetric hydroxylation was seen with the viral enzyme (Fig. 3).
TABLE I. [...] (Pro-Pro-Gly)10 and poly(L-proline). Km values were determined as described previously (20).
TABLE II
Peptide                    Km (μM)    Vmax
[...]5                     20         1800
(Pro-Glu-Pro-Pro-Ala)5     1000       400
Lys-Pro-Ala                8600       200
Pro-Ala-Pro-Lys            4800       400
(Pro-Ala-Pro-Lys)2         950        4600
(Pro-Ala-Pro-Lys)3         310        13,500
(Pro-Ala-Pro-Lys)5         50         10,300
(Pro-Ala-Pro-Lys)10        20         11,900
Conclusions-The present data indicate that the genome of PBCV-1 encodes an active prolyl 4-hydroxylase with many unique properties and a number of protein sequences that can be hydroxylated by the enzyme. The unique properties of the enzyme include its low molecular weight and its specificity toward various peptide substrates. The cosubstrates needed by the enzyme in vivo may be provided either by the virus or, more likely, by its host. On the basis of these data it seems very probable that the occurrence of 4-hydroxyproline in proteins is not restricted to certain animal and plant proteins.
The function of 4-hydroxyproline residues in all collagens and collagen-like proteins in animals is to stabilize their triple-helical structures (3,6,30). The functions of these residues in plant proteins are less well characterized but are also likely to involve stabilization of structures (4). The 4-hydroxyproline residues in plant proteins are often O-glycosylated, and the glycosylation is probably important for the structural role of the proteins in plant cells (1,4). The functions of the 4-hydroxyproline residues in viral proteins are likely to be similar to those in animal and plant proteins, but work will be needed to elucidate these functions and to determine whether 4-hydroxyproline residues in viral proteins serve as attachment sites for carbohydrate units.
FIG. 3. Analysis of the hydroxylation of the proline residues in (Pro-Ala-Pro-Lys)5 by the PBCV-1 prolyl 4-hydroxylase. The hydroxylation reaction was carried out with 80 μg/ml of (Pro-Ala-Pro-Lys)5 as the substrate in the standard prolyl 4-hydroxylase reaction mixture under conditions that gave a high but incomplete extent of hydroxylation of the substrate. The peptide substrate was purified from the reaction mixture by HPLC and subjected to N-terminal sequencing. The columns indicate the degree of hydroxylation of the various proline residues in the hydroxylated peptide. P = proline; A = alanine; K = lysine.