Molecular studies of CtpA, the carboxyl-terminal processing protease for the D1 protein of the photosystem II reaction center in higher plants.

The D1 reaction center protein of the Photosystem II complex in green plants is synthesized with a short carboxyl-terminal extension. Proteolytic cleavage and removal of this extension peptide in the thylakoid lumen are necessary for the assembly of a manganese cluster that is essential for the oxygen evolution activity of Photosystem II. We have isolated cDNAs encoding CtpA, the carboxyl-terminal processing protease for the D1 protein, from two higher plants, spinach and barley. In each of these organisms, CtpA is encoded by a single copy nuclear gene, and its steady-state mRNA levels are light-regulated. The CtpA protein is detectable in etiolated material, and its level increases approximately 5-fold upon illumination. Moreover, the CtpA gene is expressed in shoot tissues and not in roots. In its precursor form, the CtpA protein harbors a bipartite transit sequence characteristic for thylakoid lumenal proteins. Cell fractionation studies demonstrated that CtpA is associated with thylakoid membranes and is resistant to treatments with thermolysin, consistent with its localization in the lumen of thylakoids. Comparisons of the sequence of the higher plant CtpA enzyme with those of other related carboxyl-terminal processing proteases suggest that these proteins constitute a new family of proteases.

The pigment-protein complex Photosystem II (PSII) is the major producer of oxygen in the biosphere. Upon excitation with light, P680 (the reaction center chlorophylls of PSII) releases electrons that move via pheophytin and two plastoquinone molecules to the acceptor side of PSII (see Ref. 1 for a recent review). The oxidized P680 molecule is subsequently reduced by electrons arriving from water via a cluster of four manganese ions and a redox-active tyrosine residue on the donor side of this photosystem (2). The most highly resolved photoactive PSII complex contains a heterodimer of the D1 and D2 polypeptides, two subunits of cytochrome b559, and the PsbI and PsbW proteins (3,4). Closely associated with them are other integral membrane proteins, namely CP47, CP43, PsbL, and a number of smaller polypeptides (5,6). D1 and D2 are membrane integral components that provide binding sites for a number of important redox-active cofactors in PSII (1). While D2 is relatively stable, D1 is rapidly turned over in vivo, presumably during a damage-repair cycle after excitation of the reaction center with light (7). In most cases, the D1 subunit is synthesized as a precursor polypeptide with a short carboxyl-terminal extension (8,9). The C terminus of this protein is subsequently translocated into the thylakoid lumen and immediately cleaved off by a lumenal C-terminal processing peptidase (8,10). Mutants that are unable to remove this extension fail to produce oxygen, presumably because the C terminus of the mature D1 protein functions as a ligand for the manganese cluster in PSII (11-13).
The D1-processing protease has been partially purified from pea and spinach (10,14,15). Recently, ctpA, the gene encoding this protease, has been isolated from the cyanobacterium Synechocystis 6803 (16). Targeted mutagenesis of the ctpA gene in this organism resulted in the formation of inactive PSII complex due to a loss of electron transfer from water via the manganese cluster (17). Immunoblot analysis of PSII proteins revealed that the D1 protein in this mutant strain is nearly 2 kDa larger than that in wild-type cells, confirming that the carboxyl-terminal extension of D1 is not cleaved in this mutant. The cyanobacterial gene encodes a polypeptide of 427 amino acids, with a 31-residue-long bacterial signal sequence at its amino-terminal region, suggesting that the mature enzyme is located in the thylakoid lumen (16). In higher plants, this protease is encoded by a nuclear gene. In this report, we describe the isolation and molecular characterization of CtpA cDNAs from two higher plants, spinach and barley. We also present a comparison of the primary structures of this novel processing enzyme and its homologs from various organisms and discuss its expression pattern and cellular localization.

Construction of cDNA Libraries and Identification of CtpA cDNA Clones-Construction of a spinach cDNA library from poly(A)+ RNA isolated from plants grown under light has been described previously (4). The barley cDNA library was a gift from Prof. Schulze-Lefert (RWTH, Aachen, Germany). For the screening of these libraries, we used a 1.4-kilobase EcoRI genomic fragment from the cyanobacterium Synechocystis 6803, containing the ctpA gene (16), as a probe. Screening of cDNA libraries, analysis of the phage inserts, and Southern and Northern blot analyses were performed as described (18). Blots were hybridized overnight at 65°C in 6× NET (1× NET = 150 mM NaCl, 30 mM Tris-Cl, pH 7.5, 1 mM NaEDTA), 5× Denhardt's solution, 0.5% SDS and washed in 2× NET at room temperature (18). For the amplification of the 5′-end of the CtpA gene from spinach, a 5′-rapid amplification of cDNA ends kit from GIBCO-BRL (Eggenstein, Germany) was used.
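As a quick arithmetic check on the hybridization and wash buffers, the following Python sketch scales the 1× NET composition stated above to the 6× and 2× strengths used here. The function name and the dictionary layout are illustrative assumptions and are not part of the original protocol.

```python
# 1x NET composition as stated in the text (concentrations in mM).
NET_1X_MM = {"NaCl": 150.0, "Tris-Cl, pH 7.5": 30.0, "NaEDTA": 1.0}


def scale_buffer(recipe_mm, fold):
    """Return component concentrations (mM) for a `fold`-strength buffer."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {component: conc * fold for component, conc in recipe_mm.items()}


if __name__ == "__main__":
    # 6x NET is used for hybridization, 2x NET for washing.
    for fold in (6, 2):
        print(f"{fold}x NET:")
        for component, conc in scale_buffer(NET_1X_MM, fold).items():
            print(f"  {component}: {conc:g} mM")
```

Under these assumptions, 6× NET corresponds to 900 mM NaCl, 180 mM Tris-Cl, and 6 mM NaEDTA.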
Analysis of Nucleotide Sequence-DNA sequencing was carried out by the dideoxynucleotide chain termination procedure (18) using a modified T7 polymerase (Sequenase, U. S. Biochemical Corp.), α-35S-dATP, and CtpA-specific as well as universal synthetic oligonucleotides. The final sequence was determined from both strands of DNA fragments. Analysis of nucleotide sequence was performed using the Genetics Computer Group software package (19). For data base searches, the BLAST (20) network service at the National Center for Biotechnology Information (Bethesda, MD) was used.
Protein Electrophoresis and Immunodetection-Purification of thylakoid membranes from spinach and barley chloroplasts, extraction of proteins from these membranes, and the procedure for electrophoretic separation of proteins on one-dimensional SDS-polyacrylamide gel were as described earlier (4). For the overproduction of CtpA in Escherichia coli, a highly conserved region of the CtpA gene (corresponding to nucleotides 1113-1700) (see Fig. 1) was amplified using polymerase chain reactions, for which the oligonucleotide at the 5′-end was modified such that it introduced an in-frame ATG translational start codon. After digestion of the polymerase chain reaction product with EcoRI, it was cloned into the EcoRI site of a pET derivative overexpression plasmid (21) and transformed into E. coli strain BL21 (DE3). Overexpression in the presence of isopropyl-1-thio-β-D-galactopyranoside was as described (21). After gel electrophoresis, proteins from the overproducing E. coli strain were electrophoretically transferred to nitrocellulose and visualized by Ponceau S staining in 1% acetic acid. The desired protein band was excised, dissolved in dimethyl sulfoxide, and injected into rabbits to raise antibodies against the CtpA protein. The serum was affinity-purified with the overexpressed protein on nitrocellulose filters as described by Harlow and Lane (22). Immunological detection of proteins on Western blots was as described (18).
Other Methods-The conditions for the growth of spinach and barley seedlings in darkness and greening of the etiolated plants under light were as described elsewhere (4). Fractionation of thylakoid membranes into PSII, PSI, cytochrome b6f, and ATP synthase, four major thylakoid-bound protein complexes, was as described by Westhoff et al. (23) and Markgraf and Oelmüller (24). Concentrations of proteins were measured as described previously (4).

RESULTS
Isolation and Characterization of CtpA cDNAs-CtpA, the carboxyl-terminal processing protease for the D1 protein of PSII, is a functionally well conserved protein in green plants and cyanobacteria. We used a ctpA gene-specific probe from Synechocystis 6803 (16) to isolate corresponding cDNAs from spinach and barley cDNA libraries. The nucleotide sequence and the deduced amino acid sequence of the spinach gene are presented in Fig. 1. Other than the first 40 nucleotides at the 5′-end, this sequence was derived from a full-length cDNA clone, which contained 14 nucleotides upstream of the first ATG codon. According to the Kozak rule, the translational initiation codons of eukaryotic genes are often preceded by in-frame stop codons (25), none of which was present in the sequence of this cDNA. To verify that this ATG codon encodes the initiator methionine, we performed 5′-rapid amplification of cDNA ends experiments. Two amplification products were obtained and sequenced. The longest sequence contained 54 nucleotides upstream of this ATG codon and harbored two in-frame stop codons, 51 and 42 bases upstream of the ATG codon, respectively.
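The check for in-frame stop codons upstream of a candidate initiator ATG can be illustrated with a short Python sketch. This is a minimal sketch and not the procedure used in this study: the function name is hypothetical, the example leader sequence is invented for illustration (it is not the spinach sequence), and distances are assumed to be counted from the first base of the stop codon to the A of the ATG, a convention under which in-frame distances are multiples of 3 (as are the values 51 and 42 reported above).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stops(seq, atg_index):
    """Return distances (bases upstream of the ATG) of in-frame stop codons.

    `seq` is a sense-strand DNA string; `atg_index` is the 0-based position
    of the A of the candidate initiator ATG. Distances are listed from the
    stop codon nearest the ATG to the one farthest away.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    distances = []
    # Step backwards from the ATG one codon at a time to stay in frame.
    pos = atg_index - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            distances.append(atg_index - pos)
        pos -= 3
    return distances


if __name__ == "__main__":
    # Invented 17-base leader followed by ATG; NOT the spinach CtpA leader.
    toy = "GCTAAGCCTGAGCCGCCATGGCT"
    print(upstream_inframe_stops(toy, 17))
```

For the invented sequence, the function reports in-frame stop codons 9 and 15 bases upstream of the ATG.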
We have also isolated barley CtpA cDNA clones that encode the mature CtpA protein, but lack the nucleotide sequence for its transit peptide (see below). The nucleotide sequence of the longest barley clone has been deposited in the GenBank/EMBL Data Bank under accession number X90929 (cf. also Fig. 2).
Analysis of Spinach CtpA Protein Sequence-Analysis of the hydrophobicity profile of the sequence of the CtpA protein (data not shown) indicated that this polypeptide is largely hydrophilic (26). The only significant hydrophobic stretch is between residues 129 and 145 (Fig. 1). A comparison of the amino acid sequence of the spinach CtpA protein with that of the Synechocystis 6803 CtpA protein suggested that the precursor polypeptide of the spinach protein harbors a bipartite transit sequence characteristic for thylakoid lumenal proteins. Moreover, the presence of a hydrophobic domain preceded by positively charged residues and followed by an SWS motif suggested that terminal cleavage occurs after amino acid residue 150. The SWS sequence in the spinach CtpA protein corresponds to an ALA motif in Synechocystis 6803 CtpA and has been identified as the terminal cleavage site of many thylakoid lumenal proteins (27). Recently, Fujita et al. (28) have determined the sequence of the amino-terminal 27 residues of the CtpA protein from spinach. This sequence matched exactly that of residues 151-177 in the precursor form of the CtpA protein (Fig. 1). Fig. 2 shows a comparison of the amino acid sequences of the CtpA protein from spinach and barley with those of a number of carboxyl-terminal processing proteases from various prokaryotic organisms. Although the highest level of similarity was observed among the D1-processing proteases from the cyanobacteria Synechocystis 6803 (17) and Synechococcus 7002 (29), a surprisingly high degree of conservation was also found in C-terminal processing enzymes that are not involved in D1 processing. In total, 27 residues are conserved in all of these proteases, most of which are present in the C-terminal half of these proteins. Notably, three motifs, DLRXNXGG, ASASEI, and GKGXXQ, are conserved in all of these sequences (Fig. 2; also see "Discussion").
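The three conserved motifs can be located in any protein sequence with a simple pattern search. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, X is taken to mean any single residue, and the test sequence is invented (it is not a real CtpA sequence).

```python
import re

# Motifs as written in the text; X stands for any residue.
MOTIFS = ("DLRXNXGG", "ASASEI", "GKGXXQ")


def motif_to_regex(motif):
    """Translate a one-letter motif with X wildcards into a compiled regex."""
    return re.compile(
        "".join("." if aa == "X" else re.escape(aa) for aa in motif)
    )


def find_motifs(protein, motifs=MOTIFS):
    """Map each motif to the 1-based start positions of its matches."""
    protein = protein.upper()
    hits = {}
    for motif in motifs:
        pattern = motif_to_regex(motif)
        # finditer reports non-overlapping matches, which is sufficient
        # for motifs of this kind.
        hits[motif] = [m.start() + 1 for m in pattern.finditer(protein)]
    return hits


if __name__ == "__main__":
    # Invented sequence for illustration only; NOT a real CtpA sequence.
    toy = "MKDLRGNPGGTTASASEIWWGKGLVQ"
    for motif, positions in find_motifs(toy).items():
        print(motif, positions)
```

Applied to real sequences, such a search would only confirm the presence and position of the motifs; it does not replace the multiple alignment shown in Fig. 2.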
Expression Profiles of CtpA in Plant Cells-Southern blot analysis of the genomic DNAs from spinach and barley indicated that the CtpA gene is present in a single copy in these green plants (Fig. 3A). This gene has also been identified in wheat and has been assigned to the homologous group 3 chromosomes by restriction fragment length polymorphism analysis of aneuploid (nullisomic/tetrasomic) lines (U. Hohmann, R. Oelmüller, and R. G. Herrmann, unpublished data). Northern analysis with spinach and barley RNAs demonstrated that the CtpA cDNAs hybridize to ~2000-nucleotide-long transcripts (Fig. 3B). Since the spinach cDNA contains only 1821 base pairs, it seems to lack either 5′- or 3′-untranslated flanking sequences. These Northern blot data also revealed that the steady-state mRNA levels of the CtpA gene are strongly light-regulated in higher plants. Moreover, no hybridizing transcript could be detected in roots, indicating that this gene is expressed only in shoot tissues. Thus, the expression pattern of the CtpA gene is similar to that of other nuclear encoded genes for thylakoid proteins (30,31). Since the C-terminal half of the CtpA protein contains the most conserved domains (Fig. 2), the corresponding part of the CtpA gene was overexpressed in E. coli, and the overexpressed protein was used to raise antibodies in rabbits. These antibodies recognized a 42-kDa protein in extracts from Synechocystis 6803 cells (Fig. 4A), consistent with previous observations (17). Since no such cross-reacting band was detected in the CtpA-deficient cyanobacterial mutant strain T564, we concluded that the immunoreaction was specific for the CtpA protein. In spinach, a slightly larger protein (45 kDa) was visualized (Fig. 4B), consistent with results obtained by Fujita et al. (28). This protein was present in etiolated tissues, and its steady-state level increased ~5-fold upon illumination (Fig. 4B), again confirming that the expression of the CtpA gene is light-regulated in green plants.

FIG. 2. Comparison of the amino acid sequences of the mature CtpA protein from 1) spinach (this study), 2) barley (this study), 3) Synechocystis 6803 (S6803; GenBank/EMBL accession number L25250), 4) Synechococcus 7002 (S7002; partial sequence, GenBank/EMBL accession number S18125), and 5) Bartonella bacilliformis (Barton; GenBank/EMBL accession number L37094) as well as those of the related Prc proteases from 6) E. coli (GenBank/EMBL accession number D00674) and 7) Haemophilus influenzae (Hemoph; GenBank/EMBL accession number L46298). The PileUp program in the Wisconsin Genetics Computer Group package was used to align these sequences (19). Numbers on the right correspond to the positions of the amino acid residues in each protein. Dashes indicate gaps introduced to optimize alignments. The arrow marks the processing site after the import/signal sequence in sequences 1, 3, and 4. Spaces indicate identity to the corresponding residue in the spinach protein. Residues in boxes are conserved.
Cellular Localization of CtpA Protein-On the basis of its function, the CtpA protease is expected to be present in the lumen of thylakoid membranes since the C terminus of the pre-D1 protein is exposed to the lumen. Western blot analysis showed that the CtpA protein was present in protein extracts from intact chloroplasts (Fig. 5A). In particular, it was present in the thylakoid fraction, but not in the stromal fraction of these plastids. Further fractionation of the thylakoid membranes into PSII, PSI, cytochrome b6f, and ATP synthase, four supramolecular complexes, showed that CtpA is not tightly associated with any of them (Fig. 5A). Moreover, the CtpA protein in isolated thylakoid membranes was protected from digestion with the protease thermolysin (Fig. 5B). Similar effects were observed with PsbO, a 33-kDa lumenal polypeptide in PSII. In contrast, thermolysin could digest the stroma-exposed domain of PsbW, a transmembrane component of PSII (for details, see Ref. 4). Taken together, these data conclusively demonstrated that the CtpA protein is localized in the thylakoid lumen.

DISCUSSION
In this report, we have described the primary structure of CtpA, a C-terminal processing protease from spinach and barley, two higher plants. This nuclear encoded protein is synthesized in the cytoplasm as a precursor protein and then imported into the chloroplasts. The deduced amino acid sequence of the precursor polypeptide (Fig. 1) as well as the subcellular fractionation studies (Fig. 5) clearly demonstrated that the CtpA enzyme is located in the thylakoid lumen. An intriguing feature of this polypeptide is an unusually long bipartite transit sequence (150 residues), which exceeds the length of the transit sequences of all known lumenal polypeptides in chloroplasts (32). The stroma-directing envelope transit domain and the thylakoid transfer domain exhibit the expected characteristics: the former is hydrophilic, basic, and enriched in hydroxylated residues (Fig. 1), and the latter contains a positively charged N terminus followed by a hydrophobic segment, which is a crucial epitope for the translocation of lumenal proteins across the thylakoid membrane (32)(33)(34). Interestingly, cleavage of the cyanobacterial CtpA precursor proteins occurs after an AXA motif, characteristic for the terminal cleavage sites of many bacterial and thylakoid lumenal proteins (27). However, in the CtpA protein in spinach, the Ala residues in this motif have been replaced by Ser, another short chain amino acid.
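The two sequence features discussed here, a hydrophobic segment and a short-chain-residue cleavage motif, lend themselves to a simple computational scan. The Python sketch below is illustrative only: it assumes the Kyte-Doolittle hydropathy scale (the text does not state which scale was used for the profile), a window of 9 residues chosen arbitrarily, and a simplified [Ala/Ser]-X-[Ala/Ser] pattern that covers both the AXA motif and the SWS variant in spinach. All function names are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy values (assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[aa] for aa in protein.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def small_x_small_sites(protein):
    """1-based positions of the last residue of each [AS]-X-[AS] motif."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [
        m.start() + 3
        for m in re.finditer(r"(?=[AS].[AS])", protein.upper())
    ]
```

A candidate cleavage site would then be an [AS]-X-[AS] position lying shortly after a window of high mean hydropathy. The simplified pattern will also match many sites that are never cleaved, so it can only suggest candidates that need experimental confirmation, such as the amino-terminal sequencing cited above (28).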
It has been observed earlier that the Tsp (or Prc) protein in E. coli (35) as well as the CtpA protease in Synechocystis 6803 (17) exhibit significant sequence similarities to the interphotoreceptor retinoid-binding protein in bovine, human, and insect eye systems (36,37). The spinach CtpA protein also shows such sequence similarity to the interphotoreceptor retinoid-binding protein (data not shown), suggesting that it might bind carotenoids (17). To test whether the activity of CtpA depends on the availability of carotenoids, we grew barley seedlings in the presence of Norflurazon. Norflurazon inhibits carotenoid biosynthesis, but allows accumulation of chlorophylls and the D1 protein in dim light-grown plants (38). However, under such conditions, we found that the D1 protein was processed to its mature size (data not shown), indicating that carotenoids are not absolutely required for the catalytic function of CtpA.
An important observation during this study is that the expression of the CtpA gene in both spinach and barley is positively influenced by the presence of light. Interestingly, both the CtpA transcript and the CtpA protein are present in low but significant amounts in dark-grown etiolated tissues. In contrast, the chlorophyll-binding protein D1, the substrate of CtpA, is not detected in such tissues (4). It is well known that the expression of most of the nuclear genes that encode various soluble or membrane protein components of the photosynthetic apparatus in green plants is light-regulated (30). However, the CtpA protein serves a regulatory function during the biogenesis of functionally active thylakoid membranes. Thus, CtpA is one of the first examples of nuclear genes encoding regulatory proteins in chloroplasts whose transcription is strongly influenced by light.
Numerous proteins in eukaryotic and prokaryotic cells are synthesized in precursor forms with cleavable carboxyl-terminal extensions (35, 39-41). However, the processing proteases responsible for such cleavage reactions have not been widely studied. Only a small number of such proteases from prokaryotic organisms have been characterized at the molecular level. As shown in Fig. 2, a number of them share significant similarities at the level of their amino acid sequences. Among all of these proteins, the Prc (or Tsp) proteases from E. coli (35,39) and Haemophilus (42) are significantly larger than the CtpA proteins from plants, cyanobacteria, and Bartonella. As mentioned earlier, a small but significant number of residues are absolutely conserved in all of these proteases, suggesting that they may be important determinants for the activity of these C-terminal processing proteases. These include three Asp, one Glu, two Ser, two Pro, two Arg, and one Lys residue. In a recent study, Keiler and Sauer (43) identified three active-site residues, a serine, an aspartic acid, and a lysine residue, in the Tsp (or Prc) protease in E. coli. The corresponding residues in the spinach CtpA protein are serine 441, aspartic acid 452, and lysine 466, three absolutely conserved residues (Fig. 2), and we expect that these residues have catalytic roles in the spinach CtpA protein also. Interestingly, the CtpA proteases from plants and cyanobacteria did not exhibit any sequence similarity to the recently described HycI C-terminal processing protease involved in the maturation of a hydrogenase subunit in E. coli (44). Bowyer et al. (14) and Satoh and co-workers (45) have studied the biochemical properties of the CtpA protease in pea and spinach. Based on the profiles of inhibition by various chemicals, they have concluded that this protease is not a member of the serine, aspartate, cysteine, or metalloprotease families.
Taken together, the sequence information and the biochemical data suggest that the CtpA-like enzymes constitute a new class of proteases. In this context, it is noteworthy that only one gene for this type of protease is present in the entire genome of Haemophilus (42).