Cloning and Expression in Escherichia coli of the Cytochrome c 552 Gene from Thermus thermophilus HB8

We report the sequence of Thermus thermophilus HB8 DNA containing the gene (cycA) for cytochrome c 552 and a gene (cycB) encoding a protein homologous with one subunit of an ATP-binding cassette transporter. The cycA gene encodes a 17-residue N-terminal signal peptide followed by an amino acid sequence identical to that reported by Titani et al. (Titani, K., Ericsson, L. H., Hon-nami, K., and Miyazawa, T. (1985) Biochem. Biophys. Res. Commun. 128, 781–787). A modified cycA was placed under control of the T7 promoter and expressed in Escherichia coli. Protein identical to that predicted from the gene sequence was found in two heme C-containing fractions. Fraction rC 552, characterized by an α-band at 552 nm, contains ∼60–70% of a protein highly similar to native cytochrome c 552 and ∼30–40% of a protein that contains a modified heme. Cytochrome rC 552 is monomeric and is an excellent substrate for cytochrome ba 3. Cytochrome rC 557 is characterized by an α-band at 557 nm, contains ∼90% heme C and ∼10% of non-C heme, exists primarily as a homodimer, and is essentially inactive as a substrate for cytochrome ba 3. We suggest that rC 557 is a “conformational isomer” of rC 552 having non-native axial ligands to the heme iron and an “incorrect” protein fold that is stabilized by homodimer formation.

Three cytochromes c from Thermus thermophilus HB8 have been described (cf. Fee et al. for review (1)). One of these, the very basic cytochrome c 552, is a good substrate for the terminal oxidase cytochrome ba 3 (~250 electrons/s at 25°C (2)) and a poor substrate for the terminal oxidase cytochrome caa 3 (~1-2 electrons/s at 25°C (3)). Mechanistic and biophysical examination of these respiratory enzymes is greatly impeded because useful quantities are obtained only from large amounts of bacterial cell mass, and yields of cytochrome c 552 from T. thermophilus are small; for example, <1 mg of pure cytochrome c 552 is obtained from 1 kg of Thermus cell paste (4), although this may depend on growth conditions and purification procedures (4,5). Studies on the functional properties of the two Thermus cytochrome c oxidases are thus limited by the availability of, and by the need for designed mutations in, the cytochrome c. The prospect of having the three-dimensional structure of cytochrome ba 3 (6) and the report of an atomic resolution x-ray structure of native Thermus cytochrome c 552 (7) emphasize the potential of this system to provide novel insight into cytochrome c oxidase function. In the course of constructing a deletion series of a Thermus genomic DNA fragment for sequencing the c 552 locus, we observed that colonies from a certain time point in the series were distinctly reddish-brown in color (8). This led us to attempt engineered expression of the cytochrome c 552 gene in E. coli.
The c-type cytochromes have covalently attached heme groups, usually linked via thioether bonds between two cysteine residues and the 2- and 4-vinyl groups of the heme. Bacterial cytochromes c are located in the periplasmic space in Gram-negative bacteria or, in the case of Gram-positive bacteria, are bound to the outer surface of the plasma membrane by means of a hydrophobic N-terminal segment (9). Cytochrome c 552 of T. thermophilus was previously shown to be located in the periplasmic space (10). Bacterial cytochrome c synthesis involves several steps. (a) A pre-apocytochrome c is synthesized in the cytoplasm, i.e. a protein that has no heme attached and that possesses an additional N-terminal domain containing signals to transport the protein across the plasma membrane and to specifically proteolyze the signal sequence peptide. (b) The apoprotein and protoheme IX are transported across the plasma membrane, where the signal peptide is cleaved. (c) Once heme and apoprotein are in the periplasmic space, the heme is covalently attached to the protein. (d) Finally, the protein folds into the conformation of the mature holoprotein (cf. Ref. 11 for review). While many details remain unknown, the entire process occurs within a few seconds of protein synthesis (12).
As many as 16 genes and/or gene products are essential for cytochrome c synthesis (cf. Refs. 11 and 13 for review). Included are ferrochelatase (HemH), an enzyme responsible for incorporating iron into protoporphyrin IX (14); heme lyase (CcmF and CcmH), which attaches heme to apoprotein by catalyzing the formation of the thioether (see Refs. 11 and 15 for review and additional references); a thioredoxin-like molecule located in the periplasm that maintains reduced cysteine SH in apocytochromes c (CycY or HelX, Refs. 16,17); a disulfide isomerase that presumably helps maintain the apocytochrome c in a state suitable for reaction with the lyase (DipZ) (18); unidentified "assembly factors" such as CycH in Paracoccus denitrificans (19); and genes that encode ABC transporters (for review, see Refs. 11 and 20). For example, the helABCD operon of Rhodobacter capsulatus (17) and the cycVWXY operon of Bradyrhizobium japonicum (21) encode ABC-type ATP-dependent transport proteins that are essential for cytochrome c synthesis in their respective organisms (cf. Refs. 11 and 13 for review and additional citations). Finally, there are specialized gene products for synthesis of specific cytochromes c, e.g. cytochrome c 1 of the bc 1 complex (19,22), and there appear to be specific sequences within the cytochrome c molecule itself that contain information to guide the synthetic process (23,24).
Given the complexity of cytochrome c maturation, one might suspect that heterologous expression would be inefficient and, indeed, there are numerous reports to that effect (25-31). There are, however, some notable exceptions. Thus, Ubbink et al. (32) describe expression of the complete (including code for the signal peptide) Thiobacillus versutus cytochrome c 550 gene in E. coli. The cell mass obtained from each liter of culture medium yielded 1-2 mg of holoprotein, all of which appeared to reside in the periplasm. This allowed the authors to obtain enough protein for high-resolution NMR studies. Sambongi and Ferguson (33) made the remarkable observation that synthesis of holo Paracoccus denitrificans cytochrome c 550 in E. coli requires an N-terminal signal peptide to target the gene product to the periplasm, whereas E. coli can synthesize holo Hydrogenobacter thermophilus cytochrome c 552 without targeting to the periplasm. These authors attributed heme C formation to "spontaneous cytoplasmic attachment of heme to the (properly folded) thermostable protein," revealing yet another complexity of cytochrome c synthesis in E. coli. We describe here the first heterologous expression system for large scale preparation of Thermus cytochrome c 552 from E. coli and our initial characterization of the gene products.

EXPERIMENTAL PROCEDURES
Molecular Biology Methods-Genomic DNA was isolated from T. thermophilus strain HB8 (ATCC 27634) as described previously (34). E. coli strains were cultured in L-broth (35) modified by omitting glucose and lowering the sodium chloride to 0.5%, with 1.5% agar (Difco) for plates. Plasmid preparations were carried out by standard procedures (36). Probe design for genomic hybridization was based on the method of most probable codons (37, 38; see "Results") using the GCG (39) program CODONFREQUENCY. The cytochrome c 552 gene was isolated on a 1.6-kilobase KpnI genomic restriction fragment by the method of Mather et al. (40) and cloned in the Stratagene vector BSII(SK+), generating plasmid p13BCYCA (8). An exonuclease III deletion series was constructed to sequence the sense strand (Erase-a-base system, Promega), and synthetic oligonucleotide primers were used to sequence the antisense strand. Sequencing was performed by the method of Sanger et al. (41) with modifications as described (42) to alleviate compressions, false terminations, and pyrophosphorolysis.
Construction of pETC552-The c 552 reading frame was amplified by PCR from p13BCYCA (42) while simultaneously introducing restriction sites for subcloning. The expression vector pET17b (Novagen, Madison, WI) was used to harbor and direct the synthesis of the following construct in E. coli strain BL21(DE3). The sense-strand primer, 5′-d(CTGCTCGGCGGCCTGGCATATGGCCCAG)-3′, introduces an NdeI site (CATATG) and the start codon (ATG). The antisense primer, 5′-d(CTGGGCCAGCATGGGATCCGGTTACTT)-3′, introduces a BamHI site (GGATCC) just downstream of the native stop codon (TTA on this strand). PCR primers were synthesized at the California Institute of Technology using an Applied Biosystems 380B DNA synthesizer. The PCR reaction was optimized using the HotWax OptiStart Kit™ (Invitrogen, San Diego, CA). The PCR fragment was gel-purified to remove excess primers, nucleotides, and PCR buffer from the sample. After digestion of the PCR fragment and pET17b with NdeI and BamHI, each was agarose gel-purified. The prepared vector and insert fragments were ligated with T4 DNA ligase (New England Biolabs). Bacterial strain BL21(DE3) was transformed to ampicillin resistance with the ligation mixture, and colonies containing the construct pETC552 were isolated for analysis. The construct was verified by DNA sequencing on an Applied Biosystems Prism DNA sequencing system at the California Institute of Technology sequencing facility.
General Methods-Optical spectra were recorded on a SLM/AMINCO model DB3500 spectrophotometer in 1-cm cells. Fully oxidized proteins were obtained by making the solution 10 M in ferricyanide, and reduced proteins were prepared by adding a small amount of strongly buffered sodium dithionite solution directly to the optical cuvette. The reduced-oxidized extinction coefficient, Δε = 14.3 mM⁻¹ cm⁻¹ (5), was used to determine approximate concentrations of the recombinant cytochromes c. Pyridine hemochromes were prepared and quantified according to the method of Berry and Trumpower (43). Second derivatives of absorption spectra were obtained using the Macintosh program IGOR (Wavemetrics Inc., Portland, OR). Protein was measured with the BCA protein assay kit of Pierce according to the manufacturer's instructions. Cytochrome c oxidase activity was determined polarographically at 25°C in a Gilson water-jacketed cell using a YSI 5331 oxygen probe (Yellow Springs Instruments) and a Chemtrix type 30 oxygen meter with output to a strip-chart recorder.
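The use of the reduced-minus-oxidized extinction coefficient can be illustrated with a minimal Beer-Lambert sketch. Only Δε = 14.3 mM⁻¹ cm⁻¹ and the 1-cm path length come from the text; the function name and the example absorbance difference of 0.143 are hypothetical, and the wavelength at which the difference is read is not stated here and is left to the user.

```python
# Minimal Beer-Lambert sketch (illustrative; not the authors' software).
# DELTA_EPSILON_MM is the reduced-minus-oxidized coefficient quoted in the text.
DELTA_EPSILON_MM = 14.3  # mM^-1 cm^-1


def cytochrome_conc_mM(delta_abs, path_cm=1.0, delta_eps=DELTA_EPSILON_MM):
    """Return the approximate cytochrome concentration in mM.

    delta_abs : absorbance difference (reduced minus oxidized), unitless
    path_cm   : optical path length in cm (1-cm cells are used in the text)
    delta_eps : difference extinction coefficient in mM^-1 cm^-1
    """
    if path_cm <= 0 or delta_eps <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    # Beer-Lambert law rearranged: c = dA / (d_eps * l)
    return delta_abs / (delta_eps * path_cm)


if __name__ == "__main__":
    # Hypothetical reading: dA = 0.143 in a 1-cm cell.
    conc = cytochrome_conc_mM(0.143)
    print(f"approximate concentration: {conc * 1000:.1f} uM")
```

Because the coefficient is a difference value, the result is only as good as the completeness of oxidation and reduction of the sample, which is why the text describes these concentrations as approximate.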
Amino acid analyses and N-terminal sequencing were done in the Protein Chemistry Laboratory at the University of New Mexico and carried out as described by Keightley et al. (42). Reaction of protein with iodoacetic acid was carried out as described by Simpson et al. (44). Under the conditions of the N-terminal sequence analysis, carboxymethyl cysteine has the same chromatographic properties as serine. SDS-polyacrylamide gel electrophoresis was carried out, with minor modifications, according to Downer et al. (45), using a Bio-Rad Mini-PROTEAN II electrophoresis cell (Hercules, CA). Nondenaturing gel electrophoresis was carried out with reversed polarity using the same apparatus according to the method of Gabriel (46). Electrospray mass spectrometry was carried out at the Scripps Research Institute Mass Spectrometry Facility (La Jolla, CA) using a Perkin-Elmer SCIEX API III mass analyzer with the orifice potential set at 100 V (47). Reconstructions of mass spectra from ion spectra were made using the Perkin-Elmer program Bio-MultiView. The GCG (39) program, PEPTIDE-SORT, was used to calculate expected properties of the isolated gene products.
Expression and Purification of Recombinant Cytochrome c 552-10 ml of culture medium (LB and 50 mg/ml ampicillin) were inoculated from a freshly streaked plate of E. coli (strain BL21-DE3) cells containing the pETC552 plasmid. After incubation overnight at 37°C, this culture was used to inoculate 1 liter of LB medium, containing 50 mg/ml ampicillin, in a 2.8-liter Fernbach flask. This culture was incubated at 37°C overnight on a rotary shaker set at 125 rpm. The cells were pelleted by centrifugation at 5000 × g for 5 min. The cell pellet had a distinctive reddish-brown color, indicating heme protein overproduction. The pellet from 1 liter of culture was resuspended in 25 ml of 50 mM Tris-HCl, pH 8.0, 4 mg/ml lysozyme, 40 units/ml DNase I, 3 units/ml RNase A, 0.1% Triton X-100. p-Toluenesulfonyl fluoride was added (2 mM) to inhibit proteolysis, and the extract was incubated at 30°C for at least 30 min. The cell debris was separated from the extract by centrifugation at 12,000 × g for 15 min at 4°C. The supernatant, which had the characteristic reddish-brown color, was loaded onto a CM-52 gravity column that had been equilibrated with 25 mM Tris-HCl, 0.1 mM EDTA, pH 8.0 (25 mM TE buffer) at 4°C. The column was washed overnight with several column volumes of equilibration buffer and eluted with a gradient of 0-1 M NaCl in 10 mM TE buffer. Fractions containing the reddish-orange protein were combined and concentrated using Centricon 10 concentrators (Amicon) with YM-3 or YM-10 membranes. The concentrated protein was desalted (10 mM TE buffer) using a PD-10 column (Amersham Pharmacia Biotech) and passed over an EconoPac CM column (Bio-Rad) attached to an Amersham Pharmacia Biotech fast protein liquid chromatography purification system. The column was washed with several column volumes of the same buffer, and the protein was eluted with a gradient of 0-2 M NaCl in 10 mM TE buffer.
The final step of the procedure involves fast protein liquid chromatography purification using a HiLoad 16/60 Superdex 75 column (Amersham Pharmacia Biotech) that was equilibrated with TE buffer (see Fig. 3). The protein was concentrated as described above.

RESULTS
The amino acid sequence of T. thermophilus HB8 cytochrome c 552 was previously determined by Edman degradation of proteolytically generated peptides (48). We used reverse translation and took advantage of the highly skewed codon usage of T. thermophilus (cf. Refs. 49 and 50) to design a nondegenerate oligonucleotide probe for the gene that was complementary to the coding sequence for the peptide QGQIEVKGMKYNG: 5′-(CCGTTCTACTTCATCCCCTTGACCTCGATCTGCCCCTG). This probe was used to identify and isolate a 1.6-kilobase pair KpnI fragment as described under "Experimental Procedures." The entire fragment was sequenced on both strands and is presented in Fig. 1; the sequence of the probe was found to differ from the gene sequence in only 4 of the 38 positions (at the positions underlined above).
The DNA sequence reveals two open reading frames. The first open reading frame begins at position 154 and ends at position 600 with the stop codon TAA. The translated nucleotide sequence from 205 to 597 corresponds to the chemically determined amino acid sequence of native cytochrome c 552 (48), and we designate this cycA. Translation from position 154 through 204 indicates the presence of a 17-amino acid signal peptide (underlined in Fig. 1; cf. Ref. 51). The region upstream of the initiation codon (position 154) contains elements that probably serve as a promoter, including −10 and −35 regions and a ribosome binding site, also underlined in Fig. 1 (cf. Ref. 52 for a review of Thermus promoters). Only 9 bases downstream of the TAA stop codon in cycA, a second open reading frame begins at nucleotide 610 and ends with a termination codon (TAG) at 1404; we designate this region as cycB. The DNA sequences in the cycA and cycB regions were subjected to codon usage analysis using a table of codon usage generated from 14 previously sequenced T. thermophilus genes (cf. Ref. 42). The average codon preference value for cycA (nt 154 to nt 600) = 1.3084, and the value for cycB (nt 610 to nt 1404) = 1.2317, values typical of other Thermus genes (cf. Refs. 42 and 50). In contrast, average codon preferences of the other two possible frames in these two regions range from 0.3858 to 0.7692. These results reveal that both cycA and cycB adhere closely to Thermus usage and are thus properly assigned as open reading frames. The sequence of cycA predicts exactly the amino acid sequence of cytochrome c 552 determined chemically by Titani et al. (48).
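The coordinates quoted above can be cross-checked with simple interval arithmetic. The sketch below is illustrative only: it uses just the nucleotide positions given in the text (1-based, inclusive), and the constant and function names are hypothetical. It does not read the sequence of Fig. 1.

```python
# Consistency check on the open reading frame coordinates quoted in the text.
# All positions are 1-based and inclusive; names are illustrative.
CYCA_ORF = (154, 600)    # initiation codon through the TAA stop codon
SIGNAL = (154, 204)      # predicted signal peptide
MATURE = (205, 597)      # region matching the chemically determined sequence
CYCB_ORF = (610, 1404)   # second open reading frame, ending in TAG


def span_length(start, end):
    """Number of nucleotides in a 1-based inclusive interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(start, end):
    """Number of whole codons in the interval; raises if not a multiple of 3."""
    n = span_length(start, end)
    if n % 3 != 0:
        raise ValueError(f"interval {start}-{end} is not a whole number of codons")
    return n // 3


def gap_between(upstream_end, downstream_start):
    """Nucleotides lying strictly between two features."""
    return downstream_start - upstream_end - 1


if __name__ == "__main__":
    print("signal peptide residues:", codon_count(*SIGNAL))
    print("mature protein residues:", codon_count(*MATURE))
    print("cycA codons incl. stop: ", codon_count(*CYCA_ORF))
    print("cycB codons incl. stop: ", codon_count(*CYCB_ORF))
    print("bases between cycA stop and cycB start:",
          gap_between(CYCA_ORF[1], CYCB_ORF[0]))
```

Under these coordinates the signal peptide spans 17 codons and 9 bases separate the two reading frames, in agreement with the description above.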
The translated amino acid sequence from cycB is also shown in Fig. 1. A TFASTA search (53) against the Swissprot data base retrieved numerous proteins that share amino acid sequence similarity, with the highest scoring hits being members of the protein superfamily known as ABC transporters (data not shown). These form a diverse group of enzymes that bind and hydrolyze ATP to drive the signal sequence-independent transport of polypeptides, organic molecules, or ions across membranes (cf. Ref. 20 for a general review). As reviewed by Thöny-Meyer (11), a subset of this family is essential for cytochrome c maturation. Fig. 2 shows a complete alignment of the CycB amino acid sequence with those of the ATP-binding subunit of selected ABC transporters that have been implicated in cytochrome c maturation in several eubacteria. The designated A- and B-sites (Fig. 2) form the ATP binding fold (20,54), and the sequence similarity in these regions as well as throughout the molecule is evident. CycB from Thermus thus appears to be a homolog of known cytochrome c maturation proteins and therefore probably has the same function in T. thermophilus (see "Discussion").
Two palindromic sequences were identified using the GCG (39) program STEMLOOP, as underlined in Fig. 1. The first palindromic sequence occurs at nt 606 to nt 617 with a complementary sequence at nt 633 to nt 622. This includes 2 G=U pairings, which suggests that this palindrome may form in the mRNA and thus may regulate termination of translation. Since the start codon of cycB is located in the leading strand of this potential structure, it may provide part of an attenuation signal to modulate CycB formation. Potentially the strongest stemloop-forming sequence begins at nt 1435 to nt 1448 with a complementary region at nt 1466 to nt 1453. This occurs 30 nucleotides after the termination codon of cycB; it contains a single A-C mismatch and is probably involved in termination of transcription.
Although functional in Thermus cells, the suggested promoter elements upstream of cycA (Fig. 1) do not direct the expression of detectable amounts of cytochrome c 552 in E. coli. Other T. thermophilus HB8 promoters containing similar −10 and −35 elements do function in E. coli (cf. Refs. 55-57), however, and transcription from a 23 S rRNA-5 S rRNA-tRNA Gly promoter with similar −10 and −35 elements was found to promote the initiation of transcription at the identical nucleotide position in both E. coli and Thermus (58). That holocytochrome c can be expressed from cycA in E. coli if preceded by a suitable promoter was first evidenced by the appearance of reddish-colored colonies among clones bearing a set of nested deletions that were prepared for DNA sequencing (8). This observation suggested to us that placing cycA under the control of another promoter might permit expression of useful quantities of recombinant cytochrome c 552 in E. coli.
Accordingly, a plasmid was prepared by PCR amplification of a DNA fragment containing cycA in which the forward primer includes an NdeI site and an alanine codon just upstream of the codon for glutamine at nucleotide position 205, while the reverse primer contains a BamHI site and the normal stop codon for cycA (see "Experimental Procedures"). This fragment was then ligated into the Studier vector (59), pET17b, that had been digested with NdeI and BamHI; by this strategy, cycA was placed immediately downstream of the T7 RNA polymerase promoter. The standard strategy in this expression system is to activate the expression of T7 RNA polymerase from the host genome through an IPTG-inducible promoter, which then transcribes an open reading frame placed downstream of the T7 promoter in a pET construct (59). The complete DNA sequence of the insert showed that no mutations had been introduced during the construction process. The predicted gene product thus begins with MAQADGAKI and ends with KKLGLK (cf. Fig. 1). Lacking a signal sequence, any protein formed is likely to be restricted to the cytoplasm (cf. Ref. 11).
When E. coli cells (strain BL21-DE3) containing this plasmid are grown on agar plates in the absence of the T7 RNA polymerase inducer (IPTG), they appear as reddish-colored colonies. Similarly, when the cells are grown aerobically in 1-liter batches, again in the absence of IPTG, the cell pellets are distinctly red-orange in color. However, induction of T7 RNA polymerase with IPTG in liquid cultures causes the formation of inclusion bodies in the host cytoplasm and leads to no detectable holocytochrome c formation. These observations indicate that a balance exists under noninducing conditions between protein synthesis initiated by a somewhat "leaky" IPTG-inducible promoter in the DE3 lysogen (cf. Ref. 59) and the ability of the cell to provide the necessary heme for cytochrome synthesis. We see no evidence for apocytochrome c in our preparations (see below). When cell pellets from cultures grown under expressing conditions (without IPTG) are treated with a lysozyme/EDTA mixture (60) and centrifuged, the resulting supernatant has only a faint color, suggesting that little if any heme protein was transported to the periplasm (not shown). However, when the cells are disrupted (see "Experimental Procedures") and subjected to centrifugation at 10,000 × g for a few minutes, the supernatant has a strong reddish-orange color. A cation exchange resin (CM-52) clears the color of the supernatant in a few minutes and itself turns reddish-brown from the adsorbed material. Accordingly, a purification procedure was developed based on the cationic nature of cytochrome c 552 at neutral pH (the pI predicted by PEPTIDESORT is 10.2, in agreement with that observed for native protein (5)) and its relatively small molecular mass (~15 kDa) (see "Experimental Procedures"). As shown in Fig. 3, the elution profile of the final gel filtration column reveals the presence of two cytochrome c-containing fractions. The void volume of this column is ~24 ml with an exclusion limit of ~100,000 Da.
The first peak, eluting at ~65 ml, shows an α-band absorbance in the reduced state at 557 nm (see below) and is designated cytochrome rC 557, while the major peak, eluting at ~79 ml, has an α-band at 552 nm and is designated cytochrome rC 552. (We have not done a complete standardization of the column, but the elution profile is consistent with the appearance of a 30-kDa dimer at 65 ml and a 15-kDa monomer at ~80 ml.) The ratio of absorbance at 408 nm to the absorbance at 280 nm (r = A 408/A 280) is a good indicator of purity, and these are relatively constant across the two peaks. For rC 557 the purity ratio is ≥~3.1, whereas for rC 552 the ratio is ≥~4. Yoshida et al. (4) reported a value of 5.1 for native cytochrome c 552 from Thermus cells, and our best preparations of rC 552 have values of ~4.5. As evidenced by gel electrophoresis (Fig. 4), N-terminal sequence analyses (Table I), and electrospray mass spectrometry (Figs. 5 and 6), each of these fractions contains highly pure cytochrome c proteins that are encoded by the Thermus cycA gene. Fig. 4A shows SDS gels stained with Coomassie Blue. The sample of native protein shows a majority band near 14.5 kDa and weak impurity bands at ~21.5 and ~40 kDa. The rC 552 sample shows a single band at ~14.5 kDa and no obvious impurity bands. Similarly, rC 557 shows a single band at ~14.5 kDa but has a band of lesser intensity near 30 kDa. Thus, the two recombinant proteins appear to have molecular masses close to that of the native protein and consistent with translation of the cloned gene. Fig. 4B shows SDS gels that were stained for heme-based peroxidase activity according to the method of Thomas et al. (61). It can be seen that the principal protein bands in all samples, appearing at ~14.5 kDa, stain for heme, and that the apparent impurity band near 31 kDa in cytochrome rC 557 also stains for heme.
We believe that the latter is due to a dimer of the 14.8-kDa protein that survives the denaturing influence of the SDS even at 100°C (see below).
While native cytochrome c 552 is N-terminally blocked (48), the recombinant protein was designed to begin with the MAQ sequence so as to avoid this potentially complicating factor and to remove the signal sequence. Partial N-terminal sequence information on both of these fractions (Table I) shows sequence beginning with alanine. The proteins are very pure with respect to the N terminus, and the individual amino acids were obtained in high percent yield (data not shown). Prior to N-terminal sequence determination, the samples were reduced with dithiothreitol, reacted with ICH2COOH, and purified by high performance liquid chromatography. These procedures should convert any free cysteine-SH groups to the carboxymethyl derivative (44). The left column of Table I shows the yield of the individual amino acids from cytochrome rC 552 as the sequencing reactions progress. It is noteworthy that the encoded N-terminal methionine (62) has been completely removed in both fractions and that the first 11 amino acids are otherwise as predicted by the gene sequence (cf. Fig. 1). At the first predicted cysteine, however, the yield dropped by ~10-fold, increased again for the next two amino acids (also predicted by the gene sequence), and fell again precipitously at the next predicted cysteine. The last amino acid reported is the histidine predicted to serve as the proximal ligand to the heme iron (7). These data demonstrate that both cysteine groups are largely blocked (>95%) in cytochrome rC 552, at least partly by forming thioether bonds with the two vinyl groups of the heme, as is seen in normal cytochromes c (however, see below). The right column of Table I shows the yield of the individual amino acids from cytochrome rC 557 as the sequencing reactions progress.
The N-terminal sequence of cytochrome rC 557 is identical to that of cytochrome rC 552 (Table I), and in this particular sample, there is no detectable Cys(Cm)-cysteine, suggesting complete reaction of the cysteine-SH with the heme. In addition, overall amino acid compositions of both samples are consistent with the compositions predicted from the cycA sequence in pETC552 (data not shown). Fig. 5 shows electrospray mass spectra of native cytochrome c 552 from T. thermophilus and recombinant cytochromes rC 552 and rC 557 obtained from E. coli. The ion spectrum of the native sample, recorded with a 10 mM NH4+-acetate solution, shows seven major ions ranging from 1847 (8+) to 1059 (14+) and a few very small peaks probably from impurities (spectrum not shown). The mass spectrum of native cytochrome c 552 (isolated from Thermus cells) in aqueous solution has its primary peak at 14,771 Da followed by a ladder of peaks resulting from the binding of one to several Na+ ions (Fig. 5A). The predicted molecular mass of the molecule, based on the amino acid sequence of Titani et al. (48), after correcting for the formation of an N-terminal pyroglutamate residue (14,157 Da) and adding 617 Da for the iron-protoporphyrin IX, is 14,774 Da. The difference between the predicted and observed values is just outside the expected error limits of ±2 Da, but this is not pursued further here. There is no evidence for apoprotein in the mass spectrum recorded on a broader scale (not shown). Furthermore, dissolving the protein in a highly denaturing solvent prior to recording the ion spectrum (see below) reveals no apoprotein (not shown). This is to be expected if all the heme in the sample is covalently bound to the protein.
Fig. 5. A shows the mass spectrum of native cytochrome c 552 in 10 mM NH4+-acetate. The primary peak is at 14,771 daltons followed by a ladder of peaks resulting from the binding of 1-5 Na+ ions. B shows the mass spectrum of cytochrome rC 552 in 10 mM NH4+-acetate. The first peak is at 14,858, and the dominant peak is at 14,888 daltons followed by a peak at 14,910 corresponding to binding of one Na+ ion. The predicted molecular mass of rC 552 is 14,245 + 617 = 14,862 daltons. C shows the mass spectrum of cytochrome rC 552 in the denaturing solvent: 30% formic acid, 60% isopropanol, 10% water. Injection of a concentrated cytochrome solution into this solvent causes its normally orange color to change to brown-green. The electrospray experiment was carried out immediately thereafter. The dominant peak in the spectrum is now at 14,874 with lesser peaks at 14,566 and 14,891 daltons. D shows the mass spectrum of cytochrome rC 557 in 10 mM NH4+-acetate. The shape of this spectrum is unchanged on dissolution of the protein into the denaturing solvent. The primary peak is at 14,858 daltons. See text for details.
The electrospray ion spectrum of cytochrome rC 552, recorded from a solution of 10 mM NH4+-acetate, consists of only two major peaks at 2483 (6+) and 2128 (7+) and a minor peak at 1862 (8+). The mass spectrum of cytochrome rC 552 (Fig. 5B) reveals a relatively small peak at 14,858 Da, a dominant peak at 14,890 Da, and a lesser peak at 14,910 Da, the latter corresponding to the binding of one Na+ ion to the 14,890-Da material. A broader reconstruction of the mass spectrum from the ion spectrum failed to reveal the presence of apoprotein (14,245 Da). The predicted molecular mass for rC 552 is 14,245 + 617 = 14,862 Da, which may correspond to the relatively small peak at 14,858 Da, although it is just outside the error limits of ±2 Da. The most likely explanation of the mass spectrum is that 1-3 water molecules remain tightly bound to this protein even after the ionization/evaporation event of the electrospray experiment.
This notion is supported by the observation that when cytochrome rC 552 is diluted into the strongly denaturing solvent (cf. Ref. 42), 30% formic acid, 60% isopropyl alcohol, 10% water, the majority peak is shifted from 14,890 Da in water to 14,875 Da (Fig. 5C). This 15-Da decrement is assigned to the loss of a single water molecule. Interestingly, the peak at 14,890 Da remains in Fig. 5C, but at reduced intensity, suggesting that the water is quite tightly bound to the protein. In contrast to native cytochrome c 552, dissolution in the denaturing solvent produces a small amount of material (~5%) appearing at 14,245 Da that corresponds to the predicted mass of the apoprotein. This is consistent with the finding that ~4-5% of the cysteine residues in this fraction are not covalently attached to heme (Table I). A reconstruction of the mass spectrum over the range 10-50 kDa (not shown) reveals only a weak band at 29,724 Da, which is assigned to cytochrome rC 557. As shown in Fig. 5D, the mass spectrum of cytochrome rC 557 in aqueous solution reveals a single peak at 14,858 Da, which is very close to that predicted from the gene sequence plus one iron-protoporphyrin IX (14,862 Da). However, this is not the dominant species in the mass spectrum (see below). As with the rC 552 protein, there is no evidence for apoprotein in this sample. Similarly, while the ion spectrum (not shown) changes dramatically on dissolution of the protein into the denaturing solvent, the reconstructed spectrum in this range (not shown) is essentially the same as that obtained in aqueous solution (see below). Also, there is no evidence for formation of apoprotein upon denaturation, confirming the N-terminal sequence data, which show that essentially all the heme in rC 557 is covalently bound (Table I).
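The mass bookkeeping in the preceding paragraphs can be laid out as a short sketch. It uses only the integer masses quoted in the text (apoprotein masses, 617 Da for iron-protoporphyrin IX, and the observed peaks); the dictionary keys and function names are hypothetical labels chosen for illustration, and no isotopic or adduct modelling is attempted.

```python
# Predicted holoprotein masses versus observed electrospray peaks (Da),
# using the integer values quoted in the text. Labels are illustrative.
HEME_DA = 617            # iron-protoporphyrin IX, as used in the text
APO_NATIVE_DA = 14157    # native sequence with N-terminal pyroglutamate
APO_RECOMBINANT_DA = 14245  # predicted apoprotein from cycA in pETC552

OBSERVED_DA = {
    "native c552, aqueous": 14771,
    "rC552 minor peak, aqueous": 14858,
    "rC552 dominant peak, aqueous": 14890,
    "rC552 dominant peak, denaturing solvent": 14875,
    "rC557 monomer, aqueous": 14858,
}


def holo_mass(apo_da, heme_da=HEME_DA):
    """Predicted mass of apoprotein plus one covalently bound heme."""
    return apo_da + heme_da


def offset(observed_da, predicted_da):
    """Observed minus predicted mass; positive means extra bound mass."""
    return observed_da - predicted_da


if __name__ == "__main__":
    native_pred = holo_mass(APO_NATIVE_DA)
    recomb_pred = holo_mass(APO_RECOMBINANT_DA)
    print("predicted native holoprotein:     ", native_pred)
    print("predicted recombinant holoprotein:", recomb_pred)
    for label, mass in OBSERVED_DA.items():
        pred = native_pred if label.startswith("native") else recomb_pred
        print(f"{label}: observed {mass}, offset {offset(mass, pred):+d} Da")
```

Listing the offsets side by side makes it easier to see which peaks fall near the predicted holoprotein mass and which carry additional mass, the latter being what the text attributes to tightly bound water.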
The monomeric species in the mass spectrum of rC 557 (Fig. 5D) represents only a very small fraction of the protein reaching the mass detector. Instead, the protein appears to oligomerize. The molecular ion spectrum of cytochrome rC 557 is shown in Fig. 6 and, as noted above, is very different from that of cytochrome rC 552. Here there are 12 ions ranging from 1982.0 (15+ on the dimer) to 1144.0 (est. 26+ on the dimer) occurring in unit charge steps. A broad scale reconstruction of the mass spectrum using the program Bio-MultiView reveals peaks at n × 14,858, where n appears to increase without bound. Thus, there is an extremely small peak at 14,858 Da (that shown in Fig. 5D), a strong peak at 29,716 Da, a much weaker peak at 44,575 Da, followed by a stronger peak at 59,433 Da, a weak peak at 74,291 Da, and so on. The molecular ion spectrum, however, supports only the presence of dimers. This can be seen, for example, by using the formula M_app = [(m/z) × z] − z and assuming that the ion peak at 1982.0 is due to the tetramer (M_app = 59,430 Da) and has 30+ charges. It is predicted then that an ion peak must occur at ∼1918 having 31+ charges. No such peak is visible in Fig. 6. Similar ion peaks are expected for the tetramer mass at all odd charge values; by inspection, these are not present. Therefore, the mass spectrum is only consistent with the presence of a dimer having mass 29,716, which is twice that expected for the monomer (see above). This suggests that dimerization is the physical basis for the separation of rC 557 from rC 552 on the gel filtration column (Fig. 3).
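The charge-state argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function and constant names are illustrative, and it assumes each charge is carried by a proton of mass about 1 Da, which is what the formula M_app = [(m/z) × z] − z implies.

```python
# Illustrative sketch; function and constant names are not from the paper.
# Assumes every charge is a proton of mass ~1 Da, as in M_app = (m/z * z) - z.

MONOMER_DA = 14_858.0            # monomer mass reported in the text
DIMER_DA = 2 * MONOMER_DA        # 29,716 Da
TETRAMER_DA = 4 * MONOMER_DA     # 59,432 Da


def mz(mass_da: float, z: int) -> float:
    """m/z expected for a species of the given mass carrying z charges."""
    return (mass_da + z) / z


def apparent_mass(mz_value: float, z: int) -> float:
    """Invert the relation above: M_app = (m/z * z) - z."""
    return mz_value * z - z


# Dimer ions for charge states 15+ through 26+ (12 ions in unit charge steps).
dimer_ions = {z: mz(DIMER_DA, z) for z in range(15, 27)}

# A tetramer with an even charge 2z has the same m/z as a dimer with charge z,
# so only odd tetramer charge states can distinguish the two hypotheses.
tetramer_only = {z: mz(TETRAMER_DA, z) for z in range(31, 52, 2)}

if __name__ == "__main__":
    for z, value in dimer_ions.items():
        print(f"dimer    {z:2d}+  m/z = {value:7.1f}")
    for z, value in tetramer_only.items():
        print(f"tetramer {z:2d}+  m/z = {value:7.1f}  (expected only if tetramer is present)")
```

Because even-charged tetramer ions coincide exactly with dimer ions, the absence of the odd-charged series (for example near m/z 1918 for 31+) is the observation that rules out the tetramer.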
To further test for the presence of multimers in aqueous solutions of rC 557, samples were subjected to gel electrophoresis under nondenaturing conditions. Fig. 4C shows reversed polarity electrophoresis at pH 4.5 in 15% acrylamide gel (46). In the left lane, the native cytochrome c 552 from Thermus shows essentially a single band (and a minor impurity), and the recombinant rC 552 fraction (central lane) shows a dominant single band with electrophoretic mobility very similar to that of native protein and a weak band of lesser mobility. In the case of rC 557 (right lane), only a weak band is observed at the mobility of native and rC 552 proteins, but it shows a strong band with approximately the same electrophoretic mobility as the weak band in rC 552. Were there a mixture of dimers, tetramers, etc., these would appear as a ladder of bands, which is not the case. Thus, as anticipated from the electrospray and gel filtration observations, this experiment indicates that "as isolated" rC 557 exists predominantly as a homodimer, whereas rC 552 exists primarily as a monomer, with the possibility of […] Table II; the visible peak(s) in the pyridine hemochrome spectrum is also included (in parentheses). The native protein has Soret maxima at 409 nm (oxidized) and 417 nm (reduced), and the reduced protein shows the β-band at 525 nm and the α-band at 552 nm, in agreement with previous work (5). By comparison, rC 552 has Soret maxima at 410 nm (oxidized) and 418 nm (reduced), and the reduced protein has bands at 522 and 552 nm; rC 557 has Soret maxima at 416 nm (oxidized) and 423 nm (reduced), while the reduced protein has absorption bands at 526 and 557 nm. The different optical absorption properties indicate that the heme is experiencing a significantly different environment in the two proteins (see below). In addition, the reduced spectrum of rC 552 shows a distinct shoulder near 580 nm (Fig. 7), suggesting the presence of some non-heme C material (see below).
This shoulder is also evident in the rC 557 sample but at much lower intensity than in rC 552 .
Some of the spectral differences are better visualized in the second derivative absorption spectra of the Soret (oxidized and reduced) and visible (reduced) regions, as shown in Fig. 8. This presentation defines the Soret absorption maxima mentioned above and illustrates the characteristic splitting of the α-band of native c 552 (548.4 and 553.4 nm), first recognized by Honnami and Oshima (5),⁴ and shows that this distinguishing feature is absent in rC 552, which exhibits only a single, second derivative absorption band at 550 nm. Interestingly, the second derivative of the absorption spectrum of cytochrome rC 557 suggests a split α-band that is shifted several nanometers to the red (552.8 and 558.2 nm). The broad shoulder near 570 nm, evident in the spectra of Fig. 7, does not survive the loss in signal-to-noise that accompanies calculation of the second derivative.
The pyridine hemochrome (phc) spectra of native cytochrome c 552 and recombinant cytochromes rC 552 and rC 557 are shown in Fig. 9, A and B. These provide additional insight into the origin of the spectral differences described above. In the Soret region, the native Thermus cytochrome c 552 shows a single band at 413 nm, whereas cytochrome rC 552 shows a somewhat broader band near 415 nm with a distinct shoulder near 433 nm. The cytochrome rC 557 fraction shows the major peak near 416 nm, being approximately as broad toward the blue as the rC 552 spectrum but having distinctly less of the 433 nm material evident toward the red. In the visible region (Fig. 9B), the phc spectrum of native Thermus cytochrome c 552 shows a β-band at 522 nm and a sharp α-band at 550 nm with no evidence for a longer wavelength band. By contrast, the phc spectrum of rC 552 shows bands at 522, 550, and ∼576 nm, while rC 557 shows maxima in the visible region at 521 and 552 nm and has a much smaller contribution near 576 nm than does the spectrum of rC 552. The fact that the heme C components in the phc spectrum of rC 552 and rC 557 are essentially identical to each other when the protein is denatured is consistent with the above conclusion that the protein in these two fractions is the same.⁵ The small differences in the native and recombinant phc spectra are likely due to the different N terminus, a neutral pyroglutamate in native protein versus a glutamine and a neutral N-terminal alanine (in the strongly basic solution) in the recombinant proteins.
The absorption spectra indicate the presence of some non-heme C material in the recombinant cytochrome c preparations. As noted above, essentially all the heme in both fractions is covalently bound, as confirmed by the ability of acid-acetone extraction (63) to remove only traces of color from the rC 552 protein (data not shown). Some heme could be removed from rC 552 by the silver sulfate method of Paul (64) (provided the protein was first dialyzed against dilute acetic acid to remove chloride), as indicated by the fact that the phc spectrum of the resulting extract was similar to that of starting material (not shown). However, treatment of the extracted heme by the method of Grinstein (65), which simultaneously removes the iron and produces the dimethyl ester, reveals only the presence of protoporphyrin IX (as evidenced by the characteristic four-line spectrum with peaks at 505, 538, 577, and 625 nm (65)). We will defer identification of this material until it can be separated from the protein in a form suitable for chromatographic purification (Ref. 66; however, see "Discussion").
⁴ The α-band (or Q00 transition) results from the promotion of an electron from the vibrational ground state of the nondegenerate a2u(π) orbital to the vibrational ground state of the doubly degenerate eg(π*)x and eg(π*)y orbitals of the porphyrin (76,77). The characteristic A-type magnetic circular dichroism spectrum in this region of the spectrum is caused by the field induced splitting of eg(π*)xy (78,79). In addition, small structural changes, which induce structural differences along the x and y directions of the porphyrin, can also split the eg(π*)xy orbitals, resulting in the appearance of two transitions in the α-band region, the so-called split-α spectrum seen in many cytochromes c (80). Than et al. (7) remark that the heme in their crystals of native cytochrome c 552 is distorted into a "saddle shape," which could be the origin of the observed splitting of the native c 552 α-band.
⁵ Cytochromes c from certain protozoa (81,82) and C14A human cytochrome c expressed in yeast (83) show α-bands near 556-558 nm in their reduced forms with pyridine hemochrome α-bands from 553-554 nm. In addition, the phc α-band is shifted to 550-551 nm on heating in the presence of hydrazine. Amino acid sequence analysis of cytochrome c 557 from Crithidia oncopelti and cytochrome c 558 from Euglena gracilis showed that an alanine was substituted in the place of the cysteine nearest the N terminus (81). Because heating with hydrazine has no effect on the α-band of heme C in the recombinant cytochromes being described here (J. A. Keightley, D. Sanders, T. R. Todaro, A. Pastuszyn, and J. A. Fee, unpublished results), and the data of Table I show that both cysteine residues are blocked, the unique spectral properties of rC 557 are not due to failure to form a thioether linkage between the protein and the heme.
The numbers in parentheses correspond to the position of the pyridine hemochrome (see Fig. 9).
The above results show that the cytochrome rC 552 obtained from E. coli is not a "molten image" (Psalms 106:19) of native cytochrome c 552. However, the data of Fig. 10 show that cytochrome rC 552 is an excellent substrate for cytochrome ba 3, having ∼85% of native c 552 activity, while cytochrome rC 557 is quite a poor substrate, having only ∼5% of native cytochrome c 552 activity. Thus, when used as a substrate for Thermus cytochrome ba 3, we obtain the following Michaelis-Menten parameters: native cytochrome c 552 has V max ∼70 s⁻¹ and K m ∼10 μM, while cytochrome rC 552 has a V max of ∼60 s⁻¹ and K m ∼15 μM.⁶ Soulimane et al. (2) have reported a somewhat higher value for V max (∼250 s⁻¹) and a similar value for K m (∼17 μM).
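As a rough illustration of what these parameters imply, the sketch below evaluates the Michaelis-Menten rate law for the two substrates. It is not part of the original analysis: the names are hypothetical, the parameter values are the approximate ones quoted above, and the K m values are assumed to be micromolar.

```python
# Illustrative sketch; names are hypothetical. Km values are assumed to be micromolar,
# and Vmax/Km are the approximate figures quoted in the text.

NATIVE = {"vmax": 70.0, "km_uM": 10.0}   # native cytochrome c552
RC552 = {"vmax": 60.0, "km_uM": 15.0}    # recombinant cytochrome rC552


def mm_rate(s_uM: float, vmax: float, km_uM: float) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]), in s^-1."""
    return vmax * s_uM / (km_uM + s_uM)


def relative_rate(s_uM: float) -> float:
    """Rate with rC552 divided by rate with native c552 at the same [S]."""
    return mm_rate(s_uM, **RC552) / mm_rate(s_uM, **NATIVE)


if __name__ == "__main__":
    for s in (5.0, 10.0, 50.0, 200.0):
        print(f"[S] = {s:6.1f} uM  native = {mm_rate(s, **NATIVE):5.1f} s^-1  "
              f"rC552 = {mm_rate(s, **RC552):5.1f} s^-1  ratio = {relative_rate(s):.2f}")
```

At saturating substrate the ratio tends to 60/70, or roughly 0.86, in line with the ∼85% relative activity quoted above; at lower concentrations the higher K m of rC 552 makes the ratio smaller.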

DISCUSSION
Cytochrome c synthesis in E. coli is a complicated process that appears to accommodate only certain foreign cytochrome c genes, and there are few examples of successful expression. While we were surprised that E. coli bearing pETC552 produced such high levels of holocytochrome c 552 , our enthusiasm has been tempered by the knowledge that this cytochrome is actually a complicated mixture of different forms. Identification of the structural gene for the cytochrome c in the same operon with an ABC transporter was also an unexpected finding.
We will first deal with the implications of cycB. The currently suggested function of the ABC transporters in cytochrome c maturation is to selectively transport protoheme IX from the cytosol into the periplasmic space and to bring it into contact with a specific apocytochrome c, and Thöny-Meyer (11) has pointed out that the CcmB component of the ABC transporter (not yet identified in Thermus) may contain a heme binding site. Among the currently known genes involved in cytochrome c maturation, none is transcribed as part of an operon that includes the cytochrome c structural gene (11). While further work is needed to establish involvement of this ABC transporter component in Thermus cytochrome c biosynthesis, our finding of an operon that encodes both the structural gene of the cytochrome c and a component of the ABC transporter strongly suggests that the latter is involved in the maturation process.
We turn now to the cytochrome c expressed in E. coli. As reported by Titani et al. (48), the mature holocytochrome c 552 isolated from T. thermophilus has a blocked N terminus, probably a pyroglutamate. The DNA sequence reveals that the cytochrome is synthesized in Thermus cells as a pre-protein having an N-terminal signal peptide. The sequence of this signal peptide, (M)K+R+TLMAFLLLGGLALA↓Q, is very similar in length and composition to other known pre-apocytochrome c sequences (27,28,32,67,68), which typically have 1-2 positively charged amino acids at the N terminus followed by 14-20 hydrophobic amino acids; proteolytic cleavage occurs prior to the first hydrophilic residue (69), in this case ↓Q. Very likely, this sequence is responsible for directing the protein into the periplasm of Thermus cells (10). The mechanism of pyroglutamate (48) formation in the mature cytochrome c 552 is not known, and, unfortunately, because residues 1-3 are disordered in the currently available crystals (7), it is impossible to confirm the presence of a pyroglutamyl N terminus.
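The composition of the signal peptide described above can be tallied directly from the sequence. The sketch below is only an illustration: the helper name and the simple rule used to split the peptide (everything up to the last Lys/Arg is the charged N-terminal region, the remainder is the hydrophobic stretch) are assumptions made here, not a method from the paper.

```python
# Illustrative sketch; the splitting rule and names are assumptions, not the paper's method.

SIGNAL_PEPTIDE = "MKRTLMAFLLLGGLALA"   # residues preceding the cleavage site (before Q)
POSITIVE = set("KR")                    # lysine and arginine carry a positive charge


def split_signal(seq: str) -> tuple[str, str]:
    """Split into (charged N-terminal region, remaining stretch) at the last K/R."""
    last_positive = max(i for i, aa in enumerate(seq) if aa in POSITIVE)
    return seq[: last_positive + 1], seq[last_positive + 1:]


n_region, h_region = split_signal(SIGNAL_PEPTIDE)
n_positive = sum(aa in POSITIVE for aa in n_region)

if __name__ == "__main__":
    print(f"total length      : {len(SIGNAL_PEPTIDE)}")
    print(f"N-terminal region : {n_region} ({n_positive} positive residues)")
    print(f"following stretch : {h_region} ({len(h_region)} residues)")
```

Under this rule the peptide has two positive residues followed by a 14-residue stretch, which falls within the 1-2 and 14-20 ranges cited for other pre-apocytochromes c.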
The truncated cycA gene in pETC552 begins with the sequence MAQ followed by the sequence of the mature protein. In both fractions, the N-terminal Met is completely removed. As evidenced by the absence of heme-free protein in our preparations, essentially all of the cytochrome c apoprotein that is synthesized appears to react with protoheme IX. This is occurring in the cytoplasm and is thus consistent with the idea of Sambongi and Ferguson (33) that a properly folded and reduced apoprotein can react spontaneously with protoheme IX to form the thioether linkages of heme C. However, there would appear to be two folding paths, one of which results in monomeric rC 552 and the other in dimeric rC 557. In addition, part of the heme in both fractions is chemically modified (see below).
The expression system yields ∼3 mg of purified cytochrome rC 552 and ∼0.8 mg of cytochrome rC 557 per liter of culture medium. This level of production is adequate to obtain the quantities of material required for most modern biophysical studies. However, because there are several cytochrome c related products in the final protein solutions, the use of this material will be somewhat restricted. The majority species in fraction rC 552 appears to be a "native-like" cytochrome c. Because rC 552 lacks the "split-α" absorption band which is characteristic of the native protein in its reduced state (5), the unique structural details responsible for splitting the eg(π*)xy excited state (cf. Footnote 4) are evidently not present in the recombinant protein. Nevertheless, two observations that will be published elsewhere suggest that this material probably has an atomic structure very similar to that of native cytochrome c 552. First, high resolution NMR studies reveal a pattern of paramagnetically shifted ¹H NMR lines similar to that previously reported for native Thermus cytochrome c 552 (Ref. 70).⁷ Second, single crystals of cytochrome rC 552 have been grown that have the same morphology and x-ray diffraction properties as those obtained with native protein.⁸ The most important complication is that the cytochrome rC 552 contains a significant amount of non-heme C chromophore. While the atomic structure of this chromophore remains to be determined, we know the following from our current data.
The material is covalently attached to the protein, probably at the cysteine residues; it has a molecular mass very close to that of protoheme IX (within ±3-4 Da); it is removed from the protein under the same conditions that remove the protoheme IX, namely reaction with Ag+ ions; either it does not survive the strongly acidic conditions needed to remove the iron from protoporphyrin IX, or it does not release its Fe under these conditions⁹; it is redox active and accounts for ∼30-40% of the oxidizing equivalents in the sample while heme C accounts for the remainder¹⁰; and it reacts with hydrazine at elevated temperatures to create a new chromophore having an absorption band at 602 nm.¹¹ Unfortunately, we have not found growth conditions that suppress formation of this material. Fortunately, the unusual heme is not a serious impediment to examining the steady-state kinetic/proton translocation properties of cytochrome ba 3. Thus, on a per milligram protein basis, the rC 552 protein is ∼85% as active as native cytochrome c 552, suggesting that the protein molecules to which the unusual heme is bound are at least partially active. Alternatively, the change from the neutral N-terminal pyroglutamate to the positively charged N-terminal Ala may affect the interaction with cytochrome ba 3. While not perfect, the expression system, as it is currently used, will provide the substrate needed for many functional studies with cytochrome ba 3 that were not previously feasible.
What is the nature of the non-heme C material? Barker and co-workers (71,72), in attempting to transmute cytochromes b to cytochromes c by the judicious placement of cysteine residues near the heme, discovered a noncovalently attached green heme in their recombinant preparations, which they recently identified as iso-spirographis heme, in which the 4-vinyl of protoheme IX is oxidized to a formyl group.¹² The unusual heme is thought to form, during expression in E. coli, by a heme-mediated oxidation of its own vinyl group when in close proximity to the free thiol group of the protein (71). The pyridine hemochrome spectrum of the free iso-spirographis heme has absorption maxima at 434, 538, and 582 nm (73), while the pyridine hemochrome spectrum of the unusual heme in fraction rC 552 shows absorption bands near 433 nm and at 576 nm. Because the electrospray spectra show essentially one protein species, the non-C heme material has a molecular mass very close to that of protoheme IX. A mono-formyl heme would have a mass only 2 Da larger than protoheme, a difference that would not be resolved in our spectra. This and the similarity of the phc spectra support a working hypothesis that the unusual chromophore is a "formyl" heme. The N-terminal sequences and mass spectral data indicate that the unusual heme is covalently attached to the protein in a manner that protects both cysteine residues from reacting with iodoacetate, perhaps as a thiohemiacetal bond. Efforts are currently underway to separate this material from the true recombinant cytochrome c 552 and to characterize it by a combination of resonance Raman, high resolution NMR, EPR, and other techniques.
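The 2-Da figure for a mono-formyl heme follows from the composition of the two substituents. The short sketch below reproduces that arithmetic; it is an illustration only, with hypothetical names, and it assumes standard average atomic masses and a simple replacement of one vinyl group (C2H3) by one formyl group (CHO).

```python
# Illustrative sketch; names are hypothetical. Assumes standard average atomic
# masses and a simple vinyl (-CH=CH2) to formyl (-CHO) substitution.

AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

VINYL = {"C": 2, "H": 3}            # -CH=CH2 substituent
FORMYL = {"C": 1, "H": 1, "O": 1}   # -CHO substituent


def group_mass(composition: dict[str, int]) -> float:
    """Sum of average atomic masses for the given elemental composition."""
    return sum(AVERAGE_MASS[element] * count for element, count in composition.items())


# Mass change on converting one vinyl group to a formyl group.
delta_da = group_mass(FORMYL) - group_mass(VINYL)

if __name__ == "__main__":
    print(f"vinyl  : {group_mass(VINYL):.3f} Da")
    print(f"formyl : {group_mass(FORMYL):.3f} Da")
    print(f"change : {delta_da:+.3f} Da")
```

A shift of about 2 Da is comparable to the stated mass accuracy of the electrospray measurements, which is why the spectra cannot distinguish a formyl heme from protoheme IX.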
The third interesting product of the gene expression system is the rC 557 fraction. All the analytical results indicate that the two proteins are identical in composition, but the red-shifted α-band in the spectrum of the reduced protein indicates that the heme C is in a significantly different environment. We hypothesize, therefore, that cytochrome rC 557 is a "conformational isomer" of cytochrome rC 552, which has ligands to the heme iron different from those of the native protein (His-14 and Met-69), and that this conformation has a strong propensity to dimerize.
Sosnick et al. (74) have rationalized biphasic ("fast" and "slow") folding in horse heart cytochrome c, with the fast phase due to molecules having the heme properly coordinated in the unfolded state (His-14 and Met-80), while the slow phase is due to molecules that have suffered "adventitious misorganization in early chain condensation" caused by non-native ligation of the heme in the unfolded state (either His-26 or His-33). The presence of rC 557 in our preparations can be explained by the scheme of Sosnick et al. (74) if a portion of the molecules is improperly coordinated in the unfolded state (His-16 and possibly His-33, Met-64, or His-87 in place of Met-70), followed by thermodynamic trapping in this state by formation of a dimer. Bryngelson et al. (75) have pointed out that misligation during cytochrome c folding would introduce considerable roughness into the energy landscape for the folding of this protein, which otherwise appears to be quite smooth. The low activity of rC 557 as a substrate for cytochrome ba 3 suggests that the dimer interface involves a portion of the surface area that normally interacts with the oxidase. Once the structural coordinates are made available (7), it may be possible to deduce the structure of this dimer. A fourth complication is that the rC 557 fraction also contains unusual heme, though by comparison with the rC 552 fraction, at a much lower level.
In summary, thermostable cytochromes c can be expressed heterologously in E. coli cells, but the resulting product is not the desired mature, homogeneous holoprotein. Indeed, there appears to be a multiplicity of closely related products, some of which, at least, perform the expected biological function.