The Identification and Structural Characterization of C7orf24 as γ-Glutamyl Cyclotransferase

The hypothetical protein C7orf24 has been implicated as a cancer marker with a potential role in cell proliferation. We have identified C7orf24 as γ-glutamyl cyclotransferase (GGCT) that catalyzes the formation of 5-oxoproline (pyroglutamic acid) from γ-glutamyl dipeptides and potentially plays a significant role in glutathione homeostasis. In the present study we have identified the first cDNA clones encoding a γ-glutamyl cyclotransferase. The GGCT gene is located on chromosome 7p14-15 and consists of four exons that span 8 kb. The primary sequence is 188 amino acids in length and is unlike any protein of known function. We crystallized functional recombinant γ-glutamyl cyclotransferase and determined its structure at 1.7 Å resolution. The enzyme is a dimer of 20,994-Da subunits. The topology of GGCT is unrelated to other enzymes associated with cyclotransferase-like activity. The fold was originally classified as “BtrG-like,” a small family that only includes structures of hypothetical proteins from Mus musculus, Escherichia coli, Pyrococcus horikoshii, and Arabidopsis thaliana. Since this is the first member of this family with a defined function, we propose to refer to this structure as the γ-glutamyl cyclotransferase fold. We have identified a potential active site pocket that contains a highly conserved glutamic acid (Glu98) and propose that it acts as a general acid/base in the reaction mechanism. Mutation of Glu98 to Ala or Gln completely inactivates the enzyme without altering the overall fold.

The hypothetical protein C7orf24 has been implicated as a cancer marker with a potential role in cell proliferation (1-3).
The γ-glutamyl cycle involves the synthesis of glutathione and its utilization in the uptake of amino acids, particularly in the renal tubule and across the blood-brain barrier in the choroid plexus in the brain (4, 5) (Fig. 1). GGCT (EC 2.3.2.4) is an essential enzyme in this pathway and catalyzes the general reaction,
L-γ-Glutamyl-L-amino acid → 5-oxoproline + L-amino acid (Reaction 1)
Deficiency of glutathione synthetase can lead to chronic 5-oxoprolinuria as a result of the uncontrolled synthesis of the dipeptide γ-glutamylcysteine and its subsequent hydrolysis to cysteine and 5-oxoproline (pyroglutamic acid) by GGCT (6, 7) (Fig. 1). In some patients, the generation and urinary excretion of 5-oxoproline has been reported to be up to 50 g/day. This extraordinary production of 5-oxoproline and associated acidosis has been considered to be a significant factor in the pathology associated with this deficiency. The inhibition of GGCT in cases of glutathione synthetase deficiency would block the degradation of γ-glutamylcysteine and allow its accumulation to a level where it may partially substitute for glutathione in redox and detoxification reactions. Thus, there is potential therapeutic value in designing specific inhibitors of GGCT. Deficiency of 5-oxoprolinase, the enzyme that catalyzes the ATP-dependent decyclization of 5-oxoproline to glutamate, has been reported in several patients and also leads to 5-oxoprolinuria (4-10 g/day) (7). This indicates that there is a significant turnover of glutathione and γ-glutamyl-amino acid dipeptides via GGCT and the γ-glutamyl cycle under normal metabolic conditions. The position of GGCT in the γ-glutamyl cycle suggests that it could play a significant role in regulating the synthesis of glutathione by limiting the availability of γ-glutamylcysteine; however, this possibility has not been critically investigated.
Several previous studies have purified GGCTs from a range of mammalian species (8-12), and multiple isoforms have been detected in rat and human tissues. An early study using starch gel electrophoresis identified a genetic polymorphism in mouse GGCT and mapped the locus by linkage to Lyt-2 on mouse chromosome 6 (13). Despite these previous studies and the significant role this enzyme plays in the γ-glutamyl cycle, a cDNA encoding a GGCT has never been cloned, and nothing is known of its sequence, structure, or genomic organization. In this study, we have cloned a cDNA encoding human GGCT and determined its primary amino acid sequence and gene structure. Bioinformatics analysis has identified related sequences in many animal species but has not identified sequence similarity with any other protein of known function. We have also expressed functional recombinant enzyme and determined its crystal structure. GGCT has a novel fold that contains an unusual overlapping β strand motif and is a member of a small family of proteins of unknown function that have been identified by structural genomics projects. Thus, we have identified the first function attributed to this new protein family.

Purification of Erythrocyte GGCT
GGCT was partially purified from expired human red blood cells that were generously provided by the Australian Red Cross Blood Transfusion Service. The purification procedure followed the DEAE-cellulose and ammonium sulfate steps that were previously described (11), and the presence of the enzyme was detected in chromatographic fractions by a spectrophotometric assay that used γ-glutamyl-L-alanine as a substrate (14).
The active fractions were subjected to SDS-PAGE with Coomassie Blue staining, and the GGCT band was specifically identified by Western blotting with a rabbit antiserum raised against previously purified human erythrocyte GGCT (11). The band corresponding to the GGCT was excised from the Coomassie Blue-stained gel and subjected to trypsin digestion and mass spectrometry analysis by the Australian Proteome Analysis Facility.

Cloning and Expression of Recombinant GGCT
An IMAGE consortium kidney cDNA clone (BC019243) was used as a template with the primers hGCTSF 5′-GTACCGCGGTGGCATGGCCAACTCGGGCTG and hGCTNR 5′-AATGCGGCCGCGCACATAGAATACCCTTAG to amplify the GGCT coding sequence by PCR. The amplified product (629 bp) was cloned into pGEM-T (Promega) and sequenced to confirm the absence of amplification errors. All DNA sequencing was on an ABI 3730 sequencer (Australian Cancer Research Foundation Biomolecular Resource Facility, John Curtin School of Medical Research, Australian National University) following the manufacturer's protocol (Applied Biosystems). The cloned cDNA was excised from the pGEM-T vector with KspI and NotI, ligated into the same sites in the pHUE vector (15), and transformed into Escherichia coli BL21(DE3). Expression of recombinant enzymes in pHUE allows the rapid purification of the expressed protein by Ni2+-agarose chromatography and the subsequent removal of all additional NH2-terminal residues by cleavage with the catalytic domain of mouse ubiquitin-specific protease 2, as previously described in detail (15). Some minor contaminating proteins remained, so the enzyme was subjected to an additional step of ion exchange chromatography. The protein was dialyzed into 10 mM potassium phosphate, pH 6.5, and fractionated on a 1 × 12-cm column of DEAE-Sepharose with a linear gradient to 0.2 M potassium phosphate, pH 6.5, containing 1 M NaCl. The eluted fractions containing GGCT activity were pooled, and if not used immediately, the enzyme was stored frozen at −20°C.
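The subcloning step relies on the KspI and NotI sites carried by the two primers. The following minimal sketch checks that each primer contains the expected recognition sequence (CCGCGG for KspI, GCGGCCGC for NotI) and locates the ATG in the forward primer. It assumes that the hyphens inside the printed primer sequences are line-break artifacts and not part of the sequence; the helper names are illustrative only.

```python
# Primer sequences from the text, with the internal hyphens removed
# (assumption: the hyphens are typesetting artifacts).
FORWARD = "GTACCGCGGTGGCATGGCCAACTCGGGCTG"  # hGCTSF
REVERSE = "AATGCGGCCGCGCACATAGAATACCCTTAG"  # hGCTNR

# Recognition sequences of the two enzymes used for subcloning.
# Both are palindromic, so searching one strand is sufficient.
SITES = {"KspI": "CCGCGG", "NotI": "GCGGCCGC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Map each enzyme name to the 0-based start of its site, or -1 if absent."""
    return {name: seq.find(site) for name, site in SITES.items()}


forward_sites = find_sites(FORWARD)
reverse_sites = find_sites(REVERSE)
start_codon_index = FORWARD.find("ATG")  # first ATG in the forward primer

print("forward:", forward_sites, "ATG at", start_codon_index)
print("reverse:", reverse_sites)
```

Under this reading, the forward primer carries only the KspI site and the reverse primer only the NotI site, which is consistent with directional cloning into pHUE.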
Selenomethionine-labeled GGCT was expressed using the same vector but grown under modified conditions, as described by Van Duyne et al. (16). The labeled GGCT was purified as described above and transferred into 20 mM Tris/HCl, 0.1 mM dithiothreitol, pH 8.0, while being concentrated after the DEAE ion exchange chromatography step.

Determination of GGCT Activity and Kinetic Constants
A continuous spectrophotometric method that links the production of free alanine to the oxidation of NADH via the action of added glutamate-pyruvate transaminase and lactate dehydrogenase was used, as previously described (14). The kinetic constants were determined by varying the γ-glutamyl-L-alanine concentration between 0.5 and 30 mM over seven steps in triplicate. The rate data were fitted to the Michaelis-Menten equation by the MacCurveFit version 1.5 program (Kevin Raner Software).
Western blots were performed as described by Towbin et al. (17). The antiserum against GGCT was raised in rabbits against purified human erythrocyte GGCT or recombinant human GGCT expressed in E. coli. The antibodies were diluted (1:1000) in PBS with 1% skim milk. After a 1-h incubation with the primary antibody and washing, the membrane was incubated with peroxidase-conjugated goat anti-rabbit immunoglobulins (1:8000; DAKO) for 1 h and washed three times with phosphate-buffered saline. Bound antibodies were visualized by using enhanced chemiluminescence (ECL; Amersham Biosciences).
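As an illustration of the fitting step, the sketch below performs a least-squares fit of the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]), using only the Python standard library. It is not the MacCurveFit procedure: for a fixed Km the best Vmax has a closed form, so the fit reduces to a one-dimensional golden-section search over log(Km), which assumes a single minimum in the search interval. The seven concentrations and the synthetic rates (generated from Vmax = 50 and Km = 2 mM) are hypothetical values chosen for the example and are not data from this study.

```python
import math


def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def best_vmax(km, conc, rates):
    """For a fixed Km the model is linear in Vmax, so least squares is closed-form."""
    f = [s / (km + s) for s in conc]
    return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)


def sse(km, conc, rates):
    """Sum of squared residuals at this Km, using the optimal Vmax."""
    vmax = best_vmax(km, conc, rates)
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(conc, rates))


def fit_michaelis_menten(conc, rates, km_lo=1e-3, km_hi=1e3, tol=1e-10):
    """Golden-section search over log(Km); returns (vmax, km)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(km_lo), math.log(km_hi)
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    while b - a > tol:
        if sse(math.exp(c), conc, rates) < sse(math.exp(d), conc, rates):
            b, d = d, c          # minimum lies in [a, d]
            c = b - phi * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + phi * (b - a)
    km = math.exp((a + b) / 2)
    return best_vmax(km, conc, rates), km


# Hypothetical example: seven substrate concentrations (mM) spanning 0.5-30 mM
# and noise-free synthetic rates; none of these numbers come from the paper.
conc = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0]
rates = [michaelis_menten(s, 50.0, 2.0) for s in conc]
vmax_fit, km_fit = fit_michaelis_menten(conc, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f} mM")
```

With real triplicate data, all replicate points would be passed to the fit together, and parameter uncertainties would need to be estimated separately.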

Crystallization and Data Collection
Wild Type Structure-A 2-μl sample of selenomethionine-containing GGCT (15 mg/ml in 20 mM Tris/HCl, 0.1 mM dithiothreitol, pH 8.0) was mixed with equal volumes of γ-Glu-Glu dipeptide (40 mM) and well solution. The well contained 1 ml of a solution comprising PEG 4000 (22.5% w/v), 100 mM sodium formate buffer, pH 3.9, 200 mM ammonium acetate. Crystals appeared as flat rods within 7 days and reached maximum size (longest edge 200 μm) after 20 days. For cryoprotection, crystals were transferred to artificial mother liquor containing 32.5% PEG 4000 prior to placement in a cryostream at 100 K.
After collection and integration of peak and inflection anomalous data (Table 1), the structure was solved automatically with HKL3000 (18) as follows. Eight selenium sites were found by SHELXD (19). (This is consistent with a dimer in the asymmetric unit, since GGCT contains four methionine residues not including the NH2-terminal methionine.) RESOLVE (20) detected noncrystallographic 2-fold symmetry based on the selenium sites. The heavy atom search was followed by 10 cycles of solvent flattening in SHELXE (19) using map contrast and connectivity to assign the correct enantiomorph. Data between 50 and 3.0 Å resolution were used for obtaining initial MAD phases. After heavy atom refinement and phase calculation in MLPHARE (21), the mean figure of merit for phases in this resolution range was 0.27. The program DM (22) was used to improve phases through solvent flattening, histogram mapping, noncrystallographic symmetry averaging, and phase extension. The resulting electron density maps were sufficient for RESOLVE to automatically build 80% of the protein structure. The "peak" x-ray data were used for these rounds of structure refinement. All reflection data were used for electron density map generation, model building, and structure refinement.
Ala98 Mutant Structure-Crystals were grown at 4°C. A 2-μl sample of protein (15 mg/ml) was mixed with equal volumes of 20 mM γ-Glu dipeptides (either γ-Glu-Phe, γ-Glu-Glu, γ-Glu-Cys, γ-Glu-Gly, or γ-Glu-Ala) and well solution before being suspended over the well in a hanging drop configuration. The well solution was composed of 0.1 M potassium acetate buffer, pH 4.25, 20% PEG 4000, and 0.2 M ammonium acetate. The largest crystals grew from solutions containing γ-Glu-Phe (longest edge 200 μm); they appeared after 3 days and grew to maximum dimensions after 1 week. For x-ray data collection, crystals were cryoprotected by transfer to artificial mother liquor with PEG 4000 concentration increased to 30%. Crystals were snap-frozen using an Oxford cryostream at 100 K. Data were collected using a Mar345 Dtb image plate system mounted on a Rigaku RU200 rotating anode x-ray generator with mirrors. Data were processed with the HKL package. Structure refinement started with the previously determined wild type structure.
Gln98 Mutant Structure-Crystals were grown at 4°C. A 4-μl sample of protein (160 mg/ml) was mixed with 2 μl of 100 mM γ-Glu peptides (as for Ala98) and 2 μl of well solution prior to suspension over the well in a hanging drop configuration. The largest crystals grew in the presence of γ-Glu-Cys (longest edge 300 μm). The well solution was composed of 0.1 M sodium acetate buffer, pH 5.0, 30% PEG 4000, and 0.2 M ammonium acetate. Crystals appeared after 3 days and grew to maximum dimensions after 1 week.
Prior to x-ray data collection, crystals were cryoprotected by transfer to artificial mother liquor with PEG 4000 concentration increased to 35%. Crystals were snap-frozen using an Oxford cryostream at 100 K. Data were collected using a Mar 165-mm CCD at the Australian Synchrotron. Data were processed with Mosflm (23) and scaled with Scala (24). Structure refinement commenced with the Ala98 mutant structure.
For all three models, model building and refinement were conducted using REFMAC5 (25) and O (26). Ramachandran plot statistics were derived from PROCHECK (27). Statistics for the final models are given in Table 1.

RESULTS
Identification of a GGCT cDNA Clone-Human GGCT was partially purified from human erythrocytes by ion exchange chromatography and ammonium sulfate precipitation. The presence of the enzyme was confirmed by a specific spectrophotometric assay and detection on Western blots with an antiserum raised against highly purified human erythrocyte GGCT in an earlier study (11). Although the partially purified preparation contained several components, a relatively strong ~21-kDa Coomassie Blue-stained band that corresponded to the most prominent band detected on Western blots was selected for further investigation. Trypsin digest and mass spectrum analysis revealed five peptides that gave 46% coverage of a previously undescribed protein known as C7orf24 (chromosome 7 open reading frame 24) (accession number Hs.530024). The C7orf24 protein appeared to be a good candidate, since its predicted molecular mass is around 21 kDa, and its position on chromosome 7 is syntenic with the previously mapped location of the mouse Ggct gene on chromosome 6 (13).
Analysis of the EST data base revealed many C7orf24 cDNA clones, and we selected one (BC019243) from human kidney for further study. The longest open reading frame within the selected EST clone was amplified and cloned in the pHUE expression vector.
Characterization of Recombinant Human GGCT-Subsequent expression and purification of recombinant protein from the C7orf24 cDNA generated a protein with a molecular size on SDS-PAGE of ~21 kDa, and a Western blot revealed that the expressed protein cross-reacted strongly with the original antiserum raised against purified human erythrocyte GGCT (Fig. 2). To further confirm that this protein was GGCT, we measured its activity with γ-glutamyl-L-alanine as a substrate. The recombinant enzyme was very active and had a specific activity of 50.3 μmol/min/mg. Some kinetic properties of the recombinant enzyme are shown in Table 2. Thus, as a result of the immunological cross-reactivity and the enzymatic activity, we conclude that the EST clone identified in this study encodes a functional GGCT enzyme.
The Coding Sequence of Human GGCT-The cDNA sequence and the deduced amino acid sequence of human GGCT are shown in Fig. 3. The coding region encodes a peptide of 188 amino acids with a predicted molecular mass of 20,994 Da, in agreement with the apparent subunit size of the recombinant enzyme determined by SDS-PAGE (Fig. 2).
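For orientation, the specific activity can be converted into an apparent turnover number per subunit. The worked equation below reads the specific activity as 50.3 μmol/min/mg and uses the predicted subunit mass of 20,994 Da (20.994 mg/μmol). It is a rough estimate only: it equals k cat only if the assay substrate concentration was saturating and the preparation was pure and fully active, neither of which is established here.

```latex
k_{\mathrm{app}}
  = 50.3\ \frac{\mu\mathrm{mol}}{\mathrm{min}\cdot\mathrm{mg}}
    \times 20.994\ \frac{\mathrm{mg}}{\mu\mathrm{mol}}
  \approx 1056\ \mathrm{min}^{-1}
  \approx 17.6\ \mathrm{s}^{-1}
```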
Table 1 footnote: Fo and Fc are the observed and calculated structure factors, respectively. R-free was calculated from 5% of the diffraction data not used in refinement.
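The expression to which this footnote refers did not survive in the text. The conventional crystallographic R-factor, which the footnote presumably accompanied, is given below as an assumption; R-free uses the same expression evaluated over the 5% of reflections excluded from refinement.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_o| - |F_c| \,\bigr|}{\sum_{hkl} |F_o|}
```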
BLAST (28) searches were used to identify related sequences in publicly available EST and protein data bases. The sequences were aligned by the program ClustalW (29), and the percentage identities are shown in Table 3. Although we found related sequences in a range of species from Caenorhabditis elegans to mammals, the best matches were in higher species, and further studies will be needed to determine if the diverged proteins identified in species such as Drosophila melanogaster, Strongylocentrotus purpuratus (echinoderm), and Caenorhabditis elegans encode proteins with GGCT activity. The BLAST search did not identify any similar proteins in plants, although a protein with GGCT activity has been detected in tobacco (30).
It was particularly notable that the GGCT sequence showed no similarity to other previously described proteins of known function, including glutaminyl cyclases that catalyze the formation of pyroglutamate at the amino termini of peptides (31).
Crystal Structure of Human GGCT-The initial structure of human GGCT was determined at 2.4 Å resolution by multiwavelength anomalous dispersion. The crystals were orthorhombic and belonged to the space group P2₁2₁2₁. The unit cell parameters and space group (Table 1) were consistent with a dimer in the asymmetric unit with a Matthews coefficient of 2.08 Å³/Da. The structure comprises a dimer with continuous electron density observable for residues 14-182 in monomer A and residues 15-183 in monomer B as well as 34 water molecules. No electron density is apparent for the NH2-terminal 13 (monomer A) or 14 (monomer B) residues. Similarly, the COOH-terminal regions are disordered, with no density observed for the last 6 (monomer A) or 5 (monomer B) residues in the sequence. Nevertheless, the electron density maps corresponding to the final model are of high quality (Fig. 4), and the model statistics are satisfactory (Table 1). The enzyme adopts a mixed α/β topology with six β-strands, five α-helices, and four short 3₁₀ helices (Fig. 5). Strands 1-5 form a barrel structure. Strands 2 and 3 extend beyond this barrel and twist around each other, forming a 120° elbow and reversing their topological order. The helices are loosely clustered around strands β1, β2, β3, and β6. Dimerization interactions occur across strand β4, which lies adjacent to the noncrystallographic 2-fold axis, forming a continuous β-sheet with its noncrystallographic symmetry-related partner (Fig. 5). Other dimerization interactions lie between strand β3 and 3₁₀-helix 4.
Because of the moderate resolution of the x-ray data, water molecules were built into electron density in a conservative manner, resulting in a lower average B-factor for water compared with protein (Table 1).
Each GGCT monomer features an invagination formed by helices α1 and α2 and strands β1 and β2. There is electron density within this site that cannot be explained by the presence of a water molecule but cannot easily be interpreted as substrate or reaction product. We propose that this is the location of the active site. The site is lined with hydrophilic and amphipathic residues (Fig. 6). At the COOH-terminal end of helix α2 lies Glu98, which we propose acts as a general acid/base in the reaction (Fig. 7). Two water molecules were observed to cluster around Ser24, and the unexplained density is found at the bottom of the pocket near Tyr22.
The absence of electron density for reactants and products can be explained by catalytic turnover of the substrate in the crystal and product release by the enzyme. Glu98 is conserved in sequence homologues as distantly related as Anopheles gambiae. We mutated Glu98 to Gln or Ala and found that both mutations completely inactivate the enzyme (Table 2). This strongly supports the contention that Glu98 plays a significant role in catalysis and is consistent with the role of active site carboxylic residues in the aforementioned glutaminyl cyclases.
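The Matthews coefficient quoted above relates the unit cell volume to the protein mass it contains. The sketch below shows the arithmetic under stated assumptions: four asymmetric units per cell (as in P2₁2₁2₁), two subunits per asymmetric unit, the predicted subunit mass of 20,994 Da (ignoring the small mass increase from selenomethionine labeling), and the conventional estimate of solvent fraction, 1 − 1.23/V_M. Because the unit cell parameters are given only in Table 1, the cell volume is back-calculated from V_M and is not taken from the data; the function names are illustrative.

```python
PROTEIN_TERM = 1.23  # Å^3/Da, conventional constant in the Matthews solvent estimate


def matthews_coefficient(cell_volume, n_asu, n_mol, mol_mass):
    """V_M in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_asu * n_mol * mol_mass)


def solvent_fraction(vm):
    """Approximate fractional solvent content for a given V_M."""
    return 1.0 - PROTEIN_TERM / vm


def implied_cell_volume(vm, n_asu, n_mol, mol_mass):
    """Cell volume (Å^3) implied by V_M; inverse of matthews_coefficient."""
    return vm * n_asu * n_mol * mol_mass


VM = 2.08            # Å^3/Da, from the text
N_ASU = 4            # asymmetric units per cell in P2(1)2(1)2(1)
N_MOL = 2            # dimer in the asymmetric unit
SUBUNIT_MASS = 20994.0  # Da, predicted from the coding sequence

volume = implied_cell_volume(VM, N_ASU, N_MOL, SUBUNIT_MASS)
solvent = solvent_fraction(VM)
print(f"implied cell volume ~ {volume:.0f} Å^3, solvent ~ {solvent:.1%}")
```

On these assumptions the crystal would contain roughly 41% solvent, which is within the range commonly observed for protein crystals.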
To confirm that the loss of activity was not simply due to misfolding of the mutated protein, we crystallized the E98A and E98Q mutants and solved their structures. The crystals were isomorphous with those of the wild type enzyme, having the same space group and similar unit cell dimensions. A summary of the data is shown in Table 1. The structures of the wild type and the mutant enzymes superimpose with root mean square deviations of 0.212 Å (E98A) and 0.278 Å (E98Q) over 338 Cα atoms, indicating that the mutations had not significantly altered the overall structure. The E98A mutation was confirmed by the presence of negative peaks (−3σ) in the mFo − DFc difference maps over the side chain of Glu98 (the phase angles were derived from the wild type enzyme structure). Despite the presence of γ-Glu-Phe in the crystallization mixture, no ligand was observed in the putative active site or anywhere else on the structure. In the case of E98Q, positive peaks (3σ) in the mFo − DFc difference maps were present over the side chain of Ala98 (the phase angles were derived from the E98A structure). Again, despite the presence of high concentrations of the substrate γ-Glu-Cys in the crystallization mix, no substrate was observed bound in the active site. Instead, at this resolution, acetate molecules (derived from the crystallization mixture) were observed bound in the proposed active site (Fig. 4).

Table 3. Identity of putative GGCT isoenzymes identified in a range of species
The sequences were identified by BLAST searches of publicly available data bases and aligned by ClustalW. The accession numbers for the full DNA or protein sequences are BX364665 (human), CX763487 (mouse), CK475160 (rat), DT847765 (bovine), BU388085 (chicken), DN763108 (gekko), DN862974 (zebrafish), NP649038 (D. melanogaster), NP_495406 (C. elegans), and XP_798993 (echinoderm).
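The root mean square deviations reported for the wild type and mutant structures summarize the distances between paired Cα atoms after superposition. The following minimal sketch shows that calculation for coordinates that are assumed to be already superimposed and paired in the same order; it does not perform the superposition itself, and the example coordinates are hypothetical. The count of 338 Cα atoms is consistent with the ordered residues reported for the two monomers (14-182 and 15-183).

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as input) between paired, pre-superimposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Ordered residues per monomer, from the ranges given in the text.
n_ca_atoms = (182 - 14 + 1) + (183 - 15 + 1)

# Hypothetical two-atom example: one atom displaced by 1 Å, one unchanged.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)])
print(n_ca_atoms, round(example, 4))
```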

Substrate Binding Model-Based on the structural data and the mutagenesis of Glu98, we have developed a model of substrate binding shown in Fig. 8. In this model, the γ-Glu group is buried in the proposed active site pocket with the amine nitrogen engaged in a salt bridge with Glu98. The carboxylate group accepts hydrogen bonds from the backbone amide groups of Tyr22 and Gly23 and the side chain hydroxyl of Tyr19. The amide oxygen in the substrate resides in an oxyanion hole formed by the side chain hydroxyl groups of Tyr139 and Ser24 and by the backbone amide nitrogen of Ser24. The carboxyl group of the second residue in the substrate engages in a salt bridge with Arg30. The side chain of the second residue lies in a groove on the surface of the enzyme formed by Ile68 and Trp64. It is noteworthy that the γ-Glu carboxylate and γ-peptide group overlap closely with the acetate groups seen in the E98Q structure (Fig. 4).
Active Site Mutations-Prior to the development of our preferred substrate binding model, we made additional mutations around the proposed active site pocket. These mutant proteins (G23A, Y105F, and Y125F) were kinetically characterized, and the data are presented in Table 2. The G23A mutation caused a 2-fold reduction in k cat and a 4-fold increase in the K m, resulting in a marked reduction in the catalytic efficiency (k cat/K m) and in the specific activity. These changes are consistent with Gly23 contributing to the conformation of the substrate binding pocket. Although the hydrogen bond between the Gly23 NH and the γ-Glu carboxyl would be maintained with the Ala23 substitution, the additional side chain is likely to affect local conformation and substrate binding. The Y105F mutation caused a significant reduction in the k cat and a 3-fold increase in the K m. These changes result in a substantial fall in both the k cat/K m and the specific activity. In the proposed model, the Tyr105 hydroxyl donates a hydrogen bond to the Ala69 carbonyl oxygen group and accepts a hydrogen bond from the substrate cysteinyl NH. The significant loss of activity associated with the Y105F mutation is consistent with a prominent role in catalysis and substrate binding for the hydrogen bonds donated and accepted by the Tyr105 hydroxyl group. In contrast, the Y125F mutation had only minor effects on the enzyme's reaction kinetics. This is also consistent with the preferred substrate binding model. Although Tyr125 lies in the vicinity of the active site pocket (Fig. 8), its side chain hydroxyl is not positioned where it can contribute directly to substrate binding or catalysis.
Genomic Organization of the Human GGCT Gene-A BLAST search of the human genome with the GGCT cDNA sequence revealed matching sequences on chromosomes 5, 7, and 20. The sequences on chromosomes 5 (NT_023148) and 20 (NT_011387) contain many base substitutions and do not contain introns and are therefore likely to represent reverse transcribed pseudogenes. In contrast, the sequence on chromosome 7p15-p14 (NT_079592; AACC02000087) extends over 8 kb and is divided by four introns. The positions and sizes of the introns within the coding sequence are provided in Fig. 3 and Table 4. All of the intron boundaries conform to the GT/AG rule. The first 23 bp (underlined in Fig. 3) of the cDNA studied in detail here (BC019243) do not appear to occur in close proximity to the rest of the gene and could not be located by BLAST searches. A similar extension was also noted in a small number of other cDNA clones from other tissues, suggesting the possibility of variable splicing within the 5′-end of the gene. In contrast, many ESTs contain about 18 alternate 5′ bases that correspond exactly to the genomic sequence, suggesting that the normal start of transcription may lie around 121 bp upstream of the ATG that initiates translation (Fig. 3).
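The combined effect of the G23A changes on catalytic efficiency follows directly from the two fold-changes stated above. As a worked example, using only those approximate factors and not the values in Table 2:

```latex
\frac{k_{\mathrm{cat}}^{\mathrm{G23A}}}{K_m^{\mathrm{G23A}}}
  \approx \frac{k_{\mathrm{cat}}^{\mathrm{WT}}/2}{4\,K_m^{\mathrm{WT}}}
  = \frac{1}{8}\,\frac{k_{\mathrm{cat}}^{\mathrm{WT}}}{K_m^{\mathrm{WT}}}
```

That is, a 2-fold fall in k cat combined with a 4-fold rise in K m corresponds to roughly an 8-fold fall in k cat/K m.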
Expression of GGCT in Human Tissues-To determine the level of GGCT expression, we used the UniGene site with the EST Profile Viewer to examine the number of GGCT transcripts in different tissues represented in the human EST data base. For ease of comparison, the data in Table 5 have been standardized to show the number of GGCT transcripts/million ESTs in each tissue. Clearly, GGCT is widely expressed in many tissues. On the basis of previous studies and the proposed role of GGCT and the ␥-glutamyl cycle in the reabsorption of amino acids in the kidney, it was expected that GGCT expression would be relatively high in this tissue. However, the data suggest that expression levels in the kidney are similar to those in other major organs and tissues, such as the liver and skeletal muscle. This analysis revealed extremely high levels of gene expression in the bladder and salivary gland. The level of expression was even higher in bladder tumor, confirming the original observation. A number of other tumors in kidney, mammary gland, and prostate showed an increase in expression levels compared with normal tissue.
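The standardization described above amounts to scaling each tissue's GGCT EST count by the total number of ESTs sampled from that tissue. A minimal sketch is given below; the tissue names and counts are hypothetical placeholders and are not values from Table 5.

```python
def transcripts_per_million(gene_ests, total_ests):
    """Normalize a gene's EST count to transcripts per million ESTs in a tissue."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return gene_ests * 1_000_000 / total_ests


# Hypothetical counts for illustration only: (GGCT ESTs, total ESTs in library pool).
example_counts = {
    "tissue_a": (12, 200_000),
    "tissue_b": (3, 150_000),
}

normalized = {
    tissue: transcripts_per_million(gene, total)
    for tissue, (gene, total) in example_counts.items()
}
for tissue, tpm in normalized.items():
    print(f"{tissue}: {tpm:.1f} transcripts per million")
```

Normalized values from small EST pools are sensitive to sampling noise, so differences between tissues with few total ESTs should be interpreted cautiously.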

DISCUSSION
In this study, we have identified the hypothetical human gene known as C7orf24 (chromosome 7 open reading frame 24) as a gene encoding γ-glutamyl cyclotransferase. This enzyme catalyzes an essential step in the γ-glutamyl cycle (5), and its activity contributes to the excessive generation of 5-oxoproline and the associated pathology in some patients with glutathione synthetase deficiency (7). The identification of C7orf24 as the GGCT gene was based on the immunological identity of recombinant protein expressed from the corresponding cDNA and the demonstration that the recombinant protein has GGCT activity. The identification of the gene was also supported by the observation that the locus at 7p14-15 is syntenic with the region on mouse chromosome 6 where the mouse Ggct gene had been localized by linkage analysis (13). Several independent studies aimed at identifying proteins that are tumor markers have recently shown that C7orf24 is overexpressed in a range of cancers (1-3). In addition, some data indicate that C7orf24 expression is directly related to cancer cell proliferation. Our identification of C7orf24 as GGCT and the determination of its structure has opened the way for further studies of its role in cell proliferation and its evaluation as a novel cancer drug target.
The enzymatic properties of recombinant human GGCT are very similar to the properties of GGCT purified from human erythrocytes (Table 2). The higher specific activity of the recombinant protein is to be expected because of the rapidity of its purification from freshly cultured bacteria. In contrast, the naturally occurring enzyme was prepared from expired blood bank samples and required a larger number of chromatographic steps and longer time to purify (11). The subunit molecular weight of the cloned enzyme (20,994 Da) is similar to the size of the enzyme purified from human erythrocytes (25,250 Da) (11). However, the human erythrocyte enzyme was reported to be a monomer, which contrasts with the dimeric structure determined by crystallography. In other studies not shown, we found that glutaraldehyde cross-linking of the recombinant protein generated dimers on SDS-PAGE, thus supporting the crystallographic structure. Further studies are clearly required to confirm the quaternary structure of GGCT in vivo.
The amino acid sequence and the crystal structure of GGCT are unlike those of any other functionally characterized protein.
The topology is unrelated to other known enzyme structures associated with cyclotransferase-like activity, viz. papaya glutaminyl cyclase (32) and human glutaminyl cyclase (31). The fold is classified as the "BtrG-like" family in SCOP (33), a small family that includes hypothetical proteins from Mus musculus (Protein Data Bank code 1VKB), E. coli (Protein Data Bank code 1XHS), Pyrococcus horikoshii (Protein Data Bank code 1V30), and Arabidopsis thaliana (Protein Data Bank code 2G0Q). The structures of the five known BtrG-like proteins are compared in Fig. 9. Structural conservation of GGCT with these proteins spans strands β1-β5. All of these proteins feature a pocket in a topologically conserved location (proposed to be the active site in GGCT); however, only 1VKB and 2G0Q have a residue equivalent to Glu98. In 1XHS, Glu98 is replaced with an arginine, and there is no equivalent residue in 1V30 due to helix α2 being foreshortened in that structure. It is therefore unlikely that the catalytic activity of GGCT is conserved or even similar in all structural homologues. Residues Tyr22 and Tyr105 (GGCT sequence) line the binding pocket and are conserved in all cases. Tyr125 is conserved or conservatively substituted (for tryptophan in 1V30). A short loop connecting strand β1 with 3₁₀-helix 1 (Tyr22-Ser24 in GGCT) contains backbone amide nitrogen atoms that are oriented into the pocket and interact with substrate in our model (Fig. 8). This loop is structurally conserved in all homologues. 1V30 has a 2-[N-cyclohexylamino]ethane sulfonic acid group bound in its pocket with the SO3 group overlapping with the acetate group that interacts with Asn22 and Tyr25 in the GGCT E98Q structure. The 1VKB structure has a formate group bound in its pocket, corresponding to the aforementioned SO3 and acetate groups in 1V30 and GGCT, respectively. This suggests that the pocket may have a conserved binding function in this structural family of proteins.
The loops surrounding the proposed active site are poorly conserved and may reflect a diversity of function.
It is also of interest to consider the possible catalytic role of the BtrG protein that gave rise to the "BtrG-like" name that was originally proposed for this family of proteins (34). Butirosins are naturally occurring aminoglycoside antibiotics produced by Bacillus circulans. The synthesis involves the ␥-glutamylation and deglutamylation of an acyl carrier protein in what appears to be an example of naturally occurring protective group chemistry. The butirosin synthesis is catalyzed by the products of a gene cluster (btrA-btrW), and the specific function of many of these genes has been assigned (34). Although it has not been experimentally demonstrated, it has been proposed that the BtrG protein may catalyze either the transfer of an acyl chain to ribostamycin or the deglutamylation of the acyl-protein intermediate. In view of our present results, the latter reaction seems most likely, since we have shown that GGCT has a fold similar to that of the BtrG-like proteins and catalyzes the cleavage of ␥-glutamyl dipeptides.
GGCT appears to be the first protein of the "BtrG-like" folding family to have an unequivocal function assigned to it, and we propose that these structures should now be regarded as having the ␥-glutamyl cyclotransferase fold.
The glutaminyl cyclases catalyze the post-translational formation of NH2-terminal pyroglutamate on a number of bioactive neuropeptides, hormones, and cytokines (31). This is a reaction similar to that carried out by GGCT, but the structure of GGCT is clearly unrelated to that of the glutaminyl cyclases (31,32). The glutaminyl cyclases are structurally related to the zinc-dependent exopeptidases that catalyze a reaction that is also reminiscent of the cleavage of the γ-glutamyl-amino acid peptide bond catalyzed by GGCT (31). Despite their lack of structural homology with GGCT, we considered the possibility that there may be similarities in the reaction mechanism. The glutaminyl cyclases and the related zinc-dependent exopeptidases employ both a glutamyl residue as a general acid/base and a zinc atom in the catalytic cycle (35,36). We identified Glu98 within an accessible pocket as a potential active site residue. Mutation of Glu98 to Ala or Gln completely eliminated activity, strongly suggesting that this residue is involved in catalysis. The pocket also contains some electron density that may be a reaction product but cannot be unequivocally identified at this resolution. It also seems unlikely that this electron density represents a metal ion, since there are insufficient negatively charged residues in the pocket to provide the appropriate coordination. Previous studies have indicated that guinea pig GGCT in various tissues was stimulated by optimal concentrations of K+ and Mg2+ (37). However, human GGCT purified from erythrocytes was not stimulated or inhibited by up to 15 mM K+, Mn2+, or Mg2+ (11). The reaction mechanism proposed in Fig. 7 does not require a metal ion, and the mutagenesis studies are consistent with the view that Glu98 is the primary catalytic residue. Despite many attempts under a variety of conditions, we were unable to obtain a structure containing a substrate or product in the active site.
Based on the wild type and mutant structures as well as the mutagenesis results, we developed a model of substrate binding shown in Fig. 8. This model is consistent with the mutagenesis data and explains the absence of observed substrate binding in the mutants. In our model, the negatively charged Glu98 is needed to complement the positively charged amine group of the substrate. Mutation to neutral Ala or Gln would require the energetically unfavorable burial of a positive charge upon substrate binding. The model also removes the need for a metal ion to stabilize the oxyanion in the intermediate. From our model, the amino group of γ-Glu is expected to attack the Re-face of the peptide link to cysteine, creating a chiral center in the intermediate with an R-configuration.
Since this is the first identification of a cDNA encoding a GGCT enzyme, we used a bioinformatics approach to search for related sequences in humans and other species. This analysis revealed related sequences in a range of species from C. elegans to mammals. It was notable that although GGCT activity has been detected in plants (30), related sequences were not evident in plants, yeast, or other microbial species. However, since the BtrG protein from B. circulans appears to catalyze a similar reaction (34) yet shares no significant sequence identity with GGCT, it is possible that the activity in plants may be derived from a protein belonging to the same folding family but with a highly diverged sequence.
A previous study of GGCT from human erythrocytes revealed multiple isoenzymes that could have been the products of separate genetic loci (12). Searches of the human EST data base have only identified additional examples of the same cDNA coding sequence, suggesting that there is only one GGCT enzyme expressed in humans. This view is supported by the observation that only one functional gene could be identified within the human genome, and the sequence of exons within that gene corresponded exactly with the cDNA sequence. Although related sequences were identified on chromosomes 5 and 20, the lack of introns and the number of base changes suggest that these are nonfunctional reverse-transcribed pseudogenes. It is possible that other members of the same folding family with distantly related sequences may occur and could catalyze a similar reaction. In this regard, we identified a human homologue (accession BU156875) of the mouse BtrG-like protein (1VKB) that has less than 10% identity with GGCT. Further studies are required to determine if this protein has GGCT-like activity. Alternative splicing can also result in the expression of multiple isoenzymes from a single gene. In this case, we found evidence of potential alternative splicing within the 5′-untranslated region (see Fig. 3), but this is unlikely to cause a change in the translated enzyme. The results so far suggest that the multiple GGCT isoenzymes found in human erythrocytes are most likely to be due to a post-translational modification. A similar post-translational modification may also occur in other species, since multiple isoenzymes were also detected in rat and mouse tissues (10,13).
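The figure of less than 10% identity quoted above for the human homologue of 1VKB is a pairwise sequence identity. As an illustration only, the following minimal sketch shows one common way such a value can be computed from two sequences that have already been aligned. The function name, the gap symbol, and the toy sequences are assumptions made for this example and are not taken from the study; the choice of denominator (here, columns in which neither sequence has a gap) is one of several conventions in use, and the alignment method used in the study is not stated here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Hypothetical helper for illustration; other definitions divide by the
    full alignment length or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (invented, not real GGCT or 1VKB sequence).
toy_a = "MKT-AYLE"
toy_b = "MRTQA-LD"
identity = percent_identity(toy_a, toy_b)
```

In the toy example, six columns are compared and four are identical. Values near or below 10%, as reported for the 1VKB homologue, are too low to infer a relationship from sequence alone, which is why the structural comparison is the relevant evidence in this case.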
Previous studies have identified GGCT in human brain (8), human erythrocytes (11), and a range of rat and mouse tissues (9,10,13). To gain a greater understanding of the tissues in which GGCT might be expressed, we analyzed the occurrence of GGCT transcripts in a wide range of tissues represented in the EST data base. This analysis (Table 5) revealed moderate levels of GGCT mRNA expression in a wide range of tissues, including the liver and kidney, which had shown the highest levels of protein expression in rats (10). In contrast, levels of GGCT mRNA expression were much higher (5×) in the bladder and in the salivary gland. It is not immediately obvious what these two tissues have in common that requires this high level of expression. Because GGCT catalyzes an essential step in the γ-glutamyl cycle, we considered it likely that there may be similar levels of expression of other γ-glutamyl cycle enzymes in the same tissues. However, there were no transcripts of γ-glutamyltranspeptidase in the EST data base from either the bladder or the salivary gland. This observation suggests that there may be another source of γ-glutamyl dipeptides in those tissues or that GGCT has another as yet undiscovered activity or function.
In summary, we have identified a cDNA and the gene encoding human γ-glutamyl cyclotransferase. The heterologous expression of active GGCT allowed the determination of its novel crystal structure and the preliminary analysis of its reaction mechanism. These studies will finally allow the elucidation of the role of GGCT in glutathione homeostasis and will permit the design of specific inhibitors that may be of value in the treatment of patients with glutathione synthetase deficiency or in manipulating cancer cell proliferation.