Structure and Function of the DUF2233 Domain in Bacteria and in the Human Mannose 6-Phosphate Uncovering Enzyme

Background: DUF2233 domain is present in bacteria and human UCE, which is implicated in lysosomal storage disorders.
Results: Functional residues in DUF2233 and UCE identified in the structure of a bacterial DUF2233 domain were investigated.
Conclusion: A function for this domain in bacteria is proposed, and functional residues in human UCE were identified.
Significance: This is the first structure/function study of this protein domain.

DUF2233, a domain of unknown function (DUF), is present in many bacterial and several viral proteins and was also identified in the mammalian transmembrane glycoprotein N-acetylglucosamine-1-phosphodiester α-N-acetylglucosaminidase (“uncovering enzyme” (UCE)). We report the crystal structure of BACOVA_00430, a 315-residue protein from the human gut bacterium Bacteroides ovatus that is the first structural representative of the DUF2233 protein family. A notable feature of this structure is the presence of a surface cavity that is populated by residues that are highly conserved across the entire family. The crystal structure was used to model the luminal portion of human UCE (hUCE), which is involved in targeting of lysosomal enzymes. Mutational analysis of several residues in a highly conserved surface cavity of hUCE revealed that they are essential for function. The bacterial enzyme (BACOVA_00430) has ∼1% of the catalytic activity of hUCE toward the substrate GlcNAc-P-mannose, the precursor of the Man-6-P lysosomal targeting signal. GlcNAc-1-P is a poor substrate for both enzymes. We conclude that, for at least a subset of proteins in this family, DUF2233 functions as a phosphodiester glycosidase.

Sequence analysis of the bacterial members of this family covering a sequence identity range of 30-95% revealed several conserved residues located in a cleft on the surface of the BACOVA_00430 structure, indicating some involvement in its function. The BACOVA_00430 structure was used as a template for modeling the luminal region of human UCE (hUCE). Site-directed mutagenesis of hUCE based on this model confirmed the predicted functional importance of some of these conserved residues. Similar mutational analyses were performed on BACOVA_00430. These studies provide the first structure-function analysis of DUF2233 proteins.
Protein Production and Crystallization-Clones were generated using the polymerase incomplete primer extension (PIPE) cloning method (8). The gene encoding BACOVA_00430 was amplified by polymerase chain reaction (PCR) from B. ovatus ATCC 8483 genomic DNA using PfuTurbo DNA polymerase (Stratagene) and I-PIPE (insert) primers (forward primer, 5′-ctgtacttccagggcATGCCACAAACCGCCATAGGACGGC-3′; reverse primer, 5′-aattaagtcgcgttaCTTCTTTTCTATAATCAACATACTGTTG-3′, where the target sequence is in uppercase) that include sequences for the predicted 5′- and 3′-ends of the gene encoding the full-length protein. The expression vector, pSpeedET, which encodes an N-terminal tobacco etch virus protease-cleavable expression and purification tag (MGSDKIHHHHHHENLYFQ↓G), was PCR-amplified with V-PIPE (vector) primers (forward primer, 5′-taacgcgacttaattaactcgtttaaacggtctccagc-3′; reverse primer, 5′-gccctggaagtacaggttttcgtgatgatgatgatgatg-3′). V-PIPE and I-PIPE PCR products were mixed to anneal the amplified DNA fragments together. Escherichia coli GeneHogs (Invitrogen) competent cells were transformed with the I-PIPE/V-PIPE mixture and dispensed on selective LB agar plates. The cloning junctions were confirmed by DNA sequencing. Using the PIPE cloning method, the gene segment encoding residues Met1-Ala31 was deleted. Expression was performed in a selenomethionine-containing medium at 37°C. Selenomethionine was incorporated via inhibition of methionine biosynthesis (9), which does not require a methionine auxotrophic strain. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 μg/ml, and the cells were harvested and frozen. After one freeze/thaw cycle, the cells were homogenized and sonicated in lysis buffer (50 mM HEPES, pH 8.0, 50 mM NaCl, 10 mM imidazole, and 1 mM tris(2-carboxyethyl)phosphine HCl (TCEP)). The lysate was clarified by centrifugation at 32,500 × g for 30 min.
The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with lysis buffer, the resin was washed with wash buffer (50 mM HEPES, pH 8.0, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, and 1 mM TCEP), and the protein was eluted with elution buffer (20 mM HEPES, pH 8.0, 300 mM imidazole, 10% (v/v) glycerol, and 1 mM TCEP). The eluate was buffer-exchanged with tobacco etch virus buffer (20 mM HEPES, pH 8.0, 200 mM NaCl, 40 mM imidazole, and 1 mM TCEP) using a PD-10 column (GE Healthcare) and incubated with 1 mg of tobacco etch virus protease/15 mg of eluted protein for 2 h at ambient temperature followed by overnight incubation at 4°C. The protease-treated eluate was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with HEPES crystallization buffer (20 mM HEPES, pH 8.0, 200 mM NaCl, 40 mM imidazole, and 1 mM TCEP), and the resin was washed with the same buffer. The flow-through and wash fractions were combined and concentrated to 21 mg/ml by centrifugal ultrafiltration (Millipore) for crystallization trials.
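The PIPE cloning step described above relies on the 5′ extensions of the insert primers (lowercase) being complementary to the 5′ ends of the vector primers, so that the two PCR products can anneal. The following is a minimal sketch of that check, not part of the published protocol; the primer sequences are copied from the text, whereas the function names are hypothetical and chosen only for illustration.

```python
# Illustrative check of I-PIPE / V-PIPE primer complementarity.
# Primer sequences are taken from the text; helper names are hypothetical.

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def vector_tail(insert_primer: str) -> str:
    """Return the leading lowercase (vector-derived) part of an I-PIPE primer."""
    tail = []
    for base in insert_primer:
        if base.isupper():
            break
        tail.append(base)
    return "".join(tail)


def ends_anneal(insert_primer: str, vector_primer: str) -> bool:
    """True if the insert primer's 5' tail is complementary to the vector primer's 5' end."""
    tail = vector_tail(insert_primer)
    return len(tail) > 0 and vector_primer.startswith(reverse_complement(tail))


I_PIPE_FWD = "ctgtacttccagggcATGCCACAAACCGCCATAGGACGGC"
I_PIPE_REV = "aattaagtcgcgttaCTTCTTTTCTATAATCAACATACTGTTG"
V_PIPE_FWD = "taacgcgacttaattaactcgtttaaacggtctccagc"
V_PIPE_REV = "gccctggaagtacaggttttcgtgatgatgatgatgatg"

# The insert forward primer pairs with the vector reverse primer, and vice versa.
print(ends_anneal(I_PIPE_FWD, V_PIPE_REV))
print(ends_anneal(I_PIPE_REV, V_PIPE_FWD))
```

Both junctions share a 15-nucleotide complementary stretch, which is the overlap this sketch tests for.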
BACOVA_00430 was crystallized using the nanodroplet vapor diffusion method (10) with standard Joint Center for Structural Genomics crystallization protocols (11,12). Sitting drops composed of 200 nl of protein solution mixed with 200 nl of crystallization solution were equilibrated against a 50-μl reservoir at 277 K for 15 days prior to harvest. The crystallization reagent consisted of 0.2 M Li2SO4, 30% PEG 4000, and 0.1 M Tris, pH 8.5. Ethylene glycol was added to a final concentration of 10% (v/v) as a cryoprotectant. Initial screening for diffraction and data collection was carried out using the Stanford automated mounting system (SAM) (13) at the Stanford Synchrotron Radiation Lightsource (Menlo Park, CA). Diffraction data were collected from a rod-shaped crystal of dimensions 200 × 20 × 20 μm and indexed in space group P6₁. The oligomeric state of BACOVA_00430 in solution was determined using size exclusion chromatography with a 1 × 30-cm Superdex 200 size exclusion column (GE Healthcare) coupled with miniDAWN (Wyatt Technology) static light scattering and Optilab differential refractive index detectors (Wyatt Technology). The mobile phase consisted of 20 mM Tris, pH 8.0, 150 mM NaCl, and 0.02% (w/v) sodium azide. The molecular weight was calculated using ASTRA 5.1.5 software (Wyatt Technology).
X-ray Data Collection, Structure Determination, and Refinement-Multiwavelength anomalous diffraction (MAD) data were collected at the Stanford Synchrotron Radiation Lightsource on beamline 9-2 at wavelengths corresponding to the high energy remote (λ1), inflection point (λ2), and peak (λ3) of a selenium MAD experiment using the Beamline User Integrated Control Environment (BLU-ICE) (14) data collection environment. The data sets were collected at 100 K using a MarMosaic 325 charge-coupled device detector (Rayonix). The MAD data were integrated and reduced using MOSFLM (15) and scaled with the program SCALA (16). The heavy atom substructure was determined with SHELXD (17). Phasing was performed with autoSHARP (18), SOLOMON (19) (implemented in autoSHARP) was used for density modification, and ARP/wARP (20) was used for automatic model building to 1.80 Å resolution. Model completion and crystallographic refinement were performed with the λ1 data set using Coot (21) and REFMAC5 (22). The refinement protocol included the experimental phase restraints in the form of Hendrickson-Lattman coefficients from autoSHARP and TLS refinement with one TLS group for the whole molecule. Data and refinement statistics are summarized in Table 1 (23-26).
Validation and Deposition-The quality of the crystal structure was analyzed using the Joint Center for Structural Genomics Quality Control server. This server verifies the stereochemical quality of the model using AutoDepInputTool (27), MolProbity (28), and WHATIF 5.0 (29); agreement between the atomic model and the data using SFcheck 4.0 (30) and RESOLVE (31); the protein sequence using ClustalW (32); atom occupancies using MOLEMAN2 (33); and the consistency of non-crystallographic symmetry pairs. It also evaluates the difference in Rcryst/Rfree, expected Rfree/Rcryst, and maximum/minimum B-factors by parsing the refinement log file and Protein Data Bank header. Protein quaternary structure analysis was performed using the PISA server (34). Fig. 1B was adapted from an analysis using PDBsum (35), and Figs. 1A, 2, and 3A were prepared with PyMOL (36). Atomic coordinates and experimental structure factors were deposited in the Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ under accession code 3ohg.
Cell Lines and Human UCE and BACOVA_00430 Constructs-HeLa cells were obtained from the ATCC. The cells were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% FBS, 100 μg/ml penicillin, and 100 units/ml streptomycin. For mutational analysis, hUCE cDNA (4) was modified by addition of a C-terminal HA tag (YPYDVPDYA) and subcloned into the pcDNA3.1(−) expression vector (Invitrogen) using EcoRI and HindIII restriction sites. BACOVA_00430 cDNA was obtained from the Protein Structure Initiative Biology Materials Repository/DNASU (37) (Clone ID BoCD00384454). All mutations in the hUCE construct and the G288A mutation of the BACOVA_00430 construct were introduced using QuikChange (Stratagene) site-directed mutagenesis protocols. For the H225Q and R227T double mutation BACOVA_00430 construct, the fragment of BACOVA between the EcoO109I site (264th base) and SpeI restriction site (612th base) was substituted with a fragment encoding Gln225 and Thr227 using the following primers (forward primer, 5′-GCA ACA GGA CCT GAA TCT AGT GCG CCT CTC CCG-3′; reverse primer, 5′-CAC ACT AGT CGG TTG GGT ATT CTG CAA ATC-3′).
BACOVA Protein Expression and Purification-Wild-type (WT) and mutant BACOVA constructs in DH5α were transformed into E. coli BL21 for the production of BACOVA protein. One milliliter of preculture was inoculated into 100 ml of LB containing 30 μg/ml kanamycin. At an A600 nm of around 1.0, 0.1% L-(+)-arabinose was added, and the induction proceeded overnight. The cells were centrifuged, and the pellet was resuspended in 2 ml of lysis buffer composed of 50 mM NaH2PO4, pH 8.0, 300 mM NaCl, 10 mM imidazole, and 0.5% Triton X-100. After sonication and centrifugation at 29,000 × g for 10 min, the supernatant was incubated with 1 ml of nickel-nitrilotriacetic acid-agarose (Qiagen) at 4°C for 1 h. The beads were washed three times with 1 ml of wash buffer containing 50 mM NaH2PO4, pH 8.0, 300 mM NaCl, and 20 mM imidazole and eluted with 0.7 ml of elution buffer composed of 50 mM NaH2PO4, pH 8.0, 300 mM NaCl, and 250 mM imidazole. The eluate was dialyzed against Milli-Q water. The yield was 4-7 mg of purified protein.
Enzyme Assays-UCE assays were performed as described previously (38) using 0.41 mM [3H]GlcNAc-P-α-Me-Man (2,510 cpm/nmol). Assays using 3 mM p-nitrophenyl N-acetyl-α- or -β-D-glucosaminide were carried out in 50 mM citrate buffer, pH 4.5 and 0.5% Triton X-100 or 50 mM Tris maleate, pH 7.0 and 0.5% Triton X-100. The reactions were terminated by the addition of 200 μl of 0.2 M Na2CO3, and the absorbance at 410 nm was measured. Inorganic phosphate release from GlcNAc-1-P was quantitated by the method of Lowry and Lopez as outlined by Leloir and Cardini (39).
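As an illustration of how the stated specific radioactivity of the substrate (2,510 cpm/nmol) converts measured counts into an amount of released product, a minimal sketch follows. The blank subtraction and the function name are assumptions made for this example; the actual assay procedure is the one described in reference 38.

```python
# Sketch: converting scintillation counts to nmol of product.
# The specific activity is taken from the text; the blank correction and
# the function name are illustrative assumptions.

SPECIFIC_ACTIVITY_CPM_PER_NMOL = 2510.0


def nmol_released(sample_cpm: float, blank_cpm: float = 0.0) -> float:
    """Return nmol of [3H]GlcNAc released, given sample and blank counts."""
    net_cpm = max(sample_cpm - blank_cpm, 0.0)  # clamp negative net counts to zero
    return net_cpm / SPECIFIC_ACTIVITY_CPM_PER_NMOL


# Hypothetical counts, chosen only to show the arithmetic.
print(nmol_released(sample_cpm=5120.0, blank_cpm=100.0))
```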
Transfection, PNGase F Digestion, and Western Blot Analysis-HeLa cells cultured in 6-well plates at 37°C for 20 h (95% confluent) were transfected with 3.1 μg of DNA using 7.7 μl of Lipofectamine 2000 (Invitrogen). At 24 h post-transfection, the cells were solubilized with a buffer containing 0.1 M Tris, pH 8.0, 150 mM NaCl, 1% Triton X-100, and protease inhibitor mixture (Complete, Roche Applied Science). Ten micrograms of transfected HeLa cell lysates were treated with 1,000 units of PNGase F overnight at 37°C. Treated and control samples were subjected to 12% SDS-PAGE and Western blot analysis using monoclonal anti-HA antibody.

RESULTS
Structure of BACOVA_00430-The cloning, expression, purification, and crystallization of BACOVA_00430 were carried out using standard Joint Center for Structural Genomics protocols as detailed under "Experimental Procedures." The crystal structure of BACOVA_00430 was determined by MAD phasing to 1.80-Å resolution. Data collection, model, and refinement statistics are summarized in Table 1 (23-26). BACOVA_00430 is present as a monomer in the crystallographic asymmetric unit. Crystal packing analysis and analytical size exclusion chromatography support a monomer as the predominant oligomerization state in solution. The final model (Fig. 1) includes Gly0 (which remains after cleavage of the expression and purification tag), residues 32-315 of the full-length protein (the predicted lipoprotein signal peptide, residues 1-31, was excluded from the protein construct), one chloride ion from the purification buffer, two sulfate ions from the crystallization reagent, 17 1,2-ethanediol molecules from the cryoprotectant, and 470 water molecules. The Matthews' coefficient (40) is ~4.0 Å³/Da with an estimated solvent content of ~70%. The Ramachandran plot produced by MolProbity (28) shows that 96.8% of the residues are in favored regions with none in the disallowed regions.
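The relationship between the Matthews coefficient and the solvent content quoted above can be sketched as follows. This is an illustration only: it uses the standard approximation that the protein fraction is 1.23/V_M (which assumes a typical protein partial specific volume of about 0.74 cm³/g), and the unit cell volume and molecular weight in the first call are hypothetical placeholders, because the cell parameters are listed in Table 1 and are not reproduced in this text.

```python
# Sketch of the Matthews coefficient and the derived solvent content.
# Function names and the example cell volume / molecular weight are
# hypothetical; only V_M ~ 4.0 A^3/Da comes from the text.


def matthews_coefficient(cell_volume_A3: float, asu_per_cell: int,
                         molecules_per_asu: int, mol_weight_Da: float) -> float:
    """V_M = unit cell volume / (total protein mass in the cell), in A^3/Da."""
    return cell_volume_A3 / (asu_per_cell * molecules_per_asu * mol_weight_Da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction, assuming protein occupies 1.23 A^3 per Da."""
    return 1.0 - 1.23 / v_m


# Hypothetical inputs; space group P6(1) has six asymmetric units per cell.
print(matthews_coefficient(1_200_000.0, 6, 1, 50_000.0))

# Using the V_M reported in the text (~4.0 A^3/Da).
print(round(solvent_fraction(4.0), 3))
```

With V_M of about 4.0 Å³/Da, the approximation gives a solvent fraction of about 0.69, in line with the ~70% reported.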
BACOVA_00430 consists of four domains, each of which bears some resemblance to the cystatin fold (SCOP code 54402) (41), which consists of a curved antiparallel β-sheet wrapped around an α-helix (Fig. 1). The first domain, constituted by H1, β1, β2, and β3, resembles more closely the prototypical cystatin-like fold and is not included in the DUF2233 definition, which covers only domains 2-4 (residues 123-312) of BACOVA_00430. Interestingly, the C terminus of the bacterial protein reaches over from domain 4 and inserts its tail into
Structural Comparisons-A search for other proteins of similar structure was carried out using FATCAT (42) (flexible alignment mode) against the SCOP database (43) and DALI (44). When queried using the entire BACOVA_00430 structure, FATCAT returned only two hits with significant p value scores (<0.05): human latexin (Protein Data Bank code 2bo9; p = 0.0286; Cα r.m.s.d., ~3 Å; sequence identity, ~3%) and a protein of unknown function, YpmB, from Bacillus subtilis (Protein Data Bank code 2gu3; p = 0.04; Cα r.m.s.d., ~2 Å; sequence identity, ~3%). However, in both cases, the coverage is restricted to domain 1 of BACOVA_00430 because latexin and YpmB are both α and β (α + β) proteins belonging to the cystatin-like fold (and cystatin/monellin superfamily). No significant hits were found by FATCAT when the search was restricted to the DUF2233 domain. Similar results were obtained with a DALI search for the full BACOVA_00430 structure; all structural similarities were again limited to the N-terminal domain, which most closely resembles the prototypical cystatin-like fold. Thus, BACOVA_00430 is the first structural representative of the novel DUF2233 domain architecture.
Sequence Analysis and Putative Functional Site-Analysis of sequence conservation in the structure (Fig. 2)
Modeling Human UCE-Mammalian UCEs are highly conserved type 1 transmembrane glycoproteins of ~515 residues with ~85% sequence identity. hUCE consists of a signal peptide (residues 1-25), a propeptide (residues 26-49), a luminal region (residues 50-448), a single TM region (residues 449-469), and a cytoplasmic tail (residues 470-515) (4). A sequence profile-based search for homologs of the entire luminal portion of hUCE (residues 50-448) using FFAS (45), which is very useful for finding remote homologs, identified a single significant hit to our BACOVA_00430 structure with a score of −39.3 (scores < −9.5 indicate less than 3% false positives) and 14% sequence identity between residues 50-298 of hUCE (comprising ~62% of the luminal region) and residues 36-282 of BACOVA_00430. When only those residues of hUCE that are included in the DUF2233-like domain definition (luminal residues 130-325) are queried using FFAS, only one significant hit (score −58.7) is again found with ~20% sequence identity to residues 123-311 of BACOVA_00430. These results indicated that we could confidently use BACOVA_00430 as a template for modeling the corresponding region of hUCE. A model for hUCE residues 50-335, which accounts for ~70% of the Golgi luminal region, was built using I-TASSER (46) using explicit disulfide restraints between Cys51-Cys221, Cys115-Cys148, Cys132-Cys323, and Cys307-Cys314 (numbering is based on the hUCE sequence), corresponding to potential disulfide bonds identified in an earlier mass spectrometry study of a monomeric soluble construct of hUCE (47). The C-score of this final model (energy-minimized and optimized hydrogen-bonding contacts by I-TASSER) was −0.13 (C-scores usually range from −5 to 2, and a higher C-score indicates higher confidence), and the TM-score for estimated accuracy of the model (48) was 0.70 ± 0.12 (a TM-score higher than 0.5 indicates correct topology of the model) with an r.m.s.d. of 6.4 ± 3.9 Å compared with the template. I-TASSER using disulfide restraints produced the most complete model, although several other procedures were tested including Modeler, M4T, Swiss-Model, and HHPred (using the Protein Structure Initiative Protein Model Portal), which produced only partial models.
Table 1 footnotes:
a Highest resolution shell.
b $R_{\mathrm{meas}}$ (redundancy-independent $R_{\mathrm{merge}}$) $= \sum_{hkl} \left(\frac{n}{n-1}\right)^{1/2} \sum_i |I_i(hkl) - \langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_i I_i(hkl)$ (24), where n is the number of observations of a given reflection.
c $R_{\mathrm{p.i.m.}}$ (precision-indicating $R_{\mathrm{merge}}$) $= \sum_{hkl} \left(\frac{1}{n-1}\right)^{1/2} \sum_i |I_i(hkl) - \langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_i I_i(hkl)$ (25,26).
d $R_{\mathrm{merge}} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_i I_i(hkl)$.
e Typically, the number of unique reflections used in refinement is slightly less than the total number that were integrated and scaled. Reflections are excluded due to negative intensities and rounding errors in the resolution limits and cell parameters.
f $R_{\mathrm{cryst}} = \sum \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| \,/\, \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure factor amplitudes, respectively.
g $R_{\mathrm{free}}$ as for $R_{\mathrm{cryst}}$, but for 5.0% of the total reflections chosen at random and omitted from refinement.
h This value represents the total B that includes overall TLS refinement and residual B components.
i ESU, estimated overall coordinate error (23).
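The merging statistics defined in the table footnotes above (R_merge, R_meas, and R_p.i.m.) differ only in the multiplicity-dependent weight applied to each reflection. The sketch below computes all three from a toy set of repeated intensity measurements. It is an illustration under stated assumptions, not the SCALA implementation: the data are invented, the function name is hypothetical, and reflections measured only once are skipped entirely, which is one of several conventions in use.

```python
# Sketch: R_merge, R_meas, and R_p.i.m. from repeated intensity measurements.
# Toy data and function name are hypothetical; singly measured reflections
# are omitted from both numerator and denominator (an assumption).
from math import sqrt


def merging_r_factors(observations: dict) -> dict:
    """observations maps an (h, k, l) tuple to a list of measured intensities."""
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # no spread can be estimated from a single observation
        mean_i = sum(intensities) / n
        spread = sum(abs(i - mean_i) for i in intensities)
        num_merge += spread                      # unweighted (R_merge)
        num_meas += sqrt(n / (n - 1)) * spread   # redundancy-independent (R_meas)
        num_pim += sqrt(1 / (n - 1)) * spread    # precision-indicating (R_p.i.m.)
        denom += sum(intensities)
    if denom == 0.0:
        raise ValueError("no multiply measured reflections with non-zero intensity")
    return {
        "r_merge": num_merge / denom,
        "r_meas": num_meas / denom,
        "r_pim": num_pim / denom,
    }


toy_data = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0, 56.0]}
stats = merging_r_factors(toy_data)
print(stats)
```

For any data set with at least two observations per reflection, the weights guarantee R_p.i.m. ≤ R_merge ≤ R_meas, which is why R_meas is preferred when comparing data sets of different multiplicity.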
Despite attempts to use the disulfide restraints, the I-TASSER model does not contain all four expected disulfide bonds. The model (Fig. 3) contains a disulfide bond between Cys132-Cys323, and Cys307/Cys314 and Cys115/Cys148 are relatively close to each other (~15 and 11 Å between Cα atoms, respectively). Arg328 and His84 when mutated to Cys and Gln, respectively, are associated with persistent stuttering, and their location is visualized in the model. The major consequence of these mutations is impaired folding in the endoplasmic reticulum (ER) followed by degradation by the ER-associated protein degradation system (49). Thus, based on our model, we speculate that the impaired folding induced by these mutations could be a result of destabilization of the β-sheet in which His84 resides as well as the potential to affect proper disulfide formation. In addition, three of the four N-linked glycosylation sites found by mass spectrometry in hUCE (Asn208, Asn214, Asn296, and Asn366) are solvent-exposed in our model. The conservation of disulfide and potential glycosylation sites indicates that the registry of the alignment used for modeling is likely correct. However, because of remote homology between the template and hUCE, it is expected that the accuracy may be low in some regions of the model. Nevertheless, the model provides the first three-dimensional view of the putative functional site of UCE and helped guide the mutational analysis.
Mutational Analysis of Human UCE-Next, we tested the consequences of mutating a number of the residues located in or near the surface cavity identified in the hUCE model that appears to be a putative active site. Most of these residues are conserved across the DUF2233 family (Asn137, Gln225, Thr227, Arg247, Asn284, Asp286, Gly287, Gly288, Gly289, Ser290, Thr320, and Val322) (Fig. 3). It was essential to take into account that hUCE is synthesized as an inactive preproenzyme that forms a tetramer in the ER and then traffics to the trans-Golgi network where the propiece is cleaved by furin to generate the active enzyme (6). Consequently, it was necessary to determine whether or not the mutant proteins could exit the ER and be processed by furin to ensure the mutation could be correlated with UCE activity.
JUNE 7, 2013 • VOLUME 288 • NUMBER 23

HeLa cells transfected with plasmids encoding full-length WT or mutant hUCE containing a C-terminal HA tag were harvested, solubilized with Triton X-100, and analyzed to evaluate the effects of the mutations on cellular localization and activity of the proteins. First, aliquots were incubated with or without PNGase F to excise the N-linked glycans (high mannose and complex oligosaccharides) and then subjected to SDS-PAGE and Western blotting with anti-HA antibody. With two exceptions, N281A and V318A mutants, the untreated samples gave rise to two bands; the faster migrating band represents the ER form with high mannose glycans, and the slower migrating band represents the Golgi species with complex glycans (49) (Fig. 4). Following PNGase F treatment, the Golgi species migrated faster than the ER form, reflecting cleavage of the 24-residue propiece by furin in the trans-Golgi network in addition to the removal of the glycans. Evidence that the designation of these bands is correct is shown by the N281A and V318A mutants, which only exhibit the ER forms, with a single faster migrating band in the untreated samples and a single slower migrating band following PNGase F treatment. Exiting the ER seems to have been partially impaired for the G287A and G289A mutants, whereas all the other mutants trafficked to the Golgi and underwent furin cleavage similarly to the WT hUCE.
The remaining extract was used for hUCE activity measurements as summarized in Table 2. Among the residues in the most highly conserved patch, mutation of Asp286, Gly288, Gly289, and Ser290 to Ala resulted in the complete loss of hUCE activity with either no or only partial impairment of trafficking to the Golgi and furin cleavage. The G287A mutant exhibited about 16% of WT hUCE activity when taking into account its partial impairment in trafficking to the Golgi, whereas the N284A mutant had only 22% of WT activity.
FIGURE 2. Residue conservation analysis. A residue conservation analysis was performed using ConSurf (54-56) using the MAFFT (57) alignment program, the UniProt database (58) (UniProt release November 2010), and 31 unique sequences with 30-95% sequence identity range (search range) identified in one iteration of PSI-BLAST with an expectation value cutoff of 0.001. The protein with the highest sequence identity was F7M5I1_9BACE (from Bacteroides sp. 1_1_30) at 55% (over the full-length protein; expectation value = 1e−86), and the sequence identities of the other hits were ~27-45% (including hits to shorter segments from other proteins, resulting in higher sequence identity value; expectation values ranged from 1e−24 to 9e−15). A, the most highly conserved residues are shown in stick representation. The view is rotated ~90° anticlockwise relative to Fig. 1A. B, a putative functional site lined with the most highly conserved residues is visible on one surface of the protein (the views are rotated 180° along the vertical axis, and the left panel has approximately the same orientation as A). The surface is colored based on the conservation scale ranging from magenta (highest conservation) to cyan (most variable).
Among the other mutants, the N137A, R247A, T320A, and V322A mutants exhibited 11, 87, 43, and 67% of WT activity, respectively. The effect of the N281A and V318A mutations on hUCE activity could not be explored because these constructs were retained in the ER. Interestingly, when Gln225 and Thr227 were mutated to the residues at equivalent positions in BACOVA_00430 (His and Arg, respectively), the hUCE activity was greatly decreased to 5.8 and 0.1%, respectively, relative to WT.
Mutation of Cys51 to Met had only a small effect on hUCE trafficking and activity (65% of WT), indicating that the Cys51-Cys221 disulfide bond is not absolutely essential for folding or enzyme activity (Fig. 4 and Table 3). The double mutant C51M/C221L also folded adequately and trafficked to the Golgi where it was cleaved by furin, but it had only 9.7 ± 1.0% of WT activity. This result indicates that the C221L mutation leads to loss of enzymatic activity. By contrast, mutation of Cys115 and Cys132 greatly impaired folding of the enzyme as reflected by retention in the ER.
[...] Fig. 1. Of the four disulfide bonds predicted from an earlier mass spectrometry study, the model contains one disulfide bond between Cys132-Cys323, but Cys307/Cys314 and Cys115/Cys148 are relatively close to each other (yellow sticks). The other two free Cys residues that could potentially form a disulfide, Cys51 and Cys221, are quite far apart. Cys221 is in a 13-residue insert (residues 217-229) in the model of hUCE compared with BACOVA_00430; the conservation of two residues in this loop and the loss of activity when mutated suggest that this loop might actually be closer to the putative active site (where it could form a disulfide and move functionally important residues toward the active site region) than what is modeled here. The solvent-exposed asparagine residues that are predicted to be glycosylated based on mass spectrometry analysis are shown as red sticks (Asn208, Asn214, and Asn296). The side chains of the conserved residues that represent the potential functional site are depicted as blue sticks. Residues Arg328 and His84, whose mutations (R328C and H84Q) have been associated with stuttering, are shown as magenta sticks. B, structure-based sequence alignment of BACOVA_00430 and hUCE based on superimposing the crystal structure and the model using DaliLite (44), which resulted in a Z-score of ~31 and r.m.s.d. of 1.7 Å over 252/285 Cα residues with a sequence identity of 17%. Residues in the strictly conserved A(I/L)NLDGGGS(T/S/A)T motif in the DUF2233 family are noted by a black bar. The figure was prepared using the BOXSHADE server with identical residues shown as white letters on a black background and similar residues shown as black letters on a gray background.
BACOVA_00430 Exhibits Low Activity toward GlcNAc-P-Man-The BACOVA_00430 protein was able to cleave GlcNAc from GlcNAc-P-Man but did so at a much slower rate than hUCE. Kinetic studies showed that its Vmax toward this substrate was only 1% of the UCE value, whereas the apparent Km for GlcNAc-P-Man was 10.9 versus 0.64 mM for hUCE (Table 4). Both proteins had much lower activity toward GlcNAc-1-P, and neither exhibited any activity toward p-nitrophenyl N-acetyl-α- and -β-D-glucosaminide substrates. This result is consistent with our previous finding that hUCE has a strong preference for substrates with the underlying phosphate in a diester linkage (50). To establish that the activity of the BACOVA_00430 protein toward these substrates was not the consequence of a contaminant in the preparation, a G288A BACOVA_00430 mutant was prepared and found to be completely inactive in this assay. Because the hUCE Q225H and T227R mutants were inactive, we tested the possibility that these residues may be important for increased activity toward GlcNAc-P-Man. However, mutating His225 and Arg227 of BACOVA_00430 to the corresponding residues in hUCE (Gln and Thr, respectively) also resulted in a complete loss of enzyme activity. The effects of a number of sugars, sugar phosphates, nucleotide sugars, and inorganic Pi as inhibitors of the activity of BACOVA_00430 toward GlcNAc-P-Man were also investigated. All of the sugar phosphates as well as UDP-GlcNAc and inorganic Pi were effective inhibitors, whereas the other nucleotide sugars and the simple sugars were very weak inhibitors (Fig. 5). This pattern differs from that observed for hUCE where GlcNAc-1-P is a more potent inhibitor than the other sugar phosphates, GlcNAc is a relatively good inhibitor, and inorganic phosphate is a weaker inhibitor. This pattern of inhibition of hUCE agrees with that reported previously for a partially purified preparation of rat liver UCE (51).

TABLE 4
Activity of UCE and BACOVA_00430 toward GlcNAc-P-mannose and GlcNAc-1-P
The kinetic analyses were carried out in 50 mM Tris maleate, pH 6.7, 0.5% Triton X-100 buffer containing various concentrations of substrates in a final volume of 30 μl. The reactions contained 3 μg of hUCE and 9 μg of BACOVA_00430 for the GlcNAc-P-mannose assays and 0.5 μg of UCE and 30 μg of BACOVA_00430 for the GlcNAc-1-P assays. The values for GlcNAc-P-Man are the average of two determinations, and the values for GlcNAc-1-P are the average of five determinations.
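To make the kinetic comparison above concrete, the following sketch evaluates the Michaelis-Menten rate law for both enzymes using the apparent Km values quoted in the text (0.64 mM for hUCE and 10.9 mM for BACOVA_00430). The Vmax values are expressed relative to hUCE (1.0 and 0.01), which is an assumption made here because the absolute values appear only in Table 4; the choice of 0.41 mM substrate follows the assay concentration given under "Experimental Procedures," and the function name is hypothetical.

```python
# Sketch: Michaelis-Menten comparison of hUCE and BACOVA_00430.
# Km values are from the text; Vmax is normalized to hUCE (assumption).


def michaelis_menten(v_max: float, k_m: float, substrate_mM: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return v_max * substrate_mM / (k_m + substrate_mM)


KM_HUCE_MM = 0.64
KM_BACOVA_MM = 10.9
VMAX_HUCE_REL = 1.0      # normalized
VMAX_BACOVA_REL = 0.01   # "only 1% of the UCE value"

km_ratio = KM_BACOVA_MM / KM_HUCE_MM
print(f"Km ratio (BACOVA/hUCE): {km_ratio:.1f}")

substrate_mM = 0.41  # assay concentration of GlcNAc-P-alpha-Me-Man
rate_huce = michaelis_menten(VMAX_HUCE_REL, KM_HUCE_MM, substrate_mM)
rate_bacova = michaelis_menten(VMAX_BACOVA_REL, KM_BACOVA_MM, substrate_mM)
print(f"relative rate hUCE/BACOVA at {substrate_mM} mM: {rate_huce / rate_bacova:.0f}")
```

Because the substrate concentration is well below the Km of the bacterial enzyme, the difference in observed rate under these conditions is larger than the 100-fold difference in Vmax alone, reflecting the combined effect of the lower Vmax and the roughly 17-fold higher Km.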

DISCUSSION
Elucidation of the crystal structure of BACOVA_00430 provides the first structural insights into proteins that contain DUF2233 domains. By focusing on the pattern and location of amino acids that are conserved within the DUF2233 family, we identified a surface cavity that is likely functionally important. Remarkably, among all known mammalian proteins, only one possesses the DUF2233 domain, namely UCE, a phosphodiester α-N-acetylglucosaminidase that catalyzes a critical step in the generation of the Man-6-P recognition signal on lysosomal acid hydrolases. The structure of BACOVA_00430 was used to model the catalytic domain of hUCE. This in turn provided a template for a structure-based mutational analysis and established that the highly conserved residues in the UCE surface cavity are essential for the catalytic function of the enzyme toward GlcNAc-P-Man.
The BACOVA_00430 protein exhibits only weak activity toward the GlcNAc-P-Man substrate. Kinetic analyses revealed that the apparent Km for this substrate is about 17 times higher than the corresponding value for hUCE, and the Vmax is only 1% of the value for hUCE. The finding that GlcNAc-1-P inhibited the enzymatic activity of BACOVA_00430 much more strongly than did GlcNAc points to a preference for the phosphorylated form of this aminosugar. This conclusion was confirmed by the fact that neither BACOVA_00430 nor hUCE has any detectable activity toward p-nitrophenyl N-acetyl-α- or -β-D-glucosaminide. However, both proteins exhibited poor activity toward GlcNAc-1-P, indicating that they function as phosphodiester glycosidases. Whereas BACOVA_00430 is inhibited equally by a variety of sugar phosphates, GlcNAc-1-P inhibits UCE much more than other sugar phosphates, consistent with having evolved to specifically recognize and act on GlcNAc-P-mannose. At this point, the physiological substrate(s) of BACOVA_00430 is unknown. In this regard, a number of bacterial cell wall components with sugar-P-sugar repeating structures could potentially be substrates for BACOVA_00430 and related bacterial proteins (52,53), and the sulfate from the crystallization reagents that is near Gly272-Gly273 and Arg239/Arg303 may mimic the phosphate from the physiologically relevant substrate. Our results strongly hint at this possibility.