Structural Basis of Reduction-dependent Activation of Human Cystatin F*

Cystatins are important natural cysteine protease inhibitors that primarily target papain-like cysteine proteases, including cathepsins and parasitic proteases such as cruzipain, but also mammalian asparaginyl endopeptidase. Mammalian cystatin F, which is expressed almost exclusively in hematopoietic cells and accumulates in lysosome-like organelles, has been implicated in the regulation of antigen presentation and other immune processes. It is an unusual member of the cystatin superfamily, with a redox-regulated activation mechanism and a restricted specificity profile. We describe the 2.1-Å crystal structure of human cystatin F in its dimeric “off” state. The two monomers interact in a manner not previously observed for cystatins or cystatin-like proteins, one that depends crucially on an unusual intermolecular disulfide bridge; this suggests how reduction leads to monomer formation and activation. Strikingly, the core sugars at one of the two N-linked glycosylation sites of cystatin F are well ordered, and their conformation and interactions with the protein indicate that this unique feature of cystatin F may modulate its inhibitory properties, in particular its reduced affinity for asparaginyl endopeptidase compared with other cystatins.

The cystatin superfamily of proteins constitutes an important class of natural cysteine protease inhibitors present in a wide variety of organisms. Their primary targets are family C1 cysteine proteases, including plant enzymes like papain, microbial proteases, and mammalian cathepsins (1). Besides playing a major role in lysosomal protein degradation, cathepsins have been shown to have more specific functions in antigen presentation, apoptosis, and bone remodeling (2)(3)(4). Correspondingly, aberrant cathepsin function has been implicated in disease processes like inflammation, tumor invasion, or neurodegeneration (5)(6)(7)(8). To prevent inappropriate activation, cathepsin function must be tightly regulated. An important component of this regulation is provided by cystatins, inhibitory proteins that act through the formation of tight reversible complexes with cathepsins.
Mammalian cystatins are divided into three classes (9,10). Stefins (type I cystatins) are predominantly cytosolic single-domain proteins of ~100 amino acids. Type II cystatins are somewhat larger (~120 residues) extracellular proteins that usually contain two conserved intramolecular disulfide bridges. Kininogens (type III cystatins), which are present in blood, consist of three type II-like domains. In addition to inhibiting papain-like enzymes, a subset of type II cystatins has also been shown to reversibly inhibit mammalian asparaginyl endopeptidase (AEP), a family C13 cysteine protease involved in antigen processing, using a binding site distinct from the family C1 interaction site (11).
Structures have been determined for several type II cystatins, including chicken egg white (CEW) cystatin (12,13) and human cystatin D (14). In addition, several structures have been reported of three-dimensional domain-swapped forms of human cystatin C (15,16), a pathological structural state associated with amyloid formation. The "cystatin fold" adopted by all these structures consists of a five-stranded antiparallel β-sheet wrapped around a single α-helix, with conserved loops making up structurally distinct cathepsin/AEP binding sites (11,17,18).
Cystatin F is a recently identified type II cystatin found in humans and mice (19–21) that possesses a number of unusual properties. The inhibitor shares <35% sequence identity with other type II cystatins and possesses a unique extension of ~6 amino acids at its N terminus. Compared with other cystatins, the protein exhibits a distinct specificity profile, binding tightly to cathepsins F, K, L, and V, less tightly to cathepsins S and H, and not inhibiting cathepsins B, C, or X (20,22). Although cystatin F can inhibit AEP, its affinity is reduced compared with other AEP-binding cystatins (11). The expression of cystatin F is limited to hematopoietic cells, with the highest expression levels observed in monocytes, dendritic cells, and certain types of T-cells (19,20). Furthermore, cystatin F mRNA has been shown to be up-regulated during dendritic cell maturation (23). Taken together, these findings suggest a specific role for cystatin F in immune response-related processes, although the details of this role, and indeed its primary enzyme target, remain unknown.
Two N-linked glycosylation sites make cystatin F one of only two known glycosylated human type II cystatins (20). In addition to the two disulfide bridges common to all type II cystatins, mature cystatin F has two additional cysteine residues. Either one or both of these form a third, intermolecular disulfide that allows redox potential-dependent dimerization of cystatin F (19). Cystatin F is produced as a dimer (24) that is inactive as a cathepsin inhibitor. The dimer can be activated by chemical reduction, which is accompanied by a shift from dimeric to monomeric species (22).
Here we present the crystal structure of recombinant human cystatin F. The structure reveals a disulfide-dependent inhibitor dimer with an unusually positioned ordered glycan that appears to protect the intermolecular disulfide. The structure suggests a molecular mechanism for reduction-dependent activation.

EXPERIMENTAL PROCEDURES
Cloning and Expression-Cell lines overproducing full-length, C-terminal His5-tagged human cystatin F were obtained by methotrexate selection of dihydrofolate reductase-negative Chinese hamster ovary cells transfected with a vector based on pcDNA DHFR, employing a protocol similar to that previously described for human asparaginyl endopeptidase (25).
Protein Purification-Up to 10 liters of culture medium were adjusted to pH 7.5 and loaded onto a nickel-agarose column that was washed with 500 mM NaCl, 10 mM Tris, pH 7.5. Cystatin F was eluted with washing buffer containing 300 mM imidazole and then dialyzed exhaustively into 100 mM NaCl in 20 mM sodium acetate, pH 5.0. After concentration using Vivaspin 20 concentrators (Vivascience), the protein was further purified by size exclusion chromatography on a Sephacryl S-200 column (Amersham Biosciences), followed by cation exchange chromatography on a HiTrap SP XL column (Amersham Biosciences) using a 0–1 M NaCl gradient. Purity was assessed by SDS-PAGE and matrix-assisted laser desorption ionization time-of-flight mass spectrometry.
* This work was supported in part by a Wellcome Trust senior research fellowship and the EMBO Young Investigator Program (to D. M. F. v. A.) and a Wellcome Trust Program grant (to C. W.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Crystallization and Data Collection-The protein was concentrated to 25 mg ml−1. Crystals were grown at 4°C from drops of 1 µl of protein solution and 1 µl of reservoir solution over a 400-µl reservoir containing 9% polyethylene glycol 3350, 180 mM zinc acetate/acetic acid, pH 4.6. The space group is P41212, with unit cell dimensions a = b = 65.35 Å and c = 123.18 Å.
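As a rough check on crystal packing, the cell volume follows directly from the reported dimensions. The sketch below assumes the standard tetragonal metric (a = b, all angles 90°) and the eight asymmetric units per cell of space group P41212; the function and constant names are illustrative only.

```python
# Illustrative sketch: unit-cell volume for the tetragonal cell reported above.
# Assumes the tetragonal metric a = b, alpha = beta = gamma = 90 degrees.


def tetragonal_cell_volume(a: float, c: float) -> float:
    """Return the cell volume in cubic angstroms for edges a = b and c."""
    return a * a * c


# Space group P4(1)2(1)2 (No. 92) has 8 general positions per cell,
# i.e. 8 asymmetric units.
ASYM_UNITS_P41212 = 8

cell_volume = tetragonal_cell_volume(65.35, 123.18)
volume_per_asu = cell_volume / ASYM_UNITS_P41212

if __name__ == "__main__":
    print(f"cell volume:         {cell_volume:,.0f} A^3")
    print(f"per asymmetric unit: {volume_per_asu:,.0f} A^3")
```

With one cystatin F molecule in the asymmetric unit, dividing the per-asymmetric-unit volume by the molecular mass of its contents (not given in this section) would give the Matthews coefficient.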
Two data sets, at the peak and inflection point of the zinc K edge, respectively, were collected at BM14, European Synchrotron Radiation Facility, Grenoble, from a single crystal maintained at 100 K with 2-methyl-2,4-pentanediol as a cryoprotectant. The data were processed using the HKL suite (26) and CCP4 programs (27). Relevant statistics are given in Table 1.
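The two MAD data sets were collected at energies near the zinc K edge, and beamline energies and wavelengths are related by λ = hc/E. The sketch below illustrates the conversion using the tabulated Zn K-edge energy (about 9.659 keV) as an assumption; the exact peak and inflection energies used at BM14 are not stated in this section, and the names are hypothetical.

```python
# Illustrative sketch: photon energy (keV) to wavelength (angstroms), lambda = hc / E.
HC_KEV_ANGSTROM = 12.3984198  # h*c expressed in keV * angstrom


def kev_to_angstrom(energy_kev: float) -> float:
    """Return the X-ray wavelength in angstroms for a photon energy in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


# Tabulated zinc K-absorption edge energy (assumption). The peak and inflection
# energies of an actual MAD experiment are read from a fluorescence scan of the
# crystal and lie close to, but not exactly at, this value.
ZN_K_EDGE_KEV = 9.6586

if __name__ == "__main__":
    print(f"Zn K edge: {ZN_K_EDGE_KEV} keV -> {kev_to_angstrom(ZN_K_EDGE_KEV):.4f} A")
```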
Phasing, Model Building, and Refinement-Using SHELXC/D/E controlled by HKL2MAP (28), five zinc sites were found, giving phases with a figure of merit of 0.40 to 2.5-Å resolution. ARP/wARP (29) was used to build an initial model comprising 102 protein residues. Cycles of model building using O (30) and Coot (31) and refinement with REFMAC5 (32) resulted in a final model with an Rwork of 21.7% and an Rfree of 24.1%, including 126 of 131 possible protein residues. No electron density was visible for the N-terminal 5 residues, whereas the C-terminal pentahistidine tag could be built into well-defined electron density. Notably, the ordered His tag, together with two aspartic acid side chains from a symmetry mate, coordinates three of the five zinc atoms found in the asymmetric unit and thus likely played an essential role in phasing the structure.
Carbohydrate topologies were generated with PRODRG (33) and modified manually. The final model was validated using PROCHECK (34) and WHATCHECK (35); coordinates and structure factors have been deposited in the RCSB Protein Data Bank (PDB; ID code 2CH9).
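The Rwork and Rfree values quoted above measure agreement between observed and model-calculated structure-factor amplitudes; Rfree is computed on a small set of reflections withheld from refinement. A minimal sketch of the standard definition follows. It assumes the amplitudes are already on a common scale (refinement programs such as REFMAC5 handle scaling internally), uses made-up toy values, and the function names are hypothetical.

```python
# Illustrative sketch of the conventional crystallographic R factor,
#   R = sum_hkl | |Fobs| - |Fcalc| | / sum_hkl |Fobs|,
# evaluated separately on the working set and on the free (test) set.
# Assumes Fobs and Fcalc are already on a common scale.
from typing import List, Sequence, Tuple


def r_factor(pairs: Sequence[Tuple[float, float]]) -> float:
    """Return R for a list of (Fobs, Fcalc) amplitude pairs."""
    numerator = sum(abs(abs(f_obs) - abs(f_calc)) for f_obs, f_calc in pairs)
    denominator = sum(abs(f_obs) for f_obs, _ in pairs)
    return numerator / denominator


def r_work_and_r_free(
    f_obs: Sequence[float], f_calc: Sequence[float], is_free: Sequence[bool]
) -> Tuple[float, float]:
    """Split reflections by the free-set flag and return (Rwork, Rfree)."""
    work: List[Tuple[float, float]] = []
    free: List[Tuple[float, float]] = []
    for fo, fc, flag in zip(f_obs, f_calc, is_free):
        (free if flag else work).append((fo, fc))
    return r_factor(work), r_factor(free)


if __name__ == "__main__":
    # Toy amplitudes for illustration only (not data from this structure).
    r_work, r_free = r_work_and_r_free(
        [100.0, 200.0, 300.0, 400.0],
        [90.0, 210.0, 300.0, 380.0],
        [False, False, False, True],
    )
    print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```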

RESULTS AND DISCUSSION
Monomer Structure and Cathepsin Specificity-Human cystatin F was cloned, overexpressed, and purified from Chinese hamster ovary cell culture supernatant as will be described in detail elsewhere. The x-ray crystal structure was solved by a Zn-MAD approach and refined to Rwork = 21.7% with one protein molecule in the asymmetric unit (Table 1). The structure shows a typical cystatin fold: five β-strands forming a twisted antiparallel β-sheet, which wraps around an α-helix (Fig. 1, A and B). Excluding the β3-β4 loop, the cystatin F structure is most similar to that of CEW cystatin (PDB accession number 1CEW, 84 equivalenced Cα atoms, r.m.s.d. 1.6 Å). The only significant difference between the two structures is the positioning of the L1 and L2 loops with respect to the rest of the molecule, which most likely results from differences in crystal contacts.
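The r.m.s.d. values quoted here and below are computed over equivalenced atoms after optimal rigid-body superposition. The section does not say which program was used; the sketch below swaps in the Kabsch algorithm, a standard method for this task, and assumes NumPy is available. Function names are illustrative.

```python
# Illustrative sketch: r.m.s.d. after optimal superposition (Kabsch algorithm).
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """Minimal r.m.s.d. between two (N, 3) arrays of equivalenced atom coordinates."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p0.T @ q0
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the first set onto the second and average squared deviations per atom.
    diff = p0 @ rotation.T - q0
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=5.0, size=(84, 3))  # stand-in for 84 C-alpha atoms
    angle = 0.7
    rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                      [np.sin(angle), np.cos(angle), 0.0],
                      [0.0, 0.0, 1.0]])
    moved = coords @ rot_z.T + np.array([1.0, -2.0, 3.0])
    print(f"r.m.s.d. after superposition: {kabsch_rmsd(coords, moved):.2e} A")
```

Because a rigidly rotated and translated copy superposes exactly, the printed value is at the level of floating-point noise; real r.m.s.d. values such as 1.6 Å reflect genuine structural differences.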
Cystatin F has a unique cathepsin inhibition profile, with a clear preference for cathepsins F, K, L, and V (20,22). Interactions of cystatins with family C1 cysteine proteases are mediated by three regions: an N-terminal segment and the two hairpin loops L1 and L2 (Fig. 1B). Comparison with cystatin-protease complex structures (17,18) suggests that for cystatin F the relevant residue stretches are 35-40 at the N terminus, 81-87 (L1), and 127-133 (L2 and part of the preceding strand; cf. Fig. 1A). Local superpositions show that, as for the overall structure, cystatin F is most similar to CEW cystatin in these regions, with an r.m.s.d. of ~0.85 Å for 24 Cα atoms. Given this degree of similarity, the differences in target profile are likely due to the properties of the amino acid side chains in the binding regions. Notably, cystatin F contains several unique basic residues, namely Lys35 and Lys40 in the N-terminal region and Lys84 in the L1 loop. Correspondingly, an electrostatic surface potential calculation reveals a mostly positively charged cathepsin-binding edge (Fig. 1C), in contrast to the hydrophobic binding site of CEW cystatin (not shown). The surface potentials of the target enzymes suggest why cystatin F would preferentially bind to cathepsin L, with its highly negatively charged active site cleft, rather than to cathepsin S, which has an almost electroneutral active site surface (Fig. 1C). Interestingly, this preference is inverted for cystatin D (36), which presents a partly negatively charged binding site (14).
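The colour scale of Fig. 1C is expressed in units of kT/e. GRASP obtains such potentials by solving the Poisson-Boltzmann equation numerically for the molecular geometry. Purely to illustrate the units and sign convention, the sketch below swaps in a much cruder model, a screened Coulomb (Debye-Hückel) sum over point charges in a uniform dielectric; the dielectric constant, temperature, Debye length, and all function names are assumptions, not parameters from this work.

```python
# Illustrative sketch (NOT the GRASP method): screened Coulomb potential in kT/e
# from point charges in a uniform dielectric.
import math
from typing import Iterable, Optional, Tuple

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def bjerrum_length(eps_r: float = 78.5, temp_k: float = 298.15) -> float:
    """Distance (angstroms) at which two unit charges interact with energy kT."""
    return E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_r * K_B * temp_k) * 1e10


def potential_kt_per_e(
    point: Tuple[float, float, float],
    charges: Iterable[Tuple[float, float, float, float]],
    debye_length: Optional[float] = None,
    eps_r: float = 78.5,
    temp_k: float = 298.15,
) -> float:
    """Screened Coulomb potential at `point` in kT/e.

    `charges` holds (x, y, z, q) with coordinates in angstroms and q in units of e.
    With debye_length=None the potential is unscreened.
    """
    lb = bjerrum_length(eps_r, temp_k)
    total = 0.0
    for x, y, z, q in charges:
        r = math.dist(point, (x, y, z))
        screening = 1.0 if debye_length is None else math.exp(-r / debye_length)
        total += q * screening / r
    return lb * total


def clamp_to_scale(phi: float, limit: float = 10.0) -> float:
    """Clip a potential to the -limit..+limit kT/e range used for surface colouring."""
    return max(-limit, min(limit, phi))


if __name__ == "__main__":
    # Hypothetical +1 charge (e.g. a lysine amine) at the origin, probe 4 A away,
    # with roughly physiological screening (Debye length ~8 A).
    phi = potential_kt_per_e((4.0, 0.0, 0.0), [(0.0, 0.0, 0.0, 1.0)], debye_length=8.0)
    print(f"potential: {clamp_to_scale(phi):+.2f} kT/e")
```

A positive value corresponds to the blue end of the scale, so a cluster of lysines such as Lys35, Lys40, and Lys84 would push the local surface towards blue.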
Cystatin F Is a Covalent Dimer-Cystatin F contains two conserved disulfide bridges, connecting Cys124 in β4 to Cys144 in β5 and Cys99 to Cys110 in the β3-β4 loop (Fig. 1B). Interestingly, Cys26 and Cys63, which are unique to cystatin F, form an additional intermolecular disulfide bridge across a crystallographic 2-fold axis, generating a covalent dimer (Fig. 2A) distinctly different from the three-dimensional domain-swapped cystatin C dimers observed previously (16). The dimer interface brings the convex faces of the two β-sheets together, burying 2 × 1780 Å², or ~20% of the combined monomer solvent-accessible surface area. It involves five pairs of direct hydrogen bonds and additional water-mediated interactions. A common feature of the known cystatin structures is that the N terminus of the inhibitor is highly flexible, both in the crystalline state and in solution, even when bound to a target enzyme. Strikingly, the longer N terminus of cystatin F (Fig. 1A) results in the structure extending by ~12 ordered residues that adopt a coil structure and a minimal "swapped" sixth β-strand, which contributes two hydrogen bonds to the dimer interface (Fig. 2A). Experimental data show that dimeric cystatin F is inactive as a family C1 protease inhibitor (22). To understand why the dimer is inactive, a model of the interaction of cystatin F with papain was constructed by superposing the cystatin F dimer onto the (monomeric) stefin moiety of the crystallographic stefin B-papain complex (PDB accession number 1STF). In the resulting model, although the binding edges of the dimer are accessible to papain, there is a significant clash between the cystatin dimer and the protease: the second cystatin F molecule, as well as the N-terminal extension of the first, overlaps with almost the entire L domain of the cysteine protease, making productive binding impossible (Fig. 2C).
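The buried surface area quoted above follows the usual definition, the solvent-accessible surface lost on complex formation, BSA = SASA(A) + SASA(B) − SASA(AB). The short sketch below applies this definition and back-calculates the monomer accessible surface implied by the two reported numbers (2 × 1780 Å² being ~20% of the combined monomer surface); because the 20% is approximate, so is the result. Function names are hypothetical.

```python
# Illustrative sketch: buried surface area bookkeeping for a symmetric dimer.


def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total surface buried on association: SASA(A) + SASA(B) - SASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def implied_monomer_sasa(buried_per_monomer: float, buried_fraction: float) -> float:
    """Monomer SASA implied by a symmetric dimer burying the same area on each subunit.

    buried_fraction = (2 * buried_per_monomer) / (2 * monomer_sasa)
    """
    return buried_per_monomer / buried_fraction


if __name__ == "__main__":
    monomer = implied_monomer_sasa(1780.0, 0.20)  # approximate, since ~20% is rounded
    dimer = 2 * monomer - 2 * 1780.0              # accessible surface of the dimer
    print(f"implied monomer SASA: {monomer:,.0f} A^2")
    print(f"check, buried total:  {buried_surface_area(monomer, monomer, dimer):,.0f} A^2")
```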
Although the extent of the interface and the observed intermolecular disulfides are compatible with cystatin F forming a stable dimer, other properties of the interface are more suggestive of a transient interaction (37). The interface is unusually hydrophilic: ~50% of the direct protein-protein contacts involve polar groups on both sides. Furthermore, the interface is fragmented, lacking a distinct solvent-excluding core (Fig. 2B), and most of it is supplied by a single stretch of amino acids, namely the cystatin F N-terminal extension. Residues 25-36 (i.e. the residues uniquely ordered in the present structure) alone contribute two-thirds of both the buried surface area and the interatomic contacts, as well as all ten direct hydrogen bonds (Fig. 2B, dark blue regions). Including the residues up to Lys40, the N-terminal region participates in >90% of all intersubunit contacts; a significant part of these contacts involves Phe38, a residue unique to cystatin F (Fig. 1A).
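The breakdown of the interface by region follows the colouring used for Fig. 2B: contacts involving residues 25-36 of either subunit, other contacts involving residues 37-40 of at least one subunit, and contacts between core residues only. A small sketch of that bookkeeping is shown below; the contact list is invented for illustration, the names are hypothetical, and in practice the contacts would come from a contact or hydrogen-bond analysis program.

```python
# Illustrative sketch: assign intersubunit contacts to the three Fig. 2B classes.
from collections import Counter
from typing import Dict, Iterable, Tuple

EXTENSION = range(25, 37)   # residues 25-36, ordered only in this structure
N_TAIL = range(37, 41)      # residues 37-40


def classify_contact(res_a: int, res_b: int) -> str:
    """Assign one contact (residue number on each chain) to a Fig. 2B class."""
    if res_a in EXTENSION or res_b in EXTENSION:
        return "dark_blue"      # involves residues 25-36 of either subunit
    if res_a in N_TAIL or res_b in N_TAIL:
        return "light_blue"     # otherwise involves residues 37-40
    return "pale_green"         # core residues on both sides


def contact_fractions(contacts: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of contacts falling into each class."""
    counts = Counter(classify_contact(a, b) for a, b in contacts)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical contact list, for illustration only.
    example = [(26, 63), (38, 120), (90, 95), (30, 70)]
    print(contact_fractions(example))
```

The order of the tests matters: a contact between residue 30 and residue 38 counts as dark blue, matching the caption's "other interactions" wording for the light blue class.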
FIGURE 1. The structure of cystatin F. A, sequence alignment of human type II cystatins and chicken egg white cystatin. An initial alignment was made using ClustalW and then adjusted using structural alignments where available. The (predicted) signal peptide sequences have been removed. Residues are shaded by sequence identity (light gray, 62%; black, 100%). The secondary structure elements of cystatin F as assigned by DSSP (41) are indicated above the sequences (red for helices, blue for strands). The putative asparaginyl endopeptidase interaction site is indicated by a filled circle. The L1 and L2 loops are labeled, and sites of N-linked glycosylation are highlighted by open triangles. The probable regions of cathepsin interaction are boxed in green. The top line gives residue numbers for cystatin F as used in this work. In all cases residue numbers pertain to the full-length sequences. B, schematic representation of the cystatin F monomer. N-linked sugars (green) and cysteine side chains involved in disulfide bridges (orange) are shown as stick models. Panels B and C, and Figs. 2 and 3, were prepared using MOLSCRIPT (42) and Raster3D (43). C, electrostatic surface potentials for the binding sites of cystatins F and D (PDB accession number 1ROA) and the active site grooves of cathepsins L (1MHW) (44) and S (1MS6) (45), colored from −10 kT (red) to +10 kT (blue). For the cystatins, binding site elements are labeled. All molecular surfaces were calculated with GRASP (46).
Outside the N terminus, few direct dimer interactions are observed (Fig. 2B, pale green regions). In fact, the dimer could be described as a solvent-filled bowl, with the N-terminal extension forming the base and two (low) sides, while the two β-sheets form the other two sides (Fig. 2A looks into the bowl). Not only are there few interactions between the two cores, but according to a hydrogen-bonding network analysis carried out with WHAT IF (38), potential hydrogen-bonding interactions also remain unsatisfied, which is energetically unfavorable compared with the solvated monomeric state. The N-terminal extension up to residue Pro36 forms mostly polar but electroneutral contacts with the other chain, suggesting that it would be enthalpically indifferent to the dimer/(solvated) monomer transition, while the entropic gain on flexibilization of the polypeptide chain should favor the monomeric state. Taken together, this suggests that the cystatin F dimer, far from being thermodynamically stable, may be "spring loaded" and ready to fall apart as soon as the stabilizing intermolecular disulfide bridges become reduced. This is in agreement with the observation that cystatin F becomes active as an inhibitor only after reduction (22).
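The thermodynamic argument for a "spring-loaded" dimer can be summarized qualitatively in terms of the free energy of dissociating the reduced dimer into solvated monomers. This is a schematic restatement, not a measured or calculated quantity; the subscript "diss" is notation introduced here.

```latex
\begin{align*}
\Delta G_{\mathrm{diss}} &= \Delta H_{\mathrm{diss}} - T\,\Delta S_{\mathrm{diss}} \\
\Delta H_{\mathrm{diss}} &\lesssim 0 && \text{polar, electroneutral contacts of residues 25--36; unsatisfied H-bond partners become solvated} \\
\Delta S_{\mathrm{diss}} &> 0 && \text{the ordered N-terminal extension becomes flexible} \\
\Longrightarrow\quad \Delta G_{\mathrm{diss}} &< 0 && \text{once the intermolecular disulfides are reduced}
\end{align*}
```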
N-linked Glycosylation Modulates Function-As predicted, cystatin F is modified by N-linked glycosylation on Asn62 and Asn115. The latter site lies in a less ordered part of the cystatin F structure, and its glycan is only partly visible in the electron density. In contrast, the sugars bound to Asn62 at the C-terminal end of helix α1 are significantly better ordered: both N-acetylglucosamine (GlcNAc) residues are clearly visible, and the electron density shows evidence of core fucosylation, i.e. an L-fucose residue α1-6-linked to the protein-proximal GlcNAc (Fig. 3A). Two of the sugars form hydrogen bonds with the protein: the side chain amide of Asn65 donates a hydrogen bond to O4 of the fucose, while the proximal GlcNAc accepts two hydrogen bonds, one from N of Lys59 to the acetyl oxygen, the other from the backbone amide of Thr25′ of the second cystatin chain to O3 (Fig. 3A). This sugar thus contributes to the dimer interface.
Strikingly, the three ordered sugars attached to Asn 62 almost completely cover the nearby intermolecular disulfide (Fig. 3, B and C). Considering that they are anchored by hydrogen bonds, it is conceivable that this arrangement helps to prevent inappropriate reduction of the cystatin F dimer. This is in agreement with the observation that unusually high concentrations of reductant are required to activate the dimeric form in vitro (22).
The N-glycosylated Asn62 immediately precedes the loop between α1 and β2, which is thought to be involved in the interaction of human cystatins with AEP (11). Cystatin F has been shown to inhibit AEP, but with significantly reduced affinity compared with cystatins C and M or CEW cystatin (11). Nevertheless, the local conformation and sequence of the cystatin F AEP-binding loop are very close to those of other, higher-affinity type II cystatins; superposition of all backbone and Cβ atoms for residues equivalent to Cys63–Met67 gives r.m.s.d. values between 0.65 and 0.85 Å. Given that the Asn62 glycan is unique to cystatin F (Fig. 1A), the sugar residues may be at least partly responsible for the reduced activity of cystatin F against AEP, by altering the shape of the likely AEP-interacting surface. In addition, the direct hydrogen bond between the glycan and the side chain of Asn65 (Fig. 3A), a residue suggested to be crucially involved in cystatin-AEP interactions (11), might fix the conformation of this side chain in a manner incompatible with AEP binding.
FIGURE 2. … B, the dimer interface. The solvent-accessible surface of one subunit is colored to indicate the "footprint" of the second subunit (shown as a semitransparent schematic). Coloring shows interactions involving residues 25-36 from either subunit (dark blue), other interactions involving residues 37-40 from at least one subunit (light blue), and the remaining interactions involving only core residues from both subunits (pale green). C, steric hindrance prevents the dimer from binding to C1 family proteases. Left, a cystatin F molecule (cyan) has been oriented correctly for binding to papain (gray). Residues 25-36 of cystatin F have been removed for clarity. Right, reconstitution of the complete cystatin F dimer shows that the second cystatin molecule (purple) occupies the same space as the papain L domain.

CONCLUSIONS
The present structure of dimeric cystatin F gives new insights into many of the unusual functional properties of this type II cystatin but also raises new questions. The structure explains why dimeric cystatin F is inactive as a cathepsin inhibitor, as well as how the inhibitor becomes activated in a reducing environment, supporting the notion that cystatin F is regulated by changes in redox potential. It is possible that the endosomal cystatin F pool is activated by GILT (γ-interferon-inducible lysosomal thiol reductase) (39), linking cystatin F activity to antigen processing.
Even so, in vitro activation of cystatin F requires unusually high concentrations of reducing agent (22). This can be rationalized by the presence of the glycan attached to Asn62, which covers the intermolecular disulfide crucial for maintaining the inactive conformation. It is unclear whether glycosylated cystatin F can be reduced in vivo, especially when secreted into a relatively oxidizing environment, or whether additional factors are involved in regulating its activity. Not all cystatin F molecules are glycosylated at Asn62 (40), and it is conceivable that this feature defines two species of cystatin F with different functions that, in the case of the glycosylated form, might be unrelated to cathepsin inhibition. Alternatively, the activation of cystatin F may require additional steps to make the intermolecular disulfide bridge accessible for reduction. These might include removal or remodeling of the Asn62 glycan or modification of the protein itself, e.g. by proteolytic cleavage. In turn, such modifications are likely to alter the inhibitory profile of cystatin F with respect to AEP and/or cathepsins, making further investigation of this question important for the identification of the as yet elusive in vivo target of cystatin F.