Leukocystatin, A New Class II Cystatin Expressed Selectively by Hematopoietic Cells*

We describe a new cystatin in both mice and humans, which we termed leukocystatin. This protein has all the features of a Class II secreted inhibitory cystatin but contains lysine residues in the normally hydrophobic binding regions. As determined by cDNA library Southern blots, this cystatin is expressed selectively in hematopoietic cells, although fine details of the distribution among these cell types differ between the human and mouse mRNAs. In addition, we have determined the genomic organization of mouse leukocystatin and found that, in contrast to most cystatins, the leukocystatin gene contains three introns. The recombinant proteins were expressed in Escherichia coli as N-terminal glutathione S-transferase or FLAG™ fusions, and they inhibited papain and cathepsin L, but with lower affinities than other cystatins. The unique features of leukocystatin suggest that this cystatin plays a role in immune regulation through inhibition of a unique target in the hematopoietic system.

Cysteine proteases play many important roles in the immune system. For instance, the deubiquitinating enzymes are cysteine proteases, and lysosomal proteases are involved in antigen presentation, both by degrading proteins to antigenic peptides and by processing the invariant chain associated with class II major histocompatibility complex molecules (1). However, overexpression of these proteases can be detrimental to cells, as can their release into the extracellular space. Their activities in these cells are therefore controlled by a variety of mechanisms, including macromolecular protease inhibitors.
The cystatins make up a class of very tight-binding, reversible, competitive inhibitors of the papain family of cysteine proteases. Cystatins have been divided into four classes based upon their sequences and properties. Class I cystatins, also called the stefins, are intracellular proteins of approximately 100 residues that contain no disulfide bonds. Class II cystatins are secreted inhibitors of about 120 amino acids containing two disulfide bonds. Class III cystatins, known as the kininogens, contain three domains, each of which resembles a Class II cystatin; two of these domains possess inhibitory activity. Finally, Class IV cystatins constitute a poorly understood group of glycoproteins with two nonfunctional cystatin domains. The amino acid sequences and genomic structures within each class are highly conserved. Cystatins are expressed throughout the body in a tissue-specific manner. Mutations in some cystatins, or alterations in their balance with their cognate cysteine proteases, have been implicated in several diseases (2,3). Many mutagenesis studies have shown that three regions of the cystatin, which together form a "wedge" that associates with the active-site cleft, are all required for tight binding to the protease. These findings have been confirmed by the crystal structure of the cystatin B-papain complex (4) and supported by other structural studies showing that chicken egg white cystatin, a Class II cystatin, has the same fold as the Class I cystatin B (5,6).
In this paper, we describe the characterization of a new Class II cystatin, leukocystatin, specifically expressed by hematopoietic cells. The unique features of the amino acid sequence suggest that the as yet unidentified target protease is not one of the commonly studied lysosomal cysteine proteases, although leukocystatin is an active inhibitor of these cathepsins. In addition, the unusual genomic structure of the mouse protein and the amino acid sequences of both the human and mouse inhibitors suggest that they are quite divergent from other Class II cystatins.

MATERIALS AND METHODS
General-Antisera to the human protein (Josman Laboratories, Napa, CA), used for protein blotting, were produced in rabbits against the peptide GFPKTIKTNDPGVLQAAR, which was synthesized on a lysine matrix (Biosynthesis, Inc., Lewisville, TX). Protein blots were detected with the Enhanced Chemiluminescent Detection System (Amersham Pharmacia Biotech). Chicken egg white cystatin was from Pan-Vera (Madison, WI). Automated DNA sequencing was performed with a DyeTerminator Cycle Sequencing Ready Reaction kit on an Applied Biosystems Prism 377 DNA sequencer (both from Perkin-Elmer). PCRs were performed on a Perkin-Elmer 9600 with a GeneAmp PCR kit (Perkin-Elmer) followed by purification with QIAquick Gel Extraction kits (Qiagen, Santa Clarita, CA). Oligonucleotides were synthesized using an Applied Biosystems 394 synthesizer (Perkin-Elmer).
cDNA Libraries-cDNA libraries are listed on the Southern blots (see Figs. 5 and 6) and were made with the SuperscriptII system (Life Technologies Inc.) as detailed elsewhere (Refs. 7 and 8 and references therein). Details are also available directly from the authors. In cases where the derivation is not obvious from the name, conditions used are listed below. Mouse libraries: 2) Braf:ER transfectant NIH3T3 cell line, ethanol-treated; 3) Mel14 bright CD4+ cells from spleen, polarized for 7 days with IFN-γ¹ and anti-IL4, activated with anti-CD3 for 2, 6, or …
* DNAX is supported by Schering-Plough Corporation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Identification and Characterization of Human Leukocystatin cDNA-Expressed sequence tags (ESTs), averaging 375 base pairs of unambiguous sequence, were determined for individual clones from cDNA libraries 84 and 86. The sequences were analyzed for possible encoded function by BLASTN searches against the public data bases, followed by BLASTP searches of the open reading frames (9). By this method, two leukocystatin ESTs (of a total of 1190) were identified. Both cDNAs were sequenced completely on both strands and were found to be full-length.
Isolation and Characterization of Mouse Leukocystatin cDNA-One hundred sixty pools of approximately 500 clones each from a mouse TH2 cDNA library (Ref. 10; Library 4) were amplified overnight to form sublibraries. Southern blots of the sublibraries were hybridized with a ³²P-labeled 321-base pair probe to the human sequence (see below) and washed under cross-species conditions (2× SSC, 0.1% SDS at 65°C). One of the 160 sublibraries gave a positive signal. A bacterial stock from this pool was plated, and colony hybridization under the same conditions yielded several possible positive clones, one of which was selected and found to encode a full-length copy of mouse leukocystatin.
cDNA Library Southern Blots-The method of Bolin et al. (7) was followed. Briefly, NotI/SalI digestion of 5 μg of each cDNA library released the cDNA inserts from the vector. Digests were run on 1% agarose gels, transferred to Nytran+ filters (Schleicher and Schuell), and cross-linked with a UV Stratalinker 1800 cross-linker (Stratagene, La Jolla, CA). For the human blot, a 321-base pair ³²P-labeled probe was synthesized with [³²P]dCTP (Amersham Pharmacia Biotech) using the Rediprime system (Amersham Pharmacia Biotech). Hybridization with 1.5 × 10⁶ cpm/ml was performed at 60°C in ExpressHyb (CLONTECH, Palo Alto, CA) followed by washes in 0.5× SSC, 0.1% SDS. The 343-base pair ³²P-labeled mouse probe was made using a Prime-IT II kit (Stratagene) and purified on a Centrisep column (Princeton Separations, Adelphia, NJ). Hybridization was performed in 0.5 M sodium phosphate, pH 7.2, 7% SDS, 0.5 mM EDTA at 65°C followed by washes in 0.1× SSC, 0.1% SDS. Band intensities were quantitated by scanning the developed x-ray film (Kodak BioMax, Rochester, NY) with a Molecular Dynamics Personal Densitometer (Sunnyvale, CA).
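The washes above differ mainly in salt concentration (2× SSC for the cross-species screen versus 0.5× or 0.1× SSC for the blots), and lower salt raises stringency. As a rough illustration only, the sketch below uses the empirical Meinkoth-Wahl approximation for the melting temperature (Tm) of a long DNA-DNA hybrid. It assumes that 1× SSC contains 0.195 M Na+ (0.15 M NaCl plus 0.015 M trisodium citrate) and a hypothetical 50% GC content, since the probe composition is not reported; the function names are illustrative.

```python
import math

# Assumption: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate, i.e. 0.195 M Na+.
NA_PER_1X_SSC = 0.15 + 3 * 0.015


def tm_dna_dna(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Empirical Tm (deg C) of a long DNA-DNA hybrid (Meinkoth-Wahl approximation)."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


def wash_tm(ssc_strength, gc_percent, length_bp):
    """Tm of a perfectly matched hybrid in an SSC wash (ssc_strength=2.0 means 2x SSC)."""
    return tm_dna_dna(ssc_strength * NA_PER_1X_SSC, gc_percent, length_bp)


if __name__ == "__main__":
    GC_PERCENT = 50.0  # hypothetical; the probe GC content is not given in the text
    washes = [("cross-species screen, 2x SSC, 321-bp probe", 2.0, 321),
              ("human blot, 0.5x SSC, 321-bp probe", 0.5, 321),
              ("mouse blot, 0.1x SSC, 343-bp probe", 0.1, 343)]
    for label, ssc, length in washes:
        print(f"{label}: Tm ~ {wash_tm(ssc, GC_PERCENT, length):.1f} C")
```

Under these assumptions the 2× SSC wash at 65°C lies roughly 29°C below the estimated Tm, leaving room for the mismatches expected between human and mouse sequences (a common rule of thumb is about 1°C of Tm lost per 1% mismatch), whereas a 0.1× SSC wash is far more stringent at any given temperature. The formula is least reliable at very low salt, so the numbers are indicative only.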
Genomic DNA Sequence-A 129SV mouse genomic library (Stratagene) was screened with a 617-base pair ³²P-labeled probe (complementary to the mouse cDNA sequence) in QuikHyb (Stratagene) under the conditions recommended by the manufacturer. Of approximately 10 million clones screened, 4 were positive. A series of PCRs using primers that hybridized to various portions of the leukocystatin sequence showed that one clone contained the entire gene. To sequence this gene fully, a series of 26 PCRs was performed. The resulting overlapping fragments were sequenced in both directions, generating 12 kilobases of sequence consistent with the cDNA sequence.
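Combining the 26 overlapping PCR products into the 12-kilobase sequence amounts to merging each fragment with its neighbor at their shared overlap. The minimal sketch below shows that idea for fragments whose order along the gene is already known (as it would be from primer positions) and whose reads are error-free. It is not the software used in this work; real assembly must also handle sequencing errors and reads from both strands. The helper names and toy sequences are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` equal to a prefix of `right`, or 0 if shorter than min_overlap."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble_ordered(fragments, min_overlap=20):
    """Merge fragments, already ordered 5'->3', into a single contig."""
    contig = fragments[0]
    for fragment in fragments[1:]:
        n = overlap_length(contig, fragment, min_overlap)
        if n == 0:
            raise ValueError("adjacent fragments do not overlap: gap in coverage")
        contig += fragment[n:]  # append only the part not already in the contig
    return contig


if __name__ == "__main__":
    toy_fragments = ["ATGGCCAAGT", "CAAGTTTGACCG", "GACCGATAA"]  # hypothetical reads
    print(assemble_ordered(toy_fragments, min_overlap=4))  # ATGGCCAAGTTTGACCGATAA
```

Raising an error on a missing overlap mirrors the practical requirement that adjacent PCR products must overlap for the assembled sequence to be continuous.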
Recombinant Protein Expression and Purification-The PCR primers listed in Table I were used to amplify the leukocystatin sequence with appropriate restriction sites. These amplimers were subcloned into the BamHI/NotI or BamHI/EcoRI sites of pGEX-4T-1 (Amersham Pharmacia Biotech) or HindIII/EcoRI sites of pFLAG (IBI, Eastman Kodak). The coding regions of the constructs were completely sequenced, and DNA preparations using QIA filter plasmid maxi kits (Qiagen) were made for transformation into the Escherichia coli strains used for protein expression.
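Directional cloning into BamHI/NotI, BamHI/EcoRI, or HindIII/EcoRI sites (and release of library inserts with NotI/SalI) works only if the chosen enzymes do not also cut inside the insert. A minimal pre-cloning check is sketched below: it scans an insert for each enzyme's recognition sequence, which is sufficient on one strand because all five sites are palindromic. The insert in the example is hypothetical, since the primers of Table I are not reproduced here.

```python
# Recognition sequences (5'->3'); each is palindromic, so scanning one strand finds every site.
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "SalI": "GTCGAC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions]} for recognition sites present in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(motif, start + 1)  # allow overlapping occurrences
        if positions:
            hits[enzyme] = positions
    return hits


def internal_cutters(insert, cloning_enzymes):
    """Cloning enzymes that would also cut within the insert (excluding primer-added sites)."""
    hits = find_sites(insert)
    return sorted(enzyme for enzyme in cloning_enzymes if enzyme in hits)


if __name__ == "__main__":
    insert = "ATGGGATCCAAGTTGAATTCTAA"  # hypothetical coding insert
    print(find_sites(insert))                           # {'BamHI': [4], 'EcoRI': [15]}
    print(internal_cutters(insert, ["BamHI", "NotI"]))  # ['BamHI']
```

In this toy case a BamHI/NotI strategy would fail because BamHI also cuts internally, whereas HindIII or SalI would be safe choices.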
The following E. coli strains were used to produce the GST fusion …
¹ … dithiothreitol; PMA, phorbol myristate acetate; LPS, lipopolysaccharide; CSF, colony-stimulating factor; GST, glutathione S-transferase; TH1, T helper 1 cell; TH2, T helper 2 cell; TE, 10 mM Tris hydrochloride, 1 mM EDTA, pH 8.
Leukocystatin-containing fractions, as determined by Western blot, were pooled, dialyzed into 50 mM NaOAc, pH 4, and stored at 4°C. The FLAG-tagged cystatin was expressed similarly, using the following E. coli strains: human short, UT4400; human long, W3110. Following cell harvesting, however, the periplasmic fraction was obtained by osmotic shock (1 h at 4°C in 50 mM Tris-HCl, pH 8, 2 mM EDTA, 20% sucrose, 0.1 mg/ml lysozyme). The volume was doubled with the same buffer, and benzonase (25,000 units/liter of extract; American International Chemical Inc., Natick, MA) was added. After incubation for 10 min, the suspension was centrifuged at 27,500 × g for 45 min. The inhibitor was purified from the supernatant by chromatography on a 5-ml M2 column (Kodak Scientific Imaging Systems) and eluted with 20 mM glycine hydrochloride, pH 3. After dilution into 20 mM sodium citrate, pH 4, the cystatin-containing fractions were further chromatographed on an S-Sepharose column and eluted with a 0 to 1 M NaCl gradient in 20 mM sodium citrate, pH 4.
N-terminal amino acid sequencing (ABI 476 Protein Sequencer) was performed for all forms and agreed with the predicted sequences. The mouse short and FLAG-human long materials were quantitated by amino acid analysis (Hewlett Packard AminoQuant using the manufacturer's standards); the concentration obtained for the FLAG-human long form agreed within 2-fold with that obtained by densitometry (Molecular Dynamics Personal Densitometer) of a silver-stained (Daiichi, Integrated Separation Systems, Natick, MA) 10% Novex Bis-Tris gel, using lysozyme as a concentration standard. Protein concentrations of the other variants were determined by densitometry of silver-stained gels, using both lysozyme and the FLAG-human long cystatin as standards. Final yields of protein were as follows: mouse short, 0.57 mg/15 liters; human short, 0.012 mg/15 liters; human long, 1 mg/11 liters; FLAG-human short, 0.6 mg/15 liters; FLAG-human long, 1 mg/2 liters.
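Because the cultures differed in volume (2 to 15 liters), the yields above are easier to compare per liter. The short calculation below uses only the figures reported in this paragraph; the dictionary layout is simply a convenient representation.

```python
# Final yields from Materials and Methods: construct -> (mg purified protein, liters of culture)
YIELDS = {
    "mouse short": (0.57, 15),
    "human short": (0.012, 15),
    "human long": (1.0, 11),
    "FLAG-human short": (0.6, 15),
    "FLAG-human long": (1.0, 2),
}


def yield_per_liter(yields):
    """Return (construct, mg/L) pairs sorted from highest to lowest yield."""
    per_liter = {name: mg / liters for name, (mg, liters) in yields.items()}
    return sorted(per_liter.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, mg_per_liter in yield_per_liter(YIELDS):
        print(f"{name:>16}: {mg_per_liter:.4f} mg/L")
```

On this basis the FLAG-human long construct gives about 0.5 mg/L, roughly 600-fold more per liter than the human short form (about 0.0008 mg/L).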
Refolding of Chicken Egg White Cystatin-Chicken egg white cystatin (2 mg) was concentrated to 100 μl and then incubated in 1 ml of 8 M guanidinium chloride in 50 mM Tris-HCl, pH 8.25, 10 mM DTT at room temperature for 2 h. The unfolded protein was diluted into 50 mM Tris-HCl, pH 8.25, 2.5 mM reduced glutathione, 1.0 mM oxidized glutathione and incubated for 12 h at 4°C. Following dialysis into 50 mM Tris-HCl, pH 8, the refolded material was purified on a Poros reverse phase column, eluting with a 2% to 80% acetonitrile gradient containing 0.1% trifluoroacetic acid. The fractions containing active inhibitor were quantitated as for leukocystatin.

RESULTS
Mouse and Human Leukocystatin cDNA Cloning and Sequence-Almost 1200 ESTs were determined from a cDNA library of resting and activated human dendritic cells. Of these, 67% corresponded to sequences of known function in the GenBank™ data base (as of May 1996) or to repeat elements. Included among these were the known cystatins A (three copies), B (one copy), and C (four copies). Many of the remaining sequences were identical to sequences in public EST data bases. Of the unknown sequences, two corresponded to the same 867-base pair cDNA and appeared to encode a protein related to known cystatins. Further sequence analysis of these clones confirmed this, revealing a full-length open reading frame encoding the protein we now designate leukocystatin. Two possible in-frame start codons, at nucleotide positions −66 to −63 and 1-3, are present near the 5′ end of the open reading frame (Fig. 1). We believe that the second is used for translation because it is conserved in the mouse sequence (see below) and best agrees with alignments to other cystatins. In addition, translation from position −66 would not produce a signal sequence, which would be inconsistent with the other features indicating that leukocystatin is a Class II cystatin (see below).
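Whether the upstream ATG is in frame with the one at position 1, and how many residues it would add, follows from simple codon arithmetic. The sketch assumes the usual cDNA numbering convention (the A of the ATG at +1, the preceding base at −1, no position 0) and takes −66 as the first base of the upstream codon; the function names are illustrative.

```python
def offset_from_plus_one(position):
    """Convert cDNA numbering (A of ATG = +1, preceding base = -1, no zero) to a 0-based offset."""
    if position == 0:
        raise ValueError("this numbering scheme has no position 0")
    return position - 1 if position > 0 else position


def codon_distance(upstream_start, downstream_start=1):
    """Number of nucleotides separating the first bases of two codons."""
    return offset_from_plus_one(downstream_start) - offset_from_plus_one(upstream_start)


def in_frame(upstream_start, downstream_start=1):
    """True if the two start codons share a reading frame."""
    return codon_distance(upstream_start, downstream_start) % 3 == 0


def extra_codons(upstream_start, downstream_start=1):
    """Codons added to the N terminus if translation began at the upstream ATG."""
    if not in_frame(upstream_start, downstream_start):
        raise ValueError("start codons are not in frame")
    return codon_distance(upstream_start, downstream_start) // 3


if __name__ == "__main__":
    print(in_frame(-66), extra_codons(-66))  # True 22
```

Under these assumptions, initiation at −66 would add 22 residues to the N terminus of the predicted protein.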
In order to isolate the mouse homolog, we screened a cDNA library of T helper 2 cells with a human leukocystatin probe and found one full-length clone. Analysis of this clone revealed that the mRNA is 928 nucleotides long, with a short 5′-untranslated region and a longer 3′-untranslated region (Fig. 2). The predicted amino acid sequence is 73% identical to the human sequence.
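Percent identity values such as the 73% quoted here depend on the alignment and on which columns are counted; the text does not specify its convention. One common choice, sketched below, counts identical residues over aligned columns in which neither sequence has a gap. The toy alignment is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # this convention leaves gapped columns out of the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(percent_identity("GQKV-AGNL", "GQRVSAGNI"))  # 75.0
```

Dividing instead by the full alignment length (9 columns here) would give about 67%, which is one reason identities from different sources are not always directly comparable.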
Both leukocystatin sequences have all the features of a Class II cystatin and are 33-35% identical to the Class II proteins human cystatin C and chicken egg white cystatin (Fig. 3). These features include a signal sequence, the conserved glycine at position 37 (human leukocystatin numbering), the QXVXG sequence at residues 81-85, Pro132-Trp133, and the four cysteine residues (99, 110, 124, and 145) that form the conserved disulfide bonds. The human sequence contains two potential glycosylation sites, at positions 62 and 115, whereas the mouse sequence has one, at position 62.
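The sequence features listed above can be located with simple pattern matching: potential N-glycosylation sites are Asn-X-Ser/Thr sequons in which X is not proline, and the conserved QXVXG and Pro-Trp motifs are fixed patterns. The sketch below uses Python regular expressions with a lookahead so that overlapping matches are reported. The protein string in the example is a hypothetical fragment, not the leukocystatin sequence, and a sequon marks only a potential site.

```python
import re

# Regular expressions for the features discussed in the text (one-letter amino acid code).
MOTIFS = {
    "QXVXG": r"Q.V.G",                # conserved Gln-X-Val-X-Gly segment
    "PW": r"PW",                      # conserved Pro-Trp pair
    "N-glycosylation": r"N[^P][ST]",  # Asn-X-Ser/Thr sequon with X != Pro
}


def scan(protein, motifs=MOTIFS):
    """Return {motif name: [1-based start positions]}, including overlapping matches."""
    protein = protein.upper()
    return {name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein)]
            for name, pattern in motifs.items()}


if __name__ == "__main__":
    print(scan("GQKVAGNLSNPTPW"))
    # {'QXVXG': [2], 'PW': [13], 'N-glycosylation': [7]}
```

In the toy string, Asn10 is not reported because it is followed by proline.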
The human and mouse N-terminal sequences are consistent with the presence of a cleavable signal sequence, as predicted by PSORT (14). These leader sequences, predicted to end at Gly19, would be approximately the same length as those of other Class II cystatins but show no significant homology to them, which is also reflected in the overall genomic structure (see below). In fact, little of the protein preceding the conserved glycine at position 37 resembles other Class II cystatins, either in length or in sequence. The predicted mature N-terminal region of leukocystatin is unusually long, 8 amino acids longer than that of chicken egg white cystatin. Although the mouse and human N termini are 54% identical to each other (roughly the same as mouse and human cystatin C), they are only about 22% identical to the cystatin C N termini of the same species. As noted above, the human sequence, unlike the mouse sequence, has two possible start codons. For the amino acids encoded by positions −66 to 1, however, PSORT would not predict a signal sequence; it would instead predict that leukocystatin is a Type II transmembrane protein.
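Signal sequences and type II transmembrane anchors both contain a stretch of hydrophobic residues. PSORT combines several rules that are not reproduced here; as a simpler illustration of the underlying idea only, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile and flags windows above a threshold. The window length and threshold are illustrative parameters (Kyte and Doolittle used a 19-residue window, with a mean above about 1.6 suggesting a membrane-spanning segment), and the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (non-standard letters raise KeyError).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each window; element i covers residues i+1 .. i+window (1-based)."""
    protein = protein.upper()
    if len(protein) < window:
        return []
    return [sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    toy = "KKKKK" + "L" * 9 + "KKKKK"  # hypothetical: hydrophobic core flanked by lysines
    print(hydrophobic_windows(toy, window=9))  # [4, 5, 6, 7, 8]
```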
Other interesting features of the leukocystatin amino acid sequence include additional cysteine residues, one of which may stabilize a homodimeric form of the protein (see below), and lysine residues at positions 35 and 84. Both lysines lie in the putative protease binding regions and replace nonpolar residues found in all other cystatins. The possible significance of these residues is examined under "Discussion."
Genomic Structure-A mouse genomic sequence corresponding to leukocystatin was found, and approximately 12 kilobases were sequenced. The mouse leukocystatin gene (deposited in the GenBank™ data base under accession no. AF031826) contains four exons (Fig. 4). The N-terminal intron represents an additional break relative to other Class II cystatin genes, splitting the codon for Asp24. This first intron is particularly large, about 5 kilobases. The positions of the two C-terminal exon-intron junctions are relatively conserved compared with those of other known animal Class II and Class III cystatin genes. The second intron, lying between the codons for Gln81 and Val82, is approximately 1.8 kilobases, whereas the third is 0.6 kilobases and lies between the codons for Arg120 and Thr121. The sequences at all junctions conform to the GT/AG consensus.
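The exon-intron junctions described above can be expressed as intron phases: phase 0 when the intron falls between codons, and phase 1 or 2 when it falls after the first or second base of a codon. The sketch below computes the phase from the number of coding nucleotides upstream of the intron and checks the GT...AG dinucleotides. It assumes codons are numbered from the initiator Met (codon 1), consistent with the residue numbering used here; the intron strings in the test are hypothetical.

```python
def intron_phase(coding_nt_upstream):
    """0: intron between codons; 1 or 2: intron after the 1st or 2nd base of a codon."""
    return coding_nt_upstream % 3


def follows_gt_ag(intron):
    """True if the intron (sense strand) starts with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


if __name__ == "__main__":
    print(intron_phase(81 * 3))   # intron 2, between the Gln81 and Val82 codons -> 0
    print(intron_phase(120 * 3))  # intron 3, between the Arg120 and Thr121 codons -> 0
    # Intron 1 splits the Asp24 codon, so its phase is 1 or 2; the text does not say which.
    print(intron_phase(23 * 3 + 1), intron_phase(23 * 3 + 2))  # the two possibilities
```

Both C-terminal introns are therefore phase 0, whereas the additional N-terminal intron interrupts a codon.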
mRNA Tissue Distribution-Using cDNA library Southern blots (7), we determined the tissue distribution of mouse and human leukocystatin mRNAs (Figs. 5 and 6). This technique uses high-quality representational cDNA libraries in place of the more typical Northern blots and is especially useful for examining mRNA tissue distributions in cases where these mRNAs are particularly hard to obtain. We have confirmed many of these results with semiquantitative PCR (8; data not shown). In an analysis covering 61 human and 47 rodent cell types and/or activation conditions, human leukocystatin mRNA was found mainly in resting T-cells, premonocytic cells, activated dendritic cells derived from stem cells, and some natural killer cell clones. Interestingly, human leukocystatin mRNA was seen in dendritic cells derived from stem cells but not in those derived from monocytes. Others have observed many differences between these two cell types, although it is not clear what these differences mean functionally (15). Mouse leukocystatin mRNA, in contrast, is found primarily in differentiated T-cells, with little difference in levels between TH1 and TH2 cells; little is found in naive and pre-T-cells. A moderate amount is also found in monocytes, whereas B-cells, dendritic cells, and some macrophage libraries show small amounts of cDNA corresponding to this protein. The small amounts seen in lymph nodes, thymus, and spleen probably come from resident lymphocytes. The absence of readily detectable leukocystatin mRNA in mouse splenic and bone marrow dendritic cells may reflect their lineage.
Protein Production-Two types of protein were made in order to study the role of the N terminus in binding: a short form beginning with the conserved glycine at position 37, and a long form beginning with Gly19. Gly19 is predicted to be the last amino acid of the signal sequence, so this version should represent the complete mature protein. Recombinant leukocystatin was produced in E. coli as either an N-terminal GST or FLAG fusion. Similar methods have been used previously to express other family members (16-20). The FLAG-tagged material was isolated in soluble form from the periplasm; the GST fusion, however, was recovered primarily as inclusion bodies that had to be refolded. Although the Class I cystatin A requires refolding after expression in E. coli (21,22), no other Class II cystatin is insoluble when expressed in E. coli, even as a GST fusion (20). The mature protein was isolated following thrombin cleavage of the GST moiety or enterokinase cleavage of the FLAG tag. SDS-polyacrylamide gel electrophoresis showed the expected molecular weights and showed that the proteins were reasonably pure (Fig. 7). Nonreduced gels, however, suggest that the long form may exist primarily as a dimer (Fig. 8): as the DTT concentration is increased from 0 to 8 mM, this protein shifts from the apparent molecular weight of a dimer to that of a monomer. Amino acid sequencing and detection with a leukocystatin-specific antibody confirmed that the correct proteins were isolated. Because freeze-dried protein could not be resolubilized, the purified protein was stored at pH 4.0 and 4°C.
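The sizes expected on the gels follow from the construct boundaries given in the Table II legend (long form Gly19 to residue 146, short form Gly37 to residue 146, FLAG forms carrying the 9-residue DYKDDDDKL tag). The sketch below estimates monomer and disulfide-linked dimer masses using a rule-of-thumb average of about 110 Da per residue. This is an approximation rather than a sequence-based mass, and it ignores glycosylation, which E. coli does not perform.

```python
AVG_RESIDUE_DA = 110.0   # rule-of-thumb average residue mass (assumption, not sequence-derived)
FLAG_TAG = "DYKDDDDKL"   # N-terminal tag retained on the FLAG forms (Table II legend)


def construct_length(first, last, tag=""):
    """Residues from `first` to `last` inclusive, plus any N-terminal tag."""
    return last - first + 1 + len(tag)


def approx_mass_kda(n_residues, copies=1):
    """Approximate mass in kDa for `copies` chains (2 for a disulfide-linked dimer)."""
    return n_residues * AVG_RESIDUE_DA * copies / 1000.0


CONSTRUCTS = {
    "long (Gly19-146)": construct_length(19, 146),
    "short (Gly37-146)": construct_length(37, 146),
    "FLAG-long": construct_length(19, 146, FLAG_TAG),
    "FLAG-short": construct_length(37, 146, FLAG_TAG),
}

if __name__ == "__main__":
    for name, n in CONSTRUCTS.items():
        print(f"{name}: {n} residues, ~{approx_mass_kda(n):.1f} kDa monomer, "
              f"~{approx_mass_kda(n, copies=2):.1f} kDa dimer")
```

On this estimate the long form would run near 14 kDa as a monomer and near 28 kDa as a dimer, the twofold difference in apparent size used to distinguish the two species on nonreduced gels.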
Inhibition of Cysteine Proteases-We studied the inhibition of three cysteine proteases (papain, cathepsin B, and cathepsin L) by standard methods (11). All assays were carried out near the pH optima of the proteases and in the presence of 5 mM DTT, which may partially denature the cystatins but was necessary to maintain maximal activity of the cysteine proteases. Using this method, we confirmed that chicken egg white cystatin binds tightly to cysteine proteases: an apparent inhibition constant of 90 pM was obtained against cathepsin B, whereas binding to papain was too tight to quantitate. Identical results were obtained after denaturing and renaturing the chicken egg white cystatin (data not shown). These results are consistent with published values (Table II). Binding of leukocystatin to the cysteine proteases studied was slow and weaker than that of other Class II cystatins (Table II). We could detect no inhibition of cathepsin B activity by leukocystatin (detection limit, Kd of approximately 200 nM), although subnanomolar inhibition constants were found against papain and cathepsin L. Whereas the two N-terminal forms of leukocystatin inhibited papain similarly, the long form had 10-fold higher affinity for cathepsin L than the short form. Finally, although an affinity could not be determined quantitatively, a cysteine-linked dimer of the leukocystatin long form was a less effective papain inhibitor than the reduced form. Thus, although leukocystatin is a functional Class II inhibitor, its unique amino acid sequence appears to interfere with binding to the commonly assayed cysteine proteases.

DISCUSSION
We have discovered a novel hematopoietic cell-specific Class II cystatin from an EST analysis of human dendritic cells.
This protein, which we have called leukocystatin, has all the features of a Class II cystatin but also some notable characteristics. For example, leukocystatin contains lysine at two positions that in all other characterized cystatins are occupied by strictly hydrophobic (residue 35) or small, uncharged (residue 84) amino acids. Position 35 is thought to bind to the P3 site of the target protease (4, 5), so this lysine substitution may confer especially high affinity for a cysteine protease whose specificity favors lysine at this position. Because residues 81-85 in other cystatins usually form nonspecific hydrophobic interactions with the cognate protease (4), the contacts formed by this region of leukocystatin are likely to differ from those observed previously. Consistent with this, computer modeling indicates that Lys84 would interfere with binding of leukocystatin to papain in this region (see below).
Leukocystatin contains a total of eight cysteines: the four conserved in other Class II cystatins and four unique cysteines, two of which are in the leader region (Fig. 3). Conserved cysteine residues in the N-terminal portion are not seen in any other mature Class II cystatin molecule, although they do occur sporadically in other cystatin leader sequences. Two cysteine residues, at positions different from those in leukocystatin, also appear in the N-terminal region of each inhibitory kininogen domain, and a polymorphism in the cystatin D sequence introduces a cysteine in this area (18,23). The two additional leukocystatin cysteines in the putative mature protein might form an intrachain disulfide and provide added stability, but the evidence does not support this. In the structure of chicken egg white cystatin, the residue corresponding to Cys63 lies at the end of an α-helix, 34 Å from the N terminus and far from the other leukocystatin cysteines; Cys63 could therefore participate in an intrachain disulfide only if the leukocystatin structure differs markedly from that of chicken egg white cystatin or if the N terminus folds back. Furthermore, nonreduced gels indicate that the long form, which contains Cys26, can dimerize (Fig. 8), whereas there is no evidence of dimerization for the short form, which contains Cys63 but not Cys26. Because only monomer is seen in reducing gels, this interaction is apparently mediated by an interchain disulfide formed between the Cys26 residues of two molecules. This may be similar to the case of stefin B, a Class I cystatin, which has a cysteine at position 3 that is thought to mediate dimerization (24).
The mouse leukocystatin gene contains four exons (Fig. 4), unlike most other Class II cystatin genes, which have three (25). Soyacystatin is the only other known single-cystatin-domain protein whose gene has four exons; in that case, however, the additional exon encodes a unique C-terminal extension (26). Because the amino acids encoded by the first two exons of leukocystatin are very different from those of other Class II cystatins, this region has evidently evolved differently from the corresponding region in other family members. The C-terminal genomic organization, however, is similar to that of the other Class II cystatins, with conserved intron/exon boundaries. Nor does the N-terminal portion of leukocystatin resemble Class I cystatins: the Class I genomic organization differs, with the first intron lying between the positions of the first and second leukocystatin introns and the second lying between the second and third leukocystatin introns (27,28).
Several forms of leukocystatin were produced in E. coli, and all were active cysteine protease inhibitors. Although some of these products required refolding, the FLAG-tagged material was soluble, like other Class II members expressed in E. coli, including those used for comparison in Table II (16,17). The Class I cystatin A also requires refolding after overexpression in E. coli, and this was shown to have no adverse effect on its activity (21,22). We further controlled for effects of refolding on activity by examining denatured/renatured chicken egg white cystatin, which showed no difference in activity after this step.
We determined the apparent Ki values of leukocystatin for papain and cathepsin L. These are compared in Table II with published values for other Class II cystatins. Ki values in the literature vary for the same cystatin-protease pair, probably because of the differing lengths of the N termini in various cystatin preparations; these residues are easily proteolyzed during isolation of native cystatins. In general, cystatins bind cathepsin B much more weakly than cathepsin L or papain, a trend that holds for the leukocystatins as well. In fact, no inhibition of cathepsin B by leukocystatin was detected, although a reasonable apparent Ki was found for chicken egg white cystatin. One possible explanation for the weaker association of leukocystatin with all of the proteases examined is the presence of lysine residues at positions 35 and 84, replacing amino acids that are uncharged in every other known cystatin. Both positions are intimately involved in the binding of these inhibitors to cysteine proteases (4), and N-terminal truncation or substitution at either site can dramatically decrease the ability of cystatins to inhibit cysteine protease activity. For instance, some substitutions in the Gln81-Gly85 loop are detrimental to binding, although position 84 was not itself modified in those studies (16). Modeling of a lysine at position 84, based upon the structure of the stefin B-papain complex (4), indicated that this substitution would cause serious steric clashes at this interface, in addition to penalties from desolvating and burying the positive charge of the lysine side chain (data not shown). Furthermore, the residues preceding the conserved Gly37 are important for binding, because their removal can reduce binding drastically (29-31), in some cases largely owing to slower association rates.
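The text does not state which rate equation was used to obtain the apparent Ki values. When inhibitor concentrations are comparable to the enzyme concentration, as with subnanomolar inhibitors, a common choice is the Morrison equation for tight-binding inhibition, which does not assume that free inhibitor equals total inhibitor. The sketch below implements that equation and a deliberately simple log-spaced grid search for Ki,app; the concentrations, helper names, and synthetic data are hypothetical. For a competitive inhibitor such as a cystatin, the fitted value is an apparent constant, related to the true Ki by Ki,app = Ki(1 + [S]/Km).

```python
import math


def morrison_fraction(inhibitor, enzyme, ki_app):
    """Fractional activity v_i/v_0 for tight-binding inhibition (Morrison equation).

    All concentrations must be in the same units (for example nM).
    """
    b = enzyme + inhibitor + ki_app
    bound = (b - math.sqrt(b * b - 4.0 * enzyme * inhibitor)) / 2.0  # [EI] at equilibrium
    return 1.0 - bound / enzyme


def fit_ki_app(inhibitor_values, fractions, enzyme, log10_min=-3.0, log10_max=3.0, steps=6001):
    """Least-squares Ki_app found by scanning a log-spaced grid (simple, not efficient)."""
    best_ki, best_sse = None, math.inf
    for k in range(steps):
        ki = 10.0 ** (log10_min + (log10_max - log10_min) * k / (steps - 1))
        sse = sum((morrison_fraction(i, enzyme, ki) - f) ** 2
                  for i, f in zip(inhibitor_values, fractions))
        if sse < best_sse:
            best_ki, best_sse = ki, sse
    return best_ki


if __name__ == "__main__":
    enzyme_nM = 0.5  # hypothetical active-enzyme concentration
    inhibitor_nM = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    synthetic = [morrison_fraction(i, enzyme_nM, 0.8) for i in inhibitor_nM]  # Ki_app = 0.8 nM
    print(f"fitted Ki_app ~ {fit_ki_app(inhibitor_nM, synthetic, enzyme_nM):.3f} nM")
```

A grid search is used only to keep the example dependency-free; a nonlinear least-squares routine would normally be preferred for real data.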
For the leukocystatins, we observed that equilibrium is reached rather slowly, over several minutes. This may indicate that a conformational change in the protease and/or leukocystatin is required for binding. Furthermore, Hall et al. (17) postulate that the residue at position 35 may be a primary determinant of protease specificity, because these N-terminal residues associate with the protease binding sites (4, 5). Lindahl et al. (32) have shown that an arginine substitution in cystatin C at the position equivalent to Pro36 can have a large impact on binding to cathepsin B or papain and may even displace the N terminus from the protease. Although that position is probably more critical for tight binding to the cognate protease than residue 35, this result demonstrates that changes in the amino acids at these positions can greatly affect the ability of cystatins to inhibit cysteine proteases. It is therefore likely that the native binding partner of leukocystatin is unlike the proteases examined here. The target may be an as yet unidentified lysosomal protease, or even a protease from a different family. For instance, the ubiquitin hydrolase UCH-L3 has recently been shown by x-ray crystallography to have a papain-like fold, being particularly similar to cathepsin B in the active-site cleft (33), and so may well be inhibited by cystatins, although no evidence of this has yet been reported. Particularly intriguing is that this isozyme is found primarily in hematopoietic cells (34,35) and is specific for the RGG sequence of ubiquitin. Furthermore, a kininogen domain has inhibitory activity against calpains (36), and legumain is inhibited by chicken egg white cystatin (37), demonstrating that cysteine proteases of other families can be inhibited by cystatin-type structures.
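A slow approach to equilibrium of this kind is commonly analyzed with the integrated rate equation for slow-binding inhibition, in which the rate relaxes exponentially from an initial velocity v0 to a steady-state velocity vs with an observed rate constant kobs. The sketch below evaluates that equation and the time needed to complete a given fraction of the approach; the parameter values are hypothetical, and the paper does not report kobs.

```python
import math


def velocity(t, v0, vs, kobs):
    """Instantaneous rate: relaxes exponentially from v0 to the steady-state rate vs."""
    return vs + (v0 - vs) * math.exp(-kobs * t)


def product(t, v0, vs, kobs):
    """Integrated progress curve P(t) = vs*t + (v0 - vs) * (1 - exp(-kobs*t)) / kobs."""
    return vs * t + (v0 - vs) * (1.0 - math.exp(-kobs * t)) / kobs


def time_to_fraction(kobs, fraction=0.95):
    """Time for the rate to complete `fraction` of its change from v0 to vs."""
    return -math.log(1.0 - fraction) / kobs


if __name__ == "__main__":
    kobs = 0.01  # s^-1, hypothetical
    print(f"95% of the approach takes ~{time_to_fraction(kobs) / 60:.1f} min")
    for t in (0, 60, 300, 600):
        print(t, round(product(t, v0=1.0, vs=0.2, kobs=kobs), 2))
```

With kobs = 0.01 s⁻¹ the approach is about 95% complete after roughly 5 minutes. Rates read before equilibrium would understate the final extent of inhibition, which is one reason slow binding matters when Ki values are compared.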
Table II. … with carbobenzoxy-L-phenylalanyl-L-arginine-7-amino-4-methylcoumarin (papain, catL) or carbobenzoxy-L-arginyl-L-arginine-7-amino-4-methylcoumarin (catB) as substrate at pH 6.7 (papain) or pH 5.5 (catB, catL). Published values for chicken egg white cystatin and cystatin C inhibition are included for reference. The long form corresponds to amino acids Gly19-His/Gln146; the short form corresponds to Gly37-His/Gln146; and FLAG forms include the N-terminal DYKDDDDKL sequence. The human long and short forms prepared by enterokinase cleavage of the FLAG-tagged material had activities identical to those prepared from the GST fusion. Chicken cystatin denatured and renatured as described under "Materials and Methods" had activity identical to the native protein.
In support of a unique target for the leukocystatins, we found little difference between the long and short forms of leukocystatin in their binding to papain in the presence of 5 mM DTT. Based on experiments with chicken cystatin, in which N-terminally truncated forms were less efficient inhibitors than the full-length molecule (29-31), we would have expected dramatic differences in the abilities of these variants to inhibit this enzyme. One possible explanation is that the 8-amino acid N-terminal extension of the long form impedes binding, although extensions have been shown to have little effect for other cystatins (38). Furthermore, the lack of an effect of the FLAG tag supports the idea that N-terminal extensions do not influence leukocystatin binding. The unique lysine at position 35 may, however, interfere with complex formation. There is also some evidence that the long forms dimerize, which may interfere with binding. Under the assay conditions, however, a large proportion is likely to be monomeric, as indicated by the titration shown in Fig. 8.
That dimerization is mediated primarily by the interchain disulfide is supported by the increase in activity of the long form with increasing DTT concentration. Even if we assume that the dimer does not bind the cysteine proteases studied at all (so that all of the inhibition results from the monomer), the apparent Ki would change by a factor of less than 2 (toward tighter binding), because the effective inhibitor concentration would change by less than 2-fold.
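The bound on the dimer effect can be made explicit. If a fraction f of the leukocystatin chains is in disulfide-linked dimers that do not inhibit at all, the active inhibitor concentration is (1 − f) times the nominal one. Where the apparent Ki scales linearly with inhibitor concentration (for example Ki,app = [I]/(v0/vi − 1) for a competitive inhibitor when [I] greatly exceeds [E]), the corrected constant is (1 − f)·Ki,app, so the correction stays below 2-fold whenever more than half of the chains are monomeric. The sketch below encodes that assumption; f itself is not measured in the text, and the function names are illustrative.

```python
def active_fraction(dimer_fraction):
    """Fraction of chains present as free monomer, assuming dimers are completely inactive."""
    if not 0.0 <= dimer_fraction < 1.0:
        raise ValueError("dimer_fraction must lie in [0, 1)")
    return 1.0 - dimer_fraction


def corrected_ki(ki_apparent, dimer_fraction):
    """Ki recalculated with the monomer concentration only.

    Valid only where Ki_app is proportional to the nominal inhibitor concentration,
    e.g. Ki_app = [I] / (v0/vi - 1) with [I] >> [E].
    """
    return ki_apparent * active_fraction(dimer_fraction)


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.49):
        factor = 1.0 / active_fraction(f)
        print(f"dimer fraction {f:.2f}: Ki lower by a factor of {factor:.2f}")
```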
Although the N terminus had no effect on papain inhibition, the long form bound cathepsin L with 10-fold higher affinity than the short form, showing that, at least in this case, the N terminus contributes to binding. This supports the idea that the various portions of cystatins contribute differently to binding to individual proteases, even though the three-dimensional structures of these proteases are very similar. We would expect the native binding partner of leukocystatin to take full advantage of the unique features at these sites.
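A 10-fold change in affinity can also be expressed as a binding free-energy difference, ΔΔG = RT ln(10). The sketch below performs that conversion assuming 25°C, since the assay temperature is not stated here; under that assumption the N-terminal contribution to cathepsin L binding corresponds to about 1.4 kcal/mol.

```python
import math

R_KCAL_PER_MOL_K = 8.314462618 / 4184.0  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free-energy difference (kcal/mol) equivalent to a fold change in Ki or Kd."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    print(f"10-fold affinity change ~ {ddg_from_fold_change(10):.2f} kcal/mol at 25 C")
```

The script prints about 1.36 kcal/mol; the value scales linearly with absolute temperature, so it changes only slightly between 25°C and 37°C.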
Leukocystatin was shown by cDNA library Southern blots to be expressed selectively in hematopoietic cells. Examination of a wide variety of immune cell types suggests that the highest levels are expressed in T-cells, monocytes, and dendritic cells. A search for a specific target protease should therefore focus on the effector functions of these immune cell types. We are currently developing other tagged versions of leukocystatin, additional antibody reagents, and a mouse gene knockout to probe the biological role of this novel Class II cystatin in depth.
In conclusion, we have characterized a new Class II cystatin, termed leukocystatin, which has a novel sequence, including unique lysine residues at two important protease binding sites, and a distinct distribution in hematopoietic cells.