Molecular Characterization of the Principal Substrate Binding Site of the Ubiquitous Folding Catalyst Protein Disulfide Isomerase*

Disulfide bond formation in the endoplasmic reticulum of eukaryotes is catalyzed by the ubiquitously expressed enzyme protein disulfide isomerase (PDI). The effectiveness of PDI as a catalyst of native disulfide bond formation in folding polypeptides depends on the ability to catalyze disulfide-dithiol exchange, to bind non-native proteins, and to trigger conformational changes in the bound substrate, allowing access to buried cysteine residues. It is known that the b′ domain of PDI provides the principal peptide binding site of PDI and that this domain is critical for catalysis of isomerization but not oxidation reactions in protein substrates. Here we use homology modeling to define more precisely the boundaries of the b′ domain and show the existence of an intradomain linker between the b′ and a′ domains. We have expressed the recombinant b′ domain thus defined; the stability and conformational properties of the recombinant product confirm

Native disulfide bond formation in the endoplasmic reticulum is a complex process that is rate-limiting in the biogenesis of many outer membrane and secreted proteins. Native disulfide bond formation can occur via multiple parallel pathways, and there is evidence that a large number of different gene families and redox carriers may play a role in the supply of redox equivalents for protein disulfide bond formation. What is clear is that the rate-limiting step for native disulfide bond formation in proteins that contain multiple disulfides is late-stage isomerization reactions, where disulfide bond formation is linked to conformational changes in protein substrates with substantial regular secondary structure. These steps are thought to be catalyzed only by proteins belonging to the protein disulfide isomerase (PDI) family. PDI¹ was the first catalyst of protein folding identified over 40 years ago (1), but despite probably being the most widely studied protein folding catalyst, significant details of the mechanisms of action of this critical enzyme are still unclear. In all eukaryotes, there exists a species-dependent PDI family of enzymes; for example, in humans (2), ERp72, ERp57, P5, PDIp, PDIr, ERp44 (3), ERp28/29 (4), ERdj5 (5), and ERp18 (6) have been reported to date. Functional characterization and differentiation between these family members is far from complete.

PDI is a multifunctional, multidomain enzyme. The domain structure of PDI has been determined by theoretical (7) and experimental (8–11) procedures and comprises two catalytic domains, a and a′, separated by two homologous non-catalytic domains, b and b′, plus a C-terminal region designated as c. In addition, it has been proposed (Ref. 11 and references therein), but not substantiated, that there is a short linker region between the b′ and a′ domains.
In vitro studies indicate that PDI catalyzes all of the steps in native disulfide bond formation but that catalysis of disulfide bond isomerization is most significant over the non-catalyzed rate in a glutathione buffer that mimics the redox potential of the ER. The greatest enhancement of rate is for late-stage isomerizations, i.e. the rate-limiting steps for native disulfide bond formation. PDI is remarkable in that, to our knowledge, it appears to be able to catalyze all of the steps in native disulfide bond formation for all substrate proteins reported. It is still unclear how PDI recognizes all of these different folding states, from essentially unfolded through to substrates with quasi-native conformations but lacking specific disulfide bonds, and yet does not appear to interact with correctly folded and disulfide-bonded substrate proteins. Some details have been elucidated of the roles of individual domains of PDI in the different aspects of the overall activity of PDI. It has been reported that oxidation reactions require only a single catalytic domain, simple isomerization reactions require a linear combination of a catalytic domain and the b′ domain, whereas all of PDI excluding the c region is required to catalyze isomerization reactions where disulfide bond formation is linked to conformational changes in protein substrates with substantial regular secondary structure (12). The requirement for the non-catalytic b′ domain was subsequently shown to arise due to the fact that the b′ domain contains the principal peptide or non-native protein binding site (13). For small peptide substrates, the b′ domain alone is sufficient for binding, whereas for larger protein substrates, the b′ domain is essential, but other domains contribute to substrate binding. No specificity has yet been reported for substrate binding by PDI, nor has the binding site been located within the b′ domain. However, for the homologue PDIp, the substrate specificity is now well defined (14,15), a fact that will greatly assist in the localization and definition of the binding site of the PDI family.

Here we confirm the existence of the putative linker region between the b′ and a′ domains and report the identification and characterization of the binding site within the b′ domain. Mutations within this site greatly reduce the binding affinity of PDI for small peptide substrates.

FIG. 1. Multiple sequence alignment for homology modeling including secondary structure assignments and predictions. To emphasize the structural alignment, each domain is split between panel A (N-terminal half of the domain) and panel B (C-terminal half of the domain). X-ray, secondary structure assignment as known from x-ray structure; NMR, secondary structure assignment as known from NMR structure; Pred.Prot., PredictProtein secondary structure prediction; PREDATOR 1, PREDATOR secondary structure prediction using MaxHom derived sequence data; PREDATOR 2, PREDATOR secondary structure prediction using specialized sequence data of b and b′ domain homologues.

* The work was supported by grants from the Biosciences and Biotechnology Science Research Council (Grant 96/C 13322) (to L. W. R., P. K., and R. B. F.), the Biocenter Oulu, the Farmos Science Foundation (a personal grant to A. P.), and the Wellcome Trust. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‖ Present address: Dept. of Biological Sciences, University of Warwick, Coventry CV4 7AL, UK.
** To whom correspondence should be addressed. E-mail: lloyd.ruddock@oulu.fi.

EXPERIMENTAL PROCEDURES
Secondary Structure Prediction-The sequence of mature human PDI was submitted to the PredictProtein server (16). PredictProtein uses the subprogram MaxHom (17) to perform a data base search for homologous sequences and to align those sequences into a multiple alignment file. The MaxHom-derived multiple alignment file contained 53 sequences, and this was used as an input to PredictProtein. The sequence data from the MaxHom-derived multiple alignment file were also used as an input for the secondary structure prediction program PREDATOR (18). The secondary structure assignments for the b and b′ domains of human PDI were also predicted with PREDATOR using a different input file that was specially created using the sequences of just the b and b′ domains as input for the generation of a multiple alignment file, which contained 35 sequence fragments.
Multiple Sequence Alignment and Homology Modeling of the PDI b′ Domain-The published alignments of Escherichia coli thioredoxin and the a domain of human PDI (8), of human thioredoxin and the b domain of PDI (9), and of the a and a′ domains of PDI (19) were combined to produce a multiple sequence alignment of E. coli and human thioredoxin and the a, b, and a′ domains of human PDI. In cases of conflicting data from different alignments, priority was given to the alignments from Creighton and co-workers (8,9), which are based on the structural data for PDI. The b′ domain of human PDI was added to this multiple sequence alignment after an initial alignment of the b and b′ domains using a hierarchical clustering algorithm (comparison table, BLOSUM62; gap opening penalty, 8; gap extension penalty, 1) (20). The resulting multiple sequence alignment was further refined considering experimentally derived and predicted secondary structure assignments. Homology modeling of the b′ domain was performed using version 4 of the homology modeling program MODELLER (21). From 10 constructed models, the one with the lowest value of the objective function was selected as the representative model.
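The final selection step can be illustrated with a minimal sketch. The objective-function values and file names below are invented placeholders, not output from MODELLER, and the helper name is hypothetical; the snippet only shows choosing the candidate with the lowest score among 10 models.

```python
# Hypothetical objective-function values for 10 candidate models.
# These numbers are placeholders for illustration, not real MODELLER output.
scores = [912.4, 887.1, 930.8, 901.3, 879.6, 905.0, 894.2, 921.7, 899.9, 910.5]
candidate_models = {
    f"model_{i:02d}.pdb": score for i, score in enumerate(scores, start=1)
}


def select_representative(models):
    """Return the (name, score) pair with the lowest objective-function value."""
    return min(models.items(), key=lambda item: item[1])


best_name, best_score = select_representative(candidate_models)
print(best_name, best_score)
```

A lower objective-function value indicates fewer violations of the restraints used to build the model, which is why the minimum is taken here.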
Generation of Expression Vectors-A gene insert for an expression vector for the isolated b′ domain (Pro-218–Gly-332) of PDI was generated by PCR from an existing vector encoding PDI (11), using primers that included an in-frame NdeI site 5′ to the first codon of the gene and a BamHI site after a TAA stop codon at the 3′-end. The insert was cloned into pLWRP51, a modified version of pET23b (Novagen) that encodes for an N-terminal His tag in-frame with the cloned gene. The resulting gene products included the sequence MHHHHHHM- prior to the first amino acid of the domain sequence. Site-directed mutagenesis was performed as recommended by the manufacturer using the QuikChange™ site-directed mutagenesis kit (Stratagene). All plasmids generated were sequenced to ensure that there were no errors in the cloned genes (see Table I for plasmid names).
Protein Expression and Purification-Protein production was carried out in E. coli strain BL21 (DE3) pLysS grown either in LB medium or in M9 minimal medium at 37°C, 200 rpm, and induced at an A600 of 0.3 for 3 h with 1 mM isopropyl β-D-thiogalactoside. Cell lysis and protein purification were carried out as per the purification of ERp18 (6). The concentration of each protein was determined spectrophotometrically using a calculated absorption coefficient, PDI b′ domain ( [...]

Cross-linking-Cell extracts from E. coli BL21 (DE3) pLysS were prepared by repeated freeze thawing. Bolton-Hunter 125I labeling of Δ-somatostatin (AGSKNFFWKTFTSS) or mastoparan (INLKALAALAKKIL) was performed as recommended by the manufacturer (Amersham Biosciences). Cross-linking was performed using the homobifunctional cross-linking reagent disuccinimidyl glutarate (Sigma) as described (13).
Biophysical Analysis-Far UV circular dichroism spectra were recorded on a Jasco J600 spectrophotometer. All scans were collected at 25°C as an average of eight scans, using a cell with a path length of 0.1 cm, a scan speed of 20 nm/min, a spectral bandwidth of 1.0 nm, and a time constant of 1 s. The maximal high tension voltage was 750 V. All spectra were corrected for the blank spectra with no protein added.
Fluorescence spectra were collected on a PerkinElmer Life Sciences LS50 spectrophotometer using a 1-ml cuvette. All scans were collected at 25°C as an average of four scans, with excitation at 280 nm, emission at 300–400 nm (or 300–350 nm for b′), slit widths of 5 nm, and a scan speed of 200 nm/min. All spectra were corrected for the blank spectra with no protein added. For the tryptophan-containing constructs, the fluorescence parameter examined to look at the effects of guanidinium chloride on protein structure was the ratio of the average fluorescence intensity of 2 nm to either side of the λmax for native protein to the average fluorescence intensity over the range 320–400 nm. This parameter was chosen as it is independent of concentration and less dependent on the direct effects of guanidinium chloride on tryptophan fluorescence.

RESULTS
Definition of the Boundaries of the b′ (Primary Substrate Binding) Domain of PDI-Previous theoretical [...] (7). This produces a b′ domain that is significantly longer than the homologous b domain. Based in part on partial proteolysis of bovine PDI (10), there has been a suggestion that a structured linker region exists between the b′ and a′ domains (2), but there has been no reported experimental confirmation of this.
To define the domain boundaries of the b′ domain of human PDI, a sequence alignment of the homologous b and b′ domains of human PDI was performed. This alignment was then refined using the results from secondary structure prediction using PredictProtein (16) and PREDATOR (18) to move gaps into loop regions and to align corresponding helix and strand regions without misaligning regions of good sequence similarity. The resulting alignment, together with observed and predicted secondary structure assignments, is shown in Fig. 1. It is clear from this alignment that the b′ domain defined previously (11) includes 19 residues at the C terminus that form an unaligned extension of the aligned thioredoxin-like section. This apparent linker region between the b′ and a′ domains we designate as x.
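As a quick consistency check on the numbering, the 19-residue extension can be compared with the residue ranges of the two expression constructs (Pro-218 to Gly-332 for the newly defined b′ domain and Lys-213 to Pro-351 for the original construct). The sketch below is only illustrative arithmetic; the variable and function names are our own.

```python
def span_length(first, last):
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


B_PRIME = (218, 332)    # Pro-218 to Gly-332, the newly defined b' domain
B_PRIME_X = (213, 351)  # Lys-213 to Pro-351, the original construct
TAG = "MHHHHHHM"        # N-terminal sequence added by the expression vector

b_prime_len = span_length(*B_PRIME)        # residues in b'
b_prime_x_len = span_length(*B_PRIME_X)    # residues in the original construct
x_len = B_PRIME_X[1] - B_PRIME[1]          # C-terminal extension, residues 333..351
n_term_extra = B_PRIME[0] - B_PRIME_X[0]   # extra N-terminal residues, 213..217
tagged_b_prime_len = len(TAG) + b_prime_len

print(b_prime_len, b_prime_x_len, x_len, n_term_extra, tagged_b_prime_len)
```

The C-terminal difference between the two constructs comes out at 19 residues, in agreement with the length of the unaligned extension seen in the alignment.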
To confirm the proposed domain boundaries, a PDI b′ construct (residues Pro-218–Gly-332 of mature human PDI) was made. This construct and the original construct (residues Lys-213–Pro-351 of mature human PDI, now termed b′x), with an N-terminal hexa-His tag to aid purification, could be expressed solubly in the cytoplasm of E. coli and purified to apparent homogeneity by a combination of immobilized metal affinity chromatography and anion exchange chromatography (data not shown). The ability to generate significant amounts of [...]

Characterization of the b′ Domain of PDI-To confirm that the new b′ domain construct is structured and retains the biological activity associated with it in the full-length protein, a variety of biophysical analyses were undertaken. Previously, the b′ domain of PDI was reported to include the primary substrate binding site of PDI, binding to small peptides and being essential for the binding of larger substrates, e.g. non-native proteins (13). Both the b′ and the b′x construct were tested for their ability to bind the 14-amino-acid test peptide Δ-somatostatin via the standard cross-linking assay using an E. coli lysate containing the recombinantly expressed constructs. Both the b′ and b′x constructs were able to bind Δ-somatostatin (Fig. 2A) and, as per full-length protein, this binding was reversible and dependent on hydrophobic interactions (Fig. 2, B and C).

The far UV CD spectra of both constructs showed that both b′ and b′x were well structured and contained a significant amount of α-helix and β-sheet. Since E. coli thioredoxin and the isolated b domain of PDI share the same α/β-thioredoxin-like fold, the CD spectra of these were compared with those of the b′ and b′x constructs and were found to contain similar features (Fig. 3A), suggesting that b′ also shares the thioredoxin fold.
The b′ construct under non-denaturing conditions showed a λmax of 304 nm, consistent with the fact that the protein contains 2 tyrosine residues and no tryptophans (Fig. 3B). Upon addition of 4 M guanidinium chloride, a significant decrease in fluorescence intensity was observed with no shift in λmax. The b′x construct contains one tryptophan, and fluorescence spectra of b′x, under non-denaturing conditions, gave a λmax of 338 nm (Fig. 3C), indicative of a tryptophan residue being in a hydrophobic environment. Upon addition of 6 M guanidinium chloride, the fluorescence spectra of b′x had a λmax of 356 nm, indicative of a denatured protein. Fluorescence-based denaturation curves for b′ and b′x showed a single phase transition from the native to denatured state. From the data presented in Fig. 3D, the midpoints for denaturation were 1.65 and 2.32 M for b′ and b′x, respectively.
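The ratio parameter used for the tryptophan-containing constructs (average intensity within 2 nm of the native λmax divided by the average intensity over 320–400 nm) can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian-shaped spectra are synthetic stand-ins for measured data, the 1-nm wavelength grid is arbitrary, and all function names are hypothetical.

```python
import math


def synthetic_spectrum(center, width=25.0, scale=1.0):
    """Synthetic emission spectrum on a 1-nm grid from 300 to 400 nm.

    A Gaussian band is an assumption made purely for illustration.
    """
    return {
        wl: scale * math.exp(-(((wl - center) / width) ** 2))
        for wl in range(300, 401)
    }


def mean_intensity(spectrum, lo, hi):
    """Average intensity over wavelengths lo..hi (inclusive)."""
    values = [i for wl, i in spectrum.items() if lo <= wl <= hi]
    return sum(values) / len(values)


def fluorescence_ratio(spectrum, native_lambda_max):
    """Mean intensity within 2 nm of the native peak over the 320-400 nm mean."""
    peak = mean_intensity(spectrum, native_lambda_max - 2, native_lambda_max + 2)
    broad = mean_intensity(spectrum, 320, 400)
    return peak / broad


native = synthetic_spectrum(338.0)      # peak position reported for native b'x
denatured = synthetic_spectrum(356.0)   # peak position reported in 6 M denaturant

print(fluorescence_ratio(native, 338), fluorescence_ratio(denatured, 338))
```

Because numerator and denominator scale together, multiplying every intensity by a constant leaves the ratio unchanged, which is the sense in which the parameter is independent of protein concentration.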
The b′ domain of PDI contains two thiol residues (Cys-295 and Cys-326) that may be involved in modulating the binding affinity depending on redox potential (see Ref. 22, but also see Ref. 23). An Ellman's assay on purified b′ and b′x under native and denaturing conditions indicated that both cysteines exist as free thiols and that both are relatively inaccessible to low molecular weight thiol-reactive reagents in the native state. This result was confirmed by examining the reaction of b′ with iodoacetate in the presence and absence of 4 M guanidinium chloride (data not shown). These data indicate that neither of the cysteine residues is likely to participate in substrate binding, nor are they likely to exchange between the dithiol and disulfide states upon the modulation of the redox potential.
Identifying the Primary Substrate Binding Site within the b′ Domain of PDI-Previous data (13) demonstrate that the b′ domain of PDI contains the primary substrate binding site of the enzyme, but there are no published data on the localization of the substrate binding site within this domain. Work in this area is hampered by the lack of a structure for any PDI or for any isolated PDI b′ domain. The structures of the a and b domains have been solved by NMR (8,9,24), and both show a thioredoxin fold. Since b′ is homologous to b, we sought to construct a model of b′ based on the known structure of b and to use this to identify the primary substrate binding site.
Although the NMR structure of the b domain of human PDI has been determined (9, 24), its coordinates had not been released at the time this modeling work was performed. Therefore, only the structures of E. coli thioredoxin (25), human thioredoxin (26), and the a domain of human PDI (8) could be used as template structures for homology modeling. These structures, combined with the sequence alignments for the individual domains of PDI, were used to build a model of the b′ domain of PDI using the homology modeling program MODELLER (21). Only the thioredoxin-like section of the b′-(218–334) domain was homology-modeled.
Unsurprisingly, the final modeled structure for the b′ domain was very similar to that of the b domain with an overall thioredoxin-like fold. The two cysteine residues that b′ contains are not surface-exposed in the model. The two sulfur atoms are 6.4 Å apart, and thus, they are unlikely to form a disulfide; this is consistent with the experimental data that showed a lack of reactivity of the cysteine residues to iodoacetamide in the native state.
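The geometric argument can be made explicit with a short sketch. The S–S bond length of about 2.05 Å is a textbook value that we assume here (it is not taken from the model), the tolerance is arbitrary, and the coordinates are hypothetical placeholders chosen only to reproduce the 6.4 Å separation.

```python
import math

TYPICAL_SS_BOND = 2.05  # Angstrom; textbook disulfide S-S bond length (assumption)
TOLERANCE = 0.5         # Angstrom; arbitrary allowance for model error (assumption)


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates in Angstrom."""
    return math.dist(atom_a, atom_b)


def could_form_disulfide(sg_a, sg_b, max_distance=TYPICAL_SS_BOND + TOLERANCE):
    """True if two sulfur atoms are close enough for a disulfide in a static model."""
    return distance(sg_a, sg_b) <= max_distance


# Hypothetical coordinates placed 6.4 Angstrom apart, as in the model.
sg_cys295 = (0.0, 0.0, 0.0)
sg_cys326 = (6.4, 0.0, 0.0)

print(distance(sg_cys295, sg_cys326), could_form_disulfide(sg_cys295, sg_cys326))
```

A separation roughly three times the bonded distance would require a substantial rearrangement of the domain before a disulfide could form, which is the reasoning behind the statement above.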
Although the specificity of substrate binding by PDI is not known, there are strong indications (13,14) that the recognition motifs are primarily hydrophobic in nature, and substrate binding can be inhibited by small molecular weight molecules such as 2-propylphenol. Furthermore, the binding specificity of the homologous protein PDIp has been reported (14,15), and since it binds single amino acid methyl esters of tyrosine and tryptophan, the substrate binding pocket must be relatively small.
Examination [...] no large distinct hydrophobic pockets. However, a small hydrophobic pocket could be identified close to where the active site would be in the a domain (comprising primarily the side chains of Leu-242, Leu-244, Phe-258, and Ile-272). Furthermore, a larger, more distinct pocket could be identified at the same position in a model of the b′ domain of PDIp.² Since the non-catalytic domains are thought to have arisen by gene duplication of catalytic domains, it is probable that the primary substrate binding site in the b′ domain would have arisen from the catalytic site and thus would be located in the same part of the thioredoxin fold. These two independent strands of evidence suggest the localization of the binding site to the same region of the b′ domain of PDI.
Defining the Primary Substrate Binding Site within the b′ Domain of PDI-To validate the localization of the binding site, a large number of mutations were made in the putative hydrophobic pocket and in spatially adjacent residues both in the isolated b′ domain and in full-length PDI. These mutants were expressed in E. coli. All but one of the mutants made in full-length PDI produced soluble proteins of the expected molecular size and equivalent yields. The exception was the P245A mutation, for which no expressed protein could be seen by SDS-PAGE, suggesting that this residue is structurally important. Even under optimal expression conditions, a significant proportion (~20%) of the wild type b′ domain was found in the insoluble fraction, and this fraction increased in some mutants to the extent that some showed no soluble expression. All of the solubly expressed mutants in PDI and in the b′ domain were screened for their ability to bind Δ-somatostatin. The results indicated that mutations in the putative hydrophobic pocket significantly decreased peptide binding (Fig. 4A and Table I), whereas mutations in many juxtaposed residues did not (Fig. 4B). Of special note was Ile-272, where all mutations made (I272W, I272A, I272Q, I272N, and I272L) significantly decreased peptide binding, with the greatest effect for any single point mutation being I272W. For Δ-somatostatin, a good correlation was observed between the binding properties of mutants in b′ and in full-length PDI, with the exception of F258W, which appeared to have a much greater effect in the isolated b′ domain. Furthermore, a correlation was seen between the ability of mutant proteins to bind Δ-somatostatin and to bind the unrelated 14-amino-acid peptide mastoparan (Fig. 4C), although the effects on binding were more pronounced for mastoparan, consistent with the lower affinity of wild type PDI for this substrate (27).
PDI not only binds small peptides but also binds non-native proteins such as "scrambled" RNase (scRNase). Previously, we have shown that the b′ domain is essential for this binding process, but that it is not sufficient, with other domains, especially a and a′, contributing to binding. The ability of the I272W mutant of full-length PDI to bind scRNase was tested by a well established cross-linking-based assay. The results (Fig. 4D) show that this mutant is still able to bind scRNase but with reduced affinity when compared with the wild type protein.
Previously, mutations in the a′ domain of PDI have been shown to decrease peptide binding (28), but this was due to destabilization of the a′ domain and the probable intramolecular binding of the partially unstructured domain by the adjacent b′ domain (29). To ensure that the effects on substrate binding observed here were not due to gross structural effects, wild type and mutant proteins were compared in a variety of biophysical analyses. Screening the mutant forms of full-length PDI expressed in E. coli for protease stability revealed that although some showed a decrease in stability (implying a structural change), others including I272W behaved as per wild type (see Fig. 5A for examples). The full-length wild type protein and I272W mutant were then purified to apparent homogeneity by a combination of immobilized metal affinity chromatography and anion exchange chromatography (data not shown). The far UV CD spectra of both proteins were essentially identical (Fig. 5B). Although the fluorescence spectra of the two proteins were different (due to the presence of an extra tryptophan in the mutant), the fluorescence-based guanidinium chloride stability curves were very similar (data not shown) and showed near identical midpoint values (1.35 M for wild type, cf. 1.36 M for the mutant). These analyses all indicate that the I272W mutant has essentially the same structure as the wild type protein.

TABLE I. Effects of mutations on Δ-somatostatin binding. pLWRP64 and pLWRP99 were generated previously (11), and the rest were generated for this study. H6 indicates an N-terminal hexa-histidine tag. The nature of the cross-linking assay is such that only qualitative results can be determined. +++, binding as per wild type; ++, small effect on binding; +, significant effect on binding; −, very significant effect on binding.
DISCUSSION
The ability of PDI to act as an efficient catalyst of disulfide bond formation in folding polypeptides is primarily dependent on three factors: the ability to catalyze disulfide-dithiol exchange, the ability to bind non-native proteins, and the ability to trigger conformational changes in the bound substrate to allow access to buried cysteine residues. To date, much of the focus has been on dithiol-disulfide exchange, which depends on the CXXC thioredoxin-like active site motif. This motif cycles through the disulfide, mixed disulfide, and dithiol states during many of the reactions that PDI catalyzes, and thus, both active site thiols are required for efficient catalysis of native disulfide bond formation, although some reactions may proceed with just the N-terminal thiol. The residues that lie between the two cysteines play a crucial role in determining the redox potential of the enzyme (for references, see Ref. 30) and thus influence the nature of the reactions catalyzed, e.g. oxidation, reduction, or isomerization. Other juxtaposed residues have also been implicated in modulating the redox potential and/or activity of the superfamily (see Ref. 30). Furthermore, we have recently demonstrated that the dynamical changes in the tertiary structure of the isolated catalytic a domain are required to complete the oxidative catalytic cycle (31).
The other two factors, the ability to bind non-native proteins and the ability to trigger conformational changes in the bound substrate, have received less attention. It is known that the b′ domain of PDI provides the principal peptide binding site of PDI but that the a and a′ domains also contribute to binding of misfolded proteins (13). Furthermore, although isomerization reactions require a linear combination of one catalytic domain plus b′, catalysis of isomerization reactions where disulfide bond rearrangement is linked to conformational changes in the protein substrate requires all of PDI excluding the c region (12).
Here we have defined more precisely the boundaries of the b′ domain, localized the primary substrate binding site within this domain, and identified residues contributing significantly to the binding of peptide ligands. The alignment of b′ against b (Fig. 1) and the modeling of the b′ structure both suggested that there is a region of 19 residues between the C terminus of the b′ domain proper and the N terminus of the a′ domain, a suggestion made previously on the basis of partial proteolysis data (10). We have now expressed this better defined b′ domain and shown that it constitutes a well defined domain, on the basis of expression yield and the spectroscopic properties and protease resistance of the purified recombinant domain. We have not characterized the "linker" or x region in any detail, but the properties of the b′x construct (e.g. the fluorescence denaturation titration and wavelength of the maximum fluorescence emission) support previous suggestions that x is a region of defined structure, not an exposed protease-sensitive loop.
Within the b′ domain as now more rigorously defined, the proposed ligand binding site is a small hydrophobic pocket, defined by the residues Leu-242, Leu-244, Phe-258, and Ile-272. Mutation of any of these residues influences peptide binding, with the greatest effect being seen for Ile-272. Surprisingly, substrate binding is very sensitive to mutations at this position, with even the highly conservative mutation I272L having an effect on peptide binding. Furthermore, substantive biophysical analysis of full-length PDI and the PDI I272W mutant could reveal no indications of any alterations in the structure or stability of the protein, indicating that the result is a direct effect on the substrate binding site and not an indirect structural effect. The effect on the binding of small peptides is much greater than the effect on the binding of non-native protein substrates (such as scRNase), consistent with the previous observation that the a and a′ domains also contribute to the binding of misfolded proteins (13).
The binding site in the b′ domain is located in a position homologous to the position of the active site in the catalytic domains. Specifically, Leu-242 occupies a position within the thioredoxin fold analogous to that of Glu-30 in the a domain, whereas Ile-272 occupies a position analogous to that of Lys-64 in the a domain. Glu-30 and Lys-64 form a salt bridge that is buried under the active site in the NMR structure. Glu-30 is analogous to Asp-26 in thioredoxin, which has been shown to be important in its catalytic ability (32,33). This is consistent with the idea that the multidomain structure of PDI arose through gene duplication events of catalytic domains (34).
The small size of the putative binding pocket is unsurprising since the specificity for substrate binding for the PDI family member PDIp has been reported to be a single amino acid (Tyr or Trp with no adjacent negative charge, see Ref. 14). Preliminary studies on the specificity of PDI suggest that again a single amino acid may provide the primary motif, but that contextually, the situation is more complex than for PDIp.³ It is likely that recognition of a non-native protein by this site arises both from binding a hydrophobic side chain and from hydrogen bonding with the substrate peptide backbone (a combination of the two of these would be diagnostic for an unfolded protein). Consistent with this hypothesis, the model indicates that there is significant unsatisfied hydrogen bonding potential around the binding site in the b′ domain, from both backbone and side chain atoms. Elucidation of the individual roles for each potential hydrogen bond donor/acceptor would require substantive quantitative comparison of binding by multiple substrates in multiple mutants and/or the solution of the NMR structure of a b′-peptide complex, both of which are beyond the scope of this publication.
A mechanistic understanding of the modes of action of the PDI family is still a long way off, but the identification and modulation of the substrate binding site opens up the possibility for new avenues of research. Understanding the mechanisms of action of the enzymes involved should allow for the rational manipulation of these systems to improve the efficiency of eukaryotic cells as factories for the production of high value secreted proteins of biotechnological or biomedical importance.