Comprehensive Analysis of the Factors Contributing to the Stability and Solubility of Autonomous Human VH Domains*

We report a comprehensive analysis of sequence features that allow for the production of autonomous human heavy chain variable (VH) domains that are stable and soluble in the absence of a light chain partner. Using combinatorial phage-displayed libraries and conventional biophysical methods, we analyzed the entire former light chain interface and the third complementarity determining region (CDR3). Unlike the monomeric variable domains of camelid heavy chain antibodies (VHH domains), in which autonomous behavior depends on interactions between the hydrophobic former light chain interface and CDR3, we find that the stability of many in vitro evolved VH domains is essentially independent of the CDR3 sequence and instead derives from mutations that increase the hydrophilicity of the former light chain interface by replacing exposed hydrophobic residues with structurally compatible hydrophilic substitutions. The engineered domains can be expressed recombinantly at high yield, are predominantly monomeric at high concentrations, unfold reversibly, and are even more thermostable than typical camelid VHH domains. Many of the stabilizing mutations are rare in natural VH and VHH domains and thus could not be predicted by studying natural sequences and structures. The results demonstrate that autonomous VH domains with structural properties beyond the scope of natural frameworks can be derived by using non-natural mutations, which differ from those found in camelid VHH domains. These findings should enable the development of libraries of synthetic VH domains with CDR3 diversities unconstrained by structural demands.

It was long believed that all antibodies are composed of heavy and light chains and consequently that the simplest antigen-binding unit is the variable fragment (Fv), a heterodimer of heavy and light chain variable domains (V H and V L , respectively) (1). This supposition has held true for most species, but members of the Camelidae family have proven to be notable exceptions (2,3). Camelids produce not only conventional antibodies but also "heavy chain antibodies" that are devoid of light chains (4). Consequently, the antigen-binding site of a heavy chain antibody is contained within a monomeric variable domain (V H H, for heavy chain variable domain of a heavy chain antibody), which has evolved to be autonomously stable in the absence of a light chain partner.
The simplicity of the V H H domain architecture has attracted interest from protein engineers concerned with developing novel antibodies for biotechnological and therapeutic applications. V H H domains are well suited for use as modular building blocks that can be linked together to assemble multivalent or bispecific antibodies (5). The third complementarity determining loop (CDR3) of camelid V H H domains is often unusually long, and it has been suggested that these protruding loops may be especially effective for targeting active site clefts (6) and cryptic viral epitopes (7). Furthermore, because of their small size, V H H domains are rapidly cleared in vivo and exhibit enhanced tumor penetration, and these properties may be useful for the delivery of toxins for therapy or radioisotopes for imaging (8 -11). Also, unlike most antibody fragments, thermodenaturation of most V H H domains is a fully reversible process, making them well suited for applications where transient heating may occur (12). Thus, considerable effort has been expended on elucidating the structural basis for the autonomous stability and solubility of V H H domains, with the ultimate goal of using these insights to engineer autonomous versions of conventional V H domains.
For a conventional V H domain, the major consequence of an absent light chain is the exposure of hydrophobic side chains that are normally buried within the interface. In addition, thermodynamic stability is usually compromised because the stability of most V H domains is augmented by contacts at the light chain interface (13). The elucidation of a number of camelid V H H domain structures has provided insights into how these challenges to structural integrity are accommodated (6, 14 -19). There are a number of hallmark sequence changes in V H H domains relative to conventional V H domains, which function to reduce the hydrophobicity of the former light chain interface. The most common and important changes, dubbed the "V H H tetrad," occur at four positions in the second framework region (2,3,20,21). In most cases, Glu-44 and Arg-45 are found in place of the less hydrophilic Gly-44 and Leu-45 of V H domains, and hydrophilicity at position 47 is also increased by substitution of Trp with smaller residues. The effect of the fourth change is less obvious from the primary sequence alone, as it involves replacement of Val-37 by a more hydrophobic Phe (or sometimes Tyr). However, structural analysis reveals that the Phe/Tyr residue at position 37 of V H H domains nucleates a small hydrophobic core, which also typically involves Tyr-91, Trp-103, the aliphatic portion of Arg-45, and hydrophobic residues in the CDR3 loop. The overall effect of these changes is to increase the hydrophilicity of the former light chain interface, either by direct substitution with more hydrophilic side chains or by sequestration of hydrophobic side chains from solvent by interactions between CDR3 and framework residues (6, 14 -16, 19).
Insights gained from the analysis of camelid V H H domains have been applied to efforts aimed at developing autonomous versions of human V H domains. Even prior to the elucidation of V H H domain three-dimensional structures, attempts were made to produce soluble V H domains by a "camelization" process, which involves transferring elements of the V H H tetrad to a human framework (22). It was found that the incorporation of three hydrophilic substitutions (G44E/L45R/W47G) improved solubility, but protein yields and thermodynamic stability were severely compromised. Substitution by Ile rather than Gly at position 47 improved yields and stability but reduced solubility somewhat. Nevertheless, the second variant was soluble enough to enable elucidation of the solution structure by NMR, albeit only in the presence of detergent (23,24). Frameworks of this type were used to construct phage-displayed libraries of random CDR3 sequences that yielded soluble V H domains with novel antigen specificities (25). The structure of the camelized V H domain revealed deformations in the framework β-sheet, which are likely responsible for reduced stability and protein yields (23). Subsequent attempts at camelization have also been only partially successful because it has been found that the substitutions of the V H H tetrad do not completely alleviate aggregation (26 -28). Thus, it was concluded that, although the camelization strategy yielded moderately stable autonomous V H domains, additional factors must be responsible for the stability and solubility of camelid V H H domains (23).
In retrospect, it appears that at least one of these additional factors is the requirement for a CDR3 loop that can supply stabilizing hydrophobic interactions with the framework. The importance of this additional requirement has been underscored by a study that showed that random CDR3 loops selected for stabilization of a llama V H H domain contained hydrophobic residues at key positions, and structural analysis showed that a selected CDR3 loop engaged in hydrophobic packing interactions that were similar to those of natural camelid CDR3 loops (29). Taken together, these results demonstrate a structural role for CDR3 in the stability and solubility of V H H domains, which places additional constraints on the CDR3 sequence. In practical terms, these constraints must be taken into account in the design of camelization strategies, and this limits the diversity of CDR3 sequences that are compatible with the camelid V H H framework.
Despite the importance of the V H H tetrad for stabilization of camelid V H H domains, several studies have discovered autonomous V H domains, from camelids and other species, which do not contain these hallmark sequences. Indeed, even prior to the discovery of camelid heavy chain antibodies, it was found that certain mouse V H domains can function autonomously (30). More recently, an unusual mouse V H domain containing a hydrophilic substitution at the former light chain interface (G44K) has been shown to exist as a stable monomer in solution (31). Surprisingly, a phage-displayed library derived from llama yielded seemingly conventional V H domains, which lacked elements of the V H H tetrad but nonetheless could be purified as stable and highly soluble monomers (32). Furthermore, phage-displayed selections have shown that human V H domain libraries contain a substantial number of autonomous V H domains with framework sequences typical of conventional antibodies (33,34). Taken together, these studies suggest that most vertebrate species likely contain some V H domains that can function autonomously. Although the high sequence diversity among these unusual subpopulations has made it difficult to discern general rules for autonomous V H domain stabilization (32-34), two structural studies have provided clues to the mechanisms whereby autonomous V H domains lacking the V H H tetrad are stabilized and solubilized (35,36).
The solution structure of a llama V H domain (BrucD4-4) is similar to those of V H H domains, as the CDR3 loop folds over Val-37 and thus reduces the hydrophobicity of the former light chain interface (36). However, this interface may be transient because of the conformational flexibility of CDR3, and overall, the hydrophobicity of the BrucD4-4 surface lies between that of conventional V H domains and autonomous V H H domains. The crystal structure of an anti-lysozyme human V H domain (HEL4) reveals that, although the domain is monomeric in solution, the protein crystallizes as a dimer (35). HEL4 was derived from a germ line framework by mutations restricted to the three CDR loops, and the structure revealed that a key change in CDR1 is partially responsible for the favorable solution properties of HEL4. It appears that the substitution of His-35 by Gly produces a cavity that serves to bury the side chain of Trp-47, which is flipped relative to its orientation in structures of V H /V L pairs, and this reorganization decreases the hydrophobicity of the former light chain interface. However, because the former light chain interface and CDR3 are involved in crystal packing interactions, it is unclear from the structure whether Trp-47 assumes the same conformation in solution and also whether or not the CDR3 loop plays a stabilizing role similar to that observed in V H H domains.
All of the studies reported to date have relied on insights gained from natural antibodies to elucidate features contributing to the autonomous nature of camelid V H H domains and some V H domains. Camelid V H H domains have been studied (4, 6, 14 -19), features of V H H domains have been engineered into V H frameworks (22-25, 37), or subpopulations of autonomous V H domains have been isolated from libraries of natural frameworks (32-36). These studies have revealed that, in general, the stability and solubility of these domains are dependent on CDR3, and this places significant constraints on the types of CDR3 loops that are compatible with autonomous structural integrity. This limitation poses considerable problems for the engineering of autonomous V H domains with novel functions, as CDR3 is the major contributor to most antigen-binding sites, and it has been well established that the success of in vitro antibody libraries depends on the use of highly diverse CDR3 loops (38 -42). Thus, for the purposes of antibody engineering, it would be most advantageous to use V H domain scaffolds that exhibit the autonomous behavior of camelid V H H domains but do so without reliance on the CDR3 loop.
Here we have gone beyond the limits of natural antibody diversity to explore the issue of autonomous V H domains from a strictly biophysical perspective. Using combinatorial selections followed up by conventional biophysical methods, we scanned the entire former light chain interface and CDR3 to assess the compatibility of different framework and CDR3 combinations with autonomous stability. With this comprehensive approach, we confirm that the evolution of autonomous V H domains from V H /V L pairs depends on mutations that are well tolerated structurally and increase the hydrophilicity of the former light chain interface. However, we find that many sequences that improve the stability and solubility of autonomous V H domains are rare in natural V H H and V H domains. Furthermore, we isolated numerous autonomous V H domains whose structural stability does not depend on the CDR3 loop. Thus, our findings demonstrate that autonomous V H domains with structural properties beyond the scope of natural frameworks can be derived by the incorporation of substitutions that are rare in nature but, nonetheless, are structurally compatible with the V H domain fold. These results open the door to the development of libraries of synthetic V H domains with CDR3 diversities unconstrained by structural demands.

EXPERIMENTAL PROCEDURES
Library Construction and Analysis-For the phage display of VH-4D5, a phagemid (pPAB43778) was constructed from a previously described phagemid (pS1602) (43) by inserting a DNA fragment encoding VH-4D5 in place of the region encoding human growth hormone. Phage-displayed libraries were constructed using methods described previously (29,44,45). For library A, NNS degenerate codons were used (N = A/G/C/T and S = G/C). For library B, degenerate mutagenic oligonucleotides contained ~70% of the wild-type nucleotide and 10% each of the three other nucleotides at each randomized position. For shotgun alanine scanning, codons were designed as described (46). For quantitative saturation scanning, NNK codons were used (K = G/T), and two separate libraries were constructed; one library randomized positions 35, 39, 45, and 50, and the other randomized positions 37, 44, 47, 91, and 105.
Library phage pools were cycled through rounds of binding selection against protein A (Sigma) immobilized on NUNC 96-well Maxisorp immunoplates, as described (29,44). Bound phage were eluted with 0.1 M HCl, and the eluant was neutralized with 1.0 M Tris base. Eluted phage were amplified in Escherichia coli XL1-blue (Stratagene) with the addition of M13-KO7 helper phage (New England Biolabs) and used for further rounds of selection. Following selection, individual clones were subjected to DNA sequence analysis, as described (29).
For shotgun alanine scanning, quantitative saturation scanning, and the analysis of library B, the sequences were analyzed with the program SGCOUNT as described (46 -48). SGCOUNT aligned the sequences, tabulated the occurrence of each amino acid at each position, and removed any duplicate clones with identical sequences at all mutated positions. For alanine scanning, ~100 clones were sequenced for each V H domain. For the analysis of library B, ~400 clones were sequenced after selection for binding to protein A, and ~200 clones were sequenced from the naive library to determine the nucleotide distribution prior to selection. For saturation scanning, ~200 clones were sequenced from each library, and the data were normalized for codon bias in the NNK degenerate codon (e.g. the NNK codon contains three unique codons for Arg, and thus, the occurrence of Arg was divided by 3).
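The codon-bias normalization described above can be illustrated with a minimal Python sketch. It is not the SGCOUNT program, whose implementation is not given here; the function names `degenerate_codon_counts` and `normalize_occurrences` and the example tallies are hypothetical. The sketch enumerates the codons encoded by an NN(G/T) degenerate codon under the standard genetic code and divides each observed amino acid count by the number of codons that encode it.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degenerate_codon_counts(third_base_options):
    """Count how many codons of the form N-N-(third base) encode each residue."""
    counts = {}
    for first, second, third in product("ACGT", "ACGT", third_base_options):
        aa = CODON_TABLE[first + second + third]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def normalize_occurrences(observed, codon_counts):
    """Divide each observed tally by the number of codons encoding that residue."""
    return {aa: n / codon_counts[aa] for aa, n in observed.items()}


NNK = degenerate_codon_counts("GT")  # K = G/T
NNS = degenerate_codon_counts("GC")  # S = G/C

# Hypothetical tallies at one position (illustrative numbers, not data from this study).
observed = {"R": 30, "W": 10}
normalized = normalize_occurrences(observed, NNK)
```

With these illustrative tallies, Arg (three NNK codons) and Trp (one NNK codon) end up with the same normalized occurrence, which is the correction the text describes.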
Protein Purification-To purify protein for biophysical analysis, phage display vectors were modified by the insertion of an amber stop codon between the sequence encoding the V H domain and the phage coat protein. V H domains were secreted in E. coli BL21 cells (Stratagene) by isopropyl 1-thio-β-D-galactopyranoside induction (0.4 mM) at 30°C for 3 h. Frozen cell pellets were resuspended in 25 mM Tris, 25 mM NaCl, 5 mM EDTA, pH 7.1, and cells were lysed with an Ultra-Turrax T8 homogenizer (IKA Labortechnik, Staufen, Germany) and an M-110F Microfluidizer processor (Microfluidics, MA). The lysate was centrifuged, and the supernatant was passed through a 0.2-μm filter and loaded onto a gravity flow protein A-Sepharose column (GE Healthcare). The column was washed with 10 bed volumes of 10 mM Tris, 1.0 mM EDTA, pH 8.0, and the protein was eluted with 0.1 M glycine, pH 3.0. The eluant was neutralized with 1.0 M Tris, pH 8.0, and protein concentration was determined by the method of Bradford (49).
Analytical Gel Filtration and Light Scattering Analysis-Purified V H domains were analyzed on a Superdex-75 column equilibrated with phosphate-buffered saline, 500 mM NaCl, pH 7.2, using an Agilent 1100 series high pressure liquid chromatography system (Agilent, Palo Alto, CA) in line with a Wyatt MiniDawn multiangle light scattering detector (Wyatt Technology, Santa Barbara, CA). The domains were injected at a concentration of ~70 μM (1.0 mg/ml) in a volume of 100 μl, and the flow rate was 0.5 ml/min. Concentration measurements were made using an on-line Wyatt OPTILAB DSP interferometric refractometer (Wyatt Technology). Astra software (Wyatt Technology) was used for light scattering data acquisition and processing. The light scattering unit and the refractometer were calibrated according to the manufacturer's instructions. A value of 0.185 ml/g was assumed for the dn/dc ratio of the protein. The detector responses were normalized by measuring the signal from monomeric bovine serum albumin (Sigma). The temperatures of the light scattering unit and the refractometer were maintained at 25 and 35°C, respectively. The column and all external connections were at ambient temperature.
Temperature-induced Denaturation-Thermal denaturation of V H domain protein was monitored at 207 nm with a J-810 spectrometer (Jasco, Easton, MD), using a 1-cm path length CD cuvette containing protein samples (10 μM) in phosphate-buffered saline, pH 7.2. The temperature was increased from 25 to 85°C (or higher, when necessary to achieve complete unfolding) in steps of 2°C. The reversibility of the temperature-induced denaturation was checked by cooling the sample to 25°C and repeating the heating program (12). The unfolding curves were assumed to be two-state transitions (50). The fraction folded (α) was calculated using the following equation,

\[ \alpha = \frac{\theta_T - \theta_U}{\theta_F - \theta_U} \]

where $\theta_T$ is the observed ellipticity at any temperature; $\theta_F$ is the ellipticity of the fully folded form of the protein, and $\theta_U$ is the ellipticity of the fully unfolded form of the protein (50). The raw ellipticity data of the folded and unfolded states show no temperature dependence outside of the transition zone, and therefore, $\theta_F$ and $\theta_U$ were considered to be the ellipticity value observed at 25°C and the lowest ellipticity value observed, respectively. The variation of α was plotted versus the temperature to obtain the melting curve for each V H domain. The melting temperature (T m ) is defined as the temperature at which α = 0.5. The fraction of refolded protein recovered following thermal denaturation was estimated as the α value for the sample cooled down to 25°C after heating to a temperature that induced complete unfolding.
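The two-state analysis described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in this study: the function names are hypothetical, the folded baseline is taken as the first reading (assumed to be the 25°C point), the unfolded baseline is taken as the lowest ellipticity observed, and T m is estimated by linear interpolation between the two readings that bracket a fraction folded of 0.5 (the interpolation scheme is an assumption).

```python
def fraction_folded(theta_t, theta_f, theta_u):
    """Fraction folded for a two-state transition: (theta_T - theta_U) / (theta_F - theta_U)."""
    return (theta_t - theta_u) / (theta_f - theta_u)


def melting_curve(thetas):
    """Convert raw ellipticities (ordered by increasing temperature) to fraction folded.

    Assumes the first reading is the fully folded baseline (25 C) and the
    lowest ellipticity observed is the fully unfolded baseline.
    """
    theta_f = thetas[0]
    theta_u = min(thetas)
    return [fraction_folded(theta, theta_f, theta_u) for theta in thetas]


def melting_temperature(temps, alphas):
    """Temperature at which the fraction folded crosses 0.5 (linear interpolation)."""
    points = list(zip(temps, alphas))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 >= 0.5 >= a1 and a0 != a1:
            return t0 + (a0 - 0.5) * (t1 - t0) / (a0 - a1)
    return None  # no crossing found in the measured range


# Hypothetical ellipticity readings (arbitrary units), for illustration only.
temps = [25, 45, 55, 65, 85]
thetas = [-10.0, -10.0, -12.0, -18.0, -20.0]
alphas = melting_curve(thetas)
tm = melting_temperature(temps, alphas)
```

The same `fraction_folded` function gives the fraction refolded when it is applied to the ellipticity measured at 25°C after the sample has been heated and cooled.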
Crystallization, Structure Determination, and Refinement-For crystallization, VH-B1a was purified by protein A-Sepharose chromatography, as described above, and the eluted protein (10 mg) was loaded on a Superdex™ HiLoad™ 16/60 gel filtration column (Amersham Biosciences) with 20 mM Tris, 500 mM NaCl, pH 7.5, as mobile phase at 0.5 ml/min. The protein was concentrated to 10 mg/ml, and crystals were grown at 19°C in sitting drops using the vapor diffusion method with 2.0-μl drops containing equal volumes of protein solution and crystallization buffer (1.1 M sodium malonate, pH 7.0, 0.1 M Hepes, pH 7.0, 0.5% (v/v) Jeffamine ED-2001, pH 7.0). Before data collection, crystals were dipped briefly into 2.0 M sodium malonate, pH 7.0, and flash-frozen in liquid nitrogen. Diffraction data were collected at the Stanford Synchrotron Radiation Laboratory (Stanford University) and processed using the programs Denzo and Scalepack (51).
The structure was solved by molecular replacement using the program Phaser (52) and the coordinates of the Herceptin Fab (53) (PDB code 1N8Z). The structure was refined using the program REFMAC (54). The model was manually adjusted using the program Coot (55). There was partial density for one or two Jeffamine molecules in the electron density maps, but its quality was poor in comparison with that of the protein, and these molecules were therefore not modeled. The coordinates and structure factors for the VH-B1a structure have been deposited in the RCSB Protein Data Bank (PDB code 3B9V).

Strategy for in Vitro Evolution and Analysis of Autonomous V H Domains-To explore the requirements for an autonomous V H domain structure, we chose to study the V H domain of the humanized monoclonal antibody 4D5 (56) (Fig. 1), which binds to the epidermal growth factor receptor family member ErbB2 (57-59) and has been approved for cancer therapy (56,60). Antibody V H segments have been classified into three subgroups on the basis of sequence homology (61), and the 4D5 V H domain (VH-4D5) and all camelid V H H domains belong to subgroup III (20,21). The 4D5 antibody has been extensively characterized both functionally and structurally, and structures of antibody fragments have been solved both uncomplexed (62) and complexed with antigen (53). Furthermore, the 4D5 antigen-binding fragment (42, 63) (Fab-4D5), single chain variable fragment (62, 64) (scFv-4D5), and V H domain (65, 66) (VH-4D5) can be produced in E. coli, and the Fab and scFv forms have been used as scaffolds for the construction of phage-displayed antibody libraries (42,64). Importantly, VH-4D5 binds to protein A through interactions that require a native tertiary structure and are mediated entirely by framework regions on the face opposite the light chain interface (66 -69), and this property can be used for facile selection and purification of correctly folded V H domains (29).
Using VH-4D5 as a starting point, we designed experiments to comprehensively explore CDR3 and the framework for features compatible with autonomous V H domains. We first used phage display to assess the effects of many different combinatorial mutations, as this method allows for the rapid sorting of extremely diverse protein populations. To enrich for stable phage-displayed V H domains, we took advantage of the interaction with protein A, as it has been shown that binding of a phage-displayed antibody fragment to this ligand is highly correlated with the intrinsic stability of the free protein (29,47,70). Phage-displayed libraries were used to not only select autonomous V H domains from naive libraries but also to rapidly analyze particular variants in detail using quantitative saturation scanning (48) and shotgun alanine-scanning methods (46).
The protein A selection strategy allowed us to rapidly enrich libraries for those members capable of adopting a stable fold in a phage-displayed format, but our ultimate goal was to obtain autonomous V H domains that would be stable and monomeric at high concentrations. Thus, we next purified selected V H domains and screened the proteins individually for favorable behavior using biophysical methods that have been applied previously to the study of V H H and V H domains (12,29). These screens assessed whether each V H domain 1) could be purified in high yield in a correctly folded form, 2) existed as a monomer at high concentrations as evidenced by gel filtration and light scattering analysis, and 3) exhibited reversible folding behavior and high thermostability in temperature-induced denaturation experiments monitored by CD.
First Generation Library for Selection of Autonomous V H Domains-To select for mutations that stabilize the VH-4D5 domain, we designed a phage-displayed library that targeted for randomization a set of residues that have been implicated in the stabilization of camelid V H H domains and an autonomous human V H domain (Fig. 1). Because camelid V H H domains are stabilized by interactions between CDR3 and the former light chain interface (6, 14 -16, 19), we replaced residues 93-102 with random loops of all possible lengths varying from 7 to 17 residues. In addition, we randomized three framework positions that are involved in the V H H tetrad and are occupied by hydrophobic residues in the VH-4D5 sequence (Val-37, Leu-45, and Trp-47). We did not randomize the fourth position of the V H H tetrad because VH-4D5 contains a hydrophilic Gly-44 at this position. We also targeted position 35 in CDR1, as a Gly residue at this position has been implicated in the stabilization of the human V H domain HEL4 (35). We used degenerate codons that encode for all 20 natural amino acids in order to sample all possible sequence combinations and to allow all the variants to compete among each other in a single pool.
The constructed library (library A) contained a total diversity of 2 × 10¹⁰ unique members, and the phage pool was cycled through six rounds of selection for binding to protein A. Sequencing of 57 clones revealed 25 unique sequences, which were aligned and inspected for conserved sequence motifs (Fig. 2A). Among the CDR3 loops, there was no obvious consensus in terms of either length or sequence, aside from a modest preference for small residues at positions 93 and 94. There is no bias in favor of hydrophobic sequences, and this contrasts with the results of a previous study in which the CDR3 loop was randomized in the context of a fixed V H H framework and revealed significant bias for hydrophobic residues that were shown to be involved in stabilizing interactions with the framework (29,47). At positions 37 and 45, although the sequences were quite variable, the general hydrophobic character of the wild-type residues was conserved. Interestingly, most of the sequences at position 35 contain small residues, which differ markedly from the wild-type His and instead resemble the Gly residue of the HEL4 V H domain (35). Position 47 is occupied by a hydrophobic residue, a charged residue, or a small residue. The positions involved in the V H H tetrad are not biased in favor of camelid-like substitutions, and taken together with the lack of a hydrophobic bias in the CDR3 sequences, these results suggest that most of the selected domains are not stabilized by the CDR3/framework interactions typical of V H H domains. Instead, it appears that increased hydrophilicity may be achieved by interactions between small residues at position 35 and hydrophobic residues at position 47, as observed in the structure of HEL4 (35) or, alternatively, by replacement of the hydrophobic Trp-47 by a charged or small residue.
Biophysical Analysis of Autonomous V H Domains-We purified 6 of the 25 selected V H domains from E. coli using protein A affinity chromatography and compared the protein yields to that of VH-4D5 (Fig. 2A). The protein yields were all significantly higher than that of VH-4D5, and the proteins were further characterized by gel filtration and temperature-induced denaturation.
Analytical gel filtration was used to profile the quaternary structure of the V H domains (Fig. 3), and for those which exhibited distinct peaks, the molecular masses were estimated by light scattering (Table 1). VH-4D5 and two of the six V H domain variants (VH-A2 and VH-A6) eluted in several overlapping peaks and light scattering indicated significant aggregation (Fig. 3A). In contrast, the other four V H domains (VH-A1, VH-A3, VH-A4, and VH-A5) eluted mainly as single peaks with elution times close to that expected for a monomer (Fig. 3B), and weight average molar masses estimated by light scattering were in good agreement with the formula molecular masses. In addition to the major monomeric peak (peak 1), there were two minor peaks. One of the minor peaks (peak 1a) eluted close to peak 1 and exhibited a weight average molecular mass that also corresponded to that of a monomer, and we speculate that this peak may represent a monomeric form of the protein lacking a disulfide bond. The second minor peak (peak 2) exhibited a weight average molecular mass that is in good agreement with the calculated mass of a dimer. The monomer and dimer peaks were clearly separable, and the two forms did not interconvert when the peaks were purified and reinjected on the gel filtration column (data not shown). Thus, it is unlikely that dimerization is caused by transient interactions involving the former light chain interface, and instead, we speculate that the dimeric form may result from the swapping of β-strands between two monomers, as has been observed previously for a crystallized V H H domain (71). By comparing the areas under peaks 1 and 2 to the total peak area, we could estimate the fraction of the total injected protein sample that eluted as a monomer (peak 1) or dimer (peak 2), respectively, and for each of the four nonaggregating domains, the monomeric fraction exceeded 85% (Table 1).
We studied the temperature-induced denaturation properties of the proteins by CD to determine the temperature of unfolding transition (T m ) and to ascertain whether the folding pathways were reversible. In agreement with previous studies (72), Fab-4D5 exhibited high thermostability (T m = 83°C), but the unfolding was irreversible (Fig. 4A). The unfolding of VH-4D5 was also irreversible, and thermostability was reduced substantially (T m = 58°C), relative to that of Fab-4D5 (Fig. 4B). Notably, all six V H domain variants exhibited significant improvements in thermostability, relative to that of VH-4D5, and the melting temperatures of the most stable variants approached that of Fab-4D5 (Table 1). Furthermore, the four variants that behaved as monomers by gel filtration exhibited essentially reversible folding behavior (Fig. 4C).
Taken together, these results show that the phage display selection for binding to protein A selected for domains with improved thermostability, and also enriched for domains with a reduced tendency to aggregate, as evidenced by reversible folding behavior and monomeric gel filtration profiles. Thus, these results validate the use of the protein A selection as a rapid filter that enriches for autonomous V H domains. By following up with biophysical screening of selected variants, we were able to identify several V H domains that are highly stable and predominantly monomeric. It appears that monomeric quaternary structure is strongly correlated with reversible folding but less so with thermodynamic stability, as the nonmonomeric variants exhibited high thermostability but only partially reversible folding, whereas the monomeric variants exhibited both high thermostability and reversible folding.
The four monomeric V H domains contain small residues at position 35, but differ significantly in the nature of the residue at position 47 (Fig. 2A). It is possible that the variants VH-A1 and VH-A3 are stabilized by structural mechanisms similar to those observed for the HEL4 V H domain (35), as all three proteins contain a small residue at position 35 and a hydrophobic residue at position 47. However, this is clearly not the only structural solution involving these positions, as the other two variants (VH-A4 and VH-A5) contain Ser at position 35 and Ser or Glu at position 47.
Second Generation Library for the Selection of Autonomous V H Domains-We next investigated whether the stability and quaternary structure of autonomous V H domains could be improved by targeting additional positions for randomization. For this purpose, we chose VH-A1 as the template, as this domain exhibits desirable characteristics in terms of protein yield, quaternary structure, thermostability, and folding behavior. We examined the structure of the Fv-4D5 heterodimer (64) and identified all residues involved in the light chain interface, which we defined as residues that lose greater than 20 Å² of solvent-accessible surface area because of contacts with the light chain. This group consisted of nine positions, including three that were randomized in library A (37, 45, and 47) and six others (39, 44, 50, 91, 103, and 105) (Fig. 1B). Because the residues at these positions are likely to become more solvent-accessible in the absence of the V L domain, we reasoned that mutations at these sites may improve the stability and/or solubility of autonomous V H domains. We designed a second generation library (library B) in which these nine positions and position 35 were simultaneously targeted for diversification. CDR3 was also targeted, but the length was fixed as that of the CDR3 of VH-A1.

[Figure legend fragment: ... Table 1. See "Experimental Procedures" for further details.]

FEBRUARY 8, 2008 • VOLUME 283 • NUMBER 6

Autonomous Human V H Domains
Because we targeted a large number of positions and were attempting to fine-tune a V H domain that already exhibited many of the characteristics we desired, we adopted a "soft randomization" approach. At each randomized position, library B was designed to contain roughly 50% wild-type sequence and 50% random sampling of other amino acids. Library B contained a total diversity of 7.5 × 10⁹ unique members and was cycled through six rounds of selection for binding to protein A.
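The consequence of soft randomization for the naive library can be sketched as a binomial calculation. This is an idealized model that assumes each of the 10 framework positions mutates independently with probability exactly 0.5 and ignores CDR3; the function name `mutation_count_distribution` is introduced here for illustration only.

```python
from math import comb


def mutation_count_distribution(n_positions, p_mutant=0.5):
    """Probability of k mutated positions for k = 0..n_positions (binomial model)."""
    return [
        comb(n_positions, k) * p_mutant**k * (1 - p_mutant) ** (n_positions - k)
        for k in range(n_positions + 1)
    ]


# Ten framework positions were soft-randomized in library B (nine interface
# positions plus position 35); the 0.5 mutation rate is the nominal design value.
dist = mutation_count_distribution(10)
expected_mutations = sum(k * p for k, p in enumerate(dist))

print(f"fraction of fully parental frameworks: {dist[0]:.5f}")
print(f"expected number of mutated positions: {expected_mutations:.1f}")
```

Under these assumptions only about one clone in a thousand retains the parental residue at all 10 positions, so the library samples mainly the sequence neighborhood of VH-A1 rather than VH-A1 itself.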
Sequence alignment of 393 clones revealed 384 unique sequences, which were aligned to determine the distribution of amino acids at each mutagenized position. We also calculated the amino acid frequencies at each position within the naive library (prior to selection for binding to protein A), and we compared these to the observed frequencies among the selected clones (Fig. 5). In this way, we identified statistically significant deviations indicative of positive selective pressure for certain amino acid types at particular positions. Outside of the CDR3 loop, the parental VH-A1 sequence was significantly enriched at most positions. Most notably, Gly was strongly conserved at position 35, suggesting that the mutation H35G contributes to the stabilization of the autonomous V H domain relative to VH-4D5. However, the parental Trp residue was depleted in favor of Arg at position 103, suggesting that replacement of Trp-103 with Arg may further improve stability.
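The comparison of selected and naive frequencies can be sketched as follows. The statistical test is not specified in the text, so the normal approximation to the binomial used here is an assumption, as are the example count (300) and the naive frequency (0.5, the nominal soft randomization value); only the total of 384 unique sequences comes from the text.

```python
from math import sqrt


def enrichment(selected_count, selected_total, naive_freq):
    """Return (enrichment ratio, z-score) for one amino acid at one position.

    The z-score uses a normal approximation to the binomial under the null
    hypothesis that selection leaves the naive frequency unchanged.
    """
    observed = selected_count / selected_total
    ratio = observed / naive_freq
    std_error = sqrt(naive_freq * (1 - naive_freq) / selected_total)
    z_score = (observed - naive_freq) / std_error
    return ratio, z_score


# Hypothetical example: a parental residue seen in 300 of 384 unique selected
# sequences, against an assumed naive frequency of 0.5.
ratio, z_score = enrichment(300, 384, 0.5)
print(f"enrichment ratio = {ratio:.2f}, z = {z_score:.1f}")
```

A ratio above 1 with a large positive z-score would indicate positive selection for that residue, whereas a ratio below 1 would indicate depletion, as described for Trp at position 103.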
Biophysical Analysis of a Second Generation V H Domain-We measured the protein yields for 40 V H domains selected from the second generation library (Fig. 2B). Yields for most domains were significantly greater than the yield for VH-4D5, but only one domain (VH-B1) exhibited a yield greater than that of VH-A1. These results suggest that we have reached the limits for protein yield in our secretion vector system, as extensive screening of the second generation population only resulted in modest improvements relative to the best first generation variant. We speculate that the high thermostability and reversible folding characteristics of the optimized V H domains may allow for correct folding of essentially all of the secreted polypeptides, and thus, the rate-limiting step for increased yield may now be translation and/or secretion, rather than folding. We subjected VH-B1 to analytical gel filtration and found the protein to be essentially monomeric (Fig. 3C). We also monitored the thermal denaturation behavior of VH-B1, which exhibited essentially reversible folding behavior and thermostability identical to that of the first generation parent VH-A1 (T m = 73°C) (Fig. 4D and Table 1).

[Table footnotes] ... (Fig. 3) were determined by light scattering. b The fraction refolded was defined as the fraction folded value (α) for the protein at 25°C, following heating to a temperature that resulted in complete unfolding. c NA indicates that the gel filtration data could not be analyzed because distinct peaks could not be discerned.
Shotgun Alanine-scanning Analysis of CDR3 Loops-As the CDR3 loops of camelid V H domains are involved in stabilization of the protein fold, we investigated whether this is also the case for our in vitro evolved V H domains. To facilitate the rapid analysis of many domains, we used an efficient shotgun alanine-scanning method that was previously validated for the analysis of several camelid V H domains (29). For each domain, we constructed a library that allowed each CDR3 residue to vary as the wild type or Ala with equal frequency. Following two rounds of selection for binding to protein A, ~100 clones were sequenced, and the WT/Ala ratio was determined for each scanned position. The WT/Ala ratio correlates with the contributions of each wild-type side chain to stability, with WT/Ala ratios greater than or less than 1 indicating side chains that stabilize or destabilize the protein fold, respectively.
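The WT/Ala ratio described above can be computed from sequenced clones as in the following minimal sketch. The function name, the three-residue loop, and the clone sequences are hypothetical. Two conventions are assumptions of this sketch rather than details taken from the text: positions where no Ala is observed are reported with the wild-type count as a lower limit, and positions whose wild-type residue is already Ala are skipped.

```python
def wt_ala_ratios(wild_type, clones):
    """Return one entry per CDR3 position: (WT/Ala ratio, is_lower_limit) or None.

    `wild_type` is the parental CDR3 sequence and `clones` is a list of selected
    CDR3 sequences of the same length, each position being wild type or Ala.
    """
    results = []
    for i, wt in enumerate(wild_type):
        if wt == "A":
            # A wild-type Ala cannot be scanned against itself.
            results.append(None)
            continue
        n_wt = sum(1 for clone in clones if clone[i] == wt)
        n_ala = sum(1 for clone in clones if clone[i] == "A")
        if n_ala == 0:
            # No Ala observed: treat the wild-type count as a lower limit.
            results.append((float(n_wt), True))
        else:
            results.append((n_wt / n_ala, False))
    return results


# Hypothetical three-residue loop and four selected clones.
example = wt_ala_ratios("GWS", ["GWS", "GWA", "AWS", "GWS"])
for position, entry in enumerate(example):
    print(position, entry)
```

In this toy data set the central Trp is never replaced by Ala and is therefore flagged as a lower limit, which is the situation marked by an asterisk in the scanning figure.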

JOURNAL OF BIOLOGICAL CHEMISTRY 3647
We scanned 10 variants from the second generation library and also the parental domain from the first generation library (VH-A1). For many of the domains, including VH-A1 and VH-B1, stability is essentially independent of the CDR3 loops, as the WT/Ala ratios were close to 1 at all positions (Fig. 6A). Furthermore, Ala was actually preferred over larger wild-type side chains in the central region of the loops (positions 98-101), suggesting that small Ala side chains at these positions stabilize the fold, perhaps by favoring a turn conformation. However, for several domains, stability does appear to depend significantly upon the CDR3 loop, and in one case (VH-B9), almost half of the residues within CDR3 exhibit WT/Ala ratios significantly greater than 1 (Fig. 6B).
For comparison, we also alanine-scanned the CDR3 loop of an autonomous camelid V H H domain (anti-HCG), in which a Trp residue within CDR3 (Trp-100) packs against framework residues (Phe-37 and Arg-45) and sequesters the hydrophobic portions of the former light chain interface from the solvent environment (15,29). Two of the seven scanned residues were intolerant to Ala substitutions (Fig. 6C); Trp-100 was completely conserved, and Gly-98 was highly conserved. These results confirm the importance of CDR3 for the stability of the camelid V H H domain and are in good agreement with previous results, which showed that four different CDR3 loops selected for stabilization of the same camelid framework also contained hydrophobic residues that were intolerant to substitution (29). For the anti-HCG V H H domain, it appears that Trp-100 is required to provide packing interactions with the framework and Gly-98, which is located at the apex of a turn, is required for a conformation that provides the correct orientation of CDR3 relative to the framework. Interestingly, examination of the CDR3 sequence of VH-B9 (Fig. 2B) suggests that the residues that are resistant to Ala substitution may stabilize the domain through a similar mechanism because it is possible that Ile-101 and/or Trp-103 provides hydrophobic packing interactions similar to those of Trp-100 in the camelid V H H domain, and Gly-99 and Pro-100 may be required for a turn conformation.
Taken together, these results demonstrate that, like camelid V H H domains, the stabilities of some of our evolved V H domains are dependent upon particular residues within the CDR3 loops. However, these evolved domains are only moderately dependent on CDR3 because in no case is the wild type more than 10-fold preferred over Ala (Fig. 6B), whereas the camelid V H H domains examined here (Fig. 6C) and elsewhere (29) are highly dependent on hydrophobic CDR3 sequences. Furthermore, the stabilities of most of the selected domains seem to be independent of the CDR3 loop, and this latter class includes VH-A1 and VH-B1, the two domains that exhibit the most desirable characteristics for autonomous V H domains (Fig. 6A).
Quantitative Saturation Scanning Analysis of Former Light Chain Interfaces-Having ascertained that the stability of the VH-B1 domain is independent of the CDR3 loop sequence, we next investigated whether residues in the former light chain interface contribute to stability. In total, 10 positions outside CDR3 were randomized in the two-stage process that produced VH-B1 from VH-4D5. We conducted a quantitative saturation scan (48) of nine of these positions to assess the tolerance to all possible point mutations (position 103 was not scanned) (Fig. 7). At most scanned positions, modest biases are observed, and the parental sequence is among the most prevalent sequences, but no position is dominated by any single residue type. These results suggest that the stabilization of VH-B1 relative to VH-4D5 was achieved by cumulative effects arising from contributions of multiple mutations. It appears that most of the residues can be substituted by smaller and/or more hydrophilic amino acids, and notably, the small hydrophilic amino acids Ser and Thr are tolerated at all positions. Thus, it may be possible to further improve the solubility and monomeric state of VH-B1 by additional point mutations that improve the hydrophilicity of the former light chain interface without compromising stability (see below). Together with alanine-scanning analysis of CDR3, these results indicate that the stability of VH-B1 does not depend on interactions between the CDR3 loop and residues in the former light chain interface.
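The notion that a position is or is not "dominated" by a single residue type can be made concrete with a simple frequency profile. The sketch below uses raw frequencies only and does not reproduce the corrections of the published quantitative saturation scanning method (48); the residue strings and the 0.5 dominance cutoff are hypothetical choices made for illustration.

```python
from collections import Counter


def position_profile(residues):
    """Return {amino acid: fraction} for the residues observed at one position."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


def is_dominated(profile, cutoff=0.5):
    """True if a single amino acid accounts for at least `cutoff` of the clones."""
    return max(profile.values()) >= cutoff


# Hypothetical residues observed at one position among ten selected clones.
tolerant_position = position_profile("WWWLLVTSSE")
conserved_position = position_profile("FFFFFFFFWW")

print(tolerant_position, is_dominated(tolerant_position))
print(conserved_position, is_dominated(conserved_position))
```

In this toy example the first position shows a modest bias toward the parental residue without being dominated by it, whereas the second shows strong conservation.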
For comparison, we also conducted a saturation scan of four positions within the former light chain interface of the anti-HCG V H H domain (Fig. 7). We randomized Phe-37 and Arg-45, which pack with Trp-100 from CDR3 (see above), to form a small hydrophobic core and also two small residues (Ser-47 and Thr-91), which are close to Trp-100 and may accommodate hydrophobic substitutions that could expand upon this core. The results contrast with those for the VH-B1 domain, as there is strong sequence conservation at all four positions, suggesting that the stability of the camelid V H H domain is dependent upon particular residues at these positions. At position 37, the wild-type Phe was dominant or was replaced by Trp. Similarly, at position 45 the wild-type Arg was dominant or was replaced by hydrophobic residues. Strikingly, the wild-type Thr at position 91 was almost entirely replaced by hydrophobic residues, and at position 47, although the wild-type Ser was quite abundant, the dominant amino acid was Trp and other hydrophobic residues were also prevalent. It is noteworthy that the sequence of the anti-HCG V H H domain is unusual at positions 47 and 91 because these positions are occupied by hydrophobic residues in most V H H domain sequences (Fig. 7). Taken together, these results confirm that the packing interactions between Trp-100, Phe-37, and the aliphatic portion of Arg-45 are important for stabilizing the anti-HCG V H H domain fold and furthermore suggest that the introduction of hydrophobic residues at positions 47 and 91 may lead to further stabilization. Thus, in contrast with VH-B1, the stability of the camelid V H H domain depends on hydrophobic residues in the former light chain interface and the CDR3 loop.

FIGURE 6. Shotgun alanine-scanning analysis of CDR3 loops. The WT/Ala ratios for each residue in CDR3 following selection for binding to protein A are shown for V H domains whose stability appears to be independent of CDR3 (A), V H domains whose stability appears to be dependent on CDR3 (B), and the camelid V H H domain α-HCG (C). The asterisk indicates that no Ala substitutions were observed, and thus the WT/Ala ratio is a lower limit.
The Crystal Structure of an Autonomous Version of VH-4D5-Autonomous V H domains such as VH-A1 and VH-B1 may be ideal scaffolds for the construction of synthetic antibody libraries, as they appear well adapted to support highly diverse CDR3 loops. To further test this supposition, we altered CDR3 and surrounding residues of VH-B1 (positions 93-103) to resemble the sequence of VH-4D5. The resulting VH domain (VH-B1a) is identical to VH-4D5 aside from four mutations that were accumulated during the selection process (H35G, Q39R, L45E, and R50S) and one mutation preceding CDR3 (S93A). VH-B1a does not bind to Her2 (data not shown), but this result was expected because the 4D5 light chain makes contact with antigen and helps to support the conformation of the heavy chain (53). Also, Arg-50 in VH-4D5 is important for antigen recognition (53,73), and substitution by Ser in VH-B1a may be detrimental for binding. However, unlike VH-4D5, VH-B1a is well expressed in E. coli, exhibits reversible folding kinetics upon thermal denaturation, and is more thermostable (T m = 79°C) than even VH-B1 (Table 2). Furthermore, the protein elutes predominantly as a monomer in gel filtration experiments, although the elution time is increased in comparison with that of VH-B1, suggesting some interaction with the column matrix. These results demonstrate that the CDR3 loop does not contribute significantly to the stability of the VH-B1 fold, as the entire loop can be replaced without significantly affecting protein yield, oligomeric state, thermostability, or folding.
To gain insights into the structural basis for the autonomous behavior of VH-B1a, the crystal structure was solved and refined at 1.8 Å resolution with R work and R free of 16.0 and 19.9%, respectively (Table 3). The structure consists of four molecules per asymmetric unit, and unlike the HEL4 structure (35), the crystal packing contacts do not resemble the interactions observed in the interfaces of V H /V L pairs. Excluding the CDR3 loop, superimposition of the four independent variable domains and VH-4D5 reveals essentially identical Cα traces with root mean square deviations of 0.6 or 0.8 Å in comparing VH-B1a monomers to other VH-B1a monomers or VH-4D5, respectively (Fig. 8A). However, there are significant differences in the conformations of the CDR3 loops, which vary even among the VH-B1a structures, indicating that the loop is flexible and may adopt multiple conformations in solution.
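For reference, the root mean square deviation quoted above is the square root of the mean squared distance between paired Cα atoms. The sketch below assumes that the two coordinate sets have already been optimally superimposed and paired atom by atom (the superposition step itself is not shown); the toy coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(squared_sum / len(coords_a))


# Hypothetical pre-superimposed coordinates (Å) for two paired atoms.
trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(trace_a, trace_b))
```

Excluding a flexible region such as CDR3 before the calculation, as was done in the comparison above, prevents a few highly mobile atoms from dominating the average.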
Considering the four mutations outside CDR3 that were accumulated during the evolution of VH-B1a from VH-4D5 (Fig. 8B), it appears that the changes at positions 39 and 45 (Q39R/L45E) increase the solubility of VH-B1a because of the increased hydrophilicity of the side chains. In the case of the substitution of Arg by Ser at position 50, it is not clear if and how the change improves solubility because Arg is actually more hydrophilic than Ser. At position 35, the substitution of Gly for His creates a cavity that is occupied by Trp-95 from CDR3 in all four independent VH-B1a structures, despite the considerable differences in the CDR3 loop conformations. Consequently, compared with VH-4D5, the CDR3 loop of VH-B1a moves closer to the framework, but Trp-47 is exposed to solvent in both structures. These findings contrast with those for the HEL4 V H domain, where Trp-47 was found in a cavity similar to that occupied by Trp-95 in VH-B1a, and it was concluded that the sequestration of the hydrophobic Trp-47 side chain was in part responsible for increasing solubility (35).

[Figure legend fragment] ... (Table 2). Yellow or blue shading indicates sequences that are abundant (>5%) or rare (<0.5%), respectively, in human/mouse V H domains (top row) or llama V H H domains (bottom row) (20).
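The hydrophilicity argument for the four framework mutations can be checked against a standard hydropathy scale. The sketch below uses Kyte-Doolittle hydropathy values, which is a choice made here for illustration and is not stated to be the scale used in the analysis; on this scale, a negative change means the substituted side chain is more hydrophilic.

```python
# Kyte-Doolittle hydropathy values for the residues involved in the four
# framework mutations (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "H": -3.2, "G": -0.4, "Q": -3.5, "R": -4.5,
    "L": 3.8, "E": -3.5, "S": -0.8,
}

FRAMEWORK_MUTATIONS = ["H35G", "Q39R", "L45E", "R50S"]


def hydropathy_change(mutation, scale=KYTE_DOOLITTLE):
    """Return hydropathy(new) - hydropathy(old) for a mutation such as 'L45E'."""
    old_residue, new_residue = mutation[0], mutation[-1]
    return scale[new_residue] - scale[old_residue]


for mutation in FRAMEWORK_MUTATIONS:
    print(mutation, round(hydropathy_change(mutation), 1))
```

On this scale Q39R and L45E give negative changes (more hydrophilic), with L45E by far the largest, whereas R50S gives a positive change, consistent with the observation that Arg is more hydrophilic than Ser.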
Increasing Hydrophilicity at the Former Light Chain Interface by Rational Design-Using the results of saturation scanning (Fig. 7) and structural analysis (Fig. 8), we analyzed a panel of VH-B1a mutants designed to explore and further optimize autonomous V H domain behavior (Table 2). Because Trp-47 is exposed to solvent in the crystal structure, we reduced hydrophobicity at this position with substitutions that were abundant in the saturation scanning data set (VH-B1b-f). All five substitutions reduced gel filtration elution times and had only minimal effects on thermostability, which was reduced or increased slightly by hydrophilic (Thr/Glu) or aliphatic (Leu/Val) substitutions, respectively.
We also analyzed the effects of hydrophilic substitutions for two other exposed hydrophobic residues by adding mutations in the background of VH-B1d (VH-B1a with a W47L mutation). The tolerance of the V H domain to these mutations was assessed by the effects on the elution time and area of the monomer peak observed by gel filtration. Substitution of Val-37 by Ser or Thr was well tolerated, and in particular, the Ser-substituted protein (VH-B1g) was very well behaved in gel filtration, as 97% of the sample eluted as a sharp single peak with an elution time and apparent molecular mass consistent with a monomer. The thermostability of VH-B1g (T m = 72°C) was reduced relative to that of VH-B1a (T m = 79°C), but nonetheless, it was still greater than those of typical camelid V H H domains (T m = 57-67°C) (12,29). Although position 103 was not subjected to saturation scanning, the analysis of library B suggested that the solvent-exposed Trp-103 may be replaced by Arg (Fig. 5). Incorporation of this substitution produced a domain (VH-B1i) that was very thermostable (T m = 83°C) and unfolded reversibly, but a significant proportion of the protein eluted from the gel filtration column as a dimer (peak 2). However, Ser or Thr substitutions at position 103 were also well tolerated (VH-B1j and VH-B1k) and the purified proteins exhibited monomeric behavior by gel filtration. Finally, we explored the effects of hydrophilic substitutions for two hydrophilic residues (Arg-39 and Glu-45) that were acquired during the evolution of VH-B1 from VH-4D5. At position 39, six different substitutions were well tolerated (VH-B1l-q), indicating that general hydrophilic character at this position is sufficient to promote monomeric behavior of the V H domain. At position 45, His was well tolerated (VH-B1r), but substitution by Ser increased the elution time significantly (VH-B1s), and substitution by Thr (VH-B1t) resulted in significant aggregation.
Thus, it appears that hydrophilic residues at positions 39 and 45 promote solubility of the V H domain, but position 45 is more discriminating with regard to the chemical nature of the residues.
Taken together, these results validate the utility of the saturation scanning analysis for predicting substitutions that are well tolerated structurally by the V H domain fold. By judiciously incorporating hydrophilic substitutions that are prevalent in the quantitative scanning data set (Fig. 7), we were able to increase the hydrophilicity of the former light chain interface without significantly compromising thermostability. It is notable that many of the best substitutions for improving autonomous V H domain behavior are rare in natural V H and V H H domains and thus could not have been identified by analysis of natural sequences and structures.
Back Mutation Analysis of the Former Light Chain Interface-To further characterize how mutations contribute to autonomous V H domain behavior, we analyzed variants of VH-B1a to assess the effects of back mutations at four positions that changed during the evolution of VH-B1 from VH-4D5 (Fig. 9A and Table 2). The substitution of Gly-35 by His (VH-B1u) did not significantly alter thermostability or gel filtration elution time, but it did greatly reduce the fraction of the protein eluting as a monomer. Similarly, the substitution of Arg-39 by Gln (VH-B1v) reduced the monomeric fraction but only marginally increased the elution time and even improved thermostability slightly (T m = 82°C). The substitution of Leu for Glu-45 (VH-B1w) also increased thermostability, but in contrast with the other back mutations, the elution time was increased dramatically, indicating that the greater hydrophobicity of the Leu side chain promotes interactions with the hydrophobic column matrix. Interestingly, the protein containing the fourth back mutation, substitution of Ser-50 by Arg (VH-B1x), exhibited slightly reduced thermostability but eluted as a sharp peak with an elution time and calculated molecular mass consistent with a well behaved monomer.

FIGURE 8. The crystal structure of VH-B1a. A, superimposition of the main chain of VH-4D5 (gray) and the four independent VH-B1a molecules in the asymmetric unit (colored). The dashed oval demarcates the CDR3 loops. B, comparison of the light chain interface of VH-4D5 (left) and the former light chain interface of VH-B1a (right). The model of VH-4D5 was derived from the Fab-4D5 structure coordinates (PDB code 1N8Z). The main chain is shown as a tube and the CDR loops are colored as follows: CDR1 (yellow), CDR2 (orange), and CDR3 (red). Gly residues are shown as spheres and side chains are rendered as sticks. Atoms are colored as follows: nitrogen, blue; oxygen, red; carbon, green or magenta for residues that are the same or different in the two domains, respectively.
We also performed the back mutation analysis in the background of VH-B1a variants containing either Leu or Thr in place of Trp at position 47 (VH-B1d and -f). Compared with VH-B1a, VH-B1d and VH-B1f exhibit gel filtration profiles more typical of monomeric proteins, and thus, the effects of the back mutations were muted in these backgrounds (Fig. 9 and Table 2, VH-B1y to -af). However, in all cases, the back mutation at position 50 improved monomeric behavior, whereas those at the other three positions had detrimental effects.
Taken together, these results indicate that substitutions that reduce hydrophobicity at positions 39, 45, and 47 act additively to improve the solubility of V H domains. The presence of a Gly residue at position 35 is beneficial when Trp is present at position 47 but is not necessary when Trp-47 is replaced by less hydrophobic residues. Finally, it appears that the Ser for Arg substitution in VH-B1a compared with VH-4D5 is a spurious mutation that is actually detrimental to autonomous V H domain behavior.

DISCUSSION
Our analysis of sequence features that promote autonomous V H domain behavior reveals that, unlike camelid V H H domains, most solutions do not require interactions between the CDR3 loop and the former light chain interface. Instead, hydrophilicity of the former light chain interface is increased by replacing exposed hydrophobic residues with more hydrophilic residues that are compatible with the structure of the V H domain fold. Importantly, our quantitative saturation scanning analysis reveals that many different substitutions, both hydrophobic and hydrophilic, are structurally viable across the former light chain interface (Fig. 7). Many of these substitutions are rare in both natural V H and V H H domains and thus could only be discovered by the comprehensive and unbiased strategy that we employed. This data set expands the options available for the engineering of autonomous V H domains beyond the scope of natural sequence diversity.
By incorporating sequence changes suggested by biophysical analysis, rather than those suggested by the analysis of natural sequences, we evolved frameworks that should be ideal scaffolds for engineering autonomous V H domains with novel specificities (Tables 1 and 2). These domains can be purified at high yields from E. coli (Fig. 2) and, like camelid V H H domains, remain monomeric at high concentrations (Fig. 3) and fold reversibly (Fig. 4). Moreover, the thermostabilities of many of these domains exceed those of typical V H H domains (12,29) and approach those of highly stable Fabs. Furthermore, the autonomous nature of many of our in vitro evolved domains stems from only a few framework changes and, unlike camelid domains, is independent of the CDR3 loop (Fig. 6). Thus, we are optimistic that these novel V H domain scaffolds will be able to support diverse CDR3 sequences and should enable the development of naive phage-displayed libraries that can be used to derive specific antibodies against diverse antigens.

[Figure legend fragment] ... (Table 2) and the mutations relative to VH-B1a at the right.