Nucleotide Insertions and Deletions Complement Point Mutations to Massively Expand the Diversity Created by Somatic Hypermutation of Antibodies

Background: The protein AID generates nucleotide insertions and deletions (indels) critical for antibody affinity maturation.
Results: The location, diversity, and evolution of indels were examined.
Conclusion: AID generates diverse sequence-related indels that localize to antigen binding regions in vitro and in vivo.
Significance: AID is sufficient to form indels that combine with point mutations to form a robust system for antibody evolution.

During somatic hypermutation (SHM), deamination of cytidine by activation-induced cytidine deaminase (AID) and subsequent DNA repair generates mutations within immunoglobulin V-regions. Nucleotide insertions and deletions (indels) have recently been shown to be critical for the evolution of antibody binding. Affinity maturation of 53 antibodies using in vitro SHM in a non-B cell context was compared with mutation patterns observed for SHM in vivo. The origin and frequency of indels seen during in vitro maturation were similar to those in vivo. Indels are localized to CDRs, and secondary mutations within insertions further optimize antigen binding. Structural determination of an antibody matured in vitro and comparison with human-derived antibodies containing insertions reveal conserved patterns of antibody maturation. These findings indicate that activation-induced cytidine deaminase acting on V-region sequences is sufficient to initiate authentic formation of indels in vitro and in vivo and that point mutations, indel formation, and clonal selection form a robust tripartite system for antibody evolution.

The adaptive immune system combines multiple processes to generate a repertoire of diverse antibodies. Recombination of VDJ germ line segments results in a naive antibody repertoire with both sequence and length diversity in antigen contacting regions. B cell clones expressing naive antibodies with weak affinity for antigen are stimulated to express AID and initiate SHM. Maturation is accelerated by clonal expansion of cells expressing antibodies containing AID-mediated mutations that improve affinity (1).
The mechanisms underlying generation of indels during antibody affinity maturation are poorly understood, and examination has been hampered by their low in vivo frequency and the difficulties attending in situ monitoring of in vivo affinity maturation (9,11). The diversity in CDR3 lengths introduced by V(D)J recombination makes the analysis of indels introduced in this region during in vivo SHM extremely challenging. Although technical advances have recently enabled an investigation of indels in vivo (5), questions remain regarding which components of the SHM machinery are critical for indel formation, the diversity of indels generated during maturation to an antigen, and the interplay between indels and single amino acid substitutions during subsequent maturation to improve affinity and specificity. Selection and expansion of cells producing antibodies containing indels are subject to a number of constraints, as expressed antibodies need to retain their overall structure and stability as well as improve antigen binding characteristics. In addition, antibodies that incorporate indels that result in increased nonspecific binding or in cross-reactivity to host tissues would likely be eliminated.
In this study, in vitro SHM was used to scrutinize the in situ creation, selection, and maturation of indels. Fifty-three distinct antibodies were affinity matured against 21 different antigens, and our findings were compared with in vivo antibody repertoires. Indels observed during in vitro SHM were analyzed and found to significantly improve antibody affinity and function. Indels observed during in vitro affinity maturation were localized to regions likely to improve binding, in particular to CDR1 of the heavy chain (HC) and light chain (LC), similar to that observed in vivo. The crystal structures of a human antibody with and without an insertion derived from in vitro SHM were determined and compared with published antibody structures containing insertions. Multiple indels of related composition and origin were often observed for the same antibody during in vitro SHM, and secondary AID-mediated point mutations in and around the indel were found to further optimize antigen recognition. These findings suggest that AID expression in a heterologous context is sufficient to generate both indels and point mutations and that, when combined with selection for improved antigen binding, it enables rapid evolution of naive antibody sequences to innumerable antigens.

EXPERIMENTAL PROCEDURES
In Vitro SHM Antibody Affinity Maturation-Antibody affinity maturation was conducted using in vitro somatic hypermutation as described previously (18-20). In short, the respective antibody was simultaneously displayed on the surface of, and secreted from, HEK293-c18 cells using an episomal vector system. After establishment of stable episomal cell lines, a vector for expression of AID was transfected into the cells to initiate somatic hypermutation. Cell populations co-expressing the antibody and AID were expanded to 2-4 × 10^7 cells, and fluorescence-activated cell sorting (FACS) was performed in the presence of fluorescently labeled antigen under increasingly stringent conditions. Iterative rounds of AID transfection and FACS selection, each isolating the brightest cells incubated in diminishing concentrations of fluorescent antigen, were used to enrich and identify cells expressing antibody variants with improved binding affinity for antigen. Cell pellets were collected in each round and submitted for antibody V-region sequencing by standard Sanger and/or next generation sequencing technology.
Sequencing and Preparation of PBMC cDNA-RNA from peripheral blood mononuclear cells (PBMCs) from a total of 68 healthy donors was purchased from two sources as follows: seven donors from AllCells, Inc. ( Two μg of RNA from each pool were reverse-transcribed by priming with a mixture of oligo(dT) and random hexamers using a SuperScript III first-strand synthesis system (Invitrogen) as per the manufacturer's protocol. One-tenth of each reaction was then amplified with HC-, κ-, or λ-specific internal oligonucleotides in a 50-μl reaction as follows: 95°C for 7 min; 95°C for 30 s, 55°C for 30 s, and 68°C for 1 min for 20 cycles; 68°C for 7 min and then 4°C. Approximately 1/100th of each reaction was amplified for 25 cycles using "external" primers at 95°C for 7 min; 95°C for 30 s, 55°C for 30 s, and 68°C for 1 min; then 68°C for 7 min and then 4°C. 5 μl of each reaction were run on a 1% agarose gel, and bands of the correct size were excised from the gel. DNA was recovered using a Zymoclean™ gel DNA recovery kit (Zymo Research, Irvine, CA). Recovered DNA was sent to 454 Life Sciences (Roche Applied Science) for sequencing.
Sequencing of in Vitro AID-mediated Affinity Maturation-For Sanger sequencing, oligonucleotide primers were used that encompass ~140 nucleotides 5′ to the ATG start codon through 30 nucleotides 3′ to the junction of the variable/constant regions, to yield amplicons of ~550 nucleotides in length for both HC and LC.
Next Generation Sequencing Analysis of in Vitro Antibody Repertoires-For next generation sequencing, ~5 × 10^5 HEK293-c18 cells were washed once with PBS, spun down, and resuspended in 50 μl of Lyse-N-Go buffer (Fisher). After incubation at 95°C for 10 min, cell lysates were centrifuged at 14,000 rpm for 10 min to remove cell debris. Antibody VH and VL open reading frames were amplified by PCR from their episomal expression vectors (18) with Pfx high fidelity polymerase (Invitrogen), using specific primers complementary to the CMV promoter region and to the heavy and light chain constant regions: CMV forward primer, 5′-TACGGTGGGAGGTCTATATAAGCA-3′; HC reverse primer, 5′-CTGAGTTCCACGACACCGTCACAG-3′; and LC reverse primer, 5′-GTTACCCGATTGGAGGGCGTTATC-3′. PCR fragments were purified using the Qiagen PCR cleanup kit (Qiagen) and quantified using the Quant-iT PicoGreen dsDNA kit (Invitrogen) according to the manufacturers' protocols. Sequencing output was processed by the 454 Amplicon Default pipeline (Roche Applied Science). Typically ~50,000-200,000 HC and/or LC reads were obtained per sample for a given experiment.
Analysis of in Vitro and in Vivo Sequencing Data-For the in vitro derived samples, reads were mapped to the respective known parental heavy chain and light chain sequences incorporated in the vectors using 454 GSMapper (Roche Applied Science). For in vivo derived samples, read mapping was carried out using IgBlast (21), searching against germ line gene sequences and their allelic variants assembled from multiple database sources. Indels were called using VarScan (22), with customized modifications, from the pairwise alignments generated by GSMapper and from alignments converted from IgBlast output (for in vivo samples). The signal peptide sequence was excluded from this analysis. Indels shorter than three bases, those occurring at the boundaries of sequence reads (within 5 nucleotides), and those in or proximal to homopolymer sequences (A, T, C, or G; n ≥ 6) were excluded from consideration. High quality reads were then mapped to their originating HC or LC variable region immunoglobulin sequence (in vivo) or starting sequence (in vitro) by Smith-Waterman-based realignment (23). Indel-related motifs were detected by examining the base context of the parental sequences at the regions where indels occurred. For each insertion, 10 bp (five leading and five trailing) around the locus were extracted, and for each deletion, in addition to the five leading and five trailing base pairs, the exact deleted sequences were also extracted.
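As a rough illustration of the exclusion criteria and context extraction described above, the following Python sketch applies the three filters (minimum length, read-boundary proximity, homopolymer proximity) to a single indel call and extracts the flanking bases. It is not the customized VarScan workflow used in the study: the function names, the 0-based coordinate convention, and the use of a 5-nucleotide window to define "proximal to" a homopolymer are assumptions made for this example.

```python
import re

MIN_INDEL_LEN = 3    # indels shorter than three bases are excluded
EDGE_MARGIN = 5      # indels within 5 nt of a read boundary are excluded
FLANK = 5            # five leading and five trailing bases of context

# Homopolymer runs of A, C, G, or T with n >= 6
HOMOPOLYMER_RE = re.compile(r"A{6,}|C{6,}|G{6,}|T{6,}")


def near_homopolymer(parent, pos, window=FLANK):
    """True if a homopolymer run overlaps the window around pos.

    The window size is an assumption; the source only says 'proximal'.
    """
    lo = max(0, pos - window)
    hi = min(len(parent), pos + window)
    return any(m.start() < hi and m.end() > lo
               for m in HOMOPOLYMER_RE.finditer(parent))


def keep_indel(parent, read_start, read_end, pos, length):
    """Apply the three exclusion criteria to one indel call.

    pos is the 0-based parental index of the indel; read_start and
    read_end are the parental coordinates covered by the read.
    """
    if length < MIN_INDEL_LEN:
        return False
    if pos - read_start < EDGE_MARGIN or read_end - pos < EDGE_MARGIN:
        return False
    if near_homopolymer(parent, pos):
        return False
    return True


def indel_context(parent, pos, deleted_len=0):
    """Return (leading, deleted, trailing) parental bases around an indel.

    For insertions, deleted_len is 0 and the deleted string is empty.
    """
    leading = parent[max(0, pos - FLANK):pos]
    deleted = parent[pos:pos + deleted_len]
    trailing = parent[pos + deleted_len:pos + deleted_len + FLANK]
    return leading, deleted, trailing
```

For an insertion, `indel_context(parent, pos)` yields the 10 bp of parental context; for a deletion, passing `deleted_len` additionally returns the deleted bases themselves.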
SPR Affinity Measurements-Binding kinetics and affinities of IgG1 antibodies were determined by SPR on a BIAcore T200 instrument (GE Healthcare). Approximately 200-400 response units of the antibodies were captured on a CM5 sensor chip previously immobilized with 10,000 response units of an anti-human Fc-specific antibody (GE Healthcare). For the anti-human β-nerve growth factor (βNGF) antibodies, soluble hβNGF (R&D Systems) was diluted 3-fold from 500 to 6 nM with HBS-EP+ buffer. Each hβNGF concentration was then injected for 2 min at a flow rate of 30 μl/min and allowed to dissociate for 2 min. A similar protocol was followed for the anti-human glial cell line-derived neurotrophic factor receptor α1 (GFRα1) antibodies, where the soluble extracellular domain of human GFRα1 was diluted and used as the analyte. Each hGFRα1 concentration was then injected for 3 min at a flow rate of 30 μl/min and allowed to dissociate for 3 min. The surface was regenerated with 60 μl of 3 M MgCl2 after each cycle. Double reference subtracted sensorgrams were fit globally using a 1:1 binding model with mass transport with the BIAcore T200 evaluation software to determine dissociation constants (K_D).
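The analyte series and the relationship between rate constants and K_D in a 1:1 binding model can be sketched as follows. This is only an arithmetic illustration, not the BIAcore evaluation procedure: the function names are hypothetical, the rate constants in the usage note are invented, and the number of concentration points is inferred from the stated 3-fold step and 500 to 6 nM range rather than reported in the text.

```python
def dilution_series(top_nM, fold, floor_nM):
    """Serial dilution from top_nM downward, stopping below floor_nM."""
    concs = []
    c = float(top_nM)
    while c >= floor_nM:
        concs.append(c)
        c /= fold
    return concs


def kd_from_rates(k_off_per_s, k_on_per_M_s):
    """Equilibrium dissociation constant (M) for a 1:1 interaction."""
    return k_off_per_s / k_on_per_M_s


# 3-fold dilution from 500 nM: 500, 166.7, 55.6, 18.5, and about 6.2 nM
series = dilution_series(500, 3, 6)
```

With hypothetical rate constants of k_off = 1e-3 per second and k_on = 1e5 per molar per second, `kd_from_rates` returns 1e-8 M, that is, 10 nM.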
Protein Production-The Fab portions of the HC and LC of an anti-hβNGF antibody with and without a 9-amino acid insertion were cloned into pcDNA3.3 vectors and transiently expressed in Expi293F cells (Invitrogen) using ExpiFectamine reagent following the manufacturer's instructions. After 5-7 days of incubation at 37°C and 5% CO2, cell culture supernatants were harvested. Fabs were captured using KappaSelect resin (GE Healthcare). The column was washed with 10 column volumes of 1× PBS, pH 7.4, and proteins were eluted with 100 mM glycine, pH 2.5. Eluted fractions containing purified Fab were buffer-exchanged into 10 mM Tris-HCl, pH 8.0, 150 mM NaCl, and protein concentrations were determined by absorption at 280 nm.
Crystallization and Data Processing-Fab APE1531 (5.8 mg/ml) was crystallized from 12% PEG 20,000, 0.1 M NaCl, 0.1 M MES, pH 6.5, at 22°C by sitting drop vapor diffusion by mixing 1.0 μl of protein solution with 0.5 μl of reservoir solution. Crystals were cryo-protected with 30% glycerol in reservoir solution and vitrified in liquid nitrogen. A complete data set to 1.75 Å was collected at the Stanford Synchrotron Radiation Lightsource beamline 11-1 (Palo Alto, CA), integrated with XDS (24), and scaled with SCALA (25).
Fab APE1551 (6.5 mg/ml) was crystallized from 16% PEG 6000, 0.1 M citric acid, pH 6-7, at 22°C by sitting drop vapor diffusion by mixing 0.5 μl of protein solution with 0.5 μl of reservoir solution. Crystals were cryo-protected with 30% glycerol in reservoir solution and vitrified in liquid nitrogen. A complete data set to 1.60 Å was collected at the Stanford Synchrotron Radiation Lightsource beamline 12-2 and was integrated and scaled with HKL2000 (26).
Structure Determination, Refinement, and Analysis-The structure of the APE1531 Fab was determined by molecular replacement (MR) to 1.75 Å resolution in monoclinic space group C2 (V_M = 2.3 Å^3/Da for one molecule per asymmetric unit). The Protein Data Bank (PDB) was searched for sequences with the highest identity to the individual Ig domains of APE1531. 3QOT for the VL:VH and 3SOB for CL:CH1 were then used as MR templates, and MR solutions were found using PHASER (27). The MR model was subjected to rigid body refinement and restrained all atom refinement with REFMAC5 (28). Further refinement was achieved by alternating cycles of model building with COOT (29) and refinement with REFMAC5. The final model was refined to R_cryst = 15.0% and R_free = 18.1% (Table 1) and consists of one APE1531 Fab (chain L residues 1-214 and chain H residues 1-231), 2 MES buffer molecules, 4 acetate molecules, 1 short-chain PEG molecule, and 373 waters per asymmetric unit. The structure of the APE1551 Fab was determined by molecular replacement to 1.60 Å resolution in monoclinic space group C2 (V_M = 2.5 Å^3/Da for one molecule per asymmetric unit). The individual Ig domains of APE1531 were used as MR templates, and the MR solution was rebuilt and refined as described above. The final model was refined to R_cryst = 16.6% and R_free = 19.8% and consists of one APE1551 Fab (chain L residues 1-214 and chain H residues 2-230) and 434 waters per asymmetric unit. The final statistics for APE1531 and APE1551 are shown in Table 1. The quality of the structures was evaluated with the quality control features implemented in COOT, and superimpositions were done with SSM or LSQ, as implemented in COOT. Coordinates and structure factors have been deposited in the Protein Data Bank with accession numbers 4NWT (APE1531) and 4NWU (APE1551).
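For readers less familiar with the crystallographic quantities quoted above, the sketch below shows how a Matthews coefficient (V_M) and a crystallographic R-factor are conventionally computed. The function names are hypothetical, the unit cell volume and molecular weight in the usage note are placeholder values (the unit cell dimensions are not given in this section), and the R-factor helper assumes the calculated amplitudes are already on the same scale as the observed ones. R_free is the same ratio evaluated over reflections held out of refinement.

```python
def matthews_coefficient(cell_volume_A3, n_sym_ops, n_mol_asu, mol_weight_da):
    """V_M in cubic angstroms per dalton.

    n_sym_ops is the number of symmetry-equivalent positions in the
    unit cell (4 for space group C2); n_mol_asu is the number of
    molecules per asymmetric unit.
    """
    z = n_sym_ops * n_mol_asu
    return cell_volume_A3 / (z * mol_weight_da)


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), amplitudes pre-scaled."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator
```

As a placeholder example, a C2 cell of 460,000 cubic angstroms containing one 50-kDa molecule per asymmetric unit gives `matthews_coefficient(460000.0, 4, 1, 50000.0)` of about 2.3.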

RESULTS
To characterize the indel repertoire of antibodies in vivo and in vitro, antibody V-regions from several populations were sequenced (Table 2). Normal human PBMC samples, obtained from 68 normal human donors and composed of both immature and mature B cells, were sequenced, and high quality V-region sequences were mapped to the closest human germ line V-region sequence. To investigate the in situ creation, selection, and maturation of indels emanating from individual antibodies, an in vitro SHM system coupled with a mammalian cell display of full-length IgGs was used (18). In total, 39 human germ line antibodies and 14 CDR-grafted antibodies, directed against 21 unique antigens, were matured to high affinity via SHM in vitro. In addition, two in vitro SHM populations were collected for comparison in which antibodies were co-expressed with AID, without selection for improved antigen binding. Antibodies were also expressed in vitro in the absence of AID, to assess the spectrum of indels specific to sample handling and sequencing.
Indels were detected in both in vivo and in vitro antibody sequences, but not in in vitro samples lacking AID (Table 2). Indels were found in 0.52 and 0.46% of all in vivo HC and LC sequences, respectively, with 0.41 and 0.33% of HC and LC containing in-frame indels. Indels were also detected in sequences from in vitro AID samples with or without selection for improved antigen binding, but not in sequences from in vitro samples lacking AID, indicating that AID is essential and sufficient for indel generation in non-B cells and that the techniques utilized for analysis and in vitro sample manipulation did not generate indels. The frequency of in-frame indels in in vitro AID samples was ~0.05-0.1%, a rate 10-fold lower than the in vivo rate. Indels recovered from in vitro AID samples, with or without antigen binding selection, were produced exclusively by local sequence duplication (Fig. 1), consistent with indels observed in vivo here and elsewhere (11,30).
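The frequencies above distinguish all indels from in-frame indels, that is, those whose length is a multiple of three nucleotides. A minimal sketch of that bookkeeping is given below; the function names and the example numbers are hypothetical and are not taken from Table 2.

```python
def is_in_frame(indel_len_nt):
    """An indel preserves the reading frame if its length is 3, 6, 9, ..."""
    return indel_len_nt > 0 and indel_len_nt % 3 == 0


def indel_frequencies(n_reads, indel_lengths_nt):
    """Percentage of reads with any indel and with an in-frame indel.

    Assumes at most one indel per read, so counts map directly to reads.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    total = len(indel_lengths_nt)
    in_frame = sum(1 for n in indel_lengths_nt if is_in_frame(n))
    return {
        "all_pct": 100.0 * total / n_reads,
        "in_frame_pct": 100.0 * in_frame / n_reads,
    }
```

For example, five indels of lengths 3, 6, 4, 9, and 5 nucleotides among 10,000 reads would give 0.05% of reads with an indel and 0.03% with an in-frame indel.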
Location of Indels in Vitro and in Vivo-The location of indels in affinity-matured antibodies was comparable in vivo and in vitro. HC indels in vivo were strongly biased toward both HC CDR1 (i.e. CDRH1) and CDRH2, whereas LC indels preferentially occurred in CDRL1 (Fig. 2). This parallels previous findings, where 47% of insertions and 29% of deletions were found in CDRH1 and 13 and 7% in CDRH2 (5). Because the exact originating sequences of CDR3s in vivo are not known, indels could not be conclusively mapped for this region and were therefore omitted from in vivo analysis.
For antibody sequences co-expressed with AID in vitro but in the absence of antigen binding selection, indels were distributed throughout the V-region and were only modestly enriched in CDR regions (Fig. 2). Affinity maturation in vitro led to significant enrichment of indels in CDR regions, most prominently for CDRH1 and CDRL1. Because the originating sequence is known for the in vitro samples, indels within HC and LC CDR3s were analyzed and observed to be enriched in CDRL3 and CDRH3.

TABLE 1 Crystallography data collection and refinement statistics
Crystallographic data collection and refinement statistics for the structural determination of the anti-hβNGF antibody with (PDB code 4NWU) and without (PDB code 4NWT) a nucleotide insertion in H2.

TABLE 2 NGS and Sanger sequencing results for in vivo and in vitro derived antibodies
The number of samples, reads, and indels observed for in vivo and in vitro derived antibody samples is reported. The number of in-frame (e.g., n = 3, 6, 9, ...) indels observed is shown in parentheses. NA means not available.

JOURNAL OF BIOLOGICAL CHEMISTRY 33561
The majority of in vivo in-frame indels were short (1 or 2 amino acids in length), with ~90% of insertions and deletions being ≤3 amino acids. The longest observed insertion was 9 amino acids, and the longest deletion was 9 amino acids. For in vitro derived and affinity-matured antibodies, the majority of in-frame indels were relatively short, with 50% comprising less than 4 amino acids (Fig. 2, C and D), although the indels observed in antibodies in the absence of antigen binding selection were significantly longer (up to 18 amino acids). This finding may indicate the increased likelihood of longer indels disrupting antibody structure and/or stability, thereby interfering with functional selection. Insertions ranging from 3 to 11 amino acids in length were observed during in vitro maturation with functional selection. In contrast, deletions of 1 or 2 amino acids accounted for the majority of observed events in both the in vitro and in vivo samples (90% ≤ 3 amino acids).
In situ indel generation, selection, and evolution in individual antibodies were further examined using in vitro SHM (18). Fifty-three antibodies were affinity-matured using iterative FACS selection (0.1-0.5%) of cells binding progressively lower concentrations of fluorescently labeled antigen, and sequencing was performed to identify point mutations and indels following each round of selection. Final affinity-matured antibodies contained a total of 358 point mutations, and nine of the antibodies contained one or more indels (n = 11).
Numerous Sequence-related Indels Generated during Affinity Maturation-Related indels were frequently observed during in vitro maturation of an antibody to an antigen, and enriched insertions were further optimized by point mutations. A total of nine unique insertions were observed at the junction of framework region 1 (FW1)/CDRH1 (HC3-HC11, Fig. 3A) during initial maturation of a germ line antibody to hβNGF (19). All insertions were generated by local sequence duplication and resulted in a CDRH1 loop that is extended by 1-9 additional amino acids. The majority of the originating insertions were accompanied by point mutations, and the locations of these mutations appear to be semi-conserved among the nine unique sequences containing related indels (Fig. 3A). In only one instance were the originating and duplicated sequences not immediately adjacent (HC8, Fig. 3B), consistent with reported mechanisms of indel formation in vivo (9-11).
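One simple way to test whether an insertion arose by local sequence duplication, and whether the template lies immediately adjacent to it, is sketched below. This is an illustrative heuristic rather than the analysis used in the study: the function name, the 30-nucleotide search window, and the category labels are assumptions, and the check presumes that the insertion call has been normalized so that it sits at one boundary of the duplicated segment and that no point mutations have yet accumulated in the copy.

```python
def classify_insertion(parent, pos, inserted, window=30):
    """Classify an insertion relative to the parental sequence.

    parent   : parental nucleotide sequence
    pos      : 0-based index in parent before which the bases were inserted
    inserted : the inserted bases
    window   : how far on either side of pos to search (an assumed value)
    """
    n = len(inserted)
    upstream = parent[max(0, pos - n):pos]
    downstream = parent[pos:pos + n]
    # Tandem duplication: the insert copies the bases directly beside it
    if inserted == upstream or inserted == downstream:
        return "adjacent_duplication"
    # Otherwise look for an exact template elsewhere in the neighborhood
    lo = max(0, pos - window)
    hi = min(len(parent), pos + window)
    if inserted in parent[lo:hi]:
        return "nonadjacent_duplication"
    return "no_local_template"
```

An insertion whose template is found only at a distance from the insertion site would fall in the second category, which corresponds to the single non-adjacent case noted above.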
Several of the anti-hβNGF HC sequences containing an insertion were recovered, paired with the parental LC, and expressed as full-length IgGs, and binding kinetics were characterized using SPR. All of the HCs tested (HC3-6) improved affinity for hβNGF (K_D = 900, 2.1, 20, and 6 nM, respectively) over that of the parental antibody (K_D >> 1 μM) (Fig. 4) with corresponding improvements in bioactivity (19). During the course of affinity maturation, additional point mutations were observed within the FW1/CDRH1 region that improved binding affinity (Fig. 3), presumably by means of direct contacts between FW1/CDRH1 residues and the antigen.
Likewise, multiple point mutations as well as three unique insertions (LC2-LC4) in CDRL1 were observed during affinity maturation of an antibody to hGFRα1 (Fig. 3B). When tested by SPR, the most frequently observed insertion (LC3) improved antigen binding (K_D = 26 nM) over that of the parental antibody (K_D > 1 μM) (Fig. 4, E and F).
Crystal Structures of an Antibody with and without an Indel-To examine the effect of in vitro indels on antibody conformation and function, crystal structures of Fab fragments of the anti-hβNGF antibody before and after incorporation of the insertion in CDRH1 were determined. The structure of the originating antibody was determined to 1.75 Å (Table 1). Ala L51 is the only residue in the disallowed Ramachandran region but has very well defined electron density. L51 is in a conserved γ-turn in almost all other antibodies and is often mis-assigned by analysis programs as an outlier (31).
The antibody structure resembles that of a typical Fab, with the dominant features of the combining site being the long CDRH3 and CDRL1 loops that extend over the remainder of the CDR loops. Both the conformation and length of these CDRs are within the range of previously observed Fab crystal structures, as confirmed by three-dimensional structure searches for the individual VH and VL domains. All but two side chains in CDRH3, Trp H100c and Arg H100g, in the combining site display well defined electron density, including all residues of CDRH1 (Fig. 5, A and B). Elongated additional electron density in the center of the combining site was modeled as a short chain PEG located between the base of CDRH3, CDRH2, and the tip of the shorter CDRL3.
The Fab structure containing the nine-amino acid insertion in CDRH1 (HC5 in Fig. 3B) was crystallized in the same space group with similar unit cell constants as the parent antibody, and its structure was determined to 1.6 Å resolution (Table 1). Most residues were well ordered with the exception of CDRH1. No electron density was observed for Asp H27-Ala H33 and Thr H33a, which is the first residue of the insertion, although the remainder of the insertion (Gly H33b-Ala H33i) and residues C-terminal thereof had clearly defined electron density (Fig. 5C).
FIGURE 4. Improvement in binding affinity for antibodies containing insertions. Improvement of antigen binding affinity for anti-hβNGF and anti-hGFRα1 antibodies containing CDRH1 and CDRL1 insertions, respectively. SPR was carried out by antibody capture at low density on an anti-human IgG surface followed by flowing antigen over the surface. SPR sensorgrams are shown for an anti-hβNGF antibody containing two point mutations, S31N and L45F in CDRH1 and FW2 (A); the same antibody with additionally incorporated insertions derived from in vitro SHM with affinity improvements of >20-fold (corresponding to Fig. 3A HC4-HC6) is shown (B-D). SPR sensorgrams for an anti-hGFRα1 antibody containing no mutations are shown in E and for the same antibody with a 5-amino acid insertion in CDRL1 (LC3 in Fig. 3B paired with parental HC), which confers a >40-fold improvement in affinity, in F. RU, response unit.
FIGURE 5. Crystal structures of an antibody with and without a 9-residue insertion. Crystal structures are shown of an anti-human βNGF antibody Fab with (PDB code 4NWU) and without (PDB code 4NWT) a 9-residue insertion in CDRH1. A, alignment of the CDRH1 antibody sequence with and without the insertion; Kabat numbering is listed vertically above. The insert is indicated in red, and residues that were not ordered in the crystal structure are shown with lighter shading. B, crystal structure and 2Fo − Fc electron density contoured at 1σ of CDRH1 and flanking residues of the parent; C, insertion containing antibody. Insert residues are shown in red, and original sequence is shown in blue. No electron density was observed for eight residues, Asp H27-Ala H33 and the first residue of the insertion, Thr H33a. D, schematic representation of the superimposed structures of the variable domains with and without CDRH1 insertion, color-coded as above. There are no major structural differences except in CDRH1, where the C-terminal portion of the 9-amino acid insertion occupies the place of the original residues, although the N-terminal part undergoes local rearrangements and leads to disorder indicative of conformational flexibility.
Aside from the major rearrangements in CDRH1, the overall structure and the conformation of the combining site of the Fab containing the insertion did not significantly differ from that of the originating antibody (Fig. 5D). All atoms of the two Fabs superimpose with an r.m.s.d. of 0.58 Å; for VL all atoms superimpose with an r.m.s.d. of 0.51 Å and for VH with an r.m.s.d. of 1.34 Å. The all-atom r.m.s.d. value for VH without CDRH1 residues Lys H23-Ser H35 is 0.46 Å, indicating that indeed the only major structural differences between the two Fabs can be attributed to the CDRH1 region (Fig. 5D). Well defined electron density was observed for the side chains of Trp H100c and Arg H100g, whereas the side chain of Tyr L94 appears to exist in different rotamers, one of which occupies the position where the elongated ligand was observed in the parental antibody.
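The r.m.s.d. comparison reported for the two Fabs, including the recalculation with the CDRH1 residues left out, can be illustrated with a short sketch. It assumes the two coordinate sets have already been superimposed (the study used SSM or LSQ in COOT for that step) and that atoms are paired one-to-one in the same order; the function name and the index-based exclusion are choices made for this example only.

```python
import math


def rmsd(coords_a, coords_b, skip=None):
    """Root mean square deviation between paired, pre-superimposed atoms.

    coords_a, coords_b : equal-length sequences of (x, y, z) tuples
    skip               : optional set of atom indices to leave out,
                         e.g. atoms belonging to a rearranged loop
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    skip = skip or set()
    total = 0.0
    count = 0
    for i, (a, b) in enumerate(zip(coords_a, coords_b)):
        if i in skip:
            continue
        total += sum((p - q) ** 2 for p, q in zip(a, b))
        count += 1
    if count == 0:
        raise ValueError("no atoms left to compare")
    return math.sqrt(total / count)
```

Excluding the indices of a loop that differs between the two structures lowers the value in the same way that omitting Lys H23-Ser H35 lowers the VH figure in the comparison above.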
DISCUSSION
In this study, SHM in vitro was used to dissect the in situ creation, selection, and maturation of indels in antibodies, as compared with in vivo antibody repertoires. We found that heterologous expression of AID is sufficient to generate indels in vitro, which were generated throughout the V-region in the absence of antigen binding selection, and which subsequently were enriched in CDR regions during affinity maturation. The location of indels after SHM and antigen selection in vitro is remarkably similar to that observed in in vivo derived antibodies. Multiple sequence-related indels were often generated for a maturing antibody, with indel enrichment and subsequent point mutations facilitating rapid optimization of the binding paratope. When tested, indels were found to improve both antibody affinity and function.
Indels originate from sequence duplication in vivo and in vitro, and because no distinct editing mechanism constrains them and AID-mediated point mutations can further modify them, the resulting sequence space that can be explored is vast. CDR loops have been shown to adopt a limited set of canonical conformations that are dependent on the side-chain packing, hydrogen bonding, or conformational preferences of key residues (36,37). Indels preferentially map to CDRs (5) and thereby significantly expand the structural repertoire of individual CDRs through generation of a multitude of novel, unique structural solutions (38). Comparison of in vitro and in vivo derived antibodies containing SHM indels demonstrates a strong preference for the addition of indels at conserved sites proximal to the CDRH3/L3 regions (Fig. 6), where they are well positioned to extend and optimize the antigen recognition surface (14,15).
Analysis of SHM in vitro allows facile analysis of the nature and composition of indels in the CDR3 region. Such analysis is difficult or impossible to carry out in antibodies derived from in vivo sources due to the nontemplated nature of the V(D)J rearrangements. The unique sequence composition of the CDR3s is thought to provide initial antigen recognition, with point mutations and indels modulating affinity and specificity by expanding the interaction interface or by remodeling the combining site (16). Notably, indels appear to play a prominent role in the optimization of antibodies to pathogens, which present rapidly evolving antigens (e.g. HIV), or to antigens whose topological, conformational, or surface chemistry properties (e.g. glycosylation) make generation of high affinity antibodies difficult. Significant mutation of framework regions and pronounced use of indels suggest that immune responses to these challenging antigens (1, 14-16, 39, 40) often require significant remodeling of regions peripheral to the combining site center. Previous analysis of non-IgG protein structures found that indel residues are disordered and preferentially occur in regions of increased disorder tolerant to accommodating new sequences (41-46). This work extends previous reports demonstrating the importance of secondary mechanisms in antibody diversification beyond V(D)J recombination and AID-mediated single nucleotide mutations. Generation of multiple sequence-related indels and their subsequent optimization by SHM dramatically increases the sequence and structure space that can be explored by the adaptive immune system.
Many parallels to protein evolution in general are apparent, including preferential location in regions of structural plasticity, intrinsic disorder, and conformational flexibility, maintenance of overall fold with local modification of secondary structures in the vicinity of the indel, and a "mutagenic" effect of indels on their flanking regions.