
How repertoire data are changing antibody science

Open Access. Published: May 14, 2020. DOI: https://doi.org/10.1074/jbc.REV120.010181
      Antibodies are vital proteins of the immune system that recognize potentially harmful molecules and initiate their removal. Mammals can efficiently create vast numbers of antibodies with different sequences capable of binding to any antigen with high affinity and specificity. Because they can be developed to bind to many disease agents, antibodies can be used as therapeutics. In an organism, after antigen exposure, antibodies specific to that antigen are enriched through clonal selection, expansion, and somatic hypermutation. The antibodies present in an organism therefore report on its immune status, describe its innate ability to deal with harmful substances, and reveal how it has previously responded. Next-generation sequencing technologies are being increasingly used to query the antibody, or B-cell receptor (BCR), sequence repertoire, and the amount of BCR data in public repositories is growing. The Observed Antibody Space database, for example, currently contains over a billion sequences from 68 different studies. Repertoires are available that represent both the naive state (i.e. antigen-inexperienced) and that after immunization. This wealth of data has created opportunities to learn more about our immune system. In this review, we discuss the many ways in which BCR repertoire data have been or could be exploited. We highlight its utility for providing insights into how the naive immune repertoire is generated and how it responds to antigens. We also consider how structural information can be used to enhance these data and may lead to more accurate depictions of the sequence space and to applications in the discovery of new therapeutics.
      Antibodies are proteins that play a key role in the adaptive immune response. They are produced by B cells and are either secreted or membrane-bound (in the latter case, they are known as B-cell receptors, or BCRs). They are able to neutralize and initiate the removal of foreign entities (known as antigens) from the body by binding to them (
      • Sela-Culang I.
      • Kunik V.
      • Ofran Y.
      The structural basis of antibody-antigen recognition.
      ). The ability of the immune system to respond to a huge range of antigens originates in the diversity of the antibodies that can be generated—antibodies can be produced that bind to nearly every antigen, with both high specificity and affinity (
      • Saper C.B.
      A guide to the perplexed on the specificity of antibodies.
      ). This property has made antibodies highly successful as therapeutics; to date, 87 have been approved for use in the clinic across a number of disease areas, and many more are undergoing clinical trials (
      • Ecker D.M.
      • Jones S.D.
      • Levine H.L.
      The therapeutic monoclonal antibody market.
      ,
      • Raybould M.I.J.
      • Marks C.
      • Lewis A.P.
      • Shi J.
      • Bujotzek A.
      • Taddese B.
      • Deane C.M.
      Thera-SAbDab: the therapeutic structural antibody database.
      ). Antibodies are currently the largest class of biotherapeutic (
      • Kaplon H.
      • Reichert J.M.
      Antibodies to watch in 2019.
      ).
      It is estimated that the human antibody repertoire contains around 10^13 unique sequences (
      • Greiff V.
      • Miho E.
      • Menzel U.
      • Reddy S.T.
      Bioinformatic and statistical analysis of adaptive immune repertoires.
      ). This diversity is a result of how the proteins are encoded in the genome. Antibodies are composed of two types of protein chain, known as the heavy and light chains (Fig. 1). Each of these is encoded by multiple gene segments that are spliced together using a process called V(D)J recombination (
      • Tonegawa S.
      Somatic generation of antibody diversity.
      ). The sequence for the light-chain variable region (Fv) is made up of two segments: the variable segment (V) and the joining segment (J). The heavy chain is encoded from variable, joining, and diversity (D) segments. There are many genes for each of the V, D, and J segments, which can be matched up in different combinations to produce a diverse range of antibody sequences. Further diversity is introduced through the insertion or deletion of nucleotides at the segment junctions (
      • Jeske D.J.
      • Jarvis J.
      • Milstein C.
      • Capra J.D.
      Junctional diversity.
      ) and somatic hypermutation (a process through which the number of random mutations that occur is increased) (
      • Schramm C.A.
      • Douek D.C.
      Beyond hot spots: biases in antibody somatic hypermutation and implications for vaccine design.
      ). The majority of the variation in sequence occurs in the complementarity-determining regions, or CDRs—there are three of these on each of the heavy and light chains. The most variable of these is the H3 loop (the third CDR on the heavy chain), because the DNA encoding it is found at the join between the V, D, and J segments. By creating a large, diverse repertoire of antibody sequences, an individual is able to react to almost any antigen it may encounter.
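As a rough illustration of the combinatorial part of this diversity, the sketch below counts the segment combinations available to a heavy chain (V, D, and J) and a light chain (V and J) and draws one random heavy-chain rearrangement. The segment names and pool sizes are made-up toy values, not real germline gene counts, and junctional diversity and somatic hypermutation are deliberately left out.

```python
import random
from math import prod

# Hypothetical toy segment pools; real germline repertoires are much larger.
HEAVY_SEGMENTS = {
    "V": ["VH-a", "VH-b", "VH-c"],
    "D": ["DH-a", "DH-b"],
    "J": ["JH-a", "JH-b"],
}
LIGHT_SEGMENTS = {
    "V": ["VL-a", "VL-b", "VL-c"],
    "J": ["JL-a", "JL-b"],
}


def combinatorial_diversity(segments):
    """Number of distinct combinations when one gene is chosen per segment type."""
    return prod(len(genes) for genes in segments.values())


def recombine(segments, rng):
    """Pick one gene per segment type, mimicking a single rearrangement event."""
    return tuple(rng.choice(genes) for genes in segments.values())


heavy = combinatorial_diversity(HEAVY_SEGMENTS)  # 3 * 2 * 2
light = combinatorial_diversity(LIGHT_SEGMENTS)  # 3 * 2
paired = heavy * light                           # any heavy chain with any light chain

rng = random.Random(0)
example = recombine(HEAVY_SEGMENTS, rng)
print(heavy, light, paired, example)
```

Even in this toy setting the number of pairings grows multiplicatively with each added gene, which is why segment choice alone already yields a large sequence space before junctional changes and hypermutation are considered.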
      Figure 1. A, antibody structure. An antibody is made up of four chains: two light (orange) and two heavy (blue). Each chain is made up of a series of domains; the variable domains of the light and heavy chains together are known as the Fv region (shown on the right; PDB entry 12E8). The Fv features six loops known as CDRs (shown in dark blue); these are mainly responsible for antigen binding. B, example sequences for the VH and VL, highlighting the CDRs and the genetic composition.
      The ability of an antibody to bind to its target antigen is governed by its three-dimensional structure. Knowledge of an antibody’s structure therefore allows for a deeper understanding of its physicochemical properties than can be gained from sequence alone. The general structure of an antibody is depicted in Fig. 1. The heavy and light variable domains both adopt a β-sandwich structure known as the immunoglobulin fold. Framework (non-CDR) regions are very highly conserved between different antibodies; in accordance with the observed variability of antibody sequences, the structural diversity that allows binding to many different targets occurs mainly in the CDRs. These correspond to loops in the three-dimensional structure, which are responsible for most of the antigen-binding interactions (
      • Collis A.V.
      • Brouwer A.P.
      • Martin A.C.
      Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen.
      ). For five of the six CDRs (H1, H2, and L1–L3), structural diversity is limited—only a few different shapes have been observed, forming a set of discrete conformational classes known as canonical structures. However, as described above, the H3 loop is much more variable in sequence than the other CDRs and consequently is also more structurally diverse. It is thought that the H3 loop contributes the most to antigen-binding properties (
      • Xu J.L.
      • Davis M.M.
      Diversity in the CDR3 region of V.
      ,
      • Kuroda D.
      • Shirai H.
      • Jacobson M.P.
      • Nakamura H.
      Computer-aided antibody design.
      ).
      Upon exposure to an antigen, antibodies that are able to bind to it do so and are thus selected from the repertoire (clonal selection) (
      • Burnet F.M.
      Theories of immunity.
      ). Having a large repertoire of antibodies present in the body at any time increases the chance that at least one has the ability to bind to the antigen, even if only weakly, thereby allowing the initiation of an appropriate immune response. B cells producing binding antibodies undergo cycles of proliferation (clonal expansion) with simultaneous somatic hypermutation (
      • Schramm C.A.
      • Douek D.C.
      Beyond hot spots: biases in antibody somatic hypermutation and implications for vaccine design.
      ) to produce antibodies with higher affinity. The antibody repertoire is consequently enriched with antibodies that bind to the target antigen.
      The antibodies present in an organism therefore describe both its current and past immune status: what it is able to respond to and what it has previously dealt with. Whereas previously only a handful of sequences could be obtained at a time, technological advances mean that large snapshots of this repertoire can now be obtained using next-generation sequencing approaches. This technique of BCR repertoire sequencing was first described by Glanville et al. in 2009 (
      • Glanville J.
      • Zhai W.
      • Berka J.
      • Telman D.
      • Huerta G.
      • Mehta G.R.
      • Ni I.
      • Mei L.
      • Sundar P.D.
      • Day G.M.
      • Cox D.
      • Rajpal A.
      • Pons J.
      Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire.
      ), and since then the volume of data available has increased exponentially (Fig. 2). As it is the H3 loop that mostly determines binding properties, many studies have focused only on sequencing this region. However, BCR repertoires containing full-length sequences are increasingly being produced—commonly only the heavy chain (
      • Georgiou G.
      • Ippolito G.C.
      • Beausang J.
      • Busse C.E.
      • Wardemann H.
      • Quake S.R.
      The promise and challenge of high-throughput sequencing of the antibody repertoire.
      ), but some studies have focused only on the light chain (e.g. Refs.
      • Ota M.
      • Duong B.H.
      • Torkamani A.
      • Doyle C.M.
      • Gavin A.L.
      • Ota T.
      • Nemazee D.
      Regulation of the B cell receptor repertoire and self-reactivity by BAFF.
      and
      • Zhou T.
      • Zhu J.
      • Wu X.
      • Moquin S.
      • Zhang B.
      • Acharya P.
      • Georgiev I.S.
      • Altae-Tran H.R.
      • Chuang G.-Y.
      • Joyce M.G.
      • Kwon Y.D.
      • Longo N.S.
      • Louder M.K.
      • Luongo T.
      • McKee K.
      • et al.
      Multidonor analysis reveals structural elements, genetic determinants, and maturation pathway for HIV-1 neutralization by VRC01-class antibodies.
      ), and some data sets include both (e.g. Refs.
      • Vander Heiden J.A.
      • Stathopoulos P.
      • Zhou J.Q.
      • Chen L.
      • Gilbert T.J.
      • Bolen C.R.
      • Barohn R.J.
      • Dimachkie M.M.
      • Ciafaloni E.
      • Broering T.J.
      • Vigneault F.
      • Nowak R.J.
      • Kleinstein S.H.
      • O'Connor K.C.
      Dysregulation of B cell repertoire formation in myasthenia gravis patients revealed through deep sequencing.
      and
      • Gidoni M.
      • Snir O.
      • Peres A.
      • Polak P.
      • Lindeman I.
      • Mikocziova I.
      • Sarna V.K.
      • Lundin K.E.A.
      • Clouser C.
      • Vigneault F.
      • Collins A.M.
      • Sollid L.M.
      • Yaari G.
      Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping.
      ). Recent advances in sequencing technology have led to a small but growing number of repertoires that also include native pairing information (i.e. which heavy-chain sequences belong with which light-chain sequences).
      Figure 2. The cumulative growth of publicly available (redundant) antibody sequences over time (data from the Observed Antibody Space database (
      • Kovaltsuk A.
      • Leem J.
      • Kelm S.
      • Snowden J.
      • Deane C.M.
      • Krawczyk K.
      Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires.
      )).
      The largest repertoire sequencing study to date, by Briney et al. (
      • Briney B.
      • Inderbitzin A.
      • Joyce C.
      • Burton D.R.
      Commonality despite exceptional diversity in the baseline human antibody repertoire.
      ), alone resulted in a set of over 300 million heavy-chain sequences. In addition, many algorithms and pipelines have now been created that preprocess the generated data ready for analysis, performing tasks such as translation from nucleotides to amino acids, error estimation and correction, and sequence numbering (
      • López-Santibáñez-Jácome L.
      • Avendaño-Vázquez S.E.
      • Flores-Jasso C.F.
      The pipeline repertoire for Ig-Seq analysis.
      ). Recently, efforts have been made to create standardized, publicly available repositories for these sequencing data (e.g. iReceptor (
      • Corrie B.D.
      • Marthandan N.
      • Zimonja B.
      • Jaglale J.
      • Zhou Y.
      • Barr E.
      • Knoetze N.
      • Breden F.M.W.
      • Christley S.
      • Scott J.K.
      • Cowell L.G.
      • Breden F.
      iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories.
      ), VDJServer (
      • Christley S.
      • Scarborough W.
      • Salinas E.
      • Rounds W.H.
      • Toby I.T.
      • Fonner J.M.
      • Levin M.K.
      • Kim M.
      • Mock S.A.
      • Jordan C.
      • Ostmeyer J.
      • Buntzman A.
      • Rubelt F.
      • Davila M.L.
      • Monson N.L.
      • et al.
      VDJServer: A cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements.
      ), ImmuneDB (
      • Rosenfeld A.M.
      • Meng W.
      • Luning Prak E.T.
      • Hershberg U.
      ImmuneDB, a novel tool for the analysis, storage, and dissemination of immune repertoire sequencing data.
      ), and others (
      • Chailyan A.
      • Tramontano A.
      • Marcatili P.
      A database of immunoglobulins with integrated tools: DIGIT.
      ,
      • Swindells M.B.
      • Porter C.T.
      • Couch M.
      • Hurst J.
      • Abhinandan K.R.
      • Nielsen J.H.
      • Macindoe G.
      • Hetherington J.
      • Martin A.C.
      abYsis: integrated antibody sequence and structure-management, analysis, and prediction.
      ,
      • Zhang W.
      • Wang L.
      • Liu K.
      • Wei X.
      • Yang K.
      • Du W.
      • Wang S.
      • Guo N.
      • Ma C.
      • Luo L.
      • Wu J.
      • Lin L.
      • Yang F.
      • Gao F.
      • Wang X.
      • et al.
      PIRD: Pan Immune Repertoire Database.
      ,
      • Kovaltsuk A.
      • Leem J.
      • Kelm S.
      • Snowden J.
      • Deane C.M.
      • Krawczyk K.
      Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires.
      ,
      • DeWitt W.S.
      • Lindau P.
      • Snyder T.M.
      • Sherwood A.M.
      • Vignali M.
      • Carlson C.S.
      • Greenberg P.D.
      • Duerkopp N.
      • Emerson R.O.
      • Robins H.S.
      A public database of memory and naive B-cell receptor sequences.
      )). This has provided researchers with easy access to a vast number of sequences and created opportunities for large-scale data mining. The Observed Antibody Space (OAS) database, for example, which collates full-length variable region sequences, currently contains over 1 billion sequences spanning 68 different studies (
      • Kovaltsuk A.
      • Leem J.
      • Kelm S.
      • Snowden J.
      • Deane C.M.
      • Krawczyk K.
      Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires.
      ).
      The studies included in OAS cover many different repertoire characteristics. Sequences are available for six different species, with the majority (64%) being human. Diseased states are represented (i.e. repertoires from individuals who have been exposed to a specific antigen) as well as healthy ones (meaning the individual has not been exposed to the antigen of interest and also has not suffered from a disorder of the immune system). Repertoires from vaccination studies also feature (e.g. HIV, hepatitis B, and flu), and in some cases, OAS has the repertoires of the same individual both pre- and post-immunization. Although the snapshots of the repertoire achieved through sequencing are small relative to the potential number of antibodies present in an organism (e.g. data sets in OAS contain between 20,000 and 300 million redundant sequences) and most studies feature only the heavy chain or have no pairing information, the available data still provide opportunities to investigate many different aspects of the immune response. In this review, we explore what can be done with the wealth of antibody sequence data stored in repositories such as OAS. We give examples of how these data have been used to give insights into the workings of the immune system, look at how they can be enhanced with structural information, explore how they offer new avenues for therapeutic antibody discovery and development, and consider what advances may be made in the future.

      Biological insights from antibody repertoire data

      Until the advent of BCR repertoire sequencing, antibody sequences were analyzed in much smaller numbers (normally a few hundred B cells per experiment (
      • Georgiou G.
      • Ippolito G.C.
      • Beausang J.
      • Busse C.E.
      • Wardemann H.
      • Quake S.R.
      The promise and challenge of high-throughput sequencing of the antibody repertoire.
      )), only a tiny fraction of the estimated total repertoire. This approach can be useful when investigating a few key antibodies (e.g. those that bind to an antigen of interest (e.g. Refs.
      • Wrammert J.
      • Smith K.
      • Miller J.
      • Langley W.A.
      • Kokko K.
      • Larsen C.
      • Zheng N.Y.
      • Mays I.
      • Garman L.
      • Helms C.
      • James J.
      • Air G.M.
      • Capra J.D.
      • Ahmed R.
      • Wilson P.C.
      Rapid cloning of high-affinity human monoclonal antibodies against influenza virus.
      and
      • Yu X.
      • Tsibane T.
      • McGraw P.A.
      • House F.S.
      • Keefer C.J.
      • Hicar M.D.
      • Tumpey T.M.
      • Pappas C.
      • Perrone L.A.
      • Martinez O.
      • Stevens J.
      • Wilson I.A.
      • Aguilar P.V.
      • Altschuler E.L.
      • Basler C.F.
      • Crowe Jr., J.E.
      Neutralizing antibodies derived from the B cells of 1918 influenza pandemic survivors.
      )) but cannot give an in-depth view of the repertoire as a whole (e.g. little can be learned about its diversity). Analysis of larger repertoire snapshots, on the other hand, gives a much more detailed picture and can provide valuable insights into how the immune system works. Such analysis can help explain how the immune system, in its naive state (i.e. before exposure to a given antigen), is capable of protecting against such diverse threats, and it can give a deeper understanding of the processes that produce higher-affinity antibodies after antigen exposure.
      Sequencing data have been used to learn more about the underlying mechanisms that shape the repertoire, such as V(D)J recombination (
      • Frost S.D.
      • Murrell B.
      • Hossain A.S.M.
      • Silverman G.J.
      • Pond S.L.
      Assigning and visualizing germline genes in antibody repertoires.
      ,
      • Miho E.
      • Yermanos A.
      • Weber C.R.
      • Berger C.T.
      • Reddy S.T.
      • Greiff V.
      Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires.
      ). Increasing amounts of large-scale sequence data, along with the development of computational tools that annotate sequences with their V(D)J gene origins (
      • Gadala-Maria D.
      • Yaari G.
      • Uduman M.
      • Kleinstein S.H.
      Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.
      ,
      • Gupta N.T.
      • Vander Heiden J.A.
      • Uduman M.
      • Gadala-Maria D.
      • Yaari G.
      • Kleinstein S.H.
      Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data.
      ,
      • Corcoran M.M.
      • Phad G.E.
      • Vázquez Bernat N.V.
      • Stahl-Hennig C.
      • Sumida N.
      • Persson M.A.
      • Martin M.
      • Karlsson Hedestam G.B.
      Production of individualized v gene databases reveals high levels of immunoglobulin genetic diversity.
      ,
      • Marcou Q.
      • Mora T.
      • Walczak A.M.
      High-throughput immune repertoire analysis with IGoR.
      ), have allowed trends in this process to be identified. It has been shown that the process is intrinsically biased; the available V, D, and J segments in the genome are not used with the same frequency, and therefore some combinations are observed more commonly than others (
      • Glanville J.
      • Zhai W.
      • Berka J.
      • Telman D.
      • Huerta G.
      • Mehta G.R.
      • Ni I.
      • Mei L.
      • Sundar P.D.
      • Day G.M.
      • Cox D.
      • Rajpal A.
      • Pons J.
      Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire.
      ,
      • Feeney A.J.
      • Tang A.
      • Ogwaro K.M.
      B-cell repertoire formation: role of the recombination signal sequence in non-random V segment utilization.
      ,
      • Greiff V.
      • Menzel U.
      • Miho E.
      • Weber C.
      • Riedel R.
      • Cook S.
      • Valai A.
      • Lopes T.
      • Radbruch A.
      • Winkler T.H.
      • Reddy S.T.
      Systems analysis reveals high genetic and antigen-driven predetermination of antibody repertoires throughout B cell development.
      ,
      • Weinstein J.A.
      • Jiang N.
      • White 3rd, R.A.
      • Fisher D.S.
      • Quake S.R.
      High-throughput sequencing of the zebrafish antibody repertoire.
      ,
      • Glanville J.
      • Kuo T.C.
      • von Büdingen H.C.
      • Guey L.
      • Berka J.
      • Sundar P.D.
      • Huerta G.
      • Mehta G.R.
      • Oksenberg J.R.
      • Hauser S.L.
      • Cox D.R.
      • Rajpal A.
      • Pons J.
      Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation.
      ). Mathematical models of V(D)J recombination have been developed that reproduce the natural biases (
      • Elhanati Y.
      • Sethna Z.
      • Marcou Q.
      • Callan Jr., C.G.
      • Mora T.
      • Walczak A.M.
      Inferring processes underlying B-cell repertoire diversity.
      ,
      • Elhanati Y.
      • Marcou Q.
      • Mora T.
      • Walczak A.M.
      RepgenHMM: A dynamic programming tool to infer the rules of immune receptor generation from sequence data.
      ). It has been proposed that this has the potential to aid in the discovery of new antibody therapeutics—replicating the underlying architecture of observed human repertoires should lead to the creation of more human-like (and hence less immunogenic) screening libraries (
      • Miho E.
      • Roškar R.
      • Greiff V.
      • Reddy S.T.
      Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
      ).
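The usage bias described above can be checked with a simple frequency count over annotated sequences. The sketch below assumes each sequence has already been assigned V and J genes by an annotation tool; the records and gene labels are hypothetical placeholders, not real data, and the uniform baseline is taken over the genes observed in the sample only.

```python
from collections import Counter

# Hypothetical annotated records: (sequence id, V gene call, J gene call).
records = [
    ("seq1", "V-a", "J-a"),
    ("seq2", "V-a", "J-b"),
    ("seq3", "V-a", "J-a"),
    ("seq4", "V-b", "J-a"),
]


def usage_frequencies(calls):
    """Return the fraction of sequences assigned to each gene (or gene pair)."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


v_usage = usage_frequencies(v for _, v, _ in records)
vj_usage = usage_frequencies((v, j) for _, v, j in records)

# With unbiased usage every observed gene would have the same frequency;
# a ratio above 1 means a gene is used more often than that baseline.
uniform = 1 / len(v_usage)
bias = {gene: freq / uniform for gene, freq in v_usage.items()}
print(v_usage, vj_usage, bias)
```

Counting V and J calls jointly, as in `vj_usage`, is what exposes combinations that are observed more commonly than others, rather than only individual genes.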
      During the proliferation of B cells in clonal selection, the rate of mutation is increased up to 10^6-fold (
      • Odegard V.H.
      • Schatz D.G.
      Targeting of somatic hypermutation.
      ) compared with normal cells, due to somatic hypermutation (as described earlier). Variations on the original antigen-binding antibody sequence are therefore generated, and higher-affinity antibodies are iteratively produced. Repertoire data have been used to analyze this process (
      • Yaari G.
      • Uduman M.
      • Kleinstein S.H.
      Quantifying selection in high-throughput immunoglobulin sequencing data sets.
      ,
      • Sheng Z.
      • Schramm C.A.
      • Kong R.
      • Mullikin J.C.
      • Mascola J.R.
      • Kwong P.D.
      • Shapiro L.
      • NISC Comparative Sequencing Program
      Gene-specific substitution profiles describe the types and frequencies of amino acid changes during antibody somatic hypermutation.
      ,
      • Yaari G.
      • Vander Heiden J.A.
      • Uduman M.
      • Gadala-Maria D.
      • Gupta N.
      • Stern J.N.H.
      • O'Connor K.C.
      • Hafler D.A.
      • Laserson U.
      • Vigneault F.
      • Kleinstein S.H.
      Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data.
      ,
      • Hoehn K.B.
      • Lunter G.
      • Pybus O.G.
      A phylogenetic codon substitution model for antibody lineages.
      ,
      • Horns F.
      • Vollmers C.
      • Dekker C.L.
      • Quake S.R.
      Signatures of selection in the human antibody repertoire: selective sweeps, competing subclones, and neutral drift.
      ). This has increased our understanding of mutation frequencies, substitution bias, and the location of mutation hot spots and, hence, how the repertoire reacts to an antigenic stimulus. For example, researchers have demonstrated that memory cells of different isotypes experience different selection pressures (
      • Yaari G.
      • Uduman M.
      • Kleinstein S.H.
      Quantifying selection in high-throughput immunoglobulin sequencing data sets.
      ) and that substitution profiles vary between V genes (
      • Sheng Z.
      • Schramm C.A.
      • Kong R.
      • Mullikin J.C.
      • Mascola J.R.
      • Kwong P.D.
      • Shapiro L.
      • NISC Comparative Sequencing Program
      Gene-specific substitution profiles describe the types and frequencies of amino acid changes during antibody somatic hypermutation.
      ), are dependent on neighboring bases, and are conserved across individuals (
      • Yaari G.
      • Vander Heiden J.A.
      • Uduman M.
      • Gadala-Maria D.
      • Gupta N.
      • Stern J.N.H.
      • O'Connor K.C.
      • Hafler D.A.
      • Laserson U.
      • Vigneault F.
      • Kleinstein S.H.
      Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data.
      ). As in the case of V(D)J recombination, these insights have enabled accurate models of somatic hypermutation to be established (
      • Hoehn K.B.
      • Lunter G.
      • Pybus O.G.
      A phylogenetic codon substitution model for antibody lineages.
      ,
      • Horns F.
      • Vollmers C.
      • Dekker C.L.
      • Quake S.R.
      Signatures of selection in the human antibody repertoire: selective sweeps, competing subclones, and neutral drift.
      ). These models have led to the creation of software that simulates repertoires (
      • Yermanos A.
      • Greiff V.
      • Krautler N.J.
      • Menzel U.
      • Dounas A.
      • Miho E.
      • Oxenius A.
      • Stadler T.
      • Reddy S.T.
      Comparison of methods for phylogenetic B-cell lineage inference using time-resolved antibody repertoire simulations (AbSim).
      ) and mean that more accurate B-cell lineages can be established (
      • Hoehn K.B.
      • Lunter G.
      • Pybus O.G.
      A phylogenetic codon substitution model for antibody lineages.
      ). These phylogenies have the potential to be used in the identification of antibodies with high binding affinities (
      • Horns F.
      • Vollmers C.
      • Dekker C.L.
      • Quake S.R.
      Signatures of selection in the human antibody repertoire: selective sweeps, competing subclones, and neutral drift.
      ).
      Researchers have also investigated the interplay between all the processes that dictate repertoire diversity to ascertain how much is genetically predetermined and how much is antigen-driven; analysis indicates that both are important factors, but genetics are more influential (
      • Greiff V.
      • Menzel U.
      • Miho E.
      • Weber C.
      • Riedel R.
      • Cook S.
      • Valai A.
      • Lopes T.
      • Radbruch A.
      • Winkler T.H.
      • Reddy S.T.
      Systems analysis reveals high genetic and antigen-driven predetermination of antibody repertoires throughout B cell development.
      ). Further research has compared the repertoires of humans and other species (
      • Collins A.M.
      • Jackson K.J.
      On being the right size: antibody repertoire formation in the mouse and human.
      ,
      • Skaggs H.
      • Chellman G.J.
      • Collinge M.
      • Enright B.
      • Fuller C.L.
      • Krayer J.
      • Sivaraman L.
      • Weinbauer G.F.
      Comparison of immune system development in nonclinical species and humans: closing information gaps for immunotoxicity testing and human translatability.
      ), revealing that immune system development is broadly similar across different mammals (
      • Skaggs H.
      • Chellman G.J.
      • Collinge M.
      • Enright B.
      • Fuller C.L.
      • Krayer J.
      • Sivaraman L.
      • Weinbauer G.F.
      Comparison of immune system development in nonclinical species and humans: closing information gaps for immunotoxicity testing and human translatability.
      ) and that mouse BCR repertoires tend to be closer to germline sequences than those of humans (
      • Collins A.M.
      • Jackson K.J.
      On being the right size: antibody repertoire formation in the mouse and human.
      ). The effect of disease on the immune system has also been studied (
      • Bashford-Rogers R.J.M.
      • Bergamaschi L.
      • McKinney E.F.
      • Pombal D.C.
      • Mescia F.
      • Lee J.C.
      • Thomas D.C.
      • Flint S.M.
      • Kellam P.
      • Jayne D.R.W.
      • Lyons P.A.
      • Smith K.G.C.
      Analysis of the B cell receptor repertoire in six immune-mediated diseases.
      ) and has indicated that repertoire analysis can have more practical applications; for example, it can be used to monitor the diversity of the repertoire before and after an organ transplant (
      • Lai L.
      • Zhou X.
      • Chen H.
      • Luo Y.
      • Sui W.
      • Zhang J.
      • Tang D.
      • Yan Q.
      • Dai Y.
      Composition and diversity analysis of the B-cell receptor immunoglobulin heavy chain complementarity determining region 3 repertoire in patients with acute rejection after kidney transplantation using high-throughput sequencing.
      ), and machine learning methods have been used to predict vaccination status or the presence of disease (
      • Greiff V.
      • Bhat P.
      • Cook S.C.
      • Menzel U.
      • Kang W.
      • Reddy S.T.
      A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status.
      ,
      • Ostmeyer J.
      • Christley S.
      • Rounds W.H.
      • Toby I.
      • Greenberg B.M.
      • Monson N.L.
      • Cowell L.G.
      Statistical classifiers for diagnosing disease from immune repertoires: a case study using multiple sclerosis.
      ,
      • Arora R.
      • Kaplinsky J.
      • Li A.
      • Arnaout R.
      Repertoire-based diagnostics using statistical biophysics.
      ).
      The overall architecture of the antibody repertoire can be investigated by inferring relationships between sequences (i.e. by predicting which ones originated from the same precursor antibody and hence which bind to the same antigen). One approach is to consider the repertoire as a network, with each sequence being a separate node and the presence of an edge between them indicating an evolutionary relationship (
      • Miho E.
      • Roškar R.
      • Greiff V.
      • Reddy S.T.
      Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
      ). These relationships are normally defined based on sequence identity; for example, two sequences can be connected if they differ by one amino acid in their H3 region (
      • Miho E.
      • Roškar R.
      • Greiff V.
      • Reddy S.T.
      Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
      ). Common network analysis metrics can then be used to explore the repertoire architecture—for example, the degree distribution (the degree of a node is the number of edges it is connected to) can reveal the presence or absence of clonal expansion (
      • Miho E.
      • Yermanos A.
      • Weber C.R.
      • Berger C.T.
      • Reddy S.T.
      • Greiff V.
      Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires.
      ), because highly connected nodes are likely to represent sequences derived from a common precursor during affinity maturation (Fig. 3).
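The network construction described above can be illustrated with a minimal sketch. It assumes that "differ by one amino acid" means a single substitution between H3 sequences of equal length (a Hamming distance of 1); published analyses may use other distance measures, and the function names here are illustrative only. The all-against-all comparison is quadratic in the number of sequences, so it is suitable only for small examples.

```python
from collections import Counter
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_h3_network(h3_sequences):
    """Return (nodes, edges): one node per unique H3 sequence, and an edge
    between two nodes of equal length that differ at exactly one position
    (an assumed similarity criterion)."""
    nodes = sorted(set(h3_sequences))
    edges = set()
    for a, b in combinations(nodes, 2):
        if len(a) == len(b) and hamming_distance(a, b) == 1:
            edges.add((a, b))
    return nodes, edges


def degree_distribution(nodes, edges):
    """Return the degree of each node and a histogram of degree values."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree, Counter(degree.values())


# Toy H3 sequences (made up for illustration).
h3s = ["ARDYW", "ARDFW", "ARDHW", "ARGYW", "TTTTTT"]
nodes, edges = build_h3_network(h3s)
degree, histogram = degree_distribution(nodes, edges)
print(degree["ARDYW"])  # 3: the most connected node in this toy set
print(histogram[0])     # 1: a single unconnected sequence
```

In this toy repertoire, the highly connected node would be the kind of sequence that the degree distribution flags as part of an expanded clone.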
      Figure 3. The process of affinity maturation and methods of analyzing the resulting antibody repertoires. A, upon exposure to an antigen, those antibodies present in the naive repertoire that are able to bind to it proliferate, undergoing somatic hypermutation to produce variations upon the initial binder. Successive rounds of this process produce antibodies with high affinity. B, clonotyping groups antibodies in the repertoire based on sequence similarity; normally they must originate from the same V and J genes and have an H3 sequence identity of 80–100%. Antibodies of the same clonotype are predicted to bind to the same epitope. C, network analysis of antibody repertoires, where each node is a different sequence and edges are present between them if they meet set sequence similarity criteria. The lineages of different antibodies can be inferred using this method.
      Clonotyping is another related way of investigating the diversity of repertoires and, in particular, how they change upon antigen exposure. Similar antibody sequences are clustered into “clonotypes”; these are generally defined as sequences originating from the same V and J genes and with H3s that are the same length and similar in sequence (normally a sequence identity of 80–100%) (
      • Jiang N.
      • He J.
      • Weinstein J.A.
      • Penland L.
      • Sasaki S.
      • He X.S.
      • Dekker C.L.
      • Zheng N.Y.
      • Huang M.
      • Sullivan M.
      • Wilson P.C.
      • Greenberg H.B.
      • Davis M.M.
      • Fisher D.S.
      • Quake S.R.
      Lineage structure of the human antibody repertoire in response to influenza vaccination.
      ,
      • Lindner C.
      • Thomsen I.
      • Wahl B.
      • Ugur M.
      • Sethi M.K.
      • Friedrichsen M.
      • Smoczek A.
      • Ott S.
      • Baumann U.
      • Suerbaum S.
      • Schreiber S.
      • Bleich A.
      • Gaboriau-Routhiau V.
      • Cerf-Bensussan N.
      • Hazanov H.
      • et al.
      Diversification of memory B cells drives the continuous adaptation of secretory antibodies to gut microbiota.
      ,
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Münz M.
      • Cerundolo V.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      In-depth assessment of within-individual and inter-individual variation in the B cell receptor repertoire.
      ,
      • Galson J.D.
      • Clutterbuck E.A.
      • Trück J.
      • Ramasamy M.N.
      • Münz M.
      • Fowler A.
      • Cerundolo V.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      BCR repertoire sequencing: different patterns of B-cell activation after two Meningococcal vaccines.
      ), although alternative approaches have been used (
      • Gupta N.T.
      • Adams K.D.
      • Briggs A.W.
      • Timberlake S.C.
      • Vigneault F.
      • Kleinstein S.H.
      Hierarchical clustering can identify B cell clones with high confidence in Ig repertoire sequencing data.
      ). Antibodies belonging to the same clonotype are assumed to share the same precursor sequence (i.e. they arose from the proliferation of the same B cell) and are therefore predicted to bind to the same epitope. Clonotyping thus provides a way to monitor the clonal selection and expansion that occur after exposure to an antigen, and it can be used to identify the antibodies that bind to a particular target.
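The clonotype definition given above (same V and J genes, same H3 length, and an H3 sequence identity above a threshold) can be sketched as follows. This is a simplified illustration rather than the procedure of any cited study: it assumes a greedy scheme in which each sequence is compared with the first member of each existing cluster, and the record layout and function names are hypothetical.

```python
from collections import defaultdict


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, non-empty strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype(records, min_identity=0.8):
    """Group records into clonotypes.

    records: list of (v_gene, j_gene, h3_sequence) tuples.
    Returns a list of clusters, each a list of record indices.
    Assumption: a sequence joins the first cluster whose representative
    (its first member) has an H3 identity of at least min_identity.
    """
    # Only sequences with the same V gene, J gene, and H3 length are compared.
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, h3) in enumerate(records):
        groups[(v_gene, j_gene, len(h3))].append(idx)

    clonotypes = []
    for members in groups.values():
        clusters = []
        for idx in members:
            h3 = records[idx][2]
            for cluster in clusters:
                representative = records[cluster[0]][2]
                if sequence_identity(h3, representative) >= min_identity:
                    cluster.append(idx)
                    break
            else:
                clusters.append([idx])  # no similar cluster found: start a new one
        clonotypes.extend(clusters)
    return clonotypes


# Toy records; the H3 sequences are made up for illustration.
records = [
    ("IGHV1-69", "IGHJ4", "ARDYYGSGSY"),
    ("IGHV1-69", "IGHJ4", "ARDYYGSGSF"),  # 90% identical to the first H3
    ("IGHV1-69", "IGHJ4", "ARGWLRPFDY"),  # same genes and length, dissimilar H3
    ("IGHV3-23", "IGHJ4", "ARDYYGSGSY"),  # identical H3 but a different V gene
]
print(clonotype(records))  # [[0, 1], [2], [3]]
```

A greedy scheme of this kind depends on the order of the input; hierarchical or single-linkage clustering within each V/J/length group is an alternative design choice.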
      Because the repertoires of many individuals have now been sequenced, we can compare them to identify which characteristics of the repertoire are shared and which are unique to each organism. The idea of “public sequences” has recently been proposed—a set of sequences or clonotypes that are observed in the repertoires of two or more individuals (
      • Briney B.
      • Inderbitzin A.
      • Joyce C.
      • Burton D.R.
      Commonality despite exceptional diversity in the baseline human antibody repertoire.
      ,
      • Miho E.
      • Roškar R.
      • Greiff V.
      • Reddy S.T.
      Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
      ,
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Münz M.
      • Cerundolo V.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      In-depth assessment of within-individual and inter-individual variation in the B cell receptor repertoire.
      ,
      • Soto C.
      • Bombardi R.G.
      • Branchizio A.
      • Kose N.
      • Matta P.
      • Sevy A.M.
      • Sinkovits R.S.
      • Gilchuk P.
      • Finn J.A.
      • Crowe J.E.
      High frequency of shared clonotypes in human B cell receptor repertoires.
      ,
      • DeKosky B.J.
      • Kojima T.
      • Rodin A.
      • Charab W.
      • Ippolito G.C.
      • Ellington A.D.
      • Georgiou G.
      In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire.
      ,
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Clutterbuck E.A.
      • Münz M.
      • Cerundolo V.
      • Reinhard C.
      • van der Most R.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences.
      ). One may expect that this is rare, due to the enormous potential number of sequences (estimated at 10¹³) and the relatively small proportion of those sequences sampled in current data sets (the largest samples from a single individual currently have on the order of 10⁶ sequences). However, whereas repertoires are largely unique to the organism (
      • Wang C.
      • Liu Y.
      • Cavanagh M.M.
      • Le Saux S.
      • Qi Q.
      • Roskin K.M.
      • Looney T.J.
      • Lee J.Y.
      • Dixit V.
      • Dekker C.L.
      • Swan G.E.
      • Goronzy J.J.
      • Boyd S.D.
      B-cell repertoire responses to varicella-zoster vaccination in human identical twins.
      ), it has been shown that individuals share more heavy-chain sequences than would be expected by coincidence. Briney et al. (
      • Briney B.
      • Inderbitzin A.
      • Joyce C.
      • Burton D.R.
      Commonality despite exceptional diversity in the baseline human antibody repertoire.
      ), in their recent large-scale study, showed that in the repertoires of 10 individuals, on average 0.95% of clonotypes were shared between at least two subjects, and 0.022% were common to all 10. The pool of subjects contained both men and women, individuals from both Caucasian and African American ethnic backgrounds, and a variety of blood types; the authors report that the repertoires did not cluster based on these factors. The work of Soto et al. (
      • Soto C.
      • Bombardi R.G.
      • Branchizio A.
      • Kose N.
      • Matta P.
      • Sevy A.M.
      • Sinkovits R.S.
      • Gilchuk P.
      • Finn J.A.
      • Crowe J.E.
      High frequency of shared clonotypes in human B cell receptor repertoires.
      ) indicates that this public subrepertoire could be even larger, making up between 1 and 6% of the whole. Greiff et al. (
      • Greiff V.
      • Weber C.R.
      • Palme J.
      • Bodenhofer U.
      • Miho E.
      • Menzel U.
      • Reddy S.T.
      Learning the high-dimensional immunogenomic features that predict public and private antibody repertoires.
      ) have used machine learning techniques, trained on publicly available data sets such as those in OAS, to predict the public or private nature of a given sequence with 80% accuracy, hinting that this property is not random and that there are fundamental characteristics of the sequences that separate the two subsets. In their network-based analysis of antibody H3 sequences, where each node is a unique H3 sequence, Miho et al. (
      • Miho E.
      • Roškar R.
      • Greiff V.
      • Reddy S.T.
      Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
      ) demonstrated that public clonotypes were among the most connected nodes (i.e. they are similar in sequence to many other nodes) and that most private clonotypes (74%) were connected to at least one public one. The removal of public clonotypes from the network therefore changed the underlying repertoire architecture; however, the system was robust to the removal of a large number of randomly selected clonotypes. This implies that public clonotypes are key in maintaining functional immunity against antigens, whereas the presence of other clonotypes is able to fluctuate over time.
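As a concrete illustration of how the size of a public subrepertoire might be quantified, the sketch below counts clonotypes seen in at least two individuals and in all individuals. It assumes that each repertoire has already been reduced to a set of clonotype labels and that fractions are taken over the union of all observed clonotypes; the cited studies may define their denominators differently.

```python
from collections import Counter


def public_fractions(repertoires):
    """Return (fraction shared by >= 2 subjects, fraction shared by all subjects).

    repertoires: dict mapping a subject identifier to an iterable of
    clonotype labels. Fractions are relative to the union of clonotypes
    across subjects (an assumed denominator).
    """
    subjects_per_clonotype = Counter()
    for clonotypes in repertoires.values():
        # set() ensures each subject contributes at most once per clonotype.
        subjects_per_clonotype.update(set(clonotypes))

    total = len(subjects_per_clonotype)
    n_subjects = len(repertoires)
    shared = sum(1 for n in subjects_per_clonotype.values() if n >= 2)
    universal = sum(1 for n in subjects_per_clonotype.values() if n == n_subjects)
    return shared / total, universal / total


# Toy data: three subjects and five distinct clonotype labels.
toy = {
    "subject_A": {"c1", "c2", "c3"},
    "subject_B": {"c2", "c3", "c4"},
    "subject_C": {"c3", "c5"},
}
print(public_fractions(toy))  # (0.4, 0.2)
```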
      Light-chain data have also been analyzed; VL sequences are less diverse than their VH counterparts (
      • Collins A.M.
      • Jackson K.J.
      On being the right size: antibody repertoire formation in the mouse and human.
      ,
      • Jackson K.J.L.
      • Wang Y.
      • Gaeta B.A.
      • Pomat W.
      • Siba P.
      • Rimmer J.
      • Sewell W.A.
      • Collins A.M.
      Divergent human populations show extensive shared IGK rearrangements in peripheral blood B cells.
      ,
      • Hoi K.H.
      • Ippolito G.C.
      Intrinsic bias and public rearrangements in the human immunoglobulin Vλ light chain repertoire.
      ), so the percentage of the repertoire comprising public sequences is much larger. For instance, Soto et al., in a three-individual experiment, observed that 20–34% of light chains (of both κ and λ types) were shared by at least two people (
      • Soto C.
      • Bombardi R.G.
      • Branchizio A.
      • Kose N.
      • Matta P.
      • Sevy A.M.
      • Sinkovits R.S.
      • Gilchuk P.
      • Finn J.A.
      • Crowe J.E.
      High frequency of shared clonotypes in human B cell receptor repertoires.
      ).
      Overall, the presence of shared clonotypes across different individuals, although they form only a small fraction of the repertoire, may signal the existence of a baseline common functionality of the immune system. This core subset of the repertoire may be responsible for an organism's response to common antigens (
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Clutterbuck E.A.
      • Münz M.
      • Cerundolo V.
      • Reinhard C.
      • van der Most R.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences.
      ), and it has been hypothesized that these public clonotypes are more likely to display low levels of immunogenicity and be more versatile binders and hence may be useful starting points in therapeutic development (
      • Setliff I.
      • McDonnell W.J.
      • Raju N.
      • Bombardi R.G.
      • Murji A.A.
      • Scheepers C.
      • Ziki R.
      • Mynhardt C.
      • Shepherd B.E.
      • Mamchak A.A.
      • Garrett N.
      • Karim S.A.
      • Mallal S.A.
      • Crowe Jr., J.E.
      • Morris L.
      • et al.
      Multi-donor longitudinal antibody repertoire sequencing reveals the existence of public antibody clonotypes in HIV-1 infection.
      ).
      M. I. J. Raybould, C. Marks, A. Kovaltsuk, A. P. Lewis, J. Shi, and C. M. Deane, manuscript in preparation.

      Combining sequence with structure

      Although much can be learned from sequences alone, it is the three-dimensional structure of the antibody that determines how it interacts with an antigen and therefore governs its binding properties (
      • Sela-Culang I.
      • Kunik V.
      • Ofran Y.
      The structural basis of antibody-antigen recognition.
      ,
      • Peng H.P.
      • Lee K.H.
      • Jian J.W.
      • Yang A.S.
      Origins of specificity and affinity in antibody-protein interactions.
      ). It is known that CDRs belonging to the same canonical class (i.e. that have nearly identical structures) can have very different sequences, and conversely H3 loops with similar sequences can adopt different conformations (Fig. 4) (
      • Kovaltsuk A.
      • Krawczyk K.
      • Galson J.D.
      • Kelly D.F.
      • Deane C.M.
      • Trück J.
      How B-cell receptor repertoire sequencing can be enriched with structural antibody data.
      ). Therefore, by considering sequence alone (e.g. in clonotyping), antibodies may be grouped together that have structurally dissimilar binding sites, and vice versa (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ). It is therefore crucial to consider structure as well as sequence to allow more accurate comparisons to be made and to properly understand antibody function.
      Figure 4. Sequence is not always a reliable indicator of structural similarity. A, L1 loops of the PDB entries 3PHO (red) and 3QUM (blue). The two loops differ in sequence at every position except one (sequence identity = 10%); however, they have very similar conformations (RMSD = 0.60 Å). B, H3 loops of the PDB entries 5I1G (red) and 5I1C (blue). These loops have very similar sequences (sequence identity = 92%) and therefore may be predicted to have similar structures; however, this is not the case (RMSD = 4.15 Å). RMSDs were calculated across all backbone atoms after superposition of the anchor residues (two residues on each side of the loop, shown in gray).
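The two quantities compared in Fig. 4 can be computed as in the following sketch. It assumes that the two loops are of equal length and that their backbone coordinates have already been superposed on the anchor residues; the superposition step itself is omitted. The sequences and coordinates are toy values, not those of the PDB entries named in the caption.

```python
import math


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length loop sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("loops must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def backbone_rmsd(coords_a, coords_b) -> float:
    """Root mean square deviation between matched (x, y, z) atom positions.

    Assumes the coordinates are already in a common frame (e.g. after
    superposition of the anchor residues).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy loops that share only their final residue.
print(sequence_identity("RASQSISSYL", "KGTHDVNPWL"))  # 0.1

# Toy coordinates: the second set is the first shifted by 1 Å along z.
atoms_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
atoms_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(backbone_rmsd(atoms_a, atoms_b))  # 1.0
```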
      Antibody structures can be obtained experimentally, normally through X-ray crystallography or NMR. However, the sequence-structure gap is large—whereas OAS consists of over a billion sequences, SAbDab, a database of publicly available antibody structures (
      • Dunbar J.
      • Krawczyk K.
      • Leem J.
      • Baker T.
      • Fuchs A.
      • Georges G.
      • Shi J.
      • Deane C.M.
      SAbDab: the structural antibody database.
      ), currently contains ∼4000 entries. This is because experimental structure determination is time-consuming and hence low-throughput; as such, it can be used to probe the chemistry of a select few sequences (
      • Li Y.
      • Li H.
      • Yang F.
      • Smith-Gill S.J.
      • Mariuzza R.A.
      X-ray snapshots of the maturation of an antibody response to a protein antigen.
      ,
      • Huang K.A.
      • Rijal P.
      • Jiang H.
      • Wang B.
      • Schimanski L.
      • Dong T.
      • Liu Y.M.
      • Chang P.
      • Iqbal M.
      • Wang M.C.
      • Chen Z.
      • Song R.
      • Huang C.C.
      • Yang J.H.
      • Qi J.
      • et al.
      Structure-function analysis of neutralizing antibodies to H7N9 influenza from naturally infected humans.
      ), but it cannot yet be used to structurally characterize a BCR repertoire.
      Computational modeling offers an alternative. It has been shown that the majority of antibody sequences from BCR repertoires can be mapped to known structures (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ). A number of algorithms have been developed that predict the structure of an antibody’s Fv region from its sequence (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ,
      • Klausen M.S.
      • Anderson M.V.
      • Jespersen M.C.
      • Nielsen M.
      • Marcatili P.
      LYRA, a webserver for lymphocyte receptor structural modeling.
      ,
      • Marcatili P.
      • Rosi A.
      • Tramontano A.
      PIGS: automatic prediction of antibody structures.
      ,
      • Yamashita K.
      • Ikeda K.
      • Amada K.
      • Liang S.
      • Tsuchiya Y.
      • Nakamura H.
      • Shirai H.
      • Standley D.M.
      Kotai antibody builder: automated high-resolution structural modeling of antibodies.
      ,
      • Sivasubramanian A.
      • Sircar A.
      • Chaudhury S.
      • Gray J.J.
      Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking.
      ,
      • Weitzner B.D.
      • Jeliazkov J.R.
      • Lyskov S.
      • Marze N.
      • Kuroda D.
      • Frick R.
      • Adolf-Bryfogle J.
      • Biswas N.
      • Dunbrack Jr., R.L.
      • Gray J.J.
      Modeling and docking of antibody structures with Rosetta.
      ,
      • Kemmish H.
      • Fasnacht M.
      • Yan L.
      Fully automated antibody structure prediction using BIOVIA tools: validation study.
      ,
      • Bujotzek A.
      • Fuchs A.
      • Qu C.
      • Benz J.
      • Klostermann S.
      • Antes I.
      • Georges G.
      MoFvAb: modeling the Fv region of antibodies.
      ,
      • Whitelegg N.R.J.
      • Rees A.R.
      WAM: an improved algorithm for modelling antibodies on the WEB.
      ,
      • Zhu K.
      • Day T.
      • Warshaviak D.
      • Murrett C.
      • Friesner R.
      • Pearlman D.
      Antibody structure determination using a combination of homology modeling, energy-based refinement and loop prediction.
      ,
      • Maier J.K.X.
      • Labute P.
      Assessment of fully automated antibody homology modeling protocols in molecular operating environment.
      ,
      • Mandal C.
      • Kingery B.D.
      • Anchin J.M.
      • Subramaniam S.
      • Linthicum D.S.
      ABGEN: a knowledge-based automated approach for antibody structure modeling.
      ,
      • Berrondo M.
      • Kaufmann S.
      • Berrondo M.
      Automated Aufbau of antibody structures from given sequences using Macromoltek’s SmrtMolAntibody.
      ,
      • Lapidoth G.
      • Parker J.
      • Prilusky J.
      • Fleishman S.J.
      AbPredict 2: A server for accurate and unstrained structure prediction of antibody variable domains.
      ). Due to the conserved nature of the antibody framework structure (see Fig. 1) and the existence of canonical classes, these tools generally rely on homology modeling (i.e. an existing structure with high sequence identity to a segment of or to the whole target is used as a template). Normally the structure is considered as separate regions, first the frameworks of the VH and VL and then the six CDRs. Separate templates may be chosen for the VH and VL; however, if a single template is available with high sequence identity to both chains, only one is required (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ). In this case, the orientation of the two chains can be directly copied from the chosen template; otherwise, a further template that is similar in sequence to both chains is required, or the orientation between the chains must be predicted (
      • Bujotzek A.
      • Dunbar J.
      • Lipsmeier F.
      • Schäfer W.
      • Antes I.
      • Deane C.M.
      • Georges G.
      Prediction of VH-VL domain orientation for antibody variable domain modeling.
      ). The framework can be modeled with very high accuracy, typically with a root mean square deviation (RMSD) of below 1 Å. In the second Antibody Modelling Assessment (AMA-II), a blind test of prediction accuracy, VH and VL were modeled with an average backbone-atom RMSD of 0.65 and 0.50 Å, respectively (
      • Zhu K.
      • Day T.
      • Warshaviak D.
      • Murrett C.
      • Friesner R.
      • Pearlman D.
      Antibody structure determination using a combination of homology modeling, energy-based refinement and loop prediction.
      ,
      • Maier J.K.X.
      • Labute P.
      Assessment of fully automated antibody homology modeling protocols in molecular operating environment.
      ,
      • Berrondo M.
      • Kaufmann S.
      • Berrondo M.
      Automated Aufbau of antibody structures from given sequences using Macromoltek’s SmrtMolAntibody.
      ,
      • Teplyakov A.
      • Luo J.
      • Obmolova G.
      • Malia T.J.
      • Sweet R.
      • Stanfield R.L.
      • Kodangattil S.
      • Almagro J.C.
      • Gilliland G.L.
      Antibody modeling assessment II. Structures and models.
      ,
      • Fasnacht M.
      • Butenhof K.
      • Goupil-Lamy A.
      • Hernandez-Guzman F.
      • Huang H.
      • Yan L.
      Automated antibody structure prediction using Accelrys tools: results and best practices.
      ,
      • Shirai H.
      • Ikeda K.
      • Yamashita K.
      • Tsuchiya Y.
      • Sarmiento J.
      • Liang S.
      • Morokata T.
      • Mizuguchi K.
      • Higo J.
      • Standley D.M.
      • Nakamura H.
      High-resolution modeling of antibody structures by a combination of bioinformatics, expert knowledge, and molecular simulations.
      ,
      • Weitzner B.D.
      • Kuroda D.
      • Marze N.
      • Xu J.
      • Gray J.J.
      Blind prediction performance of RosettaAntibody 3.0: grafting, relaxation, kinematic loop modeling, and full CDR optimization.
      ). Prediction of the orientation of the two domains was more challenging, however, with predicted tilt angles differing from the true angle by 5–12° (
      • Teplyakov A.
      • Luo J.
      • Obmolova G.
      • Malia T.J.
      • Sweet R.
      • Stanfield R.L.
      • Kodangattil S.
      • Almagro J.C.
      • Gilliland G.L.
      Antibody modeling assessment II. Structures and models.
      ).
      Once a framework template has been selected, CDR structures can then be predicted, again using templates through knowledge-based loop modeling algorithms. As mentioned previously, in the majority of cases, CDRs L1–L3, H1, and H2 adopt one of a limited number of conformations, known as canonical classes (
      • Chothia C.
      • Lesk A.M.
      Canonical structures for the hypervariable regions of immunoglobulins.
      ,
      • North B.
      • Lehmann A.
      • Dunbrack Jr., R.L.
      A new clustering of antibody CDR loop conformations.
      ,
      • Nowak J.
      • Baker T.
      • Georges G.
      • Kelm S.
      • Klostermann S.
      • Shi J.
      • Sridharan S.
      • Deane C.M.
      Length-independent structural similarities enrich the antibody CDR canonical class model.
      ). As a result, they can be predicted accurately and quickly using this technique. Templates are selected from a database of known CDR structures based on sequence identity and the geometry of the anchor residues (the residues on either side of the CDR). The database of CDR structures can either include all known structures or be limited to the known conformations for the predicted canonical class of the target (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ,
      • Marcatili P.
      • Rosi A.
      • Tramontano A.
      PIGS: automatic prediction of antibody structures.
      ). Average RMSDs achieved during AMA-II ranged from 0.50 Å for L2 to 1.6 Å for L3 (
      • Teplyakov A.
      • Luo J.
      • Obmolova G.
      • Malia T.J.
      • Sweet R.
      • Stanfield R.L.
      • Kodangattil S.
      • Almagro J.C.
      • Gilliland G.L.
      Antibody modeling assessment II. Structures and models.
      ).
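Template selection of the kind described above can be sketched in a few lines. The example is a deliberate simplification and does not reproduce any of the cited tools: anchor geometry is reduced to a single hypothetical number (the separation between the two anchor residues), candidate templates must match the target loop length, and the remaining candidates are ranked by sequence identity. All names, identifiers, and values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdrTemplate:
    template_id: str          # illustrative label, not a real database entry
    sequence: str             # CDR loop sequence of the template
    anchor_separation: float  # hypothetical anchor-to-anchor distance in Å


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, non-empty strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_template(target_seq, target_anchor_separation, templates,
                    max_anchor_diff=1.0):
    """Pick the template with the highest sequence identity among those with
    the same loop length and a compatible anchor separation.

    Returns None when no template passes the filters.
    """
    candidates = [
        t for t in templates
        if len(t.sequence) == len(target_seq)
        and abs(t.anchor_separation - target_anchor_separation) <= max_anchor_diff
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda t: sequence_identity(target_seq, t.sequence))


# Toy template database (made-up entries).
database = [
    CdrTemplate("tmpl_A", "QQSYSTPYT", 5.0),
    CdrTemplate("tmpl_B", "QQYNSYPLT", 5.2),
    CdrTemplate("tmpl_C", "QQSYSTPLT", 9.0),  # identical sequence, incompatible anchors
    CdrTemplate("tmpl_D", "QQSY", 5.1),       # wrong loop length
]
best = select_template("QQSYSTPLT", 5.1, database)
print(best.template_id)  # tmpl_A
```

Restricting the database to templates from the predicted canonical class, as mentioned above, would amount to one more filter on the candidate list.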
      H3 can also be modeled using this method; however, its greater sequence and structural diversity compared with the other CDRs makes prediction more challenging (
      • Marks C.
      • Deane C.M.
      Antibody H3 structure prediction.
      ). The H3 loop has also been shown to be structurally distinct from typical protein loops (
      • Regep C.
      • Georges G.
      • Shi J.
      • Popovic B.
      • Deane C.M.
      The H3 loop of antibodies shows unique structural characteristics.
      ); researchers have therefore developed specialized software to model H3 loops more accurately (
      • Marks C.
      • Nowak J.
      • Klostermann S.
      • Georges G.
      • Dunbar J.
      • Shi J.
      • Kelm S.
      • Deane C.M.
      Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.
      ,
      • Messih M.A.
      • Lepore R.
      • Marcatili P.
      • Tramontano A.
      Improving the accuracy of the structure prediction of the third hypervariable loop of the heavy chains of antibodies.
      ,
      • Choi Y.
      • Deane C.M.
      Predicting antibody complementarity determining region structures without classification.
      ,
      • Zhu K.
      • Day T.
      Ab initio structure prediction of the antibody hypervariable H3 loop.
      ). Ab initio techniques, which create potential loop conformations without knowledge of templates, are often used here, either in isolation or in combination with knowledge-based strategies as a hybrid algorithm (
      • Marks C.
      • Nowak J.
      • Klostermann S.
      • Georges G.
      • Dunbar J.
      • Shi J.
      • Kelm S.
      • Deane C.M.
      Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.
      ). Despite the existence of H3-specific prediction algorithms, H3 modeling remains challenging, with RMSDs normally in the region of 2–3 Å (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ,
      • Teplyakov A.
      • Luo J.
      • Obmolova G.
      • Malia T.J.
      • Sweet R.
      • Stanfield R.L.
      • Kodangattil S.
      • Almagro J.C.
      • Gilliland G.L.
      Antibody modeling assessment II. Structures and models.
      ). In addition, ab initio methods typically require much longer run times than knowledge-based methods, and therefore H3 prediction is currently the main bottleneck for accurate modeling of BCR repertoires. Attempts have been made to circumvent this issue, either by imposing an H3 length cutoff (long loops are modeled less accurately due to the absence of experimental data) (
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ) or by only considering those H3 sequences that can be confidently modeled using a knowledge-based algorithm (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ,
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ).¹ Although this may introduce some biases into the analysis (for example, long H3 loop structures will be underrepresented in model libraries), it increases the confidence we have in the models that are considered and subsequently in the conclusions that are drawn.
      Several studies have used antibody modeling to enhance the information given by BCR repertoires. DeKosky et al. (
      • DeKosky B.J.
      • Lungu O.I.
      • Park D.
      • Johnson E.L.
      • Charab W.
      • Chrysostomou C.
      • Kuroda D.
      • Ellington A.D.
      • Ippolito G.C.
      • Gray J.J.
      • Georgiou G.
      Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires.
      ) modeled 2,000 VH/VL pairs using RosettaAntibody (
      • Sivasubramanian A.
      • Sircar A.
      • Chaudhury S.
      • Gray J.J.
      Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking.
      ,
      • Weitzner B.D.
      • Jeliazkov J.R.
      • Lyskov S.
      • Marze N.
      • Kuroda D.
      • Frick R.
      • Adolf-Bryfogle J.
      • Biswas N.
      • Dunbrack Jr., R.L.
      • Gray J.J.
      Modeling and docking of antibody structures with Rosetta.
      ), limiting their sequences to those with high-identity templates available. They analyzed the physico-chemical properties of the antibodies, such as solvent-accessible surface area and hydrophobicity, and were able to demonstrate how these properties change with antigen experience and link their observations to germline usage. Raybould et al. (
      • Raybould M.I.J.
      • Marks C.
      • Krawczyk K.
      • Taddese B.
      • Nowak J.
      • Lewis A.P.
      • Bujotzek A.
      • Shi J.
      • Deane C.M.
      Five computational developability guidelines for therapeutic antibody profiling.
      ) used ABodyBuilder (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ) to predict the structures of a large subset of a BCR repertoire (∼19,000 sequences) and compared these models with those of a set of therapeutics to deduce which properties are required to reduce developability issues. Because antibody properties can be predicted with greater accuracy with the inclusion of structural data (
      • Krawczyk K.
      • Liu X.
      • Baker T.
      • Shi J.
      • Deane C.M.
      Improving B-cell epitope prediction and its application to global antibody-antigen docking.
      ), models representing the repertoire have the potential to improve strategies such as directed design when used as inputs to other computational tools (e.g. developability predictors and predictors of the sets of residues on the antibody and antigen that are involved in binding, known as the paratope and epitope, respectively).
      One problem with modeling the antibody sequences obtained through repertoire sequencing is that they are normally not paired (i.e. we do not know which VH belongs with which VL). Native pairings are important in creating accurate models that represent the repertoire and will affect the properties of the antibody, such as its folding, stability, expression, and binding. Pairing is currently thought to be mostly random (
      • Briney B.
      • Inderbitzin A.
      • Joyce C.
      • Burton D.R.
      Commonality despite exceptional diversity in the baseline human antibody repertoire.
      ,
      • DeKosky B.J.
      • Kojima T.
      • Rodin A.
      • Charab W.
      • Ippolito G.C.
      • Ellington A.D.
      • Georgiou G.
      In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire.
      ), meaning that most VH chains are capable of associating with most VLs. Prediction of true pairings is therefore difficult. Techniques currently used to propose likely pairings include comparing all of the potential interfaces with those observed in known structures (
      • Raybould M.I.J.
      • Marks C.
      • Krawczyk K.
      • Taddese B.
      • Nowak J.
      • Lewis A.P.
      • Bujotzek A.
      • Shi J.
      • Deane C.M.
      Five computational developability guidelines for therapeutic antibody profiling.
      ), pairing based on the relative frequency of the sequences (
      • Reddy S.T.
      • Ge X.
      • Miklos A.E.
      • Hughes R.A.
      • Kang S.H.
      • Hoi K.H.
      • Chrysostomou C.
      • Hunicke-Smith S.P.
      • Iverson B.L.
      • Tucker P.W.
      • Ellington A.D.
      • Georgiou G.
      Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells.
      ), and constructing phylogenetic trees (
      • Zhu J.
      • Ofek G.
      • Yang Y.
      • Zhang B.
      • Louder M.K.
      • Lu G.
      • McKee K.
      • Pancera M.
      • Skinner J.
      • Zhang Z.
      • Parks R.
      • Eudailey J.
      • Lloyd K.E.
      • Blinn J.
      • Alam S.M.
      • et al.
      Mining the antibodyome for HIV-1-neutralizing antibodies with next-generation sequencing and phylogenetic pairing of heavy/light chains.
      ). Recently, experimental methods for immunoglobulin sequencing that preserve native pairings have been developed (
      • DeKosky B.J.
      • Ippolito G.C.
      • Deschner R.P.
      • Lavinder J.J.
      • Wine Y.
      • Rawlings B.M.
      • Varadarajan N.
      • Giesecke C.
      • Dörner T.
      • Andrews S.F.
      • Wilson P.C.
      • Hunicke-Smith S.P.
      • Willson C.G.
      • Ellington A.D.
      • Georgiou G.
      High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire.
      ); as these techniques become more widespread, the amount of paired data will increase, and these approximations will no longer be required.
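      As a rough illustration of the frequency-based pairing idea mentioned above, the following minimal Python sketch pairs the k-th most abundant heavy chain with the k-th most abundant light chain. It is a simplification written for this discussion, not the published protocol; the function name, the `top_n` parameter, and the toy chain labels are all hypothetical.

```python
from collections import Counter


def pair_by_frequency_rank(heavy_reads, light_reads, top_n=5):
    """Pair the k-th most frequent heavy chain with the k-th most
    frequent light chain (an illustrative simplification)."""
    heavy_ranked = Counter(heavy_reads).most_common(top_n)
    light_ranked = Counter(light_reads).most_common(top_n)
    # zip stops at the shorter list, so unmatched chains are dropped.
    return [(h, l) for (h, _), (l, _) in zip(heavy_ranked, light_ranked)]


# Toy input: each list element stands for one sequenced read (hypothetical labels).
heavy = ["VH_A"] * 5 + ["VH_B"] * 3 + ["VH_C"]
light = ["VL_X"] * 4 + ["VL_Y"] * 2 + ["VL_Z"]
pairs = pair_by_frequency_rank(heavy, light, top_n=2)
```

      Rank-based pairing of this kind is only plausible for highly expanded clones, where abundance ranks are well separated; ties and low-count sequences give arbitrary pairings.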
      Producing complete models of the antibody variable region can be time-consuming; for example, in the study by DeKosky et al. (
      • DeKosky B.J.
      • Lungu O.I.
      • Park D.
      • Johnson E.L.
      • Charab W.
      • Chrysostomou C.
      • Kuroda D.
      • Ellington A.D.
      • Ippolito G.C.
      • Gray J.J.
      • Georgiou G.
      Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires.
      ), RosettaAntibody took 570,000 CPU hours to produce 2,000 models. Even for algorithms that are considered to be fast, execution times would be prohibitive—ABodyBuilder, for example, takes on average 567 CPU hours per 1,000 sequences (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ). A faster alternative for characterizing a repertoire is the structural annotation of sequences. Instead of running a complete modeling protocol, sequences can be quickly matched to their predicted templates using sequence identity. The conformations of the CDRs can be assigned by exploiting either a knowledge-based loop modeling algorithm (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ) or a canonical class predictor (for the non-H3 CDRs) (
      • Nowak J.
      • Baker T.
      • Georges G.
      • Kelm S.
      • Klostermann S.
      • Shi J.
      • Sridharan S.
      • Deane C.M.
      Length-independent structural similarities enrich the antibody CDR canonical class model.
      ,
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ). Sequences can therefore be structurally annotated in much greater numbers than would be feasible with full modeling tools. It has been shown that the majority of sequences can be mapped to an existing structure in this way (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ).
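      The template-matching step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the scoring used by any published tool: templates are restricted to the same length as the query, identity is the plain fraction of matching positions, and the 0.8 cutoff, the template identifiers, and the toy sequences are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of identical positions between two equal-length sequences.
    Sequences of different length score 0.0 (an assumption of this sketch)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_template(query, templates, min_identity=0.8):
    """Return (template_id, identity) for the closest template, or None
    if no template reaches the identity cutoff."""
    best = None
    for template_id, template_seq in templates.items():
        identity = sequence_identity(query, template_seq)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (template_id, identity)
    return best


# Hypothetical CDR-H3 loops taken from solved structures.
templates = {"tmpl_1": "ARDYYGSSYFDY", "tmpl_2": "ARDYWGSGYFDV"}
match = best_template("ARDYYGSSYFDV", templates)
```

      Because each comparison is a single pass over the sequence, annotation of this kind scales to millions of sequences far more readily than building a full model for each one.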
      SAAB (Structural Annotation of Antibodies) (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ) and its successor SAAB+ (
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ) are algorithms that have been used to annotate millions of sequences with their proposed template structures, allowing thorough analysis of repertoire-wide structural properties. For example, Kovaltsuk et al. (
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ) investigated structural changes that occur with B-cell differentiation. Clustering antibodies by their proposed H3 templates separated those from different stages of the immune response, indicating that structural changes occur as the response progresses. The effect of aging on the repertoire has also been studied in this way, revealing that older individuals have a higher number of antibodies that are structurally distinct from the germline (
      • Ghraichy M.
      • Galson J.D.
      • Kovaltsuk A.
      • Niederhäusern V.V.
      • Trück J.
      Maturation of naïve and antigen-experienced B-cell receptor repertoires with age.
      ).
      The idea of public sequences has been extended to that of public structures. Instead of searching for sequences that are observed in the repertoires of multiple individuals, we can look for antibodies with shared backbone conformations, which may be a better indicator of common functionality. Sequence-only analyses have shown that the shared space is present but makes up only a small percentage of the overall repertoire (
      • Briney B.
      • Inderbitzin A.
      • Joyce C.
      • Burton D.R.
      Commonality despite exceptional diversity in the baseline human antibody repertoire.
      ); however, by incorporating structure it can be seen that the public repertoire is likely to be much larger (
      • Kovaltsuk A.
      • Raybould M.I.J.
      • Wong W.K.
      • Marks C.
      • Kelm S.
      • Snowden J.
      • Trück J.
      • Deane C.M.
      Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice.
      ).

      BCR repertoire sequencing and therapeutic discovery

      Discovering antibodies specific to an antigen of interest

      Currently, potential therapeutic antibodies are commonly discovered in two ways: through the immunization of an animal, such as a mouse, with the target antigen and subsequent extraction of the antibodies it produces and through phage display, where viruses displaying antibodies on their surface are screened against the target antigen. High-throughput sequencing of the antibody repertoire has been used successfully to enhance both approaches. For example, researchers have genetically engineered mice such that they contain human antibody genes—the antibodies produced by these mice are therefore less likely to be immunogenic. The “humanness” of the repertoire was validated through sequencing of the mouse BCR repertoire (
      • Lee E.C.
      • Liang Q.
      • Ali H.
      • Bayliss L.
      • Beasley A.
      • Bloomfield-Gerdes T.
      • Bonoli L.
      • Brown R.
      • Campbell J.
      • Carpenter A.
      • Chalk S.
      • Davis A.
      • England N.
      • Fane-Dremucheva A.
      • Franz B.
      • et al.
      Complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery.
      ). Sequencing techniques have also been used to characterize phage display libraries and monitor their diversity, and hence to evaluate their capacity to yield antibodies that bind to different antigens (
      • Lim C.C.
      • Choong Y.S.
      • Lim T.S.
      Cognizance of molecular methods for the generation of mutagenic phage display antibody libraries for affinity maturation.
      ). Screening libraries can also be designed using BCR repertoire data—Zhai et al. (
      • Zhai W.
      • Glanville J.
      • Fuhrmann M.
      • Mei L.
      • Ni I.
      • Sundar P.D.
      • Van Blarcom T.
      • Abdiche Y.
      • Lindquist K.
      • Strohner R.
      • Telman D.
      • Cappuccilli G.
      • Finlay W.J.
      • Van den Brulle J.
      • Cox D.R.
      • et al.
      Synthetic antibodies designed on natural sequence landscapes.
      ) and Prassler et al. (
      • Prassler J.
      • Thiel S.
      • Pracht C.
      • Polzer A.
      • Peters S.
      • Bauer M.
      • Nörenberg S.
      • Stark Y.
      • Kölln J.
      • Popp A.
      • Urlinger S.
      • Enzelberger M.
      HuCAL PLATINUM, a synthetic fab library optimized for sequence diversity and superior performance in mammalian expression systems.
      ) have shown how this is possible, by reproducing the observed amino acid usages at each sequence position. Both groups found that the antibodies in their libraries exhibited better expression levels than those in other synthetic libraries, with high genetic diversity, and they were able to isolate high-affinity antibodies for a range of different antigens.
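      To make the idea of reproducing observed amino acid usage concrete, the sketch below estimates per-position residue frequencies from a set of aligned sequences and samples new sequences from them. It is an illustration only and not the design procedure of either cited library: positions are sampled independently (ignoring any correlations between them), and the function names and toy sequences are hypothetical.

```python
import random
from collections import Counter


def position_frequencies(aligned_seqs):
    """Per-position amino acid frequencies from equal-length sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def sample_library(freqs, size, seed=0):
    """Draw `size` sequences, sampling each position independently
    according to its observed residue frequencies."""
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        residues = [
            rng.choices(list(col), weights=list(col.values()))[0]
            for col in freqs
        ]
        library.append("".join(residues))
    return library


# Toy "repertoire" of aligned fragments (hypothetical).
observed = ["ARDY", "ARDF", "AKDY", "ARDY"]
freqs = position_frequencies(observed)
library = sample_library(freqs, size=20)
```

      Sampling positions independently is the simplest possible choice; a real design would likely also need to respect framework constraints and exclude sequence motifs that cause developability problems.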
      It is now becoming possible to identify binders directly from BCR repertoire data. If an antibody that binds to the target antigen is already known, approaches such as clonotyping can be used to identify more potential binders with closely related sequences, expanding the pool of candidates that can be taken forward for further study. Known binders are not essential, however. The immunization of an organism with an antigen, as explained previously, leads to the enrichment of the repertoire with antibodies that bind to that antigen. Therefore, by analyzing how often a given sequence or clonotype appears in the repertoire after antigen exposure, specific antibodies can be identified. This approach can be used either to find antibodies that might work as therapeutics or to monitor the immune response during the development of vaccines (
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Clutterbuck E.A.
      • Münz M.
      • Cerundolo V.
      • Reinhard C.
      • van der Most R.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences.
      ,
      • Fowler A.
      • Galson J.D.
      • Trück J.
      • Kelly D.F.
      • Lunter G.
      Inferring B cell specificity for vaccines using a mixture model.
      ,
      • Galson J.D.
      • Pollard A.J.
      • Trück J.
      • Kelly D.F.
      Studying the antibody repertoire after vaccination: practical applications.
      ,
      • Devarajan P.
      • Swain S.L.
      Original antigenic sin: friend or foe in developing a broadly cross-reactive vaccine to influenza?.
      ,
      • Lee J.
      • Paparoditis P.
      • Horton A.P.
      • Frühwirth A.
      • McDaniel J.R.
      • Jung J.
      • Boutz D.R.
      • Hussein D.A.
      • Tanno Y.
      • Pappas L.
      • Ippolito G.C.
      • Corti D.
      • Lanzavecchia A.
      • Georgiou G.
      Persistent antibody clonotypes dominate the serum response to influenza over multiple years and repeated vaccinations.
      ,
      • Henry C.
      • Zheng N.Y.
      • Huang M.
      • Cabanov A.
      • Rojas K.T.
      • Kaur K.
      • Andrews S.F.
      • Palm A.-K.E.
      • Chen Y.Q.
      • Li Y.
      • Hoskova K.
      • Utset H.A.
      • Vieira M.C.
      • Wrammert J.
      • Ahmed R.
      • et al.
      Influenza virus vaccination elicits poorly adapted B cell responses in elderly individuals.
      ). The repertoires of multiple individuals who have been exposed to the same antigen can be investigated to find potential binders, by identifying common features that hint at shared functionality (e.g. identical H3 sequences) (
      • Trück J.
      • Ramasamy M.N.
      • Galson J.D.
      • Rance R.
      • Parkhill J.
      • Lunter G.
      • Pollard A.J.
      • Kelly D.F.
      Identification of antigen-specific B cell receptor sequences using public repertoire analysis.
      ). The volume of data produced also means that deep learning techniques can be used effectively; for example, Mason et al. (
      • Mason D.M.
      • Friedensohn S.
      • Weber C.R.
      • Jordi C.
      • Wagner B.
      • Meng S.
      • Gainza P.
      • Correia B.
      • Reddy S.T.
      Deep learning enables therapeutic antibody optimization in mammalian cells by deciphering high-dimensional protein sequence space.
      ) have generated neural networks that classify antibodies as HER2 binders or nonbinders based on sequence and thereby successfully identified 30 antigen-specific antibodies. BCR repertoire sequencing experiments have been carried out to discover binders for a wide range of antigens, including HIV (
      • Setliff I.
      • McDonnell W.J.
      • Raju N.
      • Bombardi R.G.
      • Murji A.A.
      • Scheepers C.
      • Ziki R.
      • Mynhardt C.
      • Shepherd B.E.
      • Mamchak A.A.
      • Garrett N.
      • Karim S.A.
      • Mallal S.A.
      • Crowe Jr., J.E.
      • Morris L.
      • et al.
      Multi-donor longitudinal antibody repertoire sequencing reveals the existence of public antibody clonotypes in HIV-1 infection.
      ,
      • Zhu J.
      • Ofek G.
      • Yang Y.
      • Zhang B.
      • Louder M.K.
      • Lu G.
      • McKee K.
      • Pancera M.
      • Skinner J.
      • Zhang Z.
      • Parks R.
      • Eudailey J.
      • Lloyd K.E.
      • Blinn J.
      • Alam S.M.
      • et al.
      Mining the antibodyome for HIV-1-neutralizing antibodies with next-generation sequencing and phylogenetic pairing of heavy/light chains.
      ,
      • Zhu J.
      • Wu X.
      • Zhang B.
      • McKee K.
      • O'Dell S.
      • Soto C.
      • Zhou T.
      • Casazza J.P.
      • NISC Comparative Sequencing Program
      • Mullikin J.C.
      • Kwong P.D.
      • Mascola J.R.
      • Shapiro L.
      • Becker J.
      • Benjamin B.
      • Blakesley R.
      De novo identification of VRC01 class HIV-1-neutralizing antibodies by next-generation sequencing of B-cell transcripts.
      ,
      • Walker L.M.
      • Huber M.
      • Doores K.J.
      • Falkowska E.
      • Pejchal R.
      • Julien J.P.
      • Wang S.K.
      • Ramos A.
      • Chan-Hui P.Y.
      • Moyle M.
      • Mitcham J.L.
      • Hammond P.W.
      • Olsen O.A.
      • Phung P.
      • Fling S.
      • et al.
      Broad neutralization coverage of HIV by multiple highly potent antibodies.
      ), Ebola (
      • Wang B.
      • Kluwe C.A.
      • Lungu O.I.
      • DeKosky B.J.
      • Kerr S.A.
      • Johnson E.L.
      • Tanno H.
      • Lee C.-H.
      • Jung J.
      • Rezigh A.B.
      • Carroll S.M.
      • Reyes A.N.
      • Bentz J.R.
      • Villanueva I.
      • Altman A.L.
      • et al.
      Facile discovery of a diverse panel of anti-Ebola virus antibodies by immune repertoire mining.
      ), hepatitis B (
      • Galson J.D.
      • Trück J.
      • Fowler A.
      • Clutterbuck E.A.
      • Münz M.
      • Cerundolo V.
      • Reinhard C.
      • van der Most R.
      • Pollard A.J.
      • Lunter G.
      • Kelly D.F.
      Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences.
      ,
      • Sato S.
      • Beausoleil S.A.
      • Popova L.
      • Beaudet J.G.
      • Ramenani R.K.
      • Zhang X.
      • Wieler J.S.
      • Schieferl S.M.
      • Cheung W.C.
      • Polakiewicz R.D.
      Proteomics-directed cloning of circulating antiviral human monoclonal antibodies.
      ), and many others (
      • Huang K.A.
      • Rijal P.
      • Jiang H.
      • Wang B.
      • Schimanski L.
      • Dong T.
      • Liu Y.M.
      • Chang P.
      • Iqbal M.
      • Wang M.C.
      • Chen Z.
      • Song R.
      • Huang C.C.
      • Yang J.H.
      • Qi J.
      • et al.
      Structure-function analysis of neutralizing antibodies to H7N9 influenza from naturally infected humans.
      ,
      • Reddy S.T.
      • Ge X.
      • Miklos A.E.
      • Hughes R.A.
      • Kang S.H.
      • Hoi K.H.
      • Chrysostomou C.
      • Hunicke-Smith S.P.
      • Iverson B.L.
      • Tucker P.W.
      • Ellington A.D.
      • Georgiou G.
      Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells.
      ,
      • Zhai W.
      • Glanville J.
      • Fuhrmann M.
      • Mei L.
      • Ni I.
      • Sundar P.D.
      • Van Blarcom T.
      • Abdiche Y.
      • Lindquist K.
      • Strohner R.
      • Telman D.
      • Cappuccilli G.
      • Finlay W.J.
      • Van den Brulle J.
      • Cox D.R.
      • et al.
      Synthetic antibodies designed on natural sequence landscapes.
      ,
      • Fowler A.
      • Galson J.D.
      • Trück J.
      • Kelly D.F.
      • Lunter G.
      Inferring B cell specificity for vaccines using a mixture model.
      ,
      • Devarajan P.
      • Swain S.L.
      Original antigenic sin: friend or foe in developing a broadly cross-reactive vaccine to influenza?.
      ,
      • Trück J.
      • Ramasamy M.N.
      • Galson J.D.
      • Rance R.
      • Parkhill J.
      • Lunter G.
      • Pollard A.J.
      • Kelly D.F.
      Identification of antigen-specific B cell receptor sequences using public repertoire analysis.
      ,
      • Sato S.
      • Beausoleil S.A.
      • Popova L.
      • Beaudet J.G.
      • Ramenani R.K.
      • Zhang X.
      • Wieler J.S.
      • Schieferl S.M.
      • Cheung W.C.
      • Polakiewicz R.D.
      Proteomics-directed cloning of circulating antiviral human monoclonal antibodies.
      ,
      • Saggy I.
      • Wine Y.
      • Shefet-Carasso L.
      • Nahary L.
      • Georgiou G.
      • Benhar I.
      Antibody isolation from immunized animals: comparison of phage display and antibody discovery via v gene repertoire mining.
      ,
      • Cheung W.C.
      • Beausoleil S.A.
      • Zhang X.
      • Sato S.
      • Schieferl S.M.
      • Wieler J.S.
      • Beaudet J.G.
      • Ramenani R.K.
      • Popova L.
      • Comb M.J.
      • Rush J.
      • Polakiewicz R.D.
      A proteomics approach for the identification and cloning of monoclonal antibodies from serum.
      ,
      • Ravn U.
      • Gueneau F.
      • Baerlocher L.
      • Osteras M.
      • Desmurs M.
      • Malinge P.
      • Magistrelli G.
      • Farinelli L.
      • Kosco-Vilbois M.H.
      • Fischer N.
      By-passing in vitro screening—next generation sequencing technologies applied to antibody display and in silico candidate selection.
      ,
      • Wang B.
      • DeKosky B.J.
      • Timm M.R.
      • Lee J.
      • Normandin E.
      • Misasi J.
      • Kong R.
      • McDaniel J.R.
      • Delidakis G.
      • Leigh K.E.
      • Niezold T.
      • Choi C.W.
      • Viox E.G.
      • Fahad A.
      • Cagigi A.
      • et al.
      Functional interrogation and mining of natively paired human v H:V L antibody repertoires.
      ,
      • Wang B.
      • Lee C.H.
      • Johnson E.L.
      • Kluwe C.A.
      • Cunningham J.C.
      • Tanno H.
      • Crooks R.M.
      • Georgiou G.
      • Ellington A.D.
      Discovery of high affinity anti-ricin antibodies by B cell receptor sequencing and by yeast display of combinatorial VH:VL libraries from immunized animals.
      ).
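      The enrichment analysis described earlier in this section, in which the frequency of a clonotype is compared before and after antigen exposure, can be sketched as follows. This is a minimal sketch under stated assumptions: a clonotype is defined here as an identical V gene, J gene, and CDR-H3 (real clonotyping usually allows some H3 mismatch), a pseudocount avoids division by zero, and the record field names and toy data are hypothetical.

```python
from collections import Counter


def clonotype_key(record):
    """Simplified clonotype definition: same V gene, J gene and CDR-H3."""
    return (record["v_gene"], record["j_gene"], record["cdrh3"])


def enrichment(pre, post, pseudocount=1.0):
    """Fold change in smoothed clonotype frequency after antigen exposure."""
    pre_counts = Counter(clonotype_key(r) for r in pre)
    post_counts = Counter(clonotype_key(r) for r in post)
    clonotypes = set(pre_counts) | set(post_counts)
    # Add the pseudocount to every clonotype in both samples.
    pre_total = len(pre) + pseudocount * len(clonotypes)
    post_total = len(post) + pseudocount * len(clonotypes)
    result = {}
    for key in clonotypes:
        f_pre = (pre_counts[key] + pseudocount) / pre_total
        f_post = (post_counts[key] + pseudocount) / post_total
        result[key] = f_post / f_pre
    return result


# Toy data: clonotype A expands after exposure, clonotype B contracts.
a = {"v_gene": "IGHV1-69", "j_gene": "IGHJ4", "cdrh3": "ARDYYGSSYFDY"}
b = {"v_gene": "IGHV3-23", "j_gene": "IGHJ6", "cdrh3": "AKGGWELLDY"}
fold = enrichment(pre=[a] * 4 + [b] * 4, post=[a] * 7 + [b] * 1)
ranked = sorted(fold, key=fold.get, reverse=True)
```

      Clonotypes at the top of `ranked` would be candidates for experimental follow-up; in practice, sequencing depth, sampling noise, and PCR bias all need to be accounted for before such fold changes are interpreted.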
      Following the isolation of binders in this way, a small number can be taken forward as starting points for further development (
      • Huang K.A.
      • Rijal P.
      • Jiang H.
      • Wang B.
      • Schimanski L.
      • Dong T.
      • Liu Y.M.
      • Chang P.
      • Iqbal M.
      • Wang M.C.
      • Chen Z.
      • Song R.
      • Huang C.C.
      • Yang J.H.
      • Qi J.
      • et al.
      Structure-function analysis of neutralizing antibodies to H7N9 influenza from naturally infected humans.
      ), or a larger number can be employed as a targeted screening library (
      • Reddy S.T.
      • Ge X.
      • Miklos A.E.
      • Hughes R.A.
      • Kang S.H.
      • Hoi K.H.
      • Chrysostomou C.
      • Hunicke-Smith S.P.
      • Iverson B.L.
      • Tucker P.W.
      • Ellington A.D.
      • Georgiou G.
      Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells.
      ). A comparison between repertoire mining and phage display has demonstrated that the antibodies isolated by each method are not necessarily the same, and therefore it could be beneficial to use the two techniques together (
      • Saggy I.
      • Wine Y.
      • Shefet-Carasso L.
      • Nahary L.
      • Georgiou G.
      • Benhar I.
      Antibody isolation from immunized animals: comparison of phage display and antibody discovery via v gene repertoire mining.
      ).
      Much of the data from these experiments has been deposited in public sequence repositories (
      • Kovaltsuk A.
      • Leem J.
      • Kelm S.
      • Snowden J.
      • Deane C.M.
      • Krawczyk K.
      Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires.
      ), meaning it can be exploited by other researchers in their therapeutic discovery pipelines, for example to provide new lead molecules. It has recently been shown that there is a close sequence match to many known therapeutic antibodies in the OAS database (
      • Krawczyk K.
      • Raybould M.I.J.
      • Kovaltsuk A.
      • Deane C.M.
      Looking for therapeutic antibodies in next-generation sequencing repositories.
      ). Of 242 antibodies that are either currently used as therapeutics or undergoing clinical trials (Phase II or later), sequences with over 90% identity were available for 90 heavy chains and 158 light chains. Notably, for H3, which is thought to contribute the most to an antibody’s binding properties, 54 perfect matches were found. Given the huge number of potential sequences, this is significantly more than would be expected by chance alone in a sequence database of this size (around 1 billion sequences) and implies that artificially developed sequences are not dissimilar from their natural counterparts. It follows that natural sequence repertoires could potentially be mined for new therapeutic leads, perhaps removing the need for large-scale screening experiments at the beginning of an antibody discovery project.
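      A back-of-the-envelope calculation gives a feel for why 54 perfect H3 matches is far more than chance would predict. The sketch below assumes, purely for illustration, that database sequences are drawn uniformly at random from all sequences of a given length over 20 amino acids, and that a typical H3 is 12 residues long. Neither assumption comes from the cited study, and real repertoires are strongly non-uniform, so the result only indicates the scale of the null expectation.

```python
import math


def prob_at_least_one_match(length, n_sequences, alphabet=20):
    """Probability that one specific sequence of `length` residues appears
    at least once among `n_sequences` uniform random draws.
    Uniform sampling is an assumption of this sketch."""
    if length < 1:
        raise ValueError("length must be at least 1")
    p_single = float(alphabet) ** -length
    # 1 - (1 - p)^N, computed stably for very small p.
    return -math.expm1(n_sequences * math.log1p(-p_single))


def expected_chance_matches(n_queries, length, n_sequences):
    """Expected number of queries with an exact match arising by chance."""
    return n_queries * prob_at_least_one_match(length, n_sequences)


# 242 therapeutics, assumed H3 length of 12, database of about 1e9 sequences.
p = prob_at_least_one_match(12, 10**9)
expected = expected_chance_matches(242, 12, 10**9)
```

      Under these assumptions the per-antibody match probability is of the order of 10^-7, so even a single exact H3 match among 242 therapeutics would be unexpected under a uniform model.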
      Structural annotations and modeling can also be applied to discover antigen-specific antibodies. Krawczyk et al. (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ) annotated ∼3.4 million sequences with their proposed templates; the sequences came from individuals who had been exposed to the influenza virus and whose repertoires were therefore enriched with influenza-specific binders. They discovered that many of the templates assigned came from known influenza-binding antibodies. They therefore propose that sharing a similar structural template could indicate similar specificity. Assuming that a structure of an antibody specific to a given antigen or epitope is known, antibodies can be selected from a repertoire if they are predicted to have a high degree of structural similarity to it. Other computational tools can also be exploited to find potential therapeutics: a large set of models generated from repertoire data can be used as an in silico screening library (
      • Raybould M.I.J.
      • Wong W.K.
      • Deane C.M.
      Antibody-antigen complex modelling in the era of immunoglobulin repertoire sequencing.
      ) in conjunction with epitope predictors (
      • Krawczyk K.
      • Liu X.
      • Baker T.
      • Shi J.
      • Deane C.M.
      Improving B-cell epitope prediction and its application to global antibody-antigen docking.
      ,
      • Rapberger R.
      • Lukas A.
      • Mayer B.
      Identification of discontinuous antigenic determinants on proteins based on shape complementarities.
      ,
      • Sela-Culang I.
      • Benhnia M.R.-E.-I.
      • Matho M.H.
      • Kaever T.
      • Maybeno M.
      • Schlossman A.
      • Nimrod G.
      • Li S.
      • Xiang Y.
      • Zajonc D.
      • Crotty S.
      • Ofran Y.
      • Peters B.
      Using a combined computational-experimental approach to predict antibody-specific B cell epitopes.
      ,
      • Sela-Culang I.
      • Ashkenazi S.
      • Peters B.
      • Ofran Y.
      PEASE: predicting B-cell epitopes utilizing antibody sequence.
      ,
      • Jespersen M.C.
      • Mahajan S.
      • Peters B.
      • Nielsen M.
      • Marcatili P.
      Antibody specific B-cell epitope predictions: leveraging information from antibody-antigen protein complexes.
      ,
      • Hua C.K.
      • Gacerez A.T.
      • Sentman C.L.
      • Ackerman M.E.
      • Choi Y.
      • Bailey-Kellogg C.
      Computationally-driven identification of antibody epitopes.
      ,
      • Bourquard T.
      • Musnier A.
      • Puard V.
      • Tahir S.
      • Ayoub M.A.
      • Jullian Y.
      • Boulo T.
      • Gallay N.
      • Watier H.
      • Bruneau G.
      • Reiter E.
      • Crépieux P.
      • Poupon A.
      MAbTope: a method for improved epitope mapping.
      ,
      • Soga S.
      • Kuroda D.
      • Shirai H.
      • Kobori M.
      • Hirayama N.
      Use of amino acid composition to predict epitope residues of individual antibodies.
      ,
      • Zhao L.
      • Li J.
      Mining for the antibody-antigen interacting associations that predict the B cell epitopes.
      ,
      • Zhao L.
      • Wong L.
      • Li J.
      Antibody-specified B-cell epitope prediction in line with the principle of context-awareness.
      ), paratope predictors (
      • Peng H.P.
      • Lee K.H.
      • Jian J.W.
      • Yang A.S.
      Origins of specificity and affinity in antibody-protein interactions.
      ,
      • Krawczyk K.
      • Baker T.
      • Shi J.
      • Deane C.M.
      Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking.
      ,
      • Kunik V.
      • Ashkenazi S.
      • Ofran Y.
      Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure.
      ,
      • Olimpieri P.P.
      • Chailyan A.
      • Tramontano A.
      • Marcatili P.
      Prediction of site-specific interactions in antibody-antigen complexes: the proABC method and server.
      ,
      • Liberis E.
      • Velickovic P.
      • Sormanni P.
      • Vendruscolo M.
      • Liò P.
      Parapred: antibody paratope prediction using convolutional and recurrent neural networks.
      ,
      • Daberdaku S.
      • Ferrari C.
      Antibody interface prediction with 3D Zernike descriptors and SVM.
      ,
      • Deac A.
      • Veličković P.
      • Sormanni P.
      Attentive cross-modal paratope prediction.
      ), and docking algorithms (
      • Weitzner B.D.
      • Jeliazkov J.R.
      • Lyskov S.
      • Marze N.
      • Kuroda D.
      • Frick R.
      • Adolf-Bryfogle J.
      • Biswas N.
      • Dunbrack Jr., R.L.
      • Gray J.J.
      Modeling and docking of antibody structures with Rosetta.
      ,
      • Brenke R.
      • Hall D.R.
      • Chuang G.Y.
      • Comeau S.R.
      • Bohnuud T.
      • Beglov D.
      • Schueler-Furman O.
      • Vajda S.
      • Kozakov D.
      Application of asymmetric statistical potentials to antibody-protein docking.
      ,
      • Kozakov D.
      • Hall D.R.
      • Xia B.
      • Porter K.A.
      • Padhorny D.
      • Yueh C.
      • Beglov D.
      • Vajda S.
      The ClusPro web server for protein-protein docking.
      ,
      • Shimba N.
      • Kamiya N.
      • Nakamura H.
      Model building of antibody-antigen complex structures using GBSA scores.
      ,
      • Sircar A.
      • Gray J.J.
      SnugDock: paratope structural optimization during antibody-antigen docking compensates for errors in antibody homology models.
      ,
      • Ramírez-Aportela E.
      • López-Blanco J.R.
      • Chacón P.
      FRODOCK 2.0: fast protein-protein docking server.
      ,
      • Macindoe G.
      • Mavridis L.
      • Venkatraman V.
      • Devignes M.D.
      • Ritchie D.W.
      HexServer: an FFT-based protein docking server powered by graphics processors.
      ,
      • Chen R.
      • Li L.
      • Weng Z.
      ZDOCK: an initial-stage protein-docking algorithm.
      ,
      • Dominguez C.
      • Boelens R.
      • Bonvin A.M.
      HADDOCK: a protein-protein docking approach based on biochemical or biophysical information.
      ,
      • De Vries S.J.
      • van Dijk A.D.J.
      • Krzeminski M.
      • van Dijk M.
      • Thureau A.
      • Hsu V.
      • Wassenaar T.
      • Bonvin A.M.
      HADDOCK versus HADDOCK: new features and performance of HADDOCK2.0 on the CAPRI targets.
      ,
      • de Vries S.J.
      • Schindler C.E.
      • Chauvot de Beauchêne I.
      • Zacharias M.
      A web interface for easy flexible protein-protein docking with ATTRACT.
      ,
      • Tovchigrechko A.
      • Vakser I.A.
      GRAMM-X public web server for protein-protein docking.
      ,
      • Jiménez-García B.
      • Pons C.
      • Fernández-Recio J.
      pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring.
      ,
      • Torchala M.
      • Moal I.H.
      • Chaleil R.A.
      • Fernandez-Recio J.
      • Bates P.A.
      SwarmDock: a server for flexible protein-protein docking.
      ,
      • Schneidman-Duhovny D.
      • Inbar Y.
      • Nussinov R.
      • Wolfson H.J.
      PatchDock and SymmDock: servers for rigid and symmetric docking.
      ). As computational methods continue to improve and become faster, this approach will become more accurate and more feasible, potentially making an entirely in silico antibody discovery platform a reality.
      However, issues arise because most sequencing experiments focus only on the heavy chain, and because native pairings are unknown even when both the heavy and light chains are sequenced. Antibodies with high affinity and specificity are identified more often when the true VH/VL pairings are known (
      • Adler A.S.
      • Bedinger D.
      • Adams M.S.
      • Asensio M.A.
      • Edgar R.C.
      • Leong R.
      • Leong J.
      • Mizrahi R.A.
      • Spindler M.J.
      • Bandi S.R.
      • Huang H.
      • Tawde P.
      • Brams P.
      • Johnson D.S.
      A natively paired antibody library yields drug leads with higher sensitivity and specificity than a randomly paired antibody library.
      ); however, this is not achievable with most of the available data. As previously stated, single-cell approaches that retain pair information have been developed (
      • DeKosky B.J.
      • Ippolito G.C.
      • Deschner R.P.
      • Lavinder J.J.
      • Wine Y.
      • Rawlings B.M.
      • Varadarajan N.
      • Giesecke C.
      • Dörner T.
      • Andrews S.F.
      • Wilson P.C.
      • Hunicke-Smith S.P.
      • Willson C.G.
      • Ellington A.D.
      • Georgiou G.
      High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire.
      ); however, the method is not as high-throughput as other sequencing techniques, so fewer paired data are currently available. In the future, this is likely to change, but for now, other approaches must be applied. For experiments resulting in both heavy- and light-chain sequences, pairings can be exhaustively tested for plausibility (
      • Raybould M.I.J.
      • Wong W.K.
      • Deane C.M.
      Antibody-antigen complex modelling in the era of immunoglobulin repertoire sequencing.
      ) or inferred by observing relative frequencies (
      • Reddy S.T.
      • Ge X.
      • Miklos A.E.
      • Hughes R.A.
      • Kang S.H.
      • Hoi K.H.
      • Chrysostomou C.
      • Hunicke-Smith S.P.
      • Iverson B.L.
      • Tucker P.W.
      • Ellington A.D.
      • Georgiou G.
      Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells.
      ). Alternatively, especially when light chains have not been sequenced, it may be possible to use an artificial light chain with the ability to associate with a range of heavy chains (
      • Xue H.
      • Sun L.
      • Fujimoto H.
      • Suzuki T.
      • Takahashi Y.
      • Ohnishi K.
      Artificial immunoglobulin light chain with potential to associate with a wide variety of immunoglobulin heavy chains.
      ). The concept of public sequences may also help here; a subset of the public light chain sequences could be used as a pairing library, as these sequences are clearly widely used and are therefore more likely to form successful pairings. In general, known public sequences may be a good place to start when attempting to discover a new therapeutic (e.g. in the design of a screening library), because they are likely to have low immunogenicity and be of high importance in the immune response to many common antigens.
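      As a rough illustration of pairing by relative frequency, the sketch below ranks heavy and light chains by read count and pairs chains of equal rank. Rank matching is an assumption made here for illustration, not a description of the cited protocol, and the function name `pair_by_rank` and the toy read labels are hypothetical.

```python
from collections import Counter


def pair_by_rank(heavy_reads, light_reads, top_n=3):
    """Pair heavy and light chains that share the same abundance rank.

    heavy_reads / light_reads: lists of sequence identifiers, one per read.
    Returns a list of (heavy, light) tuples for the top_n ranks.
    """
    # most_common sorts by descending count; ties keep first-seen order.
    heavy_ranked = [seq for seq, _ in Counter(heavy_reads).most_common(top_n)]
    light_ranked = [seq for seq, _ in Counter(light_reads).most_common(top_n)]
    # zip stops at the shorter list, so unmatched ranks are dropped.
    return list(zip(heavy_ranked, light_ranked))


# Toy reads (hypothetical labels standing in for full sequences).
heavy = ["H_A"] * 5 + ["H_B"] * 3 + ["H_C"]
light = ["L_X"] * 6 + ["L_Y"] * 2 + ["L_Z"]
print(pair_by_rank(heavy, light))
# [('H_A', 'L_X'), ('H_B', 'L_Y'), ('H_C', 'L_Z')]
```

      Pairings proposed this way are only hypotheses. Where counts are close or tied, the ranking is unstable, and candidate pairs would still need to be tested.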

      Using BCR repertoire data to identify undesirable properties during therapeutic development

      Binding affinity is not the only feature of a potential therapeutic that needs to be optimized. In addition to being biologically active, it must be safe to administer to humans and be able to withstand the stresses of the production process (i.e. the antibody should have good “developability”) (
      • Jarasch A.
      • Koll H.
      • Regula J.T.
      • Bader M.
      • Papadimitriou A.
      • Kettenberger H.
      Developability assessment during the selection of novel therapeutic antibodies.
      ). Antibodies discovered through the immunization of an organism (such as a mouse) against the target antigen cannot be used directly as therapeutics, because they would be identified as nonnative by the human immune system and would therefore cause an unwanted response themselves (
      • Safdari Y.
      • Farajnia S.
      • Asgharzadeh M.
      • Khalili M.
      Antibody humanization methods—a review and update.
      ). Changes made to potential therapeutics during the development process can also introduce nonhuman-like characteristics. It is therefore desirable to be able to quantify the similarity of a sequence to those from natural human repertoires (its “humanness”) and to propose changes that could be made to a sequence to make it more human and hence less likely to be rejected by a patient. This “humanization” process can be guided through comparisons with human BCR repertoires, because they are natural and represent what is “allowed” and what is safe in an organism (see Fig. 5). Previous work has used small sets of reference sequences (such as known germline sequences) to infer humanness (
      • Abhinandan K.R.
      • Martin A.C.R.
      Analyzing the “degree of humanness” of antibody sequences.
      ,
      • Lazar G.A.
      • Desjarlais J.R.
      • Jacinto J.
      • Karki S.
      • Hammond P.W.
      A molecular immunology approach to antibody humanization and functional optimization.
      ,
      • Gao S.H.
      • Huang K.
      • Tu H.
      • Adler A.S.
      Monoclonal antibody humanness score and its applications.
      ), but the growth of BCR repertoire sequencing has created new opportunities. The amount of data now available allows not only the identification of which amino acids are allowed at which positions, but also the investigation of residue couplings and covariation (
      • Wollacott A.M.
      • Xue C.
      • Qin Q.
      • Hua J.
      • Bohnuud T.
      • Viswanathan K.
      • Kolachalama V.B.
      Quantifying the nativeness of antibody sequences using long short-term memory networks.
      ). Recently, Wollacott et al. (
      • Wollacott A.M.
      • Xue C.
      • Qin Q.
      • Hua J.
      • Bohnuud T.
      • Viswanathan K.
      • Kolachalama V.B.
      Quantifying the nativeness of antibody sequences using long short-term memory networks.
      ) described a machine learning-based humanization method, trained on large sets of sequence data, and demonstrated that it outperformed other methods at evaluating the humanness of antibodies from sequence.
      Figure 5. WebLogo representations for the second framework region (residues 39–55 in the IMGT numbering scheme) for known human and mouse antibody sequences. Hydrophobic amino acids are shown in red, hydrophilic in blue, and neutral in gray. Data were extracted from OAS; we only considered repertoires from individuals with no disease and no vaccine recorded. Whereas amino acid usage is the same at many positions along the sequence, it can be seen that there are differences that could potentially be used to measure “humanness” and guide the humanization process. For example, it is rare to observe lysine at position 43 in human antibodies, but this is common in mice. Changing a lysine to an arginine at this position in a potential therapeutic may therefore reduce immunogenicity.
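      The idea of scoring a sequence against per-position amino acid usage in a human repertoire can be sketched as follows. This is a minimal position-frequency profile, not the method of any of the cited papers, and it ignores the residue couplings discussed above. The five-residue fragments are invented toy data rather than real framework sequences, and the function names, the pseudocount, and the 5% rarity threshold are all assumptions.

```python
import math
from collections import Counter


def position_frequencies(aligned_seqs):
    """Per-position amino acid frequencies from equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def humanness_score(query, freqs, pseudo=1e-3):
    """Mean log-frequency of the query residues under the reference profile."""
    if len(query) != len(freqs):
        raise ValueError("query must match the profile length")
    logs = [math.log(f.get(aa, 0.0) + pseudo) for aa, f in zip(query, freqs)]
    return sum(logs) / len(logs)


def suggest_substitutions(query, freqs, threshold=0.05):
    """Positions where the query residue is rare, with the commonest reference residue."""
    suggestions = []
    for i, (aa, f) in enumerate(zip(query, freqs)):
        if f.get(aa, 0.0) < threshold:
            suggestions.append((i, aa, max(f, key=f.get)))
    return suggestions


# Toy aligned fragments (hypothetical, for illustration only).
human_reference = ["WVRQA", "WVRQA", "WVRQP", "WIRQA"]
profile = position_frequencies(human_reference)
print(humanness_score("WVRQA", profile) > humanness_score("WVKQA", profile))  # True
print(suggest_substitutions("WVKQA", profile))  # [(2, 'K', 'R')]
```

      With a real repertoire, the sequences would first need to be numbered under a common scheme such as IMGT so that columns are comparable.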
      The chemical properties of a potential therapeutic can also cause problems, such as instability, self-association, high viscosity, polyspecificity, and poor expression (
      • Jarasch A.
      • Koll H.
      • Regula J.T.
      • Bader M.
      • Papadimitriou A.
      • Kettenberger H.
      Developability assessment during the selection of novel therapeutic antibodies.
      ). These characteristics can be determined experimentally, but doing so is time-consuming and hence low-throughput, so examining the thousands or millions of sequences in a BCR repertoire is not feasible. Some of these properties can, however, be predicted from the amino acid sequence of the antibody. For example, a number of sequence motifs have been identified that indicate sites of potential post-translational modification (
      • Leem J.
      • Dunbar J.
      • Georges G.
      • Shi J.
      • Deane C.M.
      ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation.
      ,
      • Haberger M.
      • Bomans K.
      • Diepold K.
      • Hook M.
      • Gassner J.
      • Schlothauer T.
      • Zwick A.
      • Spick C.
      • Kepert J.F.
      • Hienz B.
      • Wiedmann M.
      • Beck H.
      • Metzger P.
      • Mølhøj M.
      • Knoblich C.
      • et al.
      Assessment of chemical modifications of sites in the CDRs of recombinant antibodies.
      ); hydrophobic residues in the CDRs are thought to lead to high aggregation, viscosity, and polyspecificity (
      • Jarasch A.
      • Koll H.
      • Regula J.T.
      • Bader M.
      • Papadimitriou A.
      • Kettenberger H.
      Developability assessment during the selection of novel therapeutic antibodies.
      ,
      • Xu Y.
      • Roach W.
      • Sun T.
      • Jain T.
      • Prinz B.
      • Yu T.Y.
      • Torrey J.
      • Thomas J.
      • Bobrowicz P.
      • Vásquez M.
      • Wittrup K.D.
      • Krauland E.
      Addressing polyspecificity of antibodies selected from an in vitro yeast presentation system: a FACS-based, high-throughput selection and analytical tool.
      ,
      • Sharma V.K.
      • Patapoff T.W.
      • Kabakoff B.
      • Pai S.
      • Hilario E.
      • Zhang B.
      • Li C.
      • Borisov O.
      • Kelley R.F.
      • Chorny I.
      • Zhou J.Z.
      • Dill K.A.
      • Swartz T.E.
      In silico selection of therapeutic antibodies for development: viscosity, clearance, and chemical stability.
      ,
      • Chennamsetty N.
      • Voynov V.
      • Kayser V.
      • Helk B.
      • Trout B.L.
      Enhanced stability.
      ,
      • Lauer T.M.
      • Agrawal N.J.
      • Chennamsetty N.
      • Egodage K.
      • Helk B.
      • Trout B.L.
      Developability index: a rapid in silico tool for the screening of antibody aggregation propensity.
      ,
      • Jain T.
      • Boland T.
      • Lilov A.
      • Burnina I.
      • Brown M.
      • Xu Y.
      • Vásquez M.
      Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning.
      ,
      • Obrezanova O.
      • Arnell A.
      • de la Cuesta R.G.
      • Berthelot M.E.
      • Gallagher T.R.
      • Zurdo J.
      • Stallwood Y.
      Aggregation risk prediction for antibodies and its application to biotherapeutic development.
      ); patches of electrostatic charge on the antibody surface have been linked to high clearance rates and poor expression (
      • Datta-Mannan A.
      • Thangaraju A.
      • Leung D.
      • Tang Y.
      • Witcher D.R.
      • Lu J.
      • Wroblewski V.J.
      Balancing charge in the complementarity-determining regions of humanized mAbs without affecting pI reduces non-specific binding and improves the pharmacokinetics.
      ,
      • Popovic B.
      • Gibson S.
      • Senussi T.
      • Carmen S.
      • Kidd S.
      • Slidel T.
      • Strickland I.
      • Xu J.
      • Spooner J.
      • Lewis A.
      • Hudson N.
      • Mackenzie L.
      • Keen J.
      • Kemp B.
      • Hardman C.
      • et al.
      Engineering the expression of an anti-interleukin-13 antibody through rational design and mutagenesis.
      ); and asymmetric charges of the heavy and light variable domains result in self-association and high viscosity (
      • Sharma V.K.
      • Patapoff T.W.
      • Kabakoff B.
      • Pai S.
      • Hilario E.
      • Zhang B.
      • Li C.
      • Borisov O.
      • Kelley R.F.
      • Chorny I.
      • Zhou J.Z.
      • Dill K.A.
      • Swartz T.E.
      In silico selection of therapeutic antibodies for development: viscosity, clearance, and chemical stability.
      ,
      • Yadav S.
      • Laue T.M.
      • Kalonia D.S.
      • Singh S.N.
      • Shire S.J.
      The influence of charge distribution on self-association and viscosity behavior of monoclonal antibody solutions.
      ). A number of computational tools have therefore been developed that predict these risk factors (e.g. Refs.
      • Sharma V.K.
      • Patapoff T.W.
      • Kabakoff B.
      • Pai S.
      • Hilario E.
      • Zhang B.
      • Li C.
      • Borisov O.
      • Kelley R.F.
      • Chorny I.
      • Zhou J.Z.
      • Dill K.A.
      • Swartz T.E.
      In silico selection of therapeutic antibodies for development: viscosity, clearance, and chemical stability.
      ,
      • Chennamsetty N.
      • Voynov V.
      • Kayser V.
      • Helk B.
      • Trout B.L.
      Enhanced stability.
      ,
      • Lauer T.M.
      • Agrawal N.J.
      • Chennamsetty N.
      • Egodage K.
      • Helk B.
      • Trout B.L.
      Developability index: a rapid in silico tool for the screening of antibody aggregation propensity.
      ,
      • Jain T.
      • Boland T.
      • Lilov A.
      • Burnina I.
      • Brown M.
      • Xu Y.
      • Vásquez M.
      Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning.
      ,
      • Obrezanova O.
      • Arnell A.
      • de la Cuesta R.G.
      • Berthelot M.E.
      • Gallagher T.R.
      • Zurdo J.
      • Stallwood Y.
      Aggregation risk prediction for antibodies and its application to biotherapeutic development.
      and
      • Sydow J.F.
      • Lipsmeier F.
      • Larraillet V.
      • Hilger M.
      • Mautz B.
      • Mølhøj M.
      • Kuentzer J.
      • Klostermann S.
      • Schoch J.
      • Voelger H.R.
      • Regula J.T.
      • Cramer P.
      • Papadimitriou A.
      • Kettenberger H.
      Structure-based prediction of asparagine and aspartate degradation sites in antibody variable regions.
      ). Whereas some of these attempt to predict solely from sequence, the majority require structural knowledge—for instance, it is important to know which residues are located on the antibody surface (
      • Chennamsetty N.
      • Voynov V.
      • Kayser V.
      • Helk B.
      • Trout B.L.
      Enhanced stability.
      ,
      • Lauer T.M.
      • Agrawal N.J.
      • Chennamsetty N.
      • Egodage K.
      • Helk B.
      • Trout B.L.
      Developability index: a rapid in silico tool for the screening of antibody aggregation propensity.
      ). The tools can be exploited during the identification of binders as described above to minimize issues further along the therapeutic development pipeline.
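      A sequence-only screen for motifs that may indicate post-translational modification sites can be sketched with regular expressions. The three patterns below (the N-X-S/T glycosylation sequon with X not proline, Asn followed by Gly or Ser, and Asp followed by Gly or Ser) are commonly cited examples chosen here for illustration. They are not a list taken from the cited studies, and a match marks a site to inspect rather than a confirmed liability. The name `find_liabilities` and the test fragment are hypothetical.

```python
import re

# Illustrative motif set (an assumption, not an exhaustive or validated list).
LIABILITY_MOTIFS = {
    "N-linked glycosylation": r"N[^P][ST]",
    "Asn deamidation": r"N[GS]",
    "Asp isomerization": r"D[GS]",
}


def find_liabilities(seq):
    """Return (motif name, 0-based start, matched residues), sorted by position."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        # A lookahead with a capture group reports overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical CDR-like fragment.
for hit in find_liabilities("ARDGYNGSWF"):
    print(hit)
```

      As the text notes, many developability tools also need structural information, for example to check whether a flagged residue is exposed on the antibody surface. This sketch does not attempt that.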
      The properties described above can also be examined by calculating repertoire-wide distributions. As a simple example, consider the lengths of the CDRs. Using sequence repertoires, the distribution of observed lengths can be obtained. If a given length falls outside the range of this distribution, it can be assumed that this property is “unnatural,” and therefore the antibody is more likely to have undesirable characteristics in vivo. Raybould et al. (
      • Raybould M.I.J.
      • Marks C.
      • Krawczyk K.
      • Taddese B.
      • Nowak J.
      • Lewis A.P.
      • Bujotzek A.
      • Shi J.
      • Deane C.M.
      Five computational developability guidelines for therapeutic antibody profiling.
      ) used this approach, alongside the generation of antibody model libraries, to contextualize known therapeutic sequences against human repertoires. They were therefore able to define five developability guidelines that predict whether a given antibody will be successful as a therapeutic, based on total CDR length, patches of hydrophobicity, patches of positive and negative charge, and the overall surface charge of VH and VL domains. Testing the guidelines on sequences from two antibody discovery projects showed that this approach successfully highlighted candidates with known developability issues.
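      The CDR length example above can be sketched as a simple frequency lookup against a reference repertoire. This is a minimal illustration, not the published guidelines: the cutoff `min_fraction`, the function names, and the poly-alanine placeholder loops are assumptions made for the example.

```python
from collections import Counter


def length_distribution(cdr_seqs):
    """Fraction of reference CDR sequences observed at each length."""
    counts = Counter(len(s) for s in cdr_seqs)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def flag_unusual_length(candidate, reference_cdrs, min_fraction=0.01):
    """True if the candidate's length is rare or absent in the reference set."""
    distribution = length_distribution(reference_cdrs)
    return distribution.get(len(candidate), 0.0) < min_fraction


# Placeholder reference set: only the lengths matter for this check.
reference = ["A" * 10] * 50 + ["A" * 12] * 40 + ["A" * 15] * 10
print(length_distribution(reference))          # {10: 0.5, 12: 0.4, 15: 0.1}
print(flag_unusual_length("A" * 12, reference))  # False
print(flag_unusual_length("A" * 25, reference))  # True
```

      The same pattern would extend to other scalar properties, such as a net charge or a hydrophobicity score, by swapping `len` for the property of interest and comparing against the repertoire-wide distribution.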
      In summary, by representing the allowed antibody sequence space, BCR repertoires can be used to guide the antibody discovery and development process toward more successful therapeutic candidates. Using developability or humanness prediction algorithms in conjunction with in silico screening of BCR repertoires should be of great benefit to the therapeutic development community, and as sequence repositories continue to grow and computational techniques become more sophisticated, we can expect more advances to be made.

      Conclusions

      Advances in next-generation sequencing, and its increasing use in characterizing the immune system, have led to exponential growth in the number of known antibody sequences. Consequently, there is now a wealth of information, which has increased opportunities for large-scale data mining. The amount of data presents its own challenges, however. Curated, publicly available sequence repositories such as the OAS are addressing the problem of storage and accessibility, but changes may have to be made as we learn more about the needs of researchers wishing to use the data. The increase in the amount of data will also create computational obstacles; we must continue to develop methods that can analyze huge numbers of sequences in a time- and resource-efficient manner.
      Repertoire data can be used to gain a deeper understanding of the human immune system, including the mechanisms that drive repertoire diversity and its response to antigen exposure. Comparisons between individuals have detected the presence of a core set of shared sequences or clonotypes known as the public repertoire, potentially of great importance in protecting against common antigens.
      The antigen-binding properties of antibodies are governed by their structures. Sequence-similar antibodies may adopt different structures, and vice versa; by using sequence alone, these subtleties are not discerned. The incorporation of structural information into repertoire analyses, through annotation or modeling, therefore allows more accurate comparisons to be made and hence provides a better representation of the repertoire space. Ongoing improvements in modeling algorithms, in particular increased speed and accuracy of H3 structure prediction, will mean that larger subsets of the repertoire can be analyzed in this manner and with more reliability. An increase in the number of available templates would also improve structural modeling—repertoire data itself may be used in this process, to highlight areas of sequence space for which structures are currently lacking.
      Large-scale sequencing data can also be of great benefit during the discovery of antibodies for therapeutic use. Clonal selection and expansion lead to the enrichment of the repertoire with antigen binders post-exposure; these can be identified and used as starting points for further development. The presence of sequence-similar antibodies to known therapeutics in OAS (
      • Krawczyk K.
      • Kelm S.
      • Kovaltsuk A.
      • Galson J.D.
      • Kelly D.
      • Trück J.
      • Regep C.
      • Leem J.
      • Wong W.K.
      • Nowak J.
      • Snowden J.
      • Wright M.
      • Starkie L.
      • Scott-Tucker A.
      • Shi J.
      • et al.
      Structurally mapping antibody repertoires.
      ) indicates that it should be possible to mine these repositories for new therapeutic leads without performing specific experiments. For example, in silico screening libraries could be developed, by combining BCR repertoire data with modeling protocols and other computational tools (e.g. docking algorithms) to select likely binders.
      Currently, computational approaches such as those described in this review can be used in tandem with experimental work. For example, after a potential binder is identified experimentally, clonotyping can be used to select similar antibodies from a repertoire, thereby expanding the pool of candidates for further study. In the long term, however, the objective of many researchers is to make the discovery of new therapeutic antibodies completely computational, with little or no human input. Consolidating all of the knowledge gained from large-scale repertoire analysis may enable the creation of an in silico immune system, or at the least a completely human-like synthetic repertoire that can be screened to identify potential therapeutics. Although it is too soon to say whether an entirely in silico protocol would produce better results than an experimental one, it would remove the need for expensive and time-consuming experimental work and would mean the immunization of animals is no longer required. There are many obstacles to achieving this, perhaps most importantly in the initial selection of antibodies that bind to a specific antigen of interest; improvements in structural modeling, docking, and binding affinity prediction in particular will help here.
      Even though a large quantity of data is already available, a vast amount of antibody sequence space remains unknown. For example, at around one billion sequences (including redundant sequences), the Observed Antibody Space database represents less than 0.01% of the potential total number (predicted to be around 10^13 nonredundant sequences). Efforts should also be made to sequence repertoires from donors with different attributes (e.g. ethnic background); currently, such attributes are not routinely disclosed, making analysis of their effect on the repertoire difficult. The continued growth of available sequence information should mean that currently unknown parts of sequence space are investigated, and therefore we should be able to analyze the workings of the immune system and predict antibody/repertoire properties more accurately. Importantly, with the development of experimental techniques that preserve the native VH-VL pairings, we will no longer have to rely on approximations and exhaustive combinatorics to achieve an accurate view of what binding sites are present. Overall, access to large-scale sequencing data has provided many opportunities to deepen our understanding of the immune system and improve our ability to design biotherapeutics and will surely continue to do so.

      References

        • Sela-Culang I.
        • Kunik V.
        • Ofran Y.
        The structural basis of antibody-antigen recognition.
        Front. Immunol. 2013; 4 (24115948): 302
        • Saper C.B.
        A guide to the perplexed on the specificity of antibodies.
        J. Histochem. Cytochem. 2009; 57 (18854594): 1-5
        • Ecker D.M.
        • Jones S.D.
        • Levine H.L.
        The therapeutic monoclonal antibody market.
        mAbs. 2015; 7 (25529996): 9-14
        • Raybould M.I.J.
        • Marks C.
        • Lewis A.P.
        • Shi J.
        • Bujotzek A.
        • Taddese B.
        • Deane C.M.
        Thera-SAbDab: the therapeutic structural antibody database.
        Nucleic Acids Res. 2020; 48 (31555805): D383-D388
        • Kaplon H.
        • Reichert J.M.
        Antibodies to watch in 2019.
        mAbs. 2019; 11 (30516432): 219-238
        • Greiff V.
        • Miho E.
        • Menzel U.
        • Reddy S.T.
        Bioinformatic and statistical analysis of adaptive immune repertoires.
        Trends Immunol. 2015; 36 (26508293): 738-749
        • Tonegawa S.
        Somatic generation of antibody diversity.
        Nature. 1983; 302 (6300689): 575-581
        • Jeske D.J.
        • Jarvis J.
        • Milstein C.
        • Capra J.D.
        Junctional diversity.
        J. Immunol. 1984; 133 (6747289): 1090-1092
        • Schramm C.A.
        • Douek D.C.
        Beyond hot spots: biases in antibody somatic hypermutation and implications for vaccine design.
        Front. Immunol. 2018; 9 (30154794): 1876
        • Collis A.V.
        • Brouwer A.P.
        • Martin A.C.
        Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen.
        J. Mol. Biol. 2003; 325 (12488099): 337-354
        • Xu J.L.
        • Davis M.M.
        Diversity in the CDR3 region of V.
        Immunity. 2000; 13 (10933393): 37-45
        • Kuroda D.
        • Shirai H.
        • Jacobson M.P.
        • Nakamura H.
        Computer-aided antibody design.
        Protein Eng. Des. Sel. 2012; 25 (22661385): 507-521
        • Burnet F.M.
        Theories of immunity.
        Perspect. Biol. Med. 1960; 3 (13806209): 447-458
        • Glanville J.
        • Zhai W.
        • Berka J.
        • Telman D.
        • Huerta G.
        • Mehta G.R.
        • Ni I.
        • Mei L.
        • Sundar P.D.
        • Day G.M.
        • Cox D.
        • Rajpal A.
        • Pons J.
        Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire.
        Proc. Natl. Acad. Sci. U.S.A. 2009; 106 (19875695): 20216-20221
        • Georgiou G.
        • Ippolito G.C.
        • Beausang J.
        • Busse C.E.
        • Wardemann H.
        • Quake S.R.
        The promise and challenge of high-throughput sequencing of the antibody repertoire.
        Nat. Biotechnol. 2014; 32 (24441474): 158-168
        • Ota M.
        • Duong B.H.
        • Torkamani A.
        • Doyle C.M.
        • Gavin A.L.
        • Ota T.
        • Nemazee D.
        Regulation of the B cell receptor repertoire and self-reactivity by BAFF.
        J. Immunol. 2010; 185 (20817867): 4128-4136
        • Zhou T.
        • Zhu J.
        • Wu X.
        • Moquin S.
        • Zhang B.
        • Acharya P.
        • Georgiev I.S.
        • Altae-Tran H.R.
        • Chuang G.-Y.
        • Joyce M.G.
        • Kwon Y.D.
        • Longo N.S.
        • Louder M.K.
        • Luongo T.
        • McKee K.
        • et al.
        Multidonor analysis reveals structural elements, genetic determinants, and maturation pathway for HIV-1 neutralization by VRC01-class antibodies.
        Immunity. 2013; 39 (23911655): 245-258
        • Vander Heiden J.A.
        • Stathopoulos P.
        • Zhou J.Q.
        • Chen L.
        • Gilbert T.J.
        • Bolen C.R.
        • Barohn R.J.
        • Dimachkie M.M.
        • Ciafaloni E.
        • Broering T.J.
        • Vigneault F.
        • Nowak R.J.
        • Kleinstein S.H.
        • O'Connor K.C.
        Dysregulation of B cell repertoire formation in myasthenia gravis patients revealed through deep sequencing.
        J. Immunol. 2017; 198 (28087666): 1460-1473
        • Gidoni M.
        • Snir O.
        • Peres A.
        • Polak P.
        • Lindeman I.
        • Mikocziova I.
        • Sarna V.K.
        • Lundin K.E.A.
        • Clouser C.
        • Vigneault F.
        • Collins A.M.
        • Sollid L.M.
        • Yaari G.
        Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping.
        Nat. Commun. 2019; 10 (30733445): 628
        • Briney B.
        • Inderbitzin A.
        • Joyce C.
        • Burton D.R.
        Commonality despite exceptional diversity in the baseline human antibody repertoire.
        Nature. 2019; 566 (30664748): 393-397
        • López-Santibáñez-Jácome L.
        • Avendaño-Vázquez S.E.
        • Flores-Jasso C.F.
        The pipeline repertoire for Ig-Seq analysis.
        Front. Immunol. 2019; 10 (31114573): 899
        • Corrie B.D.
        • Marthandan N.
        • Zimonja B.
        • Jaglale J.
        • Zhou Y.
        • Barr E.
        • Knoetze N.
        • Breden F.M.W.
        • Christley S.
        • Scott J.K.
        • Cowell L.G.
        • Breden F.
        iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories.
        Immunol. Rev. 2018; 284 (29944754): 24-41
        • Christley S.
        • Scarborough W.
        • Salinas E.
        • Rounds W.H.
        • Toby I.T.
        • Fonner J.M.
        • Levin M.K.
        • Kim M.
        • Mock S.A.
        • Jordan C.
        • Ostmeyer J.
        • Buntzman A.
        • Rubelt F.
        • Davila M.L.
        • Monson N.L.
        • et al.
        VDJServer: A cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements.
        Front. Immunol. 2018; 9 (29867956): 976
        • Rosenfeld A.M.
        • Meng W.
        • Luning Prak E.T.
        • Hershberg U.
        ImmuneDB, a novel tool for the analysis, storage, and dissemination of immune repertoire sequencing data.
        Front. Immunol. 2018; 9 (30298069): 2107
        • Chailyan A.
        • Tramontano A.
        • Marcatili P.
        A database of immunoglobulins with integrated tools: DIGIT.
        Nucleic Acids Res. 2012; 40 (22080506): 1230-1234
        • Swindells M.B.
        • Porter C.T.
        • Couch M.
        • Hurst J.
        • Abhinandan K.R.
        • Nielsen J.H.
        • Macindoe G.
        • Hetherington J.
        • Martin A.C.
        abYsis: integrated antibody sequence and structure-management, analysis, and prediction.
        J. Mol. Biol. 2017; 429 (27561707): 356-364
        • Zhang W.
        • Wang L.
        • Liu K.
        • Wei X.
        • Yang K.
        • Du W.
        • Wang S.
        • Guo N.
        • Ma C.
        • Luo L.
        • Wu J.
        • Lin L.
        • Yang F.
        • Gao F.
        • Wang X.
        • et al.
        PIRD: Pan Immune Repertoire Database.
        Bioinformatics. 2019; 36 (31373607): 897-903
        • Kovaltsuk A.
        • Leem J.
        • Kelm S.
        • Snowden J.
        • Deane C.M.
        • Krawczyk K.
        Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires.
        J. Immunol. 2018; 201 (30217829): 2502-2509
        • DeWitt W.S.
        • Lindau P.
        • Snyder T.M.
        • Sherwood A.M.
        • Vignali M.
        • Carlson C.S.
        • Greenberg P.D.
        • Duerkopp N.
        • Emerson R.O.
        • Robins H.S.
        A public database of memory and naive B-cell receptor sequences.
        PLoS ONE. 2016; 11 (27513338): e0160853
        • Wrammert J.
        • Smith K.
        • Miller J.
        • Langley W.A.
        • Kokko K.
        • Larsen C.
        • Zheng N.Y.
        • Mays I.
        • Garman L.
        • Helms C.
        • James J.
        • Air G.M.
        • Capra J.D.
        • Ahmed R.
        • Wilson P.C.
        Rapid cloning of high-affinity human monoclonal antibodies against influenza virus.
        Nature. 2008; 453 (18449194): 667-671
        • Yu X.
        • Tsibane T.
        • McGraw P.A.
        • House F.S.
        • Keefer C.J.
        • Hicar M.D.
        • Tumpey T.M.
        • Pappas C.
        • Perrone L.A.
        • Martinez O.
        • Stevens J.
        • Wilson I.A.
        • Aguilar P.V.
        • Altschuler E.L.
        • Basler C.F.
        • Crowe Jr., J.E.
        Neutralizing antibodies derived from the B cells of 1918 influenza pandemic survivors.
        Nature. 2008; 455 (18716625): 532-536
        • Frost S.D.
        • Murrell B.
        • Hossain A.S.M.
        • Silverman G.J.
        • Pond S.L.
        Assigning and visualizing germline genes in antibody repertoires.
        Phil. Trans. R. Soc. B. 2015; 370 (26194754): 20140240
        • Miho E.
        • Yermanos A.
        • Weber C.R.
        • Berger C.T.
        • Reddy S.T.
        • Greiff V.
        Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires.
        Front. Immunol. 2018; 9: 224
        • Gadala-Maria D.
        • Yaari G.
        • Uduman M.
        • Kleinstein S.H.
        Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.
        Proc. Natl. Acad. Sci. U.S.A. 2015; 112 (25675496): E862-E870
        • Gupta N.T.
        • Vander Heiden J.A.
        • Uduman M.
        • Gadala-Maria D.
        • Yaari G.
        • Kleinstein S.H.
        Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data.
        Bioinformatics. 2015; 31 (26069265): 3356-3358
        • Corcoran M.M.
        • Phad G.E.
        • Vázquez Bernat N.V.
        • Stahl-Hennig C.
        • Sumida N.
        • Persson M.A.
        • Martin M.
        • Karlsson Hedestam G.B.
        Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity.
        Nat. Commun. 2016; 7 (27995928): 13642
        • Marcou Q.
        • Mora T.
        • Walczak A.M.
        High-throughput immune repertoire analysis with IGoR.
        Nat. Commun. 2018; 9 (29422654): 561