Recent advances in de novo protein design: Principles, methods, and applications

  • Xingjie Pan
    Correspondence
    For correspondence: Xingjie Pan; Tanja Kortemme
    Affiliations
    Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA

    UC Berkeley – UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA
  • Tanja Kortemme
    Affiliations
    Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA

    UC Berkeley – UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA

    Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA
Open Access. Published: March 17, 2021. DOI: https://doi.org/10.1016/j.jbc.2021.100558
      Computational de novo protein design is increasingly applied to address key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and describe how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. These advances not only highlight design goals that are reachable now but also point to challenges and opportunities for the future of the field.

      Abbreviations:

      FASTER (fast and accurate side chain topology and energy refinement), LOCKR (latching orthogonal cage-key proteins), MSD (multistate design), PDB (Protein Data Bank), RIF (rotamer interaction field), SEWING (structure extension with native-substructure graphs), TERMs (tertiary structure motifs), TR-Rosetta (transform-restrained Rosetta)
      “De novo” protein design describes the generation of new proteins, with sequences unrelated to those in nature, based on physical principles of intramolecular and intermolecular interactions (
      • Huang P.S.
      • Boyken S.E.
      • Baker D.
      The coming of age of de novo protein design.
      ). Although most current contributions to de novo design focus on new structures, efforts in the field are increasingly directed toward designing new biological functions and their applications (
      • Huang P.S.
      • Boyken S.E.
      • Baker D.
      The coming of age of de novo protein design.
      ,
      • Kuhlman B.
      • Bradley P.
      Advances in protein structure prediction and design.
      ). Designer proteins are beginning to impact biomedical and synthetic biology research. Exciting recently designed functions include inhibitors of viral infections (
      • Chevalier A.
      • Silva D.A.
      • Rocklin G.J.
      • Hicks D.R.
      • Vergara R.
      • Murapa P.
      • Bernard S.M.
      • Zhang L.
      • Lam K.H.
      • Yao G.
      • Bahl C.D.
      • Miyashita S.I.
      • Goreshnik I.
      • Fuller J.T.
      • Koday M.T.
      • et al.
      Massively parallel de novo protein design for targeted therapeutics.
      ,
      • Cao L.
      • Goreshnik I.
      • Coventry B.
      • Case J.B.
      • Miller L.
      • Kozodoy L.
      • Chen R.E.
      • Carter L.
      • Walls A.C.
      • Park Y.J.
      • Strauch E.M.
      • Stewart L.
      • Diamond M.S.
      • Veesler D.
      • Baker D.
      De novo design of picomolar SARS-CoV-2 miniprotein inhibitors.
      ), immune system modulators (
      • Mohan K.
      • Ueda G.
      • Kim A.R.
      • Jude K.M.
      • Fallas J.A.
      • Guo Y.
      • Hafer M.
      • Miao Y.
      • Saxton R.A.
      • Piehler J.
      • Sankaran V.G.
      • Baker D.
      • Garcia K.C.
      Topological control of cytokine receptor signaling induces differential effects in hematopoiesis.
      ,
      • Silva D.A.
      • Yu S.
      • Ulge U.Y.
      • Spangler J.B.
      • Jude K.M.
      • Labao-Almeida C.
      • Ali L.R.
      • Quijano-Rubio A.
      • Ruterbusch M.
      • Leung I.
      • Biary T.
      • Crowley S.J.
      • Marcos E.
      • Walkey C.D.
      • Weitzner B.D.
      • et al.
      De novo design of potent and selective mimics of IL-2 and IL-15.
      ), self-assembling biomaterials (
      • Gonen S.
      • DiMaio F.
      • Gonen T.
      • Baker D.
      Design of ordered two-dimensional arrays mediated by noncovalent protein-protein interfaces.
      ,
      • Shen H.
      • Fallas J.A.
      • Lynch E.
      • Sheffler W.
      • Parry B.
      • Jannetty N.
      • Decarreau J.
      • Wagenbach M.
      • Vicente J.J.
      • Chen J.
      • Wang L.
      • Dowling Q.
      • Oberdorfer G.
      • Stewart L.
      • Wordeman L.
      • et al.
      De novo design of self-assembling helical protein filaments.
      ,
      • Chen Z.
      • Johnson M.C.
      • Chen J.
      • Bick M.J.
      • Boyken S.E.
      • Lin B.
      • De Yoreo J.J.
      • Kollman J.M.
      • Baker D.
      • DiMaio F.
      Self-assembling 2D arrays with de novo protein building blocks.
      ), sense-and-respond signaling systems (
      • Feng J.
      • Jester B.W.
      • Tinberg C.E.
      • Mandell D.J.
      • Antunes M.S.
      • Chari R.
      • Morey K.J.
      • Rios X.
      • Medford J.I.
      • Church G.M.
      • Fields S.
      • Baker D.
      A general strategy to construct small molecule biosensors in eukaryotes.
      ,
      • Bick M.J.
      • Greisen P.J.
      • Morey K.J.
      • Antunes M.S.
      • La D.
      • Sankaran B.
      • Reymond L.
      • Johnsson K.
      • Medford J.I.
      • Baker D.
      Computational design of environmental sensors for the potent opioid fentanyl.
      ,
      • Glasgow A.A.
      • Huang Y.M.
      • Mandell D.J.
      • Thompson M.
      • Ritterson R.
      • Loshbaugh A.L.
      • Pellegrino J.
      • Krivacic C.
      • Pache R.A.
      • Barlow K.A.
      • Ollikainen N.
      • Jeon D.
      • Kelly M.J.S.
      • Fraser J.S.
      • Kortemme T.
      Computational design of a modular protein sense-response system.
      ,
      • Quijano-Rubio A.
      • Yeh H.W.
      • Park J.
      • Lee H.
      • Langan R.A.
      • Boyken S.E.
      • Lajoie M.J.
      • Cao L.
      • Chow C.M.
      • Miranda M.C.
      • Wi J.
      • Hong H.J.
      • Stewart L.
      • Oh B.H.
      • Baker D.
      De novo design of modular and tunable allosteric biosensors.
      ), and protein logic gates (
      • Chen Z.
      • Kibler R.D.
      • Hunt A.
      • Busch F.
      • Pearl J.
      • Jia M.
      • VanAernum Z.L.
      • Wicky B.I.M.
      • Dods G.
      • Liao H.
      • Wilken M.S.
      • Ciarlo C.
      • Green S.
      • El-Samad H.
      • Stamatoyannopoulos J.
      • et al.
      De novo design of protein logic gates.
      ,
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ).
      Underlying these successful applications are developments of computational design principles over the last decades. Many such principles have been learned from the wealth of existing architectures in the Protein Data Bank (PDB) (
      • Berman H.M.
      • Westbrook J.
      • Feng Z.
      • Gilliland G.
      • Bhat T.N.
      • Weissig H.
      • Shindyalov I.N.
      • Bourne P.E.
      The protein data bank.
      ). While many computational design applications modify existing proteins (
      • Glasgow A.A.
      • Huang Y.M.
      • Mandell D.J.
      • Thompson M.
      • Ritterson R.
      • Loshbaugh A.L.
      • Pellegrino J.
      • Krivacic C.
      • Pache R.A.
      • Barlow K.A.
      • Ollikainen N.
      • Jeon D.
      • Kelly M.J.S.
      • Fraser J.S.
      • Kortemme T.
      Computational design of a modular protein sense-response system.
      ,
      • Dahiyat B.I.
      • Mayo S.L.
      De novo protein design: Fully automated sequence selection.
      ,
      • Jiang L.
      • Althoff E.A.
      • Clemente F.R.
      • Doyle L.
      • Rothlisberger D.
      • Zanghellini A.
      • Gallaher J.L.
      • Betker J.L.
      • Tanaka F.
      • Barbas 3rd, C.F.
      • Hilvert D.
      • Houk K.N.
      • Stoddard B.L.
      • Baker D.
      De novo computational design of retro-aldol enzymes.
      ,
      • Rothlisberger D.
      • Khersonsky O.
      • Wollacott A.M.
      • Jiang L.
      • DeChancie J.
      • Betker J.
      • Gallaher J.L.
      • Althoff E.A.
      • Zanghellini A.
      • Dym O.
      • Albeck S.
      • Houk K.N.
      • Tawfik D.S.
      • Baker D.
      Kemp elimination catalysts by computational enzyme design.
      ,
      • Tinberg C.E.
      • Khare S.D.
      • Dou J.
      • Doyle L.
      • Nelson J.W.
      • Schena A.
      • Jankowski W.
      • Kalodimos C.G.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Computational design of ligand-binding proteins with high affinity and selectivity.
      ), it is becoming possible to design both structures and functions entirely de novo (
      • Huang P.S.
      • Boyken S.E.
      • Baker D.
      The coming of age of de novo protein design.
      ). It was recognized early that variations of helical architectures could be designed based on parametric equations (
      • Crick F.
      The Fourier transform of a coiled-coil.
      ). Helical bundle proteins have indeed proven to be very “designable” (
      • Hill R.B.
      • Raleigh D.P.
      • Lombardi A.
      • DeGrado W.F.
      De novo design of helical bundles as models for understanding protein folding and function.
      ) and have consequently been adapted to many functions (
      • Quijano-Rubio A.
      • Yeh H.W.
      • Park J.
      • Lee H.
      • Langan R.A.
      • Boyken S.E.
      • Lajoie M.J.
      • Cao L.
      • Chow C.M.
      • Miranda M.C.
      • Wi J.
      • Hong H.J.
      • Stewart L.
      • Oh B.H.
      • Baker D.
      De novo design of modular and tunable allosteric biosensors.
      ,
      • Chen Z.
      • Kibler R.D.
      • Hunt A.
      • Busch F.
      • Pearl J.
      • Jia M.
      • VanAernum Z.L.
      • Wicky B.I.M.
      • Dods G.
      • Liao H.
      • Wilken M.S.
      • Ciarlo C.
      • Green S.
      • El-Samad H.
      • Stamatoyannopoulos J.
      • et al.
      De novo design of protein logic gates.
      ,
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ,
      • Lajoie M.J.
      • Boyken S.E.
      • Salter A.I.
      • Bruffey J.
      • Rajan A.
      • Langan R.A.
      • Olshefsky A.
      • Muhunthan V.
      • Bick M.J.
      • Gewe M.
      • Quijano-Rubio A.
      • Johnson J.
      • Lenz G.
      • Nguyen A.
      • Pun S.
      • et al.
      Designed protein logic to target cells with precise combinations of surface antigens.
      ,
      • Joh N.H.
      • Wang T.
      • Bhate M.P.
      • Acharya R.
      • Wu Y.
      • Grabe M.
      • Hong M.
      • Grigoryan G.
      • DeGrado W.F.
      De novo design of a transmembrane Zn²⁺-transporting four-helix bundle.
      ,
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ,
      • Polizzi N.F.
      • Wu Y.
      • Lemmin T.
      • Maxwell A.M.
      • Zhang S.Q.
      • Rawson J.
      • Beratan D.N.
      • Therien M.J.
      • DeGrado W.F.
      De novo design of a hyperstable non-natural protein-ligand complex with sub-Å accuracy.
      ,
      • Robertson D.E.
      • Farid R.S.
      • Moser C.C.
      • Urbauer J.L.
      • Mulholland S.E.
      • Pidikiti R.
      • Lear J.D.
      • Wand A.J.
      • DeGrado W.F.
      • Dutton P.L.
      Design and synthesis of multi-haem proteins.
      ). More recent developments have expanded the structural repertoire of de novo proteins to other fold classes (
      • Basanta B.
      • Bick M.J.
      • Bera A.K.
      • Norn C.
      • Chow C.M.
      • Carter L.P.
      • Goreshnik I.
      • Dimaio F.
      • Baker D.
      An enumerative algorithm for de novo design of proteins with diverse pocket structures.
      ,
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ,
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ,
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ,
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ). The first new alpha-beta protein, with a fold not previously observed in nature, was assembled from fragments from the PDB (
      • Kuhlman B.
      • Dantas G.
      • Ireton G.C.
      • Varani G.
      • Stoddard B.L.
      • Baker D.
      Design of a novel globular protein fold with atomic-level accuracy.
      ). Subsequent careful analyses of natural protein architectures led to the design of different alpha-beta proteins (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ), including a symmetrical artificial TIM barrel (
      • Huang P.S.
      • Feldmeier K.
      • Parmeggiani F.
      • Velasco D.A.F.
      • Hocker B.
      • Baker D.
      De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy.
      ), and all-beta proteins (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ,
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ).
      Toward new functions, recent computational advances have led to the ability to generate precise geometric variations in de novo–designed protein families, mimicking the ability of evolution to precisely tune the shapes of the members of protein families for new activities (
      • Basanta B.
      • Bick M.J.
      • Bera A.K.
      • Norn C.
      • Chow C.M.
      • Carter L.P.
      • Goreshnik I.
      • Dimaio F.
      • Baker D.
      An enumerative algorithm for de novo design of proteins with diverse pocket structures.
      ,
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ). Although these designed proteins are not close in sequence to any naturally occurring proteins, principles derived from structures in the PDB still guide their design. Such principles are useful for generating new protein structures through assembly from continuous (
      • Kuhlman B.
      • Dantas G.
      • Ireton G.C.
      • Varani G.
      • Stoddard B.L.
      • Baker D.
      Design of a novel globular protein fold with atomic-level accuracy.
      ,
      • Simons K.T.
      • Kooperberg C.
      • Huang E.
      • Baker D.
      Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
      ) or discontinuous (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ,
      • Jacobs T.M.
      • Williams B.
      • Williams T.
      • Xu X.
      • Eletsky A.
      • Federizon J.F.
      • Szyperski T.
      • Kuhlman B.
      Design of structurally distinct proteins using strategies inspired by evolution.
      ,
      • Mackenzie C.O.
      • Zhou J.
      • Grigoryan G.
      Tertiary alphabet for the observable protein structural universe.
      ) three-dimensional elements, as well as for the development (
      • Kortemme T.
      • Morozov A.V.
      • Baker D.
      An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes.
      ) and optimization (
      • O'Meara M.J.
      • Leaver-Fay A.
      • Tyka M.
      • Stein A.
      • Houlihan K.
      • DiMaio F.
      • Bradley P.
      • Kortemme T.
      • Baker D.
      • Snoeyink J.
      • Kuhlman B.
      A combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta.
      ,
      • Park H.
      • Bradley P.
      • Greisen Jr., P.
      • Liu Y.
      • Mulligan V.K.
      • Kim D.E.
      • Baker D.
      • DiMaio F.
      Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules.
      ) of design energy functions used to rank design candidates. Moreover, the most recent developments in deep learning for protein structure prediction (
      • Senior A.W.
      • Evans R.
      • Jumper J.
      • Kirkpatrick J.
      • Sifre L.
      • Green T.
      • Qin C.
      • Zidek A.
      • Nelson A.W.R.
      • Bridgland A.
      • Penedones H.
      • Petersen S.
      • Simonyan K.
      • Crossan S.
      • Kohli P.
      • et al.
      Improved protein structure prediction using potentials from deep learning.
      ,
      • Yang J.
      • Anishchenko I.
      • Park H.
      • Peng Z.
      • Ovchinnikov S.
      • Baker D.
      Improved protein structure prediction using predicted interresidue orientations.
      ,
      • Callaway E.
      'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures.
      ) foreshadow new design methods that take advantage of learned principles of protein structure (
      • Alley E.C.
      • Khimulya G.
      • Biswas S.
      • AlQuraishi M.
      • Church G.M.
      Unified rational protein engineering with sequence-based deep representation learning.
      ,
      • Anishchenko I.
      • Chidyausiku T.M.
      • Ovchinnikov S.
      • Pellock S.J.
      • Baker D.
      De novo protein design by deep network hallucination.
      ).
      Computational methods have addressed a number of key challenges in protein design and will continue to play a major role in advancing applications. Computational protein design is typically defined as an optimization problem: given a user-defined structure and function, find one or a few low-energy amino acid sequences that stably adopt the desired structure and perform the targeted function. Ongoing challenges for designing de novo functional proteins arise from all major aspects of this process (Fig. 1): generation of designable protein backbone conformations, sampling of sequences optimal for these structures, scoring functions that are sufficiently accurate to distinguish correct from incorrect solutions, and design of functional sites with the desired activities. In this review, we discuss the development of design principles and methods in each of these aspects and highlight the role played by structural data in the PDB in informing these principles, in the context of this special issue of the Journal of Biological Chemistry celebrating the 50th anniversary of the PDB. We focus on advances made in the past 5 years. For readers interested in the history of de novo protein design, we refer to a recent review (
      • Korendovych I.V.
      • DeGrado W.F.
      De novo protein design, a retrospective.
      ).
      Figure 1. Major aspects of de novo protein design. The design of a functional de novo protein, for example, a binder (middle, magenta) to a target protein (middle, gray), requires sampling of the backbone structure space to find a backbone compatible with the function, sequence optimization to stabilize the backbone, and designing the functional site interactions. A scoring function is necessary to select designs with desired properties, typically by identifying low-energy sequence–structure combinations.

      Sampling of de novo backbone structures for protein design

      Backbone structures determine the overall shapes of proteins and therefore play a critical role in protein functions. Even small proteins (100 residues or less) have hundreds of backbone degrees of freedom, making it impossible to sample the backbone structure space by brute force. Moreover, because folded proteins need to have well-packed cores and satisfied hydrogen bonds, only a small fraction of the backbone structure space can stably exist, that is, is “designable” (
      • Li H.
      • Helling R.
      • Tang C.
      • Wingreen N.
      Emergence of preferred structures in a simple model of protein folding.
      ,
      • Helling R.
      • Li H.
      • Melin R.
      • Miller J.
      • Wingreen N.
      • Zeng C.
      • Tang C.
      The designability of protein structures.
      ). In the following sections, we describe different levels of sampling backbone conformations for design, starting from variation of existing structures, continuing to the design of novel folds, fold families, and constrained peptides, and ending with a perspective on backbone design by emerging machine learning methods.
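To make the scale of the backbone sampling problem concrete, the following back-of-the-envelope sketch counts discretized backbone conformations. It assumes two torsional degrees of freedom (phi and psi) per residue and only three allowed values per torsion; both numbers are illustrative assumptions, not values taken from the text, and the function name is hypothetical.

```python
import math


def conformation_count(n_residues: int, states_per_torsion: int,
                       torsions_per_residue: int = 2) -> int:
    """Number of discretized backbone conformations.

    Assumes every torsion independently takes one of `states_per_torsion`
    values (an illustrative simplification that ignores sterics).
    """
    n_torsions = n_residues * torsions_per_residue
    return states_per_torsion ** n_torsions


if __name__ == "__main__":
    n_res = 100
    count = conformation_count(n_res, states_per_torsion=3)
    # Report the degrees of freedom and the order of magnitude of the count.
    print("torsional degrees of freedom:", n_res * 2)
    print("conformations ~ 10^%d" % math.floor(math.log10(count)))
```

Even under this coarse discretization, a 100-residue chain has on the order of 10^95 conformations, which is why exhaustive enumeration is not an option and sampling must be restricted to designable regions.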

      Variation of existing structures

      A workaround to the difficulty of de novo backbone design is redesigning native backbone structures from the PDB for new functions (
      • Jiang L.
      • Althoff E.A.
      • Clemente F.R.
      • Doyle L.
      • Rothlisberger D.
      • Zanghellini A.
      • Gallaher J.L.
      • Betker J.L.
      • Tanaka F.
      • Barbas 3rd, C.F.
      • Hilvert D.
      • Houk K.N.
      • Stoddard B.L.
      • Baker D.
      De novo computational design of retro-aldol enzymes.
      ,
      • Rothlisberger D.
      • Khersonsky O.
      • Wollacott A.M.
      • Jiang L.
      • DeChancie J.
      • Betker J.
      • Gallaher J.L.
      • Althoff E.A.
      • Zanghellini A.
      • Dym O.
      • Albeck S.
      • Houk K.N.
      • Tawfik D.S.
      • Baker D.
      Kemp elimination catalysts by computational enzyme design.
      ,
      • Tinberg C.E.
      • Khare S.D.
      • Dou J.
      • Doyle L.
      • Nelson J.W.
      • Schena A.
      • Jankowski W.
      • Kalodimos C.G.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Computational design of ligand-binding proteins with high affinity and selectivity.
      ). Because proteins are not static, state-of-the-art design methods typically model small structural adjustments, either in response to sequence changes or to diversify native backbones. In particular, several approaches have been developed to mimic “back-rub” motions (
      • Smith C.A.
      • Kortemme T.
      Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction.
      ,
      • Georgiev I.
      • Keedy D.
      • Richardson J.S.
      • Richardson D.C.
      • Donald B.R.
      Algorithm for backrub motions in protein design.
      ), a common mechanism for interconverting between alternate backbone conformations observed in high-resolution (≤1 Å) crystal structures (
      • Davis I.
      • Arendall III W.
      • Richardson D.
      • Richardson J.
      The backrub motion: How protein backbone shrugs when a sidechain dances.
      ). A back-rub motion involves internal backbone rotations about axes between C-alpha atoms. Incorporating such back-rub moves into design simulations has led to considerable improvements in modeling structural changes in point mutants (
      • Smith C.A.
      • Kortemme T.
      Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction.
      ,
      • Georgiev I.
      • Keedy D.
      • Richardson J.S.
      • Richardson D.C.
      • Donald B.R.
      Algorithm for backrub motions in protein design.
      ,
      • Keedy D.A.
      • Georgiev I.
      • Triplett E.B.
      • Donald B.R.
      • Richardson D.C.
      • Richardson J.S.
      The role of local backrub motions in evolved and designed mutations.
      ), protein dynamics on fast timescales (
      • Friedland G.D.
      • Linares A.J.
      • Smith C.A.
      • Kortemme T.
      A simple model of backbone flexibility improves modeling of side-chain conformational variability.
      ,
      • Friedland G.D.
      • Lakomek N.A.
      • Griesinger C.
      • Meiler J.
      • Kortemme T.
      A correspondence between solution-state dynamics of an individual protein and the sequence and conformational diversity of its family.
      ), prediction of molecular recognition specificity (
      • Smith C.A.
      • Kortemme T.
      Structure-based prediction of the peptide sequence space recognized by natural and synthetic PDZ domains.
      ), and sequence design (
      • Smith C.A.
      • Kortemme T.
      Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.
      ).
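The geometric core of a back-rub move, a rigid rotation of a backbone segment about the axis connecting two flanking C-alpha atoms, can be sketched as follows. This is a minimal illustration on a C-alpha-only trace using Rodrigues' rotation formula; it omits the compensating peptide-plane rotations and the acceptance criteria used in published implementations, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(point: Vec3, axis_start: Vec3, axis_end: Vec3,
                      angle_rad: float) -> Vec3:
    """Rotate `point` about the line axis_start -> axis_end (Rodrigues' formula)."""
    axis = _sub(axis_end, axis_start)
    norm = math.sqrt(_dot(axis, axis))
    if norm == 0.0:
        raise ValueError("axis endpoints must differ")
    k = _scale(axis, 1.0 / norm)
    v = _sub(point, axis_start)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = _add(_add(_scale(v, c), _scale(_cross(k, v), s)),
                   _scale(k, _dot(k, v) * (1.0 - c)))
    return _add(axis_start, rotated)


def backrub_move(ca_trace: List[Vec3], start: int, end: int,
                 angle_deg: float) -> List[Vec3]:
    """Rotate atoms strictly between two pivot C-alpha atoms about the pivot axis.

    The pivots (indices `start` and `end`) stay fixed, so the rest of the
    chain is unaffected by the move.
    """
    if not (0 <= start and end < len(ca_trace)) or end - start < 2:
        raise ValueError("need at least one atom between the two pivots")
    angle = math.radians(angle_deg)
    moved = list(ca_trace)
    for idx in range(start + 1, end):
        moved[idx] = rotate_about_axis(ca_trace[idx], ca_trace[start],
                                       ca_trace[end], angle)
    return moved


if __name__ == "__main__":
    toy_trace = [(0.0, 0.0, 0.0), (2.0, 3.0, 0.0), (4.0, 0.0, 0.0)]
    print(backrub_move(toy_trace, 0, 2, 90.0))
```

Because the pivots are fixed and the rotation is rigid, the distances from each moved atom to both pivots are preserved, which is what makes the move local.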

      Helical bundles

      Helical bundles were the first type of protein fold designed de novo with atomic accuracy (
      • Hill R.B.
      • Raleigh D.P.
      • Lombardi A.
      • DeGrado W.F.
      De novo design of helical bundles as models for understanding protein folding and function.
      ,
      • Harbury P.B.
      • Plecs J.J.
      • Tidor B.
      • Alber T.
      • Kim P.S.
      High-resolution protein design with backbone freedom.
      ). Owing to their regularity, backbone structures of coiled-coil helical bundles can be sampled nearly exhaustively using Crick’s parameterization (
      • Crick F.
      The Fourier transform of a coiled-coil.
      ). The availability of a method to systematically sample helical bundle backbones and the high stability (
      • Huang P.S.
      • Oberdorfer G.
      • Xu C.
      • Pei X.Y.
      • Nannenga B.L.
      • Rogers J.M.
      • DiMaio F.
      • Gonen T.
      • Luisi B.
      • Baker D.
      High thermodynamic stability of parametrically designed helical bundles.
      ) of the fold make helical bundles a good model system for designing a broad scope of functions such as ligand binding (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ), ion transport (
      • Joh N.H.
      • Wang T.
      • Bhate M.P.
      • Acharya R.
      • Wu Y.
      • Grabe M.
      • Hong M.
      • Grigoryan G.
      • DeGrado W.F.
      De novo design of a transmembrane Zn²⁺-transporting four-helix bundle.
      ), and switches (
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ). More details on recent progress in coiled-coil design can be found in a review by Woolfson (
      • Woolfson D.N.
      Coiled-coil design: Updated and upgraded.
      ).
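As an illustration of parametric backbone generation, the sketch below places C-alpha atoms of a single coiled-coil helix using one commonly used form of the Crick equations. The parameter names, the default rise per residue (1.51 Å), and the example values for the helix radius and angular frequencies are assumptions chosen for illustration, not values given in the text.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def crick_ca_trace(n_res: int, r0: float, r1: float, w0: float, w1: float,
                   phi0: float = 0.0, phi1: float = 0.0,
                   rise: float = 1.51) -> List[Vec3]:
    """C-alpha trace of one helix in a coiled coil (illustrative sketch).

    r0, w0: superhelical radius (A) and frequency (rad/residue)
    r1, w1: helix radius (A) and helical frequency (rad/residue)
    phi0, phi1: superhelical and helical phase offsets (rad)
    rise: distance travelled along the curved helix axis per residue (A)
    """
    ratio = r0 * w0 / rise
    if abs(ratio) > 1.0:
        raise ValueError("r0 * w0 must not exceed the rise per residue")
    alpha = math.asin(ratio)  # pitch angle of the superhelix
    coords = []
    for t in range(n_res):
        a0 = w0 * t + phi0
        a1 = w1 * t + phi1
        x = (r0 * math.cos(a0) + r1 * math.cos(a0) * math.cos(a1)
             - r1 * math.cos(alpha) * math.sin(a0) * math.sin(a1))
        y = (r0 * math.sin(a0) + r1 * math.sin(a0) * math.cos(a1)
             + r1 * math.cos(alpha) * math.cos(a0) * math.sin(a1))
        # rise * cos(alpha) * t equals r0 * w0 * t / tan(alpha) and stays
        # well defined for a straight helix (w0 == 0).
        z = rise * math.cos(alpha) * t - r1 * math.sin(alpha) * math.sin(a1)
        coords.append((x, y, z))
    return coords


if __name__ == "__main__":
    # Example: one helix of a left-handed coiled coil (assumed parameters).
    helix = crick_ca_trace(28, r0=5.0, r1=2.26, w0=-0.05, w1=4 * math.pi / 7)
    # A second helix of a dimer could be generated with phi0=math.pi.
    for xyz in helix[:3]:
        print(tuple(round(c, 2) for c in xyz))
```

Scanning a grid over the superhelical radius, frequency, and phase values in such a function is the sense in which helical bundle backbones can be sampled systematically with only a handful of parameters.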

      De novo design by assembling local structures

      De novo backbones beyond helical bundles can be designed by a fragment assembly strategy originally used in structure prediction (
      • Simons K.T.
      • Kooperberg C.
      • Huang E.
      • Baker D.
      Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
      ,
      • Bowers P.M.
      • Strauss C.E.
      • Baker D.
      De novo protein structure determination using sparse NMR data.
      ). Typically, the first step in design is defining a blueprint that specifies the lengths and relative orientations of secondary structure elements. Short fragments with desired secondary structures are then extracted from the PDB and assembled into a three-dimensional protein model (Fig. 2A). Top7 was the first protein designed by this method and has a fold topology not observed in nature (
      • Kuhlman B.
      • Dantas G.
      • Ireton G.C.
      • Varani G.
      • Stoddard B.L.
      • Baker D.
      Design of a novel globular protein fold with atomic-level accuracy.
      ).
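The blueprint and fragment-selection steps described above can be sketched with simple data structures. This is a toy illustration only: the blueprint encoding, the three-entry fragment library, and the identity threshold are hypothetical, and a real protocol would draw fragments with backbone torsions from PDB structures and assemble them by Monte Carlo insertion.

```python
from typing import Dict, List, Tuple

# A blueprint as an ordered list of (secondary structure code, length):
# E = strand, H = helix, L = loop. This encoding is an assumption for the sketch.
Blueprint = List[Tuple[str, int]]


def blueprint_to_ss(blueprint: Blueprint) -> str:
    """Expand a blueprint into a per-residue secondary structure string."""
    return "".join(code * length for code, length in blueprint)


def fragment_windows(ss: str, size: int = 9) -> List[Tuple[int, str]]:
    """All overlapping windows of `size` residues with their start positions."""
    return [(i, ss[i:i + size]) for i in range(len(ss) - size + 1)]


def pick_fragments(ss: str, library: Dict[str, str], size: int = 9,
                   min_identity: int = 8) -> Dict[int, List[str]]:
    """For each window, list library fragments whose secondary structure
    matches the window in at least `min_identity` positions."""
    picks: Dict[int, List[str]] = {}
    for start, window in fragment_windows(ss, size):
        picks[start] = sorted(
            frag_id for frag_id, frag_ss in library.items()
            if len(frag_ss) == size
            and sum(a == b for a, b in zip(window, frag_ss)) >= min_identity
        )
    return picks


if __name__ == "__main__":
    # Hypothetical blueprint: strand, turn, strand, loop, helix.
    blueprint = [("E", 5), ("L", 2), ("E", 5), ("L", 3), ("H", 10)]
    # Hypothetical fragment library: id -> secondary structure of the fragment.
    toy_library = {
        "frag_a": "EEEEELLEE",
        "frag_b": "HHHHHHHHH",
        "frag_c": "EEEEELLLE",
    }
    ss = blueprint_to_ss(blueprint)
    picks = pick_fragments(ss, toy_library)
    print(ss)
    print(picks[0], picks[len(ss) - 9])
```

Nine-residue windows are used here to match the fragment length mentioned for the blueprint method; three-residue fragments would follow the same pattern with `size=3`.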
      Figure 2. Advances in de novo backbone generation. A, methods to build de novo proteins by assembling local structures. The blueprint method assembles fragments of three or nine residues into idealized structures with different fold topologies (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ,
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ,
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ,
      • Kuhlman B.
      • Dantas G.
      • Ireton G.C.
      • Varani G.
      • Stoddard B.L.
      • Baker D.
      Design of a novel globular protein fold with atomic-level accuracy.
      ,
      • Lin Y.R.
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Clouser A.F.
      • Montelione G.T.
      • Baker D.
      Control over overall shape and size in de novo designed proteins.
      ,
      • Marcos E.
      • Basanta B.
      • Chidyausiku T.M.
      • Tang Y.
      • Oberdorfer G.
      • Liu G.
      • Swapna G.V.
      • Guan R.
      • Silva D.A.
      • Dou J.
      • Pereira J.H.
      • Xiao R.
      • Sankaran B.
      • Zwart P.H.
      • Montelione G.T.
      • et al.
      Principles for designing proteins with cavities formed by curved beta sheets.
      ,
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ). Modular leucine-rich motifs are connected into repeat proteins with defined curvatures (
      • Park K.
      • Shen B.W.
      • Parmeggiani F.
      • Huang P.S.
      • Stoddard B.L.
      • Baker D.
      Control of repeat-protein curvature by computational protein design.
      ). The SEWING method (
      • Jacobs T.M.
      • Williams B.
      • Williams T.
      • Xu X.
      • Eletsky A.
      • Federizon J.F.
      • Szyperski T.
      • Kuhlman B.
      Design of structurally distinct proteins using strategies inspired by evolution.
      ) connects local structural elements into helical proteins with novel folds. Overlapping regions are colored. B, the Foldit game (
      • Koepnick B.
      • Flatten J.
      • Husain T.
      • Ford A.
      • Silva D.A.
      • Bick M.J.
      • Bauer A.
      • Liu G.
      • Ishida Y.
      • Boykov A.
      • Estep R.D.
      • Kleinfelter S.
      • Norgard-Solano T.
      • Wei L.
      • Players F.
      • et al.
      De novo protein design by citizen scientists.
      ) and TopoBuilder (
      • Yang C.
      • Sesterhenn F.
      • Bonet J.
      • van Aalen E.A.
      • Scheller L.
      • Abriata L.A.
      • Cramer J.T.
      • Wen X.
      • Rosset S.
      • Georgeon S.
      • Jardetzky T.
      • Krey T.
      • Fussenegger M.
      • Merkx M.
      • Correia B.E.
      Bottom-up de novo design of functional proteins with complex structural features.
      ) let players or experts rationally design the atomic details of backbone structures. C, symmetry reduces the complexity of backbone generation. Symmetry was used to design a 4-fold (colors) symmetric TIM barrel (
      • Huang P.S.
      • Feldmeier K.
      • Parmeggiani F.
      • Velasco D.A.F.
      • Hocker B.
      • Baker D.
      De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy.
      ) and repeat proteins (
      • Brunette T.J.
      • Parmeggiani F.
      • Huang P.S.
      • Bhabha G.
      • Ekiert D.C.
      • Tsutakawa S.E.
      • Hura G.L.
      • Tainer J.A.
      • Baker D.
      Exploring the repeat protein universe through computational protein design.
      ). D, de novo protein fold families can be generated by sampling the geometries (length, as well as relative position and orientation) of secondary structure elements (
      • Basanta B.
      • Bick M.J.
      • Bera A.K.
      • Norn C.
      • Chow C.M.
      • Carter L.P.
      • Goreshnik I.
      • Dimaio F.
      • Baker D.
      An enumerative algorithm for de novo design of proteins with diverse pocket structures.
      ,
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ). E, generative machine learning methods (red) build novel backbone structures by latent space sampling (
      • Anand N.
      • Eguchi R.R.
      • Huang P.-S.
      Fully differentiable full-atom protein backbone generation.
      ). The hallucination method (
      • Anishchenko I.
      • Chidyausiku T.M.
      • Ovchinnikov S.
      • Pellock S.J.
      • Baker D.
      De novo protein design by deep network hallucination.
      ) (red) uses the TR-Rosetta neural network to predict the structure distribution of a sequence. The sequence is optimized using Monte Carlo–simulated annealing by maximizing the divergence between the predicted structure distribution and a background distribution representing unstructured proteins. SEWING, structure extension with native-substructure graphs; TR, transform-restrained.
      The blueprint strategy was subsequently generalized to design de novo backbones for a number of different fold topologies. Notably, each fold topology required specific design rules derived from native structures in the PDB. For instance, idealized alpha-beta fold proteins favor certain β-hairpin chirality, relative orientations of alpha-beta and beta-alpha units, and ranges for the values of backbone torsion angles in the connecting loops (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ,
      • Lin Y.R.
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Clouser A.F.
      • Montelione G.T.
      • Baker D.
      Control over overall shape and size in de novo designed proteins.
      ). Proteins with curved β-sheets need bulges and register shifts to enable defined β-sheet curvatures (
      • Marcos E.
      • Basanta B.
      • Chidyausiku T.M.
      • Tang Y.
      • Oberdorfer G.
      • Liu G.
      • Swapna G.V.
      • Guan R.
      • Silva D.A.
      • Dou J.
      • Pereira J.H.
      • Xiao R.
      • Sankaran B.
      • Zwart P.H.
      • Montelione G.T.
      • et al.
      Principles for designing proteins with cavities formed by curved beta sheets.
      ). The jelly roll fold topology is constrained by loop conformations, side-chain directionality, and β-strand length (
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ). β-Barrel proteins require glycine kinks and β-bulges to reduce Lennard–Jones repulsive interactions (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ). Traditionally, de novo–designed proteins were validated using low-throughput assays. Recent developments in large-scale DNA synthesis (
      • Kosuri S.
      • Church G.M.
      Large-scale de novo DNA synthesis: Technologies and applications.
      ) now enable high-throughput stability screening of de novo–designed small proteins (
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ). A recent screen identified thousands of sequences encoding stable designs with four different target structures and identified features of the models associated with design success.
      Other strategies for de novo backbone generation do not use blueprints but still use assembly of protein fragments borrowed from nature. Proteins with controllable curvatures can be designed by combinations of modular leucine-rich-repeat units (
      • Park K.
      • Shen B.W.
      • Parmeggiani F.
      • Huang P.S.
      • Stoddard B.L.
      • Baker D.
      Control of repeat-protein curvature by computational protein design.
      ) (Fig. 2A). The structure extension with native-substructure graphs (SEWING) method (
      • Jacobs T.M.
      • Williams B.
      • Williams T.
      • Xu X.
      • Eletsky A.
      • Federizon J.F.
      • Szyperski T.
      • Kuhlman B.
      Design of structurally distinct proteins using strategies inspired by evolution.
      ) combines continuous or discontinuous helical building blocks from existing proteins (Fig. 2A). SEWING first extracts small substructures from proteins in the PDB. Substructures that share high similarity in local regions are overlapped and combined. Finally, loops are designed to close the gaps between discontinuous elements. Notably, previous applications of Crick’s parameterization to design were restricted to the coiled-coil topology, whereas SEWING allows the exploration of more diverse helical topologies.
      A recent method called AbDesign (
      • Lipsh-Sokolik R.
      • Listov D.
      • Fleishman S.J.
      The AbDesign computational pipeline for modular backbone assembly and design of binders and enzymes.
      ) seeks to mimic natural homologous recombination. In contrast to other methods, AbDesign uses larger segments and relies on the similarity between members of the same protein family to facilitate backbone sampling. In particular, AbDesign breaks proteins from a structure family into a few modular segments based on structural alignments and then recombines these segments into new backbones. AbDesign is able to build large numbers of similar structures even for moderately sized families of homologs.
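      The combinatorial effect of segment recombination can be illustrated with a minimal sketch. It assumes that every family member has already been cut into the same number of aligned segments. The segment labels and the `recombine` helper are hypothetical and are not part of the AbDesign code.

```python
from itertools import product

def recombine(segment_pools):
    """Enumerate every backbone obtained by picking one variant per segment slot."""
    return [tuple(choice) for choice in product(*segment_pools)]

# Hypothetical family of three homologs (A, B, C), each cut into three
# aligned segments. Each inner list holds the variants for one segment slot.
segment_pools = [
    ["A1", "B1", "C1"],
    ["A2", "B2", "C2"],
    ["A3", "B3", "C3"],
]

backbones = recombine(segment_pools)
print(len(backbones), "chimeric backbones from", len(segment_pools[0]), "parents")
```

      With n homologs and k segment slots, the number of recombinants grows as n to the power k, which is why a moderately sized family can already yield many backbones. A real pipeline would presumably also need to check that neighboring segments are geometrically compatible at their junctions, which this sketch omits.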
      The complexity of the backbone design problem can be reduced by symmetry (Fig. 2C). A 4-fold symmetric TIM barrel was designed using the blueprint fragment assembly strategy described above (
      • Huang P.S.
      • Feldmeier K.
      • Parmeggiani F.
      • Velasco D.A.F.
      • Hocker B.
      • Baker D.
      De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy.
      ). Experimental characterization of the designs revealed important hydrogen bonds defining the strand register between repeat units. Tandem repeat proteins made of a series of identical helix–loop–helix–loop structural motifs can be systematically assembled (
      • Brunette T.J.
      • Parmeggiani F.
      • Huang P.S.
      • Bhabha G.
      • Ekiert D.C.
      • Tsutakawa S.E.
      • Hura G.L.
      • Tainer J.A.
      • Baker D.
      Exploring the repeat protein universe through computational protein design.
      ). The designed repeat proteins span a broad range of curvatures. By modulating the curvature, alpha tandem repeat proteins can form closed toroid structures (
      • Doyle L.
      • Hallinan J.
      • Bolduc J.
      • Parmeggiani F.
      • Baker D.
      • Stoddard B.L.
      • Bradley P.
      Rational design of alpha-helical tandem repeat proteins with closed architectures.
      ). A large number of proteins with diverse shapes can be generated by designing rigid junctions to connect helical repeat proteins (
      • Brunette T.J.
      • Bick M.J.
      • Hansen J.M.
      • Chow C.M.
      • Kollman J.M.
      • Baker D.
      Modular repeat protein sculpting using rigid helical junctions.
      ).

      Backbone design by fragment assembly using human intuition

      Human intuition can guide the design of the atomic details of de novo proteins (Fig. 2B). The developers of the online game Foldit (
      • Cooper S.
      • Khatib F.
      • Treuille A.
      • Barbero J.
      • Lee J.
      • Beenen M.
      • Leaver-Fay A.
      • Baker D.
      • Popovic Z.
      • Players F.
      Predicting protein structures with a multiplayer online game.
      ) crowd-sourced solutions for the challenge of de novo protein design (
      • Koepnick B.
      • Flatten J.
      • Husain T.
      • Ford A.
      • Silva D.A.
      • Bick M.J.
      • Bauer A.
      • Liu G.
      • Ishida Y.
      • Boykov A.
      • Estep R.D.
      • Kleinfelter S.
      • Norgard-Solano T.
      • Wei L.
      • Players F.
      • et al.
      De novo protein design by citizen scientists.
      ). Online Foldit players were provided with a set of tools to generate, mutate, move, and score protein structures. Starting from a fully extended peptide chain, players were able to fold the chain into de novo structures and stabilize the structures by sequence optimization. The players designed more than ten million models. The Foldit developers experimentally tested 146 top designs and identified 56 designs that adopted well-folded monomeric structures. The experimentally solved structures of four of these designs closely agreed with the computational models.
      A different strategy incorporates human expert knowledge into the process of backbone generation for design. The TopoBuilder (
      • Yang C.
      • Sesterhenn F.
      • Bonet J.
      • van Aalen E.A.
      • Scheller L.
      • Abriata L.A.
      • Cramer J.T.
      • Wen X.
      • Rosset S.
      • Georgeon S.
      • Jardetzky T.
      • Krey T.
      • Fussenegger M.
      • Merkx M.
      • Correia B.E.
      Bottom-up de novo design of functional proteins with complex structural features.
      ) protocol lets designers build proteins in a bottom-up approach starting from functional motifs (e.g., a helix in a binding interface). Designers define the sizes and three-dimensional coordinates of secondary structure elements. The coordinates are then transformed into constraints for the Rosetta FunFolDes (
      • Bonet J.
      • Wehrle S.
      • Schriever K.
      • Yang C.
      • Billet A.
      • Sesterhenn F.
      • Scheck A.
      • Sverrisson F.
      • Veselkova B.
      • Vollers S.
      • Lourman R.
      • Villard M.
      • Rosset S.
      • Krey T.
      • Correia B.E.
      Rosetta FunFolDes - a general framework for the computational design of functional proteins.
      ) method to build all-atom models. The TopoBuilder protocol successfully designed protein binders (
      • Yang C.
      • Sesterhenn F.
      • Bonet J.
      • van Aalen E.A.
      • Scheller L.
      • Abriata L.A.
      • Cramer J.T.
      • Wen X.
      • Rosset S.
      • Georgeon S.
      • Jardetzky T.
      • Krey T.
      • Fussenegger M.
      • Merkx M.
      • Correia B.E.
      Bottom-up de novo design of functional proteins with complex structural features.
      ).

      Fold family design

      Naturally occurring proteins with the same fold topology can have distinct functions because of fine-tuned differences in the precise geometries of structural elements (
      • Dawson N.L.
      • Lewis T.E.
      • Das S.
      • Lees J.G.
      • Lee D.
      • Ashford P.
      • Orengo C.A.
      • Sillitoe I.
      Cath: An expanded resource to predict protein function through structure and sequence.
      ,
      • Fox N.K.
      • Brenner S.E.
      • Chandonia J.M.
      SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures.
      ). The ability to explore such geometric variation within fold families is critical for design of new protein functions that require precise three-dimensional conformations of active sites. The recently developed loop-helix-loop unit combinatorial sampling method systematically samples loop-helix-loop geometries in arbitrary protein folds by near exhaustive testing of combinations of short loops (
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ) (Fig. 2D). The generated protein geometries had similar distributions to those observed in native structures in the PDB but also included thousands of new structures. Experimentally solved structures spanned a wide range of the sampled distribution. Using a different approach to geometric variation, an enumerative algorithm was developed to sample diverse pocket structures of nuclear transport factor 2 fold proteins (
      • Basanta B.
      • Bick M.J.
      • Bera A.K.
      • Norn C.
      • Chow C.M.
      • Carter L.P.
      • Goreshnik I.
      • Dimaio F.
      • Baker D.
      An enumerative algorithm for de novo design of proteins with diverse pocket structures.
      ). Parameters such as sheet curvatures, loop types, and secondary structure lengths were sampled during a hierarchical backbone assembly process. Thousands of stable designs with diverse pocket geometries were identified by a high-throughput yeast surface display experiment.

      Constrained peptides

      Naturally occurring constrained peptides can have strong pharmacological activities. The GenKIC method (
      • Bhardwaj G.
      • Mulligan V.K.
      • Bahl C.D.
      • Gilmore J.M.
      • Harvey P.J.
      • Cheneval O.
      • Buchko G.W.
      • Pulavarti S.V.
      • Kaas Q.
      • Eletsky A.
      • Huang P.S.
      • Johnsen W.A.
      • Greisen P.J.
      • Rocklin G.J.
      • Song Y.
      • et al.
      Accurate de novo design of hyperstable constrained peptides.
      ) adapted the robotics-inspired kinematic closure algorithm (
      • Coutsias E.A.
      • Seok C.
      • Jacobson M.P.
      • Dill K.A.
      A kinematic view of loop closure.
      ,
      • Mandell D.J.
      • Coutsias E.A.
      • Kortemme T.
      Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling.
      ) from loop modeling, generalized the approach to sample noncanonical backbone degrees of freedom, and applied it to cyclic peptides and peptides constrained by disulfide bonds. The designed peptides closely matched the experimentally solved structures and showed high stability against thermal and chemical denaturation. Kinematic closure methods in Rosetta (
      • Bhardwaj G.
      • Mulligan V.K.
      • Bahl C.D.
      • Gilmore J.M.
      • Harvey P.J.
      • Cheneval O.
      • Buchko G.W.
      • Pulavarti S.V.
      • Kaas Q.
      • Eletsky A.
      • Huang P.S.
      • Johnsen W.A.
      • Greisen P.J.
      • Rocklin G.J.
      • Song Y.
      • et al.
      Accurate de novo design of hyperstable constrained peptides.
      ,
      • Mandell D.J.
      • Coutsias E.A.
      • Kortemme T.
      Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling.
      ) can be used to enumerate backbones of cyclic peptides with seven to ten residues nearly exhaustively (
      • Hosseinzadeh P.
      • Bhardwaj G.
      • Mulligan V.K.
      • Shortridge M.D.
      • Craven T.W.
      • Pardo-Avila F.
      • Rettie S.A.
      • Kim D.E.
      • Silva D.A.
      • Ibrahim Y.M.
      • Webb I.K.
      • Cort J.R.
      • Adkins J.N.
      • Varani G.
      • Baker D.
      Comprehensive computational design of ordered peptide macrocycles.
      ). GenKIC was also applied to design meso-size proteins stabilized by multivalent cross-linkers (
      • Dang B.
      • Wu H.
      • Mulligan V.K.
      • Mravic M.
      • Wu Y.
      • Lemmin T.
      • Ford A.
      • Silva D.A.
      • Baker D.
      • DeGrado W.F.
      De novo design of covalently constrained mesosize protein scaffolds with unique tertiary structures.
      ).

      Backbone design by machine learning

      Machine learning models trained with the rich structural data from the PDB are able to generate novel protein backbone structures (Fig. 2E). A generative adversarial network (
      • Anand N.
      • Eguchi R.R.
      • Huang P.-S.
      Fully differentiable full-atom protein backbone generation.
      ) model builds protein structures represented as pairwise distances between all backbone atoms. A pretrained deep convolutional neural network then recovers the three-dimensional backbone structure from pairwise distances. Some of the designed structures could be recapitulated by fragment-based structure prediction methods (
      • Bradley P.
      • Misura K.M.
      • Baker D.
      Toward high-resolution de novo structure prediction for small proteins.
      ). Another variational autoencoder–based model focused on generating immunoglobulin structures (
      • Eguchi R.R.
      • Anand N.
      • Choe C.A.
      • Huang P.-S.
      IG-VAE: Generative modeling of immunoglobulin proteins by Direct 3D coordinate generation.
      ). The model learned the distribution of immunoglobulin structures and compressed the distribution into a low-dimensional space termed latent space. Immunoglobulins with defined complementarity determining regions can then be generated through latent space sampling. A new method used the idea of neural network “hallucination” (generation of structures) for protein design (
      • Anishchenko I.
      • Chidyausiku T.M.
      • Ovchinnikov S.
      • Pellock S.J.
      • Baker D.
      De novo protein design by deep network hallucination.
      ). The model repurposes the neural network from transform-restrained (TR)-Rosetta (
      • Yang J.
      • Anishchenko I.
      • Park H.
      • Peng Z.
      • Ovchinnikov S.
      • Baker D.
      Improved protein structure prediction using predicted interresidue orientations.
      ). The TR-Rosetta network is a fast method to predict the inter-residue contact map of an arbitrary sequence. A loss function is defined as the Kullback-Leibler divergence (
      • Kullback S.
      • Leibler R.A.
      On information and sufficiency.
      ) between the TR-Rosetta neural network–predicted contact map and a background distribution. Novel sequences and structures can be designed simultaneously by optimizing the loss function through Monte Carlo–simulated annealing. Diverse structures were designed by the model and shown to be folded by experimental characterization.
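      The optimization loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions: `toy_predict` is a hypothetical stand-in for the structure-prediction network (it is not TR-Rosetta), the background distribution is assumed to be uniform over four bins, and the annealing schedule is arbitrary. Only the overall scheme follows the text: score a sequence by the Kullback-Leibler divergence between a predicted distribution and a background, and accept or reject single mutations with the Metropolis criterion while the temperature is lowered.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = [0.25, 0.25, 0.25, 0.25]  # assumed uniform background over 4 bins

def toy_predict(seq):
    """Hypothetical stand-in for a prediction network. Returns a distribution
    over 4 bins that becomes sharper as the fraction of leucines grows."""
    frac = seq.count("L") / len(seq)
    peak = 0.25 + 0.7 * frac
    rest = (1.0 - peak) / 3.0
    return [peak, rest, rest, rest]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hallucinate(length=20, steps=5000, t_start=0.1, t_end=0.001, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = kl_divergence(toy_predict(seq), BACKGROUND)
    for step in range(steps):
        # Geometric cooling from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = kl_divergence(toy_predict(seq), BACKGROUND)
        # Metropolis criterion for maximizing the divergence.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
        else:
            seq[pos] = old  # reject the mutation
    return "".join(seq), score

designed, final_score = hallucinate()
print(designed, round(final_score, 3))
```

      In this sketch the score is maximized directly, which matches the description of the method as maximizing the divergence from the background. Minimizing the negative divergence as a loss would be equivalent.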

      Sequence optimization

      After generation of protein backbones, the second step in a typical de novo protein design protocol is selection of amino acid side-chain types and conformations to stabilize the backbone conformation and to adopt specific three-dimensional active site geometries optimized for function. Early de novo design studies used amino acids that favor specific secondary structure types (
      • Ho S.P.
      • DeGrado W.F.
      Design of a 4-helix bundle protein: Synthesis of peptides which self-associate into a helical protein.
      ) or binary polar/hydrophobic patterns (
      • Kamtekar S.
      • Schiffer J.M.
      • Xiong H.
      • Babik J.M.
      • Hecht M.H.
      Protein design by binary patterning of polar and nonpolar amino acids.
      ) to define protein structures. Because side-chain conformations are clustered as rotamers (
      • Chandrasekaran R.
      • Ramachandran G.N.
      Studies on the conformation of amino acids. XI. Analysis of the observed side group conformation in proteins.
      ,
      • Shapovalov M.V.
      • Dunbrack Jr., R.L.
      A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions.
      ), the side-chain design can be formulated as a discrete optimization problem (
      • Ponder J.W.
      • Richards F.M.
      Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes.
      ), that is, finding a combination of rotamers that minimizes the energy of a structure. The complexity of the problem grows exponentially with the number of residues. Small-scale side-chain design problems can be solved deterministically by the dead-end elimination algorithm (
      • Desmet J.
      • De Maeyer M.
      • Hazes B.
      • Lasters I.
      The dead-end elimination theorem and its use in protein side-chain positioning.
      ), but many de novo protein side-chain optimization problems are too large to be solved deterministically. Instead, amino acid sequences and side-chain conformations are often optimized using Monte Carlo methods (
      • Kuhlman B.
      • Baker D.
      Native protein sequences are close to optimal for their structures.
      ,
      • Lee C.
      • Subbiah S.
      Prediction of protein side-chain conformation by packing optimization.
      ), which are not guaranteed to find the global minimum but often yield solutions that are sufficiently accurate for applications.
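      The discrete formulation can be made concrete with a small sketch. It assumes a pairwise-decomposable energy (one self term per rotamer plus one term per rotamer pair) filled with random numbers. The energy tables, problem size, and annealing schedule are illustrative assumptions, not values from any design package. Exhaustive enumeration is feasible here only because the problem is tiny (4 rotamers at 6 positions gives 4^6 = 4096 combinations), and the Monte Carlo search is shown alongside it for comparison.

```python
import itertools
import math
import random

def make_problem(n_pos=6, n_rot=4, seed=1):
    """Random self and pairwise rotamer energies (illustrative only)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = {
        (i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
        for i in range(n_pos) for j in range(i + 1, n_pos)
    }
    return self_e, pair_e

def total_energy(assign, self_e, pair_e):
    """Energy of one rotamer assignment (one rotamer index per position)."""
    e = sum(self_e[i][r] for i, r in enumerate(assign))
    e += sum(table[assign[i]][assign[j]] for (i, j), table in pair_e.items())
    return e

def brute_force(self_e, pair_e):
    """Exhaustive search; cost grows as n_rot ** n_pos."""
    n_rot = len(self_e[0])
    return min(itertools.product(range(n_rot), repeat=len(self_e)),
               key=lambda a: total_energy(a, self_e, pair_e))

def monte_carlo(self_e, pair_e, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over single-rotamer substitutions."""
    rng = random.Random(seed)
    n_pos, n_rot = len(self_e), len(self_e[0])
    current = [rng.randrange(n_rot) for _ in range(n_pos)]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        pos = rng.randrange(n_pos)
        old = current[pos]
        current[pos] = rng.randrange(n_rot)
        e_new = total_energy(current, self_e, pair_e)
        # Metropolis criterion for minimizing the energy.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / temp):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[pos] = old  # reject the move
    return best, e_best

self_e, pair_e = make_problem()
exact = brute_force(self_e, pair_e)
e_exact = total_energy(exact, self_e, pair_e)
best, e_best = monte_carlo(self_e, pair_e)
print("exhaustive minimum:", round(e_exact, 3), "Monte Carlo best:", round(e_best, 3))
```

      The Monte Carlo result can never be lower than the exhaustive minimum, and nothing guarantees that it reaches it. For a realistic protein with hundreds of positions and many rotamers per position, the exhaustive search is out of reach, which is the motivation for stochastic sampling.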
      The efficiency of side chain sampling methods can be improved by constraining the amino acid types allowed at each residue position. LayerDesign is a common strategy (
      • Dahiyat B.I.
      • Mayo S.L.
      De novo protein design: Fully automated sequence selection.
      ,
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ,
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ,
      • Marcos E.
      • Basanta B.
      • Chidyausiku T.M.
      • Tang Y.
      • Oberdorfer G.
      • Liu G.
      • Swapna G.V.
      • Guan R.
      • Silva D.A.
      • Dou J.
      • Pereira J.H.
      • Xiao R.
      • Sankaran B.
      • Zwart P.H.
      • Montelione G.T.
      • et al.
      Principles for designing proteins with cavities formed by curved beta sheets.
      ,
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ) to constrain designable amino acid types (Fig. 3A). Residue positions are divided into three categories: core, boundary, and surface. The core region allows only hydrophobic amino acids, the surface region allows only polar amino acids, and the boundary region allows all amino acids. The LayerDesign method increases sampling speed and reduces artifacts, such as buried polar residues, which may result from insufficient sampling or scoring errors. To further eliminate flawed designs, the results from Monte Carlo samplers are often filtered by a set of properties such as core packing (
      • Sheffler W.
      • Baker D.
      RosettaHoles: Rapid assessment of protein core packing for structure prediction, refinement, design and validation.
      ) and hydrogen bond satisfaction (
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      ) (Fig. 3B). A high-throughput stability screen of designed small proteins showed that buried nonpolar surface area and local sequence-structure compatibility had strong correlations with the stabilities of designs (
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ).
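      The layer-based restriction described above can be sketched as a lookup from a burial measure to a set of allowed amino acids. The neighbor-count cutoffs, the residue groupings, and the function names below are illustrative assumptions and do not reproduce the defaults of any particular LayerDesign implementation.

```python
import math

# Assumed residue groupings for illustration only.
HYDROPHOBIC = set("AFILMVWY")
POLAR = set("DEHKNQRST")
ALL = set("ACDEFGHIKLMNPQRSTVWY")

ALLOWED = {"core": HYDROPHOBIC, "boundary": ALL, "surface": POLAR}

def assign_layer(n_neighbors, core_cutoff=20, surface_cutoff=12):
    """Classify a position by its number of neighboring residues.
    The cutoffs are hypothetical."""
    if n_neighbors >= core_cutoff:
        return "core"
    if n_neighbors <= surface_cutoff:
        return "surface"
    return "boundary"

def allowed_residues(neighbor_counts):
    """Return the sorted list of allowed amino acids for each position."""
    return [sorted(ALLOWED[assign_layer(n)]) for n in neighbor_counts]

# Hypothetical neighbor counts for five positions.
neighbor_counts = [25, 15, 8, 22, 10]
allowed = allowed_residues(neighbor_counts)

restricted = math.prod(len(a) for a in allowed)
unrestricted = 20 ** len(neighbor_counts)
print("sequence space:", restricted, "of", unrestricted)
```

      Even for five positions the restricted sequence space is a small fraction of the unrestricted one, which illustrates why layer restrictions speed up sampling. The same restriction also rules out buried polar residues by construction.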
      Figure 3. Advances in side-chain design. A, in layer design, polar residues (cyan) are only allowed at surface and boundary positions, while hydrophobic residues (yellow) are only allowed at boundary and core positions. B, structures generated by side-chain design methods can be evaluated by a set of filters, such as core packing quality, hydrogen bond satisfaction, and local sequence/structure compatibility. C, side-chain design methods that exploit backbone flexibility outperform fixed backbone methods (
      • Loshbaugh A.L.
      • Kortemme T.
      Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions.
      ). D, the HBNet method (
      • Boyken S.E.
      • Chen Z.
      • Groves B.
      • Langan R.A.
      • Oberdorfer G.
      • Ford A.
      • Gilmore J.M.
      • Xu C.
      • DiMaio F.
      • Pereira J.H.
      • Sankaran B.
      • Seelig G.
      • Zwart P.H.
      • Baker D.
      De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity.
      ) designs hydrogen bond networks. E, neural networks can predict the probabilities of sequences given a backbone structure (
      • Anand-Achim N.
      • Eguchi R.R.
      • Derry A.
      • Altman R.B.
      • Huang P.-S.
      Protein sequence design with a learned potential.
      ,
      • Wang J.
      • Cao H.
      • Zhang J.Z.H.
      • Qi Y.
      Computational protein design with deep learning neural networks.
      ) (red). Generative machine learning models design sequences by latent space sampling (
      • Karimi M.
      • Zhu S.
      • Cao Y.
      • Shen Y.
      De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks.
      ,
      • Davidsen K.
      • Olson B.J.
      • DeWitt 3rd, W.S.
      • Feng J.
      • Harkins E.
      • Bradley P.
      • Matsen F.A.t.
      Deep generative models for T cell receptor protein sequences.
      ,
      • Hawkins-Hooker A.
      • Depardieu F.
      • Baur S.
      • Couairon G.
      • Chen A.
      • Bikard D.
      Generating functional protein variants with variational autoencoders.
      ,
      • Ingraham J.
      • Garg V.K.
      • Barzilay R.
      • Jaakkola T.
      Generative models for graph-based protein design.
      ,
      • Strokach A.
      • Becerra D.
      • Corbi-Verge C.
      • Perez-Riba A.
      • Kim P.M.
      Fast and flexible protein design using deep graph neural networks.
      ) (green). The TR-Rosetta neural network predicts the probability of the structure of a given sequence. The difference between the desired structure and the predicted structure can be backpropagated through the neural network to optimize the sequence (
      • Norn C.
      • Wicky B.I.M.
      • Juergens D.
      • Liu S.
      • Kim D.
      • Koepnick B.
      • Anishchenko I.
      • Players F.
      • Baker D.
      • Ovchinnikov S.
      Protein sequence design by explicit energy landscape optimization.
      ) (blue). TR-Rosetta, transform-restrained Rosetta.

      Sequence optimization with flexible backbones

      Solutions of fixed backbone side-chain design problems are sensitive to the backbone structures used as input. Because the repulsive component of the Lennard-Jones potential term in scoring functions (see the section below) scales as the inverse 12th power of distance when two atoms are close to each other, a small adjustment to the backbone structure may result in a considerable energy change. To address these problems, state-of-the-art side-chain design methods sample both side-chain rotamers and local backbone conformations (
      • Georgiev I.
      • Keedy D.
      • Richardson J.S.
      • Richardson D.C.
      • Donald B.R.
      Algorithm for backrub motions in protein design.
      ,
      • Keedy D.A.
      • Georgiev I.
      • Triplett E.B.
      • Donald B.R.
      • Richardson D.C.
      • Richardson J.S.
      The role of local backrub motions in evolved and designed mutations.
      ,
      • Ollikainen N.
      • de Jong R.M.
      • Kortemme T.
      Coupling protein side-chain and backbone flexibility improves the Re-design of protein-ligand specificity.
      ,
      • Georgiev I.
      • Donald B.R.
      Dead-end elimination with backbone flexibility.
      ) (Fig. 3C). Typically, methods that exploit backbone flexibility or use backbone ensembles outperform fixed backbone design (
      • Ollikainen N.
      • Smith C.A.
      • Fraser J.S.
      • Kortemme T.
      Flexible backbone sampling methods to model and design protein alternative conformations.
      ,
      • Davey J.A.
      • Chica R.A.
      Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles.
      ). A study benchmarked (
      • Loshbaugh A.L.
      • Kortemme T.
      Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions.
      ) several flexible backbone side-chain design methods including CoupledMoves (
      • Ollikainen N.
      • de Jong R.M.
      • Kortemme T.
      Coupling protein side-chain and backbone flexibility improves the Re-design of protein-ligand specificity.
      ), BackrubEnsemble (
      • Smith C.A.
      • Kortemme T.
      Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.
      ), and FastDesign, and compared them with a fixed-backbone design method that used the same scoring function. Methods that optimize sequence and backbone structure simultaneously rather than sequentially, such as CoupledMoves (
      • Ollikainen N.
      • de Jong R.M.
      • Kortemme T.
      Coupling protein side-chain and backbone flexibility improves the Re-design of protein-ligand specificity.
      ), may be advantageous (
      • Loshbaugh A.L.
      • Kortemme T.
      Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions.
      ).
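
      The sensitivity to small backbone shifts noted at the start of this section can be illustrated with the standard 12-6 Lennard-Jones form. The sketch below is a toy calculation only: the well depth `epsilon` and the contact distance `sigma` are illustrative values chosen for this example, not parameters of any scoring function discussed here.

```python
def lennard_jones(r, epsilon=0.1, sigma=3.5):
    """12-6 Lennard-Jones energy (kcal/mol) for two atoms r angstroms apart.

    epsilon (well depth) and sigma (zero-crossing distance) are
    illustrative values, not taken from a specific force field.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


# Two atoms that are already slightly too close, then pushed 0.3 angstroms
# closer, a shift comparable to a small backbone adjustment.
e_before = lennard_jones(3.2)
e_after = lennard_jones(2.9)
print(f"E(3.2 A) = {e_before:.2f} kcal/mol")
print(f"E(2.9 A) = {e_after:.2f} kcal/mol")
```

      With these illustrative parameters, moving the pair 0.3 Å closer raises the energy several-fold, which is the kind of steep response that makes fixed-backbone solutions sensitive to the input structure.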

      Hydrogen-bonding networks

      Hydrogen bonds play an important role in the specificity of protein–ligand and protein–protein interactions. A hydrogen bond can form only within narrow ranges of distances and orientations between the donor and acceptor groups (
      • Kortemme T.
      • Morozov A.V.
      • Baker D.
      An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes.
      ). Almost all hydrogen bond donor or acceptor groups in a protein must form hydrogen bonds within the protein or with solvent molecules to avoid large energetic penalties of unsatisfied hydrogen bonds (
      • McDonald I.K.
      • Thornton J.M.
      Satisfying hydrogen bonding potential in proteins.
      ). The HBNet method addresses the challenges of hydrogen bond design by systematically searching for possible hydrogen-bond networks (
      • Boyken S.E.
      • Chen Z.
      • Groves B.
      • Langan R.A.
      • Oberdorfer G.
      • Ford A.
      • Gilmore J.M.
      • Xu C.
      • DiMaio F.
      • Pereira J.H.
      • Sankaran B.
      • Seelig G.
      • Zwart P.H.
      • Baker D.
      De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity.
      ) (Fig. 3D). HBNet constructs a graph whose nodes are rotamers that have hydrogen bond donors or acceptors. Two nodes are connected by an edge if the rotamers of the nodes can form hydrogen bonds. Hydrogen bond networks can be generated by traversing the graph. HBNet was successfully applied to design helical bundle homo-oligomers with specificity mediated by hydrogen bond networks. A Monte Carlo version of the HBNet method uses a stochastic algorithm to traverse the HBNet graph (
      • Maguire J.B.
      • Boyken S.E.
      • Baker D.
      • Kuhlman B.
      Rapid sampling of hydrogen bond networks for computational protein design.
      ). This new approach significantly improves the sampling speed and makes larger design problems possible.
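
      The graph construction described above can be sketched in a few lines. This is a toy illustration and not the HBNet implementation: the rotamer labels and the list of compatible pairs are made-up inputs, and a "network" is taken here to be any connected set of rotamers that uses at most one rotamer per residue position.

```python
from collections import defaultdict

# Hypothetical input: each rotamer is (residue_position, rotamer_label).
# Every listed pair is assumed to be geometrically able to hydrogen bond.
hbond_pairs = [
    ((10, "SER_a"), (45, "GLN_b")),
    ((45, "GLN_b"), (78, "ASN_a")),
    ((10, "SER_b"), (78, "ASN_a")),
    ((21, "THR_a"), (60, "TYR_c")),
]


def build_graph(pairs):
    """Nodes are rotamers; an edge joins two rotamers that can hydrogen bond."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def enumerate_networks(graph, min_size=2):
    """Grow connected rotamer sets, using at most one rotamer per residue."""
    seen = set()

    def grow(network):
        if network in seen:
            return
        seen.add(network)
        used_positions = {pos for pos, _ in network}
        for node in network:
            for nbr in graph[node]:
                if nbr not in network and nbr[0] not in used_positions:
                    grow(network | {nbr})

    for start in list(graph):
        grow(frozenset([start]))
    return {net for net in seen if len(net) >= min_size}


networks = enumerate_networks(build_graph(hbond_pairs))
for net in sorted(networks, key=lambda n: (-len(n), sorted(n))):
    print(sorted(net))
```

      Exhaustive growth of this kind becomes expensive as the graph gets larger, which is the cost that a stochastic traversal is meant to reduce.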

      Sequence design using machine learning methods

      Several machine learning methods for protein sequence design have been developed recently (Fig. 3E). Deep neural networks have been trained to predict the probabilities of amino acids at each residue position of a backbone structure (
      • Anand-Achim N.
      • Eguchi R.R.
      • Derry A.
      • Altman R.B.
      • Huang P.-S.
      Protein sequence design with a learned potential.
      ,
      • Wang J.
      • Cao H.
      • Zhang J.Z.H.
      • Qi Y.
      Computational protein design with deep learning neural networks.
      ). Generative models learn distributions of protein sequences and can generate new native-like protein sequences with or without input backbone structures. A number of generative models were developed for sequence design, including generative adversarial networks (
      • Karimi M.
      • Zhu S.
      • Cao Y.
      • Shen Y.
      De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks.
      ), variational autoencoders (
      • Davidsen K.
      • Olson B.J.
      • DeWitt 3rd, W.S.
      • Feng J.
      • Harkins E.
      • Bradley P.
      • Matsen 4th, F.A.
      Deep generative models for T cell receptor protein sequences.
      ,
      • Hawkins-Hooker A.
      • Depardieu F.
      • Baur S.
      • Couairon G.
      • Chen A.
      • Bikard D.
      Generating functional protein variants with variational autoencoders.
      ), and graph-based (
      • Ingraham J.
      • Garg V.K.
      • Barzilay R.
      • Jaakkola T.
      Generative models for graph-based protein design.
      ,
      • Strokach A.
      • Becerra D.
      • Corbi-Verge C.
      • Perez-Riba A.
      • Kim P.M.
      Fast and flexible protein design using deep graph neural networks.
      ) models. Notably, the structure prediction neural network from TR-Rosetta (
      • Yang J.
      • Anishchenko I.
      • Park H.
      • Peng Z.
      • Ovchinnikov S.
      • Baker D.
      Improved protein structure prediction using predicted interresidue orientations.
      ) can be repurposed for sequence optimization (
      • Norn C.
      • Wicky B.I.M.
      • Juergens D.
      • Liu S.
      • Kim D.
      • Koepnick B.
      • Anishchenko I.
      • Players F.
      • Baker D.
      • Ovchinnikov S.
      Protein sequence design by explicit energy landscape optimization.
      ). For a protein sequence, the TR-Rosetta neural network predicts distances, angles, and dihedrals for every pair of residues. A loss function is defined as the difference between the prediction and the target structure. The gradient of the loss is then back-propagated through the TR-Rosetta neural network to optimize the sequence. Combining machine learning models with traditional Monte Carlo samplers improves performance over either approach alone (
      • Wang J.
      • Cao H.
      • Zhang J.Z.H.
      • Qi Y.
      Computational protein design with deep learning neural networks.
      ,
      • Norn C.
      • Wicky B.I.M.
      • Juergens D.
      • Liu S.
      • Kim D.
      • Koepnick B.
      • Anishchenko I.
      • Players F.
      • Baker D.
      • Ovchinnikov S.
      Protein sequence design by explicit energy landscape optimization.
      ).
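
      The idea of optimizing a sequence by back-propagating a structural loss can be sketched with a deliberately simple stand-in. In the example below, a fixed random linear map plays the role of the structure predictor; it is a hypothetical placeholder and not the TR-Rosetta network. The sizes `L`, `A`, and `F`, the step size, and the squared-error loss are likewise assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, A, F = 8, 20, 16          # sequence length, amino acid types, feature count

# Stand-in "structure predictor": a fixed random linear map from a soft
# sequence (L x A probabilities) to F structural features. Hypothetical;
# a real predictor would be a trained deep network.
W = rng.normal(size=(L * A, F)) / np.sqrt(L * A)


def predict(probs):
    return probs.reshape(-1) @ W


def softmax(logits):
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


# Target features, generated from a hidden reference sequence so that a
# good solution is known to exist.
reference = np.eye(A)[rng.integers(A, size=L)]
target = predict(reference)

logits = np.zeros((L, A))    # start from a uniform soft sequence
losses = []
for step in range(300):
    probs = softmax(logits)
    residual = predict(probs) - target
    losses.append(0.5 * float(residual @ residual))
    # Backpropagate: loss -> features -> probabilities -> logits.
    grad_probs = (W @ residual).reshape(L, A)
    grad_logits = probs * (grad_probs - (probs * grad_probs).sum(axis=1, keepdims=True))
    logits -= 0.5 * grad_logits

designed = softmax(logits).argmax(axis=1)   # most probable residue index per position
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

      The sequence is kept as a matrix of continuous logits during optimization so that gradients are defined, and it is discretized only at the end by taking the most probable residue at each position.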

      Scoring functions for the design

      Scoring functions in computational protein design aim to distinguish designs with desired properties from those that do not adopt the intended structures and functions, typically by identifying low-energy sequence–structure combinations. Early protein energy functions (
      • Levitt M.
      • Lifson S.
      Refinement of protein conformations using a macromolecular energy minimization procedure.
      ) used harmonic terms for bond energies and a Lennard–Jones potential for van der Waals interactions. Modern physics–based energy functions (
      • Brooks B.R.
      • Bruccoleri R.E.
      • Olafson B.D.
      • States D.J.
      • Swaminathan S.
      • Karplus M.
      Charmm: A program for macromolecular energy, minimization, and dynamics calculations.
      ,
      • Jorgensen W.L.
      • Tirado-Rives J.
      The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin.
      ,
      • Cornell W.D.
      • Cieplak P.
      • Bayly C.I.
      • Gould I.R.
      • Merz K.M.
      • Ferguson D.M.
      • Spellmeyer D.C.
      • Fox T.
      • Caldwell J.W.
      • Kollman P.A.
      A second generation force field for the simulation of proteins, nucleic acids, and organic molecules (vol 117, pg 5179, 1995).
      ) account for additional energy terms such as electrostatics and desolvation. An alternative approach to physics-based energy terms is using statistics from known structures to derive potential functions (
      • Tanaka S.
      • Scheraga H.A.
      Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins.
      ). The first version of the scoring function in the Rosetta program for structural modeling and design was developed for protein structure prediction (
      • Rohl C.A.
      • Strauss C.E.
      • Misura K.M.
      • Baker D.
      Protein structure prediction using Rosetta.
      ) and was a statistical potential function derived from structures in the PDB (
      • Berman H.M.
      • Westbrook J.
      • Feng Z.
      • Gilliland G.
      • Bhat T.N.
      • Weissig H.
      • Shindyalov I.N.
      • Bourne P.E.
      The protein data bank.
      ,
      • Bernstein F.C.
      • Koetzle T.F.
      • Williams G.J.
      • Meyer Jr., E.F.
      • Brice M.D.
      • Rodgers J.R.
      • Kennard O.
      • Shimanouchi T.
      • Tasumi M.
      The protein data bank: A computer-based archival file for macromolecular structures.
      ) using Bayesian statistics (
      • Simons K.T.
      • Kooperberg C.
      • Huang E.
      • Baker D.
      Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
      ). To adapt Rosetta for protein design, all-atom detail and physics-based terms were incorporated (
      • Kuhlman B.
      • Dantas G.
      • Ireton G.C.
      • Varani G.
      • Stoddard B.L.
      • Baker D.
      Design of a novel globular protein fold with atomic-level accuracy.
      ,
      • Kortemme T.
      • Morozov A.V.
      • Baker D.
      An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes.
      ), which in turn led to considerable advances in both protein structure prediction and protein design (
      • Bradley P.
      • Misura K.M.
      • Baker D.
      Toward high-resolution de novo structure prediction for small proteins.
      ,
      • Schueler-Furman O.
      • Wang C.
      • Bradley P.
      • Misura K.
      • Baker D.
      Progress in modeling of protein structures and interactions.
      ). The current version of the Rosetta force field used for design is similar to modern molecular mechanics force fields (
      • Park H.
      • Bradley P.
      • Greisen Jr., P.
      • Liu Y.
      • Mulligan V.K.
      • Kim D.E.
      • Baker D.
      • DiMaio F.
      Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules.
      ,
      • Alford R.F.
      • Leaver-Fay A.
      • Jeliazkov J.R.
      • O'Meara M.J.
      • DiMaio F.P.
      • Park H.
      • Shapovalov M.V.
      • Renfrew P.D.
      • Mulligan V.K.
      • Kappel K.
      • Labonte J.W.
      • Pacella M.S.
      • Bonneau R.
      • Bradley P.
      • Dunbrack R.L.
      • et al.
      The Rosetta all-atom energy function for macromolecular modeling and design.
      ), but it includes an orientation dependence of hydrogen-bonding interactions based on PDB statistics and electronic structure calculations (
      • Kortemme T.
      • Morozov A.V.
      • Baker D.
      An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes.
      ,
      • Morozov A.V.
      • Kortemme T.
      • Tsemekhman K.
      • Baker D.
      Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations.
      ); the orientation dependence of hydrogen bonding is important for designing interaction specificity critical to many functions (
      • Chen Z.
      • Kibler R.D.
      • Hunt A.
      • Busch F.
      • Pearl J.
      • Jia M.
      • VanAernum Z.L.
      • Wicky B.I.M.
      • Dods G.
      • Liao H.
      • Wilken M.S.
      • Ciarlo C.
      • Green S.
      • El-Samad H.
      • Stamatoyannopoulos J.
      • et al.
      De novo design of protein logic gates.
      ,
      • Boyken S.E.
      • Chen Z.
      • Groves B.
      • Langan R.A.
      • Oberdorfer G.
      • Ford A.
      • Gilmore J.M.
      • Xu C.
      • DiMaio F.
      • Pereira J.H.
      • Sankaran B.
      • Seelig G.
      • Zwart P.H.
      • Baker D.
      De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity.
      ,
      • Chen Z.
      • Boyken S.E.
      • Jia M.
      • Busch F.
      • Flores-Solis D.
      • Bick M.J.
      • Lu P.
      • VanAernum Z.L.
      • Sahasrabuddhe A.
      • Langan R.A.
      • Bermeo S.
      • Brunette T.J.
      • Mulligan V.K.
      • Carter L.P.
      • DiMaio F.
      • et al.
      Programmable design of orthogonal protein heterodimers.
      ). In the following, we highlight recent developments in scoring functions for membrane proteins and for interactions with nonprotein molecules, as well as scoring approaches that learn from structures in the PDB.

      Membrane scoring functions

      Scoring functions for soluble proteins take advantage of the large number of solved structures in the PDB to validate and fit the parameters of the score function (
      • Jacobson M.P.
      • Kaminski G.A.
      • Friesner R.A.
      • Rapp C.S.
      Force field validation using protein side chain prediction.
      ,
      • Leaver-Fay A.
      • O'Meara M.J.
      • Tyka M.
      • Jacak R.
      • Song Y.F.
      • Kellogg E.H.
      • Thompson J.
      • Davis I.W.
      • Pache R.A.
      • Lyskov S.
      • Gray J.J.
      • Kortemme T.
      • Richardson J.S.
      • Havranek J.J.
      • Snoeyink J.
      • et al.
      Scientific benchmarks for guiding macromolecular energy function improvement.
      ). Transmembrane proteins make up about 30% of ORFs in known genomes but are currently underrepresented in the PDB, complicating the development of membrane protein scoring functions. An early version of the Rosetta membrane scoring function (
      • Yarov-Yarovoy V.
      • Schonbrun J.
      • Baker D.
      Multipass membrane protein structure prediction using Rosetta.
      ) used statistics from 28 transmembrane proteins to fit parameters and was validated by ab initio structure prediction of 12 multipass membrane proteins. Recently, a new membrane scoring model (
      • Alford R.F.
      • Fleming P.J.
      • Fleming K.G.
      • Gray J.J.
      Protein structure prediction and design in a biologically realistic implicit membrane.
      ) was developed, which aims to better capture the heterogeneous membrane environment (Fig. 4A). The interface between bulk water and bulk lipid is modeled as a continuous transition of hydration fraction, with water-filled pores modeled using a convex-hull algorithm (
      • Koehler Leman J.
      • Lyskov S.
      • Bonneau R.
      Computing structure-based lipid accessibility of membrane proteins with mp_lipid_acc in RosettaMP.
      ). The water-to-bilayer transfer energy is then calculated using the hydration fraction and the Moon and Fleming hydrophobicity scale (
      • Moon C.P.
      • Fleming K.G.
      Side-chain hydrophobicity scale derived from transmembrane protein folding into lipid bilayers.
      ). This membrane model improves performance in several computational tests, including prediction of membrane protein orientation, calculation of changes in membrane protein stability upon mutation, discrimination of native structures from incorrect models, and the extent to which the native sequence is recovered in design simulations.
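
      The relationship between hydration fraction and transfer energy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published model: the logistic shape of the transition, the `half_thickness` and `steepness` parameters, the treatment of pore-facing sites as fully hydrated, and the per-residue energies are all placeholders.

```python
import math


def hydration_fraction(z, half_thickness=15.0, steepness=0.5):
    """Fraction of water around a site at depth z (angstroms from bilayer center).

    A logistic curve is assumed for illustration: close to 0 in the lipid
    core and close to 1 in bulk water. The functional form and both
    parameters are placeholders.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (abs(z) - half_thickness)))


def transfer_energy(z, dg_water_to_lipid, in_pore=False):
    """Water-to-bilayer transfer energy scaled by how dehydrated the site is.

    Sites flagged as pore-facing are treated as fully hydrated (assumption).
    """
    f_hyd = 1.0 if in_pore else hydration_fraction(z)
    return (1.0 - f_hyd) * dg_water_to_lipid


# Hypothetical per-residue transfer free energies (kcal/mol); real values
# would come from an experimental hydrophobicity scale.
dg = {"LEU": -2.0, "ARG": 3.0}
for z in (0.0, 15.0, 30.0):
    print(z, {res: round(transfer_energy(z, g), 2) for res, g in dg.items()})
```

      In this sketch a hydrophobic residue is rewarded and a charged residue is penalized near the bilayer center, and both contributions fade smoothly toward zero in bulk water or inside a water-filled pore.
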
      Figure 4. Advances in scoring functions. A, a membrane scoring function (
      • Alford R.F.
      • Fleming P.J.
      • Fleming K.G.
      • Gray J.J.
      Protein structure prediction and design in a biologically realistic implicit membrane.
      ) uses a continuous hydration fraction to calculate the free energy change of residues from water to the lipid environment. Water pores in membrane proteins are explicitly modeled. B, protein design scoring functions are generalized to model small molecules (
      • Park H.
      • Zhou G.
      • Baek M.
      • Baker D.
      • DiMaio F.
      Learning a force field from small-molecule crystal lattice predictions enables consistent sub-Angstrom protein-ligand docking.
      ) and carbohydrates (
      • Labonte J.W.
      • Adolf-Bryfogle J.
      • Schief W.R.
      • Gray J.J.
      Residue-centric modeling and design of saccharide and glycoconjugate structures.
      ). C, the TERMs-based scoring function (
      • Zheng F.
      • Zhang J.
      • Grigoryan G.
      Tertiary structural propensities reveal fundamental sequence/structure relationships.
      ) breaks proteins into tertiary structure motifs and evaluates the fitness of the sequence for any local structure using the sequence profiles of the tertiary motifs. D, machine learning methods predict the probability of sequences given a structure (
      • Anand-Achim N.
      • Eguchi R.R.
      • Derry A.
      • Altman R.B.
      • Huang P.-S.
      Protein sequence design with a learned potential.
      ) or the probability of structures given a sequence (
      • Norn C.
      • Wicky B.I.M.
      • Juergens D.
      • Liu S.
      • Kim D.
      • Koepnick B.
      • Anishchenko I.
      • Players F.
      • Baker D.
      • Ovchinnikov S.
      Protein sequence design by explicit energy landscape optimization.
      ). The predicted probabilities can be used as scores for the compatibility between sequences and structures. TERMs, tertiary structural motifs.

      Scoring interactions with nonprotein molecules

      Many protein functions involve interactions with other types of molecules such as DNA, RNA, saccharides, or small molecules. Expanding the types of molecules supported by scoring functions is critical for designing such protein functions. Scoring functions for DNA (
      • Havranek J.J.
      • Duarte C.M.
      • Baker D.
      A simple physical model for the prediction and design of protein-DNA interactions.
      ) and RNA (
      • Das R.
      • Baker D.
      Automated de novo prediction of native-like RNA tertiary structures.
      ) have been successfully applied to structure prediction and design (
      • Ashworth J.
      • Havranek J.J.
      • Duarte C.M.
      • Sussman D.
      • Monnat Jr., R.J.
      • Stoddard B.L.
      • Baker D.
      Computational redesign of endonuclease DNA binding and cleavage specificity.
      ,
      • Das R.
      • Karanicolas J.
      • Baker D.
      Atomic accuracy in predicting and designing noncanonical RNA structure.
      ). Recently, a scoring function was developed for saccharide and glycoconjugate structures (
      • Labonte J.W.
      • Adolf-Bryfogle J.
      • Schief W.R.
      • Gray J.J.
      Residue-centric modeling and design of saccharide and glycoconjugate structures.
      ) (Fig. 4B). Benchmarking results on docking problems showed that the scoring function can predict the binding of glycan ligands. Small molecules have highly diverse combinations of chemical groups, making it challenging to transfer parameters calculated for representative molecules to other molecules. A new approach (
      • Park H.
      • Zhou G.
      • Baek M.
      • Baker D.
      • DiMaio F.
      Learning a force field from small-molecule crystal lattice predictions enables consistent sub-Angstrom protein-ligand docking.
      ) simultaneously optimized all parameters in a small-molecule energy function guided by thousands of small-molecule crystal structures. The resulting scoring functions significantly improved docking success rate.

      TERM-based scoring

      Protein design methods typically seek low-energy sequences for a given target structure, but this approach does not consider whether a sequence can adopt alternative structures that have even lower free energies. One way to overcome this limitation is to directly calculate the fitness for a given structure in protein sequence space. Protein structures can be broken up into three-dimensional local pieces called tertiary structural motifs (TERMs) (
      • Zheng F.
      • Zhang J.
      • Grigoryan G.
      Tertiary structural propensities reveal fundamental sequence/structure relationships.
      ) (Fig. 4C). Half of the structures in the PDB can be described by only about 600 TERMs (
      • Mackenzie C.O.
      • Zhou J.
      • Grigoryan G.
      Tertiary alphabet for the observable protein structural universe.
      ), indicating that the sequence preferences of each TERM could be used to calculate the fitness of a sequence for a given local structure. A strong correlation (
      • Zheng F.
      • Zhang J.
      • Grigoryan G.
      Tertiary structural propensities reveal fundamental sequence/structure relationships.
      ) was observed between the TERM-derived scores and protein structure model accuracies from the Critical Assessment of Structure Prediction. Recently, the TERM score was used to predict protein–peptide binding energies and design peptide binders of antiapoptotic proteins Bfl-1 and Mcl-1 (
      • Frappier V.
      • Jenson J.M.
      • Zhou J.
      • Grigoryan G.
      • Keating A.E.
      Tertiary structural motif sequence statistics enable facile prediction and design of peptides that bind anti-apoptotic Bfl-1 and Mcl-1.
      ).
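
      The use of motif sequence preferences as a score can be sketched as a position-specific log-odds calculation. The example below is an assumption-laden illustration rather than the published TERM scoring procedure: the matched fragment sequences are invented, a uniform background frequency is assumed, and a simple pseudocount handles unobserved amino acids.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_from_matches(match_sequences, pseudocount=1.0):
    """Per-position amino acid frequencies from sequences of structural matches."""
    length = len(match_sequences[0])
    total = len(match_sequences) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in match_sequences)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def term_score(fragment_sequence, profile, background=1.0 / len(AMINO_ACIDS)):
    """Sum of log-odds; higher means the fragment sequence fits the motif better."""
    return sum(math.log(profile[i][aa] / background)
               for i, aa in enumerate(fragment_sequence))


# Hypothetical sequences of PDB fragments that structurally match one motif.
matches = ["LKELL", "LEELL", "IKELL", "LKDLV", "LRELL"]
profile = profile_from_matches(matches)
print(round(term_score("LKELL", profile), 2))
print(round(term_score("GPGPG", profile), 2))
```

      A full-length sequence would be scored by combining such contributions over all motifs that cover the structure; how overlapping motifs are weighted is left out of this sketch.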

      Protein scoring functions by machine learning methods

      The power of machine learning models to learn the statistical representations underlying rich sequence and structural data provides new perspectives for protein structure prediction and design (
      • Senior A.W.
      • Evans R.
      • Jumper J.
      • Kirkpatrick J.
      • Sifre L.
      • Green T.
      • Qin C.
      • Zidek A.
      • Nelson A.W.R.
      • Bridgland A.
      • Penedones H.
      • Petersen S.
      • Simonyan K.
      • Crossan S.
      • Kohli P.
      • et al.
      Improved protein structure prediction using potentials from deep learning.
      ,
      • Yang J.
      • Anishchenko I.
      • Park H.
      • Peng Z.
      • Ovchinnikov S.
      • Baker D.
      Improved protein structure prediction using predicted interresidue orientations.
      ,
      • Alley E.C.
      • Khimulya G.
      • Biswas S.
      • AlQuraishi M.
      • Church G.M.
      Unified rational protein engineering with sequence-based deep representation learning.
      ,
      • Xu J.
      Distance-based protein folding powered by deep learning.
      ) (Fig. 4D). Neural network models trained with evolutionary sequence data and structures from the PDB outperform traditional methods in structure prediction (
      • Senior A.W.
      • Evans R.
      • Jumper J.
      • Kirkpatrick J.
      • Sifre L.
      • Green T.
      • Qin C.
      • Zidek A.
      • Nelson A.W.R.
      • Bridgland A.
      • Penedones H.
      • Petersen S.
      • Simonyan K.
      • Crossan S.
      • Kohli P.
      • et al.
      Improved protein structure prediction using potentials from deep learning.
      ,
      • Yang J.
      • Anishchenko I.
      • Park H.
      • Peng Z.
      • Ovchinnikov S.
      • Baker D.
      Improved protein structure prediction using predicted interresidue orientations.
      ,
      • Xu J.
      Distance-based protein folding powered by deep learning.
      ). Most recently, it has been proposed that neural networks that predict inter-residue orientations (defined by three dihedral and two planar angles) can be inverted for assessing the probability of the desired structure for a given sequence; in principle, such an approach could be used as a scoring function for protein design to evaluate the fitness of a sequence across an entire structural landscape (
      • Norn C.
      • Wicky B.I.M.
      • Juergens D.
      • Liu S.
      • Kim D.
      • Koepnick B.
      • Anishchenko I.
      • Players F.
      • Baker D.
      • Ovchinnikov S.
      Protein sequence design by explicit energy landscape optimization.
      ). Another approach using a deep convolutional neural network scoring function seeks to predict the probability distribution of amino acid types at each residue position conditioned on the local environment (
      • Anand-Achim N.
      • Eguchi R.R.
      • Derry A.
      • Altman R.B.
      • Huang P.-S.
      Protein sequence design with a learned potential.
      ).

      Design of new protein functions

      Proteins perform functions by placing atoms with certain physicochemical properties at specific positions in three-dimensional space. Initial work on functional protein design directly borrowed from native functional site “motifs” (three-dimensional arrangements of functional groups in an existing active site) (
      • Hellinga H.W.
      • Richards F.M.
      Construction of new ligand binding sites in proteins of known structure. I. Computer-aided modeling of sites with pre-defined geometry.
      ). Recent developments and successful applications of de novo protein structure design methods are gradually overcoming the limitations imposed by the use of existing functional sites, and are beginning to make it possible to design both the precise placement of arbitrary functional groups and the surrounding protein environment de novo. In the following sections, we describe advances in the design of binding proteins for ligands and other proteins, large protein assemblies, membrane proteins, and protein switches.

      Ligand-binding sites

      Ligand binding is a common function of native proteins. De novo ligand-binding site design requires high accuracy in sampling and scoring. Specificity of ligand binding is often achieved through polar interactions, which are highly sensitive to the positions and orientations of polar groups. A misaligned hydrogen bond can incur a considerable free energy penalty and reduce the binding affinity by an order of magnitude. Early studies designed de novo binding sites by manually defining side chains that form favorable interactions with ligands (
      • Bick M.J.
      • Greisen P.J.
      • Morey K.J.
      • Antunes M.S.
      • La D.
      • Sankaran B.
      • Reymond L.
      • Johnsson K.
      • Medford J.I.
      • Baker D.
      Computational design of environmental sensors for the potent opioid fentanyl.
      ,
      • Tinberg C.E.
      • Khare S.D.
      • Dou J.
      • Doyle L.
      • Nelson J.W.
      • Schena A.
      • Jankowski W.
      • Kalodimos C.G.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Computational design of ligand-binding proteins with high affinity and selectivity.
      ,
      • Polizzi N.F.
      • Wu Y.
      • Lemmin T.
      • Maxwell A.M.
      • Zhang S.Q.
      • Rawson J.
      • Beratan D.N.
      • Therien M.J.
      • DeGrado W.F.
      De novo design of a hyperstable non-natural protein-ligand complex with sub-Å accuracy.
      ). An effort that uses HBNet and a Monte Carlo sequence design algorithm to design hydrogen bonds resulted in designs that bind to ligands, but a crystal structure revealed that the ligand is rotated 180° in the pocket around a pseudo-two-fold axis in the compound (
      • Dou J.
      • Doyle L.
      • Greisen Jr., P.
      • Schena A.
      • Park H.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Sampling and energy evaluation challenges in ligand binding protein design.
      ). The authors suggested that the sampling methods failed to model subtle structural changes and that the scoring function underestimated desolvation energies for the ligand. This result highlights the challenges inherent in sampling and energy evaluation in binding-site designs.
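
      The earlier statement that a misaligned hydrogen bond can cost an order of magnitude in affinity can be put in energetic terms using the standard relation between binding free energy and the dissociation constant, ΔG = RT ln Kd. The short calculation below assumes room temperature (298.15 K); the function names are chosen for this example.

```python
import math

R = 1.987e-3      # gas constant, kcal/(mol*K)
T = 298.15        # temperature, K (assumed room temperature)


def ddg_for_fold_change(fold_change):
    """Free energy change (kcal/mol) matching a given fold change in Kd."""
    return R * T * math.log(fold_change)


def fold_change_for_ddg(ddg):
    """Fold change in Kd produced by a free energy penalty (kcal/mol)."""
    return math.exp(ddg / (R * T))


print(f"{ddg_for_fold_change(10):.2f} kcal/mol per tenfold change in Kd")
```

      A tenfold loss in affinity therefore corresponds to a penalty of roughly 1.4 kcal/mol, so scoring and sampling errors of this size are enough to change the outcome of a binding-site design.
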
      Recent developments in binding site–generation methods aim to address these challenges. The rotamer interaction field (RIF) docking method (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ) generates an ensemble of billions of discrete amino acid side chains that make hydrogen-bonding and hydrophobic interactions with the target ligand. The method then searches for protein backbone scaffolds that are able to present ligand-binding side chains with the appropriate geometry. RIF docking was successfully applied to design a binding site for the fluorogenic compound DFHBI into a de novo beta barrel scaffold (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ). Two other methods use the structural information in the PDB to generate binding-site ensembles (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ,
      • Lucas J.E.
      • Kortemme T.
      New computational protein design methods for de novo small molecule binding sites.
      ). These methods break the ligand into smaller substructures (fragments) and find protein residues that interact with the ligand fragments from the PDB. The interacting residues are combined into binding sites by Monte Carlo–simulated annealing (
      • Lucas J.E.
      • Kortemme T.
      New computational protein design methods for de novo small molecule binding sites.
      ) or built onto backbone scaffolds by an algorithm called Convergent Motifs for Binding Sites (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ). The Convergent Motifs for Binding Sites method was applied to engineer de novo proteins that bind the drug apixaban with low-micromolar and submicromolar affinity (Fig. 5A).
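
      Combining fragment-interacting residues into a binding site by Monte Carlo simulated annealing can be sketched generically. The code below is not the implementation of either published method: the candidate residues, their one-dimensional coordinates (a stand-in for three-dimensional geometry), the interaction energies, the clash penalty, and the cooling schedule are all invented for illustration.

```python
import math
import random


def anneal(candidates, energy, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Pick one candidate per slot by Metropolis Monte Carlo with cooling."""
    rng = random.Random(seed)
    state = [rng.choice(options) for options in candidates]
    current = energy(state)
    best, best_energy = list(state), current
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        slot = rng.randrange(len(candidates))
        trial = list(state)
        trial[slot] = rng.choice(candidates[slot])
        trial_energy = energy(trial)
        delta = trial_energy - current
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current = trial, trial_energy
            if current < best_energy:
                best, best_energy = list(state), current
    return best, best_energy


# Hypothetical candidates per ligand fragment:
# (residue label, 1D coordinate, interaction energy with the fragment).
candidates = [
    [("ASP_1", 0.0, -2.0), ("SER_1", 3.0, -1.0)],
    [("TYR_2", 0.5, -2.5), ("PHE_2", 6.0, -1.5)],
    [("ARG_3", 6.2, -3.0), ("LYS_3", 9.0, -2.0)],
]


def site_energy(state):
    total = sum(e for _, _, e in state)
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            if abs(state[i][1] - state[j][1]) < 2.0:   # crude clash check
                total += 10.0
    return total


best, best_energy = anneal(candidates, site_energy)
print([label for label, _, _ in best], best_energy)
```

      In this toy problem, choosing the individually best residue for every fragment produces a clash, so the lowest-energy site is a compromise that the annealing search has to find.
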
      Figure 5. Advances in the design of new protein functions. A, an apixaban (yellow) binder designed by the Convergent Motifs for Binding Sites (COMBS) algorithm (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      ). B, a de novo protein (green) binds the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein (gray) (
      • Cao L.
      • Goreshnik I.
      • Coventry B.
      • Case J.B.
      • Miller L.
      • Kozodoy L.
      • Chen R.E.
      • Carter L.
      • Walls A.C.
      • Park Y.J.
      • Strauch E.M.
      • Stewart L.
      • Diamond M.S.
      • Veesler D.
      • Baker D.
      De novo design of picomolar SARS-CoV-2 miniprotein inhibitors.
      ). C, de novo proteins self-assemble into heterodimers (
      • Chen Z.
      • Boyken S.E.
      • Jia M.
      • Busch F.
      • Flores-Solis D.
      • Bick M.J.
      • Lu P.
      • VanAernum Z.L.
      • Sahasrabuddhe A.
      • Langan R.A.
      • Bermeo S.
      • Brunette T.J.
      • Mulligan V.K.
      • Carter L.P.
      • DiMaio F.
      • et al.
      Programmable design of orthogonal protein heterodimers.
      ), two-dimensional materials (
      • Chen Z.
      • Johnson M.C.
      • Chen J.
      • Bick M.J.
      • Boyken S.E.
      • Lin B.
      • De Yoreo J.J.
      • Kollman J.M.
      • Baker D.
      • DiMaio F.
      Self-assembling 2D arrays with de Novo protein building blocks.
      ), filaments (
      • Shen H.
      • Fallas J.A.
      • Lynch E.
      • Sheffler W.
      • Parry B.
      • Jannetty N.
      • Decarreau J.
      • Wagenbach M.
      • Vicente J.J.
      • Chen J.
      • Wang L.
      • Dowling Q.
      • Oberdorfer G.
      • Stewart L.
      • Wordeman L.
      • et al.
      De novo design of self-assembling helical protein filaments.
      ), cages (
      • King N.P.
      • Bale J.B.
      • Sheffler W.
      • McNamara D.E.
      • Gonen S.
      • Gonen T.
      • Yeates T.O.
      • Baker D.
      Accurate design of co-assembling multi-component protein nanomaterials.
      ), and alpha amyloids (
      • Zhang S.Q.
      • Huang H.
      • Yang J.
      • Kratochvil H.T.
      • Lolicato M.
      • Liu Y.
      • Shu X.
      • Liu L.
      • DeGrado W.F.
      Designed peptides that assemble into cross-alpha amyloid-like structures.
      ). D, a de novo–designed multipass transmembrane protein that has a defined membrane orientation (
      • Lu P.
      • Min D.
      • DiMaio F.
      • Wei K.Y.
      • Vahey M.D.
      • Boyken S.E.
      • Chen Z.
      • Fallas J.A.
      • Ueda G.
      • Sheffler W.
      • Mulligan V.K.
      • Xu W.
      • Bowie J.U.
      • Baker D.
      Accurate computational design of multipass transmembrane proteins.
      ). E, the designed DANCER protein has a tryptophan side chain that switches between predicted conformational states on the millisecond timescale (
      • Davey J.A.
      • Damry A.M.
      • Goto N.K.
      • Chica R.A.
      Rational design of proteins that exchange on functional timescales.
      ).

      Protein binders

      Similar to ligand binding-site design, designing protein binders to target proteins requires high-accuracy scoring and sampling. A workaround to these challenges is to use binding motifs from known protein–protein interfaces. Proteins that bind to influenza hemagglutinin and botulinum neurotoxin B (
      • Chevalier A.
      • Silva D.A.
      • Rocklin G.J.
      • Hicks D.R.
      • Vergara R.
      • Murapa P.
      • Bernard S.M.
      • Zhang L.
      • Lam K.H.
      • Yao G.
      • Bahl C.D.
      • Miyashita S.I.
      • Goreshnik I.
      • Fuller J.T.
      • Koday M.T.
      • et al.
      Massively parallel de novo protein design for targeted therapeutics.
      ) were designed by building known helical motifs that bind to the intended targets onto de novo designed small protein scaffolds (
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ). Several hundred high-affinity binders were validated by a high-throughput yeast surface display assay. Likewise, proteins that bind to the interleukin-2 and interleukin-15 receptors were designed by building a helical bundle from interface helices of native interleukin-2 and interleukin-15 (
      • Silva D.A.
      • Yu S.
      • Ulge U.Y.
      • Spangler J.B.
      • Jude K.M.
      • Labao-Almeida C.
      • Ali L.R.
      • Quijano-Rubio A.
      • Ruterbusch M.
      • Leung I.
      • Biary T.
      • Crowley S.J.
      • Marcos E.
      • Walkey C.D.
      • Weitzner B.D.
      • et al.
      De novo design of potent and selective mimics of IL-2 and IL-15.
      ).
      Although difficult, interaction interfaces can also be designed without native motifs. Recently, the RIF docking method, originally developed for small-molecule binding-site design, was applied to design small helical bundle proteins that bind to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein (
      • Cao L.
      • Goreshnik I.
      • Coventry B.
      • Case J.B.
      • Miller L.
      • Kozodoy L.
      • Chen R.E.
      • Carter L.
      • Walls A.C.
      • Park Y.J.
      • Strauch E.M.
      • Stewart L.
      • Diamond M.S.
      • Veesler D.
      • Baker D.
      De novo design of picomolar SARS-CoV-2 miniprotein inhibitors.
      ) (Fig. 5B), yielding binders with affinities ranging from high nanomolar to micromolar. After experimental optimization, the most potent design had a 100-pM affinity to spike.

      Protein assembly

      Several design studies have addressed the problem of protein–protein interface design in which both sides of each interface are designed, leading to protein assembly (Fig. 5C). Homo-oligomers with cyclic symmetries were designed by systematic enumeration of arrangements of the monomers followed by interface design (
      • Fallas J.A.
      • Ueda G.
      • Sheffler W.
      • Nguyen V.
      • McNamara D.E.
      • Sankaran B.
      • Pereira J.H.
      • Parmeggiani F.
      • Brunette T.J.
      • Cascio D.
      • Yeates T.R.
      • Zwart P.
      • Baker D.
      Computational design of self-assembling cyclic protein homo-oligomers.
      ). A set of heterodimers that have orthogonal binding specificities were designed using parametric backbone generation and HBNet (
      • Chen Z.
      • Boyken S.E.
      • Jia M.
      • Busch F.
      • Flores-Solis D.
      • Bick M.J.
      • Lu P.
      • VanAernum Z.L.
      • Sahasrabuddhe A.
      • Langan R.A.
      • Bermeo S.
      • Brunette T.J.
      • Mulligan V.K.
      • Carter L.P.
      • DiMaio F.
      • et al.
      Programmable design of orthogonal protein heterodimers.
      ). The orthogonal heterodimers can be used to design protein logic gates (
      • Chen Z.
      • Kibler R.D.
      • Hunt A.
      • Busch F.
      • Pearl J.
      • Jia M.
      • VanAernum Z.L.
      • Wicky B.I.M.
      • Dods G.
      • Liao H.
      • Wilken M.S.
      • Ciarlo C.
      • Green S.
      • El-Samad H.
      • Stamatoyannopoulos J.
      • et al.
      De novo design of protein logic gates.
      ). Self-assembled nanocages with higher-order symmetries were designed by symmetric docking followed by Monte Carlo interface sequence design (
      • King N.P.
      • Bale J.B.
      • Sheffler W.
      • McNamara D.E.
      • Gonen S.
      • Gonen T.
      • Yeates T.O.
      • Baker D.
      Accurate design of co-assembling multi-component protein nanomaterials.
      ,
      • Hsia Y.
      • Bale J.B.
      • Gonen S.
      • Shi D.
      • Sheffler W.
      • Fong K.K.
      • Nattermann U.
      • Xu C.
      • Huang P.S.
      • Ravichandran R.
      • Yi S.
      • Davis T.N.
      • Gonen T.
      • King N.P.
      • Baker D.
      Design of a hyperstable 60-subunit protein dodecahedron. [corrected].
      ). Fusing the designed cages to membrane binding and endosomal sorting recruiting peptides induced the formation of nanocage-containing extracellular vesicles (
      • Votteler J.
      • Ogohara C.
      • Yi S.
      • Hsia Y.
      • Nattermann U.
      • Belnap D.M.
      • King N.P.
      • Sundquist W.I.
      Designed proteins induce the formation of nanocage-containing extracellular vesicles.
      ). The strategy of combining symmetric arrangement of protein chains and Monte Carlo interface sequence design was also successfully applied to design protein filaments (
      • Shen H.
      • Fallas J.A.
      • Lynch E.
      • Sheffler W.
      • Parry B.
      • Jannetty N.
      • Decarreau J.
      • Wagenbach M.
      • Vicente J.J.
      • Chen J.
      • Wang L.
      • Dowling Q.
      • Oberdorfer G.
      • Stewart L.
      • Wordeman L.
      • et al.
      De novo design of self-assembling helical protein filaments.
      ), alpha amyloid-like structures (
      • Zhang S.Q.
      • Huang H.
      • Yang J.
      • Kratochvil H.T.
      • Lolicato M.
      • Liu Y.
      • Shu X.
      • Liu L.
      • DeGrado W.F.
      Designed peptides that assemble into cross-alpha amyloid-like structures.
      ), or two-dimensional materials (
      • Gonen S.
      • DiMaio F.
      • Gonen T.
      • Baker D.
      Design of ordered two-dimensional arrays mediated by noncovalent protein-protein interfaces.
      ,
      • Chen Z.
      • Johnson M.C.
      • Chen J.
      • Bick M.J.
      • Boyken S.E.
      • Lin B.
      • De Yoreo J.J.
      • Kollman J.M.
      • Baker D.
      • DiMaio F.
      Self-assembling 2D arrays with de Novo protein building blocks.
      ).
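      As an illustration of how a systematic enumeration of cyclic arrangements might be set up, the Python sketch below places copies of a monomer around the z axis with C_n symmetry over a coarse grid of symmetry orders and radii. It is a simplified sketch under stated assumptions: the names `cyclic_copies` and `enumerate_arrangements` and the grid values are hypothetical, the monomer is represented only by a list of coordinates, and a real protocol would also sample the orientation of the monomer and score each resulting interface.

```python
import math


def cyclic_copies(coords, n, radius):
    """Return n copies of coords arranged with C_n symmetry about the z axis.

    coords: list of (x, y, z) tuples for one monomer, centred near the origin.
    radius: shift applied along +x before rotating (hypothetical parameter).
    """
    copies = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        copy = []
        for x, y, z in coords:
            xs = x + radius  # move the monomer away from the symmetry axis
            copy.append((c * xs - s * y, s * xs + c * y, z))
        copies.append(copy)
    return copies


def enumerate_arrangements(coords, symmetries=(2, 3, 4), radii=(8.0, 10.0, 12.0)):
    """Yield (n, radius, copies) for every point on a coarse grid."""
    for n in symmetries:
        for radius in radii:
            yield n, radius, cyclic_copies(coords, n, radius)
```

      Each arrangement produced this way would then be passed to an interface design step, and only arrangements whose designed interfaces score well would be kept.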

      Membrane proteins

      Proteins that localize to phospholipid bilayer membranes have been designed since the emergence of de novo protein design (
      • Lear J.D.
      • Wasserman Z.R.
      • DeGrado W.F.
      Synthetic amphiphilic peptide models for protein ion channels.
      ,
      • Whitley P.
      • Nilsson I.
      • von Heijne G.
      De novo design of integral membrane proteins.
      ). Membrane-spanning peptides that self-assemble into helical bundles were designed to perform functions such as cofactor binding (
      • Korendovych I.V.
      • Senes A.
      • Kim Y.H.
      • Lear J.D.
      • Fry H.C.
      • Therien M.J.
      • Blasie J.K.
      • Walker F.A.
      • Degrado W.F.
      De novo design and molecular assembly of a transmembrane diporphyrin-binding protein complex.
      ) and ion transport (
      • Joh N.H.
      • Wang T.
      • Bhate M.P.
      • Acharya R.
      • Wu Y.
      • Grabe M.
      • Hong M.
      • Grigoryan G.
      • DeGrado W.F.
      De novo design of a transmembrane Zn2+-transporting four-helix bundle.
      ). Recent advances have expanded the scope of membrane protein design. A study of the driving forces of membrane protein stability showed that steric packing of nonpolar side chains alone is sufficient for the folding of membrane proteins (
      • Mravic M.
      • Thomaston J.L.
      • Tucker M.
      • Solomon P.E.
      • Liu L.
      • DeGrado W.F.
      Packing of apolar side chains enables accurate design of highly stable membrane proteins.
      ). Using a steric packing code derived from the natural protein phospholamban, the authors were able to design a synthetic membrane protein stabilized entirely by nonpolar side chains. Accurate multipass transmembrane proteins were designed (
      • Lu P.
      • Min D.
      • DiMaio F.
      • Wei K.Y.
      • Vahey M.D.
      • Boyken S.E.
      • Chen Z.
      • Fallas J.A.
      • Ueda G.
      • Sheffler W.
      • Mulligan V.K.
      • Xu W.
      • Bowie J.U.
      • Baker D.
      Accurate computational design of multipass transmembrane proteins.
      ) using a recently developed framework for membrane protein modeling (
      • Alford R.F.
      • Koehler Leman J.
      • Weitzner B.D.
      • Duran A.M.
      • Tilley D.C.
      • Elazar A.
      • Gray J.J.
      An integrated framework advancing membrane protein modeling and design.
      ) (Fig. 5D). Parametrically generated backbones were stabilized by hydrogen bond networks designed with HBNet and Monte Carlo side-chain optimization. Orientations of the designs were specified by incorporating a ring of amphipathic aromatic residues at the lipid-water boundary on the extracellular side and a ring of positively charged residues on the cytoplasmic side. This strategy was then applied to design transmembrane pores (
      • Xu C.
      • Lu P.
      • Gamal El-Din T.M.
      • Pei X.Y.
      • Johnson M.C.
      • Uyeda A.
      • Bick M.J.
      • Xu Q.
      • Jiang D.
      • Bai H.
      • Reggiano G.
      • Hsia Y.
      • Brunette T.J.
      • Dou J.
      • Ma D.
      • et al.
      Computational design of transmembrane pores.
      ). Although there was no explicit modeling of ligands that can pass through the pores, several designs displayed ligand specificity: a designed 12-helix pore selectively passed potassium over sodium, and a designed 16-helix pore (but not the 12-helix pore) enabled the passage of biotinylated Alexa Fluor 488.

      Conformational changes

      Among the most challenging functions to design are conformational changes between multiple states. A single-state design would be successful as long as the designed state resides in a deep energy minimum, so that sizable scoring errors can often be tolerated (
      • Baker D.
      What has de novo protein design taught us about protein folding and biophysics?.
      ). However, multistate design (MSD) requires considerable accuracy in scoring relative stabilities, such that the probability distributions among multiple states can be modeled correctly. In addition, MSD must simultaneously optimize several objectives, for example, the energies of each state and the energy differences between states. This multiple-objective optimization problem adds significant challenges to sequence design. A recently developed meta-MSD protocol designed a protein with a tryptophan side chain that switches between defined conformational states on the millisecond timescale (
      • Davey J.A.
      • Damry A.M.
      • Goto N.K.
      • Chica R.A.
      Rational design of proteins that exchange on functional timescales.
      ) (Fig. 5E). Meta-MSD used a backrub ensemble of backbones (
      • Smith C.A.
      • Kortemme T.
      Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.
      ) as the input. Side chains were then designed by optimizing the Boltzmann-weighted average energy of all members from the ensemble using the fast and accurate side chain topology and energy refinement (FASTER) algorithm (
      • Allen B.D.
      • Mayo S.L.
      An efficient algorithm for multistate protein design based on FASTER.
      ). The energy landscape of a designed sequence was estimated using energies of each backbone structure from the ensemble. Sequences with energy landscapes that supported desired conformational dynamics were selected as final designs.
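      The Boltzmann-weighted average energy used as the optimization target above can be written down in a few lines. The Python sketch below is only an illustration of that quantity, not the meta-MSD implementation: the function names are hypothetical, and the value of `KT` assumes energies expressed in kcal/mol near room temperature, which may not match the units of a given scoring function.

```python
import math

KT = 0.593  # assumed thermal energy in kcal/mol near 298 K


def boltzmann_weights(energies, kt=KT):
    """Return the Boltzmann weight of each ensemble member."""
    e_min = min(energies)  # subtract the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def boltzmann_average_energy(energies, kt=KT):
    """Boltzmann-weighted average energy of a sequence over an ensemble.

    energies: one energy per backbone in the ensemble for a single sequence.
    """
    weights = boltzmann_weights(energies, kt)
    return sum(w * e for w, e in zip(weights, energies))
```

      Because low-energy ensemble members dominate the weights, this average is never higher than the plain arithmetic mean of the same energies, so a sequence is rewarded mainly for the backbones on which it scores well.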

      Protein switches

      Protein switches change their conformations when triggered by external signals, adding a potential extra layer of complexity over designing proteins that adopt multiple conformations. However, designing switches could be seen as a more tractable problem because the external trigger can introduce a large free-energy bias toward one state, making the design success less sensitive to scoring errors. An early study described a protein designed to switch between two distinct target folds triggered by the addition of Zn2+ (
      • Ambroggio X.I.
      • Kuhlman B.
      Computational design of a single amino acid sequence that can switch between two distinct protein folds.
      ). The authors used a Monte Carlo side-chain design method to optimize the sum of energies of the two folded states, showing that it is possible to design protein switches by solving a single-objective optimization problem. Following similar principles, other proteins were designed to change the oligomerization state in response to a pH change (
      • Boyken S.E.
      • Benhaim M.A.
      • Busch F.
      • Jia M.
      • Bick M.J.
      • Choi H.
      • Klima J.C.
      • Chen Z.
      • Walkey C.
      • Mileant A.
      • Sahasrabuddhe A.
      • Wei K.Y.
      • Hodge E.A.
      • Byron S.
      • Quijano-Rubio A.
      • et al.
      De novo design of tunable, pH-driven conformational changes.
      ) (Fig. 6A) or change conformations in the presence of Ca2+ (
      • Wei K.Y.
      • Moschidi D.
      • Bick M.J.
      • Nerli S.
      • McShan A.C.
      • Carter L.P.
      • Huang P.S.
      • Fletcher D.A.
      • Sgourakis N.G.
      • Boyken S.E.
      • Baker D.
      Computational design of closely related proteins that adopt two well-defined but structurally divergent folds.
      ) (Fig. 6B). A modular protein switch that senses a small molecule was designed through an induced dimerization mechanism (
      • Glasgow A.A.
      • Huang Y.M.
      • Mandell D.J.
      • Thompson M.
      • Ritterson R.
      • Loshbaugh A.L.
      • Pellegrino J.
      • Krivacic C.
      • Pache R.A.
      • Barlow K.A.
      • Ollikainen N.
      • Jeon D.
      • Kelly M.J.S.
      • Fraser J.S.
      • Kortemme T.
      Computational design of a modular protein sense-response system.
      ) (Fig. 6C). A ligand binding site for farnesyl pyrophosphate was designed de novo at the interface of a protein–protein heterodimer complex. The designed proteins dimerized in the presence of the farnesyl pyrophosphate ligand and were able to transduce several modular downstream signals such as enzyme activity, fluorescence, or luminescence. Latching orthogonal cage-key proteins (LOCKR) is another recently designed protein switch system (
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ), consisting of a helical bundle and a helical peptide called the key (Fig. 6D). The key peptide can displace a helix in the bundle and expose a signal on the displaced helix. The latching orthogonal cage-key proteins system was used to induce protein degradation and localization (
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ), target cells with precise combinations of surface antigens (
      • Lajoie M.J.
      • Boyken S.E.
      • Salter A.I.
      • Bruffey J.
      • Rajan A.
      • Langan R.A.
      • Olshefsky A.
      • Muhunthan V.
      • Bick M.J.
      • Gewe M.
      • Quijano-Rubio A.
      • Johnson J.
      • Lenz G.
      • Nguyen A.
      • Pun S.
      • et al.
      Designed protein logic to target cells with precise combinations of surface antigens.
      ), and detect viral proteins (
      • Quijano-Rubio A.
      • Yeh H.W.
      • Park J.
      • Lee H.
      • Langan R.A.
      • Boyken S.E.
      • Lajoie M.J.
      • Cao L.
      • Chow C.M.
      • Miranda M.C.
      • Wi J.
      • Hong H.J.
      • Stewart L.
      • Oh B.H.
      • Baker D.
      De novo design of modular and tunable allosteric biosensors.
      ).
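      The argument at the start of this section, that an external trigger can bias the free energy strongly toward one state and so make a switch design tolerant of scoring errors, can be illustrated with a two-state model. The Python sketch below is a textbook two-state population formula, not a method from the cited studies; the function name `population_b`, the value of `RT`, and the example free energies are assumptions chosen purely for illustration.

```python
import math

RT = 0.593  # assumed thermal energy in kcal/mol near 298 K


def population_b(delta_g, rt=RT):
    """Fraction of molecules in state B for a two-state system.

    delta_g: free energy of state B minus state A (kcal/mol, assumed units).
    """
    return 1.0 / (1.0 + math.exp(delta_g / rt))


# Hypothetical switch: state B is disfavoured by 2 kcal/mol without the
# trigger, and the trigger stabilizes state B by 5 kcal/mol.
untriggered = population_b(2.0)
triggered = population_b(2.0 - 5.0)

# The same design with a 1 kcal/mol scoring error in either direction.
triggered_low = population_b(2.0 - 5.0 - 1.0)
triggered_high = population_b(2.0 - 5.0 + 1.0)
```

      With these illustrative numbers, state B stays the minority state without the trigger and the dominant state with it, even when the assumed scoring error is applied, which is the sense in which a large trigger-induced bias relaxes the accuracy required of the scoring function.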
      Figure 6. Advances in the design of protein switches that change conformation in response to diverse signals. A, a designed helical trimer changes its oligomerization state in response to pH changes (
      • Boyken S.E.
      • Benhaim M.A.
      • Busch F.
      • Jia M.
      • Bick M.J.
      • Choi H.
      • Klima J.C.
      • Chen Z.
      • Walkey C.
      • Mileant A.
      • Sahasrabuddhe A.
      • Wei K.Y.
      • Hodge E.A.
      • Byron S.
      • Quijano-Rubio A.
      • et al.
      De novo design of tunable, pH-driven conformational changes.
      ). B, a designed helical bundle protein changes conformation upon binding to a calcium ion (green) and a chloride ion (blue) (
      • Wei K.Y.
      • Moschidi D.
      • Bick M.J.
      • Nerli S.
      • McShan A.C.
      • Carter L.P.
      • Huang P.S.
      • Fletcher D.A.
      • Sgourakis N.G.
      • Boyken S.E.
      • Baker D.
      Computational design of closely related proteins that adopt two well-defined but structurally divergent folds.
      ). C, a designed artificial chemically induced dimerization system (
      • Glasgow A.A.
      • Huang Y.M.
      • Mandell D.J.
      • Thompson M.
      • Ritterson R.
      • Loshbaugh A.L.
      • Pellegrino J.
      • Krivacic C.
      • Pache R.A.
      • Barlow K.A.
      • Ollikainen N.
      • Jeon D.
      • Kelly M.J.S.
      • Fraser J.S.
      • Kortemme T.
      Computational design of a modular protein sense-response system.
      ) assembles upon binding to a farnesyl pyrophosphate ligand (spheres), linking ligand binding (sensing) to a modular response through reconstitution of a split output module (gray, magenta). D, in the LOCKR system, a helical peptide “key” (magenta) can displace and expose a signal peptide (green) (
      • Langan R.A.
      • Boyken S.E.
      • Ng A.H.
      • Samson J.A.
      • Dods G.
      • Westbrook A.M.
      • Nguyen T.H.
      • Lajoie M.J.
      • Chen Z.
      • Berger S.
      • Mulligan V.K.
      • Dueber J.E.
      • Novak W.R.P.
      • El-Samad H.
      • Baker D.
      De novo design of bioactive protein switches.
      ). LOCKR, latching orthogonal cage-key proteins.

      Future perspectives

      The development of computational methods for de novo protein design in the last two decades has expanded the scope of designable protein structures and functions considerably. Automatic computational tools have enabled nonexperts to accurately design well-folded de novo proteins (
      • Koepnick B.
      • Flatten J.
      • Husain T.
      • Ford A.
      • Silva D.A.
      • Bick M.J.
      • Bauer A.
      • Liu G.
      • Ishida Y.
      • Boykov A.
      • Estep R.D.
      • Kleinfelter S.
      • Norgard-Solano T.
      • Wei L.
      • Players F.
      • et al.
      De novo protein design by citizen scientists.
      ). However, de novo protein design is not a solved problem. Because proteins have highly diverse structures and functions, design problems also vary greatly in difficulty (Fig. 7, Tables 1 and 2). While robust protocols exist for designing helical bundles and small, idealized proteins with certain alpha-beta fold topologies (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ,
      • Huang P.S.
      • Oberdorfer G.
      • Xu C.
      • Pei X.Y.
      • Nannenga B.L.
      • Rogers J.M.
      • DiMaio F.
      • Gonen T.
      • Luisi B.
      • Baker D.
      High thermodynamic stability of parametrically designed helical bundles.
      ,
      • Rocklin G.J.
      • Chidyausiku T.M.
      • Goreshnik I.
      • Ford A.
      • Houliston S.
      • Lemak A.
      • Carter L.
      • Ravichandran R.
      • Mulligan V.K.
      • Chevalier A.
      • Arrowsmith C.H.
      • Baker D.
      Global analysis of protein folding using massively parallel design, synthesis, and testing.
      ), the success rates for other proteins such as beta barrels can be low (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      ,
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      ,
      • Huang P.S.
      • Feldmeier K.
      • Parmeggiani F.
      • Velasco D.A.F.
      • Hocker B.
      • Baker D.
      De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy.
      ). Addressing these challenging problems still requires a significant amount of expertise and sometimes trial and error. Challenges are particularly apparent in the design of proteins with new functions (Fig. 7). New protein structures can be designed with considerable success rates without experimental optimization (Table 1), but the activities of proteins derived directly from computational design are often weaker than the activities achievable by naturally evolved proteins. Therefore, computational designs are often (although not always) optimized by experimental methods such as site saturation mutagenesis (
      • Cao L.
      • Goreshnik I.
      • Coventry B.
      • Case J.B.
      • Miller L.
      • Kozodoy L.
      • Chen R.E.
      • Carter L.
      • Walls A.C.
      • Park Y.J.
      • Strauch E.M.
      • Stewart L.
      • Diamond M.S.
      • Veesler D.
      • Baker D.
      De novo design of picomolar SARS-CoV-2 miniprotein inhibitors.
      ,
      • Tinberg C.E.
      • Khare S.D.
      • Dou J.
      • Doyle L.
      • Nelson J.W.
      • Schena A.
      • Jankowski W.
      • Kalodimos C.G.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Computational design of ligand-binding proteins with high affinity and selectivity.
      ).
      Figure 7. Success rates reported for design studies listed in Tables 1 and 2. The success rate is defined as the percentage of reported designs in each study that adopt the designed structure (folded, blue; experimental structure determined, orange) or function (green, red). The circle size denotes the number of folded/functional designs in each study. The success rates for studies where proteins were de novo–designed to have new structures are varied but can be high with many designs (blue). In contrast, success rates and numbers of successful designs for proteins with new functions (green) are much lower, except in a few cases where functional designs were all-helical proteins (red). Only studies that reported ten or more experimentally characterized designs () are included. “Folded” refers to designs that were characterized by CD and/or NMR spectroscopy or had an experimentally determined structure, displayed the expected oligomerization state (if measured), and/or were functional (if designed to have a function).
      Table 1. Success rates of designs tested by low- to medium-throughput experiments
      Design goal and reference | Designs tested | Soluble | Folded (CD) | Correct monomer/oligomer | Folded (NMR) | Solved structure | Functional
      Here we use a broad definition of functions, including, for example, membrane localization or formation of defined complex structure.
      Highly stable helical bundles (
      • Huang P.S.
      • Oberdorfer G.
      • Xu C.
      • Pei X.Y.
      • Nannenga B.L.
      • Rogers J.M.
      • DiMaio F.
      • Gonen T.
      • Luisi B.
      • Baker D.
      High thermodynamic stability of parametrically designed helical bundles.
      )
      95553
      Ideal α-β proteins (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      )
      54453217165
      Ideal α-β proteins (
      • Lin Y.R.
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Clouser A.F.
      • Montelione G.T.
      • Baker D.
      Control over overall shape and size in de novo designed proteins.
      )
      72644739176
      Proteins with curved β-sheets (
      • Marcos E.
      • Basanta B.
      • Chidyausiku T.M.
      • Tang Y.
      • Oberdorfer G.
      • Liu G.
      • Swapna G.V.
      • Guan R.
      • Silva D.A.
      • Dou J.
      • Pereira J.H.
      • Xiao R.
      • Sankaran B.
      • Zwart P.H.
      • Montelione G.T.
      • et al.
      Principles for designing proteins with cavities formed by curved beta sheets.
      )
      66585354258
      Proteins with the jelly roll topology (
      • Marcos E.
      • Chidyausiku T.M.
      • McShan A.C.
      • Evangelidis T.
      • Nerli S.
      • Carter L.
      • Nivon L.G.
      • Davis A.
      • Oberdorfer G.
      • Tripsianes K.
      • Sgourakis N.G.
      • Baker D.
      De novo design of a non-local beta-sheet protein with high stability and accuracy.
      )
      19162221
      Novel helical folds (
      • Jacobs T.M.
      • Williams B.
      • Williams T.
      • Xu X.
      • Eletsky A.
      • Federizon J.F.
      • Szyperski T.
      • Kuhlman B.
      Design of structurally distinct proteins using strategies inspired by evolution.
      )
      118442
      FoldIt player designed proteins (
      • Koepnick B.
      • Flatten J.
      • Husain T.
      • Ford A.
      • Silva D.A.
      • Bick M.J.
      • Bauer A.
      • Liu G.
      • Ishida Y.
      • Boykov A.
      • Estep R.D.
      • Kleinfelter S.
      • Norgard-Solano T.
      • Wei L.
      • Players F.
      • et al.
      De novo protein design by citizen scientists.
      )
      14610156664
      4-Fold symmetric TIM barrels (
      • Huang P.S.
      • Feldmeier K.
      • Parmeggiani F.
      • Velasco D.A.F.
      • Hocker B.
      • Baker D.
      De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy.
      )
      222251
      Leucine-repeat proteins (
      • Park K.
      • Shen B.W.
      • Parmeggiani F.
      • Huang P.S.
      • Stoddard B.L.
      • Baker D.
      Control of repeat-protein curvature by computational protein design.
      )
      292925227
      Repeat proteins (
      • Brunette T.J.
      • Parmeggiani F.
      • Huang P.S.
      • Bhabha G.
      • Ekiert D.C.
      • Tsutakawa S.E.
      • Hura G.L.
      • Tainer J.A.
      • Baker D.
      Exploring the repeat protein universe through computational protein design.
      )
      8374725315
      Repeat proteins with closed toroid structures (
      • Doyle L.
      • Hallinan J.
      • Bolduc J.
      • Parmeggiani F.
      • Baker D.
      • Stoddard B.L.
      • Bradley P.
      Rational design of alpha-helical tandem repeat proteins with closed architectures.
      )
      20104
      De novo fold families (
      • Pan X.
      • Thompson M.C.
      • Zhang Y.
      • Liu L.
      • Fraser J.S.
      • Kelly M.J.S.
      • Kortemme T.
      Expanding the space of protein geometries by computational design of de novo fold families.
      )
      45241717174
      Constrained peptides (
      • Bhardwaj G.
      • Mulligan V.K.
      • Bahl C.D.
      • Gilmore J.M.
      • Harvey P.J.
      • Cheneval O.
      • Buchko G.W.
      • Pulavarti S.V.
      • Kaas Q.
      • Eletsky A.
      • Huang P.S.
      • Johnsen W.A.
      • Greisen P.J.
      • Rocklin G.J.
      • Song Y.
      • et al.
      Accurate de novo design of hyperstable constrained peptides.
      )
      13712
      Peptide macrocycles (
      • Hosseinzadeh P.
      • Bhardwaj G.
      • Mulligan V.K.
      • Shortridge M.D.
      • Craven T.W.
      • Pardo-Avila F.
      • Rettie S.A.
      • Kim D.E.
      • Silva D.A.
      • Ibrahim Y.M.
      • Webb I.K.
      • Cort J.R.
      • Adkins J.N.
      • Varani G.
      • Baker D.
      Comprehensive computational design of ordered peptide macrocycles.
      )
      231111
      Design by deep network hallucination (
      • Anishchenko I.
      • Chidyausiku T.M.
      • Ovchinnikov S.
      • Pellock S.J.
      • Baker D.
      De novo protein design by deep network hallucination.
      )
      1291292732
      Helical bundles with hydrogen bond networks (
      • Boyken S.E.
      • Chen Z.
      • Groves B.
      • Langan R.A.
      • Oberdorfer G.
      • Ford A.
      • Gilmore J.M.
      • Xu C.
      • DiMaio F.
      • Pereira J.H.
      • Sankaran B.
      • Seelig G.
      • Zwart P.H.
      • Baker D.
      De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity.
      )
      1141011016610
      Fentanyl binding proteins (
      • Bick M.J.
      • Greisen P.J.
      • Morey K.J.
      • Antunes M.S.
      • La D.
      • Sankaran B.
      • Reymond L.
      • Johnsson K.
      • Medford J.I.
      • Baker D.
      Computational design of environmental sensors for the potent opioid fentanyl.
      )
      6213
      Digoxigenin binding proteins (
      • Tinberg C.E.
      • Khare S.D.
      • Dou J.
      • Doyle L.
      • Nelson J.W.
      • Schena A.
      • Jankowski W.
      • Kalodimos C.G.
      • Johnsson K.
      • Stoddard B.L.
      • Baker D.
      Computational design of ligand-binding proteins with high affinity and selectivity.
      )
      1722
      Porphyrin binding protein (
      • Polizzi N.F.
      • Wu Y.
      • Lemmin T.
      • Maxwell A.M.
      • Zhang S.Q.
      • Rawson J.
      • Beratan D.N.
      • Therien M.J.
      • DeGrado W.F.
      De novo design of a hyperstable non-natural protein-ligand complex with sub-Å accuracy.
      )
      1111111
      Apixaban binding proteins (
      • Polizzi N.F.
      • DeGrado W.F.
      A defined structural unit enables de novo design of small-molecule-binding proteins.
      )
      66612
      Fluorescence-activating β barrels (
      • Dou J.
      • Vorobieva A.A.
      • Sheffler W.
      • Doyle L.A.
      • Park H.
      • Bick M.J.
      • Mao B.
      • Foight G.W.
      • Lee M.Y.
      • Gagnon L.A.
      • Carter L.
      • Sankaran B.
      • Ovchinnikov S.
      • Marcos E.
      • Huang P.S.
      • et al.
      De novo design of a fluorescence-activating beta-barrel.
      )
      5638162212
      IL-2 and IL-15 mimics (
      • Silva D.A.
      • Yu S.
      • Ulge U.Y.
      • Spangler J.B.
      • Jude K.M.
      • Labao-Almeida C.
      • Ali L.R.
      • Quijano-Rubio A.
      • Ruterbusch M.
      • Leung I.
      • Biary T.
      • Crowley S.J.
      • Marcos E.
      • Walkey C.D.
      • Weitzner B.D.
      • et al.
      De novo design of potent and selective mimics of IL-2 and IL-15.
      )
      1218
      Repeat proteins using rigid helical junctions (
      • Brunette T.J.
      • Bick M.J.
      • Hansen J.M.
      • Chow C.M.
      • Kollman J.M.
      • Baker D.
      Modular repeat protein sculpting using rigid helical junctions.
      )
      34333330428
      Cyclic protein homo-oligomers (
      • Fallas J.A.
      • Ueda G.
      • Sheffler W.
      • Nguyen V.
      • McNamara D.E.
      • Sankaran B.
      • Pereira J.H.
      • Parmeggiani F.
      • Brunette T.J.
      • Cascio D.
      • Yeates T.O.
      • Zwart P.
      • Baker D.
      Computational design of self-assembling cyclic protein homo-oligomers.
      )
      966421515
      Orthogonal protein heterodimers (
      • Chen Z.
      • Boyken S.E.
      • Jia M.
      • Busch F.
      • Flores-Solis D.
      • Bick M.J.
      • Lu P.
      • VanAernum Z.L.
      • Sahasrabuddhe A.
      • Langan R.A.
      • Bermeo S.
      • Brunette T.J.
      • Mulligan V.K.
      • Carter L.P.
      • DiMaio F.
      • et al.
      Programmable design of orthogonal protein heterodimers.
      )
      979485639
      60-Subunit protein dodecahedron (
      • Hsia Y.
      • Bale J.B.
      • Gonen S.
      • Shi D.
      • Sheffler W.
      • Fong K.K.
      • Nattermann U.
      • Xu C.
      • Huang P.S.
      • Ravichandran R.
      • Yi S.
      • Davis T.N.
      • Gonen T.
      • King N.P.
      • Baker D.
      Design of a hyperstable 60-subunit protein dodecahedron.
      )
      173212
      Protein filaments (
      • Shen H.
      • Fallas J.A.
      • Lynch E.
      • Sheffler W.
      • Parry B.
      • Jannetty N.
      • Decarreau J.
      • Wagenbach M.
      • Vicente J.J.
      • Chen J.
      • Wang L.
      • Dowling Q.
      • Oberdorfer G.
      • Stewart L.
      • Wordeman L.
      • et al.
      De novo design of self-assembling helical protein filaments.
      )
      124
      Successful designs can be insoluble.
      634
      α Amyloid peptides (
      • Zhang S.Q.
      • Huang H.
      • Yang J.
      • Kratochvil H.T.
      • Lolicato M.
      • Liu Y.
      • Shu X.
      • Liu L.
      • DeGrado W.F.
      Designed peptides that assemble into cross-alpha amyloid-like structures.
      )
      6
      Successful designs can be insoluble.
      644
      Two-dimensional protein arrays (