How the Protein Data Bank changed biology: An introduction to the JBC Reviews thematic series, part 1

  • Helen M. Berman
    Correspondence
    For correspondence: Helen M. Berman; Lila M. Gierasch
    Affiliations
    Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA

    Department of Biological Sciences and Bridge Institute, University of Southern California, Los Angeles, California, USA
  • Lila M. Gierasch
    Affiliations
    Departments of Biochemistry & Molecular Biology and Chemistry, University of Massachusetts, Amherst, Massachusetts, USA
Open Access. Published: March 27, 2021. DOI: https://doi.org/10.1016/j.jbc.2021.100608
      This collection of articles celebrates the 50th anniversary of the Protein Data Bank (PDB), the single global digital archive of biological macromolecular structures. The impact of the PDB is immense; we have invited a number of top researchers in structural biology to illustrate its influence on an array of scientific fields. What emerges is a compelling picture of the synergism between the PDB and the explosive progress witnessed in many scientific areas. Availability of reliable, openly accessible, well-archived structural information has arguably had more impact on cell and molecular biology than even some of the enabling technologies such as PCR. We have seen the science move from a time when structural biologists contributed the lion’s share of the structures to the PDB and for discussion within their community to a time when any effort to achieve in-depth understanding of a biochemical or cell biological question demands an interdisciplinary approach built atop structural underpinnings.

      Abbreviations:

      BNL (Brookhaven National Laboratory), PDB (Protein Data Bank), RCSB (Research Collaboratory for Structural Bioinformatics)
      A bit of history: In the 1960s, when the very first protein structures began to be published, groups of scientists in the United States and Europe began to discuss the possibility of creating an archive for these data. Informal and formal meetings among interested parties were held, and one such meeting led to a petition directed at the crystallographic community to create a repository for crystallographic data accessible to all. The motivations behind the many discussions and the petition were several: Those who were determining the structures were being asked to share their data, which in those days was very challenging. Others began to sense that the data could yield some very interesting science and were keen to be able to analyze the structures. At the Cold Spring Harbor Symposium held in 1971, a group of scientists who had been involved in the earlier discussions (including one of us [H.B.]) approached Walter Hamilton, a prominent crystallographer at Brookhaven National Laboratory (BNL), and raised the possibility of a Protein Data Bank (PDB). He immediately said he would do it and promptly flew to England to initiate a collaboration with Olga Kennard, the head of the Cambridge Crystallographic Data Centre, to set up the PDB. An announcement of the fledgling archive appeared in October 1971 (Protein Data Bank, 1971). Edgar Meyer, also at BNL, who developed protein visualization software, and Helen Berman began to work on the project with only a handful of structures. After Hamilton’s untimely death in 1973, Tom Koetzle, a postdoctoral fellow at BNL, took over the leadership and so the work continued.
      At first, the PDB grew very slowly, and Koetzle put in significant effort to convince people to deposit their data. Ten years later, members of the community began to make very public demands that deposition be mandatory. After many discussions and yet another petition to the structural biology community, this one led by Fred Richards of Yale, guidelines were put in place by the International Union of Crystallography for deposition of coordinates as a condition of publication (International Union of Crystallography, 1989). More structures were deposited, at first mostly by crystallographers. As other methods for structure determination emerged, such as NMR spectroscopy and cryo-EM, those structures became a part of the PDB. In 1999, when there were 9000 structures in the PDB, the management was taken over by the Research Collaboratory for Structural Bioinformatics (RCSB), a consortium consisting of Rutgers, the San Diego Supercomputer Center, and the National Institute of Standards and Technology. In 2003, the Worldwide PDB was created to ensure that there would be a single global archive for structural data and that deposited data would be processed with uniform standards (Berman et al., 2003). The initial members were the RCSB PDB, the Macromolecular Structure Database (MSD), later called Protein Data Bank Europe (PDBe), and Protein Data Bank Japan (PDBj). In 2006, the database created for NMR spectra called BioMagResBank (BMRB) joined.
      The last 20 years have seen massive growth of the PDB, in both the number of structures and their complexity. Today, the archive contains more than 175,000 structures ranging in molecular weight from less than 20,000 Da to more than 380,000 Da. Complexes with more than 1000 components are now part of the PDB. The emphasis on quality control has grown, with expert task forces making recommendations for how best to validate structures determined by X-ray crystallography, NMR spectroscopy, and 3D electron microscopy. As integrative models are computed from data generated by several different methods, a special project is underway to create the necessary infrastructure to archive these structures.
      These structures, standards, and projects are brought to life in this collection of JBC Reviews. We are particularly delighted to present this collection to you, as JBC has taken center stage in the history of the PDB for two key reasons: First, JBC was one of the first journals to require that authors deposit structural data reported in accepted articles to the PDB (The Editors, 1989). This requirement is now widespread, if not ubiquitous, among journals. Second, more structures now in the PDB have been published in JBC than in any other journal.
      The structures in the PDB are diverse in almost every respect and cover multiple areas of biology and biochemistry. In this compendium, we have tried to cover at least part of the spectrum and give a sense of how much we have learned by being able to compare and study groups of structures. Many of the authors have chosen to describe the history of the field in terms of how the technology has evolved and in terms of the attitudes about structure sharing. In the paragraphs below, we offer previews of the review articles included in part 1 of this collection. Part 2 will include more reviews that celebrate additional scientific areas that have been profoundly touched by the creation of the PDB.
      Many would agree that some of the most influential structures in the PDB are those of the ribosome, the complex factory made up of two large subunits, each with multiple chains of protein and RNA molecules, which produces proteins through the coordinated reactions of translation. Peter Moore (Yale University) provides in his article a historical account of the determination of the ribosome structure by four groups, three of whose leaders won Nobel Prizes in 2009 for their work (Moore, 2021). He points out the challenges that those first ribosome structures presented for the PDB. In retrospect, it is clear how fortunate it was that early RCSB PDB curators had been part of the Nucleic Acid Database project, enabling them to apply that experience and insight to representing these game-changing structures. Moore’s review captures the transition from reliance on X-ray crystallography to study these large machines to the current practice of using 3D electron microscopy for most ribosome structures, and explains why that is now the method of choice for these large assemblies. In addition to his description of the impactful work on the structure of the ribosome, Peter Moore adds unique reflections that his participation in structural biology from the “birth” of the PDB enables him to offer. He comments on the “state of play” in 1971 and the amazing group of scientists who presented at the Cold Spring Harbor Meeting described above. It is not surprising after reading his remarks that the creation of the PDB was one outcome of the meeting.
      The importance of understanding the details of virus structure has been highlighted during this pandemic. In their review, John ‘Jack’ Johnson and Art Olson (The Scripps Research Institute) focus on icosahedral viruses, starting from the determination of the structure of two plant viruses in the 1980s followed by human viruses such as rhinovirus (Johnson and Olson, 2021). Now there are hundreds of icosahedral virus structures in the PDB. They point out that, although these structures were of intense interest to the structural biology community, it was difficult to communicate the details to virologists. A graphics program, VIPER, provided that pathway and is also the basis for the way the PDB represents these structures. The increasing use of 3D electron microscopy by crystallographers is described in one of Johnson’s publications (Johnson, 2013). Olson also shares some of his early experiences in co-opting computer graphics programs to make the first movies of plant virus structures.
      The discovery of the DNA structure using fiber diffraction data was the seminal event that paved the way to molecular and structural biology. Yet, it took almost two decades before an atomic level structure of a defined sequence was determined by Richard Dickerson’s laboratory at California Institute of Technology. Stephen Neidle (University College London School of Pharmacy) describes the history of DNA crystallography and how the earliest structures paved the way for understanding sequence-dependent structural diversity in oligonucleotides that are key to the recognition of DNA by proteins (Neidle, 2021). The ability of G-rich tracts to form quadruplexes, far from being a biophysical artifact, underlies their role in human telomeres and allows them to assume a variety of functional topologies. More recently, deoxyribozymes that cleave RNA have been discovered and analyzed. Neidle highlights the important role that the PDB plays in ensuring the quality of the structures that are used for computational analyses and drug design.
      The last 50 years have also seen an amazing evolution in RNA structure. Eric Westhof (University of Strasbourg) and the late Neocles Leontis (Bowling Green University) trace the history, which began with the structure determinations of tRNA and small RNA fragments as models for the double helix (Westhof and Leontis, 2021). The discovery of ribozymes opened the door to the RNA world, which continues to surprise us with its new folds and functions. The rich set of folding rules derived from these structures is discussed, as are modeling efforts based on these rules. The authors point out the value of the PDB in assembling and curating these structures and the key importance of validation. They also comment on the early role that the Nucleic Acid Database played as a testbed for the new formats that allow large macromolecular assemblies such as the ribosome to be properly archived in the PDB. Westhof also includes a tribute to Leontis, who was an RNA scholar and his longtime collaborator.
      Many biological recognition and regulatory processes rely on the surface composition of proteins, the “face” they present to the world around them. In eukaryotic systems, carbohydrate modifications on the surfaces of proteins are extremely widespread. Yet, these decorations on glycoproteins proved elusive structurally for many years, in part because of heavy reliance on bacterial expression systems for preparation of adequate amounts of proteins for crystallography. In addition, the heterogeneity and dynamic character of carbohydrate modifications on proteins stymied crystallography for many years. James Prestegard (University of Georgia) reviews the breakthroughs that led to structural descriptions of the carbohydrate components of glycoproteins and how the resulting insights have elucidated biological puzzles (Prestegard, 2021). Central to this challenging area of structural biology was the deployment of NMR as a key method to determine the compositions and structures of the carbohydrate components of glycoproteins. The PDB has welcomed NMR structures for many years, and glycoproteins represented an example where synergistic use of multiple methods was essential to structural advances.
      Membrane proteins have also presented unique challenges for structure determination because of the intimate dependence of their structural integrity on the anisotropic environment in which they function. Robert Stroud (University of California, San Francisco) and coworkers have beautifully described the massive progress that has occurred in the structural biology of membrane proteins, how breakthroughs were achieved by discovery of productive crystallization methods, and the growing number of examples now in the PDB that are leading to stunning advances in our understanding of many biological systems (Li et al., 2021). This area of structural biology has been greatly facilitated by recent progress in cryo-EM.
      Indeed, cryo-EM has emerged as one of the fastest growing methods for structure determination. Wah Chiu (Stanford University) and colleagues provide a historical overview of the key technical advances in sample preparation, instrumentation, and computer software that have contributed to the remarkable growth in the number of structures and vast improvements in resolution (Chiu et al., 2021). They discuss the community activities that have led to the creation of data archives for maps and models, and the global collaboration that made it possible for both types of data to be deposited via the single Worldwide PDB deposition system. The development of validation criteria for maps and models has been facilitated by “Challenges” organized by the Electron Microscopy Data Resource (https://challenges.emdataresource.org/). The authors end their piece by highlighting the importance of validation reports in raising the quality of final structures in the PDB.
      Computational biology has made major leaps over the lifetime of the PDB, in no small measure because of the rich data available in solved protein structures. Tanja Kortemme (University of California, San Francisco) and her colleague Xingjie Pan describe how the field of protein design has been enabled by access to the extensive structural data in the PDB (Pan and Kortemme, 2021). The ability to use the principles that Nature illustrates in evolutionarily honed structures to build up guiding principles for designed proteins has been crucial. As beautifully presented in this JBC Review, we are seeing the field of computational protein design achieve landmark goals: design of novel proteins with desired functions, design of folds never previously observed, and mimicking the impact of evolution on naturally occurring protein families in families of designed proteins. Not only do these advances open many doors for engineering novel proteins, but they also inform the field of structure prediction, as strikingly illustrated by recent artificial intelligence prediction methods (Senior et al., 2020).
      The advances in computational biology have also fed impressive developments in methods to exploit the breadth of available structural data, and Barry Honig (Columbia University) and colleagues describe how one new area, computational systems biology, has emerged as a way to tackle complex functional relationships among proteins (Murray et al., 2021). They illustrate how computational methods can lead to structure-informed relationships between proteins. These in turn can reveal protein interactomes and provide testable models for genetic screens. In the long run, such methods may lead to novel therapeutic strategies based on linking protein structure space with chemical compound space.
      Knowledge of protein structures is essential to our understanding of health and disease and for drug discovery. In his review, Stephen K. Burley (Rutgers University) discusses how structures are used in many stages of drug development: target validation, druggability, small-molecule binding to drug targets, structure-guided lead optimization, and optimization of pharmacokinetic properties (Burley, 2021). He describes quantitative analyses of the impact of PDB structures on the drug approval process and provides three case studies illustrating how structure-guided approaches were key in the approvals of small-molecule antineoplastic drugs.
      This splendid array of scholarly and forward-looking JBC Reviews is a compelling testament to the power of openly accessible data depositories, with the amazing biological discoveries fueled by structural information available in the PDB serving as a premier example. We hope you join us in saluting the structural biology community for its prescience in establishing the PDB half a century ago and in basking in the beauty and fundamental knowledge structural information has brought to biology. There is more to come in this celebratory collection: stay tuned for part 2!

      Conflict of interest

      The authors declare that they have no conflicts of interest with the contents of this article.

      Funding and additional information

      This work was supported in part by NIH Grant R35 GM118161 to L. M. G. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

      References

        • Protein Data Bank. Crystallography: Protein Data Bank. Nat. New Biol. 1971; 233: 223
        • International Union of Crystallography. Policy on publication and the deposition of data from crystallographic studies of biological macromolecules. Acta Cryst. 1989; A45: 658
        • Berman H.M., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003; 10: 980
        • The Editors. Instructions to authors. J. Biol. Chem. 1989; 264: 663-673
        • Moore P.B. The PDB and the ribosome. J. Biol. Chem. 2021; 296: 100561
        • Johnson J.E., Olson A.J. Icosahedral virus structures and the Protein Data Bank. J. Biol. Chem. 2021; 296: 100554
        • Johnson J.E. Confessions of an icosahedral virus crystallographer. Microscopy. 2013; 62: 69-79
        • Neidle S. Beyond the double helix: DNA structural diversity and the PDB. J. Biol. Chem. 2021; 296: 100553
        • Westhof E., Leontis N.B. An RNA-centric historical narrative around the Protein Data Bank. J. Biol. Chem. 2021; 296: 100555
        • Prestegard J.H. A perspective on the PDB’s impact on the field of glycobiology. J. Biol. Chem. 2021; 296: 100556
        • Li F., Egea P.F., Vecchio A.J., Asial I., Gupta M., Paulino J., Bajaj R., Dickinson M.S., Ferguson-Miller S., Monk B.C., Stroud R.M. Highlighting membrane protein structure and function: A celebration of the Protein Data Bank. J. Biol. Chem. 2021; 296: 100557
        • Chiu W., Schmid M.F., Pintilie G.D., Lawson C.L. Evolution of standardization and dissemination of cryo-EM structures and data jointly by the community, PDB, and EMDB. J. Biol. Chem. 2021; 296: 100560
        • Pan X., Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J. Biol. Chem. 2021; 296: 100558
        • Senior A.W., Evans R., Jumper J., Kirkpatrick J., Sifre L., Green T., Qin C., Žídek A., Nelson A.W.R., Bridgland A., Penedones H., Petersen S., Simonyan K., Crossan S., Kohli P., et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020; 577: 706-710
        • Murray D., Petrey D., Honig B. Integrating 3D structural information into systems biology. J. Biol. Chem. 2021; 296: 100562
        • Burley S.K. Impact of structural biologists and the Protein Data Bank on small-molecule drug discovery and development. J. Biol. Chem. 2021; 296: 100559
