Structural genomics and the Protein Data Bank

  • Karolina Michalska
    Affiliations
    Center for Structural Genomics of Infectious Diseases, University of Chicago, Chicago, Illinois, USA

    Structural Biology Center, X-Ray Science Division, Argonne National Laboratory, Lemont, Illinois, USA
  • Andrzej Joachimiak
    Correspondence
    For correspondence: Andrzej Joachimiak
    Affiliations
    Center for Structural Genomics of Infectious Diseases, University of Chicago, Chicago, Illinois, USA

    Structural Biology Center, X-Ray Science Division, Argonne National Laboratory, Lemont, Illinois, USA

    Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA
Open Access. Published: May 03, 2021. DOI: https://doi.org/10.1016/j.jbc.2021.100747
      The field of Structural Genomics arose over the last 3 decades to address a large and rapidly growing divergence between microbial genomic, functional, and structural data. Several international programs took advantage of the vast genomic sequence information and evaluated the feasibility of structure determination for expanded and newly discovered protein families. As a consequence, structural genomics has developed structure-determination pipelines and applied them to a wide range of novel, uncharacterized proteins, often from “microbial dark matter,” and later to proteins from human pathogens. Advances were especially needed in protein production and rapid de novo structure solution. The experimental three-dimensional models were promptly made public, facilitating structure determination of other members of the family and helping to understand their molecular and biochemical functions. Improvements in experimental methods and databases resulted in fast progress in molecular and structural biology. The Protein Data Bank structure repository played a central role in the coordination of structural genomics efforts and the structural biology community as a whole. It facilitated development of standards and validation tools essential for maintaining high quality of deposited structural data.

      Abbreviations:

      Hcp (hemolysin-coregulated protein), MCSG (Midwest Center for Structural Genomics), PSI (Protein Structure Initiative), SG (structural genomics), SGC (Structural Genomics Consortium), TSR (thrombospondin type 1 repeat)
      The concept of Structural Genomics (SG) was born as a result of exponential progress in genome sequencing. The fast growth of DNA sequence information in the 1990s led to the generation of huge amounts of genomic data, which was accompanied by significant knowledge gaps in our understanding of biological roles and biochemical functions encoded in the genomes. Of importance, the sequence information offered little insight into the proteins (often called hypothetical) that these newly discovered genes encoded, hampering progress toward functional interpretation. Massive accumulation of genomic and metagenomic sequences posed many questions that could not simply be ignored. To address these new challenges, the National Institutes of Health, Department of Energy, RIKEN, Gates Foundation, Wellcome Trust, and numerous other government and private agencies around the world funded structural genomics programs, beginning as early as 1997 to 2000. Table 1 summarizes the contribution of larger SG programs to determination of protein structures.
      Table 1. Top 20 structural genomics programs
      Center | Number of PDB deposits | Origin and funding | Techniques used
      RIKEN Structural Genomics/Proteomics Initiative | 2746 | Japan, government, National Project on Protein Structural and Functional Analyses | NMR, X-ray
      Midwest Center for Structural Genomics | 1955 | USA, PSI/NIH/NIGMS | X-ray, NMR
      Structural Genomics Consortium | 1896 | International/a public–private partnership | X-ray, NMR
      Joint Center for Structural Genomics | 1601 | USA, PSI/NIH/NIGMS | X-ray, NMR
      Center for Structural Genomics of Infectious Diseases | 1359 | USA, NIH/NIAID | X-ray, NMR, cryo-EM
      Seattle Structural Genomics Center for Infectious Disease | 1355 | USA, NIH/NIAID | X-ray, NMR, cryo-EM
      Northeast Structural Genomics Consortium | 1234 | USA, PSI/NIH/NIGMS | X-ray, NMR
      New York SGX Research Center for Structural Genomics | 1041 | USA, PSI/NIH/NIGMS | X-ray, NMR
      New York Structural Genomics Research Consortium | 364 | USA, PSI/NIH/NIGMS | X-ray, NMR
      TB Structural Genomics Consortium | 344 | International worldwide consortium/Various | X-ray, NMR
      Center for Eukaryotic Structural Genomics | 219 | USA, PSI/NIH/NIGMS | X-ray, NMR
      Montreal-Kingston Bacterial Structural Genomics Initiative | 132 | Canada, Canadian Institutes of Health Research | X-ray, NMR
      Southeast Collaboratory for Structural Genomics | 122 | USA, PSI/NIH/NIGMS | X-ray, NMR
      Structural Proteomics in Europe | 118 | European Union | X-ray, NMR
      Berkeley Structural Genomics Center | 101 | USA, PSI/NIH/NIGMS | X-ray
      Enzyme Discovery for Natural Product Biosynthesis | 91 | USA, NIH | X-ray
      Structural Genomics of Pathogenic Protozoa Consortium | 73 | USA, PSI/NIH/NIGMS | X-ray, NMR
      New York Consortium on Membrane Protein Structure | 70 | USA, PSI/NIH/NIGMS | X-ray
      Structure 2 Function Project | 54 | USA, PSI/NIH/NIGMS | X-ray, NMR
      GPCR Network | 52 | USA, PSI/NIH/NIGMS | X-ray
      NIAID, National Institute of Allergy and Infectious Diseases; NIGMS, National Institute of General Medical Sciences; NIH, National Institutes of Health; PSI, Protein Structure Initiative.
      The mission of SG programs was to facilitate rapid de novo structure determination for proteins representing new protein families to provide meaningful structural coverage of the genomes (
      • Levitt M.
      Nature of the protein universe.
      ,
      • Stevens R.C.
      • Yokoyama S.
      • Wilson I.A.
      Global efforts in structural genomics.
      ,
      • Tepper J.
      • Nardi G.
      • Sutt H.
      Carcinoma of the pancreas: Review of MGH experience from 1963 to 1973. Analysis of surgical failure and implications for radiation therapy.
      ), with the presumption that eventually it would be possible to generate good-quality three-dimensional models of all proteins (
      • Mizianty M.J.
      • Fan X.
      • Yan J.
      • Chalmers E.
      • Woloschuk C.
      • Joachimiak A.
      • Kurgan L.
      Covering complete proteomes with X-ray structures: A current snapshot.
      ). Such a goal could be achieved by structural characterization of representative members of protein sequence families, followed by homology modeling for the remaining proteins. Selection of protein targets for structural studies has therefore become a crucial component of this effort (
      • Yeats C.
      • Dessailly B.H.
      • Glass E.M.
      • Fremont D.H.
      • Orengo C.A.
      Target selection for structural genomics of infectious diseases.
      ,
      • Pearl F.M.
      • Martin N.
      • Bray J.E.
      • Buchan D.W.
      • Harrison A.P.
      • Lee D.
      • Reeves G.A.
      • Shepherd A.J.
      • Sillitoe I.
      • Todd A.E.
      • Thornton J.M.
      • Orengo C.A.
      A rapid classification protocol for the CATH Domain Database to support structural genomics.
      ,
      • Marsden R.L.
      • Orengo C.A.
      Target selection for structural genomics: An overview.
      ,
      • Marsden R.L.
      • Lewis T.A.
      • Orengo C.A.
      Towards a comprehensive structural coverage of completed genomes: A structural genomics viewpoint.
      ,
      • Levitt M.
      Growth of novel protein structural data.
      ), and it remains important today (
      • Varga J.
      • Dobson L.
      • Remenyi I.
      • Tusnady G.E.
      TSTMP: Target selection for structural genomics of human transmembrane proteins.
      ). Structural biology research was set to undergo a major transformation.
      There were urgent needs and significant challenges to advance technologies for preparation of thousands of proteins and for their structural and functional characterization. The SG programs quickly recognized and attacked deficiencies in protein production and structure solution methods and improved the effectiveness and reproducibility of experiments. As a result, in the past 25 years, a number of worldwide structural genomics programs developed high-throughput pipelines for target selection, protein production, characterization, crystallization, and de novo structure determination by synchrotron-based X-ray crystallography and NMR (
      • Graslund S.
      • Nordlund P.
      • Weigelt J.
      • Hallberg B.M.
      • Bray J.
      • Gileadi O.
      • Knapp S.
      • Oppermann U.
      • Arrowsmith C.
      • Hui R.
      • Ming J.
      • dhe-Paganon S.
      • et al.
      Structural Genomics Consortium; China Structural Genomics Consortium; Northeast Structural Genomics Consortium
      Protein production and purification.
      ,
      • Makowska-Grzyska M.
      • Kim Y.
      • Maltseva N.
      • Li H.
      • Zhou M.
      • Joachimiak G.
      • Babnigg G.
      • Joachimiak A.
      Protein production for structural genomics using E. coli expression.
      ,
      • Kim Y.
      • Babnigg G.
      • Jedrzejczak R.
      • Eschenfeldt W.H.
      • Li H.
      • Maltseva N.
      • Hatzos-Skintges C.
      • Gu M.
      • Makowska-Grzyska M.
      • Wu R.
      • An H.
      • Chhor G.
      • Joachimiak A.
      High-throughput protein purification and quality assessment for crystallization.
      ,
      • Minor W.
      • Cymborowski M.
      • Otwinowski Z.
      • Chruszcz M.
      HKL-3000: The integration of data reduction and structure solution--from diffraction images to an initial model in minutes.
      ). These standardized protocols ensured reproducibility of experiments and resulted in higher data quality. The tools developed by the SG consortia that streamlined the gene-to-structure approach significantly benefitted biological and biomedical research, providing insights into novel structural and functional space (
      • Graslund S.
      • Nordlund P.
      • Weigelt J.
      • Hallberg B.M.
      • Bray J.
      • Gileadi O.
      • Knapp S.
      • Oppermann U.
      • Arrowsmith C.
      • Hui R.
      • Ming J.
      • dhe-Paganon S.
      • et al.
      Structural Genomics Consortium; China Structural Genomics Consortium; Northeast Structural Genomics Consortium
      Protein production and purification.
      ,
      • Burley S.K.
      • Joachimiak A.
      • Montelione G.T.
      • Wilson I.A.
      Contributions to the NIH-NIGMS protein structure initiative from the PSI production centers.
      ,
      • Chance M.R.
      • Bresnick A.R.
      • Burley S.K.
      • Jiang J.S.
      • Lima C.D.
      • Sali A.
      • Almo S.C.
      • Bonanno J.B.
      • Buglino J.A.
      • Boulton S.
      • Chen H.
      • Eswar N.
      • He G.
      • Huang R.
      • Ilyin V.
      • et al.
      Structural genomics: A pipeline for providing structures for the biologist.
      ,
      • Elsliger M.A.
      • Deacon A.M.
      • Godzik A.
      • Lesley S.A.
      • Wooley J.
      • Wuthrich K.
      • Wilson I.A.
      The JCSG high-throughput structural biology pipeline.
      ,
      • Grabowski M.
      • Chruszcz M.
      • Zimmerman M.D.
      • Kirillova O.
      • Minor W.
      Benefits of structural genomics for drug discovery research.
      ,
      • Anderson W.F.
      Structural genomics and drug discovery for infectious diseases.
      ). The advancements resulted in the determination of over 14,000 protein structures worldwide, mostly from unique protein families, and increased structural coverage of the rapidly expanding protein universe. These three-dimensional models based on experimental data were deposited to the macromolecular structure repository, the Protein Data Bank (PDB, (
      • Berman H.M.
      • Westbrook J.
      • Feng Z.
      • Gilliland G.
      • Bhat T.N.
      • Weissig H.
      • Shindyalov I.N.
      • Bourne P.E.
      The Protein Data Bank.
      )), and were made immediately available to the scientific community. Similarly, the advanced technologies that aimed to make structure determination efficient and models more accurate were disseminated broadly and adopted by the biology community. The experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research.
      By contributing to structural coverage of thousands of protein families (
      • Lee D.
      • de Beer T.A.
      • Laskowski R.A.
      • Thornton J.M.
      • Orengo C.A.
      1,000 Structures and more from the MCSG.
      ,
      • Grabowski M.
      • Niedzialkowska E.
      • Zimmerman M.D.
      • Minor W.
      The impact of structural genomics: The first quindecennial.
      ), SG programs provided many targets for the Critical Assessment of Techniques for Protein Structure Prediction (CASP) (
      • Kryshtafovych A.
      • Schwede T.
      • Topf M.
      • Fidelis K.
      • Moult J.
      Critical assessment of methods of protein structure prediction (CASP)-round XIII.
      ), a community-wide, biennial experiment to determine the state and progress of protein structure prediction. Characterization of unique structural folds generated training datasets for protein structure prediction algorithms and greatly improved the quality of models in CASP14 (
      • Service R.F.
      'The game has changed.' AI triumphs at protein folding.
      ,
      • Callaway E.
      'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures.
      ), getting closer to a major goal of SG programs of obtaining good-quality three-dimensional models for all proteins.

      Structural genomics programs

      The US structural genomics effort was launched in 2000, when the National Institutes of Health (NIH) funded the pilot phase of the Protein Structure Initiative (PSI) (http://www.nigms.nih.gov/Initiatives/PSI/). The PSI had three phases. In the first phase (PSI-1), nine centers were established focusing on structural genomics studies of a range of model organisms. During this 5-year period, over 1100 protein structures were determined, more than 700 of which were classified as “unique” owing to their low sequence identity (<30%) with other structurally characterized proteins. In the second phase (PSI-2), the number of funded research centers expanded and included four large-scale “production” centers. The goal was to use methods introduced in PSI-1 to determine structures of a large number of proteins and to continue streamlining the SG pipelines. By the end of PSI-2, the program had delivered to the community over 4800 protein structures; 85% of these were unique. Many of the structures were of proteins of unknown function. The third PSI phase, called PSI:Biology, was intended to increase emphasis on the immediate scientific impact of structures. The network of PSI centers worked collaboratively with community investigators and applied the established structure determination pipelines to study a broad range of important biological and biomedical problems, such as complexes and membrane proteins. The SG centers formed extensive interaction and collaboration networks (Fig. 1) that were highly impactful. For example, the partnership between the Midwest Center for Structural Genomics (MCSG) and the Natural Product Biology Partnership resulted in 68 PDB deposits and 38 peer-reviewed publications (see example (
      • Wang N.
      • Rudolf J.D.
      • Dong L.B.
      • Osipiuk J.
      • Hatzos-Skintges C.
      • Endres M.
      • Chang C.Y.
      • Babnigg G.
      • Joachimiak A.
      • Phillips Jr., G.N.
      • Shen B.
      Natural separation of the acyl-CoA ligase reaction results in a non-adenylating enzyme.
      )). Collaboration within smaller partnerships also led to important contributions, sometimes in novel, emerging fields such as bacterial contact-dependent growth inhibition and signaling. One of these structures showed for the first time that fully functional RNase A–like enzymes are present in bacteria (Fig. 2) (
      • Batot G.
      • Michalska K.
      • Ekberg G.
      • Irimpan E.M.
      • Joachimiak G.
      • Jedrzejczak R.
      • Babnigg G.
      • Hayes C.S.
      • Joachimiak A.
      • Goulding C.W.
      The CDI toxin of Yersinia kristensenii is a novel bacterial member of the RNase A superfamily.
      ). By the end of the PSI program, there were more than 9400 structures determined, with the majority of them being unique. Nearly 90% of these were determined by X-ray crystallography, and the rest by NMR (
      • Grabowski M.
      • Niedzialkowska E.
      • Zimmerman M.D.
      • Minor W.
      The impact of structural genomics: The first quindecennial.
      ).
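The “unique” classification described above (sequence identity below 30% with every structurally characterized protein) can be illustrated with a minimal sketch. The function names, the gap character, and the choice to compute identity over alignment columns in which both sequences have a residue are assumptions made here for illustration; the text does not specify how the PSI computed identity, and in practice the pairwise alignments would come from a dedicated alignment tool.

```python
# Illustrative sketch only: names and the identity convention are assumptions,
# not the procedure used by the PSI centers.

UNIQUE_THRESHOLD = 30.0  # percent identity cutoff quoted in the text (<30%)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Both inputs are rows of a precomputed pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_unique(alignments, threshold: float = UNIQUE_THRESHOLD) -> bool:
    """Return True if the target falls below the threshold against every known structure.

    `alignments` is a list of (aligned_target, aligned_known) string pairs.
    """
    return all(percent_identity(t, k) < threshold for t, k in alignments)
```

Under these assumptions, a target would count as unique only if none of its alignments to proteins of known structure reaches the 30% cutoff; a different denominator (for example, the full alignment length) would shift borderline cases.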
      Figure 1. Structural genomics networks (http://sbkb.org/metrics/). The dots represent community interactions.
      Figure 2. Discovery of a member of the RNase A family in bacteria that serves as a toxin in contact-dependent growth inhibition (
      • Batot G.
      • Michalska K.
      • Ekberg G.
      • Irimpan E.M.
      • Joachimiak G.
      • Jedrzejczak R.
      • Babnigg G.
      • Hayes C.S.
      • Joachimiak A.
      • Goulding C.W.
      The CDI toxin of Yersinia kristensenii is a novel bacterial member of the RNase A superfamily.
      ) serves as a good example of a structure solved by the Midwest Center for Structural Genomics in partnership with the biology community. A, nuclease domain of the contact-dependent toxin from Yersinia kristensenii (PDB 5E3E). B, human RNase A angiogenin (PDB 4B36) (
      • Batot G.
      • Michalska K.
      • Ekberg G.
      • Irimpan E.M.
      • Joachimiak G.
      • Jedrzejczak R.
      • Babnigg G.
      • Hayes C.S.
      • Joachimiak A.
      • Goulding C.W.
      The CDI toxin of Yersinia kristensenii is a novel bacterial member of the RNase A superfamily.
      ).
      In parallel to the US effort, there were several other structural genomics programs in Canada, Europe, Japan, and China, including the Structural Genomics Consortium (SGC); the Mycobacterium Tuberculosis Structural Proteomics Project; Structural Proteomics in Europe (SPINE) and others in Europe; Protein 3000, implemented in the RIKEN Structural Genomics/Proteomics Initiative (RSGI); and international collaborations such as the International TB Structural Genomics Consortium (TBSGC). The TBSGC focused exclusively on functionally characterized proteins and potential drug targets from Mycobacterium tuberculosis.
      In 2007, the National Institute of Allergy and Infectious Diseases started a structural genomics program, Structural Genomics Centers for Infectious Diseases, targeting emerging and re-emerging (drug-resistant) human pathogens. The program established two centers and emphasized target submissions from the wider biology community. These two centers have determined over 2700 structures thus far; more than 50% of these were community-nominated targets.
      The importance of developing high-throughput methods became very evident when the COVID-19 pandemic emerged and structural information about SARS-CoV-2 proteins was needed to assist drug and vaccine development. In striking contrast to the international SARS-CoV effort, which generated ~20 structures from 2003 to 2007, the scientific community has contributed over 1200 structures since the emergence of SARS-CoV-2 (
      • Brzezinski D.
      • Kowiel M.
      • Cooper D.R.
      • Cymborowski M.
      • Grabowski M.
      • Wlodawer A.
      • Dauter Z.
      • Shabalin I.G.
      • Gilski M.
      • Rupp B.
      • Jaskolski M.
      • Minor W.
      Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models.
      ), with ~10% of them determined by the two Structural Genomics Centers for Infectious Diseases. Most of these structures were determined by X-ray crystallography (for example, (
      • Kim Y.
      • Wower J.
      • Maltseva N.
      • Chang C.
      • Jedrzejczak R.
      • Wilamowski M.
      • Kang S.
      • Nicolaescu V.
      • Randall G.
      • Michalska K.
      • Joachimiak A.
      Tipiracil binds to uridine site and inhibits Nsp15 endoribonuclease NendoU from SARS-CoV-2.
      ,
      • Osipiuk J.
      • Azizi S.A.
      • Dvorkin S.
      • Endres M.
      • Jedrzejczak R.
      • Jones K.A.
      • Kang S.
      • Kathayat R.S.
      • Kim Y.
      • Lisnyak V.G.
      • Maki S.L.
      • Nicolaescu V.
      • Taylor C.A.
      • Tesar C.
      • Zhang Y.A.
      • et al.
      Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors.
      ,
      • Kim Y.
      • Jedrzejczak R.
      • Maltseva N.I.
      • Wilamowski M.
      • Endres M.
      • Godzik A.
      • Michalska K.
      • Joachimiak A.
      Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2.
      ,
      • Michalska K.
      • Kim Y.
      • Jedrzejczak R.
      • Maltseva N.I.
      • Stols L.
      • Endres M.
      • Joachimiak A.
      Crystal structures of SARS-CoV-2 ADP-ribose phosphatase: From the apo form to ligand complexes.
      ,
      • Walls A.C.
      • Park Y.J.
      • Tortorici M.A.
      • Wall A.
      • McGuire A.T.
      • Veesler D.
      Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein.
      )), but there was also a very impressive and important contribution from cryo-EM (
      • Brzezinski D.
      • Kowiel M.
      • Cooper D.R.
      • Cymborowski M.
      • Grabowski M.
      • Wlodawer A.
      • Dauter Z.
      • Shabalin I.G.
      • Gilski M.
      • Rupp B.
      • Jaskolski M.
      • Minor W.
      Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models.
      ,
      • Walls A.C.
      • Park Y.J.
      • Tortorici M.A.
      • Wall A.
      • McGuire A.T.
      • Veesler D.
      Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein.
      ,
      • Mariano G.
      • Farthing R.J.
      • Lale-Farjat S.L.M.
      • Bergeron J.R.C.
      Structural characterization of SARS-CoV-2: Where we are, and where we need to be.
      ).

      Highlights of the SG accomplishments

      SG programs produced a number of high-profile results in collaboration with the biology community. Here we show several examples from PSI centers. The MCSG determined several structures of hemolysin-coregulated proteins (Hcp). These proteins are highly conserved among Gram-negative proteobacteria and were suspected to be part of the type VI secretion apparatus. They shared little sequence homology with proteins of known structure. In an effort to gain insight into the function of these proteins, the crystal structure of Hcp1 from Pseudomonas aeruginosa was determined (Fig. 3). This Hcp1 protein formed hexameric rings that can stack and create a wide channel used for protein secretion (
      • Mougous J.D.
      • Cuff M.E.
      • Raunser S.
      • Shen A.
      • Zhou M.
      • Gifford C.A.
      • Goodman A.L.
      • Joachimiak G.
      • Ordonez C.L.
      • Lory S.
      • Walz T.
      • Joachimiak A.
      • Mekalanos J.J.
      A virulence locus of Pseudomonas aeruginosa encodes a protein secretion apparatus.
      ). Later, the MCSG determined a structure of Hcp3, a low-sequence-identity Hcp1 paralog from P. aeruginosa that shows a very similar architecture (
      • Osipiuk J.
      • Xu X.
      • Cui H.
      • Savchenko A.
      • Edwards A.
      • Joachimiak A.
      Crystal structure of secretory protein Hcp3 from Pseudomonas aeruginosa.
      ). The Joint Center for Structural Genomics combined structures available in the PDB (several of which were determined by PSI centers) with homology models and, for the first time, generated a three-dimensional reconstruction of metabolic networks in the bacterium Thermotoga maritima (Fig. 4) (
      • Zhang Y.
      • Thiele I.
      • Weekes D.
      • Li Z.
      • Jaroszewski L.
      • Ginalski K.
      • Deacon A.M.
      • Wooley J.
      • Lesley S.A.
      • Wilson I.A.
      • Palsson B.
      • Osterman A.
      • Godzik A.
      Three-dimensional structural view of the central metabolic network of Thermotoga maritima.
      ). The Joint Center for Structural Genomics has shown that one can integrate structural data with network analysis to inform about functions, mechanisms, and evolution of cellular systems. Another PSI center, the New York SGX Research Center for Structural Genomics, systematically studied structures of protein phosphatases from humans and biomedically relevant pathogens, including Toxoplasma gondii, Trypanosoma brucei, and Anopheles gambiae. These enzymes are important drug targets, and their crystal structures provide insights into regulation, signaling, and development processes. Together with the contributions from other SG consortia, this work made it possible to build a database and materials repository for structure-guided experimental and computational drug discovery for protein phosphatases (
      • Almo S.C.
      • Bonanno J.B.
      • Sauder J.M.
      • Emtage S.
      • Dilorenzo T.P.
      • Malashkevich V.
      • Wasserman S.R.
      • Swaminathan S.
      • Eswaramoorthy S.
      • Agarwal R.
      • Kumaran D.
      • Madegowda M.
      • Ragumani S.
      • Patskovsky Y.
      • Alvarado J.
      • et al.
      Structural genomics of protein phosphatases.
      ). The Northeast Structural Genomics Consortium, funded by the PSI, contributed important data for understanding the rules of protein structure and helped develop tools for protein design (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ). These rules relate secondary structural patterns to protein tertiary motifs (Fig. 5). Based on these guidelines it was possible to engineer a stable, funnel-shaped protein fold. The SG programs determined many novel structures including those with new folds. One example is shown in Figure 6 (
      • Tan K.
      • Duquette M.
      • Liu J.H.
      • Dong Y.
      • Zhang R.
      • Joachimiak A.
      • Lawler J.
      • Wang J.H.
      Crystal structure of the TSP-1 type 1 repeats: A novel layered fold and its biological implication.
      ). Thrombospondin type 1 repeats (TSRs) showed a novel, antiparallel, three-stranded fold that consists of alternating stacked layers of tryptophan and arginine residues and is capped with disulfide bonds on each end. The structure of the TSR domain provides insight into structural and functional studies of the TSR superfamily. TSRs play a role in mediating cell attachment, glycosaminoglycan binding, and inhibition of angiogenesis and matrix metalloproteinases.
      Figure 3. Structure of Hcp1 protein. Hcp1 forms a hexameric ring with a large internal diameter. A, ribbon representation of the Hcp1 monomer colored by secondary structure: β strands, red; α helices, blue; and loops, green. B, top view of a ribbon representation of the crystallographic Hcp1 hexamer. The individual subunits are colored differently to highlight their organization. C, edge-on view of the Hcp1 hexamer shown in (B). D, electron microscopy and single-particle analysis of Hcp1. Electron micrograph of Hcp1 negatively stained with 0.75% (w/v) uranyl formate. Scale bar, 100 nm. Inset, left, representative class averages and (right) the same averages after 6-fold symmetrization. Inset scale bar, 10 nm. E, sequence conservation analysis of Hcp1. An alignment of 107 Hcp proteins in 43 Gram-negative bacteria was used to plot the relative degree of conservation at each amino acid on the surface of Hcp1. Conservation is indicated by color, where red residues are highly conserved and white residues are poorly conserved. Figure from (
      • Mougous J.D.
      • Cuff M.E.
      • Raunser S.
      • Shen A.
      • Zhou M.
      • Gifford C.A.
      • Goodman A.L.
      • Joachimiak G.
      • Ordonez C.L.
      • Lory S.
      • Walz T.
      • Joachimiak A.
      • Mekalanos J.J.
      A virulence locus of Pseudomonas aeruginosa encodes a protein secretion apparatus.
      ).
      Figure 4. Combining metabolic reconstruction and structural genomics approaches for an integrated annotation of the T. maritima central metabolic network. Underlying genomics information (bottom) enabled both a metabolic reconstruction (left subpanel) and an atomic-level structure determination/modeling of all T. maritima proteins (right subpanel). Integration of these two approaches enabled detailed information to be acquired for every reaction in the network (upper subpanel); an example from the T. maritima serine degradation pathway is illustrated. Figure taken from (
      • Zhang Y.
      • Thiele I.
      • Weekes D.
      • Li Z.
      • Jaroszewski L.
      • Ginalski K.
      • Deacon A.M.
      • Wooley J.
      • Lesley S.A.
      • Wilson I.A.
      • Palsson B.
      • Osterman A.
      • Godzik A.
      Three-dimensional structural view of the central metabolic network of Thermotoga maritima.
      ).
      Figure 5. Fundamental rules of designing proteins relating local backbone structures to favorable tertiary motifs. Left, ββ-rule, the chirality of β-hairpins is determined by the length of the connecting loop. The chirality is defined on the basis of the pleat of the strand residue preceding or following the connecting loop. Middle, βα-rule, the helix direction is determined by the pleat direction of the last strand residue and the length of the connecting loop. Right, αβ-rule, the pleat of the first strand residue points away from the helix (
      • Koga N.
      • Tatsumi-Koga R.
      • Liu G.
      • Xiao R.
      • Acton T.B.
      • Montelione G.T.
      • Baker D.
      Principles for designing ideal protein structures.
      ). Figure provided by Dr Nobuyasu Koga (Institute of Molecular Science, Japan).
      Figure 6. CWR-layered core structure of the TSR domain. A, a stereoview of C, W, and R layers in TSR2 of TSP-1. Displayed residues that are directly involved in forming the layered structure are drawn in ball and stick representation, with salt bridges and hydrogen bonds drawn as dashed lines. The big jar handle motif, which is associated with the first W layer, is highlighted in pink. B, a schematic drawing of the CWR-layered structure with each layer and layer-forming residue(s) labeled. The residue Glu459 that is marked with an asterisk forms a hydrogen bond between its main chain carbonyl group and the side chain of Arg442 in the R1 layer. The three antiparallel strands are drawn in lines schematically with arrowheads indicating their polarities. The three bulges associated with the rippled strand A and the big jar handle are also shown. Figure taken from (
      • Tan K.
      • Duquette M.
      • Liu J.H.
      • Dong Y.
      • Zhang R.
      • Joachimiak A.
      • Lawler J.
      • Wang J.H.
      Crystal structure of the TSP-1 type 1 repeats: A novel layered fold and its biological implication.
      ). TSR, thrombospondin type 1 repeat.

      Databases and repositories

      During the initial trial period it was shown that it is possible to establish high-throughput semiautomated production pipelines and generate a large number of proteins in quantities suitable for structural studies. It also became clear that the success rate of these pipelines was not very high, exposing the necessity to collect all generated information and analyze the data to improve target selection, technologies, and protocols (
      • Gifford L.K.
      • Carter L.G.
      • Gabanyi M.J.
      • Berman H.M.
      • Adams P.D.
      The protein structure initiative structural biology knowledgebase technology portal: A structural biology web resource.
      ). Therefore, software and database developments were necessary to handle high-throughput structure determination workflows; overall, they have led to the production of better proteins for structural biology, structures of higher quality, and improved integrity of the associated data. To further disseminate structural genomics materials, the PSI Materials Repository (PSI-MR) (
      • Seiler C.Y.
      • Park J.G.
      • Sharma A.
      • Hunter P.
      • Surapaneni P.
      • Sedillo C.
      • Field J.
      • Algar R.
      • Price A.
      • Steel J.
      • Throop A.
      • Fiacco M.
      • LaBaer J.
      DNASU plasmid and PSI:Biology-Materials repositories: Resources to accelerate biological research.
      ) was created to store and distribute biological reagents, primarily expression clones, at low cost.
      Databases were developed to track trials and improve effectiveness and reproducibility of experiments. These were first created as local resources that later were combined into centralized databases (
      • Grabowski M.
      • Niedzialkowska E.
      • Zimmerman M.D.
      • Minor W.
      The impact of structural genomics: The first quindecennial.
      ,
      • Berman H.M.
      • Bhat T.N.
      • Bourne P.E.
      • Feng Z.
      • Gilliland G.
      • Weissig H.
      • Westbrook J.
      The Protein Data Bank and the challenge of structural genomics.
      ), with the final coordinate and structure factor files deposited in the PDB. SG-created resources included the Target Registration Database (TargetDB) (
      • Chen L.
      • Oughtred R.
      • Berman H.M.
      • Westbrook J.
      TargetDB: A target registration database for structural genomics projects.
      ,
      • Westbrook J.
      • Feng Z.
      • Chen L.
      • Yang H.
      • Berman H.M.
      The Protein Data Bank and structural genomics.
      ) and PepcDB (Protein Expression Purification and Crystallization Data Base; (
      • Kouranov A.
      • Xie L.
      • de la Cruz J.
      • Chen L.
      • Westbrook J.
      • Bourne P.E.
      • Berman H.M.
      The RCSB PDB information portal for structural genomics.
      )), which were eventually merged into the TargetTrack knowledgebase (
      • Berman H.M.
      • Westbrook J.D.
      • Gabanyi M.J.
      • Tao W.
      • Shah R.
      • Kouranov A.
      • Schwede T.
      • Arnold K.
      • Kiefer F.
      • Bordoli L.
      • Kopp J.
      • Podvinec M.
      • Adams P.D.
      • Carter L.G.
      • Minor W.
      • et al.
      The protein structure initiative structural genomics knowledgebase.
      ) and the Structural Biology Knowledgebase (
      • Gifford L.K.
      • Carter L.G.
      • Gabanyi M.J.
      • Berman H.M.
      • Adams P.D.
      The protein structure initiative structural biology knowledgebase technology portal: A structural biology web resource.
      ,
      • Gabanyi M.J.
      • Adams P.D.
      • Arnold K.
      • Bordoli L.
      • Carter L.G.
      • Flippen-Andersen J.
      • Gifford L.
      • Haas J.
      • Kouranov A.
      • McLaughlin W.A.
      • Micallef D.I.
      • Minor W.
      • Shah R.
      • Schwede T.
      • Tao Y.P.
      • et al.
      The structural biology knowledgebase: A portal to protein structures, sequences, functions, and methods.
      ). These databases exposed limitations of existing resources; for example, files deposited in the PDB were missing important project information because including these data in a deposition was optional. Clearly, the SG structures presented new challenges to the PDB (
      • Berman H.M.
      • Westbrook J.D.
      The impact of structural genomics on the protein data bank.
      ). These programs were also distinctive because the National Institutes of Health required that all generated data be made available to the community. The original guidelines for deposition were established in 1989 as part of an International Union of Crystallography initiative. Validation standards were later set as part of a wwPDB project in which Task Forces made recommendations and the wwPDB implemented them (
      • Berman H.M.
      • Kleywegt G.J.
      • Nakamura H.
      • Markley J.L.
      How community has shaped the Protein Data Bank.
      ,
      • Bluhm W.F.
      • Beran B.
      • Bi C.
      • Dimitropoulos D.
      • Prlic A.
      • Quinn G.B.
      • Rose P.W.
      • Shah C.
      • Young J.
      • Yukich B.
      • Berman H.M.
      • Bourne P.E.
      Quality assurance for the query and distribution systems of the RCSB Protein Data Bank.
      ,
      • Gore S.
      • Sanz Garcia E.
      • Hendrickx P.M.S.
      • Gutmanas A.
      • Westbrook J.D.
      • Yang H.
      • Feng Z.
      • Baskaran K.
      • Berrisford J.M.
      • Hudson B.P.
      • Ikegawa Y.
      • Kobayashi N.
      • Lawson C.L.
      • Mading S.
      • Mak L.
      • et al.
      Validation of structures in the Protein Data Bank.
      ). The SG programs and the biology community worked together with the PDB to facilitate the rapid deposition of data and track the progress of the work. At the same time, the American Crystallographic Association created committees to formulate guidelines for structure deposition. In a series of workshops and extensive discussions, standards were established for X-ray crystallography deposits and later for NMR and cryo-EM structures as well (
      • Bhattacharya A.
      • Tejero R.
      • Montelione G.T.
      Evaluating protein structures determined by structural genomics consortia.
      ,
      • Davis I.W.
      • Leaver-Fay A.
      • Chen V.B.
      • Block J.N.
      • Kapral G.J.
      • Wang X.
      • Murray L.W.
      • Arendall 3rd, W.B.
      • Snoeyink J.
      • Richardson J.S.
      • Richardson D.C.
      MolProbity: All-atom contacts and structure validation for proteins and nucleic acids.
      ,
      • Yang H.
      • Guranovic V.
      • Dutta S.
      • Feng Z.
      • Berman H.M.
      • Westbrook J.D.
      Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank.
      ,
      • Ludtke S.J.
      • Lawson C.L.
      • Kleywegt G.J.
      • Berman H.M.
      • Chiu W.
      Workshop on the validation and modeling of electron cryo-microscopy structures of biological nanomachines.
      ,
      • Chen V.B.
      • Wedell J.R.
      • Wenger R.K.
      • Ulrich E.L.
      • Markley J.L.
      MolProbity for the masses-of data.
      ). A set of PDB deposition guidelines was published and subsequently adopted by funding agencies and scientific journals (
      • Gore S.
      • Sanz Garcia E.
      • Hendrickx P.M.S.
      • Gutmanas A.
      • Westbrook J.D.
      • Yang H.
      • Feng Z.
      • Baskaran K.
      • Berrisford J.M.
      • Hudson B.P.
      • Ikegawa Y.
      • Kobayashi N.
      • Lawson C.L.
      • Mading S.
      • Mak L.
      • et al.
      Validation of structures in the Protein Data Bank.
      ). Today, they are broadly implemented and serve as an example to the entire scientific community. Structural genomics programs monitored structure quality, which resulted in an overall improvement of deposited structures. The growth of the PDB was remarkable. Between 2001, when the first SG structures were deposited, and 2016, when the majority of SG structures were completed, PDB deposits increased from 2,814/year to 10,819/year, a 3.84-fold increase, with SG programs contributing a significant fraction of unique structures.
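      The fold increase quoted above can be reproduced from the two annual deposition counts. The short Python sketch below does so; the implied average annual growth rate is an added illustration that assumes smooth compound growth between 2001 and 2016, an assumption the text itself does not make.

```python
# Illustrative check of the deposition growth quoted in the text.
# The two annual counts and the years are taken from the paragraph above;
# the compound-growth calculation is an added assumption for illustration.

first_year, last_year = 2001, 2016
deposits_first = 2814    # PDB deposits per year in 2001 (from the text)
deposits_last = 10819    # PDB deposits per year in 2016 (from the text)

# Fold increase between the two annual deposition counts.
fold_increase = deposits_last / deposits_first

# Average yearly growth rate, assuming smooth compound growth.
years = last_year - first_year
annual_growth = fold_increase ** (1 / years) - 1

print(f"Fold increase: {fold_increase:.2f}")
print(f"Implied average annual growth: {annual_growth:.1%}")
```

      Under that assumption, the roughly 3.84-fold increase corresponds to annual deposits growing by a little over 9% per year on average.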

      Current status and future outlook

      Today the PDB offers online tools, summary reports, protein sequence information and redundancy, other data associated with protein structure determination, and links to homology models (
      • Kouranov A.
      • Xie L.
      • de la Cruz J.
      • Chen L.
      • Westbrook J.
      • Bourne P.E.
      • Berman H.M.
      The RCSB PDB information portal for structural genomics.
      ). Functional coverage can be examined according to enzyme classification, gene ontology (biological process, cell component, and molecular function), and disease (
      • Sillitoe I.
      • Bordin N.
      • Dawson N.
      • Waman V.P.
      • Ashford P.
      • Scholes H.M.
      • Pang C.S.M.
      • Woodridge L.
      • Rauer C.
      • Sen N.
      • Abbasian M.
      • Le Cornu S.
      • Lam S.D.
      • Berka K.
      • Varekova I.H.
      • et al.
      CATH: Increased structural coverage of functional space.
      ).
      Structural genomics projects propelled technology development and helped to disseminate it through the biology community. Structure solution using X-ray diffraction at light sources has never been simpler. The tools developed for structure validation help to rapidly identify potential issues and guide improvement of structural models. The PDB has become a fully integrated, single global repository of experimentally determined 3D structures of biological macromolecules and their complexes, through which the community can access and analyze structural data (
      • Burley S.K.
      • Berman H.M.
      • Kleywegt G.J.
      • Markley J.L.
      • Nakamura H.
      • Velankar S.
      Protein Data Bank (PDB): The single global macromolecular structure archive.
      ,
      • Berman H.M.
      • Vallat B.
      • Lawson C.L.
      The data universe of structural biology.
      ). Archives for homology models (
      • Studer G.
      • Tauriello G.
      • Bienert S.
      • Biasini M.
      • Johner N.
      • Schwede T.
      ProMod3-A versatile homology modelling toolbox.
      ) and integrative/hybrid structures are available (
      • Burley S.K.
      • Kurisu G.
      • Markley J.L.
      • Nakamura H.
      • Velankar S.
      • Berman H.M.
      • Sali A.
      • Schwede T.
      • Trewhella J.
      PDB-dev: A prototype system for depositing integrative/hybrid structural models.
      ). Raw data can be deposited into versatile servers (
      • Grabowski M.
      • Cymborowski M.
      • Porebski P.J.
      • Osinski T.
      • Shabalin I.G.
      • Cooper D.R.
      • Minor W.
      The integrated resource for reproducibility in macromolecular crystallography: Experiences of the first four years.
      ,
      • Grabowski M.
      • Langner K.M.
      • Cymborowski M.
      • Porebski P.J.
      • Sroka P.
      • Zheng H.
      • Cooper D.R.
      • Zimmerman M.D.
      • Elsliger M.A.
      • Burley S.K.
      • Minor W.
      A public database of macromolecular diffraction experiments.
      ), although challenges remain as the amount of data increases exponentially with serial crystallography experiments conducted at free-electron lasers (FELs) and other light sources (
      • Ponsard R.
      • Janvier N.
      • Kieffer J.
      • Houzet D.
      • Fristot V.
      RDMA data transfer and GPU acceleration methods for high-throughput online processing of serial crystallography images.
      ). There are ongoing discussions on better integrating the PDB with other databases and new community resources, especially in support of drug discovery (
      • Adams P.D.
      • Aertgeerts K.
      • Bauer C.
      • Bell J.A.
      • Berman H.M.
      • Bhat T.N.
      • Blaney J.M.
      • Bolton E.
      • Bricogne G.
      • Brown D.
      • Burley S.K.
      • Case D.A.
      • Clark K.L.
      • Darden T.
      • Emsley P.
      • et al.
      Outcome of the first wwPDB/CCDC/D3R ligand validation workshop.
      ), rapidly expanding cryo-EM data (
      • Lawson C.L.
      Unified data resource for cryo-EM.
      ), deep learning models (
      • Zaucha J.
      • Softley C.A.
      • Sattler M.
      • Frishman D.
      • Popowicz G.M.
      Deep learning model predicts water interaction sites on the surface of proteins using limited-resolution data.
      ), as well as the Department of Energy-funded Systems Biology Knowledgebase, KBase (
      • Arkin A.P.
      • Cottingham R.W.
      • Henry C.S.
      • Harris N.L.
      • Stevens R.L.
      • Maslov S.
      • Dehal P.
      • Ware D.
      • Perez F.
      • Canon S.
      • Sneddon M.W.
      • Henderson M.L.
      • Riehl W.J.
      • Murphy-Olson D.
      • Chan S.Y.
      • et al.
      KBase: The United States Department of Energy systems biology knowledgebase.
      ) and others.

      Dedications

      Dedicated to Professor Wladek Minor on the occasion of his 75th birthday.

      Conflict of interest

      The authors declare that they have no conflicts of interest with the contents of this article.

      Acknowledgments

      Funding for this project was provided by federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272201700060C and in part by the US Department of Energy (DOE) Office of Science and operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

      Author contributions

      A.J. conceived, wrote, and edited the manuscript, and K.M. wrote and edited the manuscript.

      Funding and additional information

      The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a US Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The US Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

      References

        • Levitt M.
        Nature of the protein universe.
        Proc. Natl. Acad. Sci. U. S. A. 2009; 106: 11079-11084
        • Stevens R.C.
        • Yokoyama S.
        • Wilson I.A.
        Global efforts in structural genomics.
        Science. 2001; 294: 89-92
        • Tepper J.
        • Nardi G.
        • Sutt H.
        Carcinoma of the pancreas: Review of MGH experience from 1963 to 1973. Analysis of surgical failure and implications for radiation therapy.
        Cancer. 1976; 37: 1519-1524
        • Mizianty M.J.
        • Fan X.
        • Yan J.
        • Chalmers E.
        • Woloschuk C.
        • Joachimiak A.
        • Kurgan L.
        Covering complete proteomes with X-ray structures: A current snapshot.
        Acta Crystallogr. D Biol. Crystallogr. 2014; 70: 2781-2793
        • Yeats C.
        • Dessailly B.H.
        • Glass E.M.
        • Fremont D.H.
        • Orengo C.A.
        Target selection for structural genomics of infectious diseases.
        Methods Mol. Biol. 2014; 1140: 35-51
        • Pearl F.M.
        • Martin N.
        • Bray J.E.
        • Buchan D.W.
        • Harrison A.P.
        • Lee D.
        • Reeves G.A.
        • Shepherd A.J.
        • Sillitoe I.
        • Todd A.E.
        • Thornton J.M.
        • Orengo C.A.
        A rapid classification protocol for the CATH Domain Database to support structural genomics.
        Nucleic Acids Res. 2001; 29: 223-227
        • Marsden R.L.
        • Orengo C.A.
        Target selection for structural genomics: An overview.
        Methods Mol. Biol. 2008; 426: 3-25
        • Marsden R.L.
        • Lewis T.A.
        • Orengo C.A.
        Towards a comprehensive structural coverage of completed genomes: A structural genomics viewpoint.
        BMC Bioinformatics. 2007; 8: 86
        • Levitt M.
        Growth of novel protein structural data.
        Proc. Natl. Acad. Sci. U. S. A. 2007; 104: 3183-3188
        • Varga J.
        • Dobson L.
        • Remenyi I.
        • Tusnady G.E.
        TSTMP: Target selection for structural genomics of human transmembrane proteins.
        Nucleic Acids Res. 2017; 45: D325-D330
        • Graslund S.
        • Nordlund P.
        • Weigelt J.
        • Hallberg B.M.
        • Bray J.
        • Gileadi O.
        • Knapp S.
        • Oppermann U.
        • Arrowsmith C.
        • Hui R.
        • Ming J.
        • dhe-Paganon S.
        • et al.
        • Structural Genomics Consortium
        • China Structural Genomics Consortium
        • Northeast Structural Genomics Consortium
        Protein production and purification.
        Nat. Methods. 2008; 5: 135-146
        • Makowska-Grzyska M.
        • Kim Y.
        • Maltseva N.
        • Li H.
        • Zhou M.
        • Joachimiak G.
        • Babnigg G.
        • Joachimiak A.
        Protein production for structural genomics using E. coli expression.
        Methods Mol. Biol. 2014; 1140: 89-105
        • Kim Y.
        • Babnigg G.
        • Jedrzejczak R.
        • Eschenfeldt W.H.
        • Li H.
        • Maltseva N.
        • Hatzos-Skintges C.
        • Gu M.
        • Makowska-Grzyska M.
        • Wu R.
        • An H.
        • Chhor G.
        • Joachimiak A.
        High-throughput protein purification and quality assessment for crystallization.
        Methods. 2011; 55: 12-28
        • Minor W.
        • Cymborowski M.
        • Otwinowski Z.
        • Chruszcz M.
        HKL-3000: The integration of data reduction and structure solution--from diffraction images to an initial model in minutes.
        Acta Crystallogr. D Biol. Crystallogr. 2006; 62: 859-866
        • Burley S.K.
        • Joachimiak A.
        • Montelione G.T.
        • Wilson I.A.
        Contributions to the NIH-NIGMS protein structure initiative from the PSI production centers.
        Structure. 2008; 16: 5-11
        • Chance M.R.
        • Bresnick A.R.
        • Burley S.K.
        • Jiang J.S.
        • Lima C.D.
        • Sali A.
        • Almo S.C.
        • Bonanno J.B.
        • Buglino J.A.
        • Boulton S.
        • Chen H.
        • Eswar N.
        • He G.
        • Huang R.
        • Ilyin V.
        • et al.
        Structural genomics: A pipeline for providing structures for the biologist.
        Protein Sci. 2002; 11: 723-738
        • Elsliger M.A.
        • Deacon A.M.
        • Godzik A.
        • Lesley S.A.
        • Wooley J.
        • Wuthrich K.
        • Wilson I.A.
        The JCSG high-throughput structural biology pipeline.
        Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 2010; 66: 1137-1142
        • Grabowski M.
        • Chruszcz M.
        • Zimmerman M.D.
        • Kirillova O.
        • Minor W.
        Benefits of structural genomics for drug discovery research.
        Infect. Disord. Drug Targets. 2009; 9: 459-474
        • Anderson W.F.
        Structural genomics and drug discovery for infectious diseases.
        Infect. Disord. Drug Targets. 2009; 9: 507-517
        • Berman H.M.
        • Westbrook J.
        • Feng Z.
        • Gilliland G.
        • Bhat T.N.
        • Weissig H.
        • Shindyalov I.N.
        • Bourne P.E.
        The Protein Data Bank.
        Nucleic Acids Res. 2000; 28: 235-242
        • Lee D.
        • de Beer T.A.
        • Laskowski R.A.
        • Thornton J.M.
        • Orengo C.A.
        1,000 Structures and more from the MCSG.
        BMC Struct. Biol. 2011; 11: 2
        • Grabowski M.
        • Niedzialkowska E.
        • Zimmerman M.D.
        • Minor W.
        The impact of structural genomics: The first quindecennial.
        J. Struct. Funct. Genomics. 2016; 17: 1-16
        • Kryshtafovych A.
        • Schwede T.
        • Topf M.
        • Fidelis K.
        • Moult J.
        Critical assessment of methods of protein structure prediction (CASP)-round XIII.
        Proteins. 2019; 87: 1011-1020
        • Service R.F.
        'The game has changed.' AI triumphs at protein folding.
        Science. 2020; 370: 1144-1145
        • Callaway E.
        'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures.
        Nature. 2020; 588: 203-204
        • Wang N.
        • Rudolf J.D.
        • Dong L.B.
        • Osipiuk J.
        • Hatzos-Skintges C.
        • Endres M.
        • Chang C.Y.
        • Babnigg G.
        • Joachimiak A.
        • Phillips Jr., G.N.
        • Shen B.
        Natural separation of the acyl-CoA ligase reaction results in a non-adenylating enzyme.
        Nat. Chem. Biol. 2018; 14: 730-737
        • Batot G.
        • Michalska K.
        • Ekberg G.
        • Irimpan E.M.
        • Joachimiak G.
        • Jedrzejczak R.
        • Babnigg G.
        • Hayes C.S.
        • Joachimiak A.
        • Goulding C.W.
        The CDI toxin of Yersinia kristensenii is a novel bacterial member of the RNase A superfamily.
        Nucleic Acids Res. 2017; 45: 5013-5025
        • Brzezinski D.
        • Kowiel M.
        • Cooper D.R.
        • Cymborowski M.
        • Grabowski M.
        • Wlodawer A.
        • Dauter Z.
        • Shabalin I.G.
        • Gilski M.
        • Rupp B.
        • Jaskolski M.
        • Minor W.
        Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models.
        Protein Sci. 2021; 30: 115-124
        • Kim Y.
        • Wower J.
        • Maltseva N.
        • Chang C.
        • Jedrzejczak R.
        • Wilamowski M.
        • Kang S.
        • Nicolaescu V.
        • Randall G.
        • Michalska K.
        • Joachimiak A.
        Tipiracil binds to uridine site and inhibits Nsp15 endoribonuclease NendoU from SARS-CoV-2.
        Commun. Biol. 2021; 4: 193
        • Osipiuk J.
        • Azizi S.A.
        • Dvorkin S.
        • Endres M.
        • Jedrzejczak R.
        • Jones K.A.
        • Kang S.
        • Kathayat R.S.
        • Kim Y.
        • Lisnyak V.G.
        • Maki S.L.
        • Nicolaescu V.
        • Taylor C.A.
        • Tesar C.
        • Zhang Y.A.
        • et al.
        Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors.
        Nat. Commun. 2021; 12: 743
        • Kim Y.
        • Jedrzejczak R.
        • Maltseva N.I.
        • Wilamowski M.
        • Endres M.
        • Godzik A.
        • Michalska K.
        • Joachimiak A.
        Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2.
        Protein Sci. 2020; 29: 1596-1605
        • Michalska K.
        • Kim Y.
        • Jedrzejczak R.
        • Maltseva N.I.
        • Stols L.
        • Endres M.
        • Joachimiak A.
        Crystal structures of SARS-CoV-2 ADP-ribose phosphatase: From the apo form to ligand complexes.
        IUCrJ. 2020; 7: 814-824
        • Walls A.C.
        • Park Y.J.
        • Tortorici M.A.
        • Wall A.
        • McGuire A.T.
        • Veesler D.
        Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein.
        Cell. 2020; 181: 281-292.e6
        • Mariano G.
        • Farthing R.J.
        • Lale-Farjat S.L.M.
        • Bergeron J.R.C.
        Structural characterization of SARS-CoV-2: Where we are, and where we need to be.
        Front. Mol. Biosci. 2020; 7: 605236
        • Mougous J.D.
        • Cuff M.E.
        • Raunser S.
        • Shen A.
        • Zhou M.
        • Gifford C.A.
        • Goodman A.L.
        • Joachimiak G.
        • Ordonez C.L.
        • Lory S.
        • Walz T.
        • Joachimiak A.
        • Mekalanos J.J.
        A virulence locus of Pseudomonas aeruginosa encodes a protein secretion apparatus.
        Science. 2006; 312: 1526-1530
        • Osipiuk J.
        • Xu X.
        • Cui H.
        • Savchenko A.
        • Edwards A.
        • Joachimiak A.
        Crystal structure of secretory protein Hcp3 from Pseudomonas aeruginosa.
        J. Struct. Funct. Genomics. 2011; 12: 21-26
        • Zhang Y.
        • Thiele I.
        • Weekes D.
        • Li Z.
        • Jaroszewski L.
        • Ginalski K.
        • Deacon A.M.
        • Wooley J.
        • Lesley S.A.
        • Wilson I.A.
        • Palsson B.
        • Osterman A.
        • Godzik A.
        Three-dimensional structural view of the central metabolic network of Thermotoga maritima.
        Science. 2009; 325: 1544-1549
        • Almo S.C.
        • Bonanno J.B.
        • Sauder J.M.
        • Emtage S.
        • Dilorenzo T.P.
        • Malashkevich V.
        • Wasserman S.R.
        • Swaminathan S.
        • Eswaramoorthy S.
        • Agarwal R.
        • Kumaran D.
        • Madegowda M.
        • Ragumani S.
        • Patskovsky Y.
        • Alvarado J.
        • et al.
        Structural genomics of protein phosphatases.
        J. Struct. Funct. Genomics. 2007; 8: 121-140
        • Koga N.
        • Tatsumi-Koga R.
        • Liu G.
        • Xiao R.
        • Acton T.B.
        • Montelione G.T.
        • Baker D.
        Principles for designing ideal protein structures.
        Nature. 2012; 491: 222-227
        • Tan K.
        • Duquette M.
        • Liu J.H.
        • Dong Y.
        • Zhang R.
        • Joachimiak A.
        • Lawler J.
        • Wang J.H.
        Crystal structure of the TSP-1 type 1 repeats: A novel layered fold and its biological implication.
        J. Cell Biol. 2002; 159: 373-382
        • Gifford L.K.
        • Carter L.G.
        • Gabanyi M.J.
        • Berman H.M.
        • Adams P.D.
        The protein structure initiative structural biology knowledgebase technology portal: A structural biology web resource.
        J. Struct. Funct. Genomics. 2012; 13: 57-62
        • Seiler C.Y.
        • Park J.G.
        • Sharma A.
        • Hunter P.
        • Surapaneni P.
        • Sedillo C.
        • Field J.
        • Algar R.
        • Price A.
        • Steel J.
        • Throop A.
        • Fiacco M.
        • LaBaer J.
        DNASU plasmid and PSI:Biology-Materials repositories: Resources to accelerate biological research.
        Nucleic Acids Res. 2014; 42: D1253-D1260
        • Berman H.M.
        • Bhat T.N.
        • Bourne P.E.
        • Feng Z.
        • Gilliland G.
        • Weissig H.
        • Westbrook J.
        The Protein Data Bank and the challenge of structural genomics.
        Nat. Struct. Biol. 2000; 7 Suppl: 957-959
        • Chen L.
        • Oughtred R.
        • Berman H.M.
        • Westbrook J.
        TargetDB: A target registration database for structural genomics projects.
        Bioinformatics. 2004; 20: 2860-2862
        • Westbrook J.
        • Feng Z.
        • Chen L.
        • Yang H.
        • Berman H.M.
        The Protein Data Bank and structural genomics.
        Nucleic Acids Res. 2003; 31: 489-491
        • Kouranov A.
        • Xie L.
        • de la Cruz J.
        • Chen L.
        • Westbrook J.
        • Bourne P.E.
        • Berman H.M.
        The RCSB PDB information portal for structural genomics.
        Nucleic Acids Res. 2006; 34: D302-D305
        • Berman H.M.
        • Westbrook J.D.
        • Gabanyi M.J.
        • Tao W.
        • Shah R.
        • Kouranov A.
        • Schwede T.
        • Arnold K.
        • Kiefer F.
        • Bordoli L.
        • Kopp J.
        • Podvinec M.
        • Adams P.D.
        • Carter L.G.
        • Minor W.
        • et al.
        The protein structure initiative structural genomics knowledgebase.
        Nucleic Acids Res. 2009; 37: D365-D368
        • Gabanyi M.J.
        • Adams P.D.
        • Arnold K.
        • Bordoli L.
        • Carter L.G.
        • Flippen-Andersen J.
        • Gifford L.
        • Haas J.
        • Kouranov A.
        • McLaughlin W.A.
        • Micallef D.I.
        • Minor W.
        • Shah R.
        • Schwede T.
        • Tao Y.P.
        • et al.
        The structural biology knowledgebase: A portal to protein structures, sequences, functions, and methods.
        J. Struct. Funct. Genomics. 2011; 12: 45-54
        • Berman H.M.
        • Westbrook J.D.
        The impact of structural genomics on the protein data bank.
        Am. J. Pharmacogenomics. 2004; 4: 247-252
        • Berman H.M.
        • Kleywegt G.J.
        • Nakamura H.
        • Markley J.L.
        How community has shaped the Protein Data Bank.
        Structure. 2013; 21: 1485-1491
        • Bluhm W.F.
        • Beran B.
        • Bi C.
        • Dimitropoulos D.
        • Prlic A.
        • Quinn G.B.
        • Rose P.W.
        • Shah C.
        • Young J.
        • Yukich B.
        • Berman H.M.
        • Bourne P.E.
        Quality assurance for the query and distribution systems of the RCSB Protein Data Bank.
        Database (Oxford). 2011; 2011: bar003
        • Gore S.
        • Sanz Garcia E.
        • Hendrickx P.M.S.
        • Gutmanas A.
        • Westbrook J.D.
        • Yang H.
        • Feng Z.
        • Baskaran K.
        • Berrisford J.M.
        • Hudson B.P.
        • Ikegawa Y.
        • Kobayashi N.
        • Lawson C.L.
        • Mading S.
        • Mak L.
        • et al.
        Validation of structures in the Protein Data Bank.
        Structure. 2017; 25: 1916-1927
        • Bhattacharya A.
        • Tejero R.
        • Montelione G.T.
        Evaluating protein structures determined by structural genomics consortia.
        Proteins. 2007; 66: 778-795
        • Davis I.W.
        • Leaver-Fay A.
        • Chen V.B.
        • Block J.N.
        • Kapral G.J.
        • Wang X.
        • Murray L.W.
        • Arendall 3rd, W.B.
        • Snoeyink J.
        • Richardson J.S.
        • Richardson D.C.
        MolProbity: All-atom contacts and structure validation for proteins and nucleic acids.
        Nucleic Acids Res. 2007; 35: W375-W383
        • Yang H.
        • Guranovic V.
        • Dutta S.
        • Feng Z.
        • Berman H.M.
        • Westbrook J.D.
        Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank.
        Acta Crystallogr. D Biol. Crystallogr. 2004; 60: 1833-1839
        • Ludtke S.J.
        • Lawson C.L.
        • Kleywegt G.J.
        • Berman H.M.
        • Chiu W.
        Workshop on the validation and modeling of electron cryo-microscopy structures of biological nanomachines.
        Pac. Symp. Biocomput. 2011; 369-373
        • Chen V.B.
        • Wedell J.R.
        • Wenger R.K.
        • Ulrich E.L.
        • Markley J.L.
        MolProbity for the masses-of data.
        J. Biomol. NMR. 2015; 63: 77-83
        • Sillitoe I.
        • Bordin N.
        • Dawson N.
        • Waman V.P.
        • Ashford P.
        • Scholes H.M.
        • Pang C.S.M.
        • Woodridge L.
        • Rauer C.
        • Sen N.
        • Abbasian M.
        • Le Cornu S.
        • Lam S.D.
        • Berka K.
        • Varekova I.H.
        • et al.
        CATH: Increased structural coverage of functional space.
        Nucleic Acids Res. 2021; 49: D266-D273
        • Burley S.K.
        • Berman H.M.
        • Kleywegt G.J.
        • Markley J.L.
        • Nakamura H.
        • Velankar S.
        Protein Data Bank (PDB): The single global macromolecular structure archive.
        Methods Mol. Biol. 2017; 1607: 627-641
        • Berman H.M.
        • Vallat B.
        • Lawson C.L.
        The data universe of structural biology.
        IUCrJ. 2020; 7: 630-638
        • Studer G.
        • Tauriello G.
        • Bienert S.
        • Biasini M.
        • Johner N.
        • Schwede T.
        ProMod3-A versatile homology modelling toolbox.
        PLoS Comput. Biol. 2021; 17: e1008667
        • Burley S.K.
        • Kurisu G.
        • Markley J.L.
        • Nakamura H.
        • Velankar S.
        • Berman H.M.
        • Sali A.
        • Schwede T.
        • Trewhella J.
        PDB-dev: A prototype system for depositing integrative/hybrid structural models.
        Structure. 2017; 25: 1317-1318
        • Grabowski M.
        • Cymborowski M.
        • Porebski P.J.
        • Osinski T.
        • Shabalin I.G.
        • Cooper D.R.
        • Minor W.
        The integrated resource for reproducibility in macromolecular crystallography: Experiences of the first four years.
        Struct. Dyn. 2019; 6: 064301
        • Grabowski M.
        • Langner K.M.
        • Cymborowski M.
        • Porebski P.J.
        • Sroka P.
        • Zheng H.
        • Cooper D.R.
        • Zimmerman M.D.
        • Elsliger M.A.
        • Burley S.K.
        • Minor W.
        A public database of macromolecular diffraction experiments.
        Acta Crystallogr. D Struct. Biol. 2016; 72: 1181-1193
        • Ponsard R.
        • Janvier N.
        • Kieffer J.
        • Houzet D.
        • Fristot V.
        RDMA data transfer and GPU acceleration methods for high-throughput online processing of serial crystallography images.
        J. Synchrotron Radiat. 2020; 27: 1297-1306
        • Adams P.D.
        • Aertgeerts K.
        • Bauer C.
        • Bell J.A.
        • Berman H.M.
        • Bhat T.N.
        • Blaney J.M.
        • Bolton E.
        • Bricogne G.
        • Brown D.
        • Burley S.K.
        • Case D.A.
        • Clark K.L.
        • Darden T.
        • Emsley P.
        • et al.
        Outcome of the first wwPDB/CCDC/D3R ligand validation workshop.
        Structure. 2016; 24: 502-508
        • Lawson C.L.
        Unified data resource for cryo-EM.
        Methods Enzymol. 2010; 483: 73-90
        • Zaucha J.
        • Softley C.A.
        • Sattler M.
        • Frishman D.
        • Popowicz G.M.
        Deep learning model predicts water interaction sites on the surface of proteins using limited-resolution data.
        Chem. Commun. (Camb.). 2020; 56: 15454-15457
        • Arkin A.P.
        • Cottingham R.W.
        • Henry C.S.
        • Harris N.L.
        • Stevens R.L.
        • Maslov S.
        • Dehal P.
        • Ware D.
        • Perez F.
        • Canon S.
        • Sneddon M.W.
        • Henderson M.L.
        • Riehl W.J.
        • Murphy-Olson D.
        • Chan S.Y.
        • et al.
        KBase: The United States Department of Energy systems biology knowledgebase.
        Nat. Biotechnol. 2018; 36: 566-569

      Biography

      Andrzej Joachimiak is the Director of the Structural Biology Center and the Midwest Center for Structural Genomics at Argonne National Laboratory and Co-Director of the Center for Structural Genomics of Infectious Diseases at the University of Chicago. As a leader in structural genomics, he has developed many new methods for high-throughput molecular biology and crystallography.

      Linked Article

      • How the Protein Data Bank changed biology: An introduction to the JBC Reviews thematic series, part 2
        Journal of Biological Chemistry, Vol. 296
        • Preview
          In part 1 of this remarkable collection, we told you the story of The Protein Data Bank (PDB) (1), which was founded 50 years ago, and we illustrated the breadth of the science contained within it with ten informative review articles. The second half of this collection is a continuation of our celebrations to mark this momentous anniversary. Part 2 provides eight more superb articles describing how the PDB has influenced biology over the course of the last half-century and how biology has fueled the deposition of impactful structures in the PDB.