An RNA-centric historical narrative around the Protein Data Bank

Open Access. Published: March 17, 2021. DOI: https://doi.org/10.1016/j.jbc.2021.100555
      Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. Progress in understanding the molecular interactions that drive RNA architectures followed the rapid increase in RNA structures in the PDB. That increase resulted from improvements in chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and in computer technology. RNA modeling efforts since their early beginnings are also described, together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution of all living species on earth.

      Abbreviations:

      PDB (Protein Data Bank), PDBj (PDB Japan), PDP (Programmed Data Processor), NDB (Nucleic Acid Database), RNP (RNA–protein complex), wwPDB (worldwide PDB)
      The Protein Data Bank (PDB) is an icon for structural biologists. In its virtual vaults, the PDB stores the structures of biological macromolecules obtained painstakingly by thousands of crystallographers, electron microscopists, and nuclear magnetic resonance spectroscopists over the last 50 years or so. This note is primarily written to honor and acknowledge those who set up the PDB and who have striven throughout all these years to improve and maintain the database. We thank the Journal of Biological Chemistry for organizing the publication of such pieces, which provide a mixture of personal recollections, historical records about nucleic acid structure, and some general comments about databases and their future.

      The evolution of structural databases

      When one of us was a postdoctoral fellow with M. Sundaralingam at the Department of Biochemistry of the University of Wisconsin in Madison, the only way to work with a published molecular structure was to first type in the atomic coordinates at the console of a PDP computer (Programmed Data Processor from Digital Equipment Corporation, Massachusetts). Only then could one deduce or check atomic distances, angles, stereochemistry, contacts and visualize the 3D structure using programs developed in-house. This was a tedious job, but very instructive to students and postdoctoral fellows. In those days, atomic coordinates were part of the paper and presented with temperature factors in tables (for an example, see (
      • Sundaralingam M.
      • Jensen L.H.
      Stereochemistry of nucleic acid constituents: I. Refinement of the structure of cytidylic acid b.
      ) and for general guidelines (
      • Kennard O.
      • Speakman J.C.
      • Donnay J.D.H.
      Primary crystallographic data.
      )). The number and diversity of errors that escaped detection before publication, while rarely casting doubt on the crystallography itself, often led to inaccurately described molecular structures with, for example, wrong chiralities or atoms of the molecular unit placed in different unit cells. This manual handling of data was certainly an excellent training ground. Happily, the nucleic acid structures that could be tackled at that time were restricted to the bases (often modified), nucleosides, and nucleotides, with the first structure of a dinucleotide appearing in 1971 (
      • Rubin J.
      • Brennan T.
      • Sundaralingam M.
      Crystal structure of a naturally occurring dinucleoside monophosphate: Uridylyl (3',5') adenosine hemihydrate.
      ,
      • Seeman N.C.
      • Sussman J.L.
      • Berman H.M.
      • Kim S.H.
      Nucleic acid conformation: Crystal structure of a naturally occurring dinucleoside phosphate (UpA).
      ). In 1965, Olga Kennard, inspired by J.D. Bernal, had started to work on the Cambridge Structural Database, CSD (
      • Kennard O.
      • Allen F.H.
      • Brice M.D.
      • Hummelink T.W.A.
      • Motherwell W.D.S.
      • Rodgers J.R.
      • Watson D.G.
      Computer based systems for the retrieval of data: Crystallography.
      ) and, in 1971, the PDB was established jointly by the Crystallographic Data Centre, Cambridge and the Brookhaven National Laboratory (Protein Data Bank). PDB originally served as a repository system where crystallographers could mail in their data on punch cards, then computer tapes, for archiving and distribution. The announcement of the founding of the PDB in Nature New Biology was a simple little insert of 256 words (
      Crystallography: Protein Data Bank.
      )! Full publications appeared later (
      Crystallography: Protein Data Bank.
      ,
      • Bernstein F.C.
      • Koetzle T.F.
      • Williams G.J.
      • Meyer Jr., E.F.
      • Brice M.D.
      • Rodgers J.R.
      • Kennard O.
      • Shimanouchi T.
      • Tasumi M.
      The Protein Data Bank. A computer-based archival file for macromolecular structures.
      ). The CSD archived structures of individual nucleotides, as well as di- and trinucleotides. These structures are not in PDB. In fact, the first release of PDB did not include any nucleic acid structures, just seven protein structures (
      • Bernstein F.C.
      • Koetzle T.F.
      • Williams G.J.
      • Meyer Jr., E.F.
      • Brice M.D.
      • Rodgers J.R.
      • Kennard O.
      • Shimanouchi T.
      • Tasumi M.
      The Protein Data Bank. A computer-based archival file for macromolecular structures.
      ). Later, in 1992, the Nucleic Acid Database (NDB) was founded by Berman et al. (
      • Berman H.M.
      • Olson W.K.
      • Beveridge D.L.
      • Westbrook J.
      • Gelbin A.
      • Demeny T.
      • Hsieh S.H.
      • Srinivasan A.R.
      • Schneider B.
      The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids.
      ) and structures of RNA units were included in the NDB.
      In the early days of the development of structural databases, the main objectives were to prevent the loss of data by archiving solved structures and to make them freely available upon request. The deposited data were checked for atom names, numbering, and geometry by database curators, but not as systematically and thoroughly as is done today. Dictionaries and defined formats were developed over the years and applied to standardize nomenclature, file formats, and metadata. As regards nucleic acids, the relational NDB archived data accompanied by structural information and descriptions (
      • Berman H.M.
      • Olson W.K.
      • Beveridge D.L.
      • Westbrook J.
      • Gelbin A.
      • Demeny T.
      • Hsieh S.H.
      • Srinivasan A.R.
      • Schneider B.
      The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids.
      ). NDB served as a testbed for new data formats that could accommodate much larger structures and much more metadata, which were eventually adopted by PDB. Nowadays, these databases are rich in metadata and tools for assessing and describing structures. Very valuable tools for structural validations (
      • Davis I.W.
      • Leaver-Fay A.
      • Chen V.B.
      • Block J.N.
      • Kapral G.J.
      • Wang X.
      • Murray L.W.
      • Arendall 3rd, W.B.
      • Snoeyink J.
      • Richardson J.S.
      • Richardson D.C.
      MolProbity: All-atom contacts and structure validation for proteins and nucleic acids.
      ) and corrections (
      • Chou F.C.
      • Sripakdeevong P.
      • Dibrov S.M.
      • Hermann T.
      • Das R.
      Correcting pervasive errors in RNA crystallography through enumerative structure prediction.
      ) of nucleic acid structures were introduced. The PDB now offers various validation tools with compelling metrics for all types of structures (
      • Read R.J.
      • Adams P.D.
      • Arendall 3rd, W.B.
      • Brunger A.T.
      • Emsley P.
      • Joosten R.P.
      • Kleywegt G.J.
      • Krissinel E.B.
      • Lutteke T.
      • Otwinowski Z.
      • Perrakis A.
      • Richardson J.S.
      • Sheffler W.H.
      • Smith J.L.
      • Tickle I.J.
      • et al.
      A new generation of crystallographic validation tools for the protein data bank.
      ). While the PDB remains, understandably, protein-centric, other databases and tools are available for nucleic acids (
      • Berman H.M.
      • Olson W.K.
      • Beveridge D.L.
      • Westbrook J.
      • Gelbin A.
      • Demeny T.
      • Hsieh S.H.
      • Srinivasan A.R.
      • Schneider B.
      The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids.
      ,
      • Petrov A.I.
      • Zirbel C.L.
      • Leontis N.B.
      WebFR3D--a server for finding, aligning and analyzing recurrent RNA 3D motifs.
      ,
      • Coimbatore Narayanan B.
      • Westbrook J.
      • Ghosh S.
      • Petrov A.I.
      • Sweeney B.
      • Zirbel C.L.
      • Leontis N.B.
      • Berman H.M.
      The nucleic acid database: New features and capabilities.
      ); see also https://www.bgsu.edu/research/rna/.

      Quality and accuracy of structural data

      All of the precise structural data regarding RNA comes ultimately from atomic-resolution X-ray structures of nucleotides, oligonucleotides, and various biologically relevant structures, ranging in size from individual helical elements to the full ribosome. The early work was carried out by pioneers such as A. Rich, O. Kennard, and M. Sundaralingam, who from the mid-1960s and into the 1980s carried out precise crystallographic studies of nucleosides, nucleotides, and dinucleotides. These data comprise all our basic knowledge of bond lengths, angles, and stereochemistry, as well as interaction preferences, including all types of base pairs, and most stacking and base–backbone interactions. The new data led early on to useful concepts including the conformational wheel, describing ribose conformations (
      • Altona C.
      • Sundaralingam M.
      Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation.
      ), and better understanding of the conformational preferences of nucleic acids, the analysis of which was quite daunting to theoreticians and modelers alike (
      • Olson W.K.
      • Flory P.J.
      Spatial configurations of polynucleotide chains. 3. Polydeoxyribonucleotides.
      ,
      • Olson W.K.
      • Flory P.J.
      Spatial configurations of polynucleotide chains. I. Steric interactions in polyribonucleotides: A virtual bond model.
      ,
      • Olson W.K.
      • Flory P.J.
      Spatial configuration of polynucleotide chains. II. Conformational energies and the average dimensions of polyribonucleotides.
      ). Unlike proteins, which effectively have just two degrees of freedom per monomer unit, nucleotides present seven: six dihedrals along the backbone and one around the glycosidic bond. In 1965, based on the small number of protein structures known, as well as better resolved peptide structures, Ramakrishnan and Ramachandran (
      • Ramakrishnan C.
      • Ramachandran G.N.
      Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units.
      ) worked out the sterically favored backbone configurations and summarized them graphically in two dimensions, the dihedral angles phi and psi. Such Ramachandran plots continue to be used as easily visualized metrics for protein structures. Shortly afterward, Sundaralingam (
      • Sundaralingam M.
      Stereochemistry of nucleic acids and their constituents.† IV. Allowed and preferred conformations of nucleosides, nucleoside mono-, di-, tri-, tetraphosphates, nucleic acids and polynucleotides.
      ) carried out a series of landmark studies to accurately determine molecular structures of nucleic acid constituents, culminating in a highly cited 1969 paper in Biopolymers that laid the foundation for conformational analysis of nucleotides and polynucleotides, and gave the first stereochemical rules for the sugar-phosphate backbone. This work revealed conclusively that backbone dihedrals are restricted and that correlations exist between them and supported theoretical work that proposed various schemes to simplify nucleic acid conformational analysis using virtual bonds (
      • Olson W.K.
      • Flory P.J.
      Spatial configurations of polynucleotide chains. 3. Polydeoxyribonucleotides.
      ,
      • Yathindra N.
      • Sundaralingam M.
      Analysis of the possible helical structures of nucleic acids and polynucleotides. Application of (n-h) plots.
      ).
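The two geometric quantities discussed above, a backbone dihedral angle and the pseudorotation phase of the sugar ring, can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any of the cited works: the function names are hypothetical, the dihedral uses the usual atan2 formulation with the IUPAC sign convention, and the phase angle assumes the Altona and Sundaralingam relation tan P = [(ν4 + ν1) − (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)] for the five endocyclic torsions ν0 to ν4. The helper `ideal_ring_torsions` generates idealized torsions (an assumption used only to check the round trip), not measured values.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Constant sin 36 deg + sin 72 deg from the pseudorotation relation.
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation_phase(nu):
    """Phase angle P in [0, 360) degrees from torsions (nu0, ..., nu4) in degrees."""
    nu0, nu1, nu2, nu3, nu4 = nu
    numer = (nu4 + nu1) - (nu3 + nu0)
    denom = 2.0 * nu2 * _S
    # atan2 selects the correct half of the wheel when nu2 is negative.
    return math.degrees(math.atan2(numer, denom)) % 360.0


def ideal_ring_torsions(phase_deg, amplitude_deg):
    """Idealized torsions nu_j = amplitude * cos(P + 144 deg * (j - 2)), j = 0..4."""
    return [amplitude_deg * math.cos(math.radians(phase_deg + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    # Made-up coordinates, chosen only so that the geometry is easy to picture.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    # Idealized ring with P = 18 degrees (C3'-endo region) and 38 degree amplitude.
    print(pseudorotation_phase(ideal_ring_torsions(18.0, 38.0)))
```

Applied to the six backbone torsions and the glycosidic torsion of each residue, `dihedral` gives the seven angles per nucleotide mentioned above; the atom selections for each torsion follow the standard nomenclature and are not encoded here.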
      High-resolution data from small molecules are also used to build force fields and to infer rules for assembly of molecular moieties. These force fields and energetic rules are then used for producing and optimizing structures, sampling the conformational space, or simulating molecular dynamics. The quality and general value of the deduced force fields strongly depend on the number and variety of structures available. The importance of efficient and accurate force fields is paramount in modern biochemical research, since these force fields are used not only for computing trajectories of molecular dynamics simulations, but also in the determination of new structures using NMR and cryo-electron microscopy.
      It cannot be overemphasized that the quality of the deposited structures is of primary importance; it depends directly on the crystallographic resolution of the X-ray data and on the refinement process, since only a minor fraction of X-ray structures is obtained at true atomic resolution (better than 1 Å). For example, can one really be confident in RNA structures at resolutions worse than 3.3 Å with average B-factors around 200 Å², bad clash scores, and poor PDB validation metrics? Only a systematic analysis of the various regions of the RNA molecule in electron density maps would allow a knowledgeable and interested person to reach an informed opinion. One key parameter for compiling reference databases for knowledge extraction is the nonredundancy of the structures that are included. Nonredundant structure databases, by reducing bias in the parameters deduced from structures, are extremely valuable for extracting knowledge about RNA as well as for benchmarking modeling strategies (
      • Petrov A.I.
      • Zirbel C.L.
      • Leontis N.B.
      Automated classification of RNA 3D motifs and the RNA 3D motif atlas.
      ). In this respect, it is worth noting that fewer than 100 nonredundant RNA structures have been solved at 2 Å resolution or better. Well-defined metrics for 3D structures determined by X-ray crystallography and NMR have been established and are reported at the nucleotide level by PDB (
      • Gore S.
      • Sanz Garcia E.
      • Hendrickx P.M.S.
      • Gutmanas A.
      • Westbrook J.D.
      • Yang H.
      • Feng Z.
      • Baskaran K.
      • Berrisford J.M.
      • Hudson B.P.
      • Ikegawa Y.
      • Kobayashi N.
      • Lawson C.L.
      • Mading S.
      • Mak L.
      • et al.
      Validation of structures in the Protein Data Bank.
      ,
      • Wuthrich K.
      NMR of Proteins and Nucleic Acids.
      ,
      • Allain F.H.
      • Varani G.
      How accurately and precisely can RNA structure be determined by NMR?.
      ). Metrics have recently also been defined for structures determined by cryo-electron microscopy, which is largely replacing X-ray diffraction for structure determination of large macromolecular machines such as ribosomes, viruses, and spliceosomes (
      • Lawson C.L.
      • Berman H.M.
      • Chiu W.
      Evolving data standards for cryo-EM structures.
      ).

      The first 20 years of RNA structures

      The first nucleic acid structures were of short synthetic DNA helices and were deposited in 1981 (Z-DNA, 2DCG (
      • Wang A.J.
      • Quigley G.J.
      • Kolpak F.J.
      • van der Marel G.
      • van Boom J.H.
      • Rich A.
      Left-handed double helical DNA: Variations in the backbone conformation.
      ) and B-DNA, 1BNA (
      • Drew H.R.
      • Wing R.M.
      • Takano T.
      • Broka C.
      • Tanaka S.
      • Itakura K.
      • Dickerson R.E.
      Structure of a B-DNA dodecamer: Conformation and dynamics.
      )), with the molecular descriptions published respectively in 1979 (
      • Wang A.H.
      • Quigley G.J.
      • Kolpak F.J.
      • Crawford J.L.
      • van Boom J.H.
      • van der Marel G.
      • Rich A.
      Molecular structure of a left-handed double helical DNA fragment at atomic resolution.
      ) and 1980 (
      • Wing R.
      • Drew H.
      • Takano T.
      • Broka C.
      • Tanaka S.
      • Itakura K.
      • Dickerson R.E.
      Crystal structure analysis of a complete turn of B-DNA.
      ). These high-resolution DNA structures only became possible once solid-state synthesis could provide sufficient quantities of pure oligonucleotides for crystallization and structure determination (
      • Clerici L.
      • Campagnari F.
      • de Rooij J.F.
      • van Boom J.H.
      Preparation of polydeoxynucleotides linked to a solid support by coupling CNBr-activated cellulose with 5'-NH2-terminated oligo and poly(pdT)'s.
      ,
      • Itakura K.
      • Katagiri N.
      • Bahl C.P.
      • Wightman R.H.
      • Narang S.A.
      Improved triester approach for the synthesis of pentadecathymidylic acid.
      ). The early RNA structures, of biological origin, were those of various tRNAs isolated from Escherichia coli or yeast (
      • Clark B.F.C.
      • Doctor B.P.
      • Holmes K.C.
      • Klug A.
      • Marcker K.A.
      • Morris S.J.
      • Paradies H.H.
      Crystallization of transfer RNA.
      ,
      • Kim S.H.
      • Rich A.
      Single crystals of transfer RNA: An x-ray diffraction study.
      ,
      • Hampel A.
      • Labanauskas M.
      • Connors P.G.
      • Kirkegard L.
      • RajBhandary U.L.
      • Sigler P.B.
      • Bock R.M.
      Single crystals of transfer RNA from formylmethionine and phenylalanine transfer RNA's.
      ), an accomplishment that required novel purification and crystallization protocols. It took roughly 10 years to produce refined structures of yeast tRNAPhe (deposited in 1978, 4TNA (
      • Hingerty B.
      • Brown R.S.
      • Jack A.
      Further refinement of the structure of yeast tRNAPhe.
      ) and 6TNA (
      • Sussman J.L.
      • Holbrook S.R.
      • Warrant R.W.
      • Church G.M.
      • Kim S.H.
      Crystal structure of yeast phenylalanine transfer RNA. I. Crystallographic refinement.
      )). The choice of tRNA for study was obvious because of the amount present in cells and the availability of some purification protocols. Its primary and secondary structures had been determined in the 1960s, by chemical and biochemical means (
      • Holley R.W.
      • Apgar J.
      • Everett G.A.
      • Madison J.T.
      • Marquisee M.
      • Merrill S.H.
      • Penswick J.R.
      • Zamir A.
      Structure of a ribonucleic acid.
      ,
      • Madison J.T.
      • Everett G.A.
      • Kung H.
      Nucleotide sequence of a yeast tyrosine transfer RNA.
      ). In the 1980s, additional tRNA structures were solved as well as the first RNA viruses (1BMV (
      • Chen Z.G.
      • Stauffacher C.
      • Li Y.
      • Schmidt T.
      • Bomu W.
      • Kamer G.
      • Shanks M.
      • Lomonossoff G.
      • Johnson J.E.
      Protein-RNA interactions in an icosahedral virus at 3.0 A resolution.
      ) and 2TMV (
      • Namba K.
      • Pattanayek R.
      • Stubbs G.
      Visualization of protein-nucleic acid interactions in a virus. Refined structure of intact tobacco mosaic virus at 2.9 A resolution by X-ray fiber diffraction.
      )), containing short fragments of the viral genome bound to the coat proteins. In those years, at crystallography meetings and conferences, talks on nucleic acids, and RNA in particular, were generally relegated to the last day, either in the early morning after the meeting dinner (always a very lively and joyous event) or just before the departure of the bus.
      Following innovations and progress in chemical synthesis and purification of RNA oligonucleotides (
      • Hayes J.A.
      • Brunden M.J.
      • Gilham P.T.
      • Gough G.R.
      High-yield synthesis of oligoribonucleotides using o-nitrobenzyl protection of 2′-hydroxyls.
      ), the first X-ray structures of synthetic RNAs appeared in 1988 (
      • Dock-Bregeon A.C.
      • Chevrier B.
      • Podjarny A.
      • Moras D.
      • deBear J.S.
      • Gough G.R.
      • Gilham P.T.
      • Johnson J.E.
      High resolution structure of the RNA duplex [U(U-A)6A]2.
      ) and were deposited in 1991. That structure showed the first examples of intermolecular interactions involving the ribose hydroxyl groups between RNA helices (
      • Dock-Bregeon A.C.
      • Chevrier B.
      • Podjarny A.
      • Johnson J.
      • de Bear J.S.
      • Gough G.R.
      • Gilham P.T.
      • Moras D.
      Crystallographic structure of an RNA helix: [U(UA)6A]2.
      ). The next advances came from NMR structures of small recurrent RNA motifs, such as the frequent GNRA tetraloop (
      • Heus H.A.
      • Pardi A.
      Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops.
      ), identified by sequence comparisons in large RNAs (autocatalytic introns, ribosomal RNAs, RNase P). Most revealingly, these included loop E of 5S rRNA (
      • Gewirth D.T.
      • Abo S.R.
      • Leontis N.B.
      • Moore P.B.
      Secondary structure of 5S RNA: NMR experiments on RNA molecules partially labeled with nitrogen-15.
      ,
      • Leontis N.B.
      • Moore P.B.
      Imino proton exchange in the 5S RNA of Escherichia coli and its complex with protein L25 at 490 MHz.
      ). Even though they lacked atomic resolution, the NMR structures were sufficiently detailed to provide models that expanded our understanding of how RNA structures could form, in particular the variety of non-Watson–Crick base pairs, and they provided enough information to infer sequence signatures for identifying recurrences of the same motifs in other structures (
      • Heus H.A.
      • Pardi A.
      Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops.
      ,
      • Woese C.R.
      • Winker S.
      • Gutell R.R.
      Architecture of ribosomal RNA: Constraints on the sequence of “tetra-loops”.
      ). The willingness of its directors to expand PDB to include NMR structures proved to be a wise decision.
      By 1995, the number of X-ray RNA structures (alone or in complex with protein) in the PDB amounted to only about 1% of the present content of total RNA and RNA–protein complex (RNP) structures. Figures 1 and 2 show how the number of structures of RNA alone and of RNPs has evolved over time. Some key structures are indicated to illustrate the increase in complexity of the solved structures. Again, progress in chemical synthesis (
      • Caruthers M.H.
      The chemical synthesis of DNA/RNA: Our gift to science.
      ,
      • Scaringe S.A.
      Advanced 5'-silyl-2'-orthoester approach to RNA oligonucleotide synthesis.
      ), biochemical RNA production (
      • Milligan J.F.
      • Groebe D.R.
      • Witherell G.W.
      • Uhlenbeck O.C.
      Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates.
      ) and purification (
      • Anderson A.C.
      • Scaringe S.A.
      • Earp B.E.
      • Frederick C.A.
      HPLC purification of RNA for crystallography and NMR.
      ), as well as in X-ray technology, including the spreading use of synchrotron radiation at cryogenic temperatures (
      • Greenhough T.J.
      • Helliwell J.R.
      The uses of synchrotron X-radiation in the crystallography of molecular biology.
      ,
      • Hope H.
      • Frolow F.
      • von Bohlen K.
      • Makowski I.
      • Kratky C.
      • Halfon Y.
      • Danz H.
      • Webster P.
      • Bartels K.S.
      • Wittmann H.G.
      • Yonath A.
      Cryocrystallography of ribosomal particles.
      ) were central to the rapid increase in the resolution of RNA and RNP structures. Many advances in crystallogenesis (
      • Giege R.
      • Lorber B.
      • Theobald-Dietrich A.
      Crystallogenesis of biological macromolecules: Facts and perspectives.
      ,
      • Giege R.
      What macromolecular crystallogenesis tells us - what is needed in the future.
      ) and in biochemical preparation for crystallization of RNA (
      • Meyer M.
      • Masquida B.
      cis-Acting 5' hammerhead ribozyme optimization for in vitro transcription of highly structured RNAs.
      ,
      • Ferre-D'Amare A.R.
      • Doudna J.A.
      Use of cis- and trans-ribozymes to remove 5' and 3' heterogeneities from milligrams of in vitro transcribed RNA.
      ,
      • Ferre-D'Amare A.R.
      • Zhou K.
      • Doudna J.A.
      A general module for RNA crystallization.
      ) or RNA complexes (
      • Price S.R.
      • Ito N.
      • Oubridge C.
      • Avis J.M.
      • Nagai K.
      Crystallization of RNA-protein complexes. I. Methods for the large-scale preparation of RNA suitable for crystallographic studies.
      ,
      • Oubridge C.
      • Ito N.
      • Teo C.H.
      • Fearnley I.
      • Nagai K.
      Crystallisation of RNA-protein complexes. II. The application of protein engineering for crystallisation of the U1A protein-RNA complex.
      ,
      • Ferre-D'Amare A.R.
      Use of the spliceosomal protein U1A to facilitate crystallization and structure determination of complex RNAs.
      ) appeared in those years and the whole community benefited greatly from those novel approaches. Amazing breakthroughs in cryo-electron microscopy are now accelerating the pace at which highly complex particles can be observed (
      • Nakane T.
      • Kotecha A.
      • Sente A.
      • McMullan G.
      • Masiulis S.
      • Brown P.
      • Grigoras I.T.
      • Malinauskaite L.
      • Malinauskas T.
      • Miehling J.
      • Uchanski T.
      • Yu L.
      • Karia D.
      • Pechnikova E.V.
      • de Jong E.
      • et al.
      Single-particle cryo-EM at atomic resolution.
      ,
      • Cheng Y.
      Single-particle cryo-EM at crystallographic resolution.
      ,
      • Subramaniam S.
      • Earl L.A.
      • Falconieri V.
      • Milne J.L.
      • Egelman E.H.
      Resolution advances in cryo-EM enable application to drug discovery.
      ).
      Figure 1. The evolution of the number of RNA structures in the PDB. The figure was downloaded using the option “Analyze PDB statistics.” All RNA structures are included (from X-ray, NMR, and cryo-EM). Some key X-ray structures are indicated. Up to 1991, only tRNA structures were present. In chronological order, the following structures are highlighted: (1) a synthetic 14-mer duplex, 1RNA (
      • Dock-Bregeon A.C.
      • Chevrier B.
      • Podjarny A.
      • Johnson J.
      • de Bear J.S.
      • Gough G.R.
      • Gilham P.T.
      • Moras D.
      Crystallographic structure of an RNA helix: [U(UA)6A]2.
      ); (2) the core hammerhead ribozyme, 1MME (
      • Scott W.G.
      • Finch J.T.
      • Klug A.
      The crystal structure of an all-RNA hammerhead ribozyme: A proposed mechanism for RNA catalytic cleavage.
      ); (3) the P4-P6 domain of the Tetrahymena ribozyme, 1GID (
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      Crystal structure of a group I ribozyme domain: Principles of RNA packing.
      ); (4) the eukaryotic loop E structure, 354D (
      • Correll C.C.
      • Freeborn B.
      • Moore P.B.
      • Steitz T.A.
      Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain.
      ); (5) the core of the Tetrahymena ribozyme, 1GRZ (
      • Golden B.L.
      • Gooding A.R.
      • Podell E.R.
      • Cech T.R.
      A preorganized active site in the crystal structure of the tetrahymena ribozyme.
      ); (6) the hepatitis delta ribozyme, 1DRZ (
      • Ferre-D'Amare A.R.
      • Zhou K.
      • Doudna J.A.
      Crystal structure of a hepatitis delta virus ribozyme.
      ); (7) Aptamers binding to malachite green, 1F1T (
      • Baugh C.
      • Grate D.
      • Wilson C.
      2.8 A crystal structure of the malachite green aptamer.
      ), and vitamin B12, 1DDY (
      • Sussman D.
      • Nix J.C.
      • Wilson C.
      The structural basis for molecular recognition by the vitamin B 12 RNA aptamer.
      ); (8) RNA quadruplex, 1J8G (
      • Vasudevan S.S.
      • Sundaralingam M.
      The occurence of the syn-C3' endo conformation and the distorted backbone conformations for C4'-C5' and P-O5' in oligo and polynucleotides.
      ); an earlier NMR structure was solved before, 1RAU (
      • Cheong C.
      • Moore P.B.
      Solution structure of an unusually stable RNA tetraplex containing G- and U-quartet structures.
      ); (9) the purine riboswitch, 1Y27 (
      • Serganov A.
      • Yuan Y.R.
      • Pikovskaya O.
      • Polonskaia A.
      • Malinina L.
      • Phan A.T.
      • Hobartner C.
      • Micura R.
      • Breaker R.R.
      • Patel D.J.
      Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs.
      ), since then the structures of a great variety of riboswitches have appeared (
      • McCown P.J.
      • Corbino K.A.
      • Stav S.
      • Sherlock M.E.
      • Breaker R.R.
      Riboswitch diversity and distribution.
      ,
      • Breaker R.R.
      Imaginary ribozymes.
      ,
      • Garst A.D.
      • Edwards A.L.
      • Batey R.T.
      Riboswitches: Structures and mechanisms.
      ); (10) the group I intron from Azoarcus, 1U6B (
      • Adams P.L.
      • Stahley M.R.
      • Gill M.L.
      • Kosek A.B.
      • Wang J.
      • Strobel S.A.
      Crystal structure of a group I intron splicing intermediate.
      ); (11) the core of a RNase P ribozyme, 2A2E (
      • Torres-Larios A.
      • Swinger K.K.
      • Krasilnikov A.S.
      • Pan T.
      • Mondragon A.
      Crystal structure of the RNA component of bacterial ribonuclease P.
      ); (12) the full hammerhead ribozyme with long-range loop–loop contacts stabilizing the core, 3ZD5 (
      • Martick M.
      • Scott W.G.
      Tertiary contacts distant from the active site prime a ribozyme for catalysis.
      ); (13) a complete group II intron, 3EOH (
      • Toor N.
      • Keating K.S.
      • Taylor S.D.
      • Pyle A.M.
      Crystal structure of a self-spliced group II intron.
      ); (14) the structure of a RNA nanosquare, 3P59 (
      • Dibrov S.M.
      • McLean J.
      • Parsons J.
      • Hermann T.
      Self-assembling RNA square.
      ); (15) the complex between the T-box riboswitch and its tRNA target, 4MGN (
      • Grigg J.C.
      • Ke A.
      Structural determinants for geometry and information decoding of tRNA by T box leader RNA.
      ); (16) the TYMV tRNA-like, 4P5J (
      • Colussi T.M.
      • Costantino D.A.
      • Hammond J.A.
      • Ruehle G.M.
      • Nix J.C.
      • Kieft J.S.
      The structural basis of transfer RNA mimicry and conformational plasticity by a viral RNA.
      ); (17) the Spinach fluorescent aptamer, 4TS0 (
      • Warner K.D.
      • Chen M.C.
      • Song W.
      • Strack R.L.
      • Thorn A.
      • Jaffrey S.R.
      • Ferre-D'Amare A.R.
      Structural basis for activity of highly efficient RNA mimics of green fluorescent protein.
      ); (18) a group II intron with a lariat primed for transposition, 5J01 (
      • Costa M.
      • Walbott H.
      • Monachello D.
      • Westhof E.
      • Michel F.
      Crystal structures of a group II intron lariat primed for reverse splicing.
      ); (19) the full structure of the T-box between GlyQS and its tRNA, 6POM (
      • Li S.
      • Su Z.
      • Lehmann J.
      • Stamatopoulou V.
      • Giarimoglou N.
      • Henderson F.E.
      • Fan L.
      • Pintilie G.D.
      • Zhang K.
      • Chen M.
      • Ludtke S.J.
      • Wang Y.X.
      • Stathopoulos C.
      • Chiu W.
      • Zhang J.
      Structural basis of amino acid surveillance by higher-order tRNA-mRNA interactions.
      ).
      Figure 2. The evolution of the number of structures of RNA–protein complexes (RNPs) in the PDB. The figure was downloaded using the option “Analyze PDB statistics.” All RNP structures are included (from X-ray, NMR, and cryo-EM). The structures related to the ribosome and its cofactors are far too numerous to show on such a figure. We preferred to emphasize the complexes formed in the spliceosome (for detailed reviews, see (
      • Wilkinson M.E.
      • Charenton C.
      • Nagai K.
      RNA splicing by the spliceosome.
      ,
      • Yan C.
      • Wan R.
      • Shi Y.
      Molecular mechanisms of pre-mRNA splicing through structural biology of the spliceosome.
      )). Most of the large RNP structures after 2015 are based on cryo-EM data. In chronological order, the following structures are highlighted: (1) RNA viruses, 1BMV (
      • Chen Z.G.
      • Stauffacher C.
      • Li Y.
      • Schmidt T.
      • Bomu W.
      • Kamer G.
      • Shanks M.
      • Lomonossoff G.
      • Johnson J.E.
      Protein-RNA interactions in an icosahedral virus at 3.0 A resolution.
      ); (2) class I tRNA synthetase complex, 1GSG (
      • Rould M.A.
      • Perona J.J.
      • Soll D.
      • Steitz T.A.
      Structure of E. coli glutaminyl-tRNA synthetase complexed with tRNA(Gln) and ATP at 2.8 A resolution.
      ); (3) tRNASer, a class II tRNA with a long variable loop, complexed with its specific synthetase, 1SER (
      • Belrhali H.
      • Yaremchuk A.
      • Tukalo M.
      • Larsen K.
      • Berthet-Colominas C.
      • Leberman R.
      • Beijer B.
      • Sproat B.
      • Als-Nielsen J.
      • Grubel G.
      • Legrand J.-F.
      • Lehmann M.
      • Cusack S.
      Crystal structures at 2.5 angstrom resolution of seryl-tRNA synthetase complexed with two analogs of seryl adenylate.
      ); (4) class II tRNA synthetase complex, 1ASY (
      • Ruff M.
      • Krishnaswamy S.
      • Boeglin M.
      • Poterszman A.
      • Mitschler A.
      • Podjarny A.
      • Rees B.
      • Thierry J.C.
      • Moras D.
      Class II aminoacyl transfer RNA synthetases: Crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp).
      ); (5) MS2 RNA coat protein, 1AQ3 (
      • van den Worm S.H.
      • Stonehouse N.J.
      • Valegard K.
      • Murray J.B.
      • Walton C.
      • Fridborg K.
      • Stockley P.G.
      • Liljas L.
      Crystal structures of MS2 coat protein mutants in complex with wild-type RNA operator fragments.
      ); (6) spliceosomal U2 complex, 1A9N (
      • Price S.R.
      • Evans P.R.
      • Nagai K.
      Crystal structure of the spliceosomal U2B"-U2A' protein complex bound to a fragment of U2 small nuclear RNA.
      ); (7) the kink-turn was first observed in the complex with a U4 snRNA fragment, 1E7K (
      • Vidovic I.
      • Nottrott S.
      • Hartmuth K.
      • Luhrmann R.
      • Ficner R.
      Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment.
      ), before being recurrently observed in the ribosome structure (
      • Nissen P.
      • Ippolito J.A.
      • Ban N.
      • Moore P.B.
      • Steitz T.A.
      RNA tertiary interactions in the large ribosomal subunit: The A-minor motif.
      ); (8) the Signal Recognition Particle complex, 2V3C (
      • Hainzl T.
      • Huang S.
      • Sauer-Eriksson A.E.
      Interaction of signal-recognition particle 54 GTPase domain and signal-recognition particle RNA in the free signal-recognition particle.
      ); (9) complex between a tyrosyl tRNA synthetase and a group I intron, 2RKJ (
      • Paukstelis P.J.
      • Chen J.H.
      • Chase E.
      • Lambowitz A.M.
      • Golden B.L.
      Structure of a tyrosyl-tRNA synthetase splicing factor bound to a group I intron RNA.
      ); (10) U1 snRNP, 3CW1 (
      • Pomeranz Krummel D.A.
      • Oubridge C.
      • Leung A.K.
      • Li J.
      • Nagai K.
      Crystal structure of human spliceosomal U1 snRNP at 5.5 A resolution.
      ); (11) an RNase P holoenzyme, 3Q1Q (
      • Reiter N.J.
      • Osterman A.
      • Torres-Larios A.
      • Swinger K.K.
      • Pan T.
      • Mondragon A.
      Structure of a bacterial ribonuclease P holoenzyme in complex with tRNA.
      ); (12) the U4 snRNP core domain, 4WZJ (
      • Leung A.K.
      • Nagai K.
      • Li J.
      Structure of the spliceosomal U4 snRNP core domain and its implication for snRNP biogenesis.
      ); (13) Lsm/U6 snRNP complex, 4M7A (
      • Zhou L.
      • Hang J.
      • Zhou Y.
      • Wan R.
      • Lu G.
      • Yin P.
      • Yan C.
      • Shi Y.
      Crystal structures of the Lsm complex bound to the 3' end sequence of U6 small nuclear RNA.
      ); (14) the tri-snRNP structure, 3JCM (
      • Wan R.
      • Yan C.
      • Bai R.
      • Wang L.
      • Huang M.
      • Wong C.C.
      • Shi Y.
      The 3.8 A structure of the U4/U6.U5 tri-snRNP: Insights into spliceosome assembly and catalysis.
      ); (15) Intron-lariat complex, 3JB9 (
      • Yan C.
      • Hang J.
      • Wan R.
      • Huang M.
      • Wong C.C.
      • Shi Y.
      Structure of a yeast spliceosome at 3.6-angstrom resolution.
      ); (16) Bact complex, 5GM6 (
      • Yan C.
      • Wan R.
      • Bai R.
      • Huang G.
      • Shi Y.
      Structure of a yeast activated spliceosome at 3.5 A resolution.
      ), C-complex, 5GMK (
      • Wan R.
      • Yan C.
      • Bai R.
      • Huang G.
      • Shi Y.
      Structure of a yeast catalytic step I spliceosome at 3.4 A resolution.
      ), 5LJ3 (
      • Galej W.P.
      • Wilkinson M.E.
      • Fica S.M.
      • Oubridge C.
      • Newman A.J.
      • Nagai K.
      Cryo-EM structure of the spliceosome immediately after branching.
      ); (17) C∗-complex, 5WSG (
      • Yan C.
      • Wan R.
      • Bai R.
      • Huang G.
      • Shi Y.
      Structure of a yeast step II catalytically activated spliceosome.
      ), 5MPS (
      • Fica S.M.
      • Oubridge C.
      • Galej W.P.
      • Wilkinson M.E.
      • Bai X.C.
      • Newman A.J.
      • Nagai K.
      Structure of a spliceosome remodelled for exon ligation.
      ); P-complex, 5YLZ (
      • Bai R.
      • Yan C.
      • Wan R.
      • Lei J.
      • Shi Y.
      Structure of the post-catalytic spliceosome from Saccharomyces cerevisiae.
      ), 6EXN (
      • Wilkinson M.E.
      • Fica S.M.
      • Galej W.P.
      • Norman C.M.
      • Newman A.J.
      • Nagai K.
      Postcatalytic spliceosome structure reveals mechanism of 3'-splice site selection.
      ), 6BK8 (
      • Liu S.
      • Li X.
      • Zhang L.
      • Jiang J.
      • Hill R.C.
      • Cui Y.
      • Hansen K.C.
      • Zhou Z.H.
      • Zhao R.
      Structure of the yeast spliceosomal postcatalytic P complex.
      ).
      In 1996, a breakthrough in RNA crystallography was achieved in the laboratories of Tom Cech and Jennifer Doudna with the crystal structure of the P4–P6 domain of the Tetrahymena group I intron, a structure twice the size of tRNA, featuring striking compact folds and novel types of RNA tertiary contacts (
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      Crystal structure of a group I ribozyme domain: Principles of RNA packing.
      ,
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Szewczak A.A.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      RNA tertiary structure mediation by adenosine platforms.
      ). With the amazing structure of P4–P6, many interested scientists learned that large RNAs could also be crystallized starting from in vitro synthesis and production. This breakthrough spurred fruitful efforts that quickly expanded our knowledge of the repertoire of surprising and beautiful RNA architectures, culminating in the structures of the ribosomal subunits themselves (
      • Ban N.
      • Nissen P.
      • Hansen J.
      • Moore P.B.
      • Steitz T.A.
      The complete atomic structure of the large ribosomal subunit at 2.4 A resolution.
      ,
      • Cate J.H.
      • Yusupov M.M.
      • Yusupova G.Z.
      • Earnest T.N.
      • Noller H.F.
      X-ray crystal structures of 70S ribosome functional complexes.
      ,
      • Schluenzen F.
      • Tocilj A.
      • Zarivach R.
      • Harms J.
      • Gluehmann M.
      • Janell D.
      • Bashan A.
      • Bartels H.
      • Agmon I.
      • Franceschi F.
      • Yonath A.
      Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution.
      ,
      • Wimberly B.T.
      • Brodersen D.E.
      • Clemons Jr., W.M.
      • Morgan-Warren R.J.
      • Carter A.P.
      • Vonrhein C.
      • Hartsch T.
      • Ramakrishnan V.
      Structure of the 30S ribosomal subunit.
      ) and the Nobel Prize in Chemistry to Venki Ramakrishnan, Tom Steitz, and Ada Yonath in 2009 (https://www.nobelprize.org/prizes/chemistry/). And, after the Nobel Prize in Physiology or Medicine for RNA interference (RNAi) in 2006 to Andrew Fire and Craig Mello, the RNA community is celebrating the 2020 Nobel Prize in Chemistry awarded to Jennifer Doudna and Emmanuelle Charpentier for the structure-based design of the splendidly efficient RNA-programmed CRISPR-Cas9 system (
      • Jinek M.
      • Chylinski K.
      • Fonfara I.
      • Hauer M.
      • Doudna J.A.
      • Charpentier E.
      A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.
      ,
      • Jinek M.
      • Jiang F.
      • Taylor D.W.
      • Sternberg S.H.
      • Kaya E.
      • Ma E.
      • Anders C.
      • Hauer M.
      • Zhou K.
      • Lin S.
      • Kaplan M.
      • Iavarone A.T.
      • Charpentier E.
      • Nogales E.
      • Doudna J.A.
      Structures of Cas9 endonucleases reveal RNA-mediated conformational activation.
      ).

      The unfolding of RNA structural folding rules

      The accumulation of RNA structures, each one bringing either a new insight key to folding or additional confirmation and sequence variants, led to a deep understanding of the main physicochemical features underlying RNA architecture. Some of these features, gained from the structures of Figure 1, are gathered in Table 1. Up to the 1970s, structures were restricted to bases, nucleosides, or nucleotides, and their analysis addressed the stereochemistry of nucleic acids (
      • Sundaralingam M.
      Stereochemistry of nucleic acids and their constituents.† IV. Allowed and preferred conformations of nucleosides, nucleoside mono-, di-, tri-, tetraphosphates, nucleic acids and polynucleotides.
      ), the orientation of the base with respect to the ribose, stacking (
      • Bugg C.E.
      • Thomas J.M.
      • Sundaralingam M.
      • Rao S.T.
      Stereochemistry of nucleic acids and their constituents. X. Solid-state base-stacking patterns in nucleic acid constituents and polynucleotides.
      ), sugar pucker (
      • Altona C.
      • Sundaralingam M.
      Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation.
      ), protonation states, tautomeric forms, and the effects of modifications (
      • Saenger W.
      Principles of Nucleic Acid Structure.
      ). These fundamental studies are still very relevant as shown by the roles of (de)protonation in nucleolytic ribozymes (
      • Lilley D.M.
      Catalysis by the nucleolytic ribozymes.
      ,
      • Seith D.D.
      • Bingaman J.L.
      • Veenis A.J.
      • Button A.C.
      • Bevilacqua P.C.
      Elucidation of catalytic strategies of small nucleolytic ribozymes from comparative analysis of active sites.
      ,
      • Wilson T.J.
      • Liu Y.
      • Li N.S.
      • Dai Q.
      • Piccirilli J.A.
      • Lilley D.M.J.
      Comparison of the structures and mechanisms of the pistol and hammerhead ribozymes.
      ) or of tautomeric forms in ribosomal translation (
      • Weixlbaumer A.
      • Murphy F.V.T.
      • Dziergowska A.
      • Malkiewicz A.
      • Vendeix F.A.
      • Agris P.F.
      • Ramakrishnan V.
      Mechanism for expanding the decoding capacity of transfer RNAs by modification of uridines.
      ,
      • Kurata S.
      • Weixlbaumer A.
      • Ohtsuki T.
      • Shimazaki T.
      • Wada T.
      • Kirino Y.
      • Takai K.
      • Watanabe K.
      • Ramakrishnan V.
      • Suzuki T.
      Modified uridines with C5-methylene substituents at the first position of the tRNA anticodon stabilize U.G wobble pairing during decoding.
      ,
      • Demeshkina N.
      • Jenner L.
      • Westhof E.
      • Yusupov M.
      • Yusupova G.
      A new understanding of the decoding principle on the ribosome.
      ,
      • Rozov A.
      • Demeshkina N.
      • Khusainov I.
      • Westhof E.
      • Yusupov M.
      • Yusupova G.
      Novel base-pairing interactions at the tRNA wobble position crucial for accurate reading of the genetic code.
      ). Despite their relatively small size (<80 nts), tRNAs have a tightly folded structure that provided many surprising and eye-opening insights into the logic of RNA folding (see also Table 1): short Watson–Crick paired helices coaxially stacked (Fig. 3, A and B), short-range and long-range interactions (Fig. 3C), the central roles of non-Watson–Crick base pairs in mediating tertiary contacts (Fig. 4), base–phosphate or sugar–phosphate contacts (Fig. 4, A and B), structured hairpin loops (with the famous U-turn) and their propensities to intimately interact, and base–base intercalative stacking interactions (Fig. 4C) that contribute to long-range loop–loop interactions.
      Table 1. A short historical overview of folding rules derived from crystal structures
      Date | RNA | Key advances in revealing interactions within RNA structures
      50s, 60s, 70s | Bases, nucleosides/nucleotides | Base pairing (
      • Saenger W.
      Principles of Nucleic Acid Structure.
      ), stacking (
      • Bugg C.E.
      • Thomas J.M.
      • Sundaralingam M.
      • Rao S.T.
      Stereochemistry of nucleic acids and their constituents. X. Solid-state base-stacking patterns in nucleic acid constituents and polynucleotides.
      ), sugar puckers (
      • Altona C.
      • Sundaralingam M.
      Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation.
      ), protonation, tautomers, modifications, metal binding (
      • Swaminathan V.
      • Sundaralingam M.
      The crystal structures of metal complexes of nucleic acids and their constituents.
      )
      1976 | tRNAs | - coaxial stacking between helices (continuous strand)

      - loop–loop interactions

      - non-Watson–Crick base pairs

      - U-turn structuration of loops

      - base triple contacts between a single-strand and the deep major groove of a helix

      - intercalation of unpaired bases

      - polyamines, Mg ions, lead cleavage
      1991 | RNA helices | O2’H…O2’ H-bonds between adjacent helices (
      • Dock-Bregeon A.C.
      • Chevrier B.
      • Podjarny A.
      • Johnson J.
      • de Bear J.S.
      • Gough G.R.
      • Gilham P.T.
      • Moras D.
      Crystallographic structure of an RNA helix: [U(UA)6A]2.
      ), 4-way junctions (
      • Krol A.
      • Westhof E.
      • Bach M.
      • Luhrmann R.
      • Ebel J.P.
      • Carbon P.
      Solution structure of human U1 snRNA. Derivation of a possible three-dimensional model.
      )
      1995 | Hammerhead ribozyme | 3-way junction structured by non-Watson–Crick pairs (
      • Scott W.G.
      • Finch J.T.
      • Klug A.
      The crystal structure of an all-RNA hammerhead ribozyme: A proposed mechanism for RNA catalytic cleavage.
      ,
      • Pley H.W.
      • Flaherty K.M.
      • McKay D.B.
      Model for an RNA tertiary interaction from the structure of an intermolecular complex between a GAAA tetraloop and an RNA helix.
      ), GNRA/helix minor groove contacts (
      • Pley H.W.
      • Flaherty K.M.
      • McKay D.B.
      Model for an RNA tertiary interaction from the structure of an intermolecular complex between a GAAA tetraloop and an RNA helix.
      )
      1996 | P4-P6 | A-minor contacts, ribose zippers, platform triples, packing of helices, loop–loop interactions (
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      Crystal structure of a group I ribozyme domain: Principles of RNA packing.
      ,
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Szewczak A.A.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      RNA tertiary structure mediation by adenosine platforms.
      )
      1997 | Loop E | Continuous stack of non-Watson–Crick pairs, Mg ions (
      • Correll C.C.
      • Freeborn B.
      • Moore P.B.
      • Steitz T.A.
      Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain.
      )
      1999 | Hepatitis delta virus ribozyme | Highly constrained pseudoknot, continuous A-minor contacts (
      • Ferre-D'Amare A.R.
      • Zhou K.
      • Doudna J.A.
      Crystal structure of a hepatitis delta virus ribozyme.
      )
      2000 | K-turn | Recurrent RNA modular unit based on non-WC pairs (
      • Nissen P.
      • Ippolito J.A.
      • Ban N.
      • Moore P.B.
      • Steitz T.A.
      RNA tertiary interactions in the large ribosomal subunit: The A-minor motif.
      ,
      • Ogle J.M.
      • Brodersen D.E.
      • Clemons Jr., W.M.
      • Tarry M.J.
      • Carter A.P.
      • Ramakrishnan V.
      Recognition of cognate transfer RNA by the 30S ribosomal subunit.
      )
      2000 | Ribosome | A goldmine of molecular contacts. Prevalence of A-minor contacts (
      • Nissen P.
      • Ippolito J.A.
      • Ban N.
      • Moore P.B.
      • Steitz T.A.
      RNA tertiary interactions in the large ribosomal subunit: The A-minor motif.
      ); A-minor recognition of codon/anticodon triplet (
      • Ogle J.M.
      • Brodersen D.E.
      • Clemons Jr., W.M.
      • Tarry M.J.
      • Carter A.P.
      • Ramakrishnan V.
      Recognition of cognate transfer RNA by the 30S ribosomal subunit.
      ), base–phosphate contacts (
      • Zirbel C.L.
      • Sponer J.E.
      • Sponer J.
      • Stombaugh J.
      • Leontis N.B.
      Classification and energetics of the base-phosphate interactions in RNA.
      ), conservation of GoU pairs (
      • Wu J.
      • Zhou C.
      • Li J.
      • Li C.
      • Tao X.
      • Leontis N.B.
      • Zirbel C.L.
      • Bisaro D.M.
      • Ding B.
      Functional analysis reveals G/U pairs critical for replication and trafficking of an infectious non-coding viroid RNA.
      ), multiple junctions (
      • Noller H.F.
      RNA structure: Reading the ribosome.
      )
      A GNRA tetraloop is a stretch of four nucleotides capping a hairpin: it starts with a guanine (G), followed by any of the four nucleotides (N: A, G, C, or U) and a purine (R: A or G), and ends with an adenine (A). The GoU pairs were first suggested by Francis Crick (
      • Crick F.H.
      Codon--anticodon pairing: The wobble hypothesis.
      ) for explaining the degeneracy of the genetic code. In such a pair, a U pairs with a G instead of a C. GoU pairs have definite characteristics and play key roles in RNA structures and biology (for a recent overview, see (
      • Westhof E.
      • Yusupov M.
      • Yusupova G.
      The multiple flavors of GoU pairs in RNA.
      )).
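The GNRA definition above can be written as a simple sequence pattern. The following is a minimal sketch, assuming plain RNA strings over A, C, G, and U; the function names `is_gnra` and `find_gnra` are hypothetical and introduced only for illustration. A sequence match alone does not establish that the four nucleotides actually cap a hairpin, which requires secondary-structure information.

```python
import re

# Codes as used in the text: N = any nucleotide, R = purine (A or G).
GNRA = re.compile(r"G[ACGU][AG]A")


def is_gnra(loop: str) -> bool:
    """Return True if a four-nucleotide loop follows the G-N-R-A pattern."""
    return len(loop) == 4 and GNRA.fullmatch(loop.upper()) is not None


def find_gnra(seq: str) -> list:
    """List (0-based position, tetranucleotide) for every GNRA-like stretch.

    Overlapping candidates are reported; whether a candidate really is a
    hairpin loop cannot be decided from the sequence alone.
    """
    seq = seq.upper()
    return [
        (i, seq[i:i + 4])
        for i in range(len(seq) - 3)
        if is_gnra(seq[i:i + 4])
    ]
```

For example, `is_gnra("GAAA")` holds for the tetraloop of the P4–P6 domain mentioned in the text, whereas `is_gnra("UUCG")` does not.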
      Figure 3. Three representations of the interactions present between nucleotides in transfer RNA with increasing levels of structural complexity. A, standard cloverleaf structure of yeast tRNAAsp (
      • Gangloff J.
      • Keith G.
      • Ebel J.P.
      • Dirheimer G.
      Structure of aspartate-tRNA from brewer's yeast.
      ). B, a two-dimensional view of the tertiary structure of yeast tRNAAsp; it follows the representation proposed by Kim (
      • Kim S.H.
      Three-dimensional structure of transfer RNA and its functional implications.
      ) that stresses the two main arms made of helical stems, the acceptor-stem with the Thymine (T)-stem and the Dihydrouridine (D)-stem with the anticodon stem. A stem capped by a loop is called a hairpin. The numbering follows that of yeast tRNAPhe and, because the numbers of nucleotides are not the same in the D- and variable-loops, residues 17 and 47 are skipped and the residue following D20 is C20a. The representation clearly shows the contacts linking the T- and D-loops and the tertiary base pairs and triples between the single-stranded segments and the D-hairpin. The contacts represented correspond to those observed in the yeast tRNAAsp structure (
      • Moras D.
      • Comarmond M.B.
      • Fischer J.
      • Weiss R.
      • Thierry J.C.
      • Ebel J.P.
      • Giege R.
      Crystal structure of yeast tRNAAsp.
      ,
      • Westhof E.
      • Dumas P.
      • Moras D.
      Crystallographic refinement of yeast aspartic acid transfer RNA.
      ). For characterizing the tertiary pairs, the following nomenclature is used (
      • Leontis N.B.
      • Westhof E.
      Geometric nomenclature and classification of RNA base pairs.
      ). Nucleic acid bases can interact through three possible edges: the Watson–Crick edge, the Hoogsteen edge (the edge with N7 in purines or C5 in pyrimidines), and the sugar edge (O2 in pyrimidines or N3 and N2 in purines, often together with the hydroxyl O2’ of the ribose). The nucleotides can interact with the sugars on the same side of the H-bonds (as in normal Watson–Crick pairs), and the pair is called cis; or on opposite sides, and the pair is called trans. The three symbols, circle, square, and triangle, represent respectively the Watson–Crick, the Hoogsteen, and the sugar edges. When the pair is cis, the symbols are dark and, when in trans, they are white. This nomenclature applies to the large number of specific base–base interactions. Pairs formed through a single H-bond (see Figure 4F) or bifurcated H-bonds (see Figure 4B) are not easily annotated. C, the tertiary structure of yeast tRNAAsp with the four domains colored (green: acceptor stem; yellow: T-hairpin; blue: D-hairpin; red: anticodon hairpin; the two nucleotides U8 and R9 (generally a purine) linking the 5’-end acceptor strand to the D-strand are magenta; the variable loop linking the 3’-end of the anticodon hairpin to the 5’-end T-strand is orange). The double arrows (green and red) indicate the sets of helices that stack upon each other in a coaxial manner in the three-dimensional fold. Capital letters indicate the position of the contacts shown in Figure 4.
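The edge-and-orientation nomenclature described in this caption can be enumerated mechanically. The sketch below is illustrative only: it assumes that the two interacting edges are treated as an unordered pair, which yields the twelve geometric families of the classification, and the names `EDGES`, `SYMBOLS`, `families`, and `describe` are hypothetical helpers, not part of any published software.

```python
from itertools import combinations_with_replacement

# The three interacting edges and the symbols used for them in the figures.
EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
SYMBOLS = {"Watson-Crick": "circle", "Hoogsteen": "square", "Sugar": "triangle"}


def families():
    """Enumerate (orientation, edge, edge) for every geometric family.

    Assumption: the edge pair is unordered, so Watson-Crick/Hoogsteen and
    Hoogsteen/Watson-Crick count as one family.
    """
    return [
        (orientation, edge1, edge2)
        for orientation in ("cis", "trans")
        for edge1, edge2 in combinations_with_replacement(EDGES, 2)
    ]


def describe(orientation, edge1, edge2):
    """Return the family name with its symbols (dark for cis, white for trans)."""
    fill = "dark" if orientation == "cis" else "white"
    return (
        f"{orientation} {edge1}/{edge2}: "
        f"{fill} {SYMBOLS[edge1]} and {fill} {SYMBOLS[edge2]}"
    )
```

Six unordered edge combinations in two orientations give twelve families; the conserved U8/A14 pair of tRNA, for instance, falls in the family returned by `describe("trans", "Watson-Crick", "Hoogsteen")`.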
      Figure 4. Illustrations of contacts discussed in the text. A, the U-turn after U33 in the anticodon loop: the torsion angle about P-O5’ of the 5’-phosphate of residue 34 is trans (180°) instead of the usual gauche-minus (−60°); the 5’-phosphate of residue 35 stacks below U33; there is an H-bond between N3-H of U33 and an anionic phosphate oxygen of the 5’-phosphate of residue 36. Thus, all three residues of the anticodon triplet have some interaction with the highly conserved U33 (in mammalian initiator tRNAMet, C33 occurs). B, the equivalent U-turn in the T-loop where the U is a pseudouridine (noted Ψ or Psi). The bifurcated pair Ψ55oG18 in which the O4(Ψ) interacts with both N1-H and N2-H of the guanine is also shown. The residue A58 stacks above G18. C, interdigitated nucleotides between the D- and T-loops. D, the highly conserved trans Watson–Crick/Hoogsteen pair between U8 and A14 forms three H-bonds with A21, also highly conserved. E, the famous trans Watson–Crick/Watson–Crick pair between R15 and Y48. Levitt (
      • Levitt M.
      Detailed molecular model for transfer ribonucleic acid.
      ) noted that residue 15 is always a purine and residue 48 always a pyrimidine and modeled that pair as a regular cis Watson–Crick/Watson–Crick pair. Notice that, in standard nucleotide conformations, trans base pairs lead to parallel strands and not antiparallel strands as in usual helices (
      • Westhof E.
      Westhof's rule.
      ). F, nucleotides 32 and 38 immediately adjacent to the last base pair of the anticodon stem generally present a single H-bond (for details (
      • Auffinger P.
      • Westhof E.
      An extended structural signature for the tRNA anticodon loop.
      )).
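The torsion-angle description of the U-turn in Figure 4A (trans at 180° instead of the usual gauche-minus at −60°) can be illustrated with a small classifier. This is a sketch under stated assumptions: the three 120°-wide bins are a common simplification rather than a definition taken from the article, and the function names are hypothetical.

```python
def normalize(angle_deg: float) -> float:
    """Map any angle in degrees onto the interval [0, 360)."""
    return angle_deg % 360.0


def torsion_class(angle_deg: float) -> str:
    """Classify a torsion angle into one of three staggered ranges.

    Assumed bins: gauche-plus around +60 degrees, trans around 180 degrees,
    gauche-minus around -60 degrees (300 degrees), each 120 degrees wide.
    """
    a = normalize(angle_deg)
    if a < 120.0:
        return "gauche-plus"
    if a < 240.0:
        return "trans"
    return "gauche-minus"
```

With these bins, `torsion_class(-60)` gives the usual gauche-minus range and `torsion_class(180)` gives trans, the value reported for the P-O5’ torsion of residue 34 in the U-turn.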
      The quest for the structure of the hammerhead ribozyme took more than 10 years before a coherent picture emerged and, overall, about 20 years since its discovery (
      • Hutchins C.J.
      • Rathjen P.D.
      • Forster A.C.
      • Symons R.H.
      Self-cleavage of plus and minus RNA transcripts of avocado sunblotch viroid.
      ,
      • Prody G.A.
      • Bakos J.T.
      • Buzayan J.M.
      • Schneider I.R.
      • Bruening G.
      Autolytic processing of dimeric plant virus satellite RNA.
      ) (for short overviews see (
      • Westhof E.
      A tale in molecular recognition: The hammerhead ribozyme.
      ,
      • Uhlenbeck O.C.
      Less isn't always more.
      )). In any case, it displays a three-way junction maintained in the proper relative orientations of the helices by non-Watson–Crick pairs (Fig. 5). As mentioned above, the structure of P4–P6, a fragment of the group I intron from Tetrahymena thermophila (
      • Cate J.H.
      • Gooding A.R.
      • Podell E.
      • Zhou K.
      • Golden B.L.
      • Kundrot C.E.
      • Cech T.R.
      • Doudna J.A.
      Crystal structure of a group I ribozyme domain: Principles of RNA packing.
      ), unveiled several key recurrent RNA folding rules. As alluded to above, that structure, beyond offering breathtaking views of RNA folds, gave a strong impetus to the RNA crystallography community. An illustration is shown in Figure 6 and short descriptions are given in Table 1.
      Figure 5. A native state of the hammerhead ribozyme (left drawing) that promotes cleavage at low magnesium concentrations (
      • Khvorova A.
      • Lescoute A.
      • Westhof E.
      • Jayasena S.D.
      Sequence elements outside the hammerhead ribozyme catalytic core enable intracellular activity.
      ) is reached through intricate contacts (variable depending on the type of ribozyme) between an apical loop and an internal loop (drawn in red). Without these tertiary contacts, a different core of the three-way junction (drawn in magenta in both structures) is observed (compare the regions in magenta). The nomenclature described in Figure 3B is used for the non-Watson–Crick pairs (
      • Leontis N.B.
      • Westhof E.
      Geometric nomenclature and classification of RNA base pairs.
      ).
      Figure 6. From left to right, a two-dimensional view of the P4–P6 domain with the nomenclature described in Figure 3B (
      • Leontis N.B.
      • Westhof E.
      Geometric nomenclature and classification of RNA base pairs.
      ) and, next to it, the whole molecule shown in space-filling mode, highlighting the coaxial stacking of helices and their parallel packing. In the left two-dimensional view, the region boxed in green shows how the RNA is able to bend by 180°, and the regions boxed in red are shown in space-filling and atomic views on the right of the figure; these views show the precise and tight contact between the GAAA tetraloop and its receptor, called the 11-nt motif; notice that there are twice as many H-bonds between the hydroxyl groups (red dotted lines) as between the bases (black dotted lines). That type of RNA–RNA contact was discovered through sequence analysis and SELEX experiments (
      • Costa M.
      • Michel F.
      Frequent use of the same tertiary motif by self-folding RNAs.
      ).

      Form and function

      The quote from Wainwright (
      • Wainwright S.A.
      Form and function in organisms.
      ) captures aptly the conundrum: “Structure without function is a corpse; function without structure is a ghost” (
      • Wainwright S.A.
      Form and function in organisms.
      ). And it would take a lot more pages to do justice to what RNA structures have taught us about biology. Many RNA structures have been milestones and fueled our advances in understanding biochemical function and biological evolution. Forty years after the establishment by Tom Cech (
      • Cech T.R.
      • Zaug A.J.
      • Grabowski P.J.
      In vitro splicing of the ribosomal RNA precursor of tetrahymena: Involvement of a guanosine nucleotide in the excision of the intervening sequence.
      ) and Sidney Altman (
      • Guerrier-Takada C.
      • Gardiner K.
      • Marsh T.
      • Pace N.
      • Altman S.
      The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme.
      ) that all cells exploit RNA catalysis, we now have high-resolution structures of different catalytic RNAs in various stages (see Fig. 1) (
      • Lilley D.M.J.
      Classification of the nucleolytic ribozymes based upon catalytic mechanism.
      ,
      • Wilson T.J.
      • Liu Y.
      • Domnick C.
      • Kath-Schorr S.
      • Lilley D.M.
      The novel chemical mechanism of the twister ribozyme.
      ,
      • Strobel S.A.
      • Cochrane J.C.
      RNA catalysis: Ribozymes, ribosomes, and riboswitches.
      ,
      • Marcia M.
      • Pyle A.M.
      Visualizing group II intron catalysis through the stages of splicing.
      ,
      • Doudna J.A.
      • Lorsch J.R.
      Ribozyme catalysis: Not different, just worse.
      ). We have also learned from ribosome structures that peptide bond catalysis during translation is performed in an RNA-only environment (
      • Cech T.R.
      Structural biology. The ribosome is a ribozyme.
      ,
      • Hiller D.A.
      • Singh V.
      • Zhong M.
      • Strobel S.A.
      A two-step chemical mechanism for ribosome-catalysed peptide bond formation.
      ,
      • Schmeing T.M.
      • Huang K.S.
      • Strobel S.A.
      • Steitz T.A.
      An induced-fit mechanism to promote peptide bond formation and exclude hydrolysis of peptidyl-tRNA.
      ). And the recent structures of the active complexes of the spliceosome (see Fig. 2) (
      • Wan R.
      • Bai R.
      • Yan C.
      • Lei J.
      • Shi Y.
      Structures of the catalytically activated yeast spliceosome reveal the mechanism of branching.
      ,
      • Wan R.
      • Bai R.
      • Shi Y.
      Molecular choreography of pre-mRNA splicing by the spliceosome.
      ,
      • Wilkinson M.E.
      • Charenton C.
      • Nagai K.
      RNA splicing by the spliceosome.
      ,
      • Kastner B.
      • Will C.L.
      • Stark H.
      • Luhrmann R.
      Structural insights into nuclear pre-mRNA splicing in higher eukaryotes.
      ), together with the structural similarities with the autocatalytic group II introns (
      • Toor N.
      • Keating K.S.
      • Taylor S.D.
      • Pyle A.M.
      Crystal structure of a self-spliced group II intron.
      ,
      • Costa M.
      • Walbott H.
      • Monachello D.
      • Westhof E.
      • Michel F.
      Crystal structures of a group II intron lariat primed for reverse splicing.
      ), also indicate clearly that the chemistry of splicing is performed by RNA elements. In large RNP complexes such as ribosomes and spliceosomes, the RNA elements are positioned precisely for function by the concomitant actions of RNA folding and protein complex formation. In the ribosome field, technological developments in cryo-electron microscopy have led to stunning increases in resolution; for example, we can now delve inside ribosomal structures at 2 Å resolution (
      • Watson Z.L.
      • Ward F.R.
      • Meheust R.
      • Ad O.
      • Schepartz A.
      • Banfield J.F.
      • Cate J.H.
      Structure of the bacterial ribosome at 2 A resolution.
      ,
      • Bai X.C.
      • Fernandez I.S.
      • McMullan G.
      • Scheres S.H.
      Ribosome structures to near-atomic resolution from thirty thousand cryo-EM particles.
      ). Further, active ribosomes are assembled through long and convoluted maturation processes. Structures are now appearing showing various steps in the maturation of ribosomes with changes in RNA base pairing and structures and with binding of maturation cofactors absent in the final assembled ribosomes (
      • Jomaa A.
      • Fu Y.H.
      • Boehringer D.
      • Leibundgut M.
      • Shan S.O.
      • Ban N.
      Structure of the quaternary complex between SRP, SR, and translocon bound to the translating ribosome.
      ,
      • Lau B.
      • Cheng J.
      • Flemming D.
      • La Venuta G.
      • Berninghausen O.
      • Beckmann R.
      • Hurt E.
      Structure of the maturing 90S pre-ribosome in association with the RNA exosome.
      ,
      • Ameismeier M.
      • Zemp I.
      • van den Heuvel J.
      • Thoms M.
      • Berninghausen O.
      • Kutay U.
      • Beckmann R.
      Structural basis for the final steps of human 40S ribosome maturation.
      ,
      • Cheng J.
      • Lau B.
      • La Venuta G.
      • Ameismeier M.
      • Berninghausen O.
      • Hurt E.
      • Beckmann R.
      90S Pre-ribosome transformation into the primordial 40S subunit.
      ). While the ribosome performs the programmed steps in translation (initiation, translocation, termination) essentially as a dynamical single complex entity, this is not the case for the spliceosome. Driven by several splicing or regulatory factors and RNA-dependent ATPase/helicases, the RNA and protein composition as well as the structures of the spliceosomal complex change along the catalytic steps, from the recognition of the 5’ and 3’ splice junctions and the branching point leading to the B∗ complex ready for the first step of splicing (cleavage at 5’ splice site) to the C∗ complex ready for the second step of splicing, the ligation of the exons, and the disassembly of the spliceosome for initiating a new cycle (
      • Wan R.
      • Bai R.
      • Shi Y.
      Molecular choreography of pre-mRNA splicing by the spliceosome.
      ,
      • Wilkinson M.E.
      • Charenton C.
      • Nagai K.
      RNA splicing by the spliceosome.
      ). As shown in Figure 2, we now can visualize many of the states along the spliceosomal cycle and derive illuminating movies after decades of sustained efforts by several groups around the world.
      One should always keep in mind the famous quote by T. Dobzhansky: “Nothing in Biology Makes Sense except in the Light of Evolution” (
      • Dobzhansky T.
      Nothing in biology makes sense except in the light of evolution.
      ). The accumulated RNA structures offer us amazing insights into the principles of molecular evolution underlying biological evolution. Powerful techniques in molecular evolution have demonstrated that starting from a random sequence one can isolate sequences specific for binding a given ligand or with a defined function (
      • Ellington A.D.
      • Szostak J.W.
      In vitro selection of RNA molecules that bind specific ligands.
      ,
      • Tuerk C.
      • Gold L.
      Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase.
      ,
      • Beaudry A.A.
      • Joyce G.F.
      Directed evolution of an RNA enzyme.
      ). The structures of many aptamers are in the PDB now and new ones are still being selected and crystallized, such as the RNA fluorophores (
      • Warner K.D.
      • Chen M.C.
      • Song W.
      • Strack R.L.
      • Thorn A.
      • Jaffrey S.R.
      • Ferre-D'Amare A.R.
      Structural basis for activity of highly efficient RNA mimics of green fluorescent protein.
      ,
      • Trachman 3rd, R.J.
      • Abdolahzadeh A.
      • Andreoni A.
      • Cojocaru R.
      • Knutson J.R.
      • Ryckelynck M.
      • Unrau P.J.
      • Ferre-D'Amare A.R.
      Crystal structures of the mango-II RNA aptamer reveal heterogeneous fluorophore binding and guide engineering of variants with improved selectivity and brightness.
      ,
      • Trachman 3rd, R.J.
      • Autour A.
      • Jeng S.C.Y.
      • Abdolahzadeh A.
      • Andreoni A.
      • Cojocaru R.
      • Garipov R.
      • Dolgosheina E.V.
      • Knutson J.R.
      • Ryckelynck M.
      • Unrau P.J.
      • Ferre-D'Amare A.R.
      Structure and functional reselection of the mango-III fluorogenic RNA aptamer.
      ). The SELEX experiments led to the discovery of riboswitches in bacteria (
      • Mandal M.
      • Lee M.
      • Barrick J.E.
      • Weinberg Z.
      • Emilsson G.M.
      • Ruzzo W.L.
      • Breaker R.R.
      A glycine-dependent riboswitch that uses cooperative binding to control gene expression.
      ). Riboswitches are RNA sequences that, during transcription in the presence of a given ligand, fold into a native structure different from the one obtained after transcription in the absence of the same ligand (
      • McCown P.J.
      • Corbino K.A.
      • Stav S.
      • Sherlock M.E.
      • Breaker R.R.
      Riboswitch diversity and distribution.
      ,
      • Salvail H.
      • Balaji A.
      • Yu D.
      • Roth A.
      • Breaker R.R.
      Biochemical validation of a fourth guanidine riboswitch class in bacteria.
      ,
      • Serganov A.
      • Yuan Y.R.
      • Pikovskaya O.
      • Polonskaia A.
      • Malinina L.
      • Phan A.T.
      • Hobartner C.
      • Micura R.
      • Breaker R.R.
      • Patel D.J.
      Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs.
      ). The alternative folds exert different effects on the production of metabolic enzymes through transcription termination (or antitermination) or inhibition (or not) of ribosome binding for initiation of translation. There are now many structures of riboswitches with and without ligand bound (
      • McCown P.J.
      • Corbino K.A.
      • Stav S.
      • Sherlock M.E.
      • Breaker R.R.
      Riboswitch diversity and distribution.
      ,
      • Breaker R.R.
      Imaginary ribozymes.
      ). Amazingly, the ligand can be as small as a fluoride ion (
      • Ren A.
      • Rajashankar K.R.
      • Patel D.J.
      Fluoride ion encapsulation by Mg2+ ions and phosphates in a fluoride riboswitch.
      ) or as large as vitamin B12 (
      • Johnson Jr., J.E.
      • Reyes F.E.
      • Polaski J.T.
      • Batey R.T.
      B12 cofactors directly stabilize an mRNA regulatory switch.
      ,
      • Peselis A.
      • Serganov A.
      Structural insights into ligand binding and gene expression control by an adenosylcobalamin riboswitch.
      ). Furthermore, there are several families of sequences for the same or similar ligand; up to four classes of guanidine and SAM riboswitches have been identified (
      • Breaker R.R.
      Imaginary ribozymes.
      ,
      • Sun A.
      • Gasser C.
      • Li F.
      • Chen H.
      • Mair S.
      • Krasheninina O.
      • Micura R.
      • Ren A.
      SAM-VI riboswitch structure and signature for ligand discrimination.
      ).
      Beyond the comforting insights on the RNA world at the origins of life about 3.8 billion years ago, all these RNA structures, coupled with structural alignments of homologous sequences, offer us multiple “Rosetta stones” for deciphering the principles of RNA molecular evolution (
      • Zuckerkandl E.
      • Pauling L.
      Molecules as documents of evolutionary history.
      ). Prior to the visualization of three-dimensional RNA architectures, RNA sequence alignments played major roles (and they continue to do so). Through the analysis of nucleotide positions covarying according to the Watson–Crick rules, one can deduce the secondary structure (i.e., the ensemble of base-paired RNA helices) of a set of homologous RNA molecules. The third kingdom of life, the Archaea, was discovered by Carl Woese by sequencing ribosomal RNAs and aligning them (
      • Woese C.R.
      • Fox G.E.
      Phylogenetic structure of the prokaryotic domain: The primary kingdoms.
      ,
      • Woese C.R.
      • Kandler O.
      • Wheelis M.L.
      Towards a natural system of organisms: Proposal for the domains archaea, bacteria, and eucarya.
      ). The bacteria are still identified according to the 16S rRNA sequences (
      • Fox G.E.
      • Stackebrandt E.
      • Hespell R.B.
      • Gibson J.
      • Maniloff J.
      • Dyer T.A.
      • Wolfe R.S.
      • Balch W.E.
      • Tanner R.S.
      • Magrum L.J.
      • Zablen L.B.
      • Blakemore R.
      • Gupta R.
      • Bonen L.
      • Lewis B.J.
      • et al.
      The phylogeny of prokaryotes.
      ). The secondary structures of many functional structured RNAs were obtained by sequence alignments (
      • Noller H.F.
      • Woese C.R.
      Secondary structure of 16S ribosomal RNA.
      ,
      • Michel F.
      • Jacquier A.
      • Dujon B.
      Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure.
      ,
      • James B.D.
      • Olsen G.J.
      • Liu J.S.
      • Pace N.R.
      The secondary structure of ribonuclease P RNA, the catalytic element of a ribonucleoprotein enzyme.
      ,
      • Michel F.
      • Costa M.
      • Massire C.
      • Westhof E.
      Modeling RNA tertiary structure from patterns of sequence variation.
      ,
      • Michel F.
      • Umesono K.
      • Ozeki H.
      Comparative and functional anatomy of group II catalytic introns--a review.
      ,
      • Michel F.
      • Ferat J.L.
      Structure and activities of group II introns.
      ). The iconic cloverleaf structure of all cellular tRNAs was deduced with the first two sequences determined (
      • Holley R.W.
      • Apgar J.
      • Everett G.A.
      • Madison J.T.
      • Marquisee M.
      • Merrill S.H.
      • Penswick J.R.
      • Zamir A.
      Structure of a ribonucleic acid.
      ,
      • Madison J.T.
      • Everett G.A.
      • Kung H.
      Nucleotide sequence of a yeast tyrosine transfer RNA.
      ). The availability of both RNA sequences and RNA structures allows structural alignments patterned on RNA architectures. These alignments allow us to learn how RNAs evolve, in other words, how changes or mutations in homologous sequences maintain the folded RNA architecture. One can learn which variations are neutral and which molecular interactions are conserved across species. When several homologous three-dimensional structures are available, one can also learn how molecular accommodations occur and how ions or water molecules compensate for the loss of some interaction. By analysis of sequence alignments coupled with structures, one can also visualize how molecular units can swap or interchange and determine which interactions are opportunistic and not critical for form and function. Integrated databases with sophisticated techniques for guaranteeing the interoperability of data will need to be developed to exploit fully the one-dimensional and the three-dimensional data. In complexes between RNA and proteins, the diversity and multiplicity of intermolecular contacts are enormously potentiated. The understanding of RNP formation and structure is a major challenge for years to come.
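      The covariation analysis mentioned above can be illustrated with a minimal sketch. It scans a gap-free alignment for pairs of columns that remain complementary in every sequence while the nucleotides themselves change, which is the signature of a conserved base pair. The function name, the threshold parameter, the inclusion of G–U wobble pairs, and the toy alignment are all assumptions made for illustration; real comparative analyses handle gaps, phylogenetic weighting, and statistical significance.

```python
from itertools import combinations

# Watson-Crick pairs plus the G-U wobble (the wobble is an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def covarying_columns(alignment, min_fraction=1.0):
    """Return column pairs (i, j) that stay complementary while varying in sequence."""
    length = len(alignment[0])
    hits = []
    for i, j in combinations(range(length), 2):
        observed = [(seq[i], seq[j]) for seq in alignment]
        paired = sum(pair in PAIRS for pair in observed)
        # More than one distinct pair type indicates compensatory changes,
        # as opposed to a pair of strictly conserved positions.
        distinct = len(set(observed))
        if paired / len(observed) >= min_fraction and distinct > 1:
            hits.append((i, j))
    return hits


# Toy alignment invented for illustration (hypothetical sequences).
alignment = [
    "GGAAACC",
    "GCAAAGC",
    "GAAAAUC",
    "GUAAAAC",
]
print(covarying_columns(alignment))
```

      In this toy case, columns 0 and 6 are complementary but invariant, so they carry no covariation signal, whereas columns 1 and 5 change together and are reported as a candidate base pair.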

      A short history of RNA modeling and RNA assembly computational tools

      In the 1980s, the PDB also accepted structures derived by computational modeling. One of us deposited several structures of RNA modeled on the basis of sequence alignments and/or chemical and enzymatic probing in solution. While the PDB may have accepted these submissions somewhat reluctantly, many of these modeled structures were of intense interest to the growing crowd of RNA aficionados, and their accessibility stimulated experimentalists to put them to the test. When one of us presented the modeling of the core of group I introns (
      • Michel F.
      • Westhof E.
      Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis.
      ) at the 49th Pittsburgh Diffraction Conference, Columbus, in November 1991, there was almost a riot in the room, with half of the participants arguing for and the other half against the validity of the procedure. These modeled structures can no longer be found in the PDB but can be retrieved from the following website: https://eric-westhof.ibmc.cnrs.fr/. Interestingly, the wwPDB recently held a workshop on data standards for integrative structural models (
      • Berman H.M.
      • Adams P.D.
      • Bonvin A.A.
      • Burley S.K.
      • Carragher B.
      • Chiu W.
      • DiMaio F.
      • Ferrin T.E.
      • Gabanyi M.J.
      • Goddard T.D.
      • Griffin P.R.
      • Haas J.
      • Hanke C.A.
      • Hoch J.C.
      • Hummer G.
      • et al.
      Federating structural models and data: Outcomes from a workshop on archiving integrative structures.
      ), and Protein Data Bank Japan (PDBj) recently launched the Biological Structure Model Archive (BSM-Arc) (
      • Bekker G.J.
      • Kawabata T.
      • Kurisu G.
      The biological structure model archive (BSM-Arc): An archive for in silico models and simulations.
      ).
      Figure 7 (top) shows a timeline of selected RNA models that have been published. It starts with the modeling work of Fuller and Hodgson (
      • Fuller W.
      • Hodgson A.
      Conformation of the anticodon loop in tRNA.
      ) on the anticodon loop of tRNA. They manually assembled space-filling (
      • Koltun W.L.
      Precision space-filling atomic models.
      ) and Kendrew (
      • Langridge R.
      • Marvin D.A.
      • Seeds W.E.
      • Wilson H.R.
      • Hooper C.W.
      • Wilkins M.H.F.
      • Hamilton L.D.
      The molecular configuration of deoxyribonucleic acid: II. Molecular models and their Fourier transforms.
      ) models. They stated very clearly the objectives that are still of value today: “We do not necessarily believe that our models describe the actual molecular conformations to an accuracy of a few hundredths of an angstrom, but the analysis shows that a model with the general characteristics we propose can be built with acceptable stereochemistry. Only if model building is treated as a rigid discipline with strict attention paid to detailed stereochemistry can the results of a study such as this be considered reliable and meaningful.” They identified and rationalized the correct stacking of the five nucleotides on the 3’-end of the loop with the 3’-strand of the anticodon helix (called 3’-stack). In 1969, Michael Levitt assembled a whole tRNA (
      • Levitt M.
      Detailed molecular model for transfer ribonucleic acid.
      ). For that model, the available sequences were used and analyzed. The choice of the stacked arms followed previous small-angle X-ray scattering (
      • Lake J.A.
      • Beeman W.W.
      On the conformation of yeast transfer RNA.
      ) (Fig. 3, A–C). Interestingly, contacts between the D- and T-loops were proposed, as were base pairs between residues 8 and 14 and between residues 15 and 48 (the latter now sometimes called the Levitt pair), but both in the usual Watson–Crick configuration instead of the observed trans configuration (Fig. 4, D and E). Sequence analysis coupled with model building led to the suggestion that some RNAs can form pseudoknots, for example, between a hairpin loop and a single strand (meaning that the single strand does not go through the loop) (
      • Pleij C.W.
      • Rietveld K.
      • Bosch L.
      A new principle of RNA folding based on pseudoknotting.
      ). This fold, now on the logo of the International RNA Society (https://www.rnasociety.org/), is present in a large number of functional structured RNAs. In 1987, Kim and Cech (
      • Kim S.H.
      • Cech T.R.
      Three-dimensional model of the active site of the self-splicing rRNA precursor of tetrahymena.
      ) published a model of the core of group I intron on the basis of the available experimental data and with clear model building principles that are still valid. They are worth restating: “(i) RNA duplexes were assumed to have A-form RNA helix conformation. (ii) If two duplex stems were separated by fewer than three unpaired nucleotides, they were stacked collinearly. (iii) If one helix competed with two others for collinear stacking, the two helices separated by the least number of unpaired nucleotides were chosen to form a stacked helix. (iv) Non-Watson-Crick base-pairing was allowed at the junction of two helices. (v) Single “bulged” bases were stacked within a helix. (vi) If two conserved bases in a single-stranded region were in proximity, base-pairing was attempted subject to the constraints of the chemical-modification data. (vii) Loop conformations were taken from those in the tRNA structure.” Kim had built the tRNA structure in electron density in Alex Rich’s laboratory (
      • Kim S.H.
      • Quigley G.
      • Suddath F.L.
      • McPherson A.
      • Sneden D.
      • Kim J.J.
      • Weinzierl J.
      • Blattmann P.
      • Rich A.
      The three-dimensional structure of yeast phenylalanine transfer RNA: Shape of the molecule at 5.5-A resolution.
      ,
      • Kim S.H.
      • Sussman J.L.
      • Suddath F.L.
      • Quigley G.J.
      • McPherson A.
      • Wang A.H.
      • Seeman N.C.
      • Rich A.
      The general structure of transfer RNA molecules.
      ) some years before. The coaxial stacks P3–P7–P8 and P4–P6 were identified (
      • Kim S.H.
      • Cech T.R.
      Three-dimensional model of the active site of the self-splicing rRNA precursor of tetrahymena.
      ). Afterward, with the development of molecular graphics and refinement programs, full molecular models with coordinates could be generated (Fig. 7).
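      Rules (ii) and (iii) quoted above lend themselves to a small illustration. The sketch below is not the procedure used by Kim and Cech; it is a hypothetical encoding of those two rules for a single junction, where each helix presents one end and can therefore stack on at most one neighbor. The function name, the input format, and the helix labels H1 to H3 with their linker lengths are invented for the example.

```python
def choose_coaxial_stacks(linkers, max_unpaired=3):
    """Pick coaxial stacks at one junction.

    linkers: list of (helix_a, helix_b, n_unpaired) tuples giving the number of
    unpaired nucleotides between helices that are adjacent at the junction.
    """
    # Rule (ii): only helices separated by fewer than three unpaired
    # nucleotides are candidates for collinear stacking.
    candidates = sorted((n, a, b) for a, b, n in linkers if n < max_unpaired)

    stacked, used = [], set()
    # Rule (iii): when one helix competes for two partners, the pair separated
    # by the fewest unpaired nucleotides wins (shortest linkers are tried first).
    for n, a, b in candidates:
        if a not in used and b not in used:
            stacked.append((a, b))
            used.update((a, b))
    return stacked


# Hypothetical three-way junction: H2 could stack on H1 (0 nt) or H3 (2 nt).
junction = [("H1", "H2", 0), ("H2", "H3", 2), ("H3", "H1", 5)]
print(choose_coaxial_stacks(junction))
```

      Here H2 competes for H1 and H3; the H1/H2 pair is chosen because no unpaired nucleotide separates them, and H3 is left unstacked at this junction.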
      Figure 7. Top, a timeline of RNA models. Before 1987, no sets of coordinates for the suggested models were made available. With the development of modeling tools based on computer graphics, one could derive coordinates (without manually building physical models) and refine them (see bottom). In chronological order, the following RNA models are highlighted: (1) the anticodon loop (
      • Fuller W.
      • Hodgson A.
      Conformation of the anticodon loop in tRNA.
      ); (2) the Levitt tRNA (
      • Levitt M.
      Detailed molecular model for transfer ribonucleic acid.
      ); (3) the pseudoknot fold (
      • Pleij C.W.
      • Rietveld K.
      • Bosch L.
      A new principle of RNA folding based on pseudoknotting.
      ); (4) the Kim & Cech core of group I intron (
      • Kim S.H.
      • Cech T.R.
      Three-dimensional model of the active site of the self-splicing rRNA precursor of tetrahymena.
      ); (5) the tRNA-like in TYMV (
      • Dumas P.
      • Moras D.
      • Florentz C.
      • Giege R.
      • Verlaan P.
      • Van Belkum A.
      • Pleij C.W.
      3-D graphics modelling of the tRNA-like 3'-end of turnip yellow mosaic virus RNA: Structural and functional implications.
      ); (6) the GNRA loop in 5S rRNA (
      • Westhof E.
      • Romby P.
      • Romaniuk P.J.
      • Ebel J.P.
      • Ehresmann C.
      • Ehresmann B.
      Computer modeling from solution data of spinach chloroplast and of Xenopus laevis somatic and oocyte 5 S rRNAs.
      ); (7) the Michel & Westhof core of group I intron (
      • Michel F.
      • Westhof E.
      Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis.
      ); (7) the 4-way junction of U1 snRNA (
      • Krol A.
      • Westhof E.
      • Bach M.
      • Luhrmann R.
      • Ebel J.P.
      • Carbon P.
      Solution structure of human U1 snRNA. Derivation of a possible three-dimensional model.
      ); (8) the tRNA selenocysteine (
      • Sturchler C.
      • Westhof E.
      • Carbon P.
      • Krol A.
      Unique secondary and tertiary structural features of the eucaryotic selenocysteine tRNA(Sec).
      ); (9) the hammerhead (
      • Tuschl T.
      • Gohlke C.
      • Jovin T.M.
      • Westhof E.
      • Eckstein F.
      A three-dimensional model for the hammerhead ribozyme based on fluorescence measurements.
      ) and the hepatitis delta (
      • Tanner N.K.
      • Schaff S.
      • Thill G.
      • Petit-Koskas E.
      • Crain-Denoyelle A.M.
      • Westhof E.
      A three-dimensional model of hepatitis delta virus ribozyme based on biochemical and mutational analyses.
      ) ribozymes; (10) full group I introns (
      • Serganov A.A.
      • Masquida B.
      • Westhof E.
      • Cachia C.
      • Portier C.
      • Garber M.
      • Ehresmann B.
      • Ehresmann C.
      The 16S rRNA binding site of Thermus thermophilus ribosomal protein S15: Comparison with Escherichia coli S15, minimum site and structure.
      ); (11) A and B families of RNase P (
      • Massire C.
      • Jaeger L.
      • Westhof E.
      Derivation of the three-dimensional architecture of bacterial ribonuclease P RNAs from comparative sequence analysis.
      ); (12) the Azoarcus group I intron (
      • Rangan P.
      • Masquida B.
      • Westhof E.
      • Woodson S.A.
      Architecture and folding mechanism of the Azoarcus group I pre-tRNA.
      ). Many RNA structures were also modeled afterward, especially within the RNA-Puzzles Consortium (
      • Cruz J.A.
      • Blanchet M.F.
      • Boniecki M.
      • Bujnicki J.M.
      • Chen S.J.
      • Cao S.
      • Das R.
      • Ding F.
      • Dokholyan N.V.
      • Flores S.C.
      • Huang L.
      • Lavender C.A.
      • Lisi V.
      • Major F.
      • Mikolajczak K.
      • et al.
      RNA-puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Antczak M.
      • Batey R.T.
      • Becka A.J.
      • Biesiada M.
      • Boniecki M.J.
      • Bujnicki J.M.
      • Chen S.J.
      • Cheng C.Y.
      • Chou F.C.
      • Ferre-D'Amare A.R.
      • Das R.
      • Dawson W.K.
      • Ding F.
      • et al.
      RNA-puzzles round III: 3D RNA structure prediction of five riboswitches and one ribozyme.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Antczak M.
      • Boniecki M.J.
      • Bujnicki J.
      • Chen S.J.
      • Cheng C.Y.
      • Cheng Y.
      • Chou F.C.
      • Das R.
      • Dokholyan N.V.
      • Ding F.
      • Geniesse C.
      • Jiang Y.
      • Joshi A.
      • et al.
      RNA-puzzles round IV: 3D structure predictions of four ribozymes and two aptamers.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Blanchet M.F.
      • Boniecki M.
      • Bujnicki J.M.
      • Chen S.J.
      • Cheng C.
      • Chojnowski G.
      • Chou F.C.
      • Cordero P.
      • Cruz J.A.
      • Ferre-D'Amare A.R.
      • Das R.
      • Ding F.
      • Dokholyan N.V.
      • et al.
      RNA-puzzles round II: Assessment of RNA structure prediction programs applied to three large RNA structures.
      ). Bottom, a timeline of some RNA assembly and computing tools. The most recent ones are regularly used and actively improved. (1) FRODO developed by Alwyn Jones was a pioneering tool in computer molecular graphics (
      • Jones T.A.
      A graphics model building and refinement system for macromolecules.
      ); (2) NUCLIN-NUCLSQ (
      • Moult J.
      The current state of the art in protein structure prediction.
      ), an inclusive refinement program dedicated to nucleic acids and based on Hendrickson–Konnert PROLSQ (
      • Konnert J.H.
      • Hendrickson W.A.
      A restrained-parameter thermal-factor refinement procedure.
      ,
      • Hendrickson W.A.
      • Konnert J.H.
      ); (3) MIDAS (
      • Ferrin T.E.
      • Couch G.S.
      • Huang C.C.
      • Pettersen E.F.
      • Langridge R.
      An affordable approach to interactive desktop molecular modeling.
      ,
      • Ferrin T.E.
      • Huang C.C.
      • Jarvis L.E.
      • Langridge R.
      The MIDAS display system.
      ); (4) MC-Sym (
      • Major F.
      • Turcotte M.
      • Gautheret D.
      • Lapalme G.
      • Fillion E.
      • Cedergren R.
      The combination of symbolic and numerical computation for three-dimensional modeling of RNA.
      ); (5) MANIP (
      • Massire C.
      • Westhof E.
      MANIP: An interactive tool for modelling RNA.
      ); (6) Chimera (
      • Pettersen E.F.
      • Goddard T.D.
      • Huang C.C.
      • Couch G.S.
      • Greenblatt D.M.
      • Meng E.C.
      • Ferrin T.E.
      UCSF Chimera--a visualization system for exploratory research and analysis.
      ); (7) S2S (
      • Jossinet F.
      • Ludwig T.E.
      • Westhof E.
      Assemble: An interactive graphical tool to analyze and build RNA architectures at the 2D and 3D levels.
      ,
      • Jossinet F.
      • Westhof E.
      Sequence to structure (S2S): Display, manipulate and interconnect RNA data from sequence to structure.
      ); (8) FARFAR (
      • Das R.
      • Baker D.
      Automated de novo prediction of native-like RNA tertiary structures.
      ); (9) MC-Fold (
      • Parisien M.
      • Major F.
      The MC-fold and MC-Sym pipeline infers RNA structure from sequence data.
      ), iFoldRNA (
      • Ding F.
      • Sharma S.
      • Chalasani P.
      • Demidov V.V.
      • Broude N.E.
      • Dokholyan N.V.
      Ab initio RNA folding by discrete molecular dynamics: From structure prediction to folding mechanisms.
      ); (10) ModeRNA (
      • Rother M.
      • Rother K.
      • Puton T.
      • Bujnicki J.M.
      ModeRNA: A tool for comparative modeling of RNA 3D structure.
      ); (11) RNAComposer (
      • Popenda M.
      • Szachniuk M.
      • Antczak M.
      • Purzycka K.J.
      • Lukasiak P.
      • Bartol N.
      • Blazewicz J.
      • Adamiak R.W.
      Automated 3D structure composition for large RNAs.
      ); 3dRNA (
      • Zhao Y.
      • Huang Y.
      • Gong Z.
      • Wang Y.
      • Man J.
      • Xiao Y.
      Automated and fast building of three-dimensional RNA structures.
      ); (12) VFold (
      • Xu X.
      • Zhao P.
      • Chen S.J.
      Vfold: A web server for RNA structure and folding thermodynamics prediction.
      ); (13) SimRNA (
      • Boniecki M.J.
      • Lach G.
      • Dawson W.K.
      • Tomala K.
      • Lukasz P.
      • Soltysinski T.
      • Rother K.M.
      • Bujnicki J.M.
      SimRNA: A coarse-grained method for RNA folding simulations and 3D structure prediction.
      ).
      Figure 7 (bottom) shows the evolution of the computer tools necessary for the manipulation and refinement of RNA structures. The advent of computer graphics and the development of systems allowing the display and the manipulation of molecular objects in real time, pioneered by Levinthal (
      • Katz L.
      • Levinthal C.
      Interactive computer graphics and representation of complex biological structures.
      ), Feldmann (
      • Feldmann R.J.
      The design of computing systems for molecular modeling.
      ), Langridge (
      • Langridge R.
      Interactive three-dimensional computer graphics in molecular biology.
      ,
      • Langridge R.
      • Ferrin T.E.
      • Kuntz I.D.
      • Connolly M.L.
      Real-time color graphics in studies of molecular interactions.
      ), and Jones (
      • Jones T.A.
      A graphics model building and refinement system for macromolecules.
      ), were breakthroughs in structural biology. Several of these tools were developed for crystallography and modeling purposes as well. Model building is the process in crystallography (or cryo-EM) in which one actually assembles a molecular model by fitting molecular objects into electron density (manually, either with physical objects in a Richards box or with computer graphics, a process now done almost automatically). The model building is guided and constrained by the electron density. The molecular models are then refined geometrically and stereochemically to best satisfy the crystallographic data or the electron density. In ab initio modeling, the molecular objects are built without the constraints of crystallographic data. The model building can be driven by geometric and physicochemical energetic terms in (semi-) automatic fashion by various computing tools (for RNA, see the recent reviews (
      • Li B.
      • Cao Y.
      • Westhof E.
      • Miao Z.
      Advances in RNA 3D structure modeling using experimental data.
      ,
      • Miao Z.
      • Westhof E.
      RNA structure: Advances and assessment of 3D structure prediction.
      )). Such tools should integrate the knowledge, accumulated at a given time, on the molecular systems to be modeled. This knowledge integration is still a formidable challenge owing to the diversity of molecular interactions to consider and treat simultaneously (weak forces, water molecules, ions, etc.). When performed manually with physical objects, model building is still very powerful to assimilate and understand the rules underlying the folding of macromolecules. Richard Hamming said, “The purpose of computing is insight, not numbers.” Similarly, the purpose of modeling is insight, not models. One can also recall the famous quote from Richard Feynman: “What I cannot create, I do not understand.”
      The point of archiving 3D models is to provide a record of previous modeling efforts that can help to benchmark methodological progress in computational modeling and to develop metrics to assess contemporary modeling efforts (see (
      • DellaVigna S.
      • Pope D.
      • Vivalt E.
      Predict science to improve science.
      )). Toward that aim, the RNA-Puzzles computational contests were created in 2011 (
      • Cruz J.A.
      • Blanchet M.F.
      • Boniecki M.
      • Bujnicki J.M.
      • Chen S.J.
      • Cao S.
      • Das R.
      • Ding F.
      • Dokholyan N.V.
      • Flores S.C.
      • Huang L.
      • Lavender C.A.
      • Lisi V.
      • Major F.
      • Mikolajczak K.
      • et al.
      RNA-puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Antczak M.
      • Batey R.T.
      • Becka A.J.
      • Biesiada M.
      • Boniecki M.J.
      • Bujnicki J.M.
      • Chen S.J.
      • Cheng C.Y.
      • Chou F.C.
      • Ferre-D'Amare A.R.
      • Das R.
      • Dawson W.K.
      • Ding F.
      • et al.
      RNA-puzzles round III: 3D RNA structure prediction of five riboswitches and one ribozyme.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Antczak M.
      • Boniecki M.J.
      • Bujnicki J.
      • Chen S.J.
      • Cheng C.Y.
      • Cheng Y.
      • Chou F.C.
      • Das R.
      • Dokholyan N.V.
      • Ding F.
      • Geniesse C.
      • Jiang Y.
      • Joshi A.
      • et al.
      RNA-puzzles round IV: 3D structure predictions of four ribozymes and two aptamers.
      ,
      • Miao Z.
      • Adamiak R.W.
      • Blanchet M.F.
      • Boniecki M.
      • Bujnicki J.M.
      • Chen S.J.
      • Cheng C.
      • Chojnowski G.
      • Chou F.C.
      • Cordero P.
      • Cruz J.A.
      • Ferre-D'Amare A.R.
      • Das R.
      • Ding F.
      • Dokholyan N.V.
      • et al.
      RNA-puzzles round II: Assessment of RNA structure prediction programs applied to three large RNA structures.
      ). The RNA-Puzzles consortium runs a community-wide assessment of RNA 3D structure prediction that aims to expose the bottlenecks in current prediction methods and to promote their improvement. RNA-Puzzles attempts to provide the RNA modeling community with what CASP (Critical Assessment of Methods of Protein Structure Prediction) provides for protein modeling (
      • Moult J.
      The current state of the art in protein structure prediction.
      ). All the data sets for the modeled structures and codes for assessment are now available as open source on GitHub (https://github.com/RNA-Puzzles) (
      • Magnus M.
      • Antczak M.
      • Zok T.
      • Wiedemann J.
      • Lukasiak P.
      • Cao Y.
      • Bujnicki J.M.
      • Westhof E.
      • Szachniuk M.
      • Miao Z.
      RNA-puzzles toolkit: A computational resource of RNA 3D structure benchmark datasets, structure manipulation, and evaluation tools.
      ).

      Databases play central roles in science

      In 2003, three organizations, the Research Collaboratory for Structural Bioinformatics (RCSB) in the United States, the Macromolecular Structure Database (MSD) at EBI, and the PDBj in Osaka, created the worldwide PDB (wwPDB; http://www.wwpdb.org/) with the goal of maintaining a single archive of structure data of macromolecules that is freely and publicly available to the global community (
      • Young J.Y.
      • Westbrook J.D.
      • Feng Z.
      • Sala R.
      • Peisach E.
      • Oldfield T.J.
      • Sen S.
      • Gutmanas A.
      • Armstrong D.R.
      • Berrisford J.M.
      • Chen L.
      • Chen M.
      • Di Costanzo L.
      • Dimitropoulos D.
      • Gao G.
      • et al.
      OneDep: Unified wwPDB system for deposition, biocuration, and validation of macromolecular structures in the PDB archive.
      ). The Biological Magnetic Resonance Data Bank is now also part of wwPDB, and soon the China Protein Data Bank will join. This structure also facilitates unified deposition, curation, and distribution of structures to the growing global community of structural biologists and scientists that use 3D data (
      • Young J.Y.
      • Westbrook J.D.
      • Feng Z.
      • Sala R.
      • Peisach E.
      • Oldfield T.J.
      • Sen S.
      • Gutmanas A.
      • Armstrong D.R.
      • Berrisford J.M.
      • Chen L.
      • Chen M.
      • Di Costanzo L.
      • Dimitropoulos D.
      • Gao G.
      • et al.
      OneDep: Unified wwPDB system for deposition, biocuration, and validation of macromolecular structures in the PDB archive.
      ). At the same time, each independent center is free to develop new visualization and analysis tools that promote the reuse of the data in creative and useful ways for nonspecialists. Each center is also engaged in creating educational materials to make structural biology accessible to students at all levels, for example, PDB101’s Molecule of the Month and the PDBj’s Encyclopedia of Protein Structures (
      • Goodsell D.S.
      • Zardecki C.
      • Berman H.M.
      • Burley S.K.
      Insights from 20 years of the molecule of the month.
      ). A number of the Molecule of the Month entries are RNA or DNA structures, and several others are proteins that interact with nucleic acids.
      We would like to recall here a sentence written, in 1990, by Brändén and Jones (
      • Branden C.I.
      • Jones T.A.
      Between objectivity and subjectivity.
      ): “It is the crystallographer’s responsibility to make sure that incorrect protein structures do not reach the literature.” They could as well have written: “It is the structural biologist’s responsibility to make sure that incorrect macromolecular structures do not reach the literature,” which would also encompass NMR and cryo-EM technologies as well as nucleic acids and their complexes. The PDB continues to adapt in the new century by adopting new data formats to accommodate the much larger structures being solved. The development of metrics for assessing structure validity, both globally and locally at the nucleotide level, is proving key to understanding the significance of structural variation and to mapping out the dynamic behavior of nucleic acid structures in response to interactions with small molecules, proteins, and other nucleic acids. Cryo-EM is rapidly displacing X-ray crystallography for very large complexes and molecular machines that cycle through multiple functional states and are hard to crystallize in unique states. New metrics for cryo-EM are in progress and should soon be available (
      • Anderson W.P.
      • Global Life Science Data Resources Working Group
      Data management: A global coalition to sustain core data.
      ). In this context, referees and journal editors also have a major role to play in preventing incorrect structures from reaching the literature and, worse, the archival databases. Referees are encouraged to request (and in many cases are actually provided with) complete and detailed statistics tables, validation reports and quality indicators, coordinates, and electron-density maps. Journal editors, as well as authors, should comply with such requests despite fierce competition for publication. The PDB has played an important role by providing sophisticated metrics and automation that give rapid feedback to contributors of data when they submit structures.
      Databases constitute an absolute necessity for advancing science, to facilitate the work of scientists in the same and related disciplines. They are repositories of observational data organized with carefully curated dictionaries and controlled vocabularies upon which future science can build and develop. In the immediate future, there are at least two main challenges to which we would like to draw attention. Owing to the sizes and complexities of modern databases, these two challenges are interconnected and interdependent.
      First, sustaining and maintaining databases for a now international community of scientists raises the issue of how the associated costs should be equitably apportioned; these are recurrent costs to support trained and competent personnel, together with the maintenance and constant upgrading of the extensive infrastructure required to keep up with the growth in the size and complexity of the data. Short-term grants with a dedicated focus are ill suited to maintaining large international resources (
      • Anderson W.P.
      • Global Life Science Data Resources Working Group
      Data management: A global coalition to sustain core data.
      ). New business models are being explored and promoted (
      • Bourne P.E.
      • Lorsch J.R.
      • Green E.D.
      Perspective: Sustaining the big-data ecosystem.
      ). The Global Biodata Coalition is an initiative started by the International Human Frontier Science Program Organization that, according to its website (https://globalbiodata.org), aims “to stabilize and ensure sustainable financial support for the global biodata infrastructure and in particular to identify for prioritized long-term support a set of Global Core Data Resources that are crucial for sustaining the broader biodata infrastructure.”
      Second, databases fundamentally comprise well-organized archives of chosen sets of objects. In structural biology, the objects are mainly molecular sequences and structures. However, as these two types of data attach to the same natural entities, they need to be integrated to grasp biological function and evolution more holistically. New developments in cryo-electron tomography (
      • Turk M.
      • Baumeister W.
      The promise and the challenges of cryo-electron tomography.
      ), which allows visualization of macromolecules in situ, will open up the study of macromolecules in their cellular environment. Maybe one day we will also include digital descriptions of phenotypes, including bones, skeletons, or wings, in order to relate their forms and colors to the underlying genes and genetic networks. Launched by the RNAcentral Consortium in 2014 at the EMBL-EBI, Wellcome Genome Campus (Hinxton, UK), RNAcentral (https://rnacentral.org) integrates and unifies access to all types of noncoding RNA sequences from all organisms. This is a major enterprise that goes in the right direction for the integration and interoperability of data. A bridge between sequences and standard secondary structures has been achieved recently (
      RNAcentral Consortium
      RNAcentral 2021: Secondary structure integration, improved sequence search and new member databases.
      ). In the future this bridging integration should be extended to three-dimensional structures and thus the PDB and the NDB.

      In the light of evolution

      It would take a whole book with several dozens of figures to convey the amazing knowledge accumulated on RNA structures and architectures since the beginning of RNA structural biology and the advent of the PDB. Only glimpses of RNA structural biology are presented here, and we would like to apologize for not mentioning the contributions of many structural biologists, especially those using NMR spectroscopy, a field that contributed several seminal structures, for example, on RNPs and telomerase (
      • Daubner G.M.
      • Clery A.
      • Allain F.H.
      RRM-RNA recognition: NMR or crystallography…and new findings.
      ,
      • Dominguez C.
      • Allain F.H.
      NMR structure of the three quasi RNA recognition motifs (qRRMs) of human hnRNP F and interaction studies with Bcl-x G-tract RNA: A novel mode of RNA recognition.
      ,
      • Wu H.
      • Finger L.D.
      • Feigon J.
      Structure determination of protein/RNA complexes by NMR.
      ,
      • Zhang Q.
      • Kim N.K.
      • Feigon J.
      Architecture of human telomerase RNA.
      ,
      • Wang Y.
      • Susac L.
      • Feigon J.
      Structural biology of telomerase.
      ), with unique critical data on RNA dynamics (
      • Dethoff E.A.
      • Petzold K.
      • Chugh J.
      • Casiano-Negroni A.
      • Al-Hashimi H.M.
      Visualizing transient low-populated structures of RNA.
      ,
      • Dethoff E.A.
      • Chugh J.
      • Mustoe A.M.
      • Al-Hashimi H.M.
      Functional complexity and regulation through RNA dynamics.
      ,
      • Kimsey I.J.
      • Petzold K.
      • Sathyamoorthy B.
      • Stein Z.W.
      • Al-Hashimi H.M.
      Visualizing transient Watson-Crick-like mispairs in DNA and RNA duplexes.
      ,
      • Rinnenthal J.
      • Buck J.
      • Ferner J.
      • Wacker A.
      • Furtig B.
      • Schwalbe H.
      Mapping the landscape of RNA dynamics with NMR spectroscopy.
      ,
      • Reining A.
      • Nozinovic S.
      • Schlepckow K.
      • Buhr F.
      • Furtig B.
      • Schwalbe H.
      Three-state mechanism couples ligand and temperature sensing in riboswitches.
      ). Almost every RNA structure still brings some surprising and unexpected features, and these unveil the constraints of the physicochemical interactions and the congruent accommodations due to the historical contingencies of biological evolution. To cite T. Dobzhansky again (
      • Dobzhansky T.
      Nothing in biology makes sense except in the light of evolution.
      ): “Seen in the light of evolution, biology is, perhaps, intellectually the most satisfying and inspiring science. Without that light it becomes a pile of sundry facts, some of them interesting or curious but making no meaningful picture as a whole.” A quote from Ernest Rutherford states “All science is either physics or stamp collecting,” which is in sharp contrast to the lifelong accumulation of observations and data by the biologist Charles Darwin. The RNA structures in the PDB are absolutely not a pile of miscellaneous items, “sundry facts”; they are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on earth.
      Technological improvements and breakthroughs are at the heart of discovery and scientific progress. One may recall the famous quote from Sydney Brenner: “progress in science depends on new techniques, new discoveries and new ideas, probably in that order” (
      • Robertson M.
      Biology in the 1980s, plus or minus a decade.
      ). As alluded to above, the recent and constant progress in cryo-EM techniques is profoundly changing structural biology (
      • Kappel K.
      • Zhang K.
      • Su Z.
      • Watkins A.M.
      • Kladwang W.
      • Li S.
      • Pintilie G.
      • Topkar V.V.
      • Rangan R.
      • Zheludev I.N.
      • Yesselman J.D.
      • Chiu W.
      • Das R.
      Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures.
      ). The field of protein structure prediction is now overwhelmed by the achievements of artificial intelligence tools (
      • Senior A.W.
      • Evans R.
      • Jumper J.
      • Kirkpatrick J.
      • Sifre L.
      • Green T.
      • Qin C.
      • Zidek A.
      • Nelson A.W.R.
      • Bridgland A.
      • Penedones H.
      • Petersen S.
      • Simonyan K.
      • Crossan S.
      • Kohli P.
      • et al.
      Improved protein structure prediction using potentials from deep learning.
      ,
      • Gao W.
      • Mahajan S.P.
      • Sulam J.
      • Gray J.J.
      Deep learning in protein structural modeling and design.
      ,
      • Callaway E.
      'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures.
      ). For those of us raised with the warning of Levinthal’s paradox (
      • Levinthal C.
      Are there pathways for protein folding?.
      ), we stand in awe of this progress and these breakthroughs. Could such tools be applied to RNA structure prediction? Since they are based on the analysis of tens of thousands of protein structures, do we have enough sufficiently nonredundant RNA structures? As remarked by Jane Richardson (
      • Richardson J.S.
      A new way to see RNAs.
      ), there are “1491 RNA-only structures, just 373 of which have a chain ≥60 nucleotides,” and of those a large number are ribosomal structures with various ligands or from various biological sources. As John Helliwell aptly reminds us (
      • Helliwell J.R.
      DeepMind and CASP14.
      ), in the end, these breakthroughs would not have been feasible without the PDB.

      Dedication

      During the writing of this work, on December 8, 2020, Neocles B. Leontis died in a car accident (
      • Moore P.B.
      • Petrov A.
      • Westhof E.
      • Zirbel C.L.
      Neocles B. Leontis (1955 - 2020).
      ). Together with the editors, we would like to dedicate this article to his memory. I have been fortunate to exchange and collaborate with Neocles for more than 20 years. Beyond the many scientific papers published together, always occasions for in-depth, dynamic, and fair discussions, Neocles was an intensely human, joyous, and positive friend whose company was a comfort. I admired him deeply for his move toward local politics (in 2019, he was elected to the City Council of Bowling Green, Ohio) and his involvement in many social and climate issues. He applied his intellectual power to these issues as thoroughly and seriously as he did to the various scientific problems he tackled. His untimely and brutal death is a loss to the local community around his university and town, but also to all the RNA scientists around the world who use the concepts, tools, and databases he helped to develop. Neocles earned his PhD from Yale with Peter Moore as mentor. He became professor of Chemistry at Bowling Green State University in 1987. With Peter Moore, he used NMR to study the 5S rRNA and showed the key role of Mg ions and the presence of non-Watson–Crick pairs. Later, using sequence comparisons, he identified loop E modules in RNA. Together we published a nomenclature for classifying non-Watson–Crick pairs logically and unambiguously. Neocles Leontis and his group at Bowling Green developed many websites and tools for analyzing and compiling RNA structures.

      Conflict of interest

      The authors declare that they have no conflicts of interest with the contents of this article.

      Acknowledgments

      We wish to thank the numerous scientists, engineers, and technicians who have been involved from the outset in the development and maintenance of the PDB. We also gratefully acknowledge the funding bodies that have provided the worldwide scientific community with this essential component of its everyday life.

      Author contributions

      E. W. and N. B. L. wrote the article, did the bibliographic research, and made first drafts of the drawings and figures.

      References

        • Sundaralingam M.
        • Jensen L.H.
        Stereochemistry of nucleic acid constituents: I. Refinement of the structure of cytidylic acid b.
        J. Mol. Biol. 1965; 13: 914-929
        • Kennard O.
        • Speakman J.C.
        • Donnay J.D.H.
        Primary crystallographic data.
        Acta Cryst. 1967; 22: 445-449
        • Rubin J.
        • Brennan T.
        • Sundaralingam M.
        Crystal structure of a naturally occurring dinucleoside monophosphate: Uridylyl (3',5') adenosine hemihydrate.
        Science. 1971; 174: 1020-1022
        • Seeman N.C.
        • Sussman J.L.
        • Berman H.M.
        • Kim S.H.
        Nucleic acid conformation: Crystal structure of a naturally occurring dinucleoside phosphate (UpA).
        Nat. New Biol. 1971; 233: 90-92
        • Kennard O.
        • Allen F.H.
        • Brice M.D.
        • Hummelink T.W.A.
        • Motherwell W.D.S.
        • Rodgers J.R.
        • Watson D.G.
        Computer based systems for the retrieval of data: Crystallography.
        Pure Appl. Chem. 1977; 49: 1807-1816
        Crystallography: Protein Data Bank.
        Nat. New Biol. 1971; 233: 223
        • Bernstein F.C.
        • Koetzle T.F.
        • Williams G.J.
        • Meyer Jr., E.F.
        • Brice M.D.
        • Rodgers J.R.
        • Kennard O.
        • Shimanouchi T.
        • Tasumi M.
        The Protein Data Bank. A computer-based archival file for macromolecular structures.
        Eur. J. Biochem. 1977; 80: 319-324
        • Berman H.M.
        • Olson W.K.
        • Beveridge D.L.
        • Westbrook J.
        • Gelbin A.
        • Demeny T.
        • Hsieh S.H.
        • Srinivasan A.R.
        • Schneider B.
        The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids.
        Biophys. J. 1992; 63: 751-759
        • Davis I.W.
        • Leaver-Fay A.
        • Chen V.B.
        • Block J.N.
        • Kapral G.J.
        • Wang X.
        • Murray L.W.
        • Arendall 3rd, W.B.
        • Snoeyink J.
        • Richardson J.S.
        • Richardson D.C.
        MolProbity: All-atom contacts and structure validation for proteins and nucleic acids.
        Nucleic Acids Res. 2007; 35: W375-W383
        • Chou F.C.
        • Sripakdeevong P.
        • Dibrov S.M.
        • Hermann T.
        • Das R.
        Correcting pervasive errors in RNA crystallography through enumerative structure prediction.
        Nat. Methods. 2013; 10: 74-76
        • Read R.J.
        • Adams P.D.
        • Arendall 3rd, W.B.
        • Brunger A.T.
        • Emsley P.
        • Joosten R.P.
        • Kleywegt G.J.
        • Krissinel E.B.
        • Lutteke T.
        • Otwinowski Z.
        • Perrakis A.
        • Richardson J.S.
        • Sheffler W.H.
        • Smith J.L.
        • Tickle I.J.
        • et al.
        A new generation of crystallographic validation tools for the protein data bank.
        Structure. 2011; 19: 1395-1412
        • Petrov A.I.
        • Zirbel C.L.
        • Leontis N.B.
        WebFR3D--a server for finding, aligning and analyzing recurrent RNA 3D motifs.
        Nucleic Acids Res. 2011; 39: W50-W55
        • Coimbatore Narayanan B.
        • Westbrook J.
        • Ghosh S.
        • Petrov A.I.
        • Sweeney B.
        • Zirbel C.L.
        • Leontis N.B.
        • Berman H.M.
        The nucleic acid database: New features and capabilities.
        Nucleic Acids Res. 2014; 42: D114-D122
        • Altona C.
        • Sundaralingam M.
        Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation.
        J. Am. Chem. Soc. 1972; 94: 8205-8212
        • Olson W.K.
        • Flory P.J.
        Spatial configurations of polynucleotide chains. 3. Polydeoxyribonucleotides.
        Biopolymers. 1972; 11: 57-66
        • Olson W.K.
        • Flory P.J.
        Spatial configurations of polynucleotide chains. I. Steric interactions in polyribonucleotides: A virtual bond model.
        Biopolymers. 1972; 11: 1-23
        • Olson W.K.
        • Flory P.J.
        Spatial configuration of polynucleotide chains. II. Conformational energies and the average dimensions of polyribonucleotides.
        Biopolymers. 1972; 11: 25-56
        • Ramakrishnan C.
        • Ramachandran G.N.
        Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units.
        Biophys. J. 1965; 5: 909-933
        • Sundaralingam M.
        Stereochemistry of nucleic acids and their constituents. IV. Allowed and preferred conformations of nucleosides, nucleoside mono-, di-, tri-, tetraphosphates, nucleic acids and polynucleotides.
        Biopolymers. 1969; 7: 821-860
        • Yathindra N.
        • Sundaralingam M.
        Analysis of the possible helical structures of nucleic acids and polynucleotides. Application of (n-h) plots.
        Nucleic Acids Res. 1976; 3: 729-747
        • Petrov A.I.
        • Zirbel C.L.
        • Leontis N.B.
        Automated classification of RNA 3D motifs and the RNA 3D motif atlas.
        RNA. 2013; 19: 1327-1340
        • Gore S.
        • Sanz Garcia E.
        • Hendrickx P.M.S.
        • Gutmanas A.
        • Westbrook J.D.
        • Yang H.
        • Feng Z.
        • Baskaran K.
        • Berrisford J.M.
        • Hudson B.P.
        • Ikegawa Y.
        • Kobayashi N.
        • Lawson C.L.
        • Mading S.
        • Mak L.
        • et al.
        Validation of structures in the Protein Data Bank.
        Structure. 2017; 25: 1916-1927
        • Wuthrich K.
        NMR of Proteins and Nucleic Acids.
        Wiley, New York, NY 1991
        • Allain F.H.
        • Varani G.
        How accurately and precisely can RNA structure be determined by NMR?.
        J. Mol. Biol. 1997; 267: 338-351
        • Lawson C.L.
        • Berman H.M.
        • Chiu W.
        Evolving data standards for cryo-EM structures.
        Struct. Dyn. 2020; 7: 014701
        • Wang A.J.
        • Quigley G.J.
        • Kolpak F.J.
        • van der Marel G.
        • van Boom J.H.
        • Rich A.
        Left-handed double helical DNA: Variations in the backbone conformation.
        Science. 1981; 211: 171-176
        • Drew H.R.
        • Wing R.M.
        • Takano T.
        • Broka C.
        • Tanaka S.
        • Itakura K.
        • Dickerson R.E.
        Structure of a B-DNA dodecamer: Conformation and dynamics.
        Proc. Natl. Acad. Sci. U. S. A. 1981; 78: 2179-2183