Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. Progress in understanding the molecular interactions that drive RNA architectures followed the rapid increase in the number of RNA structures in the PDB. That increase resulted from improvements in the chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and in computer technology. RNA modeling efforts, from their beginnings, are also described together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on earth.
The Protein Data Bank (PDB) is an icon for structural biologists. In its virtual vaults, the PDB stores the structures of biological macromolecules obtained painstakingly by thousands of crystallographers, electron microscopists, and nuclear magnetic resonance spectroscopists over the last 50 years or so. This note is primarily written to honor and acknowledge those who set up the PDB and who have strived throughout all these years to improve and maintain the database. We thank the Journal of Biological Chemistry for organizing the publication of such pieces that provide a mixture of personal recollections, historical records about nucleic acids structure, and some general comments about databases and their future.
The evolution of structural databases
When one of us was a postdoctoral fellow with M. Sundaralingam at the Department of Biochemistry of the University of Wisconsin in Madison, the only way to work with a published molecular structure was to first type in the atomic coordinates at the console of a PDP computer (Programmed Data Processor from Digital Equipment Corporation, Massachusetts). Only then could one deduce or check atomic distances, angles, stereochemistry, contacts and visualize the 3D structure using programs developed in-house. This was a tedious job, but very instructive to students and postdoctoral fellows. In those days, atomic coordinates were part of the paper and presented with temperature factors in tables (for an example, see (
)). The number and diversity of errors that escaped detection before publication, while rarely casting doubt on the crystallography itself, led generally to inaccurately described molecular structures with, for example, some wrong chiralities, or atoms of the molecular unit in different unit cells. This manual handling of data constituted for sure an excellent training ground. Happily, the nucleic acid structures that could be tackled at that time were restricted to the bases (often modified), nucleosides, and nucleotides, with the first structure of a dinucleotide appearing in 1971 (
) and, in 1971, the PDB was established jointly by the Crystallographic Data Centre, Cambridge and the Brookhaven National Laboratory (Protein Data Bank). PDB originally served as a repository system where crystallographers could mail in their data on punch cards, then computer tapes, for archiving and distribution. The announcement of the founding of the PDB in Nature New Biology was a simple little insert of 256 words (
). The CSD archived structures of individual nucleotides, as well as di- and trinucleotides. These structures are not in PDB. In fact, the first release of PDB did not include any nucleic acid structures, just seven protein structures (
) and structures of RNA units were included in the NDB.
In the early days of the development of structural databases, the main objectives were to prevent the loss of data by archiving solved structures and making them freely available upon request. The deposited data were checked for atom names, numberings, and geometry by database curators, but not as systematically and thoroughly as it is done today. Dictionaries and defined formats were developed over the years and applied to standardize nomenclature, file formats, and metadata. As regards nucleic acids, the relational NDB archived data accompanied by structural information and descriptions (
). NDB served as a testbed for new data formats that could accommodate much larger structures and much more metadata, which were eventually adopted by PDB. Nowadays, these databases are rich in metadata and tools for assessing and describing structures. Very valuable tools for structural validations (
All of the precise structural data regarding RNA comes ultimately from atomic-resolution X-ray structures of nucleotides, oligonucleotides, and various biologically relevant structures, ranging in size from individual helical elements to the full ribosome. The early work was carried out by pioneers such as A. Rich, O. Kennard, and M. Sundaralingam, who from the mid-1960s and into the 1980s carried out precise crystallographic studies of nucleosides, nucleotides, and dinucleotides. These data comprise all our basic knowledge of bond lengths, angles, and stereochemistry, as well as interaction preferences, including all types of base pairs, and most stacking and base–backbone interactions. The new data led early on to useful concepts including the conformational wheel, describing ribose conformations (
). Unlike proteins, which effectively have just two degrees of freedom per monomer unit, nucleotides present seven, six dihedrals along the backbone, and the one around the glycosidic bond. In 1965, based on the small number of protein structures known, as well as better resolved peptide structures, Ramakrishnan and Ramachandran (
) worked out the sterically favored backbone configurations and summarized them graphically in two dimensions, the dihedral angles phi and psi. Such Ramachandran plots continue to be used as easily visualized metrics for protein structures. Shortly afterward, Sundaralingam (
) carried out a series of landmark studies to accurately determine molecular structures of nucleic acid constituents, culminating in a highly cited 1969 paper in Biopolymers that laid the foundation for conformational analysis of nucleotides and polynucleotides, and gave the first stereochemical rules for the sugar-phosphate backbone. This work revealed conclusively that backbone dihedrals are restricted and that correlations exist between them and supported theoretical work that proposed various schemes to simplify nucleic acid conformational analysis using virtual bonds (
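The dihedral angles discussed above, whether phi and psi in a protein or the six backbone torsions and the glycosidic torsion of a nucleotide (conventionally named alpha through zeta, and chi), are all computed the same way from four consecutive atom positions. The following is a minimal sketch in plain Python; the function name `dihedral_deg` and the toy coordinates in the usage note are illustrative assumptions, not taken from any program mentioned in the text.

```python
import math

def _sub(a, b):
    # Component-wise difference a - b of two 3D points.
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    # Cross product of two 3D vectors.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    # Dot product of two 3D vectors.
    return sum(x * y for x, y in zip(a, b))

def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle (degrees, in [-180, 180]) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane (p1, p2, p3)
    n2 = _cross(b2, b3)  # normal to the plane (p2, p3, p4)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With toy points, `dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))` describes an eclipsed (cis) arrangement and gives 0, while moving the last point to `(-1, 0, 1)` gives the trans arrangement with a magnitude of 180. Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes, for example, a gauche+ from a gauche- conformation.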
High-resolution data from small molecules are also used to build force fields and to infer rules for assembly of molecular moieties. These force fields and energetic rules are then used for producing and optimizing structures, sampling the conformational space, or simulating molecular dynamics. The quality and general value of the deduced force fields strongly depend on the number and variety of structures available. The importance of efficient and accurate force fields is paramount in modern biochemical research, since these force fields are used not only for computing trajectories of molecular dynamics simulations, but also in determination of new structure using NMR and cryo-electron microscopy.
It cannot be overemphasized that the quality of the deposited structures is of primary importance; it depends directly on the crystallographic resolution of the X-ray data and on the refinement process, since only a minor fraction of X-ray structures is obtained at true atomic resolution (better than 1 Å). For example, can one really be confident in RNA structures at resolutions worse than 3.3 Å with average B-factors around 200 Å², bad clash scores, and poor PDB validation metrics? Only a systematic analysis of the various regions of the RNA molecule in electron density maps would allow a knowledgeable and interested person to reach an informed opinion. One key parameter for compiling reference databases for knowledge extraction is the nonredundancy of the structures that are included. Nonredundant structure databases, by reducing bias in the parameters deduced from structures, are extremely valuable for extracting knowledge about RNA as well as for benchmarking modeling strategies (
). In this respect, it is worth noting that fewer than 100 nonredundant RNA structures have been solved at 2 Å resolution or better. Well-defined metrics for 3D structures determined by X-ray crystallography and NMR have been established and are reported at the nucleotide level by PDB (
). Metrics have recently also been defined for structures determined by cryo-electron microscopy, which is largely replacing X-ray diffraction for structure determination of large macromolecular machines such as ribosomes, viruses, and spliceosomes (
)). The choice of tRNA for study was obvious because of the amount present in cells and the availability of some purification protocols. Its primary and secondary structures had been determined in the 1960s, by chemical and biochemical means (
)), containing short fragments of the viral genome bound to the coat proteins. In those years, at crystallography meetings and conferences, talks on nucleic acids, and RNA in particular, were generally relegated to the last day, either after the meeting dinner (always very lively and joyous events) in the early morning or just before the departure of the bus.
Following novelty and progress in chemical synthesis and purification of RNA oligonucleotides (
). Even though they lacked atomic resolution, the NMR structures were sufficiently detailed to provide models that expanded our understanding on how the RNA structures could form, in particular the variety of non-Watson–Crick base pairs, and they provided enough information to infer sequence signatures to identify recurrences of the same motifs in other structures (
). The willingness of its directors to expand PDB to include NMR structures proved to be a wise decision.
By 1995, the number of X-ray RNA structures (alone or in complex with protein) in the PDB amounted to only about 1% of the present content of total RNA and RNA–protein complex (RNP) structures. Figures 1 and 2 show the evolution in the number of structures of RNA alone and RNP with time. Some key structures are indicated to illustrate the increase in complexity of the solved structures. Again progress in chemical synthesis (
) appeared in those years and the whole community benefited greatly from those novel approaches. Amazing breakthroughs in cryo-electron microscopy are now accelerating the pace at which highly complex particles can be observed (
In 1996, a breakthrough in RNA crystallography was achieved in the laboratories of Tom Cech and Jennifer Doudna with the crystal structure of the P4–P6 domain of the Tetrahymena group I intron, a structure twice the size of tRNA, featuring striking compact folds and novel types of RNA tertiary contacts (
). With the amazing structure of P4–P6, many interested scientists learned that large RNAs could also be crystallized starting from in vitro synthesis and production. This breakthrough spurred fruitful efforts that quickly expanded our knowledge of the repertoire of surprising and beautiful RNA architectures, culminating in the structures of the ribosomal subunits themselves (
) and the Nobel Prize in Chemistry to Venki Ramakrishnan, Tom Steitz, and Ada Yonath in 2009 (https://www.nobelprize.org/prizes/chemistry/). And, after the Nobel Prize in Physiology or Medicine for RNA interference (RNAi) in 2006 to Andrew Fire and Craig Mello, the RNA community is celebrating the 2020 Nobel Prize in Chemistry awarded to Jennifer Doudna and Emmanuelle Charpentier for the structure-based design of the splendidly efficient RNA-programmed CRISPR-Cas9 system (
The accumulation of RNA structures, each one bringing either a new insight key to folding or additional confirmation and sequence variants, led to deep understanding of the main physicochemical features underlying RNA architecture. Some of these features, gained from structures of Figure 1, are gathered in Table 1. Up to the 1970s, structures were restricted to the bases, nucleosides, or nucleotides and analyzed the stereochemistry of nucleic acids (
). Despite their relatively small size (<80 nts), tRNAs have a tightly folded structure and have provided many surprising and eye-opening insights into the logic of RNA folding (see also Table 1): short Watson–Crick paired helices coaxially stacked (Fig. 3, A and B), short-range and long-range interactions (Fig. 3C), the central roles of non-Watson–Crick base pairs in mediating tertiary contacts (Fig. 4), base–phosphate or sugar–phosphate contacts (Fig. 4, A and B), structured hairpin loops (with the famous U-turn) and their propensities to intimately interact, and base–base intercalative stacking interactions (Fig. 4C) that contribute to long-range loop–loop interactions.
Table 1. A short historical overview of folding rules derived from crystal structures
Key advances in revealing interactions within RNA structures
- coaxial stacking between helices (continuous strand)
- loop–loop interactions
- non-Watson–Crick base pairs
- U-turn structuration of loops
- base triple contacts between a single strand and the deep major groove of a helix
- intercalation of unpaired bases
- polyamines, Mg ions, lead cleavage
A GNRA tetraloop is a stretch of four nucleotides capping a hairpin: it starts with a guanine (G), followed by any of the four nucleotides (N: A, G, C, or U), then a purine (R: A or G), and ends with an adenine (A). The GoU pairs were first suggested by Francis Crick (
) for explaining the degeneracy of the genetic code. In such a pair, a U pairs with a G instead of a C. GoU pairs have definite characteristics and play key roles in RNA structures and biology (for a recent overview, see (
)). In any case, it displays a three-way junction maintained in the proper relative orientations of the helices by non-Watson–Crick pairs (Fig. 5). As mentioned above, the structure of P4–P6, a fragment of the group I intron from Tetrahymena thermophila (
), unveiled several key recurrent RNA folding rules. As alluded to above, that structure, beyond the breathtaking views on RNA folds, gave a strong impetus to the RNA crystallography community. An illustration is shown in Figure 6 and short descriptions are given in Table 1.
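The GNRA definition given earlier (G, any nucleotide, a purine, A) translates directly into a sequence pattern. As a minimal sketch, the hypothetical helper `find_gnra` below scans an RNA sequence for every four-nucleotide window that fits the pattern, using only Python's standard `re` module. A sequence match alone does not establish that the four nucleotides actually cap a hairpin; that requires the flanking base pairs or a 3D structure, which this sketch deliberately ignores.

```python
import re

# Lookahead so that overlapping candidates are all reported.
# G, then N (any of A, C, G, U), then R (A or G), then A.
_GNRA = re.compile(r"(?=(G[ACGU][AG]A))")

def find_gnra(sequence):
    """Return (0-based start index, matched tetranucleotide) for each GNRA-like window."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in _GNRA.finditer(rna)]
```

For the toy sequence `"CCGAAAGG"`, the call returns `[(2, "GAAA")]`, a candidate loop that could be closed by the flanking C and G residues.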
). And it would take a lot more pages to do justice to what RNA structures have taught us about biology. Many RNA structures have been milestones and fueled our advances in understanding biochemical function and biological evolution. Forty years after the establishment by Tom Cech (
), indicate clearly also that the chemistry of splicing is performed by RNA elements. In those large RNP complexes, such as the ribosomes and the spliceosomes, the RNA elements are positioned precisely for function by the concomitant actions of RNA folding and protein complex formation. In the ribosome field, technological development in cryo-electron microscopy has led to stunning increases in resolution; for example, we now can delve inside ribosomal structures at 2 Å resolution (
). Further, active ribosomes are assembled through long and convoluted maturation processes. Structures are now appearing showing various steps in the maturation of ribosomes with changes in RNA base pairing and structures and with binding of maturation cofactors absent in the final assembled ribosomes (
). While the ribosome performs the programmed steps in translation (initiation, translocation, termination) essentially as a dynamical single complex entity, this is not the case for the spliceosome. Driven by several splicing or regulatory factors and RNA-dependent ATPase/helicases, the RNA and protein composition as well as the structures of the spliceosomal complex change along the catalytic steps, from the recognition of the 5’ and 3’ splice junctions and the branching point leading to the B∗ complex ready for the first step of splicing (cleavage at 5’ splice site) to the C∗ complex ready for the second step of splicing, the ligation of the exons, and the disassembly of the spliceosome for initiating a new cycle (
). The accumulated RNA structures offer us amazing insights into the principles of molecular evolution underlying biological evolution. Powerful techniques in molecular evolution have demonstrated that starting from a random sequence one can isolate sequences specific for binding a given ligand or with a defined function (
). Riboswitches are RNA sequences that, during transcription in the presence of a given ligand, fold into a native structure different from the one obtained after transcription in the absence of the same ligand (
). The alternative folds exert different action on the production of metabolic enzymes through transcription termination (or antitermination) or inhibition (or not) of ribosome binding for initiation of translation. There are now many structures of riboswitches with and without ligand bound (
Beyond the comforting insights on the RNA world at the origins of life about 2.8 billion years ago, all these RNA structures, coupled with structural alignments of homologous sequences, offer us multiple “Rosetta stones” for deciphering the RNA molecular evolution principles (
). Prior to the visualization of three-dimensional RNA architectures, RNA sequence alignments played major roles (and they still continue to do so). Through the analysis of nucleotide positions covarying according to the Watson–Crick rules, one can deduce the secondary structure (i.e., the ensemble of base-paired RNA helices) of a set of homologous RNA molecules. The third kingdom of life, the Archaea, were discovered by Carl Woese by sequencing ribosomal RNA and aligning them (
). The availability of both RNA sequences and RNA structures allows structural alignment patterned on RNA architectures. These allow us to learn about how RNAs evolve, in other words how changes or mutations in homologous sequences maintain the RNA folded architecture. One can learn which variations are neutral and which molecular interactions are conserved through species. When several homologous three-dimensional structures are available, one can learn also how molecular accommodations occur and how ions or water molecules compensate for the loss of some interaction. By analysis of sequence alignments coupled with structures, one can also visualize how molecular units can swap or interchange and determine which interactions are opportunistic and not critical for form and function. Integrated databases with sophisticated techniques for guaranteeing the interoperability of data will need to be developed to exploit fully the one-dimensional and the three-dimensional data. In complexes between RNA and proteins, the diversity and multiplicity of intermolecular contacts are enormously potentiated. The understanding of RNP formation and structure is a major challenge for years to come.
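The covariation idea described above, in which pairs of alignment columns that change together while preserving Watson–Crick complementarity point to base-paired helices, can be illustrated with a deliberately simple sketch. The function name `covarying_pairs`, the `min_variants` parameter, and the toy alignment in the usage note are assumptions made for illustration; real comparative analyses use gapped alignments, statistical scoring, and allowance for G-U pairs, none of which is attempted here.

```python
from itertools import combinations

# Canonical Watson-Crick pairs only; wobble pairs are left out to keep the sketch small.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def covarying_pairs(alignment, min_variants=2):
    """Return column pairs (i, j), i < j, that are Watson-Crick complementary in
    every sequence and show at least `min_variants` distinct pair types."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i, j in combinations(range(length), 2):
        observed = {(seq[i], seq[j]) for seq in alignment}
        # Complementarity must hold everywhere, and the pair must actually vary.
        if observed <= WATSON_CRICK and len(observed) >= min_variants:
            hits.append((i, j))
    return hits
```

For the toy alignment `["GGGAAACC", "AGGAAACU", "GCGAAAGC"]`, the function returns `[(0, 7), (1, 6)]`, the two nested pairs of a small hairpin whose loop occupies columns 2 to 5. Requiring at least two distinct pair types is what separates covariation from mere conservation: an invariant G-C pair is compatible with a helix but provides no evidence for it.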
A short history of RNA modeling and RNA assembly computational tools
In the 1980s, the PDB also accepted structures derived by computational modeling. One of us did deposit several structures of RNA modeled on the basis of sequence alignments and/or chemical and enzymatic probing in solution. While PDB may have accepted these submissions somewhat reluctantly, many of these modeled structures were of intense interest to the growing crowd of RNA aficionados, and their accessibility stimulated experimentalists to put them to the test. After presenting the modeling of the core of group I introns (
) at The 49th Pittsburgh Diffraction Conference, Columbus, in November 1991, there was almost a riot in the room with half of the participants arguing for and the other half against the validity of the procedure. These modeled structures can no longer be found on the PDB but can be retrieved from the following website https://eric-westhof.ibmc.cnrs.fr/. Interestingly, wwPDB recently held a workshop on data standards for integrative structural models (
) models. They stated very clearly the objectives that are still of value today: “We do not necessarily believe that our models describe the actual molecular conformations to an accuracy of a few hundredths of an angstrom, but the analysis shows that a model with the general characteristics we propose can be built with acceptable stereochemistry. Only if model building is treated as a rigid discipline with strict attention paid to detailed stereochemistry can the results of a study such as this be considered reliable and meaningful.” They did identify and rationalize the correct stacking of the five nucleotides on the 3’-end of the loop with the 3’-strand of the anticodon helix (called 3’-stack). Michael Levitt assembled in 1969 a whole tRNA (
) (Fig. 3, A–C). Interestingly, contacts between the D- and T-loops are proposed as well as base pairs between residues 8 and 14 as well as 15 and 48 (now sometimes called the Levitt pair), but both in the usual Watson–Crick configurations instead of the observed trans configuration (Fig. 4, D and E). Sequence analysis coupled with model building led to the suggestion that some RNAs can form pseudoknots, for example, between a hairpin loop and a single strand (meaning that the single strand does not go through the loop) (
) published a model of the core of group I intron on the basis of the available experimental data and with clear model building principles that are still valid. They are worth restating: “(i) RNA duplexes were assumed to have A-form RNA helix conformation. (ii) If two duplex stems were separated by fewer than three unpaired nucleotides, they were stacked collinearly. (iii) If one helix competed with two others for collinear stacking, the two helices separated by the least number of unpaired nucleotides were chosen to form a stacked helix. (iv) Non-Watson-Crick base-pairing was allowed at the junction of two helices. (v) Single “bulged” bases were stacked within a helix. (vi) If two conserved bases in a single-stranded region were in proximity, base-pairing was attempted subject to the constraints of the chemical-modification data. (vii) Loop conformations were taken from those in the tRNA structure.” Kim had built the tRNA structure in electron density in Alex Rich’s laboratory (
). Afterward, with the development of molecular graphics and refinement programs, full molecular models with coordinates could be generated (Fig. 7).
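Rules (ii) and (iii) of the model-building principles quoted above are simple enough to express as a short procedure. The sketch below is one possible reading, not the original authors' implementation: it assumes that each helix offers a single end for stacking at a given junction, so that a helix can take part in at most one stack, and it resolves ties by input order, a case the quoted rules do not address. The function name `choose_coaxial_stacks` and the helix labels are hypothetical.

```python
def choose_coaxial_stacks(linkers, max_unpaired=2):
    """Pick coaxial stacks from (helix_a, helix_b, n_unpaired) linker descriptions.

    Rule (ii): two helices separated by fewer than three unpaired nucleotides
    (n_unpaired <= max_unpaired) are candidates for collinear stacking.
    Rule (iii): when one helix competes for two partners, the pair separated
    by the fewest unpaired nucleotides is chosen.
    """
    chosen = []
    used = set()
    # Shortest linkers first; sorted() is stable, so ties keep input order.
    for helix_a, helix_b, n_unpaired in sorted(linkers, key=lambda t: t[2]):
        if n_unpaired > max_unpaired:
            break  # every remaining linker is too long to satisfy rule (ii)
        if helix_a in used or helix_b in used:
            continue  # this helix is already stacked (assumption: one stack per helix)
        chosen.append((helix_a, helix_b))
        used.update((helix_a, helix_b))
    return chosen
```

For a hypothetical three-way junction described by `[("H1", "H2", 1), ("H2", "H3", 0), ("H3", "H1", 5)]`, helix H2 could stack on either H1 or H3; the function returns `[("H2", "H3")]` because that pair is separated by fewer unpaired nucleotides.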
Figure 7 (bottom) shows the evolution of the computer tools necessary for the manipulation and refinement of RNA structures. The advent of computer graphics and the development of systems allowing the display and the manipulation of molecular objects in real time, pioneered by Levinthal (
), were breakthroughs in structural biology. Several of these tools were developed for crystallography and modeling purposes as well. Model building is the process in crystallography (or cryo-EM) in which one actually assembles a molecular model by fitting molecular objects into electron density (manually either with physical objects in a Richards box or with computer graphics, a process now done almost automatically). The model building is guided and constrained by the electron density. The molecular models are then refined geometrically and stereochemically so as to best satisfy the crystallographic data or the electron density. In ab initio modeling, the molecular objects are built without the constraints of crystallographic data. The model building can be driven by geometric and physicochemical energetic terms in (semi-)automatic fashion by various computing tools (for RNA, see the recent reviews (
)). Such tools should integrate the knowledge, accumulated at a given time, on the molecular systems to be modeled. This knowledge integration is still a formidable challenge owing to the diversity of molecular interactions to consider and treat simultaneously (weak forces, water molecules, ions, etc.). When performed manually with physical objects, model building is still very powerful to assimilate and understand the rules underlying the folding of macromolecules. Richard Hamming said, “The purpose of computing is insight, not numbers.” Similarly, the purpose of modeling is insight, not models. One can also recall the famous quote from Richard Feynman: “What I cannot create, I do not understand.”
The point of archiving 3D models is to provide a record of previous modeling efforts that can help to benchmark methodological progress in computational modeling and to develop metrics to assess contemporary modeling efforts (see (
). The RNA-Puzzles consortium is a community-wide assessment of RNA 3D structure prediction that aims to expose the bottlenecks in current RNA 3D structure prediction and promote the improvement of these prediction methods. RNA-Puzzles attempts to provide the RNA modeling community what CASP (Critical Assessment of Methods of Protein Structure Prediction) has been doing for protein modeling (
In 2003, three organizations, the Research Collaboratory for Structural Bioinformatics (RCSB) in the United States, the Macromolecular Structure Database (MSD) at EBI, and the PDBj in Osaka, created the worldwide PDB (wwPDB; http://www.wwpdb.org/) with the goal of maintaining a single archive of structure data of macromolecules that is freely and publicly available to the global community (
). The Biological Magnetic Resonance Data Bank is now also part of wwPDB, and soon the China Protein Data Bank will join. This structure also facilitates unified deposition, curation, and distribution of structures to the growing global community of structural biologists and scientists that use 3D data (
). At the same time, each independent center is free to innovate new visualization and analysis tools to promote the reuse of the data in creative and useful ways for nonspecialists. Each center is also engaged in creating educational materials to make structural biology accessible to students at all levels, for example, PDB101’s Molecule of the Month and the PDBj’s Encyclopedia of Protein Structures (
): “It is the crystallographer’s responsibility to make sure that incorrect protein structures do not reach the literature.” They could as well have written: “It is the structural biologist’s responsibility to make sure that incorrect macromolecular structures do not reach the literature,” which would encompass also NMR and cryo-EM technologies as well as nucleic acids and their complexes. PDB continues to adapt in the new century by adopting new data formats to accommodate the much larger structures being solved. The development of metrics for assessing structure validity both globally and locally at the nucleotide level is proving key to understanding the significance of structural variation and mapping out the dynamic behavior of nucleic acid structures in response to interactions with small molecules, proteins, and other nucleic acids. Cryo-EM is rapidly displacing X-ray crystallography for very large complexes and molecular machines that cycle through multiple functional states and are hard to crystallize in unique states. New metrics for cryo-EM are in progress and should soon be available (
). In this context, referees and journal editors also have a major role to play in preventing incorrect structures from reaching the literature and, worse, the archival databases. Referees are encouraged to request (and in many cases are actually provided with) complete and detailed statistics tables, validation reports and quality indicators, coordinates, and electron-density maps. Journal editors, as well as authors, should comply with such requests despite fierce competition for publication. The PDB has played an important role by providing sophisticated metrics and automation that give rapid feedback to contributors of data when they submit structures.
Databases constitute an absolute necessity for advancing science, to facilitate the work of scientists in the same and related disciplines. They are repositories of observational data organized with carefully curated dictionaries and controlled vocabularies upon which future science can build and develop. In the immediate future, there are at least two main challenges to which we would like to draw attention. Owing to the sizes and complexities of modern databases, these two challenges are interconnected and interdependent.
First, the sustainability and maintenance of databases for the use of a now internationalized community of scientists raise the issue of how the associated costs shall be equitably apportioned; these are recurrent costs to support trained and competent personnel, together with the maintenance and constant upgrading of the extensive infrastructure required to keep up with the growth in the size and complexity of the data. Short-term grants with a dedicated focus are not suited to maintaining large international resources (
). The Global Biodata Coalition is an initiative started by the International Human Frontier Science Program Organization that according to their website (https://globalbiodata.org) aims “to stabilize and ensure sustainable financial support for the global biodata infrastructure and in particular to identify for prioritized long-term support a set of Global Core Data Resources that are crucial for sustaining the broader biodata infrastructure.”
And, secondly, databases fundamentally comprise well-organized archives of chosen sets of objects. In structural biology, the objects are mainly molecular sequences and structures. However, as these two types of data attach to the same natural entities, they need to be integrated to grasp biological function and evolution more holistically. New developments in cryo-electron tomography (
), which allows visualization of macromolecules in situ, will open up visualization of macromolecules in their cellular environment. Maybe one day we will also include digital descriptions of phenotypes, including bones, skeletons, or wings in order to relate their forms and colors to the underlying genes and genetic networks. Launched by the RNAcentral Consortium in 2014 at the EMBL-EBI, Wellcome Genome Campus (Hinxton, UK), RNAcentral (https://rnacentral.org) integrates and unifies access to all types of noncoding RNA sequences from all organisms. This is a major enterprise that goes in the right direction for the integration and interoperability of data. A bridge between sequences and standard secondary structures has been achieved recently (
). In the future this bridging integration should be extended to three-dimensional structures and thus the PDB and the NDB.
In the light of evolution
It would take a whole book with several dozens of figures to convey the amazing knowledge accumulated on RNA structures and architectures since the beginning of RNA structural biology and the advent of the PDB. Only glimpses of RNA structural biology are presented here, and we would like to apologize for not mentioning the contributions of many structural biologists, especially those using NMR spectroscopy, a field that contributed several seminal structures, for example, on RNPs and telomerase (
). Almost each RNA structure still brings some surprising and unexpected features, and these unveil the constraints of the physicochemical interactions and the congruent accommodations due to the historical contingencies of biological evolution. If we can cite again T. Dobzhansky (
): “Seen in the light of evolution, biology is, perhaps, intellectually the most satisfying and inspiring science. Without that light it becomes a pile of sundry facts, some of them interesting or curious but making no meaningful picture as a whole.” A quote from Ernest Rutherford states “All science is either physics or stamp collecting,” which is in sharp contrast to the lifelong accumulation of observations and data by the biologist Charles Darwin. The RNA structures in the PDB are absolutely not a pile of miscellaneous items, “sundry facts”; they are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on earth.
Technological improvements and breakthroughs are at the heart of discovery and scientific progress. One may recall the famous quote from Sydney Brenner “progress in science depends on new techniques, new discoveries and new ideas, probably in that order” (
), we admire in awe the progress and breakthroughs. Could such tools be applied to RNA structure prediction? Since they are based on the analysis of hundreds of structures, do we have enough RNA structures sufficiently nonredundant? As remarked by Jane Richardson (
), among the “1491 RNA-only structures, just 373 of which have a chain ≥60 nucleotides” and of those a large number consists of ribosomal structures with various ligands or from various biological sources. As appropriately reminded by John Helliwell (
). Together with the editors, we would like to dedicate this article to his memory. I have been fortunate to exchange ideas and collaborate with Neocles for more than 20 years. Beyond the many scientific papers we published together, which were always occasions for in-depth, dynamic, and fair discussions, Neocles was an intensely human, joyous, and positive friend whose company was a comfort. I admired him deeply for his move toward local politics (in 2019, he was elected to the City Council of Bowling Green, Ohio) and his involvement in many social and climate issues. He applied his intellectual power to these issues as thoroughly and seriously as he did to the various scientific problems he tackled. His untimely and brutal death is a loss to the local community around his university and town, but also to all the RNA scientists around the world who use the concepts, tools, and databases he helped to develop. Neocles earned his PhD from Yale with Peter Moore as mentor. He became professor of Chemistry at Bowling Green State University in 1987. With Peter Moore, he used NMR to study the 5S rRNA and showed the key role of Mg ions and the presence of non-Watson–Crick pairs. Later, using sequence comparisons, he identified loop E modules in RNA. Together we published a nomenclature for classifying non-Watson–Crick base pairs logically and unambiguously. Neocles Leontis and his group at Bowling Green developed many websites and tools for analyzing and compiling RNA structures.
Conflict of interest
The authors declare that they have no conflicts of interest with the contents of this article.
We wish to thank the numerous scientists, engineers, technicians who have been involved since the origins in the development and maintenance of the PDB. We gratefully acknowledge also the funding bodies that offered to the scientific community worldwide this essential component of its everyday life.
E. W. and N. B. L. wrote the article, did the bibliographic research, and made first drafts of the drawings and figures.