Advertisement

SARS-CoV-2 (COVID-19) structural and evolutionary dynamicome: Insights into functional evolution and human genomics

Open AccessPublished:June 25, 2020DOI:https://doi.org/10.1074/jbc.RA120.014873
      The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has challenged the speed at which laboratories can discover the viral composition and study health outcomes. The small ∼30-kb ssRNA genome of coronaviruses makes them adept at cross-species spread while enabling a robust understanding of all of the proteins the viral genome encodes. We have employed protein modeling, molecular dynamics simulations, evolutionary mapping, and 3D printing to gain a full proteome- and dynamicome-level understanding of SARS-CoV-2. We established the Viral Integrated Structural Evolution Dynamic Database (VIStEDD at RRID:SCR_018793) to facilitate future discoveries and educational use. Here, we highlight the use of VIStEDD for nsp6, nucleocapsid (N), and spike (S) surface glycoprotein. For both nsp6 and N, we found highly conserved surface amino acids that likely drive protein–protein interactions. In characterizing viral S protein, we developed a quantitative dynamics cross-correlation matrix to gain insights into its interactions with the angiotensin I–converting enzyme 2 (ACE2)–solute carrier family 6 member 19 (SLC6A19) dimer. Using this quantitative matrix, we elucidated 47 potential functional missense variants from genomic databases within ACE2/SLC6A19/transmembrane serine protease 2 (TMPRSS2), warranting genomic enrichment analyses in SARS-CoV-2 patients. These variants had ultralow frequency but existed in males hemizygous for ACE2. Two ACE2 noncoding variants (rs4646118 and rs143185769) present in ∼9% of individuals of African descent may regulate ACE2 expression and may be associated with increased susceptibility of African Americans to SARS-CoV-2. We propose that this SARS-CoV-2 database may aid research into the ongoing pandemic.
      The current SARS-CoV-2 outbreak has become a global pandemic. There is an urgent need to understand the proteins coded by SARS-CoV-2 and how they can be targeted for intervention. Coronaviruses belong to the Orthocoronavirinae subfamily, which lies under the Coronaviridae family. Their 26–33-kb genome consists of positive-sense, single-stranded RNA, coding for nonstructural and structural proteins. To date, seven coronaviruses have been discovered that are capable of human-to-human transmission. Four of these cause the common cold (HKU1, NL63, OC43, and 229E), whereas the other three (MERS-CoV, SARS-CoV, SARS-CoV-2) can cause more severe respiratory illnesses resulting in multisystem organ failure and death (
      • Kasmi Y.
      • Khataby K.
      • Souiri A.
      • Ennaji M.M.
      Coronaviridae: 100,000 years of emergence and reemergence.
      ). SARS-CoV-2 shares 79% genomic similarity with SARS-CoV, linking it to the bat and human SARS-CoV annotation. Both SARS-CoV and SARS-CoV-2 bind to the ACE2 receptor through the spike (S) protein (
      • Yan R.
      • Zhang Y.
      • Li Y.
      • Xia L.
      • Guo Y.
      • Zhou Q.
      Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.
      ,
      • Kuba K.
      • Imai Y.
      • Rao S.
      • Gao H.
      • Guo F.
      • Guan B.
      • Huan Y.
      • Yang P.
      • Zhang Y.
      • Deng W.
      • Bao L.
      • Zhang B.
      • Liu G.
      • Wang Z.
      • Chappell M.
      • et al.
      A crucial role of angiotensin converting enzyme 2 (ACE2) in SARS coronavirus-induced lung injury.
      ). The ubiquitous presence of Coronaviridae in many animals and its relatively small genome makes these an ideal infective agent as it adapts and evolves into a highly effective pathogen. In the case of the SARS-CoV-2 genome, 96.2% of the genome is shared with a bat coronavirus, suggesting a zoonotic origin (
      • Ceraolo C.
      • Giorgi F.M.
      Genomic variance of the 2019-nCoV coronavirus.
      ,
      • Andersen K.G.
      • Rambaut A.
      • Lipkin W.I.
      • Holmes E.C.
      • Garry R.F.
      The proximal origin of SARS-CoV-2.
      ). Initial disease propagation was detected in Wuhan, China, a major transportation hub with over 11 million people. The population density and the heavy traffic into and out of the city made for a large outbreak that spread quickly throughout the world (
      • Guarner J.
      Three emerging coronaviruses in two decades: the story of SARS, MERS, and now COVID-19.
      ,
      • Dong E.
      • Du H.
      • Gardner L.
      An interactive web-based dashboard to track COVID-19 in real time.
      ). This combination has given rise to a once-in-a-generation pandemic.
      Insights can be gathered from SARS-CoV and MERS-CoV regarding SARS-CoV-2 illness severity. Studies in mouse models have led to speculation that SARS-CoV and MERS-CoV infections cause a delayed type I interferon response, which allows for early uncontrolled viral replication. This leads to an influx of neutrophils and monocytes/macrophages, resulting in a hyperproduction of cytokines causing pneumonia, acute respiratory distress syndrome, and global sepsis. In SARS-CoV-2, patients experience similar changes in neutrophils and lymphocytes, indicating that SARS-CoV-2 infection severity may closely depend on a delayed response of the innate immune system (
      • Prompetchara E.
      • Ketloy C.
      • Palaga T.
      Immune responses in COVID-19 and potential vaccines: lessons learned from SARS and MERS epidemic.
      ). Early Chinese data provide remarkable insights into how SARS-CoV-2 drives lethality through a sepsis-driven multiple-organ failure model (
      • Zhou F.
      • Yu T.
      • Du R.
      • Fan G.
      • Liu Y.
      • Liu Z.
      • Xiang J.
      • Wang Y.
      • Song B.
      • Gu X.
      • Guan L.
      • Wei Y.
      • Li H.
      • Wu X.
      • Xu J.
      • et al.
      Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study.
      ), including early spike in ferritin, cytokine storm, and injury to the cardiac system (
      • Mehta P.
      • McAuley D.F.
      • Brown M.
      • Sanchez E.
      • Tattersall R.S.
      • Manson J.J.
      • HLH Across Speciality Collaboration, UK
      COVID-19: consider cytokine storm syndromes and immunosuppression.
      ,
      • Shi S.
      • Qin M.
      • Shen B.
      • Cai Y.
      • Liu T.
      • Yang F.
      • Gong W.
      • Liu X.
      • Liang J.
      • Zhao Q.
      • Huang H.
      • Yang B.
      • Huang C.
      Association of cardiac injury with mortality in hospitalized patients with COVID-19 in Wuhan, China.
      ). Understanding of the SARS-CoV-2 genome could provide critical insights into the complex interplay with the host genome driving disease progression in severe risk group patients, allowing for the identification of potential target sites for intervention.
      Several Coronaviridae proteins have been highly studied as targets of intervention to prevent infection spread. One of these proteins, the N protein, is of particular interest as it interacts with host ribosomal subunits and has been shown to suppress nonsense-mediated decay of viral mRNA by the host cell (
      • Wada M.
      • Lokugamage K.G.
      • Nakagawa K.
      • Narayanan K.
      • Makino S.
      Interplay between coronavirus, a cytoplasmic RNA virus, and nonsense-mediated mRNA decay pathway.
      ,
      • Malle L.
      A map of SARS-CoV-2 and host cell interactions.
      ). Enzymes encoded by SARS-CoV-2, such as the 3-chymotrypsin (3C)-like protease, RNA-dependent RNA polymerase, and papain-like protease, are potential targets of drugs (
      • Wu C.
      • Liu Y.
      • Yang Y.
      • Zhang P.
      • Zhong W.
      • Wang Y.
      • Wang Q.
      • Xu Y.
      • Li M.
      • Li X.
      • Zheng M.
      • Chen L.
      • Li H.
      Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods.
      ). Proteins of the virus, including N, nsp9, nsp13, nsp15, ORF3a, and ORF6, have been shown to target innate immune signaling pathways (
      • Malle L.
      A map of SARS-CoV-2 and host cell interactions.
      ). The S protein, surface-expressed, enters cells through ACE2/SLC6A19 membrane receptor contact similarly to SARS-CoV, but with higher binding affinity (
      • Yan R.
      • Zhang Y.
      • Li Y.
      • Xia L.
      • Guo Y.
      • Zhou Q.
      Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.
      ). This interaction has the potential to be therapeutically inhibited through antibody neutralization (
      • Walls A.C.
      • Park Y.-J.
      • Tortorici M.A.
      • Wall A.
      • McGuire A.T.
      • Veesler D.
      Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein.
      ). ACE2 is highly expressed in the heart, small intestine, kidney, thyroid, breast, arterial walls, adipose, and testis, with a lower number of cells in lung, oral/nasal cavity, pancreas, and liver (
      • Hamming I.
      • Timens W.
      • Bulthuis M.L.C.
      • Lely A.T.
      • Navis G.J.
      • van Goor H.
      Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus: a first step in understanding SARS pathogenesis.
      ). Nasal epithelial cells play a critical role in SARS-CoV-2 (
      • Sungnak W.
      • Huang N.
      • Bécavin C.
      • Berg M.
      • Queen R.
      • Litvinukova M.
      • Talavera-López C.
      • Maatz H.
      • Reichart D.
      • Sampaziotis F.
      • Worlock K.B.
      • Yoshida M.
      • Barnes J.L.
      • HCA Lung Biological Network
      SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes.
      ). In the lung, ACE2 is expressed in the type 2 pneumocytes, putative progenitor cells of alveolar epithelia linked to lung fibrosis pathways (
      • Uhal B.D.
      • Dang M.
      • Dang V.
      • Llatos R.
      • Cano E.
      • Abdul-Hafez A.
      • Markey J.
      • Piasecki C.C.
      • Molina-Molina M.
      Cell cycle dependence of ACE-2 explains downregulation in idiopathic pulmonary fibrosis.
      ,
      • Barkauskas C.E.
      • Cronce M.J.
      • Rackley C.R.
      • Bowie E.J.
      • Keene D.R.
      • Stripp B.R.
      • Randell S.H.
      • Noble P.W.
      • Hogan B.L.M.
      Type 2 alveolar cells are stem cells in adult lung.
      ). The spike-ACE2 complex is further processed by TMPRSS2 for internalization of the virus, which has been postulated to be a target site with protease inhibitors (
      • Hoffmann M.
      • Kleine-Weber H.
      • Schroeder S.
      • Krüger N.
      • Herrler T.
      • Erichsen S.
      • Schiergens T.S.
      • Herrler G.
      • Wu N.-H.
      • Nitsche A.
      • Müller M.A.
      • Drosten C.
      • Pöhlmann S.
      SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor.
      ). Currently, it is not well-understood how the spike-ACE2 complex is impacted by viral or human variants, suggesting a need for further research.
      As we learn more about viral pathogenesis, hopefully the current outbreak will be brought under control and future outbreaks prevented. In this current work, we developed the Viral Integrated Structural Evolution Dynamic Database (VIStEDD) for the SARS-CoV-2 proteome, enriching VIStEDD using evolutionary insights of viruses and human variant mapping for potential functional outcomes.

      Results

      SARS-CoV-2 dynamicome database

      A total of 24 proteins of SARS-CoV-2 (Table 1) were run through our standardized workflow (Fig. 1), consisting of protein structure assessment, setting of protein protonation at pH 7.4, energy minimization in water with NaCl, 20 ns of molecular dynamics simulations, analysis of the movement trajectories, sequence identification from the nonredundant (nr) database, mapping of conservation onto structure/dynamics, and assessment of interactions with known binding partners. We extracted 55,390 sequences homologous to the SARS-CoV-2 proteins that consist of 9,701 amino acids (Table 1). The uneven sequence depth for each of the proteins made it necessary to utilize a z-score conservation calculation for each of the amino acids in the proteins so as to normalize, using the average (z-score of 0) as the starting point (yellow), followed by z-scores of 0–0.5 (yellow), 0.5–1 (bright orange), 1–1.5 (orange), 1.5–2 (dark orange), and >2 (red) (Fig. 2). The dynamics and evolution for each protein were integrated together into VIStEDD, available at RRID:SCR_018793. VIStEDD has been built for the addition of future viruses. Within the SARS-CoV-2 page is a list of each of the 24 proteins in the format of Table 1, where each protein can be clicked to assess data. On the page for each protein is a link to the individual protein data folder system, a video of the protein rotating with conservation, details of the protein function, a widget to purchase a 3D print of the protein at cost of production, the amino acid movement from molecular dynamics simulations (mds), and the table of data for each amino acid of the protein. If protein interactions structures are known, information is present with a link to protein–protein interaction (PPI) data. For example, within the nsp10 data (RRID:SCR_018793, SARS-CoV-2, nsp10), structures of nsp10 interacting with either nsp14 or nsp16 are available, both of which show highly conserved contact sites of interaction. As we continue to advance VIStEDD, we anticipate the addition of more material within each page.
      Table 1SARS-CoV-2 proteins analyzed
      GeneSARS proteinAccessionModeling toolPDB filesAmino acidsBound moleculesSequencesRMSFDCCM >0.9CHETG
      repnsp1QHD43415YASARA2GDT1802503.63.143.321.122.211.12.2
      repnsp2QHD43415ITASSER6382466.011.867.119.10.612.50.6
      repPapain-like proteinaseQHD43415YASARAMultiple1945Zn3,1803.18.850.524.613.111.30.5
      repnsp4QHD43415ITASSER5003,3252.83.057.835.00.07.20.0
      rep3C-like proteinaseQHD43415YASARAMultiple306Dimer3,3971.40.626.523.931.716.71.3
      repnsp6QHD43415ITASSER2902,5582.60.762.814.11.421.70.0
      repnsp7QHD43415YASARA6NUR, 2AHM, 3UB0833,2562.93.08.481.90.09.60.0
      repnsp8QHD43415YASARA6NUR, 3UB0, 5XOG1983,3395.98.919.252.019.79.10.0
      repnsp9QHD43415YASARAMultiple113Dimer3,3861.50.120.414.247.817.70.0
      repnsp10QHD43415YASARAMultiple139Zn3,3441.80.444.628.17.220.10.0
      repRNA-directed RNA polymeraseQHD43415YASARA6NUR932Zn5,0862.22.830.542.011.513.32.8
      repHelicaseQHD43415YASARA5WWP, 6JYT601Zn5,5981.51.535.826.623.514.10.0
      repGuanine-N7 methyltransferaseQHD43415YASARA5NFY, 5C8T527Zn2,7942.12.436.120.124.117.52.3
      repUridylate-specific endoribonucleaseQHD43415YASARAMultiple3462,4891.30.437.923.429.87.81.2
      rep2′-O-methyltransferaseQHD43415YASARA2XYV, 5YNN298Na, Mg, SAH2,4951.30.040.323.818.113.44.4
      SSpike glycoproteinQHD43416.1YASARA6CRW, 6NB6, 5X581273Trimer/Membrane6,6123.05.440.423.025.310.40.9
      3aProtein 3aQIM47458.1ITASSER275652.70.673.812.02.511.60.0
      EEQIM47459.1ITASSER75944.82.218.758.70.016.06.7
      MMembrane proteinQIM47460.1ITASSER2221,5072.70.551.820.77.719.80.0
      6ORF6QIM47461.1ITASSER61319.28.257.56.49.126.01.0
      7a7aQHD43421.1YASARA1XAK, 1YO4121426.56.533.117.432.217.40.0
      8ORF8QIM47463.1ITASSER121352.20.469.410.70.019.80.0
      NNucleoproteinQHD43423.2YASARA/ITASSERMultiple4192,2612.70.247.526.20.019.76.6
      10ORF10QIM47465.1ITASSER382.60.042.131.626.30.00.0
      Figure thumbnail gr1
      Figure 1SARS-CoV-2 structural/evolution dynamicome workflow. Shown are data for the RNA-directed RNA polymerase.
      Figure thumbnail gr2
      Figure 2SARS-CoV-2 structural/evolution models. Shown in the middle is the viral RNA with protein-coding genes in red. Gray lines connect the RNA region to each protein. Colors on the models are based on z-score levels of conservation for each protein based on extracted sequences (). Gray amino acids fall below the average conservation (value <0) for each protein; yellow, 0–0.5; light orange, 0.5–1; orange, 1–1.5; dark orange, 1.5–2; red, >2.
      The raw data of each protein is the strength of VIStEDD (drive.google.com/drive/folders/1dXBJpLo3bay1JQ9BckUsVcTViv6P0w1q?usp=sharing). For each protein present in VIStEDD, we have generated in the root folder of the protein a fasta sequence file, PDB file of the protein structure, protein models with conservation (sce = YASARA scene, pse- PyMOL scene), high-resolution image of conservation, molecular video of the conservation rotating around the y axis (mpg and mp4), and compiled conservation and dynamics data for each amino acid (csv or tab-delineated). Five folders of data are also present: 1) 3D, containing a vrml 3D printing file (also zipped) of conservation mapped on the protein; 2) genomics, containing aligned reads of the species sequences extracted; 3) mds, containing all of the trajectory files for the mds; 4) report, containing all of the analysis files from YASARA assessment of mds; and 5) tab, containing all of the tab-delineated analysis files of the mds. All of the 3D files can be ordered from Shapeways with the web links provided in Table S1. Another folder (PPI) contains data for all of the mds performed on protein–protein interactions, including spike-ACE2-SLC6A19, TMPRSS2, the polymerase complex, the N complex, and ACE2_S_Database consisting of 235 species ACE2 sequences modeled with spike interaction, energy-minimized, and binding or potential energy calculated. A tab-delineated file is in the ACE2_S_Database to label the species for each numbered complex.
      From the mds of all proteins, a total of 6,594,981 atoms including water, there is an average movement per amino acid (root mean square fluctuation (RMSF)) of 3.2 Å with 3 amino acids correlated per residue >0.9 based on dynamics cross-correlation calculation. The secondary structures of the proteins are relatively similar from the beginning of the simulation compared with the end (Fig. 3A), with the largest percentage coiled (C, 45.02%) followed by helix (H, 25.42%), β-sheet (E, 15.08%), turns (T, 13.44%), and three-turn helix (G, 1.05%). On their own, before PPI, the nsp7, nsp8, nsp9, E, 3C-proteinase, and RNA-directed RNA polymerase are the most helical and β-sheet–containing proteins, whereas protein 3a, ORF8, nsp2, nsp6, and nsp4 are the most disordered with coil composition (Fig. 3B). Several of the proteins contain high movement of the structure (ORF6, ORF7a, nsp8, and nsp2), and two of the proteins (papain-like proteinase and spike glycoprotein) have more overall lower movement indicative of hydrophobic collapse, with a high number of correlated amino acids per residue (Fig. 3C).
      Figure thumbnail gr3
      Figure 3SARS-CoV-2 structural/evolution statistics. A, the percentage of secondary structure for all proteins at the start of molecular dynamics simulations (gray) and at the end (red). Classes consist of coil (C), helix (H), β-sheet (E), turn (T), and 3-turn helix (G). B, breakdown for each protein for secondary structure percentage that is coil (x axis) versus helix/β-sheet (y axis). Those with the most helix/beta sheet or coiled are labeled. C, breakdown of molecular dynamics simulation data for each protein showing the average amino acid RMSF (Å, x axis) versus the average number of correlated amino acids per residue (y axis). Proteins with high movement are labeled. D, plot of the number of amino acids in each protein versus the number of BLAST-extracted sequences. Labeled are those that have high numbers of identified sequences (black) and those with only a few (red). E, the conservation z-score (x axis) versus the RMSF (y axis) for all amino acids analyzed in SARS-CoV-2. The lines represent cutoffs used for F, with those >5 Å for RMSF and 1–1.5 (gray), 1.5–2 (orange), or >2 (red) value for z-score cutoffs. F, the percentage of each protein's amino acids that fall into identified groups from E, representing identified highly dynamic and conserved amino acids. G, conservation/dynamics of nsp6 amino acids. Shown at the top is the RMSF of each amino acid of nsp6 with colors corresponding to cutoffs of E and F. Shown at the bottom is a sliding window calculation of 7 amino acids for additive z-scores to map two highly conserved sites (shown in H and I). H and I, protein model of nsp6 with z-score coloring of . Shown are the two sites of high conservation with amino acids labeled.
      The largest proteins of SARS-CoV-2 contain the highest number of sequences extracted for homology, including the papain-like proteinase, RNA-directed RNA polymerase, spike glycoprotein, and helicase (Fig. 3D). Several of the proteins, including E, protein 3a, protein 7a, ORF8, and ORF6, have a low number of mapped sequences (Fig. 3D), whereas ORF10 has no other identified sequences. Plotting the conservation relative to the mds-based movement, RMSF, for each of the 9,701 amino acids of SARS-CoV-2 can be used to identify critical sites of proteins under high selection that might be targeted (Fig. 3E). Highly dynamic and conserved amino acids are the prime location for critical PPI. Therefore, we mapped sites of high movement, >5 Å, with conservation 1–1.5 (gray), 1.5–2 (orange), or >2 (red) S.D. values higher than the mean of the protein. Clustering these sites to the percentage of amino acid reveals a likely high selection of dynamic amino acids (Fig. 3F). The nsp2 protein with low coverage of species has some suggested PPI contacts conserved in the range of 1–1.5 S.D. conservation. Proteins nsp7 and nsp8, which are known to contact the RNA-dependent RNA polymerase (Fig. 1), have several amino acids conserved in the 1.5–2 S.D. range with high dynamics, as the mds of PDB 6m71 and 7btf show stability and correlation to PPI within VIStEDD.

      nsp6 and conserved

      The papain-like protease and nsp6 have conserved sites >2 S.D. values (Fig. 3F). The SARS-CoV-2 nsp6 protein is known to interact with multiple ATPases of vesicle trafficking (
      • Gordon D.E.
      • Jang G.M.
      • Bouhaddou M.
      • Xu J.
      • Obernier K.
      • O'Meara M.J.
      • Guo J.Z.
      • Swaney D.L.
      • Tummino T.A.
      • Huettenhain R.
      • Kaake R.M.
      • Richards A.L.
      • Tutuncuoglu B.
      • Foussard H.
      • Batra J.
      • et al.
      A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
      ) and interacts with nsp3 and nsp4 to induce double-membrane vesicles (
      • Angelini M.M.
      • Akhlaghpour M.
      • Neuman B.W.
      • Buchmeier M.J.
      Severe acute respiratory syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles.
      ). Nsp6 protein also interacts with the Sigma receptor, which is thought to regulate ER stress response (
      • Gordon D.E.
      • Jang G.M.
      • Bouhaddou M.
      • Xu J.
      • Obernier K.
      • O'Meara M.J.
      • Guo J.Z.
      • Swaney D.L.
      • Tummino T.A.
      • Huettenhain R.
      • Kaake R.M.
      • Richards A.L.
      • Tutuncuoglu B.
      • Foussard H.
      • Batra J.
      • et al.
      A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
      ) and blocks ER-induced autophagosome/autolysosome vesicle that restricts viral production, leading to the generation of small autophagosome vesicles, thereby limiting their expansion (
      • Cottam E.M.
      • Whelband M.C.
      • Wileman T.
      Coronavirus NSP6 restricts autophagosome expansion.
      ). To date, no structures of nsp6 have been solved even though the protein is present in 2,558 Coronaviridae genomes. We identify two regions with minimal conserved hydrophobic collapse (Fig. 3G) consisting of mostly coiled secondary structure (Fig. 3B). These two regions of conservation (Fig. 3G) cluster together with multiple charged and aromatic amino acids that would tend to drive PPI (Fig. 3, H and I). Moving forward, these nsp6 sites could be critical regions to target with therapeutics for the broad Coronaviridae proteins.

      Nucleocapsid (N) data insights

      Both the S and N proteins have multiple sites conserved >2 S.D. with high dynamics (Fig. 3F), with N having around 4% of its amino acids falling into this category. These two proteins are further dissected below. The SARS-CoV-2 N protein has been shown to interact with multiple RNA processing and stress granule proteins (
      • Gordon D.E.
      • Jang G.M.
      • Bouhaddou M.
      • Xu J.
      • Obernier K.
      • O'Meara M.J.
      • Guo J.Z.
      • Swaney D.L.
      • Tummino T.A.
      • Huettenhain R.
      • Kaake R.M.
      • Richards A.L.
      • Tutuncuoglu B.
      • Foussard H.
      • Batra J.
      • et al.
      A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
      ). Using 2,261 sequences (Fig. 4A), we mapped four highly conserved regions of N (Fig. 4B). The conservation of region 1 contributes to a highly conserved hydrophobic, aromatic core consisting of several β strands (Fig. 4, C and D). Conserved region 3 consists of several serine amino acids that are likely phosphorylated and a potential 14-3-3 binding motif (Fig. 4D). Molecular dynamics of N suggests three regions of structural organization, with region 1 corresponding to a domain with structural folding (Fig. 4E). Amino acids 331–333 were the most conserved yet dynamic site of N (Fig. 4F). The C-terminal region of N is known to form a multimer complex (Fig. 4G) with amino acids 331–333 fitting into contacts of the subunits (Fig. 4H). From the multimer N structure, conserved amino acids Val-270, Phe-274, Arg-277, Asn-285, Gly-287, Phe-286, and Asp-288 are clustered and surface-exposed (Fig. 4I). These sites are likely to contribute to PPI and warrant future investigations.
      Figure thumbnail gr4
      Figure 4SARS-CoV-2 protein N (nucleocapsid) conserved dynamic amino acids. A, phylogenetic tree of 2,261 sequences extracted from the nr protein sequence database for N. B, conservation of amino acids based on sequences from A. Four regions of conserved function are identified in red. C, model of N with conservation (percentage of 2,261 sequences) colored (<50% (gray), 50–60% (yellow), 60–70% (light orange), 70–80% (orange), 80–90% (dark orange), and >90% (red)). The four regions of B are boxed and labeled in red. D, top conserved motifs in N. The z-scores for N conservation were placed on an 11-codon sliding window to identify regions of interest in the four conserved regions of B. E, dynamics cross-correlation matrix (DCCM) of amino acids on the nucleocapsid protein assessed with molecular dynamics simulations. F, intersection of conservation and dynamics of the protein with highly dynamic and conserved amino acids in red. G, multimeric model of N with coloring based on E. H, zoom-in view of black box from F showing amino acids 331–333 identified in D. I, conserved region 2 from B identifying the top amino acids on the multimer model.

      Posttranslational modification analysis

      Building on our insights for N, we expanded to a systematic analysis of posttranslational modifications for SARS-CoV-2 proteins that was integrated into our amino acid matrix insights (Fig. 5). Current literature on SARS-CoV-2 modifications focuses on conserved and novel glycosylation and phosphorylation sites of the S protein (
      • Xiao X.
      • Chakraborti S.
      • Dimitrov A.S.
      • Gramatikoff K.
      • Dimitrov D.S.
      The SARS-CoV S glycoprotein: expression and functional characterization.
      ). With host–pathogen interaction being regulated by modifications, it is now a target for pharmacotherapy. Data from SARS-CoV suggest modifications in the N, M, E, 3a, nsp4, and nsp9 proteins that have yet to be explored in SAR-CoV-2 (
      • Fung T.S.
      • Liu D.X.
      Post-translational modifications of coronavirus proteins: roles and function.
      ,
      • Nal B.
      • Chan C.
      • Kien F.
      • Siu L.
      • Tse J.
      • Chu K.
      • Kam J.
      • Staropoli I.
      • Crescenzo-Chaigne B.
      • Escriou N.
      • van der Werf S.
      • Yuen K.-Y.
      • Altmeyer R.
      Differential maturation and subcellular localization of severe acute respiratory syndrome coronavirus surface proteins S, M and E.
      ,
      • Oostra M.
      • de Haan C.A M.
      • de Groot R.J.
      • Rottier P.J.M.
      Glycosylation of the severe acute respiratory syndrome coronavirus triple-spanning membrane proteins 3a and M.
      ). Specifically, the N protein was shown to undergo extensive acetylation, phosphorylation, sumoylation, cleavage, and ADP-ribosylation (
      • Fan Z.
      • Zhuo Y.
      • Tan X.
      • Zhou Z.
      • Yuan J.
      • Qiang B.
      • Yan J.
      • Peng X.
      • Gao G.F.
      SARS-CoV nucleocapsid protein binds to hUbc9, a ubiquitin conjugating enzyme of the sumoylation system.
      ,
      • Surjit M.
      • Kumar R.
      • Mishra R.N.
      • Reddy M.K.
      • Chow V.T.K.
      • Lal S.K.
      The severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by 14-3-3-mediated translocation.
      ). Inhibition of several cellular kinases (CK2 and CDK) was shown to impact proper localization of the N protein, trapping it within the nucleus of the host cell, further highlighting the need for N protein phosphorylation to carry out proper function (
      • Surjit M.
      • Lal S.K.
      The SARS-CoV nucleocapsid protein: a protein with multifarious activities.
      ). With these PTMs playing a vital role in proper virion assembly, they must be further explored and understood in SARS-CoV-2 to elucidate all options for targeted pharmacotherapy.
      Figure thumbnail gr5
      Figure 5Posttranslational modification screening of SARS-CoV-2. A, ranking of functional predicted sites using 10 different tools. B and C, conservation z-score (B) and z-scores put on a five-codon sliding window for additive motif conservation (C) of each site from A for each tool shown as a box and whisker plot. D–F, top conserved sites (left) and least conserved sites that are unique to SARS-CoV-2 (right) for functional predictions. Colors of bars correspond to the tools of A. Each functional amino acid is labeled (protein, variant, secondary structure, and tool annotating PTM). Data are shown for the z-score of conservation (D), the 5-amino acid sliding window (E), and structural movement (F). G, the highest conserved site throughout the database found on spike at Ser-816 with conservation of amino acids around the site. H and I, highlighting the phosphorylation prediction (magenta) on spike for Ser-816 (H) with a zoom-in view of the site on one of the monomers (I).
      With high filters on each tool, we identify 186 NetPhos3.1 (phosphorylation), 15 SUMO1.0 (Sumo binding or SUMOylation), 27 SNO 1.0 (S-nitrosylation), 28 YNO21.0 (tyrosine nitration), 273 CCD1.0 (calpain cleavage), 36 Polo1.0 (polo-like kinases), 34 PUP1.0 (pupylation), 20 TSP1.0 (tyrosine sulfation), 25 PAIL2.0 (lysine acetylation), and 41 Lipid1.0 (lipidation) predicted modifications to SARS-CoV-2 viral proteins (Fig. 5A) with data available within our VIStEDD tools. Few modification predictions occur at highly conserved amino acids (Fig. 5B) or within 5-amino acid conserved motifs (Fig. 5C) based on our evolutionary analysis. Moreover, there are also very few unique modification predictions to the SARS-CoV-2 sequence not found throughout our evolutionary conservation. We have identified 22 modification predictions that are highly conserved and 28 sites poorly conserved, including multiple phosphorylation, lipidation, and acetylation events (Fig. 5, D–F). The most highly conserved motif predicted to be modified is Ser-816 of spike (Fig. 5G), where the amino acid is found surface-exposed on a loop of the protein (Fig. 5, H and I). Future work is desperately needed to further refine the modification sites within SARS-CoV-2.

      SARS-CoV-2 S interaction with host ACE2, SLC6A19, TMPRSS2 genomics

      The most highly researched protein of SARS-CoV-2, the S surface glycoprotein, is of most interest as it is the only portion outside of the virus that could be recognizable by B and T cells. The coronavirus S protein is the primary determinant of viral tropism and is responsible for receptor binding and membrane fusion. It is a large (∼180-kDa) glycoprotein that is present on the viral surface as a prominent trimer, and it is composed of two domains, S1 and S2 (
      • Belouzard S.
      • Millet J.K.
      • Licitra B.N.
      • Whittaker G.R.
      Mechanisms of coronavirus cell entry mediated by the viral spike protein.
      ). The S protein interacts with ACE2 to enter into cells, forming contacts with the ACE2-SLC6A19 dimer complex (Fig. 6A), where the presence of SLC6A19 is not required for spike-ACE2 binding. ACE2 servers a chaperone function on SLC6A19 and is able to stabilize the full ACE2 dimer complex in crystallization (
      • Yan R.
      • Zhang Y.
      • Li Y.
      • Xia L.
      • Guo Y.
      • Zhou Q.
      Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.
      ). We elected to build a full complex model for analysis to allow for simultaneous matrix generation for spike binding and SLC6A19 chaperone screening of genomic variants. This protein complex model was built through the integration of PDB structures 6CRW, 6NB6, and 5X58 for the trimer of spike proteins with 6M17, which shows the interaction of a fragment of spike with the ACE2-SLC6A19 dimer of dimers. From 236 species of ACE2, we determined the conservation of ACE2 amino acids, suggesting poor conservation of the S-ACE2 contact across all of vertebrate evolution (Fig. 6B). The lack of conservation at this site also suggests that ACE2 does not likely have a conserved interaction with another human protein that would compete with spike for function. Structural mapping of human variants from 141,456 people of the gnomADv2 database for ACE2 suggests a few possible variants at the interface of S-ACE2 interaction (Fig. 6C). To go from qualitative mapping to quantitative insights into human variants, we utilized mds of S-ACE2-SLC6A19 complex, determining amino acids that correlate in movement between the proteins (Fig. 6D). From these correlations we calculated the amino acids contributing to S-ACE interaction (red in Fig. 6, E and F), ACE2 dimerization (blue in Fig. 6, E and G), and ACE2-SLC6A19 interaction (magenta/yellow in Fig. 6, E, H, and I).
      Figure thumbnail gr6
      Figure 6Spike-ACE2-SLC6A19 dynamics/evolution to human variants. A, structural model of the spike (magenta), ACE2 (red/orange), and SLC6A19 (green/blue) in lipid membranes. Black box, region zoomed in in B and C. B, conserved amino acids in 236 species ACE2. Cutoff colors are as follows: 2 (highest) (red), 1.5 (dark orange), 1 and 1.25 (light orange), and 1 and 0.5 (yellow). C, missense variants from 141,456 people for ACE2. Cutoffs for allele frequency are shown on the model. Several yellow low frequency variants can be seen at the ACE2/spike contact. D, dynamics correlation matrix of interacting sites throughout simulation of the dimer complex. High correlations are shown in yellow. E, amino acids in red are those that correlate in D between S and ACE2; those shown as side chains and labeled have human variants. The four boxes are the sites for F–I. F, zoom-in view of variants in ACE2 predicted to alter spike contact (red). Lys-26 is labeled in magenta with its uncertain but relatively common variant. G, contact sites for ACE2 dimerization (blue). H and I, contact sites between SLC6A19 (magenta) and ACE2 (yellow) at two different regions.
      From this mds data, along with functional variant prediction tools (PolyPhen2, Provean, SIFT, Align-GVGD, and our conservation analysis), we systematically assessed functional human variants for ACE2, SLC6A19, and TMPRSS2. TMPRSS2 is involved in cleaving the complex for internalization (
      • Hoffmann M.
      • Kleine-Weber H.
      • Schroeder S.
      • Krüger N.
      • Herrler T.
      • Erichsen S.
      • Schiergens T.S.
      • Herrler G.
      • Wu N.-H.
      • Nitsche A.
      • Müller M.A.
      • Drosten C.
      • Pöhlmann S.
      SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor.
      ). Of these three proteins, ACE2 is the only one found on a sex chromosome (X-chromosome), linking it to male hemizygous status that elevates the impact of genomic variants. Variants included are linked to protein function (Table 2, Inclusion group = Functional), S contact (Table 2, Inclusion group = Spike contact), SLC6A19-ACE2 contact (Table 2, Inclusion group = ACE2 contact), posttranslational modifications (Table 2, Inclusion group = Glycosylation, Disulfide bond, or Phosphoserine), or known active-site amino acids (Table 2, Inclusion group = Zinc binding or Active site).
      Table 2Top functional genomic variants of ACE2, SLC6A19, and TMPRSS2
      InclusionProteinChromosomersIDAADamaging callsHemizygote countMaximum allele frequencyMaximum population
      Spike contactACE2Xrs4646116K26R02820.011876795Ashkenazi Jewish
      FunctionalSLC6A195rs781039193W530C500.000868709Latino
      ACE2 contactSLC6A195rs374559483P351S1.500.000640872African
      FunctionalSLC6A195rs374527866R513S4.500.000603136East Asian
      FunctionalSLC6A195rs141487939T180M5.500.000601878African
      FunctionalSLC6A195rs200783817P73L5.2500.000451309East Asian
      FunctionalSLC6A195rs752378013P109L5.500.000437158East Asian
      FunctionalSLC6A195rs141497538P230S5.500.000417246Other
      FunctionalSLC6A195rs552867213R328H500.000401091European (Finnish)
      GlycosylationACE2Xrs761944150N546D2.510.00036784African
      FunctionalSLC6A195rs374976872A271V5.500.000300933East Asian
      FunctionalSLC6A195rs199795977A329T500.000289519Ashkenazi Jewish
      FunctionalSLC6A195rs1253340252Y209S400.000278242Other
      FunctionalSLC6A195rs139182948R95W500.000277701Other
      FunctionalSLC6A195rs369804798S314L5.500.000276932Other
      FunctionalSLC6A195rs142164435R328C5.500.000247927European (non-Finnish)
      FunctionalSLC6A195rs762989809R57C5.2500.000218126East Asian
      GlycosylationACE2Xrs143158922N103H0.500.000210615African
      Zinc bindingACE2Xrs142984500H378R4.560.000194968European (non-Finnish)
      Spike contactACE2Xrs143936283E329G010.000190476Other
      FunctionalSLC6A195rs1403845937W56R500.000164582Other
      FunctionalSLC6A195rs201936518R95Q500.00016372Other
      FunctionalSLC6A195rs375879452T256M5.500.000150708East Asian
      FunctionalTMPRSS221rs762844469G391E5.2500.000144601Latino
      Spike contactACE2Xrs1348114695E35K010.000144436East Asian
      FunctionalSLC6A195rs757679627G93R5.500.00013885Other
      FunctionalTMPRSS221rs1306483136S460R5.2500.000138351Other
      FunctionalSLC6A195rs202220597T243M5.500.000130693South Asian
      FunctionalSLC6A195rs200745023L242P500.000130685South Asian
      FunctionalSLC6A195rs778015723P149L5.500.000130651South Asian
      FunctionalACE2Xrs200745906P263S5.2500.000125798European (non-Finnish)
      FunctionalSLC6A195rs765501634E405K5.2500.000120318African
      FunctionalSLC6A195rs748703513T330I500.000114404European (non-Finnish)
      Active siteACE2Xrs1395782023E375D3.2510.000109585Latino
      FunctionalSLC6A195rs769457402E165K5.2500.000108755East Asian
      FunctionalTMPRSS221rs145171279T459I5.500.000108731East Asian
      FunctionalSLC6A195rs756920378I325T5.2500.000108731East Asian
      GlycosylationTMPRSS221rs1326192818N213K1.508.67553E−05Latino
      Spike contactACE2Xrs781255386T27A007.30327E−05Latino
      GlycosylationSLC6A195rs766784542N158T2.506.53381E−05South Asian
      FunctionalACE2Xrs766319182M270V426.44787E−05European (non-Finnish)
      Disulfide bondTMPRSS221rs906113408C297Y605.99988E−05Latino
      GlycosylationSLC6A195rs146176472N258S1.504.06207E−05African
      FunctionalACE2Xrs756358940I291K5.2523.75756E−05European (non-Finnish)
      PhosphoserineSLC6A195rs754779609S17C203.57373E−05European (non-Finnish)
      Spike contactACE2Xrs961360700D355N4.2502.59141E−05European (non-Finnish)
      Spike contactACE2Xrs778030746I21V022.44636E−05European (non-Finnish)
      47 variants are ranked by the maximum allele frequency within the subpopulations of gnomAD. The ACE2 variant K26R has the highest allele frequency of any variant within the table, but the conserved polar basic amino acid at the spike contact likely does not impact binding. This means that there are only ultra-rare variants in these proteins, with SLC6A19 having the highest-impact variants at 29, ACE2 with 13, and TMPRSS2 with 5. Outside of the European non-Finnish population, the East Asian population carries 10 of these variants, “other” (those individuals not falling into other populations) with 9, African with 6, Latino with 6, and South Asian with 4. A total of 31 of the variants are predicted with a score of ≥4 (of 6 maximum) to be functional variants, 6 at the spike contact of ACE2, and 5 that would alter a glycosylation signal. The most interesting ACE2 variant, H378R (rs142984500), is found in 0.019% of European non-Finnish individuals and has been observed as hemizygous in 6 males of gnomADv2, is one of the critical residues of the Zn binding that drives the enzyme's function, and has not been previously published.
      ACE2's hemizygous nature warrants an investigation of noncoding variants that might influence expression. The ACE2 gene contains a 5′ region that suggests most gene regulation for ACE2 to occur within this region (chrX:15,612,899-15,641,393, hg19). We extracted all noncoding variants followed by an assessment with RegulomeDB (Table S2). Five total variants are identified to potentially alter transcriptional regulation by RegulomeDB score. Two of these variants (rs4646118 and rs143185769) are found in ∼9% of African individuals with hundreds of male hemizygotes identified within gnomADv3 whole-genome data. This supports the potential for noncoding variants of ACE2, with a higher allele frequency than that of coding variants, as contributors to increased susceptibility and within at-risk populations (African and Male) to SARS-CoV-2 infection.

      Discussion

      SARS-CoV-2 represents a generational challenge to science, racing the clock next to a global pandemic that kills ∼2% of those infected. The need to understand the viral structure is urgent regarding therapeutic targets, repurposing compounds, understanding zoonotic spread, and identifying gene variant risk factors in the human host that interact with the pathogen to increase spread and pathogenicity. In future studies, investigations of known human PPI to SARS-CoV-2, similar to ACE2, can determine how genetic variations of host proteins impact the disease course and susceptibility to the virus. Further insight into the genetics could provide useful information about the susceptibility or prognosis of those exposed to or infected with SARS-CoV-2. In this paper, we generated a SARS-CoV-2 structural dynamicome database, integrating structural/dynamic insights with viral evolution for 24 proteins coded by SARS-CoV-2. VIStEDD has elucidated insights that include potential druggable targets, educational material describing each protein, and human variants that may impact the viral life cycle.
      We show here two highly dynamic protein regions that have high conservation, indicative of PPI sites critical to viral infection and spreading. The first is the largely understudied role of the nsp6 conserved amino acids that interact with ATPases of vesicle trafficking (
      • Gordon D.E.
      • Jang G.M.
      • Bouhaddou M.
      • Xu J.
      • Obernier K.
      • O'Meara M.J.
      • Guo J.Z.
      • Swaney D.L.
      • Tummino T.A.
      • Huettenhain R.
      • Kaake R.M.
      • Richards A.L.
      • Tutuncuoglu B.
      • Foussard H.
      • Batra J.
      • et al.
      A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
      ). Evolutionary conservation throughout thousands of Coronaviridae sequences is found in two regions of the protein that are likely found near each other in 3D space, yet to date, no protein structures have ever been solved of nsp6 and publicly shared, representing a challenge to the structural biology community. ATPases are required for both endocytic and exocytic portions of the viral infections (
      • Palokangas H.
      • Metsikkö K.
      • Väänänen K.
      Active vacuolar H+ATPase is required for both endocytic and exocytic processes during viral infection of BHK-21 cells.
      ) and integral to the release of viral RNA into the cell (
      • Hinton A.
      • Bond S.
      • Forgac M.
      V-ATPase functions in normal and disease processes.
      ). The conserved amino acids are found on surfaces exposed on the I-TASSER–generated predicted structure of nsp6, including multiple charged or aromatic residues. The conserved sites of nsp6 and the ATPases they interact with can likely be therapeutically targeted (
      • Gordon D.E.
      • Jang G.M.
      • Bouhaddou M.
      • Xu J.
      • Obernier K.
      • O'Meara M.J.
      • Guo J.Z.
      • Swaney D.L.
      • Tummino T.A.
      • Huettenhain R.
      • Kaake R.M.
      • Richards A.L.
      • Tutuncuoglu B.
      • Foussard H.
      • Batra J.
      • et al.
      A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
      ). We also show here that the N protein has several highly conserved amino acids that contribute to a multimer structural organization with surface-exposed conserved amino acids and the N-terminal region of the protein that may reflect sites of targeting to alter the ribosomal control of the protein.
      Second, this new database presents many opportunities for use in education. VIStEDD was generated through a team partnership known as Characterizing our DNA Exceptions (CODE) with the intent of bringing the mds and evolutionary data to undergraduate students across the United States. The data can be used by anyone as the full data set is publicly available (drive.google.com/drive/folders/1dXBJpLo3bay1JQ9BckUsVcTViv6P0w1q?usp=sharing). From these data, we provide high-resolution figures of conservation-mapped, structural files that can be opened in either YASARA or PyMOL tools and molecular videos of the molecules rotating with conservation. For the 3D files, we have provided a vrml file for each protein that can be fed to any 3D printer, with our file containing colors for conservation as well. To expedite 3D printing, we have provided all of the vrml files to Shapeways to allow at-cost printing of the proteins (Table S1), where the files can be used for education.
      For this work, we utilized an integration of known PDB-based structures using YASARA modeling tools followed by energy minimization within a physiological environment. This allows for reduction of crystal packing forces and merging the structural knowledge of the PDB into a single starting protein. Where no templates exist in the PDB (9 of 24 proteins), we utilized the already existing database of I-TASSER SARS-CoV-2 proteins.
      From these structures, our primary goal was to move qualitative structures into a quantitative matrix that can be integrated with evolutionary data. This strategy allows for a single amino acid matrix of functional data for every protein of the SARS-CoV-2. To do that, we utilized 20 ns of molecular dynamics simulations, allowing for tracking of the constraints and correlations of each amino acid using the starting structure. These 20 ns of time all provide dynamic equilibrium as can be observed within the source data (tab-delineated folder, file x_analysisres.tab). Whereas 20 ns of time is not enough to give insights into allosteric movement of the protein, it provides robust quantitative maps of amino acid constraints of the initial protein templates. In the future, we plan to integrate the growing wealth of structural knowledge of inhibitor-bound enzymes, protein–protein interactions, and SARS-CoV-2 protein allosteric structures into our integrated amino acid matrix using additional starting points of the protein structures for 20-ns simulations. The five protein–protein interaction simulations shown within this paper were the beginning of that work.
      The biophysical and structural evidence suggested that SARS-CoV-2 may bind ACE2 with a much higher affinity than SARS-CoV (
      • Wrapp D.
      • Wang N.
      • Corbett K.S.
      • Goldsmith J.A.
      • Hsieh C.-L.
      • Abiona O.
      • Graham B.S.
      • McLellan J.S.
      Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
      ). Our group had previously investigated the evolution of ACE2 throughout species, including mapping variants within rat populations (
      • Prokop J.W.
      • Petri V.
      • Shimoyama M.E.
      • Watanabe I.K.M.
      • Casarini D.E.
      • Leeper T.C.
      • Bilinovich S.M.
      • Jacob H.J.
      • Santos R.A.S.
      • Martins A.S.
      • Araujo F.C.
      • Reis F.M.
      • Milsted A.
      Structural libraries of protein models for multiple species to understand evolution of the renin-angiotensin system.
      ), a model system for studying the renin-angiotensin aldosterone system (known as the RAAS). We have expanded those tools here, generating ACE2 models for 235 species from mammals to birds/fish, where each of the models is energy-minimized with the S protein interaction. That database can be used by groups to investigate species where SARS-CoV-2 may be able to enter cells, with variants that enhance or inhibit binding. With the S-ACE2-SLC6A19 complex resolved and mds available, a systematic quantitative map of human variants was created. Few functional variants were identified in ACE2, SLC6A19, or TMPRSS2. Moreover, none of the variants identified are common, all falling below 1% of the global population. In SLC6A19 or TMPRSS2, these rare variants would have minimal outcomes as they would rarely, if ever, reach homozygosity to cause 100% of the proteins to be influenced by variants. However, ACE2 falls on the X-chromosome, linking it to male-specific hemizygous influence. Several of the top rare ACE2 variants identified within this paper have been seen to have hemizygous variants, suggesting a dominant outcome. Whereas functional missense variants in ACE2 are ultra-rare, noncoding variants reach slightly higher allele frequencies, namely rs4646118 and rs143185769. This would suggest that as genome sequencing of patients with SARS-CoV-2 occurs, we should focus on analysis of ultra-rare missense variants listed within this paper and more importantly on several likely functional noncoding variants.
      Through the quantitative dissection of the SARS-CoV-2–encoded proteins and their interaction partners, we have developed a database (VIStEDD) of information that can be used to advance our knowledge. The amino acid quantitative matrix knowledge generated from this work can be used to pinpoint the many human protein interaction partners, mechanisms for PTMs, screening of SARS-CoV-2 functional mutational drift, and drug development to sites that are unique to SARS-CoV-2 or conserved across the coronavirus family. From the ability to regulate these interactions with pharmaceutical intervention to understanding how host genomics can influence the viral biology, VIStEDD will allow for more robust insight into SARS-CoV-2 biology.

      Experimental procedures

      Protein modeling and molecular dynamics simulations

      Similar to our laboratory's analysis of human variants, we have assessed the SARS-CoV-2 proteins using our previous established workflow (
      • Prokop J.W.
      • Lazar J.
      • Crapitto G.
      • Smith D.C.
      • Worthey E.A.
      • Jacob H.J.
      Molecular modeling in the age of clinical genomics, the enterprise of the next generation.
      ). Protein modeling was performed by utilizing YASARA homology modeling (
      • Krieger E.
      • Joo K.
      • Lee J.
      • Lee J.
      • Raman S.
      • Thompson J.
      • Tyka M.
      • Baker D.
      • Karplus K.
      Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8.
      ,
      • Krieger E.
      • Vriend G.
      New ways to boost molecular dynamics simulations.
      ) when a structural template was available that matched the sequences listed in Table 1. Homology modeling is the preferred platform as it allows molecules associated with the proteins to be included in the protein structure, including Zn ions critical to folding of Zn fingers within the papain-like proteinase, nsp10, RNA-directed RNA polymerase, helicase, and guanine-N7 methyltransferase. The transmembrane portions of S were manually cleaned and clustered, allowing for insertion into a phosphatidylethanolamine membrane before mds using the YASARA md_runmembrane macro. For those proteins without structural homologs, we utilized models that are part of the I-TASSER SARS-CoV-2 database (
      • Zhang C.
      • Zheng W.
      • Huang X.
      • Bell E.W.
      • Zhou X.
      • Zhang Y.
      Protein structure and sequence reanalysis of 2019-nCoV genome refutes snakes as its intermediate host and the unique similarity between its spike protein insertions and HIV-1.
      ). Each of these models was then fed through homology modeling in YASARA to normalize energetic predictions to the homology models. mds were performed on each of the proteins in YASARA (
      • Krieger E.
      • Vriend G.
      New ways to boost molecular dynamics simulations.
      ) using the AMBER14 force field (
      • Duan Y.
      • Wu C.
      • Chowdhury S.
      • Lee M.C.
      • Xiong G.
      • Zhang W.
      • Yang R.
      • Cieplak P.
      • Luo R.
      • Lee T.
      • Caldwell J.
      • Wang J.
      • Kollman P.
      A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.
      ), 0.997 g/ml explicit water, NaCl at 0.9 mass fraction, a pH of 7.4 for protonation predictions, saving trajectory files every 25 ps for 20 ns total. The trajectory files were analyzed with the YASARA md_analyze and md_analyzeres macros, generating an HTML file present in each of the protein report folders. If this report folder is downloaded and the HTML file opened, it generates a full report of the protein dynamics, including multiple figures of analysis. Additionally, all of the tab-delineated analysis files are within the tab folder of VIStEDD proteins and the trajectory files, allowing for reanalysis of trajectory.

      Generation of database information

      From the models and mds, we generated files for VIStEDD. Sequences (within the genomics folder of each protein) were extracted using the sequences listed in Table 1 with BLASTp against the nr protein sequences and aligned using ClustalW (
      • Larkin M.A.
      • Blackshields G.
      • Brown N.P.
      • Chenna R.
      • McGettigan P.A.
      • McWilliam H.
      • Valentin F.
      • Wallace I.M.
      • Wilm A.
      • Lopez R.
      • Thompson J.D.
      • Gibson T.J.
      • Higgins D.G.
      Clustal W and Clustal X version 2.0.
      ). An amino acid matrix of all sequences was generated in MEGA (
      • Tamura K.
      • Peterson D.
      • Peterson N.
      • Stecher G.
      • Nei M.
      • Kumar S.
      MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.
      ), followed by calculating the percentage of all amino acids at each spot that are the same as in SARS-CoV-2. The conservation of each protein was normalized using a z-score ((value − mean)/S.D.) for comparison across all proteins. The generated models were loaded into YASARA (z-score 0–0.5 (yellow), 0.5–1 (color 166), 1–1.5 (color 157), 1.5–2 (color 145), >2 (red)) or PyMOL (z-score 0–0.5 (yellow), 0.5–1 (bright orange), 1–1.5 (orange), 1.5–2 (warm pink), >2 (red)), colored based on conservation, and saved as respective scene files for the tool. The YASARA colored molecule was saved as a y axis rotation in mpg and converted into mp4, which is more amenable to PowerPoint. Within PyMOL, the structure was also exported as a vrml file for 3D printing. Motifs and posttranslational modification predictions for the N protein were generated using ELM (
      • Dinkel H.
      • Michael S.
      • Weatheritt R.J.
      • Davey N.E.
      • Van Roey K.
      • Altenberg B.
      • Toedt G.
      • Uyar B.
      • Seiler M.
      • Budd A.
      • Jödicke L.
      • Dammert M.A.
      • Schroeter C.
      • Hammer M.
      • Schmidt T.
      • et al.
      ELM–the database of eukaryotic linear motifs.
      ) and NetPhos3.1 (
      • Blom N.
      • Sicheritz-Pontén T.
      • Gupta R.
      • Gammeltoft S.
      • Brunak S.
      Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence.
      ).

      Human variant analysis

      Models for ACE2 and SLC6A19 were homology-modeled using PDB 6M17 (S-ACE2-SLC6A19) followed by alignment back onto the complex. TMPRSS2 was homology-modeled using YASARA followed by manual correction of the transmembrane helix. Each of the two models was embedded into a phosphatidylethanolamine lipid membrane using the YASARA macro md_runmembrane followed by mds as done on SARS-CoV-2 proteins. Vertebrate sequences of the three proteins were extracted using NCBI orthologs for the transcript, open reading frames were assessed using Transdecoder (
      • Haas B.J.
      • Papanicolaou A.
      • Yassour M.
      • Grabherr M.
      • Blood P.D.
      • Bowden J.
      • Couger M.B.
      • Eccles D.
      • Li B.
      • Lieber M.
      • MacManes M.D.
      • Ott M.
      • Orvis J.
      • Pochet N.
      • Strozzi F.
      • et al.
      De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.
      ), and sequences were aligned using ClustalW codons in MEGA. Conservation was performed on the data as previously published (
      • Prokop J.W.
      • Yeo N.C.
      • Ottmann C.
      • Chhetri S.B.
      • Florus K.L.
      • Ross E.J.
      • Sosonkina N.
      • Link B.A.
      • Freedman B.I.
      • Coppola C.J.
      • McDermott-Roe C.
      • Leysen S.
      • Milroy L.-G.
      • Meijer F.A.
      • Geurts A.M.
      • et al.
      Characterization of coding/noncoding variants for SHROOM3 in patients with CKD.
      ). Genomic missense variants were extracted from gnomADv2 for each of the three genes followed by assessment using PolyPhen2 (
      • Adzhubei I.A.
      • Schmidt S.
      • Peshkin L.
      • Ramensky V.E.
      • Gerasimova A.
      • Bork P.
      • Kondrashov A.S.
      • Sunyaev S.R.
      A method and server for predicting damaging missense mutations.
      ), Provean (
      • Choi Y.
      • Chan A.P.
      PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels.
      ), SIFT (
      • Ng P.C.
      • Henikoff S.
      SIFT: predicting amino acid changes that affect protein function.
      ), and Align-GVGD (
      • Tavtigian S.V.
      • Deffenbaugh A.M.
      • Yin L.
      • Judkins T.
      • Scholl T.
      • Samollow P.B.
      • de Silva D.
      • Zharkikh A.
      • Thomas A.
      Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral.
      ). The ACE2 regulatory region was identified using the Roadmap Epigenomics 18-state model (
      • Roadmap Epigenomics Consortium
      • Kundaje A.
      • Meuleman W.
      • Ernst J.
      • Bilenky M.
      • Yen A.
      • Heravi-Moussavi A.
      • Kheradpour P.
      • Zhang Z.
      • Wang J.
      • Ziller M.J.
      • Amin V.
      • Whitaker J.W.
      • Schultz M.D.
      • Ward L.D.
      • et al.
      Integrative analysis of 111 reference human epigenomes.
      ), followed by the extraction of all gnomADv3 variants and assessment with RegulomeDB (
      • Boyle A.P.
      • Hong E.L.
      • Hariharan M.
      • Cheng Y.
      • Schaub M.A.
      • Kasowski M.
      • Karczewski K.J.
      • Park J.
      • Hitz B.C.
      • Weng S.
      • Cherry J.M.
      • Snyder M.
      Annotation of functional variation in personal genomes using RegulomeDB.
      ).

      Data availability

      All data published in this paper are available at doi:10.6084/m9.figshare.12298790.v1 or within our VIStEDD tools (RRID:SCR_018793).

      References

        • Kasmi Y.
        • Khataby K.
        • Souiri A.
        • Ennaji M.M.
        Coronaviridae: 100,000 years of emergence and reemergence.
        in: Ennaji M.M. Emerging and Reemerging Viral Pathogens. Academic Press, Inc., New York2020: 127-149
        • Yan R.
        • Zhang Y.
        • Li Y.
        • Xia L.
        • Guo Y.
        • Zhou Q.
        Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.
        Science. 2020; 367 (32132184): 1444-1448
        • Kuba K.
        • Imai Y.
        • Rao S.
        • Gao H.
        • Guo F.
        • Guan B.
        • Huan Y.
        • Yang P.
        • Zhang Y.
        • Deng W.
        • Bao L.
        • Zhang B.
        • Liu G.
        • Wang Z.
        • Chappell M.
        • et al.
        A crucial role of angiotensin converting enzyme 2 (ACE2) in SARS coronavirus-induced lung injury.
        Nat. Med. 2005; 11 (16007097): 875-879
        • Ceraolo C.
        • Giorgi F.M.
        Genomic variance of the 2019-nCoV coronavirus.
        J. Med. Virol. 2020; 92 (32027036): 522-528
        • Andersen K.G.
        • Rambaut A.
        • Lipkin W.I.
        • Holmes E.C.
        • Garry R.F.
        The proximal origin of SARS-CoV-2.
        Nat. Med. 2020; 26 (32284615): 450-452
        • Guarner J.
        Three emerging coronaviruses in two decades: the story of SARS, MERS, and now COVID-19.
        Am. J. Clin. Pathol. 2020; 153 (32053148): 420-421
        • Dong E.
        • Du H.
        • Gardner L.
        An interactive web-based dashboard to track COVID-19 in real time.
        Lancet Infect. Dis. 2020; 20 (32087114): 533-534
        • Prompetchara E.
        • Ketloy C.
        • Palaga T.
        Immune responses in COVID-19 and potential vaccines: lessons learned from SARS and MERS epidemic.
        Asian Pac. J. Allergy Immunol. 2020; 38 (32105090): 1-9
        • Zhou F.
        • Yu T.
        • Du R.
        • Fan G.
        • Liu Y.
        • Liu Z.
        • Xiang J.
        • Wang Y.
        • Song B.
        • Gu X.
        • Guan L.
        • Wei Y.
        • Li H.
        • Wu X.
        • Xu J.
        • et al.
        Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study.
        Lancet. 2020; 395 (32171076): 1054-1062
        • Mehta P.
        • McAuley D.F.
        • Brown M.
        • Sanchez E.
        • Tattersall R.S.
        • Manson J.J.
        • HLH Across Speciality Collaboration, UK
        COVID-19: consider cytokine storm syndromes and immunosuppression.
        Lancet. 2020; 395 (32192578): 1033-1034
        • Shi S.
        • Qin M.
        • Shen B.
        • Cai Y.
        • Liu T.
        • Yang F.
        • Gong W.
        • Liu X.
        • Liang J.
        • Zhao Q.
        • Huang H.
        • Yang B.
        • Huang C.
        Association of cardiac injury with mortality in hospitalized patients with COVID-19 in Wuhan, China.
        JAMA Cardiol. 2020; (32211816)
        • Wada M.
        • Lokugamage K.G.
        • Nakagawa K.
        • Narayanan K.
        • Makino S.
        Interplay between coronavirus, a cytoplasmic RNA virus, and nonsense-mediated mRNA decay pathway.
        Proc. Natl. Acad. Sci. U. S. A. 2018; 115 (30297408): E10157-E10166
        • Malle L.
        A map of SARS-CoV-2 and host cell interactions.
        Nat. Rev. Immunol. 2020; 20 (32211816): 351
        • Wu C.
        • Liu Y.
        • Yang Y.
        • Zhang P.
        • Zhong W.
        • Wang Y.
        • Wang Q.
        • Xu Y.
        • Li M.
        • Li X.
        • Zheng M.
        • Chen L.
        • Li H.
        Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods.
        Acta Pharm. Sin. B. 2020; 10 (32292689): 766-788
        • Walls A.C.
        • Park Y.-J.
        • Tortorici M.A.
        • Wall A.
        • McGuire A.T.
        • Veesler D.
        Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein.
        Cell. 2020; 181 (32155444): 281-292.e6
        • Hamming I.
        • Timens W.
        • Bulthuis M.L.C.
        • Lely A.T.
        • Navis G.J.
        • van Goor H.
        Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus: a first step in understanding SARS pathogenesis.
        J. Pathol. 2004; 203 (15141377): 631-637
        • Sungnak W.
        • Huang N.
        • Bécavin C.
        • Berg M.
        • Queen R.
        • Litvinukova M.
        • Talavera-López C.
        • Maatz H.
        • Reichart D.
        • Sampaziotis F.
        • Worlock K.B.
        • Yoshida M.
        • Barnes J.L.
        • HCA Lung Biological Network
        SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes.
        Nat. Med. 2020; 26 (32327758): 681-687
        • Uhal B.D.
        • Dang M.
        • Dang V.
        • Llatos R.
        • Cano E.
        • Abdul-Hafez A.
        • Markey J.
        • Piasecki C.C.
        • Molina-Molina M.
        Cell cycle dependence of ACE-2 explains downregulation in idiopathic pulmonary fibrosis.
        Eur. Respir. J. 2013; 42 (23100504): 198-210
        • Barkauskas C.E.
        • Cronce M.J.
        • Rackley C.R.
        • Bowie E.J.
        • Keene D.R.
        • Stripp B.R.
        • Randell S.H.
        • Noble P.W.
        • Hogan B.L.M.
        Type 2 alveolar cells are stem cells in adult lung.
        J. Clin. Invest. 2013; 123 (23921127): 3025-3036
        • Hoffmann M.
        • Kleine-Weber H.
        • Schroeder S.
        • Krüger N.
        • Herrler T.
        • Erichsen S.
        • Schiergens T.S.
        • Herrler G.
        • Wu N.-H.
        • Nitsche A.
        • Müller M.A.
        • Drosten C.
        • Pöhlmann S.
        SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor.
        Cell. 2020; 181 (32142651): 271-280.e8
        • Gordon D.E.
        • Jang G.M.
        • Bouhaddou M.
        • Xu J.
        • Obernier K.
        • O'Meara M.J.
        • Guo J.Z.
        • Swaney D.L.
        • Tummino T.A.
        • Huettenhain R.
        • Kaake R.M.
        • Richards A.L.
        • Tutuncuoglu B.
        • Foussard H.
        • Batra J.
        • et al.
        A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing.
        bioRxiv. 2020;
        • Angelini M.M.
        • Akhlaghpour M.
        • Neuman B.W.
        • Buchmeier M.J.
        Severe acute respiratory syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles.
        mBio. 2013; 4 (23943763)
        • Cottam E.M.
        • Whelband M.C.
        • Wileman T.
        Coronavirus NSP6 restricts autophagosome expansion.
        Autophagy. 2014; 10 (24991833): 1426-1441
        • Xiao X.
        • Chakraborti S.
        • Dimitrov A.S.
        • Gramatikoff K.
        • Dimitrov D.S.
        The SARS-CoV S glycoprotein: expression and functional characterization.
        Biochem. Biophys. Res. Commun. 2003; 312 (14651994): 1159-1164
        • Fung T.S.
        • Liu D.X.
        Post-translational modifications of coronavirus proteins: roles and function.
        Future Virol. 2018; 13 (32201497): 405-430
        • Nal B.
        • Chan C.
        • Kien F.
        • Siu L.
        • Tse J.
        • Chu K.
        • Kam J.
        • Staropoli I.
        • Crescenzo-Chaigne B.
        • Escriou N.
        • van der Werf S.
        • Yuen K.-Y.
        • Altmeyer R.
        Differential maturation and subcellular localization of severe acute respiratory syndrome coronavirus surface proteins S, M and E.
        J. Gen. Virol. 2005; 86 (15831954): 1423-1434
        • Oostra M.
        • de Haan C.A M.
        • de Groot R.J.
        • Rottier P.J.M.
        Glycosylation of the severe acute respiratory syndrome coronavirus triple-spanning membrane proteins 3a and M.
        J. Virol. 2006; 80 (16474139): 2326-2336
        • Fan Z.
        • Zhuo Y.
        • Tan X.
        • Zhou Z.
        • Yuan J.
        • Qiang B.
        • Yan J.
        • Peng X.
        • Gao G.F.
        SARS-CoV nucleocapsid protein binds to hUbc9, a ubiquitin conjugating enzyme of the sumoylation system.
        J. Med. Virol. 2006; 78 (16998888): 1365-1373
        • Surjit M.
        • Kumar R.
        • Mishra R.N.
        • Reddy M.K.
        • Chow V.T.K.
        • Lal S.K.
        The severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by 14-3-3-mediated translocation.
        J. Virol. 2005; 79 (16103198): 11476-11486
        • Surjit M.
        • Lal S.K.
        The SARS-CoV nucleocapsid protein: a protein with multifarious activities.
        Infect. Genet. Evol. 2008; 8 (17881296): 397-405
        • Belouzard S.
        • Millet J.K.
        • Licitra B.N.
        • Whittaker G.R.
        Mechanisms of coronavirus cell entry mediated by the viral spike protein.
        Viruses. 2012; 4 (22816037): 1011-1033
        • Palokangas H.
        • Metsikkö K.
        • Väänänen K.
        Active vacuolar H+ATPase is required for both endocytic and exocytic processes during viral infection of BHK-21 cells.
        J. Biol. Chem. 1994; 269 (8021266): 17577-17585
        • Hinton A.
        • Bond S.
        • Forgac M.
        V-ATPase functions in normal and disease processes.
        Pflugers Arch. 2009; 457 (18026982): 589-598
        • Wrapp D.
        • Wang N.
        • Corbett K.S.
        • Goldsmith J.A.
        • Hsieh C.-L.
        • Abiona O.
        • Graham B.S.
        • McLellan J.S.
        Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
        Science. 2020; 367 (32075877): 1260-1263
        • Prokop J.W.
        • Petri V.
        • Shimoyama M.E.
        • Watanabe I.K.M.
        • Casarini D.E.
        • Leeper T.C.
        • Bilinovich S.M.
        • Jacob H.J.
        • Santos R.A.S.
        • Martins A.S.
        • Araujo F.C.
        • Reis F.M.
        • Milsted A.
        Structural libraries of protein models for multiple species to understand evolution of the renin-angiotensin system.
        Gen. Comp. Endocrinol. 2015; 215 (25260253): 106-116
        • Prokop J.W.
        • Lazar J.
        • Crapitto G.
        • Smith D.C.
        • Worthey E.A.
        • Jacob H.J.
        Molecular modeling in the age of clinical genomics, the enterprise of the next generation.
        J. Mol. Model. 2017; 23 (28204942): 75
        • Krieger E.
        • Joo K.
        • Lee J.
        • Lee J.
        • Raman S.
        • Thompson J.
        • Tyka M.
        • Baker D.
        • Karplus K.
        Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8.
        Proteins. 2009; 77 (19768677): 114-122
        • Krieger E.
        • Vriend G.
        New ways to boost molecular dynamics simulations.
        J. Comput. Chem. 2015; 36 (25824339): 996-1007
        • Zhang C.
        • Zheng W.
        • Huang X.
        • Bell E.W.
        • Zhou X.
        • Zhang Y.
        Protein structure and sequence reanalysis of 2019-nCoV genome refutes snakes as its intermediate host and the unique similarity between its spike protein insertions and HIV-1.
        J. Proteome Res. 2020; 19 (32200634): 1351-1360
        • Duan Y.
        • Wu C.
        • Chowdhury S.
        • Lee M.C.
        • Xiong G.
        • Zhang W.
        • Yang R.
        • Cieplak P.
        • Luo R.
        • Lee T.
        • Caldwell J.
        • Wang J.
        • Kollman P.
        A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.
        J. Comput. Chem. 2003; 24 (14531054): 1999-2012
        • Larkin M.A.
        • Blackshields G.
        • Brown N.P.
        • Chenna R.
        • McGettigan P.A.
        • McWilliam H.
        • Valentin F.
        • Wallace I.M.
        • Wilm A.
        • Lopez R.
        • Thompson J.D.
        • Gibson T.J.
        • Higgins D.G.
        Clustal W and Clustal X version 2.0.
        Bioinformatics. 2007; 23 (17846036): 2947-2948
        • Tamura K.
        • Peterson D.
        • Peterson N.
        • Stecher G.
        • Nei M.
        • Kumar S.
        MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.
        Mol. Biol. Evol. 2011; 28 (21546353): 2731-2739
        • Dinkel H.
        • Michael S.
        • Weatheritt R.J.
        • Davey N.E.
        • Van Roey K.
        • Altenberg B.
        • Toedt G.
        • Uyar B.
        • Seiler M.
        • Budd A.
        • Jödicke L.
        • Dammert M.A.
        • Schroeter C.
        • Hammer M.
        • Schmidt T.
        • et al.
        ELM–the database of eukaryotic linear motifs.
        Nucleic Acids Res. 2012; 40 (22110040): D242-D251
        • Blom N.
        • Sicheritz-Pontén T.
        • Gupta R.
        • Gammeltoft S.
        • Brunak S.
        Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence.
        Proteomics. 2004; 4 (15174133): 1633-1649
        • Haas B.J.
        • Papanicolaou A.
        • Yassour M.
        • Grabherr M.
        • Blood P.D.
        • Bowden J.
        • Couger M.B.
        • Eccles D.
        • Li B.
        • Lieber M.
        • MacManes M.D.
        • Ott M.
        • Orvis J.
        • Pochet N.
        • Strozzi F.
        • et al.
        De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.
        Nat. Protoc. 2013; 8 (23845962): 1494-1512
        • Prokop J.W.
        • Yeo N.C.
        • Ottmann C.
        • Chhetri S.B.
        • Florus K.L.
        • Ross E.J.
        • Sosonkina N.
        • Link B.A.
        • Freedman B.I.
        • Coppola C.J.
        • McDermott-Roe C.
        • Leysen S.
        • Milroy L.-G.
        • Meijer F.A.
        • Geurts A.M.
        • et al.
        Characterization of coding/noncoding variants for SHROOM3 in patients with CKD.
        J. Am. Soc. Nephrol. 2018; 29 (29476007): 1525-1535
        • Adzhubei I.A.
        • Schmidt S.
        • Peshkin L.
        • Ramensky V.E.
        • Gerasimova A.
        • Bork P.
        • Kondrashov A.S.
        • Sunyaev S.R.
        A method and server for predicting damaging missense mutations.
        Nat. Methods. 2010; 7 (20354512): 248-249
        • Choi Y.
        • Chan A.P.
        PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels.
        Bioinformatics. 2015; 31 (25851949): 2745-2747
        • Ng P.C.
        • Henikoff S.
        SIFT: predicting amino acid changes that affect protein function.
        Nucleic Acids Res. 2003; 31 (12824425): 3812-3814
        • Tavtigian S.V.
        • Deffenbaugh A.M.
        • Yin L.
        • Judkins T.
        • Scholl T.
        • Samollow P.B.
        • de Silva D.
        • Zharkikh A.
        • Thomas A.
        Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral.
        J. Med. Genet. 2006; 43 (16014699): 295-305
        • Roadmap Epigenomics Consortium
        • Kundaje A.
        • Meuleman W.
        • Ernst J.
        • Bilenky M.
        • Yen A.
        • Heravi-Moussavi A.
        • Kheradpour P.
        • Zhang Z.
        • Wang J.
        • Ziller M.J.
        • Amin V.
        • Whitaker J.W.
        • Schultz M.D.
        • Ward L.D.
        • et al.
        Integrative analysis of 111 reference human epigenomes.
        Nature. 2015; 518 (25693563): 317-330
        • Boyle A.P.
        • Hong E.L.
        • Hariharan M.
        • Cheng Y.
        • Schaub M.A.
        • Kasowski M.
        • Karczewski K.J.
        • Park J.
        • Hitz B.C.
        • Weng S.
        • Cherry J.M.
        • Snyder M.
        Annotation of functional variation in personal genomes using RegulomeDB.
        Genome Res. 2012; 22 (22955989): 1790-1797