23 Feb 2023

RNAcentral Release 22



We are pleased to announce RNAcentral release 22, featuring two new expert databases, EVlncRNAs and Ribocentre, a new visualisation, and updates to LitScan.

Welcome EVlncRNAs

EVlncRNAs is a database of experimentally validated long non-coding RNAs, curated from papers alongside their expression, interaction and association with disease. In release 22, we have imported the second version of the database, which expands on the original by manually curating almost 19,000 additional papers. You can explore their data here.



Welcome Ribocentre

Ribocentre aims to become a database of all natural ribozymes, and includes representative structures alongside the chemical mechanism of ribozymes. You can explore their data here.


Visualise expression

In release 21 we imported cross-references to Expression Atlas. We have now gone further and integrated their viewer into RNAcentral, so you can view expression information for some ncRNAs directly on RNAcentral. Below is one example:




Browse all sequences with data from Expression Atlas here.

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. LitScan searched all open access literature available on Europe PMC for 8,939,826 Ids and found 865,179 articles, which matched 4,497,573 unique sequences. Please reach out to us if we have missed any names!

Additionally, we are preparing exports of LitScan results. In future releases this will be part of our FTP export. Get in contact if you would like to hear about the export as soon as it is ready.

FTP export updates

RNAcentral is powered by a PostgreSQL database, which is publicly available. We are now making dumps of the latest release publicly available on our FTP site. We intend to keep only the latest dump available. Power users interested in large-scale analysis with RNAcentral can fetch and use our database dumps now. Take a look at some documentation, details on our schema, or reach out if you have any questions!

Database updates

These databases were updated in this release to the version stated below:

  • Ensembl 107 -> 108
  • Ensembl Fungi 54 -> 55
  • Ensembl Metazoa 54 -> 55
  • Ensembl Plants 54 -> 55
  • Ensembl Protists 54 -> 55
  • RefSeq 213 -> 216
  • Rfam 14.7 -> 14.9

The databases below were imported at their latest version as of 2023-01-25:

  • ENA
  • FlyBase
  • HGNC
  • IntAct
  • PDBe
  • PSICQUIC
  • PomBase
  • QuickGO

Get in touch


As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!


27 Oct 2022

R2DT Version 1.3


We are pleased to announce the release of R2DT version 1.3, which introduces constrained folding functionality as well as new and updated templates. Read on to find out more or head to GitHub or the RNAcentral web app to start using the new software.

Constrained folding

R2DT uses templates for predicting and visualising RNA secondary structures. This works well for the majority of sequences in RNAcentral. However, some sequences include large insertions that do not align to the templates. Previous versions of R2DT displayed such regions as unfolded loops.


For example, constrained folding improves the diagrams of species-specific insertions found in many rRNAs:


The new mode can also generate better diagrams where the Rfam consensus structure has long dangling ends or large hairpin loops:


To add base pairs for the unfolded regions that are not modelled by the templates, R2DT uses RNAfold from the Vienna RNA package. There are three ways of using constrained folding:


  • Local folding: the secondary structure of the insertion relative to the template is predicted with RNAfold and added to the diagram;

  • Global folding: the entire molecule is folded using RNAfold with the template structure provided as a constraint;

  • Global folding with single-stranded (s/s) nucleotides enforced: same as above, except that the nucleotides that align to single-stranded regions of the template are kept unpaired when predicting the new structure using RNAfold.
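
The difference between the two global modes can be illustrated with a small sketch. It assumes the template structure has already been projected onto the sequence and uses a hypothetical encoding in which "-" marks inserted nucleotides that the template does not model. The output follows the dot-bracket constraint notation accepted by RNAfold's -C option, where "x" marks a nucleotide that must stay unpaired and "." leaves it unconstrained. The function name and input encoding are illustrative only and are not taken from the R2DT code base.

```python
def build_constraint(template_structure: str, enforce_single_stranded: bool = False) -> str:
    """Convert a projected template structure into a folding constraint string.

    Input symbols (hypothetical encoding for this sketch):
      '(' and ')'  nucleotides paired in the template
      '.'          nucleotides aligned to single-stranded template positions
      '-'          inserted nucleotides not modelled by the template
    """
    constraint = []
    for symbol in template_structure:
        if symbol in "()":
            # Keep every base pair that comes from the template.
            constraint.append(symbol)
        elif symbol == "." and enforce_single_stranded:
            # s/s enforced: template single-stranded positions must stay unpaired.
            constraint.append("x")
        elif symbol in ".-":
            # No constraint: the folding program is free to pair this nucleotide.
            constraint.append(".")
        else:
            raise ValueError(f"Unexpected symbol in template structure: {symbol!r}")
    return "".join(constraint)


# A template helix flanking a four-nucleotide insertion.
projected = "((..----..))"
print(build_constraint(projected))                                # global folding
print(build_constraint(projected, enforce_single_stranded=True))  # s/s enforced
```

In the first call only the template base pairs are fixed, so the single-stranded template positions may form new pairs. In the second call they are locked as unpaired and only the insertion can fold.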


To use constrained folding, include the “--constraint” option on the command line or choose one of the modes in the advanced options of the web app. The folding mode is automatically selected based on molecule type but can be manually overridden with the “--fold_type” parameter or using a dropdown:

Special thanks to Holly McCann and Anton S. Petrov (Georgia Tech) for developing this feature! 👏👏👏

Other updates

  • The R2DT library now includes the latest RNA families from the Rfam release 14.8, a manually curated template for HCV IRES, and updated rRNA templates for tomato (Solanum lycopersicum). 

  • A new “--skip_ribovore_filters” command line option has been added to generate diagrams for sequences that align to the templates with a large number of insertions (learn more).

  • Traveler software has been updated to v3.0.0.

  • R2DT now generates output in JSON format using the RNA2D-data-schema.

  • See the GitHub pull request for further details on these and other changes.
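
The command-line options mentioned in this post can be combined in a single call. The sketch below only assembles the argument list and does not run R2DT. It assumes the "r2dt.py draw <input> <output>" form of the command, and the file and directory names are placeholders. Accepted values for "--fold_type" are not listed in this post, so the helper passes through whatever value it is given; please check the R2DT documentation for the valid choices.

```python
import shlex
from typing import List, Optional


def r2dt_draw_command(
    fasta: str,
    output_dir: str,
    constraint: bool = False,
    fold_type: Optional[str] = None,
    skip_ribovore_filters: bool = False,
) -> List[str]:
    """Assemble an R2DT command line as a list of arguments (nothing is executed)."""
    command = ["r2dt.py", "draw"]  # assumed entry point and subcommand
    if constraint:
        command.append("--constraint")
    if fold_type is not None:
        if not constraint:
            # Overriding the folding mode only makes sense with constrained folding.
            raise ValueError("fold_type requires constraint=True")
        command += ["--fold_type", fold_type]
    if skip_ribovore_filters:
        command.append("--skip_ribovore_filters")
    command += [fasta, output_dir]
    return command


# Placeholder paths; pass the resulting list to subprocess.run() to execute it.
print(shlex.join(r2dt_draw_command("input.fasta", "output", constraint=True)))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.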

Summary

The new software is now available everywhere you can find R2DT: on GitHub, Docker Hub, the RNAcentral web app, and the API. Work is underway to regenerate the secondary structures displayed in RNAcentral using the new version of R2DT. Stay tuned for future releases and feel free to let us know if you have any feedback by raising an issue or using the contact us form.


20 Oct 2022

RNAcentral Release 21

We are pleased to announce RNAcentral release 21, featuring two new expert databases, PLncDB and Expression Atlas, a new visualisation, and updates to LitScan.

Welcome PLncDB

PLncDB is a comprehensive database of long non-coding RNAs found in plants, featuring data on 80 species.



PLncDB have gone to considerable lengths to ensure confidence in their annotations, more details on which can be found in the paper. To explore their data, start here.

Welcome Expression Atlas

Expression Atlas is a resource that allows users to find out where genes are expressed, and how the expression changes when disease is present. In this release, we have imported cross-references to all RNAs that have expression data, so that users can find which sequences have expression data. To explore the data imported from Expression Atlas, have a look at all Expression Atlas entries, or for a single example look at the URS below:




We plan to extend our expression data to include visualisations in the future. Please let us know if you have any suggestions for expression data in RNAcentral.

Visualise locations

We have integrated SwissBioPic visualisations into RNAcentral. These provide a simple and clear way to visualise the locations of RNA molecules in a cell. Here we show MALAT1 as an example:




We have used the Gene Ontology annotations provided by groups like the Functional Gene Annotation team at UCL as the source for these figures.

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. It has been updated with 5.6 million new RNA names, leading to 545,244 papers connected to 2.5 million sequences. Please reach out to us if we have missed any names!

Additionally, we are preparing exports of LitScan results. In future releases this will be part of our FTP export. Get in contact if you would like to hear about the export as soon as it is ready.

FTP export updates

RNAcentral is powered by a PostgreSQL database, which is publicly available. We are now making dumps of the latest release publicly available on our FTP site. We intend to keep only the latest dump available. Power users interested in large-scale analysis with RNAcentral can fetch and use our database dumps now. Take a look at some documentation, details on our schema, or reach out if you have any questions!

Database updates

  • ENA (snapshot as of October '22)

  • Ensembl (version 107)

  • FlyBase (2022_04)

  • GeneCards (5.12)

  • gtRNAdb (19)

  • HGNC (snapshot as of September '22)

  • IntAct (1.0.3)

  • PDBe (snapshot as of September '22)

  • PSICQUIC (snapshot as of September '22)

  • PomBase (2022-08-24)

  • QuickGO (snapshot as of September '22)

  • RefSeq (213)

  • Rfam (14.7)

  • SILVA (138.1)

  • ZFIN (2022-08-25)

  • ZWD (1.1)

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!



4 Apr 2022

RNAcentral Release 20

 

We are pleased to announce RNAcentral release 20, featuring literature integration with LitScan, a new permissive licence (CC0) to enable data reuse without restrictions, and a new expert database, RiboVision.

Introducing LitScan

One of the most requested features for RNAcentral has been a comprehensive, up-to-date connection with literature. In release 20, we have taken our first step toward this goal by developing RNAcentral LitScan to analyse open access articles from Europe PMC. RNAcentral LitScan is a new text mining pipeline that connects RNA sequences with the latest open access scientific literature. LitScan uses a collection of identifiers (Ids), gene names, and synonyms provided to RNAcentral by the Expert Databases to scan the papers available in Europe PMC. For this release, LitScan searched 2.7 million Ids from 19 Expert Databases and identified >387,000 papers which contain 1.6 million Ids corresponding to >280,000 unique RNA sequences.

For example, lncRNA THRIL is also known as Linc1992. Using LitScan, the corresponding RNAcentral entry includes papers about THRIL, Linc1992, and even NR_110375 which is another Id for the same gene:


LitScan features an interactive user interface that enables the users to filter the papers using facets, including year, journal, identifier, and the part of the paper where the Id is found.
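
The core idea of matching Ids and synonyms against article text can be sketched in a few lines. This is a simplified illustration and not the LitScan pipeline itself: the function name is hypothetical, and the choice of case-insensitive, whole-word matching is an assumption made for the example.

```python
import re
from typing import Dict, List


def find_ids(text: str, ids: List[str]) -> Dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each Id in the text."""
    counts: Dict[str, int] = {}
    for rna_id in ids:
        # Lookarounds stop an Id from matching inside a longer word or accession.
        pattern = re.compile(
            r"(?<![A-Za-z0-9_])" + re.escape(rna_id) + r"(?![A-Za-z0-9_])",
            re.IGNORECASE,
        )
        hits = len(pattern.findall(text))
        if hits:
            counts[rna_id] = hits
    return counts


# Synonyms for the same gene, taken from the THRIL example in this post.
synonyms = ["THRIL", "Linc1992", "NR_110375"]
sentence = "We measured THRIL expression; THRIL is also known as linc1992. THRILLER is unrelated."
print(find_ids(sentence, synonyms))
```

Because all three names point to the same RNAcentral entry, a paper that mentions any one of them can be attached to that entry.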

Read more about the methodology here. Please reach out by email, Twitter or GitHub issue if you have any feedback or if you are interested in using the widget on your website!

All RNAcentral data is now released under CC0

As we announced previously, all RNAcentral data is now released under the Creative Commons Zero (CC0) licence. This means:

  • Our terms of use are more in line with the spirit of EMBL-EBI's Terms of Use and place the data in the public domain without constraints. We believe that this approach to research data sharing strengthens open science and scientific progress.
  • We encourage remixing and reuse: CC0 makes clear to any user, academic, commercial or otherwise, that the data is not owned by anyone and can therefore be used freely.
  • We hope this will save researchers time when reusing the data, which speeds up the process of science.

Additionally, we have included a licence page to help clarify the terms under which we release our data and how we use data submitted to our services.

Welcome RiboVision

RiboVision is a tool for the exploration of ribosomal data using combined 1D, 2D and 3D visualisations. Using their carefully curated secondary structures they provide a way to explore the sequence and structural data for ribosomes. The RiboVision diagrams are also used in R2DT. You can explore their data here.

Database updates

  • ENA
  • FlyBase
  • HGNC
  • IntAct
  • MirGeneDB
  • PDBe
  • PSICQUIC
  • PomBase
  • QuickGO
  • RefSeq

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!