26 Mar 2024

RNAcentral Release 24

We are pleased to announce RNAcentral release 24, which features a major update to the tmRNA Website, improvements to our genome browser and updates to LitScan and LitSumm. LitSumm now generates summaries using GPT4. Read on for the details.

Welcome tmRNA Website 2.0

Recently, the tmRNA Website has undergone two major changes. First, they have overhauled how they identify tmRNA sequences. The resulting dataset provides RNAcentral with 96,670 sequences, which are extensively annotated with functional features. For example:


Shown above is an example of Candidatus Gastranaerophilaceae tmRNA (URS0002856755/3022868)


Secondly, the tmRNA Website is no longer updated and all data is now hosted at RNAcentral. We will continue to host and update this data as normal. You can browse the tmRNA data by searching expert_db:"tmRNA Website".

Improved taxonomic search

We have improved our taxonomic search to allow users to more easily find sequences of interest. In our last release we made it possible for users to search for subspecies, but this wasn’t very user friendly. Now there is an option to find all sequences from a taxon or any of its subspecies right in the taxonomic search box. For example, this search can now find all E. coli and subspecies, like E. coli K-12.


Here is an example of searching for: (TAXONOMY:"562" OR lineage_path:"562"). There is now a simple button to push to get taxonomy-aware search.


This change applies to all taxonomic levels, so searching for all sequences for bacterial rRNAs is as easy as (TAXONOMY:"2" OR lineage_path:"2") AND so_rna_type_name:"RRNA". Currently, users have to choose to use subspecies searching, but this may change in the future. Try it out and let us know if you have any feedback!
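The query pattern above is regular enough to generate from a taxonomy id. The following is a minimal sketch that only assembles the search strings quoted in this post; the function names are hypothetical and not part of any RNAcentral client.

```python
def taxon_query(tax_id: int, include_descendants: bool = True) -> str:
    """Build a search string for one NCBI taxonomy id.

    With include_descendants=True the string matches the taxon itself
    (TAXONOMY) or anything below it in the lineage (lineage_path).
    """
    exact = f'TAXONOMY:"{tax_id}"'
    if not include_descendants:
        return exact
    return f'({exact} OR lineage_path:"{tax_id}")'


def with_rna_type(query: str, rna_type: str) -> str:
    """Restrict an existing query to one RNA type via so_rna_type_name."""
    return f'{query} AND so_rna_type_name:"{rna_type}"'


if __name__ == "__main__":
    # E. coli (taxid 562) and all of its subspecies
    print(taxon_query(562))
    # All bacterial (taxid 2) rRNAs
    print(with_rna_type(taxon_query(2), "RRNA"))
```

The printed strings can be pasted into the RNAcentral search box as they are.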

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. 


A significant increase in the number of articles was observed in this release. The number of papers increased from 915 thousand to 1.1 million, and there are 2.5 million sequences with associated papers. Additionally, LitScan actively monitors retracted articles, promptly eliminating them from the results. Over the last three months, approximately 1,000 articles have been retracted. You can browse all sequences with associated papers here: RNA AND has_lit_scan:"True".

LitSumm updates

LitSumm, our tool to produce gene summaries for ncRNAs, has been updated to use GPT4. As part of the update, we worked with Sam Griffiths-Jones to help validate selected miRNA summaries. Thanks Sam! Overall, the change to GPT4 has brought some impressive improvements. You can read the details in our updated pre-print or browse the summaries.


We are looking forward to expanding LitSumm to as many RNAs as possible. We also have a prototype API for fetching these summaries and would be interested in feedback on it. If you have any feedback on the summaries or would like to use LitSumm on your site, please get in contact.

Database updates

These databases were updated in this release to the version stated below:

  • Ensembl 110 -> 111

  • Ensembl Fungi 57 -> 58

  • Ensembl Metazoa 57 -> 58

  • Ensembl Plants 57 -> 58

  • Ensembl Protists 57 -> 58

  • Ensembl/GENCODE human 42/mouse M31 -> human 45/mouse M34

  • FlyBase FB2023_05 -> FB2024_01

  • GeneCards 5.18 -> 5.19

  • lncRNAdb 221 -> 223

  • MalaCards 5.18 -> 5.19

  • Rfam 14.9 -> 14.10

  • RefSeq 221 -> 223


The databases below were imported at their latest version as of 2024-02-20:

  • ENA

  • HGNC

  • IntAct

  • PDBe

  • PomBase

  • SGD

  • SRPDB

  • tmRNA Website

  • WormBase

  • ZFIN

Website updates

Various minor errors have been rectified on the website, including the previously missing thin lines denoting introns in the RNAcentral genomic coordinates data. Should you encounter any inaccuracies, kindly inform us.

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

30 Nov 2023

RNAcentral Release 23

We are pleased to announce RNAcentral release 23, featuring two new expert databases, MGnify and REDIportal, a new genome browser and a new lineage-based search. We are also releasing LLM-generated summaries of literature from our tool LitSumm.

Welcome MGnify

MGnify is a database of microbiome data. They provide a large collection of metagenome-assembled genomes (MAGs) from submitted and publicly accessible datasets. As part of their analysis pipelines, they use Rfam to identify ncRNAs. In this release we have imported ncRNAs from complete MAGs. This has created 135,924 new metagenome sequences from 1,929 organisms, like https://rnacentral.org/rna/URS000093D738/562 shown below.



Welcome REDIportal

REDIportal is a database of RNA editing events primarily in human sequences. In this release RNAcentral has imported the editing events to display on our sequence feature viewer. You can find sequences with edits using a simple search: has_editing_event:"True" and see an example below:



(https://rnacentral.org/rna/URS000047A7F4/9606)

 

Improved genome browser

We have moved from Genoverse to igv.js. This change now allows users to upload their own tracks to view alongside RNAcentral annotations. It also improves the long-term maintenance and stability of RNAcentral. Below is an example of the old vs new browser.





Try out the new browser and let us know what you think! 


New lineage search

The RNAcentral search index now allows users to search by phylogeny. As an example, you can now search for lineage_name:"Bacteria" to find all sequences from any bacteria. Similar searches can be done with the NCBI taxonomy id using 'lineage_path'. Previously, searches would only find exact taxonomic matches. Thanks to the EBI search team for making this possible!
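Lineage queries contain colons and double quotes, so they need percent-encoding before they can be shared as a link. Here is a small sketch using only the Python standard library. The base URL and the `q` parameter name are assumptions about how the RNAcentral text search page is addressed, so please check them against the live site before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed location of the text search page; not stated in this post.
SEARCH_BASE = "https://rnacentral.org/search"


def search_url(query: str, base: str = SEARCH_BASE) -> str:
    """Return a link that carries the query in a percent-encoded 'q' parameter."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    url = search_url('lineage_name:"Bacteria"')
    print(url)
    # Decoding the link gives back the original query string
    print(parse_qs(urlsplit(url).query)["q"][0])
```

The same helper works for the taxonomy id form, for example `search_url('lineage_path:"2"')`.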


Introducing LitSumm

The long-term goal of RNAcentral is to let our users know the function of any RNA they find in our database. We have been working toward this goal with Rfam analysis and more recently LitScan, our tool to connect literature to RNAcentral sequences. In this release, we are announcing LitSumm, our tool to summarise open access literature about specific ncRNAs using LLMs and provide citations for all claims. You can find the details of our method in our preprint: LitSumm: Large language models for literature summarisation of non-coding RNAs. Using this method, we summarised the literature about 4,610 RNAs, which you can see with a simple search: has_litsumm:"True". Below is the example summary for SNORA73B (URS00006422E6_9606).


SNORA73B is a small nucleolar RNA (snoRNA) that has been found to be overexpressed in Huh7 cells [PMC8763008]. It is also downregulated in histone encoding genes and spliceosome-associated small nuclear ribonucleoproteins (RNP) [PMC7191197]. In the context of age-related macular degeneration (AMD), SNORA73B has been found to be expressed at higher levels in both retina and PRCS tissues compared to normal tissues, suggesting a potential role in AMD [PMC5813239]. SNORA73B is associated with SIRT7 and is involved in the processing of pre-rRNAs to produce mature rRNAs [PMC4754350] [PMC8410784]. Overexpression of SNORA73B has been shown to inhibit the expression of its host genes [PMC8763008]. The snoRNABase database provides accession numbers and approved symbols for snoRNAs, including SNORA73B [PMC1687206]. The expression of SNORA23 and SNORA73B is regulated by the AKT molecular inhibitor, GSK2141795, suggesting their involvement in the PI3K/Akt/mTOR signaling pathway [PMC8763008]. In terms of cancer prognosis, SNORA73B has been identified as a potential predicting factor for outcome in cutaneous melanoma (CM) [PMC7550331]. In summary, SNORA73B is a snoRNA that shows differential expression patterns in various cellular contexts and diseases. It plays a role in RNA processing and may have implications for AMD and cancer prognosis.



We encourage the community to take a look at these summaries and give us feedback. We have big plans for the future of LitSumm and are very excited about what LLMs can offer RNAcentral.

Database updates

These databases were updated in this release to the version stated below:

Ensembl 108 -> 110
Ensembl Fungi 55 -> 57
Ensembl Metazoa 55 -> 57
Ensembl Plants 55 -> 57
Ensembl Protists 55 -> 57
FlyBase FB2022_05 -> FB2023_05
GeneCards 5.12 -> 5.18
gtRNAdb 19 -> 21
MalaCards 5.12 -> 5.18
RefSeq 216 -> 221
Rfam 14.9
REDIportal 2.0


The below databases were imported at their latest version as of 2023-10-19:


Ribocentre
SGD
SRP DB (ENA)
WormBase (ENA)
ZFIN
PDB
PomBase
QuickGO
MGnify
HGNC
IntAct
Expression Atlas
ENA

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

23 Feb 2023

RNAcentral Release 22



We are pleased to announce RNAcentral release 22, featuring two new expert databases, EVlncRNAs and Ribocentre, a new visualisation, and updates to LitScan.

Welcome EVlncRNAs

EVlncRNAs is a database of experimentally validated long non-coding RNAs, curated from papers alongside their expression, interaction and association with disease. In release 22, we have imported the second version of the database, which expands on the original by manually curating almost 19,000 additional papers. You can explore their data here.



Welcome Ribocentre

Ribocentre aims to become a database of all natural ribozymes, and includes representative structures alongside the chemical mechanism of ribozymes. You can explore their data here.


Visualise expression

In release 21 we imported cross-references to Expression Atlas. In this release we have gone further and integrated their viewer into RNAcentral. You can now view expression information for some ncRNAs on RNAcentral. Below is one example:




Browse all sequences with data from Expression Atlas here.

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. LitScan scanned all open access literature available on Europe PMC with 8,939,826 ids and found 865,179 articles which matched 4,497,573 unique sequences. Please reach out to us if we have missed any names!

Additionally, we are preparing exports of LitScan results. In future releases this will be part of our FTP export. Get in contact if you would like to hear about the export as soon as it is ready.

FTP export updates

RNAcentral is powered by a PostgreSQL database, which is publicly available. We are now making dumps of the latest release publicly available on our FTP site. We intend to keep only the latest dump available. Power users interested in large scale analysis with RNAcentral can fetch and use our database dumps now. Take a look at some documentation, details on our schema, or reach out if you have any questions!

Database updates

These databases were updated in this release to the version stated below:

  • Ensembl 107 -> 108
  • Ensembl Fungi 54 -> 55
  • Ensembl Metazoa 54 -> 55
  • Ensembl Plants 54 -> 55
  • Ensembl Protists 54 -> 55
  • RefSeq 213 -> 216
  • Rfam 14.7 -> 14.9

The databases below were imported at their latest version as of 2023-01-25:

  • ENA
  • FlyBase
  • HGNC
  • IntAct
  • PDBe
  • PSICQUIC
  • PomBase
  • QuickGO

Get in touch


As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!


27 Oct 2022

R2DT Version 1.3


We are pleased to announce the release of R2DT version 1.3, which introduces constrained folding functionality as well as new and updated templates. Read on to find out more or head to GitHub or the RNAcentral web app to start using the new software.

Constrained folding

R2DT uses templates for predicting and visualising RNA secondary structures. This works well for the majority of sequences in RNAcentral. However, some sequences include large insertions that do not align to the templates. Previous versions of R2DT displayed such regions as unfolded loops.


For example, constrained folding improves species-specific insertions found in many rRNAs:


The new mode can also generate better diagrams where the Rfam consensus structure has long dangling ends or large hairpin loops:


In order to add base pairs for the unfolded regions that are not modelled by the templates, R2DT uses RNAfold from the Vienna RNA package. There are three ways of using constrained folding:


  • Local folding: the secondary structure of the insertion relative to the template is predicted with RNAfold and added to the diagram;

  • Global folding: the entire molecule is folded using RNAfold with the template structure provided as a constraint;

  • Global folding with single-stranded (s/s) nucleotides enforced: same as above, except that the nucleotides that align to single-stranded regions of the template are kept unpaired when predicting the new structure using RNAfold.


To use constrained folding, include the “--constraint” option on the command line or choose one of the modes in the advanced options of the web app. The folding mode is automatically selected based on molecule type but can be manually overridden with the “--fold_type” parameter or using a dropdown in the web app.
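As a rough illustration of the command-line route, the sketch below assembles an R2DT invocation as an argument list without running it. Only the “--constraint” and “--fold_type” options come from this post. The `r2dt.py draw <fasta> <output>` layout, the file names and the helper name are assumptions, and the accepted fold type values are not listed here, so the value is passed through unchanged and should be taken from the R2DT documentation.

```python
import shlex
from typing import List, Optional


def r2dt_constraint_command(
    fasta: str,
    output_dir: str,
    fold_type: Optional[str] = None,
    executable: str = "r2dt.py",  # assumed entry point
) -> List[str]:
    """Assemble (but do not run) a constrained-folding command line."""
    cmd = [executable, "draw", "--constraint"]  # "draw" subcommand is an assumption
    if fold_type is not None:
        # Override the automatically selected folding mode
        cmd += ["--fold_type", fold_type]
    cmd += [fasta, output_dir]
    return cmd


if __name__ == "__main__":
    # Let R2DT pick the folding mode from the molecule type
    print(shlex.join(r2dt_constraint_command("input.fasta", "output")))
```

Returning a list rather than a single string means the result can be handed straight to `subprocess.run` without any shell quoting concerns.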

Special thanks to Holly McCann and Anton S. Petrov (Georgia Tech) for developing this feature! 👏👏👏

Other updates

  • The R2DT library now includes the latest RNA families from the Rfam release 14.8, a manually curated template for HCV IRES, and updated rRNA templates for tomato (Solanum lycopersicum). 

  • A new “--skip_ribovore_filters” command line option has been added to generate diagrams for sequences that align to the templates with a large number of insertions (learn more).

  • Traveler software has been updated to v3.0.0.

  • R2DT now generates output in JSON format using the RNA2D-data-schema.

  • See the GitHub pull request for further details on these and other changes.

Summary

The new software is now available wherever you can find R2DT: on GitHub, Docker Hub, the RNAcentral web app, and the API. Work is underway to regenerate the secondary structures displayed in RNAcentral using the new version of R2DT. Stay tuned for future releases and feel free to let us know if you have any feedback by raising an issue or using the contact us form.


20 Oct 2022

RNAcentral Release 21

We are pleased to announce RNAcentral release 21, featuring two new expert databases, PLncDB and Expression Atlas, a new visualisation, and updates to LitScan.

Welcome PLncDB

PLncDB is a comprehensive database of long non-coding RNAs found in plants, featuring data on 80 species.



PLncDB have gone to considerable lengths to ensure confidence in their annotations, more details on which can be found in the paper. To explore their data, start here.

Welcome Expression Atlas

Expression Atlas is a resource that allows users to find out where genes are expressed, and how the expression changes when disease is present. In this release, we have imported cross-references to all RNAs that have expression data, so that users can find which sequences have expression data. To explore the data imported from Expression Atlas, have a look at all Expression Atlas entries, or for a single example look at the URS below:




We plan to extend our expression data to include visualisations in the future. Please let us know if you have any suggestions for expression data in RNAcentral.

Visualise locations

We have integrated SwissBioPic visualisations into RNAcentral. These provide a simple and clear way to visualise the locations of RNA molecules in a cell. Here we show MALAT1 as an example:




We have used the Gene Ontology annotations provided by groups like the Functional Gene Annotation team at UCL as the source for these figures.

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. It has been updated with 5.6 million new RNA names leading to 545,244 papers connected to 2.5 million sequences. Please reach out to us if we have missed any names!

Additionally, we are preparing exports of LitScan results. In future releases this will be part of our FTP export. Get in contact if you would like to hear about the export as soon as it is ready.

FTP export updates

RNAcentral is powered by a PostgreSQL database, which is publicly available. We are now making dumps of the latest release publicly available on our FTP site. We intend to keep only the latest dump available. Power users interested in large scale analysis with RNAcentral can fetch and use our database dumps now. Take a look at some documentation, details on our schema, or reach out if you have any questions!

Database updates

  • ENA (snapshot as of October '22)

  • Ensembl (version 107)

  • FlyBase (2022_04)

  • GeneCards (5.12)

  • gtRNAdb (19)

  • HGNC (snapshot as of September '22)

  • IntAct (1.0.3)

  • PDBe (snapshot as of September '22)

  • PSICQUIC (snapshot as of September '22)

  • PomBase (2022-08-24)

  • QuickGO (snapshot as of September '22)

  • RefSeq (213)

  • Rfam (14.7)

  • SILVA (138.1)

  • ZFIN (2022-08-25)

  • ZWD (1.1)

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!