We are pleased to announce RNAcentral release 23, featuring two new expert databases (MGnify and REDIportal), a new genome browser, and a new lineage-based search. We are also releasing LLM-generated literature summaries produced by our tool LitSumm.
Welcome MGnify
MGnify is a database of microbiome data. They provide a large collection of metagenome-assembled genomes (MAGs) from submitted and publicly accessible datasets. As part of their analysis pipelines, they use Rfam to identify ncRNAs. In this release we have imported ncRNAs from complete MAGs, adding 135,924 new metagenome sequences from 1,929 organisms, such as https://rnacentral.org/rna/URS000093D738/562, shown below.
Welcome REDIportal
REDIportal is a database of RNA editing events primarily in human sequences. In this release RNAcentral has imported the editing events to display on our sequence feature viewer. You can find sequences with edits using a simple search: has_editing_event:"True" and see an example below:
(https://rnacentral.org/rna/URS000047A7F4/9606)
Improved genome browser
We have moved from Genoverse to igv.js. This change lets users upload their own tracks to view alongside RNAcentral annotations, and it improves the long-term maintenance and stability of RNAcentral. Below is an example of the old vs new browser.
Try out the new browser and let us know what you think!
New lineage search
The RNAcentral search index now allows users to search by phylogeny. For example, you can now search lineage_name:"Bacteria" to find all sequences from any bacteria. Similar searches can be done with the NCBI taxonomy ID using the lineage_path field. Previously, searches would only find exact taxonomic matches. Thanks to the EBI search team for making this possible!
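The field searches mentioned in this post all follow the same field:"value" pattern, so they are easy to generate in a script. The snippet below is a minimal sketch, not an official client: the helper names are hypothetical, and it assumes that search links take the form https://rnacentral.org/search?q=... and that terms can be combined with AND. The lineage_path field is left out because the exact value syntax is not shown here.

```python
from urllib.parse import quote

# Assumption: the website search page accepts the query in a "q" parameter.
SEARCH_URL = "https://rnacentral.org/search"


def field_query(field: str, value: str) -> str:
    """Return a field:"value" search term, written as in this post."""
    return f'{field}:"{value}"'


def search_url(*terms: str) -> str:
    """Join terms with AND (assumed operator) and URL-encode them into a link."""
    query = " AND ".join(terms)
    return f"{SEARCH_URL}?q={quote(query)}"


# The three field searches quoted in this announcement.
bacteria = field_query("lineage_name", "Bacteria")
edited = field_query("has_editing_event", "True")
summarised = field_query("has_litsumm", "True")

print(bacteria)
print(search_url(bacteria))
# Illustrative combination: sequences with both editing events and a summary.
print(search_url(edited, summarised))
```

The query strings themselves can also be pasted directly into the search box on the website.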
Introducing LitSumm
The long-term goal of RNAcentral is to tell our users the function of any RNA they find in our database. We have been working toward this goal with Rfam analysis and, more recently, LitScan, our tool to connect literature to RNAcentral sequences. In this release, we are announcing LitSumm, our tool that uses LLMs to summarise open access literature about specific ncRNAs and provides citations for all claims. You can find the details of our method in our preprint: LitSumm: Large language models for literature summarisation of non-coding RNAs. Using this method, we summarised the literature about 4,610 RNAs, which you can see with a simple search: has_litsumm:"True". Below is the example summary for SNORA73B (URS00006422E6_9606).
We encourage the community to take a look at these summaries and give us feedback. We have big plans for the future of LitSumm and are very excited about what LLMs can offer RNAcentral.
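The SNORA73B example above is written as URS00006422E6_9606, while the links earlier in this post use the form /rna/URS.../taxid. The snippet below is a minimal sketch for converting one into the other. It assumes that the part after the underscore is the NCBI taxonomy ID (9606 is human) and that the URL pattern matches the links shown in this post; the function name is hypothetical.

```python
# URL pattern taken from the sequence links in this post.
BASE_URL = "https://rnacentral.org/rna"


def sequence_url(identifier: str) -> str:
    """Turn an identifier like 'URS00006422E6_9606' into a sequence page link."""
    # Split on the first underscore: left is the URS accession, right the taxid.
    urs, sep, taxid = identifier.partition("_")
    if not (urs.startswith("URS") and sep and taxid.isdigit()):
        raise ValueError(f"expected URS<id>_<taxid>, got {identifier!r}")
    return f"{BASE_URL}/{urs}/{taxid}"


print(sequence_url("URS00006422E6_9606"))
```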
Database updates
These databases were updated in this release to the version stated below:
The databases below were imported at their latest version as of 2023-10-19:
Ribocentre
SGD
SRP DB (ENA)
WormBase (ENA)
ZFIN
PDB
PomBase
QuickGO
MGnify
HGNC
IntAct
Expression Atlas
ENA
Get in touch
As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!