30 Nov 2023

RNAcentral Release 23

We are pleased to announce RNAcentral release 23, featuring two new expert databases (MGnify and REDIportal), a new genome browser, and a new lineage-based search. We are also releasing LLM-generated literature summaries produced by our tool LitSumm.

Welcome MGnify

MGnify is a database of microbiome data. It provides a large collection of metagenome-assembled genomes (MAGs) from submitted and publicly accessible datasets. As part of its analysis pipelines, MGnify uses Rfam to identify ncRNAs. In this release we have imported ncRNAs from complete MAGs. This has created 135,924 new metagenome sequences from 1,929 organisms, such as https://rnacentral.org/rna/URS000093D738/562 shown below.

Welcome REDIportal

REDIportal is a database of RNA editing events, primarily in human sequences. In this release RNAcentral has imported these editing events and displays them in our sequence feature viewer. You can find sequences with edits using a simple search: has_editing_event:"True" and see an example below:



Improved genome browser

We have moved from Genoverse to igv.js. This change lets users upload their own tracks to view alongside RNAcentral annotations. It also improves the long-term maintainability and stability of RNAcentral. Below is an example of the old vs new browser.

Try out the new browser and let us know what you think! 

New lineage search

The RNAcentral search index now allows users to search by phylogeny. For example, you can now search for lineage_name:"Bacteria" to find all sequences from any bacterium. Similar searches can be done with the NCBI taxonomy ID using the lineage_path field. Previously, searches would only find exact taxonomic matches. Thanks to the EBI Search team for making this possible!
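For readers who want to run these field searches from a script, here is a minimal sketch that builds the query strings and a search URL. It only constructs the URL and makes no network request. The helper names field_query and search_url are hypothetical. The endpoint and its query, format and size parameters assume that the RNAcentral index is reachable through the public EBI Search REST service under the "rnacentral" domain, and joining clauses with AND is also an assumption about the query syntax.

```python
from urllib.parse import urlencode

# Assumption: RNAcentral is exposed as the "rnacentral" domain of EBI Search.
EBI_SEARCH_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest/rnacentral"


def field_query(field: str, value: str) -> str:
    """Return a field query such as lineage_name:"Bacteria" (hypothetical helper)."""
    escaped = value.replace('"', '\\"')  # keep embedded quotes inside the value
    return f'{field}:"{escaped}"'


def search_url(*clauses: str, size: int = 10) -> str:
    """Join clauses with AND and return a URL-encoded search URL (hypothetical helper)."""
    query = " AND ".join(clauses)
    params = {"query": query, "format": "json", "size": size}
    return f"{EBI_SEARCH_URL}?{urlencode(params)}"


# The two field searches mentioned in this post.
bacteria = field_query("lineage_name", "Bacteria")
edited = field_query("has_editing_event", "True")

print(bacteria)
print(search_url(bacteria))
print(search_url(edited, size=5))
```

The same field_query helper can be given lineage_path and a taxonomy ID, or has_litsumm and "True"; the exact value format expected by lineage_path is not shown here because it is not specified in this post.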

Introducing LitSumm

The long-term goal of RNAcentral is to tell our users the function of any RNA they find in our database. We have been working toward this goal with Rfam analysis and, more recently, LitScan, our tool to connect literature to RNAcentral sequences. In this release, we are announcing LitSumm, our tool that uses LLMs to summarise open access literature about specific ncRNAs and provide citations for all claims. You can find the details of our method in our preprint: LitSumm: Large language models for literature summarisation of non-coding RNAs. Using LitSumm, we summarised the literature about 4,610 RNAs, which you can see with a simple search: has_litsumm:"True". Below is the example summary for SNORA73B (URS00006422E6_9606).

SNORA73B is a small nucleolar RNA (snoRNA) that has been found to be overexpressed in Huh7 cells [PMC8763008]. It is also downregulated in histone encoding genes and spliceosome-associated small nuclear ribonucleoproteins (RNP) [PMC7191197]. In the context of age-related macular degeneration (AMD), SNORA73B has been found to be expressed at higher levels in both retina and PRCS tissues compared to normal tissues, suggesting a potential role in AMD [PMC5813239]. SNORA73B is associated with SIRT7 and is involved in the processing of pre-rRNAs to produce mature rRNAs [PMC4754350] [PMC8410784]. Overexpression of SNORA73B has been shown to inhibit the expression of its host genes [PMC8763008]. The snoRNABase database provides accession numbers and approved symbols for snoRNAs, including SNORA73B [PMC1687206]. The expression of SNORA23 and SNORA73B is regulated by the AKT molecular inhibitor, GSK2141795, suggesting their involvement in the PI3K/Akt/mTOR signaling pathway [PMC8763008]. In terms of cancer prognosis, SNORA73B has been identified as a potential predicting factor for outcome in cutaneous melanoma (CM) [PMC7550331]. In summary, SNORA73B is a snoRNA that shows differential expression patterns in various cellular contexts and diseases. It plays a role in RNA processing and may have implications for AMD and cancer prognosis.

We encourage the community to take a look at these summaries and give us feedback. We have big plans for the future of LitSumm and are very excited about what LLMs can offer RNAcentral.

Database updates

These databases were updated in this release to the versions stated below:

Ensembl 108 -> 110
Ensembl Fungi 55 -> 57
Ensembl Metazoa 55 -> 57
Ensembl Plants 55 -> 57
Ensembl Protists 55 -> 57
FlyBase FB2022_05 -> FB2023_05
GeneCards 5.12 -> 5.18
GtRNAdb 19 -> 21
MalaCards 5.12 -> 5.18
RefSeq 216 -> 221
Rfam 14.9
REDIportal 2.0

The databases below were imported at their latest version as of 2023-10-19:

WormBase (ENA)
Expression Atlas

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!