8 Oct 2025

RNAcentral Release 26

We are thrilled to announce RNAcentral release 26, which introduces a major milestone in how we organize and present non-coding RNA data: the creation of gene-level entries for all sequences across 204 organisms.

Why genes matter

Until now, RNAcentral has been a sequence-based resource, where each unique RNA sequence receives its own identifier (URS id) and is treated as a separate entry. While this approach has served us well, it has created some challenges for our users:

  • Fragmented related sequences: Transcripts that differ by just a single nucleotide are treated as completely separate entries, even when they represent variants of the same biological entity.
  • Confusing multiplicity: For highly studied genes like rRNAs, there can be thousands of near-identical sequences with slight variations in length or a few nucleotide changes.
  • Lost biological context: Many experiments and research questions operate at the gene level rather than the transcript level, making it difficult to find all relevant sequences for a particular gene.

These issues led us to develop a gene building pipeline that groups related transcripts together while maintaining RNAcentral's comprehensive coverage.

Building genes at scale

Creating gene-level entries for RNAcentral presented unique challenges. We couldn't simply adopt gene definitions from resources like Ensembl, because RNAcentral contains many sequences absent from other databases. We needed an automated approach that could:

  • Build gene-level entries comparable to those in Ensembl
  • Handle the addition of thousands of new sequences with each release
  • Maintain stable gene identifiers even as transcripts are added or modified
  • Work consistently across all organisms in RNAcentral

Our solution uses a graph clustering algorithm combined with machine learning. We compare pairs of transcripts using a random forest model trained on manually curated human genes from Ensembl/GENCODE. The model considers three types of features: the distance between transcript start sites, the overlap of exons, and the similarity of RNA types. Transcripts predicted to come from the same gene are connected in a graph, and communities within this graph become our genes.
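As a rough illustration of the idea, the sketch below links transcripts that a pairwise predictor assigns to the same gene and then groups them. It is not the RNAcentral pipeline: the `Transcript` fields, the thresholds, and the hand-written `same_gene` rule are all assumptions standing in for the trained random forest, and plain connected components stand in for community detection.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    urs: str        # hypothetical transcript identifier
    start: int      # transcript start site on the genome
    exons: tuple    # (start, end) half-open intervals, non-overlapping
    rna_type: str


def exon_overlap(a, b):
    """Fraction of the shorter transcript's exonic bases shared with the other."""
    shared = sum(
        max(0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a.exons
        for s2, e2 in b.exons
    )
    shortest = min(sum(e - s for s, e in t.exons) for t in (a, b))
    return shared / shortest if shortest else 0.0


def same_gene(a, b):
    """Hand-written stand-in for the classifier (assumed thresholds, not the real model)."""
    close_start = abs(a.start - b.start) <= 1000
    return a.rna_type == b.rna_type and close_start and exon_overlap(a, b) >= 0.5


def build_genes(transcripts):
    """Connect predicted same-gene pairs, then return the connected components."""
    graph = defaultdict(set)
    for a, b in combinations(transcripts, 2):
        if same_gene(a, b):
            graph[a.urs].add(b.urs)
            graph[b.urs].add(a.urs)

    genes, seen = [], set()
    for t in transcripts:
        if t.urs in seen:
            continue
        stack, component = [t.urs], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        genes.append(sorted(component))
    return genes


# Made-up transcripts for illustration only.
t1 = Transcript("T1", 100, ((100, 200), (300, 400)), "lncRNA")
t2 = Transcript("T2", 120, ((120, 200), (300, 420)), "lncRNA")
t3 = Transcript("T3", 5000, ((5000, 5100),), "lncRNA")
genes = build_genes([t1, t2, t3])
```

Comparing every pair is quadratic in the number of transcripts, so a real implementation would presumably only compare transcripts that lie near each other on the genome.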

To ensure genes remain stable across releases, we developed logic to track and merge gene clusters between versions. This means that as RNAcentral grows, gene identifiers persist even as new transcript variants are discovered and added.
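The release notes do not describe this tracking logic in detail. One hypothetical way to keep identifiers stable is to let each new cluster inherit the identifier of the previous gene with which it shares the most transcripts. In the sketch below, the function name, the data shapes, and the tie-breaking rule are all assumptions, and version handling is left out.

```python
from itertools import count


def carry_over_ids(previous, current, new_id):
    """Assign gene ids to current clusters, reusing previous ids where possible.

    previous: dict of gene id -> set of transcript ids (last release)
    current:  list of sets of transcript ids (this release)
    new_id:   callable returning a fresh id for clusters with no predecessor
    """
    assigned, used = {}, set()
    # Larger clusters pick first, so a split gene keeps its id on the bigger part.
    for members in sorted(current, key=len, reverse=True):
        candidates = sorted(
            (-len(members & old), gene_id)
            for gene_id, old in previous.items()
            if gene_id not in used and members & old
        )
        # Best candidate = largest overlap; ties fall back to the smallest id.
        gene_id = candidates[0][1] if candidates else new_id()
        used.add(gene_id)
        assigned[gene_id] = set(members)
    return assigned


# Made-up identifiers for illustration only.
counter = count(1)
previous = {"G1": {"a", "b"}, "G2": {"c"}}
current = [{"a", "b", "x"}, {"c", "d"}, {"z"}]
result = carry_over_ids(previous, current, lambda: f"NEW{next(counter)}")
```

Here the cluster that gained transcript "x" keeps "G1", the cluster that gained "d" keeps "G2", and only the unrelated cluster receives a new identifier.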

What's in this release

RNAcentral release 26 now contains 103,814 human non-coding RNA genes built from 600,225 transcripts, covering 56 different RNA types. The most abundant are long non-coding RNAs (lncRNAs) with 65,187 genes, followed by antisense lncRNAs (16,790 genes) and pre-miRNAs (8,560 genes). Beyond human, we have predicted 367,909 genes across 203 species, totalling 1,189,743 transcripts. The average species has ~1800 predicted genes.

The average ncRNA gene in RNAcentral contains 6 transcripts, though this varies widely depending on the RNA type. We've successfully built single genes for well-studied lncRNAs like MALAT1 and NEAT1, and correctly separated miRNA sequences that map to different genomic locations.

Each gene receives a unique identifier following the pattern RNACG<species-prefix><11-digit hash>.<version>, allowing you to track genes across releases. Genes are assigned RNA types and descriptions based on expert database annotations, Rfam families, and R2DT structural information.
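A minimal sketch of splitting such an identifier into its parts is shown below. It assumes the hash consists of exactly 11 decimal digits and the version is an integer. The exact form of the species prefix is not stated here, so the pattern accepts any non-empty prefix. The example identifier is invented.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: RNACG + species prefix + 11 decimal digits + "." + integer version.
GENE_ID = re.compile(r"RNACG(?P<prefix>.+?)(?P<hash>\d{11})\.(?P<version>\d+)")


class GeneId(NamedTuple):
    species_prefix: str
    hash_digits: str
    version: int


def parse_gene_id(text: str) -> Optional[GeneId]:
    """Return the parts of a gene identifier, or None if it does not fit the pattern."""
    match = GENE_ID.fullmatch(text)
    if match is None:
        return None
    return GeneId(match["prefix"], match["hash"], int(match["version"]))


# Invented identifier, used only to exercise the parser.
parsed = parse_gene_id("RNACGXYZ01234567890.2")
```

Comparing only the prefix and hash, and ignoring the version, is one way to recognise the same gene across releases.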

How to use genes in RNAcentral

We've made genes accessible throughout the RNAcentral website:

  • Text search: Select 'Genes' in the Entry Type facet to see only gene-level results
  • Sequence pages: All transcript entries now link to their parent gene (if applicable), e.g. https://rnacentral.org/rna/URS0000D59DC9/9606
  • GFF files: Genes appear as 'predicted_gene' entries in our downloadable annotation files, available on our FTP site
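For the GFF files, a small filter along the following lines could pull out the gene-level entries. It assumes that 'predicted_gene' appears in the feature type column (column 3) of a standard nine-column GFF3 file. The sample lines and attribute names are made up for illustration.

```python
import io


def iter_predicted_genes(handle, feature_type="predicted_gene"):
    """Yield one dict per GFF3 line whose type column equals feature_type."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attributes = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        yield {
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attributes,
        }


# Made-up sample; real files will carry different ids and attributes.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\tRNAcentral\tpredicted_gene\t100\t900\t.\t+\t.\tID=example_gene\n"
    "chr1\tRNAcentral\ttranscript\t100\t900\t.\t+\t.\tID=example_tx;Parent=example_gene\n"
)
genes = list(iter_predicted_genes(io.StringIO(SAMPLE)))
```

For a compressed download, the same function can be given a handle opened with `gzip.open(path, "rt")`.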

We recognize that some complex lncRNAs with extensive alternative splicing may not yet be built perfectly, and we're actively working on improvements. This is the first iteration of our gene building pipeline, and we'll continue to refine it based on user feedback and emerging edge cases.

Database Updates

Release 26 does not update any of the underlying data, so database versions are as they were in Release 25.

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback about the new gene-level entries or any other aspect of RNAcentral, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

19 Jun 2025

RNAcentral Release 25

We are pleased to announce RNAcentral release 25, which features a major update to Rfam, an update of TarBase to v9 and improvements to LitScan.

Welcome Rfam 15.0


Last year, Rfam underwent a major update to release 15.0, which you can read more about in this blog post or in our NAR article. The update greatly increased the number of sequences, with Rfam growing from 1,879,595 to 5,902,604 unique sequences in this release. This release comes with updated genomes used in Rfamseq and updated full sequence data for all Rfam families!

TarBase updated to version 9


TarBase produces, curates and delivers high-quality experimentally supported miRNA targets on protein-coding transcripts. In this release, TarBase has been updated to version 9, which includes a six-fold increase in the number of miRNA-gene interactions compared to the previous version. Click here to browse all TarBase sequences.

LitScan updates


LitScan has been updated to scan 13,631,547 ids across 1,273,544 papers. We are testing a new addition to LitScan that filters out false-positive articles unrelated to ncRNA. In our evaluations, this significantly reduced the number of articles identified where ncRNA was not the main focus of the work, particularly for identifiers that are common words. We hope you will see an improvement in the articles presented on the literature tab.

Database updates


These databases were updated in this release to the version stated below:

ENA 2024-02-20 -> 2025-03-06
Ensembl 111 -> 113
Ensembl Fungi 58 -> 60
Ensembl Metazoa 58 -> 60
Ensembl Plants 58 -> 60
Ensembl Protists 58 -> 60
Ensembl/GENCODE human 45/mouse M25 -> human 47/mouse M34
Expression Atlas 2023-10-19 -> 2025-04-01
FlyBase FB2024_01 -> FB2025_01
GeneCards 5.19 -> 5.23
HGNC 2024-02-20 -> 2025-03-06
IntAct 2024-02-20 -> 2024-11-05
MalaCards 5.19 -> 5.23
MirGeneDB 3.0
PDBe 2024-02-20 -> 2024-11-05
PomBase 2024-02-20 -> 2024-11-05
QuickGO 2024-11-05
RefSeq 223 -> 2025-03-06
Rfam 14.10 -> 15.0
SGD 2024-02-20 -> 2025-03-06
SRPDB 2024-02-20 -> 2025-03-06
TarBase v7 -> v9
WormBase 2024-02-20 -> 2025-03-06
ZFIN 2024-02-20 -> 2024-11-05
lncRNAdb 223 -> 223

Sequence Search

Our sequence search tool is still in the process of being updated. While we don’t expect this will cause any problems, there may be some temporary capacity issues. This update should be completed soon, and the service will return to full functionality.

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

26 Mar 2024

RNAcentral Release 24

We are pleased to announce RNAcentral release 24, which features a major update to the tmRNA Website, improvements to our genome browser and updates to LitScan and LitSumm. LitSumm now generates summaries using GPT4. Read on for the details.

Welcome tmRNA Website 2.0

Recently, the tmRNA Website has undergone two major changes. First, it has overhauled how tmRNA sequences are identified. The resulting new dataset provides RNAcentral with 96,670 sequences, which are extensively annotated with functional features. For example:


Shown above is an example of Candidatus Gastranaerophilaceae tmRNA (URS0002856755/3022868)


Secondly, the tmRNA Website is no longer updated and all data is now hosted at RNAcentral. We will continue to host and update this data as normal. You can browse the tmRNA data by searching expert_db:"tmRNA Website".

Improved taxonomic search

We have improved our taxonomic search to allow users to more easily find sequences of interest. In our last release we made it possible for users to search for subspecies, but this wasn’t very user-friendly. Now there is an option to find all sequences from a taxon and any of its subspecies right in the taxonomic search box. For example, this search can now find all E. coli and subspecies, like E. coli K-12.


Here is an example of searching for: (TAXONOMY:"562" OR lineage_path:"562"). There is now a simple button to enable taxonomy-aware search.


This change applies to all taxonomic levels, so searching for all sequences for bacterial rRNAs is as easy as (TAXONOMY:"2" OR lineage_path:"2") AND so_rna_type_name:"RRNA". Currently, users have to choose to use subspecies searching, but this may change in the future. Try it out and let us know if you have any feedback!
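The query strings above follow a regular shape, so they are easy to assemble programmatically. The helper names in this sketch are hypothetical; only the field names (TAXONOMY, lineage_path, so_rna_type_name) come from the examples above.

```python
def taxonomy_query(taxid: int, include_descendants: bool = True) -> str:
    """Build a search clause for a taxon, optionally including everything below it."""
    exact = f'TAXONOMY:"{taxid}"'
    if not include_descendants:
        return exact
    return f'({exact} OR lineage_path:"{taxid}")'


def with_rna_type(query: str, so_rna_type_name: str) -> str:
    """Restrict an existing query to a single RNA type."""
    return f'{query} AND so_rna_type_name:"{so_rna_type_name}"'


ecoli = taxonomy_query(562)
bacterial_rrna = with_rna_type(taxonomy_query(2), "RRNA")
```

The resulting strings can be pasted into the RNAcentral text search box.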

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. 


The number of articles increased significantly in this release: papers rose from 915 thousand to 1.1 million, and there are 2.5 million sequences with associated papers. Additionally, LitScan actively monitors retracted articles and promptly removes them from the results. Over the last three months, approximately 1,000 articles have been retracted. You can browse all sequences with associated papers here: RNA AND has_lit_scan:"True".

LitSumm updates

LitSumm, our tool to produce gene summaries for ncRNAs, has been updated to use GPT4. As part of the update, we worked with Sam Griffiths-Jones to help validate selected miRNA summaries. Thanks Sam! Overall, the change to GPT4 has brought some impressive improvements. You can read the details in our updated pre-print or browse the summaries.


We are looking forward to expanding LitSumm to as many RNAs as possible. We also have a prototype API for fetching these summaries and would be interested in feedback on it. If you have any feedback on the summaries or would like to use LitSumm on your site, please get in contact.

Database updates

These databases were updated in this release to the version stated below:

  • Ensembl 110 -> 111

  • Ensembl Fungi 57 -> 58

  • Ensembl Metazoa 57 -> 58

  • Ensembl Plants 57 -> 58

  • Ensembl Protists 57 -> 58

  • Ensembl/GENCODE human 42/mouse M31 -> human 45/mouse M34

  • FlyBase FB2023_05 -> FB2024_01

  • GeneCards 5.18 -> 5.19

  • lncRNAdb 221 -> 223

  • MalaCards 5.18 -> 5.19

  • Rfam 14.9 -> 14.10

  • RefSeq 221 -> 223


The below databases were imported at their latest version as of 2024-02-20:

  • ENA

  • HGNC

  • IntAct

  • PDBe

  • PomBase

  • SGD

  • SRPDB

  • tmRNA Website

  • WormBase

  • ZFIN

Website updates

Various minor errors have been rectified on the website, including the previously missing thin lines denoting introns in the RNAcentral genomic coordinates data. Should you encounter any inaccuracies, kindly inform us.

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

30 Nov 2023

RNAcentral Release 23

We are pleased to announce RNAcentral release 23, featuring two new expert databases, MGnify and REDIportal, a new genome browser and a new lineage-based search. We are also releasing LLM-generated summaries of literature from our tool LitSumm.

Welcome MGnify

MGnify is a database of microbiome data. They provide a large collection of metagenome-assembled genomes (MAGs) from submitted and publicly accessible datasets. As part of their analysis pipelines, they use Rfam to identify ncRNAs. In this release we have imported ncRNAs from complete MAGs. This has created 135,924 new metagenome sequences from 1,929 organisms, like https://rnacentral.org/rna/URS000093D738/562 shown below.



Welcome REDIportal

REDIportal is a database of RNA editing events primarily in human sequences. In this release RNAcentral has imported the editing events to display on our sequence feature viewer. You can find sequences with edits using a simple search: has_editing_event:"True" and see an example below:



(https://rnacentral.org/rna/URS000047A7F4/9606)

 

Improved genome browser

We have moved from using Genoverse to igv.js. This change allows users to upload their own tracks to view alongside RNAcentral annotations. It also improves the long-term maintenance and stability of RNAcentral. Below is an example of the old vs new browser.





Try out the new browser and let us know what you think! 


New lineage search

The RNAcentral search index now allows users to search by phylogeny. For example, you can now use lineage_name:"Bacteria" to find all sequences from any bacteria. Similar searches can be done with the NCBI taxonomy id using 'lineage_path'. Previously, searches would only find exact taxonomic matches. Thanks to the EBI search team for making this possible!


Introducing LitSumm

The long-term goal of RNAcentral is to let our users know the function of any RNA they find in our database. We have been working toward this goal with Rfam analysis and more recently LitScan, our tool to connect literature to RNAcentral sequences. In this release, we are announcing LitSumm, our tool that uses LLMs to summarise open access literature about specific ncRNAs and provides citations for all claims. You can find the details of our method in our preprint: LitSumm: Large language models for literature summarisation of non-coding RNAs. Using this method, we summarised the literature about 4,610 RNAs, which you can see with a simple search: has_litsumm:"True". Below is the example summary for SNORA73B (URS00006422E6_9606).


SNORA73B is a small nucleolar RNA (snoRNA) that has been found to be overexpressed in Huh7 cells [PMC8763008]. It is also downregulated in histone encoding genes and spliceosome-associated small nuclear ribonucleoproteins (RNP) [PMC7191197]. In the context of age-related macular degeneration (AMD), SNORA73B has been found to be expressed at higher levels in both retina and PRCS tissues compared to normal tissues, suggesting a potential role in AMD [PMC5813239]. SNORA73B is associated with SIRT7 and is involved in the processing of pre-rRNAs to produce mature rRNAs [PMC4754350] [PMC8410784]. Overexpression of SNORA73B has been shown to inhibit the expression of its host genes [PMC8763008]. The snoRNABase database provides accession numbers and approved symbols for snoRNAs, including SNORA73B [PMC1687206]. The expression of SNORA23 and SNORA73B is regulated by the AKT molecular inhibitor, GSK2141795, suggesting their involvement in the PI3K/Akt/mTOR signaling pathway [PMC8763008]. In terms of cancer prognosis, SNORA73B has been identified as a potential predicting factor for outcome in cutaneous melanoma (CM) [PMC7550331]. In summary, SNORA73B is a snoRNA that shows differential expression patterns in various cellular contexts and diseases. It plays a role in RNA processing and may have implications for AMD and cancer prognosis.



We encourage the community to take a look at these summaries and give us feedback. We have big plans for the future of LitSumm and are very excited about what LLMs can offer RNAcentral.

Database updates

These databases were updated in this release to the version stated below:

Ensembl 108 -> 110
Ensembl Fungi 55 -> 57
Ensembl Metazoa 55 -> 57
Ensembl Plants 55 -> 57
Ensembl Protists 55 -> 57
FlyBase FB2022_05 -> FB2023_05
GeneCards 5.12 -> 5.18
gtRNAdb 19 -> 21
MalaCards 5.12 -> 5.18
RefSeq 216 -> 221
Rfam 14.9
REDIportal 2.0


The below databases were imported at their latest version as of 2023-10-19:


Ribocentre
SGD
SRP DB (ENA)
WormBase (ENA)
ZFIN
PDB
PomBase
QuickGO
MGnify
HGNC
IntAct
Expression Atlas
ENA

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

23 Feb 2023

RNAcentral Release 22



We are pleased to announce RNAcentral release 22, featuring two new expert databases, EVlncRNAs and Ribocentre, a new visualisation, and updates to LitScan.

Welcome EVlncRNAs

EVlncRNAs is a database of experimentally validated long non-coding RNAs, curated from papers alongside their expression, interaction and association with disease. In release 22, we have imported the second version of the database, which expands on the original by manually curating almost 19,000 additional papers. You can explore their data here.



Welcome Ribocentre

Ribocentre aims to become a database of all natural ribozymes, and includes representative structures alongside the chemical mechanism of ribozymes. You can explore their data here.


Visualise expression

In release 21 we imported cross-references to Expression Atlas. We have now gone further and integrated their viewer into RNAcentral, so you can view expression information for some ncRNAs directly on RNAcentral. Below is one example:




Browse all sequences with data from Expression Atlas here.

LitScan updates

LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers. LitScan scanned all open access literature available on Europe PMC with 8,939,826 ids and found 865,179 articles which matched 4,497,573 unique sequences. Please reach out to us if we have missed any names!

Additionally, we are preparing exports of LitScan results. In future releases this will be part of our FTP export. Get in contact if you would like to hear about the export as soon as it is ready.

FTP export updates

RNAcentral is powered by a PostgreSQL database, which is publicly available. We are now making dumps of the latest release publicly available on our FTP site. We intend to keep only the latest dump available. Power users interested in large-scale analysis with RNAcentral can fetch and use our database dumps now. Take a look at some documentation, details on our schema, or reach out if you have any questions!

Database updates

These databases were updated in this release to the version stated below:

  • Ensembl 107 -> 108
  • Ensembl Fungi 54 -> 55
  • Ensembl Metazoa 54 -> 55
  • Ensembl Plants 54 -> 55
  • Ensembl Protists 54 -> 55
  • RefSeq 213 -> 216
  • Rfam 14.7 -> 14.9

The below databases were imported at their latest version as of 2023-01-25:

  • ENA
  • FlyBase
  • HGNC
  • IntAct
  • PDBe
  • PSICQUIC
  • PomBase
  • QuickGO

Get in touch


As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. If you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!