We are pleased to announce RNAcentral release 24, which features a major update to the tmRNA Website, improvements to our genome browser and updates to LitScan and LitSumm. LitSumm now generates summaries using GPT4. Read on for the details.
Welcome tmRNA Website 2.0
Recently, the tmRNA Website has undergone two major changes. First, the tmRNA Website team has overhauled how they identify tmRNA sequences. Their new dataset provides RNAcentral with 96,670 sequences, which are extensively annotated with functional features. For example:
Shown above is an example of a Candidatus Gastranaerophilaceae tmRNA (URS0002856755/3022868)
Second, the tmRNA Website is no longer updated and all of its data is now hosted at RNAcentral. We will continue to host and update this data as normal. You can browse the tmRNA data by searching expert_db:"tmRNA Website".
Improved taxonomic search
We have improved our taxonomic search to allow users to more easily find sequences of interest. In our last release we made it possible for users to search for subspecies, but this wasn't very user-friendly. Now there is an option to find all sequences from a taxon or any of its subspecies right in the taxonomic search box. For example, this search can now find all E. coli and subspecies, like E. coli K-12.
Here is an example of searching for: (TAXONOMY:"562" OR lineage_path:"562"). A single button now switches on taxonomy-aware search.
This change applies to all taxonomic levels, so searching for all bacterial rRNA sequences is as easy as (TAXONOMY:"2" OR lineage_path:"2") AND so_rna_type_name:"RRNA". Currently, users have to opt in to subspecies searching, but this may change in the future. Try it out and let us know if you have any feedback!
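For anyone who wants to generate these queries in a script, here is a minimal sketch in Python. It only assembles the query strings shown above. The helper names `taxon_query` and `search_url` are hypothetical, and the `https://rnacentral.org/search?q=` URL pattern is an assumption about how the text search page accepts a query, so check it against the site before relying on it.

```python
from urllib.parse import urlencode


def taxon_query(taxid, rna_type=None):
    """Build a query matching a taxon and everything below it (hypothetical helper).

    taxid is an NCBI taxonomy ID, e.g. 562 for E. coli or 2 for Bacteria.
    rna_type, if given, is added as a so_rna_type_name filter.
    """
    query = f'(TAXONOMY:"{taxid}" OR lineage_path:"{taxid}")'
    if rna_type is not None:
        query += f' AND so_rna_type_name:"{rna_type}"'
    return query


def search_url(query, base="https://rnacentral.org/search"):
    """URL-encode a query for the text search page (base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # E. coli and all of its subspecies
    print(taxon_query(562))
    # All bacterial rRNAs
    print(taxon_query(2, "RRNA"))
    print(search_url(taxon_query(2, "RRNA")))
```

Keeping the query builder separate from the URL builder means the same string can be pasted straight into the search box or encoded for a link.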
LitScan updates
LitScan is our text mining tool and is used to associate papers with the RNA sequences they mention. We use our comprehensive collection of RNA names to associate sequences with papers.
The number of articles increased significantly in this release, from 915 thousand to 1.1 million papers, and there are 2.5 million sequences with associated papers. LitScan also monitors retracted articles and promptly removes them from the results. Over the last three months, approximately 1,000 articles have been retracted. You can browse all sequences with associated papers here: RNA AND has_lit_scan:"True".
LitSumm updates
LitSumm, our tool to produce gene summaries for ncRNAs, has been updated to use GPT4. As part of the update, we worked with Sam Griffiths-Jones to help validate selected miRNA summaries. Thanks Sam! Overall, the change to GPT4 has brought some impressive improvements. You can read the details in our updated pre-print or browse the summaries.
We are looking forward to expanding LitSumm to as many RNAs as possible. We also have a prototype API for fetching these summaries and would be interested in feedback on it. If you have any feedback on the summaries or would like to use LitSumm on your site, please get in contact.
Database updates
These databases were updated in this release to the versions stated below:
Ensembl 110 -> 111
Ensembl Fungi 57 -> 58
Ensembl Metazoa 57 -> 58
Ensembl Plants 57 -> 58
Ensembl Protists 57 -> 58
Ensembl/GENCODE human 42/mouse M31 -> human 45/mouse M34
FlyBase FB2023_05 -> FB2024_01
GeneCards 5.18 -> 5.19
lncRNAdb 221 -> 223
MalaCards 5.18 -> 5.19
Rfam 14.9 -> 14.10
RefSeq 221 -> 223
The databases below were imported at their latest version as of 2024-02-20:
ENA
HGNC
IntAct
PDBe
PomBase
SGD
SRPDB
tmRNA Website
WormBase
ZFIN
Website updates
Various minor errors have been fixed on the website, including the previously missing thin lines denoting introns in the RNAcentral genomic coordinates data. If you encounter any inaccuracies, please let us know.