18 Oct 2021

RNAcentral Release 19

 


We are pleased to announce that RNAcentral release 19 is now live. Version 19 features a new intermolecular interaction database, PSICQUIC, improved genome mapping, updated Rfam annotations, and CPAT-based protein detection. Read on to learn more or browse >31 million non-coding RNAs in RNAcentral.

Welcome to PSICQUIC

RNAcentral welcomes PSICQUIC (pronounced ‘psychic’) as a new member database. This database provides manually curated intermolecular interactions, and we have imported the ncRNA-protein interaction data produced by the UCL functional gene annotation team. This adds 4,420 new RNA-protein interactions to RNAcentral; for example, human hsa-miR-27a-5p has 42 new interactions.

Future updates to PSICQUIC will be added to RNAcentral as part of our release process. You can browse all sequences with PSICQUIC annotations here.

Updated Rfam Analysis

Since RNAcentral started annotating sequences with Rfam models in release 8, Rfam has updated and improved hundreds of families. RNAcentral always uses the latest Rfam models on new sequences, but reanalyzes all existing sequences only once a year. As part of this release, we reanalyzed all sequences with the current Rfam models. We will continue to ensure RNAcentral keeps sequence annotations up-to-date with Rfam.

Visualisation of ORFs predicted by CPAT

RNAcentral is committed to providing a high quality set of ncRNAs. To that end, we have started analyzing all sequences from human, fly, zebrafish, and mouse with CPAT, which predicts possible open reading frames (ORFs). We found that only 6% of sequences from the scanned organisms contain a possible ORF. For example, Homo sapiens long non-coding RNA NONHSAT212870.1 contains a possible ORF, which is displayed in the feature viewer.



We have added a new QA check, possible ORF, to make users aware of these sequences, which can be browsed here.
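To illustrate what a "possible ORF" means, here is a minimal sketch that scans the three forward reading frames of a sequence for the longest stretch running from a start codon (ATG) to an in-frame stop codon. It does not reproduce CPAT or its scoring, and it is not the RNAcentral pipeline; the function name `longest_orf` and the forward-strand-only scan are assumptions made for this example.

```python
# Illustrative only: a naive forward-strand ORF scan, not CPAT's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Return (start, end) of the longest ATG..stop ORF, or None.

    Coordinates are 0-based and half-open, and the stop codon is included.
    RNA input is accepted: U is treated as T.
    """
    dna = sequence.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best
```

A real coding-potential assessment would also weigh features such as ORF length relative to the transcript, so a hit from this scan is only a starting point.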

Database updates

We have updated 11 databases, bringing the total number of sequences to 31 million. The updated databases are:

  • ENA (as of 3 September 2021)
  • FlyBase (FB2021_04)
  • GeneCards/MalaCards (5.5)
  • GtRNAdb (release 19)
  • HGNC (as of 3 September 2021)
  • IntAct (as of 3 September 2021)
  • PDBe (as of 3 September 2021)
  • PSICQUIC (as of 3 September 2021)
  • PomBase (as of 2 September 2021)
  • QuickGO (as of 3 September 2021)
  • RefSeq (208)

Get in touch

As always, all data are freely available on the RNAcentral website, via the API, our public database, and in the FTP archive. The next release is scheduled for January 2022. In the meantime, if you have any feedback, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

13 Sept 2021

Welcome Blake as the new RNA Resources Project Leader

RNAcentral and Rfam recently completed the search for a project leader to succeed Anton Petrov. Blake Sweeney has been appointed and recently started in his new role. Some of you may know Blake as the current RNAcentral bioinformatician, a role in which he has been running the bioinformatic pipeline and speaking at conferences. With a PhD in RNA bioinformatics and a decade of experience developing RNA databases, including 4.5 years at RNAcentral, Blake is perfectly positioned to take the projects forward. The official handover date is in May, but until then Anton and Blake will be working together to ensure a smooth transition.

Additionally, we are now hiring a new RNAcentral bioinformatician. If you are interested in applying please see: https://bit.ly/38L86Xe. If you have any questions or comments about the RNA resources please contact Blake Sweeney at bsweeney@ebi.ac.uk.

30 Jun 2021

RNAcentral Release 18

We are pleased to announce that release 18 of RNAcentral is available! This release features improvements to the RNA types, secondary structures, and updates to our member databases.


Additionally, we are hiring for the Project Leader role. If you are interested, please see the job description here.

Improved RNA types

We worked with the Sequence Ontology (SO) team to improve the rRNA terms. The Sequence Ontology now reflects the diversity of rRNA sequences with specific subtypes for cytosolic, mitochondrial, and plastid rRNAs. A summary of the changes is shown below.



The new SO terms make it easier and quicker to find specific subtypes of rRNAs. Thanks to everyone who made this possible, including the SO team as well as Anton S. Petrov (Georgia Tech) and Steven Marygold (FlyBase)!


We include these new terms, along with some other minor improvements, in our RNA type facet. Below is a comparison of the old facet, on the left, and the new facet, on the right.



You can browse the new RNA terms here and try browsing some new annotations, like mitochondrial LSU here. We plan to continue improving the precision and extent of our annotations in future releases.

Improvements to secondary structures

We recently published our method of drawing RNA secondary structures, R2DT. This approach is based on matching sequences to templates and then folding the sequences into a secondary structure that matches the template. For more details, you can read our paper here.

In this release we have added a quality assurance step to R2DT, so we now show fewer low quality diagrams of rRNA sequences. With these changes and updates we now have 25.9 million sequences with secondary structure diagrams. We plan to continue improving R2DT. If you have any feedback on our diagrams, please get in touch! You can browse the diagrams here.

Database updates

We have updated 17 databases, bringing the total number of sequences to 30.7 million. The updated databases are:


  • ENA (snapshot as of 7 Jun 2021)
  • Ensembl (104)
  • Ensembl/GENCODE (human 38/mouse 27)
  • Ensembl Genomes (51)
  • FlyBase (fb_2021_03)
  • GeneCards (5.2)
  • HGNC (2021-05-17)
  • IntAct (2021-05-17)
  • Malacards (5.2)
  • PDB (2021-05-17)
  • PomBase (2021-05-17)
  • QuickGO (2021-05-12)
  • RefSeq (205)
  • SGD (2021-04-27)
  • SILVA (138.1)
  • ZFIN (2020-06-22)
  • ZWD (1.1)

Get in touch

The data can be freely accessed on the RNAcentral website, via the API, and in the FTP archive. The next release is scheduled for September 2021. In the meantime, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

9 Jun 2021

R2DT paper is out in Nature Communications

We are happy to announce that a paper describing R2DT, a method for visualising RNA secondary structures in consistent, reproducible, and recognisable layouts, has been published in Nature Communications.

R2DT powers the secondary structure visualisations in RNAcentral and enables the display of a wide range of RNA structures in familiar layouts, from small RNAs like tRNAs to some of the largest known structured RNAs, such as the large ribosomal subunit RNA.


You can find out more about R2DT on the EMBL website, including a story about how R2DT helped with public engagement in primary schools, or visualise your RNA sequence using the R2DT web server.



R2DT is the result of a community effort across several disciplines, including 6 different groups in 3 countries. We would like to thank everyone who contributed to this project!

The success of the method depends on having a large number of high quality templates for different RNA types provided by the RNA community. We are always looking for more templates, so if you have any diagrams you would like to have included, please reach out to us on GitHub, Twitter, or by email.

9 Mar 2021

RNAcentral release 17


We are pleased to announce that a new RNAcentral release is now live. Version 17 features new RNase P secondary structure diagrams, a new API for running sequence similarity searches programmatically, and links to piRBase, a database of piRNAs. Read on to learn more or browse >30 million non-coding RNAs in RNAcentral.


You can also help improve RNAcentral by filling out a short user survey. Please let us know how we can make RNAcentral more useful for your needs.

Welcome to new databases

piRBase is a database of piRNA sequences, with over 170 million sequences across 21 different organisms. We now provide cross links from RNAcentral to piRBase. Here is an example sequence report showing links to piRBase annotations for a mouse piRNA:


Due to the large size of piRBase, we have limited the import to the sequences that were already available in RNAcentral, which resulted in 219,000 annotated sequences. You can browse piRBase data in RNAcentral here.

Changes to ENA import

We have reduced the number of metagenomic rRNA fragments coming from ENA. In the last release we found that these fragments were becoming an ever larger fraction of RNAcentral data. To provide our users with high quality datasets, we began analyzing sequences with Ribovore and excluding partial rRNA metagenomic sequences (those matching less than 90% of the Rfam rRNA models). This excludes about 7 million sequences from RNAcentral. In the future we will work on methods to ensure RNAcentral remains a comprehensive but high quality database. If you have any questions or comments about this approach, please reach out to us by email or on GitHub.
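The exclusion rule above can be sketched as a simple coverage check. This is a minimal illustration, not the RNAcentral pipeline or Ribovore's interface: the `RrnaHit` record, its field names, and the way coverage is computed from model coordinates are all assumptions made for this example. Only the 90% threshold and the restriction to metagenomic rRNA sequences come from the text.

```python
from dataclasses import dataclass

# Threshold quoted in the release notes: keep hits covering >= 90% of the model.
MIN_MODEL_COVERAGE = 0.90


@dataclass
class RrnaHit:
    """Hypothetical record: one sequence matched to an Rfam rRNA model."""
    sequence_id: str
    is_metagenomic: bool
    model_start: int   # first matched model position (1-based, inclusive)
    model_end: int     # last matched model position (inclusive)
    model_length: int  # total length of the rRNA model


def model_coverage(hit):
    """Fraction of the model spanned by the match (assumed definition)."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def keep_sequence(hit):
    """Exclude only metagenomic sequences that cover too little of the model."""
    if not hit.is_metagenomic:
        return True
    return model_coverage(hit) >= MIN_MODEL_COVERAGE
```

With these assumptions, a metagenomic fragment matching positions 1 to 800 of a 1,000-position model (80% coverage) would be excluded, while the same fragment from a non-metagenomic source would be kept.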

RNAcentral sequence search runs on GtRNAdb

GtRNAdb now joins Rfam, miRBase, and snoDB in using the embeddable RNAcentral sequence search widget. This widget provides nhmmer sequence searches, Rfam classification and secondary structure prediction in a simple, easy to use interface.



If you would like to add the search to your website, the code and the documentation are available on GitHub. The widget is easy to integrate into any website with just 2 lines of code. It can be customised to match the appearance of the host website and to search all or just a subset of RNAcentral sequences.

Run sequence similarity searches programmatically 

The RNAcentral sequence similarity search can now be run programmatically. The API, along with Swagger documentation and example code, is available at https://rnacentral.org/sequence-search/api. If you have any questions about the API, please get in touch.
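As a rough sketch of what a programmatic search could look like, the Python below builds a FASTA-formatted query and posts it as JSON. The base URL, the `submit-job` path, and the `query` and `databases` field names are assumptions made for illustration and are not taken from this announcement; please check them against the Swagger documentation before relying on them.

```python
import json
import urllib.request

# ASSUMPTION: base URL and endpoint path are illustrative placeholders;
# confirm the real values in the Swagger documentation.
API_BASE = "https://search.rnacentral.org/api"


def build_job_payload(sequence, description="query", databases=None):
    """Build a JSON-serialisable body with a FASTA query (assumed field names)."""
    cleaned = "".join(sequence.split()).upper()  # drop whitespace and newlines
    if not cleaned:
        raise ValueError("sequence must not be empty")
    return {
        "query": f">{description}\n{cleaned}",
        # An empty list is assumed to mean "search all databases".
        "databases": list(databases or []),
    }


def submit_job(payload, api_base=API_BASE):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{api_base}/submit-job",  # ASSUMPTION: hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `submit_job(build_job_payload("CUAUACAAUCUACUGUCUUUC", databases=["mirbase"]))` would submit a search restricted to one database, assuming the endpoint accepts that shape. Keeping the payload builder separate from the network call makes it easy to check the request body without contacting the server.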

New RNase P secondary structures

We have improved the R2DT software by adding 19 new templates for RNase P that represent a wide range of organisms. These new templates provide a clearer and more consistent display of RNase P secondary structures. As an example of the improvement, here is the before, on the left, and after, on the right, for human ribonuclease P RNA component H1 (URS000013F331_9606):


You can browse the new structures here.

Get in touch

The data can be freely accessed on the RNAcentral website, via the API, and in the FTP archive. The next release is scheduled for May 2021. In the meantime, please get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!