29 Jan 2019

RNAcentral Release 11

We are pleased to announce that a new RNAcentral release is now available. Version 11 includes miRNA-target interactions from TarBase and LncBase, a new public Postgres database, data from Ensembl Plants, miRBase word clouds, and lots more.


microRNA target interactions

We continue to expand the functional annotations available in RNAcentral by importing human and mouse miRNA-target interactions from LncBase and TarBase. These databases contain high-quality, literature-based interactions between miRNAs and their lncRNA and protein targets. The interactions are displayed on sequence pages. For example, human microRNA hsa-miR-1226-5p interacts with both lncRNAs and proteins:

You can also see when a lncRNA is targeted by a miRNA, for example:

The interactions are searchable, for example you can find sequences:

For more details see the help centre or browse the sequences in RNAcentral.

We would like to thank Artemis Hatzigeorgiou, Dimitra Karagkouni, Maria Paraskevopoulou and other present and former members of the DIANA lab for help with importing the data and developing the interface.

New website features

miRBase word clouds

Starting in version 22, miRBase began providing word clouds based on snippets from the open access literature mentioning miRBase accessions. The word clouds are now available from the corresponding RNAcentral pages, providing quick insight into the function of a miRNA. For example, miRNA mir-100 is associated with cancer, with this term prominently featured in the word cloud:

Find out more about literature mining and word clouds in the latest miRBase paper.

Conserved regions

We now display conserved regions for 29 vertebrate species using data from the CRS website. The conserved structures were identified with CMFinder using vertebrate multiple sequence alignments (see the Genome Research paper for the description of the method).

The conserved regions are displayed in the sequence feature viewer alongside the Rfam annotations. Here is an example from URS0000BC44D5_9606.

We would like to thank Stefan Seemann and Jan Gorodkin (University of Copenhagen) for providing the data and enabling this integration.

New way of linking to RNAcentral

It is now possible to link to RNAcentral web pages without knowing the RNAcentral accessions. For example, if you would like to link to Ensembl transcript ENST00000365484, you can do so using a URL like /link/ensembl:ENST00000365484. See more examples in the documentation.
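
As a minimal sketch of how such links could be generated in a script, the helper below joins a database name and an external accession into the /link/ pattern shown above. The site root https://rnacentral.org and the function name link_url are assumptions made for illustration; the documentation lists the database prefixes that are actually supported.

```python
from urllib.parse import quote

# Assumption: the site root is https://rnacentral.org; the path pattern
# /link/<database>:<accession> is taken from the example in the text.
BASE_URL = "https://rnacentral.org"


def link_url(database: str, accession: str, base_url: str = BASE_URL) -> str:
    """Build a /link/ URL from a database name and its external accession."""
    database = database.strip()
    accession = accession.strip()
    if not database or not accession:
        raise ValueError("both database and accession are required")
    # Percent-encode unusual characters but keep the ':' separator readable.
    return f"{base_url.rstrip('/')}/link/{quote(database)}:{quote(accession)}"


print(link_url("ensembl", "ENST00000365484"))
```

The benefit of this style of link is that the external accession is the only identifier the linking site has to store.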

Public Postgres database

Following multiple user requests, we now provide a public Postgres database containing the same data that is available on the RNAcentral website. The database is meant to help users who would like to access RNAcentral programmatically or are interested in tasks that are not yet supported by the website. The connection details, example queries, tips for a quick start with Docker, and a sample Python script can be found in RNAcentral Help.
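
For readers who want a starting point before opening the Help pages, here is a rough sketch of a read-only query helper. It is not the sample script from RNAcentral Help. The environment variable names are hypothetical, the real connection details are deliberately not repeated here, and the third-party psycopg2 driver is assumed to be installed.

```python
import os

# Hypothetical environment variable names; the real host, port, database,
# user and password are listed in RNAcentral Help.
REQUIRED = (
    "RNACENTRAL_PGHOST",
    "RNACENTRAL_PGPORT",
    "RNACENTRAL_PGDATABASE",
    "RNACENTRAL_PGUSER",
    "RNACENTRAL_PGPASSWORD",
)


def connection_settings(env=os.environ) -> dict:
    """Collect connection settings, failing early if any of them is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {
        "host": env["RNACENTRAL_PGHOST"],
        "port": int(env["RNACENTRAL_PGPORT"]),
        "dbname": env["RNACENTRAL_PGDATABASE"],
        "user": env["RNACENTRAL_PGUSER"],
        "password": env["RNACENTRAL_PGPASSWORD"],
    }


def fetch_one(sql: str, params: tuple = ()):
    """Run a read-only query and return the first row."""
    import psycopg2  # imported lazily so the settings helper works without the driver

    conn = psycopg2.connect(**connection_settings())
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchone()
    finally:
        conn.close()
```

Once the variables are set, a call such as fetch_one("SELECT 1") is a quick way to confirm that the connection works before trying the example queries from the Help pages.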

You can now contribute new features or bug fixes to the RNAcentral website by downloading the RNAcentral webcode and starting a local server using the public database. We welcome code contributions on GitHub. If you have any issues connecting to the database or have any questions, please get in touch.

Data updates

We now import data from Ensembl Plants, a comprehensive source of plant gene annotations. Additionally, we now import data from the Zasha Weinberg Database (ZWD), a recently established database of high-quality sequence alignments for a diverse range of habitats and organisms. Let us know if you would like to see any other data sources in RNAcentral.

The following databases have also been updated:
  • ENA (v137)
  • Ensembl (v95)
  • Ensembl Plants (v42)
  • FlyBase (fb_2018_05)
  • HGNC
  • LNCipedia (v5.2)
  • PDB
  • RefSeq
  • TAIR
  • WormBase (WS267)

Get in touch

The data are available on the RNAcentral website, via the API, and in the FTP archive. We plan to make the next release available in April, 2019. In the meantime, if you have any feedback please feel free to get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

22 Aug 2018

RNAcentral release 10

A new version of RNAcentral (release 10) is now available. This release features 10x more sequences with genomic coordinates, GO annotations from QuickGO, as well as new data from FlyBase, miRBase, LNCipedia, ENA, Ensembl, HGNC, PDB, and WormBase.

Genome mapping for >200 species

RNAcentral now provides genomic locations for sequences from over 200 organisms. The locations are based on a newly developed blat-based alignment procedure; details and an analysis of the method will appear in an upcoming NAR paper. In short, sequences without known genome locations are aligned to the respective reference genomes using blat. The improved genome browser displays how the alignment was found. Here is an example of a human miRNA precursor, URS000018AA08_9606, that is mapped with 99% identity to a single location on chromosome 17:

Browse all sequences with genome mapping or find your favourite organism in the genome browser.

Importing manual GO annotations

We are now displaying manual GO annotations imported from QuickGO. These annotations are based on the careful work of researchers like Rachel Huntley and colleagues (more details in their recent paper). The GO annotations are fetched from QuickGO and displayed as a table that includes an overview of the annotation terms. The ancestors of GO terms can be shown by clicking on the tree button. For example, here is a snapshot of the annotations for URS0000759B6D_9606.

Additionally, the GO annotations are searchable in the text search. For example, searching for involved_in:"GO:1905563" finds all sequences annotated as being involved in negative regulation of vascular endothelial cell proliferation. You can also search for the names of GO terms, not just the term ID, so searches like involved_in:"negative" are possible as well. The search isn’t yet ontology aware, so you can only search for direct annotations. You can browse all sequences with annotations using has_go_annotations:"True". Read more in the documentation and try exploring the data to see what you can learn about your favourite sequence!
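
The queries above can also be assembled in a script, for instance to generate links to pre-filled searches. The sketch below assumes that the text search is reachable at /search with the query passed in a q parameter; the helper names are made up for illustration, and only the field names that appear in this post are used.

```python
from urllib.parse import urlencode

# Assumption: the text search page is /search and reads the query from "q".
SEARCH_URL = "https://rnacentral.org/search"


def field_query(field: str, value: str) -> str:
    """Format a field:"value" clause in the style of the examples in the text."""
    if '"' in value:
        raise ValueError("double quotes inside a value are not handled in this sketch")
    return f'{field}:"{value}"'


def search_url(query: str) -> str:
    """Return a search URL with the query percent-encoded."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


by_term_id = field_query("involved_in", "GO:1905563")
by_term_name = field_query("involved_in", "negative")
all_annotated = field_query("has_go_annotations", "True")

for query in (by_term_id, by_term_name, all_annotated):
    print(search_url(query))
```

Because the search is not yet ontology aware, a script that needs annotations to child terms would have to list each term ID explicitly.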

Updates for miRBase and LNCipedia

miRBase and LNCipedia recently had major new releases (versions 22 and 5.0 respectively). These updates to miRNA and lncRNA data have been among the features most requested by our users.

Other data updates

The following databases have also been updated:
  • ENA (release 136)
  • Ensembl (release 92)
  • FlyBase (FB2018_03)
  • RefSeq
  • PDB
  • HGNC
  • WormBase

Upcoming migration to https

On Monday, September 3rd, RNAcentral will switch to the https protocol which means that the main website URL will become https://rnacentral.org (note the “s” after http). Most people do not need to do anything, but if you use the RNAcentral API or provide links to the RNAcentral site, please replace “http” with “https” in the URL. Find out more about https in Google documentation or let us know if you have any questions about the migration.
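
If you maintain a list of stored links, the replacement can be automated. The following is a small sketch rather than an official migration tool: it upgrades http links that point at rnacentral.org and leaves everything else untouched. The function name to_https is an assumption, and the decision to skip URLs with an explicit port is a deliberately cautious choice.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str, host: str = "rnacentral.org") -> str:
    """Upgrade http:// links to the given host (or its subdomains) to https://."""
    parts = urlsplit(url)
    hostname = (parts.hostname or "").lower()
    same_site = hostname == host or hostname.endswith("." + host)
    # Leave URLs with an explicit port alone, since the port may not serve https.
    if parts.scheme == "http" and same_site and parts.port is None:
        return urlunsplit(parts._replace(scheme="https"))
    return url


links = [
    "http://rnacentral.org/search?q=HOTAIR",
    "https://rnacentral.org/",
    "http://example.org/",
]
for link in links:
    print(to_https(link))
```

Parsing the URL instead of doing a plain string replacement avoids touching links to other sites that merely mention RNAcentral in their path or query string.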

Get in touch

The data are available on the RNAcentral website, via the API, and in the FTP archive. We plan to make the next release available in September, 2018. In the meantime, if you have any feedback please feel free to get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

11 Apr 2018

RNAcentral Release 9

A new version of RNAcentral (release 9) is available. This release features the Rat Genome Database (RGD) as a new Expert Database as well as updated data from ENA, RefSeq, Ensembl, PDBe and HGNC. Additionally, we have added more search options, provided a feature viewer for sequences and improved the display of genomic locations.

More search options

One common request for our search has been to make it easy to search by length. This has been possible using the advanced syntax documented in our help, but it wasn’t intuitive, so we added a length slider.

Another common request has been for sorting options. You can now change the sort order instead of relying on our default. Results can be sorted by popular species and descending length, or by length alone in descending or ascending order. Let us know which other sorting options or search features you would like to see!

Sequence feature viewer

Now that RNAcentral shows Rfam annotations of sequences, we want to ensure these results are easy to understand. To do this we added a feature viewer for sequences, which we use to show the Rfam annotations of sequences and any modifications or non-standard bases. This viewer is particularly nice in cases where a sequence is composed of several Rfam models, like:

Here are a few interesting examples. Can you find any others?
  1. An incomplete sequence
  2. A complex sequence
  3. A simple, well annotated sequence
  4. An example that shows the evolutionary history of the 5.8S

More useful genomic location display

RNAcentral displays the genomic location of ncRNAs in selected organisms. For example, the genome browser shows human HOTAIR on chromosome 12. RNAcentral now has a table that summarizes all known locations, with the current sequence highlighted.

This helps clarify where a sequence is located when it is found in many databases or locations. For example, the human hsa-mir-10a precursor is only found on chromosome 17, while the human hsa-mir-3648 precursor appears twice on chromosome 21. Without the summary table you would have to carefully read the entire cross-reference table to learn this. Additionally, the table provides links for viewing the region in the Ensembl and UCSC genome browsers. We are working on other improvements to our genomic locations, so stay tuned for big changes!

Welcome RGD

We have imported another Model Organism Database, RGD. This database serves as the primary resource for genomic, phenotype and disease data generated from rat research.

Other data updates

The following databases have also been updated:
  • ENA (release 134) 
  • Ensembl (release 91) 
  • RefSeq 
  • PDB 
  • HGNC 
  • WormBase

Get in touch

The data are available on the RNAcentral website, via the API, and in the FTP archive. We plan to make the next release available in June, 2018. In the meantime, if you have any feedback please feel free to get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!

6 Dec 2017

RNAcentral release 8

A new version of RNAcentral (release 8) is now available. This release includes Mouse Genome Informatics (MGI) as a new Expert Database as well as new data from ENA, PDB, Ensembl, snoPY, RefSeq, HGNC, and Rfam. This release also features Rfam annotations of all sequences and secondary structures from GtRNAdb.

Rfam annotations for all sequences

Rfam is a database of functional non-coding RNA families. It provides covariance models that can be used by the Infernal software to classify ncRNA sequences into families.

Starting with this release, all RNAcentral sequences are compared against all Rfam families. Each RNA sequence page has a new Rfam section showing whether the sequence matched an Rfam family. For example, sequence URS00005B7DD8, originally annotated as miscellaneous RNA (misc_RNA), matches a conserved domain of the MALAT1 Rfam family and contains MEN beta RNA that is processed from MALAT1 by RNase P:

Rfam annotations may provide additional functional context to poorly annotated sequences and help identify potential problems.

The majority of RNAcentral sequences (84%) were mapped to one or more Rfam families. However, not all RNAcentral sequences are expected to match Rfam families, as Rfam does not include piRNAs, full-length lncRNAs, and several other ncRNA types. To find out more about this work, see the latest Rfam paper.

Quality control based on Rfam annotations

The automatic annotations based on Rfam classification can be compared with the annotations provided by Expert Databases, which provides an opportunity for quality control. Currently, RNAcentral warns about three types of potential problems:
  1. Incomplete sequences
    When an RNAcentral sequence matches only a part of the Rfam covariance model, the sequence is flagged as incomplete, for example the following sequence matches less than half of the model:

    More than 4.5 million RNAcentral sequences fall into this category, most of which are partial rRNA sequences.
  2. Possible contamination
    When a Eukaryotic sequence matches an Rfam family that is only found in Bacteria, this could indicate bacterial contamination or taxonomic misclassification, for example the following Eukaryotic sequence matches Archaeal rRNA:

  3. Missing Rfam hits
    The majority of RNAcentral sequences annotated as rRNA or tRNA match the corresponding Rfam families. However, some sequences do not match the expected Rfam families which could mean that either the sequence has an incorrect RNA type or that the Rfam model needs to be updated. For example, the following tRNA sequence did not match any Rfam families which may indicate a problem:
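
The three checks above can be sketched as a small rule-based function. This is an illustration of the logic as described in this post, not the RNAcentral pipeline itself. The data classes, the 0.5 coverage cut-off (borrowed from the "less than half" example), and the simplification that a missing hit means no Rfam hit at all are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RfamHit:
    family: str               # Rfam family name
    model_coverage: float     # fraction of the covariance model matched (0 to 1)
    family_domains: Set[str]  # domains of life in which the family is found


@dataclass
class Sequence:
    rna_type: str             # RNA type assigned by the Expert Database
    domain: str               # "Eukaryota", "Bacteria" or "Archaea"
    hits: List[RfamHit] = field(default_factory=list)


MIN_COVERAGE = 0.5                  # assumed cut-off for "incomplete"
EXPECTED_TO_HIT = {"rRNA", "tRNA"}  # RNA types expected to match Rfam


def qc_warnings(seq: Sequence) -> List[str]:
    """Return the warnings that apply to one sequence."""
    if not seq.hits:
        # Simplification: a wrong-family hit is not treated as a missing hit.
        return ["missing_rfam_hit"] if seq.rna_type in EXPECTED_TO_HIT else []
    warnings = []
    if all(hit.model_coverage < MIN_COVERAGE for hit in seq.hits):
        warnings.append("incomplete_sequence")
    # Generalised rule: a eukaryotic sequence matching a family that is
    # not found in Eukaryota is flagged as possible contamination.
    if seq.domain == "Eukaryota" and any(
        "Eukaryota" not in hit.family_domains for hit in seq.hits
    ):
        warnings.append("possible_contamination")
    return warnings
```

For instance, a eukaryotic tRNA with no hits would receive the missing hit warning, while a eukaryotic sequence covering 40% of a Bacteria-only model would be flagged as both incomplete and possibly contaminated. The sketch does not model organelle sequences, so it would raise contamination warnings for them.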

The table below shows the number of sequences with and without annotation problems:

Number of sequences in each category (values not preserved): No problems detected, Incomplete sequence, Missing hit, Potential contamination

Rfam warnings are displayed in text search results. For example, C. elegans RNA URS000049E54F_6239 matches a Bacterial RNA which is surprising and may require further investigation:

There is a new text search facet that allows you to filter sequences based on the quality controls:

It is important to interpret the results of this automatic analysis with caution. For example, eukaryotic sequences found in organelles are expected to match bacterial Rfam models. However, you may see warnings on some RNAcentral pages when the software cannot recognise that the sequence comes from an organelle.

Read more about the quality checks in the documentation and let us know if you have any feedback.

tRNA secondary structures from GtRNAdb

Following a major upgrade of the tRNAscan-SE software, GtRNAdb now provides a much broader range of tRNA sequences, including tRNAs with possible introns. RNAcentral imported Bacterial, Archaeal, and selected model organism sequences from GtRNAdb, increasing the coverage from 382 species to 4,239.

RNAcentral also displays RNA secondary structures provided by GtRNAdb using Forna, for example:

Welcome MGI

We have imported a new Model Organism Database, MGI (Mouse Genome Informatics), which serves as a primary resource for a spectrum of genetic, genomic and biological data supporting the use of the mouse as a model for understanding human biology and disease.

Other data updates

The following databases have also been updated:
  • ENA (release 133)
  • Ensembl (release 90)
  • RefSeq
  • PDB
  • HGNC
  • FlyBase
  • WormBase

Get in touch

The data are available on the RNAcentral website, via the API, and in the FTP archive. We plan to make the next release available in February, 2018. In the meantime, if you have any feedback please feel free to get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!