6 Jan 2017

RNAcentral release 6

To kick off 2017 we are happy to announce that a new version of RNAcentral is now available. The latest release includes official human gene names from the HGNC database as well as new data from ENA, RefSeq, and PDB. The data are available on the RNAcentral website, via the API, and in the FTP archive.

Official human ncRNA gene names from HGNC

Starting from this release, RNAcentral links to ncRNAs from HGNC, a database that assigns unique and stable names to human genes. HGNC gene symbols are the official names for human genes and are widely used in the literature and across many resources. You can find out more about HGNC in this NAR paper.

In addition to gene names, HGNC provides manually curated links to relevant publications and database accessions from RefSeq, Vega, and other resources. We used these accessions to map HGNC identifiers to RNAcentral entries so that each HGNC entry is matched to one RNAcentral sequence. For example, the HGNC entry for HOTAIR corresponds to RefSeq accession NR_003716, which is found in RNAcentral under the identifier URS000075C808.

As a result of this mapping, over 95% of the 6,357 HGNC ncRNA genes were connected to RNAcentral identifiers using RefSeq, Vega, or GtRNAdb accessions from HGNC. If none of these accessions were found in RNAcentral, we retrieved sequences for Ensembl genes (where available) using the Ensembl REST API and matched them to RNAcentral entries by sequence identity. Only about 300 HGNC ncRNA entries (<5%) remained unmapped, most of which are piRNA clusters, rRNAs, and snoRNAs. Some of these ncRNAs will be matched to RNAcentral in future releases as they are integrated into RefSeq and other RNAcentral databases. Others, such as piRNA clusters, are unlikely to be mapped to RNAcentral because each of them corresponds to a large number of RNA sequences.
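The two-step lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual RNAcentral import code: the function names, the dictionary layout of an HGNC entry, the two lookup tables, and the injected `fetch_sequence` callable (standing in for a call to the Ensembl REST API) are all assumptions made for the example. Only the HOTAIR accession and RNAcentral identifier come from the text above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical order in which curated accessions are tried.
XREF_SOURCES: List[str] = ["refseq", "vega", "gtrnadb"]


def normalize(sequence: str) -> str:
    """Upper-case a sequence and use the DNA alphabet so that
    identical RNA and DNA spellings compare equal."""
    return sequence.strip().upper().replace("U", "T")


def map_hgnc_entry(
    entry: Dict[str, object],
    xref_index: Dict[str, str],
    sequence_index: Dict[str, str],
    fetch_sequence: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return an RNAcentral identifier for an HGNC entry, or None.

    xref_index maps external accessions to RNAcentral identifiers.
    sequence_index maps normalized sequences to RNAcentral identifiers.
    fetch_sequence is a placeholder for retrieving an Ensembl gene sequence.
    """
    # Step 1: try the manually curated accessions provided by HGNC.
    for source in XREF_SOURCES:
        for accession in entry.get(source, []):
            identifier = xref_index.get(accession)
            if identifier is not None:
                return identifier

    # Step 2: fall back to matching the Ensembl gene sequence exactly.
    ensembl_id = entry.get("ensembl_gene_id")
    if ensembl_id and fetch_sequence is not None:
        sequence = fetch_sequence(str(ensembl_id))
        if sequence:
            return sequence_index.get(normalize(sequence))

    # Neither step produced a match, so the entry stays unmapped.
    return None


# Toy data: the HOTAIR example from the text.
xref_index = {"NR_003716": "URS000075C808"}
hotair = {"symbol": "HOTAIR", "refseq": ["NR_003716"]}
print(map_hgnc_entry(hotair, xref_index, {}))
```

Passing the sequence retrieval in as a callable keeps the sketch free of network access; in practice that callable would wrap a request to the Ensembl REST API.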

Browse ncRNAs from HGNC or view the HGNC summary page in RNAcentral.

Database growth over time

RNAcentral now contains almost 11 million unique RNA sequences from 23 Expert Databases. Compared to release 5, release 6 adds 750 thousand new distinct ncRNA sequences and 2 million additional cross-references from ENA, PDB, and RefSeq. To see how the RNAcentral database has grown over time, explore the interactive charts on the RNAcentral stats page.

New NAR paper

If you haven’t seen the latest RNAcentral paper, the final version has been published in the 2017 Database Issue of Nucleic Acids Research.

Get in touch

We plan to make the next release available in March 2017. In the meantime, if you have any feedback, please feel free to get in touch by email, on Twitter, or by submitting an issue on GitHub. We look forward to hearing from you!