To kick off 2017 we are happy to announce that a new version of RNAcentral is now available. The latest release includes official human gene names from the HGNC database as well as new data from ENA, RefSeq, and PDB. The data are available on the RNAcentral website, via the API, and in the FTP archive.
Official human ncRNA gene names from HGNC
Starting from this release, RNAcentral links to ncRNAs from HGNC, a database that assigns unique and stable names to human genes. The HGNC gene symbols are the official names for human genes and are widely used in the literature and across many resources. You can find out more about HGNC in this NAR paper.
In addition to gene names, HGNC provides manually curated links to relevant publications and database accessions from RefSeq, Vega, and other resources. We used these accessions to map HGNC identifiers to RNAcentral entries so that each HGNC entry is matched to one RNAcentral sequence. For example, the HGNC entry for HOTAIR corresponds to RefSeq accession NR_003716, which is found in RNAcentral under the identifier URS000075C808.
As a result of this mapping, over 95% of the 6,357 HGNC ncRNA genes were connected to RNAcentral identifiers using the RefSeq, Vega, or gtRNAdb accessions provided by HGNC. If none of these accessions were found in RNAcentral, we retrieved sequences for Ensembl genes (where available) using the Ensembl REST API and matched them to RNAcentral accessions by sequence identity. Only about 300 HGNC ncRNA entries (<5%) remained unmapped, most of which are piRNA clusters, rRNAs, and snoRNAs. Some of these ncRNAs will be matched to RNAcentral in future releases as they are integrated into RefSeq and other RNAcentral member databases. Others, such as piRNA clusters, are unlikely to be mapped to RNAcentral because each corresponds to a large number of RNA sequences.
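The two-step mapping described above (accession lookup first, then a sequence-identity fallback) can be sketched roughly as follows. This is an illustrative outline only, not the actual RNAcentral import code: the `HgncEntry` class, the `map_hgnc_entry` function, the two lookup dictionaries, and the injected `fetch_sequence` callable are all hypothetical names, and the order in which the cross-reference sources are tried is an assumption. In a real pipeline `fetch_sequence` would presumably wrap a call to the Ensembl REST API; here it is passed in so the sketch runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Cross-reference sources named in the text; the priority order is an assumption.
XREF_SOURCES = ("refseq", "vega", "gtrnadb")


@dataclass
class HgncEntry:
    """Hypothetical, simplified view of one HGNC ncRNA record."""
    symbol: str
    xrefs: Dict[str, List[str]] = field(default_factory=dict)
    ensembl_gene_id: Optional[str] = None


def normalize(seq: str) -> str:
    """Upper-case and write U as T so RNA and DNA spellings of a sequence compare equal."""
    return seq.strip().upper().replace("U", "T")


def map_hgnc_entry(
    entry: HgncEntry,
    accession_index: Dict[str, str],
    sequence_index: Dict[str, str],
    fetch_sequence: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the RNAcentral identifier for an HGNC entry, or None if unmapped."""
    # Step 1: try the curated accessions, one source at a time.
    for source in XREF_SOURCES:
        for accession in entry.xrefs.get(source, []):
            urs = accession_index.get(accession)
            if urs is not None:
                return urs

    # Step 2: fall back to the Ensembl gene sequence and match by exact identity.
    if entry.ensembl_gene_id:
        seq = fetch_sequence(entry.ensembl_gene_id)
        if seq:
            return sequence_index.get(normalize(seq))

    # Neither step succeeded (e.g. piRNA clusters in the text above).
    return None


# Usage with the HOTAIR example from the text.
hotair = HgncEntry(symbol="HOTAIR", xrefs={"refseq": ["NR_003716"]})
accession_index = {"NR_003716": "URS000075C808"}
print(map_hgnc_entry(hotair, accession_index, {}, lambda gene_id: None))
# prints URS000075C808
```

Passing the sequence fetcher in as a parameter keeps the lookup logic separate from network access, which makes the fallback path easy to exercise with stub data.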
Browse ncRNAs from HGNC or view the HGNC summary page in RNAcentral.