GRCh38.p13 is the 13th patch release of the human reference genome assembly GRCh38, published by the Genome Reference Consortium in February 2019. Patch releases add fix patches (corrections to the assembly) and novel patches (alternate sequence representations) as separate scaffolds, without changing the chromosome coordinates of GRCh38. The designation “p13” refers to the 13th patch since the initial release of GRCh38 in December 2013. GRCh38.p13 was followed by GRCh38.p14 in 2022, but it remains widely used as a reference for human genetics research and clinical applications, and it is the assembly version behind Ensembl release 109.
To update the external_gene_name of human genes from the latest Ensembl database, using the ensembl_gene_id as the key, you can use the BioMart tool provided by Ensembl. Here are the steps to follow:
- Go to the Ensembl website (www.ensembl.org) and click on the “BioMart” link under the “Tools” section.
- Select “Ensembl Genes” as the database, then choose the human dataset, which is labeled with the assembly version (e.g., “Human genes (GRCh38.p13)”).
- Under “Attributes”, select the fields you want to retrieve. In this case, tick “Gene stable ID” (ensembl_gene_id) and “Gene name” (external_gene_name).
- Under “Filters”, filter by Ensembl gene ID (“Gene stable ID(s)”) and enter the gene IDs whose external gene name you want to update.
- Click on the “Results” button to run the query.
- Download the results in the desired format (e.g., CSV, TSV, or XLS).
- Use the downloaded table to update the external_gene_name in your database or analysis pipeline.
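Note that gene annotations change between Ensembl releases: symbols are renamed, and gene IDs are occasionally retired or merged. Check the results against your input, for example for IDs that return no row or an empty gene name, before overwriting existing values.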
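The same query can also be sent to BioMart programmatically. The following is a minimal Python sketch, not an official client. It assumes the BioMart `martservice` endpoint at www.ensembl.org, the dataset name `hsapiens_gene_ensembl`, and the internal names `ensembl_gene_id` and `external_gene_name`. The helper names `build_query` and `fetch_gene_names` are hypothetical, and the endpoint and names should be checked against the current Ensembl release.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed BioMart REST endpoint; verify against the current Ensembl site.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(gene_ids, attributes=("ensembl_gene_id", "external_gene_name")):
    """Build a BioMart XML query that filters on Ensembl gene IDs."""
    id_list = ",".join(gene_ids)
    attribute_xml = "".join(
        f"<Attribute name={quoteattr(name)}/>" for name in attributes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="ensembl_gene_id" value={quoteattr(id_list)}/>'
        f"{attribute_xml}"
        "</Dataset>"
        "</Query>"
    )


def fetch_gene_names(gene_ids, timeout=60):
    """Send the query and return the raw TSV text (needs network access)."""
    data = urllib.parse.urlencode({"query": build_query(gene_ids)}).encode()
    with urllib.request.urlopen(MARTSERVICE_URL, data=data, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_gene_names(["ENSG00000088854"])` should return a tab-separated table with a header row. For long ID lists it is probably safer to send the IDs in chunks of a few hundred per request.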
Here is a concrete example:
“DNAAF9” is an HGNC symbol. You can use BioMart to translate all of your Ensembl gene IDs (the first column of your Excel table) to HGNC symbols in one batch.
To translate identifiers between databases, follow these steps:
- Open the website: http://www.ensembl.org/biomart/martview
- Choose the database “Ensembl Genes 109”
- Select the dataset for your desired organism: Human genes (GRCh38.p13)
- Go to “Filters” > “Gene:” > “Input external reference ID list”
- Select the chosen source database: Gene stable ID(s)
- Provide a list of IDs, one per line: copy the first column of your results. For example: ENSG00000088854, ENSG00000226328, ENSG00000086666, ENSG00000215717, ENSG00000168502, ENSG00000223518
- Go to “Attributes” > “Gene:”
- Untick “Transcript stable ID”
- Leave “Gene stable ID” ticked
- Go to “External:” and tick “Gene name”, “Gene description”, “HGNC ID”, and “HGNC symbol”.
- Click “Results” at the top left. This gives a preview that can be exported in various formats.
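Once exported, the table can be used to update gene names in your own data. The following is a minimal sketch, assuming a TSV export whose header uses the BioMart display names “Gene stable ID”, “Gene name”, and “HGNC symbol”. The function names `load_mapping` and `update_gene_names` are hypothetical, and the rows are made-up placeholders, not real annotation data.

```python
import csv
import io


def load_mapping(tsv_text):
    """Map each Ensembl gene ID to its HGNC symbol, falling back to the gene name."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        symbol = row.get("HGNC symbol") or row.get("Gene name") or ""
        if symbol:
            mapping[row["Gene stable ID"]] = symbol
    return mapping


def update_gene_names(records, mapping):
    """Return updated copies of the records plus the IDs that had no mapping."""
    updated, unmatched = [], []
    for record in records:
        gene_id = record["ensembl_gene_id"]
        if gene_id in mapping:
            record = {**record, "external_gene_name": mapping[gene_id]}
        else:
            unmatched.append(gene_id)  # keep the old name and report the ID
        updated.append(record)
    return updated, unmatched


# Placeholder export (made-up IDs and symbols, for illustration only).
example_tsv = (
    "Gene stable ID\tGene name\tHGNC symbol\n"
    "ENSG_PLACEHOLDER_1\tNEWSYM1\tNEWSYM1\n"
    "ENSG_PLACEHOLDER_2\tNAME2\t\n"
)
records = [
    {"ensembl_gene_id": "ENSG_PLACEHOLDER_1", "external_gene_name": "OLDSYM1"},
    {"ensembl_gene_id": "ENSG_PLACEHOLDER_3", "external_gene_name": "OLDSYM3"},
]
updated, unmatched = update_gene_names(records, load_mapping(example_tsv))
```

IDs that end up in `unmatched` are worth checking by hand, since they may have been retired or may simply be missing from the export.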
The HGNC symbol and the gene name are two different kinds of identifier for a gene. The HGNC symbol is a short abbreviation assigned to each human gene by the HUGO Gene Nomenclature Committee (HGNC), the body responsible for standardizing human gene names. It is typically composed of uppercase letters and sometimes includes numbers or hyphens. For example, the HGNC symbol for the gene whose mutations cause cystic fibrosis is “CFTR”.
The gene name, on the other hand, is a longer, more descriptive name assigned to each gene based on its function, location, or other characteristics. Gene names are often more self-explanatory than symbols. For example, the gene name for the cystic fibrosis gene is “cystic fibrosis transmembrane conductance regulator”. Note that BioMart labels these fields differently: for human genes, the “Gene name” attribute (external_gene_name) usually holds the short symbol, while the full descriptive name appears under “Gene description”.
The HGNC symbol and the gene name differ in form but refer to the same gene. In general, the symbol is used more commonly in scientific publications and databases, while the full name is more often used in popular science writing or in clinical settings.
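To keep the short symbol and the longer descriptive name apart in code, the two can be stored as separate fields. The sketch below assumes that the descriptive name comes from the BioMart “Gene description” column, which usually ends with a source tag in square brackets. The class `GeneLabel` and the function `label_from_export` are hypothetical names, and the accession in the example string is shown for illustration only.

```python
import re
from dataclasses import dataclass

# Matches a trailing tag such as " [Source:HGNC Symbol;Acc:HGNC:1884]".
SOURCE_TAG = re.compile(r"\s*\[Source:[^\]]*\]\s*$")


@dataclass(frozen=True)
class GeneLabel:
    symbol: str      # short HGNC symbol, e.g. "CFTR"
    full_name: str   # longer descriptive name

    def display(self):
        """Format as 'SYMBOL (full name)', or just the symbol if no name is known."""
        return f"{self.symbol} ({self.full_name})" if self.full_name else self.symbol


def label_from_export(hgnc_symbol, description):
    """Build a GeneLabel from the 'HGNC symbol' and 'Gene description' columns."""
    return GeneLabel(hgnc_symbol.strip(), SOURCE_TAG.sub("", description).strip())


label = label_from_export(
    "CFTR",
    "cystic fibrosis transmembrane conductance regulator "
    "[Source:HGNC Symbol;Acc:HGNC:1884]",
)
print(label.display())
```

Keeping both fields lets a pipeline join on the symbol while still showing the descriptive name in reports.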