<1> Converting IDs with the R package org.Hs.eg.db

Install the package from Bioconductor:

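# biocLite() has been retired; Bioconductor packages are now installed through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("org.Hs.eg.db")
library(org.Hs.eg.db)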

List the identifier types (key types), drawn from the major public databases, that the org.Hs.eg.db object covers.

keytypes(org.Hs.eg.db)

 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
[11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"
[16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"
[21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"
[26] "UNIPROT"

The select function extracts part of this content. For example, to retrieve the SYMBOL and GENENAME of the six genes "ENSG00000130720", "ENSG00000103257", "ENSG00000156414", "ENSG00000144644", "ENSG00000159307" and "ENSG00000144485":

ensids <- c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414", "ENSG00000144644", "ENSG00000159307", "ENSG00000144485")
cols <- c("SYMBOL", "GENENAME")
select(org.Hs.eg.db, keys=ensids, keytype="ENSEMBL", columns=cols)

  ENSEMBL          SYMBOL  GENENAME
1 ENSG00000130720  FIBCD1  fibrinogen C domain containing 1
2 ENSG00000103257  SLC7A5  solute carrier family 7 member 5
3 ENSG00000156414  TDRD9   tudor domain containing 9
4 ENSG00000144644  GADL1   glutamate decarboxylase like 1
5 ENSG00000159307  SCUBE1  signal peptide, CUB domain and EGF like domain containing 1
6 ENSG00000144485  HES6    hes family bHLH transcription factor 6

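This makes select a convenient way to convert gene IDs. In the code above we start from a handful of Ensembl gene IDs and use select to look up the corresponding gene symbols and full gene names.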
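
A common follow-up is to attach the converted symbols back to your own list of IDs, which may contain duplicates or IDs with no match. The sketch below is a minimal base R illustration, not part of the original workflow. It re-enters the first three rows of the output above by hand so that it runs without org.Hs.eg.db, and the `query` vector with its `NOT_AN_ID` placeholder is hypothetical.

```r
# First three rows of the select() output shown above, re-entered by hand
# so this sketch runs without org.Hs.eg.db installed.
res <- data.frame(
  ENSEMBL = c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414"),
  SYMBOL  = c("FIBCD1", "SLC7A5", "TDRD9"),
  stringsAsFactors = FALSE
)

# Hypothetical query: one ID appears twice, and NOT_AN_ID is a placeholder
# standing in for an ID that has no entry in the table.
query <- c("ENSG00000103257", "ENSG00000130720", "ENSG00000103257", "NOT_AN_ID")

# match() gives the first matching row for each query ID and NA when there is
# none, so the result keeps the length and order of the query.
symbols <- res$SYMBOL[match(query, res$ENSEMBL)]
names(symbols) <- query
print(symbols)
```

Because match() only ever returns the first hit, an ID that maps to several symbols would silently keep just one of them; check for duplicated keys in the select result if that matters for your analysis.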

select can also be used simply to look up information, for example the GO terms, ENTREZID and other annotations of the BRCA1 gene:

select(org.Hs.eg.db, keys="BRCA1", keytype="SYMBOL", columns=c("ENSEMBL","UNIGENE","ENTREZID","CHR","GO","GENENAME"))

  SYMBOL  ENSEMBL          UNIGENE    ENTREZID  CHR  GO          EVIDENCE  ONTOLOGY  GENENAME
1 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000151  NAS       CC        BRCA1, DNA repair associated
2 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000724  IDA       BP        BRCA1, DNA repair associated
3 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000729  TAS       BP        BRCA1, DNA repair associated
4 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000731  TAS       BP        BRCA1, DNA repair associated
5 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000732  TAS       BP        BRCA1, DNA repair associated
6 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0000800  IDA       CC        BRCA1, DNA repair associated
7 BRCA1   ENSG00000012048  Hs.194143  672       17   GO:0003677  TAS       MF        BRCA1, DNA repair associated
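
The BRCA1 result repeats the gene-level columns once per GO term, because select returns one row for every combination of values. Here is a minimal base R sketch of how such a result might be condensed. The data frame is re-entered by hand from the seven rows shown above (SYMBOL, ENTREZID, GO and ONTOLOGY only), and the names `brca1`, `gene_level` and `go_by_ontology` are purely illustrative.

```r
# Seven rows of the BRCA1 output above, re-entered by hand (subset of columns).
brca1 <- data.frame(
  SYMBOL   = "BRCA1",
  ENTREZID = "672",
  GO       = c("GO:0000151", "GO:0000724", "GO:0000729", "GO:0000731",
               "GO:0000732", "GO:0000800", "GO:0003677"),
  ONTOLOGY = c("CC", "BP", "BP", "BP", "BP", "CC", "MF"),
  stringsAsFactors = FALSE
)

# Collapse the repeated gene-level columns to one row per gene.
gene_level <- unique(brca1[, c("SYMBOL", "ENTREZID")])

# Group the GO terms by ontology (BP, CC, MF).
go_by_ontology <- split(brca1$GO, brca1$ONTOLOGY)

print(gene_level)
print(lengths(go_by_ontology))
```

Requesting one-to-many columns such as GO separately from one-to-one columns such as GENENAME is another way to keep the result tables small.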

<2> Converting IDs with the bitr function from the R package clusterProfiler

Install the package from Bioconductor:

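# biocLite() has been retired; Bioconductor packages are now installed through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")  # optional: use the USTC mirror
BiocManager::install("clusterProfiler")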

Convert the IDs:

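library("clusterProfiler")
library(org.Hs.eg.db)  # the annotation package passed as OrgDb must be loaded
# gene: a character vector of Entrez gene IDs, assumed to be defined beforehand
gene.df <- bitr(gene, fromType = "ENTREZID",     # fromType: the ID type of your input data
                toType = c("ENSEMBL", "SYMBOL"), # toType: the ID type(s) to convert to, one or several
                OrgDb = org.Hs.eg.db)            # OrgDb: the annotation package to use
head(gene.df)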
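
Not every input ID necessarily has a mapping, and as far as I know bitr with its default settings leaves unmapped IDs out of the returned table. The base R sketch below shows one way to list them. To stay runnable without clusterProfiler it uses a hand-made stand-in for `gene.df` (the BRCA1 values are taken from the output in section <1>), and `NOT_AN_ID` is a hypothetical placeholder.

```r
# Hypothetical input: one real Entrez ID (672, BRCA1) and one placeholder.
gene <- c("672", "NOT_AN_ID")

# Hand-made stand-in for the data frame that bitr() would return for `gene`,
# under the assumption that the unmapped ID is dropped.
gene.df <- data.frame(
  ENTREZID = "672",
  ENSEMBL  = "ENSG00000012048",
  SYMBOL   = "BRCA1",
  stringsAsFactors = FALSE
)

# Input IDs that are absent from the converted table.
unmapped <- setdiff(gene, gene.df$ENTREZID)

# Fraction of distinct input IDs that were converted.
mapped_fraction <- mean(unique(gene) %in% gene.df$ENTREZID)

print(unmapped)
print(mapped_fraction)
```

Comparing against unique(gene) rather than nrow(gene.df) matters because a single input ID can produce several output rows.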

<3> Converting IDs with the R package AnnotationDbi

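library(AnnotationDbi)
library(org.Hs.eg.db)  # provides the org.Hs.egSYMBOL map
# gene: a character vector of Entrez gene IDs, assumed to be defined beforehand
mySymbols <- mget(gene, org.Hs.egSYMBOL,  # the map is selectable; a different map yields a different ID type
                  ifnotfound=NA)          # IDs without a match return NA
# mySymbols now holds the gene symbols
head(mySymbols)
class(mySymbols)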
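
mget returns a list with one element per input ID, which is often less convenient than a plain named vector. The sketch below is a minimal base R illustration of flattening such a list. The `mySymbols` list here is a hand-made stand-in shaped like the mget result (BRCA1 for Entrez ID 672 comes from the output in section <1>), and `NOT_AN_ID` is a hypothetical placeholder for an ID that was not found.

```r
# Hand-made stand-in for the list returned by mget(..., ifnotfound = NA):
# one element per input ID, NA where no symbol was found.
mySymbols <- list("672" = "BRCA1", "NOT_AN_ID" = NA)

# Take the first value of each element and coerce it to character, so that
# the logical NA of a missing entry becomes NA_character_.
flat <- vapply(mySymbols, function(x) as.character(x[1]), character(1))

# Keep only the IDs that were actually converted.
found <- flat[!is.na(flat)]

print(flat)
print(found)
```

Taking only the first value is a simplification that would hide any ID mapping to more than one value.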

<4> Converting IDs with online web tools

  1. DAVID: The Database for Annotation, Visualization and Integrated Discovery

http://david.abcc.ncifcrf.gov/conversion.jsp

A powerful tool, though it can be slow, and its data are not updated promptly.

  2. Biomart

http://www.biomart.org/biomart/martview/65c2ea6c079d1b85820fa5bbf5af62b5

An excellent tool with regular new releases; the data can also be downloaded and processed locally. Recommended.

  3. BioDBnet

http://biodbnet.abcc.ncifcrf.gov/db/db2db.php

  4. Hyperlink Management System (HMS)

http://biodb.jp/

  5. BridgeDB

http://www.biomedcentral.com/1471-2105/11/5

  6. UniProt also provides a good conversion tool

http://www.uniprot.org/ (ID Mapping)

  7. The KEGG API

http://www.kegg.jp/kegg/rest/keggapi.html (conv)

References

https://vip.biotrainee.com/d/109-entrez-id