dDAGgeneSim calculates pair-wise semantic similarity between genes based on a
directed acyclic graph (DAG) with annotated data. It first calculates semantic
similarity between terms and then derives semantic similarity between genes
from term-term semantic similarity. Parallel computing is also supported on
Linux and Mac operating systems.
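The two-step idea above (term-term similarity first, then gene-gene similarity) can be illustrated with a minimal base-R sketch. It assumes one common reading of a best-match average: for each term of either gene, take the maximum similarity to any term of the other gene, average those maxima per gene, then average the two directions. The term names, gene names, similarity values, and the helper bm_average are all hypothetical, and the exact computation inside dDAGgeneSim may differ.

```r
# Toy term-term similarity matrix; the values are made up for illustration
term_sim <- matrix(
  c(1.0, 0.2, 0.6, 0.1,
    0.2, 1.0, 0.3, 0.8,
    0.6, 0.3, 1.0, 0.4,
    0.1, 0.8, 0.4, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("T", 1:4), paste0("T", 1:4))
)

# Hypothetical gene-to-term annotations
gene_terms <- list(geneA = c("T1", "T2"), geneB = c("T3", "T4"))

# Hypothetical helper: best-match average between two sets of terms
bm_average <- function(terms1, terms2, sim) {
  sub <- sim[terms1, terms2, drop = FALSE]  # rows: terms of gene 1, cols: terms of gene 2
  best1 <- apply(sub, 1, max)               # best match for each term of gene 1
  best2 <- apply(sub, 2, max)               # best match for each term of gene 2
  (mean(best1) + mean(best2)) / 2           # average the two directions
}

gene_sim <- bm_average(gene_terms$geneA, gene_terms$geneB, term_sim)
```

In this toy case the best matches are 0.6 and 0.8 in both directions, so gene_sim evaluates to 0.7.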
dDAGgeneSim(g, genes = NULL, method.gene = c("BM.average", "BM.max", "BM.complete", "average", "max"), method.term = c("Resnik", "Lin", "Schlicker", "Jiang", "Pesquita"), force = TRUE, fast = TRUE, parallel = TRUE, multicores = NULL, verbose = TRUE)
dDAGannotate
The packages "foreach" and "doParallel" required for parallel computing can be
installed via:
source("http://bioconductor.org/biocLite.R");
biocLite(c("foreach","doParallel"))
If they are not yet installed, the parallel option will be disabled.
It returns a sparse matrix containing pair-wise semantic similarity
between input genes. This sparse matrix can be converted to the full
matrix via the function as.matrix.
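As a small illustration of the conversion mentioned above, the sketch below builds a toy sparse similarity matrix with the Matrix package and expands it with as.matrix. The gene labels and values are made up; a real result would come from dDAGgeneSim rather than from sparseMatrix.

```r
library(Matrix)

# Hypothetical gene identifiers and similarity values, for illustration only
genes <- c("g1", "g2", "g3")
sim_sparse <- sparseMatrix(
  i = c(1, 2, 1, 3),
  j = c(2, 1, 3, 1),
  x = c(0.69, 0.69, 0.35, 0.35),
  dims = c(3, 3),
  dimnames = list(genes, genes)
)

# Convert to an ordinary dense matrix; entries not stored become 0
sim_full <- as.matrix(sim_sparse)
```

Entries shown as "." in the printed sparse matrix are not stored, and they appear as 0 after conversion.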
For the mode "shortest_paths", the induced subgraph is the most concise, and thus informative for visualisation when there are many nodes in query, while the mode "all_paths" results in the complete subgraph.
# 1) load HPPA as igraph object
ig.HPPA <- dRDataLoader(RData='ig.HPPA')
'ig.HPPA' (from package 'dnet' version 1.1.2) has been loaded into the working environment (at 2018-01-19 12:34:39)
g <- ig.HPPA

# 2) load human genes annotated by HPPA
org.Hs.egHPPA <- dRDataLoader(RData='org.Hs.egHPPA')
'org.Hs.egHPPA' (from package 'dnet' version 1.1.2) has been loaded into the working environment (at 2018-01-19 12:34:39)

# 3) prepare for ontology and its annotation information
dag <- dDAGannotate(g, annotations=org.Hs.egHPPA, path.mode="all_paths", verbose=TRUE)
At level 13, there are 5 nodes, and 12 incoming neighbors.
At level 12, there are 17 nodes, and 27 incoming neighbors.
At level 11, there are 50 nodes, and 65 incoming neighbors.
At level 10, there are 144 nodes, and 145 incoming neighbors.
At level 9, there are 332 nodes, and 282 incoming neighbors.
At level 8, there are 518 nodes, and 374 incoming neighbors.
At level 7, there are 625 nodes, and 389 incoming neighbors.
At level 6, there are 710 nodes, and 382 incoming neighbors.
At level 5, there are 587 nodes, and 232 incoming neighbors.
At level 4, there are 297 nodes, and 91 incoming neighbors.
At level 3, there are 105 nodes, and 23 incoming neighbors.
At level 2, there are 23 nodes, and 1 incoming neighbors.
At level 1, there are 1 nodes, and 0 incoming neighbors.

# 4) calculate pair-wise semantic similarity between 5 randomly chosen genes
allgenes <- unique(unlist(V(dag)$annotations))
genes <- sample(allgenes,5)
sim <- dDAGgeneSim(g=dag, genes=genes, method.gene="BM.average", method.term="Resnik", parallel=FALSE, verbose=TRUE)
Start at 2018-01-19 12:34:52
First, extract all annotatable genes (2018-01-19 12:34:52)...
    there are 5 input genes amongst 3226 annotatable genes
Second, pre-compute semantic similarity between 168 terms (forced to be the most specific for each gene) using Resnik method (2018-01-19 12:34:54)...
Last, calculate pair-wise semantic similarity between 5 genes using BM.average method (2018-01-19 12:34:56)...
    1 out of 5 (2018-01-19 12:34:56)
    2 out of 5 (2018-01-19 12:34:56)
    3 out of 5 (2018-01-19 12:34:56)
    4 out of 5 (2018-01-19 12:34:56)
Finish at 2018-01-19 12:34:56
Runtime in total is: 4 secs
sim
5 x 5 sparse Matrix of class "dgCMatrix"
           25782    197131      25814      7277      58484
25782  .         0.6906510 0.35378600 0.3017337 0.22278271
197131 0.6906510 .         0.30233885 0.2934029 0.52274184
25814  0.3537860 0.3023389 .          0.9212131 0.07423735
7277   0.3017337 0.2934029 0.92121307 .         .
58484  0.2227827 0.5227418 0.07423735 .         .
dDAGgeneSim.r
dDAGgeneSim.Rd
dDAGgeneSim.pdf
dDAGtermSim, dDAGinduce, dDAGtip, dCheckParallel