`dNetPipeline` performs ab initio identification of the maximum-scoring
subgraph from the input graph, given node-level information on
significance (p-value or fdr). It returns an object of class
"igraph" or "graphNEL".

dNetPipeline(g, pval, method = c("pdf", "cdf", "customised"), significance.threshold = NULL, nsize = NULL, plot = F, verbose = T)

- g
- an object of class "igraph" or "graphNEL"
- pval
- a named vector containing input p-values (or fdr). Each element must have a name that can be mapped onto a node of the input graph. The names in "pval" should include all nodes of the input graph "g"; "pval" may also contain names that are absent from "g"
- method
- the method used for the transformation. It can be "pdf" for the method based on the probability density function of the fitted model, "cdf" for the method based on the cumulative distribution function of the fitted model, or "customised" for using the input fdr (or p-values) directly without fitting a model
- significance.threshold
- the given significance threshold. By default, it is set to NULL, meaning there is no constraint. If given, p-values below this threshold are considered significant and thus scored positively, whereas p-values above it are considered insignificant and thus scored negatively
- nsize
- the desired number of nodes in the resulting subgraph. If it is not NULL, a wide range of significance thresholds will be scanned to find the optimal significance threshold leading to the desired number of nodes in the resulting subgraph. Notably, the given significance threshold will be overwritten by this option.
- plot
- logical to indicate whether the histogram plot, contour plot and scatter plot should be drawn. By default, it is set to false for no plotting
- verbose
- logical to indicate whether the messages will be displayed on the screen. By default, it is set to true for display
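The two-stage threshold scan triggered by `nsize` can be illustrated with a minimal base-R sketch. It is inferred from the example log on this page (powers of ten in the rough stage, then steps of half the lower bound in the fine-tuning stage, stopping at the first threshold whose subgraph reaches `nsize` nodes) and is not the package's actual code. The names `scan_threshold`, `size_at` and `toy_size` are hypothetical; `size_at` stands in for "run the pipeline at this threshold and count the nodes of the resulting subgraph".

```r
# Hypothetical sketch of a rough-then-fine threshold scan (not dnet's own code).
scan_threshold <- function(size_at, nsize, rough = 10^(-5:-1)) {
  # Rough stage: evaluate the subgraph size at each order of magnitude.
  sizes <- vapply(rough, size_at, numeric(1))
  hit <- which(sizes >= nsize)[1]
  if (is.na(hit)) return(NA_real_)   # no threshold gives a large enough subgraph
  if (hit == 1) return(rough[1])     # already large enough at the smallest threshold

  # Fine-tuning stage: smaller steps between the last too-small threshold
  # and the first threshold that was large enough.
  lower <- rough[hit - 1]
  upper <- rough[hit]
  step <- lower / 2
  for (thr in seq(lower + step, upper, by = step)) {
    if (size_at(thr) >= nsize) return(thr)
  }
  upper
}

# Toy size function (an assumption for illustration): size grows with the threshold.
toy_size <- function(threshold) round(600 * threshold)
chosen <- scan_threshold(toy_size, nsize = 20)
print(chosen)
```

As in the example log, the chosen threshold need not give exactly `nsize` nodes; the sketch returns the first scanned threshold at which the subgraph has at least that many.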

a subgraph with a maximum score, an object of class "igraph" or "graphNEL"

The pipeline sequentially consists of:

- ia) if the method is either "pdf" or "cdf", `dBUMfit` is used to fit the p-value distribution under the beta-uniform mixture (BUM) model, and `dBUMscore` is used to calculate the scores according to the fitted BUM and the significance threshold.
- ib) if the method is "customised", the user-supplied list of fdr (or p-values) and the significance threshold are used directly for score transformation by `dFDRscore`.
- ii) if a desired number of nodes is specified for the resulting subgraph, a wide range of significance thresholds (a rough stage with large intervals, followed by a fine-tuning stage with smaller intervals) is scanned to find the significance threshold that meets the desired number of nodes.
- iii) `dNetFind` is used to find the maximum-scoring subgraph from the input graph and the scores imposed on its nodes.
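For orientation, steps ia) and iii) can be run by hand roughly as follows. This is a sketch, not the pipeline's source: the argument names passed to `dBUMfit`, `dBUMscore`, `dFDRscore` and `dNetFind` (`ntry`, `hist.bum`, `contour.bum`, `fdr`, `scatter.bum`, `fdr.threshold`) are assumptions that should be checked against each function's help page. The block does nothing if dnet is not installed.

```r
# Sketch only: argument names below are assumptions; verify with ?dBUMfit etc.
if (requireNamespace("dnet", quietly = TRUE)) {
  library(igraph)
  library(dnet)

  # Same toy input as in the examples on this page
  x <- rbeta(1000, shape1 = 0.5, shape2 = 1)
  names(x) <- as.character(seq_along(x))
  g <- erdos.renyi.game(1000, 1/100)
  subg <- dNetInduce(g, V(g), knn = 0)

  # ia) fit the BUM model, then convert p-values into node scores
  fit <- dBUMfit(x, ntry = 1, hist.bum = FALSE, contour.bum = FALSE)
  scores <- dBUMscore(fit, method = "pdf", fdr = 0.1, scatter.bum = FALSE)
  names(scores) <- names(x)

  # ib) alternative for method="customised":
  # scores <- dFDRscore(x, fdr.threshold = 0.1)

  # iii) find the maximum-scoring subgraph from the graph and node scores
  subgraph <- dNetFind(subg, scores)
}
```

Step ii) is omitted here because it only applies when `nsize` is given.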

    library(igraph)
    library(dnet)

    # 1) generate a vector consisting of random values from beta distribution
    x <- rbeta(1000, shape1=0.5, shape2=1)
    names(x) <- as.character(1:length(x))

    # 2) generate a random graph according to the ER model
    g <- erdos.renyi.game(1000, 1/100)

    # 3) produce the induced subgraph only based on the nodes in query
    subg <- dNetInduce(g, V(g), knn=0)

    # 4) find maximum-scoring subgraph based on the given significance threshold
    # 4a) assume the input is a list of p-values (controlling fdr=0.1)
    subgraph <- dNetPipeline(g=subg, pval=x, significance.threshold=0.1)

    Start at 2018-01-19 12:36:34
    First, fit the input p-value distribution under beta-uniform mixture model (2018-01-19 12:36:34)...
    A total of p-values: 1000
    Maximum Log-Likelihood: 252.6
    Mixture parameter (lambda): 0.000
    Shape parameter (a): 0.530
    Second, determine the significance threshold (2018-01-19 12:36:34)...
    significance threshold: 1.00e-01
    Third, calculate the scores according to the fitted BUM and FDR threshold (if any) (2018-01-19 12:36:34)...
    Amongst 1000 scores, there are 154 positives.
    Finally, find the subgraph from the input graph with 1000 nodes and 5039 edges (2018-01-19 12:36:34)...
    Size of the subgraph: 107 nodes and 113 edges
    Finish at 2018-01-19 12:36:36
    Runtime in total is: 2 secs

    # 4b) assume the input is a list of customised significance (eg FDR directly)
    subgraph <- dNetPipeline(g=subg, pval=x, method="customised", significance.threshold=0.1)

    Start at 2018-01-19 12:36:36
    First, consider the input fdr (or p-value) distribution (2018-01-19 12:36:36)...
    Second, determine the significance threshold (2018-01-19 12:36:36)...
    significance threshold: 1.00e-01
    Third, calculate the scores according to the input fdr (or p-value) and the threshold (if any) (2018-01-19 12:36:36)...
    Amongst 1000 scores, there are 300 positives.
    Finally, find the subgraph from the input graph with 1000 nodes and 5039 edges (2018-01-19 12:36:36)...
    Size of the subgraph: 287 nodes and 433 edges
    Finish at 2018-01-19 12:36:38
    Runtime in total is: 2 secs

    # 5) find maximum-scoring subgraph with the desired node number nsize=20
    subgraph <- dNetPipeline(g=subg, pval=x, nsize=20)

    Start at 2018-01-19 12:36:38
    First, fit the input p-value distribution under beta-uniform mixture model (2018-01-19 12:36:38)...
    A total of p-values: 1000
    Maximum Log-Likelihood: 252.6
    Mixture parameter (lambda): 0.000
    Shape parameter (a): 0.530
    Second, determine the significance threshold (2018-01-19 12:36:38)...
    Via constraint on the size of subnetwork to be identified (20 nodes)
    Scanning significance threshold at rough stage (2018-01-19 12:36:38)...
    significance threshold: 1.00e-05, corresponding to the network size (0 nodes) (2018-01-19 12:36:38)
    significance threshold: 1.00e-04, corresponding to the network size (0 nodes) (2018-01-19 12:36:38)
    significance threshold: 1.00e-03, corresponding to the network size (0 nodes) (2018-01-19 12:36:38)
    significance threshold: 1.00e-02, corresponding to the network size (2 nodes) (2018-01-19 12:36:38)
    significance threshold: 1.00e-01, corresponding to the network size (107 nodes) (2018-01-19 12:36:39)
    Scanning significance threshold at finetuning stage (2018-01-19 12:36:39)...
    significance threshold: 1.50e-02, corresponding to the network size (2 nodes) (2018-01-19 12:36:40)
    significance threshold: 2.00e-02, corresponding to the network size (6 nodes) (2018-01-19 12:36:40)
    significance threshold: 2.50e-02, corresponding to the network size (9 nodes) (2018-01-19 12:36:41)
    significance threshold: 3.00e-02, corresponding to the network size (9 nodes) (2018-01-19 12:36:41)
    significance threshold: 3.50e-02, corresponding to the network size (11 nodes) (2018-01-19 12:36:42)
    significance threshold: 4.00e-02, corresponding to the network size (11 nodes) (2018-01-19 12:36:43)
    significance threshold: 4.50e-02, corresponding to the network size (10 nodes) (2018-01-19 12:36:43)
    significance threshold: 5.00e-02, corresponding to the network size (29 nodes) (2018-01-19 12:36:44)
    significance threshold: 5.00e-02
    Third, calculate the scores according to the fitted BUM and FDR threshold (if any) (2018-01-19 12:36:44)...
    Amongst 1000 scores, there are 72 positives.
    Finally, find the subgraph from the input graph with 1000 nodes and 5039 edges (2018-01-19 12:36:44)...
    Size of the subgraph: 29 nodes and 29 edges
    Finish at 2018-01-19 12:36:45
    Runtime in total is: 7 secs

`dNetPipeline.r`

`dNetPipeline.Rd`

`dNetPipeline.pdf`

`dBUMfit`, `dBUMscore`, `dFDRscore`, `dNetFind`