Title: | Tree Branches Evaluated Statistically for Tightness |
---|---|
Description: | Our method introduces mathematically well-defined measures for tightness of branches in a hierarchical tree. Statistical significance of the findings is determined, for all branches of the tree, by performing permutation tests, optionally with generalized Pareto p-value estimation. |
Authors: | Guoli Sun, Alex Krasnitz |
Maintainer: | Guoli Sun <[email protected]> |
License: | GPL-2 |
Version: | 5.2 |
Built: | 2024-11-24 03:23:14 UTC |
Source: | https://github.com/cran/TBEST |
Description: This object is a list of three items. It contains a statistical assessment of the tightness of branches in a hierarchical tree.
Call |
An object of class Call, specifying the parameters used. |
data |
A matrix from which the distance matrix used for growing the tree is computed, with the rows corresponding to the items being clustered. |
indextable |
If measure of tightness is not |
Guoli Sun, Alex Krasnitz
## Not run: 
data(leukemia)
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
class(mytable)
names(mytable)
mytable<-SigTree(data.matrix(leukemia),mystat="slb",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=FALSE)
class(mytable)
names(mytable)
## End(Not run)
Description: Find the names of all items comprising one or more branches of a hierarchical tree.
LeafContent(myinput, mynode=NA)
myinput |
|
mynode |
An integer vector of the numbers of branches whose leaf content is desired. The |
A list of items, of the same length as mynode. Each item corresponds to a branch listed in mynode and is a character vector containing the names of the leaves in the branch.
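To make the numbering convention concrete, here is a minimal base R sketch of how the leaves below a node can be collected from the merge component of an hclust object. It is only an illustration, not the package's implementation: the helper leaves_under and the toy data are hypothetical, and the sketch assumes (as the examples suggest) that negative numbers denote single leaves and positive numbers denote internal nodes.

```r
# Hypothetical helper: collect the labels of all leaves below a node.
# Negative numbers refer to leaves, positive numbers to rows of hc$merge.
leaves_under <- function(hc, node) {
  if (node < 0) return(hc$labels[-node])
  kids <- hc$merge[node, ]
  c(leaves_under(hc, kids[1]), leaves_under(hc, kids[2]))
}

# Toy one-dimensional data with named rows (values chosen for illustration)
x <- matrix(c(0, 0.1, 5, 5.1, 10), ncol = 1,
            dimnames = list(letters[1:5], NULL))
hc <- hclust(dist(x), method = "complete")

# One list item per requested branch: leaf 3, then the root node
root <- nrow(hc$merge)
content <- lapply(c(-3, root), function(k) leaves_under(hc, k))
print(content)
```

The result has the same shape as the value described above: a list with one character vector per requested branch.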
Guoli Sun, Alex Krasnitz
data(leukemia)
hc<-hclust(dist(data.matrix(leukemia)),"ward")
#find the name of leaf 29
LeafContent(hc,mynode=c(-29))
#find the name of leaf 29 and leaves belonging to node 29
LeafContent(hc,mynode=c(-29,29))
## Not run: 
mytable<-SigTree(data.matrix(leukemia),mystat="fldc",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
LeafContent(mytable,mynode=c(-29,29))
mypartition<-PartitionTree(x=mytable,siglevel=0.001,statname="fldc",sigtype="raw")
LeafContent(mypartition)
## End(Not run)
This data set represents mRNA expression of 500 genes in 38 patient cases of leukemia. These 38 cases fall into 3 subtypes: AML (11), T-lineage ALL (8) and B-lineage ALL (19). The set was obtained by removing 499 genes from Golub's leukemia data, to facilitate the execution of examples for this package.
data(leukemia)
A data frame with 38 observations (rows) of 500 variables (columns).
Bone marrow samples obtained from acute leukemia patients at the time of diagnosis.
http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
T.R. Golub, D.K. Slonim et al. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression
Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub (2003) Consensus Clustering: A resampling-based method for class discovery and visualization of gene expression microarray data
data(leukemia)
dim(leukemia)
Description: This object is a list of four items, which jointly specify a detailed partition of a hierarchical tree into tight branches.
Call |
An object of class Call, specifying the function call which generated the list. |
best |
An object of class "best", see |
sigvalue |
A two-column matrix, with one row per internal node of the tree. The first column enumerates the nodes. The second column provides the significance estimate for the tightness of the node. |
partition |
A two-column data frame specifying the partition. The first column is a character vector with the names of the leaves. The second column provides the number of the part to which the leaf belongs. |
Guoli Sun, Alex Krasnitz
## Not run: 
data(leukemia)
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
class(mytable)
mypartition<-PartitionTree(x=mytable,siglevel=0.001,statname="fldc",
    sigtype="raw")
class(mypartition)
names(mypartition)
## End(Not run)
Description: The function finds the most detailed partition of a hierarchical tree into tight branches, given a level of significance for tightness.
PartitionTree(x,siglevel=0.05,statname="fldc", sigtype=c("raw","corrected","fdr"))
x |
An object of class |
siglevel |
Threshold of significance for tightness of branches. Default is 0.05. |
statname |
A character string specifying the name of the measure of tightness whose significance is to be used for the partition. The choices are |
sigtype |
A character string specifying how the significance threshold |
An object of class partition. See ?partition for details.
Guoli Sun, Alex Krasnitz
## Not run: 
data(leukemia)
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
mypartition<-PartitionTree(x=mytable,siglevel=0.001,statname="fldc",
    sigtype="raw")
partition1<-mypartition$partition
sigmatrix1<-mypartition$sigvalue
fix(partition1)
fix(sigmatrix1)
## End(Not run)
Description: A plot method for the class best.
## S3 method for class 'best'
plot(x,mystat="fldc",siglevel=0.05,sigtype=c("raw","corrected","fdr"),
    partition=NA,print.num=TRUE,print.lab=TRUE,float=0.01,col.best=c(2,3),
    cex.best=0.8,cex.leaf=0.8,font.best=NULL,main=NULL,sub=NULL,xlab=NULL,
    metric.args=list(),...)
x |
An object of class |
mystat |
A measure of tightness for which p-values are to be shown in the plot. Default is |
siglevel |
A threshold level of significance for tightness of branches used when |
sigtype |
A character string specifying how the significance threshold |
partition |
An object of class |
print.num |
Logical. If true, the branch numbers will be indicated. |
print.lab |
Logical. If true, the labels will be displayed at the bottom of dendrogram. |
float |
A numeric value that adjusts the vertical location of the p-values. |
col.best |
A character vector of length 2, indicating the colors to be used for the p-values and for the numbers of the nodes. |
cex.best |
A numeric value for the text size of the branch labels. |
cex.leaf |
A numeric value for the text size of the leaf labels. |
font.best |
An integer which specifies font choice of text on the plot. See |
main |
A character string specifying the title of the plot. |
sub |
A character string specifying a subtitle of the plot. |
xlab |
A character string specifying the label of the horizontal axis. |
metric.args |
Additional arguments for a user-supplied dissimilarity (distance) function. See |
... |
Further arguments to be passed on to the |
The function plots a dendrogram of the hierarchical tree as specified by the x argument, an object of class "best". When the argument partition is set to an object of class "partition", and a partition does exist (see partition for description), this plot provides the significance estimates for the nodes that form the partition. Otherwise, this function annotates all tight nodes whose significance estimates are no more than siglevel. To obtain the leaves descending from a given node, refer to the function LeafContent.
A plot with all branch numbers and significant p-values in the hierarchical tree.
Guoli Sun, Alex Krasnitz
SigTree, PartitionTree, best, partition
## Not run: 
data(leukemia)
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
plot(x=mytable,mystat="fldc",siglevel=0.001,sigtype="raw",hang=-1)
mypartition<-PartitionTree(x=mytable,siglevel=0.001,statname="fldc",
    sigtype="raw")
plot(x=mytable,mystat="fldc",partition=mypartition)
plot(x=mytable,mystat="fldc",partition=mypartition,print.num=FALSE)
#with user-defined functions
mydist<-function(x,y){return(dist(x)/y)}
myrand<-function(x,z){return(apply(x+z,2,sample))}
mytable<-SigTree(data.matrix(leukemia),mystat="fldc",
    mymethod="ward",mymetric="mydist",rand.fun="myrand",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="MOM",metric.args=list(3),
    rand.args=list(2))
plot(mytable,metric.args=list(3))
plot(mytable,metric.args=list(3),cex.leaf=1.5)
## End(Not run)
Description: Given data from which a hierarchical tree is grown, compute measures of tightness for each branch, sample from the null distribution of these measures in the randomized data and compute the corresponding p-values.
SigTree(myinput,mystat=c("all","fldc","bldc","fldcc","slb"), mymethod="complete",mymetric="euclidean",rand.fun=NA, by.block=NA,distrib=c("vanilla","Rparallel"),Ptail=TRUE, tailmethod=c("ML","MOM"),njobs=1,seed=NA, Nperm=ifelse(Ptail,1000,1000*nrow(myinput)), metric.args=list(),rand.args=list())
myinput |
A matrix with rows corresponding to items to be clustered. |
mystat |
A character string specifying the measures of tightness to be computed and evaluated for significance of finding. See Details for the definitions of these measures. If |
mymethod |
A character string specifying the linkage method for hierarchical clustering, to be used by the |
mymetric |
A character string specifying the definition of dissimilarity (distance) among the data items. The options, in addition to those for the argument |
rand.fun |
A character string specifying the permutation method to be applied to |
by.block |
A vector of the same length as the column dimension of |
distrib |
One of |
Ptail |
Logical. If |
tailmethod |
A character string only needed to be specified if the |
njobs |
A single integer specifying the number of worker jobs to create in case of distributed computation if |
seed |
An optional single integer value, to be used to set the random number generator seed (see |
Nperm |
A single integer specifying the size of a sample from the null distribution. See |
metric.args |
Additional arguments for user-supplied dissimilarity (distance) function. See |
rand.args |
Additional arguments for user-supplied randomization function. See |
When rand.fun is set to the name of a user-supplied randomization function, the first argument of that function should be set to myinput. See examples below.
The measures of tightness are defined as follows. Denote a node in the tree by a, its sibling node by b, and their parent node by p. Let their respective heights be $h_a$, $h_b$, $h_p$. Finally, let $S_x$ mean that the measure S is computed for the node x. Then the definitions are
fldc: $S_a = (h_p - h_a)/h_p$
fldcc: $S_a = (h_p - (h_a - h_b)/2)/h_a$
bldc: $S_p = (2h_p - h_a - h_b)/(2h_p)$
slb: $S_p = 2h_p - h_a - h_b$
The first three measures test tightness of all internal nodes at the same time, while slb only tests the two-way split of the input data.
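As a rough illustration of the fldc, bldc and slb definitions, the following base R sketch evaluates them directly from an hclust object. It is not the package's implementation: the toy data, the variable names, and the choice to treat leaves as nodes of height 0 are assumptions made here for the sake of the example.

```r
# Toy data: two well-separated groups of three items (illustrative values only)
set.seed(1)
x <- rbind(matrix(rnorm(6, 0, 0.1), ncol = 2),
           matrix(rnorm(6, 5, 0.1), ncol = 2))
hc <- hclust(dist(x), method = "complete")
n_int <- nrow(hc$merge)   # number of internal nodes

# Height of a child listed in hc$merge; leaves (negative) are assumed to have height 0
child_height <- function(k) if (k < 0) 0 else hc$height[k]

# bldc and slb are attached to the parent node p of the siblings a and b
h_a <- sapply(hc$merge[, 1], child_height)
h_b <- sapply(hc$merge[, 2], child_height)
h_p <- hc$height
bldc <- (2 * h_p - h_a - h_b) / (2 * h_p)
slb  <- 2 * h_p - h_a - h_b

# fldc is attached to the child node a, so first find each node's parent
parent <- rep(NA_integer_, n_int)
for (p in seq_len(n_int)) {
  kids <- hc$merge[p, ]
  parent[kids[kids > 0]] <- p
}
# The root has no parent, so its fldc is NA
fldc <- (hc$height[parent] - hc$height) / hc$height[parent]

print(cbind(hc$merge, height = hc$height, fldc = fldc, bldc = bldc, slb = slb))
```

The printed table pairs the merge and height components of hclust with one column per measure, one row per internal node.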
The seed argument is optional. Setting the seed ensures reproducibility of sampling from the null distribution.
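The role of the seed can be seen in a small permutation test written in base R. This is a sketch under stated assumptions, not the package's code: the helpers root_slb, shuffle_columns and perm_pvalue are hypothetical, the randomization is assumed to permute the values within each column independently, and the p-value is a plain empirical estimate without any tail modelling.

```r
# slb at the root: 2*h_p - h_a - h_b, with leaves assumed to have height 0
root_slb <- function(m) {
  hc <- hclust(dist(m), method = "complete")
  n <- nrow(hc$merge)
  kids <- hc$merge[n, ]
  h_kids <- ifelse(kids < 0, 0, hc$height[pmax(kids, 1)])
  2 * hc$height[n] - sum(h_kids)
}

# Assumed randomization: shuffle the values within each column independently
shuffle_columns <- function(m) apply(m, 2, sample)

# Empirical p-value: share of null samples at least as large as the observed value
perm_pvalue <- function(m, n_perm, seed) {
  set.seed(seed)
  obs <- root_slb(m)
  null <- replicate(n_perm, root_slb(shuffle_columns(m)))
  (1 + sum(null >= obs)) / (1 + n_perm)
}

# Toy data: two groups of five items in four dimensions (illustrative values only)
set.seed(1)
x <- rbind(matrix(rnorm(20, 0), ncol = 4), matrix(rnorm(20, 6), ncol = 4))

p1 <- perm_pvalue(x, n_perm = 200, seed = 42)
p2 <- perm_pvalue(x, n_perm = 200, seed = 42)
print(c(p1, p2))
print(identical(p1, p2))
```

Running the test twice with the same seed draws the same null sample, so the two estimates coincide; with different seeds they would generally differ slightly.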
If rand.fun is set to NA, the function returns a matrix whose rows correspond to the internal nodes of the tree and whose columns contain the tree structure as in the merge component of the class hclust; the height component of hclust; and columns tabulating the values of the measures of tightness specified by the mystat argument.
If rand.fun is set to a specific randomization method, an object of class best is returned. See ?best for details.
If mymetric or rand.fun is a customized function, make sure you have read and write permission for your working directory.
Guoli Sun, Alex Krasnitz
Theo A. Knijnenburg, Lodewyk F. A. Wessels et al. (2009) Fewer permutations, more accurate P-values
####Each row is a gene expression profile for a case of leukemia.
####Each case belongs to one of three subtypes.
data(leukemia)
#output only statistic table
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean")
class(mytable)
## Not run: 
#use multicore processing to detect significant sub-clusters
mytable<-SigTree(data.matrix(leukemia),mystat="all",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.column",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
class(mytable)
####Each row after the 1st describes an item belonging to one of four subtypes.
####Each column corresponds to a genomic location in one of 22 human chromosomes.
####The 1st row contains the chromosome numbers.
data(T10)
#Perform randomization within each chromosome
chrom<-as.numeric(T10[1,])
mydata<-T10[-1,]
mytable<-SigTree(data.matrix(mydata),mystat="fldc",
    mymethod="ward",mymetric="euclidean",rand.fun="shuffle.block",
    by.block=chrom,distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="ML")
#Compute dissimilarity using a user-supplied distance function,
#and perform randomization using a user-supplied randomization function,
#with additional arguments.
#Both user-supplied functions are only useful as illustration.
mydist<-function(x,y){return(dist(x)/y)}
myrand<-function(x,z){return(apply(x+z,2,sample))}
mytable<-SigTree(data.matrix(leukemia),mystat="fldc",
    mymethod="ward",mymetric="mydist",rand.fun="myrand",
    distrib="Rparallel",njobs=2,Ptail=TRUE,tailmethod="MOM",metric.args=list(3),
    rand.args=list(2))
## End(Not run)
This data set summarizes DNA copy number variation in 100 individual cancer cells harvested from a breast tumor. The cells belong to four subtypes, differing by ploidy. There are 47 Diploid+Pseudo-diploid, 24 Hypo-diploid, 4 Aneuploid B and 25 Aneuploid A cells. Their copy number profiles are summarized in terms of 354 amplification and deletion "cores", which are computed by the CORE package.
data(T10)
A data frame with 101 rows and 354 columns. Each column corresponds to a core. The first row is integer and contains the chromosome number for each core. The remaining rows are numeric, with values between 0 and 1, and each represents a DNA copy number profile of a cell.
Please remove the first row before computing the distance matrix.
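The step of dropping the first row can be sketched on a small stand-in for T10. The data frame toy below is hypothetical and only mimics the described layout (chromosome numbers in the first row, one copy number profile per remaining row); with the real data, T10 would take its place.

```r
# Hypothetical stand-in with the same layout as T10:
# row 1 holds chromosome numbers, later rows hold profiles with values in [0, 1]
toy <- data.frame(core1 = c(1, 0.2, 0.9, 0.3),
                  core2 = c(1, 0.4, 0.8, 0.5),
                  core3 = c(2, 0.1, 0.7, 0.2))

# Keep the chromosome assignment of each core for later use
chrom <- as.numeric(toy[1, ])

# Drop the first row so that only the profiles enter the distance matrix
profiles <- data.matrix(toy[-1, ])
d <- dist(profiles)
print(d)
```

Leaving the chromosome row in place would treat it as an extra item and distort every distance involving it.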
Alexander Krasnitz, Guoli Sun, Peter Andrews, and Michael Wigler (2013) Target inference from collections of genomic intervals
data(T10)
dim(T10)