mcmc.Rd 12.1 KB
Newer Older
Gilles Kratzer's avatar
Gilles Kratzer committed
1
% mcmcabn.Rd ---
Gilles Kratzer's avatar
Gilles Kratzer committed
2
% Author           : Gilles Kratzer
Gilles Kratzer's avatar
Gilles Kratzer committed
3
% Created on :       18.02.2019
Gilles Kratzer's avatar
Gilles Kratzer committed
4
5
6
% Last modification :
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Gilles Kratzer's avatar
Gilles Kratzer committed
7
8
\name{mcmcabn}
\alias{mcmcabn}
Gilles Kratzer's avatar
Gilles Kratzer committed
9
\title{Structural MCMC sampler for DAGs}
Gilles Kratzer's avatar
Gilles Kratzer committed
10
11

\usage{
Gilles Kratzer's avatar
Gilles Kratzer committed
12
13
14
15
mcmcabn(score.cache = NULL,
                 score = "mlik",
                 data.dists = NULL,
                 max.parents = 1,
Gilles Kratzer's avatar
Gilles Kratzer committed
16
                 mcmc.scheme = c(100,1000,1000),
Gilles Kratzer's avatar
Gilles Kratzer committed
17
18
19
                 seed = 42,
                 verbose = FALSE,
                 start.dag = NULL,
Gilles Kratzer's avatar
Gilles Kratzer committed
20
21
                 prior.dag = NULL,
                 prior.lambda = NULL,
Gilles Kratzer's avatar
Gilles Kratzer committed
22
23
                 prob.rev = 0.05,
                 prob.mbr = 0.05,
Gilles Kratzer's avatar
Gilles Kratzer committed
24
                 heating = 1,
Gilles Kratzer's avatar
Gilles Kratzer committed
25
                 prior.choice = 2)
Gilles Kratzer's avatar
Gilles Kratzer committed
26
27
28
                 }

\arguments{
29
  \item{score.cache}{output from \link[abn:build_score_cache]{buildscorecache} from the \code{abn} R package.}
Gilles Kratzer's avatar
Gilles Kratzer committed
30
  \item{score}{character giving which network score should be used to sample the DAGs landscape.}
Gilles Kratzer's avatar
typos    
Gilles Kratzer committed
31
\item{data.dists}{a named list giving the distribution for each node in the network, see details.}
Gilles Kratzer's avatar
Gilles Kratzer committed
32
33
34
35
  \item{max.parents}{a constant giving the maximum number of parents allowed.}
  \item{mcmc.scheme}{a sampling scheme. It is vector giving in that order: the number of returned DAGS, the number of thinned steps and length of the burn in phase.}
\item{seed}{a non-negative integer which sets the seed.}
\item{verbose}{extra output, see output for details.}
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
36
  \item{start.dag}{a DAG given as a matrix, see details for format, which can be used to explicitly provide a starting point for the structural search. Alternatively character "random" will select a random DAG as starting point. Character "hc" will call a hill-climber to select a DAG as starting point.}
Gilles Kratzer's avatar
Gilles Kratzer committed
37
38
  \item{prior.dag}{user defined prior. It should be given as a matrix where entries range from zero to one. 0.5 is non-informative for the given arc.}
\item{prior.lambda}{hyper parameter representing the strength of belief in the user defined prior.}
Gilles Kratzer's avatar
Gilles Kratzer committed
39
\item{prob.rev}{probability of selecting a new edge reversal.}
Gilles Kratzer's avatar
Gilles Kratzer committed
40
\item{prob.mbr}{probability of selecting a Markov blanket resampling move.}
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
41
\item{heating}{a real positive number that heats up the convergence if between zero and one and apply an exponential decrease scheme if larger than one. The default is one. See details}
Gilles Kratzer's avatar
typos    
Gilles Kratzer committed
42
  \item{prior.choice}{an integer, 1 or 2, where 1 is a uniform structural prior and 2 uses a weighted prior, see details.}
Gilles Kratzer's avatar
Gilles Kratzer committed
43
44
45
  }

\description{
Gilles Kratzer's avatar
typos    
Gilles Kratzer committed
46
This function is a structural Monte Carlo Markov Chain Model Choice (MC)^3 sampler that is equipped with two large scale MCMC moves that are purposed to accelerate chain mixing.
Gilles Kratzer's avatar
Gilles Kratzer committed
47
48
}

Gilles Kratzer's avatar
typos    
Gilles Kratzer committed
49
\details{The procedure runs a structural Monte Carlo Markov Chain Model Choice (MC)^3 to find the most probable posterior network (DAG). The default algorithm is based on three MCMC move: edge addition, edge deletion and edge reversal. This algorithm is known as the (MC)^3. It is known to mix slowly and getting stuck in low probability regions. Indeed, changing of Markov equivalence region often requires multiple MCMC moves. Then large scale MCMC moves are implemented. Their relative frequency can be set by the user. The new edge reversal move (REV) from Grzegorczyk and Husmeier (2008) and the Markov blanket resampling (MBR) from Su and Borsuk (2016). The classical reversal move depends on the global configuration of the parents and children and fails to propose MCMC jumps that produce valid but very different DAGs in a unique move. The REV move sample globally a new set of parent. The MBR workaround applies the same idea but to the entire Markov blanket of a randomly chosen node.
Gilles Kratzer's avatar
Gilles Kratzer committed
50

Gilles Kratzer's avatar
typos    
Gilles Kratzer committed
51
The classical (MC)^3 is unbiased but inefficient in mixing, the two radical MCMC alternative move are known to massively accelerate mixing without introducing biases. But those move are computationally expensive. Then low frequencies are advised. The REV move is not necessarily ergotic , then it should not be used alone.
Gilles Kratzer's avatar
Gilles Kratzer committed
52

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
53
The parameter \code{start.dag} can be: \code{"random"}, \code{"hc"} or user defined. If user select \code{"random"} then a random valid DAG is selected. The routine used favourise low density structure. If \code{"hc"} (for Hill-climber: \link[abn:search_heuristic]{searchHeuristic} then a DAG is selected using 100 different searches with 500 optimization steps. A user defined DAG can be provided. It should be a named square matrix  containing only zeros and ones. The DAG should be valid (i.e. acyclic).
54

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
55
The parameter \code{prior.choice} determines the prior used within each individual node for a given choice of parent combination. In Koivisto and Sood (2004) p.554 a form of prior is used which assumes that the prior probability for parent combinations comprising of the same number of parents are all equal. Specifically, that the prior probability for parent set G with cardinality |G| is proportional to 1/[n-1 choose |G|] where there are n total nodes. Note that this favours parent combinations with either very low or very high cardinality which may not be appropriate. This prior is used when \code{prior.choice=2}. When \code{prior.choice=1} an uninformative prior is used where parent combinations of all cardinalities are equally likely. When \code{prior.choice=3} a user defined prior is used, defined by \code{prior.dag}. It is given by an adjacency matrix (squared and same size as number of nodes) where entries ranging from zero to one give the user prior belief. An hyper parameter defining the global user belief in the prior is given by \code{prior.lambda}.
Gilles Kratzer's avatar
Gilles Kratzer committed
56

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
57
MCMC sampler come with asymptotic statistical guarantees. Therefore it is highly advised to run multiple long enough chains. The burn in phase length (i.e throwing away first MCMC iterations) should be adequately chosen.
Gilles Kratzer's avatar
Gilles Kratzer committed
58

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
59
The argument \code{data.dists} must be a list with named arguments, one for each of the variables in \code{data.df}, where each entry is either \code{"poisson"}, \code{"binomial"}, or \code{"gaussian"}.
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
60
61

The parameter \code{heating} could improve convergence. It should be a real positive number. If smaller than one it is a tuning parameter which transforms the score by raising it to this power. One is neutral. The smaller the more probable to accept any move. If larger than one, it indicates the number of returned steps where an exponentially decrease heating scheme is applied. After this number of steps, the \code{heating} parameter is set to one.
Gilles Kratzer's avatar
Gilles Kratzer committed
62
63
}

Gilles Kratzer's avatar
website    
Gilles Kratzer committed
64
\value{A list with an entry for the list of sampled DAGs, the list of scores, the acceptance probability, the method used for each MCMC jump, the rejection status for each MCMC jump, the total number of iterations the thinning, the length of burn in phase, the named list of distribution per node and the heating parameter. The returned object is of class mcmcabn.}
Gilles Kratzer's avatar
Gilles Kratzer committed
65
66
67

\author{Gilles Kratzer}

Gilles Kratzer's avatar
Gilles Kratzer committed
68
69
70
\references{
For the implementation of the function:

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
71
Kratzer, G., Furrer, R.  "Is a single unique Bayesian network enough to accurately represent your data?". arXiv preprint arXiv:1902.06641.
Gilles Kratzer's avatar
Gilles Kratzer committed
72
73

For the new edge reversal:
Gilles Kratzer's avatar
Gilles Kratzer committed
74

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
75
Grzegorczyk, M., Husmeier, D. (2008). "Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move", Machine Learning, vol. 71(2-3), 265.
Gilles Kratzer's avatar
Gilles Kratzer committed
76

Gilles Kratzer's avatar
Gilles Kratzer committed
77
78
For the Markov Blanket resampling move:

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
79
Su, C., Borsuk, M. E. (2016). "Improving structure MCMC for Bayesian networks through Markov blanket resampling", The Journal of Machine Learning Research, vol. 17(1), 4042-4061.
Gilles Kratzer's avatar
Gilles Kratzer committed
80

Gilles Kratzer's avatar
Gilles Kratzer committed
81
82
83
84
85
86
For the Koivisto prior:

Koivisto, M. V. (2004). Exact Structure Discovery in Bayesian Networks, Journal of Machine Learning Research, vol 5, 549-573.

For the user defined prior:

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
87
Werhli, A. V.,  Husmeier, D. (2007). "Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge". Statistical Applications in Genetics and Molecular Biology, 6 (Article 15).
Gilles Kratzer's avatar
Gilles Kratzer committed
88

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
89
Imoto, S., Higuchi, T., Goto, T., Tashiro, K., Kuhara, S., Miyano, S. (2003). Using Bayesian networks for estimating gene networks from microarrays and biological knowledge. In Proceedings of the European Conference on Computational Biology.
Gilles Kratzer's avatar
Gilles Kratzer committed
90

Gilles Kratzer's avatar
Gilles Kratzer committed
91
For the asia dataset:
Gilles Kratzer's avatar
Gilles Kratzer committed
92

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
93
Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1-22. doi:http://dx.doi.org/10.18637/jss.v035.i03.
Gilles Kratzer's avatar
Gilles Kratzer committed
94
}
Gilles Kratzer's avatar
Gilles Kratzer committed
95

Gilles Kratzer's avatar
Gilles Kratzer committed
96

Gilles Kratzer's avatar
Gilles Kratzer committed
97
\examples{
Reinhard Furrer's avatar
details    
Reinhard Furrer committed
98
99
## Example from the asia dataset from Lauritzen and Spiegelhalter (1988)
## provided by Scutari (2010)
Gilles Kratzer's avatar
Gilles Kratzer committed
100

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
101
# The number of MCMC run is delibaretelly chosen too small (computing time)
102
103
104
105
# no thinning (usually not recommended)
# no burn-in (usually not recommended,
# even if not supported by any theoretical arguments)

Gilles Kratzer's avatar
Gilles Kratzer committed
106
107
data("mcmc_run_asia")

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
108
# Let us run: 0.03 REV, 0.03 MBR, 0.94 MC3 MCMC jumps
109
110
111
# with a random DAG as starting point

mcmc.out.asia.small <- mcmcabn(score.cache = bsc.compute.asia,
Gilles Kratzer's avatar
Gilles Kratzer committed
112
                  score = "mlik",
Gilles Kratzer's avatar
Gilles Kratzer committed
113
                  data.dists = dist.asia,
Gilles Kratzer's avatar
Gilles Kratzer committed
114
                  max.parents = 2,
115
                  mcmc.scheme = c(100,0,0),
Gilles Kratzer's avatar
Gilles Kratzer committed
116
                  seed = 321,
Gilles Kratzer's avatar
Gilles Kratzer committed
117
118
                  verbose = FALSE,
                  start.dag = "random",
Gilles Kratzer's avatar
Gilles Kratzer committed
119
120
121
                  prob.rev = 0.03,
                  prob.mbr = 0.03,
                  prior.choice = 2)
Gilles Kratzer's avatar
Gilles Kratzer committed
122

123
124
summary(mcmc.out.asia.small)

Gilles Kratzer's avatar
website    
Gilles Kratzer committed
125
# Soly with MC3 moves
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140

mcmc.out.asia.small <- mcmcabn(score.cache = bsc.compute.asia,
                  score = "mlik",
                  data.dists = dist.asia,
                  max.parents = 2,
                  mcmc.scheme = c(100,0,0),
                  seed = 42,
                  verbose = FALSE,
                  start.dag = "random",
                  prob.rev = 0,
                  prob.mbr = 0,
                  prior.choice = 2)

summary(mcmc.out.asia.small)

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
141
# Defining a starting DAG
Gilles Kratzer's avatar
Gilles Kratzer committed
142
143
144
145
146
147
148
149
150
startDag <- matrix(data = c(0, 0, 0, 1, 0, 0, 0, 0,
                            0, 0, 1, 0, 0, 0, 0, 0,
                            1, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 1, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0),nrow = 8,ncol = 8, byrow = TRUE)

151
152
colnames(startDag) <- rownames(startDag) <- names(dist.asia)

Gilles Kratzer's avatar
Gilles Kratzer committed
153
# Additionally, let us use the non informative prior
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
mcmc.out.asia.small <- mcmcabn(score.cache = bsc.compute.asia,
                  score = "mlik",
                  data.dists = dist.asia,
                  max.parents = 2,
                  mcmc.scheme = c(100,0,0),
                  seed = 42,
                  verbose = FALSE,
                  start.dag = startDag,
                  prob.rev = 0,
                  prob.mbr = 0,
                  prior.choice = 1)

summary(mcmc.out.asia.small)

# let us define our very own prior
# we know that there should be a link between Smoking and LungCancer nodes

Gilles Kratzer's avatar
Gilles Kratzer committed
171
172
# uninformative prior matrix
priorDag <- matrix(data = 0.5,nrow = 8,ncol = 8)
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
# name it
colnames(priorDag) <- rownames(priorDag) <- names(dist.asia)
# parent = smoking; child = LungCancer
priorDag["LungCancer","Smoking"] <- 1

mcmc.out.asia.small <- mcmcabn(score.cache = bsc.compute.asia,
                               score = "mlik",
                               data.dists = dist.asia,
                               max.parents = 2,
                               mcmc.scheme = c(100,0,0),
                               seed = 42,
                               verbose = FALSE,
                               start.dag = startDag,
                               prob.rev = 0,
                               prob.mbr = 0,
                               prior.choice = 3,
                               prior.dag = priorDag)

summary(mcmc.out.asia.small)
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
192
193
194
195
196
197
198
199

# let us improve convergence rate. The 20 first MCMC moves are performed with an
# heating parameter different than one, afterwards, heating is set up to one.

mcmc.out.asia.small <- mcmcabn(score.cache = bsc.compute.asia,
                  score = "mlik",
                  data.dists = dist.asia,
                  max.parents = 2,
200
201
                  mcmc.scheme = c(150,0,0),
                  seed = 41242,
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
202
203
204
205
206
207
                  verbose = FALSE,
                  start.dag = "random",
                  prob.rev = 0.03,
                  prob.mbr = 0.03,
                  prior.choice = 2, heating = 20)

Reinhard Furrer's avatar
details    
Reinhard Furrer committed
208
summary(mcmc.out.asia.small)
Gilles Kratzer's avatar
website    
Gilles Kratzer committed
209

210
}