Commit 16eece2f authored by Gilles Kratzer's avatar Gilles Kratzer

update citation

parent 622d0b4b
......@@ -42,6 +42,8 @@ ___
- 01/07/2019 - mcmcabn 0.2 available on CRAN
- 02/03/2020 - mcmcabn 0.3 available on CRAN. New peer reviewed article [Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland](https://www.frontiersin.org/articles/10.3389/fvets.2020.00073/full) in Front. Vet. Sci.
____
**`mcmcabn` is developed and maintained by [Gilles Kratzer](https://gilleskratzer.netlify.com/) and [Prof. Dr. Reinhard Furrer](https://user.math.uzh.ch/furrer/) from
......
citHeader("To cite `mcmcabn` package in publication use:")
citHeader("To cite `mcmcabn` package in publication use select the most appropriate among:")
citEntry(entry = "Article",
title = "Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland",
author = personList(as.person("Gilles Kratzer"),
as.person("Fraser I. Lewis"),as.person("Barbara Willi"),as.person("Marina L. Meli"),as.person("Felicitas S. Boretti"),as.person("Regina Hofmann-Lehmann"),as.person("Paul Torgerson"), as.person("Reinhard Furrer"), as.person("Sonja Hartnack")),
year = "2020",
journal ="Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073",
textVersion =
paste("Kratzer G, Lewis FI, Willi B, Meli ML, Boretti FS, Hofmann-Lehmann R, Torgerson P, Furrer R and Hartnack S (2020) Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland. Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073"),
)
citEntry(entry = "Article",
title = "Is a single unique Bayesian network enough to accurately represent your data?",
......
# Overview
This directory contains the necessary code to reproduce the results from the article: "Kratzer G, Lewis FI, Willi B, Meli ML, Boretti FS, Hofmann-Lehmann R, Torgerson P, Furrer R and Hartnack S (2020) Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland. Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073".
Note that the data are stored in the `abn` R package.
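A minimal sketch of how the data can be loaded (the dataset name `FCV` is an assumption; check `data(package = "abn")` for the exact name):

```r
# list the datasets shipped with abn
data(package = "abn")

# load the calicivirus data, assuming they are exposed as `FCV`
data("FCV", package = "abn")
str(FCV)
```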
## Files in the directory
* `README.md` this file
* `FCV-analysis.R` the R code.
Author: Gilles Kratzer
Contact: gilles.kratzer@math.uzh.ch
......@@ -34,25 +34,25 @@ CoupledHeatedmcmcabn(score.cache = NULL,
\item{mcmc.scheme}{a sampling scheme. It is a vector giving, in that order: the number of returned DAGs, the number of thinned steps, and the length of the burn-in phase.}
\item{seed}{a non-negative integer which sets the seed.}
\item{verbose}{extra output, see output for details.}
\item{start.dag}{a DAG given as a matrix, see details for format, which can be used to provide a starting point for the structural search explicitly. Alternatively, character "random" will select a random DAG as a starting point. Character "hc" will call a hill-climber to select a DAG as a starting point.}
\item{start.dag}{a DAG given as a matrix, see details for format, which can be used to provide a starting point for the structural search explicitly. Alternatively, the character "random" will select a random DAG as a starting point, and "hc" will call a hill-climber to select one.}
\item{prior.dag}{a user-defined prior. It should be given as a matrix where entries range from zero to one; 0.5 is non-informative for the given arc.}
\item{prior.lambda}{a hyperparameter representing the strength of belief in the user-defined prior.}
\item{prob.rev}{probability of selecting a new edge reversal.}
\item{prob.mbr}{probability of selecting a Markov blanket resampling move.}
\item{heating}{a real positive number that heats up the chains. The default is one. See details}
\item{heating}{a positive real number that heats the chains. The default is one; see details.}
\item{n.chains}{number of chains to be run, see details.}
\item{prior.choice}{an integer, 1 or 2, where 1 is a uniform structural prior and 2 uses a weighted prior, see details.}
}
\description{
This function is a coupled heated structural Monte Carlo Markov Chain sampler that is equipped with two large scale MCMC moves and parallel tempering that are purposed to synergitically accelerate chain mixing.
This function is a coupled heated structural Markov chain Monte Carlo (MCMC) sampler that is equipped with two large-scale MCMC moves and parallel tempering, which are designed to synergistically accelerate chain mixing.
}
\details{The procedure runs a coupled heated structural Markov chain Monte Carlo sampler to find the most probable posterior network (DAG). The default algorithm is based on three MCMC moves in a parallel tempering scheme: edge addition, edge deletion, and edge reversal. This algorithm is known as the (MC)^3. It is known to mix slowly and to get stuck in low-probability regions. Indeed, moving between Markov equivalence classes often requires multiple MCMC moves. Therefore, two large-scale MCMC moves are implemented, whose relative frequencies can be set by the user: the new edge reversal move (REV) from Grzegorczyk and Husmeier (2008) and the Markov blanket resampling (MBR) from Su and Borsuk (2016). The classical reversal move depends on the global configuration of the parents and children and fails to propose MCMC jumps that produce valid but very different DAGs in a unique move. The REV move samples a new set of parents globally. The MBR move applies the same idea to the entire Markov blanket of a randomly chosen node.
The classical (MC)^3 is unbiased but inefficient in mixing. The two radical MCMC alternative moves are known to accelerate mixing without introducing biases. Those MCMC moves are computationally expensive. Then low frequencies are advised. The REV move is not necessarily ergotic, then it should not be used alone.
The classical (MC)^3 is unbiased but mixes inefficiently. The two radical alternative MCMC moves are known to accelerate mixing without introducing bias, but they are computationally expensive, so low frequencies are advised. The REV move is not necessarily ergodic, so it should not be used alone.
The parallel tempering scheme has been proposed by (Geyer, 1991). The idea is to run multiple MCMC chains, at different temperature. This will flatten the posterior probability distribution and then making jumps across low probability regions more probable. At each iteration, a swap between two randomly chosen chains is evaluated through a Metropolis like probability. THe temperature is sequentially increased in the chains folowwing
The parallel tempering scheme was proposed by Geyer (1991). The idea is to run multiple MCMC chains at different temperatures. This flattens the posterior probability distribution, making jumps across low-probability regions more probable. At each iteration, a swap between two randomly chosen chains is evaluated through a Metropolis-like acceptance probability. The temperature is sequentially increased in the chains following
......@@ -60,7 +60,7 @@ The parameter \code{start.dag} can be: \code{"random"}, \code{"hc"} or user defi
The parameter \code{prior.choice} determines the prior used within each node for a given choice of parent combination. In Koivisto and Sood (2004) p.554, a form of prior is used, which assumes that the prior probabilities for parent combinations comprising the same number of parents are all equal. Specifically, the prior probability for parent set G with cardinality |G| is proportional to 1/[n-1 choose |G|] where there are n total nodes. Note that this favors parent combinations with either very low or very high cardinality, which may not be appropriate. This prior is used when \code{prior.choice=2}. When \code{prior.choice=1} an uninformative prior is used where parent combinations of all cardinalities are equally likely. When \code{prior.choice=3} a user-defined prior is used, defined by \code{prior.dag}. It is given by an adjacency matrix (square, with dimension equal to the number of nodes) where entries ranging from zero to one encode the user's prior belief. A hyperparameter defining the global user belief in the prior is given by \code{prior.lambda}.
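As a worked illustration of the Koivisto prior (\code{prior.choice=2}), the unnormalised weight of a parent set can be sketched in a few lines of R (a minimal sketch, not part of the package API):
\preformatted{
# unnormalised Koivisto prior weight for a parent set of cardinality
# `card` among n nodes: 1 / choose(n - 1, card)
koivisto.weight <- function(card, n) 1 / choose(n - 1, card)

koivisto.weight(card = 0, n = 8)  # empty parent set: weight 1
koivisto.weight(card = 2, n = 8)  # two parents: weight 1/21
}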
MCMC sampler comes with asymptotic statistical guarantees. Therefore it is highly advised to run multiple long enough chains. The burn-in phase length (i.e. throwing away first MCMC iterations) should be adequately chosen.
The MCMC sampler comes with asymptotic statistical guarantees. Therefore, it is highly advised to run multiple, sufficiently long chains. The length of the burn-in phase (i.e., the number of initial MCMC iterations that are discarded) should be chosen adequately.
The argument \code{data.dists} must be a list with named arguments, one for each of the variables in \code{data.df}, where each entry is either \code{"poisson"}, \code{"binomial"}, or \code{"gaussian"}.
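For example, for the asia data used in the package examples, such a list could look as follows (a sketch; the node names are assumptions and must match \code{data.df} exactly):
\preformatted{
dist.asia <- list(Asia = "binomial", Smoking = "binomial",
                  Tuberculosis = "binomial", LungCancer = "binomial",
                  Bronchitis = "binomial", Either = "binomial",
                  XRay = "binomial", Dyspnea = "binomial")
}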
......@@ -92,7 +92,7 @@ For the Koivisto prior:
Koivisto, M. V. (2004). Exact Structure Discovery in Bayesian Networks, Journal of Machine Learning Research, vol 5, 549-573.
For the user defined prior:
For the user-defined prior:
Werhli, A. V., Husmeier, D. (2007). "Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge". Statistical Applications in Genetics and Molecular Biology, 6 (Article 15).
......@@ -108,7 +108,7 @@ Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journ
## Example from the asia dataset from Lauritzen and Spiegelhalter (1988)
## provided by Scutari (2010)
# The number of MCMC run is deliberately chosen too small (computing time)
# The MCMC run is deliberately chosen too short (to limit computing time)
# no thinning (usually not recommended)
# no burn-in (usually not recommended,
# even though this recommendation is not supported by any theoretical argument)
......
......@@ -16,7 +16,7 @@
\examples{
## This data set was generated using the following code:
library("bnlearn") # for the dataset
data(asia, package='bnlearn') # for the dataset
library("abn") # for the cache of score function
# Renaming columns of the dataset
......
......@@ -9,14 +9,14 @@
\usage{dist.asia}
\format{
The data contains a cache of pre-computed scores with a maximum of two parents per node.
The dataset contains a named list of distributions used to analyse the asia dataset.
\itemize{
\item \code{dist.asia}: a named list giving the distribution for each node in the network.
}}
\examples{
## This data set was generated using the following code:
library("bnlearn") # for the dataset
data(asia, package='bnlearn') # for the dataset
# Renaming columns of the dataset
colnames(asia) <- c("Asia",
......
......@@ -17,7 +17,7 @@
\examples{
\dontrun{
## This data set was generated using the following code:
library(bnlearn) #for the dataset
data(asia, package='bnlearn') #for the dataset
library(abn) #for the cache of scores computing function
mcmc.out.asia <- mcmcabn(score.cache = bsc.compute.asia,
......
......@@ -21,7 +21,7 @@
\examples{
\dontrun{
## This data set was generated using the following code:
library(bnlearn) #for the dataset
data(asia, package='bnlearn') #for the dataset
library(abn) #for the cache of score function
#renaming columns of the dataset
......
\name{. mcmcabn .}
\alias{overview}
\alias{mcmcabn}
\alias{mcmcabn-package}
\docType{package}
\title{mcmcabn Package}
\description{\code{mcmcabn} is a structural MCMC sampler for Directed Acyclic Graphs (DAGs). The three main problems that can be addressed by this R package are selecting the most probable structure based on a cache of pre-computed scores, controlling for overfitting, and sampling the landscape of high-scoring structures. It makes it possible to quantify the marginal impact of relationships of interest by marginalising over structures or nuisance dependencies.
}
\section{General overview}{What is \pkg{mcmcabn}:\cr
Bayesian network modeling is a data analysis technique that is ideally suited to messy, highly correlated, and complex datasets. This methodology is somewhat distinct from other forms of statistical modeling in that its focus is on structure discovery: determining an optimal graphical model that describes the inter-relationships in the underlying processes which generated the data. It is a multivariate technique and can be used for one or many dependent variables. This is a data-driven approach, as opposed to relying only on subjective expert opinion to determine how variables of interest are inter-related (as in, for example, structural equation modeling).
The R package mcmcabn is a structural MCMC sampler for Directed Acyclic Graphs (DAGs). It contains routines to compute, analyze, and report MCMC samples. This structural sampler supports the new edge reversal move from Grzegorczyk and Husmeier (2008) <doi:10.1007/s10994-008-5057-7> and the Markov blanket resampling from Su and Borsuk (2016) <http://jmlr.org/papers/v17/su16a.html>. It supports three priors: a prior controlling for structure complexity from Koivisto and Sood (2004) <http://dl.acm.org/citation.cfm?id=1005332.1005352>, an uninformative prior and a user-defined prior.
The web pages \url{https://www.math.uzh.ch/pages/mcmcabn/} provide further case studies. See also the files stored in the package directory
\code{inst/FCV_code}.
}
\references{
Kratzer G, Lewis FI, Willi B, Meli ML, Boretti FS, Hofmann-Lehmann R, Torgerson P, Furrer R and Hartnack S (2020) Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland. Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073
Kratzer, G. and Furrer, R. (2019). Is a single unique Bayesian network enough to accurately represent
your data?. arXiv preprint arXiv:1902.06641
Kratzer, G. and Furrer, R. (2019). mcmcabn: a structural MCMC sampler for DAGs learned from observed
systemic datasets. R package version 0.3. https://CRAN.R-project.org/package=mcmcabn
}
\examples{
## Citations:
citation('mcmcabn')
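
## A minimal structural MCMC run (sketch, not executed; assumes a score
## cache `bsc.compute.asia` and a distribution list `dist.asia` as in the
## package data examples, with default-like tuning values):
# mcmc.out <- mcmcabn(score.cache = bsc.compute.asia,
#                     data.dists = dist.asia,
#                     mcmc.scheme = c(100, 0, 0),
#                     seed = 42,
#                     verbose = FALSE,
#                     start.dag = "random",
#                     prob.rev = 0.03,
#                     prob.mbr = 0.03,
#                     prior.choice = 2)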
}
\author{Gilles Kratzer}
\keyword{documentation}
\keyword{package}
......@@ -19,4 +19,3 @@ BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageCheckArgs: --as-cran
PackageRoxygenize: rd,collate,namespace
......@@ -9,7 +9,7 @@
Sys.setenv("R_TESTS" = "")
#library(testthat)
#library(abn)
library(bnlearn)
data(asia, package='bnlearn')
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##General tests
......
......@@ -31,7 +31,7 @@ Let us examine a first MCMC search (1000 MCMC steps).
```{r, eval=TRUE}
# loading libraries
library(bnlearn)
data(asia, package='bnlearn')
library(mcmcabn)
library(abn)
library(ggplot2)
......
......@@ -76,7 +76,7 @@ Let us start with an example from the `bnlearn` R package from Scutari (2010). I
One needs to pre-compute a cache of scores. We use the R package `abn` to do it. But first, let us define the list of distributions for the nodes and plot the network.
```{r, warning = FALSE, message = FALSE}
library(bnlearn) #for the dataset
data(asia, package='bnlearn') #for the dataset
library(abn) #to pre-compute the scores
library(ggplot2) #plotting
library(ggpubr) #plotting
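# The pre-computation itself is a short call to abn (a sketch: in recent
# abn releases the function is buildScoreCache(); older ones use
# buildscorecache(); dist.asia is the named distribution list defined
# for the asia nodes)
bsc.compute.asia <- buildScoreCache(data.df = asia,
                                    data.dists = dist.asia,
                                    max.parents = 2)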
......@@ -213,14 +213,15 @@ query(mcmcabn = mcmc.out.asia ,formula = ~LungCancer|Smoking + Bronchitis|Smokin
So essentially zero!
### Formula statement: tutorial
### Formula statement for DAG specifications
The **formula** statement has been designed to ease querying over the MCMC samples. Hence, without explicitly writing an adjacency matrix (which can be painful when the number of variables increases). The `formula` argument can be provided using a formula alike:
~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with `~`. In this example, node1 has two parents (parent1 and parent2).
The **formula** statement has been designed to ease querying over the MCMC samples without explicitly writing an adjacency matrix (which can be painful as the number of variables increases). The `formula` argument can be provided using a formula-like syntax:
`~ node1|parent1:parent2 + node2:node3|parent3`. The formula statement has to start with `~`. In this example, node1 has two parents (parent1 and parent2).
node2 and node3 share the same parent, parent3. The parents' names have to match
those given in `data.dist` exactly. `:` is the separator between either children or parents,
`|` separates children (left side) and parents (right side), `+` separates terms, and `.` replaces all the variable names. The arrows go from the left-hand to the right-hand side of the formula. Additionally, to exclude an arc, put `-` in front of that statement. For example, a formula like:
~ -node1|parent1 excludes all DAGs that have an arc between parent1 and node1. Alternatively, one can query using an adjacency matrix. This matrix should have only: 0,1 and -1. The 1 indicates the requested arcs, the -1 the excluded, and the 0 all other entries that are not subject to constraints. The rows indicate the set of parents of the index nodes. The order of rows and columns should be the same as the ones used in the `mcmcabn()` function in the `data.dist` argument. The matrix should not be named, but it should be squared.
`~ -node1|parent1` excludes all DAGs that have an arc between parent1 and node1. Alternatively, one can query using an adjacency matrix. This matrix should contain only 0, 1, and -1: 1 indicates a requested arc, -1 an excluded one, and 0 an entry that is not subject to constraints. The rows indicate the set of parents of the index nodes. The order of rows and columns should be the same as the one used in the `mcmcabn()` function in the `data.dist` argument. The matrix should not be named, but it must be square.
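As a sketch, the two querying styles side by side, assuming the 8-node asia example with the node order of `dist.asia` (Asia, Smoking, Tuberculosis, LungCancer, Bronchitis, Either, XRay, Dyspnea); passing the matrix through the `formula` argument is an assumption, see `?query`:

```r
# formula style: probability of the arc Smoking -> LungCancer
query(mcmcabn = mcmc.out.asia, formula = ~ LungCancer|Smoking)

# adjacency-matrix style: rows give the parents of each node
dag <- matrix(0, nrow = 8, ncol = 8)
dag[4, 2] <- 1  # request the arc Smoking -> LungCancer (row = child)
query(mcmcabn = mcmc.out.asia, formula = dag)
```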
## Technical foundations
......@@ -299,7 +300,7 @@ $$E(G) = \sum_{i,j=1}^N |B_{i,j} - G_{i,j}|$$
The energy $E$ is zero for a perfect match between the prior knowledge $B$ and the actual network $G$, while increasing values of $E$ indicate an increasing divergence between $B$ and $G$. Following Imoto et al. (2003), we define the prior belief on graph $G$ by
$$p(G|\beta) = \frac{\exp({-\lambda E(G)})}{z^*}$$
$$p(G) = \frac{\exp({-\lambda E(G)})}{z^*}$$
where $z^*$ is a normalizing constant, and $\lambda$ is a hyperparameter defined by `prior.lambda`. In an MCMC setting, the normalizing constant cancels out in the Hastings ratio.
In statistical physics, in a Gibbs distribution, the hyperparameter $\lambda$ is the inverse of the temperature. It can be interpreted as a proxy for the strength of the influence of the prior over the data, i.e., the strength of the user's belief in the given prior. For $\lambda \rightarrow 0$, the prior distribution becomes flat and thus uninformative about the structure. Conversely, for $\lambda \rightarrow \infty$, the prior distribution becomes sharply peaked at the network with the lowest energy.
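A minimal numerical sketch of this prior, with a hypothetical prior matrix $B$, candidate DAG $G$, and `prior.lambda`:

```r
# prior belief matrix B (entries in [0, 1]) and a candidate DAG G
B <- matrix(c(0, 0.9, 0, 0.5,
              0, 0,   0, 0.5,
              0, 0.1, 0, 0.5,
              0, 0,   0, 0), nrow = 4, byrow = TRUE)
G <- matrix(c(0, 1, 0, 0,
              0, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)

energy <- sum(abs(B - G))              # E(G), the divergence between B and G
lambda <- 5                            # strength of belief (prior.lambda)
unnorm.prior <- exp(-lambda * energy)  # p(G) up to the constant z*
```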
......