Commit cce67658 authored by Gilles Kratzer

update website

parent 16eece2f
Pipeline #3350 passed with stage
in 3 seconds
......@@ -10,4 +10,4 @@
* update mcmcabn to make it compatible with constraints imported from the cache of scores. Heating parameter to increase or decrease acceptance probability.
* new function CoupledHeatedmcmcabn() implementing parallel tempering
* new article
* new published article (preferred reference): Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland in Front. Vet. Sci.
......@@ -124,7 +124,7 @@
<p>Let us examine a first MCMC search (1000 MCMC steps).</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" data-line-number="1"></a>
<a class="sourceLine" id="cb1-2" data-line-number="2"><span class="co"># loading libraries</span></a>
<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(bnlearn)</a>
<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="kw"><a href="https://rdrr.io/r/utils/data.html">data</a></span>(asia, <span class="dt">package=</span><span class="st">'bnlearn'</span>)</a>
<a class="sourceLine" id="cb1-4" data-line-number="4"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(mcmcabn)</a>
<a class="sourceLine" id="cb1-5" data-line-number="5"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(abn)</a>
<a class="sourceLine" id="cb1-6" data-line-number="6"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(ggplot2)</a>
......
......@@ -104,7 +104,7 @@
<h1>mcmcabn: A Structural MCMC Sampler for DAGs Learned from Observed Systemic Datasets</h1>
<h4 class="author">Gilles Kratzer, Reinhard Furrer</h4>
<h4 class="date">2019-11-06</h4>
<h4 class="date">2020-03-01</h4>
<div class="hidden name"><code>mcmcabn.Rmd</code></div>
......@@ -143,7 +143,7 @@
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb2-1" data-line-number="1"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(mcmcabn)</a></code></pre></div>
<p>Let us start with an example from the <code>bnlearn</code> R package from Scutari (2010). It is about a small synthetic dataset from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer, or bronchitis) and visits to Asia (8 nodes and 8 arcs).</p>
<p>One needs to pre-compute a cache of scores. We use the R package <code>abn</code> to do it. But first, let us define the list of distributions for the nodes and plot the network.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(bnlearn) <span class="co">#for the dataset</span></a>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="kw"><a href="https://rdrr.io/r/utils/data.html">data</a></span>(asia, <span class="dt">package=</span><span class="st">'bnlearn'</span>) <span class="co">#for the dataset</span></a>
<a class="sourceLine" id="cb3-2" data-line-number="2"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(abn) <span class="co">#to pre-compute the scores </span></a>
<a class="sourceLine" id="cb3-3" data-line-number="3"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(ggplot2) <span class="co">#plotting</span></a>
<a class="sourceLine" id="cb3-4" data-line-number="4"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(ggpubr) <span class="co">#plotting</span></a>
......@@ -290,16 +290,11 @@
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb13-1" data-line-number="1"><span class="kw"><a href="../reference/query.html">query</a></span>(<span class="dt">mcmcabn =</span> mcmc.out.asia ,<span class="dt">formula =</span> <span class="op">~</span>LungCancer<span class="op">|</span>Smoking <span class="op">+</span><span class="st"> </span>Bronchitis<span class="op">|</span>Smoking <span class="op">-</span><span class="st"> </span>Tuberculosis<span class="op">|</span>Smoking <span class="op">-</span><span class="st"> </span>XRay<span class="op">|</span>Bronchitis)</a>
<a class="sourceLine" id="cb13-2" data-line-number="2"><span class="co">#&gt; [1] 0.002</span></a></code></pre></div>
<p>So essentially zero!</p>
<div id="formula-statement-tutorial" class="section level3">
<div id="formula-statement-for-dag-specifications" class="section level3">
<h3 class="hasAnchor">
<a href="#formula-statement-tutorial" class="anchor"></a>Formula statement: tutorial</h3>
<dl>
<dt>The <strong>formula</strong> statement is designed to ease querying the MCMC samples without explicitly writing an adjacency matrix (which becomes painful as the number of variables increases). The <code>formula</code> argument can be provided using a formula like:</dt>
<dd>node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with <code>~</code>. In this example, node1 has two parents (parent1 and parent2). node2 and node3 have the same parent3. The parents’ names have to match those given in <code>data.dist</code> exactly. <code>:</code> is the separator between either children or parents, <code><a href="https://rdrr.io/r/base/Logic.html">|</a></code> separates children (left side), and parents (right side), <code><a href="https://rdrr.io/r/base/Arithmetic.html">+</a></code> separates terms, <code>.</code> replaces all the variables in name. Then, the arrows go from the left to the right side of the formula. Additionally, when one wants to exclude an arc put <code><a href="https://rdrr.io/r/base/Arithmetic.html">-</a></code> in front of that statement. Then a formula like:
</dd>
<dd>-node1|parent1 excludes all DAGs that have an arc between parent1 and node1. Alternatively, one can query using an adjacency matrix. This matrix should contain only the values 0, 1 and -1. The 1 indicates the requested arcs, the -1 the excluded arcs, and the 0 all other entries that are not subject to constraints. The rows indicate the set of parents of the index nodes. The order of rows and columns should be the same as the one used in the <code><a href="../reference/mcmc.html">mcmcabn()</a></code> function in the <code>data.dist</code> argument. The matrix should not be named, but it should be square.
</dd>
</dl>
<a href="#formula-statement-for-dag-specifications" class="anchor"></a>Formula statement for DAG specifications</h3>
<p>The <strong>formula</strong> statement is designed to ease querying the MCMC samples without explicitly writing an adjacency matrix (which becomes painful as the number of variables increases). The <code>formula</code> argument can be provided using a formula like:</p>
<p><code>~ node1|parent1:parent2 + node2:node3|parent3</code>. The formula statement has to start with <code>~</code>. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent (parent3). The parents’ names have to match those given in <code>data.dist</code> exactly. <code>:</code> is the separator between either children or parents, <code><a href="https://rdrr.io/r/base/Logic.html">|</a></code> separates children (left side) from parents (right side), <code><a href="https://rdrr.io/r/base/Arithmetic.html">+</a></code> separates terms, and <code>.</code> replaces all the variables in name. Then, the arrows go from the left to the right side of the formula. Additionally, to exclude an arc, put <code><a href="https://rdrr.io/r/base/Arithmetic.html">-</a></code> in front of that statement. A formula like <code>~ -node1|parent1</code> excludes all DAGs that have an arc between parent1 and node1. Alternatively, one can query using an adjacency matrix. This matrix should contain only the values 0, 1 and -1. The 1 indicates the requested arcs, the -1 the excluded arcs, and the 0 all other entries that are not subject to constraints. The rows indicate the set of parents of the index nodes. The order of rows and columns should be the same as the one used in the <code><a href="../reference/mcmc.html">mcmcabn()</a></code> function in the <code>data.dist</code> argument. The matrix should not be named, but it should be square.</p>
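<p>As a minimal sketch of the adjacency-matrix alternative, the constraints of the formula query shown earlier (<code>~LungCancer|Smoking + Bronchitis|Smoking - Tuberculosis|Smoking - XRay|Bronchitis</code>) could be encoded as below. The five-node set and its ordering are assumptions made for illustration only; in practice the order must follow the <code>data.dist</code> argument used in <code>mcmcabn()</code>. Row and column names are used only while filling the matrix and are stripped at the end, since the matrix should not be named.</p>

```r
# Hypothetical node order, for illustration only: the real order must match
# the one used in the data.dist argument of mcmcabn().
nodes <- c("Smoking", "Tuberculosis", "LungCancer", "Bronchitis", "XRay")

# Start from a matrix of zeros: no constraint on any arc.
m <- matrix(0, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))

# Rows are the index (child) nodes, columns are their parents.
m["LungCancer",   "Smoking"]    <-  1  # requested arc
m["Bronchitis",   "Smoking"]    <-  1  # requested arc
m["Tuberculosis", "Smoking"]    <- -1  # excluded arc
m["XRay",         "Bronchitis"] <- -1  # excluded arc

# The query matrix should be square and unnamed, so drop the dimnames.
constraints <- unname(m)
```

<p>Filling the matrix by name and removing the names afterwards keeps each constraint readable while still producing the unnamed square matrix described above.</p>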
</div>
<div id="technical-foundations" class="section level2">
<h2 class="hasAnchor">
......@@ -365,7 +360,7 @@
<p>Such a matrix is still not a prior as we need to introduce a proper normalization procedure. To do so, let us define the energy of a structure as</p>
<p><span class="math display">\[E(G) = \sum_{i,j=1}^N |B_{i,j} - G_{i,j}|\]</span></p>
<p>The energy <span class="math inline">\(E\)</span> is zero for a perfect match between the prior knowledge <span class="math inline">\(B\)</span> and the actual network <span class="math inline">\(G\)</span>, while increasing values of <span class="math inline">\(E\)</span> indicate an increasing divergence between <span class="math inline">\(B\)</span> and <span class="math inline">\(G\)</span>. Following Imoto et al. (2003), we define the prior belief on graph <span class="math inline">\(G\)</span> by</p>
<p><span class="math display">\[p(G|\beta) = \frac{\exp({-\lambda E(G)})}{z^*}\]</span> where <span class="math inline">\(z^*\)</span> is a normalizing constant, and <span class="math inline">\(\lambda\)</span> is a hyperparameter defined by <code>prior.lambda</code>. In an MCMC setting, the normalizing constant cancels out in the Hastings ratio.</p>
<p><span class="math display">\[p(G) = \frac{\exp({-\lambda E(G)})}{z^*}\]</span> where <span class="math inline">\(z^*\)</span> is a normalizing constant, and <span class="math inline">\(\lambda\)</span> is a hyperparameter defined by <code>prior.lambda</code>. In an MCMC setting, the normalizing constant cancels out in the Hastings ratio.</p>
<p>In statistical physics, in a Gibbs distribution, the hyperparameter <span class="math inline">\(\lambda\)</span> is the inverse of temperature. It can be interpreted as a proxy for the strength of the influence of the prior over the data, and thus for the strength of the user's belief in the given prior. For <span class="math inline">\(\lambda \rightarrow 0\)</span>, the prior distribution becomes flatter and hence uninformative about the structure. Conversely, for <span class="math inline">\(\lambda \rightarrow \infty\)</span>, the prior distribution becomes sharply peaked at the network with the lowest energy.</p>
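<p>The energy and the resulting prior can be illustrated with a small base R sketch. The helper names (<code>energy</code>, <code>unnormalised_prior</code>) and the three-node belief matrix are hypothetical and are not part of the <code>mcmcabn</code> API; the point is only to show that the ratio of two priors, as it would appear in a Hastings ratio, does not require the normalizing constant <span class="math inline">\(z^*\)</span>.</p>

```r
# Energy of a structure G with respect to the prior belief matrix B:
# E(G) = sum over i, j of |B[i, j] - G[i, j]|
energy <- function(B, G) sum(abs(B - G))

# Unnormalised prior exp(-lambda * E(G)); z* is deliberately left out.
unnormalised_prior <- function(B, G, lambda) exp(-lambda * energy(B, G))

# Hypothetical belief matrix for 3 nodes: 0.5 = no opinion,
# 1 = arc believed present, 0 = arc believed absent.
B <- matrix(0.5, nrow = 3, ncol = 3)
diag(B) <- 0   # no self-loops in a DAG
B[2, 1] <- 1   # arc believed present
B[3, 1] <- 0   # arc believed absent

# Two candidate structures, each with a single arc.
G1 <- matrix(0, nrow = 3, ncol = 3); G1[2, 1] <- 1  # agrees with B
G2 <- matrix(0, nrow = 3, ncol = 3); G2[3, 1] <- 1  # contradicts B

energy(B, G1)  # 2
energy(B, G2)  # 4

# Ratio of priors: z* cancels, leaving exp(-lambda * (E(G1) - E(G2))).
lambda <- 1
prior_ratio <- unnormalised_prior(B, G1, lambda) /
  unnormalised_prior(B, G2, lambda)
```

<p>With <code>lambda = 1</code> the structure that agrees with the belief matrix is favoured by a factor of <code>exp(2)</code>; as <code>lambda</code> approaches 0 the ratio approaches 1, which corresponds to the flat, uninformative prior described above.</p>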
</div>
</div>
......@@ -374,10 +369,11 @@
<h1 class="hasAnchor">
<a href="#references" class="anchor"></a>References</h1>
<ul>
<li>Kratzer, Gilles, et al. (2020). “Bayesian Networks modeling applied to Feline Calicivirus infection among cats in Switzerland.” Frontiers in Veterinary Science 7: 73.</li>
<li>Madigan, D. and York, J. (1995) “Bayesian graphical models for discrete data”. International Statistical Review, 63:215–232.</li>
<li>Giudici, P. and Castelo, R. (2003). “Improving Markov chain Monte Carlo model search for data mining”. Machine Learning, 50:127–158.</li>
<li>Kratzer, G. and Furrer, R. (2016) “Is a single unique Bayesian network enough to accurately represent your data?”. arXiv preprint arXiv:1902.06641.</li>
<li>Friedman, N. and Koller, D. (2003). “Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks”. Machine Learning, 50:95–125.</li>
<li>Giudici, P. and Castelo, R. (2003). “Improving Markov Chain Monte Carlo model search for data mining”. Machine Learning, 50:127–158.</li>
<li>Kratzer, G. and Furrer, R. (2019) “Is a single unique Bayesian network enough to accurately represent your data?”. arXiv preprint arXiv:1902.06641.</li>
<li>Friedman, N. and Koller, D. (2003). “Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks”. Machine Learning, 50:95–125.</li>
<li>Grzegorczyk, M. and Husmeier, D. “Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move”, Machine Learning, vol. 71(2-3), pp. 265, 2008.</li>
<li>Su, C. and Borsuk, M. E. “Improving structure MCMC for Bayesian networks through Markov blanket resampling”, The Journal of Machine Learning Research, vol. 17(1), pp. 4042-4061, 2016.</li>
<li>Koivisto, M. V. (2004). Exact Structure Discovery in Bayesian Networks, Journal of Machine Learning Research, vol 5, 549-573.</li>
......
......@@ -141,6 +141,16 @@
</div>
<p>Kratzer G, Lewis FI, Willi B, Meli ML, Boretti FS, Hofmann-Lehmann R, Torgerson P, Furrer R, Hartnack S (2020).
&ldquo;Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland.&rdquo;
<em>Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073</em>.
</p>
<pre>@Article{,
title = {Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland},
author = {Gilles Kratzer and Fraser I. Lewis and Barbara Willi and Marina L. Meli and Felicitas S. Boretti and Regina Hofmann-Lehmann and Paul Torgerson and Reinhard Furrer and Sonja Hartnack},
year = {2020},
journal = {Front. Vet. Sci. 7:73. doi: 10.3389/fvets.2020.00073},
}</pre>
<p>Kratzer G, Furrer R (2019).
&ldquo;Is a single unique Bayesian network enough to accurately represent your data?&rdquo;
<em>arXiv preprint arXiv:1902.06641</em>.
......
......@@ -136,6 +136,7 @@
<li><p>08/03/2019 - mcmcabn is available on CRAN (v 0.1)</p></li>
<li><p>18/02/2019 - new pre-print <a href="https://arxiv.org/pdf/1902.06641.pdf">Is a single unique Bayesian network enough to accurately represent your data?</a> on arXiv</p></li>
<li><p>01/07/2019 - mcmcabn 0.2 available on CRAN</p></li>
<li><p>02/03/2020 - mcmcabn 0.3 available on CRAN. New peer reviewed article <a href="https://www.frontiersin.org/articles/10.3389/fvets.2020.00073/full">Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland</a> in Front. Vet. Sci.</p></li>
</ul>
<hr>
<p><strong><code>mcmcabn</code> is developed and maintained by <a href="https://gilleskratzer.netlify.com/">Gilles Kratzer</a> and <a href="https://user.math.uzh.ch/furrer/">Prof. Dr. Reinhard Furrer</a> from the <a href="https://www.math.uzh.ch/as/index.php?id=as">Applied Statistics Group</a> at the University of Zurich.</strong></p>
......
......@@ -160,7 +160,8 @@
<a href="#mcmcabn-0-3" class="anchor"></a>mcmcabn 0.3:</h2>
<ul>
<li>update mcmcabn to make it compatible with constraints imported from the cache of scores. Heating parameter to increase or decrease acceptance probability.</li>
<li>new article</li>
<li>new function CoupledHeatedmcmcabn() implementing parallel tempering</li>
<li>new published article (preferred reference): Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland in Front. Vet. Sci.</li>
</ul>
</div>
</div>
......
......@@ -159,13 +159,8 @@
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><div class='input'><span class='co'>## This data set was generated using the following code:</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='st'>"bnlearn"</span>) <span class='co'># for the dataset</span></div><div class='output co'>#&gt; <span class='message'></span>
#&gt; <span class='message'>Attaching package: ‘bnlearn’</span></div><div class='output co'>#&gt; <span class='message'>The following object is masked from ‘package:stats’:</span>
#&gt; <span class='message'></span>
#&gt; <span class='message'> sigma</span></div><div class='input'><span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='st'>"abn"</span>) <span class='co'># for the cache of score function</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: nnet</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: MASS</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: lme4</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: Matrix</span></div><div class='output co'>#&gt; <span class='message'></span>
#&gt; <span class='message'>Attaching package: ‘abn’</span></div><div class='output co'>#&gt; <span class='message'>The following object is masked from ‘package:bnlearn’:</span>
#&gt; <span class='message'></span>
#&gt; <span class='message'> mb</span></div><div class='input'>
<span class='fu'><a href='https://rdrr.io/r/utils/data.html'>data</a></span>(<span class='no'>asia</span>, <span class='kw'>package</span><span class='kw'>=</span><span class='st'>'bnlearn'</span>) <span class='co'># for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='st'>"abn"</span>) <span class='co'># for the cache of score function</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: nnet</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: MASS</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: lme4</span></div><div class='output co'>#&gt; <span class='message'>Loading required package: Matrix</span></div><div class='input'>
<span class='co'># Renaming columns of the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span>(<span class='no'>asia</span>) <span class='kw'>&lt;-</span> <span class='fu'><a href='https://rdrr.io/r/base/c.html'>c</a></span>(<span class='st'>"Asia"</span>,
<span class='st'>"Smoking"</span>,
......
......@@ -153,13 +153,13 @@
<h2 class="hasAnchor" id="format"><a class="anchor" href="#format"></a>Format</h2>
<p>The data contains a cache of pre-computed scores with a maximum of two parents per node.</p><ul>
<p>The dataset contains a named list of distributions used to analyse the asia dataset.</p><ul>
<li><p><code>dist.asia</code>: a named list giving the distribution for each node in the network.</p></li>
</ul>
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><div class='input'><span class='co'>## This data set was generated using the following code:</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='st'>"bnlearn"</span>) <span class='co'># for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/utils/data.html'>data</a></span>(<span class='no'>asia</span>, <span class='kw'>package</span><span class='kw'>=</span><span class='st'>'bnlearn'</span>) <span class='co'># for the dataset</span>
<span class='co'># Renaming columns of the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span>(<span class='no'>asia</span>) <span class='kw'>&lt;-</span> <span class='fu'><a href='https://rdrr.io/r/base/c.html'>c</a></span>(<span class='st'>"Asia"</span>,
......
......@@ -157,6 +157,12 @@
</tr>
<tr>
<td>
<p><code><a href="CoupledHeatedmcmcabn.html">CoupledHeatedmcmcabn()</a></code> </p>
</td>
<td><p>Coupled Heated Structural MCMC sampler for DAGs</p></td>
</tr><tr>
<td>
<p><code><a href="bsc-compute-asia.html">bsc.compute.asia</a></code> </p>
</td>
......@@ -187,6 +193,12 @@
<td><p>List of files to reproduce the examples of the <code>mcmcabn</code> library.</p></td>
</tr><tr>
<td>
<p><code><a href="mcmcabn-package.html">. mcmcabn .</a></code> </p>
</td>
<td><p>mcmcabn Package</p></td>
</tr><tr>
<td>
<p><code><a href="plot.html">plot(<i>&lt;mcmcabn&gt;</i>)</a></code> </p>
</td>
......
......@@ -160,7 +160,7 @@
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><div class='input'><span class='kw'>if</span> (<span class='fl'>FALSE</span>) {
<span class='co'>## This data set was generated using the following code:</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='no'>bnlearn</span>) <span class='co'>#for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/utils/data.html'>data</a></span>(<span class='no'>asia</span>, <span class='kw'>package</span><span class='kw'>=</span><span class='st'>'bnlearn'</span>) <span class='co'>#for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='no'>abn</span>) <span class='co'>#for the cache of scores computing function</span>
<span class='no'>mcmc.out.asia</span> <span class='kw'>&lt;-</span> <span class='fu'><a href='mcmc.html'>mcmcabn</a></span>(<span class='kw'>score.cache</span> <span class='kw'>=</span> <span class='no'>bsc.compute.asia</span>,
......
......@@ -163,7 +163,7 @@
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><div class='input'><span class='kw'>if</span> (<span class='fl'>FALSE</span>) {
<span class='co'>## This data set was generated using the following code:</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='no'>bnlearn</span>) <span class='co'>#for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/utils/data.html'>data</a></span>(<span class='no'>asia</span>, <span class='kw'>package</span><span class='kw'>=</span><span class='st'>'bnlearn'</span>) <span class='co'>#for the dataset</span>
<span class='fu'><a href='https://rdrr.io/r/base/library.html'>library</a></span>(<span class='no'>abn</span>) <span class='co'>#for the cache of score function</span>
<span class='co'>#renaming columns of the dataset</span>
......