Bayesian Networks modelling is becoming more and more popular in systems epidemiology

\cite{b1,b2}. It is highly suitable for epidemiological datasets that are messy and highly

correlated or when it is not clear which variable would be the outcome. It is also well adapted in

context where no prior model exists or when experts want a data driven approach to select optimal

models. However in highly interdisciplinary research field, some concerns have been raised against

BN modelling. A fully data driven approach could select a model that is statistically supported by

the data, but that does not have plausible epidemiological interpretation. To overcome this

limitation there exists a popular workaround: one can either ban or retain some parts of the

structure to account for those biological constrains. This modelling approach thus ends up in a semi

supervised approach. Once this first guiding step is performed, all other arcs are either present or

correlated or when it is not clear which variable would be the outcome. It is also well-adapted to

contexts where no prior model exists or when experts want a data-driven approach for selecting optimal

models. However, in highly interdisciplinary research fields, some concerns have been raised against

BN modelling. A fully data-driven approach could select a model that is statistically supported by

the data but that does not have plausible epidemiological interpretation. To overcome this

limitation, there exists a popular workaround: one can either ban or retain some parts of the

structure to account for those biological constrains. This modelling approach thus ends up in a semi-supervised approach. Once this first guiding step is performed, all other arcs are either present or

absent in the model.

From a practical and modelling perspective, a major concern of BN modelling is the tendency to

overfit the data and to select overly complicated models that will generalise poorly. To compensate

parametric or nonparametric approaches prune the selected structure and usually a unique and

From practice and modelling perspectives, a major concern of BN modelling is the tendency to

overfit the data and to select overly complicated models that generalise poorly. To compensate,

parametric or non-parametric approaches prune the selected structure. Usually, a unique and

theory-compatible structure is reported. Albeit being popular and accepted, this process makes very

crude choices regarding the possible connections in the model and diminish the range of possible

interpretation. Indeed, an arc is either present of absent. This is exceptionally rudimentary taken

into account the massive number of a priori networks.

crude choices regarding the possible connections in the model and diminishes the range of possible

interpretation. Indeed, an arc is either present or absent. This is exceptionally rudimentary considering the massive number of a priori networks.

%In contrast to other fields, the acceptance rate \rf{which?} of BN modelling, represented by directed acyclic

%graphs (DAGs), is relatively low compared to more traditional statistical approach within the

...

...

@@ -79,80 +76,81 @@ into account the massive number of a priori networks.

%plausibility of covariate in a model on a continuous scale. The p-values are certainly not the best

%answers but they are well accepted from an interpretation point of view. %\gk{note RF}

Classically in statistics any relationship between variables is given with an estimate of their data

support. A possible counterpart in BN modelling, would be the so called Monte Carlo Markov chain

model choice ((MC)$^3$) which sample over all possible structures and move from structures to

structure according to their support by the data. Indeed, the posterior probability of any

structural feature could be obtained from a Markov chain sample computed from graph structure by

Classically in statistics, any relationship between variables is given with an estimate of the relationship based on data. A possible counterpart in BN modelling could be the Monte Carlo Markov chain

model choice ((MC)$^3$), which samples overall possible structures and moves from structure to

structure according to its support by the data. Indeed, the posterior probability of any

structural feature could be obtained from a Markov chain sample computed from graph structures by

\begin{equation}

E[f|D] \approx\frac{1}{S}\sum_{s=1}^{S}f(G^s), \text{ where } G^s \sim p(G|D)\label{eq}

\end{equation}

where $f$ is any structural query, $D$ is the data, $S$ is the set of visited structures $G$ and

where $f$ is any structural query, $D$ is the data, $S$ is the set of visited structures $G$, and

$p(G|D)$ is the posterior distribution of the structures.

\section{Outline of the Approach}

Sampling BN using MCMC algorithms is a complicated task. The classical structural approach is to

make single-edge operations (addition, deletion and reversal of an edge). It is known that naive

structural approaches will fail to efficiently sample BN landscape in large problems. Indeed,

sampling algorithms mix slowly and for large problem hardly converge. A popular solution is to

sample from the space of node ordering using order MCMC sampling scheme \cite{b3}. But it is not

Sampling BN using MCMC algorithms is a complicated task. The classical structural approach makes

single-edge operations (addition, deletion and reversal of an edge). It is known that naive

structural approaches will fail to efficiently sample BN landscapes in large problems. Indeed,

sampling algorithms mix slowly and hardly converge for large problem. A popular solution is to

sample from the space of node ordering using order MCMC sampling schemes\cite{b3}. But it is not

possible to explicitly express priors on graph structures in this context. This is of particular

importance in systems epidemiology where the data are usually scarce and, as a result, the posterior

depends heavily on the prior choice. Again, from a data analysis point of view this would weaken the

analysis and render it not fully adapted. Two structurally transparent workarounds have been

proposed: the new edge reversal move \cite{b4} and the Markov blanket resampling \cite{b5}. The

former advocates to make a reversal move in resampling the parent set. Indeed, the classical

reversal move depends on the global configuration of the parents and children, and then fails to

propose MCMC jumps that produce valid but very different DAGs in a unique move. In the latter the

same idea is applied but to the entire Markov blanket of a randomly chosen node. Single edge and both structural methods were implemented in \textsc{mcmcabn}.

depends heavily on the prior choice. Again, from a data analysis point of view, this would weaken

the analysis and render it not fully adapted.

Two structurally transparent workarounds have been

proposed: the new edge reversal move \cite{b4} and Markov blanket resampling \cite{b5}. The former

advocates to make a reversal move in resampling the parent set. Indeed, the classical reversal move

depends on the global configuration of the parents and children, but then fails to propose MCMC

jumps that produce valid but very different DAGs in a unique move. The latter workaround applies the

same idea but to the entire Markov blanket of a randomly chosen node. Single edge and both

structural methods were implemented in \textsc{mcmcabn}.

\section{Simulation Study}

To illustrate the usability of the approach for a typical systems epidemiology dataset, we simulate

a sparse BN of five nodes and five arcs (Fig.~\ref{fig:dag}). Then 250, 500, 1000, 10'000 binomial

observations have been simulated from the DAG. A cache of score for each possible set of parent has

been computed for each simulated dataset. The synthetic data has been generated using the R package

\textsc{abn}\cite{b6}. Afterwards, 10$^5$ sampling steps have been performed for each

datasets. For computational reasons, the starting point of the MCMC chain is the true model and no

burn-in phase nor thinning have been considered. All computations have been performed using R.

a sparse BN of five nodes and five arcs (Fig.~\ref{fig:dag}). 250, 500, 1000, 10,000 binomial

observations were simulated from the DAG. A cache of scores for each possible set of parents was computed for each simulated dataset. The synthetic data was generated using the R package

\textsc{abn}\cite{b6}. Afterwards, 10$^5$ sampling steps were performed for each

dataset. For computational reasons, the starting point of the MCMC chain is the true model and no

burn-in phase nor thinning has been considered. All computations were performed using R.

\caption{Five node DAG (left) and summaries of associated structural queries (\%) (right). Green indicates if

the link is expected and red if not.}\label{fig:dag}

\caption{Five-node DAG (left) and summaries of associated structural queries (\%) (right). Green indicates where the link is expected and red where it is not.}\label{fig:dag}

\end{figure}

...

...

@@ -193,7 +190,7 @@ the link is expected and red if not.} \label{fig:dag}

\section*{Acknowledgment}

\section*{Acknowledgments}

We would like to thank Fraser I. Lewis for the fruitful discussions about model averaging in BN modelling context that lead to this paper.

Bayesian Networks modelling is becoming more and more popular in systems epidemiology

\cite{b1,b2}. It is highly suitable for epidemiological datasets that are messy and highly

correlated or when it is not clear which variable would be the outcome. It is also well-adapted to

contexts where no prior model exists or when experts want a data-driven approach for selecting optimal

models. However, in highly interdisciplinary research fields, some concerns have been raised against

BN modelling. A fully data-driven approach could select a model that is statistically supported by

the data but that does not have plausible epidemiological interpretation. To overcome this

limitation, there exists a popular workaround: one can either ban or retain some parts of the

structure to account for those biological constrains. This modelling approach thus ends up in a semi-supervised approach. Once this first guiding step is performed, all other arcs are either present or

absent in the model.

From a practical and modelling perspective, a major concern of BN modelling is the tendency to

overfit the data and to select overly complicated models that generalise poorly. To compensate,

parametric or non-parametric approaches prune the selected structure. Usually, a unique and

theory-compatible structure is reported. Albeit being popular and accepted, this process makes very

crude choices regarding the possible connections in the model and diminishes the range of possible

interpretation. Indeed, an arc is either present or absent. This is exceptionally rudimentary considering the massive a priori possibilities of structures.

%In contrast to other fields, the acceptance rate \rf{which?} of BN modelling, represented by directed acyclic

%graphs (DAGs), is relatively low compared to more traditional statistical approach within the

%epidemiological community. Even i

%In contexts where expert models are hard to identify or where more

%holistic approach would be preferable. Traditional regression based on expert model are preferred. A

%possible retraining factor is the high popularity of the p-values in epidemiology that act as

%proxies for significance and leads to rich and smooth interpretation of regression models. However

%being not recommended by many statistical associations, those tools are tentative to asses the

%plausibility of covariate in a model on a continuous scale. The p-values are certainly not the best

%answers but they are well accepted from an interpretation point of view. %\gk{note RF}

Classically in statistics, any relationship between variables is given with an estimate of their data

support. A possible counterpart in BN modelling could be the Monte Carlo Markov chain

model choice (MC$^3$), which samples overall possible structures and moves from structures to

structure according to their support by the data. Indeed, the posterior probability of any

structural feature could be obtained from a Markov chain sample computed from graph structure by

\begin{equation}

E[f|D] \approx\frac{1}{S}\sum_{s=1}^{S}f(G^s), \text{ where } G^s \sim p(G|D)\label{eq}

\end{equation}

where $f$ is any structural query, $D$ is the data, $S$ is the set of visited structures $G$, and

$p(G|D)$ is the posterior distribution of the structures.

\section{Outline of the Approach}

Sampling BN using MCMC algorithms is a complicated task. The classical structural approach

makes single-edge operations (addition, deletion and reversal of an edge). It is known that naive

structural approaches will fail to efficiently sample BN landscapes in large problems. Indeed,

sampling algorithms mix slowly and for large problem hardly converge. A popular solution is to

sample from the space of node ordering using order MCMC sampling scheme \cite{b3}. But it is not

possible to explicitly express priors on graph structures in this context. This is of particular

importance in systems epidemiology where the data are usually scarce and, as a result, the posterior

depends heavily on the prior choice. Again, from a data analysis point of view, this would weaken the

analysis and render it not fully adapted. Two structurally transparent workarounds have been

proposed: the new edge reversal move \cite{b4} and Markov blanket resampling \cite{b5}. The

former advocates to make a reversal move in resampling the parent set. However, the classical

reversal move depends on the global configuration of the parents and children, and then fails to

propose MCMC jumps that produce valid but very different DAGs in a unique move. The latter workaround applies the same idea but to the entire Markov blanket of a randomly chosen node.

\section{Simulation Study}

To illustrate the usability of the approach for a typical systems epidemiology dataset, we simulate

a sparse BN of five nodes and five arcs (Fig.~\ref{fig:dag}). Then 250, 500, 1000, 10,000 binomial

observations were simulated from the DAG. A cache of scores for each possible set of parents was computed for each simulated dataset. The synthetic data was generated using the R package

\textsc{abn}\cite{b6}. Afterwards, 10$^5$ sampling steps were performed for each

dataset. For computational reasons, the starting point of the MCMC chain is the true model and no

burn-in phase nor thinning has been considered. All computations were performed using R.

\caption{Five-node DAG (left) and summaries of associated structural queries (\%) (right). Green indicates where the link is expected and red where it is not.}\label{fig:dag}

\end{figure}

% \begin{table}[]

% %\raisebox{2.2cm}[0pt][0pt]{

% %\parbox[c]{11.5cm}{

% % Please add the following required packages to your document preamble:

% % \usepackage[table,xcdraw]{xcolor}

% % If you use beamer only pass "xcolor=table" option, i.e. \documentclass[xcolor=table]{beamer}

We would like to thank Fraser I. Lewis for the fruitful discussions about model averaging in BN modelling context that lead to this paper.

\begin{thebibliography}{00}

%\gk{first reference will be replaced by the second paper as soon as it will be accepted. I will say during the review round that "the first was replace by a more recent reference". Is it ok for you?} \rf{sure}

\bibitem{b1} S. Ruchti, A. R. Meier, H. W\"{u}rbel, G. Kratzer, S. G. Gebhardt-Henrich, S. Hartnack, ``Pododermatitis in group housed rabbit does in Switzerland--Prevalence, severity and risk factors'', Preventive Veterinary Medicine, vol. 158, pp. 114--121, October 2018.

\bibitem{b2} A. Comin, A. Jeremiasson, G. Kratzer, L. Keeling, ``Revealing the structure of the associations between housing system, facilities, management and welfare of commercial laying hens using Additive Bayesian Networks'', Preventive Veterinary Medicine, vol. 164, pp. 23-32, January 2019.

\bibitem{b3} N. Friedman, D. Koller, ``Being Bayesian about network structure'', Machine Learning, vol. 50, pp. 95--126, 2003.

\bibitem{b4} M. Grzegorczyk, D. Husmeier, ``Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move'', Machine Learning, vol. 71(2-3), pp. 265, 2008.

\bibitem{b5} C. Su, M. E. Borsuk, ``Improving structure MCMC for Bayesian networks through Markov blanket resampling'', The Journal of Machine Learning Research, vol. 17(1), pp. 4042-4061, 2016.

\bibitem{b6} G. Kratzer, M. Pittavino, F. I. Lewis and R. Furrer, ``abn: an R package for modelling multivariate data using additive Bayesian networks''. R package version 1.3. https://CRAN.R-project.org/package=abn (2018)