Gilles Kratzer committed Feb 25, 2019

Function to query MCMC samples generated by mcmcabn — query • mcmcabn

The function allows users to perform structural queries over MCMC samples produced by mcmcabn.

query(mcmcabn = NULL, formula = NULL)

Arguments

mcmcabn

object of class mcmcabn.

formula

formula statement or adjacency matrix used to query the MCMC samples; see Details. If this argument is NULL, the average arc-wise frequencies are reported.


Details


The query can be formulated using an adjacency matrix or a formula-wise expression.

The adjacency matrix should be square, with dimension equal to the number of nodes in the network. Its entries should be 1, 0 or -1: 1 marks a requested arc, -1 an excluded arc, and 0 an entry that is not subject to the query. Each row gives the set of parents of the node indexing that row. The order of rows and columns should be the same as the one used in the data.dist argument of the mcmcabn() function.
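As an illustration of the matrix form, the sketch below builds a query matrix for the asia example in base R. The node order is copied from the arc-support matrix printed in the Examples and is only assumed to match the data.dist order of the original mcmcabn() call; the final call to query() is left as a comment because it needs the package and its bundled data.

```r
# Node order taken from the arc-support matrix printed in the Examples;
# assumed (not verified here) to match the data.dist order given to mcmcabn().
nodes <- c("Asia", "Smoking", "Tuberculosis", "LungCancer",
           "Bronchitis", "Either", "XRay", "Dyspnea")

# Start from all zeros: no arc is subject to the query.
q <- matrix(0, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))

# Rows are children, columns are parents.
q["LungCancer", "Smoking"]   <- 1   # request the arc Smoking -> LungCancer
q["Tuberculosis", "Smoking"] <- -1  # exclude the arc Smoking -> Tuberculosis

print(q)

# With mcmcabn installed and data("mcmc_run_asia") loaded, the matrix
# would be passed through the formula argument:
# query(mcmcabn = mcmc.out.asia, formula = q)
```

Naming the rows and columns is only a convenience for filling in entries by node name; the documentation states that it is the row and column order that has to match.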


The formula statement is designed to ease querying the MCMC sample. It lets the user make complex queries without explicitly writing an adjacency matrix (which can be painful when the number of variables is large). The formula argument is typically given as a formula such as: ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent, parent3. The parent names have to exactly match those given in name. : is the separator between either children or parents, | separates children (left side) and parents (right side), + separates terms, and . replaces all the variables in name. Additionally, to exclude an arc, simply put - in front of that statement. A formula like ~ -node1|parent1 then excludes all DAGs that have an arc from parent1 to node1.
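To make the separators concrete, here is a small base-R sketch that expands the two terms of the example formula into explicit (child, parent) pairs. The helper expand_term is hypothetical and written only for this illustration; it is not part of mcmcabn and does not claim to reproduce how the package parses formulas.

```r
# Hypothetical helper (not part of mcmcabn): expands one term such as
# "node2:node3|parent3" into one row per (child, parent) arc.
expand_term <- function(term) {
  sides    <- strsplit(term, "|", fixed = TRUE)[[1]]       # children | parents
  children <- trimws(strsplit(sides[1], ":", fixed = TRUE)[[1]])
  parents  <- trimws(strsplit(sides[2], ":", fixed = TRUE)[[1]])
  expand.grid(child = children, parent = parents, stringsAsFactors = FALSE)
}

# The two terms of ~ node1|parent1:parent2 + node2:node3|parent3
terms <- c("node1|parent1:parent2", "node2:node3|parent3")
arcs  <- do.call(rbind, lapply(terms, expand_term))
print(arcs)
```

Each resulting row corresponds to one requested arc, that is, one entry that would be set to 1 in the equivalent adjacency matrix.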

If the formula argument is not provided the function returns the average support of all individual arcs using a named matrix.
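The returned matrix can be post-processed with ordinary matrix tools. The sketch below lists the arcs whose support exceeds a cut-off, using a 3 x 3 sub-matrix copied from the output shown in the Examples (rows are children, columns are parents). The 0.3 threshold and the variable names are arbitrary choices made for this illustration.

```r
# Sub-matrix copied from the arc-support output in the Examples
# (rows = child, columns = parent).
nodes <- c("Smoking", "LungCancer", "Bronchitis")
support <- matrix(
  c(0.00000000, 0.28671329, 0.31468531,
    0.40059940, 0.00000000, 0.06093906,
    0.51148851, 0.07492507, 0.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes)
)

threshold <- 0.3  # arbitrary cut-off chosen for illustration

# Row/column positions of all entries above the threshold.
idx <- which(support > threshold, arr.ind = TRUE)

strong <- data.frame(
  parent  = colnames(support)[unname(idx[, "col"])],
  child   = rownames(support)[unname(idx[, "row"])],
  support = support[idx]
)

# Sort from the best-supported arc downwards.
strong <- strong[order(-strong$support), ]
print(strong)
```

On the full matrix returned by query(mcmcabn = mcmc.out.asia) the same steps would apply unchanged, with support replaced by that result.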

Value

A probability or, if formula is NULL, a named matrix giving the average support of each individual arc.

References

Kratzer, G., Furrer, R. (2019). "Is a single unique Bayesian network enough to accurately represent your data?". arXiv preprint arXiv:1902.06641.

Lauritzen, S., Spiegelhalter, D. (1988). "Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion)". Journal of the Royal Statistical Society: Series B, 50(2), 157–224.

Scutari, M. (2010). "Learning Bayesian Networks with the bnlearn R Package". Journal of Statistical Software, 35(3), 1–22. doi:http://dx.doi.org/10.18637/jss.v035.i03.

Examples

## Example from the asia dataset from Lauritzen and Spiegelhalter (1988) provided by Scutari (2010)
data("mcmc_run_asia")

## return a named matrix with individual arc support
query(mcmcabn = mcmc.out.asia)
#>                    Asia    Smoking Tuberculosis LungCancer Bronchitis
#> Asia         0.00000000 0.04795205   0.10989011 0.06193806 0.05794206
#> Smoking      0.01498501 0.00000000   0.05994006 0.28671329 0.31468531
#> Tuberculosis 0.02797203 0.17782218   0.00000000 0.32967033 0.01398601
#> LungCancer   0.01798202 0.40059940   0.16583417 0.00000000 0.06093906
#> Bronchitis   0.01298701 0.51148851   0.01598402 0.07492507 0.00000000
#> Either       0.01198801 0.27172827   0.31268731 0.35664336 0.07792208
#> XRay         0.01498501 0.15784216   0.30469530 0.27972028 0.05294705
#> Dyspnea      0.01198801 0.21878122   0.09290709 0.24375624 0.53546454
#>                  Either       XRay    Dyspnea
#> Asia         0.09290709 0.05094905 0.05994006
#> Smoking      0.16283716 0.10789211 0.22477522
#> Tuberculosis 0.43356643 0.20179820 0.11388611
#> LungCancer   0.32067932 0.21178821 0.20279720
#> Bronchitis   0.12287712 0.08491508 0.35364635
#> Either       0.00000000 0.28271728 0.22577423
#> XRay         0.44455544 0.00000000 0.10789211
#> Dyspnea      0.35464535 0.15084915 0.00000000

## what is the probability of the LungCancer node being a child of the Smoking node?
query(mcmcabn = mcmc.out.asia, formula = ~LungCancer|Smoking)
#> [1] 0.4005994

## what is the probability of the Smoking node being a parent of
## both the LungCancer and Bronchitis nodes?
query(mcmcabn = mcmc.out.asia, formula = ~ LungCancer|Smoking+Bronchitis|Smoking)
#> [1] 0.2037962

## what is the probability of the previous statement,
## when there is no arc from Smoking to Tuberculosis and from Bronchitis to XRay?
query(mcmcabn = mcmc.out.asia, formula = ~LungCancer|Smoking + Bronchitis|Smoking - Tuberculosis|Smoking - XRay|Bronchitis)
#> [1] 0.002997003