## Conditional Independence Tests

Conditional independence tests focus on the presence of individual arcs. Since each arc encodes a probabilistic dependence, conditional independence tests can be used to assess whether that probabilistic dependence is supported by the data. If the null hypothesis (of conditional independence) is rejected, the arc can be considered for inclusion in the DAG. For instance, consider adding an arc from Education to Travel $(\mathrm{E} \rightarrow \mathrm{T})$ to the DAG shown in Figure 1.1. The null hypothesis is that Travel is probabilistically independent $\left(\perp\!\!\!\perp_{P}\right)$ from Education conditional on its parents, Occupation and Residence, i.e.,
$$H_{0}: \mathrm{T} \perp\!\!\!\perp_{P} \mathrm{E} \mid \{\mathrm{O}, \mathrm{R}\},$$
and the alternative hypothesis is that
$$H_{1}: \mathrm{T} \not\perp\!\!\!\perp_{P} \mathrm{E} \mid \{\mathrm{O}, \mathrm{R}\}.$$
We can test this null hypothesis by adapting either the log-likelihood ratio $\mathrm{G}^{2}$ or Pearson’s $\mathrm{X}^{2}$ to test for conditional independence instead of marginal independence. For $\mathrm{G}^{2}$, the test statistic assumes the form
$$\mathrm{G}^{2}(\mathrm{T}, \mathrm{E} \mid \mathrm{O}, \mathrm{R})=\sum_{t \in \mathrm{T}} \sum_{e \in \mathrm{E}} \sum_{k \in \mathrm{O} \times \mathrm{R}} n_{t e k} \log \frac{n_{t e k} n_{++k}}{n_{t+k} n_{+e k}},$$
where we denote the categories of Travel with $t \in \mathrm{T}$, the categories of Education with $e \in \mathrm{E}$, and the configurations of Occupation and Residence with $k \in \mathrm{O} \times \mathrm{R}$. Hence, $n_{t e k}$ is the number of observations for the combination of a category $t$ of Travel, a category $e$ of Education and a configuration $k$ of $\mathrm{O} \times \mathrm{R}$. A “+” subscript denotes the sum over the index it replaces, as in the classic book by Agresti (2013), and indicates the marginal counts for the remaining variables. So, for example, $n_{t+k}$ is the number of observations for $t$ and $k$ obtained by summing over all the categories of Education. For Pearson’s $\mathrm{X}^{2}$, using the same notation we have that
$$\mathrm{X}^{2}(\mathrm{T}, \mathrm{E} \mid \mathrm{O}, \mathrm{R})=\sum_{t \in \mathrm{T}} \sum_{e \in \mathrm{E}} \sum_{k \in \mathrm{O} \times \mathrm{R}} \frac{\left(n_{t e k}-m_{t e k}\right)^{2}}{m_{t e k}}, \quad \text{where} \quad m_{t e k}=\frac{n_{t+k} n_{+e k}}{n_{++k}}.$$
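To make the notation concrete, the following is a minimal Python sketch of both statistics, computed from a table of counts $n_{tek}$. It is not the code used in the text: the function name `ci_statistics`, the dictionary layout and the toy counts are all assumptions made for illustration, and the counts are not the survey data. The $\mathrm{G}^{2}$ sum is implemented exactly as displayed above; note that the likelihood-ratio statistic is conventionally defined as twice this sum when it is compared against a $\chi^{2}$ distribution. The degrees of freedom returned are the usual $(|\mathrm{T}|-1)(|\mathrm{E}|-1)$ for each configuration of the conditioning variables.

```python
import math
from collections import defaultdict


def ci_statistics(counts):
    """Return (g2, x2, df) for testing T independent of E given K.

    `counts` maps (t, e, k) -> n_tek, where k labels one configuration
    of the conditioning variables (hypothetical layout, for illustration).
    """
    n_tk = defaultdict(int)  # n_{t+k}: summed over the categories of E
    n_ek = defaultdict(int)  # n_{+ek}: summed over the categories of T
    n_k = defaultdict(int)   # n_{++k}: summed over both T and E
    t_levels, e_levels = set(), set()
    for (t, e, k), n in counts.items():
        n_tk[t, k] += n
        n_ek[e, k] += n
        n_k[k] += n
        t_levels.add(t)
        e_levels.add(e)

    g2 = x2 = 0.0
    for k in n_k:
        for t in t_levels:
            for e in e_levels:
                n = counts.get((t, e, k), 0)
                m = n_tk[t, k] * n_ek[e, k] / n_k[k]  # expected count m_tek
                if m == 0:
                    continue  # empty margin: the cell contributes nothing
                x2 += (n - m) ** 2 / m
                if n > 0:  # n * log(n) tends to 0 as n tends to 0
                    g2 += n * math.log(n / m)
    df = (len(t_levels) - 1) * (len(e_levels) - 1) * len(n_k)
    return g2, x2, df


# Toy counts (made up): T and E are independent within configuration "a"
# and clearly dependent within configuration "b".
toy = {
    ("car", "high", "a"): 10, ("car", "uni", "a"): 20,
    ("train", "high", "a"): 20, ("train", "uni", "a"): 40,
    ("car", "high", "b"): 30, ("car", "uni", "b"): 10,
    ("train", "high", "b"): 10, ("train", "uni", "b"): 30,
}
g2, x2, df = ci_statistics(toy)
print(f"G2 sum = {g2:.4f}, X2 = {x2:.4f}, df = {df}")
```

Only configuration "b" contributes to either statistic here, because the observed counts in configuration "a" coincide with the expected counts $m_{tek}$.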

## Using the DAG Structure

Using the DAG we saved in `dag`, we can investigate whether a variable is associated with another, essentially asking a conditional independence query. Both direct and indirect associations between two variables can be read from the DAG by checking whether they are connected in some way. If the variables depend directly on each other, there will be a single arc connecting the nodes corresponding to those two variables. If the dependence is indirect, there will be two or more arcs passing through the nodes that mediate the association. In general, two sets $\mathbf{X}$ and $\mathbf{Y}$ of variables are independent given a third set $\mathbf{Z}$ of variables if there is no set of arcs connecting them that is not blocked by the conditioning variables. Conditioning on $\mathbf{Z}$ is equivalent to fixing the values of its elements, so that they are known quantities. In other words, $\mathbf{X}$ and $\mathbf{Y}$ are separated by $\mathbf{Z}$, which we denote with $\mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$. Given that BNs are based on DAGs, we speak of d-separation (directed separation): a formal treatment of its definition and properties is provided in Section 6.1. For the moment, we will just say that graphical separation $\left(\perp\!\!\!\perp_{G}\right)$ implies probabilistic independence $\left(\perp\!\!\!\perp_{P}\right)$ in a BN: if all the paths between $\mathbf{X}$ and $\mathbf{Y}$ are blocked, $\mathbf{X}$ and $\mathbf{Y}$ are (conditionally) independent. The converse is not necessarily true: not every conditional independence relationship is reflected in the graph.
We can investigate whether two nodes in a `bn` object are d-separated using the `dsep` function. `dsep` takes three arguments, `x`, `y` and `z`, corresponding to $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$; the first two must be the names of the two nodes being tested for d-separation, while the last is an optional d-separating set. So, for example, we can see from `dag` that both S and O are associated with R.
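The d-separation queries described above can also be reproduced outside R. The sketch below is a standard-library Python illustration, not the `dsep` function itself or its implementation: it uses the moralised ancestral graph criterion, and the helper names `ancestors` and `d_separated` are hypothetical. It also assumes that the DAG in Figure 1.1 has the arcs A → E, S → E, E → O, E → R, O → T and R → T; only the parents of T are stated explicitly in this section, so the remaining arcs are an assumption.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, x, y, z=()):
    """Check whether nodes x and y are d-separated by the node set z."""
    z = set(z)
    keep = ancestors(parents, {x, y} | z)
    # Moralise the ancestral subgraph: link each node to its parents,
    # link every pair of parents sharing a child, and drop directions.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for i, p in enumerate(pa):
            adj[child].add(p)
            adj[p].add(child)
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # x and y are d-separated if no path that avoids z connects them.
    queue, seen = deque([x]), {x}
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in z:
                seen.add(w)
                queue.append(w)
    return True


# Assumed arc set for the survey DAG, stored as node -> list of parents.
survey = {"A": [], "S": [], "E": ["A", "S"],
          "O": ["E"], "R": ["E"], "T": ["O", "R"]}

print(d_separated(survey, "S", "R"))         # False: S -> E -> R is open
print(d_separated(survey, "O", "R"))         # False: O <- E -> R is open
print(d_separated(survey, "S", "R", {"E"}))  # True: E blocks the path
```

Under the assumed arcs, the first two queries agree with the statement that both S and O are associated with R, and conditioning on E removes the association between S and R.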

