## Conditional Independence

A Bayesian network can represent a joint distribution with fewer independent parameters than would normally be required because of the conditional independence assumptions encoded in its graphical structure. ${ }^{17}$ Conditional independence generalizes the notion of independence introduced in section 2.3.1. Variables $X$ and $Y$ are conditionally independent given $Z$ if and only if $P(X, Y \mid Z)=P(X \mid Z) P(Y \mid Z)$. The assertion that $X$ and $Y$ are conditionally independent given $Z$ is written $(X \perp Y \mid Z)$. It is possible to show from this definition that $(X \perp Y \mid Z)$ if and only if $P(X \mid Z)=P(X \mid Y, Z)$. Given $Z$, information about $Y$ provides no additional information about $X$, and vice versa. Example 2.6 illustrates this.
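The definition $P(X, Y \mid Z)=P(X \mid Z) P(Y \mid Z)$ can be checked numerically on a small table. The sketch below is illustrative only: the binary variables, the probability values, and the helper names `marginal` and `conditionally_independent` are assumptions made for this example, not part of the text.

```python
from itertools import product

# Hypothetical conditional tables for binary X, Y, Z (values are made up).
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [z][x]
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}  # indexed [z][y]

# Joint built so that X and Y are conditionally independent given Z.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}

# Joint in which Y always equals X, so knowing Y still informs X given Z.
joint_dep = {
    (x, y, z): 0.5 * 0.5 * (1.0 if x == y else 0.0)
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(table, keep):
    """Sum a joint table over every variable whose index is not in `keep`."""
    out = {}
    for assignment, p in table.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def conditionally_independent(table, tol=1e-9):
    """Check P(x, y | z) == P(x | z) P(y | z) for every assignment."""
    xz = marginal(table, (0, 2))
    yz = marginal(table, (1, 2))
    z_only = marginal(table, (2,))
    for (x, y, z), p_xyz in table.items():
        pz = z_only[(z,)]
        if pz == 0:
            continue  # conditioning on a zero-probability event is undefined
        lhs = p_xyz / pz
        rhs = (xz[(x, z)] / pz) * (yz[(y, z)] / pz)
        if abs(lhs - rhs) > tol:
            return False
    return True

print(conditionally_independent(joint))
print(conditionally_independent(joint_dep))
```

The first table satisfies the definition by construction, while the second violates it because $Y$ is a copy of $X$ regardless of $Z$.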

We can use a set of rules to determine whether the structure of a Bayesian network implies that two variables must be conditionally independent given a set of other evidence variables. ${ }^{18}$ Suppose we want to check whether $(A \perp B \mid \mathcal{C})$ is implied by the network structure, where $\mathcal{C}$ is a set of evidence variables. We have to check all possible undirected paths from $A$ to $B$ for what is called d-separation. A path between $A$ and $B$ is d-separated by $\mathcal{C}$ if any of the following is true:

1. The path contains a chain of nodes, $X \rightarrow Y \rightarrow Z$, such that $Y$ is in $\mathcal{C}$.
2. The path contains a fork, $X \leftarrow Y \rightarrow Z$, such that $Y$ is in $\mathcal{C}$.
3. The path contains an inverted fork (also called a v-structure), $X \rightarrow Y \leftarrow Z$, such that $Y$ is not in $\mathcal{C}$ and no descendant of $Y$ is in $\mathcal{C}$. Example 2.7 provides some intuition for this rule.

We say that $A$ and $B$ are d-separated by $\mathcal{C}$ if all paths between $A$ and $B$ are d-separated by $\mathcal{C}$. This d-separation implies that $(A \perp B \mid \mathcal{C})$. ${ }^{19}$ Example 2.8 demonstrates this process for checking whether a graph implies a particular conditional independence assumption.
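The three rules can be turned into a brute-force check that enumerates every undirected simple path and tests each consecutive triple of nodes. This is a minimal sketch for small graphs, not an efficient algorithm. The function name `d_separated`, the edge-list representation, and the five-node example graph are assumptions made for illustration.

```python
def d_separated(edges, a, b, evidence):
    """Return True if every undirected path from a to b is blocked by evidence.

    edges: iterable of (parent, child) pairs describing a DAG that contains
    both a and b. Enumerates all simple paths, so it only suits small graphs.
    """
    edges = set(edges)
    evidence = set(evidence)
    nodes = {n for edge in edges for n in edge}
    children = {n: {c for p, c in edges if p == n} for n in nodes}
    parents = {n: {p for p, c in edges if c == n} for n in nodes}

    def descendants(n):
        seen, stack = set(), list(children[n])
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(children[m])
        return seen

    def blocked(path):
        for x, y, z in zip(path, path[1:], path[2:]):
            if (x, y) in edges and (z, y) in edges:
                # Rule 3: inverted fork x -> y <- z blocks unless y or one
                # of its descendants is observed.
                if y not in evidence and not (descendants(y) & evidence):
                    return True
            elif y in evidence:
                # Rules 1 and 2: a chain or fork blocks when y is observed.
                return True
        return False

    def simple_paths(current, visited):
        if current == b:
            yield visited
            return
        for n in children[current] | parents[current]:
            if n not in visited:
                yield from simple_paths(n, visited + [n])

    return all(blocked(path) for path in simple_paths(a, [a]))


# Hypothetical network: B -> E <- S, E -> D, E -> C.
example_edges = [("B", "E"), ("S", "E"), ("E", "D"), ("E", "C")]
print(d_separated(example_edges, "B", "S", []))
print(d_separated(example_edges, "B", "S", ["D"]))
print(d_separated(example_edges, "D", "C", ["E"]))
```

In this example graph, `B` and `S` are d-separated with no evidence because the only path runs through an unobserved inverted fork at `E`. Observing the descendant `D` unblocks that path, and observing `E` blocks the fork between `D` and `C`.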

Sometimes the term Markov blanket ${ }^{20}$ of a node $X$ is used to refer to the minimal set of nodes that, if their values were known, would make $X$ conditionally independent of all other nodes. The Markov blanket of a particular node turns out to consist of its parents, its children, and the other parents of its children.
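The parents, children, and other parents of children can be read directly off an edge list. The following is a small sketch. The function name `markov_blanket` and the example graph are hypothetical and chosen only to illustrate the description above.

```python
def markov_blanket(edges, x):
    """Return the parents, children, and children's other parents of x.

    edges: iterable of (parent, child) pairs describing a DAG.
    """
    edges = set(edges)
    parents = {p for p, c in edges if c == x}
    children = {c for p, c in edges if p == x}
    # Other parents of x's children, excluding x itself.
    co_parents = {p for p, c in edges if c in children and p != x}
    return parents | children | co_parents


# Hypothetical network: A -> X -> C <- B, C -> D.
blanket_edges = [("A", "X"), ("X", "C"), ("B", "C"), ("C", "D")]
print(sorted(markov_blanket(blanket_edges, "X")))
```

Here the blanket of `X` contains its parent `A`, its child `C`, and `B`, the other parent of `C`. The node `D` is left out because it is only a grandchild.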

## Summary

Representing uncertainty as a probability distribution is motivated by a set of axioms related to the comparison of the plausibility of different statements.

There are many different families of both discrete and continuous probability distributions.

Continuous probability distributions can be represented by density functions.

Probability distribution families can be combined in mixtures to produce more flexible distributions.

Joint distributions are distributions over multiple variables.

Conditional distributions are distributions over one or more variables given values of evidence variables.

A Bayesian network is defined by a graphical structure and a set of conditional distributions.

Depending on the structure of the Bayesian network, we can represent joint distributions with fewer parameters due to conditional independence assumptions.

## Exercises

Exercise 2.1. Consider a continuous random variable $X$ that follows the exponential distribution parameterized by $\lambda$ with density $p(x \mid \lambda)=\lambda \exp (-\lambda x)$ with nonnegative support. Compute the cumulative distribution function of $X$.

Solution: We start with the definition of the cumulative distribution function. Since the support of the distribution is lower-bounded by $x=0$, there is no probability mass in the interval $(-\infty, 0)$, allowing us to adjust the lower bound of the integral to 0. After computing the integral, we obtain $\operatorname{cdf}_{X}(x)$:

$$
\begin{aligned}
&\operatorname{cdf}_{X}(x)=\int_{-\infty}^{x} p\left(x^{\prime}\right) \mathrm{d} x^{\prime} \\
&\operatorname{cdf}_{X}(x)=\int_{0}^{x} \lambda e^{-\lambda x^{\prime}} \mathrm{d} x^{\prime} \\
&\operatorname{cdf}_{X}(x)=-\left.e^{-\lambda x^{\prime}}\right|_{0}^{x} \\
&\operatorname{cdf}_{X}(x)=1-e^{-\lambda x}
\end{aligned}
$$

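
The closed form $1-e^{-\lambda x}$ from this solution can be sanity-checked by integrating the density numerically. The sketch below uses a midpoint rule. The function names, the step count, and the test values of $x$ and $\lambda$ are arbitrary choices made for illustration.

```python
import math

def exponential_cdf(x, lam):
    """Closed form from the derivation: 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def numeric_cdf(x, lam, steps=100_000):
    """Midpoint-rule approximation of the integral of lam * exp(-lam * t) on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(lam * math.exp(-lam * (i + 0.5) * h) for i in range(steps)) * h

x, lam = 2.0, 1.5  # arbitrary test point and rate
print(exponential_cdf(x, lam), numeric_cdf(x, lam))
```

The two printed values should agree to many decimal places, which is consistent with the derivation but does not replace it.
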
Exercise 2.2. For the density function in figure 2.6, what are the five components of the mixture? (There are multiple valid solutions.)

Solution: One solution is $\mathcal{U}([-10,-10],[-5,10]), \mathcal{U}([-5,0],[0,10]), \mathcal{U}([-5,-10],[0,0])$, $\mathcal{U}([0,-10],[10,5])$, and $\mathcal{U}([0,5],[10,10])$.

Exercise 2.3. Given the following table representation of $P(X, Y, Z)$, generate an equivalent compact decision tree representation.

