Statistics ghostwriting | Bayesian statistics ghostwriting and proxy exam-taking | STA421

• Statistical Inference
• Statistical Computing
• Advanced Probability Theory
• Advanced Mathematical Statistics
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## Bayes' theorem for parametric inference

Consider a general problem in which we have data $x$ and require inference about a parameter $\theta$. In a Bayesian analysis $\theta$ is unknown and viewed as a random variable. Thus, it possesses a density function $f(\theta)$. From Bayes’ theorem$^2$, (1.6), we have
$$\begin{aligned} f(\theta \mid x) & =\frac{f(x \mid \theta) f(\theta)}{f(x)} \\ & \propto f(x \mid \theta) f(\theta) . \end{aligned} \tag{1.7}$$
Colloquially (1.7) is
Posterior $\propto$ Likelihood $\times$ Prior.

Most commonly, both the parameter $\theta$ and data $x$ are continuous. There are cases when $\theta$ is continuous and $x$ is discrete ${ }^3$. In exceptional cases $\theta$ could be discrete.
The Bayesian method comprises the following principal steps:

1. Prior
Obtain the prior density $f(\theta)$ which expresses our knowledge about $\theta$ prior to observing the data.
2. Likelihood
Obtain the likelihood function $f(x \mid \theta)$. This step simply describes the process giving rise to the data $x$ in terms of $\theta$.
3. Posterior
Apply Bayes’ theorem to derive the posterior density $f(\theta \mid x)$, which expresses all that is known about $\theta$ after observing the data.
4. Inference
Derive appropriate inference statements from the posterior distribution e.g. point estimates, interval estimates, probabilities of specified hypotheses.
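
The four steps above can be sketched numerically. The following is a minimal illustration, not part of the original text: it assumes a uniform prior, a binomial likelihood with hypothetical data ($n=10$, $x=7$), and a simple grid approximation in place of the exact integral $f(x)$. All function and variable names are illustrative.

```python
import math

def grid_posterior(prior, likelihood, grid):
    """Posterior density on an evenly spaced grid: likelihood x prior, renormalised."""
    step = grid[1] - grid[0]
    unnormalised = [likelihood(t) * prior(t) for t in grid]
    evidence = sum(unnormalised) * step  # Riemann-sum stand-in for f(x)
    return [u / evidence for u in unnormalised]

# Hypothetical data: x successes in n trials (assumed for illustration only).
n, x = 10, 7
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints of 1000 cells on (0, 1)
step = grid[1] - grid[0]

def prior(theta):
    """Step 1: a uniform prior density on (0, 1)."""
    return 1.0

def likelihood(theta):
    """Step 2: binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

# Step 3: posterior via Bayes' theorem.
posterior = grid_posterior(prior, likelihood, grid)

# Step 4: inference statements derived from the posterior.
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * step
prob_theta_above_half = sum(p for t, p in zip(grid, posterior) if t > 0.5) * step
```

A grid is used here only because it makes each step explicit; for this particular prior and likelihood the posterior is available in closed form.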

## Conjugate Bayesian updates

Example 4 Beta-Binomial. Suppose that $X \mid \theta \sim \operatorname{Bin}(n, \theta)$. We specify a prior distribution for $\theta$ and consider $\theta \sim \operatorname{Beta}(\alpha, \beta)$ for $\alpha, \beta>0$ known ${ }^4$. Thus, for $0 \leq \theta \leq 1$ we have
$$f(\theta)=\frac{1}{B(\alpha, \beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}$$
where $B(\alpha, \beta)=\frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function and $E(\theta)=\frac{\alpha}{\alpha+\beta}$. Recall that as
$$\int_0^1 f(\theta) d \theta=1$$
then
$$B(\alpha, \beta)=\int_0^1 \theta^{\alpha-1}(1-\theta)^{\beta-1} d \theta . \tag{1.12}$$
Using Bayes’ theorem, (1.7), the posterior is
$$\begin{aligned} f(\theta \mid x) \propto f(x \mid \theta) f(\theta) & =\binom{n}{x} \theta^x(1-\theta)^{n-x} \times \frac{1}{B(\alpha, \beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1} \\ & \propto \theta^x(1-\theta)^{n-x} \theta^{\alpha-1}(1-\theta)^{\beta-1} \\ & =\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} . \end{aligned}$$
So, $f(\theta \mid x)=c \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}$ for some constant $c$ not involving $\theta$. Now
$$\int_0^1 f(\theta \mid x) d \theta=1 \Rightarrow c^{-1}=\int_0^1 \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} d \theta .$$
Notice that from (1.12) we can evaluate this integral so that
$$c^{-1}=\int_0^1 \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} d \theta=B(\alpha+x, \beta+n-x)$$
whence
$$f(\theta \mid x)=\frac{1}{B(\alpha+x, \beta+n-x)} \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1},$$
i.e. $\theta \mid x \sim \operatorname{Beta}(\alpha+x, \beta+n-x)$.
Notice the tractability of this update: the prior and posterior distribution are both from the same family of distributions, in this case the Beta family. This is an example of conjugacy. The update is simple to perform: the number of successes observed, $x$, is added to $\alpha$ whilst the number of failures observed, $n-x$, is added to $\beta$.
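
The update rule can be checked numerically. The sketch below is an illustration only: the prior parameters and data are hypothetical, and the helper names are not from the source. It applies the conjugate update and compares the normalising constant $c^{-1}$, computed by a midpoint rule, with $B(\alpha+x, \beta+n-x)$.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_binomial_update(alpha, beta, n, x):
    """Conjugate update: add successes to alpha and failures to beta."""
    return alpha + x, beta + (n - x)

# Hypothetical prior and data (assumptions for illustration).
alpha0, beta0 = 2.0, 2.0
n, x = 10, 7

alpha1, beta1 = beta_binomial_update(alpha0, beta0, n, x)
posterior_mean = alpha1 / (alpha1 + beta1)

# Midpoint-rule approximation of c^{-1}, the integral of the unnormalised posterior.
cells = 20000
c_inverse = sum(
    ((i + 0.5) / cells) ** (alpha0 + x - 1)
    * (1 - (i + 0.5) / cells) ** (beta0 + n - x - 1)
    for i in range(cells)
) / cells
```

With these inputs the posterior is $\operatorname{Beta}(9, 5)$, and `c_inverse` should agree with `beta_fn(9, 5)` up to the accuracy of the numerical integration.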


## The Bayesian method

Consider a problem where we wish to make inferences about a parameter $\theta$ given data $x$. In a classical setting the data are treated as if they were random, even after they have been observed, and the parameter is viewed as a fixed unknown constant. Consequently, no probability distribution can be attached to the parameter. Conversely, in a Bayesian approach, parameters, having not been observed, are treated as random and thus possess a probability distribution, whilst the data, having been observed, are treated as fixed.

Example 1 Suppose that we perform $n$ independent Bernoulli trials in which we observe $x$, the number of times an event occurs. We are interested in making inferences about $\theta$, the probability of the event occurring in a single trial. Let’s consider the classical approach to this problem.
Prior to observing the data, the probability of observing $x$ was
$$P(X=x \mid \theta)=\binom{n}{x} \theta^x(1-\theta)^{n-x}$$

This is a function of the (future) $x$, assuming that $\theta$ is known. If we know $x$ but don’t know $\theta$ we could treat (1) as a function of $\theta, L(\theta)$, the likelihood function. We then choose the value which maximises this likelihood. The maximum likelihood estimate is $\frac{x}{n}$ with corresponding estimator $\frac{X}{n}$.

In the general case, the classical approach uses an estimate $T(x)$ for $\theta$. Justifications for the estimate depend upon the properties of the corresponding estimator $T(X)$ (bias, consistency, …) using its sampling distribution (given $\theta$ ). That is, we treat the data as being random even though it is known! Such an approach can lead to nonsensical answers.

Example 2 Suppose in the Bernoulli trials of Example 1 we wish to estimate $\theta^2$. The maximum likelihood estimator ${ }^1$ is $\left(\frac{X}{n}\right)^2$. However this is a biased estimator as
$$\begin{aligned} E\left(X^2 \mid \theta\right) & =\operatorname{Var}(X \mid \theta)+E^2(X \mid \theta) \\ & =n \theta(1-\theta)+n^2 \theta^2 \\ & =n \theta+n(n-1) \theta^2 . \end{aligned}$$
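
Dividing the last line by $n^2$ gives $E\left((X/n)^2 \mid \theta\right)=\theta^2+\theta(1-\theta)/n$, which exceeds $\theta^2$ whenever $0<\theta<1$. The short check below is an illustrative sketch with hypothetical values of $n$ and $\theta$; it computes the expectation exactly by summing over the binomial distribution and compares it with this closed form.

```python
import math

def binom_pmf(k, n, theta):
    """P(X = k | theta) for X ~ Bin(n, theta)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def expected_squared_proportion(n, theta):
    """Exact E[(X/n)^2 | theta], summing over all possible outcomes."""
    return sum((k / n) ** 2 * binom_pmf(k, n, theta) for k in range(n + 1))

# Hypothetical values chosen for illustration.
n, theta = 10, 0.3
exact = expected_squared_proportion(n, theta)
closed_form = theta**2 + theta * (1 - theta) / n
bias = exact - theta**2  # equals theta * (1 - theta) / n
```

The bias shrinks at rate $1/n$, so the estimator is biased for every finite $n$ but the bias vanishes as $n$ grows.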

## Bayes’ theorem

Let $X$ and $Y$ be random variables with joint density function $f(x, y)$. The marginal distribution of $Y$, $f(y)$, is the joint density function integrated over all possible values of $X$,
$$f(y)=\int_X f(x, y) d x . \tag{1.1}$$
For example, if $Y$ is univariate and $X=\left(X_1, X_2\right)$ where $X_1$ and $X_2$ are univariate then
$$f(y)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f\left(x_1, x_2, y\right) d x_1 d x_2 .$$
The conditional distribution of $Y$ given $X=x$ is
$$f(y \mid x)=\frac{f(x, y)}{f(x)} \tag{1.2}$$
so that by substituting (1.2) into (1.1) we have
$$f(y)=\int_X f(y \mid x) f(x) d x ,$$
which is often known as the theorem of total probability. $X$ and $Y$ are independent if and only if
$$f(x, y)=f(x) f(y) . \tag{1.3}$$
Substituting (1.2) into (1.3) we see that an equivalent result is that
$$f(y \mid x)=f(y)$$

so that independence reflects the notion that learning the outcome of $X$ gives us no information about the distribution of $Y$ (and vice versa). If $Z$ is a third random variable then $X$ and $Y$ are conditionally independent given $Z$ if and only if
$$f(x, y \mid z)=f(x \mid z) f(y \mid z) .$$
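
These identities are easy to verify on a small discrete example, where the integrals become sums. The sketch below is illustrative only: the two joint tables are hypothetical and the function names are not from the source.

```python
def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint table {(x, y): p}."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def conditional_y_given_x(joint, x):
    """f(y | x) = f(x, y) / f(x)."""
    fx = marginal(joint, 0)[x]
    return {y: p / fx for (xx, y), p in joint.items() if xx == x}

def total_probability_y(joint):
    """f(y) recovered as the sum over x of f(y | x) f(x)."""
    fx = marginal(joint, 0)
    out = {}
    for x, px in fx.items():
        for y, p in conditional_y_given_x(joint, x).items():
            out[y] = out.get(y, 0.0) + p * px
    return out

def is_independent(joint, tol=1e-12):
    """Check f(x, y) = f(x) f(y) for every cell of the table."""
    fx, fy = marginal(joint, 0), marginal(joint, 1)
    return all(abs(p - fx[x] * fy[y]) <= tol for (x, y), p in joint.items())

# Two hypothetical joint distributions over binary X and Y.
independent_joint = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.15, (1, 1): 0.45}
dependent_joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}
```

For `independent_joint`, conditioning on either value of $X$ returns the marginal of $Y$ unchanged; for `dependent_joint` it does not.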


## GENERAL LAWS

There are those who think philosophers have already spilled more ink on the paradoxes of confirmation than they are worth, others who think them among the deepest conceptual knots in the foundations of knowledge. Like the problem of free will, the Goodman paradox owes much of its fascination to the way in which it combines urgent and topical philosophical concerns, above all, interest in the inductive roots of language and the linguistic roots of our theoretical construction of the world. My own view is that the paradoxes have at least one lesson, fraught with significance, to convey: that general laws are not necessarily confirmed by their positive cases. I. J. Good ${ }^1$ was, I believe, the first to both point this out and make a convincing case. One of his examples is rather artificial. He invites us to imagine that the live possibilities have been narrowed to just two: either the world contains a single white raven and a vast number of black ravens, or else it contains no white ravens and a modest number of black ravens. (In either case, of course, it may contain other things as well.) Since a random sample of the general population is more likely to contain a black raven in the first case than in the second, the first possibility is confirmed (i.e., made more probable). Hence, by confirming the first possibility (many black ravens and a single white raven), observation of a black raven disconfirms the hypothesis that all ravens are black.
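
Good's first example can be made concrete with a two-hypothesis application of Bayes' theorem. The numbers below are hypothetical, chosen only to illustrate the argument; they are not taken from the text. World 1 contains one white raven and many black ravens, world 2 contains no white ravens and fewer black ravens, and both are given equal prior probability.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical populations (assumptions for illustration).
other_things = 1_000_000
world_1 = {"black_ravens": 1000, "white_ravens": 1}  # not all ravens are black
world_2 = {"black_ravens": 100, "white_ravens": 0}   # all ravens are black

def prob_black_raven(world):
    """Chance that one object drawn at random from the world is a black raven."""
    size = world["black_ravens"] + world["white_ravens"] + other_things
    return Fraction(world["black_ravens"], size)

priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [prob_black_raven(world_1), prob_black_raven(world_2)]
post_world_1, post_world_2 = posterior(priors, likelihoods)
```

Under these assumptions the probability of world 2, the only one in which all ravens are black, falls from 1/2 to roughly 0.09 after a black raven is observed.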

Good’s other example is less precise but more suggestive. Crows and ravens being related species, observation of white crows (mutants, perhaps) would tend rather to lower than raise the probability that all ravens (including mutants) are black. This example provides more insight. When we fill out the description of a non-black non-raven to include its being a white crow, we bring relevant background knowledge into the foreground. The same dramatic effect on our probabilities is illustrated in a somewhat sharper form when we fill out our description of a non-reactive specimen of non-$\mathrm{U}^{238}$ (the heavy isotope of uranium) to include its being an inert specimen of $\mathrm{U}^{235}$ (the lighter isotope). Atomic theory instructs us that the chemical properties of an element are independent of isotopy, and so we should hardly expect observation of inert specimens of the lighter isotope of uranium to increase our confidence that samples of the heavier isotope are reactive. There is an even better example to illustrate the point, one which has the virtue of bringing knowable probabilities into play.

Consider the classical problem of matches. ${ }^2$ A case would be assorting $N$ hats at random among their $N$ owners; the problem is to compute the probability of a match (a man receiving his own hat). Let $H$ be the hypothesis that no man receives his own hat (no matches). Of the first two men queried, we learn that neither received his own hat (in conformity with $H$ ). This outcome, call it $X$, will confirm $H$. But let us see what happens when we pick out various subevents of $X$.
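
For small $N$ the probabilities in the matching problem can be found by brute-force enumeration. The sketch below is an illustration under stated assumptions: $N=5$ is chosen arbitrarily, and the particular subevent examined (the first two men receive each other's hats) is one possible choice of subevent of $X$, picked here for illustration.

```python
from fractions import Fraction
from itertools import permutations

def prob_no_match(n, condition):
    """P(H | condition), where H = 'no man receives his own hat',
    computed by enumerating all equally likely assignments of n hats."""
    consistent = [p for p in permutations(range(n)) if condition(p)]
    no_match = [p for p in consistent if all(p[i] != i for i in range(n))]
    return Fraction(len(no_match), len(consistent))

N = 5  # hypothetical number of hats

# Prior probability of H.
prior_h = prob_no_match(N, lambda p: True)

# X: neither of the first two men received his own hat.
given_x = prob_no_match(N, lambda p: p[0] != 0 and p[1] != 1)

# A subevent of X: the first two men received each other's hats.
given_swap = prob_no_match(N, lambda p: p[0] == 1 and p[1] == 0)
```

With $N=5$, $X$ raises the probability of $H$ from 11/30 to 22/39, whereas the subevent lowers it to 1/3, even though it too is a "positive case" of $H$.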

## RESOLUTION OF THE PARADOXES

The linchpin of the Goodman paradox is the inference from ‘a green emerald examined before time $t$ is a grue emerald’ to ‘examination of an emerald before time $t$ which proves to be green confirms the hypothesis that all emeralds are grue’. But when we fill out our description of a grue emerald to include its being green and examined prior to time $t$, we single out a subevent, and no inference to the confirmation of the grue hypothesis can be drawn. No more than we can infer confirmation of the reactivity of the heavy isotope of uranium by inert specimens of the light isotope from the fact that the latter are non-samples of the heavy isotope.

Nor, for that matter, can we even infer confirmation of the grue hypothesis by observation of grue emeralds, for, in general, as Good’s first example illustrates, we cannot conclude confirmation of ‘All $A$ are $B$’ from observation of $AB$’s. The possible worlds (i.e., the possible states of the actual world with respect to a specific population and set of properties) which are assigned high prior probability in the light of background knowledge and contain many $AB$’s may all contain an $A$ which is non-$B$. Just as in Good’s example, finding an $AB$ would, by raising the probabilities of these possible worlds, lower the probability that all $A$ are without exception $B$.

The same is true, a fortiori, of non-$A$ $B$’s. In fact, it is quite easy to think up cases where observation of a non-$A$ $B$ would disconfirm ‘All $A$ are $B$’. This would be true, for example, if the numbers of $A$’s and $B$’s were known and finite. (For ‘All $A$ are $B$’ to have non-zero prior probability would then require that the known number of $B$’s exceed the known number of $A$’s.) Each non-$A$ $B$ found would then reduce the probability that all $A$ are $B$, the probability vanishing entirely when the observed number of non-$A$ $B$’s surpassed the known excess of $B$’s over $A$’s.
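
The finite case just described can be worked through explicitly. The sketch below rests on assumptions that are not in the text: a uniform prior over $k$, the unknown number of $A$'s that are also $B$'s, and random sampling without replacement, so that the likelihood of drawing $m$ objects that are all non-$A$ $B$'s is proportional to $\binom{b-k}{m}$, where $b$ is the known number of $B$'s. The population sizes are hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_all_a_are_b(num_a, num_b, m):
    """Posterior probability of 'All A are B' (k = num_a) after m sampled
    objects have all turned out to be B's that are not A's.

    Assumes a uniform prior on k in 0..num_a and sampling without replacement;
    the number of non-A B's is num_b - k.
    """
    weights = [comb(num_b - k, m) for k in range(num_a + 1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("m exceeds the number of B's; the evidence is impossible")
    return Fraction(weights[num_a], total)

# Hypothetical known counts: 3 A's and 5 B's, so the excess of B's over A's is 2.
num_a, num_b = 3, 5
probabilities = [prob_all_a_are_b(num_a, num_b, m) for m in range(4)]
```

Here the probability of the general law falls with each non-$A$ $B$ found and reaches zero once three have been observed, one more than the excess of $B$'s over $A$'s.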

When background knowledge is admitted, e.g., in the form of a probability distribution over possible states of the considered population, very little can be inferred in general about the confirmation of a general law by its ‘positive cases’ (in any straightforward sense of this term). Given a probabilistic analysis of confirmation, then, the paradoxes are stopped dead in their tracks. We cannot infer the confirmation of the grue hypothesis by grue emeralds (much less by green emeralds), nor that of the raven hypothesis by white shoes or red herrings.


## NOISELESS INFORMATION

An experiment $X$ (or channel) is noiseless for $\theta$ if $H_\theta(X)=0$. I.e., knowledge of the true state or transmitted message removes all uncertainty regarding the outcome of $X$. Each outcome $x$ of $X$ may then be identified with the set of states $\theta$ such that $x$ occurs when $\theta$ obtains, and so $X$ is effectively a partition of $\theta$, the set of states or possible messages. Consider now sequences of experiments or repetitions of an experiment where at each step there are $n$ possible outcomes. Following Sneed (1967), I shall speak of $n$-ary questioning procedures. Given a noiseless channel, our problem may be to find the most efficient of all the $n$-ary questioning procedures ( $n$ is typically a function of the channel). It is not hard to see that the most efficient maximizes the $E S I$ or transmitted information. (This maximum is often called the channel capacity.) For noiseless channels, $T(X ; \theta)=T(\theta ; X)=H(X)-H_\theta(X)=H(X)$. The best questioning procedure therefore maximizes $H(X)$, the outcome entropy, at each step. In particular, this procedure will identify the true state or message in a minimum number of steps, provided all partitions are feasible. In general, we must distinguish between the average number of steps it takes a procedure to identify the true state and the number of steps it requires to identify the true state. The latter is found by assuming the a priori least favorable distribution of states – the uniform distribution. For equiprobable messages, the best questioning procedure partitions the set of live possibilities into equinumerous subsets. I refer to this principle as the uniform partition strategy. For the problem of locating a square on a checkerboard discussed earlier, this strategy directs us to divide the number of remaining squares in half at each step. The following example further illustrates the efficiency of the uniform partition strategy.

EXAMPLE 5 (the odd ball). Given twelve steel balls, eleven of which are of the same weight, the problem is to locate the odd ball in three weighings with a pan balance, and to determine whether the odd ball is heavier or lighter than the eleven standard weights. (Thus, we seek a 3-ary questioning procedure that requires only three steps.) I number the balls $1, \ldots, 12$, and assign each of the 24 possible states $1H, 1L, 2H, 2L, \ldots, 12H, 12L$ (‘$H$’ for ‘heavier’, ‘$L$’ for ‘lighter’) equal probability. To insure noiselessness, I permit only weighings of equal numbers of balls. (Before reading on, the reader may wish to attempt a solution of this problem by trial and error.)

Solution. The uniform partition strategy determines the best first weighing as four against four (not, as many people initially guess, six against six). Say we weigh $1,2,3,4$ against $5,6,7,8$. Then all three possible outcomes of the weighing are equiprobable and the set of 24 possibilities is uniformly partitioned into three sets of 8 elements each. E.g., if the left pan is heavier, the unexcluded possibilities are $1 H, 2 H, 3 H, 4 H, 5 L, 6 L, 7 L, 8 L$. Given this outcome, let us find a best second weighing. Since there are 8 remaining possibilities, the best second weighing will partition this set into three subsets of 3,3 and 2 elements, the best feasible approximation to a uniform partition. Weighing $1,2,9$ against $3,4,5$ achieves this most nearly uniform partition, and is therefore a best second weighing. (N.B., 9 is known to be a standard weight.) Whatever outcome this best second weighing produces, the true state can be found on a third weighing. E.g., if the pans balance on the second weighing, leaving the possibilities $6 L, 7 L, 8 L$, weigh ball 6 against ball 7. If they balance, you are left with $8 L$, etc. The reader is invited to find a best second weighing in the case where the pans balance on the first weighing. Pursuit of the uniform partition strategy will yield the solution in three weighings whatever the outcome of each weighing. I.e., this questioning procedure requires only three questions.
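
The partitions quoted in the solution can be reproduced by enumerating the 24 states. The following sketch is illustrative; the function names and the encoding of states as `(ball, "H" or "L")` pairs are assumptions made here, not notation from the text.

```python
from collections import Counter

# All 24 equiprobable states: which ball is odd, and whether it is heavy or light.
STATES = [(ball, kind) for ball in range(1, 13) for kind in "HL"]

def outcome(state, left, right):
    """Result of weighing the balls in `left` against those in `right`:
    'left' (left pan heavier), 'right' (right pan heavier) or 'balance'."""
    ball, kind = state
    if ball in left:
        return "left" if kind == "H" else "right"
    if ball in right:
        return "right" if kind == "H" else "left"
    return "balance"

def partition_sizes(states, left, right):
    """How many of the remaining states lead to each weighing outcome."""
    counts = Counter(outcome(s, left, right) for s in states)
    return {k: counts.get(k, 0) for k in ("left", "balance", "right")}

# First weighing: four against four versus six against six.
four_v_four = partition_sizes(STATES, {1, 2, 3, 4}, {5, 6, 7, 8})
six_v_six = partition_sizes(STATES, {1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12})

# Second weighing, given that the left pan was heavier on the first.
remaining = [s for s in STATES if outcome(s, {1, 2, 3, 4}, {5, 6, 7, 8}) == "left"]
second = partition_sizes(remaining, {1, 2, 9}, {3, 4, 5})
```

Four against four splits the states 8, 8, 8, whereas six against six can never balance and so wastes one of the three possible outcomes.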

## INFORMATION

Our discussion has by no means exhausted the measures of information that have been proposed. I have focused on what seem to me the most fundamental and most useful concepts, and on those which play a role in subsequent chapters. I have also wholly neglected the vast psychological literature dealing with applications of information theory to learning, perception and related problem areas. Much of this material is relevant to our concerns and highly suggestive, and so this is a serious omission. For a useful introduction to this literature, consult Attneave (1954), (1959), and Garner (1962). One would expect the ‘disinterested’ measures studied here to induce the same (or nearly the same) ranking of experiments, but I have not investigated the matter in detail (nor has anyone else, to my knowledge).

When we compare ‘interested’ with ‘disinterested’ measures, on the other hand, the matter is quite otherwise. The $EVSI$, we saw (Example 6), is not an increasing function of the $ESI$ and the two can induce opposite rankings of the same pair of experiments. Consider another ‘interested’ measure (Blackwell and Girshick, 1954) which ranks one experiment higher than another if any loss function attainable with the latter is attainable with the former. As Lindley (1956) shows, one experiment ranked higher than another by this method must also have higher $ESI$, but the converse fails. Blackwell and Girshick show, for example, that in comparing the hypothesis that two traits $F$ and $G$ are unassociated with any alternative of dependence (where the proportions with which the two traits $F, G$ occur in the general population are known), it is most informative to sample that one of the four traits $F$, $G$, non-$F$, non-$G$ which is rarest in the considered population. This result can be verified directly for the $ESI$, and it follows from Lindley’s more general result.

If utility one is assigned to the ‘acceptance’ of a true hypothesis and utility zero to the ‘acceptance’ of a false hypothesis, then expected loss reduces to the expected proportion of errors (i.e., of false accepted hypotheses). Lindley’s result, seen in this light, is somewhat reassuring. On the other hand, as Marshak (1974) observes, if $a_i$ is the action of affirming $H_i$, and we posit the ‘disinterested’ utilities $U\left(a_i, s_j\right)=\delta_{i j}$ (Kronecker’s delta, which is 1 or 0 according as $i=j$ or $i \neq j$ ), then the $C V S I$ of outcome $x$ becomes
$$\max _i P\left(H_i / x\right)-\max _i P\left(H_i\right) \tag{1.34}$$
as the reader can easily verify. However, not even this drastic constraint on the scientist’s utilities will insure that the $E V S I$ and $E S I$ induce the same ranking of two or more experiments. One has only to note that the entropy of one distribution can exceed that of a second even though the maximal element of the first also exceeds the maximal element of the second.
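
The closing remark is easy to confirm with a pair of distributions. The two distributions below are hypothetical, chosen only to exhibit the phenomenon; entropy is measured in bits.

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum p_i log2 p_i, skipping zero-probability terms."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical distributions over three hypotheses.
first = [0.50, 0.25, 0.25]
second = [0.45, 0.45, 0.10]

first_entropy, second_entropy = entropy_bits(first), entropy_bits(second)
first_max, second_max = max(first), max(second)
```

The first distribution has both the larger entropy (1.5 bits against roughly 1.37) and the larger maximal element (0.50 against 0.45), so a ranking by uncertainty and a ranking by the most probable hypothesis need not agree.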


## THE UTILITY OF INFORMATION

This section concerns risky decision making, or decision making with incomplete knowledge of the state of nature. For example, the decision might involve classifying a patient as infected or uninfected, marketing or withholding a new drug, or determining an optimal allocation of stock. When one of the options is best under every possible circumstance, the choice is clear (the so-called ‘sure-thing principle’). In general, though, the best course of action depends on which state of nature obtains. It is clear that if one has fairly sharply defined probabilities for the different states, and fairly well defined views on the desirability of performing the several actions under the considered states, then the best action is that which has highest utility at the most probable states. If numerical utilities and probabilities are assigned, we are led to a sharper, quantitative form of this principle; choose that action which maximizes expected utility. The expected utility of an action is the weighted average of its utilities under the several states, the weights being the respective probabilities of those states. The rule in question is variously referred to as the expected utility rule or the Bayes decision rule. An action which is best by the lights of the rule (i.e., an action which maximizes expected utility) is called a Bayes act.
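
The expected utility rule can be sketched in a few lines. The probabilities and utilities below are hypothetical, loosely modelled on the infected/uninfected classification mentioned above, and the function names are illustrative.

```python
def expected_utility(utilities, probabilities):
    """Weighted average of an action's utilities, weighted by state probabilities."""
    return sum(u * p for u, p in zip(utilities, probabilities))

def bayes_act(utility_table, probabilities):
    """The action that maximises expected utility."""
    return max(
        utility_table,
        key=lambda action: expected_utility(utility_table[action], probabilities),
    )

# Hypothetical states of nature: (infected, uninfected).
probabilities = [0.2, 0.8]

# Hypothetical utilities of each action under each state.
utility_table = {
    "treat": [10, -1],
    "do not treat": [-20, 0],
}

best_action = bayes_act(utility_table, probabilities)
eu_treat = expected_utility(utility_table["treat"], probabilities)
eu_no_treat = expected_utility(utility_table["do not treat"], probabilities)
```

In this example the most probable state is "uninfected", under which not treating has the higher utility, yet treating is the Bayes act because the rule weighs the large loss from leaving an infected patient untreated.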

## DISINTERESTED INFORMATION

$$H\left(p_1, \ldots, p_m\right)=-\sum_i p_i \log p_i . \tag{1.7}$$


## Do Not Forget the Importance of the Variance in the TNormal Distribution

The variance captures our uncertainty about the weighted function. Because the TNormal for ranked nodes is always in the range $[0,1]$ any variance above $0.5$ would be considered very high (you should try it out on a simple weighted mean example). You may need to experiment with the variance to get it just right.

In each of the previous examples the variance was a constant, but in many situations the variance will depend on the parents. For example, consider the BN in Figure 9.40, which is clearly based on a definitional idiom.

In this case system quality is defined in terms of the quality of two subsystems, $S1$ and $S2$. It seems reasonable to assume that all nodes are ranked and that the NPT for System quality should be a TNormal whose mean is a weighted mean of the parents. Assuming that the weights of $S1$ and $S2$ are equal, we therefore define the mean of the TNormal as wmean(S1, S2).

However, it also seems reasonable to assume that the variance depends on the difference between the two subsystem qualities. Consider, for example, these two scenarios for subsystems $S1$ and $S2$:

1. Both $S1$ and $S2$ have “medium” quality.
2. $S1$ quality is “very high,” while $S2$ quality is “very low.”

If the variance in the TNormal expression is fixed at, say, 0.1, then the System Quality in both scenarios 1 and 2 will be the same, as shown in Figure 9.41(a) and (b). Specifically, the system quality in both cases is medium but with a lot of uncertainty.

However, it seems logical to assume that there should be less uncertainty in scenario 1 (when both subsystems have the same, medium, quality) than in scenario 2 (when both subsystems have very different levels of quality). To achieve the required result we therefore have to ensure that the variance in the TNormal expression is a function of the difference in subsystem qualities. Setting the variance as abs(S1-S2)/5 produces the required result as shown in Figure 9.41(c) and (d).
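
The effect of a variable variance can be sketched outside any BN tool. The code below is a rough illustration, not the tool's actual implementation, and it makes several assumptions that are not stated in the text: the five ranked states are mapped to equal-width intervals on $[0,1]$, each parent state is represented by its interval midpoint (0.1 for "very low", 0.5 for "medium", 0.9 for "very high"), the equally weighted mean is the simple average, and a small floor is placed on the variance so that scenario 1 does not produce a variance of exactly zero.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def tnormal_state_probs(mean, variance, n_states=5):
    """Probability of each ranked state under a Normal truncated to [0, 1],
    with states taken to be equal-width intervals (an assumption)."""
    sd = math.sqrt(variance)
    mass = normal_cdf(1.0, mean, sd) - normal_cdf(0.0, mean, sd)
    edges = [i / n_states for i in range(n_states + 1)]
    return [
        (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)) / mass
        for lo, hi in zip(edges, edges[1:])
    ]

def system_quality_probs(s1, s2, variance=None, floor=1e-4):
    """TNormal with mean (s1 + s2) / 2; variance abs(s1 - s2) / 5 unless fixed.
    The floor is an assumption added here to avoid a zero variance."""
    mean = (s1 + s2) / 2
    if variance is None:
        variance = max(abs(s1 - s2) / 5, floor)
    return tnormal_state_probs(mean, variance)

# Scenario 1: both medium. Scenario 2: very high and very low.
fixed_1 = system_quality_probs(0.5, 0.5, variance=0.1)
fixed_2 = system_quality_probs(0.9, 0.1, variance=0.1)
variable_1 = system_quality_probs(0.5, 0.5)
variable_2 = system_quality_probs(0.9, 0.1)
```

With the fixed variance the two scenarios give the same distribution over states; with the variable variance, scenario 1 concentrates almost all probability on "medium" while scenario 2 spreads it widely, which is the behaviour the text asks for.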

The use of a variable variance also enables us to easily implement the measurement idiom in the case where all the nodes of the idiom are ranked. This is explained in Box 9.12. The special case of indicator nodes is shown in Box 9.13.

## Elicitation Protocols and Cognitive Biases

We are aiming to build a scientific model, so open, factual, and honest discussion of the risks, our beliefs (i.e., theories) about how they interrelate, and what the probabilities are is of the utmost importance. The elicitor (the modeler/risk analyst) and the elicitee (the subject matter expert) must be mutually respectful of each other’s professionalism, skills, and objectives. Attributes of a good elicitation protocol involve elicitors making an effort to understand subject matter sufficiently to probe and challenge discussion in order to allow experts to sharpen and refine thinking. Similarly, more accurate probabilities are elicited when people are asked for reasons for them, but the BN structure supplies some or all of this, thus making this easier than when asking for probabilities alone. Without these prerequisites the elicitation exercise will be futile.

Some practical advice on how to elicit numbers from experts is provided in O’Hagan et al. (2006). Box 9.14 provides some examples of what has been used, based primarily on Spetzler and von Holstein (1975), also known as the Stanford Elicitation Protocol.

There is plenty of advice on how not to perform elicitation from the field of cognitive psychology as pioneered by Kahneman and colleagues (1982). A summary (by no means exhaustive) of the well-known biases is listed next and we recommend that these be presented and discussed with experts as part of any pre-elicitation training:

• Ambiguity effect: Avoiding options for which missing information makes the probability seem unknown.
• Attentional bias: Neglecting relevant data when making judgments of a correlation or association.
• Availability heuristic: Estimating what is more likely by what is more available in memory, which is biased toward vivid, unusual, or emotionally charged examples.
• Base rate neglect: Failing to take account of the prior probability. This was at the heart of the common fallacious reasoning in the Harvard medical study described in Chapter 2. It is the most common reason for people to feel that the results of Bayesian inference are nonintuitive.
• Bandwagon effect: Believing things because many other people do (or believe) the same. Related to groupthink and herd behavior.
• Confirmation bias: Searching for or interpreting information in a way that confirms one’s preconceptions.
• Déformation professionnelle: Ignoring any broader point of view and seeing the situation through the lens of one’s own professional norms.
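
Base rate neglect, listed above, is the bias most directly tied to Bayes' theorem, and a short calculation shows why intuition goes wrong. The figures below are illustrative assumptions in the style of such medical-test problems (a rare condition, a test with a modest false positive rate, and perfect sensitivity); they are not quoted from this text.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical figures: prevalence 1 in 1000, 5% false positives, no false negatives.
prevalence = 0.001
p_condition = posterior_given_positive(prevalence, 1.0, 0.05)
```

Under these assumptions a positive result raises the probability of the condition to only about 2%, far below the 95% that people who ignore the prior tend to report.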

## 统计代写|贝叶斯分析代写贝叶斯分析代考|不要忘记方差在t正态分布中的重要性

1. $S 1$和$S 2$都是中等质量。
2. $S 1$质量“非常高”，而$S 2$质量“非常低”。如果TNormal表达式中的方差固定在，比如$0.1$，那么在场景1和场景2中的系统质量将是相同的，如图9.41(a)和(b)所示。具体地说，在这两种情况下，系统质量是中等的，但有很大的不确定性然而，假设场景1(当两个子系统具有相同的中等质量时)的不确定性应该比场景2(当两个子系统具有非常不同的质量水平时)的不确定性更低似乎是合乎逻辑的。因此，为了达到所需的结果，我们必须确保TNormal表达式中的方差是子系统质量差异的函数。将方差设为abs(S1-S2)/5会产生如图9.41(c)和(d)所示的结果变量方差的使用还使我们能够轻松地实现度量习惯用法，在这种情况下，习惯用法的所有节点都是排序的。这将在框9.12中解释。指示节点的特殊情况在框9.13中显示
统计代写|贝叶斯分析代写贝叶斯分析代考|启发式协议和认知偏差
我们的目标是建立一个科学的模型，所以公开、实事求是和诚实地讨论风险，我们的信念(即理论)是如何相互联系的，以及概率是什么是最重要的。激发者(建模师/风险分析师)和被激发者(主题专家)必须相互尊重对方的专业知识、技能和目标。一个好的诱导协议的属性包括诱导者努力充分理解主题，以探索和挑战讨论，以便让专家们提高和精炼思维。类似地，当人们被问及其原因时，会引出更准确的概率，但BN结构提供了部分或全部这些，因此比单独询问概率更容易。没有这些先决条件，启发练习将是徒劳的O’Hagan等人(2006)就如何从专家那里引出数字提供了一些实用的建议。Box $9.14$提供了一些已经使用的例子，主要基于Spetzler和von Holstein 1975(也称为斯坦福启发协议)。Kahneman和他的同事(1982)在认知心理学领域率先提出了很多关于如何不进行诱导的建议。下面是对众所周知的偏见的总结(并非详尽无遗)，我们建议将这些偏见作为任何预诱导培训的一部分与专家讨论:


## Hints and Tips When Working with Ranked Nodes and NPTs

We have found that the set of weighted functions (i.e., WMEAN, WMIN, WMAX, and MIXMINMAX) is sufficient to generate almost any ranked node NPT in practice where the ranked node’s parents are all ranked.

In cases where the weighted function does not exactly capture the requirements for the node's NPT, it is usually possible to get what you want by manually tweaking the NPT that is generated by a weighted function. For example, Figure 9.37 shows a part of the table that is automatically generated for the node $Y$ as specified in Figure 9.31.

You will note that the probability of $Y$ being "very high" when both parents are "very low" is very close to 0 but not equal to 0. If you really want this probability to be 0, then you can simply enter 0 manually into that cell.

## Exploit the Fact That a Ranked Node Parent Has an Underlying Numerical Scale

In many real-world models you will find that nodes that are not ranked nodes will have one or more parents that are ranked. In such situations you can exploit the underlying numerical property of the ranked node parent to define the NPT of the child node. For example, it makes sense to extend the model of Figure 9.35 by adding a Boolean node called Release product?, which is true when the product has been sufficiently well tested to be released and false otherwise. The extended model is shown in Figure 9.38.

We could as usual define the NPT for the new Boolean node manually (it has 10 entries). But it makes much more sense, and is far simpler, to exploit the fact that the node $Y$ has an underlying numerical value between 0 and 1. Since we have a 5-point scale we know that if $Y$ is above 0.5 then the quality is at least "medium." If the value is 0.7 then the quality is in the middle of the "high" range. So, suppose that previous experience suggests that testing effectiveness needs to be "high" in order for the product to be released without too many problems. Then we can simply define the NPT of the node Release product? by the expression:

if(Y > 0.7, "True", "False")

The effect of running the resulting model with some observations is shown in Figure 9.39.
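To make the effect of this expression concrete, the following is a minimal sketch of how it could translate into an NPT for the Boolean node. It assumes (this is not stated in the text) that the five ranked states occupy equal-width intervals on [0, 1] and that $Y$ is spread uniformly within the interval of its state; the function names are hypothetical and the tool's own discretization may differ.

```python
# Hypothetical sketch: turn the expression if(Y > 0.7, "True", "False")
# into an NPT for a Boolean child of a 5-point ranked node Y.
# Assumption: each ranked state covers an equal-width interval on [0, 1]
# and Y is uniformly distributed within the interval of its state.

STATES = ["very low", "low", "medium", "high", "very high"]
THRESHOLD = 0.7


def state_interval(index, n_states=5):
    """Return the (lower, upper) bounds of the interval for state `index`."""
    width = 1.0 / n_states
    return index * width, (index + 1) * width


def prob_above(threshold, lower, upper):
    """P(Y > threshold) when Y is uniform on [lower, upper]."""
    if threshold <= lower:
        return 1.0
    if threshold >= upper:
        return 0.0
    return (upper - threshold) / (upper - lower)


def release_npt(threshold=THRESHOLD):
    """Map each state of Y to P(Release product? = True | Y = state)."""
    npt = {}
    for i, state in enumerate(STATES):
        lower, upper = state_interval(i, len(STATES))
        npt[state] = prob_above(threshold, lower, upper)
    return npt


for state, p_true in release_npt().items():
    print(f"{state:>9}: True={p_true:.2f}  False={1 - p_true:.2f}")
```

Under these assumptions the threshold 0.7 splits the "high" interval in half, so only "very high" guarantees release, while every state below "high" rules it out.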


## Weighted Averages

A common simple approach to quantitative risk assessment is to use a weighted average score to combine risks and produce an overall "risk score" as shown in Table 9.7. This is purely arithmetical and is easily implemented in a spreadsheet, such as Excel. Here we have identified three risks to a project: Risk A, Risk B, and Risk C, with respective probabilities 10%, 20%, and 80% and "weights" 3, 2, and 1. This produces an overall weighted average risk score of 25%.
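The arithmetic behind the 25% figure is $(3 \times 0.10 + 2 \times 0.20 + 1 \times 0.80) / (3 + 2 + 1) = 1.5 / 6 = 0.25$. A short sketch of the same calculation follows; the dictionary layout and function name are illustrative choices, not part of the original.

```python
# Weighted average risk score using the probabilities and weights from the text.
risks = {
    "Risk A": {"probability": 0.10, "weight": 3},
    "Risk B": {"probability": 0.20, "weight": 2},
    "Risk C": {"probability": 0.80, "weight": 1},
}


def weighted_average_score(risks):
    """Return sum(weight * probability) / sum(weight)."""
    total_weight = sum(r["weight"] for r in risks.values())
    weighted_sum = sum(r["weight"] * r["probability"] for r in risks.values())
    return weighted_sum / total_weight


score = weighted_average_score(risks)
print(f"Overall risk score: {score:.0%}")
```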

As we saw in Chapter 3, this is the “risk register” approach that can be viewed as the extension of the simple approach to risk-assessment in which we define risk as probability times impact. Specifically, the impacts are viewed as relative “weights.”

For all of the reasons discussed in Chapter 3 we do not recommend this approach to risk assessment, but there may be many reasons why we would want to incorporate weighted averages into a BN. For example, we might wish to use a weighted average as a score to determine which new car to buy based on criteria such as price, quality, and delivery time. Although the weighted average is deterministic (and therefore can be computed in Excel) the values for the criteria could be based on a range of uncertain factors and relationships that require a BN model in which the weighted average is just a component.

Fortunately, it is possible to replicate weighted averages (using the same example probabilities and weights as Table 9.7) in a BN as shown in Figure 9.23.

Each of the risk factors is represented by a Boolean node whose "probability" is simply specified as the "True" value in the NPT. So, for example, since Risk A has probability 10% we set its NPT as "True" = 10%. The Risk Score node is also Boolean, but it makes sense to replace the labels "False" and "True" with "Low" and "High," respectively. The key to ensuring that we can replicate the weighted average calculation is to introduce the labelled node Weights, whose states correspond to the three risk node weights. The normalised weights are used in the NPT for this node.
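One way to see why this construction reproduces the weighted average is to enumerate the model by hand. The sketch below assumes a particular NPT for Risk Score that the text does not spell out: the score is "High" exactly when the risk selected by the Weights node is "True". With that assumption, the marginal probability of "High" equals the weighted average.

```python
from itertools import product

# Values from the text; the Risk Score NPT below is an assumed construction.
risk_probability = {"A": 0.10, "B": 0.20, "C": 0.80}
raw_weight = {"A": 3, "B": 2, "C": 1}

# Prior of the labelled Weights node: the normalised weights.
total_weight = sum(raw_weight.values())
weight_prior = {name: w / total_weight for name, w in raw_weight.items()}


def p_high_given(selected, risk_states):
    """Assumed NPT: Risk Score is High iff the selected risk is True."""
    return 1.0 if risk_states[selected] else 0.0


def marginal_p_high():
    """Sum the joint over all risk states and all states of Weights."""
    names = list(risk_probability)
    p_high = 0.0
    for values in product([True, False], repeat=len(names)):
        risk_states = dict(zip(names, values))
        p_risks = 1.0
        for name in names:
            p_true = risk_probability[name]
            p_risks *= p_true if risk_states[name] else 1.0 - p_true
        for selected, p_weight in weight_prior.items():
            p_high += p_risks * p_weight * p_high_given(selected, risk_states)
    return p_high


print(f"P(Risk Score = High) = {marginal_p_high():.2f}")
```

The enumeration is exponential in the number of risks, which is acceptable for three nodes; a BN tool performs the same marginalisation more efficiently.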

## Alternative Weighted Functions

The weighted mean is not the only natural function that could be used as the mean of the TNormal ranked node NPTs. Suppose, for example, that in Figure 9.26 we replace the node Quality of Testing Process with the node Testing Effort as shown in Figure 9.35.
In this case we elicit the following information:

• When $X_1$ and $X_2$ are both “very high” the distribution of $Y$ is heavily skewed toward “very high.”
• When $X_1$ and $X_2$ are both “very low” the distribution of $Y$ is heavily skewed toward “very low.”
• When $X_1$ is "very low" and $X_2$ is "very high" the distribution of $Y$ is centered toward "very low."
• When $X_1$ is "very high" and $X_2$ is "very low" the distribution of $Y$ is centered toward "low."

Intuitively, the expert is saying here that, for testing to be effective, you need not just to have good people but also to put in the effort. If either the people or the effort is insufficient, then the result will be poor. However, really good people can compensate to a small extent for lack of effort.
A simple weighted mean for $Y$ will not produce an NPT that satisfies these elicited requirements (try different weights; you will never be able to satisfy both of the last two elicited constraints). Informally, the mean of $Y$ is something like the minimum of the parent values, but with a small weighting in favor of $X_1$. What is needed in this case is the weighted min function (WMIN). The general form of this function (together with the analogous WMAX and the mixture function MIXMINMAX) is shown in Box 9.11. You need not know the details because the function is built into AgenaRisk; it is sufficient to know the effect of the function for different values.
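To illustrate the difference, here is a rough sketch comparing a weighted mean with one commonly cited form of the weighted min, $\min_i \big[(w_i X_i + \sum_{j \neq i} X_j) / (w_i + n - 1)\big]$. Box 9.11 is not reproduced here, so treat this formula as an assumption rather than the tool's exact definition. The weights (10 and 3) and the use of interval midpoints (0.1 for "very low", 0.9 for "very high") are hypothetical choices, and only the mean of the TNormal is computed, not the full NPT.

```python
STATES = ["very low", "low", "medium", "high", "very high"]


def weighted_mean(values, weights):
    """Ordinary weighted mean of the parent values."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def wmin(values, weights):
    """Assumed WMIN form: min over i of (w_i*x_i + sum of others) / (w_i + n - 1)."""
    n = len(values)
    total = sum(values)
    return min((w * x + (total - x)) / (w + n - 1)
               for x, w in zip(values, weights))


def state_of(value):
    """Map a value in [0, 1] to one of five equal-width ranked states."""
    return STATES[min(int(value * 5), 4)]


weights = (10, 3)  # hypothetical: X1 weighted more heavily than X2
cases = {
    "both very high": (0.9, 0.9),
    "both very low": (0.1, 0.1),
    "X1 very low, X2 very high": (0.1, 0.9),
    "X1 very high, X2 very low": (0.9, 0.1),
}

for label, values in cases.items():
    mean_y = wmin(values, weights)
    print(f"{label:<28} WMIN mean = {mean_y:.3f} -> {state_of(mean_y)}")
```

With these assumed weights the four WMIN means fall in "very high", "very low", "very low", and "low", matching the elicited requirements. A weighted mean cannot do this: for the last two cases its values are $a$ and $1 - a$, so they sum to 1 and cannot both lie below 0.4.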


## The Crucial Independence Assumptions

Take a look again at the BN model of Figure 7.3 and the subsequent calculations we used. Using the terminology of Chapter 5, what we have actually done is use some crucial simplifying assumptions in order to avoid having to work out the full joint probability distribution of (Norman late, Martin late, Martin oversleeps, Train strike), which we will write simply as $(N, M, O, T)$.

For example, in calculating the marginal probability of Martin late $(M)$ we assumed that $M$ was dependent only on Martin oversleeps $(O)$ and Train strike $(T)$. The variable Norman late $(N)$ simply did not appear in the equation because we assume that none of these variables are directly dependent on $N$. Similarly, although $M$ depends on both $O$ and $T$, the variables $O$ and $T$ are independent of each other.

These kinds of assumptions are called conditional independence assumptions (we will provide a more formal definition of this later). If we were unable to make any such assumptions, then the full joint probability distribution of $(N, M, O, T)$ would be (by the chain rule of Chapter 5)
$$P(N, M, O, T)=P(N \mid M, O, T) P(M \mid O, T) P(O \mid T) P(T)$$
However, because $N$ directly depends only on $T$ the expression $P(N \mid M, O, T)$ is equal to $P(N \mid T)$, and because $O$ is independent of $T$ the expression $P(O \mid T)$ is equal to $P(O)$.
Hence, the full joint probability distribution can be simplified as:
$$P(N, M, O, T)=P(N \mid T) P(M \mid O, T) P(O) P(T)$$
and this is exactly what we used in the computations.
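The simplification can be checked numerically by building the joint from the four factors and confirming the independence claims. The probability values below are illustrative assumptions (Tables 7.3 and 7.4 are not reproduced in this text), and the helper names are hypothetical.

```python
from itertools import product

# Assumed illustrative probabilities; not taken from the original tables.
P_T = 0.1                                    # P(Train strike)
P_O = 0.4                                    # P(Martin oversleeps)
P_N_GIVEN_T = {True: 0.8, False: 0.1}        # P(Norman late | T)
P_M_GIVEN_OT = {(True, True): 0.8, (True, False): 0.6,
                (False, True): 0.6, (False, False): 0.3}  # P(Martin late | O, T)
VARIABLES = ("N", "M", "O", "T")


def pick(p_true, value):
    """Probability of a Boolean value given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(world):
    """P(N, M, O, T) = P(N | T) P(M | O, T) P(O) P(T)."""
    n, m, o, t = (world[v] for v in VARIABLES)
    return (pick(P_N_GIVEN_T[t], n) * pick(P_M_GIVEN_OT[(o, t)], m)
            * pick(P_O, o) * pick(P_T, t))


def probability(query, evidence=None):
    """P(query | evidence) by brute-force enumeration of the joint."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


# P(N | M, O, T) should not depend on M or O, and P(O | T) should equal P(O).
for m, o, t in product([True, False], repeat=3):
    full = probability({"N": True}, {"M": m, "O": o, "T": t})
    reduced = probability({"N": True}, {"T": t})
    print(f"M={m!s:<5} O={o!s:<5} T={t!s:<5} "
          f"P(N|M,O,T)={full:.3f}  P(N|T)={reduced:.3f}")
```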

## Structural Properties of BNs

In BNs, the conditional dependency structure determines which evidence will update which node. The main formal guidance for building sensible BN structures therefore requires some understanding of the different types of relationships between variables and the different ways these relationships are structured.

Generally we are interested in the following problem. Suppose that variable $A$ is linked to both variables $B$ and $C$. There are three different ways the links can be directed, as shown in Figure 7.8. Although $B$ and $C$ are not directly linked, under what conditions in each case are $B$ and $C$ independent of $A$?

Knowing the answer to this question enables us to determine how to construct appropriate links, and it also enables us to formalize the different notions of conditional independence that we introduced informally in Chapter 6.

The three cases in Figure 7.8 are called, respectively, serial, diverging, and converging connections. We next discuss each in turn.

Consider the example of a serial connection as shown in Figure 7.9. Suppose we have some evidence that a signal failure has occurred $(B)$. Then clearly this knowledge increases our belief that the train is delayed $(A)$, which in turn increases our belief that Norman is late $(C)$. Thus, evidence about $B$ is transmitted through $A$ to $C$ as is shown in Figure 7.10.

However, now suppose that we know the true status of $A$; for example, suppose we know that the train is delayed. Then this means we have hard evidence for $A$ (see Box 7.5 for an explanation of what hard and uncertain evidence are and how they differ).
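The behaviour of a serial connection can be explored with a small enumeration. The sketch below uses made-up probabilities for the chain $B \rightarrow A \rightarrow C$ (signal failure, train delayed, Norman late); none of the numbers come from the original. It shows that evidence on $B$ changes the belief in $C$, and that under the standard result for serial connections, once $A$ is known with certainty, $B$ no longer affects $C$.

```python
from itertools import product

# Hypothetical probabilities for the chain B -> A -> C.
P_B = 0.1                                # P(signal failure)
P_A_GIVEN_B = {True: 0.9, False: 0.2}    # P(train delayed | B)
P_C_GIVEN_A = {True: 0.8, False: 0.1}    # P(Norman late | A)
VARIABLES = ("B", "A", "C")


def pick(p_true, value):
    """Probability of a Boolean value given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(world):
    """P(B, A, C) = P(B) P(A | B) P(C | A)."""
    b, a, c = (world[v] for v in VARIABLES)
    return pick(P_B, b) * pick(P_A_GIVEN_B[b], a) * pick(P_C_GIVEN_A[a], c)


def probability(query, evidence=None):
    """P(query | evidence) by brute-force enumeration of the joint."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


c_late = {"C": True}
print(f"P(C)            = {probability(c_late):.3f}")
print(f"P(C | B)        = {probability(c_late, {'B': True}):.3f}")
print(f"P(C | A, B)     = {probability(c_late, {'A': True, 'B': True}):.3f}")
print(f"P(C | A, not B) = {probability(c_late, {'A': True, 'B': False}):.3f}")
```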


## Accounting for Multiple Causes

Norman is not the only person whose chances of being late increase when there is a train strike. Martin is also more likely to be late, but Martin depends less on trains than Norman and he is often late simply as a result of oversleeping. These additional factors can be modeled as shown in Figure 7.3.

You should add the new nodes and edges using AgenaRisk. We also need the probability tables for each of the nodes Martin oversleeps (Table 7.3) and Martin late (Table 7.4).

The table for node Martin late is more complicated than the table for Norman late because Martin late is conditioned on two nodes rather than one. Since each of the parent nodes has two states, true and false (we are still keeping the example as simple as possible), the number of combinations of parent states is four rather than two.

If you now run the model and display the probability graphs you should get the marginal probability values shown in Figure 7.4(a). In particular, note that the marginal probability that Martin is late is equal to 0.446 (i.e., 44.6%). Box 7.1 explains the underlying calculations involved in this.

But if we know that Norman is late, then the probability that Martin is late increases from the prior 0.446 to 0.542, as shown in Figure 7.4(b). Box 7.1 explains the underlying calculations involved.
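These two numbers can be reproduced by brute-force enumeration. Tables 7.3 and 7.4 are not included in this text, so the probability values below are assumptions, chosen because they yield the quoted 0.446 and 0.542; the helper names are hypothetical.

```python
from itertools import product

# Assumed node probability tables (chosen to match the figures quoted above).
P_T = 0.1                                    # P(Train strike)
P_O = 0.4                                    # P(Martin oversleeps)
P_N_GIVEN_T = {True: 0.8, False: 0.1}        # P(Norman late | T)
P_M_GIVEN_OT = {(True, True): 0.8, (True, False): 0.6,
                (False, True): 0.6, (False, False): 0.3}  # P(Martin late | O, T)
VARIABLES = ("N", "M", "O", "T")


def pick(p_true, value):
    """Probability of a Boolean value given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(world):
    """P(N, M, O, T) = P(N | T) P(M | O, T) P(O) P(T)."""
    n, m, o, t = (world[v] for v in VARIABLES)
    return (pick(P_N_GIVEN_T[t], n) * pick(P_M_GIVEN_OT[(o, t)], m)
            * pick(P_O, o) * pick(P_T, t))


def probability(query, evidence=None):
    """P(query | evidence) by summing the joint over consistent worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


p_martin_late = probability({"M": True})
p_martin_late_given_norman = probability({"M": True}, {"N": True})
print(f"P(Martin late)               = {p_martin_late:.3f}")
print(f"P(Martin late | Norman late) = {p_martin_late_given_norman:.3f}")
```

Observing Norman late raises the probability of a train strike, which in turn raises the probability that Martin is late.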

## Using Propagation to Make Special
This can yield some exceptionally powerful types of analysis. For example, without showing the computational steps involved, if we first enter the observation that Martin is late we get the revised probabilities shown in Figure 7.5(a).

What the model is telling us here is that the most likely explanation for Martin’s lateness is Martin oversleeping; the revised probability of a train strike is still low. However, if we now discover that Norman is also late (Figure 7.5(b)) then Train strike (rather than Martin oversleeps) becomes the most likely explanation for Martin being late. This particular type of (backward) inference is called explaining away (or sometimes called nonmonotonic reasoning). Classical statistical tools alone do not enable this type of reasoning and what-if analysis.
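The explaining-away pattern can be sketched with the same enumeration approach. As before, the node probability tables are assumed values (the original tables are not reproduced here), so the exact numbers are illustrative; what matters is how the ranking of the two explanations changes.

```python
from itertools import product

# Assumed node probability tables; illustrative only.
P_T = 0.1                                    # P(Train strike)
P_O = 0.4                                    # P(Martin oversleeps)
P_N_GIVEN_T = {True: 0.8, False: 0.1}        # P(Norman late | T)
P_M_GIVEN_OT = {(True, True): 0.8, (True, False): 0.6,
                (False, True): 0.6, (False, False): 0.3}  # P(Martin late | O, T)
VARIABLES = ("N", "M", "O", "T")


def pick(p_true, value):
    """Probability of a Boolean value given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(world):
    """P(N, M, O, T) = P(N | T) P(M | O, T) P(O) P(T)."""
    n, m, o, t = (world[v] for v in VARIABLES)
    return (pick(P_N_GIVEN_T[t], n) * pick(P_M_GIVEN_OT[(o, t)], m)
            * pick(P_O, o) * pick(P_T, t))


def probability(query, evidence=None):
    """P(query | evidence) by summing the joint over consistent worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


scenarios = {
    "Martin late": {"M": True},
    "Martin late and Norman late": {"M": True, "N": True},
}
for label, evidence in scenarios.items():
    p_oversleeps = probability({"O": True}, evidence)
    p_strike = probability({"T": True}, evidence)
    print(f"{label:<28} P(oversleeps)={p_oversleeps:.3f}  P(strike)={p_strike:.3f}")
```

With these assumed tables, oversleeping is the more probable explanation when only Martin is known to be late (roughly 0.56 against 0.15 for a strike), and a train strike overtakes it once Norman is also observed to be late (roughly 0.59 against 0.51).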

In fact, as even the earlier simple example shows, BNs offer the following benefits:

• Explicitly model causal factors – It is important to understand that this key benefit is in stark contrast to classical statistics whereby prediction models are normally developed by purely data-driven approaches. For example, the regression models introduced in Chapter 2 use historical data alone to produce equations relating dependent and independent variables. Such approaches not only fail to incorporate expert judgment in scenarios where there is insufficient data, but also fail to accommodate causal explanations. We will explore this further in Chapter 9.
• Reason from effect to cause and vice versa – A BN will update the probability distributions for every unknown variable whenever an observation is entered into any node. So entering an observation in an "effect" node will result in back propagation, that is, revised probability distributions for the "cause" nodes and vice versa. Such backward reasoning of uncertainty is not possible in other approaches.
• Reduce the burden of parameter acquisition – A BN will require fewer probability values and parameters than a full joint probability model. This modularity and compactness means that elicitation of probabilities is easier and explaining model results is made simpler.
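The last point can be quantified for the four-node lateness example. The sketch below counts free parameters, assuming every node is Boolean and using the parent structure described earlier (Norman late depends on Train strike; Martin late depends on Martin oversleeps and Train strike). The function names are hypothetical.

```python
# Parent structure of the lateness example: N <- T, M <- O and T.
parents = {
    "T": [],          # Train strike
    "O": [],          # Martin oversleeps
    "N": ["T"],       # Norman late
    "M": ["O", "T"],  # Martin late
}


def bn_parameter_count(parents, n_states=2):
    """Free parameters in a BN: (states - 1) per parent-state combination."""
    return sum((n_states - 1) * n_states ** len(p) for p in parents.values())


def full_joint_parameter_count(n_variables, n_states=2):
    """Free parameters in an unrestricted joint distribution."""
    return n_states ** n_variables - 1


bn_count = bn_parameter_count(parents)
joint_count = full_joint_parameter_count(len(parents))
print(f"BN parameters:         {bn_count}")
print(f"Full joint parameters: {joint_count}")
```

For this small model the BN needs 8 values against 15 for the full joint; the gap widens quickly as nodes are added, because the full joint grows exponentially with the number of variables.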


