## The Bayesian method

Consider a problem in which we wish to make inferences about a parameter $\theta$ given data $x$. In a classical setting the data is treated as random, even after it has been observed, and the parameter is viewed as a fixed unknown constant. Consequently, no probability distribution can be attached to the parameter. In a Bayesian approach, by contrast, the parameters, having not been observed, are treated as random and thus possess a probability distribution, whilst the data, having been observed, is treated as fixed.

Example 1 Suppose that we perform $n$ independent Bernoulli trials in which we observe $x$, the number of times an event occurs. We are interested in making inferences about $\theta$, the probability of the event occurring in a single trial. Let’s consider the classical approach to this problem.
Prior to observing the data, the probability of observing $x$ was
$$P(X=x \mid \theta)=\binom{n}{x} \theta^x(1-\theta)^{n-x} \tag{1}$$

This is a function of the (future) $x$, assuming that $\theta$ is known. If we know $x$ but don’t know $\theta$, we can instead treat (1) as a function of $\theta$, the likelihood function $L(\theta)$, and choose the value of $\theta$ which maximises it. For $0<x<n$, setting the derivative of $\log L(\theta)$ to zero gives $\frac{x}{\theta}-\frac{n-x}{1-\theta}=0$, so the maximum likelihood estimate is $\frac{x}{n}$, with corresponding estimator $\frac{X}{n}$.
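The maximisation can also be checked numerically. The sketch below evaluates the binomial likelihood on a grid of $\theta$ values and picks the largest; the data values `n = 20` and `x = 7`, the grid resolution, and the function names are illustrative assumptions, not part of the notes.

```python
from math import comb


def binomial_likelihood(theta, n, x):
    """L(theta) = C(n, x) * theta^x * (1 - theta)^(n - x)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def grid_mle(n, x, grid_size=1001):
    """Return the grid point in [0, 1] with the largest likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda theta: binomial_likelihood(theta, n, x))


# Hypothetical data: 7 occurrences in 20 trials.
n, x = 20, 7
theta_hat = grid_mle(n, x)
print("grid maximiser:", theta_hat)
print("x / n:         ", x / n)
```

A grid search is used only because it mirrors the definition directly; in practice the closed form $x/n$ makes any numerical search unnecessary for this model.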

In the general case, the classical approach uses an estimate $T(x)$ for $\theta$. Justifications for the estimate depend upon the properties of the corresponding estimator $T(X)$ (bias, consistency, …) using its sampling distribution (given $\theta$ ). That is, we treat the data as being random even though it is known! Such an approach can lead to nonsensical answers.

Example 2 Suppose in the Bernoulli trials of Example 1 we wish to estimate $\theta^2$. The maximum likelihood estimator is $\left(\frac{X}{n}\right)^2$. However, this is a biased estimator, as
$$\begin{aligned} E\left(X^2 \mid \theta\right) & =\operatorname{Var}(X \mid \theta)+E^2(X \mid \theta) \\ & =n \theta(1-\theta)+n^2 \theta^2 \\ & =n \theta+n(n-1) \theta^2, \end{aligned}$$
so that $E\left(\left(\frac{X}{n}\right)^2 \mid \theta\right)=\theta^2+\frac{\theta(1-\theta)}{n} \neq \theta^2$ for $0<\theta<1$.
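A quick numerical check of the bias, as a sketch with hypothetical values `n = 10` and `theta = 0.3`: it sums $(k/n)^2$ against the binomial probabilities and compares the result with $\theta^2+\theta(1-\theta)/n$, which is what the second moment above gives after dividing by $n^2$. The function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """P(X = k | theta) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def expected_squared_proportion(n, theta):
    """E[(X / n)^2 | theta], computed by summing over all outcomes k."""
    return sum((k / n) ** 2 * binomial_pmf(k, n, theta) for k in range(n + 1))


# Hypothetical values chosen only for illustration.
n, theta = 10, 0.3
exact = expected_squared_proportion(n, theta)
closed_form = theta**2 + theta * (1 - theta) / n
bias = exact - theta**2

print("E[(X/n)^2 | theta] by summation:", exact)
print("theta^2 + theta(1-theta)/n:     ", closed_form)
print("bias relative to theta^2:       ", bias)
```

The bias $\theta(1-\theta)/n$ shrinks as $n$ grows, so the estimator is biased for every finite $n$ but the bias vanishes in the limit.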

## Bayes’ theorem

Let $X$ and $Y$ be random variables with joint density function $f(x, y)$. The marginal density of $Y$, $f(y)$, is the joint density function integrated over all possible values of $X$,
$$f(y)=\int_X f(x, y) \, d x . \tag{1.1}$$
For example, if $Y$ is univariate and $X=\left(X_1, X_2\right)$ where $X_1$ and $X_2$ are univariate then
$$f(y)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f\left(x_1, x_2, y\right) d x_1 d x_2 .$$
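For discrete variables the integrals become sums, which makes marginalisation easy to sketch in code. The example below assumes a hypothetical joint mass function on $\{0,1\}^3$ (the weights are invented purely for illustration) and recovers $f(y)$ by summing over every pair $(x_1, x_2)$. Exact fractions are used so that the results carry no rounding error.

```python
from fractions import Fraction
from itertools import product

SUPPORT = (0, 1)  # assumption: each of X1, X2 and Y takes values in {0, 1}


def joint_pmf(x1, x2, y):
    """Hypothetical joint pmf f(x1, x2, y); the weights sum to 20."""
    return Fraction(1 + x1 + x2 + 2 * x1 * y, 20)


def marginal_y(y):
    """f(y): sum the joint pmf over every (x1, x2) pair."""
    return sum(joint_pmf(x1, x2, y) for x1, x2 in product(SUPPORT, repeat=2))


# Sanity check that the hypothetical pmf is properly normalised.
total = sum(joint_pmf(*cell) for cell in product(SUPPORT, repeat=3))
print("total probability mass:", total)
for y in SUPPORT:
    print(f"f(y={y}) =", marginal_y(y))
```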
The conditional distribution of $Y$ given $X=x$, defined where $f(x)>0$, is
$$f(y \mid x)=\frac{f(x, y)}{f(x)} \tag{1.2}$$
so that by substituting (1.2) into (1.1) we have
$$f(y)=\int_X f(y \mid x) f(x) \, d x ,$$
which is often known as the theorem of total probability. $X$ and $Y$ are independent if and only if
$$f(x, y)=f(x) f(y) . \tag{1.3}$$
Substituting (1.2) into (1.3) we see that an equivalent result is that
$$f(y \mid x)=f(y)$$
so that independence reflects the notion that learning the outcome of $X$ gives us no information about the distribution of $Y$ (and vice versa). If $Z$ is a third random variable then $X$ and $Y$ are conditionally independent given $Z$ if and only if
$$f(x, y \mid z)=f(x \mid z) f(y \mid z) .$$
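The difference between independence and conditional independence can be seen in a small discrete sketch. The model below is a hypothetical one: $Z$ is a fair coin, and given $Z=z$ the binary variables $X$ and $Y$ are drawn separately with success probability $1/4$ or $3/4$. All probabilities and names are assumptions made for illustration. The checks work only from the resulting joint table, with sums replacing the integrals used above.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: Z is a fair coin; X and Y each depend on Z only.
P_Z = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_Z = {0: F(1, 4), 1: F(3, 4)}  # P(X = 1 | Z = z)
P_Y1_GIVEN_Z = {0: F(1, 4), 1: F(3, 4)}  # P(Y = 1 | Z = z)


def bernoulli(p, value):
    """Probability that a Bernoulli(p) variable equals value (0 or 1)."""
    return p if value == 1 else 1 - p


# Joint pmf f(x, y, z) stored as a table keyed by (x, y, z).
joint = {
    (x, y, z): P_Z[z] * bernoulli(P_X1_GIVEN_Z[z], x) * bernoulli(P_Y1_GIVEN_Z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

NAMES = ("x", "y", "z")


def prob(**fixed):
    """Marginal probability: sum the joint over cells matching `fixed`."""
    return sum(
        p
        for cell, p in joint.items()
        if all(cell[NAMES.index(name)] == value for name, value in fixed.items())
    )


def marginally_independent():
    """Check f(x, y) = f(x) f(y) for every (x, y)."""
    return all(
        prob(x=x, y=y) == prob(x=x) * prob(y=y)
        for x, y in product((0, 1), repeat=2)
    )


def conditionally_independent():
    """Check f(x, y | z) = f(x | z) f(y | z) for every (x, y, z)."""
    return all(
        prob(x=x, y=y, z=z) / prob(z=z)
        == (prob(x=x, z=z) / prob(z=z)) * (prob(y=y, z=z) / prob(z=z))
        for x, y, z in product((0, 1), repeat=3)
    )


def total_probability_holds():
    """Check f(y) = sum over x of f(y | x) f(x) for every y."""
    return all(
        prob(y=y) == sum(prob(x=x, y=y) / prob(x=x) * prob(x=x) for x in (0, 1))
        for y in (0, 1)
    )


print("X, Y independent:                  ", marginally_independent())
print("X, Y conditionally independent | Z:", conditionally_independent())
print("total probability holds:           ", total_probability_holds())
```

In this model learning $X$ does change the distribution of $Y$, because both carry information about $Z$; once $Z$ is known, $X$ has nothing further to add about $Y$.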

