## Dealing with a priori ignorance

The Bayesian approach requires a prior distribution to be specified even when there is complete (or total) a priori ignorance (meaning no prior information at all). This feature presents a general and philosophical problem with the Bayesian paradigm, one for which several theoretical solutions have been advanced but which does not yet have a universally accepted solution. We have already discussed finding an uninformative prior in relation to particular Bayesian models, as follows.

For the normal-normal model defined by $\left(y_{1}, \ldots, y_{n} \mid \mu\right) \sim$ iid $N\left(\mu, \sigma^{2}\right)$ and $\mu \sim N\left(\mu_{0}, \sigma_{0}^{2}\right)$, an uninformative prior is given by $\sigma_{0}=\infty$, that is, $f(\mu) \propto 1, \mu \in \Re$.

For the normal-gamma model defined by $\left(y_{1}, \ldots, y_{n} \mid \lambda\right) \sim$ iid $N(\mu, 1 / \lambda)$ and $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$, an uninformative prior is given by $\alpha=\beta=0$, that is, $f(\lambda) \propto 1 / \lambda, \lambda>0$.

For the binomial-beta model defined by $(y \mid \theta) \sim \operatorname{Binomial}(n, \theta)$ and $\theta \sim \operatorname{Beta}(\alpha, \beta)$ (having the posterior $(\theta \mid y) \sim \operatorname{Beta}(\alpha+y, \beta+n-y)$ ), an uninformative prior is the Bayes prior given by $\alpha=\beta=1$, that is, $f(\theta)=1,0<\theta<1$. This is the prior that was originally advocated by Thomas Bayes.

Unlike for the normal-normal and normal-gamma models, more than one uninformative prior specification has been proposed as reasonable in the context of the binomial-beta model.
One of these is the improper Haldane prior, defined by $\alpha=\beta=0$, or
$$f(\theta) \propto \frac{1}{\theta(1-\theta)}, \quad 0<\theta<1$$
Under the prior $\theta \sim \operatorname{Beta}(\alpha, \beta)$ generally, the posterior mean of $\theta$ is
$$\hat{\theta}=E(\theta \mid y)=\frac{(\alpha+y)}{(\alpha+y)+(\beta+n-y)}=\frac{\alpha+y}{\alpha+\beta+n}$$
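This reduces to the MLE $y / n$ under the Haldane prior ($\alpha=\beta=0$) but not under the Bayes prior ($\alpha=\beta=1$), for which it equals $(y+1) /(n+2)$. In contrast, the Bayes prior leads to a posterior mode, $(\alpha+y-1) /(\alpha+\beta+n-2)=y / n$, which is equal to the MLE.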
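
The comparison between the two priors can be checked with exact arithmetic. The following is a minimal Python sketch, not part of the original text; the function name `beta_posterior_summaries` and the data values $y=3$, $n=10$ are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def beta_posterior_summaries(y, n, alpha, beta):
    """Posterior mean and mode of theta for a Binomial(n, theta) likelihood
    and a Beta(alpha, beta) prior (integer alpha and beta assumed here)."""
    a_post = alpha + y        # first parameter of the Beta posterior
    b_post = beta + n - y     # second parameter of the Beta posterior
    mean = Fraction(a_post, a_post + b_post)
    # The interior mode (a - 1) / (a + b - 2) exists only if both parameters exceed 1.
    if a_post > 1 and b_post > 1:
        mode = Fraction(a_post - 1, a_post + b_post - 2)
    else:
        mode = None
    return mean, mode


y, n = 3, 10                  # hypothetical data: 3 successes in 10 trials
mle = Fraction(y, n)

bayes_mean, bayes_mode = beta_posterior_summaries(y, n, 1, 1)      # Bayes prior
haldane_mean, haldane_mode = beta_posterior_summaries(y, n, 0, 0)  # Haldane prior

print("MLE:", mle)
print("Bayes prior   -> mean", bayes_mean, "mode", bayes_mode)
print("Haldane prior -> mean", haldane_mean, "mode", haldane_mode)
```

With these illustrative values, the Haldane posterior mean and the Bayes posterior mode both coincide with $y/n$, while the Bayes posterior mean is pulled towards $1/2$.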

## The Jeffreys prior

The statistician Harold Jeffreys devised a rule for finding a suitable uninformative prior in a wide variety of situations. His idea was to construct a prior which is invariant under reparameterisation. For the case of a univariate model parameter $\theta$, the Jeffreys prior is given by the following equation (also known as Jeffreys’ rule):
$$f(\theta) \propto \sqrt{I(\theta)},$$
where $I(\theta)$ is the Fisher information defined by
$$I(\theta)=E\left\{\left(\frac{\partial}{\partial \theta} \log f(y \mid \theta)\right)^{2} \mid \theta\right\} .$$
Note 1: If $\log f(y \mid \theta)$ is twice differentiable with respect to $\theta$, and certain regularity conditions hold, then
$$I(\theta)=-E\left\{\frac{\partial^{2}}{\partial \theta^{2}} \log f(y \mid \theta) \mid \theta\right\} .$$
Note 2: Jeffreys’ rule also extends to the multi-parameter case (not considered here).
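
As an illustration of Jeffreys' rule, consider the binomial model $(y \mid \theta) \sim \operatorname{Binomial}(n, \theta)$. A standard calculation gives $I(\theta)=n /\{\theta(1-\theta)\}$, so that $f(\theta) \propto \theta^{-1 / 2}(1-\theta)^{-1 / 2}$, which is the $\operatorname{Beta}(1 / 2,1 / 2)$ density. The sketch below is an addition to the text with hypothetical function names; it evaluates the defining expectation of the squared score directly and compares it with this closed form.

```python
from math import comb, sqrt


def fisher_information_binomial(n, theta):
    """Fisher information for Binomial(n, theta), computed from the definition
    I(theta) = E[(d/dtheta log f(y | theta))^2 | theta] by summing over y."""
    total = 0.0
    for y in range(n + 1):
        pmf = comb(n, y) * theta**y * (1 - theta) ** (n - y)
        score = y / theta - (n - y) / (1 - theta)   # derivative of the log-likelihood
        total += pmf * score**2
    return total


def jeffreys_prior_unnormalised(n, theta):
    """Jeffreys prior density, up to a constant: sqrt(I(theta))."""
    return sqrt(fisher_information_binomial(n, theta))


n = 10                                   # hypothetical number of trials
for theta in (0.1, 0.3, 0.5, 0.9):
    closed_form = n / (theta * (1 - theta))
    print(theta, fisher_information_binomial(n, theta), closed_form)
```

Because $n$ enters only as a multiplicative constant, the resulting prior does not depend on the number of trials.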

The significance of Jeffreys’ rule may be described as follows. Consider a prior given by $f(\theta) \propto \sqrt{I(\theta)}$ and the transformed parameter $\phi=g(\theta)$, where $g$ is a strictly increasing or decreasing function. (For simplicity, we only consider this case.) Then the prior density for $\phi$ is
$$\begin{aligned}
f(\phi) & \propto f(\theta)\left|\frac{\partial \theta}{\partial \phi}\right| \quad \text{by the transformation rule} \\
& \propto \sqrt{I(\theta)\left(\frac{\partial \theta}{\partial \phi}\right)^{2}}=\sqrt{E\left\{\left(\frac{\partial}{\partial \theta} \log f(y \mid \theta)\right)^{2} \mid \theta\right\}\left(\frac{\partial \theta}{\partial \phi}\right)^{2}} \\
&=\sqrt{E\left\{\left(\frac{\partial}{\partial \theta} \log f(y \mid \theta) \frac{\partial \theta}{\partial \phi}\right)^{2} \mid \theta\right\}} \\
&=\sqrt{E\left\{\left(\frac{\partial}{\partial \phi} \log f(y \mid \phi)\right)^{2} \mid \phi\right\}} \\
&=\sqrt{I(\phi)} .
\end{aligned}$$
Thus, Jeffreys’ rule is ‘invariant under reparameterisation’, in the sense that if a prior is constructed according to
$$f(\theta) \propto \sqrt{I(\theta)},$$
then, for another parameter $\phi=g(\theta)$, it is also true that
$$f(\phi) \propto \sqrt{I(\phi)} .$$
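
The invariance property can also be checked numerically. The sketch below is an illustrative addition, not taken from the text. It assumes the binomial model with the log-odds reparameterisation $\phi=\log \{\theta /(1-\theta)\}$, and it uses the standard closed form $I(\theta)=n /\{\theta(1-\theta)\}$ for the binomial Fisher information. The function names are hypothetical. The Fisher information for $\phi$ is computed from the definition with a finite-difference score, then compared with the Jeffreys prior for $\theta$ after applying the transformation rule.

```python
from math import comb, exp, log, sqrt


def expit(phi):
    """Inverse of the log-odds map: theta = 1 / (1 + exp(-phi))."""
    return 1.0 / (1.0 + exp(-phi))


def log_pmf_phi(y, n, phi):
    """Binomial log-probability written in terms of phi = logit(theta)."""
    theta = expit(phi)
    return log(comb(n, y)) + y * log(theta) + (n - y) * log(1.0 - theta)


def fisher_info_phi(n, phi, h=1e-5):
    """I(phi) from the definition, using a central finite-difference score."""
    theta = expit(phi)
    total = 0.0
    for y in range(n + 1):
        pmf = comb(n, y) * theta**y * (1.0 - theta) ** (n - y)
        score = (log_pmf_phi(y, n, phi + h) - log_pmf_phi(y, n, phi - h)) / (2 * h)
        total += pmf * score**2
    return total


def transformed_jeffreys_density(n, phi):
    """sqrt(I(theta)) * |d theta / d phi|: the theta-prior mapped onto phi."""
    theta = expit(phi)
    fisher_theta = n / (theta * (1.0 - theta))   # closed form for the binomial model
    dtheta_dphi = theta * (1.0 - theta)          # derivative of expit
    return sqrt(fisher_theta) * abs(dtheta_dphi)


n = 10                                            # hypothetical number of trials
for phi in (-1.5, 0.0, 0.4, 2.0):
    print(phi, sqrt(fisher_info_phi(n, phi)), transformed_jeffreys_density(n, phi))
```

The two printed columns should agree up to finite-difference error, which is what the derivation predicts for any strictly monotone $g$.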

## Bayesian decision theory

The posterior mean, mode and median, as well as other Bayesian point estimates, can all be derived and interpreted using the principles of decision theory. Suppose we wish to choose an estimate of $\theta$ which minimises costs in some sense. To this end, let $L(\hat{\theta}, \theta)$ denote generally a loss function (LF) associated with an estimate $\hat{\theta}$.

Note: The estimator $\hat{\theta}$ is a function of the data $y$ and so could also be written $\hat{\theta}(y)$. For example, in the context where $(y \mid \theta) \sim \operatorname{Bin}(n, \theta)$, the sample proportion or MLE is the function given by $\hat{\theta}=\hat{\theta}(y)=y / n$.
The loss function $L$ represents the cost incurred when the true value $\theta$ is estimated by $\hat{\theta}$ and usually satisfies the property $L(\theta, \theta)=0$.
The three most commonly used loss functions are defined as follows:
$$L(\hat{\theta}, \theta)=|\hat{\theta}-\theta| \quad \text{the absolute error loss function (AELF)}$$
$$L(\hat{\theta}, \theta)=(\hat{\theta}-\theta)^{2} \quad \text{the quadratic error loss function (QELF)}$$
$$L(\hat{\theta}, \theta)=I(\hat{\theta} \neq \theta)=\left\{\begin{array}{ll}0 & \text{if } \hat{\theta}=\theta \\ 1 & \text{if } \hat{\theta} \neq \theta\end{array}\right. \quad \text{the indicator error loss function (IELF)}$$
The IELF is also known as the zero-one loss function (ZOLF) or the all-or-nothing error loss function (ANLF).
Figures 2.8 and 2.9 illustrate these three basic loss functions.
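
To see how a loss function leads to a point estimate, one can minimise the posterior expected loss over candidate values of $\hat{\theta}$. The sketch below is an illustrative addition, not part of the original text. It assumes a $\operatorname{Beta}(4,8)$ posterior, approximates it on a grid, and uses hypothetical function names. It relies on the standard results that the QELF is minimised by the posterior mean and the AELF by the posterior median. The IELF is not shown because, for a continuous parameter, it requires a limiting argument.

```python
def beta_density_unnormalised(theta, a, b):
    """Beta(a, b) density up to its normalising constant."""
    return theta ** (a - 1) * (1.0 - theta) ** (b - 1)


a, b = 4, 8                     # assumed posterior parameters (illustration only)
m = 500                         # number of grid cells on (0, 1)
grid = [(i + 0.5) / m for i in range(m)]            # cell midpoints
weights = [beta_density_unnormalised(t, a, b) for t in grid]
total = sum(weights)
probs = [w / total for w in weights]                # discrete approximation to the posterior


def quadratic_loss(estimate, theta):
    return (estimate - theta) ** 2                  # QELF


def absolute_loss(estimate, theta):
    return abs(estimate - theta)                    # AELF


def expected_loss(estimate, loss):
    """Posterior expected loss of an estimate under the grid approximation."""
    return sum(p * loss(estimate, t) for p, t in zip(probs, grid))


# Grid values that minimise the posterior expected loss.
best_qelf = min(grid, key=lambda e: expected_loss(e, quadratic_loss))
best_aelf = min(grid, key=lambda e: expected_loss(e, absolute_loss))

# Posterior mean and median from the same grid, for comparison.
post_mean = sum(p * t for p, t in zip(probs, grid))
cumulative = 0.0
post_median = grid[-1]
for p, t in zip(probs, grid):
    cumulative += p
    if cumulative >= 0.5:
        post_median = t
        break

print("QELF minimiser:", best_qelf, "posterior mean:", post_mean)
print("AELF minimiser:", best_aelf, "posterior median:", post_median)
```

Each minimiser agrees with the corresponding posterior summary to within the grid spacing $1/m$.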

