### Some more general frequentist developments

## Exponential family problems

Some limited but important classes of problems have a formally exact solution within the frequentist approach. We begin with two situations involving exponential family distributions. For simplicity we suppose that the parameter $\psi$ of interest is one-dimensional.

We start with a full exponential family model in which $\psi$ is a linear combination of components of the canonical parameter. After linear transformation of the parameter and canonical statistic we can write the density of the observations in the form
$$m(y) \exp \left\{s_{\psi} \psi+s_{\lambda}^{T} \lambda-k(\psi, \lambda)\right\}$$
where $\left(s_{\psi}, s_{\lambda}\right)$ is a partitioning of the sufficient statistic corresponding to the partitioning of the parameter. For inference about $\psi$ we prefer a pivotal distribution not depending on $\lambda$. It can be shown, using the mathematical property of completeness (essentially that the parameter space is rich enough), that separation from $\lambda$ can be achieved only by working with the conditional distribution of the data given $S_{\lambda}=s_{\lambda}$. That is, we evaluate the conditional distribution of $S_{\psi}$ given $S_{\lambda}=s_{\lambda}$ and use the resulting distribution to find which values of $\psi$ are consistent with the data at various levels.
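A minimal sketch of this conditioning argument, under assumptions not in the text: two independent binomial counts $Y_1 \sim \mathrm{Bin}(n_1,\pi_1)$ and $Y_2 \sim \mathrm{Bin}(n_2,\pi_2)$ with $\log\{\pi_2/(1-\pi_2)\}=\lambda$ and $\log\{\pi_1/(1-\pi_1)\}=\lambda+\psi$, so that $\psi$ is a log odds ratio. The log likelihood is then $\psi y_1+\lambda(y_1+y_2)-k(\psi,\lambda)$, giving $S_\psi=Y_1$ and $S_\lambda=Y_1+Y_2$, and the conditional distribution of $Y_1$ given $Y_1+Y_2=t$ is proportional to $\binom{n_1}{y}\binom{n_2}{t-y}e^{\psi y}$, free of $\lambda$. The function names and data below are hypothetical; `conditional_from_full_model` exists only to check numerically that $\lambda$ drops out.

```python
from math import comb, exp


def conditional_pmf(psi, n1, n2, t):
    """P(Y1 = y | Y1 + Y2 = t; psi) for two independent binomials whose
    log odds ratio is psi. The nuisance parameter lambda does not appear."""
    support = range(max(0, t - n2), min(n1, t) + 1)
    weights = {y: comb(n1, y) * comb(n2, t - y) * exp(psi * y) for y in support}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def conditional_from_full_model(psi, lam, n1, n2, t):
    """The same conditional distribution computed the long way, from the full
    binomial model with a specific value lam of the nuisance parameter."""
    pi1 = 1.0 / (1.0 + exp(-(lam + psi)))
    pi2 = 1.0 / (1.0 + exp(-lam))

    def binom_pmf(n, p, y):
        return comb(n, y) * p**y * (1.0 - p) ** (n - y)

    support = range(max(0, t - n2), min(n1, t) + 1)
    joint = {y: binom_pmf(n1, pi1, y) * binom_pmf(n2, pi2, t - y) for y in support}
    total = sum(joint.values())
    return {y: v / total for y, v in joint.items()}


def upper_tail_p_value(psi, n1, n2, t, y_obs):
    """P(Y1 >= y_obs | Y1 + Y2 = t; psi): consistency of psi with the data,
    judged against larger values of psi."""
    pmf = conditional_pmf(psi, n1, n2, t)
    return sum(p for y, p in pmf.items() if y >= y_obs)


# Hypothetical data: 7 successes out of 10 in group 1, 3 out of 12 in group 2.
n1, n2, y1, y2 = 10, 12, 7, 3
t = y1 + y2
print(upper_tail_p_value(0.0, n1, n2, t, y1))  # conditional test of psi = 0
```

At $\psi=0$ the conditional distribution is hypergeometric, and the upper-tail probability is the one-sided form of Fisher's exact test.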

Many of the standard procedures dealing with simple problems about Poisson, binomial, gamma and normal distributions can be found by this route. We illustrate the arguments sketched above by a number of problems connected with binomial distributions, noting that the canonical parameter of a binomial distribution with probability $\pi$ is $\phi=\log\{\pi /(1-\pi)\}$. This has some advantages in interpretation and also some disadvantages. When $\pi$ is small, the parameter is essentially equivalent to $\log \pi$, so that differences in canonical parameter correspond to log ratios of probabilities.
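A small numerical illustration of the last point, with hypothetical helper names: for small probabilities the difference in canonical parameters (the log odds ratio) is close to the log of the ratio of the probabilities, whereas for moderate probabilities the two can differ substantially.

```python
from math import log


def logit(p):
    """Canonical parameter of a binomial with success probability p."""
    return log(p / (1.0 - p))


def compare(p1, p2):
    """Return (difference in canonical parameters, log ratio of probabilities)."""
    return logit(p1) - logit(p2), log(p1 / p2)


for p1, p2 in [(0.002, 0.001), (0.02, 0.01), (0.6, 0.3)]:
    log_or, log_rr = compare(p1, p2)
    print(f"p1={p1}, p2={p2}: log odds ratio={log_or:.4f}, log ratio={log_rr:.4f}")
```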

## Transformation models

A different kind of reduction is possible for models such as the location model which preserve a special structure under a family of transformations.

Example 4.10. Location model. We return to Example 1.8, the location model. The likelihood is $\prod_{k} g\left(y_{k}-\mu\right)$, which can be rearranged in terms of the order statistics $y_{(l)}$, i.e., the observed values arranged in increasing order of magnitude. These in general form the sufficient statistic for the model, minimal except for the normal distribution. Now the random variables corresponding to the differences between order statistics, specified, for example, as
$$a=\left(y_{(2)}-y_{(1)}, \ldots, y_{(n)}-y_{(1)}\right)=\left(a_{2}, \ldots, a_{n}\right),$$
have a distribution not depending on $\mu$ and hence are ancillary; inference about $\mu$ is thus based on the conditional distribution of, say, $y_{(1)}$ given $A=a$. The choice of $y_{(1)}$ is arbitrary because any estimate of location, for example the mean, is $y_{(1)}$ plus a function of $a$, and the latter is not random after conditioning. Now the marginal density of $A$ is
$$\int g\left(y_{(1)}-\mu\right) g\left(y_{(1)}+a_{2}-\mu\right) \cdots g\left(y_{(1)}+a_{n}-\mu\right) d y_{(1)} .$$

The integral with respect to $y_{(1)}$ is equivalent to one with respect to $\mu$, because the integrand depends on $y_{(1)}$ and $\mu$ only through $y_{(1)}-\mu$. So, on re-expressing the integrand in terms of the likelihood for the unordered variables, the density of $A$ is just the quantity needed to normalize the likelihood so that it integrates to one with respect to $\mu$. That is, the conditional density of $Y_{(1)}$, or of any other measure of location, given $A=a$ is determined by normalizing the likelihood function and regarding it as a function of $y_{(1)}$ for given $a$. This implies that $p$-values and confidence limits for $\mu$ result from normalizing the likelihood and treating it like a distribution of $\mu$.
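The conclusion that the normalized likelihood acts as a distribution for $\mu$ can be sketched numerically. The sketch assumes that $g$ is fully known and approximates the normalizing integral by a Riemann sum on a grid of $\mu$ values; the grid settings, helper names and data are illustrative assumptions. As a sanity check, when $g$ is the standard normal density the normalized likelihood is the $N(\bar y, 1/n)$ density, so the equal-tailed 95% limits should be close to $\bar y \pm 1.96/\sqrt{n}$; with a Cauchy $g$ the same code gives limits that are conditional on the observed configuration $a$.

```python
import math


def normalized_likelihood(y, log_g, half_width=15.0, step=0.002):
    """Evaluate prod_k g(y_k - mu) on a grid of mu values and rescale it so
    that it integrates (approximately) to one with respect to mu."""
    centre = sorted(y)[len(y) // 2]
    m = int(round(2 * half_width / step)) + 1
    mus = [centre - half_width + i * step for i in range(m)]
    loglik = [sum(log_g(yk - mu) for yk in y) for mu in mus]
    top = max(loglik)  # subtract the maximum to avoid underflow
    lik = [math.exp(v - top) for v in loglik]
    total = sum(lik) * step  # Riemann-sum approximation to the integral
    return mus, [v / total for v in lik]


def interval_from_likelihood(y, log_g, level=0.95):
    """Equal-tailed limits for mu, reading the normalized likelihood as if it
    were a distribution for mu."""
    mus, dens = normalized_likelihood(y, log_g)
    step = mus[1] - mus[0]
    lower_target, upper_target = (1 - level) / 2, (1 + level) / 2
    cum, lower, upper = 0.0, None, None
    for mu, d in zip(mus, dens):
        cum += d * step
        if lower is None and cum >= lower_target:
            lower = mu
        if upper is None and cum >= upper_target:
            upper = mu
            break
    return lower, upper


def log_normal_density(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def log_cauchy_density(z):
    return -math.log(math.pi * (1.0 + z * z))


y = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 0.9, -0.7]  # hypothetical data
print(interval_from_likelihood(y, log_normal_density))
print(interval_from_likelihood(y, log_cauchy_density))
```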

This is an exceptional situation in frequentist theory. In general, likelihood is a point function of its argument, and integrating it over sets of parameter values is not statistically meaningful. In a Bayesian formulation this normalization would correspond to combining the likelihood with a uniform prior, but no notion of a prior distribution is involved in the present argument.

The ancillary statistics allow conformity of the data with the model to be tested. For example, to check the functional form of $g(\cdot)$ it would be best to work with the deviations of the ordered values from the mean, $\left(y_{(l)}-\bar{y}\right)$ for $l=1, \ldots, n$. These are functions of $a$ as previously defined. A test statistic sensitive to departures in distributional form from that implied by $g(\cdot)$ could be set up informally. Alternatively, $\left(y_{(l)}-\bar{y}\right)$ could be plotted against $G^{-1}\{(l-1 / 2) /(n+1)\}$, where $G(\cdot)$ is the cumulative distribution function corresponding to the density $g(\cdot)$.
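The plotting recipe can be sketched as follows, assuming for illustration that $g$ is the standard normal density, so that $G^{-1}$ is `statistics.NormalDist().inv_cdf`. The helper names and data are hypothetical, and the correlation of the plotted pairs is one possible informal statistic (our choice, not one prescribed by the text); it is close to one when the plot is nearly straight. Because the deviations $y_{(l)}-\bar y$ depend only on $a$, shifting every observation by a constant leaves them unchanged.

```python
from statistics import NormalDist


def probability_plot_pairs(y, inv_cdf):
    """Pairs (G^{-1}{(l - 1/2)/(n + 1)}, y_(l) - ybar) for l = 1..n, using the
    plotting positions exactly as written in the text."""
    n = len(y)
    ybar = sum(y) / n
    deviations = [v - ybar for v in sorted(y)]
    quantiles = [inv_cdf((rank - 0.5) / (n + 1)) for rank in range(1, n + 1)]
    return list(zip(quantiles, deviations))


def plot_correlation(pairs):
    """Sample correlation of the plotted pairs: an informal check that is
    close to 1 when the plot is nearly a straight line."""
    n = len(pairs)
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / (sxx * syy) ** 0.5


y = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 0.9, -0.7]  # hypothetical data
pairs = probability_plot_pairs(y, NormalDist().inv_cdf)  # G assumed standard normal
print(plot_correlation(pairs))
```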

## Some further Bayesian examples

In principle the prior density in a Bayesian analysis is an insertion of additional information, and the form of the prior should be dictated by the nature of that evidence. It is useful, at least for theoretical discussion, to look at priors which lead to mathematically tractable answers. One such form, usually useful only for nuisance parameters, is a distribution with finite support, in particular a two-point prior. In one dimension this has three adjustable parameters, the positions of the two points and a probability, and for some limited purposes this may be adequate. Because the posterior distribution remains concentrated on the same two points, computational aspects are much simplified.
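A minimal sketch of why a two-point prior keeps the computation simple, assuming for illustration independent $N(\theta,\sigma^2)$ observations with $\sigma$ known and the two-point prior placed directly on $\theta$ (the text has nuisance parameters mainly in mind). The posterior is carried by the same two points; only their weights change, each becoming proportional to prior weight times likelihood. Names and data are hypothetical.

```python
import math


def two_point_posterior(y, points, prior_prob, sigma=1.0):
    """Posterior probabilities for theta under a two-point prior, assuming
    y_k ~ N(theta, sigma^2) independently (an illustrative choice)."""
    theta1, theta2 = points

    def loglik(theta):
        # Normal log likelihood up to an additive constant.
        return sum(-0.5 * ((yk - theta) / sigma) ** 2 for yk in y)

    log_w1 = math.log(prior_prob) + loglik(theta1)
    log_w2 = math.log(1.0 - prior_prob) + loglik(theta2)
    top = max(log_w1, log_w2)  # stabilize before exponentiating
    w1, w2 = math.exp(log_w1 - top), math.exp(log_w2 - top)
    return {theta1: w1 / (w1 + w2), theta2: w2 / (w1 + w2)}


y = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 0.9, -0.7]  # hypothetical data
print(two_point_posterior(y, points=(0.0, 1.0), prior_prob=0.5))
```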

We shall not develop that idea further and turn instead to other examples of parametric conjugate priors which exploit the consequences of exponential family structure as exemplified in Section 2.4.

Example 4.12. Normal variance. Suppose that $Y_{1}, \ldots, Y_{n}$ are independently normally distributed with known mean, taken without loss of generality to be zero, and unknown variance $\sigma^{2}$. The likelihood is, except for a constant factor,
$$\frac{1}{\sigma^{n}} \exp \left\{-\Sigma y_{k}^{2} /\left(2 \sigma^{2}\right)\right\} .$$
The canonical parameter is $\phi=1 / \sigma^{2}$ and simplicity of structure suggests taking $\phi$ to have a prior gamma density which it is convenient to write in the form
$$\pi\left(\phi ; g, n_{\pi}\right)=g(g \phi)^{n_{\pi} / 2-1} e^{-g \phi} / \Gamma\left(n_{\pi} / 2\right),$$

defined by two quantities assumed known. One is $n_{\pi}$, which plays the role of an effective sample size attached to the prior density, by analogy with the form of the chi-squared density with $n$ degrees of freedom. The other is $g$. When transformed into a distribution for $\sigma^{2}$, this prior is often called the inverse gamma distribution. Also $E_{\pi}(\Phi)=n_{\pi} /(2 g)$.
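For completeness, the stated prior mean follows from the substitution $u=g\phi$:

$$E_{\pi}(\Phi)=\int_{0}^{\infty} \phi \, \frac{g(g \phi)^{n_{\pi}/2-1} e^{-g \phi}}{\Gamma(n_{\pi}/2)} \, d\phi=\frac{1}{g\,\Gamma(n_{\pi}/2)} \int_{0}^{\infty} u^{n_{\pi}/2} e^{-u} \, du=\frac{\Gamma(n_{\pi}/2+1)}{g\,\Gamma(n_{\pi}/2)}=\frac{n_{\pi}}{2 g},$$

so that $2g=n_{\pi}/E_{\pi}(\Phi)$, which is the form in which $g$ enters the posterior density.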

On multiplying the likelihood by the prior density, the posterior density of $\Phi$ is proportional to
$$\phi^{\left(n+n_{\pi}\right) / 2-1} \exp \left[-\left\{\Sigma y_{k}^{2}+n_{\pi} / E_{\pi}(\Phi)\right\} \phi / 2\right].$$
The posterior distribution is in effect found by treating
$$\left\{\Sigma y_{k}^{2}+n_{\pi} / E_{\pi}(\Phi)\right\} \Phi$$
as having a chi-squared distribution with $n+n_{\pi}$ degrees of freedom.

Formally, frequentist inference is based on the pivot $\Sigma Y_{k}^{2} / \sigma^{2}$, the pivotal distribution being the chi-squared distribution with $n$ degrees of freedom. There is formal, although of course not conceptual, equivalence between the two methods when $n_{\pi}=0$. This corresponds to the improper prior $d \phi / \phi$, equivalent to $d \sigma / \sigma$ or to a uniform improper prior for $\log \sigma$. That is, while in this setting no proper prior gives exact agreement between the Bayesian and frequentist solutions, the latter can be approached as a limit as $n_{\pi} \rightarrow 0$.
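A minimal sketch of the conjugate update and of the limit $n_\pi \to 0$, assuming (as in the example) a known zero mean; the helper names, the data and the prior guess `prior_mean_phi` are hypothetical. `posterior_chi2_form` returns the scale $\Sigma y_k^2+n_\pi/E_\pi(\Phi)$ and the degrees of freedom $n+n_\pi$, while `posterior_gamma` writes the same posterior as a gamma density with shape $(n+n_\pi)/2$ and rate $g+\Sigma y_k^2/2$. The two agree because a gamma variable with shape $a$ and rate $b$, multiplied by $2b$, is chi-squared with $2a$ degrees of freedom. As $n_\pi$ shrinks, the scale tends to $\Sigma y_k^2$ and the degrees of freedom to $n$, which is the frequentist pivotal form.

```python
def posterior_chi2_form(y, n_pi, prior_mean_phi):
    """Return (scale, df) such that scale * Phi is chi-squared with df degrees
    of freedom a posteriori; assumes a zero mean and the gamma prior above."""
    ss = sum(v * v for v in y)
    return ss + n_pi / prior_mean_phi, len(y) + n_pi


def posterior_gamma(y, n_pi, g):
    """The same posterior as gamma(shape, rate): the prior shape n_pi/2 and
    rate g are increased by n/2 and sum(y_k^2)/2 respectively."""
    ss = sum(v * v for v in y)
    return n_pi / 2 + len(y) / 2, g + ss / 2


y = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 0.9, -0.7]  # hypothetical data, mean known to be 0
ss = sum(v * v for v in y)
prior_mean_phi = 2.0  # hypothetical prior guess at 1 / sigma^2
for n_pi in (4.0, 1.0, 0.1, 0.001):
    scale, df = posterior_chi2_form(y, n_pi, prior_mean_phi)
    # Frequentist pivot: sum(y_k^2) / sigma^2 ~ chi-squared(n), i.e. scale -> ss, df -> n.
    print(n_pi, scale, df, "frequentist:", ss, len(y))
```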

