### Solving Likelihood Equations Numerically

## Newton–Raphson, Fisher Scoring and Other Algorithms

For a full family in mean value parameterization the MLE is explicit, $\hat{\boldsymbol{\mu}}_{t}=\boldsymbol{t}$. For other parameterizations, by some $\psi$ say, we must invert the equation system $\boldsymbol{\mu}_{t}(\hat{\psi})=\boldsymbol{t}$, and this need not have an explicit solution.
A classic and presumably well-known algorithm for iterative solution of equation systems is the Newton–Raphson method. Its basis here is the following linearization of the score $U$, which is adequate for $\psi$ close enough to the unknown $\hat{\psi}$ (note that $-J$ is the matrix of partial derivatives of $U$, that is, the Hessian of $\log L$):
$$0=U_{\psi}(\hat{\psi}) \approx U_{\psi}(\psi)-J_{\psi}(\psi)(\hat{\psi}-\psi) .$$
Iteratively solving for $\hat{\psi}$ in this linearized system yields successive updates of $\psi_{k}$ to $\psi_{k+1}$, $k=0,1,2, \ldots$:
$$\psi_{k+1}=\psi_{k}+J_{\psi}\left(\psi_{k}\right)^{-1} U_{\psi}\left(\psi_{k}\right) \tag{3.27}$$
This is a locally fast algorithm (quadratic convergence), provided that it converges. The method can be quite sensitive to the choice of starting point $\psi_{0}$. Generally, if $J_{\psi}\left(\psi_{0}\right)$ is not positive definite, or is close to singular, the iteration is not likely to converge, or it might converge to a minimum of $\log L$, so there is no guarantee that the resulting root represents even a local likelihood maximum. For a full exponential family, however, the odds are better. In particular, there is not more than one root, and if the parameter is the canonical one, $\boldsymbol{\theta}$, we know that $J_{\theta}=I_{\theta}=\operatorname{var}(\boldsymbol{t})$, a necessarily positive definite matrix. Quasi-Newton methods are modifications of $(3.27)$, where the $J$ matrix is approximated, e.g. by the secant method, typically because an explicit formula for $J$ is not available.
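
The Newton–Raphson update can be sketched in a few lines for a one-parameter full family. The example below is an illustration only: the zero-truncated Poisson model, the sample summary `n` and `ybar`, and all function names are assumptions chosen for this sketch, not taken from the text. With canonical parameter $\theta=\log \lambda$, the score is $U(\theta)=n\{\bar{y}-\mu(\theta)\}$ and $J(\theta)=I(\theta)=n \operatorname{var}_{\theta}(y)$, so $J$ is positive for every $\theta$.

```python
import math

def newton_raphson(score, info, psi0, tol=1e-12, max_iter=50):
    """Scalar Newton-Raphson: psi_{k+1} = psi_k + U(psi_k) / J(psi_k)."""
    psi = psi0
    for _ in range(max_iter):
        step = score(psi) / info(psi)
        psi += step
        if abs(step) < tol:
            return psi
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical model for illustration: zero-truncated Poisson,
# canonical parameter theta = log(lambda), canonical statistic sum(y).
def mean_value(theta):
    lam = math.exp(theta)
    return lam / (1.0 - math.exp(-lam))

def variance(theta):
    lam = math.exp(theta)
    q = math.exp(-lam)
    return lam * (1.0 - q - lam * q) / (1.0 - q) ** 2

n, ybar = 20, 2.5  # assumed sample size and sample mean

def score(theta):
    return n * (ybar - mean_value(theta))

def info(theta):
    return n * variance(theta)

# Start from the untruncated Poisson estimate log(ybar).
theta_hat = newton_raphson(score, info, psi0=math.log(ybar))
```

At convergence the likelihood equation $\mu(\hat{\theta})=\bar{y}$ holds to numerical precision. Because the parameter is canonical, the log-likelihood is concave, which is why a crude starting value is enough in this sketch; for a general parameterization the same loop carries no such guarantee.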

## Conditional Inference for Canonical Parameter

Suppose $\psi$ is a parameter of interest, either being itself a subvector $\boldsymbol{\theta}_{v}$ of $\boldsymbol{\theta}$, perhaps after a linear transformation of the canonical statistic $\boldsymbol{t}$, or being some other one-to-one function of $\boldsymbol{\theta}_{v}$. Corresponding to the partitioning $\boldsymbol{t}=(\boldsymbol{u}, \boldsymbol{v})$, we may write $\boldsymbol{\theta}$ as $(\lambda, \psi)$. Here $\lambda$, represented by $\boldsymbol{\theta}_{u}$ or $\boldsymbol{\mu}_{u}$, is regarded as a nuisance parameter, supplementing $\psi$. As shown in Section 3.3.3 above, $\lambda=\boldsymbol{\mu}_{u}=E_{\theta}(\boldsymbol{u})$ is the preferable nuisance parameter, at least in principle, since $\psi$ and $\boldsymbol{\mu}_{u}$ are variation independent and information orthogonal (Proposition 3.20).

Proposition 3.21 (Conditionality principle for full families). Statistical inference about the canonical parameter component $\psi$ in presence of the nuisance parameter $\lambda$ or $\boldsymbol{\mu}_{u}=E_{\theta}(\boldsymbol{u})$ should be made conditional on $\boldsymbol{u}$, that is, in the conditional model for $\boldsymbol{y}$ or $\boldsymbol{v}$ given $\boldsymbol{u}$.

Motivation. The likelihood for $\left(\boldsymbol{\mu}_{u}, \psi\right)$ factorizes as
$$L\left(\boldsymbol{\mu}_{u}, \psi ; \boldsymbol{t}\right)=L_{1}\left(\boldsymbol{\mu}_{u}, \psi ; \boldsymbol{u}\right) L_{2}(\psi ; \boldsymbol{v} \mid \boldsymbol{u}), \tag{3.28}$$
where the two parameters are variation independent. In some cases, exemplified by Example 3.2, $L_{1}$ depends only on $\boldsymbol{\mu}_{u}$,
$$L\left(\boldsymbol{\mu}_{u}, \psi ; \boldsymbol{t}\right)=L_{1}\left(\boldsymbol{\mu}_{u} ; \boldsymbol{u}\right) L_{2}(\psi ; \boldsymbol{v} \mid \boldsymbol{u}) .$$
Then it is clear that there is no information whatsoever about $\psi$ in the first factor $L_{1}$, and the argument for the principle is compelling. The terminology for this situation is that the factorization is called a cut, and $\boldsymbol{u}$ is called S-ancillary for $\psi$. But also when $L_{1}$ depends on $\psi$ (illustrated in Example 3.3), there is really no information about $\psi$ in $\boldsymbol{u}$, as seen by the following argument. Note first that $\boldsymbol{u}$ and $\boldsymbol{\mu}_{u}$ are of the same dimension (of course), and that $\boldsymbol{u}$ serves as an estimator (the MLE) of $\boldsymbol{\mu}_{u}$, whatever be $\psi$ (Proposition 3.11). This means that the information in $\boldsymbol{u}$ about $\left(\boldsymbol{\mu}_{u}, \psi\right)$ is totally consumed in the estimation of $\boldsymbol{\mu}_{u}$. Furthermore, the estimated value of $\boldsymbol{\mu}_{u}$ does not provide any information about $\psi$, and $\boldsymbol{\mu}_{u}$ would not do so even if it were known, due to the variation independence between $\boldsymbol{\mu}_{u}$ and $\psi$. Thus, the first factor $L_{1}$ contributes only information about $\boldsymbol{\mu}_{u}$ in (3.28).

## Two Poisson variates

Suppose we want to make inference about the relative change in a Poisson parameter from one occasion to another, with one observation per occasion.
Specifically, let $y_{1}$ and $y_{2}$ be independent Poisson distributed variables with mean values $e^{\lambda}$ and $e^{\psi+\lambda}$, respectively. The model is
$$f\left(y_{1}, y_{2} ; \lambda, \psi\right)=\frac{h\left(y_{1}, y_{2}\right)}{C(\lambda, \psi)} e^{\lambda\left(y_{1}+y_{2}\right)+\psi y_{2}}$$
The parameter of prime interest is $e^{\psi}$, or equivalently the canonical $\psi$. The conditionality principle states that the inference about $\psi$ should be made in the conditional model for $v=y_{2}$ given the other canonical statistic $u=y_{1}+y_{2}$. Simple but important calculations left as an exercise (Exercise 3.13) show that this model is the binomial distribution with logit parameter $\psi$, $\operatorname{Bin}\left\{y_{1}+y_{2} ; e^{\psi} /\left(1+e^{\psi}\right)\right\}$. The marginal distribution for $u=y_{1}+y_{2}$ is $\operatorname{Po}\left(\mu_{u}\right)$, not additionally involving $\psi$. Thus we have a cut, and S-ancillarity. Note the crucial role of the mixed parameterization to obtain a cut. If expressed instead in terms of the canonical parameters $\lambda$ and $\psi$ of the joint model, the Poisson parameter in the marginal for $u$ would have been dependent on both $\lambda$ and $\psi$, since $\mu_{u}=E\left(y_{1}+y_{2}\right)=e^{\lambda}\left(1+e^{\psi}\right)$.
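
The binomial and Poisson claims above can be checked numerically without working through the algebra. The sketch below is only a sanity check under assumed values: `lam`, `psi` and the conditioning value `u` are arbitrary choices for illustration, and the helper names are not from the text. It compares the conditional distribution of $y_{2}$ given $u$ with the binomial probabilities, and the marginal probability of $u$ with the Poisson probability at mean $e^{\lambda}\left(1+e^{\psi}\right)$.

```python
import math

def poisson_pmf(y, mean):
    return math.exp(-mean) * mean ** y / math.factorial(y)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

lam, psi = 0.3, 0.7                            # arbitrary illustrative values
mean1, mean2 = math.exp(lam), math.exp(lam + psi)
p = math.exp(psi) / (1.0 + math.exp(psi))      # logit(p) = psi

u = 6                                          # assumed observed y1 + y2
# Joint probabilities of (y1, y2) = (u - v, v) for v = 0, ..., u.
joint = [poisson_pmf(u - v, mean1) * poisson_pmf(v, mean2) for v in range(u + 1)]
marginal_u = sum(joint)
conditional = [j / marginal_u for j in joint]

# Largest discrepancy between the conditional law and Bin(u, p).
max_cond_err = max(abs(c - binomial_pmf(v, u, p)) for v, c in enumerate(conditional))
# Discrepancy between the marginal of u and Po(e^lam * (1 + e^psi)).
marg_err = abs(marginal_u - poisson_pmf(u, mean1 + mean2))
```

Both discrepancies are at rounding-error level. Changing `lam` while holding `psi` fixed alters `marginal_u` but leaves `conditional` unchanged, which is the cut in numerical form.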

Under S-ancillarity, the MLEs will be the same when derived in the conditional or marginal models as in the joint model, and the observed and expected information matrices in the joint model will be block diagonal, with blocks representing the conditional and marginal models. This can sometimes be used in the opposite way, by artificially extending a conditionally defined model to a simpler joint model with the same MLEs etc.; see Section 5.6. Outside S-ancillarity we cannot count on these properties, and must instead distinguish between joint model and conditional model MLEs. Here is one such example; see Example 3.6 for another one.

