### Asymptotic Properties of the MLE

## Large Sample Asymptotics

For simplicity of notation we suppose the exponential family is regular, so $\boldsymbol{\Theta}$ is open, but the results hold for any full family provided $\boldsymbol{\Theta}$ is restricted to its interior. Proposition 3.10 showed that the log-likelihood function is strictly concave and therefore has a unique maximum, corresponding to a unique root of the likelihood equations, provided there is a (finite) maximal value. We have seen examples in Chapter 3 of likelihoods without a finite maximum, for example in the binomial and Poisson families. We shall first show that, for a sample of size $n$ from a regular exponential family, the probability of such an event tends to zero with increasing $n$, and the MLE $\hat{\boldsymbol{\theta}}$ approaches the true $\boldsymbol{\theta}$, i.e. $\hat{\boldsymbol{\theta}}$ is a consistent estimator. From there, the step to an asymptotic Gaussian distribution of $\hat{\boldsymbol{\theta}}$ is a short one.
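As a rough illustration of how quickly a non-existent MLE becomes improbable, the sketch below evaluates the exact probability of that event in two one-parameter families. The function names and parameter values are illustrative choices, not taken from the text. For a Poisson sample the canonical MLE $\hat{\theta}=\log(t_n/n)$ fails to be finite only when $t_n=0$; for Bernoulli trials it fails only when all outcomes are equal.

```python
import math


def prob_no_mle_poisson(lam: float, n: int) -> float:
    """Pr(t_n = 0) for n i.i.d. Poisson(lam) observations.

    The sum t_n is Poisson(n * lam), so the probability is exp(-n * lam).
    """
    return math.exp(-n * lam)


def prob_no_mle_bernoulli(p: float, n: int) -> float:
    """Pr(t_n = 0 or t_n = n) for n i.i.d. Bernoulli(p) trials."""
    return p ** n + (1.0 - p) ** n


# Illustrative parameter values (assumptions): lam = 0.5 and p = 0.2.
for n in (1, 5, 20, 100):
    print(n, prob_no_mle_poisson(0.5, n), prob_no_mle_bernoulli(0.2, n))
```

Both probabilities decay geometrically in $n$, which is much faster than the law-of-large-numbers argument alone would guarantee.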

We shall use the following notation. Let $\boldsymbol{t}(y)$ be the canonical statistic for a single observation, and $\boldsymbol{t}_{n}=\sum_{i} \boldsymbol{t}\left(y_{i}\right)$ the corresponding canonical statistic for the whole sample. Let $\boldsymbol{\mu}_{t}(\boldsymbol{\theta})=E_{\boldsymbol{\theta}}\{\boldsymbol{t}(y)\}=E_{\boldsymbol{\theta}}\left\{\boldsymbol{t}_{n} / n\right\}$, i.e. the mean value per observational unit. The one-to-one canonical and mean-value parameterizations by $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$ are related by $\hat{\boldsymbol{\theta}}(\boldsymbol{t})=\boldsymbol{\mu}^{-1}(\boldsymbol{t})$, and for a regular family both $\boldsymbol{\Theta}$ and $\boldsymbol{\mu}(\boldsymbol{\Theta})$ are open sets in $\mathbb{R}^{k}$, where $k=\operatorname{dim} \boldsymbol{\theta}$.

Existence and consistency of the MLE are essentially a simple application of the law of large numbers under the mean value parameterization (where the MLE is $\hat{\boldsymbol{\mu}}_{t}=\boldsymbol{t}_{n} / n$), followed by a reparameterization. In an analogous way, asymptotic normality of the MLE follows from the central limit theorem applied to $\hat{\boldsymbol{\mu}}_{t}$. However, we will also indicate stronger versions of these results, utilizing more of the exponential family structure.
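The two-step argument (law of large numbers for $\boldsymbol{t}_{n}/n$, then reparameterization) can be sketched numerically. The example below assumes Bernoulli trials, for which $t(y)=y$, $\mu_{t}(\theta)=e^{\theta}/(1+e^{\theta})$ and $\hat{\theta}(t)=\log\{t/(1-t)\}$. The seed, sample sizes and function names are arbitrary illustrative choices.

```python
import math
import random


def mu_of_theta(theta: float) -> float:
    """Mean value parameter of a Bernoulli trial with canonical parameter theta."""
    return 1.0 / (1.0 + math.exp(-theta))


def theta_hat(t_bar: float):
    """Inverse map mu^{-1}; returns None when the MLE does not exist."""
    if not 0.0 < t_bar < 1.0:
        return None
    return math.log(t_bar / (1.0 - t_bar))


def simulate_theta_hat(theta: float, n: int, rng: random.Random):
    """Draw n Bernoulli trials and return the canonical MLE (or None)."""
    p = mu_of_theta(theta)
    t_n = sum(1 for _ in range(n) if rng.random() < p)
    return theta_hat(t_n / n)


rng = random.Random(2024)  # fixed seed, chosen arbitrarily
theta_true = -0.8          # illustrative true value
for n in (10, 100, 1000, 10000):
    print(n, simulate_theta_hat(theta_true, n, rng))
```

The mean value MLE is simply the sample average; all the work in passing to $\hat{\theta}$ is done by the continuous inverse map, which is undefined only at the boundary values 0 and 1.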

## Existence and consistency of the MLE of θ

For a sample of size $n$ from a regular exponential family, and for any $\boldsymbol{\theta} \in \boldsymbol{\Theta}$,
$$\operatorname{Pr}\left\{\hat{\boldsymbol{\theta}}\left(\boldsymbol{t}_{n} / n\right) \text { exists; } \boldsymbol{\theta}\right\} \rightarrow 1 \text { as } n \rightarrow \infty,$$
and furthermore,
$$\hat{\boldsymbol{\theta}}\left(\boldsymbol{t}_{n} / n\right) \rightarrow \boldsymbol{\theta} \text { in probability as } n \rightarrow \infty.$$
These convergences are uniform on compact subsets of $\boldsymbol{\Theta}$.

Proof. By the (weak) law of large numbers (Khinchine version), $\boldsymbol{t}_{n} / n \rightarrow \boldsymbol{\mu}_{t}\left(=\boldsymbol{\mu}_{t}(\boldsymbol{\theta})\right)$ in probability as $n \rightarrow \infty$. More precisely, for any fixed $\delta>0$,
$$\operatorname{Pr}\left\{\left|\boldsymbol{t}_{n} / n-\boldsymbol{\mu}_{t}\right|<\delta\right\} \rightarrow 1. \tag{4.1}$$
Note that as soon as $\delta>0$ is small enough, the $\delta$-neighbourhood (4.1) of $\boldsymbol{\mu}_{t}$ is wholly contained in the open set $\boldsymbol{\mu}(\boldsymbol{\Theta})$, and that $\boldsymbol{t}_{n} / n$ is identical with the MLE $\hat{\boldsymbol{\mu}}_{t}$. This is the existence and consistency result for the MLE of the mean value parameter, $\hat{\boldsymbol{\mu}}_{t}$. Next we transform this to a result for $\hat{\boldsymbol{\theta}}$ in $\boldsymbol{\Theta}$.

Consider an open $\delta^{\prime}$-neighbourhood of $\boldsymbol{\theta}$. For $\delta^{\prime}>0$ small enough, it is wholly within the open set $\boldsymbol{\Theta}$. There the image function $\boldsymbol{\mu}_{t}(\boldsymbol{\theta})$ is well-defined, and the image of the $\delta^{\prime}$-neighbourhood of $\boldsymbol{\theta}$ is an open neighbourhood of $\boldsymbol{\mu}(\boldsymbol{\theta})$ in $\boldsymbol{\mu}(\boldsymbol{\Theta})$ (since $\hat{\boldsymbol{\theta}}=\boldsymbol{\mu}_{t}^{-1}$ is a continuous function). Inside that open neighbourhood we can always find (for some $\delta>0$) an open $\delta$-neighbourhood of type (4.1), whose probability goes to 1. Thus, the probability that $\hat{\boldsymbol{\theta}}\left(\boldsymbol{t}_{n} / n\right)$ lies in the $\delta^{\prime}$-neighbourhood of $\boldsymbol{\theta}$ also goes to 1. This shows the asymptotic existence and consistency of $\hat{\boldsymbol{\theta}}$ for any fixed $\boldsymbol{\theta}$ in $\boldsymbol{\Theta}$.

Finally, this can be strengthened (Martin-Löf, 1970) to uniform convergence on compact subsets of the parameter space $\boldsymbol{\Theta}$. Specifically, Chebyshev’s inequality can be used to give a more explicit upper bound of order $1 / n$ to the complementary probability, $\operatorname{Pr}\left\{\left|\boldsymbol{t}_{n} / n-\boldsymbol{\mu}_{t}(\boldsymbol{\theta})\right| \geq \delta\right\}$, proportional to $\operatorname{var}_{\theta}(\boldsymbol{t})$, which is bounded on compact subsets of $\boldsymbol{\Theta}$, since it is a continuous function of $\boldsymbol{\theta}$. Details are omitted.
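To make the Chebyshev step concrete, here is a small sketch for Bernoulli trials, where $\operatorname{var}_{\theta}(t)=p(1-p)$ with $p=\mu_{t}(\theta)$. It compares the exact tail probability with the bound $\operatorname{var}_{\theta}(t)/(n\delta^{2})$, and with the cruder bound obtained by maximizing the variance over a compact parameter interval. The choice of family, the interval $[0.1, 0.9]$ and all function names are assumptions made for illustration; this is not a reproduction of the argument in Martin-Löf (1970).

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of k, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def exact_tail(p: float, n: int, delta: float) -> float:
    """Exact Pr(|t_n / n - p| >= delta) for n Bernoulli(p) trials."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) >= delta)


def chebyshev_bound(p: float, n: int, delta: float) -> float:
    """Chebyshev bound var(t) / (n * delta^2), of order 1/n."""
    return p * (1.0 - p) / (n * delta ** 2)


def uniform_bound(p_lo: float, p_hi: float, n: int, delta: float) -> float:
    """Bound valid for every p in [p_lo, p_hi] simultaneously.

    p * (1 - p) is largest at the point of the interval closest to 1/2.
    """
    p_star = min(max(0.5, p_lo), p_hi)
    return chebyshev_bound(p_star, n, delta)


delta = 0.1  # illustrative neighbourhood radius
for n in (50, 500, 5000):
    print(n, exact_tail(0.3, n, delta), chebyshev_bound(0.3, n, delta),
          uniform_bound(0.1, 0.9, n, delta))
```

The uniform bound does not depend on the particular parameter value inside the interval, which is the sense in which the convergence is uniform on compacts.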

## Uniform convergence on compacts

General results tell us that, under a set of suitable regularity conditions, the MLE is correspondingly asymptotically normally distributed as $n \rightarrow \infty$, but with the asymptotic variance expressed as the inverse of the Fisher information. First note that in a regular exponential family we require no additional regularity conditions. Secondly, the variance $\operatorname{var}\left(\boldsymbol{t}_{n} / n\right)=\operatorname{var}(\boldsymbol{t}) / n$ of $\hat{\boldsymbol{\mu}}_{t}$ is precisely the inverse of the Fisher information matrix for $\boldsymbol{\mu}_{t}$ (Proposition 3.15), and unless $\psi\left(\boldsymbol{\mu}_{t}\right)$ is of lower dimension than $\boldsymbol{\mu}_{t}$ itself, the variance formula in Proposition 4.3 can alternatively be obtained from $\operatorname{var}(\boldsymbol{t})$ by use of the Reparameterization lemma (Proposition 3.14). When $\psi$ and $\boldsymbol{\mu}_{t}$ are not one-to-one, but $\operatorname{dim} \psi\left(\boldsymbol{\mu}_{t}\right)<\operatorname{dim} \boldsymbol{\mu}_{t}$, e.g. $\psi$ a subvector of $\boldsymbol{\mu}_{t}$, the theory of profile likelihoods (Section 3.3.4) can be useful, yielding correct formulas expressed in Fisher information terms. In a concrete situation we are of course free to calculate the estimator variance directly from the explicit form of $\hat{\psi}$. In any case, a conditional inference approach, eliminating nuisance parameters (Section 3.5), should also be considered. Asymptotically, however, we should not expect the conditional and unconditional results to differ, unless the nuisance parameters are incidental and increase in number with $n$, as in Exercise 3.16.
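The variance claims above can be checked in a simple case. The sketch below assumes Bernoulli trials and takes $\psi$ to be the canonical parameter, $\psi(\mu_{t})=\log\{\mu_{t}/(1-\mu_{t})\}$, so that reparameterization gives the large-sample variance $(\partial\psi/\partial\mu_{t})^{2}\operatorname{var}(t)/n=1/\{n\,\mu_{t}(1-\mu_{t})\}$. It computes the exact variance of $\hat{\psi}$ from the binomial distribution, conditional on the MLE existing, and compares it with this approximation. The family, the parameter values and the function names are illustrative assumptions.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of k, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def exact_var_logit_mle(p: float, n: int) -> float:
    """Exact variance of log(t_n / (n - t_n)), conditional on 0 < t_n < n."""
    ks = range(1, n)
    weights = [binom_pmf(k, n, p) for k in ks]
    total = sum(weights)
    values = [math.log(k / (n - k)) for k in ks]
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total


def asymptotic_var_logit_mle(p: float, n: int) -> float:
    """Reparameterized variance: (d psi / d mu)^2 * var(t) / n."""
    d_psi_d_mu = 1.0 / (p * (1.0 - p))
    var_mu_hat = p * (1.0 - p) / n
    return d_psi_d_mu ** 2 * var_mu_hat


p = 0.3  # illustrative mean value parameter
for n in (25, 100, 400):
    print(n, exact_var_logit_mle(p, n), asymptotic_var_logit_mle(p, n))
```

For this choice of $\psi$ the approximation reduces to $\operatorname{var}(t)^{-1}/n$, and the ratio of exact to approximate variance should approach 1 as $n$ grows.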

Note the special case of the canonical parameterization, with $\boldsymbol{\theta}$ as canonical parameter for the distribution of the single observation ($n=1$). For this parameter, the asymptotic variance of the MLE is $\operatorname{var}(\boldsymbol{t})^{-1} / n$, where $\boldsymbol{t}$ is the single observation statistic. This is the special case of Proposition 4.3 for which
$$\left(\frac{\partial \psi}{\partial \mu_{t}}\right)=\left(\frac{\partial \theta}{\partial \mu_{t}}\right)=\left(\frac{\partial \mu_{t}}{\partial \theta}\right)^{-1}=\operatorname{var}_{\theta}(t)^{-1}.$$

As in general large-sample statistical theory, we may use the result of Proposition 4.3 to construct asymptotically correct confidence regions. For example, expressed for the canonical parameter $\boldsymbol{\theta}$, it follows from Proposition 4.3 that the quadratic form in $\hat{\boldsymbol{\theta}}$,
$$Q(\boldsymbol{\theta})=n(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^{T} \operatorname{var}_{\theta}(\boldsymbol{t})(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})$$

