## Reparameterization lemma

If $\psi$ and $\theta=\theta(\psi)$ are two equivalent parameterizations of the same model, or if parameterization by $\psi$ represents a lower-dimensional subfamily of a family parameterized by $\boldsymbol{\theta}$, via $\boldsymbol{\theta}=\boldsymbol{\theta}(\boldsymbol{\psi})$, then the score functions are related by
$$U_{\psi}(\boldsymbol{\psi} ; \boldsymbol{y})=\left(\frac{\partial \boldsymbol{\theta}}{\partial \boldsymbol{\psi}}\right)^{T} U_{\theta}(\boldsymbol{\theta}(\boldsymbol{\psi}) ; \boldsymbol{y}),$$
and the corresponding information matrices are related by the equations
$$I_{\psi}(\psi)=\left(\frac{\partial \theta}{\partial \psi}\right)^{T} I_{\theta}(\theta(\psi))\left(\frac{\partial \theta}{\partial \psi}\right) \tag{3.16}$$
$$J_{\psi}(\hat{\psi})=\left(\frac{\partial \theta}{\partial \psi}\right)^{T} J_{\theta}(\theta(\hat{\psi}))\left(\frac{\partial \theta}{\partial \psi}\right) \tag{3.17}$$
where each information matrix refers to its own parameterization and is evaluated at the point $\psi$ or $\theta(\psi)$, respectively, and where $\left(\frac{\partial \theta}{\partial \psi}\right)$ is the Jacobian matrix of the reparameterization, evaluated at the same point as the information matrix.

Note that it is generally crucial to distinguish the Jacobian matrix from its transpose. The definition of the Jacobian is such that the $(i, j)$ element of $\left(\frac{\partial \theta}{\partial \psi}\right)$ is the partial derivative of $\theta_{i}$ with respect to $\psi_{j}$.

When $\psi$ and $\theta$ are two equivalent parameterizations, it is sometimes easier to derive $\left(\frac{\partial \psi}{\partial \theta}\right)$ than $\left(\frac{\partial \theta}{\partial \psi}\right)$ for use in the reparameterization expressions. Note then that the two matrices are inverses of each other, cf. Section B.2.
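As a worked illustration of the inverse relationship, consider an assumed example that is not taken from the text above: the normal distribution with $\boldsymbol{\psi}=(\mu, \sigma^{2})$ and canonical parameter $\boldsymbol{\theta}=(\theta_{1}, \theta_{2})=\left(\mu / \sigma^{2},\,-1 /(2 \sigma^{2})\right)$, so that $\mu=-\theta_{1} /(2 \theta_{2})$ and $\sigma^{2}=-1 /(2 \theta_{2})$. Differentiating in each direction and expressing both results in terms of $(\mu, \sigma^{2})$ gives

$$\left(\frac{\partial \boldsymbol{\theta}}{\partial \boldsymbol{\psi}}\right)=\begin{pmatrix} 1 / \sigma^{2} & -\mu / \sigma^{4} \\ 0 & 1 /(2 \sigma^{4}) \end{pmatrix}, \qquad \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{\theta}}\right)=\begin{pmatrix} \sigma^{2} & 2 \mu \sigma^{2} \\ 0 & 2 \sigma^{4} \end{pmatrix}$$

$$\left(\frac{\partial \boldsymbol{\theta}}{\partial \boldsymbol{\psi}}\right)\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{\theta}}\right)=\begin{pmatrix} 1 & 2 \mu-2 \mu \\ 0 & 1 \end{pmatrix}=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

In each matrix the row index follows the differentiated parameter and the column index follows the parameter differentiated with respect to, in line with the $(i, j)$ convention stated above.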

Note also that Proposition $3.14$ does not allow $\psi$ to be a function of $\boldsymbol{\theta}$, so $\psi$ cannot be, for example, a component or subvector of $\boldsymbol{\theta}$. Reducing the score to a subvector of itself, and the information matrix to the corresponding submatrix, is legitimate only when the excluded parameter subvector is regarded as known (cf. Proposition 4.3).

Note finally that the relationship (3.16) between the two Fisher informations $I$ holds for any $\psi$, whereas the corresponding relation (3.17) for the observed information $J$ only holds in the ML point. The reason for the latter fact will be clear from the following proof.

Proof. Repeated use of the chain rule for differentiation yields first
$$D_{\psi} \log L=\left(\frac{\partial \theta}{\partial \psi}\right)^{T} D_{\theta} \log L$$
and next
$$D_{\psi}^{2} \log L=\left(\frac{\partial \theta}{\partial \psi}\right)^{T} D_{\theta}^{2} \log L\left(\frac{\partial \theta}{\partial \psi}\right)+\left(\frac{\partial^{2} \theta}{\partial \psi^{2}}\right)^{T} D_{\theta} \log L \tag{3.19}$$
where the last term represents a matrix obtained when the three-dimensional array $\left(\frac{\partial^{2} \theta}{\partial \psi^{2}}\right)^{T}$ is multiplied by a vector. At the ML point this last term of (3.19) vanishes, since $D_{\theta} \log L=0$ there, and (3.17) is obtained. Also, if we take expected values, the same term vanishes because the score function has expected value zero, and the corresponding relation (3.16) between the expected informations follows.
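The difference between the ML point and other points can be checked numerically. The following is a minimal sketch under assumptions that are not part of the text: a small made-up Poisson sample, the canonical parameter $\theta=\log \mu$, and the mean $\psi=\mu$ as the alternative parameterization, so that the Jacobian is the scalar $1/\mu$. Second derivatives are approximated by central finite differences, and all function names are hypothetical.

```python
import math

# Hypothetical Poisson sample; theta = log(mu) is canonical, psi = mu is the mean.
y = [2, 0, 3, 1, 4]
n, total = len(y), sum(y)

def loglik_theta(theta):
    # Log-likelihood in the canonical parameter, up to an additive constant.
    return theta * total - n * math.exp(theta)

def loglik_mu(mu):
    # The same log-likelihood expressed in the mean parameter mu = exp(theta).
    return loglik_theta(math.log(mu))

def second_derivative(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def observed_info_theta(theta):
    return -second_derivative(loglik_theta, theta)

def observed_info_mu(mu):
    return -second_derivative(loglik_mu, mu)

def transformed_info(mu):
    # Right-hand side of the lemma: (d theta / d mu) * J_theta(theta(mu)) * (d theta / d mu).
    jac = 1.0 / mu
    return jac * observed_info_theta(math.log(mu)) * jac

mu_hat = total / n        # ML estimate of the mean
mu_other = mu_hat + 1.0   # a point away from the ML estimate

gap_at_mle = abs(observed_info_mu(mu_hat) - transformed_info(mu_hat))
gap_elsewhere = abs(observed_info_mu(mu_other) - transformed_info(mu_other))
print(gap_at_mle, gap_elsewhere)
```

In this sketch the first gap should be zero up to finite-difference error, while the second is clearly nonzero: away from $\hat{\mu}$ the score $D_{\theta} \log L$ does not vanish, so the extra second-derivative term of the chain rule contributes.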

## Mixed parameterization

The mixed parameterization is a valid parameterization.

Proof of Corollary $3.17$. From Proposition $3.16$ it follows that the conditional density can be parameterized by $\theta_{v}$. When $\theta_{v}$ has been specified, Proposition $3.16$ additionally tells us that the marginal for $u$ is an exponential family, which can be parameterized, for example, by its mean value $\mu_{u}$. Finally, since the joint density for $u$ and $v$ is the product of these marginal and conditional densities, it follows that the mixed parameterization with $\left(\mu_{u}, \theta_{v}\right)$ is also a valid parameterization of the family for $t$.

Proof of Proposition $3.16$. The marginal density for $\boldsymbol{u}$ is obtained by integrating the density (1.8) for $\boldsymbol{t}$ with respect to $\boldsymbol{v}$, for fixed $\boldsymbol{u}$:
$$\begin{aligned} f(\boldsymbol{u} ; \boldsymbol{\theta}) &=\int a(\boldsymbol{\theta}) g(\boldsymbol{u}, \boldsymbol{v}) e^{\boldsymbol{\theta}_{u}^{T} \boldsymbol{u}+\boldsymbol{\theta}_{v}^{T} \boldsymbol{v}} \mathrm{d} \boldsymbol{v} \\ &=a(\boldsymbol{\theta}) e^{\boldsymbol{\theta}_{u}^{T} \boldsymbol{u}} \int g(\boldsymbol{u}, \boldsymbol{v}) e^{\boldsymbol{\theta}_{v}^{T} \boldsymbol{v}} \mathrm{d} \boldsymbol{v} \end{aligned} \tag{3.21}$$
The integral factor on the second line is generally a function of both data and the parameter $\boldsymbol{\theta}_{v}$, which destroys the exponential family property. However, for each given $\boldsymbol{\theta}_{v}$, this factor is only a function of data, and then the family is exponential. Its canonical parameter is $\boldsymbol{\theta}_{u}$, but the parameter space for $\boldsymbol{\theta}_{u}$ may depend on $\boldsymbol{\theta}_{v}$, being the intersection of $\boldsymbol{\Theta}$ with the hyperplane or affine subspace $\boldsymbol{\theta}_{v}=$ fixed. Since this exponential family is automatically regular in $\boldsymbol{\theta}_{u}$, the range for its mean value $\mu_{u}$, that is, its mean value parameter space, is identical with the interior of the closed convex hull of the support of the family of distributions for $u$, according to Proposition 3.13. However, since this set is independent of $\boldsymbol{\theta}_{v}$, the parameter space for $\mu_{u}$ is also independent of $\boldsymbol{\theta}_{v}$, as was to be shown.

For the conditional model we do not bother about possible mathematical technicalities connected with conditional densities, but simply write as for conditional probabilities, using the marginal density (3.21) to simplify for $\boldsymbol{u}=\boldsymbol{u}(\boldsymbol{y})$ the expression:
$$\begin{aligned} f(\boldsymbol{y} \mid \boldsymbol{u} ; \boldsymbol{\theta}) &=f(\boldsymbol{u}, \boldsymbol{y} ; \boldsymbol{\theta}) / f(\boldsymbol{u} ; \boldsymbol{\theta}) \\ &=\frac{e^{\boldsymbol{\theta}_{v}^{T} \boldsymbol{v}(\boldsymbol{y})} h(\boldsymbol{y})}{\int e^{\boldsymbol{\theta}_{v}^{T} \boldsymbol{v}} g(\boldsymbol{u}, \boldsymbol{v}) \mathrm{d} \boldsymbol{v}} . \end{aligned} \tag{3.22}$$
To obtain $f(\boldsymbol{v} \mid \boldsymbol{u} ; \boldsymbol{\theta})$ we only substitute $g(\boldsymbol{u}, \boldsymbol{v})$ for $h(\boldsymbol{y})$ in the numerator of the exponential family density (3.22). Note that $f(\boldsymbol{v} \mid \boldsymbol{u} ; \boldsymbol{\theta})$ is defined only for those $v$-values which are possible for the given $u$, and the denominator is the integral over this set of values. This makes the model depend on $u$, even though the canonical parameter is the same, $\boldsymbol{\theta}_{v}$, and in particular the conditional mean and variance of $v$ will usually depend on $u$.
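A small numerical sketch can make the conditional result concrete. The setting is an assumption chosen for illustration, not an example from the text: two independent Poisson counts $y_{1}, y_{2}$ with means $\lambda_{1}, \lambda_{2}$, with $u=y_{1}+y_{2}$ and $v=y_{1}$, for which $\theta_{u}=\log \lambda_{2}$ and $\theta_{v}=\log (\lambda_{1} / \lambda_{2})$. The conditional distribution of $v$ given $u$ should then be binomial with index $u$ and logit equal to $\theta_{v}$, free of $\theta_{u}$ but depending on $u$ through its support. The function names and numerical values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def conditional_pmf(k, u, lam1, lam2):
    # P(y1 = k | y1 + y2 = u), computed directly from the joint Poisson model.
    joint = poisson_pmf(k, lam1) * poisson_pmf(u - k, lam2)
    marginal = sum(
        poisson_pmf(j, lam1) * poisson_pmf(u - j, lam2) for j in range(u + 1)
    )
    return joint / marginal

def binomial_pmf(k, u, theta_v):
    # Binomial(u, p) with logit(p) = theta_v, the canonical parameter for v = y1.
    p = 1.0 / (1.0 + math.exp(-theta_v))
    return math.comb(u, k) * p ** k * (1.0 - p) ** (u - k)

lam1, lam2 = 1.5, 4.0            # hypothetical Poisson means
theta_v = math.log(lam1 / lam2)  # canonical parameter attached to v
u = 6                            # conditioning value of u = y1 + y2

max_gap = max(
    abs(conditional_pmf(k, u, lam1, lam2) - binomial_pmf(k, u, theta_v))
    for k in range(u + 1)
)

# Scaling both means by the same factor changes theta_u but not theta_v,
# so the conditional distribution should be unaffected.
max_gap_scaled = max(
    abs(conditional_pmf(k, u, 3 * lam1, 3 * lam2) - binomial_pmf(k, u, theta_v))
    for k in range(u + 1)
)
print(max_gap, max_gap_scaled)
```

Both gaps should be at rounding-error level. In the same assumed setting the marginal of $u$ is Poisson with mean $\mu_{u}=\lambda_{1}+\lambda_{2}$, whose range $(0, \infty)$ does not depend on $\theta_{v}$, so $(\mu_{u}, \theta_{v})$ illustrates the mixed parameterization.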

## Marginality and conditionality for Gaussian sample

As an illustration of Proposition 3.16, consider $u=\sum y_{i}^{2}$ for a sample from the normal distribution, with $v=\sum y_{i}$. First, the marginal distribution of $\sum y_{i}^{2}$ is proportional to a noncentral $\chi^{2}$ and does not in general form an exponential family. Next, consider instead its conditional distribution, given $\sum y_{i}=n \bar{y}$. This might appear quite complicated, but given $\bar{y}$, $\sum y_{i}^{2}$ differs by only an additive, known constant from $\sum y_{i}^{2}-n \bar{y}^{2}=(n-1) s^{2}$. Thus, it is enough to characterize the distribution of $(n-1) s^{2}$, and it is well known that $s^{2}$ is independent of $\bar{y}$ (see also Section 3.7), and that the distribution of $(n-1) s^{2}$ is proportional (by $\sigma^{2}$) to a (central) $\chi^{2}(n-1)$. From the explicit form of a $\chi^{2}$ it is easily seen that the conditional distribution forms an exponential family (Reader, check!).
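One way to carry out the check is sketched here, with $w=(n-1) s^{2}$ as notation introduced only for this sketch. Since $w / \sigma^{2}$ is $\chi^{2}(n-1)$, the standard $\chi^{2}$ density and a change of scale give, for $w>0$,

$$f\left(w ; \sigma^{2}\right)=\frac{w^{(n-1) / 2-1}}{\Gamma\left(\frac{n-1}{2}\right)}\left(2 \sigma^{2}\right)^{-(n-1) / 2} \exp \left(-\frac{w}{2 \sigma^{2}}\right)$$

Writing $\theta=-1 /(2 \sigma^{2})<0$, this is

$$f(w ; \theta)=a(\theta)\, h(w)\, e^{\theta w}, \qquad a(\theta)=(-\theta)^{(n-1) / 2}, \qquad h(w)=\frac{w^{(n-3) / 2}}{\Gamma\left(\frac{n-1}{2}\right)}$$

which has exponential family form with canonical statistic $w$ and canonical parameter $\theta=-1 /(2 \sigma^{2})$. Because $\sum y_{i}^{2}=w+n \bar{y}^{2}$ with $n \bar{y}^{2}$ known given $\bar{y}$, the same canonical parameter serves for the conditional distribution of $\sum y_{i}^{2}$.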

The canonical parameter space for the conditional family is at least as large as the maximal set of $\boldsymbol{\theta}_{v}$-values in $\boldsymbol{\Theta}$, that is, the set of $\boldsymbol{\theta}_{v}$ such that $\left(\boldsymbol{\theta}_{u}, \boldsymbol{\theta}_{v}\right) \in \boldsymbol{\Theta}$ for some $\boldsymbol{\theta}_{u}$. Typically the two sets are the same, but there are models in which the conditional canonical parameter space is actually larger and includes parameter values $\boldsymbol{\theta}_{v}$ that lack interpretation in the joint model; see for example the Strauss model in Section 13.1.2 and the model in Section 14.3(b).


