### ANOVA Model

## Introduction

An important model belonging to the class of general linear hypotheses is the analysis of variance (ANOVA) model. In this model, we assess $p$ treatment effects from sample experiments of sizes $n_{1}, n_{2}, \ldots, n_{p}$, respectively, with responses $\left\{\left(y_{i 1}, \ldots, y_{i n_{i}}\right)^{\top} ; i=1,2, \ldots, p\right\}$ that satisfy the model $y_{i j}=\theta_{i}+e_{i j}$ $\left(j=1, \ldots, n_{i};\ i=1, \ldots, p\right)$. The main objective of the chapter is to select the treatments that would yield the best results. Accordingly, we consider the penalty estimators, namely ridge, the subset selection rule, and the least absolute shrinkage and selection operator (LASSO), together with the classical shrinkage estimators, namely the preliminary test estimator (PTE), the Stein-type estimator (SE), and the positive-rule Stein-type estimator (PRSE) of $\boldsymbol{\theta}=\left(\theta_{1}, \ldots, \theta_{p}\right)^{\top}$. For LASSO and related methods, see Breiman (1996), Fan and Li (2001), Zou and Hastie (2005), and Zou (2006), among others; for PTE and SE, see Judge and Bock (1978) and Saleh (2006), among others.

The chapter points to the useful “selection” aspect of LASSO and ridge estimators as well as limitations found in other papers. Our conclusions are based on the ideal $\mathrm{L}_{2}$ risk of LASSO of an oracle which would supply optimal coefficients in a diagonal projection scheme given by Donoho and Johnstone (1994, p. 437). The comparison of the estimators considered here is based on mathematical analysis and on tables and graphs of $\mathrm{L}_{2}$-risk efficiencies, not on simulation.

In his pioneering paper, Tibshirani (1996) examined the relative performance of subset selection, ridge regression, and LASSO in three different scenarios, under an orthogonal design matrix in a linear regression model:
(a) Small number of large coefficients: subset selection does the best here, the LASSO not quite as well, and ridge regression does quite poorly.
(b) Small to moderate number of moderate-size coefficients: LASSO does the best, followed by ridge regression and then subset selection.
(c) Large number of small coefficients: ridge regression does the best by a good margin, followed by LASSO and then subset selection.
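
Under an orthonormal design the three estimators compared by Tibshirani reduce to coordinate-wise rules applied to the least squares coefficients, which makes the contrast between the scenarios easier to see. The following is a minimal sketch of those rules; the function names, the coefficient vector `z`, and the tuning value `lam` are illustrative assumptions, and the exact scaling of the tuning parameter depends on how the penalty is parametrized.

```python
import numpy as np

def ridge_shrink(z, lam):
    """Ridge under an orthonormal design: proportional shrinkage z / (1 + lam)."""
    return z / (1.0 + lam)

def hard_threshold(z, lam):
    """Subset selection: keep a coefficient only when |z| exceeds lam."""
    return np.where(np.abs(z) > lam, z, 0.0)

def soft_threshold(z, lam):
    """LASSO: move each coefficient toward zero by lam, truncating at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical least squares coefficients and tuning parameter.
z = np.array([3.0, -0.5, 1.2, 0.1])
lam = 1.0

print("ridge :", ridge_shrink(z, lam))
print("subset:", hard_threshold(z, lam))
print("lasso :", soft_threshold(z, lam))
```

Ridge never sets a coefficient exactly to zero, subset selection keeps large coefficients unshrunk, and LASSO both shrinks and selects.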

## Model, Estimation, and Tests

Consider the ANOVA model
$$\boldsymbol{Y}=\boldsymbol{B} \boldsymbol{\theta}+\boldsymbol{\epsilon}=\boldsymbol{B}_{1} \boldsymbol{\theta}_{1}+\boldsymbol{B}_{2} \boldsymbol{\theta}_{2}+\boldsymbol{\epsilon}, \tag{3.1}$$
where $\boldsymbol{Y}=\left(y_{11}, \ldots, y_{1 n_{1}}, \ldots, y_{p 1}, \ldots, y_{p n_{p}}\right)^{\top}$ and $\boldsymbol{\theta}=\left(\theta_{1}, \ldots, \theta_{p_{1}}, \theta_{p_{1}+1}, \ldots, \theta_{p}\right)^{\top}$ is the unknown vector that can be partitioned as $\boldsymbol{\theta}=\left(\boldsymbol{\theta}_{1}^{\top}, \boldsymbol{\theta}_{2}^{\top}\right)^{\top}$, where $\boldsymbol{\theta}_{1}=\left(\theta_{1}, \ldots, \theta_{p_{1}}\right)^{\top}$ and $\boldsymbol{\theta}_{2}=\left(\theta_{p_{1}+1}, \ldots, \theta_{p}\right)^{\top}$.

The error vector is $\boldsymbol{\epsilon}=\left(\epsilon_{11}, \ldots, \epsilon_{1 n_{1}}, \ldots, \epsilon_{p 1}, \ldots, \epsilon_{p n_{p}}\right)^{\top}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}_{n}\left(\mathbf{0}, \sigma^{2} \boldsymbol{I}_{n}\right)$. The notation $\boldsymbol{B}$ stands for the block-diagonal matrix of $\left(\mathbf{1}_{n_{1}}, \ldots, \mathbf{1}_{n_{p}}\right)$, which can be partitioned into two matrices $\boldsymbol{B}_{1}$ and $\boldsymbol{B}_{2}$ as $\left(\boldsymbol{B}_{1}, \boldsymbol{B}_{2}\right)$, where $\mathbf{1}_{n_{i}}=(1, \ldots, 1)^{\top}$ is an $n_{i}$-tuple of 1s, $\boldsymbol{I}_{n}$ is the $n$-dimensional identity matrix with $n=n_{1}+\cdots+n_{p}$, and $\sigma^{2}$ is the known variance of the errors.
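
To make the structure of $\boldsymbol{B}$ concrete, the sketch below builds the block-diagonal design for a small layout and simulates responses from the model. The group sizes, treatment effects, error standard deviation, seed, and the helper name `design_matrix` are all hypothetical choices made for illustration.

```python
import numpy as np

def design_matrix(sizes):
    """Build B: column i holds ones on the rows of group i, zeros elsewhere."""
    n, p = sum(sizes), len(sizes)
    B = np.zeros((n, p))
    start = 0
    for i, n_i in enumerate(sizes):
        B[start:start + n_i, i] = 1.0
        start += n_i
    return B

sizes = [3, 2, 4]                      # hypothetical n_1, n_2, n_3
theta = np.array([1.0, 0.0, -2.0])     # hypothetical treatment effects
sigma = 0.5                            # hypothetical error standard deviation
rng = np.random.default_rng(0)         # fixed seed, for reproducibility only

B = design_matrix(sizes)
Y = B @ theta + sigma * rng.standard_normal(B.shape[0])

# Partition B = (B1, B2) with p1 = 2 columns in the first block.
p1 = 2
B1, B2 = B[:, :p1], B[:, p1:]

print(B.T @ B)      # diagonal matrix of the group sizes
print(B1.T @ B2)    # zero matrix: columns of B1 and B2 do not overlap
```

Because each observation belongs to exactly one treatment, the columns of $\boldsymbol{B}$ are mutually orthogonal, so $\boldsymbol{B}^{\top} \boldsymbol{B}$ is diagonal and $\boldsymbol{B}_{1}^{\top} \boldsymbol{B}_{2}=\mathbf{0}$.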

Our objective is to estimate and select the treatments $\theta=\left(\theta_{1}, \ldots, \theta_{p}\right)^{\top}$ when we suspect that the subset $\theta_{2}=\left(\theta_{p_{1}+1}, \ldots, \theta_{p}\right)^{\top}$ may be $\mathbf{0}$, i.e. ineffective. Thus, we consider the model (3.1) and discuss the LSE of $\theta$ in Section 3.2.1.

## Estimation of Treatment Effects

First, we consider the unrestricted LSE of $\boldsymbol{\theta}=\left(\boldsymbol{\theta}_{1}^{\top}, \boldsymbol{\theta}_{2}^{\top}\right)^{\top}$ given by
$$\begin{aligned}
\tilde{\boldsymbol{\theta}}_{n} &=\operatorname{argmin}_{\boldsymbol{\theta}}\left\{\left(\boldsymbol{Y}-\boldsymbol{B}_{1} \boldsymbol{\theta}_{1}-\boldsymbol{B}_{2} \boldsymbol{\theta}_{2}\right)^{\top}\left(\boldsymbol{Y}-\boldsymbol{B}_{1} \boldsymbol{\theta}_{1}-\boldsymbol{B}_{2} \boldsymbol{\theta}_{2}\right)\right\} \\
&=\left(\begin{array}{cc}\boldsymbol{B}_{1}^{\top} \boldsymbol{B}_{1} & \boldsymbol{B}_{1}^{\top} \boldsymbol{B}_{2} \\ \boldsymbol{B}_{2}^{\top} \boldsymbol{B}_{1} & \boldsymbol{B}_{2}^{\top} \boldsymbol{B}_{2}\end{array}\right)^{-1}\left(\begin{array}{c}\boldsymbol{B}_{1}^{\top} \boldsymbol{Y} \\ \boldsymbol{B}_{2}^{\top} \boldsymbol{Y}\end{array}\right)=\left(\begin{array}{cc}\boldsymbol{N}_{1} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{N}_{2}\end{array}\right)^{-1}\left(\begin{array}{c}\boldsymbol{B}_{1}^{\top} \boldsymbol{Y} \\ \boldsymbol{B}_{2}^{\top} \boldsymbol{Y}\end{array}\right) \\
&=\left(\begin{array}{c}\boldsymbol{N}_{1}^{-1} \boldsymbol{B}_{1}^{\top} \boldsymbol{Y} \\ \boldsymbol{N}_{2}^{-1} \boldsymbol{B}_{2}^{\top} \boldsymbol{Y}\end{array}\right)=\left(\begin{array}{c}\tilde{\boldsymbol{\theta}}_{1 n} \\ \tilde{\boldsymbol{\theta}}_{2 n}\end{array}\right),
\end{aligned}$$
where $\boldsymbol{N}=\boldsymbol{B}^{\top} \boldsymbol{B}=\operatorname{Diag}\left(n_{1}, \ldots, n_{p}\right)$, $\boldsymbol{N}_{1}=\operatorname{Diag}\left(n_{1}, \ldots, n_{p_{1}}\right)$, and $\boldsymbol{N}_{2}=\operatorname{Diag}\left(n_{p_{1}+1}, \ldots, n_{p}\right)$.
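
Since $\boldsymbol{N}$ is diagonal, each component of the unrestricted LSE is simply the sample mean of the corresponding treatment group. The sketch below checks this numerically on a small hypothetical data set (the responses and group sizes are made up for illustration), comparing the closed form $\boldsymbol{N}^{-1} \boldsymbol{B}^{\top} \boldsymbol{Y}$ with the group means and with a generic least squares solver.

```python
import numpy as np

sizes = [3, 2, 4]                                   # hypothetical group sizes
p = len(sizes)
labels = np.repeat(np.arange(p), sizes)             # treatment index of each response
# Hypothetical responses, listed group by group.
y = np.array([1.1, 0.9, 1.0, 0.2, -0.2, -2.1, -1.9, -2.0, -2.0])

# Indicator (block-diagonal) design matrix B.
B = (labels[:, None] == np.arange(p)).astype(float)

# Closed form: N^{-1} B^T Y with N = B^T B = Diag(n_1, ..., n_p).
N = B.T @ B
theta_tilde = np.linalg.solve(N, B.T @ y)

# The same estimate computed as per-group sample means.
group_means = np.array([y[labels == i].mean() for i in range(p)])

# Cross-check against a generic least squares solver.
theta_lstsq, *_ = np.linalg.lstsq(B, y, rcond=None)

print(theta_tilde)
print(group_means)
```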

When $\sigma^{2}$ is unknown, it is estimated by the unbiased estimator
$$s_{n}^{2}=(n-p)^{-1}\left(\boldsymbol{Y}-\boldsymbol{B}_{1} \tilde{\boldsymbol{\theta}}_{1 n}-\boldsymbol{B}_{2} \tilde{\boldsymbol{\theta}}_{2 n}\right)^{\top}\left(\boldsymbol{Y}-\boldsymbol{B}_{1} \tilde{\boldsymbol{\theta}}_{1 n}-\boldsymbol{B}_{2} \tilde{\boldsymbol{\theta}}_{2 n}\right) .$$
Clearly, $\tilde{\boldsymbol{\theta}}_{n} \sim \mathcal{N}_{p}\left(\boldsymbol{\theta}, \sigma^{2} \boldsymbol{N}^{-1}\right)$ is independent of $m s_{n}^{2} / \sigma^{2}$ $(m=n-p)$, which follows a central $\chi^{2}$ distribution with $m$ degrees of freedom (DF).
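
As an illustration of the variance estimate, the sketch below computes $s_{n}^{2}$ as the residual sum of squares about the group means divided by $m=n-p$, and then plugs it into $\sigma^{2} \boldsymbol{N}^{-1}$ to obtain estimated standard errors. The data and the function name `pooled_variance` are hypothetical.

```python
import numpy as np

def pooled_variance(y, labels, p):
    """Return (theta_tilde, s2, m), where s2 = RSS / (n - p) and m = n - p."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    theta_tilde = np.array([y[labels == i].mean() for i in range(p)])
    residuals = y - theta_tilde[labels]      # Y - B theta_tilde
    m = y.size - p
    s2 = residuals @ residuals / m
    return theta_tilde, s2, m

sizes = np.array([3, 2, 4])                  # hypothetical group sizes
labels = np.repeat(np.arange(sizes.size), sizes)
# Hypothetical responses, listed group by group.
y = np.array([1.1, 0.9, 1.0, 0.2, -0.2, -2.1, -1.9, -2.0, -2.0])

theta_tilde, s2, m = pooled_variance(y, labels, p=sizes.size)

# Estimated standard errors: square roots of the diagonal of s2 * N^{-1}.
std_err = np.sqrt(s2 / sizes)

print(theta_tilde, s2, m)
print(std_err)
```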

When $\boldsymbol{\theta}_{2}=\mathbf{0}$, the restricted least squares estimator (RLSE) of $\boldsymbol{\theta}_{\mathrm{R}}=\left(\boldsymbol{\theta}_{1}^{\top}, \mathbf{0}^{\top}\right)^{\top}$ is given by $\hat{\boldsymbol{\theta}}_{\mathrm{R}}=\left(\tilde{\boldsymbol{\theta}}_{1 n}^{\top}, \mathbf{0}^{\top}\right)^{\top}$, where $\tilde{\boldsymbol{\theta}}_{1 n}=\boldsymbol{N}_{1}^{-1} \boldsymbol{B}_{1}^{\top} \boldsymbol{Y}$.
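
The restricted estimator can be sketched in the same way: fit only the first $p_{1}$ columns of the design and pad the result with zeros. The example below uses hypothetical data and a hypothetical split $p_{1}=2$; it also shows that the first block of the RLSE coincides with the first block of the unrestricted LSE, which follows from the orthogonality of $\boldsymbol{B}_{1}$ and $\boldsymbol{B}_{2}$.

```python
import numpy as np

sizes = [3, 2, 4]                                   # hypothetical group sizes
p, p1 = len(sizes), 2                               # hypothetical split: theta_2 has one component
labels = np.repeat(np.arange(p), sizes)
# Hypothetical responses, listed group by group.
y = np.array([1.1, 0.9, 1.0, 0.7, 0.3, -2.1, -1.9, -2.0, -2.0])

B = (labels[:, None] == np.arange(p)).astype(float)
B1 = B[:, :p1]

# Unrestricted LSE: N^{-1} B^T Y.
theta_tilde = np.linalg.solve(B.T @ B, B.T @ y)

# RLSE: N_1^{-1} B_1^T Y for the first block, zeros for the suspected-null block.
theta1_tilde = np.linalg.solve(B1.T @ B1, B1.T @ y)
theta_hat_R = np.concatenate([theta1_tilde, np.zeros(p - p1)])

# Residual sums of squares; the restricted fit cannot be smaller.
rss_unrestricted = np.sum((y - B @ theta_tilde) ** 2)
rss_restricted = np.sum((y - B @ theta_hat_R) ** 2)

print(theta_tilde)
print(theta_hat_R)
print(rss_unrestricted, rss_restricted)
```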


