## Multiple Linear Model

The simple linear model and the analysis of variance model can be viewed as particular cases of a more general linear model in which the variation of one variable $y$ is explained by $p$ explanatory variables $x$. Let $y$ $(n \times 1)$ and $\mathcal{X}$ $(n \times p)$ be, respectively, a vector of observations on the response variable and a data matrix on the $p$ explanatory variables. An important application of the developed theory is least squares fitting. The idea is to approximate $y$ by a linear combination $\hat{y}$ of the columns of $\mathcal{X}$, i.e. $\hat{y} \in C(\mathcal{X})$. The problem is to find $\hat{\beta} \in \mathbb{R}^{p}$ such that $\hat{y}=\mathcal{X} \hat{\beta}$ is the best fit of $y$ in the least squares sense. The linear model can be written as
$$y=\mathcal{X} \beta+\varepsilon$$
where $\varepsilon$ is the vector of errors. The least squares solution $\hat{\beta}$ is given by
$$\hat{\beta}=\arg \min_{\beta}(y-\mathcal{X} \beta)^{\top}(y-\mathcal{X} \beta)=\arg \min_{\beta} \varepsilon^{\top} \varepsilon \tag{3.51}$$

Suppose that $\left(\mathcal{X}^{\top} \mathcal{X}\right)$ is of full rank and thus invertible. Minimising the expression (3.51) with respect to $\beta$ yields:
$$\hat{\beta}=\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{X}^{\top} y \tag{3.52}$$
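The step from the minimisation problem to this closed form can be spelled out with the usual normal-equations argument. The following is a sketch; the symbol $S(\beta)$ for the sum of squares is introduced here for convenience and does not appear in the text.

$$\begin{aligned} S(\beta) &=(y-\mathcal{X} \beta)^{\top}(y-\mathcal{X} \beta)=y^{\top} y-2 \beta^{\top} \mathcal{X}^{\top} y+\beta^{\top} \mathcal{X}^{\top} \mathcal{X} \beta, \\ \frac{\partial S(\beta)}{\partial \beta} &=-2 \mathcal{X}^{\top} y+2 \mathcal{X}^{\top} \mathcal{X} \beta=0 \quad \Longrightarrow \quad \mathcal{X}^{\top} \mathcal{X} \hat{\beta}=\mathcal{X}^{\top} y . \end{aligned}$$

Since $\mathcal{X}^{\top} \mathcal{X}$ is assumed invertible, multiplying both sides by its inverse gives the estimator above. The second derivative $2 \mathcal{X}^{\top} \mathcal{X}$ is positive definite under the same assumption, so the stationary point is a minimum.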
The fitted value $\hat{y}=\mathcal{X} \hat{\beta}=\mathcal{X}\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{X}^{\top} y=\mathcal{P} y$ is the projection of $y$ onto $C(\mathcal{X})$ as computed in (2.47).
The least squares residuals are
$$e=y-\hat{y}=y-\mathcal{X} \hat{\beta}=\mathcal{Q} y=\left(\mathcal{I}_{n}-\mathcal{P}\right) y$$
The vector $e$ is the projection of $y$ onto the orthogonal complement of $C(\mathcal{X})$.

Remark 3.5 A linear model with an intercept $\alpha$ can also be written in this framework. The approximating equation is:
$$y_{i}=\alpha+\beta_{1} x_{i 1}+\cdots+\beta_{p} x_{i p}+\varepsilon_{i} ; \quad i=1, \ldots, n .$$
This can be written as:
$$y=\mathcal{X}^{*} \beta^{*}+\varepsilon$$
where $\mathcal{X}^{*}=\left(1_{n} \; \mathcal{X}\right)$ (we add a column of ones to the data). We have by (3.52):
$$\hat{\beta}^{*}=\left(\begin{array}{l} \hat{\alpha} \\ \hat{\beta} \end{array}\right)=\left(\mathcal{X}^{* \top} \mathcal{X}^{*}\right)^{-1} \mathcal{X}^{* \top} y$$
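The estimator, the projection $\mathcal{P}$ and the residual projection $\mathcal{Q}$ can be checked numerically. The sketch below uses NumPy on synthetic data; the data, the seed and the variable names (`X_star`, `beta_star`, and so on) are illustrative assumptions and are not taken from the text.

```python
import numpy as np

# Synthetic data for illustration only (not from the text).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                      # data matrix, n x p
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=n)

# Add a column of ones so that the first coefficient is the intercept alpha.
X_star = np.column_stack([np.ones(n), X])

# Solve the normal equations (X*'X*) b = X*'y instead of forming the inverse.
XtX = X_star.T @ X_star
beta_star = np.linalg.solve(XtX, X_star.T @ y)
alpha_hat, beta_hat = beta_star[0], beta_star[1:]

# P projects onto C(X*); Q = I - P projects onto its orthogonal complement.
P = X_star @ np.linalg.solve(XtX, X_star.T)
Q = np.eye(n) - P

y_hat = P @ y    # fitted values
e = Q @ y        # least squares residuals

print("alpha_hat:", alpha_hat)
print("beta_hat:", beta_hat)
print("fitted values equal X* beta*:", np.allclose(y_hat, X_star @ beta_star))
print("residuals orthogonal to C(X*):", np.allclose(X_star.T @ e, 0.0, atol=1e-8))
```

Solving the linear system rather than inverting $\mathcal{X}^{* \top} \mathcal{X}^{*}$ explicitly is a numerical convenience; both give the same estimator when the matrix has full rank.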
Example 3.15 Let us come back to the “classic blue” pullovers example. In Example 3.11, we considered the regression fit of the sales $X_{1}$ on the price $X_{2}$ and concluded that changing the price had only a small influence on sales. A linear model incorporating all three explanatory variables allows us to approximate sales as a linear function of price $\left(X_{2}\right)$, advertisement $\left(X_{3}\right)$ and presence of sales assistants $\left(X_{4}\right)$ simultaneously. Adding a column of ones to the data (in order to estimate the intercept $\alpha$) leads to
$$\hat{\alpha}=65.670 \text { and } \widehat{\beta}_{1}=-0.216, \widehat{\beta}_{2}=0.485, \widehat{\beta}_{3}=0.844 .$$
The coefficient of determination is computed as before in (3.40) and is:
$$r^{2}=1-\frac{e^{\top} e}{\sum\left(y_{i}-\bar{y}\right)^{2}}=0.907 .$$
We conclude that the variation of $X_{1}$ is well approximated by the linear relation.
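The coefficient of determination is straightforward to compute from the residuals. The following is a minimal sketch, assuming NumPy; the helper name `r_squared` and the small data set are hypothetical and serve only to illustrate the formula. They do not reproduce the pullover data.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination r^2 = 1 - e'e / sum((y_i - ybar)^2)."""
    e = y - y_hat                                  # least squares residuals
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

# Small made-up data set (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Fit y = alpha + beta * x by least squares with a column of ones.
X_star = np.column_stack([np.ones_like(x), x])
beta_star, *_ = np.linalg.lstsq(X_star, y, rcond=None)

r2 = r_squared(y, X_star @ beta_star)
print("r^2:", r2)
```

For a model that includes an intercept, $r^{2}$ lies between 0 and 1: it equals 1 for a perfect fit and 0 when the fit is no better than the sample mean.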

## The ANOVA Model in Matrix Notation

The simple ANOVA problem (Sect. 3.5) may also be rewritten in matrix terms. Recall the definition of a vector of ones from $(2.1)$ and define a vector of zeros as $0_{n}$. Then construct the following $(n \times p)$ matrix (here $p=3$ ),
$$\mathcal{X}=\left(\begin{array}{lll} 1_{m} & 0_{m} & 0_{m} \\ 0_{m} & 1_{m} & 0_{m} \\ 0_{m} & 0_{m} & 1_{m} \end{array}\right) \tag{3.54}$$
where $m=10$. Equation (3.41) then reads as follows.
The parameter vector is $\beta=\left(\mu_{1}, \mu_{2}, \mu_{3}\right)^{\top}$. The data set from Example 3.14 can therefore be written as a linear model $y=\mathcal{X} \beta+\varepsilon$ where $y \in \mathbb{R}^{n}$ with $n=m \cdot p$ is the stacked vector of the columns of Table 3.1. The projection into the column space $C(\mathcal{X})$ of (3.54) yields the least squares estimator $\hat{\beta}=\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{X}^{\top} y$. Note that $\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1}=(1 / 10) \mathcal{I}_{3}$ and that $\mathcal{X}^{\top} y=(106,124,151)^{\top}$ is the sum $\sum_{k=1}^{m} y_{k j}$ for each factor, i.e. the three column sums of Table 3.1. The least squares estimator is therefore the vector $\hat{\beta}_{H_{1}}=\left(\hat{\mu}_{1}, \hat{\mu}_{2}, \hat{\mu}_{3}\right)^{\top}=(10.6,12.4,15.1)^{\top}$ of sample means for each factor level $j=1,2,3$. Under the null hypothesis of equal mean values $\mu_{1}=\mu_{2}=\mu_{3}=\mu$, we estimate the parameters under the same constraints. This can be put into the form of a linear constraint:
$$\begin{aligned} -\mu_{1}+\mu_{2} &=0 \\ -\mu_{1}+\mu_{3} &=0 \end{aligned}$$
This can be written as $\mathcal{A} \beta=a$, where
$$a=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)$$
and
$$\mathcal{A}=\left(\begin{array}{rrr} -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right)$$
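The quantities quoted above can be reproduced from the design matrix and the column sums alone. The sketch below, assuming NumPy, builds $\mathcal{X}$ for $m=10$ and $p=3$ and recovers the group means. It then applies the standard restricted least squares formula $\hat{\beta}-\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{A}^{\top}\left\{\mathcal{A}\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{A}^{\top}\right\}^{-1}(\mathcal{A} \hat{\beta}-a)$ to impose $\mathcal{A} \beta=a$. That formula is not stated in this excerpt and is used here as an assumption; the variable names are illustrative.

```python
import numpy as np

m, p = 10, 3

# Design matrix: column j holds ones for the m observations of level j.
X = np.kron(np.eye(p), np.ones((m, 1)))          # shape (30, 3)

XtX_inv = np.linalg.inv(X.T @ X)                 # equals (1/10) * I_3
Xty = np.array([106.0, 124.0, 151.0])            # column sums quoted in the text

# Unrestricted estimator: the sample mean of each factor level.
beta_hat = XtX_inv @ Xty
print("group means:", beta_hat)

# Linear constraint A beta = a encoding mu_1 = mu_2 = mu_3.
A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])
a = np.zeros(2)

# Restricted least squares (standard formula, assumed; not from the excerpt).
M = A @ XtX_inv @ A.T
beta_H0 = beta_hat - XtX_inv @ A.T @ np.linalg.solve(M, A @ beta_hat - a)
print("restricted estimate:", beta_H0)
```

Under the constraint, all three components collapse to the overall mean $(106+124+151) / 30=12.7$, which is what one would expect when the three levels are forced to share a single mean.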

## Multivariate Distributions

The preceding chapter showed that the first two moments of a multivariate distribution (the mean and the covariance matrix) already provide a lot of information on the relationship between the variables. Only basic statistical theory was used to derive tests of independence or of linear relationships. In this chapter we give an introduction to the basic probability tools useful in statistical multivariate analysis.

Means and covariances share many interesting and useful properties, but they represent only part of the information on a multivariate distribution. Section 4.1 presents the basic probability tools used to describe a multivariate random variable, including marginal and conditional distributions and the concept of independence. In Sect. 4.2, basic properties of means and covariances (marginal and conditional ones) are derived.

Since many statistical procedures rely on transformations of a multivariate random variable, Sect. 4.3 presents the basic techniques needed to derive the distribution of transformations, with a special emphasis on linear transforms. As an important example of a multivariate random variable, Sect. 4.4 defines the multinormal distribution. It will be analysed in more detail in Chap. 5 along with most of its “companion” distributions that are useful in making multivariate statistical inferences.

The normal distribution plays a central role in statistics because it can be viewed as an approximation and limit of many other distributions. The basic justification relies on the central limit theorem presented in Sect. 4.5. We present this central theorem in the framework of sampling theory. A useful extension of this theorem is also given: it provides an approximate distribution for transformations of asymptotically normal variables. The increasing power of computers today makes it possible to consider alternative approximate sampling distributions. These are based on resampling techniques and are suitable for many general situations. Section 4.8 gives an introduction to the ideas behind bootstrap approximations.


