## Lasso in the Linear Regression Model

The linear regression model can be written as follows:
$$y=\mathcal{X} \beta+\varepsilon,$$
where $y$ is an $(n \times 1)$ vector of observations for the response variable, $\mathcal{X}=\left(x_1^{\top}, \ldots, x_n^{\top}\right)^{\top}$, $x_i \in \mathbb{R}^p$, $i=1, \ldots, n$, is a data matrix of $p$ explanatory variables, and $\varepsilon=\left(\varepsilon_1, \ldots, \varepsilon_n\right)^{\top}$ is a vector of errors where $\mathrm{E}\left(\varepsilon_i\right)=0$ and $\operatorname{Var}\left(\varepsilon_i\right)=\sigma^2$, $i=1, \ldots, n$.

In this framework, $\mathrm{E}(y \mid \mathcal{X})=\mathcal{X} \beta$ with $\beta=\left(\beta_1, \ldots, \beta_p\right)^{\top}$. Further assume that the columns of $\mathcal{X}$ are standardised such that $n^{-1} \sum_{i=1}^n x_{i j}=0$ and $n^{-1} \sum_{i=1}^n x_{i j}^2=1$. The Lasso estimate $\hat{\beta}$ can then be defined as follows:
$$\hat{\beta}=\arg \min _\beta\left\{\sum_{i=1}^n\left(y_i-x_i^{\top} \beta\right)^2\right\}, \text { subject to } \sum_{j=1}^p\left|\beta_j\right| \leq s, \tag{9.1}$$
where $s \geq 0$ is the tuning parameter that controls the amount of shrinkage. For the OLS estimate $\hat{\beta}^0=\left(\mathcal{X}^{\top} \mathcal{X}\right)^{-1} \mathcal{X}^{\top} y$, a choice of tuning parameter $s<s_0$, where $s_0=\sum_{j=1}^p\left|\hat{\beta}_j^0\right|$, shrinks the solutions towards 0, and ultimately some coefficients may be exactly equal to 0. For values $s \geq s_0$ the Lasso coefficients are equal to the unpenalised OLS coefficients.

An alternative representation of (9.1) is
$$\hat{\beta}=\arg \min _\beta\left\{\sum_{i=1}^n\left(y_i-x_i^{\top} \beta\right)^2+\lambda \sum_{j=1}^p\left|\beta_j\right|\right\}, \tag{9.2}$$
with a tuning parameter $\lambda \geq 0$. As $\lambda$ increases, the Lasso estimates are continuously shrunk towards zero, and if $\lambda$ is large enough, some coefficients are exactly zero. For $\lambda=0$ the Lasso coefficients coincide with the OLS estimate.

In fact, if the solution to (9.1) is denoted by $\hat{\beta}_s$ and the solution to (9.2) by $\hat{\beta}_\lambda$, then for every $\lambda>0$ with resulting solution $\hat{\beta}_\lambda$ there exists an $s_\lambda$ such that $\hat{\beta}_\lambda=\hat{\beta}_{s_\lambda}$, and vice versa, which implies a one-to-one correspondence between these parameters. However, this does not hold if only $\lambda \geq 0$ is required rather than $\lambda>0$: if, for instance, $\lambda=0$, then the solution $\hat{\beta}_\lambda$ is obtained for any $s \geq\|\hat{\beta}_\lambda\|_1$, and the correspondence is no longer one-to-one.
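The penalised form of the Lasso can be explored numerically. The following is a minimal sketch, not the method of the text: it assumes NumPy, simulated data, and cyclic coordinate descent with soft-thresholding as the solver (the function names `soft_threshold` and `lasso_cd` are illustrative). It shows that $\lambda=0$ reproduces the OLS estimate and that larger $\lambda$ shrinks the $\ell_1$ norm of the solution, eventually to exactly zero.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Minimise sum_i (y_i - x_i' beta)^2 + lam * sum_j |beta_j| by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)              # x_j' x_j (equals n for standardised columns)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual that leaves out the contribution of covariate j
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam / 2.0) / col_ss[j]
    return beta


# simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # mean 0 and n^{-1} sum_i x_ij^2 = 1
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(size=n)
y = y - y.mean()

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
s0 = np.abs(beta_ols).sum()                    # threshold s_0 of the constrained form

for lam in (0.0, 50.0, 200.0, 1000.0):
    b = lasso_cd(X, y, lam)
    print(lam, np.round(b, 3), "L1 norm:", round(float(np.abs(b).sum()), 3))
print("s0:", round(float(s0), 3))
```

Each printed $\ell_1$ norm plays the role of $s_\lambda$ in the correspondence described above: solving the constrained problem with that bound would return the same coefficients.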

## The LAR Algorithm and Lasso Solution Paths

The LAR algorithm may be introduced in the simple three-dimensional case as follows (assume that the number of covariates $p=3$ ):

• first, standardise all the covariates to have mean 0 and unit length as well as make the response variable have mean zero;
• start with $\hat{\beta}=0$;
• initialise the algorithm with the first two covariates: let $\mathcal{X}=\left(x_1, x_2\right)$ and calculate the prediction vector $\hat{y}_0=\mathcal{X} \hat{\beta}=0$;
• calculate $\bar{y}_2$, the projection of $y$ onto $\mathcal{L}\left(x_1, x_2\right)$, the linear space spanned by $x_1$ and $x_2$;
• compute the vector of current correlations between the covariates $\mathcal{X}$ and the two-dimensional current residual vector: $C^{\hat{y}_0}=\mathcal{X}^{\top}\left(\bar{y}_2-\hat{y}_0\right)=\left(c_1^{\hat{y}_0}, c_2^{\hat{y}_0}\right)^{\top}$. According to Fig. 9.2, the current residual $\bar{y}_2-\hat{y}_0$ makes a smaller angle with $x_1$ than with $x_2$, therefore $c_1^{\hat{y}_0}>c_2^{\hat{y}_0}$;
• augment $\hat{y}_0$ in the direction of $x_1$ so that $\hat{y}_1=\hat{y}_0+\hat{\gamma}_1 x_1$ with $\hat{\gamma}_1$ chosen such that $c_1^{\hat{y}_1}=c_2^{\hat{y}_1}$, which means that the new current residual $\bar{y}_2-\hat{y}_1$ makes equal angles (is equiangular) with $x_1$ and $x_2$;
• suppose that another regressor $x_3$ enters the model: calculate a new projection $\bar{y}_3$ of $y$ onto $\mathcal{L}\left(x_1, x_2, x_3\right)$;
• recompute the current correlations vector $C^{\hat{y}_1}=\left(c_1^{\hat{y}_1}, c_2^{\hat{y}_1}, c_3^{\hat{y}_1}\right)^{\top}$ with $\mathcal{X}=$ $\left(x_1, x_2, x_3\right), \bar{y}_3$ and $\hat{y}_1$;
• augment $\hat{y}_1$ in the equiangular direction so that $\hat{y}_2=\hat{y}_1+\hat{\gamma}_2 u_2$ with $\hat{\gamma}_2$ chosen such that $c_1^{\hat{y}_2}=c_2^{\hat{y}_2}=c_3^{\hat{y}_2}$; the new current residual $\bar{y}_3-\hat{y}_2$ then goes equiangularly between $x_1, x_2$ and $x_3$ (here $u_2$ is the unit vector lying along the equiangular direction $\hat{y}_2$);
• the three-dimensional algorithm is terminated with the calculation of the final prediction vector $\hat{y}_3=\hat{y}_2+\hat{\gamma}_3 u_3$ with $\hat{\gamma}_3$ chosen such that $\hat{y}_3=\bar{y}_3$.

In the case of $p>3$ covariates, $\hat{y}_3$ would be smaller than $\bar{y}_3$, initiating another change of direction, as illustrated in Fig. 9.2.

In this setup, it is important that the covariate vectors $x_1, x_2, x_3$ are linearly independent. The LAR algorithm “moves” the variable coefficients to their least squares values. The Lasso adjustment necessary for the sparse solution is therefore that if a nonzero coefficient happens to return to zero, it is dropped from the current (“active”) set of variables and not considered in further computations. The general LAR algorithm for $p$ predictors proceeds along the same lines.
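The steps above can be sketched in code. The following is an illustrative implementation of plain LAR without the Lasso drop step, assuming NumPy and simulated data. The function name `lar_path` and the step-length formula (the standard equiangular update of least angle regression) are assumptions that the text does not spell out.

```python
import numpy as np


def lar_path(X, y):
    """Plain LAR (no Lasso drop step). Returns the coefficient vector after each step."""
    n, p = X.shape
    mu = np.zeros(n)                     # current prediction vector, starts at y_hat_0 = 0
    beta = np.zeros(p)
    c = X.T @ y
    active = [int(np.argmax(np.abs(c)))] # covariate most correlated with the response
    path = [beta.copy()]
    for _ in range(p):
        c = X.T @ (y - mu)               # current correlations
        C = np.max(np.abs(c))
        signs = np.sign(c[active])
        XA = X[:, active] * signs        # active covariates, sign-adjusted
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        w = A * g
        u = XA @ w                       # unit vector making equal angles with the columns of XA
        a = X.T @ u
        gamma, j_new = C / A, None       # default: move all the way to the least squares fit
        for j in range(p):
            if j in active:
                continue
            # smallest positive step at which covariate j catches up with the active set
            for num, den in ((C - c[j], A - a[j]), (C + c[j], A + a[j])):
                if den > 1e-12 and 1e-12 < num / den < gamma:
                    gamma, j_new = num / den, j
        mu = mu + gamma * u
        beta[active] += gamma * signs * w
        path.append(beta.copy())
        if j_new is None:
            break
        active.append(j_new)
    return np.array(path)


# simulated data with p = 3 covariates (an assumption for illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))    # mean 0 and unit length
y = X @ np.array([4.0, -2.0, 1.0]) + 0.1 * rng.normal(size=50)
y = y - y.mean()

path = lar_path(X, y)
print(np.round(path, 3))                 # one row per step, from 0 to the least squares fit
```

With three linearly independent covariates the path has four rows: the start at zero, two intermediate equiangular steps, and the final step that reaches the least squares coefficients.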


## Testing Issues with Count Data

One of the main practical interests in regression models for contingency tables is to test restrictions on the parameters of a more complete model. These testing ideas are created in the same spirit as in Sect. $3.5$ where we tested restrictions in ANOVA models.

In linear models, the test statistic is based on comparing the goodness of fit of the full model with that of the reduced model, where goodness of fit is measured by the residual sum of squares (RSS). The idea is the same here, but with a more appropriate measure of goodness of fit. Once a model has been estimated, we can compute the predicted value under that model for each cell of the table. As above, $y_k$ denotes the observed value in a cell and $\hat{m}_k$ the expected value predicted by the model. The goodness of fit may be assessed by measuring, in some way, the distance between the series of observed and predicted values.

Two statistics are proposed: the Pearson chi-square $X^2$ and the deviance, denoted $G^2$. They are defined as follows:
$$\begin{aligned} X^2 & =\sum_{k=1}^K \frac{\left(y_k-\hat{m}_k\right)^2}{\hat{m}_k}, \\ G^2 & =2 \sum_{k=1}^K y_k \log \left(\frac{y_k}{\hat{m}_k}\right), \end{aligned}$$
where $K$ is the total number of cells of the table. The deviance is directly related to the log-likelihood ratio statistic and is usually preferred because it can be used to compare nested models as we usually do in this context.

Under the hypothesis that the model used to compute the predicted values is true, both statistics are (for large samples) approximately distributed as a $\chi^2$ variable with degrees of freedom (d.f.) depending on the model. The d.f. can be computed as follows:
$$\text{d.f.} = \#\,\text{free cells} - \#\,\text{free parameters estimated}.$$
For saturated models, the fit is perfect: $X^2=G^2=0$ with $\text{d.f.}=0$.
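As a small worked example, the two statistics and the degrees of freedom can be computed for a two-way table. The sketch below uses hypothetical counts and assumes that the fitted model is the independence model, whose fitted values are $\hat{m}_{jk}=y_{j\bullet}\, y_{\bullet k} / n$; both the data and the choice of model are assumptions made for illustration.

```python
import math

# hypothetical (2 x 3) table of observed counts
table = [[20, 30, 50],
         [30, 30, 40]]

J, K = len(table), len(table[0])
n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(table[j][k] for j in range(J)) for k in range(K)]

# fitted values under independence: m_hat_jk = row total * column total / n
m_hat = [[row_tot[j] * col_tot[k] / n for k in range(K)] for j in range(J)]

# Pearson chi-square and deviance, summed over all cells
X2 = sum((table[j][k] - m_hat[j][k]) ** 2 / m_hat[j][k]
         for j in range(J) for k in range(K))
G2 = 2 * sum(table[j][k] * math.log(table[j][k] / m_hat[j][k])
             for j in range(J) for k in range(K) if table[j][k] > 0)

# d.f. = # free cells - # free parameters estimated
free_cells = J * K - 1
free_params = (J - 1) + (K - 1)
df = free_cells - free_params

print("X2 =", round(X2, 4), "G2 =", round(G2, 4), "d.f. =", df)
```

For the independence model in a $(J \times K)$ table this count gives $(J-1)(K-1)$ degrees of freedom, and the two statistics are typically close to each other when the fitted values are not too small.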
Suppose now that we want to test a reduced model which is a restricted version of a full model. The deviance can then be used in the same way as the $F$ statistic in linear regression. The test procedure is straightforward:
$$\begin{aligned} H_0 &: \text{reduced model with } r \text{ degrees of freedom}, \\ H_1 &: \text{full model with } f \text{ degrees of freedom}. \end{aligned}$$

## Logit Models for Binary Response

Consider the vector $y$ $(n \times 1)$ of observations on a binary response variable (a value of “1” indicating the presence of a particular qualitative trait, and a value of “0” its absence). The logit model assumes that the probability of observing $y_i=1$ given a particular value of $x_i=\left(x_{i 1}, \ldots, x_{i p}\right)^{\top}$ is given by the logistic function of a “score”, a linear combination of $x$:
$$p\left(x_i\right)=\mathrm{P}\left(y_i=1 \mid x_i\right)=\frac{\exp \left(\beta_0+\sum_{j=1}^p \beta_j x_{i j}\right)}{1+\exp \left(\beta_0+\sum_{j=1}^p \beta_j x_{i j}\right)} .$$
This entails the probability of the absence of the trait:
$$1-p\left(x_i\right)=\mathrm{P}\left(y_i=0 \mid x_i\right)=\frac{1}{1+\exp \left(\beta_0+\sum_{j=1}^p \beta_j x_{i j}\right)},$$
which implies
$$\log \left\{\frac{p\left(x_i\right)}{1-p\left(x_i\right)}\right\}=\beta_0+\sum_{j=1}^p \beta_j x_{i j} .$$
This indicates that the logit model is equivalent to a log-linear model for the odds $p\left(x_i\right) /\left\{1-p\left(x_i\right)\right\}$. A positive value of $\beta_j$ indicates an explanatory variable $x_j$ that will favour the presence of the trait since it improves the odds. A zero value of $\beta_j$ corresponds to the absence of an effect of this variable on the appearance of the qualitative trait.

For i.i.d. observations the likelihood function is:
$$L\left(\beta_0, \beta\right)=\prod_{i=1}^n p\left(x_i\right)^{y_i}\left\{1-p\left(x_i\right)\right\}^{1-y_i} .$$
The maximum likelihood estimators of the $\beta$'s are obtained as the solution of the non-linear maximisation problem $\left(\hat{\beta}_0, \hat{\beta}\right)=\arg \max _{\beta_0, \beta} \log L\left(\beta_0, \beta\right)$, where
$$\log L\left(\beta_0, \beta\right)=\sum_{i=1}^n\left[y_i \log p\left(x_i\right)+\left(1-y_i\right) \log \left\{1-p\left(x_i\right)\right\}\right] .$$
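The maximisation has no closed-form solution, so it is carried out numerically. The following is a minimal sketch assuming NumPy, simulated data, and Newton-Raphson iterations as the optimiser. The text does not prescribe a particular algorithm, and the names `log_lik` and `fit_logit` are illustrative.

```python
import numpy as np


def log_lik(b, Z, y):
    """log L = sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)], written in terms of the score Z b."""
    score = Z @ b
    return float(np.sum(y * score - np.log1p(np.exp(score))))


def fit_logit(X, y, n_iter=25):
    """Maximise the logit log-likelihood by Newton-Raphson; returns (b_hat, Z)."""
    Z = np.column_stack([np.ones(len(y)), X])      # first column carries the intercept beta_0
    b = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ b)))         # logistic function of the score
        grad = Z.T @ (y - p)                       # gradient of log L
        hess = Z.T @ (Z * (p * (1 - p))[:, None])  # negative Hessian of log L
        b = b + np.linalg.solve(hess, grad)
    return b, Z


# simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_b = np.array([-0.5, 1.0, -2.0])
p_true = 1.0 / (1.0 + np.exp(-(true_b[0] + X @ true_b[1:])))
y = (rng.random(n) < p_true).astype(float)

b_hat, Z = fit_logit(X, y)
print("estimates:", np.round(b_hat, 3))
print("log-likelihood at estimate:", round(log_lik(b_hat, Z, y), 3))
```

A positive estimated coefficient raises the odds of observing the trait and a negative one lowers them, in line with the interpretation of $\beta_j$ given above.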


## Log-Linear Models for Contingency Tables

Consider a $(J \times K)$ two-way table, where $y_{j k}$ is the number of observations having the nominal value $j$ for the first qualitative character and nominal value $k$ for the second character. Since the total number of observations $n=\sum_{j=1}^J \sum_{k=1}^K y_{j k}$ is fixed, there are $J K-1$ free cells in the table. The multinomial likelihood can be written as in (8.6):
$$L=\frac{n !}{\prod_{j=1}^J \prod_{k=1}^K y_{j k} !} \prod_{j=1}^J \prod_{k=1}^K\left(\frac{m_{j k}}{n}\right)^{y_{j k}},$$
where we now introduce a log-linear structure to analyse the role of the rows and the columns to determine the parameters $m_{j k}=\mathrm{E}\left(y_{j k}\right)$ (or $p_{j k}$ ).

1. Model without interaction
Suppose that there is no interaction between the rows and the columns: this corresponds to the hypothesis of independence between the two qualitative characters. In other words, $p_{j k}=p_j p_k$ for all $j, k$. This implies the log-linear model:
$$\log m_{j k}=\mu+\alpha_j+\gamma_k \text { for } j=1, \ldots, J, k=1, \ldots, K,$$
where, as in ANOVA models, for identification purposes $\sum_{j=1}^J \alpha_j=\sum_{k=1}^K \gamma_k=0$. Using the same coding devices as above, the model can be written as
$$\log m=\mathcal{X} \beta .$$

For a $(2 \times 3)$ table we have:
$$\log m=\left(\begin{array}{l} \log m_{11} \\ \log m_{12} \\ \log m_{13} \\ \log m_{21} \\ \log m_{22} \\ \log m_{23} \end{array}\right), \quad \mathcal{X}=\left(\begin{array}{rrrr} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & -1 & -1 \end{array}\right), \quad \beta=\left(\begin{array}{l} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{array}\right),$$
where the first column of $\mathcal{X}$ is for the constant term, the second column is the coded column for the 2-levels row effect and the two last columns are the coded columns for the 3-levels column effect. The estimation is obtained by maximising the log-likelihood which is equivalent to maximising the function $L(\beta)$ in $\beta$ :
$$L(\beta)=\sum_{j=1}^J \sum_{k=1}^K y_{j k} \log m_{j k} .$$
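To make the coding concrete, the sketch below builds the design matrix of the $(2 \times 3)$ example and recovers $\beta$ for a table of hypothetical counts. It assumes NumPy and relies on the standard result that, under independence, the maximum likelihood fitted values are $\hat{m}_{jk}=y_{j \bullet}\, y_{\bullet k} / n$, so that $\log \hat{m}$ lies exactly in the column space of $\mathcal{X}$ and $\beta$ can be read off by solving a linear system.

```python
import numpy as np

# hypothetical (2 x 3) table of counts
y = np.array([[20.0, 30.0, 50.0],
              [30.0, 30.0, 40.0]])
n = y.sum()

# design matrix of the (2 x 3) example: constant, row effect, two column effects
X = np.array([[1,  1,  1,  0],
              [1,  1,  0,  1],
              [1,  1, -1, -1],
              [1, -1,  1,  0],
              [1, -1,  0,  1],
              [1, -1, -1, -1]], dtype=float)

# ML fitted values under independence: m_hat_jk = y_j. * y_.k / n
m_hat = np.outer(y.sum(axis=1), y.sum(axis=0)) / n

# log m_hat is additive in row and column effects, so least squares recovers beta exactly;
# ravel() stacks the cells in the order m_11, m_12, m_13, m_21, m_22, m_23
beta, *_ = np.linalg.lstsq(X, np.log(m_hat).ravel(), rcond=None)

print("beta:", np.round(beta, 4))
print("fitted counts:", np.round(np.exp(X @ beta).reshape(2, 3), 2))
```

In this hypothetical table both row totals are equal, so the row effect $\beta_1$ comes out as zero, while the two column-effect coefficients reflect the unequal column totals.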

## Three-Way Tables

The models presented above for two-way tables can be extended to higher-order tables, but at the cost of notational complexity. We show how to adapt them to three-way tables. This deserves special attention due to the presence of higher-order interactions in the saturated model.

A $(J \times K \times L)$ three-way table may be constructed under multinomial sampling as follows: each of the $n$ observations falls in one, and only one, category of each of three categorical variables having $J, K$ and $L$ modalities respectively. We end up with a three-dimensional table with $J K L$ cells containing the counts $y_{j k \ell}$ where $n=\sum_{j, k, \ell} y_{j k \ell}$. The expected counts depend on the unknown probabilities $p_{j k \ell}$ in the usual way:
$$m_{j k \ell}=n p_{j k \ell}, j=1, \ldots, J, k=1, \ldots, K, \ell=1, \ldots, L$$

1. The saturated model
A full saturated log-linear model reads as follows:
$$\begin{aligned} \log m_{j k \ell}= {} & \mu+\alpha_j+\beta_k+\gamma_\ell+(\alpha \beta)_{j k}+(\alpha \gamma)_{j \ell}+(\beta \gamma)_{k \ell}+(\alpha \beta \gamma)_{j k \ell}, \\ & j=1, \ldots, J, \ k=1, \ldots, K, \ \ell=1, \ldots, L . \end{aligned}$$
The restrictions are the following (using the “dot” notation for summation on the corresponding indices):
$$\begin{aligned} & \alpha_{(\bullet)}=\beta_{(\bullet)}=\gamma_{(\bullet)}=0, \\ & (\alpha \beta)_{j \bullet}=(\alpha \gamma)_{j \bullet}=(\beta \gamma)_{k \bullet}=0, \\ & (\alpha \beta)_{\bullet k}=(\alpha \gamma)_{\bullet \ell}=(\beta \gamma)_{\bullet \ell}=0, \\ & (\alpha \beta \gamma)_{j k \bullet}=(\alpha \beta \gamma)_{j \bullet \ell}=(\alpha \beta \gamma)_{\bullet k \ell}=0 . \end{aligned}$$
The parameters $(\alpha \beta)_{j k},(\alpha \gamma)_{j \ell},(\beta \gamma)_{k \ell}$ are called first-order interactions. The second-order interactions are the parameters $(\alpha \beta \gamma)_{j k \ell}$; they make it possible to account for heterogeneities in the interactions between two of the three variables. For instance, let $\ell$ stand for the two gender categories $(L=2)$. If we suppose that $(\alpha \beta \gamma)_{j k 1}=-(\alpha \beta \gamma)_{j k 2} \neq 0$, we mean that the interactions between the variables $J$ and $K$ are not the same for both gender categories.
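Under the zero-sum restrictions, the parameters of the saturated model can be obtained from $\log m_{jk\ell}$ by an ANOVA-style decomposition into means. The sketch below illustrates this for a hypothetical $(2 \times 3 \times 2)$ array of expected counts, assuming NumPy; the variable names are illustrative and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.integers(5, 50, size=(2, 3, 2)).astype(float)   # hypothetical expected counts m_jkl
logm = np.log(m)

# overall level and main effects (deviations of marginal means from the overall mean)
mu = logm.mean()
alpha = logm.mean(axis=(1, 2)) - mu
beta_k = logm.mean(axis=(0, 2)) - mu
gamma = logm.mean(axis=(0, 1)) - mu

# first-order interactions
ab = logm.mean(axis=2) - mu - alpha[:, None] - beta_k[None, :]
ag = logm.mean(axis=1) - mu - alpha[:, None] - gamma[None, :]
bg = logm.mean(axis=0) - mu - beta_k[:, None] - gamma[None, :]

# second-order interaction: whatever remains after all lower-order terms
abg = (logm - mu
       - alpha[:, None, None] - beta_k[None, :, None] - gamma[None, None, :]
       - ab[:, :, None] - ag[:, None, :] - bg[None, :, :])

print("sum over l of (abg)_jkl:\n", np.round(abg.sum(axis=2), 12))
print("(abg)_jk1 + (abg)_jk2:\n", np.round(abg[:, :, 0] + abg[:, :, 1], 12))
```

Because $L=2$ here, the restriction $(\alpha\beta\gamma)_{jk\bullet}=0$ forces $(\alpha\beta\gamma)_{jk1}=-(\alpha\beta\gamma)_{jk2}$, which is the situation described in the gender example.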


## Parallel Coordinate Plots

Parallel coordinate plots (PCP) are a method for representing high-dimensional data, see Inselberg (1985). Instead of plotting observations in an orthogonal coordinate system, PCP draws coordinates on parallel axes and connects them with straight lines. This method helps in representing data with more than four dimensions.

One first scales all variables to $\max =1$ and $\min =0$. The coordinate index $j$ is drawn onto the horizontal axis, and the scaled value of variable $x_{i j}$ is mapped onto the vertical axis. This way of representation is very useful for high-dimensional data. It is, however, also sensitive to the order of the variables, since certain trends in the data can be shown more clearly in one ordering than in another.

PCP can also be used for detecting linear dependencies between variables: in the two-variable case ($p=2$), if all lines are almost parallel, there is a positive linear dependence between the variables. In Fig. 1.21, we display the two variables weight and displacement for the car data set in Appendix B.3. The correlation coefficient $\rho$ introduced in Sect. 3.2 is 0.9. If all lines intersect visibly in the middle, there is evidence of a negative linear dependence between the two variables, see Fig. 1.22. In fact, the correlation between the two variables mileage and weight is $\rho=-0.82$: the greater the weight, the lower the mileage.

Another use of PCP is subgroup detection. Lines converging to different discrete points indicate subgroups. Figure 1.23 shows the last three variables of the car data (displacement, gear ratio for high gear, and company’s headquarters); we see convergence to the last variable. This last variable is the company’s headquarters with three discrete values: U.S., Japan and Europe. PCP can also be used for outlier detection. Figure 1.24 shows the variables headroom, rear seat clearance, and trunk (boot) space in the car data set. There are two outliers visible. The boxplot in Fig. 1.25 confirms this.

PCPs also have possible shortcomings: we cannot distinguish observations when two lines cross at one point unless we distinguish them clearly (e.g., by different line styles). In Fig. 1.26, observations A and B both have the same value at $j=2$, so the two lines cross at one point there. At the third and fourth dimensions, we cannot tell which line belongs to which observation. A dotted line for A and a solid line for B could have helped there.
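The scaling step and the “parallel versus crossing lines” reading for $p=2$ can be illustrated without any plotting. The sketch below uses hypothetical values (not the car data set) and a helper `crossing_share`, introduced here only for illustration, that counts how many pairs of PCP line segments cross between two adjacent axes. Two segments cross exactly when the ordering of the two observations flips from one axis to the next.

```python
def min_max_scale(column):
    """Scale one variable so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def crossing_share(x1, x2):
    """Share of pairs of observations whose PCP line segments cross between the two axes."""
    a, b = min_max_scale(x1), min_max_scale(x2)
    n = len(a)
    pairs = crossings = 0
    for i in range(n):
        for k in range(i + 1, n):
            pairs += 1
            if (a[i] - a[k]) * (b[i] - b[k]) < 0:   # ordering flips between the axes
                crossings += 1
    return crossings / pairs


# hypothetical values for six cars (not taken from the car data set)
weight = [1800, 2200, 2600, 3000, 3400, 4000]
displacement = [90, 110, 150, 200, 250, 350]
mileage = [35, 30, 26, 22, 18, 14]

print(crossing_share(weight, displacement))   # 0.0: no segments cross
print(crossing_share(weight, mileage))        # 1.0: every pair of segments crosses
```

A share near 0 corresponds to the nearly parallel lines of a positive dependence, and a share near 1 to lines that intersect in the middle, as in the negative dependence between weight and mileage.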

## Hexagon Plots

This section closely follows the presentation of Lewin-Koh (2006). In geometry, a hexagon is a polygon with six edges and six vertices. Hexagon binning is a type of bivariate histogram with hexagon borders. It is useful for visualizing the structure of data sets entailing a large number of observations $n$. The concept of hexagon binning is as follows:

1. The $xy$ plane over the set $(\operatorname{range}(x), \operatorname{range}(y))$ is tessellated by a regular grid of hexagons.
2. The number of points falling in each hexagon is counted.
3. The hexagons with count $>0$ are plotted by using a color ramp or varying the radius of the hexagon in proportion to the counts.

This algorithm is extremely fast and effective for displaying the structure of data sets even for $n \geq 10^6$. If the size of the grid and the cuts in the color ramp are chosen in a clever fashion, then the structure inherent in the data should emerge in the binned plot. The same caveats apply to hexagon binning as to histograms. Variance and bias vary in opposite directions with the bin-width, so we have to settle for the bin-width that yields the best compromise between variance and bias reduction. Clearly, if we increase the size of the grid, the hexagon plot appears smoother, but without some reasonable criterion at hand it remains difficult to say which bin-width provides the “optimal” degree of smoothness. The default number of bins suggested by standard software is 30.
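The first two steps of hexagon binning (tessellate, then count) can be sketched in a few lines. The version below is one possible implementation, not necessarily the one used by any particular software. It assumes that hexagon centres lie on two interleaved rectangular lattices and that each point is assigned to the nearest centre, which yields hexagonal cells in the scaled coordinates. The function name `hexbin_counts`, the rule for the number of rows `ny`, and the simulated data are all assumptions.

```python
import math
import random
from collections import Counter


def hexbin_counts(xs, ys, nx=30):
    """Count points per hexagon; returns {(lattice, column, row): count}."""
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    ny = max(1, int(nx / math.sqrt(3)))          # assumed rule for the number of rows
    counts = Counter()
    for x, y in zip(xs, ys):
        sx = (x - xmin) / (xmax - xmin) * nx     # scaled coordinates in [0, nx] x [0, ny]
        sy = (y - ymin) / (ymax - ymin) * ny
        # nearest centre on lattice 0 (integer points) and on lattice 1 (shifted by one half)
        i0, j0 = round(sx), round(sy)
        i1, j1 = math.floor(sx), math.floor(sy)
        # the factor 3 makes the nearest-centre cells regular hexagons in (sx, sqrt(3) * sy)
        d0 = (sx - i0) ** 2 + 3.0 * (sy - j0) ** 2
        d1 = (sx - i1 - 0.5) ** 2 + 3.0 * (sy - j1 - 0.5) ** 2
        key = (0, i0, j0) if d0 < d1 else (1, i1, j1)
        counts[key] += 1
    return counts


# simulated bivariate sample (an assumption for illustration only)
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10000)]
ys = [random.gauss(0, 1) for _ in range(10000)]

counts = hexbin_counts(xs, ys)
print("non-empty hexagons:", len(counts))
print("largest count:", max(counts.values()))
```

Only hexagons with a positive count appear in the result, which corresponds to the third step: these are the cells that would be drawn, with colour or radius chosen according to the count.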

Applications to some data sets are shown as follows. The data are taken from ALLBUS (2006) [ZA No. 3762]. The number of respondents is 2946. Nine variables have been selected to analyze the relation between each pair of variables.

First, we consider two variables $X_1=$ Age and $X_2=$ Net income in Fig. 1.29. The top left picture is a scatter plot. The second one is a hexagon plot with borders making it easier to see the separation between hexagons. Looking at these plots one can see that almost all individuals have a net monthly income of less than 2000 EUR. Only two individuals earn more than 10000 EUR per month.

Figure 1.30 shows the relation between $X_1$ and $X_5$. About forty percent of respondents from 20 to 80 years old do not use a computer at least once per week. The respondent who deals with a computer 105 h each week was actually not in full-time employment.

Clearly, people who earn modest incomes live in smaller flats. The trend here is relatively clear in Fig. 1.31. The larger the net income, the larger the flat. A few people do however earn high incomes but live in small flats.


## Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other. They help us understand relationships among the variables of a data set. A downward-sloping scatter indicates that as we increase the variable on the horizontal axis, the variable on the vertical axis decreases. An analogous statement can be made for upward-sloping scatters.

Figure 1.12 plots the fifth column (upper inner frame) of the bank data against the sixth column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison (e.g., Fig. 1.9), a good separation between genuine and counterfeit banknotes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Fig. 1.12 corresponds to the true banknotes. As noted before, this separation is not distinct, since the two groups overlap somewhat.

This can be verified in an interactive computing environment by showing the index and coordinates of certain points in this scatterplot. In Fig. 1.12, the 70th observation in the merged data set is given as a thick circle, and it is from a genuine banknote. This observation lies well embedded in the cloud of counterfeit banknotes. One straightforward approach that could be used to tell the counterfeit from the genuine banknotes is to draw a straight line and define notes above this value as genuine. We would, of course, misclassify the 70th observation, but can we do better?

If we extend the two-dimensional scatterplot by adding a third variable, e.g., $X_4$ (lower distance to inner frame), we obtain the scatterplot in three dimensions as shown in Fig. 1.13. It becomes apparent from the location of the point clouds that a better separation is obtained. We have rotated the three-dimensional data until this satisfactory 3D view was obtained. Later, we will see that the rotation is the same as bundling a high-dimensional observation into one or more linear combinations of the elements of the observation vector. In other words, the “separation line” parallel to the horizontal coordinate axis in Fig. 1.12 is, in Fig. 1.13, a plane and no longer parallel to one of the axes. The formula for such a separation plane is a linear combination of the elements of the observation vector:
$$a_1 x_1+a_2 x_2+\cdots+a_6 x_6=\text { const. }$$
The algorithm that automatically finds the weights $\left(a_1, \ldots, a_6\right)$ will be investigated later on in Chap. 14.

Let us study yet another technique: the scatterplot matrix. If we want to draw all possible two-dimensional scatterplots for the variables, we can create a so-called draftsman’s plot (named after a draftsman who prepares drafts for parliamentary discussions). Similar to a draftsman’s plot the scatterplot matrix helps in creating new ideas and in building knowledge about dependencies and structure.

## Chernoff-Flury Faces

If we are given data in numerical form, we tend to also display it numerically. This was done in the preceding sections: an observation $x_1=(1,2)$ was plotted as the point $(1,2)$ in a two-dimensional coordinate system. In multivariate analysis, we want to understand data in low dimensions (e.g., on a 2D computer screen) although the structures are hidden in high dimensions. The numerical display of data structures using coordinates, therefore, ends at dimensions greater than three.

If we are interested in condensing a structure into 2D elements, we have to consider alternative graphical techniques. The Chernoff-Flury faces, for example, provide such a condensation of high-dimensional information into a simple “face”. In fact, faces are a simple way of graphically displaying high-dimensional data. The sizes of face elements like pupils, eyes, upper and lower hair line, etc., are assigned to certain variables. The idea of using faces goes back to Chernoff (1973) and has been further developed by Bernhard Flury. We follow the design described in Flury and Riedwyl (1988), whose numbered face characteristics are referred to below.

First, every variable that is to be coded into a characteristic face element is transformed into a $(0,1)$ scale, i.e., the minimum of the variable corresponds to 0 and the maximum to 1. The extreme positions of the face elements, therefore, correspond to a certain “grin” or “happy” face element. Dark hair might be coded as 1, and blond hair as 0, and so on.

As an example, consider the observations 91 to 110 of the bank data. Recall that the bank data set consists of 200 observations of dimension 6 where, for example, $X_6$ is the diagonal of the note. If we assign the six variables to the following face elements:
$$\begin{aligned} & X_1=1,19 \text { (eye sizes) } \\ & X_2=2,20 \text { (pupil sizes) } \\ & X_3=4,22 \text { (eye slants) } \\ & X_4=11,29 \text { (upper hair lines) } \\ & X_5=12,30 \text { (lower hair lines) } \\ & X_6=13,14,31,32 \text { (face lines and darkness of hair), } \end{aligned}$$
we obtain Fig. 1.15.
Also recall that observations 1-100 correspond to the genuine notes, and that observations 101-200 correspond to the counterfeit notes. The counterfeit banknotes then correspond to the upper half of Fig. 1.15. In fact, the faces for these observations look more grim and less happy. The variable $X_6$ (diagonal) already worked well in the boxplot in Fig. 1.4 in distinguishing between the counterfeit and genuine notes. Here, this variable is assigned to the face line and the darkness of the hair. That is why we clearly see a good separation within these 20 observations.

What happens if we include all 100 genuine and all 100 counterfeit banknotes in the Chernoff-Flury face technique? Figure 1.16 shows the faces of the genuine banknotes with the same assignments as used before, and Fig. 1.17 shows the faces of the counterfeit banknotes. Comparing Figs. 1.16 and 1.17, one clearly sees that the diagonal (face line) is longer for genuine banknotes. The hair darkness, which also codes the diagonal, is correspondingly lighter (the diagonal is shorter) for the counterfeit banknotes. One sees that the faces of the genuine banknotes have a much darker appearance and broader face lines. The faces in Fig. 1.16 are obviously different from the ones in Fig. 1.17.


## Histograms

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j\left(x_0, h\right)$ denote the bin of length $h$, which is the element of a bin grid starting at $x_0$ :
$$B_j\left(x_0, h\right)=\left[x_0+(j-1) h, x_0+j h\right), \quad j \in \mathbb{Z},$$
where $[\cdot, \cdot)$ denotes a left-closed and right-open interval. If $\left\{x_i\right\}_{i=1}^n$ is an i.i.d. sample with density $f$, the histogram is defined as follows:
$$\widehat{f}_h(x)=n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^n \mathrm{I}\left\{x_i \in B_j\left(x_0, h\right)\right\} \mathrm{I}\left\{x \in B_j\left(x_0, h\right)\right\} .$$
In sum (1.7) the first indicator function $\mathrm{I}\left\{x_i \in B_j\left(x_0, h\right)\right\}$ counts the number of observations falling into bin $B_j\left(x_0, h\right)$. The second indicator function is responsible for “localizing” the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins. An $h$ that is too large leads to very big blocks and thus to a very unstructured histogram. On the other hand, an $h$ that is too small gives a very variable estimate with many unimportant peaks.
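
The histogram definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `histogram_density` and the sample values are hypothetical.

```python
import math

def histogram_density(x, data, x0, h):
    """Histogram estimate at x on the bin grid with origin x0 and bin-width h."""
    # Index j of the bin B_j(x0, h) = [x0 + (j-1)h, x0 + jh) that contains x.
    j = math.floor((x - x0) / h) + 1
    # Count the observations that fall into the same bin as x.
    count = sum(1 for xi in data if math.floor((xi - x0) / h) + 1 == j)
    return count / (len(data) * h)

sample = [0.1, 0.2, 0.25, 0.7, 0.9, 1.4]   # hypothetical data
for x in (0.3, 0.7, 1.4):
    print(x, histogram_density(x, sample, x0=0.0, h=0.5))
```

Shifting `x0` or changing `h` in this sketch moves observations between bins, which is how both parameters influence the shape of the estimate.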

The effect of $h$ is given in detail in Fig. 1.6. It contains the histogram (upper left) for the diagonal of the counterfeit banknotes for $x_0=137.8$ (the minimum of these observations) and $h=0.1$. Increasing $h$ to $h=0.2$ and using the same origin, $x_0=137.8$, results in the histogram shown in the lower left of the figure. This density histogram is somewhat smoother due to the larger $h$. The bin-width is next set to $h=0.3$ (upper right). From this histogram, one has the impression that the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9. The detection of modes requires fine-tuning of the bin-width. Using methods from smoothing methodology (Härdle et al., 2004), one can find an “optimal” bin-width $h$ for $n$ observations:
$$h_{\text{opt}}=\left(\frac{24 \sqrt{\pi}}{n}\right)^{1 / 3} .$$
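
As a quick numerical illustration, the bin-width formula can be evaluated directly. The helper name `optimal_binwidth` is hypothetical. The constant $(24\sqrt{\pi})^{1/3} \approx 3.49$ is the one found in Scott's normal reference rule with unit standard deviation, so for data on another scale one would typically multiply by an estimate of the standard deviation (an assumption that goes beyond the formula as stated here).

```python
import math

def optimal_binwidth(n):
    """Bin-width h_opt = (24 * sqrt(pi) / n)^(1/3) for n observations."""
    return (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)

for n in (100, 1000, 10000):
    print(n, optimal_binwidth(n))
```

The bin-width shrinks at the rate $n^{-1/3}$: eight times as many observations halve it.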

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Kernel Densities

The major difficulties of histogram estimation may be summarized in four critiques:

• determination of the bin-width $h$, which controls the shape of the histogram,
• choice of the bin origin $x_0$, which also influences to some extent the shape,
• loss of information since observations are replaced by the central point of the interval in which they fall,
• the underlying density function is often assumed to be smooth, but the histogram is not smooth.

Rosenblatt (1956), Whittle (1958), and Parzen (1962) developed an approach which avoids the last three difficulties. First, a smooth kernel function rather than a box is used as the basic building block. Second, the smooth function is centered directly over each observation. Let us study this refinement by supposing that $x$ is the center value of a bin. The histogram can in fact be rewritten as
$$\widehat{f}_h(x)=n^{-1} h^{-1} \sum_{i=1}^n \mathrm{I}\left(\left|x-x_i\right| \leq \frac{h}{2}\right) .$$

If we define $K(u)=\mathrm{I}\left(|u| \leq \frac{1}{2}\right)$, then (1.8) changes to
$$\widehat{f}_h(x)=n^{-1} h^{-1} \sum_{i=1}^n K\left(\frac{x-x_i}{h}\right) .$$
This is the general form of the kernel estimator. Allowing smoother kernel functions like the quartic kernel,
$$K(u)=\frac{15}{16}\left(1-u^2\right)^2 \mathrm{I}(|u| \leq 1),$$
and computing $x$ not only at bin centers gives us the kernel density estimator. Kernel estimators can also be derived via weighted averaging of rounded points (WARPing) or by averaging histograms with different origins, see Scott (1985). Table $1.5$ introduces some commonly used kernels.
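
The kernel estimator with the quartic kernel can be sketched as follows. The function names, the sample values and the bandwidth are hypothetical choices for illustration only.

```python
def quartic_kernel(u):
    """Quartic kernel K(u) = 15/16 * (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def kernel_density(x, data, h, kernel=quartic_kernel):
    """Kernel estimate (n h)^(-1) * sum_i K((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

sample = [138.2, 138.6, 139.1, 139.8, 140.0]    # hypothetical values
step = 0.01
grid = [137.0 + step * k for k in range(401)]   # evaluation grid on [137, 141]
density = [kernel_density(x, sample, h=0.5) for x in grid]
# Riemann-sum check that the estimate integrates to roughly one.
area = sum(density) * step
print(area)
```

Unlike the histogram, this estimate can be evaluated at any $x$ and has no bin origin to choose.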

Different kernels generate different shapes of the estimated density. The most important parameter is the bandwidth $h$, which can be optimized, for example, by cross-validation; see Härdle (1991) for details. The cross-validation method minimizes the integrated squared error. This measure of discrepancy is based on the squared differences $\left\{\hat{f}_h(x)-f(x)\right\}^2$. Averaging these squared deviations over a grid of points $\left\{x_l\right\}_{l=1}^L$ leads to
$$L^{-1} \sum_{l=1}^L\left\{\hat{f}_h\left(x_l\right)-f\left(x_l\right)\right\}^2 .$$
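
The averaged squared error can only be computed when the true density $f$ is known, so the sketch below uses simulated standard normal data, which is an assumption made purely for illustration. It is not a cross-validation routine; it only shows how the discrepancy reacts to the bandwidth. All names and parameter values are hypothetical.

```python
import math
import random

def quartic_kernel(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def kernel_density(x, data, h):
    return sum(quartic_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def averaged_squared_error(data, h, grid):
    """Mean over the grid of {f_h(x_l) - f(x_l)}^2, with f the standard normal pdf."""
    return sum((kernel_density(x, data, h) - normal_pdf(x)) ** 2 for x in grid) / len(grid)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]   # simulated sample, true f known
grid = [-3.0 + 0.1 * l for l in range(61)]
errors = {h: averaged_squared_error(data, h, grid) for h in (0.05, 0.3, 1.0, 3.0)}
for h, err in errors.items():
    print(h, err)
```

A very small bandwidth produces a noisy estimate with a large error; in practice $f$ is unknown and cross-validation replaces this criterion by an estimate computed from the data alone.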

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Multivariate Laplace Distribution

Let $g$ and $G$ be the pdf and cdf of a $d$-dimensional Gaussian distribution $N_d(0, \Sigma)$. The pdf and cdf of a multivariate Laplace distribution can then be written as
$$\begin{aligned} f_{\text{MLaplace}_d}(x ; m, \Sigma) &=\int_0^{\infty} g\left(z^{-\frac{1}{2}} x-z^{\frac{1}{2}} m\right) z^{-\frac{d}{2}} e^{-z} d z, \\ F_{\text{MLaplace}_d}(x ; m, \Sigma) &=\int_0^{\infty} G\left(z^{-\frac{1}{2}} x-z^{\frac{1}{2}} m\right) e^{-z} d z . \end{aligned}$$
The pdf can also be described as
$$\begin{aligned} f_{\text{MLaplace}_d}(x ; m, \Sigma)=& \frac{2 e^{x^{\top} \Sigma^{-1} m}}{(2 \pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}\left(\frac{x^{\top} \Sigma^{-1} x}{2+m^{\top} \Sigma^{-1} m}\right)^{\frac{\lambda}{2}} \\ & \times K_\lambda\left(\sqrt{\left(2+m^{\top} \Sigma^{-1} m\right)\left(x^{\top} \Sigma^{-1} x\right)}\right), \end{aligned}$$
where $\lambda=\frac{2-d}{2}$ and $K_\lambda(x)$ is the modified Bessel function of the third kind
$$K_\lambda(x)=\frac{1}{2}\left(\frac{x}{2}\right)^\lambda \int_0^{\infty} t^{-\lambda-1} e^{-t-\frac{x^2}{4 t}} d t, \quad x>0 .$$
The multivariate Laplace distribution has mean and covariance
$$\begin{aligned} \mathrm{E}[X] &=m, \\ \operatorname{Cov}[X] &=\Sigma+m m^{\top} . \end{aligned}$$
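
The integral form of the pdf can be read as a mixture: conditionally on $Z=z$ with $Z \sim \operatorname{Exp}(1)$, the vector is $N_d(z m, z \Sigma)$. Under that reading, a draw is $X = Z m + \sqrt{Z}\, L \varepsilon$ with $L L^{\top} = \Sigma$ and $\varepsilon$ standard normal. The sketch below uses this representation for $d=2$ to check the stated mean and covariance by simulation; the function name and the parameter values are hypothetical.

```python
import math
import random

def sample_mlaplace_2d(m, sigma, rng):
    """One 2D draw X = Z*m + sqrt(Z)*L*eps, with Z ~ Exp(1) and L L^T = sigma."""
    # Cholesky factor of the 2x2 matrix sigma, written out by hand.
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    z = rng.expovariate(1.0)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    root = math.sqrt(z)
    return (z * m[0] + root * l11 * e1,
            z * m[1] + root * (l21 * e1 + l22 * e2))

rng = random.Random(1)
m = (0.5, -1.0)                        # hypothetical parameter values
sigma = ((1.0, 0.3), (0.3, 2.0))
draws = [sample_mlaplace_2d(m, sigma, rng) for _ in range(100_000)]
n = len(draws)
mean = [sum(x[k] for x in draws) / n for k in range(2)]
cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in draws) / n
        for b in range(2)] for a in range(2)]
print(mean)   # should be close to m
print(cov)    # should be close to sigma + m m^T
```

For these parameter values, $\Sigma + m m^{\top}$ has diagonal entries 1.25 and 3.0 and off-diagonal entry $-0.2$, which the simulated covariance should approximate.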

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Multivariate Distributions

The cumulative distribution function (cdf) of a two-dimensional vector $\left(X_1, X_2\right)$ is given by
$$F\left(x_1, x_2\right)=\mathrm{P}\left(X_1 \leq x_1, X_2 \leq x_2\right) .$$
For the case that $X_1$ and $X_2$ are independent, their joint cumulative distribution function $F\left(x_1, x_2\right)$ can be written as a product of their one-dimensional marginals:
$$F\left(x_1, x_2\right)=F_{X_1}\left(x_1\right) F_{X_2}\left(x_2\right)=\mathrm{P}\left(X_1 \leq x_1\right) \mathrm{P}\left(X_2 \leq x_2\right) .$$
But how can we model the dependence of $X_1$ and $X_2$? Most people would suggest linear correlation. Correlation is, however, an appropriate measure of dependence only when the random variables have an elliptical or spherical distribution, a class which includes the multivariate normal distribution. Although the terms “correlation” and “dependency” are often used interchangeably, correlation is actually a rather imperfect measure of dependency, and there are many circumstances where correlation should not be used.

Copulae represent an elegant concept of connecting marginals with joint cumulative distribution functions. Copulae are functions that join or “couple” multivariate distribution functions to their one-dimensional marginal distribution functions. Let us consider a $d$-dimensional vector $X=\left(X_1, \ldots, X_d\right)^{\top}$. Using copulae, the marginal distribution functions $F_{X_i}$ $(i=1, \ldots, d)$ can be modelled separately from their dependence structure and then coupled together to form the multivariate distribution $F_X$. Copula functions have a long history in probability theory and statistics. Their application in finance is very recent. Copulae are important in Value-at-Risk calculations and constitute an essential tool in quantitative finance (Härdle et al., 2009).

First let us concentrate on the two-dimensional case; then we will extend this concept to the $d$-dimensional case, for a random variable in $\mathbb{R}^d$ with $d \geq 1$. To be able to define a copula function, we first need to introduce the concepts of the volume of a rectangle, a 2-increasing function and a grounded function.

Let $U_1$ and $U_2$ be two sets in $\overline{\mathbb{R}}=\mathbb{R} \cup\{+\infty\} \cup\{-\infty\}$ and consider the function $F: U_1 \times U_2 \longrightarrow \overline{\mathbb{R}}$.

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Laplace Distribution

The univariate Laplace distribution with mean zero was introduced by Laplace (1774). The Laplace distribution can be defined as the distribution of differences between two independent variates with identical exponential distributions. Therefore it is also called the double exponential distribution (Fig. 4.9).
The Laplace distribution with mean $\mu$ and scale parameter $\theta$ has the pdf
$$f_{\text {Laplace }}(x ; \mu, \theta)=\frac{1}{2 \theta} e^{-\frac{|x-\mu|}{\theta}}$$
and the cdf
$$F_{\text {Laplace }}(x ; \mu, \theta)=\frac{1}{2}\left\{1+\operatorname{sign}(x-\mu)\left(1-e^{-\frac{|x-\mu|}{\theta}}\right)\right\},$$
where sign is the sign function. The mean, variance, skewness and kurtosis of the Laplace distribution are
$$\begin{aligned} \mu &=\mu, \\ \sigma^2 &=2 \theta^2, \\ \text { Skewness } &=0, \\ \text { Kurtosis } &=6 . \end{aligned}$$
With mean 0 and $\theta=1$, we obtain the standard Laplace distribution, whose pdf is $f(x)=\frac{1}{2} e^{-|x|}$.
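
The pdf and cdf translate directly into code. The sketch below also checks the variance $2\theta^2$ by a simple midpoint rule; the function names and parameter values are hypothetical.

```python
import math

def laplace_pdf(x, mu=0.0, theta=1.0):
    return math.exp(-abs(x - mu) / theta) / (2.0 * theta)

def laplace_cdf(x, mu=0.0, theta=1.0):
    sign = (x > mu) - (x < mu)            # sign function: -1, 0 or 1
    return 0.5 * (1.0 + sign * (1.0 - math.exp(-abs(x - mu) / theta)))

# Midpoint-rule check of the variance 2 * theta^2 for mu = 1, theta = 0.5.
mu, theta, step = 1.0, 0.5, 0.001
xs = [mu - 20.0 + step * (k + 0.5) for k in range(40_000)]
variance = sum((x - mu) ** 2 * laplace_pdf(x, mu, theta) for x in xs) * step
print(variance, 2.0 * theta ** 2)
print(laplace_cdf(mu, mu, theta))         # the cdf at the mean equals one half
```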

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Cauchy Distribution

The Cauchy distribution is motivated by the following example.
Example 4.23 A gangster has just robbed a bank. As he runs to a point $s$ metres away from the wall of the bank, a policeman reaches the crime scene behind the wall of the bank. The robber turns back and starts to shoot, but he is such a poor shooter that the angle of his fire (marked in Fig. 4.10 as $\alpha$) is uniformly distributed. The bullets hit the wall at distance $x$ (from the centre). Obviously the distribution of $x$, the random variable where the bullet hits the wall, is vital knowledge for the policeman in order to identify the location of the gangster. (Should the policeman calculate the mean or the median of the observed bullet hits $\left\{x_i\right\}_{i=1}^n$ in order to identify the location of the robber?)
Since $\alpha$ is uniformly distributed:
$$f(\alpha)=\frac{1}{\pi} \boldsymbol{I}(\alpha \in[-\pi / 2, \pi / 2])$$ and
$$\begin{aligned} \tan \alpha &=\frac{x}{s}, \\ \alpha &=\arctan \left(\frac{x}{s}\right), \\ d \alpha &=\frac{1}{s} \frac{1}{1+\left(\frac{x}{s}\right)^2} d x . \end{aligned}$$
For a small interval $d \alpha$, the probability is given by
$$\begin{aligned} f(\alpha) d \alpha &=\frac{1}{\pi} d \alpha \\ &=\frac{1}{s \pi} \frac{1}{1+\left(\frac{x}{s}\right)^2} d x \end{aligned}$$
with
$$\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \frac{1}{\pi} d \alpha=1$$
and
$$\begin{aligned} \int_{-\infty}^{\infty} \frac{1}{s \pi} \frac{1}{1+\left(\frac{x}{s}\right)^2} d x &=\frac{1}{\pi}\left\{\arctan \left(\frac{x}{s}\right)\right\}_{-\infty}^{\infty} \\ &=\frac{1}{\pi}\left\{\frac{\pi}{2}-\left(-\frac{\pi}{2}\right)\right\} \\ &=1 . \end{aligned}$$
So the pdf of $x$ can be written as:
$$f(x)=\frac{1}{s \pi} \frac{1}{1+\left(\frac{x}{s}\right)^2}$$
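
The gangster example can be simulated directly: draw the angle uniformly and map it to the wall through $x = s \tan \alpha$. The sketch below is illustrative only; the distance `s = 3.0`, the seed and the helper names are hypothetical. The cdf used for comparison follows from the antiderivative $\frac{1}{\pi} \arctan (x / s)$ appearing in the normalisation integral.

```python
import math
import random
import statistics

def cauchy_pdf(x, s):
    return 1.0 / (s * math.pi * (1.0 + (x / s) ** 2))

def cauchy_cdf(x, s):
    return 0.5 + math.atan(x / s) / math.pi

random.seed(2)
s = 3.0    # hypothetical distance of the gangster from the wall, in metres
hits = [s * math.tan(random.uniform(-math.pi / 2, math.pi / 2))
        for _ in range(100_000)]
median = statistics.median(hits)
share = sum(1 for x in hits if x <= s) / len(hits)
print("median:", median)                   # stays near the centre
print("mean:  ", statistics.fmean(hits))   # erratic from run to run
print(share, cauchy_cdf(s, s))             # empirical vs. theoretical P(x <= s)
```

This bears on the question posed in the example: the Cauchy distribution has no finite mean, so the sample mean does not settle down as more hits are observed, whereas the sample median remains a stable estimate of the centre.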

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Generalised Hyperbolic Distribution

The generalised hyperbolic distribution was introduced by Barndorff-Nielsen and at first applied to model grain size distributions of wind blown sands. Today one of its most important uses is in stock price modelling and market risk measurement. The name of the distribution is derived from the fact that its log-density forms a hyperbola, while the log-density of the normal distribution is a parabola (Fig. 4.7).

The density of a one-dimensional generalised hyperbolic $(\mathrm{GH})$ distribution for $x \in \mathbb{R}$ is
$$\begin{aligned} &f_{\mathrm{GH}}(x ; \lambda, \alpha, \beta, \delta, \mu) \\ &\quad=\frac{\left(\sqrt{\alpha^2-\beta^2} / \delta\right)^\lambda}{\sqrt{2 \pi} K_\lambda\left(\delta \sqrt{\alpha^2-\beta^2}\right)} \frac{K_{\lambda-1 / 2}\left\{\alpha \sqrt{\delta^2+(x-\mu)^2}\right\}}{\left(\sqrt{\delta^2+(x-\mu)^2} / \alpha\right)^{1 / 2-\lambda}} e^{\beta(x-\mu)}, \end{aligned}$$
where $K_\lambda$ is a modified Bessel function of the third kind with index $\lambda$
$$K_\lambda(x)=\frac{1}{2} \int_0^{\infty} y^{\lambda-1} e^{-\frac{x}{2}\left(y+y^{-1}\right)} d y$$
The domain of variation of the parameters is $\mu \in \mathbb{R}$ and
$$\begin{array}{lll} \delta \geq 0, & |\beta|<\alpha, & \text { if } \lambda>0, \\ \delta>0, & |\beta|<\alpha, & \text { if } \lambda=0, \\ \delta>0, & |\beta| \leq \alpha, & \text { if } \lambda<0 . \end{array}$$
The generalised hyperbolic distribution has the following mean and variance
$$\begin{aligned} \mathrm{E}[X]=& \mu+\frac{\delta \beta}{\sqrt{\alpha^2-\beta^2}} \frac{K_{\lambda+1}\left(\delta \sqrt{\alpha^2-\beta^2}\right)}{K_\lambda\left(\delta \sqrt{\alpha^2-\beta^2}\right)}, \\ \operatorname{Var}[X]=& \delta^2\left[\frac{K_{\lambda+1}\left(\delta \sqrt{\alpha^2-\beta^2}\right)}{\delta \sqrt{\alpha^2-\beta^2} K_\lambda\left(\delta \sqrt{\alpha^2-\beta^2}\right)}+\frac{\beta^2}{\alpha^2-\beta^2}\left[\frac{K_{\lambda+2}\left(\delta \sqrt{\alpha^2-\beta^2}\right)}{K_\lambda\left(\delta \sqrt{\alpha^2-\beta^2}\right)}\right.\right. \\ &\left.\left.-\left\{\frac{K_{\lambda+1}\left(\delta \sqrt{\alpha^2-\beta^2}\right)}{K_\lambda\left(\delta \sqrt{\alpha^2-\beta^2}\right)}\right\}^2\right]\right] . \end{aligned}$$
Here $\mu$ and $\delta$ control the density’s location and scale, respectively. With specific values of $\lambda$, we obtain different sub-classes of GH such as the hyperbolic (HYP) or the normal-inverse Gaussian (NIG) distribution.
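
The GH density can be evaluated numerically once $K_\lambda$ is available. The sketch below is one possible approach under stated assumptions: it computes $K_\lambda$ from the integral above after the substitution $y=e^u$, which gives the equivalent form $K_\lambda(x)=\int_0^{\infty} e^{-x \cosh u} \cosh (\lambda u)\, d u$, using a plain trapezoidal rule. The truncation point, step count, function names and parameter values are hypothetical choices; a numerical library routine for the Bessel function would normally be preferred.

```python
import math

def bessel_k(lam, x, upper=20.0, steps=1000):
    """K_lam(x) as the integral of exp(-x cosh u) * cosh(lam u) over u in [0, upper]."""
    du = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * du
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal rule
        total += weight * math.exp(-x * math.cosh(u)) * math.cosh(lam * u)
    return total * du

def gh_pdf(x, lam, alpha, beta, delta, mu):
    """Generalised hyperbolic density, term by term as in the formula above."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    r = math.sqrt(delta ** 2 + (x - mu) ** 2)
    norm = (gamma / delta) ** lam / (math.sqrt(2.0 * math.pi) * bessel_k(lam, delta * gamma))
    return (norm * bessel_k(lam - 0.5, alpha * r)
            / (r / alpha) ** (0.5 - lam) * math.exp(beta * (x - mu)))

# Hyperbolic case (lam = 1) with hypothetical parameter values.
step = 0.1
grid = [-12.0 + step * k for k in range(241)]
total = sum(gh_pdf(x, 1.0, 2.0, 0.5, 1.0, 0.0) for x in grid) * step
print(total)   # numerical check that the density integrates to roughly one
```

The closed form $K_{1/2}(x)=\sqrt{\pi /(2 x)}\, e^{-x}$ provides an independent check of the numerical Bessel routine.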

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Student’s t-Distribution

The $t$-distribution was first analysed by Gosset (1908), who published it under the pseudonym “Student” at the request of his employer. Let $X$ be a normally distributed random variable with mean $0$ and variance $\sigma^2$, and let $Y$ be a random variable such that $Y^2 / \sigma^2$ has a chi-square distribution with $n$ degrees of freedom. Assume that $X$ and $Y$ are independent, then
$$t \stackrel{\text { def }}{=} \frac{X \sqrt{n}}{Y}$$
is distributed as Student’s $t$ with $n$ degrees of freedom. The $t$-distribution has the following density function
$$f_t(x ; n)=\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n \pi} \Gamma\left(\frac{n}{2}\right)}\left(1+\frac{x^2}{n}\right)^{-\frac{n+1}{2}}$$
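where $n$ is the number of degrees of freedom, $-\infty<x<\infty$, and $\Gamma$ is the gamma function. The mean, variance, skewness and kurtosis of Student’s $t$-distribution $(n>4)$ are:
$$\begin{aligned} \mu &=0, \\ \sigma^2 &=\frac{n}{n-2}, \\ \text { Skewness } &=0, \\ \text { Kurtosis } &=3+\frac{6}{n-4} . \end{aligned}$$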
The $t$-distribution is symmetric around 0 , which is consistent with the fact that its mean is 0 and skewness is also 0 (Fig. 4.8).
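
The construction $t = X \sqrt{n} / Y$ can be checked by simulation. The sketch below assumes $X$ has mean zero and builds $Y^2$ as a sum of $n$ squared independent $N(0, \sigma^2)$ variables, so that $Y^2 / \sigma^2$ is chi-square with $n$ degrees of freedom. The names, the seed and the values of $n$ and $\sigma$ are hypothetical.

```python
import math
import random

def t_pdf(x, n):
    """Density of Student's t with n degrees of freedom."""
    coef = math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
    return coef * (1.0 + x * x / n) ** (-(n + 1) / 2.0)

def draw_t(n, sigma, rng):
    """One draw of X * sqrt(n) / Y with X ~ N(0, sigma^2), Y^2 / sigma^2 ~ chi^2_n."""
    x = rng.gauss(0.0, sigma)
    # Y is drawn independently of X; sigma cancels in the ratio.
    y = math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)))
    return x * math.sqrt(n) / y

rng = random.Random(3)
n = 10
draws = [draw_t(n, sigma=2.0, rng=rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, variance, n / (n - 2))   # sample moments vs. theoretical variance
print(t_pdf(0.0, 1), 1.0 / math.pi)  # for n = 1 the density at 0 equals 1/pi
```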

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Andrews’ Curves

The basic problem of graphical displays of multivariate data is the dimensionality. Scatterplots work well up to three dimensions (if we use interactive displays). More than three dimensions have to be coded into displayable $2 \mathrm{D}$ or $3 \mathrm{D}$ structures (e.g. faces). The idea of coding and representing multivariate data by curves was suggested by Andrews (1972). Each multivariate observation $X_{i}=\left(X_{i, 1}, \ldots, X_{i, p}\right)$ is transformed into a curve as follows:
$$f_{i}(t)= \begin{cases}\dfrac{X_{i, 1}}{\sqrt{2}}+X_{i, 2} \sin (t)+X_{i, 3} \cos (t)+\cdots+X_{i, p-1} \sin \left(\frac{p-1}{2} t\right)+X_{i, p} \cos \left(\frac{p-1}{2} t\right) & \text { for } p \text { odd, } \\ \dfrac{X_{i, 1}}{\sqrt{2}}+X_{i, 2} \sin (t)+X_{i, 3} \cos (t)+\cdots+X_{i, p} \sin \left(\frac{p}{2} t\right) & \text { for } p \text { even, }\end{cases}$$
such that the observation represents the coefficients of a so-called Fourier series $(t \in[-\pi, \pi])$. Suppose that we have three-dimensional observations: $X_{1}=(0,0,1)$, $X_{2}=(1,0,0)$ and $X_{3}=(0,1,0)$. Here $p=3$ and the following representations correspond to the Andrews’ curves:
$$\begin{aligned} &f_{1}(t)=\cos (t), \\ &f_{2}(t)=\frac{1}{\sqrt{2}} \text { and } \\ &f_{3}(t)=\sin (t) . \end{aligned}$$
These curves are indeed quite distinct, since the observations $X_{1}, X_{2}$, and $X_{3}$ are the 3D unit vectors: each observation has mass only in one of the three dimensions. The order of the variables plays an important role.
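
The transformation is short enough to write out. The sketch below evaluates the curve for one observation and reproduces the three unit-vector curves of the example; the function name `andrews_curve` is hypothetical.

```python
import math

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x = (x_1, ..., x_p) at t."""
    value = x[0] / math.sqrt(2.0)
    for k, coef in enumerate(x[1:], start=1):
        freq = (k + 1) // 2                              # frequencies 1, 1, 2, 2, 3, ...
        basis = math.sin if k % 2 == 1 else math.cos     # sin for x_2, cos for x_3, ...
        value += coef * basis(freq * t)
    return value

t = 0.3
print(andrews_curve((0, 0, 1), t), math.cos(t))
print(andrews_curve((1, 0, 0), t), 1.0 / math.sqrt(2.0))
print(andrews_curve((0, 1, 0), t), math.sin(t))
```

Permuting the entries of `x` changes which variable is attached to which frequency, which is why the order of the variables matters.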

## 统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Parallel Coordinates Plots

Parallel coordinates plots (PCP) are a method for representing high-dimensional data, see Inselberg (1985). Instead of plotting observations in an orthogonal coordinate system, PCP draws coordinates in parallel axes and connects them with straight lines. This method helps in representing data with more than four dimensions.

One first scales all variables to $\max =1$ and $\min =0$. The coordinate index $j$ is drawn onto the horizontal axis, and the scaled value of variable $x_{i j}$ is mapped onto the vertical axis. This way of representation is very useful for high-dimensional data. It is however also sensitive to the order of the variables, since certain trends in the data can be shown more clearly in one ordering than in another.
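
The scaling step can be sketched as follows. Each scaled row is then the polyline through the points $(j, \text{scaled } x_{ij})$ that a PCP would draw. The function name and the data values are hypothetical, and the plotting itself is left out.

```python
def minmax_scale_columns(rows):
    """Scale every variable (column) so that its minimum is 0 and its maximum is 1."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    # A constant column has span 0; it is mapped to 0 to avoid dividing by zero.
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)] for row in rows]

rows = [[214.8, 131.0, 9.0],
        [214.6, 129.7, 8.1],
        [215.3, 129.9, 10.4]]    # hypothetical observations with p = 3 variables
scaled = minmax_scale_columns(rows)
for profile in scaled:
    # Coordinate index j on the horizontal axis, scaled value on the vertical axis.
    print(list(enumerate(profile, start=1)))
```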

Example 1.5 Take, once again, the observations 96–105 of the Swiss bank notes. These observations are six-dimensional, so we can’t show them in a six-dimensional Cartesian coordinate system. Using the PCP technique, however, they can be plotted on parallel axes. This is shown in Fig. 1.22.

PCP can also be used for detecting linear dependencies between variables: if all the lines between two neighbouring axes $(p=2)$ are almost parallel, there is a positive linear dependence between the two variables. In Fig. 1.23 we display the two variables weight and displacement for the car data set in Sect. 22.3. The correlation coefficient $\rho$ introduced in Sect. 3.2 is 0.9. If all lines intersect visibly in the middle, there is evidence of a negative linear dependence between these two variables, see Fig. 1.24. In fact the correlation between the two variables mileage and weight is $\rho=-0.82$: the greater the weight, the lower the mileage.
