### Comparison of Batches

## Kernel Densities

The major difficulties of histogram estimation may be summarised in four critiques:

• determination of the binwidth $h$, which controls the shape of the histogram,
• choice of the bin origin $x_{0}$, which also influences to some extent the shape,
• loss of information since observations are replaced by the central point of the interval in which they fall,
• the underlying density function is often assumed to be smooth, but the histogram is not smooth.
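The first two points can be illustrated with a small sketch. The helper `histogram_density` and the six data values below are assumptions made for illustration, not part of the original text; the only thing that changes between the two runs is the bin origin $x_{0}$.

```python
import numpy as np

def histogram_density(data, x0, h):
    """Histogram density estimate with bin origin x0 and binwidth h.

    Returns the bin edges and the estimated density on each bin.
    """
    data = np.asarray(data, dtype=float)
    # Index j of the bin [x0 + j*h, x0 + (j+1)*h) containing each observation.
    j = np.floor((data - x0) / h).astype(int)
    j_min, j_max = j.min(), j.max()
    counts = np.bincount(j - j_min, minlength=j_max - j_min + 1)
    edges = x0 + h * np.arange(j_min, j_max + 2)
    density = counts / (len(data) * h)
    return edges, density

# Hypothetical sample: same data, same binwidth, two different origins.
data = np.array([0.9, 1.1, 1.9, 2.1, 2.9, 3.1])
edges_a, density_a = histogram_density(data, x0=0.0, h=1.0)
edges_b, density_b = histogram_density(data, x0=0.5, h=1.0)
print(edges_a, density_a)  # four bins with unequal heights
print(edges_b, density_b)  # three bins of equal height
```

With origin 0 the estimate looks unimodal, while with origin 0.5 the same data look uniform, which is the sensitivity the second critique refers to.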

Rosenblatt (1956), Whittle (1958) and Parzen (1962) developed an approach which avoids the last three difficulties. First, a smooth kernel function rather than a box is used as the basic building block. Second, the smooth function is centred directly over each observation. Let us study this refinement by supposing that $x$ is the centre value of a bin. The histogram can in fact be rewritten as
$$\hat{f}_{h}(x)=n^{-1} h^{-1} \sum_{i=1}^{n} \boldsymbol{I}\left(\left|x-x_{i}\right| \leq \frac{h}{2}\right). \tag{1.8}$$
If we define $K(u)=\boldsymbol{I}\left(|u| \leq \frac{1}{2}\right)$, then (1.8) changes to
$$\hat{f}_{h}(x)=n^{-1} h^{-1} \sum_{i=1}^{n} K\left(\frac{x-x_{i}}{h}\right).$$
This is the general form of the kernel estimator. Allowing smoother kernel functions like the quartic kernel,
$$K(u)=\frac{15}{16}\left(1-u^{2}\right)^{2} \boldsymbol{I}(|u| \leq 1)$$
and evaluating the estimate at arbitrary points $x$, not only at bin centres, gives us the kernel density estimator. Kernel estimators can also be derived via weighted averaging of rounded points (WARPing) or by averaging histograms with different origins; see Scott (1985). Table 1.5 introduces some commonly used kernels.
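The general form of the kernel estimator translates almost directly into code. The following is a minimal sketch, assuming NumPy; the function names `box_kernel`, `quartic_kernel` and `kde`, and the sample data, are illustrative choices rather than anything defined in the text.

```python
import numpy as np

def box_kernel(u):
    """Uniform kernel K(u) = I(|u| <= 1/2), which reproduces the histogram."""
    u = np.asarray(u, dtype=float)
    return (np.abs(u) <= 0.5).astype(float)

def quartic_kernel(u):
    """Quartic kernel K(u) = 15/16 * (1 - u^2)^2 * I(|u| <= 1)."""
    u = np.asarray(u, dtype=float)
    return 15.0 / 16.0 * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)

def kde(x, data, h, kernel=quartic_kernel):
    """Kernel density estimate n^-1 h^-1 sum_i K((x - x_i) / h) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Rows index evaluation points, columns index observations.
    u = (x[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

# Hypothetical sample used only for illustration.
data = np.array([0.9, 1.1, 1.9, 2.1, 2.9, 3.1])
grid = np.linspace(-1.0, 5.0, 601)
density = kde(grid, data, h=1.0)

# Riemann sum over the grid; should be close to 1 for a density estimate.
print(density.sum() * (grid[1] - grid[0]))

# With the box kernel, the value at a bin centre matches the histogram height.
print(kde(1.5, data, h=1.0, kernel=box_kernel))
```

Passing `box_kernel` recovers the histogram value at a bin centre, while `quartic_kernel` gives a smooth curve that can be evaluated at any point of the grid.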

## Chernoff-Flury Faces

If we are given data in numerical form, we tend to also display it numerically. This was done in the preceding sections: an observation $x_{1}=(1,2)$ was plotted as the point $(1,2)$ in a two-dimensional coordinate system. In multivariate analysis we want to understand data in low dimensions (e.g. on a $2 \mathrm{D}$ computer screen) although the structures are hidden in high dimensions. The numerical display of data structures using coordinates therefore ends at dimensions greater than three.

If we are interested in condensing a structure into $2 \mathrm{D}$ elements, we have to consider alternative graphical techniques. The Chernoff-Flury faces, for example, condense high-dimensional information into a simple “face”. The sizes of face elements such as the pupils, eyes, and upper and lower hair lines are assigned to certain variables. The idea of using faces goes back to Chernoff (1973) and has been further developed by Bernhard Flury. We follow the design described in Flury and Riedwyl (1988), which uses the following characteristics.

1. right eye size
2. right pupil size
3. position of right pupil
4. right eye slant
5. horizontal position of right eye
6. vertical position of right eye
7. curvature of right eyebrow
8. density of right eyebrow
9. horizontal position of right eyebrow
10. vertical position of right eyebrow
11. right upper hair line
12. right lower hair line
13. right face line
14. darkness of right hair
15. right hair slant
16. right nose line
17. right size of mouth
18. right curvature of mouth
19-36. like 1-18, only for the left side.
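Before a face can be drawn, each variable has to be attached to one of the 36 characteristics. The sketch below covers only that bookkeeping step and does not draw anything. It assumes that every variable is rescaled to the interval [0, 1] over the sample, which is a common convention but is not stated in the text; the names `CHARACTERISTICS` and `face_parameters` and the toy data are hypothetical.

```python
import numpy as np

# The 18 right-side characteristics from the list above (numbers 1-18);
# numbers 19-36 repeat them for the left side.
RIGHT_SIDE = [
    "right eye size", "right pupil size", "position of right pupil",
    "right eye slant", "horizontal position of right eye",
    "vertical position of right eye", "curvature of right eyebrow",
    "density of right eyebrow", "horizontal position of right eyebrow",
    "vertical position of right eyebrow", "right upper hair line",
    "right lower hair line", "right face line", "darkness of right hair",
    "right hair slant", "right nose line", "right size of mouth",
    "right curvature of mouth",
]
CHARACTERISTICS = RIGHT_SIDE + [name.replace("right", "left") for name in RIGHT_SIDE]

def face_parameters(data, assignment):
    """Map data columns to face characteristics.

    data: (n, p) array of observations.
    assignment: dict {column index: characteristic number in 1..36}.
    Returns one dict per observation, {characteristic name: value in [0, 1]}.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    # Assumption: min-max scaling per variable; constant columns map to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (data - lo) / span
    return [
        {CHARACTERISTICS[k - 1]: float(row[col]) for col, k in assignment.items()}
        for row in scaled
    ]

# Toy data: variable 0 drives characteristic 1, variable 1 drives characteristic 17.
data = np.array([[10.0, 200.0], [20.0, 100.0], [15.0, 150.0]])
faces = face_parameters(data, {0: 1, 1: 17})
print(faces[0])
```

Which variable is assigned to which characteristic is a design choice; prominent elements such as face line or hair darkness tend to dominate the visual impression, so the assignment affects what the faces reveal.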

## Andrews’ Curves

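The basic problem of graphical displays of multivariate data is the dimensionality. Scatterplots work well up to three dimensions (if we use interactive displays). More than three dimensions have to be coded into displayable 2D or 3D structures (e.g. faces). The idea of coding and representing multivariate data by curves was suggested by Andrews (1972). Each multivariate observation $X_{i}=\left(X_{i, 1}, \ldots, X_{i, p}\right)$ is transformed into a curve as follows:
$$f_{i}(t)= \begin{cases}\frac{X_{i, 1}}{\sqrt{2}}+X_{i, 2} \sin (t)+X_{i, 3} \cos (t)+\cdots+X_{i, p-1} \sin \left(\frac{p-1}{2} t\right)+X_{i, p} \cos \left(\frac{p-1}{2} t\right) & \text { for } p \text { odd, } \\ \frac{X_{i, 1}}{\sqrt{2}}+X_{i, 2} \sin (t)+X_{i, 3} \cos (t)+\cdots+X_{i, p} \sin \left(\frac{p}{2} t\right) & \text { for } p \text { even, }\end{cases}$$
so that the observation represents the coefficients of a so-called Fourier series $(t \in[-\pi, \pi])$.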
Suppose that we have three-dimensional observations: $X_{1}=(0,0,1)$, $X_{2}=(1,0,0)$ and $X_{3}=(0,1,0)$. Here $p=3$ and the following representations correspond to the Andrews’ curves:
$$\begin{aligned} f_{1}(t) &=\cos (t), \\ f_{2}(t) &=\frac{1}{\sqrt{2}} \quad \text { and } \\ f_{3}(t) &=\sin (t). \end{aligned}$$
These curves are indeed quite distinct, since the observations $X_{1}, X_{2}$, and $X_{3}$ are the $3 \mathrm{D}$ unit vectors: each observation has mass only in one of the three dimensions. The order of the variables plays an important role.
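The three example curves can be reproduced numerically. The sketch below assumes the usual Andrews form, in which the first coefficient multiplies $1/\sqrt{2}$ and the remaining coefficients alternate between sine and cosine terms of increasing frequency; this agrees with the three curves given above. The function name `andrews_curve` is an illustrative choice.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at the points t.

    f(t) = x[0]/sqrt(2) + x[1] sin(t) + x[2] cos(t) + x[3] sin(2t) + x[4] cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    curve = np.full_like(t, x[0] / np.sqrt(2.0))
    for j in range(1, len(x)):
        freq = (j + 1) // 2  # j = 1, 2 -> frequency 1; j = 3, 4 -> frequency 2; ...
        if j % 2 == 1:
            curve = curve + x[j] * np.sin(freq * t)
        else:
            curve = curve + x[j] * np.cos(freq * t)
    return curve

t = np.linspace(-np.pi, np.pi, 201)
f1 = andrews_curve((0, 0, 1), t)  # cos(t)
f2 = andrews_curve((1, 0, 0), t)  # constant 1/sqrt(2)
f3 = andrews_curve((0, 1, 0), t)  # sin(t)

# Permuting the variables changes the curve, so the order matters.
f_perm = andrews_curve((1, 0, 0)[::-1], t)
print(np.allclose(f_perm, f2))
```

Reversing the components of $X_{2}$ turns its constant curve into a cosine, which shows concretely why the order of the variables plays an important role.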


