统计代写|多元统计分析代写Multivariate Statistical Analysis代考|Parallel Coordinates Plots

Parallel coordinates plots (PCP) are a method for representing high-dimensional data, see Inselberg (1985). Instead of plotting observations in an orthogonal coordinate system, PCP draws each coordinate on its own parallel axis and connects the values belonging to one observation with straight line segments. This makes it possible to display data with more than four dimensions.

One first scales all variables to $\max =1$ and $\min =0$. The coordinate index $j$ is drawn on the horizontal axis, and the scaled value of variable $x_{ij}$ is mapped onto the vertical axis. This representation is very useful for high-dimensional data. It is, however, sensitive to the order of the variables, since certain trends in the data can be shown more clearly in one ordering than in another.
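A minimal sketch of the scaling step and of the polylines a PCP draws, in plain Python. It assumes the data are given as a list of rows (observations) with $p$ numeric columns; the helper names `minmax_scale` and `pcp_polylines` and the toy values are made up for illustration. Each observation $i$ becomes a polyline through the vertices $(j, \text{scaled } x_{ij})$, $j = 1, \ldots, p$, which a plotting library would then draw.

```python
# Minimal PCP sketch: min-max scaling plus polyline vertices.
# Helper names and toy data are illustrative assumptions.

def minmax_scale(rows):
    """Scale each column to min 0 and max 1; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def pcp_polylines(rows):
    """One polyline per observation i, through the vertices (j, scaled x_ij)."""
    return [[(j, v) for j, v in enumerate(row, start=1)] for row in minmax_scale(rows)]


data = [  # three made-up observations on p = 3 variables
    [1.0, 10.0, 5.0],
    [2.0, 30.0, 4.0],
    [3.0, 20.0, 3.0],
]
for line in pcp_polylines(data):
    print(line)
# [(1, 0.0), (2, 0.0), (3, 1.0)]
# [(1, 0.5), (2, 1.0), (3, 0.5)]
# [(1, 1.0), (2, 0.5), (3, 0.0)]
```

Permuting the columns of `data` before calling `pcp_polylines` changes which variables sit on adjacent axes, which is exactly the ordering sensitivity described above.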

Example 1.5 Take, once again, observations 96–105 of the Swiss bank notes. These observations are six-dimensional, so we cannot show them in a six-dimensional Cartesian coordinate system. Using the PCP technique, however, they can be plotted on parallel axes, as shown in Fig. 1.22.

PCP can also be used for detecting linear dependencies between variables. For two variables ($p=2$), if all the connecting lines are almost parallel, there is a positive linear dependence between them. In Fig. 1.23 we display the two variables weight and displacement for the car data set in Sect. 22.3; their correlation coefficient $\rho$, introduced in Sect. 3.2, is $0.9$. If all lines visibly intersect in the middle, there is evidence of a negative linear dependence between the two variables, see Fig. 1.24. In fact, the correlation between mileage and weight is $\rho=-0.82$: the greater the weight, the lower the mileage.
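Both patterns follow from the min-max scaling. If $y = a + bx$ exactly with $b > 0$, the scaled $y$ equals the scaled $x$, so every segment between the two axes is horizontal; with $b < 0$ the scaled $y$ equals one minus the scaled $x$, so every segment passes through height $0.5$ halfway between the axes. The sketch below checks this on made-up toy values; the helper names are hypothetical, and real data such as weight and displacement are only approximately linear, so their lines are only nearly parallel.

```python
import math

# Toy illustration (made-up values) of the two PCP patterns for p = 2.

def pearson(x, y):
    """Sample correlation coefficient rho of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale01(v):
    """Min-max scaling to [0, 1]."""
    lo, hi = min(v), max(v)
    return [(a - lo) / (hi - lo) for a in v]


def segments(x, y):
    """Heights (left, right) of each line between axis 1 (x) and axis 2 (y)."""
    return list(zip(scale01(x), scale01(y)))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pos = [2.0 + 3.0 * a for a in x]   # exact positive linear relation
y_neg = [20.0 - 3.0 * a for a in x]  # exact negative linear relation

print(pearson(x, y_pos), segments(x, y_pos))  # rho = 1: every segment is horizontal
print(pearson(x, y_neg), segments(x, y_neg))  # rho = -1: all segments cross at height 0.5
```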

Hexagon Plots

This section closely follows the presentation of Lewin-Koh (2006). In geometry, a hexagon is a polygon with six edges and six vertices. Hexagon binning is a type of bivariate histogram with hexagonal bins. It is useful for visualising the structure of data sets with a large number of observations $n$. The concept of hexagon binning is as follows:

1. The $xy$ plane over the set $(\operatorname{range}(x), \operatorname{range}(y))$ is tessellated by a regular grid of hexagons.
2. The number of points falling in each hexagon is counted.
3. The hexagons with count $>0$ are plotted by using a colour ramp or varying the radius of the hexagon in proportion to the counts.
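The three steps can be sketched in plain Python. The sketch assumes a grid anchored at the origin with centre spacing `d`, a free parameter (one could, for instance, set `d = (max(x) - min(x)) / nbins` to fix the number of bins across the range of $x$). It finds the hexagon of a point as its nearest centre on two interleaved rectangular lattices, whose union is a regular hexagonal grid. This construction is chosen for brevity and is not claimed to be the implementation described by Lewin-Koh (2006); the function names are hypothetical.

```python
import math
from collections import Counter


def hexbin_counts(points, d):
    """Count points per hexagon of a regular grid with centre spacing d.

    Centres lie on lattice A at (i*d, j*h) and lattice B at ((i+0.5)*d, (j+0.5)*h),
    with h = d*sqrt(3); the Voronoi cells of their union are regular hexagons.
    A hexagon is keyed by integers (k, l) with centre (k*d/2, l*h/2).
    """
    h = d * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        # Step 1 (grid): nearest centre on each rectangular sub-lattice
        i_a, j_a = round(x / d), round(y / h)
        dist_a = (x - i_a * d) ** 2 + (y - j_a * h) ** 2
        i_b, j_b = math.floor(x / d), math.floor(y / h)
        dist_b = (x - (i_b + 0.5) * d) ** 2 + (y - (j_b + 0.5) * h) ** 2
        # Step 2: count the point in the hexagon of the nearer centre
        if dist_a <= dist_b:
            counts[(2 * i_a, 2 * j_a)] += 1
        else:
            counts[(2 * i_b + 1, 2 * j_b + 1)] += 1
    # Step 3: only hexagons with count > 0 are stored, ready for a colour ramp
    return counts


def centre(key, d):
    """Cartesian centre of the hexagon with integer key (k, l)."""
    k, l = key
    return (k * d / 2, l * d * math.sqrt(3) / 2)


pts = [(0.0, 0.0), (0.1, 0.1), (0.5, 0.87), (5.0, 5.0)]
print(hexbin_counts(pts, 1.0))
```

Only occupied hexagons appear in the returned `Counter`, so memory grows with the number of non-empty bins rather than with $n$.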

This algorithm is extremely fast and effective for displaying the structure of data sets, even for $n \geq 10^{6}$. If the size of the grid and the cuts in the colour ramp are chosen sensibly, the structure inherent in the data should emerge in the binned plot. The same caveats apply to hexagon binning as to histograms. Variance and bias vary in opposite directions with the bin width, so we have to settle for the bin width that yields the best compromise between variance and bias reduction. Clearly, if we increase the size of the grid cells, the hexagon plot appears smoother, but without a reasonable criterion at hand it remains difficult to say which bin width provides the “optimal” degree of smoothness. The default number of bins partitioning the range of $x$ suggested by standard software is 30 (for example, the argument xbins = 30 of hexbin() in the R package hexbin).

Applications to some data sets are shown below. The data are taken from ALLBUS (2006) [ZA No. 3762]. The number of respondents is 2,946. The following nine variables have been selected to analyse the relation between each pair of variables.

A quadratic form $Q(x)$ is built from a symmetric matrix $\mathcal{A}(p \times p)$ and a vector $x \in \mathbb{R}^{p}:$
$$Q(x)=x^{\top} \mathcal{A} x=\sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_{i} x_{j}$$
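As a quick numerical check of the definition, the sketch below (assuming NumPy, with an arbitrary $2 \times 2$ symmetric example matrix chosen for illustration) evaluates $Q(x)$ both as the matrix product $x^{\top} \mathcal{A} x$ and as the explicit double sum; the function names are hypothetical.

```python
import numpy as np


def quadratic_form(A, x):
    """Q(x) = x^T A x via matrix products."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x)


def quadratic_form_sum(A, x):
    """Q(x) via the double sum over a_ij * x_i * x_j."""
    p = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(p) for j in range(p))


A = [[2.0, 1.0],
     [1.0, 3.0]]  # symmetric example matrix (arbitrary)
x = [1.0, -2.0]
print(quadratic_form(A, x), quadratic_form_sum(A, x))  # 10.0 10.0
```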
Definiteness of Quadratic Forms and Matrices
$$\begin{array}{ll} Q(x)>0 \text{ for all } x \neq 0 & \text{positive definite} \\ Q(x) \geq 0 \text{ for all } x \neq 0 & \text{positive semidefinite} \end{array}$$
A matrix $\mathcal{A}$ is called positive definite (semidefinite) if the corresponding quadratic form $Q(\cdot)$ is positive definite (semidefinite). We write $\mathcal{A}>0$ ($\mathcal{A} \geq 0$).
Quadratic forms can always be diagonalised, as the following result shows.
Theorem 2.3 If $\mathcal{A}$ is symmetric and $Q(x)=x^{\top} \mathcal{A} x$ is the corresponding quadratic form, then there exists a transformation $x \mapsto \Gamma^{\top} x=y$ such that
$$x^{\top} \mathcal{A} x=\sum_{i=1}^{p} \lambda_{i} y_{i}^{2}$$
where $\lambda_{i}$ are the eigenvalues of $\mathcal{A}$.
Proof By Theorem 2.1, $\mathcal{A}=\Gamma \Lambda \Gamma^{\top}$. With $y=\Gamma^{\top} x$ we have $x^{\top} \mathcal{A} x=x^{\top} \Gamma \Lambda \Gamma^{\top} x=y^{\top} \Lambda y=\sum_{i=1}^{p} \lambda_{i} y_{i}^{2}$.
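Theorem 2.3 can be checked numerically. The sketch assumes NumPy and uses `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix together with an orthonormal matrix of eigenvectors that plays the role of $\Gamma$; the example matrix and vector are arbitrary.

```python
import numpy as np

# Numerical check of Theorem 2.3 on an arbitrary symmetric example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, -2.0])

lam, Gamma = np.linalg.eigh(A)  # A = Gamma @ diag(lam) @ Gamma.T, Gamma orthogonal
y = Gamma.T @ x                 # the transformation x -> y = Gamma^T x

lhs = x @ A @ x                 # x^T A x
rhs = np.sum(lam * y ** 2)      # sum_i lambda_i y_i^2

print(lhs, rhs)                 # both 10, up to floating-point rounding
assert np.allclose(Gamma @ np.diag(lam) @ Gamma.T, A)
assert np.isclose(lhs, rhs)
```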

Positive definiteness of quadratic forms can be deduced from positive eigenvalues.
Theorem 2.4 $\mathcal{A}>0$ if and only if all $\lambda_{i}>0$, $i=1, \ldots, p$.
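Proof By Theorem 2.3, $x^{\top} \mathcal{A} x=\lambda_{1} y_{1}^{2}+\cdots+\lambda_{p} y_{p}^{2}$ with $y=\Gamma^{\top} x$, and $y \neq 0$ exactly when $x \neq 0$ because $\Gamma$ is orthogonal. If all $\lambda_{i}>0$, this sum is positive for every $x \neq 0$. Conversely, choosing $x=\gamma_{i}$, the $i$th column of $\Gamma$, gives $y=e_{i}$ (the $i$th unit vector) and $x^{\top} \mathcal{A} x=\lambda_{i}$, so $\mathcal{A}>0$ forces $\lambda_{i}>0$.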
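Theorem 2.4 suggests a simple numerical test of definiteness: compute the eigenvalues and inspect their signs. In the sketch below (assuming NumPy) the function name `definiteness` and the tolerance `tol`, which absorbs floating-point rounding in eigenvalues that are exactly zero in theory, are assumptions; the semidefinite branch follows the same argument with $\lambda_i \geq 0$.

```python
import numpy as np


def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    lam = np.linalg.eigvalsh(A)  # real eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"         # A > 0
    if np.all(lam >= -tol):
        return "positive semidefinite"     # A >= 0
    return "not positive semidefinite"


print(definiteness([[2.0, 1.0], [1.0, 3.0]]))  # positive definite
print(definiteness([[1.0, 1.0], [1.0, 1.0]]))  # positive semidefinite (eigenvalues 0, 2)
print(definiteness([[1.0, 2.0], [2.0, 1.0]]))  # not positive semidefinite (eigenvalues -1, 3)
```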