## Comparison of Batches

Multivariate statistical analysis is concerned with analysing and understanding data in high dimensions. We suppose that we are given a set $\left\{x_{i}\right\}_{i=1}^{n}$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^{p}$. That is, we suppose that each observation $x_{i}$ has $p$ dimensions:
$$x_{i}=\left(x_{i 1}, x_{i 2}, \ldots, x_{i p}\right)$$
and that it is an observed value of a variable vector $X \in \mathbb{R}^{p}$. Therefore, $X$ is composed of $p$ random variables:
$$X=\left(X_{1}, X_{2}, \ldots, X_{p}\right)$$
where $X_{j}$, for $j=1, \ldots, p$, is a one-dimensional random variable. How do we begin to analyse this kind of data? Before investigating what inferences we can draw from the data, we should think about how to look at it. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:

• Are there components of $X$ that are more spread out than others?
• Are there some elements of $X$ that indicate sub-groups of the data?
• Are there outliers in the components of $X$ ?
• How “normal” is the distribution of the data?
• Are there “low-dimensional” linear combinations of $X$ that show “non-normal” behaviour?
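The first and third questions can be approached with very simple summaries once the observations are stored as an $n \times p$ data matrix whose row $i$ is $x_i$ and whose column $j$ holds the values of $X_j$. The sketch below is a minimal illustration under stated assumptions: the simulated data are hypothetical, the function name `describe_components` is invented for this example, and outliers are flagged with Tukey's $1.5 \times$ IQR fences, which is one common convention rather than the only possible rule.

```python
import numpy as np

def describe_components(X):
    """Per-component spread and outlier flags for an n x p data matrix X.

    Hypothetical helper: row i holds observation x_i = (x_i1, ..., x_ip),
    column j holds the n observed values of the component X_j.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)                   # spread of each component X_j
    q1, q3 = np.percentile(X, [25, 75], axis=0)   # quartiles, one per column
    iqr = q3 - q1
    # Assumed rule: Tukey's fences at 1.5 * IQR beyond the quartiles.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = (X < lower) | (X > upper)          # n x p boolean mask
    return std, outliers

rng = np.random.default_rng(0)
n, p = 200, 3
# Hypothetical data: component X_2 has ten times the spread of the others.
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 1.0])
X[0, 0] = 8.0   # plant one obvious outlier in component X_1

std, outliers = describe_components(X)
print("std per component:", std.round(2))
print("outliers per component:", outliers.sum(axis=0))
```

Comparing the standard deviations answers "which component is more spread out"; the boolean mask marks which individual values fall outside the fences in each component.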

One difficulty of descriptive methods for high-dimensional data lies in the limits of the human perceptual system. Point clouds in two dimensions are easy to understand and to interpret. With modern interactive computing techniques we can view real-time 3D rotations and thus also perceive three-dimensional data. A “sliding technique” as described in Härdle and Scott (1992) may give insight into four-dimensional structures by presenting dynamic 3D density contours as the fourth variable is changed over its range.

## Histograms

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data and, in contrast to boxplots, can reveal possible multimodality. The idea is to represent the data density locally by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_{0}$. Let $B_{j}\left(x_{0}, h\right)$ denote the $j$th bin, of length $h$, in the bin grid starting at $x_{0}$:
$$B_{j}\left(x_{0}, h\right)=\left[x_{0}+(j-1) h, x_{0}+j h\right), \quad j \in \mathbb{Z},$$
where $[\cdot, \cdot)$ denotes a left-closed, right-open interval. If $\left\{x_{i}\right\}_{i=1}^{n}$ is an i.i.d. sample with density $f$, the histogram is defined as follows:
$$\hat{f}_{h}(x)=n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} \boldsymbol{I}\left\{x_{i} \in B_{j}\left(x_{0}, h\right)\right\} \boldsymbol{I}\left\{x \in B_{j}\left(x_{0}, h\right)\right\} \tag{1.7}$$
In sum (1.7), the first indicator function $\boldsymbol{I}\left\{x_{i} \in B_{j}\left(x_{0}, h\right)\right\}$ (see Symbols and Notation in Chap. 21), summed over $i$, counts the number of observations falling into bin $B_{j}\left(x_{0}, h\right)$. The second indicator function “localises” the counts around $x$: it is nonzero only for the single bin that contains $x$. The parameter $h$ is a smoothing or localising parameter that controls the width of the histogram bins. Too large an $h$ produces very wide blocks and thus a coarse histogram that hides the structure of the data; too small an $h$ gives a highly variable estimate with many spurious peaks.
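Because the second indicator in sum (1.7) keeps only the bin containing $x$, the estimate at $x$ is just $(nh)^{-1}$ times the number of observations that share $x$'s bin. A point $x$ lies in $B_j(x_0,h)$ exactly when $j = \lfloor (x - x_0)/h \rfloor + 1$. The following is a minimal NumPy sketch of that reduction; the function names `bin_index` and `histogram_density` and the toy data are hypothetical, chosen only for illustration, and values lying exactly on a bin edge may be affected by floating-point rounding.

```python
import numpy as np

def bin_index(x, x0, h):
    """Index j of the bin B_j(x0, h) = [x0 + (j-1)h, x0 + jh) that contains x."""
    return np.floor((np.asarray(x, dtype=float) - x0) / h).astype(int) + 1

def histogram_density(x, data, x0, h):
    """Histogram estimate of f at the point(s) x, following the double sum (1.7).

    The second indicator keeps only the bin B_j containing x, so the
    estimate reduces to (n h)^{-1} times the number of observations x_i
    that share x's bin index j.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    j_data = bin_index(data, x0, h)            # bin index of every observation x_i
    j_x = np.atleast_1d(bin_index(x, x0, h))   # bin index of every evaluation point
    # counts[k] = #{i : x_i lies in the same bin as x[k]}
    counts = (j_data[None, :] == j_x[:, None]).sum(axis=1)
    return counts / (n * h)

# Hypothetical toy data, origin x0 = 0 and binwidth h = 0.1.
data = [0.05, 0.15, 0.17, 0.31]
midpoints = [0.05, 0.15, 0.25, 0.35]   # one point inside each of B_1, ..., B_4
print(histogram_density(midpoints, data, x0=0.0, h=0.1))   # → [2.5 5.  0.  2.5]
```

The heights times $h$ sum to one here ($0.25 + 0.5 + 0 + 0.25$), as they must for a density estimate. With the origin and binwidth of the upper-left panel discussed next ($x_0 = 137.8$, $h = 0.1$), a value such as $138.05$ would fall into $B_3 = [138.0, 138.1)$.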

The effect of $h$ is illustrated in Fig. 1.6. It contains the histogram (upper left) of the diagonal of the counterfeit bank notes for $x_{0}=137.8$ (the minimum of these observations) and $h=0.1$. Increasing $h$ to $h=0.2$ with the same origin, $x_{0}=137.8$, gives the histogram shown in the lower left of the figure, which is somewhat smoother because of the larger $h$. The binwidth is next set to $h=0.3$ (upper right). From this histogram one has the impression that the distribution of the diagonal is bimodal, with peaks at about $138.5$ and $139.9$.
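The same experiment (fixed origin at the sample minimum, increasing binwidth) can be repeated numerically. The sketch below does not use the bank note data, which are not reproduced here; instead it draws a hypothetical bimodal sample whose two normal components are centred, purely for illustration, near the peak locations mentioned above. The helper names `bin_heights` and `count_peaks` are invented for this example, and counting strict local maxima of the bin heights is only a crude stand-in for "number of visible peaks".

```python
import numpy as np

def bin_heights(data, x0, h):
    """Histogram heights count_j / (n h) for bins B_1, B_2, ... from origin x0.

    Assumes x0 <= min(data), so every zero-based index j - 1 is >= 0.
    """
    data = np.asarray(data, dtype=float)
    j0 = np.floor((data - x0) / h).astype(int)   # zero-based bin index j - 1
    counts = np.bincount(j0)                     # counts[k] = #obs in B_{k+1}
    return counts / (data.size * h)

def count_peaks(heights):
    """Number of bins strictly higher than both neighbours (ends padded with 0)."""
    padded = np.concatenate(([0.0], heights, [0.0]))
    mid = padded[1:-1]
    return int(np.sum((mid > padded[:-2]) & (mid > padded[2:])))

# Hypothetical bimodal sample; this is NOT the bank note data of Fig. 1.6.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(138.5, 0.3, 50), rng.normal(139.9, 0.3, 50)])
x0 = data.min()   # origin at the sample minimum, as in the text

for h in (0.1, 0.2, 0.3, 0.4):
    heights = bin_heights(data, x0, h)
    print(f"h = {h}: {heights.size} bins, {count_peaks(heights)} local peaks")
```

For small $h$ one would typically expect many small local peaks caused by sampling noise, and for larger $h$ fewer, broader blocks; the exact counts depend on the random sample and the origin, so they should not be read as reproducing Fig. 1.6.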
