## Kernel PCA methodology

KPCA is a nonlinear equivalent of classical PCA that uses methods inspired by statistical learning theory. We briefly describe the KPCA method of Schölkopf et al. (1998).

Consider a set of observations $\mathbf{x}_{i} \in \mathbb{R}^{n}$, $i=1, \ldots, m$, and a dot product space $F$ related to the input space by a possibly nonlinear map $\phi: \mathbb{R}^{n} \rightarrow F$. The feature space $F$ may have an arbitrarily large, possibly infinite, dimension. Hereafter, upper case characters denote elements of $F$, while lower case characters denote elements of $\mathbb{R}^{n}$. We assume that the data are centered in $F$, that is, $\sum_{i=1}^{m} \phi\left(\mathbf{x}_{i}\right)=0$. In $F$ the covariance matrix takes the form
$$\mathbf{C}=\frac{1}{m} \sum_{j=1}^{m} \phi\left(\mathbf{x}_{j}\right) \phi\left(\mathbf{x}_{j}\right)^{\top} .$$
We have to find eigenvalues $\lambda \geq 0$ and nonzero eigenvectors $\mathbf{V} \in F \backslash\{0\}$ satisfying
$$\mathbf{C} \mathbf{V}=\lambda \mathbf{V} .$$

As is well known, all solutions $\mathbf{V}$ with $\lambda \neq 0$ lie in the span of $\left\{\phi\left(\mathbf{x}_{i}\right)\right\}_{i=1}^{m}$. This has two consequences. First, we may instead consider the set of equations
$$\left\langle\phi\left(\mathbf{x}_{k}\right), \mathbf{C} \mathbf{V}\right\rangle=\lambda\left\langle\phi\left(\mathbf{x}_{k}\right), \mathbf{V}\right\rangle \tag{1}$$
for all $k=1, \ldots, m$. Second, there exist coefficients $\alpha_{i}$, $i=1, \ldots, m$, such that
$$\mathbf{V}=\sum_{i=1}^{m} \alpha_{i} \phi\left(\mathbf{x}_{i}\right) . \tag{2}$$
Combining (1) and (2), we get the dual representation of the eigenvalue problem
$$\frac{1}{m} \sum_{i=1}^{m} \alpha_{i}\left\langle\phi\left(\mathbf{x}_{k}\right), \sum_{j=1}^{m} \phi\left(\mathbf{x}_{j}\right)\left\langle\phi\left(\mathbf{x}_{j}\right), \phi\left(\mathbf{x}_{i}\right)\right\rangle\right\rangle=\lambda \sum_{i=1}^{m} \alpha_{i}\left\langle\phi\left(\mathbf{x}_{k}\right), \phi\left(\mathbf{x}_{i}\right)\right\rangle$$
for all $k=1, \ldots, m$. Defining an $m \times m$ matrix $K$ by $K_{i j}:=\left\langle\phi\left(\mathbf{x}_{i}\right), \phi\left(\mathbf{x}_{j}\right)\right\rangle$, this reads
$$K^{2} \alpha=m \lambda K \alpha,$$
where $\alpha$ denotes the column vector with entries $\alpha_{1}, \ldots, \alpha_{m}$.
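The dual problem can be sketched numerically. The example below is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel (the text does not fix a kernel), enforces the centering assumption by double-centering $K$, and solves the reduced problem $K \alpha = m \lambda \alpha$, whose solutions also satisfy $K^{2} \alpha = m \lambda K \alpha$. The names `rbf_kernel`, `kernel_pca`, and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2); an illustrative choice of kernel."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, r=2, gamma=1.0):
    """Return the top-r eigenvalues lambda, coefficient vectors alpha, and projections."""
    m = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Double-center K so that the mapped data have zero mean in feature space.
    J = np.eye(m) - np.ones((m, m)) / m
    Kc = J @ K @ J
    # Solve Kc alpha = (m * lambda) alpha; eigh returns eigenvalues in ascending order.
    w, A = np.linalg.eigh(Kc)
    order = np.argsort(w)[::-1][:r]
    w, A = w[order], A[:, order]
    lambdas = w / m
    # Rescale so that V = sum_i alpha_i phi(x_i) has unit norm: alpha^T Kc alpha = 1.
    A = A / np.sqrt(w)
    # Projections of the training points onto the r leading eigenvectors.
    scores = Kc @ A
    return lambdas, A, scores

# Small synthetic example (random data, purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
lambdas, A, scores = kernel_pca(X, r=2, gamma=0.5)
```

Working with the symmetric matrix `Kc` lets `numpy.linalg.eigh` be used; it assumes the `r` leading eigenvalues are strictly positive, which is needed for the rescaling step.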

## Adding input variable information into Kernel PCA

To make the KPCA representation interpretable, we add supplementary information to it. We have developed a procedure to project any given input variable onto the subspace spanned by the eigenvectors (9).

We can consider our observations to be realizations of the random vector $X=\left(X_{1}, \ldots, X_{n}\right)$. To represent the prominence of the input variable $X_{k}$ in the KPCA, we take a set of points of the form $\mathbf{y}=\mathbf{a}+s \mathbf{e}_{k} \in \mathbb{R}^{n}$, where $s \in \mathbb{R}$ and $\mathbf{e}_{k}=(0, \ldots, 1, \ldots, 0) \in \mathbb{R}^{n}$ has its $k$-th component equal to 1 and all other components equal to 0. We can then compute the projections of the images $\phi(\mathbf{y})$ of these points onto the subspace spanned by the eigenvectors (9).

Taking into account equation (11), the induced curve in the eigenspace, expressed in matrix form, is given by the row vector
$$\sigma(s)_{1 \times r}=\left(\mathbf{Z}_{s}^{\top}-\frac{1}{m} \mathbf{1}_{m}^{\top} K\right)\left(\mathbf{I}_{m}-\frac{1}{m} \mathbf{1}_{m} \mathbf{1}_{m}^{\top}\right) \tilde{\mathbf{V}}_{r}$$
where $\mathbf{Z}_{s}$ is of the form (10). In addition, we can represent directions of maximum variation of $\sigma(s)$ associated with the variable $X_{k}$ by projecting the tangent vector at $s=0$. In matrix form, we have
$$\left.\frac{d \sigma}{d s}\right|_{s=0}=\left.\frac{d \mathbf{Z}_{s}^{\top}}{d s}\right|_{s=0}\left(\mathbf{I}_{m}-\frac{1}{m} \mathbf{1}_{m} \mathbf{1}_{m}^{\top}\right) \tilde{\mathbf{V}}$$
with
$$\left.\frac{d \mathbf{Z}_{s}^{\top}}{d s}\right|_{s=0}=\left(\left.\frac{d \mathbf{Z}_{s}^{1}}{d s}\right|_{s=0}, \ldots,\left.\frac{d \mathbf{Z}_{s}^{m}}{d s}\right|_{s=0}\right)^{\top}$$
and
$$\begin{aligned} \left.\frac{d \mathbf{Z}_{s}^{i}}{d s}\right|_{s=0} &=\left.\frac{d K\left(\mathbf{y}, \mathbf{x}_{i}\right)}{d s}\right|_{s=0} \\ &=\left.\left(\sum_{t=1}^{n} \frac{\partial K\left(\mathbf{y}, \mathbf{x}_{i}\right)}{\partial y_{t}} \frac{d y_{t}}{d s}\right)\right|_{s=0} \\ &=\left.\sum_{t=1}^{n} \frac{\partial K\left(\mathbf{y}, \mathbf{x}_{i}\right)}{\partial y_{t}}\right|_{\mathbf{y}=\mathbf{a}} \delta_{t}^{k}=\left.\frac{\partial K\left(\mathbf{y}, \mathbf{x}_{i}\right)}{\partial y_{k}}\right|_{\mathbf{y}=\mathbf{a}} \end{aligned}$$
for $i=1, \ldots, m$, where $\delta_{t}^{k}$ is the Kronecker delta, since $d y_{t} / d s=\delta_{t}^{k}$ for $\mathbf{y}=\mathbf{a}+s \mathbf{e}_{k}$.
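The chain-rule result can be checked numerically. The sketch below rests on several assumptions that are not in the text: a Gaussian kernel $K(\mathbf{y}, \mathbf{x})=\exp(-\gamma\|\mathbf{y}-\mathbf{x}\|^{2})$, for which $\partial K / \partial y_{k}=-2 \gamma\left(y_{k}-x_{k}\right) K(\mathbf{y}, \mathbf{x})$; entries of $\mathbf{Z}_{s}$ equal to $K(\mathbf{y}, \mathbf{x}_{i})$, as the derivative above suggests; and a random placeholder for $\tilde{\mathbf{V}}$, because equation (9) is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel_vector(y, X, gamma=1.0):
    """Vector with entries K(y, x_i) for the Gaussian kernel (assumed form of Z_s)."""
    return np.exp(-gamma * np.sum((X - y) ** 2, axis=1))

def rbf_kernel_gradient_k(a, X, k, gamma=1.0):
    """Entries dK(y, x_i)/dy_k at y = a, in closed form for the Gaussian kernel."""
    return -2.0 * gamma * (a[k] - X[:, k]) * rbf_kernel_vector(a, X, gamma)

def sigma_curve(s, a, X, k, K, V_tilde, gamma=1.0):
    """sigma(s) = (Z_s^T - (1/m) 1^T K) (I - (1/m) 1 1^T) V_tilde, with y = a + s e_k."""
    m, n = X.shape
    e_k = np.eye(n)[k]
    Z_s = rbf_kernel_vector(a + s * e_k, X, gamma)
    J = np.eye(m) - np.ones((m, m)) / m
    return (Z_s - K.mean(axis=0)) @ J @ V_tilde

def tangent_projection(a, X, k, V_tilde, gamma=1.0):
    """d sigma / ds at s = 0: (dZ_s^T/ds) (I - (1/m) 1 1^T) V_tilde."""
    m = X.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return rbf_kernel_gradient_k(a, X, k, gamma) @ J @ V_tilde

# Synthetic data; V_tilde is a random m x r placeholder, NOT the matrix from equation (9).
rng = np.random.default_rng(1)
m, n, r, k, h = 15, 4, 2, 2, 1e-6
X = rng.normal(size=(m, n))
a = X.mean(axis=0)
K = np.array([rbf_kernel_vector(x, X) for x in X])
V_tilde = rng.normal(size=(m, r))

# Central finite differences along e_k should match the closed-form expressions.
e_k = np.eye(n)[k]
fd_Z = (rbf_kernel_vector(a + h * e_k, X) - rbf_kernel_vector(a - h * e_k, X)) / (2 * h)
grad_Z = rbf_kernel_gradient_k(a, X, k)
fd_sigma = (sigma_curve(h, a, X, k, K, V_tilde) - sigma_curve(-h, a, X, k, K, V_tilde)) / (2 * h)
tangent = tangent_projection(a, X, k, V_tilde)
```

The term $\frac{1}{m} \mathbf{1}_{m}^{\top} K$ does not depend on $s$, which is why it drops out of the tangent vector; for another kernel only `rbf_kernel_vector` and `rbf_kernel_gradient_k` would need to change.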

