For $p \neq 0$, the generalized mean or power mean $\mathcal{M}_{p}$ of $\left\{a_{i}>0,\ i=1, \ldots, N\right\}$ [20] is defined as
$$\mathcal{M}_{p}\left\{a_{1}, \ldots, a_{N}\right\}=\left(\frac{1}{N} \sum_{i=1}^{N} a_{i}^{p}\right)^{1 / p}$$
Figure 1 [25] shows that $\mathcal{M}_{p}\{1,2, \ldots, 10\}$ varies continuously as $p$ changes from $-10$ to $10$. The arithmetic mean, the geometric mean, and the harmonic mean are special cases of the generalized mean when $p=1$, $p \rightarrow 0$, and $p=-1$, respectively. Furthermore, the maximum and the minimum of the numbers can be approximated by the generalized mean by letting $p \rightarrow \infty$ and $p \rightarrow-\infty$, respectively. Note that as $p$ decreases (increases), the generalized mean is more affected by the smaller (larger) numbers than by the larger (smaller) ones; that is, controlling $p$ makes it possible to adjust the contribution of each number to the generalized mean. This characteristic is useful when data samples should be handled differently according to their importance, for example, when the training set contains outliers.
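The special cases listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `power_mean` and the specific values of $p$ are illustrative choices, not part of the original text.

```python
import numpy as np

def power_mean(a, p):
    """Generalized (power) mean of positive numbers a for p != 0."""
    a = np.asarray(a, dtype=float)   # floats, so negative exponents are allowed
    if p == 0:
        raise ValueError("p must be nonzero; the limit p -> 0 is the geometric mean")
    return np.mean(a ** p) ** (1.0 / p)

a = np.arange(1, 11)                      # the set {1, 2, ..., 10} from the text

arithmetic = power_mean(a, 1)             # p = 1: arithmetic mean
harmonic = power_mean(a, -1)              # p = -1: harmonic mean
near_geometric = power_mean(a, 1e-6)      # small |p| approximates the geometric mean
geometric = np.exp(np.mean(np.log(a)))    # geometric mean computed directly

near_max = power_mean(a, 200)             # large positive p approaches max(a)
near_min = power_mean(a, -200)            # large negative p approaches min(a)

# The power mean is non-decreasing in p, so small numbers dominate for small p.
ps = [-10, -5, -1, 0.5, 1, 5, 10]
values = [power_mean(a, p) for p in ps]
```

For moderate exponents such as these the direct formula is adequate; for very large $|p|$ the powers $a_i^p$ may overflow or underflow, and a log-domain evaluation would be a safer choice.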

In [25], it was shown that the generalized mean of a set of positive numbers can be expressed as a nonnegative linear combination of the elements of the set:
$$\left(\frac{1}{K} \sum_{i=1}^{K} a_{i}^{p}\right)^{1 / p}=c_{1} a_{1}+\cdots+c_{K} a_{K}$$
Each weight $c_{i}$ in this equation can be obtained by differentiating the equation with respect to $a_{i}$:
$$c_{i}=\left(\frac{1}{K} \sum_{j=1}^{K} a_{j}^{p}\right)^{\frac{1}{p}-1} \frac{a_{i}^{p-1}}{K}$$
where $i=1, \ldots, K$. In this chapter, it is further simplified as follows:
$$\begin{aligned} \sum_{i=1}^{K} a_{i}^{p} &=b_{1} a_{1}+\cdots+b_{K} a_{K}, \\ b_{i} &=a_{i}^{p-1}, \quad i=1, \ldots, K \end{aligned}$$
Note that each weight $b_{i}$ has the same value of 1 if $p=1$, where the generalized mean becomes the arithmetic mean. It is also noted that, if $p$ is less than one, the weight $b_{i}$ increases as $a_{i}$ decreases. This means that, when $p<1$, the generalized mean is more influenced by the small numbers in $\left\{a_{i}\right\}_{i=1}^{K}$, and the extent of the influence increases as $p$ decreases. This equation plays an important role in solving the optimization problems using the generalized mean.
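The two linear-combination identities and the behavior of the weights can be verified on a small example. This is a sketch assuming NumPy; the function name `mean_weights`, the value $p=0.3$, and the toy numbers (with 100 standing in for an outlier) are illustrative assumptions.

```python
import numpy as np

def mean_weights(a, p):
    """Weights c_i such that the power mean of a equals sum_i c_i * a_i."""
    a = np.asarray(a, dtype=float)
    s = np.mean(a ** p)                              # (1/K) * sum_j a_j^p
    return s ** (1.0 / p - 1.0) * a ** (p - 1.0) / a.size

a = np.array([1.0, 2.0, 4.0, 100.0])   # 100 plays the role of an outlier
p = 0.3

c = mean_weights(a, p)                 # weights for the generalized mean itself
b = a ** (p - 1.0)                     # simplified weights for sum_i a_i^p

lhs_mean = np.mean(a ** p) ** (1.0 / p)   # generalized mean computed directly
lhs_sum = np.sum(a ** p)                  # sum of p-th powers computed directly

# For p < 1 the weights shrink as a_i grows, so the large value contributes less.
b_is_decreasing = bool(np.all(np.diff(b) < 0))

# For p = 1 every weight b_i equals 1 (arithmetic mean case).
b_at_p1 = a ** (1.0 - 1.0)
```

Here `c @ a` reproduces `lhs_mean` and `b @ a` reproduces `lhs_sum`, which is the property the later optimization steps rely on.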

Most conventional PCA methods assume that the training samples have zero mean. To satisfy this assumption, the sample mean is subtracted from every sample, i.e., $\mathbf{x}_{i}-\mathbf{m}_{S}$ for $i=1, \ldots, N$, where $\mathbf{m}_{S}=\frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_{i}$. The conventional sample mean can be considered as the center of the samples in the least-squares sense, i.e.,
$$\mathbf{m}_{S}=\underset{\mathbf{m}}{\arg \min } \frac{1}{N} \sum_{i=1}^{N}\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2} . \tag{7}$$
In (7), a small number of outliers in the training samples can dominate the objective function because it is constructed from squared distances. To obtain a robust sample mean in the presence of outliers, a new optimization problem is formulated by replacing the arithmetic mean in (7) with the generalized mean:
$$\mathbf{m}_{G}=\underset{\mathbf{m}}{\arg \min }\left(\frac{1}{N} \sum_{i=1}^{N}\left(\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2}\right)^{p}\right)^{1 / p}$$
This problem is equivalent to (7) if $p=1$. As mentioned in the previous subsection, the contribution of a large number to the objective function decreases as $p$ decreases. Thus, the negative effect of outliers can be alleviated if $p<1$. From now on, we call $\mathbf{m}_{G}$ the generalized sample mean. Using the fact that $x^{p}$ with $p>0$ is a monotonically increasing function of $x$ for $x>0$, this problem can be converted to
$$\mathbf{m}_{G}=\underset{\mathbf{m}}{\arg \min } \sum_{i=1}^{N}\left(\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2}\right)^{p} \tag{8}$$
Although the minimization in (8) should be changed into the maximization when $p<0$, we only consider positive values of $p$ in this paper.

The necessary condition for $\mathbf{m}_{G}$ to be a local minimum is that the gradient of the objective function in (8) with respect to $\mathbf{m}$ is equal to zero, i.e.,
$$\frac{\partial}{\partial \mathbf{m}} \sum_{i=1}^{N}\left(\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2}\right)^{p}=\mathbf{0}$$
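Expanding the gradient gives $-2 p \sum_{i}\left(\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2}\right)^{p-1}\left(\mathbf{x}_{i}-\mathbf{m}\right)=\mathbf{0}$, so a stationary point is a weighted average of the samples with weights $\left(\left\|\mathbf{x}_{i}-\mathbf{m}\right\|_{2}^{2}\right)^{p-1}$, which have the same form as the weights $b_{i}$ introduced earlier. Because these weights depend on $\mathbf{m}$, one plausible way to search for such a point is a fixed-point (iteratively reweighted) scheme. The sketch below illustrates that idea under stated assumptions and is not necessarily the procedure used in this work: the function name `generalized_sample_mean`, the smoothing constant `eps`, the stopping rule, and the toy data are all hypothetical.

```python
import numpy as np

def generalized_sample_mean(X, p, n_iter=100, tol=1e-10, eps=1e-12):
    """Iteratively reweighted estimate of the generalized sample mean (0 < p <= 1)."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                          # start from the ordinary sample mean
    for _ in range(n_iter):
        d2 = np.sum((X - m) ** 2, axis=1)       # squared distances ||x_i - m||^2
        w = (d2 + eps) ** (p - 1.0)             # weights from the zero-gradient condition
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        converged = np.linalg.norm(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m

def objective(X, m, p):
    """Objective of (8): sum of p-th powers of squared distances to m."""
    return np.sum(np.sum((X - m) ** 2, axis=1) ** p)

# Toy data (assumed for illustration): 100 inliers near the origin, 5 distant outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(100, 2))
outliers = np.full((5, 2), 50.0)
X = np.vstack([inliers, outliers])

m_s = X.mean(axis=0)                            # conventional sample mean
m_g = generalized_sample_mean(X, p=0.3)         # generalized sample mean estimate
```

With $p=1$ all weights equal one and the iteration returns the conventional sample mean after a single step. For $p<1$ the distant points receive small weights, so on this toy data `m_g` stays close to the center of the inliers while `m_s` is pulled toward the outliers. The constant `eps` only guards against division by zero when $\mathbf{m}$ coincides with a sample.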

For a projected sample $\mathbf{W}^{T} \mathbf{x}$, the squared reconstruction error $e(\mathbf{W})$ can be computed as
$$e(\mathbf{W})=\tilde{\mathbf{x}}^{T} \tilde{\mathbf{x}}-\tilde{\mathbf{x}}^{T} \mathbf{W} \mathbf{W}^{T} \tilde{\mathbf{x}}$$
where $\tilde{\mathbf{x}}=\mathbf{x}-\mathbf{m}$. We use the generalized sample mean $\mathbf{m}_{G}$ for $\mathbf{m}$. To prevent outliers corresponding to large $e(\mathbf{W})$ from dominating the objective function, we propose to minimize the following objective function:
$$J_{G}(\mathbf{W})=\left(\frac{1}{N} \sum_{i=1}^{N}\left[e_{i}(\mathbf{W})\right]^{p}\right)^{1 / p}$$
where $e_{i}(\mathbf{W})=\tilde{\mathbf{x}}_{i}^{T} \tilde{\mathbf{x}}_{i}-\tilde{\mathbf{x}}_{i}^{T} \mathbf{W} \mathbf{W}^{T} \tilde{\mathbf{x}}_{i}$ is the squared reconstruction error of $\mathbf{x}_{i}$ with respect to $\mathbf{W}$. Note that $J_{G}(\mathbf{W})$ is formulated by replacing the arithmetic mean in $J_{L_{2}}(\mathbf{W})$ with the generalized mean while keeping the Euclidean distance, and it is equivalent to $J_{L_{2}}(\mathbf{W})$ if $p=1$. The negative effect caused by outliers is suppressed in the same way as in (8). Also, the solution that minimizes $J_{G}(\mathbf{W})$ is rotationally invariant because each $e_{i}(\mathbf{W})$ is measured based on the Euclidean distance. To obtain $\mathbf{W}_{G}$, we develop an iterative optimization method similar to Algorithm 1.
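To make the objective concrete, the following sketch evaluates $e_{i}(\mathbf{W})$ and $J_{G}(\mathbf{W})$ for a given projection matrix with orthonormal columns, and checks numerically that the value is unchanged when the data, the mean, and $\mathbf{W}$ are transformed by the same orthogonal matrix. It only evaluates the objective and does not implement the iterative optimization. The function names, the random data, and the use of the ordinary sample mean as a stand-in for $\mathbf{m}_{G}$ are assumptions made for illustration.

```python
import numpy as np

def reconstruction_errors(X, W, m):
    """Squared reconstruction errors e_i(W) for the rows of X."""
    Xc = X - m                                   # centered samples, one per row
    proj = Xc @ W                                # coordinates W^T x~_i
    e = np.sum(Xc ** 2, axis=1) - np.sum(proj ** 2, axis=1)
    return np.maximum(e, 0.0)                    # guard against tiny negative rounding errors

def generalized_objective(X, W, m, p):
    """J_G(W): power mean (p > 0) of the squared reconstruction errors."""
    e = reconstruction_errors(X, W, m)
    return np.mean(e ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # toy data: 200 samples in 5 dimensions
m = X.mean(axis=0)                               # stand-in for m_G in this sketch
W, _ = np.linalg.qr(rng.normal(size=(5, 2)))     # 5x2 matrix with orthonormal columns
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))     # orthogonal matrix (rotation or reflection)

j_l2 = generalized_objective(X, W, m, p=1.0)     # p = 1: mean squared reconstruction error
j_g = generalized_objective(X, W, m, p=0.3)      # p < 1: large errors are down-weighted

# Apply Q to every sample, to the mean, and to W; the errors are unchanged.
j_g_rot = generalized_objective(X @ Q.T, Q @ W, Q @ m, p=0.3)
```

Since the power mean is non-decreasing in $p$, `j_g` is never larger than `j_l2` for the same $\mathbf{W}$; the two values are therefore not directly comparable across different $p$, only across different $\mathbf{W}$ at a fixed $p$.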

