## Accuracy Bounds

Finally, (1.19) can be used to construct the accuracy bounds for the $h$th disjunct region. The variance of the residuals can be calculated from the Frobenius norm of the residual matrix $\mathbf{E}_{h}$. The starting point is the PCA decomposition of the data matrix $\mathbf{Z}_{h}$, which stores the observations of the $h$th disjunct region, into the product of the associated score and loading matrices, $\mathbf{T}_{h} \mathbf{P}_{h}^{T}$, and the residual matrix $\mathbf{E}_{h}=\mathbf{T}_{h}^{*} \mathbf{P}_{h}^{*T}$:
$$\mathbf{Z}_{h}=\mathbf{T}_{h} \mathbf{P}_{h}^{T}+\mathbf{E}_{h}=\mathbf{T}_{h} \mathbf{P}_{h}^{T}+\mathbf{T}_{h}^{*} \mathbf{P}_{h}^{*T}. \tag{1.20}$$
From this decomposition, the sum of the residual variances of the original variables, $\rho_{h}=\sum_{i=1}^{N} \rho_{i_{h}}$, where $\rho_{i_{h}}$ is the residual variance of the $i$th variable, can be determined as follows:
$$\rho_{h}=\frac{1}{\widetilde{K}-1} \sum_{i=1}^{\widetilde{K}} \sum_{j=1}^{N} e_{i j_{h}}^{2}=\frac{1}{\widetilde{K}-1}\left\|\mathbf{E}_{h}\right\|_{2}^{2} \tag{1.21}$$
which can be simplified to:
$$\rho_{h}=\frac{1}{\widetilde{K}-1}\left\|\mathbf{T}_{h}^{*} \mathbf{P}_{h}^{*T}\right\|_{2}^{2}=\frac{1}{\widetilde{K}-1}\left\|\mathbf{U}_{h}^{*} \boldsymbol{\Lambda}_{h}^{*1/2} \sqrt{\widetilde{K}-1}\, \mathbf{P}_{h}^{*T}\right\|_{2}^{2} \tag{1.22}$$
Since the columns of $\mathbf{U}_{h}^{*}$ and $\mathbf{P}_{h}^{*}$ are orthonormal, this is equal to:
$$\rho_{h}=\frac{\widetilde{K}-1}{\widetilde{K}-1}\left\|\boldsymbol{\Lambda}_{h}^{*1/2}\right\|_{2}^{2}=\sum_{i=n+1}^{N} \lambda_{i} \tag{1.23}$$
Equations (1.20) and (1.22) utilize a singular value decomposition of $\mathbf{Z}_{h}$ and reconstruct the discarded components, that is
$$\mathbf{E}_{h}=\mathbf{U}_{h}^{*}\left[\boldsymbol{\Lambda}_{h}^{*1/2} \sqrt{\widetilde{K}-1}\right] \mathbf{P}_{h}^{*T}=\mathbf{T}_{h}^{*} \mathbf{P}_{h}^{*T}$$
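The identity between the scaled squared norm of the residual matrix and the sum of the discarded eigenvalues can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated data in place of a real disjunct region; the sizes `K`, `N`, `n` and the variable names are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, n = 250, 4, 2   # observations, variables, retained components (illustrative)

# Hypothetical data for one region, mean-centred and scaled to unit variance
Z = rng.normal(size=(K, N)) @ rng.normal(size=(N, N))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

# SVD: Z = U diag(s) P^T; eigenvalues of the correlation matrix are s_i^2 / (K - 1)
U, s, Pt = np.linalg.svd(Z, full_matrices=False)
eigvals = s**2 / (K - 1)

# Residual matrix rebuilt from the discarded components, E = T* P*^T
T_discarded = U[:, n:] * s[n:]
E = T_discarded @ Pt[n:, :]

rho_from_residuals = np.sum(E**2) / (K - 1)   # scaled sum of squared residuals
rho_from_eigvals = eigvals[n:].sum()          # sum of the N - n discarded eigenvalues
print(rho_from_residuals, rho_from_eigvals)
```

Both printed numbers should agree up to floating-point error, since the orthonormal factors do not change the sum of squared elements.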
Since
$$\mathbf{R}_{Z Z}^{(h)}=\left[\begin{array}{cc}\mathbf{P}_{h} & \mathbf{P}_{h}^{*}\end{array}\right]\left[\begin{array}{cc}\boldsymbol{\Lambda}_{h} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_{h}^{*}\end{array}\right]\left[\begin{array}{c}\mathbf{P}_{h}^{T} \\ \mathbf{P}_{h}^{*T}\end{array}\right],$$
the discarded eigenvalues $\lambda_{n+1}, \lambda_{n+2}, \ldots, \lambda_{N}$ depend on the elements of the correlation matrix $\mathbf{R}_{Z Z}$. According to (1.18) and (1.19), however, these elements are only known to within confidence limits obtained for a significance level $\alpha$. This, in turn, gives rise to the following optimization problem:
$$\begin{aligned} &\rho_{h_{\max }}=\arg \max _{\Delta \mathbf{R}_{Z Z_{\max }}} \rho_{h}\left(\mathbf{R}_{Z Z}+\Delta \mathbf{R}_{Z Z_{\max }}\right) \\ &\rho_{h_{\min }}=\arg \min _{\Delta \mathbf{R}_{Z Z_{\min }}} \rho_{h}\left(\mathbf{R}_{Z Z}+\Delta \mathbf{R}_{Z Z_{\min }}\right) \end{aligned} \tag{1.24}$$
which is subject to the following constraints:

$$\begin{aligned} &\mathbf{R}_{Z Z_{L}} \leq \mathbf{R}_{Z Z}+\Delta \mathbf{R}_{Z Z_{\max }} \leq \mathbf{R}_{Z Z_{U}} \\ &\mathbf{R}_{Z Z_{L}} \leq \mathbf{R}_{Z Z}+\Delta \mathbf{R}_{Z Z_{\min }} \leq \mathbf{R}_{Z Z_{U}} \end{aligned} \tag{1.25}$$
where $\Delta \mathbf{R}_{Z Z_{\max }}$ and $\Delta \mathbf{R}_{Z Z_{\min }}$ are perturbations of the nondiagonal elements of $\mathbf{R}_{Z Z}$ that yield a maximum value, $\rho_{h_{\max }}$, and a minimum value, $\rho_{h_{\min }}$, of $\rho_{h}$, respectively.

The maximum and minimum values, $\rho_{h_{\max }}$ and $\rho_{h_{\min }}$, are defined as the accuracy bounds for the $h$th disjunct region. The interpretation of the accuracy bounds is as follows.

Definition 1. Any set of observations taken from the same disjunct operating region cannot produce a larger or a smaller residual variance, determined with a significance of $\alpha$, if the interrelationship between the original variables is linear.

The solution of Equations (1.24) and (1.25) can be computed using a genetic algorithm [63] or the more recently proposed particle swarm optimization [50].
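To make the structure of this constrained problem concrete, the sketch below replaces the genetic algorithm or particle swarm optimizer with a plain random search, which is a much cruder technique and is used here only for illustration. It assumes NumPy, samples the nondiagonal elements directly inside the box given by the lower and upper confidence limits (equivalent to sampling the perturbations), and records the smallest and largest sum of discarded eigenvalues. The function names and the numerical limits are hypothetical.

```python
import numpy as np

def discarded_eigenvalue_sum(R, n):
    """Sum of the N - n smallest eigenvalues of the symmetric matrix R."""
    eigvals = np.linalg.eigvalsh(R)          # returned in ascending order
    return eigvals[: R.shape[0] - n].sum()

def accuracy_bounds_random_search(R_low, R_up, n, n_samples=5000, seed=0):
    """Approximate (rho_min, rho_max) by sampling within the elementwise limits.

    Random search is a stand-in for the optimizers named in the text.
    """
    rng = np.random.default_rng(seed)
    N = R_low.shape[0]
    rows, cols = np.triu_indices(N, k=1)     # indices of the nondiagonal elements
    rho_min, rho_max = np.inf, -np.inf
    for _ in range(n_samples):
        r = rng.uniform(R_low[rows, cols], R_up[rows, cols])
        R = np.eye(N)
        R[rows, cols] = r                    # fill the upper triangle
        R[cols, rows] = r                    # mirror to keep R symmetric
        rho = discarded_eigenvalue_sum(R, n)
        rho_min = min(rho_min, rho)
        rho_max = max(rho_max, rho)
    return rho_min, rho_max

# Hypothetical confidence limits for a two-variable region, one retained component
R_low = np.array([[1.0, 0.90], [0.90, 1.0]])
R_up = np.array([[1.0, 0.98], [0.98, 1.0]])
rho_min, rho_max = accuracy_bounds_random_search(R_low, R_up, n=1)
print(rho_min, rho_max)
```

For two variables the discarded eigenvalue is $1-|r|$, so the search should approach $0.02$ and $0.10$ for these limits. For larger $N$ a proper global optimizer is the more sensible choice.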

## Summary of the Nonlinearity Test

After determining the accuracy bounds for the $h$th disjunct region, as detailed in the previous subsection, a PCA model is obtained for each of the remaining $m-1$ regions. The sum of the $N-n$ discarded eigenvalues of each of these models is then benchmarked against the accuracy bounds to examine whether all of them fall inside or whether at least one residual variance value lies outside. The test is complete once accuracy bounds have been computed for each of the disjunct regions and the residual variances of the respective remaining $m-1$ regions have been benchmarked against them. If the residual variance is within the accuracy bounds for each of these combinations, the process is said to be linear. In contrast, if at least one of the residual variances is outside one of the accuracy bounds, it must be concluded that the variable interrelationships are nonlinear. In the latter case, the uncertainty in the PCA model accuracy is smaller than the variation of the residual variances, implying that a nonlinear PCA model must be employed.

The application of the nonlinearity test involves the following steps.

1. Obtain a sufficiently large set of process data;
2. Determine whether this set can be divided into disjunct regions based on a priori knowledge; if yes, go to step 5, else go to step 3;
3. Carry out a PCA analysis of the recorded data and construct scatter diagrams for the first few principal components to determine whether distinctive operating regions can be identified; if so, go to step 5, else go to step 4;
4. Divide the data into two disjunct regions, carry out steps 6 to 11 by setting $h=1$, and investigate whether nonlinearity within the data can be proven; if not, increase the number of disjunct regions incrementally until either the sum of discarded eigenvalues violates the accuracy bounds or the number of observations in each region is insufficient to continue the analysis;
5. Set $h=1$;
6. Calculate the confidence limits for the nondiagonal elements of the correlation matrix for the $h$th disjunct region (Equations (1.17) and (1.18));
7. Solve Equations (1.24) and (1.25) to compute the accuracy bounds $\rho_{h_{\max }}$ and $\rho_{h_{\min }}$;
8. Obtain correlation/covariance matrices for each disjunct region (scaled with respect to the variance of the observations within the $h$th disjunct region);
9. Carry out a singular value decomposition to determine the sum of eigenvalues for each matrix;
10. Benchmark the sums of eigenvalues against the $h$th set of accuracy bounds to test the hypothesis that the interrelationships between the recorded process variables are linear against the alternative hypothesis that the variable interrelationships are nonlinear;
11. If $h=m$, terminate the nonlinearity test, else go to step 6 by setting $h=h+1$.
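The benchmarking part of the procedure (scaling each region with respect to region $h$, summing the discarded eigenvalues, and comparing against the bounds) can be sketched as follows. This is a minimal illustration assuming NumPy; the accuracy bounds are taken as given inputs because the confidence-limit equations are not reproduced here. The function names are hypothetical, the region index `h` is zero-based, and centring each region on its own mean is an assumption of this sketch.

```python
import numpy as np

def discarded_eigenvalue_sum(Z_region, scale, n):
    """Sum of the discarded eigenvalues for one region.

    Assumption: the region is centred on its own mean and divided by
    `scale`, the standard deviations of the reference region h.
    """
    Zc = (Z_region - Z_region.mean(axis=0)) / scale
    S = np.cov(Zc, rowvar=False)                      # correlation/covariance matrix
    sing_vals = np.linalg.svd(S, compute_uv=False)    # descending; equal to eigenvalues here
    return sing_vals[n:].sum()

def benchmark_against_region(regions, h, n, rho_min, rho_max):
    """Compare every region's residual variance with the bounds of region h."""
    scale = regions[h].std(axis=0, ddof=1)
    rhos = [discarded_eigenvalue_sum(Z, scale, n) for Z in regions]
    inside = [bool(rho_min <= rho <= rho_max) for rho in rhos]
    return rhos, inside

def nonlinearity_test(regions, n, bounds):
    """Return True if linearity is not rejected for any reference region.

    `bounds` holds one (rho_min, rho_max) pair per region, computed elsewhere.
    """
    for h, (rho_min, rho_max) in enumerate(bounds):
        _, inside = benchmark_against_region(regions, h, n, rho_min, rho_max)
        if not all(inside):
            return False      # at least one residual variance violates the bounds
    return True
```

Here `regions` is a list of arrays with one row per observation and one column per variable, and `n` is the number of retained principal components.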

Examples of how to employ the nonlinearity test are given in the next subsection.

## Example Studies

These examples have two variables, $z_{1}$ and $z_{2}$. They describe (a) a linear interrelationship and (b) a nonlinear interrelationship between $z_{1}$ and $z_{2}$. The examples involve the simulation of 1000 observations of a single score variable $t$ that stem from a uniform distribution such that the division of this set into 4 disjunct regions produces 250 observations per region. The mean value of $t$ is equal to zero and the observations of $t$ spread between $+4$ and $-4$.

In the linear example, $z_{1}$ and $z_{2}$ are defined by superimposing two independently and identically distributed sequences, $e_{1}$ and $e_{2}$, that follow a normal distribution of zero mean and a variance of $0.005$, onto $t$:
$$z_{1}=t+e_{1}, \quad e_{1} \sim \mathcal{N}\{0,0.005\}, \qquad z_{2}=t+e_{2}, \quad e_{2} \sim \mathcal{N}\{0,0.005\}$$
For the nonlinear example, $z_{1}$ and $z_{2}$ are defined as follows:
$$z_{1}=t+e_{1}, \qquad z_{2}=t^{3}+e_{2}$$
with $e_{1}$ and $e_{2}$ as described above. Figure 1.2 shows the resultant scatter plots for the linear example (right plot) and the nonlinear example (left plot), including the division into 4 disjunct regions each.
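The two data sets can be simulated along the following lines. This is a sketch assuming NumPy; the seed, the use of sorting on $t$ to form four regions of 250 observations each, the choice of the first region as the reference for scaling, and the centring of each region on its own mean are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_regions = 1000, 4

# Score variable: uniform between -4 and +4, sorted so that consecutive
# blocks of 250 observations form the disjunct regions
t = np.sort(rng.uniform(-4.0, 4.0, n_obs))
e1 = rng.normal(0.0, np.sqrt(0.005), n_obs)   # noise with variance 0.005
e2 = rng.normal(0.0, np.sqrt(0.005), n_obs)

Z_linear = np.column_stack([t + e1, t + e2])
Z_nonlinear = np.column_stack([t + e1, t**3 + e2])

def residual_variances(Z, n_regions, n=1, h=0):
    """Discarded-eigenvalue sum per region, scaled with respect to region h."""
    regions = np.array_split(Z, n_regions)
    scale = regions[h].std(axis=0, ddof=1)
    rhos = []
    for region in regions:
        Zc = (region - region.mean(axis=0)) / scale
        eigvals = np.linalg.eigvalsh(np.cov(Zc, rowvar=False))  # ascending order
        rhos.append(eigvals[: Z.shape[1] - n].sum())
    return np.array(rhos)

rho_linear = residual_variances(Z_linear, n_regions)
rho_nonlinear = residual_variances(Z_nonlinear, n_regions)
print("linear:   ", rho_linear)
print("nonlinear:", rho_nonlinear)
```

Under these assumptions, the residual variances of the linear example should be of similar size in all four regions, whereas those of the cubic example should differ markedly between the inner and outer regions.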


