### Statistics Assignment Help | Data Science Assignment Help | Developments and Applications

statistics-lab™ supports you throughout your studies abroad. We have established a solid reputation for data science assignment help, guaranteeing reliable, high-quality, and original Statistics writing services. Our experts are highly experienced in data science, and can handle all kinds of data science assignments, including:

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## Statistics Assignment Help | Data Science Assignment Help | Principal Component Analysis

PCA is a data analysis technique that relies on a simple transformation of recorded observations, stored in a vector $\mathbf{z} \in \mathbb{R}^{N}$, to produce statistically independent score variables, stored in $\mathbf{t} \in \mathbb{R}^{n}$, $n \leq N$:
$$\mathbf{t}=\mathbf{P}^{T} \mathbf{z} .$$
Here, $\mathbf{P}$ is a transformation matrix, constructed from orthonormal column vectors. Since the first applications of PCA [21], this technique has found its way into a wide range of different application areas, for example signal processing [75], factor analysis [29, 44], system identification [77], chemometrics [20, 66] and, more recently, general data mining [11, 58, 70] including image processing [17, 72] and pattern recognition [10, 47], as well as process monitoring and quality control [1, 82] including multiway [48], multiblock [52] and multiscale [3] extensions. This success is mainly related to the ability of PCA to describe the significant information/variation within the recorded data, typically by the first few score variables, which simplifies data analysis tasks accordingly.
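As a minimal illustration (not part of the original text), the transformation $\mathbf{t} = \mathbf{P}^{T}\mathbf{z}$ can be sketched with NumPy, taking $\mathbf{P}$ to hold the leading eigenvectors of the sample covariance matrix; all variable names and data are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# K = 500 observations of N = 5 variables driven by 2 latent factors (mean-centered)
Z = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5))
Z -= Z.mean(axis=0)

# Eigendecomposition of the covariance matrix S_ZZ
S = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]   # sort eigenvalues descending
P = eigvecs[:, order[:2]]           # first n = 2 orthonormal loading vectors

# Score variables: t = P^T z, applied to every observation (row)
T = Z @ P                           # shape (500, 2)

# Fraction of the total variance described by the first two scores
explained = eigvals[order][:2].sum() / eigvals.sum()
```

Because the simulated data have only two underlying factors, the first two score variables describe essentially all of the variance, which is exactly the property the text refers to.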

Sylvester [67] formulated the idea behind PCA, in his work on the removal of redundancy in bilinear quantics, that is, polynomial expressions where the sum of the exponents is of an order greater than 2, and Pearson [51] laid the conceptual basis for PCA by defining lines and planes in a multivariable space that present the closest fit to a given set of points. Hotelling [28] then refined this formulation to that used today. Numerically, PCA is closely related to an eigenvector-eigenvalue decomposition of a data covariance or correlation matrix, and numerical algorithms to obtain this decomposition include the iterative NIPALS algorithm [78], which was defined similarly by Fisher and MacKenzie earlier [80], and the singular value decomposition. Good overviews concerning PCA are given in Mardia et al. [45], Jolliffe [32], Wold et al. [80] and Jackson [30].
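The NIPALS algorithm mentioned above extracts one principal component at a time by alternating regressions, deflating the data after each component. A minimal sketch (illustrative only; function name and data are my own, not from the text):

```python
import numpy as np

def nipals_pca(Z, n_components, tol=1e-10, max_iter=500):
    """Extract loading vectors p and score vectors t one component at a time.

    After each component, Z is deflated by its rank-one reconstruction t p^T,
    so successive components capture the remaining variance.
    """
    Z = Z - Z.mean(axis=0)                   # work on mean-centered data
    P, T = [], []
    for _ in range(n_components):
        # initialize the score vector with the highest-variance column
        t = Z[:, np.argmax(Z.var(axis=0))].copy()
        for _ in range(max_iter):
            p = Z.T @ t / (t @ t)            # regress columns of Z on t
            p /= np.linalg.norm(p)           # enforce ||p||_2 = 1
            t_new = Z @ p                    # regress rows of Z on p
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        Z = Z - np.outer(t, p)               # deflate
        P.append(p); T.append(t)
    return np.array(P).T, np.array(T).T

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 4))
P, T = nipals_pca(Z, 2)
```

At convergence each loading vector coincides (up to sign) with an eigenvector of the data covariance matrix, which is why NIPALS and the eigendecomposition/SVD routes produce the same model.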
The aim of this article is to review and examine nonlinear extensions of PCA that have been proposed over the past two decades. This is an important research field, as the application of linear PCA to nonlinear data may be inadequate [49]. The first attempts to present nonlinear PCA extensions include a generalization, utilizing a nonmetric scaling, that produces a nonlinear optimization problem [42], and the construction of a curve through a given cloud of points, referred to as a principal curve [25]. Inspired by the fact that the reconstruction of the original variables, $\widehat{\mathbf{z}}$, is given by:
$$\widehat{\mathbf{z}}=\mathbf{P t}=\overbrace{\mathbf{P} \underbrace{\left(\mathbf{P}^{T} \mathbf{z}\right)}_{\text {mapping }}}^{\text {demapping }},$$
which includes the determination of the score variables (mapping stage) and the determination of $\widehat{\mathbf{z}}$ (demapping stage), Kramer [37] proposed an autoassociative neural network (ANN) structure that defines the mapping and demapping stages by neural network layers. Tan and Mavrovouniotis [68] pointed out, however, that the 5-layer network topology of autoassociative neural networks may be difficult to train, i.e. the network weights are difficult to determine as the number of layers increases [27].
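For the linear case, the mapping and demapping stages of $\widehat{\mathbf{z}} = \mathbf{P}\left(\mathbf{P}^{T}\mathbf{z}\right)$ can be demonstrated directly; a short sketch (illustrative data and names, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
# Nearly rank-2 data: 2 latent factors in 6 variables plus small noise
Z = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) \
    + 0.01 * rng.normal(size=(300, 6))
Z -= Z.mean(axis=0)

_, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
P = eigvecs[:, -2:]      # loadings of the two dominant components

T = Z @ P                # mapping stage:   t = P^T z
Z_hat = T @ P.T          # demapping stage: z_hat = P t

# Relative reconstruction error; small because two components suffice
residual = np.linalg.norm(Z - Z_hat) / np.linalg.norm(Z)
```

Nonlinear PCA replaces these two matrix products with nonlinear mapping and demapping functions, which is precisely what the autoassociative network layers are trained to represent.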

To reduce the network complexity, Tan and Mavrovouniotis proposed an input training (IT) network topology, which omits the mapping layer. Thus, only a 3-layer network remains, where the reduced set of nonlinear principal components is obtained as part of the training procedure for establishing the IT network. Dong and McAvoy [16] introduced an alternative approach that divides the 5-layer autoassociative network topology into two 3-layer topologies, which, in turn, represent the nonlinear mapping and demapping functions. The outputs of the first network, that is the mapping network, are the score variables, which are determined using the principal curve approach.

## Statistics Assignment Help | Data Science Assignment Help | PCA Preliminaries

The analysis is based on a data matrix $\mathbf{Z} \in \mathbb{R}^{K \times N}$, where $N$ and $K$ are the number of recorded variables and the number of available observations, respectively. Defining the rows and columns of $\mathbf{Z}$ by the vectors $\mathbf{z}_{i} \in \mathbb{R}^{N}$ and $\boldsymbol{\zeta}_{j} \in \mathbb{R}^{K}$, respectively, $\mathbf{Z}$ can be rewritten as shown below:
$$\mathbf{Z}=\left[\begin{array}{c} \mathbf{z}_{1}^{T} \\ \mathbf{z}_{2}^{T} \\ \mathbf{z}_{3}^{T} \\ \vdots \\ \mathbf{z}_{i}^{T} \\ \vdots \\ \mathbf{z}_{K-1}^{T} \\ \mathbf{z}_{K}^{T} \end{array}\right]=\left[\begin{array}{ccccc} \boldsymbol{\zeta}_{1} & \boldsymbol{\zeta}_{2} & \boldsymbol{\zeta}_{3} \cdots \boldsymbol{\zeta}_{j} \cdots \boldsymbol{\zeta}_{N} \end{array}\right]$$
The first and second order statistics of the original set of variables $\mathbf{z}^{T}=\left(z_{1}\; z_{2}\; z_{3} \cdots z_{j} \cdots z_{N}\right)$ are:
$$E\{\mathbf{z}\}=\mathbf{0} \qquad E\left\{\mathbf{z} \mathbf{z}^{T}\right\}=\mathbf{S}_{Z Z},$$
with the correlation matrix of $\mathbf{z}$ being defined as $\mathbf{R}_{Z Z}$.
The PCA analysis entails the determination of a set of score variables $t_{k}$, $k \in\{1,2,3, \cdots, n\}$, $n \leq N$, by applying a linear transformation of $\mathbf{z}$:
$$t_{k}=\sum_{j=1}^{N} p_{k j} z_{j}$$
under the following constraint for the parameter vector $\mathbf{p}_{k}^{T}=\left(p_{k 1}\; p_{k 2}\; p_{k 3} \cdots p_{k j} \cdots p_{k N}\right)$:
$$\sqrt{\sum_{j=1}^{N} p_{k j}^{2}}=\left\|\mathbf{p}_{k}\right\|_{2}=1 .$$
Storing the score variables in a vector $\mathbf{t}^{T}=\left(t_{1}\; t_{2}\; t_{3} \cdots t_{j} \cdots t_{n}\right)$, $\mathbf{t} \in \mathbb{R}^{n}$ has the following first and second order statistics:
$$E\{\mathbf{t}\}=\mathbf{0} \qquad E\left\{\mathbf{t} \mathbf{t}^{T}\right\}=\boldsymbol{\Lambda},$$
where $\boldsymbol{\Lambda}$ is a diagonal matrix. An important property of PCA is that the variance of each score variable represents the following maximum:
$$\lambda_{k}=\max_{\mathbf{p}_{k}}\left\{E\left\{t_{k}^{2}\right\}\right\}=\max_{\mathbf{p}_{k}}\left\{E\left\{\mathbf{p}_{k}^{T} \mathbf{z} \mathbf{z}^{T} \mathbf{p}_{k}\right\}\right\},$$

which is constrained by:
$$E\left\{\left(\begin{array}{c} t_{1} \\ t_{2} \\ t_{3} \\ \vdots \\ t_{k-1} \end{array}\right) t_{k}\right\}=\mathbf{0} \qquad \left\|\mathbf{p}_{k}\right\|_{2}^{2}-1=0 .$$
Anderson [2] indicated that the above constrained optimization can alternatively be written as:
$$\lambda_{k}=\max_{\mathbf{p}}\left\{E\left\{\mathbf{p}^{T} \mathbf{z} \mathbf{z}^{T} \mathbf{p}\right\}-\lambda_{k}\left(\mathbf{p}^{T} \mathbf{p}-1\right)\right\}$$
under the assumption that $\lambda_{k}$ is predetermined. Reformulating (1.11) to determine $\mathbf{p}_{k}$ gives rise to:
$$\mathbf{p}_{k}=\arg \left\{\frac{\partial}{\partial \mathbf{p}}\left\{E\left\{\mathbf{p}^{T} \mathbf{z} \mathbf{z}^{T} \mathbf{p}\right\}-\lambda_{k}\left(\mathbf{p}^{T} \mathbf{p}-1\right)\right\}=\mathbf{0}\right\}$$
and produces
$$\mathbf{p}_{k}=\arg \left\{2 E\left\{\mathbf{z} \mathbf{z}^{T}\right\} \mathbf{p}-2 \lambda_{k} \mathbf{p}=\mathbf{0}\right\},$$
that is, the eigenvalue problem $\mathbf{S}_{Z Z} \mathbf{p}_{k}=\lambda_{k} \mathbf{p}_{k}$.
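The stationarity condition above can be checked numerically: for a loading vector taken from the eigendecomposition of the sample covariance matrix, the gradient of the Lagrangian vanishes and the score variance attains the eigenvalue. A brief sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
Z -= Z.mean(axis=0)
S = np.cov(Z, rowvar=False)        # sample estimate of S_ZZ

eigvals, eigvecs = np.linalg.eigh(S)
k = np.argmax(eigvals)             # dominant component
p, lam = eigvecs[:, k], eigvals[k]

# Stationarity: 2 S p - 2 lambda p = 0, i.e. S p = lambda p
grad = 2 * S @ p - 2 * lam * p

# The score variance E{t_k^2} equals the eigenvalue lambda_k
t = Z @ p
```

This confirms that maximizing the score variance under the unit-norm constraint is solved by the eigenvectors of $\mathbf{S}_{ZZ}$, with $\lambda_k$ the variance of the $k$-th score.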

## Statistics Assignment Help | Data Science Assignment Help | Nonlinearity Test for PCA Models

This section discusses how to determine whether the underlying structure within the recorded data is linear or nonlinear. Kruger et al. [38] introduced this nonlinearity test using the principle outlined in Fig. 1.1. The left plot in this figure shows that the first principal component describes the underlying linear relationship between the two variables, $z_{1}$ and $z_{2}$, while the right plot describes some basic nonlinear function, indicated by the curve.

By dividing the operating region into several disjunct regions, where the first region is centered around the origin of the coordinate system, a PCA model can be obtained from the data of each of these disjunct regions. With respect to Fig. 1.1, this would produce a total of 3 PCA models, one for each disjunct region, in both cases, the linear (left plot) and the nonlinear case (right plot). To determine whether a linear or nonlinear variable interrelationship can be extracted from the data, the principal idea is to take advantage of the residual variance in each of the regions. More precisely, accuracy bounds that are based on the residual variance are obtained for one of the PCA models, for example that of disjunct region I, and the residual variances of the remaining PCA models (for disjunct regions II and III) are benchmarked against these bounds. The test is completed once each of the PCA models has been used to determine accuracy bounds that are then benchmarked against the residual variances of the respective remaining PCA models.

The reason for using the residual variance instead of the variance of the retained score variables is as follows. The residual variance is independent of the region if the underlying interrelationship between the original variables is linear, as the left plot in Fig. 1.1 indicates. In contrast, observations that have a larger distance from the origin of the coordinate system will, by default, produce a larger projection distance from the origin, that is, a larger score value. In this respect, observations associated with a disjunct region that lies further outside will logically produce a larger variance, irrespective of whether the variable interrelationships are linear or nonlinear.
The detailed presentation of the nonlinearity test in the remainder of this section is structured as follows. Next, the assumptions imposed on the nonlinearity test are shown, prior to a detailed discussion of the construction of disjunct regions. Subsection 3.3 then shows how to obtain statistical confidence limits for the nondiagonal elements of the correlation matrix. This is followed by the definition of the accuracy bounds. Finally, a summary of the nonlinearity test is given and some example studies are presented to demonstrate the working of this test.

