### Algorithmic developments



Since the concept was proposed by Hastie and Stuetzle in 1989, a considerable number of refinements and further developments have been reported. The first thrust of such developments addresses the issue of bias. The HSPCs algorithm has two biases: a model bias and an estimation bias.

Assuming that the data are drawn from some distribution function with Gaussian noise, the model bias implies that the estimated radius of curvature of the curve is larger than the actual one. Conversely, the spline functions applied by the algorithm result in an estimated radius that is smaller than the actual one.

With regard to the model bias, Tibshirani [69] assumed that the data are generated in two stages: (i) the points on the curve $f(t)$ are generated from some distribution function $\mu_{t}$, and (ii) the observations $\mathbf{z}$ are formed based on the conditional distribution $\mu_{z \mid t}$, whose mean is $f(t)$. Assume that the distribution functions $\mu_{t}$ and $\mu_{z \mid t}$ are consistent with $\mu_{z}$, that is, $\mu_{z}=\int \mu_{z \mid t}(\mathbf{z} \mid t) \mu_{t}(t) \mathrm{d} t$. Therefore, $\mathbf{z}$ are random vectors of dimension $N$ subject to some density $\mu_{z}$. While the algorithm by Tibshirani [69] overcomes the model bias, the experimental results reported in that paper demonstrate that the practical improvement is marginal. Moreover, the self-consistency property is no longer valid.
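Tibshirani's two-stage generative model can be illustrated with a short simulation. The following sketch assumes a uniform $\mu_{t}$, a circular generating curve $f(t)$, and isotropic Gaussian noise for $\mu_{z \mid t}$; these concrete choices are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage (i): latent parameters t drawn from mu_t (here uniform, an assumption).
K = 500
t = rng.uniform(0.0, 2.0 * np.pi, size=K)

# A hypothetical generating curve f(t) in R^2 (the unit circle).
f = np.column_stack([np.cos(t), np.sin(t)])

# Stage (ii): observations z drawn from mu_{z|t}, whose mean is f(t)
# (isotropic Gaussian noise, an assumption).
z = f + rng.normal(scale=0.1, size=f.shape)

# Consistency check: the residuals z - f(t) should average to roughly zero,
# since E[z | t] = f(t).
print(np.abs((z - f).mean(axis=0)))
```

Averaging over the conditional distribution in this way is exactly the marginalization $\mu_{z}=\int \mu_{z \mid t}\,\mu_{t}\,\mathrm{d}t$ stated above.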

In 1992, Banfield and Raftery [4] addressed the estimation bias problem by replacing the squared distance error with a residual-based error and generalized PCs to closed curves. However, the refinement also introduces numerical instability and may form a smooth but otherwise incorrect principal curve.
In the mid 1990s, Duchamp and Stuetzle [18, 19] studied the differential-geometric properties of HSPCs, analyzing the first and second variation of principal curves and the relationship between self-consistency and the curvature of curves. This work discussed the existence of principal curves on the sphere, ellipse, and annulus based on the geometric character of HSPCs. Duchamp and Stuetzle further proved that, when the curvature is nonzero, the expected squared distance from the data to a principal curve in the plane is a saddle point rather than a local minimum, unless low-frequency variation is controlled by a constraining term. As a result, cross-validation cannot be viewed as an effective measure for the model selection of principal curves.

At the end of the 1990s, Kégl proposed a new principal curve algorithm that incorporates a length constraint by combining vector quantization with principal curves. For this algorithm, further referred to as the KPC algorithm, Kégl proved that a KPC exists and is unique if and only if the data distribution has a finite second-order moment. This has been studied in detail based on the principle of structural risk minimization, estimation error, and approximation error. It is proven in references [34, 35] that the KPC algorithm has a faster convergence rate than the other algorithms described above. This supports the use of the KPC algorithm for large databases.

## Neural Network Approaches

Using the structure shown in Fig. 1.6, Kramer [37] proposed an alternative NLPCA implementation to principal curves and manifolds. This structure represents an autoassociative neural network (ANN), which, in essence, is an identity mapping that consists of a total of 5 layers. Identity mapping refers to the fact that this network topology is optimized to reconstruct the $N$ network input variables as accurately as possible using a reduced set of $n<N$ bottleneck nodes. From left to right, the first layer of the ANN is the input layer, which passes weighted values of the original variable set $\mathbf{z}$ on to the second layer, that is, the mapping layer:
$$\xi_{i}=\sum_{j=1}^{N} w_{i j}^{(1)} z_{j}+b_{i}^{(1)}$$
where $w_{i j}^{(1)}$ are the weights for the first layer and $b_{i}^{(1)}$ is a bias term. The sum in (1.41), $\xi_{i}$, is the input to the $i$th node in the mapping layer, which consists of a total of $M_{m}$ nodes. A scaled sum of the nonlinearly transformed values $\sigma\left(\xi_{i}\right)$ then produces the nonlinear scores in the bottleneck layer. More precisely, the $p$th nonlinear score $t_{p}$, $1 \leq p \leq n$, is given by:

$$t_{p}=\sum_{i=1}^{M_{m}} w_{p i}^{(2)} \sigma\left(\xi_{i}\right)+b_{p}^{(2)}=\sum_{i=1}^{M_{m}} w_{p i}^{(2)} \sigma\left(\sum_{j=1}^{N} w_{i j}^{(1)} z_{j}+b_{i}^{(1)}\right)+b_{p}^{(2)}$$
To improve the modeling capability of the ANN structure for mildly nonlinear systems, it is useful to include linear contributions of the original variables $z_{1}, z_{2}, \ldots, z_{N}$:
$$t_{p}=\sum_{i=1}^{M_{m}} w_{p i}^{(2)} \sigma\left(\sum_{j=1}^{N} w_{i j}^{(1)} z_{j}+b_{i}^{(1)}\right)+\sum_{j=1}^{N} w_{p j}^{(1 l)} z_{j}+b_{p}^{(2)}$$
where the index $l$ refers to the linear contribution of the original variables. Such a network, where a direct linear contribution of the original variables is included, is often referred to as a generalized neural network. The middle layer of the ANN topology is further referred to as the bottleneck layer.

A linear combination of these nonlinear score variables then produces the inputs for the nodes in the 4th layer, that is, the demapping layer:
$$\tau_{j}=\sum_{p=1}^{n} w_{j p}^{(3)} t_{p}+b_{j}^{(3)}$$
Here, $w_{j p}^{(3)}$ and $b_{j}^{(3)}$ are the weights and the bias term associated with the bottleneck layer, respectively, and $\tau_{j}$ represents the input to the $j$th node of the demapping layer, which consists of $M_{d}$ nodes. The nonlinear transformation of $\tau_{j}$ finally provides the reconstruction of the original variables $\mathbf{z}$, $\widehat{\mathbf{z}}=\left(\widehat{z}_{1}\; \widehat{z}_{2} \cdots \widehat{z}_{N}\right)^{T}$, by the output layer:
$$\widehat{z}_{q}=\sum_{j=1}^{M_{d}} w_{q j}^{(4)} \sigma\left(\sum_{p=1}^{n} w_{j p}^{(3)} t_{p}+b_{j}^{(3)}\right)+\sum_{j=1}^{n} w_{q j}^{(3 l)} t_{j}+b_{q}^{(4)}$$

## Introduction to kernel PCA

This technique first maps the original input vectors $\mathbf{z}$ onto a high-dimensional feature space, $\mathbf{z} \mapsto \boldsymbol{\Phi}(\mathbf{z})$, and then performs principal component analysis on $\boldsymbol{\Phi}(\mathbf{z})$. Given a set of observations $\mathbf{z}_{i} \in \mathbb{R}^{N}$, $i=1,2, \ldots, K$, the mapping of $\mathbf{z}_{i}$ onto a feature space, that is $\boldsymbol{\Phi}(\mathbf{z})$, whose dimension is considerably larger than $N$, produces the following sample covariance matrix:
$$\mathbf{S}_{\Phi \Phi}=\frac{1}{K-1} \sum_{i=1}^{K}\left(\boldsymbol{\Phi}\left(\mathbf{z}_{i}\right)-\mathbf{m}_{\Phi}\right)\left(\boldsymbol{\Phi}\left(\mathbf{z}_{i}\right)-\mathbf{m}_{\Phi}\right)^{T}=\frac{1}{K-1} \overline{\boldsymbol{\Phi}}(\mathbf{Z})^{T} \overline{\boldsymbol{\Phi}}(\mathbf{Z}) .$$
Here, $\mathbf{m}_{\Phi}=\frac{1}{K} \boldsymbol{\Phi}(\mathbf{Z})^{T} \mathbf{1}_{K}$, where $\mathbf{1}_{K} \in \mathbb{R}^{K}$ is a column vector storing unity elements, is the sample mean in the feature space, and $\boldsymbol{\Phi}(\mathbf{Z})=\left[\boldsymbol{\Phi}\left(\mathbf{z}_{1}\right)\; \boldsymbol{\Phi}\left(\mathbf{z}_{2}\right) \cdots \boldsymbol{\Phi}\left(\mathbf{z}_{K}\right)\right]^{T}$ and $\overline{\boldsymbol{\Phi}}(\mathbf{Z})=\boldsymbol{\Phi}(\mathbf{Z})-\frac{1}{K} \mathbf{E}_{K} \boldsymbol{\Phi}(\mathbf{Z})$, with $\mathbf{E}_{K}$ being a $K \times K$ matrix of ones, are the original and mean-centered feature matrices, respectively.
KPCA now solves the following eigenvector-eigenvalue problem,
$$\mathbf{S}_{\Phi \Phi} \mathbf{p}_{i}=\frac{1}{K-1} \overline{\boldsymbol{\Phi}}(\mathbf{Z})^{T} \overline{\boldsymbol{\Phi}}(\mathbf{Z}) \mathbf{p}_{i}=\lambda_{i} \mathbf{p}_{i}, \quad i=1,2, \ldots, N$$
where $\lambda_{i}$ and $\mathbf{p}_{i}$ are the eigenvalue and its associated eigenvector of $\mathbf{S}_{\Phi \Phi}$, respectively. Given that the explicit mapping formulation of $\boldsymbol{\Phi}(\mathbf{z})$ is usually unknown, it is difficult to extract the eigenvector-eigenvalue decomposition of $\mathbf{S}_{\Phi \Phi}$ directly. However, KPCA overcomes this deficiency as shown below.
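The standard way around the unknown mapping is to work with the $K \times K$ Gram matrix of pairwise kernel evaluations instead of $\mathbf{S}_{\Phi \Phi}$, centering it in feature space and eigendecomposing it. A minimal numpy sketch follows; the RBF kernel and its width $\gamma$ are assumptions for illustration.

```python
import numpy as np

def kernel_pca(Z, n_components=2, gamma=1.0):
    """Minimal kernel PCA with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    K_n = Z.shape[0]
    # Gram matrix G_ij = k(z_i, z_j) replaces explicit features Phi(z).
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    G = np.exp(-gamma * sq)
    # Double centering performs the mean-centering in the (unknown) feature space.
    E = np.ones((K_n, K_n)) / K_n
    G_c = G - E @ G - G @ E + E @ G @ E
    # Eigendecomposition of the symmetric centered Gram matrix.
    vals, vecs = np.linalg.eigh(G_c)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scores: projections of the mapped data onto the leading feature-space PCs.
    return vecs * np.sqrt(np.maximum(vals, 0.0))

rng = np.random.default_rng(2)
Z = rng.normal(size=(100, 3))
T = kernel_pca(Z, n_components=2)
print(T.shape)  # (100, 2)
```

The eigenvectors of the centered Gram matrix are related to the $\mathbf{p}_{i}$ above through the rows of $\overline{\boldsymbol{\Phi}}(\mathbf{Z})$, so no explicit feature coordinates are ever needed.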
