## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|SCl 7314

statistics-lab™ supports you throughout your studies abroad. We have built a solid reputation for manifold data learning assignment help and guarantee reliable, high-quality, and original Statistics writing services. Our experts have extensive experience with manifold data learning, so assignments on related topics pose no difficulty for them.

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Curves and Geodesics

If the Riemannian manifold $(\mathcal{M}, g)$ is connected, it is a metric space with an induced topology that coincides with the underlying manifold topology. We can, therefore, define a function $d^{\mathcal{M}}$ on $\mathcal{M}$ that calculates distances between points on $\mathcal{M}$ and determines its structure.

Let $\mathbf{p}, \mathbf{q} \in \mathcal{M}$ be any two points on the Riemannian manifold $\mathcal{M}$. We first define the length of a (one-dimensional) curve in $\mathcal{M}$ that joins $\mathbf{p}$ to $\mathbf{q}$, and then the length of the shortest such curve.

A curve in $\mathcal{M}$ is defined as a smooth mapping from an open interval $\Lambda$ (which may have infinite length) in $\Re$ into $\mathcal{M}$. The point $\lambda \in \Lambda$ forms a parametrization of the curve. Let $c(\lambda)=\left(c_{1}(\lambda), \cdots, c_{d}(\lambda)\right)^{\top}$ be a curve in $\Re^{d}$ parametrized by $\lambda \in \Lambda \subseteq \Re$. If we take the coordinate functions, $\left\{c_{h}(\lambda)\right\}$, of $c(\lambda)$ to be as smooth as needed (usually, $\mathcal{C}^{\infty}$, functions that have any number of continuous derivatives), then we say that $c$ is a smooth curve. If $c(\lambda+\alpha)=c(\lambda)$ for all $\lambda, \lambda+\alpha \in \Lambda$, the curve $c$ is said to be closed. The velocity (or tangent) vector at the point $\lambda$ is given by
$$c^{\prime}(\lambda)=\left(c_{1}^{\prime}(\lambda), \cdots, c_{d}^{\prime}(\lambda)\right)^{\top},$$
where $c_{j}^{\prime}(\lambda)=d c_{j}(\lambda) / d \lambda$, and the “speed” of the curve is
$$\left|c^{\prime}(\lambda)\right|=\left\{\sum_{j=1}^{d}\left[c_{j}^{\prime}(\lambda)\right]^{2}\right\}^{1 / 2} .$$
Distance on a smooth curve $c$ is given by arc-length, which is measured from a fixed point $\lambda_{0}$ on that curve. Usually, the fixed point is taken to be the origin, $\lambda_{0}=0$, defined to be one of the two endpoints of the data. More generally, the arc-length $L(c)$ along the curve $c(\lambda)$ from point $\lambda_{0}$ to point $\lambda_{1}$ is defined as
$$L(c)=\int_{\lambda_{0}}^{\lambda_{1}}\left|c^{\prime}(\lambda)\right| d \lambda .$$
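As a quick numerical illustration, the arc-length integral can be approximated by sampling the curve finely and summing chord lengths. The following is a minimal sketch (the helper name `arc_length` and the half-circle example are ours, not from the text):

```python
import numpy as np

def arc_length(c, lam0, lam1, n=10000):
    """Approximate L(c), the integral of |c'(lambda)| d(lambda), by summing chord lengths."""
    lam = np.linspace(lam0, lam1, n)
    pts = np.array([c(t) for t in lam])         # sample the curve in R^d
    segs = np.diff(pts, axis=0)                 # chords approximating c'(lambda) d(lambda)
    return np.linalg.norm(segs, axis=1).sum()

# half of the unit circle, parametrized by angle; the true arc-length is pi
half_circle = lambda t: np.array([np.cos(t), np.sin(t)])
print(arc_length(half_circle, 0.0, np.pi))      # approximately 3.14159
```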

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Linear Manifold Learning

Most statistical theory and applications that deal with the problem of dimensionality reduction are focused on linear dimensionality reduction and, by extension, linear manifold learning. A linear manifold can be visualized as a line, a plane, or a hyperplane, depending upon the number of dimensions involved. Data are observed in some high-dimensional space and it is usually assumed that a lower-dimensional linear manifold would be the most appropriate summary of the relationship between the variables. Although data tend not to live on a linear manifold, we view the problem as having two kinds of motivations. The first such motivation is to assume that the data live close to a linear manifold, the distance off the manifold determined by a random error (or noise) component. A second way of thinking about linear manifold learning is that a linear manifold is really a simple linear approximation to a more complicated type of nonlinear manifold that would probably be a better fit to the data. In both scenarios, the intrinsic dimensionality of the linear manifold is taken to be much smaller than the dimensionality of the data.

Identifying a linear manifold embedded in a higher-dimensional space is closely related to the classical statistics problem of linear dimensionality reduction. The recommended way of accomplishing linear dimensionality reduction is to create a reduced set of linear transformations of the input variables. Linear transformations are projection methods, and so the problem is to derive a sequence of low-dimensional projections of the input data that possess some type of optimal properties.

There are many techniques that can be used for either linear dimensionality reduction or linear manifold learning. In this chapter, we describe only two linear methods, namely, principal component analysis and multidimensional scaling. The earliest projection method was principal component analysis (dating back to 1933), and this technique has become the most popular dimensionality-reducing technique in use today. A related method is that of multidimensional scaling (dating back to 1952), which has a very different motivation. An adaptation of multidimensional scaling provided the core element of the ISOMAP algorithm for nonlinear manifold learning.
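For concreteness, here is a minimal principal component analysis sketch based on the singular value decomposition of the centered data matrix. The synthetic line-plus-noise data illustrates the first motivation above, data lying close to a linear manifold; the helper name `pca_embed` and the data-generating choices are ours.

```python
import numpy as np

def pca_embed(X, d):
    """Project the rows of X (an n x D data matrix) onto the top-d principal axes."""
    Xc = X - X.mean(axis=0)                           # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                              # n x d principal-component scores

# noisy sample near a one-dimensional linear manifold (a line) in R^3
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, -1.0]) + 0.05 * rng.normal(size=(200, 3))
Y = pca_embed(X, 1)                                   # recovered 1-d coordinates along the line
```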


## Finite Element Method Assignment Help

As a professional service provider for international students, statistics-lab has for many years offered academic support to students in popular study destinations such as the United States, the United Kingdom, Canada, and Australia, including but not limited to essay writing, assignment writing, dissertation writing, report writing, group-project writing, proposal writing, paper writing, presentation writing, computer-science assignments, proofreading and polishing, online course completion, and exam assistance. Our services cover every stage of overseas study, from high school through undergraduate to graduate level, and span finance, economics, accounting, auditing, management, and 99% of subject areas worldwide. The writing team includes native English-speaking writers as well as graduate students from leading universities abroad; every writer has strong language skills, a solid disciplinary background, and academic writing experience. We promise 100% originality, 100% professionalism, 100% punctuality, and 100% satisfaction.

## MATLAB Assignment Help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include mathematical and computational algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including the construction of graphical user interfaces. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This lets you solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar, non-interactive language such as C or Fortran. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to the matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in matrix computation software. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very importantly for most MATLAB users, toolboxes let you learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|INFS6077


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Topological Spaces

Topological spaces were introduced by Maurice Fréchet (1906) (in the form of metric spaces), and the idea was developed and extended over the next few decades. Amongst those who contributed significantly to the subject was Felix Hausdorff, who in 1914 coined the phrase “topological space” using Johann Benedict Listing’s German word Topologie introduced in 1847.

A topological space is a nonempty set $\mathcal{X}$ together with a collection $\mathcal{T}$ of subsets of $\mathcal{X}$ that contains the empty set and the space itself and is closed under arbitrary unions and finite intersections. A topological space is often denoted by $(\mathcal{X}, \mathcal{T})$, where $\mathcal{T}$ represents the topology associated with $\mathcal{X}$. The elements of $\mathcal{T}$ are called the open sets of $\mathcal{X}$, and a set is closed if its complement is open. Topological spaces can also be characterized through the concept of neighborhood. If $\mathbf{x}$ is a point in a topological space $\mathcal{X}$, a neighborhood of $\mathbf{x}$ is a set that contains an open set that contains $\mathbf{x}$.
Let $\mathcal{X}$ and $\mathcal{Y}$ be two topological spaces, and let $U \subset \mathcal{X}$ and $V \subset \mathcal{Y}$ be open subsets. Consider the family of all cartesian products of the form $U \times V$. The topology formed from these products of open subsets is called the product topology for $\mathcal{X} \times \mathcal{Y}$. If $W \subset \mathcal{X} \times \mathcal{Y}$, then $W$ is open relative to the product topology iff for each point $(x, y) \in \mathcal{X} \times \mathcal{Y}$ there are open neighborhoods, $U$ of $x$ and $V$ of $y$, such that $U \times V \subset W$. For example, the usual topology for $d$-dimensional Euclidean space $\Re^{d}$ consists of all open sets of points in $\Re^{d}$, and this topology is equivalent to the product topology for the product of $d$ copies of $\Re$.

One of the core elements of manifold learning involves the idea of “embedding” one topological space inside another. Loosely speaking, the space $\mathcal{X}$ is said to be embedded in the space $\mathcal{Y}$ if the topological properties of $\mathcal{Y}$ when restricted to $\mathcal{X}$ are identical to the topological properties of $\mathcal{X}$. To be more specific, we state the following definitions. A function $g: \mathcal{X} \rightarrow \mathcal{Y}$ is said to be continuous if the inverse image of an open set in $\mathcal{Y}$ is an open set in $\mathcal{X}$. If $g$ is a bijective (i.e., one-to-one and onto) function such that $g$ and its inverse $g^{-1}$ are continuous, then $g$ is said to be a homeomorphism. Two topological spaces $\mathcal{X}$ and $\mathcal{Y}$ are said to be homeomorphic (or topologically equivalent) if there exists a homeomorphism from one space onto the other. A topological space $\mathcal{X}$ is said to be embedded in a topological space $\mathcal{Y}$ if $\mathcal{X}$ is homeomorphic to a subspace of $\mathcal{Y}$.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Riemannian Manifolds

In the entire theory of topological manifolds, there is no mention of the use of calculus. However, in a prototypical application of a “manifold,” calculus enters in the form of a “smooth” (or differentiable) manifold $\mathcal{M}$, also known as a Riemannian manifold; it is usually defined in differential geometry as a submanifold of some ambient (or surrounding) Euclidean space, where the concepts of length, curvature, and angle are preserved, and where smoothness relates to differentiability. The word manifold (in German, Mannigfaltigkeit) was coined in an “intuitive” way and without any precise definition by Georg Friedrich Bernhard Riemann (1826-1866) in his 1851 doctoral dissertation (Riemann, 1851; Dieudonné, 2009); in 1854, Riemann introduced in his famous Habilitations lecture the idea of a topological manifold on which one could carry out differential and integral calculus.

A topological manifold $\mathcal{M}$ is called a smooth (or differentiable) manifold if $\mathcal{M}$ is continuously differentiable to any order. All smooth manifolds are topological manifolds, but the reverse is not necessarily true. (Note: Authors often differ on the precise definition of a “smooth” manifold.)

We now define the analogue of a homeomorphism for a differentiable manifold. Consider two open sets, $U \subset \Re^{r}$ and $V \subset \Re^{s}$, and let $g: U \rightarrow V$ so that for $\mathbf{x} \in U$ and $\mathbf{y} \in V$, $g(\mathbf{x})=\mathbf{y}$. If the function $g$ has finite first-order partial derivatives, $\partial y_{j} / \partial x_{i}$, for all $i=1,2, \ldots, r$, and all $j=1,2, \ldots, s$, then $g$ is said to be a smooth (or differentiable) mapping on $U$. We also say that $g$ is a $\mathcal{C}^{1}$-function on $U$ if all the first-order partial derivatives are continuous. More generally, if $g$ has continuous higher-order partial derivatives, $\partial^{k_{1}+\cdots+k_{r}} y_{j} / \partial x_{1}^{k_{1}} \cdots \partial x_{r}^{k_{r}}$, for all $j=1,2, \ldots, s$ and all nonnegative integers $k_{1}, k_{2}, \ldots, k_{r}$ such that $k_{1}+k_{2}+\cdots+k_{r} \leq r$, then we say that $g$ is a $\mathcal{C}^{r}$-function, $r=1,2, \ldots$. If $g$ is a $\mathcal{C}^{r}$-function for all $r \geq 1$, then we say that $g$ is a $\mathcal{C}^{\infty}$-function.

If $g$ is a homeomorphism from an open set $U$ to an open set $V$, then it is said to be a $\mathcal{C}^{r}$-diffeomorphism if $g$ and its inverse $g^{-1}$ are both $\mathcal{C}^{r}$-functions. A $\mathcal{C}^{\infty}$-diffeomorphism is simply referred to as a diffeomorphism. We say that $U$ and $V$ are diffeomorphic if there exists a diffeomorphism between them. These definitions extend in a straightforward way to manifolds. For example, if $\mathcal{X}$ and $\mathcal{Y}$ are both smooth manifolds, the function $g: \mathcal{X} \rightarrow \mathcal{Y}$ is a diffeomorphism if it is a homeomorphism from $\mathcal{X}$ to $\mathcal{Y}$ and both $g$ and $g^{-1}$ are smooth. Furthermore, $\mathcal{X}$ and $\mathcal{Y}$ are diffeomorphic if there exists a diffeomorphism between them, in which case, $\mathcal{X}$ and $\mathcal{Y}$ are essentially indistinguishable from each other.



## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|EECS 559a


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Spectral Embedding Methods for Manifold Learning

Manifold learning encompasses much of the disciplines of geometry, computation, and statistics, and has become an important research topic in data mining and statistical learning. The simplest description of manifold learning is that it is a class of algorithms for recovering a low-dimensional manifold embedded in a high-dimensional ambient space. Major breakthroughs on methods for recovering low-dimensional nonlinear embeddings of highdimensional data (Tenenbaum, de Silva, and Langford, 2000; Roweis and Saul, 2000) led to the construction of a number of other algorithms for carrying out nonlinear manifold learning and its close relative, nonlinear dimensionality reduction. The primary tool of all embedding algorithms is the set of eigenvectors associated with the top few or bottom few eigenvalues of an appropriate random matrix. We refer to these algorithms as spectral embedding methods. Spectral embedding methods are designed to recover linear or nonlinear manifolds, usually in high-dimensional spaces.

Linear methods, which have long been considered part-and-parcel of the statistician’s toolbox, include principal component analysis (PCA) and multidimensional scaling (MDS). PCA has been used successfully in many different disciplines and applications. In computer vision, for example, PCA is used to study abstract notions of shape, appearance, and motion to help solve problems in facial and object recognition, surveillance, person tracking, security, and image compression where data are of high dimensionality (Turk and Pentland, 1991; De la Torre and Black, 2001). In astronomy, where very large digital sky surveys have become the norm, PCA has been used to analyze and classify stellar spectra, carry out morphological and spectral classification of galaxies and quasars, and analyze images of supernova remnants (Steiner, Menezes, Ricci, and Oliveira, 2009). In bioinformatics, PCA has been used to study high-dimensional data generated by genome-wide, gene-expression experiments on a variety of tissue sources, where scatterplots of the top principal components in such studies often show specific classes of genes that are expressed by different clusters of distinctive biological characteristics (Yeung and Ruzzo, 2001; Zheng-Bradley, Rung, Parkinson, and Brazma, 2010). PCA has also been used to select an optimal subset of single nucleotide polymorphisms (SNPs) (Lin and Altman, 2004). PCA is also used to derive approximations to more complicated nonlinear subspaces, including problems involving data interpolation, compression, denoising, and visualization.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Spaces and Manifolds

Manifold learning involves concepts from general topology and differential geometry. Good introductions to topological spaces include Kelley (1955), Willard (1970), Bourbaki (1989), Mendelson (1990), Steen (1995), James (1999), and several of these have since been reprinted. Books on differential geometry include Spivak (1965), Kreyszig (1991), Kühnel (2000), Lee (2002), and Pressley (2010).

Manifolds generalize the notions of curves and surfaces in two and three dimensions to higher dimensions. Before we give a formal description of a manifold, it will be helpful to visualize the notion of a manifold. Imagine an ant at a picnic, where there are all sorts of items from cups to doughnuts. The ant crawls all over the picnic items, but because of its tiny size, the ant sees everything on a very small scale as flat and featureless. Similarly, a human, looking around at the immediate vicinity, would not see the curvature of the earth. A manifold (also referred to as a topological manifold) can be thought of in similar terms, as a topological space that locally looks flat and featureless and behaves like Euclidean space. Unlike a metric space, a topological space has no concept of distance. In this Section, we review specific definitions and ideas from topology and differential geometry that enable us to provide a useful definition of a manifold.


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Preserving the Estimated Density


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|The Optimization

Now that we have a method to estimate the density on a submanifold of $\mathbb{R}^{D}$, we can proceed to define an algorithm for density preserving maps. ${ }^{9}$ Suppose we are given a sample $X=\left\{x_{1}, x_{2}, \ldots, x_{m}\right\}$ of $m$ data points $x_{i} \in \mathbb{R}^{D}$ that live on a $d$-dimensional submanifold $M$ of $\mathbb{R}^{D}$. We first proceed to estimate the density at each one of the points, by using a slightly generalized version of the submanifold estimator that has variable bandwidths. Denoting the bandwidth for a given evaluation point $x_{j}$ and a reference (data) point $x_{i}$ by $h_{i j}$, the generalized, variable bandwidth estimator at $x_{j}$ is, ${ }^{10}$
$$\hat{f}_{j}=\hat{f}\left(x_{j}\right)=\frac{1}{m} \sum_{i} \frac{1}{h_{i j}^{d}} K_{d}\left(\frac{\left|x_{j}-x_{i}\right|_{D}}{h_{i j}}\right) .$$
Variable bandwidth methods allow the estimator to adapt to the inhomogeneities in the data. Various approaches exist for picking the bandwidths $h_{i j}$ as functions of the query (evaluation) point $x_{j}$ and/or the reference point $x_{i}$ [25]. Here, we focus on the $k$ th-nearest neighbor approach for evaluation points, i.e., we take $h_{i j}$ to depend only on the evaluation point $x_{j}$, and we let $h_{i j}=h_{j}=$ the distance of the $k$ th nearest data (reference) point to the evaluation point $x_{j}$. Here, $k$ is a free parameter that needs to be picked by the user. However, instead of tuning it by hand, one can use a leave-one-out cross-validation score [25] such as the log-likelihood score for the density estimate to pick the best value. This is done by estimating the log-likelihood of each data point by using the leave-one-out version of the density estimate $(3.7)$ for a range of $k$ values, and picking the $k$ that gives the highest log-likelihood.
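A minimal sketch of this procedure is given below, assuming the $k$ th-nearest-neighbor bandwidth choice $h_{i j}=h_{j}$ described above and an Epanechnikov-type kernel normalized for $d$ dimensions; the function names and the small additive constant guarding against $\log 0$ are our own choices.

```python
import numpy as np
from scipy.special import gamma

def epanechnikov(u, d):
    """Epanechnikov kernel in d dimensions, normalized to integrate to 1 over the unit ball."""
    n_e = (d + 2) / (2 * np.pi ** (d / 2) / gamma(d / 2 + 1))
    return np.where(u < 1.0, n_e * (1.0 - u ** 2), 0.0)

def loo_density(X, d, k):
    """Leave-one-out variable-bandwidth estimates at the data points; the bandwidth
    at x_j is its k-th nearest-neighbor distance, as described in the text."""
    m = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # Euclidean distances in R^D
    h = np.sort(dist, axis=1)[:, k]                                 # k-NN distance of each x_j
    W = epanechnikov(dist / h[:, None], d) / h[:, None] ** d        # row j: contributions to f_j
    np.fill_diagonal(W, 0.0)                                        # leave x_j itself out
    return W.sum(axis=1) / (m - 1)

def pick_k(X, d, candidate_ks):
    """Choose k by maximizing the leave-one-out log-likelihood score."""
    scores = [np.sum(np.log(loo_density(X, d, k) + 1e-300)) for k in candidate_ks]
    return candidate_ks[int(np.argmax(scores))]
```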

Now, given the estimates $\hat{f}_{j}=\hat{f}\left(x_{j}\right)$ of the submanifold density at the $D$-dimensional data points $x_{j}$, we want to find a $d$-dimensional representation $X^{\prime}=\left\{x_{1}^{\prime}, x_{2}^{\prime}, \ldots, x_{m}^{\prime}\right\}$, $x_{i}^{\prime} \in \mathbb{R}^{d}$ such that the new estimates $\hat{f}_{i}^{\prime}$ at the points $x_{i}^{\prime} \in \mathbb{R}^{d}$ agree with the original density estimates, i.e.,
$$\hat{f}_{i}^{\prime}=\hat{f}_{i}, \quad i=1, \ldots, m .$$
For this purpose, one can attempt, for example, to minimize the mean squared deviation of $\hat{f}_{i}^{\prime}$ from $\hat{f}_{i}$ as a function of the $x_{i}^{\prime}$'s, but such an approach would result in a non-convex optimization problem with many local minima. We formulate an alternative approach involving semidefinite programming, for the special case of the Epanechnikov kernel [25], which is known to be asymptotically optimal for density estimation, and is convenient for formulating a convex optimization problem for the matrix of inner products (the Gram matrix, or the kernel matrix) of the low-dimensional data set $X^{\prime}$.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|The Optimization

The Epanechnikov kernel. The Epanechnikov kernel $k_{e}$ in $d$ dimensions is defined as,
$$k_{e}\left(\left|x_{i}-x_{j}\right|\right)=\left\{\begin{array}{cc} N_{e}\left(1-\left|x_{i}-x_{j}\right|^{2}\right), & 0 \leq\left|x_{i}-x_{j}\right| \leq 1 \\ 0, & 1 \leq\left|x_{i}-x_{j}\right| \end{array}\right.$$
where $N_{e}$ is the normalization constant that ensures $\int_{\mathbb{R}^{d}} k_{e}\left(\left|x-x^{\prime}\right|\right) d^{d} x^{\prime}=1$. We will assume that the kernel used in the estimates $\hat{f}_{i}$ and $\hat{f}_{i}^{\prime}$ of the density via (3.7) is the Epanechnikov kernel. Owing to its quadratic form (3.9), this kernel facilitates the formulation of a convex optimization problem. Instead of seeking the dimensionally reduced version $X^{\prime}=\left\{x_{1}^{\prime}, \ldots, x_{n}^{\prime}\right\}$ of the data set directly, we will first aim to obtain the kernel matrix $K_{i j}=x_{i}^{\prime} \cdot x_{j}^{\prime}$ for the low-dimensional data points. This is a common approach in the manifold learning literature, where one obtains the low-dimensional data points themselves from the $K_{i j}$ via a singular value decomposition.
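To make the role of the quadratic form explicit: with the Epanechnikov kernel, each low-dimensional density estimate depends on $X^{\prime}$ only through squared distances, which are linear in the entries of the Gram matrix. The display below is our own sketch of this observation; it assumes that the bandwidths and the set of points contributing to each estimate are held fixed at the values determined in the original space (cf. the constraints discussed below),
$$\hat{f}_{i}^{\prime}=\frac{1}{m} \sum_{j \in \mathcal{N}(i)} \frac{N_{e}}{h_{j i}^{d}}\left(1-\frac{\left|x_{i}^{\prime}-x_{j}^{\prime}\right|^{2}}{h_{j i}^{2}}\right), \qquad\left|x_{i}^{\prime}-x_{j}^{\prime}\right|^{2}=K_{i i}+K_{j j}-2 K_{i j},$$
where $\mathcal{N}(i)$ denotes the fixed set of contributing neighbors (our notation). Each constraint $\hat{f}_{i}^{\prime}=\hat{f}_{i}$ is then a linear equality in the entries of $K$, which, together with $K \succeq 0$ and a centering constraint, yields a semidefinite program.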

We next formulate the DPM optimization problem using the Epanechnikov kernel, and comment on the motivation behind it. As in the case of distance-based manifold learning methods, there will likely be various approaches to density-preserving dimensional reduction, some computationally more efficient than the one discussed here. We hope the discussions in this chapter will stimulate further research in this area.

Given the estimated densities $\hat{f}_{i}$, we seek a symmetric, positive semidefinite inner product matrix $K_{i j}=x_{i}^{\prime} \cdot x_{j}^{\prime}$ that results in $d$-dimensional density estimates that agree with $\hat{f}_{i}$. In order to deal with the non-uniqueness problem mentioned during our discussion of density-preserving maps between manifolds (which likely carries over to the discrete setting), we need to pick a suitable objective function to maximize. We choose the objective function to be the same as that of Maximum Variance Unfolding (MVU) [29], namely, $\operatorname{trace}(K)$. After getting rid of translations by constraining the center of mass of the dimensionally reduced data points to the origin, maximizing the objective function $\operatorname{trace}(K)$ becomes equivalent to maximizing the sum of the squared distances between the data points [29].

While the objective function for DPM is the same as that of MVU, the constraints of the former will be weaker. Instead of preserving the distances between $k$-nearest neighbors, the DPM optimization defined below preserves the total contribution of the original $k$-nearest neighbors to the density estimate at the data points. As opposed to MVU, this allows for local stretches of the data set, and results in optimal kernel matrices $K$ that can be faithfully represented by a smaller number of dimensions than the intrinsic dimensionality suggested by MVU. For instance, while MVU is capable of unrolling data on the Swiss roll onto a flat plane, it is impossible to lay data from a spherical cap onto the plane while keeping the distances to the $k$ th nearest neighbors fixed. ${ }^{11}$ Thus, the constraints of the optimization in MVU are too stringent to give an inner product matrix $K$ of rank 2, when the original data is on an intrinsically curved surface in $\mathbb{R}^{3}$. We will see below that the looser constraints of DPM allow it to do a better job in capturing the intrinsic dimensionality of a curved surface.
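As a rough illustration of the optimization template shared by MVU and DPM, the sketch below sets up the MVU form of the problem (maximize $\operatorname{trace}(K)$ over centered positive semidefinite Gram matrices, with exact neighbor-distance equalities); in DPM those per-edge equalities would be replaced by the weaker density-preservation constraints discussed above. The use of cvxpy, the function names, and the need for an SDP-capable solver are our implementation choices, not part of the chapter.

```python
import numpy as np
import cvxpy as cp

def unfolding_gram(X, neighbor_pairs):
    """MVU-style semidefinite program: maximize trace(K) over PSD, centered Gram matrices.
    DPM keeps this objective but swaps the distance equalities below for constraints
    that preserve each point's Epanechnikov density estimate."""
    n = len(X)
    K = cp.Variable((n, n), PSD=True)
    constraints = [cp.sum(K) == 0]                       # center of mass at the origin
    for i, j in neighbor_pairs:
        d2 = float(np.sum((X[i] - X[j]) ** 2))
        constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == d2)
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()   # needs an SDP solver, e.g. SCS
    return K.value

def embed_from_gram(K, d):
    """Recover d-dimensional coordinates from the optimal Gram matrix."""
    vals, vecs = np.linalg.eigh(K)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```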

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Summary

In this chapter, we discussed density preserving maps, a density-based alternative to distance-based methods of manifold learning. This method aims to perform dimensionality reduction on high-dimensional data sets in a way that preserves their density. By using a classical result due to Moser, we proved that density preserving maps to $\mathbb{R}^{d}$ exist even for data on intrinsically curved $d$-dimensional submanifolds of $\mathbb{R}^{D}$ that are globally, or topologically, “simple.” Since the underlying probability density function is arguably one of the most fundamental statistical quantities pertaining to a data set, a method that preserves densities while performing dimensionality reduction is guaranteed to preserve much valuable structure in the data. While distance-preserving approaches distort data on intrinsically curved spaces in various ways, density preserving maps guarantee that certain fundamental statistical information is conserved.

We reviewed a method of estimating the density on a submanifold of Euclidean space. This method was a slightly modified version of the classical method of kernel density estimation, with the additional property that the convergence rate was determined by the intrinsic dimensionality of the data, instead of the full dimensionality of the Euclidean space the data was embedded in. We made a further modification on this estimator to allow for variable “bandwidths,” and used it with a specific kernel function to set up a semidefinite optimization problem for a proof-of-concept approach to density preserving maps. The objective function used was identical to the one in Maximum Variance Unfolding [29], but the constraints were significantly weaker than the distance-preserving constraints in MVU. By testing the methods on two relatively small, synthetic data sets, we experimentally confirmed the theoretical expectations and showed that density preserving maps are better in detecting and reducing to the intrinsic dimensionality of the data than some of the commonly used distance-based approaches that also work by first estimating a kernel matrix.
While the initial formulation presented in this chapter is not yet scalable to large data sets, we hope our discussion will motivate our readers to pursue the idea of density preserving maps further, and explore alternative, superior formulations. One possible approach to speeding up the computation is to use fast semidefinite programming techniques [4].



## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Density Estimation on Submanifolds


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Introduction

Kernel density estimation (KDE) [21] is one of the most popular methods of estimating the underlying probability density function (PDF) of a data set. Roughly speaking, KDE consists of having the data points contribute to the estimate at a given point according to their distances from that point – the closer the point, the bigger the contribution. More precisely, in the simplest multi-dimensional KDE [5], the estimate $\hat{f}_{m}\left(\mathbf{y}_{0}\right)$ of a PDF $f\left(\mathbf{y}_{0}\right)$ at a point $\mathbf{y}_{0} \in \mathbb{R}^{D}$ is given in terms of a sample $\left\{\mathbf{y}_{1}, \ldots, \mathbf{y}_{m}\right\}$ as,
$$\hat{f}_{m}\left(\mathbf{y}_{0}\right)=\frac{1}{m} \sum_{i=1}^{m} \frac{1}{h_{m}^{D}} K\left(\frac{\left|\mathbf{y}_{i}-\mathbf{y}_{0}\right|}{h_{m}}\right),$$
where $h_{m}>0$, the bandwidth, is chosen to approach zero in a suitable manner as the number $m$ of data points increases, and $K:[0, \infty) \rightarrow[0, \infty)$ is a kernel function that satisfies certain properties such as boundedness. Various theorems exist on the different types and rates of convergence of the estimator to the correct result. The earliest result on the pointwise convergence rate in the multivariable case seems to be given in [5], where it is stated that under certain conditions for $f$ and $K$, assuming $h_{m} \rightarrow 0$ and $m h_{m}^{D} \rightarrow \infty$ as $m \rightarrow \infty$, the mean squared error in the estimate $\hat{f}_{m}\left(\mathbf{y}_{0}\right)$ of the density at a point goes to zero with the rate,
$$\operatorname{MSE}\left[\hat{f}_{m}\left(\mathbf{y}_{0}\right)\right]=\mathrm{E}\left[\left(\hat{f}_{m}\left(\mathbf{y}_{0}\right)-f\left(\mathbf{y}_{0}\right)\right)^{2}\right]=O\left(h_{m}^{4}+\frac{1}{m h_{m}^{D}}\right)$$
as $m \rightarrow \infty$. If $h_{m}$ is chosen to be proportional to $m^{-1 /(D+4)}$, one gets,
$$\operatorname{MSE}\left[\hat{f}_{m}\left(\mathbf{y}_{0}\right)\right]=O\left(\frac{1}{m^{4 /(D+4)}}\right),$$
as $m \rightarrow \infty$. The two conditions $h_{m} \rightarrow 0$ and $m h_{m}^{D} \rightarrow \infty$ ensure that, as the number of data points increases, the density estimate at a point is determined by the values of the density in a smaller and smaller region around that point, while the number of data points contributing to the estimate (which is roughly proportional to the volume of a region of size $h_{m}$) grows unboundedly.
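A direct implementation of the estimator (3.1) takes only a few lines. The sketch below uses a Gaussian kernel normalized for $D=2$ and synthetic standard-normal data (both our choices) to check the estimate at the origin against the true value $1 /(2 \pi) \approx 0.159$:

```python
import numpy as np

def kde(Y, y0, h, kernel, D):
    """Multivariate kernel density estimate (3.1) at the point y0 in R^D."""
    u = np.linalg.norm(Y - y0, axis=1) / h
    return kernel(u).sum() / (len(Y) * h ** D)

rng = np.random.default_rng(1)
Y = rng.normal(size=(5000, 2))                              # standard normal sample in R^2
gauss2 = lambda u: np.exp(-0.5 * u ** 2) / (2 * np.pi)      # radial kernel integrating to 1 on R^2
print(kde(Y, np.zeros(2), h=0.3, kernel=gauss2, D=2))       # close to 1/(2*pi)
```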

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Motivation for the Submanifold Estimator

We would like to estimate the values of a PDF that lives on an (unknown) $d$-dimensional Riemannian submanifold $M$ of $\mathbb{R}^{D}$, where $d<D$. Usually, $D$-dimensional KDE does not work for such a distribution. This can be intuitively understood by considering a distribution on a line in the plane: 1-dimensional KDE performed on the line (with a bandwidth $h_{m}$ satisfying the asymptotics given above) would converge to the correct density on the line, but 2-dimensional KDE, differing from the former only by a normalization factor that blows up as the bandwidth $h_{m} \rightarrow 0$ (compare (3.1) for the cases $D=2$ and $D=1$), diverges. This behavior is due to the fact that, similar to a “delta function” distribution on $\mathbb{R}$, the $D$-dimensional density of a distribution on a $d$-dimensional submanifold of $\mathbb{R}^{D}$ is, strictly speaking, undefined – the density is zero outside the submanifold, and in order to have proper normalization, it has to be infinite on the submanifold. More formally, the $D$-dimensional probability measure for a $d$-dimensional PDF supported on $M$ is not absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^{D}$, and does not have a probability density function on $\mathbb{R}^{D}$. If one attempts to use $D$-dimensional KDE for data drawn from such a probability measure, the estimator will “attempt to converge” to a singular PDF; one that is infinite on $M$, zero outside.

For a distribution with support on a line in the plane, we can resort to 1-dimensional KDE to get the correct density on the line, but how could one estimate the density on an unknown, possibly curved submanifold of dimension $d<D$ ? Essentially the same approach works: even for data that lives on an unknown, curved $d$-dimensional submanifold of $\mathbb{R}^{D}$, it suffices to use the $d$-dimensional kernel density estimator with the Euclidean distance on $\mathbb{R}^{D}$ to get a consistent estimator of the submanifold density. Furthermore, the convergence rate of this estimator can be bounded as in (3.3), with $D$ being replaced by $d$, the intrinsic dimension of the submanifold. [20]

The intuition behind this approach is based on three facts: 1) For small bandwidths, the main contribution to the density estimate at a point comes from data points that are nearby; 2) For small distances, a $d$-dimensional Riemannian manifold “looks like” $\mathbb{R}^{d}$, and densities in $\mathbb{R}^{d}$ should be estimated by a $d$-dimensional kernel, instead of a $D$-dimensional one; and 3) For points of $M$ that are close to each other, the intrinsic distances as measured on $M$ are close to Euclidean distances as measured in the surrounding $\mathbb{R}^{D}$. Thus, as the number of data points increases and the bandwidth is taken to be smaller and smaller, estimating the density by using a kernel normalized for $d$ dimensions and distances as measured in $\mathbb{R}^{D}$ should give a result closer and closer to the correct value.

We will next give the formal definition of the estimator motivated by these considerations, and state the theorem on its asymptotics. As in the original work of Parzen [21], the pointwise consistency of the estimator can be proven by using a bias-variance decomposition. The asymptotic unbiasedness of the estimator follows from the fact that as the bandwidth converges to zero, the kernel function becomes a “delta function.” Using this fact, it is possible to show that with an appropriate choice for the vanishing rate of the bandwidth, the variance also vanishes asymptotically, completing the proof of the pointwise consistency of the estimator.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Statement of the Theorem

Let $(M, \mathbf{g})$ be a $d$-dimensional, embedded, complete, compact Riemannian submanifold of $\mathbb{R}^{D}$ ($d<D$) with injectivity radius $r_{\mathrm{inj}}>0 .^{7}$ Let $d(p, q)=d_{p}(q)$ be the length of a length-minimizing geodesic in $M$ between $p, q \in M$, and let $u(p, q)=u_{p}(q)$ be the geodesic distance between $p$ and $q$ as measured in $\mathbb{R}^{D}$ (thus, $u(p, q)$ is simply the Euclidean distance between $p$ and $q$ in $\mathbb{R}^{D}$). Note that $u(p, q) \leq d(p, q)$. We will denote the Riemannian volume measure on $M$ by $V$, and the volume form by $d V .^{8}$
Theorem 3.3.1 Let $f: M \rightarrow[0, \infty)$ be a probability density function defined on $M$ (so that the related probability measure is $f V$), and $K:[0, \infty) \rightarrow[0, \infty)$ be a continuous function that vanishes outside $[0,1)$, is differentiable with a bounded derivative in $[0,1)$, and satisfies the normalization condition, $\int_{|\mathbf{z}| \leq 1} K(|\mathbf{z}|) d^{d} \mathbf{z}=1$. Assume $f$ is differentiable to second order in a neighborhood of $p \in M$, and for a sample $q_{1}, \ldots, q_{m}$ of size $m$ drawn from the density $f$, define an estimator $\hat{f}_{m}(p)$ of $f(p)$ as,
$$\hat{f}_{m}(p)=\frac{1}{m} \sum_{j=1}^{m} \frac{1}{h_{m}^{d}} K\left(\frac{u_{p}\left(q_{j}\right)}{h_{m}}\right)$$
where $h_{m}>0$. If $h_{m}$ satisfies $\lim_{m \rightarrow \infty} h_{m}=0$ and $\lim_{m \rightarrow \infty} m h_{m}^{d}=\infty$, then, there exist non-negative numbers $m_{*}, C_{b}$, and $C_{V}$ such that for all $m>m_{*}$ the mean squared error of the estimator (3.4) satisfies,
$$\operatorname{MSE}\left[\hat{f}_{m}(p)\right]=\mathrm{E}\left[\left(\hat{f}_{m}(p)-f(p)\right)^{2}\right]<C_{b} h_{m}^{4}+\frac{C_{V}}{m h_{m}^{d}} .$$
If $h_{m}$ is chosen to be proportional to $m^{-1 /(d+4)}$, this gives,
$$\mathrm{E}\left[\left(\hat{f}_{m}(p)-f(p)\right)^{2}\right]=O\left(\frac{1}{m^{4 /(d+4)}}\right)$$
as $m \rightarrow \infty$.
Thus, the bound on the convergence rate of the submanifold density estimator is as in (3.2), (3.3), with the dimensionality $D$ replaced by the intrinsic dimension $d$ of $M$. As mentioned above, the proof of this theorem follows from two lemmas on the convergence rates of the bias and the variance; the $h_{m}^{4}$ term in the bound corresponds to the bias, and the $1 / m h_{m}^{d}$ term corresponds to the variance; see $[20]$ for details. This approach to submanifold density estimation was previously mentioned in [11], and the thesis [10] contains the details, although in a more technical and general approach than the elementary one followed in [20].
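The following minimal sketch illustrates the estimator (3.4) on synthetic data: a uniform sample on the unit circle, a one-dimensional submanifold of $\mathbb{R}^{3}$, with a one-dimensional Epanechnikov kernel evaluated at Euclidean distances measured in the ambient space. The true density with respect to arc length is $1 /(2 \pi) \approx 0.159$; the data set and parameter values are our own choices.

```python
import numpy as np

def submanifold_kde(data, p, h, d):
    """Estimator (3.4): a kernel normalized for d dimensions, applied to plain
    Euclidean distances measured in the ambient space R^D."""
    u = np.linalg.norm(data - p, axis=1) / h
    k = np.where(u < 1.0, 0.75 * (1.0 - u ** 2), 0.0)   # 1-d Epanechnikov, N_e = 3/4
    return k.sum() / (len(data) * h ** d)

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, size=20000)
circle = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]      # unit circle in R^3
print(submanifold_kde(circle, np.array([1.0, 0.0, 0.0]), h=0.05, d=1))  # ~ 1/(2*pi)
```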



## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|The Existence of Density Preserving Maps


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Moser’s Theorem

Riemannian manifolds. We begin by restricting our attention to data subspaces which are Riemannian submanifolds of $\mathbb{R}^{D}$. Riemannian manifolds provide a generalization of the notion of a smooth surface in $\mathbb{R}^{3}$ to higher dimensions. As first clarified by Gauss in the two-dimensional case and by Riemann in the general case, it turns out that intrinsic features of the geometry of a surface, such as the lengths of its curves or intrinsic distances between its points, etc., can be given in terms of the so-called metric tensor ${ }^{2}$ $\mathbf{g}$, without referring to the particular way the surface is embedded in $\mathbb{R}^{3}$. A space whose geometry is defined in terms of a metric tensor is called a Riemannian manifold (for a rigorous definition, see, e.g., $[12,16,2]$).

The Gauss/Riemann result mentioned above states that if the intrinsic curvature of a Riemannian manifold $\left(M, \mathbf{g}_{M}\right)$ is not zero in an open set $U \subset M$, it is not possible to find a map from $M$ into $\mathbb{R}^{d}$ that preserves the distances between the points of $U$. Thus, there exists a local obstruction, namely, the curvature, to the existence of distance-preserving maps. It turns out that no such local obstruction exists for volume-preserving maps. The only invariant is a global one, namely, the total volume. ${ }^{3}$ This is the content of Moser’s theorem on volume-preserving maps, which we state next.

Theorem 3.2.1 (Moser [18]) Let $\left(M, \mathbf{g}_{M}\right)$ and $\left(N, \mathbf{g}_{N}\right)$ be two closed, connected, orientable, $d$-dimensional differentiable manifolds that are diffeomorphic to each other. Let $\tau_{M}$ and $\tau_{N}$ be volume forms, i.e., nowhere vanishing $d$-forms on these manifolds, satisfying $\int_{M} \tau_{M}=\int_{N} \tau_{N}$. Then, there exists a diffeomorphism $\phi: M \rightarrow N$ such that $\tau_{M}=\phi^{*} \tau_{N}$, i.e., the volume form on $M$ is the same as the pull-back of the volume form on $N$ by $\phi .^{4}$

The meaning of this result is that, if two manifolds with the same “global shape” (i.e., two manifolds that are diffeomorphic) have the same total volume, one can find a map between them that preserves the volume locally. The surfaces of a mug and a torus are the classical examples used for describing global, topological equivalence. Although these objects have the same “global shape” (topology/smooth structure) their intrinsic, local geometries are different. Moser’s theorem states that if their total surface areas are the same, one can find a map between them that preserves the areas locally, as well, i.e., a map that sends all small regions on one surface to regions in the other surface in a way that preserves the areas.

Using this theorem, we now show that it is possible to find density-preserving maps between Riemannian manifolds that have the same total volume. This is due to the fact that if local volumes are preserved under a map, the density of a distribution will also be preserved.
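This last step can be made explicit with a short change-of-variables computation; the following display is our own sketch of the reasoning (the notation $f^{\prime}$ for the pushforward density is ours). If $\phi: M \rightarrow N$ satisfies $\phi^{*} \tau_{N}=\tau_{M}$ and $f$ is a probability density on $M$, then for any measurable $A \subseteq N$,
$$\left(\phi_{*}\left(f \tau_{M}\right)\right)(A)=\int_{\phi^{-1}(A)} f \, \tau_{M}=\int_{\phi^{-1}(A)} f \, \phi^{*} \tau_{N}=\int_{A}\left(f \circ \phi^{-1}\right) \tau_{N},$$
so the pushforward density is $f^{\prime}=f \circ \phi^{-1}$; that is, $f^{\prime}(\phi(p))=f(p)$ for every $p \in M$, and the density value at each point is carried over unchanged by $\phi$.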

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Dimensional Reduction

These results were formulated in terms of so-called closed manifolds, i.e., compact manifolds without boundary. The practical dimensionality reduction problem we would like to address, on the other hand, involves starting with a $d$-dimensional data submanifold $M$ of $\mathbb{R}^{D}$ (where $d<D$), and dimensionally reducing to $\mathbb{R}^{d}$. In order to be able to do this diffeomorphically, $M$ must be diffeomorphic to a subspace of $\mathbb{R}^{d}$, which is not generally the case for closed manifolds. For instance, although we can find a diffeomorphism from a hemisphere (a manifold with boundary, not a closed manifold) into the plane, we cannot find one from the unit sphere (a closed manifold) into the plane. This is a constraint on all dimensional reduction algorithms that preserve the global topology of the data space, not just density preserving maps. Any algorithm that aims to avoid “tearing” or “folding” the data subspace during the reduction will fail on problems like reducing a sphere to $\mathbb{R}^{2} .{ }^{5}$

Thus, in order to show that density preserving maps into $\mathbb{R}^{d}$ exist for a useful class of $d$-dimensional data manifolds, we have to make sure that the conclusion of Moser’s theorem and our corollary work for certain manifolds with boundary, or for certain non-compact manifolds, as well. Fortunately, this is not so hard, at least for a simple class of manifolds that is enough to be useful. In proving his theorem for closed manifolds, Moser [18] first gives a proof for a single “coordinate patch” in such a manifold, which, basically, defines a compact manifold with boundary minus the boundary itself. Not all $d$-dimensional manifolds with boundary (minus their boundaries) can be given by atlases consisting of a single coordinate patch, but the ones that can be so given cover a wide range of curved Riemannian manifolds, including the hemisphere and the Swiss roll, possibly with punctures. In the following, we will assume that $M$ consists of a single coordinate patch.

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Intuition on Non-Uniqueness

Note that the results above claim the existence of volume (or density) preserving maps, but not uniqueness. In fact, the space of volume-preserving maps is very large. An intuitive way to see this is to consider the flow of an incompressible fluid in $\mathbb{R}^{3}$. The fluid may cover the same region in space at two given times, but the fluid particles may have gone through significant shuffling. The map from the original configuration of the fluid to the final one is a volume preserving diffeomorphism, assuming the flow is smooth. The infinity of ways a fluid can move shows the infinity of ways of preserving volume.

Distance-preserving maps may also have some non-uniqueness, but this is parametrized by a finite-dimensional group, namely, the isometry group of the Riemannian manifold under consideration. ${ }^{6}$ The case of volume-preserving maps is much worse, the space of volume-preserving diffeomorphisms being infinite-dimensional. Since the aim of this chapter is to describe a manifold-learning method that preserves volumes/densities, we are faced with the following question: Given a data manifold with intrinsic dimension $d$ that is diffeomorphic to a subset of $\mathbb{R}^{d}$, which map, in the infinite-dimensional space of volume-preserving maps from this manifold to $\mathbb{R}^{d}$, is the “best”? In Section 3.4, we will describe an approach to this problem by setting up a specific optimization procedure. But first, let us describe a method for estimating densities on submanifolds.


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Laplacian Eigenmaps with Global Information


## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|LEM Results

Figures $2.5-2.7$ show the results after using LEM for different values of $k$. As the value of $k$ increases from 1 to higher values, we notice the spreading of the embedded data. The bottom subplot shows the nearest neighbor graph with $k=1$, as shown in Figure 2.7. The right plot shows the embedding of the graph. It is interesting to observe how the embedded data loses its local neighborhood information. The embedding practically happens along the second principal eigenvector (the first being the trivial zero-eigenvalue eigenvector). As the value of $k$ is increased to 2, we observe that the embedding happens along the second and third principal axes. See Figure 2.7. For $k=1$ the graph is highly disconnected, and for $k=2$ the graph has far fewer isolated pieces. One interesting thing to observe is that as the connectivity of the graph increases, the low-dimensional representation begins to preserve the local information.
The graph with $k=2$ and its embedding is shown in Figure 2.8. Increasing the neighborhood information to 2 neighbors is still not able to represent the continuity of the original manifold. Figure $2.7$ shows the graph with $k=3$ and its embedding; increasing the neighborhood information to 3 neighbors better represents the continuity of the original manifold. Figure $2.5$ shows the graph with $k=5$ and its embedding; increasing the neighborhood information to 5 neighbors represents the continuity of the original manifold better still. Similar results are obtained by further increasing the number of neighbors; however, it should be noted that when the number of neighbors is very high, the graph starts to be influenced by ambient neighbors.

We see similar results for the face images. The three plots in Figure $2.6$ show the embedding results obtained using LEM when the neighborhood graphs are created using $k=1, k=2$, and $k=5$. The top and the middle plots illustrate the limitation of LEM.
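For reference, a minimal Laplacian-eigenmap sketch along these lines is given below. It uses a symmetrized $k$-NN graph with 0/1 weights and the unnormalized graph Laplacian (simplifications of ours; the experiments discussed above may differ in weighting and normalization), and reads the embedding off the eigenvectors above the trivial zero-eigenvalue one.

```python
import numpy as np

def laplacian_eigenmaps(X, k, d):
    """k-NN graph with 0/1 weights, unnormalized graph Laplacian, and the d
    eigenvectors above the constant one as the embedding coordinates."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]        # k nearest neighbors (skip self)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)                           # symmetrize the neighbor relation
    L = np.diag(W.sum(axis=1)) - W                   # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:d + 1]                          # assumes a connected graph (simple zero eigenvalue)
```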

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Bibliographical and Historical Remarks

Dimensionality reduction is an important research area in data analysis with an extensive research literature. Both linear and non-linear methods exist, and each category has both supervised and unsupervised versions. In this section we will briefly mention some of the salient works that have been proposed in the area of locally preserving manifold learning: see $[8]$ for a broader survey.

Lee and Seung [12] showed that many high dimensional data such as a series of related images, video frames, etc. lie on a much lower-dimensional manifold instead of being scattered throughout the feature space. This particular observation has motivated researchers to develop dimension reduction algorithms that try to learn an embedded manifold in a high-dimensional space.

ISOMAP [14] learns the manifold by exploring geodesic distances. In fact, the algorithm tries to preserve the geometry of the data on the manifold by noting the points in the neighborhood of each point. The algorithm is defined as follows (a minimal sketch in code appears after the list):

1. Form a neighborhood graph $G$ for the dataset, based, for instance, on the $K$ nearest neighbors of each point $x_{i}$.
2. For every pair of nodes in the graph, compute the shortest path, using Dijkstra’s algorithm, as an estimate of the intrinsic distance on the data manifold. The edge weights of the graph are the Euclidean distances between the points.
3. The classical multidimensional scaling algorithm is then applied to these pairwise distances to find a lower-dimensional embedding $y_{i}$.
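A compact sketch of these three steps is given below; the dense $k$-NN graph, SciPy's Dijkstra routine, and classical MDS via the doubly centered squared-distance matrix are our implementation choices, and the sketch assumes the neighborhood graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, k, d):
    """Steps 1-3 above: k-NN graph with Euclidean edge weights, graph shortest
    paths as geodesic estimates, then classical multidimensional scaling."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]
    G = np.zeros((n, n))                               # zero entries denote missing edges
    for i in range(n):
        G[i, nn[i]] = dist[i, nn[i]]
    G = np.maximum(G, G.T)                             # symmetric neighborhood graph
    D = shortest_path(G, method="D", directed=False)   # Dijkstra; assumes a connected graph
    J = np.eye(n) - np.ones((n, n)) / n                # classical MDS on the squared geodesics
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```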

Bernstein et al. [22] have described the convergence properties of the estimation procedure for the intrinsic distances. For large and dense data sets, computation of pairwise distances is time consuming, and moreover the calculation of eigenvalues can be computationally intensive for large data sets. Such constraints have motivated researchers to find simpler variations of the Isomap algorithm. One such algorithm uses subsampled data points called landmarks: Isomap is first computed on the landmarks, and the remaining points are then positioned relative to those landmarks by a simple triangulation procedure.

Locally Linear Embedding (LLE) is an unsupervised learning method based on global and local optimization [11]. It is similar to Isomap in the sense that it generates a graphical representation of the data set. However, it is different from Isomap as it only attempts to preserve local structures of the data. Because of the locality property used in LLE, the algorithm allows for successful embedding of nonconvex manifolds. An important point to be noted is that LLE creates the local properties of a manifold using linear combinations of the $k$ nearest neighbors of each data point $x_{i}$. LLE attempts to create a local regression-like model and thereby tries to fit a hyperplane through the data point $x_{i}$. This appears to be reasonable for smooth manifolds, where the nearest neighbors align themselves well in a linear space. For very non-smooth or noisy data sets, LLE does not perform well. It has been noted that LLE preserves the reconstruction weights in the space of lower dimensionality, as the reconstruction weights of a data point are invariant to linear transformational operations like translation, rotation, etc.
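Where a ready-made implementation is sufficient, scikit-learn's LocallyLinearEmbedding exposes exactly the two quantities discussed here, the number of neighbors and the target dimension. A minimal usage sketch (the placeholder data and parameter values are ours):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.default_rng(3).normal(size=(500, 10))    # placeholder (n, D) data matrix
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Y = lle.fit_transform(X)                               # (n, 2) embedding preserving local reconstruction weights
```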

## Machine Learning Assignment Help|Manifold Data Learning Assignment Help|Arkadas Ozakin, Nikolaos Vasiloglou II, Alexander Gray

Much of the recent work in manifold learning and nonlinear dimensionality reduction focuses on distance-based methods, i.e., methods that aim to preserve the local or global (geodesic) distances between data points on a submanifold of Euclidean space. While this is a promising approach when the data manifold is known to have no intrinsic curvature (which is the case for common examples such as the “Swiss roll”), classical results in Riemannian geometry show that it is impossible to map a $d$-dimensional data manifold with intrinsic curvature into $\mathbb{R}^{d}$ in a manner that preserves distances. Consequently, distance-based methods of dimensionality reduction distort intrinsically curved data spaces, and they often do so in unpredictable ways. In this chapter, we discuss an alternative paradigm of manifold learning. We show that it is possible to perform nonlinear dimensionality reduction by preserving the underlying density of the data, for a much larger class of data manifolds than intrinsically flat ones, and demonstrate a proof-of-concept algorithm demonstrating the promise of this approach.

Visual inspection of data after dimensional reduction to two or three dimensions is among the most common uses of manifold learning and nonlinear dimensionality reduction. Typically, what is sought by the user’s eye in two or three-dimensional plots is clustering and other relationships in the data. Knowledge of the density, in principle, allows one to identify such basic structures as clusters and outliers, and even define nonparametric classifiers; the underlying density of a data set is arguably one of the most fundamental statistical objects that describe it. Thus, a method of dimensionality reduction that is guaranteed to preserve densities may well be preferable to methods that aim to preserve distances, but end up distorting them in uncontrolled ways.

Many of the manifold learning methods require the user to set a neighborhood radius $h$ or, for $k$-nearest-neighbor approaches, a positive integer $k$, to be used in determining the neighborhood graph. Most of the time there is no automatic way to pick appropriate values of the tuning parameters $h$ and $k$, and one resorts to trial and error, looking for values that result in reasonable-looking plots. Kernel density estimation, one of the most popular and useful methods of estimating the underlying density of a data set, comes with a natural way to choose $h$ or $k$: pick the value that maximizes a cross-validation score for the density estimate. While the usual kernel density estimation does not allow one to estimate the density of data on submanifolds of Euclidean space, a small modification allows one to do so. This modification and its ramifications are discussed below in the context of density-preserving maps.
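The sketch below illustrates the cross-validation idea for choosing the smoothing parameter. The Gaussian-kernel estimator, the synthetic data, the bandwidth grid, and the five folds are all assumptions made for the example.

```python
# Hedged sketch: choose the KDE bandwidth h by cross-validated log-likelihood.
# The data, the bandwidth grid, and the fold count are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                          # stand-in for the observed data

grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-1, 0.5, 20)},
                    cv=5)                              # 5-fold CV of the log-likelihood
grid.fit(X)

print("cross-validated bandwidth:", grid.best_params_["bandwidth"])
```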

The chapter is organized as follows. In Section 3.2, using a theorem of Moser, we prove the existence of density-preserving maps into $\mathbb{R}^{d}$ for a large class of $d$-dimensional manifolds, and give an intuitive discussion of the nonuniqueness of such maps. In Section 3.3, we describe a method for estimating the underlying density of a data set on a Riemannian submanifold of Euclidean space. We state the main result on the consistency of this submanifold density estimator and give a bound on its convergence rate, showing that the latter is determined by the intrinsic dimensionality of the data instead of the full dimensionality of the feature space. This, incidentally, shows that the curse of dimensionality in the widely used method of kernel density estimation is not as severe as is generally believed, if the method is properly modified for data on submanifolds. In Section 3.4, using a modified version of the estimator defined in Section 3.3, we describe a proof-of-concept algorithm for density-preserving maps based on semidefinite programming and give experimental results. Finally, in Sections 3.5 and 3.6, we summarize the chapter and discuss the relevant bibliography.

## 机器学习代写|流形学习代写manifold data learning代考|Bibliographical and Historical Remarks

Lee and Seung [12] showed that many kinds of high-dimensional data, such as a series of related images or video frames, lie on a much lower-dimensional manifold rather than being scattered throughout the feature space. This observation motivated researchers to develop dimensionality-reduction algorithms that attempt to learn the embedded manifold in the high-dimensional space.

ISOMAP [14] learns the manifold by exploiting geodesic distances. In effect, the algorithm tries to preserve the geometry of the data on the manifold by attending to the points in the neighborhood of each point. The algorithm is defined as follows:

1. Form a neighborhood graph $G$ for the data set, for example based on the $k$ nearest neighbors of each point $x_{i}$.
2. For each pair of nodes in the graph, compute the shortest path using Dijkstra's algorithm as an estimate of the intrinsic distance on the data manifold; the edge weights of the graph are computed from the Euclidean distance metric.
3. Apply the classical multidimensional scaling algorithm to these pairwise distances to find a lower-dimensional embedding $y_{i}$ (a minimal sketch of these three steps is given below).
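The following sketch carries out the three steps just listed. The Swiss-roll data set and the choice $k=10$ are illustrative assumptions, and the neighborhood graph is assumed to be connected.

```python
# Hedged sketch of the three Isomap steps above; the Swiss-roll data and k = 10
# are illustrative, and the neighborhood graph is assumed to be connected.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=800, random_state=0)

# Step 1: k-nearest-neighbor graph with Euclidean edge weights
G = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Step 2: geodesic (intrinsic) distances estimated by Dijkstra shortest paths
D = shortest_path(G, method="D", directed=False)

# Step 3: classical MDS on the estimated geodesic distances
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)
top = np.argsort(vals)[::-1][:2]             # two largest eigenvalues
Y = vecs[:, top] * np.sqrt(vals[top])        # 2-D embedding coordinates y_i
print(Y.shape)
```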

## 机器学习代写|流形学习代写manifold data learning代考|Robust Laplacian Eigenmaps Using Global Information

## 机器学习代写|流形学习代写manifold data learning代考|Shounak Roychowdhury and Joydeep Ghosh

Dimensionality reduction is an important process that is often required to understand the data in a more tractable and humanly comprehensible way. This process has been extensively studied in terms of linear methods such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), Factor Analysis, etc. [8]. However, it has been observed that many high-dimensional data sets, such as a series of related images, lie on a manifold [12] and are not scattered throughout the feature space.

Belkin and Niyogi in [2] proposed Laplacian Eigenmaps (LEM), a method that approximates the Laplace-Beltrami operator, which is able to capture the properties of any Riemannian manifold. The motivation for our work derives from our experimental observation that when the graph used by Laplacian Eigenmaps (LEM) [2] is not well constructed (either it has many isolated vertices or there are islands of subgraphs), the data are difficult to interpret after dimension reduction. This paper discusses how global information can be used in addition to local information in the framework of Laplacian Eigenmaps to address such situations. We make use of an interesting result by Costa and Hero showing that the Minimum Spanning Tree (MST) on a manifold can reveal its intrinsic dimension and entropy [4]. In other words, MSTs can capture the underlying global structure of the manifold if it exists. We use this finding to extend the LEM dimension-reduction technique to exploit both local and global information.

LEM depends on the Graph Laplacian matrix, and so does our work. Fiedler initially proposed the Graph Laplacian matrix as a means to comprehend the notion of algebraic connectivity of a graph [6]. Merris has extensively discussed a wide variety of properties of the Laplacian matrix of a graph, such as invariance, various bounds and inequalities, extremal examples and constructions, etc., in his survey [10]. A broader role of the Laplacian matrix can be seen in Chung’s book on Spectral Graph Theory [3].

The second section touches on the Graph Laplacian matrix. The role of global information in manifold learning is then presented, followed by our proposed approach of augmenting LEM with global information about the data. Experimental results confirm that global information can indeed help manifold learning when local information is limited.

## 机器学习代写|流形学习代写manifold data learning代考|Graph Laplacian

Let us consider a weighted graph $G=(V, E)$, where $V=V(G)=\left\{v_{1}, v_{2}, \ldots, v_{n}\right\}$ is the set of vertices (also called the vertex set) and $E=E(G)=\left\{e_{1}, e_{2}, \ldots, e_{n}\right\}$ is the set of edges (also called the edge set). The weight function $w$ is defined as $w: V \times V \rightarrow \Re$ such that $w\left(v_{i}, v_{j}\right)=w\left(v_{j}, v_{i}\right)=w_{i j}$.
Definition 1: The Laplacian [6] of a graph without loops or multiple edges is defined as follows:
$$L(G)= \begin{cases}d_{v_{i}} & \text { if } v_{i}=v_{j}, \\ -1 & \text { if } v_{i} \text { and } v_{j} \text { are adjacent}, \\ 0 & \text { otherwise. }\end{cases}$$
Fiedler [6] defined the Laplacian of a graph as a symmetric matrix for a regular graph, where $A$ is an adjacency matrix ( $A^{T}$ is the transpose of adjacency matrix), $I$ is the identity matrix, and $n$ is the degree of the regular graph:
$$L(G)=n I-A .$$
A definition by Chung (see [3]), given below, generalizes the Laplacian by adding weights on the edges of the graph; it can be viewed as the weighted graph Laplacian. Simply, it is the difference between the diagonal degree matrix $D$ and the weighted adjacency matrix $W$:
$$L_{W}(G)=D-W,$$
where the diagonal element in $D$ is defined as $d_{v_{i}}=\sum_{j=1}^{n} w\left(v_{i}, v_{j}\right)$.
Definition 2: The Laplacian of a weighted graph (operator) is defined as follows:
$$L_{w}(G)= \begin{cases}d_{v_{i}}-w\left(v_{i}, v_{j}\right) & \text { if } v_{i}=v_{j}, \\ -w\left(v_{i}, v_{j}\right) & \text { if } v_{i} \text { and } v_{j} \text { are connected}, \\ 0 & \text { otherwise. }\end{cases}$$
$L_{w}(G)$ reduces to $L(G)$ when the edges have unit weights.
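The small sketch below computes $L_{w}(G)=D-W$ for a toy weighted graph and checks the usual properties; the four-vertex graph and its weights are made up purely for illustration.

```python
# Hedged sketch of Definition 2 on a toy graph; the weights are made up.
import numpy as np

# symmetric weight matrix w(v_i, v_j) for a 4-vertex graph (0 means no edge)
W = np.array([[0.0, 2.0, 0.0, 1.0],
              [2.0, 0.0, 3.0, 0.0],
              [0.0, 3.0, 0.0, 4.0],
              [1.0, 0.0, 4.0, 0.0]])

D = np.diag(W.sum(axis=1))          # d_{v_i} = sum_j w(v_i, v_j)
L_w = D - W                         # weighted graph Laplacian L_w(G) = D - W

# With unit weights on the existing edges, L_w(G) reduces to the ordinary L(G).
A = (W > 0).astype(float)
L = np.diag(A.sum(axis=1)) - A

print(np.allclose(L_w, L_w.T), np.allclose(L_w.sum(axis=1), 0.0))   # symmetric, zero row sums
```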

## 机器学习代写|流形学习代写manifold data learning代考|Global Information of Manifold

Global information has generally not been used in manifold learning, since it is widely believed that global information may capture unnecessary structure (such as ambient data points) that should be avoided when dealing with manifolds.

However, some recent research results suggest that it can be useful to explore global information in a more constrained manner for manifold learning. Costa and Hero show that it is possible to use a Geodesic Minimum Spanning Tree (GMST) on the manifold to estimate the intrinsic dimension and intrinsic entropy of the manifold [4].

Costa and Hero showed in the following theorem that it is possible to learn the intrinsic entropy and intrinsic dimension of a nonlinear manifold by extending the BHH theorem [1], a well-known result in geometric probability.

Theorem: [Generalization of the BHH Theorem to embedded manifolds [4]] Let $\mathcal{M}$ be a smooth compact $m$-dimensional manifold embedded in $\mathbb{R}^{d}$ through the diffeomorphism $\phi: \Omega \rightarrow \mathcal{M}$, with $\Omega \subseteq \mathbb{R}^{d}$. Assume $2 \leq m \leq d$ and $0<\gamma<m$. Suppose that $Y_{1}, Y_{2}, \ldots$ are i.i.d. random vectors on $\mathcal{M}$ having a common density function $f$ with respect to the Lebesgue measure $\mu_{\mathcal{M}}$ on $\mathcal{M}$. Then the length functional $T_{\gamma}^{\mathbb{R}^{m}}\left(\phi^{-1}\left(Y_{n}\right)\right)$ of the MST spanning $\phi^{-1}\left(Y_{n}\right)$ satisfies the equation shown below in an almost sure sense:

$$\lim _{n \rightarrow \infty} \frac{T_{\gamma}^{\mathbb{R}^{m}}\left(\phi^{-1}\left(Y_{n}\right)\right)}{n^{\alpha}}=\beta_{m} \int_{\Omega}\left[f(\phi(x))\, J(x)\right]^{\alpha} d x,$$
where $\alpha=(m-\gamma) / m$ always satisfies $0<\alpha<1$, $J$ is the Jacobian of the map $\phi$, and $\beta_{m}$ is a constant that depends only on $m$.

Based on the above theorem, we use the MST on the entire data set as a source of global information. For more details, see [4]; for further background, see [13] and [15].
The basic principle of GLEM is quite straightforward. The objective function to be minimized is the following (it has the same flavor and notation as in [2]):
$$\begin{aligned} \sum_{i, j}\left|\mathbf{y}^{(i)}-\mathbf{y}^{(j)}\right|_{2}^{2}\left(W_{i j}^{N N}+W_{i j}^{M S T}\right) &= \operatorname{tr}\left(\mathbf{Y}^{T} L\left(G_{N N}\right) \mathbf{Y}+\mathbf{Y}^{T} L\left(G_{M S T}\right) \mathbf{Y}\right) \\ &= \operatorname{tr}\left(\mathbf{Y}^{T}\left(L\left(G_{N N}\right)+L\left(G_{M S T}\right)\right) \mathbf{Y}\right) \\ &= \operatorname{tr}\left(\mathbf{Y}^{T} L(J) \mathbf{Y}\right), \end{aligned}$$
where $\mathbf{y}^{(i)}=\left[y_{1}(i), \ldots, y_{m}(i)\right]^{T}$ and $m$ is the dimension of the embedding. $W_{i j}^{N N}$ and $W_{i j}^{M S T}$ are the weight matrices of the $k$-nearest-neighbor graph and the MST graph, respectively. In other words, we have
$$\operatorname{argmin}_{\mathbf{Y}^{T} \mathbf{Y}=\mathbf{I}} \mathbf{Y}^{T} L \mathbf{Y},$$ such that $\mathbf{Y}=\left[\mathbf{y}_{1}, \mathbf{y}_{2}, \ldots, \mathbf{y}_{m}\right]$ and $\mathbf{y}^{(i)}$ is the $m$-dimensional representation of the $i^{\text {th }}$ vertex. The solutions to this optimization problem are the eigenvectors of the generalized eigenvalue problem
$$L \mathbf{Y}=\Lambda D \mathbf{Y}$$
The GLEM algorithm is described in Algorithm 1.
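Since Algorithm 1 itself is not reproduced here, the sketch below only illustrates the computation described above: the Laplacians of a $k$-nearest-neighbor graph and of the Euclidean MST are added, and the generalized eigenproblem $L \mathbf{Y}=\Lambda D \mathbf{Y}$ is solved. The synthetic data, $k=8$, the unit edge weights, and the embedding dimension $m=2$ are assumptions made for the example.

```python
# Hedged sketch of the GLEM computation above: combine the kNN-graph and MST
# Laplacians, then solve the generalized eigenproblem L Y = lambda D Y.
# The data, k, unit edge weights, and m are illustrative assumptions.
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))          # stand-in for the data set
k, m = 8, 2                            # neighborhood size and embedding dimension

dist = squareform(pdist(X))

# W_NN: symmetric k-nearest-neighbor adjacency matrix (unit weights)
W_nn = np.zeros_like(dist)
for i in range(len(X)):
    for j in np.argsort(dist[i])[1:k + 1]:
        W_nn[i, j] = W_nn[j, i] = 1.0

# W_MST: adjacency matrix of the minimum spanning tree (the global information)
mst = minimum_spanning_tree(dist).toarray()
W_mst = ((mst + mst.T) > 0).astype(float)

W = W_nn + W_mst                       # combined local + global weights
D = np.diag(W.sum(axis=1))
L = D - W                              # equals L(G_NN) + L(G_MST)

vals, vecs = eigh(L, D)                # generalized eigenproblem L v = lambda D v
Y = vecs[:, 1:m + 1]                   # drop the trivial constant eigenvector
print(Y.shape)                         # (300, 2) embedding coordinates
```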

## 机器学习代写|流形学习代写manifold data learning代考|Diffusion Maps

## 机器学习代写|流形学习代写manifold data learning代考|Diffusion Maps

The basic idea of DIFFUSION MAPS (Nadler, Lafon, Coifman, and Kevrekidis, 2005; Coifman and Lafon, 2006) uses a Markov chain constructed over a graph of the data points, followed by an eigenanalysis of the probability transition matrix of the Markov chain. As with the other algorithms in this Section, there are three steps in this algorithm, with the first and second steps the same as for Laplacian eigenmaps. Although a nearest-neighbor search (Step 1) was not explicitly considered in the above papers on diffusion maps as a means of constructing the graph (Step 2), a nearest-neighbor search is included in software packages for computing diffusion maps. For an example in astronomy of a diffusion map incorporating a nearest-neighbor search, see Freeman, Newman, Lee, Richards, and Schafer (2009).

1. Nearest-neighbor search. Fix an integer $K$ or an $\epsilon>0$. Define a $K$-neighborhood $N_{i}^{K}$ or an $\epsilon$-neighborhood $N_{i}^{\epsilon}$ of the point $\mathbf{x}_{i}$ as in Step 1 of Laplacian eigenmaps. In general, let $N_{i}$ denote the neighborhood of $\mathbf{x}_{i}$.
2. Pairwise adjacency matrix. The $n$ data points $\left\{\mathbf{x}_{i}\right\}$ in $\Re^{r}$ can be regarded as a graph $\mathcal{G}=\mathcal{G}(\mathcal{V}, \mathcal{E})$ with the data points playing the role of vertices $\mathcal{V}=\left\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\right\}$, and the set of edges $\mathcal{E}$ given by the connection strengths (or weights), $w\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$, between pairs of adjacent vertices,
$$w_{i j}=w\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)= \begin{cases}\exp \left\{-\frac{\left|\mathbf{x}_{i}-\mathbf{x}_{j}\right|^{2}}{2 \sigma^{2}}\right\}, & \text { if } \mathbf{x}_{j} \in N_{i}, \\ 0, & \text { otherwise. }\end{cases}$$
This is a Gaussian kernel with width $\sigma$; however, other kernels may be used. Kernels such as (1.52) ensure that the closer two points are to each other, the larger the value of $w$. For convenience in exposition, we suppress the fact that the elements of most of the matrices depend upon the value of $\sigma$. Then, $\mathbf{W}=\left(w_{i j}\right)$ is a pairwise adjacency matrix between the $n$ points. To make the matrix $\mathbf{W}$ even more sparse, entries that are smaller than some given threshold (i.e., the points in question are far apart from each other) can be set to zero. The graph $\mathcal{G}$ with weight matrix $\mathbf{W}$ gives information on the local geometry of the data.
3. Spectral embedding. Define $\mathbf{D}=\left(d_{i j}\right)$ to be a diagonal matrix formed from the matrix $\mathbf{W}$ by setting the diagonal elements, $d_{i i}=\sum_{j} w_{i j}$, to be the column sums of $\mathbf{W}$ and the off-diagonal elements to be zero. The $(n \times n)$ symmetric matrix $\mathbf{L}=\mathbf{D}-\mathbf{W}$ is the graph Laplacian for the graph $\mathcal{G}$. We are interested in the solutions of the generalized eigenequation, $\mathbf{L v}=\lambda \mathbf{D v}$, or, equivalently, of the matrix
$$\mathbf{P}=\mathbf{D}^{-1 / 2} \mathbf{L} \mathbf{D}^{-1 / 2}=\mathbf{I}_{n}-\mathbf{D}^{-1 / 2} \mathbf{W} \mathbf{D}^{-1 / 2},$$
which is the normalized graph Laplacian. The matrix $\mathbf{H}=e^{-t \mathbf{P}}, t \geq 0$, is usually referred to as the heat kernel. By construction, the related matrix $\mathbf{D}^{-1} \mathbf{W}$ is a stochastic matrix with all row sums equal to one and, thus, can be interpreted as defining a random walk on the graph $\mathcal{G}$ (a numerical sketch of these three steps follows this list).
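The following minimal numerical sketch carries out the three steps above. The synthetic data, $K=10$, the kernel width $\sigma=1$, and the symmetrization of the adjacency matrix are illustrative assumptions.

```python
# Hedged sketch of the three diffusion-map steps above; the synthetic data,
# K = 10, sigma = 1, and the symmetrization of W are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))                  # stand-in for the input data
K, sigma = 10, 1.0

dist = squareform(pdist(X))

# Steps 1-2: K-nearest-neighbor graph with Gaussian (heat-kernel) weights
W = np.zeros_like(dist)
for i in range(len(X)):
    nbrs = np.argsort(dist[i])[1:K + 1]
    W[i, nbrs] = np.exp(-dist[i, nbrs] ** 2 / (2 * sigma ** 2))
W = np.maximum(W, W.T)                         # make the adjacency matrix symmetric

# Step 3: normalized graph Laplacian P = D^{-1/2} L D^{-1/2} and its spectrum
d = W.sum(axis=1)
L = np.diag(d) - W
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
P = D_inv_sqrt @ L @ D_inv_sqrt

vals, vecs = np.linalg.eigh(P)                 # eigenvalues in ascending order
Y = vecs[:, 1:3]                               # skip the trivial eigenvector; 2-D embedding
print(Y.shape)
```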

## 机器学习代写|流形学习代写manifold data learning代考|Hessian Eigenmaps

Recall that, in certain situations, the convexity assumption for Isomap may be too restrictive. Instead, we may require that the manifold $\mathcal{M}$ be locally isometric to an open, connected subset of $\Re^{t}$. Popular examples include families of “articulated” images (i.e., translated or rotated images of the same object, possibly through time) that are found in a high-dimensional, digitized-image library (e.g., faces, pictures, handwritten numbers or letters). However, if the pixel elements of each 64-pixel-by-64-pixel digitized image are represented as a 4,096-dimensional vector in “pixel space,” it would be very difficult to show that the images really live on a low-dimensional manifold, especially if that image manifold is unknown.

We can model such images using a vector of smoothly varying articulation parameters $\boldsymbol{\theta} \in \boldsymbol{\Theta}$. For example, digitized images of a person’s face that are varied by pose and illumination can be parameterized by two pose parameters (expression [happy, sad, sleepy, surprised, wink] and glasses-no glasses) and a lighting direction (centerlight, leftlight, rightlight, normal); similarly, handwritten “2”s appear to be parameterized essentially by two features, bottom loop and top arch (Tenenbaum, de Silva, and Langford, 2000; Roweis and Saul, 2000). To some extent, learning about an underlying image manifold depends upon whether the images are sufficiently scattered around the manifold and on the quality of the digitization of each image.

HESSIAN EIGENMAPS (Donoho and Grimes, 2003b) were proposed for recovering manifolds of high-dimensional libraries of articulated images where the convexity assumption is often violated. Let $\Theta \subset \Re^{t}$ be the parameter space and suppose that $\phi: \Theta \rightarrow \Re^{r}$, where $t<r$. Assume $\mathcal{M}=\phi(\Theta)$ is a smooth manifold of articulated images. The isometry and convexity requirements of Isomap are replaced by the following weaker requirements:

• Local Isometry: $\phi$ is a locally isometric embedding of $\Theta$ into $\Re^{r}$. For any point $\mathbf{x}^{\prime}$ in a sufficiently small neighborhood around each point $\mathbf{x}$ on the manifold $\mathcal{M}$, the geodesic distance equals the Euclidean distance between their corresponding parameter points $\boldsymbol{\theta}, \boldsymbol{\theta}^{\prime} \in \Theta$; that is,
$$d^{\mathcal{M}}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\left|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right|_{\Theta},$$
where $\mathbf{x}=\phi(\boldsymbol{\theta})$ and $\mathbf{x}^{\prime}=\phi\left(\boldsymbol{\theta}^{\prime}\right)$.
• Connectedness: The parameter space $\Theta$ is an open, connected subset of $\Re^{t}$.
The goal is to recover the parameter vector $\boldsymbol{\theta}$ (up to a rigid motion).

## 机器学习代写|流形学习代写manifold data learning代考|Nonlinear PCA

Another way of dealing with nonlinear manifold learning is to construct nonlinear versions of linear manifold learning techniques. We have already seen how Isomap provides a nonlinear generalization of MDS. How can we generalize PCA to the nonlinear case? In this Section, we briefly describe the basic ideas behind POLYNOMIAL PCA, PRINCIPAL CURVES AND SURFACES, MULTILAYER AUTOASSOCIATIVE NEURAL NETWORKS, and KERNEL PCA.
Polynomial PCA
There have been several different attempts to generalize PCA to data living on or near nonlinear manifolds of a lower-dimensional space than input space. The first such idea was to add to the set of $r$ input variables quadratic, cubic, or higher-degree polynomial transformations of those input variables, and then apply linear PCA. The result is POLYNOMIAL PCA (Gnanadesikan and Wilk, 1969), whose embedding coordinates are the eigenvectors corresponding to the smallest few eigenvalues of the expanded covariance matrix.

In the original study of polynomial PCA, the method was illustrated with a quadratic transformation of bivariate input variables. In this scenario, $\left(X_{1}, X_{2}\right)$ expands to become $\left(X_{1}, X_{2}, X_{1}^{2}, X_{2}^{2}, X_{1} X_{2}\right)$. This formulation is feasible, but for larger problems, the possibilities become more complicated. First, the variables in the expanded set will not be scaled in a uniform manner, so that standardization will be necessary, and second, the number of variables in the expanded set will increase rapidly with large $r$, which will lead to bigger computational problems. Gnanadesikan and Wilk’s article, however, gave rise to a variety of attempts to define a more general nonlinear version of PCA.
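The sketch below illustrates the quadratic expansion for bivariate inputs described above. The parabola-shaped synthetic data, the standardization step, and the use of scikit-learn's PolynomialFeatures and PCA are assumptions made for the example, not part of the original study.

```python
# Hedged sketch of polynomial PCA on bivariate inputs; the parabola-shaped data,
# the standardization step, and the scikit-learn pipeline are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, size=500)
x2 = x1 ** 2 + 0.05 * rng.normal(size=500)        # data near a quadratic manifold
X = np.column_stack([x1, x2])

# (X1, X2) -> (X1, X2, X1^2, X1*X2, X2^2), standardized, then linear PCA
poly_pca = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         StandardScaler(),
                         PCA())
poly_pca.fit(X)

# The smallest-variance component encodes the (near-exact) quadratic relation.
print(poly_pca.named_steps["pca"].explained_variance_)
```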
Principal Curves and Surfaces
The next attempt at creating a nonlinear PCA was PRINCIPAL CURVES AND SURFACES (Hastie, 1984; Hastie and Stuetzle, 1989). A principal curve is a smooth one-dimensional curve that passes through the “middle” of the data, and a principal surface (or principal manifold) is a generalization of a principal curve to a smooth two- or higher-dimensional manifold. So, we can visualize principal curves and surfaces as defining a nonlinear manifold in higher-dimensional input space.

Let $\mathbf{x} \in \Re^{r}$ be a data point and let $\mathbf{f}(\lambda)$ be a curve, $\lambda \in \Lambda$; see Section 1.2.4 for definitions. Project $\mathbf{x}$ to a point on $\mathbf{f}(\lambda)$ that is closest in Euclidean distance to $\mathbf{x}$. Define the projection index
$$\lambda_{\mathbf{f}}(\mathbf{x})=\sup _{\lambda}\left\{\lambda:\left|\mathbf{x}-\mathbf{f}(\lambda)\right|=\inf _{\mu}\left|\mathbf{x}-\mathbf{f}(\mu)\right|\right\}.$$
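As a small illustration of the projection index, the sketch below evaluates $\lambda_{\mathbf{f}}(\mathbf{x})$ on a discretized curve. The unit-circle curve, the parameter grid, and the test point are made-up examples, and the supremum in the definition is approximated over the grid.

```python
# Hedged sketch of the projection index on a discretized curve; the unit circle,
# the parameter grid, and the test point are made-up examples.
import numpy as np

lam = np.linspace(0.0, 2.0 * np.pi, 2001)                  # grid over the parameter interval
f = np.column_stack([np.cos(lam), np.sin(lam)])            # a circle used as the curve f(lambda)

def projection_index(x, lam, f):
    """Largest lambda on the grid attaining the minimum distance from x to the curve."""
    d = np.linalg.norm(f - x, axis=1)
    return lam[np.where(np.isclose(d, d.min()))[0].max()]

x = np.array([2.0, 0.5])
print(projection_index(x, lam, f))    # parameter value of the closest point on the curve
```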

## 机器学习代写|流形学习代写manifold data learning代考|Nonlinear Manifold Learning

## 机器学习代写|流形学习代写manifold data learning代考|Nonlinear Manifold Learning

We next discuss some algorithmic techniques that proved to be innovative in the study of nonlinear manifold learning: ISOMAP, LOCAL LINEAR EMBEDDING, LAPLACIAN EIGENMAPS, DIFFUSION MAPS, HESSIAN EIGENMAPS, and the many different versions of NONLINEAR PCA. The goal of each of these algorithms is to recover the full low-dimensional representation of an unknown nonlinear manifold, $\mathcal{M}$, embedded in some high-dimensional space, where it is important to retain the neighborhood structure of $\mathcal{M}$. When $\mathcal{M}$ is highly nonlinear, such as the S-shaped manifold in the left panel of Figure $1.1$, these algorithms outperform the usual linear techniques. The nonlinear manifold-learning methods emphasize simplicity and avoid optimization problems that could produce local minima.

Assume that we have a finite random sample of data points, $\left\{\mathbf{y}_{i}\right\}$, from a smooth $t$-dimensional manifold $\mathcal{M}$ with metric given by the geodesic distance $d^{\mathcal{M}}$; see Section 1.2.4. These points are then nonlinearly embedded by a smooth map $\psi$ into high-dimensional input space $\mathcal{X}=\Re^{r}$ $(t \ll r)$ with Euclidean metric $\left|\cdot\right|_{\mathcal{X}}$. This embedding provides us with the input data $\left\{\mathbf{x}_{i}\right\}$. For example, in the right panel of Figure $1.1$, we randomly generated 20,000 three-dimensional points to lie uniformly on the surface of the two-dimensional S-shaped curve displayed in the left panel. Thus, $\psi: \mathcal{M} \rightarrow \mathcal{X}$ is the embedding map, and a point on the manifold, $\mathbf{y} \in \mathcal{M}$, can be expressed as $\mathbf{y}=\phi(\mathbf{x})$, $\mathbf{x} \in \mathcal{X}$, where $\phi=\psi^{-1}$. The goal is to recover $\mathcal{M}$ and find an implicit representation of the map $\psi$ (and, hence, recover the $\left\{\mathbf{y}_{i}\right\}$), given only the input data points $\left\{\mathbf{x}_{i}\right\}$ in $\mathcal{X}$.

Each algorithm computes $t^{\prime}$-dimensional estimates, $\left\{\widehat{\mathbf{y}}_{i}\right\}$, of the $t$-dimensional manifold data, $\left\{\mathbf{y}_{i}\right\}$, for some $t^{\prime}$. Such a reconstruction is deemed to be successful if $t^{\prime}=t$, the true (unknown) dimensionality of $\mathcal{M}$. In practice, $t^{\prime}$ will most likely be too large. Because we require a low-dimensional solution, we retain only the first two or three of the coordinate vectors and plot the corresponding elements of those vectors against each other to yield $n$ points in two- or three-dimensional space. For all practical purposes, such a display is usually sufficient to identify the underlying manifold.

Most of the nonlinear manifold-learning algorithms that we discuss here are based upon different philosophies regarding how one should recover unknown nonlinear manifolds. However, they each consist of a three-step approach (except NONLINEAR PCA). The first and third steps are common to all algorithms: the first step incorporates neighborhood information at each data point to construct a weighted graph having the data points as vertices, and the third step is a spectral embedding step that involves an $(n \times n)$-eigenequation computation. The second step is specific to the algorithm, taking the weighted neighborhood graph and transforming it into suitable input for the spectral embedding step.

## 机器学习代写|流形学习代写manifold data learning代考|Isomap

The isometric feature mapping (or Isomap) algorithm (Tenenbaum, de Silva, and Langford, 2000) assumes that the smooth manifold $\mathcal{M}$ is a convex region of $\Re^{t}$ $(t \ll r)$ and that the embedding $\psi: \mathcal{M} \rightarrow \mathcal{X}$ is an isometry. This assumption has two key ingredients:

• Isometry: The geodesic distance is invariant under the map $\psi$. For any pair of points on the manifold, $\mathbf{y}, \mathbf{y}^{\prime} \in \mathcal{M}$, the geodesic distance between those points equals the Euclidean distance between their corresponding coordinates, $\mathbf{x}, \mathbf{x}^{\prime} \in \mathcal{X}$; i.e.,
$$d^{\mathcal{M}}\left(\mathbf{y}, \mathbf{y}^{\prime}\right)=\left|\mathbf{x}-\mathbf{x}^{\prime}\right|_{\mathcal{X}}$$
where $\mathbf{y}=\phi(\mathbf{x})$ and $\mathbf{y}^{\prime}=\phi\left(\mathbf{x}^{\prime}\right)$.
• Convexity: The manifold $\mathcal{M}$ is a convex subset of $\Re^{t}$.
Isomap considers $\mathcal{M}$ to be a convex region possibly distorted in any of a number of ways (e.g., by folding or twisting). The so-called Swiss roll, ${ }^{2}$ which is a flat two-dimensional rectangular submanifold of $\Re^{3}$, is one such example; see Figure $1.2$. Empirical studies show that Isomap works well for intrinsically flat submanifolds of $\mathcal{X}=\Re^{r}$ that look like rolled-up sheets of paper or “open” manifolds such as an open box or open cylinder. However, Isomap does not perform well if there are any holes in the roll, because this would violate the convexity assumption. The isometry assumption appears to be reasonable for certain types of situations, but, in many other instances, the convexity assumption may be too restrictive (Donoho and Grimes, 2003b).

Isomap uses the isometry and convexity assumptions to form a nonlinear generalization of multidimensional scaling (MDS). Recall that MDS looks for a low-dimensional subspace in which to embed input data while preserving the Euclidean interpoint distances (see Section 1.3.2). Unfortunately, working with Euclidean distances in MDS when dealing with curved regions tends to give poor results. Isomap follows the general MDS philosophy by attempting to preserve the global geometric properties of the underlying nonlinear manifold, and it does this by approximating all pairwise geodesic distances (i.e., lengths of the shortest paths between two points) on the manifold. In this sense, Isomap provides a global approach to manifold learning.

## 机器学习代写|流形学习代写manifold data learning代考|Laplacian Eigenmaps

The Laplacian eigenmap algorithm (Belkin and Niyogi, 2002) also consists of three steps. The first and third steps of the Laplacian eigenmap algorithm are very similar to the first and third steps, respectively, of the LLE algorithm.

1. Nearest-neighbor search. Fix an integer $K$ or an $\epsilon>0$. The neighborhoods of each data point are symmetrically defined: for a $K$-neighborhood $N_{i}^{K}$ of the point $\mathbf{x}_{i}$, let $\mathbf{x}_{j} \in N_{i}^{K}$ iff $\mathbf{x}_{i} \in N_{j}^{K}$; similarly, for an $\epsilon$-neighborhood $N_{i}^{\epsilon}$, let $\mathbf{x}_{j} \in N_{i}^{\epsilon}$ iff $\left|\mathbf{x}_{i}-\mathbf{x}_{j}\right|<\epsilon$, where the norm is the Euclidean norm. In general, let $N_{i}$ denote the neighborhood of $\mathbf{x}_{i}$.
2. Weighted adjacency matrix. Let $\mathbf{W}=\left(w_{i j}\right)$ be a symmetric $(n \times n)$ weighted adjacency matrix defined as follows:
$$w_{i j}=w\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)= \begin{cases}\exp \left\{-\frac{\left|\mathbf{x}_{i}-\mathbf{x}_{j}\right|^{2}}{2 \sigma^{2}}\right\}, & \text { if } \mathbf{x}_{j} \in N_{i}, \\ 0, & \text { otherwise. }\end{cases}$$

These weights are determined by the isotropic Gaussian kernel (also known as the heat kernel), with scale parameter $\sigma$. Denote the resulting weighted graph by $\mathcal{G}$. If $\mathcal{G}$ is not connected, apply step 3 to each connected subgraph.

3. Spectral embedding. Let $\mathbf{D}=\left(d_{i j}\right)$ be an $(n \times n)$ diagonal matrix with diagonal elements $d_{i i}=\sum_{j \in N_{i}} w_{i j}=\left(\mathbf{W} \mathbf{1}_{n}\right)_{i}, i=1,2, \ldots, n$. The $(n \times n)$ symmetric matrix $\mathbf{L}=\mathbf{D}-\mathbf{W}$ is known as the graph Laplacian for the graph $\mathcal{G}$. Let $\mathbf{y}=\left(y_{i}\right)$ be an $n$-vector. Then, $\mathbf{y}^{\tau} \mathbf{L} \mathbf{y}=\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i j}\left(y_{i}-y_{j}\right)^{2}$, so that $\mathbf{L}$ is nonnegative definite.

When data are uniformly sampled from a low-dimensional manifold $\mathcal{M}$ of $\Re^{r}$, the graph Laplacian $\mathbf{L}=\mathbf{L}_{n, \sigma}$ (considered as a function of $n$ and $\sigma$) can be regarded as a discrete approximation to the continuous Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ defined on the manifold $\mathcal{M}$, and converges to $\Delta_{\mathcal{M}}$ as $\sigma \rightarrow 0$ and $n \rightarrow \infty$. Furthermore, when the data are sampled from an arbitrary probability distribution $P$ on the manifold $\mathcal{M}$, then, under certain conditions on $\mathcal{M}$ and $P$, the graph Laplacian converges to a weighted version of $\Delta_{\mathcal{M}}$ (Belkin and Niyogi, 2008).

The $(t \times n)$-matrix $\mathbf{Y}=\left(\mathbf{y}_{1}, \cdots, \mathbf{y}_{n}\right)$, which is used to embed the graph $\mathcal{G}$ into the low-dimensional space $\Re^{t}$, where $\mathbf{y}_{i}$ yields the embedding coordinates of the $i$th point, is determined by minimizing the objective function $$\sum_{i} \sum_{j} w_{i j}\left|\mathbf{y}_{i}-\mathbf{y}_{j}\right|^{2}=\operatorname{tr}\left\{\mathbf{Y} \mathbf{L} \mathbf{Y}^{\tau}\right\}.$$
In other words, we seek the solution,
$$\widehat{\mathbf{Y}}=\arg \min _{\mathbf{Y}: \mathbf{Y} \mathbf{D} \mathbf{Y}^{\tau}=\mathbf{I}_{t}} \operatorname{tr}\left\{\mathbf{Y} \mathbf{L} \mathbf{Y}^{\tau}\right\},$$
where we restrict $\mathbf{Y}$ such that $\mathbf{Y} \mathbf{D} \mathbf{Y}^{\tau}=\mathbf{I}_{t}$ to prevent a collapse onto a subspace of fewer than $t-1$ dimensions. The solution is given by the generalized eigenequation, $\mathbf{L v}=\lambda \mathbf{D v}$, or, equivalently, by finding the eigenvalues and eigenvectors of the matrix $\widehat{\mathbf{W}}=\mathbf{D}^{-1 / 2} \mathbf{W} \mathbf{D}^{-1 / 2}$. The smallest eigenvalue, $\lambda_{n}$, of $\widehat{\mathbf{W}}$ is zero. If we ignore the smallest eigenvalue (and its corresponding constant eigenvector $\mathbf{v}_{n}=\mathbf{1}_{n}$), then the best embedding solution in $\Re^{t}$ is similar to that given by LLE; that is, the rows of $\widehat{\mathbf{Y}}$ are the eigenvectors,
$$\widehat{\mathbf{Y}}=\left(\widehat{\mathbf{y}}_{1}, \cdots, \widehat{\mathbf{y}}_{n}\right)=\left(\mathbf{v}_{n-1}, \cdots, \mathbf{v}_{n-t}\right)^{\tau},$$
corresponding to the next $t$ smallest eigenvalues, $\lambda_{n-1} \leq \cdots \leq \lambda_{n-t}$, of $\widehat{\mathbf{W}}$.
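A minimal sketch of the Laplacian-eigenmap steps above is given below. The Swiss-roll data, the $\epsilon$-neighborhood with $\epsilon=4$, the heat-kernel width $\sigma=2$, and the embedding dimension $t=2$ are illustrative assumptions.

```python
# Hedged sketch of the Laplacian-eigenmap steps above; the Swiss-roll data,
# epsilon = 4, sigma = 2, and t = 2 are illustrative assumptions.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_swiss_roll

X, _ = make_swiss_roll(n_samples=600, random_state=0)
eps, sigma, t = 4.0, 2.0, 2

# Steps 1-2: epsilon-neighborhoods with heat-kernel weights (symmetric by construction)
dist = squareform(pdist(X))
W = np.where(dist < eps, np.exp(-dist ** 2 / (2 * sigma ** 2)), 0.0)
np.fill_diagonal(W, 0.0)

# Step 3: graph Laplacian and the generalized eigenproblem L v = lambda D v
D = np.diag(W.sum(axis=1))
L = D - W
vals, vecs = eigh(L, D)

Y = vecs[:, 1:t + 1].T          # drop the constant eigenvector; rows give the coordinates
print(Y.shape)                  # (t, n), matching the (t x n) matrix Y in the text
```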
