
## Intuition and Main Results

Consider first the training error $E_{\text {train }}$ defined in (5.3). Since
$$\operatorname{tr} \mathbf{Y} \mathbf{Q}^2(\gamma) \mathbf{Y}^{\boldsymbol{\top}}=-\frac{\partial}{\partial \gamma} \operatorname{tr} \mathbf{Y} \mathbf{Q}(\gamma) \mathbf{Y}^{\top},$$
a deterministic equivalent for the resolvent $\mathbf{Q}(\gamma)$ is sufficient to access the asymptotic behavior of $E_{\text {train }}$.
With a linear activation $\sigma(t)=t$, the resolvent of interest
$$\mathbf{Q}(\gamma)=\left(\frac{1}{n} \sigma(\mathbf{W X})^{\top} \sigma(\mathbf{W} \mathbf{X})+\gamma \mathbf{I}_n\right)^{-1}$$ is the same as in Theorem 2.6. In a sense, the evaluation of $\mathbf{Q}(\gamma)$ (and subsequently of $E_{\text {train }}$) calls for an extension of Theorem 2.6 to handle the case of nonlinear activations. Recall now that the main ingredients used to derive a deterministic equivalent for (the linear case) $\mathbf{Q}=\left(\mathbf{X}^{\top} \mathbf{W}^{\top} \mathbf{W} \mathbf{X} / n+\gamma \mathbf{I}_n\right)^{-1}$ are that (i) $\mathbf{X}^{\top} \mathbf{W}^{\top}$ has i.i.d. columns and (ii) its $i$th column $\left[\mathbf{X}^{\top} \mathbf{W}^{\top}\right]_{\cdot i}$ has i.i.d. (or linearly dependent) entries, so that the key Lemma 2.11 applies. These hold, in the linear case, due to the i.i.d. property of the entries of $\mathbf{W}$. However, while for Item (i) the nonlinear $\Sigma^{\top}=\sigma(\mathbf{W X})^{\top}$ still has i.i.d. columns, for Item (ii) its $i$th column $\sigma\left(\left[\mathbf{X}^{\top} \mathbf{W}^{\top}\right]_{\cdot i}\right)$ no longer has i.i.d. or linearly dependent entries. Therefore, the main technical difficulty here is to obtain a nonlinear version of the trace lemma, Lemma 2.11. That is, we expect that the concentration of quadratic forms around their expectation remains valid despite the application of the entry-wise nonlinearity $\sigma$. This naturally falls within the concentration of measure framework discussed in Section 2.7 and is given by the following lemma.

Lemma 5.1 (Concentration of nonlinear quadratic form, Louart et al. [2018, Lemma 1]). For $\mathbf{w} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_p\right)$, 1-Lipschitz $\sigma(\cdot)$, and $\mathbf{A} \in \mathbb{R}^{n \times n}, \mathbf{X} \in \mathbb{R}^{p \times n}$ such that $\|\mathbf{A}\| \leq 1$ and $\|\mathbf{X}\|$ is bounded with respect to $p, n$, we have
$$\mathbb{P}\left(\left|\frac{1}{n} \sigma\left(\mathbf{w}^{\top} \mathbf{X}\right) \mathbf{A} \sigma\left(\mathbf{X}^{\top} \mathbf{w}\right)-\frac{1}{n} \operatorname{tr} \mathbf{A} \mathbf{K}\right|>t\right) \leq C e^{-c n \min \left(t, t^2\right)}$$ for some $C, c>0$, $p / n \in(0, \infty)$, with ${ }^2$
$$\mathbf{K} \equiv \mathbf{K}_{\mathbf{X X}} \equiv \mathbb{E}_{\mathbf{w} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_p\right)}\left[\sigma\left(\mathbf{X}^{\top} \mathbf{w}\right) \sigma\left(\mathbf{w}^{\top} \mathbf{X}\right)\right] \in \mathbb{R}^{n \times n} .$$
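As a quick numerical illustration of this concentration (not part of the original text; the dimensions, the choice $\sigma=\tanh$, and $\mathbf{A}=\mathbf{I}_n$ are arbitrary), the following minimal NumPy sketch draws many independent $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_p)$ and compares the fluctuations of $\frac{1}{n} \sigma(\mathbf{w}^{\top} \mathbf{X}) \mathbf{A} \sigma(\mathbf{X}^{\top} \mathbf{w})$ with the deterministic value $\frac{1}{n} \operatorname{tr} \mathbf{A K}$, here estimated by a large Monte Carlo average since $\mathbf{K}$ has no simple closed form for $\sigma=\tanh$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 600
X = rng.standard_normal((p, n)) / np.sqrt(p)    # ||X|| remains bounded as p, n grow
A = np.eye(n)                                   # any matrix with ||A|| <= 1
sigma = np.tanh                                 # a 1-Lipschitz activation

# Monte Carlo estimate of the deterministic value (1/n) tr(A K),
# with K = E_w[ sigma(X^T w) sigma(w^T X) ]
S = sigma(rng.standard_normal((20000, p)) @ X)          # rows are sigma(w^T X)
trAK_over_n = np.mean(np.einsum('ij,ij->i', S @ A, S)) / n

# fluctuations of the quadratic form (1/n) sigma(w^T X) A sigma(X^T w)
SW = sigma(rng.standard_normal((2000, p)) @ X)
Q = np.einsum('ij,ij->i', SW @ A, SW) / n
print("deterministic value:", trAK_over_n)
print("quadratic form mean:", Q.mean(), " std:", Q.std(), "(small, as Lemma 5.1 predicts)")
```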

## Consequences for Learning with Large Neural Networks

To validate the asymptotic analysis in Theorem 5.1 and Corollary 5.1 on real-world data, Figures 5.2 and 5.3 compare the empirical MSEs with their limiting behavior predicted in Corollary 5.1, for a random network of $N=512$ neurons and various types of Lipschitz and non-Lipschitz activations $\sigma(\cdot)$, respectively. The regressor $\boldsymbol{\beta} \in \mathbb{R}^p$ maps the vectorized images from the Fashion-MNIST dataset (classes 1 and 2) [Xiao et al., 2017] to their corresponding uni-dimensional ($d=1$) output labels $\mathbf{Y}_{1 i}, \hat{\mathbf{Y}}_{1 j} \in\{\pm 1\}$. For $n, p, N$ of order a few hundreds (so not very large when compared to typical modern neural network dimensions), a close match between theory and practice is observed for the Lipschitz activations in Figure 5.2. The agreement is less precise but still quite good for the case of non-Lipschitz activations in Figure 5.3, which, we recall, are formally not supported by the theorem statement – here for $\sigma(t)=1-t^2 / 2$, $\sigma(t)=1_{t>0}$, and $\sigma(t)=\operatorname{sign}(t)$. For all activations, the deviation from theory is more acute for small values of the regularization parameter $\gamma$.

Figures $5.2$ and $5.3$ confirm that while the training error is a monotonically increasing function of the regularization parameter $\gamma$, there always exists an optimal value for $\gamma$ which minimizes the test error. In particular, the theoretical formulas derived in Corollary $5.1$ allow for a (data-dependent) fast offline tuning of the hyperparameter $\gamma$ of the network, in the setting where $n, p, N$ are not too small and comparable. In terms of activation functions (those listed here), we observe that, on the Fashion-MNIST dataset, the ReLU nonlinearity $\sigma(t)=\max (t, 0)$ is optimal and achieves the minimum test error, while the quadratic activation $\sigma(t)=1-t^2 / 2$ is the worst and produces much higher training and test errors compared to others. This observation will be theoretically explained through a deeper analysis of the corresponding kernel matrix $\mathbf{K}$, as performed in Section 5.1.2. Lastly, although not immediate at first sight, the training and test error curves of $\sigma(t)=1_{t>0}$ and $\sigma(t)=\operatorname{sign}(t)$ are indeed the same, up to a shift in $\gamma$, as a consequence of the fact that $\operatorname{sign}(t)=2 \cdot 1_{t>0}-1$.
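The qualitative behavior described above (training error increasing with $\gamma$, test error minimized at an intermediate $\gamma$) is easy to reproduce. The sketch below is our own toy illustration on a synthetic two-class Gaussian mixture rather than Fashion-MNIST, with a ReLU random-feature network whose output weights are given by the ridge-regressor form $\boldsymbol{\beta}=\frac1n\Sigma(\frac1n\Sigma^{\top}\Sigma+\gamma\mathbf{I}_n)^{-1}\mathbf{Y}^{\top}$ recalled later in this document; all dimensions and the $\gamma$ grid are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, n_test, N = 64, 512, 512, 256
mu = np.zeros(p); mu[0] = 2.0                       # class means +-mu

def sample(m):
    y = rng.choice([-1.0, 1.0], size=m)             # +-1 labels
    return rng.standard_normal((p, m)) + np.outer(mu, y), y

X, y = sample(n)
Xt, yt = sample(n_test)
W = rng.standard_normal((N, p)) / np.sqrt(p)        # fixed random first-layer weights
S, St = np.maximum(W @ X, 0), np.maximum(W @ Xt, 0) # ReLU random features

for gamma in [1e-4, 1e-2, 1e-1, 1.0, 10.0]:
    # ridge regression on the random features: beta = S (S^T S/n + gamma I_n)^{-1} y / n
    beta = S @ np.linalg.solve(S.T @ S / n + gamma * np.eye(n), y) / n
    print(f"gamma={gamma:8.4f}  E_train={np.mean((y - beta @ S)**2):.4f}"
          f"  E_test={np.mean((yt - beta @ St)**2):.4f}")
```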


## Random Neural Networks

Although much less popular than modern deep neural networks, neural networks with random fixed weights are simpler to analyze. Such networks have frequently arisen in the past decades as an appropriate solution to handle the possibly restricted number of training data, to reduce the computational and memory complexity, and, from another viewpoint, they can be seen as efficient random feature extractors. These neural networks in fact find their roots in Rosenblatt’s perceptron [Rosenblatt, 1958] and have then been many times revisited, rediscovered, and analyzed in a number of works, both in their feedforward [Schmidt et al., 1992] and recurrent [Gelenbe, 1993] versions. The simplest modern versions of these random networks are the so-called extreme learning machine [Huang et al., 2012] for the feedforward case, which one may see as a mere linear regression method on nonlinear random features, and the echo state network [Jaeger, 2001] for the recurrent case. Also see Scardapane and Wang [2017] for a more exhaustive overview of randomness in neural networks.

It is also to be noted that deep neural networks are initialized at random and that random operations (such as random node deletions or voluntarily not learning a large proportion of randomly initialized neural network weights, that is, random dropout) are common and efficient in neural network learning [Srivastava et al., 2014, Frankle and Carbin, 2019]. We may also point to the recent endeavors toward neural network “learning without backpropagation,” which, inspired by biological neural networks (which naturally do not operate backpropagation learning), propose learning mechanisms with fixed random backward weights and asymmetric forward learning procedures [Lillicrap et al., 2016, Nøkland, 2016, Baldi et al., 2018, Frenkel et al., 2019, Han et al., 2019]. As such, the study of random neural network structures may be instrumental to future improved understanding and designs of advanced neural network structures.

As shall be seen subsequently, the simple models of random neural networks are to a large extent connected to kernel matrices. More specifically, the classification or regression performance at the output of these random neural networks is a functional of random matrices that fall into the wide class of kernel random matrices, yet of a slightly different form than those studied in Section 4. Perhaps more surprisingly, this connection still exists for deep neural networks which are (i) randomly initialized and (ii) then trained with gradient descent, via the so-called neural tangent kernel [Jacot et al., 2018], by considering the “infinitely many neurons” limit, that is, the limit where the network widths of all layers go to infinity simultaneously. This close connection between neural networks and kernels has triggered a renewed interest in the theoretical investigation of deep neural networks from various perspectives, including optimization [Du et al., 2019, Chizat et al., 2019], generalization [Allen-Zhu et al., 2019, Arora et al., 2019a, Bietti and Mairal, 2019], and learning dynamics [Lee et al., 2020, Advani et al., 2020, Liao and Couillet, 2018a]. These works shed new light on our theoretical understanding of deep neural network models and specifically demonstrate the significance of studying simple networks with random weights and their associated kernels to assess the intrinsic mechanisms of more elaborate and practical deep networks.

## Regression with Random Neural Networks

Throughout this section, we consider a feedforward single-hidden-layer neural network, as illustrated in Figure $5.1$ (displayed, for notational convenience, from right to left). A similar class of single-hidden-layer neural network models, however with a recurrent structure, will be discussed later in Section 5.3.

Given input data $\mathbf{X}=\left[\mathbf{x}_1, \ldots, \mathbf{x}_n\right] \in \mathbb{R}^{p \times n}$, we denote $\Sigma \equiv \sigma(\mathbf{W} \mathbf{X}) \in \mathbb{R}^{N \times n}$ the output of the first layer comprising $N$ neurons. This output arises from the premultiplication of $\mathbf{X}$ by some random weight matrix $\mathbf{W} \in \mathbb{R}^{N \times p}$ with i.i.d. (say standard Gaussian) entries and the entry-wise application of the nonlinear activation function $\sigma: \mathbb{R} \rightarrow \mathbb{R}$. As such, the columns $\sigma\left(\mathbf{W x}_i\right)$ of $\Sigma$ can be seen as random nonlinear features of $\mathbf{x}_i$. The second-layer weight $\boldsymbol{\beta} \in \mathbb{R}^{N \times d}$ is then learned to adapt the feature matrix $\Sigma$ to some associated target $\mathbf{Y}=\left[\mathbf{y}_1, \ldots, \mathbf{y}_n\right] \in \mathbb{R}^{d \times n}$, for instance, by minimizing the Frobenius norm $\left\|\mathbf{Y}-\boldsymbol{\beta}^{\top} \Sigma\right\|_F^2$.

Remark 5.1 (Random neural networks, random feature maps and random kernels). The columns of $\Sigma$ may be seen as the output of the $\mathbb{R}^p \rightarrow \mathbb{R}^N$ random feature map $\phi: \mathbf{x}_i \mapsto \sigma\left(\mathbf{W} \mathbf{x}_i\right)$ for some given $\mathbf{W} \in \mathbb{R}^{N \times p}$. In Rahimi and Recht [2008], it is shown that, for every nonnegative definite “shift-invariant” kernel of the form $(\mathbf{x}, \mathbf{y}) \mapsto f\left(\|\mathbf{x}-\mathbf{y}\|^2\right)$, there exist appropriate choices for $\sigma$ and the law of the entries of $\mathbf{W}$ so that, as the number of neurons or random features $N \rightarrow \infty$, $$\sigma\left(\mathbf{W} \mathbf{x}_i\right)^{\top} \sigma\left(\mathbf{W} \mathbf{x}_j\right) \stackrel{\text { a.s. }}{\longrightarrow} f\left(\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right) .$$ As such, for large enough $N$ (that in general must scale with $n, p$), the bivariate function $(\mathbf{x}, \mathbf{y}) \mapsto \sigma(\mathbf{W} \mathbf{x})^{\top} \sigma(\mathbf{W y})$ approximates a kernel function of the type $f\left(\|\mathbf{x}-\mathbf{y}\|^2\right)$ studied in Chapter 4. This result is then generalized, in subsequent works, to a larger family of kernels including inner-product kernels [Kar and Karnick, 2012], additive homogeneous kernels [Vedaldi and Zisserman, 2012], etc. Another, possibly more marginal, connection with the previous sections is that $\sigma\left(\mathbf{w}^{\top} \mathbf{x}\right)$ can be interpreted as a “properly scaling” inner-product kernel function applied to the “data” pair $\mathbf{w}, \mathbf{x} \in \mathbb{R}^p$. This technically induces another strong relation between the study of kernels and that of neural networks. Again, similar to the concentration of (Euclidean) distance extensively explored in this chapter, the entry-wise convergence in (5.1) does not imply convergence in the operator norm sense, which, as we shall see, leads directly to the so-called “double descent” test curve in random feature/neural network models. If the network output weight matrix $\boldsymbol{\beta}$ is designed to minimize the regularized MSE $L(\boldsymbol{\beta})=\frac{1}{n} \sum_{i=1}^n\left\|\mathbf{y}_i-\boldsymbol{\beta}^{\top} \sigma\left(\mathbf{W x}_i\right)\right\|^2+\gamma\|\boldsymbol{\beta}\|_F^2$, for some regularization parameter $\gamma>0$, then $\boldsymbol{\beta}$ takes the explicit form of a ridge-regressor ${ }^1$
$$\beta \equiv \frac{1}{n} \Sigma\left(\frac{1}{n} \Sigma^{\top} \Sigma+\gamma \mathbf{I}_n\right)^{-1} \mathbf{Y}^{\top},$$
which follows from differentiating $L(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ to obtain $0=\gamma \boldsymbol{\beta}+$ $\frac{1}{n} \Sigma\left(\Sigma^{\top} \boldsymbol{\beta}-\mathbf{Y}^{\top}\right)$ so that $\left(\frac{1}{n} \Sigma \Sigma^{\top}+\gamma \mathbf{I}_N\right) \boldsymbol{\beta}=\frac{1}{n} \Sigma \mathbf{Y}^{\top}$ which, along with $\left(\frac{1}{n} \Sigma \Sigma^{\top}+\right.$ $\left.\gamma \mathbf{I}_N\right)^{-1} \Sigma=\Sigma\left(\frac{1}{n} \Sigma^{\top} \Sigma+\gamma \mathbf{I}_n\right)^{-1}$ for $\gamma>0$, gives the result.
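The identity $\left(\frac{1}{n} \Sigma \Sigma^{\top}+\gamma \mathbf{I}_N\right)^{-1} \Sigma=\Sigma\left(\frac{1}{n} \Sigma^{\top} \Sigma+\gamma \mathbf{I}_n\right)^{-1}$ invoked in this last step is easy to check numerically. Below is a minimal sketch (ours, with arbitrary small dimensions) verifying that the two resulting expressions for $\boldsymbol{\beta}$ coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, d, gamma = 50, 80, 3, 0.1
Sigma = rng.standard_normal((N, n))     # stand-in for the random feature matrix sigma(WX)
Y = rng.standard_normal((d, n))         # targets

# beta from the "N x N" form: (Sigma Sigma^T/n + gamma I_N)^{-1} Sigma Y^T / n
beta1 = np.linalg.solve(Sigma @ Sigma.T / n + gamma * np.eye(N), Sigma @ Y.T) / n
# beta from the "n x n" form used in the text: Sigma (Sigma^T Sigma/n + gamma I_n)^{-1} Y^T / n
beta2 = Sigma @ np.linalg.solve(Sigma.T @ Sigma / n + gamma * np.eye(n), Y.T) / n

print(np.allclose(beta1, beta2))        # True: the two ridge-regressor forms agree
```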


## Concluding Remarks

Before the present chapter, the first part of the book was mostly concerned with the sample covariance matrix model $\mathbf{X} \mathbf{X}^{\top} / n$ (and more marginally with the Wigner model $\mathbf{X} / \sqrt{n}$ for symmetric $\mathbf{X}$), where the columns of $\mathbf{X}$ are independent and the entries of each column are independent or linearly dependent. Historically, this model and its numerous variations (with a variance profile, with right-side correlation, summed up with other independent matrices of the same form, etc.) have covered most of the mathematical and applied interest of the first two decades (since the early nineties) of intense random matrix advances. The main drivers for these early developments were statistics, signal processing, and wireless communications. The present chapter leaped much further in considering random matrix models with possibly highly correlated entries, with a specific focus on kernel matrices. When (moderately) large-dimensional data are considered, the intuition and theoretical understanding of kernel matrices in the small-dimensional setting are no longer accurate, and random matrix theory provides an accurate (and asymptotically exact) performance assessment along with the possibility to largely improve the performance of kernel-based machine learning methods. This, in effect, creates a small revolution in our understanding of machine learning on realistic large datasets.

A first important finding of the analysis of large-dimensional kernel statistics reported here is the ubiquitous character of the Marčenko-Pastur and the semicircular laws. As a matter of fact, all random matrix models studied in this chapter, and in particular the kernel regimes $f\left(\mathbf{x}_i^{\top} \mathbf{x}_j / p\right)$ (which concentrate around $f(0)$) and $f\left(\mathbf{x}_i^{\top} \mathbf{x}_j / \sqrt{p}\right)$ (which tends to $f(\mathcal{N}(0,1))$), have a limiting eigenvalue distribution akin to a combination of the two laws. This combination may vary from case to case (compare for instance the results of Practical Lecture 3 to Theorem 4.4), but is often parametrized in such a way that the Marčenko-Pastur and semicircle laws appear as limiting cases (in the context of Practical Lecture 3, they correspond to the limiting cases of dense versus sparse kernels, and in Theorem 4.4 to the limiting cases of linear versus “purely” nonlinear kernels).

## Practical Course Material

In this section, we discuss Practical Lecture 3 (which evaluates the spectral behavior of uniformly sparsified kernels), related to the present Chapter 4, where we shall see, as for the $\alpha-\beta$ and properly scaling kernels of Sections 4.2.4 and 4.3, that, depending on the “level of sparsity,” a combination of Marčenko-Pastur and semicircle laws is observed.
Practical Lecture Material 3 (Complexity-performance trade-off in spectral clustering with sparse kernel, Zarrouk et al. [2020]). In this exercise, we study the spectrum of a “punctured” version $\mathbf{K}=\mathbf{B} \odot\left(\mathbf{X}^{\top} \mathbf{X} / p\right)$ (with the Hadamard product $[\mathbf{A} \odot \mathbf{B}]_{i j}=[\mathbf{A}]_{i j}[\mathbf{B}]_{i j}$) of the linear kernel $\mathbf{X}^{\top} \mathbf{X} / p$, with data matrix $\mathbf{X} \in \mathbb{R}^{p \times n}$ and a symmetric random mask-matrix $\mathbf{B} \in\{0,1\}^{n \times n}$ having independent $[\mathbf{B}]_{i j} \sim \operatorname{Bern}(\epsilon)$ entries for $i \neq j$ (up to symmetry) and $[\mathbf{B}]_{i i}=b \in\{0,1\}$ fixed, in the limit $p, n \rightarrow \infty$ with $p / n \rightarrow c \in(0, \infty)$. This matrix mimics the computation of only a proportion $\epsilon \in(0,1)$ of the entries of $\mathbf{X}^{\top} \mathbf{X} / p$, and its impact on spectral clustering. Letting $\mathbf{X}=\left[\mathbf{x}_1, \ldots, \mathbf{x}_n\right]$ with $\mathbf{x}_i$ independently and uniformly drawn from the following symmetric two-class Gaussian mixture
$$\mathcal{C}_1: \mathbf{x}_i \sim \mathcal{N}\left(-\boldsymbol{\mu}, \mathbf{I}_p\right), \quad \mathcal{C}_2: \mathbf{x}_i \sim \mathcal{N}\left(+\boldsymbol{\mu}, \mathbf{I}_p\right)$$
for $\boldsymbol{\mu} \in \mathbb{R}^p$ such that $\|\boldsymbol{\mu}\|=O(1)$ with respect to $n, p$, we wish to study the effect of a uniform “zeroing out” of the entries of $\mathbf{X}^{\top} \mathbf{X}$ on the presence of an isolated spike in the spectrum of $\mathbf{K}$, and thus on the spectral clustering performance.

We will study the spectrum of $\mathbf{K}$ using Stein’s lemma and the Gaussian method discussed in Section 2.2.2. Let $\mathbf{Z}=\left[\mathbf{z}_1, \ldots, \mathbf{z}_n\right] \in \mathbb{R}^{p \times n}$ for $\mathbf{z}_i=\mathbf{x}_i-(-1)^a \boldsymbol{\mu} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_p\right)$ with $\mathbf{x}_i \in \mathcal{C}_a$, and $\mathbf{M}=\boldsymbol{\mu} \mathbf{j}^{\top}$ with $\mathbf{j}=\left[-\mathbf{1}_{n / 2}, \mathbf{1}_{n / 2}\right]^{\top} \in \mathbb{R}^n$, so that $\mathbf{X}=\mathbf{M}+\mathbf{Z}$. First show that, for $\mathbf{Q} \equiv \mathbf{Q}(z)=\left(\mathbf{K}-z \mathbf{I}_n\right)^{-1}$,
$$\begin{aligned} \mathbf{Q}= & -\frac{1}{z} \mathbf{I}_n+\frac{1}{z}\left(\frac{\mathbf{Z}^{\top} \mathbf{Z}}{p} \odot \mathbf{B}\right) \mathbf{Q}+\frac{1}{z}\left(\frac{\mathbf{Z}^{\top} \mathbf{M}}{p} \odot \mathbf{B}\right) \mathbf{Q} \\ & +\frac{1}{z}\left(\frac{\mathbf{M}^{\top} \mathbf{Z}}{p} \odot \mathbf{B}\right) \mathbf{Q}+\frac{1}{z}\left(\frac{\mathbf{M}^{\top} \mathbf{M}}{p} \odot \mathbf{B}\right) \mathbf{Q} . \end{aligned}$$
To proceed, we need to go slightly beyond the study of these four terms.
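Before carrying out this resolvent computation, it may help to simulate the punctured kernel directly. The sketch below is our own illustration of the setting (the choices $[\mathbf{B}]_{ii}=1$, $\|\boldsymbol{\mu}\|=2$ and the dimensions are arbitrary): it builds $\mathbf{K}=\mathbf{B} \odot(\mathbf{X}^{\top} \mathbf{X} / p)$ for the two-class mixture above and checks how the isolated spike and the class alignment of the top eigenvector degrade as the proportion $\epsilon$ of computed entries decreases.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 512, 256
mu = np.zeros(p); mu[0] = 2.0
j = np.concatenate([-np.ones(n // 2), np.ones(n // 2)])   # class label vector
X = rng.standard_normal((p, n)) + np.outer(mu, j)          # x_i ~ N(+-mu, I_p)

for eps in [1.0, 0.5, 0.1, 0.02]:
    U = np.triu((rng.random((n, n)) < eps).astype(float), 1)
    B = U + U.T
    np.fill_diagonal(B, 1.0)                               # symmetric Bern(eps) mask, b = 1 on the diagonal
    K = B * (X.T @ X / p)                                  # punctured linear kernel
    vals, vecs = np.linalg.eigh(K)
    overlap = abs(vecs[:, -1] @ j) / np.sqrt(n)            # alignment of the top eigenvector with the classes
    print(f"eps={eps:4.2f}  top eigenvalue={vals[-1]:6.2f}  alignment={overlap:.3f}")
```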


## Distance and Inner-Product Random Kernel Matrices

The most widely used kernel model in machine learning applications is the heat kernel $\mathbf{K}=\left\{\exp \left(-\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2 / 2 \sigma^2\right)\right\}_{i, j=1}^n$, for some $\sigma>0$. It is thus natural to start the large-dimensional analysis of kernel random matrices by focusing on this model.
As mentioned in the previous sections, for the Gaussian mixture model above, as the dimension $p$ increases, $\sigma^2$ needs to scale as $O(p)$, so say $\sigma^2=\tilde{\sigma}^2 p$ for some $\tilde{\sigma}^2=O(1)$, to avoid evaluating the exponential at increasingly large values for $p$ large. As such, the prototypical kernel of present interest is
$$\mathbf{K}=\left\{f\left(\frac{1}{p}\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right)\right\}_{i, j=1}^n,$$
for $f$ a sufficiently smooth function (specifically, $f(t)=\exp \left(-t / 2 \tilde{\sigma}^2\right)$ for the heat kernel). As we will see though, it is highly desirable not to restrict ourselves to $f(t)=\exp \left(-t / 2 \tilde{\sigma}^2\right)$, so as to better appreciate the impact of the nonlinear kernel function $f$ on the (asymptotic) structural behavior of the kernel matrix $\mathbf{K}$.

## Euclidean Random Matrices with Equal Covariances

In order to get a first picture of the large-dimensional behavior of $\mathbf{K}$, let us first expand the distance $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2 / p$ for $\mathbf{x}_i \in \mathcal{C}_a$ and $\mathbf{x}_j \in \mathcal{C}_b$, with $i \neq j$.

For simplicity, let us assume for the moment $\mathbf{C}_1=\cdots=\mathbf{C}_k=\mathbf{I}_p$ and recall the notation $\mathbf{x}_i=\boldsymbol{\mu}_a+\mathbf{z}_i$. We have, for $i \neq j$ that “entry-wise,”
$$\begin{aligned} \frac{1}{p}\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2= & \frac{1}{p}\left\|\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right\|^2+\frac{2}{p}\left(\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right)^{\top}\left(\mathbf{z}_i-\mathbf{z}_j\right) \\ & +\frac{1}{p}\left\|\mathbf{z}_i\right\|^2+\frac{1}{p}\left\|\mathbf{z}_j\right\|^2-\frac{2}{p} \mathbf{z}_i^{\top} \mathbf{z}_j . \end{aligned}$$
For $\left\|\mathbf{x}_i\right\|$ of order $O(\sqrt{p})$, if $\left\|\boldsymbol{\mu}_a\right\|=O(\sqrt{p})$ for all $a \in\{1, \ldots, k\}$ (which would be natural), then $\left\|\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right\|^2 / p$ is a priori of order $O(1)$ while, by the central limit theorem, $\left\|\mathbf{z}_i\right\|^2 / p=1+O\left(p^{-1 / 2}\right)$. Also, again by the central limit theorem, $\mathbf{z}_i^{\top} \mathbf{z}_j / p=O\left(p^{-1 / 2}\right)$ and $\left(\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right)^{\top}\left(\mathbf{z}_i-\mathbf{z}_j\right) / p=O\left(p^{-1 / 2}\right)$.

As a consequence, for $p$ large, the distance $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2 / p$ is dominated by $\left\|\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right\|^2 / p+2$ and easily discriminates classes from the pairwise observations of $\mathbf{x}_i, \mathbf{x}_j$, making the classification asymptotically trivial (without having to resort to any kernel method). It is thus of interest to consider the situations where the class distances are less significant, to understand how the choices of kernel come into play in such a more practical scenario. To this end, we now demand that $$\left\|\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\right\|=O(1),$$ which is also the minimal distance rate that can be discriminated from a mere Bayesian inference analysis, as thoroughly discussed in Section 1.1.3. Since the kernel function $f(\cdot)$ operates only on the distances $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|$, we may even request (up to centering all data by, say, the constant vector $\frac{1}{n} \sum_{a=1}^k n_a \boldsymbol{\mu}_a$) for simplicity that $\left\|\boldsymbol{\mu}_a\right\|=O(1)$ for each $a$.
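A minimal numerical sketch (ours; the dimensions and the choice $\|\boldsymbol{\mu}_1-\boldsymbol{\mu}_2\|=3$ are arbitrary) illustrates this point: under the $\|\boldsymbol{\mu}_a-\boldsymbol{\mu}_b\|=O(1)$ regime, same-class and different-class distances $\|\mathbf{x}_i-\mathbf{x}_j\|^2/p$ become indistinguishable entry-wise as $p$ grows, all concentrating around $2$ with a spread of order $O(1/\sqrt{p})$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
for p in [100, 1000, 10000]:
    mu = np.zeros(p); mu[0] = 1.5                       # so that ||mu_1 - mu_2|| = 3 = O(1)
    y = np.repeat([-1.0, 1.0], n // 2)
    X = rng.standard_normal((p, n)) + np.outer(mu, y)   # x_i ~ N(y_i * mu, I_p)
    G = X.T @ X
    d = np.diag(G)
    sq = (d[:, None] + d[None, :] - 2 * G) / p          # all pairwise ||x_i - x_j||^2 / p
    off = ~np.eye(n, dtype=bool)
    same = np.equal.outer(y, y) & off
    diff = ~np.equal.outer(y, y)
    print(f"p={p:6d}  same-class mean={sq[same].mean():.3f}  "
          f"diff-class mean={sq[diff].mean():.3f}  overall spread={sq[off].std():.3f}")
```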


## The Nontrivial Growth Rates

In classical large- $n$ only asymptotic statistics, laws of large numbers demand a scaling by $1 / n$ of the summed observations. When centered, central limit theorems then occur after multiplication of the average by $\sqrt{n}$. A similar requirement is needed when we now consider that the dimension $p$ of the data is also large. In particular, we will demand that the norm of each observation remains bounded. Assuming $\mathbf{x} \in \mathbb{R}^p$ is a vector of bounded entries, that is, each of order $O(1)$ with respect to $p$, the natural normalization is typically $\mathbf{x} / \sqrt{p}$.

In the context of kernel methods, for data $\mathbf{x}_1, \ldots, \mathbf{x}_n$, one wishes that the argument of $f(\cdot)$ in the inner-product kernel $f\left(\mathbf{x}_i^{\top} \mathbf{x}_j\right)$ or the distance kernel $f\left(\left|\mathbf{x}_i-\mathbf{x}_j\right|^2\right)$ be of order $O(1)$, when $f$ is assumed independent of $p$.

The “correct” scaling however appears not to be so immediate. Letting $\mathbf{x}_i$ have entries of order $O(1)$, one naturally has that $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2=\left\|\mathbf{x}_i\right\|^2+\left\|\mathbf{x}_j\right\|^2-2 \mathbf{x}_i^{\top} \mathbf{x}_j=O(p)$ and it thus appears natural to scale $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2$ by $1 / p$. Similarly, if the norm of the mean $\left\|\mathbb{E}\left[\mathbf{x}_i\right]\right\|$ of $\mathbf{x}_i$ has the same order of magnitude as $\left\|\mathbf{x}_i\right\|$ itself (as it should in general), then for $\mathbf{x}_i, \mathbf{x}_j$ independent, $\mathbb{E}\left[\mathbf{x}_i^{\top} \mathbf{x}_j\right]=O(p)$. So again, one should scale the inner product also by $1 / p$, to obtain kernel matrices of the type $$\mathbf{K}=\left\{f\left(\frac{1}{p}\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right)\right\}_{i, j=1}^n, \quad \text { and } \quad\left\{f\left(\frac{1}{p} \mathbf{x}_i^{\top} \mathbf{x}_j\right)\right\}_{i, j=1}^n .$$
Section 4.2 (and most applications thereafter) will be placed under these kernel forms. The most commonly used Gaussian kernel matrix, defined as $\mathbf{K}=\left\{\exp \left(-\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2 / 2 \sigma^2\right)\right\}_{i, j=1}^n$, falls into this family as one usually demands that $\sigma^2 \sim \mathbb{E}\left[\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right]$ (to avoid evaluating the exponential close to zero or infinity).

However, as already demonstrated in Section 1.1.3, if $n$ scales like $p$, then, for the classification problem to be asymptotically nontrivial, the difference $\left\|\mathbb{E}\left[\mathbf{x}_i\right]-\mathbb{E}\left[\mathbf{x}_j\right]\right\|^2$ needs to scale like $O(1)$ rather than $O(p)$ (otherwise data classes would be too easy to cluster for all large $n, p$), resulting in $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2 / p$ possibly converging to a constant value irrespective of the data classes (of $\mathbf{x}_i$ and $\mathbf{x}_j$), with a typical “spread” of order $O(1 / \sqrt{p})$. Similarly, up to re-centering, ${ }^2$ $\mathbf{x}_i^{\top} \mathbf{x}_j / p$ scales like $O(1 / \sqrt{p})$ rather than $O(1)$. As such, it seems more appropriate to normalize the kernel matrix entries as $$[\mathbf{K}]_{i j}=f\left(\frac{\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2}{\sqrt{p}}-\frac{1}{n(n-1)} \sum_{i^{\prime} \neq j^{\prime}} \frac{\left\|\mathbf{x}_{i^{\prime}}-\mathbf{x}_{j^{\prime}}\right\|^2}{\sqrt{p}}\right), \quad \text { or } \quad[\mathbf{K}]_{i j}=f\left(\frac{1}{\sqrt{p}} \mathbf{x}_i^{\top} \mathbf{x}_j\right)$$
in order here to avoid evaluating $f$ essentially at a single value (equal to zero for the inner-product kernel or equal to the average “common” limiting intra-data distance for the distance kernel).

This “properly scaling” setting is in fact much richer than the $1 / p$ normalization when $n, p$ are of the same order of magnitude. Sections $4.2 .4$ and $4.3$ elaborate on this scenario.
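The difference between the two normalizations can be seen on a single simulation. In the sketch below (ours; the dimensions and the mean separation are arbitrary, chosen in the nontrivial regime where $\|\mathbb{E}[\mathbf{x}_i]-\mathbb{E}[\mathbf{x}_j]\|=O(1)$), the $1/p$-scaled inner products collapse onto essentially one value, while the $1/\sqrt{p}$-scaled ones retain an $O(1)$ spread on which a nonlinearity $f$ can meaningfully act.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 2000, 400
mu = np.zeros(p); mu[0] = 1.0                      # O(1) separation of the class means
y = np.repeat([-1.0, 1.0], n // 2)
X = rng.standard_normal((p, n)) + np.outer(mu, y)
G = X.T @ X
off = G[~np.eye(n, dtype=bool)]                    # off-diagonal inner products x_i^T x_j

print("x_i^T x_j / p      : mean %+.4f  std %.4f   (collapses to one value)"
      % ((off / p).mean(), (off / p).std()))
print("x_i^T x_j / sqrt(p): mean %+.4f  std %.4f   (keeps an O(1) spread)"
      % ((off / np.sqrt(p)).mean(), (off / np.sqrt(p)).std()))
```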

## Statistical Data Model

In the remainder of the section, we assume the observation of $n$ independent data vectors from a total of $k$ classes gathered as $\mathbf{X}=\left[\mathbf{x}_1, \ldots, \mathbf{x}_n\right] \in \mathbb{R}^{p \times n}$, where $$\begin{array}{c} \mathbf{x}_1, \ldots, \mathbf{x}_{n_1} \sim \mathcal{N}\left(\boldsymbol{\mu}_1, \mathbf{C}_1\right) \\ \vdots \\ \mathbf{x}_{n-n_k+1}, \ldots, \mathbf{x}_n \sim \mathcal{N}\left(\boldsymbol{\mu}_k, \mathbf{C}_k\right), \end{array}$$ which is a $k$-class Gaussian mixture model (GMM) with a fixed cardinality $n_1, \ldots, n_k$ in each class. ${ }^3$ The fact that the data are indexed according to classes simplifies the notation but has no practical consequence in the analysis. We will denote $\mathcal{C}_a$ the class number "$a$," so in particular $$\mathbf{x}_i \sim \mathcal{N}\left(\boldsymbol{\mu}_a, \mathbf{C}_a\right) \Leftrightarrow \mathbf{x}_i \in \mathcal{C}_a$$ for $a \in\{1, \ldots, k\}$, and will use for convenience the matrix $$\mathbf{J}=\left[\mathbf{j}_1, \ldots, \mathbf{j}_k\right] \in \mathbb{R}^{n \times k}, \quad \mathbf{j}_a=[\underbrace{0, \ldots, 0}_{n_1+\cdots+n_{a-1}}, \underbrace{1, \ldots, 1}_{n_a}, \underbrace{0, \ldots, 0}_{n_{a+1}+\cdots+n_k}]^{\top},$$
which is the indicator matrix of the class labels $(\mathbf{J}$ is a priori known under a supervised learning setting and is to be fully or partially recovered under a semi-supervised or unsupervised learning setting).

We shall systematically make the following simplifying growth rate assumption for $p, n$ and $n_1, \ldots, n_k$.

Assumption 1 (Growth rate of data size and number). As $n \rightarrow \infty, p / n \rightarrow c \in(0, \infty)$ and $n_a / n \rightarrow c_a \in(0,1)$.

This assumption, in particular, implies that each class is “large” in the sense that its cardinality increases with $n$. ${ }^4$

In accordance with the discussions in Chapter 2, from a random matrix “universality” perspective, the Gaussian mixture assumption will often (yet not always) turn out to be equivalent to demanding that
$$\mathbf{x}_i \in \mathcal{C}_a: \mathbf{x}_i=\mu_a+\mathbf{C}_a^{\frac{1}{2}} \mathbf{z}_i$$
with $\mathbf{z}_i \in \mathbb{R}^p$ a random vector with i.i.d. entries of zero mean, unit variance, and bounded higher-order (e.g., fourth) moments.
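As a concrete instance of this hypothesis, the sketch below (ours; the Rademacher entries, class sizes, means, and covariances are arbitrary choices satisfying the stated conditions) generates a $k$-class dataset of the form $\mathbf{x}_i=\boldsymbol{\mu}_a+\mathbf{C}_a^{1/2}\mathbf{z}_i$ together with the class-indicator matrix $\mathbf{J}$ defined above.

```python
import numpy as np

rng = np.random.default_rng(6)
p, k = 50, 3
n_a = [100, 150, 250]                                       # class cardinalities n_1, ..., n_k
n = sum(n_a)
mus = [rng.standard_normal(p) for _ in range(k)]            # class means mu_a
Cs = [(1 + 0.2 * a) * np.eye(p) for a in range(k)]          # class covariances C_a

cols, J, start = [], np.zeros((n, k)), 0
for a in range(k):
    Z = rng.choice([-1.0, 1.0], size=(p, n_a[a]))           # i.i.d. zero-mean, unit-variance, non-Gaussian entries
    cols.append(mus[a][:, None] + np.linalg.cholesky(Cs[a]) @ Z)   # x_i = mu_a + C_a^{1/2} z_i
    J[start:start + n_a[a], a] = 1.0                         # indicator vector j_a
    start += n_a[a]

X = np.hstack(cols)                                          # data matrix, p x n
print(X.shape, J.sum(axis=0))                                # (50, 500) and the class sizes
```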

This hypothesis is indeed quite restrictive as it imposes that the data, up to centering and linear scaling, are composed of i.i.d. entries. Equivalently, this suggests that only data which result from affine transformations of vectors with i.i.d. entries can be studied, which is quite restrictive in practice as “real data” are deemed much more complex.

Exploring the notion of concentrated random vectors introduced in Section 2.7, Chapter 8 will open up this discussion by showing that a much larger class of (statistical) data models embrace the same asymptotic statistics, and that most results discussed in the present section apply identically to broader models of data irreducible to vectors of independent entries.


## Kernel Methods

In a broad sense, kernel methods are at the core of many, if not most, machine learning algorithms [Schölkopf and Smola, 2018]. Given a set of data $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^p$, most learning mechanisms rely on extracting the structural data information from direct or indirect pairwise comparisons $\kappa\left(\mathbf{x}_i, \mathbf{x}_j\right)$ for some affinity metric $\kappa(\cdot, \cdot)$. Gathered in an $n \times n$ matrix $$\mathbf{K}=\left\{\kappa\left(\mathbf{x}_i, \mathbf{x}_j\right)\right\}_{i, j=1}^n,$$
the “cumulative” effect of these comparisons for numerous $(n \gg 1)$ data is at the source of various supervised, semi-supervised, or unsupervised methods such as support vector machines, graph Laplacian-based learning, kernel spectral clustering, and has deep connections to neural networks.

These applications will be thoroughly discussed in Section 4.4. For the moment though, our main interest lies in the spectral characterization of the kernel matrix $\mathbf{K}$ itself for various (classical) choices of affinity functions $\kappa$ and for various statistical models of the data $\mathbf{x}_i$

Clearly, from a purely machine learning perspective, the choice of the affinity function $\kappa(\cdot, \cdot)$ is central to a good performance of the learning method under study. Since real data in general have highly complex structures, a typical viewpoint is to assume that the data points $\mathbf{x}_i$ and $\mathbf{x}_j$ are not directly comparable in their ambient space but that there exists a convenient feature extraction function $\phi: \mathbb{R}^p \rightarrow \mathbb{R}^q$ ($q \in \mathbb{N} \cup\{+\infty\}$) such that $\phi\left(\mathbf{x}_i\right)$ and $\phi\left(\mathbf{x}_j\right)$ are more amenable to comparison. Otherwise stated, in the image of $\phi(\cdot)$, the data are more “linear” (or more “linearly separable” if one seeks to group the data in affinity classes). The simplest affinity function between $\mathbf{x}_i$ and $\mathbf{x}_j$ would in this case be $\kappa\left(\mathbf{x}_i, \mathbf{x}_j\right)=\phi\left(\mathbf{x}_i\right)^{\top} \phi\left(\mathbf{x}_j\right)$.

Since $q$ may be larger (if not much larger) than $p$, the mere cost of evaluating $\phi\left(\mathbf{x}_i\right)^{\top} \phi\left(\mathbf{x}_j\right)$ can be deleterious to practical implementation. The so-called kernel trick is anchored in the remark that, for a certain class of such functions $\phi$, $\phi\left(\mathbf{x}_i\right)^{\top} \phi\left(\mathbf{x}_j\right)=f\left(\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right)$ or $f\left(\mathbf{x}_i^{\top} \mathbf{x}_j\right)$ for some function $f: \mathbb{R} \rightarrow \mathbb{R}$, and it thus suffices to evaluate $\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2$ or $\mathbf{x}_i^{\top} \mathbf{x}_j$ in the ambient space and then apply $f$ in an entry-wise manner to evaluate all data affinities, leading to more practically convenient methods.

Although the class of such functions $f$ is inherently restricted by the need for a mapping $\phi$ to exist such that, say, $\phi\left(\mathbf{x}_i\right)^{\top} \phi\left(\mathbf{x}_j\right)=f\left(\left|\mathbf{x}_i-\mathbf{x}_j\right|^2\right)$ for all possible $\mathbf{x}_i, \mathbf{x}_j$ pairs (these are sometimes called Mercer kernel functions), ${ }^1$ with time, practitioners have started to use arbitrary functions $f$ and worked with generic kernel matrices of the form
$$\mathbf{K}=\left\{f\left(\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2\right)\right\}_{i, j=1}^n, \quad \text { or } \quad \mathbf{K}=\left\{f\left(\mathbf{x}_i^{\top} \mathbf{x}_j\right)\right\}_{i, j=1}^n,$$
irrespective of the actual form or even the existence of an underlying feature extraction function $\phi$. There is, in particular, empirical evidence showing that well-chosen “indefinite” (i.e., non-Mercer type) kernels, being not associated with a mapping $\phi$, can sometimes outperform conventional nonnegative definite kernels that satisfy Mercer’s condition [Haasdonk, 2005, Luss and D’Aspremont, 2008].
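Computationally, such generic kernel matrices only require the pairwise distances or inner products followed by an entry-wise application of $f$, which is the practical content of the kernel trick. A minimal sketch (ours; the two example choices of $f$ below are merely illustrative) is as follows.

```python
import numpy as np

def distance_kernel(X, f):
    """K = { f(||x_i - x_j||^2) }_{i,j=1}^n for the columns x_i of X (p x n)."""
    G = X.T @ X
    d = np.diag(G)
    return f(d[:, None] + d[None, :] - 2 * G)

def inner_product_kernel(X, f):
    """K = { f(x_i^T x_j) }_{i,j=1}^n."""
    return f(X.T @ X)

rng = np.random.default_rng(7)
X = rng.standard_normal((20, 100))                            # p = 20, n = 100
K_heat = distance_kernel(X, lambda t: np.exp(-t / 2))         # a Mercer (heat) kernel
K_quad = inner_product_kernel(X, lambda t: 1 - t ** 2 / 2)    # an "indefinite" kernel
print(K_heat.shape, np.linalg.eigvalsh(K_quad).min())          # the latter may have negative eigenvalues
```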

## Basic Setting

As pointed out in Remark $4.1$ and shall become evident from the coming analysis, the small-dimensional intuition according to which $f$ should be a nonincreasing “valid” Mercer function becomes rather meaningless when dealing with large-dimensional data, essentially due to the “curse of dimensionality” and the concentration phenomenon in high dimensions.

To fully capture this aspect, a first important consideration is, as already mentioned in Section 1.1.3, to deal with “nontrivial” relative growth rates of the statistical data parameters with respect to the dimensions $p, n$. By nontrivial, we mean that the underlying classification or regression problem for which the kernel method is designed should neither be impossible nor trivially easy to solve as $p, n \rightarrow \infty$. The reason behind this request is fundamental, and also departs from many research works in machine learning which, instead, seek to prove that the method under study performs perfectly in the limit of large $n$ (with $p$ fixed in general): Here, we rather wish to account for the fact that, at finite but large $p, n$, the machine learning methods of practical interest are those which have nontrivial performances; thus, in what follows, “$n, p \rightarrow \infty$ in nontrivial growth rates” should really be understood as “$n, p$ are both large and the problem at hand is neither trivially easy nor impossible to solve.”

In this section, we will mostly focus on the use of kernel methods for classification, and thus the nontrivial settings are given in terms of the growth rate of the “distance” between (the statistics of) data classes. It will particularly appear that the very definition of the appropriate growth rates to ensure the nontrivial character of a machine learning problem to be solved through kernel methods depends on the kernel design itself, and that flagship kernels such as the Gaussian kernel $\kappa\left(\mathbf{x}_i, \mathbf{x}_j\right)=\exp \left(-\left|\mathbf{x}_i-\mathbf{x}_j\right|^2 / 2 \sigma^2\right)$ are in general quite suboptimal.


## Explaining Kernel Methods with Random Matrix Theory

The fundamental reason behind this surprising behavior lies in the accumulated effect of the $n / 2$ small “hidden” informative terms $\|\boldsymbol{\mu}\|^2$, $\operatorname{tr} \mathbf{E}$, and $\operatorname{tr}\left(\mathbf{E}^2\right)$ in each class, which collectively “steer” the several top eigenvectors of $\mathbf{K}$. More explicitly, we shall see in the course of this book that the Gaussian kernel matrix $\mathbf{K}$ can be asymptotically expanded as
$$\mathbf{K}=\exp (-1)\left(\mathbf{1}_n \mathbf{1}_n^{\top}+\frac{1}{p} \mathbf{Z}^{\top} \mathbf{Z}\right)+f(\boldsymbol{\mu}, \mathbf{E}) \cdot \frac{1}{p} \mathbf{j} \mathbf{j}^{\top}+*+o_{\|\cdot\|}(1),$$
where $\mathbf{Z}=\left[\mathbf{z}_1, \ldots, \mathbf{z}_n\right] \in \mathbb{R}^{p \times n}$ is a Gaussian noise matrix, $f(\boldsymbol{\mu}, \mathbf{E})=O(1)$, and $\mathbf{j}=\left[\mathbf{1}_{n / 2} ;-\mathbf{1}_{n / 2}\right]$ is the class-information “label” vector (as in the setting of Figure 1.2). Here “$*$” symbolizes extra terms of marginal importance to the present discussion, and $o_{\|\cdot\|}(1)$ represents terms of asymptotically vanishing operator norm as $n, p \rightarrow \infty$. The important remark to be made here is that
(i) Under this description, $[\mathbf{K}]_{i j}=\exp (-1)\left(1+\mathbf{z}_i^{\top} \mathbf{z}_j / p\right) \pm f(\boldsymbol{\mu}, \mathbf{E}) / p+*$, with $f(\mu, \mathbf{E}) / p \ll \mathbf{z}_i^{\top} \mathbf{z}_j / p=O\left(p^{-1 / 2}\right)$; this is consistent with our previous discussion: The statistical information is entry-wise dominated by noise.
(ii) From a spectral viewpoint, $\left\|\mathbf{Z}^{\top} \mathbf{Z} / p\right\|=O(1)$, as per the Marčenko-Pastur theorem [Marčenko and Pastur, 1967] discussed in Section 1.1.2 and visually confirmed in Figure 1.1, while $\left\|f(\boldsymbol{\mu}, \mathbf{E}) \cdot \mathbf{j} \mathbf{j}^{\top} / p\right\|=O(1)$: Thus, spectrum-wise, the information stands on even ground with noise.

The mathematical magic at play here lies in $f(\boldsymbol{\mu}, \mathbf{E}) \cdot \mathbf{j} \mathbf{j}^{\top} / p$ having entries of order $O\left(p^{-1}\right)$ while being a low-rank (here unit-rank) matrix: All its “energy” concentrates in a single nonzero eigenvalue. As for $\mathbf{Z}^{\top} \mathbf{Z} / p$, with larger $O\left(p^{-1 / 2}\right)$ amplitude entries, it is composed of “essentially independent” zero-mean random variables, tends to be of full rank, and spreads its energy over its $n$ eigenvalues. Spectrum-wise, both $f(\boldsymbol{\mu}, \mathbf{E}) \cdot \mathbf{j} \mathbf{j}^{\top} / p$ and $\mathbf{Z}^{\top} \mathbf{Z} / p$ meet on even ground under the nontrivial classification setting of (1.7).
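This spectral competition is easy to visualize numerically. The sketch below is our own simplified illustration: it drops the $\exp(-1)$ factor and the non-informative $\mathbf{1}_n\mathbf{1}_n^{\top}$ term, and replaces $f(\boldsymbol{\mu}, \mathbf{E})$ by a tunable constant $c$, so as to compare the spectral norms of the full-rank noise term $\mathbf{Z}^{\top}\mathbf{Z}/p$ and the unit-rank information term $c\,\mathbf{j}\mathbf{j}^{\top}/p$, and to track when the top eigenvector of their sum aligns with the label vector $\mathbf{j}$.

```python
import numpy as np

rng = np.random.default_rng(8)
p, n = 800, 400
Z = rng.standard_normal((p, n))
j = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

noise = Z.T @ Z / p                            # full rank, O(p^{-1/2}) entries, O(1) operator norm
noise_norm = np.linalg.eigvalsh(noise)[-1]
for c in [0.5, 2.0, 8.0]:                      # c plays the role of f(mu, E)
    info = c * np.outer(j, j) / p              # rank one, O(p^{-1}) entries, O(1) operator norm
    vals, vecs = np.linalg.eigh(noise + info)
    align = abs(vecs[:, -1] @ j) / np.sqrt(n)  # alignment of the top eigenvector with the labels
    print(f"c={c:4.1f}  ||noise||={noise_norm:.2f}  ||info||={c * n / p:.2f}  "
          f"alignment={align:.3f}")
```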

We shall see in Section 4 that things are actually not as clear-cut and, in particular, that not all choices of kernel functions can achieve the same nontrivial classification rates. In particular, the popular Gaussian (radial basis function [RBF]) kernel will be shown to be largely suboptimal in this respect.

## Random Matrix Theory as an Answer

Random matrix theory originates from the work of John Wishart [Wishart, 1928] on the study of the eigenvalues of the matrix $\mathbf{X} \mathbf{X}^{\top}$ (now referred to as a Wishart matrix) for $\mathbf{X} \in \mathbb{R}^{p \times n}$ with standard Gaussian entries $[\mathbf{X}]_{i j} \sim \mathcal{N}(0,1)$. Wishart managed to determine a closed-form expression for the joint eigenvalue distribution of $\mathbf{X} \mathbf{X}^{\top}$ for every pair of $p, n$. Little progress however followed, as matrices with non-Gaussian entries are hardly amenable to similar analysis and, even if they were, the actual study of more elaborate functionals of $\mathbf{X} \mathbf{X}^{\top}$ is at best cumbersome and often simply intractable.

The works of the physicist Eugene Wigner [Wigner, 1955] gave a new impulse to the theory. Interested in the eigenvalues of symmetric matrices $\mathbf{X} \in \mathbb{R}^{n \times n}$ with independent Bernoulli entries (particle spins in his application context), Wigner opted for an asymptotic analysis of the eigenvalue distribution, thereby initiating the important and much richer branch of large-dimensional random matrix theory. Despite this important inspiration, Wigner exploited standard asymptotic statistics tools (the method of moments) to prove that the discrete distribution of the eigenvalues of $\mathbf{X}$ has a continuous semicircle looking density in the $n \rightarrow \infty$ limit (the now popular semicircular law). This approach was particularly convenient as the limiting law is simple and could be visually anticipated (which is not the case of the next-to-come Marčenko-Pastur limiting distribution of Wishart matrices).

Not until 1967, with the tour-de-force of Marčenko and Pastur [1967], did random matrix theory take on a new dimension. Marčenko and Pastur determined the limiting spectral distribution of the sample covariance matrix model $\mathbf{X} \mathbf{X}^{\top}$ of Wishart but under relaxed conditions: the $[\mathbf{X}]_{i j}$ are independent entries with zero mean and unit variance, and additional moment assumptions (all discarded in subsequent works). The independence (or weak dependence) property is key to their proof, which exploits the powerful Stieltjes transform $\frac{1}{p} \operatorname{tr}\left(\frac{1}{n} \mathbf{X} \mathbf{X}^{\top}-z \mathbf{I}_p\right)^{-1}=\int(\lambda-z)^{-1} \mu_p(d \lambda)$ of the empirical spectral distribution $\mu_p \equiv \frac{1}{p} \sum_{i=1}^p \delta_{\lambda_i\left(\frac{1}{n} \mathbf{X X}^{\top}\right)}$ of $\frac{1}{n} \mathbf{X} \mathbf{X}^{\top}$, a tool borrowed from operator theory in Hilbert spaces [Akhiezer and Glazman, 2013], rather than the moments $\frac{1}{p} \operatorname{tr}\left(\frac{1}{n} \mathbf{X X}^{\top}\right)^k$ (which may not converge since $\mathbb{E}\left[\mathbf{X}_{i j}^{\ell}\right]$ need not be finite for $\ell>2$).
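As a numerical aside (ours, not part of the original text), the Stieltjes-transform characterization can be checked directly: the empirical quantity $\frac{1}{p}\operatorname{tr}(\frac{1}{n}\mathbf{X}\mathbf{X}^{\top}-z\mathbf{I}_p)^{-1}$ is compared below, at a few real points $z$ outside the limiting support, with the solution of the fixed-point equation $m(z)=\left(1-c-z-c z\, m(z)\right)^{-1}$ for $c=p/n$, one standard form of the Marčenko-Pastur result under this normalization.

```python
import numpy as np

rng = np.random.default_rng(9)
p, n = 1000, 2000
c = p / n
X = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)

for z in [-3.0, -1.0, -0.1]:                      # points outside the limiting support
    emp = np.mean(1.0 / (eigs - z))               # empirical (1/p) tr (XX^T/n - zI)^{-1}
    m = -1.0 / z                                  # solve the fixed point m = 1/(1 - c - z - c z m)
    for _ in range(500):
        m = 1.0 / (1.0 - c - z - c * z * m)
    print(f"z={z:5.1f}  empirical={emp:.4f}  Marcenko-Pastur={m:.4f}")
```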

The technical approach devised by Marčenko and Pastur was then largely embraced at the turn of the twenty-first century by Bai and Silverstein who, in a series of significant breakthroughs (the most noticeable of which are [Silverstein and Bai, 1995, Bai and Silverstein, 1998]), extended the results in [Marčenko and Pastur, 1967] to an exhaustive study of sample covariance matrices.


## Kernel Matrices of Large-Dimensional Data

Another less-known but equally important example of the curse of dimensionality in machine learning involves the loss of relevance of (the notion of) Euclidean distance between large-dimensional data vectors. To be more precise, we will see in the sequel that, in an asymptotically nontrivial classification setting (i.e., ensuring that asymptotic classification is neither trivially easy nor impossible), large and numerous data vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^p$ extracted from a few-class (say two-class) mixture model tend to be asymptotically at equal (Euclidean) distance from one another, irrespective of their corresponding class. Roughly speaking, in this nontrivial setting and under some reasonable statistical assumptions on the $\mathbf{x}_i$'s, we have $$\max _{1 \leq i \neq j \leq n}\left\{\left|\frac{1}{p}\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2-\tau\right|\right\} \rightarrow 0$$
for some constant $\tau>0$ as $n, p \rightarrow \infty$, independently of the classes (same or different) of $\mathbf{x}_i$ and $\mathbf{x}_j$ (here the normalization by $p$ is used for compliance with the notations in the remainder of this book and has no particular importance).
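This distance-concentration phenomenon is easy to observe numerically. The following minimal sketch (with illustrative dimensions and a two-class Gaussian mixture with identity covariances, a simplification of the model introduced in the next section) shows that same-class and cross-class pairs have essentially indistinguishable normalized distances.

```python
import numpy as np

# Sketch of Equation (1.3): for ||mu|| = O(1), all normalized pairwise distances
# (1/p) ||x_i - x_j||^2 concentrate around the same constant tau (here tau = 2),
# whatever the classes of x_i and x_j.
rng = np.random.default_rng(0)
p, n = 2000, 200
mu = np.zeros(p); mu[0] = 2.0                        # ||mu|| = O(1) w.r.t. p

labels = rng.integers(0, 2, size=n)                  # class of each sample
X = rng.standard_normal((n, p)) + np.where(labels[:, None] == 0, mu, -mu)

sq = np.sum(X ** 2, axis=1)
D = (sq[:, None] + sq[None, :] - 2 * X @ X.T) / p    # normalized squared distances

iu = np.triu_indices(n, k=1)
same = labels[iu[0]] == labels[iu[1]]
print("same-class  pairs: mean %.3f, std %.3f" % (D[iu][same].mean(), D[iu][same].std()))
print("cross-class pairs: mean %.3f, std %.3f" % (D[iu][~same].mean(), D[iu][~same].std()))
# Both concentrate around tau = 2: the class-induced mean gap 4 ||mu||^2 / p = 0.008
# is buried in the O(p^{-1/2}) fluctuations of the individual distances.
```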

This asymptotic behavior is extremely counterintuitive and conveys the idea that classification by standard methods ought not to be doable in this large-dimensional regime. Indeed, in the conventional small-dimensional intuition that forged many of the leading machine learning algorithms of everyday use (such as spectral clustering [Ng et al., 2002, Luxburg, 2007]), two data points are assigned to the same class if they are “close” in Euclidean distance. Here we claim that, when $p$ is large, data pairs are neither close nor far from each other, regardless of their belonging to the same class or not. Despite this troubling loss of individual discriminative power between data pairs, we subsequently show that, thanks to a collective behavior of all data belonging to the same (few and thus large) classes, data classification or clustering is still achievable. Better, we shall see that, while many conventional methods devised from small-dimensional intuitions do fail in this large-dimensional regime, some popular approaches, such as the $\mathrm{Ng}$-Jordan-Weiss spectral clustering method [Ng et al., 2002] or the PageRank semisupervised learning approach [Avrachenkov et al., 2012], still function. But the core reasons for their functioning are strikingly different from the reasons of their initial designs, and they often operate far from optimally.

## 计算机代写|机器学习代写machine learning代考|The Nontrivial Classification Regime

To get a clear picture of the source of Equation (1.3), we first need to clarify what we refer to as the “asymptotically nontrivial” classification setting. Consider the simplest scenario of a binary Gaussian mixture classification: Given a training set $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^p$ of $n$ samples independently drawn from the two-class $\left(\mathcal{C}_1\right.$ and $\left.\mathcal{C}_2\right)$ Gaussian mixture,
$$\mathcal{C}_1: \mathbf{x} \sim \mathcal{N}\left(\boldsymbol{\mu}, \mathbf{I}_p\right), \quad \mathcal{C}_2: \mathbf{x} \sim \mathcal{N}\left(-\boldsymbol{\mu}, \mathbf{I}_p+\mathbf{E}\right),$$
each drawn with probability $1 / 2$, for some deterministic $\boldsymbol{\mu} \in \mathbb{R}^p$ and symmetric $\mathbf{E} \in \mathbb{R}^{p \times p}$, both possibly depending on $p$. In the ideal case where $\boldsymbol{\mu}$ and $\mathbf{E}$ are perfectly known, one can devise a (decision optimal) Neyman-Pearson test. For an unknown $\mathbf{x}$, genuinely belonging to $\mathcal{C}_1$, the Neyman-Pearson (likelihood ratio) test to decide on the class of $\mathbf{x}$ reads
$$(\mathbf{x}+\boldsymbol{\mu})^{\top}\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}(\mathbf{x}+\boldsymbol{\mu})-(\mathbf{x}-\boldsymbol{\mu})^{\top}(\mathbf{x}-\boldsymbol{\mu})+\log \operatorname{det}\left(\mathbf{I}_p+\mathbf{E}\right) \underset{\mathcal{C}_2}{\overset{\mathcal{C}_1}{\gtrless}} 0 .$$
Writing $\mathbf{x}=\boldsymbol{\mu}+\mathbf{z}$ for $\mathbf{z} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_p\right)$, the above test is equivalent to
$$\begin{aligned} T(\mathbf{x}) \equiv & 4 \boldsymbol{\mu}^{\top}\left(\mathbf{I}_p+\mathbf{E}\right)^{-1} \boldsymbol{\mu}+4 \boldsymbol{\mu}^{\top}\left(\mathbf{I}_p+\mathbf{E}\right)^{-1} \mathbf{z}+\mathbf{z}^{\top}\left(\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}-\mathbf{I}_p\right) \mathbf{z} \\ & +\log \operatorname{det}\left(\mathbf{I}_p+\mathbf{E}\right) \underset{\mathcal{C}_2}{\overset{\mathcal{C}_1}{\gtrless}} 0 . \end{aligned}$$
Since $\mathbf{U z}$ for $\mathbf{U} \in \mathbb{R}^{p \times p}$, an eigenvector basis of $\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}$ (and thus of $\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}-$ $\mathbf{I}_p$ ), follows the same distribution as $\mathbf{z}$, the random variable $T(\mathbf{x})$ can be written as the sum of $p$ independent random variables. Further assuming that $|\boldsymbol{\mu}|=O(1)$ with respect to $p$, by Lyapunov’s central limit theorem (e.g., [Billingsley, 2012, Theorem 27.3]) and the fact that $\operatorname{Var}\left[\mathbf{z}^{\top} \mathbf{A z}\right]=2 \operatorname{tr}\left(\mathbf{A}^2\right)$ for symmetric $\mathbf{A} \in \mathbb{R}^{p \times p}$ and Gaussian $\mathbf{z}$, we have, as $p \rightarrow \infty$,
$$V_T^{-1 / 2}(T(\mathbf{x})-\bar{T}) \stackrel{d}{\rightarrow} \mathcal{N}(0,1),$$
where
$$\begin{aligned} \bar{T} & \equiv 4 \boldsymbol{\mu}^{\top}\left(\mathbf{I}_p+\mathbf{E}\right)^{-1} \boldsymbol{\mu}+\operatorname{tr}\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}-p+\log \operatorname{det}\left(\mathbf{I}_p+\mathbf{E}\right), \\ V_T & \equiv 16 \boldsymbol{\mu}^{\top}\left(\mathbf{I}_p+\mathbf{E}\right)^{-2} \boldsymbol{\mu}+2 \operatorname{tr}\left(\left(\mathbf{I}_p+\mathbf{E}\right)^{-1}-\mathbf{I}_p\right)^2 . \end{aligned}$$
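This central limit behavior can be verified by Monte Carlo simulation. The sketch below uses an illustrative choice of $\boldsymbol{\mu}$ and a diagonal perturbation $\mathbf{E}$ (both hypothetical, not taken from the text), simulates $T(\mathbf{x})$ for $\mathbf{x} \in \mathcal{C}_1$, and standardizes it by $\bar{T}$ and $V_T$ as defined above.

```python
import numpy as np

# Monte Carlo check that (T(x) - T_bar) / sqrt(V_T) is approximately N(0, 1)
# for x in class C_1, with illustrative mu and diagonal E.
rng = np.random.default_rng(0)
p = 1000
mu = np.zeros(p); mu[0] = 1.0                  # ||mu|| = O(1)
e = 2.0 / np.sqrt(p) * np.ones(p)              # E = diag(e), a small diagonal perturbation
inv = 1.0 / (1.0 + e)                          # diagonal of (I_p + E)^{-1}
logdet = np.log1p(e).sum()                     # log det(I_p + E)

T_bar = 4 * mu @ (inv * mu) + inv.sum() - p + logdet
V_T = 16 * mu @ (inv ** 2 * mu) + 2 * np.sum((inv - 1.0) ** 2)

Z = rng.standard_normal((10000, p))            # z ~ N(0, I_p); x = mu + z belongs to C_1
T = (4 * mu @ (inv * mu)                       # 4 mu^T (I+E)^{-1} mu
     + 4 * Z @ (inv * mu)                      # 4 mu^T (I+E)^{-1} z
     + ((inv - 1.0) * Z ** 2).sum(axis=1)      # z^T ((I+E)^{-1} - I) z
     + logdet)

std = (T - T_bar) / np.sqrt(V_T)
print("standardized T(x): mean %.3f (should be ~0), std %.3f (should be ~1)"
      % (std.mean(), std.std()))
```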



## 计算机代写|机器学习代写machine learning代考|COMP5318


## 计算机代写|机器学习代写machine learning代考|The Pitfalls of Large-Dimensional Statistics

The big data revolution comes along with the challenging needs to parse, mine, and compress a large amount of large-dimensional and possibly heterogeneous data. In many applications, the dimension $p$ of the observations is as large as, if not much larger than, their number $n$. In array processing and wireless communications, the number of antennas required for fine localization resolution or increased communication throughput may be as large (today in the order of hundreds) as the number of available independent signal observations [Li and Stoica, 2007, Lu et al., 2014]. In genomics, the identification of correlations among hundreds of thousands of genes based on a limited number of independent (and expensive) samples induces an even larger ratio $p/n$ [Arnold et al., 1994]. In statistical finance, portfolio optimization relies on the need to invest on a large number $p$ of assets to reduce volatility but at the same time to estimate the current (rather than past) asset statistics from a relatively small number $n$ of asset return records [Laloux et al., 2000].

As we shall demonstrate in the following section, the fact that in these problems $n$ is not much larger than $p$ annihilates most of the results from standard asymptotic statistics that assume $n$ alone is large [Vaart, 2000]. As a rule of thumb, by “much larger” we mean here that $n$ must be at least 100 times larger than $p$ for standard asymptotic statistics to be of practical convenience (see our argument in Section 1.1.2). Many algorithms in statistics, signal processing, and machine learning are precisely derived from this $n \gg p$ assumption that is no longer appropriate today. A major objective of this book is to cast some light on the resulting biases and problems incurred and to provide a systematic random matrix framework to improve these algorithms.

Possibly more importantly, we will see in this book that (small $p$ ) small-dimensional intuitions at the core of many machine learning algorithms (starting with spectral clustering [Ng et al., 2002, Luxburg, 2007]) may strikingly fail when applied in a simultaneously large $n, p$ setting. A compelling example lies in the notion of “distance” between vectors. Most classification methods in machine learning are rooted in the observation that random data vectors arising from a mixture distribution (say Gaussian) gather in “groups” of close-by vectors in the Euclidean norm. When dealing with large-dimensional data, however, concentration phenomena arise that make Euclidean distances useless, if not counterproductive: Vectors from the same mixture class may be further away in Euclidean distance than vectors arising from different classes. While classification may still be doable, it works in a rather different way from our small-dimensional intuition. The book intends to prepare the reader for the multiple traps caused by this “curse of dimensionality.”

## 计算机代写|机器学习代写machine learning代考|Sample Covariance Matrices in the Large n,p Regime

Let us consider the following example that illustrates a first elementary, yet counterintuitive, result: For simultaneously large $n, p$, the sample covariance matrix $\hat{\mathbf{C}} \in \mathbb{R}^{p \times p}$ based on $n$ samples $\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ is an entry-wise consistent estimator of the population covariance $\mathbf{C} \in \mathbb{R}^{p \times p}$ (i.e., $\|\hat{\mathbf{C}}-\mathbf{C}\|_{\infty} \rightarrow 0$ as $p, n \rightarrow \infty$ for $\|\mathbf{A}\|_{\infty} \equiv \max_{i j}|\mathbf{A}_{i j}|$), while overall being an extremely poor estimator in a (more practical) operator norm sense (i.e., $\|\hat{\mathbf{C}}-\mathbf{C}\| \nrightarrow 0$, with $\|\cdot\|$ being the operator norm here). Matrix norms are, in particular, not equivalent in the large $n, p$ scenario.

Let us detail this claim, in the simplest case where $\mathbf{C}=\mathbf{I}_p$. Consider a dataset $\mathbf{X}=\left[\mathbf{x}_1, \ldots, \mathbf{x}_n\right] \in \mathbb{R}^{p \times n}$ of $n$ independent and identically distributed (i.i.d.) observations from a $p$-dimensional standard Gaussian distribution, that is, $\mathbf{x}_i \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_p\right)$ for $i \in\{1, \ldots, n\}$. We wish to estimate the population covariance matrix $\mathbf{C}=\mathbf{I}_p$ from the $n$ available samples. The maximum likelihood estimator in this zero-mean Gaussian setting is the sample covariance matrix $\hat{\mathbf{C}}$ defined by
$$\hat{\mathbf{C}}=\frac{1}{n} \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^{\top}=\frac{1}{n} \mathbf{X} \mathbf{X}^{\top} .$$
By the strong law of large numbers, for fixed $p$, $\hat{\mathbf{C}} \rightarrow \mathbf{I}_p$ almost surely as $n \rightarrow \infty$, so that $\left\|\hat{\mathbf{C}}-\mathbf{I}_p\right\| \stackrel{\text { a.s. }}{\longrightarrow} 0$ holds for any standard matrix norm and in particular for the operator norm.

One must be more careful when dealing with the case $n, p \rightarrow \infty$ with the ratio $p / n \rightarrow$ $c \in(0, \infty)$ (or, from a practical standpoint, $n$ is not much larger than $p$ ). First, note that the entry-wise convergence still holds since, invoking the law of large numbers again,
$$[\hat{\mathbf{C}}]_{i j}=\frac{1}{n} \sum_{l=1}^n[\mathbf{X}]_{i l}[\mathbf{X}]_{j l} \stackrel{\text { a.s. }}{\longrightarrow} \begin{cases}1, & i=j \\ 0, & i \neq j .\end{cases}$$
Besides, by a concentration inequality argument, it can even be shown that
$$\max_{1 \leq i, j \leq p}\left|\left[\hat{\mathbf{C}}-\mathbf{I}_p\right]_{i j}\right| \stackrel{\text { a.s. }}{\longrightarrow} 0,$$

which holds as long as $p$ is no larger than a polynomial function of $n$, and thus:
$$\left\|\hat{\mathbf{C}}-\mathbf{I}_p\right\|_{\infty} \stackrel{\text { a.s. }}{\longrightarrow} 0 .$$
Consider now the case $p>n$. Since $\hat{\mathbf{C}}=\frac{1}{n} \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^{\top}$ is the sum of $n$ rank-one matrices, the rank of $\hat{\mathbf{C}}$ is at most equal to $n$ and thus, being a $p \times p$ matrix with $p>n$, the sample covariance matrix $\hat{\mathbf{C}}$ must be a singular matrix having at least $p-n>0$ null eigenvalues. As a consequence,
$$\left\|\hat{\mathbf{C}}-\mathbf{I}_p\right\| \geq 1 \nrightarrow 0$$
for $\|\cdot\|$ the matrix operator (or spectral) norm.
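The two norms behave very differently in practice, as the following minimal numpy sketch illustrates (the dimensions $p=800$, $n=400$ are arbitrary illustrative choices with $p>n$).

```python
import numpy as np

# Sketch: with C = I_p and p > n, the sample covariance matrix C_hat is
# entry-wise close to I_p yet far from it in operator (spectral) norm.
rng = np.random.default_rng(0)
p, n = 800, 400                                # p > n: C_hat is necessarily singular
X = rng.standard_normal((p, n))                # columns x_i ~ N(0, I_p)
C_hat = X @ X.T / n

E = C_hat - np.eye(p)
print("entry-wise error    max_ij |E_ij| = %.3f" % np.abs(E).max())
print("operator-norm error        ||E||  = %.3f" % np.linalg.norm(E, 2))
# C_hat has rank at most n, hence at least p - n zero eigenvalues, so that
# ||C_hat - I_p|| >= 1 no matter how large n is (as long as p > n).
```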



## 计算机代写|机器学习代写machine learning代考|COMP4702


## 计算机代写|机器学习代写machine learning代考|Text Clustering

Text clustering methods partition the corpus into groups of related documents belonging to particular topics or categories. However, these categories are not known a priori, because specific examples of desired categories (e.g., politics) of documents are not provided up front. Such learning problems are also referred to as unsupervised, because no guidance is provided to the learning problem. In supervised applications, one might provide examples of news articles belonging to several natural categories like sports, politics, and so on. In the unsupervised setting, the documents are partitioned into similar groups, which is sometimes achieved with a domain-specific similarity function like the cosine measure. In most cases, an optimization model can be formulated, so that some direct or indirect measure of similarity within a cluster is maximized. A detailed discussion of clustering methods is provided in Chapter 4.

Many matrix factorization methods like probabilistic latent semantic analysis and latent Dirichlet allocation also achieve a similar goal of assigning documents to topics, albeit in a soft and probabilistic way. A soft assignment refers to the fact that the probability of assignment of each document to a cluster is determined rather than a hard partitioning of the data into clusters. Such methods not only assign documents to topics but also infer the significance of the words to various topics. In the following, we provide a brief overview of various clustering methods.

Most forms of non-negative matrix factorization methods can be used for clustering text data. Therefore, certain types of matrix factorization methods play the dual role of clustering and dimensionality reduction, although this is not true across every matrix factorization method. Many forms of non-negative matrix factorization are probabilistic mixture models, in which the entries of the document-term matrix are assumed to be generated by a probabilistic process. The parameters of this random process can then be estimated in order to create a factorization of the data, which has a natural probabilistic interpretation. This type of model is also referred to as a generative model because it assumes that the document-term matrix is created by a hidden generative process, and the data are used to estimate the parameters of this process.
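As a concrete (and hedged) illustration of matrix factorization used for soft document clustering, the sketch below applies scikit-learn's TfidfVectorizer and NMF to a toy corpus; the corpus, the number of topics, and the use of plain NMF (rather than probabilistic latent semantic analysis or latent Dirichlet allocation) are illustrative choices, not prescriptions from the text.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Non-negative matrix factorization as soft document clustering on a toy corpus.
docs = [
    "the team won the football match",
    "the player scored a late goal",
    "parliament passed the new budget law",
    "the minister debated the election reform",
]

tfidf = TfidfVectorizer()
D = tfidf.fit_transform(docs)            # non-negative document-term matrix

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(D)                 # document-topic weights (soft assignment)
H = nmf.components_                      # topic-term weights

print("hard cluster of each document:", W.argmax(axis=1))
terms = np.array(tfidf.get_feature_names_out())
for k, row in enumerate(H):
    print("topic", k, "top terms:", terms[row.argsort()[::-1][:3]])
```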

## 计算机代写|机器学习代写machine learning代考|Similarity-Based Algorithms

Similarity-based algorithms are typically either representative-based methods or hierarchical methods. In all these cases, a distance or similarity function between points is used to partition them into clusters in a deterministic way. Representative-based algorithms use representatives in combination with similarity functions in order to perform the clustering. The basic idea is that each cluster is represented by a multi-dimensional vector, which represents the "typical" frequency of words in that cluster. For example, the centroid of a set of documents can be used as its representative. Clusters are then created by assigning each document to its closest representative according to a similarity function such as the cosine similarity. Such algorithms often use iterative techniques in which the cluster representatives are extracted as central points of clusters, whereas the clusters are created from these representatives by using cosine similarity-based assignment. This two-step process is repeated to convergence, and the corresponding algorithm is also referred to as the $k$-means algorithm. There are many variations of representative-based algorithms, although only a small subset of them work with the sparse and high-dimensional representation of text. Nevertheless, one can use a broader variety of methods if one is willing to transform the text data to a reduced representation with dimensionality reduction techniques.
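A minimal sketch of such a representative-based scheme is given below: L2-normalizing the tf-idf vectors makes standard Euclidean $k$-means assign each document to the representative (centroid) with the highest cosine similarity, up to a monotone transform of the distance. The toy corpus and this normalization trick are illustrative choices, not the only possible variant.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

# Representative-based (k-means style) clustering with cosine-like assignment.
docs = [
    "stocks rallied as markets opened higher",
    "the central bank raised interest rates",
    "the striker scored twice in the final",
    "the coach praised the defensive line",
]

X = normalize(TfidfVectorizer().fit_transform(docs))   # unit-norm document vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("cluster labels:", km.labels_)                   # iterative assignment result
print("representatives (centroids) shape:", km.cluster_centers_.shape)
```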

In hierarchical clustering algorithms, similar pairs of clusters are aggregated into larger clusters using an iterative approach. The approach starts by assigning each document to its own cluster and then merges the closest pair of clusters together. There are many variations in terms of how the pairwise similarity between clusters is computed, which has a direct impact on the type of clusters discovered by the algorithm. In many cases, hierarchical clustering algorithms can be combined with representative clustering methods to create more robust methods.
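The bottom-up merging process can be sketched with scipy's hierarchical clustering routines; average linkage on cosine distances is one illustrative choice among the many linkage variants mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

# Agglomerative clustering: start from singleton documents and repeatedly merge
# the closest pair of clusters (average linkage on cosine distance).
docs = [
    "the senate voted on the tax bill",
    "the president signed the trade agreement",
    "the goalkeeper saved a penalty kick",
    "fans celebrated the championship title",
]

X = TfidfVectorizer().fit_transform(docs).toarray()
Z = linkage(X, method="average", metric="cosine")        # full merge history
print("flat clusters at 2 groups:", fcluster(Z, t=2, criterion="maxclust"))
```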
