
## Mathematical Models of Learning

This chapter introduces different mathematical models of learning. A mathematical model of learning has the advantage that it provides bounds on the generalization ability of a learning algorithm; it also indicates which quantities are responsible for generalization. As such, the theory motivates new learning algorithms. After a short introduction to the classical parametric statistics approach to learning, the chapter introduces the PAC and VC models. These models directly study the convergence of expected risks rather than taking a detour over the convergence of the underlying probability measure. The fundamental quantity in this framework is the growth function, which can be upper bounded by a single-integer summary called the VC dimension. With classical structural risk minimization, where the VC dimension must be known before the training data arrives, we obtain a-priori bounds, that is, bounds whose values are the same for a fixed training error.

In order to explain the generalization behavior of algorithms minimizing a regularized risk we will introduce the luckiness framework. This framework is based on the assumption that the growth function will be estimated on the basis of a sample. Thus, it provides a-posteriori bounds; bounds which can only be evaluated after the training data has been seen. Finally, the chapter presents a PAC analysis for real-valued functions. Here, we take advantage of the fact that, in the case of linear classifiers, the classification is carried out by thresholding a real-valued function. The real-valued output, also referred to as the margin, allows us to define a scale-sensitive version of the VC dimension which leads to tighter bounds on the expected risk. An appealing feature of the margin bound is that we can obtain nontrivial bounds even if the number of training samples is significantly less than the number of dimensions of feature space. Using a technique known as the robustness trick, it will be demonstrated that the margin bound is also applicable if one allows for training errors via a quadratic penalization of the diagonal of the Gram matrix.

## Generative vs. Discriminative Models

In Chapter 2 it was shown that a learning problem is given by a training sample $z=(\boldsymbol{x}, \boldsymbol{y})=\left(\left(x_{1}, y_{1}\right), \ldots,\left(x_{m}, y_{m}\right)\right) \in(\mathcal{X} \times \mathcal{Y})^{m}=\mathcal{Z}^{m}$, drawn iid according to some (unknown) probability measure $\mathbf{P}_{\mathrm{Z}}=\mathbf{P}_{\mathrm{XY}}$, and a loss $l: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}$, which defines how costly the prediction $h(x)$ is if the true output is $y$. Then, the goal is to find a deterministic function $h \in \mathcal{Y}^{\mathcal{X}}$ which expresses the dependency implicitly expressed by $\mathbf{P}_{\mathrm{Z}}$ with minimal expected loss (risk) $R[h]=\mathbf{E}_{\mathrm{XY}}[l(h(\mathrm{X}), \mathrm{Y})]$ while only using the given training sample $z$. We have already seen in the first part of this book that there exist two different algorithmic approaches to tackling this problem. We shall now try to study the two approaches more generally to see in what respects they are similar and in which aspects they differ.
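The quantities just defined can be made concrete in a few lines. The sketch below computes the empirical counterpart of the risk, $R_{\mathrm{emp}}[h, z]=\frac{1}{m} \sum_{i=1}^{m} l\left(h\left(x_{i}\right), y_{i}\right)$, on a toy sample; the sample, the threshold hypothesis, and all function names are illustrative assumptions, not taken from the text.

```python
def empirical_risk(h, z, loss):
    """Average loss of hypothesis h over the sample z = [(x_i, y_i)]."""
    return sum(loss(h(x), y) for x, y in z) / len(z)

def zero_one(y_hat, y):
    """Zero-one loss l_{0-1}: 1 if the prediction is wrong, else 0."""
    return 0.0 if y_hat == y else 1.0

z = [(0.5, 1), (1.2, 1), (-0.3, -1), (0.1, -1)]  # toy (x_i, y_i) pairs
h = lambda x: 1 if x > 0 else -1                 # one fixed hypothesis
print(empirical_risk(h, z, zero_one))            # misclassifies (0.1, -1)
```

The expected risk $R[h]$ replaces the average over the sample by an expectation over the unknown measure $\mathbf{P}_{\mathrm{XY}}$; the whole learning problem is that only the sample average is observable.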

1. In the generative (or parametric) statistics approach we restrict ourselves to a parameterized space $\mathcal{P}$ of measures for the space $\mathcal{Z}$, i.e., we model the data generation process. Hence, our model is given by ${ }^{1}$ $\mathcal{P}=\left\{\mathbf{P}_{\mathrm{Z} \mid \mathbf{Q}=\theta} \mid \theta \in \mathcal{Q}\right\}$, where $\theta$ should be understood as the parametric description of the measure $\mathbf{P}_{\mathrm{Z} \mid \mathbf{Q}=\theta}$. With a fixed loss $l$ each measure $\mathbf{P}_{\mathrm{Z} \mid \mathbf{Q}=\theta}$ implicitly defines a decision function $h_{\theta}$,
$$h_{\theta}(x)=\underset{y \in \mathcal{Y}}{\operatorname{argmin}} \mathbf{E}_{\mathrm{Y} \mid \mathrm{X}=x, \mathbf{Q}=\theta}[l(y, \mathrm{Y})] .$$
In order to see that this function has minimal expected risk we note that
$$R_{\theta}[h] \stackrel{\text { def }}{=} \mathbf{E}_{\mathrm{XY} \mid \mathbf{Q}=\theta}[l(h(\mathrm{X}), \mathrm{Y})]=\mathbf{E}_{\mathrm{X} \mid \mathbf{Q}=\theta}\left[\mathbf{E}_{\mathrm{Y} \mid \mathrm{X}=x, \mathbf{Q}=\theta}[l(h(x), \mathrm{Y})]\right],$$
where $h_{\theta}$ minimizes the expression in the innermost brackets. For the case of zero-one loss $l_{0-1}(h(x), y)=\mathbf{I}_{h(x) \neq y}$, also defined in equation $(2.10)$, the function $h_{\theta}$ reduces to
$$h_{\theta}(x)=\underset{y \in \mathcal{Y}}{\operatorname{argmin}}\left(1-\mathbf{P}_{\mathrm{Y} \mid \mathrm{X}=x, \mathbf{Q}=\theta}(y)\right)=\underset{y \in \mathcal{Y}}{\operatorname{argmax}} \mathbf{P}_{\mathrm{Y} \mid \mathrm{X}=x, \mathbf{Q}=\theta}(y),$$
which is known as the Bayes optimal decision based on $\mathbf{P}_{\mathrm{Z} \mid \mathbf{Q}=\theta}$.
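As a minimal sketch of this Bayes optimal decision under zero-one loss: pick the class with the largest posterior probability. The logistic form assumed for $\mathbf{P}_{\mathrm{Y} \mid \mathrm{X}=x, \mathbf{Q}=\theta}$ below is purely illustrative, not a model prescribed by the text.

```python
import math

def posterior(x, theta):
    """Assumed logistic posterior P(Y=y | X=x, Q=theta) over Y = {-1, +1}."""
    p_plus = 1.0 / (1.0 + math.exp(-theta * x))
    return {+1: p_plus, -1: 1.0 - p_plus}

def bayes_decision(x, theta):
    """h_theta(x) = argmax_y P(Y=y | X=x, Q=theta)  (zero-one loss)."""
    p = posterior(x, theta)
    return max(p, key=p.get)

print(bayes_decision(2.0, theta=1.0))   # posterior mass favors y = +1
```

For any other loss $l$, the `max` over posterior mass is replaced by the `argmin` of the conditional expected loss, exactly as in the first equation above.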
2. In the discriminative, or machine learning, approach we restrict ourselves to a parameterized space $\mathcal{H} \subseteq \mathcal{Y}^{\mathcal{X}}$ of deterministic mappings $h$ from $\mathcal{X}$ to $\mathcal{Y}$. As a consequence, the model is given by $\mathcal{H}=\left\{h_{\mathbf{w}}: \mathcal{X} \rightarrow \mathcal{Y} \mid \mathbf{w} \in \mathcal{W}\right\}$, where $\mathbf{w}$ is the parameterization of single hypotheses $h_{\mathbf{w}}$. Note that this can also be interpreted as

a model of the conditional distribution of classes $y \in \mathcal{Y}$ given objects $x \in \mathcal{X}$ by assuming that $\mathbf{P}_{\mathrm{Y} \mid \mathrm{X}=x, \mathrm{H}=h}=\mathbf{I}_{y=h(x)}$. Viewed this way, the model $\mathcal{H}$ is a subset of the more general model $\mathcal{P}$ used in classical statistics.

The term generative refers to the fact that the model $\mathcal{P}$ contains different descriptions of the generation of the training sample $z$ (in terms of a probability measure). Similarly, the term discriminative refers to the fact that the model $\mathcal{H}$ consists of different descriptions of the discrimination of the sample $z$. We already know that a machine learning method selects one hypothesis $\mathcal{A}(z) \in \mathcal{H}$ given a training sample $z \in \mathcal{Z}^{m}$. The corresponding selection mechanism of a probability measure $\mathbf{P}_{\mathrm{Z} \mid \mathbf{Q}=\theta}$ given the training sample $z$ is called an estimator.
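A discriminative model $\mathcal{H}=\{h_{\mathbf{w}} \mid \mathbf{w} \in \mathcal{W}\}$ can be sketched as a family of deterministic mappings indexed by a parameter vector; the linear-threshold form chosen below is an illustrative assumption, not a class prescribed by the text.

```python
def h(w, x):
    """h_w(x) = sign(<w, x>), mapping X = R^n to Y = {-1, +1}."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if dot >= 0 else -1

w = (1.0, -0.5)            # one particular hypothesis h_w in H
print(h(w, (2.0, 1.0)))    # <w, x> = 1.5, so the output is +1
```

Note that each such $h_{\mathbf{w}}$ induces the degenerate conditional measure $\mathbf{P}_{\mathrm{Y} \mid \mathrm{X}=x}=\mathbf{I}_{y=h_{\mathbf{w}}(x)}$ mentioned above, which is how $\mathcal{H}$ embeds into the generative model class $\mathcal{P}$.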

## Classical PAC and VC Analysis

In the following three subsections we will only be concerned with the zero-one loss $l_{0-1}$ given by equation $(2.10)$. It should be noted that the results we will obtain can readily be generalized to loss functions taking only a finite number of values; the generalization to the case of real-valued loss functions is conceptually similar but will not be discussed in this book (see Section $4.5$ for further references).

The general idea is to bound the probability of “bad training samples”, i.e., training samples $z \in \mathcal{Z}^{m}$ for which there exists a hypothesis $h \in \mathcal{H}$ where the deviation between the empirical risk $R_{\text {emp }}[h, z]$ and the expected risk $R[h]$ is larger than some prespecified $\varepsilon \in[0,1]$. Setting the probability of this to $\delta$ and solving for $\varepsilon$ gives the required generalization error bound. If we are only given a finite number $|\mathcal{H}|$ of hypotheses $h$ then such a bound is very easily obtained by a combination of Hoeffding’s inequality and the union bound.

Theorem 4.6 (VC bound for finite hypothesis spaces) Suppose we are given a hypothesis space $\mathcal{H}$ having a finite number of hypotheses, i.e., $|\mathcal{H}|<\infty$. Then, for any measure $\mathbf{P}_{\mathrm{Z}}$, for all $\varepsilon>0$ and all training sample sizes $m \in \mathbb{N}$, over the random draw of the training sample $z \in \mathcal{Z}^{m}$ we have
$$\mathbf{P}_{\mathrm{Z}^{m}}\left(\exists h \in \mathcal{H}:\left|R[h]-R_{\mathrm{emp}}[h, \mathbf{Z}]\right|>\varepsilon\right)<2 \cdot|\mathcal{H}| \cdot \exp \left(-2 m \varepsilon^{2}\right) .$$
Proof Let $\mathcal{H}=\left\{h_{1}, \ldots, h_{|\mathcal{H}|}\right\}$. By an application of the union bound given in Theorem A.107 we know that $\mathbf{P}_{\mathrm{Z}^{m}}\left(\exists h \in \mathcal{H}:\left|R[h]-R_{\mathrm{emp}}[h, \mathbf{Z}]\right|>\varepsilon\right)$ is given by
$$\mathbf{P}_{\mathrm{Z}^{m}}\left(\bigvee_{i=1}^{|\mathcal{H}|}\left(\left|R\left[h_{i}\right]-R_{\mathrm{emp}}\left[h_{i}, \mathbf{Z}\right]\right|>\varepsilon\right)\right) \leq \sum_{i=1}^{|\mathcal{H}|} \mathbf{P}_{\mathrm{Z}^{m}}\left(\left|R\left[h_{i}\right]-R_{\mathrm{emp}}\left[h_{i}, \mathbf{Z}\right]\right|>\varepsilon\right) .$$
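The bound is turned into a generalization error bound exactly as described above: set the right-hand side $2|\mathcal{H}| \exp(-2m\varepsilon^{2})$ equal to a confidence $\delta$ and solve for $\varepsilon$, giving $\varepsilon=\sqrt{\ln(2|\mathcal{H}|/\delta)/(2m)}$. The numbers in the sketch below are illustrative, not from the text.

```python
import math

def vc_bound_epsilon(card_h, m, delta):
    """Deviation eps solving 2 * card_h * exp(-2*m*eps**2) == delta."""
    return math.sqrt(math.log(2 * card_h / delta) / (2 * m))

# With probability at least 1 - delta, every h in H satisfies
# |R[h] - R_emp[h, z]| <= eps on a sample of size m.
eps = vc_bound_epsilon(card_h=1000, m=10000, delta=0.05)
print(round(eps, 3))
```

Note the logarithmic dependence on $|\mathcal{H}|$: squaring the number of hypotheses only doubles the $\ln(2|\mathcal{H}|/\delta)$ term, which is why the finite-class bound remains useful for fairly large hypothesis spaces.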

