## Optimization for Machine Learning | Least Squares

The most important gradient formula is that of the square loss (3), which can be obtained by expanding the norm
$$\begin{aligned} f(x+\varepsilon) &=\frac{1}{2}\|A x-y+A \varepsilon\|^2=\frac{1}{2}\|A x-y\|^2+\langle A x-y, A \varepsilon\rangle+\frac{1}{2}\|A \varepsilon\|^2 \\ &=f(x)+\left\langle\varepsilon, A^{\top}(A x-y)\right\rangle+o(\|\varepsilon\|) . \end{aligned}$$
Here, we have used the fact that $\|A \varepsilon\|^2=o(\|\varepsilon\|)$ and introduced the transpose matrix $A^{\top}$. This matrix is obtained by exchanging the rows and the columns, i.e. $A^{\top}=\left(A_{j, i}\right)_{i=1, \ldots, p}^{j=1, \ldots, n}$, but the way it should be remembered and used is that it obeys the following swapping rule of the inner product,
$$\forall(u, v) \in \mathbb{R}^p \times \mathbb{R}^n, \quad\langle A u, v\rangle_{\mathbb{R}^n}=\left\langle u, A^{\top} v\right\rangle_{\mathbb{R}^p} .$$
Computing the gradient of a function involving a linear operator necessarily requires such a transposition step. This computation shows that
$$\nabla f(x)=A^{\top}(A x-y)$$
This implies that any solution $x^{\star}$ minimizing $f$ satisfies the linear system $\left(A^{\top} A\right) x^{\star}=A^{\top} y$. If $A^{\top} A \in \mathbb{R}^{p \times p}$ is invertible, then $f$ has a single minimizer, namely
$$x^{\star}=\left(A^{\top} A\right)^{-1} A^{\top} y .$$
This shows that in this case, $x^{\star}$ depends linearly on the data $y$, and the corresponding linear operator $\left(A^{\top} A\right)^{-1} A^{\top}$ is often called the Moore-Penrose pseudo-inverse of $A$ ($A$ itself is not invertible in general, since typically $p \neq n$). The condition that $A^{\top} A$ is invertible is equivalent to $\operatorname{ker}(A)=\{0\}$, since
$$A^{\top} A x=0 \quad \Longrightarrow \quad \|A x\|^2=\left\langle A^{\top} A x, x\right\rangle=0 \quad \Longrightarrow \quad A x=0 .$$
In particular, if $n<p$ (under-determined regime, there are too many parameters or too few data) this can never hold. If $n \geqslant p$ and the features $a_i$ are "random" then $\operatorname{ker}(A)=\{0\}$ with probability one. In this overdetermined situation $n \geqslant p$, $\operatorname{ker}(A) \neq\{0\}$ only holds if the features $\left\{a_i\right\}_{i=1}^n$ span a linear space $\operatorname{Im}\left(A^{\top}\right)$ of dimension strictly smaller than the ambient dimension $p$.
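As an illustration of the normal equations above, here is a minimal NumPy sketch. The random matrix `A`, the data `y`, the sizes `n`, `p` and the seed are assumptions made up for the example; the solution of $\left(A^{\top} A\right) x^{\star}=A^{\top} y$ is compared with NumPy's own least-squares routine.

```python
import numpy as np

# Hypothetical data: n observations, p parameters, overdetermined (n >= p).
rng = np.random.default_rng(0)
n, p = 50, 5
A = rng.standard_normal((n, p))   # rows play the role of the features a_i
y = rng.standard_normal(n)

def grad_f(x):
    """Gradient of f(x) = 0.5 * ||A x - y||^2, i.e. A^T (A x - y)."""
    return A.T @ (A @ x - y)

# Swapping rule of the inner product: <A u, v> = <u, A^T v>.
u, v = rng.standard_normal(p), rng.standard_normal(n)
swap_gap = abs((A @ u) @ v - u @ (A.T @ v))

# Solve the normal equations (A^T A) x = A^T y without forming an inverse.
x_star = np.linalg.solve(A.T @ A, A.T @ y)

# Reference solution from NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

grad_norm = np.linalg.norm(grad_f(x_star))
print(swap_gap, grad_norm, np.linalg.norm(x_star - x_ref))
```

Solving the linear system with `np.linalg.solve` rather than computing $\left(A^{\top} A\right)^{-1}$ explicitly is a common numerical choice; it is not something the text prescribes.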

## Link with PCA

Let us assume the $\left(a_i\right)_{i=1}^n$ are centered, i.e. $\sum_i a_i=0$. If this is not the case, one needs to replace $a_i$ by $a_i-m$ where $m \stackrel{\text { def. }}{=} \frac{1}{n} \sum_{i=1}^n a_i \in \mathbb{R}^p$ is the empirical mean. In this case, $\frac{C}{n}=A^{\top} A / n \in \mathbb{R}^{p \times p}$ is the empirical covariance of the point cloud $\left(a_i\right)_i$; it encodes the covariances between the coordinates of the points. Denoting $a_i=\left(a_{i, 1}, \ldots, a_{i, p}\right)^{\top} \in \mathbb{R}^p$ the coordinates (so that $A=\left(a_{i, j}\right)_{i, j}$), one has
$$\forall(k, \ell) \in\{1, \ldots, p\}^2, \quad \frac{C_{k, \ell}}{n}=\frac{1}{n} \sum_{i=1}^n a_{i, k} a_{i, \ell} .$$
In particular, $C_{k, k} / n$ is the variance along the axis $k$. More generally, for any unit vector $u \in \mathbb{R}^p$, $\langle C u, u\rangle / n \geqslant 0$ is the variance along the axis $u$.
For instance, in dimension $p=2$,
$$\frac{C}{n}=\frac{1}{n}\begin{pmatrix}\sum_{i=1}^n a_{i, 1}^2 & \sum_{i=1}^n a_{i, 1} a_{i, 2} \\ \sum_{i=1}^n a_{i, 1} a_{i, 2} & \sum_{i=1}^n a_{i, 2}^2\end{pmatrix} .$$
Since $C$ is symmetric, it diagonalizes in an orthonormal basis $U=\left(u_1, \ldots, u_p\right) \in \mathbb{R}^{p \times p}$. Here, the vectors $u_k \in \mathbb{R}^p$ are stored in the columns of the matrix $U$. The diagonalization means that there exist scalars (the eigenvalues) $\left(\lambda_1, \ldots, \lambda_p\right)$ so that $\left(\frac{1}{n} C\right) u_k=\lambda_k u_k$. Since the matrix $U$ is orthogonal, $U U^{\top}=U^{\top} U=\mathrm{Id}_p$, and equivalently $U^{-1}=U^{\top}$. The diagonalization property can be conveniently written as $\frac{1}{n} C=U \operatorname{diag}\left(\lambda_k\right) U^{\top}$. One can thus re-write the covariance quadratic form in the basis $U$ as a separable sum of $p$ squares
$$\frac{1}{n}\langle C x, x\rangle=\left\langle U \operatorname{diag}\left(\lambda_k\right) U^{\top} x, x\right\rangle=\left\langle\operatorname{diag}\left(\lambda_k\right)\left(U^{\top} x\right),\left(U^{\top} x\right)\right\rangle=\sum_{k=1}^p \lambda_k\left\langle x, u_k\right\rangle^2 .$$
Here $\left(U^{\top} x\right)_k=\left\langle x, u_k\right\rangle$ is the coordinate $k$ of $x$ in the basis $U$. Since $\langle C x, x\rangle=\|A x\|^2 \geqslant 0$, taking $x=u_k$ gives $\lambda_k=\frac{1}{n}\left\|A u_k\right\|^2$, which shows that all the eigenvalues are non-negative, $\lambda_k \geqslant 0$.
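The diagonalization of the empirical covariance can be checked numerically. The following is a small sketch on a synthetic point cloud (the sizes, seed and anisotropic scaling are assumptions for the example), using `np.linalg.eigh`, NumPy's eigen-solver for symmetric matrices.

```python
import numpy as np

# Hypothetical point cloud: n points in dimension p, stretched along the axes.
rng = np.random.default_rng(1)
n, p = 200, 3
A = rng.standard_normal((n, p)) @ np.diag([3.0, 1.0, 0.2])
A = A - A.mean(axis=0)          # center the points so that sum_i a_i = 0

cov = A.T @ A / n               # empirical covariance C / n

# Eigenvalues (ascending) and orthonormal eigenvectors stored in the columns of U.
lam, U = np.linalg.eigh(cov)

# Covariance quadratic form, computed directly and as a separable sum of squares.
x = rng.standard_normal(p)
quad_direct = x @ cov @ x
quad_eig = np.sum(lam * (U.T @ x) ** 2)

print(lam, quad_direct, quad_eig)
```

The eigenvector associated with the largest eigenvalue (the last column of `U` here) is the direction of largest variance of the point cloud.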

## Derivative and gradient

If $f$ is differentiable along each axis, we denote
$$\nabla f(x) \stackrel{\text { def. }}{=}\left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_p}\right)^{\top} \in \mathbb{R}^p$$
the gradient vector, so that $\nabla f: \mathbb{R}^p \rightarrow \mathbb{R}^p$ is a vector field. Here the partial derivatives (when they exist) are defined as
$$\frac{\partial f(x)}{\partial x_k} \stackrel{\text { def. }}{=} \lim _{\eta \rightarrow 0} \frac{f\left(x+\eta \delta_k\right)-f(x)}{\eta}$$
where $\delta_k=(0, \ldots, 0,1,0, \ldots, 0)^{\top} \in \mathbb{R}^p$ is the $k^{\text {th }}$ canonical basis vector.
Beware that $\nabla f(x)$ can exist without $f$ being differentiable. Differentiability of $f$ at each $x$ reads
$$f(x+\varepsilon)=f(x)+\langle\varepsilon, \nabla f(x)\rangle+o(\|\varepsilon\|) .$$
Here $R(\varepsilon)=o(\|\varepsilon\|)$ denotes a quantity which decays faster than $\|\varepsilon\|$ toward 0, i.e. $\frac{R(\varepsilon)}{\|\varepsilon\|} \rightarrow 0$ as $\varepsilon \rightarrow 0$. Existence of the partial derivatives corresponds to $f$ being differentiable along the axes, while differentiability should hold for any sequence $\varepsilon \rightarrow 0$ (i.e. not only along a fixed direction). A counter-example in 2-D is $f(x)=\frac{2 x_1 x_2\left(x_1+x_2\right)}{x_1^2+x_2^2}$ with $f(0)=0$, which is linear along each radial line, with a slope that depends on the line. Its two partial derivatives at $0$ vanish, so differentiability would force every directional slope to vanish, which is not the case along the diagonal.
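The counter-example can be probed numerically. The sketch below (plain Python; the step `eta` is an arbitrary choice for the example) estimates the two partial derivatives at the origin and the slope along the diagonal direction $u=(1,1)/\sqrt{2}$, then compares that slope with the value $\langle\nabla f(0), u\rangle$ that differentiability would require.

```python
import math

def f(x1, x2):
    """The 2-D counter-example, extended by f(0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return 2.0 * x1 * x2 * (x1 + x2) / (x1 ** 2 + x2 ** 2)

eta = 1e-6  # small step used for the difference quotients

# Partial derivatives at the origin: f vanishes identically on both axes.
d1 = (f(eta, 0.0) - f(0.0, 0.0)) / eta
d2 = (f(0.0, eta) - f(0.0, 0.0)) / eta

# Slope along the diagonal direction u = (1, 1) / sqrt(2).
u = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
slope_diag = (f(eta * u[0], eta * u[1]) - f(0.0, 0.0)) / eta

# If f were differentiable at 0, the slope would equal <grad f(0), u>.
predicted = d1 * u[0] + d2 * u[1]

print(d1, d2, slope_diag, predicted)
```

The gradient vector built from the partial derivatives is zero, while the slope along the diagonal is close to $\sqrt{2}$, so the first-order expansion fails at the origin.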

Also, $\nabla f(x)$ is the only vector such that relation (7) holds. This means that a possible strategy to both prove that $f$ is differentiable and obtain a formula for $\nabla f(x)$ is to show a relation of the form
$$f(x+\varepsilon)=f(x)+\langle\varepsilon, g\rangle+o(\|\varepsilon\|),$$
in which case one necessarily has $\nabla f(x)=g$.
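This identification strategy can be tested numerically on the square loss $f(x)=\frac{1}{2}\|A x-y\|^2$ with the candidate $g=A^{\top}(A x-y)$. In the sketch below (random `A`, `y`, `x` and direction `d` are assumptions for the example), the remainder $f(x+\varepsilon)-f(x)-\langle\varepsilon, g\rangle$ is divided by $\|\varepsilon\|$ for shrinking $\varepsilon$; the ratio should go to zero.

```python
import numpy as np

# Hypothetical small problem.
rng = np.random.default_rng(2)
n, p = 8, 4
A = rng.standard_normal((n, p))
y = rng.standard_normal(n)
x = rng.standard_normal(p)

def f(x):
    return 0.5 * np.sum((A @ x - y) ** 2)

g = A.T @ (A @ x - y)            # candidate gradient at x

d = rng.standard_normal(p)
d /= np.linalg.norm(d)           # unit direction, so ||t * d|| = t

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    eps = t * d
    remainder = f(x + eps) - f(x) - eps @ g
    ratios.append(abs(remainder) / t)   # |R(eps)| / ||eps||

print(ratios)
```

For this quadratic function the remainder is exactly $\frac{1}{2}\|A \varepsilon\|^2$, so the ratio shrinks linearly with $\|\varepsilon\|$.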
The following proposition shows that convexity is equivalent to the graph of the function being above its tangents.

## First Order Conditions

The main theoretical interest of the gradient vector (we will see later that it also has algorithmic interest) is that its vanishing is a necessary condition for optimality, as stated below.

Proposition 2. If $x^{\star}$ is a local minimum of the function $f$ (i.e. that $f\left(x^{\star}\right) \leqslant f(x)$ for all $x$ in some ball around $x^{\star}$ ) then
$$\nabla f\left(x^{\star}\right)=0 .$$
Proof. For $\varepsilon>0$ small enough and $u$ fixed, one has
$$f\left(x^{\star}\right) \leqslant f\left(x^{\star}+\varepsilon u\right)=f\left(x^{\star}\right)+\varepsilon\left\langle\nabla f\left(x^{\star}\right), u\right\rangle+o(\varepsilon) \quad \Longrightarrow\left\langle\nabla f\left(x^{\star}\right), u\right\rangle \geqslant o(1) \quad \Longrightarrow \quad\left\langle\nabla f\left(x^{\star}\right), u\right\rangle \geqslant 0 .$$
Applying this to both $u$ and $-u$ shows that $\left\langle\nabla f\left(x^{\star}\right), u\right\rangle=0$ for all $u$, and hence $\nabla f\left(x^{\star}\right)=0$.

Note that the converse is not true in general, since one might have $\nabla f(x)=0$ while $x$ is not a local minimum. For instance $x=0$ for $f(x)=-x^2$ (here $x$ is a maximizer) or $f(x)=x^3$ (here $x$ is neither a maximizer nor a minimizer, it is a saddle point), see Fig. 6. Note however that in practice, if $\nabla f\left(x^{\star}\right)=0$ but $x^{\star}$ is not a local minimum, then $x^{\star}$ tends to be an unstable equilibrium. Thus most often a gradient-based algorithm will converge to points with $\nabla f\left(x^{\star}\right)=0$ that are local minimizers. The following proposition shows that a much stronger result holds if $f$ is convex.
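The instability of such stationary points can be illustrated with a few lines of plain Python. This is only a sketch: the fixed step size, the number of iterations and the tiny perturbation are arbitrary choices for the example, and `gradient_descent` is a hypothetical helper, not a method defined in the text.

```python
def gradient_descent(grad, x0, step=0.1, n_iter=50):
    """Plain fixed-step gradient descent in 1-D (illustrative helper)."""
    x = x0
    for _ in range(n_iter):
        x = x - step * grad(x)
    return x

def grad_concave(x):
    return -2.0 * x          # derivative of f(x) = -x^2

def grad_cubic(x):
    return 3.0 * x ** 2      # derivative of f(x) = x^3

# Started exactly at the stationary point x = 0 of -x^2, the iterates do not move.
at_zero = gradient_descent(grad_concave, 0.0)

# Started from a tiny perturbation, the iterates are pushed away from 0.
escaped = gradient_descent(grad_concave, 1e-6)

# For x^3 the derivative vanishes at 0, yet nearby points have lower value.
cubic_gap = (-0.1) ** 3 - 0.0 ** 3

print(at_zero, escaped, grad_cubic(0.0), cubic_gap)
```

For $f(x)=-x^2$ each iteration multiplies the iterate by $1.2$ with this step size, so any nonzero perturbation grows geometrically.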

Proposition 3. If $f$ is convex and $x^{\star}$ a local minimum, then $x^{\star}$ is also a global minimum. If $f$ is differentiable and convex,
$$x^{\star} \in \underset{x}{\operatorname{argmin}} f(x) \Longleftrightarrow \nabla f\left(x^{\star}\right)=0 .$$
Proof. For any $x$, there exists $0<t<1$ small enough such that $t x+(1-t) x^{\star}$ is close enough to $x^{\star}$, and so, since $x^{\star}$ is a local minimizer,
$$f\left(x^{\star}\right) \leqslant f\left(t x+(1-t) x^{\star}\right) \leqslant t f(x)+(1-t) f\left(x^{\star}\right) \quad \Longrightarrow \quad f\left(x^{\star}\right) \leqslant f(x)$$
and thus $x^{\star}$ is a global minimum.
For the second part, we already saw in (2) the $\Rightarrow$ part. Conversely, we assume that $\nabla f\left(x^{\star}\right)=0$. Since the graph of $f$ is above its tangent by convexity (as stated in Proposition 1),
$$f(x) \geqslant f\left(x^{\star}\right)+\left\langle\nabla f\left(x^{\star}\right), x-x^{\star}\right\rangle=f\left(x^{\star}\right) .$$
Thus in this case, optimizing a function is the same as solving the equation $\nabla f(x)=0$ (actually $p$ equations in $p$ unknowns). In most cases it is impossible to solve this equation in closed form, but it often provides interesting information about solutions $x^{\star}$.
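For the convex square loss, the equation $\nabla f(x)=0$ happens to be solvable, which gives a convenient sanity check. The sketch below (random data, and a step size $1/\|A\|^2$ chosen as an assumption for the example, with $\|A\|$ the largest singular value) runs fixed-step gradient descent and compares the result with the solution of $\left(A^{\top} A\right) x=A^{\top} y$.

```python
import numpy as np

# Hypothetical overdetermined problem.
rng = np.random.default_rng(3)
n, p = 30, 4
A = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Point where the gradient A^T (A x - y) vanishes.
x_star = np.linalg.solve(A.T @ A, A.T @ y)

# Assumed step size: inverse of the largest eigenvalue of A^T A.
step = 1.0 / np.linalg.norm(A, 2) ** 2

x = np.zeros(p)
for _ in range(2000):
    x = x - step * (A.T @ (A @ x - y))   # gradient step

err = np.linalg.norm(x - x_star)
grad_norm = np.linalg.norm(A.T @ (A @ x - y))
print(err, grad_norm)
```

Because $f$ is convex here, the point with vanishing gradient that the iterates approach is a global minimizer, in line with Proposition 3.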

## Basics of Convex Analysis

In general, there might be no solution to the optimization problem (1). This is of course the case if $f$ is unbounded from below, for instance $f(x)=-x^2$, in which case the value of the minimum is $-\infty$. But this might also happen if $f$ does not grow at infinity, for instance $f(x)=e^{-x}$, for which $\inf f=0$ but there is no minimizer. In order to show existence of a minimizer, and that the set of minimizers is bounded (otherwise one can have problems with optimization algorithms that could escape to infinity), one needs to show that one can replace the whole space $\mathbb{R}^p$ by a compact subset $\Omega \subset \mathbb{R}^p$ (i.e. $\Omega$ is bounded and closed) and that $f$ is continuous on $\Omega$ (one can replace this by a weaker condition, that $f$ is lower semi-continuous, but we ignore this here). A way to show that one can consider only a bounded set is to show that $f(x) \rightarrow+\infty$ when $\|x\| \rightarrow+\infty$. Such a function is called coercive. In this case, one can choose any $x_0 \in \mathbb{R}^p$ and consider its associated lower-level set
$$\Omega=\left\{x \in \mathbb{R}^p \;;\; f(x) \leqslant f\left(x_0\right)\right\}$$
which is bounded because of coercivity, and closed because $f$ is continuous. One can actually show that for a convex function, having a non-empty bounded set of minimizers is equivalent to the function being coercive (this is not the case for non-convex functions, for instance $f(x)=\min \left(1, x^2\right)$ has a single minimizer but is not coercive).
Example 1 (Least squares). For the quadratic loss function $f(x)=\frac{1}{2}\|A x-y\|^2$, coercivity holds if and only if $\operatorname{ker}(A)=\{0\}$ (this corresponds to the overdetermined setting). Indeed, if $\operatorname{ker}(A) \neq\{0\}$ and $x^{\star}$ is a solution, then $x^{\star}+u$ is also a solution for any $u \in \operatorname{ker}(A)$, so that the set of minimizers is unbounded. On the contrary, if $\operatorname{ker}(A)=\{0\}$, we will show later that the minimizer is unique, see Fig. 3. If $\ell$ is strictly convex, the same conclusion holds in the case of classification.
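The lack of coercivity when $\operatorname{ker}(A) \neq\{0\}$ can be observed on a small under-determined example. In the sketch below (the sizes, seed and shift values are assumptions for the example), a kernel direction `u` is read off the singular value decomposition, and the loss is evaluated at minimizers shifted arbitrarily far along `u`.

```python
import numpy as np

# Hypothetical under-determined problem: n < p, hence ker(A) != {0}.
rng = np.random.default_rng(4)
n, p = 3, 5
A = rng.standard_normal((n, p))
y = rng.standard_normal(n)

def f(x):
    return 0.5 * np.sum((A @ x - y) ** 2)

# One particular minimizer, obtained with the pseudo-inverse.
x_star = np.linalg.pinv(A) @ y

# With the full SVD, the last rows of Vt span ker(A) because rank(A) <= n < p.
_, s, Vt = np.linalg.svd(A)
u = Vt[-1]                       # unit vector with A u = 0 (up to rounding)

# Loss values at minimizers shifted further and further along the kernel.
values = [f(x_star + t * u) for t in (0.0, 1.0, 1e3)]

print(np.linalg.norm(A @ u), values)
```

All shifted points reach the same (minimal) loss, so the set of minimizers contains a whole line and is unbounded.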

## Convexity

Convex functions define the main class of functions which are somehow "simple" to optimize, in the sense that all local minimizers are global minimizers, and that there are often efficient methods to find these minimizers (at least for smooth convex functions). A convex function is such that for any pair of points $(x, y) \in\left(\mathbb{R}^p\right)^2$,
$$\forall t \in[0,1], \quad f((1-t) x+t y) \leqslant(1-t) f(x)+t f(y)$$
which means that the function is below its secant (and actually also above its tangent when this is well defined), see Fig. 4. If $x^{\star}$ is a local minimizer of a convex $f$, then $x^{\star}$ is a global minimizer, i.e. $x^{\star} \in \operatorname{argmin} f$. Convex functions are very convenient because they are stable under lots of transformations. In particular, if $f, g$ are convex and $a, b$ are positive, $a f+b g$ is convex (the set of convex functions is itself an infinite dimensional convex cone!) and so is $\max (f, g)$. If $g: \mathbb{R}^q \rightarrow \mathbb{R}$ is convex and $B \in \mathbb{R}^{q \times p}$, $b \in \mathbb{R}^q$, then $f(x)=g(B x+b)$ is convex. This shows immediately that the square loss appearing in (3) is convex, since $\|\cdot\|^2 / 2$ is convex (as a sum of squares). Similarly, if $\ell$ and hence $L$ is convex, then the classification loss function (4) is itself convex.
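The convexity inequality for the square loss can be checked along a segment. In the sketch below (random `A`, `y` and endpoints `x0`, `x1` are assumptions for the example), the gap between the secant and the function is computed on a grid of $t$ and compared with the closed form $\frac{t(1-t)}{2}\left\|A\left(x_0-x_1\right)\right\|^2$, which holds for this particular quadratic function.

```python
import numpy as np

# Hypothetical data for the square loss f(x) = 0.5 * ||A x - y||^2.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

def f(x):
    return 0.5 * np.sum((A @ x - y) ** 2)

x0 = rng.standard_normal(3)
x1 = rng.standard_normal(3)
ts = np.linspace(0.0, 1.0, 11)

# Secant value minus function value along the segment; convexity means >= 0.
gaps = np.array([(1 - t) * f(x0) + t * f(x1) - f((1 - t) * x0 + t * x1)
                 for t in ts])

# Closed form of the gap for this quadratic function.
closed_form = 0.5 * ts * (1 - ts) * np.sum((A @ (x0 - x1)) ** 2)

print(gaps.min(), np.max(np.abs(gaps - closed_form)))
```

The closed form also shows that the gap is strictly positive for $0<t<1$ exactly when $A\left(x_0-x_1\right) \neq 0$.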

Strict convexity. When $f$ is convex, one can strengthen the condition (5) and impose that, for $x \neq y$, the inequality is strict for $t \in] 0,1[$ (see Fig. 4, right), i.e.
$$\forall t \in] 0,1[, \quad f((1-t) x+t y)<(1-t) f(x)+t f(y) .$$
In this case, if a minimizer $x^{\star}$ exists, then it is unique. Indeed, if $x_1^{\star} \neq x_2^{\star}$ were two different minimizers, one would have by strict convexity $f\left(\frac{x_1^{\star}+x_2^{\star}}{2}\right)<\frac{1}{2} f\left(x_1^{\star}\right)+\frac{1}{2} f\left(x_2^{\star}\right)=f\left(x_1^{\star}\right)$, which is impossible.
Example 2 (Least squares). For the quadratic loss function $f(x)=\frac{1}{2}\|A x-y\|^2$, strict convexity is equivalent to $\operatorname{ker}(A)=\{0\}$. Indeed, we will see later that its second derivative is $\partial^2 f(x)=A^{\top} A$ and that strict convexity is implied by the eigenvalues of $A^{\top} A$ being strictly positive. Since the eigenvalues of $A^{\top} A$ are non-negative, this is equivalent to $\operatorname{ker}\left(A^{\top} A\right)=\{0\}$ (no vanishing eigenvalue), and $A^{\top} A z=0$ implies $\left\langle A^{\top} A z, z\right\rangle=\|A z\|^2=0$, i.e. $z \in \operatorname{ker}(A)$.
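The link between $\operatorname{ker}(A)$ and the eigenvalues of $A^{\top} A$ can be illustrated on two small matrices. In this sketch (both matrices are made up for the example), `A_full` is a generic matrix with trivial kernel, while `A_def` has a third column equal to the sum of the first two, so that $z=(1,1,-1)^{\top}$ lies in its kernel.

```python
import numpy as np

rng = np.random.default_rng(6)

# Generic 6 x 3 matrix: ker(A) = {0} with probability one.
A_full = rng.standard_normal((6, 3))

# Rank-deficient variant: third column = first column + second column.
A_def = A_full.copy()
A_def[:, 2] = A_def[:, 0] + A_def[:, 1]
z = np.array([1.0, 1.0, -1.0])          # non-zero vector with A_def z = 0

# Eigenvalues of the symmetric matrices A^T A, in ascending order.
eig_full = np.linalg.eigvalsh(A_full.T @ A_full)
eig_def = np.linalg.eigvalsh(A_def.T @ A_def)

print(eig_full, eig_def, np.linalg.norm(A_def @ z))
```

In the first case all eigenvalues are strictly positive, while in the second the smallest one vanishes up to rounding, matching the existence of a non-zero kernel vector.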
