## Subdifferentials

The directional derivative of $f$ at $\boldsymbol{x} \in \operatorname{dom} f$ in the direction of $\boldsymbol{y} \in \mathcal{H}$ is defined by
$$f^{\prime}(\boldsymbol{x} ; \boldsymbol{y})=\lim _{\alpha \downarrow 0} \frac{f(\boldsymbol{x}+\alpha \boldsymbol{y})-f(\boldsymbol{x})}{\alpha}$$
if the limit exists. If the limit exists for all $\boldsymbol{y} \in \mathcal{H}$ and, in addition, $f^{\prime}(\boldsymbol{x} ; \cdot)$ is linear and continuous on $\mathcal{H}$, then $f$ is said to be Gâteaux differentiable at $\boldsymbol{x}$. In that case, by the Riesz representation theorem, there exists a unique gradient vector $\nabla f(\boldsymbol{x}) \in \mathcal{H}$ such that
$$f^{\prime}(\boldsymbol{x} ; \boldsymbol{y})=\langle\boldsymbol{y}, \nabla f(\boldsymbol{x})\rangle, \quad \forall \boldsymbol{y} \in \mathcal{H}$$
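To make the definition concrete, the sketch below approximates $f^{\prime}(\boldsymbol{x} ; \boldsymbol{y})$ by the one-sided difference quotient with a small step and compares it with $\langle\boldsymbol{y}, \nabla f(\boldsymbol{x})\rangle$ for a smooth function. It also evaluates $g(x)=|x|$ at $x=0$, where the one-sided limit exists in every direction ($g^{\prime}(0 ; y)=|y|$) but is not linear in $y$, so no gradient exists there. The step size `alpha`, the test functions and the helper names are illustrative assumptions, not part of the text.

```python
import math

def directional_derivative(f, x, y, alpha=1e-6):
    """One-sided difference quotient (f(x + alpha*y) - f(x)) / alpha, approximating f'(x; y)."""
    x_step = [xi + alpha * yi for xi, yi in zip(x, y)]
    return (f(x_step) - f(x)) / alpha

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Smooth example: f(x) = sum_i exp(x_i), with gradient (exp(x_1), ..., exp(x_n)).
f = lambda x: sum(math.exp(xi) for xi in x)
grad_f = lambda x: [math.exp(xi) for xi in x]

x = [0.3, -1.2]
y = [1.0, 2.0]
approx = directional_derivative(f, x, y)
exact = inner(y, grad_f(x))            # f'(x; y) = <y, grad f(x)>

# Nonsmooth example: g(x) = |x| at x = 0 has g'(0; y) = |y| in every direction,
# so g'(0; .) exists but is not linear: g'(0; 1) + g'(0; -1) = 2, not 0.
g = lambda x: abs(x[0])
d_plus = directional_derivative(g, [0.0], [1.0])
d_minus = directional_derivative(g, [0.0], [-1.0])

print(approx, exact)       # nearly equal
print(d_plus, d_minus)     # 1.0 1.0
```

The nonsmooth case is exactly why linearity of $f^{\prime}(\boldsymbol{x} ; \cdot)$ is required before a gradient can be defined.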
If a function is differentiable, its convexity can easily be checked using its gradient, as stated in the following proposition:

Proposition $1.1$ Let $f: \mathcal{H} \mapsto(-\infty, \infty]$ be proper. Suppose that $\operatorname{dom} f$ is open and convex, and $f$ is Gâteaux differentiable on $\operatorname{dom} f$. Then, the following are equivalent:

1. $f$ is convex.
2. (First-order): $f(\boldsymbol{y}) \geq f(\boldsymbol{x})+\langle\boldsymbol{y}-\boldsymbol{x}, \nabla f(\boldsymbol{x})\rangle, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f$.
3. (Monotonicity of gradient): $\langle\boldsymbol{y}-\boldsymbol{x}, \nabla f(\boldsymbol{y})-\nabla f(\boldsymbol{x})\rangle \geq 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f$.
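Proposition 1.1 can be probed numerically. The sketch below samples random pairs $(\boldsymbol{x}, \boldsymbol{y})$ and checks the first-order inequality (item 2) and gradient monotonicity (item 3) for the log-sum-exp function, which is convex on $\mathbb{R}^n$, and for $f(x)=\sin x_1$, which is not. Sampling can only expose violations, never prove convexity, so treat this as a sanity check under those assumptions; the function names and sampling box are illustrative.

```python
import math
import random

def logsumexp(x):
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def grad_logsumexp(x):
    m = max(x)
    w = [math.exp(xi - m) for xi in x]
    s = sum(w)
    return [wi / s for wi in w]          # gradient of log-sum-exp is softmax(x)

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def check_conditions(f, grad, dim, trials=2000, seed=0, tol=1e-9):
    """Return (first_order_ok, monotone_ok) over random sample pairs in [-3, 3]^dim."""
    rng = random.Random(seed)
    first_order_ok = monotone_ok = True
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        diff = [yi - xi for yi, xi in zip(y, x)]
        # Item 2: f(y) >= f(x) + <y - x, grad f(x)>
        if f(y) < f(x) + inner(diff, grad(x)) - tol:
            first_order_ok = False
        # Item 3: <y - x, grad f(y) - grad f(x)> >= 0
        gdiff = [gy - gx for gy, gx in zip(grad(y), grad(x))]
        if inner(diff, gdiff) < -tol:
            monotone_ok = False
    return first_order_ok, monotone_ok

print(check_conditions(logsumexp, grad_logsumexp, dim=3))   # (True, True)

sin1 = lambda x: math.sin(x[0])
grad_sin1 = lambda x: [math.cos(x[0])]
print(check_conditions(sin1, grad_sin1, dim=1))             # (False, False)
```

As the proposition predicts, the two conditions fail together for the non-convex function.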
If the convergence in (1.48) is uniform with respect to $\boldsymbol{y}$ on bounded sets, i.e.
$$\lim _{\boldsymbol{0} \neq \boldsymbol{y} \rightarrow \mathbf{0}} \frac{f(\boldsymbol{x}+\boldsymbol{y})-f(\boldsymbol{x})-\langle\boldsymbol{y}, \nabla f(\boldsymbol{x})\rangle}{\|\boldsymbol{y}\|}=0,$$
then $f$ is said to be Fréchet differentiable at $\boldsymbol{x}$.
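As a worked example of the limit above, take the illustrative choice $f(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{x}\|^{2}$ on $\mathcal{H}$, whose gradient is $\nabla f(\boldsymbol{x})=\boldsymbol{x}$. Expanding $\|\boldsymbol{x}+\boldsymbol{y}\|^{2}=\|\boldsymbol{x}\|^{2}+2\langle\boldsymbol{x}, \boldsymbol{y}\rangle+\|\boldsymbol{y}\|^{2}$ gives

$$\frac{f(\boldsymbol{x}+\boldsymbol{y})-f(\boldsymbol{x})-\langle\boldsymbol{y}, \boldsymbol{x}\rangle}{\|\boldsymbol{y}\|}=\frac{\frac{1}{2}\|\boldsymbol{y}\|^{2}}{\|\boldsymbol{y}\|}=\frac{1}{2}\|\boldsymbol{y}\| \rightarrow 0 \quad \text { as } \boldsymbol{y} \rightarrow \mathbf{0}.$$

The quotient depends only on $\|\boldsymbol{y}\|$ and not on the direction in which $\boldsymbol{y}$ approaches $\mathbf{0}$, so the convergence is uniform in the required sense.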

## Linear and Kernel Classifiers

Classification is one of the most basic tasks in machine learning. In computer vision, an image classifier is designed to classify input images into the corresponding categories. Although this task appears trivial to humans, automating it with computer algorithms poses considerable challenges.

For example, let us think about recognizing “dog” images. One of the first technical issues here is that a dog image is usually stored in a digital format such as JPEG, PNG, etc. Aside from the compression scheme used in the digital format, the image is basically just a collection of numbers on a two-dimensional grid, which take integer values from 0 to 255. Therefore, a computer algorithm must read these numbers and decide whether such a collection of numbers corresponds to the high-level concept of “dog”. However, if the viewpoint changes, the composition of the numbers in the array changes completely, which poses additional challenges to the computer program. To make matters worse, in a natural setting a dog is rarely found on a white background; rather, the dog plays on the lawn or takes a nap in the living room, hides underneath furniture or chews with her eyes closed, which makes the distribution of the numbers very different depending on the situation. Additional technical challenges in computer-based recognition of a dog come from all kinds of sources such as different illumination conditions, different poses, occlusion, intra-class variation, etc., as shown in Fig. 2.1. Therefore, designing a classifier that is robust to such variations was one of the important topics in the computer vision literature for several decades.
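The viewpoint argument can be illustrated with a toy example. The sketch below builds a small synthetic 8-bit grayscale "image" (a made-up 8x8 array with a bright diagonal stripe, not real data), shifts its content one pixel to the right, as a tiny camera motion would, and counts how many raw pixel values change. The scene is the same to a human, yet the array entries along the object differ, which is why comparing raw numbers directly is fragile. The image, the shift and the helper names are illustrative assumptions.

```python
def make_image(n=8):
    """Synthetic n x n 8-bit image: a bright diagonal stripe (255) on a dark background (20)."""
    return [[255 if abs(r - c) <= 1 else 20 for c in range(n)] for r in range(n)]

def shift_right(img, k=1, fill=20):
    """Translate the image content k pixels to the right, padding with the background value."""
    return [[fill] * k + row[:-k] for row in img]

img = make_image()
shifted = shift_right(img)

total = len(img) * len(img[0])
changed = sum(a != b for ra, rb in zip(img, shifted) for a, b in zip(ra, rb))
print(f"{changed} of {total} pixel values changed")   # 14 of 64 pixel values changed
```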

In fact, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [7] was initiated to evaluate various computer algorithms for image classification at large scale. ImageNet is a large visual database designed for use in visual object recognition software research [8]. Over 14 million images have been hand-annotated in the project to indicate which objects are depicted, and at least one million of the images also have bounding boxes. In particular, ImageNet contains more than 20,000 categories, a typical category consisting of several hundred images. Since 2010, the ImageNet project has organized the ILSVRC as an annual software competition in which software programs compete for the correct classification and recognition of objects and scenes. The main motivation is to allow researchers to compare progress in classification across a wider variety of objects. Since the introduction of AlexNet in 2012 [9], the first deep learning approach to win the ImageNet Challenge, the state-of-the-art image classification methods have all been deep learning approaches, and their performance on this benchmark now even surpasses that of human observers.

## Some Definitions

Let $\mathcal{X}, \mathcal{Y}$ and $\mathcal{Z}$ be non-empty sets. The identity operator on $\mathcal{H}$ is denoted by $I$, i.e. $I \boldsymbol{x}=\boldsymbol{x}, \forall \boldsymbol{x} \in \mathcal{H}$. Let $\mathcal{D} \subset \mathcal{H}$ be a non-empty set. The set of the fixed points of an operator $\mathcal{T}: \mathcal{D} \mapsto \mathcal{D}$ is denoted by
$$\operatorname{Fix} \mathcal{T}=\{\boldsymbol{x} \in \mathcal{D} \mid \mathcal{T} \boldsymbol{x}=\boldsymbol{x}\}.$$
Let $\mathcal{X}$ and $\mathcal{Y}$ be real normed vector spaces. As a special case of an operator, we define the set of continuous linear operators:
$$\mathcal{B}(\mathcal{X}, \mathcal{Y})=\{\mathcal{T}: \mathcal{X} \mapsto \mathcal{Y} \mid \mathcal{T} \text { is linear and continuous }\},$$
and we write $\mathcal{B}(\mathcal{X})=\mathcal{B}(\mathcal{X}, \mathcal{X})$. Let $f: \mathcal{X} \mapsto[-\infty, \infty]$ be a function. The domain of $f$ is
$$\operatorname{dom} f=\{\boldsymbol{x} \in \mathcal{X} \mid f(\boldsymbol{x})<\infty\},$$
the graph of $f$ is
$$\operatorname{gra} f=\{(\boldsymbol{x}, y) \in \mathcal{X} \times \mathbb{R} \mid f(\boldsymbol{x})=y\},$$
and the epigraph of $f$ is
$$\operatorname{epi} f=\{(\boldsymbol{x}, y) \in \mathcal{X} \times \mathbb{R} \mid y \geq f(\boldsymbol{x})\}.$$
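On a finite domain these definitions can be evaluated directly. The sketch below, with illustrative choices of $\mathcal{T}$ and $f$ that are not taken from the text, computes $\operatorname{Fix} \mathcal{T}$ for $\mathcal{T} x=x^{2}$ on $\mathcal{D}=\{-2, \ldots, 2\}$, and tests membership in $\operatorname{dom} f$, $\operatorname{gra} f$ and $\operatorname{epi} f$ for an extended-valued function that equals $+\infty$ outside $[-1,1]$, represented here with `math.inf`.

```python
import math

def fixed_points(T, D):
    """Fix T = {x in D | T(x) = x}."""
    return {x for x in D if T(x) == x}

D = range(-2, 3)
print(fixed_points(lambda x: x * x, D))   # {0, 1}

# Extended-valued f: f(x) = x**2 on [-1, 1] and +inf elsewhere (illustrative).
def f(x):
    return x * x if -1 <= x <= 1 else math.inf

def in_dom(x):            # dom f = {x | f(x) < +inf}
    return f(x) < math.inf

def in_graph(x, y):       # gra f = {(x, y) | y real, f(x) = y}
    return math.isfinite(y) and f(x) == y

def in_epi(x, y):         # epi f = {(x, y) | y real, y >= f(x)}
    return math.isfinite(y) and y >= f(x)

print(in_dom(0.5), in_dom(2.0))                                     # True False
print(in_graph(0.5, 0.25), in_epi(0.5, 1.0), in_epi(2.0, 100.0))   # True True False
```

Points outside $\operatorname{dom} f$ never lie in the epigraph, since no real $y$ satisfies $y \geq+\infty$.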

## Convex Sets, Convex Functions

A function $f(\boldsymbol{x})$ is a convex function if $\operatorname{dom} f$ is a convex set and
$$f\left(\theta \boldsymbol{x}_{1}+(1-\theta) \boldsymbol{x}_{2}\right) \leq \theta f\left(\boldsymbol{x}_{1}\right)+(1-\theta) f\left(\boldsymbol{x}_{2}\right)$$
for all $\boldsymbol{x}_{1}, \boldsymbol{x}_{2} \in \operatorname{dom} f$ and $0 \leq \theta \leq 1$. A convex set is a set that contains the entire line segment between any two of its points (see Fig. 1.3). Specifically, a set $\mathcal{C}$ is convex if, for any $\boldsymbol{x}_{1}, \boldsymbol{x}_{2} \in \mathcal{C}$, we have $\theta \boldsymbol{x}_{1}+(1-\theta) \boldsymbol{x}_{2} \in \mathcal{C}$ for all $0 \leq \theta \leq 1$. The relation between a convex function and a convex set can also be stated using the epigraph: a function $f(\boldsymbol{x})$ is convex if and only if its epigraph $\operatorname{epi} f$ is a convex set.
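The defining inequality suggests a simple randomized test: sample $\boldsymbol{x}_1, \boldsymbol{x}_2$ in $\operatorname{dom} f$ and $\theta \in[0,1]$ and look for a violation. The sketch below does this for scalar functions; a violation certifies non-convexity, while passing only means that no counterexample was found among the samples. The test functions, interval and tolerance are illustrative assumptions.

```python
import math
import random

def find_convexity_violation(f, lo, hi, trials=5000, seed=1, tol=1e-9):
    """Search [lo, hi] for (x1, x2, theta) with
    f(theta*x1 + (1-theta)*x2) > theta*f(x1) + (1-theta)*f(x2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2, theta = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        lhs = f(theta * x1 + (1 - theta) * x2)
        rhs = theta * f(x1) + (1 - theta) * f(x2)
        if lhs > rhs + tol:
            return (x1, x2, theta)      # counterexample: f is not convex on [lo, hi]
    return None                          # no violation found (consistent with convexity)

print(find_convexity_violation(lambda x: x * x, -5, 5))    # None
print(find_convexity_violation(math.exp, -5, 5))           # None
print(find_convexity_violation(math.sqrt, 0, 5) is not None)   # True: sqrt is concave
```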

Convexity is preserved under various operations. For example, if $\{f_{i}\}_{i \in I}$ is a family of convex functions, then the pointwise supremum $\sup_{i \in I} f_{i}$ is convex. In addition, the set of convex functions is closed under addition and under multiplication by strictly positive real numbers. Moreover, the pointwise limit of a convergent sequence of convex functions is also convex. Important examples of convex functions are summarized in Table 1.1.
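A common instance of the supremum rule is a piecewise-linear function written as the pointwise maximum of affine functions $a_i x+b_i$, each of which is convex. The sketch below, with made-up coefficients, builds such a maximum and checks midpoint convexity on a grid; the absolute value $|x|=\max(x,-x)$ is the simplest case.

```python
def pointwise_max(affine):
    """Return x -> max_i (a_i * x + b_i) for a list of (a_i, b_i) pairs."""
    return lambda x: max(a * x + b for a, b in affine)

abs_via_max = pointwise_max([(1.0, 0.0), (-1.0, 0.0)])        # |x| = max(x, -x)
f = pointwise_max([(-2.0, -1.0), (0.5, 0.0), (3.0, -4.0)])     # illustrative coefficients

grid = [i / 10 for i in range(-50, 51)]
# Midpoint convexity: f((x1 + x2) / 2) <= (f(x1) + f(x2)) / 2, with a small float tolerance.
midpoint_convex = all(
    f((x1 + x2) / 2) <= (f(x1) + f(x2)) / 2 + 1e-12 for x1 in grid for x2 in grid
)
print(abs_via_max(-3.0), midpoint_convex)   # 3.0 True
```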

## Metric Space

A metric space $(\mathcal{X}, d)$ is a set $\mathcal{X}$ together with a metric $d$ on the set. Here, a metric is a function that defines a notion of distance between any two members of the set; it is formally defined as follows.

Definition 1.1 (Metric) A metric on a set $\mathcal{X}$ is a function $d: \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}_{+}$, called the distance, where $\mathbb{R}_{+}$ is the set of non-negative real numbers. For all $x, y, z \in \mathcal{X}$, this function is required to satisfy the following conditions:

1. $d(x, y) \geq 0$ (non-negativity).
2. $d(x, y)=0$ if and only if $x=y$.
3. $d(x, y)=d(y, x)$ (symmetry).
4. $d(x, z) \leq d(x, y)+d(y, z)$ (triangle inequality).
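The four axioms can be checked mechanically on a finite sample of points. The sketch below does so for the Euclidean distance on $\mathbb{R}^2$ and for the squared Euclidean distance, which satisfies the first three axioms but fails the triangle inequality (for example, on the real line $d(0,2)=4>d(0,1)+d(1,2)=2$). A finite check cannot prove that the axioms hold everywhere; the helper name and sample points are illustrative.

```python
import itertools
import math

def check_metric(d, points, tol=1e-12):
    """Return the sorted list of metric axioms violated by d on the given sample points."""
    failures = set()
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:
            failures.add("non-negativity")
        if (d(x, y) <= tol) != (x == y):
            failures.add("identity of indiscernibles")
        if abs(d(x, y) - d(y, x)) > tol:
            failures.add("symmetry")
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            failures.add("triangle inequality")
    return sorted(failures)

euclid = lambda p, q: math.dist(p, q)
squared = lambda p, q: math.dist(p, q) ** 2

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 1.5), (-1.0, 2.0)]
print(check_metric(euclid, pts))    # []
print(check_metric(squared, pts))   # ['triangle inequality']
```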
A metric on a space induces topological notions such as open and closed sets, which lead to the study of more abstract topological spaces. Specifically, for any point $x$ in a metric space $\mathcal{X}$, we define the open ball of radius $r>0$ about $x$ as the set
$$B_{r}(x)=\{y \in \mathcal{X}: d(x, y)<r\}.$$
A subset $U \subseteq \mathcal{X}$ is called open if for every $x \in U$ there exists an $r>0$ such that $B_{r}(x)$ is contained in $U$. The complement of an open set is called closed.

## Banach and Hilbert Space

An inner product space is a vector space equipped with an inner product. A normed space is a vector space on which a norm is defined. An inner product space is always a normed space, since we can define a norm as $\|\boldsymbol{f}\|=\sqrt{\langle\boldsymbol{f}, \boldsymbol{f}\rangle}$, which is often called the induced norm. Among the various normed spaces, one of the most useful is the Banach space.

Definition 1.7 The Banach space is a complete normed space.

Here, “completeness” is especially important from the optimization perspective. Most optimization algorithms are implemented iteratively, so the final solution of an iterative method, i.e. the limit of its iterates, should belong to the underlying space $\mathcal{H}$; completeness guarantees that every Cauchy sequence of iterates converges to a point of the space. Recall that convergence is a property of a metric space, and a normed space is a metric space with $d(\boldsymbol{x}, \boldsymbol{y})=\|\boldsymbol{x}-\boldsymbol{y}\|$. Therefore, the Banach space can be regarded as a vector space equipped with the desirable properties of a metric space. Similarly, we can define the Hilbert space.

Definition 1.8 The Hilbert space is a complete inner product space.

We can easily see that the Hilbert space is also a Banach space thanks to the induced norm. The inclusion relationship between vector spaces, normed spaces, inner product spaces, Banach spaces and Hilbert spaces is illustrated in Fig. 1.1. As shown in Fig. 1.1, the Hilbert space has many nice mathematical structures such as an inner product, a norm and completeness, so it is widely used in the machine learning literature. The following are well-known examples of Hilbert spaces:

• $l^{2}(\mathbb{Z})$: a function space composed of square-summable discrete-time signals, i.e.
$$l^{2}(\mathbb{Z})=\left\{\boldsymbol{x}=\{x_{l}\}_{l=-\infty}^{\infty} \,\middle|\, \sum_{l=-\infty}^{\infty}\left|x_{l}\right|^{2}<\infty\right\}.$$
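Two points from this subsection can be checked numerically on truncated sequences: the induced norm $\|\boldsymbol{f}\|=\sqrt{\langle\boldsymbol{f}, \boldsymbol{f}\rangle}$, and membership in $l^{2}(\mathbb{Z})$. The sketch below uses the illustrative sequences $x_l=1/(|l|+1)$, whose squared magnitudes are summable (the full sum is $\pi^2/3-1$), and $x_l=1/\sqrt{|l|+1}$, whose squared magnitudes form a divergent harmonic-type series. Partial sums over $|l| \leq L$ can only suggest, not prove, convergence; the truncation levels are arbitrary.

```python
import math

def inner(u, v):
    """Inner product of finitely supported sequences stored as dicts {index: value}."""
    return sum(u[k] * v.get(k, 0.0) for k in u)

def induced_norm(u):
    return math.sqrt(inner(u, u))

def truncate(seq, cutoff):
    """Keep the terms x_l with -cutoff <= l <= cutoff."""
    return {l: seq(l) for l in range(-cutoff, cutoff + 1)}

in_l2 = lambda l: 1.0 / (abs(l) + 1)               # sum |x_l|^2 = pi^2/3 - 1 (finite)
not_in_l2 = lambda l: 1.0 / math.sqrt(abs(l) + 1)  # sum |x_l|^2 = 1 + 2*sum_{k>=2} 1/k diverges

for cutoff in (10, 1_000, 100_000):
    s1 = induced_norm(truncate(in_l2, cutoff)) ** 2
    s2 = induced_norm(truncate(not_in_l2, cutoff)) ** 2
    print(cutoff, s1, s2)   # s1 settles near 2.2899, s2 keeps growing like 2*log(cutoff)
```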

## Curves and Geodesics

If the Riemannian manifold $(\mathcal{M}, g)$ is connected, it is a metric space with an induced topology that coincides with the underlying manifold topology. We can, therefore, define a function $d^{\mathcal{M}}$ on $\mathcal{M}$ that calculates distances between points on $\mathcal{M}$ and determines its structure.

Let $\mathbf{p}, \mathbf{q} \in \mathcal{M}$ be any two points on the Riemannian manifold $\mathcal{M}$. We first define the length of a (one-dimensional) curve in $\mathcal{M}$ that joins $\mathbf{p}$ to $\mathbf{q}$, and then the length of the shortest such curve.

A curve in $\mathcal{M}$ is defined as a smooth mapping from an open interval $\Lambda$ (which may have infinite length) in $\Re$ into $\mathcal{M}$. The point $\lambda \in \Lambda$ forms a parametrization of the curve. Let $c(\lambda)=\left(c_{1}(\lambda), \cdots, c_{d}(\lambda)\right)^{\top}$ be a curve in $\Re^{d}$ parametrized by $\lambda \in \Lambda \subseteq \Re$. If we take the coordinate functions $\{c_{j}(\lambda)\}$ of $c(\lambda)$ to be as smooth as needed (usually $\mathcal{C}^{\infty}$, i.e. functions that have continuous derivatives of every order), then we say that $c$ is a smooth curve. If, for some $\alpha>0$, $c(\lambda+\alpha)=c(\lambda)$ for all $\lambda$ with $\lambda, \lambda+\alpha \in \Lambda$, the curve $c$ is said to be closed. The velocity (or tangent) vector at the point $\lambda$ is given by
$$c^{\prime}(\lambda)=\left(c_{1}^{\prime}(\lambda), \cdots, c_{d}^{\prime}(\lambda)\right)^{\top},$$
where $c_{j}^{\prime}(\lambda)=d c_{j}(\lambda) / d \lambda$, and the “speed” of the curve is
$$\left\|c^{\prime}(\lambda)\right\|=\left\{\sum_{j=1}^{d}\left[c_{j}^{\prime}(\lambda)\right]^{2}\right\}^{1 / 2}.$$
Distance on a smooth curve $c$ is given by arc-length, which is measured from a fixed point $\lambda_{0}$ on that curve. Usually, the fixed point is taken to be the origin, $\lambda_{0}=0$, defined to be one of the two endpoints of the data. More generally, the arc-length $L(c)$ along the curve $c(\lambda)$ from point $\lambda_{0}$ to point $\lambda_{1}$ is defined as
$$L(c)=\int_{\lambda_{0}}^{\lambda_{1}}\left\|c^{\prime}(\lambda)\right\| d \lambda .$$
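The velocity, speed and arc-length formulas translate directly into a numerical routine. The sketch below approximates $L(c)$ with central finite differences for $c^{\prime}$ and the composite trapezoidal rule, for two illustrative curves: the unit circle $c(\lambda)=(\cos \lambda, \sin \lambda)$, whose speed is 1, so its length over $[0, \pi/2]$ is $\pi/2$, and a helix with constant speed $\sqrt{2}$. The step sizes are arbitrary choices.

```python
import math

def velocity(c, lam, h=1e-6):
    """Central-difference approximation of c'(lam) = (c_1'(lam), ..., c_d'(lam))."""
    plus, minus = c(lam + h), c(lam - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def speed(c, lam):
    """||c'(lam)|| = (sum_j c_j'(lam)^2)^(1/2)."""
    return math.sqrt(sum(v * v for v in velocity(c, lam)))

def arc_length(c, lam0, lam1, n=1000):
    """Trapezoidal rule for L(c) = integral of ||c'(lam)|| from lam0 to lam1."""
    step = (lam1 - lam0) / n
    vals = [speed(c, lam0 + k * step) for k in range(n + 1)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

circle = lambda lam: (math.cos(lam), math.sin(lam))
helix = lambda lam: (math.cos(lam), math.sin(lam), lam)    # speed sqrt(2)

print(arc_length(circle, 0.0, math.pi / 2), math.pi / 2)
print(arc_length(helix, 0.0, 2 * math.pi), 2 * math.pi * math.sqrt(2))
```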

## Linear Manifold Learning

Most statistical theory and applications that deal with the problem of dimensionality reduction are focused on linear dimensionality reduction and, by extension, linear manifold learning. A linear manifold can be visualized as a line, a plane, or a hyperplane, depending upon the number of dimensions involved. Data are observed in some high-dimensional space and it is usually assumed that a lower-dimensional linear manifold would be the most appropriate summary of the relationship between the variables. Although data tend not to live on a linear manifold, we view the problem as having two kinds of motivations. The first such motivation is to assume that the data live close to a linear manifold, the distance off the manifold determined by a random error (or noise) component. A second way of thinking about linear manifold learning is that a linear manifold is really a simple linear approximation to a more complicated type of nonlinear manifold that would probably be a better fit to the data. In both scenarios, the intrinsic dimensionality of the linear manifold is taken to be much smaller than the dimensionality of the data.

Identifying a linear manifold embedded in a higher-dimensional space is closely related to the classical statistics problem of linear dimensionality reduction. The recommended way of accomplishing linear dimensionality reduction is to create a reduced set of linear transformations of the input variables. Linear transformations are projection methods, and so the problem is to derive a sequence of low-dimensional projections of the input data that possess some type of optimal properties.
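The projection idea can be sketched for the simplest case, reducing two-dimensional data to a one-dimensional linear manifold (a line). The sketch below centers the data, forms the $2 \times 2$ sample covariance matrix, takes its leading eigenvector in closed form, and projects each point onto that direction, which is principal component analysis in miniature. The data are synthetic points scattered around the line $x_2=2 x_1$, an assumption made purely for illustration.

```python
import math
import random

def pca_1d(points):
    """Project 2-D points onto their first principal direction (PCA with one component).
    Returns (mean, direction, scores, reconstructions)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * (sxx + syy) + math.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    vx, vy = sxy, lam - sxx              # an eigenvector for lam (valid when sxy != 0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    scores = [(x - mx) * vx + (y - my) * vy for x, y in points]   # 1-D coordinates
    recon = [(mx + t * vx, my + t * vy) for t in scores]          # points on the fitted line
    return (mx, my), (vx, vy), scores, recon

rng = random.Random(0)
data = []
for _ in range(200):
    t = rng.uniform(-3, 3)
    data.append((t, 2 * t + rng.gauss(0, 0.1)))   # noisy samples near the line x2 = 2*x1

mean, direction, scores, recon = pca_1d(data)
mse = sum((x - rx) ** 2 + (y - ry) ** 2 for (x, y), (rx, ry) in zip(data, recon)) / len(data)
print(direction, mse)   # direction close to (1, 2)/sqrt(5); small reconstruction error
```

The reconstruction error is the squared distance "off the manifold", which plays the role of the random error component described earlier in this section.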

There are many techniques that can be used for either linear dimensionality reduction or linear manifold learning. In this chapter, we describe only two linear methods, namely, principal component analysis and multidimensional scaling. The earliest projection method was principal component analysis (dating back to 1933), and this technique has become the most popular dimensionality-reducing technique in use today. A related method is multidimensional scaling (dating back to 1952), which has a very different motivation. An adaptation of multidimensional scaling provided the core element of the ISOMAP algorithm for nonlinear manifold learning.

## Topological Spaces

Topological spaces were introduced by Maurice Fréchet (1906) (in the form of metric spaces), and the idea was developed and extended over the next few decades. Amongst those who contributed significantly to the subject was Felix Hausdorff, who in 1914 coined the phrase “topological space” using Johann Benedict Listing’s German word Topologie, introduced in 1847.

A topology on a set $\mathcal{X}$ is a collection $\mathcal{T}$ of subsets of $\mathcal{X}$ that contains the empty set and $\mathcal{X}$ itself and is closed under arbitrary unions and finite intersections. A topological space is the pair $(\mathcal{X}, \mathcal{T})$, often written simply as $\mathcal{X}$, where $\mathcal{T}$ represents the topology associated with $\mathcal{X}$. The elements of $\mathcal{T}$ are called the open sets of $\mathcal{X}$, and a set is closed if its complement is open. Topological spaces can also be characterized through the concept of neighborhood. If $\mathbf{x}$ is a point in a topological space $\mathcal{X}$, a neighborhood of $\mathbf{x}$ is any set that contains an open set containing $\mathbf{x}$.
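For a finite set the axioms can be checked exhaustively: on a finite collection, closure under arbitrary unions and finite intersections reduces to closure under pairwise unions and intersections. The sketch below, with illustrative collections on $\{1,2,3\}$, tests whether a family of subsets is a topology and lists the open sets containing a given point.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the topology axioms for a finite collection T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if frozenset() not in T or X not in T:
        return False
    if any(not U <= X for U in T):
        return False
    # For a finite collection, pairwise closure implies closure under all unions/intersections.
    for U, V in combinations(T, 2):
        if U | V not in T or U & V not in T:
            return False
    return True

def open_neighborhoods(x, T):
    """Open sets containing x; any set containing one of them is a neighborhood of x."""
    return sorted((set(U) for U in T if x in U), key=len)

X = {1, 2, 3}
T_good = [set(), {1}, {1, 2}, {1, 2, 3}]
T_bad = [set(), {1}, {2}, {1, 2, 3}]          # {1} | {2} = {1, 2} is missing
print(is_topology(X, T_good), is_topology(X, T_bad))   # True False
print(open_neighborhoods(2, T_good))                   # [{1, 2}, {1, 2, 3}]
```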
Let $\mathcal{X}$ and $\mathcal{Y}$ be two topological spaces, and let $U \subset \mathcal{X}$ and $V \subset \mathcal{Y}$ be open subsets. Consider the family of all cartesian products of the form $U \times V$. The topology generated by these products of open subsets (its open sets are the unions of such products) is called the product topology for $\mathcal{X} \times \mathcal{Y}$. If $W \subset \mathcal{X} \times \mathcal{Y}$, then $W$ is open relative to the product topology if and only if for each point $(x, y) \in W$ there are open neighborhoods, $U$ of $x$ and $V$ of $y$, such that $U \times V \subset W$. For example, the usual topology for $d$-dimensional Euclidean space $\Re^{d}$ consists of all open sets of points in $\Re^{d}$, and this topology is equivalent to the product topology for the product of $d$ copies of $\Re$.

One of the core elements of manifold learning involves the idea of “embedding” one topological space inside another. Loosely speaking, the space $\mathcal{X}$ is said to be embedded in the space $\mathcal{Y}$ if the topological properties of $\mathcal{Y}$ when restricted to $\mathcal{X}$ are identical to the topological properties of $\mathcal{X}$. To be more specific, we state the following definitions. A function $g: \mathcal{X} \rightarrow \mathcal{Y}$ is said to be continuous if the inverse image of every open set in $\mathcal{Y}$ is an open set in $\mathcal{X}$. If $g$ is a bijective (i.e., one-to-one and onto) function such that $g$ and its inverse $g^{-1}$ are continuous, then $g$ is said to be a homeomorphism. Two topological spaces $\mathcal{X}$ and $\mathcal{Y}$ are said to be homeomorphic (or topologically equivalent) if there exists a homeomorphism from one space onto the other. A topological space $\mathcal{X}$ is said to be embedded in a topological space $\mathcal{Y}$ if $\mathcal{X}$ is homeomorphic to a subspace of $\mathcal{Y}$.
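These definitions can also be tested on finite topological spaces. The sketch below checks continuity through preimages of open sets and tests whether a bijection is a homeomorphism; the two-point spaces and maps are illustrative. The identity map from the discrete to the indiscrete topology shows that a continuous bijection need not be a homeomorphism.

```python
def preimage(g, U, X):
    return frozenset(x for x in X if g[x] in U)

def is_continuous(g, X, TX, TY):
    """g: dict mapping points of X to points of Y; continuous iff preimages of open sets are open."""
    TX = {frozenset(U) for U in TX}
    return all(preimage(g, U, X) in TX for U in TY)

def is_homeomorphism(g, X, TX, Y, TY):
    if sorted(g.values()) != sorted(Y):          # g must be a bijection onto Y
        return False
    g_inv = {y: x for x, y in g.items()}
    return is_continuous(g, X, TX, TY) and is_continuous(g_inv, Y, TY, TX)

X = Y = [1, 2]
discrete = [set(), {1}, {2}, {1, 2}]
indiscrete = [set(), {1, 2}]
identity = {1: 1, 2: 2}
swap = {1: 2, 2: 1}

print(is_continuous(identity, X, discrete, indiscrete))          # True
print(is_homeomorphism(identity, X, discrete, Y, indiscrete))    # False: inverse not continuous
print(is_homeomorphism(swap, X, discrete, Y, discrete))          # True
```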

## Riemannian Manifolds

In the entire theory of topological manifolds, there is no mention of the use of calculus. However, in a prototypical application of a “manifold,” calculus enters in the form of a “smooth” (or differentiable) manifold $\mathcal{M}$, which, once equipped with a metric, is known as a Riemannian manifold; it is usually defined in differential geometry as a submanifold of some ambient (or surrounding) Euclidean space, where the concepts of length, curvature, and angle are preserved, and where smoothness relates to differentiability. The word manifold (in German, Mannigfaltigkeit) was coined in an “intuitive” way and without any precise definition by Georg Friedrich Bernhard Riemann (1826-1866) in his 1851 doctoral dissertation (Riemann, 1851; Dieudonné, 2009); in 1854, Riemann introduced in his famous Habilitation lecture the idea of a topological manifold on which one could carry out differential and integral calculus.

A topological manifold $\mathcal{M}$ is called a smooth (or differentiable) manifold if $\mathcal{M}$ is continuously differentiable to any order. All smooth manifolds are topological manifolds, but the reverse is not necessarily true. (Note: Authors often differ on the precise definition of a “smooth” manifold.)

We now define the analogue of a homeomorphism for a differentiable manifold. Consider two open sets, $U \subset \Re^{r}$ and $V \subset \Re^{s}$, and let $g: U \rightarrow V$ so that for $\mathbf{x} \in U$ and $\mathbf{y} \in V$, $g(\mathbf{x})=\mathbf{y}$. If the function $g$ has finite first-order partial derivatives, $\partial y_{j} / \partial x_{i}$, for all $i=1,2, \ldots, r$, and all $j=1,2, \ldots, s$, then $g$ is said to be a smooth (or differentiable) mapping on $U$. We also say that $g$ is a $\mathcal{C}^{1}$-function on $U$ if all the first-order partial derivatives are continuous. More generally, if $g$ has continuous higher-order partial derivatives, $\partial^{k_{1}+\cdots+k_{r}} y_{j} / \partial x_{1}^{k_{1}} \cdots \partial x_{r}^{k_{r}}$, for all $j=1,2, \ldots, s$ and all nonnegative integers $k_{1}, k_{2}, \ldots, k_{r}$ such that $k_{1}+k_{2}+\cdots+k_{r} \leq k$, then we say that $g$ is a $\mathcal{C}^{k}$-function, $k=1,2, \ldots$. If $g$ is a $\mathcal{C}^{k}$-function for all $k \geq 1$, then we say that $g$ is a $\mathcal{C}^{\infty}$-function.

If $g$ is a homeomorphism from an open set $U$ to an open set $V$, then it is said to be a $\mathcal{C}^{k}$-diffeomorphism if $g$ and its inverse $g^{-1}$ are both $\mathcal{C}^{k}$-functions. A $\mathcal{C}^{\infty}$-diffeomorphism is simply referred to as a diffeomorphism. We say that $U$ and $V$ are diffeomorphic if there exists a diffeomorphism between them. These definitions extend in a straightforward way to manifolds. For example, if $\mathcal{X}$ and $\mathcal{Y}$ are both smooth manifolds, the function $g: \mathcal{X} \rightarrow \mathcal{Y}$ is a diffeomorphism if it is a homeomorphism from $\mathcal{X}$ to $\mathcal{Y}$ and both $g$ and $g^{-1}$ are smooth. Furthermore, $\mathcal{X}$ and $\mathcal{Y}$ are diffeomorphic if there exists a diffeomorphism between them, in which case $\mathcal{X}$ and $\mathcal{Y}$ are essentially indistinguishable from each other.
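A standard example separates the two notions: $g(x)=x^{3}$ is a $\mathcal{C}^{\infty}$ homeomorphism of $\Re$ onto $\Re$, but its inverse $g^{-1}(y)=y^{1/3}$ is not differentiable at $0$, so $g$ is not a diffeomorphism. By contrast, $\exp: \Re \rightarrow(0, \infty)$ is a diffeomorphism with smooth inverse $\log$. The sketch below makes this visible with forward difference quotients of the inverse maps at $y=0$ and $y=1$ respectively; the step sizes are arbitrary.

```python
import math

def cube_root(y):
    """Real cube root, the inverse of g(x) = x**3 (valid for negative y too)."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def difference_quotient(h_fn, y, h):
    return (h_fn(y + h) - h_fn(y)) / h

for h in (1e-2, 1e-4, 1e-6):
    dq_cube = difference_quotient(cube_root, 0.0, h)   # equals h**(-2/3): unbounded as h -> 0
    dq_log = difference_quotient(math.log, 1.0, h)     # tends to log'(1) = 1
    print(f"h={h:g}  (g^-1) quotient at 0: {dq_cube:.1f}  log quotient at 1: {dq_log:.6f}")
```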

## Spectral Embedding Methods for Manifold Learning

Manifold learning draws on the disciplines of geometry, computation, and statistics, and has become an important research topic in data mining and statistical learning. The simplest description of manifold learning is that it is a class of algorithms for recovering a low-dimensional manifold embedded in a high-dimensional ambient space. Major breakthroughs in methods for recovering low-dimensional nonlinear embeddings of high-dimensional data (Tenenbaum, de Silva, and Langford, 2000; Roweis and Saul, 2000) led to the construction of a number of other algorithms for carrying out nonlinear manifold learning and its close relative, nonlinear dimensionality reduction. The primary tool of all embedding algorithms is the set of eigenvectors associated with the top few or bottom few eigenvalues of an appropriate random matrix. We refer to these algorithms as spectral embedding methods. Spectral embedding methods are designed to recover linear or nonlinear manifolds, usually in high-dimensional spaces.
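As a minimal illustration of the spectral idea, the sketch below recovers a one-dimensional manifold (points on a line, listed in shuffled order) from a neighborhood graph alone. It forms the graph Laplacian $L=D-A$ and computes the eigenvector of the second-smallest eigenvalue (the Fiedler vector, as used in Laplacian-eigenmap-style embeddings) by power iteration on $cI-L$ with the constant eigenvector projected out. The graph construction, the shift $c$ and the iteration count are assumptions for illustration; practical implementations would use a sparse eigensolver.

```python
import math
import random

def fiedler_vector(adj, iters=3000, seed=0):
    """Eigenvector of the second-smallest eigenvalue of L = D - A, via power iteration."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2 * max(deg)   # eigenvalues of L lie in [0, 2*max degree], so c*I - L is PSD
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]        # remove the constant eigenvector (eigenvalue 0 of L)
        # w = (c*I - L) v = (c - deg_i) v_i + (A v)_i
        w = [(c - deg[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hidden 1-D coordinates, observed in shuffled order.
positions = list(range(8))
random.Random(42).shuffle(positions)
# Neighborhood graph: connect points whose coordinates differ by exactly 1.
adj = [[1 if abs(p - q) == 1 else 0 for q in positions] for p in positions]

embedding = fiedler_vector(adj)
order = sorted(range(len(positions)), key=lambda i: embedding[i])
recovered = [positions[i] for i in order]
print(recovered)   # the hidden order 0..7, possibly reversed
```

The eigenvector is only defined up to sign, which is why the recovered ordering may come out reversed.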

Linear methods, which have long been considered part-and-parcel of the statistician’s toolbox, include PRINCIPAL COMPONENT ANALYSIS (PCA) and MULTIDIMENSIONAL SCALING (MDS). PCA has been used successfully in many different disciplines and applications. In computer vision, for example, PCA is used to study abstract notions of shape, appearance, and motion to help solve problems in facial and object recognition, surveillance, person tracking, security, and image compression where data are of high dimensionality (Turk and Pentland, 1991; De la Torre and Black, 2001). In astronomy, where very large digital sky surveys have become the norm, PCA has been used to analyze and classify stellar spectra, carry out morphological and spectral classification of galaxies and quasars, and analyze images of supernova remnants (Steiner, Menezes, Ricci, and Oliveira, 2009). In bioinformatics, PCA has been used to study high-dimensional data generated by genome-wide, gene-expression experiments on a variety of tissue sources, where scatterplots of the top principal components in such studies often show specific classes of genes that are expressed by different clusters of distinctive biological characteristics (Yeung and Ruzzo, 2001; Zheng-Bradley, Rung, Parkinson, and Brazma, 2010). PCA has also been used to select an optimal subset of single nucleotide polymorphisms (SNPs) (Lin and Altman, 2004). PCA is also used to derive approximations to more complicated nonlinear subspaces, including problems involving data interpolation, compression, denoising, and visualization.

## Spaces and Manifolds

Manifold learning involves concepts from general topology and differential geometry. Good introductions to topological spaces include Kelley (1955), Willard (1970), Bourbaki (1989), Mendelson (1990), Steen (1995), James (1999), and several of these have since been reprinted. Books on differential geometry include Spivak (1965), Kreyszig (1991), Kühnel (2000), Lee (2002), and Pressley (2010).

Manifolds generalize the notions of curves and surfaces in two and three dimensions to higher dimensions. Before we give a formal description of a manifold, it will be helpful to visualize the notion of a manifold. Imagine an ant at a picnic, where there are all sorts of items from cups to doughnuts. The ant crawls all over the picnic items, but because of its tiny size, the ant sees everything on a very small scale as flat and featureless. Similarly, a human, looking around at the immediate vicinity, would not see the curvature of the earth. A manifold (also referred to as a topological manifold) can be thought of in similar terms, as a topological space that locally looks flat and featureless and behaves like Euclidean space. Unlike a metric space, a topological space has no concept of distance. In this Section, we review specific definitions and ideas from topology and differential geometry that enable us to provide a useful definition of a manifold.

## Input embedding

The input embedding sub-layer converts the input tokens to vectors of dimension $d_{\text{model}}=512$ using learned embeddings in the original Transformer model. The structure of the input embedding is classical.

The embedding sub-layer works like that of other standard transduction models. A tokenizer will transform a sentence into tokens. Each tokenizer has its own methods, but the results are similar. For example, a tokenizer applied to the sequence “the Transformer is an innovative NLP model!” will produce the following tokens in one type of model:

You will notice that this tokenizer normalized the string to lower case and split it into subparts. A tokenizer will generally provide an integer representation that will be used for the embedding process. For example:
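The tokenize-then-index step can be sketched with a deliberately simple word-level tokenizer. This is an assumption for illustration only: it lowercases the text, separates words from punctuation with a regular expression, and assigns integer IDs in order of first appearance. Real Transformer tokenizers use learned subword vocabularies and would split the example sentence differently.

```python
import re

def tokenize(text):
    """Lowercase, then split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = tokenize("the Transformer is an innovative NLP model!")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['the', 'transformer', 'is', 'an', 'innovative', 'nlp', 'model', '!']
print(ids)     # [0, 1, 2, 3, 4, 5, 6, 7]
```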

There is not enough information in the tokenized text at this point to go further. The tokenized text must be embedded.
The Transformer contains a learned embedding sub-layer. Many embedding methods can be applied to the tokenized input.
I chose the skip-gram architecture of the word2vec embedding approach Google made available in 2013 to illustrate the embedding sub-layer of the Transformer. A skip-gram model focuses on a center word in a window of words and predicts the context words. For example, if word(i) is the center word in a two-step window, a skip-gram model will analyze word(i-2), word(i-1), word(i+1), and word(i+2). Then the window will slide and repeat the process. A skip-gram model generally contains an input layer, weights, a hidden layer, and an output containing the word embeddings of the tokenized input words.
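The sliding window described above can be sketched as a generator of (center, context) training pairs. This only shows how skip-gram training examples are formed with a window of two words on each side; it does not train any embedding, and the helper name is illustrative.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: for word(i), the words word(i-window) .. word(i+window), excluding i."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "The black cat sat on the couch and the brown dog slept on the rug."
tokens = sentence.lower().rstrip(".").split()
pairs = skipgram_pairs(tokens)
print([ctx for center, ctx in pairs if center == "black"])   # ['the', 'cat', 'sat']
print([ctx for center, ctx in pairs if center == "brown"])   # ['and', 'the', 'dog', 'slept']
```

Near the start or end of the sentence the window is simply cut off, which is why "black" has only three context words.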
Suppose we need to perform embedding for the following sentence:
The black cat sat on the couch and the brown dog slept on the rug.
We will focus on two words, black and brown. The word embedding vectors of these two words should be similar.
Since we must produce a vector of size $d_{\text{model}}=512$ for each word, we will obtain a size 512 vector embedding for each word:

The word black is now represented by 512 dimensions. Other embedding methods could be used, and $d_{\text{model}}$ could have a higher number of dimensions.
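Once tokens have integer IDs, an embedding sub-layer amounts to a lookup into a matrix with one row of size $d_{\text{model}}$ per vocabulary entry, and similarity between word vectors is commonly measured with cosine similarity. The sketch below uses a randomly initialized table, as a learned layer would have before training, so the similarity it prints for black and brown carries no meaning yet; after training (for example with skip-gram), related words such as black and brown would be expected to score higher. The tiny vocabulary, initialization scale and seed are illustrative assumptions.

```python
import math
import random

D_MODEL = 512

def init_embeddings(vocab, d_model=D_MODEL, seed=0):
    """One randomly initialized d_model-dimensional row per vocabulary entry."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d_model)) for _ in range(d_model)] for _ in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vocab = {w: i for i, w in enumerate(["the", "black", "cat", "brown", "dog"])}
table = init_embeddings(vocab)
black = table[vocab["black"]]          # embedding lookup: row indexed by token ID
brown = table[vocab["brown"]]
print(len(black), round(cosine(black, brown), 3))
```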

## Positional encoding

We enter this positional encoding function of the Transformer with no idea of the position of a word in a sequence:

We cannot create independent positional vectors, since they would have a high cost on the training speed of the Transformer and make the attention sub-layers very complex to work with. The idea is to add a positional encoding value to the input embedding instead of having additional vectors to describe the position of a token in a sequence.
We also know that the Transformer expects a fixed size $d_{\text{model}}=512$ (or other constant value for the model) for each vector of the output of the positional encoding function.
If we go back to the sentence we used in the word embedding sub-layer, we can see that black and brown may be similar, but they are far apart:
The black cat sat on the couch and the brown dog slept on the rug.
The word black is in position 2, pos $=2$, and the word brown is in position 10, pos $=10$.
Our problem is to find a way to add a value to the word embedding of each word so that it carries that information. However, we need to add a value to all $d_{\text{model}}=512$ dimensions! For each word embedding vector, we need to find a way to provide information to each dimension $i$ in the range $[0, 512)$ of the word embedding vectors of black and brown.

There are many ways to achieve this goal. The designers found a clever way to use a unit sphere to represent positional encoding with sine and cosine values that will thus remain small but very useful.
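The sine and cosine scheme of the original Transformer (Vaswani et al., 2017) sets, for position pos and index $i$, $PE_{(pos, 2i)}=\sin\left(pos / 10000^{2i / d_{\text{model}}}\right)$ and $PE_{(pos, 2i+1)}=\cos\left(pos / 10000^{2i / d_{\text{model}}}\right)$, so every value stays in $[-1, 1]$. The sketch below computes these values and adds them to a word embedding. The all-zero embedding is a placeholder, and the positions follow the numbering used in the text (pos $=2$ for black, pos $=10$ for brown).

```python
import math

D_MODEL = 512

def positional_encoding(pos, d_model=D_MODEL):
    """Sinusoidal encoding: sin on even indices 2i, cos on odd indices 2i+1."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):               # i plays the role of 2i in the formula
        angle = pos / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(angle)
    return pe

def add_position(embedding, pos):
    """Inject position information by adding PE(pos) to the embedding, element by element."""
    return [e + p for e, p in zip(embedding, positional_encoding(pos, len(embedding)))]

pe_black, pe_brown = positional_encoding(2), positional_encoding(10)
print(pe_black[:4])
print(pe_brown[:4])

word_vector = [0.0] * D_MODEL                    # placeholder embedding for "black"
encoded = add_position(word_vector, 2)
```

Adding rather than concatenating keeps the vector size at $d_{\text{model}}$, which is the constraint stated earlier in this section.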

## The rise of the Transformer: Attention Is All You Need

In December 2017, Vaswani et al. published their seminal paper, Attention Is All You Need. They performed their work at Google Research and Google Brain. I will refer to the model described in Attention Is All You Need as the “original Transformer model” throughout this chapter and book.

In this section, we will look at the Transformer model they built from the outside. In the following sections, we will explore what is inside each component of the model.
The original Transformer model is a stack of 6 layers. The output of layer $l$ is the input of layer $l+1$ until the final prediction is reached. There is a 6-layer encoder stack on the left and a 6-layer decoder stack on the right.

On the left, the inputs enter the encoder side of the Transformer through an attention sub-layer and a feedforward network (FFN) sub-layer. On the right, the target outputs go into the decoder side of the Transformer through two attention sub-layers and an FFN sub-layer. We immediately notice that there is no RNN, LSTM, or CNN. Recurrence has been abandoned.
Attention has replaced recurrence, which requires an increasing number of operations as the distance between two words increases. The attention mechanism is a "word-to-word" operation: it finds how each word is related to all other words in a sequence, including the word being analyzed itself.

By relating every word directly to every other word, the attention mechanism captures deeper relationships between words and produces better results.
For each attention sub-layer, the original Transformer model runs not one but eight attention mechanisms in parallel to speed up the calculations. We will explore this architecture in the following section, The encoder stack. This process is named "multi-head attention", and it provides:
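
The word-to-word operation can be sketched as scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$, the formula used in the original paper. This is a minimal pure-Python sketch that assumes self-attention with $Q = K = V$ set directly to the input vectors (the real model first applies learned projections); the function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """Return (outputs, weights) for row-vector lists q, k, v."""
    d_k = len(k[0])
    outputs, weights = [], []
    for q_row in q:
        # Similarity of this word with every word, including itself.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(wi * v_row[j] for wi, v_row in zip(w, v)) for j in range(len(v[0]))])
    return outputs, weights

x = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0]]
out, w = scaled_dot_product_attention(x, x, x)
for row in w:
    print([round(p, 3) for p in row])
```

Each row of weights sums to 1 and states how much one word attends to every word in the sequence; no step depends on how far apart two words are, unlike recurrence.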

• A broader in-depth analysis of sequences
• The elimination of recurrence, which reduces the number of calculation operations
• The implementation of parallelization, which reduces training time
• The ability of each attention mechanism to learn a different perspective on the same input sequence
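
A minimal sketch of the multi-head split, assuming $d_{\text{model}}=512$ and eight heads of $512/8 = 64$ dimensions each, as in the original model. To stay short it omits the learned per-head projections and the final output projection, so each head simply attends over its own 64-dimensional slice; the function names are hypothetical.

```python
import itertools
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for one head (lists of row vectors)."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out

def split_heads(x, n_heads):
    """Cut every d_model vector into n_heads contiguous slices."""
    d_model = len(x[0])
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(n_heads)]

def multi_head_self_attention(x, n_heads=8):
    heads = split_heads(x, n_heads)
    # Heads are independent of each other, so they could run in parallel.
    head_outputs = [attention(h, h, h) for h in heads]
    # Concatenate the head outputs back into one d_model vector per token.
    return [list(itertools.chain.from_iterable(out[t] for out in head_outputs))
            for t in range(len(x))]

rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(512)] for _ in range(5)]
y = multi_head_self_attention(x)
print(len(y), len(y[0]))  # prints: 5 512
```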

## The encoder stack

The encoder and the decoder of the original Transformer model are each stacks of layers. Each layer of the encoder stack has the structure described below.

The original encoder layer structure remains the same for all of the $N=6$ layers of the Transformer model. Each layer contains two main sub-layers: a multi-headed attention mechanism and a fully connected position-wise feedforward network.
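
The position-wise feedforward sub-layer applies the same two-layer network, $\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$, to every position separately. The original model uses $d_{\text{model}}=512$ and an inner size of 2048; this Python sketch assumes tiny sizes (8 and 32) and random weights so it runs quickly without libraries. `PositionwiseFFN` is a hypothetical name.

```python
import random

def relu(x):
    return max(0.0, x)

def linear(x, w, b):
    """x @ w + b, with w stored as an n_in x n_out list of lists."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def init_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

class PositionwiseFFN:
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied to each position independently."""

    def __init__(self, d_model, d_ff, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(d_model, d_ff, rng)
        self.w2, self.b2 = init_layer(d_ff, d_model, rng)

    def __call__(self, sequence):
        # The same weights are reused for every position in the sequence.
        return [linear([relu(h) for h in linear(x, self.w1, self.b1)], self.w2, self.b2)
                for x in sequence]

ffn = PositionwiseFFN(d_model=8, d_ff=32)
rng = random.Random(1)
seq = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(3)]
out = ffn(seq)
print(len(out), len(out[0]))  # prints: 3 8
```

Because no position sees any other, the output for one token does not change when the rest of the sequence changes; mixing across positions is left entirely to attention.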
Notice that a residual connection surrounds each main sub-layer, Sublayer$(x)$, in the Transformer model. These connections carry the unprocessed input $x$ of a sub-layer to a layer normalization function. This way, we are certain that key information such as positional encoding is not lost on the way. The normalized output of each sub-layer is thus:
$$\text{LayerNormalization}(x + \text{Sublayer}(x))$$
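
A minimal Python sketch of the residual connection followed by layer normalization. It assumes the learned gain and bias of layer normalization stay at 1 and 0 (so they are omitted) and uses a small `eps` for numerical stability; `residual_sublayer` is a hypothetical helper that wraps any sub-layer function, and `double` is a stand-in sub-layer.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_sublayer(x, sublayer):
    """LayerNormalization(x + Sublayer(x)) for a sequence of row vectors."""
    sub_out = sublayer(x)
    return [layer_norm([a + b for a, b in zip(row, s_row)])
            for row, s_row in zip(x, sub_out)]

def double(seq):
    # Stand-in sub-layer for the example.
    return [[2.0 * v for v in row] for row in seq]

x = [[1.0, 2.0, 3.0, 4.0]]
print(residual_sublayer(x, double))
```

With a sub-layer that outputs zeros, the result is just the normalized input, which shows how the residual path keeps the original signal (including its positional encoding) flowing to the next sub-layer.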
Though the structure of each of the $N=6$ layers of the encoder is identical, the content of each layer is not strictly identical to the previous layer.
For example, the embedding sub-layer is only present at the bottom level of the stack. The other five layers do not contain an embedding layer, and this guarantees that the encoded input is stable through all the layers.

Also, the multi-head attention mechanisms perform the same functions from layer 1 to layer 6. However, they do not perform the same tasks. Each layer learns from the previous layer and explores different ways of associating the tokens in the sequence. It looks for various associations of words, just as we look for different associations of letters and words when we solve a crossword puzzle.
The designers of the Transformer introduced a very efficient constraint: the output of every sub-layer of the model has a constant dimension, including the embedding layer and the residual connections. This dimension is $d_{\text{model}}$ and can be set to another value depending on your goals. In the original Transformer architecture, $d_{\text{model}}=512$.

## Getting Started with the Model Architecture of the Transformer

Language is the essence of human communication. Civilizations would never have been born without the word sequences that form language. We now mostly live in a world of digital representations of language. Our daily lives rely on digitized Natural Language Processing (NLP) language functions: web search engines, emails, social networks, posts, tweets, smartphone texting, translations, web pages, speech-to-text on streaming sites for transcripts, text-to-speech on hotline services, and many more everyday functions.

In December 2017, Vaswani et al., members of Google Brain and Google Research, published the seminal article Attention Is All You Need, and the Transformer was born. The Transformer outperformed the existing state-of-the-art NLP models: it trained faster than previous architectures and obtained higher evaluation results. Transformers have become a key component of NLP.
The digital world would never have existed without NLP. Natural Language Processing would have remained primitive and inefficient without artificial intelligence. However, the use of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) comes at a tremendous cost in terms of calculations and machine power.

In this chapter, we will first start with the background of NLP that led to the rise of the Transformer. We will briefly go from early NLP to RNNs and CNNs. Then we will see how the Transformer overthrew the reign of RNNs and CNNs, which had prevailed for decades for sequence analysis.

Then we will open the hood of the Transformer model described by Vaswani et al. (2017) and examine the key components of its architecture. We will explore the fascinating world of attention and illustrate the key components of the Transformer.
This chapter covers the following topics:

• The background of the Transformer
• The architecture of the Transformer
• The Transformer’s self-attention model
• The encoding and decoding stacks
• Input and output embedding
• Positional embedding
• Self-attention
• Multi-head attention
• Masked multi-head attention
• Residual connections
• Normalization
• Feedforward network
• Output probabilities
Our first step will be to explore the background of the Transformer.

## The background of the Transformer

In this section, we will go through the background of NLP that led to the Transformer. The Transformer model invented by Google Research has toppled decades of Natural Language Processing research, development, and implementations.
Let us first see how that happened when NLP reached a critical limit that required a new approach.

Over the past $100+$ years, many great minds have worked on sequence transduction and language modeling. Machines progressively learned how to predict probable sequences of words. It would take a whole book to cite all the giants that made this happen.

In this section, I will share my favorite researchers with you to lay the ground for the arrival of the Transformer.

In the early 20th century, Andrey Markov introduced the concept of random values and created a theory of stochastic processes. We know them in artificial intelligence (AI) as Markov Decision Processes (MDPs), Markov Chains, and Markov Processes. In 1902, Markov showed that we could predict the next element of a chain, a sequence, using only the last past element of that chain. In 1913, he applied this to a 20,000-letter dataset, using past sequences to predict the future letters of a chain. Bear in mind that he had no computer but managed to prove his theory, which is still in use today in AI.
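
Markov's idea of predicting the next element from only the last one can be sketched as a first-order, letter-level Markov chain. This Python sketch assumes a toy training string (the lower-cased example sentence about the black cat and the brown dog) and simply picks the most frequent successor; `train_markov` and `predict_next` are hypothetical names, not a reconstruction of Markov's own procedure.

```python
from collections import Counter, defaultdict

def train_markov(text):
    """Count, for every character, which characters follow it."""
    transitions = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        transitions[current][following] += 1
    return transitions

def predict_next(transitions, current):
    """Most frequent successor of `current`, using only that last character."""
    if current not in transitions:
        return None  # character never seen during training
    return transitions[current].most_common(1)[0][0]

model = train_markov("the black cat sat on the couch and the brown dog slept on the rug")
print(predict_next(model, "t"))  # prints: h
```

In the training string, "t" is followed by "h" four times (each "the") and by a space three times, so the chain predicts "h"; no context older than the last character is ever consulted.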
In 1948, Claude Shannon's The Mathematical Theory of Communication was published. He cites Andrey Markov's theory multiple times when building his probabilistic approach to sequence modeling. Claude Shannon laid the ground for a communication model based on a source encoder, a transmitter, and a receiver decoder or semantic decoder.

## Proposed Artificial Dragonfly Algorithm for solving Optimization Problem

In this work, a modified ADA is implemented for training the NN classifier. The DA model $[21,23]$ relies on five factors for updating the location of a dragonfly: (i) cohesion, (ii) alignment, (iii) separation, (iv) attraction and (v) distraction. The separation of the $r^{th}$ dragonfly, $M_{r}$, is calculated by Equation (1.24), where $A'$ denotes the current dragonfly position, $A_{s}^{\prime}$ refers to the location of the $s^{\text{th}}$ neighbouring dragonfly and $H^{\prime}$ denotes the number of neighbouring dragonflies.
$$M_{r}=\sum_{s=1}^{H^{\prime}}\left(A^{\prime}-A_{s}^{\prime}\right)$$
The alignment and cohesion are computed by Equation (1.25) and Equation (1.26). In Equation (1.25), $Q_{s}^{\prime}$ refers to the velocity of the $s^{\text{th}}$ neighbouring dragonfly.
$$
\begin{aligned}
J_{r} &= \frac{\sum_{s=1}^{H'} Q'_{s}}{H'} \qquad (1.25)\\
V_{r} &= \frac{\sum_{s=1}^{H'} A'_{s}}{H'} - A' \qquad (1.26)
\end{aligned}
$$
Attraction towards food and distraction from the enemy are given in Equation (1.27) and Equation (1.28). In Equation (1.27), $\mathit{Fo}$ refers to the food position and in Equation (1.28), $\mathit{ene}$ denotes the enemy position.
$$
\begin{aligned}
W_{r} &= \mathit{Fo} - A' \qquad (1.27)\\
Z_{r} &= \mathit{ene} + A' \qquad (1.28)
\end{aligned}
$$
Two vectors, the position $A^{\prime}$ and the step $\Delta A^{\prime}$, are used to update the position of a dragonfly. The step vector $\Delta A^{\prime}$ denotes the moving direction of the dragonflies as given in Equation (1.29), in which $q^{\prime}$, $t^{\prime}$, $v^{\prime}$, $u^{\prime}$, $z^{\prime}$ and $\delta$ refer to the weights for separation, alignment, cohesion, food factor, enemy factor and inertia, respectively, and $l$ denotes the iteration count.
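
Equation (1.29) itself is not reproduced here. As a hedged sketch, assuming the standard dragonfly-algorithm form $\Delta A'_{l+1} = (q'M_r + t'J_r + v'V_r + u'W_r + z'Z_r) + \delta\,\Delta A'_l$ followed by $A'_{l+1} = A'_l + \Delta A'_{l+1}$ (the modifications of the adopted ADA are not shown), one update of a single dragonfly could look like this in Python. The function name, argument layout and the choice to reject an empty neighbourhood are assumptions.

```python
def dragonfly_step(pos, step, neighbours, neighbour_steps, food, enemy, weights):
    """One update of a single dragonfly; all vectors are lists of floats.

    weights = (q, t, v, u, z, delta): separation, alignment, cohesion,
    food, enemy and inertia weights, standing for q', t', v', u', z', delta.
    """
    if not neighbours:
        raise ValueError("this sketch assumes at least one neighbour")
    q, t, v, u, z, delta = weights
    n, dim = len(neighbours), len(pos)
    sep = [sum(pos[d] - nb[d] for nb in neighbours) for d in range(dim)]      # M_r, Eq. (1.24)
    align = [sum(ns[d] for ns in neighbour_steps) / n for d in range(dim)]    # J_r, Eq. (1.25)
    coh = [sum(nb[d] for nb in neighbours) / n - pos[d] for d in range(dim)]  # V_r, Eq. (1.26)
    food_att = [food[d] - pos[d] for d in range(dim)]                         # W_r, Eq. (1.27)
    enemy_dis = [enemy[d] + pos[d] for d in range(dim)]                       # Z_r, Eq. (1.28)
    # Assumed step rule: weighted sum of the five factors plus inertia.
    new_step = [q * sep[d] + t * align[d] + v * coh[d] + u * food_att[d]
                + z * enemy_dis[d] + delta * step[d] for d in range(dim)]
    new_pos = [pos[d] + new_step[d] for d in range(dim)]
    return new_pos, new_step

new_pos, new_step = dragonfly_step(
    pos=[0.0], step=[0.0],
    neighbours=[[1.0], [3.0]], neighbour_steps=[[0.5], [1.5]],
    food=[10.0], enemy=[-5.0],
    weights=(0.1, 0.1, 0.1, 1.0, 0.0, 0.9),
)
print(new_pos, new_step)
```

In this one-dimensional toy case the food term dominates, so the dragonfly moves about 9.9 units towards the food; tuning the weights (such as the cohesion weight $v'$) shifts this balance.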

## Result Interpretation

The performance of the implemented model for varied values of $T$ is shown in Figures 1.6, 1.7, 1.8 and 1.9 for accuracy, sensitivity, specificity and F1-score, respectively. For instance, from Figure 1.6, the accuracy at $T=97$ is high, which is $3.06\%$, $3.06\%$, $8.16\%$ and $6.12\%$ better than $T$ at 94, 95, 98, 99 and 100 when $v'$ is $0.2$. From Figure 1.6, the accuracy of the adopted model when $T=95$ is high, which is $8.16\%$, $13.27\%$, $8.16\%$ and $16.33\%$ better than $T$ at 97, 98, 99 and 100 when $v'$ is $0.4$. Considering Figure 1.6 again, the accuracy at $T=95$ is high, which is $7.53\%$, $3.23\%$, $3.23\%$ and $3.23\%$ better than $T$ at 97, 98, 99 and 100 when $v'$ is $0.2$.

Likewise, from Figure 1.7, the sensitivity of the adopted scheme when $T=97$ is higher, which is $1.08\%$, $2.15\%$, $1.08\%$ and $16.13\%$ better than $T$ at 94, 95, 98, 99 and 100 when $v'$ is $0.9$. Also, from Figure 1.7, the sensitivity at $T=97$ is higher, which is $7.22\%$, $12.37\%$, $7.22\%$ and $6.19\%$ better than $T$ at 95, 98, 99 and 100 when $v'$ is $0.7$.

Moreover, Figure 1.8 shows the specificity of the adopted model, which gives better results for the test cases shown. From Figure 1.8, the specificity of the presented model at $T=95$ is high, which is $3.23\%$, $8.6\%$, $8.6\%$ and $8.6\%$ better than $T$ at 97, 98, 99 and 100 when $v'$ is $0.7$. From Figure 1.8, the specificity at $T=99$ is high, which is $13.04\%$, $2.17\%$, $2.17\%$ and $13.04\%$ better than $T$ at 95, 97, 98 and 100 when $v'$ is $0.6$. From Figure 1.8, the specificity at $T=99$ is also high, which is $21.05\%$, $21.05\%$, $47.37\%$ and $47.37\%$ better than $T$ at 95, 97, 98 and 100 when $v'$ is $0.7$.

Figure 1.9 shows the F1-score of the adopted model, which improves for all values of $T$. From Figure 1.9, the F1-score of the implemented model at $T=95$ is high, which is $3.23\%$, $8.6\%$, $8.6\%$ and $8.6\%$ better than $T$ at 97, 98, 99 and 100 when $v'$ is $0.4$. From Figure 1.9, the F1-score at $T=99$ is high, which is $3.23\%$, $8.6\%$, $8.6\%$ and $8.6\%$ better than $T$ at 95, 97, 98 and 100 when $v'$ is $0.4$. Thus, the improvement offered by the adopted scheme is validated.
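
The metrics compared above can be computed from a confusion matrix. This is a minimal Python sketch assuming binary classification and the usual definitions: sensitivity $= TP/(TP+FN)$, specificity $= TN/(TN+FP)$, and F1 as the harmonic mean of precision and sensitivity. The text does not state how "X% better" is computed; `percent_better` assumes a relative difference $(a-b)/b \times 100$, and the counts in the example are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

def percent_better(a, b):
    """Relative improvement of score a over score b, in percent (assumed convention)."""
    return 100.0 * (a - b) / b

m = classification_metrics(tp=45, fp=5, tn=40, fn=10)  # hypothetical counts
print(m)
print(percent_better(0.9, 0.8))
```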

## Related Work

This section reviews various DL approaches and discusses existing methods for detecting and diagnosing cancer.

Siddhartha Bhatia et al. [4] implemented a model that predicts lung lesions from CT scans using deep convolutional residual techniques. Classifiers such as XGBoost and Random Forest are used to train the model. After preprocessing, feature extraction is performed with UNet and ResNet models. The LIDC-IDRI dataset is used for evaluation, and an accuracy of $84\%$ is reported.

A. Asuntha et al. [5] implemented an approach to detect and label pulmonary nodules, using novel deep learning methods for the detection of lung nodules. Various feature extraction techniques are applied, after which feature selection is performed with the Fuzzy Particle Swarm Optimization (FPSO) algorithm. Finally, classification is done by deep learning methods; FPSOCNN is used to reduce the computational cost of the CNN. The approach is further evaluated on a real-time dataset collected from Arthi Scan Hospital, and the experimental analysis shows that the novel FPSOCNN gives the best results compared with other techniques.

Fangzhou Liao et al. [6] developed a 3D deep neural network model that comprises two modules: a 3D region proposal network that detects the nodules, and a module that evaluates the cancer probabilities. Both modules use a modified U-Net. The proposed model won first prize in the 2017 Data Science Bowl competition, and the overall model achieved better results in this lung cancer classification competition.

Qing Zeng et al. [7] implemented three variants of DL algorithms, namely CNN, DNN, and SAE. The proposed models are applied to CT scans for classification and evaluated on the LIDC-IDRI dataset, achieving the best performance with $84.32\%$ specificity, $83.96\%$ sensitivity and $84.15\%$ accuracy.
