## CS assignment help | machine learning assignment and exam help | Regression and optimization

statistics-lab™ supports your studies abroad. We have established a reputation for machine learning assignment help and guarantee reliable, high-quality, and original statistics writing services. Our experts have extensive experience with machine learning assignments of all kinds.

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• Advanced Probability Theory 高等概率论
• Advanced Mathematical Statistics 高等数理统计学
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## Linear regression and gradient descent

Linear regression is usually taught in high school, but my hope is that this book will provide a new appreciation for this subject and its associated methods. It is the simplest form of machine learning, and while linear regression seems limited in scope, linear methods still have practical relevance since many problems are at least locally approximately linear. Furthermore, we use them here to formalize machine learning methods and specifically to introduce some methods that we can later generalize to non-linear situations. Supervised machine learning is essentially regression, although what distinguishes the recent success of machine learning from previous approaches to modeling and regression is its applicability to high-dimensional data with non-linear relations, and the ability to scale these methods to complex models. Linear regression can be solved analytically. However, the non-linear extensions will usually not be analytically solvable. Hence, we introduce here the formalization of iterative training methods that underlie much of supervised learning.

To discuss linear regression, we will follow an example of describing house prices. The table on the left in Figure 5.1 lists the sizes in square feet and the corresponding asking prices of some houses. These data points are plotted in the graph on the right in Figure 5.1. The question is: can we predict from these data the likely asking price for houses of other sizes?

To make this prediction we assume that the house price depends on the size of the house in an essentially linear way; that is, a house twice the size should cost twice the money. Of course, this linear model clearly does not capture all the dimensions of the problem. Some houses are old, others might be new; some might need repair while others have special features. And, as everyone in the real estate business knows, location is very important. Thus, we should keep in mind that there might be unobserved, so-called latent dimensions in the data that are important in explaining the relations. However, we ignore such hidden causes at this point and just use the linear model over size as our hypothesis.
The linear model of the relation between the house size and the asking price can be made mathematically explicit with the linear equation
$$y\left(x ; w_{1}, w_{2}\right)=w_{1} x+w_{2}$$
where $y$ is the asking price, $x$ is the size of the house, and $w_{1}$ and $w_{2}$ are model parameters. Note that $y$ is a function of $x$, and here we follow a notation where the parameters of a function are included after a semi-colon. If the parameters are given, then this function can be used to predict the price of a house for any size. This is the general theme of supervised learning; we assume a specific function with parameters that we can use to predict new data.
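As a quick illustration, the linear hypothesis can be written as a small Python function; the parameter values below are purely hypothetical and not taken from the book's table:

```python
import numpy as np

def y(x, w1, w2):
    """Linear model y(x; w1, w2) = w1 * x + w2."""
    return w1 * x + w2

# Hypothetical parameters: price (in $1000s) = 0.1 * size + 50
sizes = np.array([1000.0, 2000.0])
prices = y(sizes, 0.1, 50.0)   # -> array([150., 250.])
```

Once values for $w_1$ and $w_2$ are chosen, the same function predicts a price for any house size.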

## Error surface and challenges for gradient descent

It is instructive to look at the precise numerical results and details when implementing the whole procedure. We first import our common NumPy and plotting routines and then define the data given in the table in Fig. 5.1. This figure also shows a plot of these data.

We now write the regression code as shown in Listing 5.2. First we set the starting values for the parameters $w_{1}$ and $w_{2}$, and we initialize an empty array to store the values of the loss function $L$ in each iteration. We also set the update (learning) rate $\alpha$ to a small value. We then perform ten iterations to update the parameters $w_{1}$ and $w_{2}$ with the gradient descent rule. Note that an array index with the value $-1$ indicates the last element of a Python array. The result of this program is shown in Fig. 5.2. The fit of the function shown in Fig. 5.2A does not look right at all. To see what is occurring, it is good to plot the values of the loss function as shown in Fig. 5.2B. As can be seen, the loss function gets bigger, not smaller as we would have expected, and the values themselves are extremely large.
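The described training loop might be sketched as follows; the data values here are a hypothetical stand-in for the table in Fig. 5.1, not the book's actual listing, and the learning rate is chosen small enough that the loss decreases:

```python
import numpy as np

# Hypothetical stand-in for the table in Fig. 5.1 (size in sqft, price in $1000s)
x = np.array([1000., 1500., 2000., 2500., 3000.])
t = np.array([200., 250., 300., 350., 400.])

w1, w2 = 0.0, 0.0     # starting values of the parameters
alpha = 1e-8          # tiny learning rate; larger values make the loss diverge
L = []                # loss value for each iteration

for _ in range(10):
    y = w1 * x + w2                      # model prediction
    L.append(np.mean((y - t) ** 2))      # mean squared error loss
    grad_w1 = np.mean(2 * (y - t) * x)   # dL/dw1
    grad_w2 = np.mean(2 * (y - t))       # dL/dw2
    w1 -= alpha * grad_w1                # gradient descent updates
    w2 -= alpha * grad_w2
```

With a larger $\alpha$ the same loop produces the exploding loss values described in the text.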

The rising loss value is a hint that the learning rate is too large. The reason that this can happen is illustrated in Fig. 5.2C. This graph is a cartoon of a quadratic loss surface. When the update term is too large, the gradient can overshoot the minimum value. In such a case, the loss of the next step can be even larger since the slope at this point is also higher. In this way, every step can increase the loss value and the values will soon exceed the values representable in a computer.

So, let’s try it again with a much smaller learning rate of alpha $=0.00000001$, which was chosen after several trials to get what looks like the best result. The results shown in Fig. 5.2 certainly look much better, although still not quite right. The fitted curve does not seem to balance the data points well, and while the loss values at first decrease rapidly, they seem to get stuck at a small value.

To look more closely at what is going on, we can plot the loss function for several values around our expected values of the parameters. This is shown in Fig. 5.2C. This reveals that the change of the loss function with respect to the parameter $w_{2}$ is large, but that changing the parameter $w_{1}$ on the same scale has little influence on the loss value. To fix this problem we would have to choose a different learning rate for each parameter, which is not practical in higher-dimensional models. There are much more sophisticated solutions such as Amari’s natural gradient, but a quick fix for many applications is to normalize the data so that the ranges are between 0 and 1. Thus, by adding the normalization code and setting the learning rate to alpha $=0.04$, we get the solution shown in Fig. 5.2. The solution is much better, although the learning path is still not optimal. However, this is a solution that is sufficient most of the time.
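A minimal sketch of the normalization fix, again with hypothetical stand-in data; after rescaling both variables to $[0, 1]$, the learning rate $\alpha = 0.04$ from the text works for both parameters:

```python
import numpy as np

# Hypothetical data; normalize both size and price to the range [0, 1].
x = np.array([1000., 1500., 2000., 2500., 3000.])
t = np.array([200., 250., 300., 350., 400.])
x, t = x / x.max(), t / t.max()

w1, w2, alpha = 0.0, 0.0, 0.04
for _ in range(5000):
    y = w1 * x + w2
    w1 -= alpha * np.mean(2 * (y - t) * x)   # gradient descent updates
    w2 -= alpha * np.mean(2 * (y - t))
# For this exactly linear data the fit converges to t = 0.75 x + 0.25
# in normalized units.
```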

Learning in machine learning means finding the parameters $\mathbf{w}$ of the model that minimize the loss function. There are many methods to minimize a function, and each one would constitute a learning algorithm. However, the workhorse in machine learning is usually some form of the gradient descent algorithm that we encountered earlier. Formally, the basic gradient descent minimizes the sum of the loss values over all training examples; this is called a batch algorithm as all training examples build the batch for minimization. Let us assume we have $N$ training examples; gradient descent then iterates the equation
$$w_{i} \leftarrow w_{i}+\Delta w_{i}$$
with
$$\Delta w_{i}=-\frac{\alpha}{N} \sum_{k=1}^{N} \frac{\partial \mathcal{L}\left(y^{(k)}, \mathbf{x}^{(k)} \mid \mathbf{w}\right)}{\partial w_{i}}$$
where $N$ is the number of training samples. We can also write this compactly for all parameters using vector notation and the Nabla operator $\nabla$ as
$$\Delta \mathrm{w}=-\frac{\alpha}{N} \sum_{i=1}^{N} \nabla \mathcal{L}^{(i)}$$
with the shorthand
$$\mathcal{L}^{(i)}=\mathcal{L}\left(y^{(i)}, \mathbf{x}^{(i)} \mid \mathbf{w}\right). \tag{5.10}$$
With a sufficiently small learning rate $\alpha$, this will result in a strictly monotonically decreasing learning curve. However, with many training data, a large number of training examples have to be kept in memory. Also, batch learning seems unrealistic biologically or in situations where training examples only arrive over a period of time. So-called online algorithms that use the training data as they arrive are therefore often desirable. An online gradient descent would consider only one training example at a time,
$$\Delta \mathbf{w}=-\alpha \nabla \mathcal{L}^{(i)}$$
and then use another training example for the next update. If the training examples appear randomly in such example-wise training, then the updates perform a random walk around the true gradient descent. This algorithm is hence called stochastic gradient descent (SGD). It can be seen as an approximation of the basic gradient descent algorithm, and the randomness has some positive effects on the search path, such as avoiding oscillations or getting stuck in local minima. In practice it is now common to use something in between, iterating over so-called mini-batches of the training data. This is formally still a stochastic gradient descent, but it combines the advantages of a batch algorithm with the reality of limited memory capacities.
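A mini-batch SGD loop might be sketched as follows; the synthetic data and the hyperparameters (learning rate, batch size) are illustrative choices, not values from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
t = 2.0 * x + 1.0          # noise-free synthetic linear data for clarity

w1, w2 = 0.0, 0.0
alpha, batch = 0.1, 16     # hypothetical hyperparameters
for epoch in range(200):
    idx = rng.permutation(len(x))        # random example order each epoch
    for start in range(0, len(x), batch):
        b = idx[start:start + batch]     # indices of one mini-batch
        y = w1 * x[b] + w2
        w1 -= alpha * np.mean(2 * (y - t[b]) * x[b])
        w2 -= alpha * np.mean(2 * (y - t[b]))
```

Setting `batch = 1` recovers the pure online SGD update, while `batch = len(x)` recovers the batch algorithm.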


## Finite element method assignment help

As a professional service for international students, statistics-lab has for many years provided academic services to students in popular destinations such as the US, UK, Canada, and Australia, including but not limited to essay writing, assignment writing, dissertation writing, report writing, group-project writing, proposal writing, paper writing, presentation writing, computer-science assignment help, proofreading and polishing, online course completion, and exam taking. Our writing covers all stages of overseas study from high school to undergraduate and graduate level, spanning finance, economics, accounting, auditing, management, and 99% of subjects worldwide. Our team includes native English writers as well as graduate students from top overseas universities, each with strong language skills, a solid subject background, and academic writing experience. We promise 100% originality, 100% professionalism, 100% punctuality, and 100% satisfaction.

## MATLAB assignment help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include: development of mathematical and computational algorithms; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including the construction of graphical user interfaces. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This lets you solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran. The name MATLAB stands for Matrix Laboratory. MATLAB was originally written to provide easy access to the matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in matrix computation software. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most users, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

## Neural networks and Keras

## Neurons and the threshold perceptron

The brain is composed of specialized cells. These cells include neurons, which are thought to be the main information-processing units, and glia, which have a variety of supporting roles. A schematic example of a neuron is shown in Fig. 4.1a. Neurons are specialized in electrical and chemical information processing. They have an extension called an axon to send signals, and receiving extensions called dendrites. The contact zone between neurons is called a synapse. A sending neuron is often referred to as the presynaptic neuron and the receiving cell as the postsynaptic neuron. When a neuron becomes active it sends a spike down the axon, where it can release chemicals called neurotransmitters. The neurotransmitters can then bind to receptors on the dendrite that trigger the opening of ion channels. Ion channels are specialized proteins that form gates in the cell membrane. In this way, electrically charged ions can enter or leave the neuron and accordingly change the voltage (membrane potential) of the neuron. The dendrite and cell body act like a cable and a capacitor that integrates (sums) the potentials of all synapses. When the combined voltage at the axon reaches a certain threshold, a spike is generated. The spike can then travel down the axon and affect further neurons downstream.

This outline of the functionality of a neuron is, of course, a major simplification. For example, we ignored the specific time course of the opening and closing of ion channels and hence some of the more detailed dynamics of neural activity. We also ignored the transmission of electric signals within the neuron; this is why such a model is called a point neuron. Despite these simplifications, this model captures some important aspects of neuron functionality. It suffices for us at this point to build simplified models that demonstrate some of the information-processing capabilities of such a simplified neuron or a network of simplified neurons. We will now describe this model in mathematical terms so that we can simulate such model neurons with the help of a computer.

Warren McCulloch and Walter Pitts were among the first to propose such a simple model of a neuron in 1943, which they called the threshold logical unit. It is now often referred to as the McCulloch-Pitts neuron. Such a unit is shown in Fig. 4.2A with three input channels, although neurons typically have a much larger number of input channels. Input values are labeled by $x$ with a subscript for each channel. Each channel has an associated weight parameter, $w_{i}$, representing the “strength” of a synapse.
The McCulloch-Pitts neuron operates in the following way. Each input value is multiplied by the corresponding weight value, and these weighted values are then summed together, mimicking the superposition of electric charges. Finally, if the weighted summed input is larger than a certain threshold value, $w_{0}$, then the output is set to 1, and to 0 otherwise. Mathematically this can be written as
$$y(\mathbf{x} ; \mathbf{w})= \begin{cases}1 & \text { if } \sum_{i=1}^{n} w_{i} x_{i}=\mathbf{w}^{T} \mathbf{x}>w_{0} \\ 0 & \text { otherwise }\end{cases}$$
This simple neuron model can be written in a more generic form that we will call the perceptron. In this more general model, we calculate the output of a neuron by applying a gain function $g$ to the weighted summed input,
$$y(\mathbf{x} ; \mathbf{w})=g\left(\mathbf{w}^{T} \mathbf{x}\right)$$
where $\mathbf{w}$ are parameters that need to be set to specific values or, in other words, the parameters of our parameterized model for supervised learning. We will come back later to the question of how precisely to choose them. The original McCulloch-Pitts neuron is in these terms a threshold perceptron with a threshold gain function,
$$g(x)= \begin{cases}1 & \text { if } x>0 \\ 0 & \text { otherwise }\end{cases}$$
This threshold gain function is a first example of a non-linear function that transforms the sum of the weighted inputs. The gain function is sometimes called the activation function, the transfer function, or the output function in the neural network literature. Non-linear gain functions are an important part of artificial neural networks as further discussed in later chapters.
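A minimal sketch of such a McCulloch-Pitts threshold unit; the weights below, which make the unit compute a logical AND of two binary inputs, are chosen purely for illustration:

```python
import numpy as np

def threshold_perceptron(x, w, w0):
    """McCulloch-Pitts unit: output 1 if w^T x > w0, else 0."""
    return 1 if np.dot(w, x) > w0 else 0

# Hypothetical weights implementing a logical AND of two binary inputs
w, w0 = np.array([1.0, 1.0]), 1.5
out = [threshold_perceptron(np.array(x), w, w0)
       for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]   # -> [0, 0, 0, 1]
```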

## Multilayer perceptron (MLP) and Keras

To represent more complex functions with perceptron-like elements we now build networks of artificial neurons. We will start with the multilayer perceptron (MLP) shown in Fig. 4.3. This network is called a two-layer network as it basically has two processing layers. The input layer simply represents the feature vector of a sensory input, while the next two layers are composed of perceptron-like elements that sum up the input from the previous layer with the associated weights of the connection channels and apply a non-linear gain function $\sigma(x)$ to this sum,
$$y_{i}=\sigma\left(\sum_{j} w_{i j} x_{j}\right)$$
We used here the common notation with variables $x$ representing the input and $y$ representing the output. The synaptic weights are written as $w_{i j}$. The above equation corresponds to a single-layer perceptron in the case of a single output node. Of course, with more layers we need to distinguish the different neurons and weights, for example with superscripts for the weights as in Fig. 4.3. The output of this network is calculated as
$$y_{i}=\sigma\left(\sum_{j} w_{i j}^{\mathrm{o}} \sigma\left(\sum_{k} w_{j k}^{\mathrm{h}} x_{k}\right)\right),$$
where we used the superscript “o” for the output weights and the superscript “h” for the hidden weights. These formulae represent a parameterized function that is the model in the machine learning context.
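The forward pass of such a two-layer network can be sketched in NumPy; the layer sizes, the random weights, and the logistic choice for $\sigma$ are all illustrative assumptions (in Keras this would correspond to a `Sequential` model with two `Dense` layers):

```python
import numpy as np

def sigma(x):
    """Logistic gain function, a common choice for the non-linearity."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical architecture: 3 inputs -> 4 hidden -> 2 output nodes
rng = np.random.default_rng(1)
w_h = rng.normal(size=(4, 3))   # hidden weights w^h_{jk}
w_o = rng.normal(size=(2, 4))   # output weights w^o_{ij}

x = np.array([0.5, -1.0, 2.0])  # one input feature vector
h = sigma(w_h @ x)              # hidden-layer activations
y = sigma(w_o @ h)              # network output
```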

## Representational learning

Here, we are discussing feedforward neural networks, which can be seen as implementing transformations or mapping functions from an input space to a latent space, and from there to an output space. The latent space is spanned by the neurons between the input nodes and the output nodes, which are sometimes called the hidden neurons. We can of course always observe the activity of these nodes in our programs, so they are not really hidden. All the weights are learned from the data so that the transformations implemented by the neural network are learned from examples. However, we can guide these transformations with the architecture. The latent representations should be learned so that the final classification in the last layer is much easier than from the raw sensory space. Also, the network and hence the representation it implements should make generalization to previously unseen examples easy and robust. It is useful to pause for a while here and discuss representations.



## Support vector machines

## Soft margin classifier

Thus far we have only discussed the linearly separable case, but what about the case when the classes overlap? It is possible to extend the optimization problem by allowing some data points to be in the margin while penalizing these points somewhat. We therefore include some slack variables $\xi_{i}$ that reduce the effective margin for each data point, but we add a penalty term to the optimization that penalizes solutions where the sum of these slack variables is large,
$$\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^{2}+C \sum_{i} \xi_{i}$$
subject to the constraints
\begin{aligned} y^{(i)}\left(\mathbf{w}^{T} \mathbf{x}^{(i)}+b\right) & \geq 1-\xi_{i} \\ \xi_{i} & \geq 0 \end{aligned}
The constant $C$ is a free parameter in this algorithm. Making this constant large means allowing fewer points to be in the margin. This parameter must be tuned, and it is advisable at least to vary it in order to verify that the results do not depend dramatically on the initial choice.
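As an illustration of the role of $C$, one might fit sklearn's `SVC` on synthetic overlapping classes; the data and the particular $C$ values tried below are illustrative, not from the book:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping hypothetical classes in 2D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # class 0 around (0, 0)
               rng.normal(2, 1, (50, 2))])   # class 1 around (2, 2)
y = np.array([0] * 50 + [1] * 50)

# Small C tolerates more margin violations; large C penalizes them heavily.
scores = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    scores[C] = clf.score(X, y)
```

Comparing the fitted models over such a range of $C$ is exactly the sanity check recommended above.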

## Non-linear support vector machines

We have treated the case of overlapping classes while assuming that the best we can do is a linear separation. However, what if the underlying problem is separable with a function that might be more complex? An example is shown in Fig. 3.10. Nonlinear separation and regression models are of course much more common in machine learning, and we will now look into the non-linear generalization of the SVM.

Let us illustrate the basic idea with an example in two-dimensions. A linear function with two attributes that span the 2-dimensional feature space is given by
$$y=w_{0}+w_{1} x_{1}+w_{2} x_{2}=\mathbf{w}^{T} \mathbf{x},$$
with
$$\mathbf{x}=\left(\begin{array}{c} 1 \\ x_{1} \\ x_{2} \end{array}\right)$$
and weight vector
$$\mathbf{w}^{T}=\left(w_{0}, w_{1}, w_{2}\right) .$$
Let us say that we cannot separate the data with this linear function, but that we could separate them with a polynomial that includes second-order terms like
$$y=\tilde{w}_{0}+\tilde{w}_{1} x_{1}+\tilde{w}_{2} x_{2}+\tilde{w}_{3} x_{1} x_{2}+\tilde{w}_{4} x_{1}^{2}+\tilde{w}_{5} x_{2}^{2}=\tilde{\mathbf{w}}^{T} \phi(\mathbf{x}).$$
We can view the second equation as a linear separation on a feature vector
$$\mathbf{x} \rightarrow \phi(\mathbf{x})=\left(\begin{array}{c} 1 \\ x_{1} \\ x_{2} \\ x_{1} x_{2} \\ x_{1}^{2} \\ x_{2}^{2} \end{array}\right).$$
This can be seen as mapping the attribute space $\left(1, x_{1}, x_{2}\right)$ to a higher-dimensional space with the mapping function $\phi(\mathbf{x})$. We call this mapping a feature map. The separating hyperplane is then linear in this higher-dimensional space. Thus, we can use the above linear maximum margin classification method in non-linear cases if we replace all occurrences of the attribute vector $\mathbf{x}$ with the mapped feature vector $\phi(\mathbf{x})$.
There are only three problems remaining. One is that we don’t know what the mapping function should be. The somewhat ad hoc solution to this problem will be to try out some functions and see which works best. We will discuss this further later in this chapter. The second problem is overfitting, as we might use too many feature dimensions and corresponding free parameters $w_{i}$. In the next section, we provide a glimpse of an argument for why SVMs might address this problem. The third problem is that with an increased number of dimensions, the evaluation of the equations becomes more computationally intensive. However, there is a useful trick to alleviate the last problem in the case when the calculations always contain only dot products between feature vectors. An example of this is the solution of the minimization problem of the dual problem in the earlier discussion of the linear SVM. The function to be minimized in this formulation, Eqn 3.26 with the feature maps, only depends on the dot products between a vector $\mathbf{x}^{(i)}$ of one example and another example $\mathbf{x}^{(j)}$. Also, when predicting the class for a new input vector $\mathbf{x}$ from Eqn 3.24 with the feature maps, we only need the resulting values of the dot products $\phi\left(\mathbf{x}^{(i)}\right)^{T} \phi(\mathbf{x})$. We now discuss how such dot products can sometimes be represented with functions called kernel functions,
$$K(\mathbf{x}, \mathbf{z})=\phi(\mathbf{x})^{T} \phi(\mathbf{z})$$
Instead of specifying a feature map, which is often a guess to start with, we can specify a kernel function directly. For example, let us consider a quadratic kernel function between two vectors $\mathbf{x}$ and $\mathbf{z}$,
$$K(\mathbf{x}, \mathbf{z})=\left(\mathbf{x}^{T} \mathbf{z}+1\right)^{2}$$
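We can check numerically that this kernel corresponds to a dot product of feature maps. Note that the feature map below carries $\sqrt{2}$ factors, which differ from the earlier $\phi(\mathbf{x})$ only by constant rescalings of the components that can be absorbed into the weights:

```python
import numpy as np

def phi(x):
    """Feature map whose dot product reproduces the quadratic kernel.
    Expanding (x^T z + 1)^2 yields the sqrt(2) coefficients below."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     np.sqrt(2) * x1 * x2, x1 ** 2, x2 ** 2])

def K(x, z):
    """Quadratic kernel K(x, z) = (x^T z + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
# K(x, z) and phi(x)^T phi(z) agree, without ever forming phi explicitly
# when only K is needed -- this is the kernel trick.
```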

## Statistical learning theory and VC dimension

SVMs are good and practical classification algorithms for several reasons. In particular, they are formulated as a convex optimization problem that has many good theoretical properties and that can be solved with quadratic programming. They are formulated to take advantage of the kernel trick, they have a compact representation of the decision hyperplane with support vectors, and they turn out to be fairly robust with respect to the hyperparameters. However, in order to act as a good learner, they need to moderate the overfitting problem discussed earlier. A great theoretical contribution of Vapnik and colleagues was the embedding of supervised learning into statistical learning theory and the derivation of some bounds that make statements about the average ability to learn from data. We briefly outline the ideas here and state some of the results without too much detail, and we discuss this issue entirely in the context of binary classification. However, similar observations can be made in the case of multiclass classification and regression. This section uses language from probability theory that we only introduce in more detail later. Therefore, this section might be best viewed at a later stage. Again, the main reason for placing this section here is to outline the deeper reasoning behind specific models.

As cannot be stressed enough, our objective in supervised machine learning is to find a good model which minimizes the generalization error. To state this differently using nomenclature common in these discussions, we call the error function here the risk function $R$; in particular, the expected risk. In the case of binary classification, this is the probability of misclassification,
$$R(h)=P(h(x) \neq y)$$
Of course, we generally do not know this density function. We assume here that the samples are iid (independent and identically distributed) data, and we can then estimate what is called the empirical risk with the help of the test data,
$$\hat{R}(h)=\frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\left(h\left(\mathbf{x}^{(i)} ; \theta\right) \neq y^{(i)}\right)$$
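A sketch of this empirical risk estimate, with a toy classifier and data chosen purely for illustration:

```python
import numpy as np

def empirical_risk(h, X, y):
    """Fraction of misclassified examples: (1/m) sum 1(h(x) != y)."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Toy classifier: predict class 1 when the first feature is positive
h = lambda x: int(x[0] > 0)
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-0.5, 0.0]])
y = np.array([1, 0, 0, 0])
risk = empirical_risk(h, X, y)   # misclassifies one of four examples
```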




## Dimensionality reduction, feature selection, and t-SNE

Before we dive deeper into the theory of machine learning, it is good to realize that we have only scratched the surface of the machine learning tools in the sklearn toolbox. Besides classification, there is of course regression, where the label is a continuous variable instead of a categorical one. We will later see that we can formulate most supervised machine learning techniques as regression and that classification is only a special case of regression. Sklearn also includes several techniques for clustering, which are often unsupervised learning techniques to discover relations in data. Popular examples are k-means and Gaussian mixture models (GMM). We will discuss such techniques and unsupervised learning more generally in later chapters. Here we will end this section by discussing some dimensionality reduction methods.

As stressed earlier, machine learning is inherently aimed at high-dimensional feature spaces and corresponding large sets of model parameters, and interpreting machine learning results is often not easy. Several machine learning methods such as neural networks or SVMs are frequently called black-box methods. However, there is nothing hidden from the user; we can inspect all portions of machine learning models, such as the weights in support vector machines. Yet, since the models are complex, the human interpretability of results is challenging. An important aspect of machine learning is therefore the use of complementary techniques such as visualization and dimensionality reduction. We have seen in the examples with the iris data that even plotting the data in a subspace of the 4-dimensional feature space is useful, and we could ask which subspace is best to visualize. Also, a common technique to keep the model complexity low, both to help with the overfitting problem and with computational demands, has been to select input features carefully. Such feature selection is hence closely related to dimensionality reduction.

Today we have more powerful computers, typically more training data, as well as better regularization techniques, so that input variable selection and standalone dimensionality reduction techniques seem less important. With the advent of deep learning we now often speak about end-to-end solutions that start with basic features without the need for pre-processing. Indeed, such pre-processing can be viewed as problematic since it risks discarding potentially useful information. However, there are still many practical reasons why dimensionality reduction can be useful, such as the limited availability of training data and computational constraints. Also, displaying results in human-readable formats such as 2-dimensional maps can be very useful for human-computer interaction (HCI).
A traditional method that is still used frequently for dimensionality reduction is principal component analysis (PCA). PCA attempts to find a new coordinate system of the feature representation which orders the dimensions according to how spread out the data are along them. The reasoning behind this is that dimensions with a large spread of data offer the most sensitivity for distinguishing data points. This is illustrated in Fig. 3.5. The direction of the largest variance of the data in this figure is called the first principal component. The variance in the perpendicular direction, which is called the second principal component, is smaller. In higher dimensions, the next principal components lie in further perpendicular directions with decreasing variance along these directions. If one were allowed to use only one quantity to describe the data, then one could choose values along the first principal component, since this would capture an important distinction between the individual data points. Of course, we lose some information about the data, and a better description can be given by including values along the directions of higher-order principal components. Describing the data with all principal components is equivalent to a transformation of the coordinate system and thus equivalent to the original description of the data.
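A sketch of PCA from first principles, using the eigendecomposition of the covariance matrix; the 2D synthetic data, stretched along a diagonal direction, are a hypothetical stand-in for Fig. 3.5:

```python
import numpy as np

# Hypothetical 2D data stretched along a 45-degree diagonal direction
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
data = data @ R.T                          # rotate the long axis to 45 degrees

# PCA: eigenvectors of the covariance matrix, sorted by variance
X = data - data.mean(axis=0)               # center the data
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # largest variance first
components = eigvecs[:, order]             # principal component directions
projected = X @ components[:, :1]          # keep only the first component
```

The first column of `components` recovers the diagonal direction of largest spread, and `projected` is the one-number-per-point description discussed above.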

## cs代写|机器学习代写machine learning代考|Decision trees and random forests

As stressed at the beginning of this chapter, our main aim here is to show that applying machine learning methods is made fairly easy with application packages like sklearn, although one still needs to know how to use techniques like hyperparameter tuning and balancing data to make effective use of them. In the next two sections we want to explain some of the ideas behind the specific models implemented by the random forest classifier (RFC) and the support vector machine (SVM). This is followed in the next chapter by discussions of neural networks. The next two sections are optional in the sense that following the theory behind them requires knowledge of additional mathematical concepts that are beyond our brief introductory treatment in this book. Instead, the main focus here is to give a glimpse of the deep thoughts behind those algorithms and to encourage the interested reader to engage in further studies. The asterisk in section headings indicates that these sections are not necessary reading to follow the rest of this book.

We have already used a random forest classifier (RFC), and this method is a popular choice where deep learning has not yet made an impact. It is worthwhile to outline the concepts behind it briefly since it is also an example of a non-parametric machine learning method: the structure of the model is determined by the training data and not conjectured at the outset by the investigator. This fact alone adds to the ease of use of this method and might explain some of its popularity, although there are additional factors that make it competitive, such as the ability to build in feature selection. We will briefly outline what is behind this method. A random forest is actually an ensemble of decision trees, so we will start by explaining what a decision tree is.
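The contrast between a single data-grown tree and the ensemble can be sketched with sklearn on the iris data; the parameter choices here are illustrative defaults, not the book's settings:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A single decision tree: the split structure is grown from the
# training data rather than fixed in advance (non-parametric).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A random forest: an ensemble of such trees, each trained on a
# bootstrapped sample of the data with randomized feature choices.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(tree.get_depth(), len(forest.estimators_))
```

The individual fitted trees of the ensemble are accessible via `forest.estimators_`, which makes the "forest of trees" structure explicit.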

## cs代写|机器学习代写machine learning代考|Linear classifiers with large margins

In this section we outline the basic idea behind support vector machines (SVMs), which were instrumental in a first wave of industrial applications due to their robustness and ease of use. A warning: SVMs rest on some intense mathematical underpinnings, and our goal here is to outline only some of the mathematical ideas behind this method. It is not strictly necessary to read this section in order to follow the rest of the book, but it does provide a summary of concepts that have been instrumental in previous progress and are likely to influence the development of further methods and research. This includes some examples of advanced optimization techniques and the idea of kernel methods. While we mention some formulae in what follows, we do not derive all the steps; we use them only to outline the form of the solution and to understand why we can apply a kernel trick. Our purpose here is mainly to provide some intuition.

SVMs, and the underlying statistical learning theory, were largely invented by Vladimir Vapnik in the early 1960s, but some further breakthroughs were made in the late 1990s with collaborators such as Corinna Cortes, Chris Burges, Alex Smola, and Bernhard Schölkopf, to name but a few. The basic SVM is concerned with binary classification. Fig. $3.9$ shows an example of two classes, depicted by different symbols, in a 2-dimensional attribute space. We distinguish here attributes from features as follows: attributes are the raw measurements, whereas features can be made up by combining attributes. For example, the attributes $x_{1}$ and $x_{2}$ could be combined in a feature vector $\left(x_{1}, x_{1} x_{2}, x_{2}, x_{1}^{2}, x_{2}^{2}\right)^{T}$. This will become important later. Our training set consists of $m$ data points with attribute values $\mathbf{x}^{(i)}$ and labels $y^{(i)}$. We put the superscript index $i$ in brackets so it is not mistaken for a power. For this discussion we choose the binary labels of the two classes as $y \in \{-1,1\}$. This will simplify some equations.

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## cs代写|机器学习代写machine learning代考|Bagging and data augmentation

Having enough training data is often a struggle for machine learning practitioners. The problems caused by not having enough training data are endless. For one, a lack of data aggravates the problem of overfitting, or may even prevent the use of a model of sufficient complexity in the first place. Support vector machines are fairly simple (shallow) models that have the advantage of needing less data than deep learning methods. Nevertheless, even for these methods we might only have a limited amount of data with which to train the model.

A popular workaround has been a method called bagging, which stands for "bootstrap aggregating." The idea is to use the original dataset to create several more training datasets by sampling from the original dataset with replacement. Sampling with replacement, which is also called bootstrapping, means that we can end up with several copies of the same training example in a dataset. The question then is what good this can do. The answer is that if we train several models on these different datasets, we can propose a final model with the averaged parameters. Such a regularized model can help with overfitting or with challenges such as shallow minima in the learning algorithm. We will discuss this point further when discussing the learning algorithms in more detail later.
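The bootstrap-sampling step can be sketched in a few lines of NumPy; the toy data and the number of bags are made up for illustration, and a full bagging procedure would additionally train one model per bag and aggregate them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 10 samples with one feature and a binary label.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

n_bags = 5
bootstrapped = []
for _ in range(n_bags):
    # Sampling WITH replacement: some rows appear several times,
    # others not at all -- this is the "bootstrap" in bagging.
    idx = rng.integers(0, len(X), size=len(X))
    bootstrapped.append((X[idx], y[idx]))

# Each bootstrapped dataset has the same size as the original.
for Xb, yb in bootstrapped:
    print(len(Xb))
```

Training one model on each `(Xb, yb)` pair and averaging their parameters (or their predictions) completes the aggregation step described above.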

While bagging is an interesting method with some practical benefits, the field of data augmentation now often uses more general ideas. For example, we could add some noise to the duplicated data in the bootstrapped training sets, which gives the training algorithm some more information about possible variations of the data. We will later see that other transformations of the data, such as rotations or other forms of systematic distortion of image data, are now a common way to train deep neural networks for computer vision. Even using some form of other model to transform the data can be helpful, such as generating training data synthetically from physics-based simulations. There are many possibilities that we cannot all discuss in this book, but we want to make sure that such techniques are kept in mind for practical applications.

## cs代写|机器学习代写machine learning代考|Balancing data

We have already mentioned balancing data, but it is worthwhile pausing again to look at this briefly. A common problem for many machine learning algorithms is a situation in which we have much more data for one class than for another. For example, say we have data from 100 people with a disease and data from 100,000 healthy controls. Such ratios of positive and negative classes are not uncommon in many applications. A trivial classifier that always predicts the majority class would then be $99.9$ per cent correct. In mathematical terms, this is just the prior probability of the majority class, which sets the baseline for better classifications. The problem is that many learning methods guided by simple loss measures such as this accuracy will mostly find this trivial solution. Many methods have been proposed to prevent such trivial solutions, of which we will mention only a few here.

One of the simplest methods to counter an imbalance of data is simply to use as many data points from the positive class as from the negative class in the training set. This systematic under-sampling of the majority class is a valid procedure as long as the sub-sampled data still sufficiently represent the important features of this class. However, it also means that we lose some of the information that is available to us. In the example above, this means that we would only utilize 100 of the healthy controls in the training data. Another way is to somehow enlarge the minority class by repeating some examples. This seems to be a bad idea, as repeating examples does not add any information. Indeed, it has been shown that this technique does not usually improve the performance of the classifier or prevent the majority-class overfitting problem. The only reason that this might sometimes work is that it can at least make sure the learning algorithm is updated the same number of times for the majority and the minority class.

Another method is to apply different weights or learning rates to examples from classes of different sizes in the training set. One problem with this is finding the right scaling of the increase or decrease in the training weight, but this technique has been applied successfully in many cases, including deep learning.

In practice it has been shown that a combination of both strategies, under-sampling the majority class and over-sampling the minority class, can be most beneficial, in particular when the over-sampling is combined with some form of augmentation of the data. This is formalized in a method called SMOTE: the synthetic minority over-sampling technique. The idea is to change some characteristics of the over-sampled data, such as adding noise. In this way there is at least the benefit of showing the learner variations that can guide the learning process. This is very similar to the bagging and data augmentation ideas discussed earlier.
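The core idea can be sketched in NumPy. This is a deliberately simplified sketch, not the full SMOTE algorithm (which interpolates towards k-nearest minority neighbors); here each synthetic sample is placed on the line between two randomly chosen minority points, and the toy data are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Imbalanced toy data: 5 minority samples vs. 50 majority samples.
minority = rng.normal(loc=0.0, scale=1.0, size=(5, 2))
majority = rng.normal(loc=4.0, scale=1.0, size=(50, 2))

def synthesize(minority, n_new, rng):
    """Create n_new synthetic minority samples by interpolating
    between randomly chosen pairs of existing minority samples
    (a simplified sketch of the SMOTE idea)."""
    new = []
    for _ in range(n_new):
        i, j = rng.choice(len(minority), size=2, replace=False)
        lam = rng.uniform()  # position along the connecting line
        new.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(new)

# Over-sample the minority class up to the majority class size.
synthetic = synthesize(minority, n_new=45, rng=rng)
balanced_minority = np.vstack([minority, synthetic])
print(balanced_minority.shape)
```

Unlike plain duplication, each synthetic point is a genuine variation of the minority class, which is exactly the benefit described above.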

## cs代写|机器学习代写machine learning代考|Validation for hyperparameter learning

Thus far we have mainly assumed that we have one training set, which we use to learn the parameters of the parameterized hypothesis function (model), and a test set, with which we evaluate the performance of the resulting model. In practice, there is an important step in applying machine learning methods which has to do with tuning hyperparameters. Hyperparameters are algorithmic parameters beyond the parameters of the hypothesis function. Such parameters include, for example, the number of neurons in a neural network, or which split criterion to use in decision trees, discussed later. SVMs also have several such parameters, for example one to tune the softness of the classifier, usually called $C$, or the width of the Gaussian kernel, $\gamma$. We can even treat the number of iterations of some training algorithms as a hyperparameter. We will later shed more light on these parameters, but for now it is important only to know that there are many parameters of the algorithm itself, beyond the parameters of the parameterized hypothesis function (model), which can be tuned. To some extent we could think of all these parameters as those of the final model, but it is common to make the distinction between the main model parameters and the hyperparameters of the algorithm.

The question is then how we tune the hyperparameters. This is itself a learning problem, for which we need a special dataset that we will call a validation set. The name indicates that it is used for some form of validation, although it is most often used to test a specific hyperparameter setting so that different settings can be compared and the better one chosen. Choosing the hyperparameters is therefore itself a type of learning problem, and some learning algorithms for it have been proposed. A simple learning algorithm for hyperparameters is a grid search, where we vary the parameters in constant increments over some range of values. Other algorithms, like simulated annealing or genetic algorithms, have also been used. A dominant mode that is often effective when used by experienced machine learners is the hand-tuning of parameters. Whatever method we choose, we need a way to evaluate our choice with some of our data.

Therefore, we have to split our training data again: into a set for training the main model parameters and a set for tuning the hyperparameters. The former we still call the training set, while the latter is commonly called the validation set. Thus, the question arises again of how to split the original training data into a training set for the model parameters and a validation set for the hyperparameter tuning. We can of course use the cross-validation procedure explained earlier for this. Indeed, it is very common to use cross-validation for hyperparameter tuning, and the name of the cross-validation procedure echoes the name of this validation step. But note that cross-validation is a method for splitting data, and it can be used both for hyperparameter tuning and for evaluating the predicted performance of our final model.
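A grid search over hyperparameters combined with cross-validation can be sketched with sklearn's `GridSearchCV`; the grid values for $C$ and $\gamma$ below are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid over the softness parameter C and the RBF width gamma.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Each setting is scored by 5-fold cross-validation on the training
# data; the best setting is then refit on the whole set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Here cross-validation plays exactly the role of the validation set: each fold held out in turn validates one hyperparameter setting.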

## cs代写|机器学习代写machine learning代考|Classification with support vector machines

## cs代写|机器学习代写machine learning代考|multilayer perceptrons

We will show here how to apply three different types of machine learning classifiers using sklearn implementations: a support vector classifier (SVC), a random forest classifier (RFC), and a multilayer perceptron (MLP). We concentrate here on the mechanics, and will discuss what is behind these classifiers using the classical example of the iris flower dataset that we used in the previous chapter to demonstrate how to read data into NumPy arrays. We will start with the SVC, which is a support vector machine (SVM). The sklearn implementation is actually a wrapper for the LIBSVM implementation by Chih-Chung Chang and Chih-Jen Lin, which has been very popular for classification applications. Later in this chapter we describe more of the math and tricks behind this method, but for now we use it to demonstrate the mechanics of applying it.

We apply this classifier to the iris dataset in the program IrisClassificationSklearn.ipynb. The program starts as usual by importing the necessary libraries. We then import the data as in the program discussed in the previous chapter. We choose here to split the data into a training set and a test set by using every second data point as a training point and the others as test points. This is accomplished with the index specification 0:-1:2, which starts at index 0, iterates until the end (specified by index -1), and uses a step of 2. Since the data are ordered and well balanced in the original data file, this leaves us with a balanced dataset. Balanced here means that we have the same, or nearly the same, number of data points in the training set for each class. It turns out that this is often important for the good performance of the models. Also, instead of using the names features and target, we shorten the notation by denoting the input features as x and the targets as y.
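The described workflow can be sketched as follows. This is not the book's listing: for a self-contained example we use sklearn's built-in copy of the iris data instead of the text files from the book's web page:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Built-in iris data as a stand-in for loading the book's text files.
iris = load_iris()
x, y = iris.data, iris.target

# Every second sample for training, the others for testing. Because
# the file is ordered and balanced by class, both splits stay
# (nearly) balanced.
x_train, y_train = x[0::2], y[0::2]
x_test, y_test = x[1::2], y[1::2]

clf = SVC()
clf.fit(x_train, y_train)
print(clf.score(x_test, y_test))  # fraction of correct predictions
```

The `fit`/`score` (and `predict`) pattern shown here is the same for the RFC and MLP classifiers discussed in this chapter, which is much of what makes sklearn easy to use.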

## cs代写|机器学习代写machine learning代考|Performance measures and evaluations

We used the percentage of misclassifications as an objective function to evaluate the performance of the model. This is a common choice and often a good start, as in our examples, but there are other commonly used evaluation measures that we should understand. Let us first consider a binary classification case, where it is common to call one class the "positive" and the other the "negative" class. This nomenclature comes from diagnostics, such as deciding whether a person has a disease based on a clinical test. We can then define the following four performance indicators:

• True Positive (TP): Number of correctly predicted positive samples
• True Negative (TN): Number of correctly predicted negative samples
• False Positive (FP): Number of incorrectly predicted positive samples
• False Negative (FN): Number of incorrectly predicted negative samples
These numbers are often summarized in a confusion matrix, and such a matrix layout is shown in Fig. 3.2A. If we have more than two classes we can generalize this to measures of True Class 1, True Class 2, True Class 3, False Class 1, etc. It is convenient to summarize these numbers in a matrix which lists the true class down the columns and the predicted label along the rows. An example of a confusion matrix for the iris dataset, which has three classes, is shown in Fig. 3.2B. The plot is produced with the following code.
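As a sketch of how such a matrix can be computed with sklearn (note that sklearn's own convention puts the true class along the rows and the predicted class along the columns), using an invented toy example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy binary example: true labels and the classifier's predictions.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# For labels (0, 1) the sklearn layout is:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tp, tn, fp, fn)
```

A graphical version such as that in Fig. 3.2 can then be rendered, for example with `sklearn.metrics.ConfusionMatrixDisplay(cm).plot()`.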

## cs代写|机器学习代写machine learning代考|Cross-validation

The performance of a model on the training data can always be improved, and even made perfect, by making the model more complex. This is the essence of overfitting. Basically, we can always write a model that memorizes a finite dataset. However, machine learning is about generalization, which can only be measured with data points that have not been used during training. This is why in the earlier examples we split our data into a training set and a test set.

Just splitting the data into these two sets is sufficient if we have enough data. In practice, having enough labeled data for supervised training is often a problem. We therefore now introduce a method that makes much better use of the data: k-fold cross-validation for evaluating a model's performance. This method is based on the premise that all the data are used for training and for testing (validation) at some point throughout the evaluation procedure. For this, we partition our data into $k$ partitions as shown in Fig. $3.4$ for $k=4$. In this example we assume a dataset with twenty samples, so that each partition has five samples. In every step of the cross-validation procedure we leave one partition out for validating (testing) the trained model and use the other $k-1$ partitions for training. Hence, we get $k$ values for our evaluation measure, such as accuracy. We could then simply use the average as a final measure of the model's fit. However, since we have several measures, we also have the opportunity to look at their distribution for more insight. For example, we could report the variance if we assume a Gaussian distribution of the performance of the different models that result from training with the different training sets.

Of course, the next question is what the value of $k$ should be. As always in machine learning, the answer is not as simple as merely stating a number. If we have only a small number of data points, then it would be wise to use as many as possible for training. Hence, an $N$-fold cross-validation, where $N$ is the number of samples, would likely be useful. This is also called leave-one-out cross-validation (LOOCV). However, this procedure requires $N$ training sessions and evaluations, which might be computationally too expensive for larger datasets. The choice of $k$ hence has to balance statistical against computational considerations. We assume here, of course, that all samples are 'nicely' distributed in the sense that their order in the dataset is not biased. For example, cross-validation would be biased if the data points from one class were in the first part of the dataset and those of the other class in the second part. A random reshuffling of the dataset is a quick way of avoiding most of these errors. Sklearn has, of course, a good way of implementing this. A corresponding code example is given below.
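A minimal sketch with sklearn's `cross_val_score` might look as follows; the choice of model (an SVC) and of $k=4$ folds here mirrors the figure's example rather than any prescription:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Shuffle before partitioning so that the class ordering in the
# file does not bias the folds.
cv = KFold(n_splits=4, shuffle=True, random_state=0)

# One accuracy value per fold; mean and spread summarize the fit.
scores = cross_val_score(SVC(), X, y, cv=cv)
print(scores.mean(), scores.std())
```

Reporting both the mean and the standard deviation of `scores` corresponds to looking at the distribution of fold performances discussed above.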


## cs代写|机器学习代写machine learning代考|Data handling


## cs代写|机器学习代写machine learning代考|Basic plots of iris data

Since machine learning requires data, we are commonly faced with importing data from files. There are a variety of tools to handle specific file formats. The most basic one is reading data from text files. We can then manipulate the data and plot them in a form which can help us gain insight into the information we want to extract from the data. We will discuss some classical machine learning examples. These data are now often included in the libraries, which will save us some time. However, preparing data to be used in machine learning is a large part of applying machine learning in practice. The following examples are provided in the program HouseMNIST.ipynb.
We start here with the well-known classification problem of iris flowers. The iris dataset was collected from a field on the same day in the Gaspé region of eastern Quebec in Canada. These data were first used by the famous British statistician Ronald Fisher in a 1936 paper. The data consist of 150 samples, 50 samples for each of 3 species of iris called iris Setosa (0), iris Versicolour (1), and iris Virginica (2). For our purposes, we usually simply give each class a label such as a number, as shown in the brackets after the flower names in this example.

The dataset is given on the book's web page as three text files, named iris.data, feature_names.txt, and target_names.txt, to start practising data handling. These are basic text files and their contents can be inspected by loading them into an editor. We now explore these data with the program iris.ipynb. The data file contains both the feature values and the class label, and we can load these data into a NumPy array with the NumPy function loadtxt. Printing the shape of the array reveals that there are 150 lines of data, one for each sample, and 5 columns. The first four values are the measured lengths and widths of the sepals and petals of the flowers. The last number is the class label. The following code separates this data array into a feature matrix and a target vector for all the samples. We also show how text can be handled with the NumPy function genfromtxt.
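The loading-and-separating step can be sketched as follows. Since the book's iris.data file is not bundled here, a three-line in-memory stand-in with the same comma-separated layout (four features plus a label per line) is used:

```python
from io import StringIO

import numpy as np

# Stand-in for the book's iris.data file: four feature values and a
# class label per line (the real file has 150 such lines).
data_file = StringIO(
    "5.1,3.5,1.4,0.2,0\n"
    "7.0,3.2,4.7,1.4,1\n"
    "6.3,3.3,6.0,2.5,2\n"
)

data = np.loadtxt(data_file, delimiter=",")
print(data.shape)  # (rows, 5): four features plus the label column

# Separate the array into a feature matrix and a target vector.
features = data[:, :4]
target = data[:, 4].astype(int)
```

With the real file, `np.loadtxt("iris.data", delimiter=",")` yields a (150, 5) array and the same slicing applies; `np.genfromtxt` can be used analogously for the files containing text.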

## cs代写|机器学习代写machine learning代考|Image processing and convolutional filters

This section dives into some image processing concepts and reviews convolution operations that become important later in this book. It is therefore worth reviewing this section well. The discussion also gives us the opportunity to practice Python programming a bit more.

We have already displayed gray-scale images that were given by 2-dimensional matrices, where each component stands for the gray level of one pixel. In order to represent color images we now simply need three channels, each of which stands for one primary color: red (R), green (G), and blue (B). Such RGB images are represented by a tensor of size $M \times N \times 3$, where $M$ and $N$ are the horizontal and vertical resolutions in pixels. Reading and displaying an image file is supported by the Matplotlib library, though there are also a variety of other packages that can be used. For example, given a test image such as motorbike.jpg from the book's web page, as shown in Fig. 2.8B, a short program reads this image into an array and plots it.
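The tensor structure can be sketched as follows. Since motorbike.jpg is not bundled here, a small synthetic RGB array stands in for the photograph; for a real file one would use Matplotlib's `imread` as indicated in the comments:

```python
import numpy as np

# Synthetic stand-in for a color photograph: an M x N x 3 RGB tensor.
M, N = 60, 80
img = np.zeros((M, N, 3))
img[:, : N // 2, 0] = 1.0  # left half pure red
img[:, N // 2 :, 2] = 1.0  # right half pure blue

print(img.shape)  # (60, 80, 3): rows, columns, color channels

# With a real image file, Matplotlib reads and displays it with:
#   import matplotlib.pyplot as plt
#   img = plt.imread("motorbike.jpg")
#   plt.imshow(img)
```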

The shape function reveals that this image has a resolution of $600 \times 800$ pixels with three color channels.

A main application of machine learning is object recognition, and we will now give an example of how we could accomplish this with a filter that highlights specific features in an image. Let's assume we are looking for a red spot of a certain size in a photograph, and say we are given an RGB image like the one shown in Fig. 2.9A. A corresponding program reads this image into an array and plots it. Setting one pixel to red by hand then creates a new red pixel, resulting in the image shown in Fig. 2.9B. We use this image for the following discussion.

The red spot that we want to detect with the following program is the structure in the upper left, and not the red pixel at coordinate $(6,5)$ that we just added by hand. We added this red pixel to discuss how we can distinguish between the main red object we are looking for and other red objects in the picture. It is interesting to look at the red, green, and blue channels separately, as shown in Fig. 2.9C. Each of these plots can be produced with code like the following example for the red channel.
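Channel separation and a simple color test can be sketched on a synthetic stand-in for the figure's image (the spot location, the lone pixel, and the thresholds below are all invented for illustration). Note that the red channel alone does not single out red objects, since white and gray pixels also have high red values; combining all three channels does:

```python
import numpy as np

# Synthetic RGB image: mostly gray, with a 3x3 red spot in the upper
# left and a single isolated red pixel elsewhere.
img = np.full((10, 10, 3), 0.5)
img[1:4, 1:4] = [1.0, 0.0, 0.0]  # the red spot we want to detect
img[6, 5] = [1.0, 0.0, 0.0]      # a lone red pixel (a distractor)

red = img[:, :, 0]  # the red channel alone, as plotted in the figure

# Flag pixels where red is high AND green/blue are low.
red_mask = (red > 0.9) & (img[:, :, 1] < 0.1) & (img[:, :, 2] < 0.1)
print(red_mask.sum())  # number of red pixels found
```

Distinguishing the larger spot from the lone pixel additionally requires sensitivity to size, which is where the convolutional filters of this section come in.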

## cs代写|机器学习代写machine learning代考|Machine learning with sklearn

The open-source series of libraries called scikit builds on the NumPy and SciPy libraries to provide more domain-specific support. In this chapter we briefly introduce the scikit-learn library, or sklearn for short. This library started as a Google Summer of Code project by David Cournapeau and developed into an open-source library which now provides a variety of well-established machine learning algorithms, together with excellent documentation, on the scikit-learn website.

The goal of this chapter is to show how to apply machine learning algorithms in a general setting using some classic methods. In particular, we will show how to apply three important machine learning algorithms: a support vector classifier (SVC), a random forest classifier (RFC), and a multilayer perceptron (MLP). While many of the methods studied later in this book go beyond these now classic methods, this does not mean that these methods are obsolete. Quite the contrary; many applications have limited amounts of data where more data-hungry techniques such as deep learning might not work. Also, the algorithms discussed here provide a baseline against which to discuss advanced methods like probabilistic reasoning and deep learning. Our aim here is to demonstrate that applying machine learning methods based on such libraries is not very difficult. It also provides us with an opportunity to discuss evaluation techniques that are very important in practice.

An outline of the algorithms and a typical workflow provided by sklearn is shown in Fig. 3.1. The machine learning methods are thereby divided into classification, regression, clustering, and dimensionality reduction. We will later discuss the ideas behind the corresponding algorithms, specifically in the second half of this chapter, though we start by treating the methods as a black box. We specifically outline in this chapter a typical machine learning setting for classification.
In some applications it is possible to achieve sufficient performance without knowing exactly what these algorithms do, although we will later show that applying machine learning to more challenging cases and avoiding pitfalls requires some deeper understanding of the algorithms. Our aim for the later part of this book is therefore to look much deeper into the principles behind machine learning, including probabilistic and deep learning methods.
Each of these channel plots can be generated with a few lines of code, following the example of the red channel.
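The channel separation and a simple dot-detecting filter can be sketched in NumPy. Since the photograph itself is not reproduced here, the small synthetic image below is a hypothetical stand-in for the red-dot picture:

```python
import numpy as np

# a small synthetic RGB image (10x10 pixels, 3 channels), standing in for
# the red-dot photograph discussed in the text
img = np.zeros((10, 10, 3))
img[1:4, 1:4, 0] = 1.0   # a 3x3 red dot in the upper-left corner
img[6, 5, 0] = 1.0       # the single red pixel added by hand

# the three color channels are just slices of the tensor
red, green, blue = img[:, :, 0], img[:, :, 1], img[:, :, 2]

# a 3x3 averaging filter applied to the red channel: the large dot keeps a
# high response, while the lone pixel is diluted by its neighborhood
kernel = np.ones((3, 3)) / 9.0
response = np.zeros((8, 8))
for i in range(8):
    for j in range(8):
        response[i, j] = np.sum(red[i:i+3, j:j+3] * kernel)

print(response.max())  # 1.0 at the center of the dot; about 0.11 at the lone pixel
```

Thresholding `response` near 1.0 keeps only the dot of the right size, which is exactly the feature-highlighting role of convolutional filters described above.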


## Scientific programming with Python

## Basic language elements

As a general-purpose programming language, Python contains basic programming concepts such as basic data types, loops, conditional statements, and subroutines. We will briefly review the associated syntax with examples that are provided in the file FirstProgram.ipynb. In addition to such basic programming constructs, all major programming languages such as Python are supported by a large number of libraries that enable a wide array of programming styles and specialized functions. We are here mainly interested in basic scientific computing, in contrast to system programming, and for this we need multidimensional arrays. We therefore base almost all programs in this book on the NumPy library. NumPy provides basic support for common scientific constructs and functions such as trigonometric functions and random number generators. Most importantly, it provides support for N-dimensional arrays. NumPy has become the standard in scientific computing with Python. We will use these well-established constructs to implement vectors, matrices, and higher-dimensional arrays. While there is a separate matrix class, this construct is limited to a two-dimensional structure and has not gained widespread acceptance.
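A minimal illustration of the NumPy data structures mentioned above (the variable names are ours, not from the original notebook):

```python
import numpy as np

# a vector (1-d array), a matrix (2-d array), and a 3-d array ("tensor")
v = np.array([1.0, 2.0, 3.0])
M = np.array([[1, 2], [3, 4]])
T = np.zeros((2, 3, 4))

print(v.shape, M.shape, T.shape)  # (3,) (2, 2) (2, 3, 4)
```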

An established way to import the NumPy library in our programs is to map it to the namespace "np" with the command import numpy as np. In this way, the specific methods or functions of NumPy are accessed with the prefix np. In addition to importing NumPy, we always import a plotting library, as plotting results is very useful and a common way to communicate results. We specifically use the popular PyPlot package of the Matplotlib library. Hence, we nearly always start our programs with the same two import lines.

In the following, we walk through a program in the Jupyter environment called FirstProgram. These lines of code are intended to show the syntax of the basic programming constructs that we need in this book. We start by demonstrating the basic data types that we will be using frequently. We are mainly concerned with numerical data, of which a scalar is the simplest example. We show the code as well as the response of running the program with the print() function. Comment lines can be included with the hash-tag symbol #. The types of variables are dynamically assigned in Python. That is, a variable name and the corresponding memory space are allocated the first time a variable with this name is used on the left-hand side of an assignment operator =. In this case it is an integer value, but we could also assign a real-valued variable with aScalar = 4.0.
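The two import lines referred to above, followed by the scalar example; the exact listing from FirstProgram.ipynb is not reproduced in this text, so this is a close sketch (the PyPlot alias `plt` is the conventional choice):

```python
# the two lines we nearly always start our programs with
import numpy as np
import matplotlib.pyplot as plt

# types are assigned dynamically on first assignment
aScalar = 4        # an integer value
print(type(aScalar))
aScalar = 4.0      # re-assigning with a real value makes it a float
print(type(aScalar))
```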

## Functions

This book tries to use minimal examples that do not require advanced code-structuring techniques such as object-oriented programming, although those techniques are available in Python. The basic code-reuse technique is of course the definition of a function. To structure code better, specifically to define some code that can be reused, Python lets us define functions with a simple def template.

Simple (immutable) variables behave as if passed by value in Python, but more complex objects such as lists are passed by reference. It is therefore wise to be careful when changing the contents of calling variables inside functions. A function can be called with an argument, and a default value can be provided for an argument that is omitted in the call.
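A small sketch of a function definition with a default argument and of the by-reference behavior described above; the names `greet` and `addItem` are hypothetical, not from the original notebook:

```python
# a function with a default argument
def greet(name="world"):
    """Return a greeting; 'name' has a default value."""
    return "Hello, " + name

print(greet())         # default argument -> Hello, world
print(greet("Alice"))  # explicit argument -> Hello, Alice

# complex objects such as lists are passed by reference, so a function
# can modify the caller's data
def addItem(lst):
    lst.append(1)

a = []
addItem(a)
print(a)               # [1]
```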

It is also useful to define an inline version of a function, such as defining the logistic sigmoid function with a lambda expression. We will use this inline function later when plotting it.
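One way to write the logistic sigmoid as an inline (lambda) function; the book's exact listing is not reproduced here:

```python
import numpy as np

# the logistic sigmoid 1/(1+exp(-x)) as an inline function
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                       # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # works element-wise on arrays
```

Because the body uses NumPy's `exp`, the same one-liner works on scalars and whole arrays, which is convenient for plotting.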

## Code efficiency and vectorization

Machine learning is about working with large collections of data. Such data are kept in databases, spreadsheets, or simply in text files, but to work with them we load them into arrays. Since we define operations on such arrays, it is better to treat these arrays as vectors, matrices, or generally as tensors. Traditional programming languages such as C and Fortran require us to write code that loops over all the indices in order to specify operations that are defined on all the data. In NumPy we can instead write such operations directly on whole arrays. For example, as provided in the program MatrixMultiplication.ipynb, let us define two random $n \times n$ matrices with the NumPy random number generator for uniformly distributed numbers.
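The matrix setup and the vectorized product might look as follows; this is a sketch in the spirit of MatrixMultiplication.ipynb, not its verbatim listing:

```python
import numpy as np

n = 100
# two random n x n matrices with uniformly distributed entries
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# vectorized matrix multiplication: no explicit loops over indices
C = A @ B   # equivalently np.dot(A, B)
print(C.shape)
```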

It is now common to call this style of programming vectorized code. Such vectorized code is not only much easier to read, it is also essential for writing efficient code. The reason is that system programmers can implement such routines very efficiently, and this is difficult to match with the more general but inefficient explicit index operations.

To demonstrate the efficiency issue, let us measure the time of operations for a matrix multiplication. We start as usual by importing the standard NumPy and Matplotlib libraries, and we also import a timer routine. We then define a function called matmulslow that implements a matrix multiplication with an explicit iteration over the indices.
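A sketch of matmulslow and the timing comparison; the choice of `timeit.default_timer` as the timer routine is our assumption, since the original import line is not shown:

```python
import numpy as np
from timeit import default_timer as timer  # one common choice of timer routine

def matmulslow(A, B):
    """Matrix multiplication with an explicit iteration over all indices."""
    n, m, k = A.shape[0], B.shape[1], A.shape[1]
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for l in range(k):
                C[i, j] += A[i, l] * B[l, j]
    return C

A = np.random.rand(50, 50)
B = np.random.rand(50, 50)

t0 = timer(); C_slow = matmulslow(A, B); t_slow = timer() - t0
t0 = timer(); C_fast = A @ B;            t_fast = timer() - t0
print("explicit loops: %.4f s   vectorized: %.6f s" % (t_slow, t_fast))
```

On typical hardware the vectorized version is several orders of magnitude faster, which is the point the timing experiment in the text makes.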

## Recent advances

Many advances have been made in recent years with machine learning, in particular with deep learning methods for image processing, natural language processing, and more general data analytics. Many companies are now enthusiastic about data analytics, using data in a wider sense to gain insights into customer profiles or other data mining tasks. Machine learning is an important part of a data analytics engine. Data analytics often requires additional care, such as data security to ensure privacy, the ability to acquire and maintain large data collections, and the ability to make results available in a form useful for humans. We will not delve into many of these aspects but concentrate instead on the data modeling aspects.

One of the most visible impacts of deep learning has been made in computer vision through convolutional neural networks. The basic applications in this area are mostly based on recognition networks and methods for semantic segmentation. However, such methods have now also advanced object localization, object tracking, and scene understanding, to name but a few. Some examples from my own projects are shown in Fig. 1.7. The left-hand image shows semantic segmentation to identify and localize crop and weed for a robotic farming application. The right-hand image shows an application of fish tracking for aquaculture.

Another area that has seen a huge improvement is natural language processing (NLP). It has long been an important task to build programs that understand natural languages for applications such as translation, sentiment analysis, or some form of formal analysis of technical reports. Various methods for sequence modeling have contributed greatly to this area, in particular recurrent neural networks, discussed later in this book.

A developing area in machine learning is generative models, models that can generate examples of instances of a class. For example, a generative model can learn about cars from examples and then generate images of new cars by itself. Such networks could then be used in some creative way. Examples of systems that can learn generative models are variational autoencoders (VAEs) and generative adversarial networks (GANs). These methods demonstrate an important advance: the ability to capture the probabilistic structure of objects, which in turn can be exploited in various ways.

Machine learning methods have shown that they can produce solutions to problems that were previously intractable. For example, until a few years ago computer programs to play the Chinese board game Go were mostly available only at an advanced novice level. However, in 2016, a machine learning program called "AlphaGo" that cleverly combined supervised and reinforcement learning was able to beat a player, Mr. Lee Sedol, who is considered one of the best players of the last decade and had previously won sixteen world titles. Go was considered a real challenge for AI systems, as it was thought to rely heavily on "gut feelings" rather than quantifiable strategies. It was therefore a huge success when computers, which had only reached the level of an advanced beginner a few years prior, could win against such an accomplished player.

## No free lunch, but worth the bite

Neural networks and other models, such as support vector machines and decision trees, are fairly general models, in contrast to Bayesian models, which are usually much better at specifying a causal structure of interpretable entities. More specific models should outperform more general models as long as they faithfully represent the underlying structure of the world model. This fact is captured by David Wolpert's "no free lunch" theorem, which states that there is no single algorithm that covers all applications better than some other algorithm. The best model is, of course, the real-world model, as discussed earlier, which we generally do not know. Applying machine learning algorithms is therefore somewhat of an art and requires experience and knowledge of the constraints of the algorithms. Discussions of what is an appropriate model are sometimes cumbersome and can distract us from making good use of them. We take a more practical approach, letting a user define what an appropriate contribution is for a machine learning model. For example, the best accuracy of a prediction might not always be the goal; other considerations such as the speed of processing, the number of required training data, or the ability to interpret data can be important factors. We will therefore include brief discussions of some classic machine learning algorithms even if they do not represent the latest research in this area.

An interesting remark that often crops up in discussions of machine learning algorithms and, in particular, neural networks is that these methods are commonly described, and somewhat criticized, as being black box methods. By "black box" we usually mean that the internal structure is not known. However, machine learning models usually live in a computer where we can inspect all the components; in this sense they are white box methods. A better way to describe the difficulty humans have in interpreting machine learning models is that trained deep learning models are commonly complex models that implement complex decision rules. While some applications might have as a goal the learning of human-interpretable decision rules, others might rather be interested in achieving better prediction performance, which often requires more fine-grained rules.
We will see in Chapter 3 that writing a program to apply machine learning algorithms to data is often not very difficult. New algorithms will often find their way into graphical data mining tools, which makes them available to an even larger application community. However, applying such algorithms correctly in different application domains can be challenging, and it is well known that some experience is required. We therefore concentrate in the following on explaining what is behind these algorithms and how different theoretical concepts are explored by them. Some understanding of the algorithms is absolutely necessary to avoid pitfalls in their application.

The basic first step in the application of ML methods is deciding how to represent the data. We have already mentioned some different input data structures such as vectors or tensors. However, there are usually many different possible ways to represent a problem numerically. In the past it was crucial to work out an appropriate high-level data representation, such as summary statistics, to keep the dimensionality of the model low. However, recent progress in deep learning has made it possible to treat this representation itself as part of the learning problem. Representational learning has thus become an important part of machine learning.

## Programming environment

We will be using a programming environment called Jupyter. Specifically, we will be using the Jupyter notebook, which allows us to write code with a simple editor and to display comments and outputs in the same file. Jupyter is accessed through the browser and contains form fields in which code and comments can be added. These fields can then be executed, and the feedback from print commands or figure plots is displayed after each block within the same document. This makes it very useful for documenting brief code and small exercises. An example program is shown in Fig. 2.1. All example programs in this book are available as Jupyter files on the web.

The Jupyter notebook has an interface to launch the Python interpreter and to run individual sections or all of the code. The header with comments is produced by executing a text cell; this is useful for producing some documentation. Also, the notebook can be distributed together with its output, which can facilitate communication about code. The numbers on the left show consecutive calls to the interpreter. In the shown example, the first program cell was run first to load the libraries, and then the second cell was run twice; this is why a [3] is displayed in front of this cell. While a cell is running, a [*] is displayed. The second cell produces the output 4, which is displayed after the cell.

A more advanced environment for bigger programs with more traditional programming support is Spyder. This tool includes an editor, a command window, and further programming support such as displays of variables and debugging support. This program mimics more traditional programming environments such as the ones found in Matlab and R. An example view of Spyder is shown in Fig. 2.2. On the left is the editor window, which contains a syntax-sensitive display for writing programs, and on the right is the console to launch line commands such as executing and interpreting the code. As Python is an interpreted language, it is possible to work with programs in an interactive way, such as running a simulation and then plotting results in various ways. The Spyder development environment is recommended for bigger projects.


## The basic idea and history of machine learning


## Introduction

This chapter provides a high-level overview of machine learning, in particular of how it relates to building models from data. We start with the basic idea in its historical context and phrase the learning problem in simple mathematical terms, as function approximation as well as in a probabilistic context. In contrast to more traditional models, we can characterize machine learning as non-linear regression in high-dimensional spaces. This chapter seeks to point out how diverse sub-areas such as deep learning and Bayesian networks fit into the scheme of things, and aims to motivate further study with some examples of recent progress.

Machine learning is literally about building machines, often in software, that can learn to perform specific tasks. Examples of common tasks for machine learning are recognizing objects in digital pictures or predicting the location of a robot or a self-driving car from a variety of sensor measurements. These techniques have contributed greatly to a new wave of technologies that are commonly associated with artificial intelligence (AI). This book is dedicated to introducing the fundamentals of this discipline.

The recent importance of machine learning and its rapid development with new industrial applications has been breathtaking, and it is beyond the scope of this book to anticipate the multitude of developments that will occur. However, knowledge of the basic ideas behind machine learning, many of which have been around for some time, and of their formalization for building probabilistic models to describe data, are now important basic skills. Machine learning is about modeling data. Describing data and uncertainty has been the traditional domain of Bayesian statistics and probability theory. In contrast, it seems that many exciting recent techniques come from an area now called deep learning. The specific contribution of this book is its attempt to highlight the relationship between these areas.

We often simply say that we learn from data, but it is useful to realize that data can mean several things. In its most fundamental form, data usually consist of measurements, such as the intensity of light in a digital camera, the measurement of electric potentials in electroencephalography (EEG), or the recording of stock-market data. However, what we need for learning is a teacher who provides us with information about what these data should predict. Such information can take many different forms. For example, we might have data that we call labels, such as the identity of objects in a digital photograph. This is exactly the kind of information we need to learn optical object recognition. The teacher provides examples of the desired answers that the student (learner) should learn to predict for novel inputs.

## Mathematical formulation of the basic learning problem

Much of what is currently associated with the success of machine learning is supervised learning, sometimes also called predictive learning. The basic task of supervised learning is that of taking a collection of input data $x$, such as the pixel values of an image, measured medical data, or robotic sensor data, and predicting an output value $y$, such as the name of an object in an image, the state of a patient's health, or the location of obstacles. It is common that each input has many components, such as the many millions of pixel values in an image, and it is useful to collect these values in a mathematical structure such as a vector (1-dimensional), a matrix (2-dimensional), or a tensor, the generalization of such structures to higher dimensions. We often refer to machine learning problems as high-dimensional, which refers in this context to the large number of components in the input structure and not to the dimension of the input tensor.

We use the mathematical terms vector, matrix, and tensor mainly to signify a data structure. In a programming context these are more commonly described as 1-dimensional, 2-dimensional, or higher-dimensional arrays. The difference between arrays and tensors (a vector and a matrix are special forms of a tensor) is, however, that the mathematical definitions also include rules for how to calculate with these data structures. This book is not a course on mathematics; we are only users of mathematical notations and methods, and mathematical notation helps us keep the text short while being precise. We follow here the common convention of denoting a vector, matrix, or tensor with bold-faced letters, whereas we use regular fonts for scalars. We usually call the input vector a feature vector, as its components are typically a set of feature values of an object. The output could also be a multi-dimensional object such as a vector or tensor itself. Mathematically, we can denote the relation between the input and the output as a function
$$y=f(\mathbf{x}) .$$
We consider the function above as a description of the true underlying world, and our task in science or engineering is to find this relation. In the formula above we considered a single output value and several input values for illustration purposes, although we will see later that this readily extends to multiple output values.

Before proceeding, it is useful to clarify our use of the term "feature." Features represent components that describe the inputs to our learning systems. Feature values are often measured data in machine learning. Sometimes the word "attributes" is used instead. For the most part, we use these terms interchangeably. However, sometimes researchers make a small distinction between the terms, using attribute to denote unique content while using feature for a derived value, such as the square of an attribute. This strict distinction is usually not crucial for understanding the context, so our use of the term feature includes attributes.

Returning to the world model in equation $1.1$, the challenge for machine learning is to find this function, or at least to approximate it sufficiently well. Machine learning offers several approaches to this. One approach, which we will predominantly follow, is to define a general parameterized function
$$\hat{y}=\hat{f}(\mathbf{x} ; \mathbf{w})$$
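As a minimal illustration, such a parameterized function can be written directly as code; the linear form below is only an illustrative stand-in for the general case, with parameters $\mathbf{w} = (w_0, w_1)$:

```python
import numpy as np

# a parameterized model y_hat = f_hat(x; w); here a simple linear form
def f_hat(x, w):
    w0, w1 = w
    return w0 + w1 * x

x = np.array([1.0, 2.0, 3.0])
print(f_hat(x, (0.5, 2.0)))  # [2.5 4.5 6.5]
```

Learning then means adjusting the parameter values `w` so that the predictions of `f_hat` match the teacher's examples.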

## Non-linear regression in high-dimensions

The simplest example of supervised machine learning is linear regression. In linear regression we assume a linear model such as the function,
$$y=w_{0}+w_{1} x$$
This is a low-dimensional example with only a single feature value $x$ and a scalar label value $y$. Most of us learned in high school to use mean square regression. In this method we choose as values for the offset parameter $w_{0}$ and the slope parameter $w_{1}$ the values that minimize the summed squared difference between the regressed function and the data points. This is illustrated in Fig. 1.4A, and we will later explain this procedure in more detail. This is an example where data are used to determine the parameters of a parameterized model, and this model with the fitted parameters can then be used to predict $y$ values for new $x$ values. This is, in essence, supervised learning.

What makes modern machine learning go beyond this type of modeling is that we now usually describe data in high dimensions (many features) and use non-linear functions. This seems straightforward, but in practice several problems arise when going down this route. For example, Fig. 1.4B shows a non-linear function that seems to describe the pattern of the data much better than the linear model in Fig. 1.4A. However, the non-linear model shown in Fig. 1.4C is also a solution; it even goes through all the training points. This is a particularly difficult problem: if we are allowed to increase the model complexity arbitrarily, then we can always find a model that goes through all the data points. However, the data points might follow a simple relation, such as the linear one of Fig. 1.4A, with the variation representing only noise. Fitting the data points together with this noise, as in Fig. 1.4C, therefore means that we are overfitting the data.
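The fitting procedure and the overfitting problem can be illustrated with NumPy's polyfit on hypothetical noisy data drawn from a linear relation, standing in for the points of Fig. 1.4:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical noisy samples from an underlying linear relation y = 1 + 2x
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(10)

# mean square regression with a linear model: slope w1 and offset w0
w1, w0 = np.polyfit(x, y, 1)
print("w0 = %.2f, w1 = %.2f" % (w0, w1))

# a high-degree polynomial chases the noise: a much smaller training
# error, but it overfits the simple underlying relation
p7 = np.polyfit(x, y, 7)
err_lin = np.max(np.abs(np.polyval([w1, w0], x) - y))
err_p7 = np.max(np.abs(np.polyval(p7, x) - y))
print("max residual, linear: %.3f  degree-7: %.3f" % (err_lin, err_p7))
```

The linear fit recovers parameters close to the true offset and slope, while the degree-7 fit achieves a much smaller training residual, exactly the behavior contrasted in Fig. 1.4A and Fig. 1.4C.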
