### cs代写|机器学习代写machine learning代考|Dimensionality reduction, feature selection, and t-SNE

statistics-lab™ 为您的留学生涯保驾护航 在代写机器学习machine learning方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写机器学习machine learning代写方面经验极为丰富，各种代写机器学习machine learning相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## cs代写|机器学习代写machine learning代考|Dimensionality reduction, feature selection, and t-SNE

Before we dive deeper into the theory of machine learning, it is good to realize that we have only scratched the surface of machine learning tools in the sklearn toolbox. Besides classification, there is of course regression, where the label is a continuous variable instead of a categorical. We will later see that we can formulate most supervised machine learning techniques as regression and that classification is only a special case of regression. Sklearn also includes several techniques for clustering which are often unsupervised learning techniques to discover relations in data. Popular examples are k-means and Gaussian mixture models (GMM). We will discuss such techniques and unsupervised learning more generally in later chapters. Here we will end this section by discussing some dimensionality reduction methods.

As stressed earlier, machine learning is inherently aimed at high-dimensional feature spaces and corresponding large sets of model parameters, and interpreting machine learning results is often not easy. Several machine learning methods such as neural networks or SVMs are frequently called a blackbox method. However, there is nothing hidden from the user; we could inspect all portions of machine learning models such as the weights in support vector machines. However, since the models are complex, the human interpretability of results is challenging. An important aspect of machine learning is therefore the use of complementary techniques such as visualization and dimensionality reduction. We have seen in the examples with the iris data that even plotting the data in a subspace of the 4-dimensional feature space is useful, and we could ask which subspace is best to visualize. Also, a common technique to keep the model complexity low in order to help with the overfitting problem and with computational demands was to select input features carefully. Such feature selection is hence closely related to dimensionality reduction.

Today we have more powerful computers, typically more training data, as well as better regularization techniques so that input variable selection and standalone dimensionality reduction techniques seems less important. With the advent of deep learning we now often speak about end-to-end solutions that starts with basic features without the need for pre-processing to find solutions. Indeed, it can be viewed as problematic to potential information. However, there are still many practical reasons why dimensionality reduction can be useful, such as the limited availability of training data and computational constraints. Also, displaying results in human readable formats such as 2-dimensional maps can be very useful for human-computer interaction (HCI).
A traditional method that is still used frequently for dimensionality reduction is principle component analysis (PCA). PCA attempts to find a new coordinate system of the feature representation which orders the dimensions according to how spread the data are along these dimensions. The reasoning behind this is that dimensions with a large spread of data would offer the most sensitivity for distinguishing data. This is illustrated in Fig. 3.5. The direction of the largest variance of the data in this figure is called the first principal component. The variance in the perpendicular direction, which is called the second principal component, is less. In higher dimensions, the next principal components are in further perpendicular directions with decreasing variance along the directions. If one were allowed to use only one quantity to describe the data, then one can choose values along the first principal component, since this would capture an important distinction between the individual data points. Of course, we lose some information about the data, and a better description of the data can be given by including values along the directions of higher-order principal components. Describing the data with all principal components is equivalent to a transformation of the coordinate system and thus equivalent to the original description of the data.

## cs代写|机器学习代写machine learning代考|Decision trees and random forests

As stressed at the beginning of this chapter, our main aim here was to show that applying machine learning methods is made fairly easy with application packages like sklearn, although one still needs to know how to use techniques like hyperparameter tuning and balancing data to make effective use of them. In the next two sections we want to explain some of the ideas behind the specific models implemented by the random forrest classifier (RPF) and the support vector machine (SVM). This is followed in the next chapter by discussions of neural networks. The next two section are optional in the sense that following the theory behind them really require knowledge of additional mathematical concepts that are beyond our brief introductory treatment in this book. Instead, the main focus here is to give a glimpse of the deep thoughts behind those algorithms and to encourage the interested reader to engage with further studies. The asterisk in section headings indicates that these sections are not necessary reading to follow the rest of this book.

We have already used a random forrest classifier (RFC), and this method is a popular choice where deep learning has not yet made an impact. It is worthwhile to outline the concepts behind it briefly since it is also an example of a non-parametric machine learning method. The reason is that the structure of the model is defined by the training data and not conjectured at the beginning by the investigator. This fact alone helps the ease of use of this method and might explain some of its popularity, although there are additional factors that make it competitive such as the ability to build in feature selection. We will briefly outline what is behind this method. A random forest is actually an ensemble method of decision trees, so we will start by explaining what a decision tree is.

## cs代写|机器学习代写machine learning代考|Linear classifiers with large margins

In this section we outline the basic idea behind support vector machines (SVM) that have been instrumental in a first wave of industrial applications due to their robustness and ease of use. A warning: SVMs have some intense mathematical underpinning, although our goal here is to outline only some of the mathematical ideas behind this method. It is not strictly necessary to read this section in order to follow the rest of the book, but it does provide a summary of concepts that have been instrumental in previous progress and are likely to influence the development of further methods and research. This includes some examples of advanced optimization techniques and the idea of kernel methods. While we mention some formulae in what follows, we do not derive all the steps and will only use them to outline the form to understand why we can apply a kernel trick. Our purpose here is mainly to provide some intuitions.

SVMs, and the underlying statistical learning theory, was largely invented by Vladimir Vapnik in the early $1960 \mathrm{~s}$, but some further breakthroughs were made in the late 1990 s with collaborators such as Corinna Cortes, Chris Burges, Alex Smola, and Bernhard Schölkopf, to name but a few. The basic SVMs are concerned with binary classification. Fig. $3.9$ shows an example of two classes, depicted by different symbols, in a 2-dimensional attribute space. We distinguish here attributes from features as follows. Attributes are the raw measurements, whereas features can be made up by combining attributes. For example, the attributes $x_{1}$ and $x_{2}$ could be combined in a feature vector $\left(x_{1}, x_{1} x_{2}, x_{2}, x_{1}^{2}, x_{2}^{2}\right)^{T}$. This will become important later. Our training set consists of $m$ data with attribute values $\mathbf{x}^{(i)}$ and labels $y^{(i)}$. We put the superscript index $i$ in brackets so it is not mistaken as a power. For this discussion we chose the binary labels of the two classes as represented with $y \in{-1,1}$. This will simplify some equations.

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。