## CS Assignment Help|Machine Learning Exam Help|COMP4702

statistics-lab™ safeguards your study-abroad career. We have built a solid reputation for machine learning assignment help, guaranteeing reliable, high-quality, and original Statistics writing services. Our experts have extensive experience with machine learning assignments, so handling all kinds of machine learning coursework goes without saying.

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## CS Assignment Help|Machine Learning Exam Help|The Research Behind Interpretability

Several industries are witnessing an increasing trend of leveraging ML for high-stakes prediction applications that deeply impact human lives.

When automated algorithms make high-stakes decisions, the problem of incorrect predictions becomes even more severe. To address this issue, explainable machine learning emerged as a field of study focused on machine learning interpretability and a shift toward more transparent AI. Its main goal is to create a suite of interpretable models and methods that produce human-friendly explanations while maintaining high predictive performance.
One of the entities in this field is the Defense Advanced Research Projects Agency (DARPA), funded by the US Department of Defense. It created an interpretability and explainability program that funds academic and military research at 11 US research laboratories. The program information states that it aims to produce more explainable models while maintaining high predictive performance, enabling appropriate human trust and understanding for better management of the emerging generation of artificially intelligent partners.
This is not the only example of public focus on AI and machine learning interpretability. In 2016, the White House Office of Science and Technology Policy (OSTP) released a report titled "Preparing for the Future of Artificial Intelligence," which calls for AI systems to be open, transparent, and understandable, so that people can interrogate the assumptions and logic behind the models' decisions.
Also, the Association for Computing Machinery US Public Policy Council (USACM) released a "Statement on Algorithmic Transparency and Accountability" in 2017. It lists explainability as one of seven principles for algorithmic transparency and accountability and notes that it is particularly important in public policy contexts.
Other countries have also made public their demand for AI and machine learning interpretability. One example is a Dutch draft on AI that focuses squarely on explainable AI, stating the utmost importance of AI systems being accurate and able to explain how they came to their decisions.

## CS Assignment Help|Machine Learning Exam Help|Machine Learning Interpretability Taxonomy

Interpretable machine learning techniques can generally be grouped into three categories.

• Pre-model interpretability applies interpretable techniques before model building.
• Intrinsic interpretability uses explanations derived using the model structure.
• Post hoc interpretability uses explanations derived from methods outside the model structure, generally run after the model has been built and predictions have been made.

Pre-model interpretability is exploratory data analysis on a data set to understand the distribution of its features. It helps reveal relationships among feature values and between each feature and the dependent variable.

Intrinsic interpretability refers to explanations computed by self-explanatory models that incorporate interpretability directly into their structure. Algorithms in this category include decision trees, rule-based models, linear models, and attention models. With an intrinsically interpretable model, you might not achieve the same accuracy as with black-box models; however, it becomes easy to understand how the model works because of its inherent structure. Intrinsically interpretable models can further be divided into global methods and local methods.

Global interpretability means users can understand how the model works globally by inspecting the structures and parameters of a complex model. In contrast, local interpretability examines an individual prediction of a model locally, figuring out why the model makes its decision.

After fitting a model on the data, the data scientist then analyzes it to understand the model's results. The process of analyzing the model with interpretability methods to extract various types of information is called post hoc interpretability. There are several post hoc interpretability methods that can be used in different forms on top of various models to understand a model's inner workings. Post hoc interpretability methods are applied after predictions have been made with the model. The common types of input to these methods are the training data, the black-box model itself, and its prediction function.
The diagram in Figure 3-2 shows how all interpretability models can be divided into different sections. Some sections focus on the separation of different techniques, while others focus on model-related groupings.
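As a concrete illustration, here is a minimal Python sketch of one such post hoc method, permutation importance, which consumes exactly the inputs listed above: training data, the fitted black-box model, and its prediction function. The synthetic data set and the choice of a random forest are our own assumptions, not ones taken from the text.

```python
# An illustrative sketch of a post hoc interpretability method:
# permutation importance. It runs after the model is built, shuffling
# one feature at a time and measuring how much the score degrades.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)  # a black-box model

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # larger score drop = more important feature
```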


## CS Assignment Help|Machine Learning Exam Help|What Are Black-Box Models

Black-box model is a term for models whose complex inner workings compute an output, typically spread over multiple steps. In a black-box model, only the input and the output are known to users, while the interim steps and calculations are difficult to comprehend (see Figure 2-6). Over time, black-box models have gained popularity as the preferred modeling technique due to their ability to deliver high accuracy across a variety of data sources.

• Can we interpret a deep neural network?
• How about a random forest with 500 trees?
• Building a complex and dense machine learning model has the potential of reaching our desired accuracy, but does it make sense?
• Can you open the black-box model and explain how it arrived at the result?
The spark of black-box models was further ignited in recent years on competitive coding platforms such as Kaggle, where black-box models started topping the charts across multiple problem statements. By 2015, gradient boosting and neural networks were the most popular terms among data scientists, and very soon these black-box models entered the world of actual implementation across businesses. In an ideal world, every model would be explainable and transparent. Interpretable models are especially valuable in the following situations.
• Critical decisions (e.g., healthcare)
• Seldom-made or non-routine decisions (e.g., M&A work)
• Decisions requiring stakeholder justification (e.g., strategic business choices)
• Situations where interactions matter more than outcomes (e.g., root cause analysis)
In the real world, however, there's a time and place for both sorts of models. Not all decisions are equivalent, and developing interpretable models is extremely challenging (and in some cases impossible; for instance, modeling a complex scenario or a high-dimensional space, as in image classification). Even in easier problems, black-box models typically outperform white-box counterparts due to their ability to capture high non-linearity and interactions between features. Despite the advantage of high accuracy, there is a very prevalent downside to black-box models: the inability to provide explanations behind their predictions to internal teams, audit firms, or regulators.
Figure 2-7 shows how different methods are placed on a two-way axis between interpretability and accuracy. We can see in the image that more complex methods like deep neural networks and support vector machines are high in accuracy but fall on the lower value of the interpretability axis.

## CS Assignment Help|Machine Learning Exam Help|What Is Interpretability

One definition of interpretability is the degree to which we can understand the cause of a decision. Another is the degree to which a person can consistently predict the model's result. The higher the interpretability of a machine learning model, the easier it is for somebody to grasp why certain decisions or predictions were made. A model is more interpretable than another model if its decisions are easier to grasp.
Most machine learning systems require the ability to explain to stakeholders why certain predictions are made. When choosing an appropriate machine learning model, we frequently weigh the accuracy vs. interpretability trade-off.
The accuracy vs. interpretability trade-off rests on an important assumption: that explainability is an inherent property of the model. We believe, however, that with the right techniques, any machine learning model can be made more interpretable, albeit at a complexity and cost that are higher for some models than for others.
When a model predicts or finds insights, it makes certain decisions and choices.
Model interpretation tries to understand and explain these decisions (i.e., the what, why, and how). The key to model interpretation is transparency, the ability to question the model, and the ease with which humans can understand its decisions. The three most important aspects of model interpretation are as follows.

• What caused the model to make certain predictions? We should have the ability to query our model and find feature interactions to get an idea of which features might be important in the decision-making rules of the model. This ensures the fairness of the model.
• Why did the model take a particular decision? We should also validate and justify why certain key features were responsible for driving decisions made by a model during predictions. This ensures the accountability and reliability of the model.
• Can we trust model predictions? We should evaluate and validate any data point and how the model makes decisions on it. It should be demonstrable and easy for key stakeholders to understand that the model works as expected. This ensures the transparency of the model.


## CS Assignment Help|Machine Learning Exam Help|Humans Are Explanation Hungry

Humans are explanation-hungry animals. Since the start of civilization, we have always tried to find the why and how of things. Using the knowledge of why and how collected over centuries, we humans have built rules and best practices for specific tasks. With the evolution of technology, we have expanded those rules and created computer software to process them. The following story highlights why simple explanations, the why and how behind processes, are important.

Two doctors were in the same room attending their patients. Most of the patients were suffering from diabetes. Arjun sat in front of his doctor with curious eyes.
Occasionally he looked across the room. A person of similar age, weight, and height sat across the room with the other doctor. Arjun’s doctor told him that his diabetes was not under control, and he prescribed medicines for a few more weeks. Arjun was disheartened. This illness was like a black box to him. He tried to discuss why the medicines were not working, but his doctor was too busy to handle additional questions. He asked the attendant to send in the next patient and politely signaled Arjun to leave and come back after two weeks.

Arjun slowly walked out of the room and noticed the same person sitting with the other doctor. Hoping that he might be experiencing the same problem, Arjun approached them and introduced himself. The fellow was very joyful. His name was Vikas, and he was the same age as Arjun. He also had diabetes, but currently it was under control, and he had come to his doctor to express his gratitude for the treatment. Arjun was excited after hearing his treatment story. Puzzled, he asked Vikas how he had gotten it under control. What medicines did the doctor prescribe? Vikas showed his prescription to Arjun, and Arjun was completely surprised. Every medicine was the same as his. Arjun was curious how the same medicines that worked for Vikas had not changed his own symptoms. They agreed to get some coffee nearby and started talking.
Vikas then explained how he had also struggled at the start of the treatment and how he felt that diabetes was like a black box until he read a book that explained diabetes to ordinary people in simple terms. After that, his life changed. Vikas told Arjun that diabetes is not just a disease that can be treated with medicines; it also requires lifestyle changes, good food habits, and a healthy workout regime. The book had simple explanations for everything: how a particular food affects sugar levels in the body, how the sleep cycle affects sugar levels, the most important factors that increase sugar levels, and the exercises that reduce stress and manage sugar. The answers to these questions helped Vikas convert his black-box disease into an explainable one. He knew the reasons behind his fluctuating levels, and he started taking relevant actions. He suddenly knew which foods elevate his sugar levels and what actions to take after eating to bring them down. With his newfound knowledge and his medicine, Vikas was very soon able to control his diabetes. He now enjoys his favorite foods and feels healthier than before.

## CS Assignment Help|Machine Learning Exam Help|Explanations in Machine Learning

Let's look at an example of a bank loan processing application. The rules for processing bank loans were developed after years of research. In the past, the bank manager decided who should get a loan based on an applicant's income and credit history. For a denied application, the bank manager had a straightforward answer as to why the loan didn't go through. These days, banks have machine learning models trained on millions of loan applications with hundreds of variables. These models can help the bank manager determine with high accuracy whether a loan should be granted. But since the decision is now algorithmic, with a very complex process behind it, a few questions arise: Why was the loan denied? Why was the loan granted? Is the decision made by the algorithm correct?
Throughout this book, we try to answer such questions and explain methods that help companies or individuals answer such questions.

Chapter 1 explained machine learning and why its importance is rising. Now, let's apply this concept of understanding the why and how to machine learning models by looking at a simple loan approval/rejection example.
Figure 2-1 shows simple banking loan model data.

The data has variables like loan status, loan amount term, income, education, credit history, and age. While building the model, we fit a logistic regression to predict whether a loan application is approved or rejected. We have some independent variables and one target variable (i.e., Loan_Status in the data set). In logistic regression, the target $y$ is binary (Approved $=1$, Rejected $=0$), and the probability $p$ of granting or rejecting the loan is determined based on a cutoff value. The goal is to estimate the coefficients $\alpha_i$:
$$\operatorname{Logit}(p)=\log \left(\frac{p}{1-p}\right)=\alpha_0+\alpha_1 \cdot \text{age}+\alpha_2 \cdot \text{income}+\alpha_3 \cdot \text{education}+\alpha_4 \cdot \text{credit history}$$
To find the coefficients $\alpha_i$, we train the classification model on labeled historical data, where the approved/rejected decision is already known, using cross-entropy as the loss function to compare the predictions $\hat{y}$ against the labels $y$ (see Figure 2-2).
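To make this concrete, here is a minimal scikit-learn sketch of the loan model. The synthetic data, its column encodings, and the rule generating the labels are hypothetical stand-ins for the data shown in Figure 2-1.

```python
# Illustrative sketch of the loan-approval logistic regression.
# The synthetic data below stands in for the Figure 2-1 data set.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(21, 70, n),
    "income": rng.normal(50, 15, n),      # hypothetical: income in thousands
    "education": rng.integers(0, 2, n),   # hypothetical 0/1 encoding
    "credit_history": rng.integers(0, 2, n),
})
# Hypothetical labels: credit history and income drive approval.
df["Loan_Status"] = ((df["credit_history"] == 1) & (df["income"] > 40)).astype(int)

X = df[["age", "income", "education", "credit_history"]]
y = df["Loan_Status"]
model = LogisticRegression(max_iter=1000).fit(X, y)  # minimizes cross-entropy
print(model.intercept_, model.coef_)                 # alpha_0, alpha_1..alpha_4
print(model.predict_proba(X)[:5, 1])                 # p = P(approved)
```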


## CS Assignment Help|Machine Learning Exam Help|Silhouette coefficient

In this section, we describe a common heuristic method for picking the number of clusters in a K-means clustering model. It is designed to work for spherical (not elongated) clusters. First we define the silhouette coefficient of an instance $i$ to be $sc(i)=\left(b_i-a_i\right) / \max \left(a_i, b_i\right)$, where $a_i$ is the mean distance to the other instances in cluster $k_i=\operatorname{argmin}_k\left\|\boldsymbol{\mu}_k-\boldsymbol{x}_i\right\|$, and $b_i$ is the mean distance to the other instances in the next closest cluster, $k_i^{\prime}=\operatorname{argmin}_{k \neq k_i}\left\|\boldsymbol{\mu}_k-\boldsymbol{x}_i\right\|$. Thus $a_i$ is a measure of the compactness of $i$'s cluster, and $b_i$ is a measure of the distance between clusters. The silhouette coefficient varies from $-1$ to $+1$. A value of $+1$ means the instance is close to all the members of its cluster, and far from other clusters; a value of 0 means it is close to a cluster boundary; and a value of $-1$ means it may be in the wrong cluster. We define the silhouette score of a clustering $K$ to be the mean silhouette coefficient over all instances.

In Figure 21.11a, we plot the distortion vs. $K$ for the data in Figure 21.7. As we explained above, it goes down monotonically with $K$. There is a slight "kink" or "elbow" in the curve at $K=3$, but this is hard to detect. In Figure 21.11c, we plot the silhouette score vs. $K$. Now we see a more prominent peak at $K=3$, although $K=7$ seems almost as good. See Figure 21.12 for a comparison of some of these clusterings.

It can be informative to look at the individual silhouette coefficients, and not just the mean score. We can plot these in a silhouette diagram, as shown in Figure 21.13, where each colored region corresponds to a different cluster. The dotted vertical line is the average coefficient. Clusters with many points to the left of this line are likely to be of low quality. We can also use the silhouette diagram to look at the size of each cluster, even if the data is not 2d.
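As a quick illustration, the sketch below scores candidate values of $K$ with the mean silhouette coefficient using scikit-learn; the synthetic three-blob data is our own stand-in for the data in Figure 21.7.

```python
# Sketch: pick K by the silhouette score; also get per-instance
# coefficients sc(i) = (b_i - a_i) / max(a_i, b_i) for a silhouette diagram.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])  # three synthetic blobs

for K in range(2, 8):
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)
    print(K, silhouette_score(X, labels))  # mean sc(i); higher is better

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sc = silhouette_samples(X, labels)  # one coefficient per instance
```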

## CS Assignment Help|Machine Learning Exam Help|Unidentifiability and label switching

Note that we are free to permute the labels in a mixture model without changing the likelihood. This is called the label switching problem, and is an example of non-identifiability of the parameters.

This can cause problems if we wish to perform posterior inference over the parameters (as opposed to just computing the MLE or a MAP estimate). For example, suppose we fit a GMM with $K=2$ components to the data in Figure $21.15$ using HMC. The posterior over the means, $p\left(\mu_1, \mu_2 \mid \mathcal{D}\right)$, is shown in Figure 21.16a. We see that the marginal posterior for each component, $p\left(\mu_k \mid \mathcal{D}\right)$, is bimodal. This reflects the fact that there are two equally good explanations of the data: either $\mu_1 \approx 47$ and $\mu_2 \approx 57$, or vice versa.

To break symmetry, we can add an ordering constraint on the centers, so that $\mu_1<\mu_2$. We can do this by adding a penalty or potential function to the objective if the penalty is violated. More precisely, the penalized log joint becomes
$$\ell^{\prime}(\boldsymbol{\theta})=\log p(\mathcal{D} \mid \boldsymbol{\theta})+\log p(\boldsymbol{\theta})+\phi(\boldsymbol{\mu})$$
where
$$\phi(\boldsymbol{\mu})= \begin{cases}-\infty & \text { if } \mu_2<\mu_1 \\ 0 & \text { otherwise }\end{cases}$$
This has the desired effect, as shown in Figure 21.16b.
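A minimal sketch of this penalty, assuming the log joint is available as a number for a candidate set of parameters:

```python
# Sketch of the ordering penalty phi(mu): the penalized log joint becomes
# -inf whenever the constraint mu_1 < mu_2 is violated.
import numpy as np

def penalized_log_joint(log_joint, mu):
    phi = 0.0 if mu[0] < mu[1] else -np.inf  # ordering constraint on centers
    return log_joint + phi

print(penalized_log_joint(-10.0, np.array([47.0, 57.0])))  # kept: -10.0
print(penalized_log_joint(-10.0, np.array([57.0, 47.0])))  # rejected: -inf
```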
A more general approach is to apply a transformation to the parameters to ensure identifiability. That is, we sample the parameters $\boldsymbol{\theta}$ from a proposal, and then apply an invertible transformation $\boldsymbol{\theta}^{\prime}=f(\boldsymbol{\theta})$ to them before computing the log joint, $\log p\left(\mathcal{D}, \boldsymbol{\theta}^{\prime}\right)$. To account for the change of variables (Section 2.8.3), we add the log of the determinant of the Jacobian. In the case of a 1d ordering transformation, which just sorts its inputs, the determinant of the Jacobian is 1, so the log-det-Jacobian term vanishes.

Unfortunately, this approach does not scale beyond one-dimensional problems, because there is no obvious way to enforce an ordering constraint on the centers $\boldsymbol{\mu}_k$.


## CS Assignment Help|Machine Learning Exam Help|The K-medoids algorithm

There is a variant of K-means called the K-medoids algorithm, in which we estimate each cluster center $\boldsymbol{\mu}_k$ by choosing the data example $\boldsymbol{x}_n \in \mathcal{X}$ whose average dissimilarity to all other points in that cluster is minimal; such a point is known as a medoid. By contrast, in K-means we take averages over the points $\boldsymbol{x}_n \in \mathbb{R}^D$ assigned to the cluster to compute the center. K-medoids can be more robust to outliers (although that issue can also be tackled by using mixtures of Student distributions instead of mixtures of Gaussians). More importantly, K-medoids can be applied to data that does not live in $\mathbb{R}^D$, where averaging may not be well defined. In K-medoids, the input to the algorithm is an $N \times N$ pairwise distance matrix, $D\left(n, n^{\prime}\right)$, not an $N \times D$ feature matrix.

The classic algorithm for solving the K-medoids is the partitioning around medoids or PAM method [KR87]. In this approach, at each iteration, we loop over all $K$ medoids. For each medoid $m$, we consider each non-medoid point $o$, swap $m$ and $o$, and recompute the cost (sum of all the distances of points to their medoid). If the cost has decreased, we keep this swap. The running time of this algorithm is $O\left(N^2 K T\right)$, where $T$ is the number of iterations.

There is also a simpler and faster method, known as the Voronoi iteration method, due to [PJ09]. In this approach, at each iteration we have two steps, similar to K-means. First, for each cluster $k$, look at all the points currently assigned to that cluster, $S_k=\{n: z_n=k\}$, and then set $m_k$ to be the index of the medoid of that set. (To find the medoid requires examining all $\left|S_k\right|$ candidate points and choosing the one that has the smallest sum of distances to all the other points in $S_k$.) Second, for each point $n$, assign it to its closest medoid, $z_n=\operatorname{argmin}_k D(n, k)$. The pseudo-code is given in Algorithm 12.
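Since Algorithm 12 is not reproduced here, the following is a numpy sketch of the Voronoi iteration method as described above; the random initialization of the medoid indices is our own assumption.

```python
# Sketch of K-medoids via Voronoi iteration. D is an (N, N) pairwise
# distance matrix; returns medoid indices and cluster assignments.
import numpy as np

def k_medoids(D, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=K, replace=False)
    for _ in range(n_iter):
        z = np.argmin(D[:, medoids], axis=1)          # assign to closest medoid
        new_medoids = medoids.copy()
        for k in range(K):
            S_k = np.flatnonzero(z == k)              # points in cluster k
            if S_k.size:                              # medoid = min distance sum
                new_medoids[k] = S_k[np.argmin(D[np.ix_(S_k, S_k)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):      # converged
            break
        medoids = new_medoids
    return medoids, z
```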

## CS Assignment Help|Machine Learning Exam Help|Minimizing the distortion

Based on our experience with supervised learning, a natural choice for picking $K$ is to pick the value that minimizes the reconstruction error on a validation set, defined as follows:
$$\operatorname{err}\left(\mathcal{D}_{\text{valid}}, K\right)=\frac{1}{\left|\mathcal{D}_{\text{valid}}\right|} \sum_{n \in \mathcal{D}_{\text{valid}}}\left\|\boldsymbol{x}_n-\hat{\boldsymbol{x}}_n\right\|_2^2$$
where $\hat{\boldsymbol{x}}_n=\operatorname{decode}\left(\operatorname{encode}\left(\boldsymbol{x}_n\right)\right)$ is the reconstruction of $\boldsymbol{x}_n$.
Unfortunately, this technique will not work. Indeed, as we see in Figure 21.11a, the distortion monotonically decreases with $K$. To see why, note that the K-means model is a degenerate density model which consists of $K$ “spikes” at the $\boldsymbol{\mu}_k$ centers. As we increase $K$, we “cover” more of the input space. Hence any given input point is more likely to find a close prototype to accurately represent it as $K$ increases, thus decreasing reconstruction error. Thus unlike with supervised learning, we cannot use reconstruction error on a validation set as a way to select the best unsupervised model. (This comment also applies to picking the dimensionality for PCA, see Section 20.1.4.)

A method that does work is to use a proper probabilistic model, such as a GMM, as we describe in Section 21.4.1. We can then use the log marginal likelihood (LML) of the data to perform model selection.

We can approximate the LML using the BIC score as we discussed in Section 5.2.5.1. From Equation (5.59), we have
$$\operatorname{BIC}(K)=\log p\left(\mathcal{D} \mid \hat{\boldsymbol{\theta}}_K\right)-\frac{D_K}{2} \log (N)$$
where $D_K$ is the number of parameters in a model with $K$ clusters, and $\hat{\boldsymbol{\theta}}_K$ is the MLE. We see from Figure 21.11b that this exhibits the typical U-shaped curve, first decreasing and then increasing.

The reason this works is that each cluster is associated with a Gaussian distribution that fills a volume of the input space, rather than being a degenerate spike. Once we have enough clusters to cover the true modes of the distribution, the Bayesian Occam's razor (Section 5.2.3) kicks in and starts penalizing the model for being unnecessarily complex.
See Section 21.4.1.3 for more discussion of Bayesian model selection for mixture models.
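The sketch below contrasts the two criteria on synthetic data: the K-means distortion (which only decreases with $K$) versus the BIC of a GMM. Note that scikit-learn's `.bic()` uses the opposite sign convention to the equation above, so there lower is better.

```python
# Sketch: distortion vs. BIC for selecting K on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(200, 2))
               for c in ([0, 0], [4, 4], [0, 4])])  # three true clusters

for K in range(1, 8):
    distortion = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).inertia_
    bic = GaussianMixture(n_components=K, random_state=0).fit(X).bic(X)
    print(K, round(distortion, 1), round(bic, 1))   # distortion never rises
```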


## CS Assignment Help|Machine Learning Exam Help|Vector quantization

Suppose we want to perform lossy compression of some real-valued vectors, $\boldsymbol{x}_n \in \mathbb{R}^D$. A very simple approach to this is to use vector quantization or VQ. The basic idea is to replace each real-valued vector $\boldsymbol{x}_n \in \mathbb{R}^D$ with a discrete symbol $z_n \in\{1, \ldots, K\}$, which is an index into a codebook of $K$ prototypes, $\boldsymbol{\mu}_k \in \mathbb{R}^D$. Each data vector is encoded by using the index of the most similar prototype, where similarity is measured in terms of Euclidean distance:
$$\operatorname{encode}\left(\boldsymbol{x}_n\right)=\arg \min_k\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_k\right\|^2$$
We can define a cost function that measures the quality of a codebook by computing the reconstruction error or distortion it induces:
$$J \triangleq \frac{1}{N} \sum_{n=1}^N\left\|\boldsymbol{x}_n-\operatorname{decode}\left(\operatorname{encode}\left(\boldsymbol{x}_n\right)\right)\right\|^2=\frac{1}{N} \sum_{n=1}^N\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_{z_n}\right\|^2$$
where $\operatorname{decode}(k)=\boldsymbol{\mu}_k$. This is exactly the cost function that is minimized by the K-means algorithm. Of course, we can achieve zero distortion if we assign one prototype to every data vector, by using $K=N$ and assigning $\boldsymbol{\mu}_n=\boldsymbol{x}_n$. However, this does not compress the data at all. In particular, it takes $O(N D B)$ bits, where $N$ is the number of real-valued data vectors, each of length $D$, and $B$ is the number of bits needed to represent a real-valued scalar (the quantization accuracy used to represent each $\boldsymbol{x}_n$).

We can do better by detecting similar vectors in the data, creating prototypes or centroids for them, and then representing the data as deviations from these prototypes. This reduces the space requirement to $O\left(N \log _2 K+K D B\right)$ bits. The $O\left(N \log _2 K\right)$ term arises because each of the $N$ data vectors needs to specify which of the $K$ codewords it is using; and the $O(K D B)$ term arises because we have to store each codebook entry, each of which is a $D$-dimensional vector. When $N$ is large, the first term dominates the second, so we can approximate the rate of the encoding scheme (number of bits needed per object) as $O\left(\log _2 K\right)$, which is typically much less than $O(D B)$.

One application of VQ is to image compression. Consider the $200 \times 320$ pixel image in Figure 21.9; we will treat this as a set of $N=64{,}000$ scalars. If we use one byte to represent each pixel (a gray-scale intensity of 0 to 255), then $B=8$, so we need $N B=512{,}000$ bits to represent the image in uncompressed form. For the compressed image, we need $O\left(N \log_2 K\right)$ bits. For $K=4$, this is about 128 kb, a factor-of-4 compression, yet it results in negligible perceptual loss (see Figure 21.9(b)). Greater compression could be achieved if we modeled spatial correlation between the pixels, e.g., if we encoded $5 \times 5$ blocks (as used by JPEG), because the residual errors (differences from the model's predictions) would be smaller and would take fewer bits to encode. This illustrates the connection between data compression and density estimation; see the sequel to this book, [Mur22], for more information.
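A minimal sketch of this VQ compression scheme using K-means; the random image is a stand-in for the gray-scale photo in Figure 21.9.

```python
# Sketch: vector quantization of a (200, 320) gray-scale image with K=4.
import numpy as np
from sklearn.cluster import KMeans

img = np.random.default_rng(0).integers(0, 256, size=(200, 320))
pixels = img.reshape(-1, 1).astype(float)            # N = 64,000 scalars

K = 4
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)
codes = km.labels_                                   # encode: symbols in {0..K-1}
codebook = km.cluster_centers_                       # the K prototypes mu_k
reconstruction = codebook[codes].reshape(img.shape)  # decode(encode(x))

bits = img.size * np.log2(K) + K * 1 * 8             # ~ N log2 K + K D B
print(bits / 1000, "kb vs", img.size * 8 / 1000, "kb uncompressed")
```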

## CS Assignment Help|Machine Learning Exam Help|The K-means++ algorithm

K-means is optimizing a non-convex objective, and hence needs to be initialized carefully. A simple approach is to pick $K$ data points at random, and to use these as the initial values for $\boldsymbol{\mu}_k$. We can improve on this by using multiple restarts, i.e., we run the algorithm multiple times from different random starting points, and then pick the best solution. However, this can be slow.

A better approach is to pick the centers sequentially so as to try to "cover" the data. That is, we pick the initial point uniformly at random, and then each subsequent point is picked from the remaining points, with probability proportional to its squared distance to the point's closest cluster center. That is, at iteration $t$, we pick the next cluster center to be $\boldsymbol{x}_n$ with probability
$$p\left(\boldsymbol{\mu}_t=\boldsymbol{x}_n\right)=\frac{D_{t-1}\left(\boldsymbol{x}_n\right)}{\sum_{n^{\prime}=1}^N D_{t-1}\left(\boldsymbol{x}_{n^{\prime}}\right)}$$
where
$$D_t(\boldsymbol{x})=\min_{k=1}^{t-1}\left\|\boldsymbol{x}-\boldsymbol{\mu}_k\right\|_2^2$$
is the squared distance of $\boldsymbol{x}$ to the closest existing centroid. Thus points that are far away from a centroid are more likely to be picked, thus reducing the distortion. This is known as farthest point clustering [Gon85], or K-means++ [AV07; Bah+12; Bac+16; BLK17; LS19a]. Surprisingly, this simple trick can be shown to guarantee that the reconstruction error is never more than $O(\log K)$ worse than optimal [AV07].
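A sketch of this $D^2$ seeding rule, following the two equations above directly:

```python
# Sketch of K-means++ seeding: each new center is drawn with probability
# proportional to the squared distance to its closest existing center.
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]          # first center: uniform
    for _ in range(1, K):
        D2 = np.min([np.sum((X - mu) ** 2, axis=1) for mu in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=D2 / D2.sum())])
    return np.array(centers)
```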


## Machine Learning Exam Help_Classical MDS

Suppose we start with an $N \times D$ data matrix $\mathbf{X}$ with rows $\boldsymbol{x}_i$. Let us define the centered Gram (similarity) matrix as follows:
$$\tilde{K}_{i j}=\left\langle\boldsymbol{x}_i-\overline{\boldsymbol{x}}, \boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\rangle$$
In matrix notation, we have $\tilde{\mathbf{K}}=\tilde{\mathbf{X}} \tilde{\mathbf{X}}^{\top}$, where $\tilde{\mathbf{X}}=\mathbf{C}_N \mathbf{X}$ and $\mathbf{C}_N=\mathbf{I}_N-\frac{1}{N} \mathbf{1}_N \mathbf{1}_N^{\top}$ is the centering matrix. Now define the strain of a set of embeddings as follows:
$$\mathcal{L}_{\text{strain}}(\mathbf{Z})=\sum_{i, j}\left(\tilde{K}_{i j}-\left\langle\tilde{\boldsymbol{z}}_i, \tilde{\boldsymbol{z}}_j\right\rangle\right)^2=\left\|\tilde{\mathbf{K}}-\tilde{\mathbf{Z}} \tilde{\mathbf{Z}}^{\top}\right\|_F^2$$

where $\tilde{\boldsymbol{z}}_i=\boldsymbol{z}_i-\overline{\boldsymbol{z}}$ is the centered embedding vector. Intuitively this measures how well similarities in the high-dimensional data space, $\tilde{K}_{i j}$, are matched by similarities in the low-dimensional embedding space, $\left\langle\tilde{\boldsymbol{z}}_i, \tilde{\boldsymbol{z}}_j\right\rangle$. Minimizing this loss is called classical MDS.

We know from Section 7.5 that the best rank-$L$ approximation to a matrix is its truncated SVD representation, $\tilde{\mathbf{K}}=\mathbf{U S V}^{\top}$. Since $\tilde{\mathbf{K}}$ is positive semidefinite, we have $\mathbf{V}=\mathbf{U}$. Hence the optimal embedding satisfies
$$\tilde{\mathbf{Z}} \tilde{\mathbf{Z}}^{\top}=\mathbf{U S U}^{\top}=\left(\mathbf{U S}^{\frac{1}{2}}\right)\left(\mathbf{S}^{\frac{1}{2}} \mathbf{U}^{\top}\right)$$
Thus we can set the embedding vectors to be the rows of $\tilde{\mathbf{Z}}=\mathbf{U S}^{\frac{1}{2}}$.
Now we describe how to apply classical MDS to a dataset where we just have Euclidean distances, rather than raw features. First we compute a matrix of squared Euclidean distances, $\mathbf{D}^{(2)}=\mathbf{D} \odot \mathbf{D}$, which has the following entries:
\begin{aligned} D_{i j}^{(2)}=\left\|\boldsymbol{x}_i-\boldsymbol{x}_j\right\|^2 &=\left\|\boldsymbol{x}_i-\overline{\boldsymbol{x}}\right\|^2+\left\|\boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\|^2-2\left\langle\boldsymbol{x}_i-\overline{\boldsymbol{x}}, \boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\rangle \\ &=\left\|\boldsymbol{x}_i-\overline{\boldsymbol{x}}\right\|^2+\left\|\boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\|^2-2 \tilde{K}_{i j} \end{aligned}
We see that $\mathbf{D}^{(2)}$ only differs from $\tilde{\mathbf{K}}$ by some row and column constants (and a factor of $-2$). Hence we can compute $\tilde{\mathbf{K}}$ by double centering $\mathbf{D}^{(2)}$ using Equation (7.89) to get $\tilde{\mathbf{K}}=-\frac{1}{2} \mathbf{C}_N \mathbf{D}^{(2)} \mathbf{C}_N$. In other words,
$$\tilde{K}_{i j}=-\frac{1}{2}\left(d_{i j}^2-\frac{1}{N} \sum_{l=1}^N d_{i l}^2-\frac{1}{N} \sum_{l=1}^N d_{j l}^2+\frac{1}{N^2} \sum_{l=1}^N \sum_{m=1}^N d_{l m}^2\right)$$
We can then compute the embeddings as before.
It turns out that classical MDS is equivalent to PCA (Section 20.1). To see this, let $\tilde{\mathbf{K}}=\mathbf{U}_L \mathbf{S}_L \mathbf{U}_L^{\top}$ be the rank-$L$ truncated SVD of the centered kernel matrix. The MDS embedding is given by $\mathbf{Z}_{\mathrm{MDS}}=\mathbf{U}_L \mathbf{S}_L^{\frac{1}{2}}$. Now consider the rank-$L$ SVD of the centered data matrix, $\tilde{\mathbf{X}}=\mathbf{U}_X \mathbf{S}_X \mathbf{V}_X^{\top}$. The PCA embedding is $\mathbf{Z}_{\mathrm{PCA}}=\mathbf{U}_X \mathbf{S}_X$. Now
$$\tilde{\mathbf{K}}=\tilde{\mathbf{X}} \tilde{\mathbf{X}}^{\top}=\mathbf{U}_X \mathbf{S}_X \mathbf{V}_X^{\top} \mathbf{V}_X \mathbf{S}_X \mathbf{U}_X^{\top}=\mathbf{U}_X \mathbf{S}_X^2 \mathbf{U}_X^{\top}=\mathbf{U}_L \mathbf{S}_L \mathbf{U}_L^{\top}$$
Hence $\mathbf{U}_X=\mathbf{U}_L$ and $\mathbf{S}_X=\mathbf{S}_L^{\frac{1}{2}}$, and so $\mathbf{Z}_{\mathrm{PCA}}=\mathbf{Z}_{\mathrm{MDS}}$.
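A numpy sketch of the whole recipe, from a squared-distance matrix to the embedding $\tilde{\mathbf{Z}}=\mathbf{U S}^{1/2}$ (using an eigendecomposition, which coincides with the truncated SVD since $\tilde{\mathbf{K}}$ is symmetric PSD):

```python
# Sketch of classical MDS: double-center D^(2), then embed with the
# top-L eigenpairs of K_tilde.
import numpy as np

def classical_mds(D, L=2):
    N = D.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N            # centering matrix C_N
    K = -0.5 * C @ (D ** 2) @ C                    # K_tilde by double centering
    evals, evecs = np.linalg.eigh(K)               # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:L]              # keep the top L
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))  # Z = U S^(1/2)

X = np.random.default_rng(0).normal(size=(50, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Z = classical_mds(D, L=2)                          # matches PCA up to sign
```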

## Machine Learning Exam Help_Metric MDS

Classical MDS assumes Euclidean distances. We can generalize it to allow for any dissimilarity measure by defining the stress function
$$\mathcal{L}_{\text{stress}}(\mathbf{Z})=\sqrt{\frac{\sum_{i<j}\left(d_{i, j}-\hat{d}_{i j}\right)^2}{\sum_{i j} d_{i j}^2}}$$
where $\hat{d}_{i j}=\left\|\boldsymbol{z}_i-\boldsymbol{z}_j\right\|$. This is called metric MDS. Note that this is a different objective than the one used by classical MDS, so even if the $d_{i j}$ are Euclidean distances, the results will be different.
We can use gradient descent to solve the optimization problem. However, it is better to use a bound optimization algorithm (Section 8.7) called SMACOF [Lee77], which stands for "Scaling by MAjorizing a COmplicated Function." (This is the method implemented in scikit-learn.) See Figure 20.31 for the results of applying this to our running example.

Instead of trying to match the distance between points, we can instead just try to match the ranking of how similar points are. To do this, let $f(d)$ be a monotonic transformation from distances to ranks. Now define the loss
$$\mathcal{L}_{\mathrm{NM}}(\mathbf{Z})=\sqrt{\frac{\sum_{i<j}\left(f\left(d_{i, j}\right)-\hat{d}_{i j}\right)^2}{\sum_{i j} \hat{d}_{i j}^2}}$$
where $\hat{d}_{i j}=\left\|\boldsymbol{z}_i-\boldsymbol{z}_j\right\|$. Minimizing this is known as non-metric MDS.
This objective can be optimized iteratively. First the function $f$ is optimized, for a given $\mathbf{Z}$, using isotonic regression; this finds the optimal monotonic transformation of the input distances to match the current embedding distances. Then the embeddings $\mathbf{Z}$ are optimized, for a given $f$, using gradient descent, and the process repeats.
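For reference, both variants are available in scikit-learn, whose `MDS` class implements the SMACOF solver mentioned above; passing `metric=False` switches to the rank-matching (non-metric) objective. The random data is a placeholder.

```python
# Sketch: metric vs. non-metric MDS with scikit-learn's SMACOF solver.
import numpy as np
from sklearn.manifold import MDS

X = np.random.default_rng(0).normal(size=(100, 10))

Z_metric = MDS(n_components=2, random_state=0).fit_transform(X)
Z_nonmetric = MDS(n_components=2, metric=False, random_state=0).fit_transform(X)
```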


## Machine Learning Exam Help_Manifold learning

In this section, we discuss the problem of recovering the underlying low-dimensional structure in a high-dimensional dataset. This structure is often assumed to be a curved manifold (explained in Section 20.4.1), so this problem is called manifold learning or nonlinear dimensionality reduction. The key difference from methods such as autoencoders (Section 20.3) is that we will focus on non-parametric methods, in which we compute an embedding for each point in the training set, as opposed to learning a generic model that can embed any input vector. That is, the methods we discuss do not (easily) support out-of-sample generalization. However, they can be easier to fit, and are quite flexible. Such methods can be useful for unsupervised learning (knowledge discovery), data visualization, and as a preprocessing step for supervised learning. See [AAB21] for a recent review of this field.

Roughly speaking, a manifold is a topological space which is locally Euclidean. One of the simplest examples is the surface of the earth, which is a curved 2d surface embedded in a 3d space. At each local point on the surface, the earth seems flat.

More formally, a $d$-dimensional manifold $\mathcal{X}$ is a space in which each point $x \in \mathcal{X}$ has a neighborhood which is topologically equivalent to a $d$-dimensional Euclidean space, called the tangent space, denoted $\mathcal{T}_x=T_x \mathcal{X}$. This is illustrated in Figure $20.28$.

A Riemannian manifold is a differentiable manifold that associates an inner product operator at each point $x$ in tangent space; this is assumed to depend smoothly on the position $x$. The inner product induces a notion of distance, angles, and volume. The collection of these inner products is called a Riemannian metric. It can be shown that any sufficiently smooth Riemannian manifold can be embedded into a Euclidean space of potentially higher dimension; the Riemannian inner product at a point then becomes Euclidean inner product in that tangent space.

## Machine Learning Exam Help_The manifold hypothesis

Most "naturally occurring" high-dimensional datasets lie on a low-dimensional manifold. This is called the manifold hypothesis [FMN16]. For example, consider the case of an image. Figure 20.29a shows a single image of size $64 \times 57$. This is a vector in a 3,648-dimensional space, where each dimension corresponds to a pixel intensity. Suppose we try to generate an image by drawing a random point in this space; it is unlikely to look like the image of a digit, as shown in Figure 20.29b. However, the pixels are not independent of each other, since they are generated by some lower-dimensional structure, namely the shape of the digit 6.

As we vary the shape, we will generate different images. We can often characterize the space of shape variations using a low-dimensional manifold. This is illustrated in Figure 20.29c, where we apply PCA (Section 20.1) to project a dataset of 360 images, each one a slightly rotated version of the digit 6, into a 2d space. We see that most of the variation in the data is captured by an underlying curved 2d manifold. We say that the intrinsic dimensionality $d$ of the data is 2, even though the ambient dimensionality $D$ is 3,648.

In the rest of this section, we discuss ways to learn manifolds from data. There are many different algorithms that have been proposed, which make different assumptions about the nature of the manifold, and which have different computational properties. We discuss a few of these methods in the following sections. For more details, see e.g., [Bur10].

The methods can be categorized as shown in Table 20.1. The term "nonparametric" refers to methods that learn a low-dimensional embedding $\boldsymbol{z}_i$ for each datapoint $\boldsymbol{x}_i$, but do not learn a mapping function which can be applied to an out-of-sample datapoint. (However, [Ben+04b] discusses how to extend many of these methods beyond the training set by learning a kernel.)

In the sections below, we compare some of these methods using two different datasets: a set of 1000 3d points sampled from the 2d "Swiss roll" manifold, and a set of 1797 64-dimensional points sampled from the UCI digits dataset. See Figure 20.30 for an illustration of the data. We will learn a 2d manifold, so we can visualize the data.
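The Swiss-roll data can be generated with scikit-learn; the sketch below pairs it with Isomap purely as one illustrative non-parametric embedding method (the text compares several).

```python
# Sketch: sample 1000 points from the 2d Swiss-roll manifold in 3d and
# learn a 2d embedding with one manifold-learning method.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, random_state=0)       # X: (1000, 3)
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # Z: (1000, 2)
```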


## Machine Learning Exam Help_Training VAEs

We cannot compute the exact marginal likelihood $p(\boldsymbol{x} \mid \boldsymbol{\theta})$ needed for MLE training, because posterior inference in a nonlinear FA model is intractable. However, we can use the inference network to compute an approximate posterior, $q(\boldsymbol{z} \mid \boldsymbol{x})$. We can then use this to compute the evidence lower bound or ELBO. For a single example $\boldsymbol{x}$, this is given by
$$\begin{aligned} \mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x}) &=\mathbb{E}_{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})}\left[\log p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})-\log q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})\right] \\ &=\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{\phi})}[\log p(\boldsymbol{x} \mid \boldsymbol{z}, \boldsymbol{\theta})]-D_{\mathrm{KL}}(q(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{\phi}) \| p(\boldsymbol{z})) \end{aligned}$$
This can be interpreted as the expected log likelihood plus a regularizer that penalizes the approximate posterior for deviating too much from the prior. (This is different from the approach in Section 20.3.4, where we applied the KL penalty to the aggregate posterior in each minibatch.)

The ELBO is a lower bound of the log marginal likelihood (aka evidence), as can be seen from Jensen’s inequality:
$$\begin{aligned} \mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x}) &=\int q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \log \frac{p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})}{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})} d \boldsymbol{z} \\ & \leq \log \int q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \frac{p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})}{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})} d \boldsymbol{z}=\log p_{\boldsymbol{\theta}}(\boldsymbol{x}) \end{aligned}$$
Thus for fixed inference network parameters $\boldsymbol{\phi}$, increasing the ELBO should increase the log likelihood of the data, similar to EM (Section 8.7.2).
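To make the bound concrete, here is a small numerical check (a sketch, not from the text): for a two-component Gaussian mixture, where the latent $z$ is discrete and the evidence can be computed exactly, any choice of $q(z)$ yields an ELBO at or below $\log p(x)$.

```python
# Numerical check of the ELBO bound on a toy mixture (assumes NumPy and SciPy).
import numpy as np
from scipy.stats import norm

x = 0.7
prior = np.array([0.4, 0.6])                     # p(z)
lik = norm.pdf(x, loc=[-1.0, 2.0], scale=1.0)    # p(x | z)
joint = prior * lik                              # p(x, z)
log_px = np.log(joint.sum())                     # exact log evidence

q = np.array([0.5, 0.5])                         # an arbitrary approximate posterior
elbo = np.sum(q * (np.log(joint) - np.log(q)))   # E_q[log p(x,z) - log q(z)]
assert elbo <= log_px                            # Jensen's inequality
print(elbo, log_px)
```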

## 机器学习代考_Machine Learning代考_The reparameterization trick

In this section, we discuss how to compute the ELBO and its gradient. For simplicity, let us suppose that the inference network estimates the parameters of a Gaussian posterior. Since $q_\phi(\boldsymbol{z} \mid \boldsymbol{x})$ is Gaussian, we can write
$$\boldsymbol{z}=\mu_{\boldsymbol{\phi}}(\boldsymbol{x})+\sigma_{\boldsymbol{\phi}}(\boldsymbol{x}) \odot \boldsymbol{\epsilon}$$
where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Hence
$$\mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x})=\mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x} \mid \boldsymbol{z}=\mu_{\boldsymbol{\phi}}(\boldsymbol{x})+\sigma_{\boldsymbol{\phi}}(\boldsymbol{x}) \odot \boldsymbol{\epsilon}\right)\right]-D_{\mathrm{KL}}\left(q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \| p(\boldsymbol{z})\right)$$
Now the distribution we take the expectation over is independent of the model parameters, so we can safely push gradients inside and use backpropagation for training in the usual way, by minimizing $-\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x})]$ wrt $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$. This is known as the reparameterization trick. See Figure 20.23 for an illustration.
The first term in the ELBO can be approximated by sampling $\boldsymbol{\epsilon}$, scaling and shifting it by the outputs of the inference network to get $\boldsymbol{z}$, and then evaluating $\log p(\boldsymbol{x} \mid \boldsymbol{z})$ using the decoder network.
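The trick is a one-liner in any autodiff framework. A minimal PyTorch sketch (the tensor shapes and the stand-in loss are illustrative):

```python
# Reparameterized sampling: z = mu + sigma * eps, with eps ~ N(0, I).
import torch

def sample_z(mu, log_sigma):
    eps = torch.randn_like(mu)              # noise is parameter-free
    return mu + torch.exp(log_sigma) * eps  # differentiable in mu and log_sigma

mu = torch.zeros(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)
z = sample_z(mu, log_sigma)
loss = (z ** 2).sum()                       # stand-in for -log p(x | z)
loss.backward()                             # gradients flow back to the encoder outputs
print(mu.grad, log_sigma.grad)
```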

The second term in the ELBO is the KL of two Gaussians, which has a closed form solution. In particular, inserting $p(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \mathbf{0}, \mathbf{I})$ and $q(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \boldsymbol{\mu}, \operatorname{diag}(\boldsymbol{\sigma}))$ into Equation (6.33), we get
$$D_{\mathrm{KL}}(q \| p)=\sum_{k=1}^K\left[\log \left(\frac{1}{\sigma_k}\right)+\frac{\sigma_k^2+\left(\mu_k-0\right)^2}{2 \cdot 1}-\frac{1}{2}\right]=-\frac{1}{2} \sum_{k=1}^K\left[\log \sigma_k^2-\sigma_k^2-\mu_k^2+1\right]$$
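This closed form is easy to verify by Monte Carlo. A small NumPy sketch (the test values of $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ are arbitrary):

```python
# Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) vs. a Monte Carlo estimate.
import numpy as np

def kl_diag_gauss_vs_std(mu, sigma):
    return -0.5 * np.sum(np.log(sigma**2) - sigma**2 - mu**2 + 1)

rng = np.random.default_rng(0)
mu, sigma = np.array([0.5, -1.0]), np.array([0.8, 1.5])
z = mu + sigma * rng.standard_normal((200_000, 2))               # samples from q
log_q = np.sum(-0.5*np.log(2*np.pi*sigma**2) - (z-mu)**2/(2*sigma**2), axis=1)
log_p = np.sum(-0.5*np.log(2*np.pi) - z**2/2, axis=1)
print(kl_diag_gauss_vs_std(mu, sigma), np.mean(log_q - log_p))   # should agree
```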

## 计算机代写|机器学习代写machine learning代考|Consistency regularization

Consistency regularization leverages the simple idea that perturbing a given datapoint (or the model itself) should not cause the model’s output to change dramatically. Since measuring consistency in this way only makes use of the model’s outputs (and not ground-truth labels), it is readily applicable to unlabeled data and therefore can be used to create appropriate loss functions for semi-supervised learning. This idea was first proposed under the framework of “learning with pseudo-ensembles” [BAP14], with similar variants following soon thereafter [LA16; SJT16].

In its most general form, both the model $p_\theta(y \mid \boldsymbol{x})$ and the transformations applied to the input can be stochastic. For example, in computer vision problems we may transform the input using data augmentation, like randomly rotating the input image or adding noise to it, and the network may include stochastic components like dropout (Section 13.5.4) or weight noise [Gra11]. A common and simple form of consistency regularization first samples $\boldsymbol{x}^{\prime} \sim q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}\right)$ (where $q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}\right)$ is the distribution induced by the stochastic input transformations) and then minimizes the loss $\left\|p_\theta(y \mid \boldsymbol{x})-p_\theta\left(y \mid \boldsymbol{x}^{\prime}\right)\right\|^2$. In practice, the first term $p_\theta(y \mid \boldsymbol{x})$ is typically treated as fixed (i.e., gradients are not propagated through it). In the semi-supervised setting, the combined loss function over a batch of labeled data $\left(\boldsymbol{x}_1, y_1\right),\left(\boldsymbol{x}_2, y_2\right), \ldots,\left(\boldsymbol{x}_M, y_M\right)$ and unlabeled data $\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_N$ is
$$\mathcal{L}(\boldsymbol{\theta})=-\sum_{i=1}^M \log p_\theta\left(y=y_i \mid \boldsymbol{x}_i\right)+\lambda \sum_{j=1}^N\left\|p_\theta\left(y \mid \boldsymbol{x}_j\right)-p_\theta\left(y \mid \boldsymbol{x}_j^{\prime}\right)\right\|^2$$
where $\lambda$ is a scalar hyperparameter that balances the importance of the loss on unlabeled data and, for simplicity, we write $\boldsymbol{x}_j^{\prime}$ to denote a sample drawn from $q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}_j\right)$.

The basic form of consistency regularization in Equation (19.27) reveals many design choices that impact the success of this semi-supervised learning approach. First, the value chosen for the $\lambda$ hyperparameter is important. If it is too large, then the model may not give enough weight to learning the supervised task and will instead start to reinforce its own bad predictions (as with confirmation bias in self-training). Since the model is often poor at the start of training, before it has been trained on much labeled data, it is common in practice to initialize $\lambda$ to zero and increase its value over the course of training.
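A minimal sketch of this loss with a linear warm-up schedule is below (PyTorch; the `model`, the `augment` function, and the schedule constants are illustrative assumptions, not from the text):

```python
# Combined semi-supervised loss with consistency regularization (a sketch).
import torch
import torch.nn.functional as F

def ssl_loss(model, x_lab, y_lab, x_unlab, augment, step,
             ramp_steps=10_000, lam_max=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)           # supervised term
    with torch.no_grad():                                # target is treated as fixed
        p = F.softmax(model(x_unlab), dim=1)
    p_aug = F.softmax(model(augment(x_unlab)), dim=1)    # prediction on perturbed input
    cons = ((p - p_aug) ** 2).sum(dim=1).mean()          # squared-error consistency
    lam = lam_max * min(1.0, step / ramp_steps)          # ramp lambda up from zero
    return sup + lam * cons
```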

## 计算机代写|机器学习代写machine learning代考|Variational autoencoders

In Section 20.3.5, we describe the variational autoencoder (VAE), which defines a probabilistic model of the joint distribution of data $\boldsymbol{x}$ and latent variables $\boldsymbol{z}$. Data is assumed to be generated by first sampling $\boldsymbol{z} \sim p(\boldsymbol{z})$ and then sampling $\boldsymbol{x} \sim p(\boldsymbol{x} \mid \boldsymbol{z})$. For learning, the VAE uses an encoder $q_{\boldsymbol{\lambda}}(\boldsymbol{z} \mid \boldsymbol{x})$ to approximate the posterior and a decoder $p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid \boldsymbol{z})$ to approximate the likelihood. The encoder and decoder are typically deep neural networks. The parameters of the encoder and decoder can be jointly trained by maximizing the evidence lower bound (ELBO) of the data.

The marginal distribution of the latent variables $p(\boldsymbol{z})$ is often chosen to be a simple distribution like a diagonal-covariance Gaussian. In practice, this can make the latent variables $\boldsymbol{z}$ more amenable to downstream classification because $\boldsymbol{z}$ is typically lower-dimensional than $\boldsymbol{x}$, because $\boldsymbol{z}$ is constructed via cascaded nonlinear transformations, and because the dimensions of the latent variables are designed to be independent. In other words, the latent variables can provide a (learned) representation where data may be more easily separable. In [Kin+14], this approach is called M1, and it is indeed shown that the latent variables can be used to train stronger models when labels are scarce. (The general idea of unsupervised learning of representations to help with downstream classification tasks is described further in Section 19.2.4.)
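The M1 recipe amounts to training a VAE on all of the data and then fitting a small classifier on the encoder's output for the labeled subset. A hedged sketch (the trained `encode_mu` function is an assumption; scikit-learn supplies the classifier):

```python
# M1: reuse VAE latents as features for a downstream classifier (a sketch).
from sklearn.linear_model import LogisticRegression

def m1_classifier(encode_mu, X_labeled, y_labeled):
    Z = encode_mu(X_labeled)   # e.g. the mean head of a trained VAE encoder
    return LogisticRegression(max_iter=1000).fit(Z, y_labeled)
```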

An alternative approach to leveraging VAEs, also proposed in [Kin+14] and called M2, has the form
$$p_{\boldsymbol{\theta}}(\boldsymbol{x}, y)=p_{\boldsymbol{\theta}}(y) p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y)=p_{\boldsymbol{\theta}}(y) \int p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y, \boldsymbol{z}) p_{\boldsymbol{\theta}}(\boldsymbol{z}) d \boldsymbol{z}$$
where $\boldsymbol{z}$ is a latent variable, $p_{\boldsymbol{\theta}}(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \mathbf{0}, \mathbf{I})$ is the latent prior, $p_{\boldsymbol{\theta}}(y)=\operatorname{Cat}(y \mid \boldsymbol{\pi})$ is the label prior, and $p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y, \boldsymbol{z})=p\left(\boldsymbol{x} \mid f_{\boldsymbol{\theta}}(y, \boldsymbol{z})\right)$ is the likelihood, such as a Gaussian, with parameters computed by $f$ (a deep neural network). The main innovation of this approach is to assume that data is generated according to both a latent class variable $y$ as well as the continuous latent variable $\boldsymbol{z}$. The class variable $y$ is observed for labeled data and unobserved for unlabeled data.
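The corresponding generative process can be read directly off this factorization. A sketch (the decoder `f_theta` is an assumed network mapping $(y, \boldsymbol{z})$ to likelihood parameters):

```python
# Ancestral sampling from the M2 model: y ~ Cat(pi), z ~ N(0, I), then decode.
import torch

def sample_m2(f_theta, pi, n, latent_dim):
    y = torch.multinomial(pi, n, replacement=True)  # class labels y ~ Cat(pi)
    z = torch.randn(n, latent_dim)                  # latent codes z ~ N(0, I)
    return f_theta(y, z)                            # parameters of p(x | y, z)
```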
