## CS Assignment Help|Machine Learning Exam Help|COMP4702

statistics-lab™ safeguards your study-abroad career. We have built a solid reputation for machine learning assignment help, guaranteeing reliable, high-quality, and original Statistics writing services. Our experts have extensive experience with machine learning assignments, so handling all kinds of machine learning coursework goes without saying.

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## CS Assignment Help|Machine Learning Exam Help|The Research Behind Interpretability

Several industries are witnessing an increasing trend of leveraging ML for high-stakes prediction applications that deeply impact human lives.

When automated algorithms make high-stakes decisions, the problem of incorrect predictions becomes even more severe. To address this issue, explainable machine learning emerged as a field of study focused on machine learning interpretability and a shift toward more transparent AI. Its main goal is to create a suite of interpretable models and methods that produce human-friendly explanations while maintaining high predictive performance.
One of the entities in this field is the Defense Advanced Research Projects Agency (DARPA), funded by the US Department of Defense. It created an interpretability and explainability program that funds academic and military research at 11 US research laboratories. The program information states that it aims to produce more explainable models while maintaining high predictive performance, enabling appropriate human trust and understanding for better management of the emerging generation of artificially intelligent partners.
This is not the only example of public focus on AI and machine learning interpretability. In 2016, the White House Office of Science and Technology Policy (OSTP) released a report titled "Preparing for the Future of Artificial Intelligence," which calls for AI systems to be open, transparent, and understandable, so that people can interrogate the assumptions and logic behind the models' decisions.
Also, the Association for Computing Machinery US Public Policy Council (USACM) released a "Statement on Algorithmic Transparency and Accountability" in 2017. It lists explainability as one of seven principles for algorithmic transparency and accountability and notes that it is particularly important in public policy contexts.
Other countries have also made public their demand for AI and machine learning interpretability. One example is a Dutch draft on AI that focuses squarely on explainable AI, stating the utmost importance of AI systems being accurate and able to explain how they came to their decisions.

## CS Assignment Help|Machine Learning Exam Help|Machine Learning Interpretability Taxonomy

Interpretable machine learning techniques can generally be grouped into three categories.

• Pre-model interpretability applies interpretable techniques before model building.
• Intrinsic interpretability uses explanations derived using the model structure.
• Post hoc interpretability uses explanations derived from methods outside the model structure, generally run after the model has been built and predictions have been made.

Pre-model interpretability is exploratory data analysis on a data set to understand the distribution of its features. It helps reveal relationships among feature values and between each feature and the dependent variable.

Intrinsic interpretability refers to explanations computed by self-explanatory models that incorporate interpretability directly into their structure. Algorithms in this category include decision trees, rule-based models, linear models, and attention models. With an intrinsically interpretable model, you might not achieve the same accuracy as with black-box models; however, it becomes easy to understand how the model works because of its inherent structure. Intrinsically interpretable models can further be divided into global methods and local methods.

Global interpretability means users can understand how the model works globally by inspecting the structures and parameters of a complex model. In contrast, local interpretability examines an individual prediction of a model locally, figuring out why the model makes its decision.

After fitting a model on the data, the data scientist then analyzes it to understand the model's results. The process of analyzing the model with interpretability methods to extract various types of information is called post hoc interpretability. There are several post hoc interpretability methods that can be used in different forms on top of various models to understand a model's inner workings. Post hoc interpretability methods are applied after predictions have been made with the model. The common types of input to these methods are the training data, the black-box model itself, and its prediction function.
The diagram in Figure 3-2 shows how all interpretability models can be divided into different sections. Some sections focus on the separation of different techniques, while others focus on model-related groupings.
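As a concrete illustration, here is a minimal Python sketch of one such post hoc method, permutation importance, which consumes exactly the inputs listed above: training data, the fitted black-box model, and its prediction function. The synthetic data set and the choice of a random forest are our own assumptions, not ones taken from the text.

```python
# An illustrative sketch of a post hoc interpretability method:
# permutation importance. It runs after the model is built, shuffling
# one feature at a time and measuring how much the score degrades.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)  # a black-box model

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # larger score drop = more important feature
```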


## CS Assignment Help|Machine Learning Exam Help|What Are Black-Box Models

Black-box model is a term for models whose complex inner workings compute an output, typically spread over multiple steps. In a black-box model, only the input and the output are known to users, while the interim steps and calculations are difficult to comprehend (see Figure 2-6). Over time, black-box models have gained popularity as the preferred modeling technique due to their ability to deliver high accuracy across a variety of data sources.

• Can we interpret a deep neural network?
• How about a random forest with 500 trees?
• Building a complex and dense machine learning model has the potential of reaching our desired accuracy, but does it make sense?
• Can you open the black-box model and explain how it arrived at the result?
The spark of black-box models was further ignited in recent years on competitive coding platforms such as Kaggle, where black-box models started topping the charts across multiple problem statements. By 2015, gradient boosting and neural networks were the most popular terms among data scientists, and very soon these black-box models entered the world of actual implementation across businesses. In an ideal world, every model would be explainable and transparent. Interpretable models are especially valuable in the following situations.
• Critical decisions (e.g., healthcare)
• Seldom-made or non-routine decisions (e.g., M&A work)
• Decisions requiring stakeholder justification (e.g., strategic business choices)
• Situations where interactions matter more than outcomes (e.g., root cause analysis)
In the real world, however, there's a time and place for both sorts of models. Not all decisions are equivalent, and developing interpretable models is extremely challenging (and in some cases impossible; for instance, modeling a complex scenario or a high-dimensional space, as in image classification). Even in easier problems, black-box models typically outperform white-box counterparts due to their ability to capture high non-linearity and interactions between features. Despite the advantage of high accuracy, there is a very prevalent downside to black-box models: the inability to provide explanations behind their predictions to internal teams, audit firms, or regulators.
Figure 2-7 shows how different methods are placed on a two-way axis between interpretability and accuracy. We can see in the image that more complex methods like deep neural networks and support vector machines are high in accuracy but fall on the lower value of the interpretability axis.

## CS Assignment Help|Machine Learning Exam Help|What Is Interpretability

One definition of interpretability is the degree to which we can understand the cause of a decision. Another is the degree to which a person can consistently predict the model's result. The higher the interpretability of a machine learning model, the easier it is for somebody to grasp why certain decisions or predictions were made. A model is more interpretable than another model if its decisions are easier to grasp.
Most machine learning systems require the ability to explain to stakeholders why certain predictions are made. When choosing an appropriate machine learning model, we frequently weigh the accuracy vs. interpretability trade-off.
The accuracy vs. interpretability trade-off rests on an important assumption: that explainability is an inherent property of the model. We believe, however, that with the right techniques, any machine learning model can be made more interpretable, albeit at a complexity and cost that are higher for some models than for others.
When a model predicts or finds insights, it makes certain decisions and choices.
Model interpretation tries to understand and explain these decisions (i.e., the what, why, and how). The key to model interpretation is transparency, the ability to question the model, and the ease with which humans can understand its decisions. The three most important aspects of model interpretation are as follows.

• What caused the model to make certain predictions? We should have the ability to query our model and find feature interactions to get an idea of which features might be important in the decision-making rules of the model. This ensures the fairness of the model.
• Why did the model take a particular decision? We should also validate and justify why certain key features were responsible for driving decisions made by a model during predictions. This ensures the accountability and reliability of the model.
• Can we trust model predictions? We should evaluate and validate any data point and how the model makes decisions on it. It should be demonstrable and easy for key stakeholders to understand that the model works as expected. This ensures the transparency of the model.


## CS Assignment Help|Machine Learning Exam Help|Humans Are Explanation Hungry

Humans are explanation-hungry animals. Since the start of civilization, we have always tried to find the why and how of things. Using the knowledge of why and how collected over centuries, we humans have built rules and best practices for specific tasks. With the evolution of technology, we have expanded those rules and created computer software to process them. The following story highlights why simple explanations, the why and how behind processes, are important.

Two doctors were in the same room attending their patients. Most of the patients were suffering from diabetes. Arjun sat in front of his doctor with curious eyes.
Occasionally he looked across the room. A person of similar age, weight, and height sat across the room with the other doctor. Arjun’s doctor told him that his diabetes was not under control, and he prescribed medicines for a few more weeks. Arjun was disheartened. This illness was like a black box to him. He tried to discuss why the medicines were not working, but his doctor was too busy to handle additional questions. He asked the attendant to send in the next patient and politely signaled Arjun to leave and come back after two weeks.

Arjun slowly walked out of the room and noticed the same person sitting with the other doctor. Hoping that he might be experiencing the same problem, Arjun approached them and introduced himself. The fellow was very joyful. His name was Vikas, and he was the same age as Arjun. He also had diabetes, but currently it was under control, and he had come to his doctor to express his gratitude for the treatment. Arjun was excited after hearing his treatment story. Puzzled, he asked Vikas how he had gotten it under control. What medicines did the doctor prescribe? Vikas showed his prescription to Arjun, and Arjun was completely surprised. Every medicine was the same as his. Arjun was curious how the same medicines that worked for Vikas had not changed his own symptoms. They agreed to get some coffee nearby and started talking.
Vikas then explained how he had also struggled at the start of the treatment and how he felt that diabetes was like a black box until he read a book that explained diabetes to ordinary people in simple terms. After that, his life changed. Vikas told Arjun that diabetes is not just a disease that can be treated with medicines; it also requires lifestyle changes, good food habits, and a healthy workout regime. The book had simple explanations for everything: how a particular food affects sugar levels in the body, how the sleep cycle affects sugar levels, the most important factors that increase sugar levels, and the exercises that reduce stress and manage sugar. The answers to these questions helped Vikas convert his black-box disease into an explainable one. He knew the reasons behind his fluctuating levels, and he started taking relevant actions. He suddenly knew which foods elevate his sugar levels and what actions to take after eating to bring them down. With his newfound knowledge and his medicine, Vikas was very soon able to control his diabetes. He now enjoys his favorite foods and feels healthier than before.

## CS Assignment Help|Machine Learning Exam Help|Explanations in Machine Learning

Let's look at an example of a bank loan processing application. The rules for processing bank loans were developed after years of research. In the past, the bank manager decided who should get a loan based on an applicant's income and credit history. For a denied application, the bank manager had a straightforward answer as to why the loan didn't go through. These days, banks have machine learning models trained on millions of loan applications with hundreds of variables. These models can help the bank manager determine with high accuracy whether a loan should be granted. But since the decision is now algorithmic, with a very complex process behind it, a few questions arise: Why was the loan denied? Why was the loan granted? Is the decision made by the algorithm correct?
Throughout this book, we try to answer such questions and explain methods that help companies or individuals answer such questions.

Chapter 1 explained machine learning and why its importance is rising. Now, let's apply this concept of understanding the why and how to machine learning models by looking at a simple loan approval/rejection example.
Figure 2-1 shows simple banking loan model data.

The data has variables like loan status, loan amount term, income, education, credit history, and age. While building the model, we fit a logistic regression to predict whether a loan application is approved or rejected. We have some independent variables and one target variable (i.e., Loan_Status in the data set). In logistic regression, the target $y$ is binary (Approved $=1$, Rejected $=0$), and the probability $p$ of granting or rejecting the loan is determined based on a cutoff value. The goal is to estimate the coefficients $\alpha_i$:
$$\operatorname{Logit}(p)=\log \left(\frac{p}{1-p}\right)=\alpha_0+\alpha_1 \cdot \text{age}+\alpha_2 \cdot \text{income}+\alpha_3 \cdot \text{education}+\alpha_4 \cdot \text{credit history}$$
To find the coefficients $\alpha_i$, we train the classification model on labeled historical data, where the approved/rejected decision is already known, using cross-entropy as the loss function to compare the predictions $\hat{y}$ against the labels $y$ (see Figure 2-2).
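To make this concrete, here is a minimal scikit-learn sketch of the loan model. The synthetic data, its column encodings, and the rule generating the labels are hypothetical stand-ins for the data shown in Figure 2-1.

```python
# Illustrative sketch of the loan-approval logistic regression.
# The synthetic data below stands in for the Figure 2-1 data set.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(21, 70, n),
    "income": rng.normal(50, 15, n),      # hypothetical: income in thousands
    "education": rng.integers(0, 2, n),   # hypothetical 0/1 encoding
    "credit_history": rng.integers(0, 2, n),
})
# Hypothetical labels: credit history and income drive approval.
df["Loan_Status"] = ((df["credit_history"] == 1) & (df["income"] > 40)).astype(int)

X = df[["age", "income", "education", "credit_history"]]
y = df["Loan_Status"]
model = LogisticRegression(max_iter=1000).fit(X, y)  # minimizes cross-entropy
print(model.intercept_, model.coef_)                 # alpha_0, alpha_1..alpha_4
print(model.predict_proba(X)[:5, 1])                 # p = P(approved)
```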


## CS Assignment Help|Machine Learning Exam Help|Silhouette coefficient

In this section, we describe a common heuristic method for picking the number of clusters in a K-means clustering model. It is designed to work for spherical (not elongated) clusters. First we define the silhouette coefficient of an instance $i$ to be $sc(i)=\left(b_i-a_i\right) / \max \left(a_i, b_i\right)$, where $a_i$ is the mean distance to the other instances in cluster $k_i=\operatorname{argmin}_k\left\|\boldsymbol{\mu}_k-\boldsymbol{x}_i\right\|$, and $b_i$ is the mean distance to the other instances in the next closest cluster, $k_i^{\prime}=\operatorname{argmin}_{k \neq k_i}\left\|\boldsymbol{\mu}_k-\boldsymbol{x}_i\right\|$. Thus $a_i$ is a measure of the compactness of $i$'s cluster, and $b_i$ is a measure of the distance between clusters. The silhouette coefficient varies from $-1$ to $+1$. A value of $+1$ means the instance is close to all the members of its cluster, and far from other clusters; a value of 0 means it is close to a cluster boundary; and a value of $-1$ means it may be in the wrong cluster. We define the silhouette score of a clustering $K$ to be the mean silhouette coefficient over all instances.

In Figure 21.11a, we plot the distortion vs. $K$ for the data in Figure 21.7. As we explained above, it goes down monotonically with $K$. There is a slight "kink" or "elbow" in the curve at $K=3$, but this is hard to detect. In Figure 21.11c, we plot the silhouette score vs. $K$. Now we see a more prominent peak at $K=3$, although $K=7$ seems almost as good. See Figure 21.12 for a comparison of some of these clusterings.

It can be informative to look at the individual silhouette coefficients, and not just the mean score. We can plot these in a silhouette diagram, as shown in Figure 21.13, where each colored region corresponds to a different cluster. The dotted vertical line is the average coefficient. Clusters with many points to the left of this line are likely to be of low quality. We can also use the silhouette diagram to look at the size of each cluster, even if the data is not 2d.
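As a quick illustration, the sketch below scores candidate values of $K$ with the mean silhouette coefficient using scikit-learn; the synthetic three-blob data is our own stand-in for the data in Figure 21.7.

```python
# Sketch: pick K by the silhouette score; also get per-instance
# coefficients sc(i) = (b_i - a_i) / max(a_i, b_i) for a silhouette diagram.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])  # three synthetic blobs

for K in range(2, 8):
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)
    print(K, silhouette_score(X, labels))  # mean sc(i); higher is better

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sc = silhouette_samples(X, labels)  # one coefficient per instance
```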

## CS Assignment Help|Machine Learning Exam Help|Unidentifiability and label switching

Note that we are free to permute the labels in a mixture model without changing the likelihood. This is called the label switching problem, and is an example of non-identifiability of the parameters.

This can cause problems if we wish to perform posterior inference over the parameters (as opposed to just computing the MLE or a MAP estimate). For example, suppose we fit a GMM with $K=2$ components to the data in Figure $21.15$ using HMC. The posterior over the means, $p\left(\mu_1, \mu_2 \mid \mathcal{D}\right)$, is shown in Figure 21.16a. We see that the marginal posterior for each component, $p\left(\mu_k \mid \mathcal{D}\right)$, is bimodal. This reflects the fact that there are two equally good explanations of the data: either $\mu_1 \approx 47$ and $\mu_2 \approx 57$, or vice versa.

To break symmetry, we can add an ordering constraint on the centers, so that $\mu_1<\mu_2$. We can do this by adding a penalty or potential function to the objective if the penalty is violated. More precisely, the penalized log joint becomes
$$\ell^{\prime}(\boldsymbol{\theta})=\log p(\mathcal{D} \mid \boldsymbol{\theta})+\log p(\boldsymbol{\theta})+\phi(\boldsymbol{\mu})$$
where
$$\phi(\boldsymbol{\mu})= \begin{cases}-\infty & \text { if } \mu_2<\mu_1 \\ 0 & \text { otherwise }\end{cases}$$
This has the desired effect, as shown in Figure 21.16b.
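A minimal sketch of this penalty, assuming the log joint is available as a number for a candidate set of parameters:

```python
# Sketch of the ordering penalty phi(mu): the penalized log joint becomes
# -inf whenever the constraint mu_1 < mu_2 is violated.
import numpy as np

def penalized_log_joint(log_joint, mu):
    phi = 0.0 if mu[0] < mu[1] else -np.inf  # ordering constraint on centers
    return log_joint + phi

print(penalized_log_joint(-10.0, np.array([47.0, 57.0])))  # kept: -10.0
print(penalized_log_joint(-10.0, np.array([57.0, 47.0])))  # rejected: -inf
```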
A more general approach is to apply a transformation to the parameters to ensure identifiability. That is, we sample the parameters $\boldsymbol{\theta}$ from a proposal, and then apply an invertible transformation $\boldsymbol{\theta}^{\prime}=f(\boldsymbol{\theta})$ to them before computing the log joint, $\log p\left(\mathcal{D}, \boldsymbol{\theta}^{\prime}\right)$. To account for the change of variables (Section 2.8.3), we add the log of the determinant of the Jacobian. In the case of a 1d ordering transformation, which just sorts its inputs, the determinant of the Jacobian is 1, so the log-det-Jacobian term vanishes.

Unfortunately, this approach does not scale beyond one-dimensional problems, because there is no obvious way to enforce an ordering constraint on the centers $\boldsymbol{\mu}_k$.


## CS Assignment Help|Machine Learning Exam Help|The K-medoids algorithm

There is a variant of K-means called the K-medoids algorithm, in which we estimate each cluster center $\boldsymbol{\mu}_k$ by choosing the data example $\boldsymbol{x}_n \in \mathcal{X}$ whose average dissimilarity to all other points in that cluster is minimal; such a point is known as a medoid. By contrast, in K-means we take averages over the points $\boldsymbol{x}_n \in \mathbb{R}^D$ assigned to the cluster to compute the center. K-medoids can be more robust to outliers (although that issue can also be tackled by using mixtures of Student distributions instead of mixtures of Gaussians). More importantly, K-medoids can be applied to data that does not live in $\mathbb{R}^D$, where averaging may not be well defined. In K-medoids, the input to the algorithm is an $N \times N$ pairwise distance matrix, $D\left(n, n^{\prime}\right)$, not an $N \times D$ feature matrix.

The classic algorithm for solving the K-medoids is the partitioning around medoids or PAM method [KR87]. In this approach, at each iteration, we loop over all $K$ medoids. For each medoid $m$, we consider each non-medoid point $o$, swap $m$ and $o$, and recompute the cost (sum of all the distances of points to their medoid). If the cost has decreased, we keep this swap. The running time of this algorithm is $O\left(N^2 K T\right)$, where $T$ is the number of iterations.

There is also a simpler and faster method, known as the Voronoi iteration method, due to [PJ09]. In this approach, at each iteration we have two steps, similar to K-means. First, for each cluster $k$, look at all the points currently assigned to that cluster, $S_k=\{n: z_n=k\}$, and then set $m_k$ to be the index of the medoid of that set. (To find the medoid requires examining all $\left|S_k\right|$ candidate points and choosing the one that has the smallest sum of distances to all the other points in $S_k$.) Second, for each point $n$, assign it to its closest medoid, $z_n=\operatorname{argmin}_k D(n, k)$. The pseudo-code is given in Algorithm 12.
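Since Algorithm 12 is not reproduced here, the following is a numpy sketch of the Voronoi iteration method as described above; the random initialization of the medoid indices is our own assumption.

```python
# Sketch of K-medoids via Voronoi iteration. D is an (N, N) pairwise
# distance matrix; returns medoid indices and cluster assignments.
import numpy as np

def k_medoids(D, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=K, replace=False)
    for _ in range(n_iter):
        z = np.argmin(D[:, medoids], axis=1)          # assign to closest medoid
        new_medoids = medoids.copy()
        for k in range(K):
            S_k = np.flatnonzero(z == k)              # points in cluster k
            if S_k.size:                              # medoid = min distance sum
                new_medoids[k] = S_k[np.argmin(D[np.ix_(S_k, S_k)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):      # converged
            break
        medoids = new_medoids
    return medoids, z
```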

## CS Assignment Help|Machine Learning Exam Help|Minimizing the distortion

Based on our experience with supervised learning, a natural choice for picking $K$ is to pick the value that minimizes the reconstruction error on a validation set, defined as follows:
$$\operatorname{err}\left(\mathcal{D}_{\text{valid}}, K\right)=\frac{1}{\left|\mathcal{D}_{\text{valid}}\right|} \sum_{n \in \mathcal{D}_{\text{valid}}}\left\|\boldsymbol{x}_n-\hat{\boldsymbol{x}}_n\right\|_2^2$$
where $\hat{\boldsymbol{x}}_n=\operatorname{decode}\left(\operatorname{encode}\left(\boldsymbol{x}_n\right)\right)$ is the reconstruction of $\boldsymbol{x}_n$.
Unfortunately, this technique will not work. Indeed, as we see in Figure 21.11a, the distortion monotonically decreases with $K$. To see why, note that the K-means model is a degenerate density model which consists of $K$ “spikes” at the $\boldsymbol{\mu}_k$ centers. As we increase $K$, we “cover” more of the input space. Hence any given input point is more likely to find a close prototype to accurately represent it as $K$ increases, thus decreasing reconstruction error. Thus unlike with supervised learning, we cannot use reconstruction error on a validation set as a way to select the best unsupervised model. (This comment also applies to picking the dimensionality for PCA, see Section 20.1.4.)

A method that does work is to use a proper probabilistic model, such as a GMM, as we describe in Section 21.4.1. We can then use the log marginal likelihood (LML) of the data to perform model selection.

We can approximate the LML using the BIC score as we discussed in Section 5.2.5.1. From Equation (5.59), we have
$$\operatorname{BIC}(K)=\log p\left(\mathcal{D} \mid \hat{\boldsymbol{\theta}}_K\right)-\frac{D_K}{2} \log (N)$$
where $D_K$ is the number of parameters in a model with $K$ clusters, and $\hat{\boldsymbol{\theta}}_K$ is the MLE. We see from Figure 21.11b that this exhibits the typical U-shaped curve, first decreasing and then increasing.

The reason this works is that each cluster is associated with a Gaussian distribution that fills a volume of the input space, rather than being a degenerate spike. Once we have enough clusters to cover the true modes of the distribution, the Bayesian Occam's razor (Section 5.2.3) kicks in and starts penalizing the model for being unnecessarily complex.
See Section 21.4.1.3 for more discussion of Bayesian model selection for mixture models.
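The sketch below contrasts the two criteria on synthetic data: the K-means distortion (which only decreases with $K$) versus the BIC of a GMM. Note that scikit-learn's `.bic()` uses the opposite sign convention to the equation above, so there lower is better.

```python
# Sketch: distortion vs. BIC for selecting K on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(200, 2))
               for c in ([0, 0], [4, 4], [0, 4])])  # three true clusters

for K in range(1, 8):
    distortion = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).inertia_
    bic = GaussianMixture(n_components=K, random_state=0).fit(X).bic(X)
    print(K, round(distortion, 1), round(bic, 1))   # distortion never rises
```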


## CS Assignment Help|Machine Learning Exam Help|Vector quantization

Suppose we want to perform lossy compression of some real-valued vectors, $\boldsymbol{x}_n \in \mathbb{R}^D$. A very simple approach to this is to use vector quantization or VQ. The basic idea is to replace each real-valued vector $\boldsymbol{x}_n \in \mathbb{R}^D$ with a discrete symbol $z_n \in\{1, \ldots, K\}$, which is an index into a codebook of $K$ prototypes, $\boldsymbol{\mu}_k \in \mathbb{R}^D$. Each data vector is encoded by using the index of the most similar prototype, where similarity is measured in terms of Euclidean distance:
$$\operatorname{encode}\left(\boldsymbol{x}_n\right)=\arg \min_k\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_k\right\|^2$$
We can define a cost function that measures the quality of a codebook by computing the reconstruction error or distortion it induces:
$$J \triangleq \frac{1}{N} \sum_{n=1}^N\left\|\boldsymbol{x}_n-\operatorname{decode}\left(\operatorname{encode}\left(\boldsymbol{x}_n\right)\right)\right\|^2=\frac{1}{N} \sum_{n=1}^N\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_{z_n}\right\|^2$$
where $\operatorname{decode}(k)=\boldsymbol{\mu}_k$. This is exactly the cost function that is minimized by the K-means algorithm. Of course, we can achieve zero distortion if we assign one prototype to every data vector, by using $K=N$ and assigning $\boldsymbol{\mu}_n=\boldsymbol{x}_n$. However, this does not compress the data at all. In particular, it takes $O(N D B)$ bits, where $N$ is the number of real-valued data vectors, each of length $D$, and $B$ is the number of bits needed to represent a real-valued scalar (the quantization accuracy used to represent each $\boldsymbol{x}_n$).

We can do better by detecting similar vectors in the data, creating prototypes or centroids for them, and then representing the data as deviations from these prototypes. This reduces the space requirement to $O\left(N \log _2 K+K D B\right)$ bits. The $O\left(N \log _2 K\right)$ term arises because each of the $N$ data vectors needs to specify which of the $K$ codewords it is using; and the $O(K D B)$ term arises because we have to store each codebook entry, each of which is a $D$-dimensional vector. When $N$ is large, the first term dominates the second, so we can approximate the rate of the encoding scheme (number of bits needed per object) as $O\left(\log _2 K\right)$, which is typically much less than $O(D B)$.

One application of VQ is to image compression. Consider the $200 \times 320$ pixel image in Figure 21.9; we will treat this as a set of $N=64{,}000$ scalars. If we use one byte to represent each pixel (a gray-scale intensity of 0 to 255), then $B=8$, so we need $N B=512{,}000$ bits to represent the image in uncompressed form. For the compressed image, we need $O\left(N \log_2 K\right)$ bits. For $K=4$, this is about 128 kb, a factor-of-4 compression, yet it results in negligible perceptual loss (see Figure 21.9(b)). Greater compression could be achieved if we modeled spatial correlation between the pixels, e.g., if we encoded $5 \times 5$ blocks (as used by JPEG), because the residual errors (differences from the model's predictions) would be smaller and would take fewer bits to encode. This illustrates the connection between data compression and density estimation; see the sequel to this book, [Mur22], for more information.
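A minimal sketch of this VQ compression scheme using K-means; the random image is a stand-in for the gray-scale photo in Figure 21.9.

```python
# Sketch: vector quantization of a (200, 320) gray-scale image with K=4.
import numpy as np
from sklearn.cluster import KMeans

img = np.random.default_rng(0).integers(0, 256, size=(200, 320))
pixels = img.reshape(-1, 1).astype(float)            # N = 64,000 scalars

K = 4
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)
codes = km.labels_                                   # encode: symbols in {0..K-1}
codebook = km.cluster_centers_                       # the K prototypes mu_k
reconstruction = codebook[codes].reshape(img.shape)  # decode(encode(x))

bits = img.size * np.log2(K) + K * 1 * 8             # ~ N log2 K + K D B
print(bits / 1000, "kb vs", img.size * 8 / 1000, "kb uncompressed")
```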

## CS Assignment Help|Machine Learning Exam Help|The K-means++ algorithm

K-means is optimizing a non-convex objective, and hence needs to be initialized carefully. A simple approach is to pick $K$ data points at random, and to use these as the initial values for $\boldsymbol{\mu}_k$. We can improve on this by using multiple restarts, i.e., we run the algorithm multiple times from different random starting points, and then pick the best solution. However, this can be slow.

A better approach is to pick the centers sequentially so as to try to "cover" the data. That is, we pick the initial point uniformly at random, and then each subsequent point is picked from the remaining points, with probability proportional to its squared distance to the point's closest cluster center. That is, at iteration $t$, we pick the next cluster center to be $\boldsymbol{x}_n$ with probability
$$p\left(\boldsymbol{\mu}_t=\boldsymbol{x}_n\right)=\frac{D_{t-1}\left(\boldsymbol{x}_n\right)}{\sum_{n^{\prime}=1}^N D_{t-1}\left(\boldsymbol{x}_{n^{\prime}}\right)}$$
where
$$D_t(\boldsymbol{x})=\min_{k=1}^{t-1}\left\|\boldsymbol{x}-\boldsymbol{\mu}_k\right\|_2^2$$
is the squared distance of $\boldsymbol{x}$ to the closest existing centroid. Thus points that are far away from a centroid are more likely to be picked, thus reducing the distortion. This is known as farthest point clustering [Gon85], or K-means++ [AV07; Bah+12; Bac+16; BLK17; LS19a]. Surprisingly, this simple trick can be shown to guarantee that the reconstruction error is never more than $O(\log K)$ worse than optimal [AV07].
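A sketch of this $D^2$ seeding rule, following the two equations above directly:

```python
# Sketch of K-means++ seeding: each new center is drawn with probability
# proportional to the squared distance to its closest existing center.
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]          # first center: uniform
    for _ in range(1, K):
        D2 = np.min([np.sum((X - mu) ** 2, axis=1) for mu in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=D2 / D2.sum())])
    return np.array(centers)
```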


## Machine Learning Exam Help_Classical MDS

Suppose we start with an $N \times D$ data matrix $\mathbf{X}$ with rows $\boldsymbol{x}_i$. Let us define the centered Gram (similarity) matrix as follows:
$$\tilde{K}_{i j}=\left\langle\boldsymbol{x}_i-\overline{\boldsymbol{x}}, \boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\rangle$$
In matrix notation, we have $\tilde{\mathbf{K}}=\tilde{\mathbf{X}} \tilde{\mathbf{X}}^{\top}$, where $\tilde{\mathbf{X}}=\mathbf{C}_N \mathbf{X}$ and $\mathbf{C}_N=\mathbf{I}_N-\frac{1}{N} \mathbf{1}_N \mathbf{1}_N^{\top}$ is the centering matrix. Now define the strain of a set of embeddings as follows:
$$\mathcal{L}_{\text{strain}}(\mathbf{Z})=\sum_{i, j}\left(\tilde{K}_{i j}-\left\langle\tilde{\boldsymbol{z}}_i, \tilde{\boldsymbol{z}}_j\right\rangle\right)^2=\left\|\tilde{\mathbf{K}}-\tilde{\mathbf{Z}} \tilde{\mathbf{Z}}^{\top}\right\|_F^2$$

where $\tilde{\boldsymbol{z}}_i=\boldsymbol{z}_i-\overline{\boldsymbol{z}}$ is the centered embedding vector. Intuitively this measures how well similarities in the high-dimensional data space, $\tilde{K}_{i j}$, are matched by similarities in the low-dimensional embedding space, $\left\langle\tilde{\boldsymbol{z}}_i, \tilde{\boldsymbol{z}}_j\right\rangle$. Minimizing this loss is called classical MDS.

We know from Section 7.5 that the best rank-$L$ approximation to a matrix is its truncated SVD representation, $\tilde{\mathbf{K}}=\mathbf{U S V}^{\top}$. Since $\tilde{\mathbf{K}}$ is positive semidefinite, we have $\mathbf{V}=\mathbf{U}$. Hence the optimal embedding satisfies
$$\tilde{\mathbf{Z}} \tilde{\mathbf{Z}}^{\top}=\mathbf{U S U}^{\top}=\left(\mathbf{U S}^{\frac{1}{2}}\right)\left(\mathbf{S}^{\frac{1}{2}} \mathbf{U}^{\top}\right)$$
Thus we can set the embedding vectors to be the rows of $\tilde{\mathbf{Z}}=\mathbf{U S}^{\frac{1}{2}}$.
Now we describe how to apply classical MDS to a dataset where we just have Euclidean distances, rather than raw features. First we compute a matrix of squared Euclidean distances, $\mathbf{D}^{(2)}=\mathbf{D} \odot \mathbf{D}$, which has the following entries:
\begin{aligned} D_{i j}^{(2)}=\left\|\boldsymbol{x}_i-\boldsymbol{x}_j\right\|^2 &=\left\|\boldsymbol{x}_i-\overline{\boldsymbol{x}}\right\|^2+\left\|\boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\|^2-2\left\langle\boldsymbol{x}_i-\overline{\boldsymbol{x}}, \boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\rangle \\ &=\left\|\boldsymbol{x}_i-\overline{\boldsymbol{x}}\right\|^2+\left\|\boldsymbol{x}_j-\overline{\boldsymbol{x}}\right\|^2-2 \tilde{K}_{i j} \end{aligned}
We see that $\mathbf{D}^{(2)}$ only differs from $\tilde{\mathbf{K}}$ by some row and column constants (and a factor of $-2$). Hence we can compute $\tilde{\mathbf{K}}$ by double centering $\mathbf{D}^{(2)}$ using Equation (7.89) to get $\tilde{\mathbf{K}}=-\frac{1}{2} \mathbf{C}_N \mathbf{D}^{(2)} \mathbf{C}_N$. In other words,
$$\tilde{K}_{i j}=-\frac{1}{2}\left(d_{i j}^2-\frac{1}{N} \sum_{l=1}^N d_{i l}^2-\frac{1}{N} \sum_{l=1}^N d_{j l}^2+\frac{1}{N^2} \sum_{l=1}^N \sum_{m=1}^N d_{l m}^2\right)$$
We can then compute the embeddings as before.
It turns out that classical MDS is equivalent to PCA (Section 20.1). To see this, let $\tilde{\mathbf{K}}=\mathbf{U}_L \mathbf{S}_L \mathbf{U}_L^{\top}$ be the rank-$L$ truncated SVD of the centered kernel matrix. The MDS embedding is given by $\mathbf{Z}_{\mathrm{MDS}}=\mathbf{U}_L \mathbf{S}_L^{\frac{1}{2}}$. Now consider the rank-$L$ SVD of the centered data matrix, $\tilde{\mathbf{X}}=\mathbf{U}_X \mathbf{S}_X \mathbf{V}_X^{\top}$. The PCA embedding is $\mathbf{Z}_{\mathrm{PCA}}=\mathbf{U}_X \mathbf{S}_X$. Now
$$\tilde{\mathbf{K}}=\tilde{\mathbf{X}} \tilde{\mathbf{X}}^{\top}=\mathbf{U}_X \mathbf{S}_X \mathbf{V}_X^{\top} \mathbf{V}_X \mathbf{S}_X \mathbf{U}_X^{\top}=\mathbf{U}_X \mathbf{S}_X^2 \mathbf{U}_X^{\top}=\mathbf{U}_L \mathbf{S}_L \mathbf{U}_L^{\top}$$
Hence $\mathbf{U}_X=\mathbf{U}_L$ and $\mathbf{S}_X=\mathbf{S}_L^{\frac{1}{2}}$, and so $\mathbf{Z}_{\mathrm{PCA}}=\mathbf{Z}_{\mathrm{MDS}}$.
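A numpy sketch of the whole recipe, from a squared-distance matrix to the embedding $\tilde{\mathbf{Z}}=\mathbf{U S}^{1/2}$ (using an eigendecomposition, which coincides with the truncated SVD since $\tilde{\mathbf{K}}$ is symmetric PSD):

```python
# Sketch of classical MDS: double-center D^(2), then embed with the
# top-L eigenpairs of K_tilde.
import numpy as np

def classical_mds(D, L=2):
    N = D.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N            # centering matrix C_N
    K = -0.5 * C @ (D ** 2) @ C                    # K_tilde by double centering
    evals, evecs = np.linalg.eigh(K)               # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:L]              # keep the top L
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))  # Z = U S^(1/2)

X = np.random.default_rng(0).normal(size=(50, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Z = classical_mds(D, L=2)                          # matches PCA up to sign
```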

## Machine Learning Exam Help_Metric MDS

Classical MDS assumes Euclidean distances. We can generalize it to allow for any dissimilarity measure by defining the stress function
$$\mathcal{L}_{\text{stress}}(\mathbf{Z})=\sqrt{\frac{\sum_{i<j}\left(d_{i, j}-\hat{d}_{i j}\right)^2}{\sum_{i j} d_{i j}^2}}$$
where $\hat{d}_{i j}=\left\|\boldsymbol{z}_i-\boldsymbol{z}_j\right\|$. This is called metric MDS. Note that this is a different objective than the one used by classical MDS, so even if the $d_{i j}$ are Euclidean distances, the results will be different.
We can use gradient descent to solve the optimization problem. However, it is better to use a bound optimization algorithm (Section 8.7) called SMACOF [Lee77], which stands for "Scaling by MAjorizing a COmplicated Function." (This is the method implemented in scikit-learn.) See Figure 20.31 for the results of applying this to our running example.

Instead of trying to match the distance between points, we can instead just try to match the ranking of how similar points are. To do this, let $f(d)$ be a monotonic transformation from distances to ranks. Now define the loss
$$\mathcal{L}_{\mathrm{NM}}(\mathbf{Z})=\sqrt{\frac{\sum_{i<j}\left(f\left(d_{i, j}\right)-\hat{d}_{i j}\right)^2}{\sum_{i j} \hat{d}_{i j}^2}}$$
where $\hat{d}_{i j}=\left\|\boldsymbol{z}_i-\boldsymbol{z}_j\right\|$. Minimizing this is known as non-metric MDS.
This objective can be optimized iteratively. First the function $f$ is optimized, for a given $\mathbf{Z}$, using isotonic regression; this finds the optimal monotonic transformation of the input distances to match the current embedding distances. Then the embeddings $\mathbf{Z}$ are optimized, for a given $f$, using gradient descent, and the process repeats.
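For reference, both variants are available in scikit-learn, whose `MDS` class implements the SMACOF solver mentioned above; passing `metric=False` switches to the rank-matching (non-metric) objective. The random data is a placeholder.

```python
# Sketch: metric vs. non-metric MDS with scikit-learn's SMACOF solver.
import numpy as np
from sklearn.manifold import MDS

X = np.random.default_rng(0).normal(size=(100, 10))

Z_metric = MDS(n_components=2, random_state=0).fit_transform(X)
Z_nonmetric = MDS(n_components=2, metric=False, random_state=0).fit_transform(X)
```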


## Machine Learning Exam Help_Manifold learning

In this section, we discuss the problem of recovering the underlying low-dimensional structure in a high-dimensional dataset. This structure is often assumed to be a curved manifold (explained in Section 20.4.1), so this problem is called manifold learning or nonlinear dimensionality reduction. The key difference from methods such as autoencoders (Section 20.3) is that we will focus on non-parametric methods, in which we compute an embedding for each point in the training set, as opposed to learning a generic model that can embed any input vector. That is, the methods we discuss do not (easily) support out-of-sample generalization. However, they can be easier to fit, and are quite flexible. Such methods can be useful for unsupervised learning (knowledge discovery), data visualization, and as a preprocessing step for supervised learning. See [AAB21] for a recent review of this field.

Roughly speaking, a manifold is a topological space which is locally Euclidean. One of the simplest examples is the surface of the earth, which is a curved 2d surface embedded in a 3d space. At each local point on the surface, the earth seems flat.

More formally, a $d$-dimensional manifold $\mathcal{X}$ is a space in which each point $x \in \mathcal{X}$ has a neighborhood which is topologically equivalent to a $d$-dimensional Euclidean space, called the tangent space, denoted $\mathcal{T}_x=T_x \mathcal{X}$. This is illustrated in Figure $20.28$.

A Riemannian manifold is a differentiable manifold that associates an inner product operator at each point $x$ in tangent space; this is assumed to depend smoothly on the position $x$. The inner product induces a notion of distance, angles, and volume. The collection of these inner products is called a Riemannian metric. It can be shown that any sufficiently smooth Riemannian manifold can be embedded into a Euclidean space of potentially higher dimension; the Riemannian inner product at a point then becomes Euclidean inner product in that tangent space.

## Machine Learning Exam Help_The manifold hypothesis

Most "naturally occurring" high-dimensional datasets lie on a low-dimensional manifold. This is called the manifold hypothesis [FMN16]. For example, consider the case of an image. Figure 20.29a shows a single image of size $64 \times 57$. This is a vector in a 3,648-dimensional space, where each dimension corresponds to a pixel intensity. Suppose we try to generate an image by drawing a random point in this space; it is unlikely to look like the image of a digit, as shown in Figure 20.29b. However, the pixels are not independent of each other, since they are generated by some lower-dimensional structure, namely the shape of the digit 6.

As we vary the shape, we will generate different images. We can often characterize the space of shape variations using a low-dimensional manifold. This is illustrated in Figure 20.29c, where we apply PCA (Section 20.1) to project a dataset of 360 images, each one a slightly rotated version of the digit 6, into a 2d space. We see that most of the variation in the data is captured by an underlying curved 2d manifold. We say that the intrinsic dimensionality $d$ of the data is 2, even though the ambient dimensionality $D$ is 3,648.

In the rest of this section, we discuss ways to learn manifolds from data. There are many different algorithms that have been proposed, which make different assumptions about the nature of the manifold, and which have different computational properties. We discuss a few of these methods in the following sections. For more details, see e.g., [Bur10].

The methods can be categorized as shown in Table 20.1. The term "nonparametric" refers to methods that learn a low-dimensional embedding $\boldsymbol{z}_i$ for each datapoint $\boldsymbol{x}_i$, but do not learn a mapping function which can be applied to an out-of-sample datapoint. (However, [Ben+04b] discusses how to extend many of these methods beyond the training set by learning a kernel.)

In the sections below, we compare some of these methods using two different datasets: a set of 1000 3d points sampled from the 2d "Swiss roll" manifold, and a set of 1797 64-dimensional points sampled from the UCI digits dataset. See Figure 20.30 for an illustration of the data. We will learn a 2d manifold, so we can visualize the data.
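The Swiss-roll data can be generated with scikit-learn; the sketch below pairs it with Isomap purely as one illustrative non-parametric embedding method (the text compares several).

```python
# Sketch: sample 1000 points from the 2d Swiss-roll manifold in 3d and
# learn a 2d embedding with one manifold-learning method.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, random_state=0)       # X: (1000, 3)
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # Z: (1000, 2)
```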


## Machine Learning Exam Help_Training VAEs

We cannot compute the exact marginal likelihood $p(\boldsymbol{x} \mid \boldsymbol{\theta})$ needed for MLE training, because posterior inference in a nonlinear FA model is intractable. However, we can use the inference network to compute an approximate posterior, $q(\boldsymbol{z} \mid \boldsymbol{x})$. We can then use this to compute the evidence lower bound or ELBO. For a single example $\boldsymbol{x}$, this is given by
$$\begin{aligned} \mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x}) &=\mathbb{E}_{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})}\left[\log p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})-\log q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})\right] \\ &=\mathbb{E}_{q(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{\phi})}[\log p(\boldsymbol{x} \mid \boldsymbol{z}, \boldsymbol{\theta})]-D_{\mathrm{KL}}(q(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{\phi}) \| p(\boldsymbol{z})) \end{aligned}$$
This can be interpreted as the expected log likelihood plus a regularizer that penalizes the approximate posterior for deviating too much from the prior. (This is different from the approach in Section 20.3.4, where we applied the KL penalty to the aggregate posterior in each minibatch.)

The ELBO is a lower bound of the log marginal likelihood (aka evidence), as can be seen from Jensen’s inequality:
$$\begin{aligned} \mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x}) &=\int q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \log \frac{p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})}{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})} d \boldsymbol{z} \\ & \leq \log \int q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \frac{p_{\boldsymbol{\theta}}(\boldsymbol{x}, \boldsymbol{z})}{q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})} d \boldsymbol{z}=\log p_{\boldsymbol{\theta}}(\boldsymbol{x}) \end{aligned}$$
Thus for fixed inference network parameters $\boldsymbol{\phi}$, increasing the ELBO should increase the log likelihood of the data, similar to EM (Section 8.7.2).
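To make the bound concrete, here is a small numerical check (a sketch, not from the text): for a two-component Gaussian mixture, where the latent $z$ is discrete and the evidence can be computed exactly, any choice of $q(z)$ yields an ELBO at or below $\log p(x)$.

```python
# Numerical check of the ELBO bound on a toy mixture (assumes NumPy and SciPy).
import numpy as np
from scipy.stats import norm

x = 0.7
prior = np.array([0.4, 0.6])                     # p(z)
lik = norm.pdf(x, loc=[-1.0, 2.0], scale=1.0)    # p(x | z)
joint = prior * lik                              # p(x, z)
log_px = np.log(joint.sum())                     # exact log evidence

q = np.array([0.5, 0.5])                         # an arbitrary approximate posterior
elbo = np.sum(q * (np.log(joint) - np.log(q)))   # E_q[log p(x,z) - log q(z)]
assert elbo <= log_px                            # Jensen's inequality
print(elbo, log_px)
```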

## 机器学习代考_Machine Learning代考_The reparameterization trick

In this section, we discuss how to compute the ELBO and its gradient. For simplicity, let us suppose that the inference network estimates the parameters of a Gaussian posterior. Since $q_\phi(\boldsymbol{z} \mid \boldsymbol{x})$ is Gaussian, we can write
$$\boldsymbol{z}=\mu_{\boldsymbol{\phi}}(\boldsymbol{x})+\sigma_{\boldsymbol{\phi}}(\boldsymbol{x}) \odot \boldsymbol{\epsilon}$$
where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Hence
$$\mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x})=\mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x} \mid \boldsymbol{z}=\mu_{\boldsymbol{\phi}}(\boldsymbol{x})+\sigma_{\boldsymbol{\phi}}(\boldsymbol{x}) \odot \boldsymbol{\epsilon}\right)\right]-D_{\mathrm{KL}}\left(q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x}) \| p(\boldsymbol{z})\right)$$
Now the distribution we take the expectation over is independent of the model parameters, so we can safely push gradients inside and use backpropagation for training in the usual way, by minimizing $-\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\mathrm{L}(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \boldsymbol{x})]$ wrt $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$. This is known as the reparameterization trick. See Figure 20.23 for an illustration.
The first term in the ELBO can be approximated by sampling $\boldsymbol{\epsilon}$, scaling and shifting it by the outputs of the inference network to get $\boldsymbol{z}$, and then evaluating $\log p(\boldsymbol{x} \mid \boldsymbol{z})$ using the decoder network.
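The trick is a one-liner in any autodiff framework. A minimal PyTorch sketch (the tensor shapes and the stand-in loss are illustrative):

```python
# Reparameterized sampling: z = mu + sigma * eps, with eps ~ N(0, I).
import torch

def sample_z(mu, log_sigma):
    eps = torch.randn_like(mu)              # noise is parameter-free
    return mu + torch.exp(log_sigma) * eps  # differentiable in mu and log_sigma

mu = torch.zeros(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)
z = sample_z(mu, log_sigma)
loss = (z ** 2).sum()                       # stand-in for -log p(x | z)
loss.backward()                             # gradients flow back to the encoder outputs
print(mu.grad, log_sigma.grad)
```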

The second term in the ELBO is the KL of two Gaussians, which has a closed form solution. In particular, inserting $p(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \mathbf{0}, \mathbf{I})$ and $q(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \boldsymbol{\mu}, \operatorname{diag}(\boldsymbol{\sigma}))$ into Equation (6.33), we get
$$D_{\mathrm{KL}}(q \| p)=\sum_{k=1}^K\left[\log \left(\frac{1}{\sigma_k}\right)+\frac{\sigma_k^2+\left(\mu_k-0\right)^2}{2 \cdot 1}-\frac{1}{2}\right]=-\frac{1}{2} \sum_{k=1}^K\left[\log \sigma_k^2-\sigma_k^2-\mu_k^2+1\right]$$
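This closed form is easy to verify by Monte Carlo. A small NumPy sketch (the test values of $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ are arbitrary):

```python
# Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) vs. a Monte Carlo estimate.
import numpy as np

def kl_diag_gauss_vs_std(mu, sigma):
    return -0.5 * np.sum(np.log(sigma**2) - sigma**2 - mu**2 + 1)

rng = np.random.default_rng(0)
mu, sigma = np.array([0.5, -1.0]), np.array([0.8, 1.5])
z = mu + sigma * rng.standard_normal((200_000, 2))               # samples from q
log_q = np.sum(-0.5*np.log(2*np.pi*sigma**2) - (z-mu)**2/(2*sigma**2), axis=1)
log_p = np.sum(-0.5*np.log(2*np.pi) - z**2/2, axis=1)
print(kl_diag_gauss_vs_std(mu, sigma), np.mean(log_q - log_p))   # should agree
```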

## 计算机代写|机器学习代写machine learning代考|Consistency regularization

Consistency regularization leverages the simple idea that perturbing a given datapoint (or the model itself) should not cause the model’s output to change dramatically. Since measuring consistency in this way only makes use of the model’s outputs (and not ground-truth labels), it is readily applicable to unlabeled data and therefore can be used to create appropriate loss functions for semi-supervised learning. This idea was first proposed under the framework of “learning with pseudo-ensembles” [BAP14], with similar variants following soon thereafter [LA16; SJT16].

In its most general form, both the model $p_\theta(y \mid \boldsymbol{x})$ and the transformations applied to the input can be stochastic. For example, in computer vision problems we may transform the input using data augmentation, like randomly rotating the input image or adding noise to it, and the network may include stochastic components like dropout (Section 13.5.4) or weight noise [Gra11]. A common and simple form of consistency regularization first samples $\boldsymbol{x}^{\prime} \sim q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}\right)$ (where $q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}\right)$ is the distribution induced by the stochastic input transformations) and then minimizes the loss $\left\|p_\theta(y \mid \boldsymbol{x})-p_\theta\left(y \mid \boldsymbol{x}^{\prime}\right)\right\|^2$. In practice, the first term $p_\theta(y \mid \boldsymbol{x})$ is typically treated as fixed (i.e., gradients are not propagated through it). In the semi-supervised setting, the combined loss function over a batch of labeled data $\left(\boldsymbol{x}_1, y_1\right),\left(\boldsymbol{x}_2, y_2\right), \ldots,\left(\boldsymbol{x}_M, y_M\right)$ and unlabeled data $\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_N$ is
$$\mathcal{L}(\boldsymbol{\theta})=-\sum_{i=1}^M \log p_\theta\left(y=y_i \mid \boldsymbol{x}_i\right)+\lambda \sum_{j=1}^N\left\|p_\theta\left(y \mid \boldsymbol{x}_j\right)-p_\theta\left(y \mid \boldsymbol{x}_j^{\prime}\right)\right\|^2$$
where $\lambda$ is a scalar hyperparameter that balances the importance of the loss on unlabeled data and, for simplicity, we write $\boldsymbol{x}_j^{\prime}$ to denote a sample drawn from $q\left(\boldsymbol{x}^{\prime} \mid \boldsymbol{x}_j\right)$.

The basic form of consistency regularization in Equation (19.27) reveals many design choices that impact the success of this semi-supervised learning approach. First, the value chosen for the $\lambda$ hyperparameter is important. If it is too large, then the model may not give enough weight to learning the supervised task and will instead start to reinforce its own bad predictions (as with confirmation bias in self-training). Since the model is often poor at the start of training, before it has been trained on much labeled data, it is common in practice to initialize $\lambda$ to zero and increase its value over the course of training.
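A minimal sketch of this loss with a linear warm-up schedule is below (PyTorch; the `model`, the `augment` function, and the schedule constants are illustrative assumptions, not from the text):

```python
# Combined semi-supervised loss with consistency regularization (a sketch).
import torch
import torch.nn.functional as F

def ssl_loss(model, x_lab, y_lab, x_unlab, augment, step,
             ramp_steps=10_000, lam_max=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)           # supervised term
    with torch.no_grad():                                # target is treated as fixed
        p = F.softmax(model(x_unlab), dim=1)
    p_aug = F.softmax(model(augment(x_unlab)), dim=1)    # prediction on perturbed input
    cons = ((p - p_aug) ** 2).sum(dim=1).mean()          # squared-error consistency
    lam = lam_max * min(1.0, step / ramp_steps)          # ramp lambda up from zero
    return sup + lam * cons
```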

## 计算机代写|机器学习代写machine learning代考|Variational autoencoders

In Section 20.3.5, we describe the variational autoencoder (VAE), which defines a probabilistic model of the joint distribution of data $\boldsymbol{x}$ and latent variables $\boldsymbol{z}$. Data is assumed to be generated by first sampling $\boldsymbol{z} \sim p(\boldsymbol{z})$ and then sampling $\boldsymbol{x} \sim p(\boldsymbol{x} \mid \boldsymbol{z})$. For learning, the VAE uses an encoder $q_{\boldsymbol{\lambda}}(\boldsymbol{z} \mid \boldsymbol{x})$ to approximate the posterior and a decoder $p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid \boldsymbol{z})$ to approximate the likelihood. The encoder and decoder are typically deep neural networks. The parameters of the encoder and decoder can be jointly trained by maximizing the evidence lower bound (ELBO) of the data.

The marginal distribution of the latent variables $p(\boldsymbol{z})$ is often chosen to be a simple distribution like a diagonal-covariance Gaussian. In practice, this can make the latent variables $\boldsymbol{z}$ more amenable to downstream classification because $\boldsymbol{z}$ is typically lower-dimensional than $\boldsymbol{x}$, because $\boldsymbol{z}$ is constructed via cascaded nonlinear transformations, and because the dimensions of the latent variables are designed to be independent. In other words, the latent variables can provide a (learned) representation where data may be more easily separable. In [Kin+14], this approach is called M1, and it is indeed shown that the latent variables can be used to train stronger models when labels are scarce. (The general idea of unsupervised learning of representations to help with downstream classification tasks is described further in Section 19.2.4.)
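The M1 recipe amounts to training a VAE on all of the data and then fitting a small classifier on the encoder's output for the labeled subset. A hedged sketch (the trained `encode_mu` function is an assumption; scikit-learn supplies the classifier):

```python
# M1: reuse VAE latents as features for a downstream classifier (a sketch).
from sklearn.linear_model import LogisticRegression

def m1_classifier(encode_mu, X_labeled, y_labeled):
    Z = encode_mu(X_labeled)   # e.g. the mean head of a trained VAE encoder
    return LogisticRegression(max_iter=1000).fit(Z, y_labeled)
```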

An alternative approach to leveraging VAEs, also proposed in [Kin+14] and called M2, has the form
$$p_{\boldsymbol{\theta}}(\boldsymbol{x}, y)=p_{\boldsymbol{\theta}}(y) p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y)=p_{\boldsymbol{\theta}}(y) \int p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y, \boldsymbol{z}) p_{\boldsymbol{\theta}}(\boldsymbol{z}) d \boldsymbol{z}$$
where $\boldsymbol{z}$ is a latent variable, $p_{\boldsymbol{\theta}}(\boldsymbol{z})=\mathcal{N}(\boldsymbol{z} \mid \mathbf{0}, \mathbf{I})$ is the latent prior, $p_{\boldsymbol{\theta}}(y)=\operatorname{Cat}(y \mid \boldsymbol{\pi})$ is the label prior, and $p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid y, \boldsymbol{z})=p\left(\boldsymbol{x} \mid f_{\boldsymbol{\theta}}(y, \boldsymbol{z})\right)$ is the likelihood, such as a Gaussian, with parameters computed by $f$ (a deep neural network). The main innovation of this approach is to assume that data is generated according to both a latent class variable $y$ as well as the continuous latent variable $\boldsymbol{z}$. The class variable $y$ is observed for labeled data and unobserved for unlabeled data.
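The corresponding generative process can be read directly off this factorization. A sketch (the decoder `f_theta` is an assumed network mapping $(y, \boldsymbol{z})$ to likelihood parameters):

```python
# Ancestral sampling from the M2 model: y ~ Cat(pi), z ~ N(0, I), then decode.
import torch

def sample_m2(f_theta, pi, n, latent_dim):
    y = torch.multinomial(pi, n, replacement=True)  # class labels y ~ Cat(pi)
    z = torch.randn(n, latent_dim)                  # latent codes z ~ N(0, I)
    return f_theta(y, z)                            # parameters of p(x | y, z)
```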
