Category: Machine learning / statistical learning assignment help

Can you put this into production? Would you want to maintain it?

Posted on June 8, 2023 by statistics-lab

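If you run into difficulties with machine learning coursework, feel free to contact our 24/7 customer service at the top right. Machine learning is a field of study devoted to understanding and building methods that "learn", that is, methods that use data to improve performance on some set of tasks. Machine learning algorithms build a model from sample data (known as training data) in order to make predictions or decisions without being explicitly programmed to do so. They are widely used in applications such as medicine, email filtering, speech recognition, and computer vision, where developing conventional algorithms to perform the required tasks is difficult or infeasible.

Machine learning programs can perform tasks without being explicitly programmed. This involves computers learning from the data provided so that they can carry out certain tasks. For simple tasks assigned to computers, it is possible to program algorithms that tell the machine how to execute every step needed to solve the problem at hand; no learning is required on the computer's part. For more advanced tasks, it can be challenging for a human to create the required algorithms by hand. In practice, it can prove more effective to help the machine develop its own algorithm than to have human programmers specify every needed step.

statistics-lab™ supports you throughout your studies abroad and has built a reputation in machine learning assignment help, promising reliable, high-quality, and original statistics writing services. Our experts have extensive experience with machine learning assignments of all kinds.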


While the primary purpose of an experimentation phase, to the larger team, is to make a decision on the predictive capabilities of a model’s implementation, one of the chief purposes internally, among the DS team, is to determine whether the solution is tenable for the team. The DS team lead, architect, or senior DS person on the team should be taking a close look at what is going to be involved in this project, asking difficult questions, and producing honest answers. Some of the most important questions are as follows:

How long is this solution going to take to build?

How complex is this code base going to be?

How expensive is this going to be to train based on the schedule it needs to be retrained at?

Does my team have the skill required to maintain this solution? Does everyone know this algorithm/language/platform?

How quickly will we be able to modify this solution should something dramatically change with the data that it’s training or inferring on?

Has anyone else reported success with using this methodology/platform/language/API? Are we reinventing the wheel or are we building a square wheel?

How much additional work will the team have to do to make this solution work while meeting all of the other feature goals?

Is this going to be extensible? When the inevitable version 2.0 of this is requested, will we be able to enhance this solution easily?

Is this testable?

Is this auditable?

Innumerable times in my career, I’ve been either the one building these prototypes or the one asking these questions while reviewing someone else’s prototype. Although an ML practitioner’s first reaction to seeing results is frequently, “Let’s go with the one that has the best results,” many times the “best” one ends up being either nigh-impossible to fully implement or a nightmare to maintain.

TDD vs. RDD vs. PDD vs. CDD for ML projects

We seem to have an infinite array of methodologies to choose from when developing software. From waterfall to the Agile revolution (and all of its myriad flavors), each has benefits and drawbacks.

We won’t discuss the finer points of which development approach might be best for particular projects or teams. Absolutely fantastic books have been published that explore these topics in depth, and I highly recommend reading them to improve the development processes for ML projects. Becoming Agile in an Imperfect World by Greg Smith and Ahmed Sidky (Manning, 2009) and Test Driven: TDD and Acceptance TDD for Java Developers by Lasse Koskela (Manning, 2007) are notable resources. Worth discussing here, however, are four general approaches to ML development (one being a successful methodology, the others being cautionary tales).
TEST-DRIVEN DEVELOPMENT OR FEATURE-DRIVEN DEVELOPMENT

Pure test-driven development (TDD) is incredibly challenging to achieve for ML projects (and it certainly cannot reach the same test coverage in the end that traditional software development can), mostly due to the nondeterministic nature of models themselves. A pure feature-driven development (FDD) approach can cause significant rework during a project.

But most successful approaches to ML projects embrace aspects of both of these development styles. Keeping work incremental, adaptable to change, and focused on modular code that is not only testable but focused entirely on required features to meet the project guidelines is a proven approach that helps deliver the project on time while also creating a maintainable and extensible solution.
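
As a minimal sketch of how testable, modular ML code might look, the example below tests a deterministic feature step exactly and tests a nondeterministic step only against a tolerance, with a fixed seed for repeatability. The helper names (`min_max_scale`, `fit_noisy_mean`), the data, and the tolerance are hypothetical illustrations, not taken from the text; the test functions use plain `assert` so that a runner such as pytest could collect them.

```python
import random


def min_max_scale(values):
    """Deterministic feature step: rescale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fit_noisy_mean(values, sample_size, seed=None):
    """Stand-in for a stochastic training step: mean of a random subsample."""
    rng = random.Random(seed)
    sample = [rng.choice(values) for _ in range(sample_size)]
    return sum(sample) / len(sample)


def test_scaling_is_exact():
    # Deterministic code can be checked for exact equality.
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_estimate_within_tolerance():
    # Nondeterministic code is checked against a band, not a single value.
    data = list(range(101))  # true mean is 50
    estimate = fit_noisy_mean(data, sample_size=2000, seed=7)
    assert abs(estimate - 50) < 5


def test_seed_makes_run_repeatable():
    data = list(range(101))
    assert fit_noisy_mean(data, 50, seed=1) == fit_noisy_mean(data, 50, seed=1)
```

The split matters more than the specific helpers: the deterministic pieces can approach conventional test coverage, while the model-dependent pieces are limited to seeded, tolerance-based checks.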

These Agile approaches will need to be borrowed from and adapted in order to create an effective development strategy that works not only for the development team, but also for an organization’s general software development practices. In addition, specific design needs can dictate slightly different approaches to implementing a particular project.

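For computer science and machine learning assignment and exam help, look for statistics-lab™

For statistics assignment help, look for statistics-lab™. statistics-lab™ supports you throughout your studies abroad.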

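Financial engineering assignment help

Financial engineering is the use of mathematical techniques to solve financial problems. It draws on tools and knowledge from computer science, statistics, economics, and applied mathematics to address current financial problems and to design new and innovative financial products.

Nonparametric statistics assignment help

Nonparametric statistics refers to statistical methods that do not assume the data come from a prescribed model determined by a small number of parameters; examples of such models include the normal distribution model and the linear regression model.

Generalized linear model exam help

The generalized linear model (GLM) belongs to the field of statistics and is a flexible generalization of linear regression. It allows the error distribution of the response variable to be something other than a normal distribution.

The term general linear model (GLM) usually refers to conventional linear regression models for a continuous response variable given continuous and/or categorical predictors. It includes multiple linear regression, as well as analysis of variance and analysis of covariance (with fixed effects only).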

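Finite element method assignment help

The finite element method (FEM) is a popular method for numerically solving differential equations arising in engineering and mathematical modeling. Typical problem areas include the traditional fields of structural analysis, heat transfer, fluid flow, mass transport, and electromagnetic potential.

The FEM is a general numerical method for solving partial differential equations in two or three space variables (that is, some boundary value problems). To solve a problem, the FEM subdivides a large system into smaller, simpler parts called finite elements. This is achieved by a particular space discretization in the space dimensions, which is implemented by constructing a mesh of the object: the numerical domain for the solution, which has a finite number of points. The finite element formulation of a boundary value problem finally results in a system of algebraic equations. The method approximates the unknown function over the domain. The simple equations that model these finite elements are then assembled into a larger system of equations that models the entire problem. The FEM then approximates a solution by minimizing an associated error function via the calculus of variations.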

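As a professional service provider for international students, statistics-lab has for many years offered academic services to students in popular study destinations such as the United States, the United Kingdom, Canada, and Australia, including but not limited to essay writing, assignment writing, dissertation writing, report writing, group project writing, proposal writing, paper writing, presentation writing, computer science assignments, paper revision and polishing, online course assistance, and exam help. Coverage spans every stage of overseas study, from high school through undergraduate and graduate programs, and extends to finance, economics, accounting, auditing, management, and 99% of subjects worldwide. The writing team includes native English-speaking writers as well as master's and doctoral students from well-known overseas universities, and every writer has strong language skills, a relevant subject background, and academic writing experience. We promise 100% original, 100% professional, 100% on time, and 100% satisfaction.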

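Stochastic analysis assignment help

Stochastic calculus is a branch of mathematics that operates on stochastic processes. It allows a consistent theory of integration to be defined for integrals of stochastic processes with respect to stochastic processes. The field was created and started by the Japanese mathematician Kiyosi Itô during World War II.

Time series analysis assignment help

A stochastic process is a collection of random variables indexed by a parameter, usually time. A random variable is the quantitative expression of a random phenomenon, and its time series is a sequence of data points arranged in chronological order. The interval between points in a time series is usually constant (for example 1 second, 5 minutes, 12 hours, 7 days, or 1 year), so a time series can be analyzed as discrete-time data. Time series data matter because, in practice, we often need to study how something changes over time, which requires examining its historical record to uncover the patterns in its development.

Regression analysis assignment help

Multiple regression analysis asymptotics belongs to the field of econometrics. It is primarily a mathematical method of statistical analysis that can examine the mathematical relationships among influencing factors in complex situations, and it is widely applied in the natural sciences, social sciences, economics, and other fields.

MATLAB assignment help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or Fortran. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in software for matrix computation. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most MATLAB users, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.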

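R assignment help	Questionnaire design and analysis help
Python assignment help	Regression analysis and linear models help
MATLAB assignment help	ANOVA and experimental design help
Stata assignment help	Machine learning / statistical learning help
SPSS assignment help	Econometrics help
EViews assignment help	Time series analysis help
Excel assignment help	Deep learning help
SQL assignment help	Data modeling and visualization help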

SME review/prototype review: Can we solve this?

Posted on May 25, 2023 by statistics-lab

By far the most important of the early meetings, the SME review is one you really don’t want to skip. This is the point at which a resource commit occurs. It’s the final decision on whether this project is going to happen or will be put into the backlog while a simpler problem is solved.

During this review session, the same questions should be asked as in the preceding meeting with the SME group. The only modification is that they should be tailored to answering whether the capability, budget, and desire exist for developing the full solution, now that the full scope of the work is more fully known.

The main focus of this discussion is typically on the mocked-up prototype. For our recommendation engine, the prototype may look like a synthetic wireframe of the website with a superimposed block of product images and labels associated with the product being displayed. It is always helpful, for the purposes of these demonstrations, to use real data. If you’re showing a demonstration of recommendations to a group of SME members, show their data. Show the recommendations for their account (with their permission, of course!) and gauge their responses. Record each positive (but more important, each negative) impression that they give.

What if it’s terrible?

Depending on the project, the models involved, and the general approach to the ML task, the subjective rating of a prototype being “terrible” can be either trivial to fix (properly tune the model, augment the feature set, and so forth) or can be a complete impossibility (the data doesn’t exist to augment the additional feature requests, the data isn’t granular enough to solve the request, or improving the prediction to the group’s satisfaction would require a healthy dose of magic since the technology to solve that problem doesn’t exist yet).

It’s critical to quickly distill the reasons that any identified issues are happening. If the reasons are obvious and widely known as elements that can be modified by the DS team, simply answer as such. “Don’t worry, we’ll be able to adjust the predictions so that you don’t see multiple pairs of sandals right next to one another” is perfectly fine. But if the problem is of an intensely complex nature, such as “I really don’t want to see bohemian maxi dresses next to grunge shoes” (hopefully, you will be able to quickly search what those terms mean during the meeting), the response should be either thoughtfully articulated to the person or recorded for a period of additional research, with a cap on the time and effort devoted to that research.

At the next available opportunity, the response may be along the lines of either, “We looked into that, and since we don’t have data that declares what style these shoes are, we would have to build a CNN model, train it to recognize styles, and create the hundreds of thousands of labels needed to identify these styles across our product catalog. That would likely take several years to build.” or “We looked into that, and because we have the labels for every product, we can easily group recommendations by style type to give you more flexibility around what sort of product mixing you would like.”
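
Both answers hinge on whether a style or category label exists for each product. Assuming such labels are available, a post-processing step along the lines of the sketch below could keep near-identical items (several pairs of sandals, say) from appearing side by side. The product tuples, the category values, and the function name `spread_categories` are hypothetical; the text does not describe the actual re-ranking logic.

```python
def spread_categories(ranked_items):
    """Reorder (product, category) pairs so that neighbors differ in category
    where possible, otherwise preserving the model's ranking order."""
    remaining = list(ranked_items)
    result = []
    while remaining:
        last_category = result[-1][1] if result else None
        # Take the highest-ranked item whose category differs from the last one placed.
        for index, item in enumerate(remaining):
            if item[1] != last_category:
                break
        else:
            # Only same-category items are left; fall back to ranking order.
            index = 0
        result.append(remaining.pop(index))
    return result


# Hypothetical model output, best-ranked first.
recommendations = [
    ("sandal_a", "sandals"),
    ("sandal_b", "sandals"),
    ("sandal_c", "sandals"),
    ("maxi_dress", "dresses"),
    ("sun_hat", "hats"),
]
print(spread_categories(recommendations))
```

A greedy pass like this is cheap to explain in a review meeting, but it only works when the labels exist, which is exactly the distinction the two sample responses draw.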

Make sure that you know what is and is not possible before the prototype review session. If you encounter a request that you’re not sure of, use the eight golden words of ML: “I don’t know, but I’ll go find out.”



DEVELOPMENT SPRINT REVIEWS (PROGRESS REPORTS FOR A NONTECHNICAL AUDIENCE)

Posted on May 25, 2023 by statistics-lab

Conducting recurring meetings of a non-engineering-focused bent is useful for more than just passing information from the development teams to the business. They can serve as a bellwether of the state of the project and help indicate when integration of disparate systems can begin. These meetings should still be a high-level, project-focused discussion, though.

The temptation for many cross-functional teams that work on projects like this is to turn these update meetings into either an über-retrospective or a super sprint-planning meeting. While such discussions can be useful (particularly for integration purposes among various engineering departments), those topics should be reserved for the engineering team’s meetings.

A full-team progress report meeting should make the effort to generate a current-state demonstration of progress up to that point. Simulations of the solution should be shown to ensure that the business team and SMEs can provide relevant feedback on details that might have been overlooked by the engineers working on the project. These periodic meetings (either every sprint or every other sprint) can help prevent the aforementioned dreaded scope creep and the 11th-hour finding that a critical component that wasn’t noticed as necessary is missing, causing massive delays in the project’s delivery.

MVP REVIEW (FULL DEMO WITH UAT)

Code complete can mean different things to different organizations. In general, it is widely accepted to be a state in which

Code is tested (and passes unit/integration tests).

The system functions as a whole in an evaluation environment using production-scale data (models have been trained on production data).

All agreed-upon features that have been planned are complete and perform as designed.

This doesn’t mean that the subjective quality of the solution is met, though. This stage simply means the system will pass recommendations to the right elements on the page for this recommendation engine example. The MVP review and the associated UAT that goes into preparing for this meeting is the stage at which subjective measures of quality are done.

What does this mean for our recommendation engine? It means that the SMEs log in to the UAT environment and navigate the site. They look at the recommendations based on their preferences and make judgments on what they see. It also means that high-value accounts are simulated, ensuring that the recommendations that the SMEs are looking at through the lens of these customers are congruous to what they know about those types of users.

For many ML implementations, metrics are a wonderful tool (and most certainly should be heavily utilized and recorded for all modeling). But the best gauge of determining whether the solution is qualitatively solving the problem is to use the breadth of knowledge of internal users and experts who can use the system before it’s deployed to end users.

At meetings evaluating the responses to UAT feedback of a solution developed over a period of months, I’ve seen arguments break out between the business and the DS team about how one particular model’s validation metrics are higher, but the qualitative review quality is much lower than the inverse situation. This is exactly why this particular meeting is so critical. It may uncover glaring issues that were missed in not only the planning phases, but in the experimental and development phases as well. Having final sanity checks on the results of the solution can only make the end result better.

There is a critical bit of information to remember about this meeting and review period dealing with estimates of quality: nearly every project carries with it a large dose of creator bias. When creating something, particularly an exciting system that has a sufficient challenge to it, the creators can overlook and miss important flaws because of familiarity with and adoration of it.
A parent can never see how ugly or stupid their children are. It’s human nature to unconditionally love what you’ve created.
-Every rational parent, ever.



WHEN SHOULD WE MEET TO SHARE PROGRESS?

Posted on May 25, 2023 by statistics-lab

Because of the complex nature of most ML projects (particularly ones that require so many interfaces to parts of a business as a recommendation engine), meetings are critical. However, not all meetings are created equally.

While it is incredibly tempting for people to want to have cadence meetings on a prescribed weekly basis, project meetings should coincide with milestones associated with the project. These project-based milestone meetings should

Not be a substitute for daily standup meetings

Not overlap with team-focused meetings of individual departments

Always be conducted with the full team present

Always have the project lead present to make final decisions on contentious topics

Be focused on presenting the solution as it stands at that point and nothing else

Well-intentioned but toxic external ideation

It’s incredibly tempting for discussions to happen outside these structured presentation and data-focused meetings. Perhaps people on your team who are not involved in the project are curious and would like to provide feedback and additional brainstorming sessions. Similarly, it could be convenient to discuss a solution to something that you’re stuck on with a small group from the larger team.

I cannot stress strongly enough how much disruption can, and likely will, arise from these outside-of-the-team discussions. Any decisions made in a large-scale project (even in the experimentation phase) by the team members should be considered sacrosanct. Involving outside voices and people who are “trying to help” erodes the inclusive communication environment that has been built collectively.

Outside ideation also typically introduces an uncontrollable chaos to the project that is difficult for everyone involved in the implementation to manage. If the DS team decides in a vacuum to change the delivery method of the predictions (reusing a REST endpoint with additional payload data, for instance), it would affect the entire project. Even though it may save the DS team a week’s worth of work by not having to create another REST endpoint, it would be disastrous for any work that the frontend engineers are doing. This could potentially cause weeks of rework for the frontend team.
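
To make the risk concrete, here is a minimal sketch of how reusing an endpoint with a changed payload can break a consumer that was written against the agreed contract. The payload shapes, field names, and the `render_recommendations` function are all hypothetical; the text does not specify the actual API.

```python
import json

# Payload shape the full team agreed on: a flat list of product IDs.
AGREED_PAYLOAD = json.dumps({"user_id": 42, "recommendations": [101, 102, 103]})

# Shape after an unannounced change: each entry becomes an object with extra data.
CHANGED_PAYLOAD = json.dumps({
    "user_id": 42,
    "recommendations": [
        {"product_id": 101, "score": 0.93},
        {"product_id": 102, "score": 0.88},
    ],
})


def render_recommendations(raw_payload):
    """Frontend-side consumer written against the agreed contract."""
    payload = json.loads(raw_payload)
    # Assumes every entry is an integer product ID.
    return ["/products/{:d}".format(pid) for pid in payload["recommendations"]]


print(render_recommendations(AGREED_PAYLOAD))

try:
    render_recommendations(CHANGED_PAYLOAD)
except (TypeError, ValueError) as error:
    # The consumer fails because entries are now dictionaries, not integers.
    print("consumer broke:", type(error).__name__)
```

The change saves the producer from building a second endpoint, but every consumer that parsed the old shape now fails, which is the rework the paragraph above warns about.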

Introducing changes without notifying and discussing them in the larger group risks wasting a great deal of time and resources, which in turn erodes the confidence that the team and the business at large has in the process. It’s a fantastically effective way of having the project become shelfware or introducing silo behavior among microcosm groups of business units.

But when is it going to be done?

Honesty is always the best policy. I’ve seen a lot of DS teams think that it’s wise to under-promise and over-deliver during project planning. This isn’t a wise move.

Many times, this policy of giving wiggle room to a project is employed to protect against unforeseen complexities that arise during project development. But factoring those into estimated delivery dates doesn’t do the team any favors. It’s dishonest and can erode trust that the business has in the team. The better approach is to just be honest with everyone. Let them know that ML projects have a lot of unknown factors baked into them.

The only thing this practice will result in is frustrated and angry internal business unit customers. They won’t like continually getting results weeks earlier than promised and will quickly catch on to your antics. Trust is important.

The other side of this factual omission coin relates to setting unrealistic expectations in deliveries. By not telling the business that things can go sideways during many of the phases of project work and setting an aggressive delivery date for iterative design, everyone will expect something useful to be delivered on that date. Failing to explain that these are general targets that may need slight adjustment means that the only way to accommodate unforeseen complications is by forcing the DS team to work long and grueling hours to hit those goals.

Only one result is guaranteed: team burnout. If the team is completely demotivated and exhausted from striving to meet unreasonable demands, the solution will never be very good. Details will be missed, bugs will proliferate in code, and the best members on the team will be updating their resumes to find a better job once the solution is in production.



What is experimental scoping?

Posted on May 12, 2023 by statistics-lab

Before you can begin to estimate how long a project is going to take, you need to research not only how others have solved similar problems but also potential solutions from a theoretical point of view. In the scenario that we’ve been discussing, a number of potential approaches were decided on during the initial project planning and overall scoping (the requirements gathering). When the project then moves into the phase of research and experimentation, it is absolutely critical to set an expectation with the larger team of how long the DS team will spend on vetting each of those ideas.

Setting expectations benefits the DS team. Although it may seem counterproductive to set an arbitrary deadline on something that is wholly unknowable (which is the best solution), having a target due date can help focus the generally disorganized process of testing. Elements that under other circumstances might seem interesting to explore are ignored and marked as “will investigate during MVP development” with the looming deadline approaching. This approach simply helps focus the work.

The expectations similarly help the business and the cross-functional team members involved in the project. They will gain not only a decision on project direction that has a higher chance of success in the end, but also a guarantee of progress in the near-term future. Remember that communication is absolutely essential to successful ML project work, and setting delivery goals even for experimentation will aid in continuing to involve everyone in the process. It will only make the end result better.

For relatively simple and straightforward ML use cases (forecasting, outlier detection, clustering, and conversion prediction, for example), the amount of time dedicated to testing approaches should be relatively short. One to two weeks is typically sufficient for exploring potential solutions for standard ML; remember, this isn’t the time to build an MVP, but rather to get a general idea of the efficacy of different algorithms and methodologies.

For a far more complex use case, such as this scenario, a longer investigation period can be warranted. Two weeks may be needed for the research phase alone, with an additional two weeks of “hacking” (roughshod scripting to test APIs and libraries and to build crude visualizations).
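
These time boxes can be turned into concrete dates to share with the larger team. The sketch below assumes a hypothetical kickoff date and uses the two-week research and two-week hacking durations mentioned for the complex case; the function name `schedule_phases` is illustrative only.

```python
from datetime import date, timedelta


def schedule_phases(start, phases):
    """Return (phase_name, start_date, end_date) tuples for back-to-back time boxes."""
    schedule = []
    cursor = start
    for name, weeks in phases:
        end = cursor + timedelta(weeks=weeks)
        schedule.append((name, cursor, end))
        cursor = end  # the next phase begins when this one ends
    return schedule


# Hypothetical kickoff date; durations follow the complex-use-case example.
plan = schedule_phases(date(2023, 5, 15), [("research", 2), ("hacking", 2)])
for name, begin, end in plan:
    print(f"{name}: {begin.isoformat()} to {end.isoformat()}")
```

Publishing the end date of each box, rather than only the total, gives the business a checkpoint at which the DS team reports what was learned even if no approach has been chosen yet.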

The sole purpose of these phases is to decide on a path, but to make that decision in the shortest amount of time practicable. The challenge is to balance the time required to make the best adjudication possible for the problem against the timetable of delivery of the MVP.

No standard rubric exists for figuring out how long this period should be, as it is dependent on the problem, the industry, the data, the experience of the team, and the relative complexity of each option being considered. Over time, a team will gain the wisdom that will make for more accurate experimental (“hacking”) estimates. The most important point to remember is that this stage, and the communication to the business unit of how long it will take, should never be overlooked.

Experimental scoping for the ML team: Research

In the heart of all ML practitioners is the desire to experiment, explore, and learn new things. With the depth and breadth of all that exists in the ML space, we could spend a lifetime learning only a fraction of what has been done, is currently being researched, and will be worked on as novel solutions to complex problems. This innate desire shared among all of us means that it is of the utmost importance to set boundaries around how long and how far we will go when researching a solution to a new problem.

In the first stages following the planning meetings and general project scoping, it’s now time to start doing some actual work. This initial stage, experimentation, can vary quite significantly among projects and implementations, but the common theme for the ML team is that it must be time-boxed. This can feel remarkably frustrating for many of us. Instead of focusing on researching a novel solution to something from the ground up, or utilizing a new technique that’s been recently developed, sometimes we are forced into a “just get it built” situation. A great way to meet that requirement of time-bound urgency is to set limits on how much time the ML team has to research possibilities for solutions.

For the recommendation engine project that we’ve been discussing in this chapter, a research path for the ML team might look something like figure 3.12.

In this simplified diagram, effective research constrains the options available to the team. After a few cursory internet searches, blog readings, and whitepaper consultations, the team can identify (typically in a day or so) the “broad strokes” for existing solutions in industry and academia.

Once the common approaches are identified (and individually curated by the team members), a full list of possibilities can be researched in more depth. Once this level of applicability and complexity is arrived at, the team can meet and discuss its findings.

As figure 3.12 shows, the approaches that are candidates for testing are culled during the process of presenting findings. By the end of this adjudication phase, the team should have a solid plan of two or three options that warrant testing through prototype development.

Note the mix of approaches that the group selects. Within the selections is sufficient heterogeneity that will help aid the MVP-based decision later (if all three options are slight variations on deep learning approaches, for instance, it will be hard to decide which to go with in some circumstances).
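
One way to keep the shortlist heterogeneous is to cap how many candidates can come from the same family of techniques. The sketch below is an illustration under stated assumptions: the candidate names, families, and review scores are invented, and `shortlist` is a hypothetical helper rather than a procedure from the text.

```python
def shortlist(candidates, max_options=3, per_family=1):
    """Pick the highest-scored candidates while limiting picks per technique family."""
    chosen = []
    family_counts = {}
    # Consider the best-scored candidates first.
    for name, family, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if family_counts.get(family, 0) >= per_family:
            continue  # family already represented; skip the near-duplicate approach
        chosen.append(name)
        family_counts[family] = family_counts.get(family, 0) + 1
        if len(chosen) == max_options:
            break
    return chosen


# Invented candidates: (approach, family, score from the team's review).
candidates = [
    ("deep_learning_variant_a", "deep learning", 9),
    ("deep_learning_variant_b", "deep learning", 8),
    ("matrix_factorization", "collaborative filtering", 7),
    ("association_rules", "rule mining", 6),
    ("popularity_baseline", "heuristic", 4),
]
print(shortlist(candidates))
```

With these made-up scores, the second deep learning variant is skipped even though it outranks the remaining options, so the prototypes that follow compare genuinely different approaches.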

机器学习代考

计算机代写|机器学习代写machine learning代考|What is experimental scoping?

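Before you can begin to estimate how long a project is going to take, you need to research not only how others have solved similar problems but also potential solutions from a theoretical standpoint. In the scenario we’ve been discussing, the initial project planning and overall scoping (the requirements gathering) identified a number of potential approaches. When the project moves into the research and experimentation phase, it is critical to set an expectation with the larger team of how long the DS team will spend vetting each of those ideas.

Setting expectations benefits the DS team. Although setting an arbitrary deadline on something that is wholly unknowable (which solution is the best one) may seem counterproductive, a target due date helps focus the generally disorganized process of testing. Elements that might otherwise seem interesting are set aside and marked as “will investigate during MVP development.” This approach helps keep the work focused.

Expectations equally help the business and the cross-functional team members involved in the project. They gain not only a decision on a project direction that ultimately has a higher chance of success, but also a guarantee of progress in the near term. Remember that communication is absolutely essential to successful project work, and setting delivery goals even for experimentation helps keep everyone involved in the process. It will only make the end result better.

For relatively simple and straightforward ML use cases (forecasting, outlier detection, clustering, and conversion prediction, for example), the amount of time dedicated to testing approaches should be relatively short. One to two weeks is typically sufficient to explore potential solutions for standard ML; remember, this isn’t the time to build an MVP, but rather to get a general idea of the efficacy of different algorithms and methodologies.

For a more complex use case, such as this scenario, a longer investigation period may be warranted. Two weeks may be needed for the research phase alone, with an additional two weeks of “hacking” (rough scripting to test APIs and libraries, and building crude visualizations).

The sole purpose of these phases is to decide on a path, but to make that decision in the shortest viable amount of time. The challenge is to balance the time needed to reach the best adjudication of the problem against the timeline for delivering the MVP.

No standard rubric exists for figuring out how long this period should be, as it depends on the problem, the industry, the data, the experience of the team, and the relative complexity of each option being considered. Over time, a team gains the wisdom to make more accurate estimates of experimental (“hacking”) time. The most important point to remember is that this stage, and communicating to the business unit how long it will take, should never be overlooked.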

计算机代写|机器学习代写machine learning代考|Experimental scoping for the ML team: Research

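In the heart of every ML practitioner is the desire to experiment, explore, and learn new things. With the depth and breadth of all that exists in the ML space, we could spend a lifetime learning only a fraction of what has been done, is currently being researched, and will be worked on as novel solutions to complex problems. This innate desire shared by all of us means that it is of utmost importance to set boundaries around how long and how far we will go when researching a solution to a new problem.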

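For statistics writing services, look for statistics-lab™. statistics-lab™ supports you throughout your studies abroad.

Financial engineering writing service

Nonparametric statistics writing service

Nonparametric statistics refers to statistical methods that do not assume the data come from a prescribed model determined by a small number of parameters; examples of such models include the normal distribution model and the linear regression model.

Generalized linear models exam help

The generalized linear model (GLM) belongs to the field of statistics and is a flexible generalization of linear regression. It allows the response variable to have an error distribution other than the normal distribution.

Finite element method writing service

Stochastic analysis writing service

Time series analysis writing service

Regression analysis writing service

MATLAB writing service

R writing service	Questionnaire design and analysis writing service
PYTHON writing service	Regression analysis and linear models writing service
MATLAB writing service	Analysis of variance and experimental design writing service
STATA writing service	Machine learning/statistical learning writing service
SPSS writing service	Econometrics writing service
EVIEWS writing service	Time series analysis writing service
EXCEL writing service	Deep learning writing service
SQL writing service	Data modeling and visualization writing service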

计算机代写|机器学习代写machine learning代考|Basic planning for a project

Posted on May 12, 2023 by statistics-lab

The planning of any ML project typically starts at a high level. A business unit, executive, or even a member of the DS team comes up with an idea of using the DS team’s expertise to solve a challenging problem. While typically little more than a concept at this early stage, this is a critical juncture in a project’s life cycle.

In the scenario we’ve been discussing, the high-level idea is personalization. To an experienced DS, this could mean any number of things. To an SME of the business unit, it could mean many of the same concepts that the DS team could think of, but it may not. From this early point of an idea to before even basic research begins, the first thing everyone involved in this project should be doing is having a meeting. The subject of this meeting should focus on one fundamental element: Why are we building this?
It may sound like a hostile or confrontational question to ask. It may take some people aback when hearing it. However, it’s one of the most effective and important questions, as it opens a discussion into the true motivations for why people want the project to be built. Is it to increase sales? Is it to make our external customers happier? Or is it to keep people browsing on the site for longer durations?

Each of these nuanced answers can help inform the goal of this meeting: defining the expectations of the output of any ML work. The answer also satisfies the measurement metric criteria for the model’s performance, as well as attribution scoring of the performance in production (the very score that will be used to measure A/B testing much later).

In our example scenario, the team fails to ask this important why question. Figure 3.6 shows the divergence in expectations from the business side and the ML side because neither group is speaking about the essential aspect of the project and is instead occupied in mental silos of their own creating. The ML team is focusing entirely on how to solve the problem, while the business team has expectations of what would be delivered, wrongfully assuming that the ML team will “just understand it.”

计算机代写|机器学习代写machine learning代考|ASSUMPTION OF BUSINESS KNOWLEDGE

Assumption of business knowledge is a challenging issue, particularly for a company that’s new to utilizing ML, or for a business unit at a company that has never worked with its ML team before. In our example, the business leadership’s assumption was that the ML team knew aspects of the business that the leadership considered widely held knowledge. Because no clear and direct set of requirements was set out, this assumption wasn’t identified as a clear requirement. With no SME from the business unit involved in guiding the ML team during data exploration, there simply was no way for them to know this information during the process of building the MVP either.
An assumption of business knowledge is often a dangerous path to tread for most companies. At many companies, the ML practitioners are insulated from the inner workings of a business. With their focus mostly in the realm of providing advanced analytics, predictive modeling, and automation tooling, scant time can be devoted to understanding the nuances of how and why a business is run. While some obvious aspects of the business are known by all (for example, “we sell product $x$ on our website”), it is not reasonable to expect that the modelers should know that a business process exists in which some suppliers of goods would be promoted on the site over others.

A good solution for arriving at these nuanced details is to have an SME from the group that is requesting a solution be built for them (in this case, the product marketing group) explain how they decide the ordering of products on each page of the website and app. Going through this exercise would allow for everyone in the room to understand the specific rules that may be applied to govern the output of a model.

ASSUMPTION OF DATA QUALITY

The onus of duplicate product listings in the demo output is not entirely on either team. While the ML team members certainly could have planned for this to be an issue, they weren’t aware of the precise scope of its impact. Even had they known, they likely would have wisely mentioned that correcting for this issue would not be a part of the demo phase (because of the volume of work required and the request that the prototype not be delayed for too long).

The principal issue here is in not planning for it. By not discussing the expectations, the business leaders’ confidence in the capabilities of the ML team erodes. The objective measure of the prototype’s success will largely be ignored as the business members focus solely on the fact that for a few users’ sample data, the first 300 recommendations show nothing but 4 products in 80 available shades and patterns.

For our use case, the ML team believed that the data they were using was, as told to them by the DE team, quite clean. Reality, for most companies, is a bit more dire than what most would think when it comes to data quality. Figure 3.8 summarizes two industry studies, conducted by IBM and Deloitte, indicating that thousands of companies are struggling with ML implementations, specifically noting problems with data cleanliness. Checking data quality before working on models is pretty important.
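
A quick duplicate check on the catalog is one way to surface this kind of issue before a demo. The following is a minimal sketch, assuming a hypothetical catalog in which each listing carries a `product_id`, a `base_product` name, and a `shade`; these field names, the sample data, and the `max_variants` threshold are illustrative assumptions, not details from the scenario.

```python
from collections import defaultdict


def variant_counts(listings):
    """Count how many listings share the same base product."""
    groups = defaultdict(list)
    for item in listings:
        groups[item["base_product"]].append(item["product_id"])
    return {base: len(ids) for base, ids in groups.items()}


def flag_heavy_duplication(listings, max_variants=3):
    """Return base products that appear in more than max_variants listings."""
    counts = variant_counts(listings)
    return sorted(base for base, n in counts.items() if n > max_variants)


# Hypothetical sample: one sandal listed in five shades, one boot listed once.
catalog = [
    {"product_id": i, "base_product": "sandal", "shade": s}
    for i, s in enumerate(["red", "blue", "tan", "black", "white"])
] + [{"product_id": 99, "base_product": "boot", "shade": "brown"}]

flagged = flag_heavy_duplication(catalog)
print(flagged)
```

A report like this does not fix the duplication, but it gives both teams a concrete number to discuss when deciding whether deduplication belongs in the demo phase.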

计算机代写|机器学习代写machine learning代考|Development

Posted on May 12, 2023 by statistics-lab

Having a poor development practice for ML projects can manifest itself in a multitude of ways that can completely kill a project. Though usually not as directly visible as some of the other leading causes, having a fragile and poorly designed code base and poor development practices can make a project harder to work on, easier to break in production, and far harder to improve as time goes on.

For instance, let’s look at a rather simple and frequent modification situation that comes up during the development of a modeling solution: changes to the feature engineering. In figure 1.9, we see two data scientists attempting to make a set of changes in a monolithic code base. In this development paradigm, all the logic for the entire job is written in a single notebook through scripted variable declarations and functions.

Julie, in the monolithic code base, will likely have a lot of searching and scrolling to do, finding each individual location where the feature vector is defined and adding her new fields to collections. Her encoding work will need to be correct and carried throughout the script in the correct places as well. It’s a daunting amount of work for any sufficiently complex ML code base (as the number of code lines for feature engineering and modeling combined can reach to the thousands if developed in a scripting paradigm) and is prone to frustrating errors in the form of omissions, typos, and other transcription mistakes.

Joe, meanwhile, has far fewer edits to do. But he is still subject to the act of searching through the long code base and relying on editing the hardcoded values correctly.
The real problem with the monolithic approach comes when they try to incorporate each of their changes into a single copy of the script. As they have mutual dependencies on each other’s work, both will have to update their code and select one of their copies to serve as a master for the project, copying in the changes from the other’s work. This long and arduous process wastes precious development time and likely will require a great deal of debugging to get correct.

Figure 1.10 shows a different approach to maintaining an ML project’s code base. This time, a modularized code architecture separates the tight coupling that is present within the large script from figure 1.9.
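
To make the contrast concrete, here is a minimal sketch of what centralizing the feature definition might look like. It is an illustration under assumptions, not the architecture from figure 1.10: the `FeatureConfig` class, the column names, and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureConfig:
    """Single place where the feature vector is defined (hypothetical columns)."""
    numeric: list = field(default_factory=lambda: ["age", "income"])
    categorical: list = field(default_factory=lambda: ["region"])

    def all_columns(self):
        return self.numeric + self.categorical


def build_feature_vector(row, config):
    """Assemble the features for one record using only the config."""
    return [row[name] for name in config.all_columns()]


config = FeatureConfig()
# Adding a new field is a one-line change instead of a search through a script.
config.categorical.append("device_type")

row = {"age": 31, "income": 52000, "region": "west", "device_type": "mobile"}
vector = build_feature_vector(row, config)
print(vector)
```

Because every downstream function reads the column list from the config, two people can change different parts of the feature definition without hunting for hardcoded copies of it.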

计算机代写|机器学习代写machine learning代考|Deployment

Not planning a project around a deployment strategy is like having a dinner party without knowing how many guests are showing up. You’ll either be wasting money or ruining experiences.
Perhaps the most confusing and complex part of ML project work for newer teams is in how to build a cost-effective deployment strategy. If it’s underpowered, the prediction quality doesn’t matter (since the infrastructure can’t properly serve the predictions). If it’s overpowered, you’re effectively burning money on unused infrastructure and complexity.

As an example, let’s look at an inventory optimization problem for a fast-food company. The DS team has been fairly successful in serving predictions for inventory management at region-level groupings for years, running large batch predictions for the per-day demands of expected customer counts at a weekly level, and submitting forecasts as bulk extracts each week. Up until this point, the DS team has been accustomed to an ML architecture that effectively looks like that shown in figure 1.11.

This relatively standard architecture for serving up scheduled batch predictions focuses on exposing results to internal analytics personnel who provide guidance on quantities of materials to order. This prediction-serving architecture isn’t particularly complex and is a paradigm that the DS team members are familiar with. With the scheduled synchronous nature of the design, as well as the large amounts of time between subsequent retraining and inference, the general sophistication of their technology stack doesn’t have to be particularly high (which is a good thing; see the following sidebar).

As the company realizes the benefits of predictive modeling over time with these batch approaches, its faith in the DS team increases. When a new business opportunity arises that requires near-real-time inventory forecasting at a per-store level, company executives ask the DS team to provide a solution for this use case.

机器学习代写|流形学习代写manifold data learning代考|ICML2022

Posted on February 10, 2023 by statistics-lab

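If you run into difficulties with manifold data learning, feel free to contact our 24/7 writing-service support via the top-right corner.

Manifold learning is a popular and rapidly developing subfield of machine learning based on the assumption that observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. This text presents a mathematical view of manifold learning, exploring the intersection of kernel learning, spectral graph theory, and differential geometry. Emphasis is placed on the remarkable interplay between graphs and manifolds, which underlies the widely used technique of manifold regularization.

statistics-lab™ supports you throughout your studies abroad and has built a reputation for manifold data learning writing services, promising reliable, high-quality, original statistics work. Our experts have extensive experience with manifold data learning assignments of all kinds.

Our writing services for manifold data learning and related subjects cover a wide range, including but not limited to:

Statistical Inference
Statistical Computing
Advanced Probability Theory
Advanced Mathematical Statistics
(Generalized) Linear Models
Statistical Machine Learning
Longitudinal Data Analysis
Foundations of Data Science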

机器学习代写|流形学习代写manifold data learning代考|History of Probabilistic Dimensionality Reduction

The probabilistic variants of many of the spectral dimensionality reduction methods were gradually developed and proposed. For example, probabilistic PCA [50, 62] is the stochastic version of PCA, and it assumes that data are obtained by the addition of noise to the linear projection of a latent random variable. It can be demonstrated that PCA is a special case of probabilistic PCA, where the variance of noise tends to zero. Probabilistic PCA itself is a special case of factor analysis [17]. In factor analysis, the different dimensions of noise can be correlated, while they are uncorrelated and isotropic in probabilistic PCA. Factor analysis and probabilistic PCA use expectation maximization and variational inference on the data.
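
The generative assumption behind probabilistic PCA can be written out explicitly. The following is a sketch in the notation commonly used for probabilistic PCA; the symbols $W$, $z$, $\mu$, $\varepsilon$, $\sigma^2$, $p$, and $d$ are assumptions introduced here rather than notation taken from the passage above.

$$
x = W z + \mu + \varepsilon, \qquad z \sim \mathcal{N}(0, I_p), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d),
$$

so that, marginally,

$$
x \sim \mathcal{N}\left(\mu,\; W W^\top + \sigma^2 I_d\right).
$$

Here $x \in \mathbb{R}^d$ is an observed data point, $z \in \mathbb{R}^p$ with $p < d$ is the latent variable, and $W \in \mathbb{R}^{d \times p}$ is the linear map. The noise covariance $\sigma^2 I_d$ has the same variance in every direction and no cross-correlation. In the limit $\sigma^2 \to 0$, the maximum-likelihood $W$ spans the principal subspace found by ordinary PCA, which is the sense in which PCA is a special case.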

The linear spectral dimensionality reduction methods, such as PCA and FDA, learn a projection matrix from the data for better data representation or more separation between classes. In 1984, it was surprisingly determined that if a random matrix is used for the linear projection of the data, without being learned from the data, it represents data well! The correctness of this mystery of random projection was proven by the Johnson-Lindenstrauss lemma [33], which put a bound on the error of preservation of the distances in the subspace. Later, nonlinear variants of random projection were developed, including random Fourier features [48] and random kitchen sinks [49].
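
The claim that an unlearned random matrix roughly preserves distances can be checked numerically. The following is a minimal sketch in pure Python, assuming Gaussian entries with variance $1/p$ and hypothetical sizes (`d = 500`, `p = 200`, ten points); it illustrates the idea and is not the construction used in any particular reference.

```python
import math
import random


def random_projection_matrix(d, p, rng):
    """p x d matrix with i.i.d. N(0, 1/p) entries, drawn without looking at the data."""
    scale = 1.0 / math.sqrt(p)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(p)]


def project(matrix, x):
    """Apply the linear projection to a single point."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


rng = random.Random(0)
d, p, n = 500, 200, 10  # hypothetical sizes chosen for illustration
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = random_projection_matrix(d, p, rng)
projected = [project(R, x) for x in points]

# Ratio of projected to original distance for every pair of points;
# values close to 1 mean the pairwise distances are roughly preserved.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(min(ratios), max(ratios))
```

The spread of the ratios around 1 shrinks as the target dimension `p` grows, which is the trade-off the Johnson-Lindenstrauss lemma quantifies.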

Sufficient Dimension Reduction (SDR) is another family of probabilistic methods, whose first method was proposed in 1991 [39]. It is used for finding a transformation of data to a lower-dimensional space that does not change the conditional distribution of the labels given the data. Therefore, the subspace is sufficient for predicting labels from the data projected onto the subspace. SDR was mainly proposed for high-dimensional regression, where the regression labels are used. Later, Kernel Dimensionality Reduction (KDR) was proposed [18] as a method in the family of SDR for dimensionality reduction in machine learning.

Stochastic Neighbour Embedding (SNE) was proposed in 2003 [30] and took a probabilistic approach to dimensionality reduction. It attempted to preserve the probability of a point being a neighbour of others in the subspace. A problem with SNE was that it could not find an optimal subspace because it was not possible for it to preserve all the important information of high-dimensional data in the low-dimensional subspace. Therefore, t-SNE was proposed [41], which used another distribution with more capacity in the subspace. This allowed t-SNE to preserve a larger amount of information in the low-dimensional subspace. A recent successful probabilistic dimensionality reduction method is the Uniform Manifold Approximation and Projection (UMAP) [43], which is widely used for data visualization. Today, both t-SNE and UMAP are used for high-dimensional data visualization, especially in the visualization of extracted features in deep learning. They have also been widely used for visualizing high-dimensional genome data.
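
The difference between the two embedding-space distributions can be seen in a few lines. This is a minimal sketch, assuming a Gaussian kernel for SNE and a Student-t kernel with one degree of freedom for t-SNE; the distances are hypothetical, and the per-point normalization is a simplification (t-SNE itself normalizes over all pairs of points).

```python
import math


def gaussian_kernel(dist):
    """Unnormalized similarity used in SNE's embedding space."""
    return math.exp(-dist ** 2)


def student_t_kernel(dist):
    """Unnormalized similarity used in t-SNE's embedding space (one degree of freedom)."""
    return 1.0 / (1.0 + dist ** 2)


def neighbour_probabilities(dists, kernel):
    """Normalize kernel values over the other points so they sum to 1."""
    weights = [kernel(d) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from one embedded point to three others.
dists = [0.5, 1.0, 3.0]
p_sne = neighbour_probabilities(dists, gaussian_kernel)
p_tsne = neighbour_probabilities(dists, student_t_kernel)
print(p_sne)
print(p_tsne)
```

The heavy tail of the Student-t kernel leaves noticeably more probability for the distant point, which is what gives the low-dimensional embedding room to place moderately dissimilar points far apart.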

机器学习代写|流形学习代写manifold data learning代考|History of Neural Network-Based Dimensionality

Neural networks are machine learning models modeled after the neural structure of the human brain. Neural networks are currently powerful tools for representation learning and dimensionality reduction. In the 1990s, researchers’ interest in neural networks decreased; this was called the winter of neural networks. This winter occurred mainly because networks could not become deep, as gradients vanished after many layers of network during optimization. The success of kernel support vector machines [10] also exacerbated this winter. In 2006, Hinton and Salakhutdinov demonstrated that a network’s weights can be initialized using energy-based training, where the layers of the network are considered stacks of Restricted Boltzmann Machines (RBM) [1, 31]. RBM is a two-layer structure of neurons, whose weights between the two layers are trained using maximum likelihood estimation [68]. This initialization saved the neural network from the vanishing gradient problem and ended the neural networks’ winter. A deep network using RBM training was named the deep belief network [29]. Later, however, the proposal of the ReLU activation function [23] and the dropout technique [59] made it possible to train deep neural networks with random initial weights [24].

In fundamental machine learning, people often extract features using traditional dimensionality reduction and then apply the classification, regression, or clustering task afterwards. However, modern deep learning extracts features and learns embedding spaces in the layers of the network; this process is called end-to-end. Therefore, deep learning can be seen as performing a form of dimensionality reduction as part of its model. One problem with end-to-end models is that they are harder to troubleshoot if the performance is not satisfactory on a part of the data. The insights and meaning of the data coming from representation learning are critical to fully understand a model’s performance. Some of these insights can be useful for improving or understanding how deep neural networks operate. Researchers often visualize the extracted features of a neural network to interpret and analyze why deep learning is working properly on their data.

Deep metric learning [35] utilizes deep neural networks for extracting low-dimensional descriptive features from data at the last or second-to-last layer of the network. Siamese networks [11] are important network structures for deep metric learning. They contain several identical networks that share their weights, but have different inputs. Contrastive loss [27] and triplet loss [56] are two well-known loss functions that were proposed for training Siamese networks. Deep reconstruction autoencoders also make it possible to capture informative features at the bottleneck between the encoder and decoder.
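
The two loss functions named above can be sketched on plain embedding vectors. This is a minimal illustration in a commonly used form, up to constant factors; the function names, the default `margin=1.0`, and the sample embeddings are assumptions made here, and a real Siamese network would compute the embeddings with shared-weight networks.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb1, emb2, same_class, margin=1.0):
    """Pull same-class pairs together; push other pairs at least `margin` apart."""
    d = euclidean(emb1, emb2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Ask the anchor to be closer to the positive than to the negative by `margin`."""
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)


anchor, positive = [0.0, 0.0], [0.1, 0.0]
far_negative, near_negative = [2.0, 0.0], [0.5, 0.0]

print(triplet_loss(anchor, positive, far_negative))   # negative already far enough
print(triplet_loss(anchor, positive, near_negative))  # negative too close, loss > 0
print(contrastive_loss(anchor, near_negative, same_class=False))
```

Both losses are zero once the embedding already separates the classes by the margin, so training effort concentrates on the pairs or triplets that are still confused.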

机器学习代写|流形学习代写manifold data learning代考|SCl7314

Posted on February 10, 2023 by statistics-lab

机器学习代写|流形学习代写manifold data learning代考|Dimensionality Reduction and Manifold Learning

Feature extraction is also referred to as dimensionality reduction, manifold learning [12], subspace learning, submanifold learning, manifold unfolding, embedding, encoding, and representation learning [7, 70]. This book uses manifold learning and dimensionality reduction interchangeably for feature extraction. Manifold learning techniques can be used in a variety of ways, including:

Data dimensionality reduction: Produce a compact (compressed) low-dimensional encoding of a given high-dimensional dataset.
Data visualization: Provide an interpretation of a given dataset in terms of intrinsic degrees of freedom, usually as a byproduct of data dimensionality reduction.
Preprocessing for supervised learning: Simplify, reduce, and clean the data for subsequent supervised training.

In dimensionality reduction, the data points are mapped to a lower-dimensional subspace either linearly or nonlinearly. Dimensionality reduction methods can be grouped into three categories: spectral dimensionality reduction, probabilistic dimensionality reduction, and (artificial) neural network-based dimensionality reduction [19] (see Fig. 1.3). These categories are introduced in the following subsections.

Consider several points in a three-dimensional Euclidean space. Assume these points lie on a nonlinear submanifold with a local dimensionality of two, as illustrated in Fig. 1.4a. This means that there is no need to have three features to represent each of these points. Rather, two features can represent most of the data points’ information if the two features give the 2D coordinates on the submanifold. A nonlinear dimensionality reduction method can unfold this manifold correctly, as depicted in Fig. 1.4b. However, a linear dimensionality reduction method cannot properly find a correct underlying 2D representation of the data points. Figure 1.4c demonstrates that a linear method ruins the relative structure of the data points. This is because a linear method uses the Euclidean distances between the points, while a nonlinear method considers the geodesic distances along the nonlinear submanifold. If the submanifold is linear, the linear method is able to obtain the lower-dimensional structure of the data. Spectral dimensionality reduction methods typically have a geometric perspective and attempt to find the linear or nonlinear submanifold of the data. These methods are often reduced to a generalized eigenvalue problem [20].
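
The gap between Euclidean and geodesic distance is easy to see on a small example. The sketch below uses a lower-dimensional analogue of the setting above, a one-dimensional curve (an arc of the unit circle) embedded in two dimensions; the choice of curve and the angles are assumptions made for illustration.

```python
import math


def point_on_arc(theta):
    """Point on the unit circle at angle theta (a 1D curve embedded in 2D)."""
    return (math.cos(theta), math.sin(theta))


def euclidean_distance(a, b):
    """Straight-line distance through the ambient space."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def geodesic_distance(theta_a, theta_b):
    """Arc length along the unit circle between two angles (no wrap-around)."""
    return abs(theta_a - theta_b)


theta_a, theta_b = 0.0, 0.9 * math.pi
chord = euclidean_distance(point_on_arc(theta_a), point_on_arc(theta_b))
arc = geodesic_distance(theta_a, theta_b)
print(chord, arc)
```

The two end points look close when measured straight across the ambient space, yet they are far apart when distance is measured along the curve. A method that relies on the straight-line distance alone will therefore place them too close together in the embedding.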

机器学习代写|流形学习代写manifold data learning代考|History of Spectral Dimensionality Reduction

Principal Component Analysis (PCA) [34] was first proposed by Pearson in 1901 [47]. It was the first spectral dimensionality reduction method and one of the first methods in linear subspace learning. It is unsupervised, meaning that it does not use any class labels. Fisher Discriminant Analysis (FDA) [16], proposed by Fisher in 1936 [15], was the first supervised spectral dimensionality reduction method. PCA and FDA are based on the scatter, i.e., variance of the data. A proper subspace preserves either the relative similarity or relative dissimilarity of the data points after transformation of data from the input space to the subspace. This was the goal of Multidimensional Scaling (MDS) [13], which preserves the relative similarities of data points in its subspace. In later MDS approaches, the cost function was changed to preserve the distances between points [37], which developed into Sammon mapping [52]. Sammon mapping is considered to be the first nonlinear dimensionality reduction method.

Figure 1.4 demonstrates that a linear algorithm cannot perform well on nonlinear data. For nonlinear data, two approaches can be used:

a nonlinear algorithm should be designed to handle nonlinear data, or
the nonlinear data should be modified to become linear. In this case, the data should be transformed to another space to become linearly separable in that space. Then, the transformed data, which now have a linear pattern, will be able to use a linear approach. This approach is called kernelization in machine learning.

The kernel PCA [54, 55] uses PCA and the kernel trick [32] to transform data to a high-dimensional space so that it becomes roughly linear within that space. Kernel FDA [44, 45] was also proposed to manipulate nonlinear data in a supervised manner using representation theory [3]. Representation theory can be used for kernelization; it will be introduced in Chap. 3.
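
The kernel trick itself can be verified on a tiny case. The following sketch assumes a homogeneous polynomial kernel of degree 2 on two-dimensional inputs, chosen here only because its explicit feature map is small enough to write down; the sample points are hypothetical.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2D point (maps into 3D)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product computed in the 3D feature space
implicit = poly_kernel(x, y)    # same quantity computed in the original 2D space
print(explicit, implicit)
```

The two values agree up to floating-point rounding, so an algorithm that only needs inner products, such as PCA, can work in the higher-dimensional space without ever constructing `phi(x)`.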

机器学习代写|流形学习代写manifold data learning代考|EECS559

Posted on February 10, 2023 by statistics-lab

机器学习代写|流形学习代写manifold data learning代考|Manifold Hypothesis

The features of a data point do not all carry an equal amount of information. For example, some pixels of an image are background regions with limited information, while other pixels contain important objects that describe the scene in the image. This means that data points can be significantly compressed to preserve the most informative features while eliminating those with limited information. In other words, the $d$-dimensional data points of a dataset usually do not cover the entire $d$-dimensional Euclidean space, but they lie on a specific lower-dimensional structure in the space.

Consider the illustration in Fig. 1.1, where several three-dimensional points exist in $\mathbb{R}^3$. These points can represent any measurement, such as personal health measurements, including blood pressure, blood sugar, and blood fat. As demonstrated in Fig. 1.1, the points of the dataset have a structure in a two-dimensional space. The three-dimensional Euclidean space is called the input space, and the two-dimensional space, which has a lower dimensionality than the input space, is called the subspace, the submanifold, or the embedding space. The subspace can be either linear or nonlinear, depending on whether a linear (hyper)plane passes through the points. Usually, subspace and submanifold are used for linear and nonlinear lower-dimensional spaces, respectively. Linear and nonlinear subspaces are depicted in Fig. 1.1a and b, respectively.

Whether the points of a dataset lie on such a lower-dimensional structure is a hypothesis, but this hypothesis is usually true because the data points typically represent a natural signal, such as an image. When the data acquisition process is natural, the data will have a defined structure. For example, in a dataset of multiple images depicting the same scene from different angles, the objects of the scene remain the same, but the point of view changes (see Fig. 1.2). This hypothesis is called the manifold hypothesis [14]. Its formal definition is as follows. According to the manifold hypothesis, data points of a dataset lie on a submanifold or subspace with lower dimensionality. In other words, the dataset in $\mathbb{R}^d$ lies on an embedded submanifold [38] with local dimensionality less than $d$ [14]. According to this hypothesis, the data points lie on a submanifold with high probability [64].
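For the linear case, the hypothesis can be checked numerically. The sketch below is an illustration under stated assumptions rather than anything from the book: the data are synthetic points in $\mathbb{R}^3$ built to lie exactly on a plane, and the tolerance 1e-8 is an arbitrary choice. A nonlinear submanifold would not show up this way and would need a local analysis instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: 500 points in R^3 that lie on a 2-D plane.
basis = rng.normal(size=(2, 3))       # two directions spanning the plane
coords = rng.normal(size=(500, 2))    # intrinsic 2-D coordinates of each point
X = coords @ basis                    # observed 3-D points, shape (500, 3)

# The singular values of the centred data show how many directions carry variance.
singular_values = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)

# Count singular values that are not numerically zero relative to the largest.
intrinsic_dim = int(np.sum(singular_values > 1e-8 * singular_values[0]))
print(singular_values, intrinsic_dim)
```

Although every point has three coordinates, only two singular values are meaningfully above zero, so two features are enough to describe this dataset.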

机器学习代写|流形学习代写manifold data learning代考|Feature Engineering

Due to the manifold hypothesis, a dataset can be compressed while preserving most of the important information. Therefore, engineering and processing can be applied to the features for the sake of compression [4]. Feature engineering can be seen as a preprocessing stage, where the dimensionality of the data is reduced. Assume $d$ and $p$ denote the dimensionality of the input space and the subspace, respectively, where $p \in (0, d]$. Feature engineering is a map from a $d$-dimensional Euclidean space to a $p$-dimensional Euclidean space, i.e., $\mathbb{R}^d \rightarrow \mathbb{R}^p$. The dimensionality of the subspace is usually much smaller than the dimensionality of the input space, i.e., $p \ll d$, because most of the information usually exists in only a few features.

Feature engineering is divided into two broad approaches: feature selection and feature extraction [22]. In feature selection, the $p$ most informative features of the $d$-dimensional data vector are selected, so the features of the transformed data points are a subset of the original features. In feature extraction, however, the $d$-dimensional data vector is transformed to a $p$-dimensional data vector, where the $p$ new features are completely different from the original features. In other words, data points are represented in another lower-dimensional space. Both feature selection and feature extraction are used for compression, which results in either better discrimination of classes or better representation of data. In other words, the data compressed by feature engineering may represent the data better or may separate the classes of data. This book concentrates on feature extraction.
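The contrast between the two approaches can be sketched as follows. Two assumptions are made that the text does not state: "most informative" is measured by per-feature variance for selection, and PCA (computed via the SVD) stands in for feature extraction. The data are synthetic, with $d = 5$ and $p = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data with d = 5 features of very different scales.
X = rng.normal(size=(200, 5)) * np.array([5.0, 0.1, 3.0, 0.2, 0.1])
p = 2
Xc = X - X.mean(axis=0)

# Feature selection: keep the p original columns with the largest variance,
# so the result is literally a subset of the original features.
selected = np.sort(np.argsort(Xc.var(axis=0))[-p:])
X_selected = X[:, selected]

# Feature extraction: project onto the top-p principal directions, so each
# new feature is a linear combination of all d original features.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:p].T

print(selected, X_selected.shape, X_extracted.shape)
```

Both results live in $\mathbb{R}^p$, but only the selected features can be read directly as original measurements; the extracted ones are coordinates in a new space.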
