Category: Machine Learning Exam Help

Computer Science Assignment Help | Machine Learning Exam Help | COMP3308

Posted on 23 August 2023 (updated 10 October 2023) by statistics-lab

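If you run into difficulties with machine learning coursework, you can contact our 24/7 customer service at any time. Machine learning is exciting: it is fun, challenging, creative, and intellectually stimulating. It also makes money for companies, handles large workloads autonomously, and takes monotonous work away from people who would rather be doing something else.

Machine learning is also very complex. With thousands of algorithms, hundreds of open-source packages, and practitioners who need skills ranging from data engineering (DE) to advanced statistical analysis and visualization, the work required of an ML practitioner is genuinely daunting. Adding to this complexity is the need to work cross-functionally with a wide range of specialists, subject-matter experts (SMEs), and business-unit groups, communicating and collaborating on the nature of the problem being solved and on the output of the ML-backed solution.

statistics-lab™ supports you throughout your studies abroad. It has built a reputation in machine learning assignment help and promises reliable, high-quality, original statistics writing services. Our experts are highly experienced with machine learning assignments of all kinds.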

Computer Science Assignment Help | Machine Learning Exam Help | Bias and Variance Analysis

Bias and variance analysis plays an important role in ensemble classification research due to the framework it provides for classifier prediction error decomposition.

Geman et al. first decomposed prediction error into bias and variance terms in the regression setting under squared-error loss [10]. This decomposition shows that a decrease or increase in the prediction error rate is caused by a decrease or increase in bias, in variance, or in both. The analysis was later extended to the classification setting and applied to different ensemble classifiers in order to explain their success over single classifiers. Most ensembles have been shown to achieve lower prediction error rates because they reduce both bias and variance.

However, different researchers have extended the original regression analysis to classification in different ways, and no standard definition has been accepted. The results of the analyses therefore also differ slightly from one another. Definitions and frameworks that have attracted interest in the field include those of Breiman [3], Kohavi and Wolpert [15], Dietterich and Kong [16], Friedman [9], Wolpert [25], Heskes [11], Tibshirani [19], Domingos [6] and James [13].

Although the frameworks differ, the main intuitions behind them are similar. Consider a training set $T$ with patterns $\left(x_i, l_i\right)$, where $x$ represents the feature vector and $l$ the corresponding label. Given a test pattern, an optimal classifier model predicts a decision label by ensuring the lowest expected loss over all possible target label values. This classifier, which is the Bayes classifier when used with the zero-one loss function, is assumed to know and use the underlying probability distribution of the input patterns and classes. If we call the decision of the optimal classifier the optimal decision $(O D)$, then for a given test pattern $\left(x_i, l_i\right)$, $O D=\operatorname{argmin}_\alpha E_t[L(t, \alpha)]$, where $L$ denotes the loss function used and $t$ the possible target label values.

The estimator, on the other hand, is an averaged classifier model. It predicts a decision label by ensuring the lowest expected loss over all labels produced by classifiers trained on different training sets. The intrinsic parameters of these classifiers are usually the same, and the only difference is the training set each one is trained on. In this case, instead of minimizing the loss over the target labels using the known underlying probability distribution, as in the optimal classifier case, the loss is minimized over the set of labels produced by the classifiers trained on the various training sets. If the decision of the estimator is called the expected estimator decision $(E E D)$, then for a given test pattern $\left(x_i, l_i\right)$, $E E D=\operatorname{argmin}_\alpha E_l[L(l, \alpha)]$, where $L$ denotes the loss function used and $l$ the label values obtained from the classifiers used. For regression under squared-error loss, the $O D$ is the mean of the target labels, while the $E E D$ is the mean of the classifier decisions obtained via different training sets.
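The two argmin definitions can be illustrated with a minimal Python sketch. It assumes the expectation is approximated by a finite sample of labels (for instance, the predictions of classifiers trained on different training sets); the function names and the sample values are hypothetical and do not come from the source.

```python
from collections import Counter
from statistics import mean

def zero_one_decision(labels):
    """Label minimizing the average zero-one loss over `labels`: the most frequent one."""
    counts = Counter(labels)
    # Break ties on the smallest label so the result is deterministic (an assumption).
    return min(counts, key=lambda lab: (-counts[lab], lab))

def squared_error_decision(values):
    """Value minimizing the average squared-error loss over `values`: their mean."""
    return mean(values)

# Hypothetical labels predicted for one test pattern by classifiers
# trained on five different training sets.
predictions = [2, 1, 2, 3, 2]
eed = zero_one_decision(predictions)
```

Applying the same functions to samples of the true target labels instead of classifier outputs gives the corresponding approximation of $O D$.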

Computer Science Assignment Help | Machine Learning Exam Help | Bias and Variance Analysis of James

James [13] extends the prediction error decomposition, which was initially proposed by Geman et al. [10] for squared error in the regression setting, to all symmetric loss functions. His definition therefore also covers zero-one loss in the classification setting, which we use in the experiments.

In his decomposition, the terms systematic effect (SE) and variance effect (VE) satisfy the additive decomposition for all symmetric loss functions, and for both real-valued and categorical predictors. They indicate the effect of bias and variance on the prediction error. For example, a negative $V E$ would mean that variance actually helps reduce the prediction error. The bias term, on the other hand, is defined to show the average distance between the response and the predictor, and the variance term refers to the variability of the predictor. As a result, both the meanings and the additive characteristics of the bias and variance concepts of the original setup are preserved. The following is a summary of the bias-variance derivations of James:

For any symmetric loss function $L$, where $L(a, b)=L(b, a)$ :
$$
\begin{aligned}
E_{Y, \tilde{Y}}[L(Y, \tilde{Y})]= & E_Y[L(Y, S Y)]+E_Y[L(Y, S \tilde{Y})-L(Y, S Y)] \\
& +E_{Y, \tilde{Y}}[L(Y, \tilde{Y})-L(Y, S \tilde{Y})] \\
\text{prediction error}= & \operatorname{Var}(Y)+S E(\tilde{Y}, Y)+\operatorname{VE}(\tilde{Y}, Y)
\end{aligned}
$$
where $L(a, b)$ is the loss when $b$ is used in predicting $a$, $Y$ is the response, and $\tilde{Y}$ is the predictor. $S Y=\operatorname{argmin}_\mu E_Y[L(Y, \mu)]$ and $S \tilde{Y}=\operatorname{argmin}_\mu E_{\tilde{Y}}[L(\tilde{Y}, \mu)]$. We see here that prediction error is composed of the variance of the response (irreducible noise), $S E$, and $V E$.

Using the same terminology, the bias and variance for the predictor are defined as follows:
$$
\begin{aligned}
\operatorname{Bias}(\tilde{Y}) & =L(S Y, S \tilde{Y}) \\
\operatorname{Var}(\tilde{Y}) & =E_{\tilde{Y}}[L(\tilde{Y}, S \tilde{Y})]
\end{aligned}
$$
When the specific case of classification problems with the zero-one loss function is considered, we end up with the following formulations: $L(a, b)=I(a \neq b)$, $Y \in\{1,2,3, \ldots, N\}$ for an $N$-class problem, $P_i^Y=P_Y(Y=i)$, $P_i^{\tilde{Y}}=P_{\tilde{Y}}(\tilde{Y}=i)$, and $S Y=\operatorname{argmin}_i E_Y[I(Y \neq i)]=\operatorname{argmax}_i P_i^Y$. Therefore,
$$
\begin{aligned}
\operatorname{Var}(Y) & =P_Y(Y \neq S Y)=1-\max _i P_i^Y \\
\operatorname{Var}(\tilde{Y}) & =P_{\tilde{Y}}(\tilde{Y} \neq S \tilde{Y})=1-\max _i P_i^{\tilde{Y}} \\
\operatorname{Bias}(\tilde{Y}) & =I(S \tilde{Y} \neq S Y) \\
\operatorname{VE}(\tilde{Y}, Y) & =P(Y \neq \tilde{Y})-P_Y(Y \neq S \tilde{Y})=P_{S \tilde{Y}}^Y-\sum_i P_i^Y P_i^{\tilde{Y}} \\
S E(\tilde{Y}, Y) & =P_Y(Y \neq S \tilde{Y})-P_Y(Y \neq S Y)=P_{S Y}^Y-P_{S \tilde{Y}}^Y
\end{aligned}
$$
where $I(q)$ is 1 if $q$ is true and 0 otherwise.
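As a numerical check of the zero-one formulas, the following Python sketch computes each term from two class-probability vectors and confirms that they add up to the prediction error. The function name and the example probabilities are hypothetical, and the sketch assumes $Y$ and $\tilde{Y}$ are independent at the test point, which is what the closed form $P(Y \neq \tilde{Y})=1-\sum_i P_i^Y P_i^{\tilde{Y}}$ implies.

```python
def james_zero_one(p_y, p_pred):
    """Zero-one-loss terms of the decomposition for one test point.

    p_y[i]    : probability that the response Y equals class i
    p_pred[i] : probability that the predictor equals class i
    """
    n = len(p_y)
    sy = max(range(n), key=lambda i: p_y[i])         # S Y: most probable response
    s_pred = max(range(n), key=lambda i: p_pred[i])  # S Y-tilde: most probable prediction
    agree = sum(a * b for a, b in zip(p_y, p_pred))  # P(Y = Y-tilde) under independence
    return {
        "var_y": 1 - p_y[sy],
        "var_pred": 1 - p_pred[s_pred],
        "bias": int(sy != s_pred),
        "se": p_y[sy] - p_y[s_pred],
        "ve": p_y[s_pred] - agree,
        "error": 1 - agree,
    }

# Hypothetical three-class example in which the predictor usually picks the wrong class.
terms = james_zero_one([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
```

In this example the variance effect is negative, so the variability of the predictor lowers the error relative to always predicting $S \tilde{Y}$.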

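For computer science and machine learning assignment and exam help, look for statistics-lab™

For statistics assignment help, look for statistics-lab™. statistics-lab™ supports you throughout your studies abroad.

Financial Engineering Assignment Help

Financial engineering is the use of mathematical techniques to solve financial problems. It draws on tools and knowledge from computer science, statistics, economics, and applied mathematics to address current financial problems and to design new and innovative financial products.

Nonparametric Statistics Assignment Help

Nonparametric statistics refers to statistical methods that do not assume the data come from a prescribed model determined by a small number of parameters; examples of such models include the normal distribution model and the linear regression model.

Generalized Linear Model Exam Help

The generalized linear model (GLM) belongs to statistics and is a flexible generalization of linear regression. It allows the error distribution of the response variable to be something other than the normal distribution.

The term general linear model (also abbreviated GLM) usually refers to the conventional linear regression model for a continuous response variable given continuous and/or categorical predictors. It includes multiple linear regression as well as ANOVA and ANCOVA (with fixed effects only).

Finite Element Method Assignment Help

The finite element method (FEM) is a popular method for numerically solving differential equations that arise in engineering and mathematical modeling. Typical problem areas include the traditional fields of structural analysis, heat transfer, fluid flow, mass transport, and electromagnetic potential.

FEM is a general numerical method for solving partial differential equations in two or three space variables (that is, some boundary value problems). To solve a problem, FEM subdivides a large system into smaller, simpler parts called finite elements. This is achieved by a particular discretization in the space dimensions, implemented by constructing a mesh of the object: the numerical domain for the solution, which has a finite number of points. The finite element formulation of a boundary value problem ultimately results in a system of algebraic equations. The method approximates the unknown function over the domain. The simple equations that model these finite elements are then assembled into a larger system of equations that models the entire problem. FEM then approximates a solution by minimizing an associated error function via the calculus of variations.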

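statistics-lab is a professional service for international students. For many years it has provided academic services to students in popular study destinations such as the United States, the United Kingdom, Canada, and Australia, including but not limited to essay, assignment, dissertation, report, group project, proposal, paper, and presentation writing, programming assignments, paper revision and polishing, online course support, and exam help. The writing scope covers every stage of overseas study, from high school through undergraduate to postgraduate, and spans finance, economics, accounting, auditing, management, and 99% of subjects worldwide. The writing team includes professional native English writers as well as master's and doctoral students from well-known overseas universities; every writer has strong language skills, a relevant subject background, and academic writing experience. We promise 100% original, 100% professional, 100% on time, and 100% satisfaction.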

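Stochastic Analysis Assignment Help

Stochastic calculus is a branch of mathematics that operates on stochastic processes. It allows a consistent theory of integration to be defined for integrals of stochastic processes with respect to stochastic processes. The field was created and started by the Japanese mathematician Kiyosi Itô during World War II.

Time Series Analysis Assignment Help

A stochastic process is a collection of random variables indexed by a parameter, usually time. A random variable is the quantitative expression of a random phenomenon, and a time series is a sequence of data points ordered by the time at which they occur. The time interval of a time series is usually constant (for example 1 second, 5 minutes, 12 hours, 7 days, or 1 year), so a time series can be analyzed as discrete-time data. Time series data matter because in practice one often needs to study how something changes over time, which requires studying the historical record of its past development in order to find the regularities in that development.

Regression Analysis Assignment Help

Multiple Regression Analysis Asymptotics belongs to econometrics. It is mainly a statistical analysis method that can analyze the mathematical relationships among influencing factors in complex situations, and it is widely used in the natural sciences, the social sciences, and economics.

MATLAB Assignment Help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building.

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in software for matrix computation. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.

MATLAB features a family of application-specific solutions called toolboxes. Very important to most MATLAB users, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.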

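R help	Questionnaire design and analysis help
Python help	Regression analysis and linear models help
MATLAB help	Analysis of variance and experimental design help
Stata help	Machine learning / statistical learning help
SPSS help	Econometrics help
EViews help	Time series analysis help
Excel help	Deep learning help
SQL help	Data modeling and visualization help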

Computer Science Assignment Help | Machine Learning Exam Help | QBUS6850

Posted on 23 August 2023 by statistics-lab

Computer Science Assignment Help | Machine Learning Exam Help | Error Correcting Output Coding (ECOC)

ECOC is an ensemble technique [4] in which multiple base classifiers are created and trained according to the information obtained from a pre-set binary code matrix. The main idea behind this procedure is to solve the original multi-class problem by combining the decision boundaries obtained from simpler two-class decompositions. The original problem is likely to be more complex than the sub-problems into which it is decomposed, so the aim is to reach an easier and/or more accurate solution using the sub-problems rather than trying to solve the original problem with a single complex classifier.

The base classifiers are two-class classifiers (dichotomizers), each of which is trained to solve a different bi-partitioning of the original problem. The bi-partitions are created by combining the patterns from some predetermined classes and relabeling them. An example bi-partitioning of a dataset with $N>2$ classes would label the patterns from the first 2 classes as $+1$ and those from the last $N-2$ classes as $-1$. The training patterns are therefore separated into two superclasses for each base classifier, and the information about how to create these superclasses is obtained from the ECOC matrix.

Consider an ECOC matrix $C$, where a particular element $C_{i j}$ is an element of the set $\{+1,-1\}$. Each $C_{i j}$ indicates the desired label for class $i$, to be used in training the base classifier $j$; and each row, called a codeword, represents the desired output of the whole set of base classifiers for the class it indicates. Figure 4.1 shows an ECOC matrix for a 4-class problem for illustration purposes.
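How a column of the matrix turns the multi-class labels into a two-class training problem can be sketched in Python as follows. The matrix below is a hypothetical 4-class example chosen for illustration and is not the matrix of Figure 4.1; the `decode` helper assumes nearest-codeword (Hamming distance) decoding, a common convention that the text above does not specify.

```python
# Hypothetical code matrix: rows are codewords (one per class),
# columns are base classifiers. Not the matrix of Figure 4.1.
CODE_MATRIX = [
    [+1, +1, +1],  # class 0
    [+1, -1, -1],  # class 1
    [-1, +1, -1],  # class 2
    [-1, -1, +1],  # class 3
]

def relabel(labels, j, code=CODE_MATRIX):
    """Superclass labels (+1/-1) used to train base classifier j."""
    return [code[c][j] for c in labels]

def decode(outputs, code=CODE_MATRIX):
    """Class whose codeword is closest in Hamming distance to the base outputs."""
    def hamming(codeword):
        return sum(o != c for o, c in zip(outputs, codeword))
    return min(range(len(code)), key=lambda i: hamming(code[i]))

# Column 0 groups classes 0 and 1 as +1 and classes 2 and 3 as -1.
binary_targets = relabel([0, 1, 2, 3, 2], j=0)
```

Column 0 reproduces the bi-partitioning example given earlier: the first two classes form one superclass and the remaining $N-2$ classes the other.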

Computer Science Assignment Help | Machine Learning Exam Help | COMP4702

Posted on 23 August 2023 by statistics-lab

Computer Science Assignment Help | Machine Learning Exam Help | UCI Data Experiments

The purpose of the experiments in this section is to compare the classification accuracy of MBDS, VMBDSs, eECOC, and OA ensembles on 15 UCI datasets [2]. Three types of classifiers were employed as binary base classifiers: the Ripper rule classifier [3], logistic regression [10], and Support Vector Machines [8]. The number of MBDS classifiers in the VMBDSs ensembles varied from 1 to 15. The evaluation method was 10-fold cross-validation averaged over 10 runs. The results are given in Table 3.4, Table 3.5, and Table 3.6 in the Appendix. The classification accuracy of the classifiers was compared using the corrected paired t-test [15] at the 5% significance level. Two types of t-test comparisons were carried out: eECOC ensembles against all other ensembles, and OA ensembles against all other ensembles.

The results in Tables 3.4-3.6, covering 45 experiments (15 datasets with 3 base classifiers each), show that:

The difference in classification accuracy between the MBDS and eECOC ensembles is not statistically significant in 28 out of 45 experiments. In the remaining 17 experiments the classification accuracy of the MBDS ensembles is statistically lower than that of the eECOC ensembles.

The difference in classification accuracy between the MBDS and OA ensembles is not statistically significant in 34 out of 45 experiments. In the remaining 11 experiments the classification accuracy of the MBDS ensembles is statistically lower than that of the OA ensembles.

The classification accuracy of the VMBDSs ensembles varies between the accuracy of the MBDS ensembles and the accuracy of the eECOC ensembles. The difference in classification accuracy between the worst VMBDSs ensembles and the eECOC ensembles is not statistically significant in 28 out of 45 experiments. In the remaining 17 experiments the classification accuracy of the worst VMBDSs ensembles is statistically lower than that of the eECOC ensembles. The difference in classification accuracy between the best VMBDSs ensembles and the eECOC ensembles is not statistically significant in 44 out of 45 experiments. In the remaining experiment the classification accuracy of the best VMBDSs ensembles is statistically greater than that of the eECOC ensembles.

The difference in classification accuracy between the worst VMBDSs ensembles and the OA ensembles is not statistically significant in 34 out of 45 experiments. In the remaining 11 experiments the classification accuracy of the worst VMBDSs ensembles is statistically lower than that of the OA ensembles. The difference in classification accuracy between the best VMBDSs ensembles and the OA ensembles is not statistically significant in 38 out of 45 experiments. In the remaining 6 (1) experiments the classification accuracy of the best VMBDSs ensembles is statistically greater (lower) than that of the OA ensembles. In addition, we compare the VMBDSs and OA ensembles when they have an approximately equal number of binary classifiers. In this case we compare the VMBDSs ensembles using two MBDS classifiers with the OA ensembles. The difference in classification accuracy between the VMBDSs and OA ensembles is not statistically significant in 41 out of 45 experiments. In the remaining 2 (2) experiments the classification accuracy of the VMBDSs ensembles is statistically greater (lower) than that of the OA ensembles.

Computer Science Assignment Help | Machine Learning Exam Help | Experiments on Data Sets with Large Number of Classes

The purpose of this section’s experiments is to compare the classification accuracy of the VMBDSs and OA on three datasets with a large number of classes. The datasets chosen are Abalone [2], Patents [12], and Faces94 [11]. Several properties of these datasets are summarized in Table 3.1.

The eECOC ensembles were excluded from the experiments, since they require an exponential number of binary classifiers (in our experiments at least $2^{27}-1$). Support Vector Machines [8] were used as the base classifier. The number of MBDS classifiers in the VMBDSs ensembles was varied from 5 to 25. The evaluation method was 5-fold cross-validation averaged over 5 runs. The results are presented in Table 3.2. The classification accuracy of the classifiers is compared using the corrected paired t-test [15] at the 5% significance level. The test compares the OA ensembles against all VMBDSs ensembles.

The experimental results in Table 3.2 show that the VMBDSs ensembles can statistically outperform the OA ensembles on these three datasets. In this respect it is important to know whether the VMBDSs ensembles outperform the OA ensembles when both types of ensembles contain the same number of binary classifiers, i.e., when their computational complexities are equal. We show how to organize this experiment for the Abalone dataset. This dataset has 28 classes, so the number of binary classifiers in the OA ensemble is 28. This implies that we have to find a configuration for the VMBDSs ensembles in which the total number of binary classifiers is close to 28. In this context we note that the number of binary classifiers in each MBDS ensemble is $\left\lceil\log _2(28)\right\rceil=5$. Thus, in order to have close to 28 binary classifiers we need $\left\lfloor\frac{28}{5}\right\rfloor=5$ MBDS classifiers. According to Table 3.2, for this configuration the VMBDSs ensemble statistically outperforms the OA ensemble. The same computation can be done for the Patents and Faces94 datasets: for the Patents dataset we need 10 MBDS classifiers, and for the Faces94 dataset we need 19 MBDS classifiers in the VMBDSs ensemble. According to Table 3.2, for these configurations the VMBDSs ensembles statistically outperform the OA ensemble.
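The classifier-count arithmetic for the Abalone case can be reproduced with a short Python sketch. The function names are hypothetical; the formulas are the ones stated in the text ($\lceil\log_2 N\rceil$ binary classifiers per MBDS ensemble, $N$ for OA, and $2^{N-1}-1$ for eECOC), and only the 28-class figure comes from the source.

```python
def mbds_size(n_classes):
    """Binary classifiers in one MBDS ensemble: ceil(log2(n_classes))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids float rounding.
    return (n_classes - 1).bit_length()

def matched_mbds_count(n_classes):
    """MBDS classifiers needed so a VMBDSs ensemble has about n_classes binary classifiers."""
    return n_classes // mbds_size(n_classes)

def eecoc_size(n_classes):
    """Binary classifiers in an exhaustive ECOC ensemble: 2^(n_classes - 1) - 1."""
    return 2 ** (n_classes - 1) - 1

abalone_classes = 28
per_mbds = mbds_size(abalone_classes)           # 5
n_mbds = matched_mbds_count(abalone_classes)    # 5, giving 25 binary classifiers in total
```

With 28 classes, `eecoc_size` returns $2^{27}-1$, which is why the exhaustive design is left out of these experiments.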

Computer Science Assignment Help | Machine Learning Exam Help | STAT3888

Posted on 16 August 2023 (updated 28 August 2023) by statistics-lab

Computer Science Assignment Help | Machine Learning Exam Help | UCI Categorization

The classification results obtained for all the UCI data sets with the different ECOC configurations are shown in Table 2.2. In order to compare the performance of each strategy, the table also shows the mean rank of each ECOC design over the twelve experiments. The rankings are obtained by estimating each particular ranking $r_i^j$ for each problem $i$ and each ECOC configuration $j$, and computing the mean ranking $R$ for each design as $R_j=\frac{1}{N} \sum_i r_i^j$, where $N$ is the total number of data sets. We also show the mean number of classifiers (#) required by each strategy.
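The mean-rank computation $R_j=\frac{1}{N} \sum_i r_i^j$ can be sketched in Python as below. The function names and the accuracy values are hypothetical, and giving tied methods the average of the ranks they span is an assumption, since the text does not say how ties are handled.

```python
def rank_row(scores):
    """Rank the methods on one data set: the best accuracy gets rank 1.

    Tied methods share the average of the ranks they occupy (an assumption).
    """
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next method has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    return ranks

def mean_ranks(table):
    """Mean rank R_j of each method; table[i][j] is the accuracy of method j on data set i."""
    rows = [rank_row(row) for row in table]
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(table[0]))]

# Hypothetical accuracies of four ECOC designs on one data set.
example_ranks = rank_row([0.9, 0.8, 0.9, 0.7])
```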

In order to analyze whether the difference between ranks (and hence between the methods) is statistically significant, we apply a statistical test. We use the Friedman test to decide whether the null hypothesis, which states that there is no statistically significant difference among the measured ranks, can be rejected. The Friedman statistic is computed as follows:
$$
X_F^2=\frac{12 N}{k(k+1)}\left[\sum_j R_j^2-\frac{k(k+1)^2}{4}\right] .
$$
In our case, with $k=4$ ECOC designs to compare, $X_F^2=-4.94$. Since this statistic is rather conservative, Iman and Davenport proposed a corrected statistic:
$$
F_F=\frac{(N-1) X_F^2}{N(k-1)-X_F^2}
$$

Applying this correction we obtain $F_F=-1.32$. With four methods and twelve experiments, $F_F$ is distributed according to the $F$ distribution with 3 and 33 degrees of freedom. The critical value of $F(3,33)$ at the 0.05 level is 2.89. As the value of $F_F$ is no higher than 2.89, we can state that there is no statistically significant difference among the ECOC schemes. This means that all four strategies are suitable for dealing with multi-class categorization problems. This result is very satisfactory and encourages the use of the compact approach, since similar (or even better) results can be obtained with far fewer classifiers. Moreover, the GA evolutionary version of the compact design improves on the mean rank of the classical coding strategies, and in most cases outperforms the binary compact approach in the present experiment. This result is expected, since the evolutionary version looks for a compact ECOC matrix configuration that minimizes the error over the training data. In particular, the advantage of the evolutionary version over the binary one is more significant when the number of classes increases, since more compact matrices are available for optimization.
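Both statistics are simple enough to evaluate directly. The Python sketch below implements the two formulas exactly as written above; the function names are hypothetical. Plugging in the reported $X_F^2=-4.94$ with $N=12$ and $k=4$ reproduces the reported $F_F \approx-1.32$, and the degrees of freedom follow as $k-1=3$ and $(k-1)(N-1)=33$.

```python
def friedman_statistic(mean_ranks, n_datasets):
    """Friedman statistic computed from the mean ranks R_j of k methods over N data sets."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def iman_davenport(chi2_f, n_datasets, k):
    """Iman-Davenport correction of the Friedman statistic."""
    return (n_datasets - 1) * chi2_f / (n_datasets * (k - 1) - chi2_f)

def f_degrees_of_freedom(n_datasets, k):
    """Degrees of freedom of the F distribution that F_F is compared against."""
    return k - 1, (k - 1) * (n_datasets - 1)

# Values reported in the text: N = 12 experiments, k = 4 ECOC designs.
f_f = iman_davenport(-4.94, n_datasets=12, k=4)
dof = f_degrees_of_freedom(n_datasets=12, k=4)
```

As a sanity check on `friedman_statistic`, identical mean ranks give 0, and perfectly consistent rankings (mean ranks 1, 2, 3, 4 with $N=12$) give the maximum $N(k-1)=36$.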

Computer Science Assignment Help | Machine Learning Exam Help | Labelled Faces in the Wild Categorization

This dataset contains 13,000 face images of over 1,400 people, taken directly from the web. These images are not constrained in terms of pose, light, occlusions, or any other relevant factor. For the purpose of this experiment we used a specific subset, taking only the categories that have at least four examples, for a total of 610 face categories. Finally, in order to extract relevant features from the images, we apply an Incremental Principal Component Analysis procedure [16], keeping $99.8 \%$ of the information. An example of face images is shown in Fig. 2.4.

The results in the first row of Table 2.3 show that the best performance is obtained by the Evolutionary GA and PBIL compact strategies. One important observation is that the evolutionary strategies outperform the classical one-versus-all approach with far fewer classifiers (10 instead of 610). Note that in this case we omitted the one-versus-one strategy, since it requires 185,745 classifiers to discriminate 610 face categories.

For this second computer vision experiment, we use the video sequences obtained from the Mobile Mapping System of [1] to test the ECOC methodology on a real traffic sign categorization problem. In this system, the position and orientation of the different traffic signs are measured with video cameras fixed on a moving vehicle. The system has a stereo pair of calibrated cameras, which are synchronized with a GPS/INS system. The result of the acquisition step is a set of stereo pairs of images with their position and orientation information. From this system, a set of 36 circular and triangular traffic sign classes is obtained. Some categories from this data set are shown in Fig. 2.5. The data set contains a total of 3481 samples of size $32 \times 32$, filtered using the Weickert anisotropic filter, masked to exclude the background pixels, and equalized to prevent the effects of illumination changes. The resulting feature vectors are then projected onto 100 dimensions by means of PCA.

The classification results obtained with the different ECOC configurations are shown in the second row of Table 2.3. The ECOC designs obtain similar classification results, with an accuracy of over $90 \%$. However, note that the compact methodologies use six times fewer classifiers than the one-versus-all approach and 105 times fewer classifiers than the one-versus-one approach.
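The classifier counts quoted for the two vision experiments follow from the standard sizes of each design, as the Python sketch below shows. The function names are hypothetical; the formulas ($N$ for one-versus-all, $N(N-1)/2$ for one-versus-one, and $\lceil\log_2 N\rceil$ for a compact design) are the usual ones and agree with every figure given in the text.

```python
def one_versus_all(n_classes):
    """One binary classifier per class."""
    return n_classes

def one_versus_one(n_classes):
    """One binary classifier per unordered pair of classes."""
    return n_classes * (n_classes - 1) // 2

def compact(n_classes):
    """Minimum code length that gives every class a distinct codeword: ceil(log2(N))."""
    return (n_classes - 1).bit_length()

faces, signs = 610, 36
faces_counts = (compact(faces), one_versus_all(faces), one_versus_one(faces))
signs_ratios = (one_versus_all(signs) // compact(signs),
                one_versus_one(signs) // compact(signs))
```

For the 36 traffic sign classes the compact design needs 6 classifiers against 36 and 630, which gives the factors of 6 and 105 mentioned above.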

计算机代写|机器学习代写machine learning代考|QBUS3820

Posted on August 16, 2023 (updated August 28, 2023) by statistics-lab

计算机代写|机器学习代写machine learning代考|Evolutionary Compact Parametrization

When defining a compact design of an ECOC, the possible loss of generalization performance has to be taken into account. In order to deal with this problem an evolutionary optimization process is used to find a compact ECOC with high generalization capability.

In order to show the parametrization complexity of the compact ECOC design, we first provide an estimation of the number of different possible ECOC matrices that we can build, and therefore, the search space cardinality. We approximate this number using some simple combinatorial principles. First of all, if we have an $N$-class problem and $B$ possible bits to represent all the classes, we have a set $CW$ with $2^B$ different words. In order to build an ECOC matrix, we select $N$ codewords from $CW$ without replacement. In combinatorics this is represented as $\binom{2^B}{N}$, which means that we can construct $V_{2^B}^N=\frac{\left(2^B\right)!}{\left(2^B-N\right)!}$ different ECOC matrices. Nevertheless, in the ECOC framework, one matrix and its opposite (swapping all zeros for ones and vice versa) are considered the same matrix, since both represent the same partitions of the data. Therefore, the approximate number of possible ECOC matrices with the minimum number of classifiers is $\frac{V_{2^B}^N}{2}=\frac{\left(2^B\right)!}{2\left(2^B-N\right)!}$. In addition to the huge cardinality, it is easy to show that this space is neither continuous nor differentiable, because a change in just one bit of the matrix may produce a wrong coding design.
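To get a feel for how quickly this search space grows, the count can be evaluated directly. The sketch below is only illustrative: the function name is hypothetical, and it assumes that the compact design uses the minimum code length $B=\lceil\log_2 N\rceil$ unless another value is passed in.

```python
import math

def compact_ecoc_search_space(num_classes, bits=None):
    """Approximate number of distinct ECOC matrices with N rows and B bits.

    Assumption: when `bits` is not given, the minimum code length
    B = ceil(log2(N)) is used.
    """
    if bits is None:
        bits = (num_classes - 1).bit_length()   # exact ceil(log2(N)) for N >= 1
    words = 2 ** bits                            # |CW| = 2^B possible codewords
    if num_classes > words:
        raise ValueError("not enough codewords for the number of classes")
    ordered = math.perm(words, num_classes)      # V = (2^B)! / (2^B - N)!
    return ordered // 2                          # a matrix and its complement coincide

for n in (4, 8, 36):
    print(n, compact_ecoc_search_space(n))
```

For $N=4$ and $N=8$ the function returns 12 and 20160 respectively; for the 36 traffic-sign classes the count is already far too large to enumerate, which is the motivation for a stochastic search.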

In this type of scenario, evolutionary approaches are often introduced with good results. Evolutionary algorithms are a wide family of methods inspired by Darwin's theory of evolution, and they are usually formulated as optimization processes in which the solution space is neither differentiable nor well defined. In these cases, simulating the process of natural evolution on a computer yields stochastic optimization techniques that often outperform classical optimization methods when applied to difficult real-world problems. Although the most used and studied evolutionary algorithms are Genetic Algorithms (GA), since the publication of Population Based Incremental Learning (PBIL) in 1995 by Baluja and Caruana [4], a new family of evolutionary methods has been striving to find a place in this field. In contrast to GA, these new algorithms consider each value in the chromosome as a random variable, and their goal is to learn a probability model that describes the characteristics of good individuals. In the case of PBIL, if a binary chromosome is used, a uniform distribution is learned in order to estimate the probability of each variable being one or zero.
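The idea of learning a probability model over chromosomes can be sketched in a few lines. The following is a simplified PBIL-style loop, not the configuration used in the experiments: it keeps one independent probability per bit, moves it toward the best individual of each generation, and maximises a toy fitness (the number of ones). The function name, the learning rate and the population size are all assumptions made for illustration; in the ECOC setting the fitness would instead be derived from the validation error.

```python
import random

def pbil(fitness, n_bits, pop_size=20, generations=100, lr=0.1, seed=0):
    """Simplified PBIL sketch: learn per-bit probabilities of good individuals."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits                      # start with no preference per bit
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        leader = max(population, key=fitness)   # best individual of this generation
        leader_fit = fitness(leader)
        if leader_fit > best_fit:
            best, best_fit = leader, leader_fit
        # Shift each probability toward the corresponding bit of the leader.
        probs = [(1 - lr) * p + lr * bit for p, bit in zip(probs, leader)]
    return best, best_fit, probs

# Toy problem: maximise the number of ones in a 16-bit chromosome.
best, best_fit, probs = pbil(sum, n_bits=16)
print(best_fit, [round(p, 2) for p in probs])
```

Unlike a GA, there is no crossover or mutation of individuals here; the only state carried between generations is the probability vector.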

In this chapter, we report experiments made with the selected evolutionary strategies – i.e. GA and PBIL. Note that for both Evolutionary Strategies, the encoding step and the adaptation function are exactly equivalent.

计算机代写|机器学习代写machine learning代考|Problem encoding

Problem encoding: The first step in order to use an evolutionary algorithm is to define the problem encoding, which consists of the representation of a certain solution or point in the search space by means of a genotype or, alternatively, a chromosome [14]. When the solutions or individuals are transformed in order to be represented in a chromosome, the original values (the individuals) are referred to as phenotypes, and each one of the possible settings for a phenotype is an allele. Binary encoding is the most common, mainly because the first works on GA used this type of encoding. In binary encoding, every chromosome is a string of bits. Although this encoding is often not natural for many problems, and sometimes corrections must be performed after crossover and/or mutation, in our case the chromosomes represent binary ECOC matrices, and therefore this encoding adapts perfectly to the problem. Each ECOC is encoded as a binary chromosome $\zeta=\left\langle h_1^{c_1}, \ldots, h_B^{c_1}, h_1^{c_2}, \ldots, h_B^{c_2}, \ldots, h_1^{c_N}, \ldots, h_B^{c_N}\right\rangle$, where $h_i^{c_j} \in\{0,1\}$ is the expected value of the $i$-th classifier for the class $c_j$, which corresponds to the $i$-th bit of the class $c_j$ codeword.

Adaptation function: Once the encoding is defined, we need to define the adaptation function, which associates each individual with its adaptation value to the environment and, thus, with its survival probability. In the case of the ECOC framework, the adaptation value must be related to the classification error.

Given a chromosome $\zeta=\left\langle\zeta_0, \zeta_1, \ldots, \zeta_L\right\rangle$ with $\zeta_i \in\{0,1\}$, the first step is to recover the ECOC matrix $M$ codified in this chromosome. The elements of $M$ allow us to create binary classification problems from the original multi-class problem, following the partitions defined by the ECOC columns. Each binary problem is addressed by means of a binary classifier, which is trained in order to separate both partitions of classes. Assuming that there exists a function $y=f(x)$ that maps each sample $x$ to its real label $y$, training a classifier consists of finding the best parameters $w^*$ of a certain function $y=f^{\prime}(x, w)$, in the sense that for any other $w \neq w^*$, $f^{\prime}\left(x, w^*\right)$ is a better approximation to $f$ than $f^{\prime}(x, w)$. Once the $w^*$ are estimated for each binary problem, the adaptation value corresponds to the classification error. In order to take into account the generalization power of the trained classifiers, the estimation of $w^*$ is performed over a subset of the samples, while the rest of the samples are reserved for a validation set, and the adaptation value $\varepsilon$ is the classification error over that validation subset. The adaptation value for an individual represented by a certain chromosome $\zeta_i$ can be formulated as:
$$
\varepsilon_i\left(P, Y, M_i\right)=\frac{1}{s} \sum_{j=1}^s\left[\delta\left(\rho_j, M_i\right) \neq y_j\right],
$$
where $M_i$ is the ECOC matrix encoded in $\zeta_i$, $P=\left\langle\rho_1, \ldots, \rho_s\right\rangle$ is a set of samples, $Y=\left\langle y_1, \ldots, y_s\right\rangle$ are the expected labels for the samples in $P$, $\delta$ is the function that returns the classification label applying the decoding strategy, and the bracket $[\cdot]$ equals 1 when the enclosed condition holds and 0 otherwise.
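A minimal sketch of this adaptation value follows. It makes two simplifying assumptions that are not in the text: the outputs of the trained binary classifiers on the validation samples are taken as given (one bit vector per sample), and $\delta$ is implemented as nearest-codeword Hamming decoding. All function names are hypothetical.

```python
def chromosome_to_matrix(chromosome, n_classes, n_bits):
    """Recover the N x B ECOC matrix: consecutive groups of B bits are codewords."""
    return [chromosome[c * n_bits:(c + 1) * n_bits] for c in range(n_classes)]

def hamming_decode(bits, matrix):
    """Return the index of the class whose codeword is closest in Hamming distance."""
    distances = [sum(b != m for b, m in zip(bits, row)) for row in matrix]
    return distances.index(min(distances))      # ties go to the lowest class index

def adaptation_value(classifier_outputs, labels, matrix):
    """Fraction of validation samples whose decoded label differs from the expected one."""
    errors = sum(hamming_decode(bits, matrix) != y
                 for bits, y in zip(classifier_outputs, labels))
    return errors / len(labels)

# Four classes encoded with two bits each: codewords 00, 01, 10, 11.
chromosome = [0, 0, 0, 1, 1, 0, 1, 1]
M = chromosome_to_matrix(chromosome, n_classes=4, n_bits=2)

# Assumed binary classifier outputs for four validation samples.
outputs = [[0, 0], [0, 1], [1, 1], [1, 0]]
labels = [0, 1, 2, 2]
print(adaptation_value(outputs, labels, M))     # one of four samples is wrong: 0.25
```

Because the error is measured on held-out samples, two chromosomes that fit the training data equally well can still receive different adaptation values.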



计算机代写|机器学习代写machine learning代考|MKTG6010

Posted on August 16, 2023 (updated August 28, 2023) by statistics-lab

计算机代写|机器学习代写machine learning代考|Local Binary Patterns

The local binary pattern (LBP) operator [14] is a powerful 2D texture descriptor that has the benefit of being somewhat insensitive to variations in the lighting and orientation of an image. The method has been successfully applied to applications such as face recognition [1] and facial expression recognition [16]. As illustrated in Fig. 1.2, the LBP algorithm associates each interior pixel of an intensity image with a binary code number in the range 0 to 255. This code number is generated by taking the surrounding pixels and, working in a clockwise direction from the top left-hand corner, assigning a bit value of 0 where the neighbouring pixel intensity is less than that of the central pixel and 1 otherwise. The concatenation of these bits produces an eight-digit binary code word which becomes the grey-scale value of the corresponding pixel in the transformed image. Figure 1.2 shows a pixel being compared with its immediate neighbours. It is, however, also possible to compare a pixel with others which are separated by distances of two, three or more pixel widths, giving rise to a series of transformed images. Each such image is generated using a different radius for the circularly symmetric neighbourhood over which the LBP code is calculated.

Another possible refinement is to obtain a finer angular resolution by using more than 8 bits in the code-word [14]. Note that the choice of the top left hand corner as a reference point is arbitrary and that different choices would lead to different LBP codes; valid comparisons can be made, however, provided that the same choice of reference point is made for all pixels in all images.
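The basic radius-one operator described above can be sketched as follows. This is a plain-Python illustration on a nested list of intensities, with hypothetical function names; it assumes that the first neighbour visited (top left) becomes the most significant bit of the code word, which is one of the arbitrary but consistent choices the text mentions.

```python
# Offsets of the eight neighbours, clockwise from the top left-hand corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """Eight-bit LBP code of the interior pixel at row r, column c."""
    centre = image[r][c]
    code = 0
    for dr, dc in OFFSETS:
        # Bit is 0 where the neighbour is darker than the centre, 1 otherwise.
        bit = 0 if image[r + dr][c + dc] < centre else 1
        code = (code << 1) | bit                # first neighbour ends up as the MSB
    return code

def lbp_image(image):
    """Transformed image holding the LBP code of every interior pixel."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(patch, 1, 1))                    # bits 01011000, i.e. 88
```

Border pixels have no complete neighbourhood, so the transformed image is one pixel smaller on each side than the input.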

It is noted in [14] that in practice the majority of LBP codes consist of a concatenation of at most three consecutive sub-strings of 0s and 1s; this means that when the circular neighbourhood of the centre pixel is traversed, the result is either all 0s, all 1s, or a starting point can be found which produces a sequence of 0s followed by a sequence of 1s. These codes are referred to as uniform patterns and, for an 8-bit code, there are 58 possible values. Uniform patterns are most useful for texture discrimination purposes as they represent local micro-features such as bright spots, flat spots and edges; non-uniform patterns tend to be a source of noise and can therefore usefully be mapped to the single common value 59.
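The figure of 58 uniform patterns can be checked by counting the circular 0/1 transitions in each 8-bit code: a code is uniform when it has at most two. The sketch below does this and builds the relabelling in which every non-uniform code shares the value 59; the helper names and the 1-based labelling of the uniform codes are assumptions made for illustration.

```python
def circular_transitions(code, bits=8):
    """Number of 0/1 changes met when walking once around the circular code."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))   # rotate right by one bit
    return bin(code ^ rotated).count("1")

# Codes with at most two transitions are the uniform patterns.
UNIFORM = [code for code in range(256) if circular_transitions(code) <= 2]
UNIFORM_LABEL = {code: i + 1 for i, code in enumerate(UNIFORM)}   # labels 1..58

def lbp_label(code):
    """Map a raw LBP code to 1..58 if uniform, or to the shared value 59 otherwise."""
    return UNIFORM_LABEL.get(code, 59)

print(len(UNIFORM))                 # 58
print(lbp_label(0b00011100))        # a uniform code: some label in 1..58
print(lbp_label(0b01010101))        # non-uniform: 59
```

The count decomposes as 2 constant codes plus 8 x 7 codes that contain a single run of ones of length 1 to 7 at one of 8 rotations.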

In order to use LBP codes as a face expression comparison mechanism it is first necessary to subdivide a face image into a number of sub-windows and then compute the occurrence histograms of the LBP codes over these regions. These histograms can be combined to generate useful features, for example by concatenating them or by comparing corresponding histograms from two images.
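One way to turn these histograms into a feature vector is to split the code image into a regular grid and concatenate the per-window histograms. The sketch below assumes a simple rectangular grid and plain occurrence counts; the function names and the grid layout are illustrative choices, not the configuration used in the chapter.

```python
def region_histogram(window, n_bins):
    """Occurrence counts of each code value inside one sub-window."""
    hist = [0] * n_bins
    for row in window:
        for value in row:
            hist[value] += 1
    return hist

def lbp_feature_vector(code_image, grid_rows, grid_cols, n_bins=256):
    """Concatenate the histograms of a grid_rows x grid_cols tiling of the image."""
    rows, cols = len(code_image), len(code_image[0])
    features = []
    for gr in range(grid_rows):
        r0, r1 = gr * rows // grid_rows, (gr + 1) * rows // grid_rows
        for gc in range(grid_cols):
            c0, c1 = gc * cols // grid_cols, (gc + 1) * cols // grid_cols
            window = [row[c0:c1] for row in code_image[r0:r1]]
            features.extend(region_histogram(window, n_bins))
    return features

# Tiny example with two possible code values and a 1 x 2 grid.
codes = [[0, 1],
         [1, 1]]
print(lbp_feature_vector(codes, grid_rows=1, grid_cols=2, n_bins=2))   # [1, 1, 0, 2]
```

Keeping the windows separate preserves coarse spatial layout, which a single whole-image histogram would discard.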

计算机代写|机器学习代写machine learning代考|Fast Correlation-Based Filtering

Broadly speaking, feature selection algorithms can be divided into two groups: wrapper methods and filter methods [3]. In the wrapper approach different combinations of features are considered and a classifier is trained on each combination to determine which is the most effective. Whilst this approach undoubtedly gives good results, the computational demands that it imposes render it impractical when a very large number of features needs to be considered. In such cases the filter approach may be used; this considers the merits of features in themselves without reference to any particular classification method.

Fast correlation-based filtering (FCBF) has proved itself to be a successful feature selection method that can handle large numbers of features in a computationally efficient way. It works by considering the correlation between each feature and the class label and between each pair of features. As a measure of correlation the concept of symmetric uncertainty is used; for a pair of random variables $X$ and $Y$ this is defined as:
$$
S U(X, Y)=2\left[\frac{I G(X, Y)}{H(X)+H(Y)}\right]
$$
where $H(\cdot)$ is the entropy of the random variable and $IG(X, Y)=H(X)-H(X \mid Y)=H(Y)-H(Y \mid X)$ is the information gain between $X$ and $Y$. As its name suggests, symmetric uncertainty is symmetric in its arguments; it takes values in the range $[0,1]$, where 0 implies independence between the random variables and 1 implies that the value of each variable completely predicts the value of the other. In calculating the entropies of Eq. 1.6, any continuous features must first be discretised.
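For discrete (or already discretised) variables, symmetric uncertainty can be estimated from empirical frequencies. The sketch below uses the identity $IG(X,Y)=H(X)+H(Y)-H(X,Y)$, which is equivalent to the definition above; returning 0 when both variables are constant is a convention assumed here to avoid dividing by zero.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((count / n) * log2(count / n) for count in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X, Y) / (H(X) + H(Y)), estimated from paired samples."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0                       # both constant: assumed convention
    joint = entropy(list(zip(x, y)))     # H(X, Y)
    info_gain = hx + hy - joint          # equals H(X) - H(X | Y)
    return 2 * info_gain / (hx + hy)

x = [0, 0, 1, 1]
print(symmetric_uncertainty(x, [0, 0, 1, 1]))   # identical variables: 1.0
print(symmetric_uncertainty(x, [0, 1, 0, 1]))   # independent variables: 0.0
```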

The FCBF algorithm applies heuristic principles that aim to achieve a balance between using relevant features and avoiding redundant features. It does this by selecting features $f$ that satisfy the following properties:

1. $SU(f, c) \geq \delta$, where $c$ is the class label and $\delta$ is a threshold value chosen to suit the application.
2. $\forall g: SU(f, g) \geq SU(f, c) \Rightarrow SU(f, c) \geq SU(g, c)$, where $g$ is any feature other than $f$.

Here, property 1 ensures that the selected features are relevant, in that they are correlated with the class label to some degree, and property 2 eliminates redundant features by discarding those that are strongly correlated with a more relevant feature.
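A common greedy way to apply these two properties is to rank the features by their symmetric uncertainty with the class, drop those below the threshold, and then walk down the ranking, discarding any lower-ranked feature that is at least as strongly correlated with an already kept feature as it is with the class. The sketch below follows that scheme under the assumption of discrete features; the function names and the toy data are hypothetical, and the helper functions are repeated so that the block runs on its own.

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())

def symmetric_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0                                   # assumed convention
    return 2 * (hx + hy - entropy(list(zip(x, y)))) / (hx + hy)

def fcbf(features, labels, delta):
    """Return indices of selected features (each feature is a list of values)."""
    su_class = [symmetric_uncertainty(f, labels) for f in features]
    # Property 1: keep only features that are relevant to the class.
    ranked = sorted((i for i, su in enumerate(su_class) if su >= delta),
                    key=lambda i: -su_class[i])
    selected = []
    while ranked:
        f = ranked.pop(0)                            # most relevant remaining feature
        selected.append(f)
        # Property 2: drop features that f makes redundant.
        ranked = [g for g in ranked
                  if symmetric_uncertainty(features[f], features[g]) < su_class[g]]
    return selected

labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],    # informative
    [0, 0, 1, 1, 0, 0, 1, 1],    # exact copy of the first: redundant
    [0, 1, 0, 1, 0, 1, 0, 1],    # independent of the class: irrelevant
]
print(fcbf(features, labels, delta=0.1))             # [0]
```

Only pairwise quantities are computed, so no classifier is trained during selection, which is what distinguishes this filter from a wrapper method.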



计算机代写|机器学习代写machine learning代考|COMP4318

Posted on August 14, 2023 (updated August 28, 2023) by statistics-lab

计算机代写|机器学习代写machine learning代考|Bulk external delivery

The considerations for bulk external delivery aren’t substantially different from internal use serving to a database or data warehouse. The only material differences between these serving cases are in the realms of delivery time and monitoring of the predictions.
DELIVERY CONSISTENCY
Bulk delivery of results to an external party has the same relevancy requirements as any other ML solution. Whether you’re building something for an internal team or generating predictions that will be end-user-customer facing, the goal of creating useful predictions doesn’t change.

The one thing that does change with providing bulk predictions to an outside organization (generally applicable to business-to-business companies) when compared to other serving paradigms is in the timeliness of the delivery. While it may be obvious that a failure to deliver an extract of bulk predictions entirely is a bad thing, an inconsistent delivery can be just as detrimental. There is a simple solution to this, however, illustrated in the bottom portion of figure 16.14.

Figure 16.14 shows the comparison of gated and ungated serving to an external user group. By controlling a final-stage egress from the stored predictions in a scheduled batch prediction job, as well as coupling feature-generation logic to an ETL process governed by a feature store, delivery consistency from a chronological perspective can be guaranteed. While this may not seem an important consideration from the DS perspective of the team generating the predictions, having a predictable data-availability schedule can dramatically increase the perceived professionalism of the serving company.

计算机代写|机器学习代写machine learning代考|QUALITY ASSURANCE

An occasionally overlooked aspect of serving bulk predictions externally (external to the DS and analytics groups at a company) is ensuring that a thorough quality check is performed on those predictions.

An internal project may rely on a simple check for overt prediction failures (for example, ignored silent failures that result in null values, or a linear model that predicts infinity). When sending data products externally, additional steps should be taken to minimize the chances of end users of the predictions finding fault with them. Since we, as humans, are so adept at finding abnormalities in patterns, a few scant issues in a batch-delivered prediction dataset can easily draw the focus of a consumer of the data, deteriorating their faith in the efficacy of the solution to the point of disuse.

In my experience, when delivering bulk predictions external to a team of data specialists, I’ve found it worthwhile to perform a few checks before releasing the data:

Validate the predictions against the training data:
Classification problems: comparing aggregated class counts
Regression problems: comparing prediction distribution
Unsupervised problems: evaluating group membership counts
Check for prediction outliers (applicable to regression problems).
Build (if applicable) heuristics rules based on knowledge from SMEs to ensure that predictions are not outside the realm of possibility for the topic.
Validate incoming features (particularly encoded ones that may use a generic catchall encoding if the encoding key is previously unseen) to ensure that the data is fully compatible with the model as it was trained.

By running a few extra validation steps on the output of a batch prediction, a great deal of confusion and potential lessening of trust in the final product can be avoided in the eyes of end users.
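A few of the checks listed above can be sketched with the standard library alone. Everything here is hypothetical: the function names, the example labels and the numeric bounds are placeholders, and the bounds in particular would normally come from SME knowledge of the problem.

```python
import math
from collections import Counter

def find_invalid(predictions):
    """Indices of null, NaN or infinite predictions (overt failures)."""
    return [i for i, p in enumerate(predictions)
            if p is None or (isinstance(p, float) and not math.isfinite(p))]

def class_share_drift(train_labels, predicted_labels):
    """Absolute difference in class proportions between training data and predictions."""
    train, pred = Counter(train_labels), Counter(predicted_labels)
    return {label: abs(pred[label] / len(predicted_labels)
                       - train[label] / len(train_labels))
            for label in set(train) | set(pred)}

def out_of_range(predictions, low, high):
    """Indices of regression predictions outside heuristic bounds."""
    return [i for i, p in enumerate(predictions) if not (low <= p <= high)]

print(find_invalid([1.0, None, float("inf"), float("nan"), 2.5]))          # [1, 2, 3]
print(class_share_drift(["a", "a", "b", "b"], ["a", "a", "a", "b"]))
print(out_of_range([5.0, -1.0, 120.0], low=0.0, high=100.0))               # [1, 2]
```

In a scheduled batch job, a non-empty result from any of these checks could be used to hold back the delivery for review rather than releasing the extract automatically.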



计算机代写|机器学习代写machine learning代考|QBUS6850

Posted on August 11, 2023 (updated August 28, 2023) by statistics-lab

计算机代写|机器学习代写machine learning代考|Artifact management

Let’s imagine that we’re still working at the fire-risk department of the forest service introduced in chapter 15. In our efforts to effectively dispatch personnel and equipment to high-risk areas in the park system, we’ve arrived at a solution that works remarkably well. Our features are locked in and are stable over time. We’ve evaluated the performance of the predictions and are seeing genuine value from the model.
Throughout this process of getting the features into a good state, we’ve been iterating through the improvement cycle, shown in figure 16.1.

As this cycle shows, we’ve been iteratively releasing new versions of the model, testing against a baseline deployment, collecting feedback, and working to improve the predictions. At some point, however, we’ll be going into model-sustaining mode.

We’ve worked as hard as we can to improve the features going into the model and have found that the return on investment (ROI) of continuing to add new data elements to the project is simply not worth it. We’re now in the position of scheduled passive retraining of our model based on new data coming in over time.

When we’re at this steady-state point, the last thing that we want to do is to have one of the DS team members spend an afternoon manually retraining a model, manually comparing its results to the current production-deployed model with ad hoc analysis, and deciding on whether the model should be updated.

计算机代写|机器学习代写machine learning代考|MLflow’s model registry

In this situation that we find ourselves in, with scheduled updates to a model happening autonomously, it is important for us to know the state of production deployment. Not only do we need to know the current state, but if questions arise about performance of a passive retraining system in the past, we need to have a means of investigating the historical provenance of the model. Figure 16.3 compares using and not using a registry for tracking provenance in order to explain a historical issue.

As you can see, the process for attempting to re-create a past run is fraught with peril; we have a high risk of being unable to reproduce the issue that the business found in historical predictions. With no registry to record the artifacts utilized in production, manual work must be done to re-create the model’s original conditions. This can be incredibly challenging (if not impossible) in most companies because changes may have occurred to the underlying data used to train the model, rendering it impossible to re-create that state.

The preferred approach, as shown in figure 16.3, is to utilize a model registry service. MLflow, for instance, offers exactly this functionality within its APIs, allowing us to log details of each retraining run to the tracking server, handle production promotion if the scheduled retraining job performs better on holdout data, and archive the older model for future reference. If we had used this framework, the process of testing conditions of a model that had at one point run in production would be as simple as recalling the artifact from the registry entry, loading it into a notebook environment, and generating the explainable correlation reports with tools such as shap.
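The promotion logic described here can be illustrated without any particular registry service. The sketch below is a toy in-memory registry, not MLflow's API: the class names, the stage labels and the use of a single holdout error as the comparison metric are all assumptions made to show the control flow of a scheduled retraining job.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    holdout_error: float
    stage: str = "None"          # "None", "Production" or "Archived" in this sketch

@dataclass
class ToyRegistry:
    versions: list = field(default_factory=list)

    def production(self):
        """Return the version currently marked as Production, if any."""
        return next((v for v in self.versions if v.stage == "Production"), None)

    def register(self, holdout_error):
        """Record a retraining run; promote it only if it beats production."""
        candidate = ModelVersion(len(self.versions) + 1, holdout_error)
        self.versions.append(candidate)            # every run is kept for provenance
        current = self.production()
        if current is None or candidate.holdout_error < current.holdout_error:
            if current is not None:
                current.stage = "Archived"         # keep the old model for reference
            candidate.stage = "Production"
        return candidate

registry = ToyRegistry()
for error in (0.20, 0.25, 0.15):                   # three scheduled retraining runs
    registry.register(error)
print([(v.version, v.stage) for v in registry.versions])
```

The second run is recorded but never promoted, and the first run remains retrievable after being superseded, which is the property that makes later investigation of historical predictions possible.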



计算机代写|机器学习代写machine learning代考|COMP5328

Posted on August 11, 2023 (updated August 28, 2023) by statistics-lab

计算机代写|机器学习代写machine learning代考|Biased testing

Internal testing is easy (well, easier than the alternatives). It’s painless (if the model works properly). It’s what we typically think of when we’re qualifying the results of a project. The process typically involves the following:

Generating predictions on new (unseen to the modeling process) data
Analyzing the distribution and statistical properties of the new predictions
Taking random samples of predictions and making qualitative judgments of them
Running handcrafted sample data (or their own accounts, if applicable) through the model

The first two elements in this list are valid for qualification of model effectiveness. They are wholly void of bias and should be done. The latter two, on the other hand, are dangerous. The final one is the more dangerous of them.

In our music playlist generator system scenario, let’s say that the DS team members are all fans of classical music. Throughout their qualitative verifications, they’ve been checking to see the relative quality of the playlist generator for the field of music that they are most familiar with: classical music. To perform these validations, they’ve been generating listening history of their favorite pieces, adjusting the implementation to fine-tune the results, and iterating on the validation process.

When they are fully satisfied that the solution works well at identifying a nearly uncanny level of sophistication for capturing thematic and tonally relevant similar pieces of music, they ask a colleague what they think. The results for both the DS team (Ben and Julie) as well as for their data warehouse engineer friend Connor are shown in figure 15.10.

计算机代写|机器学习代写machine learning代考|Dogfooding

A far more thorough approach than Ben and Julie’s first attempt would have been to canvass people at the company. Instead of keeping the evaluation internal to the team, where a limited exposure to genres hampers their ability to qualitatively measure the effectiveness of the project, they could ask for help. They could ask around and see if people at the company might be interested in taking a look at how their own accounts and usage would be impacted by the changes the DS team is introducing. Figure 15.11 illustrates how this could work for this scenario.

Dogfooding, in the broadest sense, is consuming the results of your own product. The term refers to opening up functionality that is being developed so that everyone at a company can use it, find out how to break it, provide feedback on how it’s broken, and collectively work toward building a better product. All of this happens across a broad range of perspectives, drawing on the experience and knowledge of many employees from all departments.

However, as you can see in figure 15.11, the evaluation still contains bias. An internal user who uses the company’s product is likely not a typical user. Depending on their job function, they may be using their account to validate functionality in the product, use it for demonstrations, or simply interact with the product more because of an employee benefit associated with it.

In addition to the potentially spurious information contained within the listen history of employees, the other form of bias is that people like what they like. They also don’t like what they don’t like. Subjective responses to something as emotionally charged as music preferences add an incredible amount of bias due to the nature of being a member of the human race. Knowing that these predictions are based on their listening history and that it is their own company’s product, internal users evaluating their own profiles will generally be more critical than a typical user if they find something that they don’t like (which is a stark contrast to the builder bias that the DS team would experience).

While dogfooding is certainly preferable to evaluating a solution’s quality within the confines of the DS team, it’s still not ideal, mostly because of these inherent biases that exist.


Machine Learning | COMP5318

Posted on August 11, 2023 (updated August 28, 2023) by statistics-lab

Process over technology

The success of a feature store implementation is not in the specific technology used to implement it. The benefit is in the actions it enables a company to take with its calculated and standardized feature data.

Let’s briefly examine an ideal process for a company that needs to update the definition of its revenue metric. For such a broadly defined term, the concept of revenue at a company can be interpreted in many ways, depending on the end-use case, the department concerned with the usage of that data, and the level of accounting standards applied to the definition for those use cases.

A marketing group, for instance, may be interested in gross revenue for measuring the success rate of advertising campaigns. The DE group may define multiple variations of revenue to handle the needs of different groups within the company. The DS team may be looking at a windowed aggregation of any column in the data warehouse that has the words “sales,” “revenue,” or “cost” in it to create feature data. The BI team might have a more sophisticated set of definitions that appeal to a broader set of analytics use cases.
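To make the DS team's habit concrete, here is a minimal sketch of what such an ad hoc windowed aggregation might look like. Everything in it is an assumption for illustration: the table contents, the column names, the three-row window, and the helper names `matching_columns` and `rolling_sum` are hypothetical and do not come from any particular warehouse or library.

```python
from collections import deque
from typing import Dict, List, Sequence

# Keywords the (hypothetical) DS team scans for in column names.
KEYWORDS = ("sales", "revenue", "cost")


def matching_columns(columns: Sequence[str]) -> List[str]:
    """Return the column names that contain any keyword (case-insensitive)."""
    return [c for c in columns if any(k in c.lower() for k in KEYWORDS)]


def rolling_sum(values: Sequence[float], window: int) -> List[float]:
    """Trailing-window sum; early rows sum however many values exist so far."""
    buf: deque = deque(maxlen=window)
    out: List[float] = []
    for v in values:
        buf.append(v)  # the oldest value drops out once the window is full
        out.append(sum(buf))
    return out


# Toy stand-in for a warehouse table, stored column-wise.
table: Dict[str, List[float]] = {
    "day": [1, 2, 3, 4],
    "net_sales": [10.0, 20.0, 30.0, 40.0],
    "shipping_cost": [1.0, 1.0, 2.0, 2.0],
}

# One derived feature per matching column, using a 3-row trailing window.
features = {
    f"{col}_sum_3d": rolling_sum(table[col], window=3)
    for col in matching_columns(list(table))
}
```

Note that nothing in this sketch records which notion of "revenue" a column represents. A feature built this way silently inherits whatever definition the upstream column happens to use.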

Changing the logic behind such a key business metric can have far-reaching impacts on an organization if every group is responsible for its own definition. The likelihood that each group will update its references in every query, code base, report, and model it is responsible for is slim. Fragmenting the definition of such an important metric across departments is problematic enough on its own. Creating multiple versions of the defining characteristics within each group is a recipe for complete chaos. With no established standard for how key business metrics are defined, groups within a company are effectively no longer speaking on even terms when evaluating the results and outputs from one another.

Regardless of the technology stack used to store the data for consumption, having a process built around change management for critical features can guarantee a frictionless and resilient data migration. Figure 15.4 illustrates such a process.
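As a rough illustration of the idea (not a reproduction of the process in the figure), the sketch below shows one way a metric definition could be registered once, versioned on every change, and referenced by name by each consuming team. All names here (`MetricRegistry`, `MetricVersion`, `register`, `compute`), the field names, and the two example revenue formulas are hypothetical and stand in for whatever the organization actually agrees on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Optional

Row = Mapping[str, float]


@dataclass(frozen=True)
class MetricVersion:
    version: int
    description: str
    compute: Callable[[Row], float]


class MetricRegistry:
    """Hypothetical single source of truth for business metric definitions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[MetricVersion]] = {}

    def register(self, name: str, description: str,
                 compute: Callable[[Row], float]) -> MetricVersion:
        # A changed definition is appended as a new version, so earlier
        # versions remain available for audits and backfills.
        history = self._versions.setdefault(name, [])
        entry = MetricVersion(len(history) + 1, description, compute)
        history.append(entry)
        return entry

    def latest(self, name: str) -> MetricVersion:
        return self._versions[name][-1]

    def compute(self, name: str, row: Row,
                version: Optional[int] = None) -> float:
        # Consumers ask for a metric by name; they get the latest
        # definition unless they explicitly pin an older version.
        history = self._versions[name]
        chosen = history[-1] if version is None else history[version - 1]
        return chosen.compute(row)


registry = MetricRegistry()
registry.register("revenue", "gross sales", lambda r: r["sales"])
# The agreed definition changes: it is updated once, in one place.
registry.register("revenue", "gross sales minus refunds",
                  lambda r: r["sales"] - r["refunds"])

order = {"sales": 120.0, "refunds": 20.0}
current_revenue = registry.compute("revenue", order)
original_revenue = registry.compute("revenue", order, version=1)
```

Because every team calls `registry.compute("revenue", ...)` rather than keeping a private formula, the change lands in one place. The point of the sketch is the shared, versioned reference, not this particular data structure.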

The dangers of a data silo

Data silos are deceptively dangerous. Isolating data in a walled-off, private location that is accessible only to a select group of individuals stifles the productivity of other teams, causes a large amount of duplicated effort throughout an organization, and frequently (in my experience of seeing them, at least) leads to esoteric data definitions that, in their isolation, depart wildly from the generally accepted view of a metric in the rest of the company.

It may seem like a really great thing when an ML team is granted a database of its own or an entire cloud object store bucket to empower the team to be self-service. The seemingly geologically scaled time spent for the DE or warehousing team to load required datasets disappears. The team members are fully masters of their domain, able to load, consume, and generate data with impunity. This can definitely be a good thing, provided that clear and soundly defined processes govern the management of this technology.

But clean or dirty, an internal-use-only data storage stack is a silo, the contents squirreled away from the outside world. These silos can generate more problems than they solve.

To show how a data silo can be disadvantageous, let’s imagine that we work at a company that builds dog parks. Our latest ML project is a bit of a moon shot, working with counterfactual simulations (causal modeling) to determine which amenities would be most valuable to our customers at different proposed construction sites. The goal is to figure out how to maximize the perceived quality and value of the proposed parks while minimizing our company’s investment costs.

To build such a solution, we have to get data on all of the registered dog parks in the country. We also need demographic data associated with the localities of these dog parks. Since the company’s data lake contains no data sources that have this information, we have to source it ourselves. Naturally, we put all of this information in our own environment, thinking it will be far faster than waiting for the DE team’s backlog to clear enough to get around to working on it.

After a few months, questions begin to arise about some of the contracts that the company has bid on in certain locales. The business operations team is curious about why so many custom paw-activated watering fountains are being ordered as part of some of these construction inventories. As the analysts dig into the data available in the data lake, they can’t make sense of why the recommendations for certain contracts consistently include these incredibly expensive components.

