Pruning

Reduced-Error Pruning

Reduced-error pruning (REP) is a conceptually simple strategy proposed by Quinlan [94]. It uses a pruning set (a part of the training set that is held out from tree growing) to evaluate the goodness of a given subtree of $T$. The idea is to evaluate each non-terminal node $t \in \zeta_{T}$ with regard to the classification error on the pruning set. If this error decreases when the subtree $T^{(t)}$ is replaced by a leaf node, then $T^{(t)}$ must be pruned.

Quinlan imposes a constraint: a node $t$ cannot be pruned if it contains a subtree that yields a lower classification error on the pruning set. The practical consequence of this constraint is that REP should be performed in a bottom-up fashion. The REP-pruned tree $T^{\prime}$ has an interesting optimality property: among the most accurate trees (with respect to the pruning set) that can be obtained by pruning the original tree $T$, it is the smallest [94]. Besides this optimality property, another advantage of REP is its linear complexity, since each node of $T$ is visited only once. An obvious disadvantage is the need for a pruning set, which means the original training set has to be divided, leaving fewer instances to grow the tree. This disadvantage is particularly serious for small data sets.
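The bottom-up procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Quinlan's implementation: the `Node` class, the binary threshold splits, and the function names are all hypothetical, and ties (equal error before and after pruning) are resolved in favour of pruning so that the smaller tree is kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (feature vector, class label)


@dataclass
class Node:
    """Hypothetical binary decision-tree node (an assumption for this sketch)."""
    label: int                      # majority class of the training instances at this node
    feature: Optional[int] = None   # index of the tested feature (None for a leaf)
    threshold: float = 0.0          # instances with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> int:
    """Route x down the tree and return the label of the leaf it reaches."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def reduced_error_prune(node: Node, pruning_set: List[Instance]) -> int:
    """Prune the subtree rooted at `node` in place, bottom-up.

    Returns the number of pruning-set errors made by the (possibly pruned) subtree.
    """
    # Errors made if this node were a leaf predicting its majority class.
    leaf_errors = sum(1 for _, y in pruning_set if y != node.label)
    if node.is_leaf():
        return leaf_errors

    # Send each pruning instance to the child it would reach.
    left_set = [(x, y) for x, y in pruning_set if x[node.feature] <= node.threshold]
    right_set = [(x, y) for x, y in pruning_set if x[node.feature] > node.threshold]

    # Children are processed first, so each node is visited exactly once.
    subtree_errors = (reduced_error_prune(node.left, left_set)
                      + reduced_error_prune(node.right, right_set))

    # Assumption: ties are pruned, which favours the smaller tree.
    if leaf_errors <= subtree_errors:
        node.left = node.right = None
        node.feature = None
        return leaf_errors
    return subtree_errors


# Toy tree: the right child carries an extra split that hurts on the pruning set.
tree = Node(label=1, feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(label=1, feature=1, threshold=0.5,
                       left=Node(label=1), right=Node(label=0)))
pruning_set = [([0.2, 0.0], 0), ([0.8, 0.2], 1), ([0.8, 0.9], 1), ([0.9, 0.8], 1)]
errors = reduced_error_prune(tree, pruning_set)
```

In this toy example the inner split on the right branch misclassifies two pruning instances, whereas a leaf in its place misclassifies none, so the branch is collapsed. The root split is kept because replacing it by a leaf would introduce an error.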

Pessimistic Error Pruning

Also proposed by Quinlan [94], pessimistic error pruning (PEP) uses the training set for both growing and pruning the tree. The apparent error rate, i.e., the error rate calculated over the training set, is optimistically biased and cannot be used to decide whether pruning should be performed or not. Quinlan thus proposes adjusting the apparent error according to the continuity correction for the binomial distribution ($cc$) in order to provide a more realistic error rate. Let $E^{(t)}$ be the number of training errors at node $t$, $N_{x}^{(t)}$ the number of training instances that reach $t$, and $\lambda_{T^{(t)}}$ the set of leaves of the subtree $T^{(t)}$. Consider the apparent error rate of node $t$ when it is pruned to a leaf, and the apparent error rate of its entire subtree $T^{(t)}$ before pruning is performed, respectively:
$$r^{(t)}=\frac{E^{(t)}}{N_{x}^{(t)}} \tag{2.33}$$
$$r^{T^{(t)}}=\frac{\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}} . \tag{2.34}$$
Modifying (2.33) and (2.34) according to $cc$ results in:
$$r_{cc}^{(t)}=\frac{E^{(t)}+1 / 2}{N_{x}^{(t)}}$$
$$r_{cc}^{T^{(t)}}=\frac{\sum_{s \in \lambda_{T^{(t)}}}\left(E^{(s)}+1 / 2\right)}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}}=\frac{\frac{\left|\lambda_{T^{(t)}}\right|}{2}+\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}} .$$
For the sake of simplicity, we will refer to the adjusted number of errors rather than the adjusted error rate, i.e., $E_{cc}^{(t)}=E^{(t)}+1 / 2$ and $E_{cc}^{T^{(t)}}=\left(\left|\lambda_{T^{(t)}}\right| / 2\right)+\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}$. Ideally, pruning should occur if $E_{cc}^{(t)} \leq E_{cc}^{T^{(t)}}$, but this condition seldom holds: the decision tree is usually grown until the homogeneity stopping criterion is met (criterion 1 in Sect. 2.3.2), so every leaf has zero training errors and $E_{cc}^{T^{(t)}}=\left|\lambda_{T^{(t)}}\right| / 2$, whereas $E_{cc}^{(t)}$ will very probably be higher. In fact, due to the homogeneity stopping criterion, $E_{cc}^{T^{(t)}}$ becomes simply a measure of complexity that associates each leaf node with a cost of $1 / 2$. Quinlan, aware of this situation, weakens the original condition

$$E_{c c}^{(t)} \leq E_{c c}^{T^{(t)}}$$
to
$$E_{cc}^{(t)} \leq E_{cc}^{T^{(t)}}+SE\left(E_{cc}^{T^{(t)}}\right)$$
where
$$SE\left(E_{cc}^{T^{(t)}}\right)=\sqrt{\frac{E_{cc}^{T^{(t)}} \times\left(N_{x}^{(t)}-E_{cc}^{T^{(t)}}\right)}{N_{x}^{(t)}}}$$
is the standard error for the subtree $T^{(t)}$, computed as if the distribution of errors were binomial.
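As a rough numerical illustration of the weakened condition, the following Python sketch computes the corrected error counts and the standard error for a single node. The function and parameter names are hypothetical; only the formulas come from the text above.

```python
import math
from typing import Sequence, Tuple


def pep_terms(node_errors: int,
              leaf_errors: Sequence[int],
              n_instances: int) -> Tuple[float, float, float]:
    """Return (E_cc of the node, E_cc of the subtree, standard error).

    node_errors : E^(t), training errors if node t were turned into a leaf
    leaf_errors : E^(s) for every leaf s of the subtree rooted at t
    n_instances : N_x^(t), training instances reaching node t
    """
    e_node = node_errors + 0.5
    # Each leaf contributes its own errors plus a continuity correction of 1/2.
    e_subtree = sum(leaf_errors) + len(leaf_errors) / 2.0
    se = math.sqrt(e_subtree * (n_instances - e_subtree) / n_instances)
    return e_node, e_subtree, se


def pep_should_prune(node_errors: int,
                     leaf_errors: Sequence[int],
                     n_instances: int) -> bool:
    """Weakened PEP condition: prune if E_cc(node) <= E_cc(subtree) + SE."""
    e_node, e_subtree, se = pep_terms(node_errors, leaf_errors, n_instances)
    return e_node <= e_subtree + se


# A subtree with four pure leaves (zero training errors each) over 100 instances.
print(pep_terms(5, [0, 0, 0, 0], 100))         # corrected counts and SE
print(pep_should_prune(5, [0, 0, 0, 0], 100))  # node would make 5 errors as a leaf
print(pep_should_prune(2, [0, 0, 0, 0], 100))  # node would make 2 errors as a leaf
```

With four pure leaves, the corrected subtree error is $4 / 2 = 2$ and the standard error is $\sqrt{2 \times 98 / 100} = 1.4$, so the node is pruned only if its own corrected error does not exceed $3.4$. A node with 5 errors ($5.5$) is kept, while a node with 2 errors ($2.5$) is pruned.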

PEP is computed in a top-down fashion, and if a given node $t$ is pruned, its descendants are not examined, which makes this pruning strategy quite efficient in terms of computational effort. As a point of criticism, Esposito et al. [32] point out that the introduction of the continuity correction in the estimation of the error rate has no theoretical justification, since it was never applied to correct over-optimistic estimates of error rates in statistics.

Minimum Error Pruning

Originally proposed by Niblett and Bratko [82] and further extended by Cestnik and Bratko [19], minimum error pruning (MEP) is a bottom-up approach that seeks to minimize the expected error rate for unseen cases. It estimates the expected error rate in node $t$ $\left(EE^{(t)}\right)$ as follows:
$$EE^{(t)}=\min_{y_{l}}\left[\frac{N_{x}^{(t)}-N_{\bullet, y_{l}}^{(t)}+\left(1-p_{\bullet, y_{l}}^{(t)}\right) \times m}{N_{x}^{(t)}+m}\right] . \tag{2.39}$$
where $N_{\bullet, y_{l}}^{(t)}$ is the number of instances in $t$ that belong to class $y_{l}$, $p_{\bullet, y_{l}}^{(t)}$ is the a priori probability of class $y_{l}$, and $m$ is a parameter that determines the importance of the a priori probability in the estimation of the error. Eq. (2.39), presented in [19], is a generalisation of the expected error rate presented in [82], which is recovered by assuming that $m=k$ and that $p_{\bullet, y_{l}}^{(t)}=1 / k, \forall y_{l} \in Y$.
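To make the relation between the two estimates explicit, the substitution can be worked out as follows. The notation here is an assumption where the source is unclear: $k$ is the number of classes and $N_{\bullet, y_{l}}^{(t)}$ is the number of instances of class $y_{l}$ in node $t$. Setting $m=k$ and a uniform prior of $1 / k$ gives $\left(1-\frac{1}{k}\right) \times k=k-1$, hence
$$EE^{(t)}=\min_{y_{l}}\left[\frac{N_{x}^{(t)}-N_{\bullet, y_{l}}^{(t)}+k-1}{N_{x}^{(t)}+k}\right],$$
which is the Laplace-style expected error estimate usually attributed to Niblett and Bratko. For instance, with $k=2$ classes and a node holding $N_{x}^{(t)}=10$ instances, 6 of which belong to the majority class, the estimate is $(10-6+1) /(10+2)=5 / 12 \approx 0.417$.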

MEP is performed by comparing $EE^{(t)}$ with the weighted sum of the expected error rates of all children of $t$. Each weight is given by $p_{v_{j}, \bullet}$, the proportion of the instances of $t$ that fall into $v_{j}$, where $v_{j}$ is the partition corresponding to the $j$th child of $t$. A disadvantage of MEP is the need to set the ad-hoc parameter $m$. Usually, the higher the value of $m$, the more severe the pruning. Cestnik and Bratko [19] suggest that a domain expert should set $m$ according to the level of noise in the data. Alternatively, a set of trees pruned with different values of $m$ could be offered to the domain expert, who can then choose the best one according to their experience.
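The comparison just described can be sketched in Python for a single node whose children are leaves. This is a minimal sketch under assumptions: the function names are hypothetical, the class priors are passed in explicitly, and the decision rule (prune when the node's own expected error does not exceed the weighted sum over its children) is the usual reading of MEP rather than something stated verbatim in the text. In a full bottom-up pass, the children's values would themselves be the already-computed errors of their pruned subtrees.

```python
from typing import Sequence


def expected_error(class_counts: Sequence[int],
                   priors: Sequence[float],
                   m: float) -> float:
    """m-estimate of the expected error rate of a node, minimised over classes."""
    n = sum(class_counts)
    return min((n - n_c + (1.0 - p_c) * m) / (n + m)
               for n_c, p_c in zip(class_counts, priors))


def mep_should_prune(parent_counts: Sequence[int],
                     children_counts: Sequence[Sequence[int]],
                     priors: Sequence[float],
                     m: float) -> bool:
    """Compare the node's expected error with the weighted sum over its children.

    Assumption: the children are leaves, and ties are resolved by pruning.
    """
    n = sum(parent_counts)
    static_error = expected_error(parent_counts, priors, m)
    # Each child is weighted by the fraction of the node's instances it receives.
    backed_up_error = sum(sum(counts) / n * expected_error(counts, priors, m)
                          for counts in children_counts)
    return static_error <= backed_up_error


priors = [0.5, 0.5]  # assumed a priori class probabilities
# An informative split: children are much purer than the parent, so it is kept.
keep = mep_should_prune([6, 4], [[5, 1], [1, 3]], priors, m=2)
# A weak split: children are barely purer than the parent, so it is pruned.
drop = mep_should_prune([6, 4], [[4, 2], [2, 2]], priors, m=2)
print(keep, drop)
```

In both cases the parent's expected error is $5 / 12 \approx 0.417$. The informative split yields a weighted children error of about $0.283$ and is kept, whereas the weak split yields $0.425$ and is pruned.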
