## Other Induction Strategies

We presented a thorough review of the greedy top-down strategy for induction of decision trees in the previous section. In this section, we briefly present alternative strategies for inducing decision trees.

Bottom-up induction of decision trees was first mentioned in [59]. The authors propose a strategy that resembles agglomerative hierarchical clustering. The algorithm starts with each leaf holding objects of the same class, so a $k$-class problem generates a decision tree with $k$ leaves. The key idea is to merge, recursively, the two most similar classes into a non-terminal node. A hyperplane is then associated with the new non-terminal node, much in the same way as in top-down induction of oblique trees (in [59], a linear discriminant analysis procedure generates the hyperplanes). Next, all objects in the new non-terminal node are considered to be members of the same class (an artificial class that embodies the two clustered classes), and the procedure evaluates once again which are the two most similar classes. By recursively repeating this strategy, we end up with a decision tree in which the more obvious discriminations are done first, and the more subtle distinctions are postponed to lower levels. Landeweerd et al. [59] propose using the Mahalanobis distance to evaluate the similarity between two classes $y_{i}$ and $y_{j}$:
$$d_{Mahalanobis}\left(y_{i}, y_{j}\right)=\left(\mu_{y_{i}}-\mu_{y_{j}}\right)^{T} \Sigma^{-1}\left(\mu_{y_{i}}-\mu_{y_{j}}\right)$$
where $\mu_{y_{i}}$ is the mean attribute vector of class $y_{i}$ and $\Sigma$ is the covariance matrix pooled over all classes.

Some obvious drawbacks of this bottom-up induction strategy are: (i) binary-class problems yield a 1-level decision tree (a root node and two children), and such a simple tree cannot model complex problems; (ii) instances from the same class may be located in very distinct regions of the attribute space, undermining the initial assumption that instances from the same class should be located in the same leaf node; (iii) hierarchical clustering and hyperplane generation are costly operations; in fact, inverting the covariance matrix in the Mahalanobis distance usually takes time proportional to $O\left(n^{3}\right)$. We believe these issues are among the main reasons why bottom-up induction has not become as popular as top-down induction. To alleviate these problems, Barros et al. [4] propose a bottom-up induction algorithm named BUTIA that combines EM clustering with SVM classifiers. The authors later generalize BUTIA into a framework for generating oblique decision trees, namely BUTIF [5], which allows the application of different clustering and classification strategies.

Hybrid induction was investigated in [56]. The idea is to combine both bottom-up and top-down approaches for building the final decision tree. The algorithm starts by executing the bottom-up approach as described above until two subgroups are obtained. Then, two centers (mean attribute vectors) and covariance information are extracted from these subgroups and used for dividing the training data in a top-down fashion according to a normalized sum-of-squared-error criterion. If the two new partitions induced account for separated classes, then the hybrid induction is finished; otherwise, for each subgroup that does not account for a class, the hybrid induction is recursively executed, once again starting with the bottom-up procedure. Kim and Landgrebe [56] argue that in hybrid induction “It is more likely to converge to classes of informational value, because the clustering initialization provides early guidance in that direction, while the straightforward top-down approach does not guarantee such convergence”.

Several studies have attempted to avoid the greedy strategy usually employed for inducing trees. For instance, lookahead was employed for trying to improve greedy induction $[17,23,29,79,84]$. Murthy and Salzberg [79] show that one-level lookahead does not help build significantly better trees and can actually worsen the quality of the trees induced. A more recent strategy for avoiding greedy decision-tree induction is to generate decision trees through evolutionary algorithms. The idea involved is to consider each decision tree as an individual in a population, which is evolved through a certain number of generations. Decision trees are modified by genetic operators, which are performed stochastically. A thorough review of decision-tree induction through evolutionary algorithms is presented in [6].

## Chapter Remarks

In this chapter, we presented the main design choices one has to face when programming a decision-tree induction algorithm. We gave special emphasis to the greedy top-down induction strategy, since it is by far the most researched technique for decision-tree induction.

Regarding top-down induction, we presented the most well-known splitting measures for univariate decision trees, as well as some new criteria found in the literature, in a unified notation. Furthermore, we introduced some strategies for building decision trees with multivariate tests, the so-called oblique trees. In particular, we showed that efficient oblique decision-tree induction has to make use of heuristics in order to derive “good” hyperplanes within non-terminal nodes. We detailed the strategy employed in the OC1 algorithm $[77,80]$ for deriving hyperplanes with the help of a randomized perturbation process. Next, we described the most common stopping criteria and post-pruning techniques employed in classic algorithms such as CART [12] and C4.5 [89], and we ended the discussion on top-down induction with an enumeration of possible strategies for dealing with missing values, either in the growing phase or during classification of a new instance.

We ended our analysis of decision trees with some alternative induction strategies, such as bottom-up induction and hybrid induction. In addition, we briefly discussed work that attempts to avoid the greedy strategy by implementing lookahead techniques, evolutionary algorithms, beam search, linear programming, (non-)incremental restructuring, skewing, or anytime learning. In the next chapters, we present an overview of evolutionary algorithms and hyper-heuristics, and review how they can be applied to decision-tree induction.

## Evolutionary Algorithms

Evolutionary algorithms (EAs) are a collection of optimisation techniques whose design is based on metaphors of biological processes. Freitas [20] defines EAs as “stochastic search algorithms inspired by the process of neo-Darwinian evolution”, and Weise [44] states that “EAs are population-based metaheuristic optimisation algorithms that use biology-inspired mechanisms (…) in order to refine a set of solution candidates iteratively”.

The idea surrounding EAs is the following. There is a population of individuals, where each individual is a possible solution to a given problem. This population evolves towards increasingly better solutions through stochastic operators. After the evolution is completed, the fittest individual represents a “near-optimal” solution for the problem at hand.

For evolving individuals, an EA evaluates each individual through a fitness function that measures the quality of the solutions being evolved. After the evaluation of all individuals in the initial population, the algorithm's iterative process starts. At each iteration, hereafter called a generation, the fittest individuals have a higher probability of being selected for reproduction, to increase the chances of producing good solutions. The selected individuals undergo stochastic genetic operators, such as crossover and mutation, producing new offspring. These new individuals replace the current population, and the evolutionary process continues until a stopping criterion is satisfied (e.g., until a fixed number of generations is reached, or until a satisfactory solution has been found).

There are several kinds of EAs, such as genetic algorithms (GAs), genetic programming (GP), classifier systems (CS), evolution strategies (ES), evolutionary programming (EP), estimation of distribution algorithms (EDA), etc. This chapter will focus on GA and GP, the most commonly used EAs for data mining [19]. At a high level of abstraction, GAs and GP can be described by the pseudocode in Algorithm 1.
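
To make the evolutionary cycle above concrete, the following Python fragment sketches the generic loop shared by GAs and GP (initialise, evaluate, select, cross over, mutate, replace). It is a minimal illustration, not the pseudocode of Algorithm 1: the representation, fitness function and genetic operators are supplied by the caller, and tournament selection with full generational replacement is an assumed design choice.

```python
import random

def evolve(init_individual, fitness, crossover, mutate,
           pop_size=50, generations=100, cx_rate=0.9, mut_rate=0.1):
    """Generic EA loop: evaluate, select the fitter parents, apply
    stochastic crossover/mutation, and replace the population."""
    population = [init_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament(k=2):
            # fitter of k randomly drawn individuals
            return max(random.sample(scored, k), key=lambda t: t[0])[1]

        offspring = []
        while len(offspring) < pop_size:
            parent1, parent2 = tournament(), tournament()
            child = crossover(parent1, parent2) if random.random() < cx_rate else parent1
            if random.random() < mut_rate:
                child = mutate(child)
            offspring.append(child)
        population = offspring                # generational replacement
    return max(population, key=fitness)       # fittest individual found
```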

## Cost-Complexity Pruning

Cost-complexity pruning is the post-pruning strategy of the CART system, detailed in [12]. It consists of two steps:

1. Generate a sequence of increasingly smaller trees, beginning with $T$ and ending with the root node of $T$, by successively pruning the subtree yielding the lowest cost complexity, in a bottom-up fashion;
2. Choose the best tree among the sequence based on its relative size and accuracy (either on a pruning set, or provided by a cross-validation procedure in the training set).

The idea within step 1 is that pruned tree $T_{i+1}$ is obtained from $T_{i}$ by pruning the subtrees that show the lowest increase in the apparent error (error in the training set) per pruned leaf. Since the apparent error of pruned node $t$ increases by the amount $r^{(t)}-r^{T^{(t)}}$, whereas its number of leaves decreases by $\left|\lambda_{T^{(t)}}\right|-1$ units, the following ratio measures the increase in apparent error rate per pruned leaf:
$$\alpha=\frac{r^{(t)}-r^{T^{(t)}}}{\left|\lambda_{T^{(t)}}\right|-1}$$
Therefore, $T_{i+1}$ is obtained by pruning all nodes in $T_{i}$ with the lowest value of $\alpha$. $T_{0}$ is obtained by pruning all nodes in $T$ whose $\alpha$ value is 0. It is possible to show that each tree $T_{i}$ is associated with a distinct value $\alpha_{i}$, such that $\alpha_{i}<\alpha_{i+1}$. Building the sequence of trees in step 1 takes quadratic time with respect to the number of internal nodes.

Regarding step 2, CCP chooses the smallest tree whose error (either on the pruning set or on cross-validation) is not more than one standard error (SE) greater than the lowest error observed in the sequence of trees. This strategy has been known as the “1-SE” variant since the work of Esposito et al. [33], which proposes ignoring the standard error constraint and calls the strategy that selects trees based only on accuracy “0-SE”. It is argued that 1-SE has a tendency to overprune trees, since its selection is based on a conservative constraint $[32,33]$.
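
As an illustration of step 1, the sketch below computes the $\alpha$ of each internal node and identifies the weakest link to prune next. The node representation (fields `children` and `error_count`, the latter being the number of training instances the node would misclassify if it predicted its majority class) is hypothetical and not taken from CART; step 2 (selecting among the pruned trees with a pruning set or cross-validation) is omitted.

```python
def iter_nodes(node):
    """All nodes of the subtree rooted at `node`."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)

def leaves(node):
    return [n for n in iter_nodes(node) if not n.children]

def alpha(node, n_total):
    """Increase in apparent error per pruned leaf if `node` becomes a leaf."""
    subtree_leaves = leaves(node)
    r_node = node.error_count / n_total
    r_subtree = sum(leaf.error_count for leaf in subtree_leaves) / n_total
    return (r_node - r_subtree) / (len(subtree_leaves) - 1)

def weakest_link(root, n_total):
    """Internal node with the lowest alpha: the next node CCP would prune."""
    internal = [n for n in iter_nodes(root) if n.children]
    return min(internal, key=lambda n: alpha(n, n_total))
```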

## Error-Based Pruning

This strategy was proposed by Quinlan and is implemented as the default pruning strategy of C4.5 [89]. It is an improvement over PEP, based on a far more pessimistic estimate of the expected error. Unlike PEP, EBP performs a bottom-up search, and it performs not only the replacement of non-terminal nodes by leaves but also the grafting of subtree $T^{(t)}$ onto the place of parent $t$. Grafting is exemplified in Fig. 2.2.
Since grafting is potentially a time-consuming task, only the child subtree $T^{\left(t^{\prime}\right)}$ of $t$ with the greatest number of instances is considered to be grafted onto the place of $t$.

For deciding whether to replace a non-terminal node by a leaf (subtree replacement), to graft a subtree onto the place of its parent (subtree raising) or not to prune at all, a pessimistic estimate of the expected error is calculated by using an upper confidence bound. Assuming that errors in the training set are binomially distributed with a given probability $p$ in $N_{x}^{(t)}$ trials, it is possible to compute the exact value of the upper confidence bound as the value of $p$ for which a binomially distributed random variable $P$ shows $E^{(t)}$ successes in $N_{x}^{(t)}$ trials with probability $C F$. In other words, given a particular confidence $C F$ (C4.5 default value is $C F=25 \%$ ), we can find the upper bound of the expected error $\left(E E_{U B}\right)$ as follows:
$$E E_{U B}=\frac{f+\frac{z^{2}}{2 N_{x}}+z \sqrt{\frac{f}{N_{x}}-\frac{f^{2}}{N_{x}}+\frac{z^{2}}{4 N_{x}^{2}}}}{1+\frac{z^{2}}{N_{x}}}$$
where $f=E^{(t)} / N_{x}$ and $z$ is the number of standard deviations corresponding to the confidence $C F$ (e.g., for $C F=25 \%, z=0.69$ ).

In order to calculate the expected error of node $t\left(E E^{(t)}\right)$, one must simply compute $N_{x}^{(t)} \times E E_{U B}$. For evaluating a subtree $T^{(t)}$, one must sum the expected error of every leaf of that subtree, i.e., $\sum_{s \in \lambda_{T^{(t)}}} E E^{(s)}$. Hence, given a non-terminal node $t$, it is possible to decide whether one should perform subtree replacement (when condition $E E^{(t)} \leq E E^{T^{(t)}}$ holds), subtree raising (when conditions $\exists j \in \zeta_{t}, E E^{(j)}<E E^{(t)} \wedge \forall i \in \zeta_{t}, N_{x}^{(i)}<N_{x}^{(j)}$ hold), or not to prune $t$ otherwise.
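
A minimal sketch of this estimate follows, assuming the usual normal approximation to obtain $z$ from $CF$ (which yields $z \approx 0.674$ for $CF=25\,\%$, close to the 0.69 quoted above); C4.5's own implementation details are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

def ebp_upper_bound(errors, n, cf=0.25):
    """Upper confidence bound EE_UB on the error rate of a node that
    misclassifies `errors` of its `n` training instances."""
    z = NormalDist().inv_cdf(1.0 - cf)
    f = errors / n
    num = f + z**2 / (2 * n) + z * sqrt(f / n - f**2 / n + z**2 / (4 * n**2))
    return num / (1 + z**2 / n)

def expected_errors(errors, n, cf=0.25):
    """Expected number of errors at a node: N_x^(t) * EE_UB."""
    return n * ebp_upper_bound(errors, n, cf)

# Subtree replacement test: collapse three leaves into one leaf covering all
# 14 instances if the collapsed leaf's expected error is not higher.
subtree = sum(expected_errors(e, n) for e, n in [(0, 6), (0, 2), (1, 6)])
leaf = expected_errors(1, 14)
replace_subtree = leaf <= subtree
```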

An advantage of EBP is the new grafting operation that allows pruning useless branches without ignoring interesting lower branches (an elegant solution to the horizon effect problem). A drawback of the method is the parameter $C F$, even though it represents a confidence level. Smaller values of $C F$ result in more pruning.

## Empirical Evaluations

Some studies in the literature performed empirical analyses for evaluating pruning strategies. For instance, Quinlan [94] compared four methods of tree pruning (three of them presented in the previous sections: REP, PEP and CCP 1-SE). He argued that those methods in which a pruning set is needed (REP and CCP) did not perform noticeably better than the other methods, and thus their requirement for additional data is a weakness.

Mingers [71] compared five pruning methods, all of them presented in the previous sections (CCP, CVP, MEP, REP and PEP), and related them to different splitting measures. He states that pruning can improve the accuracy of induced decision trees by up to $25 \%$ in domains with noise and residual variation. In addition, he highlights the following findings: (i) MEP (the original version by Niblett and Bratko [82]) is the least accurate method due to its sensitivity to the number of classes in the data; (ii) PEP is the most “crude” strategy, though the fastest one; due to some bad results, it should be used with caution; (iii) CVP, CCP and REP performed well, providing consistently low error-rates for all data sets used; and (iv) there is no evidence of an interaction between the splitting measure and the pruning method used for inducing a decision tree.

Buntine [16], in his PhD thesis, also reports experiments on pruning methods (PEP, MEP, CCP 0-SE and 1-SE, for both a pruning set and cross-validation). Some of his findings were: (i) CCP 0-SE versions were marginally superior to the 1-SE versions; (ii) CCP 1-SE versions were superior in data sets with little apparent structure, where more severe pruning was inherently better; (iii) CCP 0-SE with cross-validation was marginally better than the other methods, though not in all data sets; and (iv) PEP performed reasonably well in all data sets, and was significantly superior in well-structured data sets (mushroom, glass and LED, all from UCI [36]).

Esposito et al. [32] compare the six post-pruning methods presented in the previous sections within an extended C4.5 system. Their findings were the following: (i) MEP, CVP, and EBP tend to underprune, whereas 1-SE (both cross-validation and pruning-set versions) and REP have a propensity for overpruning; (ii) using a pruning set is not usually a good option; (iii) PEP and EBP behave similarly, despite the difference in their formulation; (iv) pruning does not generally decrease the accuracy of a decision tree (only one of the domains tested was deemed “pruning-averse”); and (v) data sets not prone to pruning are usually the ones with the highest base error, whereas data sets with a low base error tend to benefit from any pruning strategy.

For a comprehensive survey of strategies for simplifying decision trees, please refer to [13]. For more details on post-pruning techniques in decision trees for regression, we recommend $[12,54,85,97,113-115]$.

## Pruning

## Reduced-Error Pruning

Reduced-error pruning is a conceptually simple strategy proposed by Quinlan [94]. It uses a pruning set (a part of the training set) to evaluate the goodness of a given subtree from $T$. The idea is to evaluate each non-terminal node $t \in \zeta_{T}$ with regard to the classification error in the pruning set. If such an error decreases when we replace the subtree $T^{(t)}$ by a leaf node, then $T^{(t)}$ must be pruned.

Quinlan imposes a constraint: a node $t$ cannot be pruned if it contains a subtree that yields a lower classification error in the pruning set. The practical consequence of this constraint is that REP should be performed in a bottom-up fashion. The REP pruned tree $T^{\prime}$ presents an interesting optimality property: it is the smallest among the most accurate trees resulting from pruning the original tree $T$ [94]. Besides this optimality property, another advantage of REP is its linear complexity, since each node is visited only once in $T$. An obvious disadvantage is the need for a pruning set, which means one has to divide the original training set, resulting in fewer instances to grow the tree. This disadvantage is particularly serious for small data sets.
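
A compact sketch of the REP decision, under an assumed node representation (fields `children` and `majority_class`, plus `predict(x)` and `route(x)` methods for applying the subtree and the node's own test, respectively); none of this interface comes from the original description.

```python
def split_pruning_set(node, pruning_set):
    """Route each pruning-set instance to the child chosen by the node's test."""
    subsets = [[] for _ in node.children]
    for x, label in pruning_set:
        subsets[node.route(x)].append((x, label))
    return subsets

def errors_as_subtree(node, pruning_set):
    """Pruning-set misclassifications when the subtree of `node` is kept."""
    return sum(1 for x, label in pruning_set if node.predict(x) != label)

def errors_as_leaf(node, pruning_set):
    """Pruning-set misclassifications if `node` were a majority-class leaf."""
    return sum(1 for _, label in pruning_set if node.majority_class != label)

def reduced_error_prune(node, pruning_set):
    """Bottom-up: prune the children first, then decide for `node` itself."""
    if not node.children:
        return
    for child, subset in zip(node.children, split_pruning_set(node, pruning_set)):
        reduced_error_prune(child, subset)
    if errors_as_leaf(node, pruning_set) <= errors_as_subtree(node, pruning_set):
        node.children = []    # replace the subtree by a leaf
```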

## Pessimistic Error Pruning

Also proposed by Quinlan [94], the pessimistic error pruning uses the training set for both growing and pruning the tree. The apparent error rate, i.e., the error rate calculated over the training set, is optimistically biased and cannot be used to decide whether pruning should be performed or not. Quinlan thus proposes adjusting the apparent error according to the continuity correction for the binomial distribution (cc) in order to provide a more realistic error rate. Consider the apparent error of a pruned node $t$, and the error of its entire subtree $T^{(t)}$ before pruning is performed, respectively:
$$\begin{aligned} r^{(t)} &=\frac{E^{(t)}}{N_{x}^{(t)}} \\ r^{T^{(t)}} &=\frac{\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}} \end{aligned}$$
Modifying (2.33) and (2.34) according to $c c$ results in:
$$\begin{gathered} r_{c c}^{(t)}=\frac{E^{(t)}+1 / 2}{N_{x}^{(t)}} \\ r_{c c}^{T^{(t)}}=\frac{\sum_{s \in \lambda_{T^{(t)}}}\left(E^{(s)}+1 / 2\right)}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}}=\frac{\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}+\frac{\left|\lambda_{T^{(t)}}\right|}{2}}{\sum_{s \in \lambda_{T^{(t)}}} N_{x}^{(s)}} \end{gathered}$$
For the sake of simplicity, we will refer to the adjusted number of errors rather than the adjusted error rate, i.e., $E_{c c}^{(t)}=E^{(t)}+1 / 2$ and $E_{c c}^{T^{(t)}}=\sum_{s \in \lambda_{T^{(t)}}} E^{(s)}+\left|\lambda_{T^{(t)}}\right| / 2$. Ideally, pruning should occur if $E_{c c}^{(t)} \leq E_{c c}^{T^{(t)}}$, but note that this condition seldom holds, since the decision tree is usually grown up to the homogeneity stopping criterion (criterion 1 in Sect. 2.3.2), and thus $E_{c c}^{T^{(t)}}=\left|\lambda_{T^{(t)}}\right| / 2$ whereas $E_{c c}^{(t)}$ will very probably be a higher value. In fact, due to the homogeneity stopping criterion, $E_{c c}^{T^{(t)}}$ becomes simply a measure of complexity which associates each leaf node with a cost of $1 / 2$. Quinlan, aware of this situation, weakens the original condition

$$E_{c c}^{(t)} \leq E_{c c}^{T^{(t)}}$$
to
$$E_{c c}^{(t)} \leq E_{c c}^{T^{(t)}}+S E\left(E_{c c}^{T^{(t)}}\right)$$
where
$$S E\left(E_{c c}^{T^{(t)}}\right)=\sqrt{\frac{E_{c c}^{T^{(t)}} *\left(N_{x}^{(t)}-E_{c c}^{T^{(t)}}\right)}{N_{x}^{(t)}}}$$
is the standard error for the subtree $T^{(t)}$, computed as if the distribution of errors were binomial.

PEP is computed in a top-down fashion, and if a given node $t$ is pruned, its descendants are not examined, which makes this pruning strategy quite efficient in terms of computational effort. As a point of criticism, Esposito et al. [32] point out that the introduction of the continuity correction in the estimation of the error rate has no theoretical justification, since it was never applied to correct over-optimistic estimates of error rates in statistics.
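
A compact sketch of the PEP test above, with assumed inputs (the training errors the node would make as a leaf, the list of training errors of the subtree's leaves, and the number of training instances reaching the node):

```python
from math import sqrt

def pep_should_prune(node_errors, subtree_leaf_errors, n_instances):
    """Prune t when E_cc^(t) <= E_cc^T(t) + SE(E_cc^T(t))."""
    e_cc_node = node_errors + 0.5
    e_cc_subtree = sum(subtree_leaf_errors) + len(subtree_leaf_errors) / 2
    se = sqrt(e_cc_subtree * (n_instances - e_cc_subtree) / n_instances)
    return e_cc_node <= e_cc_subtree + se

print(pep_should_prune(2, [0, 0, 1], 20))
```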

## Minimum Error Pruning

Originally proposed by Niblett and Bratko [82] and further extended by Cestnik and Bratko [19], minimum error pruning is a bottom-up approach that seeks to minimize the expected error rate for unseen cases. It estimates the expected error rate in node $t\left(E E^{(t)}\right)$ as follows:
$$E E^{(t)}=\min_{y_{l} \in Y}\left[\frac{N_{x}^{(t)}-N_{\bullet, y_{l}}^{(t)}+\left(1-p_{\bullet, y_{l}}\right) \times m}{N_{x}^{(t)}+m}\right]$$
where $m$ is a parameter that determines the importance of the a priori probability on the estimation of the error. Eq. (2.39), presented in [19], is a generalisation of the expected error rate presented in [82] if we assume that $m=k$ and that $p_{\bullet, y_{l}}=1 / k, \forall y_{l} \in Y$.

MEP is performed by comparing $E E^{(t)}$ with the weighted sum of the expected error rates of all children nodes of $t$. Each weight is given by $p_{v_{j}, \bullet}$, assuming $v_{j}$ is the partition corresponding to the $j$ th child of $t$. A disadvantage of MEP is the need to set the ad hoc parameter $m$. Usually, the higher the value of $m$, the more severe the pruning. Cestnik and Bratko [19] suggest that a domain expert should set $m$ according to the level of noise in the data. Alternatively, a set of trees pruned with different values of $m$ could be offered to the domain expert, so he/she can choose the best one according to his/her experience.
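
A small sketch of the m-estimate above, with class counts and priors given as dictionaries (a hypothetical interface):

```python
def mep_expected_error(class_counts, priors, m):
    """Expected error rate of a node under the m-probability estimate."""
    n = sum(class_counts.values())
    return min((n - class_counts[c] + (1 - priors[c]) * m) / (n + m)
               for c in class_counts)

# A 3-class node with counts 8/1/1, uniform priors and m = 3
print(mep_expected_error({"a": 8, "b": 1, "c": 1},
                         {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}, 3))
```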

## Other Classification Criteria

In this category, we include all criteria that did not fit in the previously-mentioned categories.

Li and Dubes [62] propose a binary criterion for binary-class problems called the permutation statistic. It evaluates the degree of similarity between two vectors, $V_{a_{i}}$ and $y$; the larger this statistic, the more alike the vectors. Vector $V_{a_{i}}$ is calculated as follows. Let $a_{i}$ be a given numeric attribute with the values $[8.20,7.3,9.35,4.8,7.65,4.33]$ and $N_{x}=6$. Vector $y=[0,0,1,1,0,1]$ holds the corresponding class labels. Now consider a given threshold $\Delta=5.0$. Vector $V_{a_{i}}$ is calculated in two steps: first, the attribute $a_{i}$ values are sorted, i.e., $a_{i}=[4.33,4.8,7.3,7.65,8.20,9.35]$, consequently rearranging $y=[1,1,0,0,0,1]$; then, $V_{a_{i}}(n)$ takes 0 when $a_{i}(n) \leq \Delta$, and 1 otherwise. Thus, $V_{a_{i}}=[0,0,1,1,1,1]$. The permutation statistic first analyses how many 1-1 matches $(d)$ vectors $V_{a_{i}}$ and $y$ have. In this particular example, $d=1$. Next, it counts how many 1s there are in $V_{a_{i}}$ ($n_{a}$) and in $y$ ($n_{y}$). Finally, the permutation statistic can be computed as:
$$\beta^{\text {permutation }}\left(V_{a_{i}}, y\right)=\sum_{j=0}^{d} \frac{\binom{n_{a}}{j}\binom{N_{x}-n_{a}}{n_{y}-j}}{\binom{N_{x}}{n_{y}}}-\frac{\binom{n_{a}}{d}\binom{N_{x}-n_{a}}{n_{y}-d}}{\binom{N_{x}}{n_{y}}} U$$
$$\binom{n}{m}= \begin{cases}0 & \text { if } n<0 \text { or } m<0 \text { or } n<m \\ \frac{n !}{m !(n-m) !} & \text { otherwise }\end{cases}$$
where $U$ is a (continuous) random variable distributed uniformly over $[0,1]$.
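
The worked example above can be reproduced with a short script; apart from the uniform random term $U$, the value is deterministic.

```python
import random
from math import comb   # comb(n, m) returns 0 when m > n, matching the definition

def permutation_statistic(v, y, rng=random.random):
    n_x, n_a, n_y = len(v), sum(v), sum(y)
    d = sum(1 for vi, yi in zip(v, y) if vi == 1 and yi == 1)   # 1-1 matches
    hyper = lambda j: comb(n_a, j) * comb(n_x - n_a, n_y - j) / comb(n_x, n_y)
    return sum(hyper(j) for j in range(d + 1)) - hyper(d) * rng()

# Class labels rearranged by the sorted attribute and V_{a_i} for Delta = 5.0:
y = [1, 1, 0, 0, 0, 1]
v = [0, 0, 1, 1, 1, 1]
print(permutation_statistic(v, y))
```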

## Regression Criteria

All criteria presented so far are dedicated to classification problems. For regression problems, where the target variable $y$ is continuous, a common approach is to calculate the mean squared error (MSE) as a splitting criterion:
$$\operatorname{MSE}\left(a_{i}, \mathbf{X}, y\right)=N_{x}^{-1} \sum_{j=1}^{\left|a_{i}\right|} \sum_{x_{l} \in v_{j}}\left(y\left(x_{l}\right)-\overline{y_{v_{j}}}\right)^{2}$$
where $\overline{y_{v_{j}}}=N_{v_{j}, \bullet}^{-1} \sum_{x_{l} \in v_{j}} y\left(x_{l}\right)$. Just as with clustering, we are trying to minimize the within-partition variance. Usually, the sum of squared errors is weighted over each partition according to the estimated probability of an instance belonging to the given partition [12]. Thus, we should rewrite MSE to:
$$w \operatorname{MSE}\left(a_{i}, \mathbf{X}, y\right)=\sum_{j=1}^{\left|a_{i}\right|} p_{v_{j}, \bullet} \sum_{x_{l} \in v_{j}}\left(y\left(x_{l}\right)-\overline{y_{v_{j}}}\right)^{2}$$
Another common criterion for regression is the sum of absolute deviations (SAD) [12], or similarly its weighted version given by:
$$w S A D\left(a_{i}, \mathbf{X}, y\right)=\sum_{j=1}^{\left|a_{i}\right|} p_{v_{j}, \bullet} \sum_{x_{l} \in v_{j}} \operatorname{abs}\left(y\left(x_{l}\right)-\operatorname{median}\left(y_{v_{j}}\right)\right)$$
where $\operatorname{median}\left(y_{v_{j}}\right)$ is the median of the target attribute among the instances belonging to $\mathbf{X}_{a_{i}=v_{j}}$. Quinlan [93] proposes the use of the standard deviation reduction (SDR) for his pioneering system of model-tree induction, M5. Wang and Witten [124] extend the work of Quinlan in their proposed system M5′, also employing the SDR criterion. It is given by:
$$\operatorname{SDR}\left(a_{i}, \mathbf{X}, y\right)=\sigma_{X}-\sum_{j=1}^{\left|a_{i}\right|} p_{v_{j}, \bullet} \sigma_{v_{j}}$$
where $\sigma_{X}$ is the standard deviation of the instances in $\mathbf{X}$ and $\sigma_{v_{j}}$ the standard deviation of the instances in $\mathbf{X}_{a_{i}=v_{j}}$. SDR should be maximized, i.e., the weighted sum of standard deviations of each partition should be as small as possible. Thus, partitioning the instance space according to a particular attribute $a_{i}$ should provide partitions whose target attribute variance is small (once again we are interested in minimizing the within-partition variance). Observe that minimizing the second term in SDR is equivalent to minimizing wMSE, but in SDR we are using the partition standard deviation $(\sigma)$ as a similarity criterion whereas in wMSE we are using the partition variance $\left(\sigma^{2}\right)$.
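
A short sketch of the three weighted criteria for one candidate split, with each partition given as a list of target values (an assumed interface):

```python
from statistics import mean, median, pstdev

def w_mse(partitions):
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * sum((v - mean(p)) ** 2 for v in p) for p in partitions)

def w_sad(partitions):
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * sum(abs(v - median(p)) for v in p) for p in partitions)

def sdr(parent, partitions):
    n = len(parent)
    return pstdev(parent) - sum(len(p) / n * pstdev(p) for p in partitions)

left, right = [1.0, 1.2, 0.9], [3.8, 4.1, 4.0, 3.9]
print(w_mse([left, right]), w_sad([left, right]), sdr(left + right, [left, right]))
```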

## Multivariate Splits

All criteria presented so far are intended for building univariate splits. Decision trees with multivariate splits (known as oblique, linear or multivariate decision trees) are not as popular as the univariate ones, mainly because they are harder to interpret. Nevertheless, researchers reckon that multivariate splits can improve the performance of the tree in several data sets, while generating smaller trees $[47,77,98]$. Clearly, there is a tradeoff to consider in allowing multivariate tests: simple tests may result in large trees that are hard to understand, yet multivariate tests may result in small trees with tests hard to understand [121].

A decision tree with multivariate splits is able to produce polygonal (polyhedral) partitions of the attribute space (hyperplanes at an oblique orientation to the attribute axes) whereas univariate trees can only produce hyper-rectangles parallel to the attribute axes. The tests at each node have the form:
$$w_{0}+\sum_{i=1}^{n} w_{i} a_{i}(x) \leq 0$$
where $w_{i}$ is a real-valued coefficient associated to the $i$ th attribute and $w_{0}$ the disturbance coefficient of the test.
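
An oblique test of this form is just a signed linear combination of the attribute values; the sketch below routes an instance to one side of the hyperplane (the weights and attribute values are made-up numbers for illustration):

```python
def oblique_test(w0, weights, instance):
    """True if w0 + sum_i w_i * a_i(x) <= 0."""
    return w0 + sum(w * a for w, a in zip(weights, instance)) <= 0

# Test 1.5*a1 - 0.5*a2 - 2.0 <= 0 on the instance (a1, a2) = (1.0, 3.0)
print(oblique_test(-2.0, [1.5, -0.5], [1.0, 3.0]))   # True: 1.5 - 1.5 - 2.0 <= 0
```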

CART (Classification and Regression Trees) [12] is one of the first systems that allowed multivariate splits. It employs a hill-climbing strategy with a backward attribute elimination for finding good (albeit suboptimal) linear combinations of attributes in non-terminal nodes. It is a fully-deterministic algorithm with no built-in mechanisms to escape local-optima. Breiman et al. [12] point out that the proposed algorithm has much room for improvement.

Another approach for building oblique decision trees is LMDT (Linear Machine Decision Trees) $[14,119]$, which is an evolution of the perceptron tree method [117]. Each non-terminal node holds a linear machine [83], which is a set of $k$ linear discriminant functions that are used collectively to assign an instance to one of the $k$ existing classes. LMDT uses heuristics to determine when a linear machine has stabilized (since convergence cannot be guaranteed). More specifically, for handling non-linearly separable problems, a method similar to simulated annealing (SA) is used (called thermal training). Draper and Brodley [30] show how LMDT can be altered to induce decision trees that minimize arbitrary misclassification cost functions.

SADT (Simulated Annealing of Decision Trees) [47] is a system that employs SA for finding good coefficient values for attributes in non-terminal nodes of decision trees. First, it places a hyperplane in a canonical location, and then iteratively perturbs the coefficients in small random amounts. At the beginning, when the temperature parameter of the SA is high, practically any perturbation of the coefficients is accepted regardless of the goodness-of-split value (the value of the utilised splitting criterion). As the SA cools down, only perturbations that improve the goodness-of-split are likely to be allowed. Although SADT can eventually escape from local-optima, its efficiency is compromised since it may consider tens of thousands of hyperplanes in a single node during annealing.

## Selecting Splits

A major issue in top-down induction of decision trees is which attribute(s) to choose for splitting a node in subsets. For the case of axis-parallel decision trees (also known as univariate), the problem is to choose the attribute that better discriminates the input data. A decision rule based on such an attribute is thus generated, and the input data is filtered according to the outcomes of this rule. For oblique decision trees (also known as multivariate), the goal is to find a combination of attributes with good discriminatory power. Either way, both strategies are concerned with ranking attributes quantitatively.

We have divided the work on univariate criteria into the following categories: (i) information theory-based criteria; (ii) distance-based criteria; (iii) other classification criteria; and (iv) regression criteria. These categories are sometimes fuzzy and do not constitute a taxonomy by any means. Many of the criteria presented in a given category can be shown to be approximations of criteria in other categories.

## Information Theory-Based Criteria

Examples of this category are criteria based, directly or indirectly, on Shannon’s entropy [104]. Entropy is known to be a unique function which satisfies the four axioms of uncertainty. It represents the average amount of information when coding each class into a codeword with ideal length according to its probability. Some interesting facts regarding entropy are:

• For a fixed number of classes, entropy increases as the probability distribution of classes becomes more uniform;
• If the probability distribution of classes is uniform, entropy increases logarithmically as the number of classes in a sample increases;
• If a partition induced on a set $\mathbf{X}$ by an attribute $a_{j}$ is a refinement of a partition induced by $a_{i}$, then the entropy of the partition induced by $a_{j}$ is never higher than the entropy of the partition induced by $a_{i}$ (and it is only equal if the class distribution is kept identical after partitioning). This means that progressively refining a set in sub-partitions will continuously decrease the entropy value, regardless of the class distribution achieved after partitioning a set.

The first splitting criterion that arose based on entropy is the global mutual information (GMI) $[41,102,108]$, given by:
$$G M I\left(a_{i}, \mathbf{X}, y\right)=\frac{1}{N_{x}} \sum_{l=1}^{k} \sum_{j=1}^{\left|a_{i}\right|} N_{v_{j} \cap y_{l}} \log _{e} \frac{N_{v_{j} \cap y_{l}} N_{x}}{N_{v_{j}, \bullet} N_{\bullet, y_{l}}}$$
Ching et al. [22] propose the use of GMI as a tool for supervised discretization. They name it class-attribute mutual information, though the criterion is exactly the same. GMI is bounded below by zero (when $a_{i}$ and $y$ are completely independent) and its maximum value is $\max \left(\log _{2}\left|a_{i}\right|, \log _{2} k\right)$ (when there is a maximum correlation between $a_{i}$ and $y$). Ching et al. [22] reckon this measure is biased towards attributes with many distinct values, and thus propose the following normalization, called class-attribute interdependence redundancy (CAIR):
$$\operatorname{CAIR}\left(a_{i}, \mathbf{X}, y\right)=\frac{G M I\left(a_{i}, \mathbf{X}, y\right)}{-\sum_{j=1}^{\left|a_{i}\right|} \sum_{l=1}^{k} p_{v_{j} \cap y_{l}} \log _{2} p_{v_{j} \cap y_{l}}}$$
which is actually dividing GMI by the joint entropy of $a_{i}$ and $y$. Clearly $\operatorname{CAIR}\left(a_{i}, \mathbf{X}, y\right) \geq 0$, since both GMI and the joint entropy are greater than (or equal to) zero. In fact, $0 \leq \operatorname{CAIR}\left(a_{i}, \mathbf{X}, y\right) \leq 1$, with $\operatorname{CAIR}\left(a_{i}, \mathbf{X}, y\right)=0$ when $a_{i}$ and $y$ are totally independent and $\operatorname{CAIR}\left(a_{i}, \mathbf{X}, y\right)=1$ when they are totally dependent. The term redundancy in CAIR comes from the fact that one may discretize a continuous attribute in intervals in such a way that the class-attribute interdependence is kept intact (i.e., redundant values are combined in an interval). In the decision tree partitioning context, we must look for an attribute that maximizes CAIR (or similarly, that maximizes GMI).
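
Both measures can be computed from the attribute-value-by-class contingency counts. The sketch below uses base-2 logarithms throughout so that the CAIR bound of 1 holds exactly (the GMI formula above is stated with natural logarithms, which only changes the scale); the nested-list input format is an assumption.

```python
from math import log2

def gmi(counts):
    """counts[j][l]: number of instances with attribute value v_j and class y_l."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]            # N_{v_j, .}
    col_tot = [sum(col) for col in zip(*counts)]      # N_{., y_l}
    return sum(c / n * log2(c * n / (row_tot[j] * col_tot[l]))
               for j, row in enumerate(counts)
               for l, c in enumerate(row) if c > 0)

def cair(counts):
    """GMI normalised by the joint entropy of the attribute and the class."""
    n = sum(sum(row) for row in counts)
    joint_entropy = -sum(c / n * log2(c / n) for row in counts for c in row if c > 0)
    return gmi(counts) / joint_entropy

print(cair([[8, 1], [2, 9]]))   # two attribute values, two classes
```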

## Distance-Based Criteria

Criteria in this category evaluate separability, divergency or discrimination between classes. They measure the distance between class probability distributions.

A popular distance criterion which is also from the class of impurity-based criteria is the Gini index $[12,39,88]$. It is given by:
$$\phi^{G i n i}(y, \mathbf{X})=1-\sum_{l=1}^{k} p_{\bullet, y_{l}}^{2}$$

Breiman et al. [12] also acknowledge Gini's bias towards attributes with many values. They propose the twoing binary criterion for solving this matter. It belongs to the class of binary criteria, which require attributes to have their domain split into two mutually exclusive subdomains, allowing binary splits only. For every binary criterion, the process of dividing attribute $a_{i}$ values into two subdomains, $d_{1}$ and $d_{2}$, is exhaustive, and the division that maximizes its value is selected for attribute $a_{i}$. In other words, a binary criterion $\beta$ is tested over all possible subdomains in order to provide the optimal binary split, $\beta^{*}$:
$$\beta^{*}=\max _{d_{1}, d_{2}} \beta\left(a_{i}, d_{1}, d_{2}, \mathbf{X}, y\right)$$
s.t.
$$\begin{aligned} &d_{1} \cup d_{2}=\operatorname{dom}\left(a_{i}\right) \\ &d_{1} \cap d_{2}=\emptyset \end{aligned}$$
Now that we have defined binary criteria, the twoing binary criterion is given by:
$$\beta^{\text {twoing }}\left(a_{i}, d_{1}, d_{2}, \mathbf{X}, y\right)=0.25 \times p_{d_{1}, \bullet} \times p_{d_{2}, \bullet} \times\left(\sum_{l=1}^{k} \operatorname{abs}\left(p_{y_{l} \mid d_{1}}-p_{y_{l} \mid d_{2}}\right)\right)^{2}$$
where $a b s(.)$ returns the absolute value.
Friedman [38] and Rounds [99] propose a binary criterion based on the Kolmogorov-Smirnov (KS) distance for handling binary-class problems:
$$\beta^{K S}\left(a_{i}, d_{1}, d_{2}, \mathbf{X}, y\right)=\operatorname{abs}\left(p_{d_{1} \mid y_{1}}-p_{d_{1} \mid y_{2}}\right)$$
Haskell and Noui-Mehidi [45] propose extending $\beta^{K S}$ for handling multi-class problems. Utgoff and Clouse [120] also propose a multi-class extension to $\beta^{K S}$, as well as missing data treatment, and they present empirical results which show their criterion is similar in accuracy to Quinlan’s gain ratio, but produces smaller-sized trees.
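
A short sketch of the Gini index and the twoing criterion for one candidate binary split, with class counts given for each subdomain (an assumed interface):

```python
def gini(class_counts):
    """Gini index: 1 - sum of squared class probabilities."""
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

def twoing(d1_counts, d2_counts):
    """0.25 * p(d1) * p(d2) * (sum_l |p(y_l|d1) - p(y_l|d2)|)^2."""
    n1, n2 = sum(d1_counts), sum(d2_counts)
    n = n1 + n2
    spread = sum(abs(c1 / n1 - c2 / n2) for c1, c2 in zip(d1_counts, d2_counts))
    return 0.25 * (n1 / n) * (n2 / n) * spread ** 2

print(gini([5, 5]), twoing([4, 1], [1, 4]))
```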

## Decision-Tree Induction

## Origins

Automatically generating rules in the form of decision trees has been object of study of most research fields in which data exploration techniques have been developed [78]. Disciplines like engineering (pattern recognition), statistics, decision theory, and more recently artificial intelligence (machine learning) have a large number of studies dedicated to the generation and application of decision trees.

In statistics, we can trace the origins of decision trees to research that proposed building binary segmentation trees for understanding the relationship between target and input attributes. Some examples are AID [107], MAID [40], THAID [76], and CHAID [55]. The application that motivated these studies is survey data analysis. In engineering (pattern recognition), research on decision trees was motivated by the need to interpret images from remote sensing satellites in the 1970s [46]. Decision trees, and induction methods in general, arose in machine learning to avoid the knowledge acquisition bottleneck for expert systems [78].

Specifically regarding top-down induction of decision trees (by far the most popular approach of decision-tree induction), Hunt’s Concept Learning System (CLS) [49] can be regarded as the pioneering work for inducing decision trees. Systems that directly descend from Hunt’s CLS are ID3 [91], ACLS [87], and Assistant [57].

## Basic Concepts

Decision trees are an efficient nonparametric method that can be applied either to classification or to regression tasks. They are hierarchical data structures for supervised learning whereby the input space is split into local regions in order to predict the dependent variable [2].

A decision tree can be seen as a graph $G=(V, E)$ consisting of a finite, nonempty set of nodes (vertices) $V$ and a set of edges $E$. Such a graph has to satisfy the following properties [101]:

• The edges must be ordered pairs $(v, w)$ of vertices, i.e., the graph must be directed;
• There can be no cycles within the graph, i.e., the graph must be acyclic;
• There is exactly one node, called the root, which no edges enter;
• Every node, except for the root, has exactly one entering edge;
• There is a unique path-a sequence of edges of the form $\left(v_{1}, v_{2}\right),\left(v_{2}, v_{3}\right), \ldots$, $\left(v_{n-1}, v_{n}\right)$-from the root to each node;
• When there is a path from node $v$ to $w, v \neq w, v$ is a proper ancestor of $w$ and $w$ is a proper descendant of $v$. A node with no proper descendant is called a leaf (or a terminal). All others are called internal nodes (except for the root).

Root and internal nodes hold a test over a given data set attribute (or a set of attributes), and the edges correspond to the possible outcomes of the test. Leaf nodes can either hold class labels (classification), continuous values (regression), (non-) linear models (regression), or even models produced by other machine learning algorithms. For predicting the dependent variable value of a certain instance, one has to navigate through the decision tree. Starting from the root, one has to follow the edges according to the results of the tests over the attributes. When reaching a leaf node, the information it contains is responsible for the prediction outcome. For instance, a traditional decision tree for classification holds class labels in its leaves.
Decision trees can be regarded as a disjunction of conjunctions of constraints on the attribute values of instances [74]. Each path from the root to a leaf is actually a conjunction of attribute tests, and the tree itself allows the choice of different paths, that is, a disjunction of these conjunctions.
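
The prediction procedure described above (start at the root, follow the edge matching each test outcome, and return the information stored in the leaf) can be written in a few lines. The node fields used here (`test`, `children`, `label`) are an assumed representation, not taken from any particular system.

```python
class Node:
    def __init__(self, label=None, test=None, children=None):
        self.label = label                 # prediction held by a leaf
        self.test = test                   # function: instance -> edge key
        self.children = children or {}     # edge key -> child Node

def predict(node, instance):
    """Navigate from the root to a leaf and return the leaf's prediction."""
    while node.children:
        node = node.children[node.test(instance)]
    return node.label

# A one-test tree: predict "A" if a1 <= 5, otherwise "B"
tree = Node(test=lambda x: x["a1"] <= 5,
            children={True: Node(label="A"), False: Node(label="B")})
print(predict(tree, {"a1": 3.2}))   # -> A
```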

Other important definitions regarding decision trees are the concepts of depth and breadth. The average number of layers (levels) from the root node to the terminal nodes is referred to as the average depth of the tree. The average number of internal nodes in each level of the tree is referred to as the average breadth of the tree. Both depth and breadth are indicators of tree complexity, that is, the higher their values are, the more complex the corresponding decision tree is.

In Fig. 2.1, an example of a general decision tree for classification is presented. Circles denote the root and internal nodes whilst squares denote the leaf nodes. In this particular example, the decision tree is designed for classification and thus the leaf nodes hold class labels.

There are many decision trees that can be grown from the same data. Induction of an optimal decision tree from data is considered to be a hard task. For instance, Hyafil and Rivest [50] have shown that constructing a minimal binary tree with regard to the expected number of tests required for classifying an unseen object is an NP-complete problem. Hancock et al. [43] have proved that finding a minimal decision tree consistent with the training set is NP-Hard, which is also the case of finding the minimal equivalent decision tree for a given decision tree [129], and building the optimal decision tree from decision tables [81]. These papers indicate that growing optimal decision trees (a brute-force approach) is only feasible in very small problems.

Hence, the development of heuristics for solving the problem of growing decision trees became necessary. In that sense, several approaches developed in the last three decades are capable of providing reasonably accurate, if suboptimal, decision trees in a reduced amount of time. Among these approaches, there is a clear preference in the literature for algorithms that rely on a greedy, top-down, recursive partitioning strategy for the growth of the tree (top-down induction).

## Top-Down Induction

Hunt's Concept Learning System framework (CLS) [49] is said to be the pioneer work in top-down induction of decision trees. CLS attempts to minimize the cost of classifying an object. Cost, in this context, refers to two different concepts: the measurement cost of determining the value of a certain property (attribute) exhibited by the object, and the cost of classifying the object as belonging to class $j$ when it actually belongs to class $k$. At each stage, CLS exploits the space of possible decision trees to a fixed depth, chooses an action to minimize cost in this limited space, then moves one level down in the tree.

At a higher level of abstraction, Hunt's algorithm can be recursively defined in only two steps. Let $\mathbf{X}_{t}$ be the set of training instances associated with node $t$ and $y=\left\{y_{1}, y_{2}, \ldots, y_{k}\right\}$ be the class labels in a $k$-class problem [110]:

1. If all the instances in $\mathbf{X}_{t}$ belong to the same class $y_{t}$, then $t$ is a leaf node labeled as $y_{t}$.
2. If $\mathbf{X}_{t}$ contains instances that belong to more than one class, an attribute test condition is selected to partition the instances into smaller subsets. A child node is created for each outcome of the test condition and the instances in $\mathbf{X}_{t}$ are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node (a sketch of this recursion is given below).
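
A minimal sketch of this recursion, under assumptions that are not part of the original description: instances are (attribute-dict, label) pairs, the test condition is chosen by a caller-supplied `select_test` function, and a majority-class fallback is added for the case where no test separates the data (Hunt's simplified algorithm, as noted below, assumes consistent data).

```python
from collections import Counter

class Node:
    def __init__(self, label=None, test=None, children=None):
        self.label, self.test, self.children = label, test, children or {}

def hunt(instances, select_test):
    """instances: list of (attribute_dict, class_label) pairs."""
    labels = [label for _, label in instances]
    if len(set(labels)) == 1:                      # step 1: pure node becomes a leaf
        return Node(label=labels[0])
    test = select_test(instances)                  # step 2: pick an attribute test
    subsets = {}
    if test is not None:
        for x, label in instances:                 # distribute instances by outcome
            subsets.setdefault(test(x), []).append((x, label))
    if len(subsets) < 2:                           # no useful split: majority leaf
        return Node(label=Counter(labels).most_common(1)[0][0])
    children = {out: hunt(sub, select_test) for out, sub in subsets.items()}
    return Node(test=test, children=children)
```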

Hunt’s simplified algorithm is the basis for all current top-down decision-tree induction algorithms. Nevertheless, its assumptions are too stringent for practical use. For instance, it would only work if every combination of attribute values is present in the training data, and if the training data is inconsistency-free (each combination has a unique class label).

Hunt’s algorithm was improved in many ways. Its stopping criterion, for example, as expressed in step 1, requires all leaf nodes to be pure (i.e., belonging to the same class). In most practical cases, this constraint leads to enormous decision trees, which tend to suffer from overfitting (an issue discussed later in this chapter). Possible solutions to overcome this problem include prematurely stopping the tree growth when a minimum level of impurity is reached, or performing a pruning step after the tree has been fully grown (more details on other stopping criteria and on pruning in Sects. 2.3.2 and 2.3.3). Another design issue is how to select the attribute test condition to partition the instances into smaller subsets. In Hunt’s original approach, a cost-driven function was responsible for partitioning the tree. Subsequent algorithms such as ID3 [91, 92] and C4.5 [89] make use of information theory based functions for partitioning nodes in purer subsets (more details on Sect. 2.3.1).

An up-to-date algorithmic framework for top-down induction of decision trees is presented in [98], and we reproduce it in Algorithm 1. It contains three procedures: one for growing the tree (treeGrowing), one for pruning the tree (treePruning) and one to combine those two procedures (inducer). The first issue to be discussed is how to select the test condition $f(A)$, i.e., how to select the best combination of attribute(s) and value(s) for splitting nodes.

## 机器学习代写|决策树作业代写decision tree代考|Basic Concepts

• 边必须是有序对(在,在)顶点数，即图必须是有向的；
• 图内不能有环，即图必须是无环的；
• 只有一个节点，称为根，没有边进入；
• 除根外，每个节点都只有一个进入边；
• 有一条唯一的路径——形式的一系列边(在1,在2),(在2,在3),…, (在n−1,在n)- 从根到每个节点；
• 当有来自节点的路径时在到在,在≠在,在是正确的祖先在和在是正确的后裔在. 没有适当后代的节点称为叶子（或终端）。所有其他都称为内部节点（根除外）。

## 机器学习代写|决策树作业代写decision tree代考|Classification

## 机器学习代写|决策树作业代写decision tree代考|Introduction

Classification, which is the data mining task of assigning objects to predefined categories, is widely used in the process of intelligent decision making. Many classification techniques have been proposed by researchers in machine learning, statistics, and pattern recognition. Such techniques can be roughly divided according to their level of comprehensibility. For instance, techniques that produce interpretable classification models are known as white-box approaches, whereas those that do not are known as black-box approaches. There are several advantages in employing white-box techniques for classification, such as increasing the user’s confidence in the prediction, providing new insight about the classification problem, and allowing the detection of errors either in the model or in the data [12]. Examples of white-box classification techniques are classification rules and decision trees. The latter is the main focus of this book.

A decision tree is a classifier represented by a flowchart-like tree structure that has been widely used to represent classification models, especially due to its comprehensible nature, which resembles human reasoning. In a recent poll on the kdnuggets website [13], decision trees figured as the most used data mining/analytic method by researchers and practitioners, reaffirming their importance in machine learning tasks. Decision-tree induction algorithms present several advantages over other learning algorithms, such as robustness to noise, low computational cost for generating the model, and the ability to deal with redundant attributes [22].

Several attempts at optimising decision-tree induction algorithms have been made by researchers over the last decades, even though the most successful algorithms date back to the mid-80s [4] and early 90s [21]. Many strategies have been employed for deriving accurate decision trees, such as bottom-up induction [1, 17], linear programming [3], hybrid induction [15], and ensembles of trees [5], just to name a few. Nevertheless, no strategy has been more successful in generating accurate and comprehensible decision trees with low computational effort than the greedy top-down induction strategy.

A greedy top-down decision-tree induction algorithm recursively analyses whether a sample of data should be partitioned into subsets according to a given rule, or whether no further partitioning is needed. This analysis takes into account a stopping criterion, for deciding when tree growth should halt, and a splitting criterion, which is responsible for choosing the “best” rule for partitioning a subset. Further improvements over this basic strategy include pruning tree nodes to enhance the tree’s ability to deal with noisy data, as well as strategies for dealing with missing values, imbalanced classes, and oblique splits, among others.

A very large number of approaches have been proposed in the literature for each of these design components of decision-tree induction algorithms. For instance, new node-splitting measures tailored to a vast number of application domains have been proposed, as well as many different strategies for selecting multiple attributes to compose the node rule (multivariate splits). There are even studies in the literature that survey the numerous approaches for pruning a decision tree [6, 9]. It is clear that by improving these design components, more effective decision-tree induction algorithms can be obtained.

## 机器学习代写|决策树作业代写decision tree代考|Book Outline

This book is structured into 7 chapters, as follows.

Chapter 2 [Decision-Tree Induction]. This chapter presents the origins, basic concepts, and detailed components of top-down induction, as well as other decision-tree induction strategies.

Chapter 3 [Evolutionary Algorithms and Hyper-Heuristics]. This chapter covers the origins, basic concepts, and techniques for both Evolutionary Algorithms and Hyper-Heuristics.

Chapter 4 [HEAD-DT: Automatic Design of Decision-Tree Induction Algorithms]. This chapter introduces and discusses the hyper-heuristic evolutionary algorithm that is capable of automatically designing decision-tree algorithms. Details such as the evolutionary scheme, building blocks, fitness evaluation, selection, genetic operators, and search space are covered in depth.

Chapter 5 [HEAD-DT: Experimental Analysis]. This chapter presents a thorough empirical analysis of the distinct scenarios in which HEAD-DT may be applied. In addition, it discusses the cost-effectiveness of automatic design, presents examples of automatically designed algorithms, and provides a baseline comparison between genetic and random search.

Chapter 6 [HEAD-DT: Fitness Function Analysis]. This chapter investigates 15 distinct versions of HEAD-DT obtained by varying its fitness function, and describes a new set of experiments with the best-performing strategies on balanced and imbalanced data sets.

Chapter 7 [Conclusions]. We finish this book by presenting the current limitations of the automatic design, as well as our view of several exciting opportunities for future work.

## 机器学习代写|决策树作业代写decision tree代考|Decision-tree induction algorithms

Abstract Decision-tree induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model and satisfactory accuracy levels in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for decision-tree induction: top-down induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for induction of decision trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building decision-tree induction algorithms. These design choices will be especially interesting when designing an evolutionary algorithm for evolving decision-tree induction algorithms.

Keywords: Decision trees · Hunt’s algorithm · Top-down induction · Design components
