Machine Learning Assignment Help | Decision Tree Assignment Help | Decision-Tree Induction

A decision tree is a decision-support tool that uses a tree-like model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility. It is one way of displaying an algorithm that consists only of conditional control statements.


Machine Learning Assignment Help | Decision Tree Assignment Help | Origins

Automatically generating rules in the form of decision trees has been an object of study in most research fields in which data exploration techniques have been developed [78]. Disciplines like engineering (pattern recognition), statistics, decision theory, and more recently artificial intelligence (machine learning) have a large number of studies dedicated to the generation and application of decision trees.

In statistics, we can trace the origins of decision trees to research that proposed building binary segmentation trees for understanding the relationship between target and input attributes. Some examples are AID [107], MAID [40], THAID [76], and CHAID [55]. The application that motivated these studies is survey data analysis. In engineering (pattern recognition), research on decision trees was motivated by the need to interpret images from remote sensing satellites in the 1970s [46]. Decision trees, and induction methods in general, arose in machine learning to avoid the knowledge acquisition bottleneck for expert systems [78].

Specifically regarding top-down induction of decision trees (by far the most popular approach to decision-tree induction), Hunt’s Concept Learning System (CLS) [49] can be regarded as the pioneering work for inducing decision trees. Systems that directly descend from Hunt’s CLS are ID3 [91], ACLS [87], and Assistant [57].

Machine Learning Assignment Help | Decision Tree Assignment Help | Basic Concepts

Decision trees are an efficient nonparametric method that can be applied either to classification or to regression tasks. They are hierarchical data structures for supervised learning whereby the input space is split into local regions in order to predict the dependent variable [2].

A decision tree can be seen as a graph $G=(V, E)$ consisting of a finite, nonempty set of nodes (vertices) $V$ and a set of edges $E$. Such a graph has to satisfy the following properties [101] (a minimal data-structure sketch reflecting them follows the list):

  • The edges must be ordered pairs $(v, w)$ of vertices, i.e., the graph must be directed;
  • There can be no cycles within the graph, i.e., the graph must be acyclic;
  • There is exactly one node, called the root, which no edges enter;
  • Every node, except for the root, has exactly one entering edge;
  • There is a unique path, i.e., a sequence of edges of the form $\left(v_{1}, v_{2}\right),\left(v_{2}, v_{3}\right), \ldots, \left(v_{n-1}, v_{n}\right)$, from the root to each node;
  • When there is a path from node $v$ to $w$, $v \neq w$, $v$ is a proper ancestor of $w$ and $w$ is a proper descendant of $v$. A node with no proper descendant is called a leaf (or a terminal). All others are called internal nodes (except for the root).
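
To make the definition concrete, here is a minimal sketch of such a node in Python. It is our illustration, not code from the text; all names (`DecisionNode`, `test`, `children`, `label`) are assumptions. Storing the children in a dictionary keyed by test outcome gives each non-root node exactly one entering edge when trees are built top-down, which matches the properties above.

```python
# A minimal sketch of the tree structure described above (illustrative names).
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class DecisionNode:
    # Root/internal nodes hold a test over an attribute; leaves hold a prediction.
    test: Optional[Callable[[dict], Any]] = None  # None for a leaf node
    # One child per possible outcome of the test (the outgoing edges).
    children: Dict[Any, "DecisionNode"] = field(default_factory=dict)
    label: Any = None  # class label (or value/model) stored at a leaf

    def is_leaf(self) -> bool:
        return not self.children
```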

Root and internal nodes hold a test over a given data set attribute (or a set of attributes), and the edges correspond to the possible outcomes of the test. Leaf nodes can hold class labels (classification), continuous values (regression), (non-)linear models (regression), or even models produced by other machine learning algorithms. For predicting the dependent-variable value of a certain instance, one has to navigate through the decision tree. Starting from the root, one follows the edges according to the results of the tests over the attributes. Upon reaching a leaf node, the information it contains determines the prediction outcome. For instance, a traditional decision tree for classification holds class labels in its leaves.
Decision trees can be regarded as a disjunction of conjunctions of constraints on the attribute values of instances [74]. Each path from the root to a leaf is actually a conjunction of attribute tests, and the tree itself allows the choice of different paths, that is, a disjunction of these conjunctions.
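
Prediction by navigation, as described above, can be written directly against the hypothetical `DecisionNode` sketch; the attribute test is assumed to return the edge key of the matching outcome:

```python
def predict(node: "DecisionNode", instance: dict) -> Any:
    """Navigate from the root, following the edge for each test outcome."""
    while not node.is_leaf():
        node = node.children[node.test(instance)]
    return node.label  # the leaf's information yields the prediction

# Hypothetical usage: a one-split tree testing an 'outlook' attribute.
root = DecisionNode(test=lambda x: x["outlook"],
                    children={"sunny": DecisionNode(label="play"),
                              "rainy": DecisionNode(label="don't play")})
print(predict(root, {"outlook": "sunny"}))  # -> 'play'
```

Note how the single path traversed by `predict` is exactly one conjunction of attribute tests in the disjunction-of-conjunctions view.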

Other important definitions regarding decision trees are the concepts of depth and breadth. The average number of layers (levels) from the root node to the terminal nodes is referred to as the average depth of the tree. The average number of internal nodes in each level of the tree is referred to as the average breadth of the tree. Both depth and breadth are indicators of tree complexity, that is, the higher their values are, the more complex the corresponding decision tree is.
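
As a small illustration (again reusing the hypothetical `DecisionNode`), both quantities can be computed with a level-order traversal; this is our sketch of the definitions above, not code from the chapter, and it counts the root together with the internal nodes as a simplifying assumption:

```python
from collections import deque

def avg_depth_and_breadth(root: "DecisionNode"):
    """Average depth of leaves and average number of non-leaf nodes per level."""
    leaf_depths, nonleaf_per_level = [], {}
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if node.is_leaf():
            leaf_depths.append(level)
        else:
            nonleaf_per_level[level] = nonleaf_per_level.get(level, 0) + 1
            for child in node.children.values():
                queue.append((child, level + 1))
    avg_depth = sum(leaf_depths) / len(leaf_depths)
    avg_breadth = (sum(nonleaf_per_level.values()) / len(nonleaf_per_level)
                   if nonleaf_per_level else 0.0)
    return avg_depth, avg_breadth
```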

In Fig. 2.1, an example of a general decision tree for classification is presented. Circles denote the root and internal nodes whilst squares denote the leaf nodes. In this particular example, the decision tree is designed for classification and thus the leaf nodes hold class labels.

There are many decision trees that can be grown from the same data. Induction of an optimal decision tree from data is considered to be a hard task. For instance, Hyafil and Rivest [50] have shown that constructing a minimal binary tree with regard to the expected number of tests required for classifying an unseen object is an NP-complete problem. Hancock et al. [43] have proved that finding a minimal decision tree consistent with the training set is NP-hard, as is finding the minimal equivalent decision tree of a given decision tree [129] and building the optimal decision tree from decision tables [81]. These papers indicate that growing optimal decision trees (a brute-force approach) is only feasible in very small problems.

Hence, the development of heuristics for growing decision trees became necessary. In that sense, several approaches developed over the last three decades are capable of providing reasonably accurate, if suboptimal, decision trees in a reduced amount of time. Among these approaches, there is a clear preference in the literature for algorithms that rely on a greedy, top-down, recursive partitioning strategy for the growth of the tree (top-down induction).

Machine Learning Assignment Help | Decision Tree Assignment Help | Top-Down Induction

Hunt’s Concept Learning System framework (CLS) [49] is said to be the pioneering work in top-down induction of decision trees. CLS attempts to minimize the cost of classifying an object. Cost, in this context, refers to two different concepts: the measurement cost of determining the value of a certain property (attribute) exhibited by the object, and the cost of classifying the object as belonging to class $j$ when it actually belongs to class $k$. At each stage, CLS explores the space of possible decision trees to a fixed depth, chooses an action to minimize cost in this limited space, then moves one level down in the tree.

At a higher level of abstraction, Hunt’s algorithm can be defined recursively in only two steps (a minimal code sketch of these steps follows the list below). Let $\mathbf{X}_{t}$ be the set of training instances associated with node $t$ and $y=\{y_{1}, y_{2}, \ldots, y_{k}\}$ be the set of class labels in a $k$-class problem [110]:

  1. If all the instances in $\mathbf{X}_{t}$ belong to the same class $y_{t}$, then $t$ is a leaf node labeled as $y_{t}$;
  2. If $\mathbf{X}_{t}$ contains instances that belong to more than one class, an attribute test condition is selected to partition the instances into smaller subsets. A child node is created for each outcome of the test condition, and the instances in $\mathbf{X}_{t}$ are distributed to the children based on the outcomes. The algorithm is then applied recursively to each child node.
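
A minimal sketch of these two steps follows, reusing the hypothetical `DecisionNode` from earlier. It is our illustration, not Hunt's original code: `select_test` stands in for whichever attribute-selection procedure is used, and instances are assumed to be dictionaries with a `"class"` key.

```python
def hunts_algorithm(instances, select_test):
    """Recursive two-step skeleton of Hunt's algorithm (illustrative)."""
    classes = {inst["class"] for inst in instances}
    # Step 1: all instances share one class -> a leaf labeled with that class.
    if len(classes) == 1:
        return DecisionNode(label=classes.pop())
    # Step 2: select a test, partition the instances by its outcomes,
    # and recurse on each non-empty subset.
    test = select_test(instances)
    node = DecisionNode(test=test)
    partitions = {}
    for inst in instances:
        partitions.setdefault(test(inst), []).append(inst)
    for outcome, subset in partitions.items():
        node.children[outcome] = hunts_algorithm(subset, select_test)
    return node
```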

Hunt’s simplified algorithm is the basis for all current top-down decision-tree induction algorithms. Nevertheless, its assumptions are too stringent for practical use. For instance, it would only work if every combination of attribute values were present in the training data and if the training data were inconsistency-free (i.e., each combination had a unique class label).

Hunt’s algorithm was improved in many ways. Its stopping criterion, for example, as expressed in step 1, requires all leaf nodes to be pure (i.e., all of their instances belong to the same class). In most practical cases, this constraint leads to enormous decision trees, which tend to suffer from overfitting (an issue discussed later in this chapter). Possible solutions to overcome this problem include prematurely stopping the tree growth when a minimum level of impurity is reached, or performing a pruning step after the tree has been fully grown (more details on other stopping criteria and on pruning in Sects. 2.3.2 and 2.3.3). Another design issue is how to select the attribute test condition for partitioning the instances into smaller subsets. In Hunt’s original approach, a cost-driven function was responsible for partitioning the tree. Subsequent algorithms such as ID3 [91, 92] and C4.5 [89] make use of information-theory-based functions for partitioning nodes into purer subsets (more details in Sect. 2.3.1).
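
As an illustration of such an information-theoretic criterion, the sketch below computes entropy and the information gain of a candidate split. This is the standard textbook formulation used by ID3, written by us rather than taken from the chapter, under the same instance-as-dictionary assumption as before:

```python
import math

def entropy(instances):
    """Shannon entropy of the class distribution in a set of instances."""
    counts = {}
    for inst in instances:
        counts[inst["class"]] = counts.get(inst["class"], 0) + 1
    n = len(instances)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(instances, test):
    """Entropy reduction achieved by splitting on the given test."""
    partitions = {}
    for inst in instances:
        partitions.setdefault(test(inst), []).append(inst)
    n = len(instances)
    remainder = sum(len(s) / n * entropy(s) for s in partitions.values())
    return entropy(instances) - remainder
```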

An up-to-date algorithmic framework for top-down induction of decision trees is presented in [98], and we reproduce it in Algorithm 1. It contains three procedures: one for growing the tree (treeGrowing), one for pruning the tree (treePruning) and one to combine those two procedures (inducer). The first issue to be discussed is how to select the test condition $f(A)$, i.e., how to select the best combination of attribute(s) and value(s) for splitting nodes.
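
Algorithm 1 itself is not reproduced here; the following schematic, in the same hedged spirit as the earlier sketches, shows how such procedures fit together. `tree_growing` and `tree_pruning` are stand-ins for the framework's treeGrowing and treePruning procedures, and the pluggable stopping rule and leaf labeling (which generalize the pure-leaf criterion of the Hunt sketch) are our assumptions:

```python
def tree_growing(instances, select_test, should_stop, leaf_label):
    """Greedy top-down growth: split until the stopping criterion fires."""
    if should_stop(instances):
        return DecisionNode(label=leaf_label(instances))
    test = select_test(instances)  # best f(A) under the chosen split criterion
    node = DecisionNode(test=test)
    partitions = {}
    for inst in instances:
        partitions.setdefault(test(inst), []).append(inst)
    for outcome, subset in partitions.items():
        node.children[outcome] = tree_growing(subset, select_test,
                                              should_stop, leaf_label)
    return node

def inducer(instances, select_test, should_stop, leaf_label, tree_pruning):
    """Combine growing and pruning, mirroring the framework's inducer."""
    tree = tree_growing(instances, select_test, should_stop, leaf_label)
    return tree_pruning(tree, instances)
```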

