## WHAT IS IMBALANCED CLASSIFICATION

Imbalanced classification involves datasets whose classes are unevenly represented. For example, suppose that class A has $99 \%$ of the data and class B has $1 \%$. Which classification algorithm would you use? Unfortunately, many classification algorithms don’t work well with this type of imbalanced dataset, because a model can reach high accuracy simply by always predicting the majority class. Here is a list of several well-known techniques for handling imbalanced datasets:

• Random resampling rebalances the class distribution.
• Random oversampling duplicates data in the minority class.
• Random undersampling deletes examples from the majority class.
• SMOTE synthesizes new samples for the minority class.

Random resampling transforms the training dataset into a new dataset with a more balanced class distribution, which can be effective for imbalanced classification problems.
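
To make the problem with a $99 \% / 1 \%$ split concrete, here is a minimal sketch that uses only the Python standard library. The labels and the always-predict-A "classifier" are hypothetical and only illustrate why accuracy alone is misleading on imbalanced data.

```python
# Hypothetical labels: 990 samples of class "A" and 10 of class "B" (a 99%/1% split).
y_true = ["A"] * 990 + ["B"] * 10

# A trivial "classifier" that ignores its input and always predicts the majority class.
y_pred = ["A"] * len(y_true)

# Accuracy: the fraction of all samples that were labeled correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for class B: the fraction of true B samples that were predicted as B.
b_total = sum(t == "B" for t in y_true)
b_found = sum(t == "B" and p == "B" for t, p in zip(y_true, y_pred))
recall_b = b_found / b_total

print(f"accuracy = {accuracy:.2f}")      # accuracy = 0.99
print(f"recall for B = {recall_b:.2f}")  # recall for B = 0.00
```

The model looks excellent by accuracy, yet it never identifies a single minority sample, which is usually the class of interest.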

The random undersampling technique removes samples from the dataset, and involves the following:

• randomly remove samples from majority class
• can be performed with or without replacement
• alleviates imbalance in the dataset
• may increase the variance of the classifier
• may discard useful or important samples

However, random undersampling does not work well with a dataset that has a $99 \% / 1 \%$ split into two classes, because balancing the classes means discarding almost all of the majority class. More generally, undersampling can result in losing information that is useful for a model.
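
The undersampling procedure described above can be sketched as follows. This is an illustrative standard-library version, not the implementation of any particular package; the function name `random_undersample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        kept = rng.sample(xs, target)  # sampling without replacement
        out_samples.extend(kept)
        out_labels.extend([y] * target)
    return out_samples, out_labels

# Hypothetical toy data: 990 majority ("A") and 10 minority ("B") samples.
samples = list(range(1000))
labels = ["A"] * 990 + ["B"] * 10
new_samples, new_labels = random_undersample(samples, labels)
print(Counter(new_labels))
```

With this toy data, 980 of the 990 majority samples are thrown away to match the 10 minority samples, which shows how much information can be lost when the imbalance is severe.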

Instead of random undersampling, another approach involves generating new samples from a minority class. The first technique is random oversampling, which duplicates randomly selected examples from the minority class.
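
As a rough illustration of oversampling by duplication, the following standard-library sketch repeats randomly chosen minority samples until the classes are the same size. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = rng.choices(xs, k=target - len(xs))  # sampling with replacement
        out_samples.extend(xs + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels

# Hypothetical toy data: 990 majority ("A") and 10 minority ("B") samples.
samples = list(range(1000))
labels = ["A"] * 990 + ["B"] * 10
over_samples, over_labels = random_oversample(samples, labels)
print(Counter(over_labels))
```

No new information is created here: every added minority sample is an exact copy of an existing one.
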
A second technique, which is often more effective than duplication, involves the following:

• synthesize new examples from the minority class
• a type of data augmentation for tabular data
• this technique can be very effective

A well-known technique of this kind is SMOTE (Synthetic Minority Oversampling Technique), which performs data augmentation (i.e., synthesizing new data samples) before you train a classification algorithm. SMOTE was initially developed by means of the kNN algorithm (other options are available), and it can be an effective technique for handling imbalanced classes.

Yet another option to consider is the Python package imbalanced-learn in the scikit-learn-contrib project. This project provides various resampling techniques for datasets that exhibit class imbalance. More details are available online:
https://github.com/scikit-learn-contrib/imbalanced-learn.

## WHAT IS SMOTE

SMOTE is a technique for synthesizing new samples for a dataset. This technique is based on linear interpolation:

• Step 1: Select samples that are close in the feature space.
• Step 2: Draw a line between the samples in the feature space.
• Step 3: Create a new sample at a point along that line.

A more detailed explanation of the SMOTE algorithm is as follows:

• Select a random sample “a” from the minority class.
• Find the $\mathrm{k}$ nearest neighbors of that sample within the minority class.
• Select a random neighbor “b” from the nearest neighbors.
• Create a line segment “L” that connects “a” and “b.”
• Randomly select one or more points “c” on L.

If need be, you can repeat this process for the other $(\mathrm{k}-1)$ nearest neighbors to distribute the synthetic values more evenly among the nearest neighbors.
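
The steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions (Euclidean distance, one synthetic point per iteration, a brute-force neighbor search), not the reference SMOTE implementation; the function name `smote_sketch` and the sample points are hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Select a random minority sample "a".
        i = rng.randrange(len(minority))
        a = minority[i]
        # Find the k nearest minority neighbors of "a" (excluding "a" itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(a, p))[:k]
        # Select a random neighbor "b".
        b = rng.choice(neighbors)
        # Pick a random point "c" on the segment that connects "a" and "b".
        t = rng.random()
        c = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
        synthetic.append(c)
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
new_points = smote_sketch(minority, n_new=10)
for point in new_points[:3]:
    print(point)
```

Because each synthetic point is a linear interpolation between two existing minority samples, it always lies on the segment joining them and never falls outside the region the minority class already occupies.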

The initial SMOTE algorithm is based on the kNN algorithm, and it has since been extended in various ways, such as replacing $\mathrm{kNN}$ with SVM. A list of SMOTE extensions is shown as follows:

• selective synthetic sample generation
• Borderline-SMOTE (kNN)
• Borderline-SMOTE (SVM)

## ANALYZING CLASSIFIERS

This section is marked “optional” because its contents pertain to machine learning classifiers, which are not the focus of this book. However, it’s still worthwhile to glance through the material, or perhaps return to this section after you have a basic understanding of machine learning classifiers.

Several well-known techniques are available for analyzing the quality of machine learning classifiers. Two techniques are LIME and ANOVA, both of which are discussed in the following subsections.

LIME is an acronym for Local Interpretable Model-Agnostic Explanations. LIME is a model-agnostic technique that can be used with machine learning models. In LIME, you make small random changes to data samples and then observe the manner in which predictions change (or not). The approach involves changing the input (slightly) and then observing what happens to the output.

By way of analogy, consider food inspectors who test for bacteria in truckloads of perishable food. Clearly, it’s infeasible to test every food item in a truck (or a train car), so inspectors perform “spot checks” that involve testing randomly selected items. In an analogous fashion, LIME makes small changes to input data in random locations and then analyzes the changes in the associated output values.

However, there are two caveats to keep in mind when you use LIME with input data for a given model:

1. The actual changes to input values are model-specific.
2. This technique works on input that is interpretable.

Examples of interpretable input include machine learning classifiers (such as trees and random forests) and NLP techniques such as BoW (Bag of Words). Non-interpretable input involves “dense” data, such as a word embedding (which is a vector of floating point numbers).

You could also substitute your model with another model that involves interpretable data, but then you need to evaluate how accurate the approximation is to the original model.
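
The perturb-and-observe idea behind LIME can be illustrated with a small standard-library sketch. This is a deliberately simplified stand-in, not the algorithm used by the LIME library: it estimates one proximity-weighted slope per feature instead of fitting a full interpretable surrogate model. The `black_box` function, the `local_slopes` helper, and all parameter values are hypothetical.

```python
import math
import random

def black_box(x):
    """Hypothetical model to explain: depends on x[0] and x[2], ignores x[1]."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[2])))

def local_slopes(predict, x, n_samples=2000, scale=0.1, width=0.25, seed=0):
    """Estimate, per feature, a proximity-weighted local slope of predict around x."""
    rng = random.Random(seed)
    base = predict(x)
    num = [0.0] * len(x)
    den = [0.0] * len(x)
    for _ in range(n_samples):
        # Make a small random change to every input feature.
        delta = [rng.gauss(0.0, scale) for _ in x]
        # Observe how the prediction changes.
        change = predict([xi + di for xi, di in zip(x, delta)]) - base
        # Perturbations closer to x receive a larger weight.
        dist2 = sum(d * d for d in delta)
        w = math.exp(-dist2 / (2.0 * width ** 2))
        for j, d in enumerate(delta):
            num[j] += w * d * change
            den[j] += w * d * d
    # Weighted least-squares slope (through the origin) for each feature.
    return [n / d for n, d in zip(num, den)]

slopes = local_slopes(black_box, [0.0, 0.0, 0.0])
for j, s in enumerate(slopes):
    print(f"feature {j}: local slope {s:+.3f}")
```

In this sketch the first feature should receive a clearly positive slope, the third a negative one, and the second a slope near zero, which matches how `black_box` was constructed. As noted above, any such approximation still has to be checked against the original model.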
