## Machine Learning | NLP | CS11-711

statistics-lab™ supports your study-abroad journey. We have built a solid reputation in NLP assignment help and guarantee reliable, high-quality, original statistics writing services. Our experts have extensive experience with NLP assignments of every kind.

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## Machine Learning | NLP | Input embedding

The input embedding sub-layer converts the input tokens to vectors of dimension $d_{\text{model}}=512$ using learned embeddings in the original Transformer model. The structure of the input embedding is classical:

The embedding sub-layer works like other standard transduction models. A tokenizer transforms a sentence into tokens. Each tokenizer has its own methods, but the results are similar. For example, a tokenizer applied to the sequence "the Transformer is an innovative NLP model!" will, in one type of model, normalize the string to lowercase and truncate words into subword tokens. A tokenizer will generally also provide an integer representation of each token that will be used for the embedding process.
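As a minimal sketch of this interface (a toy lowercase word-and-punctuation splitter, not the book's actual subword tokenizer such as BPE or WordPiece), tokens and their integer IDs might be produced like this:

```python
# Illustrative tokenizer: lowercase, split into word and punctuation tokens,
# then map each unique token to an integer ID. Real subword tokenizers
# (BPE, WordPiece) are more sophisticated; this only shows the interface.
import re

def tokenize(text):
    # lowercase, then match runs of letters/digits or single punctuation marks
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

def build_vocab(tokens):
    # assign each unique token an integer ID in order of first appearance
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = tokenize("the Transformer is an innovative NLP model!")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['the', 'transformer', 'is', 'an', 'innovative', 'nlp', 'model', '!']
print(ids)     # [0, 1, 2, 3, 4, 5, 6, 7]
```

The integer IDs are what the embedding sub-layer consumes; the strings themselves never reach the model.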

There is not enough information in the tokenized text at this point to go further. The tokenized text must be embedded.
The Transformer contains a learned embedding sub-layer. Many embedding methods can be applied to the tokenized input.
I chose the skip-gram architecture of the word2vec embedding approach, which Google made available in 2013, to illustrate the embedding sub-layer of the Transformer. A skip-gram focuses on a center word in a window of words and predicts its context words. For example, if word(i) is the center word in a two-step window, a skip-gram model will analyze word(i-2), word(i-1), word(i+1), and word(i+2). The window then slides and the process repeats. A skip-gram model generally contains an input layer, weights, a hidden layer, and an output containing the word embeddings of the tokenized input words.
Suppose we need to perform embedding for the following sentence:
The black cat sat on the couch and the brown dog slept on the rug.
We will focus on two words, black and brown. The word embedding vectors of these two words should be similar.
Since we must produce a vector of size $d_{\text{model}}=512$ for each word, we will obtain a 512-dimensional embedding for each word. The word black is now represented by 512 dimensions. Other embedding methods could be used, and $d_{\text{model}}$ could have a higher number of dimensions.
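As a sketch of the idea (with made-up vectors, not trained word2vec weights: "brown" is constructed near "black" purely for illustration), an embedding lookup maps each token to a $d_{\text{model}}=512$ vector, and related words should end up with a high cosine similarity:

```python
import numpy as np

d_model = 512
rng = np.random.default_rng(0)

# Toy embedding table: in a real model these rows are learned
# (e.g., by skip-gram training).
embeddings = {
    "black": rng.normal(size=d_model),
    "cat": rng.normal(size=d_model),
}
# "brown" placed deliberately close to "black" to mimic what training
# would produce for similar words.
embeddings["brown"] = embeddings["black"] + 0.1 * rng.normal(size=d_model)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(embeddings["black"].shape)                         # (512,)
print(cosine(embeddings["black"], embeddings["brown"]))  # close to 1.0
print(cosine(embeddings["black"], embeddings["cat"]))    # near 0 for unrelated random vectors
```

The 512-dimensional row for black is exactly what the positional encoding step, discussed next, will modify.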

## Machine Learning | NLP | Positional encoding

We enter this positional encoding function of the Transformer with no idea of the position of a word in a sequence:

We cannot create independent positional vectors, which would impose a high cost on the training speed of the Transformer and make the attention sub-layers very complex to work with. The idea is instead to add a positional encoding value to the input embedding rather than using additional vectors to describe the position of a token in a sequence.
We also know that the Transformer expects a fixed size $d_{\text{model}}=512$ (or another constant value for the model) for each vector of the output of the positional encoding function.
If we go back to the sentence we used in the word embedding sub-layer, we can see that black and brown may be similar, but they are far apart:
The black cat sat on the couch and the brown dog slept on the rug.
The word black is in position 2, pos $=2$, and the word brown is in position 10, pos $=10$.
Our problem is to find a way to add a value to the word embedding of each word so that it carries that information. However, we need to add a value to each of the $d_{\text{model}}=512$ dimensions! For each word embedding vector, we need to find a way to provide information to the $i$ in the range $(0, 512)$ dimensions of the word embedding vectors of black and brown.

There are many ways to achieve this goal. The designers found a clever way to use a unit sphere to represent positional encoding with sine and cosine values, which thus remain small but very useful.
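The original paper defines the encoding for position pos and dimension index $i$ as $PE_{(pos,2i)}=\sin(pos/10000^{2i/d_{\text{model}}})$ and $PE_{(pos,2i+1)}=\cos(pos/10000^{2i/d_{\text{model}}})$. A straightforward NumPy version:

```python
import numpy as np

def positional_encoding(max_len, d_model=512):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

pe = positional_encoding(16)
print(pe.shape)  # (16, 512)
# pos=2 (black) and pos=10 (brown) receive distinct encodings that are
# added to their otherwise similar word embeddings.
```

Every value stays in $[-1, 1]$, so adding the encoding perturbs but does not overwhelm the word embedding.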


## Finite Element Method Assignment Help

As a professional service provider for international students, statistics-lab has for many years served students in popular study destinations such as the United States, the United Kingdom, Canada, and Australia, offering academic services including but not limited to essays, assignments, dissertations, reports, group projects, proposals, papers, presentations, computer science assignments, proofreading and polishing, online course assistance, and exam support. Coverage spans high school, undergraduate, and graduate study, across subjects such as finance, economics, accounting, auditing, and management. The writing team includes native English speakers as well as graduate students from leading universities abroad, each with strong language skills, subject expertise, and academic writing experience. We promise 100% originality, professionalism, punctuality, and satisfaction.

## MATLAB Assignment Help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include mathematical and computational algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface construction. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This lets you solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar, non-interactive language such as C or Fortran. The name MATLAB stands for "matrix laboratory." MATLAB was originally written to provide easy access to the matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in matrix computation software. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most MATLAB users, toolboxes let you learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas covered by toolboxes include signal processing, control systems, neural networks, fuzzy logic, wavelets, and simulation.

## Machine Learning | NLP | CS4650


## Machine Learning | NLP | The rise of the Transformer: Attention Is All You Need

In December 2017, Vaswani et al. published their seminal paper, Attention Is All You Need. They performed their work at Google Research and Google Brain. I will refer to the model described in Attention Is All You Need as the “original Transformer model” throughout this chapter and book.

In this section, we will look at the Transformer model they built from the outside. In the following sections, we will explore what is inside each component of the model.
The original Transformer model is a stack of 6 layers. The output of layer $l$ is the input of layer $l+1$ until the final prediction is reached. There is a 6-layer encoder stack on the left and a 6-layer decoder stack on the right:

On the left, the inputs enter the encoder side of the Transformer through an attention sub-layer and FeedForward Network (FFN) sub-layer. On the right, the target outputs go into the decoder side of the Transformer through two attention sub-layers and an FFN sub-layer. We immediately notice that there is no RNN, LSTM, or CNN. Recurrence has been abandoned.
Attention has replaced recurrence, which requires an increasing number of operations as the distance between two words increases. The attention mechanism is a “word-to-word” operation. The attention mechanism will find how each word is related to all other words in a sequence, including the word being analyzed itself. Let’s examine the following sequence:

The attention mechanism will provide a deeper relationship between words and produce better results.
For each attention sub-layer, the original Transformer model runs not one but eight attention mechanisms in parallel to speed up the calculations. We will explore this architecture in the following section, The encoder stack. This process is named "multi-head attention," providing:

• A broader in-depth analysis of sequences
• The preclusion of recurrence, reducing calculation operations
• The implementation of parallelization, which reduces training time
• Different perspectives of the same input sequence learned by each attention mechanism
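Each head computes scaled dot-product attention, $\mathrm{softmax}(QK^T/\sqrt{d_k})\,V$. A minimal single-head sketch in NumPy, with random matrices standing in for the learned query, key, and value projections:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # scores: how strongly each query token attends to every key token
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable row-wise softmax -> weights summing to 1 per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a weighted mix of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(42)
seq_len, d_k = 5, 64          # 64 = 512 / 8 heads in the original model
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (5, 64)
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

With eight such heads running on $d_k = 64$ slices, the concatenated outputs recover the full $d_{\text{model}} = 512$ dimension.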

## Machine Learning | NLP | The encoder stack

The encoder and decoder of the original Transformer model are stacks of layers. Each layer of the encoder stack has the following structure:

The original encoder layer structure remains the same for all of the $N=6$ layers of the Transformer model. Each layer contains two main sub-layers: a multi-headed attention mechanism and a fully connected position-wise feedforward network.
Notice that a residual connection surrounds each main sub-layer, Sublayer $(x)$, in the Transformer model. These connections transport the unprocessed input $x$ of a sublayer to a layer normalization function. This way, we are certain that key information such as positional encoding is not lost on the way. The normalized output of each layer is thus:
$\text{LayerNormalization}(x + \text{Sublayer}(x))$
Though the structure of each of the $N=6$ layers of the encoder is identical, the content of each layer is not strictly identical to the previous layer.
For example, the embedding sub-layer is only present at the bottom level of the stack. The other five layers do not contain an embedding layer, and this guarantees that the encoded input is stable through all the layers.

Also, the multi-head attention mechanisms perform the same functions from layer 1 to layer 6. However, they do not perform the same tasks. Each layer learns from the previous layer and explores different ways of associating the tokens in the sequence. It looks for various associations of words, just like how we look for different associations of letters and words when we solve a crossword puzzle.
The designers of the Transformer introduced a very efficient constraint. The output of every sub-layer of the model has a constant dimension, including the embedding layer and the residual connections. This dimension is $d_{\text{model}}$ and can be set to another value depending on your goals. In the original Transformer architecture, $d_{\text{model}}=512$.
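A sketch of the residual-plus-normalization wrapper described above (illustrative only: it omits the learned gain and bias of the paper's LayerNorm, and the sub-layer here is an identity stand-in for attention or the FFN):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # normalize each position's d_model features to zero mean, unit variance
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def residual_block(x, sublayer):
    # LayerNormalization(x + Sublayer(x)): the raw input x is carried
    # around the sub-layer, so information such as positional encoding
    # is not lost on the way.
    return layer_norm(x + sublayer(x))

d_model = 512
x = np.random.default_rng(1).normal(size=(5, d_model))
# identity sub-layer stand-in; a real layer would apply attention or an FFN
y = residual_block(x, sublayer=lambda h: h @ np.eye(d_model))
print(y.shape)  # (5, 512)
```

Because every sub-layer output keeps the same $d_{\text{model}}$ dimension, these blocks can be chained freely through all six layers.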



## Machine Learning | NLP | CS224n


## Machine Learning | NLP | Getting Started with the Model Architecture of the Transformer

Language is the essence of human communication. Civilizations would never have been born without the word sequences that form language. We now mostly live in a world of digital representations of language. Our daily lives rely on Natural Language Processing (NLP) digitalized language functions: web search engines, emails, social networks, posts, tweets, smartphone texting, translations, web pages, speech-to-text on streaming sites for transcripts, text-to-speech on hotline services, and many more everyday functions.

In December 2017, the seminal Attention Is All You Need article by Vaswani et al., members of Google Brain and Google Research, was published. The Transformer was born. The Transformer outperformed the existing state-of-the-art NLP models. It trained faster than previous architectures and obtained higher evaluation results. Transformers have become a key component of NLP.
The digital world would never have existed without NLP. Natural Language Processing would have remained primitive and inefficient without artificial intelligence. However, the use of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) comes at a tremendous cost in terms of calculations and machine power.

In this chapter, we will first start with the background of NLP that led to the rise of the Transformer. We will briefly go from early NLP to RNNs and CNNs. Then we will see how the Transformer overthrew the reign of RNNs and CNNs, which had prevailed for decades for sequence analysis.

Then we will open the hood of the Transformer model described by Vaswani et al. (2017) and examine the key components of its architecture. We will explore the fascinating world of attention and illustrate the key components of the Transformer.
This chapter covers the following topics:

• The background of the Transformer
• The architecture of the Transformer
• The Transformer’s self-attention model
• The encoding and decoding stacks
• Input and output embedding
• Positional embedding
• Self-attention
• Residual connections
• Normalization
• Feedforward network
• Output probabilities
Our first step will be to explore the background of the Transformer.

## Machine Learning | NLP | The background of the Transformer

In this section, we will go through the background of NLP that led to the Transformer. The Transformer model invented by Google Research has toppled decades of Natural Language Processing research, development, and implementations.
Let us first see how that happened when NLP reached a critical limit that required a new approach.

Over the past $100+$ years, many great minds have worked on sequence transduction and language modeling. Machines progressively learned how to predict probable sequences of words. It would take a whole book to cite all the giants that made this happen.

In this section, I will share my favorite researchers with you to lay the ground for the arrival of the Transformer.

In the early $20^{\text{th}}$ century, Andrey Markov introduced the concept of random values and created a theory of stochastic processes. We know them in artificial intelligence (AI) as Markov Decision Processes (MDPs), Markov Chains, and Markov Processes. In 1902, Markov showed that we could predict the next element of a chain, a sequence, using only the last element of that chain. In 1913, he applied this to a 20,000-letter dataset, using past sequences to predict the future letters of a chain. Bear in mind that he had no computer but managed to prove his theory, which is still in use today in AI.
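Markov's first-order idea can be sketched in a few lines (on a toy string, not his actual 1913 text): count which letter most often follows each letter, then predict the next element from the current one alone:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # count how often each character follows each other character
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, ch):
    # first-order Markov property: the prediction depends only on
    # the last observed element (undefined here for unseen characters)
    return counts[ch].most_common(1)[0][0]

counts = train_bigram("abababac")
print(predict_next(counts, "a"))  # 'b' ('b' follows 'a' three times, 'c' once)
print(predict_next(counts, "b"))  # 'a'
```

Modern language models generalize exactly this: predicting the next token from preceding context, with far richer conditioning than a single previous letter.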
In 1948, Claude Shannon’s The Mathematical Theory of Communication was published. He cites Andrey Markov’s theory multiple times when building his probabilistic approach to sequence modeling. Claude Shannon laid the ground for a communication model based on a source encoder, a transmitter, and a received decoder or semantic decoder.



## Machine Learning | NLP | JAPANESE GRAMMAR


## Machine Learning | NLP | Japanese Postpositions

Instead of prepositions, Japanese uses postpositions (which can occur multiple times in a sentence). Here are some common Japanese postpositions that are written in Romanji:

• Ka (a marker for a question)
• Wa (the topic of a sentence)
• Ga (the subject of a sentence)
• O (direct object)
• To (can mean “for” and “and”)
• Ni (physical motion toward something)
• E (toward something)
The particle ka at the end of a sentence in Japanese indicates a question. A simple example of ka is the Romanji sentence Nan desu ka, which means "What is it?"

An example of wa is the following sentence: Watashi wa Nihon jin desu, which means "As for me, I'm Japanese." By contrast, the sentence Watashi ga Nihon jin desu means "It is I (not somebody else) who is Japanese."
As you can see, Japanese makes a distinction between the topic of a sentence (with wa) versus the subject of a sentence (with ga). A Japanese sentence can contain both particles wa and ga, with the following twist: if a negative fact is expressed about the noun that precedes ga, then ga is replaced with wa and the main verb is written in the negative form. For example, the Romanji sentence "I still have not studied Kanji" is translated as follows:
Watashi wa kanji wa mada benkyou shite imasen.

## Machine Learning | NLP | Ambiguity in Japanese Sentences

Since Japanese does not pluralize nouns, the same word is used for singular as well as plural, which requires contextual information to determine the exact meaning of a Japanese sentence. As a simple illustration, which is discussed in more detail later in this chapter under the topic of tokenization, here is a Japanese sentence written in Romanji, followed by Hiragana and Kanji (the second and third sentences are from Google Translate):
Watashi wa tomodachi ni hon o agemashita
わたし は ともだち に ほん を あげました
私 は 友達 に 本 を あげました

The preceding sentence can mean any of the following, and the correct interpretation depends on the context of a conversation:

• I gave a book to a friend.
• I gave a book to friends.
• I gave books to a friend.
• I gave books to friends.
Moreover, the context for the words “friend” and “friends” in the Japanese sentence is also ambiguous: they do not indicate whose friends (mine, yours, his, or hers). In fact, the following Japanese sentence is also grammatically correct and ambiguous:
Tomodachi ni hon o agemashita
The preceding sentence does not specify who gave a book (or books) to a friend (or friends), but its context will be clear during a conversation. Incidentally, Japanese people often omit the subject pronoun (unless the sentence becomes ambiguous), so it’s more common to see the second sentence (i.e., without Watashi wa) instead of the first Romanji sentence.

Contrast the earlier Japanese sentence with its counterparts in the Romance languages Italian, Spanish, French, and Portuguese, as well as German (some accent marks are missing for some words):

• Italian: Ho dato un libro al mio amico.
• Spanish: [Yo] Le di un libro a mi amigo.
• Portuguese: Eu dei um livro para meu amigo.
• French: J'ai donné un livre à mon ami.
• German: Ich habe ein Buch dem Freund gegeben.
Notice that the Italian and French sentences use a compound verb whose two parts are consecutive (adjacent), whereas German uses a compound verb in which the second part (the past participle) is at the end of the sentence. However, the Spanish and Portuguese sentences use the simple past (the preterit) form of the verb “to give.”

## Machine Learning | NLP | Japanese Nominalization

Nominalizers convert verbs (or even entire sentences) into a noun. Nominalizers resemble a “that” clause in English, and they are useful when speaking about an action as a noun. Japanese has two nominalizers: no and koto ga.

The nominalizer の (no) is required with verbs of perception, such as 見る (to see) and 聞く (to listen). For example, the following sentence means "I love listening to music," written in Romanji in the first sentence, followed by a second sentence that contains a mixture of Kanji and Hiragana:
Watashi wa ongaku o kiku no ga daisuki desu
The next three sentences all mean “He loves reading a newspaper,” written in Romanji and then Hiragana and Kanji:
Kare wa shimbun o yomu no ga daisuki desu
かれ は しんぶん を よむ の が だいすきです
彼 は 新聞 を 読む の が 大好きです

The koto ga nominalizer, which is the other Japanese nominalizer, is used in sentences of the form "have you ever …" For example, the following sentence means "Have you (ever) been in Japan?"



## Machine Learning | NLP | THE COMPLEXITY OF NATURAL LANGUAGES


## Machine Learning | NLP | Word Order in Sentences

As mentioned previously, German and Slavic languages allow for a rearrangement of the words in sentences because those languages support declension, which involves modifying the endings of articles and adjectives in accordance with the grammatical function of those words in a sentence (such as the subject, direct object, and indirect object). Those word endings are loosely comparable to prepositions in English, and sometimes they have the same spelling for different grammatical functions. For example, in German, the article den precedes a masculine noun that is a direct object and also a plural noun that is an indirect object: ambiguity can occur if the singular masculine noun has the same spelling in its plural form.

Alternatively, since English is word order dependent, ambiguity can still arise in sentences, which we have learned to parse correctly without any conscious effort.

Groucho Marx often incorporated ambiguous sentences in his dialogues, such as the following paraphrased examples:

“This morning I shot an elephant in my pajamas. How he got into my pajamas I have no idea.”

“In America, a woman gives birth to a child every fifteen minutes. Somebody needs to find that woman and stop her.”

Now consider the following pair of sentences involving a boy, a mountain, and a telescope:
I saw the boy on the mountain with the telescope.
I saw the boy with the telescope on the mountain.
Human speakers interpret both English sentences as having the same meaning; however, arriving at the same interpretation is less obvious from the standpoint of a purely NLP task. Why does this ambiguity in the preceding example not arise in Russian? The reason is simple: the preposition with is associated with the instrumental case in Russian, whereas on is not the instrumental case, and therefore the nouns have suffixes that indicate the distinction.

## Machine Learning | NLP | Languages and Regional Accents

Accents, slang, and dialects have some common features, but there can be some significant differences. Accents involve modifying the standard pronunciation of words, which can vary significantly in different parts of the same country.

One interesting phenomenon pertains to the southern region of some countries (in the northern hemisphere), which tend to have a more “relaxed” pronunciation compared to the northern region of that country. For example, some people in the southeastern United States speak with a so-called “drawl,” whereas newscasters will often speak with a midwestern pronunciation, which is considered a neutral pronunciation. The same is true of people in Tokyo, who often speak Japanese with a “flat” pronunciation (which is also true of Japanese newscasters on NHK), versus people from the Kansai region (Kyoto, Kobe, and Osaka) of Japan, who vary the tone and emphasis of Japanese words.

Regional accents can also involve modifying the meaning of words in ways that are specific to the region in question. For example, Texans will say "I'm fixing to graduate this year," whereas people from other parts of the United States would say "going" instead of "fixing." In France, Parisians are unlikely to say Il faut fatiguer la salade ("it's necessary to toss the salad"), whereas this sentence is much more commonplace in southern France. (The English word "fatigue" is derived from the French verb fatiguer.)

Verbs exist in every written language, and they undergo conjugation that reflects their tense and mood in a sentence. Languages have overlapping sets of verb tenses, but there are differences. For instance, Portuguese has a future perfect subjunctive, as does Spanish (though it is almost never used in spoken form), whereas these verb forms do not exist in English. English verb tenses (in the indicative mood) can include:

• present
• present perfect
• present progressive
• present perfect progressive
• preterite (simple past)
• past perfect
• past progressive
• past perfect progressive
• future tense
• future perfect
• future progressive
• future perfect progressive (does not exist in Italian)

Here are some examples of English sentences that illustrate (most of) the preceding verb forms:

• I have read a book.
• I am reading a book.
• I have been reading a book.
• I had read a book.
• I will read a book.
• I will have read a book.
• I will be reading a book.
• At 6 p.m., I will have been reading a book for 3 hours.
Verb moods can be indicative (as shown in the preceding list), subjunctive (discussed soon), and conditional (“I would go but I have work to do”). In English, subjunctive verb forms can include the present subjunctive (“I insist that he do the task”), the past subjunctive (“If I were you”), and the pluperfect subjunctive (“Had I but known …”). Interestingly, Portuguese also provides a future perfect subjunctive verb form; Spanish also has this verb form but it’s never used in conversation.

Interestingly (from a linguistic perspective, at least), there are modern languages, such as Mandarin, that have only one verb tense: they rely on other words in a sentence (such as time adverbs or aspect particles) to convey the time frame. Such languages would express the present, the past, and the future in a form that is comparable to the following:

• “I read a book now.”
• “I read a book yesterday.”
• “I read a book tomorrow.”




## Machine Learning | NLP | Peak Usage of Some Languages

As you might have surmised, different languages have been in an influential position during the past 2,000 years. If you trace the popularity and influence of Indo-European languages, you will find periods of time with varying degrees of influence involving multiple languages, including Hebrew, Greek, Latin, Arabic, French, and English.

Latin is an Indo-European language (apparently derived from the Etruscan and Greek alphabets), and during the 1st century AD, Latin became a mainstream language. In addition, the Romance languages are derived from Latin. Today Latin is considered a dead language in the sense that it’s not actively spoken on a daily basis by large numbers of people. The same is true of Sanskrit, which is a very old language from India.

During the Roman Empire, Latin and Greek were the official languages for administrative as well as military activities. In addition, Latin was an important language for diplomacy among countries for many centuries after the fall of the Roman Empire.

You might be surprised to know that Arabic was the lingua franca throughout the Mediterranean during the 10th and 11th centuries AD. As another example, French was spoken in many parts of Europe during the 18th century, including the Russian aristocracy.

Today English appears to be in its ascendancy in terms of the number of native English speakers as well as the number of people who speak English as a second (or third or fourth) language. Although Mandarin is a widely spoken Asian language, English is the lingua franca for commerce as well as technology: virtually every computer language is based on English.

## 机器学习代写|自然语言处理代写NLP代考|Languages and Regional Accents

Accents, slang, and dialects have some common features, but there can be some significant differences. Accents involve modifying the standard pronunciation of words, which can vary significantly in different parts of the same country.

One interesting phenomenon pertains to the southern region of some countries (in the northern hemisphere), which tend to have a more “relaxed” pronunciation compared to the northern region of that country. For example, some people in the southeastern United States speak with a so-called “drawl,” whereas newscasters will often speak with a midwestern pronunciation, which is considered a neutral pronunciation. The same is true of people in Tokyo, who often speak Japanese with a “flat” pronunciation (which is also true of Japanese newscasters on NHK), versus people from the Kansai region (Kyoto, Kobe, and Osaka) of Japan, who vary the tone and emphasis of Japanese words.

Regional accents can also involve modifying the meaning of words in ways that are specific to the region in question. For example, Texans will say “I’m fixing to graduate this year” whereas people from other parts of the United States would say “going” instead of “fixing.” In France, Parisians are unlikely to say Il faut fatiguer la salade (“it’s necessary to toss the salad”), whereas this sentence is much more commonplace in southern France. (The English word “fatigue” is derived from the French verb fatiguer.)

## 机器学习代写|自然语言处理代写NLP代考|Languages and Slang

The existence of slang words is interesting and perhaps inevitable: they seem to flourish in every human language. Sometimes slang words are used for obfuscation so that only members of an “in group” understand the modified meaning of those words. Slang words can also be a combination of existing words, new words (but not officially recognized), and short-hand expressions. Slang can also “invert” the meaning of words (“bad” instead of “good”), which can be specific to an age group, minority, or region. In addition, slang can also assign an entirely unrelated meaning to a standard word (e.g., the slang terms “that’s dope,” “that’s sick,” and “the bomb”).

Slang words can also be specific to an age group to prevent communication with members of different age groups. For example, Japanese teens can communicate with each other by reversing the order of the syllables in a word, which renders those “words” incomprehensible to adults. The inversion of syllables is far more complex than “pig Latin,” in which the first letter of a word is shifted to the end of the word, followed by the syllable “ay.” For example, “East Bay” (an actual location in the Bay Area in Silicon Valley) is humorously called “beast” in pig Latin.

Teenagers also use acronyms (perhaps as another form of slang) when sending text messages to each other. For example, the acronym “aos” means “adult over shoulder.” The acronym “bos” has several different meanings, including “brother over shoulder” and “boyfriend over shoulder.”

The slang terms that you use with your peers invariably simplify communication with others in your in-group, sometimes accompanied by specialized interpretations of words (such as reversing their meaning). A simple example is the word zanahoria, which is the Spanish word for carrot. In colloquial speech in Venezuela, calling someone a zanahoria means that that person is very conservative and as “straight” as a carrot.

Slang enables people to be creative and also playfully break the rules of language. Both slang and colloquial speech simplify formal language and rarely (if ever) introduce greater complexity in alternate speech rules.

Perhaps that’s the reason that slang and colloquial speech cannot be controlled or regulated by anyone (or by any language committee): like water, they are fluid and adapt to the preferences of their speakers.

One more observation: while slang can be viewed as a creative by-product of standard speech, there is a reverse effect that can occur in certain situations. For example, you have probably noticed how influential subgenres are eventually absorbed (perhaps only partially) into mainstream culture: witness how a “softened” form of rap music and its rhythm eventually made its way into commercials for personal products. There’s a certain irony in hearing “Stairway to Heaven” as elevator music.

Another interesting concept is a “meme” (which includes Internet memes) in popular culture, which refers to something with humorous content. While slang words are often used to exclude people, a meme often attempts to communicate a particular sentiment. One such meme is “OK Boomer,” which some people view as a derogatory remark that’s sometimes expressed in a snarky manner, and much less often interpreted as a humorous term. Although language dialects can also involve regional accents and slang, they also have more distinct characteristics, as discussed in the next section.


## 机器学习代写|自然语言处理代写NLP代考|NLP Concepts


## 机器学习代写|自然语言处理代写NLP代考|THE ORIGIN OF LANGUAGES

Someone once remarked that “the origin of language is an enigma,” which is viscerally appealing because it has at least a kernel of truth. Although there are multiple theories that attempt to explain how and why languages developed, none of them has attained universal consensus. Nevertheless, there is no doubt that humans have far surpassed all other species in terms of language development.

There is also the question of how the vocabulary of a language is formed, which can be the confluence of multiple factors, as well as meaning in a language. According to Ludwig Wittgenstein (1953), who was an influential philosopher in many other fields, language derives its meaning from use.

One theory about the evolution of language in humans asserts that the need for communication between humans makes language a necessity. Another explanation is that language is influenced by the task of creating complex tools, because the latter requires a precise sequence of steps, which ultimately spurred the development of languages.

Without delving into their details, the following list contains some theories that have been proposed regarding language development. Keep in mind that they vary in terms of their support in the academic community:

• Strong Minimalist Thesis
• The FlintKnapper Theory
• The Sapir-Whorf Hypothesis
• Universal Grammar (Noam Chomsky)
The Strong Minimalist Thesis (SMT) asserts that language is based on something called the hierarchical syntactic structure. The FlintKnapper Theory asserts that the ability to create complex tools involved an intricate sequence of steps, which in turn necessitated communication between people. In simplified terms, the Sapir-Whorf Hypothesis (also called the linguistic relativity hypothesis, which is a slightly weaker form) posits that the language we speak influences how we think. Consider how our physical environment can influence our spoken language: Eskimos have several words to describe snow, whereas people in some parts of the Middle East have never seen a snowstorm.

## 机器学习代写|自然语言处理代写NLP代考|Language Fluency

As mentioned in the previous section, human infants are capable of producing the sounds of any language, given enough opportunity to imitate those sounds. They tend to lose some of that capacity as they become older, which might explain why some adults speak another language with an accent (of course, there are plenty of exceptions).

Interestingly, babies respond favorably to the sound of vowel-rich “Parentese,” and a study in 2018 suggested that babies prefer the sounds of other babies to the cooing of their mothers:
https://getpocket.com/explore/item/babies-prefer-the-sounds-of-otherbabies-to-the-cooing-of-their-parents

There are two interesting cases in which people can acquire native-level speech capability. The first case is intuitive: people who have been raised in a bilingual (or multilingual) environment tend to have a greater capacity for learning how to speak other languages with native-level (or near native-level) speech. Second, people who speak phonetic languages have an advantage when they study another phonetic language, especially one in their own language group, because they already know how to pronounce the majority of vowel sounds. Even so, some languages contain sounds whose pronunciation can be a challenge for practically every non-native speaker. For example, letters that have a guttural sound (such as those in Dutch, German, and Arabic), the glottal stop (most noticeable in Arabic), and the letter “ain” in Arabic are generally more challenging to pronounce for native speakers of Romance languages and some Asian languages.

To some extent, the non-phonetic nature of the English language might explain why some monolingual native-English speakers might struggle with learning to speak other languages with native-level speech. Perhaps the closest language to English (in terms of cadence) is Dutch, and people from Holland can often speak native-level English. This tends to be true of Swedes and Danes as well, whose languages are Germanic, but not necessarily true of Germans, who can speak grammatically perfect English but sometimes speak it with an accent.

Perhaps somewhat ironically, sometimes accents can impart a sort of cachet, such as speaking with a British or Australian accent in the United States. Indeed, a French accent can also add a certain je-ne-sais-quoi to a speaker in various parts of the United States.

## 机器学习代写|自然语言处理代写NLP代考|Major Language Groups

There are more than 140 language families, and the six largest language families (based on language count) are listed here:

• Niger-Congo
• Austronesian
• Trans-New Guinea
• Sino-Tibetan
• Indo-European
• Afro-Asiatic
English belongs to the Indo-European group, Mandarin belongs to the Sino-Tibetan group, and Arabic belongs to the Afro-Asiatic group. According to Wikipedia, Indo-European languages comprise almost 600 languages, including most of the languages in Europe, the northern Indian subcontinent, and the Iranian plateau. Almost half the world speaks an Indo-European language as a native language, which is more than for any of the language families listed at the beginning of this section. Indo-European has several major language subgroups, including the Germanic, Slavic, and Romance languages. The preceding information is from the following Wikipedia link:
https://en.wikipedia.org/wiki/List_of_language_families
As of 2019, the top five languages spoken in the world, counting the number of people who are native speakers or secondary speakers, are as follows:
• English: 1.268 billion
• Mandarin: 1.120 billion
• Hindi: 637.3 million
• Spanish: 537.9 million
• French: 276.6 million
The preceding information is from the following Wikipedia link:
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers

Many factors can influence the expansion of a given language into multiple countries, such as commerce, economic factors, technological influence, and warfare, thereby resulting in the absorption of new words by another language. Somewhat intuitively, countries with a common border influence each other’s language, sometimes resulting in new hybrid languages.



## 机器学习代写|自然语言处理代写NLP代考|WHAT IS IMBALANCED CLASSIFICATION

Imbalanced classification involves datasets with imbalanced classes. For example, suppose that class A has 99% of the data and class B has 1%. Which classification algorithm would you use? Unfortunately, classification algorithms don’t work well with this type of imbalanced dataset. Here is a list of several well-known techniques for handling imbalanced datasets:

• Random resampling rebalances the class distribution.
• Random oversampling duplicates data in the minority class.
• Random undersampling deletes examples from the majority class.
• SMOTE
Random resampling transforms the training dataset into a new dataset, which is effective for imbalanced classification problems.

The random undersampling technique removes samples from the dataset, and involves the following:

• randomly remove samples from majority class
• can be performed with or without replacement
• alleviates imbalance in the dataset
• may increase the variance of the classifier
• may discard useful or important samples
However, random undersampling does not work well with a dataset that has a 99%/1% split into two classes. Moreover, undersampling can result in losing information that is useful for a model.
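The random undersampling steps listed above can be sketched in plain Python (the function name `random_undersample` is illustrative; libraries such as imbalanced-learn provide a production-ready `RandomUnderSampler`):

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until every class is reduced
    to the size of the minority class (sampling without replacement)."""
    rng = random.Random(seed)
    minority_size = min(Counter(y).values())
    kept_X, kept_y = [], []
    per_class = Counter()          # how many rows of each class we have kept
    indices = list(range(len(y)))
    rng.shuffle(indices)           # random selection of which rows survive
    for i in indices:
        if per_class[y[i]] < minority_size:
            kept_X.append(X[i])
            kept_y.append(y[i])
            per_class[y[i]] += 1
    return kept_X, kept_y

# A 99%/1% imbalance in miniature: 99 samples of class 0, 1 sample of class 1
X = [[float(i)] for i in range(100)]
y = [0] * 99 + [1]
Xb, yb = random_undersample(X, y)
print(Counter(yb))  # both classes now have one sample each
```

Note that this illustrates the drawback mentioned above: 98 of the 99 majority-class rows are discarded, along with whatever information they carried.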

Instead of random undersampling, another approach involves generating new samples for the minority class. The first technique involves oversampling: duplicating examples from the minority class.
There is another technique that is better than the preceding technique, which involves the following:

• synthesize new examples from minority class
• a type of data augmentation for tabular data
• this technique can be very effective
• generate new samples from minority class
Another well-known technique is called SMOTE (Synthetic Minority Oversampling Technique), which involves data augmentation (i.e., synthesizing new data samples) before you use a classification algorithm. SMOTE was initially developed by means of the kNN algorithm (other options are available), and it can be an effective technique for handling imbalanced classes.

Yet another option to consider is the Python package imbalanced-learn in the scikit-learn-contrib project. This project provides various re-sampling techniques for datasets that exhibit class imbalance. More details are available online:
https://github.com/scikit-learn-contrib/imbalanced-learn.

## 机器学习代写|自然语言处理代写NLP代考|WHAT IS SMOTE

SMOTE is a technique for synthesizing new samples for a dataset. This technique is based on linear interpolation:

• Step 1: Select samples that are close in the feature space.
• Step 2: Draw a line between the samples in the feature space.
• Step 3: Draw a new sample at a point along that line.
A more detailed explanation of the SMOTE algorithm is as follows:
• Select a random sample “a” from the minority class.
• Find k nearest neighbors for that example.
• Select a random neighbor “b” from the nearest neighbors.
• Create a line “L” that connects “a” and “b.”
• Randomly select one or more points “c” on line L.
If need be, you can repeat this process for the other (k-1) nearest neighbors to distribute the synthetic values more evenly among the nearest neighbors.
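The steps above can be sketched in plain Python. This is a toy illustration of the interpolation step only, not a full SMOTE implementation (the helper name `smote_sample` is invented for illustration):

```python
import math
import random

def smote_sample(minority, k=2, seed=0):
    """Synthesize one new point by interpolating between a random
    minority-class sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    a = rng.choice(minority)                       # select a random sample "a"
    neighbors = sorted((p for p in minority if p is not a),
                       key=lambda p: math.dist(a, p))[:k]  # k nearest neighbors
    b = rng.choice(neighbors)                      # select a random neighbor "b"
    t = rng.random()                               # position of point "c" on line L
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0]]
c = smote_sample(minority)
print(c)  # a new point on the segment between "a" and one of its neighbors
```

Because the new point lies on the line between two existing minority samples, it stays inside the region of the feature space that the minority class already occupies.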

The initial SMOTE algorithm is based on the kNN classification algorithm, which has been extended in various ways, such as replacing kNN with SVM. A list of SMOTE extensions is shown as follows:

• selective synthetic sample generation
• Borderline-SMOTE (kNN)
• Borderline-SMOTE (SVM)

## 机器学习代写|自然语言处理代写NLP代考|ANALYZING CLASSIFIERS

This section is marked “optional” because its contents pertain to machine learning classifiers, which are not the focus of this book. However, it’s still worthwhile to glance through the material, or perhaps return to this section after you have a basic understanding of machine learning classifiers.

Several well-known techniques are available for analyzing the quality of machine learning classifiers. Two techniques are LIME and ANOVA, both of which are discussed in the following subsections.

LIME is an acronym for Local Interpretable Model-Agnostic Explanations. LIME is a model-agnostic technique that can be used with machine learning models. In LIME, you make small random changes to data samples and then observe the manner in which predictions change (or not). The approach involves changing the input (slightly) and then observing what happens to the output.

By way of analogy, consider food inspectors who test for bacteria in truckloads of perishable food. Clearly, it’s infeasible to test every food item in a truck (or a train car), so inspectors perform “spot checks” that involve testing randomly selected items. In an analogous fashion, LIME makes small changes to input data in random locations and then analyzes the changes in the associated output values.

However, there are two caveats to keep in mind when you use LIME with input data for a given model:

1. The actual changes to input values are model-specific.
2. This technique works on input that is interpretable.
Examples of interpretable input include machine learning classifiers (such as trees and random forests) and NLP techniques such as BoW (Bag of Words). Non-interpretable input involves “dense” data, such as a word embedding (which is a vector of floating point numbers).

You could also substitute your model with another model that involves interpretable data, but then you need to evaluate how accurate the approximation is to the original model.
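The perturbation idea behind LIME can be illustrated with a toy sketch. Everything below (the helper `perturb_and_observe` and the toy model) is invented for illustration; it is not LIME's actual API:

```python
import random

def perturb_and_observe(f, x, scale=0.1, trials=200, seed=0):
    """LIME-flavored probe: make small random changes to one input feature
    at a time and record the average change in the model's output."""
    rng = random.Random(seed)
    base = f(x)
    sensitivity = []
    for i in range(len(x)):
        total = 0.0
        for _ in range(trials):
            z = list(x)
            z[i] += rng.uniform(-scale, scale)   # small random change to feature i
            total += abs(f(z) - base)
        sensitivity.append(total / trials)
    return sensitivity

# A toy "model" whose output depends mostly on feature 0
model = lambda v: 3.0 * v[0] + 0.5 * v[1]
s = perturb_and_observe(model, [1.0, 1.0])
print(s)  # feature 0 shows a much larger average output change than feature 1
```

The features whose perturbation moves the output the most are, loosely speaking, the ones the model depends on; the real LIME algorithm additionally fits a local interpretable model to the perturbed predictions.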


## 机器学习代写|自然语言处理代写NLP代考|MISSING DATA, ANOMALIES, AND OUTLIERS


## 机器学习代写|自然语言处理代写NLP代考|Missing Data

How you decide to handle missing data depends on the specific dataset. Here are some ways to handle missing data (the first three techniques are manual techniques, and the other techniques are algorithms):

1. replace missing data with the mean/median/mode value
2. infer (“impute”) the value for missing data
3. delete rows with missing data
4. isolation forest (tree-based algorithm)
5. minimum covariance determinant
6. local outlier factor
7. one-class SVM (Support Vector Machines)
In general, replacing a missing numeric value with zero is a risky choice: this value is obviously incorrect if the values of a feature are between 1,000 and 5,000. For a feature that has numeric values, replacing a missing value with the average value is better than the value zero (unless the average equals zero); also consider using the median value. For categorical data, consider using the mode to replace a missing value.

If you are not confident that you can impute a “reasonable” value, consider dropping the rows with missing values; you can then train one model with imputed values and another with those rows deleted, and compare the results.

One problem that can arise after removing rows with missing values is that the resulting dataset is too small. In this case, consider using SMOTE, which is discussed later in this chapter, in order to generate synthetic data.
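The first three (manual) techniques in the list above can be sketched with Python's statistics module (the `impute` helper is illustrative):

```python
from statistics import mean, median, mode

def impute(column, strategy="mean"):
    """Replace None (missing) entries with the mean, median, or mode
    of the observed (non-missing) values in the column."""
    observed = [v for v in column if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

col = [1000, 2000, None, 4000, 5000]
print(impute(col, "mean"))    # the missing entry is filled with the mean (3000)
print(impute(col, "median"))  # the median of the observed values is also 3000 here
```

Deleting rows with missing data (technique 3) is just a filter such as `[row for row in rows if None not in row]`; the algorithmic techniques (items 4 through 7) require a library such as scikit-learn.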

## 机器学习代写|自然语言处理代写NLP代考|Anomalies and Outliers

In simplified terms, an outlier is an abnormal data value that is outside the range of “normal” values. For example, a person’s height in centimeters is typically between 30 centimeters and 250 centimeters. Hence, a data point (e.g., a row of data in a spreadsheet) with a height of 5 centimeters or a height of 500 centimeters is an outlier. The consequences of these outlier values are unlikely to involve a significant financial or physical loss (though they could adversely affect the accuracy of a trained model).

Anomalies are also outside the “normal” range of values (just like outliers), and they are typically more problematic than outliers: anomalies can have more severe consequences than outliers. For example, consider the scenario in which someone who lives in California suddenly makes a credit card purchase in New York. If the person is on vacation (or a business trip), then the purchase is an outlier (it’s outside the typical purchasing pattern), but it’s not an issue. However, if that person was in California when the credit card purchase was made, then it’s most likely to be credit card fraud, as well as an anomaly.

Unfortunately, there is no simple way to decide how to deal with anomalies and outliers in a dataset. Although you can drop rows that contain outliers, keep in mind that doing so might deprive the dataset-and therefore the trained model – of valuable information. You can try modifying the data values (described as follows), but again, this might lead to erroneous inferences in the trained model. Another possibility is to train a model with the dataset that contains anomalies and outliers, and then train a model with a dataset from which the anomalies and outliers have been removed. Compare the two results and see if you can infer anything meaningful regarding the anomalies and outliers.

## 机器学习代写|自然语言处理代写NLP代考|Outlier Detection

Although the decision to keep or drop outliers is your decision to make, there are some techniques available that help you detect outliers in a dataset. This section contains a short list of some techniques, along with a very brief description and links for additional information.

Perhaps trimming is the simplest technique (apart from dropping outliers), which involves removing rows whose feature value is in the upper 5% range or the lower 5% range. Winsorizing the data is an improvement over trimming: set the values in the top 5% range equal to the maximum value in the 95th percentile, and set the values in the bottom 5% range equal to the minimum in the 5th percentile.
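Winsorizing as described above can be sketched as follows. This uses a simple nearest-rank percentile for illustration; library implementations such as scipy.stats.mstats.winsorize differ in detail:

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values below the lower percentile and above the upper
    percentile to those percentile values (nearest-rank method)."""
    s = sorted(values)
    n = len(s)
    def pct(p):
        # nearest-rank percentile over the sorted values
        return s[max(0, min(n - 1, round(p / 100 * (n - 1))))]
    lo, hi = pct(lower_pct), pct(upper_pct)
    return [min(max(v, lo), hi) for v in values]

data = list(range(1, 101)) + [10_000]   # values 1..100 plus one extreme outlier
clipped = winsorize(data)
print(min(clipped), max(clipped))  # the outlier is pulled down to the 95th-percentile value
```

Unlike trimming, winsorizing keeps every row: the extreme value is replaced rather than removed, so the dataset size is unchanged.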

The Minimum Covariance Determinant is a covariance-based technique, and a Python-based code sample that uses this technique is available online:
https://scikit-learn.org/stable/modules/outlier_detection.html.
The Local Outlier Factor (LOF) technique is an unsupervised technique that calculates a local anomaly score via the kNN (k Nearest Neighbor) algorithm. Documentation and short code samples that use LOF are available online:
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html.

Two other techniques involve the Huber and the Ridge classes, both of which are included as part of Sklearn. The Huber error is less sensitive to outliers because it’s calculated via the linear loss, similar to the MAE (Mean Absolute Error). A code sample that compares Huber and Ridge is available online:
https://scikit-learn.org/stable/auto_examples/linear_model/plot_huber_vs_ridge.html.

You can also explore the Theil-Sen estimator and RANSAC, which are “robust” against outliers:
https://scikit-learn.org/stable/auto_examples/linear_model/plot_theilsen.html and
https://en.wikipedia.org/wiki/Random_sample_consensus.
Four algorithms for outlier detection are discussed at the following site:
https://www.kdnuggets.com/2018/12/four-techniques-outlier-detection.html.

One other scenario involves “local” outliers. For example, suppose that you use kMeans (or some other clustering algorithm) and determine that a value is an outlier with respect to one of the clusters. While this value is not necessarily an “absolute” outlier, detecting such a value might be important for your use case.



## 机器学习代写|自然语言处理代写NLP代考|Scaling Numeric Data via Standardization

The standardization technique involves finding the mean $\mu$ and the standard deviation $\sigma$, and then mapping each $x_i$ value to $(x_i - \mu)/\sigma$. Recall the following formulas:
$$\mu = \frac{\sum_i x_i}{n}$$
$$\text{variance}(x) = \frac{\sum_i (x_i - \mu)^2}{n}$$
$$\sigma = \sqrt{\text{variance}(x)}$$
As a simple illustration of standardization, suppose that the random variable $x$ has the values $\{-1, 0, 1\}$. Then $\mu$ and $\sigma$ are calculated as follows:
$$\mu = \frac{\sum_i x_i}{n} = \frac{-1+0+1}{3} = 0$$
$$\text{variance} = \frac{\sum_i (x_i - \mu)^2}{n} = \frac{(-1-0)^2 + (0-0)^2 + (1-0)^2}{3} = \frac{2}{3}$$
$$\sigma = \sqrt{2/3} \approx 0.816$$
Hence, the standardization of $\{-1, 0, 1\}$ is $\{-1/0.816, 0/0.816, 1/0.816\}$, which in turn equals the set of values $\{-1.2254, 0, 1.2254\}$.
As another example, suppose that the random variable $x$ has the values $\{-6, 0, 6\}$. Then $\mu$ and $\sigma$ are calculated as follows:
$$\mu = \frac{\sum_i x_i}{n} = \frac{-6+0+6}{3} = 0$$
$$\text{variance} = \frac{\sum_i (x_i - \mu)^2}{n} = \frac{(-6-0)^2 + (0-0)^2 + (6-0)^2}{3} = \frac{72}{3} = 24$$
$$\sigma = \sqrt{24} \approx 4.899$$

Hence, the standardization of $\{-6, 0, 6\}$ is $\{-6/4.899, 0/4.899, 6/4.899\}$, which in turn equals the set of values $\{-1.2247, 0, 1.2247\}$.
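The two worked examples above can be verified with a short NumPy sketch (the `standardize` helper below is illustrative, not from the book):

```python
import numpy as np

def standardize(values):
    """Map each x_i to (x_i - mu) / sigma, using the population formulas above."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0), i.e. sqrt(variance)
    return (x - mu) / sigma

print(standardize([-1, 0, 1]))  # approximately [-1.2247, 0, 1.2247]
print(standardize([-6, 0, 6]))  # approximately [-1.2247, 0, 1.2247]
```

Note that both results agree to four decimal places: scaling every value by the same constant scales $\sigma$ by that constant as well, so the standardized values are unchanged.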
In the preceding two examples, the mean equals 0 in both cases, but the variance and standard deviation are significantly different. By contrast, the (min-max) normalization of a set of values always produces a set of numbers between 0 and 1.

However, the standardization of a set of values can generate numbers that are less than $-1$ or greater than $1$; this will occur whenever $\sigma$ is less than some term $|x_i - \mu|$, where the latter is the absolute value of the difference between $\mu$ and the $x_i$ value. In the preceding example, the largest such difference equals $1$, whereas $\sigma$ is $0.816$, and therefore the largest standardized value is greater than $1$.

## Machine Learning Assignment Help|Natural Language Processing NLP Exam Help|What to Look for in Categorical Data

This section contains various suggestions for handling inconsistent data values, and you can determine which ones to adopt based on any additional factors that are relevant to your particular task. For example, consider dropping columns that have very low cardinality (equal to or close to 1), as well as numeric columns with zero or very low variance.

Next, check the contents of categorical columns for inconsistent spellings or errors. A good example pertains to the gender category, which can consist of a combination of the following values:

male
Male
female
Female
m
f
M
F
The preceding categorical values for gender can be replaced with two categorical values (unless you have a valid reason to retain some of the other values). Moreover, if you are training a model whose analysis involves a single gender, then you need to determine which rows (if any) of a dataset must be excluded. Also check categorical data columns for redundant or missing white spaces.
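A minimal sketch of this cleanup in Pandas (the column name `gender` and the sample values are hypothetical): stripping white space and lower-casing first means the `map` call only needs one entry per canonical spelling.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "Male ", " f", "F", "m", "Female"]})

# Strip redundant white space, lower-case, then collapse every variant to two values
df["gender"] = (
    df["gender"]
    .str.strip()
    .str.lower()
    .map({"male": "M", "m": "M", "female": "F", "f": "F"})
)
print(df["gender"].tolist())  # ['M', 'M', 'F', 'F', 'M', 'F']
```

Any value not covered by the mapping would become `NaN`, which is itself a useful flag for unexpected entries.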

Check for data values that have multiple data types, such as a numeric column where some numbers appear as numerals and others as strings or objects.
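One way to flag such mixed-type columns is `pd.to_numeric` with `errors="coerce"`, which turns anything that cannot be parsed as a number into `NaN` (the sample Series below is made up):

```python
import pandas as pd

s = pd.Series([10, "20", "thirty", 40.5], dtype=object)
nums = pd.to_numeric(s, errors="coerce")  # "thirty" cannot be parsed, so it becomes NaN
print(int(nums.isna().sum()))  # 1 non-numeric entry detected
```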

## Machine Learning Assignment Help|Natural Language Processing NLP Exam Help|Mapping Categorical Data to Numeric Values

Character data is often called categorical data, examples of which include people’s names, home or work addresses, and email addresses. Many types of categorical data involve short lists of values. For example, the days of the week and the months in a year involve seven and twelve distinct values, respectively. Notice that the days of the week have a relationship: For example, each day has a previous day and a next day. However, the colors of an automobile are independent of each other: the color red is not “better” or “worse” than the color blue.

There are several well-known techniques for mapping categorical values to a set of numeric values. A simple example where you need to perform this conversion involves the gender feature in the Titanic dataset. This feature is one of the relevant features for training a machine learning model. The gender feature has $\{M, F\}$ as its set of possible values. As you will see later in this chapter, Pandas makes it very easy to convert the set of values $\{M, F\}$ to the set of values $\{0, 1\}$.
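A minimal sketch of that conversion with the Pandas `map` method (the DataFrame here is a stand-in, not the actual Titanic dataset):

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "F", "M"]})
df["gender"] = df["gender"].map({"M": 0, "F": 1})
print(df["gender"].tolist())  # [0, 1, 1, 0]
```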

Another mapping technique involves mapping a set of categorical values to a set of consecutive integer values. For example, the set {Red, Green, Blue} can be mapped to the set of integers $\{0, 1, 2\}$. The set {Male, Female} can be mapped to the set of integers $\{0, 1\}$. The days of the week can be mapped to $\{0, 1, 2, 3, 4, 5, 6\}$. Note that the first day of the week depends on the country: In some cases it’s Sunday, and in other cases it’s Monday.
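One way to sketch this integer mapping for the days of the week is `pd.Categorical` with an explicit category order (the Monday-first ordering below is an assumption, as the text notes it varies by country):

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # Monday-first convention
observed = pd.Series(["Wed", "Mon", "Sun"])

# Each value is mapped to its position in the `days` list: Mon -> 0, ..., Sun -> 6
codes = pd.Categorical(observed, categories=days).codes
print(codes.tolist())  # [2, 0, 6]
```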

Another technique is called one-hot encoding, which converts each value to a vector (check Wikipedia if you need a refresher regarding vectors). Thus, {Male, Female} can be represented by the vectors $[1,0]$ and $[0,1]$, and the colors {Red, Green, Blue} can be represented by the vectors $[1,0,0]$, $[0,1,0]$, and $[0,0,1]$.

If you vertically “line up” the two vectors for gender, they form a $2 \times 2$ identity matrix, and doing the same for the colors will form a $3 \times 3$ identity matrix, as shown here:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
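In Pandas, one-hot encoding is available via `get_dummies`. Note that it orders the generated columns alphabetically (Blue, Green, Red), so the rows form the identity matrix with its columns permuted relative to the ordering in the text:

```python
import pandas as pd

colors = pd.Series(["Red", "Green", "Blue"])
one_hot = pd.get_dummies(colors).astype(int)  # columns: Blue, Green, Red (alphabetical)
print(one_hot.values.tolist())  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each row still contains exactly one 1, which is the defining property of a one-hot vector.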

