机器学习代写|自然语言处理代写NLP代考|JAPANESE GRAMMAR

statistics-lab™ 为您的留学生涯保驾护航 在代写自然语言处理NLP方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写自然语言处理NLP代写方面经验极为丰富，各种代写自然语言处理NLP相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• Advanced Probability Theory 高等概率论
• Advanced Mathematical Statistics 高等数理统计学
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

Japanese Postpositions

Instead of prepositions, Japanese uses postpositions (which can occur multiple times in a sentence). Here are some common Japanese postpositions, written in Romaji:

• Ka (a marker for a question)
• Wa (the topic of a sentence)
• Ga (the subject of a sentence)
• O (direct object)
• To (can mean “with” and “and”)
• Ni (physical motion toward something)
• E (toward something)
The particle ka at the end of a sentence in Japanese indicates a question. A simple example of ka is the Romaji sentence Nan desu ka, which means “What is it?”

An example of wa is the following sentence: Watashi wa Nihon jin desu, which means “As for me, I’m Japanese.” By contrast, the sentence Watashi ga Nihon jin desu means “It is I (not somebody else) who is Japanese.”
As you can see, Japanese makes a distinction between the topic of a sentence (marked with wa) and the subject of a sentence (marked with ga). A Japanese sentence can contain both particles wa and ga, with the following twist: if a negative fact is expressed about the noun that precedes ga, then ga is typically replaced with wa and the main verb is written in the negative form. For example, the English sentence “I still have not studied Kanji” is written in Romaji as follows:
Watashi wa kanji wa mada benkyou shite imasen.

Ambiguity in Japanese Sentences

Since Japanese does not pluralize nouns, the same word is used for both singular and plural, so contextual information is required to determine the exact meaning of a Japanese sentence. As a simple illustration, which is discussed in more detail later in this chapter under the topic of tokenization, here is a Japanese sentence written in Romaji, followed by Hiragana and Kanji (the second and third sentences are from Google Translate):
Watashi wa tomodachi ni hon o agemashita
わたしは ともだちに ほんを あげました
私は友達に本をあげました

The preceding sentence can mean any of the following, and the correct interpretation depends on the context of a conversation:

• I gave a book to a friend.
• I gave a book to friends.
• I gave books to a friend.
• I gave books to friends.
Moreover, the words “friend” and “friends” in the Japanese sentence are also ambiguous: the sentence does not indicate whose friends (mine, yours, his, or hers). In fact, the following Japanese sentence is also grammatically correct and ambiguous:
Tomodachi ni hon o agemashita
The preceding sentence does not specify who gave a book (or books) to a friend (or friends), but its context will be clear during a conversation. Incidentally, Japanese people often omit the subject pronoun (unless the sentence becomes ambiguous), so it’s more common to see the second sentence (i.e., without Watashi wa) instead of the first Romaji sentence.

Contrast the earlier Japanese sentence with its counterparts in the Romance languages Italian, Spanish, Portuguese, and French, as well as in German:

• Italian: Ho dato un libro al mio amico.
• Spanish: [Yo] Le di un libro a mi amigo.
• Portuguese: Eu dei um livro para meu amigo.
• French: J’ai donné un livre à mon ami.
• German: Ich habe meinem Freund ein Buch gegeben.
Notice that the Italian and French sentences use a compound verb whose two parts are consecutive (adjacent), whereas German uses a compound verb in which the second part (the past participle) is at the end of the sentence. However, the Spanish and Portuguese sentences use the simple past (the preterite) form of the verb “to give.”

Japanese Nominalization

Nominalizers convert verbs (or even entire sentences) into nouns. Nominalizers resemble a “that” clause in English, and they are useful when speaking about an action as a noun. Japanese has two nominalizers: no and koto (which appears below in the form koto ga).

The nominalizer の (no) is required with verbs of perception, such as 見る (to see) and 聞く (to listen). For example, the following sentence means “I love listening to music.” It is written in Romaji in the first sentence, followed by a second sentence that contains a mixture of Kanji and Hiragana:
Watashi wa ongaku o kiku no ga daisuki desu
私は音楽を聞くのが大好きです
The next three sentences all mean “He loves reading a newspaper,” written in Romaji and then Hiragana and Kanji:
Kare wa shimbun o yomu no ga daisuki desu
かれは しんぶんを よむのが だいすきです
彼は新聞を読むのが大好きです

The koto ga nominalizer, which is the other Japanese nominalizer, is used in sentences of the form “have you ever …” For example, the following sentence means “Have you (ever) been to Japan?”
Nihon ni itta koto ga arimasu ka

有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

THE COMPLEXITY OF NATURAL LANGUAGES

Word Order in Sentences

As mentioned previously, German and Slavic languages allow for a rearrangement of the words in sentences because those languages support declension, which involves modifying the endings of articles and adjectives in accordance with the grammatical function of those words in a sentence (such as the subject, direct object, and indirect object). Those word endings are loosely comparable to prepositions in English, and sometimes they have the same spelling for different grammatical functions. For example, in German, the article den precedes a masculine noun that is a direct object and also a plural noun that is an indirect object: ambiguity can occur if the singular masculine noun has the same spelling in its plural form.

Conversely, even though English depends on word order, ambiguity can still arise in sentences that we have learned to parse correctly without any conscious effort.

Groucho Marx often incorporated ambiguous sentences in his dialogues, such as the following paraphrased examples:

“This morning I shot an elephant in my pajamas. How he got into my pajamas I have no idea.”

“In America, a woman gives birth to a child every fifteen minutes. Somebody needs to find that woman and stop her.”

Now consider the following pair of sentences involving a boy, a mountain, and a telescope:
I saw the boy on the mountain with the telescope.
I saw the boy with the telescope on the mountain.
Human speakers interpret both English sentences as having the same meaning; however, arriving at that interpretation is far less obvious from the standpoint of an NLP task, because each prepositional phrase could attach to more than one earlier word. Why does the ambiguity in the preceding example not arise in Russian? The reason is simple: the preposition with is associated with the instrumental case in Russian, whereas on takes a different case, and therefore the nouns have suffixes that indicate the distinction.
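To make the parsing difficulty concrete, the following minimal sketch counts how many distinct parse trees a tiny grammar assigns to the telescope sentence. The grammar, the lexicon, and the function name count_parses are illustrative assumptions (they are not from the text), and the counting is done with a standard CYK-style chart over a grammar in Chomsky normal form.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption).
# Each rule is (parent, left_child, right_child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # a PP can modify the verb phrase ...
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # ... or a noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "boy": ["N"], "mountain": ["N"], "telescope": ["N"],
    "on": ["P"], "with": ["P"],
}

def count_parses(sentence):
    """Count the parse trees rooted in S that the toy grammar allows."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[(i, i + 1)][symbol] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)]["S"]

print(count_parses("I saw the boy."))
print(count_parses("I saw the boy on the mountain with the telescope."))
```

Under this toy grammar, the short sentence has a single parse, whereas the telescope sentence has several, one for each way of attaching the two prepositional phrases. A parser needs extra information (context, statistics, or case marking as in Russian) to choose among them.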

Languages and Regional Accents

Accents, slang, and dialects have some common features, but there can be some significant differences. Accents involve modifying the standard pronunciation of words, which can vary significantly in different parts of the same country.

One interesting phenomenon pertains to the southern region of some countries (in the northern hemisphere), which tend to have a more “relaxed” pronunciation compared to the northern region of that country. For example, some people in the southeastern United States speak with a so-called “drawl,” whereas newscasters will often speak with a midwestern pronunciation, which is considered a neutral pronunciation. The same is true of people in Tokyo, who often speak Japanese with a “flat” pronunciation (which is also true of Japanese newscasters on NHK), versus people from the Kansai region (Kyoto, Kobe, and Osaka) of Japan, who vary the tone and emphasis of Japanese words.

Regional accents can also involve modifying the meaning of words in ways that are specific to the region in question. For example, Texans will say “I’m fixing to graduate this year,” whereas people from other parts of the United States would say “going” instead of “fixing.” In France, Parisians are unlikely to say Il faut fatiguer la salade (“it’s necessary to toss the salad”), whereas this sentence is much more commonplace in southern France. (The English word “fatigue” is derived from the French verb fatiguer.)

Verbs exist in every written language, and they undergo conjugation that reflects their tense and mood in a sentence. Such languages have an overlapping set of verb tenses, but there are differences. For instance, Portuguese has a future perfect subjunctive, as does Spanish (but it’s almost never used in spoken form), whereas these verb forms do not exist in English. English verb tenses (in the indicative mood) can include:

• present
• present perfect
• present progressive
• present perfect progressive
• preterite (simple past)
• past perfect
• past progressive
• past perfect progressive
• future tense
• future perfect
• future progressive
• future perfect progressive (does not exist in Italian)

Here are some examples of English sentences that illustrate the preceding verb forms, in the same order:

• I read a book.
• I have read a book.
• I am reading a book.
• I have been reading a book.
• I read a book. (preterite; pronounced “red”)
• I had read a book.
• I was reading a book.
• I had been reading a book.
• I will read a book.
• I will have read a book.
• I will be reading a book.
• At 6 p.m., I will have been reading a book for 3 hours.
Verb moods can be indicative (as shown in the preceding list), subjunctive (discussed soon), and conditional (“I would go but I have work to do”). In English, subjunctive verb forms can include the present subjunctive (“I insist that he do the task”), the past subjunctive (“If I were you”), and the pluperfect subjunctive (“Had I but known …”). As noted earlier, Portuguese also provides a future perfect subjunctive verb form; Spanish also has this verb form, but it is almost never used in conversation.
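The twelve indicative verb forms listed above can be viewed as combinations of three tenses (present, past, future) with four aspects (simple, perfect, progressive, perfect progressive). The following minimal sketch builds them for the subject “I”; the function name english_forms and its parameters are hypothetical, and the auxiliary verbs are hard-coded for the first person singular only.

```python
def english_forms(base, past, past_participle, present_participle):
    """Return the 12 first-person indicative forms as {(tense, aspect): phrase}."""
    # Auxiliaries for the subject "I", keyed by tense.
    have = {"present": "have", "past": "had", "future": "will have"}
    be = {"present": "am", "past": "was", "future": "will be"}
    simple = {"present": base, "past": past, "future": f"will {base}"}

    forms = {}
    for tense in ("present", "past", "future"):
        forms[(tense, "simple")] = f"I {simple[tense]}"
        forms[(tense, "perfect")] = f"I {have[tense]} {past_participle}"
        forms[(tense, "progressive")] = f"I {be[tense]} {present_participle}"
        forms[(tense, "perfect progressive")] = f"I {have[tense]} been {present_participle}"
    return forms

forms = english_forms("read", "read", "read", "reading")
for (tense, aspect), phrase in forms.items():
    print(f"{tense:8s} {aspect:20s} {phrase} a book.")
```

A language such as Mandarin, discussed next, would need no table like this one: the verb stays the same and a separate time word carries the time frame.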

Interestingly (from a linguistic perspective, at least), there are modern languages, such as Mandarin, that have only one verb tense: they rely on other words in a sentence (such as time adverbs or aspect particles) to convey the time frame. Such languages would express the present, the past, and the future in a form that is comparable to the following:

• “I read a book now.”
• “I read a book yesterday.”
• “I read a book tomorrow.”

Peak Usage of Some Languages

As you might have surmised, different languages have been in an influential position during the past 2,000 years. If you trace the popularity and influence of Indo-European and Semitic languages, you will find periods of time with varying degrees of influence involving multiple languages, including Hebrew, Greek, Latin, Arabic, French, and English.

Latin is an Indo-European language (its alphabet was apparently derived from the Etruscan and Greek alphabets), and during the 1st century AD, Latin became a mainstream language. In addition, the Romance languages are derived from Latin. Today Latin is considered a dead language in the sense that it’s not actively spoken on a daily basis by large numbers of people. The same is true of Sanskrit, which is a very old language from India.

During the Roman Empire, Latin and Greek were the official languages for administrative as well as military activities. In addition, Latin was an important language for diplomacy among countries for many centuries after the fall of the Roman Empire.

You might be surprised to know that Arabic was the lingua franca throughout the Mediterranean during the 10th and 11th centuries AD. As another example, French was spoken in many parts of Europe during the 18th century, including by the Russian aristocracy.

Today English appears to be in its ascendancy in terms of the number of native English speakers as well as the number of people who speak English as a second (or third or fourth) language. Although Mandarin is a widely spoken Asian language, English is the lingua franca for commerce as well as technology: virtually every computer language is based on English.

Languages and Slang

The existence of slang words is interesting and perhaps inevitable: they seem to flourish in every human language. Sometimes slang words are used for obfuscation so that only members of an “in group” understand the modified meaning of those words. Slang words can also be a combination of existing words, new words (that are not officially recognized), and shorthand expressions. Slang can also “invert” the meaning of words (“bad” meaning “good”), which can be specific to an age group, minority, or region. In addition, slang can assign an entirely unrelated meaning to a standard word (e.g., the slang terms “that’s dope,” “that’s sick,” and “the bomb”).

Slang words can also be specific to an age group to prevent communication with members of different age groups. For example, Japanese teens can communicate with each other by reversing the order of the syllables in a word, which renders those “words” incomprehensible to adults. The inversion of syllables is far more complex than “pig Latin,” in which the first letter of a word is shifted to the end of the word, followed by the syllable “ay.” For example, “East Bay” (an actual region of the San Francisco Bay Area) is humorously described as pig Latin for “beast.”
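The pig Latin rule described above is simple enough to sketch in a few lines. The function name pig_latin is hypothetical, and the code implements only the simplified rule from the text (move the first letter, then append “ay”); common variants of the game move a whole initial consonant cluster and treat words that begin with a vowel differently.

```python
def pig_latin(word):
    """Move the first letter to the end of the word and append 'ay'."""
    if not word:
        return word
    return word[1:] + word[0] + "ay"

for word in ["beast", "pig", "latin"]:
    print(word, "->", pig_latin(word))
```

Applied to “beast,” the rule produces “eastbay,” which is the source of the joke about the East Bay.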

Teenagers also use acronyms (perhaps as another form of slang) when sending text messages to each other. For example, the acronym “aos” means “adult over shoulder.” The acronym “bos” has several different meanings, including “brother over shoulder” and “boyfriend over shoulder.”

The slang terms that you use with your peers invariably simplify communication with others in your in-group, sometimes accompanied by specialized interpretations of words (such as reversing their meaning). A simple example is the word zanahoria, which is the Spanish word for carrot. In colloquial speech in Venezuela, calling someone a zanahoria means that the person is very conservative and as “straight” as a carrot.

Slang enables people to be creative and also playfully break the rules of language. Both slang and colloquial speech simplify formal language and rarely (if ever) introduce greater complexity in alternate speech rules.

Perhaps that’s the reason that slang and colloquial speech cannot be controlled or regulated by anyone (or by any language committee): like water, they are fluid and adapt to the preferences of their speakers.

One more observation: while slang can be viewed as a creative by-product of standard speech, there is a reverse effect that can occur in certain situations. For example, you have probably noticed how influential subgenres are eventually absorbed (perhaps only partially) into mainstream culture: witness how advertisers eventually incorporated a “softened” form of rap music and its rhythm into commercials for personal products. There’s a certain irony in hearing “Stairway to Heaven” as elevator music.

Another interesting concept is a “meme” (which includes Internet memes) in popular culture, which refers to something with humorous content. While slang words are often used to exclude people, a meme often attempts to communicate a particular sentiment. One such meme is “OK Boomer,” which some people view as a derogatory remark that’s sometimes expressed in a snarky manner, and much less often interpreted as a humorous term. Although language dialects can also involve regional accents and slang, they also have more distinct characteristics, as discussed in the next section.

NLP Concepts

THE ORIGIN OF LANGUAGES

Someone once remarked that “the origin of language is an enigma,” which is viscerally appealing because it has at least a kernel of truth. Although there are multiple theories that attempt to explain how and why languages developed, none of them has attained universal consensus. Nevertheless, there is no doubt that humans have far surpassed all other species in terms of language development.

There is also the question of how the vocabulary of a language is formed, which can involve the confluence of multiple factors, as well as the question of how meaning arises in a language. According to Ludwig Wittgenstein (1953), a philosopher who was influential in many fields, language derives its meaning from use.

One theory about the evolution of language in humans asserts that the need for communication between humans makes language a necessity. Another explanation is that language is influenced by the task of creating complex tools, because the latter requires a precise sequence of steps, which ultimately spurred the development of languages.

Without delving into their details, the following list contains some theories that have been proposed regarding language development. Keep in mind that they vary in terms of their support in the academic community:

• Strong Minimalist Thesis
• The FlintKnapper Theory
• The Sapir-Whorf Hypothesis
• Universal Grammar (Noam Chomsky)
The Strong Minimalist Thesis (SMT) asserts that language is based on something called the hierarchical syntactic structure. The FlintKnapper Theory asserts that the ability to create complex tools involved an intricate sequence of steps, which in turn necessitated communication between people. In simplified terms, the Sapir-Whorf Hypothesis (also called the linguistic relativity hypothesis, which is a slightly weaker form) posits that the language we speak influences how we think. Consider how our physical environment can influence our spoken language: Inuit languages are often said to have several words to describe snow, whereas people in some parts of the Middle East have never seen a snow storm.

Language Fluency

As mentioned in the previous section, human infants are capable of producing the sounds of any language, given enough opportunity to imitate those sounds. They tend to lose some of that capacity as they become older, which might explain why some adults speak another language with an accent (of course, there are plenty of exceptions).

Interestingly, babies respond favorably to the sound of vowel-rich “Parentese” and a study in 2018 suggested that babies prefer the sound of other babies instead of their mother:
https://getpocket.com/explore/item/babies-prefer-the-sounds-of-otherbabies-to-the-cooing-of-their-parents

There are two interesting cases in which people can acquire native-level speech capability. The first case is intuitive: people who have been raised in a bilingual (or multilingual) environment tend to have a greater capacity for learning how to speak other languages with native-level (or near-native-level) speech. Second, people who speak phonetic languages have an advantage when they study another phonetic language, especially one that is in their language group, because they already know how to pronounce the majority of vowel sounds. There are also languages whose pronunciation can be a challenge for practically every non-native speaker. For example, letters that have a guttural sound (such as those in Dutch, German, and Arabic), the glottal stop (most noticeable in Arabic), and the letter “ain” in Arabic are generally more challenging to pronounce for native speakers of Romance languages and some Asian languages.

To some extent, the non-phonetic nature of the English language might explain why some monolingual native-English speakers might struggle with learning to speak other languages with native-level speech. Perhaps the closest language to English (in terms of cadence) is Dutch, and people from Holland can often speak native-level English. This tends to be true of Swedes and Danes as well, whose languages are Germanic, but not necessarily true of Germans, who can speak grammatically perfect English but sometimes speak English with an accent.

Perhaps somewhat ironically, sometimes accents can impart a sort of cachet, such as speaking with a British or Australian accent in the United States. Indeed, a French accent can also add a certain je-ne-sais-quoi to a speaker in various parts of the United States.

Major Language Groups

There are more than 140 language families, and the six largest language families (based on language count) are listed here:

• Niger-Congo
• Austronesian
• Trans-New Guinea
• Sino-Tibetan
• Indo-European
• Afro-Asiatic
English belongs to the Indo-European group, Mandarin belongs to the Sino-Tibetan group, and Arabic belongs to the Afro-Asiatic group. According to Wikipedia, Indo-European languages comprise almost 600 languages, including most of the languages in Europe, the northern Indian subcontinent, and the Iranian plateau. Almost half the world speaks an Indo-European language as a native language, which is more than for any other language family listed at the beginning of this section. Indo-European has several major language subgroups, including the Germanic, Slavic, and Romance languages. The preceding information is from the following Wikipedia link:
https://en.wikipedia.org/wiki/List_of_language_families
As of 2019, the top five languages spoken in the world, counting both native speakers and second-language speakers, are as follows:
• English: 1.268 billion
• Mandarin: 1.120 billion
• Hindi: 637.3 million
• Spanish: 537.9 million
• French: 276.6 million
The preceding information is from the following Wikipedia link:
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers

Many factors can influence the expansion of a given language into multiple countries, such as commerce, economic factors, technological influence, and warfare, thereby resulting in the absorption of new words by another language. Somewhat intuitively, countries with a common border influence each other’s language, sometimes resulting in new hybrid languages.

WHAT IS IMBALANCED CLASSIFICATION

Imbalanced classification involves datasets with imbalanced classes. For example, suppose that class A has 99% of the data and class B has 1%. Which classification algorithm would you use? Unfortunately, most classification algorithms don’t work well with this type of imbalanced dataset. Here is a list of several well-known techniques for handling imbalanced datasets:

• Random resampling rebalances the class distribution.
• Random oversampling duplicates data in the minority class.
• Random undersampling deletes examples from the majority class.
• SMOTE
Random resampling transforms the training dataset into a new dataset, which is effective for imbalanced classification problems.
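The two random resampling strategies listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a library implementation: the function names, the 90/10 toy dataset, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter


def group_by_label(rows, labels):
    """Collect the rows that belong to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out = []
    for label, group in groups.items():
        # sample() draws without replacement
        out.extend((row, label) for row in rng.sample(group, target))
    return out


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out = []
    for label, group in groups.items():
        # choices() draws with replacement, so rows can be duplicated
        extra = rng.choices(group, k=target - len(group))
        out.extend((row, label) for row in group + extra)
    return out


# Toy dataset (assumed): 90 rows of class A and 10 rows of class B
rows = list(range(100))
labels = ["A"] * 90 + ["B"] * 10
under = Counter(label for _, label in random_undersample(rows, labels))
over = Counter(label for _, label in random_oversample(rows, labels))
print(under, over)
```

Undersampling shrinks the dataset to the size of the minority class, whereas oversampling grows it to the size of the majority class; both leave the classes balanced.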

The random undersampling technique removes samples from the dataset, and involves the following:

• randomly remove samples from majority class
• can be performed with or without replacement
• alleviates imbalance in the dataset
• may increase the variance of the classifier
• may discard useful or important samples
However, random undersampling does not work well with a dataset that has a $99 \% / 1 \%$ split into two classes. Moreover, undersampling can result in losing information that is useful for a model.

Instead of random undersampling, another approach involves generating additional samples for the minority class. The simplest technique is random oversampling, which duplicates existing examples from the minority class.
A second technique is often more effective than plain duplication, and it involves the following:

• synthesize new examples from minority class
• a type of data augmentation for tabular data
• this technique can be very effective
• generate new samples from minority class
Another well-known technique is called SMOTE, which involves data augmentation (i.e., synthesizing new data samples) before you train a classification algorithm. SMOTE was originally formulated with the kNN algorithm (other options are available), and it can be an effective technique for handling imbalanced classes.

Yet another option to consider is the Python package imbalanced-learn in the scikit-learn-contrib project. This project provides various re-sampling techniques for datasets that exhibit class imbalance. More details are available online:
https://github.com/scikit-learn-contrib/imbalanced-learn.

机器学习代写|自然语言处理代写NLP代考|WHAT IS SMOTE

SMOTE is a technique for synthesizing new samples for a dataset. This technique is based on linear interpolation:

• Step 1: Select samples that are close in the feature space.
• Step 2: Draw a line between the samples in the feature space.
• Step 3: Draw a new sample at a point along that line.
A more detailed explanation of the SMOTE algorithm is as follows:
• Select a random sample “a” from the minority class.
• Find $\mathrm{k}$ nearest neighbors for that example.
• Select a random neighbor “b” from the nearest neighbors.
• Create a line “L” that connects “a” and “b.”
• Randomly select one or more points “c” on line L.
If need be, you can repeat this process for the other $(\mathrm{k}-1)$ nearest neighbors to distribute the synthetic values more evenly among the nearest neighbors.
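The interpolation steps above can be sketched as follows. This is a simplified illustration of the idea under stated assumptions, not the reference SMOTE implementation: the function name `smote_sketch`, the Euclidean distance, the default `k`, and the toy points are all hypothetical choices.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Synthesize n_new points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Select a random sample "a" from the minority class
        a = rng.choice(minority)
        # Find the k nearest neighbors of "a" (Euclidean distance)
        others = [p for p in minority if p is not a]
        neighbors = sorted(others, key=lambda p: math.dist(a, p))[:k]
        # Select a random neighbor "b"
        b = rng.choice(neighbors)
        # Pick a random point "c" on the line segment between "a" and "b"
        t = rng.random()
        c = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
        synthetic.append(c)
    return synthetic


# Toy minority class (assumed): five points in the unit square
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Because each synthetic point lies on a segment between two existing minority samples, it always stays inside the region that the minority class already occupies.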

The original SMOTE algorithm relies on the kNN algorithm to find neighboring samples. It has since been extended in various ways, such as replacing $\mathrm{kNN}$ with SVM. A list of SMOTE extensions is shown as follows:

• selective synthetic sample generation
• Borderline-SMOTE (kNN)
• Borderline-SMOTE (SVM)

机器学习代写|自然语言处理代写NLP代考|ANALYZING CLASSIFIERS

This section is marked “optional” because its contents pertain to machine learning classifiers, which are not the focus of this book. However, it’s still worthwhile to glance through the material, or perhaps return to this section after you have a basic understanding of machine learning classifiers.

Several well-known techniques are available for analyzing the quality of machine learning classifiers. Two techniques are LIME and ANOVA, both of which are discussed in the following subsections.

LIME is an acronym for Local Interpretable Model-Agnostic Explanations. LIME is a model-agnostic technique that can be used with machine learning models. In LIME, you make small random changes to data samples and then observe the manner in which predictions change (or not). In other words, the approach involves changing the input (slightly) and then observing what happens to the output.

By way of analogy, consider food inspectors who test for bacteria in truckloads of perishable food. Clearly, it’s infeasible to test every food item in a truck (or a train car), so inspectors perform “spot checks” that involve testing randomly selected items. In an analogous fashion, LIME makes small changes to input data in random locations and then analyzes the changes in the associated output values.
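The "perturb the input and watch the output" idea can be illustrated with a small sensitivity check. Note that this is only a sketch in the spirit of LIME, not the LIME algorithm itself (which fits a weighted interpretable surrogate model) and not the API of the lime package; the `black_box` model and the function `local_sensitivity` are hypothetical names invented for the example.

```python
import random


def black_box(x):
    """Hypothetical model whose prediction depends mostly on feature 0."""
    return 3.0 * x[0] + 0.1 * x[1]


def local_sensitivity(predict, x, n_samples=200, scale=0.1, seed=0):
    """Average change in the prediction per unit change of each feature near x."""
    rng = random.Random(seed)
    base = predict(x)
    slopes = []
    for i in range(len(x)):
        total = 0.0
        for _ in range(n_samples):
            # Small random change to one feature (fall back to scale if it is 0)
            delta = rng.uniform(-scale, scale) or scale
            perturbed = list(x)
            perturbed[i] += delta
            total += (predict(perturbed) - base) / delta
        slopes.append(total / n_samples)
    return slopes


slopes = local_sensitivity(black_box, [1.0, 2.0])
print(slopes)
```

For this linear toy model, the estimated sensitivities recover the coefficients of the model, which shows that feature 0 drives the prediction near the chosen sample.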

However, there are two caveats to keep in mind when you use LIME with input data for a given model:

1. The actual changes to input values are model-specific.
2. This technique works on input that is interpretable.
Examples of interpretable input include the features used by tree-based classifiers (such as trees and random forests) and NLP representations such as BoW (Bag of Words). Non-interpretable input involves “dense” data, such as a word embedding (which is a vector of floating point numbers).

You could also substitute your model with another model that involves interpretable data, but then you need to evaluate how accurate the approximation is to the original model.

机器学习代写|自然语言处理代写NLP代考|MISSING DATA, ANOMALIES, AND OUTLIERS

机器学习代写|自然语言处理代写NLP代考|Missing Data

How you decide to handle missing data depends on the specific dataset. Here are some ways to handle missing data (the first three techniques are manual techniques, and the other techniques are algorithms):

1. replace missing data with the mean/median/mode value
2. infer (“impute”) the value for missing data
3. delete rows with missing data
4. isolation forest (tree-based algorithm)
5. minimum covariance determinant
6. local outlier factor
7. one-class SVM (Support Vector Machines)
In general, replacing a missing numeric value with zero is a risky choice: this value is obviously incorrect if the values of a feature are between 1,000 and 5,000. For a feature that has numeric values, replacing a missing value with the average value is better than the value zero (unless the average equals zero); also consider using the median value. For categorical data, consider using the mode to replace a missing value.
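As a rough sketch of the mean/median/mode replacement described above, the following uses Python's standard statistics module. The function name `impute`, the use of None to mark a missing value, and the sample data are assumptions made for the example.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Numeric feature with one missing value (assumed data)
prices = [1000, 2000, None, 5000, 3000]
mean_filled = impute(prices, "mean")
median_filled = impute(prices, "median")

# Categorical feature: the mode is the natural replacement
colors = ["red", None, "blue", "red"]
mode_filled = impute(colors, "mode")
print(mean_filled, median_filled, mode_filled)
```

Here the mean and the median give different fill values (2,750 versus 2,500) because the value 5,000 pulls the mean upward, which is one reason to consider the median.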

If you are not confident that you can impute a “reasonable” value, consider training two models: one on the dataset with the imputed value, and one on the dataset from which the row with the missing value has been dropped. You can then compare the results.

One problem that can arise after removing rows with missing values is that the resulting dataset is too small. In this case, consider using SMOTE, which is discussed later in this chapter, in order to generate synthetic data.

机器学习代写|自然语言处理代写NLP代考|Anomalies and Outliers

In simplified terms, an outlier is an abnormal data value that is outside the range of “normal” values. For example, a person’s height in centimeters is typically between 30 centimeters and 250 centimeters. Hence, a data point (e.g., a row of data in a spreadsheet) with a height of 5 centimeters or a height of 500 centimeters is an outlier. The consequences of these outlier values are unlikely to involve a significant financial or physical loss (though they could adversely affect the accuracy of a trained model).

Anomalies are also outside the “normal” range of values (just like outliers), and they are typically more problematic than outliers because they can have more severe consequences. For example, consider the scenario in which someone who lives in California suddenly makes a credit card purchase in New York. If the person is on vacation (or a business trip), then the purchase is an outlier (it’s outside the typical purchasing pattern), but it’s not an issue. However, if that person was in California when the credit card purchase was made, then it’s most likely to be credit card fraud, as well as an anomaly.

Unfortunately, there is no simple way to decide how to deal with anomalies and outliers in a dataset. Although you can drop rows that contain outliers, keep in mind that doing so might deprive the dataset (and therefore the trained model) of valuable information. You can try modifying the data values (described as follows), but again, this might lead to erroneous inferences in the trained model. Another possibility is to train a model with the dataset that contains anomalies and outliers, and then train a model with a dataset from which the anomalies and outliers have been removed. Compare the two results and see if you can infer anything meaningful regarding the anomalies and outliers.

机器学习代写|自然语言处理代写NLP代考|Outlier Detection

Although the decision to keep or drop outliers is yours to make, there are some techniques available that help you detect outliers in a dataset. This section contains a short list of some techniques, along with a very brief description and links for additional information.

Trimming is perhaps the simplest technique: it involves removing rows whose feature value is in the upper $5 \%$ range or the lower $5 \%$ range. Winsorizing the data is an improvement over trimming: instead of removing rows, set the values in the top $5 \%$ range equal to the value at the 95th percentile, and set the values in the bottom $5 \%$ range equal to the value at the 5th percentile.
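The difference between trimming and Winsorizing can be sketched as follows. This is a minimal rank-based approximation (percentiles can be defined in several ways); the function names and the height data, which include two obvious outliers, are assumptions made for the example.

```python
def trim(values, pct=5):
    """Drop the lowest and highest pct percent of the values (by rank)."""
    ordered = sorted(values)
    k = len(ordered) * pct // 100
    return ordered[k:len(ordered) - k]


def winsorize(values, pct=5):
    """Clamp the lowest and highest pct percent to the nearest retained value."""
    ordered = sorted(values)
    k = len(ordered) * pct // 100
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(v, low), high) for v in values]


# Heights in centimeters (assumed data) with outliers of 5 cm and 500 cm
heights = [5] + list(range(150, 168)) + [500]
trimmed = trim(heights)
winsorized = winsorize(heights)
print(len(heights), len(trimmed), len(winsorized))
```

Trimming shortens the dataset from 20 rows to 18, whereas Winsorizing keeps all 20 rows and replaces 5 with 150 and 500 with 167.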

The Minimum Covariance Determinant is a covariance-based technique, and a Python-based code sample that uses this technique is available online:
https://scikit-learn.org/stable/modules/outlier_detection.html.
The Local Outlier Factor (LOF) technique is an unsupervised technique that calculates a local anomaly score via the kNN (k Nearest Neighbors) algorithm. Documentation and short code samples that use LOF are available online:
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html.

Two other techniques involve the Huber and the Ridge classes, both of which are included as part of scikit-learn. The Huber loss is less sensitive to outliers because it grows linearly (rather than quadratically) for large errors, similar to the MAE (Mean Absolute Error). A code sample that compares Huber and Ridge is available online:
https://scikit-learn.org/stable/auto_examples/linear_model/plot_huber_vs_ridge.html.
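To see why a loss that becomes linear for large errors is less sensitive to outliers, compare it with the squared loss. The following is a sketch of the textbook Huber loss, not the scikit-learn implementation; the threshold name `delta` and its default value are assumptions.

```python
def squared_loss(residual):
    """Quadratic penalty: large errors dominate the total loss."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)


for residual in (0.5, 2.0, 10.0):
    print(residual, squared_loss(residual), huber_loss(residual))
```

For a residual of 10, the squared loss is 50 whereas the Huber loss is only 9.5, so a single outlier has far less influence on the fitted model.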

You can also explore the Theil-Sen estimator and RANSAC, which are “robust” against outliers:
https://scikit-learn.org/stable/auto_examples/linear_model/plot_theilsen.html and
https://en.wikipedia.org/wiki/Random_sample_consensus.
Four algorithms for outlier detection are discussed at the following site:
https://www.kdnuggets.com/2018/12/four-techniques-outlier-detection.html.

One other scenario involves “local” outliers. For example, suppose that you use kMeans (or some other clustering algorithm) and determine that a value is an outlier with respect to one of the clusters. While this value is not necessarily an “absolute” outlier, detecting such a value might be important for your use case.

机器学习代写|自然语言处理代写NLP代考|Scaling Numeric Data via Standardization

The standardization technique involves finding the mean $\mu$ (mu) and the standard deviation $\sigma$ (sigma), and then mapping each value $x_{i}$ to $\left(x_{i}-\mu\right) / \sigma$. Recall the following formulas:
$$\mu=\frac{1}{n} \sum_{i=1}^{n} x_{i}$$
$$\operatorname{variance}(x)=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}$$
$$\sigma=\sqrt{\operatorname{variance}(x)}$$
As a simple illustration of standardization, suppose that the random variable $x$ has the values $\{-1,0,1\}$. Then $\mu$ and $\sigma$ are calculated as follows:
$$\mu=\frac{-1+0+1}{3}=0$$
$$\operatorname{variance}(x)=\frac{(-1-0)^{2}+(0-0)^{2}+(1-0)^{2}}{3}=\frac{2}{3}$$
$$\sigma=\sqrt{2 / 3} \approx 0.816$$
Hence, the standardization of $\{-1,0,1\}$ is $\{-1 / 0.816,0 / 0.816,1 / 0.816\}$, which in turn equals the set of values $\{-1.2247,0,1.2247\}$ (using the unrounded value of $\sigma$).

As another example, suppose that the random variable $x$ has the values $\{-6,0,6\}$. Then $\mu$ and $\sigma$ are calculated as follows:
$$\mu=\frac{-6+0+6}{3}=0$$
$$\operatorname{variance}(x)=\frac{(-6-0)^{2}+(0-0)^{2}+(6-0)^{2}}{3}=\frac{72}{3}=24$$
$$\sigma=\sqrt{24} \approx 4.899$$
Hence, the standardization of $\{-6,0,6\}$ is $\{-6 / 4.899,0 / 4.899,6 / 4.899\}$, which in turn equals the set of values $\{-1.2247,0,1.2247\}$.

In the preceding two examples, the mean equals 0 in both cases, but the variance and standard deviation are significantly different; even so, both sets standardize to the same values. By contrast, the normalization of a set of values always produces a set of numbers between 0 and 1.

However, the standardization of a set of values can generate numbers that are less than $-1$ and greater than 1; this will occur whenever $\sigma$ is less than the largest value of $\left|\mu-x_{i}\right|$, where the latter is the absolute value of the difference between $\mu$ and each $x_{i}$ value. In the first example, the largest difference equals 1, whereas $\sigma$ is approximately $0.816$, and therefore the largest standardized value is greater than 1.
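The two worked examples can be checked with a short script. This is a minimal sketch that uses the population standard deviation (division by n, as in the formulas above); the function name `standardize` is an assumption, and the function does not handle the case where sigma equals zero.

```python
import statistics


def standardize(values):
    """Map each x to (x - mu) / sigma using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # divides by n, not n - 1
    return [(x - mu) / sigma for x in values]


first = standardize([-1, 0, 1])
second = standardize([-6, 0, 6])
print([round(z, 4) for z in first])
print([round(z, 4) for z in second])
```

Both calls produce the same standardized values, which illustrates that standardization removes the original scale of the data.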

机器学习代写|自然语言处理代写NLP代考|What to Look for in Categorical Data

This section contains various suggestions for handling inconsistent data values, and you can determine which ones to adopt based on any additional factors that are relevant to your particular task. For example, consider dropping columns that have very low cardinality (equal to or close to 1), as well as numeric columns with zero or very low variance.

Next, check the contents of categorical columns for inconsistent spellings or errors. A good example pertains to the gender category, which can consist of a combination of the following values:
• male
• Male
• female
• Female
• m
• f
• M
• F
The preceding categorical values for gender can be replaced with two categorical values (unless you have a valid reason to retain some of the other values). Moreover, if you are training a model whose analysis involves a single gender, then you need to determine which rows (if any) of a dataset must be excluded. Also check categorical data columns for redundant or missing white spaces.

Check for data values that have multiple data types, such as a numerical column with numbers as numerals and some numbers as strings or objects.
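One way to clean up inconsistent spellings and stray white space is to normalize each value and then look it up in a table of canonical values. The following is a minimal sketch; the mapping table and the function name `normalize_gender` are assumptions made for the example, and unexpected values are returned as None so that they can be reviewed.

```python
# Canonical value for each accepted spelling (after strip + lowercase)
CANONICAL = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_gender(raw):
    """Return 'male' or 'female', or None for an unexpected value."""
    key = raw.strip().lower()  # removes redundant white space and case differences
    return CANONICAL.get(key)


raw_values = ["male", "Male", "female", "Female", "m", "f", "M", "F", " F "]
cleaned = [normalize_gender(v) for v in raw_values]
unexpected = normalize_gender("unknown")
print(cleaned, unexpected)
```

After normalization, the nine raw spellings collapse into two categorical values.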

机器学习代写|自然语言处理代写NLP代考|Mapping Categorical Data to Numeric Values

Character data is often called categorical data, examples of which include people’s names, home or work addresses, and email addresses. Many types of categorical data involve short lists of values. For example, the days of the week and the months in a year involve seven and twelve distinct values, respectively. Notice that the days of the week have a relationship: For example, each day has a previous day and a next day. However, the colors of an automobile are independent of each other: the color red is not “better” or “worse” than the color blue.

There are several well-known techniques for mapping categorical values to a set of numeric values. A simple example where you need to perform this conversion involves the gender feature in the Titanic dataset. This feature is one of the relevant features for training a machine learning model. The gender feature has $\{M, F\}$ as its set of possible values. As you will see later in this chapter, Pandas makes it very easy to convert the set of values $\{M, F\}$ to the set of values $\{0,1\}$.

Another mapping technique involves mapping a set of categorical values to a set of consecutive integer values. For example, the set {Red, Green, Blue} can be mapped to the set of integers $\{0,1,2\}$. The set {Male, Female} can be mapped to the set of integers $\{0,1\}$. The days of the week can be mapped to $\{0,1,2,3,4,5,6\}$. Note that the first day of the week depends on the country: In some cases it’s Sunday, and in other cases it’s Monday.

Another technique is called one-hot encoding, which converts each value to a vector (check Wikipedia if you need a refresher regarding vectors). Thus, {Male, Female} can be represented by the vectors $[1,0]$ and $[0,1]$, and the colors {Red, Green, Blue} can be represented by the vectors $[1,0,0]$, $[0,1,0]$, and $[0,0,1]$.

If you vertically “line up” the two vectors for gender, they form a $2 \times 2$ identity matrix, and doing the same for the colors will form a $3 \times 3$ identity matrix, as shown here:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
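Both mappings (consecutive integers and one-hot vectors) can be sketched in plain Python. The function names are assumptions made for the example; in practice a library such as Pandas or scikit-learn would typically do this work.

```python
def label_encode(categories):
    """Map each category to a consecutive integer, in the order given."""
    return {category: index for index, category in enumerate(categories)}


def one_hot_encode(categories):
    """Map each category to a vector with a single 1 in its own position."""
    n = len(categories)
    return {
        category: [1 if i == j else 0 for j in range(n)]
        for i, category in enumerate(categories)
    }


colors = ["Red", "Green", "Blue"]
color_ids = label_encode(colors)
color_vectors = one_hot_encode(colors)
print(color_ids)
print(color_vectors)

# Stacking the vectors row by row gives the identity matrix
identity = [color_vectors[c] for c in colors]
```

One-hot encoding avoids implying an order among the colors, which the integer mapping (Red < Green < Blue) might otherwise suggest to a model.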

机器学习代写|自然语言处理代写NLP代考|PREPARING DATASETS

机器学习代写|自然语言处理代写NLP代考|Discrete Data Versus Continuous Data

As a simple rule of thumb: discrete data is a set of values that can be counted, whereas continuous data must be measured. Discrete data can reasonably fit in a drop-down list of values, but there is no exact value for making such a determination. One person might think that a list of 500 values is discrete, whereas another person might think it’s continuous.

For example, the list of provinces of Canada and the list of states of the United States are discrete data values, but is the same true for the number of countries in the world (roughly 200) or for the number of languages in the world (more than 7,000)?

Values for temperature, humidity, and barometric pressure are considered continuous. Currency is also treated as continuous, even though there is a measurable difference between two consecutive values. The smallest unit of currency for U.S. currency is one penny, which is $1 / 100$th of a dollar (accounting-based measurements use the “mill,” which is $1 / 1,000$th of a dollar).

Continuous data types can have subtle differences. For example, someone who is 200 centimeters tall is twice as tall as someone who is 100 centimeters tall; the same is true for 100 kilograms versus 50 kilograms. However, temperature is different: 80 degrees Fahrenheit is not twice as hot as 40 degrees Fahrenheit, because zero on the Fahrenheit scale is an arbitrary reference point rather than an absence of heat.

Furthermore, keep in mind that the meaning of the word “continuous” in mathematics is not necessarily the same as continuous in machine learning. In mathematics, a continuous variable (let’s say in the 2D Euclidean plane) can have an uncountably infinite number of values. In machine learning, a feature in a dataset that can have more values than can be reasonably displayed in a drop-down list is treated as though it’s a continuous variable.

For instance, values for stock prices are discrete: they must differ by at least a penny (or some other minimal unit of currency), which is to say, it’s meaningless to say that the stock price changes by one-millionth of a penny. However, since there are so many possible stock values, it’s treated as a continuous variable. The same comments apply to car mileage, ambient temperature, and barometric pressure.

机器学习代写|自然语言处理代写NLP代考|“Binning” Continuous Data

Binning refers to subdividing a set of values into multiple intervals, and then treating all the numbers in the same interval as though they had the same value.

As a simple example, suppose that a feature in a dataset contains the age of people. The range of values is approximately between 0 and 120, and we could bin them into 12 equal intervals, where each consists of 10 values: 0 through 9, 10 through 19, 20 through 29, and so forth.

However, partitioning the values of people’s ages as described in the preceding paragraph can be problematic. Suppose that person A, person B, and person C are 29, 30, and 39, respectively. Then person A and person B are probably more similar to each other than person B and person C, but because of the way in which the ages are partitioned, B is classified as closer to C than to A. In fact, binning can increase Type I errors (false positive) and Type II errors (false negative), as discussed in this blog post (along with some alternatives to binning):
https://medium.com/@peterflom/why-binning-continuous-data-is-almost-always-a-mistake-ad0b3a1d141f.

As another example, using quartiles is even more coarse-grained than the earlier age-related binning example. The issue with binning pertains to the consequences of classifying people in different bins, even though they are in close proximity to each other. For instance, some people struggle financially because they earn a meager wage, and they are disqualified from financial assistance because their salary is higher than the cutoff point for receiving any assistance.
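The boundary problem with the age example can be reproduced in a few lines. This is a small sketch that uses integer division to assign ten-year bins; the function name `age_bin` is an assumption.

```python
def age_bin(age, width=10):
    """Return the index of the bin that contains the given age (0-9 is bin 0)."""
    return age // width


ages = {"A": 29, "B": 30, "C": 39}
bins = {name: age_bin(age) for name, age in ages.items()}
print(bins)

# B is one year from A but nine years from C, yet B shares a bin with C
gap_ab = abs(ages["A"] - ages["B"])
gap_bc = abs(ages["B"] - ages["C"])
print(gap_ab, gap_bc)
```

Persons B and C land in the same bin even though B is much closer in age to A, which is exactly the loss of information described above.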

机器学习代写|自然语言处理代写NLP代考|Scaling Numeric Data via Normalization

A range of values can vary significantly, and it’s important to note that they often need to be scaled to a smaller range, such as values in the range $[-1,1]$ or $[0,1]$, which you can do via the tanh function or the sigmoid function, respectively.
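As a quick check of the two squashing functions mentioned above, the following sketch applies tanh and a hand-written sigmoid to a few sample values. The `sigmoid` helper and the sample values are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


values = [-2.0, 0.0, 2.0]
tanh_scaled = [math.tanh(v) for v in values]   # results lie in (-1, 1)
sigmoid_scaled = [sigmoid(v) for v in values]  # results lie in (0, 1)
print(tanh_scaled)
print(sigmoid_scaled)
```

Both functions are nonlinear, so they compress large values much more than small ones; the linear scaling discussed later in this section preserves relative distances instead.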

For example, measuring a person’s height in terms of meters involves a range of values between 0.50 meters and 2.5 meters (in the vast majority of cases), whereas measuring height in terms of centimeters ranges between 50 centimeters and 250 centimeters: these two units differ by a factor of 100. A person’s weight in kilograms generally varies between 5 kilograms and 200 kilograms, whereas measuring weight in grams differs by a factor of 1,000. Distances between objects can be measured in meters or in kilometers, which also differ by a factor of 1,000.

In general, use units of measure so that the data values in multiple features belong to a similar range of values. In fact, some machine learning algorithms require scaled data, often in the range of $[0,1]$ or $[-1,1]$. In addition to the tanh and sigmoid functions, there are other techniques for scaling data, such as standardizing data (think Gaussian distribution) and normalizing data (linearly scaled so that the new range of values is in $[0,1]$).

The following examples involve a floating point variable $x$ with different ranges of values that will be scaled so that the new values are in the interval $[0,1]$.

• Example 1: If the values of $x$ are in the range $[0,2]$, then $x / 2$ is in the range $[0,1]$.
• Example 2: If the values of $x$ are in the range $[3,6]$, then $x-3$ is in the range $[0,3]$, and $(x-3) / 3$ is in the range $[0,1]$.
• Example 3: If the values of $x$ are in the range $[-10,20]$, then $x+10$ is in the range $[0,30]$, and $(x+10) / 30$ is in the range of $[0,1]$.
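The three examples above all follow the same pattern: subtract the minimum and divide by the width of the range. A minimal sketch, assuming a function name `min_max_scale` and a few sample values taken from each range:

```python
def min_max_scale(values, low=None, high=None):
    """Linearly rescale values so that low maps to 0 and high maps to 1."""
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    return [(x - low) / (high - low) for x in values]


example_1 = min_max_scale([0, 1, 2])        # range [0, 2]:    x / 2
example_2 = min_max_scale([3, 4.5, 6])      # range [3, 6]:    (x - 3) / 3
example_3 = min_max_scale([-10, 5, 20])     # range [-10, 20]: (x + 10) / 30
print(example_1, example_2, example_3)
```

The optional `low` and `high` arguments let you scale against a known range (such as 0 to 120 for ages) instead of the observed minimum and maximum.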

机器学习代写|tensorflow代写|Polynomial model|Using regression for call-center volume prediction

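TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks, but it has a particular focus on the training and inference of deep neural networks.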

机器学习代写|tensorflow代写|Cleaning the data for regression

First, download this data (a set of phone calls from the summer of 2014 from the New York City 311 service) from http://mng.bz/P16w. Kaggle has other 311 datasets, but you’ll use this particular data due to its interesting properties. The calls are formatted as a comma-separated values (CSV) file that has several interesting features, including the following:

• A unique call identifier showing the date when the call was created
• The location and ZIP code of the reported incident or information request
• The specific action that the agent on the call took to resolve the issue
• What borough (such as the Bronx or Queens) the call was made from
• The status of the call
This dataset contains a lot of useful information for machine learning, but for purposes of this exercise, you care only about the call-creation date. Create a new file named 311.py. Then write a function to read each line in the CSV file, detect the week number, and sum the call counts by week.

Your code will need to deal with some messiness in this data file. First, you aggregate individual calls, sometimes hundreds in a single day, into a seven-day or weekly bin, as identified by the bucket variable in listing 4.1. The freq (short for frequency) variable holds the number of calls per week and per year. If the 311 CSV contains more than a year’s worth of data (as other 311 CSVs that you can find on Kaggle do), adapt your code to allow for selection by year of the calls to train on. The result of the code in listing 4.1 is a freq dictionary whose values are the number of calls, indexed by year and by week number via the period variable. The t.tm_year variable holds the parsed year that results from passing the call-creation-time value (indexed in the CSV as date_idx, an integer defining the column number where the date field is located) and the date_parse format string to the strptime (string parse time) function of Python’s time library. The date_parse format string is a pattern defining the way the date appears as text in the CSV so that Python knows how to convert it to a structured time representation.
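The aggregation described above might look roughly like the following. This is a sketch under stated assumptions and is not listing 4.1: the names freq, bucket, date_idx, and date_parse come from the description, whereas the function name, the column position, the date format string, the way the week bucket is computed, and the in-memory sample rows are all assumptions made so that the example runs on its own.

```python
import csv
import io
import time
from collections import defaultdict


def calls_per_week(csv_file, date_idx=1, date_parse="%m/%d/%Y %I:%M:%S %p"):
    """Count calls per (year, week bucket), skipping rows with unusable dates."""
    freq = defaultdict(int)
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= date_idx or not row[date_idx].strip():
            continue  # missing date field
        try:
            t = time.strptime(row[date_idx], date_parse)
        except ValueError:
            continue  # date text does not match the assumed format
        bucket = (t.tm_yday - 1) // 7  # seven-day bin within the year
        freq[(t.tm_year, bucket)] += 1
    return dict(freq)


# Hypothetical sample rows standing in for the real 311 CSV file
sample = io.StringIO(
    "Unique Key,Created Date,Status\n"
    "1,06/01/2014 09:15:00 AM,Closed\n"
    "2,06/02/2014 11:30:00 PM,Open\n"
    "3,06/10/2014 01:00:00 PM,Closed\n"
    "4,,Closed\n"
)
weekly = calls_per_week(sample)
print(weekly)
```

With a real file, you would pass an open file handle instead of the StringIO object and adjust date_idx and date_parse to match the actual column layout.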

What’s in a bell curve? Predicting Gaussian distributions

A bell curve, or normal curve, is a common term for data that fits a normal distribution. The largest Y values occur in the middle of the distribution, at the mean X value, and the smaller Y values occur at the early and tail X values. We also call this a Gaussian distribution after the famous German mathematician Carl Friedrich Gauss, who was responsible for the Gaussian function that describes the normal distribution.

We can use the NumPy method np.random.normal to generate random points sampled from the normal distribution in Python. The following equation shows the Gaussian function that underlies this distribution:
$$e^{-\frac{(x-\mu)^{2}}{2 \sigma^{2}}}$$
The equation includes the parameters $\mu$ (pronounced mu) and $\sigma$ (pronounced sigma), where mu is the mean and sigma is the standard deviation of the distribution. Mu and sigma are the parameters of the model, and as you have seen, TensorFlow will learn the appropriate values for these parameters as part of training a model.

To convince yourself that you can use these parameters to generate bell curves, you can type the code snippet in listing 4.3 into a file named gaussian.py and then run it to produce the plot that follows it. The code in listing 4.3 produces the bell curve visualizations shown in figure 4.4. Note that I selected values of mu between -1 and 2 and values of sigma between 1 and 3, so the center points and widths of the curves in figure 4.4 should correspond to those values, inclusive. The code plots 120 linearly spaced points with X values between -3 and 3 and Y values between 0 and 1 that fit the normal distribution according to mu and sigma, and the output should look like figure 4.4.
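As a rough stand-in for the idea behind listing 4.3 (not the listing itself), the sketch below evaluates the Gaussian function on 120 linearly spaced X values between -3 and 3. It uses only the standard library and prints summary values instead of plotting. The helper names and the three (mu, sigma) pairs are assumptions chosen from the ranges mentioned in the text.

```python
import math

def gaussian(x, mu, sigma):
    """Unnormalized Gaussian function: exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

xs = linspace(-3.0, 3.0, 120)

# Hypothetical (mu, sigma) pairs within the ranges mentioned in the text.
curves = {
    (mu, sigma): [gaussian(x, mu, sigma) for x in xs]
    for mu, sigma in [(-1.0, 1.0), (0.0, 2.0), (2.0, 3.0)]
}

for (mu, sigma), ys in curves.items():
    peak_x = xs[ys.index(max(ys))]  # grid point where the curve is highest
    print(f"mu={mu}, sigma={sigma}: peak near x={peak_x:.2f}, max y={max(ys):.3f}")
```

Each curve peaks at the grid point closest to its mu, and every Y value stays between 0 and 1, which is why the call counts need to be normalized before fitting.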

Training your call prediction regressor

Now you are ready to use TensorFlow to fit your NYC 311 data to this model. It’s probably clear by looking at the curves that they seem to comport naturally with the 311 data, especially if TensorFlow can figure out the values of mu that put the center point of the curve near spring and summer and that have a fairly large call volume, as well as the sigma value that approximates the best standard deviation.

Listing 4.4 sets up the TensorFlow training session and the associated hyperparameters: the learning rate and the number of training epochs. I’m using a fairly large learning rate so that TensorFlow can scan the values of mu and sig by taking big enough steps before settling down. The number of epochs (5,000) gives the algorithm enough training steps to settle on optimal values. In local testing on my laptop, these hyperparameters arrived at strong accuracy (99%) and took less than a minute. But I could have chosen other hyperparameters, such as a learning rate of 0.5, and given the training process more steps (epochs). Part of the fun of machine learning is hyperparameter tuning, which is more art than science, though techniques such as meta-learning and tools such as HyperOpt may ease this process in the future. A full discussion of hyperparameter tuning is beyond the scope of this chapter, but an online search should yield thousands of relevant introductions.

When the hyperparameters are set up, define the placeholders X and Y, which will be used for the input week number and the associated (normalized) number of calls, respectively. Earlier, I mentioned normalizing the Y values and creating the ny_train variable in listing 4.2 to ease learning. The reason is that the Gaussian model function that we are attempting to learn produces Y values only between 0 and 1, because it is the exponential of a nonpositive number. The model function defines the Gaussian model to learn, with the associated variables mu and sig initialized arbitrarily to 1. The cost function is defined as the L2 norm, and the training uses gradient descent. After training your regressor for 5,000 epochs, the final steps in listing 4.4 print the learned values for mu and sig.
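To make the training loop concrete without depending on a particular TensorFlow version, here is a plain-Python sketch of the same idea: gradient descent on mu and sig for the Gaussian model. It is not listing 4.4. The data is synthetic and noise-free, generated from made-up parameters, the cost is a mean squared error, and the learning rate of 0.1 is an assumption chosen for this toy problem.

```python
import math

def model(x, mu, sig):
    """Gaussian model from the text: exp(-(x - mu)^2 / (2 * sig^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sig ** 2))

def cost(xs, ys, mu, sig):
    """Mean squared error between targets and model outputs."""
    return sum((y - model(x, mu, sig)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, learning_rate=0.1, epochs=5000):
    mu, sig = 1.0, 1.0  # arbitrary starting values, as in the text
    n = len(xs)
    for _ in range(epochs):
        grad_mu = grad_sig = 0.0
        for x, y in zip(xs, ys):
            f = model(x, mu, sig)
            err = f - y
            # Chain rule: derivatives of the squared error w.r.t. mu and sig.
            grad_mu += 2.0 * err * f * (x - mu) / sig ** 2
            grad_sig += 2.0 * err * f * (x - mu) ** 2 / sig ** 3
        mu -= learning_rate * grad_mu / n
        sig -= learning_rate * grad_sig / n
    return mu, sig

# Synthetic, noise-free targets generated from known (made-up) parameters.
true_mu, true_sig = 2.0, 1.5
xs = [-3.0 + 9.0 * i / 49 for i in range(50)]
ys = [model(x, true_mu, true_sig) for x in xs]

mu, sig = train(xs, ys)
print(f"learned mu={mu:.3f}, sig={sig:.3f}, cost={cost(xs, ys, mu, sig):.2e}")
```

With real call counts, the targets would be the normalized weekly frequencies and the inputs the week numbers. The gradients here are written out by hand, whereas TensorFlow derives them automatically.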

Polynomial model

Linear models may be an intuitive first guess, but real-world correlations are rarely so simple. The trajectory of a missile through space, for example, is curved relative to the observer on Earth. Wi-Fi signal strength degrades with an inverse square law. The change in height of a flower over its lifetime certainly isn’t linear.

When data points appear to form smooth curves rather than straight lines, you need to change your regression model from a straight line to something else. One such approach is to use a polynomial model. A polynomial is a generalization of a linear function. The $n$th degree polynomial looks like the following:
$$f(x)=w_{n} x^{n}+\ldots+w_{1} x+w_{0}$$
NOTE When $n=1$, a polynomial is simply a linear equation $f(x)=w_{1} x+w_{0}$.
Consider the scatter plot in figure 3.10, showing the input on the $x$-axis and the output on the $y$-axis. As you can tell, a straight line is insufficient to describe all the data. A polynomial function is a more flexible generalization of a linear function.
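To see how the coefficients $w_{0}, \ldots, w_{n}$ determine the model’s output, the following sketch evaluates a polynomial for a given coefficient list. The function name and the sample coefficients are illustrative assumptions, and Horner’s method is simply one convenient way to do the evaluation.

```python
def polynomial(x, w):
    """Evaluate f(x) = w[n]*x^n + ... + w[1]*x + w[0] using Horner's method.

    `w` lists the coefficients from w_0 up to w_n.
    """
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result

line = polynomial(2.0, [1.0, 3.0])           # n = 1: 3*2 + 1
parabola = polynomial(2.0, [1.0, 2.0, 3.0])  # n = 2: 3*2^2 + 2*2 + 1
print(line, parabola)
```

A coefficient list of length two gives the linear case from the note above, and each extra coefficient raises the degree by one.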

Regularization

Don’t be fooled by the wonderful flexibility of polynomials, as shown in section $3.3$. Just because higher-order polynomials are extensions of lower ones doesn’t mean that you should always prefer the more flexible model.

In the real world, raw data rarely forms a smooth curve mimicking a polynomial. Suppose that you’re plotting house prices over time. The data likely will contain fluctuations. The goal of regression is to represent the complexity in a simple mathematical equation. If your model is too flexible, the model may be overcomplicating its interpretation of the input.

Take, for example, the data presented in figure 3.12. You try to fit an eighth-degree polynomial to points that appear to follow the equation $y=x^{2}$. This process fails miserably, as the algorithm tries its best to update the nine coefficients of the polynomial.

To influence the learning algorithm to produce a smaller coefficient vector (let’s call it $w$), you add a penalty based on the size of $w$ to the loss term. To control how significantly you want to weigh the penalty term, you multiply the penalty by a constant non-negative number, $\lambda$, as follows:
$$\operatorname{Cost}(X, Y)=\operatorname{Loss}(X, Y)+\lambda\|w\|$$
If $\lambda$ is set to 0, regularization isn’t in play. As you set $\lambda$ to larger and larger values, parameters with larger norms will be heavily penalized. The choice of norm varies case by case, but parameters are typically measured by their L1 or L2 norm. Simply put, regularization reduces some of the flexibility of the otherwise easily tangled model.
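A small numeric sketch may help show how $\lambda$ and the choice of norm change the cost. The function names, the loss value, and the coefficient vector below are made up for illustration. In a real training loop the loss would come from the model’s predictions.

```python
import math

def l2_norm(w):
    """Euclidean (L2) norm of the coefficient vector."""
    return math.sqrt(sum(c * c for c in w))

def l1_norm(w):
    """Sum of absolute values (L1 norm) of the coefficient vector."""
    return sum(abs(c) for c in w)

def regularized_cost(loss, w, lam, norm=l2_norm):
    """Cost = Loss + lambda * ||w||, with lambda >= 0."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    return loss + lam * norm(w)

w = [3.0, -4.0]  # hypothetical coefficients
loss = 2.0       # hypothetical unregularized loss

print(regularized_cost(loss, w, lam=0.0))                # 2.0: no regularization
print(regularized_cost(loss, w, lam=0.5))                # 2.0 + 0.5 * 5.0 = 4.5
print(regularized_cost(loss, w, lam=0.5, norm=l1_norm))  # 2.0 + 0.5 * 7.0 = 5.5
```

Because the penalty grows with the norm of w, an optimizer minimizing this cost is pushed toward smaller coefficients as lam increases.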

To figure out which value of the regularization parameter $\lambda$ performs best, you must split your dataset into two disjoint sets. About 70% of the randomly chosen input/output pairs will make up the training dataset; the remaining 30% will be used for testing. You’ll use the function provided in listing 3.4 for splitting the dataset.
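The 70/30 split can be sketched as follows. This is not the book’s listing 3.4 but a minimal standard-library version. The function name, its signature, and the seed parameter are assumptions added so that the example is reproducible.

```python
import random

def split_dataset(xs, ys, ratio=0.7, seed=None):
    """Randomly split paired inputs/outputs into training and test sets."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # shuffle indices, not the data itself
    n_train = round(ratio * len(xs))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, y_train, x_test, y_test

xs = list(range(10))
ys = [x * x for x in xs]
x_train, y_train, x_test, y_test = split_dataset(xs, ys, ratio=0.7, seed=0)
print(len(x_train), len(x_test))
```

Shuffling one list of indices keeps each input paired with its output and guarantees that the two sets are disjoint.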

Application of linear regression

Running linear regression on fake data is like buying a new car and never driving it. This awesome machinery begs to manifest itself in the real world! Fortunately, many datasets are available online to test your newfound knowledge of regression:

• The University of Massachusetts Amherst supplies small datasets of various types at https://scholarworks.umass.edu/data.
• Kaggle provides all types of large-scale data for machine-learning competitions at https://www.kaggle.com/datasets.
• Data.gov (https://catalog.data.gov) is an open data initiative by the US government that contains many interesting and practical datasets.

A good number of datasets contain dates. You can find a dataset of all phone calls to the 311 nonemergency line in Los Angeles, California, for example, at https://www.dropbox.com/s/naw774olqkve7sc/311.csv?dl=0. A good feature to track could be the frequency of calls per day, week, or month. For convenience, listing 3.6 allows you to obtain a weekly frequency count of data items.

    import csv
    import time

    def read(filename, date_idx, date_parse, year, bucket=7):
        days_in_year = 365

        # Sets up initial frequency map
        freq = {}
        for period in range(0, int(days_in_year / bucket)):
            freq[period] = 0

        # Reads data and aggregates count per period
        with open(filename, newline='') as csvfile:
            csvreader = csv.reader(csvfile)
            next(csvreader)  # skips the header row
            for row in csvreader:
                if row[date_idx] == '':
                    continue
                t = time.strptime(row[date_idx], date_parse)
                if t.tm_year == year and t.tm_yday < (days_in_year - 1):
                    freq[int(t.tm_yday / bucket)] += 1

        return freq
This code gives you the training data for linear regression. The freq variable is a dictionary that maps a period (such as a week) to a frequency count. A year has 52 weeks, so you’ll have 52 data points if you leave bucket=7 as is.
