## Supervised Learning

All of the techniques presented in this chapter (and most of the personalization techniques we will explore throughout this book) are forms of supervised learning. Supervised learning techniques assume that our prediction tasks (or our datasets) can be separated into two components: labels (denoted $y$) that we would like to predict, and features (denoted $X$) that we believe will help us to predict those labels.
For example, given a sentiment analysis task (chap. 8), our data might be (the text of) reviews from Amazon or Yelp, and our labels would be the ratings associated with those reviews.

Given this distinction between features and labels in a dataset, the goal of a supervised learning algorithm is to infer the underlying function
$$f(x) \rightarrow y$$
that explains the relationship between the features and the labels. Usually, this function will be parameterized by model parameters $\theta$, that is,
$$f_\theta(x) \rightarrow y .$$
For example, in this chapter, $\theta$ might describe which features are positively or negatively correlated (or uncorrelated) with the labels; later, $\theta$ might capture the preferences of a particular user in a recommender system (chap. 5). Figure 2.1 explains how this type of supervised approach relates to other types of learning.

Throughout this chapter, we will assume that we are given labels in the form of a vector $y$ and features in the form of a matrix $X$, so that each $y_i$ is the label associated with the $i$th observation and $x_i$ (the $i$th row of $X$) is the vector of features associated with that observation.
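As a minimal sketch of this layout, the features and labels might be stored as plain Python lists. The values, the choice of features (review length and number of helpful votes), and the helper name `observation` are all made up for illustration.

```python
# Hypothetical toy data: each row of X is one observation's feature vector x_i,
# and y[i] is the label attached to that same observation.
X = [
    [120.0, 2.0],   # x_0: (review length in characters, helpful votes)
    [450.0, 0.0],   # x_1
    [900.0, 7.0],   # x_2
]
y = [3.0, 4.0, 5.0]  # y_0, y_1, y_2: made-up ratings


def observation(i):
    """Return the pair (x_i, y_i) for the i-th observation."""
    return X[i], y[i]


x_1, y_1 = observation(1)
print(x_1, y_1)
```

The only structural requirement is that `X` and `y` have the same number of entries, so that row `i` of `X` lines up with label `y[i]`.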

The two categories of supervised learning that we will cover in this and the next chapter are:

• Regression, in which our goal is to predict real-valued labels $y$ as closely as possible (sec. 2.1). When building personalized models in later chapters, such targets may include ratings, sentiment, the number of votes a social media post receives, or a patient’s heart rate.
• Classification, in which $y$ is an element of a discrete set (chap. 3). In later chapters, these will correspond to outcomes such as whether a user clicks on or purchases an item. We will also see how such approaches can be adapted to learn rankings over items (sec. 3.3.3).

## Linear Regression

Perhaps the simplest association we could assume between our features $X$ and labels $y$ would be a linear relationship, that is, the relationship between $X$ and $y$ is defined as
$$y=X \theta .$$
Using our notation from Equation (2.2):
$$f_\theta(X)=X \theta,$$
or equivalently for a single observation $x_i$ (a row of $X$):
$$f_\theta(x_i)=x_i \cdot \theta=\sum_k x_{i k} \theta_k .$$
Here $\theta$ is our set of model parameters: a vector of unknowns that describes which features are relevant to predicting the labels.
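The two forms of the linear model above can be sketched in a few lines of plain Python. This is an illustration under assumed values, not a reference implementation: the function names `predict_one` and `predict_all`, the matrix `X`, and the parameter vector `theta` are all hypothetical.

```python
def predict_one(x_i, theta):
    """f_theta(x_i) = sum_k x_ik * theta_k, i.e. the dot product of x_i and theta."""
    return sum(x_ik * theta_k for x_ik, theta_k in zip(x_i, theta))


def predict_all(X, theta):
    """f_theta(X) = X theta, computed one row (observation) at a time."""
    return [predict_one(x_i, theta) for x_i in X]


# Made-up features (three observations, two features each) and parameters.
X = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
theta = [0.5, 0.25]

predictions = predict_all(X, theta)
print(predictions)  # [1.0, 1.5, 2.0]
```

Writing the matrix product as a loop over rows makes it explicit that each prediction depends only on its own feature vector and on the shared parameters `theta`.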

Ignoring strict notation for now, a trivial example might consist of predicting a review’s rating as a function of its length. To do so, let us consider a small dataset of 100 (length, rating) pairs from Goodreads fantasy novels (Wan and McAuley, 2018). Figure 2.2 plots the relationship between review length (in characters) and the rating.

From Figure 2.2, there appears to be a (rough) association between ratings and review length, that is, more positive reviews tend to be longer. A very simple model might attempt to describe that relationship with a line, that is,
$$\text { rating } \simeq \theta_0+\theta_1 \times \text { (review length). }$$
Note that Equation (2.6) is just the standard equation for a line $(y=m x+b)$, where $\theta_1$ is a slope and $\theta_0$ is an intercept.

If we can identify a line that approximately describes this relationship, we can use it to estimate a rating from a given review, even though we may never have seen a review of some specific length before. In this sense, the line is a simple model of the data, as it allows us to predict labels from previously unseen features. To do so, we formalize the problem of finding a line of best fit. Specifically, we are interested in identifying the values of $\theta_0$ and $\theta_1$ that most closely match the trend in Figure 2.2. To solve for $\theta=\left[\theta_0, \theta_1\right]$, we can write out the problem as a system of equations in matrix form:
$$y \simeq X \cdot \theta,$$
where $y$ is our vector of observed ratings and $X$ is our matrix of observed features (in this case the reviews’ lengths, alongside a constant feature of 1 so that $\theta_0$ acts as the intercept).
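The text has not yet said what "most closely match" means. One common choice, assumed here, is ordinary least squares, which for a single feature has a simple closed form. The sketch below uses made-up (length, rating) pairs rather than the Goodreads data, and the name `fit_line` is hypothetical. Each row of `X` is given a constant feature of 1 so that `theta_0` plays the role of the intercept.

```python
def fit_line(lengths, ratings):
    """Least-squares fit of rating ~ theta_0 + theta_1 * length (assumed criterion)."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ratings) / n
    # Slope: how lengths and ratings vary together, divided by the spread of lengths.
    s_xy = sum((x - mean_x) * (r - mean_y) for x, r in zip(lengths, ratings))
    s_xx = sum((x - mean_x) ** 2 for x in lengths)
    theta_1 = s_xy / s_xx
    # Intercept: chosen so the line passes through the point of means.
    theta_0 = mean_y - theta_1 * mean_x
    return theta_0, theta_1


# Made-up data that lies exactly on a line, so the fit is easy to check by hand.
lengths = [100.0, 200.0, 300.0, 400.0]
ratings = [3.0, 3.5, 4.0, 4.5]

theta_0, theta_1 = fit_line(lengths, ratings)

# Matrix form y ~ X . theta, with a constant column of ones for the intercept.
X = [[1.0, length] for length in lengths]
fitted = [row[0] * theta_0 + row[1] * theta_1 for row in X]

# Estimate a rating for a review length that does not appear in the data.
unseen_estimate = theta_0 + theta_1 * 250.0
print(f"theta_0={theta_0:.3f}, theta_1={theta_1:.4f}, estimate={unseen_estimate:.2f}")
```

Real data such as that in Figure 2.2 will not lie exactly on a line, in which case the fitted values only approximate the observed ratings.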


