Computer Science Assignment Help | Machine Learning Assignment and Exam Help | COMP30027

## Evaluating Regression Models

When developing the earlier linear models, we were somewhat imprecise about what is meant by a ‘line of best fit’ (or generally a model of best fit). Indeed, the pseudoinverse is not a ‘solution’ to the system of equations given in Equation (2.8), but is merely an approximation (naturally, the line of best fit does not pass through all points exactly).

Here, we would like to be more precise about what it means for a model to be ‘good.’ This is a key issue when fitting and evaluating any machine learning model: one needs a way of quantifying how closely a model fits the given data. Given a desired measure of success, we can compare alternative models against this measure and design optimization schemes that optimize the desired measure directly.

A commonly used criterion for evaluating regression algorithms is the mean squared error, or MSE. The MSE between a model $f_\theta(X)$ and a set of labels $y$ is defined as
$$\operatorname{MSE}\left(y, f_\theta(X)\right)=\frac{1}{|y|} \sum_{i=1}^{|y|}\left(f_\theta\left(x_i\right)-y_i\right)^2,$$
in other words, the average squared difference between the model’s predictions and the labels. The root mean squared error (RMSE), that is, $\sqrt{\operatorname{MSE}\left(y, f_\theta(X)\right)}$, is also often reported; the RMSE is sometimes preferable because it is on the same scale as the original labels.
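As a minimal sketch of these two definitions, the MSE and RMSE can be computed with NumPy as follows. The function names and the small set of ratings are hypothetical, chosen only for illustration.

```python
import numpy as np

def mse(y, predictions):
    """Mean squared error between labels y and model predictions."""
    y = np.asarray(y, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((predictions - y) ** 2)

def rmse(y, predictions):
    """Root mean squared error; expressed in the same units as the labels."""
    return np.sqrt(mse(y, predictions))

# Hypothetical labels and predictions (illustrative values, not real data)
y_example = [4.0, 3.0, 5.0, 1.0]
pred_example = [3.5, 3.0, 4.0, 2.0]

print(mse(y_example, pred_example))   # 0.5625
print(rmse(y_example, pred_example))  # 0.75
```

Here the errors are -0.5, 0, -1 and 1, so the squared errors average to 0.5625, and the RMSE of 0.75 can be read directly as a typical error in label units.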

With some effort, it can be shown that the linear model $f_\theta(X)$ that minimizes the MSE compared to the labels $y$ is given by using the pseudoinverse as in Equation (2.10). We leave this as an exercise (Exercise 2.6).
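The claim can be checked numerically before attempting the proof. The sketch below assumes that the pseudoinverse solution referred to in the text is the usual least-squares one, which NumPy exposes as `np.linalg.pinv`; the synthetic data and the perturbation scale are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): an offset column plus one feature
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = 2.0 + 0.5 * X[:, 1] + rng.normal(0, 1, size=n)

def mse(y, predictions):
    """Mean squared error between labels y and model predictions."""
    return np.mean((predictions - y) ** 2)

# Fit theta with the pseudoinverse of X
theta = np.linalg.pinv(X) @ y
best = mse(y, X @ theta)

# Randomly perturb theta many times; none of the perturbed models
# should achieve a lower MSE than the pseudoinverse fit
perturbed = [
    mse(y, X @ (theta + rng.normal(0, 0.1, size=2)))
    for _ in range(1000)
]

print("MSE of pseudoinverse fit:", best)
print("lowest MSE among perturbed fits:", min(perturbed))
```

This is only an empirical sanity check, not a proof: it samples nearby parameter vectors and confirms that none of them improves on the pseudoinverse fit.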

## Why the Mean Squared Error

Although the MSE has a convenient relationship with the pseudoinverse, it may otherwise seem a somewhat arbitrary choice of error measure. For instance, it may seem more obvious at first to compute an error measure such as the mean absolute error (or MAE):
$$\operatorname{MAE}\left(y, f_\theta(X)\right)=\frac{1}{|y|} \sum_{i=1}^{|y|}\left|f_\theta\left(x_i\right)-y_i\right| \text {. }$$
Or, why not count the number of times the model is wrong by more than one star? For that matter, why not measure the mean cubed error?

To defend the MSE as a reasonable choice, we need to characterize what types of errors are more ‘likely’ than others. Essentially, the MSE assigns very small penalties to small errors and very large penalties to large errors. This is in contrast to, say, the MAE, which assigns penalties precisely in proportion to how large the error is. What the MSE therefore seems to be assuming is that small errors are common and large errors are particularly uncommon.
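The difference between the two penalties can be made concrete with a few numbers. The following sketch uses hypothetical error magnitudes; note that ‘small’ here means smaller than one unit of the label, since that is where the squared penalty falls below the absolute one.

```python
import numpy as np

# Hypothetical error magnitudes, in label units (illustrative values only)
errors = np.array([0.1, 0.5, 1.0, 2.0, 4.0])

squared_penalty = errors ** 2       # per-example contribution to the MSE
absolute_penalty = np.abs(errors)   # per-example contribution to the MAE

for e, s, a in zip(errors, squared_penalty, absolute_penalty):
    print(f"error={e:4.1f}  squared={s:6.2f}  absolute={a:5.2f}")
```

An error of 0.1 contributes 0.01 under the squared penalty but 0.1 under the absolute penalty, whereas an error of 4 contributes 16 rather than 4. A single large error therefore dominates the MSE far more than it dominates the MAE.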

What we are talking about informally here is a notion of how errors are distributed under some model. Formally, we say that the labels are equal to our model’s predictions, plus some error:
$$y=\underbrace{f_\theta(X)}_{\text{prediction}}+\underbrace{\epsilon}_{\text{error}},$$
and that our error follows some probability distribution. Our argument here said that small errors are common and large errors are very rare. This suggests that errors may be distributed following a bell curve, which we could capture with a Gaussian (or ‘Normal’) distribution:
$$\epsilon \sim \mathcal{N}\left(0, \sigma^2\right) .$$
The density function for a (zero mean) Gaussian distribution is given by
$$f^{\prime}\left(x^{\prime}\right)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x^{\prime}}{\sigma}\right)^2}$$
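To see where this argument leads, the sketch below evaluates the zero-mean Gaussian density on the errors of two candidate sets of predictions and compares the resulting negative log-likelihood with the MSE. Under the Gaussian assumption the two differ only by a constant and a positive scale factor, so the predictions with the lower MSE are also the more likely ones. The function names, the data, and the choice $\sigma = 1$ are assumptions for illustration.

```python
import numpy as np

def gaussian_density(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def negative_log_likelihood(y, predictions, sigma):
    """Negative log-likelihood of the errors y - predictions under N(0, sigma^2)."""
    errors = np.asarray(y, dtype=float) - np.asarray(predictions, dtype=float)
    return -np.sum(np.log(gaussian_density(errors, sigma)))

def nll_from_mse(y, predictions, sigma):
    """Same quantity rewritten as n*log(sigma*sqrt(2*pi)) + n*MSE / (2*sigma^2)."""
    n = len(y)
    mse_value = np.mean((np.asarray(predictions) - np.asarray(y)) ** 2)
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + n * mse_value / (2 * sigma ** 2)

# Hypothetical labels and two candidate sets of predictions
y = np.array([4.0, 3.0, 5.0, 1.0])
pred_a = np.array([3.5, 3.0, 4.0, 2.0])   # several moderate errors
pred_b = np.array([4.0, 3.0, 5.0, 4.0])   # one large error
sigma = 1.0  # assumed error standard deviation

for name, preds in (("a", pred_a), ("b", pred_b)):
    print(name,
          "MSE:", np.mean((preds - y) ** 2),
          "NLL:", negative_log_likelihood(y, preds, sigma),
          "NLL via MSE:", nll_from_mse(y, preds, sigma))
```

The last two printed values agree for each candidate, which is the numerical counterpart of the statement that minimizing the MSE is equivalent to maximizing the likelihood of the labels when the errors are Gaussian.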


