### Machine learning | TensorFlow | Polynomial model

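TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used for a range of tasks, but it focuses in particular on the training and inference of deep neural networks.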

## Polynomial model

Linear models may be an intuitive first guess, but real-world correlations are rarely so simple. The trajectory of a missile through space, for example, is curved relative to the observer on Earth. Wi-Fi signal strength degrades with an inverse square law. The change in height of a flower over its lifetime certainly isn’t linear.

When data points appear to form smooth curves rather than straight lines, you need to change your regression model from a straight line to something else. One such approach is to use a polynomial model. A polynomial is a generalization of a linear function. The $n$th degree polynomial looks like the following:
$$f(x)=w_{n} x^{n}+\ldots+w_{1} x+w_{0}$$

NOTE When $n=1$, a polynomial is simply a linear equation $f(x)=w_{1} x+w_{0}$.

Consider the scatter plot in figure 3.10, showing the input on the $x$-axis and the output on the $y$-axis. As you can tell, a straight line is insufficient to describe all the data. A polynomial function is a more flexible generalization of a linear function.
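As a minimal sketch of the formula above, the following plain-Python function evaluates a polynomial from a list of coefficients. The function name `polynomial` and the coefficient ordering (`w[0]` holds $w_0$, `w[n]` holds $w_n$) are assumptions made for illustration.

```python
def polynomial(x, w):
    """Evaluate f(x) = w[n]*x**n + ... + w[1]*x + w[0] with Horner's rule."""
    result = 0.0
    # Start from the highest-degree coefficient and fold in one power of x per step.
    for coeff in reversed(w):
        result = result * x + coeff
    return result


# With n = 1 the model is a straight line: f(x) = 2x + 1
line = polynomial(3.0, [1.0, 2.0])

# A second-degree polynomial: f(x) = x^2
curve = polynomial(3.0, [0.0, 0.0, 1.0])
```

Raising the degree only means passing a longer coefficient list, which is why a polynomial model can bend to follow data that a straight line cannot.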

## Regularization

Don’t be fooled by the wonderful flexibility of polynomials, as shown in section 3.3. Just because higher-order polynomials are extensions of lower ones doesn’t mean that you should always prefer the more flexible model.

In the real world, raw data rarely forms a smooth curve mimicking a polynomial. Suppose that you’re plotting house prices over time. The data likely will contain fluctuations. The goal of regression is to represent the complexity in a simple mathematical equation. If your model is too flexible, the model may be overcomplicating its interpretation of the input.

Take, for example, the data presented in figure 3.12. You try to fit an eighth-degree polynomial to points that appear to follow the equation $y=x^{2}$. This process fails miserably, as the algorithm tries its best to update the nine coefficients of the polynomial.

To influence the learning algorithm to produce a smaller coefficient vector (let’s call it $w$), you add a penalty based on the norm of $w$ to the loss term. To control how significantly you want to weigh the penalty term, you multiply the penalty by a constant non-negative number, $\lambda$, as follows:

$$\operatorname{Cost}(X, Y)=\operatorname{Loss}(X, Y)+\lambda\|w\|$$

If $\lambda$ is set to 0, regularization isn’t in play. As you set $\lambda$ to larger and larger values, parameters with larger norms will be heavily penalized. The choice of norm varies case by case, but parameters are typically measured by their L1 or L2 norm. Simply put, regularization reduces some of the flexibility of the otherwise easily tangled model.
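The effect of $\lambda$ can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the original implementation: the loss is taken to be mean squared error, the penalty uses the L2 norm, and the helper names (`l2_norm`, `mean_squared_error`, `regularized_cost`) are hypothetical.

```python
import math


def l2_norm(w):
    """L2 norm of the coefficient vector w."""
    return math.sqrt(sum(wi * wi for wi in w))


def mean_squared_error(xs, ys, w):
    """Assumed loss: mean squared error of a polynomial with coefficients w."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * x ** i for i, wi in enumerate(w))
        total += (prediction - y) ** 2
    return total / len(xs)


def regularized_cost(xs, ys, w, lam):
    """Cost = Loss + lambda * ||w||, with lam >= 0."""
    return mean_squared_error(xs, ys, w) + lam * l2_norm(w)


# Points that lie exactly on y = x^2, fitted by w = [0, 0, 1].
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
w = [0.0, 0.0, 1.0]

cost_without_penalty = regularized_cost(xs, ys, w, lam=0.0)
cost_with_penalty = regularized_cost(xs, ys, w, lam=0.5)
```

With `lam=0.0` the cost equals the plain loss. With a positive `lam`, the same fit costs more in proportion to the size of `w`, so an optimizer is nudged toward smaller coefficients.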

To figure out which value of the regularization parameter $\lambda$ performs best, you must split your dataset into two disjoint sets. About 70% of the randomly chosen input/output pairs will make up the training dataset; the remaining 30% will be used for testing. You’ll use the function provided in listing 3.4 for splitting the dataset.
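A split along these lines can be sketched in plain Python. The function name `split_dataset`, its `seed` parameter, and the list-based inputs are assumptions for illustration and need not match the listing referred to above.

```python
import random


def split_dataset(xs, ys, train_ratio=0.7, seed=None):
    """Randomly split paired inputs and outputs into disjoint train and test sets."""
    indices = list(range(len(xs)))
    # Shuffle the indices so each input stays paired with its output.
    random.Random(seed).shuffle(indices)

    n_train = round(train_ratio * len(xs))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


xs = list(range(10))
ys = [x * x for x in xs]
x_train, y_train, x_test, y_test = split_dataset(xs, ys, train_ratio=0.7, seed=0)
```

Each candidate value of $\lambda$ would then be trained on `x_train`/`y_train` and scored on `x_test`/`y_test`, and the value with the lowest test cost kept.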

## Application of linear regression

Running linear regression on fake data is like buying a new car and never driving it. This awesome machinery begs to manifest itself in the real world! Fortunately, many datasets are available online to test your newfound knowledge of regression:

• The University of Massachusetts Amherst supplies small datasets of various types at https://scholarworks.umass.edu/data.
• Kaggle provides all types of large-scale data for machine-learning competitions at https://www.kaggle.com/datasets.
• Data.gov (https://catalog.data.gov) is an open data initiative by the US government that contains many interesting and practical datasets.

A good number of datasets contain dates. You can find a dataset of all phone calls to the 311 nonemergency line in Los Angeles, California, for example, at https://www.dropbox.com/s/naw774olqkve7sc/311.csv?dl=0. A good feature to track could be the frequency of calls per day, week, or month. For convenience, listing 3.6 allows you to obtain a weekly frequency count of data items.

    import csv
    import time

    def read(filename, date_idx, date_parse, year, bucket=7):
        days_in_year = 365

        # Sets up initial frequency map
        freq = {}
        for period in range(0, days_in_year // bucket):
            freq[period] = 0

        # Reads data and aggregates count per period
        with open(filename, newline='') as csvfile:
            csvreader = csv.reader(csvfile)
            next(csvreader)  # Skips the header row
            for row in csvreader:
                if row[date_idx] == '':
                    continue
                t = time.strptime(row[date_idx], date_parse)
                if t.tm_year == year and t.tm_yday < (days_in_year - 1):
                    freq[t.tm_yday // bucket] += 1

        return freq

This code gives you the training data for linear regression. The freq variable is a dictionary that maps a period (such as a week) to a frequency count. A year has 52 weeks, so you’ll have 52 data points if you leave bucket=7 as is.
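To make the bucketing concrete, the sketch below shows how a single date string maps to a period index and how a frequency dictionary can be turned into paired lists for regression. The helper names `period_of` and `to_training_pairs`, the `%m/%d/%Y` date format, and the sample counts are all made up for illustration; they are not taken from the 311 dataset.

```python
import time


def period_of(date_string, date_parse, bucket=7):
    """Return the period index for one date: day of the year divided by bucket."""
    t = time.strptime(date_string, date_parse)
    return t.tm_yday // bucket


def to_training_pairs(freq):
    """Turn a {period: count} dictionary into parallel input and output lists."""
    periods = sorted(freq)
    counts = [freq[p] for p in periods]
    return periods, counts


# January 15 is day 15 of the year, so it falls in period 15 // 7.
week = period_of("01/15/2014", "%m/%d/%Y")

# Hypothetical counts for three periods, used only to show the conversion.
x_train, y_train = to_training_pairs({0: 5, 1: 8, 2: 3})
```

The period indices serve as the regression inputs and the counts as the outputs.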

