## Linear Regression (MATH5386)

## Checking Goodness of Fit

It is crucial to realize that an MLR model is not necessarily a useful model for the data, even if the data set consists of a response variable and several predictor variables. For example, a nonlinear regression model or a much more complicated model may be needed. Chapters 1 and 13 describe several alternative models. Let $p$ be the number of predictors and $n$ the number of cases. Assume that $n \geq 5p$. Then plots can be used to check whether the MLR model is useful for studying the data. This technique is known as checking the goodness of fit of the MLR model.

Notation. Plots will be used to simplify regression analysis, and in this text a plot of $W$ versus $Z$ uses $W$ on the horizontal axis and $Z$ on the vertical axis.

Definition 2.10. A scatterplot of $X$ versus $Y$ is a plot of $X$ versus $Y$ and is used to visualize the conditional distribution $Y \mid X$ of $Y$ given $X$.

Definition 2.11. A response plot is a plot of a variable $w_i$ versus $Y_i$. Typically $w_i$ is a linear combination of the predictors: $w_i=\boldsymbol{x}_i^T \boldsymbol{\eta}$ where $\boldsymbol{\eta}$ is a known $p \times 1$ vector. The most commonly used response plot is a plot of the fitted values $\widehat{Y}_i$ versus the response $Y_i$.

Proposition 2.1. Suppose that the regression estimator $\boldsymbol{b}$ of $\boldsymbol{\beta}$ is used to find the residuals $r_i \equiv r_i(\boldsymbol{b})$ and the fitted values $\widehat{Y}_i \equiv \widehat{Y}_i(\boldsymbol{b})=\boldsymbol{x}_i^T \boldsymbol{b}$. Then in the response plot of $\widehat{Y}_i$ versus $Y_i$, the vertical deviations from the identity line (that has unit slope and zero intercept) are the residuals $r_i(\boldsymbol{b})$.

Proof. The identity line in the response plot is $Y=\boldsymbol{x}^T \boldsymbol{b}$. Hence the vertical deviation is $Y_i-\boldsymbol{x}_i^T \boldsymbol{b}=r_i(\boldsymbol{b})$.

Definition 2.12. A residual plot is a plot of a variable $w_i$ versus the residuals $r_i$. The most commonly used residual plot is a plot of $\hat{Y}_i$ versus $r_i$.

Notation: For MLR, “the residual plot” will often mean the residual plot of $\hat{Y}_i$ versus $r_i$, and “the response plot” will often mean the plot of $\hat{Y}_i$ versus $Y_i$.

If the unimodal MLR model as estimated by least squares is useful, then in the response plot the plotted points should scatter about the identity line while in the residual plot of $\hat{Y}$ versus $r$ the plotted points should scatter about the $r=0$ line (the horizontal axis) with no other pattern. Figures 1.2 and 1.3 show what a response plot and residual plot look like for an artificial MLR data set where the MLR regression relationship is rather strong in that the sample correlation $\operatorname{corr}(\hat{Y}, Y)$ is near 1. Figure 1.4 shows a response plot where the response $Y$ is independent of the nontrivial predictors in the model. Here $\operatorname{corr}(\hat{Y}, Y)$ is near 0 but the points still scatter about the identity line. When the MLR relationship is very weak, the response plot will look like the residual plot.
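
The quantities behind the two plots can be sketched numerically. The following is a minimal illustration, not the text's own code: the artificial data, the coefficient values, and the helper name `ols_fit` are all assumptions chosen for the example, and ordinary least squares is computed with NumPy's `lstsq`.

```python
import numpy as np

def ols_fit(X, y):
    """Least squares fit with an intercept; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])    # prepend the constant column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)    # OLS estimate of (alpha, beta)
    fitted = A @ b
    return fitted, y - fitted

# Hypothetical artificial data with a strong linear relationship (n >= 5p).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

fitted, resid = ols_fit(X, y)

# Response plot: points (fitted_i, y_i); the identity line is y = fitted,
# so the vertical deviation of each point from that line is the residual.
vertical_dev = y - fitted

# Residual plot: points (fitted_i, resid_i), scattered about the r = 0 line.
corr_fit_y = np.corrcoef(fitted, y)[0, 1]        # strength of the MLR relationship
corr_fit_resid = np.corrcoef(fitted, resid)[0, 1]
```

Plotting `fitted` against `y` and `fitted` against `resid` with any plotting library gives the response and residual plots. For a least squares fit with an intercept, the sample correlation between the fitted values and the residuals is zero up to rounding, which is why the residual plot has no linear trend.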

## Checking Lack of Fit

The response plot may look good while the residual plot suggests that the unimodal MLR model can be improved. Examining plots to find model violations is called checking for lack of fit. Again assume that $n \geq 5 p$.

The unimodal MLR model often provides a useful model for the data, but the following assumptions do need to be checked.
i) Is the MLR model appropriate?
ii) Are outliers present?
iii) Is the error variance constant or nonconstant? The constant variance assumption $\operatorname{VAR}\left(e_i\right) \equiv \sigma^2$ is known as homoscedasticity. The nonconstant variance assumption $\operatorname{VAR}\left(e_i\right)=\sigma_i^2$ is known as heteroscedasticity.
iv) Are any important predictors left out of the model?
v) Are the errors $e_1, \ldots, e_n$ iid?
vi) Are the errors $e_i$ independent of the predictors $\boldsymbol{x}_i$?

Make the response plot and the residual plot to check i), ii), and iii). An MLR model is reasonable if the plots look like Figures 1.2, 1.3, 1.4, and 2.1. A response plot that looks like Figure 13.7 suggests that the model is not linear. If the plotted points in the residual plot do not scatter about the $r=0$ line with no other pattern (i.e., if the cloud of points is not ellipsoidal or rectangular with zero slope), then the unimodal MLR model is not sustained.

The $i$th residual $r_i$ is an estimator of the $i$th error $e_i$. The constant variance assumption may have been violated if the variability of the point cloud in the residual plot depends on the value of $\hat{Y}$. Often the variability of the residuals increases as $\hat{Y}$ increases, resulting in a right opening megaphone shape. (Figure 4.1b has this shape.) Often the variability of the residuals decreases as $\hat{Y}$ increases, resulting in a left opening megaphone shape. Sometimes the variability decreases then increases again, and sometimes the variability increases then decreases again (like a stretched or compressed football).
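
A right opening megaphone can be reproduced with simulated data. The sketch below is only an illustration under stated assumptions: the error standard deviation is taken to be proportional to the predictor (a hypothetical choice), and comparing the residual spread in the lower and upper halves of the fitted values is an informal numerical stand-in for looking at the residual plot, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1.0, 10.0, size=n)
e = rng.normal(size=n) * 0.5 * x          # assumed: error sd grows with x
y = 2.0 + 3.0 * x + e                     # hypothetical heteroscedastic model

# Least squares fit with an intercept.
A = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b
resid = y - fitted

# Split the cases at the median fitted value and compare residual spread.
order = np.argsort(fitted)
low, high = order[: n // 2], order[n // 2:]
sd_low = resid[low].std(ddof=1)
sd_high = resid[high].std(ddof=1)
ratio = sd_high / sd_low                  # well above 1 for a right opening megaphone
```

A ratio near 1 would be consistent with constant variance, while a ratio well below 1 would correspond to a left opening megaphone.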


## Linear Regression (MATH839)

## Variable Selection

A standard problem in 1D regression is variable selection, also called subset or model selection. Assume that the 1D regression model uses a linear predictor
$$Y \perp\!\!\!\perp \boldsymbol{x} \mid\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}\right),$$
that a constant $\alpha$ is always included, that $\boldsymbol{x}=\left(x_1, \ldots, x_{p-1}\right)^T$ are the $p-1$ nontrivial predictors, and that the $n \times p$ matrix $\boldsymbol{X}$ with $i$th row $\left(1, \boldsymbol{x}_i^T\right)$ has full rank $p$. Then variable selection is a search for a subset of predictor variables that can be deleted without important loss of information.

To clarify ideas, assume that there exists a subset $S$ of predictor variables such that if $\boldsymbol{x}_S$ is in the 1D model, then none of the other predictors are needed in the model. Write $E$ for these (‘extraneous’) variables not in $S$, partitioning $\boldsymbol{x}=\left(\boldsymbol{x}_S^T, \boldsymbol{x}_E^T\right)^T$. Then
$$S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S+\boldsymbol{\beta}_E^T \boldsymbol{x}_E=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S .$$
The extraneous terms that can be eliminated given that the subset $S$ is in the model have zero coefficients: $\boldsymbol{\beta}_E=\mathbf{0}$.

Now suppose that $I$ is a candidate subset of predictors, that $S \subseteq I$ and that $O$ is the set of predictors not in $I$. Then
$$S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S+\boldsymbol{\beta}_{(I / S)}^T \boldsymbol{x}_{I / S}+\mathbf{0}^T \boldsymbol{x}_O=\alpha+\boldsymbol{\beta}_I^T \boldsymbol{x}_I,$$
where $\boldsymbol{x}_{I / S}$ denotes the predictors in $I$ that are not in $S$. Since this is true regardless of the values of the predictors, $\boldsymbol{\beta}_O=\mathbf{0}$ if $S \subseteq I$. Hence for any subset $I$ that includes all relevant predictors, the population correlation
$$\operatorname{corr}\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i, \alpha+\boldsymbol{\beta}_I^T \boldsymbol{x}_{I, i}\right)=1 .$$
This observation, which is true regardless of the explanatory power of the model, suggests that variable selection for a 1D regression model (1.11) is simple in principle. For each value of $j=1,2, \ldots, p-1$ nontrivial predictors, keep track of subsets $I$ that provide the largest values of $\operatorname{corr}(\mathrm{ESP}, \mathrm{ESP}(I))$. Any such subset for which the correlation is high is worth closer investigation and consideration.
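
The search described above can be sketched for the special case of a linear model fitted by least squares. This is an illustration under assumptions rather than the text's procedure for general 1D models: the helper name `esp`, the simulated data, and the exhaustive enumeration of subsets are all choices made for the example, and enumeration is only practical when the number of predictors is small.

```python
import numpy as np
from itertools import combinations

def esp(X, y, cols):
    """Estimated sufficient predictor from an OLS fit on the listed columns."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ b

# Hypothetical data: only the first three of five predictors matter (S = {0, 1, 2}).
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))
y = -1.0 + X @ np.array([1.0, 2.0, 3.0, 0.0, 0.0]) + rng.normal(size=n)

esp_full = esp(X, y, range(5))

# For each subset size j, keep the subset I with the largest corr(ESP, ESP(I)).
best = {}
for j in range(1, 6):
    scored = [(np.corrcoef(esp_full, esp(X, y, I))[0, 1], I)
              for I in combinations(range(5), j)]
    best[j] = max(scored)                 # (correlation, subset) pair
```

Here `best[3]` holds the best three-predictor subset and its correlation with the full-model ESP. Subsets that contain all of the relevant predictors give correlations close to 1, while subsets that omit one give visibly smaller values.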

## Other Issues

The 1D regression models offer a unifying framework for many of the most used regression models. By writing the model in terms of the sufficient predictor $S P=h(\boldsymbol{x})$, many important topics valid for all 1D regression models can be explained compactly. For example, the previous section presented variable selection, and equation (1.14) can be used to motivate the test for whether the reduced model can be used instead of the full model. Similarly, the sufficient predictor can be used to unify the interpretation of coefficients and to explain models that contain interactions and factors.

### Interpretation of Coefficients

One interpretation of the coefficients in a 1D model (1.11) is that $\beta_i$ is the rate of change in the SP associated with a unit increase in $x_i$ when all other predictor variables $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p$ are held fixed. Denote a model by $S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\alpha+\beta_1 x_1+\cdots+\beta_p x_p$. Then
$$\beta_i=\frac{\partial S P}{\partial x_i} \text { for } \mathrm{i}=1, \ldots, \mathrm{p} .$$
Of course, holding all other variables fixed while changing $x_i$ may not be possible. For example, if $x_1=x, x_2=x^2$ and $S P=\alpha+\beta_1 x+\beta_2 x^2$, then $x_2$ cannot be held fixed when $x_1$ increases by one unit, but
$$\frac{d S P}{d x}=\beta_1+2 \beta_2 x .$$
The interpretation of $\beta_i$ changes with the model in two ways. First, the interpretation changes as terms are added and deleted from the SP. Hence the interpretation of $\beta_1$ differs for models $S P=\alpha+\beta_1 x_1$ and $S P=\alpha+\beta_1 x_1+\beta_2 x_2$. Secondly, the interpretation changes as the parametric or semiparametric form of the model changes.
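
The quadratic example can be checked numerically. In this small sketch the coefficient values and the evaluation point are hypothetical; it compares the derivative $\beta_1+2 \beta_2 x$ with a central difference and also shows that the change in the SP over a full unit step in $x$ is $\beta_1+\beta_2(2 x+1)$, which is neither $\beta_1$ nor the derivative.

```python
def sp(x, alpha, beta1, beta2):
    """Sufficient predictor SP = alpha + beta1*x + beta2*x^2."""
    return alpha + beta1 * x + beta2 * x**2

def dsp_dx(x, beta1, beta2):
    """Derivative of the SP with respect to x."""
    return beta1 + 2.0 * beta2 * x

alpha, beta1, beta2 = 1.0, 0.5, -0.25     # hypothetical coefficients
x0, h = 2.0, 1e-6

# Central difference approximation to dSP/dx at x0.
numeric = (sp(x0 + h, alpha, beta1, beta2) - sp(x0 - h, alpha, beta1, beta2)) / (2 * h)

# Change in the SP when x increases by one full unit from x0.
unit_change = sp(x0 + 1.0, alpha, beta1, beta2) - sp(x0, alpha, beta1, beta2)
```

With these values the derivative at $x=2$ is $-0.5$ while the unit change is $-0.75$, so the effect of increasing $x$ depends on where the increase starts.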


## Linear Regression (STAT6450)

## Some Regression Models

In data analysis, an investigator is presented with a problem and data from some population. The population might be the collection of all possible outcomes from an experiment while the problem might be predicting a future value of the response variable $Y$ or summarizing the relationship between $Y$ and the $p \times 1$ vector of predictor variables $\boldsymbol{x}$. A statistical model is used to provide a useful approximation to some of the important underlying characteristics of the population which generated the data. Many of the most used models for 1D regression, defined below, are families of conditional distributions $Y \mid \boldsymbol{x}=\boldsymbol{x}_o$ indexed by $\boldsymbol{x}=\boldsymbol{x}_o$. A 1D regression model is a parametric model if the conditional distribution is completely specified except for a fixed finite number of parameters; otherwise, the 1D model is a semiparametric model. GLMs and GAMs, defined below, are covered in Chapter 13.

Definition 1.1. Regression investigates how the response variable $Y$ changes with the value of a $p \times 1$ vector $\boldsymbol{x}$ of predictors. Often this conditional distribution $Y \mid \boldsymbol{x}$ is described by a 1D regression model, where $Y$ is conditionally independent of $\boldsymbol{x}$ given the sufficient predictor $S P=h(\boldsymbol{x})$, written
$$Y \perp\!\!\!\perp \boldsymbol{x} \mid S P \quad \text{or} \quad Y \perp\!\!\!\perp \boldsymbol{x} \mid h(\boldsymbol{x}),$$
where the real valued function $h: \mathbb{R}^p \rightarrow \mathbb{R}$. The estimated sufficient predictor $\mathrm{ESP}=\hat{h}(\boldsymbol{x})$. An important special case is a model with a linear predictor $h(\boldsymbol{x})=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}$ where $\mathrm{ESP}=\hat{\alpha}+\hat{\boldsymbol{\beta}}^T \boldsymbol{x}$. This class of models includes the generalized linear model (GLM). Another important special case is a generalized additive model (GAM), where $Y$ is independent of $\boldsymbol{x}=\left(x_1, \ldots, x_p\right)^T$ given the additive predictor $A P=\alpha+\sum_{j=1}^p S_j\left(x_j\right)$ for some (usually unknown) functions $S_j$. The estimated additive predictor $\mathrm{EAP}=\mathrm{ESP}=\hat{\alpha}+\sum_{j=1}^p \hat{S}_j\left(x_j\right)$.

Notation: In this text, a plot of $x$ versus $Y$ will have $x$ on the horizontal axis, and $Y$ on the vertical axis.

Plots are extremely important for regression. When $p=1$, $x$ is both a sufficient predictor and an estimated sufficient predictor. So a plot of $x$ versus $Y$ is both a sufficient summary plot and a response plot. Usually the SP is unknown, so only the response plot can be made. The response plot will be extremely useful for checking the goodness of fit of the 1D regression model.

Definition 1.2. A sufficient summary plot is a plot of the SP versus $Y$. An estimated sufficient summary plot (ESSP) or response plot is a plot of the ESP versus $Y$.

## Multiple Linear Regression

Suppose that the response variable $Y$ is quantitative and that at least one predictor variable $x_i$ is quantitative. Then the multiple linear regression (MLR) model is often a very useful model. For the MLR model,
$$Y_i=\alpha+x_{i, 1} \beta_1+x_{i, 2} \beta_2+\cdots+x_{i, p} \beta_p+e_i=\alpha+\boldsymbol{x}_i^T \boldsymbol{\beta}+e_i=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i+e_i \tag{1.9}$$
for $i=1, \ldots, n$. Here $Y_i$ is the response variable, $\boldsymbol{x}_i$ is a $p \times 1$ vector of nontrivial predictors, $\alpha$ is an unknown constant, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown coefficients, and $e_i$ is a random variable called the error.

The Gaussian or normal MLR model makes the additional assumption that the errors $e_i$ are iid $N\left(0, \sigma^2\right)$ random variables. This model can also be written as $Y=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}+e$ where $e \sim N\left(0, \sigma^2\right)$, or $Y \mid \boldsymbol{x} \sim N\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}, \sigma^2\right)$, or $Y \mid \boldsymbol{x} \sim N\left(S P, \sigma^2\right)$, or $Y \mid S P \sim N\left(S P, \sigma^2\right)$. The normal MLR model is a parametric model since, given $\boldsymbol{x}$, the family of conditional distributions is completely specified by the parameters $\alpha, \boldsymbol{\beta}$, and $\sigma^2$. Since $Y \mid S P \sim N\left(S P, \sigma^2\right)$, the conditional mean function $E(Y \mid S P) \equiv M(S P)=\mu(S P)=S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}$. The MLR model is discussed in detail in Chapters 2, 3, and 4.

A sufficient summary plot (SSP) of the sufficient predictor $S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i$ versus the response variable $Y_i$ with the mean function added as a visual aid can be useful for describing the multiple linear regression model. This plot cannot be used for real data since $\alpha$ and $\boldsymbol{\beta}$ are unknown. To make Figure 1.1, the artificial data used $n=100$ cases with $k=5$ nontrivial predictors. The data used $\alpha=-1$, $\boldsymbol{\beta}=(1,2,3,0,0)^T$, $e_i \sim N(0,1)$ and $\boldsymbol{x}$ from a multivariate normal distribution $\boldsymbol{x} \sim N_5(\mathbf{0}, \boldsymbol{I})$.

In Figure 1.1, notice that the identity line with unit slope and zero intercept corresponds to the mean function since the identity line is the line $Y=S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\mu(S P)=E(Y \mid S P)$. The vertical deviation of $Y_i$ from the line is equal to $e_i=Y_i-\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i\right)$. For a given value of $S P$, $Y_i \sim N\left(S P, \sigma^2\right)$. For the artificial data, $\sigma^2=1$. Hence if $S P=0$ then $Y_i \sim N(0,1)$, and if $S P=5$ then $Y_i \sim N(5,1)$. Imagine superimposing the $N\left(S P, \sigma^2\right)$ curve at various values of $S P$. If all of the curves were shown, then the plot would resemble a road through a tunnel. For the artificial data, each $Y_i$ is a sample of size 1 from the normal curve with mean $\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i$.
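
Artificial data of the kind described for Figure 1.1 can be generated as follows. The parameter values come from the text, but the random seed and the variable names are assumptions, so this sketch will not reproduce the figure's exact points.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, not from the text
n, k = 100, 5
alpha = -1.0
beta = np.array([1.0, 2.0, 3.0, 0.0, 0.0])

X = rng.normal(size=(n, k))               # rows x_i ~ N_5(0, I)
e = rng.normal(size=n)                    # errors e_i ~ N(0, 1), so sigma^2 = 1

SP = alpha + X @ beta                     # sufficient predictor for each case
Y = SP + e                                # MLR response

# Sufficient summary plot: points (SP_i, Y_i) with the identity line Y = SP
# as the mean function. The vertical deviation from that line is the error.
vertical_dev = Y - SP
```

Because $\alpha$ and $\boldsymbol{\beta}$ are known only in a simulation, this plot is available here but not for real data, where the ESP replaces the SP.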


For an $n \times 1$ random vector $\boldsymbol{Y}=\left(Y_1, \ldots, Y_n\right)^T$, the expected value is
$$E(\boldsymbol{Y})=\left(E\left(Y_1\right), \ldots, E\left(Y_n\right)\right)^T$$
provided that $E\left(Y_i\right)$ exists for $i=1, \ldots, n$. Otherwise the expected value does not exist. The $n \times n$ population covariance matrix is
$$\operatorname{Cov}(\boldsymbol{Y})=E\left[(\boldsymbol{Y}-E(\boldsymbol{Y}))(\boldsymbol{Y}-E(\boldsymbol{Y}))^T\right]=\left(\sigma_{i, j}\right)$$
where the $i j$ entry of $\operatorname{Cov}(\boldsymbol{Y})$ is $\operatorname{Cov}\left(Y_i, Y_j\right)=\sigma_{i, j}$, provided that each $\sigma_{i, j}$ exists. Otherwise $\operatorname{Cov}(\boldsymbol{Y})$ does not exist.

For random vectors $\boldsymbol{Y}$ and $\boldsymbol{Z}$ and conformable constant $\boldsymbol{a}$, $\boldsymbol{A}$, and $\boldsymbol{B}$,
$$E(\boldsymbol{a}+\boldsymbol{Y})=\boldsymbol{a}+E(\boldsymbol{Y}) \text { and } E(\boldsymbol{Y}+\boldsymbol{Z})=E(\boldsymbol{Y})+E(\boldsymbol{Z})$$
and
$$E(\boldsymbol{A} \boldsymbol{Y})=\boldsymbol{A} E(\boldsymbol{Y}) \text { and } E(\boldsymbol{A} \boldsymbol{Y} \boldsymbol{B})=\boldsymbol{A} E(\boldsymbol{Y}) \boldsymbol{B} .$$



## The MLR Model

Definition 2.1. The response variable is the variable that you want to predict. The predictor variables are the variables used to predict the response variable.

Notation. In this text the response variable will usually be denoted by $Y$ and the $p$ predictor variables will often be denoted by $x_1, \ldots, x_p$. The response variable is also called the dependent variable while the predictor variables are also called independent variables, explanatory variables, carriers, or covariates. Often the predictor variables will be collected in a vector $\boldsymbol{x}$. Then $\boldsymbol{x}^T$ is the transpose of $\boldsymbol{x}$.

Definition 2.2. Regression is the study of the conditional distribution $Y \mid \boldsymbol{x}$ of the response variable $Y$ given the vector of predictors $\boldsymbol{x}=\left(x_1, \ldots, x_p\right)^T$.

Definition 2.3. A quantitative variable takes on numerical values while a qualitative variable takes on categorical values.

Example 2.1. Archeologists and crime scene investigators sometimes want to predict the height of a person from partial skeletal remains. A model for prediction can be built from nearly complete skeletons or from living humans, depending on the population of interest (e.g., ancient Egyptians or modern US citizens). The response variable $Y$ is height and the predictor variables might be $x_1 \equiv 1, x_2=$ femur length, and $x_3=$ ulna length.

## Interpretation of Coefficients

The interpretation of $\beta_i$ also depends on the parametric or semiparametric form of the model. For multiple linear regression, $E(Y \mid S P)=S P$ and an increase of one unit in $x_i$ increases the conditional expectation by $\beta_i$. For binary logistic regression,
$$E(Y \mid S P)=\rho(S P)=\frac{\exp (S P)}{1+\exp (S P)},$$
and the change in the conditional expectation associated with a one unit increase in $x_i$ is more complex.
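
A short numerical sketch shows why the logistic case is more complex. The coefficient value and the three starting values of the SP below are hypothetical. The change in $\rho(S P)$ from a one unit increase in $x_i$ depends on the starting SP, whereas the odds $\rho /(1-\rho)$ are multiplied by the same factor $\exp \left(\beta_i\right)$ each time, a standard property of the logistic function.

```python
import math

def rho(sp):
    """Logistic mean function E(Y | SP) = exp(SP) / (1 + exp(SP))."""
    return math.exp(sp) / (1.0 + math.exp(sp))

beta_i = 0.8                               # hypothetical coefficient of x_i

# Change in E(Y | SP) when x_i increases by one unit, i.e. SP -> SP + beta_i,
# evaluated at three different starting values of the SP.
changes = {sp: rho(sp + beta_i) - rho(sp) for sp in (-3.0, 0.0, 3.0)}

def odds(sp):
    """Odds rho / (1 - rho) of the event Y = 1 at a given SP."""
    return rho(sp) / (1.0 - rho(sp))

# The multiplicative change in the odds does not depend on the starting SP.
odds_ratio = odds(3.0 + beta_i) / odds(3.0)
```

The entries of `changes` are largest near $S P=0$ and shrink in both tails, so no single number plays the role that $\beta_i$ plays for the conditional mean in multiple linear regression.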


