## Variable Selection

A standard problem in $1 \mathrm{D}$ regression is variable selection, also called subset or model selection. Assume that the $1 \mathrm{D}$ regression model uses a linear predictor
$$Y \perp\!\!\!\perp \boldsymbol{x} \mid\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}\right),$$
that a constant $\alpha$ is always included, that $\boldsymbol{x}=\left(x_1, \ldots, x_{p-1}\right)^T$ are the $p-1$ nontrivial predictors, and that the $n \times p$ matrix $\boldsymbol{X}$ with $i$th row $\left(1, \boldsymbol{x}_i^T\right)$ has full rank $p$. Then variable selection is a search for a subset of predictor variables that can be deleted without important loss of information.

To clarify ideas, assume that there exists a subset $S$ of predictor variables such that if $\boldsymbol{x}_S$ is in the 1D model, then none of the other predictors are needed in the model. Write $E$ for these (‘extraneous’) variables not in $S$, partitioning $\boldsymbol{x}=\left(\boldsymbol{x}_S^T, \boldsymbol{x}_E^T\right)^T$. Then
$$S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S+\boldsymbol{\beta}_E^T \boldsymbol{x}_E=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S$$
The extraneous terms that can be eliminated given that the subset $S$ is in the model have zero coefficients: $\boldsymbol{\beta}_E=\mathbf{0}$.

Now suppose that $I$ is a candidate subset of predictors, that $S \subseteq I$ and that $O$ is the set of predictors not in $I$. Then
$$S P=\alpha+\boldsymbol{\beta}^T \boldsymbol{x}=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S=\alpha+\boldsymbol{\beta}_S^T \boldsymbol{x}_S+\boldsymbol{\beta}_{(I / S)}^T \boldsymbol{x}_{I / S}+\mathbf{0}^T \boldsymbol{x}_O=\alpha+\boldsymbol{\beta}_I^T \boldsymbol{x}_I,$$
where $\boldsymbol{x}_{I / S}$ denotes the predictors in $I$ that are not in $S$. Since this is true regardless of the values of the predictors, $\boldsymbol{\beta}_O=\mathbf{0}$ if $S \subseteq I$. Hence for any subset $I$ that includes all relevant predictors, the population correlation
$$\operatorname{corr}\left(\alpha+\boldsymbol{\beta}^T \boldsymbol{x}_i, \alpha+\boldsymbol{\beta}_I^T \boldsymbol{x}_{I, i}\right)=1 .$$
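This observation, which is true regardless of the explanatory power of the model, suggests that variable selection for a $1 \mathrm{D}$ regression model (1.11) is simple in principle. For each number $j=1,2, \ldots, p-1$ of nontrivial predictors, keep track of the subsets $I$ of size $j$ that provide the largest values of $\operatorname{corr}(\mathrm{ESP}, \mathrm{ESP}(I))$, where ESP is the estimated sufficient predictor from the full model and $\mathrm{ESP}(I)$ is the estimated sufficient predictor from the model that uses only the predictors in $I$.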
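The search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ESP is taken to be the ordinary least squares fitted value (other 1D regression fits would give a different ESP), the data are simulated, and the helper names `esp` and `best_subsets_by_corr` are hypothetical rather than taken from the text.

```python
from itertools import combinations

import numpy as np


def esp(X, y, cols):
    """OLS fitted values (used here as the ESP) from a constant plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def best_subsets_by_corr(X, y):
    """For each subset size j, return (correlation, subset) maximizing corr(ESP, ESP(I))."""
    k = X.shape[1]
    full = esp(X, y, range(k))  # ESP from the full model
    best = {}
    for j in range(1, k + 1):
        scored = [
            (float(np.corrcoef(full, esp(X, y, I))[0, 1]), I)
            for I in combinations(range(k), j)
        ]
        best[j] = max(scored)
    return best


# Simulated data (an assumption for this sketch): only predictors 0 and 1
# have nonzero coefficients, so S = {0, 1} and predictors 2 and 3 are extraneous.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

result = best_subsets_by_corr(X, y)
for j, (r, I) in result.items():
    print(f"j={j}: best subset {I}, corr(ESP, ESP(I)) = {r:.4f}")
```

The sketch enumerates every subset, which is only practical for a small number of predictors. Once a subset contains $S$, the sample correlation should be close to 1, mirroring the population result.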

## Interpretation of Coefficients

One interpretation of the coefficients in a $1 \mathrm{D}$ model (1.11) is that $\beta_i$ is the rate of change in the $\mathrm{SP}$ associated with a unit increase in $x_i$ when all other predictor variables $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p$ are held fixed. Denote a model by $S P=\alpha+\beta^T x=\alpha+\beta_1 x_1+\cdots+\beta_p x_p$. Then
$$\beta_i=\frac{\partial S P}{\partial x_i} \text { for } i=1, \ldots, p .$$
Of course, holding all other variables fixed while changing $x_i$ may not be possible. For example, if $x_1=x, x_2=x^2$ and $S P=\alpha+\beta_1 x+\beta_2 x^2$, then $x_2$ cannot be held fixed when $x_1$ increases by one unit, but
$$\frac{d S P}{d x}=\beta_1+2 \beta_2 x .$$
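A small numerical illustration of the quadratic example may help. The coefficient values below are hypothetical and chosen only for the sketch. It compares the derivative $\beta_1+2 \beta_2 x$ with the actual change in the SP when $x$ increases by one unit, which works out to $\beta_1+\beta_2(2 x+1)$; neither equals $\beta_1$ alone unless $\beta_2=0$.

```python
# Hypothetical coefficient values, chosen only for illustration.
ALPHA, BETA1, BETA2 = 1.0, 2.0, -0.5


def sp(x):
    """SP = alpha + beta1 * x + beta2 * x**2."""
    return ALPHA + BETA1 * x + BETA2 * x**2


def slope(x):
    """Instantaneous rate of change dSP/dx = beta1 + 2 * beta2 * x."""
    return BETA1 + 2.0 * BETA2 * x


rows = []
for x in (0.0, 1.0, 3.0):
    unit_change = sp(x + 1.0) - sp(x)  # algebraically beta1 + beta2 * (2x + 1)
    rows.append((x, slope(x), unit_change))
    print(f"x={x}: dSP/dx={slope(x):.2f}, SP(x+1)-SP(x)={unit_change:.2f}")
```

Both quantities vary with $x$, so no single number summarizes the effect of $x$ on the SP in this model.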
The interpretation of $\beta_i$ changes with the model in two ways. First, the interpretation changes as terms are added to and deleted from the SP. Hence the interpretation of $\beta_1$ differs for models $S P=\alpha+\beta_1 x_1$ and $S P=\alpha+\beta_1 x_1+\beta_2 x_2$. Second, the interpretation changes as the parametric or semiparametric form of the model changes. For multiple linear regression, $E(Y \mid S P)=S P$ and a one-unit increase in $x_i$, with the other predictors held fixed, changes the conditional expectation by $\beta_i$. For binary logistic regression,
$$E(Y \mid S P)=\rho(S P)=\frac{\exp (S P)}{1+\exp (S P)},$$
and the change in the conditional expectation associated with a one-unit increase in $x_i$ is more complex, since it depends on the current value of $S P$.
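To make the logistic case concrete, the sketch below evaluates $\rho(S P+\beta_i)-\rho(S P)$ at a few starting values of the SP, alongside the local slope $\partial \rho / \partial x_i=\beta_i \rho(S P)(1-\rho(S P))$ that follows from differentiating $\rho$. The value of `beta_i` and the SP values are hypothetical, picked only to show the pattern.

```python
import math


def rho(sp):
    """Logistic mean function: exp(sp) / (1 + exp(sp)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-sp))


beta_i = 0.8  # hypothetical coefficient, for illustration only

changes = {}
for sp in (-3.0, 0.0, 3.0):
    # Change in E(Y | SP) when x_i increases by one unit, other predictors fixed.
    changes[sp] = rho(sp + beta_i) - rho(sp)
    # Local slope from the derivative of rho with respect to x_i.
    local_slope = beta_i * rho(sp) * (1.0 - rho(sp))
    print(f"SP={sp:+.1f}: unit change={changes[sp]:.4f}, local slope={local_slope:.4f}")
```

Unlike multiple linear regression, where the change is always $\beta_i$, the change here is largest near $S P=0$ and shrinks as the fitted probability approaches 0 or 1.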

