## Challenger Disaster Example

In January 1986, the space shuttle Challenger exploded shortly after launch. An investigation was launched into the cause of the crash and attention focused on the rubber O-ring seals in the rocket boosters. At lower temperatures, rubber becomes more brittle and is a less effective sealant. At the time of the launch, the temperature was $31^{\circ} \mathrm{F}$. Could the failure of the O-rings have been predicted? In the 23 previous shuttle missions for which data exist, some evidence of damage due to blow-by and erosion was recorded on some O-rings. Each shuttle had two boosters, each with three O-rings. For each mission, we know the number of O-rings out of six showing some damage and the launch temperature. This is a simplification of the problem; see Dalal, Fowlkes, and Hoadley (1989) for more details.

Let’s start our analysis with R. For help in obtaining R and installing the necessary add-on packages and datasets, please see Appendix B. First we load the data. To do this, you will first need to load the faraway package using the library command, as shown below. You will need to do this in every session in which you run examples from this book. If you forget, you will receive a warning message about the data not being found. We then plot the proportion of damaged O-rings against temperature in Figure 2.1:

    # Load the package that supplies the data, then the data itself
    library(faraway)
    data(orings)
    # Proportion of damaged O-rings (out of six) against launch temperature
    plot(damage/6 ~ temp, orings, xlim=c(25,85), ylim=c(0,1),
         xlab="Temperature", ylab="Prob of damage")

We are interested in how the probability of failure in a given O-ring is related to the launch temperature and in predicting that probability when the temperature is $31^{\circ} \mathrm{F}$. A naive approach, based on linear models, simply fits a line to this data:

    # Fit a straight line to the proportions and add it to the plot
    lmod <- lm(damage/6 ~ temp, orings)
    abline(lmod)

The fit is shown in Figure 2.1. There are several problems with this approach. Most obviously from the plot, it can predict probabilities greater than one or less than zero. One might suggest truncating predictions outside the range to zero or one as appropriate, but it does not seem credible that these probabilities would be exactly zero or one, in this particular example or many others.
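
The range problem can be made concrete with a small sketch in R. The four observations in `toy` are made up for illustration and are not the `orings` data; the names `toy`, `toy_lmod`, `toy_pred` and `toy_trunc` are likewise hypothetical. Only base R functions are used.

```r
# Hypothetical toy data, NOT the orings data: launch temperature (F) and
# number of damaged O-rings out of six.
toy <- data.frame(temp = c(50, 60, 70, 80), damage = c(4, 2, 1, 0))

# Naive linear model for the proportion damaged.
toy_lmod <- lm(damage / 6 ~ temp, data = toy)

# Predict at a cold and a warm temperature outside the observed range.
toy_pred <- predict(toy_lmod, newdata = data.frame(temp = c(31, 85)))
print(toy_pred)

# Truncating to [0, 1] forces the fitted "probabilities" to be exactly 0 or 1.
toy_trunc <- pmin(pmax(toy_pred, 0), 1)
print(toy_trunc)
```

With these toy values the fitted line leaves the unit interval at both ends, so the truncated predictions collapse to exactly one and zero, which is the behaviour the text argues is not credible.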

## Binomial Regression Model

Suppose the response variable $Y_i$ for $i=1, \ldots, n$ is binomially distributed $B\left(n_i, p_i\right)$ so that:
$$P\left(Y_i=y_i\right)=\binom{n_i}{y_i} p_i^{y_i}\left(1-p_i\right)^{n_i-y_i}$$
We further assume that the $Y_i$ are independent. The individual trials that compose the response $Y_i$ are all subject to the same $q$ predictors $\left(x_{i 1}, \ldots, x_{i q}\right)$. The group of trials is known as a covariate class. We need a model that describes the relationship of $x_1, \ldots, x_q$ to $p$. Following the linear model approach, we construct a linear predictor:
$$\eta_i=\beta_0+\beta_1 x_{i 1}+\ldots+\beta_q x_{i q}$$

Since the linear predictor can accommodate quantitative and qualitative predictors with the use of dummy variables and also allows for transformations and combinations of the original predictors, it is very flexible and yet retains interpretability. This notion that we can express the effect of the predictors on the response solely through the linear predictor is important. The idea can be extended to models for other types of response and is one of the defining features of the wider class of generalized linear models (GLMs) discussed in Chapter 6.
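
As a rough illustration of this flexibility, the sketch below builds a linear predictor from one quantitative predictor, one transformed term and one qualitative predictor. Everything in it is an assumption made for the example: the data frame `d`, the factor `site` and the coefficient values in `beta` are hypothetical and do not come from the text.

```r
# Hypothetical predictors: one quantitative (temp) and one qualitative
# (site, a made-up two-level factor).
d <- data.frame(temp = c(55, 65, 75, 70),
                site = factor(c("A", "B", "A", "B")))

# model.matrix() adds an intercept column, a transformed term and a dummy
# variable for the factor, so every column is just another x_ij.
X <- model.matrix(~ temp + I(temp^2) + site, data = d)

# Hypothetical coefficients (beta_0, ..., beta_3), chosen only for illustration.
beta <- c(2, -0.05, 0, 0.5)

# Linear predictor eta_i = beta_0 + beta_1 x_i1 + ... + beta_q x_iq.
eta <- drop(X %*% beta)
print(cbind(X, eta))
```

The factor enters through the single dummy column `siteB`, so the effect of every predictor on the response is still carried by the one vector `eta`.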

We have already seen above that setting $\eta_i=p_i$ is not appropriate because we require $0 \leq p_i \leq 1$. Instead we shall use a link function $g$ such that $\eta_i=g\left(p_i\right)$. For this application, we shall need $g$ to be monotone and be such that $0 \leq g^{-1}(\eta) \leq 1$ for any $\eta$. There are three common choices:

1. Logit: $\eta=\log (p /(1-p))$.
2. Probit: $\eta=\Phi^{-1}(p)$ where $\Phi^{-1}$ is the inverse normal cumulative distribution function.
3. Complementary log-log: $\eta=\log (-\log (1-p))$.

The idea of the link function is also one of the central ideas of generalized linear models. It is used to link the linear predictor to the mean of the response in the wider class of models.
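
The three links and their inverses are easy to write down in base R. The following is a minimal sketch; the function names (`logit`, `inv_logit` and so on) are chosen here for illustration and are not taken from the text. The table shows that each inverse link maps any $\eta$ into $[0,1]$ and is increasing in $\eta$.

```r
# Link functions g(p) and their inverses g^{-1}(eta), base R only.
logit       <- function(p) log(p / (1 - p))        # same as qlogis(p)
inv_logit   <- function(eta) 1 / (1 + exp(-eta))   # same as plogis(eta)
probit      <- function(p) qnorm(p)
inv_probit  <- function(eta) pnorm(eta)
cloglog     <- function(p) log(-log(1 - p))
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

# Probabilities implied by a grid of linear predictor values.
eta <- seq(-4, 4, by = 2)
p_tab <- cbind(eta,
               logit   = inv_logit(eta),
               probit  = inv_probit(eta),
               cloglog = inv_cloglog(eta))
print(round(p_tab, 4))
```

The logit and probit columns are symmetric about $\eta=0$, where both give $p=0.5$, while the complementary log-log column is not.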

We will compare these three choices of link function later, but first we estimate the parameters of the model. We shall use the method of maximum likelihood; see Appendix A for a brief introduction to this method. Using the logit link, the log-likelihood is given by:
$$l(\beta)=\sum_{i=1}^n\left[y_i \eta_i-n_i \log \left(1+e^{\eta_i}\right)+\log \binom{n_i}{y_i}\right]$$
We can maximize this to obtain the maximum likelihood estimates $\hat{\beta}$ and use the standard theory to obtain approximate standard errors. An algorithm to perform the maximization will be discussed in Chapter 6.
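
A minimal sketch of this maximization in R follows, assuming the logit link. It uses the general-purpose optimizer `optim()`, which is not the algorithm the text defers to Chapter 6, and it cross-checks the result against `glm()`. The data are made-up toy values, not the `orings` data, and the predictor is centred purely to help the optimizer.

```r
# Hypothetical toy data, NOT the orings data.
temp <- c(50, 60, 70, 80)   # predictor
y    <- c(4, 2, 1, 0)       # number of damaged O-rings
n    <- rep(6, 4)           # trials per covariate class
x    <- temp - mean(temp)   # centred predictor

# Binomial log-likelihood under the logit link.
loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  sum(y * eta - n * log(1 + exp(eta)) + lchoose(n, y))
}

# optim() minimizes, so pass the negative log-likelihood.
opt <- optim(c(0, 0), function(b) -loglik(b), method = "BFGS", hessian = TRUE)
beta_hat <- opt$par
# Approximate standard errors from the inverse of the observed information.
se_hat <- sqrt(diag(solve(opt$hessian)))

# Cross-check against glm(), which fits the same model.
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
print(rbind(optim = beta_hat, glm = unname(coef(fit))))
print(se_hat)
```

The Hessian of the negative log-likelihood at the optimum is the observed information, so inverting it gives the approximate variances that the standard theory refers to.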

