## Prospective and Retrospective Sampling

Consider the data in Table 4.1, taken from a study on infant respiratory disease reported in Payne (1987). It shows the proportions of children developing bronchitis or pneumonia in their first year of life, by type of feeding and sex:

\begin{tabular}{llll}
& Bottle Only & Some Breast with Supplement & Breast Only \\
\hline
Boys & $77 / 458$ & $19 / 147$ & $47 / 494$ \\
Girls & $48 / 384$ & $16 / 127$ & $31 / 464$
\end{tabular}

Table 4.1: Incidence of respiratory disease in infants to the age of 1 year.

We can recover the layout above with the proportions as follows:

    data(babyfood, package = "faraway")
    # proportion of infants with disease, cross-tabulated by sex and feeding type
    xtabs(disease/(disease + nondisease) ~ sex + food, babyfood)
          food
    sex     Bottle  Breast   Suppl
      Boy  0.16812 0.09514 0.12925
      Girl 0.12500 0.06681 0.12598

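
If the `faraway` package is not at hand, the same proportions can be recomputed directly from the counts in Table 4.1. The following is a minimal sketch in base R; the object names `diseased`, `total`, and `prop` are chosen here for illustration and are not part of the original analysis.

```r
# Counts transcribed from Table 4.1: infants with disease and group totals.
groups <- list(sex = c("Boy", "Girl"),
               food = c("Bottle", "Suppl", "Breast"))
diseased <- matrix(c(77, 19, 47,
                     48, 16, 31),
                   nrow = 2, byrow = TRUE, dimnames = groups)
total <- matrix(c(458, 147, 494,
                  384, 127, 464),
                nrow = 2, byrow = TRUE, dimnames = groups)

# Elementwise division gives the proportion with disease in each cell.
prop <- diseased / total
print(round(prop, 5))
```
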
In prospective sampling, the predictors are fixed and then the outcome is observed. This is also called a cohort study. In the infant respiratory disease example shown in Table 4.1, we would select a sample of newborn girls and boys whose parents had chosen a particular method of feeding and then monitor them for their first year.

In retrospective sampling, the outcome is fixed and then the predictors are observed. This is also called a case-control study. Typically, we would find infants coming to a doctor with a respiratory disease in the first year and then record their sex and method of feeding. We would also obtain a sample of respiratory disease-free infants and record their information. The method for obtaining the samples is important: we require that the probability of inclusion in the study is independent of the predictor values.
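
One way to see why the inclusion requirement matters is a little arithmetic on the boys in Table 4.1. The sketch below is an illustration added here, not part of the original analysis. It keeps every case but only a hypothetical 10% of the disease-free infants and compares the bottle-versus-breast odds ratio with the cohort value, first when the sampling fraction is the same for both feeding groups and then when it differs. The sampling fractions and the helper `odds_ratio` are assumptions made for illustration.

```r
# Boys from Table 4.1: cases (with disease) and controls (disease-free).
cases    <- c(bottle = 77, breast = 47)
controls <- c(bottle = 458 - 77, breast = 494 - 47)

# Odds of disease for bottle-fed boys relative to breast-fed boys.
odds_ratio <- function(cases, controls) {
  unname((cases["bottle"] / controls["bottle"]) /
         (cases["breast"] / controls["breast"]))
}

or_cohort <- odds_ratio(cases, controls)

# Hypothetical case-control sample: inclusion does not depend on feeding type.
or_equal <- odds_ratio(cases, 0.10 * controls)

# Hypothetical biased sample: breast-fed controls are twice as likely to be included.
or_biased <- odds_ratio(cases, c(bottle = 0.10, breast = 0.20) * controls)

print(c(cohort = or_cohort, equal = or_equal, biased = or_biased))
```

With equal sampling fractions the fraction cancels and the odds ratio is unchanged; with unequal fractions it is distorted by the ratio of the two fractions.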

## Prediction and Effective Doses

Sometimes we wish to predict the outcome for given values of the covariates. For binomial data this means estimating the probability of success. Given covariates $x_{0}$, the predicted response on the link scale is $\hat{\eta}=x_{0}^{T} \hat{\beta}$ with variance given by $x_{0}^{T}\left(X^{T} W X\right)^{-1} x_{0}$. Approximate confidence intervals may be obtained using a normal approximation. To get an answer on the probability scale, it is necessary to transform back using the inverse of the link function. We predict the response for the insect data:

    data(bliss, package = "faraway")
    # logistic regression of deaths on concentration
    lmod <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)
    lmodsum <- summary(lmod)

We show how to predict the response at a dose of $2.5$ :

    x0 <- c(1, 2.5)                # intercept term and dose
    eta0 <- sum(x0 * coef(lmod))   # linear predictor on the logit scale
    ilogit(eta0)
    [1] 0.64129

There is a 64% predicted chance of death at this dose. We now compute a 95% confidence interval (CI) for this probability. First, extract the variance matrix of the coefficients:

    (cm <- lmodsum$cov.unscaled)
                (Intercept)      conc
    (Intercept)    0.174630 -0.065823
    conc          -0.065823  0.032912

The standard error on the logit scale is then:

    se <- sqrt(t(x0) %*% cm %*% x0)

so the CI on the probability scale is:

    ilogit(c(eta0 - 1.96*se, eta0 + 1.96*se))
    [1] 0.53430 0.73585

A more direct way of obtaining the same result is:

    predict(lmod, newdata = data.frame(conc = 2.5), se = TRUE)
    $fit
    [1] 0.58095
    $se.fit
    [1] 0.2263

    ilogit(c(0.58095 - 1.96*0.2263, 0.58095 + 1.96*0.2263))
    [1] 0.53430 0.73585

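
The calculation can also be collected into one self-contained script. This is a sketch under two assumptions that are not in the text: the `bliss` data are typed in by hand (values as distributed with the `faraway` package) so that no package is needed, and base R's `plogis` stands in for `ilogit` as the inverse logit. For a binomial GLM the dispersion is fixed at 1, so `vcov(lmod)` coincides with the unscaled covariance matrix used above.

```r
# Bliss insect data, entered by hand (assumed to match faraway::bliss).
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
lmod <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)

x0   <- c(1, 2.5)                               # intercept and dose
eta0 <- sum(x0 * coef(lmod))                    # prediction on the logit scale
se   <- sqrt(drop(t(x0) %*% vcov(lmod) %*% x0)) # standard error on the logit scale

p_hat <- plogis(eta0)                           # predicted probability of death
ci    <- plogis(eta0 + c(-1.96, 1.96) * se)     # 95% CI, back-transformed

print(round(c(estimate = p_hat, lower = ci[1], upper = ci[2]), 5))
```
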
Note that in contrast to the linear regression situation, there is no distinction possible between confidence intervals for a future observation and those for the mean response. Now we try predicting the response probability at the low dose of $-5$ :

    x0 <- c(1, -5)
    se <- sqrt(t(x0) %*% cm %*% x0)
    eta0 <- sum(x0 * lmod$coef)
    ilogit(c(eta0 - 1.96*se, eta0 + 1.96*se))
    [1] 2.3577e-05 3.6429e-03
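
To put the two intervals side by side, the following sketch wraps the calculation in a small helper and compares the ratio of upper to lower limit at each dose. The helper name `ci_prob`, the hand-entered data (assumed to match `faraway::bliss`), and the use of base R's `plogis` in place of `ilogit` are choices made here for illustration.

```r
# Bliss insect data, entered by hand (assumed to match faraway::bliss).
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
lmod <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)

# 95% CI for the response probability at a given dose:
# build the interval on the logit scale, then back-transform.
ci_prob <- function(model, conc, z = 1.96) {
  p   <- predict(model, newdata = data.frame(conc = conc), se.fit = TRUE)
  fit <- unname(p$fit)
  se  <- unname(p$se.fit)
  plogis(c(lower = fit - z * se, upper = fit + z * se))
}

ci_mid <- ci_prob(lmod, 2.5)
ci_low <- ci_prob(lmod, -5)

# Relative width: how many times larger the upper limit is than the lower.
print(c(ratio_at_2.5 = unname(ci_mid["upper"] / ci_mid["lower"]),
        ratio_at_minus5 = unname(ci_low["upper"] / ci_low["lower"])))
```

At the dose of $-5$ the interval is narrow in absolute terms, but its upper limit is more than a hundred times its lower limit, so the relative uncertainty is large.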

## Matched Case-Control Studies

In a case-control study, we try to determine the effect of certain risk factors on the outcome. We understand that there are other confounding variables that may affect the outcome. One approach to dealing with these is to measure or record them, include them in the logistic regression model as appropriate and thereby control for their effect. But this method requires that we model these confounding variables with the correct functional form. This may be difficult. Also, making an appropriate adjustment is problematic when the distribution of the confounding variables is quite different in the cases and controls. So we might consider an alternative where the confounding variables are explicitly adjusted for in the design.

In a matched case-control study, we match each case (diseased person, defective object, success, etc.) with one or more controls that have the same or similar values of some set of potential confounding variables. For example, if we have a 56-year-old, Hispanic male case, we try to match him with some number of controls who are also 56-year-old Hispanic males. This group would be called a matched set. Obviously, the more confounding variables one specifies, the more difficult it will be to make the matches. Loosening the matching requirements, for example, accepting controls who are 50-60 years old, might be necessary. Matching also gives us the possibility of adjusting for confounders that are difficult to measure. For example, suppose we suspect an environmental effect on the outcome. However, it is difficult to measure exposure, particularly when we may not know which substances are relevant. We could match subjects based on their place of residence or work. This would go some way to adjusting for the environmental effects.
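
The effect of loosening the matching requirements can be sketched with a toy example. Everything below is hypothetical: the case, the pool of candidate controls, and the column names are invented for illustration only.

```r
# Hypothetical case and pool of candidate controls.
case <- data.frame(id = "case1", age = 56, sex = "M", ethnicity = "Hispanic")
pool <- data.frame(
  id        = paste0("ctl", 1:6),
  age       = c(56, 52, 61, 58, 56, 49),
  sex       = c("M", "M", "M", "F", "M", "M"),
  ethnicity = c("Hispanic", "Hispanic", "Hispanic",
                "Hispanic", "Asian", "Hispanic")
)

# Candidates agreeing with the case on sex and ethnicity.
same_group <- pool$sex == case$sex & pool$ethnicity == case$ethnicity

# Exact matching on age versus the looser 50-60 age band.
exact <- pool[same_group & pool$age == case$age, ]
loose <- pool[same_group & pool$age >= 50 & pool$age <= 60, ]

print(exact$id)
print(loose$id)
```

In this toy pool, exact matching on age leaves a single eligible control, while the 50-60 band admits a second one.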

Matched case-control studies also have some disadvantages apart from the difficulties of forming the matched sets. One loses the possibility of discovering the effects of the variables used to determine the matches. For example, if we match on sex, we will not be able to investigate a sex effect. Furthermore, the data will likely be far from a random sample of the population of interest. So although relative effects may be found, it may be difficult to generalize to the population.

Sometimes, cases are rare but controls are readily available. A $1: M$ design has $M$ controls for each case. $M$ is typically small and can even vary in size from matched set to matched set due to difficulties in finding matching controls and missing values. Each additional control yields a diminished return in terms of increased efficiency in estimating risk factors – it is usually not worth exceeding $M=5$.
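
The diminishing return can be illustrated with a commonly quoted rule of thumb, which is not taken from this text: relative to having an unlimited number of controls per case, a $1:M$ design has efficiency of roughly $M/(M+1)$ when the risk factor effect is small. The sketch below tabulates that approximation; it is an assumption-laden guide, not an exact result for any particular study.

```r
# Rule-of-thumb efficiency of a 1:M design relative to unlimited controls.
M       <- 1:10
rel_eff <- M / (M + 1)

# Efficiency gained by adding the M-th control to each matched set.
gain <- c(rel_eff[1], diff(rel_eff))

print(data.frame(M = M,
                 rel_eff = round(rel_eff, 3),
                 gain = round(gain, 3)))
```

Under this approximation, going from $M=1$ to $M=5$ raises the relative efficiency from one half to five sixths, whereas each control beyond the fifth adds less than three percentage points.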
