Statistics writing service | Generalized linear model writing and exam help | MAST30025


## Poisson Regression

If $Y$ is Poisson with mean $\mu>0$, then:
$$P(Y=y)=\frac{e^{-\mu} \mu^{y}}{y !}, \quad y=0,1,2, \ldots$$
Three examples of the Poisson density are depicted in Figure 5.1. In the left panel, we see a distribution that gives highest probability to $y=0$ and falls rapidly as $y$ increases. In the center panel, we see a skewed distribution with a longer tail on the right. In the right panel, even for a moderate $\mu=5$, the distribution is already closer to a normal shape, and this becomes more pronounced as $\mu$ increases.
barplot(dpois(0:5, 0.5), xlab="y", ylab="Probability", names=0:5, main="mean = 0.5")
barplot(dpois(0:10, 2), xlab="y", ylab="Probability", names=0:10, main="mean = 2")
barplot(dpois(0:15, 5), xlab="y", ylab="Probability", names=0:15, main="mean = 5")
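As a quick check of the density formula, the following sketch evaluates it directly and compares the result with R's built-in `dpois`. The helper name `pois_pmf` is introduced here for illustration and is not part of the original text.

```r
# Poisson pmf computed directly from the formula exp(-mu) * mu^y / y!
pois_pmf <- function(y, mu) exp(-mu) * mu^y / factorial(y)

y <- 0:15
for (mu in c(0.5, 2, 5)) {
  # The hand-written formula should agree with R's built-in density
  stopifnot(isTRUE(all.equal(pois_pmf(y, mu), dpois(y, mu))))
}

# For mu = 0.5 the most probable value is y = 0, as in the left panel
mode_left <- which.max(dpois(0:5, 0.5)) - 1
mode_left
```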
The expectation and variance of a Poisson are the same: $E Y=\operatorname{var} Y=\mu$. The Poisson distribution arises naturally in several ways:

1. If the count is some number out of some possible total, then the response would be more appropriately modeled as a binomial. However, for small success probabilities and large totals, the Poisson is a good approximation and can be used. For example, in modeling the incidence of rare forms of cancer, the number of people affected is a small proportion of the population in a given geographical area. Specifically, if $\mu=np$ is held fixed while $n \rightarrow \infty$ (so that $p \rightarrow 0$), then $B(n, p)$ is well approximated by $\operatorname{Pois}(\mu)$. Also, for small $p$, note that $\operatorname{logit}(p) \approx \log p$, so that the use of the Poisson with a log link is comparable to the binomial with a logit link. Where $n$ varies between cases, a rate model can be used as described in Section 5.3.
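The two approximations in this item can be checked numerically. The sketch below uses arbitrary illustrative values of $n$ and $p$ (they are assumptions, not taken from the text) to compare the binomial and Poisson probabilities, and then tabulates $\operatorname{logit}(p)$ against $\log p$ for a few small $p$.

```r
# Binomial with large n and small p versus Poisson with mu = n * p
n  <- 10000
p  <- 0.0003
mu <- n * p
y  <- 0:15

# Largest pointwise gap between the two sets of probabilities
max_diff <- max(abs(dbinom(y, n, p) - dpois(y, mu)))
max_diff

# For small p, logit(p) = log(p / (1 - p)) is close to log(p)
logit   <- function(p) log(p / (1 - p))
p_small <- c(0.01, 0.001, 0.0001)
cbind(p = p_small, logit = logit(p_small), log = log(p_small))
```

The gap between `logit(p)` and `log(p)` equals $-\log(1-p)$, which is roughly $p$ itself, so the two links agree more closely the rarer the event.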

## Dispersed Poisson Model

We can modify the standard Poisson model to allow for more variation in the response. But before we do that, we must check whether the large deviance might have some other cause.

In the Galápagos example, we check the residuals to see if the large deviance can be explained by an outlier:
halfnorm(residuals(modp))
The half-normal plot of the (absolute value of the) residuals shown in Figure 5.3 shows no outliers. It could be that the structural form of the model needs some improvement, but some experimentation with different forms for the predictors will reveal that there is little scope for improvement. Furthermore, the proportion of deviance explained by this model, $1-717 / 3510=0.796$, is about the same as in the linear model above.
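The proportion of deviance explained is simple arithmetic on the residual and null deviances. The sketch below first reproduces the figure quoted above, then shows the same calculation on a fitted `glm` object. The second part uses simulated data (an assumption made so the example runs without the Galápagos data), so its output is not comparable to the 0.796 above.

```r
# Deviances quoted in the text for the Galapagos Poisson model
resid_dev <- 717
null_dev  <- 3510
explained <- 1 - resid_dev / null_dev
round(explained, 3)   # 0.796

# The same quantity from a fitted glm object, on simulated data
set.seed(1)
x   <- runif(50)
y   <- rpois(50, exp(1 + 2 * x))
mod <- glm(y ~ x, family = poisson)
explained_sim <- 1 - deviance(mod) / mod$null.deviance
explained_sim
```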

For a Poisson distribution, the mean is equal to the variance. Let’s investigate this relationship for this model. It is difficult to estimate the variance for a given value of the mean, but $(y-\hat{\mu})^{2}$ does serve as a crude approximation. We plot this estimated variance against the mean, as seen in the second panel of Figure 5.3:
plot(log(fitted(modp)), log((gala$Species - fitted(modp))^2), xlab=expression(hat(mu)), ylab=expression((y - hat(mu))^2))
abline(0, 1)

We see that the variance is proportional to, but larger than, the mean. When the variance assumption of the Poisson regression model is broken but the link function and choice of predictors are correct, the estimates of $\beta$ are consistent, but the standard errors will be wrong. We cannot determine which predictors are statistically significant in the above model using the output we have.

The Poisson distribution has only one parameter and so is not very flexible for empirical fitting purposes. We can generalize by allowing ourselves a dispersion parameter. Over- or underdispersion can occur in various ways in Poisson models. For example, suppose the Poisson response $Y$ has rate $\lambda$ which is itself a random variable. The tendency to fail for a machine may vary from unit to unit even though they are the same model. We can model this by letting $\lambda$ be gamma distributed with $E \lambda=\mu$ and $\operatorname{var} \lambda=\mu / \phi$. Now $Y$ is negative binomial with mean $E Y=\mu$. The mean is the same as the Poisson, but the variance is $\operatorname{var} Y=\mu(1+\phi) / \phi$, which is not equal to $\mu$. In this case, overdispersion would occur and could be modeled using a negative binomial model as demonstrated in Section 5.4.

If we know the specific mechanism, as in the above example, we could model the response as a negative binomial or other more flexible distribution. If the mechanism is not known, we can introduce a dispersion parameter $\phi$ such that $\operatorname{var} Y=\phi E Y=\phi \mu$. $\phi=1$ is the regular Poisson regression case, while $\phi>1$ is overdispersion and $\phi<1$ is underdispersion.
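The gamma-mixture mechanism can be checked by simulation. By the law of total variance, $\operatorname{var} Y = E \lambda + \operatorname{var} \lambda = \mu + \mu/\phi = \mu(1+\phi)/\phi$. The sketch below uses arbitrary illustrative values of $\mu$ and the gamma parameter $\phi$ (both are assumptions, not values from the text). It then fits an intercept-only Poisson model and divides the Pearson statistic by the residual degrees of freedom, which should be close to the variance-to-mean ratio $(1+\phi)/\phi$.

```r
set.seed(123)
mu  <- 4      # common mean (illustrative value)
phi <- 0.5    # gamma parameter: var(lambda) = mu / phi
n   <- 1e5

# A gamma with shape = mu * phi and rate = phi has mean mu and variance mu / phi
lambda <- rgamma(n, shape = mu * phi, rate = phi)
y      <- rpois(n, lambda)

# Sample moments against the theoretical variance mu * (1 + phi) / phi
c(mean = mean(y), var = var(y), theory_var = mu * (1 + phi) / phi)

# Pearson statistic over residual degrees of freedom, intercept-only Poisson fit
mod  <- glm(y ~ 1, family = poisson)
disp <- sum(residuals(mod, type = "pearson")^2) / df.residual(mod)
disp   # expected to be near (1 + phi) / phi
```

A ratio well above 1 signals overdispersion relative to the Poisson, which is what this mixture produces by construction.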
The dispersion parameter may be estimated using:
$$\hat{\phi}=\frac{X^{2}}{n-p}=\frac{\sum_{i}\left(y_{i}-\hat{\mu}_{i}\right)^{2} / \hat{\mu}_{i}}{n-p}$$

## Rate Models

The number of events observed may depend on a size variable that determines the number of opportunities for the events to occur. For example, if we record the number of burglaries reported in different cities, the observed number will depend on the number of households in these cities. In other cases, the size variable may be time. For example, if we record the number of customers served by a sales worker, we must take account of the differing amounts of time worked.

Sometimes, it is possible to analyze such data using a binomial response model. For the burglary example above, we might model the number of burglaries out of the number of households. However, if the proportion is small, the Poisson approximation to the binomial is effective. Furthermore, in some examples, the total number of potential cases may not be known exactly. The modeling of rare diseases illustrates this issue as we may know the number of cases but not have precise population data.

Sometimes, the binomial model simply cannot be used. In the burglary example, some households may be robbed more than once. In the customer service example, the size variable is not a count.

An alternative approach is to model the ratio. However, there are often difficulties with normality and unequal variance when taking this approach, particularly if the counts are small.

In Purott and Reeder (1976), some data is presented from an experiment conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities (ca) observed. The number (cells), in hundreds of cells exposed in each run, differs. The dose amount (doseamt) and the rate (doserate) at which the dose is applied are the predictors of interest.
We can tabulate the data for inspection like this:
data(dicentric, package="faraway")
round(xtabs(ca/cells ~ doseamt+doserate, dicentric), 2)
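The passage stops before any model is fitted. As background rather than something taken from the passage, one standard way to account for a size variable is a log-link Poisson model with the log of the size as an offset, so that the expected count is proportional to the size. The sketch below uses simulated data with hypothetical variable names (`size`, `x`, `count`), not the `dicentric` data.

```r
# Hypothetical data: event counts with an exposure (size) variable
set.seed(42)
size      <- c(50, 100, 200, 400, 800, 1600)  # e.g. households, hours, cells
x         <- c(0, 1, 0, 1, 0, 1)              # a binary predictor
true_rate <- exp(-3 + 0.7 * x)                # events per unit of size
count     <- rpois(length(size), size * true_rate)

# Raw ratios: rows with a small size give much noisier ratios
raw_ratio <- count / size

# Poisson rate model: log(E[count]) = log(size) + b0 + b1 * x
rate_mod <- glm(count ~ x + offset(log(size)), family = poisson)
coef(rate_mod)
```

Using the offset keeps the response as a count, so the Poisson variance assumption applies to `count` itself rather than to a ratio whose variance changes with `size`.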


