## Beta Regression

Beta regression is useful for responses that are bounded in $(0,1)$ such as proportions. It could also be used for variables that are bounded in some other finite interval simply by rescaling to $(0,1)$. A Beta-distributed random variable $Y$ has density:
$$f(y \mid a, b)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} y^{a-1}(1-y)^{b-1}$$
for parameters $a, b$ and Gamma function $\Gamma()$. It is more convenient to transform the parameters to $\mu=a /(a+b)$ and $\phi=a+b$, so that $E Y=\mu$ and $\operatorname{var} Y=\mu(1-\mu) /(1+\phi)$. We can then connect the mean to the linear predictor $\eta$ through $\eta=g(\mu)$, where $g$ is a link function; any of the choices used for the binomial model would be suitable.
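The moment formulas for the $(\mu, \phi)$ parameterization can be checked numerically. The following is a minimal sketch in base R; the values of `mu` and `phi` are arbitrary choices made for illustration and do not come from any data set discussed here.

```r
# Convert the mean/precision parameters (mu, phi) to the shape parameters (a, b).
mu  <- 0.3   # illustrative mean (assumed value)
phi <- 9     # illustrative precision (assumed value)
a <- mu * phi
b <- (1 - mu) * phi

# Moments implied by the (mu, phi) parameterization.
mean_theory <- mu
var_theory  <- mu * (1 - mu) / (1 + phi)

# The same moments obtained by integrating against the Beta density.
mean_num <- integrate(function(y) y * dbeta(y, a, b), 0, 1)$value
var_num  <- integrate(function(y) (y - mean_num)^2 * dbeta(y, a, b), 0, 1)$value

c(mean_theory = mean_theory, mean_num = mean_num)
c(var_theory = var_theory, var_num = var_num)
```

Larger values of `phi` shrink the variance for a fixed mean, which is why $\phi$ is often described as a precision parameter.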

An implementation of the Beta regression model can be found in the mgcv package of Wood (2006). We can apply this to the mammalsleep data also used in the previous section.

The default choice of link is the logit function. The estimated value of $\phi$ is $8.927$. A comparison of the fitted values of this model and the quasi-binomial model fitted earlier reveals no substantial difference. The advantage of the Beta-based model is the full distributional model which would allow the construction of full predictive distributions rather than just a point estimate and standard error.
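The listing for the mammalsleep fit is not reproduced in this text, so the following sketch uses simulated data instead. It assumes the mgcv package is installed; the coefficients, the precision, and the variable names `x` and `y` are made up for illustration. It fits a Beta regression with the logit link and then forms a simple plug-in predictive interval, which ignores the uncertainty in the estimated parameters.

```r
library(mgcv)  # provides gam() and the betar() family

set.seed(1)
n   <- 200
x   <- runif(n, -1, 1)
mu  <- plogis(-0.5 + 1 * x)   # logit link with assumed true coefficients
phi <- 10                     # assumed true precision
y   <- rbeta(n, mu * phi, (1 - mu) * phi)
dat <- data.frame(x = x, y = y)

# Beta regression with the default logit link.
fit <- gam(y ~ x, family = betar(link = "logit"), data = dat)
coef(fit)

# Estimated precision phi on its natural scale.
phi_hat <- fit$family$getTheta(TRUE)

# Plug-in predictive interval at a new covariate value: insert the fitted
# mean and precision into the Beta quantile function.
mu_new   <- as.numeric(predict(fit, data.frame(x = 0.5), type = "response"))
pred_int <- qbeta(c(0.025, 0.975), mu_new * phi_hat, (1 - mu_new) * phi_hat)
pred_int
```

A quasi-binomial fit would give a similar point prediction here, but it would not supply the quantiles, since it specifies only a mean and variance relationship.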

## Latent Variables

Suppose that students answer questions on a test and that a specific student has an aptitude $T$. A particular question might have difficulty $d$ and the student will get the answer correct only if $T>d$. Now if we consider $d$ fixed and $T$ as a random variable with density $f$ and distribution function $F$, then the probability that the student will get the answer wrong is:
$$p=P(T \leq d)=F(d)$$
$T$ is called a latent variable. Suppose that the distribution of $T$ is logistic:
$$F(y)=\frac{\exp ((y-\mu) / \sigma)}{1+\exp ((y-\mu) / \sigma)}$$
so
$$\operatorname{logit}(p)=-\mu / \sigma+d / \sigma$$
If we set $\beta_{0}=-\mu / \sigma$ and $\beta_{1}=1 / \sigma$, we now have a logistic regression model. We can illustrate this in the following example where we set $d=1$ and let $T$ have mean $-1$ and $\sigma=1$:

    # Density of a logistic latent variable with location -1 and scale 1
    x <- seq(-6, 4, 0.1)
    y <- dlogis(x, location = -1)
    plot(x, y, type = "l", ylab = "density", xlab = "t")
    # Shade the region t <= d = 1, the probability of a wrong answer
    ii <- (x <= 1)
    polygon(c(x[ii], 1, -6), c(y[ii], 0, 0), col = "gray")

The plot in Figure 4.1 shows a logistically distributed latent variable. The shape of this distribution is very similar to that of the normal distribution. The shaded area represents the probability of getting an answer wrong. As the mean aptitude of this student is somewhat less than the difficulty of the question, this probability is substantially greater than one half.
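The size of the shaded area can be computed directly. This short sketch uses the same settings as the example ($d=1$, mean $-1$, $\sigma=1$) and confirms that the latent variable form and the logistic regression form give the same probability; the variable names are chosen here for illustration.

```r
d     <- 1    # question difficulty
mu    <- -1   # location of the aptitude distribution
sigma <- 1    # scale of the aptitude distribution

# Probability of a wrong answer: P(T <= d) = F(d).
p <- plogis(d, location = mu, scale = sigma)
p

# The same probability through the logistic regression form.
beta0 <- -mu / sigma
beta1 <- 1 / sigma
all.equal(qlogis(p), beta0 + beta1 * d)
```

The probability is about 0.88, in line with the statement that it is well above one half.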

This idea also arises in a bioassay where we might treat an animal, plant or person with some concentration of a treatment and observe the outcome. For example, suppose we are interested in the concentration of insecticide to be used in exterminating a pest. Insects will have varying tolerances for the toxin and will survive if their tolerance is greater than the dose. In this context, the term tolerance distribution for $T$ is used. Applications in several other areas exist where we observe only a binary outcome but believe this to be generated by some continuous but unobserved variable.
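The tolerance idea can be illustrated by simulation. In the sketch below, the location, scale, dose range, and sample size are all assumed values chosen for illustration. Each insect is given an unobserved logistic tolerance, only the binary outcome is recorded, and a logistic regression on dose should then approximately recover $\beta_0=-\mu/\sigma$ and $\beta_1=1/\sigma$.

```r
set.seed(42)
mu    <- 2      # assumed location of the tolerance distribution
sigma <- 0.8    # assumed scale of the tolerance distribution
n     <- 5000

dose      <- runif(n, -2, 6)                           # doses applied
tolerance <- rlogis(n, location = mu, scale = sigma)   # unobserved tolerances
died      <- as.integer(tolerance <= dose)             # observed binary outcome

# Logistic regression of the binary outcome on dose.
fit <- glm(died ~ dose, family = binomial)

# Compare the estimates with the values implied by the latent variable model.
rbind(estimate = coef(fit),
      implied  = c(-mu / sigma, 1 / sigma))
```

Replacing `rlogis` with `rnorm` and using a probit link would give the analogous check for a normally distributed tolerance.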

Until now we have used the logit link function to connect the probability and the linear predictor, but other choices of link function are reasonable. We need a function that bounds the probability between zero and one. We also expect the link function to be monotone. It is conceivable that the success probability may go up and down as the linear predictor increases, but this circumstance is best modeled by adding nonlinear components, such as quadratic terms, to the linear predictor rather than by modifying the link function. The latent variable formulation suggests some other possibilities for the link function. Here are some choices which are implemented in the glm() function:

1. Probit: $\eta=\Phi^{-1}(p)$ where $\Phi$ is the normal cumulative distribution function. This arises from a normally distributed latent variable.
2. Complementary log-log: $\eta=\log (-\log (1-p))$. A Gumbel-distributed latent variable will lead to this.
3. Cauchit: $\eta=\tan (\pi(p-1 / 2))$, which is motivated by a Cauchy-distributed latent variable.
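These links, together with the logit, can be evaluated with make.link() from the stats package and compared against explicit formulas. This is a small sketch; the probabilities in `p` are arbitrary illustrative values.

```r
p <- c(0.1, 0.5, 0.9)   # illustrative probabilities (assumed values)

# Link functions as supplied by make.link() in the stats package.
links <- c("logit", "probit", "cloglog", "cauchit")
eta <- sapply(links, function(l) make.link(l)$linkfun(p))
rownames(eta) <- paste0("p=", p)
eta

# The same values from the explicit formulas.
manual <- cbind(logit   = log(p / (1 - p)),
                probit  = qnorm(p),
                cloglog = log(-log(1 - p)),
                cauchit = tan(pi * (p - 0.5)))
all.equal(unname(eta), unname(manual))
```

The logit, probit, and cauchit links are symmetric about $p=1/2$, where they all equal zero, while the complementary log-log link is not.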

We can illustrate the choices using some data from Bliss (1935) on the numbers of insects dying at different levels of insecticide concentration. We fit all four link functions:
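The fitting code is not included in this text. A minimal sketch of such a fit might look as follows. The values of `dead`, `alive`, and `conc` are typed in by hand and are assumed to match the bliss data frame distributed with the faraway package; if that package is installed, loading the data from it is preferable.

```r
# Hand-entered stand-in for the bliss data (assumed values, see the note above).
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)

# Fit a binomial GLM for each of the four link functions.
links <- c("logit", "probit", "cloglog", "cauchit")
fits <- lapply(links, function(l)
  glm(cbind(dead, alive) ~ conc, family = binomial(link = l), data = bliss))
names(fits) <- links

# Fitted death probabilities at the five observed concentrations.
round(sapply(fits, fitted), 3)
```

Each column of the resulting table holds the fitted probabilities for one link, so the rows can be compared across links at each concentration.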

These are not very different, but now look at a wider range of doses, $[-4,8]$. We apply the predict function to each of the four models, forming a matrix of predicted values. We label the columns and add the information about the dose. The tidyr package is useful for reformatting data from a wide format of multiple measured values per row to a long format where there is only one response value per row. This is accomplished using the gather() function. This format, where each row is an observation and each column is a variable, is the most convenient form for many R analyses. Finally, the ggplot2 package is useful for producing a complete and well-labeled plot.
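The steps just described can be sketched as below. The data values are hand-entered and assumed to match the bliss data in the faraway package, and the object names `predval` and `mpv` are chosen here for illustration. The reshaping and plotting step runs only if tidyr and ggplot2 are installed.

```r
# Hand-entered stand-in for the bliss data (assumed values).
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)

links <- c("logit", "probit", "cloglog", "cauchit")
dose  <- seq(-4, 8, by = 0.1)

# One column of predicted probabilities per link function.
predval <- sapply(links, function(l) {
  m <- glm(cbind(dead, alive) ~ conc, family = binomial(link = l), data = bliss)
  predict(m, data.frame(conc = dose), type = "response")
})
predval <- data.frame(dose = dose, predval)

if (requireNamespace("tidyr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  # Wide to long: one (dose, link, probability) triple per row.
  mpv <- tidyr::gather(predval, link, probability, -dose)
  print(ggplot2::ggplot(mpv, ggplot2::aes(x = dose, y = probability,
                                          linetype = link)) +
          ggplot2::geom_line())
}
```

In current versions of tidyr, pivot_longer() is the recommended replacement for gather(), although gather() remains available.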

