## Correspondence Analysis

The analysis of the hair-eye color data in the previous section revealed that hair and eye color are dependent. But this does not tell us how they are dependent. To study this, we can use a kind of residual analysis for contingency tables called correspondence analysis.

Compute the Pearson residuals $r_{P}$ and write them in the matrix form $R_{i j}$, where $i=1, \ldots, r$ and $j=1, \ldots, c$, according to the structure of the data. Perform the singular value decomposition:
$$R_{r \times c}=U_{r \times w} D_{w \times w} V_{w \times c}^{T}$$
where $r$ is the number of rows, $c$ is the number of columns and $w=\min (r, c)$. The columns of $U$ and $V$ are called the left and right singular vectors, respectively. $D$ is a diagonal matrix with sorted elements $d_{i}$, called singular values. Another way of writing this is:
$$R_{i j}=\sum_{k=1}^{w} U_{i k} d_{k} V_{j k}$$
As with eigendecompositions, it is not uncommon for the first few singular values to be much larger than the rest. Suppose that the first two dominate so that:
$$R_{i j} \approx U_{i 1} d_{1} V_{j 1}+U_{i 2} d_{2} V_{j 2}$$

We usually absorb the $d$s into $U$ and $V$ for plotting purposes so that we can assess the relative contribution of the components. Thus:
$$\begin{aligned} R_{i j} & \approx\left(U_{i 1} \sqrt{d_{1}}\right) \times\left(V_{j 1} \sqrt{d_{1}}\right)+\left(U_{i 2} \sqrt{d_{2}}\right) \times\left(V_{j 2} \sqrt{d_{2}}\right) \\ & \equiv U_{i 1} V_{j 1}+U_{i 2} V_{j 2} \end{aligned}$$
where in the latter expression we have redefined the $U$s and $V$s to include the $\sqrt{d}$.
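
The decomposition can be sketched in a few lines of base R. This is a minimal illustration, not the analysis from the previous section: it assumes that the built-in `HairEyeColor` array, summed over sex, is an acceptable stand-in for the hair-eye data, and the names `tab`, `R`, `sv`, `Urow` and `Vcol` are introduced here for illustration only.

```r
# Assumption: base R's HairEyeColor array (summed over Sex) stands in for
# the hair-eye data of the previous section.
tab <- margin.table(HairEyeColor, c(1, 2))   # 4 x 4 table: Hair by Eye

# Pearson residuals (observed - expected) / sqrt(expected), in matrix form.
fit <- chisq.test(tab)
R <- unclass(fit$residuals)

# Singular value decomposition R = U D V^T.
sv <- svd(R)
round(sv$d, 3)                               # singular values, largest first

# Absorb sqrt(d_k) into the first two columns of U and V.
sq <- sqrt(sv$d[1:2])
Urow <- sweep(sv$u[, 1:2], 2, sq, "*")       # row (hair) coordinates
Vcol <- sweep(sv$v[, 1:2], 2, sq, "*")       # column (eye) coordinates
dimnames(Urow) <- list(rownames(R), c("dim1", "dim2"))
dimnames(Vcol) <- list(colnames(R), c("dim1", "dim2"))

# Rank-2 approximation R_ij ~ U_i1 V_j1 + U_i2 V_j2 and its worst-case error.
R2 <- Urow %*% t(Vcol)
max(abs(R - R2))

# The squared singular values add up to the Pearson chi-squared statistic.
all.equal(sum(sv$d^2), unname(fit$statistic))
```

Plotting the rows of `Urow` and `Vcol` on common axes gives the usual correspondence plot. Because the squared singular values sum to the Pearson $X^{2}$ statistic, the ratio $d_{k}^{2} / \sum_{l} d_{l}^{2}$ indicates how much of the departure from independence each component accounts for.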

## Matched Pairs

In the typical two-way contingency tables, we display accumulated information about two categorical measures on the same object. In matched pairs, we observe one measure on two matched objects.

In Stuart (1955), data on the vision of a sample of women is presented. The left and right eye performance is graded into four categories:

```r
data(eyegrade, package = "faraway")  # eyegrade ships with the faraway package
(ct <- xtabs(y ~ right + left, eyegrade))
```

```
        left
right    best second third worst
  best   1520    266   124    66
  second  234   1512   432    78
  third   117    362  1772   205
  worst    36     82   179   492
```

If we check for independence:

```r
summary(ct)
```

```
Call: xtabs(formula = y ~ right + left, data = eyegrade)
Number of cases in table: 7477
Number of factors: 2
Test for independence of all factors:
    Chisq = 8097, df = 9, p-value = 0
```

We are not surprised to find strong evidence of dependence. Most people’s eyes are similar. A more interesting hypothesis for such matched pair data is symmetry. Is $p_{i j}=p_{j i}$ ? We can fit such a model by defining a factor where the levels represent the symmetric pairs for the off-diagonal elements. There is only one observation for each level down the diagonal:

```r
# Label each cell by its unordered pair of grades
(symfac <- factor(apply(eyegrade[, 2:3], 1,
                        function(x) paste(sort(x), collapse = "-"))))
```

```
 [1] best-best     best-second   best-third    best-worst
 [5] best-second   second-second second-third  second-worst
 [9] best-third    second-third  third-third   third-worst
[13] best-worst    second-worst  third-worst   worst-worst
10 Levels: best-best best-second best-third ... worst-worst
```

We now fit this model:

```r
mods <- glm(y ~ symfac, family = poisson, data = eyegrade)
c(deviance(mods), df.residual(mods))
```

```
[1] 19.249  6.000
```

```r
pchisq(deviance(mods), df.residual(mods), lower.tail = FALSE)
```

```
[1] 0.0037629
```

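
The same fit can be reproduced without the `eyegrade` data frame. The sketch below is a minimal, self-contained version that assumes the Stuart (1955) counts are typed in by hand with rows for the right eye and columns for the left eye. The names `counts`, `eyes` and `symmod` are introduced here for illustration only.

```r
# Counts typed in by hand (assumption: rows are right eye, columns are left eye).
grades <- c("best", "second", "third", "worst")
counts <- matrix(c(1520,  266,  124,  66,
                    234, 1512,  432,  78,
                    117,  362, 1772, 205,
                     36,   82,  179, 492),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(right = grades, left = grades))

# One row per cell: columns right, left and Freq.
eyes <- as.data.frame(as.table(counts))

# Label each cell by its unordered pair of grades,
# so that cells (i, j) and (j, i) share a factor level.
eyes$symfac <- factor(apply(eyes[, c("right", "left")], 1,
                            function(x) paste(sort(x), collapse = "-")))

# Poisson GLM with one parameter per symmetric pair.
symmod <- glm(Freq ~ symfac, family = poisson, data = eyes)
dev <- deviance(symmod)
dfr <- df.residual(symmod)
c(deviance = dev, df = dfr)

# Upper-tail chi-squared p-value for the lack-of-fit test.
pval <- pchisq(dev, dfr, lower.tail = FALSE)
pval
```

The residual degrees of freedom are 16 cells minus 10 pair levels, which gives the 6 reported above.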
Here, we see evidence of a lack of symmetry. It is worth checking the residuals:

```r
round(xtabs(residuals(mods) ~ right + left, eyegrade), 3)
```

We see that the residuals above the diagonal are mostly positive, while they are mostly negative below the diagonal. So there are generally more poor left, good right eye combinations than the reverse. Furthermore, we can compute the marginals:

```r
margin.table(ct, 1)
```

```
right
  best second  third  worst
  1976   2256   2456    789
```
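
Under the symmetry model, the fitted value for cells $(i, j)$ and $(j, i)$ is the average of the two observed counts, so the direction of the asymmetry can also be checked directly from the table. A minimal sketch, again assuming the Stuart (1955) counts typed in by hand (`counts`, `fitted_sym` and `diffs` are illustrative names):

```r
# Counts typed in by hand (assumption: rows are right eye, columns are left eye).
grades <- c("best", "second", "third", "worst")
counts <- matrix(c(1520,  266,  124,  66,
                    234, 1512,  432,  78,
                    117,  362, 1772, 205,
                     36,   82,  179, 492),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(right = grades, left = grades))

# Fitted values under symmetry: average each cell with its mirror image.
fitted_sym <- (counts + t(counts)) / 2

# Observed minus fitted has the same sign as the residual in each cell.
diffs <- counts - fitted_sym
sign(diffs)

# Number of cells above the diagonal with more cases than symmetry predicts.
n_pos_upper <- sum(diffs[upper.tri(diffs)] > 0)
n_pos_upper

# Marginal totals for the right eye (rows) and the left eye (columns).
right_margin <- rowSums(counts)
left_margin <- colSums(counts)
rbind(right = right_margin, left = left_margin)
```

Comparing the two rows of the final table shows whether one eye tends to be graded better overall, which is the question the marginal totals are meant to answer.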

## Ordinal Variables

Some variables have a natural order. We can use the methods for nominal variables described earlier in this chapter, but more information can be extracted by taking advantage of the structure of the data. Sometimes we might identify a particular ordinal variable as the response. In such cases, the methods of Section 7.4 can be used. However, sometimes we are interested in modeling the association between ordinal variables. Here the use of scores can be helpful.

Consider a two-way table where both variables are ordinal. We may assign scores $u_{i}$ and $v_{j}$ to the rows and columns such that $u_{1} \leq u_{2} \leq \cdots \leq u_{I}$ and $v_{1} \leq v_{2} \leq \cdots \leq v_{J}$. The assignment of scores requires some judgment. If you have no particular preference, even spacing allows for the simplest interpretation. If you have an interval scale, for example, 0-10 years old, 10-20 years old, 20-40 years old and so on, midpoints are often used. It is a good idea to check that the inference is robust to the assignment of scores by trying some reasonable alternative choices. If your qualitative conclusions are changed, this is an indication that you cannot make any strong finding.

Now fit the linear-by-linear association model:
$$\log E Y_{i j}=\log \mu_{i j}=\log n p_{i j}=\log n+\alpha_{i}+\beta_{j}+\gamma u_{i} v_{j}$$
So $\gamma=0$ means independence while $\gamma$ represents the amount of association and can be positive or negative. $\gamma$ is rather like an (unscaled) correlation coefficient. Consider underlying (latent) continuous variables which are discretized by the cutpoints $u_{i}$ and $v_{j}$. We can then identify $\gamma$ with the correlation coefficient of the latent variables.
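
One further way to read $\gamma$, offered as a short derivation from the model equation rather than as part of the original argument, is through the local odds ratios. For adjacent rows $i, i+1$ and adjacent columns $j, j+1$, the terms $\log n$, $\alpha_{i}$ and $\beta_{j}$ cancel, leaving:

$$\log \frac{\mu_{i j}\, \mu_{i+1, j+1}}{\mu_{i, j+1}\, \mu_{i+1, j}}=\gamma\left(u_{i} v_{j}+u_{i+1} v_{j+1}-u_{i} v_{j+1}-u_{i+1} v_{j}\right)=\gamma\left(u_{i+1}-u_{i}\right)\left(v_{j+1}-v_{j}\right)$$

With evenly spaced unit scores, every local log-odds ratio therefore equals $\gamma$, which is why even spacing gives the simplest interpretation.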
Consider an example drawn from a subset of the 1996 American National Election Study (Rosenstone et al. (1997)). Using just the data on party affiliation and level of education, we can construct a two-way table:

```r
data(nes96, package = "faraway")  # nes96 ships with the faraway package
xtabs(~ PID + educ, nes96)
```

```
         educ
PID       MS HSdrop HS Coll CCdeg BAdeg MAdeg
  strDem   5     19 59   38    17    40    22
  weakDem  4     10 49   36    17    41    23
  indDem   1      4 28   15    13    27    20
  indind   0      3 12    9     3     6     4
  indRep   2      7 23   16     8    22    16
  weakRep  0      5 35   40    15    38    17
  strRep   1      4 42   33    17    53    25
```

Both variables are ordinal in this example. We need to convert this to a dataframe with one count per line to enable model fitting.
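
The conversion and the subsequent fit can be sketched as follows. This is a minimal version under stated assumptions: the counts are typed in from the party-by-education table rather than read from `nes96`, the scores are the evenly spaced integers 1 to 7 for both variables, and the names `counts`, `partyed`, `indmod` and `linmod` are illustrative.

```r
# Counts typed in by hand from the party-by-education table (assumption).
pid  <- c("strDem", "weakDem", "indDem", "indind", "indRep", "weakRep", "strRep")
educ <- c("MS", "HSdrop", "HS", "Coll", "CCdeg", "BAdeg", "MAdeg")
counts <- matrix(c(5, 19, 59, 38, 17, 40, 22,
                   4, 10, 49, 36, 17, 41, 23,
                   1,  4, 28, 15, 13, 27, 20,
                   0,  3, 12,  9,  3,  6,  4,
                   2,  7, 23, 16,  8, 22, 16,
                   0,  5, 35, 40, 15, 38, 17,
                   1,  4, 42, 33, 17, 53, 25),
                 nrow = 7, byrow = TRUE,
                 dimnames = list(PID = pid, educ = educ))

# One count per line: columns PID, educ and Freq.
partyed <- as.data.frame(as.table(counts))

# Evenly spaced scores u_i = 1..7 and v_j = 1..7 (assumption: no better scale is known).
partyed$u <- as.integer(partyed$PID)
partyed$v <- as.integer(partyed$educ)

# Independence model (gamma = 0) and linear-by-linear association model.
indmod <- glm(Freq ~ PID + educ, family = poisson, data = partyed)
linmod <- glm(Freq ~ PID + educ + I(u * v), family = poisson, data = partyed)

# Likelihood-ratio comparison of the two nested models (1 degree of freedom).
anova(indmod, linmod, test = "Chisq")

# Estimate of gamma.
coef(linmod)["I(u * v)"]
```

Refitting with a different, equally defensible set of scores (for example, wider gaps between the degree categories) is a quick way to check that the conclusion about $\gamma$ does not hinge on the spacing.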


