## Nonparametric Correlation Coefficients

Suppose that the data set consists of $n$ pairs of observations $\left\{\left(x_{i}, y_{i}\right)\right\}$, expressing a possible relationship between two continuous variables. We characterize the strength of such a relationship by calculating the coefficient of correlation:
$$r=\frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\left[\sum(x-\bar{x})^{2}\right]\left[\sum(y-\bar{y})^{2}\right]}}$$
called Pearson’s correlation coefficient. Like other common statistics, such as the mean $\bar{x}$ and the standard deviation $s$, the correlation coefficient $r$ is very sensitive to extreme observations. We may be interested in calculating a measure of association that is more robust with respect to outlying values. There are not one but two nonparametric procedures: Spearman’s rho and Kendall’s tau rank correlations.
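The sensitivity of $r$ to extreme observations can be illustrated with a minimal sketch. The data below are hypothetical (they are not taken from the text), and `pearson_r` is an illustrative helper that applies the formula above directly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y decreases steadily as x increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [16, 14, 13, 11, 9, 8, 6, 4]
r_clean = pearson_r(x, y)

# Same data, but the last y value is replaced by one extreme observation.
y_outlier = y[:-1] + [40]
r_outlier = pearson_r(x, y_outlier)

print(f"r without the extreme value: {r_clean:.3f}")
print(f"r with one extreme value:    {r_outlier:.3f}")
```

In this made-up sample a single extreme value is enough to pull $r$ far away from $-1$, which is the behavior that motivates the rank-based measures.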

Spearman’s Rho

Spearman’s rank correlation is a direct nonparametric counterpart of Pearson’s correlation coefficient. To perform this procedure, we first arrange the $x$ values from smallest to largest and assign a rank from 1 to $n$ for each value; let $R_{i}$ be the rank of value $x_{i}$. Similarly, we arrange the $y$ values from smallest to largest and assign a rank from 1 to $n$ for each value; let $S_{i}$ be the rank of value $y_{i}$. If there are tied observations, we assign an average rank, averaging the ranks that the tied observations take jointly. For example, if the second and third measurements are equal, they are both assigned 2.5 as their common rank. The next step is to replace, in the formula for Pearson’s correlation coefficient $r$, each $x_{i}$ by its rank $R_{i}$ and each $y_{i}$ by its rank $S_{i}$. The result is Spearman’s rho, a popular rank correlation:
$$\begin{aligned} \rho &=\frac{\sum\left(R_{i}-\bar{R}\right)\left(S_{i}-\bar{S}\right)}{\sqrt{\left[\sum\left(R_{i}-\bar{R}\right)^{2}\right]\left[\sum\left(S_{i}-\bar{S}\right)^{2}\right]}} \\ &=1-\frac{6 \sum\left(R_{i}-S_{i}\right)^{2}}{n\left(n^{2}-1\right)} \end{aligned}$$
The second expression is simpler and easier to use; the two expressions agree exactly when there are no tied ranks.
Example 2.10. Consider again the birth-weight problem of Example 2.8. We have the data given in Table 2.16. Substituting the value of $\sum\left(R_{i}-S_{i}\right)^{2}$ into the formula for rho $(\rho)$, we obtain
$$\begin{aligned} \rho &=1-\frac{(6)(560.5)}{(12)(143)} \\ &=-0.96 \end{aligned}$$
which is very close to the value of $r$ ($-0.946$) obtained in Example 2.8. The two coefficients are typically this close when there are few or no extreme observations.
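The ranking procedure and both expressions for $\rho$ can be sketched in a few lines. This is a minimal sketch, not a reference implementation: the function names and the small sample are hypothetical, and only the last lines use numbers from Example 2.10 ($\sum(R_i - S_i)^2 = 560.5$, $n = 12$).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_from_ranks(x, y):
    """First expression: Pearson's formula applied to the ranks R and S."""
    R, S = average_ranks(x), average_ranks(y)
    r_bar, s_bar = sum(R) / len(R), sum(S) / len(S)
    num = sum((r - r_bar) * (s - s_bar) for r, s in zip(R, S))
    den = sqrt(sum((r - r_bar) ** 2 for r in R) * sum((s - s_bar) ** 2 for s in S))
    return num / den


def spearman_shortcut(x, y):
    """Second expression: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    R, S = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical sample without ties: both expressions give the same value.
x = [10, 20, 30, 40, 50]
y = [5, 3, 4, 1, 2]
print(spearman_from_ranks(x, y), spearman_shortcut(x, y))

# Tied second and third measurements both receive rank 2.5.
print(average_ranks([1, 2, 2, 4]))

# Arithmetic of Example 2.10, using the sum of squared rank differences from the text.
rho_example = 1 - 6 * 560.5 / (12 * (12 ** 2 - 1))
print(round(rho_example, 2))
```

Keeping the two expressions as separate functions makes it easy to check that they coincide on tie-free data and to see how far they drift apart once ties appear.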

## NOTES ON COMPUTATIONS

In Section 1.4 we covered basic techniques for Microsoft’s Excel: how to open/form a spreadsheet, save, and retrieve it. Topics included data-entry steps such as select and drag, use of formula bar, and bar and pie charts. In this short section we focus on continuous data, covering topics such as the construction of histograms, basic descriptive statistics, and correlation analysis.

Histograms

With a frequency table ready, click the ChartWizard icon (the one with multiple colored bars on the standard toolbar near the top). A box appears with choices (as when you learned to form a bar chart or pie chart); select the column chart type. Then click on next.

• For the data range, highlight the frequency column. This can be done by clicking on the first observation and dragging the mouse to the last observation. Then click on next.
• To remove the gridlines, click on the gridline tab and uncheck the box. To remove the legend, you can do the same using the legend tab. Now click finish.
• The problem is that there are still gaps between the bars. To remove these, double-click on a bar of the graph and a new set of options should appear. Click on the options tab and change the gap width from 150 to 0.
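For readers working outside Excel, the same idea (a frequency table drawn as adjacent bars) can be sketched in plain Python. The measurements, the class limits, and the helper name `frequency_table` are all hypothetical choices made for illustration.

```python
def frequency_table(data, start, width, n_bins):
    """Count observations in n_bins equal-width classes [lower, lower + width)."""
    counts = [0] * n_bins
    for value in data:
        k = int((value - start) // width)
        if 0 <= k < n_bins:  # values outside the classes are ignored
            counts[k] += 1
    return [(start + k * width, start + (k + 1) * width, c)
            for k, c in enumerate(counts)]


# Hypothetical measurements grouped into classes of width 10 starting at 60.
measurements = [62, 68, 71, 74, 75, 79, 80, 83, 85, 88, 91, 97]
table = frequency_table(measurements, start=60, width=10, n_bins=4)

# Text histogram: one row per class, one '#' per observation, no gaps between rows.
for lower, upper, count in table:
    print(f"{lower:>3}-{upper:<3} {'#' * count}")
```

The rows play the role of the columns in the Excel chart; printing them on consecutive lines corresponds to setting the gap width to 0.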

Descriptive Statistics

• First, click the cell you want to fill, then click the paste function icon, $f^{*}$, which opens a box listing the Excel functions available for your use.
• The item you need in this list is Statistical; selecting it brings up a new list of function names, each for a statistical procedure.
• The following are the procedures/names we learn in this chapter (alphabetically): AVERAGE provides the sample mean, GEOMEAN the geometric mean, MEDIAN the sample median, STDEV the sample standard deviation, and VAR the sample variance. In each case you can obtain only one statistic at a time. First, you have to enter the range containing your sample, for example D6:D20 (you can see what you are entering on the formula bar). Excel then returns the numerical value of the requested statistic in the preselected cell.
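As a cross-check outside Excel, the same five statistics can be computed with Python's standard `statistics` module. This is a sketch under assumptions: the sample values are hypothetical, and the pairing of each Excel name with a Python function is given for orientation only.

```python
import statistics

# Hypothetical sample, standing in for a range such as D6:D20.
sample = [2.0, 4.0, 4.0, 8.0, 16.0]

average = statistics.mean(sample)            # AVERAGE: sample mean
geomean = statistics.geometric_mean(sample)  # GEOMEAN: geometric mean
median = statistics.median(sample)           # MEDIAN: sample median
stdev = statistics.stdev(sample)             # STDEV: sample standard deviation (n - 1)
var = statistics.variance(sample)            # VAR: sample variance (n - 1)

for name, value in [("AVERAGE", average), ("GEOMEAN", geomean),
                    ("MEDIAN", median), ("STDEV", stdev), ("VAR", var)]:
    print(f"{name:<8} {value:.4f}")
```

Unlike the one-statistic-at-a-time workflow in the spreadsheet, all five values are obtained from the same list in one pass.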

## DESCRIPTIVE METHODS FOR CONTINUOUS

In Exercise 1.46, we investigated the effects of the three binary preoperative variables (x-ray, grade, and stage); in this exercise, we focus on the effects of the two continuous factors (age and acid phosphatase). The 53 patients are divided into two groups by the finding at surgery, a group with nodal involvement and a group without (denoted by 1 or 0 in the sixth column). For each group and for each of the two factors age at diagnosis and level of serum acid phosphatase, calculate the mean $\bar{x}$, variance $s^{2}$, and standard deviation $s$.

Refer to the data on cancer of the prostate in Exercise 2.32. Investigate the relationship between age at diagnosis and level of serum acid phosphatase by calculating Pearson’s correlation coefficient and draw your conclusion. Repeat this analysis, but analyze the data separately for the two groups, the group with nodal involvement and the group without. Does the nodal involvement seem to have any effect on the strength of this relationship?

A study was undertaken to examine the data for 44 physicians working for an emergency department at a major hospital so as to determine which of a number of factors are related to the number of complaints received during the preceding year. In addition to the number of complaints, the data available consist of the number of visits (which serves as the size for the observation unit, the physician) and four other factors under investigation. Table E2.34 presents the complete data set. For each of the 44 physicians there are two continuous explanatory factors, revenue (dollars per hour) and workload at the emergency service (hours), and two binary variables, gender (female/male) and residency training in emergency services (no/yes). Divide the number of complaints by the number of visits and use this ratio (number of complaints per visit) as the primary outcome or endpoint $X$.
(a) For each of the two binary factors, gender (female/male) and residency training in emergency services (no/yes), each of which divides the 44 physicians into two subgroups (say, men and women), calculate the mean $\bar{x}$ and standard deviation $s$ for the endpoint $X$.
(b) Investigate the relationship between the outcome, number of complaints per visit, and each of two continuous explanatory factors, revenue (dollars per hour) and workload at the emergency service (hours), by calculating Pearson’s correlation coefficient, and draw your conclusion.
(c) Draw a scatter diagram to show the association, if any, between the number of complaints per visit and the workload at the emergency service. Does it appear to be linear?
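The calculations requested in parts (a) and (b) can be organized as in the following sketch. The six records are invented for illustration and are not the data of Table E2.34; the field names and the helpers `summarize_by` and `pearson_r` are likewise assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical records (NOT Table E2.34): one dictionary per physician.
physicians = [
    {"visits": 2000, "complaints": 2, "residency": "yes", "revenue": 260.0, "workload": 1400.0},
    {"visits": 3000, "complaints": 3, "residency": "no", "revenue": 250.0, "workload": 1600.0},
    {"visits": 2500, "complaints": 5, "residency": "no", "revenue": 240.0, "workload": 1700.0},
    {"visits": 1000, "complaints": 1, "residency": "yes", "revenue": 280.0, "workload": 1200.0},
    {"visits": 4000, "complaints": 12, "residency": "no", "revenue": 230.0, "workload": 1900.0},
    {"visits": 1500, "complaints": 0, "residency": "yes", "revenue": 270.0, "workload": 1300.0},
]

# Endpoint X: number of complaints per visit.
for p in physicians:
    p["rate"] = p["complaints"] / p["visits"]


def summarize_by(records, factor):
    """Mean and sample standard deviation of X within each level of a binary factor."""
    groups = {}
    for p in records:
        groups.setdefault(p[factor], []).append(p["rate"])
    return {level: (mean(v), stdev(v)) for level, v in groups.items()}


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


by_residency = summarize_by(physicians, "residency")
rates = [p["rate"] for p in physicians]
r_revenue = pearson_r(rates, [p["revenue"] for p in physicians])
r_workload = pearson_r(rates, [p["workload"] for p in physicians])

print(by_residency)
print(f"r(rate, revenue)  = {r_revenue:.3f}")
print(f"r(rate, workload) = {r_workload:.3f}")
```

The same `summarize_by` call with a different factor name would handle gender, and splitting the records by a grouping variable before calling `pearson_r` gives the group-specific correlations asked for in the prostate cancer exercise.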
