### Brief Notes on the Fundamentals

## Mean and Variance

As seen in Sections 3.3 and 3.4, a probability density function $f$ is defined so that:

(a) $f(k)=\operatorname{Pr}(X=k)$ in the discrete case
(b) $f(x)\,dx=\operatorname{Pr}(x \leq X \leq x+dx)$ in the continuous case

For a continuous distribution, such as the normal distribution, the mean $\mu$ and variance $\sigma^{2}$ are calculated from:

(a) $\mu=\int x f(x)\,dx$
(b) $\sigma^{2}=\int(x-\mu)^{2} f(x)\,dx$

For a discrete distribution, such as the binomial distribution or Poisson distribution, the mean $\mu$ and variance $\sigma^{2}$ are calculated from:

(a) $\mu=\sum x f(x)$
(b) $\sigma^{2}=\sum(x-\mu)^{2} f(x)$
For example, we have for the binomial distribution,
$$\begin{aligned} \mu &=n p \\ \sigma^{2} &=n p(1-p) \end{aligned}$$
and for the Poisson distribution,
$$\begin{aligned} \mu &=\theta \\ \sigma^{2} &=\theta \end{aligned}$$
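
The discrete formulas can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values ($n=10$, $p=0.3$, $\theta=2.5$) that are not taken from the text. The helper name `mean_and_variance` is hypothetical, and the Poisson sum is truncated at 100 terms, which is an approximation.

```python
from math import comb, exp, factorial


def mean_and_variance(pmf, support):
    """Apply mu = sum x f(x) and sigma^2 = sum (x - mu)^2 f(x) over the support."""
    mu = sum(x * pmf(x) for x in support)
    var = sum((x - mu) ** 2 * pmf(x) for x in support)
    return mu, var


# Binomial B(n, p); n and p are illustrative values only.
n, p = 10, 0.3


def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


b_mu, b_var = mean_and_variance(binom_pmf, range(n + 1))

# Poisson with parameter theta; theta is an illustrative value only.
theta = 2.5


def poisson_pmf(k):
    return exp(-theta) * theta**k / factorial(k)


# The infinite sum is truncated at k = 99; the omitted tail is negligible here.
p_mu, p_var = mean_and_variance(poisson_pmf, range(100))

print(b_mu, n * p)             # both close to 3.0
print(b_var, n * p * (1 - p))  # both close to 2.1
print(p_mu, p_var, theta)      # all close to 2.5
```

The printed pairs agree up to floating-point rounding, which matches $\mu=np$, $\sigma^2=np(1-p)$ for the binomial and $\mu=\sigma^2=\theta$ for the Poisson.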

## Pair-Matched Case–Control Study

Data from epidemiologic studies may come from various sources, the two fundamental designs being retrospective and prospective (or cohort). Retrospective studies gather past data from selected cases (diseased individuals) and controls (nondiseased individuals) to determine differences, if any, in the exposure to a suspected risk factor. They are commonly referred to as case-control studies. Cases of a specific disease, such as lung cancer, are ascertained as they arise from population-based disease registers or lists of hospital admissions, and controls are sampled either as disease-free persons from the population at risk or as hospitalized patients having a diagnosis other than the one under investigation. The advantages of a case-control study are that it is economical and that it is possible to answer research questions relatively quickly because the cases are already available.

Suppose that each person in a large population has been classified as exposed or not exposed to a certain factor, and as having or not having some disease. The population may then be enumerated in a $2 \times 2$ table (Table 3.12), with entries being the proportions of the total population. Here $P_{1}$ and $P_{3}$ denote the proportions of the population that are exposed with and without the disease, and $P_{2}$ and $P_{4}$ denote the corresponding proportions among the unexposed.
Using these proportions, the association (if any) between the factor and the disease could be measured by the ratio of risks (or relative risk) of being disease positive for those with or without the factor:

$$\begin{aligned} \text { relative risk } &=\frac{P_{1}}{P_{1}+P_{3}} \div \frac{P_{2}}{P_{2}+P_{4}} \\ &=\frac{P_{1}\left(P_{2}+P_{4}\right)}{P_{2}\left(P_{1}+P_{3}\right)} \end{aligned}$$
In many (although not all) situations, the proportion of subjects classified as disease positive is small. That is, $P_{1}$ is small in comparison with $P_{3}$, and $P_{2}$ is small in comparison with $P_{4}$, so that $P_{1}+P_{3} \approx P_{3}$ and $P_{2}+P_{4} \approx P_{4}$. In such a case, the relative risk is almost equal to
$$\begin{aligned} \theta &=\frac{P_{1} P_{4}}{P_{2} P_{3}} \\ &=\frac{P_{1} / P_{3}}{P_{2} / P_{4}} \end{aligned}$$
the odds ratio of being disease positive, or equivalently
$$\theta=\frac{P_{1} / P_{2}}{P_{3} / P_{4}}$$
the odds ratio of being exposed. This justifies the use of an odds ratio to determine differences, if any, in the exposure to a suspected risk factor.
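
To see how close the two measures are when the disease is rare, the following sketch evaluates both formulas. The four proportions are hypothetical values chosen for illustration, not data from Table 3.12, and the variable names are assumptions of this example.

```python
# Hypothetical population proportions (they sum to 1); not taken from the text.
P1 = 0.002  # exposed, disease positive
P2 = 0.001  # unexposed, disease positive
P3 = 0.298  # exposed, disease negative
P4 = 0.699  # unexposed, disease negative

# Relative risk: risk among the exposed divided by risk among the unexposed.
relative_risk = (P1 / (P1 + P3)) / (P2 / (P2 + P4))

# Odds ratio, written as the odds ratio of being disease positive ...
odds_ratio_disease = (P1 / P3) / (P2 / P4)
# ... and as the odds ratio of being exposed; the two forms are algebraically equal.
odds_ratio_exposure = (P1 / P2) / (P3 / P4)

print(round(relative_risk, 4))
print(round(odds_ratio_disease, 4))
print(round(odds_ratio_exposure, 4))
```

With these rare-disease proportions the relative risk is about 4.67 and the odds ratio about 4.69. The gap widens as $P_{1}$ and $P_{2}$ grow relative to $P_{3}$ and $P_{4}$.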

As a technique to control confounding factors in a designed study, individual cases are matched, often one to one, to a set of controls chosen to have similar values for the important confounding variables. The simplest example of pair-matched data occurs with a single binary exposure (e.g., smoking versus nonsmoking). The data for outcomes can be represented by a $2 \times 2$ table (Table 3.13) where $(+,-)$ denotes (exposed, unexposed).

For example, $n_{10}$ denotes the number of pairs where the case is exposed but the matched control is unexposed, and $n_{01}$ denotes the number of pairs where the case is unexposed but the matched control is exposed. The most suitable statistical model for making inferences about the odds ratio $\theta$ uses the conditional distribution of the number of exposed cases among the discordant pairs. Given that $n=n_{10}+n_{01}$ is fixed, it can be seen that $n_{10}$ has the binomial distribution $B(n, p)$, where
$$p=\frac{\theta}{1+\theta}$$
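
Inverting this relation gives $\theta=p/(1-p)$, so the observed proportion $n_{10}/n$ yields the estimate $n_{10}/n_{01}$. The sketch below works this through. The discordant-pair counts are hypothetical, and the tail probability under $\theta=1$ (that is, $p=0.5$) is included only to illustrate the binomial model, not as the text's prescribed test.

```python
from math import comb

# Hypothetical discordant-pair counts; not data from Table 3.13.
n10 = 26  # case exposed, matched control unexposed
n01 = 7   # case unexposed, matched control exposed
n = n10 + n01

# Under the conditional model, n10 ~ B(n, p) with p = theta / (1 + theta).
p_hat = n10 / n
theta_hat = p_hat / (1 - p_hat)  # algebraically equal to n10 / n01


def binom_pmf(k, size, prob):
    """Binomial probability Pr(X = k) for X ~ B(size, prob)."""
    return comb(size, k) * prob**k * (1 - prob) ** (size - k)


# If theta = 1 (no association), then p = 0.5; this is Pr(X >= n10) in that case.
upper_tail = sum(binom_pmf(k, n, 0.5) for k in range(n10, n + 1))

print(theta_hat)
print(upper_tail)
```

A small upper-tail probability indicates that so many exposed cases among the discordant pairs would be unusual if $\theta$ were 1.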

## Notes on Computations

In Sections 1.4 and 2.5 we covered basic techniques for Microsoft Excel: how to open or form a spreadsheet, save it, retrieve it, and perform certain descriptive statistical tasks. Topics included data-entry steps such as select and drag, use of the formula bar, bar and pie charts, histograms, calculation of descriptive statistics such as the mean and standard deviation, and calculation of a coefficient of correlation. In this short section we focus on probability models related to the calculation of areas under density curves, especially normal curves and $t$ curves.
Normal Curves. The first two steps are the same as in obtaining descriptive statistics (but no data are needed now): (1) click the paste function icon, $\mathrm{f}^{*}$, and (2) click Statistical. Among the functions available, two are related to normal curves: NORMDIST and NORMINV. Excel provides the needed information for any normal distribution, not just the standard normal distribution as in Appendix B. Upon selecting either of these two functions, a box appears asking you to provide (1) the mean $\mu$ and (2) the standard deviation $\sigma$; for NORMDIST, there is also (3) a last row, marked cumulative, in which you enter TRUE (there is a choice FALSE, but you do not need that). The answer will appear in a preselected cell.

• NORMDIST gives the area under the normal curve (with the mean and standard deviation provided) all the way from the far-left side (minus infinity) to the value $x$ that you have to specify. For example, if you specify $\mu=0$ and $\sigma=1$, the return is the area under the standard normal curve up to the point specified (which, for a positive point, is the same as the number from Appendix B plus 0.5).
• NORMINV performs the inverse process: you provide the area under the normal curve (a number between 0 and 1), together with the mean $\mu$ and standard deviation $\sigma$, and request the point $x$ on the horizontal axis such that the area under that normal curve from the far-left side (minus infinity) to $x$ is equal to the number provided. For example, if you put in $\mu=0$, $\sigma=1$, and probability $=0.975$, the return is 1.96; unlike Appendix B, if you want a point in the right tail of the curve, the input probability should be a number greater than 0.5.
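
The same two calculations can be reproduced outside Excel. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, offered as an analogue of NORMDIST (with cumulative set to TRUE) and NORMINV rather than as part of the Excel procedure. The second distribution, with mean 100 and standard deviation 15, is a hypothetical example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

# Analogue of NORMDIST(1.96, 0, 1, TRUE): area from minus infinity up to x.
area_up_to_x = standard.cdf(1.96)

# Analogue of NORMINV(0.975, 0, 1): the point x with area 0.975 to its left.
point = standard.inv_cdf(0.975)

# Any normal curve works, not only the standard one (hypothetical mean and SD).
other = NormalDist(mu=100, sigma=15)
area_other = other.cdf(115)  # one standard deviation above the mean

print(round(area_up_to_x, 4))
print(round(point, 2))
print(round(area_other, 4))
```

The inverse call returns approximately 1.96, in agreement with the NORMINV example above.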
The $t$ Curves: Procedures TDIST and TINV. We want to learn how to find the areas under the normal curves so that we can determine the $p$ values for statistical tests (a topic starting in Chapter 5). Another popular family in this category is the $t$ distributions, for which the procedure begins with the same first two steps: (1) click the paste function icon, $\mathrm{f}^{*}$, and (2) click Statistical. Among the functions available, two related to the $t$ distributions are TDIST and TINV. Similar to the case of NORMDIST and NORMINV, TDIST gives the area under the $t$ curve, and TINV performs the inverse process, where you provide the area under the curve and request the point $x$ on the horizontal axis. In each case you have to provide the degrees of freedom. In addition, for TDIST, in the last row, marked Tails, enter:
• (Tails=) 1 if you want a one-sided area
• (Tails=) 2 if you want a two-sided area
(More details on the concepts of one- and two-sided areas are given in Chapter 5.) For example:
• Example 1: If you enter (x=) 2.73, (deg freedom=) 18, and (Tails=) 1, you are requesting the area under a $t$ curve with 18 degrees of freedom to the right of 2.73 (i.e., the right tail); the answer is 0.00687.
• Example 2: If you enter (x=) 2.73, (deg freedom=) 18, and (Tails=) 2, you are requesting the area under a $t$ curve with 18 degrees of freedom to the right of 2.73 and to the left of $-2.73$ (i.e., both right and left tails); the answer is 0.01374, which is twice the previous answer of 0.00687.
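
The two examples can be checked without Excel by integrating the $t$ density numerically. The sketch below is one way to do this using only the Python standard library. The function names `t_pdf` and `t_right_tail` are hypothetical, and composite Simpson's rule is a choice made for this example, not the method Excel uses.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def t_right_tail(x, df, steps=2000):
    """Area to the right of x >= 0: 0.5 minus the integral of the density on [0, x].

    The integral uses composite Simpson's rule; steps must be even.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


one_tail = t_right_tail(2.73, 18)  # compare with Example 1 (Tails = 1)
two_tail = 2 * one_tail            # compare with Example 2 (Tails = 2)

print(round(one_tail, 5))
print(round(two_tail, 5))
```

Because the $t$ curve is symmetric about zero, the two-sided area is simply twice the one-sided area, as Example 2 notes.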


