Multivariate Statistical Analysis: Fitting Models with R

After the data have been inspected comes the next step of analysis, interpretation, inference, drawing conclusions, shaking the data down, or whatever phrase is in current vogue. This rests upon fitting models, fully parametric, semi-parametric, or non-parametric. The first example of fitting a fully parametric model occurs here in Section 3.2, that of a non-parametric model in Section 4.1, and that of a semi-parametric one in Section 4.2. We will make use of the gold-standard R package survival, which includes routines to perform a variety of tasks as well as some data sets to illustrate them. A high priority for the budding survival analyst is to learn how to use survival and other packages listed on the CRAN Web site.

In addition to the software available in survival, some homegrown programs are used. These are either to make the computations more transparent or to fill minor gaps in the available software. The first such instance occurs in Section 3.2. The R-code used in this book, together with the data sets not subject to copyright restriction, is available on the Web site referred to in the Preface.
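As a rough orientation, the three kinds of model mentioned above correspond to three standard entry points in the survival package. The sketch below is only illustrative: the choice of the `lung` data set (shipped with survival) and of the covariates `age` and `sex` is an assumption made here, not an example taken from the sections cited.

```r
library(survival)  # recommended package, normally installed with R

# Fully parametric: exponential regression model for the lifetimes
fit_par <- survreg(Surv(time, status) ~ age + sex, data = lung,
                   dist = "exponential")

# Non-parametric: Kaplan-Meier estimate of the survivor function
fit_km <- survfit(Surv(time, status) ~ 1, data = lung)

# Semi-parametric: Cox proportional hazards regression
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

print(coef(fit_par))  # intercept plus one coefficient per covariate
print(coef(fit_cox))  # no intercept: the baseline hazard is left unspecified
```

In each call, `Surv(time, status)` packages the observed times together with the censoring indicator, which is what distinguishes these routines from ordinary regression functions.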

Simulating Data with R

Simulation is a powerful tool in statistics. Many modern techniques using simulation have been developed on the back of fast computing. In this section an introductory exercise in simulating data will be described. When assessing proposed methodology, it is often useful to be able to test it on data whose structure is known and controlled. Simulation is particularly powerful in situations where the framework is straightforward to set up but the consequences of interest are analytically intractable.

Let us generate some right-censored survival times whose mean depends on some recorded factors. Sophisticated usage of R is not the point here, just a demonstration of some basic commands. Suppose that $T_{i}$, the breakdown time of the $i$th machine, has an exponential distribution whose mean is a given function of $x_{1}$, a measure of the intensity of usage, and $x_{2}$, a measure of the frequency and quality of maintenance. (No prizes for guessing that I have my car in mind here; see the observation with the smallest $x_{2}$.) Some basic R code to achieve this is as follows:

    n=25; b0=1.5; b1=1.2; b2=-2.5;  # n = sample size, (b0,b1,b2) = regression coeffs
    x1=rep(0,n); x2=x1; tim=x1;  # initialise x1, x2, tim as vectors of 0s of length n
    for (i in 1:n) { x1[i]=runif(1,min=0,max=1); x2[i]=runif(1);
      lam=exp(b0+b1*x1[i]+b2*x2[i]); ti=rexp(1,rate=lam); tim[i]=min(ti,10); }
    for (i in 1:n) { v1=c(x1[i],x2[i],tim[i]); v2=format(v1,width=9,digits=2);
      cat("\n",v2); }

Note the exciting variety of brackets: (…) to enclose the arguments of a function, […] for indices of a vector or matrix, and {…} to group sets of instructions in a for loop. The for loop runs through the sample elements one by one. The functions runif and rexp generate samples from uniform and exponential distributions, respectively: look them up, using help(runif) and help(rexp), for their full capabilities. For example, x1=runif(n) outside the for loop would have had the same effect. The function c(…) concatenates (look that word up too), for example, a=c(b,c) puts b and c together into a, and cat prints stuff out. The model here for the mean breakdown time, $\lambda_{i}^{-1}$, is of log-linear form: $\log \lambda_{i}=b_{0}+b_{1} x_{i 1}+b_{2} x_{i 2}$, and the signs of the regression coefficients, $b_{1}$ and $b_{2}$, are meant to reflect the expected effects of $x_{1}$ and $x_{2}$. The times are right-censored at value 10. The data, printed out, can be copied and pasted into a data file.
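Since runif and rexp accept a sample size and vector arguments, the same simulation can be written without a loop. The following is a minimal vectorised sketch, not the code used in the text: the seed, the data frame `sim`, and the censoring indicator `status` are additions assumed here for illustration.

```r
set.seed(1)  # assumption: fixed seed so that the run is reproducible

n <- 25; b0 <- 1.5; b1 <- 1.2; b2 <- -2.5  # sample size and regression coefficients

x1 <- runif(n)                         # intensity of usage
x2 <- runif(n)                         # frequency and quality of maintenance
lam <- exp(b0 + b1 * x1 + b2 * x2)     # log-linear model for the rate lambda_i
ti <- rexp(n, rate = lam)              # uncensored breakdown times
tim <- pmin(ti, 10)                    # right-censor at 10
status <- as.numeric(ti <= 10)         # hypothetical indicator: 1 = observed, 0 = censored

sim <- data.frame(x1, x2, tim, status)
print(format(sim, digits = 2))
```

Here `pmin` takes the element-wise minimum, replacing the call to `min` inside the loop, and recording `status` keeps track of which times were censored, which the printed triples alone do not.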

Continuous Lifetimes

Let $T$ be the random variable representing the lifetime under study. The distribution function $F$ and the survivor function $\bar{F}$ of $T$ are defined by the probabilities
$$F(t)=\mathrm{P}(T \leq t), \quad \bar{F}(t)=\mathrm{P}(T>t),$$
so $F(t)+\bar{F}(t)=1$ for all $t$. Note that $F(t)$ is an increasing, and $\bar{F}(t)$ a decreasing, function of $t$; normally, $F(t)$ will rise from 0 to 1, and $\bar{F}(t)$ will fall from 1 to 0, over the range of $t$. When $T$ is essentially positive, as it is in most applications, $F(0)=0$ and $\bar{F}(0)=1$. The density function is defined as
$$f(t)=d F(t) / d t=-d \bar{F}(t) / d t,$$
so that
$$F(t)=\int_{0}^{t} f(s)\, d s, \quad \bar{F}(t)=\int_{t}^{\infty} f(s)\, d s .$$
(Unless otherwise stated, it will be tacitly assumed that continuous survival distributions have densities, that is, that they are absolutely continuous.)
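These relationships are easy to check numerically. The sketch below assumes, purely for illustration, an exponential lifetime with rate 0.5 evaluated at $t=3$; neither value comes from the text.

```r
rate <- 0.5  # hypothetical rate of an exponential lifetime
t0 <- 3      # hypothetical time point

F_t <- pexp(t0, rate)                           # F(t) = P(T <= t)
Fbar_t <- pexp(t0, rate, lower.tail = FALSE)    # Fbar(t) = P(T > t)

# Recover both functions by integrating the density f
F_int <- integrate(dexp, 0, t0, rate = rate)$value
Fbar_int <- integrate(dexp, t0, Inf, rate = rate)$value

print(c(F_t, F_int))        # the two values should agree
print(c(Fbar_t, Fbar_int))  # likewise
print(F_t + Fbar_t)         # should equal 1
```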

Modern survival analysis is mostly based around hazard functions. (This has nothing to do with the over-zealous health-and-safety culture that blights our lives nowadays.) These functions are concerned with the probability of imminent failure, that is, that, having got this far, you will get no further. The formal definition of the hazard function $h$ of $T$ is
$$h(t)=\lim_{\delta \downarrow 0} \delta^{-1}\, \mathrm{P}(T \leq t+\delta \mid T>t) .$$

The right-hand side is equal to
$$\begin{aligned}
\lim_{\delta \downarrow 0} \delta^{-1}\, \mathrm{P}(t<T \leq t+\delta) / \mathrm{P}(T>t) &=\lim_{\delta \downarrow 0} \delta^{-1}\{\bar{F}(t)-\bar{F}(t+\delta)\} / \bar{F}(t) \\
&=-\{d \bar{F}(t) / d t\} / \bar{F}(t)=-d \log \bar{F}(t) / d t .
\end{aligned}$$
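The limit can be watched converging numerically. As a small sketch, assume a Weibull lifetime with shape 2 and scale 2 (values chosen here only for illustration), for which the hazard has the closed form $h(t)=(\text{shape}/\text{scale})(t/\text{scale})^{\text{shape}-1}$.

```r
shape <- 2; scale <- 2  # hypothetical Weibull parameters
t0 <- 1                 # hypothetical time point

# Survivor function Fbar(t) = P(T > t)
Fbar <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)

# delta^{-1} P(T <= t + delta | T > t) for shrinking delta
delta <- 10^-(1:6)
approx_h <- (Fbar(t0) - Fbar(t0 + delta)) / (delta * Fbar(t0))

# Closed-form Weibull hazard for comparison
exact_h <- (shape / scale) * (t0 / scale)^(shape - 1)

print(cbind(delta, approx_h))
print(exact_h)
```

As `delta` shrinks, the conditional failure probability per unit time settles down to the closed-form hazard.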
In different contexts $h(t)$ is variously known as the instantaneous failure rate, age-specific failure rate, age-specific death rate, intensity function, and force of mortality or decrement. Integration yields the inverse relationship
$$\bar{F}(t)=\exp \left\{-\int_{0}^{t} h(s)\, d s\right\}=\exp \{-H(t)\},$$
where $H(t)$ is the integrated hazard function and the lower limit 0 of the integral is consistent with $\bar{F}(0)=1$. For a proper lifetime distribution, that is, one for which $\bar{F}(\infty)=0$, $H(t)$ must tend to $\infty$ as $t \rightarrow \infty$.
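The inverse relationship can also be verified by integrating a hazard function numerically and comparing the result with the survivor function. The sketch again assumes a Weibull lifetime with shape 2 and scale 2, parameters chosen here only for illustration.

```r
shape <- 2; scale <- 2  # hypothetical Weibull parameters

# Closed-form Weibull hazard h(t)
h <- function(t) (shape / scale) * (t / scale)^(shape - 1)

# Integrated hazard H(t), computed by numerical integration from 0 to t
H <- function(t) integrate(h, 0, t)$value

t0 <- 3
surv_from_hazard <- exp(-H(t0))                               # exp{-H(t)}
surv_direct <- pweibull(t0, shape, scale, lower.tail = FALSE) # Fbar(t)

print(c(surv_from_hazard, surv_direct))  # the two values should agree
print(sapply(c(1, 5, 10, 20), H))        # H(t) keeps growing with t
```

For this distribution $H(t)=(t/\text{scale})^{\text{shape}}$, which grows without bound, in line with the requirement for a proper lifetime distribution.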

Both the distribution function $F$ and the hazard function $h$ are concerned with the probability that failure occurs before some given time. The difference is this: with the former, you are stuck at time zero looking ahead to a time maybe a long way into the future (with a telescope); with the latter, you are moving along with time and just looking ahead to the next instant (with a microscope).
