Fitting Models with R

After the data have been inspected comes the next step of analysis, interpretation, inference, drawing conclusions, shaking the data down, or whatever phrase is in current vogue. This rests upon fitting models, fully parametric, semi-parametric, or non-parametric. The first example of fitting a fully parametric model occurs here in Section 3.2, that of a non-parametric model in Section 4.1, and that of a semi-parametric one in Section 4.2. We will make use of the gold-standard R-package survival, which includes routines to perform a variety of tasks as well as some data sets to illustrate them. A high priority for the budding survival analyst is to learn how to use survival and other packages listed on the CRAN Web site.

In addition to the software available in survival, some homegrown programs are used. These are either to make the computations more transparent or to fill minor gaps in the available software. The first such instance occurs in Section 3.2. The R-code used in this book, together with the data sets not subject to copyright restriction, is available on the Web site referred to in the Preface.

Simulating Data with R

Simulation is a powerful tool in statistics. Many modern techniques using simulation have been developed on the back of fast computing. In this section an introductory exercise in simulating data will be described. When assessing proposed methodology, it is often useful to test it on data whose structure is known and controlled. Simulation is particularly powerful in situations where the framework is straightforward to set up but the consequences of interest are analytically intractable.

Let us generate some right-censored survival times whose mean depends on some recorded factors. Sophisticated usage of R is not the point here, just a demonstration of some basic commands. Suppose that $T_{i}$, the breakdown time of the $i$th machine, has an exponential distribution with mean a given function of $x_{1}$, a measure of the intensity of usage, and $x_{2}$, a measure of the frequency and quality of maintenance. (No prizes for guessing that I have my car in mind here; see the observation with the smallest $x_{2}$.) Some basic R code to achieve this is as follows:

    n=25; b0=1.5; b1=1.2; b2=-2.5;  # n = sample size, (b0,b1,b2) = regression coeffs
    x1=rep(0,n); x2=x1; tim=x1;     # initialise x1, x2, tim as vectors of 0s of length n
    for (i in 1:n) { x1[i]=runif(1,min=0,max=1); x2[i]=runif(1);
      lam=exp(b0+b1*x1[i]+b2*x2[i]); ti=rexp(1,rate=lam); tim[i]=min(ti,10); }
    for (i in 1:n) { v1=c(x1[i],x2[i],tim[i]); v2=format(v1,width=9,digits=2);
      cat("\n",v2); }

Note the exciting variety of brackets: (...) to enclose the arguments of a function, [...] for indices of a vector or matrix, and {...} to group sets of instructions in a for loop. The for loop runs through the sample elements one by one. The functions runif and rexp generate samples from uniform and exponential distributions, respectively: look them up, using help(runif) and help(rexp), for their full capabilities. For example, x1=runif(n) outside the for loop would have had the same effect as filling x1 element by element. The function c(...) concatenates (look that word up too), for example, a=c(b,c) puts b and c together into a, and cat prints stuff out. The model here for the mean breakdown time, $\lambda_{i}^{-1}$, is of log-linear form: $\log \lambda_{i}=b_{0}+b_{1} x_{i 1}+b_{2} x_{i 2}$, and the signs of the regression coefficients, $b_{1}$ and $b_{2}$, are meant to reflect the expected effects of $x_{1}$ and $x_{2}$: heavier usage raises the failure rate $\lambda_{i}$, and better maintenance lowers it. The times are right-censored at value 10. The data, printed out, can be copied and pasted into a data file.
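
As a minimal sketch of the remark that runif(n) can replace the element-by-element loop, the whole simulation can be vectorised, and the known structure of the data can then be used to check a fitting method. The helper names simulate_breakdowns and neg_loglik, the censoring indicator obs, the seed, and the larger sample size of 2000 are all assumptions made for illustration; they are not part of the original code. The fit is a homegrown maximum-likelihood calculation with optim, not a routine from the survival package.

```r
# Hypothetical helper: vectorised version of the simulation loop.
simulate_breakdowns <- function(n, b0 = 1.5, b1 = 1.2, b2 = -2.5, cens = 10) {
  x1  <- runif(n)                      # intensity of usage
  x2  <- runif(n)                      # frequency and quality of maintenance
  lam <- exp(b0 + b1 * x1 + b2 * x2)   # failure rate, so the mean time is 1/lam
  ti  <- rexp(n, rate = lam)           # uncensored breakdown times
  data.frame(x1 = x1, x2 = x2,
             tim = pmin(ti, cens),            # right-censor at cens
             obs = as.integer(ti <= cens))    # 1 = failure observed, 0 = censored
}

# Mean negative log-likelihood of the censored exponential model:
# an observed failure contributes log(lam) - lam * t, a censored time -lam * t.
# Using the mean instead of the sum only rescales the objective.
neg_loglik <- function(b, dat) {
  eta <- b[1] + b[2] * dat$x1 + b[3] * dat$x2   # log failure rate
  -mean(dat$obs * eta - exp(eta) * dat$tim)
}

set.seed(1)                            # arbitrary seed, for reproducibility
dat <- simulate_breakdowns(2000)       # larger n than in the text (assumption)
fit <- optim(c(0, 0, 0), neg_loglik, dat = dat, method = "BFGS")
print(round(fit$par, 2))               # compare with (1.5, 1.2, -2.5)
```

With data simulated from known coefficients, the estimates in fit$par should land close to (1.5, 1.2, -2.5), which is the kind of check on methodology that simulation makes possible.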

Let $T$ be the random variable representing the lifetime under study. The distribution function $F$ and the survivor function $\bar{F}$ of $T$ are defined by the probabilities
$$F(t)=\mathrm{P}(T \leq t), \quad \bar{F}(t)=\mathrm{P}(T>t),$$
so $F(t)+\bar{F}(t)=1$ for all $t$. Note that $F(t)$ is an increasing, and $\bar{F}(t)$ a decreasing, function of $t$; normally, $F(t)$ will rise from 0 to 1, and $\bar{F}(t)$ will fall from 1 to 0, over the range of $t$. When $T$ is essentially positive, as it is in most applications, $F(0)=0$ and $\bar{F}(0)=1$. The density function is defined as
$$f(t)=d F(t) / d t=-d \bar{F}(t) / d t ;$$
correspondingly,
$$F(t)=\int_{0}^{t} f(s) d s, \quad \bar{F}(t)=\int_{t}^{\infty} f(s) d s .$$
(Unless otherwise stated, it will be tacitly assumed that continuous survival distributions have densities, that is, that they are absolutely continuous.)
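
These relationships can be checked numerically. The sketch below does so for an exponential lifetime, where the rate 0.5 and the evaluation time 3 are arbitrary values assumed for illustration; pexp, dexp, and integrate are standard R functions.

```r
rate <- 0.5    # assumed rate, so the mean lifetime is 1/rate = 2
t0   <- 3      # assumed evaluation time

F_t    <- pexp(t0, rate)                       # F(t)    = P(T <= t)
Fbar_t <- pexp(t0, rate, lower.tail = FALSE)   # Fbar(t) = P(T > t)

# Integrate the density over (0, t) and over (t, Inf)
F_int    <- integrate(dexp, 0, t0, rate = rate)$value
Fbar_int <- integrate(dexp, t0, Inf, rate = rate)$value

print(c(F_t, F_int))         # the two values should agree
print(c(Fbar_t, Fbar_int))   # likewise
print(F_t + Fbar_t)          # should equal 1
```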

Modern survival analysis is mostly based around hazard functions. (This has nothing to do with the over-zealous health-and-safety culture that blights our lives nowadays.) These functions are concerned with the probability of imminent failure, that is, that, having got this far, you will get no further. The formal definition of the hazard function $h$ of $T$ is
$$h(t)=\lim _{\delta \downarrow 0} \delta^{-1} \mathrm{P}(T \leq t+\delta \mid T>t)$$
The right-hand side is equal to
$$\begin{aligned} \lim _{\delta \downarrow 0} \delta^{-1} \mathrm{P}(t<T \leq t+\delta) / \mathrm{P}(T>t) &=\lim _{\delta \downarrow 0} \delta^{-1}\{\bar{F}(t)-\bar{F}(t+\delta)\} / \bar{F}(t) \\ &=-\{d \bar{F}(t) / d t\} / \bar{F}(t)=-d \log \bar{F}(t) / d t . \end{aligned}$$
In different contexts $h(t)$ is variously known as the instantaneous failure rate, age-specific failure rate, age-specific death rate, intensity function, and force of mortality or decrement. Integration yields the inverse relationship
$$\bar{F}(t)=\exp \left\{-\int_{0}^{t} h(s) d s\right\}=\exp \{-H(t)\},$$
where $H(t)$ is the integrated hazard function and the lower limit 0 of the integral is consistent with $\bar{F}(0)=1$. For a proper lifetime distribution, that is, one for which $\bar{F}(\infty)=0$, $H(t)$ must tend to $\infty$ as $t \rightarrow \infty$.
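
As a worked illustration, consider a Weibull lifetime. The parametrization, with scale $\xi>0$ and shape $\nu>0$, is an assumption made here for the example. Its survivor function is
$$\bar{F}(t)=\exp \left\{-(t / \xi)^{\nu}\right\},$$
so the integrated hazard and the hazard are
$$H(t)=-\log \bar{F}(t)=(t / \xi)^{\nu}, \quad h(t)=d H(t) / d t=(\nu / \xi)(t / \xi)^{\nu-1} .$$
Here $H(t) \rightarrow \infty$ as $t \rightarrow \infty$, so the distribution is proper. Taking $\nu=1$ gives the exponential distribution, whose hazard is the constant $1 / \xi$; the hazard increases with $t$ when $\nu>1$ and decreases when $\nu<1$.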

Both the distribution function $F$ and the hazard function $h$ are concerned with the probability that failure occurs before some given time. The difference is this: with the former, you are stuck at time zero looking ahead to a time maybe a long way into the future (with a telescope); with the latter, you are moving along with time and just looking ahead to the next instant (with a microscope).
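
The telescope and microscope views can be compared numerically. The sketch below assumes a Weibull lifetime with arbitrarily chosen shape 2 and scale 1.5, evaluated at time 1; these values and the names Fbar, h_approx, and h_exact are illustrative only. It computes $F(t)$ directly and approximates $h(t)$ from its limit definition with shrinking $\delta$.

```r
shape <- 2; scale <- 1.5   # assumed Weibull parameters
t0    <- 1                 # assumed time point

# Survivor function Fbar(t) = P(T > t)
Fbar <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)

# Telescope: probability, viewed from time zero, of failure by time t0
F_t0 <- 1 - Fbar(t0)

# Microscope: P(T <= t0 + delta | T > t0) / delta for shrinking delta
delta    <- 10^(-(1:5))
h_approx <- (Fbar(t0) - Fbar(t0 + delta)) / (delta * Fbar(t0))

# Hazard computed as f(t) / Fbar(t)
h_exact <- dweibull(t0, shape, scale) / Fbar(t0)

print(F_t0)
print(cbind(delta, h_approx))   # approaches h_exact as delta shrinks
print(h_exact)
```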

