### Multivariate Statistical Analysis: Some Small Data Sets

## Strengths of Cords

Crowder et al. (1991) gave this data set, shown here in Table 1.1, as Example 2.3. The figures are breaking strengths of parachute rigging lines after certain treatment. This is of some interest to those too impatient to wait for the aeroplane to land. There are 48 observations, of which the last 7 are right censored, indicated by adding a $+$ sign.
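As a minimal sketch of how figures marked with a $+$ sign might be turned into something R can work with, the snippet below splits each entry into a numeric value and an indicator. The values are invented for illustration and are not those of Table 1.1; the names `raw`, `strength`, and `observed` are likewise assumptions.

```r
# Hypothetical entries in the style of Table 1.1 (NOT the actual data);
# a trailing "+" marks a right-censored breaking strength
raw = c("26.8", "29.6", "33.4", "41.7+", "41.9+")

cens = grepl("+", raw, fixed = TRUE)                     # TRUE where censored
strength = as.numeric(sub("+", "", raw, fixed = TRUE))   # strip the marker
observed = as.integer(!cens)                             # 1 = observed, 0 = right censored

data.frame(strength, observed)
```

Keeping the value and the indicator in separate columns is the layout used for the catheter data later in this section.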

The figures here just form a random sample from some distribution, possibly the least-structured form of data and more common in textbooks than practice. Nevertheless, models can be fitted and assessed and, on the odd occasion, useful inferences made.

Boag (1949) listed the data in his Table II, given in Table 1.2. The groups refer to different types of cancer, different treatments, and different hospitals. There were eight groups, listed as $a$ to $h$ in his Figure 1, but only four appear in the table. The first column gives survival time in months: the data are grouped into six-monthly intervals until three years, after which intervals become wider. In group $e$ the count 232 spans the interval 0-12 months, and 156 spans 12-24 months, both indicated by a $+$.

Boag was interested in comparing the fits of lognormal and exponential distributions. He computed expected frequencies to set against those observed, and found that chi-square tests accepted the lognormal and rejected the exponential for groups $a, b$, and $c$ but not $e$ (for which the opposite was obtained).
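The kind of comparison described here can be sketched in R as follows. This is not Boag's computation: the survival times are simulated, the interval boundaries are only loosely modelled on the grouping described above, and the parameters are estimated from the ungrouped times, so the chi-square degrees of freedom are approximate. The names `tm`, `breaks`, and `gof` are assumptions made for the illustration.

```r
set.seed(2)
tm = rlnorm(200, meanlog = 2.5, sdlog = 1)        # simulated survival times (months)
breaks = c(0, 6, 12, 18, 24, 30, 36, 60, Inf)     # six-monthly, then wider
obs = as.numeric(table(cut(tm, breaks)))          # observed frequencies

# Maximum likelihood estimates from the ungrouped times
mu = mean(log(tm))
sg = sqrt(mean((log(tm) - mu)^2))
lam = 1 / mean(tm)

# Chi-square comparison of observed and expected frequencies;
# cdf holds the fitted distribution function evaluated at the breaks
gof = function(cdf, npar) {
  expd = length(tm) * diff(cdf)                   # expected frequencies
  x2 = sum((obs - expd)^2 / expd)
  df = length(expd) - 1 - npar
  c(X2 = x2, df = df, p = pchisq(x2, df, lower.tail = FALSE))
}

fit_ln = gof(plnorm(breaks, mu, sg), npar = 2)    # lognormal
fit_ex = gof(pexp(breaks, lam), npar = 1)         # exponential
fit_ln; fit_ex
```

A small p-value counts against the fitted distribution; with real grouped data one would estimate the parameters from the grouped likelihood instead.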

## Catheter Infection

Collett (2003, Table 4.1) presented some data on 13 kidney dialysis patients, each of whom had a catheter inserted to remove waste products from the blood. The original data were used by McGilchrist and Aisbett (1991) to illustrate regression with frailty. If an infection occurs at the entry site, the catheter has to be removed and the area disinfected. The survival time is the number of days until a first infection occurs; pre-infection removal of the catheter for some other reason produces a right-censored time. Among the other variables recorded were age (years) and sex $(1=$ male, $2=$ female). Collett fitted a Cox proportional hazards model (Section 5.2) and found that sex, but not age, was a significant factor; one can only speculate. He then went on to illustrate the computation and interpretation of various types of residuals.
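For readers who want to see what such a fit looks like in R, here is a rough sketch using `coxph` from the `survival` package. The data are simulated, not Collett's, and the assumed effect of sex on the hazard is built in purely for illustration, so the output says nothing about the real patients.

```r
library(survival)   # recommended package, shipped with standard R installations

set.seed(1)
n = 27
age = round(runif(n, 20, 70))            # simulated ages (years)
sex = rep(c(1, 2), length.out = n)       # 1 = male, 2 = female
rate = 0.05 * exp(0.8 * (sex == 1))      # ASSUMED higher hazard for males
latent = ceiling(rexp(n, rate))          # infection day, in whole days
tim = pmin(latent, 28)                   # observation stops at 28 days
cns = as.integer(latent < 28)            # 1 = observed, 0 = right censored

fit = coxph(Surv(tim, cns) ~ age + factor(sex))
summary(fit)                             # coefficients, hazard ratios, tests

res = residuals(fit, type = "martingale")   # one residual per patient
```

Other residual types, such as `"deviance"` and `"schoenfeld"`, are requested through the same `type` argument.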

Table 1.3 gives some artificial data on 27 patients of the same general type as Collett's. Here, `tim` is time and `cns` is the censoring indicator (0 for a right-censored time, 1 for an observed time); observation on each patient was terminated at 28 days, so `tim` $= 28$ entails `cns` $= 0$. The data will be used for illustration below.
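The rule that a time of 28 days is always censored can be sketched as below. The underlying infection times are invented for the purpose and are not the values of Table 1.3; `latent` and `cutoff` are names introduced only for this illustration.

```r
# Hypothetical underlying infection times in days (NOT from Table 1.3)
latent = c(5, 31, 12, 40, 28, 9)
cutoff = 28                          # observation terminated at 28 days

tim = pmin(latent, cutoff)           # recorded time
cns = as.integer(latent < cutoff)    # 1 = infection observed, 0 = right censored

data.frame(tim, cns)
```

Every row with `tim` equal to 28 has `cns` equal to 0, as the text requires.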

## Inspecting the Data with R

This section and the ones following are for R-novices. If you are among the large number of statisticians more experienced than I am with R, go directly to Chapter 2; do not pass GO; do not collect £200.
I will assume that you have R set up on your computer. Otherwise, and if you do not know how to download it and set it up, phone a friend (I did). If you, like me, grew up in the days before personal computers, when man first stood erect and started to use tools, you will probably need to be guided through abstruse concepts such as working directories. Everitt and Hothorn (2010) tell you how to do it all in plain English that even I can understand. Venables and Ripley (1999) is also highly recommended: when all else fails, read the instructions! (Mrs. Crowder once forced me to stop the car, after driving round in circles for an hour, and ask for directions.)

Incidentally, to perform various data analyses throughout this book, functions have been coded in R; they are available on the Web site referred to in the Preface. There is certainly no claim that they are superior to ones available elsewhere, in the CRAN collection, for instance. But it is often quicker to write your own function than to spend hours searching lists of packages for one that does the job you want. What writing your own code does do, too, is to force you to get to grips with the particular technique better. It also enables you to arrange things as you want them to be arranged. Mainly, it is good practice for tackling data for which there is no off-the-shelf software. How many inappropriate statistical analyses are performed simply because there’s a readily available program that does it?

For illustration let us apply R to the catheter data (Table 1.3). First, the data should be set up in a plain text file, say `catheter1.dat`. The standard format is

    id age sex tim cns
    1 23 2 22 0
    2 21 1 9 1
    ...
    27 62 1 10

The data file has a header row at the top, giving names to the columns, and then 27 rows of figures. The file must occupy the current working directory, as defined in your R setup. Now the data must be loaded into R: in the R-window type

    dmx=read.table('catheter1.dat',header=T); attach(dmx); dmx; #input and check data

The option `header=T` (`T` means True) indicates that there is a header in the data file (use `header=F` if not). The $27 \times 5$ data matrix will now be stored as `dmx`: this is created as a list variable. (In a moment of weakness I did once look it up in the manual, which has a whole chapter on lists and data frames, but too long to actually read.) The command `attach(dmx)` makes the columns accessible for further processing; for example, `age` is now a numerical vector of length 27. Sometimes you need to force a list to become numeric: this can be done with `dmx=as.numeric(unlist(dmx))`. The `#` symbol indicates a comment: the rest of the line is ignored by the processor. The semicolon separates commands on the same line: some users prefer to have a new line for each command.
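To try the loading step without first preparing a file, the following sketch writes a tiny stand-in file to a temporary location and reads it back. The three data rows are illustrative and are not a copy of Table 1.3; `path` and `v` are names introduced here.

```r
# Write a small stand-in data file (rows are illustrative, NOT Table 1.3)
path = tempfile(fileext = ".dat")
writeLines(c("id age sex tim cns",
             "1 23 2 22 0",
             "2 21 1 9 1",
             "3 45 1 28 0"), path)

dmx = read.table(path, header = TRUE)   # first row supplies the column names
dim(dmx)                                # rows and columns
is.data.frame(dmx)                      # read.table returns a data frame...
is.list(dmx)                            # ...which is a list of equal-length columns
dmx$age                                 # a column can be reached without attach()

v = as.numeric(unlist(dmx))             # flatten column by column to one numeric vector
length(v)
```

Reaching columns as `dmx$age` is an alternative to `attach(dmx)` that avoids name clashes when several data sets are in play.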

Now try some R commands: type the following, one at a time (pressing the Enter key after each), and see what you get:

    age; mean(age); avage=mean(age); avage; var(age); summary(dmx);
    agf=sort(age); agf; agf[1]; hist(age); plot(age,tim); pairs(dmx);

Try variations to see what works and what does not. Incidentally, I just use `=` in R commands rather than `<-` because (a) I am more used to it, (b) it is easier to type, and (c) I cannot rid myself of the feeling that `y<-x` means that $y$ is less than $-x$. If you come across an unfamiliar function, such as `y=wotsthisdo(x)`, look it up online by typing `help(wotsthisdo)`. Many more functions can be found in the online manual and in Venables and Ripley (1999).
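A few of these commands can be tried on a made-up vector before the real data are loaded. The ages below are hypothetical, and the last lines illustrate the difference between `y<-x` and `y < -x` mentioned above; `cmp` is a name introduced only for this sketch.

```r
age = c(23, 21, 45, 62)   # hypothetical ages, NOT Table 1.3

avage = mean(age)         # store the mean under a new name
avage
agf = sort(age)           # ascending order
agf[1]                    # first element of the sorted vector: the smallest age
var(age)                  # sample variance (divisor n - 1)

x = 3
y = 5
cmp = (y < -x)            # with spaces this is a comparison: is 5 less than -3?
cmp
y<-x                      # without spaces this is an assignment: y becomes 3
y
```

The graphical commands `hist`, `plot`, and `pairs` are left out here because they open a plotting window rather than print a value.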

You will soon decide to type your commands into a text file and just paste them into the R window: this can save a lot of frustration in retyping to correct minor errors. Throughout this book I will use R for data processing. My listings of R-code are basic and without frills, reflecting my own level of competence. Aficionados will spot slicker ways of doing things. However, there is a case for transparency, hoping to keep down the number of mistakes. One small tip that might come in handy is as follows: after the customary cursing of the computer for daring to produce errors with your code, close the R-window and start again; sometimes previous assignments can corrupt the current run.


