### Before You Begin

## aka Thoughts on Proper Data Analysis

Before we embark on the journey that is learning R and how to use it to analyze your data and make fantastic figures, it is useful to stop and think a little bit about best practices for data analysis.

## 2.2 BASIC PRINCIPLES OF EXPERIMENTAL DESIGN

If there are three words to remember when thinking about experimental design, they are balance, randomization, and replication. In a nutshell, these three principles guard against your data being correlated in some way that is unhelpful to your analysis. They also help ensure that your data points are independent of one another and that you have enough data to determine whether your treatments had any effect (see Boxes 2.1, 2.2, and 2.3).

Obviously, it is not always possible to control these aspects of your data, particularly if you have observational data (as compared to a controlled experiment which you design and run yourself). But, even in the case of observational studies, these principles are important to keep in mind and consider.

## BLOCKED EXPERIMENTAL DESIGNS

Consider the four experimental setups shown in Figure 2.1. Imagine that we are now testing the effects of four fertilizers on plant growth (labelled A, B, C, and D), each with 12 individuals. The experiment is conducted in four separate “blocks.” What is a block? It could be many things. Maybe it is a physical way of setting up the experiment, for example, four shelves in an incubator that contain the experimental units or four rooms that contain the cages our individuals live in. Maybe, due to space or time limitations, only 12 individuals can be tested or measured at a time, and thus the experiment has to be run four separate times. Each of these can be considered a “block,” so you can hopefully imagine how this idea relates to your own research. Blocks are only important to consider if there is some systematic difference among them.

In the first example, the four treatments will be perfectly correlated with the four blocks. Thus, if we imagine a significant difference is detected in one treatment, there is no way to know if it is because of the experimental treatment or if there was something else going on in that block (or room, or time point, or whatever you want to imagine that block represents). Once again, this is an example of pseudoreplication because it seems like we have a large sample size but in reality, we have a sample size of $N = 1$ in each of our treatments. Despite growing 48 different plants, this design is unreplicated.

The second example is a fully randomized design, where the four treatments are allocated across the four blocks completely at random. The third example is a fully balanced design, where each of the four treatments is assigned to each block in the same manner. Each of these setups has its own advantages and disadvantages.
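
To make the contrast between the first and third setups concrete, here is a minimal sketch in Python (the same bookkeeping is easy to do in R). It assumes, for illustration only, that the balanced layout places exactly 3 plants of each fertilizer in every block; the function and variable names are hypothetical and do not come from the text.

```python
from collections import Counter

TREATMENTS = ["A", "B", "C", "D"]
BLOCKS = [1, 2, 3, 4]
PLANTS_PER_BLOCK = 12  # 4 blocks x 12 plants = 48 plants, 12 per treatment


def confounded_design():
    """First setup: every plant in a block receives the same treatment."""
    return [
        (block, treatment)
        for block, treatment in zip(BLOCKS, TREATMENTS)
        for _ in range(PLANTS_PER_BLOCK)
    ]


def balanced_design():
    """Third setup (assumed layout): equal numbers of each treatment per block."""
    per_cell = PLANTS_PER_BLOCK // len(TREATMENTS)  # 3 plants per cell
    return [
        (block, treatment)
        for block in BLOCKS
        for treatment in TREATMENTS
        for _ in range(per_cell)
    ]


def blocks_per_treatment(design):
    """Count the distinct blocks in which each treatment appears."""
    return {
        t: len({block for block, treatment in design if treatment == t})
        for t in TREATMENTS
    }


if __name__ == "__main__":
    for name, design in [("confounded", confounded_design()),
                         ("balanced", balanced_design())]:
        # Plants in each (block, treatment) cell, then blocks seen per treatment.
        print(name, dict(Counter(design)))
        print(name, blocks_per_treatment(design))
```

In the confounded layout each treatment appears in exactly one block, so treatment and block effects cannot be separated. In the balanced layout every treatment appears in all four blocks, with the same number of plants in each.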

The fully randomized design is good, and in theory should lead to the highest degree of replication, with all experimental units being truly independent. In practice, however, it can work against the principle of balance, since some treatments might end up overrepresented in some blocks and underrepresented in others (e.g., in Figure 2.1, there are six individuals from treatment B in Block 2, but only one in Block 3). In the extreme, leaving your entire setup to random chance could lead to a horribly unbalanced and biased design, but this would be very rare. In general, fully randomizing your experiment is a very good idea!
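
The imbalance that random allocation can produce is easy to see by simulation. The sketch below is in Python (in R, `sample()` does the same shuffling). It assumes 12 plants per fertilizer, shuffled and dealt into four blocks of 12. The function names and the seed are illustrative choices, not part of the original design.

```python
import random
from collections import Counter

TREATMENTS = ["A", "B", "C", "D"]
N_BLOCKS = 4
PLANTS_PER_BLOCK = 12
PLANTS_PER_TREATMENT = 12


def fully_randomized(seed):
    """Shuffle all 48 treatment labels, then deal them into blocks of 12."""
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    labels = [t for t in TREATMENTS for _ in range(PLANTS_PER_TREATMENT)]
    rng.shuffle(labels)
    return [
        labels[i * PLANTS_PER_BLOCK:(i + 1) * PLANTS_PER_BLOCK]
        for i in range(N_BLOCKS)
    ]


def counts_by_block(blocks):
    """For each block, count how many plants received each treatment."""
    result = []
    for block in blocks:
        tally = Counter(block)
        result.append({t: tally[t] for t in TREATMENTS})  # 0 if absent
    return result


if __name__ == "__main__":
    counts = counts_by_block(fully_randomized(seed=2024))
    for number, tally in enumerate(counts, start=1):
        print(f"Block {number}: {tally}")
    cells = [n for tally in counts for n in tally.values()]
    print("smallest cell:", min(cells), "largest cell:", max(cells))
```

A perfectly balanced layout would put 3 plants in every cell. Running this with a few different seeds shows how far a single random draw can drift from that, which is the trade-off described above.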

## YOU CAN (AND SHOULD) PLAN YOUR ANALYSES BEFORE YOU HAVE THE DATA!

In addition to the aspects of experimental design described previously, the other most important thing to do is to have a clear idea of your predictor and response variables before you even start the experiment. Before you ever put a mouse in a testing box or a seed in a growth chamber, you should identify what it is you are going to measure. Hopefully, if you know your study system pretty well or perhaps have some preliminary data, you can estimate what the data are going to look like, which will allow you to think about and plan for what type of analyses you will do. Maybe that sounds like wishful thinking, but this whole book is about the importance of knowing what your data look like, so don’t worry: you’ll get there!
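
One way to act on this advice is to generate placeholder data shaped like the data you expect, and to write your summary and analysis code against it before the experiment starts. The Python sketch below does this for the fertilizer example. Every number in it (the response variable, means, and standard deviation) is a made-up assumption for illustration, not a value from the text or from real data.

```python
import random
import statistics

# Hypothetical expectations for plant height (cm) under each fertilizer.
# These are placeholders you would replace with guesses from your own system.
ASSUMED_MEANS = {"A": 20.0, "B": 22.0, "C": 22.0, "D": 25.0}
ASSUMED_SD = 3.0
PLANTS_PER_TREATMENT = 12


def simulate_experiment(seed):
    """Draw normally distributed placeholder heights for each treatment."""
    rng = random.Random(seed)  # seeded so the fake data set is reproducible
    return {
        treatment: [rng.gauss(mean, ASSUMED_SD)
                    for _ in range(PLANTS_PER_TREATMENT)]
        for treatment, mean in ASSUMED_MEANS.items()
    }


def summarize(data):
    """Return (mean, standard deviation) of the response for each treatment."""
    return {
        treatment: (statistics.mean(values), statistics.stdev(values))
        for treatment, values in data.items()
    }


if __name__ == "__main__":
    fake_data = simulate_experiment(seed=1)
    for treatment, (mean, sd) in summarize(fake_data).items():
        print(f"{treatment}: mean = {mean:.1f} cm, sd = {sd:.1f} cm")
```

If the planned summaries or tests cannot be run on data of this shape, that is worth discovering before any plants are grown.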

Sir Ronald Fisher, one of the founders of modern statistics, offered one of the best statements about this issue in 1938. Fisher’s point is that because your experimental design directly affects your data analysis, you should think about your analysis up front when planning the experiment.

