## Review of Elementary Statistical Concepts

You were probably introduced to statistics in a pre-algebra course, though your first exposure may have come as early as elementary or primary school. When was the first time you heard the word mean used to indicate the average of a set of numbers? What about the term median? Perhaps in an early math class. Do you recall doing graphing exercises: being given two sets of numbers and plotting them on the $x$- and $y$-axes (also called the coordinate axes)? You may remember that the sets of numbers for $x$ and $y$ are called variables because they can take on different values (e.g., $[12,14,17]$). Contrast these with constants: sets of numbers that contain a single value only (e.g., $[2,2,2]$ or $[17,17,17]$).

At some point, you were likely introduced to the concepts of elementary probability. Your introduction might have included a motivating question such as, “What is the probability or chance that, if you choose one ball from a closed container that includes one red and five blue balls, it will be the red ball?” You were instructed to first count the number of possible outcomes (since the container includes six balls, any one of which could be chosen, six possible outcomes exist), which served as a denominator. You then counted the particular outcome of interest: choosing the red ball. Only one red ball is in the container, so the count of this outcome is one. This was the numerator. Putting these two counts together in a fraction resulted in a probability of choosing the red ball of $1/6$, or about $0.167$. The latter number is also called a proportion. Probabilities and proportions fall between zero and one; this constitutes the first rule of probability. Multiplying them by 100 creates percentages ($0.167 \times 100 = 16.7\%$).

But what does a probability of $0.167$ mean? One interpretation is that we expect to choose a red ball about $16.7\%$ of the time when we pick balls one at a time over and over (don’t forget that we need to replace the chosen ball each time; this is called sampling with replacement). Each selection is called an experiment, so selecting the balls over and over again composes multiple experiments. But is it realistic to expect to choose a red ball $16.7\%$ of the time? We could test this assumption by repeating the experiment again and again and keeping a lengthy record. Statisticians refer to this approach (the theoretical idea rather than the actual tedious counting) as frequentist probability or frequentism, since it examines, but actually assumes, what happens when something countable is repeated over and over.
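The “lengthy record” of repeated experiments can be approximated with a short simulation. The following is a minimal Python sketch, not part of the original text: the function name `simulate_red_draws`, the seed, and the numbers of draws are illustrative assumptions. It draws one ball at a time with replacement from a container of one red and five blue balls and reports the proportion of red draws.

```python
import random


def simulate_red_draws(n_draws, seed=0):
    """Return the proportion of red balls in n_draws draws with replacement.

    The container holds one red and five blue balls, as in the text.
    The seed only makes the run reproducible; it is an arbitrary choice.
    """
    rng = random.Random(seed)
    container = ["red"] + ["blue"] * 5
    # Each call to rng.choice picks from the full container, so the
    # chosen ball is effectively replaced before the next draw.
    red_count = sum(rng.choice(container) == "red" for _ in range(n_draws))
    return red_count / n_draws


theoretical = 1 / 6
for n in (10, 1_000, 100_000):
    observed = simulate_red_draws(n)
    print(f"{n:>7} draws: observed {observed:.4f}, theoretical {theoretical:.4f}")
```

With only a few draws the observed proportion can sit far from $1/6$; as the number of draws grows it tends to settle near $0.167$, which is the long-run behavior the frequentist interpretation assumes.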

Probabilities are presented using, not surprisingly, the letter $P$. One way to symbolize the probability of choosing a red ball is with $P(\text{red})$. We may write $P(\text{red})=0.167$ or $P(\text{red})=1/6$. You might recall that some statistical tests, such as $t$-tests or analyses of variance (ANOVAs), are accompanied by $p$-values. As we shall learn, $p$-values are a type of probability used in many statistical tests.

The basic foundations of statistical analysis are established by combining the principles of probability and elementary statistical concepts. Among a variety of descriptions, statistics may be defined as the analysis of data and the use of such data to make decisions in the presence of uncertainty. This two-pronged definition is useful for delineating two general types of statistical analyses: descriptive and inferential. Researchers use descriptive methods to analyze one or more variables in order to describe or summarize their characteristics, often with measures of central tendency and measures of dispersion. Descriptive methods are also employed to visualize variables, such as with histograms, density plots, stem-and-leaf plots, and box-and-whisker plots. We’ll see examples of some of these later in the chapter.

Inferential statistics are designed to infer or deduce something about a population from a sample and are useful for decision making, policy analysis, and gaining an understanding of patterns of associations among people, states, companies, or other units of interest. But uncertainty is a key issue since inferring something about a population requires acknowledgment that any sample contains limited information about that population. We’ll return to the issue of inferential statistics once we’ve reviewed some tools for describing variables.

## Measures of Central Tendency

Now that we have some background information about statistics, let’s turn to some statistical measures, including how they are used and computed. We’ll begin with measures of central tendency. Suppose we collect data on the weights (in ounces) of several puppies in a litter. We place each puppy on a digital scale, trying to hold them still so we can record their weights. What is your best guess of the average weight of the puppies in the litter? The most common measure, though perhaps not always the best, is the arithmetic mean, which is computed using the formulas in Equation 2.1.
$$\mathrm{E}[X]=\mu=\frac{\sum X_{i}}{N} \text{ (population)} \quad \text{or} \quad \bar{x}=\frac{\sum x_{i}}{n} \text{ (sample)} \tag{2.1}$$
The term on the left-hand side of the first equation, $E[X]$, is a symbolic way of expressing the expected value of variable $X$, which is often used to represent the mean. We could also list this term as E[weight in ounces], but, as long as it’s clear that $X=$ weight in ounces, using $E[X]$ is satisfactory. The Greek letter $\mu$ represents the population mean, whereas $\bar{x}$ in the second part of the equation is the sample mean. The formula for computing the mean is simple: add all the values of the variable and divide the sum by the number of observations. The cumbersome symbol that looks like an overgrown $E$ in the numerator of Equation 2.1 is the summation sign ($\Sigma$); it tells us to add whatever is next to it. The symbol $X_{i}$ or $x_{i}$ signifies specific values of the variable, or the individual puppy weights we’ve recorded. The subscript $i$ indexes each observation, and the letter $N$ or $n$ is the number of observations, so the index runs over $i=1, \ldots, n$. If $n=5$, then five individual observations are in the sample. Uppercase Roman letters represent population values and lowercase Roman letters represent sample values. Writing $E[X]=\bar{x}$ indicates that the sample mean is designed to estimate the population expected value, or the population mean.

Here’s a simple example. Our Siberian Husky, Steppenwolf, sires a litter of puppies. We weigh them and record the following: $[48,52,58,62,70]$. The sum of this set is $48+52+58+62+70=290$, which gives a sample mean of $290/5=58$ ounces.

The mean is also called the center of gravity. Suppose we have a plank of wood that is of uniform weight across its span. We order the puppies from lightest to heaviest, trying to space them out in proportion to their weights, and place them on the plank of wood. The mean is the point of balance, or the point at which we would place a fulcrum underneath the plank to balance the puppies.

The mean has a couple of interesting features:

1. It is measured in the same units as the observations. If the observations are not all measured in the same unit (e.g., some puppies’ weights are in grams, others in ounces), then the mean is not interpretable.
2. The mean provides a suitable measure of central tendency if the variable is measured continuously and is normally distributed.
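
The sample formula in Equation 2.1 and the puppy-weight example can be checked with a few lines of code. This is a minimal Python sketch, assuming the five weights recorded for Steppenwolf’s litter; the variable names are illustrative and do not come from the original text. It also illustrates the center-of-gravity idea: deviations from the mean sum to zero.

```python
from statistics import mean

# Puppy weights in ounces, as recorded in the text.
weights_oz = [48, 52, 58, 62, 70]

total = sum(weights_oz)   # numerator of Equation 2.1: the sum of the x_i
n = len(weights_oz)       # denominator: the number of observations
sample_mean = total / n

print(total, n, sample_mean)  # 290 5 58.0

# The standard library gives the same answer.
assert sample_mean == mean(weights_oz)

# Center of gravity: deviations from the mean balance out to zero.
deviations = [x - sample_mean for x in weights_oz]
print(deviations, sum(deviations))  # [-10.0, -6.0, 0.0, 4.0, 12.0] 0.0
```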

## Measures of Dispersion

Knowing a variable’s central tendency is just part of the story. As suggested by Figures 2.1 and 2.2, depicting how much a variable fluctuates around the mean is also important. The objective of measures of dispersion is to indicate the spread of the distribution of a variable. You should be familiar with the term standard deviation, the most common dispersion measure for continuous variables. Before seeing the formula for this measure, however, let’s consider some other measures of dispersion. The most basic for a continuous variable is the sum of squares, or $SS[x]$. Assuming a sample, the formula is supplied in Equation 2.3.
$$\operatorname{SS}[x]=\sum\left(x_{i}-\bar{x}\right)^{2} \tag{2.3}$$
We first compute deviations from the mean $\left(x_{i}-\bar{x}\right)$ for each observation, square each, and then add them. If you’ve learned about ANOVA models, the sum of squares should be familiar. Perhaps you even recall the various forms of the sum of squares. We’ll learn more about these in Chapter 5.

The second measure of dispersion, and one you should recognize, is the variance, which is labeled $s^{2}$ for samples and $\sigma^{2}$ (sigma-squared) for populations. The sample formula is shown in Equation 2.4.
$$\operatorname{var}[x]=s^{2}=\frac{\sum\left(x_{i}-\bar{x}\right)^{2}}{n-1} \tag{2.4}$$
The variance is the sum of squares divided by the sample size minus one and is measured in squared units of the variable. The standard deviation, symbolized as $s$ (sample) or $\sigma$ (population), is measured in the same units as the variable (see Equation 2.5).
$$\operatorname{sd}[x]=s=\sqrt{\frac{\sum\left(x_{i}-\bar{x}\right)^{2}}{n-1}} \tag{2.5}$$
A variable’s distribution is often represented by its mean and standard deviation (or variance). A variable that follows a normal distribution, for instance, is symbolized as $X \sim N(\mu, \sigma)$ or $x \sim N(\bar{x}, s)$ (the wavy line means “distributed as”). When two variables are measured in the same units and have the same mean, one is less dispersed than the other if its standard deviation is smaller. Recall that the means for the two litters of puppies are 55 and 70. Their standard deviations are 10.8 and 47.1. The weights in the second litter have a much larger standard deviation, which is not surprising given their range. Another useful measure of dispersion is the coefficient of variation (CV), which is computed as $s / \bar{x}$ and is usually then multiplied by 100. The CV is valuable for comparing distributions because it shows how much a variable fluctuates relative to its mean. The CVs for the two litters are 19.6 and 67.1.
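
The sum of squares, variance, standard deviation, and CV build on one another, which is easy to see when they are computed step by step. The following is a minimal Python sketch. It uses the five weights from Steppenwolf’s litter ($[48,52,58,62,70]$) rather than the two litters compared above, and the variable names are illustrative assumptions.

```python
import math
import statistics

# Puppy weights in ounces from the Steppenwolf example.
weights_oz = [48, 52, 58, 62, 70]
n = len(weights_oz)
x_bar = sum(weights_oz) / n

# Sum of squares: squared deviations from the mean, added up.
ss = sum((x - x_bar) ** 2 for x in weights_oz)

# Sample variance: sum of squares divided by n - 1 (squared ounces).
variance = ss / (n - 1)

# Sample standard deviation: square root of the variance (ounces).
sd = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean, times 100.
cv = 100 * sd / x_bar

print(ss, variance)               # 296.0 74.0
print(round(sd, 3), round(cv, 2)) # 8.602 14.83

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(variance, statistics.variance(weights_oz))
assert math.isclose(sd, statistics.stdev(weights_oz))
```

Note that `statistics.variance` and `statistics.stdev` are the sample versions; the population versions, which divide by $N$, are `statistics.pvariance` and `statistics.pstdev`.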

Earlier we discussed the median and trimmed mean as robust alternatives to the mean. Robust measures of dispersion are also available, such as the interquartile range (the 75th percentile minus the 25th percentile) and the median absolute deviation (MAD): $\operatorname{median}\left(\left|x_{i}-\tilde{x}\right|\right)$, or the median of the absolute values of the observed $x$s minus their median, $\tilde{x}$. Whereas the standard deviations and CVs of the two puppy litters are far apart, the MADs are the same: 14.8. The one extreme value in litter 2 does not affect this robust measure of dispersion.
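
The robustness of these measures can be illustrated with a small experiment. The following is a minimal Python sketch under stated assumptions: the second data set is hypothetical (the Steppenwolf litter with its heaviest value replaced by an extreme one) and is not the second litter from the text, and the helper names `mad` and `iqr` are illustrative. The MAD here is unscaled; some software multiplies it by a constant of about 1.4826 to make it comparable to the standard deviation under normality, so reported values can differ. Quartile conventions also vary across packages; the `"inclusive"` method is used here as one choice.

```python
import statistics


def mad(values):
    """Unscaled median absolute deviation: median(|x_i - median(x)|)."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


litter = [48, 52, 58, 62, 70]         # weights from the text (ounces)
with_outlier = [48, 52, 58, 62, 300]  # hypothetical: one extreme value

for name, data in (("litter", litter), ("with outlier", with_outlier)):
    print(
        f"{name:>12}: sd={statistics.stdev(data):.1f}, "
        f"IQR={iqr(data)}, MAD={mad(data)}"
    )
```

In this sketch the standard deviation rises sharply once the extreme value is introduced, while the IQR and the MAD are unchanged, which mirrors the point made about the two litters.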
