### Statistics Assignment Help | AP Statistics Tutoring and Q&A | Exploring and Graphing Bivariate Data

## Scatterplots

• Scatterplots are ideal for exploring the relationship between two quantitative variables. When constructing a scatterplot we often deal with explanatory and response variables. The explanatory variable may be thought of as the independent variable, and the response variable may be thought of as the dependent variable.
• It’s important to note that when working with two quantitative variables, we do not always consider one to be the explanatory variable and the other to be the response variable. Sometimes, we just want to explore the relationship between two variables, and it doesn’t make sense to declare one variable the explanatory and the other the response.
• We interpret scatterplots in much the same way we interpret univariate data; we look for the overall pattern of the data. We address the form, direction, and strength of the relationship. Remember to look for outliers as well. Are there any points in the scatterplot that deviate from the overall pattern?
• When addressing the form of the relationship, look to see whether the pattern in the data is linear (Figure 2.1) or curved (Figure 2.2).

## Correlation

• When dealing with linear relationships, we often use the r-value, or the correlation coefficient. The correlation coefficient can be found by using the formula:
$$r=\frac{1}{n-1} \sum\left(\frac{x_{i}-\bar{x}}{s_{x}}\right)\left(\frac{y_{i}-\bar{y}}{s_{y}}\right)$$
• In practice, we avoid using the formula at all costs. However, it helps to suffer through a couple of calculations by hand in order to understand how the formula works and to gain a deeper appreciation of technology.
• It’s important to remember the following facts about correlation (make sure you know all of them!):

Correlation (the r-value) only describes a linear relationship. Do not use $r$ to describe a curved relationship.

Correlation makes no distinction between explanatory and response variables. If we switch the $x$ and $y$ variables, we still get the same correlation.
Correlation has no unit of measurement. The formula for correlation uses the means and standard deviations for $x$ and $y$ and thus uses standardized values.

If $r$ is positive, then the association is positive; if $r$ is negative, then the association is negative.
$-1 \leq r \leq 1$. $r=1$ implies a perfect positive linear relationship, $r=-1$ implies a perfect negative linear relationship, and $r=0$ implies that there is no linear relationship.

The r-value, like the mean and standard deviation, is not a resistant measure. This means that even one extreme data point can have a dramatic effect on the r-value. Remember that outliers can either strengthen or weaken the r-value. So use caution!
The r-value does not change when you change units of measurement. For example, changing the $x$ and/or $y$ variables from centimeters to millimeters or even from centimeters to inches does not change the r-value.

Correlation does not imply causation. Just because two variables are strongly associated, or even linearly correlated, does not mean that changes in one variable are causing changes in the other.
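The facts above can be checked numerically. The following is a minimal sketch that computes $r$ directly from the formula and then tests three of the listed properties: swapping $x$ and $y$, changing units, and adding one extreme point. The data values and the helper names (`mean`, `sample_sd`, `correlation`) are made up for illustration and are not part of the original notes.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r = 1/(n-1) * sum of the products of the standardized x and y values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


# Hypothetical data, invented for this illustration only.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 77]

r = correlation(heights_cm, weights_kg)

# Swapping explanatory and response variables leaves r unchanged.
r_swapped = correlation(weights_kg, heights_cm)

# Changing units (centimeters to inches) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
r_inches = correlation(heights_in, weights_kg)

# r is not resistant: one extreme point changes it dramatically.
r_outlier = correlation(heights_cm + [185], weights_kg + [40])

print("r:", round(r, 4))
print("r with x and y swapped:", round(r_swapped, 4))
print("r with heights in inches:", round(r_inches, 4))
print("r after adding one outlier:", round(r_outlier, 4))
```

In this made-up data set the added point weakens the correlation; a different extreme point that falls in line with the overall pattern could strengthen it instead.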

## Least Squares Regression

• When modeling linear data, we use the Least Squares Regression Line (LSRL). The LSRL is fitted to the data by minimizing the sum of the squared residuals. The graphing calculator again comes to our rescue by calculating the LSRL and its equation. The LSRL equation takes the form of $\hat{y}=a+b x$ where $b$ is the slope and $a$ is the $y$-intercept. The AP* formula sheet uses the form $\hat{y}=b_{0}+b_{1} x$. Either form may be used as long as you define your variables. Just remember that the number in front of $x$ is the slope, and the “other” number is the $y$-intercept.
• Once the LSRL is fitted to the data, we can then use the LSRL equation to make predictions. We can simply substitute a value of $x$ into the equation of the LSRL and obtain the predicted value, $\hat{y}$.
• The LSRL minimizes the sum of the squared residuals. What does this mean? A residual is the difference between the observed value, $y$, and the predicted value, $\hat{y}$. In other words, residual = observed - predicted. Remember that all predicted values are located on the LSRL. A residual can be positive, negative, or zero. A residual is zero only when the point is located on the LSRL. Since the sum of the residuals is always zero, we square the residuals (the vertical distances from the points to the line) before adding them. The LSRL is fitted to the data so that the sum of the squares of these vertical distances is as small as possible.

• The slope of the regression line (LSRL) is important. Consider the time required to run the last mile of a marathon in relation to the time required to run the first mile. The equation $\hat{y}=1.25 x$, where $x$ is the time required to run the first mile in minutes and $\hat{y}$ is the predicted time required to run the last mile in minutes, could be used to model or predict a runner’s time for the last mile. The interpretation of the slope in context is that for every one-minute increase in the time needed to run the first mile, the predicted time to run the last mile increases by $1.25$ minutes, on average. Note that the slope is a rate of change: since the slope here is positive, the predicted time increases as $x$ increases. A negative slope would give a negative rate of change.
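The ideas in this section can be sketched in a few lines of code. The example below fits $\hat{y}=a+bx$ using the standard closed-form least squares solution, $b=\sum(x_i-\bar{x})(y_i-\bar{y}) / \sum(x_i-\bar{x})^2$ and $a=\bar{y}-b\bar{x}$. These formulas are not given in the notes above and are assumed here. It then checks that the residuals sum to zero and that a slightly different line has a larger sum of squared residuals. The mile times and the function names (`fit_lsrl`, `predict`) are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_lsrl(x, y):
    """Return (a, b) for y_hat = a + b*x using the closed-form
    least squares solution (assumed, not taken from the notes)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


def predict(a, b, x_value):
    """Predicted value y_hat for a given x."""
    return a + b * x_value


# Hypothetical first-mile and last-mile times in minutes.
first_mile = [6.0, 6.5, 7.0, 8.0, 9.0]
last_mile = [7.4, 8.3, 8.6, 10.1, 11.2]

a, b = fit_lsrl(first_mile, last_mile)

# residual = observed - predicted
residuals = [yi - predict(a, b, xi) for xi, yi in zip(first_mile, last_mile)]
sse = sum(e ** 2 for e in residuals)

# Any other line, here the same intercept with a nudged slope,
# gives a larger sum of squared residuals.
sse_nudged = sum((yi - predict(a, b + 0.1, xi)) ** 2
                 for xi, yi in zip(first_mile, last_mile))

print("intercept a:", round(a, 4), "slope b:", round(b, 4))
print("sum of residuals:", round(sum(residuals), 10))
print("SSE for LSRL:", round(sse, 4), "SSE for nudged line:", round(sse_nudged, 4))
print("predicted last mile for an 8-minute first mile:", round(predict(a, b, 8.0), 2))
```

The slope `b` is read the same way as the $1.25$ in the marathon example: it is the change in the predicted last-mile time for each additional minute of first-mile time.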

