Statistics Writing Service | Regression Analysis Homework Writing and Exam Service | STA4210

Evaluating the Constant Variance Assumption

The first graph you should use to evaluate the constant variance assumption is the $\left(\hat{y}_i, e_i\right)$ scatterplot. Look for changes in the pattern of vertical variability of the $e_i$ for different $\hat{y}_i$. The most common indications of constant variance assumption violation are shapes that indicate either increasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$, or shapes that indicate decreasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$. Increasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$ is indicated by greater variability in the vertical ranges of the $e_i$ when $\hat{y}_i$ is larger.
Recall that the constant variance assumption (like all assumptions) refers to the data-generating process, not the data. The statement “the data are homoscedastic” makes no sense. By the same logic, the statements “the data are linear” and “the data are normally distributed” are also nonsense. Thus, whichever pattern of variability you decide to claim based on the $\left(\hat{y}_i, e_i\right)$ scatterplot, you should try to make sense of it in the context of the subject matter that determines the data-generating process. As one example, physical boundaries force smaller variance when the data are closer to the boundary. As another, as income increases, people have more choice about whether to purchase an item, so expenditures should vary more among people with more money than among people with less money. Whatever pattern you see in the $\left(\hat{y}_i, e_i\right)$ scatterplot should make sense to you from a subject matter standpoint.

While the LOESS smooth of the $\left(\hat{y}_i, e_i\right)$ scatterplot is useful for checking the linearity assumption, it is not useful for checking the constant variance assumption. Instead, you should use the LOESS smooth over the plot of $\left(\hat{y}_i,\left|e_i\right|\right)$. When the variability in the residuals is larger, they tend to be farther from zero, giving larger mean absolute residuals $\left|e_i\right|$. An increasing trend in the $\left(\hat{y}_i,\left|e_i\right|\right)$ plot suggests larger variability in $Y$ for larger $\mathrm{E}(Y \mid X=x)$, and a flat trend line for the $\left(\hat{y}_i,\left|e_i\right|\right)$ plot suggests that the variability in $Y$ is nearly unrelated to $\mathrm{E}(Y \mid X=x)$. However, as always, do not over-interpret. Data are idiosyncratic (random), so even if homoscedasticity is true in reality, the LOESS fit to the $\left(\hat{y}_i,\left|e_i\right|\right)$ graph will not be a perfectly flat line, due to chance alone. To understand “chance alone” in this case, you can simulate data from a homoscedastic model, construct the $\left(\hat{y}_i,\left|e_i\right|\right)$ graph, and add the LOESS smooth. You will see that the LOESS smooth is not a perfectly flat line, and you will know that such deviations are explained by chance alone.
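To see what “chance alone” looks like, here is a minimal Python sketch (an illustration, not the book's code). It simulates data from an assumed homoscedastic linear model, $Y = 2 + 3X + \varepsilon$ with $\varepsilon \sim N(0,1)$, fits OLS, and, as a crude stand-in for LOESS, averages $\left|e_i\right|$ within five equal-count bins of $\hat{y}_i$. The model, the sample size, and the binning are all illustrative choices.

```python
import random
import statistics

def ols_line(xs, ys):
    """Least-squares intercept and slope for a simple regression."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def binned_abs_residuals(n=200, n_bins=5, seed=1):
    """Simulate a homoscedastic linear DGP, fit OLS, and return the mean
    |e_i| within equal-count bins of yhat_i (a crude stand-in for LOESS)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]  # error sd is constant (1)
    b0, b1 = ols_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    # Sort (yhat_i, |e_i|) pairs by yhat_i, then average |e_i| within each bin.
    pairs = sorted(zip(fitted, (abs(y - f) for y, f in zip(ys, fitted))))
    size = n // n_bins
    return [statistics.fmean(a for _, a in pairs[k * size:(k + 1) * size])
            for k in range(n_bins)]

# Each row comes from a different homoscedastic data set; none is perfectly flat.
for seed in range(3):
    print([round(m, 2) for m in binned_abs_residuals(seed=seed)])
```

Even though the variance is constant by construction, the bin means wander around $\mathrm{E}|\varepsilon| = \sqrt{2/\pi} \approx 0.80$ rather than lying on a flat line.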

The hypothesis test for homoscedasticity will help you to decide whether the observed deviation from a flat line is explainable by chance alone, but recall that the test does not answer the real question of interest, which is “Is the heteroscedasticity so bad that we cannot use the homoscedastic model?” (That question is best answered by simulating data sets having the type of heteroscedasticity you expect with your real data, then by performing the types of analyses you plan to perform on your real data, then by evaluating the performance of those analyses.)

Consider the $\left(\hat{y}_i,\left|e_i\right|\right)$ scatterplot in the right-hand panel of Figure 4.7. In that plot, there is an increasing trend that suggests heteroscedasticity. You can test for trend in the $\left(\hat{y}_i,\left|e_i\right|\right)$ scatterplot by fitting an ordinary regression line to those data, and then testing for significance of the slope coefficient. Significance $(p<0.05)$ means that the observed trend is not easily explained by chance alone under the homoscedastic model; insignificance $(p>0.05)$ means that the observed trend is explainable by chance alone under the homoscedastic model. This test is called the Glejser test (Glejser 1969).
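A minimal Python sketch of the Glejser test as just described: fit OLS, regress $\left|e_i\right|$ on $\hat{y}_i$, and test the slope. The data-generating process (error standard deviation $0.5x$) is hypothetical, and the p-value uses a normal approximation to the $t$ distribution, which is reasonable only for large $n$.

```python
import math
import random
import statistics

def ols_fit(xs, ys):
    """Simple OLS: intercept, slope, and the slope's standard error."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, math.sqrt(sse / (n - 2) / sxx)

def glejser_test(xs, ys):
    """Regress |e_i| on yhat_i and test the slope. The p-value uses a normal
    approximation to the t distribution (adequate for large n)."""
    b0, b1, _ = ols_fit(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    abs_resid = [abs(y - f) for y, f in zip(ys, fitted)]
    _, slope, se = ols_fit(fitted, abs_resid)
    t = slope / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return slope, t, p

# Hypothetical heteroscedastic DGP: the error sd grows with x.
rng = random.Random(42)
xs = [rng.uniform(1, 10) for _ in range(500)]
ys = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]
slope, t, p = glejser_test(xs, ys)
print(f"slope = {slope:.3f}, t = {t:.1f}, p = {p:.2g}")
```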

There are many tests for heteroscedasticity other than the Glejser test, including the “Breusch-Pagan test” and “White’s test.” These tests use absolute and/or squared values of the residuals. Because absolute and squared residuals are non-negative, the assumption of normality of the absolute and squared residuals is obviously violated. Hence these tests are only approximately valid.
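For comparison, here is a sketch of one common form of the Breusch-Pagan test (Koenker's studentized version, which uses squared residuals): regress $e_i^2$ on $x_i$ and compare $nR^2$ with a chi-square distribution on 1 degree of freedom. The data are the same hypothetical heteroscedastic kind as in the Glejser sketch; as noted above, the chi-square reference distribution is only approximate.

```python
import math
import random
import statistics

def simple_ols(xs, ys):
    """Least-squares intercept and slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1

def breusch_pagan(xs, ys):
    """Koenker's studentized Breusch-Pagan test with one X variable:
    LM = n * R^2 from regressing e_i^2 on x_i, compared with chi-square(1)."""
    b0, b1 = simple_ols(xs, ys)
    e2 = [(y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)]
    a0, a1 = simple_ols(xs, e2)
    m = statistics.fmean(e2)
    ss_tot = sum((v - m) ** 2 for v in e2)
    ss_res = sum((v - a0 - a1 * x) ** 2 for x, v in zip(xs, e2))
    lm = max(0.0, len(xs) * (1 - ss_res / ss_tot))
    # Upper tail of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2))

rng = random.Random(42)
xs = [rng.uniform(1, 10) for _ in range(500)]
ys = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]  # hypothetical data
print(breusch_pagan(xs, ys))
```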

Another approach to testing heteroscedasticity is to model the variance function $\operatorname{Var}(Y \mid X=x)=g(x, \theta)$ explicitly within a model that uses a reasonable (perhaps nonnormal) distribution for $Y \mid X=x$, then to estimate the model using maximum likelihood, and then to test for constant variance in the context of that model using the likelihood ratio test. This approach is better than the residual-based tests because it identifies the nature of the heteroscedasticity explicitly, which may be an end in itself in your research. It is also better because you can use the resulting heteroscedastic variance function $g(x, \theta)$ to obtain weighted least-squares (WLS) estimates of the $\beta$'s that are better (more precise) than the ordinary least-squares (OLS) estimates. Chapter 12 discusses these issues further.
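As a sketch of the WLS idea, suppose (hypothetically) that the variance function is known to be $g(x, \theta) = \theta x^2$. Weighting each observation by $1/g(x_i, \theta)$ gives WLS estimates. The Monte Carlo comparison below, under assumed values for the model and the sample size, shows the WLS slope varying less from sample to sample than the OLS slope. In practice $\theta$ would have to be estimated (for example, by maximum likelihood), which this sketch does not do.

```python
import random
import statistics

def wls_line(xs, ys, ws):
    """Weighted least squares for y = b0 + b1 * x with weights ws."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def g(x, theta):
    """Hypothetical variance function Var(Y | X = x) = theta * x**2."""
    return theta * x ** 2

def slope_spread(n_reps=500, n=50, theta=1.0, seed=3):
    """Monte Carlo standard deviations of the OLS and WLS slope estimates."""
    rng = random.Random(seed)
    ols, wls = [], []
    for _ in range(n_reps):
        xs = [rng.uniform(1, 10) for _ in range(n)]
        ys = [1 + 2 * x + rng.gauss(0, g(x, theta) ** 0.5) for x in xs]
        ols.append(wls_line(xs, ys, [1.0] * n)[1])  # equal weights = OLS
        wls.append(wls_line(xs, ys, [1 / g(x, theta) for x in xs])[1])
    return statistics.stdev(ols), statistics.stdev(wls)

print(slope_spread())  # (sd of OLS slopes, sd of WLS slopes)
```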

Evaluating the Linearity Assumption Using Graphical Methods

While we are not big fans of data analysis “recipes,” in regression or elsewhere, which instruct you to perform step 1, step 2, step 3, etc. for the analysis of your data, we are happy to recommend the following first step for the analysis of regression data.
Step 1 of any analysis of regression data
Plot the ordinary $\left(x_i, y_i\right)$ scatterplot, or scatterplots if there are multiple $X$ variables.
The simple $\left(x_i, y_i\right)$ scatterplot gives you immediate insight into the viability of the linearity, constant variance, and normality assumptions (see Section $1.8$ for examples of such scatterplots). It will also alert you to the presence of outliers.

To evaluate linearity using the $\left(x_i, y_i\right)$ scatterplot, simply look for evidence of curvature. You can overlay the LOESS fit to better estimate the form of the curvature. Recall, though, that all assumptions refer to the data-generating process. Thus, if you are going to claim there is curvature, such curvature should make sense in the context of the subject matter. For one example, boundary constraints can force curvature: If the minimum $Y$ is zero, then the curve must flatten for $X$ values where $Y$ is close to zero. For another example, in the case of the product preference vs. product complexity shown in Figure 1.16, there is a subject matter rationale for the curvature: People prefer more complexity up to a point, after which more complexity is less desirable. Ideally, you should be able to justify curvature in terms of the processes that produced your data.

A refinement of the $\left(x_i, y_i\right)$ scatterplot is the residual $\left(x_i, e_i\right)$ scatterplot. This scatterplot is an alternative, “magnified” view of the $\left(x_i, y_i\right)$ scatterplot, where the $e=0$ horizontal line in the $\left(x_i, e_i\right)$ scatterplot corresponds to the least-squares line in the $\left(x_i, y_i\right)$ scatterplot. Look for an upward or downward “U” shape, which suggests curvature; overlay the LOESS fit to the $\left(x_i, e_i\right)$ data to help see these patterns.
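A small noiseless illustration of the upward “U” pattern, assuming the mean function $\mathrm{E}(Y \mid X=x)=x^2$ on $x=0,1,\ldots,10$: fitting a straight line to a convex mean function leaves positive residuals at both ends of the $x$-range and negative residuals in the middle.

```python
import statistics

def residuals_from_line(xs, ys):
    """Residuals e_i from the least-squares line fitted to (x_i, y_i)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = list(range(11))
ys = [x ** 2 for x in xs]  # convex (curved) mean function, no noise
e = residuals_from_line(xs, ys)
# Signs run +, -, + across x: an upward "U" in the (x_i, e_i) plot.
print([round(v, 1) for v in e])
```

Here the least-squares line is $\hat{y} = -15 + 10x$, so the residuals are exactly $x^2 - 10x + 15$.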

You can also use the $\left(\hat{y}_i, e_i\right)$ scatterplot to check the linearity assumption. In simple regression (i.e., one $X$ variable), the $\left(\hat{y}_i, e_i\right)$ scatterplot is identical to the $\left(x_i, e_i\right)$ scatterplot, except that the horizontal scale is linearly transformed via $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_i$. When the estimated slope is negative, the horizontal axis is “reflected”: large values of $x$ map to small values of $\hat{y}_i$ and vice versa. You can use this plot just like the $\left(x_i, e_i\right)$ scatterplot. In simple regression, the $\left(\hat{y}_i, e_i\right)$ scatterplot offers no advantage over the $\left(x_i, e_i\right)$ scatterplot. However, in multiple regression, the $\left(\hat{y}_i, e_i\right)$ scatterplot is invaluable as a quick look at the overall model, since there is just one $\left(\hat{y}_i, e_i\right)$ plot to look at, instead of several $\left(x_{ij}, e_i\right)$ plots (one for each $X_j$ variable). This $\left(\hat{y}_i, e_i\right)$ scatterplot, which you can call a “predicted/residual scatterplot,” is automatically provided by $\mathrm{R}$ when you plot a fitted lm object.

Evaluating the Linearity Assumption Using Hypothesis Testing Methods

Here, we will get slightly ahead of the flow of the book, because multiple regression is covered in the next chapter. A simple, powerful way to test for curvature is to use a multiple regression model that includes a quadratic term. The quadratic regression model is given by:
$$Y=\beta_0+\beta_1 X+\beta_2 X^2+\varepsilon$$
This model assumes that, if there is curvature, then it takes a quadratic form. The logic for making this assumption comes from Taylor's Theorem, which implies that smooth curved functions are well approximated, at least locally, by quadratic functions.
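One way to see the connection to Taylor's Theorem is to expand a smooth mean function $f(x)=\mathrm{E}(Y \mid X=x)$ to second order around a convenient point $x_0$ (for example, the mean of the observed $x$ values) and collect powers of $x$:

$$
\begin{aligned}
f(x) &\approx f(x_0)+f'(x_0)(x-x_0)+\tfrac{1}{2}f''(x_0)(x-x_0)^2\\
&=\underbrace{\left[f(x_0)-f'(x_0)x_0+\tfrac{1}{2}f''(x_0)x_0^2\right]}_{\beta_0}
+\underbrace{\left[f'(x_0)-f''(x_0)x_0\right]}_{\beta_1}x
+\underbrace{\tfrac{1}{2}f''(x_0)}_{\beta_2}x^2 .
\end{aligned}
$$

The quadratic coefficient therefore captures the local curvature $f''(x_0)$, and $\beta_2=0$ corresponds to $f''(x_0)=0$. The approximation is local, so it is most trustworthy near $x_0$.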

Testing methods require restricted (null) and unrestricted (alternative) models. Here, the null model enforces the restriction that $\beta_2=0$; thus the null model states that the mean response is a linear (not curved) function of $x$. So-called “insignificance” (determined historically by $p>0.05$ ) of the estimate of $\beta_2$ means that the evidence of curvature in the observed data, as indicated by a non-zero estimate of $\beta_2$ or by a curved LOESS fit, is explainable by chance alone under the linear model. “Significance” (determined historically by $p<0.05$ ) means that such evidence of curvature is not easily explained by chance alone under the linear model.
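A minimal Python sketch of this test: fit the quadratic model by OLS through the normal equations, then compute the $t$ statistic for $\hat{\beta}_2$ (the unrestricted model versus the null restriction $\beta_2=0$). The data-generating process below is hypothetical, and the p-value uses a normal approximation; under the classical model the exact reference distribution is $t$ with $n-3$ degrees of freedom.

```python
import math
import random
import statistics

def invert(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        scale = a[c][c]
        a[c] = [v / scale for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]

def quadratic_curvature_test(xs, ys):
    """OLS fit of Y = b0 + b1*X + b2*X**2; returns b2, its t statistic, and a
    two-sided normal-approximation p-value for the null model beta2 = 0."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    sse = sum((y - (b[0] + b[1] * r[1] + b[2] * r[2])) ** 2 for r, y in zip(rows, ys))
    s2 = sse / (len(xs) - 3)           # residual variance estimate
    se_b2 = math.sqrt(s2 * inv[2][2])  # standard error of b2
    t = b[2] / se_b2
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return b[2], t, p

rng = random.Random(11)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1 + 2 * x - 0.3 * x ** 2 + rng.gauss(0, 1) for x in xs]  # curved DGP
b2, t, p = quadratic_curvature_test(xs, ys)
print(f"b2 = {b2:.3f}, t = {t:.1f}, p = {p:.2g}")
```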

But you should not take the result of this $p$-value based test as a “recipe” for model construction. If “significant,” you should not automatically assume a curved model. Instead, you should ask, “Is the curvature dramatic enough to warrant the additional modeling complexity?” and “Do the predictions differ much, whether you use a model for curvature or the ordinary linear model?” If the answers to those questions are “No,” then you should use the linear model anyway, even if it was “rejected” by the $p$-value based test.

In addition, models employing curvature (particularly quadratics) are notoriously poor at the extremes of the $x$-range(s). So again, you can easily prefer the linear model, even if the curvature is “significant” $(p<0.05)$.

Conversely, if the quadratic term is “insignificant,” it does not mean that the function is linear. Recall from Chapter 1 that the linearity assumption is usually false a priori; hence, “insignificance” means only that you have failed to detect curvature. If the test for the quadratic term is “insignificant,” the result is most likely a Type II error.

Even when the curvature does not have a perfectly quadratic form, the quadratic test is usually very powerful; rare exceptions include cases where the curvature is somewhat exotic. If the quadratic model is grossly wrong for modeling curvature in your application, then you should use a test based on a model other than the quadratic model.

Descriptive Methods Versus Testing Methods for Checking Assumptions

One benefit of using graphical/descriptive methods to check assumptions, rather than hypothesis testing ( $p$-value based) methods, is transparency: The graphs show the data, as they are. The $p$-values of the statistical tests give information that is distorted by the sample size. Another benefit is that you can determine the practical significance of a result using graphical methods and descriptive statistics, but not by statistical tests and their $p$-values. Tests can tell you whether a result is statistically significant (again, historically, $p<0.05$ ), but statistically significant results can be practically unimportant, and vice versa, because of the sample size distortion. Unlike statistical tests of assumptions, larger sample sizes always point you closer to the best answer when you use well-chosen graphs and descriptive statistics.
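To illustrate the sample-size distortion, the sketch below holds a practically negligible trend fixed (a sample correlation of $r = 0.05$ between, say, $\left|e_i\right|$ and $\hat{y}_i$, an arbitrary illustrative value) and computes the usual test p-value at three sample sizes, using $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with a normal approximation.

```python
import math
import statistics

def p_value_for_correlation(r, n):
    """Two-sided p-value for H0: no linear trend, given sample correlation r
    and sample size n (t statistic, normal approximation)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

# The same, practically negligible trend (r = 0.05) at three sample sizes.
for n in (50, 500, 5000):
    print(n, round(p_value_for_correlation(0.05, n), 4))
```

The trend is identical in all three cases; only $n$ changes, yet the p-value moves from clearly “insignificant” to clearly “significant.”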

But care is needed in interpreting and constructing graphs. Interpreting graphs requires practice, judgment, and some knowledge of statistics. In addition, producing good graphs requires skill, practice, and in some cases, an artistic eye. A classic and very helpful text on the use and construction of statistical graphics is The Visual Display of Quantitative Information, by Edward Tufte (Tufte 2001).

The only good thing about tests is that they answer the question, “Is the apparent deviation from the assumption that is seen in the data explainable by chance alone?” The question of whether a result is explainable by chance alone is indeed important because researchers are prone to over-interpret idiosyncratic (chance) aspects of their data. Hypothesis testing provides a reality check to guard against such over-interpretation. But other methods, simulation in particular, are better for assessing the effects of chance deviation. Hence, $p$-value based hypothesis testing methods are not even needed for their one use, which is to assess the effect of chance variation.
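As an example of using simulation to assess chance, the sketch below computes a Monte Carlo p-value for the Glejser slope: it fits the homoscedastic linear model, simulates many data sets from it (normal errors with the residual standard deviation, an assumption of this sketch), and counts how often the simulated slope is at least as extreme as the observed one. Plotting the simulated slopes, or their LOESS curves, is often more informative than the single number.

```python
import random
import statistics

def ols_line(xs, ys):
    """Least-squares intercept and slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1

def glejser_slope(xs, ys):
    """Slope from regressing |e_i| on yhat_i after the OLS fit of y on x."""
    b0, b1 = ols_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    return ols_line(fitted, [abs(y - f) for y, f in zip(ys, fitted)])[1]

def simulated_p_value(xs, ys, n_sims=500, seed=0):
    """Share of homoscedastic simulated data sets whose |Glejser slope| is at
    least as large as the observed one (a Monte Carlo p-value)."""
    b0, b1 = ols_line(xs, ys)
    sd = statistics.stdev([y - b0 - b1 * x for x, y in zip(xs, ys)])
    observed = abs(glejser_slope(xs, ys))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sim = [b0 + b1 * x + rng.gauss(0, sd) for x in xs]  # one common sd
        if abs(glejser_slope(xs, sim)) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)  # +1 keeps the estimate away from zero

rng = random.Random(42)
xs = [rng.uniform(1, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]  # hypothetical heteroscedastic data
print("Monte Carlo p-value:", simulated_p_value(xs, ys))
```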

Tests of model assumptions have been used for much of statistical history and are still used today in some quarters. Perhaps the main reason for their historical persistence is simplicity. Researchers have routinely applied the rule, “p-value greater than $0.05 \rightarrow$ assumption is satisfied; $p$-value less than $0.05 \rightarrow$ assumption is not satisfied,” because it is simple, despite it being a horribly misguided practice. We have already mentioned many concerns with tests, but here they are, in set-off form, so that you can easily refer to them.

Which Assumptions Should You Evaluate First

We suggest (only mildly; this is not a hard-and-fast rule) that you evaluate the linearity and constant variance assumptions first. The reason is that, for checking the assumptions of independence and normality, you often will use the residuals $e_i=y_i-\hat{y}_i$, where the predicted values $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_i$ are based on the linear fit. If the assumption of linearity is badly violated, then these estimated residuals will be badly biased. In such a case you should evaluate the normality and independence assumptions by first fitting a more appropriate (non-linear) model, and then by using that model to calculate the predicted values and associated residuals.

Furthermore, if the linearity assumption is reasonably valid but the homoscedasticity (constant variance) assumption is violated, then the residuals $e_i$ will automatically look non-normal, even when the conditional distributions $p(y \mid x)$ are normal. This happens because some residuals come from distributions with larger variance and some from distributions with smaller variance, lending a heavy-tailed appearance to the pooled $\left\{e_i\right\}$ data. For these reasons, we mildly suggest that you evaluate the assumptions in the order (1) linearity, (2) constant variance, (3) independence, and (4) normality. But there are cases where this sequence is logically flawed, so please just treat it as one of those “ugly rules of thumb.”
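A quick numerical check of this point, under an assumed setup in which half the residuals come from $N(0,1)$ and half from $N(0,3^2)$ (both normal). The pooled values have kurtosis near $\frac{3(1^4+3^4)/2}{\left((1^2+3^2)/2\right)^2}=\frac{123}{25}\approx 4.92$, well above the normal value of 3, even though every conditional distribution is normal.

```python
import random
import statistics

def kurtosis(values):
    """Sample kurtosis m4 / m2**2 (about 3 for normally distributed data)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m4 = statistics.fmean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2

rng = random.Random(7)
# Every conditional distribution is normal; only the spread differs.
pooled = ([rng.gauss(0, 1) for _ in range(10_000)]
          + [rng.gauss(0, 3) for _ in range(10_000)])
homosced = [rng.gauss(0, 2) for _ in range(20_000)]  # one common spread
print(round(kurtosis(homosced), 2), round(kurtosis(pooled), 2))
```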

The Trashcan Experiment: Random-$X$ Versus Fixed-$X$

Here is something you can do (or at least imagine doing) with a group of people. You need a crumpled piece of paper (call it a “ball”), a tape measure, and a clean trashcan. Let each person attempt to throw the ball into the trashcan. The goal of the study is to identify the relationship between success at throwing the ball into the trashcan $(Y)$ and distance from the trashcan $(X)$.

In a fixed-$X$ version of the experiment, place markers 5 feet, 10 feet, 15 feet and 20 feet from the trashcan. Have all people attempt to throw the ball into the trashcan from all those distances. Here the $X$'s are fixed because they are known in advance. If you imagine doing another experiment just like this one (say in a different class), then the $X$'s would be the same: 5, 10, 15 and 20.

In a random-$X$ version of the same experiment, you give a person the ball, then tell the person to pick a spot where he or she thinks the probability of making the shot might be around $50 \%$. Have the person attempt to throw the ball into the trashcan multiple times from the distance he or she selected. Repeat for all people, letting each person pick where they want to stand. Here the $X$'s are random because they are not known in advance. If you imagine doing another experiment just like this one (say in a different class), then the $X$'s would be different because different people will choose different places to stand.

The fixed-$X$ version gives rise to experimental data. In experiments, the experimenter first sets the $X$ and then observes the $Y$. The random-$X$ version gives rise to observational data, where the $X$'s are simply observed, and not controlled by the researcher.

Experimental data are the gold standard because, with observational data, the observed effect of $X$ on $Y$ may not be a causal effect. With experimental data, the observed effect of $X$ on $Y$ can be more easily interpreted as a causal effect. Issues of causality are discussed in more detail in later chapters.

The Production Cost Data and Analysis

A company produces items, classically called “Widgets,” in batches. Here $Y=$ production cost for a given job (a batch), and $X=$ the number of “Widgets” produced during that job. It stands to reason that it will cost more to produce more widgets. Regression is used to clarify the nature of this relationship by identifying the additional cost per widget produced (called “variable cost”), and the set-up cost for any job, regardless of the number of widgets produced (called “fixed cost”).
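As an illustration of how the fixed and variable costs are read off a fitted line, the sketch below uses synthetic jobs (not the book's production cost data), generated with an assumed fixed cost of 500 and variable cost of 12 per widget plus noise. The OLS intercept estimates the fixed cost and the slope estimates the variable cost.

```python
import random
import statistics

def fixed_and_variable_cost(widgets, costs):
    """OLS of cost on widgets: intercept = fixed (set-up) cost,
    slope = variable cost per widget."""
    mx, my = statistics.fmean(widgets), statistics.fmean(costs)
    slope = (sum((x - mx) * (y - my) for x, y in zip(widgets, costs))
             / sum((x - mx) ** 2 for x in widgets))
    return my - slope * mx, slope

# Synthetic jobs (hypothetical): fixed cost 500, variable cost 12 per widget.
rng = random.Random(5)
widgets = [rng.randint(10, 200) for _ in range(30)]
costs = [500 + 12 * w + rng.gauss(0, 20) for w in widgets]
fixed, variable = fixed_and_variable_cost(widgets, costs)
print(f"fixed cost ~ {fixed:.1f}, variable cost ~ {variable:.2f} per widget")
```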

The following $\mathrm{R}$ code (i) reads the data, (ii) draws a scatterplot of the (Widgets, Cost) data, (iii) adds the best-fitting straight line to the data, and (iv) summarizes the results of the linear regression analysis. Do not worry if you do not understand everything at this time; it will all be explained in detail later.

Models and Generalization

The model $p(y \mid x)$ is the model for these processes; therefore, the data specifically target $p(y \mid x)$.

Depending on the context of the study, these data-producing processes may involve biology, psychology, sociology, economics, physics, etc. The processes that produce the data also involve the measurement processes: If the measurement process is faulty, then the data will provide misleading information about the real, natural processes, because, as the note in the box above states, the data target the processes that produced the data. In addition to natural and measurement processes, the process also involves the type of observations sampled, where they are sampled, and when they are sampled. This ensemble of processes that produces the data is called the data-generating process, abbreviated DGP.
Consider the (Age, Assets) example introduced in the previous section. Suppose you have such data from a Dallas, Texas-based retirement planning company’s clientele, from the year 2003. The processes that produced these data include people’s asset accrual habits, the socio-economic nature of the clientele, the method of measurement (survey or face-to-face interview), extant macroeconomic conditions in the year 2003, and regional effects specific to Dallas, Texas. All of these processes, as well as any others we might have missed, collectively define the data-generating process (DGP).

The regression model $Y \mid X=x \sim p(y \mid x)$ is a model for the DGP. Like all models, this model allows generalization. Not only does the model explain how the actual data you collected came to be, it also generalizes to an infinity (or near infinity) of other data values that you did not collect. To visualize such “other data,” consider the (Age, Assets) example of the preceding paragraph, and imagine being back in the year 1998, well before the data collection in 2003. Envision the (Age, Assets) data that might be collected in 2003, from your standpoint in 1998. There are nearly infinitely many potentially observable data values. Do you see? The regression model $\text{Assets} \mid \text{Age}=x \sim p(\text{Assets} \mid \text{Age}=x)$ describes not only how the actual 2003 data arose, but also all the other potentially observable data that could have arisen. Thus, the model generalizes beyond the observed data to the potentially observable data.

The “Population” Terminology and Reasons Not to Use It

In the previous section, we emphasized that a regression model is a model for the data-generating process, which comprises measurement, scientific, and other processes at the given time and place of data collection. Some sources describe regression (and other statistical) models in terms of “populations” instead of “processes.” The “population” framework states that $p(y \mid x)$ is defined in terms of a finite population of values from which $Y$ is randomly sampled when $X=x$. This terminology is flawed in most statistics applications, but is especially flawed in regression; in this section, we explain why.

Suppose you are interested in estimating the mean amount of charitable contributions $(Y)$ that one might claim on a U.S. tax return, as a function of taxpayer income $(X=x)$. This mean value is denoted by $\mathrm{E}(Y \mid X=x)$, and is mathematically calculated either by $\mathrm{E}(Y \mid X=x)=\int_{\text {all } y} y p(y \mid x) d y$ when $p(y \mid x)$ is a continuous distribution, or by $\mathrm{E}(Y \mid X=x)=\sum_{\text {all } y} y p(y \mid x)$ when $p(y \mid x)$ is a discrete distribution.
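The two formulas can be checked numerically. The sketch below uses two hypothetical conditional distributions at a single value $x$: a Poisson $p(y \mid x)$ with mean $0.5+0.1x$ (the sum, truncated where the tail is negligible) and a normal $p(y \mid x)$ with mean $3+0.2x$ (the integral, approximated by the midpoint rule). Both mean functions are made up for illustration.

```python
import math
import statistics

def discrete_conditional_mean(pmf, y_max):
    """E(Y | X = x) = sum of y * p(y | x) over y = 0..y_max (tail assumed negligible)."""
    return sum(y * pmf(y) for y in range(y_max + 1))

def continuous_conditional_mean(pdf, lo, hi, steps=20_000):
    """E(Y | X = x) = integral of y * p(y | x) dy, midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    mids = (lo + (k + 0.5) * h for k in range(steps))
    return h * sum(y * pdf(y) for y in mids)

x = 20.0             # hypothetical value of X
lam = 0.5 + 0.1 * x  # hypothetical Poisson mean function

def poisson_pmf(y):
    return math.exp(-lam) * lam ** y / math.factorial(y)

normal = statistics.NormalDist(mu=3 + 0.2 * x, sigma=1.5)  # hypothetical normal p(y | x)

print(discrete_conditional_mean(poisson_pmf, 100))           # close to lam = 2.5
print(continuous_conditional_mean(normal.pdf, -20.0, 40.0))  # close to mu = 7.0
```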

To estimate $\mathrm{E}(Y \mid X=x)$, you obtain a random sample of all taxpayers by (a) identifying the population of all taxpayers (maybe you work at the IRS!), and (b) using a computer random number generator to select a random sample from this population.

Because each taxpayer is randomly sampled, it is correct to infer that the observed $Y$ values in your sample for which $X=\$1{,}000{,}000.00$ are a random sample from the subpopulation of U.S. taxpayers having $X=\$1{,}000{,}000.00$. However, in regression analysis, the distribution of this subpopulation of $Y$ values is not what is usually meant by $p(y \mid x)$.

Introduction to Regression Models

Regression models are used to relate a variable, $Y$, to a single variable $X$, or to multiple variables, $X_{1}, X_{2}, \ldots, X_{k}$.

• How does a person’s choice of toothpaste $(Y)$ relate to the person’s age $\left(X_{1}\right)$ and income $\left(X_{2}\right)$ ?
• How does a person’s cancer remission status $(Y)$ relate to their chemotherapy regimen $(X)$ ?
• How does the number of potholes in a road $(Y)$ relate to the material used in surfacing $\left(X_{1}\right)$ and time since installation $\left(X_{2}\right)$ ?
• How does a person’s ability to repay a loan $(Y)$ relate to the person’s income $\left(X_{1}\right)$, assets $\left(X_{2}\right)$, and debt $\left(X_{3}\right)$ ?
• How does a person’s intent to purchase a technology product $(Y)$ relate to their perceived usefulness $\left(X_{1}\right)$ and perceived ease of use of the product $\left(X_{2}\right)$ ?
• How does today’s return on the S&P 500 stock index $(Y)$ relate to yesterday’s return $(X)$ ?
• How does a company’s profitability $(Y)$ relate to its investment in quality management $(X)$ ?
Understanding such relationships can help you to predict what an unknown $Y$ will be for a given fixed value of $X$, it can help you to make decisions as to what course of action you should choose, and it can help you to understand the subject that you are studying in a scientific way.

Regression models can help you to forecast the future as well. Forecasting is a special case of prediction: Forecasting means prediction of the future, while prediction includes any type of “what-if” analysis, not only about what might happen in the future, but also about what might have happened in the past under different circumstances.
In some subjects, you learn to make predictions using equations such as
$$Y=f(X),$$
where the function $f$ might be a linear, quadratic, exponential, or logarithmic function; or it might not have any “named” function form at all. In all cases, though, this is a deterministic relationship: Given a particular value, $x$, of the variable $X$, the value of $Y$ is completely determined by $Y=f(x)$.

Notice that there is a distinction between upper-case $X$ and lower-case $x$. The convention followed in this book regarding lower-case and upper-case $Y$ and $X$ is standard: Uppercase refers to the variable in general, which can be many different possible values, while lower-case refers to a specific value of the variable. For example, $X=$ Age can be many different values in general, whereas $X=x$ identifies the subset of people having age $x$, e.g., the subset of people who are 25 years old.

Randomness of the Measured Area of a Circle as Related to Its Measured Radius

A circle in nature has its radius $(X)$ measured. Suppose the measurement is $X=10$ meters. Still, there are many potentially observable measurements of its area, $Y$, due to imperfections in the circle and imperfections in the measuring device. The regression model states that $Y$ is a random observation from the conditional distribution $p(y \mid X=10)$.

This model is reasonable, because it perfectly matches the reality that there are many potentially observable measurements of the area $Y$, even when the radius $X$ is measured to be precisely 10 meters. Note that this model does not say anything about the mean of the distribution $p(y \mid x)$ : It might be $3.14159265 x^{2}$, but it is more likely not, because of biases in the data-generating process (again, this data-generating process includes imperfections in circles, and also imperfections in the measuring devices).

This model also does not say anything about the nature of the probability distributions $p(y \mid x)$, whether they are discrete, continuous, normal, lognormal, etc., or even whether you have one type of distribution for one $x$ (e.g., normal) and another for a different $x$ (e.g., Poisson). It simply says there is a distribution $p(y \mid x)$ of potential outcomes of $Y$ when $X=x$, and that the number you measure will appear as if produced at random from this distribution, i.e., as if simulated using a random number generator. As such, there is no arguing with the model: the measured data really will look this way (random, variable), so you may even say that this model is a correct model because it produces data that are random and variable (non-deterministic). Further, because the model is so general, no data can ever contradict, or “reject,” it.

Thus, the model $p(y \mid x)$ is correct. It is only when you make assumptions about the nature of $p(y \mid x)$, for example, about the specific distributions (e.g., normal), and about how the distributions are related to $x$ (e.g., linearly), that you must consider that the model is wrong in certain ways.

The model $p(y \mid x)$ does not require that the distribution of $Y$ change for different values of $X$. If the distributions $p(y \mid x)$ are the same, for all values $X=x$, then, by definition, $Y$ is independent of $X$. In the example above, one may logically assume that the distributions of $Y$ (measured area) will differ greatly for different $X$ (measured radius), and that $Y$ and $X$ are thus strongly dependent.

The following $\mathrm{R}$ code and resulting graph of Figure $1.1$ illustrate how the distributions of area $(Y)$ might look for circles whose radius $(X)$ is measured to be $9.0$ meters versus circles whose radius is measured to be $10.0$ meters. In this example, we assume that $p(y \mid x)$ is a normal distribution with mean $\pi x^{2}$ meters ${ }^{2}$ and standard deviation of 1 meter $^{2}$.
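As a rough Python counterpart to the R code mentioned above (not a reproduction of it), the sketch below uses the same stated assumptions, a normal $p(y \mid x)$ with mean $\pi x^2$ and standard deviation 1 m$^2$, and summarizes the two conditional distributions numerically instead of plotting them. It also draws a few “potentially observable” measured areas for a circle whose radius is measured to be 10 m.

```python
import math
import random
import statistics

# Assumed, as in the text: p(y | x) is normal with mean pi * x**2 and sd 1 (m^2).
areas = {x: statistics.NormalDist(mu=math.pi * x ** 2, sigma=1.0) for x in (9.0, 10.0)}

for x, dist in areas.items():
    lo, hi = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
    print(f"radius {x:4.1f} m: mean area {dist.mean:.2f} m^2, "
          f"95% of measured areas in [{lo:.2f}, {hi:.2f}]")

# The two conditional distributions essentially do not overlap.
print("overlap:", areas[9.0].overlap(areas[10.0]))

# "As if simulated using a random number generator": five potential measurements.
rng = random.Random(0)
print([round(rng.gauss(areas[10.0].mean, areas[10.0].stdev), 2) for _ in range(5)])
```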

Comparisons between ESF and SAR model specification

The simplest version of MESF accounts for SA by including a nonconstant mean in a regression model. The spatial SAR specification does this as well by including the term $\left[(1-\rho) \beta_{0} \mathbf{1}+\rho \mathbf{W Y}\right]$, where $\beta_{0}$ denotes the intercept term and $\mathbf{1}$ is a vector of ones. The pure SA SAR model is specified as
$$\mathbf{Y}=(1-\rho) \beta_{0} \mathbf{1}+\rho \mathbf{W Y}+\boldsymbol{\varepsilon},$$
employing the row-standardized version of matrix $\mathbf{C}$, namely, matrix $\mathbf{W}$. For the Box-Cox transformed PD studied in this chapter, the maximum likelihood estimate of the SA parameter is $\hat{\rho}=0.70120$. This SA term accounts for about $48.1 \%$ of the variance in the Box-Cox transformed PD across Texas. This percentage is less than the $62 \%$ for the ESF specification, in part because the SAR specification includes all, not only the relevant subset of, eigenvectors, introducing some noise into its estimation. Meanwhile, the SAR residual Shapiro-Wilk statistic, $0.96204$, is statistically significant $(p<0.0001)$. Both Getis and Griffith (2002) and Thayn and Simanis (2013) present comparisons of spatial autoregressive and ESF analyses. An ESF specification frequently outperforms a spatial autoregressive specification.
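A small sketch of the pure SA SAR model, assuming a toy geography of $n$ areal units on a ring (each unit's two neighbors weighted $1/2$, so $\mathbf{W}$ is row-standardized) rather than the Texas county map. Because $|\rho|<1$ and $\mathbf{W}$ is row-standardized, $\mathbf{Y}$ can be computed by iterating $\mathbf{Y} \leftarrow (1-\rho)\beta_0\mathbf{1} + \rho\mathbf{W}\mathbf{Y} + \boldsymbol{\varepsilon}$ to convergence instead of inverting $\mathbf{I}-\rho\mathbf{W}$. Moran's I of the result summarizes the induced spatial autocorrelation. The value $\rho = 0.70120$ is borrowed from the text; $\beta_0$, the ring geography, and the errors are hypothetical.

```python
import random

def ring_weights(n):
    """Row-standardized W for n units on a ring: two neighbors, weight 1/2 each."""
    return [{(i - 1) % n: 0.5, (i + 1) % n: 0.5} for i in range(n)]

def simulate_sar(w, rho, beta0, eps, tol=1e-12, max_iter=10_000):
    """Solve Y = (1 - rho) beta0 1 + rho W Y + eps by fixed-point iteration.
    Converges for |rho| < 1 because W is row-standardized."""
    y = [beta0] * len(eps)
    for _ in range(max_iter):
        new = [(1 - rho) * beta0 + rho * sum(wij * y[j] for j, wij in row.items()) + e
               for row, e in zip(w, eps)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("fixed-point iteration did not converge")

def morans_i(w, y):
    """Moran's I for a row-standardized W (the weights then sum to n)."""
    m = sum(y) / len(y)
    z = [v - m for v in y]
    num = sum(zi * sum(wij * z[j] for j, wij in row.items()) for row, zi in zip(w, z))
    return num / sum(zi * zi for zi in z)

rng = random.Random(1)
n = 200
w = ring_weights(n)
eps = [rng.gauss(0, 1) for _ in range(n)]
y = simulate_sar(w, rho=0.70120, beta0=4.0, eps=eps)  # beta0 is hypothetical
print("Moran's I:", round(morans_i(w, y), 3))
```

With $\boldsymbol{\varepsilon}=\mathbf{0}$ the solution is $\mathbf{Y}=\beta_0\mathbf{1}$, because $\mathbf{W}\mathbf{1}=\mathbf{1}$ for a row-standardized $\mathbf{W}$.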
Perhaps one of the greatest advantages MESF has vis-à-vis spatial autoregression is its ability to visualize the SA latent in a georeferenced attribute variable. It also has implementation advantages for generalized linear models (GLMs; see Chapter 5 ).

Simulation experiments based upon ESFs

Griffith (2017) argues that MESF is superior to spatial autoregression for spatial statistical simulation experiments because it preserves an underlying map pattern and is characterized by constant variance; in other words, it supports conditional geospatial simulations. A spatial analyst can undertake a simulation experiment employing MESF in one of the following three ways: (1) draw a random error term from a normal distribution with mean zero and variance equal to the linear regression mean squared error; (2) randomly permute the n residuals calculated with linear regression estimation; and, (3) randomly sample, with replacement, the n residuals from the linear regression estimation (similar to bootstrapping). Each of these three strategies was used to perform a sensitivity analysis simulation for the ESF constructed in Section 3.2.2. Each simulation experiment involved 10,000 replications (to profit from the Law of Large Numbers).
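The three error-generation strategies can be sketched as follows. This is a minimal NumPy illustration, not Griffith's code. It assumes you already have the fitted values (intercept + ESF) and the OLS residuals, and the helper names and the `n_params` argument (the number of estimated coefficients, used for the MSE degrees of freedom) are hypothetical.

```python
import numpy as np


def simulated_errors(residuals, n_params, strategy, rng):
    """Draw one set of n error terms for a single MESF simulation replicate.

    "normal":   N(0, MSE) draws, with MSE = SSE / (n - n_params)
    "permute":  a random permutation of the n OLS residuals
    "resample": n draws from the residuals with replacement (bootstrap-like)
    """
    e = np.asarray(residuals, dtype=float)
    n = e.size
    if strategy == "normal":
        mse = (e @ e) / (n - n_params)
        return rng.normal(0.0, np.sqrt(mse), size=n)
    if strategy == "permute":
        return rng.permutation(e)
    if strategy == "resample":
        return rng.choice(e, size=n, replace=True)
    raise ValueError(f"unknown strategy: {strategy!r}")


def simulate_responses(fitted, residuals, n_params, strategy, n_reps, seed=0):
    """(n_reps, n) array: fixed fitted mean (e.g., intercept + ESF) plus new errors."""
    rng = np.random.default_rng(seed)
    fitted = np.asarray(fitted, dtype=float)
    return np.array([
        fitted + simulated_errors(residuals, n_params, strategy, rng)
        for _ in range(n_reps)
    ])


# Hypothetical inputs (not the Texas data): 254 units, intercept + 26 eigenvectors.
resid = np.random.default_rng(1).normal(size=254)
fitted = np.full(254, 4.40986)
sims = simulate_responses(fitted, resid, n_params=27, strategy="permute", n_reps=1000)
```

Setting `n_reps=10_000` would match the replication count described above; the map pattern is preserved because only the error term changes between replicates.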

The first simulation experiment added random noise $\varepsilon_{i} \sim \mathrm{N}\left(0,1.24350^{2}\right)$, $i=1,2,\ldots,254$, to the ESF + intercept term (i.e., $4.40986$). The simulation mean of the map averages (based upon sets of 254 $\varepsilon_{i}$) is $-0.00045$; the simulation mean of the map variances is $1.15826$. Fig. 3.5A portrays the simulated mean map pattern for the simulated log-transformed PD values; it is essentially identical to the map pattern in Fig. 3.1B. The variances for the individual county simulations span the range from $1.13704^{2}$ to $1.18137^{2}$; the F-ratio for these two extreme variances is $1.08$, which is not statistically significant, yielding a single variance class (Fig. 3.5B). One important advantage of MESF vis-à-vis spatial autoregression-based simulation experiments is that the variance is constant across a geographic landscape, which is not the case for spatial autoregression (see Griffith, 2017). The simulation mean $\mathrm{R}^{2}$ value is $0.6699$, which is somewhat greater than the actual $\mathrm{R}^{2}$ value. Meanwhile, the simulation mean Shapiro-Wilk probability is $0.50136$.
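The reported F-ratio is simply the larger extreme county variance divided by the smaller one; the arithmetic, using the two standard deviations quoted above, is:

```python
# Ratio of the two extreme simulated county variances reported in the text.
largest_sd, smallest_sd = 1.18137, 1.13704
f_ratio = largest_sd**2 / smallest_sd**2
print(round(f_ratio, 2))  # 1.08
```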

Table 3.1 tabulates the eigenvector selection significance level probabilities, $p_{\text{sig}}$, as well as the eigenvector selection simulation probabilities, $p_{\text{sim}}$. Using a $10\%$ level of significance as the selection criterion renders roughly a $10\%$ chance that some of the 52 eigenvectors not selected in the original analysis (the 78 candidates minus the 26 selected) are selected in a simulation analysis. The relationship between these two selection probabilities may be described as follows:
$$0.24\left(e^{-3.35\,p_{u}^{2.2}}-e^{-3.35}\right),\qquad \text{pseudo-}R^{2}\approx 1.0000$$

ESF prediction with linear regression

Prediction is a valuable use of linear regression and is alluded to by the PRESS statistic. Redundant attribute information (i.e., multicollinearity) with the covariates supports the prediction of the response variable; each of these predictions is a conditional mean (i.e., a regression fitted value) based upon the given covariates used to compute it. An extension of this prediction capability is to observations not included in the original sample; a set of estimated regression coefficients enables the calculation of a prediction with covariates measured for out-of-sample observations. These supplemental observations have an additional source of variation affiliated with them, namely, their own stochastic noise, which is not addressed during estimation of the already-calculated regression coefficients.
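The additional source of variation for out-of-sample observations can be made concrete with the standard OLS result that the prediction variance is $\sigma^2\left(1+\mathbf{x}_0^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}\mathbf{x}_0\right)$, whereas the variance of the estimated conditional mean omits the leading 1. This is a generic NumPy sketch, not specific to ESFs; in an ESF model the selected eigenvectors would simply be columns of `X`, and their values for the out-of-sample units are assumed to be available.

```python
import numpy as np


def predict_out_of_sample(X, y, X_new):
    """OLS fit of y on X, then prediction for the rows of X_new.

    Returns the conditional means, the variance of each estimated mean,
    and the prediction variance, which adds the new unit's own noise sigma^2.
    """
    X, y, X_new = (np.asarray(a, dtype=float) for a in (X, y, X_new))
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)                               # MSE
    leverage = np.einsum("ij,jk,ik->i", X_new, XtX_inv, X_new)  # x0'(X'X)^{-1}x0
    return X_new @ beta, sigma2 * leverage, sigma2 * (1.0 + leverage)


# Hypothetical data: intercept plus one covariate.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
pred, mean_var, pred_var = predict_out_of_sample(X, y, [[1.0, 0.0], [1.0, 1.0]])
```

The gap `pred_var - mean_var` is the same estimated $\sigma^2$ for every new unit: it is the stochastic noise that estimation of the coefficients cannot remove.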

Cross-validation offers an application of ESF prediction with linear regression. This prediction may be executed with the following modified pure SA linear regression specification when a single attribute variable value, $y_{\mathrm{m}}$, is missing:
$$\left(\begin{array}{c}\mathbf{Y}_{\mathrm{o}}\\ 0\end{array}\right)=\beta_{0}\mathbf{1}-y_{\mathrm{m}}\left(\begin{array}{c}\mathbf{0}_{\mathrm{o}}\\ 1\end{array}\right)+\sum_{k=1}^{K}\left(\begin{array}{c}\mathbf{E}_{\mathrm{o},k}\\ \mathbf{E}_{\mathrm{m},k}\end{array}\right)\beta_{\mathrm{E}_{k}}+\left(\begin{array}{c}\boldsymbol{\varepsilon}_{\mathrm{o}}\\ 0\end{array}\right),$$
where the subscript o denotes observed data, the subscript m denotes missing data, and $\mathbf{0}_{\mathrm{o}}$ is a vector of zeros. This specification subtracts the unknown data values, $y_{\mathrm{m}}$, from both sides of the equation and then allows these values to be estimated as regression parameters (i.e., conditional means). In doing so, these conditional means are equivalent to their fitted values and hence have residuals of zero.
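The missing-value specification can be sketched with ordinary least squares: set the missing response to 0, append a column that is $-1$ for the missing unit and 0 elsewhere, and read $y_{\mathrm{m}}$ off as that column's coefficient. This NumPy sketch assumes a design matrix `Z` holding the intercept and the selected eigenvectors for all units; the function name is hypothetical.

```python
import numpy as np


def impute_by_indicator(Z, y, missing):
    """Estimate one missing response as a regression parameter.

    Z: (n, p) design matrix (e.g., intercept + selected eigenvectors) for all n units.
    y: length-n response; the entry at index `missing` is ignored.
    Returns the estimate of y_m, the coefficient on the -indicator column.
    """
    Z = np.asarray(Z, dtype=float)
    y_aug = np.asarray(y, dtype=float).copy()
    y_aug[missing] = 0.0                 # left-hand side (Y_o, 0)
    indicator = np.zeros(len(y_aug))
    indicator[missing] = -1.0            # column -(0_o, 1) multiplying y_m
    A = np.column_stack([Z, indicator])
    coef, *_ = np.linalg.lstsq(A, y_aug, rcond=None)
    return coef[-1]


# Hypothetical design: intercept + three orthogonal-ish columns, 25 units.
rng = np.random.default_rng(7)
Z = np.column_stack([np.ones(25), rng.normal(size=(25, 3))])
y = Z @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(scale=0.2, size=25)
y_hat_4 = impute_by_indicator(Z, y, missing=4)
```

Because the extra parameter lets the missing unit's row be fitted exactly, the remaining coefficients are those of a fit to the observed rows alone, so the estimate equals the ordinary out-of-sample prediction. Looping `missing` over every unit gives a leave-one-out cross-validation.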

Fig. 3.7 portrays the scatterplot of the log-transformed 2010 Texas PD (vertical axis) versus the corresponding 254 imputed values calculated with Eq. (3.8) but with no covariates (i.e., a pure SA specification); this exercise is similar to kriging. The linear regression equation describing this correspondence may be written as follows:
$$\hat{\mathrm{Y}}=0.98849+0.77990 \mathrm{Y}_{\text {predicted }}, \mathrm{R}^{2}=0.4078$$

An illustrative linear regression example

A pure SA analysis ignores covariates and estimates the SA latent in an attribute variable map pattern. This type of analysis is pertinent to, for example, the construction of histograms for georeferenced RVs or the calculation of Pearson product moment correlation coefficients for pairs of georeferenced RVs, among other things. The MESF linear regression equation, which assumes normally distributed residuals, is the following nonconstant mean-only specification:
$$\mathbf{Y}=\mathbf{1}\beta_{0}+\mathbf{E}_{\mathrm{K}}\boldsymbol{\beta}_{\mathrm{E}}+\boldsymbol{\xi}.$$
One traditional specification error concern here pertains to how closely the response variable $Y$ conforms to a normal distribution. Analysts frequently subject a nonnormal set of attribute values to a Box-Cox/Manly transformation to normality (see Griffith, 2013).

Consider the 2010 population density (PD) across the 254 counties of Texas (see Fig. 3.1A); urban areas are conspicuous in this map pattern, revealing geographic heterogeneity. Raw PD values do not conform closely to a bell-shaped curve (Fig. 3.2A), whereas Box-Cox transformed values LN $(\mathrm{PD}-0.08)$ do (Fig. 3.2B), where LN denotes natural logarithm.
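A minimal sketch of the shifted-log transformation $\mathrm{LN}(\mathrm{PD}-0.08)$ and a simple skewness check, assuming NumPy. The data below are synthetic lognormal values, not the Texas PD, and the skewness measure stands in for the graphical comparison of Figs. 3.2A and 3.2B; how the shift 0.08 was estimated is not shown here (see Griffith, 2013).

```python
import numpy as np


def log_shift(x, delta=0.08):
    """Box-Cox-type transformation LN(x - delta)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= delta):
        raise ValueError("all values must exceed the shift delta")
    return np.log(x - delta)


def sample_skewness(x):
    """Moment-based skewness; near 0 for a roughly bell-shaped sample."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


# Synthetic, strongly right-skewed "densities" for 254 units.
rng = np.random.default_rng(0)
raw = 0.08 + rng.lognormal(mean=3.0, sigma=1.5, size=254)
print(sample_skewness(raw), sample_skewness(log_shift(raw)))
```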

The selection of eigenvectors to construct an ESF

The first step in constructing an ESF for the 2010 Texas PD by county is to extract the 254 eigenvectors from the modified SWM $\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/254\right)\mathbf{C}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/254\right)$, where the 0-1 matrix $\mathbf{C}$ denotes the Texas county SWM, based upon the rook definition of adjacency (see Preface, Fig. P1). Because this PD exhibits PSA, determining an appropriate candidate set of eigenvectors for stepwise regression can begin by setting aside the 149 NSA eigenvectors plus the single eigenvector having a zero eigenvalue (the eigenvector proportional to the vector $\mathbf{1}$), which a regression equation already includes through its intercept term. The next step is to determine how many of the 104 PSA eigenvectors to include, counting from the largest eigenvalue (i.e., the maximum possible PSA). Chun et al. (2016, p. 75) furnish the following equation to help with this decision:
$$\frac{n_{\text{pos}}}{1+\exp\left\{2.1480-\dfrac{6.1808\left(z_{\mathrm{MC}}+0.6\right)^{0.1742}}{n_{\text{pos}}^{0.1298}}+\dfrac{3.3534}{\left(z_{\mathrm{MC}}+0.6\right)^{0.1742}}\right\}}$$
with $n_{\text{pos}}=104$ (the number of PSA eigenvectors) and $z_{\mathrm{MC}}=13.52$ (the $z$-score measure of SA for the linear regression residuals) here. This expression indicates that the candidate set should contain the 78 eigenvectors with the largest eigenvalues.

One useful criterion for eigenvector selection from the candidate set is the level of significance for each eigenvector's regression coefficient, which essentially maximizes the linear regression $\mathrm{R}^{2}$ value; other selection criteria could be utilized (see Griffith, 2004). In addition, a stepwise procedure that combines both forward selection and backward elimination supports the construction of a parsimonious ESF. Because the eigenvectors are mutually orthogonal and uncorrelated, the primary factor in eigenvector selection during any given step is the marginal error sum of squares for that step. Of the 78 candidate eigenvectors, 26 were selected using a significance level criterion of $0.10$, accounting for roughly $62.5\%$ of the variation in log-transformed PD across the counties of Texas (Fig. 3.1B). The resulting ESF highlights the Dallas, Houston, and Austin-San Antonio metropolitan regions and indicates that SA introduces variance inflation by more than doubling the underlying IID variance. Table 3.1 summarizes the stepwise selection results, revealing that global (e.g., $\mathbf{E}_{2}$), regional (e.g., $\mathbf{E}_{19}$), and local (e.g., $\mathbf{E}_{77}$) map pattern${}^{1}$ components account for the SA under study and that the degree of SA does not determine the selection sequence.
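The candidate-set calculation can be checked numerically. This sketch assumes the exponential expression serves as the denominator of $n_{\text{pos}}/(1+\exp\{\cdot\})$; under that assumption the Texas inputs reproduce the 78 candidates reported in the text. The function name is hypothetical.

```python
import math


def candidate_set_size(z_mc, n_pos):
    """Number of PSA eigenvectors (largest eigenvalues first) to keep as
    stepwise-selection candidates, following the Chun et al. (2016) expression."""
    a = (z_mc + 0.6) ** 0.1742
    exponent = 2.1480 - 6.1808 * a / n_pos**0.1298 + 3.3534 / a
    return n_pos / (1.0 + math.exp(exponent))


print(round(candidate_set_size(z_mc=13.52, n_pos=104)))  # 78
```

Stronger residual SA (a larger $z_{\mathrm{MC}}$) makes the exponent more negative, so a larger share of the PSA eigenvectors enters the candidate set.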

Selected criteria for assessing regression models

Once an ESF is constructed, model diagnostics should be performed. The predicted residual error sum of squares (PRESS) statistic is a useful global diagnostic to calculate because it relates to a cross-validation assessment, with the set of covariates held constant. Values of the ratio PRESS/ESS close to 1, where ESS denotes the error sum of squares, indicate good model performance in this context because the corresponding estimated model fitting and prediction errors essentially are the same (i.e., the estimated trend line also describes new observations well). Here this value is $376.737/355.645=1.059$, implying a very respectable model performance with regard to the cross-validation criterion.
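PRESS does not require refitting the model $n$ times: for OLS, the leave-one-out residual equals $e_i/(1-h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix. A minimal NumPy sketch of this standard identity follows; the design matrix is assumed to have full column rank, and the function name is hypothetical.

```python
import numpy as np


def press_and_ess(X, y):
    """Return (PRESS, ESS) for an OLS fit of y on X (full column rank assumed)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Q, _ = np.linalg.qr(X)            # thin QR; hat matrix H = Q Q'
    h = np.sum(Q**2, axis=1)          # leverages h_ii
    resid = y - Q @ (Q.T @ y)         # OLS residuals
    press = np.sum((resid / (1.0 - h)) ** 2)
    return float(press), float(resid @ resid)


# The ratio reported in the text for the Texas ESF model:
print(round(376.737 / 355.645, 3))  # 1.059
```

Since $1/(1-h_{ii})^2\ge 1$, PRESS is never smaller than ESS; a ratio near 1 means that no observation has much leverage over its own fitted value.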

Three features of the linear regression residuals merit assessment. The first concerns normality (Fig. 3.3A); here the Shapiro-Wilk statistic for the linear regression residuals is $0.98030$ ($p=0.0014$); the frequency distribution for these residuals differs statistically, but not substantively, from a bell-shaped curve. The second concerns residual SA. The expected value
${}^{1}$The grouping into global, regional, and local map patterns is subjective. These terms, respectively, refer to $\mathrm{MC}/\mathrm{MC}_{\max}$ values (i.e., MC relative to the maximum MC) in the ranges 0.9 to 1, 0.7 to 0.9, and 0.25 to 0.7. The maximum MC value here is $1.09798$, which should be used to standardize MC values to make them comparable across geographic landscapes.
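Both the Moran coefficient and the footnote's subjective grouping are easy to compute. In this NumPy sketch, the treatment of the range boundaries (a ratio of exactly 0.9 counts as global, and so on) and the label "unclassified" for ratios below 0.25 are assumptions, since the footnote does not specify them.

```python
import numpy as np


def moran_coefficient(x, C):
    """MC = (n / 1'C1) (x - xbar)' C (x - xbar) / (x - xbar)'(x - xbar)."""
    x = np.asarray(x, dtype=float)
    C = np.asarray(C, dtype=float)
    d = x - x.mean()
    return (x.size / C.sum()) * (d @ C @ d) / (d @ d)


def pattern_class(mc, mc_max):
    """Subjective global/regional/local grouping by MC / MC_max."""
    r = mc / mc_max
    if r >= 0.9:
        return "global"
    if r >= 0.7:
        return "regional"
    if r >= 0.25:
        return "local"
    return "unclassified"


MC_MAX_TEXAS = 1.09798  # maximum MC for the Texas county SWM, from the text
print(pattern_class(1.0, MC_MAX_TEXAS), pattern_class(0.5, MC_MAX_TEXAS))
```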

A theoretical foundation for ESFs

The theoretical foundation for MESF contains two components, one derivable from the general spatial autoregressive model specification and the other derivable from the concept of a random effects term.

The spatial autoregressive response (AR) model (known as the spatial lag model in spatial econometrics) specification, an auto-normal model, may be written as follows, using the spatial linear operator $(\mathbf{I}-\rho \mathbf{C})$ and matrix notation:
$$\mathbf{Y}=(\mathbf{I}-\rho\mathbf{C})^{-1}\left(\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\varepsilon}\right)$$
where $\boldsymbol{\beta}_{\mathbf{X}}$ is a $(p+1)$-by-1 vector of regression coefficients for $p$ covariates and the intercept term, $\rho$ is the SA parameter, and $\boldsymbol{\varepsilon}$ is an $n$-by-1 vector of independent and identically distributed (IID) normal random variables (RVs) with mean zero and constant variance $\sigma^{2}$. The standard maximum likelihood estimation of parameters in Eq. (3.2) involves rewriting it as the following nonlinear regression specification:

$$(\mathbf{I}-\rho\mathbf{C})\mathbf{Y}=(\mathbf{I}-\rho\mathbf{C})(\mathbf{I}-\rho\mathbf{C})^{-1}\left(\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\varepsilon}\right)\Rightarrow\mathbf{Y}=\rho\mathbf{C}\mathbf{Y}+\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\varepsilon}$$
The eigenfunction decomposition of the SWM $\mathbf{C}$ is $\mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{\mathrm{T}}$, where matrix $\mathbf{E}$ is the set of $n$ eigenvectors of SWM $\mathbf{C}$, diagonal matrix $\boldsymbol{\Lambda}$ contains the set of $n$ eigenvalues of SWM $\mathbf{C}$ (with the entries of the two matrices ordered so that each eigenvalue is paired with its eigenvector), and superscript T denotes the matrix transpose operation. Substituting this decomposition of SWM $\mathbf{C}$ into Eq. (3.3) produces
$$\mathbf{Y}=\rho\mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{\mathrm{T}}\mathbf{Y}+\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\varepsilon},$$
where $\mathbf{E}^{\mathrm{T}}\mathbf{Y}$ is the ordinary least squares (OLS) estimate of the regression coefficients when response variable $\mathbf{Y}$ is regressed on eigenvector matrix $\mathbf{E}$. A stepwise selection procedure (e.g., simultaneous forward-backward) eliminates the eigenvectors $j$ for which $\mathbf{E}_{j}^{\mathrm{T}}\mathbf{Y}\approx 0$ (i.e., the map patterns for these eigenvectors do not account for any SA in the regression residuals) or for which $\rho\lambda_{j}\approx 0$ (i.e., the map pattern displays a trivial degree of SA). In practice these tend to be a large majority of the eigenvectors, leaving $K\ll n$ eigenvectors in the model specification:
$$\mathbf{Y}=\mathbf{E}_{\mathrm{K}}\boldsymbol{\beta}_{\mathrm{E}}+\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\xi},$$
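Orthonormality is what makes $\mathbf{E}^{\mathrm{T}}\mathbf{Y}$ the OLS coefficient vector and makes stepwise selection simple: adding eigenvector $j$ lowers the error sum of squares by $(\mathbf{E}_j^{\mathrm{T}}\mathbf{Y})^2$, regardless of which other eigenvectors are already in the model. The NumPy sketch below checks this on random orthonormal, zero-sum columns standing in for candidate eigenvectors. The greedy ranking is a simplification of the significance-based forward-backward procedure, under the assumption that at a given step all candidates share the same residual MSE, so ranking by $t$ statistic and by $b_j^2$ coincide.

```python
import numpy as np


def eigenvector_contributions(E, y):
    """For orthonormal candidate eigenvectors (columns of E, each summing to zero):
    OLS coefficients b_j = E_j'y and the SSE reduction b_j^2 from adding each one."""
    b = E.T @ y
    return b, b**2


def select_by_sse(E, y, k):
    """Indices of the k candidates with the largest marginal SSE reductions."""
    _, gain = eigenvector_contributions(E, y)
    return np.argsort(gain)[::-1][:k]


# Stand-in candidates: orthonormalize centered random columns (they keep zero sums).
rng = np.random.default_rng(11)
n = 30
A = rng.normal(size=(n, 5))
A -= A.mean(axis=0)
E, _ = np.linalg.qr(A)
y = rng.normal(size=n)
b, gain = eigenvector_contributions(E, y)
chosen = select_by_sse(E, y, 2)
```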

The fundamental theorem of MESF

A statement of the fundamental theorem of MESF appears in Section 2.1.3. It is based upon several theorems in matrix algebra, including the fundamental theorem of principal components analysis (see Tatsuoka, 1988, p. 146), which may be translated as follows:
Given a modified $n$-by-$n$ SWM $\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)\mathbf{C}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)$ for a given geographic landscape, we can derive a set of orthogonal and uncorrelated variables $\mathbf{E}_{1},\mathbf{E}_{2},\ldots,\mathbf{E}_{n}$ by a set of linear transformations corresponding to the principal-axes rotation [i.e., the rigid rotation whose transformation matrix $\mathbf{E}$ has the $n$ eigenvectors of matrix $\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)\mathbf{C}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)$ as its columns]. The SA measures of this new set of variables are given by the diagonal matrix $\left(n/\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}\right)\boldsymbol{\Lambda}=\left(n/\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}\right)\mathbf{E}^{\mathrm{T}}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)\mathbf{C}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)\mathbf{E}$, whose diagonal elements are the $n$ MCs of the corresponding map patterns produced by the $n$ eigenvectors of matrix $\mathbf{E}$.
Orthogonality results from the matrix $\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)\mathbf{C}\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)$ being symmetric (if $\mathbf{C}$ is a symmetric matrix, then $\mathbf{A}\mathbf{C}\mathbf{A}^{\mathrm{T}}$ is a symmetric matrix). Uncorrelatedness results from the pre- and postmultiplication of matrix $\mathbf{C}$ by the projection matrix $\left(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\right)$, which yields a single eigenvector proportional to the $n$-by-1 vector $\mathbf{1}$, and hence $n-1$ other eigenvectors whose elements sum to zero. The numerator of the Pearson product moment correlation coefficient for a pair of different eigenvectors has a cross-product term (e.g., $\sum XY$) of zero (orthogonality) and a product of two means (each being a sum of the elements of an eigenvector, with at least one of these sums equal to zero) of zero (Griffith, 2000b, p. 105).
Tiefelsdorf and Boots (1995; Section 2.1.2) prove that the MC for a given eigenvector $\mathbf{E}_{j}$ is given by $\left(n/\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}\right)\lambda_{j}$. The rank ordering of the Rayleigh quotients
$$\left(n / 1^{\mathrm{T}} \mathrm{C} 1\right) \mathbf{E}^{\mathrm{T}}\left(\mathbf{I}-11^{\mathrm{T}} / \mathrm{n}\right) \mathrm{C}\left(\mathbf{I}-11^{\mathrm{T}} / \mathrm{n}\right) \mathbf{E} /\left(\mathbf{E}^{\mathrm{T}} \mathbf{E}\right)=\left(\mathrm{n} / \mathbf{1}^{\mathrm{T}} \mathrm{C} 1\right) \boldsymbol{\Lambda}$$
produces the sequential ordering from the maximum possible level of positive SA (PSA) to the maximum possible level of negative SA (NSA; see de Jong, Sprenger, & van Veen, 1984).
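These statements can be checked numerically on a small hypothetical 4-by-5 rook lattice (not the Texas counties), assuming NumPy. The sketch eigen-decomposes $\mathbf{MCM}$ with $\mathbf{M}=\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n$, sorts from maximum PSA to maximum NSA, and confirms orthonormality, zero-sum eigenvectors, and $\mathrm{MC}_j=(n/\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1})\lambda_j$. The last two checks are restricted to nonzero eigenvalues: when zero is a repeated eigenvalue, `eigh` may return any orthonormal basis of that eigenspace, mixing in the vector proportional to $\mathbf{1}$.

```python
import numpy as np


def rook_adjacency(p, q):
    """Binary rook-adjacency SWM for a p-by-q grid, units numbered row by row."""
    n = p * q
    C = np.zeros((n, n))
    for i in range(p):
        for j in range(q):
            a = i * q + j
            if i + 1 < p:
                C[a, a + q] = C[a + q, a] = 1.0
            if j + 1 < q:
                C[a, a + 1] = C[a + 1, a] = 1.0
    return C


def mcm_eigen(C):
    """Eigenpairs of M C M, M = I - 11'/n, sorted from the largest eigenvalue
    (maximum PSA) to the smallest (maximum NSA)."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    lam, E = np.linalg.eigh(M @ C @ M)  # symmetric, so orthonormal eigenvectors
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order]


def moran_coefficient(x, C):
    """MC = (n / 1'C1) (x - xbar)' C (x - xbar) / (x - xbar)'(x - xbar)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d.size / C.sum()) * (d @ C @ d) / (d @ d)


C = rook_adjacency(4, 5)
lam, E = mcm_eigen(C)
mc = (C.shape[0] / C.sum()) * lam  # MC implied for each eigenvector
print("PSA:", int(np.sum(lam > 1e-8)), "NSA:", int(np.sum(lam < -1e-8)))
```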

Because one eigenvector element corresponds to each of the $n$ areal units in a geographic landscape, a map can portray the geographic distribution of each set of eigenvector elements. Consequently, a map of the ESF, $\mathbf{E}_{\mathrm{K}}\boldsymbol{\beta}_{\mathrm{E}}$, furnishes a visualization of SA; as such, it supplements the Moran scatterplot graphic tool. Furthermore, because each eigenvector is an $n$-by-1 variate, eigenvectors can be treated like covariates and included in a linear regression analysis.

Map pattern and SA: Heterogeneity in map-wide trends

SA may be interpreted in a number of different ways, one of which is map pattern (Griffith, 1992). Pattern refers to some discernible real-world regularity that contains elements recurring in a predictable manner. Map pattern refers to this regularity and repetitiveness occurring in two dimensions and is the basis for spatial interpolation (prediction linking to kriging in geostatistics). SA makes map pattern possible by organizing attribute values on a map in such a way that for PSA, for example, relatively high values cluster together in a geographic landscape, as do relatively intermediate, and relatively low, values. This geographic organization can yield global gradients across, as well as large regional or small local clusters in, a geographic landscape; in general, neighborhood subsets of georeferenced attribute values are similar or dissimilar (NSA). These are the components of map pattern depicted by the modified SWM eigenvectors with, respectively, large, moderate, or small but not close to zero, eigenvalues. In other words, map pattern has to do with the geographic arrangement of attribute values of a map, with the nature and degree of (dis)similarities of nearby values relating to $\mathrm{SA}$.
Heterogeneity refers to a collection of diverse elements, elements that are nonuniform in the composition of their attribute values. In terms of statistical properties, these elements are not IID (see Section 3.1). In classical linear regression, a response variable $Y$ often is considered heterogeneous in its individual observation means, resulting in the term $\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}$ being included in a linear regression specification. This specification strategy seeks to account for heterogeneity with the regression mean, rendering residuals that are IID and hence homogeneous. If $\mathbf{X}\equiv\mathbf{1}$, then the mean of $Y$ for each areal unit is the constant $\beta_{0}$; this is the special case of a homogeneous $Y$. In the presence of SA, the residuals still have a mean of zero, but now heterogeneity persists through their variances being unequal; this outcome is one consequence of variance inflation by SA. Eq. (3.4) highlights how MESF addresses this problem by replacing the constant mean with a variable mean:
$$\mathbf{Y}=\mathbf{E}_{\mathrm{K}}\boldsymbol{\beta}_{\mathrm{E}}+\mathbf{X}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\xi}=\left(\mathbf{1}\beta_{0}+\mathbf{E}_{\mathrm{K}}\boldsymbol{\beta}_{\mathrm{E}}\right)+\mathbf{X}_{\mathrm{P}}\boldsymbol{\beta}_{\mathbf{X}}+\boldsymbol{\xi}$$

The spectral analysis of three-dimensional data

In many cases, the spectral analysis of three-dimensional georeferenced data involves a sequence of maps, one for each point in a specified time series, rather than supplementing planar surfaces with elevation. Griffith and Heuvelink (2012) extend the preceding spectral analysis conceptualizations to this situation. Now, for a regular square tessellation, the rook adjacency definition, and uniformly spaced points in time, the spectral density-based space-time $(\tau,\eta,\nu)$-lag correlation function becomes
$$\frac{\displaystyle\int_{0}^{\pi}\int_{0}^{\pi}\int_{0}^{\pi}\frac{\cos(\tau\theta)\cos(\eta\varphi)\cos(\nu t)}{\left[1-\cos(t)\left\{\rho_{\mathrm{s}}\left[\cos(\theta)+\cos(\varphi)\right]+\rho_{\mathrm{T}}\right\}\right]^{\kappa}}\,d\theta\,d\varphi\,dt}{\displaystyle\int_{0}^{\pi}\int_{0}^{\pi}\int_{0}^{\pi}\frac{1}{\left[1-\cos(t)\left\{\rho_{\mathrm{s}}\left[\cos(\theta)+\cos(\varphi)\right]+\rho_{\mathrm{T}}\right\}\right]^{\kappa}}\,d\theta\,d\varphi\,dt},\quad\kappa=1,2,$$
where $t$ denotes the time argument, $\rho_{\mathrm{s}}$ denotes the SA parameter, and $\rho_{\mathrm{T}}$ denotes the temporal autocorrelation parameter. This specification represents a contemporaneous space-time process, which is additive, whose matrix representation is given by

$$\mathbf{C}=\mathbf{I}_{\mathrm{T}}\otimes\mathbf{I}_{\mathrm{s}}-\rho_{\mathrm{s}}\mathbf{C}_{\mathrm{T}}\otimes\mathbf{C}_{\mathrm{s}}-\rho_{\mathrm{T}}\mathbf{C}_{\mathrm{T}}\otimes\mathbf{I}_{\mathrm{s}},$$
where $\otimes$ denotes the Kronecker product matrix operation, $\mathbf{C}_{\mathrm{s}}$ denotes the SWM, $\mathbf{C}_{\mathrm{T}}$ denotes the time-series connectivity matrix, $\mathbf{I}_{\mathrm{T}}$ denotes the $T$-by-$T$ identity matrix, $\mathbf{I}_{\mathrm{s}}$ denotes the $n$-by-$n$ identity matrix, and $1-\cos(t)\left\{\rho_{\mathrm{s}}\left[\cos(\theta)+\cos(\varphi)\right]+\rho_{\mathrm{T}}\right\}$ are the limiting eigenvalues of the space-time connectivity matrix $\mathbf{C}$.
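The lagged correlation function above can be approximated numerically with a midpoint rule on $[0,\pi]^3$. This NumPy sketch is a crude quadrature under stated assumptions: the parameter values are hypothetical, and they must keep the bracketed denominator positive over the whole cube (for example, $2|\rho_{\mathrm{s}}|+|\rho_{\mathrm{T}}|<1$); the common grid-cell volume cancels between numerator and denominator.

```python
import numpy as np


def st_lag_correlation(tau, eta, nu, rho_s, rho_t, kappa=1, m=60):
    """Midpoint-rule approximation of the spectral-density-based
    (tau, eta, nu)-lag space-time correlation."""
    g = (np.arange(m) + 0.5) * np.pi / m
    th, ph, t = np.meshgrid(g, g, g, indexing="ij")
    denom = (1.0 - np.cos(t) * (rho_s * (np.cos(th) + np.cos(ph)) + rho_t)) ** kappa
    if np.any(denom <= 0):
        raise ValueError("parameters make the spectral denominator non-positive")
    num = np.cos(tau * th) * np.cos(eta * ph) * np.cos(nu * t) / denom
    return num.sum() / (1.0 / denom).sum()


print(st_lag_correlation(1, 0, 0, rho_s=0.2, rho_t=0.3))
```

Because the integrand is symmetric in $\theta$ and $\varphi$, the two purely spatial first-order lags, $(1,0,0)$ and $(0,1,0)$, receive the same correlation.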

An alternative specification is multiplicative and hence describes a space-time lagged process; its matrix representation is given by
$$\mathbf{C}=\mathbf{I}_{\mathrm{T}}\otimes\mathbf{I}_{\mathrm{s}}-\rho_{\mathrm{s}}\mathbf{I}_{\mathrm{T}}\otimes\mathbf{C}_{\mathrm{s}}-\rho_{\mathrm{T}}\mathbf{C}_{\mathrm{T}}\otimes\mathbf{I}_{\mathrm{s}},$$
and its spectral density-based $(\tau, \eta, \nu)$-lag correlations are given by
For a regular square lattice forming a complete $P$-by-$Q$ rectangular region,
$$\mathbf{C}_{\mathrm{s}}=\mathbf{C}_{\mathrm{P}}\otimes\mathbf{I}_{\mathrm{Q}}+\mathbf{I}_{\mathrm{P}}\otimes\mathbf{C}_{\mathrm{Q}},$$
where $\mathbf{C}_{\mathrm{P}}$ and $\mathbf{C}_{\mathrm{Q}}$, respectively, are SWMs for a $P$-length and a $Q$-length linear landscape, and $\mathbf{I}_{\mathrm{P}}$ and $\mathbf{I}_{\mathrm{Q}}$, respectively, are $P$-by-$P$ and $Q$-by-$Q$ identity matrices.
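The Kronecker construction of a lattice SWM can be verified directly. This NumPy sketch numbers the units row by row, in which case the lattice SWM is the Kronecker sum $\mathbf{C}_{\mathrm{P}}\otimes\mathbf{I}_{\mathrm{Q}}+\mathbf{I}_{\mathrm{P}}\otimes\mathbf{C}_{\mathrm{Q}}$. It also checks that its eigenvalues are the pairwise sums $2\cos\left(\frac{h\pi}{P+1}\right)+2\cos\left(\frac{k\pi}{Q+1}\right)$ of the two linear-landscape spectra. The 3-by-4 size and the function names are illustrative assumptions.

```python
import numpy as np


def path_adjacency(m):
    """SWM for an m-unit linear landscape: unit i neighbors units i - 1 and i + 1."""
    return np.eye(m, k=1) + np.eye(m, k=-1)


def lattice_swm(p, q):
    """Rook SWM for a complete p-by-q lattice (units numbered row by row),
    built as the Kronecker sum C_P (x) I_Q + I_P (x) C_Q."""
    return np.kron(path_adjacency(p), np.eye(q)) + np.kron(np.eye(p), path_adjacency(q))


def lattice_eigenvalues(p, q):
    """Closed-form eigenvalues 2cos(h pi/(p+1)) + 2cos(k pi/(q+1))."""
    h = np.arange(1, p + 1)
    k = np.arange(1, q + 1)
    return (2 * np.cos(h * np.pi / (p + 1))[:, None]
            + 2 * np.cos(k * np.pi / (q + 1))[None, :]).ravel()


C = lattice_swm(3, 4)
print(np.allclose(np.sort(lattice_eigenvalues(3, 4)), np.linalg.eigvalsh(C)))  # True
```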

Summary

This chapter reviews articulations among SWMs, eigenfunctions, and spectral functions, all three of which relate to $\mathrm{SA}$. In doing so, it also links them to geostatistics. The eigenvalues of a SWM index the nature and degree of SA in the eigenvectors of a modified SWM and also appear in the complex fraction spectral density functions used to calculate lagged spatial correlations. The cells of standardized inverse spatial covariance structures, illustrated here with the popular first- and second-order ones, contain spectral density function results. These notions interlace with concepts for PCA. Although this chapter focuses on the $\mathrm{MC}$ index of $\mathrm{SA}$, similar results may be established for both the Geary ratio (GR) and the join count statistics that are applicable to nominal measurement scale data. The linear geographic landscape furnishes many relatively simple illustrations of the connections of interest here. The two-dimensional geographic landscape furnishes more relevant, albeit more complicated, contexts and highlights map pattern visualizations, one of the most important topics of this chapter.

The spectral decomposition of a SWM

Consider the geographic landscape in Fig. 2.5C. Its rook adjacency SWM $\mathbf{C}$ is as follows:
$$\left[\begin{array}{llll} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{array}\right]$$
The Perron-Frobenius theorem states that the principal eigenvalue is contained in the interval defined by the largest and smallest row sums; because every row of this matrix sums to 2, here $\lambda_{1}=2$. For each pair of identical rows (or columns), an eigenvalue equals zero; because the first and fourth rows/columns are identical, and the second and third rows/columns are identical, two eigenvalues equal zero. Finally, the trace of this matrix equals the sum of its four eigenvalues; therefore $2+0+0+\lambda=0$, and hence the remaining eigenvalue equals $-2$.

Eq. (2.1) for this SWM is $\lambda^{2}\left(\lambda^{2}-4\right)=0$. The first $\lambda^{2}$ term is for the two roots of zero, whereas the second term factors into $(\lambda+2)(\lambda-2)$, which is for the two roots $\pm 2$. Ord (1975) also states that the eigenvalues for this particular type of geographic surface partitioning and SWM are given by $\lambda=2\left[\operatorname{COS}\left(\frac{\mathrm{h} \pi}{2+1}\right)+\operatorname{COS}\left(\frac{\mathrm{k} \pi}{2+1}\right)\right], \mathrm{h}=1,2$ and $\mathrm{k}=1,2$. This equation yields $2(0.5+0.5)=2 ; 2(0.5-0.5)=0 ; 2(-0.5+0.5)=0$; and, $2(-0.5-0.5)=-2$.

Griffith (2000, p. 98) proves that the solutions to Eq. (2.2) for this particular type of geographic surface partitioning and SWM are the eigenvectors whose elements are given by
$$\frac{2}{\sqrt{(2+1)(2+1)}}\left[\operatorname{SIN}\left(\frac{ih\pi}{2+1}\right)\times\operatorname{SIN}\left(\frac{jk\pi}{2+1}\right)\right]$$
where $i, j = 1, 2$ index the row and column of an areal unit, and $h, k = 1, 2$ index the eigenvector.
This expression produces the 4-by-4 eigenvector matrix
$$\left[\begin{array}{rrrr} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 & -0.5 \\ 0.5 & 0.5 & -0.5 & -0.5 \\ 0.5 & -0.5 & -0.5 & 0.5 \end{array}\right]$$
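The whole 2-by-2 example can be reproduced in a few lines of NumPy. The sketch builds each closed-form eigenvector element from its unit's row $i$ and column $j$, uses Ord's eigenvalue formula, and confirms that $\mathbf{C}\mathbf{E}=\mathbf{E}\boldsymbol{\Lambda}$ with $\boldsymbol{\Lambda}=\mathrm{diag}(2,0,0,-2)$. It also checks that the characteristic polynomial is $\lambda^{2}(\lambda^{2}-4)$ and that the trace equals the eigenvalue sum. Units are assumed to be numbered row by row, matching the SWM above.

```python
import numpy as np

C = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

P = Q = 2
cols, lams = [], []
for h in range(1, P + 1):
    for k in range(1, Q + 1):
        # Element for the unit in row i, column j of eigenvector (h, k).
        v = [2 / np.sqrt((P + 1) * (Q + 1))
             * np.sin(i * h * np.pi / (P + 1)) * np.sin(j * k * np.pi / (Q + 1))
             for i in range(1, P + 1) for j in range(1, Q + 1)]
        cols.append(v)
        # Ord (1975): lambda = 2cos(h pi/(P+1)) + 2cos(k pi/(Q+1)).
        lams.append(2 * np.cos(h * np.pi / (P + 1)) + 2 * np.cos(k * np.pi / (Q + 1)))

E = np.array(cols).T          # columns are eigenvectors
Lam = np.array(lams)
print(np.round(E, 3))
print(np.round(Lam, 3))
```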
