Statistical Modelling for Business | Describing Central Tendency

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

Statistical Modelling for Business | The mean, median, and mode

In addition to describing the shape of the distribution of a sample or population of measurements, we also describe the data set’s central tendency. A measure of central tendency represents the center or middle of the data. Sometimes we think of a measure of central tendency as a typical value. However, as we will see, not all measures of central tendency are necessarily typical values.

One important measure of central tendency for a population of measurements is the population mean, denoted $\mu$. The population mean is calculated by adding all the population measurements and then dividing the resulting sum by the number of population measurements. For instance, suppose that Chris is a college junior majoring in business. This semester Chris is taking five classes, and the numbers of students enrolled in the classes (that is, the class sizes) are 60, 41, 15, 30, and 34.

The mean $\mu$ of this population of class sizes is
$$\mu=\frac{60+41+15+30+34}{5}=\frac{180}{5}=36$$
Because this population of five class sizes is small, it is possible to compute the population mean. Often, however, a population is very large and we cannot obtain a measurement for each population element. Therefore, we cannot compute the population mean. In such a case, we must estimate the population mean by using a sample of measurements.
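
The calculation above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic; the variable names `class_sizes` and `mu` are illustrative choices, not part of the original text.

```python
# Class sizes for Chris's five classes (values taken from the text).
class_sizes = [60, 41, 15, 30, 34]

# Population mean: the sum of all measurements divided by their count.
mu = sum(class_sizes) / len(class_sizes)

print(mu)  # 36.0
```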

In order to understand how to estimate a population mean, we must realize that the population mean is a population parameter.

Statistical Modelling for Business | The Car Mileage Case: Estimating Mileage

In order to offer its tax credit, the federal government has decided to define the “typical” EPA combined city and highway mileage for a car model as the mean $\mu$ of the population of EPA combined mileages that would be obtained by all cars of this type. Here, using the mean to represent a typical value is probably reasonable. We know that some individual cars will get mileages that are lower than the mean and some will get mileages that are above it. However, because there will be many thousands of these cars on the road, the mean mileage obtained by these cars is probably a reasonable way to represent the model’s overall fuel economy. Therefore, the government will offer its tax credit to any automaker selling a midsize model equipped with an automatic transmission that achieves a mean EPA combined mileage of at least $31 \mathrm{mpg}$.

To demonstrate that its new midsize model qualifies for the tax credit, the automaker in this case study wishes to use the sample of 50 mileages in Table $3.1$ to estimate $\mu$, the model’s mean mileage. Before calculating the mean of the entire sample of 50 mileages, we will illustrate the formulas involved by calculating the mean of the first five of these mileages.

Table $3.1$ tells us that $x_{1}=30.8, x_{2}=31.7, x_{3}=30.1, x_{4}=31.6$, and $x_{5}=32.1$, so the sum of the first five mileages is
$$\begin{aligned} \sum_{i=1}^{5} x_{i} &=x_{1}+x_{2}+x_{3}+x_{4}+x_{5} \\ &=30.8+31.7+30.1+31.6+32.1=156.3 \end{aligned}$$
Therefore, the mean of the first five mileages is
$$\bar{x}=\frac{\sum_{i=1}^{5} x_{i}}{5}=\frac{156.3}{5}=31.26$$
Of course, intuitively, we are likely to obtain a more accurate point estimate of the population mean by using all of the available sample information. The sum of all 50 mileages can be verified to be
$$\sum_{i=1}^{50} x_{i}=x_{1}+x_{2}+\cdots+x_{50}=30.8+31.7+\cdots+31.4=1578$$
Therefore, the mean of the sample of 50 mileages is
$$\bar{x}=\frac{\sum_{i=1}^{50} x_{i}}{50}=\frac{1578}{50}=31.56$$
This point estimate says we estimate that the mean mileage that would be obtained by all of the new midsize cars that will or could potentially be produced this year is $31.56 \mathrm{mpg}$. Unless we are extremely lucky, however, there will be sampling error. That is, the point estimate $\bar{x}=31.56$ mpg, which is the average of the sample of fifty randomly selected mileages, will probably not exactly equal the population mean $\mu$, which is the average mileage that would be obtained by all cars. Therefore, although $\bar{x}=31.56$ provides some evidence that $\mu$ is at least 31 and thus that the automaker should get the tax credit, it does not provide definitive evidence. In later chapters, we discuss how to assess the reliability of the sample mean and how to use a measure of reliability to decide whether sample information provides definitive evidence.
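
The two sample means above can be checked with a short Python sketch. Only the first five mileages and the stated total of 1578 for all 50 are used, because the full Table 3.1 is not reproduced here; the helper name `sample_mean` is an illustrative choice.

```python
def sample_mean(values):
    """Sample mean: the sum of the measurements divided by the sample size."""
    return sum(values) / len(values)

# First five mileages quoted from Table 3.1 in the text.
first_five = [30.8, 31.7, 30.1, 31.6, 32.1]
x_bar_5 = sample_mean(first_five)

# The text gives only the total of all 50 mileages, so divide it directly.
x_bar_50 = 1578 / 50

print(round(x_bar_5, 2))  # 31.26
print(x_bar_50)           # 31.56

# The point estimate exceeds the 31 mpg threshold, but that alone
# says nothing about how large the sampling error might be.
print(x_bar_50 >= 31)     # True
```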

Statistical Modelling for Business | Comparing the mean, median, and mode

Often we construct a histogram for a sample to make inferences about the shape of the sampled population. When we do this, it can be useful to “smooth out” the histogram and use the resulting relative frequency curve to describe the shape of the population. Relative frequency curves can have many shapes. Three common shapes are illustrated in Figure $3.3$. Part (a) of this figure depicts a population described by a symmetrical relative frequency curve. For such a population, the mean $(\mu)$, median $\left(M_{d}\right)$, and mode $\left(M_{o}\right)$ are all equal. Note that in this case all three of these quantities are located under the highest point of the curve. It follows that when the frequency distribution of a sample of measurements is approximately symmetrical, the sample mean, median, and mode will be nearly the same. For instance, consider the sample of 50 mileages in Table $3.1$. Because the histogram of these mileages in Figure $3.2$ is approximately symmetrical, the mean (31.56) and the median (31.55) of the mileages are approximately equal to each other.

Figure $3.3$ (b) depicts a population that is skewed to the right. Here the population mean is larger than the population median, and the population median is larger than the population mode (the mode is located under the highest point of the relative frequency curve). In this case the population mean averages in the large values in the upper tail of the distribution. Thus the population mean is more affected by these large values than is the population median. To understand this, we consider the following example.
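
As a minimal illustration of this ordering, the sketch below uses Python's standard `statistics` module on a small right-skewed data set. The data values are hypothetical and chosen only so that a few large values sit in the upper tail; they do not come from the text.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

mean = statistics.mean(data)      # pulled upward by the values 9 and 20
median = statistics.median(data)  # middle of the ordered values
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 5.1 3.0 2

# For this skewed-right data set the ordering matches the description above.
print(mean > median > mode)  # True
```

Removing the value 20 lowers the mean noticeably while leaving the median and mode unchanged, which is one way to see that the mean is the measure most affected by the upper tail.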


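Generalized Linear Models Exam Help

statistics-lab is a professional service provider for international students. For many years it has offered academic services to students in popular study-abroad destinations such as the United States, the United Kingdom, Canada, and Australia, including but not limited to essay writing, assignment writing, dissertation writing, report writing, group project writing, proposal writing, paper writing, presentation writing, programming assignment writing, paper revision and polishing, online course assistance, and exam-taking. Coverage spans every stage of overseas study, from high school through undergraduate to graduate level, and extends to finance, economics, accounting, auditing, management, and 99% of academic subjects worldwide. The writing team includes both professional native-English writers and master's and doctoral students from well-known overseas universities; every writer has strong language skills, a relevant subject background, and academic writing experience. We promise 100% original, 100% professional, 100% on time, and 100% satisfaction.

MATLAB Assignment Writing

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including building graphical user interfaces. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or Fortran. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represent the state of the art in software for matrix computation. MATLAB has evolved over many years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most MATLAB users, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and others.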

The statistical analyst’s goal should be to present the most accurate and truthful portrayal of a data set that is possible. Such a presentation allows managers using the analysis to make informed decisions. However, it is possible to construct statistical summaries that are misleading. Although we do not advocate using misleading statistics, you should be aware of some of the ways statistical graphs and charts can be manipulated in order to distort the truth. By knowing what to look for, you can avoid being misled by a (we hope) small number of unscrupulous practitioners.

As an example, suppose that the nurses at a large hospital will soon vote on a proposal to join a union. Both the union organizers and the hospital administration plan to distribute recent salary statistics to the entire nursing staff. Suppose that the mean nurses’ salary at the hospital and the mean nurses’ salary increase at the hospital (expressed as a percentage) for each of the last four years are as follows:

The hospital administration does not want the nurses to unionize and, therefore, hopes to convince the nurses that substantial progress has been made to increase salaries without a union. On the other hand, the union organizers wish to portray the salary increases as minimal so that the nurses will feel the need to unionize.

Figure $2.29$ gives two bar charts of the mean nurses’ salaries at the hospital for each of the last four years. Notice that in Figure $2.29$ (a) the administration has started the vertical scale of the bar chart at a salary of $\$58,000$ by using a scale break. Alternatively, the chart could be set up without the scale break by simply starting the vertical scale at $\$58,000$. Starting the vertical scale at a value far above zero makes the salary increases look more dramatic. Notice that when the union organizers present the bar chart in Figure $2.29$ (b), which has a vertical scale starting at zero, the salary increases look far less impressive.
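
The effect of the axis starting point can be quantified with a small sketch. The two salary values below are hypothetical, since the salary table is not reproduced here; only the $58,000 starting point comes from the text, and `apparent_ratio` is an illustrative helper name.

```python
def apparent_ratio(first, last, axis_start=0.0):
    """Ratio of the last bar's drawn height to the first bar's drawn height
    when the vertical axis begins at axis_start."""
    return (last - axis_start) / (first - axis_start)

# Hypothetical mean salaries for the first and last of the four years.
salary_first, salary_last = 60_000, 63_000

# Axis starting at zero: the last bar is only 5 percent taller.
print(apparent_ratio(salary_first, salary_last))                     # 1.05

# Axis starting at 58,000: the last bar is drawn 2.5 times as tall.
print(apparent_ratio(salary_first, salary_last, axis_start=58_000))  # 2.5
```

The underlying data are identical in both cases; only the drawn bar heights differ.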

Figure $2.30$ presents two bar charts of the mean nurses’ salary increases (in percentages) at the hospital for each of the last four years. In Figure $2.30$ (a), the administration has made the widths of the bars representing the percentage increases proportional to their heights. This makes the upward movement in the mean salary increases look more dramatic because the observer’s eye tends to compare the areas of the bars, while the improvements in the mean salary increases are really only proportional to the heights of the bars. When the union organizers present the bar chart of Figure $2.30$ (b), the improvements in the mean salary increases look less impressive because each bar has the same width.
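
A short sketch shows why widening the bars exaggerates the change. The percentage increases used here are hypothetical, and `bar_area` is an illustrative helper; the point is only that when width grows with height, the drawn area grows with the square of the height.

```python
def bar_area(height, width):
    """Area of a rectangular bar as drawn on the chart."""
    return height * width

# Hypothetical mean salary increases (percent) for two years.
h_first, h_last = 2.0, 4.0

# Equal-width bars: the area ratio equals the height ratio.
equal_width_ratio = bar_area(h_last, 1.0) / bar_area(h_first, 1.0)

# Widths proportional to heights: the area ratio is the height ratio squared.
scaled_width_ratio = bar_area(h_last, h_last) / bar_area(h_first, h_first)

print(equal_width_ratio)   # 2.0
print(scaled_width_ratio)  # 4.0
```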

Figure $2.31$ gives two time series plots of the mean nurses’ salary increases at the hospital for the last four years. In Figure $2.31$ (a) the administration has stretched the vertical axis of the graph. That is, the vertical axis is set up so that the distances between the percentages are large. This makes the upward trend of the mean salary increases appear to be steep. In Figure $2.31$ (b) the union organizers have compressed the vertical axis (that is, the distances between the percentages are small). This makes the upward trend of the mean salary increases appear to be gradual. As we will see in the exercises, stretching and compressing the horizontal axis in a time series plot can also greatly affect the impression given by the plot.

It is also possible to create totally different interpretations of the same statistical summary by simply using different labeling or captions. For example, consider the bar chart of mean nurses’ salary increases in Figure 2.30(b). To create a favorable interpretation, the hospital administration might use the caption “Salary Increase Is Higher for the Fourth Year in a Row.” On the other hand, the union organizers might create a negative impression by using the caption “Salary Increase Fails to Reach $10 \%$ for Fourth Straight Year.”

In summary, it is important to carefully study any statistical summary so that you will not be misled. Look for manipulations such as stretched or compressed axes on graphs, axes that do not begin at zero, bar charts with bars of varying widths, and biased captions. Doing these things will help you to see the truth and to make well-informed decisions.

In Section $1.5$ we said that descriptive analytics uses traditional and more recently developed graphics to present to executives (and sometimes customers) easy-to-understand visual summaries of up-to-the-minute information concerning the operational status of a business. In this section we will discuss some of the more recently developed graphics used by descriptive analytics, which include gauges, bullet graphs, treemaps, and sparklines. In addition, we will see how they are used with each other and with more traditional graphics to form analytic dashboards, which are part of executive information systems. We will also briefly discuss data discovery, which involves, in its simplest form, data drill down.

Dashboards and gauges An analytic dashboard provides a graphical presentation of the current status and historical trends of a business’s key performance indicators. The term dashboard originates from the automotive dashboard, which helps a driver monitor a car’s key functions. Figure $2.35$ shows a dashboard that graphically portrays some key performance indicators for a (fictitious) airline for last year. In the lower left-hand portion of the dashboard we see three gauges depicting the percentage utilizations of the regional, short-haul, and international fleets of the airline. In general, a gauge (chart) allows us to visualize data in a way that is similar to a real-life speedometer needle on an automobile. The outer scale of the gauge is often color coded to provide additional performance information. For example, note that the colors on the outer scale of the gauges in Figure $2.35$ range from red to dark green to light green. These colors signify percentage fleet utilizations that range from poor (less than 75 percent) to satisfactory (less than 90 percent but at least 75 percent) to good (at least 90 percent).
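
The color bands on such a gauge amount to a simple threshold rule. The sketch below encodes the cutoffs stated above (75 and 90 percent); the function name `utilization_band` and the returned tuple layout are illustrative assumptions.

```python
def utilization_band(percent):
    """Map a fleet utilization percentage to the (rating, color) band
    described in the text: poor < 75 <= satisfactory < 90 <= good."""
    if percent < 75:
        return ("poor", "red")
    if percent < 90:
        return ("satisfactory", "dark green")
    return ("good", "light green")

print(utilization_band(62))  # ('poor', 'red')
print(utilization_band(89))  # ('satisfactory', 'dark green')
print(utilization_band(90))  # ('good', 'light green')
```
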

Bullet graphs While gauge charts are nice looking, they take up considerable space and to some extent are cluttered. For example, recalling the Disney Parks Case, using seven gauge charts for the seven Epcot rides would take up considerable space and would be less efficient than using the bullet graph that we originally showed in Figure 1.8. That bullet graph (which has been obtained using Excel) is shown again in Figure 2.36.

In general, a bullet graph features a single measure (for example, predicted waiting time) and displays it as a horizontal bar (or a vertical bar) that extends into ranges representing qualitative measures of performance, such as poor, satisfactory, and good. The ranges are displayed either as different colors or as varying intensities of a single hue. Using a single hue makes them discernible by those who are color blind and restricts the use of color if the bullet graph is part of a dashboard that we don’t want to look too “busy.” Many bullet graphs compare the single primary measure to a target, or objective, which is represented by a symbol on the bullet graph. The bullet graph of Disney’s predicted waiting times uses five colors ranging from dark green to red and signifying short (0 to 20 minutes) to very long (80 to 100 minutes) predicted waiting times. This bullet graph does not compare the predicted waiting times to an objective. However, the bullet graphs located in the upper left of the dashboard in Figure $2.35$ (representing the percentages of on-time arrivals and departures for the airline) do display objectives represented by short vertical black lines. For example, consider the bullet graphs representing the percentages of on-time arrivals and departures in the Midwest, which are shown below.

Sparklines Figure $2.38$ shows sparklines depicting the monthly closing prices of five stocks over six months. In general, a sparkline, another of the new descriptive analytics graphics, is a line chart that presents the general shape of the variation (usually over time) in some measurement, such as temperature or a stock price. A sparkline is typically drawn without axes or coordinates and made small enough to be embedded in text. Sparklines are often grouped together so that comparative variations can be seen. Whereas most charts are designed to show considerable information and are set off from the main text, sparklines are intended to be succinct and located where they are discussed. Therefore, if the sparklines in Figure $2.38$ were located in the text of a report comparing stocks, they would probably be made smaller than shown in Figure $2.38$, and the actual monthly closing prices of the stocks used to obtain the sparklines would probably not be shown next to the sparklines.
Data discovery To conclude this section, we briefly discuss data discovery methods, which allow decision makers to interactively view data and make preliminary analyses. One simple version of data discovery is data drill down, which reveals more detailed data that underlie a higher-level summary. For example, an online presentation of the airline dashboard in Figure $2.35$ might allow airline executives to drill down by clicking the gauge which shows that approximately 89 percent of the airline’s international fleet was utilized over the year to see the bar chart in Figure 2.39. This bar chart depicts the quarterly percentage utilizations of the airline’s international fleet over the year. The bar chart suggests that the airline might offer international travel specials in quarter 1 (Winter) and quarter 4 (Fall) to increase international travel and thus increase the percentage of the international fleet utilized in those quarters. Drill down can be done at multiple levels. For example, Disney executives might wish to have weekly reports of total merchandise sales at its four Orlando parks, which can be drilled down to reveal total merchandise sales at each park, which can be further drilled down to reveal total merchandise sales at each “land” in each park, which can be yet further drilled down to reveal total merchandise sales in each store in each land in each park.
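
Multi-level drill down can be pictured as totals over a nested structure. The sketch below is a toy model only: the park, land, and store names and all sales figures are hypothetical placeholders, and `total` and `drill_down` are illustrative helper names.

```python
# Hypothetical weekly merchandise sales, nested as park -> land -> store.
sales = {
    "Park A": {
        "Land 1": {"Store x": 120, "Store y": 80},
        "Land 2": {"Store z": 50},
    },
    "Park B": {
        "Land 3": {"Store w": 200},
    },
}

def total(node):
    """Sum every leaf value beneath a node of the nested structure."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

def drill_down(node):
    """Reveal the totals one level below the given node."""
    return {name: total(child) for name, child in node.items()}

print(total(sales))                           # 450
print(drill_down(sales))                      # {'Park A': 250, 'Park B': 200}
print(drill_down(sales["Park A"]))            # {'Land 1': 200, 'Land 2': 50}
print(drill_down(sales["Park A"]["Land 1"]))  # {'Store x': 120, 'Store y': 80}
```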

Statistical Modelling for Business | Contingency Tables (Optional)

Statistical Modelling for Business | The Brokerage Firm Case: Studying Client Satisfaction

Previous sections in this chapter have presented methods for summarizing data for a single variable. Often, however, we wish to use statistics to study possible relationships between several variables. In this section we present a simple way to study the relationship between two variables. Crosstabulation is a process that classifies data on two dimensions. This process results in a table that is called a contingency table. Such a table consists of rows and columns: the rows classify the data according to one dimension and the columns classify the data according to a second dimension. Together, the rows and columns represent all possibilities (or contingencies).

An investment broker sells several kinds of investment products: a stock fund, a bond fund, and a tax-deferred annuity. The broker wishes to study whether client satisfaction with its products and services depends on the type of investment product purchased. To do this, 100 of the broker’s clients are randomly selected from the population of clients who have purchased shares in exactly one of the funds. The broker records the fund type purchased by each client and has one of its investment counselors personally contact the client. When contacted, the client is asked to rate his or her level of satisfaction with the purchased fund as high, medium, or low. The resulting data are given in Table 2.16.

Looking at the raw data in Table $2.16$, it is difficult to see whether the level of client satisfaction varies depending on the fund type. We can look at the data in an organized way by constructing a contingency table. A crosstabulation of fund type versus level of client satisfaction is shown in Table $2.17$. The classification categories for the two variables are defined along the left and top margins of the table. The three row labels (bond fund, stock fund, and tax-deferred annuity) define the three fund categories and are given in the left table margin. The three column labels (high, medium, and low) define the three levels of client satisfaction and are given along the top table margin. Each row and column combination, that is, each fund type and level of satisfaction combination, defines what we call a “cell” in the table. Because each of the randomly selected clients has invested in exactly one fund type and has reported exactly one level of satisfaction, each client can be placed in a particular cell in the contingency table. For example, because client number 1 in Table $2.16$ has invested in the bond fund and reports a high level of client satisfaction, client number 1 can be placed in the upper left cell of the table (the cell defined by the Bond Fund row and High Satisfaction column).
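
Placing each client in a cell is a counting exercise, sketched below with Python's `collections.Counter`. The six records are hypothetical stand-ins for the raw data in Table 2.16, which is not reproduced here; only the first record (bond fund, high satisfaction) matches what the text says about client number 1.

```python
from collections import Counter

FUNDS = ("bond", "stock", "annuity")
LEVELS = ("high", "medium", "low")

# Hypothetical raw records: one (fund type, satisfaction level) pair per client.
records = [
    ("bond", "high"),      # client number 1, as described in the text
    ("stock", "high"),
    ("annuity", "medium"),
    ("bond", "medium"),
    ("stock", "high"),
    ("annuity", "low"),
]

# Each client falls in exactly one cell, so counting the pairs fills the table.
cells = Counter(records)

# Print the contingency table with funds as rows and levels as columns.
print(f"{'':10}" + "".join(f"{level:>8}" for level in LEVELS))
for fund in FUNDS:
    print(f"{fund:10}" + "".join(f"{cells[(fund, level)]:>8}" for level in LEVELS))
```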

One good way to investigate relationships such as these is to compute row percentages and column percentages. We compute row percentages by dividing each cell’s frequency by its corresponding row total and by expressing the resulting fraction as a percentage. For instance, the row percentage for the upper left-hand cell (bond fund and high level of satisfaction) in Table $2.17$ is $(15 / 30) \times 100 \%=50 \%$. Similarly, column percentages are computed by dividing each cell’s frequency by its corresponding column total and by expressing the resulting fraction as a percentage. For example, the column percentage for the upper left-hand cell in Table $2.17$ is $(15 / 40) \times 100 \%=37.5 \%$. Table $2.18$ summarizes all of the row percentages for the different fund types in Table $2.17$. We see that each row in Table $2.18$ gives a percent frequency distribution of level of client satisfaction given a particular fund type.

For example, the first row in Table $2.18$ gives a percent frequency distribution of client satisfaction for investors who have purchased shares in the bond fund. We see that 50 percent of bond fund investors report high satisfaction, while 40 percent of these investors report medium satisfaction, and only 10 percent report low satisfaction. The other rows in Table $2.18$ provide percent frequency distributions of client satisfaction for stock fund and annuity purchasers.

All three percent frequency distributions of client satisfaction (for the bond fund, the stock fund, and the tax-deferred annuity) are illustrated using bar charts in Figure $2.23$. In this figure, the bar heights for each chart are the respective row percentages in Table $2.18$. For example, these distributions tell us that 80 percent of stock fund investors report high satisfaction, while $97.5$ percent of tax-deferred annuity purchasers report medium or low satisfaction. Looking at the entire table of row percentages (or the bar charts in Figure $2.23$), we might conclude that stock fund investors are highly satisfied, that bond fund investors are quite satisfied (but somewhat less so than stock fund investors), and that tax-deferred annuity purchasers are less satisfied than either stock fund or bond fund investors. In general, row percentages and column percentages help us to quantify relationships such as these.
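
Row and column percentages can be computed as in the sketch below. The cell counts are a reconstruction, not a copy of Table 2.17: the bond row and the "high" column follow from the percentages quoted in the text, while the medium/low split for the stock fund and annuity rows is an assumption made only so the example runs. The function names are illustrative.

```python
LEVELS = ("high", "medium", "low")

# Counts consistent with the percentages quoted in the text.
table = {
    "bond":    {"high": 15, "medium": 12, "low": 3},
    "stock":   {"high": 24, "medium": 4,  "low": 2},   # medium/low split assumed
    "annuity": {"high": 1,  "medium": 24, "low": 15},  # medium/low split assumed
}

def row_percentages(table):
    """Divide each cell by its row total and express it as a percentage."""
    result = {}
    for fund, row in table.items():
        row_total = sum(row.values())
        result[fund] = {level: 100 * row[level] / row_total for level in LEVELS}
    return result

def column_percentages(table):
    """Divide each cell by its column total and express it as a percentage."""
    col_totals = {level: sum(row[level] for row in table.values()) for level in LEVELS}
    return {
        fund: {level: 100 * row[level] / col_totals[level] for level in LEVELS}
        for fund, row in table.items()
    }

rows = row_percentages(table)
cols = column_percentages(table)

print(rows["bond"])           # {'high': 50.0, 'medium': 40.0, 'low': 10.0}
print(cols["bond"]["high"])   # 37.5
print(rows["stock"]["high"])  # 80.0
print(rows["annuity"]["medium"] + rows["annuity"]["low"])  # 97.5
```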

In the investment example, we have cross-tabulated two qualitative variables. We can also cross-tabulate a quantitative variable versus a qualitative variable or two quantitative variables against each other. If we are cross-tabulating a quantitative variable, we often define categories by using appropriate ranges. For example, if we wished to cross-tabulate level of education (grade school, high school, college, graduate school) versus income, we might define income classes $\$0$ to $\$50,000$, $\$50,001$ to $\$100,000$, $\$100,001$ to $\$150,000$, and above $\$150,000$.

Statistical Modelling for Business | Scatter Plots

We often study relationships between variables by using graphical methods. A simple graph that can be used to study the relationship between two variables is called a scatter plot. As an example, suppose that a marketing manager wishes to investigate the relationship between the sales volume (in thousands of units) of a product and the amount spent (in units of $\$10,000$) on advertising the product. To do this, the marketing manager randomly selects 10 sales regions having equal sales potential. The manager assigns a different level of advertising expenditure for January 2016 to each sales region as shown in Table $2.20$. At the end of the month, the sales volume for each region is recorded as also shown in Table $2.20$.

A scatter plot of these data is given in Figure $2.24$. To construct this plot, we place the variable advertising expenditure (denoted $x$) on the horizontal axis and we place the variable sales volume (denoted $y$) on the vertical axis. For the first sales region, advertising expenditure equals 5 and sales volume equals 89. We plot the point with coordinates $x=5$ and $y=89$ on the scatter plot to represent this sales region. Points for the other sales regions are plotted similarly. The scatter plot shows that there is a positive relationship between advertising expenditure and sales volume; that is, higher values of sales volume are associated with higher levels of advertising expenditure.

We have drawn a straight line through the plotted points of the scatter plot to represent the relationship between advertising expenditure and sales volume. We often do this when the relationship between two variables appears to be a straight line, or linear, relationship. Of course, the relationship between $x$ and $y$ in Figure $2.24$ is not perfectly linear: not all of the points in the scatter plot are exactly on the line. Nevertheless, because the relationship between $x$ and $y$ appears to be approximately linear, it seems reasonable to represent the general relationship between these variables using a straight line. In future chapters we will explain ways to quantify such a relationship, that is, to describe such a relationship numerically. Moreover, not all linear relationships between two variables $x$ and $y$ are positive linear relationships (that is, have a positive slope). For example, Table $2.21$ on the next page gives the average hourly outdoor temperature $(x)$ in a city during a week and the city’s natural gas consumption $(y)$ during the week for each of the previous eight weeks. The temperature readings are expressed in degrees Fahrenheit and the natural gas consumptions are expressed in millions of cubic feet of natural gas. The scatter plot in Figure $2.25$ shows that there is a negative linear relationship between $x$ and $y$; that is, as average hourly temperature in the city increases, the city’s natural gas consumption decreases in a linear fashion. Finally, not all relationships are linear. In Chapter 14 we will consider how to represent and quantify curved relationships, and, as illustrated in Figure $2.26$, there are situations in which two variables $x$ and $y$ do not appear to have any relationship.
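
One way to see the difference between a positive and a negative linear relationship is to compute the slope of a fitted straight line. The sketch below uses the ordinary least squares slope, which is one common choice and not necessarily how the line in Figure 2.24 was drawn. Both data sets are hypothetical except for the first advertising point (5, 89), which is quoted in the text.

```python
def least_squares_slope(x, y):
    """Slope of the least squares line through the points (x[i], y[i])."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical advertising expenditure (x) and sales volume (y).
advertising = [5, 6, 7, 8, 9]
sales_volume = [89, 92, 98, 101, 105]

# Hypothetical average hourly temperature (x) and natural gas consumption (y).
temperature = [30, 40, 50, 60]
gas_consumption = [12.0, 10.5, 9.0, 7.5]

slope_sales = least_squares_slope(advertising, sales_volume)
slope_gas = least_squares_slope(temperature, gas_consumption)

print(slope_sales > 0)  # True: sales rise with advertising
print(slope_gas < 0)    # True: gas consumption falls as temperature rises
```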

To conclude this section, recall from Chapter 1 that a time series plot (also called a runs plot) is a plot of individual process measurements versus time. This implies that a time series plot is a scatter plot, where values of a process variable are plotted on the vertical axis versus corresponding values of time on the horizontal axis.

Statistical Modelling for Business | Cumulative distributions and ogives

Another way to summarize a distribution is to construct a cumulative distribution. To do this, we use the same number of classes, the same class lengths, and the same class boundaries that we have used for the frequency distribution of a data set. However, in order to construct a cumulative frequency distribution, we record for each class the number of measurements that are less than the upper boundary of the class. To illustrate this idea, Table $2.10$ gives the cumulative frequency distribution of the payment time distribution summarized in Table $2.7$ (page 64). Columns (1) and (2) in this table give the frequency distribution of the payment times. Column (3) gives the cumulative frequency for each class. To see how these values are obtained, the cumulative frequency for the class $10<13$ is the number of payment times less than 13. This is simply the frequency for the class $10<13$, which is 3. The cumulative frequency for the class $13<16$ is the number of payment times less than 16, which is obtained by adding the frequencies for the first two classes, that is, $3+14=17$. The cumulative frequency for the class $16<19$ is the number of payment times less than 19, that is, $3+14+23=40$. We see that, in general, a cumulative frequency is obtained by summing the frequencies of all classes representing values less than the upper boundary of the class.

Column (4) gives the cumulative relative frequency for each class, which is obtained by summing the relative frequencies of all classes representing values less than the upper boundary of the class. Or, more simply, this value can be found by dividing the cumulative frequency for the class by the total number of measurements in the data set. For instance, the cumulative relative frequency for the class $19<22$ is $52 / 65=.8$. Column (5) gives the cumulative percent frequency for each class, which is obtained by summing the percent frequencies of all classes representing values less than the upper boundary of the class. More simply, this value can be found by multiplying the cumulative relative frequency of a class by 100. For instance, the cumulative percent frequency for the class $19<22$ is $.8 \times(100)=80$ percent.

As an example of interpreting Table 2.10, 60 of the 65 payment times are 24 days or less, or, equivalently, 92.31 percent of the payment times (or a fraction of .9231 of the payment times) are 24 days or less. Also, notice that the last entry in the cumulative frequency distribution is the total number of measurements (here, 65 payment times). In addition, the last entry in the cumulative relative frequency distribution is 1.0 and the last entry in the cumulative percent frequency distribution is 100%. In general, for any data set, these last entries will be, respectively, the total number of measurements, 1.0, and 100%.
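The cumulative columns described above amount to a running sum. The following is a minimal Python sketch of that idea, not the book's own procedure. It uses only the first five classes, whose frequencies follow from the figures quoted in the text (3, 14, and 23 directly, then 52 − 40 = 12 and 60 − 52 = 8), together with the sample size of 65. The variable names are illustrative assumptions.

```python
from itertools import accumulate

n = 65  # total number of payment times in the sample

# Classes "lower < upper" with the frequencies recoverable from the text.
# The last two classes (25 < 28 and 28 < 31) are left out because their
# individual frequencies are not quoted in this excerpt.
classes = [(10, 13), (13, 16), (16, 19), (19, 22), (22, 25)]
frequencies = [3, 14, 23, 12, 8]

cumulative = list(accumulate(frequencies))          # running totals
cumulative_relative = [c / n for c in cumulative]   # divide by sample size
cumulative_percent = [100 * c / n for c in cumulative]

for (lower, upper), c, r, p in zip(classes, cumulative,
                                   cumulative_relative, cumulative_percent):
    print(f"{lower} < {upper}: {c:2d}  {r:.4f}  {p:.2f}%")
```

Dividing each cumulative frequency by 65, rather than summing rounded relative frequencies, avoids accumulating rounding error.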

An ogive (pronounced “oh-jive”) is a graph of a cumulative distribution. To construct a frequency ogive, we plot a point above each upper class boundary at a height equal to the cumulative frequency of the class. We then connect the plotted points with line segments. A similar graph can be drawn using the cumulative relative frequencies or the cumulative percent frequencies. As an example, Figure 2.14 gives a percent frequency ogive of the payment times. Looking at this figure, we see that, for instance, a little more than 25 percent (actually, 26.15 percent according to Table 2.10) of the payment times are less than 16 days, while 80 percent of the payment times are less than 22 days. Also notice that we have completed the ogive by plotting an additional point at the lower boundary of the first (leftmost) class at a height equal to zero. This depicts the fact that none of the payment times is less than 10 days. Finally, the ogive graphically shows that all (100 percent) of the payment times are less than 31 days.
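The points of a percent frequency ogive can be listed before any plotting is done. The sketch below is an illustration under stated assumptions: the function name `ogive_points` is hypothetical, and the example covers only the upper boundaries 13 through 25, because the cumulative frequency at the boundary 28 is not quoted in this excerpt.

```python
def ogive_points(first_lower, upper_boundaries, cumulative_counts, n):
    """Return the (x, y) points of a percent frequency ogive.

    The first point sits at the lower boundary of the first class with
    height zero; each later point sits above an upper class boundary at
    the cumulative percent frequency of that class.
    """
    points = [(first_lower, 0.0)]
    for upper, count in zip(upper_boundaries, cumulative_counts):
        points.append((upper, 100 * count / n))
    return points


# Cumulative frequencies 3, 17, 40, 52, 60 come from the text; n = 65.
points = ogive_points(10, [13, 16, 19, 22, 25], [3, 17, 40, 52, 60], n=65)
for x, y in points:
    print(f"x = {x:2d} days, cumulative percent = {y:.2f}")
```

Connecting consecutive points with line segments, for example with a plotting library, would give the ogive itself.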

A very simple graph that can be used to summarize a data set is called a dot plot. To make a dot plot we draw a horizontal axis that spans the range of the measurements in the data set. We then place dots above the horizontal axis to represent the measurements. As an example, Figure 2.18(a) shows a dot plot of the exam scores in Table 2.8 (page 68). Remember, these are the scores for the first exam given before implementing a strict attendance policy. The horizontal axis spans exam scores from 30 to 100. Each dot above the axis represents an exam score. For instance, the two dots above the score of 90 tell us that two students received a 90 on the exam. The dot plot shows us that there are two concentrations of scores: those in the 80s and 90s and those in the 60s. Figure 2.18(b) gives a dot plot of the scores on the second exam (which was given after imposing the attendance policy). As did the percent frequency polygon for Exam 2 in Figure 2.13 (page 69), this second dot plot shows that the attendance policy eliminated the concentration of scores in the 60s.

Dot plots are useful for detecting outliers, which are unusually large or small observations that are well separated from the remaining observations. For example, the dot plot for exam 1 indicates that the score 32 seems unusually low. How we handle an outlier depends on its cause. If the outlier results from a measurement error or an error in recording or processing the data, it should be corrected. If such an outlier cannot be corrected, it should be discarded. If an outlier is not the result of an error in measuring or recording the data, its cause may reveal important information. For example, the outlying exam score of 32 convinced the author that the student needed a tutor. After working with a tutor, the student showed considerable improvement on Exam 2. A more precise way to detect outliers is presented in Section 3.3.

统计代写|商业分析作业代写Statistical Modelling for Business代考|back-to-back stem-and-leaf display

If we wish to compare two distributions, it is convenient to construct a back-to-back stem-and-leaf display. Figure 2.20 presents a back-to-back stem-and-leaf display for the previously discussed exam scores. The left side of the display summarizes the scores for the first exam. Remember, this exam was given before implementing a strict attendance policy. The right side of the display summarizes the scores for the second exam (which was given after imposing the attendance policy). Looking at the left side of the display, we see that for the first exam there are two concentrations of scores: those in the 80s and 90s and those in the 60s. The right side of the display shows that the attendance policy eliminated the concentration of scores in the 60s and illustrates that the scores on exam 2 are almost single peaked and somewhat skewed to the left.

Stem-and-leaf displays are useful for detecting outliers, which are unusually large or small observations that are well separated from the remaining observations. For example, the stem-and-leaf display for exam 1 indicates that the score 32 seems unusually low. How we handle an outlier depends on its cause. If the outlier results from a measurement error or an error in recording or processing the data, it should be corrected. If such an outlier cannot be corrected, it should be discarded. If an outlier is not the result of an error in measuring or recording the data, its cause may reveal important information. For example, the outlying exam score of 32 convinced the author that the student needed a tutor. After working with a tutor, the student showed considerable improvement on exam 2. A more precise way to detect outliers is presented in Section 3.3.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Constructing Frequency Distributions and Histograms

The procedure in the preceding box is not the only way to construct a histogram. Often, histograms are constructed more informally. For instance, it is not necessary to set the lower boundary of the first (leftmost) class equal to the smallest measurement in the data. As an example, suppose that we wish to form a histogram of the 50 gas mileages given in Table 1.7 (page 15). Examining the mileages, we see that the smallest mileage is 29.8 mpg and that the largest mileage is 33.3 mpg. Therefore, it would be convenient to begin the first (leftmost) class at 29.5 mpg and end the last (rightmost) class at 33.5 mpg. Further, it would be reasonable to use classes that are .5 mpg in length. We would then use 8 classes: $29.5<30$, $30<30.5$, $30.5<31$, $31<31.5$, $31.5<32$, $32<32.5$, $32.5<33$, and $33<33.5$. A histogram of the gas mileages employing these classes is shown in Figure 2.9.
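The eight mileage classes follow mechanically from the chosen starting point, ending point, and class length. The short Python sketch below generates them; the variable names are illustrative, and the 50 mileages of Table 1.7 are not reproduced here, so only the class boundaries are built.

```python
start, stop, width = 29.5, 33.5, 0.5   # values chosen in the text (mpg)

# Number of classes implied by the span and the class length.
num_classes = round((stop - start) / width)

# Boundaries 29.5, 30.0, ..., 33.5 (multiples of 0.5 are exact in binary).
boundaries = [start + i * width for i in range(num_classes + 1)]

# Each class contains mileages with lower <= mileage < upper.
classes = [f"{lo:g} < {hi:g}" for lo, hi in zip(boundaries, boundaries[1:])]
print(num_classes)
print(classes)
```

A list such as `boundaries` could then be supplied as explicit bin edges to a histogram routine, for example the `bins` argument of `numpy.histogram`.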

Sometimes it is desirable to let the nature of the problem determine the histogram classes. For example, to construct a histogram describing the ages of the residents in a city, it might be reasonable to use classes having 10-year lengths (that is, under 10 years, 10-19 years, 20-29 years, 30-39 years, and so on).

统计代写|商业分析作业代写Statistical Modelling for Business代考|Some common distribution shapes

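$90<100$. This is an example of letting the situation determine the classes of a frequency distribution, which is common practice when the situation naturally defines the classes. The points that form the polygon are plotted above the class midpoints (35, 45, 55, 65, 75, 85, and 95). Each point is plotted at a height equal to the percentage of exam scores in its class. For example, because 10 of the 40 scores are at least 90 and less than 100, the point corresponding to the class midpoint 95 is plotted at a height of 25 percent.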

统计代写|商业分析作业代写Statistical Modelling for Business代考|Graphically Summarizing Quantitative Data

统计代写|商业分析作业代写Statistical Modelling for Business代考|Frequency distributions and histograms

Major consulting firms such as Accenture, Ernst & Young Consulting, and Deloitte & Touche Consulting employ statistical analysis to assess the effectiveness of the systems they design for their customers. In this case a consulting firm has developed an electronic billing system for a Hamilton, Ohio, trucking company. The system sends invoices electronically to each customer’s computer and allows customers to easily check and correct errors. It is hoped that the new billing system will substantially reduce the amount of time it takes customers to make payments. Typical payment times (measured from the date on an invoice to the date payment is received) using the trucking company’s old billing system had been 39 days or more. This exceeded the industry standard payment time of 30 days.
The new billing system does not automatically compute the payment time for each invoice because there is no continuing need for this information. Therefore, in order to assess the system’s effectiveness, the consulting firm selects a random sample of 65 invoices from the 7,823 invoices processed during the first three months of the new system’s operation. The payment times for the 65 sample invoices are manually determined and are given in Table $2.4$. If this sample can be used to establish that the new billing system substantially reduces payment times, the consulting firm plans to market the system to other trucking companies.
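Selecting a random sample of 65 invoices from 7,823 can be illustrated in a few lines of Python. This is only a sketch of simple random sampling without replacement, not a description of how the consulting firm actually drew its sample. Numbering the invoices 1 through 7,823 and the fixed seed are assumptions made so the example is repeatable.

```python
import random

population_size = 7823   # invoices processed in the first three months
sample_size = 65         # invoices whose payment times are determined by hand

# Hypothetical numbering of the invoices as 1, 2, ..., 7823.
invoice_numbers = range(1, population_size + 1)

rng = random.Random(0)   # fixed seed only so the sketch is repeatable
sample = sorted(rng.sample(invoice_numbers, sample_size))

print(len(sample), "invoices selected without replacement")
```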

Looking at the payment times in Table 2.4, we can see that the shortest payment time is 10 days and that the longest payment time is 29 days. Beyond that, it is pretty difficult to interpret the data in any meaningful way. To better understand the sample of 65 payment times, the consulting firm will form a frequency distribution of the data and will graph the distribution by constructing a histogram. Similar to the frequency distributions for qualitative data we studied in Section 2.1, the frequency distribution will divide the payment times into classes and will tell us how many of the payment times are in each class.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Form Nonoverlapping Classes of Equal Width

Step 3: Form Nonoverlapping Classes of Equal Width. We can form the classes of the frequency distribution by defining the boundaries of the classes. To find the first class boundary, we find the smallest payment time in Table 2.4, which is 10 days. This value is the lower boundary of the first class. Adding the class length of 3 to this lower boundary, we obtain $10+3=13$, which is the upper boundary of the first class and the lower boundary of the second class. Similarly, the upper boundary of the second class and the lower boundary of the third class equals $13+3=16$. Continuing in this fashion, the lower boundaries of the remaining classes are 19, 22, 25, and 28. Adding the class length 3 to the lower boundary of the last class gives us the upper boundary of the last class, 31. These boundaries define seven nonoverlapping classes for the frequency distribution. We summarize these classes in Table 2.6. For instance, the first class (10 days and less than 13 days) includes the payment times 10, 11, and 12 days; the second class (13 days and less than 16 days) includes the payment times 13, 14, and 15 days; and so forth. Notice that the largest observed payment time, 29 days, is contained in the last class. In cases where the largest measurement is not contained in the last class, we simply add another class. Generally speaking, the guidelines we have given for forming classes are not inflexible rules. Rather, they are intended to help us find reasonable classes. Finally, the method we have used for forming classes results in classes of equal length. Generally, forming classes of equal length will make it easier to appropriately interpret the frequency distribution.
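The boundary arithmetic in Step 3 can be written out as a short Python sketch. The smallest payment time (10), the class length (3), and the number of classes (7) are taken from the text; the helper `class_index` is a hypothetical name introduced only for illustration.

```python
smallest = 10       # smallest payment time, lower boundary of the first class
class_length = 3    # class length used in the text
num_classes = 7     # number of nonoverlapping classes

# Lower boundaries 10, 13, ..., 28 and upper boundaries 13, 16, ..., 31.
lower = [smallest + i * class_length for i in range(num_classes)]
upper = [lo + class_length for lo in lower]


def class_index(payment_time):
    """Return the 0-based index of the class with lower <= time < upper."""
    idx = int((payment_time - smallest) // class_length)
    if not 0 <= idx < num_classes:
        raise ValueError(f"{payment_time} falls outside the {num_classes} classes")
    return idx


for lo, hi in zip(lower, upper):
    print(f"{lo} days and less than {hi} days")

print(class_index(12), class_index(13), class_index(29))
```

If a measurement raised the `ValueError`, the remedy given in the text applies: add another class.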

统计代写|商业分析作业代写Statistical Modelling for Business代考|Graph the Histogram

Step 5: Graph the Histogram We can graphically portray the distribution of payment times by drawing a histogram. The histogram can be constructed using the frequency, relative frequency, or percent frequency distribution. To set up the histogram, we draw rectangles that correspond to the classes. The base of the rectangle corresponding to a class represents the payment times in the class. The height of the rectangle can represent the class frequency, relative frequency, or percent frequency.

We have drawn a frequency histogram of the 65 payment times in Figure 2.7. The first (leftmost) rectangle, or “bar,” of the histogram represents the payment times 10, 11, and 12. Looking at Figure 2.7, we see that the base of this rectangle is drawn from the lower boundary (10) of the first class in the frequency distribution of payment times to the lower boundary (13) of the second class. The height of this rectangle tells us that the frequency of the first class is 3. The second histogram rectangle represents payment times 13, 14, and 15. Its base is drawn from the lower boundary (13) of the second class to the lower boundary (16) of the third class, and its height tells us that the frequency of the second class is 14. The other histogram bars are constructed similarly. Notice that there are no gaps between the adjacent rectangles in the histogram. Here, although the payment times have been recorded to the nearest whole day, the fact that the histogram bars touch each other emphasizes that a payment time could (in theory) be any number on the horizontal axis. In general, histograms are drawn so that adjacent bars touch each other.

Looking at the frequency distribution in Table 2.7 and the frequency histogram in Figure 2.7, we can describe the payment times:
1 None of the payment times exceeds the industry standard of 30 days. (Actually, all of the payment times are less than 30; remember that the largest payment time is 29 days.)
2 The payment times are concentrated between 13 and 24 days (57 of the 65, or $(57/65) \times 100=87.69\%$, of the payment times are in this range).

统计代写|商业分析作业代写Statistical Modelling for Business代考|Studying Pizza Preferences by Using

Part 1: Studying Pizza Preferences by Using a Frequency Distribution. Unfortunately, the raw data in Table 2.1 do not reveal much useful information about the pattern of pizza preferences. In order to summarize the data in a more useful way, we can construct a frequency distribution. To do this we simply count the number of times each of the six pizza restaurants appears in Table 2.1. We find that Bruno’s appears 8 times, Domino’s appears 2 times, Little Caesars appears 9 times, Papa John’s appears 19 times, Pizza Hut appears 4 times, and Will’s Uptown Pizza appears 8 times. The frequency distribution for the pizza preferences is given in Table 2.2 on the next page, which lists each of the six restaurants along with its corresponding count (or frequency). The frequency distribution shows us how the preferences are distributed among the six restaurants. The purpose of the frequency distribution is to make the data easier to understand. Certainly, looking at the frequency distribution in Table 2.2 is more informative than looking at the raw data in Table 2.1. We see that Papa John’s is the most popular restaurant, and that Papa John’s is roughly twice as popular as each of the next three runners-up: Bruno’s, Little Caesars, and Will’s. Finally, Pizza Hut and Domino’s are the least preferred restaurants.

When we wish to summarize the proportion (or fraction) of items in each class, we employ the relative frequency for each class. If the data set consists of $n$ observations, we define the relative frequency of a class as follows:
$$\text{Relative frequency of a class}=\frac{\text{frequency of the class}}{n}$$
This quantity is simply the fraction of items in the class. Further, we can obtain the percent frequency of a class by multiplying the relative frequency by 100.

Table 2.3 gives a relative frequency distribution and a percent frequency distribution of the pizza preference data. A relative frequency distribution is a table that lists the relative frequency for each class, and a percent frequency distribution lists the percent frequency for each class. Looking at Table 2.3, we see that the relative frequency for Bruno’s pizza is $8/50=.16$ and that (from the percent frequency distribution) 16% of the sampled students preferred Bruno’s pizza. Similarly, the relative frequency for Papa John’s pizza is $19/50=.38$ and 38% of the sampled students preferred Papa John’s pizza. Finally, the sum of the relative frequencies in the relative frequency distribution equals 1.0, and the sum of the percent frequencies in the percent frequency distribution equals 100%. These facts are true for all relative frequency and percent frequency distributions.
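The relative and percent frequencies can be recomputed directly from the six counts quoted earlier. The following Python sketch is one way to do it; the dictionary layout and variable names are illustrative choices, not part of the original tables.

```python
# Frequencies of the six restaurants, as counted from the raw data.
counts = {
    "Bruno's": 8,
    "Domino's": 2,
    "Little Caesars": 9,
    "Papa John's": 19,
    "Pizza Hut": 4,
    "Will's Uptown Pizza": 8,
}

n = sum(counts.values())  # number of observations in the data set

relative = {name: c / n for name, c in counts.items()}       # fraction per class
percent = {name: 100 * c / n for name, c in counts.items()}  # relative x 100

for name in counts:
    print(f"{name:20s} {counts[name]:3d}  {relative[name]:.2f}  {percent[name]:.0f}%")

print("sum of relative frequencies:", round(sum(relative.values()), 10))
```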

统计代写|商业分析作业代写Statistical Modelling for Business代考|Studying Pizza Preferences

Part 2: Studying Pizza Preferences by Using Bar Charts and Pie Charts A bar chart is a graphic that depicts a frequency, relative frequency, or percent frequency distribution. For example, Figure $2.1$ gives an Excel bar chart of the pizza preference data. On the horizontal axis we have placed a label for each class (restaurant), while the vertical axis measures frequencies. To construct the bar chart, Excel draws a bar (of fixed width) corresponding to each class label.

Each bar is drawn so that its height equals the frequency corresponding to its label. Because the height of each bar is a frequency, we refer to Figure 2.1 as a frequency bar chart. Notice that there are gaps between the bars. When data are qualitative, the bars should always be separated by gaps in order to indicate that each class is separate from the others. The bar chart in Figure 2.1 clearly illustrates that, for example, Papa John’s pizza is preferred by more sampled students than any other restaurant and Domino’s pizza is least preferred by the sampled students.

If desired, the bar heights can represent relative frequencies or percent frequencies. For instance, Figure 2.2 is a Minitab percent bar chart for the pizza preference data. Here the heights of the bars are the percentages given in the percent frequency distribution of Table 2.3. Lastly, the bars in Figures 2.1 and 2.2 have been positioned vertically. Because of this, these bar charts are called vertical bar charts. However, sometimes bar charts are constructed with horizontal bars and are called horizontal bar charts.

A pie chart is another graphic that can be used to depict a frequency distribution. When constructing a pie chart, we first draw a circle to represent the entire data set. We then divide the circle into sectors or “pie slices” based on the relative frequencies of the classes. For example, remembering that a circle consists of 360 degrees, Bruno’s Pizza (which has relative frequency .16) is assigned a pie slice that consists of $.16(360)=57.6$ degrees. Similarly, Papa John’s Pizza (with relative frequency .38) is assigned a pie slice of $.38(360)=136.8$ degrees. The resulting pie chart (constructed using Excel) is shown in Figure 2.3 on the next page. Here we have labeled the pie slices using the percent frequencies. The pie slices can also be labeled using frequencies or relative frequencies.
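The slice angles are just the relative frequencies scaled to 360 degrees. The sketch below applies that rule to all six restaurants, using the counts given earlier in the pizza example; the variable names are assumptions made for illustration.

```python
# Frequencies of the six restaurants from the pizza preference example.
counts = {
    "Bruno's": 8,
    "Domino's": 2,
    "Little Caesars": 9,
    "Papa John's": 19,
    "Pizza Hut": 4,
    "Will's Uptown Pizza": 8,
}
n = sum(counts.values())

# Slice angle = relative frequency x 360, computed as count * 360 / n.
degrees = {name: c * 360 / n for name, c in counts.items()}

for name, angle in degrees.items():
    print(f"{name:20s} {angle:6.1f} degrees")
```

Because the relative frequencies sum to 1.0, the slice angles sum to 360 degrees and the slices fill the circle exactly.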

统计代写|商业分析作业代写Statistical Modelling for Business代考|The Pareto chart (Optional)

Pareto charts are used to help identify important quality problems and opportunities for process improvement. By using these charts we can prioritize problem-solving activities. The Pareto chart is named for Vilfredo Pareto (1848-1923), an Italian economist. Pareto suggested that, in many economies, most of the wealth is held by a small minority of the population. It has been found that the “Pareto principle” often applies to defects. That is, only a few defect types account for most of a product’s quality problems.

To illustrate the use of Pareto charts, suppose that a jelly producer wishes to evaluate the labels being placed on 16-ounce jars of grape jelly. Every day for two weeks, all defective labels found on inspection are classified by type of defect. If a label has more than one defect, the type of defect that is most noticeable is recorded. The Excel output in Figure 2.4 presents the frequencies and percentages of the types of defects observed over the two-week period.
In general, the first step in setting up a Pareto chart summarizing data concerning types of defects (or categories) is to construct a frequency table like the one in Figure 2.4. Defects or categories should be listed at the left of the table in decreasing order by frequencies: the defect with the highest frequency will be at the top of the table, the defect with the second-highest frequency below the first, and so forth. If an “other” category is employed, it should be placed at the bottom of the table. The “other” category should not make up 50 percent or more of the total of the frequencies, and the frequency for the “other” category should not exceed the frequency for the defect at the top of the table. If the frequency for the “other” category is too high, data should be collected so that the “other” category can be broken down into new categories. Once the frequency and the percentage for each category are determined, a cumulative percentage for each category is computed. As illustrated in Figure 2.4, the cumulative percentage for a particular category is the sum of the percentages corresponding to the particular category and the categories that are above that category in the table.
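The table-building rules above can be sketched in Python as follows. The function name `pareto_table` and the defect counts in the example are hypothetical; they are made-up values for illustration and are not the jelly-label data of Figure 2.4.

```python
def pareto_table(counts, other_label="Other"):
    """Build Pareto table rows: (category, frequency, percent, cumulative percent).

    Categories are sorted by decreasing frequency, with the "other"
    category (if present) placed at the bottom of the table.
    """
    total = sum(counts.values())
    named = sorted(
        ((k, v) for k, v in counts.items() if k != other_label),
        key=lambda item: item[1],
        reverse=True,
    )
    if other_label in counts:
        other = counts[other_label]
        # Guidelines from the text for the "other" category.
        if 2 * other >= total:
            raise ValueError('"other" makes up 50 percent or more of the total')
        if named and other > named[0][1]:
            raise ValueError('"other" exceeds the most frequent category')
        named.append((other_label, other))

    rows, running = [], 0
    for label, freq in named:
        running += freq  # accumulate counts, then convert to a percentage
        rows.append((label, freq, 100 * freq / total, 100 * running / total))
    return rows


# Hypothetical defect counts, deliberately listed out of order.
example = {"Defect C": 12, "Other": 8, "Defect A": 50, "Defect B": 30}
for label, freq, pct, cum in pareto_table(example):
    print(f"{label:10s} {freq:3d} {pct:6.1f}% {cum:6.1f}%")
```

Accumulating the counts and converting to a percentage at each row keeps the last cumulative percentage at exactly 100.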

统计代写|商业分析作业代写Statistical Modelling for Business代考|Excel, MegaStat, and Minitab for Statistics

In this book we use three types of software to carry out statistical analysis: Excel 2013, MegaStat, and Minitab 17. Excel is, of course, a general purpose electronic spreadsheet program and analytical tool. The Analysis ToolPak in Excel includes many procedures for performing various kinds of basic statistical analyses. MegaStat is an add-in package that is specifically designed for performing statistical analysis in the Excel spreadsheet environment. Minitab is a computer package designed expressly for conducting statistical analysis. It is widely used at many colleges and universities and in a large number of business organizations. The principal advantage of Excel is that, because of its broad acceptance among students and professionals as a multipurpose analytical tool, it is both well-known and widely available. The advantages of a special-purpose statistical software package like Minitab are that it provides a far wider range of statistical procedures and it offers the experienced analyst a range of options to better control the analysis. The advantages of MegaStat include (1) its ability to perform a number of statistical calculations that are not automatically done by the procedures in the Excel ToolPak and (2) features that make it easier to use than Excel for a wide variety of statistical analyses. In addition, the output obtained by using MegaStat is automatically placed in a standard Excel spreadsheet and can be edited by using any of the features in Excel. MegaStat can be copied from the book’s website. Excel, MegaStat, and Minitab, through built-in functions, programming languages, and macros, offer almost limitless power. Here, we will limit our attention to procedures that are easily accessible via menus without resorting to any special programming or advanced features.

Commonly used features of Excel 2013, MegaStat, and Minitab 17 are presented in this chapter along with an initial application: the construction of a time series plot of the gas mileages in Table 1.7. You will find that the limited instructions included here, along with the built-in help features of all three software packages, will serve as a starting point from which you can discover a variety of other procedures and options. Much more detailed descriptions of Minitab 17 can be found in other sources, in particular in the manual Getting Started with Minitab 17. This manual is available as a pdf file, viewable using Adobe Acrobat Reader, on the Minitab Inc. website; go to http://www.minitab.com/en-us/support/documentation/ to download the manual. This manual is also available online at http://support.minitab.com/en-us/minitab/17/getting-started/. Similarly, there are a number of alternative reference materials for Microsoft Excel 2013. Of course, an understanding of the related statistical concepts is essential to the effective use of any statistical software package.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Getting Started with Excel

Because Excel 2013 may be new to some readers, and because the Excel 2013 window looks somewhat different from previous versions of Excel, we will begin by describing some characteristics of the Excel 2013 window. Versions of Excel prior to 2007 employed many drop-down menus. This meant that many features were “hidden” from the user, which resulted in a steep learning curve for beginners. Beginning with Excel 2010, Microsoft tried to reduce the number of features that are hidden in drop-down menus. Therefore, Excel 2013 displays all of the applicable commands needed for a particular type of task at the top of the Excel window. These commands are represented by a tab-and-group arrangement called the ribbon (see the right side of the illustration of an Excel 2013 window on the next page). The commands displayed in the ribbon are regulated by a series of tabs located near the top of the ribbon. For example, in the illustration, the Home tab is selected. If we selected a different tab, say, for example, the Page Layout tab, the commands displayed by the ribbon would be different.
We now briefly describe some basic features of the Excel 2013 window:
1 File button: By clicking on this button, the user obtains a menu of often-used commands (for example, Open, Save, Print, and so forth). This menu also provides access to a large number of Excel options settings.
2 Tabs: Clicking on a tab results in a ribbon display of features, commands, and options related to a particular type of task. For example, when the Home tab is selected (as in the figure), the features, commands, and options displayed by the ribbon are all related to making entries into the Excel worksheet. As another example, if the Formulas tab is selected, all of the features, commands, and options displayed in the ribbon relate to using formulas in the Excel worksheet.
3 Quick access toolbar: This toolbar displays buttons that provide shortcuts to often-used commands. Initially, this toolbar displays Save, Undo, and Redo buttons. The user can customize this toolbar by adding shortcut buttons for other commands (such as New, Open, Quick Print, and so forth). This can be done by clicking on the arrow button directly to the right of the Quick Access toolbar and by making selections from the “Customize” drop-down menu that appears.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Select Home : Format : Row Height

This notation indicates that we first select the Home tab on the ribbon, then we select Format from the Cells Group on the ribbon, and finally we select Row Height from the Format drop-down menu.

For many of the statistical and graphical procedures in Excel, it is necessary to provide a range of cells to specify the location of data in the spreadsheet. Generally, the range may be specified either by typing the cell locations directly into a dialog box or by dragging the selected range with the mouse. Although it is usually easier for the experienced user to select a range with the mouse, the instructions that follow will, for precision and clarity, specify ranges by typing in cell locations. The selected range may include column or variable labels, that is, labels at the tops of columns that serve to identify variables. When the selected range includes such labels, it is important to select the “Labels” check box in the analysis dialog box.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Types of survey questions

Survey instruments can use dichotomous (“yes or no”), multiple-choice, or open-ended questions. Each type of question has its benefits and drawbacks. Dichotomous questions are usually clearly stated, can be answered quickly, and yield data that are easily analyzed. However, the information gathered may be limited by this two-option format. If we limit voters to expressing support or disapproval for stem-cell research, we may not learn the nuanced reasoning that voters use in weighing the merits and moral issues involved. Similarly, in today’s heterogeneous world, it would be unusual to use a dichotomous question to categorize a person’s religious preferences. Asking whether respondents are Christian or non-Christian (or using any other two categories, such as Jewish or non-Jewish, or Muslim or non-Muslim) is certain to make some people feel their religion is being slighted. In addition, this is a crude and unenlightening way to learn about religious preferences.

Multiple-choice questions can assume several different forms. Sometimes respondents are asked to choose a response from a list (for example, possible answers to the religion question could be Jewish; Christian; Muslim; Hindu; Agnostic; or Other). Other times, respondents are asked to choose an answer from a numerical range. We could ask the question:
“In your opinion, how important are SAT scores to a college student’s success?”
Not important at all $1 \quad 2 \quad 3 \quad 4 \quad 5$ Extremely important
These numerical responses are usually summarized and reported in terms of the average response, whose size tells us something about the perceived importance. The Zagat restaurant survey (www.zagat.com) asks diners to rate restaurants’ food, décor, and service, each on a scale of 1 to 30 points, with a 30 representing an incredible level of satisfaction. Although the Zagat scale has an unusually wide range of possible ratings, the concept is the same as in the more common 5-point scale.
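As a minimal sketch of how such scaled responses might be summarized, the following Python snippet computes the average response and counts how often each scale point was chosen. The response values are made up for illustration and are not data from any survey.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses on the 1-to-5 importance scale (illustrative only)
responses = [4, 5, 3, 4, 2, 5, 4, 3]

average_response = mean(responses)   # arithmetic mean of the ratings
counts = Counter(responses)          # how often each rating was chosen

print(f"Average response: {average_response:.2f}")
for rating in range(1, 6):
    print(f"Rating {rating}: {counts.get(rating, 0)} response(s)")
```

Reporting the counts alongside the average is a design choice: two very different sets of responses can share the same average.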

Open-ended questions typically provide the most honest and complete information because there are no suggested answers to divert or bias a person’s response. This kind of question is often found on instructor evaluation forms distributed at the end of a college course. College students at Georgetown University are asked the open-ended question, “What comments would you give to the instructor?” The responses provide the instructor feedback that may be missing from the initial part of the teaching evaluation survey, which consists of numerical multiple-choice ratings of various aspects of the course. While these numerical ratings can be used to compare instructors and courses, there are no easy comparisons of the diverse responses instructors receive to the open-ended question. In fact, these responses are often seen only by the instructor and are useful, constructive tools for the teacher despite the fact they cannot be readily summarized.

Survey questionnaires must be carefully constructed so they do not inadvertently bias the results. Because survey design is such a difficult and sensitive process, it is not uncommon for a pilot survey to be taken before a lot of time, effort, and financing go into collecting a large amount of data. Pilot surveys are similar to the beta version of a new electronic product; they are tested out with a smaller group of people to work out the “kinks” before being used on a larger scale. Determination of the sample size for the final survey is an important process for many reasons. If the sample size is too large, resources may be wasted during the data collection. On the other hand, not collecting enough data for a meaningful analysis will obviously be detrimental to the study. Fortunately, there are several formulas that will help decide how large a sample should be, depending on the goal of the study and various other factors.
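The passage does not say which sample-size formulas it has in mind. As one illustration, a commonly used formula for estimating a population proportion to within a margin of error $E$ at a given confidence level is $n = (z/E)^2\,p(1-p)$, where $z$ is the normal quantile for the confidence level and $p$ is a planning guess for the proportion. The sketch below assumes this formula; the function name and the default values ($p = 0.5$, 95 percent confidence) are choices made for the example.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, p_guess=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin_of_error (assumed formula)."""
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided normal quantile, e.g. about 1.96 for 95 percent confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z / margin_of_error) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)  # round up so the margin of error is not exceeded

n_required = sample_size_for_proportion(0.03)
print(n_required)
```

Using `p_guess=0.5` is the conservative choice, because $p(1-p)$ is largest at 0.5 and so yields the largest required sample.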

统计代写|商业分析作业代写Statistical Modelling for Business代考|Types of surveys

There are several different survey types, and we will explore just a few of them. The phone survey is particularly well-known (and often despised). A phone survey is inexpensive and usually conducted by callers who have very little training. Because of this and the impersonal nature of the medium, the respondent may misunderstand some of the questions. A further drawback is that some people cannot be reached and that others may refuse to answer some or all of the questions. Phone surveys are thus particularly prone to have a low response rate.
The response rate is the proportion of all people whom we attempt to contact that actually respond to a survey. A low response rate can destroy the validity of a survey’s results.
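The definition above translates directly into a small calculation. The function name and the counts used below are hypothetical and serve only to illustrate the arithmetic.

```python
def response_rate(num_responded, num_contacted):
    """Proportion of the people we attempted to contact who actually responded."""
    if num_contacted <= 0:
        raise ValueError("num_contacted must be positive")
    if not 0 <= num_responded <= num_contacted:
        raise ValueError("num_responded must be between 0 and num_contacted")
    return num_responded / num_contacted

# Hypothetical counts: 1,000 contact attempts, 250 completed surveys
rate = response_rate(250, 1000)
print(f"Response rate: {rate:.0%}")
```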
It can be difficult to collect good data from unsolicited phone calls because many of us resent the interruption. The calls often come at inopportune times, intruding on a meal or arriving just when we have climbed a ladder with a full can of paint. No wonder we may fantasize about turning the tables on the callers and calling them when it is least convenient.

Numerous complaints have been filed with the Federal Trade Commission (FTC) about the glut of marketing and survey telephone calls to private residences. The National Do Not Call Registry was created as the culmination of a comprehensive, three-year review of the Telemarketing Sales Rule (TSR) (www.ftc.gov/donotcall/). This legislation allows people to enroll their phone numbers on a website so as to prevent most marketers from calling them.
Self-administered surveys, or mail surveys, are also very inexpensive to conduct. However, these also have their drawbacks. Often, recipients will choose not to reply unless they receive some kind of financial incentive or other reward. Generally, after an initial mailing, the response rate will fall between 20 and 30 percent. Response rates can be raised with successive follow-up reminders, and after three contacts, they might reach between 65 and 75 percent. Unfortunately, the entire process can take significantly longer than a phone survey would.

Web-based surveys have become increasingly popular, but they suffer from the same problems as mail surveys. In addition, as with phone surveys, respondents may record their true reactions incorrectly because they have misunderstood some of the questions posed.
A personal interview provides more control over the survey process. People selected for interviews are more likely to respond because the questions are being asked by someone face-to-face. Questions are less likely to be misunderstood because the people conducting the interviews are typically trained employees who can clear up any confusion arising during the process. On the other hand, interviewers can potentially “lead” a respondent by body language which signals approval or disapproval of certain sorts of answers. They can also prompt certain replies by providing too much information. Mall surveys are examples of personal interviews. Interviewers approach shoppers as they pass by and ask them to answer the survey questions. Response rates around 50 percent are typical. Personal interviews are more costly than mail or phone surveys. Obviously, the objective of the study will be important in deciding upon the survey type employed.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Errors of observation

As discussed in Section 1.4, the opinions of those who bother to complete a voluntary response survey may be dramatically different from those who do not. (Recall the Ann Landers question about having children.) The viewer voting on the popular television show American Idol is another illustration of selection bias, because only those who are interested in the outcome of the show will bother to phone in or text message their votes. The results of the voting are not representative of the performance ratings the country would give as a whole.
Errors of observation occur when data values are recorded incorrectly. Such errors can be caused by the data collector (the interviewer), the survey instrument, the respondent, or the data collection process. For instance, the manner in which a question is asked can influence the response. Or, the order in which questions appear on a questionnaire can influence the survey results. Or, the data collection method (telephone interview, questionnaire, personal interview, or direct observation) can influence the results. A recording error occurs when either the respondent or interviewer incorrectly marks an answer. Once data are collected from a survey, the results are often entered into a computer for statistical analysis. When transferring data from a survey form to a program like Excel, Minitab, or MegaStat, there is potential for entering them incorrectly. Before the survey is administered, the questions need to be very carefully worded so that there is little chance of misinterpretation. A poorly framed question might yield results that lead to unwarranted decisions. Scaled questions are particularly susceptible to this type of error. Consider the question “How would you rate this course?” Without a proper explanation, the respondent may not know whether “1” or “5” is the best.

If the survey instrument contains highly sensitive questions and respondents feel compelled to answer, they may not tell the truth. This is especially true in personal interviews. We then have what is called response bias. A surprising number of people are reluctant to be candid about what they like to read or watch on television. People tend to overreport “good” activities like reading respected newspapers and underreport their “bad” activities like delighting in the National Enquirer’s stories of alien abductions and celebrity meltdowns. Imagine, then, the difficulty in getting honest answers about people’s gambling habits, drug use, or sexual histories. Response bias can also occur when respondents are asked slanted questions whose wording influences the answer received. For example, consider the following question:
Which of the following best describes your views on gun control?
1 The government should take away our guns, leaving us defenseless against heavily armed criminals.
2 We have the right to keep and bear arms.
This question is biased toward eliciting a response against gun control.

统计代写|商业分析作业代写Statistical Modelling for Business代考|Predictive analytics, data mining

Predictive analytics are methods used to find anomalies, patterns, and associations in data sets, with the purpose of predicting future outcomes. Predictive analytics and data mining are terms that are sometimes used together, but data mining might more specifically be defined to be the use of predictive analytics, computer science algorithms, and information systems techniques to extract useful knowledge from huge amounts of data. It is estimated that for any data mining project, approximately 65 percent to 90 percent of the time is spent in data preparation: checking, correcting, reconciling inconsistencies in, and otherwise “cleaning” the data. Also, whereas predictive analytics methods might be most useful to decision makers when used with data mining, these methods can also be important, as we will see, when analyzing smaller data sets. Prescriptive analytics looks at internal and external variables and constraints, along with the predictions obtained from predictive analytics, to recommend one or more courses of action. In this book, other than intuitively using predictions from predictive analytics to suggest business improvement courses of action, we will not discuss prescriptive analytics. Therefore, returning to predictive analytics, we can roughly classify the applications of predictive analytics into six categories:
Anomaly (outlier) detection In a data set, predictive analytics can be used to get a picture of what the data tends to look like in a typical case and to determine if an observation is notably different (or outlying) from this pattern. For example, a sales manager could model the sales results of typical salespeople and use anomaly detection to identify specific salespeople who have unusually high or low sales results. Or the IRS could model typical tax returns and use anomaly detection to identify specific returns that are extremely atypical for review and possible audit.
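The sales-manager example can be sketched with a very simple rule: model the typical case by the mean and standard deviation, then flag anyone whose result lies more than a chosen number of standard deviations from the mean. This z-score rule, the threshold of 2, and the sales figures are all assumptions made for illustration; the text does not prescribe a particular detection method.

```python
from statistics import mean, stdev

# Hypothetical quarterly sales (in $1,000s) for ten salespeople
sales = {
    "A": 102, "B": 98, "C": 105, "D": 97, "E": 101,
    "F": 99, "G": 103, "H": 100, "I": 160, "J": 95,
}

def flag_anomalies(values_by_name, threshold=2.0):
    """Return names whose value is more than `threshold` sample SDs from the mean."""
    center = mean(values_by_name.values())
    spread = stdev(values_by_name.values())  # sample standard deviation
    return [
        name
        for name, value in values_by_name.items()
        if abs(value - center) / spread > threshold
    ]

flagged = flag_anomalies(sales)
print(flagged)
```

A limitation of this sketch is that an extreme value inflates the mean and standard deviation it is judged against, so rules based on the median are often preferred when several outliers are expected.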
Association learning This involves identifying items that tend to co-occur and finding the rules that describe their co-occurrence. For example, a supermarket chain once found that men who buy baby diapers on Thursdays also tend to buy beer on Thursdays (possibly in anticipation of watching sports on television over the weekend). This led the chain to display beer near the baby aisle in its stores. As another example, Netflix might find that customers who rent fictional dramas also tend to rent historical documentaries or that some customers will rent almost any type of movie that stars a particular actor or actress. Disney might find that visitors who spend more time at the Magic Kingdom also tend to buy Disney cartoon character clothing. Disney might also find that visitors who stay in more luxurious Disney hotels also tend to play golf on Disney courses and take cruises on the Disney Cruise Line. These types of findings are used for targeting coupons, deals, or advertising to the right potential customers.
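One common way to quantify a co-occurrence rule such as “diapers, then beer” is with three measures: support (the share of all baskets containing both items), confidence (the share of diaper baskets that also contain beer), and lift (confidence divided by the overall share of beer baskets). The sketch below assumes these standard measures; the baskets are invented for the example, and the function assumes each item appears in at least one basket.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    n_antecedent = sum(1 for basket in baskets if antecedent in basket)
    n_consequent = sum(1 for basket in baskets if consequent in basket)
    n_both = sum(
        1 for basket in baskets if antecedent in basket and consequent in basket
    )
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift

# Hypothetical shopping baskets (illustrative only)
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"beer"},
]

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(f"support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent.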

统计代写|商业分析作业代写Statistical Modelling for Business代考|Ratio, Interval, Ordinal

In Section 1.1 we said that a variable is quantitative if its possible values are numbers that represent quantities (that is, “how much” or “how many”). In general, a quantitative variable is measured on a scale having a fixed unit of measurement between its possible values. For example, if we measure employees’ salaries to the nearest dollar, then one dollar is the fixed unit of measurement between different employees’ salaries. There are two types of quantitative variables: ratio and interval. A ratio variable is a quantitative variable measured on a scale such that ratios of its values are meaningful and there is an inherently defined zero value. Variables such as salary, height, weight, time, and distance are ratio variables. For example, a distance of zero miles is “no distance at all,” and a town that is 30 miles away is “twice as far” as a town that is 15 miles away.

An interval variable is a quantitative variable where ratios of its values are not meaningful and there is not an inherently defined zero value. Temperature (on the Fahrenheit scale) is an interval variable. For example, zero degrees Fahrenheit does not represent “no heat at all,” just that it is very cold. Thus, there is no inherently defined zero value. Furthermore, ratios of temperatures are not meaningful. For example, it makes no sense to say that $60^{\circ}$ is twice as warm as $30^{\circ}$. In practice, there are very few interval variables other than temperature. Almost all quantitative variables are ratio variables.
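One way to see why ratios are meaningful for distance but not for Fahrenheit temperature is to change the unit of measurement: a meaningful ratio should not depend on the unit. The following sketch, using the standard mile-to-kilometer and Fahrenheit-to-Celsius conversions, checks the two examples from the text.

```python
KM_PER_MILE = 1.609344

def fahrenheit_to_celsius(degrees_f):
    return (degrees_f - 32) * 5 / 9

# Ratio scale: 30 miles vs. 15 miles
ratio_miles = 30 / 15
ratio_km = (30 * KM_PER_MILE) / (15 * KM_PER_MILE)

# Interval scale: 60 degrees F vs. 30 degrees F
ratio_fahrenheit = 60 / 30
ratio_celsius = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

print(f"Distance ratio: {ratio_miles:.1f} in miles, {ratio_km:.1f} in km")
print(f"Temperature ratio: {ratio_fahrenheit:.1f} in F, {ratio_celsius:.1f} in C")
```

The distance ratio is the same in both units, whereas the temperature ratio changes (and even changes sign) because the zero points of the two temperature scales are arbitrary.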

In Section 1.1 we also said that if we simply record into which of several categories a population (or sample) unit falls, then the variable is qualitative (or categorical). There are two types of qualitative variables: ordinal and nominative. An ordinal variable is a qualitative variable for which there is a meaningful ordering, or ranking, of the categories. The measurements of an ordinal variable may be nonnumerical or numerical. For example, a student may be asked to rate the teaching effectiveness of a college professor as excellent, good, average, poor, or unsatisfactory. Here, one category is higher than the next one; that is, “excellent” is a higher rating than “good,” “good” is a higher rating than “average,” and so on. Therefore, teaching effectiveness is an ordinal variable having nonnumerical measurements. On the other hand, if (as is often done) we substitute the numbers 4, 3, 2, 1, and 0 for the ratings excellent through unsatisfactory, then teaching effectiveness is an ordinal variable having numerical measurements.

In practice, both numbers and associated words are often presented to respondents asked to rate a person or item. When numbers are used, statisticians debate whether the ordinal variable is “somewhat quantitative.” For example, statisticians who claim that teaching effectiveness rated as 4, 3, 2, 1, or 0 is not somewhat quantitative argue that the difference between 4 (excellent) and 3 (good) may not be the same as the difference between 3 (good) and 2 (average). Other statisticians argue that as soon as respondents (students) see equally spaced numbers (even though the numbers are described by words), their responses are affected enough to make the variable (teaching effectiveness) somewhat quantitative. Generally speaking, the specific words associated with the numbers probably substantially affect whether an ordinal variable may be considered somewhat quantitative. It is important to note, however, that in practice numerical ordinal ratings are often analyzed as though they are quantitative. Specifically, various arithmetic operations (as discussed in Chapters 2 through 18) are often performed on numerical ordinal ratings. For example, a professor’s teaching effectiveness average and a student’s grade point average are calculated.
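The coding described above, with excellent through unsatisfactory mapped to 4 through 0, and the resulting teaching effectiveness average can be sketched as follows. The list of ratings is hypothetical. Note that taking the average treats the ordinal codes as if they were quantitative, which is exactly the practice the preceding paragraph says is debated.

```python
from statistics import mean

# Numerical codes for the ordinal categories, as described in the text
RATING_CODES = {
    "excellent": 4,
    "good": 3,
    "average": 2,
    "poor": 1,
    "unsatisfactory": 0,
}

# Hypothetical ratings given by six students (illustrative only)
ratings = ["excellent", "good", "good", "average", "excellent", "poor"]

codes = [RATING_CODES[rating] for rating in ratings]
effectiveness_average = mean(codes)  # treats equal code gaps as equal differences

print(codes)
print(f"Teaching effectiveness average: {effectiveness_average:.2f}")
```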

To conclude this section, we consider the second type of qualitative variable. A nominative variable is a qualitative variable for which there is no meaningful ordering, or ranking, of the categories. A person’s gender, the color of a car, and an employee’s state of residence are nominative variables.

It is wise to stratify when the population consists of two or more groups that differ with respect to the variable of interest. For instance, consumers could be divided into strata based on gender, age, ethnic group, or income.

As an example, suppose that a department store chain proposes to open a new store in a location that would serve customers who live in a geographical region that consists of (1) an industrial city, (2) a suburban community, and (3) a rural area. In order to assess the potential profitability of the proposed store, the chain wishes to study the incomes of all households in the region. In addition, the chain wishes to estimate the proportion and the total number of households whose members would be likely to shop at the store. The department store chain feels that the industrial city, the suburban community, and the rural area differ with respect to income and the store’s potential desirability. Therefore, it uses these subpopulations as strata and takes a stratified random sample.

Taking a stratified sample can be advantageous because such a sample takes advantage of the fact that elements in the same stratum are similar to each other. It follows that a stratified sample can provide more accurate information than a random sample of the same size. As a simple example, if all of the elements in each stratum were exactly the same, then examining only one element in each stratum would allow us to describe the entire population. Furthermore, stratification can make a sample easier (or possible) to select. Recall that, in order to take a random sample, we must have a list, or frame, of all of the population elements. Although a frame might not exist for the overall population, a frame might exist for each stratum. For example, suppose nearly all the households in the department store’s geographical region have telephones. Although there might not be a telephone directory for the overall geographical region, there might be separate telephone directories for the industrial city, the suburb, and the rural area. For more discussion of stratified random sampling, see Mendenhall, Schaeffer, and Ott (1986).
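A stratified random sample of the department store’s region could be drawn along the following lines: take a separate random sample from the frame of each stratum. The household frames, the stratum sizes, and the choice of sampling the same fraction from every stratum (proportional allocation) are assumptions made for this sketch; the text does not specify how many households to take from each stratum.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    sample = {}
    for name, frame in strata.items():
        size = max(1, round(fraction * len(frame)))  # at least one per stratum
        sample[name] = rng.sample(frame, size)       # sampling without replacement
    return sample

# Hypothetical frames (lists of household IDs) for the three strata
strata = {
    "industrial city": [f"city-{i}" for i in range(600)],
    "suburban community": [f"suburb-{i}" for i in range(300)],
    "rural area": [f"rural-{i}" for i in range(100)],
}

sample = stratified_sample(strata, fraction=0.05)
for name, households in sample.items():
    print(name, len(households))
```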
Sometimes it is advantageous to select a sample in stages. This is a common practice when selecting a sample from a very large geographical region. In such a case, a frame often does not exist. For instance, there is no single list of all registered voters in the United States. There is also no single list of all households in the United States. In this kind of situation, we can use multistage cluster sampling. To illustrate this procedure, suppose we wish to take a sample of registered voters from all registered voters in the United States. We might proceed as follows:
Stage 1: Randomly select a sample of counties from all of the counties in the United States.
Stage 2: Randomly select a sample of townships from each county selected in Stage 1.
Stage 3: Randomly select a sample of voting precincts from each township selected in Stage 2.
Stage 4: Randomly select a sample of registered voters from each voting precinct selected in Stage 3.
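The four stages above can be sketched as nested random selections. The toy frame below (5 counties, each with 3 townships, each with 4 precincts of 20 voters) and the number of units chosen at each stage are hypothetical; in practice a list is needed only for the units selected at the previous stage, which is the point of sampling in stages.

```python
import random

def multistage_sample(counties, n_counties, n_townships, n_precincts, n_voters, seed=0):
    """Select counties, then townships, then precincts, then voters, at random."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    voters = []
    for county in rng.sample(sorted(counties), n_counties):                # Stage 1
        townships = counties[county]
        for township in rng.sample(sorted(townships), n_townships):        # Stage 2
            precincts = townships[township]
            for precinct in rng.sample(sorted(precincts), n_precincts):    # Stage 3
                voters.extend(rng.sample(precincts[precinct], n_voters))   # Stage 4
    return voters

# Hypothetical nested frame: county -> township -> precinct -> list of voters
counties = {
    f"county-{c}": {
        f"township-{c}-{t}": {
            f"precinct-{c}-{t}-{p}": [f"voter-{c}-{t}-{p}-{v}" for v in range(20)]
            for p in range(4)
        }
        for t in range(3)
    }
    for c in range(5)
}

voter_sample = multistage_sample(
    counties, n_counties=2, n_townships=2, n_precincts=2, n_voters=5
)
print(len(voter_sample))
```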
