
• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## Statistical Modelling for Business | The Data Component

A common starting point for a discussion about data is that they are facts. There is a huge philosophical literature on what a fact is. As noted by Mulligan and Correia (2020), facts are the opposite of theories and values and “are to be distinguished from things, in particular from complex objects, complexes and wholes, and from relations.” Without getting into this philosophy of facts, I will hold that a fact is a checkable or provable entity and, therefore, true. For example, it is true that Washington, D.C. is the capital of the United States: it is easily checkable and can be shown to be true. It is also a fact that $1+1=2$. This is checkable by simply counting one finger on your left hand and one finger on your right hand.

You could have a lot of facts on a topic but they are of little value if they are not

1. organized,
2. subsetted,
3. manipulated, and
4. interpreted
in a meaningful way to provide insight for a recommendation for an action, the action being the problem solution. Otherwise, the facts are just a collection of valueless things. Their value stems from what you can do with them.

Organizing data, or facts, is a first step in any analytical process and the drive for information. This could involve arranging them in chronological order (e.g., by date and time of a transaction), spatial order (e.g., countries in the Northern and Southern Hemispheres), alphanumeric order, size order, and so on in an infinite number of ways. Transactions data, for example, are facts about units sold of a series of products including what products, who bought them, when they were sold, the amount sold, and prices. They are typically maintained in a file without a discernible order: just product, date, and units. There is no insight or intelligence from this data. In fact, it is somewhat randomly organized based on when orders were placed. If sorted by product and date, however, then they are organized and useful, but only modestly so. The best organization is the one most applicable for a practical problem.

In Data Science, statistics, econometrics, machine learning, and other quantitative areas, a common organizational form is a rectangular array consisting of rows and columns: the rows are typically objects and the columns are variables or features. An object can be a person (e.g., a customer) or an event (e.g., a transaction). The words object, case, individual, event, observation, and instance are often used interchangeably and I will certainly do this. For the methods considered in this book, each row holds exactly one case and each case occupies exactly one row.
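
As a rough illustration of the organizing step described above, the following sketch sorts a handful of transaction records by product and then by date. The records, field names, and the `organize` helper are all hypothetical and made up for this example; it uses only the Python standard library rather than any particular data-analysis package.

```python
from datetime import date

# Hypothetical transaction records: one case (a transaction) per row,
# one variable per key. All values are invented for illustration.
transactions = [
    {"product": "B", "date": date(2023, 3, 2), "units": 5},
    {"product": "A", "date": date(2023, 3, 5), "units": 2},
    {"product": "A", "date": date(2023, 3, 1), "units": 7},
    {"product": "B", "date": date(2023, 3, 1), "units": 1},
]


def organize(rows):
    """Return a new list of rows sorted by product, then by date."""
    return sorted(rows, key=lambda r: (r["product"], r["date"]))


organized = organize(transactions)
for row in organized:
    print(row["product"], row["date"].isoformat(), row["units"])
```

The original list is left in arrival order; `organize` returns a sorted copy, so the same facts can be rearranged in other ways for other problems.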

## Statistical Modelling for Business | The Extractor Component

Finally, you have to apply some methods or procedures to your DataFrame to extract information. Refer back to Fig. 1.2 for the role and position of an Extractor function in the information chain. This whole book is concerned with these methods. The interpretation of the results to give meaning to the information will be illustrated as I develop and discuss the methods, but the final interpretation is up to you based on your problem, your domain knowledge, and your expertise.

Due to the size and complexity of modern business data sets, the amount and type of information hidden inside them is large, to say the least. There is no one piece of information, no one size fits all, for all business problems. The same data set can be used for multiple problems and can yield multiple types of information. The possibilities are endless. The information content, however, is a function of the size and complexity of the DataFrame you eventually work with.

The size is the number of data elements. Since a DataFrame is a rectangular array, the size is #rows $\times$ #columns elements, and both dimensions are given by its shape attribute. Shape is expressed as a tuple written as (rows, columns). For example, it could be $(5, 2)$ for a DataFrame with 5 rows and 2 columns, and hence 10 elements. The complexity is the types of data in the DataFrame and is difficult to quantify except to count types. They could be floating point numbers (or, simply, floats), integers (or ints), Booleans, text strings (referred to as objects), datetime values, and more.

The larger and more complex the DataFrame, the more information you can extract. Let $I$ = information, $S$ = size, and $C$ = complexity. Then $I=f(S, C)$ with $\partial I / \partial S>0$ and $\partial I / \partial C>0$. For a very large, complex DataFrame, there is a very large amount of information.
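
The size and complexity measures just described can be checked on a small example. The sketch below assumes the DataFrame in question is a pandas DataFrame (the text does not name the library in this excerpt); the column names and values are hypothetical, and counting distinct data types is only the crude complexity measure the text suggests.

```python
import pandas as pd

# Hypothetical DataFrame with 5 rows and 2 columns; names and values are made up.
df = pd.DataFrame(
    {
        "units": [3, 1, 4, 1, 5],                  # integers
        "price": [9.99, 4.50, 2.25, 4.50, 7.00],   # floats
    }
)

n_rows, n_cols = df.shape        # shape is the tuple (rows, columns)
n_elements = df.size             # number of data elements: rows x columns
n_types = len(set(df.dtypes))    # crude complexity: count of distinct data types

print(df.shape, n_elements, n_types)
```

Here `shape` reports the two dimensions, while the element count is available directly as `size` or as the product of the two entries of `shape`.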

The cost of extracting information increases directly with the size and complexity of the DataFrame. If I have 10 sales values, then my data set is small and simple. Minimal information, such as the mean, standard deviation, and range, can be extracted. The cost of extraction is low; just a hand-held calculator is needed. If I have 10 GB of data, then more can be done but at a greater cost. For data sizes approaching exabytes, the costs are monumental.
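
To make the small-data case concrete, here is a minimal sketch that extracts the mean, standard deviation, and range from ten sales values. The values are invented for illustration, and the sample (rather than population) standard deviation is an assumption, since the text does not say which is meant.

```python
import statistics

# Ten hypothetical sales values, made up for illustration.
sales = [12, 15, 11, 18, 20, 14, 13, 17, 16, 14]

mean_sales = statistics.mean(sales)
sd_sales = statistics.stdev(sales)       # sample standard deviation (assumed)
range_sales = max(sales) - min(sales)    # range = largest minus smallest value

print(mean_sales, round(sd_sales, 2), range_sales)
```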

There could be an infinite amount of information, even contradictory information, in a large and complex DataFrame. So, when something is extracted, you have to check its accuracy. For example, suppose you extract information that classifies customers by risk of default on extended credit. This particular classification may not be a good or correct one; that is, the predictive classifier (PC) may not be the best one for determining whether someone is a credit risk or not. Predictive Error Analysis (PEA) is needed to determine how well the PC worked. I discuss this in Chap. 11. In that discussion, I will use a distinction between a training data set and a testing data set for building the classifier and testing it. This means the entire DataFrame will not, and should not, be used for a particular problem. It should be divided into the two parts, although that division is not always clear, or even feasible. I will discuss the split into training and testing data sets in Chap. 9.
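
The division into training and testing data sets mentioned above can be sketched in a few lines. This is only an illustrative random split written with the Python standard library; the `train_test_split` helper here, the 25% test fraction, and the fixed seed are assumptions for the example, not the procedure the book develops in Chap. 9.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (training, testing) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                     # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)     # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical customer IDs standing in for the rows of a DataFrame.
customers = list(range(20))
train, test = train_test_split(customers)
print(len(train), len(test))
```

Every row lands in exactly one of the two parts, so the classifier is never tested on a case it was built on.
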
The complexity of the DataFrame is, as I mentioned above, dependent on the types of data. Generally speaking, there are two forms: text and numeric. Other forms such as images and audio are possible but I will restrict myself to these two forms. I have to discuss these data types so that you know the possibilities and their importance and implications. How the two are handled within a DataFrame and with what statistical, econometric, and machine learning tools for the extraction of information is my focus in this book and so I will deal with them in depth in succeeding chapters. I will first discuss text data and then numeric data in the next two subsections.

## Statistical Modelling for Business | Uncertainty vs. Risk

Uncertainty is a fact of life reflecting our lack of knowledge. It is either spatial (“I don’t know what is happening in Congress today.”) or temporal (“I don’t know what will happen to sales next year.”). In either case, the lack of knowledge is about the state of the world (SOW): what is happening in Congress and what will happen next year. Business textbooks such as Freund and Williams (1969), Spurr and Bonini (1968), and Hildebrand et al. (2005) typically discuss assigning a probability to the different SOWs that you could list. The purpose of these probabilities is to enable you to say something about the world before that something materializes. Somehow, and it is never explained how, you assign numeric values representing outcomes, or payoffs, to the SOWs. The probabilities and associated payoffs are used to calculate an expected or average payoff over all the possible SOWs.

Consider, for example, the rate of return on an investment (ROI) in a capital expansion project. The ROI might depend on the average annual growth of real GDP for the next 5 years. Suppose the real GDP growth is simply expressed as declining (i.e., a recession), flat (0%), slow (1% to 2%), and robust (>2%) with assigned probabilities of 0.05, 0.20, 0.50, and 0.25, respectively. These form a probability distribution. Let $p_i$ be the probability that state $i$ is realized. Then $\sum_{i=1}^n p_i=1.0$ for these $n=4$ possible states. I show the SOWs, probabilities, and ROI values in Table 1.1. The expected ROI is $\sum_{i=1}^4 p_i \times \mathrm{ROI}_i=2.15\%$. This is the amount expected to be earned on average over the next 5 years.
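
The expected-payoff calculation above is a probability-weighted sum, which a short sketch can make explicit. The four probabilities come from the text, but Table 1.1 is not reproduced in this excerpt, so the ROI values below are hypothetical: they were picked only so that the arithmetic lands on the 2.15% quoted above and should not be read as the table's actual entries.

```python
# States of the world and their probabilities, as given in the text.
states = ["declining", "flat", "slow", "robust"]
probs = [0.05, 0.20, 0.50, 0.25]

# HYPOTHETICAL ROI values (in percent) for each state; NOT taken from Table 1.1.
roi = [-2.0, 0.0, 2.0, 5.0]

# The probabilities must form a distribution: they sum to 1.
assert abs(sum(probs) - 1.0) < 1e-9

# Expected ROI = sum over states of p_i * ROI_i.
expected_roi = sum(p * r for p, r in zip(probs, roi))
print(f"Expected ROI: {expected_roi:.2f}%")
```

Changing any probability or payoff changes the expected value, which is why the assigned numbers matter so much for the decision.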

Savage (1972, p. 9) notes that the “world” in the statement “state of the world” is defined for the problem at hand and that you should not take it literally. It is a fluid concept. He states that it is “the object about which the person is concerned.” At the same time, the “state” of the world is a full description of its conditions. Savage (1972) notes that it is “a description of the world, leaving no relevant aspects undescribed.” But he also notes that there is a true state, a “state that does in fact obtain, i.e., the true description of the world.” Unfortunately, it is unknown, and so the best we can do until it is realized or revealed to us is assign probabilities to the occurrence of each state for decision making. These are the probabilities in Table 1.1. More importantly, the problem is that the true state is unknown and never will be known until it is revealed. No amount of information will ever completely and perfectly reveal this true state before it occurs.

## Statistical Modelling for Business | The Data-Information Nexus

To an extent, discussing definitions and terminology is useful for the advancement of scientific and practical solutions for any problem. If you cannot agree on basic terms, then you are doomed at worst and hindered at best from making any progress toward a solution, a decision. You can, however, become so involved in defining terms and so overly concerned about terminology that nothing else matters. Popper argued, perhaps too strongly, that

> One should never quarrel about words, and never get involved in questions of terminology … What we are really interested in, our real problem, … are problems of theories and their truth.

Popper, a philosopher of science, was concerned about scientific problems. The same sentiment, however, holds for practical problems like the ones you face daily in your business. Despite Popper’s preeminence, you still need some perspective on the foundational units that drive the raison d’être of BDA: data and information. If information is so important for reducing uncertainty, then a logical question to ask is: “What is information?” A subordinate, but equally important, question is:

The words information and data are used as synonyms in everyday conversations. It is not uncommon, for example, to hear a business manager claim in one instance that she has a lot of data and then say in the next instance that she has a lot of information, thus linking the two words to have the same meaning. In fact, the computer systems that manage data are referred to as Information Systems (IS) and the associated technology used in those systems is referred to as Information Technology (IT). The C-Level executive in charge of this data and IT infrastructure is the Chief Information Officer (CIO). Notice the repeated use of the word “information.”

Even though people use these two words interchangeably, it does not mean they have the same meaning. It is my contention, along with others, that data and information are distinct yet connected terms. I will simply state that data are facts, objects that are true on their face, that have to be organized and manipulated to yield insight into something previously unknown. When organized and manipulated, they become information. The organization cannot be without the manipulation and the manipulation cannot be without the organization. The IT group of your business organizes your company’s data but it does not manipulate it to be information. The information is latent, hidden inside the data, and must be extracted so it can be used in a decision. I illustrate this connection in Fig. 1.2. I will comment on each component in the next few sections.

What types of business problems warrant BDA? The types are too numerous to mention, but to give a sense of them consider a few examples:

• Anomaly Detection: production surveillance, predictive maintenance, manufacturing yield optimization;
• Fraud detection;
• Identity theft;
• Account and transaction anomalies;
• Customer analytics:
• Customer Relationship Management (CRM);
• Churn analysis and prevention;
• Customer Satisfaction;
• Marketing cross-sell and up-sell;
• Pricing: leakage monitoring, promotional effects tracking, competitive price responses;
• Fulfillment: management and pipeline tracking;
• Competitive monitoring;
• Competitive Environment Analysis (CEA); and
• New Product Development.
And the list goes on, and on.
A decision of some type is required for all these problems. New product development best exemplifies a complex decision process. Decisions are made throughout a product development pipeline. This is a series of stages from ideation or conceptualization to product launch and post-launch tracking. Paczkowski (2020) identifies five stages for a pipeline: ideation, design, testing, launch, and post-launch tracking. Decisions are made between each stage whether to proceed to the next one or abort development or even production. Each decision point is marked by a business case analysis that examines the expected revenue and market share for the product. Expected sales, anticipated price points (which are refined as the product moves through the pipeline), production and marketing cost estimates, and competitive analyses that include current products, sales, pricing, and promotions plus competitive responses to the proposed new product, are all needed for each business case assessment. If any of these has a negative implication for the concept, then it will be canceled and removed from the pipeline. Information is needed for each business case check point.

The expected revenue and market share are refined for each business case analysis as new and better information (not data) becomes available for the items I listed above. More data do become available, of course, as the product is developed, but it is the analysis of that data, based on methods described in this book, that provides the information needed to approve or not approve the advancement of the concept to the next stage in the pipeline. The first decision, for example, is simply to begin developing a new product. Someone has to say “Yes” to the question “Should we develop a new product?” The business case analysis provides that decision maker with the information for this initial “Go/No Go” decision. Similar decisions are made at other stages.

Another example is product pricing. This is actually a two-fold decision involving a structure (e.g., uniform pricing or price discrimination to mention two possibilities) and a level within the structure. These decisions are made throughout the product life cycle beginning at the development stage (the launch stage of the pipeline I discussed above) and then throughout the post-launch period until the product is ultimately removed from the market. The wrong price structure and/or level could cost your business lost profit, lost market share, or a lost business. See Paczkowski (2018) for a discussion of the role of pricing and the types of analysis for identifying the best price structure and level. Also see Paczkowski (2020) for new product development pricing at each stage of the pipeline.

## Statistical Modelling for Business | The Role of Information in Business Decision Making

Decisions are effective if they solve a problem, such as those I discussed above, and aid rather than hinder your business in succeeding in the market. I will assume your business succeeds if it earns a profit and has a positive return for its owners (shareholders, partners, employees in an employee-owned company) or a sole owner. Information could be about

• current sales;
• future sales;
• the state of the market;
• consumer, social, and technology trends and developments;
• customer needs and wants;
• customer willingness-to-pay;
• key customer segments;
• financial developments;
• supply chain developments; and
• the size of customer churn.
This information is input into decisions and, like any input, if it is bad, then the decisions will be bad. Basically, the GIGO Principle (Garbage In, Garbage Out) holds. This should be obvious and almost trite. Unfortunately, when you make your decision you do not know if your information is good or bad, or even sufficient. You face uncertainty due to the amount and quality of the information you have available.

Without any information you would just be guessing, and guessing is costly. In Fig. 1.1, I illustrate what happens to the cost of decisions based on the amount of information you have. Without any information, all your decisions are based on pure guesses, hunches, so you are forced to approximate their effect. The approximation could be very naive, based on gut instinct (i.e., an unfounded belief that you know everything) or what happened yesterday or in another business similar to yours (i.e., an analog business).

The cost of these approximations in terms of financial losses, lost market share, or outright bankruptcy can be very high. As the amount of information increases, however, you will have more insight so your approximations (i.e., guesses) improve and the cost of approximations declines. This is exactly what happens during the business case process I described above. More and better information helps the decision makers at each business case stage. The approximations could now be based on trends, statistically significant estimates of impact, or model-based what-if analyses. These are not “data”; they are information.
