## EECS484


## MOVING FROM 3 × 5 CARDS TO COMPUTERS

Let us return to our example of a merchant who maintained a customer file on $3 \times 5$ cards. As time passed, the customer base grew and the merchant desired to keep more information about customers. From a data-processing standpoint, we would say the enhancement techniques for storage and retrieval led to better-organized cards, more fields, and perhaps better ways to store and find individual records.

Some questions arise: Were customer records kept in name-alphabetical order? Were the records stored by telephone number or record number (which might also be a customer number)? What happens if a field not on existing forms or cards were required? If data is added or changed, how much will the record formats change? Such were data-processing dilemmas of the past.

When computers began to be used for businesses, data was stored on magnetic media. The magnetic media were mostly disks and tapes. The way data was stored and retrieved on a computer started out like the $3 \times 5$ cards, but the magnetic data was virtual. It did not physically exist where you could touch it or see it without some kind of software to load and find records. Further, a display device to see what the “$3 \times 5$ card” had on it was required. Prior to about 1975, the most common way data was fed into a computer was via punched cards. Punched card systems for handling data were in use as early as the 1930s; sorters were capable of scanning and arranging a pile of cards. Using punched cards to input data into computers was common in the 1960s because it was known technology. The output or “display device” was typically a line printer.
As data was placed on a computer, software was developed to handle the data and filing techniques evolved. In the very early days of databases, the files kept on computers basically replicated the $3 \times 5$ cards. There were many problems with computers and databases in the “early days.” (Generally, early days in terms of computers and databases means roughly early-to-mid 1960s.) Some problems involved input (how the data got into the computer), output (how the data was to be displayed), and file maintenance (how the data was to be stored and kept up to date, how records were to be added and deleted, and how fields were to be added, deleted, or changed). A person using a computer for keeping track of data could buy a computer and hire programmers, computer operators, and data entry personnel.

## DATABASE MODELS

We now take a look back at database models as they were before the relational database was practical. The look back shows why the “old systems” are considered obsolete and why the relational model is the de facto standard in databases today. The old systems were classified as two main database models: hierarchical and network. These two models were the backbone of database software before the 1980s. Although these legacy systems might be considered “old fashioned,” there are some systems still in use today dependent on these models.

In this section, we present some versions of the hierarchical model for several reasons:
(a) To illustrate how older models were constructed from file systems
(b) To show why these file-based databases became outdated when relational databases became practical
(c) To see the evolution of file-based systems
The file systems discussed below are actual ways some database systems were written prior to the availability of relational database. The point here is to illustrate the good and bad points of older database systems and to show why relational database was and is such an improvement in database design and use.

In hierarchical database models, all data are arranged in a top-down fashion in which some records have one or more “dependent” or “child” records, and each child record is tied to one and only one “parent.” The parent-child relationship is not meant to imply a human familial relationship. The terms parent and child are historical and are meant to conjure up a picture of one type of data as dependent on another. Another terminology for the parent-child relationship is owner and objects owned, but parent-child terminology is more common. As is illustrated here, the “child” records will be sports played by a “parent” person.

We begin with an example of a hierarchical file situation. Suppose you have a database of people who play a sport at some location. Suppose we have a person, Brenda, who plays tennis at city courts and who plays golf at the municipal links. The person, Brenda, would be at the top of the hierarchy, and the sport location would be in the second tier. Usually, the connection between the layers in the hierarchy is a parent-child relationship. Each parent-person may be related to many child sport locations, but each sport location (each child record) is tied back to the one person (one parent record) who plays that particular sport. A way to store this hierarchical database could be to have two files, one file for person, one file for sport locations. For the two-file model to make sense (i.e., to have the files “related” and hence be a database), there would have to be pointers or references of some kind from one file to the other.
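The two-file arrangement described above can be sketched in a few lines of Python. This is an illustrative model only, not the layout of any particular hierarchical system: the “pointer” from child to parent is simply the parent’s record number stored in each child record.

```python
# A minimal sketch of a hierarchical two-file database.
# Parent file: one record per person, keyed by record number.
person_file = {
    1: {"name": "Brenda"},
}

# Child file: one record per sport location, each tied to exactly one parent
# via the parent's record number (the "pointer").
sport_file = [
    {"parent": 1, "sport": "tennis", "location": "city courts"},
    {"parent": 1, "sport": "golf", "location": "municipal links"},
]

def sports_for(person_id):
    """Scan the child file, following pointers back to one parent record."""
    return [(r["sport"], r["location"]) for r in sport_file
            if r["parent"] == person_id]

print(sports_for(1))  # Brenda's two child records
```

Note the asymmetry this model bakes in: walking from parent to children requires a scan of the child file, and a child can never point at more than one parent.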


## CS6400


## ENTITY-RELATIONSHIP DIAGRAMS

This text concentrates on steps 1 through 3 of the software life cycle for databases. A database is a collection of related data. The concept of related data means a database stores information about one enterprise: a business, an organization, a grouping of related people or processes. For example, a database might contain data about Acme Plumbing and involve customers and service calls. A different database might be about the members and activities of a church group in town. It would be inappropriate to have data about the church group and Acme Plumbing in the same database because the two organizations are not related. Again, a database is a collection of related data. To keep a database about each of the above entities is fine, but not in the same database.

Database systems are often modeled using an entity-relationship (ER) diagram as the blueprint from which the actual database is created; the finalized blueprint is the output of the design phase. The ER diagram is an analyst’s tool to diagram the data to be stored in a database system. Phase 1, the requirements phase, can be quite frustrating as the analyst has to elicit needs and wants from the user. The user may or may not be “computer savvy” and may or may not know the capabilities of a software system. The analyst often has a difficult time deciphering a user’s needs and wants to create a specification that (a) makes sense to both parties (user and analyst) and (b) allows the analyst to design efficiently.

In the real world, the user and the analyst may each be committees of professionals, but users (or user groups) must convey their ideas to an analyst (or team of analysts). Users must express what they want and what they think they need; analysts must elicit these wants and needs, document them, and create a plan to realize the user’s requirements.

User descriptions may seem vague and unstructured. Typically, users are successful at a business. They know the business; they understand the business model. The computer person is typically ignorant of the business but understands the computer end of the problem. To the computer-oriented person, the user’s description of the business is as new to the analyst as the computer jargon is to the user. We present a methodology designed to make the analyst’s language precise so the user is comfortable with the to-be-designed database but still provides the analyst with a tool to facilitate mapping directly into the database.

In brief, next we review the early steps in the SE life cycle as it applies to database design.

## FILES, RECORDS, AND DATA ITEMS

Data must be stored in an orderly fashion in a file of some kind to be useful. Suppose there were no computers: think back to a time when all files were paper documents for a business to keep track of its customers and products. A doctor’s office kept track of patients. A sports team kept statistics on its players. In these cases, data was recorded on paper and likely kept in a filing cabinet. The files with data in them could be referred to as a “database.” A database is most simply a repository of data about some specific entity. A customer file might be as plain and minimal as a list of people who did business with a merchant. There are two aspects to filing: storage and retrieval. Some method of storing data to facilitate retrieval is most desirable.

In a file of customer records, the whole file might be called the customer file, whereas the individual customer’s information is kept in a customer record. Files consist of records. More than likely, more information than a list of just customer’s names would be recorded. At the very least, a customer’s name, address, and phone number could constitute a customer record. Each of these components of the record is called a data item or field. The customer file contains customer records consisting of fields of data.
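The file → record → field hierarchy can be pictured directly in code. The sketch below uses invented customer data purely for illustration:

```python
# A file is a collection of records; each record is a set of named fields.
# The customer names and addresses here are hypothetical.
customer_file = [
    {"record_no": 1, "name": "A. Customer", "address": "12 Elm St", "city": "Dayton"},
    {"record_no": 2, "name": "B. Customer", "address": "34 Oak Ave", "city": "Xenia"},
]

def find_customer(name):
    """Retrieval: scan the file for the record whose name field matches."""
    return next((r for r in customer_file if r["name"] == name), None)

print(find_customer("B. Customer")["city"])  # Xenia
```

Retrieval here is a linear scan, which is exactly the weakness of a pile of $3 \times 5$ cards; the storage-and-retrieval questions below are about doing better than this.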

Table 2.1 presents an example of some data (you can imagine each line as a $3 \times 5$ card, with the three cards [three records] making up a file).
This file contains three records with one record for each customer. The records each consist of four fields: record number, name, address, and city. As more customers are added, their data will be recorded on a new $3 \times 5$ card (a new record) and placed in the customer file. Several interesting questions and observations arise for the merchant keeping this information:

1. The merchant may well want to add information, such as a telephone number, in the future. Would you add a phone number to all $3 \times 5$ cards, or would the adding be done “as necessary”? If it were done “as necessary,” then some customers would have telephone numbers, and some would not. If a customer had no phone number on the record, then the phone number for that customer would be “null.” (We use the term “null” to mean “unknown.”)
2. How will the file be organized? Imagine not three customers, but 300 or 3,000. Would the $3 \times 5$ cards be put in alphabetical order? Perhaps, but what happens if you get another A. McDonald or S.


## CMU15-445


## BUILDING A DATABASE

How do we construct a database? Suppose you were asked to put together a database of items one keeps in a pantry. How would you go about doing this? You might grab a piece of paper and begin listing items you see. When you are done, you should have a database of items in the pantry. Simple enough: you have a collection of related data. But take this a step further: Is this a good database? Was your approach to database construction a good methodology? The answer to these questions depends in part on why and how you constructed the list and who will use the list and for what. Also, will whoever uses the database be able to find a fact easily? If you are more methodical, you might first ask yourself how best to construct this database before you grab the paper and begin a list of items. A bit of pre-thinking will save time in the long run because you plan how the list is to be used and by whom.

When dealing with software and computer-related activity like databases, there exists a science of “how to” called software engineering (SE). SE is a process of specifying systems and writing software. To design a good database, we will use some ideas from SE.

In this chapter, we present a brief description of SE as it pertains to planning our database. After this background/overview of SE, we explore database models and in particular the relational database model. While there are historically many kinds of database models, most of the databases in use today use a model known as “relational database.” Our focus in this book is to put forward a methodology based on SE to design a sound relational database.

## WHAT IS THE SOFTWARE ENGINEERING PROCESS

The term software engineering refers to a process of specifying, designing, writing, delivering, maintaining, and finally retiring software. Software engineers often refer to the “life cycle” of software; software has a beginning and an ending. There are many excellent references on the topic of SE. Some are referenced at the end of this chapter.

Some authors use the term software engineering synonymously with “systems analysis and design,” but the underlying point is that any information system requires some process to develop it correctly. SE spans a wide range of information system tasks. The task we are primarily interested in here is specifying and designing a database. “Specifying a database” means documenting what the database is supposed to contain and how to go about the overall design task itself.

A basic idea in SE is to build software correctly; a series of steps or phases is required to progress through a “life cycle.” These steps ensure that a process of thinking precedes action: thinking through “what is needed” precedes “what software is written.” Further, the “thinking before action” necessitates that all parties involved in software development understand and communicate with one another. A common version of presenting the “thinking before acting” scenario may be called a “waterfall” model; the software development process is supposed to flow in a directional way without retracing. Like a waterfall, once a decision point is passed, it is at best difficult to back up and revisit it.


## Time Series Analysis


## Date, Datetime, and Time Manipulations

Dates and times come in a wide variety of formats, depending on the data source. We often need or want to transform the raw data format for our output, or to perform calculations to arrive at new dates or parts of dates. For example, the data set might contain transaction timestamps, but the goal of the analysis is to trend monthly sales. At other times, we might want to know how many days or months have elapsed since a particular event. Fortunately, SQL has powerful functions and formatting capabilities that can transform just about any raw input to almost any output we might need for analysis.

In this section, I’ll show you how to convert between time zones, and then I’ll go into depth on formatting dates and datetimes. Next, I’ll explore date math and time manipulations, including those that make use of intervals. An interval is a data type that holds a span of time, such as a number of months, days, or hours. Although data can be stored in a database table as an interval type, in practice I rarely see this done, so I will talk about intervals alongside the date and time functions that you can use them with. Last, I’ll discuss some special considerations when joining or otherwise combining data from different sources.
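Interval-style date math can be tried out even without a server database. The sketch below uses SQLite’s date modifiers via Python’s `sqlite3` as an analogue; in PostgreSQL the equivalent would be `date '2020-06-01' + interval '3 months'` (the dates here are invented for the demo):

```python
import sqlite3

# SQLite has no interval type, but its date() modifiers express the same
# idea: shift a date by a span of months.
conn = sqlite3.connect(":memory:")
three_months_later, = conn.execute(
    "SELECT date('2020-06-01', '+3 months')"
).fetchone()
print(three_months_later)  # 2020-09-01
```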

## Time Zone Conversions

Understanding the standard time zone used in a data set can prevent misunderstandings and mistakes further into the analysis process. Time zones split the world into north-south regions that observe the same time. Time zones allow different parts of the world to have similar clock times for daytime and nighttime, so, for example, the sun is roughly overhead at 12 p.m. wherever you are in the world. The zones follow irregular boundaries that are as much political as geographic. Most are one hour apart, but some are offset by only 30 or 45 minutes, and so there are more than 30 time zones spanning the globe. Many countries that are distant from the equator observe daylight savings time for part of the year as well, but there are exceptions, such as in the United States and Australia, where some states observe daylight savings time and others do not. Each time zone has a standard abbreviation, such as PST for Pacific Standard Time and PDT for Pacific Daylight Time.

Many databases are set to Coordinated Universal Time (UTC), the global standard used to regulate clocks, and record events in this time zone. It replaced Greenwich Mean Time (GMT), which you might still see if your data comes from an older database. UTC does not have daylight savings time, so it stays consistent all year long. This turns out to be quite useful for analysis. I remember one time a panicked product manager asked me to figure out why sales on a particular Sunday dropped so much compared to the prior Sunday. I spent hours writing queries and investigating possible causes before eventually figuring out that our data was recorded in Pacific Time (PT). Daylight savings started early Sunday morning, the database clock moved ahead 1 hour, and the day had only 23 hours instead of 24, and thus sales appeared to drop. Half a year later we had a corresponding 25-hour day, when sales appeared unusually high.
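The 23-hour day in that story is easy to reproduce. A sketch using Python’s standard `zoneinfo` module (assuming the system tz database is installed): two UTC timestamps two hours apart on the 2021 US spring-forward day land three clock-hours apart in Pacific Time, because the 2–3 a.m. hour never happened.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

pt = ZoneInfo("America/Los_Angeles")
# Two UTC instants 2 hours apart on 2021-03-14, the US spring-forward day.
before = datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc).astimezone(pt)
after = datetime(2021, 3, 14, 11, 30, tzinfo=timezone.utc).astimezone(pt)

print(before.strftime("%H:%M %Z"))  # 01:30 PST (UTC-8)
print(after.strftime("%H:%M %Z"))   # 04:30 PDT (UTC-7)
```

Data recorded in UTC avoids this wrinkle entirely, which is why UTC-clocked databases are so convenient for analysis.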

## Date and Timestamp Format Conversions

Dates and timestamps are key to time series analysis. Due to the wide variety of ways in which dates and times can be represented in source data, it is almost inevitable that you will need to convert date formats at some point. In this section, I’ll cover several of the most common conversions and how to accomplish them with SQL: changing the data type, extracting parts of a date or timestamp, and creating a date or timestamp from parts. I’ll begin by introducing some handy functions that return the current date and/or time.

Returning the current date or time is a common analysis task, for example, to include a timestamp in the result set or to use in date math, covered in the next section. The current date and time are referred to as system time, and while returning them is easy to do with SQL, there are some syntax differences between databases.

To return the current date, some databases have a current_date function, with no parentheses:
```sql
SELECT current_date;
```
There is a wider variety of functions to return the current date and time. Check your database’s documentation or just experiment by typing into a SQL window to see whether a function returns a value or an error. The functions with parentheses do not take arguments, but it is important to include the parentheses:
```sql
current_timestamp
localtimestamp
get_date()
now()
```
Finally, there are functions to return only the timestamp portion of the current system time. Again, consult documentation or experiment to figure out which function(s) to use with your database:
```sql
current_time
localtime
timeofday()
```
SQL has a number of functions for changing the format of dates and times. To reduce the granularity of a timestamp, use the date_trunc function. The first argument is a text value indicating the time period level to which to truncate the timestamp in the second argument. The result is a timestamp value:
```sql
date_trunc(text, timestamp)
```

```sql
SELECT date_trunc('month', '2020-10-04 12:33:35'::timestamp);

date_trunc
-------------------
2020-10-01 00:00:00
```
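The `date_trunc` call above is PostgreSQL syntax. Databases without it can get the same month-level truncation by rebuilding the timestamp with the day and time forced to their minimums; a sketch in SQLite (run through Python’s `sqlite3`), using the same timestamp as above:

```python
import sqlite3

# Month-level truncation without date_trunc: strftime passes literal
# characters through, so '01 00:00:00' pins day and time to their minimums.
conn = sqlite3.connect(":memory:")
truncated, = conn.execute(
    "SELECT strftime('%Y-%m-01 00:00:00', '2020-10-04 12:33:35')"
).fetchone()
print(truncated)  # 2020-10-01 00:00:00
```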




## Detecting Duplicates

A duplicate is when you have two (or more) rows with the same information. Duplicates can exist for any number of reasons. A mistake might have been made during data entry, if there is some manual step. A tracking call might have fired twice. A processing step might have run multiple times. You might have created it accidentally with a hidden many-to-many JOIN. However they come to be, duplicates can really throw a wrench in your analysis. I can recall times early in my career when I thought I had a great finding, only to have a product manager point out that my sales figure was twice the actual sales. It’s embarrassing, it erodes trust, and it requires rework and sometimes painstaking reviews of the code to find the problem. I’ve learned to check for duplicates as I go.

Fortunately, it’s relatively easy to find duplicates in our data. One way is to inspect a sample, with all columns ordered:
```sql
SELECT column_a, column_b, column_c...
FROM table
ORDER BY 1,2,3...
;
```

This will reveal whether the data is full of duplicates, for example, when looking at a brand-new data set, when you suspect that a process is generating duplicates, or after a possible Cartesian JOIN. If there are only a few duplicates, they might not show up in the sample. And scrolling through data to try to spot duplicates is taxing on your eyes and brain. A more systematic way to find duplicates is to SELECT the columns and then count the rows (this might look familiar from the discussion of histograms!):
```sql
SELECT count(*)
FROM
(
    SELECT column_a, column_b, column_c..., count(*) as records
    FROM...
    GROUP BY 1,2,3...
) a
WHERE records > 1
;
```

This will tell you whether there are any cases of duplicates. If the query returns 0, you’re good to go. For more detail, you can list out the number of records (2, 3, 4, etc.):

```sql
SELECT records, count(*)
FROM
(
    SELECT column_a, column_b, column_c..., count(*) as records
    FROM...
    GROUP BY 1,2,3...
) a
WHERE records > 1
GROUP BY 1
;
```
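The duplicate check can be exercised end to end against SQLite from Python; the table and its rows below are invented for the demo:

```python
import sqlite3

# Build a tiny table with one duplicated row, then count how many
# (customer, amount) combinations appear more than once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("bob", 20), ("alice", 10)],  # alice's row is duplicated
)

dup_groups, = conn.execute("""
    SELECT count(*) FROM (
        SELECT customer, amount, count(*) as records
        FROM orders
        GROUP BY 1, 2
    ) a
    WHERE records > 1
""").fetchone()
print(dup_groups)  # 1 duplicated combination
```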

## Deduplication with GROUP BY and DISTINCT

Duplicates happen, and they’re not always a result of bad data. For example, imagine we want to find a list of all the customers who have successfully completed a transaction so we can send them a coupon for their next order. We might JOIN the customers table to the transactions table, which would restrict the records returned to only those customers that appear in the transactions table:
```sql
SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;
```
This will return a row for each customer for each transaction, however, and there are hopefully at least a few customers who have transacted more than once. We have accidentally created duplicates, not because there is any underlying data quality problem but because we haven’t taken care to avoid duplication in the results. Fortunately, there are several ways to avoid this with SQL. One way to remove duplicates is to use the keyword DISTINCT:
```sql
SELECT distinct a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;
```
Another option is to use a GROUP BY, which, although typically seen in connection with an aggregation, will also deduplicate in the same way as DISTINCT. I remember the first time I saw a colleague use GROUP BY without an aggregation to dedupe; I didn’t even realize it was possible. I find it somewhat less intuitive than DISTINCT, but the result is the same:
```sql
SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
GROUP BY 1,2,3
;
```
Another useful technique is to perform an aggregation that returns one row per entity. Although technically not deduping, it has a similar effect. For example, if we have a number of transactions by the same customer and need to return one record per customer, we could find the min (first) and/or the max (most recent) transaction_date:

```sql
SELECT customer_id
,min(transaction_date) as first_transaction_date
,max(transaction_date) as last_transaction_date
,count(*) as total_orders
FROM table
GROUP BY customer_id
;
```
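The JOIN fan-out and its DISTINCT fix can both be seen in a small SQLite session from Python (table contents invented for the demo):

```python
import sqlite3

# One customer with two transactions: the plain JOIN returns two rows,
# DISTINCT collapses them back to one row per customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT, customer_name TEXT)")
conn.execute("CREATE TABLE transactions (customer_id INT, transaction_date TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Brenda')")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "2020-01-05"), (1, "2020-02-09")],
)

raw = conn.execute("""
    SELECT a.customer_id, a.customer_name
    FROM customers a JOIN transactions b ON a.customer_id = b.customer_id
""").fetchall()
deduped = conn.execute("""
    SELECT DISTINCT a.customer_id, a.customer_name
    FROM customers a JOIN transactions b ON a.customer_id = b.customer_id
""").fetchall()
print(len(raw), len(deduped))  # 2 1
```

Swapping DISTINCT for `GROUP BY 1, 2` yields the same single row.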
Duplicate data, or data that contains multiple records per entity even if they technically are not duplicates, is one of the most common reasons for incorrect query results. You can suspect duplicates as the cause if all of a sudden the number of customers or total sales returned by a query is many times greater than what you were expecting. Fortunately, there are several techniques that can be applied to prevent this from occurring.
Another common problem is missing data, which we’ll turn to next.

## Cleaning Data with CASE Transformations

CASE statements can be used to perform a variety of cleaning, enrichment, and summarization tasks. Sometimes the data exists and is accurate, but it would be more useful for analysis if values were standardized or grouped into categories. The structure of CASE statements was presented earlier in this chapter, in the section on binning.
Nonstandard values occur for a variety of reasons. Values might come from different systems with slightly different lists of choices, system code might have changed, options might have been presented to the customer in different languages, or the customer might have been able to fill out the value rather than pick from a list.

Imagine a field containing information about the gender of a person. Values indicating a female person exist as “F,” “female,” and “femme.” We can standardize the values like this:
CASE when gender = 'F' then 'Female'
     when gender = 'female' then 'Female'
     when gender = 'femme' then 'Female'
     else gender
     end as gender_cleaned
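As a runnable sketch of this standardization (using SQLite via Python's sqlite3, with an invented customers table), the CASE maps every nonstandard spelling to a single value and passes everything else through unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (gender TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("F",), ("female",), ("femme",), ("M",)],
)

# All three nonstandard spellings map to 'Female'; other values pass through
rows = conn.execute(
    """
    SELECT gender
    ,CASE when gender = 'F' then 'Female'
          when gender = 'female' then 'Female'
          when gender = 'femme' then 'Female'
          else gender
          end as gender_cleaned
    FROM customers
    """
).fetchall()
print(dict(rows))
```

Note that the ELSE branch keeps unrecognized values intact rather than discarding them, which is usually the safer default when cleaning.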
CASE statements can also be used to add categorization or enrichment that does not exist in the original data. As an example, many organizations use a Net Promoter Score, or NPS, to monitor customer sentiment. NPS surveys ask respondents to rate, on a scale of 0 to 10, how likely they are to recommend a company or product to a friend or colleague. Scores of 0 to 6 are considered detractors, 7 and 8 are passive, and 9 and 10 are promoters. The final score is calculated by subtracting the percentage of detractors from the percentage of promoters. Survey result data sets usually include optional free text comments and are sometimes enriched with information the organization knows about the person surveyed. Given a data set of NPS survey responses, the first step is to group the responses into the categories of detractor, passive, and promoter:
SELECT response_id
,likelihood
,case when likelihood <= 6 then 'Detractor'
      when likelihood <= 8 then 'Passive'
      else 'Promoter'
      end as response_type
FROM nps_responses
;
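The categorization, and the final score computed from it, can be sketched end to end with SQLite via Python's sqlite3 (the nps_responses rows here are invented; the score formula is the standard promoter-percentage minus detractor-percentage described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nps_responses (response_id INTEGER, likelihood INTEGER)")
conn.executemany(
    "INSERT INTO nps_responses VALUES (?, ?)",
    [(1, 10), (2, 9), (3, 7), (4, 3), (5, 8)],
)

# Bucket each response into detractor / passive / promoter
rows = conn.execute(
    """
    SELECT response_id
    ,likelihood
    ,case when likelihood <= 6 then 'Detractor'
          when likelihood <= 8 then 'Passive'
          else 'Promoter'
          end as response_type
    FROM nps_responses
    ORDER BY response_id
    """
).fetchall()

# Net Promoter Score: percentage of promoters minus percentage of detractors
nps = conn.execute(
    """
    SELECT 100.0 * sum(case when likelihood >= 9 then 1 else 0 end) / count(*)
         - 100.0 * sum(case when likelihood <= 6 then 1 else 0 end) / count(*)
    FROM nps_responses
    """
).fetchone()[0]
print(rows)
print(nps)  # 20.0
```

With two promoters (40%) and one detractor (20%) out of five responses, the score is 40 − 20 = 20.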

## 计算机代写|数据库作业代写SQL代考|Detecting Duplicates

One way to find duplicates is to select the columns of interest and sort by all of them, so that identical rows line up next to each other for inspection:

SELECT column_a, column_b, column_c...
FROM table
ORDER BY 1,2,3...
;

A more systematic way is to group by all of the columns and count the rows in each group, then wrap that query in an outer query that checks whether any group contains more than one record:

SELECT count(*)
FROM
(
SELECT column_a, column_b, column_c..., count(*) as records
FROM ...
GROUP BY 1,2,3...
) a
WHERE records > 1
;

This will tell you whether there are any cases of duplication. If the query returns 0, you’re good to go. For more detail, you can list out the number of records (2, 3, 4, etc.):

SELECT records, count(*)
FROM
(
SELECT column_a, column_b, column_c..., count(*) as records
FROM ...
GROUP BY 1,2,3...
) a
WHERE records > 1
GROUP BY 1
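These checks can be sketched against a small table using SQLite via Python's sqlite3 (table name and rows invented for illustration); the first query counts how many distinct rows are duplicated, and the second breaks that down by duplication level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_a TEXT, column_b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("x", 1), ("x", 1), ("x", 1), ("y", 2)])

# How many distinct rows appear more than once?
dup_groups = conn.execute(
    """
    SELECT count(*)
    FROM
    (
        SELECT column_a, column_b, count(*) as records
        FROM t
        GROUP BY 1,2
    ) a
    WHERE records > 1
    """
).fetchone()[0]

# Break the duplicates down by how many copies of each row exist
detail = conn.execute(
    """
    SELECT records, count(*)
    FROM
    (
        SELECT column_a, column_b, count(*) as records
        FROM t
        GROUP BY 1,2
    ) a
    WHERE records > 1
    GROUP BY 1
    """
).fetchall()
print(dup_groups, detail)  # 1 [(3, 1)]
```

Here one row, ('x', 1), appears three times, so there is one duplicated group, and the detail query reports one group at duplication level 3.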

## 计算机代写|数据库作业代写SQL代考|Deduplication with GROUP BY and DISTINCT

Imagine we want one row per customer, but the query joins customers to a transactions table that holds multiple rows per customer, so the result fans out into duplicate customer rows:

SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;

One option is to add the DISTINCT keyword, which removes duplicate rows from the result set:

SELECT distinct a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;

Another option is to GROUP BY all of the selected columns, which has the same deduplicating effect even though no aggregation is applied:

SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
GROUP BY 1,2,3
;
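The fan-out and the fix can be demonstrated with SQLite via Python's sqlite3 (one invented customer with three transactions): the bare JOIN returns three copies of the customer row, and DISTINCT collapses them back to one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, customer_email TEXT);
    CREATE TABLE transactions (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO transactions VALUES (1, 10.0), (1, 25.0), (1, 5.0);
    """
)

# The bare JOIN fans out: one row per matching transaction
fanned = conn.execute(
    """
    SELECT a.customer_id, a.customer_name, a.customer_email
    FROM customers a
    JOIN transactions b on a.customer_id = b.customer_id
    """
).fetchall()

# DISTINCT collapses the duplicates back to one row per customer
deduped = conn.execute(
    """
    SELECT distinct a.customer_id, a.customer_name, a.customer_email
    FROM customers a
    JOIN transactions b on a.customer_id = b.customer_id
    """
).fetchall()
print(len(fanned), len(deduped))  # 3 1
```

Swapping DISTINCT for a GROUP BY on all three columns produces the same single row.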





## 计算机代写|数据库作业代写SQL代考|Binning

Binning is useful when working with continuous values. Rather than counting the number of observations or records for each distinct value, ranges of values are grouped together, and these groups are called bins or buckets. The number of records that fall into each interval is then counted. Bins can be variable in size or have a fixed size, depending on whether your goal is to group the data into bins that have particular meaning for the organization, are roughly equal width, or contain roughly equal numbers of records. Bins can be created with CASE statements, rounding, and logarithms.
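A small sketch of CASE-based binning with organization-defined boundaries (using SQLite via Python's sqlite3; the orders table, boundary values, and bin labels are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(4.99,), (19.99,), (55.98,), (99.99,), (12.50,)],
)

# Variable-size bins with boundaries chosen for their business meaning
rows = conn.execute(
    """
    SELECT case when order_amount <= 10 then 'up to $10'
                when order_amount <= 50 then '$10 - $50'
                else '$50 and up'
                end as amount_bin
    ,count(*) as orders
    FROM orders
    GROUP BY 1
    ORDER BY min(order_amount)
    """
).fetchall()
print(rows)  # [('up to $10', 1), ('$10 - $50', 2), ('$50 and up', 2)]
```

Because CASE stops at the first TRUE condition, each boundary only needs an upper limit, which keeps the bins non-overlapping by construction.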

A CASE statement allows for conditional logic to be evaluated. These statements are very flexible, and we will come back to them throughout the book, applying them to data profiling, cleaning, text analysis, and more. The basic structure of a CASE statement is:
case when condition1 then return_value_1
when condition2 then return_value_2
else return_value_default
end
The WHEN condition can be an equality, inequality, or other logical condition. The THEN return value can be a constant, an expression, or a field in the table. Any number of conditions can be included, but the statement will stop executing and return the result the first time a condition evaluates to TRUE. ELSE tells the database what to use as a default value if no matches are found and can also be a constant or field. ELSE is optional, and if it is not included, any nonmatches will return null. CASE statements can also be nested so that the return value is another CASE statement.

## 计算机代写|数据库作业代写SQL代考|n-Tiles

You’re probably familiar with the median, or middle value, of a data set. This is the 50th percentile value. Half of the values are larger than the median, and the other half are smaller. With quartiles, we fill in the 25th and 75th percentile values. A quarter of the values are smaller and three quarters are larger for the 25th percentile; three quarters are smaller and one quarter are larger at the 75th percentile. Deciles break the data set into 10 equal parts. Making this concept generic, n-tiles allow us to calculate any percentile of the data set: 27th percentile, 50.5th percentile, and so on.

Many databases have a median function built in but rely on more generic n-tile functions for the rest. These functions are window functions, computing across a range of rows to return a value for a single row. They take an argument that specifies the number of bins to split the data into and, optionally, a PARTITION BY and/or an ORDER BY clause:
ntile(num_bins) over (partition by… order by…)
As an example, imagine we had 12 transactions with order_amounts of $19.99, $9.99, $59.99, $11.99, $23.49, $55.98, $12.99, $99.99, $14.99, $34.99, $4.99, and $89.99. Performing an ntile calculation with 10 bins sorts each order_amount and assigns a bin from 1 to 10:

This can be used to bin records in practice by first calculating the ntile of each row in a subquery and then wrapping it in an outer query that uses min and max to find the upper and lower boundaries of the value range:
SELECT ntile
,min(order_amount) as lower_bound
,max(order_amount) as upper_bound
,count(order_id) as orders
FROM
(
SELECT customer_id, order_id, order_amount
,ntile(10) over (order by order_amount) as ntile
FROM orders
) a
GROUP BY 1
;
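Running this pattern over the 12 example order_amounts can be sketched with SQLite via Python's sqlite3 (SQLite supports ntile as a window function in versions 3.25 and later; the order_ids are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_amount REAL)")
amounts = [19.99, 9.99, 59.99, 11.99, 23.49, 55.98, 12.99, 99.99, 14.99, 34.99, 4.99, 89.99]
conn.executemany("INSERT INTO orders VALUES (?, ?)", list(enumerate(amounts, start=1)))

# Assign each order to one of 10 bins, then find each bin's value range
rows = conn.execute(
    """
    SELECT ntile
    ,min(order_amount) as lower_bound
    ,max(order_amount) as upper_bound
    ,count(order_id) as orders
    FROM
    (
        SELECT order_id, order_amount
        ,ntile(10) over (order by order_amount) as ntile
        FROM orders
    ) a
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

# With 12 orders and 10 bins, the first two bins hold two orders each
# and the remaining eight bins hold one order each
for row in rows:
    print(row)
```

The first row, (1, 4.99, 9.99, 2), shows bin 1 spanning the two smallest amounts; bins that receive a single order have equal lower and upper bounds.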
A related function is percent_rank. Instead of returning the bins that the data falls into, percent_rank returns the percentile. It takes no argument but requires parentheses and optionally takes a PARTITION BY and/or an ORDER BY clause:
percent_rank() over (partition by… order by…)
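A quick sketch of percent_rank with SQLite via Python's sqlite3 (five evenly spaced invented amounts, so the percentiles are easy to verify by eye):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,), (30.0,), (40.0,), (50.0,)])

# percent_rank = (rank - 1) / (rows - 1), so results run from 0 to 1
rows = conn.execute(
    """
    SELECT order_amount
    ,percent_rank() over (order by order_amount) as pct_rank
    FROM orders
    ORDER BY order_amount
    """
).fetchall()
print(rows)
# [(10.0, 0.0), (20.0, 0.25), (30.0, 0.5), (40.0, 0.75), (50.0, 1.0)]
```

Note that the smallest value always receives 0 and the largest 1, which differs from the bin numbers that ntile returns.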

## 计算机代写|数据库作业代写SQL代考|Profiling: Data Quality

Data quality is absolutely critical when it comes to creating good analysis. Although this may seem obvious, it has been one of the hardest lessons I’ve learned in my years of working with data. It’s easy to get overly focused on the mechanics of processing the data, finding clever query techniques and just the right visualization, only to have stakeholders ignore all of that and point out the one data inconsistency. Ensuring data quality can be one of the hardest and most frustrating parts of analysis. The saying “garbage in, garbage out” captures only part of the problem. Good ingredients in plus incorrect assumptions can also lead to garbage out.

Comparing data against ground truth, or what is otherwise known to be true, is ideal though not always possible. For example, if you are working with a replica of a production database, you could compare the row counts in each system to verify that all rows arrived in the replica database. In other cases, you might know the dollar value and count of sales in a particular month and thus can query for this information in the database to make sure the sum of sales and count of records match. Often the difference between your query results and the expected value comes down to whether you applied the correct filters, such as excluding cancelled orders or test accounts; how you handled nulls and spelling anomalies; and whether you set up correct JOIN conditions between tables.

Profiling is a way to uncover data quality issues early on, before they negatively impact results and conclusions drawn from the data. Profiling reveals nulls, categorical codings that need to be deciphered, fields with multiple values that need to be parsed, and unusual datetime formats. Profiling can also uncover gaps and step changes in the data that have resulted from tracking changes or outages. Data is rarely perfect, and it’s often only through its use in analysis that data quality issues are uncovered.




## 计算机代写|数据库作业代写SQL代考|SQL Query Structure

SQL queries have common clauses and syntax, although these can be combined in a nearly infinite number of ways to achieve analysis goals. This book assumes you have some prior knowledge of SQL, but I’ll review the basics here so that we have a common foundation for the code examples to come.

The SELECT clause determines the columns that will be returned by the query. One column will be returned for each expression within the SELECT clause, and expressions are separated by commas. An expression can be a field from the table, an aggregation such as a sum, or any number of calculations, such as CASE statements, type conversions, and various functions that will be discussed later in this chapter and throughout the book.

The FROM clause determines the tables from which the expressions in the SELECT clause are derived. A “table” can be a database table, a view (a type of saved query that otherwise functions like a table), or a subquery. A subquery is itself a query, wrapped in parentheses, and the result is treated like any other table by the query that references it. A query can reference multiple tables in the FROM clause, though they must use one of the JOIN types along with a condition that specifies how the tables relate. The JOIN condition usually specifies an equality between fields in each table, such as orders.customer_id = customers.customer_id. JOIN conditions can include multiple fields and can also specify inequalities or ranges of values, such as ranges of dates. We’ll see a variety of JOIN conditions that achieve specific analysis goals throughout the book. An INNER JOIN returns all records that match in both tables. A LEFT JOIN returns all records from the first table, but only those records from the second table that match. A RIGHT JOIN returns all records from the second table, but only those records from the first table that match. A FULL OUTER JOIN returns all records from both tables. A Cartesian JOIN can result when each record in the first table matches more than one record in the second table. Cartesian JOINs should generally be avoided, though there are some specific use cases, such as generating data to fill in a time series, in which we will use them intentionally. Finally, tables in the FROM clause can be aliased, or given a shorter name of one or more letters that can be referenced in other clauses in the query. Aliases save query writers from having to type out long table names repeatedly, and they make queries easier to read.
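The difference between INNER and LEFT JOIN can be sketched with SQLite via Python's sqlite3 (invented customers and orders tables; customer Bo has no orders, so only the LEFT JOIN keeps her row, with NULL for the missing order_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO orders VALUES (100, 1), (101, 1);
    """
)

# INNER JOIN keeps only customers with at least one matching order
inner = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
    """
).fetchall()

# LEFT JOIN keeps every customer; non-matches get NULL (None) for order_id
left = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
    """
).fetchall()
print(inner)  # [('Ann', 100), ('Ann', 101)]
print(left)   # [('Ann', 100), ('Ann', 101), ('Bo', None)]
```

Note also the fan-out: Ann appears twice in both results because she matches two order rows, which is exactly the duplication issue discussed in the deduplication sections.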

## 计算机代写|数据库作业代写SQL代考|Profiling: Distributions

Profiling is the first thing I do when I start working with any new data set. I look at how the data is arranged into schemas and tables. I look at the table names to get familiar with the topics covered, such as customers, orders, or visits. I check out the column names in a few tables and start to construct a mental model of how the tables relate to one another. For example, the tables might include an order_detail table with line-item breakouts that relate to the order table via an order_id, while the order table relates to the customer table via a customer_id. If there is a data dictionary, I review that and compare it to the data I see in a sample of rows.

The tables generally represent the operations of an organization, or some subset of the operations, so I think about what domain or domains are covered, such as ecommerce, marketing, or product interactions. Working with data is easier when we have knowledge of how the data was generated. Profiling can provide clues about this, or about what questions to ask of the source, or of people inside or outside the organization responsible for the collection or generation of the data. Even when you collect the data yourself, profiling is useful.

Another detail I check for is how history is represented, if at all. Data sets that are replicas of production databases may not contain previous values for customer addresses or order statuses, for example, whereas a well-constructed data warehouse may have daily snapshots of changing data fields.

Profiling data is related to the concept of exploratory data analysis, or EDA, named by John Tukey. In his book of that name, Tukey describes how to analyze data sets by computing various summaries and visualizing the results. He includes techniques for looking at distributions of data, including stem-and-leaf plots, box plots, and histograms.

After checking a few samples of data, I start looking at distributions. Distributions allow me to understand the range of values that exist in the data and how often they occur, whether there are nulls, and whether negative values exist alongside positive ones. Distributions can be created with continuous or categorical data and are also called frequencies. In this section, we’ll look at how to create histograms, how binning can help us understand the distribution of continuous values, and how to use n-tiles to get more precise about distributions.

## 计算机代写|数据库作业代写SQL代考|Histograms and Frequencies

One of the best ways to get to know a data set, and to know particular fields within the data set, is to check the frequency of values in each field. Frequency checks are also useful whenever you have a question about whether certain values are possible or if you spot an unexpected value and want to know how commonly it occurs. Frequency checks can be done on any data type, including strings, numerics, dates, and booleans. Frequency queries are a great way to detect sparse data as well.

The query is straightforward. The number of rows can be found with count(*), and the profiled field is in the GROUP BY. For example, we can check the frequency of each type of fruit in a fictional fruit_inventory table:
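A frequency query along the lines described, shown here as a runnable sketch with SQLite via Python's sqlite3 (the fruit_inventory rows are invented), groups by the profiled field and counts rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_inventory (fruit TEXT)")
conn.executemany(
    "INSERT INTO fruit_inventory VALUES (?)",
    [("apple",), ("apple",), ("banana",), ("apple",), ("orange",), ("banana",)],
)

# One row per distinct fruit, with the number of times it appears
rows = conn.execute(
    """
    SELECT fruit, count(*) as quantity
    FROM fruit_inventory
    GROUP BY 1
    ORDER BY 2 DESC
    """
).fetchall()
print(rows)  # [('apple', 3), ('banana', 2), ('orange', 1)]
```

The same shape of query works for any data type, which is what makes frequency checks such a versatile first profiling step.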

A frequency plot is a way to visualize the number of times something occurs in the data set. The field being profiled is usually plotted on the $x$-axis, with the count of observations on the $y$-axis. Figure 2-1 shows an example of plotting the frequency of fruit from our query. Frequency graphs can also be drawn horizontally, which accommodates long value names well. Notice that this is categorical data without any inherent order.




## 计算机代写|数据库作业代写SQL代考|Quantitative Versus Qualitative Data

Quantitative data is numeric. It measures people, things, and events. Quantitative data can include descriptors, such as customer information, product type, or device configurations, but it also comes with numeric information such as price, quantity, or visit duration. Counts, sums, average, or other numeric functions are applied to the data. Quantitative data is often machine generated these days, but it doesn’t need to be. Height, weight, and blood pressure recorded on a paper patient intake form are quantitative, as are student quiz scores typed into a spreadsheet by a teacher.

Qualitative data is usually text based and includes opinions, feelings, and descriptions that aren’t strictly quantitative. Temperature and humidity levels are quantitative, while descriptors like “hot and humid” are qualitative. The price a customer paid for a product is quantitative; whether they like or dislike it is qualitative. Survey feedback, customer support inquiries, and social media posts are qualitative. There are whole professions that deal with qualitative data. In a data analysis context, we usually try to quantify the qualitative. One technique for this is to extract keywords or phrases and count their occurrences. We’ll look at this in more detail when we delve into text analysis in Chapter 5. Another technique is sentiment analysis, in which the structure of language is used to interpret the meaning of the words used, in addition to their frequency. Sentences or other bodies of text can be scored for their level of positivity or negativity, and then counts or averages are used to derive insights that would be hard to summarize otherwise. There have been exciting advances in the field of natural language processing, or NLP, though much of this work is done with tools such as Python.

## 计算机代写|数据库作业代写SQL代考|First-, Second-, and Third-Party Data

First-party data is collected by the organization itself. This can be done through server logs, databases that keep track of transactions and customer information, or other systems that are built and controlled by the organization and generate data of interest for analysis. Since the systems were created in-house, finding the people who built them and learning about how the data is generated is usually possible. Data analysts may also be able to influence or have control over how certain pieces of data are created and stored, particularly when bugs are responsible for poor data quality.

Second-party data comes from vendors that provide a service or perform a business function on the organization’s behalf. These are often software as a service (SaaS) products; common examples are CRM, email and marketing automation tools, ecommerce-enabling software, and web and mobile interaction trackers. The data is similar to first-party data since it is about the organization itself, created by its employees and customers. However, both the code that generates and stores the data and the data model are controlled externally, and the data analyst typically has little influence over these aspects. Second-party data is increasingly imported into an organization’s data warehouse for analysis. This can be accomplished with custom code or ETL connectors, or with SaaS vendors that offer data integration.

Third-party data may be purchased or obtained from free sources such as those published by governments. Unless the data has been collected specifically on behalf of the organization, data teams usually have little control over the format, frequency, and data quality. This data often lacks the granularity of first- and second-party data. For example, most third-party sources do not have user-level data, and instead data might be joined with first-party data at the postal code or city level, or at a higher level. Third-party data can have unique and useful information, however, such as aggregate spending patterns, demographics, and market trends that would be very expensive or impossible to collect otherwise.

## 计算机代写|数据库作业代写SQL代考|Sparse Data

Sparse data occurs when there is a small amount of information within a larger set of empty or unimportant information. Sparse data might show up as many nulls and only a few values in a particular column. Null, different from a value of 0, is the absence of data; that will be covered later in the section on data cleaning. Sparse data can occur when events are rare, such as software errors or purchases of products in the long tail of a product catalog. It can also occur in the early days of a feature or product launch, when only testers or beta customers have access. JSON is one approach that has been developed to deal with sparse data from a writing and storage perspective, as it stores only the data that is present and omits the rest. This is in contrast to a row-store database, which has to hold memory for a field even if there is no value in it.

Sparse data can be problematic for analysis. When events are rare, trends aren’t necessarily meaningful, and correlations are hard to distinguish from chance fluctuations. It’s worth profiling your data, as discussed later in this chapter, to understand if and where your data is sparse. Some options are to group infrequent events or items into categories that are more common, exclude the sparse data or time period from the analysis entirely, or show descriptive statistics along with cautionary explanations that the trends are not necessarily meaningful.

There are a number of different types of data and a variety of ways that data is described, many of which are overlapping or not mutually exclusive. Familiarity with these types is useful not only in writing good SQL but also for deciding how to analyze the data in appropriate ways. You may not always know the data types in advance, which is why data profiling is so critical. Before we get to that, and to our first code examples, I’ll give a brief review of SQL query structure.


## 计算机代写|数据库作业代写SQL代考|Preparing Data for Analysis


## 计算机代写|数据库作业代写SQL代考|Types of Data

Estimates of how long data scientists spend preparing their data vary, but it’s safe to say that this step takes up a significant part of the time spent working with data. In 2014, the New York Times reported that data scientists spend from 50% to 80% of their time cleaning and wrangling their data. A 2016 survey by CrowdFlower found that data scientists spend 60% of their time cleaning and organizing data in order to prepare it for analysis or modeling work. Preparing data is such a common task that terms have sprung up to describe it, such as data munging, data wrangling, and data prep. (“Mung” is an acronym for Mash Until No Good, which I have certainly done on occasion.) Is all this data preparation work just mindless toil, or is it an important part of the process?

Data preparation is easier when a data set has a data dictionary, a document or repository that has clear descriptions of the fields, possible values, how the data was collected, and how it relates to other data. Unfortunately, this is frequently not the case. Documentation often isn’t prioritized, even by people who see its value, or it becomes out-of-date as new fields and tables are added or the way data is populated changes. Data profiling creates many of the elements of a data dictionary, so if your organization already has a data dictionary, this is a good time to use it and contribute to it. If no data dictionary exists currently, consider starting one! This is one of the most valuable gifts you can give to your team and to your future self. An up-to-date data dictionary allows you to speed up the data-profiling process by building on profiling that’s already been done rather than replicating it. It will also improve the quality of your analysis results, since you can verify that you have used fields correctly and applied appropriate filters.

Even when a data dictionary exists, you will still likely need to do data prep work as part of the analysis. In this chapter, I’ll start with a review of data types you are likely to encounter. This is followed by a review of SQL query structure. Next, I will talk about profiling the data as a way to get to know its contents and check for data quality. Then I’ll talk about some data-shaping techniques that will return the columns and rows needed for further analysis. Finally, I’ll walk through some useful tools for cleaning data to deal with any quality issues.

## 计算机代写|数据库作业代写SQL代考|Database Data Types

Fields in database tables all have defined data types. Most databases have good documentation on the types they support, and this is a good resource for any needed detail beyond what is presented here. You don’t necessarily need to be an expert on the nuances of data types to be good at analysis, but later in the book we’ll encounter situations in which considering the data type is important, so this section will cover the basics. The main types of data are strings, numeric, logical, and datetime, as summarized in Table 2-1. These are based on Postgres but are similar across most major database types.

String data types are the most versatile. They can hold letters, numbers, and special characters, including unprintable characters like tabs and newlines. String fields can be defined to hold a fixed or variable number of characters. A CHAR field could be defined to allow only two characters to hold US state abbreviations, for example, whereas a field storing the full names of states would need to be a VARCHAR to allow a variable number of characters. Depending on the database, fields can also be defined as TEXT, CLOB (Character Large Object), or BLOB (Binary Large Object, which can include additional data types such as images) to hold very long strings, though since they often take up a lot of space, these data types tend to be used sparingly. When data is loaded, strings that are too big for the defined data type may be truncated or rejected entirely. SQL has a number of string functions that we will make use of for various analysis purposes.
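
A few common string functions can be previewed with sqlite3; `upper`, `length`, and the `||` concatenation operator behave the same way in Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# String functions operate on any string value or field
result = conn.execute(
    "SELECT upper('ca'), length('California'), 'San ' || 'Diego'"
).fetchone()
print(result)  # ('CA', 10, 'San Diego')
```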

Numeric data types are all the ones that store numbers, both positive and negative. Mathematical functions and operators can be applied to numeric fields. Numeric data types include the INT types as well as the FLOAT, DOUBLE, and DECIMAL types that allow decimal places. Integer data types are often used because they require less memory than their decimal counterparts. In some databases, such as Postgres, dividing integers results in an integer, rather than the value with decimal places you might expect. We’ll discuss converting numeric data types to obtain correct results later in this chapter.
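
The integer-division pitfall can be demonstrated with sqlite3, which truncates integer division the same way Postgres does; casting one operand to a decimal type preserves the fraction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dividing two integers truncates the result
int_div = conn.execute("SELECT 7 / 2").fetchone()[0]

# Casting one operand to a decimal type preserves the fraction
dec_div = conn.execute("SELECT CAST(7 AS REAL) / 2").fetchone()[0]

print(int_div, dec_div)  # 3 3.5
```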

The logical data type is called BOOLEAN. It has values of TRUE and FALSE and is an efficient way to store information where these options are appropriate. Operations that compare two fields return a BOOLEAN value as a result. This data type is often used to create flags, fields that summarize the presence or absence of a property in the data. For example, a table storing email data might have a BOOLEAN has_opened field.
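
A comparison producing a flag can be sketched with sqlite3; note that SQLite represents boolean results as 1 and 0, whereas Postgres has a true BOOLEAN type, and the `emails` table here is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email_id INTEGER, opened_at TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [(1, "2024-01-05"), (2, None)],
)

# A comparison produces a boolean flag; SQLite represents it as 1/0
rows = conn.execute(
    "SELECT email_id, opened_at IS NOT NULL AS has_opened "
    "FROM emails ORDER BY email_id"
).fetchall()
print(rows)  # [(1, 1), (2, 0)]
```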

The datetime types include DATE, TIMESTAMP, and TIME. Date and time data should be stored in a field of one of these database types whenever possible, since SQL has a number of useful functions that operate on them. Timestamps and dates are very common in databases and are critical to many types of analysis, particularly time series analysis (covered in Chapter 3) and cohort analysis (covered in Chapter 4). Chapter 3 will discuss date and time formatting, transformations, and calculations.
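
A taste of the date functions that make these types worth using, sketched with sqlite3 (Postgres has its own equivalents, such as interval arithmetic and `date_trunc`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Shift a date by one month and extract its year component
next_month, year = conn.execute(
    "SELECT date('2024-01-15', '+1 month'), strftime('%Y', '2024-01-15')"
).fetchone()
print(next_month, year)  # 2024-02-15 2024
```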

## Structured Versus Unstructured

Data is often described as structured or unstructured, or sometimes as semistructured. Most databases were designed to handle structured data, where each attribute is stored in a column, and instances of each entity are represented as rows. A data model is first created, and then data is inserted according to that data model. For example, an address table might have fields for street address, city, state, and postal code. Each row would hold a particular customer’s address. Each field has a data type and allows only data of that type to be entered. When structured data is inserted into a table, each field is verified to ensure it conforms to the correct data type. Structured data is easy to query with SQL.
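
The address table described above can be sketched with sqlite3; the table and column names are illustrative, and note that SQLite, unlike Postgres, does not strictly enforce declared types by default:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per attribute; each row holds one customer's address
conn.execute(
    """
    CREATE TABLE customer_address (
        street_address TEXT,
        city TEXT,
        state TEXT,
        postal_code TEXT
    )
    """
)
conn.execute(
    "INSERT INTO customer_address "
    "VALUES ('123 Main St', 'Springfield', 'IL', '62701')"
)

# Structured data is easy to query with SQL
row = conn.execute("SELECT city, state FROM customer_address").fetchone()
print(row)  # ('Springfield', 'IL')
```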

Unstructured data is the opposite of structured data. There is no predetermined structure, data model, or data types. Unstructured data is often the “everything else” that isn’t database data. Documents, emails, and web pages are unstructured. Photos, images, videos, and audio files are also examples of unstructured data. They don’t fit into the traditional data types, and thus they are more difficult for relational databases to store efficiently and for SQL to query. Unstructured data is often stored outside of relational databases as a result. This allows data to be loaded quickly, but lack of data validation can result in low data quality. As we saw in Chapter 1, the technology continues to evolve, and new tools are being developed to allow SQL querying of many types of unstructured data.

Semistructured data falls in between these two categories. Much “unstructured” data has some structure that we can make use of. For example, emails have from and to email addresses, subject lines, body text, and sent timestamps that can be stored separately in a data model with those fields. Metadata, or data about data, can be extracted from other file types and stored for analysis. For example, music audio files might be tagged with artist, song name, genre, and duration. Generally, the structured parts of semistructured data can be queried with SQL, and SQL can often be used to parse or otherwise extract structured data for further querying. We’ll see some applications of this in the discussion of text analysis in Chapter 5.
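
Parsing structure out of a semistructured value can be sketched with SQL string functions in sqlite3; here the domain is extracted from a raw email address (the sample address is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract the domain: everything after the '@' (instr is 1-indexed)
domain = conn.execute(
    "SELECT substr(email, instr(email, '@') + 1) "
    "FROM (SELECT 'user@example.com' AS email)"
).fetchone()[0]
print(domain)  # example.com
```

Postgres offers `split_part` and regular-expression functions for the same job.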

## Row-Store Databases

Row-store databases (also called transactional databases) are designed to be efficient at processing transactions: INSERTs, UPDATEs, and DELETEs. Popular open source row-store databases include MySQL and Postgres. On the commercial side, Microsoft SQL Server, Oracle, and Teradata are widely used. Although they’re not really optimized for analysis, for a number of years row-store databases were the only option for companies building data warehouses. Through careful tuning and schema design, these databases can be used for analytics. They are also attractive due to the low cost of open source options and because they’re familiar to the database administrators who maintain them. Many organizations replicate their production database in the same technology as a first step toward building out data infrastructure. For all of these reasons, data analysts and data scientists are likely to work with data in a row-store database at some point in their career.

We think of a table as rows and columns, but data has to be serialized for storage. A query searches a hard disk for the needed data. Hard disks are organized in a series of blocks of a fixed size. Scanning the hard disk takes both time and resources, so minimizing the amount of the disk that needs to be scanned to return query results is important. Row-store databases approach this problem by serializing data in a row. Figure 1-4 shows an example of row-wise data storage. When querying, the whole row is read into memory. This approach is fast when making row-wise updates, but it’s slower when making calculations across many rows if only a few columns are needed.

To reduce the width of tables, row-store databases are usually modeled in third normal form, which is a database design approach that seeks to store each piece of information only once, to avoid duplication and inconsistencies. This is efficient for transaction processing but often leads to a large number of tables in the database, each with only a few columns. To analyze such data, many joins may be required, and it can be difficult for nondevelopers to understand how all of the tables relate to each other and where a particular piece of data is stored. When doing analysis, the goal is usually denormalization, or getting all the data together in one place.
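
A toy normalized schema and the join that denormalizes it for analysis, sketched with sqlite3 (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized schema: each customer name is stored once, referenced by id
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 10.0), (102, 2, 40.0);
    """
)

# Denormalization for analysis: join the tables back together
rows = conn.execute(
    """
    SELECT c.name, sum(o.amount) AS total
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Ada', 35.0), ('Grace', 40.0)]
```

A real third-normal-form schema would have many more such tables, and analysis queries correspondingly more joins.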

Tables typically have a primary key that enforces uniqueness; in other words, it prevents the database from creating more than one record for the same thing. Tables will often have an id column that is either an auto-incrementing integer, where each new record gets the next integer after the last one inserted, or an alphanumeric value created by a primary key generator. There should also be a set of columns that together make the row unique; this combination of fields is called a composite key, or sometimes a business key. For example, in a table of people, the columns first_name, last_name, and birthdate together might make the row unique. A social_security_id column would also be a unique identifier, in addition to the table’s person_id column.
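
Both kinds of key can be sketched with sqlite3: an auto-incrementing person_id plus a UNIQUE constraint standing in for the composite key (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# person_id auto-increments; the composite key makes each person unique
conn.execute(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name TEXT,
        last_name TEXT,
        birthdate TEXT,
        UNIQUE (first_name, last_name, birthdate)
    )
    """
)
conn.execute(
    "INSERT INTO person (first_name, last_name, birthdate) "
    "VALUES ('Ada', 'Lovelace', '1815-12-10')"
)

# Inserting the same composite key again violates the constraint
try:
    conn.execute(
        "INSERT INTO person (first_name, last_name, birthdate) "
        "VALUES ('Ada', 'Lovelace', '1815-12-10')"
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False
```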

## Column-Store Databases

Column-store databases took off in the early part of the 21st century, though their theoretical history goes back as far as that of row-store databases. Column-store databases store the values of a column together, rather than storing the values of a row together. This design is optimized for queries that read many records but not necessarily all the columns. Popular column-store databases include Amazon Redshift, Snowflake, and Vertica.

Column-store databases are efficient at storing large volumes of data thanks to compression. Missing values and repeating values can be represented by very small marker values instead of the full value. For example, rather than storing “United Kingdom” thousands or millions of times, a column-store database will store a surrogate value that takes up very little storage space, along with a lookup that stores the full “United Kingdom” value. Column-store databases also compress data by taking advantage of repetitions of values in sorted data. For example, the database can store the fact that the marker value for “United Kingdom” is repeated 100 times, and this takes up even less space than storing that marker 100 times.
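
The two compression ideas (dictionary encoding of repeated values, then run-length encoding of sorted data) can be illustrated with a toy sketch in plain Python; real column stores use far more sophisticated schemes:

```python
from itertools import groupby

# A sorted column with heavy repetition, as a column store might see it
column = ["United Kingdom"] * 4 + ["United States"] * 2

# Dictionary encoding: store each full value once, plus small marker values
dictionary = {value: marker for marker, value in enumerate(sorted(set(column)))}
markers = [dictionary[value] for value in column]

# Run-length encoding: store each marker with its repeat count
runs = [(marker, len(list(group))) for marker, group in groupby(markers)]
print(dictionary, runs)
# {'United Kingdom': 0, 'United States': 1} [(0, 4), (1, 2)]
```

Storing `(0, 4)` once replaces four copies of the full country name, which is where the space savings come from.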

Column-store databases do not enforce primary keys and do not have indexes. Repeated values are not problematic, thanks to compression. As a result, schemas can be tailored for analysis queries, with all the data together in one place as opposed to being in multiple tables that need to be joined. Duplicate data can easily sneak in without primary keys, however, so understanding the source of the data and quality checking are important.
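
A standard quality check for duplicates when no primary key protects you, sketched with sqlite3 (the `events` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 11)],  # event 1 was loaded twice
)

# Without a primary key, duplicates must be found with a query
dupes = conn.execute(
    """
    SELECT event_id, user_id, count(*) AS records
    FROM events
    GROUP BY event_id, user_id
    HAVING count(*) > 1
    """
).fetchall()
print(dupes)  # [(1, 10, 2)]
```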

Updates and deletes are expensive in most column-store databases, since the data for a single row is distributed rather than stored together. For very large tables, a write-only policy may exist, so we also need to know something about how the data is generated in order to figure out which records to use. The data can also be slower to read, as it needs to be uncompressed before calculations are applied.

## Other Types of Data Infrastructure

Databases aren’t the only way data can be stored, and there is an increasing variety of options for storing data needed for analysis and powering applications. File storage systems, sometimes called data lakes, are probably the main alternative to data warehouses. NoSQL databases and search-based data stores are alternative data storage systems that offer low latency for application development and searching log files. Although not typically part of the analysis process, they are increasingly part of organizations’ data infrastructure, so I will introduce them briefly in this section as well. One interesting trend to point out is that although these newer types of infrastructure at first aimed to break away from the confines of SQL databases, many have ended up implementing some kind of SQL interface to query the data.
