## Detecting Duplicates

A duplicate is when you have two (or more) rows with the same information. Duplicates can exist for any number of reasons. A mistake might have been made during data entry, if there is some manual step. A tracking call might have fired twice. A processing step might have run multiple times. You might have created it accidentally with a hidden many-to-many JOIN. However they come to be, duplicates can really throw a wrench in your analysis. I can recall times early in my career when I thought I had a great finding, only to have a product manager point out that my sales figure was twice the actual sales. It’s embarrassing, it erodes trust, and it requires rework and sometimes painstaking reviews of the code to find the problem. I’ve learned to check for duplicates as I go.

Fortunately, it’s relatively easy to find duplicates in our data. One way is to inspect a sample, with all columns ordered:
SELECT column_a, column_b, column_c...
FROM table
ORDER BY 1,2,3...
;

This will reveal whether the data is full of duplicates, for example, when looking at a brand-new data set, when you suspect that a process is generating duplicates, or after a possible Cartesian JOIN. If there are only a few duplicates, they might not show up in the sample. And scrolling through data to try to spot duplicates is taxing on your eyes and brain. A more systematic way to find duplicates is to SELECT the columns and then count the rows (this might look familiar from the discussion of histograms!):
SELECT count(*)
FROM
(
SELECT column_a, column_b, column_c..., count(*) as records
FROM...
GROUP BY 1,2,3...
) a
WHERE records > 1
;
This will tell you whether there are any cases of duplicates. If the query returns 0, you’re good to go. For more detail, you can list out the number of records (2, 3, 4, etc.):
SELECT records, count(*)
FROM
(
SELECT column_a, column_b, column_c..., count(*) as records
FROM...
GROUP BY 1,2,3...
) a
WHERE records > 1
GROUP BY 1
;
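
To make the two counting queries concrete, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration; the text itself leaves the table and columns unspecified.

```python
import sqlite3

# Hypothetical sample data: customer 1 has one exact duplicate row,
# customer 3 has the same row loaded three times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2021-01-05", 20.0),
        (1, "2021-01-05", 20.0),
        (2, "2021-01-06", 35.5),
        (3, "2021-01-07", 12.0),
        (3, "2021-01-07", 12.0),
        (3, "2021-01-07", 12.0),
    ],
)

# How many distinct column combinations appear more than once?
duplicate_groups = conn.execute(
    """
    SELECT count(*)
    FROM
    (
        SELECT customer_id, order_date, amount, count(*) AS records
        FROM orders
        GROUP BY 1, 2, 3
    ) a
    WHERE records > 1
    """
).fetchone()[0]

# Break the duplicated combinations down by how many copies each has.
distribution = conn.execute(
    """
    SELECT records, count(*)
    FROM
    (
        SELECT customer_id, order_date, amount, count(*) AS records
        FROM orders
        GROUP BY 1, 2, 3
    ) a
    WHERE records > 1
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

print(duplicate_groups)
print(distribution)
```

With this sample data, the first query counts the duplicated combinations rather than the surplus rows, so a nonzero result says that duplicates exist but not how many rows would need to be removed.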

## Deduplication with GROUP BY and DISTINCT

Duplicates happen, and they’re not always a result of bad data. For example, imagine we want to find a list of all the customers who have successfully completed a transaction so we can send them a coupon for their next order. We might JOIN the customers table to the transactions table, which would restrict the records returned to only those customers that appear in the transactions table:
SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;
This will return a row for each customer for each transaction, however, and there are hopefully at least a few customers who have transacted more than once. We have accidentally created duplicates, not because there is any underlying data quality problem but because we haven’t taken care to avoid duplication in the results. Fortunately, there are several ways to avoid this with SQL. One way to remove duplicates is to use the keyword DISTINCT:
SELECT distinct a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
;
Another option is to use a GROUP BY, which, although typically seen in connection with an aggregation, will also deduplicate in the same way as DISTINCT. I remember the first time I saw a colleague use GROUP BY without an aggregation to dedupe. I didn’t even realize it was possible. I find it somewhat less intuitive than DISTINCT, but the result is the same:
SELECT a.customer_id, a.customer_name, a.customer_email
FROM customers a
JOIN transactions b on a.customer_id = b.customer_id
GROUP BY 1,2,3
;
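
The equivalence of the two approaches can be checked with a small sketch, again using Python's standard `sqlite3` module. The `customers` and `transactions` table names and the join come from the queries above; the `transaction_id` column and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, customer_email TEXT);
    CREATE TABLE transactions (transaction_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Ana', 'ana@example.com'),
        (2, 'Ben', 'ben@example.com'),
        (3, 'Cy',  'cy@example.com');
    -- Ana has two transactions, Ben has one, Cy has none.
    INSERT INTO transactions VALUES (10, 1), (11, 1), (12, 2);
    """
)

columns = "a.customer_id, a.customer_name, a.customer_email"
join_sql = """
    FROM customers a
    JOIN transactions b ON a.customer_id = b.customer_id
"""

# Plain JOIN: one row per customer per transaction, so Ana appears twice.
plain = conn.execute(f"SELECT {columns} {join_sql}").fetchall()

# DISTINCT collapses identical result rows.
with_distinct = conn.execute(
    f"SELECT DISTINCT {columns} {join_sql} ORDER BY 1"
).fetchall()

# GROUP BY on every selected column has the same effect.
with_group_by = conn.execute(
    f"SELECT {columns} {join_sql} GROUP BY 1, 2, 3 ORDER BY 1"
).fetchall()

print(len(plain), with_distinct, with_group_by)
```

The `ORDER BY 1` is added only so the two result lists can be compared directly; neither DISTINCT nor GROUP BY guarantees an output order on its own.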
Another useful technique is to perform an aggregation that returns one row per entity. Although technically not deduping, it has a similar effect. For example, if we have a number of transactions by the same customer and need to return one record per customer, we could find the min (first) and/or the max (most recent) transaction_date:
SELECT customer_id
,min(transaction_date) as first_transaction_date
,max(transaction_date) as last_transaction_date
,count(*) as total_orders
FROM table
GROUP BY customer_id
;
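
A runnable sketch of this one-row-per-customer aggregation follows, using Python's standard `sqlite3` module. The `transactions` table name and the sample rows are hypothetical (the query above only says `table`), and dates are stored as ISO-8601 strings so that `min` and `max` compare them correctly as text in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1, "2021-01-05"),
        (1, "2021-03-17"),
        (1, "2021-02-01"),
        (2, "2021-04-09"),
    ],
)

# Collapse many transaction rows into a single summary row per customer.
per_customer = conn.execute(
    """
    SELECT customer_id
    ,min(transaction_date) AS first_transaction_date
    ,max(transaction_date) AS last_transaction_date
    ,count(*) AS total_orders
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

for row in per_customer:
    print(row)
```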
Duplicate data, or data that contains multiple records per entity even if they technically are not duplicates, is one of the most common reasons for incorrect query results. You can suspect duplicates as the cause if all of a sudden the number of customers or total sales returned by a query is many times greater than what you were expecting. Fortunately, there are several techniques that can be applied to prevent this from occurring.
Another common problem is missing data, which we’ll turn to next.

## Cleaning Data with CASE Transformations

CASE statements can be used to perform a variety of cleaning, enrichment, and summarization tasks. Sometimes the data exists and is accurate, but it would be more useful for analysis if values were standardized or grouped into categories. The structure of CASE statements was presented earlier in this chapter, in the section on binning.
Nonstandard values occur for a variety of reasons. Values might come from different systems with slightly different lists of choices, system code might have changed, options might have been presented to the customer in different languages, or the customer might have been able to fill out the value rather than pick from a list.

Imagine a field containing information about the gender of a person. Values indicating a female person exist as “F”, “female”, and “femme.” We can standardize the values like this:
CASE when gender = 'F' then 'Female'
when gender = 'female' then 'Female'
when gender = 'femme' then 'Female'
else gender
end as gender_cleaned
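
As a quick check of how this CASE expression behaves, the sketch below applies it to a hypothetical `people` table using Python's standard `sqlite3` module. The table name, the `person_id` column, and the sample values are assumptions for illustration. Note that SQLite compares text case-sensitively by default, which is why 'F' and 'female' need separate branches here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "F"), (2, "female"), (3, "femme"), (4, "M"), (5, "Female")],
)

# Map the known variants to one standard label; pass everything else through.
cleaned = conn.execute(
    """
    SELECT person_id
    ,gender
    ,CASE when gender = 'F' then 'Female'
          when gender = 'female' then 'Female'
          when gender = 'femme' then 'Female'
          else gender
          end AS gender_cleaned
    FROM people
    ORDER BY person_id
    """
).fetchall()

for row in cleaned:
    print(row)
```

The `else gender` branch leaves unrecognized values untouched, so any variant not listed in the CASE remains visible in the output rather than being silently relabeled.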
CASE statements can also be used to add categorization or enrichment that does not exist in the original data. As an example, many organizations use a Net Promoter Score, or NPS, to monitor customer sentiment. NPS surveys ask respondents to rate, on a scale of 0 to 10, how likely they are to recommend a company or product to a friend or colleague. Scores of 0 to 6 are considered detractors, 7 and 8 are passive, and 9 and 10 are promoters. The final score is calculated by subtracting the percentage of detractors from the percentage of promoters. Survey result data sets usually include optional free text comments and are sometimes enriched with information the organization knows about the person surveyed. Given a data set of NPS survey responses, the first step is to group the responses into the categories of detractor, passive, and promoter:
SELECT response_id
,likelihood
,case when likelihood <= 6 then 'Detractor'
when likelihood <= 8 then 'Passive'
else 'Promoter'
end as response_type
FROM nps_responses
;
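
The sketch below runs this categorization on ten hypothetical survey responses and then applies the score definition given above (percentage of promoters minus percentage of detractors). It uses Python's standard `sqlite3` module; the `nps_responses` table and its columns come from the query, while the sample likelihood values and the final arithmetic in Python are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nps_responses (response_id INTEGER, likelihood INTEGER)")

# Hypothetical responses: 5 promoters, 3 passives, 2 detractors.
likelihoods = [10, 9, 9, 10, 9, 8, 7, 7, 6, 2]
conn.executemany(
    "INSERT INTO nps_responses VALUES (?, ?)",
    list(enumerate(likelihoods, start=1)),
)

# Categorize each response, then count responses per category.
counts = dict(
    conn.execute(
        """
        SELECT response_type, count(*)
        FROM
        (
            SELECT response_id
            ,likelihood
            ,case when likelihood <= 6 then 'Detractor'
                  when likelihood <= 8 then 'Passive'
                  else 'Promoter'
                  end AS response_type
            FROM nps_responses
        ) a
        GROUP BY 1
        """
    ).fetchall()
)

# NPS = percentage of promoters minus percentage of detractors.
total = sum(counts.values())
nps = 100.0 * (counts.get("Promoter", 0) - counts.get("Detractor", 0)) / total

print(counts)
print(nps)
```

Because the CASE branches are evaluated in order, a likelihood of 7 or 8 falls through the first test and is caught by the second, so the two `<=` comparisons are enough to define all three bands.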
