Small Overhead, Big Benefits for C++

Posted on May 26, 2022 by statistics-lab

We do not mean to make too big a deal about performance loss, nor do we wish to deny it. For simple C++ code written in a “Fortran” style, with a single layer of well-balanced parallel loops, the dynamic nature of TBB may not be needed at all. However, the limitations of such a coding style are an important factor in why TBB exists. TBB was designed to efficiently support nested, concurrent, and sequential composition of parallelism and to dynamically map this parallelism onto a target platform. Using a composable library like TBB, developers can build applications by combining components and libraries that contain parallelism without worrying that they will negatively interfere with each other. Importantly, TBB does not require us to restrict the parallelism we express to avoid performance problems. For large, complicated applications using C++, TBB is therefore easy to recommend without disclaimers.

The TBB library has evolved over the years to not only adjust to new platforms but also to demands from developers who want a bit more control over the choices the library makes in mapping parallelism to the hardware. While TBB 1.0 had very few performance controls for users, TBB 2019 has quite a few more, such as affinity controls, constructs for work isolation, hooks that can be used to pin threads to cores, and so on. The developers of TBB worked hard to design these controls to provide just the right level of control without sacrificing composability.

The interfaces provided by the library are nicely layered – TBB provides high-level templates that suit the needs of most programmers, focusing on common cases. But it also provides low-level interfaces so we can drill down and create tailored solutions for our specific applications if needed. TBB has the best of both worlds. We typically rely on the default choices of the library to get great performance but can delve into the details if we need to.

Evolving Support for Parallelism in TBB and C++

Both the TBB library and the C++ language have evolved significantly since the introduction of the original TBB. In 2006, C++ had no language support for parallel programming, and many libraries, including the Standard Template Library (STL), were not easily used in parallel programs because they were not thread-safe.

The C++ language committee has been busy adding features for threading directly to the language and its accompanying Standard Template Library (STL). Figure 1-1 shows new and planned C++ features that address parallelism. Even though we are big fans of TBB, we would in fact prefer that all of the fundamental support needed for parallelism be in the C++ language itself. That would allow TBB to utilize a consistent foundation on which to build higher-level parallelism abstractions. The original versions of TBB had to address a lack of C++ language support, and this is an area where the C++ standard has grown significantly to fill the foundational voids that TBB originally had no choice but to fill with features such as portable locks and atomics. Unfortunately for C++ developers, the standard still lacks features needed for full support of parallel programming. Fortunately for readers of this book, this means that TBB is still relevant and essential for effective threading in C++ and will likely stay relevant for many years to come.

It is very important to understand that we are not complaining about the C++ standard process. Adding features to a language standard is best done very carefully, with careful review. The C++11 standard committee, for instance, spent huge energy on a memory model. The significance of this for parallel programming is critical for every library that builds upon the standard. There are also limits to what a language standard should include, and what it should support. We believe that the tasking system and the flow graph system in TBB are not something that will directly become part of a language standard. Even if we are wrong, it is not something that will happen anytime soon.

Recent C++ Additions for Parallelism

As shown in Figure 1-1, the C++11 standard introduced some low-level, basic building blocks for threading, including std::async, std::future, and std::thread. It also introduced atomic variables, mutual exclusion objects, and condition variables. These extensions require programmers to do a lot of coding to build up higher-level abstractions, but they do allow us to express basic parallelism directly in C++. The C++11 standard was a clear improvement when it comes to threading, but it doesn’t provide us with the high-level features that make it easy to write portable, efficient parallel code. It also does not provide us with tasks or an underlying work-stealing task scheduler.

The C++17 standard introduced features that raise the level of abstraction above these low-level building blocks, making it easier for us to express parallelism without having to worry about every low-level detail. As we discuss later in this book, there are still some significant limitations, and so these features are not yet sufficiently expressive or performant; there’s still a lot of work to do in the C++ standard.
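
To make the C++11 building blocks concrete, here is a minimal sketch that uses std::async and std::future to sum a vector in two halves. The helper name sum_in_two_halves and the two-way split are assumptions made for illustration; nothing here comes from TBB, and note how much of the work (choosing the split, launching the task, collecting the result) is left to the programmer.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector by handing the first half to another thread via std::async
// and summing the second half on the calling thread.
// (Hypothetical helper for illustration; not a TBB or standard-library API.)
long long sum_in_two_halves(const std::vector<int>& values) {
    const auto middle =
        values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);

    // std::launch::async asks for the task to run on its own thread.
    std::future<long long> first_half =
        std::async(std::launch::async, [&values, middle] {
            return std::accumulate(values.begin(), middle, 0LL);
        });

    const long long second_half = std::accumulate(middle, values.end(), 0LL);

    // get() blocks until the asynchronous task finishes, then returns its result.
    return first_half.get() + second_half;
}
```

The split is fixed at two, so this sketch cannot adapt to the number of cores or to uneven work; that kind of dynamic mapping is what a task scheduler is meant to provide.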

The most pertinent of these C++17 additions are the execution policies that can be used with the Standard Template Library (STL) algorithms. These policies let us choose whether an algorithm can be safely parallelized, vectorized, parallelized and vectorized, or if it needs to retain its original sequenced semantics. We call an STL implementation that supports these policies a Parallel STL.

Looking into the future, there are proposals that might be included in a future C++ standard with even more parallelism features, such as resumable functions, executors, task blocks, parallel for loops, SIMD vector types, and additional execution policies for the STL algorithms.

Gustafson’s Observations Regarding Amdahl’s Law

Amdahl’s Law views programs as fixed, while we make changes to the computer. But experience seems to indicate that as computers get new capabilities, applications change to take advantage of these features. Most of today’s applications would not run on computers from 10 years ago, and many would run poorly on machines that are just 5 years old. This observation is not limited to obvious applications such as video games; it applies also to office applications, web browsers, photography, and video editing software.

More than two decades after the appearance of Amdahl’s Law, John Gustafson, while at Sandia National Labs, took a different approach and suggested a reevaluation of Amdahl’s Law. Gustafson noted that parallelism is more useful when we observe that workloads grow over time. This means that as computers have become more powerful, we have asked them to do more work, rather than staying focused on an unchanging workload. For many problems, as the problem size grows, the work required for the parallel part of the problem grows faster than the part that cannot be parallelized (the serial part). Hence, as the problem size grows, the serial fraction decreases, and, according to Amdahl’s Law, the scalability improves. We can start with an application that looks like Figure P-10, but if the problem scales with the available parallelism, we are likely to see the advancements illustrated in Figure P-13. If the sequential parts still take the same amount of time to perform, they become less and less important as a percentage of the whole. The algorithm eventually reaches the conclusion shown in Figure P-14. Performance grows at the same rate as the number of processors, which is called linear or order of $n$ scaling, denoted as $O(n)$.

Even in our example, the efficiency of the program is still greatly limited by the serial parts. The efficiency of using processors in our example is about 40% for large numbers of processors. On a supercomputer, this might be a terrible waste. On a system with multicore processors, one can hope that other work is running on the computer in parallel to use the processing power our application does not use. This new world has many complexities. In any case, it is still good to minimize serial code, whether we take the “glass half empty” view and favor Amdahl’s Law or we lean toward the “glass half full” view and favor Gustafson’s observations.

Serial vs. Parallel Algorithms

One of the truths in programming is this: the best serial algorithm is seldom the best parallel algorithm, and the best parallel algorithm is seldom the best serial algorithm.
This means that trying to write a program that runs well on a system with one processor core, and also runs well on a system with a dual-core processor or quad-core processor, is harder than just writing a good serial program or a good parallel program.
Supercomputer programmers know from practice that the work required grows quickly as a function of the problem size. If the work grows faster than the sequential overhead (e.g., communication, synchronization), we can fix a program that scales poorly just by increasing the problem size. It’s not uncommon at all to take a program that won’t scale much beyond 100 processors and scale it nicely to 300 or more processors just by doubling the size of the problem.

What Is a Thread

If you know what a thread is, feel free to skip ahead to the section “Safety in the Presence of Concurrency.” It’s important to be comfortable with the concept of a thread, even though the goal of TBB is to abstract away thread management. Fundamentally, we will still be constructing a threaded program, and we will need to understand the implications of this underlying implementation.

All modern operating systems are multitasking operating systems that typically use a preemptive scheduler. Multitasking means that more than one program can be active at a time. We may take it for granted that we can have an e-mail program and a web browser program running at the same time. Yet, not that long ago, this was not the case. A preemptive scheduler means the operating system puts a limit on how long one program can use a processor core before it is forced to let another program use it. This is how the operating system makes it appear that our e-mail program and our web browser are running at the same time when only one processor core is actually doing the work. Generally, each program runs relatively independently of other programs. In particular, the memory where our program variables will reside is completely separate from the memory used by other processes. Our e-mail program cannot directly assign a new value to a variable in the web browser program. If our e-mail program can communicate with our web browser (for instance, to have it open a web page from a link we received in e-mail), it does so with some form of communication that takes much more time than a memory access.

This isolation of programs from each other has value and is a mainstay of computing today. Within a single program, we can allow multiple threads of execution to exist. An operating system will refer to the program as a process, and the threads of execution as (operating system) threads.

All modern operating systems support the subdivision of processes into multiple threads of execution. Threads run independently, like processes, and no thread knows what other threads are running or where they are in the program unless they synchronize explicitly. The key difference between threads and processes is that the threads within a process share all the data of the process. Thus, a simple memory access can set a variable in another thread. We will refer to this as “shared mutable state” (changeable memory locations that are shared), and in this book we will decry the pain that sharing can cause. Managing the sharing of data is a multifaceted problem that we included in our list of enemies of parallel programming. We will revisit this challenge, and solutions, repeatedly in this book.
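
A small sketch of shared mutable state, using only the C++11 standard library rather than TBB: several threads increment one counter that lives in the memory of their common process. The helper name count_with_threads and the choice of a std::mutex to protect the counter are assumptions made for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Every thread in the process can see `counter`, so each increment must be
// protected; here a std::mutex serializes the updates.
// (Hypothetical helper for illustration only.)
long count_with_threads(int num_threads, int increments_per_thread) {
    long counter = 0;          // shared mutable state
    std::mutex counter_mutex;  // guards `counter`

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &counter_mutex, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;  // a plain memory write, visible to all threads
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // wait for every thread before reading the result
    }
    return counter;
}
```

Removing the lock would turn the increment into a data race, which is undefined behavior in C++11 and later; the lock makes the result correct but also forces the threads to take turns.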

Achieving Parallelism

Coordinating people around the job of preparing and mailing the envelopes is easily expressed by the following two conceptual steps:

1. Assign people to tasks (and feel free to move them around to balance the workload).
2. Start with one person on each of the six tasks but be willing to split up a given task so that two or more people can work on it together.

The six tasks are folding, stuffing, sealing, addressing, stamping, and mailing. We also have six people (resources) to help with the work. That is exactly how TBB works best: we define tasks and data at a level we can explain and then split or combine data to match up with resources available to do the work.

The first step in writing a parallel program is to consider where the parallelism is. Many textbooks wrestle with task and data parallelism as though there were a clear choice. TBB allows any combination of the two that we express. If we are lucky, our program will have an abundant amount of data parallelism available for us to exploit. To simplify this work, TBB requires only that we specify tasks and how to split them. For a completely data-parallel task, in TBB we will define one task to which we give all the data. That task will then be split up automatically to use the available hardware parallelism. The implicit synchronization (as opposed to synchronization we directly ask for with coding) will often eliminate the need for using locks to achieve synchronization. Referring back to our enemies list, and the fact that we hate locks, the implicit synchronization is a good thing. What do we mean by “implicit” synchronization? Usually, all we are saying is that synchronization occurred but we did not explicitly code a synchronization. At first, this should seem like a “cheat.” After all, synchronization still happened, and someone had to ask for it! In a sense, we are counting on these implicit synchronizations being more carefully planned and implemented. The more we can use the standard methods of TBB, and the less we explicitly write our own locking code, the better off we will be, in general.

By letting TBB manage the work, we hand over the responsibility for splitting up the work and synchronizing when needed. The synchronization done by the library for us, which we call implicit synchronization, in turn often eliminates the need to code synchronization explicitly (see Chapter 5).

We strongly suggest starting there, and only venturing into explicit synchronization (Chapter 5) when absolutely necessary or beneficial. We can say from experience that even when such things seem to be necessary, they are not. You’ve been warned. If you are like us, you’ll ignore the warning occasionally and get burned. We have.

People have been exploring decomposition for decades, and some patterns have emerged. We’ll cover this more later when we discuss design patterns for parallel programming.

Terminology: Scaling and Speedup

The scalability of a program is a measure of how much speedup the program gets as we add more computing capabilities. Speedup is the ratio of the time it takes to run a program without parallelism vs. the time it takes to run in parallel. A speedup of 4× indicates that the parallel program runs in a quarter of the time of the serial program. An example would be a serial program that takes 100 seconds to run on a one-processor machine and 25 seconds to run on a quad-core machine.
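
The definition translates directly into code. In this small sketch the function names are ours, and efficiency (speedup divided by the number of cores) is the usual companion measure that we add as an assumption; it is not defined in the paragraph above.

```cpp
#include <cassert>

// Speedup: serial run time divided by parallel run time.
double speedup(double serial_seconds, double parallel_seconds) {
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Efficiency (assumed definition): speedup per processor core;
// 1.0 means the extra cores are fully used.
double efficiency(double serial_seconds, double parallel_seconds, int cores) {
    assert(cores >= 1);
    return speedup(serial_seconds, parallel_seconds) / cores;
}
```

For the example in the text, speedup(100, 25) is 4, and on a quad-core machine the efficiency is 1.0, which is perfect scaling.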

As a goal, we would expect that our program running on two processor cores should run faster than our program running on one processor core. Likewise, running on four processor cores should be faster than running on two cores.
Any program will have a point of diminishing returns for adding parallelism. It is not uncommon for performance to even drop, instead of simply leveling off, if we force the use of too many compute resources. The granularity at which we should stop subdividing a problem can be expressed as a grain size. TBB uses a notion of grain size to help limit the splitting of data to a reasonable level to avoid this problem of dropping in performance. Grain size is generally determined automatically, by an automatic partitioner within TBB, using a combination of heuristics for an initial guess and dynamic refinements as execution progresses. However, it is possible to explicitly manipulate the grain size settings if we want to do so. We do not encourage this in this book: we will seldom do better with explicit specifications than the automatic partitioner in TBB, and an explicit grain size tends to be somewhat machine specific, so setting it reduces performance portability.
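
The idea of a grain size can be sketched without TBB. The function below is a hypothetical illustration built on std::async: it splits a range in half recursively and stops splitting once a piece is no larger than grain_size. It is not how TBB is implemented; TBB's partitioners and work-stealing scheduler are far more sophisticated, and the name split_sum and its parameters are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum values[first, last) by recursive splitting. Ranges no larger than
// `grain_size` are summed serially; larger ranges are split in half, with
// the left half handed to another thread. (Illustrative sketch only.)
long long split_sum(const std::vector<int>& values, std::size_t first,
                    std::size_t last, std::size_t grain_size) {
    assert(first <= last && last <= values.size() && grain_size >= 1);
    if (last - first <= grain_size) {
        return std::accumulate(
            values.begin() + static_cast<std::ptrdiff_t>(first),
            values.begin() + static_cast<std::ptrdiff_t>(last), 0LL);
    }
    const std::size_t middle = first + (last - first) / 2;
    std::future<long long> left =
        std::async(std::launch::async, [&values, first, middle, grain_size] {
            return split_sum(values, first, middle, grain_size);
        });
    const long long right = split_sum(values, middle, last, grain_size);
    return left.get() + right;  // wait for the left half, then combine
}
```

A very small grain size creates many tiny tasks whose overhead can outweigh the work they do, while a very large one leaves cores idle; that trade-off is what an automatic partitioner tries to balance for us.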

As Thinking Parallel becomes intuitive, structuring problems to scale will become second nature.

Amdahl’s Law

Renowned computer architect Gene Amdahl made observations regarding the maximum improvement to a computer system that can be expected when only a portion of the system is improved. His observations in 1967 have come to be known as Amdahl’s Law. It tells us that if we speed up everything in a program by 2×, we can expect the resulting program to run 2× faster. However, if we improve the performance of only 2/5 of the program by 2×, the overall system improves only by 1.25×.

Amdahl’s Law is easy to visualize. Imagine a program, with five equal parts, that runs in 500 seconds, as shown in Figure P-10. If we can speed up two of the parts by 2× and 4×, as shown in Figure P-11, the 500 seconds are reduced to only 400 seconds (1.25× speedup) and 350 seconds (1.4× speedup), respectively. More and more, we are seeing the limitations of the portions that are not speeding up through parallelism. No matter how many processor cores are available, the serial portions create a barrier at 300 seconds that will not be broken (see Figure P-12), leaving us with only a 1.7× speedup. If we are limited to parallel programming in only 2/5 of our execution time, we can never get more than a 1.7× boost in performance!

Terminology: Data Parallelism

Data parallelism (Figure P-3) is easy to picture: take lots of data and apply the same transformation to each piece of the data. In Figure P-3, each letter in the data set is capitalized and becomes the corresponding uppercase letter. This simple example shows that given a data set and an operation that can be applied element by element, we can apply the same task in parallel to each element. Programmers writing code for supercomputers love this sort of problem and consider it so easy to do in parallel that it has been called embarrassingly parallel. A word of advice: if you have lots of data parallelism, do not be embarrassed; take advantage of it and be very happy. Consider it happy parallelism.
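
The capitalization example can be sketched with plain standard-library threads. This is a hand-rolled illustration, not TBB code: the helper name parallel_uppercase and the fixed chunking scheme are assumptions, and a library such as TBB would choose the split for us.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Uppercase `text` by giving each thread its own contiguous chunk. No two
// threads touch the same character, so no locking is needed.
// (Hypothetical helper; the chunking scheme is an assumption.)
std::string parallel_uppercase(std::string text, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (text.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < text.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, text.size());
        workers.emplace_back([&text, begin, end] {
            const auto from = text.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto to = text.begin() + static_cast<std::ptrdiff_t>(end);
            // The same transformation is applied to every element of the chunk.
            std::transform(from, to, from, [](unsigned char c) {
                return static_cast<char>(std::toupper(c));
            });
        });
    }
    for (std::thread& worker : workers) worker.join();
    return text;
}
```

Because every element is independent, the amount of available parallelism grows with the amount of data, which is the property that makes this kind of problem scale.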

When comparing the effort to find work to do in parallel, an approach that focuses on data parallelism is limited by the amount of data we can grab to process. Approaches based on task parallelism alone are limited by the different task types we program. While both methods are valid and important, it is critical to find parallelism in the data that we process in order to have a truly scalable parallel program. Scalability means that our application can increase in performance as we add hardware (e.g., more processor cores) provided we have enough data. In the age of big data, it turns out that big data and parallel programming are made for each other. It seems that growth in data sizes is a reliable source of additional work. We will revisit this observation, a little later in this Preface, when we discuss Amdahl’s Law.

Terminology: Pipelining

While task parallelism is harder to find than data parallelism, a specific type of task parallelism is worth highlighting: pipelining. In this kind of algorithm, many independent tasks need to be applied to a stream of data. Each item is processed by each stage, as shown by the letter A in Figure P-4. A stream of data can be processed more quickly when we use a pipeline, because different items can pass through different stages at the same time, as shown in Figure P-5. In these examples, the time to get a result may not be faster (this is the latency, measured as the time from input to output), but the throughput is greater because it is measured in terms of completions (output) per unit of time. Pipelines enable parallelism to increase throughput when compared with sequential (serial) processing. A pipeline can also be more sophisticated: it can reroute data or skip steps for chosen items. TBB has specific support for simple pipelines (Chapter 2) and very complex pipelines (Chapter 3). Of course, each step in the pipeline can use data or task parallelism as well. The composability of TBB supports this seamlessly.
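
The difference between latency and throughput can be worked through with a simple timing model. The sketch below assumes an idealized pipeline in which every stage takes the same amount of time and each stage works on a different item at once; the function names and this model are our assumptions, not part of TBB.

```cpp
#include <cassert>

// Latency: time for one item to pass through all stages.
double item_latency(int stages, double stage_seconds) {
    assert(stages >= 1 && stage_seconds >= 0.0);
    return stages * stage_seconds;
}

// Serial processing: each item finishes all stages before the next starts.
double serial_total(int items, int stages, double stage_seconds) {
    assert(items >= 0);
    return items * item_latency(stages, stage_seconds);
}

// Pipelined processing: after the first item fills the pipeline, one item
// completes every `stage_seconds`.
double pipelined_total(int items, int stages, double stage_seconds) {
    assert(items >= 0 && stages >= 1);
    if (items == 0) return 0.0;
    return (stages + items - 1) * stage_seconds;
}
```

For 100 items and three one-second stages, serial processing takes 300 seconds and the pipeline takes 102 seconds, yet each individual item still needs 3 seconds from input to output: throughput improves while latency does not.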

Example of Exploiting Mixed Parallelism

Consider the task of folding, stuffing, sealing, addressing, stamping, and mailing letters. If we assemble a group of six people for the task of stuffing many envelopes, we can arrange each person to specialize in and perform their assigned task in a pipeline fashion (Figure P-6). This contrasts with data parallelism, where we divide up the supplies and give a batch of everything to each person (Figure P-7). Each person then does all the steps on their collection of materials.
Figure P-7 is clearly the right choice if the people have to work in different locations far from one another. That is called coarse-grained parallelism because the interactions between the tasks are infrequent (the workers only come together to collect envelopes, then leave and do their tasks, including mailing). The other choice, shown in Figure P-6, approximates what we call fine-grained parallelism because of the frequent interactions (every envelope is passed along to every worker in the various steps of the operation).
Neither extreme tends to fit reality, although sometimes they may be close enough to be useful. In our example, it may turn out that addressing an envelope takes enough time to keep three people busy, whereas the first two steps and the last two steps require only one person on each pair of steps to keep up. Figure P-8 illustrates the steps with the corresponding size of the work to be done. We can conclude that if we assigned only one person to each step, as in Figure P-6, we would be “starving” some people in this pipeline of work to do: they would be idle. You might call it hidden “underemployment.” Our solution for achieving a reasonable balance in our pipeline (Figure P-9) is really a hybrid of data and task parallelism.
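The balancing argument can be sketched as a small calculation: a pipeline can only run as fast as its slowest station, and adding workers to that station (data parallelism inside one pipeline stage) raises the rate of the whole pipeline. The station timings used in the usage note are invented for illustration, since the figures are not reproduced here; `Stage`, `stage_rate`, `pipeline_rate`, and `bottleneck` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One station of the envelope pipeline: how many people staff it and how
// long one person needs per envelope (hypothetical numbers, in seconds).
struct Stage {
    int workers;
    int seconds_per_item;
};

// Envelopes per hour that a station can sustain.
int stage_rate(const Stage& s) {
    assert(s.workers > 0 && s.seconds_per_item > 0);
    return s.workers * 3600 / s.seconds_per_item;
}

// The whole pipeline runs at the rate of its slowest station.
int pipeline_rate(const std::vector<Stage>& stages) {
    assert(!stages.empty());
    int rate = stage_rate(stages.front());
    for (const Stage& s : stages) rate = std::min(rate, stage_rate(s));
    return rate;
}

// Index of the (first) slowest station, i.e., where to add people.
std::size_t bottleneck(const std::vector<Stage>& stages) {
    assert(!stages.empty());
    std::size_t worst = 0;
    for (std::size_t i = 1; i < stages.size(); ++i)
        if (stage_rate(stages[i]) < stage_rate(stages[worst])) worst = i;
    return worst;
}
```

For example, with four stations costing 10, 10, 30, and 10 seconds per envelope and one person each, the pipeline sustains 120 envelopes per hour and the 30-second station is the bottleneck. Staffing that station with three people raises the rate to 360 per hour and leaves nobody idle.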

Concurrent vs. Parallel

It is worth noting that the terms concurrent and parallel are related, but subtly different. Concurrent simply means “happening during the same time span” whereas parallel is more specific and is taken to mean “happening at the same time (at least some of the time).” Concurrency is more like what a single person tries to do when multitasking, whereas parallel is akin to what multiple people can do together. Figure P-1 illustrates the concepts of concurrency vs. parallelism. When we create effective parallel programs, we are aiming to accomplish more than just concurrency. In general, speaking of concurrency implies no expectation that a great deal of activity will be truly parallel, which means that two workers are not necessarily getting more work done than one could in theory (see tasks A and B in Figure P-1). Since the work is not done sooner, concurrency does not improve the latency of a task (the delay to start a task). Using the term parallel conveys an expectation that we improve latency and throughput (work done in a given time). We explore this in more depth starting on page xxxv when we explore limits of parallelism and discuss the very important concepts of Amdahl’s Law.

Enemies of Parallelism

Bearing in mind the enemies of parallel programming will help you understand our advocacy for particular programming methods. Key parallel programming enemies include:

Locks: In parallel programming, locks or mutual exclusion objects (mutexes) are used to provide a thread with exclusive access to a resource – blocking other threads from simultaneously accessing the same resource. Locks are the most common explicit way to ensure parallel tasks update shared data in a coordinated fashion (as opposed to allowing pure chaos). We hate locks because they serialize part of our programs, limiting scaling. The sentiment “we hate locks” is on our minds throughout the book. We hope to instill this mantra in you as well, without losing sight of when we must synchronize properly. Hence, a word of caution: we actually do love locks when they are needed, because without them disaster will strike. This love/hate relationship with locks needs to be understood.
Shared mutable state: Mutable is another word for “can be changed.” Shared mutable state happens any time we share data among multiple threads, and we allow it to change while being shared. Such sharing either reduces scaling when synchronization is needed and used correctly, or it leads to correctness issues (race conditions or deadlocks) when synchronization (e.g., a lock) is incorrectly applied. Realistically, we need shared mutable state when we write interesting applications. Thinking about careful handling of shared mutable state may be an easier way to understand the basis of our love/hate relationship with locks. In the end, we all end up “managing” shared mutable state and the mutual exclusion (including locks) to make it work as we wish.
Not “Thinking Parallel”: Use of clever bandages and patches will not make up for a poorly thought out strategy for scalable algorithms. Knowing where the parallelism is available, and how it can be exploited, should be considered before implementation. Trying to add parallelism to an application after it is written is fraught with peril. Some preexisting code may shift to use parallelism relatively well, but most code will benefit from considerable rethinking of algorithms.
Forgetting that algorithms win: This may just be another way to say “Think Parallel.” The choice of algorithms has a profound effect on the scalability of applications. Our choice of algorithms determines how tasks can be divided, how data structures are accessed, and how results are coalesced. The optimal algorithm is really the one which serves as the basis for an optimal solution. An optimal solution is a combination of the appropriate algorithm, the best matching parallel data structure, and the best way to schedule the computation over the data. The search for, and discovery of, better algorithms is seemingly unending for all of us as programmers. Now, as parallel programmers, we must add scalable to the definition of better for an algorithm.
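To make the love/hate relationship with locks and shared mutable state more tangible, here is a minimal sketch in standard C++ (plain `std::thread` and `std::mutex`, not TBB). Both functions compute the same sum. The first updates one shared variable under a lock for every element, so the threads spend most of their time waiting on each other. The second keeps the mutable state private to each thread and combines the results once at the end. The function names and the chunking scheme are our own illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Shared mutable state: every single update takes the lock, which is
// correct but serializes the hot loop.
long long sum_with_lock_per_element(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    long long total = 0;      // shared and mutable
    std::mutex total_mutex;   // protects `total`
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                std::lock_guard<std::mutex> guard(total_mutex);
                total += data[i];
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Private state per thread: no lock in the loop; results are combined
// only after all threads have been joined.
long long sum_with_private_partials(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long local = 0;  // visible to this thread only
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // each thread writes its own slot exactly once
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Removing the lock from the first version without restructuring would introduce a data race on `total`, which is the "disaster" that makes locks necessary. The second version avoids the lock by changing the algorithm so that nothing mutable is shared while the threads run.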

Terminology of Parallelism

The vocabulary of parallel programming is something we need to learn in order to converse with other parallel programmers. None of the concepts are particularly hard, but they are very important to internalize. A parallel programmer, like any programmer, spends years gaining a deep intuitive feel for their craft, despite the fundamentals being simple enough to explain.
We will discuss decomposition of work into parallel tasks, scaling terminology, correctness considerations, and the importance of locality due primarily to cache effects. When we think about our application, how do we find the parallelism?
At the highest level, parallelism exists either in the form of data to operate on in parallel, or in the form of tasks to execute in parallel. And they are not mutually exclusive. In a sense, all of the important parallelism is in data parallelism. Nevertheless, we will introduce both because it can be convenient to think of both. When we discuss scaling, and Amdahl’s Law, our intense bias to look for data parallelism will become more understandable.

Parallel Programming Does Not Have to Be Messy

TBB offers composability for parallel programming, and that changes everything. Composability means we can mix and match features of TBB without restriction. Most notably, this includes nesting. Therefore, it makes perfect sense to have a parallel_for inside a parallel_for loop. It is also okay for a parallel_for to call a subroutine, which then has a parallel_for within it.
Supporting composable nested parallelism turns out to be highly desirable because it exposes more opportunities for parallelism, and that results in more scalable applications. OpenMP, for instance, is not composable with respect to nesting because each level of nesting can easily cause significant overhead and consumption of resources, leading to exhaustion and program termination. This is a huge problem when you consider that a library routine may contain parallel code, so we may experience issues using a non-composable technique if we call the library while already doing parallelism. No such problem exists with TBB, because it is composable. TBB solves this, in part, by letting us expose opportunities for parallelism (tasks) while TBB decides at runtime how to map them to hardware (threads).

This is the key benefit of coding in terms of tasks, which express available but nonmandatory parallelism (see “relaxed sequential semantics” in Chapter 2), instead of threads, which express mandatory parallelism. If a parallel_for were considered mandatory, nesting would cause an explosion of threads, which in turn causes a whole host of resource issues that can easily (and often do) crash programs when not controlled. When parallel_for exposes available nonmandatory parallelism, the runtime is free to use that information to match the capabilities of the machine in the most effective manner.
We have come to expect composability in our programming languages, but most parallel programming models have failed to preserve it (fortunately, TBB does preserve composability!). Consider “if” and “while” statements. The C and C++ languages allow them to freely mix and nest as we desire. Imagine this was not so, and we lived in a world where a function called from within an if statement was forbidden to contain a while statement! Hopefully, any suggestion of such a restriction seems almost silly. TBB brings this type of composability to parallel programming by allowing parallel constructs to be freely mixed and nested without restrictions, and without causing issues.
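The "explosion of threads" under mandatory nested parallelism is easy to quantify. The following back-of-the-envelope sketch assumes, purely for illustration, that every nesting level is forced to fan out into the same number of threads. The function names are hypothetical and the model deliberately ignores the blocked parent threads, which would only make the count worse.

```cpp
#include <cassert>
#include <cstdint>

// Leaf threads created if every one of `nesting_depth` nested parallel
// regions must fan out into `threads_per_level` threads.
std::uint64_t mandatory_thread_count(std::uint64_t threads_per_level, unsigned nesting_depth) {
    std::uint64_t total = 1;
    for (unsigned level = 0; level < nesting_depth; ++level) total *= threads_per_level;
    return total;
}

// With tasks, nesting multiplies the number of tasks instead; the number of
// worker threads is chosen by the runtime and does not grow with depth.
std::uint64_t task_based_thread_count(std::uint64_t worker_threads, unsigned /*nesting_depth*/) {
    return worker_threads;
}
```

On a hypothetical 16-thread machine, three levels of mandatory nesting would ask for 4,096 threads, whereas a task-based runtime would still run 16 workers and simply have 4,096 tasks to hand out.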

Scaling, Performance, and Quest for Performance

Perhaps the most important benefit of programming with TBB is that it helps create a performance-portable application. We define performance portability as the characteristic that allows a program to maintain a similar “percentage of peak performance” across a variety of machines (different hardware, different operating systems, or both). We would like to achieve a high percentage of peak performance on many different machines without the need to change our code.
We would also like to see a 16× gain in performance on a 64-core machine vs. a quad-core machine. For a variety of reasons, we will almost never see ideal speedup (never say never: sometimes, due to an increase in aggregate cache size, we can see more than ideal speedup, a condition we call superlinear speedup).
WHAT IS SPEEDUP?
Speedup is formally defined to be the time to run sequentially (not in parallel) divided by the time to run in parallel. If my program runs in 3 seconds normally, but in only 1 second on a quad-core processor, we would say it has a speedup of 3×. Sometimes, we might speak of efficiency, which is speedup divided by the number of processing cores. Our 3× would be 75% efficient at using the parallelism.
The ideal goal of a 16× gain in performance when moving from a quad-core machine to one with 64 cores is called linear scaling or perfect scaling.
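The definitions in the sidebar translate directly into code. This is a minimal sketch with function names of our own choosing (`speedup`, `efficiency`, `ideal_scaling_gain`); it reproduces the numbers quoted in the text.

```cpp
#include <cassert>

// Speedup: sequential run time divided by parallel run time.
double speedup(double sequential_seconds, double parallel_seconds) {
    assert(sequential_seconds > 0.0 && parallel_seconds > 0.0);
    return sequential_seconds / parallel_seconds;
}

// Efficiency: speedup divided by the number of processing cores used.
double efficiency(double speedup_factor, unsigned cores) {
    assert(cores > 0);
    return speedup_factor / cores;
}

// Linear (perfect) scaling: the gain we would hope for when moving from
// `from_cores` to `to_cores`.
double ideal_scaling_gain(unsigned from_cores, unsigned to_cores) {
    assert(from_cores > 0);
    return static_cast<double>(to_cores) / from_cores;
}
```

With the example from the text, `speedup(3.0, 1.0)` gives 3, `efficiency(3.0, 4)` gives 0.75 (that is, 75%), and `ideal_scaling_gain(4, 64)` gives 16.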

Introduction to Parallel Programming

To accomplish this, we need to keep all the cores busy as we grow their numbers – something that requires considerable available parallelism. We will dive more into this concept of “available parallelism” starting on page xxxvii when we discuss Amdahl’s Law and its implications.

For now, it is important to know that TBB supports high-performance programming and helps significantly with performance portability. The high-performance support comes because TBB introduces essentially no overhead, which allows scaling to proceed without issue. Performance portability lets our application harness available parallelism as new machines offer more.
In our confident claims here, we are assuming a world where accepting the slight additional overhead of dynamic task scheduling is the most effective way to expose parallelism and exploit it. This assumption has one fault: if we can program an application to perfectly match the hardware, without any dynamic adjustments, we may gain a few percentage points in performance. Traditional High-Performance Computing (HPC) programming, the name given to programming the world’s largest computers for intense computations, has long had this characteristic in highly parallel scientific computations. HPC developers who use OpenMP with static scheduling, and find that it performs well, may find that the dynamic nature of TBB slightly reduces performance. Any advantage previously seen from such static scheduling is becoming rarer for a variety of reasons. All programming, including HPC programming, is increasing in complexity in a way that demands support for nested and dynamic parallelism. We see this in all aspects of HPC programming as well, including growth to multiphysics models, introduction of AI (artificial intelligence), and use of ML (machine learning) methods. One key driver of additional complexity is the increasing diversity of hardware, leading to heterogeneous compute capabilities within a single machine. TBB gives us powerful options for dealing with these complexities, including its flow graph features, which we will dive into in Chapter 3.

Organization of the Book and Preface

Think Parallel

For those new to parallel programming, we offer this Preface to provide a foundation that will make the remainder of the book more useful, approachable, and self-contained. We have attempted to assume only a basic understanding of C programming and introduce the key elements of C++ that TBB relies upon and supports. We introduce parallel programming from a practical standpoint that emphasizes what makes parallel programs most effective. For experienced parallel programmers, we hope this Preface will be a quick read that provides a useful refresher on the key vocabulary and thinking that allow us to make the most of parallel computer hardware.

After reading this Preface, you should be able to explain what it means to “Think Parallel” in terms of decomposition, scaling, correctness, abstraction, and patterns. You will appreciate that locality is a key concern for all parallel programming. You will understand the philosophy of supporting task programming instead of thread programming – a revolutionary development in parallel programming supported by TBB. You will also understand the elements of C++ programming that are needed above and beyond a knowledge of C in order to use TBB well.
The remainder of this Preface contains five parts:
(1) An explanation of the motivations behind TBB
(2) An introduction to parallel programming
(3) An introduction to locality and caches, which we call “Locality and the Revenge of the Caches”: the one aspect of hardware that we feel is essential to comprehend for top performance with parallel programming
(4) An introduction to vectorization (SIMD)
(5) An introduction to the features of C++ (beyond those in the C language) which are supported or used by TBB

Motivations Behind Threading Building Blocks

TBB first appeared in 2006. It was the product of experts in parallel programming at Intel, many of whom had decades of experience in parallel programming models, including OpenMP. Many members of the TBB team had previously spent years helping drive OpenMP to the great success it enjoys by developing and supporting OpenMP implementations. Appendix A is dedicated to a deeper dive on the history of TBB and the core concepts that go into it, including the breakthrough concept of task-stealing schedulers.

Born in the early days of multicore processors, TBB quickly emerged as the most popular parallel programming model for C++ programmers. TBB has evolved over its first decade to incorporate a rich set of additions that have made it an obvious choice for parallel programming for novices and experts alike. As an open source project, TBB has enjoyed feedback and contributions from around the world.

TBB promotes a revolutionary idea: parallel programming should enable the programmer to expose opportunities for parallelism without hesitation, and the underlying programming model implementation (TBB) should map that to the hardware at runtime.
Understanding the importance and value of TBB rests on understanding three things: (1) program using tasks, not threads; (2) parallel programming models do not need to be messy; and (3) how to obtain scaling, performance, and performance portability with portable, low-overhead parallel programming models such as TBB. We will dive into each of these three next because they are so important! It is safe to say that the importance of these was underestimated for a long time before they emerged as cornerstones in our understanding of how to achieve effective, structured parallel programming.

Program Using Tasks Not Threads

Parallel programming should always be done in terms of tasks, not threads. We cite an authoritative and in-depth examination of this by Edward Lee at the end of this Preface. In 2006, he observed that “For concurrent programming to become mainstream, we must discard threads as a programming model.”
Parallel programming expressed with threads is an exercise in mapping an application to the specific number of parallel execution threads on the machine we happen to run upon. Parallel programming expressed with tasks is an exercise in exposing opportunities for parallelism and allowing a runtime (e.g., TBB runtime) to map tasks onto the hardware at runtime without complicating the logic of our application.
Threads represent an execution stream that executes on a hardware thread for a time slice and may be assigned other hardware threads for a future time slice. Parallel programming in terms of threads fails because software threads (execution threads) are too often placed in one-to-one correspondence with hardware threads (e.g., processor cores). A hardware thread is a physical capability, and the number of hardware threads available varies from machine to machine, as do some subtle characteristics of various thread implementations.

In contrast, tasks represent opportunities for parallelism. The ability to subdivide tasks can be exploited to fill available threads when needed.

With these definitions in mind, a program written in terms of threads would have to map each algorithm onto specific systems of hardware and software. This is not only a distraction; it also causes a whole host of issues that make parallel programming more difficult, less effective, and far less portable.

A program written in terms of tasks, by contrast, allows a runtime mechanism (for example, the TBB runtime) to map tasks onto the hardware which is actually present at runtime. This removes the distraction of worrying about the number of actual hardware threads available on a system. More importantly, in practice this is the only method which opens up nested parallelism effectively. This is such an important capability that we will revisit and emphasize the importance of nested parallelism in several chapters.
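The separation between "what could run in parallel" (tasks) and "what the machine offers" (threads) can be sketched in standard C++. This is a toy stand-in and not how TBB is implemented: TBB uses a work-stealing scheduler, whereas the hypothetical `sum_with_tasks` below simply lets a fixed set of workers claim tasks from a shared counter. What it does show is that the task decomposition depends only on the problem and a grain size, while the number of workers is a runtime parameter that does not appear in the algorithm.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// A task is a half-open index range [first, second) of the input.
using Task = std::pair<std::size_t, std::size_t>;

// Recursively halve [begin, end) until each piece has at most `grain`
// elements. The resulting tasks are independent of the machine.
void split_into_tasks(std::size_t begin, std::size_t end, std::size_t grain,
                      std::vector<Task>& out) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {
        if (begin < end) out.emplace_back(begin, end);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    split_into_tasks(begin, mid, grain, out);
    split_into_tasks(mid, end, grain, out);
}

// Toy runtime: `num_workers` threads repeatedly claim the next unclaimed
// task until none are left, then the per-task results are combined.
long long sum_with_tasks(const std::vector<int>& data, std::size_t grain,
                         unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;  // hardware_concurrency() may report 0
    std::vector<Task> tasks;
    split_into_tasks(0, data.size(), grain, tasks);

    std::vector<long long> partial(tasks.size(), 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t i = next.fetch_add(1); i < tasks.size(); i = next.fetch_add(1)) {
            // Each task index is claimed by exactly one worker.
            partial[i] = std::accumulate(data.begin() + tasks[i].first,
                                         data.begin() + tasks[i].second, 0LL);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as the worker count, so the same source adapts to whatever machine it runs on. The result is the same for any number of workers.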
