## 计算机代写|C++作业代写C++代考|Small Overhead, Big Benefits for C++

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Big Benefits for C++

We do not mean to make too big a deal about performance loss, nor do we wish to deny it. For simple C++ code written in a “Fortran” style, with a single layer of wellbalanced parallel loops, the dynamic nature of TBB may not be needed at all. However, the limitations of such a coding style are an important factor in why TBB exists. TBB was designed to efficiently support nested, concurrent, and sequential composition of parallelism and to dynamically map this parallelism on to a target platform. Using a composable library like TBB, developers can build applications by combining components and libraries that contain parallelism without worrying that they will negatively interfere with each other. Importantly, TBB does not require us to restrict the parallelism we express to avoid performance problems. For large, complicated applications using $\mathrm{C}++$, TBB is therefore easy to recommend without disclaimers.
The TBB library has evolved over the years to not only adjust to new platforms but also to demands from developers that want a bit more control over the choices the library makes in mapping parallelism to the hardware. While TBB $1.0$ had very few performance controls for users, TBB 2019 has quite a few more – such as affinity controls,

constructs for work isolation, hooks that can be used to pin threads to cores, and so on. The developers of TBB worked hard to design these controls to provide just the right level of control without sacrificing composability.

The interfaces provided by the library are nicely layered – TBB provides high-level templates that suit the needs of most programmers, focusing on common cases. But it also provides low-level interfaces so we can drill down and create tailored solutions for our specific applications if needed. TBB has the best of both worlds. We typically rely on the default choices of the library to get great performance but can delve into the details if we need to.

## 计算机代写|C++作业代写C++代考|Evolving Support for Parallelism in TBB and C++

Both the TBB library and the $\mathrm{C}++$ language have evolved significantly since the introduction of the original TBB. In $2006, C_{++}$had no language support for parallel programming, and many libraries, including the Standard Template Library (STL), were not easily used in parallel programs because they were not thread-safe.

The $\mathrm{C}++$ language committee has been busy adding features for threading directly to the language and its accompanying Standard Template Library (STL). Figure 1-1 shows new and planned $\mathrm{C}_{++}$features that address parallelism. Even though we are big fans of TBB, we would in fact prefer if all of the fundamental support needed for parallelism is in the $\mathrm{C}++$ language itself. That would allow TBB to utilize a consistent foundation on which to build higher-level parallelism abstractions. The original versions of TBB had to address a lack of $\mathrm{C}++$ language support, and this is an area where the $\mathrm{C}++$ standard has grown significantly to fill the foundational voids

that TBB originally had no choice but to fill with features such as portable locks and atomics. Unfortunately, for $\mathrm{C}_{++}$developers, the standard still lacks features needed for full support of parallel programming. Fortunately, for readers of this book, this means that TBB is still relevant and essential for effective threading in $\mathrm{C}++$ and will likely stay relevant for many years to come.
It is very important to understand that we are not complaining about the $\mathrm{C}++$ standard process. Adding features to a language standard is best done very carefully, with careful review. The $\mathrm{C}++11$ standard committee, for instance, spent huge energy on a memory model. The significance of this for parallel programming is critical for every library that builds upon the standard. There are also limits to what a language standard should include, and what it should support. We believe that the tasking system and the flow graph system in TBB is not something that will directly become part of a language standard. Even if we are wrong, it is not something that will happen anytime soon.

## 计算机代写|C++作业代写C++代考|Recent C++ Additions for Parallelism

As shown in Figure 1-1, the $\mathrm{C}++11$ standard introduced some low-level, basic building blocks for threading, including std: : async, std:: future, and std:: thread. It also introduced atomic variables, mutual exclusion objects, and condition variables. These extensions require programmers to do a lot of coding to build up higher-level abstractions – but they do allow us to express basic parallelism directly in $\mathrm{C}++$. The C++11 standard was a clear improvement when it comes to threading, but it doesn’t provide us with the high-level features that make it easy to write portable, efficient parallel code. It also does not provide us with tasks or an underlying work-stealing task scheduler.
The $\mathrm{C}++17$ standard introduced features that raise the level of abstraction above these low-level building blocks, making it easier for us to express parallelism without having to worry about every low-level detail. As we discuss later in this book, there are still some significant limitations, and so these features are not yet sufficiently expressive or performant – there’s still a lot of work to do in the $\mathrm{C}++$ standard.

The most pertinent of these $\mathrm{C}++17$ additions are the execution policies that can be used with the Standard Template Library (STL) algorithms. These policies let us choose whether an algorithm can be safely parallelized, vectorized, parallelized and vectorized, or if it needs to retain its original sequenced semantics. We call an STL implementation that supports these policies a Parallel STL.

Looking into the future, there are proposals that might be included in a future $\mathrm{C}_{++}$ standard with even more parallelism features, such as resumable functions, executors, task blocks, parallel for loops, SIMD vector types, and additional execution policies for the STL algorithms.

## 计算机代写|C++作业代写C++代考|Big Benefits for C++

TBB 库多年来不断发展，不仅可以适应新平台，还可以满足开发人员的需求，这些开发人员希望更好地控制库在将并行性映射到硬件时所做的选择。虽然待定1.0对用户的性能控制很少，TBB 2019 有更多——比如亲和力控制，

## 计算机代写|C++作业代写C++代考|Evolving Support for Parallelism in TBB and C++

TBB 库和C++自最初的 TBB 引入以来，语言已经发生了显着变化。在2006,C++没有对并行编程的语言支持，包括标准模板库 (STL) 在内的许多库都不容易在并行程序中使用，因为它们不是线程安全的。

TBB 原本别无选择，只能填充便携式锁和原子锁等功能。不幸的是，对于C++开发人员，该标准仍然缺乏完全支持并行编程所需的功能。幸运的是，对于本书的读者来说，这意味着 TBB 对于有效的线程化仍然是相关的和必不可少的。C++并且可能会在未来很多年保持相关性。

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Gustafson’s Observations Regarding Amdahl’s Law

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Gustafson’s Observations Regarding Amdahl’s Law

Amdahl’s Law views programs as fixed, while we make changes to the computer. But experience seems to indicate that as computers get new capabilities, applications change to take advantage of these features. Most of today’s applications would not run on computers from 10 years ago, and many would run poorly on machines that are just 5 years old. This observation is not limited to obvious applications such as video games; it applies also to office applications, web browsers, photography, and video editing software.

More than two decades after the appearance of Amdahl’s Law, John Gustafson, while at Sandia National Labs, took a different approach and suggested a reevaluation of Amdahl’s Law. Gustafson noted that parallelism is more useful when we observe that workloads grow over time. This means that as computers have become more powerful, we have asked them to do more work, rather than staying focused on an unchanging workload. For many problems, as the problem size grows, the work required for the parallel part of the problem grows faster than the part that cannot be parallelized (the serial part). Hence, as the problem size grows, the serial fraction decreases, and, according to Amdahl’s Law, the scalability improves. We can start with an application that looks like Figure P-10, but if the problem scales with the available parallelism, we are likely to see the advancements illustrated in Figure P-13. If the sequential parts still take the same amount of time to perform, they become less and less important as

a percentage of the whole. The algorithm eventually reaches the conclusion shown in Figure P-14. Performance grows at the same rate as the number of processors, which is called linear or order of $n$ scaling, denoted as $O(n)$.
Even in our example, the efficiency of the program is still greatly limited by the serial parts. The efficiency of using processors in our example is about $40 \%$ for large numbers of processors. On a supercomputer, this might be a terrible waste. On a system with multicore processors, one can hope that other work is running on the computer in parallel to use the processing power our application does not use. This new world has many complexities. In any case, it is still good to minimize serial code, whether we take the “glass half empty” view and favor Amdahl’s Law or we lean toward the “glass half full” view and favor Gustafson’s observations.

## 计算机代写|C++作业代写C++代考|Serial vs. Parallel Algorithms

One of the truths in programming is this: the best serial algorithm is seldom the best parallel algorithm, and the best parallel algorithm is seldom the best serial algorithm.
This means that trying to write a program that runs well on a system with one processor core, and also runs well on a system with a dual-core processor or quad-core processor, is harder than just writing a good serial program or a good parallel program.
Supercomputer programmers know from practice that the work required grows quickly as a function of the problem size. If the work grows faster than the sequential overhead (e.g., communication, synchronization), we can fix a program that scales poorly just by increasing the problem size. It’s not uncommon at all to take a program that won’t scale much beyond 100 processors and scale it nicely to 300 or more processors just by doubling the size of the problem.

If you know what a thread is, feel free to skip ahead to the section “Safety in the Presence of Concurrency.” It’s important to be comfortable with the concept of a thread, even though the goal of TBB is to abstract away thread management. Fundamentally, we will still be constructing a threaded program, and we will need to understand the implications of this underlying implementation.

All modern operating systems are multitasking operating systems that typically use a preemptive scheduler. Multitasking means that more than one program can be active at a time. We may take it for granted that we can have an e-mail program and a web browser program running at the same time. Yet, not that long ago, this was not the case. A preemptive scheduler means the operating system puts a limit on how long one program can use a processor core before it is forced to let another program use it. This is how the operating system makes it appear that our e-mail program and our web browser are running at the same time when only one processor core is actually doing the work. Generally, each program runs relatively independent of other programs. In particular, the memory where our program variables will reside is completely separate from the memory used by other processes. Our e-mail program cannot directly assign a new value to a variable in the web browser program. If our e-mail program can communicate with our web browser – for instance, to have it open a web page from a link we received in e-mail – it does so with some form of communication that takes much more time than a memory access.
This isolation of programs from each other has value and is a mainstay of computing today. Within a program, we can allow multiple threads of execution to exist in a single program. An operating system will refer to the program as a process, and the threads of execution as (operating system) threads.

All modern operating systems support the subdivision of processes into multiple threads of execution. Threads run independently, like processes, and no thread knows what other threads are running or where they are in the program unless they synchronize explicitly. The key difference between threads and processes is that the threads within a process share all the data of the process. Thus, a simple memory access can set a variable in another thread. We will refer to this as “shared mutable state” (changeable memory locations that are shared) – and we will decry the pain that sharing can cause in this book. Managing the sharing of data, is a multifaceted problem that we included in our list of enemies of parallel programming. We will revisit this challenge, and solutions, repeatedly in this book.

## 计算机代写|C++作业代写C++代考|Serial vs. Parallel Algorithms

This isolation of programs from each other has value and is a mainstay of computing today. Within a program, we can allow multiple threads of execution to exist in a single program. An operating system will refer to the program as a process, and the threads of execution as (operating system) threads.

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Achieving Parallelism

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Achieving Parallelism

Coordinating people around the job of preparing and mailing the envelopes is easily expressed by the following two conceptual steps:

1. Assign people to tasks (and feel free to move them around to balance the workload).
2. Start with one person on each of the six tasks but be willing to split up a given task so that two or more people can work on it together.
The six tasks are folding, stuffing, sealing, addressing, stamping, and mailing. We also have six people (resources) to help with the work. That is exactly how TBB works best: we define tasks and data at a level we can explain and then split or combine data to match up with resources available to do the work.
The first step in writing a parallel program is to consider where the parallelism is. Many textbooks wrestle with task and data parallelism as though there were a clear choice. TBB allows any combination of the two that we express. If we are lucky, our program will have an abundant amount of data parallelism available for us to exploit. To simplify this work, TBB requires only that we specify tasks and how to split them. For a completely data-parallel task, in TBB we will define one task to which we give all the data. That task will then be split up automatically

to use the available hardware parallelism. The implicit synchronization (as opposed to synchronization we directly ask for with coding) will often eliminate the need for using locks to achieve synchronization. Referring back to our enemies list, and the fact that we hate locks, the implicit synchronization is a good thing. What do we mean by “implicit” synchronization? Usually, all we are saying is that synchronization occurred but we did not explicitly code a synchronization. At first, this should seem like a “cheat.” After all, synchronization still happened – and someone had to ask for it! In a sense, we are counting on these implicit synchronizations being more carefully planned and implemented. The more we can use the standard methods of TBB, and the less we explicitly write our own locking code, the better off we will be – in general.
By letting TBB manage the work, we hand over the responsibility for splitting up the work and synchronizing when needed. The synchronization done by the library for us, which we call implicit synchronization, in turn often eliminates the need for an explicit coding for synchronization (see Chapter 5 ).

We strongly suggest starting there, and only venturing into explicit synchronization (Chapter 5 ) when absolutely necessary or beneficial. We can say, from experience, even when such things seem to be necessary – they are not. You’ve been warned. If you are like us, you’ll ignore the warning occasionally and get burned. We have.

People have been exploring decomposition for decades, and some patterns have emerged. We’ll cover this more later when we discuss design patterns for parallel programming.

## 计算机代写|C++作业代写C++代考|Terminology: Scaling and Speedup

The scalability of a program is a measure of how much speedup the program gets as we add more computing capabilities. Speedup is the ratio of the time it takes to run a program without parallelism vs. the time it takes to run in parallel. A speedup of $4 \times$ indicates that the parallel program runs in a quarter of the time of the serial program. An example would be a serial program that takes 100 seconds to run on a one-processor machine and 25 seconds to run on a quad-core machine.

As a goal, we would expect that our program running on two processor cores should run faster than our program running on one processor core. Likewise, running on four processor cores should be faster than running on two cores.
Any program will have a point of diminishing returns for adding parallelism. It is not uncommon for performance to even drop, instead of simply leveling off, if we force the use of too many compute resources. The granularity at which we should stop subdividing a problem can be expressed as a grain size. TBB uses a notion of grain size to help limit the splitting of data to a reasonable level to avoid this problem of dropping in performance. Grain size is generally determined automatically, by an automatic partitioner within TBB, using a combination of heuristics for an initial guess and dynamic refinements as execution progresses. However, it is possible to explicitly manipulate the grain size settings if we want to do so. We will not encourage this in this book, because we seldom will do better in performance with explicit specifications than the automatic partitioner in TBB, it tends to be somewhat machine specific, and therefore explicitly setting grain size reduces performance portability.

As Thinking Parallel becomes intuitive, structuring problems to scale will become second nature.

## 计算机代写|C++作业代写C++代考|Amdahl’s Law

Renowned computer architect, Gene Amdahl, made observations regarding the maximum improvement to a computer system that can be expected when only a portion of the system is improved. His observations in 1967 have come to be known as Amdahl’s Law. It tells us that if we speed up everything in a program by $2 x$, we can expect the

resulting program to run $2 \times$ faster. However, if we improve the performance of only $2 / 5$ th of the program by $2 \times$, the overall system improves only by $1.25 \times$.
Amdahl’s Law is easy to visualize. Imagine a program, with five equal parts, that runs in 500 seconds, as shown in Figure P-10. If we can speed up two of the parts by $2 \times$ and $4 \times$, as shown in Figure P-11, the 500 seconds are reduced to only 400 (1.25 × speedup) and 350 seconds (1.4× speedup), respectively. More and more, we are seeing the limitations of the portions that are not speeding up through parallelism. No matter how many processor cores are available, the serial portions create a barrier at 300 seconds that will not be broken (see Figure P-12) leaving us with only $1.7 \times$ speedup. If we are limited to parallel programming in only $2 / 5$ th of our execution time, we can never get more than a $1.7 \times$ boost in performance！

## 计算机代写|C++作业代写C++代考|Achieving Parallelism

1. 将人员分配给任务（并随意调动他们以平衡工作量）。
2. 从一个人开始处理六项任务中的每一项，但愿意将给定的任务分开，以便两个或更多人可以一起工作。
这六项任务是折叠、填充、密封、寻址、盖章和邮寄。我们还有六个人（资源）来帮助工作。这正是 TBB 的最佳工作方式：我们在可以解释的级别定义任务和数据，然后拆分或组合数据以匹配可用于完成工作的资源。
编写并行程序的第一步是考虑并行性在哪里。许多教科书都在与任务和数据并行性作斗争，好像有一个明确的选择。TBB 允许我们表达的两者的任意组合。如果幸运的话，我们的程序将有大量的数据并行性可供我们利用。为了简化这项工作，TBB 只需要我们指定任务以及如何拆分它们。对于完全数据并行的任务，在 TBB 中，我们将定义一个任务，我们将向其提供所有数据。然后该任务将自动拆分

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Terminology: Data Parallelism

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Terminology: Data Parallelism

Data parallelism (Figure $\mathrm{P}-3$ ) is easy to picture: take lots of data and apply the same transformation to each piece of the data. In Figure P-3, each letter in the data set is capitalized and becomes the corresponding uppercase letter. This simple example shows that given a data set and an operation that can be applied element by element, we can apply the same task in parallel to each element. Programmers writing code for supercomputers love this sort of problem and consider it so easy to do in parallel that it has been called embarrassingly parallel. A word of advice: if you have lots of data parallelism, do not be embarrassed – take advantage of it and be very happy. Consider i happy parallelism.

When comparing the effort to find work to do in parallel, an approach that focuses on data parallelism is limited by the amount of data we can grab to process. Approaches based on task parallelism alone are limited by the different task types we program. Whil both methods are valid and important, it is critical to find parallelism in the data that we process in order to have a truly scalable parallel program. Scalability means that our application can increase in performance as we add hardware (e.g., more processor cores) provided we have enough data. In the age of big data, it turns out that big data and parallel programming are made for each other. It seems that growth in data sizes is a reliable source of additional work. We will revisit this observation, a little later in this Preface, when we discuss Amdahl’s Law.

## 计算机代写|C++作业代写C++代考|Terminology: Pipelining

While task parallelism is harder to find than data parallelism, a specific type of task parallelism is worth highlighting: pipelining. In this kind of algorithm, many independent tasks need to be applied to a stream of data. Each item is processed by each stage, as shown by the letter A in (Figure P-4). A stream of data can be processed more quickly when we use a pipeline, because different items can pass through different stages at the same time, as shown in Figure P-5. In these examples, the time to get a result may not be faster (referred to as the latency measured as the time from input to output) but the throughput is greater because it is measured in terms of completions (output) per unit of time. Pipelines enable parallelism to increase throughput when compared with a sequential (serial) processing. A pipeline can also be more sophisticated: it can reroute data or skip steps for chosen items. TBB has specific support for simple pipelines (Chapter 2) and very complex pipelines (Chapter 3). Of course, each step in the pipeline can use data or task parallelism as well. The composability of TBB supports this seamlessly.

## 计算机代写|C++作业代写C++代考|Example of Exploiting Mixed Parallelism

Consider the task of folding, stuffing, sealing, addressing, stamping, and mailing letters. If we assemble a group of six people for the task of stuffing many envelopes, we can arrange each person to specialize in and perform their assigned task in a pipeline fashion (Figure P-6). This contrasts with data parallelism, where we divide up the supplies and give a batch of everything to each person (Figure P-7). Each person then does all the steps on their collection of materials.
Figure P- 7 is clearly the right choice if every person has to work in a different location far from each other. That is called coarse-grained parallelism because the interactions between the tasks are infrequent (they only come together to collect envelopes, then leave and do their task, including mailing). The other choice shown in Figure P-6 approximates what we call fine-grained parallelism because of the frequent interactions (every envelope is passed along to every worker in various steps of the operation).
Neither extreme tends to fit reality, although sometimes they may be close enough to be useful. In our example, it may turn out that addressing an envelope takes enough time to keep three people busy, whereas the first two steps and the last two steps require only one person on each pair of steps to keep up. Figure P-8 illustrates the steps with the corresponding size of the work to be done. We can conclude that if we assigned only one person to each step as we see done in Figure P-6, that we would be “starving” some people in this pipeline of work for things to do – they would be idle. You might say it would be hidden “underemployment.” Our solution, to achieve a reasonable balance in our pipeline (Figure P-9) is really a hybrid of data and task parallelism.

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Concurrent vs. Parallel

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Concurrent vs. Parallel

It is worth noting that the terms concurrent and parallel are related, but subtly different. Concurrent simply means “happening during the same time span” whereas parallel is more specific and is taken to mean “happening at the same time (at least some of the time).” Concurrency is more like what a single person tries to do when multitasking, whereas parallel is akin to what multiple people can do together. Figure P-1 illustrates the concepts of concurrency vs. parallelism. When we create effective parallel programs, we are aiming to accomplish more than just concurrency. In general, speaking of concurrency will mean there is not an expectation for a great deal of activity to be truly parallel – which means that two workers are not necessarily getting more work done than one could in theory (see tasks A and B in Figure P-1). Since the work is not done sooner, concurrency does not improve the latency of a task (the delay to start a task). Using the term parallel conveys an expectation that we improve latency and throughput (work done in a given time). We explore this in more depth starting on page xxxv when we explore limits of parallelism and discuss the very important concepts of Amdahl’s Law.

## 计算机代写|C++作业代写C++代考|Enemies of Parallelism

Bearing in mind the enemies of parallel programming will help understand our advocacy for particular programming methods. Key parallel programming enemies include

• Locks: In parallel programming, locks or mutual exclusion objects (mutexes) are used to provide a thread with exclusive access to a resource – blocking other threads from simultaneously accessing the same resource. Locks are the most common explicit way to ensure parallel tasks update shared data in a coordinated fashion (as opposed to allowing pure chaos). We hate locks because they serialize part of our programs, limiting scaling. The sentiment “we hate locks” is on our minds throughout the book. We hope to instill this mantra in you as well, without losing sight of when we must synchronize properly. Hence, a word of caution: we actually do love locks when they are needed, because without them disaster will strike. This love/hate relationship with locks needs to be understood.
• Shared mutable state: Mutable is another word for “can be changed.” Shared mutable state happens any time we share data among multiple threads, and we allow it to change while being shared. Such sharing either reduces scaling when synchronization is needed and used correctly, or it leads to correctness issues (race conditions or deadlocks) when synchronization (e.g., a lock) is incorrectly applied. Realistically, we need shared mutable state when we write interesting applications. Thinking about careful handling of shared mutable state may be an easier way to understand the basis of our love/hate relationship with locks. In the end, we all end up “managing” shared mutable state and the mutual exclusion (including locks) to make it work as we wish.
• Not “Thinking Parallel”: Use of clever bandages and patches will not make up for a poorly thought out strategy for scalable algorithms. Knowing where the parallelism is available, and how it can be

exploited, should be considered before implementation. Trying to add parallelism to an application, after it is written, is fraught with peril. Some preexisting code may shift to use parallelism relatively well, but most code will benefit from considerable rethinking of algorithms.

• Forgetting that algorithms win: This may just be another way to say “Think Parallel.” The choice of algorithms has a profound effect on the scalability of applications. Our choice of algorithms determine how tasks can divide, data structures are accessed, and results are coalesced. The optimal algorithm is really the one which serves as the basis for optimal solution. An optimal solution is a combination of the appropriate algorithm, with the best matching parallel data structure, and the best way to schedule the computation over the data. The search for, and discovery of, algorithms which are better is seemingly unending for all of us as programmers. Now, as parallel programmers, we must add scalable to the definition of better for an algorithm.

## 计算机代写|C++作业代写C++代考|Terminology of Parallelism

The vocabulary of parallel programming is something we need to learn in order to converse with other parallel programmers. None of the concepts are particularly hard, but they are very important to internalize. A parallel programmer, like any programmer, spends years gaining a deep intuitive feel for their craft, despite the fundamentals being simple enough to explain.
We will discuss decomposition of work into parallel tasks, scaling terminology, correctness considerations, and the importance of locality due primarily to cache effects. When we think about our application, how do we find the parallelism?
At the highest level, parallelism exists either in the form of data to operate on in parallel, or in the form of tasks to execute in parallel. And they are not mutually exclusive. In a sense, all of the important parallelism is in data parallelism. Nevertheless, we will introduce both because it can be convenient to think of both. When we discuss scaling, and Amdahl’s Law, our intense bias to look for data parallelism will become more understandable.

## 计算机代写|C++作业代写C++代考|Enemies of Parallelism

• 锁：在并行编程中，锁或互斥对象（互斥体）用于为线程提供对资源的独占访问——阻止其他线程同时访问同一资源。锁是确保并行任务以协调的方式更新共享数据的最常见的显式方式（而不是允许纯粹的混乱）。我们讨厌锁，因为它们序列化了我们程序的一部分，限制了扩展。“我们讨厌锁”的情绪贯穿整本书。我们也希望将这个口头禅灌输给你，同时不要忽视我们何时必须正确同步。因此，请注意：我们实际上会在需要时使用爱锁，因为没有它们，灾难就会降临。需要了解这种与锁的爱/恨关系。
• 共享可变状态：可变是“可以更改”的另一个词。每当我们在多个线程之间共享数据时，都会发生共享可变状态，并且我们允许它在共享时更改。这种共享要么在需要和正确使用同步时减少缩放，要么在不正确地应用同步（例如，锁）时导致正确性问题（竞争条件或死锁）。实际上，当我们编写有趣的应用程序时，我们需要共享可变状态。仔细考虑共享可变状态的处理可能是一种更容易理解我们对锁的爱/恨关系的基础的方法。最后，我们最终都会“管理”共享可变状态和互斥（包括锁），以使其按我们的意愿工作。
• 不是“并行思考”：使用巧妙的绷带和补丁无法弥补可扩展算法的考虑不周的策略。了解并行性在哪里可用，以及如何实现

• 忘记算法获胜：这可能只是“并行思考”的另一种说法。算法的选择对应用程序的可扩展性有着深远的影响。我们对算法的选择决定了如何划分任务、访问数据结构以及合并结果。最优算法实际上是作为最优解的基础的算法。最佳解决方案是适当的算法、最佳匹配的并行数据结构以及调度数据计算的最佳方式的组合。对于我们所有的程序员来说，寻找和发现更好的算法似乎是永无止境的。现在，作为并行程序员，我们必须将可扩展性添加到更好的算法的定义中。

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Parallel Programming Does Not Have to Be Messy

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Parallel Programming Does Not Have to Be Messy

TBB offers composability for parallel programming, and that changes everything. Composability means we can mix and match features of TBB without restriction. Most notably, this includes nesting. Therefore, it makes perfect sense to have a parallel_for inside a parallel_for loop. It is also okay for a parallel_for to call a subroutine, which then has a parallel_for within it.
Supporting composable nested parallelism turns out to be highly desirable because it exposes more opportunities for parallelism, and that results in more scalable applications. OpenMP, for instance, is not composable with respect to nesting because each level of nesting can easily cause significant overhead and consumption of resources leading to exhaustion and program termination. This is a huge problem when you consider that a library routine may contain parallel code, so we may experience issues using a non-composable technique if we call the library while already doing parallelism. No such problem exists with TBB, because it is composable. TBB solves this, in part, by letting use expose opportunities for parallelism (tasks) while TBB decides at runtime how to map them to hardware (threads).

This is the key benefit to coding in terms of tasks (available but nonmandatory parallelism (see “relaxed sequential semantics” in Chapter 2)) instead of threads (mandatory parallelism). If a parallel_for was considered mandatory, nesting would cause an explosion of threads which causes a whole host of resource issues which can easily (and often do) crash programs when not controlled. When parallel_for exposes

available nonmandatory parallelism, the runtime is free to use that information to match the capabilities of the machine in the most effective manner.
We have come to expect composability in our programming languages, but most parallel programming models have failed to preserve it (fortunately, TBB does preserve composability!). Consider “if” and “while” statements. The $\mathrm{C}$ and $\mathrm{C}++$ languages allow them to freely mix and nest as we desire. Imagine this was not so, and we lived in a world where a function called from within an if statement was forbidden to contain a while statement! Hopefully, any suggestion of such a restriction seems almost silly. TBB brings this type of composability to parallel programming by allowing parallel constructs to be freely mixed and nested without restrictions, and without causing issues.

## 计算机代写|C++作业代写C++代考|Scaling, Performance, and Quest for Performance

Perhaps the most important benefit of programming with TBB is that it helps create a performance portable application. We define performance portability as the characteristic that allows a program to maintain a similar “percentage of peak performance” across a variety of machines (different hardware, different operating systems, or both). We would like to achieve a high percentage of peak performance on many different machines without the need to change our code.
We would also like to see a $16 \times$ gain in performance on a 64 -core machine vs. a quad-core machine. For a variety of reasons, we will almost never see ideal speedup (never say never: sometimes, due to an increase in aggregate cache size we can see more than ideal speedup – a condition we call superlinear speedup).
WHAT IS SPEEDUP?
Speedup is formerly defined to be the time to run sequentially (not in parallel) divided by the time to run in parallel. If my program runs in 3 seconds normally, but in only 1 second on a quad-core processor, we would say it has a speedup of $3 \times$. Sometimes, we might speak of efficiency which is speedup divided by the number of processing cores. Our $3 \times$ would be $75 \%$ efficient at using the parallelism.
The ideal goal of a $16 \times$ gain in performance when moving from a quad-core machine to one with 64 cores is called linear scaling or perfect scaling.

## 计算机代写|C++作业代写C++代考|Introduction to Parallel Programming

To accomplish this, we need to keep all the cores busy as we grow their numbers – something that requires considerable available parallelism. We will dive more into this concept of “available parallelism” starting on page xxxvii when we discuss Amdahl’s Law and its implications.

For now, it is important to know that TBB supports high-performance programming and helps significantly with performance portability. The high-performance support comes because TBB introduces essentially no overhead which allows scaling to proceed without issue. Performance portability lets our application harness available parallelism as new machines offer more.
In our confident claims here, we are assuming a world where the slight additional overhead of dynamic task scheduling is the most effective at exposing the parallelism and exploiting it. This assumption has one fault: if we can program an application to perfectly match the hardware, without any dynamic adjustments, we may find a few percentage points gain in performance. Traditional High-Performance Computing (HPC) programming, the name given to programming the world’s largest computers for intense computations, has long had this characteristic in highly parallel scientific computations. HPC developer who utilize OpenMP with static scheduling, and find it does well with their performance, may find the dynamic nature of TBB to be a slight reduction in performance. Any advantage previously seen from such static scheduling is becoming rarer for a variety of reasons. All programming including HPC programming, is increasing in complexity in a way that demands support for nested and dynamic parallelism support. We see this in all aspects of HPC programming as well, including growth to multiphysics models, introduction of AI (artificial intelligence), and use of ML (machine learning) methods. One key driver of additional complexity is the increasing diversity of hardware, leading to heterogeneous compute capabilities within a single machine. TBB gives us powerful options for dealing with these complexities, including its flow graph features which we will dive into in Chapter $3 .$

## 计算机代写|C++作业代写C++代考|Parallel Programming Does Not Have to Be Messy

TBB 为并行编程提供了可组合性，这改变了一切。可组合性意味着我们可以不受限制地混合和匹配 TBB 的特性。最值得注意的是，这包括嵌套。因此，在 parallel_for 循环中包含一个 parallel_for 是非常有意义的。一个parallel_for 调用一个子例程也是可以的，子例程里面有一个parallel_for。

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。

## 计算机代写|C++作业代写C++代考|Organization of the Book and Preface

C++ 是一种高级语言，它是由Bjarne Stroustrup 于1979 年在贝尔实验室开始设计开发的。 C++ 进一步扩充和完善了C 语言，是一种面向对象的程序设计语言。 C++ 可运行于多种平台上，如Windows、MAC 操作系统以及UNIX 的各种版本。

statistics-lab™ 为您的留学生涯保驾护航 在代写C++方面已经树立了自己的口碑, 保证靠谱, 高质且原创的统计Statistics代写服务。我们的专家在代写C++代写方面经验极为丰富，各种代写C++相关的作业也就用不着说。

• Statistical Inference 统计推断
• Statistical Computing 统计计算
• (Generalized) Linear Models 广义线性模型
• Statistical Machine Learning 统计机器学习
• Longitudinal Data Analysis 纵向数据分析
• Foundations of Data Science 数据科学基础

## 计算机代写|C++作业代写C++代考|Think Parallel

For those new to parallel programming, we offer this Preface to provide a foundation that will make the remainder of the book more useful, approachable, and self-contained. We have attempted to assume only a basic understanding of $C$ programming and introduce the key elements of $\mathrm{C}++$ that $\mathrm{TBB}$ relies upon and supports. We introduce parallel programming from a practical standpoint that emphasizes what makes parallel programs most effective. For experienced parallel programmers, we hope this Preface will be a quick read that provides a useful refresher on the key vocabulary and thinking that allow us to make the most of parallel computer hardware.

After reading this Preface, you should be able to explain what it means to “Think Parallel” in terms of decomposition, scaling, correctness, abstraction, and patterns. You will appreciate that locality is a key concern for all parallel programming. You will understand the philosophy of supporting task programming instead of thread programming – a revolutionary development in parallel programming supported by TBB. You will also understand the elements of $\mathrm{C}_{++}$programming that are needed above and beyond a knowledge of $\mathrm{C}$ in order to use TBB well.
The remainder of this Preface contains five parts:
(1) An explanation of the motivations behind TBB (begins on page xxi)
(2) An introduction to parallel programming (begins on page xxvi)
(3) An introduction to locality and caches – we call “Locality and the Revenge of the Caches” – the one aspect of hardware that we feel essential to comprehend for top performance with parallel programming (begins on page lii)
(4) An introduction to vectorization (SIMD) (begins on page $l x$ )
(5) An introduction to the features of $\mathrm{C}++$ (beyond those in the $\mathrm{C}$ language) which are supported or used by TBB (begins on page lxii)

## 计算机代写|C++作业代写C++代考|Motivations Behind Threading Building Blocks

TBB first appeared in 2006. It was the product of experts in parallel programming at Intel, many of whom had decades of experience in parallel programming models, including OpenMP. Many members of the TBB team had previously spent years helping drive OpenMP to the great success it enjoys by developing and supporting OpenMP implementations. Appendix A is dedicated to a deeper dive on the history of TBB and the core concepts that go into it, including the breakthrough concept of task-stealing schedulers.

Born in the early days of multicore processors, TBB quickly emerged as the most popular parallel programming model for $\mathrm{C}++$ programmers. TBB has evolved over its first decade to incorporate a rich set of additions that have made it an obvious choice for parallel programming for novices and experts alike. As an open source project, TBB has enjoyed feedback and contributions from around the world.

TBB promotes a revolutionary idea: parallel programming should enable the programmer to expose opportunities for parallelism without hesitation, and the underlying programming model implementation (TBB) should map that to the hardware at runtime.
Understanding the importance and value of TBB rests on understanding three things: (1) program using tasks, not threads; (2) parallel programming models do not need to be messy; and (3) how to obtain scaling, performance, and performance portability with portable low overhead parallel programming models such as TBB. We will dive into each of these three next because they are so important! It is safe to say that the importance of these were underestimated for a long time before emerging as cornerstones in our understanding of how to achieve effective, and structured, parallel programming.

Parallel programming should always be done in terms of tasks, not threads. We cite an authoritative and in-depth examination of this by Edward Lee at the end of this Preface. In 2006, he observed that “For concurrent programming to become mainstream, we must discard threads as a programming model.”
Parallel programming expressed with threads is an exercise in mapping an application to the specific number of parallel execution threads on the machine we happen to run upon. Parallel programming expressed with tasks is an exercise in exposing opportunities for parallelism and allowing a runtime (e.g., TBB runtime) to map tasks onto the hardware at runtime without complicating the logic of our application.

In contrast, tasks represent opportunities for parallelism. The ability to subdivide tasks can be exploited, as needed, to fill available threads when needed.

With these definitions in mind, a program written in terms of threads would have to map each algorithm onto specific systems of hardware and software. This is not only a distraction, it causes a whole host of issues that make parallel programming more difficult, less effective, and far less portable.

Whereas, a program written in terms of tasks allows a runtime mechanism, for example, the TBB runtime, to map tasks onto the hardware which is actually present at runtime. This removes the distraction of worrying about the number of actual hardware threads available on a system. More importantly, in practice this is the only method which opens up nested parallelism effectively. This is such an important capability, that we will revisit and emphasize the importance of nested parallelism in several chapters.

## 计算机代写|C++作业代写C++代考|Think Parallel

(1) 解释 TBB 背后的动机（从第 xxi 页开始）
(2) 并行编程介绍（从第 xxvi 页开始）
(3) 对局部性和缓存的介绍——我们称之为“局部性和缓存的复仇”——我们认为硬件的一个方面是我们认为对于实现并行编程的最佳性能至关重要的一个方面（从第 lii 页开始）
(4) 矢量化 (SIMD) 简介（从第 1 页开始）lX)
(5) 特点介绍C++（除了那些在CTBB 支持或使用的语言）（从第 lxii 页开始）

## 计算机代写|C++作业代写C++代考|Motivations Behind Threading Building Blocks

TBB 于 2006 年首次出现。它是英特尔并行编程专家的产物，其中许多人在包括 OpenMP 在内的并行编程模型方面拥有数十年的经验。TBB 团队的许多成员此前曾花费数年时间帮助推动 OpenMP 取得巨大成功，通过开发和支持 OpenMP 实施。附录 A 致力于深入探讨 TBB 的历史和其中的核心概念，包括任务窃取调度程序的突破性概念。

TBB 诞生于多核处理器的早期，迅速成为最流行的并行编程模型C++程序员。TBB 在其第一个十年中不断发展，包含了一组丰富的附加功能，使其成为新手和专家等并行编程的明显选择。作为一个开源项目，TBB 得到了来自世界各地的反馈和贡献。

TBB 提出了一个革命性的想法：并行编程应该使程序员能够毫不犹豫地展示并行性的机会，并且底层编程模型实现 (TBB) 应该在运行时将其映射到硬件。

## 有限元方法代写

tatistics-lab作为专业的留学生服务机构，多年来已为美国、英国、加拿大、澳洲等留学热门地的学生提供专业的学术服务，包括但不限于Essay代写，Assignment代写，Dissertation代写，Report代写，小组作业代写，Proposal代写，Paper代写，Presentation代写，计算机作业代写，论文修改和润色，网课代做，exam代考等等。写作范围涵盖高中，本科，研究生等海外留学全阶段，辐射金融，经济学，会计学，审计学，管理学等全球99%专业科目。写作团队既有专业英语母语作者，也有海外名校硕博留学生，每位写作老师都拥有过硬的语言能力，专业的学科背景和学术写作经验。我们承诺100%原创，100%专业，100%准时，100%满意。

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。