## Electrical engineering writing service | Computer Systems Architecture writing and exam service | CMSC411

statistics-lab™ supports you throughout your studies abroad. It has built a reputation for its Computer Systems Architecture writing service and promises reliable, high-quality and original statistics writing services. Our experts have extensive experience with Computer Systems Architecture, including all kinds of related assignments.

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

## Power Model

To predict the energy consumption of a schedule, an appropriate power model for the processor is necessary. A model is a simplified representation of reality, and its complexity increases significantly with its accuracy. Because the power consumption of a processor depends on several factors, such as the temperature, instruction mix, usage rate and technology of the processor, the literature offers numerous approaches of varying complexity and accuracy for modeling it, e.g. $[4,7,11]$ or $[18]$.
In general, the power consumption can be subdivided into a static part that is frequency-independent and a dynamic part that depends on both the frequency and the supply voltage:
$$P_{\text{processor}}=P_{\text{static}}+P_{\text{dynamic}} \tag{1}$$
The static power consumption consists of the idle power $P_{\text{idle}}$ and a device-specific constant $s$ that is only needed when the processor is under load:
$$P_{\text{processor}}= \begin{cases}P_{\text{idle}}+s+P_{\text{dynamic}} & \text{if under load} \\ P_{\text{idle}} & \text{else}\end{cases} \tag{2}$$
The dynamic power consumption is typically modeled as a cubic function of the frequency [2], because frequency and voltage are roughly linearly correlated (dynamic power scales with $V^{2} f$, so a voltage proportional to the frequency yields $f^{3}$). Additionally, the supply voltage, and thus the dynamic power consumption, depends on the load level of a core. As we only consider cores that are either fully loaded or in idle mode (at the lowest frequency), the influence of the load level of core $i$ can be expressed by a parameter $w_{i} \in \{0,1\}$. If we assume a homogeneous multi-core processor with $n$ cores, a simple power model for the dynamic part is given by the following equation, where $a$, $b$ and $\beta$ are device-specific constants, $i$ is the core index and $f_{\text{curr}, i}$ is the current frequency of core $i$:
$$P_{\text{dynamic}}=\sum_{i=0}^{n-1} w_{i} \cdot \beta\left(f_{\text{curr}, i}^{3}+a \cdot f_{\text{curr}, i}^{2}+b \cdot f_{\text{curr}, i}\right) \tag{3}$$
A core contributes to the dynamic part of the processor's power consumption only if it runs under full load at a frequency above the lowest one.
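The three equations above can be combined into a single function. The following is a minimal sketch, not the authors' implementation: the function name, the argument layout and the numeric values in the usage note are assumptions made for illustration.

```python
from typing import Sequence


def processor_power(
    freqs: Sequence[float],   # current frequency f_curr,i of each core
    loaded: Sequence[int],    # load flag w_i of each core (0 = idle, 1 = full load)
    p_idle: float,            # idle power P_idle
    s: float,                 # device-specific constant s (only under load)
    beta: float,              # device-specific constant beta
    a: float,                 # device-specific constant a
    b: float,                 # device-specific constant b
) -> float:
    """Evaluate P_processor for one frequency/load configuration."""
    if not any(loaded):
        # No core is under load: only the idle power is consumed.
        return p_idle
    # Dynamic part: cubic polynomial in the frequency, summed over loaded cores.
    dynamic = sum(
        w * beta * (f ** 3 + a * f ** 2 + b * f) for f, w in zip(freqs, loaded)
    )
    return p_idle + s + dynamic
```

With hypothetical constants `p_idle=5.0, s=1.0, beta=0.5, a=1.0, b=1.0`, calling `processor_power([1.0, 2.0], [1, 0], ...)` counts only the first core in the dynamic part, because the second core has $w_i = 0$.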

## Model Validation

To validate the accuracy of the power model, we used three computer systems with Intel processors as test platforms:

1. Intel i7 3630QM Ivy-Bridge based laptop
2. Intel i5 4570 Haswell based desktop machine
3. Intel is E1620 server machine

To construct the power model, we obtained the power values from physical experiments using the Intel RAPL tool. As described in Sect. 4.1, we measured the power consumption for each frequency combination for 10 s at a sampling interval of 10 ms and repeated all measurements five times. We tested the power model for six workload scenarios: ALU-, FPU-, SSE-, BP- and RAM-intensive workloads, plus a combination of these tests as a mixed workload. The measured power values were used to construct the power model for each platform and scenario. The architecture-specific tuning parameters ($s$, $\beta$, $a$ and $b$) in Eqs. 2 and 3 were then determined using a least squares analysis.
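The least squares step can be illustrated as follows. For measurements under load, Eqs. 2 and 3 are linear in the four unknowns $s$, $\beta$, $\beta a$ and $\beta b$, so an ordinary linear least squares fit is sufficient. This is a sketch under that assumption, using normal equations and a small Gaussian elimination; the function names and the sample layout are hypothetical, and the source does not state which solver was used.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_power_model(samples, p_idle):
    """Fit (s, beta, a, b) to measurements taken under load.

    samples: list of (frequencies_of_loaded_cores, measured_power) pairs.
    """
    rows, targets = [], []
    for freqs, power in samples:
        # P - P_idle = s + beta*sum(f^3) + (beta*a)*sum(f^2) + (beta*b)*sum(f)
        rows.append([1.0,
                     sum(f ** 3 for f in freqs),
                     sum(f ** 2 for f in freqs),
                     sum(freqs)])
        targets.append(power - p_idle)
    k = 4
    # Normal equations: (A^T A) x = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    s, beta, beta_a, beta_b = solve_linear(ata, aty)
    return s, beta, beta_a / beta, beta_b / beta
```

Normal equations keep the sketch dependency-free. For noisy real measurements, a QR- or SVD-based solver from a numerical library would be the more robust choice.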

As an example, Table 1 shows the individual parameters for each platform for a mixed workload, obtained by fitting the physical measurements to Eqs. 2 and 3 and optimizing the tuning parameters. The results of the least squares analysis for the other tests differ only slightly from the mixed workload scenario. The parameters of the power model can be determined and saved in advance, and then reused for several classes of applications in which a specific workload type dominates. The power consumption can then be measured during the execution of the first application and compared with the different power models to find the most suitable one for the whole class.

## Runtime System

RUPS (Runtime system for User Preferences-defined Schedules) is a scheduling tool for parallel platforms that lets the user specify preferences for a schedule, e.g. $PE$ (performance), $E$ (energy) or $FT$ (fault tolerance). RUPS then creates schedules that are optimized for the chosen preference. RUPS consists of four main parts, illustrated in Fig. 2(a). The processor details are extracted in Part 1 and passed to the scheduler (Part 3), which in turn optimizes the schedule based on the processor parameters and the user preferences (Part 2). Finally, the schedule is passed to the runtime system (Part 4) and executed on the processor. In this section, we describe the details of these four parts.

Before RUPS is used for the first time, it has to be initialized once with the system check tool to adjust the power model to the processor used. This tool measures the power consumption of the processor for all supported frequencies and for different numbers of cores under full load. We measure the power consumption for 10 seconds (s) at a sampling interval of 10 milliseconds (ms). All cases are repeated five times to compensate for high power values that could occur due to unexpected background processes. Between cases, all cores are set to the lowest frequency in idle mode for 5 s to reduce the rise in processor temperature and thus its influence on the power consumption. The averaged results of the measurement points for each case are then used as values for the power model.
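The averaging step of this measurement procedure can be sketched as follows. The data layout (a dictionary keyed by frequency and number of loaded cores) and all names are assumptions for illustration; the source only states that the measurement points of each case are averaged.

```python
from statistics import fmean

RUN_MS = 10_000                          # each run lasts 10 s
SAMPLE_MS = 10                           # one sample every 10 ms
REPETITIONS = 5                          # each case is measured five times
COOL_DOWN_MS = 5_000                     # idle phase between two cases
SAMPLES_PER_RUN = RUN_MS // SAMPLE_MS    # samples collected per run


def average_power(runs):
    """Average all samples of all repetitions of one case (in watts)."""
    return fmean(sample for run in runs for sample in run)


def build_power_table(measurements):
    """Map each (frequency, loaded_cores) case to its averaged power.

    measurements: dict {(frequency, loaded_cores): [run_1, ..., run_5]},
    where every run is a list of power samples.
    """
    return {case: average_power(runs) for case, runs in measurements.items()}
```

A plain mean is used here because the text speaks of averaged results. If background processes cause strong outliers, a median or trimmed mean would be a possible alternative, but that would be a deviation from the described procedure.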

Figure 2(b) gives a short overview of the runtime system. We simulate a failure by exiting an MPI process just before the corresponding task is started. The other processes are then informed about the failure by an error handler. We integrate a testing mode in which one additional MPI process is started to measure the energy consumption with the help of Intel RAPL. This measurement process samples the energy at an interval of 10 ms.
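The work of such a measurement process can be sketched as the accumulation of periodically sampled energy counter readings. The sketch assumes a cumulative counter in microjoules that wraps around at a known maximum, as exposed, for example, by the Linux powercap files `energy_uj` and `max_energy_range_uj`. Whether RUPS reads RAPL this way is not stated in the source, and the function names are hypothetical.

```python
def total_energy_uj(readings, max_range_uj):
    """Sum the energy between consecutive counter readings (microjoules).

    A negative difference is interpreted as a single counter wrap-around.
    """
    total = 0
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta < 0:
            delta += max_range_uj  # counter wrapped since the last sample
        total += delta
    return total


def average_power_w(readings, max_range_uj, sample_ms=10):
    """Average power in watts over the sampled interval."""
    duration_s = (len(readings) - 1) * sample_ms / 1000
    return total_energy_uj(readings, max_range_uj) / 1e6 / duration_s
```

Sampling every 10 ms keeps the time between two readings short, which makes it unlikely that the counter wraps more than once between samples.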

## The Trade-Off Problem

A combination of all three objectives is possible in general, but there does not exist an overall optimal solution. In this context, the degree of $FT$ is rated by the overhead in performance (and energy) that results from the fault tolerance techniques in both the fault-free and the fault case. Therefore, a compromise between the optimization criteria must be made: when one criterion is improved, one or both of the others are worsened.

The $PE$ of a schedule depends on the mapping of the tasks. The more an application can be parallelized, the better the performance. Additionally, modern processors support several frequencies, so tasks should be accelerated as much as possible, i.e. run at the highest supported frequency of a PU. However, a more parallelized application leaves fewer gaps between tasks and thus fewer opportunities to include duplicates without shifting successor tasks. This results in a high performance overhead in case of a failure, i.e. a low $FT$. Additionally, running at a high frequency typically leads to a high $E$. When we focus on $FT$, duplicates should be executed completely in the fault-free case, and available but unused PUs should also be considered for mapping duplicates, to minimize the performance loss in case of a fault. In this case, duplicates may shift original tasks and thus lead to a low $PE$ in the fault-free case. In terms of $E$, both executing duplicates completely and using available PUs that are not necessary for the original tasks result in a high $E$. If the focus is on $E$, low frequencies and short duplicates are preferable. But low frequencies lead to a low $PE$, and short duplicates lead to a high performance overhead ($FT$) in case of a failure.

In addition, the main focus of a user varies between situations. For example, in a time-critical environment, $PE$ is the most important criterion next to $FT$. Thus, in this situation $PE$ and $FT$ are usually favored over minimizing $E$. In another situation, failures occur extremely rarely, and thus $E$ becomes more important. Other examples exist in mobile devices, where $E$ is the most important criterion next to $PE$. The main focus is therefore put on $E$ and $PE$, while $FT$ is neglected. In short, the direction of the optimization is highly situational and ultimately depends on the user preferences.
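The absence of an overall optimal solution can be illustrated with Pareto dominance: a schedule is only discarded if another schedule is at least as good in all three criteria and strictly better in at least one. The sketch below is an illustration of the trade-off, not the RUPS optimizer. The `Schedule` fields, which represent $PE$, $E$ and $FT$ as values to be minimized, and the example numbers are assumptions.

```python
from typing import NamedTuple


class Schedule(NamedTuple):
    name: str
    makespan: float        # fault-free run time (lower means better PE)
    energy: float          # energy consumption E (lower is better)
    fault_overhead: float  # extra run time in the fault case (lower means better FT)


def dominates(x: Schedule, y: Schedule) -> bool:
    """True if x is no worse than y in every criterion and better in one."""
    xs, ys = x[1:], y[1:]
    return all(p <= q for p, q in zip(xs, ys)) and any(p < q for p, q in zip(xs, ys))


def pareto_front(schedules):
    """Return all schedules that are not dominated by any other schedule."""
    return [s for s in schedules if not any(dominates(o, s) for o in schedules)]


candidates = [
    Schedule("fast", makespan=10, energy=50, fault_overhead=8),
    Schedule("green", makespan=16, energy=30, fault_overhead=6),
    Schedule("safe", makespan=13, energy=55, fault_overhead=1),
    Schedule("poor", makespan=17, energy=56, fault_overhead=9),
]
front = pareto_front(candidates)
```

In this hypothetical set, only `poor` is dominated. The other three remain on the front, so choosing among them requires a user preference such as the situations described above.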

## Previous Approach

Fechner et al. [10] provide a fault-tolerant, duplication-based scheduling approach that guarantees no overhead in the fault-free case. Starting from an already existing schedule (and task graph), each original task is copied and its duplicate (D) is placed on a different PU than the original task, so that the schedule execution can be continued in case of a failure. We assume homogeneous PUs and a fail-stop model, where a failure of a PU might result from faulty hardware, software or network. We only consider one failure per schedule execution.

When an original task has finished, it sends a commit message to its corresponding D so that the D can be aborted. Schedules often contain several gaps between tasks that result from dependencies. Ds can be placed either in those gaps or directly between two succeeding tasks. To avoid an overhead in the fault-free case, only a placeholder, a so-called dummy duplicate (DD), is placed in all situations where a D would shift all of its successor tasks. DDs are only extended to full Ds in case of a failure. To reduce the communication overhead, Ds are placed with a short delay, the so-called slack. Thus, either the results of an original task or the results of the corresponding D are sent to its successor tasks, but not both. Figure 1(a) illustrates an example task graph. For clarity, the communication times and the slack are disregarded. Figure 1(b) and (c) show the resulting schedules of two strategies: the first uses only DDs, the second uses Ds and DDs.
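The choice between a D and a DD can be sketched for a single gap on the PU that hosts the duplicate. This is a strongly simplified illustration and not the algorithm of [10]: it considers one gap only, ignores communication times, and all names and parameters are hypothetical.

```python
def choose_duplicate(original_start, duration, slack, gap_start, gap_end):
    """Decide whether a full duplicate (D) fits into a gap.

    The duplicate starts no earlier than the original task plus the slack
    and no earlier than the beginning of the gap. If it would end after the
    gap, it would shift its successor tasks, so only a dummy duplicate (DD)
    is placed.
    Returns a (kind, start_time) tuple with kind "D" or "DD".
    """
    start = max(gap_start, original_start + slack)
    if start + duration <= gap_end:
        return ("D", start)   # fits without shifting successors
    return ("DD", start)      # placeholder, extended only after a failure
```

For example, a task of length 4 that starts at time 0 with a slack of 1 fits as a full D into a gap from 0 to 6, but only as a DD into a gap from 3 to 6.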
