## Embedded Software: ELEC3607

## Wait States and Burst Accesses

There are a variety of different types of storage, all of which have their advantages and disadvantages. RAM is fast and both readable and writable, but it is volatile: it loses its contents when power is removed. Flash is non-volatile (persistent) memory, but access to it is relatively slow. In most cases it is so slow that the processor cannot read it at full speed and each access must be artificially stretched. This is achieved by inserting ‘wait states’ into each access, during which the processor waits for the memory to respond.
As discussed in detail in Section 2.3, each time memory is accessed, the address to be accessed must be specified. With respect to the transfer of user data, the exchange of address information can be seen as a kind of overhead (Figure 7). During the execution of code, several memory locations are very often read in sequence, especially whenever there are no jumps or function calls (keyword: basic block). The same applies to the initialization of variables with values from the flash: the values are often stored in memory one after the other.

In both cases there would be many individual read accesses, each with significant overhead (Figure 8). To make this type of access more efficient, many memories offer burst accesses. These can transfer an entire range of data starting from a single address (Figure 9), significantly reducing the overhead.
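The saving from a burst access can be estimated with simple arithmetic. The following is a minimal sketch, assuming an address phase of two bus cycles and a data phase of one cycle per word; both values, and the function name `transfer_cycles`, are illustrative and not taken from any particular memory's datasheet.

```python
def transfer_cycles(words, addr_cycles=2, data_cycles=1, burst=False):
    """Bus cycles needed to read `words` consecutive words.

    addr_cycles and data_cycles are assumed, illustrative values.
    """
    if words == 0:
        return 0
    if burst:
        # One address phase, then the data words follow back to back.
        return addr_cycles + words * data_cycles
    # Every word pays for its own address phase.
    return words * (addr_cycles + data_cycles)


single = transfer_cycles(8)             # 8 * (2 + 1) = 24 cycles
burst = transfer_cycles(8, burst=True)  # 2 + 8 * 1   = 10 cycles
print(single, burst)
```

With these assumed timings, reading eight consecutive words takes 24 cycles as individual accesses but only 10 cycles as one burst. The longer the sequential run, the smaller the share of address overhead.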

In a tax office, a clerk deals with the affairs of four clients in one morning. The clients' files are placed on her desk for quick access. After all, she has to look at individual documents again and again and does not want to fetch a file from the archive for each document and then return it to the archive after viewing it. That would be inefficient.

This office procedure describes the concept of a cache very well. A comparatively small but very fast memory (desktop equates to cache) is loaded with the current contents of a much larger, but also much slower, memory (archive equates to flash or shared RAM), as in Figure 10.

With larger processors, further gradations or cache levels come into play. The example of the tax office could be extended as follows to illustrate multi-level caches. Between the desk and the archive there may also be a drawer unit on castors under the desk, as well as a filing cabinet in the office. This results in the following gradation: desk equates to level 1 cache, drawer unit equates to level 2 cache, filing cabinet equates to level 3 cache, and finally archive equates to flash or shared RAM. Usually the word ‘level’ is not written out in full but simply replaced by an ‘L’. Thus we speak of an L1 cache, L2 cache and so on.

If data or code is to be read and it is already in the cache, this is called a cache hit. If it is not in the cache and must first be fetched from main memory, there is a cache miss.

## Cache Structure and Cache Rows

Each cache is divided into cache lines, each line being several dozen bytes in size. The size of the main memory is an integer multiple of the cache size, so the cache fits into it exactly $n$ times. When transferring data to or from the cache, an entire cache line is always transferred by burst access.

The assignment of cache lines to the memory addresses in the main memory is not freely selectable. Instead, it results from the position of the line in the cache. Figure 11 illustrates the relationship. Cache line 3, for example, can only be matched with memory areas marked with a ‘3’. In reality, the size ratio is more pronounced than the 1:4 ratio used in the figure and the number of cache lines is also significantly higher. Table 2 shows the parameters as they are defined for first generation Infineon AURIX processors.
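The fixed relationship between a memory address and its cache line can be expressed as a short calculation. The sketch below assumes a cache of 8 lines with 32 bytes each and counts lines from zero; these sizes and the function names are illustrative only and are not the AURIX parameters from Table 2.

```python
LINE_SIZE = 32  # bytes per cache line (assumed value)
NUM_LINES = 8   # number of lines in the cache (assumed value)


def cache_line_index(address):
    """Cache line that a main-memory address is mapped to."""
    return (address // LINE_SIZE) % NUM_LINES


def memory_area(address):
    """Which cache-sized slice of main memory the address lies in."""
    return address // (LINE_SIZE * NUM_LINES)


# Three addresses, each one cache size (256 bytes) apart.
for address in (0x0060, 0x0160, 0x0260):
    print(hex(address), cache_line_index(address), memory_area(address))
```

All three addresses land on line 3 although they lie in different slices of main memory, which is why only one of them can be cached at a time in this arrangement.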

To illustrate how the cache works, let us assume a concrete situation in which a function FunctionA has already been loaded into a cache line (Figure 12). Obviously, the function is small enough to fit completely into a cache line. Three different cases will be considered below.

What happens if the cached function FunctionA now calls: (I) the function FunctionB; (II) the function FunctionC; or (III) the function FunctionA (i.e. recursively calls itself)?
(I) Function FunctionB is loaded into cache line 3 and thus overwrites FunctionA.
(II) Function FunctionC is loaded into cache line 4 and FunctionA remains in cache line 3.
(III) Nothing happens because FunctionA is already in cache line 3.
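The three cases can be replayed with a small model of a cache in which each address maps to exactly one line. This is a sketch under stated assumptions: the cache geometry (8 lines of 32 bytes, counted from zero) and the start addresses of the three functions are hypothetical and chosen so that FunctionA and FunctionB compete for line 3 while FunctionC maps to line 4.

```python
LINE_SIZE = 32  # bytes per cache line (assumed value)
NUM_LINES = 8   # number of lines in the cache (assumed value)

# Hypothetical start addresses of the three functions.
FUNCTION_A = 0x0060  # maps to line 3
FUNCTION_B = 0x0160  # also maps to line 3
FUNCTION_C = 0x0080  # maps to line 4


class DirectMappedCache:
    def __init__(self):
        # Tag currently held by each line; None means the line is empty.
        self.tags = [None] * NUM_LINES

    def access(self, address):
        """Return (line index, 'hit' or 'miss') and update the cache."""
        block = address // LINE_SIZE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if self.tags[index] == tag:
            return index, "hit"
        self.tags[index] = tag  # load the line, evicting its old content
        return index, "miss"


def run_case(callee):
    cache = DirectMappedCache()
    cache.access(FUNCTION_A)        # FunctionA is already cached
    result = cache.access(callee)   # FunctionA calls the callee
    a_still_cached = cache.access(FUNCTION_A)[1] == "hit"
    return result, a_still_cached


print(run_case(FUNCTION_B))  # ((3, 'miss'), False)
print(run_case(FUNCTION_C))  # ((4, 'miss'), True)
print(run_case(FUNCTION_A))  # ((3, 'hit'), True)
```

In case (I) the model also shows a consequence that follows from the eviction: when FunctionB returns, FunctionA is no longer cached and has to be loaded again.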

## Embedded Software: CSE2425

## Code Execution

Section 1.3 explained how the executable machine code is generated and that this code is a collection of machine instructions. The computational core of a microprocessor is constantly processing machine instructions. These instructions are loaded sequentially from program memory (or code memory) into the execution unit, whereupon they are decoded and then executed.

The program counter (PC) has already been mentioned, and it can be thought of as pointing to the current command in the program memory. As long as there are no jump commands or commands calling a (sub-)function, the PC is increased by one memory location once the processing of a command is complete. As a result, the PC points to the next command, which in turn is loaded into the execution unit, and then decoded and executed. The program memory is primarily a sequence of machine commands.

At this point it should be mentioned that a series of machine commands without any jump or call is referred to as a basic block. More precisely, a basic block is a series of machine instructions whose execution always starts with the first instruction, then sequentially executes all its instructions and terminates with the execution of the last instruction. The processor does not jump into, or out of, the basic block at any other point than its first or last instruction respectively. Basic blocks play an important role, amongst other things, in static code analysis, so we will return to this topic later.
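The definition of a basic block translates into a simple partitioning rule: a new block starts at the first instruction, at every jump or call target, and directly after every jump, call, or return. The sketch below applies this rule to a hypothetical nine-instruction program. The mnemonics, the tuple representation, and the function name are assumptions made for illustration and do not model a specific processor.

```python
# Instructions that end a basic block (assumed, illustrative mnemonics).
BRANCHES = {"jmp", "brne", "call", "ret"}


def split_basic_blocks(program):
    """Split a program into basic blocks.

    program: list of (mnemonic, target index or None).
    Returns a list of blocks, each a list of instruction indices.
    """
    if not program:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (mnemonic, target) in enumerate(program):
        if mnemonic in BRANCHES:
            if target is not None:
                leaders.add(target)  # a jump or call target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)   # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


program = [
    ("ldi", None),   # 0
    ("ldi", None),   # 1
    ("add", None),   # 2  loop body, target of the brne at index 4
    ("dec", None),   # 3
    ("brne", 2),     # 4
    ("call", 7),     # 5
    ("ret", None),   # 6
    ("inc", None),   # 7  start of the called function
    ("ret", None),   # 8
]
print(split_basic_blocks(program))
# [[0, 1], [2, 3, 4], [5], [6], [7, 8]]
```

Each resulting block is entered only at its first index and left only at its last, which matches the definition given above.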

The instructions provided by a processor are described in the processor’s Instruction Set Reference Manual. Knowledge of the instruction set of a processor is essential for optimizing software at the code level. Section 8.3 will cover this in detail.

How an instruction set is documented, coded, and handled will be illustrated using the example of an add instruction on the 8-bit Microchip AVR processor. Microchip AVR processors have 32 data/address registers. Their role as data register or address register depends on the instruction. Figure 5 shows an excerpt (a single page) from the instruction set reference manual for the Microchip AVR ATmega processor [3], namely the section that describes the add command with carry flag. The description in textual and operational form ($Rd \leftarrow Rd + Rr + C$) is followed by the definition of the syntax.
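To make the operation and its coding concrete, the following sketch models the add-with-carry instruction in Python. The 16-bit pattern `0001 11rd dddd rrrr` is the encoding the AVR instruction set documentation gives for `ADC Rd, Rr`. Since Figure 5 is not reproduced here, treat the pattern as quoted from memory of that manual and check it against the manual before relying on it. The function names are hypothetical, and the model covers only the result and the carry flag, not the other status flags.

```python
def encode_adc(d, r):
    """16-bit opcode of `adc Rd, Rr`, pattern 0001 11rd dddd rrrr."""
    if not (0 <= d <= 31 and 0 <= r <= 31):
        raise ValueError("register numbers must be in the range 0..31")
    return (
        0x1C00                 # fixed opcode bits 0001 11.. .... ....
        | ((r & 0x10) << 5)    # bit 4 of r goes to opcode bit 9
        | (d << 4)             # the five bits of d go to opcode bits 8..4
        | (r & 0x0F)           # the low four bits of r go to bits 3..0
    )


def execute_adc(rd, rr, carry):
    """Rd <- Rd + Rr + C on 8-bit values; returns (result, carry out)."""
    total = rd + rr + carry
    return total & 0xFF, total >> 8


print(hex(encode_adc(16, 17)))     # 0x1f01
print(execute_adc(0xF0, 0x20, 1))  # (17, 1)
```

The second call shows why the carry matters: 0xF0 + 0x20 + 1 does not fit into eight bits, so the register receives 0x11 and the carry flag is set for use by the next, higher-order byte of a multi-byte addition.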

## Memory Addressing and Addressing Modes

The addressing mode describes how the memory is accessed. Each memory access requires the address to be accessed as well as a definition of what should be done at that address: store data there, read from it, jump to it, call a subroutine located there, and so on.

For runtime optimization at the code level, it is essential to know the addressing modes of the respective processor. Most processor architecture manuals (often part of the instruction reference manual) have a section that describes the available addressing modes in detail.

As the previous Section 2.2 showed, the opcode defines what should happen, such as ‘continue program execution at address x’ (a jump command), or ‘load the contents of address y into working register d4’. The address at which the action should occur is passed as a parameter. On a 32-bit processor, the address bus has a width of 32 bits. Almost all processors are designed so that there is one byte of memory for each address. Thus $2^{32} = 4,294,967,296$ single bytes can be addressed, which corresponds to 4 gigabytes. Strictly speaking, according to the IEC [4] it should be called 4 gibibytes because the prefix giga stands for $10^9$ and not for $2^{30}$. In practice, however, the prefixes kibi ($2^{10}$), mebi ($2^{20}$), gibi ($2^{30}$), tebi ($2^{40}$) etc., which are based on powers of two, are hardly ever used. For this reason, we will also talk about kilobytes and megabytes in the following when referring to $2^{10}$ or $2^{20}$ bytes respectively.
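The difference between the two prefix systems is easy to check numerically. The following lines are a small sketch of the arithmetic above; the constant names are chosen here for illustration.

```python
ADDRESS_BITS = 32
GIBIBYTE = 2 ** 30   # binary prefix (IEC)
GIGABYTE = 10 ** 9   # decimal prefix (SI)

addressable_bytes = 2 ** ADDRESS_BITS  # one byte per address

print(addressable_bytes)             # 4294967296
print(addressable_bytes / GIBIBYTE)  # exactly 4 gibibytes
print(addressable_bytes / GIGABYTE)  # roughly 4.29 gigabytes in the SI sense
```

The address space is therefore exactly 4 gibibytes but about 7 % more than 4 gigabytes in the strict decimal sense.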

But back to the 4-gigabyte address space. Most embedded systems, even those with 32-bit processors, have much smaller quantities of memory, typically ranging from a few kilobytes to a few megabytes.

If 32-bit addresses were always used this would be very inefficient as, for each memory access, the opcode as well as the full 32-bit address would have to be loaded. For this reason, all processors offer a range of addressing modes in addition to far addressing, the name given to the use of the full address bus width.

It is difficult to describe all existing types of addressing comprehensively and it is not useful at this point. Instead, some examples will be picked out for certain processors that differ in their implementation from the description here or have special features. Additionally, processor manufacturers have come up with a number of special addressing types that are not discussed here. For the following explanation of addressing types a fictitious 16-bit processor is used. It has 64 kilobytes of program memory (flash) and 64 kilobytes of data memory (RAM). It also has eight data registers R0 … R7 and eight address registers A0 … A7. With each clock cycle, the CPU reads one word, i.e. 16 bits, from the program memory.

## Embedded Software: CSCl1600

## Phase Driven Process Model: The V-Model

The V-model describes an approach to software development. It has been used for decades in the automotive industry and is usually still present, at least in the background, when newer concepts such as Scrum are applied. Like so many technical developments it has its origin in the military sector. Later, it was transferred to the civilian sector and was adapted to new development requirements in the versions V-Model 97 and V-Model XT [1].

The ‘V’ of the V-model represents the idealized course of development in a coordinate system with two axes. The horizontal axis is a time axis with the left side marking the project start. The vertical axis describes the level of abstraction, from ‘detailed’ at the bottom to ‘abstract’ at the top (Figure 1). A project should start at a high level of abstraction with a collection of user or customer requirements for the product. This is followed by the basic design of the product at system level. Over the course of the project the design is then broken down, refined, and improved. Additional, more detailed requirements may also emerge later. Once the design phase is complete, the implementation phase begins. With respect to a software project, this corresponds to the coding. Individual components are brought together at the integration phase, and this is followed by the verification, checking the fulfillment of the requirements at the different levels of abstraction. The final phase of validation takes place at the highest level of abstraction by ensuring that the user or customer requirements are met.

If a requirement is not fulfilled, the cause of the deviation must be eliminated. The cause is inevitably somewhere on the V between the requirement and its verification. As a result, all subsequent dependent steps must also be corrected, adapted, or at least repeated.

It is clear that the effort and associated cost of resolving an issue grow the later that issue is discovered. This reads like a truism, but it is astonishing how many projects completely neglect embedded software timing. Far too often, runtime problems are investigated in a late project phase, in a rush and at high cost and risk, only for them to be temporarily corrected or mitigated.

## Linker Script

The linker script (or linker control file) also plays a very important role. Strictly speaking, it should be called the ‘locator script’ or ‘locator control file’ but, as mentioned earlier, most vendors combine the locator into the linker.

Listing 5 shows an excerpt of the linker script of an 8-bit microcontroller, the Microchip AVR ATmega32, which has 32 KB of flash, 2 KB of RAM, and 1 KB of EEPROM.

The linker script tells the locator how to distribute the symbols across the different memory regions of the microcontroller. This is usually done as follows. First, in the C or assembler source code, all symbols are assigned to a specific section or, more precisely, a specific input section. This assignment is made implicitly if the programmer has not made it explicitly. A number of section names have become commonplace as default sections.

It is also an unwritten rule that section names begin with a dot.
In the step that follows, the instructions in the linker script assign all input sections to output sections which, in turn, are finally mapped to the available memory. For a better understanding, the .text sections in Listing 5 have corresponding comments.

In a classic linker script, as in Listing 5, the definitions of the available memory are at the beginning. These are followed by the definitions of the output sections and, with each of these definitions, the link to the inputs along with the allocation to a memory region.

A very good description of the syntax of this linker script and the underlying concepts can be found in the GNU linker manual [2]. Most other tool vendors have at least adopted the concepts of the GNU linker (ld) for their linkers, often copying the entire syntax of the linker script.
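The two-step mapping performed by the locator (input sections to output sections, output sections to memory regions) can be modeled in a few lines. The sketch below is not linker script syntax and does not reproduce Listing 5. The region origins, the section table, the symbol list, and the function name `locate` are all assumptions made for illustration; only the region sizes follow the ATmega32 figures given above. The model also ignores details such as alignment and the flash copy of initialized data.

```python
# Available memory, as a linker script would define it first.
# Origins are assumed values; lengths follow the ATmega32 figures.
MEMORY = {
    "flash": {"origin": 0x0000, "length": 32 * 1024},
    "ram": {"origin": 0x0060, "length": 2 * 1024},
}

# (output section, input sections collected into it, memory region)
OUTPUT_SECTIONS = [
    (".text", [".text"], "flash"),
    (".data", [".data"], "ram"),
    (".bss", [".bss"], "ram"),
]


def locate(symbols):
    """Assign an address to each symbol.

    symbols: list of (name, input section, size in bytes).
    Returns a dict mapping symbol name to address.
    """
    cursor = {name: region["origin"] for name, region in MEMORY.items()}
    addresses = {}
    for _output, inputs, region in OUTPUT_SECTIONS:
        limit = MEMORY[region]["origin"] + MEMORY[region]["length"]
        for name, section, size in symbols:
            if section not in inputs:
                continue
            if cursor[region] + size > limit:
                raise ValueError(f"region '{region}' overflows at '{name}'")
            addresses[name] = cursor[region]
            cursor[region] += size
    return addresses


# Hypothetical symbols with their input sections and sizes.
symbols = [
    ("main", ".text", 120),
    ("counter", ".bss", 2),
    ("table", ".data", 16),
    ("isr", ".text", 40),
]
for name, address in locate(symbols).items():
    print(f"{name:8s} 0x{address:04X}")
```

The order of the output sections, not the order of the symbols in the source, determines where each symbol ends up. This is the main lever a linker script offers when code or data needs to be placed in a particular memory.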
