### Transformations

## Linear Transformations

Figure $2.1$ presented an example of an $\mathbb{R} \rightarrow \mathbb{R}$ linear transformation. More generally, an $n \times n$ square matrix can be employed to perform an $\mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ linear transformation through multiplication. Figures 2.5a-c illustrate how a matrix $\mathbf{A}$ transforms a space $\mathbf{x}$ into another space $\mathbf{x}^{\prime}$ using the matrix product $\mathbf{x}^{\prime}=\mathbf{A x}$. The deformation of the circle and of the underlying grid (see (a)) shows the effect of various transformations. Note that the terms on the main diagonal of $\mathbf{A}$ control the transformations along the $x_{1}^{\prime}$ and $x_{2}^{\prime}$ axes, and the off-diagonal terms control the transformation dependency between both axes (see, for example, figure 2.6).
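As a minimal sketch of the product $\mathbf{x}^{\prime}=\mathbf{A x}$, the NumPy snippet below applies a $2 \times 2$ matrix to points sampled on the unit circle. The matrix values and the names `A`, `circle`, `transformed`, and `e1_image` are illustrative assumptions; they are not the values used in figures 2.5 or 2.6.

```python
import numpy as np

# Illustrative 2x2 transformation (assumed values, not taken from the figures):
# diagonal terms scale each axis, off-diagonal terms couple the two axes.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Points on the unit circle, stored as the columns of a 2 x 100 array.
theta = np.linspace(0.0, 2.0 * np.pi, 100)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# x' = A x applied to every column at once.
transformed = A @ circle

# The basis vector [1, 0] is mapped to the first column of A.
e1_image = A @ np.array([1.0, 0.0])
print(e1_image)
```

Setting the off-diagonal terms to zero in this sketch stretches the circle along the axes only, which is one way to see the role of each group of terms.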

The determinant of a square matrix $\mathbf{A}$ measures how much the transformation contracts or expands the space:

• $\operatorname{det}(\mathbf{A})=1$ : preserves the space/volume
• $\operatorname{det}(\mathbf{A})=0$ : collapses the space/volume along a subset of dimensions, for example, 2-D space $\rightarrow$ 1-D space (see figure $2.7$ )

In the examples presented in figure $2.5 \mathrm{a}-\mathrm{c}$, the determinant quantifies how much the area/volume is changed in the transformed space; for the circle, it corresponds to the change of area caused by the transformation. As shown in figure $2.5 \mathrm{a}$, if $\mathbf{A}=\mathbf{I}$, the transformation has no effect so $\operatorname{det}(\mathbf{A})=1$. For a square matrix $[\mathbf{A}]_{n \times n}$, $\operatorname{det}(\mathbf{A}): \mathbb{R}^{n \times n} \rightarrow \mathbb{R}$.
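The link between the determinant and the change of area can be checked numerically. The sketch below uses a unit square rather than the circle of figure 2.5, because the area of its image is easy to compute with the shoelace formula. The helper `polygon_area` and the matrices `A` and `A_singular` are hypothetical names and values chosen for illustration.

```python
import numpy as np

def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Unit square with vertices listed counterclockwise, one vertex per row.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # assumed example matrix
A_singular = np.array([[1.0, 2.0],
                       [0.5, 1.0]])  # second row is half the first row

for M in (np.eye(2), A, A_singular):
    image = square @ M.T             # each row x becomes (M x)^T
    # The area of the image should match |det(M)| times the original area (1).
    print(np.linalg.det(M), polygon_area(image))
```

For the identity matrix the area is unchanged, and for the singular matrix the square collapses onto a line, so its area drops to zero.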

## Eigen Decomposition

Linear transformations operate on several dimensions, such as in the case presented in figure $2.6$, where the transformation introduces dependency between variables. Eigen decomposition enables finding a linear transformation that removes the dependency while preserving the area/volume. A square matrix $[\mathbf{A}]_{n \times n}$ can be decomposed into eigenvectors $\left\{\boldsymbol{\nu}_{1}, \cdots, \boldsymbol{\nu}_{n}\right\}$ and eigenvalues $\left\{\lambda_{1}, \cdots, \lambda_{n}\right\}$. In its matrix form,
$$\mathbf{A}=\mathbf{V} \operatorname{diag}(\boldsymbol{\lambda}) \mathbf{V}^{-1}$$
where
$$\begin{aligned} &\mathbf{V}=\left[\begin{array}{lll} \boldsymbol{\nu}_{1} & \cdots & \boldsymbol{\nu}_{n} \end{array}\right] \\ &\boldsymbol{\lambda}=\left[\begin{array}{lll} \lambda_{1} & \cdots & \lambda_{n} \end{array}\right]^{\top}. \end{aligned}$$
Figure $2.6$ presents the eigen decomposition of the transformation $\mathbf{x}^{\prime}=\mathbf{A x}$. Eigenvectors $\boldsymbol{\nu}_{1}$ and $\boldsymbol{\nu}_{2}$ define the new reference frame in which the transformation is applied independently along each axis. Eigenvalues $\lambda_{1}$ and $\lambda_{2}$ describe the transformation magnitude along each eigenvector.
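A short numerical check of the decomposition $\mathbf{A}=\mathbf{V} \operatorname{diag}(\boldsymbol{\lambda}) \mathbf{V}^{-1}$ can be sketched with `numpy.linalg.eig`. The matrix `A` below is an assumed example, not the one in figure 2.6, and the variable names `lam`, `V`, and `A_rebuilt` are chosen here for illustration.

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # assumed example matrix

# Columns of V are the eigenvectors; lam holds the matching eigenvalues.
lam, V = np.linalg.eig(A)

# Reconstruct A = V diag(lambda) V^{-1}.
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))

# Along each eigenvector the transformation is a pure scaling:
# A v_i = lambda_i v_i.
for i in range(len(lam)):
    print(np.allclose(A @ V[:, i], lam[i] * V[:, i]))
```

For this matrix the eigenvalues are $(3 \pm \sqrt{2})/2$, which follows from its trace of 3 and determinant of 1.75.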

A matrix is positive definite if all its eigenvalues are $>0$, and positive semidefinite (PSD) if all its eigenvalues are $\geq 0$. The determinant of a matrix corresponds to the product of its eigenvalues. Therefore, when one eigenvalue equals zero, the determinant is zero, which indicates that two or more dimensions are linearly dependent and have collapsed into a single one. The transformation matrix is then said to be singular. Figure $2.7$ presents an example of a nearly singular transformation. For a positive semidefinite matrix $\mathbf{A}$ and for any vector $\mathbf{x}$, the following relation holds:
$$\mathbf{x}^{\top} \mathbf{A} \mathbf{x} \geq 0$$
This property is employed in $\S 3.3 .5$ to define the requirements for an admissible covariance matrix.
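These properties can be verified numerically. The sketch below assumes a symmetric matrix, as is the case for covariance matrices, and builds one as $\mathbf{B}^{\top}\mathbf{B}$, a standard construction that is guaranteed to be PSD. The names `B`, `lam`, `tol`, and `quad_forms`, the random seed, and the tolerance value are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric PSD matrix as B^T B (assumed construction, for illustration).
B = rng.normal(size=(3, 3))
A = B.T @ B

# eigvalsh is intended for symmetric matrices and returns real eigenvalues.
lam = np.linalg.eigvalsh(A)
tol = 1e-10  # small tolerance for floating-point round-off
print("PSD:", bool(np.all(lam >= -tol)))

# The determinant matches the product of the eigenvalues.
print(bool(np.isclose(np.linalg.det(A), np.prod(lam))))

# x^T A x >= 0 for a batch of arbitrary vectors x (one vector per row of X).
X = rng.normal(size=(1000, 3))
quad_forms = np.einsum("ij,jk,ik->i", X, A, X)
print(bool(np.all(quad_forms >= -tol)))
```

Testing random vectors does not prove that a matrix is PSD; the eigenvalue check is the conclusive one, and the quadratic forms only illustrate the relation.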
A more exhaustive review of linear algebra can be found in dedicated textbooks such as the one by Kreyszig. ${ }^{1}$

## Probability Theory

The interpretation of probability theory employed in this book follows Laplace’s view of “common sense reduced to calculus.” It means that probabilities describe our state of knowledge rather than intrinsically aleatory phenomena. In practice, few phenomena are actually intrinsically unpredictable. Take, for example, a coin as displayed in figure 3.1. Whether a coin toss results in heads or tails has nothing to do with an inherently aleatory process. The outcome appears unpredictable because of the lack of knowledge about the coin’s initial position, speed, and acceleration. If we could gather information about the coin’s initial kinematic conditions, the outcome would become predictable. Devices that can throw coins with repeatable initial kinematic conditions will lead to repeatable outcomes.

Figure $3.2$ presents another example where we consider the elastic modulus ${ }^{1} E$ at one specific location in a dam. Notwithstanding long-term effects such as creep, ${ }^{2}$ at any given location, $E$ does not vary with time: $E$ is a deterministic, yet unknown constant. Probability is employed here as a tool to describe our incomplete knowledge of that constant.
There are two types of uncertainty: aleatory and epistemic. Aleatory uncertainty is characterized by its irreducibility: no information can either reduce or alter it. In contrast, epistemic uncertainty refers to a lack of knowledge that can be altered by new information. In an engineering context, aleatory uncertainties arise when we are concerned with future realizations that have yet to occur. Epistemic uncertainty applies to any other case dealing with deterministic, yet unknown quantities.

This book approaches machine learning using probability theory because in many practical engineering problems, the number of observations available is limited, from a few to a few thousand. In such a context, the amount of information available is typically insufficient to eliminate epistemic uncertainties. When large data sets are available, probabilistic and deterministic methods may lead to indistinguishable results; the opposite occurs when little data is available. Therefore, the less we know about a problem, the stronger the argument for approaching it using probability theory.
In this chapter, a review of set theory lays the foundation for probability theory, where the central part is the concept of random variables. Machine learning methods are built from an ensemble of functions organized in a clever way. Therefore, the last part of this chapter looks at what happens when random variables are introduced into deterministic functions.
For specific notions related to probability theory that are outside the scope of this chapter, the reader should refer to dedicated textbooks such as those by Box and Tiao${ }^{3}$ and by Ang and Tang.${ }^{4}$


