### Linear Functions

Figure $3.21$b illustrates how the function $y=2x$ transforms a random variable $X$ with mean $\mu_{X}=1$ and standard deviation $\sigma_{X}=0.5$ into $Y$ with mean $\mu_{Y}=2$ and standard deviation $\sigma_{Y}=1$. In the machine learning context, it is common to employ linear functions of random variables, $y=g(x)=a x+b$, as illustrated in figure 3.21a. Given a random variable $X$ with mean $\mu_{X}$ and variance $\sigma_{X}^{2}$, the change in the neighborhood size simplifies to
$$\left|\frac{d y}{d x}\right|=|a| .$$
In such a case, because of the linear property of the expectation operation (see $\S 3.3.5$),
$$\mu_{Y}=g\left(\mu_{X}\right)=a \mu_{X}+b, \quad \sigma_{Y}=|a| \sigma_{X} .$$
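The two relations above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `linear_moments` is hypothetical, and the Monte Carlo check assumes $X$ is Normal, which the formulas themselves do not require.

```python
import random
import statistics


def linear_moments(mu_x, sigma_x, a, b):
    """Mean and standard deviation of Y = a*X + b (hypothetical helper)."""
    return a * mu_x + b, abs(a) * sigma_x


# Values from the text: y = 2x, mu_X = 1, sigma_X = 0.5
mu_y, sigma_y = linear_moments(mu_x=1.0, sigma_x=0.5, a=2.0, b=0.0)

# Monte Carlo check; sampling X from a Normal is an assumption made here
rng = random.Random(0)
samples_y = [2.0 * rng.gauss(1.0, 0.5) for _ in range(100_000)]
mc_mean = statistics.fmean(samples_y)
mc_std = statistics.pstdev(samples_y)

print(mu_y, sigma_y)    # 2.0 1.0
print(mc_mean, mc_std)  # sample estimates, close to the analytic values
```

Note the absolute value on $a$: a negative slope flips the sign of the mean contribution but leaves the standard deviation positive.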

Let us consider a set of $n$ random variables $\mathbf{X}$ defined by its mean vector and covariance matrix,
$$\mathbf{X}=\left[\begin{array}{c} X_{1} \\ \vdots \\ X_{n} \end{array}\right], \quad \boldsymbol{\mu}_{\mathbf{X}}=\left[\begin{array}{c} \mu_{X_{1}} \\ \vdots \\ \mu_{X_{n}} \end{array}\right], \quad \boldsymbol{\Sigma}_{\mathbf{X}}=\left[\begin{array}{ccc} \sigma_{X_{1}}^{2} & \cdots & \rho_{1 n} \sigma_{X_{1}} \sigma_{X_{n}} \\ & \ddots & \vdots \\ \text{sym.} & & \sigma_{X_{n}}^{2} \end{array}\right]$$
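and the variables $\mathbf{Y}=\left[\begin{array}{llll}Y_{1} & Y_{2} & \cdots & Y_{n}\end{array}\right]^{\top}$ obtained from a linear function $\mathbf{Y}=\mathbf{g}(\mathbf{X})=\mathbf{A} \mathbf{X}+\mathbf{b}$ so that
$$\underbrace{[\;]_{n \times 1}}_{\mathbf{Y}}=\underbrace{[\;]_{n \times n}}_{\mathbf{A}=\mathbf{J}_{\mathbf{Y}, \mathbf{X}}} \times \underbrace{[\;]_{n \times 1}}_{\mathbf{X}}+\underbrace{[\;]_{n \times 1}}_{\mathbf{b}} .$$
The function outputs $\mathbf{Y}$ are then described by the mean vector, the covariance matrix, and the joint covariance with $\mathbf{X}$,
$$\begin{aligned} \boldsymbol{\mu}_{\mathbf{Y}} &=\mathbf{g}\left(\boldsymbol{\mu}_{\mathbf{X}}\right)=\mathbf{A} \boldsymbol{\mu}_{\mathbf{X}}+\mathbf{b} \\ \boldsymbol{\Sigma}_{\mathbf{Y}} &=\mathbf{A} \boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{A}^{\top} \\ \boldsymbol{\Sigma}_{\mathbf{X Y}} &=\boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{A}^{\top} . \end{aligned}$$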
If instead of having an $n \rightarrow n$ function, we have an $n \rightarrow 1$ function $y=g(\mathbf{X})=\mathbf{a}^{\top} \mathbf{X}+b$, then the Jacobian simplifies to the gradient vector $\nabla g(\mathbf{x})=\left[\begin{array}{ccc}\frac{\partial g(\mathbf{x})}{\partial x_{1}} & \cdots & \frac{\partial g(\mathbf{x})}{\partial x_{n}}\end{array}\right]$, which is again equal to the vector $\mathbf{a}^{\top}$,
$$\underbrace{[\;]_{1 \times 1}}_{Y}=\underbrace{[\;]_{1 \times n}}_{\mathbf{a}^{\top}=\nabla g(\mathbf{x})} \times \underbrace{[\;]_{n \times 1}}_{\mathbf{X}}+\underbrace{[\;]_{1 \times 1}}_{b} .$$
The function output $Y$ is then described by
$$\begin{aligned} \mu_{Y} &=g\left(\boldsymbol{\mu}_{\mathbf{X}}\right)=\mathbf{a}^{\top} \boldsymbol{\mu}_{\mathbf{X}}+b \\ \sigma_{Y}^{2} &=\mathbf{a}^{\top} \boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{a} . \end{aligned}$$
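As an illustration of the multivariate linear case, the sketch below propagates a mean vector and covariance matrix through $\mathbf{Y}=\mathbf{A}\mathbf{X}+\mathbf{b}$ using the standard identities $\boldsymbol{\Sigma}_{\mathbf{Y}}=\mathbf{A}\boldsymbol{\Sigma}_{\mathbf{X}}\mathbf{A}^{\top}$ and $\boldsymbol{\Sigma}_{\mathbf{XY}}=\boldsymbol{\Sigma}_{\mathbf{X}}\mathbf{A}^{\top}$. The function names and the numerical values are hypothetical and chosen only for the example; matrices are stored as plain lists of rows to keep the block dependency-free.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def linear_transform_moments(A, b, mu_x, cov_x):
    """Mean of Y = A X + b, covariance of Y, and cross-covariance of X and Y."""
    mu_y = [sum(a_ij * m for a_ij, m in zip(row, mu_x)) + b_i
            for row, b_i in zip(A, b)]
    cov_xy = matmul(cov_x, transpose(A))  # Sigma_XY = Sigma_X A^T
    cov_y = matmul(A, cov_xy)             # Sigma_Y  = A Sigma_X A^T
    return mu_y, cov_y, cov_xy


# Hypothetical two-variable example: Y1 = X1 + X2, Y2 = X1 - X2 + 3
mu_x = [1.0, 2.0]
cov_x = [[4.0, 1.0],
         [1.0, 9.0]]
A = [[1.0, 1.0],
     [1.0, -1.0]]
b = [0.0, 3.0]

mu_y, cov_y, cov_xy = linear_transform_moments(A, b, mu_x, cov_x)
print(mu_y)   # [3.0, 2.0]
print(cov_y)  # [[15.0, -5.0], [-5.0, 11.0]]

# The n -> 1 case is the same computation with a single-row A = a^T
mu_s, var_s, _ = linear_transform_moments([[1.0, 1.0]], [0.0], mu_x, cov_x)
print(mu_s, var_s)  # [3.0] [[15.0]]
```

The diagonal of `cov_y` matches the familiar results $\operatorname{var}(X_1 \pm X_2)=\sigma_{X_1}^2+\sigma_{X_2}^2 \pm 2\operatorname{cov}(X_1,X_2)$, which is a quick sanity check on the matrix form.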

## Linearization of Nonlinear Functions

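Because of the analytic simplicity associated with linear functions of random variables, it is common to approximate nonlinear functions by linear ones using a Taylor series so that, for a function of a single variable expanded about $\mu_{X}$,
$$g(X) \approx g\left(\mu_{X}\right)+\left[\frac{d g(x)}{d x}\right]_{x=\mu_{X}}\left(X-\mu_{X}\right)+\frac{1}{2}\left[\frac{d^{2} g(x)}{d x^{2}}\right]_{x=\mu_{X}}\left(X-\mu_{X}\right)^{2}+\cdots$$

In practice, the series is most often limited to the first-order approximation, so for a one-to-one function, it simplifies to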
$$Y=g(X) \approx a X+b$$
Figure $3.22$ presents an example of such a linear approximation for a one-to-one transformation. Linearizing at the expected value $\mu_{X}$ minimizes the approximation errors because the linearization is then centered in the region associated with a high probability content for $f_{X}(x)$. In that case, $a$ corresponds to the gradient of $g(x)$ evaluated at $\mu_{X}$, and $b=g\left(\mu_{X}\right)-a \mu_{X}$,
$$a=\left[\frac{d g(x)}{d x}\right]_{x=\mu_{X}} .$$
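To make the first-order approximation concrete, here is a small sketch that linearizes $g(x)=x^{2}$ at $\mu_{X}$ and compares the approximate moments with the exact ones. The function `linearize`, the choice of $g$, and the numerical values are assumptions made for illustration; the intercept uses $b=g(\mu_{X})-a\mu_{X}$, which follows from matching the line to $g$ at $\mu_{X}$, and the exact moments assume $X$ is Normal.

```python
import math


def linearize(g, dg, mu_x):
    """First-order Taylor coefficients (a, b) of g at mu_x, so g(x) ~ a*x + b."""
    a = dg(mu_x)               # slope: derivative evaluated at the mean
    b = g(mu_x) - a * mu_x     # intercept: the line passes through (mu_x, g(mu_x))
    return a, b


def g(x):
    return x ** 2


def dg(x):
    return 2 * x


mu_x, sigma_x = 2.0, 0.1
a, b = linearize(g, dg, mu_x)

# Moments of the linearized function
mu_y_lin = a * mu_x + b
sigma_y_lin = abs(a) * sigma_x

# Exact moments of X**2 when X is Normal (assumption of this example)
mu_y_exact = mu_x ** 2 + sigma_x ** 2
sigma_y_exact = math.sqrt(4 * mu_x ** 2 * sigma_x ** 2 + 2 * sigma_x ** 4)

print(a, b)                        # 4.0 -4.0
print(mu_y_lin, mu_y_exact)        # linearized mean vs. exact mean
print(sigma_y_lin, sigma_y_exact)  # linearized std vs. exact std
```

With a small $\sigma_{X}$ relative to the curvature of $g$, the two sets of moments nearly coincide; increasing `sigma_x` widens the gap, which is the approximation error the text refers to.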
For the $n \rightarrow 1$ multivariate case, the linearized transformation leads to
$$\begin{aligned} Y=g(\mathbf{X}) & \approx \mathbf{a}^{\top} \mathbf{X}+b \\ &=\nabla g\left(\boldsymbol{\mu}_{\mathbf{X}}\right)\left(\mathbf{X}-\boldsymbol{\mu}_{\mathbf{X}}\right)+g\left(\boldsymbol{\mu}_{\mathbf{X}}\right), \end{aligned}$$
where $Y$ has a mean and variance equal to
$$\begin{aligned} \mu_{Y} & \approx g\left(\boldsymbol{\mu}_{\mathbf{X}}\right) \\ \sigma_{Y}^{2} & \approx \nabla g\left(\boldsymbol{\mu}_{\mathbf{X}}\right) \boldsymbol{\Sigma}_{\mathbf{X}} \nabla g\left(\boldsymbol{\mu}_{\mathbf{X}}\right)^{\top} . \end{aligned}$$

For the $n \rightarrow n$ multivariate case, the linearized transformation leads to
$$\begin{aligned} \mathbf{Y}=\mathbf{g}(\mathbf{X}) & \approx \mathbf{A} \mathbf{X}+\mathbf{b} \\ &=\mathbf{J}_{\mathbf{Y}, \mathbf{X}}\left(\boldsymbol{\mu}_{\mathbf{X}}\right)\left(\mathbf{X}-\boldsymbol{\mu}_{\mathbf{X}}\right)+\mathbf{g}\left(\boldsymbol{\mu}_{\mathbf{X}}\right), \end{aligned}$$
where $\mathbf{Y}$ is described by the mean vector and covariance matrix,
$$\begin{aligned} \boldsymbol{\mu}_{\mathbf{Y}} & \cong \mathbf{g}\left(\boldsymbol{\mu}_{\mathbf{X}}\right) \\ \boldsymbol{\Sigma}_{\mathbf{Y}} & \cong \mathbf{J}_{\mathbf{Y}, \mathbf{X}}\left(\boldsymbol{\mu}_{\mathbf{X}}\right) \boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{J}_{\mathbf{Y}, \mathbf{X}}^{\top}\left(\boldsymbol{\mu}_{\mathbf{X}}\right) . \end{aligned}$$

For multivariate nonlinear functions, the gradient or Jacobian is evaluated at the expected value $\boldsymbol{\mu}_{\mathbf{X}}$.
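The multivariate linearization can be sketched in the same way. In the example below, the Jacobian is estimated by central finite differences at the mean, which is a substitution made here for convenience rather than something the text prescribes; the function names and the polar-to-Cartesian test function are likewise hypothetical choices.

```python
import math


def numerical_jacobian(g, x, h=1e-6):
    """Central-difference Jacobian of g at x (rows: outputs, columns: inputs)."""
    n_out = len(g(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_hi = list(x)
        x_lo = list(x)
        x_hi[j] += h
        x_lo[j] -= h
        g_hi, g_lo = g(x_hi), g(x_lo)
        for i in range(n_out):
            J[i][j] = (g_hi[i] - g_lo[i]) / (2 * h)
    return J


def propagate(g, mu_x, cov_x):
    """First-order approximation of the mean vector and covariance of Y = g(X)."""
    J = numerical_jacobian(g, mu_x)
    n, m = len(mu_x), len(J)
    # J Sigma_X
    JS = [[sum(J[i][k] * cov_x[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]
    # (J Sigma_X) J^T
    cov_y = [[sum(JS[i][k] * J[j][k] for k in range(n)) for j in range(m)]
             for i in range(m)]
    return g(mu_x), cov_y


def polar_to_cartesian(x):
    """Hypothetical nonlinear 2 -> 2 function used only for this example."""
    r, theta = x
    return [r * math.cos(theta), r * math.sin(theta)]


mu_x = [2.0, 0.0]                 # mean radius and mean angle
cov_x = [[0.01, 0.0],             # sigma_r = 0.1
         [0.0, 0.0025]]           # sigma_theta = 0.05
mu_y, cov_y = propagate(polar_to_cartesian, mu_x, cov_x)
print(mu_y)   # approximate mean vector g(mu_X)
print(cov_y)  # approximate covariance J Sigma_X J^T
```

At this operating point the Jacobian is close to $\operatorname{diag}(1, 2)$, so both output variances come out near $0.01$. An $n \rightarrow 1$ function can be handled by the same code if `g` returns a one-element list, in which case the Jacobian reduces to the gradient row vector.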

## Normal Distribution

The definition of probability distributions $f_{X}(x)$ was left aside in chapter 3. This chapter presents the formulation and properties of the probability distributions employed in this book: the Normal distribution for $x \in \mathbb{R}$, the log-normal for $x \in \mathbb{R}^{+}$, and the Beta for $x \in(0,1)$.

The most widely employed probability distribution is the Normal, also known as the Gaussian, distribution. In this book, the names Gaussian and Normal are employed interchangeably when describing a probability distribution. This section covers the mathematical foundation for the univariate and multivariate Normal and then details the properties explaining its widespread usage.


