### Probability Distributions

## Univariate Normal

The probability density function (PDF) for a Normal random variable is defined over the real numbers, $x \in \mathbb{R}$. The random variable $X \sim \mathcal{N}\left(x ; \mu, \sigma^{2}\right)$ is parameterized by its mean $\mu$ and variance $\sigma^{2}$, so its PDF is
$$f_{X}(x)=\mathcal{N}\left(x ; \mu, \sigma^{2}\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)$$
Figure 4.1 presents an example of the PDF and cumulative distribution function (CDF) with parameters $\mu=0$ and $\sigma=1$. The mode, that is, the most likely value, corresponds to the mean. Changing the mean $\mu$ causes a translation of the distribution. Increasing the standard deviation $\sigma$ causes a proportional increase in the PDF's dispersion. The Normal CDF is presented in figure 4.1b. Its formulation is obtained through integration, where the integral can be formulated using the error function $\operatorname{erf}(\cdot)$,
$$\begin{aligned} F_{X}(x) &=\int_{-\infty}^{x} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2}\left(\frac{x^{\prime}-\mu}{\sigma}\right)^{2}\right) d x^{\prime} \\ &=\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x-\mu}{\sigma \sqrt{2}}\right)\right) \end{aligned}$$
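The PDF and the erf-based CDF above translate almost line for line into code. The following is a minimal sketch using only Python's standard `math` module; the function names `normal_pdf` and `normal_cdf` are illustrative choices, not names taken from the text.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Univariate Normal PDF with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # center on the mean, scale by the standard deviation
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Univariate Normal CDF written with the error function erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Evaluate both functions for the standard Normal (mu = 0, sigma = 1).
for x in (-1.0, 0.0, 1.0):
    print(x, normal_pdf(x), normal_cdf(x))
```

With $\mu=0$ and $\sigma=1$, the CDF evaluated at the mean returns 0.5, as expected for a distribution that is symmetric about its mean.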
Figure 4.2 illustrates the successive steps taken to construct the univariate Normal PDF. Within the innermost parenthesis of the PDF formulation is a linear function $\frac{x-\mu}{\sigma}$, which centers $x$ on the mean $\mu$ and normalizes it with the standard deviation $\sigma$. This first term is then squared, leading to a positive number over all its domain except at the mean, where it is equal to zero. Taking the exponential of $-\frac{1}{2}$ times this second term leads to a bell-shaped curve, where the value equals one $(\exp (0)=1)$ at the mean $x=\mu$ and where there are inflexion points at $\mu \pm \sigma$. At this step, the curve is proportional to the final Normal PDF. Only the normalization constant is missing to ensure that $\int_{-\infty}^{\infty} f(x) d x=1$. The normalization constant is obtained by integrating the exponential term,
$$\int_{-\infty}^{+\infty} \exp \left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right) d x=\sqrt{2 \pi} \sigma$$
Dividing the exponential term by the normalization constant in equation 4.1 results in the final formulation for the Normal PDF. Note that at $x=\mu$, $f(\mu)=\frac{1}{\sqrt{2 \pi} \sigma}$, which is generally not equal to one, because the PDF has been normalized so that its integral is one.
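The normalization step can be checked numerically. The sketch below integrates the unnormalized bell curve with a simple trapezoid rule and compares the result with $\sqrt{2\pi}\sigma$. The helper names and the values $\mu=1$, $\sigma=2$ are assumptions chosen for illustration, and the integration range of $\pm 10\sigma$ is a practical stand-in for $\pm\infty$.

```python
import math


def unnormalized_bell(x, mu, sigma):
    """Exponential term of the Normal PDF, before normalization."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


mu, sigma = 1.0, 2.0  # illustrative values, not taken from the text

# Area under the unnormalized curve; the tails beyond 10 sigma are negligible.
area = trapezoid(lambda x: unnormalized_bell(x, mu, sigma),
                 mu - 10.0 * sigma, mu + 10.0 * sigma, 20000)
constant = math.sqrt(2.0 * math.pi) * sigma

# Value of the normalized PDF at the mean: 1 / (sqrt(2 pi) sigma), not 1.
peak = unnormalized_bell(mu, mu, sigma) / constant

print(area, constant, peak)
```

The unnormalized curve equals one at the mean, while the normalized PDF there equals $1/(\sqrt{2\pi}\sigma)$, which depends on $\sigma$.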

## Multivariate Normal

The joint probability density function (PDF) for two Normal random variables $\left\{X_{1}, X_{2}\right\}$ is given by
$$f_{X_{1} X_{2}}\left(x_{1}, x_{2}\right)=\frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2\left(1-\rho^{2}\right)}\left(\left(\frac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}+\left(\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}-2 \rho\left(\frac{x_{1}-\mu_{1}}{\sigma_{1}}\right)\left(\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right)\right)\right)$$
There are three terms within the parentheses inside the exponential. The first two are analogous to the quadratic terms for the univariate case. The third one includes a new parameter $\rho$ describing the correlation coefficient between $X_{1}$ and $X_{2}$. Together, these three terms describe the equation of a 2-D ellipse centered at $\left[\mu_{1}\ \mu_{2}\right]^{\top}$.

More generally, a set of $n$ Normal random variables $\mathbf{X}=\left[\begin{array}{llll}X_{1} & X_{2} & \cdots & X_{n}\end{array}\right]^{\top}$ is described by $\mathbf{x} \in \mathbb{R}^{n}: \mathbf{X} \sim \mathcal{N}\left(\mathbf{x} ; \boldsymbol{\mu}_{\mathbf{X}}, \boldsymbol{\Sigma}_{\mathbf{X}}\right)$, where $\boldsymbol{\mu}_{\mathbf{X}}=\left[\mu_{1}\ \mu_{2}\ \cdots\ \mu_{n}\right]^{\top}$ is a vector containing mean values and $\boldsymbol{\Sigma}_{\mathbf{X}}$ is the covariance matrix,
$$\boldsymbol{\Sigma}_{\mathbf{X}}=\mathbf{D}_{\mathbf{X}} \mathbf{R}_{\mathbf{X}} \mathbf{D}_{\mathbf{X}}=\left[\begin{array}{cccc} \sigma_{1}^{2} & \rho_{12} \sigma_{1} \sigma_{2} & \cdots & \rho_{1 n} \sigma_{1} \sigma_{n} \\ & \sigma_{2}^{2} & \cdots & \rho_{2 n} \sigma_{2} \sigma_{n} \\ & & \ddots & \vdots \\ \text { sym. } & & & \sigma_{n}^{2} \end{array}\right]_{n \times n}.$$
$\mathbf{D}_{\mathbf{X}}$ is the standard deviation matrix containing the standard deviation of each random variable on its main diagonal, and $\mathbf{R}_{\mathbf{X}}$ is the symmetric (sym.) correlation matrix containing the correlation coefficient for each pair of random variables,
$$\mathbf{D}_{\mathbf{X}}=\left[\begin{array}{cccc} \sigma_{1} & 0 & 0 & 0 \\ & \sigma_{2} & 0 & 0 \\ & & \ddots & 0 \\ \text { sym. } & & & \sigma_{n} \end{array}\right], \quad \mathbf{R}_{\mathbf{X}}=\left[\begin{array}{cccc} 1 & \rho_{12} & \cdots & \rho_{1 n} \\ & 1 & \cdots & \rho_{2 n} \\ & & \ddots & \rho_{n-1\, n} \\ \text { sym. } & & & 1 \end{array}\right].$$
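As a small illustration of how the covariance matrix is assembled from the standard deviation matrix and the correlation matrix, the sketch below multiplies the three matrices for a two-variable case using plain Python lists. The helper names `matmul` and `covariance_from_std_and_corr` are hypothetical, and the values $\sigma_1=2$, $\sigma_2=1$, $\rho_{12}=0.6$ are chosen only for illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def covariance_from_std_and_corr(sigmas, corr):
    """Build the covariance matrix as D R D, with D = diag(sigmas)."""
    n = len(sigmas)
    d = [[sigmas[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(d, corr), d)


sigmas = [2.0, 1.0]          # illustrative standard deviations
corr = [[1.0, 0.6],
        [0.6, 1.0]]          # illustrative correlation matrix

cov = covariance_from_std_and_corr(sigmas, corr)
print(cov)
```

The diagonal of the result holds the variances $\sigma_i^2$ and the off-diagonal terms hold $\rho_{ij}\sigma_i\sigma_j$, matching the matrix layout given above.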
Note that a variable is linearly correlated with itself, so the main diagonal terms of the correlation matrix are $\left[\mathbf{R}_{\mathbf{X}}\right]_{i i}=1$, $\forall i$. The multivariate Normal joint PDF is described by
$$f_{\mathbf{X}}(\mathbf{x})=\frac{1}{(2 \pi)^{n / 2}\left(\operatorname{det} \boldsymbol{\Sigma}_{\mathbf{X}}\right)^{1 / 2}} \exp \left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}_{\mathbf{X}}\right)^{\top} \boldsymbol{\Sigma}_{\mathbf{X}}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}_{\mathbf{X}}\right)\right),$$
where the terms inside the exponential describe an $n$-dimensional ellipsoid centered at $\boldsymbol{\mu}_{\mathbf{X}}$. The directions of the principal axes of this ellipsoid are described by the eigenvectors (see §2.4.2) of the covariance matrix $\boldsymbol{\Sigma}_{\mathbf{X}}$, and their lengths by the eigenvalues. Figure 4.3 presents an example of a covariance matrix decomposed into its eigenvectors and eigenvalues. The curves overlaid on the joint PDF describe the marginal PDFs in the eigenspace. For the multivariate Normal joint PDF formulation, the term on the left of the exponential is again the normalization constant, which now includes the determinant of the covariance matrix. As presented in §2.4.1, the determinant quantifies how much the covariance matrix $\boldsymbol{\Sigma}_{\mathbf{X}}$ scales the space $\mathbf{x}$.

Figure 4.4 presents examples of the bivariate Normal PDF and CDF with parameters $\mu_{1}=0, \sigma_{1}=2, \mu_{2}=0, \sigma_{2}=1$, and $\rho=0.6$. For the bivariate CDF, notice how evaluating the upper bound for one variable leads to the marginal CDF, represented by the bold red line, for the other variable.

## Properties

A multivariate Normal random variable follows several properties. Here, we emphasize six:

1. It is completely defined by its mean vector $\boldsymbol{\mu}_{\mathbf{X}}$ and covariance matrix $\boldsymbol{\Sigma}_{\mathbf{X}}$.
2. Its marginal distributions are also Normal, and the PDF of any marginal is given by
$$x_{i}: X_{i} \sim \mathcal{N}\left(x_{i} ;\left[\boldsymbol{\mu}_{\mathbf{X}}\right]_{i},\left[\boldsymbol{\Sigma}_{\mathbf{X}}\right]_{i i}\right).$$
3. The absence of correlation implies statistical independence. Note that this is not generally true for other types of random variables (see §3.3.5),
$$\rho_{i j}=0 \Leftrightarrow X_{i} \perp\!\!\!\perp X_{j} .$$
4. The central limit theorem (CLT) states that, under some conditions, the asymptotic distribution obtained from the normalized sum of independent identically distributed (iid) random variables (normally distributed or not) is Normal. Given $X_{i}, \forall i \in\{1, \cdots, n\}$, a set of iid random variables with expected value $\mathbb{E}\left[X_{i}\right]=\mu_{X}$ and finite variance $\sigma_{X}^{2}$, the PDF of $Y=\sum_{i=1}^{n} X_{i}$ approaches $\mathcal{N}\left(n \mu_{X}, n \sigma_{X}^{2}\right)$ for $n \rightarrow \infty$. More formally, the CLT states that
$$\sqrt{n}\left(\frac{Y}{n}-\mu_{X}\right) \stackrel{d}{\rightarrow} \mathcal{N}\left(0, \sigma_{X}^{2}\right),$$
where $\stackrel{d}{\rightarrow}$ denotes convergence in distribution. In practice, when observing the outcomes of real-life phenomena, it is common to obtain empirical distributions that are similar to the Normal distribution. The parallel with the CLT is that these phenomena are themselves issued from the superposition of several phenomena. This property is key in explaining the widespread usage of the Normal probability distribution.

5. The output of a linear function of Normal random variables is also Normal. Given $\mathbf{x}: \mathbf{X} \sim \mathcal{N}\left(\mathbf{x} ; \boldsymbol{\mu}_{\mathbf{X}}, \boldsymbol{\Sigma}_{\mathbf{X}}\right)$ and a linear function $\mathbf{y}=\mathbf{A} \mathbf{x}+\mathbf{b}$, the properties of linear transformations described in §3.4.1 allow obtaining
$$\mathbf{Y} \sim \mathcal{N}\left(\mathbf{y} ; \mathbf{A} \boldsymbol{\mu}_{\mathbf{X}}+\mathbf{b}, \mathbf{A} \boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{A}^{\top}\right).$$
Let us consider the simplified case of a linear function $z=x+y$ for two random variables $x: X \sim \mathcal{N}\left(x ; \mu_{X}, \sigma_{X}^{2}\right)$, $y: Y \sim \mathcal{N}\left(y ; \mu_{Y}, \sigma_{Y}^{2}\right)$. Their sum is described by
$$Z \sim \mathcal{N}\left(z ; \mu_{X}+\mu_{Y}, \sigma_{X}^{2}+\sigma_{Y}^{2}+2 \rho_{X Y} \sigma_{X} \sigma_{Y}\right).$$
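Properties 4 and 5 lend themselves to a quick simulation check. The sketch below is one possible way to do it with Python's standard library. The choice of Uniform(0, 1) terms for the CLT, the sample sizes, the seed, and the function names are all assumptions made for illustration. The correlated pair is generated from two independent standard Normal draws, which is a common construction rather than something prescribed by the text.

```python
import math
import random
import statistics


def clt_samples(n_terms, n_repeats, rng):
    """Samples of sqrt(n) * (Y/n - mu_X), where Y sums n iid Uniform(0, 1)."""
    mu_x = 0.5  # mean of Uniform(0, 1); its variance is 1/12
    out = []
    for _ in range(n_repeats):
        y = sum(rng.random() for _ in range(n_terms))
        out.append(math.sqrt(n_terms) * (y / n_terms - mu_x))
    return out


def correlated_sum_samples(mu_x, sigma_x, mu_y, sigma_y, rho, n_samples, rng):
    """Samples of Z = X + Y for Normal X, Y with correlation coefficient rho."""
    out = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # Mixing z1 and z2 gives Y unit-scaled variance and correlation rho with X.
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        out.append(x + y)
    return out


rng = random.Random(0)  # fixed seed so that runs are repeatable

# Property 4: mean near 0 and variance near sigma_X^2 = 1/12.
clt = clt_samples(30, 5000, rng)
clt_mean = statistics.fmean(clt)
clt_var = statistics.variance(clt)

# Property 5: mean near mu_X + mu_Y, variance near
# sigma_X^2 + sigma_Y^2 + 2 rho sigma_X sigma_Y = 4 + 1 + 2.4 = 7.4.
sums = correlated_sum_samples(1.0, 2.0, -1.0, 1.0, 0.6, 20000, rng)
sum_mean = statistics.fmean(sums)
sum_var = statistics.variance(sums)

print(clt_mean, clt_var, sum_mean, sum_var)
```

Because these are Monte Carlo estimates, the printed values only approximate the theoretical ones, and the agreement tightens as the number of samples grows.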

