## Information Measures for Continuous Random Variables

The definitions of mutual information for discrete random variables can be directly extended to continuous random variables. Let $X$ and $Y$ be random variables with joint probability density function (pdf) $p(x, y)$ and marginal pdfs $p(x)$ and $p(y)$. The average mutual information between $X$ and $Y$ is defined as follows.

Definition $1.8$ The Average Mutual Information between two continuous random variables $X$ and $Y$ is defined as
$$I(X ; Y)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x) p(y \mid x) \log \frac{p(y \mid x) p(x)}{p(x) p(y)} d x d y$$
It should be pointed out that the definition of average mutual information can be carried over from discrete random variables to continuous random variables, but the concept and physical interpretation cannot. The reason is that the information content in a continuous random variable is actually infinite: an infinite number of bits is required to represent a continuous random variable precisely. The self-information, and hence the entropy, is infinite. To get around this problem, we define a quantity called the differential entropy.

Definition 1.9 The Differential Entropy of a continuous random variable $X$ is defined as
$$h(X)=-\int_{-\infty}^{\infty} p(x) \log p(x) d x$$
Again, it should be understood that there is no physical meaning attached to the above quantity. We carry on with extending our definitions further.
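
One way to see why no physical meaning is attached to $h(X)$ is that it can be negative. The sketch below is a minimal illustration, assuming a uniform density on $[0, a]$ (an example chosen here, not taken from the text) and a simple midpoint-rule integrator; the helper names `differential_entropy` and `uniform_pdf` are hypothetical.

```python
import math

def differential_entropy(pdf, lo, hi, n=10_000):
    """Approximate h(X) = -integral of p(x) log2 p(x) dx on [lo, hi] (in bits)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)   # midpoint of the i-th sub-interval
        if p > 0.0:                    # treat 0 * log 0 as 0
            total -= p * math.log2(p) * dx
    return total

def uniform_pdf(a):
    """Density of a uniform distribution on [0, a] (illustrative assumption)."""
    return lambda x: 1.0 / a if 0.0 <= x <= a else 0.0

h_wide = differential_entropy(uniform_pdf(4.0), 0.0, 4.0)    # analytically log2(4)
h_narrow = differential_entropy(uniform_pdf(0.5), 0.0, 0.5)  # analytically log2(0.5)
print(h_wide, h_narrow)
```

For a uniform density the integral reduces to $\log a$, so any support narrower than one unit gives a negative value, which an entropy in the discrete sense could never be.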

Definition 1.10 The Average Conditional Entropy of a continuous random variable $X$ given $Y$ is defined as
$$h(X \mid Y)=-\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log p(x \mid y) d x d y$$
The average mutual information can be expressed as
$$I(X ; Y)=h(X)-h(X \mid Y)=h(Y)-h(Y \mid X)$$
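
As a worked illustration of this identity, assume (this case is not in the text) that $X$ and $Y$ are jointly Gaussian with variances $\sigma_{X}^{2}$, $\sigma_{Y}^{2}$ and correlation coefficient $\rho$, $|\rho|<1$. Using the standard Gaussian expressions for the differential entropies,
$$h(X)=\frac{1}{2} \log \left(2 \pi e \sigma_{X}^{2}\right), \quad h(X \mid Y)=\frac{1}{2} \log \left(2 \pi e \sigma_{X}^{2}\left(1-\rho^{2}\right)\right)$$
so that
$$I(X ; Y)=h(X)-h(X \mid Y)=-\frac{1}{2} \log \left(1-\rho^{2}\right)$$
Although each differential entropy depends on the scale $\sigma_{X}^{2}$, the difference does not: it is zero for $\rho=0$ and grows without bound as $|\rho| \rightarrow 1$.
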
The following are some properties of differential entropy:

1. $h(a X)=h(X)+\log |a|$.
2. If $X$ and $Y$ are independent, then $h(X+Y) \geq h(X)$. This is because $h(X+Y) \geq h(X+Y \mid Y)=$ $h(X \mid Y)=h(X)$.
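
Both properties can be checked numerically in a case where $h$ has a closed form. The sketch below assumes Gaussian variables, for which $h(X)=\frac{1}{2} \log _{2}\left(2 \pi e \sigma^{2}\right)$ bits; the variances and the scale factor are arbitrary illustrative values, and `gaussian_entropy` is a hypothetical helper.

```python
import math

def gaussian_entropy(var):
    """Differential entropy in bits of a Gaussian with variance var."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * var)

var_x, var_y, a = 2.0, 3.0, -4.0           # illustrative values only

h_x = gaussian_entropy(var_x)
h_ax = gaussian_entropy(a * a * var_x)     # aX is Gaussian with variance a^2 * var_x
h_sum = gaussian_entropy(var_x + var_y)    # X + Y for independent Gaussian X and Y

scaling_gap = h_ax - h_x                   # property 1 predicts log2(|a|)
print(scaling_gap, math.log2(abs(a)))
print(h_sum >= h_x)                        # property 2
```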

## Relative Entropy

An interesting question to ask is how similar (or different) two probability distributions are. Relative entropy is used as a measure of distance between two distributions.

Definition 1.11 The Relative Entropy or Kullback-Leibler (KL) Distance between two probability mass functions $p(x)$ and $q(x)$ is defined as
$$D(p \| q)=\sum_{x \in \mathcal{X}} p(x) \log \left(\frac{p(x)}{q(x)}\right)$$
It can be interpreted as the expected value of $\log \left(\frac{p(x)}{q(x)}\right)$, taken with respect to $p(x)$.
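
The definition translates almost directly into code. The following is a minimal sketch for pmfs stored as equal-length lists, using base-2 logarithms; the function name `kl_divergence` and the two example pmfs are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two pmfs given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue                # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf         # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]                      # illustrative pmfs, not from the text
q = [0.25, 0.75]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
print(d_pq, d_qp, d_pp)
```

Evaluating both orderings shows that swapping the arguments changes the value, while comparing a pmf with itself gives zero.
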
Example 1.9 Consider a Gaussian distribution $p(x)$ with mean and variance given by $\left(\mu_{1}, \sigma_{1}^{2}\right)$, and another Gaussian distribution $q(x)$ with mean and variance given by $\left(\mu_{2}, \sigma_{2}^{2}\right)$. Using (1.33), we can find the KL distance between two Gaussian distributions as

$$D(p \| q)=\frac{1}{2}\left[\frac{\sigma_{1}^{2}}{\sigma_{2}^{2}}+\left(\frac{\mu_{2}-\mu_{1}}{\sigma_{2}}\right)^{2}-1-\ln \left(\frac{\sigma_{1}^{2}}{\sigma_{2}^{2}}\right)\right]$$
Here the logarithm is natural, so the distance is measured in nats. The distance becomes zero when the two distributions are identical, i.e., $\mu_{1}=\mu_{2}$ and $\sigma_{1}^{2}=\sigma_{2}^{2}$. It is interesting to note that when $\mu_{1} \neq \mu_{2}$ and $\sigma_{2}^{2}$ is held fixed, the distance is minimum for $\sigma_{1}^{2}=\sigma_{2}^{2}$. This minimum distance is given by
$$D_{\min }(p \| q)=\frac{1}{2}\left(\frac{\mu_{2}-\mu_{1}}{\sigma_{2}}\right)^{2}$$
Also, the KL distance is infinite if either $\sigma_{1}^{2} \rightarrow 0$ or $\sigma_{2}^{2} \rightarrow 0$, that is, if either of the distributions tends to the Dirac delta.
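
The behaviour described in Example 1.9 can be checked numerically. The sketch below assumes the closed form is evaluated with the natural logarithm (nats), which is the convention under which the minimum falls at $\sigma_{1}^{2}=\sigma_{2}^{2}$; the parameter values and the name `gaussian_kl` are illustrative assumptions.

```python
import math

def gaussian_kl(mu1, var1, mu2, var2):
    """D(p || q) in nats for p = N(mu1, var1) and q = N(mu2, var2)."""
    ratio = var1 / var2
    return 0.5 * (ratio + (mu2 - mu1) ** 2 / var2 - 1.0 - math.log(ratio))

mu1, mu2, var2 = 0.0, 1.0, 2.0                      # illustrative values only

# Scan var1 over a grid with var2 fixed and keep the value giving the smallest distance.
candidates = [0.5 + 0.1 * k for k in range(40)]
best_var1 = min(candidates, key=lambda v: gaussian_kl(mu1, v, mu2, var2))

d_min = gaussian_kl(mu1, var2, mu2, var2)           # equal variances
d_tiny = gaussian_kl(mu1, 1e-12, mu2, var2)         # p close to a Dirac delta
print(best_var1, d_min, d_tiny)
```

With equal variances only the mean term survives, and shrinking $\sigma_{1}^{2}$ toward zero makes the logarithmic term grow without bound.
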
The average mutual information can be seen as the relative entropy between the joint distribution, $p(x, y)$, and the product distribution, $p(x) p(y)$, i.e.,
$$I(X ; Y)=D(p(x, y) \| p(x) p(y))$$
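
For a discrete joint distribution, this relation gives a direct way to compute mutual information. The sketch below is a minimal illustration, assuming the joint pmf is stored as a list of rows `joint[x][y]`; the function name and the two example tables are hypothetical.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, computed as D(p(x,y) || p(x)p(y)) for a joint pmf table."""
    px = [sum(row) for row in joint]           # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (column sums)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                      # 0 * log 0 is taken as 0
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]     # joint equals product of marginals
dependent = [[0.5, 0.0], [0.0, 0.5]]           # Y is a copy of X

i_indep = mutual_information(independent)
i_dep = mutual_information(dependent)
print(i_indep, i_dep)
```

When the joint distribution already equals the product of its marginals, the relative entropy, and hence the mutual information, is zero.
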
We note that, in general, $D(p \| q) \neq D(q \| p)$. Thus, even though the relative entropy is used as a measure of distance, it does not satisfy the symmetry property of distances (nor the triangle inequality), so it is not a true metric. To overcome the lack of symmetry, another measure, called the Jensen Shannon distance, is sometimes used to define the similarity between two distributions.

Definition $1.12$ The Jensen Shannon Distance between two probability mass functions $p(x)$ and $q(x)$ is defined as
$$\mathrm{JSD}(p \| q)=\frac{1}{2} D(p \| m)+\frac{1}{2} D(q \| m)$$
where $m=\frac{1}{2}(p+q)$.
Did you know? If the base of the logarithm is 2, then $0 \leq \mathrm{JSD}(p \| q) \leq 1$. Note that the Jensen Shannon distance is sometimes referred to as the Jensen Shannon divergence or Information Radius in the literature.
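
Definition 1.12 can be sketched in a few lines. The code below assumes pmfs stored as equal-length lists and base-2 logarithms; `kl_bits` and `jsd` are hypothetical helper names, and the example pmfs are chosen only to exercise the two ends of the range.

```python
import math

def kl_bits(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def jsd(p, q):
    """Jensen Shannon distance between two pmfs, using base-2 logarithms."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]   # mixture m = (p + q) / 2
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

same = jsd(p, p)                          # identical distributions
disjoint = jsd([1.0, 0.0], [0.0, 1.0])    # non-overlapping supports
forward, backward = jsd(p, q), jsd(q, p)
print(same, disjoint, forward, backward)
```

Because $m$ is positive wherever $p$ or $q$ is, both KL terms stay finite, unlike the relative entropy itself.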

## Huffman Coding

We will now study an algorithm for constructing efficient source codes for a discrete memoryless source (DMS) with source symbols that are not equally probable. A variable-length encoding algorithm was suggested by Huffman in 1952, based on the source symbol probabilities $P\left(x_{i}\right), i=1,2, \ldots, L$. The algorithm is optimal in the sense that the average number of bits required to represent the source symbols is a minimum, provided the prefix condition is met. The steps of the Huffman coding algorithm are as follows:
(i) Arrange the source symbols in decreasing order of their probabilities.
(ii) Take the bottom two symbols and tie them together as shown in Fig. 1.11. Add the probabilities of the two symbols and write the sum on the combined node. Label the two branches with a '1' and a '0' as depicted in Fig. 1.11.
(iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together to form a new probability. Each time we combine two symbols, we reduce the total number of symbols by one. Whenever we tie together two probabilities (nodes), we label the two branches with a '1' and a '0'.

(iv) Continue the procedure until only one probability is left (and it should be 1 if your addition is right!). This completes the construction of the Huffman tree.
(v) To find out the prefix codeword for any symbol, follow the branches from the final node back to the symbol. While tracing back the route, read out the labels on the branches. This is the codeword for the symbol.
The algorithm can be easily understood using the following example.
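
Steps (i) to (v) can also be sketched in code. The following is a minimal illustration rather than a reference implementation: it assumes a hypothetical five-symbol source whose probabilities are chosen here and are not taken from the text, uses a min-heap in place of repeated sorting, and assigns '0' and '1' to the two merged branches in an arbitrary but fixed order.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    if len(probs) == 1:                       # degenerate source: one symbol
        return {sym: "0" for sym in probs}
    tie = count()                             # tie-breaker so dicts are never compared
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # smallest probability
        p1, _, codes1 = heapq.heappop(heap)   # second smallest probability
        # Prepending the branch label means each codeword ends up
        # read from the final node down to the symbol.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

# Hypothetical source, for illustration only.
probs = {"x1": 0.4, "x2": 0.2, "x3": 0.2, "x4": 0.1, "x5": 0.1}
codes = huffman_code(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg_len)
```

The individual codewords depend on how ties between equal probabilities are broken, but the average codeword length does not, and no codeword is a prefix of another.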


