
## Definition of entropy of a continuous random variable

Up to now we have assumed that a random variable $\xi$ with entropy $H_{\xi}$ takes values in a discrete space consisting of a finite or countable number of elements, for instance messages or symbols. However, continuous variables are also widespread in engineering, i.e. variables (scalar or vector) that take values in a continuous space $X$, most often the space of real numbers. Such a random variable $\xi$ is described by a probability density function $p(\xi)$, which assigns the probability
$$\Delta P=\int_{\xi \in \Delta X} p(\xi) d \xi \approx p(A) \Delta V \quad(A \in \Delta X)$$
to the event that $\xi$ falls in a region $\Delta X$ of the space $X$ with volume $\Delta V$ (here $d \xi=d V$ is the volume differential).

How can we define the entropy $H_{\xi}$ of such a random variable? One of many possible formal ways is the following. In the formula
$$H_{\xi}=-\sum_{\xi} P(\xi) \ln P(\xi)=-\mathbb{E}[\ln P(\xi)] \tag{1.6.1}$$
which is appropriate for a discrete variable, we formally replace the probabilities $P(\xi)$ in the argument of the logarithm by the probability density and thereby consider the expression
$$H_{\xi}=-\mathbb{E}[\ln p(\xi)]=-\int_{X} p(\xi) \ln p(\xi) d \xi . \tag{1.6.2}$$
This way of defining entropy is not well justified. It remains unclear how to define entropy in the combined case, when a continuous distribution in a continuous space coexists with concentrations of probability at single points, i.e. the probability density contains delta-shaped singularities. Entropy (1.6.2) also suffers from the drawback that it is not invariant, i.e. it changes under a non-degenerate transformation of variables $\eta=f(\xi)$ in contrast to entropy (1.6.1), which remains invariant under such transformations.
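The lack of invariance can be checked numerically. The following is a minimal sketch, not part of the original text: it estimates the integral $-\int p \ln p \, d\xi$ by a midpoint rule for a uniform density on $[0, 0.5]$ and for the rescaled variable $\eta = 2\xi$, which is uniform on $[0, 1]$. The helper names `differential_entropy` and `uniform_density` are hypothetical and chosen only for this illustration.

```python
import math

def differential_entropy(density, lo, hi, steps=10_000):
    """Midpoint-rule estimate of -integral of p ln p over [lo, hi] (in nats)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = density(lo + (i + 0.5) * width)
        if p > 0.0:  # points where p = 0 contribute nothing
            total -= p * math.log(p) * width
    return total

def uniform_density(a):
    """Density of a variable distributed uniformly on [0, a]."""
    return lambda x: 1.0 / a if 0.0 <= x <= a else 0.0

# xi is uniform on [0, 0.5]; eta = 2 * xi is uniform on [0, 1].
h_xi = differential_entropy(uniform_density(0.5), 0.0, 0.5)
h_eta = differential_entropy(uniform_density(1.0), 0.0, 1.0)

print("entropy of xi :", h_xi)   # close to ln 0.5, a negative value
print("entropy of eta:", h_eta)  # close to ln 1 = 0
print("difference    :", h_eta - h_xi)  # close to ln 2, the log of the scale factor
```

In this sketch a one-to-one rescaling shifts the value of the integral by $\ln 2$, and the value for $\xi$ is negative, whereas a relabeling of the values of a discrete variable leaves the sum over $P(\xi) \ln P(\xi)$ unchanged.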

## Properties of entropy in the generalized version

Entropy (1.6.13), (1.6.16), defined in the previous section, possesses a set of properties analogous to those of the entropy of a discrete random variable considered earlier. Such an analogy is quite natural if we take into account the interpretation of entropy (1.6.13) (provided in Section 1.6) as an asymptotic case (for large $N$) of entropy (1.6.1) of a discrete random variable.

The non-negativity property of entropy, discussed in Theorem 1.1, is not always satisfied for entropy (1.6.13), (1.6.16), but it holds for sufficiently large $N$. The constraint
$$H_{\xi}^{P / Q} \leqslant \ln N$$
results in non-negativity of the entropy $H_{\xi}$.

Now we move on to Theorem 1.2, which concerned the maximum value of entropy. In the case of entropy (1.6.13), the measure $\nu$ must be kept fixed when comparing different distributions $P$. As mentioned, quantity (1.6.17) is non-negative, and thus (1.6.16) entails the inequality
$$H_{\xi} \leqslant \ln N .$$
At the same time, if we suppose $P=Q$, then, evidently, we will have
$$H_{\xi}=\ln N .$$
This proves an analog of Theorem 1.2 for entropy (1.6.13).
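The bound and its equality case can be illustrated on a small finite example. The sketch below rests on an assumption that is not shown in this excerpt: that (1.6.16) has the form $H_{\xi}=\ln N-H_{\xi}^{P/Q}$, where $H_{\xi}^{P/Q}$ is the relative entropy of $P$ with respect to $Q$, $N$ is the total mass of the fixed measure, and $Q$ is that measure normalized by $N$. This reading is suggested by the constraint and the inequality above. The function names and the weights in `nu` are hypothetical.

```python
import math

def relative_entropy(P, Q):
    """Sum of P ln(P / Q) in nats; terms with P = 0 contribute nothing."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def generalized_entropy(P, nu):
    """ln N minus the relative entropy of P with respect to Q = nu / N.

    Assumed form of (1.6.16); nu lists the weights of a fixed measure
    on a finite set and N is its total mass.
    """
    N = sum(nu)
    Q = [w / N for w in nu]
    return math.log(N) - relative_entropy(P, Q)

nu = [2.0, 1.0, 1.0]                       # fixed measure with N = 4
Q = [w / sum(nu) for w in nu]              # normalized measure
H_equal = generalized_entropy(Q, nu)       # case P = Q
H_skewed = generalized_entropy([0.7, 0.2, 0.1], nu)

# A measure with total mass N < 1 gives a negative value even for P = Q.
nu_small = [0.1, 0.2, 0.1]
Q_small = [w / sum(nu_small) for w in nu_small]
H_small = generalized_entropy(Q_small, nu_small)

print(H_equal, math.log(4))   # equal: the maximum ln N is attained at P = Q
print(H_skewed)               # strictly below ln N
print(H_small)                # negative, since ln N < 0 here
```

Under the stated assumption, the maximum $\ln N$ is reached only when the relative entropy vanishes, and non-negativity fails when $N$ is too small, which matches the remarks above.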

## Encoding of discrete information

The definition of the amount of information given in Chapter 1 is justified when we deal with a transformation of information from one kind into another, i.e. when we consider encoding of information. It is essential that a law of conservation of the amount of information holds under such a transformation. It is useful to draw an analogy with the law of conservation of energy, which is the main argument for introducing the notion of energy. The law of conservation of information is, however, more complex than the law of conservation of energy in two respects. First, the law of conservation of energy establishes an exact equality of energies when one type of energy is transformed into another, whereas in transforming information we have a more complex relation, namely ‘not greater’ $(\leqslant)$, i.e. the amount of information cannot increase. The equality sign corresponds to optimal encoding. Thus, when formulating the law of conservation of information, we have to state that it is possible to find an encoding for which the amounts of information are equal.

The second complication is that the equality is not exact. It is approximate and asymptotic, valid for complex (large) messages and for composite random variables. The larger a system of messages is, the more exact the relation becomes, and exact equality holds only in the limiting case. In this respect there is an analogy with the laws of statistical thermodynamics, which are valid for large thermodynamic systems consisting of a large number of molecules (of the order of the Avogadro number).

When performing encoding, we assume that a long sequence of messages $\xi_{1}, \xi_{2}, \ldots$ is given together with their probabilities, i.e. a sequence of random variables. Therefore, the amount of information (entropy $H$) corresponding to this sequence can be calculated. This information can be recorded and transmitted by different realizations of the sequence. If $M$ is the number of such realizations, then the law of conservation of information can be expressed by the equality $H=\ln M$, which is complicated by the two above-mentioned factors (i.e. actually $H \leqslant \ln M$).
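The asymptotic character of the relation between $H$ and $\ln M$ can be seen in a small numerical experiment. The sketch below is an illustration under assumptions of its own, not a construction from the text: the messages are independent binary variables with $P(1)=p$, and $M$ is taken to be the number of most probable sequences needed to cover probability $1-\varepsilon$. The names `message_entropy`, `log_realizations_per_message`, and the tolerance `eps` are hypothetical.

```python
import math

def message_entropy(p):
    """Entropy in nats of one binary message with P(1) = p."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_realizations_per_message(n, p, eps=0.01):
    """(ln M) / n, where M counts the most probable length-n sequences
    needed to reach total probability 1 - eps.

    Assumes 0 < p < 0.5, so sequences with fewer ones are more probable.
    Whole groups of sequences with the same number of ones are added,
    so M slightly overestimates the minimal count.
    """
    mass, count = 0.0, 0
    for k in range(n + 1):
        group = math.comb(n, k)  # number of sequences with exactly k ones
        # Work in the log domain so tiny probabilities do not break the product.
        log_mass = math.log(group) + k * math.log(p) + (n - k) * math.log(1 - p)
        mass += math.exp(log_mass)
        count += group
        if mass >= 1.0 - eps:
            break
    return math.log(count) / n

p = 0.25
H1 = message_entropy(p)  # entropy per message; the sequence has H = n * H1
for n in (100, 1000, 4000):
    print(n, log_realizations_per_message(n, p), H1)
```

For these parameters the per-message value of $\ln M$ stays above the per-message entropy and moves closer to it as $n$ grows, in line with $H \leqslant \ln M$ and with equality being reached only in the limit of long sequences.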

Two different approaches may be used for solving the encoding problem. One can perform encoding of an infinite sequence of messages, i.e. online (or ‘sliding’) encoding. The inverse procedure, i.e. decoding, will be performed analogously.

