## CSE208 information theory课程简介

An introduction to information theory including topics such as entropy, relative entropy, mutual information, asymptotic equipartition property, channel capacity, differential entropy, rate distortion theory, and universal source coding. Prerequisite(s): Computer Engineering 107, or Applied Mathematics and Statistics 131 or equivalent course, or permission of instructor.

Information theory is a branch of mathematics that deals with the quantification, storage, and communication of information. It was first introduced by Claude Shannon in 1948 and has since found numerous applications in areas such as communication engineering, computer science, statistics, and physics.

## PREREQUISITES

Some of the key concepts and topics covered in an introduction to information theory course are:

1. Entropy: Entropy is a measure of the uncertainty or randomness in a random variable. It is a fundamental concept in information theory and has applications in fields such as cryptography, data compression, and statistical mechanics.
2. Relative entropy: Also known as Kullback-Leibler divergence, relative entropy measures the difference between two probability distributions. It is widely used in machine learning, data science, and statistical inference.
3. Mutual information: Mutual information measures the amount of information that two random variables share. It is a key concept in signal processing, machine learning, and neuroscience.
4. Asymptotic equipartition property (AEP): AEP states that for a large number of independent and identically distributed random variables, the probability of the sequence being close to its expected value approaches one.
5. Channel capacity: Channel capacity is the maximum rate at which information can be reliably transmitted through a communication channel. It is a fundamental concept in communication engineering and has applications in wireless communication, data transmission, and coding theory.
6. Differential entropy: Differential entropy is a measure of the amount of information in a continuous random variable. It is closely related to entropy and has applications in fields such as signal processing, machine learning, and image processing.
7. Rate distortion theory: Rate distortion theory studies the problem of lossy data compression, where the goal is to compress data while minimizing the distortion between the original and compressed signals.
8. Universal source coding: Universal source coding studies the problem of lossless data compression, where the goal is to compress data without losing any information.

Overall, an introduction to information theory provides students with a solid foundation in the mathematical theory of information and its applications in a wide range of fields.

## CSE208 information theory HELP（EXAM HELP， ONLINE TUTOR）

This is an 80 minute exam. You may have an additional 100 minutes to answer the following five questions in the notebooks provided. You are permitted 2 double-sided sheet of notes. Make sure that you have included your name, ID number (last 4 digits only) and signature in each book used ( 5 points). Read each question carefully. All statements must be justified. Computations should be simplified as much as possible.

1. 20 points Let $Y=g(X)$ be deterministic function of discrete random variable $X$.
(a) 5 points Give an example of a random variable $X$ and a function $Y=g(X)$ such that $H(Y)<$ $H(X)$.
(b) 5 points Give an example of a random variable $X$ and a function $Y=g(X)$ such that $H(Y)=$ $H(X)$
(c) 10 points Either give an example of a random variable $X$ and a function $Y=g(X)$ such that $H(Y) \geq H(X)$ or prove that $H(Y) \leq H(X)$. Hint: look at $H(X, Y)$.

(a) Let $X$ be a random variable that takes values in ${0,1}$ with equal probabilities and let $Y=X$. Then, $H(X)=H(Y)=1$, but $H(Y)=0$ since $Y$ is a constant function of $X$.

(b) Let $X$ be a random variable that takes values in ${0,1}$ with equal probabilities and let $Y=X \oplus 1$ (where $\oplus$ denotes the bitwise exclusive or operation). Then, $H(X)=H(Y)=1$, since $X$ and $Y$ are equally likely to take on either value. Furthermore, since $Y$ is a one-to-one function of $X$, $H(Y \mid X) = 0$, so by the chain rule of entropy, \begin{align*} H(X,Y) &= H(Y \mid X) + H(X) \ &= H(Y) \ &= H(X). \end{align*} Thus, $H(Y)=H(X)$.

(c) Let $X$ be a random variable that takes values in ${0,1,2}$ with probabilities $\mathbb{P}(X=0)=1/2$, $\mathbb{P}(X=1)=1/4$, and $\mathbb{P}(X=2)=1/4$. Define $Y$ by $Y=0$ if $X=0$, and $Y=1$ otherwise. Then, $H(X)=1.5$ and $H(Y)=1$, since $Y$ is a Bernoulli random variable with parameter $1/2$. However, $H(X,Y)=H(Y)+H(X\mid Y)=H(Y)+\mathbb{P}(Y=0)H(X\mid Y=0)+\mathbb{P}(Y=1)H(X\mid Y=1)=1.25+0.5+0=1.75$. Therefore, $H(Y)<H(X,Y)$, which implies $H(Y)<H(X)$.

1. 20 points Let be random variables such that
$$X \rightarrow Y \rightarrow Z, \quad Y \rightarrow Z \rightarrow X, \quad Z \rightarrow X \rightarrow Y$$
If $I(X ; Y)=3$, find $I(X ; Z)$ and $I(Y ; Z)$. Or, if these quantities cannot be known, find tight upper and lower bounds. Make sure to justify your answers.

By the chain rule of mutual information, we have \begin{align*} I(X;Y,Z) &= I(X;Z) + I(X;Y|Z) \ I(Y;X,Z) &= I(Y;Z) + I(Y;X|Z) \ I(Z;X,Y) &= I(Z;X) + I(Z;Y|X). \end{align*} Since $X\to Y\to Z$, we have $I(Y;Z|X)=0$. Also, since $Y\to Z\to X$, we have $I(Y;X|Z)=0$. Hence, we have \begin{align*} 3 &= I(X;Y) \ &= I(X;Y,Z) – I(X;Z|Y) \ &= I(X;Z) + I(X;Y|Z) – I(X;Z|Y) \ &= I(X;Z) + I(X;Y|Z,Y) \ &= I(X;Z) + I(X;Y|Z) \ &= I(X;Z) + I(X;Y,Z) – I(X;Z|Y) \ &= I(X;Z) + I(X;Y) – I(X;Z|Y). \end{align*} Thus, $I(X;Z) = 3 – I(X;Y) + I(X;Z|Y)$. Similarly, we can derive that

I(Y;Z) = 3 – I(Y;X) + I(Y;Z|X).I(Y;Z)=3−I(Y;X)+I(Y;Z∣X).

Since $I(Y;X,Z) = I(Y;X) = I(X;Y)$, we have $I(Y;Z|X) \leq I(X;Z)$. Therefore, we have

I(Y;Z) \geq 3 – I(X;Y) + I(X;Z|Y).I(Y;Z)≥3−I(X;Y)+I(X;Z∣Y).

For a lower bound of $I(X;Z)$, note that $I(X;Z|Y) \geq 0$, and hence $I(X;Z) \geq 3 – I(X;Y)$. For an upper bound of $I(X;Z)$, note that $I(X;Z) \leq \min{H(X),H(Z)}$. To see this, note that

H(X) = H(X|Z) + I(X;Z) \geq I(X;Z),H(X)=H(X∣Z)+I(X;Z)≥I(X;Z),

and similarly $H(Z) \geq I(X;Z)$. Therefore, we have

3 – I(X;Y) \leq I(X;Z) \leq \min\{H(X),H(Z)\}.3−I(X;Y)≤I(X;Z)≤min{H(X),H(Z)}.

For $I(Y;Z)$, we can use similar arguments to derive that

3 – I(X;Y) \leq I(Y;Z) \leq \min\{H(Y),H(Z)\}.3−I(X;Y)≤I(Y;Z)≤min{H(Y),H(Z)}.

## Textbooks

• An Introduction to Stochastic Modeling, Fourth Edition by Pinsky and Karlin (freely
available through the university library here)
• Essentials of Stochastic Processes, Third Edition by Durrett (freely available through
the university library here)
To reiterate, the textbooks are freely available through the university library. Note that
you must be connected to the university Wi-Fi or VPN to access the ebooks from the library
links. Furthermore, the library links take some time to populate, so do not be alarmed if
the webpage looks bare for a few seconds.