### Run Length Encoding

Run-length encoding (RLE) is a technique for reducing the size of data that contains repeated strings of identical symbols. Each such repeated string is called a run. Typically, RLE encodes a run of symbols into two bytes: a count and a symbol. RLE can be applied to any type of data regardless of its information content, but the content of the data determines the compression ratio achieved. RLE cannot achieve compression ratios as high as those of other compression methods, but it is easy to implement and quick to execute. Run-length encoding is supported by most bitmap file formats, such as TIFF, JPG, BMP and PCX, and is also used by fax machines.
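As a rough illustration of the two-byte (count, symbol) idea, here is a minimal Python sketch. The function names `rle_encode` and `rle_decode` and the cap of 255 symbols per run (so that the count fits in one byte) are assumptions made for this example, not part of any particular file format.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, symbol) byte pairs; each run is capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte matches and the count fits in a byte.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, symbol) pair."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, symbol = encoded[j], encoded[j + 1]
        out += bytes([symbol]) * count
    return bytes(out)


print(list(rle_encode(b"aaaabbbc")))  # [4, 97, 3, 98, 1, 99]
print(len(rle_encode(b"abc")))        # 6: data without runs doubles in size
```

The second call shows why the content matters: input without runs grows to twice its original size under this naive scheme.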

We will restrict ourselves to that portion of the PCX data stream that actually contains the coded image, and not those parts that store the color palette and image information such as number of lines, pixels per line, file and the coding method.

The basic scheme is as follows. If a string of pixels is identical in color value, encode it as a special flag byte that contains the count, followed by a byte with the value of the repeated pixel. If the pixel is not repeated, simply encode it as the byte itself. Such simple schemes can often become more complicated in practice. Consider that in the above scheme, if all 256 colors in a palette are used in an image, then we need all 256 values of a byte to represent those colors. Hence, if we are going to use just bytes as our basic code unit, we have no unused byte values left to serve as a flag/count byte. On the other hand, if we use two bytes for every coded pixel to leave room for the flag/count combinations, we might double the size of pathological images instead of compressing them.

The compromise in the PCX format is based on the belief of its designers that many user-created drawings (which were the primary intended output of their software) would not use all 256 colors. So they optimized their compression scheme for the case of up to 192 colors only. Images with more colors will probably also compress well with this scheme, just not quite as well.
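The flag/count compromise can be sketched in Python as follows. The source text does not give the bit layout, so this sketch assumes the commonly described PCX convention: a byte whose two most significant bits are set (values 192 to 255) is a flag whose low six bits hold a run count of 1 to 63, and the following byte is the pixel value. The function names are hypothetical, and header and palette handling are omitted.

```python
def pcx_rle_encode(row: bytes) -> bytes:
    """PCX-style RLE sketch: flag bytes are 0xC0 | count, with count in 1..63."""
    out = bytearray()
    i = 0
    while i < len(row):
        value = row[i]
        run = 1
        while i + run < len(row) and row[i + run] == value and run < 63:
            run += 1
        if run > 1 or value >= 0xC0:
            # Runs, and single pixels that would look like a flag byte,
            # need the two-byte (flag, value) form.
            out.append(0xC0 | run)
            out.append(value)
        else:
            # A single pixel below 192 is stored as the byte itself.
            out.append(value)
        i += run
    return bytes(out)


def pcx_rle_decode(encoded: bytes) -> bytes:
    """Invert pcx_rle_encode."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        byte = encoded[i]
        if (byte & 0xC0) == 0xC0:
            count = byte & 0x3F
            out += bytes([encoded[i + 1]]) * count
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)


print(pcx_rle_encode(bytes([5, 5, 5, 5, 7])).hex())  # c40507
print(pcx_rle_encode(bytes([200])).hex())            # c1c8
```

Under this assumption, pixel values below 192 cost one byte when they are not repeated, while values of 192 and above always cost two bytes, which is consistent with the scheme being tuned for images that use at most 192 colors.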

Although we live in an analog world, most communication takes place in digital form. Since most natural sources (e.g. speech and video) are analog, they are first sampled, quantized and then processed. However, the representation of an arbitrary real number requires an infinite number of bits. Thus, a finite representation of a continuous random variable can never be perfect. Consider an analog message waveform $x(t)$ which is a sample waveform of a stochastic process $X(t)$. Assuming $X(t)$ is a bandlimited, stationary process, it can be represented by a sequence of uniform samples taken at the Nyquist rate. These samples are quantized in amplitude and encoded as a sequence of binary digits. A simple encoding strategy is to define $L$ levels and encode every sample using
$$\begin{aligned} &R=\log_{2} L \text{ bits if } L \text{ is a power of } 2, \text{ or} \\ &R=\left\lfloor\log_{2} L\right\rfloor+1 \text{ bits if } L \text{ is not a power of } 2. \end{aligned}$$
If the levels are not all equally probable, we may use entropy coding for a more efficient representation. To represent the analog waveform more accurately we need more levels, which implies more bits per sample. Theoretically, we need infinitely many bits per sample to represent an analog source perfectly. Quantization of amplitude results in data compression at the cost of signal distortion; it is a form of lossy data compression. Distortion implies some measure of difference between the actual source samples $\left\{x_{k}\right\}$ and the corresponding quantized values $\left\{\tilde{x}_{k}\right\}$.
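The bit-count rule and the trade-off between the number of levels and distortion can be illustrated with a short Python sketch. The helper names `bits_per_sample` and `uniform_quantize`, the choice of a uniform quantizer with mid-cell output values, and the input range $[lo, hi]$ are assumptions made for this illustration.

```python
def bits_per_sample(levels: int) -> int:
    """R = log2(L) if L is a power of 2, else floor(log2(L)) + 1."""
    if levels < 1:
        raise ValueError("levels must be a positive integer")
    floor_log2 = levels.bit_length() - 1  # exact floor(log2(L)) for integers
    if (levels & (levels - 1)) == 0:      # L is a power of 2
        return floor_log2
    return floor_log2 + 1


def uniform_quantize(x: float, lo: float, hi: float, levels: int) -> float:
    """Map x in [lo, hi] to the midpoint of its cell among `levels` equal cells."""
    step = (hi - lo) / levels
    index = min(int((x - lo) / step), levels - 1)  # keep x == hi in the last cell
    return lo + (index + 0.5) * step


print(bits_per_sample(8), bits_per_sample(10))  # 3 4

# Mean squared error on a fixed set of samples shrinks as L grows.
samples = [i / 1000 for i in range(1000)]
for L in (4, 16, 64):
    mse = sum((uniform_quantize(x, 0.0, 1.0, L) - x) ** 2 for x in samples) / len(samples)
    print(L, bits_per_sample(L), mse)
```

Each additional bit per sample halves the cell width in this uniform setting, so the squared error drops by roughly a factor of four per bit.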

In this section, we look at the design of optimum quantizers. Consider a continuous amplitude signal whose amplitude is not uniformly distributed, but varies according to a certain probability density function, $p(x)$. We wish to design the optimum scalar quantizer that minimizes some function of the quantization error $q=\tilde{x}-x$, where $\tilde{x}$ is the quantized value of $x$. The distortion resulting from the quantization can be expressed as
$$D=\int_{-\infty}^{\infty} f(\tilde{x}-x) p(x) d x$$
where $f(\tilde{x}-x)$ is the desired function of the error. An optimum quantizer is one that minimizes $D$ by optimally selecting the output levels and the corresponding input range of each output level. The resulting optimum quantizer is called the Lloyd-Max quantizer. For an $L$-level quantizer the distortion is given by
$$D=\sum_{k=1}^{L} \int_{x_{k-1}}^{x_{k}} f\left(\tilde{x}_{k}-x\right) p(x) d x$$

The necessary conditions for minimum distortion are obtained by differentiating $D$ with respect to $\left\{x_{k}\right\}$ and $\left\{\tilde{x}_{k}\right\}$. As a result of the differentiation process we end up with the following system of equations
$$\begin{array}{ll} f\left(\tilde{x}_{k}-x_{k}\right)=f\left(\tilde{x}_{k+1}-x_{k}\right), & k=1,2, \ldots, L-1 \\ \int_{x_{k-1}}^{x_{k}} f^{\prime}\left(\tilde{x}_{k}-x\right) p(x) d x=0, & k=1,2, \ldots, L \end{array}$$
For $f(x)=x^{2}$, i.e., the mean square value of the distortion, the above equations simplify to
$$\begin{array}{ll} x_{k}=\frac{1}{2}\left(\tilde{x}_{k}+\tilde{x}_{k+1}\right), & k=1,2, \ldots, L-1 \\ \int_{x_{k-1}}^{x_{k}}\left(\tilde{x}_{k}-x\right) p(x) d x=0, & k=1,2, \ldots, L \end{array}$$
That is, each decision boundary $x_{k}$ lies midway between the two adjacent output levels, and each output level $\tilde{x}_{k}$ is the centroid of $p(x)$ over its input interval.

Nonuniform quantizers are optimized with respect to the distortion. However, each quantized sample is represented by an equal number of bits (say, $R$ bits/sample). It is possible to have a more efficient variable length coding. The discrete source outputs that result from quantization can be characterized by a set of probabilities $\left\{p_{k}\right\}$. These probabilities can then be used to design efficient variable length codes (source coding). In order to compare the performance of different nonuniform quantizers, we first fix the distortion, $D$, and then compare the average number of bits required per sample.
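One common way to solve the two mean-square conditions numerically is to alternate between them until the levels stop changing. The Python sketch below does this on a discretized density. The function name `lloyd_max`, the truncation of the density to a finite interval $[lo, hi]$, the grid size, the fixed iteration count and the uniform starting levels are all assumptions made for illustration, and the result is only a numerical approximation.

```python
import math


def lloyd_max(pdf, lo, hi, levels, grid=4000, iters=200):
    """Approximate the MSE-optimal scalar quantizer for pdf on [lo, hi].

    Returns (boundaries, outputs): levels - 1 interior decision boundaries
    and the corresponding `levels` output values.
    """
    dx = (hi - lo) / grid
    xs = [lo + (i + 0.5) * dx for i in range(grid)]  # cell midpoints
    ws = [pdf(x) * dx for x in xs]                   # probability mass per cell
    step = (hi - lo) / levels
    outputs = [lo + (k + 0.5) * step for k in range(levels)]  # uniform start
    for _ in range(iters):
        # Condition 1: each boundary is the midpoint of adjacent output levels.
        bounds = [0.5 * (a + b) for a, b in zip(outputs, outputs[1:])]
        # Condition 2: each output level is the centroid of its interval.
        mass = [0.0] * levels
        moment = [0.0] * levels
        k = 0
        for x, w in zip(xs, ws):
            while k < levels - 1 and x > bounds[k]:
                k += 1
            mass[k] += w
            moment[k] += w * x
        outputs = [m1 / m0 if m0 > 0 else old
                   for m0, m1, old in zip(mass, moment, outputs)]
    bounds = [0.5 * (a + b) for a, b in zip(outputs, outputs[1:])]
    return bounds, outputs


def gaussian(x):
    """Zero-mean, unit-variance Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


bounds, outputs = lloyd_max(gaussian, -6.0, 6.0, 4)
print(bounds)
print(outputs)
```

For a two-level quantizer of a zero-mean, unit-variance Gaussian source, the conditions can be solved by hand: the boundary is $0$ and the output levels are $\pm\sqrt{2/\pi} \approx \pm 0.798$, which gives a simple check on the numerical result.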


