Bayesian Analysis | DATA5711


## PROBABILITY MEASURES

At the core of probability theory (and probabilistic modeling) lies the idea of a “sample space.” The sample space is a set $\Omega$ that consists of all possible elements over which we construct a probability distribution. In this book, the sample space most often consists of objects relating to language, such as words, phrase-structure trees, sentences, documents or sequences. As we see later, in the Bayesian setting, the sample space is defined to be the Cartesian product of a set of such objects and a set of model parameters (Section 1.5.1).

Once a sample space is determined, we can define a probability measure for that sample space. A probability measure $p$ is a function that assigns a real number to events, which are subsets of the sample space.
A probability measure has to satisfy three axiomatic properties:

• It has to be a non-negative function such that $p(A) \geq 0$ for any event $A$.
• For any countable sequence of pairwise disjoint events $A_{i} \subseteq \Omega$, $i \in\{1,2, \ldots\}$, with $A_{i} \cap A_{j}=\emptyset$ for $i \neq j$, it should hold that $p\left(\bigcup_{i} A_{i}\right)=\sum_{i} p\left(A_{i}\right)$. This means that the sum of the probabilities of disjoint events should equal the probability of their union.
• The probability of $\Omega$ is $1: p(\Omega)=1$.

There are a few consequences of these three axiomatic properties. The first is that $p(\emptyset)=0$ (to see this, consider that $p(\Omega)+p(\emptyset)=p(\Omega \cup \emptyset)=p(\Omega)=1$). The second is that $p(A \cup B)=p(A)+p(B)-p(A \cap B)$ for any two events $A$ and $B$ (to see this, consider that $p(A \cup B)=p(A)+p(B \backslash(A \cap B))$ and that $p(B)=p(B \backslash(A \cap B))+p(A \cap B)$). Finally, the complement of an event $A$, written $\Omega \backslash A$, satisfies $p(\Omega \backslash A)=1-p(A)$ (to see this, consider that $1=p(\Omega)=p((\Omega \backslash A) \cup A)=p(\Omega \backslash A)+p(A)$ for any event $A$).
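The axioms and their consequences can be checked numerically on a small finite sample space. The sketch below is only an illustration: the four-word vocabulary, the weights, and the helper name `p` are assumptions made for the example, not part of the text. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical toy sample space: four words with hand-picked weights
# that sum to 1 (an assumption for illustration only).
weights = {
    "the": Fraction(1, 2),
    "cat": Fraction(1, 4),
    "sat": Fraction(1, 8),
    "mat": Fraction(1, 8),
}
omega = frozenset(weights)


def p(event):
    """Probability of an event, i.e. of a subset of omega."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return sum((weights[w] for w in event), Fraction(0))


A = {"the", "cat"}
B = {"cat", "sat"}

assert all(p({w}) >= 0 for w in omega)     # non-negativity
assert p(omega) == 1                       # p(Omega) = 1
assert p(set()) == 0                       # first consequence
assert p(A | B) == p(A) + p(B) - p(A & B)  # second consequence
assert p(omega - A) == 1 - p(A)            # complement rule
```

Because the space is finite and discrete, every subset is treated as an event here, and additivity reduces to summing the weights of the elements in the subset.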

In the general case, not every subset of the sample space should be considered an event.
From a measure-theoretic point of view for probability theory, an event must be a “measurable set.” The collection of measurable sets of a given sample space needs to satisfy some axiomatic properties. A discussion of measure theory is beyond the scope of this book, but see Ash and Doléans-Dade (2000) for a thorough investigation of this topic.

For our discrete sample spaces, consisting of linguistic structures or other language-related discrete objects, this distinction between measurable sets and arbitrary subsets of the sample space is not crucial. We will consider all subsets of the sample space to be measurable, which means they can be used as events. For continuous spaces, we will be using well-known probability measures that rely on the Lebesgue measure. This means that the sample space will be a subset of a Euclidean space, and the set of events will be the subsets of this space that can be integrated over using Lebesgue integration.

## RANDOM VARIABLES

In their most basic form, random variables are functions that map each $\omega \in \Omega$ to a real value. They are often denoted by capital letters such as $X$ and $Z$. Once such a function is defined, under some regularity conditions, it induces a probability measure over the real numbers. More specifically, for any $A \subseteq \mathbb{R}$ such that the pre-image $X^{-1}(A)$, defined as $\{\omega \in \Omega \mid X(\omega) \in A\}$, is an event, its probability is:
$$p_{X}(A)=p(X \in A)=p\left(X^{-1}(A)\right),$$
where $p_{X}$ is the probability measure induced by the random variable $X$ and $p$ is a probability measure originally defined for $\Omega$. The sample space for $p_{X}$ is $\mathbb{R}$. The set of events for this sample space includes all $A \subseteq \mathbb{R}$ such that $X^{-1}(A)$ is an event in the original sample space $\Omega$ of $p$.

It is common to define a statistical model directly in terms of random variables, instead of explicitly defining a sample space and its corresponding real-valued functions. In this case, random variables do not have to be interpreted as real-valued functions, and the sample space is understood to be the range of the random variable function. For example, if one wants to define a probability distribution over a language vocabulary, then one can define a random variable $X(\omega)=\omega$ with $\omega$ ranging over words in the vocabulary. Following this, the probability of a word in the vocabulary is denoted by $p(X \in\{\omega\})=p(X=\omega)$.
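The induced measure $p_{X}(A)=p\left(X^{-1}(A)\right)$ can be sketched in a few lines. This is a minimal illustration under stated assumptions: the vocabulary, its probabilities, and the choice of word length as the random variable are all hypothetical and do not come from the text.

```python
from fractions import Fraction

# Hypothetical measure p on a small vocabulary (the sample space Omega).
p_omega = {
    "a": Fraction(1, 2),
    "cat": Fraction(1, 4),
    "dog": Fraction(1, 8),
    "bird": Fraction(1, 8),
}


def X(word):
    """Random variable: maps each outcome to a real value (its length)."""
    return len(word)


def preimage(A):
    """X^{-1}(A): all outcomes whose value under X lies in A."""
    return {w for w in p_omega if X(w) in A}


def p_X(A):
    """Induced measure on the reals: p_X(A) = p(X^{-1}(A))."""
    return sum((p_omega[w] for w in preimage(A)), Fraction(0))


print(sorted(preimage({3})))  # ['cat', 'dog']
print(p_X({3}))               # 3/8
print(p_X({1, 4}))            # 5/8
```

Note that `p_X` never assigns probabilities to real numbers directly. It only pulls each set of values back to the outcomes that produce them and measures those outcomes with the original `p_omega`.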

## CONTINUOUS AND DISCRETE RANDOM VARIABLES

This book uses the two most common kinds of random variables available in statistics: continuous and discrete. Continuous random variables take values in a continuous space, usually a subspace of $\mathbb{R}^{d}$ for $d \geq 1$. Discrete random variables, on the other hand, take values from a discrete set, which may be finite or countably infinite. In this book, discrete variables are usually denoted using capital letters such as $X, Y$ and $Z$, while continuous variables are denoted using Greek letters, such as $\theta$ and $\mu$.

The continuous variables in this book are mostly used to define a prior over the parameters of a discrete distribution, as is usually done in the Bayesian setting. See Section 1.5.2 for a discussion of continuous variables. The discrete variables, on the other hand, are used to model structures that will be predicted (such as parse trees, part-of-speech tags, alignments, clusters) or structures that are observed (such as a sentence, a string over some language vocabulary or other such sequences).

The discrete variables discussed in this book are assumed to have an underlying probability mass function (PMF), i.e., a function $p(x)$ that attaches a weight to each element $x$ in the sample space. This probability mass function induces the probability measure $p(X \in A)$, which satisfies:
$$p(X \in A)=\sum_{x \in A} p(x),$$
where $A$ is a subset of the possible values $X$ can take. Note that this equation follows from the additivity axiom of probability measures, under which the probability of an event equals the sum of the probabilities of disjoint events that precisely cover that event (singletons, in our case).
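The same sum applies when the set of possible values is countably infinite. As a rough sketch, assume a hypothetical PMF over sentence lengths, $p(x)=2^{-x}$ for $x=1,2,\ldots$; this particular PMF and the function names are chosen only for illustration and are not taken from the text.

```python
from fractions import Fraction


def pmf(x):
    """Hypothetical PMF over sentence lengths x = 1, 2, ...: p(x) = 2^(-x)."""
    if not (isinstance(x, int) and x >= 1):
        return Fraction(0)
    return Fraction(1, 2 ** x)


def prob(A):
    """p(X in A) for a finite event A, as a sum over its singletons."""
    return sum((pmf(x) for x in set(A)), Fraction(0))


short = {1, 2, 3}
print(prob(short))  # 7/8

# Additivity: splitting the event into disjoint pieces gives the same total.
assert prob(short) == prob({1}) + prob({2, 3})

# The mass left outside {1, ..., 20} is tiny, consistent with the
# full sum over all lengths being 1.
print(1 - prob(range(1, 21)))  # 1/1048576
```

Only finite events are summed exactly here. An infinite event such as "all even lengths" would require a limit of partial sums rather than a plain loop.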


