统计代写|贝叶斯统计代写beyesian statistics代考|Bayes’ theorem for parametric inference
Consider a general problem in which we have data $x$ and require inference about a parameter $\theta$. In a Bayesian analysis $\theta$ is unknown and viewed as a random variable. Thus, it possesses a density function $f(\theta)$. From Bayes’ theorem ${ }^2$, (1.6), we have $$ \begin{aligned} f(\theta \mid x) & =\frac{f(x \mid \theta) f(\theta)}{f(x)} \\ & \propto f(x \mid \theta) f(\theta) . \end{aligned} $$ Colloquially, (1.7) is Posterior $\propto$ Likelihood $\times$ Prior.
Most commonly, both the parameter $\theta$ and data $x$ are continuous. There are cases where $\theta$ is continuous and $x$ is discrete ${ }^3$. In exceptional cases $\theta$ could be discrete. The Bayesian method comprises the following principal steps.
Prior Obtain the prior density $f(\theta)$ which expresses our knowledge about $\theta$ prior to observing the data.
Likelihood Obtain the likelihood function $f(x \mid \theta)$. This step simply describes the process giving rise to the data $x$ in terms of $\theta$.
Posterior Apply Bayes’ theorem to derive posterior density $f(\theta \mid x)$ which expresses all that is known about $\theta$ after observing the data.
Inference Derive appropriate inference statements from the posterior distribution e.g. point estimates, interval estimates, probabilities of specified hypotheses.
Example 4 Beta-Binomial. Suppose that $X \mid \theta \sim \operatorname{Bin}(n, \theta)$. We specify a prior distribution for $\theta$ and consider $\theta \sim \operatorname{Beta}(\alpha, \beta)$ for $\alpha, \beta>0$ known ${ }^4$. Thus, for $0 \leq \theta \leq 1$ we have $$ f(\theta)=\frac{1}{B(\alpha, \beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1} $$ where $B(\alpha, \beta)=\frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function and $E(\theta)=\frac{\alpha}{\alpha+\beta}$. Recall that since $$ \int_0^1 f(\theta) d \theta=1, $$ we have $$ B(\alpha, \beta)=\int_0^1 \theta^{\alpha-1}(1-\theta)^{\beta-1} d \theta . $$ Using Bayes’ theorem, (1.7), the posterior is $$ \begin{aligned} f(\theta \mid x) \propto f(x \mid \theta) f(\theta) & =\binom{n}{x} \theta^x(1-\theta)^{n-x} \times \frac{1}{B(\alpha, \beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1} \\ & \propto \theta^x(1-\theta)^{n-x} \theta^{\alpha-1}(1-\theta)^{\beta-1} \\ & =\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} . \end{aligned} $$ So, $f(\theta \mid x)=c \,\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}$ for some constant $c$ not involving $\theta$. Now $$ \int_0^1 f(\theta \mid x) d \theta=1 \Rightarrow c^{-1}=\int_0^1 \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} d \theta . $$ Notice that from (1.12) we can evaluate this integral, so that $$ c^{-1}=\int_0^1 \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1} d \theta=B(\alpha+x, \beta+n-x), $$ whence $$ f(\theta \mid x)=\frac{1}{B(\alpha+x, \beta+n-x)} \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}, $$ i.e. $\theta \mid x \sim \operatorname{Beta}(\alpha+x, \beta+n-x)$. Notice the tractability of this update: the prior and posterior distributions are both from the same family of distributions, in this case the Beta family. This is an example of conjugacy.
The update is simple to perform: the number of successes observed, $x$, is added to $\alpha$ whilst the number of failures observed, $n-x$, is added to $\beta$.
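The conjugate update above can be sketched in a few lines; a minimal illustration (the prior parameters and data below are hypothetical, not taken from the text):

```python
def beta_binomial_update(alpha, beta, n, x):
    """Conjugate update: Beta(alpha, beta) prior + x successes out of
    n Binomial trials gives a Beta(alpha + x, beta + n - x) posterior."""
    return alpha + x, beta + n - x

# Hypothetical prior Beta(2, 2); observe x = 7 successes in n = 10 trials.
a_post, b_post = beta_binomial_update(2, 2, 10, 7)
print(a_post, b_post)               # posterior is Beta(9, 5)
print(a_post / (a_post + b_post))   # posterior mean E(theta | x) = 9/14
```

The successes are added to $\alpha$ and the failures to $\beta$, exactly as described.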
Consider a problem where we wish to make inferences about a parameter $\theta$ given data $x$. In a classical setting the data is treated as if it is random, even after it has been observed, and the parameter is viewed as a fixed unknown constant. Consequently, no probability distribution can be attached to the parameter. Conversely in a Bayesian approach parameters, having not been observed, are treated as random and thus possess a probability distribution whilst the data, having been observed, is treated as being fixed.
Example 1 Suppose that we perform $n$ independent Bernoulli trials in which we observe $x$, the number of times an event occurs. We are interested in making inferences about $\theta$, the probability of the event occurring in a single trial. Let’s consider the classical approach to this problem. Prior to observing the data, the probability of observing $x$ was $$ P(X=x \mid \theta)=\binom{n}{x} \theta^x(1-\theta)^{n-x} . $$
This is a function of the (future) $x$, assuming that $\theta$ is known. If we know $x$ but don’t know $\theta$ we can treat (1) as a function of $\theta$, $L(\theta)$, the likelihood function. We then choose the value of $\theta$ which maximises this likelihood. The maximum likelihood estimate is $\frac{x}{n}$ with corresponding estimator $\frac{X}{n}$.
In the general case, the classical approach uses an estimate $T(x)$ for $\theta$. Justifications for the estimate depend upon the properties of the corresponding estimator $T(X)$ (bias, consistency, …) using its sampling distribution (given $\theta$ ). That is, we treat the data as being random even though it is known! Such an approach can lead to nonsensical answers.
Example 2 Suppose in the Bernoulli trials of Example 1 we wish to estimate $\theta^2$. The maximum likelihood estimator ${ }^1$ is $\left(\frac{X}{n}\right)^2$. However, this is a biased estimator as $$ \begin{aligned} E\left(X^2 \mid \theta\right) & =\operatorname{Var}(X \mid \theta)+E^2(X \mid \theta) \\ & =n \theta(1-\theta)+n^2 \theta^2 \\ & =n \theta+n(n-1) \theta^2 , \end{aligned} $$ so that $E\left\{\left(\frac{X}{n}\right)^2 \mid \theta\right\}=\frac{\theta}{n}+\frac{n-1}{n} \theta^2=\theta^2+\frac{\theta(1-\theta)}{n} \neq \theta^2$.
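The bias of $\left(\frac{X}{n}\right)^2$ can be checked numerically by computing its exact expectation over the binomial pmf; a small sketch with hypothetical values of $n$ and $\theta$:

```python
from math import comb

def expected_sq_mle(n, theta):
    """Exact E[(X/n)^2] for X ~ Bin(n, theta), summing over the pmf."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) * (x / n)**2
               for x in range(n + 1))

n, theta = 10, 0.3
exact = expected_sq_mle(n, theta)
formula = theta**2 + theta * (1 - theta) / n  # theta^2 plus the bias term
print(exact, formula, theta**2)  # exact matches formula; both exceed theta^2
```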
Bayes’ theorem
Let $X$ and $Y$ be random variables with joint density function $f(x, y)$. The marginal distribution of $Y$, $f(y)$, is the joint density function averaged over all possible values of $X$, $$ f(y)=\int_X f(x, y) d x . $$ For example, if $Y$ is univariate and $X=\left(X_1, X_2\right)$ where $X_1$ and $X_2$ are univariate then $$ f(y)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f\left(x_1, x_2, y\right) d x_1 d x_2 . $$ The conditional distribution of $Y$ given $X=x$ is $$ f(y \mid x)=\frac{f(x, y)}{f(x)} $$ so that by substituting (1.2) into (1.1) we have $$ f(y)=\int_X f(y \mid x) f(x) d x , $$ which is often known as the theorem of total probability. $X$ and $Y$ are independent if and only if $$ f(x, y)=f(x) f(y) . $$ Substituting (1.2) into (1.3) we see that an equivalent result is that $$ f(y \mid x)=f(y) $$
so that independence reflects the notion that learning the outcome of $X$ gives us no information about the distribution of $Y$ (and vice versa). If $Z$ is a third random variable then $X$ and $Y$ are conditionally independent given $Z$ if and only if $$ f(x, y \mid z)=f(x \mid z) f(y \mid z) . $$
There are those who think philosophers have already spilled more ink on the paradoxes of confirmation than they are worth, others who think them among the deepest conceptual knots in the foundations of knowledge. Like the problem of free will, the Goodman paradox owes much of its fascination to the way in which it combines urgent and topical philosophical concerns, above all, interest in the inductive roots of language and the linguistic roots of our theoretical construction of the world. My own view is that the paradoxes have at least one lesson, fraught with significance, to convey: that general laws are not necessarily confirmed by their positive cases. I. J. Good ${ }^1$ was, I believe, the first to both point this out and make a convincing case. One of his examples is rather artificial. He invites us to imagine that the live possibilities have been narrowed to just two: either the world contains a single white raven and a vast number of black ravens, or else it contains no white ravens and a modest number of black ravens. (In either case, of course, it may contain other things as well.) Since a random sample of the general population is more likely to contain a black raven in the first case than in the second, the first possibility is confirmed (i.e., made more probable). Hence, by confirming the first possibility (many black ravens and a single white raven), observation of a black raven disconfirms the hypothesis that all ravens are black.
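Good’s two-possibility argument can be checked with a toy calculation; the sampling rates below are hypothetical stand-ins for “a vast number” versus “a modest number” of black ravens:

```python
def posterior_all_black(p_black_world1, p_black_world2, prior_world2=0.5):
    """P(world 2 | a black raven is sampled) under a two-world prior.
    World 1: many black ravens plus one white raven ('all black' is false).
    World 2: a modest number of black ravens, none white ('all black' is true)."""
    prior_world1 = 1 - prior_world2
    num = p_black_world2 * prior_world2
    den = num + p_black_world1 * prior_world1
    return num / den

# Hypothetical rates: a random individual is a black raven with probability
# 0.01 in world 1 (many ravens) but only 0.001 in world 2 (few ravens).
post = posterior_all_black(0.01, 0.001)
print(post)  # below 0.5: the black raven *disconfirms* 'all ravens are black'
```

Observing a black raven favours the raven-rich world, which is precisely the world containing the white raven.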
Good’s other example is less precise but more suggestive. Crows and ravens being related species, observation of white crows (mutants, perhaps) would tend rather to lower than to raise the probability that all ravens (including mutants) are black. This example provides more insight. When we fill out the description of a non-black non-raven to include its being a white crow, we bring relevant background knowledge into the foreground. The same dramatic effect on our probabilities is illustrated in a somewhat sharper form when we fill out our description of a non-reactive specimen of non-$\mathrm{U}^{238}$ (the heavy isotope of uranium) to include its being an inert specimen of $\mathrm{U}^{235}$ (the lighter isotope). Atomic theory instructs us that the chemical properties of an element are independent of isotopy, and so we should hardly expect observation of inert specimens of the lighter isotope of uranium to increase our confidence that samples of the heavier isotope are reactive. There is an even better example to illustrate the point, one which has the virtue of bringing knowable probabilities into play.
Consider the classical problem of matches. ${ }^2$ A typical case would be distributing $N$ hats at random among their $N$ owners; the problem is to compute the probability of a match (a man receiving his own hat). Let $H$ be the hypothesis that no man receives his own hat (no matches). Of the first two men queried, we learn that neither received his own hat (in conformity with $H$ ). This outcome, call it $X$, will confirm $H$. But let us see what happens when we pick out various subevents of $X$.
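The probability that no man receives his own hat is the classical derangement probability $D_N/N!$; a minimal sketch via inclusion-exclusion:

```python
from math import factorial

def prob_no_match(N):
    """P(no man receives his own hat) when N hats are dealt at random:
    D_N / N! = sum_{k=0}^{N} (-1)^k / k!, by inclusion-exclusion."""
    return sum((-1)**k / factorial(k) for k in range(N + 1))

print(prob_no_match(4))   # 3/8 = 0.375
print(prob_no_match(20))  # already very close to 1/e ~ 0.3679
```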
RESOLUTION OF THE PARADOXES
The linchpin of the Goodman paradox is the inference from ‘a green emerald examined before time $t$ is a grue emerald’ to ‘examination of an emerald before time $t$ which proves to be green confirms the hypothesis that all emeralds are grue’. But when we fill out our description of a grue emerald to include its being green and examined prior to time $t$, we single out a subevent, and no inference to the confirmation of the grue hypothesis can be drawn. No more than we can infer confirmation of the reactivity of the heavy isotope of uranium by inert specimens of the light isotope from the fact that the latter are non-samples of the heavy isotope.
Nor, for that matter, can we even infer confirmation of the grue hypothesis by observation of grue emeralds, for, in general, as Good’s first example illustrates, we cannot conclude confirmation of ‘All $A$ are $B$’ from observation of $AB$’s. The possible worlds (i.e., the possible states of the actual world with respect to a specific population and set of properties) which are assigned high prior probability in the light of background knowledge and contain many $AB$’s may all contain an $A$ which is non-$B$. Just as in Good’s example, finding an $AB$ would, by raising the probabilities of these possible worlds, lower the probability that all $A$ are without exception $B$.
The same is true, a fortiori, of non-$A$ $B$’s. In fact, it is quite easy to think up cases where observation of a non-$A$ $B$ would disconfirm ‘All $A$ are $B$’. This would be true, for example, if the numbers of $A$’s and $B$’s were known and finite. (For ‘All $A$ are $B$’ to have non-zero prior probability would then require that the known number of $B$’s exceed the known number of $A$’s.) Each non-$A$ $B$ found would then reduce the probability that all $A$ are $B$, the probability vanishing entirely when the observed number of non-$A$ $B$’s surpassed the known excess of $B$’s over $A$’s.
When background knowledge is admitted, e.g., in the form of a probability distribution over possible states of the considered population, very little can be inferred in general about the confirmation of a general law by its ‘positive cases’ (in any straightforward sense of this term). Given a probabilistic analysis of confirmation, then, the paradoxes are stopped dead in their tracks. We cannot infer the confirmation of the grue hypothesis by grue emeralds (much less by green emeralds), nor that of the raven hypothesis by white shoes or red herrings.
An experiment $X$ (or channel) is noiseless for $\theta$ if $H_\theta(X)=0$. I.e., knowledge of the true state or transmitted message removes all uncertainty regarding the outcome of $X$. Each outcome $x$ of $X$ may then be identified with the set of states $\theta$ such that $x$ occurs when $\theta$ obtains, and so $X$ is effectively a partition of $\Theta$, the set of states or possible messages. Consider now sequences of experiments or repetitions of an experiment where at each step there are $n$ possible outcomes. Following Sneed (1967), I shall speak of $n$-ary questioning procedures. Given a noiseless channel, our problem may be to find the most efficient of all the $n$-ary questioning procedures ($n$ is typically a function of the channel). It is not hard to see that the most efficient maximizes the $ESI$ or transmitted information. (This maximum is often called the channel capacity.) For noiseless channels, $T(X ; \theta)=T(\theta ; X)=H(X)-H_\theta(X)=H(X)$. The best questioning procedure therefore maximizes $H(X)$, the outcome entropy, at each step. In particular, this procedure will identify the true state or message in a minimum number of steps, provided all partitions are feasible. In general, we must distinguish between the average number of steps a procedure takes to identify the true state and the number of steps it requires to identify the true state. The latter is found by assuming the a priori least favorable distribution of states, namely the uniform distribution. For equiprobable messages, the best questioning procedure partitions the set of live possibilities into equinumerous subsets. I refer to this principle as the uniform partition strategy. For the problem of locating a square on a checkerboard discussed earlier, this strategy directs us to divide the number of remaining squares in half at each step. The following example further illustrates the efficiency of the uniform partition strategy.
EXAMPLE 5 (the odd ball). Given twelve steel balls, eleven of which are of the same weight, the problem is to locate the odd ball in three weighings with a pan balance, and to determine whether the odd ball is heavier or lighter than the eleven standard weights. (Thus, we seek a 3-ary questioning procedure that requires only three steps.) I number the balls $1, \ldots, 12$, and assign each of the 24 possible states $1 H, 1 L, 2 H, 2 L, \ldots, 12 H, 12 L$ (‘$H$’ for ‘heavier’, ‘$L$’ for ‘lighter’) equal probability. To ensure noiselessness, I permit only weighings of equal numbers of balls. (Before reading on, the reader may wish to attempt a solution of this problem by trial and error.)
Solution. The uniform partition strategy determines the best first weighing as four against four (not, as many people initially guess, six against six). Say we weigh $1,2,3,4$ against $5,6,7,8$. Then all three possible outcomes of the weighing are equiprobable and the set of 24 possibilities is uniformly partitioned into three sets of 8 elements each. E.g., if the left pan is heavier, the unexcluded possibilities are $1 H, 2 H, 3 H, 4 H, 5 L, 6 L, 7 L, 8 L$. Given this outcome, let us find a best second weighing. Since there are 8 remaining possibilities, the best second weighing will partition this set into three subsets of 3,3 and 2 elements, the best feasible approximation to a uniform partition. Weighing $1,2,9$ against $3,4,5$ achieves this most nearly uniform partition, and is therefore a best second weighing. (N.B., 9 is known to be a standard weight.) Whatever outcome this best second weighing produces, the true state can be found on a third weighing. E.g., if the pans balance on the second weighing, leaving the possibilities $6 L, 7 L, 8 L$, weigh ball 6 against ball 7. If they balance, you are left with $8 L$, etc. The reader is invited to find a best second weighing in the case where the pans balance on the first weighing. Pursuit of the uniform partition strategy will yield the solution in three weighings whatever the outcome of each weighing. I.e., this questioning procedure requires only three questions.
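The advantage of the four-against-four first weighing over the intuitive six-against-six can be verified by computing the entropy, in trits, of each weighing’s outcome over the 24 equiprobable states; a small sketch:

```python
from math import log

def weighing_entropy(k, n_states=24):
    """Entropy (base 3) of the first weighing's outcome with k balls per pan,
    over 24 equiprobable states 1H, 1L, ..., 12H, 12L.
    Left pan heavy: k heavy-left states + k light-right states = 2k, and
    symmetrically for right; balance covers the remaining states."""
    counts = [2 * k, 2 * k, n_states - 4 * k]
    probs = [c / n_states for c in counts if c > 0]
    return -sum(p * log(p, 3) for p in probs)

print(weighing_entropy(4))  # 1.0 trit: uniform 8/8/8 partition, maximal
print(weighing_entropy(6))  # ~0.63 trit: balance is impossible, 12/12/0
```

The 4-vs-4 weighing attains the maximal one trit per step, which is why three weighings suffice for the 24 states ($3^3 = 27 \geq 24$).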
INFORMATION
Our discussion has by no means exhausted the measures of information that have been proposed. I have focused on what seem to me the most fundamental and most useful concepts, and on those which play a role in subsequent chapters. I have also wholly neglected the vast psychological literature dealing with applications of information theory to learning, perception and related problem areas. Much of this material is relevant to our concerns and highly suggestive, and so this is a serious omission. For a useful introduction to this literature, consult Attneave (1954), (1959), and Garner (1962). One would expect the ‘disinterested’ measures studied here to induce the same (or nearly the same) ranking of experiments, but I have not investigated the matter in detail (nor has anyone else, to my knowledge). When we compare ‘interested’ with ‘disinterested’ measures, on the other hand, the matter is quite otherwise. The EVSI, we saw (Example 6), is not an increasing function of the $ESI$ and the two can induce opposite rankings of the same pair of experiments. Consider another ‘interested’ measure (Blackwell and Girshick, 1954) which ranks one experiment higher than another if any loss function attainable with the latter is attainable with the former. As Lindley (1956) shows, one experiment ranked higher than another by this method must also have higher $ESI$, but the converse fails. Blackwell and Girshick show, for example, that in comparing the hypothesis that two traits $F$ and $G$ are unassociated with any alternative of dependence (where the proportions with which the two traits $F, G$ occur in the general population are known), it is most informative to sample that one of the four traits $F, G$, non-$F$, non-$G$ which is rarest in the considered population. This result can be verified directly for the ESI, and it follows from Lindley’s more general result.
If utility one is assigned to the ‘acceptance’ of a true hypothesis and utility zero to the ‘acceptance’ of a false hypothesis, then expected loss reduces to the expected proportion of errors (i.e., of false accepted hypotheses). Lindley’s result, seen in this light, is somewhat reassuring. On the other hand, as Marschak (1974) observes, if $a_i$ is the action of affirming $H_i$, and we posit the ‘disinterested’ utilities $U\left(a_i, s_j\right)=\delta_{i j}$ (Kronecker’s delta, which is 1 or 0 according as $i=j$ or $i \neq j$ ), then the $CVSI$ of outcome $x$ becomes $$ \text { (1.34) } \quad \max _i P\left(H_i \mid x\right)-\max _i P\left(H_i\right) , $$ as the reader can easily verify. However, not even this drastic constraint on the scientist’s utilities will ensure that the $EVSI$ and $ESI$ induce the same ranking of two or more experiments. One has only to note that the entropy of one distribution can exceed that of a second even though the maximal element of the first also exceeds the maximal element of the second.
THE UTILITY OF INFORMATION
This section concerns risky decision making, or decision making with incomplete knowledge of the state of nature. For example, the decision might involve classifying a patient as infected or uninfected, marketing or withholding a new drug, or determining an optimal allocation of stock. When one of the options is best under every possible circumstance, the choice is clear (the so-called ‘sure-thing principle’). In general, though, the best course of action depends on which state of nature obtains. It is clear that if one has fairly sharply defined probabilities for the different states, and fairly well defined views on the desirability of performing the several actions under the considered states, then the best action is that which has highest utility at the most probable states. If numerical utilities and probabilities are assigned, we are led to a sharper, quantitative form of this principle; choose that action which maximizes expected utility. The expected utility of an action is the weighted average of its utilities under the several states, the weights being the respective probabilities of those states. The rule in question is variously referred to as the expected utility rule or the Bayes decision rule. An action which is best by the lights of the rule (i.e., an action which maximizes expected utility) is called a Bayes act.
In many cases, numerical utilities can be identified with monetary payoffs for practical purposes. But utility cannot generally be identified with money; it depends on such additional factors as the prospective uses to which the money is put, levels of aspiration, externalities, risk aversion, and, of course, on the agent’s initial fortune. Thus, ten dollars generally has more utility for a pauper than for a millionaire, and more utility still for a pauper who needs just ten dollars more to realize a life-long ambition. Moreover, it may not be easy to find a monetary equivalent for the dire consequences of an inappropriate decision (which might even result in death). We shall not enter into these and other complications here, since our interest is largely confined to the bearing of probability and probability changes on decisions taken. In what follows, therefore, I take utilities (or payoffs) as given, but my treatment of probabilities will be more realistic. To see right off how the Bayes rule may apply where probabilities are incompletely known, consider a simple two-act two-state decision problem, whether or not to invest $\$ 5000.00$ in a corporate stock, with payoffs given in Table I.
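The Bayes decision rule itself can be sketched directly; the payoffs and probabilities below are hypothetical placeholders (Table I is not reproduced in this excerpt):

```python
def bayes_act(utilities, probs):
    """Pick the action maximizing expected utility.
    utilities[action] is a list of payoffs, one per state of nature;
    probs[j] is the probability of state j (summing to 1)."""
    def eu(action):
        return sum(u * p for u, p in zip(utilities[action], probs))
    return max(utilities, key=eu), {a: eu(a) for a in utilities}

# Hypothetical payoffs: states 'stock rises' / 'stock falls',
# actions 'invest' / 'hold' (cash, with zero payoff either way).
utilities = {"invest": [500.0, -300.0], "hold": [0.0, 0.0]}
act, eus = bayes_act(utilities, [0.6, 0.4])
print(act, eus)  # invest: 0.6 * 500 - 0.4 * 300 = 180 > 0, so invest is the Bayes act
```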
We take up now the theory of disinterested information; the problem here is to maximize information without regard to its utility for any particular decision problem. We base our treatment on the obvious analogy between experimentation and communication over a noisy channel. The parameter space of an assumed model plays the role of source or message ensemble from which the input, the state of nature or setting of the parameters, is selected. The input is then transmitted by the experiment to the target, the experimenter, who proceeds to decode the message. Noise enters in the form of sampling error, systematic bias, the masking effects of hidden variables, uncontrolled variation in the experimental materials, and so forth. The information transmitted from source to target measures the average amount by which the experiment reduces the experimenter’s uncertainty regarding the parameters of his model. Thus, for a fixed model, each of the available experiments has an associated expected yield of information. If, as we assume throughout, information is the goal of research, then the experimenter should, by performing the experiment with the highest expected yield of information, select the least noisy of the available channels. In a straightforward sense, that experiment can be regarded as most sensitive, or as providing the weightiest evidence.
Transmitted information does not depend on the direction of flow, that is, on whether the parameter space of the model or the outcome space of the experiment plays the role of source. The symmetry suggests that, for purposes of predicting the outcome of a fixed experiment, that model should be preferred which transmits, on the average, a maximal quantity of information regarding the outcome space. Such models can be said to be maximally informative with respect to the experiment.
The first thing we need in carrying out the projected development is a suitable measure of uncertainty. Let $X$ be a discrete ${ }^2$ random variable with possible values $x_i$ which have probabilities $p_i, i=1, \ldots, m$. By the entropy of $X$ (or its distribution) is meant the quantity: (1.7) $\quad H\left(p_1, \ldots, p_m\right)=-\Sigma_i p_i \log p_i$. Logarithms may be taken to any base $b>1$. While we confine discussion to the discrete case, theorems given here can be extended to continuous random variables. ${ }^3$
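The entropy in (1.7) is straightforward to compute; a minimal sketch:

```python
from math import log

def entropy(probs, base=2):
    """H(p1, ..., pm) = -sum_i p_i log p_i, with 0 * log 0 treated as 0."""
    return -sum(p * log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit for a fair coin
print(entropy([0.25] * 4))   # 2.0 bits: the uniform distribution maximizes entropy
print(entropy([1.0]))        # 0.0: a sure outcome carries no uncertainty
```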
The Bayes theorem for probability
The Bayes theorem allows us to calculate probabilities of events when additional information for some other events is available. For example, a person may have a certain disease whether or not they show any symptoms of it. Suppose a randomly selected person is found to have the symptom. Given this additional information, what is the probability that they have the disease? Note that having the symptom does not fully guarantee that the person has the disease.
To formally state the Bayes theorem, let $B_{1}, B_{2}, \ldots, B_{k}$ be a set of mutually exclusive and exhaustive events and let $A$ be another event with positive probability (see illustration in Figure 4.1). The Bayes theorem states that for any $i, i=1, \ldots, k$, $$ P\left(B_{i} \mid A\right)=\frac{P\left(B_{i} \cap A\right)}{P(A)}=\frac{P\left(A \mid B_{i}\right) P\left(B_{i}\right)}{\sum_{j=1}^{k} P\left(A \mid B_{j}\right) P\left(B_{j}\right)} . $$ Example 4.1. We can understand the theorem using a simple example. Consider a rare disease that is thought to occur in $0.1 \%$ of the population. Using a particular blood test a physician observes that, of the patients with the disease, $99 \%$ show a particular symptom. Also assume that $1 \%$ of the population without the disease have the same symptom. A randomly chosen person from the population is blood tested and is shown to have the symptom. What is the conditional probability that the person has the disease?
Here $k=2$ and let $B_{1}$ be the event that a randomly chosen person has the disease and $B_{2}$ is the complement of $B_{1}$. Let $A$ be the event that a randomly chosen person has the symptom. The problem is to determine $P\left(B_{1} \mid A\right)$. We have $P\left(B_{1}\right)=0.001$ since $0.1 \%$ of the population has the disease, and $P\left(B_{2}\right)=0.999$. Also, $P\left(A \mid B_{1}\right)=0.99$ and $P\left(A \mid B_{2}\right)=0.01$. Now $$ \begin{aligned} P(\text { disease } \mid \text { symptom })=P\left(B_{1} \mid A\right) &=\frac{P\left(A \mid B_{1}\right) P\left(B_{1}\right)}{P\left(A \mid B_{1}\right) P\left(B_{1}\right)+P\left(A \mid B_{2}\right) P\left(B_{2}\right)} \\ &=\frac{0.99 \times 0.001}{0.99 \times 0.001+0.01 \times 0.999} \\ &=\frac{99}{99+999} \approx 0.09 . \end{aligned} $$ The probability of disease given symptom here is very low, only about $9 \%$, since the
disease is a very rare disease and there will be a large percentage of individuals in the population who have the symptom but not the disease, highlighted by the figure 999 as the second last term in the denominator above.
It is interesting to see what happens if the same person is found to have the same symptom in another independent blood test. In this case, the prior probability of $0.001$ would get revised to $0.09$, and the revised posterior probability is given by: $$ P(\text { disease } \mid \text { twice positive })=\frac{0.99 \times 0.09}{0.99 \times 0.09+0.01 \times 0.91} \approx 0.907 . $$ As expected, this probability is much higher since it combines the evidence from two independent tests. This illustrates an aspect of the Bayesian world view: the prior probability gets continually updated in the light of new evidence.
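The two-stage calculation can be reproduced with a small sketch, in which the posterior from the first test serves as the prior for the second:

```python
def bayes_update(prior, sens, false_pos):
    """P(disease | positive test) from the prior P(disease), the
    sensitivity P(+ | disease), and the false-positive rate P(+ | no disease)."""
    num = sens * prior
    return num / (num + false_pos * (1 - prior))

p1 = bayes_update(0.001, 0.99, 0.01)  # after the first positive test
p2 = bayes_update(p1, 0.99, 0.01)     # second test, with p1 as the new prior
print(p1, p2)                         # roughly 0.090 and 0.907
```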
Bayes theorem for random variables
The Bayes theorem stated above is generalized for two random variables instead of two events $A$ and $B_{i}$’s as noted above. In the generalization, the $B_{i}$’s will be replaced by the generic parameter $\theta$ which we want to estimate, and $A$ will be replaced by the observation random variable denoted by $Y$. Also, the probabilities of events will be replaced by the probability (mass or density) function of the argument random variable. Thus, $P\left(A \mid B_{i}\right)$ will be substituted by $f(y \mid \theta)$ where $f(\cdot)$ denotes the probability (mass or density) function of the random variable $Y$ given a particular value of $\theta$. The replacement for $P\left(B_{i}\right)$ is $\pi(\theta)$, which is the prior distribution of the unknown parameter $\theta$. If $\theta$ is
a discrete parameter taking only finitely many values, say $k$, then the summation in the denominator of the above Bayes theorem will stay as it is, since $\sum_{j=1}^{k} \pi\left(\theta_{j}\right)$ must equal 1 as the total probability. If, however, $\theta$ is a continuous parameter, then the summation in the denominator of the Bayes theorem must be replaced by an integral over the range of $\theta$, which is generally taken as the whole of the real line.
The Bayes theorem for random variables is now stated as follows. Suppose that two random variables $Y$ and $\theta$ are given with probability density functions (pdfs) $f(y \mid \theta)$ and $\pi(\theta)$; then $$ \pi(\theta \mid y)=\frac{f(y \mid \theta) \pi(\theta)}{\int_{-\infty}^{\infty} f(y \mid \theta) \pi(\theta) d \theta}, \quad-\infty<\theta<\infty . $$ The probability distribution given by $\pi(\theta)$ captures the prior beliefs about the unknown parameter $\theta$ and is the prior distribution in the Bayes theorem. The posterior distribution of $\theta$ is given by $\pi(\theta \mid y)$ after observing the value $y$ of the random variable $Y$. We illustrate the theorem with the following example. Example 4.2 (Binomial). Suppose $Y \sim \operatorname{binomial}(n, \theta)$ where $n$ is known and we assume a $\operatorname{Beta}(\alpha, \beta)$ prior distribution for $\theta$. Here the likelihood function is $$ f(y \mid \theta)=\binom{n}{y} \theta^{y}(1-\theta)^{n-y} $$ for $0<\theta<1$. The function $f(y \mid \theta)$ is to be viewed as a function of $\theta$ for a given value of $y$, although its argument is written as $y \mid \theta$ instead of $\theta \mid y$. This is because we use the probability density function of $Y$, which is more widely known, and we avoid introducing further notation for the likelihood function, e.g. $L(\theta ; y)$.
Suppose that the prior distribution is the beta distribution (A.22) having density $$ \pi(\theta)=\frac{1}{B(\alpha, \beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}, \quad 0<\theta<1 . $$
Sequential updating of the posterior distribution
Consider the denominator in the posterior distribution. The denominator, $\int_{-\infty}^{\infty} f(y \mid \theta) \pi(\theta) d \theta$ or $\int_{-\infty}^{\infty} f\left(y_{1}, \ldots, y_{n} \mid \theta\right) \pi(\theta) d \theta$, is free of the unknown parameter $\theta$ since $\theta$ is only a dummy variable in the integral and has been integrated out of the expression. The posterior distribution $\pi\left(\theta \mid y_{1}, \ldots, y_{n}\right)$ is to be viewed as a function of $\theta$, and the denominator is merely a constant. That is why we often ignore the constant denominator and write the posterior distribution $\pi\left(\theta \mid y_{1}, \ldots, y_{n}\right)$ as $$ \pi\left(\theta \mid y_{1}, \ldots, y_{n}\right) \propto f\left(y_{1}, \ldots, y_{n} \mid \theta\right) \times \pi(\theta) . $$ By noting that $f\left(y_{1}, \ldots, y_{n} \mid \theta\right)$ provides the likelihood function of $\theta$ and $\pi(\theta)$ is the prior distribution for $\theta$, we write: Posterior $\propto$ Likelihood $\times$ Prior. Hence we always know the posterior distribution up to a normalizing constant. Often we are able to identify the posterior distribution of $\theta$ just by looking at the numerator, as in the two preceding examples.
The structure of the Bayes theorem allows sequential updating of the posterior distribution. By Bayes theorem we “update” the prior belief $\pi(\theta)$ to $\pi(\theta \mid \mathbf{y})$. Note that $\pi\left(\theta \mid y_{1}\right) \propto f\left(y_{1} \mid \theta\right) \pi(\theta)$ and, if $Y_{2}$ is independent of $Y_{1}$ given the parameter $\theta$, then: $$ \begin{aligned} \pi\left(\theta \mid y_{1}, y_{2}\right) & \propto f\left(y_{2} \mid \theta\right) f\left(y_{1} \mid \theta\right) \pi(\theta) \\ & \propto f\left(y_{2} \mid \theta\right) \pi\left(\theta \mid y_{1}\right). \end{aligned} $$ Thus, at the second stage of data collection, the first stage posterior distribution $\pi\left(\theta \mid y_{1}\right)$ acts as the prior distribution for updating our belief about $\theta$ after observing $y_{2}$. The Bayes theorem therefore shows how knowledge about the state of nature represented by $\theta$ is continually modified as new data become available. Another strong point emerges from this sequential updating: it is possible to start with a very weak prior distribution $\pi(\theta)$ and, upon observing data sequentially, the prior distribution gets revised to a stronger one, e.g. $\pi\left(\theta \mid y_{1}\right)$ when just one observation has been recorded, assuming of course that the data are informative about the unknown parameter $\theta$.
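The equivalence of sequential and batch updating can be checked numerically; the following Python sketch uses hypothetical Bernoulli data with a conjugate Beta prior:

```python
# A numerical check (hypothetical Bernoulli data) that sequential updating
# reproduces the batch posterior under a conjugate Beta prior.

def update(alpha, beta, y):
    # One Bernoulli observation y in {0, 1}: Beta(a, b) -> Beta(a + y, b + 1 - y)
    return alpha + y, beta + 1 - y

data = [1, 0, 1, 1, 0, 1]

# Sequential: each stage's posterior is the prior for the next observation
a, b = 1.0, 1.0                      # Beta(1, 1), i.e. a uniform prior
for y in data:
    a, b = update(a, b, y)

# Batch: condition on all observations at once
a_batch = 1.0 + sum(data)
b_batch = 1.0 + len(data) - sum(data)

print((a, b) == (a_batch, b_batch))  # True
```

The order in which the observations arrive does not matter either, since only the running totals of successes and failures enter the posterior.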
Recall the air pollution data set nysptime introduced in Section $1.3.1$, which contains the daily maximum ozone concentration values at the 28 sites, shown in Figure $1.1$, in the state of New York for the 62 days in July and August 2006. From this data set we have created a spatial data set, named nyspatial, which contains the average air pollution and the average values of the three covariates at the 28 sites. Figure $3.1$ provides a histogram for the response, average daily ozone concentration levels, at the 28 monitoring sites. The plot does not show a symmetric bell-shaped histogram but it does admit the possibility of a unimodal distribution for the response. The $R$ command used to draw the plot is given below:
The geom_histogram command has been invoked with a bin width argument of $4.5$. The shape of the histogram will change if a different bin width is supplied. As is well known, a lower value will provide a lesser degree of smoothing while a higher value will produce more smoothing by collapsing the number of classes. It is also possible to adopt a different scale, e.g. square root or logarithm, but we have not done so here in order to illustrate modeling on the original scale of the data. We shall explore different scales for the spatio-temporal version of this data set.
Figure $3.2$ provides a pair-wise scatter plot of the response against the three explanatory covariates: maximum temperature, wind speed and relative humidity. The diagonal panels in this plot provide kernel density estimates of the variables. This plot reveals that wind speed correlates the most with ozone levels at this aggregated average level. As is well known, see e.g. Sahu and Bakar (2012a), the maximum temperature also positively correlates with the ozone levels. Relative humidity is seen to have the least amount of correlation with ozone levels. This plot has been obtained using the commands given below.
Exploring spatio-temporal point reference data
This section illustrates EDA methods with the nysptime data set in bmstdr. To decide the modeling scale, Figure $3.5$ plots the mean against the variance for each site on the original scale and also on the square root scale of the response. A stronger linear mean-variance relationship, with a larger slope for the superimposed regression line, is observed on the original scale, making this scale less suitable for modeling purposes. This is because in linear statistical modeling we often model the mean as a function of the available covariates and assume equal variance (homoscedasticity) for the residual differences between the observed and modeled values. A word of caution here is that the right panel does not show a complete lack of mean-variance relationship. However, we still prefer to model on the square root scale to stabilize the variance, and in this case the predictions we make in Chapter 7 for ozone concentration values do not become negative.
Temporal variations are illustrated in Figure $3.6$ for all 28 sites and in Figure $3.8$ for the 8 sites which have been used for model validation purposes in Chapters 6 and 7. Figure $3.7$ shows variations of ozone concentration values for the 28 monitoring sites. Suspected outliers, data values which are at a distance beyond $1.5$ times the interquartile range from the whiskers, are plotted as red stars. Such high values of ground level ozone pollution are especially harmful to humans.
Exploring areal Covid-19 case and death data
This section explores the Covid-19 mortality data introduced in Section 1.4.1. The bmstdr data frame engtotals contains the aggregated number of deaths along with other relevant information for analyzing and modeling this data set. The data frame object engdeaths contains the death numbers for the 20 weeks from March 13 to July 31, 2020. These two data sets will be used to illustrate spatial and spatio-temporal modeling for areal data in Chapter 10. Typical such areal data are represented by a choropleth map which uses shades of color or grey scale to classify values into a few broad classes, like a histogram. Two choropleth maps have been provided in Figure $1.9$.
For the engtotals data set the minimum and maximum numbers of deaths were 4 and 1223, for the City of London (a very small borough within greater London with population 9721) and Birmingham (with population $1,141,816$ in 2019) respectively. However, the minimum and maximum death rates per 100,000 were $10.79$ and $172.51$, for Hastings (in the South East) and Hertsmere (near Watford in greater London) respectively.
Calculation of the Moran’s I for the number and rate of deaths is performed using the moran.mc function in the library spdep. This function requires the spatial adjacency matrix in a list format, which is obtained by the poly2nb and nb2listw functions in the spdep library. The Moran’s I statistics for the raw observed death numbers and the rate are found to be $0.34$ and $0.45$ respectively, both with a p-value smaller than $0.001$ for the null hypothesis of no spatial autocorrelation. Permutation tests in statistics randomly permute the observed data and then calculate the relevant statistic for a number of replications. These replicate values are used to approximate the null distribution of the statistic, against which the observed value of the statistic for the observed data is compared, and an approximate p-value is found. The tests with Geary’s $\mathrm{C}$ statistic gave a p-value of less than $0.001$ for the death rate per 100,000 but the p-value was higher, $0.025$, for the un-adjusted observed Covid death numbers. Thus, the higher degree of spatial variation in the death rates has been successfully detected by the Geary’s statistic. The code lines to obtain these results are given below.
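The book's computations use spdep in R; as a language-agnostic illustration, such a permutation test for Moran's $I$ can be sketched in Python, with an invented toy adjacency matrix and data:

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for data y and (binary) proximity matrix W."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def moran_perm_test(y, W, n_perm=999, seed=0):
    """Approximate p-value: permute the data across areal units and compare
    the observed statistic with the permutation (null) distribution."""
    rng = np.random.default_rng(seed)
    obs = morans_i(y, W)
    perms = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    # one-sided p-value for positive spatial autocorrelation
    return obs, (1 + np.sum(perms >= obs)) / (n_perm + 1)

# Toy example: 4 areal units on a line, binary first-order adjacency
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
y = np.array([10.0, 9.0, 2.0, 1.0])   # spatially clustered values
obs, p = moran_perm_test(y, W)
print(round(obs, 3))                  # 0.395
```

With only 4 units the permutation distribution is coarse, but the logic is the same as the Monte Carlo tests in spdep.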
The keyword CAR in CAR models stands for Conditional AutoRegression. This concept is often used in the context of modeling areal data which can be either discrete counts or continuous measurements. However, the CAR models are best described using the assumption of the normal distribution although CAR models for discrete data are also available. In our Bayesian modeling for areal data CAR models are used as prior distributions for spatial effects defined on the areal units. This justifies our treatment of CAR models using the normal distribution assumption.
Assume that we have areal data $Y_{i}$ for the $n$ areal units. The word conditional in CAR stands for conditioning on all the others; for example, we think of $Y_{1}$ given $Y_{2}, \ldots, Y_{n}$. The term AutoRegression stands for regression on itself (auto). Putting these concepts together, the CAR models are based on regression of each $Y_{i}$ conditional on the others, $Y_{j}$ for $j=1, \ldots, n$ but with
$j \neq i$. The constraint $j \neq i$ makes sure that we do not use $Y_{i}$ to define the distribution of $Y_{i}$. Thus, a typical CAR model will be written as $$ Y_{i} \mid y_{j}, j \neq i \sim N\left(\sum_{j \neq i} b_{i j} y_{j}, \sigma_{i}^{2}\right) $$ where the $b_{i j}$'s are presumed to be the regression coefficients for predicting $Y_{i}$ based on all the other $Y_{j}$'s. The full distributional specification for $\mathbf{Y}=\left(Y_{1}, \ldots, Y_{n}\right)$ comes from the product of the conditional specifications $(2.6)$ for each $i=1, \ldots, n$. There are several key points and concepts to understand, and we present those below as a bulleted list.
The models (2.6) can be equivalently rewritten as $$ \mathbf{Y}=B \mathbf{Y}+\boldsymbol{\epsilon} $$ where $\boldsymbol{\epsilon}=\left(\epsilon_{1}, \ldots, \epsilon_{n}\right)$ is a multivariate normal error vector with zero mean. The appearance of $\mathbf{Y}$ on the right hand side of the above emphasizes the keyword AutoRegression in CAR.
The CAR specification defines a valid multivariate normal probability distribution for $\mathbf{Y}$ under the additional conditions $$ \frac{b_{i j}}{\sigma_{i}^{2}}=\frac{b_{j i}}{\sigma_{j}^{2}}, \quad i, j=1, \ldots, n, $$ which are required to ensure that the inverse covariance matrix $\Sigma^{-1}$ in (A.24) is symmetric.
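A common choice in the CAR literature satisfying this condition takes $b_{ij}=w_{ij}/w_{i+}$ and $\sigma_i^2=\tau^2/w_{i+}$ for a symmetric adjacency matrix $W$; the following Python sketch (with a hypothetical $W$ and $\tau^2$) verifies the symmetry condition numerically:

```python
import numpy as np

# Check the CAR symmetry condition numerically for the common choice
# b_ij = w_ij / w_{i+} and sigma_i^2 = tau^2 / w_{i+} (binary adjacency W).
W = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
tau2 = 2.0                       # hypothetical variance parameter
m = W.sum(axis=1)                # w_{i+}: number of neighbours of unit i

B = W / m[:, None]               # b_ij = w_ij / w_{i+}
sigma2 = tau2 / m                # sigma_i^2 = tau^2 / w_{i+}

# Symmetry condition: b_ij / sigma_i^2 = b_ji / sigma_j^2
S = B / sigma2[:, None]
print(np.allclose(S, S.T))       # True

# Equivalently, the precision matrix (1 / tau^2)(D - W), D = diag(w_{i+}),
# is symmetric
Q = (np.diag(m) - W) / tau2
print(np.allclose(Q, Q.T))       # True
```

Here $b_{ij}/\sigma_i^2$ reduces to $w_{ij}/\tau^2$, which is symmetric whenever $W$ is.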
Point processes
Spatial point pattern data arise when an event of interest, e.g. an outbreak of a disease such as Covid-19, occurs at random locations inside a study region of interest, $\mathbb{D}$. Often the main interest in such a case lies in discovering any explainable or non-random pattern in a scatter plot of the data locations. Absence of any regular pattern in the data locations is said to correspond to the model of complete spatial randomness, CSR, which is also called a Poisson process. Under CSR, the number of points in any given sub-region will follow the Poisson distribution with a parameter value proportional to the area of the sub-region. Often, researchers are interested in rejecting the model of CSR in favor of their own theories of the evolution or clustering of the points. In this context the researchers have to decide which types of clustering may possibly explain the clustering pattern of the points and which one of those provides the “best” fit to the observed data. There are other obvious investigations to make; for example, are there any suitable covariates which may explain the pattern? To illustrate, a lack of trees in many areas of a city may be explained by a layer of built environment.
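A small simulation makes the Poisson-count property of CSR concrete; this Python sketch (with an arbitrarily chosen intensity and sub-region) scatters points uniformly on the unit square and checks the expected count in a sub-region:

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 100.0   # intensity: expected number of points per unit area

def simulate_csr(lam, rng):
    """Homogeneous Poisson process (CSR) on the unit square: draw a Poisson
    number of points, then scatter them uniformly."""
    n = rng.poisson(lam)
    return rng.uniform(size=(n, 2))

# Under CSR the count in the sub-region [0, 0.5] x [0, 0.5] (area 0.25)
# should be Poisson with mean lam * 0.25 = 25.
counts = []
for _ in range(2000):
    pts = simulate_csr(lam, rng)
    counts.append(int(np.sum(np.all(pts < 0.5, axis=1))))

print(round(float(np.mean(counts)), 1))   # close to 25
```

Clustered or inhibited processes would show systematic departures of such counts from the Poisson benchmark.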
Spatio-temporal point process data are naturally found in a number of disciplines, including (human or veterinary) epidemiology where extensive datasets are also becoming more common. One important distinction in practice is between processes defined as a discrete-time sequence of spatial point processes, or as a spatially and temporally continuous point process. See the books by Diggle (2014) and Møller and Waagepetersen (2003) for many examples and theoretical developments.
Conclusion
The main purpose of this chapter has been to introduce the key concepts we need to pursue spatio-temporal modeling in the later chapters. Spatio-temporal modeling, like any other substantial scientific area of research, has its own unique set of keywords and its own concept dictionary. Not knowing some of these is a barrier to fully understanding, or more appropriately appreciating, what is going on under the hood of the modeling equations. Thus, this chapter plugs the knowledge gap a reader may have regarding the typical terminology used in modeling.
It has not been possible to keep the chapter completely notation free. Notation has been introduced to keep the rigor in the presentation and also to serve as early and unique reference points for many key concepts assumed in the later chapters. For example, the concepts of Gaussian Process (GP), Kriging, and internal and external standardization are defined without the overload of a data application. Of course, it is possible to skip reading this chapter until the reader is confronted with unfamiliar jargon.
Measures of spatial association for areal data
Exploration of areal spatial data requires a notion of spatial distance between all the constituting areal units within the data set. This measure of distance is parallel to the distance $d$ between any two point referenced spatial locations discussed previously in this chapter. A blank choropleth map, e.g. Figure $1.12$ without the color gradients, provides a quick visual measure of spatial distance: e.g. California, Nevada and Oregon on the west coast are spatial neighbors but they are quite a long distance away from Pennsylvania, New York and Connecticut on the east coast. More formally, the concept of spatial distance for areal data is captured by what is called a neighborhood, or a proximity, or an adjacency, matrix. This is essentially a matrix where each entry provides information on the spatial relationship between each possible pair of areal units in the data set.
The proximity matrix, denoted by $W$, consists of weights which are used to represent the strength of spatial association between the different areal units. Assuming that there are $n$ areal units, the matrix $W$ is of order $n \times n$ where each entry $w_{i j}$ contains the strength of spatial association between the units $i$ and $j$, for $i, j=1, \ldots, n$. Customarily, $w_{i i}$ is set to 0 for each $i=1, \ldots, n$. Commonly, the weights $w_{i j}$ for $i \neq j$ are chosen to be binary: $w_{i j}$ is assigned the value 1 if units $i$ and $j$ share a common boundary and 0 otherwise. This proximity matrix can readily be formed just by inspecting a choropleth map, such as the one in Figure 1.12. However, the weighting function can instead be designed so as to incorporate other spatial information, such as the distances between the areal units. If required, additional proximity matrices can be defined for different orders, whereby the order dictates the proximity of the areal units. For instance we may have a first order proximity matrix representing the direct neighbors of an areal unit,
a second order proximity matrix representing the neighbors of the first order areal units, and so on. These considerations will render a proximity matrix which is symmetric, i.e. $w_{i j}=w_{j i}$ for all $i$ and $j$.
The weighting function $w_{i j}$ can be standardized by calculating a new proximity matrix given by $\tilde{w}_{i j}=w_{i j} / w_{i+}$ where $w_{i+}=\sum_{j=1}^{n} w_{i j}$, so that each areal unit is given a sense of “equality” in any statistical analysis. However, in this case the new proximity matrix may not remain symmetric, i.e. $\tilde{w}_{i j}$ may or may not equal $\tilde{w}_{j i}$ for all $i$ and $j$.
When working with grid based areal data, where the proximity matrix is defined based on touching areal units, it is useful to specify whether “queen” or “rook” based neighbors (in analogy with the moves in a game of chess) are being used. In the $R$ package spdep, “queen” based neighbors refer to any touching areal units, whereas “rook” based neighbors use the stricter criterion that both areal units must share an edge (Bivand, 2020).
There are two popular measures of spatial association for areal data which together serve as parallels to the concepts of the covariance function and, equivalently, the variogram defined earlier in this chapter. The first of these two measures is Moran’s $I$ (Moran, 1950), which acts as an adaptation of Pearson’s correlation coefficient and summarizes the level of spatial autocorrelation present in the data. The measure $I$ is calculated by comparing each observed area $i$ to its neighboring areas using the weights, $w_{i j}$, from the proximity matrix for all $j=1, \ldots, n$. The formula for Moran’s $I$ is written as: $$ I=\frac{n}{\sum_{i \neq j} w_{i j}} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{i j}\left(Y_{i}-\bar{Y}\right)\left(Y_{j}-\bar{Y}\right)}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}, $$ where $Y_{i}, i=1, \ldots, n$ is the random sample from the $n$ areal units and $\bar{Y}$ is the sample mean. It can be shown that $I$ lies in the interval $[-1,1]$, and its sampling variance can be found, see e.g. Section $4.1$ in Banerjee et al. (2015), so that an asymptotic test can be performed by appealing to the central limit theorem. For small values of $n$ there are permutation tests which compare the observed value of $I$ to a null distribution of the test statistic obtained by simulation. We shall illustrate these with a real data example in Section 3.4. An alternative to Moran’s $I$ is Geary’s $C$ (Geary, 1954), which also measures the spatial autocorrelation present in the data. Geary’s $C$ is given by $$ C=\frac{(n-1)}{2 \sum_{i \neq j} w_{i j}} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{i j}\left(Y_{i}-Y_{j}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}} . $$ The measure $C$, being the ratio of two weighted sums of squares, is never negative. It can be shown that $E(C)=1$ under the assumption of no spatial association. Values of $C$ smaller than the mean 1 indicate positive spatial association.
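Both formulas translate directly into code; the following Python sketch evaluates Moran's $I$ and Geary's $C$ for an invented four-unit example:

```python
import numpy as np

def morans_i(y, W):
    # I = n / (sum_{i != j} w_ij) * sum_ij w_ij (y_i - ybar)(y_j - ybar)
    #     / sum_i (y_i - ybar)^2
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def gearys_c(y, W):
    # C = (n - 1) / (2 sum_{i != j} w_ij) * sum_ij w_ij (y_i - y_j)^2
    #     / sum_i (y_i - ybar)^2
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2
    return ((len(y) - 1) / (2 * W.sum())) * (W * diff2).sum() / (z @ z)

# Toy example: 4 areal units on a line with binary adjacency
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
y = np.array([10.0, 9.0, 2.0, 1.0])   # clustered: neighbours are similar
print(round(morans_i(y, W), 3))       # 0.395 (positive association)
print(round(gearys_c(y, W), 3))       # 0.392 (below 1: positive association)
```

Note the opposite conventions: positive association means $I$ above its null mean but $C$ below 1.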
An asymptotic test can be performed but the speed of convergence to the limiting null distribution is expected to be very slow since the statistic is a ratio of weighted sums of squares. Monte Carlo permutation tests can be performed and those will be illustrated in Section $3.4$ with a real data example.
Internal and external standardization for areal data
Internal and external standardization are two oft-quoted keywords in areal data modeling, especially in disease mapping where rates of a disease over different geographical (areal) units are compared. These two are now defined along with other relevant keywords. To facilitate the comparison we often aim to understand what would have happened if all the areal units had the same uniform rate. This uniform rate scenario serves as a kind of null hypothesis of “no spatial clustering or association”. Disease incidence in excess of, or in deficit relative to, the uniform rate is measured by the relative risk. Relative risk is often expressed as a ratio where the denominator corresponds to the standard dictated by the above null hypothesis. Thus, a relative risk of $1.2$ will imply a $20 \%$ increased risk relative to the prevailing standard rate. The relative risk can be associated with a particular geographical areal unit or even with the whole study domain, when the standard may refer to an absence of the disease. Statistical models are often postulated for the relative risk for ease of interpretation.
Return to the issue of comparison of disease rates relative to the uniform rate. Often in practical data modeling situations, the counts of individuals over different geographies and other categories, e.g. sex and ethnicity, are available. Standardization, internal and external, is a process by which we obtain the corresponding counts of diseased individuals under the assumption that the null hypothesis of uniform disease rates is true. We now introduce the notation $n_{i}$, for $i=1, \ldots, k$, for the total number of individuals in region $i$ and $y_{i}$ for the observed number of individuals with the disease, often called cases, in region $i$. Under the null hypothesis $$ \bar{r}=\frac{\sum_{i=1}^{k} y_{i}}{\sum_{i=1}^{k} n_{i}} $$ will be an estimate of the uniform disease rate. As a result, $$ E_{i}=n_{i} \bar{r} $$ will be the expected number of individuals with the disease in region $i$ if the null hypothesis of a uniform disease rate is true. Note that $\sum_{i=1}^{k} E_{i}=\sum_{i=1}^{k} y_{i}$ so that the total numbers of observed and expected cases are the same. Note that to find the $E_{i}$ we used the observations $y_{i}, i=1, \ldots, k$. This process of finding the $E_{i}$'s is called internal standardization. The word internal highlights the use of the data itself to perform the standardization.
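Internal standardization is a one-line computation; here is a Python sketch with hypothetical region populations and case counts:

```python
# Internal standardization (hypothetical counts): E_i = n_i * rbar with
# rbar = sum(y_i) / sum(n_i), the pooled rate under the uniform-rate null.
n = [50_000, 120_000, 80_000]   # populations of k = 3 regions
y = [40, 150, 60]               # observed case counts

rbar = sum(y) / sum(n)          # estimated uniform disease rate
E = [ni * rbar for ni in n]     # expected counts under the null

print([round(e, 1) for e in E])        # [50.0, 120.0, 80.0]
print(abs(sum(E) - sum(y)) < 1e-9)     # True: totals match by construction
```

The ratios $y_i/E_i$ (here $0.8$, $1.25$ and $0.75$) are the crude relative-risk estimates that disease-mapping models then smooth.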
The technique of internal standardization is appealing to analysts since no new external data are needed for the purposes of modeling and analysis. However, this technique is often criticized since in the modeling process the $E_{i}$'s are treated as fixed values when in reality they are functions of the random observations $y_{i}$'s of the associated random variables $Y_{i}$'s. Modeling the $Y_{i}$'s while treating the $E_{i}$'s as fixed is the unsatisfactory aspect of this strategy. To overcome this drawback the concept of external standardization is often used, and this is what is discussed next.
Observed spatially referenced data will not be smooth in general due to the presence of noise and many other factors, such as data being observed at a
coarse irregular spatial resolution where observation locations are not on a regular grid. Such irregular variations hinder making inference regarding any dominant spatial pattern that may be present in the data. Hence researchers often feel the need to smooth the data to discover important discernible spatial trends. Statistical modeling, as proposed in this book, based on formal coherent methods for fitting and prediction, is perhaps the best formal method for such smoothing needs. However, researchers often use many non-rigorous off-the-shelf methods for spatial smoothing, either as exploratory tools demonstrating some key features of the data or, more dangerously, for making inference just by “eye estimation” methods. Our view in this book is that we welcome those techniques primarily as exploratory data analysis tools but not as inference making tools. Model based approaches are to be used for smoothing and inference so that the associated uncertainties of any final inferential product may be quantified fully.
For spatially point referenced data we briefly discuss the inverse distance weighting (IDW) method as an example method for spatial smoothing. There are many other methods based on Thiessen polygons and crude application of Kriging (using ad-hoc estimation methods for the unknown parameters). These, however, will not be discussed here due to their limitations in facilitating rigorous model based inference.
To perform spatial smoothing, the IDW method first prepares a fine grid of locations covering the study region. The IDW method then performs interpolation at each of those grid locations separately. The formula for interpolation is a weighted linear function of the observed data points where the weight for each observation is inversely proportional to the distance between the observation and interpolation locations. Thus to predict $Y\left(\mathbf{s}_{0}\right)$ at location $\mathbf{s}_{0}$ the IDW method first calculates the distances $d_{i 0}=\left\|\mathbf{s}_{i}-\mathbf{s}_{0}\right\|$ for $i=1, \ldots, n$. The prediction is now given by: $$ \hat{Y}\left(\mathbf{s}_{0}\right)=\frac{1}{\sum_{i=1}^{n} \frac{1}{d_{i 0}}} \sum_{i=1}^{n} \frac{y\left(\mathbf{s}_{i}\right)}{d_{i 0}} . $$ Variations on the basic IDW method are introduced by replacing $d_{i 0}$ by its $p$th power, $d_{i 0}^{p}$, for some value of $p>0$. The higher the value of $p$, the quicker the rate of decay of the influence of the distant observations in the interpolation. Note that it is not possible to attach any uncertainty measure to the individual predictions $\hat{Y}\left(\mathbf{s}_{0}\right)$ since a joint model has not been specified for the random vector $Y\left(\mathbf{s}_{0}\right), Y\left(\mathbf{s}_{1}\right), \ldots, Y\left(\mathbf{s}_{n}\right)$. However, in practice, an overall error rate such as the root mean square prediction error can be calculated for set aside validation data sets. Such an overall error rate will fail to ascertain uncertainty for prediction at an individual location.
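The IDW formula can be sketched as follows; the sites and values are hypothetical, and the `p` argument implements the power variation mentioned above:

```python
import numpy as np

def idw_predict(s0, sites, y, p=1.0):
    """IDW prediction at s0 with weights proportional to 1 / d^p.
    Returns the observed value if s0 coincides with a data site."""
    d = np.linalg.norm(sites - s0, axis=1)
    if np.any(d == 0):
        return y[np.argmin(d)]
    w = 1.0 / d ** p
    return np.sum(w * y) / np.sum(w)

sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([10.0, 20.0, 30.0])

print(idw_predict(np.array([0.0, 0.0]), sites, y))  # 10.0 (exact at a site)
# The point (0.5, 0.5) is equidistant from all three sites, so the
# prediction is the plain average regardless of p:
print(round(idw_predict(np.array([0.5, 0.5]), sites, y), 6))
```

As the text notes, no uncertainty statement accompanies these point predictions because no joint model has been specified.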
There are many methods for smoothing areal data as well. One such method is inspired by what are known as conditionally auto-regressive (CAR) models which will be discussed more formally later in Section 2.14. In implementing this method we first need to define a neighborhood structure.
A particular covariance structure must be assumed for the $Y\left(\mathbf{s}_{i}, t\right)$ process. The pivotal space-time covariance function is defined as $$ C\left(\mathbf{s}_{1}, \mathbf{s}_{2} ; t_{1}, t_{2}\right)=\operatorname{Cov}\left[Y\left(\mathbf{s}_{1}, t_{1}\right), Y\left(\mathbf{s}_{2}, t_{2}\right)\right] . $$ The zero mean spatio-temporal process $Y(\mathbf{s}, t)$ is said to be covariance stationary if $$ C\left(\mathbf{s}_{1}, \mathbf{s}_{2} ; t_{1}, t_{2}\right)=C\left(\mathbf{s}_{1}-\mathbf{s}_{2} ; t_{1}-t_{2}\right)=C(d ; \tau) $$
where $d=\mathbf{s}_{1}-\mathbf{s}_{2}$ and $\tau=t_{1}-t_{2}$. The process is said to be isotropic if $$ C(d ; \tau)=C(\|d\| ;|\tau|), $$ that is, the covariance function depends upon the separation vectors only through their lengths $\|d\|$ and $|\tau|$. Processes which are not isotropic are called anisotropic. In the literature isotropic processes are popular because of their simplicity and interpretability. Moreover, there are a number of simple parametric forms available to model them.
A further simplifying assumption to make is that of separability; see for example Mardia and Goodall (1993). Separability is a concept used in modeling multivariate spatial data including spatio-temporal data. A separable covariance function in space and time is simply the product of two covariance functions, one for space and the other for time. The process $Y(\mathbf{s}, t)$ is said to be separable if $$ C(\|d\| ;|\tau|)=C_{s}(\|d\|) C_{t}(|\tau|) . $$ Now suitable forms for the functions $C_{s}(\cdot)$ and $C_{t}(\cdot)$ are to be assumed. A very general choice is to adopt the Matérn covariance function introduced before. There is a growing literature on methods for constructing non-separable and non-stationary spatio-temporal covariance functions that are useful for modeling. See for example Gneiting (2002), who develops a class of non-separable covariance functions. A simple example is: $$ C(\|d\| ;|\tau|)=(1+|\tau|)^{-1} \exp \left\{-\|d\| /(1+|\tau|)^{\beta / 2}\right\}, $$ where $\beta \in[0,1]$ is a space-time interaction parameter. For $\beta=0$, (2.3) provides a separable covariance function. The other extreme case at $\beta=1$ corresponds to a totally non-separable covariance function. Figure $2.3$ plots this function for four different values of $\beta$: $0,0.25,0.5$ and 1. Some discernible differences between the functions can be seen at higher distances at the top right corner of each plot. However, it is not easy to describe the differences, and it gets even harder to see differences in model fits. The paper by Gneiting (2002) provides further descriptions of non-separable covariance functions.
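This covariance function is easy to evaluate directly; the following Python sketch (with arbitrarily chosen lags) checks that $\beta=0$ gives the separable product form:

```python
import numpy as np

def gneiting_cov(d, tau, beta):
    """C(||d||; |tau|) = (1 + |tau|)^{-1} exp(-||d|| / (1 + |tau|)^{beta/2}),
    beta in [0, 1] (Gneiting, 2002); d and tau are nonnegative lags."""
    return np.exp(-d / (1.0 + tau) ** (beta / 2)) / (1.0 + tau)

d, tau = 1.5, 2.0

# beta = 0: separable, C = C_s(d) C_t(tau) with C_s(d) = exp(-d) and
# C_t(tau) = 1 / (1 + tau)
print(np.isclose(gneiting_cov(d, tau, 0.0), np.exp(-d) / (1.0 + tau)))  # True

# beta = 1: non-separable; the spatial decay slows as the time lag grows
print(bool(gneiting_cov(d, tau, 1.0) > gneiting_cov(d, tau, 0.0)))      # True
```

The second check illustrates the space-time interaction: for $\beta>0$ the effective spatial range increases with the temporal lag, which a separable function cannot reproduce.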
There are other ways to construct non-separable covariance functions, for example by mixing more than one spatio-temporal process, see e.g. Sahu et al. (2006), or by including a further level of hierarchy where the covariance matrix obtained using $C(\|d\| ;|\tau|)$ follows an inverse-Wishart distribution centred around a separable covariance matrix. Section $8.3$ of the book by Banerjee et al. (2015) also lists many more strategies. For example, Schmidt and O’Hagan (2003) construct non-stationary spatio-temporal covariance structures via deformations.
Kriging or optimal spatial prediction
The jargon “Kriging” refers to a form of spatial prediction at an unobserved location based on the observed data. That it is a popular method is borne out by the fact that “Kriging” is a verbification of a method of spatial prediction named after its inventor D.G. Krige, a South African mining engineer. Kriging solves the problem of predicting $Y\left(\mathbf{s}_{0}\right)$ at a new location $\mathbf{s}_{0}$ having observed data $y\left(\mathbf{s}_{1}\right), \ldots, y\left(\mathbf{s}_{n}\right)$.
Classical statistical theory based on the squared error loss function in prediction will yield the sample mean $\bar{y}$ as the optimal predictor for $Y\left(\mathbf{s}_{0}\right)$ if spatial dependency between the random variables $Y\left(\mathbf{s}_{0}\right), Y\left(\mathbf{s}_{1}\right), \ldots, Y\left(\mathbf{s}_{n}\right)$ is ignored. Surely, because of Tobler’s law, the prediction for $Y\left(\mathbf{s}_{0}\right)$ will be improved if instead spatial dependency is taken into account. The observations nearer to the prediction location, $\mathbf{s}_{0}$, will receive higher weights in the prediction formula than the observations further apart. So, now the question is how do we determine these weights? Kriging provides the answer. In order to proceed further we assume that $Y(\mathbf{s})$ is a GP, although some of the results we discuss below also hold in general without this assumption. In order to perform Kriging it is assumed that the best linear unbiased predictor with weights $\ell_{i}$, $\hat{Y}\left(\mathbf{s}_{0}\right)$, is of the form $\sum_{i=1}^{n} \ell_{i} Y\left(\mathbf{s}_{i}\right)$, and dependence between $Y\left(\mathbf{s}_{0}\right), Y\left(\mathbf{s}_{1}\right), \ldots, Y\left(\mathbf{s}_{n}\right)$ is described by a covariance function, $C(d \mid \psi)$, of the distance $d$ between any two locations as defined above in this chapter. The Kriging weights are easily determined by evaluating the conditional mean of $Y\left(\mathbf{s}_{0}\right)$ given the observed values $y\left(\mathbf{s}_{1}\right), \ldots, y\left(\mathbf{s}_{n}\right)$. These weights are “optimal” in the same statistical sense that the mean $E(X)$ minimizes the expected squared error loss, i.e. $E(X-a)^{2}$ is minimized at $a=E(X)$. Here we take $X$ to be the conditional random variable $Y\left(\mathbf{s}_{0}\right)$ given $y\left(\mathbf{s}_{1}\right), \ldots, y\left(\mathbf{s}_{n}\right)$.
The actual values of the optimal weights are derived by partitioning the mean vector, $\boldsymbol{\mu}_{n+1}$, and the covariance matrix, $\Sigma$, of $Y\left(\mathbf{s}_{0}\right), Y\left(\mathbf{s}_{1}\right), \ldots, Y\left(\mathbf{s}_{n}\right)$ as follows. Let $$ \boldsymbol{\mu}_{n+1}=\left(\begin{array}{c} \mu_{0} \\ \boldsymbol{\mu} \end{array}\right), \quad \Sigma=\left(\begin{array}{cc} \sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{array}\right) $$ where $\boldsymbol{\mu}$ is the vector of the means of $\mathbf{Y}=\left(Y\left(\mathbf{s}_{1}\right), \ldots, Y\left(\mathbf{s}_{n}\right)\right)^{\prime}$; $\sigma_{00}=\operatorname{Var}\left(Y\left(\mathbf{s}_{0}\right)\right)$; $\Sigma_{01}=\Sigma_{10}^{\prime}=\operatorname{Cov}\left(Y\left(\mathbf{s}_{0}\right), \mathbf{Y}\right)$; $\Sigma_{11}=\operatorname{Var}\left(\mathbf{Y}\right)$. Now standard multivariate normal distribution theory tells us that $$ Y\left(\mathbf{s}_{0}\right) \mid \mathbf{y} \sim N\left(\mu_{0}+\Sigma_{01} \Sigma_{11}^{-1}(\mathbf{y}-\boldsymbol{\mu}), \sigma_{00}-\Sigma_{01} \Sigma_{11}^{-1} \Sigma_{10}\right) . $$ In order to facilitate a clear understanding of the underlying spatial dependence in Kriging we assume a zero-mean GP, i.e. $\boldsymbol{\mu}_{n+1}=\mathbf{0}$. Now we have $E\left(Y\left(\mathbf{s}_{0}\right) \mid \mathbf{y}\right)=\Sigma_{01} \Sigma_{11}^{-1} \mathbf{y}$ and thus we see that the optimal Kriging weights are particular functions of the assumed covariance function. Note that the weights
do not depend on the underlying common spatial variance as that is canceled in the product $\Sigma_{01} \Sigma_{11}^{-1}$. However, the spatial variance will affect the accuracy of the predictor since $\operatorname{Var}\left(Y\left(\mathbf{s}{0}\right) \mid \mathbf{y}\right)=\sigma{00}-\Sigma_{01} \Sigma_{11}^{-1} \Sigma_{10}$.
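A minimal numerical sketch of this conditional-normal Kriging predictor, assuming an exponential covariance function $C(d)=\sigma^{2} e^{-\phi d}$ and a handful of hypothetical one-dimensional locations and observed values (none of these numbers come from the text):

```python
import numpy as np

def exp_cov(d, sigma2=1.0, phi=1.0):
    """Exponential covariance function C(d) = sigma^2 * exp(-phi * d)."""
    return sigma2 * np.exp(-phi * d)

# Illustrative setup: n = 4 observed sites and one prediction site s0.
s = np.array([0.0, 1.0, 2.5, 4.0])    # observed locations s_1, ..., s_n
s0 = 1.6                               # prediction location
y = np.array([0.3, -0.1, 0.8, 0.2])    # observed values y(s_1), ..., y(s_n)

# Partitioned covariance: sigma_00 (scalar), Sigma_01 (1 x n), Sigma_11 (n x n).
Sigma11 = exp_cov(np.abs(s[:, None] - s[None, :]))
Sigma01 = exp_cov(np.abs(s0 - s))
sigma00 = exp_cov(0.0)

# Kriging weights and predictive moments for a zero-mean GP.
weights = Sigma01 @ np.linalg.inv(Sigma11)   # Sigma_01 Sigma_11^{-1}
pred_mean = weights @ y                      # E(Y(s0) | y)
pred_var = sigma00 - weights @ Sigma01       # Var(Y(s0) | y)

print(weights, pred_mean, pred_var)
```

Rescaling the spatial variance $\sigma^{2}$ scales $\Sigma_{01}$ and $\Sigma_{11}$ by the same factor, so the weights are unchanged while `pred_var` scales with $\sigma^{2}$, exactly as noted above.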
It is interesting to note that Kriging is an exact predictor in the sense that $E\left(Y\left(\mathbf{s}_{i}\right) \mid \mathbf{y}\right)=y\left(\mathbf{s}_{i}\right)$ for any $i=1, \ldots, n$. It is intuitively clear why this result holds: a random variable is an exact predictor of itself. Mathematically, it can be proved easily using the definition of the matrix inverse. To elaborate, suppose that $$ \Sigma=\left(\begin{array}{c} \Sigma_{1}^{\prime} \\ \Sigma_{2}^{\prime} \\ \vdots \\ \Sigma_{n}^{\prime} \end{array}\right) $$ where $\Sigma_{i}^{\prime}$ is a row vector of dimension $n$. Then the result $\Sigma \Sigma^{-1}=I_{n}$, where $I_{n}$ is the identity matrix of order $n$, implies that $\Sigma_{i}^{\prime} \Sigma^{-1}=\mathbf{a}_{i}^{\prime}$, where the $i$th element of $\mathbf{a}_{i}$ is 1 and all others are zero. Since predicting at the observed site $\mathbf{s}_{i}$ replaces $\Sigma_{01}$ by $\Sigma_{i}^{\prime}$, the weight vector is the $i$th unit vector and the predictor returns $y\left(\mathbf{s}_{i}\right)$ exactly.
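The exact-interpolation property can be verified directly: with $\Sigma_{01}$ equal to the $i$th row of $\Sigma$, the weight vector $\Sigma_{i}^{\prime}\Sigma^{-1}$ is the $i$th unit vector. The sketch below uses a hypothetical exponential covariance matrix over four sites (the locations and values are illustrative only):

```python
import numpy as np

# Covariance among four hypothetical sites (exponential, unit variance).
s = np.array([0.0, 1.0, 2.5, 4.0])
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]))
y = np.array([0.3, -0.1, 0.8, 0.2])

# Predict at an *observed* site s_i: Sigma_01 is the i-th row of Sigma,
# so the weight vector Sigma_i' Sigma^{-1} is the i-th unit vector.
i = 2
weights = Sigma[i] @ np.linalg.inv(Sigma)
pred = weights @ y

# weights ~ e_i and pred equals y[i], up to floating-point error.
print(np.round(weights, 10), pred)
```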
The above discussion, with its simplifying assumption of a zero-mean GP, is justified since in practical applications one often assumes only a zero-mean GP as a prior distribution. The mean surface of the data (or their transformations) is often modeled explicitly by a regression model, and such models then contribute to determining the mean values of the predictions. In this context we note that in a non-Bayesian geostatistical modeling setup there are various flavors of Kriging, such as simple Kriging, ordinary Kriging, universal Kriging, co-Kriging, and intrinsic Kriging, depending on the particular assumption about the mean function. In our Bayesian inference setup such flavors of Kriging ensue automatically, since Bayesian inference methods are automatically conditioned on the observed data and the explicit model assumptions.
Autocorrelation and partial autocorrelation
A study of time series for temporally correlated data is not complete without knowledge of autocorrelation. Simply put, autocorrelation is the correlation of a series with itself at different time intervals. The time interval is technically called the lag in the time series literature. For example, suppose $Y_{t}$ is a time series random variable, where $t \geq 1$ is an integer. The autocorrelation at lag $k\,(\geq 1)$ is defined as $\rho_{k}=\operatorname{Cor}\left(Y_{t+k}, Y_{t}\right)$; trivially, the autocorrelation at lag $k=0$, $\rho_{0}$, is one. Ordinarily, $\rho_{k}$ decreases as $k$ increases, just as spatial correlation decreases when the distance between two locations increases. Viewed as a function of the lag $k$, $\rho_{k}$ is called the autocorrelation function, often abbreviated as ACF.
Sometimes high autocorrelation at a lag $k>1$ persists because of high correlation between $Y_{t+k}$ and the intermediate values $Y_{t+k-1}, \ldots, Y_{t+1}$. The partial autocorrelation at lag $k$ measures the correlation between $Y_{t+k}$ and $Y_{t}$ after removing the autocorrelation at shorter lags. Formally, it is defined as the conditional correlation between $Y_{t+k}$ and $Y_{t}$ given the values of $Y_{t+k-1}, \ldots, Y_{t+1}$. Partial autocorrelation can also be explained easily with the help of multiple regression. To remove the effects of the intermediate values $Y_{t+k-1}, \ldots, Y_{t+1}$, one considers two regression models: one regressing $Y_{t+k}$ on $Y_{t+k-1}, \ldots, Y_{t+1}$, and the other regressing $Y_{t}$ on $Y_{t+k-1}, \ldots, Y_{t+1}$. The simple correlation coefficient between the two sets of residuals after fitting these regressions is the partial autocorrelation at lag $k$. To learn more, the interested reader is referred to the many excellent introductory textbooks on time series, such as the one by Chatfield (2003).
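The two-regression definition above can be implemented directly. The sketch below simulates an AR(1) series $Y_{t}=\phi Y_{t-1}+\varepsilon_{t}$ (a standard illustrative choice, not from the text), for which the theoretical ACF is $\rho_{k}=\phi^{k}$ while the partial autocorrelation cuts off after lag 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 5000, 0.7
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]   # AR(1): y_t = phi * y_{t-1} + e_t

def acf(y, k):
    """Sample autocorrelation Cor(Y_{t+k}, Y_t) at lag k >= 0."""
    if k == 0:
        return 1.0
    y = y - y.mean()
    return np.sum(y[k:] * y[:-k]) / np.sum(y * y)

def pacf(y, k):
    """Partial autocorrelation at lag k via the two-regression definition:
    correlate the residuals of Y_{t+k} and Y_t, each regressed on the
    intermediate values Y_{t+1}, ..., Y_{t+k-1}."""
    if k == 1:
        return acf(y, 1)
    # Each row t holds the intermediate values y[t+1], ..., y[t+k-1].
    Z = np.column_stack([y[j:len(y) - k + j] for j in range(1, k)])
    Z = np.column_stack([np.ones(len(Z)), Z])   # add an intercept
    front, back = y[k:], y[:len(y) - k]         # Y_{t+k} and Y_t
    r1 = front - Z @ np.linalg.lstsq(Z, front, rcond=None)[0]
    r2 = back - Z @ np.linalg.lstsq(Z, back, rcond=None)[0]
    return np.corrcoef(r1, r2)[0, 1]

print(acf(y, 1), acf(y, 2), pacf(y, 2))
```

For this AR(1) series, `acf(y, 1)` should be near $\phi$, `acf(y, 2)` near $\phi^{2}$, and `pacf(y, 2)` near zero, since conditioning on $Y_{t+1}$ removes the only channel of dependence.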