## Ranked Set Sampling and Judgement Post-stratification

Stokes’ pioneering work (Stokes, 1977) brought measured covariates to ranked set sampling (RSS). Briefly restating her work and establishing notation, consider a set of $n H$ units that are partitioned at random into $n$ sets, each of size $H$. The units are presumed to form a random sample from some distribution. Within a given set, we begin with $\left(X_h, Y_h\right), h=1, \ldots, H$. These units are ranked on the $X_h$, so that $X_{(r: H)}$ is the $r$ th order statistic in the set. The measured response, $Y_{[r: H]}$, associated with this unit is its concomitant. To draw a RSS of size $n$ from such a population, sample sizes $n_h, h=1, \ldots, H$, are specified, with $\sum_{h=1}^H n_h=n$. One unit is drawn from each of the $n$ sets; in $n_h$ sets, the unit ranked $h$ is selected. The resulting sample is a RSS.
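
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the source: the function name `draw_rss`, the `(x, y)` tuple layout, and the simulated data model (normal `X`, with `Y` equal to `X` plus noise) are all assumptions made for the example.

```python
import random

def draw_rss(sets, allocation):
    """Draw a ranked set sample (hypothetical helper, not from the source).

    sets: list of n sets, each a list of H (x, y) pairs.
    allocation: [n_1, ..., n_H]; in n_h of the sets the unit ranked h is selected.
    Returns a list of (h, y) pairs: the selected rank and its concomitant response.
    """
    H = len(allocation)
    if sum(allocation) != len(sets):
        raise ValueError("allocation must sum to the number of sets")
    # Rank to select in each set: rank h appears n_h times.
    # Sets are formed at random, so assigning ranks to sets in order is harmless.
    target_ranks = [h for h in range(1, H + 1) for _ in range(allocation[h - 1])]
    sample = []
    for units, h in zip(sets, target_ranks):
        if len(units) != H:
            raise ValueError("every set must contain H units")
        ordered = sorted(units, key=lambda unit: unit[0])  # rank the set on X
        _, y = ordered[h - 1]  # concomitant of the h-th order statistic
        sample.append((h, y))
    return sample

# Simulated illustration: H = 3, n_h = 2 for every rank, so n = 6 sets.
rng = random.Random(0)
H, allocation = 3, [2, 2, 2]
sets = []
for _ in range(sum(allocation)):
    units = []
    for _ in range(H):
        x = rng.gauss(0, 1)
        units.append((x, x + rng.gauss(0, 0.5)))  # assumed model: Y = X + noise
    sets.append(units)
sample = draw_rss(sets, allocation)
```

Only the selected unit in each set has its response used, which mirrors the point that `Y` is measured on `n` units even though `nH` units are ranked.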

The earliest description of RSS appears in McIntyre (1952) (republished as McIntyre, 2005). In McIntyre’s description of the technique, ranking is based on the subjective judgement of an experimenter who examines each set of $H$ units, specifying the ranks of the units in the set. Once the units in each set have been ranked, the sample is drawn as described above and the response of interest, $Y$, is measured on the $n$ sampled units. Extending our notation to capture both set and rank within set, the mean of the $n H$ units is
$$\bar{Y}=(n H)^{-1} \sum_{i=1}^n \sum_{h=1}^H Y_{i h},$$
where $Y_{i h}$ is the response of the unit with rank $h$ in set $i$. Suppressing the notation for the rank, define $Y_i$ to be the $i$th of the $n$ sampled units. Provided $n_h>1$ for all $h$, the RSS estimator of the mean is
$$\bar{Y}_{r s s}=H^{-1} \sum_{h=1}^H \bar{Y}_h,$$
where $\bar{Y}_h$ is the sample mean of the $n_h$ sampled units with rank $h$. The RSS estimator is unbiased: $E\left[\bar{Y}_{r s s} \mid \bar{Y}\right]=\bar{Y}$ for any collection of $n H$ units. Furthermore, when the units are a random sample from a distribution with mean $\mu=E[Y]$, $E\left[\bar{Y}_{r s s}\right]=E[\bar{Y}]=\mu$. The goal of RSS is to estimate $\mu$. Stokes and Sager (1988) cast estimation of a cumulative distribution function as estimation of a proportion (mean) for all cut points on the real line.
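
As a rough illustration of the estimator, the sketch below averages the responses within each rank class and then averages the $H$ class means with equal weight. The function name `rss_mean` and the `(rank, response)` input layout are assumptions for the example; the check only requires each rank class to be non-empty, which is the minimum needed for the class means to exist.

```python
from collections import defaultdict
from statistics import fmean

def rss_mean(sample, H):
    """Equal-weight average of the H rank-class means (hypothetical helper).

    sample: iterable of (h, y) pairs, with rank h in 1..H and response y.
    """
    by_rank = defaultdict(list)
    for h, y in sample:
        by_rank[h].append(y)
    missing = [h for h in range(1, H + 1) if not by_rank[h]]
    if missing:
        raise ValueError(f"no sampled units with rank(s) {missing}")
    # Mean of the per-rank sample means, each rank class weighted 1/H.
    return sum(fmean(by_rank[h]) for h in range(1, H + 1)) / H

# Rank 1 has responses 1 and 3 (mean 2); rank 2 has response 10.
estimate = rss_mean([(1, 1.0), (1, 3.0), (2, 10.0)], H=2)
```

Note that the class means receive equal weight regardless of the $n_h$, so an unbalanced allocation changes the precision of each $\bar{Y}_h$ but not its weight.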

RSS with estimation following (2) is robust to variation in how the ranks are created. When ranks are created subjectively, better ranking leads to greater separation of the means of the rank classes (or strata), which in turn leads to a greater reduction in variance relative to estimators based on a random sample from the population. When ranks arise from a measured covariate, the same holds.

## Multivariate Order Statistics and JPS

In Wang et al. (2006), Stokes and coauthors posed the intriguing question of how to use multiple covariates to convey information about the ranks of units for use in JPS. Their solution is to rank on each of the distinct covariates. In the case of a continuous bivariate covariate, $\left(X_1, X_2\right)$, each of the units in the set would be assigned a pair of ranks: one for $X_1$ and the other for $X_2$. This pair of ranks defines the post-stratum (or rank class) of the unit. For a set of size $H$, there are $H^2$ post-strata. We denote these post-strata with $\mathbf{r}=\left(r_1, r_2\right)$, where $r_1, r_2 \in\{1, \ldots, H\}$. We focus on a bivariate covariate but note that the technique extends to covariates of greater dimension. Figure 1 illustrates the situation for a bivariate order statistic for set size $H=5$.
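
A short sketch may help make the pair of ranks concrete. The helper `bivariate_rank` and the list-of-pairs layout are assumptions for illustration, and ties are ignored because the covariates are taken to be continuous.

```python
def bivariate_rank(set_covariates, index):
    """Return the post-stratum (r1, r2) of one unit (hypothetical helper).

    set_covariates: list of H (x1, x2) pairs for the units in the set.
    index: position of the unit of interest within the list.
    """
    x1, x2 = set_covariates[index]
    # Rank = 1 + number of units in the set with a smaller covariate value.
    r1 = 1 + sum(1 for u in set_covariates if u[0] < x1)
    r2 = 1 + sum(1 for u in set_covariates if u[1] < x2)
    return (r1, r2)

# A set of size H = 3: the first unit is smallest on X1 and largest on X2.
units = [(0.2, 5.0), (0.9, 1.0), (0.5, 3.0)]
stratum = bivariate_rank(units, 0)
```

Each covariate is ranked separately, so a unit can sit at opposite ends of the two orderings, as the first unit does here.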

The increase in the number of post-strata from $H$ to $H^2$ necessitates reconsideration of the basic post-stratification estimator (3). Marginally, each covariate for the measured unit will have rank $r_i=h$ with probability $1 / H$ for $i=1,2$ and $h=1, \ldots, H$. The joint distribution of $\mathbf{R}$ leads to the stratum probability $\pi_{\mathbf{r}}=P(\mathbf{R}=\mathbf{r})$. In general, these probabilities can be found via numerical integration if the model for $\left(X_1, X_2\right)$ is fully specified. Some of the $\pi_{\mathbf{r}}$ may be much smaller than $H^{-2}$, leading to a large probability that the estimator is undefined.
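
To get a feel for how uneven the $\pi_{\mathbf{r}}$ can be, the sketch below approximates them by simulation. This swaps Monte Carlo in for the numerical integration mentioned above, and it assumes a standard bivariate normal model for $\left(X_1, X_2\right)$ with correlation `rho`; the model, the function name `estimate_stratum_probs`, and the parameter values are illustrative assumptions, not taken from the source.

```python
import random

def estimate_stratum_probs(H, rho, n_sims=20000, seed=0):
    """Monte Carlo approximation of pi_r = P(R = r) (hypothetical helper).

    Assumes (X1, X2) is standard bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    scale = (1 - rho ** 2) ** 0.5
    counts = {(r1, r2): 0 for r1 in range(1, H + 1) for r2 in range(1, H + 1)}
    for _ in range(n_sims):
        units = []
        for _ in range(H):
            x1 = rng.gauss(0, 1)
            x2 = rho * x1 + scale * rng.gauss(0, 1)  # induces correlation rho
            units.append((x1, x2))
        # The units are exchangeable, so treat the first one as the measured unit.
        x1, x2 = units[0]
        r1 = 1 + sum(1 for u in units if u[0] < x1)
        r2 = 1 + sum(1 for u in units if u[1] < x2)
        counts[(r1, r2)] += 1
    return {r: c / n_sims for r, c in counts.items()}

probs = estimate_stratum_probs(H=3, rho=0.9)
```

With strongly correlated covariates, concordant strata such as $(1,1)$ absorb most of the probability, while discordant strata such as $(1,3)$ become rare, which is the situation in which a stratum is likely to be empty in the sample.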
Wang et al. (2006) handled this issue by appealing to a parametric model as an aid to estimation. The authors defined $\mu_{[\mathbf{r}]}=E[Y \mid \mathbf{R}=\mathbf{r}]$. The value of $\mu_{[\mathbf{r}]}$ can be found by numerical integration over the conditional distribution of $Y \mid \mathbf{R}$. Once the stratum means are in place, they are connected to the mean of $Y$ via the expression $\mu=\sum_{\mathbf{r}} \pi_{\mathbf{r}} \mu_{[\mathbf{r}]}$. It is helpful to introduce the difference between the stratum mean and the overall mean, $\delta_{[\mathbf{r}]}=\mu_{[\mathbf{r}]}-\mu$. The authors suggested estimation by ordinary least squares applied to a model for $\mu$, with observations in stratum $\mathbf{r}$ offset by $\delta_{[\mathbf{r}]}$. The data are $\left(Y_i, \mathbf{r}_i\right), i=1, \ldots, n$, and the estimator is
$$\hat{\mu}_{o L S}=n^{-1} \sum_{i=1}^n\left(Y_i-\delta_{\left[\mathbf{r}_i\right]}\right) .$$
The estimator $\hat{\mu}_{o L S}$ can be viewed in two stages: in the first, each observation is bias-corrected by subtracting its $\delta_{[\mathbf{r}]}$; in the second, the sample mean of the bias-corrected observations is computed. Partitioning the sample into strata reduces the within-stratum variances. Removing bias and then using the sample mean ensures that each observation receives equal weight in the estimator. Together, these two stages lead to substantial variance reduction, especially for relatively large set sizes.
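
The two-stage view translates directly into code. In the sketch below the offsets $\delta_{[\mathbf{r}]}$ are supplied as a dictionary; in practice they would come from the parametric model, and the numbers used here are made up purely for illustration. The name `ols_mean` is likewise hypothetical.

```python
def ols_mean(ys, ranks, delta):
    """Mean of bias-corrected responses (hypothetical helper).

    ys: responses Y_1, ..., Y_n.
    ranks: post-strata r_1, ..., r_n, each a tuple such as (r1, r2).
    delta: dict mapping a post-stratum r to its offset mu_[r] - mu.
    """
    if len(ys) != len(ranks):
        raise ValueError("ys and ranks must have the same length")
    # Stage 1: subtract each observation's stratum offset.
    corrected = [y - delta[r] for y, r in zip(ys, ranks)]
    # Stage 2: equal-weight sample mean of the corrected values.
    return sum(corrected) / len(corrected)

# Illustrative values only: three observations in three different post-strata.
ys = [1.0, 2.0, 4.0]
ranks = [(1, 1), (1, 2), (2, 2)]
delta = {(1, 1): -1.0, (1, 2): 0.0, (2, 2): 1.0}
mu_hat = ols_mean(ys, ranks, delta)
```

Because the correction is applied per observation, the estimate stays defined even when some post-strata contain no sampled units, unlike an estimator that needs a mean from every stratum.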


