## Statistics assignment help | R language assignment and exam help | SOW-BS086

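R is a programming language for statistical computing and graphics, supported by the R Core Team and the R Foundation for Statistical Computing. R was created by the statisticians Ross Ihaka and Robert Gentleman, and data miners and statisticians use it for data analysis and for developing statistical software. Users have created packages that extend the language's functionality.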

• Statistical Inference
• Statistical Computing
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science

Thus far, we’ve only been entering data directly into the interactive R console. For any data set of non-trivial size this is obviously impractical. Fortunately for us, R has a robust suite of functions for reading data directly from external files.
Go ahead and create a file on your hard disk called favorites.txt that looks like this:
flavor,number
pistachio,6
mint chocolate chip,7
vanilla,5
chocolate,10
strawberry,2
neopolitan,4
This data represents the number of students in a class that prefer a particular flavor of soy ice cream. We can read the file into a variable called favs as follows:
favs <- read.table("favorites.txt", sep=",", header=TRUE)

If you get an error that there is no such file or directory, give R the full path name to your data set or, alternatively, run the following command:
favs <- read.table(file.choose(), sep=",", header=TRUE)
The preceding command brings up an open file dialog that lets you navigate to the file you’ve just created.
The argument sep="," tells R that each data element in a row is separated by a comma. Other common data formats have values separated by tabs or pipes ("|"). The value of sep should then be "\t" or "|", respectively.

The argument header=TRUE tells R that the first row of the file should be interpreted as the names of the columns. Remember, you can enter ?read.table at the console to learn more about these options.
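The effect of sep and header can be checked without creating any files. The sketch below is only an illustration: it passes made-up inline strings through read.table's text argument, and the names tab_data, pipe_data and no_header are assumptions introduced here.

```r
# Inline strings stand in for files; the values are made up for illustration.
tab_data  <- "flavor\tnumber\nvanilla\t5\nchocolate\t10"
pipe_data <- "flavor|number\nvanilla|5\nchocolate|10"

# sep names the delimiter; header=TRUE takes column names from the first row.
tab_favs  <- read.table(text = tab_data,  sep = "\t", header = TRUE)
pipe_favs <- read.table(text = pipe_data, sep = "|",  header = TRUE)

# Both delimiters describe the same table, so the results match.
identical(tab_favs, pipe_favs)

# Without header=TRUE the first row is read as data, and R falls back to
# the default column names V1 and V2.
no_header <- read.table(text = pipe_data, sep = "|")
names(no_header)
```

Leaving out header=TRUE here also turns the number column into text, because the word "number" is then treated as a data value.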

Reading from files in this comma-separated-values format (usually with the .csv file extension) is so common that R has a more specific function just for it. The preceding data import expression is best written simply as:
favs <- read.csv("favorites.txt")

Now, we have all the data in the file held in a variable of class data.frame. A data frame can be thought of as a rectangular array of data, like the ones you might see in a spreadsheet application. In this way, a data frame can also be thought of as a matrix; indeed, we can use matrix-style indexing to extract elements from it. A data frame differs from a matrix, though, in that its columns may have differing types. For example, whereas a matrix allows only one type, the data set we just loaded contains character data in its first column and numeric data in its second column.
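To make the matrix-style indexing concrete, here is a minimal sketch. It rebuilds the flavor data from an inline string (the csv_text name is an assumption) so that it runs even if favorites.txt has not been created.

```r
# Rebuild the favorites data inline so the sketch runs without the file.
csv_text <- "flavor,number
pistachio,6
mint chocolate chip,7
vanilla,5
chocolate,10
strawberry,2
neopolitan,4"
favs <- read.csv(text = csv_text)

class(favs)          # "data.frame"
dim(favs)            # 6 rows, 2 columns
favs[2, 1]           # matrix-style indexing: row 2, column 1
favs[, 2]            # every row of column 2
favs$number          # the same column, selected by name
sapply(favs, class)  # each column has its own type
```

The last line is where a data frame parts ways with a matrix: the two columns report different classes.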

## Working with packages

Robust, performant, and numerous though base R’s functions are, we are by no means limited to them! Additional functionality is available in the form of packages. In fact, what makes R such a formidable statistics platform is the astonishing wealth of packages available (well over 7,000 at the time of writing). R’s ecosystem is second to none!
Most of these myriad packages exist on the Comprehensive R Archive Network (CRAN). CRAN is the primary repository for user-created packages.

One package that we are going to start using right away is the ggplot2 package. ggplot2 is a plotting system for R. Base R has sophisticated and advanced mechanisms to plot data, but many find ggplot2 more consistent and easier to use. Further, the plots are often more aesthetically pleasing by default.
Let’s install it!

# downloads and installs from CRAN
install.packages("ggplot2")
Now that we have the package downloaded, let’s load it into the R session, and test it out by plotting our data from the last section:

[Figure: plot of the ice cream flavor data, captioned “You’re all wrong, Mint Chocolate Chip is way better!”]
Don’t worry about the syntax of the ggplot function, yet. We’ll get to it in good time.
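The plotting command itself is not reproduced in this text, so the following is only a sketch of one way to draw the flavor counts with ggplot2, not necessarily the call the author used. The choice of a bar chart via geom_col() and the name flavor_plot are assumptions; the data frame is rebuilt inline so the block stands on its own.

```r
# The flavor data from the previous section, rebuilt inline.
favs <- data.frame(
  flavor = c("pistachio", "mint chocolate chip", "vanilla",
             "chocolate", "strawberry", "neopolitan"),
  number = c(6, 7, 5, 10, 2, 4)
)

# Only attempt the plot when ggplot2 is actually installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One bar per flavor, with bar height taken from the number column.
  flavor_plot <- ggplot(favs, aes(x = flavor, y = number)) +
    geom_col()
  print(flavor_plot)
}
```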
You will be installing more packages as you work through this text. In the meantime, if you want to play around with a few more, you can install the gdata and foreign packages, which let you import Excel spreadsheets and SPSS data files, respectively, directly into R.

## Subsetting

It is very common to want to extract one or more elements from a vector. For this, we use a technique called indexing or subsetting. After the vector, we put an integer in square brackets ([]), called the subscript operator. This instructs R to return the element at that index. The indices (plural of index, in case you were wondering!) for vectors in R start at 1 and stop at the length of the vector.
> our.vect[1]   # to get the first value
[1] 8

> # the function length() returns the length of a vector
> length(our.vect)
[1] 7
> our.vect[length(our.vect)]   # get the last element of a vector
[1] 9
Note that in the preceding code, we used a function in the subscript operator. In cases like these, R evaluates the expression in the subscript operator, and uses the number it returns as the index to extract.

If we get greedy and try to extract an element at an index that doesn’t exist, R will respond with NA, meaning not available. We will see this special value cropping up from time to time throughout this text.
> our.vect[10]
[1] NA
One of the most powerful ideas in R is that you can use vectors to subset other vectors:
> # extract the first, third, fifth, and
> # seventh element from our vector
> our.vect[c(1, 3, 5, 7)]
[1] 8 7 3 9
The ability to use vectors to index other vectors may not seem like much now, but its usefulness will become clear soon.
Another way to create vectors is by using sequences.
The expression 1:10 creates a vector from 1 to 10; 10:1 creates the same 10-element vector, but in reverse. The seq() function is more general in that it allows sequences to be made using steps (among many other things).
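A short sketch of these sequence forms, and of using a sequence as an index, follows. The names one.to.ten, ten.to.one and evens are introduced here for illustration; our.vect is redefined with Jenny's digits so the block runs on its own.

```r
one.to.ten <- 1:10
ten.to.one <- 10:1

# The reversed sequence holds the same ten values in the opposite order.
identical(rev(one.to.ten), ten.to.one)       # TRUE

# seq() with a step size of 2: 2 4 6 8 10
evens <- seq(2, 10, by = 2)

# A sequence is itself a vector, so it can be used to subset another vector.
our.vect <- c(8, 6, 7, 5, 3, 0, 9)
our.vect[1:3]                                # first three elements: 8 6 7
our.vect[seq(1, length(our.vect), by = 2)]   # odd positions: 8 7 3 9
```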

Did I mention that we can use vectors to subset other vectors? When we subset a vector using a logical vector of the same length, only the elements corresponding to the TRUE values are extracted. Hopefully, sparks are starting to go off in your head. If we wanted to extract only the legitimate non-NA digits from Jenny’s number, we can do it as follows:
> messy.vector[!is.na(messy.vector)]
This is a critical trait of R, so let’s take our time understanding it; this idiom will come up again and again throughout this book.

The logical vector returned by is.na(), which is TRUE wherever messy.vector holds an NA value, is negated as a whole by the negation operator !. The resulting vector is TRUE wherever the corresponding value in messy.vector is not NA.
When this logical vector is used to subset the original messy vector, it extracts only the non-NA values.
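The idiom is easier to follow when each intermediate vector is given a name. The sketch below uses a made-up vector called readings (not the messy.vector from the text) and the assumed helper names is.missing and keep.

```r
readings <- c(4, NA, 7, 2, NA, 9)   # hypothetical data with two missing values

is.missing <- is.na(readings)       # FALSE TRUE FALSE FALSE TRUE FALSE
keep <- !is.missing                 # TRUE FALSE TRUE TRUE FALSE TRUE

# Only the positions where keep is TRUE survive.
non.missing <- readings[keep]       # 4 7 2 9

# A comparison involving NA is itself NA, so combine it with the NA check
# to get a clean logical index.
large <- readings[!is.na(readings) & readings > 5]   # 7 9
```

The second subset shows why the NA check matters: readings > 5 on its own would carry the missing values through into the result.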

Similarly, we can show all the digits in Jenny’s phone number that are greater than five as follows:
> our.vect[our.vect > 5]
[1] 8 6 7 9
Thus far, we’ve only been displaying elements that have been extracted from a vector. However, just as we’ve been assigning and re-assigning variables, we can assign values to various indices of a vector, and change the vector as a result. For example, if Jenny tells us that we have the first digit of her phone number wrong (it’s really 9), we can reassign just that element without modifying the others.
> our.vect
[1] 8 6 7 5 3 0 9
> our.vect[1] <- 9
> our.vect
[1] 9 6 7 5 3 0 9
Sometimes, you may need to replace all the NA values in a vector with the value 0. To do that with our messy vector, we can execute the following command:
> messy.vector[is.na(messy.vector)] <- 0

> messy.vector
[1] 8 8 0 7 5 0 3 0 9
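Assignment through an index works with every kind of subscript covered so far: a single position, a vector of positions, or a logical vector. A minimal sketch on a made-up vector called scores (an assumption, not data from the text):

```r
scores <- c(3, NA, 8, NA, 5)    # hypothetical values

scores[1] <- 4                  # replace a single element
scores[c(3, 5)] <- c(9, 6)      # replace several elements at once
scores[is.na(scores)] <- 0      # replace every NA, wherever it occurs

scores                          # 4 0 9 0 6
```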

## Getting help in R

Before we go further, it would serve us well to have a brief section detailing how to get help in R. Most R tutorials leave this for one of the last sections, if it is even included at all! In my own personal experience, though, getting help is going to be one of the first things you will want to do as you add more bricks to your R knowledge castle. Learning R doesn’t have to be difficult; just take it slowly, ask questions, and get help early. Go you!
It is easy to get help with R right at the console. Running the help.start() function at the prompt will start a manual browser. From here, you can do anything from going over the basics of R to reading the nitty-gritty details on how R works internally.

You can get help on a particular function in R if you know its name, by supplying that name as an argument to the help function. For example, let’s say you want to know more about the gsub() function that I sprang on you before. Running the following code:
help("gsub")
# or simply
?gsub
will display a manual page documenting what the function is, how to use it, and examples of its usage.

This rapid accessibility to documentation means that I’m never hopelessly lost when I encounter a function which I haven’t seen before. The downside to this extraordinarily convenient help mechanism is that I rarely bother to remember the order of arguments, since looking them up is just seconds away.
Occasionally, you won’t quite remember the exact name of the function you’re looking for, but you’ll have an idea about what the name should be. For this, you can use the help.search() function.
For tougher, more semantic queries, nothing beats a good old fashioned web search engine. If you don’t get relevant results the first time, try adding the term programming or statistics in there for good measure.
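As a rough sketch of searching by keyword rather than by exact name, the block below looks through help page titles in the base package. The search term "replacement" and the variable names are assumptions chosen for illustration; apropos() is a related utility that the text does not mention, and it matches the names of objects rather than their documentation.

```r
# Search the titles of base package help pages for a keyword.
res <- help.search("replacement", fields = "title", package = "base")
class(res)        # "hsearch"

# At an interactive console, ??replacement is shorthand for a broader search.

# apropos() lists objects whose names match a pattern, which helps when
# only part of a function's name comes to mind.
read.functions <- apropos("^read\\.")
read.functions
```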

## Vectors

Vectors are the most basic data structures in R, and they are ubiquitous indeed. In fact, even the single values that we’ve been working with thus far were actually vectors of length 1. That’s why the interactive R console has been printing [1] along with all of our output.

Vectors are essentially an ordered collection of values of the same atomic data type. Vectors can be arbitrarily large (with some limitations), or they can be just one single value.
The canonical way of building vectors manually is by using the c() function (which stands for combine).
our.vect <- c(8, 6, 7, 5, 3, 0, 9)
our.vect
[1] 8 6 7 5 3 0 9

In the preceding example, we created a numeric vector of length 7 (namely, Jenny’s telephone number).
Note that if we tried to put character data types into this vector as follows:
another.vect <- c("8", 6, 7, "-", 3, "0", 9)
another.vect
[1] "8" "6" "7" "-" "3" "0" "9"
R would convert all the items in the vector (called elements) into character data types to satisfy the condition that all elements of a vector must be of the same type. A similar thing happens when you try to use logical values in a vector with numbers; the logical values would be converted into 1 and 0 (for TRUE and FALSE, respectively). These logicals will turn into "TRUE" and "FALSE" (note the quotation marks) when used in a vector that contains characters.
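These coercion rules can be checked directly at the console. The vector names in this sketch are assumptions introduced for illustration.

```r
# Logical values mixed with numbers become 1 and 0.
num.and.logical <- c(1, TRUE, FALSE)
num.and.logical             # 1 1 0
class(num.and.logical)      # "numeric"

# Logical values mixed with character data become the strings "TRUE"/"FALSE".
char.and.logical <- c("a", TRUE)
char.and.logical            # "a" "TRUE"

# Numbers mixed with character data become strings as well.
char.and.num <- c(1.5, "b")
class(char.and.num)         # "character"

# The logical-to-number rule is what makes counting TRUE values easy.
sum(c(TRUE, FALSE, TRUE))   # 2
```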

## Reproducible data analysis

Reproducible data analysis is much more than a fashionable buzzword. Under any situation where accountability is important, from scientific research to decision making in commercial enterprises, industrial quality control and safety and environmental impact assessments, being able to reproduce a data analysis reaching the same conclusions from the same data is crucial. Most approaches to reproducible data analysis are based on automating report generation and including, as part of the report, all the computer commands used to generate the results presented.

A fundamental requirement for reproducibility is a reliable record of what commands have been run on which data. Such a record is especially difficult to keep when issuing commands through menus and dialogue boxes in a graphical user interface or interactively at a console. Even working interactively at the R console using copy and paste to include commands and results in a report is error prone, and laborious.

A further requirement is to be able to match the output of the R commands to the input. If the script saves the output to separate files, then the user will need to take care that the script saved or shared as a record of the data analysis was the one actually used for obtaining the reported results and conclusions. This is another error-prone stage in the reporting of data analysis. To solve this problem an approach was developed, inspired by what is called literate programming (Knuth 1984). The idea is that running the script will produce a document that includes the listing of the R code used, the results of running this code and any explanatory text needed to understand and interpret the analysis.

Although a system capable of producing such reports with R, called ‘Sweave’ (Leisch 2002), has been available for a couple of decades, it was rather limited and not supported by an IDE, making its use rather tedious. A more recently developed system called ‘knitr’ (Xie 2013), together with its integration into RStudio, has made the use of this type of report very easy. The most recent development is what has been called R notebooks, produced within RStudio. This new feature can produce the readable report of running the script as an HTML file, displaying the code used interspersed with the results within the viewable file, as in earlier approaches. However, this newer approach goes even further: the actual source script used to generate the report is embedded in the HTML file of the report and can be extracted and run very easily, and consequently re-used. This means that anyone who gets access to the output of the analysis in human-readable form also gets access to the code used to generate the report, in computer-executable format.

When searching for answers, asking for advice or reading books, you will be confronted with different ways of approaching the same tasks. Do not allow this to overwhelm you; in most cases it will not matter, as many computations can be done in R, as in any language, in several different ways, still obtaining the same result. The different approaches may differ mainly in two aspects: 1) how readable to humans the instructions given to the computer as part of a script or program are, and 2) how fast the code runs. Unless computation time is an important bottleneck in your work, just concentrate on writing code that is easy for you and others to understand, and consequently easy to check and reuse. Of course, do always check any code you write for mistakes, preferably using actual numerical test cases for any complex calculation or even relatively simple scripts. Testing and validation are extremely important steps in data analysis, so get into this habit while reading this book. Testing how every function works, as I will challenge you to do in this book, is at the core of any robust data analysis or computer programming.
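One lightweight way to build numerical test cases into a script is stopifnot(), which halts with an error as soon as a check fails. The sketch below is an illustration only: std.error is a hypothetical helper written for this example, and the expected values were worked out by hand.

```r
# Hypothetical helper: standard error of the mean, ignoring missing values.
std.error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hand-worked test case: for c(2, 4, 6) the standard deviation is 2 and
# n is 3, so the standard error must be 2 / sqrt(3).
stopifnot(isTRUE(all.equal(std.error(c(2, 4, 6)), 2 / sqrt(3))))

# A missing value should not change the answer.
stopifnot(isTRUE(all.equal(std.error(c(2, NA, 4, 6)), 2 / sqrt(3))))
```

Using all.equal() rather than == avoids spurious failures from floating-point rounding.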

To access help pages through the command prompt we use the function help() or a question mark. Every object exported by an R package (functions, methods, classes, data) is documented. Sometimes a single help page documents several R objects. Usually at the end of the help pages, some examples are given, which tend to help very much in learning how to use the functions described. For example, one can search for a help page at the R console.

## Using R interactively

Decades ago, a physical terminal (keyboard plus text-only screen) was how users communicated with computers, and it was frequently called a console. Nowadays, a text-only interface to a computer, in most cases a window or a pane within a graphical user interface, is still called a console. In our case, this is the R console (Figure 1.1), the native user interface of R.

Typing commands at the R console is useful when one is playing around, rather aimlessly exploring things, or trying to understand how an unfamiliar R function or operator works. Once we want to keep track of what we are doing, there are better ways of using R, which allow us to keep a record of how an analysis has been carried out. The different ways of using R are not exclusive of each other, so most users will use the R console to test individual commands and plot data during the first stages of exploration. As soon as we decide how we want to plot or analyze the data, it is best to start using scripts. This is not enforced in any way by R, but scripts are what really bring to light the most important advantages of using a programming language for data analysis. In Figure 1.1 we can see how the R console looks. The text in red has been typed in by the user, except for the prompt >, and the text in blue is what R has displayed in response. It is essentially a dialogue between the user and R. The console can look different when displayed within an IDE like RStudio, but the only difference is in the appearance of the text rather than in the text itself (cf. Figures 1.1 and 1.2).

The two previous figures showed the result of entering a single command. Figure 1.3 shows how the console looks after the user has entered several commands, each as a separate line of text.

The examples in this book require only the console window for user input. Menu-driven programs are not necessarily bad; they are just unsuitable when there is a need to set very many options and choose from many different actions. They are also difficult to maintain when extensibility is desired, and when independently developed modules of very different characteristics need to be integrated. Textual languages also have the advantage, to be addressed in later chapters, that command sequences can be stored in human- and computer-readable text files. Such files constitute a record of all the steps used, and in most cases make it trivial to reproduce the same steps at a later time. Scripts are a very simple and handy way of communicating to other users how to do a given data analysis.

## Using R in a “batch job”

To run a script, we first need to prepare it in a text editor. Figure 1.4 shows the console immediately after running the script file shown in the text editor. As before, the red text, the command source("my-script.R"), was typed by the user, and the blue text in the console is what was displayed by R as a result of this action. The title bar of the console shows “R-console,” while the title bar of the editor shows the path to the script file that is open and ready to be edited, followed by “R-editor.”

A true “batch job” is not run at the R console but at the operating system command prompt, or shell. The shell is the console of the operating system: Linux, Unix, OS X, or MS-Windows. Figure 1.5 shows how running a script at the Windows command prompt looks. A script can be run at the operating system prompt to do time-consuming calculations with the output saved to a file. One may use this approach on a server, say, to leave a large data analysis job running overnight or even for several days.
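The two ways of running a script can be tried with a throwaway file. In this sketch the two-line script and the names script.path and total are assumptions; a temporary file stands in for my-script.R.

```r
# Write a two-line script to a temporary file (a stand-in for my-script.R).
script.path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script.path)

# Run the whole file in the current session, as source() does at the console.
source(script.path)
total   # 55

# As a batch job, the same kind of file would instead be run from the
# operating system shell, for example:
#   Rscript my-script.R > my-output.txt
```

After source() returns, the objects created by the script remain available in the session, which is what distinguishes it from a separate batch run.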

Integrated Development Environments (IDEs) are used when developing computer programs. IDEs provide a centralized user interface from within which the different tools used to create and test a computer program can be accessed and used in coordination. Most IDEs include a dedicated editor capable of syntax highlighting, and can even report some mistakes related to the programming language in use. One could describe such an editor as the equivalent of a word processor with spelling and grammar checking, one that can alert about spelling and syntax errors for a computer language like R instead of for a natural language like English. In the case of RStudio, the main, but not the only, language supported is R. The main window of an IDE usually displays more than one pane simultaneously. From within the RStudio IDE, one has access to the R console, a text editor, a file-system browser, a pane for graphical output, and several additional tools, such as those for installing and updating extension packages.

Although RStudio supports the development of large scripts and packages very well, it is currently, in my opinion, also the best possible way of using R at the console, as it has the R help system very well integrated both in the editor and in the R console. Figure 1.6 shows the main window displayed by RStudio after running the same script as shown above at the R console (Figure 1.4) and at the operating system command prompt (Figure 1.5). By comparing these three figures, we can see how RStudio is really a layer between the user and an unmodified R executable. The script was sourced by pressing the “Source” button at the top of the editor pane. RStudio, in response to this, generated the code needed to source the file and “entered” it at the console, the same console where we would type any R commands.

## R as a language

R is a computer language designed for data analysis and data visualization. However, in contrast to some other scripting languages, it is, from the point of view of computer programming, a complete language: it is not missing any important feature. In other words, no fundamental operations or data types are lacking (Chambers 2016). I attribute much of its success to the fact that its design achieves a very good balance between simplicity, clarity and generality. R excels at generality thanks to its extensibility, at the cost of only a moderate loss of simplicity, while clarity is ensured by enforced documentation of extensions and support for both object-oriented and functional approaches to programming. The same three principles can also be easily respected by user code written in R.

As mentioned above, R started as a free and open-source implementation of the S language (Becker and Chambers 1984; Becker et al. 1988). We will describe the features of the R language in later chapters. Here I mention, for those with programming experience, that it does have some features that make it different from other frequently used programming languages. For example, R does not have the strict type checks of Pascal or C++. It has operators that can take vectors and matrices as operands, allowing more concise program statements for such operations than other languages. Writing programs, especially reliable and fast code, requires familiarity with some of these idiosyncrasies of the R language. For those using R interactively, or writing short scripts, these idiosyncratic features make life a lot easier by saving typing.
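A small sketch may help those coming from other languages see what vector and matrix operands buy. All names and values below are made up for illustration.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Operators act on whole vectors at once; no explicit loop is needed.
a + b                        # 11 22 33
a * 2                        # 2 4 6

# The loop that a language without vector operands would require.
out <- numeric(length(a))
for (i in seq_along(a)) out[i] <- a[i] + b[i]
identical(out, a + b)        # TRUE

# Matrices are operands too: * is element-wise, %*% is the matrix product.
m <- matrix(1:4, nrow = 2)   # filled column by column
m * m
m %*% m

# No strict type checks: the same name can be rebound to another type.
x <- 5
x <- "five"
```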

## R as a computer program

The R program itself is open-source, and the source code is available for anybody to inspect, modify and use. Only a small fraction of users will directly contribute improvements to the R program itself, but it is possible, and those contributions are important in making R reliable. The executable, the R program we actually use, can be built for different operating systems and computer hardware. The members of the R development team make an important effort to keep the results obtained from calculations done on all the different builds and computer architectures as consistent as possible. The aim is to ensure that computations return consistent results not only across updates to R but also across different operating systems like Linux, Unix (including OS X), and MS-Windows, and across computer hardware.
The R program does not have a graphical user interface (GUI), or menus from which to start different types of analyses. Instead, the user types the commands at the R console (Figure 1.1). The same textual commands can also be saved into a text file, line by line, and such a file, called a “script,” can substitute for repeated typing of the same sequence of commands. When we work at the console typing in commands one by one, we say that we use R interactively. When we run a script, we may say that we run a “batch job.”

The two approaches described above are part of the R program by itself. However, it is common to use a second program as a front-end or middleman between the user and the R program. Such a program allows more flexibility and has multiple features that make entering commands or writing scripts easier. Computations are still done by exactly the same R program. The simplest option is to use a text editor like Emacs to edit the scripts and then run the scripts in R from within the editor. With some editors like Emacs, rather good integration is possible. However, nowadays there are also Integrated Development Environments (IDEs) available for R. An IDE both gives access to the R console in one window and provides a text editor for writing scripts in another window. Of the available IDEs for R, RStudio is currently the most popular by a wide margin.

## Sampling theory of surveys | The Predictive Rank Distribution

Ranks lie at the heart of judgment post-stratification (JPS), and indeed all of ranked set sampling (RSS). Focusing on a single set, we can describe the ranks of the $H$ units in terms of a matrix $P$. Each row of the matrix corresponds to a unit in the set, and each column to a rank within the set. A perfectly ranked sample corresponds to a permutation matrix in which the row for unit $h$, if it has rank $j$, is the $H$-vector with a 1 in position $j$ and 0 in all other positions. We use the notation $p_h$ to represent row $h$ of the matrix $P$.

JPS and RSS rely on the rank matrix $P$ but do not rely on an assumption of perfect ranking. Whether the ranks come from subjective judgement or from measured covariates, they yield a permutation matrix $P$, provided there are no ties in the ranking. In the event that there are ties, perhaps due to a pair of rankers (or measured covariates) providing different ranking matrices, $P_1$ and $P_2$, MacEachern et al. (2004) suggested use of the average $\bar{P}=0.5 P_1+0.5 P_2$. This is appropriate when there is no reason to prefer one ranking over the other. Replacement of the permutation matrix $P$ with the average necessitates replacement of the estimator (3) with one that allows non-indicator vectors $p_h$. Relying on the extensive body of work on ratio estimation in survey sampling, MacEachern et al. (2004) suggested the estimator in (4). This estimator effectively prorates the response across the strata to which it may belong.

The replacement of an $H \times H$ permutation matrix $P$ with a convex combination of permutation matrices has been used productively in RSS by a number of authors, primarily when concerned with creating models for imperfect rankings (e.g. Bohn and Wolfe, 1994; Frey, 2007, while Dell and Clutter, 1972 and Fligner and MacEachern, 2006 developed models for imperfect ranking of differing form). The permutation matrices represent the extreme points of the set of doubly stochastic matrices, that is, matrices with non-negative entries whose row sums and column sums all equal one. As a consequence, all other doubly stochastic matrices may be represented as a weighted average of permutation matrices.
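The averaging of two rankings can be illustrated numerically. The sketch below assumes a set of $H = 3$ units and two hypothetical rankers who disagree about the first two units; rank.matrix, ranks1, ranks2 and P.bar are names introduced here, not taken from the cited papers.

```r
# Hypothetical rankings of H = 3 units by two rankers.
ranks1 <- c(1, 2, 3)
ranks2 <- c(2, 1, 3)

# Build the permutation matrix: row h has a 1 in the column given by the
# rank of unit h, and 0 elsewhere.
rank.matrix <- function(ranks) {
  H <- length(ranks)
  P <- matrix(0, nrow = H, ncol = H)
  P[cbind(seq_len(H), ranks)] <- 1
  P
}

P1 <- rank.matrix(ranks1)
P2 <- rank.matrix(ranks2)

# Equal-weight average of the two rankings.
P.bar <- 0.5 * P1 + 0.5 * P2
P.bar

# The average is no longer a permutation matrix, but it is doubly
# stochastic: non-negative entries, with every row and column summing to 1.
rowSums(P.bar)
colSums(P.bar)
```

Rows 1 and 2 of P.bar split their weight evenly between ranks 1 and 2, which is the prorating across strata described above; row 3 is unchanged because the rankers agree on it.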

The use of measured covariates for JPS allows one to build a model for the response $Y$ as a function of the measured covariates, $\mathbf{X}$. The model may be constructed from the data at hand, or it may have been developed in previous studies. With more than one covariate, a regression model for $Y$ on $\mathbf{X}$ effectively transforms the vector of covariates into a single covariate while capturing much of the information connecting covariate to response. If the units in a set are ranked on the fitted value from the model when the covariate distribution is continuous, there will be no ties among the covariate values, ranking will be unambiguous, and the ranking matrix $P$ will be a permutation matrix. Chen et al. (2005) took this approach to form a logistic regression model for a binary response.

## Simulation Study

This section presents the results of simulation studies comparing the performance of the various estimators of the mean based on a JPS sample. The findings for existing estimators are in line with the results in Wang et al. (2006). They also highlight the value that the predictive rank probabilities bring to estimation, particularly for the new estimator in (15).

The first study investigates the performance of eight estimators when the model that generates the data is fully known and is exactly right. This allows us to look at the potential performance of the estimators, exclusive of uncertainty about the model. Large sample sizes let us compare the asymptotic performance of the estimators.

The eight estimators are JPS1 from (3), a plug-in estimator based on the rank of $E\left[Y \mid X_1, X_2\right]$ (LS), OLS and WLS from Wang et al. (2006), TRs from (4), JPS2 and JPS3 from (14) and (15) and REG from (10). JPS2 and JPS3 make use of the predictive rank distribution. The estimator TRs has the same form as JPS2 but, as in MacEachern et al. (2004), uses the two ranks from the concomitants instead of the model-based predictive rank distribution. The REG estimator makes direct use of the covariates.

The model is the following. There are $n$ sets, each consisting of $H$ units. There are two covariates and a single response of interest. The covariates are measured on all $nH$ units, while the response is measured for a single unit in each set. The vector $\left(X_1, X_2, Y\right)$ follows a multivariate normal distribution with standard normal marginal distributions and covariances (correlations) specified in Tables 1, 2, and 3. The varied correlations range from a strong relationship between the concomitants and $Y$ to a relatively weak relationship between them. Sample sizes $n=20$, $50$ and $100$ are investigated for set size $H=2$. For larger set sizes, results are presented only for $n=50$ and $100$, because with $n=20$ some of the estimators did not exist for some replicates. For the simulation, 10,000 replicates were used.

The tables present the accuracy of the various estimators relative to the sample mean based on a simple random sample (SRS). The entries are the ratio of the MSE for the SRS to the MSE of the estimator in question. A number greater than 1 indicates smaller MSE for the estimator than for SRS.
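The structure of such a comparison can be sketched in a few lines. Everything specific in this sketch is an assumption made for illustration, not the study's design: a single covariate instead of two, a correlation of 0.9, $H = 3$, 2,000 replicates, and a JPS estimator taken to be the average of the rank-stratum means over the non-empty strata (the source's equation (3) is not reproduced in this text).

```r
set.seed(1)

# Assumed settings, chosen for illustration only.
rho  <- 0.9    # correlation between covariate and response
n    <- 50     # number of sets
H    <- 3      # units per set
reps <- 2000   # fewer than the study's 10,000 replicates, to keep it fast

one.replicate <- function() {
  # X and Y are standard normal with correlation rho; one row per set.
  x <- matrix(rnorm(n * H), nrow = n)
  y <- rho * x + sqrt(1 - rho^2) * matrix(rnorm(n * H), nrow = n)

  # Unit 1 of each set is the fully measured one; its rank within the set
  # is computed from the covariate.
  r <- apply(x, 1, function(row) rank(row)[1])
  y.measured <- y[, 1]

  # SRS estimator: plain mean of the measured responses.
  srs <- mean(y.measured)
  # Assumed JPS estimator: average of the means of the non-empty rank strata.
  jps <- mean(tapply(y.measured, r, mean))
  c(srs = srs, jps = jps)
}

est <- replicate(reps, one.replicate())

# The true mean is 0, so each MSE is the mean of the squared estimates.
mse <- rowMeans(est^2)
relative.accuracy <- unname(mse["srs"] / mse["jps"])
relative.accuracy
```

As in the tables, relative.accuracy is the MSE of the SRS mean divided by the MSE of the other estimator, so values above 1 favor the JPS estimator.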

## 统计代写|抽样调查作业代写sampling theory of survey代考|The Predictive Rank Distribution

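JPS and RSS rely on the rank matrix $P$ but do not depend on the assumption of perfect ranking. Whether the ranks come from subjective judgement or from a measured covariate, they produce a permutation matrix $P$, provided there are no ties in the ranking. If there are ties, perhaps because a pair of rankers (or measured covariates) supplies different rank matrices, $P_1$ and $P_2$, MacEachern et al. (2004) recommend using the average $\bar{P}=0.5 P_1+0.5 P_2$. This is appropriate when there is no reason to prefer one ranking over the other. Replacing the permutation matrix $P$ with the average requires replacing estimator (3) with an estimator that allows non-indicator vectors $p_H$. Drawing on the extensive work on ratio estimation in survey sampling, MacEachern et al. (2004) recommend the estimator in (4). This estimator effectively apportions a response among the strata to which it might belong.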
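As a small illustration of the averaging step, the sketch below builds a rank matrix for each of two rankers and averages them. The orientation of the matrix (row $i$ is the indicator vector of the rank of unit $i$) and the scores themselves are assumptions made for the example, not conventions taken from the paper.

```python
import numpy as np

def rank_matrix(values):
    """Return a 0/1 matrix whose row i marks the rank (column) of unit i; assumes no ties."""
    values = np.asarray(values)
    ranks = values.argsort().argsort()  # 0-based ranks
    P = np.zeros((values.size, values.size))
    P[np.arange(values.size), ranks] = 1.0
    return P

ranker_1 = [2.3, 0.7, 1.5]  # hypothetical scores from the first ranker
ranker_2 = [2.0, 1.1, 0.9]  # hypothetical scores from the second ranker

P1, P2 = rank_matrix(ranker_1), rank_matrix(ranker_2)
P_bar = 0.5 * P1 + 0.5 * P2  # rows are no longer indicator vectors where the rankers disagree
print(P_bar)
```

Both rankers place the first unit last, so its row stays an indicator vector; they disagree on the other two units, whose rows split their weight equally between the two lowest ranks.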

## Consistency of JPS Estimators

The literature on RSS and JPS demonstrates the consistency of the estimators $\bar{Y}_{rss}$ and $\hat{\mu}_{jps1}$ in (2) and (3), respectively, under minimal conditions. These traditional estimators borrow heavily from the design-based perspective of survey sampling, where (approximate) unbiasedness is prized. Small variance is the secondary consideration. Modern work with surveys adjusts the balance, relying more heavily on models, especially where missing data is a concern (Lohr, 2010). With this perspective, a bit more bias is allowed, provided it is accompanied by a substantial reduction in variance. Simulations are used to evaluate the estimators’ performance when the model does not hold. Wang et al. (2006) pursued this path.

We work in the infinite population setting where we collect IID sets, observing a single member of each set. As such, we envision that the data come from some distribution which we refer to as the “true model”. In addition, there is a model used to construct the estimator. We assume that $\mu$ exists under both models. Consistency concerns arise when the true model and that used for analysis differ.

To set the framework for our consideration of robustness, we split the models into two parts. The first is the conditional distribution of $Y \mid \mathbf{R}$. The second is the distribution of $\mathbf{R}$ for the unit that is to be fully measured. The true and analysis models may differ in one or both of these aspects. A given estimator may be robust to differences in one portion of the model but not to differences in the other portion of the model. We consider each of the estimators in turn, presenting a heuristic argument for or against consistency. Our statements are to be taken loosely; simulations appearing in a later section support our claims. Formal statements and proofs of these results await another venue.

We briefly note that the estimators $\hat{\mu}_{jps1}$ and $\hat{\mu}_{jps2}$ are consistent for $\mu$. These estimators do not rely on a model, and so we need not consider the gap between the true and analysis models. Consistency was established in MacEachern et al. (2004).
The estimators based on parametric models, $\hat{\mu}_{oLS}$ and $\hat{\mu}_{wLS}$, may or may not be consistent. We begin with $\hat{\mu}_{oLS}$. For a given stratum $\mathbf{r}$, an offset observation, $Y-\delta_{[\mathbf{r}]}=Y-\mu_{[\mathbf{r}]}+\mu$, has mean $\mu$, provided the true and analysis models agree for the distribution of $Y \mid(\mathbf{R}=\mathbf{r})$ so that $\mu_{[\mathbf{r}]}$ has the same value under the two models and the offset has been correctly specified (or will be estimated consistently). Averaging across the strata, we see that the estimator targets the quantity $\mu-\sum_{\mathbf{r}} \pi_{\mathbf{r}} \delta_{[\mathbf{r}]}$. The estimator will be consistent for $\mu$ if (7) holds so that the average offset is zero. It is clear that this will be the case when the distribution on $\mathbf{R}$ and the conditional mean of $Y \mid(\mathbf{R}=\mathbf{r})$ are correctly specified for each of the $H^2$ strata. The first ensures accuracy of the $\pi_{\mathbf{r}}$, while the second ensures accuracy of the $\delta_{[\mathbf{r}]}$. Together, these imply (7). While these conditions stop short of full agreement between the true and analysis models, they are nearly there.
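
The averaging step can be written out explicitly. As a sketch, assume $\pi_{\mathbf{r}}$ and $\mu_{[\mathbf{r}]}$ denote the stratum probabilities and stratum means under the true model, while $\delta_{[\mathbf{r}]}$ denotes the offsets supplied by the analysis model (this split of notation is introduced here only for the illustration). Conditioning on the stratum of the measured unit gives

$$E\left[Y-\delta_{[\mathbf{R}]}\right]=\sum_{\mathbf{r}} \pi_{\mathbf{r}}\left(\mu_{[\mathbf{r}]}-\delta_{[\mathbf{r}]}\right)=\mu-\sum_{\mathbf{r}} \pi_{\mathbf{r}} \delta_{[\mathbf{r}]}.$$

If the analysis model is correct in both parts, then $\delta_{[\mathbf{r}]}=\mu_{[\mathbf{r}]}-\mu$ and

$$\sum_{\mathbf{r}} \pi_{\mathbf{r}} \delta_{[\mathbf{r}]}=\sum_{\mathbf{r}} \pi_{\mathbf{r}} \mu_{[\mathbf{r}]}-\mu \sum_{\mathbf{r}} \pi_{\mathbf{r}}=\mu-\mu=0,$$

so the average offset vanishes and the target is $\mu$.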

## Covariates or Ranks?

The use of the vector of measured covariates, $\mathbf{X}_i$, to induce the ranks opens up many possibilities. One might ask whether ranking on $X_1$ and $X_2$ is optimal, or whether there is a mapping to another set of variates that leads to a better estimator. One possibility stands out, especially when relying on a multivariate normal model for $(Y, \mathbf{X})$. The vector $\mathbf{X}$ can be mapped to the regression of $Y$ on $\mathbf{X}$ and its orthogonal complement. Under the multivariate normal model, this corresponds to an affine transformation of the covariates, $\mathbf{X}$, to a new set of covariates, say $\mathbf{W}=A \mathbf{X}$. The first coordinate of $\mathbf{W}$ is $E[Y \mid \mathbf{X}]$. The second coordinate is independent of both the first coordinate and the response and can be dropped.

In practice, we do not expect to know the relationship between covariates and response. With this in mind, we might estimate the relationship by fitting a model for $Y \mid \mathbf{X}$ to our $n$ fully observed cases. Having done so, the fitted values become the first coordinate of $\mathbf{W}$. Often, the fitted values are estimates of $E[Y \mid \mathbf{W}]=E[Y \mid \mathbf{X}]$. From here, a natural estimate of $\mu$ can be obtained by averaging the fitted values (estimated means) for all $nH$ observations. Following this path, the ranks have disappeared, and we are no longer in the setting of RSS or JPS.

The “covariate” approach leads to a natural estimator in the regression setting. The model for $Y \mid \mathbf{X}$ is a constant variance linear regression model. The chain of algebra below yields the estimator when the covariance matrix for $\mathbf{X}$ and $Y$ is known.

Define $\bar{Y}_{srs}$ and $\overline{\mathbf{X}}_{srs}$ to be the means of the response and of the covariates, respectively, for the $n$ fully measured units. Take $\overline{\mathbf{X}}$ (a vector) to be the mean of the covariates for all $nH$ units. With $Y$ in position 1 followed by the vector $\mathbf{X}$ in the trailing positions, the covariance matrix can be written in partitioned form, with $\Sigma_{12}$ the covariance of $Y$ and the vector $\mathbf{X}$ and $\Sigma_{22}$ the variance matrix of $\mathbf{X}$. Then
$$\hat{\mu}_{reg} = \frac{1}{nH} \sum_{i=1}^n \sum_{h=1}^H \hat{E}[Y_{ih} \mid \mathbf{X}_{ih}] = \frac{1}{nH} \sum_{i=1}^n \sum_{h=1}^H \left[\bar{Y}_{srs} + \Sigma_{12} \Sigma_{22}^{-1}(\mathbf{X}_{ih} - \overline{\mathbf{X}}_{srs})\right] = \bar{Y}_{srs} + \Sigma_{12} \Sigma_{22}^{-1}(\overline{\mathbf{X}} - \overline{\mathbf{X}}_{srs})$$
This estimator is constructed by replacing the unknown parameters with estimates from the $n$ fully measured units. If the covariance matrix is not known, it is replaced by the estimated covariance from the fully measured units.
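
The closed form $\bar{Y}_{srs} + \Sigma_{12} \Sigma_{22}^{-1}(\overline{\mathbf{X}} - \overline{\mathbf{X}}_{srs})$ is short enough to check numerically. The sketch below is a minimal illustration, assuming a known covariance matrix with hypothetical entries, a measured unit chosen uniformly at random in each set, and a much smaller number of replicates than the 10,000 used in the study. It also reports the ratio of MSEs (SRS mean over regression estimator) in the sense used for the tables.

```python
import numpy as np

def mu_reg(y_measured, x_measured, x_all, sigma_12, sigma_22):
    """Regression estimator with known covariance:
    ybar_srs + Sigma_12 Sigma_22^{-1} (xbar_all - xbar_srs)."""
    beta = np.linalg.solve(sigma_22, sigma_12)  # Sigma_22^{-1} Sigma_21 (Sigma_22 is symmetric)
    return y_measured.mean() + beta @ (x_all.mean(axis=0) - x_measured.mean(axis=0))

rng = np.random.default_rng(0)
n, H, reps = 20, 2, 2000                        # reps kept small for illustration
sigma_12 = np.array([0.8, 0.6])                 # hypothetical Cov(Y, X)
sigma_22 = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical Var(X)
# Full covariance with Y in position 1 followed by X, as in the text.
cov = np.block([[np.array([[1.0]]), sigma_12[None, :]],
                [sigma_12[:, None], sigma_22]])

srs, reg = np.empty(reps), np.empty(reps)
idx = np.arange(n)
for b in range(reps):
    draws = rng.multivariate_normal(np.zeros(3), cov, size=(n, H))
    y, x = draws[..., 0], draws[..., 1:]
    measured = rng.integers(0, H, size=n)       # assumption: one random unit per set
    y_m, x_m = y[idx, measured], x[idx, measured]
    srs[b] = y_m.mean()
    reg[b] = mu_reg(y_m, x_m, x.reshape(-1, 2), sigma_12, sigma_22)

# The true mean is 0, so the MSE is the mean squared estimate.
relative_accuracy = np.mean(srs**2) / np.mean(reg**2)
print(relative_accuracy)
```

A value above 1 indicates a smaller MSE for the regression estimator than for the sample mean of the $n$ measured units.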

Why would one choose to pass from the covariate $\mathbf{X}$ to the coarser summary of its rank? The advantage of working with the rank-based estimators is their ability to handle deficiencies in the assumed model for $(\mathbf{X}, Y)$. A well-chosen estimator either will be consistent or will be Fisher consistent for a value very near the truth. (Parenthetically, estimators based directly on $(\mathbf{X}, Y)$ may also be consistent.) The rank-based estimators also seem to be better able to handle poorer quality covariates, including those whose distribution is not fully stable from one set to another. They also lead to methods with enhanced robustness for data sets with missing covariate values and imperfect models for the missing covariates given the observed covariates.

## Ranked Set Sampling and Judgement Post-stratification

Stokes’ pioneering work (Stokes, 1977) brought measured covariates to ranked set sampling (RSS). Briefly restating her work and establishing notation, consider a set of $nH$ units that are partitioned at random into $n$ sets, each of size $H$. The units are presumed to form a random sample from some distribution. Within a given set, we begin with $\left(X_h, Y_h\right)$, $h=1, \ldots, H$. These units are ranked on the $X_h$, so that $X_{(r: H)}$ is the $r$th order statistic in the set. The measured response, $Y_{[r: H]}$, associated with this unit is its concomitant.

To draw a RSS of size $n$ from such a population, sample sizes $n_h$, $h=1, \ldots, H$, are specified, with $\sum_{h=1}^H n_h=n$. One unit is drawn from each of the $n$ sets; in $n_h$ sets, the unit ranked $h$ is selected. The resulting sample is a RSS.

The earliest description of RSS appears in McIntyre (1952) (republished as McIntyre, 2005). In McIntyre’s description of the technique, ranking is based on the subjective judgement of an experimenter who examines each set of $H$ units, specifying the ranks of the units in the set. Once the units in each set have been ranked, the sample is drawn as described above and the response of interest, $Y$, is measured on the $n$ sampled units. Extending our notation to capture both set and rank within set, the mean of the $nH$ units is
$$\bar{Y}=(n H)^{-1} \sum_{i=1}^n \sum_{h=1}^H Y_{i h},$$
where $Y_{ih}$ is the response of the unit with rank $h$ in set $i$. Suppressing the notation for the rank, define $Y_i$ to be the $i$th of the $n$ sampled units. Provided $n_h>1$ for all $h$,
$$\bar{Y}_{rss}=H^{-1} \sum_{h=1}^H \bar{Y}_h,$$
where $\bar{Y}_h$ is the sample mean of the $n_h$ sampled units with rank $h$. The RSS estimator is unbiased: $E\left[\bar{Y}_{rss} \mid \bar{Y}\right]=\bar{Y}$ for any collection of $nH$ units. Furthermore, when the units are a random sample from a distribution with mean $\mu=E[Y]$, $E\left[\bar{Y}_{rss}\right]=E[\bar{Y}]=\mu$. The goal of RSS is to estimate $\mu$. Stokes and Sager (1988) cast estimation of a cumulative distribution function as estimation of a proportion (mean) for all cut points on the real line.

RSS with estimation following (2) is robust to variation in the specifics of how the ranks are created. When created subjectively, better ranking leads to greater separation of the means of the rank classes (or strata), in turn leading to greater reduction in variance relative to estimators based on a random sample from the population. When ranks arise from a measured covariate, the same holds.

## Multivariate Order Statistics and JPS

In Wang et al. (2006), Stokes and coauthors posed the intriguing question of how to use multiple covariates to convey information about the ranks of units for use in JPS. Their solution is to rank on each of the distinct covariates. In the case of a continuous bivariate covariate, $\left(X_1, X_2\right)$, each of the units in the set would be assigned a pair of ranks, one for $X_1$ and the other for $X_2$. This pair of ranks defines the post-stratum (or rank class) of the unit. For a set of size $H$, there are $H^2$ post-strata. We denote these post-strata with $\mathbf{r}=\left(r_1, r_2\right)$, where $r_1, r_2 \in\{1, \ldots, H\}$. We focus on a bivariate covariate but note that the technique extends to covariates of greater dimension. Figure 1 illustrates the situation for a bivariate order statistic for set size $H=5$. The increase in the number of post-strata from $H$ to $H^2$ necessitates reconsideration of the basic post-stratification estimator (3).

Marginally, each covariate for the measured unit will have rank $r_i=h$ with probability $1/H$ for $i=1,2$ and $h=1, \ldots, H$. The joint distribution of $\mathbf{R}$ leads to the stratum probability $\pi_{\mathbf{r}}=P(\mathbf{R}=\mathbf{r})$. In general, these probabilities can be found via numerical integration if the model for $\left(X_1, X_2\right)$ is fully specified. Some of the $\pi_{\mathbf{r}}$ may be much smaller than $H^{-2}$, leading to a large probability that the estimator is undefined.

Wang et al. (2006) handled this issue by appealing to a parametric model as an aid to estimation. The authors defined $\mu_{[\mathbf{r}]}=E[Y \mid \mathbf{R}=\mathbf{r}]$. The value of $\mu_{[\mathbf{r}]}$ can be found by numerical integration over the conditional distribution of $Y \mid \mathbf{R}$. Once the stratum means are in place, they are connected to the mean of $Y$ via the expression $\mu=\sum_{\mathbf{r}} \pi_{\mathbf{r}} \mu_{[\mathbf{r}]}$. It is helpful to introduce the difference between the stratum mean and the overall mean, $\delta_{[\mathbf{r}]}=\mu_{[\mathbf{r}]}-\mu$. The authors suggested estimation by ordinary least squares applied to a model for $\mu$, with observations in stratum $\mathbf{r}$ offset by $\delta_{[\mathbf{r}]}$. The data are $\left(Y_i, \mathbf{r}_i\right)$, $i=1, \ldots, n$, and the estimator is
$$\hat{\mu}_{oLS}=n^{-1} \sum_{i=1}^n\left(Y_i-\delta_{\left[\mathbf{r}_i\right]}\right).$$
The estimator $\hat{\mu}_{oLS}$ can be viewed in two stages: in the first, each observation is bias-corrected by subtracting its $\delta_{[\mathbf{r}]}$; in the second, the sample mean of the bias-corrected observations is computed. Partitioning the sample into strata reduces the within-stratum variances. Removing bias and then using the sample mean ensures that each observation receives equal weight in the estimator. Together, these two stages lead to substantial variance reduction, especially for relatively large set sizes.
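
The two-stage view of $\hat{\mu}_{oLS}$ can be sketched in code. The text obtains $\pi_{\mathbf{r}}$ and $\mu_{[\mathbf{r}]}$ by numerical integration; the sketch below swaps in plain Monte Carlo simulation instead, which is an assumption of this example rather than the authors' method. The function names, the covariance values, and the use of 0-based ranks are likewise hypothetical.

```python
import numpy as np

def stratum_quantities(cov, H, n_sim, rng):
    """Monte Carlo approximations to pi_r and mu_[r] for the H*H rank pairs,
    under a zero-mean trivariate normal model for (X1, X2, Y)."""
    draws = rng.multivariate_normal(np.zeros(3), cov, size=(n_sim, H))
    ranks = draws[..., :2].argsort(axis=1).argsort(axis=1)  # 0-based within-set ranks
    # By exchangeability, unit 0 of each simulated set stands in for the measured unit.
    r1, r2, y = ranks[:, 0, 0], ranks[:, 0, 1], draws[:, 0, 2]
    counts = np.zeros((H, H))
    sums = np.zeros((H, H))
    np.add.at(counts, (r1, r2), 1.0)
    np.add.at(sums, (r1, r2), y)
    pi = counts / n_sim
    mu_r = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return pi, mu_r

def mu_ols(y_measured, ranks_measured, delta):
    """n^{-1} * sum_i (Y_i - delta_[r_i]); ranks_measured holds 0-based rank pairs."""
    offsets = delta[ranks_measured[:, 0], ranks_measured[:, 1]]  # stage 1: bias correction
    return np.mean(y_measured - offsets)                         # stage 2: sample mean

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.5, 0.8],
                [0.5, 1.0, 0.6],
                [0.8, 0.6, 1.0]])          # hypothetical correlations, order (X1, X2, Y)
H = 3
pi, mu_r = stratum_quantities(cov, H, 200_000, rng)
delta = mu_r - np.sum(pi * mu_r)           # delta_[r] = mu_[r] - mu

# One simulated JPS-style sample of n = 50 sets; unit 0 is the measured unit.
sample = rng.multivariate_normal(np.zeros(3), cov, size=(50, H))
sample_ranks = sample[..., :2].argsort(axis=1).argsort(axis=1)
estimate = mu_ols(sample[:, 0, 2], sample_ranks[:, 0, :], delta)
print(estimate)
```

Rank pairs that never occur in the simulation receive a stratum mean of zero here; with a fully specified model, numerical integration would avoid that shortcut.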
## Bayesian approach

Another inferential approach in survey sampling is Bayesian, as we now discuss. For $\mathrm{Y}=\left(y_1, \ldots, y_i, \ldots, y_N\right)$, let $\Omega=\left\{\mathrm{Y} \mid -\infty<a_i \leq y_i \leq b_i<+\infty, \text{ with } a_i, b_i \text{ known or unknown}\right\}$, called the universal parametric space. For a sample $s=\left(i_1, \ldots, i_n\right)$ and survey data $d=(s, y)=\left(\left(i_1, y_{i_1}\right), \ldots,\left(i_n, y_{i_n}\right)\right)$, with $y=\left(y_{i_1}, \ldots, y_{i_n}\right)$, let us write
$$P_{\mathrm{Y}}(d) = \text{Prob}(d) = p(s) I_{\mathrm{Y}}(d)$$
where
$$I_{\mathrm{Y}}(d)= \begin{cases}1 & \text{if } \mathrm{Y} \in \Omega_d \\ 0 & \text{if } \mathrm{Y} \notin \Omega_d,\end{cases}$$
writing $\Omega_d=\left\{\mathrm{Y} \mid -\infty<a_j \leq y_j \leq b_j<\infty \text{ for } j \neq i_1, \ldots, i_n \text{ but } y \text{ is as observed}\right\}$. We then call $P_{\mathrm{Y}}(d)$ the probability of observing the survey data $d$ when $\mathrm{Y}$ is in the underlying parametric space.

A survey design $p$ is called ‘informative’ if $p(s)$ involves any element of $\mathrm{Y}$, and it is called ‘non-informative’ in case $p(s)$ involves no element of $\mathrm{Y}$. An informative design may be contemplated if, for example, sampling proceeds by choosing an element $i_1$, observing the $y_{i_1}$-value and allowing the value of $p\left(i_2 \mid\left(i_1, y_{i_1}\right)\right)$ to involve $y_{i_1}$, and likewise choosing successive elements in $s$ utilizing the $y$-values for the units already drawn in it. But, generally, a design $p$ is ‘non-informative’. In case $p$ is non-informative, $\operatorname{Prob}(d)=p(s)$, which is a constant free of $\mathrm{Y}$ so long as the underlying $\mathrm{Y}$ belongs to $\Omega_d$, i.e. it is consistent with the observed survey data at hand.

We take $P_{\mathrm{Y}}(d)$ also as the ‘likelihood’ of $\mathrm{Y}$ given the data $d$ and write it as
$$L_d(\mathrm{Y}) = P_{\mathrm{Y}}(d) = p(s) I_{\mathrm{Y}}(d).$$
Thus, for a ‘non-informative’ design, the likelihood is a constant, free of $\mathrm{Y}$ so long as $\mathrm{Y}$ is in $\Omega_d$, i.e. $\mathrm{Y}$ is consistent with the observed data.

## Minimal sufficiency

For data $d_1$ and a statistic $t$,
$$P_{\mathrm{Y}}(d_1) = P_{\mathrm{Y}}(d_1 \cap t(d_1)) = P_{\mathrm{Y}}(t(d_1))\, C_1, \text{ with } C_1 \text{ a constant,}$$
and likewise $P_{\mathrm{Y}}(d_2) = P_{\mathrm{Y}}(t(d_2))\, C_2$, with $C_2$ a constant. Since $t(d_1)=t(d_2)$, it follows that
$$P_{\mathrm{Y}}(d_1) = P_{\mathrm{Y}}(t(d_1))\, C_1 = P_{\mathrm{Y}}(t(d_2))\, C_1 = P_{\mathrm{Y}}(d_2) \frac{C_1}{C_2}.$$
Since $d^*$ is a sufficient statistic, $P_{\mathrm{Y}}\left(d_1^*\right)=P_{\mathrm{Y}}\left(d_1\right) C_3$, where $C_3$ is a constant.
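
The likelihood $L_d(\mathrm{Y})=p(s) I_{\mathrm{Y}}(d)$ from the Bayesian approach section can be illustrated with a toy population. The sketch below assumes a non-informative design, so that $p(s)$ is passed in as a fixed number; the function names, the population of size 4, and the bounds are all hypothetical.

```python
import numpy as np

def indicator_consistent(Y, sample, y_observed, a, b):
    """I_Y(d): 1 if Y matches the observed values on the sampled units and
    satisfies a_j <= y_j <= b_j on the unsampled units, else 0."""
    Y, a, b = map(np.asarray, (Y, a, b))
    sample = np.asarray(sample)
    unsampled = np.ones(Y.size, dtype=bool)
    unsampled[sample] = False
    matches = np.array_equal(Y[sample], np.asarray(y_observed))
    in_bounds = np.all((a[unsampled] <= Y[unsampled]) & (Y[unsampled] <= b[unsampled]))
    return int(matches and in_bounds)

def likelihood(Y, sample, y_observed, a, b, p_s):
    """L_d(Y) = p(s) * I_Y(d) for a non-informative design with selection probability p_s."""
    return p_s * indicator_consistent(Y, sample, y_observed, a, b)

# Toy population of N = 4 units; units 0 and 2 were sampled (0-based labels).
sample, y_observed = [0, 2], [3.0, 5.0]
a, b = np.zeros(4), np.full(4, 10.0)
p_s = 1 / 6  # e.g. simple random sampling of 2 units out of 4

print(likelihood([3.0, 1.0, 5.0, 9.0], sample, y_observed, a, b, p_s))  # consistent with d
print(likelihood([3.0, 1.0, 4.0, 9.0], sample, y_observed, a, b, p_s))  # contradicts y_observed
```

Every population vector consistent with the data receives the same likelihood $p(s)$, which is the sense in which the likelihood is flat over $\Omega_d$.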

