Bayesian statistics | Exploratory data analysis methods

Non-spatial graphical exploration

Recall the air pollution data set nysptime introduced in Section 1.3.1, which contains the daily maximum ozone concentration values at the 28 sites shown in Figure 1.1 in the state of New York for the 62 days in July and August 2006. From this data set we have created a spatial data set, named nyspatial, which contains the average air pollution and the average values of the three covariates at the 28 sites. Figure 3.1 provides a histogram for the response, average daily ozone concentration levels, at the 28 monitoring sites. The histogram is not symmetric and bell-shaped, but it is compatible with a unimodal distribution for the response. The R command used to draw the plot is given below:
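The original listing is not reproduced here. A minimal sketch of a command of this kind follows, assuming that the ggplot2 and bmstdr packages are installed, that nyspatial ships with bmstdr, and that the site averages are stored in a column named yo3. The column name, colors and axis labels are assumptions rather than details taken from the text.

```r
library(ggplot2)

# Assumption: nyspatial is provided by the bmstdr package, one row per site
data("nyspatial", package = "bmstdr")

# Histogram of the site-level average ozone values.
# Assumption: the response column is called yo3.
p <- ggplot(nyspatial, aes(x = yo3)) +
  geom_histogram(binwidth = 4.5, color = "black", fill = "white") +
  labs(x = "Average daily ozone concentration", y = "Count")
print(p)
```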

The geom_histogram command has been invoked with a bin width argument of 4.5. The shape of the histogram will change if a different bin width is supplied. As is well known, a smaller bin width gives less smoothing, while a larger bin width gives more smoothing by reducing the number of classes. It is also possible to adopt a different scale, e.g. square root or logarithm, but we have not done so here in order to illustrate modeling on the original scale of the data. We shall explore different scales for the spatio-temporal version of this data set.
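The effect of the bin width on the number of classes can be checked with a few lines of base R. The sketch below uses simulated values as a hypothetical stand-in for the 28 site averages (they are not the nyspatial values), and the helper bin_counts is introduced here only for illustration.

```r
# Hypothetical stand-in for 28 site averages (simulated, not the real data)
set.seed(1)
ozone <- rnorm(28, mean = 48, sd = 6)

# Bin the values with a fixed bin width and return the count in each class
bin_counts <- function(x, width) {
  # Breaks start below the minimum and always extend past the maximum
  breaks <- seq(floor(min(x)), ceiling(max(x)) + width, by = width)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

# A smaller bin width gives more classes, a larger one gives fewer
for (w in c(2, 4.5, 9)) {
  counts <- bin_counts(ozone, w)
  cat("bin width", w, "->", length(counts), "classes\n")
}
```

Every observation falls in exactly one class whatever the width, so only the resolution of the picture changes, not the data behind it.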

Figure 3.2 provides a pair-wise scatter plot of the response against the three explanatory covariates: maximum temperature, wind speed and relative humidity. The diagonal panels in this plot provide kernel density estimates of the variables. This plot reveals that wind speed correlates the most with ozone levels at this aggregated average level. As is well known, see e.g. Sahu and Bakar (2012a), the maximum temperature also positively correlates with the ozone levels. Relative humidity is seen to have the least amount of correlation with ozone levels. This plot has been obtained using the commands.
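The original commands are not reproduced here. As a rough base R alternative, the sketch below prints the correlation matrix and draws a plain scatter plot matrix, without the kernel density estimates on the diagonal. It assumes that bmstdr is installed and that the four variables are stored in nyspatial under the column names yo3, xmaxtemp, xwdsp and xrh, which are assumptions rather than details taken from the text.

```r
# Assumption: nyspatial is provided by the bmstdr package
data("nyspatial", package = "bmstdr")

# Assumed column names for the response and the three covariates
vars <- nyspatial[, c("yo3", "xmaxtemp", "xwdsp", "xrh")]

# Pairwise Pearson correlations, to compare the strength of each relationship
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Scatter plot matrix of the response and the covariates
pairs(vars,
      labels = c("Ozone", "Max temperature", "Wind speed", "Relative humidity"))
```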

Exploring spatio-temporal point reference data

This section illustrates EDA methods with the nysptime data set in bmstdr. To decide the modeling scale, Figure 3.5 plots the mean against the variance for each site on the original scale and also on the square root scale for the response. A stronger linear mean-variance relationship, with a larger slope for the superimposed regression line, is observed on the original scale, making this scale less suitable for modeling purposes. This is because in linear statistical modeling we often model the mean as a function of the available covariates and assume equal variance (homoscedasticity) for the residual differences between the observed and modeled values. A word of caution here is that the right panel does not show a complete lack of mean-variance relationship. However, we still prefer to model on the square root scale to stabilize the variance, and in this case the predictions we make in Chapter 7 for ozone concentration values do not become negative.
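The mean-variance comparison can be sketched in base R as follows. The sketch assumes that bmstdr is installed and that nysptime stores the site identifier in s.index and the daily response in y8hrmax. Both column names are assumptions, and the helper site_mean_var is introduced here only for illustration.

```r
# Assumption: nysptime is provided by the bmstdr package
data("nysptime", package = "bmstdr")

# Per-site mean and variance of a response, ignoring missing values
site_mean_var <- function(y, site) {
  data.frame(
    mean = as.numeric(tapply(y, site, mean, na.rm = TRUE)),
    var  = as.numeric(tapply(y, site, var,  na.rm = TRUE))
  )
}

# Assumed column names: y8hrmax (response) and s.index (site identifier)
original <- site_mean_var(nysptime$y8hrmax, nysptime$s.index)
root     <- site_mean_var(sqrt(nysptime$y8hrmax), nysptime$s.index)

# Slope of the regression of variance on mean, on each scale
slope_original <- unname(coef(lm(var ~ mean, data = original))["mean"])
slope_root     <- unname(coef(lm(var ~ mean, data = root))["mean"])
cat("slope on original scale:   ", slope_original, "\n")
cat("slope on square root scale:", slope_root, "\n")
```

A slope closer to zero means that the variance depends less on the mean, which is what the equal-variance assumption needs.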

Temporal variations are illustrated in Figure 3.6 for all 28 sites and in Figure 3.8 for the 8 sites which have been used for model validation purposes in Chapters 6 and 7. Figure 3.7 shows variations of ozone concentration values for the 28 monitoring sites. Suspected outliers, data values lying more than 1.5 times the interquartile range beyond the edges of the box (the lower and upper quartiles), are plotted as red stars. Such high values of ground level ozone pollution are especially harmful to humans.
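The usual box plot rule for flagging suspected outliers can be written out directly. The sketch below measures the 1.5 times interquartile range distance from the lower and upper quartiles. The function flag_outliers and the eight ozone values are hypothetical and serve only to illustrate the rule.

```r
# Return the values lying more than 1.5 * IQR outside the quartiles
flag_outliers <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  x[x < lower | x > upper]
}

# Hypothetical daily ozone values for one site (not taken from nysptime)
ozone_site <- c(40, 42, 45, 47, 50, 52, 55, 95)
outliers <- flag_outliers(ozone_site)
print(outliers)
```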

Exploring areal Covid-19 case and death data

This section explores the Covid-19 mortality data introduced in Section 1.4.1. The bmstdr data frame engtotals contains the aggregated number of deaths along with other relevant information for analyzing and modeling this data set. The data frame object engdeaths contains the death numbers for the 20 weeks from March 13 to July 31, 2020. These two data sets will be used to illustrate spatial and spatio-temporal modeling for areal data in Chapter 10.
Such areal data are typically represented by a choropleth map, which uses shades of color or grey scale to classify values into a few broad classes, like a histogram. Two choropleth maps have been provided in Figure 1.9.

For the engtotals data set, the minimum and maximum numbers of deaths were 4 and 1223, for the City of London (a very small borough within greater London with population 9721) and Birmingham (population 1,141,816 in 2019), respectively. However, the minimum and maximum death rates per 100,000 were 10.79 and 172.51, for Hastings (in the South East) and Hertsmere (near Watford in greater London), respectively.
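The difference between ranking by counts and ranking by rates follows from simple arithmetic. The sketch below converts the two extreme death counts quoted above into rates per 100,000 using the quoted populations. The vector names are chosen here for illustration.

```r
# Death counts and populations quoted in the text
deaths     <- c(city_of_london = 4, birmingham = 1223)
population <- c(city_of_london = 9721, birmingham = 1141816)

# Death rate per 100,000 population
rate_per_100k <- deaths / population * 100000
print(round(rate_per_100k, 2))
```

The two rates come out at roughly 41 and 107 per 100,000, well inside the range from 10.79 to 172.51, so the areas with the extreme counts are not the areas with the extreme rates.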

Calculation of Moran's I for the number and rate of deaths is performed by using the moran.mc function in the library spdep. This function requires the spatial adjacency matrix in a list format, which is obtained by the poly2nb and nb2listw functions in the spdep library. The Moran's I statistics for the raw observed death numbers and the rate are found to be 0.34 and 0.45 respectively, both with a p-value smaller than 0.001 for the null hypothesis of no spatial autocorrelation. A permutation test randomly permutes the observed data and then calculates the relevant statistic for a number of replications. These replicate values are used to approximate the null distribution of the statistic, against which the value calculated from the observed data is compared to obtain an approximate p-value. The tests with Geary's C statistic gave a p-value of less than 0.001 for the death rate per 100,000, but the p-value was higher, 0.025, for the unadjusted observed Covid death numbers. Thus, the higher degree of spatial variation in the death rates has been successfully detected by Geary's statistic. The code lines to obtain these results are given below.
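The original code lines are not reproduced here, and they need a map object that is not described in this section. The base R sketch below instead illustrates the logic of the permutation test for Moran's I on a hypothetical 6 by 6 grid of areas with simulated values. All function names and data in it are assumptions made for illustration. With real areal data, the same test would typically be run with moran.mc and geary.mc from spdep.

```r
# Binary adjacency matrix for a hypothetical n_side x n_side grid of areas:
# two areas are neighbours when they share an edge
grid_adjacency <- function(n_side) {
  coords <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))
  d <- as.matrix(dist(coords, method = "manhattan"))
  (d == 1) * 1
}

# Moran's I for values x and spatial weight matrix W
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Permutation test: shuffle the values over the areas, recompute the
# statistic, and compare the observed value with the shuffled ones
permutation_test <- function(x, W, nsim = 999) {
  observed <- morans_i(x, W)
  permuted <- replicate(nsim, morans_i(sample(x), W))
  # One-sided p-value for positive spatial autocorrelation
  p_value <- (sum(permuted >= observed) + 1) / (nsim + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(42)
W <- grid_adjacency(6)
coords <- expand.grid(row = 1:6, col = 1:6)
# Simulated values with a smooth spatial trend plus noise
x <- coords$row + coords$col + rnorm(36, sd = 0.5)
result <- permutation_test(x, W)
print(result)
```

Counting the observed value among the replicates, as done in p_value, keeps the estimated p-value away from exactly zero.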

