Statistics is a formal science, and knowledge gained through the formal scientific method is dynamic and continuously evolving in its own right. Sometimes factual science requires the use of statistical techniques in the course of factual research in order to obtain new knowledge based on experimentation and observation. In such cases, the application of statistics allows data from representative samples to be analysed in order to explain the correlations and dependencies of random or conditionally occurring physical or natural phenomena.
Statistics is applicable to a wide range of factual sciences, from physics to social sciences, from health sciences to quality control. In addition, it is used in commercial or governmental organisations to describe datasets obtained for decision making, or to generalise observed features.
Today, statistics as applied to factual sciences allows for the study of specific populations by collecting information, analysing data and interpreting results. It is also an important science for the quantitative study of groups or collective phenomena.
Suppose an economist is organizing a survey of American minimum wage workers and is interested in how many workers who earn the minimum wage are teenagers. Suppose further that one out of every four minimum wage workers is a teenager. If the economist finds 80 minimum wage workers for his survey, what’s the probability that he interviews exactly 14 teenagers? 35 teenagers? What’s the probability that he gets at least 5 teenagers in his survey?
Solution to 2: In this case, we have $p=0.25$ again and $n=80$. We again use the binomial formula to obtain the solution to the first question: $$ P(K=14)=\binom{80}{14}(0.25)^{14}(1-0.25)^{80-14}=0.0319 . $$ The second uses the same formula: $$ P(K=35)=\binom{80}{35}(0.25)^{35}(1-0.25)^{80-35}=0.00011704 . $$
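The two probabilities above, and the "at least 5" question that the solution leaves open, can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper name `binom_pmf` is introduced here for illustration and is not part of the original solution.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 80, 0.25  # survey size and teenager share, as given in the problem

p_14 = binom_pmf(14, n, p)
p_35 = binom_pmf(35, n, p)
# "At least 5" is the complement of "at most 4".
p_at_least_5 = 1 - sum(binom_pmf(k, n, p) for k in range(5))

print(f"P(K = 14) = {p_14:.4f}")
print(f"P(K = 35) = {p_35:.8f}")
print(f"P(K >= 5) = {p_at_least_5:.6f}")
```

Computing the tail as a complement keeps the sum to five terms instead of seventy-six.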
The origin of mathematics is to a large extent connected with the beginning of human agriculture. The first three requirements of mathematics were: calculations for managing crop distribution and commercial transactions, measurements for managing agricultural land, and the elucidation of the periodicity of astronomical phenomena for the calendars that determined the timing of agricultural activity. It can be argued that these three imperatives correspond roughly to the three main areas of mathematics: the study of structure, space and change. For example, it was known from experience in civil engineering that a triangle with a side ratio of 3 : 4 : 5 is right-angled, but the general relationship between the sides of a right-angled triangle, $c^2=a^2+b^2$ (with $a$ and $b$ the lengths of the legs and $c$ the length of the hypotenuse), was not generally known (Pythagoras’ theorem). In the days when mathematics was purely practical rather than a separate discipline, it was not a problem to treat these relationships as if they were data in the natural sciences and to regard them as established by citing a large number of examples. However, since there are infinitely many numbers, no finite collection of examples can prove a general statement. Since mathematics has been studied as a discipline, “mathematical proofs”, which use logic to determine truth or falsehood, have been developed. Mathematical proofs are also highly valued in modern mathematics.
Let $u$ and $v$ be arbitrary vectors of an inner product space. Then $$ |(\boldsymbol{u}, \boldsymbol{v})| \leq(\boldsymbol{u}, \boldsymbol{u})^{\frac{1}{2}}(\boldsymbol{v}, \boldsymbol{v})^{\frac{1}{2}} $$
PROOF If $v=0$, the inequality is obviously satisfied. Suppose $v \neq 0$. Then for an arbitrary scalar $\alpha \in \mathbb{C}(\mathbb{R})$ $$ 0 \leq(\boldsymbol{u}-\alpha \boldsymbol{v}, \boldsymbol{u}-\alpha \boldsymbol{v})=(\boldsymbol{u}, \boldsymbol{u})-\alpha(\boldsymbol{v}, \boldsymbol{u})-\bar{\alpha}(\boldsymbol{u}, \boldsymbol{v})+\alpha \bar{\alpha}(\boldsymbol{v}, \boldsymbol{v}) $$ Take $\alpha=\overline{(\boldsymbol{v}, \boldsymbol{u})} /(\boldsymbol{v}, \boldsymbol{v})$. Then $\bar{\alpha}=\overline{(\boldsymbol{u}, \boldsymbol{v})} /(\boldsymbol{v}, \boldsymbol{v})$ and $$ (\boldsymbol{u}, \boldsymbol{u})-\frac{|(\boldsymbol{v}, \boldsymbol{u})|^2}{(\boldsymbol{v}, \boldsymbol{v})}-\frac{|(\boldsymbol{u}, \boldsymbol{v})|^2}{(\boldsymbol{v}, \boldsymbol{v})}+\frac{|(\boldsymbol{u}, \boldsymbol{v})|^2}{(\boldsymbol{v}, \boldsymbol{v})} \geq 0 $$ or $$ (\boldsymbol{u}, \boldsymbol{u})(\boldsymbol{v}, \boldsymbol{v})-|(\boldsymbol{u}, \boldsymbol{v})|^2 \geq 0 $$ from which the assertion follows. The Cauchy-Schwarz inequality is a useful tool in many proofs in analysis. For brevity, we shall follow common practice and refer to it as simply the Schwarz inequality.
Course description:
Analysis grew out of calculus, which leads to the study of limits of functions, sequences and series. It is one of the fundamental topics underlying much of mathematics including differential equations, dynamical systems, differential geometry, topology and Fourier analysis. This advanced unit introduces the field of mathematical analysis with both a careful theoretical framework and selected applications. This unit will be useful to students with more mathematical maturity who study mathematics, science, or engineering. Starting off with an axiomatic description of the real number system, this unit concentrates on the limiting behaviour of sequences and series of real and complex numbers. This leads naturally to the study of functions defined as limits and to the notion of uniform convergence. Special attention is given to power series, leading into the theory of analytic functions and complex analysis. Besides a rigorous treatment of many concepts from calculus, you will learn the basic results of complex analysis such as the Cauchy integral theorem, the Cauchy integral formula and the residue theorem, leading to useful techniques for evaluating real integrals. By doing this unit, you will develop solid foundations in the more formal aspects of analysis, including knowledge of abstract concepts, how to apply them and the ability to construct proofs in mathematics.
MATH2923 | Analysis (Advanced), University of Sydney
Worked examples: Analysis (Advanced)
Problem 1.
A countable union of countable sets is countable.
Proof. Let $\left\{A_n\right\}$ be a countable collection of countable sets, and let $A=\cup_{n=1}^{\infty} A_n$. Write $A_n=\left\{a_{n 1}, a_{n 2}, \ldots\right\}$. Define $f: \mathbb{N} \times \mathbb{N} \rightarrow \cup_{n=1}^{\infty} A_n$ by $f(m, n)=a_{m n}$. Clearly, $f$ is onto. By theorem 2.1.6, there exists a bijection $g: \mathbb{N} \rightarrow \mathbb{N} \times \mathbb{N}$. The composition $f \circ g$ maps $\mathbb{N}$ onto $A$. By theorem 2.1.9, $A$ is countable.
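To make the composition $f \circ g$ concrete, here is a small Python sketch. It assumes one particular bijection $g$ (enumeration along anti-diagonals, which may differ from the one in theorem 2.1.6) and a hypothetical family of sets $A_m=\{m/1, m/2, \ldots\}$, whose union is the set of positive rationals; neither choice comes from the original text.

```python
from fractions import Fraction
from itertools import count, islice

def g():
    """Enumerate N x N along anti-diagonals: (1,1), (1,2), (2,1), (1,3), ..."""
    for s in count(2):            # s = m + n is constant on each diagonal
        for m in range(1, s):
            yield (m, s - m)

def f(m: int, n: int) -> Fraction:
    """Hypothetical example sets: A_m = {m/1, m/2, ...}, so a_{mn} = m/n."""
    return Fraction(m, n)

def f_after_g():
    """The composition f o g, which lists every element of the union."""
    for m, n in g():
        yield f(m, n)

first_pairs = list(islice(g(), 6))
first_values = list(islice(f_after_g(), 10))
print(first_pairs)
print(first_values)
```

The enumeration repeats values (for instance $1/1$ and $2/2$), which is why the proof only needs $f \circ g$ to be onto rather than one-to-one.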
Problem 2.
Let $A$ and $B$ be nonempty sets. Then there is an injection from $A$ to $B$ or an injection from $B$ to $A$.
Proof. Let $\mathfrak{B}$ be the collection of all injective functions $f$ such that $\operatorname{Dom}(f) \subseteq A$ and $\Re(f) \subseteq B$. Let $\left\{f_\alpha: \alpha \in I\right\}$ be an indexing of $\mathfrak{B}$ and, for $\alpha \in I$, write $A_\alpha=\operatorname{Dom}\left(f_\alpha\right)$ and $B_\alpha=\Re\left(f_\alpha\right)$. Partially order $\mathfrak{B}$ as follows: $f_\alpha \leq f_\beta$ if $f_\beta$ extends $f_\alpha$. More explicitly, $f_\alpha \leq f_\beta$ means that $A_\alpha \subseteq A_\beta$, $B_\alpha \subseteq B_\beta$, and the restriction of $f_\beta$ to $A_\alpha$ is $f_\alpha$. Clearly, $\leq$ is a partial ordering of $\mathfrak{B}$. Now let $\mathfrak{C}$ be a chain in $\mathfrak{B}$, and index $\mathfrak{C}$ by a subset $J$ of $I$; $\mathfrak{C}=\left\{f_\alpha: \alpha \in J\right\}$. We show that $\mathfrak{C}$ has an upper bound: let $A_\alpha=\operatorname{Dom}\left(f_\alpha\right)$, $B_\alpha=\Re\left(f_\alpha\right)$, $S=\cup_{\alpha \in J} A_\alpha$, and $T=\cup_{\alpha \in J} B_\alpha$. Define $f: S \rightarrow T$ as follows: if $x \in S$, choose a set $A_\alpha$ that contains $x$, and let $f(x)=f_\alpha(x)$. The function $f$ is well defined because $\mathfrak{C}$ is a chain. Specifically, if $x \in A_\alpha \cap A_\beta$ $(\alpha, \beta \in J)$, then $f_\alpha \leq f_\beta$ or $f_\beta \leq f_\alpha$; say, the former. Since $f_\beta$ extends $f_\alpha$, $f_\alpha(x)=f_\beta(x)$. We leave it to the reader to verify that $f$ is an injection. Clearly, $f$ is an upper bound of $\mathfrak{C}$. By Zorn’s lemma, $\mathfrak{B}$ has a maximal element, say, $f_1$. Write $A_1=\operatorname{Dom}\left(f_1\right)$ and $B_1=\Re\left(f_1\right)$. If $A_1=A$, then $f_1$ is an injection from $A$ into $B$. If $B_1=B$, then $f_1^{-1}$ is an injection from $B$ to $A$, and the proof is complete. We now show that $A_1 \neq A$ and $B_1 \neq B$ cannot occur simultaneously.
If that were the case, pick elements $a \in A-A_1$, $b \in B-B_1$, and extend $f_1$ to a function $f: A_1 \cup\{a\} \rightarrow B_1 \cup\{b\}$ by defining $f(a)=b$ and $\left.f\right|_{A_1}=f_1$. Clearly, $f$ is a strict extension of $f_1$, and this contradicts the maximality of $f_1$.
Problem 3.
Let $B$ be a nonempty subset of a set $A$. If there is an injection $f$ : $A \rightarrow B$, then $A \approx B$.
Proof. Assume, without loss of generality, that $\Re(f) \subset B \subset A$ (strict inclusions). Define the powers of $f$ as follows: $f^{(1)}(x)=f(x)$, $f^{(n+1)}(x)=f\left(f^{(n)}(x)\right)$, $n \geq 1$. Let $B^{\prime}=\left\{f^{(n)}(x): x \in A-B, n \in \mathbb{N}\right\}$. Note that $B^{\prime} \subseteq \Re(f)$ and that $f\left(B^{\prime}\right) \subseteq B^{\prime}$. Let $C=(A-B) \cup B^{\prime}$, and let $f_1$ be the restriction of $f$ to $C$. Thus $f_1$ is an injection from $C$ to $B^{\prime}$. We now show that $f_1$ is a bijection. If $y \in B^{\prime}$, then $y=f^{(n)}(x)$ for some $x \in A-B$. If $n=1$, then $y=f(x)$, $x \in A-B \subseteq C$. If $n>1$, then $y=f(z)$, $z=f^{(n-1)}(x) \in B^{\prime} \subseteq C$. Now let $D=B-B^{\prime}$. Since $B^{\prime} \subseteq \Re(f)$, $D=B-B^{\prime} \supseteq B-\Re(f) \neq \varnothing$. The reader can check that $B^{\prime}$ and $D$ partition $B$ and that the three sets $B^{\prime}$, $D$, and $A-B$ partition $A$; a simple Venn diagram makes this abundantly clear. Thus $C$ and $D$ partition $A$, and $B^{\prime}$ and $D$ partition $B$. The function $h: A \rightarrow B$ defined below is a bijection from $A$ to $B$: $$ h(x)= \begin{cases}f_1(x) & \text { if } x \in C, \\ x & \text { if } x \in D .\end{cases} $$
MATH2871 | Data Management for Statistical Analysis, Australian National University
Course description:
The School of Mathematics and Statistics is proud to announce a collaborative venture with SAS to provide a practical introduction to the management and analysis of data. Large data sets are now found widely in business, finance, bioinformatics, government, intelligence and elsewhere, and skills in querying, cleaning, managing, displaying and analysing data are widely sought.
Worked examples: Data Management for Statistical Analysis
Problem 1.
Write a MATLAB script that calculates the mean and median of a sample of 100 uniform random numbers between 0 and 2 and the percentage of points in the sample that are greater than 1.
clear all;
clc;
n = 100;
% Generate 100 uniform random numbers between 0 and 2
x = 2 * rand(n, 1);
% Calculate the mean
mu = mean(x);
% Calculate the median
med = median(x);
% Find the fraction of points greater than 1 (multiply by 100 for a percentage)
per = sum(x > 1) / n;
mu =
1.0089
med =
0.9841
per =
0.4900
Problem 2.
In this exercise, we demonstrate the Central Limit Theorem, where each $X_i$ is exponential. a. Generate 1000 random numbers from the exponential distribution with $\lambda=6$, and plot them in a histogram. This should give you an idea of what the exponential distribution looks like. b. For $n=2$, repeat the following 1000 times: Generate a random sample of $n$ numbers from the exponential distribution with $\lambda=6$. c. Compute the sample mean of the $n$ numbers and standardize it using the true mean and standard deviation of the distribution. d. Make a histogram and normal plot of the 1000 sample means. e. Repeat (b)-(d) for $n=10,20$, and 100. Put all four histograms and all four normal plots in the same window. Comment on their shapes.
clear all;
clc;
close all;
num_samples = 1000;
lambda = 6;
mu = 1 / lambda;     % true mean of the exponential distribution
sigma = mu;          % true standard deviation (equal to the mean)
% Part a
% Generate 1000 random numbers from the exponential distribution with
% lambda = 6
x = exprnd(mu, num_samples, 1);
% Plot a histogram
hist(x, 20);
title('Histogram of 1000 exponential random variables with lambda = 6');
% Parts b, c, d, e
sample_sizes = [2 10 20 100];
figure;
for i = 1:length(sample_sizes)
    sample_size = sample_sizes(i);
    % Each row of x is one sample of size sample_size
    x = exprnd(mu, num_samples, sample_size);
    sample_mean = mean(x');
    % Standardize with the standard error sigma/sqrt(n), where n = sample_size
    z_score = (sample_mean - mu) / (sigma / sqrt(sample_size));
    % Plot a histogram of z-scores (top row of the window)
    subplot(2, 4, i);
    hist(z_score);
    title(['Sample size n = ', num2str(sample_size)])
    % Plot a normal plot of z-scores (bottom row of the window)
    subplot(2, 4, i + 4);
    normplot(z_score);
    title(['Sample size n = ', num2str(sample_size)])
end
Problem 3.
a) Obviously the $\alpha$-risk is 0.05 from definition. Let’s calculate the $\beta$-risk of the rule if $\mu=1$. The $\beta$-risk is just $\beta=1-\pi(1)=1-\Phi\left[-z_\alpha+\frac{\left(\mu-\mu_0\right) \sqrt{n}}{\sigma}\right]=1-\Phi(-1.645+3)=1-\Phi(1.355)=0.0877$. b) Here is the script and the results for my iterations:
clear all;
clc;
close all;
% Parameters
level = 0.05;
mu_0 = 0;
mu = 1;
sigma = 1;
n = 9;
num_samples = 100;
% Reject if sample mean is greater than critical_value
critical_value = mu_0 + norminv(1 - level) * sigma / sqrt(n);
% Generate 100 samples of size n with mean mu_0
x = sigma * randn(num_samples, n) + mu_0;
x_bar = mean(x');
a_risk = sum(x_bar > critical_value) / num_samples;
% Generate 100 samples of size n with mean mu
x = sigma * randn(num_samples, n) + mu;
x_bar = mean(x');
b_risk = sum(x_bar < critical_value) / num_samples;
>> a_risk
a_risk =
0.0500
>> b_risk
b_risk =
0.0700
The results are close to their theoretical values.
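The theoretical $\beta$-risk quoted in part (a) can be recomputed directly from the formula $\beta=1-\Phi\left[-z_\alpha+\left(\mu-\mu_0\right) \sqrt{n} / \sigma\right]$. The sketch below is in Python rather than the MATLAB used for the simulation, and relies only on the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Parameter values taken from the simulation script above
level, mu_0, mu, sigma, n = 0.05, 0.0, 1.0, 1.0, 9

z_alpha = std_normal.inv_cdf(1 - level)
power = std_normal.cdf(-z_alpha + (mu - mu_0) * n**0.5 / sigma)
beta_risk = 1 - power

print(f"z_alpha = {z_alpha:.3f}, beta-risk = {beta_risk:.4f}")
```

With only 100 simulated samples per case, the simulated risks can differ from these theoretical values by a few percentage points from run to run.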
Course description:
This course focuses on the language of mathematical arguments. Rather than attacking advanced topics, we will use simple mathematics to develop an understanding of how results are established. We begin with clearly stated and plausible assumptions or axioms and then develop a more and more complex theory from them. The course, and the lecturer, will have succeeded if you finish the course able to construct valid arguments of your own and to criticise those that are presented to you.
MATH2222 | Introduction to Mathematical Thinking: Problem-Solving and Proofs, Australian National University
Worked examples: Introduction to Mathematical Thinking: Problem-Solving and Proofs
Problem 1.
Let $G=(V, E)$ be a connected graph that contains a circuit $C$. If $e$ is an edge in $C$, then the subgraph $G^{\prime}=(V, E \backslash\{e\})$ is still connected.
Note that a circuit in a graph is not a subgraph; however, the set of vertices and the set of edges in the circuit do form a subgraph. When talking about a vertex in a graph and also in a subgraph, confusion could arise about the vertex’s degree. If $G^{\prime}$ is a subgraph of $G$, and a vertex, $v$, is in both $G$ and $G^{\prime}$, then we will use the notation $\deg_G(v)$ and $\deg_{G^{\prime}}(v)$ to denote its degree in $G$ and $G^{\prime}$ respectively.
Problem 2.
Let $S=\left(s_n \mid n \in \mathbb{N}\right)$ be a sequence. Then a subsequence, $T$, of $S$ is a sequence obtained from $S$ by omitting some of the terms of $S$ while retaining the order. So $T$ can be written in the form $\left(t_k=s_{n_k} \mid k \in \mathbb{N}\right)$ subject to the condition that if $i<j$, then $n_i<n_j$.
This definition is really hard to parse; in particular, the subscript with its own subscript can be bewildering. So let’s do an example. Consider the sequence $$ \begin{aligned} S & =\left(s_n=1+3 n \mid n \in \mathbb{N}\right) \\ & =(4,7,10,13,16,19,22,25,28,31, \ldots), \end{aligned} $$ which has the following two subsequences (among many others): $$ \begin{aligned} T & =\left(t_k \mid k \in \mathbb{N}\right)=(4,10,16,22,28,34, \ldots) \quad \text { and } \\ U & =\left(u_k \mid k \in \mathbb{N}\right)=(7,10,25,28,31,34,37, \ldots) . \end{aligned} $$ We could describe $T$ as the sequence containing the odd-numbered terms (first, third, fifth, …) from $S$ (in the same order). So the first term of $T$ is the first term of $S$; the second term of $T$ is the third term of $S$, and similarly, the third term of $T$ is the fifth term of $S$. In symbols, $t_1=s_1, t_2=s_3, t_3=s_5$, and so forth. In particular, $t_k=s_{2 k-1}$, so $n_k=2 k-1$. It’s easy to check that, if $i<j$, then $n_i=2 i-1<2 j-1=n_j$.
We could describe the subsequence $U$ as the sequence formed from $S$ by dropping the first and fourth through seventh terms. This subsequence has a much less regular pattern, so it will be harder to find a formula for $n_k$, but having an easy formula is not required for being a subsequence. In this case, $u_1=s_2, u_2=s_3, u_3=s_8, u_4=s_9$, $u_5=s_{10}$, and so on. So in the notation $U=\left(u_i=s_{n_i}\right)$, the subscripts are $n_1=2, n_2=3, n_3=8, n_4=9, n_5=10$, and so forth.
Although it is important to understand this definition of subsequence, most of the time it will be obvious from the description that the terms of a potential subsequence are in the same order as in the parent sequence.
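The index maps $k \mapsto n_k$ in the example can be written out and checked mechanically. The following Python sketch is only an illustration: the function names are made up here, and the closed form used for $U$ ($n_k=k+5$ for $k \geq 3$) is an assumption about how the pattern "and so forth" continues.

```python
def s(n: int) -> int:
    """Parent sequence s_n = 1 + 3n for n = 1, 2, 3, ..."""
    return 1 + 3 * n

def n_T(k: int) -> int:
    """Index map for T: n_k = 2k - 1 (the odd positions)."""
    return 2 * k - 1

def n_U(k: int) -> int:
    """Index map for U: n_1 = 2, n_2 = 3, then n_k = k + 5 (assumed pattern)."""
    return k + 1 if k <= 2 else k + 5

def subsequence(index_map, length):
    """First `length` terms t_k = s_{n_k}, checking that n_k strictly increases."""
    idx = [index_map(k) for k in range(1, length + 1)]
    assert all(a < b for a, b in zip(idx, idx[1:])), "indices must increase"
    return [s(i) for i in idx]

T = subsequence(n_T, 6)
U = subsequence(n_U, 7)
print(T)
print(U)
```

The assertion inside `subsequence` is exactly the condition "if $i<j$, then $n_i<n_j$" from the definition.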
Problem 3.
Show that $(n-1) ! \equiv 0(\bmod n)$ for composite $n>4$. [Hint: Make sure that your proof works for the case $n=p^2$, where $p$ is a prime].
Solution: Let $p$ be the smallest prime dividing $n$. If $n \neq p^2$ then $p$ and $n / p$ are both less than $n$ and are distinct. So $(n-1)!$ is divisible by $p(n / p)=n$. Now, if $n=p^2$ then since $p>2$ (because $n>4$) we see that $p$ and $2 p$ are both less than $n$. So $(n-1)!$ is divisible by $p \cdot 2 p=2 p^2=2 n$ and therefore by $n$.
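A brute-force spot check of the statement is easy to write. The Python sketch below uses an arbitrary bound of 300 and illustrative helper names; it is a sanity check of small cases, not a substitute for the proof.

```python
from math import factorial

def is_composite(n: int) -> bool:
    """True when n > 1 has a divisor d with 1 < d < n."""
    return n > 1 and any(n % d == 0 for d in range(2, int(n**0.5) + 1))

def factorial_vanishes_mod(n: int) -> bool:
    """Check whether (n - 1)! is congruent to 0 modulo n."""
    return factorial(n - 1) % n == 0

LIMIT = 300  # arbitrary bound for this spot check
composites = [n for n in range(5, LIMIT) if is_composite(n)]
all_vanish = all(factorial_vanishes_mod(n) for n in composites)

print(all_vanish)
print(factorial_vanishes_mod(4))  # n = 4 is excluded by the hypothesis n > 4
```

The case $n=4$ shows why the hypothesis $n>4$ is needed: $3!=6$ leaves remainder 2 modulo 4.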
Course description:
Cryptography is the branch of mathematics that provides the techniques for confidential exchange of information sent via possibly insecure channels. This unit introduces the tools from elementary number theory that are needed to understand the mathematics underlying the most commonly used modern public key cryptosystems. Topics include the Euclidean Algorithm, Fermat’s Little Theorem, the Chinese Remainder Theorem, Mobius Inversion, the RSA Cryptosystem, the Elgamal Cryptosystem and the Diffie-Hellman Protocol. Issues of computational complexity are also discussed.
MATH2088 | Number Theory and Cryptography, University of Sydney
Worked examples: Number Theory and Cryptography
Problem 1.
Find the gcd of 621 and 483 .
Solution: We run the Euclidean algorithm: $$ \begin{aligned} & 621=\mathbf{1} \cdot 483+138 \\ & 483=\mathbf{3} \cdot 138+69 \\ & 138=\mathbf{2} \cdot 69 . \end{aligned} $$ So $\operatorname{gcd}(621,483)=69$.
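The same division steps can be generated programmatically. This is a minimal Python sketch of the Euclidean algorithm; the function name `euclid_steps` and the tuple layout are choices made here for illustration.

```python
def euclid_steps(a: int, b: int):
    """Return gcd(a, b) together with the division steps a = q*b + r."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r           # continue with the divisor and the remainder
    return a, steps

d, steps = euclid_steps(621, 483)
for a, q, b, r in steps:
    print(f"{a} = {q} * {b} + {r}")
print("gcd =", d)
```

The last nonzero remainder is the gcd, which is why the loop returns `a` once `b` has reached zero.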
Problem 2.
Find the smallest integer $N$ such that $\phi(n) \geq 5$ for all $n \geq N$.
Solution: Trying out small values of $n$, we see that $\phi(12)=4$ but $\phi(n)$ seems to be greater than 4 for all $n \geq 13$. Let’s prove this: suppose $n \geq 13$. Let $n=\prod p_i^{e_i}$, so $\phi(n)=\prod p_i^{e_i-1}\left(p_i-1\right)$. Let $2^e$ be the power of 2 dividing $n$. If $e \geq 4$, then $\phi(n) \geq 2^{e-1} \geq 8$. So we only need to consider $e=0,1,2,3$.
If $e=0$, then $n$ is odd. If $n$ is prime, then $\phi(n)=n-1 \geq 12$. Otherwise, either $n$ will be divisible by at least two distinct odd primes $p$ and $q$, in which case $\phi(n) \geq(p-1)(q-1) \geq 2 \cdot 4=8$, or $n$ is divisible by $p^2$ for some odd prime $p$, in which case $\phi(n) \geq p(p-1) \geq 3(3-1)=6$. Next suppose $e=1$. Then $n=2 m$ where $m$ is odd and $m=n / 2 \geq 7$. We have $\phi(n)=\phi(m)$. Then if $m$ is prime, $\phi(n)=m-1 \geq 6$. Otherwise, the above reasoning (for the $e=0$ case) shows that $\phi(n) \geq 6$.
Next, the case $e=2$. Then $n=4 m$, with $m$ odd and $m=n / 4>3$, so $m \geq 5$ since $m$ is an odd integer. So $\phi(n)=2 \phi(m)$. As before we show that $\phi(m) \geq 4$, so $\phi(n) \geq 8$.
Finally, if $e=3$ then $n=8 m$, with $m$ odd and $m=n / 8>1$. So $m \geq 3$. Then $\phi(n)=4 \phi(m) \geq 4 \cdot 2=8$. In every case $\phi(n) \geq 5$ for $n \geq 13$, and since $\phi(12)=4$, the answer is $N=13$.
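The "trying out small values" step can be reproduced with a direct count. The Python sketch below computes $\phi(n)$ naively and lists every $n$ below an arbitrary bound with $\phi(n)<5$; the bound and the names are illustrative assumptions.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct count (slow, but fine for a spot check)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

LIMIT = 2000  # arbitrary bound; the argument above handles all larger n
small_values = [n for n in range(1, LIMIT) if phi(n) < 5]

print(small_values)
print(phi(12), phi(13))
```

The largest entry of `small_values` is the last $n$ with $\phi(n)<5$, so the smallest admissible $N$ is one more than it.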
Problem 3.
Show that $(n-1) ! \equiv 0(\bmod n)$ for composite $n>4$. [Hint: Make sure that your proof works for the case $n=p^2$, where $p$ is a prime].
Solution: Let $p$ be the smallest prime dividing $n$. If $n \neq p^2$ then $p$ and $n / p$ are both less than $n$ and are distinct. So $(n-1)!$ is divisible by $p(n / p)=n$. Now, if $n=p^2$ then since $p>2$ (because $n>4$) we see that $p$ and $2 p$ are both less than $n$. So $(n-1)!$ is divisible by $p \cdot 2 p=2 p^2=2 n$ and therefore by $n$.
Course description:
The modern theory of financial markets relies on advanced mathematical and statistical methods that are used to model, forecast and manage risk in complex financial transactions. Stochastic analysis is an indispensable tool for the theory of financial markets, the derivation of prices of standard and exotic options and other derivative securities, hedging related financial risk, and managing interest rate risk.
MATH5975 | Introduction to Stochastic Analysis, University of New South Wales
Worked examples: Introduction to Stochastic Analysis
Problem 1.
A graph $G$ is connected when, for any two vertices $x$ and $y$ of $G$, there exists a sequence of vertices $x_0, x_1, \ldots, x_k$ such that $x_0=x, x_k=y$, and $x_i \sim x_{i+1}$ for $0 \leq i \leq k-1$. Show that random walk on $G$ is irreducible if and only if $G$ is connected.
Proof. Let $P$ denote the transition matrix of random walk on $G$. The random walk is irreducible if for any vertices $x$ and $y$ there exists an integer $k$ such that $P^k(x, y)>0$. Note that $P^k(x, y)>0$ if and only if there exist vertices $x_0=x, x_1, \ldots, x_k=y$ such that $\prod_{i=0}^{k-1} P\left(x_i, x_{i+1}\right)>0$, i.e., $x_i \sim x_{i+1}$ for all $0 \leq i \leq k-1$. Therefore, the random walk is irreducible if and only if $G$ is connected.
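The equivalence can be illustrated on small examples. The Python sketch below assumes a finite graph given as a list of neighbour sets in which every vertex has at least one neighbour, and takes the simple random walk $P(x, y)=1 / \operatorname{deg}(x)$ for $x \sim y$; the representation and function names are choices made here, not part of the original proof.

```python
from collections import deque

def transition_matrix(adj):
    """Simple random walk: P(x, y) = 1/deg(x) if x ~ y, else 0."""
    n = len(adj)
    return [[(1 / len(adj[x]) if y in adj[x] else 0.0) for y in range(n)]
            for x in range(n)]

def is_connected(adj):
    """Breadth-first search from vertex 0 reaches every vertex."""
    seen, queue = {0}, deque([0])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(adj)

def is_irreducible(P):
    """For every x, y some power P^k with 1 <= k <= n has a positive (x, y) entry."""
    n = len(P)
    step = [[P[x][y] > 0 for y in range(n)] for x in range(n)]
    power = [row[:] for row in step]   # support of P^1
    reach = [row[:] for row in step]   # union of the supports found so far
    for _ in range(n - 1):
        power = [[any(power[x][z] and step[z][y] for z in range(n))
                  for y in range(n)] for x in range(n)]
        reach = [[reach[x][y] or power[x][y] for y in range(n)]
                 for x in range(n)]
    return all(all(row) for row in reach)

path = [{1}, {0, 2}, {1}]            # 0 - 1 - 2, connected
two_edges = [{1}, {0}, {3}, {2}]     # 0 - 1 and 2 - 3, disconnected

for adj in (path, two_edges):
    print(is_connected(adj), is_irreducible(transition_matrix(adj)))
```

Only the support of $P^k$ matters for irreducibility, so the powers are tracked as Boolean matrices rather than as probabilities.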
Problem 2.
We define a graph to be a tree if it is connected but contains no cycles. Prove that the following statements about a graph $T$ with $n$ vertices and $m$ edges are equivalent: (a) $T$ is a tree. (b) $T$ is connected and $m=n-1$. (c) $T$ has no cycles and $m=n-1$.
Proof. The equivalence can be easily seen from Euler’s formula $m=n+l-2$ where $l$ denotes the number of faces of the graph, because any two of the following conditions will imply the other: (1) $T$ is connected $\Longleftrightarrow m=n+l-2$; (2) $T$ has no cycles $\Longleftrightarrow l=1$; (3) $m=n-1$. Since this simple equivalence is a special case (and sometimes the starting point of the proof) of Euler’s formula, it should be proved without the use of the more general theorem. We provide a long yet elementary proof here. All three parts of the following proof are based on a simple operation, namely, removing one edge and one vertex at a time. We assume without loss of generality that $G$ has at least one edge. First we need a claim.
Problem 3.
Let $T$ be a tree. A leaf is a vertex of degree 1 . (a) Prove that $T$ contains a leaf. (b) Prove that between any two vertices in $T$ there is a unique simple path. (c) Prove that $T$ has at least 2 leaves.
Proof. Part (a) is established by the claim in the previous proof. For (b), since $T$ is connected and has no cycles, for vertices $x$ and $y$ in $T$, there is a simple path $x \sim x_1 \sim$ $\cdots \sim x_k \sim y$ between them. Suppose there exists another simple path $x \sim y_1 \sim \cdots \sim y_m \sim y$ between them. Then $x \sim x_1 \sim \cdots \sim x_k \sim y \sim y_m \sim \cdots \sim y_1 \sim x$ contains a cycle, which is a contradiction.
For (c), let $x_0$ be a leaf and $x_1 \sim x_0$. Suppose we already have a simple path $x_0 \sim \cdots \sim x_i$. If $x_i$ is a leaf, then we are done; otherwise, there exists $x_{i+1} \neq x_{i-1}$ such that $x_i \sim x_{i+1}$. Since $T$ has no cycles, $x_{i+1} \notin\left\{x_0, \ldots, x_i\right\}$. The process must end because $T$ is finite, so we will eventually find another leaf $x_i$.
Course description:
The course surveys the various approaches that have been considered for the development of a theory of statistical reasoning. These include likelihood methods, Bayesian methods and frequentism. These are compared with respect to their strengths and weaknesses. To use statistics successfully requires a clear understanding of the meaning of various concepts such as likelihood, confidence, p-value, belief, etc. This is the purpose of the course.
STAC58H | Statistical Inference, University of Toronto
Worked examples: Statistical Inference
Problem 1.
Accounting for voter turnout. Let $N$ be the number of people in the state of Iowa. Suppose $p N$ of these people support Hillary Clinton, and $(1-p) N$ of them support Donald Trump, for some $p \in(0,1) . N$ is known (say $N=3,000,000)$ and $p$ is unknown. (a) Suppose that each person in Iowa randomly and independently decides, on election day, whether or not to vote, with probability $1 / 2$ of voting and probability $1 / 2$ of not voting. Let $V_{\text {Hillary }}$ be the number of people who vote for Hillary and $V_{\text {Donald }}$ be the number of people who vote for Donald. Show that $$ \mathbb{E}\left[V_{\text {Hillary }}\right]=\frac{1}{2} p N, \quad \mathbb{E}\left[V_{\text {Donald }}\right]=\frac{1}{2}(1-p) N . $$ What are the standard deviations of $V_{\text {Hillary }}$ and $V_{\text {Donald }}$, in terms of $p$ and $N$ ? Explain why, when $N$ is large, we expect the fraction of voters who vote for Hillary to be very close to $p$.
(a) Recall that a $\operatorname{Binomial}(n, p)$ random variable has mean $n p$ and variance $n p(1-p)$. A total of $p N$ people support Hillary, each voting independently with probability $\frac{1}{2}$, so $V_{\text {Hillary }} \sim \operatorname{Binomial}\left(p N, \frac{1}{2}\right)$. Then $$ \mathbb{E}\left[V_{\text {Hillary }}\right]=\frac{1}{2} p N, \quad \operatorname{Var}\left[V_{\text {Hillary }}\right]=\frac{1}{4} p N, $$ and the standard deviation of $V_{\text {Hillary }}$ is $\sqrt{\frac{1}{4} p N}$. Similarly, as $(1-p) N$ people support Donald, $V_{\text {Donald }} \sim \operatorname{Binomial}\left((1-p) N, \frac{1}{2}\right)$, so $$ \mathbb{E}\left[V_{\text {Donald }}\right]=\frac{1}{2}(1-p) N, \quad \operatorname{Var}\left[V_{\text {Donald }}\right]=\frac{1}{4}(1-p) N, $$ and the standard deviation of $V_{\text {Donald }}$ is $\sqrt{\frac{1}{4}(1-p) N}$. The fraction of voters who vote for Hillary is $$ \frac{V_{\text {Hillary }}}{V_{\text {Hillary }}+V_{\text {Donald }}}=\frac{V_{\text {Hillary }} / N}{V_{\text {Hillary }} / N+V_{\text {Donald }} / N} . $$ As $\mathbb{E}\left[V_{\text {Hillary }} / N\right]=\frac{1}{2} p$ (a constant) and $\operatorname{Var}\left[V_{\text {Hillary }} / N\right]=\frac{1}{4} p / N \rightarrow 0$ as $N \rightarrow \infty$, $V_{\text {Hillary }} / N$ should be close to $\frac{1}{2} p$ with high probability when $N$ is large. Similarly, as $\mathbb{E}\left[V_{\text {Donald }} / N\right]=\frac{1}{2}(1-p)$ and $\operatorname{Var}\left[V_{\text {Donald }} / N\right]=\frac{1}{4}(1-p) / N \rightarrow 0$ as $N \rightarrow \infty$, $V_{\text {Donald }} / N$ should be close to $\frac{1}{2}(1-p)$ with high probability when $N$ is large. Then the fraction of voters for Hillary should, with high probability, be close to $$ \frac{\frac{1}{2} p}{\frac{1}{2} p+\frac{1}{2}(1-p)}=p . 
$$ (The above statements “close to with high probability” may be formalized using Chebyshev’s inequality, which states that a random variable is, with high probability, not too many standard deviations away from its mean.)
Problem 2.
(b) Now suppose there are two types of voters – “passive” and “active”. Each passive voter votes on election day with probability $1 / 4$ and doesn’t vote with probability $3 / 4$, while each active voter votes with probability $3 / 4$ and doesn’t vote with probability $1 / 4$. Suppose that a fraction $q_H$ of the people who support Hillary are passive and $1-q_H$ are active, and a fraction $q_D$ of the people who support Donald are passive and $1-q_D$ are active. Show that $$ \mathbb{E}\left[V_{\text {Hillary }}\right]=\frac{1}{4} q_H p N+\frac{3}{4}\left(1-q_H\right) p N, \quad \mathbb{E}\left[V_{\text {Donald }}\right]=\frac{1}{4} q_D(1-p) N+\frac{3}{4}\left(1-q_D\right)(1-p) N . $$ What are the standard deviations of $V_{\text {Hillary }}$ and $V_{\text {Donald }}$, in terms of $p, N, q_H$, and $q_D$ ? If we estimate $p$ by $\hat{p}$ using a simple random sample of $n=1000$ people from Iowa, as discussed in Lecture 1, explain why $\hat{p}$ might not be a good estimate of the fraction of voters who will vote for Hillary.
(b) Let $V_{\mathrm{H}, \mathrm{p}}$ and $V_{\mathrm{H}, \mathrm{a}}$ be the number of passive and active voters who vote for Hillary, and similarly define $V_{\mathrm{D}, \mathrm{p}}$ and $V_{\mathrm{D}, \mathrm{a}}$ for Donald. There are $q_H p N$ passive Hillary supporters, each of whom vote independently with probability $\frac{1}{4}$, so $$ V_{\mathrm{H}, \mathrm{p}} \sim \operatorname{Binomial}\left(q_H p N, \frac{1}{4}\right) . $$
Similarly, $$ \begin{aligned} V_{\mathrm{H}, \mathrm{a}} & \sim \operatorname{Binomial}\left(\left(1-q_H\right) p N, \frac{3}{4}\right), \ V_{\mathrm{D}, \mathrm{P}} & \sim \operatorname{Binomial}\left(q_D(1-p) N, \frac{1}{4}\right), \ V_{\mathrm{D}, \mathrm{a}} & \sim \operatorname{Binomial}\left(\left(1-q_D\right)(1-p) N, \frac{3}{4}\right), \end{aligned} $$ and these four random variables are independent. Since $V_{\mathrm{Hillary}}=V_{\mathrm{H}, \mathrm{p}}+V_{\mathrm{H}, \mathrm{a}}$, $$ \begin{aligned} \mathbb{E}\left[V_{\text {Hillary }}\right] & =\mathbb{E}\left[V_{\mathrm{H}, \mathrm{p}}\right]+\mathbb{E}\left[V_{\mathrm{H}, \mathrm{a}}\right]=\frac{1}{4} q_H p N+\frac{3}{4}\left(1-q_H\right) p N, \ \operatorname{Var}\left[V_{\text {Hillary }}\right] & =\operatorname{Var}\left[V_{\mathrm{H}, \mathrm{p}}\right]+\operatorname{Var}\left[V_{\mathrm{H}, \mathrm{a}}\right]=\frac{3}{16} q_H p N+\frac{3}{16}\left(1-q_H\right) p N=\frac{3}{16} p N, \end{aligned} $$ and the standard deviation of $V_{\text {Hillary }}$ is $\sqrt{\frac{3}{16} p N}$. Similarly, $$ \begin{aligned} \mathbb{E}\left[V_{\text {Donald }}\right] & =\mathbb{E}\left[V_{\mathrm{D}, \mathrm{p}}\right]+\mathbb{E}\left[V_{\mathrm{D}, \mathrm{a}}\right]=\frac{1}{4} q_D(1-p) N+\frac{3}{4}\left(1-q_D\right)(1-p) N, \ \operatorname{Var}\left[V_{\text {Donald }}\right] & =\operatorname{Var}\left[V_{\mathrm{D}, \mathrm{p}}\right]+\operatorname{Var}\left[V_{\mathrm{D}, \mathrm{a}}\right]=\frac{3}{16} q_D(1-p) N+\frac{3}{16}\left(1-q_D\right)(1-p) N \ & =\frac{3}{16}(1-p) N, \end{aligned} $$ and the standard deviation of $V_{\text {Donald }}$ is $\sqrt{\frac{3}{16}(1-p) N}$. 
The quantity $\hat{p}$ estimates $p$, but in this case $p$ may not be the fraction of voters who vote for Hillary. By the same argument as in part (a), the fraction of voters who vote for Hillary is given by $$ \begin{aligned} \frac{V_{\text{Hillary}}}{V_{\text{Hillary}}+V_{\text{Donald}}} & =\frac{V_{\text{Hillary}} / N}{V_{\text{Hillary}} / N+V_{\text{Donald}} / N} \\ & \approx \frac{\frac{1}{4} q_H p+\frac{3}{4}\left(1-q_H\right) p}{\frac{1}{4} q_H p+\frac{3}{4}\left(1-q_H\right) p+\frac{1}{4} q_D(1-p)+\frac{3}{4}\left(1-q_D\right)(1-p)}, \end{aligned} $$ where the approximation is accurate with high probability when $N$ is large. When $q_H \neq q_D$, this is different from $p$. For example, if $q_H=0$ and $q_D=1$, this is equal to $\frac{p}{p+(1-p) / 3}$, which is greater than $p$, reflecting the fact that Hillary supporters are more likely to vote than are Donald supporters.
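The means, standard deviations, and limiting vote share derived in part (b) can be evaluated numerically. The following is a minimal sketch, not part of the original solution: the function name `turnout_summary` and its dictionary keys are illustrative, and the turnout rates $\frac{1}{4}$ and $\frac{3}{4}$ are taken from the problem as default arguments.

```python
import math


def turnout_summary(p, q_h, q_d, n_pop, passive_rate=0.25, active_rate=0.75):
    """Evaluate the part (b) formulas (hypothetical helper, names are illustrative).

    p     : fraction of the population supporting Hillary
    q_h   : fraction of Hillary supporters who are passive
    q_d   : fraction of Donald supporters who are passive
    n_pop : population size N
    """
    # Expected votes: passive supporters vote at passive_rate, active ones at active_rate.
    mean_h = (passive_rate * q_h + active_rate * (1 - q_h)) * p * n_pop
    mean_d = (passive_rate * q_d + active_rate * (1 - q_d)) * (1 - p) * n_pop

    # Variance of a sum of independent binomials: sum of n * rate * (1 - rate).
    var_passive = passive_rate * (1 - passive_rate)
    var_active = active_rate * (1 - active_rate)
    var_h = (var_passive * q_h + var_active * (1 - q_h)) * p * n_pop
    var_d = (var_passive * q_d + var_active * (1 - q_d)) * (1 - p) * n_pop

    return {
        "mean_h": mean_h,
        "mean_d": mean_d,
        "sd_h": math.sqrt(var_h),
        "sd_d": math.sqrt(var_d),
        # Large-N approximation to the fraction of votes cast for Hillary.
        "share_h": mean_h / (mean_h + mean_d),
    }


if __name__ == "__main__":
    # Extreme case from the text: q_H = 0, q_D = 1.
    print(turnout_summary(p=0.5, q_h=0.0, q_d=1.0, n_pop=1_000_000))
```

With $q_H=0$ and $q_D=1$ the returned share agrees with $\frac{p}{p+(1-p)/3}$, and the standard deviations do not depend on $q_H$ or $q_D$ because both turnout rates give the same per-person variance $\frac{3}{16}$.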
Problem 3.
(c) We do not know $q_H$ and $q_D$. However, suppose that in our simple random sample, we can observe whether each person is passive or active, in addition to asking them whether they support Hillary or Donald. ${ }^1$ Suggest estimators $\hat{V}_{\text{Hillary}}$ and $\hat{V}_{\text{Donald}}$ for $\mathbb{E}\left[V_{\text{Hillary}}\right]$ and $\mathbb{E}\left[V_{\text{Donald}}\right]$ using this additional information. Show, for your estimators, that $$ \mathbb{E}\left[\hat{V}_{\text{Hillary}}\right]=\frac{1}{4} q_H p N+\frac{3}{4}\left(1-q_H\right) p N, \quad \mathbb{E}\left[\hat{V}_{\text{Donald}}\right]=\frac{1}{4} q_D(1-p) N+\frac{3}{4}\left(1-q_D\right)(1-p) N . $$
(c) Let $\hat{p}$ be the proportion of the 1000 surveyed people who support Hillary. Among the surveyed people supporting Hillary, let $\hat{q}_H$ be the proportion who are passive. Similarly, among the surveyed people supporting Donald, let $\hat{q}_D$ be the proportion who are passive. (Note that these are observed quantities, computed from our sample of 1000 people.) Then we may estimate the number of voters for Hillary and Donald by $$ \begin{aligned} \hat{V}_{\text{Hillary}} & =\frac{1}{4} \hat{q}_H \hat{p} N+\frac{3}{4}\left(1-\hat{q}_H\right) \hat{p} N, \\ \hat{V}_{\text{Donald}} & =\frac{1}{4} \hat{q}_D(1-\hat{p}) N+\frac{3}{4}\left(1-\hat{q}_D\right)(1-\hat{p}) N . \end{aligned} $$
$\hat{q}_H \hat{p}$ is simply the proportion of the $n=1000$ surveyed people who both support Hillary and are passive. Hence, letting $X_1, \ldots, X_n$ indicate whether each surveyed person both supports Hillary and is passive, we have $$ \hat{q}_H \hat{p}=\frac{1}{n}\left(X_1+\ldots+X_n\right) . $$ Each $X_i \sim \operatorname{Bernoulli}\left(q_H p\right)$, so linearity of expectation implies $\mathbb{E}\left[\hat{q}_H \hat{p}\right]=q_H p$. Similarly, $\left(1-\hat{q}_H\right) \hat{p}$, $\hat{q}_D(1-\hat{p})$, and $\left(1-\hat{q}_D\right)(1-\hat{p})$ are the proportions of the 1000 surveyed people who support Hillary and are active, support Donald and are passive, and support Donald and are active, so the same argument shows $\mathbb{E}\left[\left(1-\hat{q}_H\right) \hat{p}\right]=\left(1-q_H\right) p$, $\mathbb{E}\left[\hat{q}_D(1-\hat{p})\right]=q_D(1-p)$, and $\mathbb{E}\left[\left(1-\hat{q}_D\right)(1-\hat{p})\right]=\left(1-q_D\right)(1-p)$. Then applying linearity of expectation again yields $$ \mathbb{E}\left[\hat{V}_{\text{Hillary}}\right]=\mathbb{E}\left[V_{\text{Hillary}}\right], \quad \mathbb{E}\left[\hat{V}_{\text{Donald}}\right]=\mathbb{E}\left[V_{\text{Donald}}\right] . $$
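Because $\hat{q}_H \hat{p}$ and the three analogous products are just sample proportions of the four supporter/activity groups, the estimators in part (c) can be computed directly from four counts. The sketch below assumes the survey is summarized that way; the function name `estimate_votes`, its argument names, and the example counts are hypothetical and not taken from the problem.

```python
def estimate_votes(n_hp, n_ha, n_dp, n_da, n_pop):
    """Plug-in estimates of expected votes for each candidate.

    n_hp, n_ha : surveyed Hillary supporters who are passive / active
    n_dp, n_da : surveyed Donald supporters who are passive / active
    n_pop      : population size N
    """
    n = n_hp + n_ha + n_dp + n_da  # sample size (1000 in the problem)

    # n_hp / n equals q_hat_H * p_hat, n_ha / n equals (1 - q_hat_H) * p_hat, etc.
    v_hillary = (0.25 * n_hp + 0.75 * n_ha) / n * n_pop
    v_donald = (0.25 * n_dp + 0.75 * n_da) / n * n_pop
    return v_hillary, v_donald


if __name__ == "__main__":
    # Made-up counts for a sample of 1000 and a population of one million.
    v_h, v_d = estimate_votes(200, 300, 350, 150, 1_000_000)
    print(v_h, v_d, v_h / (v_h + v_d))
```

Since each estimate is a linear function of the four counts, its expectation follows from linearity exactly as in the argument above.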
Data Science is an emerging and inherently interdisciplinary field. A key set of skills in this area falls under the umbrella of Statistical Machine Learning methods. This unit presents the opportunity to bring together the concepts and skills you have learnt from a Statistics or Data Science major, and apply them to a joint project with NUTM3888 Metabolic Cybernetics, where Statistics and Data Science students will form teams with Nutrition students to solve a real-world problem using Statistical Machine Learning methods. The unit will cover a wide breadth of cutting-edge supervised and unsupervised learning methods, including principal component analysis, multivariate tests, discriminant analysis, Gaussian graphical models, log-linear models, classification trees, k-nearest neighbours, k-means clustering, hierarchical clustering, and logistic regression. In this unit, you will continue to understand and explore disciplinary knowledge, while also meeting and collaborating through project-based learning: identifying and solving problems, analysing data, and communicating your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team, which is essential for both professional and research pathways in the future.
STAT3888 | Statistical Machine Learning, University of Sydney
Statistical Inference (Advanced): worked examples
Problem 1.
Let $(X, Y) \in \mathbb{R}^d \times\{0,1\}$ be a random pair such that $\mathbb{P}(Y=k)=\pi_k>0$ $\left(\pi_0+\pi_1=1\right)$ and the conditional distribution of $X$ given $Y$ is $X \mid Y \sim \mathcal{N}\left(\mu_Y, \Sigma_Y\right)$, where $\mu_0 \neq \mu_1 \in \mathbb{R}^d$ and $\Sigma_0, \Sigma_1 \in \mathbb{R}^{d \times d}$ are mean vectors and covariance matrices respectively.
Assume that $\Sigma_0=\Sigma_1=\Sigma$ is a positive definite matrix. Compute the Bayes classifier $h^*$ as a function of $\mu_0, \mu_1, \pi_0, \pi_1$ and $\Sigma$. What is the nature of the sets $\left\{h^*=0\right\}$ and $\left\{h^*=1\right\}$?
(2) By Bayes’ rule, we have $\mathbb{P}(Y=y \mid X=x) \propto p_{X \mid Y=y}(x) \mathbb{P}(Y=y)$. Thus, given $X=x$, the value of the Bayes classifier is 0 if $$ \pi_0 p_{X \mid Y=0}(x)>\pi_1 p_{X \mid Y=1}(x), $$ and is otherwise 1. Cancelling $(2 \pi)^{-d / 2}(\operatorname{det} \Sigma)^{-1 / 2}$ on both sides, and taking logs, the inequality reads $$ \log \pi_0-\frac{1}{2}\left(x-\mu_0\right)^{\top} \Sigma^{-1}\left(x-\mu_0\right)>\log \pi_1-\frac{1}{2}\left(x-\mu_1\right)^{\top} \Sigma^{-1}\left(x-\mu_1\right) . $$ This asks whether $x$ is more than $\log \pi_1-\log \pi_0$ units closer to $\mu_0$ than to $\mu_1$, with closeness measured by the quadratic form $\frac{1}{2} \Sigma^{-1}$ (half the squared Mahalanobis distance). Because $\Sigma$ is positive definite, this matches squared Euclidean distance after a linear transformation (given by the Cholesky factorization of $\Sigma^{-1}$). Expanding both sides, the terms $x^{\top} \Sigma^{-1} x$ cancel and the inequality becomes linear in $x$: $$ \left(\mu_0-\mu_1\right)^{\top} \Sigma^{-1} x>\frac{1}{2}\left(\mu_0^{\top} \Sigma^{-1} \mu_0-\mu_1^{\top} \Sigma^{-1} \mu_1\right)+\log \pi_1-\log \pi_0 . $$ Geometrically, then, the boundary forms a hyperplane, and the two regions $\left\{h^*=0\right\}$ and $\left\{h^*=1\right\}$ form half-spaces.
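Expanding the two quadratic forms in the log inequality cancels the $x^{\top} \Sigma^{-1} x$ terms and leaves a rule of the form $w^{\top} x>c$ with $w=\Sigma^{-1}(\mu_0-\mu_1)$ and $c=\frac{1}{2}(\mu_0^{\top} \Sigma^{-1} \mu_0-\mu_1^{\top} \Sigma^{-1} \mu_1)+\log \pi_1-\log \pi_0$. The following is a small illustrative sketch of that rule for $d=2$ only, using plain Python lists; the helper names (`inv2`, `matvec`, `dot`, `lda_rule`) and the example parameters are assumptions made for the illustration.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def lda_rule(mu0, mu1, sigma, pi0, pi1):
    """Return (h, w, c): the classifier h(x) = 0 if w.x > c else 1."""
    prec = inv2(sigma)  # Sigma^{-1}
    w = matvec(prec, [mu0[0] - mu1[0], mu0[1] - mu1[1]])
    c = 0.5 * (dot(mu0, matvec(prec, mu0)) - dot(mu1, matvec(prec, mu1)))
    c += math.log(pi1) - math.log(pi0)

    def h(x):
        return 0 if dot(w, x) > c else 1

    return h, w, c


if __name__ == "__main__":
    # Identity covariance, equal priors: the boundary is the line x1 = 1.
    h, w, c = lda_rule([0.0, 0.0], [2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5, 0.5)
    print(w, c, h([0.5, 3.0]), h([1.5, -2.0]))
```

Changing the priors only shifts the threshold $c$, so the boundary moves parallel to itself while staying a hyperplane.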
Problem 3.
Assume now that $\Sigma_0 \neq \Sigma_1$ are two positive definite matrices. What is the nature of the sets $\left\{h^*=0\right\}$ and $\left\{h^*=1\right\}$?
(3) The inequality defining the Bayes classifier remains quadratic in $x$, so the hypersurface separating $\left\{h^*=0\right\}$ from $\left\{h^*=1\right\}$ is a quadric. For example, with $$ \Sigma_0=\left(\begin{array}{ll} 1 & 0 \\ 0 & 2 \end{array}\right), \quad \Sigma_1=\left(\begin{array}{ll} 2 & 0 \\ 0 & 1 \end{array}\right), \quad \pi_0 \neq \pi_1, $$ the separating curve is a hyperbola.
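In two dimensions, the kind of conic is governed by the quadratic part of the log-ratio, $-\frac{1}{2} x^{\top}\left(\Sigma_0^{-1}-\Sigma_1^{-1}\right) x$: a negative determinant of $\Sigma_0^{-1}-\Sigma_1^{-1}$ indicates a hyperbola, a positive one an ellipse. The sketch below checks this for the example matrices. It is a rough illustration only: the function names are hypothetical, and degenerate cases (a pair of lines, a single point, an empty set) are not distinguished.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def boundary_type(sigma0, sigma1):
    """Classify the 2-D decision boundary by the sign of det(Sigma0^-1 - Sigma1^-1)."""
    p0, p1 = inv2(sigma0), inv2(sigma1)
    # Matrix of the quadratic part (up to the factor -1/2).
    a = [[p0[i][j] - p1[i][j] for j in range(2)] for i in range(2)]
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det_a < 0:
        return "hyperbola"
    if det_a > 0:
        return "ellipse"
    return "parabola or line"


if __name__ == "__main__":
    sigma0 = [[1.0, 0.0], [0.0, 2.0]]
    sigma1 = [[2.0, 0.0], [0.0, 1.0]]
    print(boundary_type(sigma0, sigma1))
```

For the example, $\Sigma_0^{-1}-\Sigma_1^{-1}=\operatorname{diag}\left(\frac{1}{2},-\frac{1}{2}\right)$, which is indefinite, in agreement with the hyperbola stated above.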
In today’s data-rich world, more and more people from diverse fields need to perform statistical analyses, and more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as a best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics, with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part, which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will rigorously prove key results and apply these to real-world problems in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.
Suppose that grades on a midterm and final have a correlation coefficient of 0.6 and both exams have an average score of 75 and a standard deviation of 10. (a). If a student’s score on the midterm is 90, what would you predict her score on the final to be?
Solution: (a). Let $x$ be the midterm score and $y$ be the final score. The least-squares regression of $y$ on $x$ is given in terms of the standardized values: $$ \frac{\hat{y}-\bar{y}}{s_y}=r \frac{x-\bar{x}}{s_x} $$ A score of 90 on the midterm is $(90-75) / 10=1.5$ standard deviations above the mean. The predicted score on the final will be $r \times 1.5=0.9$ standard deviations above the mean final score, which is $75+0.9 \times 10=84$.
Problem 2.
(b). If a student’s score on the final was 75, what would you guess that his score was on the midterm?
(b). For this case we need to regress the midterm score $(x)$ on the final score $(y)$. The same argument as in (a), with $x$ and $y$ reversed, leads to: $$ \frac{\hat{x}-\bar{x}}{s_x}=r \frac{y-\bar{y}}{s_y} $$ Since the final score was 75, which is zero standard deviations above $\bar{y}$, the prediction of the midterm score is $\bar{x}=75$.
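The predictions in parts (a) and (b) both use the same standardized regression formula, with the roles of the two exams swapped in (b). A minimal sketch of that calculation follows; the function name `predict_standardized` and its argument names are chosen here for illustration.

```python
def predict_standardized(value, mean_from, sd_from, mean_to, sd_to, r):
    """Least-squares prediction of one score from the other.

    The observed score is converted to standard units, shrunk by the
    correlation r, and converted back to the scale of the predicted score.
    """
    z = (value - mean_from) / sd_from
    return mean_to + r * z * sd_to


if __name__ == "__main__":
    # (a) midterm of 90 -> predicted final (about 84)
    print(predict_standardized(90, 75, 10, 75, 10, 0.6))
    # (b) final of 75 -> predicted midterm (the mean, 75)
    print(predict_standardized(75, 75, 10, 75, 10, 0.6))
```

Note that the two regressions are not inverses of each other: predicting the midterm from a final of 84 gives $75+0.6 \times 0.9 \times 10=80.4$, not 90.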
Problem 3.
(c). Consider all students scoring at the 75th percentile or higher on the midterm. What proportion of these students would you expect to be at or above the 75th percentile of the final? (i) $75 \%$, (ii) $50 \%$, (iii) less than $50 \%$, or (iv) more than $50 \%$.
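(c). By the regression effect, we expect dependent-variable scores to be closer to their mean, in standard-deviation units, than the independent variable is to its mean. A student exactly at the 75th percentile of the midterm (about 0.67 standard deviations above the mean, if scores are roughly normal) is predicted to be only $0.6 \times 0.67 \approx 0.40$ standard deviations above the mean on the final, which is below the 75th percentile of the final. The regression effect alone therefore suggests (iii). The group in question, however, contains every student at or above the 75th percentile, not only those exactly at it. If the scores are approximately bivariate normal, the group's average midterm score is about 1.27 standard deviations above the mean, so its predicted average final score is about $0.6 \times 1.27 \approx 0.76$ standard deviations above the mean, slightly above the 75th percentile. Under that assumption the expected proportion is a little over $50 \%$ (roughly $54 \%$), which corresponds to (iv); with a weaker correlation it would fall below $50 \%$.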
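The proportion asked about in part (c) can also be estimated by simulation, which is a useful check on heuristic arguments about averages. The sketch below makes an assumption that the problem does not state: that midterm and final scores are bivariate normal with correlation $r$. It works in standard units, since percentiles are unaffected by the shift to mean 75 and the scaling to standard deviation 10. The function name `top_quartile_overlap` and its parameters are illustrative.

```python
import math
import random


def top_quartile_overlap(r, n=200_000, seed=0):
    """Estimate P(final >= its 75th percentile | midterm >= its 75th percentile).

    Assumes (midterm, final) are bivariate normal with correlation r.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)               # standardized midterm score
        ys.append(r * z1 + s * z2)  # standardized final score, correlation r with midterm

    # Empirical 75th percentiles of each exam.
    qx = sorted(xs)[int(0.75 * n)]
    qy = sorted(ys)[int(0.75 * n)]

    # Finals of the students in the top quarter on the midterm.
    top_finals = [y for x, y in zip(xs, ys) if x >= qx]
    return sum(y >= qy for y in top_finals) / len(top_finals)


if __name__ == "__main__":
    print(top_quartile_overlap(0.6))
```

The estimate for $r=0.6$ can be compared with options (i) to (iv); running the function with other values of `r` shows how strongly the answer depends on the correlation.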