📌 Q1 Markov Chains on a Cycle Graph (15 Marks)
Problem Statement: Consider a square with four vertices $A, B, C$, and $D$ arranged clockwise. A particle starts moving from the vertex $A$, and at any stage, it moves to either its clockwise neighbour or its anticlockwise neighbour with respective probabilities $0.6$ and $0.4$. Let $X_n \in \{A, B, C, D\}$ be the position after the $n$-th step, for $n \geq 1$.
(a) Find $\mathbb{P}(X_{12} = A)$.
(b) Find the stationary distribution of the Markov chain. Does the distribution of $X_n$ converge to this stationary distribution as $n \to \infty$? Justify your answer.
🧠 Approach & Key Concepts
This problem models a Random Walk on a Cycle Graph ($C_4$).
- For part (a): While a binomial combinatorial sum is possible, a cleaner approach uses the spectral properties of the transition matrix. Since the state space forms a ring, the transition matrix is a Circulant Matrix, whose eigenvalues follow directly from the fourth roots of unity.
- For part (b): We verify double stochasticity to identify the stationary distribution. For convergence, we must analyze the Periodicity of the chain: the square is a bipartite graph, so the walk alternates between the two vertex classes, precluding pointwise convergence.
✍️ Step-by-Step Proof / Derivation
Step 1: Setting up the Transition Matrix
Let us index the state space $S = \{A, B, C, D\}$ as $\{0, 1, 2, 3\}$. The particle moves clockwise ($+1 \pmod 4$) with probability $p=0.6$, and anticlockwise ($-1 \pmod 4$) with probability $q=0.4$. The 1-step transition matrix $\mathbf{P}$ is:
$$ \mathbf{P} = \begin{pmatrix} 0 & 0.6 & 0 & 0.4 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0.6 & 0 & 0.4 & 0 \end{pmatrix} $$
Step 2: Spectral Decomposition of the Circulant Matrix
Matrix $\mathbf{P}$ is a circulant matrix defined by its first row vector $\mathbf{c} = [0, 0.6, 0, 0.4]$. The eigenvalues $\lambda_k$ of a $4 \times 4$ circulant matrix are given by the Discrete Fourier Transform of $\mathbf{c}$:
$$ \lambda_k = \sum_{j=0}^{3} c_j \omega^{jk}, \quad \text{where } \omega = e^{2\pi i / 4} = i $$
Substituting $c_1 = p = 0.6$ and $c_3 = q = 0.4$, we compute the four eigenvalues:
- $\lambda_0 = p + q = 1$
- $\lambda_1 = p(i) + q(-i) = (0.6 - 0.4)i = 0.2i$
- $\lambda_2 = p(-1) + q(-1) = -1$
- $\lambda_3 = p(-i) + q(i) = -(0.6 - 0.4)i = -0.2i$
Step 3: Calculating $\mathbb{P}(X_{12} = A)$
Because the eigenvectors of a circulant matrix are independent of its entries (they are the columns of the Fourier matrix), the $n$-step transition probability returning to the starting state $P_{00}^{(n)}$ is simply the average of the $n$-th powers of its eigenvalues:
$$ P_{00}^{(n)} = \frac{1}{4} \sum_{k=0}^{3} \lambda_k^n $$
For $n = 12$, we evaluate the 12th power of each eigenvalue:
- $\lambda_0^{12} = 1^{12} = 1$
- $\lambda_1^{12} = (0.2i)^{12} = (0.2)^{12} (i^4)^3 = 0.2^{12} \cdot 1 = 0.2^{12}$
- $\lambda_2^{12} = (-1)^{12} = 1$
- $\lambda_3^{12} = (-0.2i)^{12} = 0.2^{12} (i^4)^3 = 0.2^{12}$
$$ P_{00}^{(12)} = \frac{1}{4} \left( 1 + 0.2^{12} + 1 + 0.2^{12} \right) = \frac{1}{4} \left( 2 + 2(0.2)^{12} \right) = \frac{1 + 0.2^{12}}{2} $$
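This closed form is easy to sanity-check numerically; the sketch below compares the direct 12-step matrix power against the eigenvalue formula (numpy assumed, not part of the original solution):

```python
import numpy as np

# One-step transition matrix of the clockwise/anticlockwise walk on C_4.
P = np.array([
    [0.0, 0.6, 0.0, 0.4],
    [0.4, 0.0, 0.6, 0.0],
    [0.0, 0.4, 0.0, 0.6],
    [0.6, 0.0, 0.4, 0.0],
])

# Direct 12-step matrix power: entry (0, 0) is P(X_12 = A | X_0 = A).
P12 = np.linalg.matrix_power(P, 12)

# Closed form from the circulant eigenvalues: (1 + 0.2^12) / 2.
closed_form = (1 + 0.2 ** 12) / 2
print(P12[0, 0], closed_form)
```

The two values agree to machine precision, confirming the spectral computation.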
Step 4: Finding the Stationary Distribution for Part (b)
A stationary distribution $\boldsymbol{\pi} = (\pi_A, \pi_B, \pi_C, \pi_D)$ must satisfy $\boldsymbol{\pi} \mathbf{P} = \boldsymbol{\pi}$ and $\sum \pi_i = 1$. By inspecting the transition matrix $\mathbf{P}$, we observe that both the row sums and the column sums are exactly $1$.
Because $\mathbf{P}$ is a doubly stochastic matrix over a finite state space, the unique stationary distribution is the uniform distribution:
$$ \boldsymbol{\pi} = \left( \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4} \right) $$
Step 5: Analyzing Convergence and Periodicity
For a discrete-time Markov chain to converge pointwise to its stationary distribution ($\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$), the chain must be irreducible, positive recurrent, and aperiodic.
Let us define the period $d$ of state $A$. The period is the greatest common divisor of all $n$ such that returning to $A$ is possible: $d = \gcd\{n : P_{00}^{(n)} > 0\}$. Because the graph is a square (a bipartite graph), any path that starts at $A$ and returns to $A$ must take an even number of steps.
$$ d = \gcd(2, 4, 6, \dots) = 2 $$
Because the period $d = 2 > 1$, the Markov chain is periodic. In particular, $P_{00}^{(2k+1)} = 0$ for every integer $k \geq 0$, so the sequence $P_{00}^{(n)}$ oscillates and cannot converge to $1/4$.
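The oscillation is visible numerically: odd-step return probabilities are exactly zero, while even-step ones hover near $1/2$, never near the stationary value $1/4$ (a quick numpy check, not part of the original solution):

```python
import numpy as np

P = np.array([
    [0.0, 0.6, 0.0, 0.4],
    [0.4, 0.0, 0.6, 0.0],
    [0.0, 0.4, 0.0, 0.6],
    [0.6, 0.0, 0.4, 0.0],
])

# Return probabilities P_00^(n) for n = 1..8: exactly zero on odd steps,
# near 1/2 on even steps -- never approaching the stationary value 1/4.
returns = [np.linalg.matrix_power(P, n)[0, 0] for n in range(1, 9)]
print(returns)
```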
Final Answer / Q.E.D:
(a) The probability $\mathbb{P}(X_{12} = A) = \frac{1 + 0.2^{12}}{2}$.
(b) The stationary distribution is $\boldsymbol{\pi} = (1/4, 1/4, 1/4, 1/4)$. However, the distribution of $X_n$ does not converge to this stationary distribution because the Markov chain is periodic with period $d=2$ (bipartite state space).
📌 Q2 Discrete MLE and Asymptotic Degeneracy (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ ($n > 1$) be independent $U(0, \theta)$ random variables where $\theta$ is an unknown positive integer.
(a) Find the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$.
(b) Show that $\mathbb{P}(\hat{\theta}_n = \theta)$ converges to 1 as $n \to \infty$.
(c) Find the limiting distribution of $n(\theta - \hat{\theta}_n)$.
🧠 Approach & Key Concepts
This problem introduces a boundary-constrained optimization where the parameter space is restricted to integers ($\theta \in \mathbb{N}$).
- For part (a): The likelihood function is strictly decreasing with respect to $\theta$. Therefore, the MLE is the smallest permissible integer that covers all observed data.
- For part (b): We evaluate the probability of hitting the exact true parameter using the Cumulative Distribution Function (CDF) of the maximum order statistic $X_{(n)}$.
- For part (c): We apply the definitions of convergence in probability and convergence in distribution to show that an estimator with an exact hit probability approaching $1$ forces the scaled error to converge to a degenerate point mass.
✍️ Step-by-Step Proof / Derivation
Step 1: Finding the MLE for Part (a)
The probability density function for a single observation is $f(x_i | \theta) = \frac{1}{\theta} \mathbb{I}(0 < x_i \leq \theta)$. The joint likelihood function for the sample is the product of the marginals:
$$ L(\theta | \mathbf{X}) = \frac{1}{\theta^n} \prod_{i=1}^n \mathbb{I}(0 < X_i \leq \theta) = \frac{1}{\theta^n} \mathbb{I}(X_{(n)} \leq \theta) $$
where $X_{(n)} = \max(X_1, \dots, X_n)$ is the maximum order statistic. To maximize $L(\theta)$, we must minimize $\theta$ subject to two constraints:
- $\theta \geq X_{(n)}$ (to keep the indicator function non-zero).
- $\theta \in \{1, 2, 3, \dots\}$ (since $\theta$ is a positive integer).
The smallest integer that is greater than or equal to $X_{(n)}$ is mathematically defined as the ceiling function. Thus, the MLE is:
$$ \hat{\theta}_n = \lceil X_{(n)} \rceil $$
Step 2: Proving Asymptotic Consistency for Part (b)
We want to find the probability that our estimator exactly matches the true integer parameter $\theta$.
$$ \mathbb{P}(\hat{\theta}_n = \theta) = \mathbb{P}(\lceil X_{(n)} \rceil = \theta) $$
For the ceiling of $X_{(n)}$ to equal the integer $\theta$, the maximum must lie in the interval $(\theta - 1, \theta]$. Therefore:
$$ \mathbb{P}(\lceil X_{(n)} \rceil = \theta) = \mathbb{P}(\theta - 1 < X_{(n)} \leq \theta) $$
Since the distribution of each $X_i$ is supported on $(0, \theta)$, the upper condition $X_{(n)} \leq \theta$ holds with probability 1. Using the complement rule:
$$ \mathbb{P}(\hat{\theta}_n = \theta) = 1 - \mathbb{P}(X_{(n)} \leq \theta - 1) $$
The CDF of the maximum order statistic $X_{(n)}$ for uniform data is $F_{(n)}(x) = \left(\frac{x}{\theta}\right)^n$ for $x \in (0, \theta)$. Substituting $x = \theta - 1$:
$$ \mathbb{P}(\hat{\theta}_n = \theta) = 1 - \left( \frac{\theta - 1}{\theta} \right)^n = 1 - \left( 1 - \frac{1}{\theta} \right)^n $$
Since $\theta \geq 1$, the term $(1 - 1/\theta)$ is strictly less than $1$ (for $\theta > 1$) or exactly $0$ (for $\theta = 1$). In either case, as $n \to \infty$:
$$ \lim_{n \to \infty} \left( 1 - \frac{1}{\theta} \right)^n = 0 \implies \lim_{n \to \infty} \mathbb{P}(\hat{\theta}_n = \theta) = 1 $$
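A short Monte Carlo sketch agrees with the closed form $1 - (1 - 1/\theta)^n$; the values $\theta = 3$ and $n = 5$ below are purely illustrative:

```python
import math
import random

random.seed(0)
theta, n = 3, 5        # illustrative true integer parameter and sample size
trials = 20_000

hits = 0
for _ in range(trials):
    # Draw a U(0, theta) sample of size n and check whether the MLE
    # ceil(max) hits the true integer parameter exactly.
    xmax = max(random.uniform(0, theta) for _ in range(n))
    hits += (math.ceil(xmax) == theta)

exact = 1 - (1 - 1 / theta) ** n   # = 1 - (2/3)^5 here
print(hits / trials, exact)
```

Increasing `n` pushes both values toward 1, as Part (b) asserts.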
Step 3: Finding the Limiting Distribution for Part (c)
Let $Z_n = n(\theta - \hat{\theta}_n)$ be the sequence of scaled error random variables. We analyze the probability mass at exactly zero.
$$ \mathbb{P}(Z_n = 0) = \mathbb{P}\left(n(\theta - \hat{\theta}_n) = 0\right) = \mathbb{P}(\hat{\theta}_n = \theta) $$
From Part (b), we know that $\lim_{n \to \infty} \mathbb{P}(\hat{\theta}_n = \theta) = 1$. Therefore:
$$ \lim_{n \to \infty} \mathbb{P}(Z_n = 0) = 1 $$
Because the probability mass concentrates entirely at the single point $0$ as $n$ approaches infinity, the sequence of random variables $Z_n$ converges in probability to the constant $0$ ($Z_n \xrightarrow{p} 0$).
It is a standard theorem in asymptotic statistics that convergence in probability to a constant implies convergence in distribution to a degenerate random variable at that constant.
Final Answer / Q.E.D:
(a) The MLE is $\hat{\theta}_n = \lceil X_{(n)} \rceil$.
(b) $\mathbb{P}(\hat{\theta}_n = \theta) = 1 - (1 - 1/\theta)^n$, which converges to $1$ as $n \to \infty$.
(c) The limiting distribution of $n(\theta - \hat{\theta}_n)$ is a degenerate distribution at 0 (a point mass/Dirac delta at 0).
📌 Q3 Mixture Distributions and Popoviciu's Inequality (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ and $U$ be independent random variables, where $X_i \sim N(i, \sigma_i^2)$ and $\mathbb{P}(U = i) = \lambda_i$ for $i = 1, 2, \dots, n$ with $\sum_{i=1}^n \lambda_i = 1$. Define $Y = X_i$ if $U = i$. Let $V_1 = \text{Var}(Y)$ and $V_2 = \sum_{i=1}^n \lambda_i \sigma_i^2$.
(a) Which one is bigger $V_1$ or $V_2$? Justify your answer.
(b) Find $(\lambda_1, \lambda_2, \dots, \lambda_n)$ such that $|V_1 - V_2|$ is maximum.
(c) Find the maximum value of $|V_1 - V_2|$.
🧠 Approach & Key Concepts
The variable $Y$ is a Gaussian Mixture Model conditioned on the latent variable $U$.
- For part (a): The variance of a mixture is elegantly decomposed using the Law of Total Variance (EVE's Law). The difference between the total variance and the expected conditional variance is the variance of the conditional means.
- For parts (b & c): We must maximize the variance of the bounded discrete random variable $U \in \{1, \dots, n\}$. By Popoviciu's Inequality on Variances (or a simple extreme-point argument), the variance of a bounded distribution is maximized when the entire probability mass is split evenly between the two endpoints of the support.
✍️ Step-by-Step Proof / Derivation
Step 1: Applying the Law of Total Variance for Part (a)
By definition, $Y$ conditioned on $U = i$ has the distribution of $X_i$, which means $Y | U \sim N(U, \sigma_U^2)$. From this conditional distribution, we extract the conditional mean and conditional variance:
$$ \mathbb{E}[Y | U] = U \quad \text{and} \quad \text{Var}(Y | U) = \sigma_U^2 $$
We use the Law of Total Variance to compute $V_1 = \text{Var}(Y)$:
$$ \text{Var}(Y) = \mathbb{E}[\text{Var}(Y | U)] + \text{Var}(\mathbb{E}[Y | U]) $$
Let us analyze the two components on the right-hand side:
- The expected conditional variance is $\mathbb{E}[\sigma_U^2] = \sum_{i=1}^n \mathbb{P}(U=i) \sigma_i^2 = \sum_{i=1}^n \lambda_i \sigma_i^2$. This is exactly defined as $V_2$.
- The variance of the conditional mean is $\text{Var}(U)$.
Substituting these into the equation yields:
$$ V_1 = V_2 + \text{Var}(U) $$
Since variance is always non-negative ($\text{Var}(U) \geq 0$), it follows that $V_1 \geq V_2$. Therefore, $V_1$ is the larger of the two (with equality only if $U$ is a degenerate constant).
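The decomposition $V_1 = V_2 + \text{Var}(U)$ can be checked by simulation; the small mixture below (means, standard deviations, and weights chosen only for illustration) is a minimal sketch:

```python
import random

random.seed(1)

# Illustrative 3-component Gaussian mixture: means mu_i = i, chosen sds,
# and mixing weights lam_i summing to 1.
means = [1, 2, 3]
sds = [0.5, 1.0, 1.5]
lam = [0.2, 0.5, 0.3]

N = 100_000
ys = []
for _ in range(N):
    u = random.choices([0, 1, 2], weights=lam)[0]   # latent label U
    ys.append(random.gauss(means[u], sds[u]))       # Y | U = u ~ N(mu_u, sd_u^2)

mean_y = sum(ys) / N
v1 = sum((y - mean_y) ** 2 for y in ys) / N                 # empirical Var(Y)
v2 = sum(l * s * s for l, s in zip(lam, sds))               # E[Var(Y | U)]
mean_u = sum(l * m for l, m in zip(lam, means))
var_u = sum(l * (m - mean_u) ** 2 for l, m in zip(lam, means))  # Var(E[Y | U])
print(v1, v2 + var_u)
```

The empirical $V_1$ matches $V_2 + \text{Var}(U)$ up to Monte Carlo error.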
Step 2: Formulating the Optimization Problem for Part (b)
We need to maximize the absolute difference $|V_1 - V_2|$. From Step 1, we know:
$$ |V_1 - V_2| = V_1 - V_2 = \text{Var}(U) $$
Our goal is to find the probability vector $\boldsymbol{\lambda}$ that maximizes $\text{Var}(U)$ subject to the constraint that $U$ is supported on the integers $\{1, 2, \dots, n\}$. Let us expand $\text{Var}(U)$:
$$ \text{Var}(U) = \mathbb{E}[U^2] - (\mathbb{E}[U])^2 = \sum_{i=1}^n \lambda_i i^2 - \left( \sum_{i=1}^n \lambda_i i \right)^2 $$
Step 3: Proving the Extreme Point Mass Maximum
Notice that for any $1 \leq i \leq n$, the product $(i - 1)(n - i) \geq 0$, since both factors are non-negative. Expanding this bound gives:
$$ in - i^2 - n + i \geq 0 \implies i^2 \leq (n+1)i - n $$
Taking the expectation on both sides with respect to $\boldsymbol{\lambda}$:
$$ \mathbb{E}[U^2] \leq (n+1)\mathbb{E}[U] - n $$
Substitute this upper bound back into the variance equation to create a function $g(x)$ where $x = \mathbb{E}[U]$:
$$ \text{Var}(U) \leq (n+1)\mathbb{E}[U] - n - (\mathbb{E}[U])^2 = g(\mathbb{E}[U]) $$
To maximize this bounding quadratic function $g(x) = -x^2 + (n+1)x - n$, we take the derivative with respect to $x$ and set it to zero:
$$ g'(x) = -2x + (n+1) = 0 \implies x = \frac{n+1}{2} $$
Thus, the bounding quadratic is maximized when the expected value sits exactly at the midpoint of the support. For the bound $i^2 \leq (n+1)i - n$ to hold with equality (which is required to attain the maximum variance), the probability mass must vanish wherever the inequality is strict.
Equality holds if and only if $i = 1$ or $i = n$. This forces $\lambda_i = 0$ for all $1 < i < n$.
To satisfy $\mathbb{E}[U] = \frac{n+1}{2}$ with only $\lambda_1$ and $\lambda_n$ active:
$$ 1 \cdot \lambda_1 + n \cdot \lambda_n = \frac{n+1}{2} \quad \text{and} \quad \lambda_1 + \lambda_n = 1 $$
Solving this system yields $\lambda_1 = 1/2$ and $\lambda_n = 1/2$.
Step 4: Evaluating the Maximum Value for Part (c)
We evaluate $\text{Var}(U)$ using our optimal discrete uniform distribution on the extremes:
$$ \mathbb{E}[U] = \frac{n+1}{2} $$
$$ \mathbb{E}[U^2] = 1^2\left(\frac{1}{2}\right) + n^2\left(\frac{1}{2}\right) = \frac{n^2 + 1}{2} $$
$$ \text{Max } \text{Var}(U) = \frac{n^2 + 1}{2} - \left(\frac{n+1}{2}\right)^2 = \frac{2n^2 + 2 - (n^2 + 2n + 1)}{4} = \frac{n^2 - 2n + 1}{4} = \frac{(n-1)^2}{4} $$
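A brute-force check supports the bound: random probability vectors never exceed $(n-1)^2/4$, while the half-half mass on $\{1, n\}$ attains it exactly (illustrative $n = 6$):

```python
import random

random.seed(2)
n = 6
bound = (n - 1) ** 2 / 4          # claimed maximum of Var(U)

def var_u(lam):
    # Variance of a random variable on {1, ..., n} with weights lam.
    m1 = sum(l * (i + 1) for i, l in enumerate(lam))
    m2 = sum(l * (i + 1) ** 2 for i, l in enumerate(lam))
    return m2 - m1 ** 2

# Random probability vectors never beat the bound ...
best = 0.0
for _ in range(10_000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    best = max(best, var_u([x / s for x in w]))

# ... and the half-half mass on {1, n} attains it exactly.
extreme = [0.5] + [0.0] * (n - 2) + [0.5]
print(best, var_u(extreme), bound)
```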
Final Answer / Q.E.D:
(a) $V_1$ is bigger than $V_2$ because $V_1 = V_2 + \text{Var}(U)$, and variance is non-negative.
(b) The probability vector that maximizes the difference is $\lambda_1 = 1/2$, $\lambda_n = 1/2$, and $\lambda_i = 0$ for all $1 < i < n$.
(c) The maximum absolute difference $|V_1 - V_2|$ is exactly $\frac{(n-1)^2}{4}$.
📌 Q4 Loss Functions and Weighted Medians (15 Marks)
Problem Statement:
(a) Consider a frequency distribution of $x$ (values 1 to 10) and frequencies $f$. Find the value of $\theta$ that minimizes $\psi(\theta) = \sum_{i=1}^{10} f_i |x_i - \theta|$.
(b) Consider a dataset containing $n$ bivariate observations $(x_1, y_1), \dots, (x_n, y_n)$. If $x_i = 2^i$ for $i = 1, 2, \dots, n$, show that $\Delta(\theta) = \sum_{i=1}^n |y_i - \theta x_i|$ is minimized at $\theta = y_n / x_n$.
🧠 Approach & Key Concepts
This problem tests the properties of $L_1$ loss functions (Absolute Deviations).
- For part (a): It is a standard statistical theorem that the sum of absolute deviations $\sum f_i |x_i - \theta|$ is minimized when $\theta$ is the median of the frequency distribution.
- For part (b): By factoring out $x_i$, the loss function transforms into a Weighted Median problem. Because the weights grow exponentially ($x_i = 2^i$), the final weight $x_n$ dominates the sum of all preceding weights. This structural dominance strictly anchors the minimum to the final data point.
✍️ Step-by-Step Proof / Derivation
Step 1: Minimizing Absolute Loss for Part (a)
We need to minimize $\psi(\theta) = \sum_{i=1}^{10} f_i |x_i - \theta|$. The minimizer of the sum of absolute deviations is the median of the data. First, we compute the total number of observations, $N$:
$$ N = \sum_{i=1}^{10} f_i = 17 + 35 + 46 + 38 + 25 + 13 + 10 + 9 + 5 + 2 = 200 $$
Since $N = 200$ (an even number), the median is the average of the 100th and 101st ordered observations. Let us calculate the cumulative frequencies to locate these positions:
- $x = 1$: $CF = 17$
- $x = 2$: $CF = 17 + 35 = 52$
- $x = 3$: $CF = 52 + 46 = 98$
- $x = 4$: $CF = 98 + 38 = 136$
Because the cumulative frequency reaches $98$ at $x = 3$ and jumps to $136$ at $x = 4$, both the 100th and 101st ordered observations take the value $4$.
$$ \text{Median} = \frac{4 + 4}{2} = 4 \implies \theta = 4 $$
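Since $\psi$ is piecewise linear with kinks only at the data values, comparing the integer candidates suffices; a short check using the frequencies above:

```python
f = [17, 35, 46, 38, 25, 13, 10, 9, 5, 2]   # frequencies for x = 1, ..., 10
x = list(range(1, 11))

def psi(theta):
    # Weighted sum of absolute deviations from theta.
    return sum(fi * abs(xi - theta) for fi, xi in zip(f, x))

# psi is piecewise linear with kinks only at the data values,
# so the minimizer is one of the integer candidates 1..10.
values = {t: psi(t) for t in x}
print(min(values, key=values.get))   # -> 4
```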
Step 2: Formulating the Weighted Median for Part (b)
We want to minimize $\Delta(\theta) = \sum_{i=1}^n |y_i - \theta x_i|$. Because $x_i = 2^i > 0$, we can factor $x_i$ out of the absolute value:
$$ \Delta(\theta) = \sum_{i=1}^n x_i \left| \frac{y_i}{x_i} - \theta \right| $$
Let $w_i = x_i = 2^i$ act as weights, and let $z_i = y_i / x_i$ act as the data points. The function $\Delta(\theta) = \sum_{i=1}^n w_i |z_i - \theta|$ is precisely minimized when $\theta$ is the weighted median of the sequence $z_i$ with corresponding weights $w_i$.
Step 3: Analyzing the Exponential Weights
The weighted median is the value $z_k$ at which the cumulative weight crosses $50\%$ of the total weight $W$. Let us compute the total weight:
$$ W = \sum_{i=1}^n 2^i = 2^{n+1} - 2 $$
Now, let us examine the weight of the $n$-th observation, $w_n = 2^n$. We compare this single weight to the sum of all other weights combined:
$$ \sum_{i=1}^{n-1} w_i = \sum_{i=1}^{n-1} 2^i = 2^n - 2 $$
Notice the strict inequality:
$$ w_n = 2^n > 2^n - 2 = \sum_{i=1}^{n-1} w_i $$
Step 4: Proving the Minimizer is at $z_n$
Because the single weight $w_n$ is strictly greater than the sum of all other weights, $z_n$ carries more than $50\%$ of the total distribution's mass. Regardless of where $z_n$ falls in the sorted order of $z_i$, any movement of $\theta$ away from $z_n$ will incur a heavier penalty from the $w_n |z_n - \theta|$ term than it could possibly save from the remaining $\sum_{i=1}^{n-1} w_i |z_i - \theta|$ terms.
Therefore, the minimum of the loss function is attained at $\theta = z_n = y_n / x_n$.
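The dominance argument can be verified numerically with arbitrary $y_i$ (the random values below are purely illustrative):

```python
import random

random.seed(3)
n = 8
xs = [2 ** i for i in range(1, n + 1)]             # exponential weights x_i = 2^i
ys = [random.uniform(-10, 10) for _ in range(n)]   # arbitrary illustrative y_i

def delta(theta):
    return sum(abs(y - theta * x) for x, y in zip(xs, ys))

# The minimum of a sum of absolute values occurs at one of the kink
# points theta = y_i / x_i, so comparing those candidates is exact.
cands = [y / x for x, y in zip(xs, ys)]
best = min(cands, key=delta)
print(best, ys[-1] / xs[-1])
```

Whatever the $y_i$, the minimizer lands at $y_n / x_n$, as the weight-dominance argument predicts.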
Final Answer / Q.E.D:
(a) The value that minimizes the absolute deviation is the median, $\theta = 4$.
(b) Because the weight of the $n$-th term ($2^n$) strictly exceeds the sum of all preceding weights ($2^n - 2$), the weighted median is locked at the $n$-th ratio. Thus, $\Delta(\theta)$ is minimized at $\theta = z_n = y_n / x_n$.
📌 Q5 SRSWR and Order Statistics (15 Marks)
Problem Statement: Consider a population $\{X_1, X_2, \dots, X_{15}\}$ of size 15, where all $X_i$'s are distinct. A sample of size 5 is drawn using Simple Random Sampling With Replacement (SRSWR).
(a) Show that the sample mean $\bar{x}$ can be written as $\sum_{i=1}^{15} W_i X_i$, where $(W_1, \dots, W_{15})$ follows a multinomial distribution. Derive $\mathbb{E}(\bar{x})$ and $\text{Var}(\bar{x})$ given $\sum X_i = 30$ and $\sum X_i^2 = 75$.
(b) If $\tilde{x}$ and $\tilde{X}$ denote the sample median and population median, find $\mathbb{P}(\tilde{x} = \tilde{X})$.
🧠 Approach & Key Concepts
This question bridges Survey Sampling and Non-parametric order statistics.
- For part (a): In SRSWR, each of the $n$ draws lands independently on one of the $N$ population units, so the vector of selection counts follows a Multinomial Distribution. We use the covariance structure of the multinomial to derive the variance of the linear combination.
- For part (b): We calculate the probability that the 3rd order statistic of a sample of 5 coincides with the 8th order statistic of a population of 15. This reduces to a Binomial computation on the counts of draws below and above the median.
✍️ Step-by-Step Proof / Derivation
Step 1: The Multinomial Representation for Part (a)
Let $C_i$ be the number of times the $i$-th population unit $X_i$ is drawn in the sample of size $n=5$. Since the sampling is with replacement from $N=15$ items, each draw is independent with probability $p_i = 1/15$. Therefore, the vector of counts follows a Multinomial distribution:
$$ (C_1, C_2, \dots, C_{15}) \sim \text{Multinom}\left(n=5; p_1=\frac{1}{15}, \dots, p_{15}=\frac{1}{15}\right) $$
The sample mean $\bar{x}$ is the sum of the drawn values divided by $n=5$. This can be rewritten by grouping identical draws:
$$ \bar{x} = \frac{1}{5} \sum_{i=1}^{15} C_i X_i = \sum_{i=1}^{15} \left(\frac{C_i}{5}\right) X_i $$
By defining $W_i = C_i / 5$ (the sample proportion), we get $\bar{x} = \sum_{i=1}^{15} W_i X_i$. Note that $W_i$ is a scaled multinomial variable.
Step 2: Deriving Expectation and Variance of $\bar{x}$
From the properties of the multinomial distribution:
- $\mathbb{E}[C_i] = n p_i = 5 \times \frac{1}{15} = \frac{1}{3}$
- $\text{Var}(C_i) = n p_i (1 - p_i) = 5 \left(\frac{1}{15}\right) \left(\frac{14}{15}\right) = \frac{14}{45}$
- $\text{Cov}(C_i, C_j) = -n p_i p_j = -5 \left(\frac{1}{15}\right) \left(\frac{1}{15}\right) = -\frac{1}{45}$ (for $i \neq j$)
Using the linearity of expectation:
$$ \mathbb{E}[\bar{x}] = \frac{1}{5} \sum_{i=1}^{15} \mathbb{E}[C_i] X_i = \frac{1}{5} \sum_{i=1}^{15} \left(\frac{1}{3}\right) X_i = \frac{1}{15} \sum_{i=1}^{15} X_i $$
Given $\sum X_i = 30$, $\mathbb{E}[\bar{x}] = 30 / 15 = 2$.
For the variance, we expand the quadratic form $\text{Var}\left(\frac{1}{5} \sum C_i X_i\right)$:
$$ \text{Var}(\bar{x}) = \frac{1}{25} \left[ \sum_{i=1}^{15} \text{Var}(C_i) X_i^2 + \sum_{i \neq j} \text{Cov}(C_i, C_j) X_i X_j \right] $$
We know $(\sum X_i)^2 = \sum X_i^2 + \sum_{i \neq j} X_i X_j$. Plugging in the known sums: $30^2 = 75 + \sum_{i \neq j} X_i X_j$, meaning $\sum_{i \neq j} X_i X_j = 900 - 75 = 825$.
$$ \text{Var}(\bar{x}) = \frac{1}{25} \left[ \frac{14}{45} (75) - \frac{1}{45} (825) \right] = \frac{1}{25} \left[ \frac{1050 - 825}{45} \right] = \frac{1}{25} \left[ \frac{225}{45} \right] = \frac{1}{25} \times 5 = \frac{1}{5} = 0.2 $$
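Both moments can be checked by simulation. The individual population values are not given, only $\sum X_i = 30$ and $\sum X_i^2 = 75$, so the sketch below constructs one admissible population with those sums (an illustrative assumption; any such population gives the same moments):

```python
import random

random.seed(4)

# x_i = 2 + c*(i - 8) with c = sqrt(15/280) gives 15 distinct values
# with sum 30 and sum of squares 75 (illustrative choice).
c = (15 / 280) ** 0.5
pop = [2 + c * (i - 8) for i in range(1, 16)]
assert abs(sum(pop) - 30) < 1e-9 and abs(sum(v * v for v in pop) - 75) < 1e-9

N = 100_000
# SRSWR of size 5: random.choices samples with replacement.
means = [sum(random.choices(pop, k=5)) / 5 for _ in range(N)]
m = sum(means) / N
v = sum((mb - m) ** 2 for mb in means) / N
print(m, v)   # close to E = 2 and Var = 0.2
```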
Step 3: Calculating $P(\tilde{x} = \tilde{X})$ for Part (b)
Since the population has 15 distinct items, the population median $\tilde{X}$ is the 8th order statistic, $X_{(8)}$. Exactly $7$ items are smaller than $\tilde{X}$, and $7$ are larger. On any single draw:
- $p_< = \mathbb{P}(X_i < \tilde{X}) = 7/15$
- $p_= = \mathbb{P}(X_i = \tilde{X}) = 1/15$
- $p_> = \mathbb{P}(X_i > \tilde{X}) = 7/15$
The sample has size $n=5$, so the sample median $\tilde{x}$ is the 3rd order statistic $x_{(3)}$. For the sample median to exactly equal the population median ($x_{(3)} = \tilde{X}$), the number of draws strictly less than $\tilde{X}$ (let's call this $N_1$) must be $\leq 2$, AND the number of draws strictly greater than $\tilde{X}$ (let's call this $N_3$) must be $\leq 2$.
Since $N_1 + N_2 + N_3 = 5$ (where $N_2$ is the number of times $\tilde{X}$ is drawn), the condition $N_3 \leq 2$ is mathematically equivalent to $N_1 + N_2 \geq 3$.
$$ \mathbb{P}(\tilde{x} = \tilde{X}) = \mathbb{P}(N_1 \leq 2 \text{ and } N_1 + N_2 \geq 3) = \mathbb{P}(N_1 \leq 2) - \mathbb{P}(N_1 + N_2 \leq 2), $$
where the last step uses the fact that $N_1 + N_2 \leq 2$ already implies $N_1 \leq 2$.
Step 4: Binomial Computations
Notice that $N_1 \sim \text{Bin}(5, 7/15)$ and $(N_1 + N_2) \sim \text{Bin}(5, 8/15)$. Let $p = 7/15$ and $q = 8/15$. The probability becomes:
$$ \mathbb{P} = \sum_{k=0}^{2} \binom{5}{k} p^k (1-p)^{5-k} - \sum_{k=0}^{2} \binom{5}{k} q^k (1-q)^{5-k} $$
Since $1-p = q$ and $1-q = p$, we group the binomial terms:
$$ \mathbb{P} = \binom{5}{0} [q^5 - p^5] + \binom{5}{1} [pq^4 - qp^4] + \binom{5}{2} [p^2q^3 - q^2p^3] $$
Computing each term exactly:
- $k=0$: $(8/15)^5 - (7/15)^5 = (32768 - 16807) / 15^5 = 15961 / 15^5$
- $k=1$: $5pq(q^3 - p^3) = 5\left(\frac{56}{225}\right) \left(\frac{512 - 343}{3375}\right) = \frac{280 \times 169}{15^5} = \frac{47320}{15^5}$
- $k=2$: $10p^2q^2(q - p) = 10\left(\frac{3136}{50625}\right) \left(\frac{1}{15}\right) = \frac{31360}{15^5}$
Adding them together:
$$ \mathbb{P} = \frac{15961 + 47320 + 31360}{759375} = \frac{94641}{759375} $$
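The arithmetic can be verified exactly in rational arithmetic:

```python
from fractions import Fraction
from math import comb

p = Fraction(7, 15)    # P(single draw < population median)
q = Fraction(8, 15)    # P(single draw <= population median)

def binom_cdf(prob, n, k):
    # P(Bin(n, prob) <= k), exact with Fraction inputs.
    return sum(comb(n, j) * prob**j - 0 if False else comb(n, j) * prob**j * (1 - prob)**(n - j)
               for j in range(k + 1))

# P(N1 <= 2) - P(N1 + N2 <= 2), computed exactly.
ans = binom_cdf(p, 5, 2) - binom_cdf(q, 5, 2)
print(ans, float(ans))
```

The result is $94641/759375 = 31547/253125 \approx 0.1246$, matching the hand computation.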
Final Answer / Q.E.D:
(a) The expected value $\mathbb{E}(\bar{x}) = 2$ and the variance $\text{Var}(\bar{x}) = 0.2$.
(b) The exact probability that the sample median equals the population median is $\frac{94641}{759375}$ (which simplifies to $\frac{31547}{253125} \approx 0.1246$).
📌 Q6 Linear Models and Estimability in Block Designs (15 Marks)
Problem Statement: A researcher studies the effect of 3 diets (Keto, Low-carb, Plant-based) on body fat percentage across 3 age groups using the additive model $y_{ijk} = \mu + \tau_i + \delta_j + \epsilon_{ijk}$.
(a) Are all the parameters of the model estimable? Justify.
(b) Are $\tau_i - \tau_j$ estimable for all $i \neq j$? Justify. If so, find the best linear unbiased estimators (BLUEs) of these contrasts.
(c) Find the F-statistic for testing equality of diet effects, its distribution, and critical region.
🧠 Approach & Key Concepts
This problem analyzes an Unbalanced Two-Way ANOVA without interaction (an incomplete block design).
- For part (a): We evaluate the absolute estimability of individual parameters in an overparameterized linear model.
- For part (b): Estimability of treatment contrasts depends entirely on the connectedness of the design's incidence graph. If there is a path between any two diets through shared age groups, the contrast is estimable.
- For part (c): We construct the standard F-test using the Adjusted Treatment Sum of Squares (obtained via the $C$-matrix from intra-block analysis) and the Error Sum of Squares.
✍️ Step-by-Step Proof / Derivation
Step 1: Assessing Absolute Estimability for Part (a)
The specified model is $y_{ijk} = \mu + \tau_i + \delta_j + \epsilon_{ijk}$. This is a classic overparameterized linear model. Let the design matrix be $\mathbf{X}$. The expected value is $\mathbb{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}$, where $\boldsymbol{\beta} = (\mu, \tau_1, \tau_2, \tau_3, \delta_1, \delta_2, \delta_3)^\top$.
A parameter vector $\mathbf{c}^\top \boldsymbol{\beta}$ is estimable if and only if $\mathbf{c}^\top$ lies in the row space of $\mathbf{X}$. However, the sum of the columns corresponding to the diet effects ($\tau_i$) equals the column for $\mu$, and the sum of the columns for the age effects ($\delta_j$) also equals the column for $\mu$.
Because of these linear dependencies, the design matrix $\mathbf{X}$ is not of full column rank. Therefore, the individual parameters ($\mu, \tau_1, \dots, \delta_3$) are not estimable unless identifiability constraints (such as $\sum \tau_i = 0$ and $\sum \delta_j = 0$) are imposed.
Step 2: Checking Connectedness for Part (b)
A treatment contrast $\tau_i - \tau_j$ is estimable if the design is connected. We define a graph where vertices are diets and age groups, and edges exist if there is an observation in that diet-age cell. Let's trace the connections from the incidence table:
- Diet 1 connects to Age 1.
- Age 1 connects to Diet 2 and Diet 3.
- Diet 2 connects to Age 2.
- Age 2 connects to Diet 3.
- Diet 3 connects to Age 3.
Since we can trace a path from any diet to any other diet (e.g., Diet 1 $\to$ Age 1 $\to$ Diet 2), the design is connected. Therefore, all elementary contrasts $\tau_i - \tau_j$ are estimable.
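The rank and connectedness claims can be checked on a design matrix built from the occupied cells traced above (one observation per cell is an illustrative assumption; replication does not change the rank argument):

```python
import numpy as np

# Rows: one observation per occupied (diet, age) cell traced above.
# Columns: [mu, tau1, tau2, tau3, delta1, delta2, delta3].
cells = [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (2, 2)]
X = np.zeros((len(cells), 7))
for row, (diet, age) in enumerate(cells):
    X[row, 0] = 1           # mu column
    X[row, 1 + diet] = 1    # tau_i column
    X[row, 4 + age] = 1     # delta_j column

# Rank 5 < 7 columns: individual parameters are not estimable, but
# rank = 1 + (3-1) + (3-1) is exactly what a connected design gives,
# so all elementary contrasts tau_i - tau_j are estimable.
print(np.linalg.matrix_rank(X))   # -> 5
```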
Step 3: Deriving the BLUEs of the Contrasts
By the Gauss-Markov Theorem, the BLUEs of estimable functions are obtained by substituting the solutions of the normal equations $\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{Y}$. For an incomplete block design, we use the intra-block equations:
$$ \mathbf{C} \hat{\boldsymbol{\tau}} = \mathbf{Q} $$
where $\mathbf{Q}_i = T_i - \sum_j \frac{n_{ij}}{n_{.j}} B_j$ (adjusted diet totals), and $\mathbf{C} = \text{diag}(n_{i.}) - \mathbf{N} \text{diag}(1/n_{.j}) \mathbf{N}^\top$. Here, $\mathbf{N}$ is the $3 \times 3$ incidence matrix $n_{ij}$.
Solving this reduced system yields the estimates $\hat{\tau}_i$. Since the $\mathbf{C}$ matrix has rank $v-1 = 2$, we impose $\sum \hat{\tau}_i = 0$ to get a unique generalized inverse $\mathbf{C}^-$. The BLUE for the contrast is:
$$ \widehat{\tau_i - \tau_j} = \hat{\tau}_i - \hat{\tau}_j $$
Step 4: Constructing the F-Statistic for Part (c)
We are testing $H_0: \tau_1 = \tau_2 = \tau_3$. The test statistic compares the Adjusted Treatment Sum of Squares ($SST_{adj}$) to the Error Sum of Squares ($SSE$).
$$ SST_{adj} = \hat{\boldsymbol{\tau}}^\top \mathbf{Q} $$
With $N = 50$ total observations, the total degrees of freedom is $N - 1 = 49$. The model has $1\,(\mu) + 2\,(\text{diets}) + 2\,(\text{ages}) = 5$ independent parameters, so the error degrees of freedom is $\nu_E = 50 - 5 = 45$, and the numerator degrees of freedom is $v - 1 = 2$.
$$ F = \frac{SST_{adj} / 2}{SSE / 45} $$
Under the null hypothesis $H_0$, this statistic follows an $F$-distribution with $2$ and $45$ degrees of freedom. The critical region for a test of level $\alpha$ is to reject $H_0$ if:
$$ F > F_{2, 45}(\alpha) $$
Final Answer / Q.E.D:
(a) No, individual parameters are not estimable due to linear dependencies in the design matrix.
(b) Yes, all contrasts $\tau_i - \tau_j$ are estimable because the design is connected. Their BLUEs are $\hat{\tau}_i - \hat{\tau}_j$ derived from the intra-block equations $\mathbf{C}\hat{\boldsymbol{\tau}} = \mathbf{Q}$.
(c) The test statistic is $F = \frac{45 \cdot SST_{adj}}{2 \cdot SSE} \sim F_{2, 45}$. Reject $H_0$ if $F > F_{2, 45}(\alpha)$.
📌 Q7 UMVUE and the Rao-Blackwell Theorem (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ be independent observations from $U(0, \theta)$ and $X_{(n)} = \max\{X_1, \dots, X_n\}$. Let $r$ be a known positive integer.
(a) Find an unbiased estimator of $\theta^r$ based on a single observation.
(b) Find an unbiased estimator of $\theta^r$ based on a statistic that is complete and sufficient for $\theta$.
(c) Show that $\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r$.
🧠 Approach & Key Concepts
This is a classic application of the Lehmann-Scheffé Theorem.
- For part (a): We use the simple raw moments of the uniform distribution.
- For part (b): We find the expectation of a function of the maximum order statistic $X_{(n)}$, which is known to be the complete and sufficient statistic for $\theta$. We then scale it to achieve unbiasedness, yielding the Uniformly Minimum Variance Unbiased Estimator (UMVUE).
- For part (c): Rather than calculating the complex conditional density $f(x_1 | x_{(n)})$, we invoke the Rao-Blackwell Theorem combined with uniqueness to prove the identity elegantly.
✍️ Step-by-Step Proof / Derivation
Step 1: Unbiased Estimator from a Single Observation for Part (a)
For a single observation $X_1 \sim U(0, \theta)$, the probability density function is $f(x) = 1/\theta$ for $0 < x < \theta$. The $r$-th raw moment is:
$$ \mathbb{E}[X_1^r] = \int_{0}^{\theta} x^r \left(\frac{1}{\theta}\right) dx = \frac{1}{\theta} \left[ \frac{x^{r+1}}{r+1} \right]_0^\theta = \frac{\theta^{r+1}}{\theta(r+1)} = \frac{\theta^r}{r+1} $$
To make this an unbiased estimator of $\theta^r$, we multiply both sides by $(r+1)$:
$$ \mathbb{E}[(r+1)X_1^r] = \theta^r $$
Thus, $T_1 = (r+1)X_1^r$ is an unbiased estimator of $\theta^r$.
Step 2: Finding the UMVUE using $X_{(n)}$ for Part (b)
The maximum order statistic $X_{(n)}$ is the complete and sufficient statistic for $\theta$. Its probability density function is $g(y) = \frac{n y^{n-1}}{\theta^n}$ for $0 < y < \theta$. We find the expected value of $X_{(n)}^r$:
$$ \mathbb{E}[X_{(n)}^r] = \int_{0}^{\theta} y^r \left( \frac{n y^{n-1}}{\theta^n} \right) dy = \frac{n}{\theta^n} \int_{0}^{\theta} y^{n+r-1} dy $$
$$ \mathbb{E}[X_{(n)}^r] = \frac{n}{\theta^n} \left[ \frac{y^{n+r}}{n+r} \right]_0^\theta = \frac{n}{\theta^n} \frac{\theta^{n+r}}{n+r} = \frac{n}{n+r} \theta^r $$
To construct an unbiased estimator, we multiply by the reciprocal of the constant term:
$$ \mathbb{E}\left[ \frac{n+r}{n} X_{(n)}^r \right] = \theta^r $$
Since this estimator is a function of the complete sufficient statistic, by the Lehmann-Scheffé theorem, $T_2 = \frac{n+r}{n} X_{(n)}^r$ is the unique UMVUE for $\theta^r$.
Step 3: Elegant Proof via Rao-Blackwell for Part (c)
We are asked to evaluate the conditional expectation $\mathbb{E}[X_1^r | X_{(n)}]$. According to the Rao-Blackwell Theorem, if we condition an unbiased estimator on a sufficient statistic, the result is a new unbiased estimator that is a function of the sufficient statistic.
Let's condition our unbiased estimator from Part (a), $T_1 = (r+1)X_1^r$, on the sufficient statistic $X_{(n)}$:
$$ \phi(X_{(n)}) = \mathbb{E}[(r+1)X_1^r | X_{(n)}] $$
By the properties of conditional expectation, $\mathbb{E}[\phi(X_{(n)})] = \mathbb{E}[(r+1)X_1^r] = \theta^r$. Thus, $\phi(X_{(n)})$ is an unbiased estimator of $\theta^r$ based strictly on the complete sufficient statistic $X_{(n)}$.
Because $X_{(n)}$ is complete, an unbiased estimator of $\theta^r$ that is a function of $X_{(n)}$ is unique up to almost-sure equality (Lehmann-Scheffé uniqueness). Therefore, $\phi(X_{(n)})$ must equal (almost surely) the UMVUE we derived in Part (b):
$$ \mathbb{E}[(r+1)X_1^r | X_{(n)}] = \frac{n+r}{n} X_{(n)}^r $$
By linearity of conditional expectation, we factor out the constant $(r+1)$ and divide:
$$ (r+1)\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n} X_{(n)}^r \implies \mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r $$
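A quick numerical check of this identity follows from the tower property: taking expectations of both sides, $\mathbb{E}[X_1^r]$ and $\mathbb{E}\!\left[\frac{n+r}{n(r+1)} X_{(n)}^r\right]$ must both equal $\theta^r/(r+1)$. The sketch below uses assumed toy values ($\theta = 1$, $n = 4$, $r = 2$):

```python
import random

# Tower-property check (illustrative values only): if
# E[X1^r | X_(n)] = (n+r)/(n(r+1)) * X_(n)^r, then both sides must have the
# same overall expectation, namely theta^r / (r+1) = 1/3 here.
random.seed(1)
theta, n, r, reps = 1.0, 4, 2, 200_000

lhs = rhs = 0.0
for _ in range(reps):
    sample = [random.uniform(0, theta) for _ in range(n)]
    lhs += sample[0] ** r
    rhs += (n + r) / (n * (r + 1)) * max(sample) ** r

lhs_mean, rhs_mean = lhs / reps, rhs / reps
print(lhs_mean, rhs_mean, theta ** r / (r + 1))  # all three ≈ 1/3
```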
Final Answer / Q.E.D:
(a) Unbiased estimator based on $X_1$: $(r+1)X_1^r$
(b) UMVUE based on complete sufficient statistic $X_{(n)}$: $\frac{n+r}{n} X_{(n)}^r$
(c) By Rao-Blackwellization and the uniqueness guaranteed by completeness, the conditional expectation is $\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r$.
📌 Q8 Asymptotic Rank Statistics and Correlations (15 Marks)
Problem Statement: $M$ observations from a continuous distribution $F$ are randomly labeled $A_1, A_2, A_3$ with probabilities $p_1, p_2, p_3$. $R_i$ is the sum of the ranks of observations labeled $A_i$.
(a) Find the limiting value of $R_1 / R_2$ as $M \to \infty$.
(b) Find the correlation coefficient between $R_1$ and $R_2$.
🧠 Approach & Key Concepts
This problem evaluates non-parametric rank sums via Indicator Variables. Since the labeling is random and independent of the continuous values, the rank $k$ of an observation is independent of its label.
- For part (a): We define indicator variables for the labels, sum them up, and apply the Weak Law of Large Numbers (WLLN) to find the asymptotic ratio of their expected values.
- For part (b): We compute the variance and covariance of the rank sums. Because the total number of labels is a multinomial process across ranks, the covariance will be negative. We extract the dominant terms $O(M^3)$ as $M \to \infty$.
✍️ Step-by-Step Proof / Derivation
Step 1: Defining the Indicator Variables for Part (a)
Let $I_k^{(i)}$ be an indicator variable that equals $1$ if the observation with rank $k$ (where $k = 1, \dots, M$) receives label $A_i$, and $0$ otherwise. Since the labeling is completely random and independent of the actual values, the probability is simply $p_i$:
$$ \mathbb{E}[I_k^{(i)}] = \mathbb{P}(I_k^{(i)} = 1) = p_i $$
The sum of the ranks for group $A_i$ is $R_i = \sum_{k=1}^M k I_k^{(i)}$. By linearity of expectation:
$$ \mathbb{E}[R_i] = \sum_{k=1}^M k \mathbb{E}[I_k^{(i)}] = p_i \sum_{k=1}^M k = p_i \frac{M(M+1)}{2} $$
The summands $k I_k^{(i)}$ are independent but not identically distributed, so we apply Chebyshev's inequality rather than the classical WLLN: since $\text{Var}(R_i) = O(M^3)$ while $\mathbb{E}[R_i]^2 = O(M^4)$, we get $R_i / \mathbb{E}[R_i] \to 1$ in probability. Therefore, the ratio converges (in probability) to the ratio of the expectations:
$$ \lim_{M \to \infty} \frac{R_1}{R_2} = \lim_{M \to \infty} \frac{p_1 \frac{M(M+1)}{2}}{p_2 \frac{M(M+1)}{2}} = \frac{p_1}{p_2} $$
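This convergence is easy to observe by simulation. The sketch below uses assumed label probabilities $(p_1, p_2, p_3) = (0.5, 0.3, 0.2)$, so the ratio $R_1/R_2$ should approach $p_1/p_2 = 5/3$ as $M$ grows:

```python
import random

# Illustrative simulation: each rank k = 1..M receives label A1 with
# probability p1, A2 with probability p2 (A3 otherwise), independently of
# the underlying values; R1/R2 should approach p1/p2 for large M.
random.seed(2)
p1, p2 = 0.5, 0.3  # p3 = 0.2 is implicit

def rank_sum_ratio(M):
    r1 = r2 = 0
    for k in range(1, M + 1):
        u = random.random()
        if u < p1:
            r1 += k
        elif u < p1 + p2:
            r2 += k
    return r1 / r2

print(rank_sum_ratio(100), rank_sum_ratio(100_000), p1 / p2)
```

The small-$M$ ratio fluctuates noticeably, while the large-$M$ ratio sits close to $5/3 \approx 1.667$.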
Step 2: Variances and Covariances for Part (b)
Because the labels are assigned independently to each observation, $I_k^{(i)}$ and $I_j^{(i)}$ are independent for $k \neq j$. Therefore, the variance of the sum is the sum of the variances:
$$ \text{Var}(R_i) = \sum_{k=1}^M \text{Var}(k I_k^{(i)}) = \sum_{k=1}^M k^2 \text{Var}(I_k^{(i)}) $$
The variance of the Bernoulli indicator is $p_i(1 - p_i)$. Thus:
$$ \text{Var}(R_i) = p_i(1 - p_i) \sum_{k=1}^M k^2 = p_i(1 - p_i) \frac{M(M+1)(2M+1)}{6} $$
To find the covariance between $R_1$ and $R_2$, we note that $\text{Cov}(I_k^{(1)}, I_j^{(2)}) = 0$ for $k \neq j$. However, for the same rank $k$, the indicators are negatively correlated because an observation cannot simultaneously be labeled $A_1$ and $A_2$:
$$ \text{Cov}(I_k^{(1)}, I_k^{(2)}) = \mathbb{E}[I_k^{(1)} I_k^{(2)}] - \mathbb{E}[I_k^{(1)}]\mathbb{E}[I_k^{(2)}] = 0 - p_1 p_2 = -p_1 p_2 $$
Thus, the covariance of the rank sums is:
$$ \text{Cov}(R_1, R_2) = \sum_{k=1}^M k^2 \text{Cov}(I_k^{(1)}, I_k^{(2)}) = -p_1 p_2 \frac{M(M+1)(2M+1)}{6} $$
Step 3: Calculating the Correlation Coefficient
The correlation coefficient $\rho(R_1, R_2)$ is defined as the covariance divided by the product of the standard deviations:
$$ \rho(R_1, R_2) = \frac{\text{Cov}(R_1, R_2)}{\sqrt{\text{Var}(R_1) \text{Var}(R_2)}} $$
Notice that the common factor $\frac{M(M+1)(2M+1)}{6}$ appears linearly in the covariance and, inside the square root of the denominator, as $\sqrt{\left(\frac{M(M+1)(2M+1)}{6}\right)^2}$. This structural term therefore cancels exactly:
$$ \rho(R_1, R_2) = \frac{-p_1 p_2}{\sqrt{p_1(1-p_1) \cdot p_2(1-p_2)}} $$
Simplifying the denominator:
$$ \rho(R_1, R_2) = -\sqrt{\frac{p_1^2 p_2^2}{p_1 p_2 (1-p_1)(1-p_2)}} = -\sqrt{\frac{p_1 p_2}{(1-p_1)(1-p_2)}} $$
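Since the correlation is free of $M$, even a modest $M$ should reproduce it in simulation. The sketch below uses assumed values $p_1 = 0.4$, $p_2 = 0.35$, $M = 50$ to estimate $\rho(R_1, R_2)$ empirically:

```python
import math
import random

# Monte Carlo check (assumed toy parameters): the empirical correlation of
# the rank sums (R1, R2) should match -sqrt(p1*p2 / ((1-p1)(1-p2))),
# independently of M.
random.seed(3)
p1, p2, M, reps = 0.4, 0.35, 50, 20_000

def rank_sums():
    r1 = r2 = 0
    for k in range(1, M + 1):  # label each rank independently
        u = random.random()
        if u < p1:
            r1 += k
        elif u < p1 + p2:
            r2 += k
    return r1, r2

pairs = [rank_sums() for _ in range(reps)]
m1 = sum(a for a, _ in pairs) / reps
m2 = sum(b for _, b in pairs) / reps
cov = sum((a - m1) * (b - m2) for a, b in pairs) / reps
v1 = sum((a - m1) ** 2 for a, _ in pairs) / reps
v2 = sum((b - m2) ** 2 for _, b in pairs) / reps
rho_hat = cov / math.sqrt(v1 * v2)
rho_theory = -math.sqrt(p1 * p2 / ((1 - p1) * (1 - p2)))
print(rho_hat, rho_theory)
```

For these parameters $\rho_{\text{theory}} \approx -0.599$, and the empirical estimate lands close to it.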
Final Answer / Q.E.D:
(a) The limiting value of $R_1 / R_2$ as $M \to \infty$ is $\frac{p_1}{p_2}$.
(b) The correlation coefficient between $R_1$ and $R_2$ is exactly $-\sqrt{\frac{p_1 p_2}{(1-p_1)(1-p_2)}}$, which is independent of the sample size $M$.
📚 Paper Summary & Key Focus Areas
🎯 Core Concepts Tested in This Paper
- Stochastic Processes (Q1): Utilizing the Spectral Theorem for circulant transition matrices over combinatorial brute-force methods. Understanding how the Periodicity of bipartite graph topologies prevents point-wise convergence to a stationary distribution.
- Asymptotic Theory & Inference (Q2, Q8): Handling boundary conditions in MLE where the parameter space is restricted to integers ($\mathbb{N}$), and proving convergence to a degenerate distribution. Utilizing indicator variables and the Weak Law of Large Numbers (WLLN) to find asymptotic limits and correlations of non-parametric rank sums.
- Mixture Models & Inequalities (Q3, Q4): Breaking down variance using the Law of Total Variance. Applying Popoviciu's Inequality to rigorously prove maximum variance bounds. Proving that $L_1$ loss functions are minimized by medians and extending this to exponentially dominating weighted medians.
- Sampling & Order Statistics (Q5): Translating Simple Random Sampling with Replacement (SRSWR) into a Multinomial Distribution framework to compute variances. Calculating exact alignment probabilities between sample and population medians using Binomial structures.
- Linear Models & Design of Experiments (Q6): Evaluating absolute versus contrast Estimability in overparameterized ANOVA models. Checking graph connectedness in incomplete block designs before deriving BLUEs and F-statistics via intra-block analysis.
- Advanced Point Estimation (Q7): Applying the Lehmann-Scheffé Theorem to find UMVUEs using complete sufficient statistics ($X_{(n)}$). Elegantly bypassing complex conditional integrations by invoking the Rao-Blackwell Theorem.
💡 ISI Examiner Insight:
In the ISI STB paper, mathematical elegance is rewarded just as highly as getting the correct numerical answer. For example:
1. In Q1, using discrete Fourier roots (eigenvalues of circulant matrices) demonstrates a deeper mathematical maturity than setting up a standard binomial expansion.
2. In Q6, verifying the connectedness of the design matrix graph is a mandatory theoretical prerequisite before blindly calculating BLUEs.
3. In Q7, explicitly invoking the Rao-Blackwell Theorem to evaluate a conditional expectation bypasses pages of tedious integration and secures full subjective marks instantly.