📌 Q1 Markov Chains on a Cycle Graph (15 Marks)
Problem Statement: Consider a square with four vertices $A, B, C$, and $D$ arranged clockwise. A particle starts moving from the vertex $A$, and at any stage, it moves to either its clockwise neighbour or its anticlockwise neighbour with respective probabilities $0.6$ and $0.4$. Let $X_n \in \{A, B, C, D\}$ be the position after the $n$-th step, for $n \geq 1$.
(a) Find $\mathbb{P}(X_{12} = A)$.
(b) Find the stationary distribution of the Markov chain. Does the distribution of $X_n$ converge to this stationary distribution as $n \to \infty$? Justify your answer.
🧠 Approach & Key Concepts
This problem models a Random Walk on a Cycle Graph ($C_4$).
- For part (a): While a binomial combinatorial sum is possible, a cleaner approach uses the spectral properties of the transition matrix. Since the state space forms a ring, the transition matrix is a Circulant Matrix, whose eigenvalues follow directly from the fourth roots of unity.
- For part (b): We verify double stochasticity to identify the stationary distribution. For convergence, we must analyze the Periodicity of the chain: the square is a bipartite graph, so the walk alternates between the two vertex classes, precluding pointwise convergence.
✍️ Step-by-Step Proof / Derivation
Step 1: Setting up the Transition Matrix
Let us index the state space $S = \{A, B, C, D\}$ as $\{0, 1, 2, 3\}$. The particle moves clockwise ($+1 \pmod 4$) with probability $p=0.6$, and anticlockwise ($-1 \pmod 4$) with probability $q=0.4$. The 1-step transition matrix $\mathbf{P}$ is:
$$ \mathbf{P} = \begin{pmatrix} 0 & 0.6 & 0 & 0.4 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0.6 & 0 & 0.4 & 0 \end{pmatrix} $$
Step 2: Spectral Decomposition of the Circulant Matrix
Matrix $\mathbf{P}$ is a circulant matrix defined by its first row vector $\mathbf{c} = [0, 0.6, 0, 0.4]$. The eigenvalues $\lambda_k$ of a $4 \times 4$ circulant matrix are given by the Discrete Fourier Transform of $\mathbf{c}$:
$$ \lambda_k = \sum_{j=0}^{3} c_j \omega^{jk}, \quad \text{where } \omega = e^{2\pi i / 4} = i $$
Substituting $c_1 = p = 0.6$ and $c_3 = q = 0.4$, we compute the four eigenvalues:
- $\lambda_0 = p + q = 1$
- $\lambda_1 = p(i) + q(-i) = (0.6 - 0.4)i = 0.2i$
- $\lambda_2 = p(-1) + q(-1) = -1$
- $\lambda_3 = p(-i) + q(i) = -(0.6 - 0.4)i = -0.2i$
Step 3: Calculating $\mathbb{P}(X_{12} = A)$
Because the eigenvectors of a circulant matrix are independent of its entries (they are the columns of the Fourier matrix), the $n$-step transition probability returning to the starting state $P_{00}^{(n)}$ is simply the average of the $n$-th powers of its eigenvalues:
$$ P_{00}^{(n)} = \frac{1}{4} \sum_{k=0}^{3} \lambda_k^n $$
For $n = 12$, we evaluate the 12th power of each eigenvalue:
- $\lambda_0^{12} = 1^{12} = 1$
- $\lambda_1^{12} = (0.2i)^{12} = (0.2)^{12} (i^4)^3 = 0.2^{12} \cdot 1 = 0.2^{12}$
- $\lambda_2^{12} = (-1)^{12} = 1$
- $\lambda_3^{12} = (-0.2i)^{12} = 0.2^{12} (i^4)^3 = 0.2^{12}$
$$ P_{00}^{(12)} = \frac{1}{4} \left( 1 + 0.2^{12} + 1 + 0.2^{12} \right) = \frac{1}{4} \left( 2 + 2(0.2)^{12} \right) = \frac{1 + 0.2^{12}}{2} $$
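This closed form is easy to sanity-check numerically; the sketch below compares the direct 12-step matrix power against the eigenvalue formula (numpy assumed, not part of the original solution):

```python
import numpy as np

# One-step transition matrix of the clockwise/anticlockwise walk on C_4.
P = np.array([
    [0.0, 0.6, 0.0, 0.4],
    [0.4, 0.0, 0.6, 0.0],
    [0.0, 0.4, 0.0, 0.6],
    [0.6, 0.0, 0.4, 0.0],
])

# Direct 12-step matrix power: entry (0, 0) is P(X_12 = A | X_0 = A).
P12 = np.linalg.matrix_power(P, 12)

# Closed form from the circulant eigenvalues: (1 + 0.2^12) / 2.
closed_form = (1 + 0.2 ** 12) / 2
print(P12[0, 0], closed_form)
```

The two values agree to machine precision, confirming the spectral computation.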
Step 4: Finding the Stationary Distribution for Part (b)
A stationary distribution $\boldsymbol{\pi} = (\pi_A, \pi_B, \pi_C, \pi_D)$ must satisfy $\boldsymbol{\pi} \mathbf{P} = \boldsymbol{\pi}$ and $\sum \pi_i = 1$. By inspecting the transition matrix $\mathbf{P}$, we observe that both the row sums and the column sums are exactly $1$.
Because $\mathbf{P}$ is a doubly stochastic matrix over a finite state space, the unique stationary distribution is the uniform distribution:
$$ \boldsymbol{\pi} = \left( \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4} \right) $$
Step 5: Analyzing Convergence and Periodicity
For a discrete-time Markov chain to converge pointwise to its stationary distribution ($\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$), the chain must be irreducible, positive recurrent, and aperiodic.
Let us define the period $d$ of state $A$. The period is the greatest common divisor of all $n$ such that returning to $A$ is possible: $d = \gcd\{n : P_{00}^{(n)} > 0\}$. Because the graph is a square (a bipartite graph), any path that starts at $A$ and returns to $A$ must take an even number of steps.
$$ d = \gcd(2, 4, 6, \dots) = 2 $$
Because the period $d = 2 > 1$, the Markov chain is periodic. In particular, $P_{00}^{(2k+1)} = 0$ for every integer $k \geq 0$, so the sequence $P_{00}^{(n)}$ oscillates and cannot converge to $1/4$.
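The oscillation is visible numerically: odd-step return probabilities are exactly zero, while even-step ones hover near $1/2$, never near the stationary value $1/4$ (a quick numpy check, not part of the original solution):

```python
import numpy as np

P = np.array([
    [0.0, 0.6, 0.0, 0.4],
    [0.4, 0.0, 0.6, 0.0],
    [0.0, 0.4, 0.0, 0.6],
    [0.6, 0.0, 0.4, 0.0],
])

# Return probabilities P_00^(n) for n = 1..8: exactly zero on odd steps,
# near 1/2 on even steps -- never approaching the stationary value 1/4.
returns = [np.linalg.matrix_power(P, n)[0, 0] for n in range(1, 9)]
print(returns)
```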
Final Answer / Q.E.D:
(a) The probability $\mathbb{P}(X_{12} = A) = \frac{1 + 0.2^{12}}{2}$.
(b) The stationary distribution is $\boldsymbol{\pi} = (1/4, 1/4, 1/4, 1/4)$. However, the distribution of $X_n$ does not converge to this stationary distribution because the Markov chain is periodic with period $d=2$ (bipartite state space).
📌 Q2 Discrete MLE and Asymptotic Degeneracy (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ ($n > 1$) be independent $U(0, \theta)$ random variables where $\theta$ is an unknown positive integer.
(a) Find the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$.
(b) Show that $\mathbb{P}(\hat{\theta}_n = \theta)$ converges to 1 as $n \to \infty$.
(c) Find the limiting distribution of $n(\theta - \hat{\theta}_n)$.
🧠 Approach & Key Concepts
This problem introduces a boundary-constrained optimization where the parameter space is restricted to integers ($\theta \in \mathbb{N}$).
- For part (a): The likelihood function is strictly decreasing with respect to $\theta$. Therefore, the MLE is the smallest permissible integer that covers all observed data.
- For part (b): We evaluate the probability of hitting the exact true parameter using the Cumulative Distribution Function (CDF) of the maximum order statistic $X_{(n)}$.
- For part (c): We apply the definitions of convergence in probability and convergence in distribution to show that an estimator with an exact hit probability approaching $1$ forces the scaled error to converge to a degenerate point mass.
✍️ Step-by-Step Proof / Derivation
Step 1: Finding the MLE for Part (a)
The probability density function for a single observation is $f(x_i | \theta) = \frac{1}{\theta} \mathbb{I}(0 < x_i \leq \theta)$. The joint likelihood function for the sample is the product of the marginals:
$$ L(\theta | \mathbf{X}) = \frac{1}{\theta^n} \prod_{i=1}^n \mathbb{I}(0 < X_i \leq \theta) = \frac{1}{\theta^n} \mathbb{I}(X_{(n)} \leq \theta) $$
where $X_{(n)} = \max(X_1, \dots, X_n)$ is the maximum order statistic. To maximize $L(\theta)$, we must minimize $\theta$ subject to two constraints:
- $\theta \geq X_{(n)}$ (to keep the indicator function non-zero).
- $\theta \in \{1, 2, 3, \dots\}$ (since $\theta$ is a positive integer).
The smallest integer that is greater than or equal to $X_{(n)}$ is mathematically defined as the ceiling function. Thus, the MLE is:
$$ \hat{\theta}_n = \lceil X_{(n)} \rceil $$
Step 2: Proving Asymptotic Consistency for Part (b)
We want to find the probability that our estimator exactly matches the true integer parameter $\theta$.
$$ \mathbb{P}(\hat{\theta}_n = \theta) = \mathbb{P}(\lceil X_{(n)} \rceil = \theta) $$
For the ceiling of $X_{(n)}$ to equal the integer $\theta$, the maximum must lie in the interval $(\theta - 1, \theta]$. Therefore:
$$ \mathbb{P}(\lceil X_{(n)} \rceil = \theta) = \mathbb{P}(\theta - 1 < X_{(n)} \leq \theta) $$
Since the distribution of each $X_i$ is supported on $(0, \theta)$, the upper condition $X_{(n)} \leq \theta$ holds with probability 1. Using the complement rule:
$$ \mathbb{P}(\hat{\theta}_n = \theta) = 1 - \mathbb{P}(X_{(n)} \leq \theta - 1) $$
The CDF of the maximum order statistic $X_{(n)}$ for uniform data is $F_{(n)}(x) = \left(\frac{x}{\theta}\right)^n$ for $x \in (0, \theta)$. Substituting $x = \theta - 1$:
$$ \mathbb{P}(\hat{\theta}_n = \theta) = 1 - \left( \frac{\theta - 1}{\theta} \right)^n = 1 - \left( 1 - \frac{1}{\theta} \right)^n $$
Since $\theta \geq 1$, the term $(1 - 1/\theta)$ is strictly less than $1$ (for $\theta > 1$) or exactly $0$ (for $\theta = 1$). In either case, as $n \to \infty$:
$$ \lim_{n \to \infty} \left( 1 - \frac{1}{\theta} \right)^n = 0 \implies \lim_{n \to \infty} \mathbb{P}(\hat{\theta}_n = \theta) = 1 $$
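A short Monte Carlo sketch agrees with the closed form $1 - (1 - 1/\theta)^n$; the values $\theta = 3$ and $n = 5$ below are purely illustrative:

```python
import math
import random

random.seed(0)
theta, n = 3, 5        # illustrative true integer parameter and sample size
trials = 20_000

hits = 0
for _ in range(trials):
    # Draw a U(0, theta) sample of size n and check whether the MLE
    # ceil(max) hits the true integer parameter exactly.
    xmax = max(random.uniform(0, theta) for _ in range(n))
    hits += (math.ceil(xmax) == theta)

exact = 1 - (1 - 1 / theta) ** n   # = 1 - (2/3)^5 here
print(hits / trials, exact)
```

Increasing `n` pushes both values toward 1, as Part (b) asserts.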
Step 3: Finding the Limiting Distribution for Part (c)
Let $Z_n = n(\theta - \hat{\theta}_n)$ be the sequence of scaled error random variables. We analyze the probability mass at exactly zero.
$$ \mathbb{P}(Z_n = 0) = \mathbb{P}\left(n(\theta - \hat{\theta}_n) = 0\right) = \mathbb{P}(\hat{\theta}_n = \theta) $$
From Part (b), we know that $\lim_{n \to \infty} \mathbb{P}(\hat{\theta}_n = \theta) = 1$. Therefore:
$$ \lim_{n \to \infty} \mathbb{P}(Z_n = 0) = 1 $$
Because the probability mass concentrates entirely at the single point $0$ as $n$ approaches infinity, the sequence of random variables $Z_n$ converges in probability to the constant $0$ ($Z_n \xrightarrow{p} 0$).
It is a standard theorem in asymptotic statistics that convergence in probability to a constant implies convergence in distribution to a degenerate random variable at that constant.
Final Answer / Q.E.D:
(a) The MLE is $\hat{\theta}_n = \lceil X_{(n)} \rceil$.
(b) $\mathbb{P}(\hat{\theta}_n = \theta) = 1 - (1 - 1/\theta)^n$, which converges to $1$ as $n \to \infty$.
(c) The limiting distribution of $n(\theta - \hat{\theta}_n)$ is a degenerate distribution at 0 (a point mass/Dirac delta at 0).
📌 Q3 Mixture Distributions and Popoviciu's Inequality (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ and $U$ be independent random variables, where $X_i \sim N(i, \sigma_i^2)$ and $\mathbb{P}(U = i) = \lambda_i$ for $i = 1, 2, \dots, n$ with $\sum_{i=1}^n \lambda_i = 1$. Define $Y = X_i$ if $U = i$. Let $V_1 = \text{Var}(Y)$ and $V_2 = \sum_{i=1}^n \lambda_i \sigma_i^2$.
(a) Which one is bigger $V_1$ or $V_2$? Justify your answer.
(b) Find $(\lambda_1, \lambda_2, \dots, \lambda_n)$ such that $|V_1 - V_2|$ is maximum.
(c) Find the maximum value of $|V_1 - V_2|$.
🧠 Approach & Key Concepts
The variable $Y$ is a Gaussian Mixture Model conditioned on the latent variable $U$.
- For part (a): The variance of a mixture is elegantly decomposed using the Law of Total Variance (EVE's Law). The difference between the total variance and the expected conditional variance is the variance of the conditional means.
- For parts (b & c): We must maximize the variance of the bounded discrete random variable $U \in \{1, \dots, n\}$. By Popoviciu's Inequality on Variances (or a simple extreme-point argument), the variance of a bounded distribution is maximized when the entire probability mass is split evenly between the two endpoints of the support.
✍️ Step-by-Step Proof / Derivation
Step 1: Applying the Law of Total Variance for Part (a)
By definition, $Y$ conditioned on $U = i$ has the distribution of $X_i$, which means $Y | U \sim N(U, \sigma_U^2)$. From this conditional distribution, we extract the conditional mean and conditional variance:
$$ \mathbb{E}[Y | U] = U \quad \text{and} \quad \text{Var}(Y | U) = \sigma_U^2 $$
We use the Law of Total Variance to compute $V_1 = \text{Var}(Y)$:
$$ \text{Var}(Y) = \mathbb{E}[\text{Var}(Y | U)] + \text{Var}(\mathbb{E}[Y | U]) $$
Let us analyze the two components on the right-hand side:
- The expected conditional variance is $\mathbb{E}[\sigma_U^2] = \sum_{i=1}^n \mathbb{P}(U=i) \sigma_i^2 = \sum_{i=1}^n \lambda_i \sigma_i^2$. This is exactly defined as $V_2$.
- The variance of the conditional mean is $\text{Var}(U)$.
Substituting these into the equation yields:
$$ V_1 = V_2 + \text{Var}(U) $$
Since variance is always non-negative ($\text{Var}(U) \geq 0$), it follows that $V_1 \geq V_2$. Therefore, $V_1$ is the larger of the two (with equality only if $U$ is a degenerate constant).
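The decomposition $V_1 = V_2 + \text{Var}(U)$ can be checked by simulation; the small mixture below (means, standard deviations, and weights chosen only for illustration) is a minimal sketch:

```python
import random

random.seed(1)

# Illustrative 3-component Gaussian mixture: means mu_i = i, chosen sds,
# and mixing weights lam_i summing to 1.
means = [1, 2, 3]
sds = [0.5, 1.0, 1.5]
lam = [0.2, 0.5, 0.3]

N = 100_000
ys = []
for _ in range(N):
    u = random.choices([0, 1, 2], weights=lam)[0]   # latent label U
    ys.append(random.gauss(means[u], sds[u]))       # Y | U = u ~ N(mu_u, sd_u^2)

mean_y = sum(ys) / N
v1 = sum((y - mean_y) ** 2 for y in ys) / N                 # empirical Var(Y)
v2 = sum(l * s * s for l, s in zip(lam, sds))               # E[Var(Y | U)]
mean_u = sum(l * m for l, m in zip(lam, means))
var_u = sum(l * (m - mean_u) ** 2 for l, m in zip(lam, means))  # Var(E[Y | U])
print(v1, v2 + var_u)
```

The empirical $V_1$ matches $V_2 + \text{Var}(U)$ up to Monte Carlo error.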
Step 2: Formulating the Optimization Problem for Part (b)
We need to maximize the absolute difference $|V_1 - V_2|$. From Step 1, we know:
$$ |V_1 - V_2| = V_1 - V_2 = \text{Var}(U) $$
Our goal is to find the probability vector $\boldsymbol{\lambda}$ that maximizes $\text{Var}(U)$ subject to the constraint that $U$ is supported on the integers $\{1, 2, \dots, n\}$. Let us expand $\text{Var}(U)$:
$$ \text{Var}(U) = \mathbb{E}[U^2] - (\mathbb{E}[U])^2 = \sum_{i=1}^n \lambda_i i^2 - \left( \sum_{i=1}^n \lambda_i i \right)^2 $$
Step 3: Proving the Extreme Point Mass Maximum
Notice that for any $1 \leq i \leq n$, the product $(i - 1)(n - i) \geq 0$, since both factors are non-negative. Expanding this bound gives:
$$ in - i^2 - n + i \geq 0 \implies i^2 \leq (n+1)i - n $$
Taking the expectation on both sides with respect to $\boldsymbol{\lambda}$:
$$ \mathbb{E}[U^2] \leq (n+1)\mathbb{E}[U] - n $$
Substitute this upper bound back into the variance equation to create a function $g(x)$ where $x = \mathbb{E}[U]$:
$$ \text{Var}(U) \leq (n+1)\mathbb{E}[U] - n - (\mathbb{E}[U])^2 = g(\mathbb{E}[U]) $$
To maximize this bounding quadratic function $g(x) = -x^2 + (n+1)x - n$, we take the derivative with respect to $x$ and set it to zero:
$$ g'(x) = -2x + (n+1) = 0 \implies x = \frac{n+1}{2} $$
Thus, the bounding quadratic is maximized when the expected value sits exactly at the midpoint of the support. For the bound $i^2 \leq (n+1)i - n$ to hold with equality (which is required to attain the maximum variance), the probability mass must vanish wherever the inequality is strict.
Equality holds if and only if $i = 1$ or $i = n$. This forces $\lambda_i = 0$ for all $1 < i < n$.
To satisfy $\mathbb{E}[U] = \frac{n+1}{2}$ with only $\lambda_1$ and $\lambda_n$ active:
$$ 1 \cdot \lambda_1 + n \cdot \lambda_n = \frac{n+1}{2} \quad \text{and} \quad \lambda_1 + \lambda_n = 1 $$
Solving this system yields $\lambda_1 = 1/2$ and $\lambda_n = 1/2$.
Step 4: Evaluating the Maximum Value for Part (c)
We evaluate $\text{Var}(U)$ using our optimal discrete uniform distribution on the extremes:
$$ \mathbb{E}[U] = \frac{n+1}{2} $$
$$ \mathbb{E}[U^2] = 1^2\left(\frac{1}{2}\right) + n^2\left(\frac{1}{2}\right) = \frac{n^2 + 1}{2} $$
$$ \text{Max } \text{Var}(U) = \frac{n^2 + 1}{2} - \left(\frac{n+1}{2}\right)^2 = \frac{2n^2 + 2 - (n^2 + 2n + 1)}{4} = \frac{n^2 - 2n + 1}{4} = \frac{(n-1)^2}{4} $$
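A brute-force check supports the bound: random probability vectors never exceed $(n-1)^2/4$, while the half-half mass on $\{1, n\}$ attains it exactly (illustrative $n = 6$):

```python
import random

random.seed(2)
n = 6
bound = (n - 1) ** 2 / 4          # claimed maximum of Var(U)

def var_u(lam):
    # Variance of a random variable on {1, ..., n} with weights lam.
    m1 = sum(l * (i + 1) for i, l in enumerate(lam))
    m2 = sum(l * (i + 1) ** 2 for i, l in enumerate(lam))
    return m2 - m1 ** 2

# Random probability vectors never beat the bound ...
best = 0.0
for _ in range(10_000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    best = max(best, var_u([x / s for x in w]))

# ... and the half-half mass on {1, n} attains it exactly.
extreme = [0.5] + [0.0] * (n - 2) + [0.5]
print(best, var_u(extreme), bound)
```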
Final Answer / Q.E.D:
(a) $V_1$ is bigger than $V_2$ because $V_1 = V_2 + \text{Var}(U)$, and variance is non-negative.
(b) The probability vector that maximizes the difference is $\lambda_1 = 1/2$, $\lambda_n = 1/2$, and $\lambda_i = 0$ for all $1 < i < n$.
(c) The maximum absolute difference $|V_1 - V_2|$ is exactly $\frac{(n-1)^2}{4}$.
📌 Q4 Loss Functions and Weighted Medians (15 Marks)
Problem Statement:
(a) Consider a frequency distribution of $x$ (values 1 to 10) and frequencies $f$. Find the value of $\theta$ that minimizes $\psi(\theta) = \sum_{i=1}^{10} f_i |x_i - \theta|$.
(b) Consider a dataset containing $n$ bivariate observations $(x_1, y_1), \dots, (x_n, y_n)$. If $x_i = 2^i$ for $i = 1, 2, \dots, n$, show that $\Delta(\theta) = \sum_{i=1}^n |y_i - \theta x_i|$ is minimized at $\theta = y_n / x_n$.
🧠 Approach & Key Concepts
This problem tests the properties of $L_1$ loss functions (Absolute Deviations).
- For part (a): It is a standard statistical theorem that the sum of absolute deviations $\sum f_i |x_i - \theta|$ is minimized when $\theta$ is the median of the frequency distribution.
- For part (b): By factoring out $x_i$, the loss function transforms into a Weighted Median problem. Because the weights grow exponentially ($x_i = 2^i$), the final weight $x_n$ dominates the sum of all preceding weights. This structural dominance strictly anchors the minimum to the final data point.
✍️ Step-by-Step Proof / Derivation
Step 1: Minimizing Absolute Loss for Part (a)
We need to minimize $\psi(\theta) = \sum_{i=1}^{10} f_i |x_i - \theta|$. The minimizer of the sum of absolute deviations is the median of the data. First, we compute the total number of observations, $N$:
$$ N = \sum_{i=1}^{10} f_i = 17 + 35 + 46 + 38 + 25 + 13 + 10 + 9 + 5 + 2 = 200 $$
Since $N = 200$ (an even number), the median is the average of the 100th and 101st ordered observations. Let us calculate the cumulative frequencies to locate these positions:
- $x = 1$: $CF = 17$
- $x = 2$: $CF = 17 + 35 = 52$
- $x = 3$: $CF = 52 + 46 = 98$
- $x = 4$: $CF = 98 + 38 = 136$
Because the cumulative frequency reaches $98$ at $x = 3$ and jumps to $136$ at $x = 4$, both the 100th and 101st ordered observations take the value $4$.
$$ \text{Median} = \frac{4 + 4}{2} = 4 \implies \theta = 4 $$
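Since $\psi$ is piecewise linear with kinks only at the data values, comparing the integer candidates suffices; a short check using the frequencies above:

```python
f = [17, 35, 46, 38, 25, 13, 10, 9, 5, 2]   # frequencies for x = 1, ..., 10
x = list(range(1, 11))

def psi(theta):
    # Weighted sum of absolute deviations from theta.
    return sum(fi * abs(xi - theta) for fi, xi in zip(f, x))

# psi is piecewise linear with kinks only at the data values,
# so the minimizer is one of the integer candidates 1..10.
values = {t: psi(t) for t in x}
print(min(values, key=values.get))   # -> 4
```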
Step 2: Formulating the Weighted Median for Part (b)
We want to minimize $\Delta(\theta) = \sum_{i=1}^n |y_i - \theta x_i|$. Because $x_i = 2^i > 0$, we can factor $x_i$ out of the absolute value:
$$ \Delta(\theta) = \sum_{i=1}^n x_i \left| \frac{y_i}{x_i} - \theta \right| $$
Let $w_i = x_i = 2^i$ act as weights, and let $z_i = y_i / x_i$ act as the data points. The function $\Delta(\theta) = \sum_{i=1}^n w_i |z_i - \theta|$ is precisely minimized when $\theta$ is the weighted median of the sequence $z_i$ with corresponding weights $w_i$.
Step 3: Analyzing the Exponential Weights
The weighted median is the value $z_k$ at which the cumulative weight crosses $50\%$ of the total weight $W$. Let us compute the total weight:
$$ W = \sum_{i=1}^n 2^i = 2^{n+1} - 2 $$
Now, let us examine the weight of the $n$-th observation, $w_n = 2^n$. We compare this single weight to the sum of all other weights combined:
$$ \sum_{i=1}^{n-1} w_i = \sum_{i=1}^{n-1} 2^i = 2^n - 2 $$
Notice the strict inequality:
$$ w_n = 2^n > 2^n - 2 = \sum_{i=1}^{n-1} w_i $$
Step 4: Proving the Minimizer is at $z_n$
Because the single weight $w_n$ is strictly greater than the sum of all other weights, $z_n$ carries more than $50\%$ of the total distribution's mass. Regardless of where $z_n$ falls in the sorted order of $z_i$, any movement of $\theta$ away from $z_n$ will incur a heavier penalty from the $w_n |z_n - \theta|$ term than it could possibly save from the remaining $\sum_{i=1}^{n-1} w_i |z_i - \theta|$ terms.
Therefore, the minimum of the loss function is attained at $\theta = z_n = y_n / x_n$.
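The dominance argument can be verified numerically with arbitrary $y_i$ (the random values below are purely illustrative):

```python
import random

random.seed(3)
n = 8
xs = [2 ** i for i in range(1, n + 1)]             # exponential weights x_i = 2^i
ys = [random.uniform(-10, 10) for _ in range(n)]   # arbitrary illustrative y_i

def delta(theta):
    return sum(abs(y - theta * x) for x, y in zip(xs, ys))

# The minimum of a sum of absolute values occurs at one of the kink
# points theta = y_i / x_i, so comparing those candidates is exact.
cands = [y / x for x, y in zip(xs, ys)]
best = min(cands, key=delta)
print(best, ys[-1] / xs[-1])
```

Whatever the $y_i$, the minimizer lands at $y_n / x_n$, as the weight-dominance argument predicts.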
Final Answer / Q.E.D:
(a) The value that minimizes the absolute deviation is the median, $\theta = 4$.
(b) Because the weight of the $n$-th term ($2^n$) strictly exceeds the sum of all preceding weights ($2^n - 2$), the weighted median is locked at the $n$-th ratio. Thus, $\Delta(\theta)$ is minimized at $\theta = z_n = y_n / x_n$.
📌 Q5 SRSWR and Order Statistics (15 Marks)
Problem Statement: Consider a population $\{X_1, X_2, \dots, X_{15}\}$ of size 15, where all $X_i$'s are distinct. A sample of size 5 is drawn using Simple Random Sampling With Replacement (SRSWR).
(a) Show that the sample mean $\bar{x}$ can be written as $\sum_{i=1}^{15} W_i X_i$, where $(W_1, \dots, W_{15})$ follows a multinomial distribution. Derive $\mathbb{E}(\bar{x})$ and $\text{Var}(\bar{x})$ given $\sum X_i = 30$ and $\sum X_i^2 = 75$.
(b) If $\tilde{x}$ and $\tilde{X}$ denote the sample median and population median, find $\mathbb{P}(\tilde{x} = \tilde{X})$.
🧠 Approach & Key Concepts
This question bridges Survey Sampling and Non-parametric order statistics.
- For part (a): In SRSWR, each of the $n$ draws lands independently on one of the $N$ population units, so the vector of selection counts follows a Multinomial Distribution. We use the covariance structure of the multinomial to derive the variance of the linear combination.
- For part (b): We calculate the probability that the 3rd order statistic of a sample of 5 coincides with the 8th order statistic of a population of 15. This reduces to a Binomial computation on the counts of draws below and above the median.
✍️ Step-by-Step Proof / Derivation
Step 1: The Multinomial Representation for Part (a)
Let $C_i$ be the number of times the $i$-th population unit $X_i$ is drawn in the sample of size $n=5$. Since the sampling is with replacement from $N=15$ items, each draw is independent with probability $p_i = 1/15$. Therefore, the vector of counts follows a Multinomial distribution:
$$ (C_1, C_2, \dots, C_{15}) \sim \text{Multinom}\left(n=5; p_1=\frac{1}{15}, \dots, p_{15}=\frac{1}{15}\right) $$
The sample mean $\bar{x}$ is the sum of the drawn values divided by $n=5$. This can be rewritten by grouping identical draws:
$$ \bar{x} = \frac{1}{5} \sum_{i=1}^{15} C_i X_i = \sum_{i=1}^{15} \left(\frac{C_i}{5}\right) X_i $$
By defining $W_i = C_i / 5$ (the sample proportion), we get $\bar{x} = \sum_{i=1}^{15} W_i X_i$. Note that $W_i$ is a scaled multinomial variable.
Step 2: Deriving Expectation and Variance of $\bar{x}$
From the properties of the multinomial distribution:
- $\mathbb{E}[C_i] = n p_i = 5 \times \frac{1}{15} = \frac{1}{3}$
- $\text{Var}(C_i) = n p_i (1 - p_i) = 5 \left(\frac{1}{15}\right) \left(\frac{14}{15}\right) = \frac{14}{45}$
- $\text{Cov}(C_i, C_j) = -n p_i p_j = -5 \left(\frac{1}{15}\right) \left(\frac{1}{15}\right) = -\frac{1}{45}$ (for $i \neq j$)
Using the linearity of expectation:
$$ \mathbb{E}[\bar{x}] = \frac{1}{5} \sum_{i=1}^{15} \mathbb{E}[C_i] X_i = \frac{1}{5} \sum_{i=1}^{15} \left(\frac{1}{3}\right) X_i = \frac{1}{15} \sum_{i=1}^{15} X_i $$
Given $\sum X_i = 30$, $\mathbb{E}[\bar{x}] = 30 / 15 = 2$.
For the variance, we expand the quadratic form $\text{Var}\left(\frac{1}{5} \sum C_i X_i\right)$:
$$ \text{Var}(\bar{x}) = \frac{1}{25} \left[ \sum_{i=1}^{15} \text{Var}(C_i) X_i^2 + \sum_{i \neq j} \text{Cov}(C_i, C_j) X_i X_j \right] $$
We know $(\sum X_i)^2 = \sum X_i^2 + \sum_{i \neq j} X_i X_j$. Plugging in the known sums: $30^2 = 75 + \sum_{i \neq j} X_i X_j$, meaning $\sum_{i \neq j} X_i X_j = 900 - 75 = 825$.
$$ \text{Var}(\bar{x}) = \frac{1}{25} \left[ \frac{14}{45} (75) - \frac{1}{45} (825) \right] = \frac{1}{25} \left[ \frac{1050 - 825}{45} \right] = \frac{1}{25} \left[ \frac{225}{45} \right] = \frac{1}{25} \times 5 = \frac{1}{5} = 0.2 $$
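Both moments can be checked by simulation. The individual population values are not given, only $\sum X_i = 30$ and $\sum X_i^2 = 75$, so the sketch below constructs one admissible population with those sums (an illustrative assumption; any such population gives the same moments):

```python
import random

random.seed(4)

# x_i = 2 + c*(i - 8) with c = sqrt(15/280) gives 15 distinct values
# with sum 30 and sum of squares 75 (illustrative choice).
c = (15 / 280) ** 0.5
pop = [2 + c * (i - 8) for i in range(1, 16)]
assert abs(sum(pop) - 30) < 1e-9 and abs(sum(v * v for v in pop) - 75) < 1e-9

N = 100_000
# SRSWR of size 5: random.choices samples with replacement.
means = [sum(random.choices(pop, k=5)) / 5 for _ in range(N)]
m = sum(means) / N
v = sum((mb - m) ** 2 for mb in means) / N
print(m, v)   # close to E = 2 and Var = 0.2
```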
Step 3: Calculating $P(\tilde{x} = \tilde{X})$ for Part (b)
Since the population has 15 distinct items, the population median $\tilde{X}$ is the 8th order statistic, $X_{(8)}$. Exactly $7$ items are smaller than $\tilde{X}$, and $7$ are larger. On any single draw:
- $p_< = \mathbb{P}(X_i < \tilde{X}) = 7/15$
- $p_= = \mathbb{P}(X_i = \tilde{X}) = 1/15$
- $p_> = \mathbb{P}(X_i > \tilde{X}) = 7/15$
The sample has size $n=5$, so the sample median $\tilde{x}$ is the 3rd order statistic $x_{(3)}$. For the sample median to exactly equal the population median ($x_{(3)} = \tilde{X}$), the number of draws strictly less than $\tilde{X}$ (let's call this $N_1$) must be $\leq 2$, AND the number of draws strictly greater than $\tilde{X}$ (let's call this $N_3$) must be $\leq 2$.
Since $N_1 + N_2 + N_3 = 5$ (where $N_2$ is the number of times $\tilde{X}$ is drawn), the condition $N_3 \leq 2$ is mathematically equivalent to $N_1 + N_2 \geq 3$.
$$ \mathbb{P}(\tilde{x} = \tilde{X}) = \mathbb{P}(N_1 \leq 2 \text{ and } N_1 + N_2 \geq 3) = \mathbb{P}(N_1 \leq 2) - \mathbb{P}(N_1 + N_2 \leq 2), $$
where the last step uses the fact that $N_1 + N_2 \leq 2$ already implies $N_1 \leq 2$.
Step 4: Binomial Computations
Notice that $N_1 \sim \text{Bin}(5, 7/15)$ and $(N_1 + N_2) \sim \text{Bin}(5, 8/15)$. Let $p = 7/15$ and $q = 8/15$. The probability becomes:
$$ \mathbb{P} = \sum_{k=0}^{2} \binom{5}{k} p^k (1-p)^{5-k} - \sum_{k=0}^{2} \binom{5}{k} q^k (1-q)^{5-k} $$
Since $1-p = q$ and $1-q = p$, we group the binomial terms:
$$ \mathbb{P} = \binom{5}{0} [q^5 - p^5] + \binom{5}{1} [pq^4 - qp^4] + \binom{5}{2} [p^2q^3 - q^2p^3] $$
Computing each term exactly:
- $k=0$: $(8/15)^5 - (7/15)^5 = (32768 - 16807) / 15^5 = 15961 / 15^5$
- $k=1$: $5pq(q^3 - p^3) = 5\left(\frac{56}{225}\right) \left(\frac{512 - 343}{3375}\right) = \frac{280 \times 169}{15^5} = \frac{47320}{15^5}$
- $k=2$: $10p^2q^2(q - p) = 10\left(\frac{3136}{50625}\right) \left(\frac{1}{15}\right) = \frac{31360}{15^5}$
Adding them together:
$$ \mathbb{P} = \frac{15961 + 47320 + 31360}{759375} = \frac{94641}{759375} $$
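The arithmetic can be verified exactly in rational arithmetic:

```python
from fractions import Fraction
from math import comb

p = Fraction(7, 15)    # P(single draw < population median)
q = Fraction(8, 15)    # P(single draw <= population median)

def binom_cdf(prob, n, k):
    # P(Bin(n, prob) <= k), exact with Fraction inputs.
    return sum(comb(n, j) * prob**j - 0 if False else comb(n, j) * prob**j * (1 - prob)**(n - j)
               for j in range(k + 1))

# P(N1 <= 2) - P(N1 + N2 <= 2), computed exactly.
ans = binom_cdf(p, 5, 2) - binom_cdf(q, 5, 2)
print(ans, float(ans))
```

The result is $94641/759375 = 31547/253125 \approx 0.1246$, matching the hand computation.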
Final Answer / Q.E.D:
(a) The expected value $\mathbb{E}(\bar{x}) = 2$ and the variance $\text{Var}(\bar{x}) = 0.2$.
(b) The exact probability that the sample median equals the population median is $\frac{94641}{759375}$ (which simplifies to $\frac{31547}{253125} \approx 0.1246$).
📌 Q6 Linear Models and Estimability in Block Designs (15 Marks)
Problem Statement: A researcher studies the effect of 3 diets (Keto, Low-carb, Plant-based) on body fat percentage across 3 age groups using the additive model $y_{ijk} = \mu + \tau_i + \delta_j + \epsilon_{ijk}$.
(a) Are all the parameters of the model estimable? Justify.
(b) Are $\tau_i - \tau_j$ estimable for all $i \neq j$? Justify. If so, find the best linear unbiased estimators (BLUEs) of these contrasts.
(c) Find the F-statistic for testing equality of diet effects, its distribution, and critical region.
🧠 Approach & Key Concepts
This problem analyzes an Unbalanced Two-Way ANOVA without interaction (an incomplete block design).
- For part (a): We evaluate the absolute estimability of individual parameters in an overparameterized linear model.
- For part (b): Estimability of treatment contrasts depends entirely on the connectedness of the design's incidence graph. If there is a path between any two diets through shared age groups, the contrast is estimable.
- For part (c): We construct the standard F-test using the Adjusted Treatment Sum of Squares (obtained via the $C$-matrix from intra-block analysis) and the Error Sum of Squares.
✍️ Step-by-Step Proof / Derivation
Step 1: Assessing Absolute Estimability for Part (a)
The specified model is $y_{ijk} = \mu + \tau_i + \delta_j + \epsilon_{ijk}$. This is a classic overparameterized linear model. Let the design matrix be $\mathbf{X}$. The expected value is $\mathbb{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}$, where $\boldsymbol{\beta} = (\mu, \tau_1, \tau_2, \tau_3, \delta_1, \delta_2, \delta_3)^\top$.
A parameter vector $\mathbf{c}^\top \boldsymbol{\beta}$ is estimable if and only if $\mathbf{c}^\top$ lies in the row space of $\mathbf{X}$. However, the sum of the columns corresponding to the diet effects ($\tau_i$) equals the column for $\mu$, and the sum of the columns for the age effects ($\delta_j$) also equals the column for $\mu$.
Because of these linear dependencies, the design matrix $\mathbf{X}$ is not of full column rank. Therefore, the individual parameters ($\mu, \tau_1, \dots, \delta_3$) are not estimable unless identifiability constraints (such as $\sum \tau_i = 0$ and $\sum \delta_j = 0$) are imposed.
Step 2: Checking Connectedness for Part (b)
A treatment contrast $\tau_i - \tau_j$ is estimable if the design is connected. We define a graph where vertices are diets and age groups, and edges exist if there is an observation in that diet-age cell. Let's trace the connections from the incidence table:
- Diet 1 connects to Age 1.
- Age 1 connects to Diet 2 and Diet 3.
- Diet 2 connects to Age 2.
- Age 2 connects to Diet 3.
- Diet 3 connects to Age 3.
Since we can trace a path from any diet to any other diet (e.g., Diet 1 $\to$ Age 1 $\to$ Diet 2), the design is connected. Therefore, all elementary contrasts $\tau_i - \tau_j$ are estimable.
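The rank and connectedness claims can be checked on a design matrix built from the occupied cells traced above (one observation per cell is an illustrative assumption; replication does not change the rank argument):

```python
import numpy as np

# Rows: one observation per occupied (diet, age) cell traced above.
# Columns: [mu, tau1, tau2, tau3, delta1, delta2, delta3].
cells = [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (2, 2)]
X = np.zeros((len(cells), 7))
for row, (diet, age) in enumerate(cells):
    X[row, 0] = 1           # mu column
    X[row, 1 + diet] = 1    # tau_i column
    X[row, 4 + age] = 1     # delta_j column

# Rank 5 < 7 columns: individual parameters are not estimable, but
# rank = 1 + (3-1) + (3-1) is exactly what a connected design gives,
# so all elementary contrasts tau_i - tau_j are estimable.
print(np.linalg.matrix_rank(X))   # -> 5
```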
Step 3: Deriving the BLUEs of the Contrasts
By the Gauss-Markov Theorem, the BLUEs of estimable functions are obtained by substituting the solutions of the normal equations $\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{Y}$. For an incomplete block design, we use the intra-block equations:
$$ \mathbf{C} \hat{\boldsymbol{\tau}} = \mathbf{Q} $$
where $\mathbf{Q}_i = T_i - \sum_j \frac{n_{ij}}{n_{.j}} B_j$ (adjusted diet totals), and $\mathbf{C} = \text{diag}(n_{i.}) - \mathbf{N} \text{diag}(1/n_{.j}) \mathbf{N}^\top$. Here, $\mathbf{N}$ is the $3 \times 3$ incidence matrix $n_{ij}$.
Solving this reduced system yields the estimates $\hat{\tau}_i$. Since the $\mathbf{C}$ matrix has rank $v-1 = 2$, we impose $\sum \hat{\tau}_i = 0$ to get a unique generalized inverse $\mathbf{C}^-$. The BLUE for the contrast is:
$$ \widehat{\tau_i - \tau_j} = \hat{\tau}_i - \hat{\tau}_j $$
Step 4: Constructing the F-Statistic for Part (c)
We are testing $H_0: \tau_1 = \tau_2 = \tau_3$. The test statistic compares the Adjusted Treatment Sum of Squares ($SST_{adj}$) to the Error Sum of Squares ($SSE$).
$$ SST_{adj} = \hat{\boldsymbol{\tau}}^\top \mathbf{Q} $$
With $N = 50$ total observations, the total degrees of freedom is $N - 1 = 49$. The model has $1\,(\mu) + 2\,(\text{diets}) + 2\,(\text{ages}) = 5$ independent parameters, so the error degrees of freedom is $\nu_E = 50 - 5 = 45$, and the numerator degrees of freedom is $v - 1 = 2$.
$$ F = \frac{SST_{adj} / 2}{SSE / 45} $$
Under the null hypothesis $H_0$, this statistic follows an $F$-distribution with $2$ and $45$ degrees of freedom. The critical region for a test of level $\alpha$ is to reject $H_0$ if:
$$ F > F_{2, 45}(\alpha) $$
Final Answer / Q.E.D:
(a) No, individual parameters are not estimable due to linear dependencies in the design matrix.
(b) Yes, all contrasts $\tau_i - \tau_j$ are estimable because the design is connected. Their BLUEs are $\hat{\tau}_i - \hat{\tau}_j$ derived from the intra-block equations $\mathbf{C}\hat{\boldsymbol{\tau}} = \mathbf{Q}$.
(c) The test statistic is $F = \frac{45 \cdot SST_{adj}}{2 \cdot SSE} \sim F_{2, 45}$. Reject $H_0$ if $F > F_{2, 45}(\alpha)$.
📌 Q7 UMVUE and the Rao-Blackwell Theorem (15 Marks)
Problem Statement: Let $X_1, X_2, \dots, X_n$ be independent observations from $U(0, \theta)$ and $X_{(n)} = \max\{X_1, \dots, X_n\}$. Let $r$ be a known positive integer.
(a) Find an unbiased estimator of $\theta^r$ based on a single observation.
(b) Find an unbiased estimator of $\theta^r$ based on a statistic that is complete and sufficient for $\theta$.
(c) Show that $\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r$.
🧠 Approach & Key Concepts
This is a classic application of the Lehmann-Scheffé Theorem.
- For part (a): We use the simple raw moments of the uniform distribution.
- For part (b): We find the expectation of a function of the maximum order statistic $X_{(n)}$, which is known to be the complete and sufficient statistic for $\theta$. We then scale it to achieve unbiasedness, yielding the Uniformly Minimum Variance Unbiased Estimator (UMVUE).
- For part (c): Rather than calculating the complex conditional density $f(x_1 | x_{(n)})$, we invoke the Rao-Blackwell Theorem combined with uniqueness to prove the identity elegantly.
✍️ Step-by-Step Proof / Derivation
Step 1: Unbiased Estimator from a Single Observation for Part (a)
For a single observation $X_1 \sim U(0, \theta)$, the probability density function is $f(x) = 1/\theta$ for $0 < x < \theta$. The $r$-th raw moment is:
$$ \mathbb{E}[X_1^r] = \int_{0}^{\theta} x^r \left(\frac{1}{\theta}\right) dx = \frac{1}{\theta} \left[ \frac{x^{r+1}}{r+1} \right]_0^\theta = \frac{\theta^{r+1}}{\theta(r+1)} = \frac{\theta^r}{r+1} $$
To make this an unbiased estimator of $\theta^r$, we multiply both sides by $(r+1)$:
$$ \mathbb{E}[(r+1)X_1^r] = \theta^r $$
Thus, $T_1 = (r+1)X_1^r$ is an unbiased estimator of $\theta^r$.
Step 2: Finding the UMVUE using $X_{(n)}$ for Part (b)
The maximum order statistic $X_{(n)}$ is the complete and sufficient statistic for $\theta$. Its probability density function is $g(y) = \frac{n y^{n-1}}{\theta^n}$ for $0 < y < \theta$. We find the expected value of $X_{(n)}^r$:
$$ \mathbb{E}[X_{(n)}^r] = \int_{0}^{\theta} y^r \left( \frac{n y^{n-1}}{\theta^n} \right) dy = \frac{n}{\theta^n} \int_{0}^{\theta} y^{n+r-1} dy $$
$$ \mathbb{E}[X_{(n)}^r] = \frac{n}{\theta^n} \left[ \frac{y^{n+r}}{n+r} \right]_0^\theta = \frac{n}{\theta^n} \frac{\theta^{n+r}}{n+r} = \frac{n}{n+r} \theta^r $$
To construct an unbiased estimator, we multiply by the reciprocal of the constant term:
$$ \mathbb{E}\left[ \frac{n+r}{n} X_{(n)}^r \right] = \theta^r $$
Since this estimator is a function of the complete sufficient statistic, by the Lehmann-Scheffé theorem, $T_2 = \frac{n+r}{n} X_{(n)}^r$ is the unique UMVUE for $\theta^r$.
Step 3: Elegant Proof via Rao-Blackwell for Part (c)
We are asked to evaluate the conditional expectation $\mathbb{E}[X_1^r | X_{(n)}]$. According to the Rao-Blackwell Theorem, if we condition an unbiased estimator on a sufficient statistic, the result is a new unbiased estimator that is a function of the sufficient statistic.
Let's condition our unbiased estimator from Part (a), $T_1 = (r+1)X_1^r$, on the sufficient statistic $X_{(n)}$:
$$ \phi(X_{(n)}) = \mathbb{E}[(r+1)X_1^r | X_{(n)}] $$
By the properties of conditional expectation, $\mathbb{E}[\phi(X_{(n)})] = \mathbb{E}[(r+1)X_1^r] = \theta^r$. Thus, $\phi(X_{(n)})$ is an unbiased estimator of $\theta^r$ based strictly on the complete sufficient statistic $X_{(n)}$.
Because $X_{(n)}$ is complete, an unbiased estimator of $\theta^r$ that is a function of $X_{(n)}$ is unique up to almost-sure equality (Lehmann-Scheffé uniqueness). Therefore, $\phi(X_{(n)})$ must equal (almost surely) the UMVUE we derived in Part (b):
$$ \mathbb{E}[(r+1)X_1^r | X_{(n)}] = \frac{n+r}{n} X_{(n)}^r $$
By linearity of conditional expectation, we factor out the constant $(r+1)$ and divide:
$$ (r+1)\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n} X_{(n)}^r \implies \mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r $$
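A quick numerical check of this identity follows from the tower property: taking expectations of both sides, $\mathbb{E}[X_1^r]$ and $\mathbb{E}\!\left[\frac{n+r}{n(r+1)} X_{(n)}^r\right]$ must both equal $\theta^r/(r+1)$. The sketch below uses assumed toy values ($\theta = 1$, $n = 4$, $r = 2$):

```python
import random

# Tower-property check (illustrative values only): if
# E[X1^r | X_(n)] = (n+r)/(n(r+1)) * X_(n)^r, then both sides must have the
# same overall expectation, namely theta^r / (r+1) = 1/3 here.
random.seed(1)
theta, n, r, reps = 1.0, 4, 2, 200_000

lhs = rhs = 0.0
for _ in range(reps):
    sample = [random.uniform(0, theta) for _ in range(n)]
    lhs += sample[0] ** r
    rhs += (n + r) / (n * (r + 1)) * max(sample) ** r

lhs_mean, rhs_mean = lhs / reps, rhs / reps
print(lhs_mean, rhs_mean, theta ** r / (r + 1))  # all three ≈ 1/3
```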
Final Answer / Q.E.D:
(a) Unbiased estimator based on $X_1$: $(r+1)X_1^r$
(b) UMVUE based on complete sufficient statistic $X_{(n)}$: $\frac{n+r}{n} X_{(n)}^r$
(c) By Rao-Blackwellization and the uniqueness guaranteed by completeness, the conditional expectation is $\mathbb{E}[X_1^r | X_{(n)}] = \frac{n+r}{n(r+1)} X_{(n)}^r$.
📌 Q8 Asymptotic Rank Statistics and Correlations (15 Marks)
Problem Statement: $M$ observations from a continuous distribution $F$ are randomly labeled $A_1, A_2, A_3$ with probabilities $p_1, p_2, p_3$. $R_i$ is the sum of the ranks of observations labeled $A_i$.
(a) Find the limiting value of $R_1 / R_2$ as $M \to \infty$.
(b) Find the correlation coefficient between $R_1$ and $R_2$.
🧠 Approach & Key Concepts
This problem evaluates non-parametric rank sums via Indicator Variables. Since the labeling is random and independent of the continuous values, the rank $k$ of an observation is independent of its label.
- For part (a): We define indicator variables for the labels, sum them up, and apply the Weak Law of Large Numbers (WLLN) to find the asymptotic ratio of their expected values.
- For part (b): We compute the variance and covariance of the rank sums. Because the total number of labels is a multinomial process across ranks, the covariance will be negative. We extract the dominant terms $O(M^3)$ as $M \to \infty$.
✍️ Step-by-Step Proof / Derivation
Step 1: Defining the Indicator Variables for Part (a)
Let $I_k^{(i)}$ be an indicator variable that equals $1$ if the observation with rank $k$ (where $k = 1, \dots, M$) receives label $A_i$, and $0$ otherwise. Since the labeling is completely random and independent of the actual values, the probability is simply $p_i$:
$$ \mathbb{E}[I_k^{(i)}] = \mathbb{P}(I_k^{(i)} = 1) = p_i $$
The sum of the ranks for group $A_i$ is $R_i = \sum_{k=1}^M k I_k^{(i)}$. By linearity of expectation:
$$ \mathbb{E}[R_i] = \sum_{k=1}^M k \mathbb{E}[I_k^{(i)}] = p_i \sum_{k=1}^M k = p_i \frac{M(M+1)}{2} $$
The summands $k I_k^{(i)}$ are independent but not identically distributed, so we apply Chebyshev's inequality rather than the classical WLLN: since $\text{Var}(R_i) = O(M^3)$ while $\mathbb{E}[R_i]^2 = O(M^4)$, we get $R_i / \mathbb{E}[R_i] \to 1$ in probability. Therefore, the ratio converges (in probability) to the ratio of the expectations:
$$ \lim_{M \to \infty} \frac{R_1}{R_2} = \lim_{M \to \infty} \frac{p_1 \frac{M(M+1)}{2}}{p_2 \frac{M(M+1)}{2}} = \frac{p_1}{p_2} $$
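This convergence is easy to observe by simulation. The sketch below uses assumed label probabilities $(p_1, p_2, p_3) = (0.5, 0.3, 0.2)$, so the ratio $R_1/R_2$ should approach $p_1/p_2 = 5/3$ as $M$ grows:

```python
import random

# Illustrative simulation: each rank k = 1..M receives label A1 with
# probability p1, A2 with probability p2 (A3 otherwise), independently of
# the underlying values; R1/R2 should approach p1/p2 for large M.
random.seed(2)
p1, p2 = 0.5, 0.3  # p3 = 0.2 is implicit

def rank_sum_ratio(M):
    r1 = r2 = 0
    for k in range(1, M + 1):
        u = random.random()
        if u < p1:
            r1 += k
        elif u < p1 + p2:
            r2 += k
    return r1 / r2

print(rank_sum_ratio(100), rank_sum_ratio(100_000), p1 / p2)
```

The small-$M$ ratio fluctuates noticeably, while the large-$M$ ratio sits close to $5/3 \approx 1.667$.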
Step 2: Variances and Covariances for Part (b)
Because the labels are assigned independently to each observation, $I_k^{(i)}$ and $I_j^{(i)}$ are independent for $k \neq j$. Therefore, the variance of the sum is the sum of the variances:
$$ \text{Var}(R_i) = \sum_{k=1}^M \text{Var}(k I_k^{(i)}) = \sum_{k=1}^M k^2 \text{Var}(I_k^{(i)}) $$
The variance of the Bernoulli indicator is $p_i(1 - p_i)$. Thus:
$$ \text{Var}(R_i) = p_i(1 - p_i) \sum_{k=1}^M k^2 = p_i(1 - p_i) \frac{M(M+1)(2M+1)}{6} $$
To find the covariance between $R_1$ and $R_2$, we note that $\text{Cov}(I_k^{(1)}, I_j^{(2)}) = 0$ for $k \neq j$. However, for the same rank $k$, the indicators are negatively correlated because an observation cannot simultaneously be labeled $A_1$ and $A_2$:
$$ \text{Cov}(I_k^{(1)}, I_k^{(2)}) = \mathbb{E}[I_k^{(1)} I_k^{(2)}] - \mathbb{E}[I_k^{(1)}]\mathbb{E}[I_k^{(2)}] = 0 - p_1 p_2 = -p_1 p_2 $$
Thus, the covariance of the rank sums is:
$$ \text{Cov}(R_1, R_2) = \sum_{k=1}^M k^2 \text{Cov}(I_k^{(1)}, I_k^{(2)}) = -p_1 p_2 \frac{M(M+1)(2M+1)}{6} $$
Step 3: Calculating the Correlation Coefficient
The correlation coefficient $\rho(R_1, R_2)$ is defined as the covariance divided by the product of the standard deviations:
$$ \rho(R_1, R_2) = \frac{\text{Cov}(R_1, R_2)}{\sqrt{\text{Var}(R_1) \text{Var}(R_2)}} $$
Notice that the common factor $\frac{M(M+1)(2M+1)}{6}$ appears linearly in the covariance and, inside the square root of the denominator, as $\sqrt{\left(\frac{M(M+1)(2M+1)}{6}\right)^2}$. This structural term therefore cancels exactly:
$$ \rho(R_1, R_2) = \frac{-p_1 p_2}{\sqrt{p_1(1-p_1) \cdot p_2(1-p_2)}} $$
Simplifying the denominator:
$$ \rho(R_1, R_2) = -\sqrt{\frac{p_1^2 p_2^2}{p_1 p_2 (1-p_1)(1-p_2)}} = -\sqrt{\frac{p_1 p_2}{(1-p_1)(1-p_2)}} $$
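Since the correlation is free of $M$, even a modest $M$ should reproduce it in simulation. The sketch below uses assumed values $p_1 = 0.4$, $p_2 = 0.35$, $M = 50$ to estimate $\rho(R_1, R_2)$ empirically:

```python
import math
import random

# Monte Carlo check (assumed toy parameters): the empirical correlation of
# the rank sums (R1, R2) should match -sqrt(p1*p2 / ((1-p1)(1-p2))),
# independently of M.
random.seed(3)
p1, p2, M, reps = 0.4, 0.35, 50, 20_000

def rank_sums():
    r1 = r2 = 0
    for k in range(1, M + 1):  # label each rank independently
        u = random.random()
        if u < p1:
            r1 += k
        elif u < p1 + p2:
            r2 += k
    return r1, r2

pairs = [rank_sums() for _ in range(reps)]
m1 = sum(a for a, _ in pairs) / reps
m2 = sum(b for _, b in pairs) / reps
cov = sum((a - m1) * (b - m2) for a, b in pairs) / reps
v1 = sum((a - m1) ** 2 for a, _ in pairs) / reps
v2 = sum((b - m2) ** 2 for _, b in pairs) / reps
rho_hat = cov / math.sqrt(v1 * v2)
rho_theory = -math.sqrt(p1 * p2 / ((1 - p1) * (1 - p2)))
print(rho_hat, rho_theory)
```

For these parameters $\rho_{\text{theory}} \approx -0.599$, and the empirical estimate lands close to it.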
Final Answer / Q.E.D:
(a) The limiting value of $R_1 / R_2$ as $M \to \infty$ is $\frac{p_1}{p_2}$.
(b) The correlation coefficient between $R_1$ and $R_2$ is exactly $-\sqrt{\frac{p_1 p_2}{(1-p_1)(1-p_2)}}$, which is independent of the sample size $M$.
📚 Paper Summary & Key Focus Areas
🎯 Core Concepts Tested in This Paper
- Stochastic Processes (Q1): Utilizing the Spectral Theorem for circulant transition matrices over combinatorial brute-force methods. Understanding how the Periodicity of bipartite graph topologies prevents point-wise convergence to a stationary distribution.
- Asymptotic Theory & Inference (Q2, Q8): Handling boundary conditions in MLE where the parameter space is restricted to integers ($\mathbb{N}$), and proving convergence to a degenerate distribution. Utilizing indicator variables and the Weak Law of Large Numbers (WLLN) to find asymptotic limits and correlations of non-parametric rank sums.
- Mixture Models & Inequalities (Q3, Q4): Breaking down variance using the Law of Total Variance. Applying Popoviciu's Inequality to rigorously prove maximum variance bounds. Proving that $L_1$ loss functions are minimized by medians and extending this to exponentially dominating weighted medians.
- Sampling & Order Statistics (Q5): Translating Simple Random Sampling with Replacement (SRSWR) into a Multinomial Distribution framework to compute variances. Calculating exact alignment probabilities between sample and population medians using Binomial structures.
- Linear Models & Design of Experiments (Q6): Evaluating absolute versus contrast Estimability in overparameterized ANOVA models. Checking graph connectedness in incomplete block designs before deriving BLUEs and F-statistics via intra-block analysis.
- Advanced Point Estimation (Q7): Applying the Lehmann-Scheffé Theorem to find UMVUEs using complete sufficient statistics ($X_{(n)}$). Elegantly bypassing complex conditional integrations by invoking the Rao-Blackwell Theorem.
💡 ISI Examiner Insight:
In the ISI STB paper, mathematical elegance is rewarded just as highly as getting the correct numerical answer. For example:
1. In Q1, using discrete Fourier roots (eigenvalues of circulant matrices) demonstrates a deeper mathematical maturity than setting up a standard binomial expansion.
2. In Q6, verifying the connectedness of the design matrix graph is a mandatory theoretical prerequisite before blindly calculating BLUEs.
3. In Q7, explicitly invoking the Rao-Blackwell Theorem to evaluate a conditional expectation bypasses pages of tedious integration and secures full subjective marks instantly.