Rigorous mathematical proofs and derivations for the Indian Statistical Institute Entrance Examination.
Model solutions for the ISI STB 2023 paper.
This problem uses the Neyman-Pearson Lemma to find the Most Powerful (MP) test for simple hypotheses. The experimental setup naturally follows a Multinomial Distribution with three categories. We must first define the probabilities for each category in terms of $p$, construct the joint likelihood function, and evaluate the likelihood ratio $\Lambda = L(H_a) / L(H_0)$. The rejection region is defined where this ratio exceeds a critical constant $k$. We then algebraically manipulate the inequality to isolate $n_1$ and $n_2$ to match the requested format.
Step 1: Define Category Probabilities
For any single patient, the three mutually exclusive outcomes occur with probabilities $\pi_1 = p^2$, $\pi_2 = 2p(1-p)$, and $\pi_3 = (1-p)^2$ (any constant factors are immaterial, as they cancel in the likelihood ratio below); let $n_1, n_2, n_3$ denote the corresponding counts.
Note that $n_1 + n_2 + n_3 = n$, which implies $n_3 = n - n_1 - n_2$.
Step 2: Construct the Likelihood Function
The likelihood function for the multinomial distribution (ignoring the combinatorial constant, since it cancels in the ratio) is:
$$L(p) \propto (p^2)^{n_1}\,[2p(1-p)]^{n_2}\,[(1-p)^2]^{n_3}.$$
Combine the base terms for $p$ and $(1-p)$:
$$L(p) \propto p^{2n_1+n_2}\,(1-p)^{n_2+2n_3}.$$
Substitute $n_3 = n - n_1 - n_2$ to express the exponents entirely in terms of $n, n_1,$ and $n_2$:
$$L(p) \propto p^{2n_1+n_2}\,(1-p)^{2n-2n_1-n_2}.$$
Step 3: Form the Likelihood Ratio
By the Neyman-Pearson Lemma, we reject $H_0$ if the likelihood ratio $\Lambda = \frac{L(2/3)}{L(1/2)} > k$. Grouping the bases with matching exponents:
$$\Lambda = \frac{(2/3)^{2n_1+n_2}\,(1/3)^{2n-2n_1-n_2}}{(1/2)^{2n_1+n_2}\,(1/2)^{2n-2n_1-n_2}} = \left(\frac{4}{3}\right)^{2n_1+n_2}\left(\frac{2}{3}\right)^{2n-2n_1-n_2} > k.$$
Step 4: Algebraic Manipulation to Target Format
We need to isolate $n_1$ and $n_2$. Expand the exponents and group by variable:
$$\left(\frac{4}{3}\right)^{2n_1}\left(\frac{4}{3}\right)^{n_2}\left(\frac{2}{3}\right)^{-2n_1}\left(\frac{2}{3}\right)^{-n_2}\left(\frac{2}{3}\right)^{2n} > k.$$
Move the constant term involving $n$ to the right side, setting $k' = k / (2/3)^{2n}$, and combine the bases for $n_1$ and $n_2$:
$$\left[\left(\frac{4}{3}\right)^{2}\left(\frac{2}{3}\right)^{-2}\right]^{n_1}\left[\left(\frac{4}{3}\right)\left(\frac{2}{3}\right)^{-1}\right]^{n_2} > k'.$$
Simplify the inner brackets. For $n_1$: $\left(\frac{4}{3}\right)^{2}\left(\frac{3}{2}\right)^{2} = \frac{16}{9}\cdot\frac{9}{4} = 4$.
For $n_2$: $\frac{4}{3}\cdot\frac{3}{2} = 2$.
Thus, the inequality simplifies to
$$4^{n_1}\,2^{n_2} = 2^{2n_1+n_2} > k' \iff 2n_1 + n_2 > c,$$
where $c = \log_2 k'$ is chosen to achieve the desired size of the test.
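As a numeric sanity check, here is a minimal sketch verifying that the log-likelihood ratio depends on the data only through $2n_1 + n_2$. The category probabilities used below ($p^2$, $2p(1-p)$, $(1-p)^2$) are the illustrative forms assumed above; their constants cancel in the ratio.

```python
import math

# Assumed category probabilities: pi1 = p^2, pi2 = 2p(1-p), pi3 = (1-p)^2.
# The constant factor 2 in pi2 cancels in the likelihood ratio, so we drop it.
def log_lik(p, n1, n2, n):
    n3 = n - n1 - n2
    return (2 * n1 + n2) * math.log(p) + (n2 + 2 * n3) * math.log(1 - p)

def log_ratio(n1, n2, n):
    # log Lambda = log L(2/3) - log L(1/2)
    return log_lik(2 / 3, n1, n2, n) - log_lik(1 / 2, n1, n2, n)

# The derivation gives Lambda = 2^(2*n1 + n2) * (2/3)^(2n), so the log-ratio
# should be an affine function of the statistic T = 2*n1 + n2 alone.
n = 10
for n1 in range(n + 1):
    for n2 in range(n - n1 + 1):
        T = 2 * n1 + n2
        expected = T * math.log(2) + 2 * n * math.log(2 / 3)
        assert abs(log_ratio(n1, n2, n) - expected) < 1e-9
print("rejection region is equivalent to 2*n1 + n2 > c")
```

Since $\Lambda$ is strictly increasing in $2n_1 + n_2$, thresholding $\Lambda$ is equivalent to thresholding $2n_1 + n_2$.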
This problem utilizes the powerful concept of Exchangeability. Even though the variables $X_i$ are non-linear ratios of uniforms, they are perfectly symmetric. Because they sum to $1$ deterministically, the variance of their sum is exactly 0. We can exploit this deterministic sum to set up a linear equation connecting the individual variances and covariances, entirely bypassing the need to compute the unwieldy marginal distributions of the Dirichlet-like ratios.
Step 1: Establishing Exchangeability Properties
Because the original variables $U_1, \dots, U_n$ are i.i.d., the joint distribution of $(X_1, \dots, X_n)$ is invariant under any permutation of indices. This symmetry dictates two crucial facts:
1. Every variable has the same variance: $\text{Var}(X_i) = \sigma^2$ for all $i$.
2. Every distinct pair has the same covariance: $\text{Cov}(X_i, X_j) = c$ for all $i \neq j$.
Step 2: Utilizing the Sum Constraint
By definition, the sum of all $X_i$ variables equals $1$ deterministically:
$$\sum_{i=1}^{n} X_i = 1.$$
Because the sum is a constant, its variance is exactly zero:
$$\text{Var}\!\left(\sum_{i=1}^{n} X_i\right) = 0.$$
Step 3: Expanding the Variance of the Sum
We expand the variance of the sum using the standard formula incorporating covariances:
$$\text{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) + \sum_{i \neq j} \text{Cov}(X_i, X_j).$$
Substitute the symmetric identities $\sigma^2$ and $c$ into the equation. There are $n$ variance terms and $n(n-1)$ ordered covariance pairs:
$$0 = n\sigma^2 + n(n-1)c.$$
Solve for the uniform covariance $c$:
$$c = -\frac{\sigma^2}{n-1}.$$
Step 4: Finding the Correlation Coefficient $\rho$ for Part (a)
The correlation coefficient is defined as $\rho = \frac{\text{Cov}(X_i, X_j)}{\sqrt{\text{Var}(X_i)\text{Var}(X_j)}}$. Since variances are equal, the denominator is just $\sigma^2$:
$$\rho = \frac{c}{\sigma^2} = \frac{-\sigma^2/(n-1)}{\sigma^2} = -\frac{1}{n-1}.$$
Since we must have at least $n \geq 2$ variables to even have pairs ($i \neq j$), the denominator $(n-1)$ is strictly positive. Therefore, $\rho = -\frac{1}{n-1}$ is strictly less than 0.
Step 5: Evaluating the Asymptotic Limit for Part (b)
We simply take the limit of our derived $\rho$ as the sample size $n$ approaches infinity:
$$\lim_{n \to \infty} \rho = \lim_{n \to \infty} \left(-\frac{1}{n-1}\right) = 0.$$
Conceptually, as the pool of variables grows, the constraint that they sum to 1 exerts less pressure on any individual pair, rendering them asymptotically uncorrelated.
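The result is easy to corroborate by simulation. A minimal Monte Carlo sketch, assuming $X_i = U_i / \sum_j U_j$ with $U_j$ i.i.d. Uniform$(0,1)$ and an illustrative $n = 5$, for which the theory gives $\rho = -1/4$:

```python
import random

# Monte Carlo check of rho = -1/(n-1) for X_i = U_i / (U_1 + ... + U_n),
# with U_i i.i.d. Uniform(0,1) (the Dirichlet-like ratios described above).
random.seed(0)
n, reps = 5, 200_000
x1s, x2s = [], []
for _ in range(reps):
    u = [random.random() for _ in range(n)]
    s = sum(u)
    x1s.append(u[0] / s)
    x2s.append(u[1] / s)

def mean(v):
    return sum(v) / len(v)

m1, m2 = mean(x1s), mean(x2s)
cov = mean([(a - m1) * (b - m2) for a, b in zip(x1s, x2s)])
var1 = mean([(a - m1) ** 2 for a in x1s])
var2 = mean([(b - m2) ** 2 for b in x2s])
rho_hat = cov / (var1 * var2) ** 0.5
print(f"estimated rho = {rho_hat:.3f}, theory = {-1/(n-1):.3f}")  # both near -0.25
```

By exchangeability, the choice of the pair $(X_1, X_2)$ is immaterial; any $i \neq j$ gives the same answer.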
This problem models a Random Walk with a Random Stopping Time. We will use Wald's Equation ($E[S] = E[N] \times E[D]$) to find the expected score. The number of rolls $N$ follows a Geometric distribution. After finding the exact expected value, we apply Markov's Inequality to bound the upper-tail probability and thereby bound the median.
Step 1: Analyzing the Dice Difference $D$
There are $6 \times 6 = 36$ possible outcomes when rolling two fair dice. Counting outcomes for each value of $D = |X_1 - X_2|$:
$$P(D=0)=\tfrac{6}{36},\quad P(D=1)=\tfrac{10}{36},\quad P(D=2)=\tfrac{8}{36},\quad P(D=3)=\tfrac{6}{36},\quad P(D=4)=\tfrac{4}{36},\quad P(D=5)=\tfrac{2}{36}.$$
The unconditional expected value of a single roll $D$ is:
$$\mathbb{E}[D] = \frac{0(6) + 1(10) + 2(8) + 3(6) + 4(4) + 5(2)}{36} = \frac{70}{36} = \frac{35}{18}.$$
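The distribution of $D$ and its mean can be checked by exhaustive enumeration of the 36 equally likely outcomes:

```python
from fractions import Fraction

# Exact distribution of D = |X1 - X2| over the 36 equally likely outcomes.
counts = {}
for a in range(1, 7):
    for b in range(1, 7):
        d = abs(a - b)
        counts[d] = counts.get(d, 0) + 1

assert counts == {0: 6, 1: 10, 2: 8, 3: 6, 4: 4, 5: 2}
E_D = sum(Fraction(c, 36) * d for d, c in counts.items())
print(E_D)  # 35/18
```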
Step 2: Defining the Stopping Time and Wald's Equation
The game stops on the very first roll where $D=0$. Therefore, the total number of rolls $N$ (including the final stopping roll) follows a Geometric Distribution with success probability $p = 1/6$.
The expected number of total rolls is:
$$\mathbb{E}[N] = \frac{1}{p} = 6.$$
The total score is the sum of all rolls: $S = \sum_{i=1}^N D_i$. The final roll $D_N$ equals 0, so including it does not change the sum. Because the event $\{N = i\}$ depends only on the first $i$ rolls and not on future rolls, $N$ is a valid stopping time and Wald's Equation applies:
$$\mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[D] = 6 \cdot \frac{35}{18} = \frac{35}{3}.$$
Step 3: Bounding the Median via Markov's Inequality
We are asked to prove that the median $m$ of $S$ cannot exceed 24. Since $S$ is a non-negative random variable (a sum of absolute differences), we can apply Markov's Inequality:
$$P(S \geq a) \leq \frac{\mathbb{E}[S]}{a}.$$
Let's evaluate the probability that the score is 24 or greater:
$$P(S \geq 24) \leq \frac{35/3}{24} = \frac{35}{72}.$$
Notice that $35/72$ is strictly less than $36/72 = 0.5$. Therefore:
$$P(S \geq 24) < \frac{1}{2}.$$
By the definition of the median $m$, the probability of drawing a value greater than or equal to the median must be at least 0.5 ($P(S \geq m) \geq 0.5$). Since the probability mass at and above 24 is strictly less than 0.5, the median must satisfy $m < 24$.
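A quick simulation of the whole game (an illustrative sketch, not part of the formal proof) corroborates both the Wald computation and the median bound:

```python
import random
import statistics

# Simulate the game: roll two dice, add |X1 - X2| to the score, stop at a tie.
random.seed(1)

def play():
    s = 0
    while True:
        d = abs(random.randint(1, 6) - random.randint(1, 6))
        s += d          # the final tie contributes 0, so adding it is harmless
        if d == 0:
            return s

scores = [play() for _ in range(100_000)]
print(f"mean   = {statistics.mean(scores):.2f} (theory 35/3 = {35/3:.2f})")
print(f"median = {statistics.median(scores)} (provably < 24)")
```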
This problem demonstrates the properties of Poisson Thinning and moment derivation via the Law of Total Expectation/Variance.
Step 1: Distribution of the Random Sum for Part (a)
Let $Z = \sum_{i=1}^{Y_1} B_i$. Since the $B_i$ are independent Bernoulli variables, the conditional distribution of $Z$ given $Y_1 = y$ is exactly Binomial:
$$Z \mid Y_1 = y \sim \text{Binomial}(y, p).$$
We find the unconditional distribution of $Z$ using Probability Generating Functions (PGFs). The PGF of a random sum $Z = \sum_{i=1}^N X_i$ is $G_Z(s) = G_N(G_X(s))$. Here $G_{Y_1}(s) = e^{\lambda(s-1)}$ and $G_B(s) = 1 - p + ps$, so:
$$G_Z(s) = e^{\lambda[(1-p+ps)-1]} = e^{p\lambda(s-1)}.$$
This is precisely the PGF of a Poisson distribution. Thus, unconditionally, $Z \sim \text{Poisson}(p\lambda)$.
Step 2: Recombining the Variables for Part (a)
We are given $Y_2 = Z + \epsilon_2$. We know $\epsilon_2 \sim \text{Poisson}((1-p)\lambda)$.
Because $\epsilon_2$ is independent of $Y_1$ (and thus independent of $Z$, since $Z$ is a function of $Y_1$ and independent coin flips), $Y_2$ is the sum of two independent Poisson random variables. The sum of independent Poisson variables is another Poisson variable whose parameter is the sum of the individual parameters:
$$Y_2 = Z + \epsilon_2 \sim \text{Poisson}\big(p\lambda + (1-p)\lambda\big) = \text{Poisson}(\lambda).$$
Step 3: Calculating Covariance for Part (b)
We want to find $\text{corr}(Y_1, Y_2) = \frac{\text{Cov}(Y_1, Y_2)}{\sqrt{\text{Var}(Y_1)\text{Var}(Y_2)}}$. Since both are Poisson($\lambda$), their variances both equal $\lambda$, so the denominator is $\sqrt{\lambda \cdot \lambda} = \lambda$. Let's compute the covariance:
$$\text{Cov}(Y_1, Y_2) = \text{Cov}(Y_1, Z + \epsilon_2) = \text{Cov}(Y_1, Z) + \text{Cov}(Y_1, \epsilon_2).$$
Because $\epsilon_2$ is given as completely independent of $Y_1$, their covariance is 0. We evaluate $\text{Cov}(Y_1, Z)$ using expectations:
$$\text{Cov}(Y_1, Z) = \mathbb{E}[Y_1 Z] - \mathbb{E}[Y_1]\,\mathbb{E}[Z].$$
We use the Law of Total Expectation on the joint term $\mathbb{E}[Y_1 Z]$ by conditioning on $Y_1$:
$$\mathbb{E}[Y_1 Z] = \mathbb{E}\big[\mathbb{E}[Y_1 Z \mid Y_1]\big].$$
Since we are conditioning on $Y_1$, it behaves as a constant and can be pulled out of the inner expectation:
$$\mathbb{E}[Y_1 Z] = \mathbb{E}\big[Y_1\,\mathbb{E}[Z \mid Y_1]\big].$$
As established earlier, $Z \mid Y_1 \sim \text{Binomial}(Y_1, p)$, so its conditional expectation is $pY_1$. Substitute this back:
$$\mathbb{E}[Y_1 Z] = \mathbb{E}[Y_1 \cdot pY_1] = p\,\mathbb{E}[Y_1^2].$$
For a Poisson random variable, the second moment is $\mathbb{E}[Y_1^2] = \text{Var}(Y_1) + (\mathbb{E}[Y_1])^2 = \lambda + \lambda^2$.
Now, substitute this back into the covariance equation, using $\mathbb{E}[Y_1] = \lambda$ and $\mathbb{E}[Z] = p\lambda$:
$$\text{Cov}(Y_1, Z) = p(\lambda + \lambda^2) - \lambda \cdot p\lambda = p\lambda.$$
Step 4: Final Correlation Calculation
Substitute the covariance and variances into the correlation formula:
$$\text{corr}(Y_1, Y_2) = \frac{p\lambda}{\lambda} = p.$$
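Both parts can be corroborated by Monte Carlo, assuming illustrative values $\lambda = 4$ and $p = 0.3$ (the Poisson sampler below is Knuth's multiplication method, an implementation choice, not part of the problem):

```python
import math
import random

# Monte Carlo check: Y2 = (thinned Y1) + eps2 is Poisson(lambda), and
# corr(Y1, Y2) = p. Illustrative parameters only.
random.seed(3)
lam, p, reps = 4.0, 0.3, 100_000

def poisson(l):
    # Knuth's multiplication method (adequate for small lambda)
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

y1s, y2s = [], []
for _ in range(reps):
    y1 = poisson(lam)
    z = sum(1 for _ in range(y1) if random.random() < p)  # Z | Y1 ~ Binomial(Y1, p)
    y1s.append(y1)
    y2s.append(z + poisson((1 - p) * lam))                # add independent eps2

m1, m2 = sum(y1s) / reps, sum(y2s) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(y1s, y2s)) / reps
v1 = sum((a - m1) ** 2 for a in y1s) / reps
v2 = sum((b - m2) ** 2 for b in y2s) / reps
print(f"Y2: mean={m2:.3f}, var={v2:.3f} (Poisson(lam) -> both near {lam})")
print(f"corr(Y1, Y2) = {cov / math.sqrt(v1 * v2):.3f} (theory: p = {p})")
```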
This problem evaluates your mastery of Sufficiency, Completeness, and the Lehmann-Scheffé Theorem.
Step 1: Finding the Complete Sufficient Statistic for Part (a)
The joint probability mass function of the random sample is:
$$f(x_1, \dots, x_n; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\,\lambda^{\sum_i x_i}}{\prod_i x_i!}.$$
We can rewrite this in the standard exponential family format:
$$f(x_1, \dots, x_n; \lambda) = \left(\prod_{i=1}^{n} x_i!\right)^{-1} \exp\!\left\{\ln(\lambda)\sum_{i=1}^{n} x_i - n\lambda\right\}.$$
By the Factorization Theorem, $T = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$. Because the distribution belongs to the full-rank exponential family (the parameter space for $\ln(\lambda)$ contains an open interval in $\mathbb{R}$), $T$ is also a complete sufficient statistic.
Since $\bar{X} = T/n$, it is a 1-to-1 function of $T$, making $\bar{X}$ itself a complete sufficient statistic. We know $\mathbb{E}(\bar{X}) = \lambda$ (it is unbiased). By the Lehmann-Scheffé Theorem, any unbiased estimator that is a function of a complete sufficient statistic is the unique UMVUE. Thus, $\bar{X}$ is the UMVUE for $\lambda$.
Step 2: Proving the Conditional Expectation for Part (b)
We know that the sample variance $S^2$ is an unbiased estimator of the population variance. For a Poisson distribution, the population variance is exactly $\lambda$. Therefore:
$$\mathbb{E}(S^2) = \lambda = \mathbb{E}(\bar{X}).$$
By the Rao-Blackwell Theorem, if we condition an unbiased estimator on a sufficient statistic, we obtain an estimator that is also unbiased and has equal or smaller variance. Let $\phi(\bar{X}) = \mathbb{E}(S^2 \mid \bar{X})$.
Because $\bar{X}$ is a complete sufficient statistic, Lehmann-Scheffé dictates that there is exactly one unbiased estimator that is a function of $\bar{X}$. This unique estimator is the UMVUE.
Since we already established in (a) that $\bar{X}$ is the unique UMVUE, the conditioned function $\phi(\bar{X})$ must equal $\bar{X}$ almost surely:
$$\mathbb{E}(S^2 \mid \bar{X}) = \bar{X}.$$
Step 3: Proving Strict Variance Inequality for Part (c)
By the Law of Total Variance (Conditional Variance Formula):
$$\text{Var}(S^2) = \mathbb{E}\big[\text{Var}(S^2 \mid \bar{X})\big] + \text{Var}\big(\mathbb{E}(S^2 \mid \bar{X})\big).$$
Substitute the identity $\mathbb{E}(S^2 \mid \bar{X}) = \bar{X}$ from part (b) into the second term:
$$\text{Var}(S^2) = \mathbb{E}\big[\text{Var}(S^2 \mid \bar{X})\big] + \text{Var}(\bar{X}).$$
Since variance is non-negative, $\text{Var}(S^2 \mid \bar{X}) \geq 0$. Furthermore, for any sample size $n \geq 2$, $S^2$ is not perfectly determined by $\bar{X}$ (different data sets can share the same mean but have different variances). Therefore, the conditional variance is strictly positive with positive probability, and its expectation is strictly greater than 0:
$$\mathbb{E}\big[\text{Var}(S^2 \mid \bar{X})\big] > 0.$$
Adding this strictly positive term to $\text{Var}(\bar{X})$ yields:
$$\text{Var}(S^2) > \text{Var}(\bar{X}).$$
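A small simulation (with assumed illustrative values $\lambda = 3$, $n = 5$) makes the strict inequality concrete:

```python
import math
import random

# Numeric illustration of part (c): for Poisson samples, Var(S^2) exceeds
# Var(Xbar). Uses the usual unbiased S^2 with divisor n-1.
random.seed(4)
lam, n, reps = 3.0, 5, 100_000

def poisson(l):
    # Knuth's multiplication method (adequate for small lambda)
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

xbars, s2s = [], []
for _ in range(reps):
    xs = [poisson(lam) for _ in range(n)]
    xb = sum(xs) / n
    xbars.append(xb)
    s2s.append(sum((x - xb) ** 2 for x in xs) / (n - 1))

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

print(f"Var(Xbar) ~ {var(xbars):.3f} (theory lam/n = {lam/n:.3f})")
print(f"Var(S^2)  ~ {var(s2s):.3f}  (strictly larger)")
```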
This is a Randomized Block Design (RBD), but it is not balanced because Treatment 1 (A) is replicated more often than the others within each block. We must use Intra-Block Analysis via the C-Matrix.
Step 1: Defining the Design Parameters
Let Treatments $1,2,3,4$ correspond to A, B, C, D. Blocks $b = 4$, plots per block $k = 5$. Replications $r_1 = 8$, and $r_2 = r_3 = r_4 = 4$. The Incidence Matrix $N$ (where entry $n_{ij}$ is the number of times treatment $i$ appears in block $j$) is:
$$N = \begin{pmatrix} 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$
Step 2: Constructing the C-Matrix
The information matrix for treatment effects (adjusted for blocks) is $C = R - \frac{1}{k} N N^T$, where $R = \text{diag}(r_1, r_2, r_3, r_4)$.
First, we compute $N N^T$. Since all columns of $N$ are identical, namely $(2,1,1,1)^T$, the inner product of the $i$-th and $j$-th rows is $4\, n_{i1} n_{j1}$:
$$N N^T = 4\begin{pmatrix} 4 & 2 & 2 & 2 \\ 2 & 1 & 1 & 1 \\ 2 & 1 & 1 & 1 \\ 2 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 16 & 8 & 8 & 8 \\ 8 & 4 & 4 & 4 \\ 8 & 4 & 4 & 4 \\ 8 & 4 & 4 & 4 \end{pmatrix}.$$
Now, calculate $C$ using $k = 5$:
$$C = R - \frac{1}{5} N N^T = \frac{1}{5}\begin{pmatrix} 24 & -8 & -8 & -8 \\ -8 & 16 & -4 & -4 \\ -8 & -4 & 16 & -4 \\ -8 & -4 & -4 & 16 \end{pmatrix}.$$
Step 3: Solving the Normal Equations
The normal equations are $C \hat{\tau} = Q$, where $Q_i = T_i - \frac{1}{k}\sum_{j} n_{ij} B_j$ (adjusted treatment totals). Note that $\sum Q_i = 0$.
Let's write out the first two equations (multiplying both sides by 5 to clear fractions):
$$24\hat{\tau}_1 - 8\hat{\tau}_2 - 8\hat{\tau}_3 - 8\hat{\tau}_4 = 5Q_1 \quad \text{(Eq 1)},$$
$$-8\hat{\tau}_1 + 16\hat{\tau}_2 - 4\hat{\tau}_3 - 4\hat{\tau}_4 = 5Q_2 \quad \text{(Eq 2)}.$$
We apply the standard identifiability constraint $\sum_i \hat{\tau}_i = 0$. This means $-\hat{\tau}_2 - \hat{\tau}_3 - \hat{\tau}_4 = \hat{\tau}_1$. Substitute this into Eq 1:
$$24\hat{\tau}_1 + 8\hat{\tau}_1 = 5Q_1 \implies \hat{\tau}_1 = \frac{5Q_1}{32}.$$
For Eq 2, the constraint gives $-4\hat{\tau}_3 - 4\hat{\tau}_4 = 4\hat{\tau}_1 + 4\hat{\tau}_2$. Substitute this into Eq 2:
$$-8\hat{\tau}_1 + 16\hat{\tau}_2 + 4\hat{\tau}_1 + 4\hat{\tau}_2 = 5Q_2 \implies -4\hat{\tau}_1 + 20\hat{\tau}_2 = 5Q_2.$$
Substitute $\hat{\tau}_1 = 5Q_1/32$ to solve for $\hat{\tau}_2$:
$$\hat{\tau}_2 = \frac{5Q_2 + 4\hat{\tau}_1}{20} = \frac{Q_2}{4} + \frac{Q_1}{32}.$$
Step 4: Formulating the Contrast Estimator
We want the estimator for $\tau_1 - \tau_2$ (by the symmetry of B, C, and D, the formula is identical for $\tau_1 - \tau_3$ and $\tau_1 - \tau_4$):
$$\widehat{\tau_1 - \tau_2} = \frac{5Q_1}{32} - \frac{Q_2}{4} - \frac{Q_1}{32} = \frac{Q_1}{8} - \frac{Q_2}{4}.$$
Step 5: Calculating the Variance of the Estimator
For any contrast $l^T \hat{\tau} = \sum c_i Q_i$, the variance is $\text{Var}(\sum c_i Q_i) = \sigma^2 \sum_{i,j} c_i c_j C_{ij}$.
Using $c_1 = \frac{1}{8}$, $c_2 = -\frac{1}{4}$, and $c_3 = c_4 = 0$, we pull values directly from our $C$-matrix in Step 2:
$$\text{Var}\big(\widehat{\tau_1 - \tau_2}\big) = \sigma^2\left[\left(\tfrac{1}{8}\right)^2\tfrac{24}{5} + 2\left(\tfrac{1}{8}\right)\left(-\tfrac{1}{4}\right)\left(-\tfrac{8}{5}\right) + \left(-\tfrac{1}{4}\right)^2\tfrac{16}{5}\right] = \sigma^2\left[\tfrac{3}{40} + \tfrac{1}{10} + \tfrac{1}{5}\right] = \frac{3}{8}\sigma^2.$$
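The matrix arithmetic is easy to verify exactly with rational arithmetic; the sketch below rebuilds $C = R - \frac{1}{k} N N^T$ from the incidence matrix and evaluates the quadratic form for $c = (\tfrac18, -\tfrac14, 0, 0)$:

```python
from fractions import Fraction as F

# Rebuild the intra-block information matrix C = R - (1/k) N N^T and compute
# the variance multiplier c^T C c for the contrast tau1 - tau2.
k = 5
N = [[2, 2, 2, 2], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]  # 4x4 incidence
R = [8, 4, 4, 4]                                              # replications
NNt = [[sum(N[i][j] * N[m][j] for j in range(4)) for m in range(4)]
       for i in range(4)]
C = [[F(R[i] if i == m else 0) - F(NNt[i][m], k) for m in range(4)]
     for i in range(4)]

c = [F(1, 8), F(-1, 4), F(0), F(0)]
quad = sum(c[i] * c[m] * C[i][m] for i in range(4) for m in range(4))
print(quad)  # 3/8  ->  Var(tau1_hat - tau2_hat) = (3/8) * sigma^2
```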
This proof bridges standard measure-theory concepts, linking $L^1$ convergence to uniform set-wise convergence of integrals (in the spirit of Scheffé's lemma).
Part 1: Forward Direction ($\implies$)
Assume $\mathbb{E}(|X_n - X|) \to 0$ as $n \to \infty$. We must show that $\sup_{A \in \mathscr{A}} \left| \int_A X_n dP - \int_A X dP \right| \to 0$.
For any arbitrary measurable set $A \in \mathscr{A}$, we can bound the difference of the integrals by bringing the absolute value inside:
$$\left| \int_A X_n \, dP - \int_A X \, dP \right| = \left| \int_A (X_n - X) \, dP \right| \leq \int_A |X_n - X| \, dP.$$
Because the integrand $|X_n - X|$ is non-negative, integrating it over a subset $A$ is at most integrating it over the entire sample space $\Omega$:
$$\int_A |X_n - X| \, dP \leq \int_\Omega |X_n - X| \, dP = \mathbb{E}(|X_n - X|).$$
Notice that this upper bound $\mathbb{E}(|X_n - X|)$ is completely independent of the choice of set $A$. Therefore, the supremum over all sets is also bounded by this expectation:
$$\sup_{A \in \mathscr{A}} \left| \int_A X_n \, dP - \int_A X \, dP \right| \leq \mathbb{E}(|X_n - X|).$$
Since we assumed $\mathbb{E}(|X_n - X|) \to 0$, the supremum must also shrink to zero by the Squeeze Theorem, establishing uniform convergence.
Part 2: Reverse Direction ($\Longleftarrow$)
Assume $\int_A X_n dP \to \int_A X dP$ uniformly for all $A \in \mathscr{A}$. We must show that $\mathbb{E}(|X_n - X|) \to 0$.
Let's unpack the expected absolute difference. We partition the sample space $\Omega$ into two disjoint, $n$-dependent measurable sets based on the sign of the difference:
$$A_n = \{\omega : X_n(\omega) \geq X(\omega)\}, \qquad A_n^c = \{\omega : X_n(\omega) < X(\omega)\}.$$
We can now write the $L^1$ norm explicitly without absolute value bars:
$$\mathbb{E}(|X_n - X|) = \int_{A_n} (X_n - X) \, dP - \int_{A_n^c} (X_n - X) \, dP.$$
We are given that $\int_A (X_n - X) \, dP \to 0$ uniformly over sets. By the definition of uniform convergence, for any $\epsilon > 0$, there exists an integer $N$ such that for all $n > N$ and for every possible set $A \in \mathscr{A}$:
$$\left| \int_A (X_n - X) \, dP \right| < \epsilon.$$
Because $A_n$ and $A_n^c$ are valid measurable sets in $\mathscr{A}$ for any fixed $n$, this bound applies to both of them. Hence for any $n > N$:
$$\mathbb{E}(|X_n - X|) \leq \left| \int_{A_n} (X_n - X) \, dP \right| + \left| \int_{A_n^c} (X_n - X) \, dP \right| < 2\epsilon.$$
Since this holds for any arbitrarily small $\epsilon > 0$, we have formally proven that $\mathbb{E}(|X_n - X|) \to 0$.
Because the independent variable $x_i$ is deterministic and sorted ($x_i = i/n$), the threshold $x = 0.5$ partitions the dataset into two disjoint subsets. Importantly, the problem imposes no continuity constraint (such as $f(0.5^-) = f(0.5^+)$), so the joint likelihood function factors into two independent components. We can obtain the MLEs by running two completely separate Ordinary Least Squares (OLS) regressions, and then pooling their residual errors to estimate the global variance $\sigma^2$.
Step 1: Partitioning the Likelihood Function for Part (a)
Let $I_1 = \{i : x_i \leq 0.5\}$ and $I_2 = \{i : x_i > 0.5\}$. Since the $n+1$ observations are independent, the joint likelihood function is the product of individual normal densities:
$$L = \prod_{i=0}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{-\frac{(y_i - f(x_i))^2}{2\sigma^2}\right\}.$$
Given $\gamma_1 = \gamma_2 = 0$, $f(x)$ is just a straight line in each domain. We factor the likelihood based on our subsets:
$$L = (2\pi\sigma^2)^{-\frac{n+1}{2}} \exp\!\left\{-\frac{1}{2\sigma^2}\left[\sum_{i \in I_1} (y_i - \alpha_1 - \beta_1 x_i)^2 + \sum_{i \in I_2} (y_i - \alpha_2 - \beta_2 x_i)^2\right]\right\}.$$
Step 2: Solving for the Regression Coefficients (Part a)
To maximize the likelihood with respect to the mean parameters $(\alpha_1, \beta_1, \alpha_2, \beta_2)$, we must minimize the sum of squared errors in the exponent. Notice that $\alpha_1, \beta_1$ appear only in the first sum, and $\alpha_2, \beta_2$ only in the second; they do not interact.
Thus, the MLEs are exactly the standard OLS estimators computed separately on the two halves of the data:
$$\hat{\beta}_m = \frac{\sum_{i \in I_m} (x_i - \bar{x}_m)(y_i - \bar{y}_m)}{\sum_{i \in I_m} (x_i - \bar{x}_m)^2}, \qquad \hat{\alpha}_m = \bar{y}_m - \hat{\beta}_m \bar{x}_m, \qquad m = 1, 2,$$
where $\bar{x}_m, \bar{y}_m$ denote the means over $I_m$.
Step 3: Solving for Variance $\sigma^2$ (Part a)
Once the mean parameters are optimized, let $RSS_1$ and $RSS_2$ be the minimized Residual Sums of Squares for the two regressions. Differentiating the log-likelihood with respect to $\sigma^2$ and setting it to zero yields the standard MLE for the variance, the pooled RSS divided by the total number of data points $n+1$:
$$\hat{\sigma}^2 = \frac{RSS_1 + RSS_2}{n+1}.$$
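A runnable sketch of the part (a) recipe on simulated data (the true coefficients and noise level below are illustrative assumptions, not from the problem):

```python
import random

# Segmented regression MLE: two separate OLS lines on {x <= 0.5} and {x > 0.5},
# with sigma^2 estimated by pooled RSS / (n+1). True coefficients are invented.
random.seed(5)
n = 100
xs = [i / n for i in range(n + 1)]                 # x_i = i/n, i = 0..n
ys = [(1.0 + 2.0 * x if x <= 0.5 else 3.0 - 1.0 * x) + random.gauss(0, 0.1)
      for x in xs]

def ols(pairs):
    m = len(pairs)
    mx = sum(x for x, _ in pairs) / m
    my = sum(y for _, y in pairs) / m
    beta = (sum((x - mx) * (y - my) for x, y in pairs)
            / sum((x - mx) ** 2 for x, _ in pairs))
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in pairs)
    return alpha, beta, rss

left = [(x, y) for x, y in zip(xs, ys) if x <= 0.5]
right = [(x, y) for x, y in zip(xs, ys) if x > 0.5]
a1, b1, rss1 = ols(left)
a2, b2, rss2 = ols(right)
sigma2_hat = (rss1 + rss2) / (n + 1)    # MLE: pooled RSS over all n+1 points
print(f"left:  alpha={a1:.2f}, beta={b1:.2f}")
print(f"right: alpha={a2:.2f}, beta={b2:.2f}")
print(f"sigma^2 MLE = {sigma2_hat:.4f}")
```

The two fits never share a parameter, which is exactly why the factored likelihood maximizes piecewise.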
Step 4: Extension to Unknown $\gamma$ for Part (b)
If $\gamma_1, \gamma_2$ are unknown, the theoretical model simply upgrades from a segmented linear regression to a segmented quadratic (polynomial) regression.
Because there is still no continuity constraint between the domains $x \leq 0.5$ and $x > 0.5$, the logical procedure remains identical: fit two separate quadratic regressions $y = \alpha_m + \beta_m x + \gamma_m x^2$ by least squares on $I_1$ and $I_2$, and estimate $\hat{\sigma}^2$ as the pooled residual sum of squares divided by $n+1$.