Rigorous, step-by-step descriptive proofs for the Indian Statistical Institute Junior Research Fellowship.
To determine the convergence of a sequence defined by a recurrence relation, we use the Monotone Convergence Theorem, which states that any sequence that is both monotonic and bounded is convergent. The strategy involves:
Step 1: Establishing Positivity and Lower Bound
Given $x_1 = 1$ and $x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right)$. Since $x_1 > 0$, by induction, if $x_n > 0$, then $x_{n+1} > 0$ for all $n \in \mathbb{N}$. Applying the AM-GM Inequality for $n \geq 1$:
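In display form, the AM-GM step reads
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right) \geq \sqrt{x_n \cdot \frac{2}{x_n}} = \sqrt{2}.$$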
This shows that for all $n \geq 2$, the sequence is bounded below by $\sqrt{2}$.
Step 2: Proving Monotonicity
We examine the difference between successive terms $x_{n+1}$ and $x_n$ for $n \geq 2$. Since we established $x_n \geq \sqrt{2}$ for $n \geq 2$, it follows that $x_n^2 \geq 2$.
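Writing the difference explicitly,
$$x_{n+1} - x_n = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right) - x_n = \frac{2 - x_n^2}{2x_n}.$$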
Because $x_n^2 \geq 2$ and $x_n > 0$, the numerator $2 - x_n^2 \leq 0$. Thus, $x_{n+1} \leq x_n$ for all $n \geq 2$, proving the sequence is monotonically decreasing.
Step 3: Applying the Monotone Convergence Theorem
Since the sequence $\{x_n\}$ is monotonically decreasing (for $n \geq 2$) and bounded below by $\sqrt{2}$, by the Monotone Convergence Theorem, the sequence converges to a finite limit $L$.
Step 4: Finding the Limit and Limit Inferior
Taking the limit on both sides of the recurrence relation:
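Since $x_n \to L$ and the map $x \mapsto \frac{1}{2}\left(x + \frac{2}{x}\right)$ is continuous for $x > 0$, the recurrence passes to the limit:
$$L = \frac{1}{2}\left(L + \frac{2}{L}\right) \;\Longrightarrow\; 2L^2 = L^2 + 2 \;\Longrightarrow\; L^2 = 2.$$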
Since $x_n > 0$ for all $n$, we must have $L = \sqrt{2}$. For a convergent sequence, the limit inferior is equal to the limit.
This problem tests the interaction between continuous functions, suprema, and the topological properties of $\mathbb{R}^d$. The standard intended approach relies on a Proof by Contradiction and the Bolzano-Weierstrass Theorem. However, a rigorous analysis reveals a subtle mathematical trap: the set $S = \{\mathbf{x} : \|\mathbf{x} - \mathbf{x}_0\| \geq \delta\}$ is closed but unbounded.
If a function asymptotically approaches its maximum value at infinity, the strict inequality can fail. To secure full marks in an advanced exam like the ISI JRF, we should first demonstrate the logical contradiction under the assumption of a bounded maximizing sequence (making the examiner's implicit compactness assumption explicit), and then provide a rigorous counterexample showing exactly where the claim breaks down in $\mathbb{R}^d$.
Step 1: Setting up the Contradiction
Let $S = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x} - \mathbf{x}_0\| \geq \delta\}$. Because the Euclidean norm is continuous, $S$ is a closed set. Since $f$ has a unique global maximum at $\mathbf{x}_0$, we know $f(\mathbf{x}_0) > f(\mathbf{x})$ for all $\mathbf{x} \in S$. Let $M = f(\mathbf{x}_0)$.
Assume for contradiction that the strict inequality does not hold. Since $f(\mathbf{x}) \leq M$ globally, this forces the supremum over $S$ to equal the maximum:
By the definition of the supremum, there must exist a sequence $\{\mathbf{x}_n\} \subset S$ such that:
Step 2: The Bounded Case (Applying Bolzano-Weierstrass)
Assume the sequence $\{\mathbf{x}_n\}$ is bounded. By the Bolzano-Weierstrass Theorem, it contains a convergent subsequence $\mathbf{x}_{n_k} \to \mathbf{y}$ for some limit point $\mathbf{y} \in \mathbb{R}^d$.
Because $S$ is a closed set, the limit point must lie within the set, so $\mathbf{y} \in S$. This implies $\|\mathbf{y} - \mathbf{x}_0\| \geq \delta$, which ensures $\mathbf{y} \neq \mathbf{x}_0$.
Since $f$ is continuous, the limit of the function values equals the function value of the limit:
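Explicitly, continuity along the convergent subsequence gives
$$f(\mathbf{y}) = \lim_{k \to \infty} f(\mathbf{x}_{n_k}) = M.$$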
We have found a point $\mathbf{y} \neq \mathbf{x}_0$ where $f(\mathbf{y}) = f(\mathbf{x}_0) = M$. This directly contradicts the premise that $\mathbf{x}_0$ is the unique maximum. Thus, whenever the maximizing sequence can be taken to be bounded (in particular, if the domain were compact), the strict inequality $f(\mathbf{x}_0) > \sup_{S} f(\mathbf{x})$ would hold.
Step 3: The Critical Counterexample for the Unbounded Case
The contradiction in Step 2 fails if the sequence $\{\mathbf{x}_n\}$ is unbounded (i.e., $\|\mathbf{x}_n\| \to \infty$). In $\mathbb{R}^d$, the statement as posed in the exam is technically false without a condition that $f(\mathbf{x})$ decays at infinity (coercivity). We construct a counterexample to prove this.
Define $f: \mathbb{R}^d \to \mathbb{R}$ as:
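One admissible choice (the particular formula below is only a representative example; any continuous function whose unique maximum value is approached again asymptotically at infinity would serve) is
$$f(\mathbf{x}) = -\frac{\|\mathbf{x}\|^2}{1 + \|\mathbf{x}\|^4}.$$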
Let us evaluate the properties of this function:
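For this representative choice: $f(\mathbf{0}) = 0$; for every $\mathbf{x} \neq \mathbf{0}$ the ratio $\|\mathbf{x}\|^2/(1 + \|\mathbf{x}\|^4)$ is strictly positive, so $f(\mathbf{x}) < 0$; and $f(\mathbf{x}) \to 0$ as $\|\mathbf{x}\| \to \infty$.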
Thus, $f$ has a strictly unique maximum at $\mathbf{x}_0 = \mathbf{0}$. Now, let $\delta = 1$, making the region $S = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| \geq 1\}$. Let us find the supremum of $f$ on $S$:
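For the representative choice above, $f(\mathbf{x}) < 0$ everywhere on $S$ while $f(\mathbf{x}) \to 0$ as $\|\mathbf{x}\| \to \infty$, so
$$\sup_{\mathbf{x} \in S} f(\mathbf{x}) = \lim_{\|\mathbf{x}\| \to \infty} f(\mathbf{x}) = 0 = f(\mathbf{x}_0),$$
and the supremum is not attained at any point of $S$.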
We now test the proposed inequality $f(\mathbf{x}_0) > \sup_{\mathbf{x} \in S} f(\mathbf{x})$ against our findings:
Here $f(\mathbf{x}_0) = \sup_{\mathbf{x} \in S} f(\mathbf{x})$, so the proposed strict inequality fails to hold.
This problem tests the understanding of quadratic forms and projection matrices. We will use the definition of a Positive Semi-Definite (PSD) matrix ($v^\top A v \geq 0$) combined with the Cauchy-Schwarz Inequality. For the rank, we will utilize the property that for an idempotent matrix, the rank is equal to its trace.
Step 1: Verification of Positive Semi-Definiteness
A symmetric matrix $\mathbf{A}$ is PSD if for any non-zero vector $\mathbf{v} \in \mathbb{R}^d$, the quadratic form is non-negative. We expand $\mathbf{v}^\top \mathbf{A} \mathbf{v}$:
Using inner product notation, where $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$ and $\mathbf{v}^\top \mathbf{u} = \mathbf{u}^\top \mathbf{v}$:
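Writing $\mathbf{A} = \mathbf{I} - \mathbf{u}\mathbf{u}^\top$ (the projection matrix used throughout, with $\mathbf{u}$ a unit vector), the expansion is
$$\mathbf{v}^\top \mathbf{A}\mathbf{v} = \mathbf{v}^\top\left(\mathbf{I} - \mathbf{u}\mathbf{u}^\top\right)\mathbf{v} = \mathbf{v}^\top\mathbf{v} - \left(\mathbf{v}^\top\mathbf{u}\right)\left(\mathbf{u}^\top\mathbf{v}\right) = \|\mathbf{v}\|^2 - \left(\mathbf{u}^\top\mathbf{v}\right)^2.$$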
Step 2: Applying Cauchy-Schwarz Inequality
The Cauchy-Schwarz inequality states that $(\mathbf{u}^\top \mathbf{v})^2 \leq \|\mathbf{u}\|^2 \|\mathbf{v}\|^2$. Since $\mathbf{u}$ is a unit vector, $\|\mathbf{u}\|^2 = 1$. Therefore:
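Combining the two displays,
$$\left(\mathbf{u}^\top\mathbf{v}\right)^2 \leq \|\mathbf{u}\|^2\|\mathbf{v}\|^2 = \|\mathbf{v}\|^2 \quad\Longrightarrow\quad \mathbf{v}^\top\mathbf{A}\mathbf{v} = \|\mathbf{v}\|^2 - \left(\mathbf{u}^\top\mathbf{v}\right)^2 \geq 0.$$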
Thus, $\mathbf{v}^\top \mathbf{A} \mathbf{v} \geq 0$ for all $\mathbf{v}$, meaning $\mathbf{A}$ is positive semi-definite.
Step 3: Finding the Rank of A via Idempotency
We first check if $\mathbf{A}$ is idempotent ($\mathbf{A}^2 = \mathbf{A}$).
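Expanding the square,
$$\mathbf{A}^2 = \left(\mathbf{I} - \mathbf{u}\mathbf{u}^\top\right)^2 = \mathbf{I} - 2\mathbf{u}\mathbf{u}^\top + \mathbf{u}\left(\mathbf{u}^\top\mathbf{u}\right)\mathbf{u}^\top.$$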
Since $\mathbf{u}$ is a unit vector, $\mathbf{u}^\top \mathbf{u} = 1$, reducing the equation to $\mathbf{A}^2 = \mathbf{I} - \mathbf{u}\mathbf{u}^\top = \mathbf{A}$. For idempotent matrices, $\text{rank}(\mathbf{A}) = \text{tr}(\mathbf{A})$.
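Therefore,
$$\text{rank}(\mathbf{A}) = \text{tr}(\mathbf{A}) = \text{tr}(\mathbf{I}_d) - \text{tr}\!\left(\mathbf{u}\mathbf{u}^\top\right) = d - \mathbf{u}^\top\mathbf{u} = d - 1.$$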
This is a conditional probability problem modeled as a graph coloring problem. The four pillars form a Cycle Graph ($C_4$). We need to find the probability of using exactly 4 colors given that it is a proper graph coloring (adjacent nodes have different colors). We will use the Chromatic Polynomial for the sample space and permutations for the favorable events.
Step 1: Define the Sample Space ($S$)
The total number of ways to color a cycle graph $C_n$ with $k$ colors such that adjacent vertices are distinct is given by the chromatic polynomial $P(C_n, k) = (k-1)^n + (-1)^n(k-1)$. For $n=4$ pillars and $k=4$ colors:
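Substituting,
$$|S| = P(C_4, 4) = (4-1)^4 + (-1)^4(4-1) = 81 + 3 = 84.$$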
Step 2: Define the Favorable Event ($E$)
The event is that "all 4 colors are used." This implies every pillar gets a unique color, which is a simple permutation of 4 colors over 4 spots:
We must ensure these 24 permutations satisfy the adjacency condition. Since all colors are mutually distinct, no two adjacent pillars can share a color. Thus, all 24 ways are valid.
Step 3: Calculate the Conditional Probability
Since the colors are chosen uniformly at random, the probability is the ratio of favorable outcomes to the valid sample space:
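Hence
$$\mathbb{P}(\text{all 4 colors used} \mid \text{proper coloring}) = \frac{|E|}{|S|} = \frac{24}{84} = \frac{2}{7}.$$

For readers who want an independent sanity check, the counts can be verified by brute force; the short Python sketch below simply enumerates all $4^4$ colorings (it is an illustration, not part of the required exam argument).

```python
from itertools import product

# Pillars 0..3 form the cycle C_4 with edges (0,1), (1,2), (2,3), (3,0).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Keep only proper colorings: adjacent pillars receive different colors.
proper = [c for c in product(range(4), repeat=4)
          if all(c[i] != c[j] for i, j in edges)]

# Favorable event: all four colors appear.
all_four = [c for c in proper if len(set(c)) == 4]

print(len(proper), len(all_four))    # 84 24
print(len(all_four) / len(proper))   # 0.2857... = 2/7
```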
This problem tests your ability to establish bounds on moments of bounded random variables.
Step 1: Establishing the Quadratic Bound for Part (a)
We are given that $X$ takes values in the interval $[1, 10]$. Therefore, with probability 1:
This inequality implies that $(X - 1) \geq 0$ and $(X - 10) \leq 0$. Multiplying these two terms yields a non-positive result:
Expanding this expression gives:
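That is, with probability 1,
$$(X - 1)(X - 10) \leq 0 \quad\Longleftrightarrow\quad X^2 - 11X + 10 \leq 0.$$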
Step 2: Bounding the Variance and Standard Deviation
Taking the expectation on both sides of the inequality, and using the linearity of expectation, we get:
We are given that $\mathbb{E}(X) = 2.8$. Substituting this value into our inequality:
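That is,
$$\mathbb{E}(X^2) \leq 11\,\mathbb{E}(X) - 10 = 11(2.8) - 10 = 20.8.$$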
Now, we calculate the maximum possible variance of $X$ using the standard variance formula $\text{Var}(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2$:
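Substituting the bounds,
$$\text{Var}(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 \leq 20.8 - (2.8)^2 = 20.8 - 7.84 = 12.96.$$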
The standard deviation $\sigma_X$ is the square root of the variance. Since variance is bounded by $12.96$:
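That is,
$$\sigma_X \leq \sqrt{12.96} = 3.6.$$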
This completes the proof for part (a).
Step 3: Separating the Expectation for Part (b)
We are given two independent and identically distributed (i.i.d.) random variables, $X_1$ and $X_2$, both distributed as $X$. We need to evaluate the expectation of their ratio.
Because $X_1$ and $X_2$ are independent, any function of $X_1$ is independent of any function of $X_2$. Therefore, the expectation of their product can be factored:
Since $X_1$ and $X_2$ are identically distributed, $\mathbb{E}(X_1) = \mathbb{E}(X_2) = \mathbb{E}(X)$. The equation simplifies to:
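In symbols,
$$\mathbb{E}\!\left(\frac{X_1}{X_2}\right) = \mathbb{E}(X_1)\,\mathbb{E}\!\left(\frac{1}{X_2}\right) = \mathbb{E}(X)\,\mathbb{E}\!\left(\frac{1}{X}\right).$$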
Step 4: Applying Jensen's Inequality
Consider the function $g(x) = \frac{1}{x}$. We evaluate its convexity on the support of $X$, which is $[1, 10]$ (hence $x > 0$).
The second derivative of $g(x)$ is $g''(x) = \frac{2}{x^3}$. For all $x \in [1, 10]$, $g''(x) > 0$. Since the second derivative is strictly positive, $g(x) = \frac{1}{x}$ is a strictly convex function on the interval.
By Jensen's Inequality, for any convex function $g$ and random variable $X$:
Applying this to our specific function $g(x) = 1/x$:
Substitute this lower bound back into our factored expectation from Step 3:
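Therefore,
$$\mathbb{E}\!\left(\frac{X_1}{X_2}\right) = \mathbb{E}(X)\,\mathbb{E}\!\left(\frac{1}{X}\right) \geq \mathbb{E}(X)\cdot\frac{1}{\mathbb{E}(X)} = 1.$$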
This problem explores the relationship between conditional independence and marginal independence, as well as the strict properties required for a Bivariate Normal Distribution.
Step 1: Setting up the Conditional Distributions for Part (a)
We are given the latent variable $Z \sim U(0,1)$. Thus, its expectation and variance are:
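For the $U(0,1)$ distribution,
$$\mathbb{E}(Z) = \frac{1}{2}, \qquad \text{Var}(Z) = \frac{1}{12}.$$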
We are also given that conditional on $Z=z$, $X$ and $Y$ are independent and identically distributed (i.i.d.) as $N(z, 1)$. This implies:
Furthermore, because $X$ and $Y$ are conditionally independent given $Z$, their conditional covariance is zero:
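Collected in one line, the conditional moments are
$$\mathbb{E}(X \mid Z) = \mathbb{E}(Y \mid Z) = Z, \qquad \text{Var}(X \mid Z) = \text{Var}(Y \mid Z) = 1, \qquad \text{Cov}(X, Y \mid Z) = 0.$$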
Step 2: Evaluating Marginal Independence via the Law of Total Covariance
To determine if $X$ and $Y$ are independent, we evaluate their marginal covariance. If they are independent, their covariance must be exactly $0$. We apply the Law of Total Covariance:
Substitute the conditional expectations and covariance we established in Step 1:
The covariance of a variable with itself is simply its variance:
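The full chain is
$$\text{Cov}(X, Y) = \mathbb{E}\big[\text{Cov}(X, Y \mid Z)\big] + \text{Cov}\big(\mathbb{E}(X \mid Z), \mathbb{E}(Y \mid Z)\big) = \mathbb{E}(0) + \text{Cov}(Z, Z) = \text{Var}(Z) = \frac{1}{12}.$$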
Since $\text{Cov}(X, Y) = 1/12 \neq 0$, the random variables $X$ and $Y$ are not marginally independent.
Step 3: Testing for Bivariate Normality for Part (b)
If the joint distribution $(X,Y)$ follows a bivariate normal distribution, then a necessary (though not sufficient on its own) condition is that the marginal distributions of $X$ and $Y$ must be univariate normal distributions. Let us find the marginal distribution of $X$ using its Moment Generating Function (MGF), $M_X(t)$.
By the Law of Iterated Expectations, the marginal MGF is the expected value of the conditional MGF:
Conditional on $Z$, $X \sim N(Z, 1)$. The MGF of a Normal distribution $N(\mu, \sigma^2)$ is $e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. Substituting $\mu = Z$ and $\sigma^2 = 1$:
Now, we take the expectation over the uniform distribution of $Z$:
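Chaining these steps,
$$M_X(t) = \mathbb{E}\!\left[\mathbb{E}\!\left(e^{tX} \mid Z\right)\right] = \mathbb{E}\!\left[e^{Zt + \frac{1}{2}t^2}\right] = e^{\frac{1}{2}t^2}\,\mathbb{E}\!\left[e^{tZ}\right] = e^{\frac{1}{2}t^2}\, M_Z(t).$$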
Step 4: Evaluating the MGF of Z
Since $Z \sim U(0,1)$, its MGF is given by the integral:
Substituting this back into the marginal MGF for $X$ gives:
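Explicitly, for $t \neq 0$,
$$M_Z(t) = \int_0^1 e^{tz}\,dz = \frac{e^t - 1}{t}, \qquad\text{so}\qquad M_X(t) = e^{\frac{1}{2}t^2}\cdot\frac{e^t - 1}{t}.$$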
The MGF of a normal distribution must be strictly of the form $\exp(\mu t + \frac{1}{2}\sigma^2 t^2)$. Because of the additional $\frac{e^t - 1}{t}$ term, $M_X(t)$ is clearly not the MGF of a normal distribution. Therefore, the marginal distribution of $X$ is not normal (it is a continuous mixture of normals).
This problem assesses your mastery of statistical inference, specifically data reduction techniques using Sufficient and Ancillary statistics.
Step 1: Constructing the Joint Likelihood Function for Part (a)
We are given a random sample $(X_1, Y_1), \dots, (X_n, Y_n)$ from a bivariate normal distribution with means $\mu_X = \mu_Y = 0$, variances $\sigma_X^2 = \sigma_Y^2 = 1$, and correlation $\rho = \theta$. The joint density (likelihood function) $L(\theta)$ is the product of the $n$ bivariate densities:
Simplifying this by bringing the product into the exponent as a sum yields:
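In collected form,
$$L(\theta) = \left(2\pi\sqrt{1-\theta^2}\right)^{-n} \exp\!\left\{-\frac{1}{2(1-\theta^2)}\sum_{i=1}^n \left(x_i^2 + y_i^2\right) + \frac{\theta}{1-\theta^2}\sum_{i=1}^n x_i y_i\right\}.$$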
Step 2: Applying the Lehmann-Scheffé Theorem
Let $(\mathbf{x}, \mathbf{y})$ and $(\mathbf{x}', \mathbf{y}')$ be two distinct sample points. We construct the likelihood ratio:
For this ratio to be independent of the parameter $\theta$ for all $\theta \in (-1, 1)$, the statistics multiplying the functionally independent parameter functions $-\frac{1}{2(1-\theta^2)}$ and $\frac{\theta}{1-\theta^2}$ must agree at the two sample points. This requires:
$$\sum_{i=1}^n \left(x_i^2 + y_i^2\right) = \sum_{i=1}^n \left(x_i'^2 + y_i'^2\right) \qquad\text{and}\qquad \sum_{i=1}^n x_i y_i = \sum_{i=1}^n x_i' y_i'.$$
By the Lehmann-Scheffé theorem, the minimal sufficient statistic is the vector $T$ that defines this equivalence class:
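Namely,
$$T(\mathbf{X}, \mathbf{Y}) = \left(\sum_{i=1}^n \left(X_i^2 + Y_i^2\right),\; \sum_{i=1}^n X_i Y_i\right).$$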
Step 3: Evaluating Ancillarity of $Q_1$ for Part (b)
A statistic is ancillary if its distribution does not depend on $\theta$. Let's examine $Q_1 = \sum_{i=1}^n X_i^2$.
From the joint bivariate normal density $f_\theta(x, y)$, integrating out $y$ gives the marginal distribution of $X$. Since $\mu_X = 0$ and $\sigma_X^2 = 1$, the marginal distribution is standard normal:
Because the $X_i$ are independent standard normal variables, the sum of their squares follows a Chi-square distribution with $n$ degrees of freedom:
Since the $\chi^2_{(n)}$ distribution has no dependence on $\theta$, we conclude that $Q_1$ is strictly ancillary for $\theta$.
Step 4: Evaluating Ancillarity of the Joint Statistic $Q = (Q_1, Q_2)$
By symmetry, the marginal distribution of $Y_i$ is also $N(0,1)$, meaning $Q_2 = \sum_{i=1}^n Y_i^2 \sim \chi^2_{(n)}$. Thus, $Q_2$ is individually ancillary. However, for the joint vector $Q = (Q_1, Q_2)$ to be ancillary, its joint distribution must be independent of $\theta$.
To check this, let us evaluate the covariance between $Q_1$ and $Q_2$:
We compute $\text{Cov}(X_i^2, Y_i^2) = \mathbb{E}[X_i^2 Y_i^2] - \mathbb{E}[X_i^2]\mathbb{E}[Y_i^2]$. Since $X_i, Y_i \sim N(0,1)$, we know $\mathbb{E}[X_i^2] = 1$ and $\mathbb{E}[Y_i^2] = 1$.
Using the properties of conditional expectation for bivariate normals, $Y_i | X_i \sim N(\theta X_i, 1-\theta^2)$:
Recall that the 4th moment of a standard normal is $3$. Thus:
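Carrying the conditional expectation through,
$$\mathbb{E}\!\left[X_i^2 Y_i^2\right] = \mathbb{E}\!\left[X_i^2\,\mathbb{E}\!\left(Y_i^2 \mid X_i\right)\right] = \mathbb{E}\!\left[X_i^2\left(\theta^2 X_i^2 + 1 - \theta^2\right)\right] = 3\theta^2 + \left(1 - \theta^2\right) = 1 + 2\theta^2.$$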
Returning to the covariance:
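Summing over the $n$ independent pairs,
$$\text{Cov}\!\left(X_i^2, Y_i^2\right) = \left(1 + 2\theta^2\right) - 1 = 2\theta^2, \qquad \text{Cov}(Q_1, Q_2) = \sum_{i=1}^n \text{Cov}\!\left(X_i^2, Y_i^2\right) = 2n\theta^2.$$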
Because the covariance of $Q_1$ and $Q_2$ is a direct function of $\theta$, their joint distribution fundamentally depends on $\theta$. Therefore, the vector $Q$ cannot be ancillary.
This problem falls under the framework of Linear Models with Constraints. The fundamental constraint is geometric: the sum of the true angles of a planar triangle is exactly $180^\circ$ (or $\pi$ radians). The strategy involves:
Step 1: Formulating the Model and the Constraint
We are given that the measurements $Y_i$ are subject to independent $N(0, \sigma^2)$ errors. Thus, the measurement model is:
Since the angles belong to a triangle, their true sum is a known constant. Let $S = 180^\circ$ (or $\pi$). Thus, we have the deterministic constraint:
Step 2: Constructing the Test Statistic Numerator
We are testing $H_0: \theta_1 - \theta_2 = 0$. A natural, unbiased estimator for $\theta_1 - \theta_2$ is the difference in their measurements. Let $U = Y_1 - Y_2$. By the properties of normal combinations:
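Specifically, by independence of the measurement errors,
$$U = Y_1 - Y_2 \sim N\!\left(\theta_1 - \theta_2,\; 2\sigma^2\right).$$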
Under the null hypothesis $H_0$, $U \sim N(0, 2\sigma^2)$. Standardizing this gives us a standard normal variable:
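That is,
$$Z = \frac{Y_1 - Y_2}{\sigma\sqrt{2}} \sim N(0, 1) \quad \text{under } H_0.$$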
Step 3: Estimating the Unknown Variance $\sigma^2$
Because $\sigma^2$ is unknown, we need to estimate it using the remaining degrees of freedom. Consider the sum of the measurements, $V = Y_1 + Y_2 + Y_3$.
Thus, $V \sim N(S, 3\sigma^2)$. We can construct a $\chi^2$ random variable with 1 degree of freedom:
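Namely,
$$X = \frac{(V - S)^2}{3\sigma^2} = \left(\frac{Y_1 + Y_2 + Y_3 - S}{\sigma\sqrt{3}}\right)^2 \sim \chi^2_{(1)}.$$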
Step 4: Proving Independence of $U$ and $V$
For the t-statistic to be valid, the numerator ($Z$) and the denominator (function of $X$) must be independent. Since they are linear combinations of jointly normal variables, we only need to show their covariance is zero:
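Expanding term by term (all cross-covariances vanish by independence),
$$\text{Cov}(U, V) = \text{Cov}(Y_1 - Y_2,\; Y_1 + Y_2 + Y_3) = \text{Var}(Y_1) - \text{Var}(Y_2) = \sigma^2 - \sigma^2 = 0.$$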
Since $\text{Cov}(U, V) = 0$ and $(U, V)$ is jointly normal, $U$ and $V$ are independent. Consequently, $Z$ and $X$ are independent.
Step 5: Constructing the Final t-Test
We define the test statistic $T$ as the ratio of the standard normal variable $Z$ to the square root of the chi-square variable $X$ divided by its degrees of freedom ($\nu = 1$):
The unknown parameter $\sigma$ perfectly cancels out, leaving a completely computable statistic:
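Explicitly, with $S = 180^\circ$,
$$T = \frac{Z}{\sqrt{X/1}} = \frac{(Y_1 - Y_2)/(\sigma\sqrt{2})}{\left|Y_1 + Y_2 + Y_3 - 180^\circ\right|/(\sigma\sqrt{3})} = \sqrt{\frac{3}{2}}\;\frac{Y_1 - Y_2}{\left|Y_1 + Y_2 + Y_3 - 180^\circ\right|}.$$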
Under $H_0$, this statistic $T$ follows a Student's t-distribution with $1$ degree of freedom (which is equivalent to a standard Cauchy distribution).
This problem asks for the Most Powerful (MP) test for a simple null versus a simple alternative hypothesis. We will use the Neyman-Pearson Lemma (NPL). The key steps are to find the joint likelihood function, identify the sufficient statistic (which will be the first order statistic $X_{(1)}$ due to the indicator function), formulate the Likelihood Ratio $\Lambda$, and determine the critical region based on the size $\alpha$. Because the support of the distribution depends on the parameter $\theta$, the likelihood ratio will be piecewise, leading to different randomized test structures depending on the condition $e^{2n(\theta_0 - \theta_1)} \lessgtr \alpha$.
Step 1: Likelihood Function and Sufficient Statistic
The joint probability density function (likelihood) of the sample is:
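Assuming the underlying model is the shifted exponential with density $f(x; \theta) = 2e^{-2(x - \theta)}\,\mathbf{1}(x \geq \theta)$ (the form consistent with the probability $e^{2n(\theta_0 - \theta_1)}$ used throughout this solution), the likelihood factors as
$$L(\theta) = \prod_{i=1}^n 2e^{-2(x_i - \theta)}\,\mathbf{1}(x_i \geq \theta) = 2^n e^{-2\sum_{i=1}^n x_i}\, e^{2n\theta}\,\mathbf{1}\!\left(x_{(1)} \geq \theta\right),$$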
where $X_{(1)} = \min(X_1, \dots, X_n)$. By the Factorization Theorem, $X_{(1)}$ is a minimal sufficient statistic for $\theta$. We must find the distribution of $X_{(1)}$ under $H_0$ to evaluate probabilities. For $y \geq \theta_0$:
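Under this density, independence of the $X_i$ gives
$$\mathbb{P}_{\theta_0}\!\left(X_{(1)} \geq y\right) = \prod_{i=1}^n \mathbb{P}_{\theta_0}(X_i \geq y) = e^{-2n(y - \theta_0)}.$$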
Step 2: Constructing the Likelihood Ratio ($\Lambda$)
We are testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ (with $\theta_1 > \theta_0$). The likelihood ratio is:
Since $\theta_1 > \theta_0$, we analyze $\Lambda$ piecewise over the support of $H_0$ ($X_{(1)} \geq \theta_0$):
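Using the factored likelihood above,
$$\Lambda = \frac{L(\theta_1)}{L(\theta_0)} = e^{2n(\theta_1 - \theta_0)}\,\frac{\mathbf{1}\!\left(x_{(1)} \geq \theta_1\right)}{\mathbf{1}\!\left(x_{(1)} \geq \theta_0\right)} = \begin{cases} e^{2n(\theta_1 - \theta_0)} =: k^*, & x_{(1)} \geq \theta_1, \\[4pt] 0, & \theta_0 \leq x_{(1)} < \theta_1. \end{cases}$$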
By the Neyman-Pearson Lemma, the Most Powerful test function $\phi(X_{(1)})$ has the form:
where $c$ and $\gamma$ are chosen such that $\mathbb{E}_{\theta_0}[\phi(X_{(1)})] = \alpha$. Note that $P_{\theta_0}(\Lambda = k^*) = P_{\theta_0}(X_{(1)} \geq \theta_1) = e^{-2n(\theta_1 - \theta_0)} = e^{2n(\theta_0 - \theta_1)}$.
Step 3: Deriving the Test for Case (a) $e^{2n(\theta_0 - \theta_1)} < \alpha$
In this case, the probability of obtaining the maximum likelihood ratio $k^*$ under $H_0$ is strictly less than the allowed size $\alpha$:
If we set the threshold $c = k^*$, our test size would be too small. Therefore, we must lower the threshold to $c = 0$. The test rule becomes:
We solve for $\gamma_1$ using the size constraint:
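With $c = 0$, the randomized rule and its size under $H_0$ are
$$\phi\!\left(x_{(1)}\right) = \begin{cases} 1, & x_{(1)} \geq \theta_1, \\ \gamma_1, & x_{(1)} < \theta_1, \end{cases} \qquad \alpha = e^{2n(\theta_0 - \theta_1)} + \gamma_1\left(1 - e^{2n(\theta_0 - \theta_1)}\right),$$
which yields
$$\gamma_1 = \frac{\alpha - e^{2n(\theta_0 - \theta_1)}}{1 - e^{2n(\theta_0 - \theta_1)}} \in (0, 1).$$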
Step 4: Deriving the Test for Case (b) $e^{2n(\theta_0 - \theta_1)} > \alpha$
In this case, the probability of obtaining the maximum likelihood ratio $k^*$ under $H_0$ exceeds $\alpha$:
If we always rejected $H_0$ when $X_{(1)} \geq \theta_1$, the Type I error would exceed $\alpha$. Thus, we must set $c = k^* = e^{2n(\theta_1 - \theta_0)}$ and introduce randomization at this boundary. The test rule becomes:
We solve for $\gamma_2$ using the size constraint:
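The rule rejects only on $\{x_{(1)} \geq \theta_1\}$, and there only with probability $\gamma_2$:
$$\phi\!\left(x_{(1)}\right) = \begin{cases} \gamma_2, & x_{(1)} \geq \theta_1, \\ 0, & x_{(1)} < \theta_1, \end{cases} \qquad \alpha = \gamma_2\, e^{2n(\theta_0 - \theta_1)} \quad\Longrightarrow\quad \gamma_2 = \alpha\, e^{2n(\theta_1 - \theta_0)} < 1.$$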
This problem evaluates your structural understanding of the Multivariate Normal Distribution.
Step 1: Analyzing Independence and Symmetry for Part (a)
Let $W = X_1 - X_2 X_3 X_4$. We want to find $\mathbb{P}(W > 0)$. Notice the zero entries in the covariance matrix $\boldsymbol{\Sigma}$:
Because the variables are jointly normal, zero covariance implies strict independence. Thus, the random vector $(X_1, X_4)$ is completely independent of the random vector $(X_2, X_3)$.
Furthermore, since the mean vector is $\mathbf{0}$, the distribution of $(X_1, X_4)$ is symmetric. Specifically, if we apply the transformation $(X_1, X_4) \mapsto (-X_1, -X_4)$, the mean remains $(0,0)$ and the covariance matrix remains unchanged because $\text{Cov}(-X_1, -X_4) = (-1)(-1)\text{Cov}(X_1, X_4) = 1$.
Therefore, $(-X_1, X_2, X_3, -X_4)$ has the exact same joint distribution as $(X_1, X_2, X_3, X_4)$.
Step 2: Evaluating the Probability $\mathbb{P}(W > 0)$
Let us apply this transformation to our variable $W$:
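Under the sign change of $(X_1, X_4)$, the variable $W$ maps to
$$W = X_1 - X_2 X_3 X_4 \;\longmapsto\; (-X_1) - X_2 X_3 (-X_4) = -\left(X_1 - X_2 X_3 X_4\right) = -W.$$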
Since the joint distribution is invariant under this transformation, $W$ and $-W$ are identically distributed ($W \stackrel{d}{=} -W$). This means the distribution of $W$ is perfectly symmetric around $0$.
Because $W$ is a continuous random variable (a polynomial of continuous normals), $\mathbb{P}(W = 0) = 0$. Therefore:
Thus, $\mathbb{P}(X_1 > X_2 X_3 X_4) = \mathbb{P}(X_1 - X_2 X_3 X_4 > 0) = 0.5$.
Step 3: Partitioning the Covariance Matrix for Part (b)
The squared multiple correlation coefficient of $X_1$ with $\mathbf{X}_{-1} = (X_2, X_3, X_4)^\top$ is given by:
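In partitioned notation, with $\boldsymbol{\Sigma}_{1,-1}$ the row of covariances between $X_1$ and $\mathbf{X}_{-1}$,
$$R^2_{1 \cdot 234} = \frac{\boldsymbol{\Sigma}_{1,-1}\,\boldsymbol{\Sigma}_{-1,-1}^{-1}\,\boldsymbol{\Sigma}_{-1,1}}{\Sigma_{11}}.$$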
From the given $\boldsymbol{\Sigma}$, we extract the sub-matrices:
Step 4: Computing $R_{1 \cdot 234}$
Notice that $\boldsymbol{\Sigma}_{-1, -1}$ is block-diagonal. The vector $\boldsymbol{\Sigma}_{1, -1}$ only has a non-zero entry in the 3rd position (corresponding to $X_4$). Therefore, we only need the bottom-right entry of the inverse matrix $\boldsymbol{\Sigma}_{-1, -1}^{-1}$, which is simply $1/4$.
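Using $\text{Cov}(X_1, X_4) = 1$ (from Step 1) and the fact that the $1/4$ entry equals $1/\text{Var}(X_4)$,
$$\boldsymbol{\Sigma}_{1,-1}\,\boldsymbol{\Sigma}_{-1,-1}^{-1}\,\boldsymbol{\Sigma}_{-1,1} = \frac{\big[\text{Cov}(X_1, X_4)\big]^2}{\text{Var}(X_4)} = \frac{1^2}{4} = \frac{1}{4}.$$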
Now, divide by $\boldsymbol{\Sigma}_{11}$:
Taking the positive square root (since $R \geq 0$):
Step 5: Computing the Partial Correlation for Part (c)
We need the partial correlation $\rho_{23 \cdot 14}$, which is derived from the conditional covariance matrix of $(X_2, X_3)$ given $(X_1, X_4)$:
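The Schur-complement formula for this conditional covariance is
$$\boldsymbol{\Sigma}_{23 \cdot 14} = \boldsymbol{\Sigma}_{23,23} - \boldsymbol{\Sigma}_{23,14}\,\boldsymbol{\Sigma}_{14,14}^{-1}\,\boldsymbol{\Sigma}_{14,23}.$$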
We inspect the cross-covariance matrix $\boldsymbol{\Sigma}_{23, 14}$ between the block $(X_2, X_3)$ and the block $(X_1, X_4)$. From $\boldsymbol{\Sigma}$, all these cross-terms are zero.
Because the cross-covariance is a zero matrix (confirming our observation in Step 1 that the blocks are independent), conditioning on $(X_1, X_4)$ provides zero information about $(X_2, X_3)$. Thus, the conditional covariance is exactly equal to the marginal covariance:
The partial correlation is therefore identical to the marginal Pearson correlation between $X_2$ and $X_3$:
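In formula form (the numerical value is read off the $(X_2, X_3)$ entries of the given $\boldsymbol{\Sigma}$),
$$\rho_{23 \cdot 14} = \rho_{23} = \frac{\text{Cov}(X_2, X_3)}{\sqrt{\text{Var}(X_2)\,\text{Var}(X_3)}}.$$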