$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bbQ}{{\mathbb Q}} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\condtrans}[3]{p_{#1}(#2|#3)} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\cro}[1]{\langle #1 \rangle} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Hset}{\mathsf{H}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\jointtrans}[3]{p_{#1}(#2,#3)} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} \newcommand{\lrb}[1]{\left[#1 \right]} \newcommand{\lrc}[1]{\left\{#1 \right\}} \newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcf}{\mathcal{F}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\N}{\mathcal{N}} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\pminfty}{_{-\infty}^\infty} \newcommand{\PP}{\mathbb P} \newcommand{\projorth}[1]{\mathsf{P}^\perp_{#1}} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\revcondtrans}[3]{q_{#1}(#2|#3)} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\weaklim}[1]{\stackrel{\mathcal{L}_{#1}}{\rightsquigarrow}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} \newcommand{\Zset}{\mathsf{Z}} $$

2017/10/07 23:39 · douc

Gauss-Markov Theorem and other results

Basics on linear algebra

Let $\b{X}$ be a $n \times p$ matrix with real-valued entries. We define $\img(\b{X})=\set{\b{X}\b{y}}{\b{y} \in \rset^p}$ and $\ker(\b{X})=\set{\b{y} \in \rset^p}{\b{X}\b{y}=0}$. We can note that $\img(\b{X})$ and $\ker(\b{X}^T)$ are subspaces of $\rset^n$.

$\itemm$ $\img(\b{X}) \stackrel{\perp}{\oplus} \underbrace{\img(\b{X})^\perp}_{\ker(\b{X}^T)}= \rset^n$. Otherwise stated, $x$ is the orthogonal projection of $y$ on $\b{X}$ if and only if we have the two properties $x \in \img(\bf{X})$ and $(y-x) \in \ker(\b{X}^T)$. The fact that $\ker(\b{X}^T)$ is the orthogonal complement of $\img(\b{X})$ stems from the following remark: $z \in \img(\b{X})^\perp$ is equivalent to the fact that $\b{X}_i^T z=0$ where $\b{X}_1,\ldots,\b{X}_p$ are the column vectors of $\b{X}$ and this in turn is equivalent to $\bf{X}^Tz=0$.

$\itemm$ Denote by $P_{\img(\b{X})}$ the matrix of the orthogonal projection on $\img(\b{X})$. By abuse of notation, we also write $P_{\b{X}}$. We have $P_\b{X}=P_\b{X}^2=P_\b{X}^T$ and we can also note that $P_\b{X}=I$ on $\img(\b{X})$.

$\itemm$ An orthogonal projection is uniquely determined by the subspace on which it projects. This implies in particular the following property. Assume that $\b{X}$ is a $n \times p$ matrix and $\b{W}$ is a $n \times k$ matrix where we can possibly have $k \neq p$. As soon as $\img(\b{X})=\img(\b{W})$, we have $P_\b{X}=P_{\img(\b{X})}=P_{\img(\b{W})}=P_\b{W}$

$\itemm$ By the Pythagore identity, for all $a\in \rset^n$, we have $\|a\|^2=\|P_\b{X}a\|^2+\|(I-P_\b{X})a\|^2$ ($\star$) showing that $\|a\|\geq \|P_\b{X}a\|$ with equality only if $P_\b{X}a=a$.

$\itemm$ $\b{X}^g$ is a pseudo inverse of $\b{X}$ (and we note $\b{X}^g$) iff $\b{X} \b{X}^g \b{X}=\b{X}$ or, equivalently, for all $\lambda \in \img(\b{X})$, $\b{X} \b{X}^g \lambda=\lambda$. We admit that a pseudo inverse always exists.

Ordinary Least squares (OLS) estimator

Let $\b{y}$ be a vector of size $n$ and $\b{X}$ a $n \times p$ matrix. Since $P_\b{X}\b{y}$ is in $\img(\b{X})$, it can be written as $P_\b{X} \b{y}=\b{X} \b{\hat b}$ for some $\b{\hat b} \in \rset^p$.

The fundamental result. Theorem. The following properties are equivalent

$\b{\hat b}$ is such that $P_\bf{X} \b{y}=\b{X}\b{\hat b}$
$\b{\hat b}$ is such that $\|\b{y}-\b{X}\b{\hat b} \|=\inf_{b \in \rset^p}\|\b{y}-\b{X}\b{b}\|$
$\b{\hat b}$ satisfies the normal equations¹⁾: $\bf{X}^T\bf{X} \b{\hat b}=\bf{X}^T \b{y}$.

Moreover, $P_\b{X}=\bf{X}(\bf{X}^T \bf{X})^g\bf{X}^T$ for any choice of the pseudo inverse $(\bf{X}^T \bf{X})^g$.

A side effect of this theorem is that $\bf{X}(\bf{X}^T \bf{X})^g\bf{X}^T$ does not depend on the choice of the pseudo inverse $(\bf{X}^T \bf{X})^g$.

Proof of the fundamental result (click here)

The only difficult part is to show that the matrix defined in the last statement is the one of the orthogonal projector onto $\img(\b{X})$. Set $M=\bf{X}(\bf{X}^T \bf{X})^g\bf{X}^T$ and let $\b{y} \in \rset^n$. We show that $M\b{y}=P_\bf{X}\b{y}$ by checking successively that $M\b{y} \in \img(\b{X})$ and $\b{y} -M\b{y} \in \ker(\b{X}^T)$. The first statement is straightforward: $M \b{y}=\b{X}\lr{(\bf{X}^T \bf{X})^g\bf{X}^T \b{y}} \in \img(\b{X})$. The second statement is in two steps:

$\b{X}^T(I-M)\b{X}=\b{X}^T\b{X}-\b{X}^T\b{X}(\bf{X}^T \bf{X})^g\b{X}^T\b{X}=0$ showing that $\b{X}^T(I-M)=0$ on $\img(\b{X})$.
Moreover, for $z \in \ker(\b{X}^T)$, $\b{X}^T(I-M)z=\underbrace{\bf{X}^T z}_{0}-\b{X}^T\bf{X}(\bf{X}^T \bf{X})^g\underbrace{\bf{X}^T z}_{0}=0$.

Finally, $\b{X}^T(I-M)=0$ on $\img(\b{X})$ and $\ker(\b{X}^T)$ and therefore on $\img(\b{X}) \oplus \ker(\b{X}^T)=\rset^n$ so that $(I-M)\b{y} \in \ker(\b{X}^T)$. This concludes the proof.

We can now give the general solutions of the normal equations.

Solving the Normal equations. Theorem. The following equivalence holds true. $\b{\hat b}$ solves the normal equations iff there exists $\b{z}\in \rset^p$ such that $$ \b{\hat b}=(\bf{X}^T \bf{X})^g\bf{X}^T \b{y}+ \lrb{I-(\bf{X}^T \bf{X})^g\bf{X}^T \b{X}} \b{z} $$ for some pseudo-inverse $(\bf{X}^T \bf{X})^g$.

Solving the Normal equations (click here)

Indeed, if $\b{\hat b}$ can be written as in the statement of the Theorem, then applying $\b{X}$ we get $$ \b{X}\b{\hat b}=P_\b{X} \b{y}+ \lr{\underbrace{\b{X}-P_\b{X} \b{X}}_{0}}\b{z}=P_{\b{X}} \b{y}, $$ showing that $\b{\hat b}$ solves the normal equations by the fundemental result. Conversely, assume that $\b{\hat b}$ solves the normal equations. Then choosing any pseudo inverse $(\bf{X}^T \bf{X})^g$, \begin{align*} \b{\hat b}&=(\bf{X}^T \bf{X})^g\bf{X}^T \b{y}+ \b{\hat b}- (\bf{X}^T \bf{X})^g\underbrace{\bf{X}^T \b{y}}_{\b{X}^T\b{X} \b{\hat b}} &=(\bf{X}^T \bf{X})^g\bf{X}^T \b{y}+ \lrb{I-(\bf{X}^T \bf{X})^g\bf{X}^T \b{X}} \b{\hat b} \end{align*} which completes the proof.

Estimability and the Gauss-Markov theorem

We now consider $\b{y}=\b{X}\b{b}+e$ where $e$ is a zero mean vector with covariance matrix $\sigma^2 I_n$. We say that $\lambda^T \b{b}$ is estimable if there exists $a\in \rset^n$ such that $a^T\b{y}$ is an unbiased estimator of $\lambda^T \b{b}$, which is equivalent to $\lambda^T \b{b}=a^T\b{X}\b{b}$ for all $\b{b}\in \rset^p$ and therefore to $\lambda^T=a^T\b{X}$.

The Gauss-Markov Theorem . For any linear unbiased estimator $a^T \b{y}$ of $\lambda^T \b{b}$, $$ \Var(a^T \b{y})\geq \Var(\lambda^T \b{\hat b}) $$ where $\hat b$ is the least square estimator of $\b{b}$. We then say that $\lambda^T \b{\hat b}$ is BLUE (Best Linear Unbiased Estimator). Moreover, the equality only holds if $a^T \b{y}=\lambda^T\b{\hat b}$, saying that the BLUE is unique.

A short proof of the Gauss-Markov Theorem (click here to see it)

Note that since $a^T\b{y}$ is an unbiased estimator of $\lambda^T \b{b}$, we have $\lambda^T =a^T\b{X}$. This implies that $$ \lambda^T \b{\hat b}=a^T\b{X}\b{\hat b}=a^T P_\b{X}\b{y} \quad (\star) $$ Then, $\PE[\lambda^T \b{\hat b}]=a^T P_\b{X} \b{X}\b{b}=a^T \b{X}\b{b}=\lambda^T \b{b}$, showing that $\lambda^T \b{\hat b}$ is unbiased. Moreover, using again ($\star$), $\lambda^T \b{\hat b}=a^T P^T_\b{X}\b{y}=(P_\b{X}a)^T \b{y}$ so that, (see $\star$) $$ \Var(\lambda^T \b{\hat b})= \sigma^2 \|P_\b{X}a\|^2 \leq \sigma^2 \|a\|^2 =\Var(a^T \b{y}) $$ with equality only if $P_\b{X}a=a$ which implies $\lambda^T \b{\hat b}=(P_\b{X}a)^T \b{y}=a^T \b{y}$.

In the curse of the proof, we have seen that if $\lambda^T \b{b}$ is estimable, then choosing $a$ such that $\lambda^T=a^T\b{X}$, $$ \Var(\lambda^T \b{\hat b})= \sigma^2 \|P_\b{X}a\|^2=\sigma^2 a^T P_\b{X}^T P_\b{X} a=\sigma^2 a^T P_\b{X} a=\sigma^2 a^T \b{X} (\b{X}^T\b{X})^g \b{X}^T a=\sigma^2 \lambda^T (\b{X}^T\b{X})^g \lambda, $$ and a side effect is that $\lambda^T (\b{X}^T\b{X})^g \lambda$ does not depend on the chosen pseudo-inverse whenever $\lambda \in \img(\b{X}^T)$.

Video of a short course on the Gauss Markov Theorem

¹⁾ For the third point, the equivalence with the other statements follows from the fact that for all $\b{b}$, $(\b{y}-\b{X}\b{\hat b})\perp \b{X}\b{b}$ iff for all $\b{b}$, $\b{b}^T\b{X}^T(\b{y}-\b{X}\b{\hat b})=0$ which is equivalent to $\b{X}^T(\b{y}-\b{X}\b{\hat b})=0$, which are the normal equations.

Welcome to Randal Douc's wiki

Sidebar

Wiki

Wiki

Courses and public working groups

Courses and public working groups

Private Working Groups

Private Working Groups

Personal Notes

Personal Notes

Réponses

Réponses

Miscellanous

Miscellanous

Table of Contents

Gauss-Markov Theorem and other results

Basics on linear algebra

Ordinary Least squares (OLS) estimator

Estimability and the Gauss-Markov theorem

Welcome to Randal Douc's wiki

User Tools

Site Tools

Sidebar

Wiki

Wiki

Courses and public working groups

Courses and public working groups

Private Working Groups

Private Working Groups

Personal Notes

Personal Notes

Réponses

Réponses

Miscellanous

Miscellanous

Table of Contents

Gauss-Markov Theorem and other results

Basics on linear algebra

Ordinary Least squares (OLS) estimator

Estimability and the Gauss-Markov theorem

Page Tools