{{page>:defs}}
====== Gauss-Markov Theorem and other results ======
===== Basics on linear algebra =====
Let $\b{X}$ be a $n \times p$ matrix with real-valued entries. We define $\img(\b{X})=\set{\b{X}\b{y}}{\b{y} \in \rset^p}$ and $\ker(\b{X})=\set{\b{y} \in \rset^p}{\b{X}\b{y}=0}$.
We can note that $\img(\b{X})$ and $\ker(\b{X}^T)$ are subspaces of $\rset^n$.
$\itemm$
$\img(\b{X}) \stackrel{\perp}{\oplus} \underbrace{\img(\b{X})^\perp}_{\ker(\b{X}^T)}= \rset^n$. Otherwise stated, $x$ is the
orthogonal projection of $y$ onto $\img(\b{X})$ if and only if we have the two properties $x \in \img(\b{X})$ and $(y-x) \in \ker(\b{X}^T)$.
The fact that $\ker(\b{X}^T)$ is the orthogonal complement of $\img(\b{X})$ stems from the following remark: $z \in \img(\b{X})^\perp$ is equivalent to the fact that $\b{X}_i^T z=0$ where $\b{X}_1,\ldots,\b{X}_p$ are the column vectors of $\b{X}$
and this in turn is equivalent to $\b{X}^Tz=0$.
$\itemm$
Denote by $P_{\img(\b{X})}$ the matrix of the orthogonal projection on $\img(\b{X})$. By abuse of notation, we also write $P_{\b{X}}$. We have $P_\b{X}=P_\b{X}^2=P_\b{X}^T$
and we can also note that $P_\b{X}=I$ on $\img(\b{X})$.
$\itemm$
An orthogonal projection is uniquely determined by the subspace on which it projects. This implies in particular the following property.
Assume that $\b{X}$ is a $n \times p$ matrix and $\b{W}$ is a $n \times k$ matrix where we can possibly have $k \neq p$.
As soon as $\img(\b{X})=\img(\b{W})$, we have $P_\b{X}=P_{\img(\b{X})}=P_{\img(\b{W})}=P_\b{W}$
{{anchor:star}}
$\itemm$
By the Pythagorean identity, for all $a\in \rset^n$, we have $\|a\|^2=\|P_\b{X}a\|^2+\|(I-P_\b{X})a\|^2$ ($\star$), showing that $\|a\|\geq \|P_\b{X}a\|$ with equality only if $P_\b{X}a=a$.
$\itemm$
A matrix $\b{X}^g$ is a pseudo-inverse of $\b{X}$ iff $\b{X} \b{X}^g \b{X}=\b{X}$
or, equivalently, for all $\lambda \in \img(\b{X})$, $\b{X} \b{X}^g \lambda=\lambda$. We admit that a pseudo-inverse always exists.
/*
$\itemm$
$\ker(\b{X}^T \b{X})= \ker(\b{X})$[(This follows from $\lambda^T \b{X}^T \b{X}\lambda=\|\b{X} \lambda\|^2$ for all $\lambda \in \rset^p$)] and translating this equality on the orthogonal complement of these spaces, we get also
$\img(\b{X}^T\b{X})=\img(\b{X}^T)$.
*/
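These properties can be checked numerically. The following NumPy sketch is purely illustrative (the random $6 \times 3$ matrix is an assumption, and ''numpy.linalg.pinv'' is used as one particular choice of pseudo-inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))   # an illustrative n x p design matrix

# Orthogonal projector onto img(X); pinv is the Moore-Penrose choice
# of pseudo-inverse (X^T X)^g.
P = X @ np.linalg.pinv(X.T @ X) @ X.T

# P is idempotent and symmetric: P = P^2 = P^T.
assert np.allclose(P, P @ P)
assert np.allclose(P, P.T)

# P acts as the identity on img(X): P X = X.
assert np.allclose(P @ X, X)

# For any y, the residual y - P y lies in ker(X^T).
y = rng.standard_normal(6)
Py = P @ y
r = y - Py
assert np.allclose(X.T @ r, 0.0)

# Pythagorean identity: ||y||^2 = ||P y||^2 + ||(I - P) y||^2.
assert np.isclose(y @ y, Py @ Py + r @ r)
```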
===== Ordinary Least squares (OLS) estimator =====
Let $\b{y}$ be a vector of size $n$ and $\b{X}$ a $n \times p$ matrix. Since $P_\b{X}\b{y}$ is in $\img(\b{X})$, it can be written as
$P_\b{X} \b{y}=\b{X} \b{\hat b}$ for some $\b{\hat b} \in \rset^p$.
__The fundamental result__. **Theorem**. The following properties are equivalent
  * $\b{\hat b}$ is such that $P_\b{X} \b{y}=\b{X}\b{\hat b}$
  * $\b{\hat b}$ is such that $\|\b{y}-\b{X}\b{\hat b} \|=\inf_{\b{b} \in \rset^p}\|\b{y}-\b{X}\b{b}\|$
  * $\b{\hat b}$ satisfies the **normal equations**[(For the third point, the equivalence with the other statements follows from the fact that $(\b{y}-\b{X}\b{\hat b})\perp \b{X}\b{b}$ for all $\b{b}$ iff $\b{b}^T\b{X}^T(\b{y}-\b{X}\b{\hat b})=0$ for all $\b{b}$, which is equivalent to $\b{X}^T(\b{y}-\b{X}\b{\hat b})=0$, i.e. the normal equations.)]: $\b{X}^T\b{X} \b{\hat b}=\b{X}^T \b{y}$.
Moreover, $P_\b{X}=\b{X}(\b{X}^T \b{X})^g\b{X}^T$ for any choice of the pseudo-inverse $(\b{X}^T \b{X})^g$.
A side effect of this theorem is that $\b{X}(\b{X}^T \b{X})^g\b{X}^T$ does not depend on the choice of the pseudo-inverse $(\b{X}^T \b{X})^g$.
The only difficult part is to show that the matrix defined in the last statement is the one of the orthogonal projector onto $\img(\b{X})$.
Set $M=\b{X}(\b{X}^T \b{X})^g\b{X}^T$ and let $\b{y} \in \rset^n$. We show that $M\b{y}=P_\b{X}\b{y}$ by checking successively that $M\b{y} \in \img(\b{X})$ and $\b{y} -M\b{y} \in \ker(\b{X}^T)$.
The first statement is straightforward: $M \b{y}=\b{X}\lr{(\b{X}^T \b{X})^g\b{X}^T \b{y}} \in \img(\b{X})$. The second statement is in two steps:
  * $\b{X}^T(I-M)\b{X}=\b{X}^T\b{X}-\b{X}^T\b{X}(\b{X}^T \b{X})^g\b{X}^T\b{X}=0$, showing that $\b{X}^T(I-M)=0$ on $\img(\b{X})$.
  * Moreover, for $z \in \ker(\b{X}^T)$, $\b{X}^T(I-M)z=\underbrace{\b{X}^T z}_{0}-\b{X}^T\b{X}(\b{X}^T \b{X})^g\underbrace{\b{X}^T z}_{0}=0$.
Finally, $\b{X}^T(I-M)=0$ on both $\img(\b{X})$ and $\ker(\b{X}^T)$, hence on $\img(\b{X}) \oplus \ker(\b{X}^T)=\rset^n$, so that $(I-M)\b{y} \in \ker(\b{X}^T)$.
This concludes the proof.
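The independence of $\b{X}(\b{X}^T \b{X})^g\b{X}^T$ from the choice of pseudo-inverse can also be observed numerically. In the sketch below, the rank-deficient design and the construction $G_2 = G_1 + (I - G_1 A)Z$ (a standard way of generating further pseudo-inverses from a given one, here with an arbitrary $Z$) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Rank-deficient design: the third column is the sum of the first two.
B = rng.standard_normal((8, 2))
X = np.hstack([B, B.sum(axis=1, keepdims=True)])   # 8 x 3, rank 2

A = X.T @ X
G1 = np.linalg.pinv(A)                  # Moore-Penrose pseudo-inverse of A

# Since A G1 A = A, we have A (I - G1 A) = 0, so G1 + (I - G1 A) Z
# is another pseudo-inverse of A for any matrix Z.
Z = rng.standard_normal((3, 3))
G2 = G1 + (np.eye(3) - G1 @ A) @ Z
assert np.allclose(A @ G2 @ A, A)       # G2 is indeed a pseudo-inverse

# Both choices give the same matrix X (X^T X)^g X^T, namely P_X.
P1 = X @ G1 @ X.T
P2 = X @ G2 @ X.T
assert np.allclose(P1, P2)
assert np.allclose(P1 @ P1, P1)         # and it is a projector
```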
We can now give the general solutions of the normal equations.
__Solving the Normal equations__. **Theorem**. The following equivalence holds true. $\b{\hat b}$ solves the normal equations iff there exists $\b{z}\in \rset^p$ such that
$$
\b{\hat b}=(\b{X}^T \b{X})^g\b{X}^T \b{y}+ \lrb{I-(\b{X}^T \b{X})^g\b{X}^T \b{X}} \b{z}
$$
where $(\b{X}^T \b{X})^g$ is any fixed pseudo-inverse.
Indeed, if $\b{\hat b}$ can be written as in the statement of the Theorem, then applying $\b{X}$ we get
$$
\b{X}\b{\hat b}=P_\b{X} \b{y}+ \lr{\underbrace{\b{X}-P_\b{X} \b{X}}_{0}}\b{z}=P_{\b{X}} \b{y},
$$
showing that $\b{\hat b}$ solves the normal equations by the fundamental result.
Conversely, assume that $\b{\hat b}$ solves the normal equations. Then, choosing any pseudo-inverse $(\b{X}^T \b{X})^g$,
\begin{align*}
\b{\hat b}&=(\b{X}^T \b{X})^g\b{X}^T \b{y}+ \b{\hat b}- (\b{X}^T \b{X})^g\underbrace{\b{X}^T \b{y}}_{\b{X}^T\b{X} \b{\hat b}}\\
&=(\b{X}^T \b{X})^g\b{X}^T \b{y}+ \lrb{I-(\b{X}^T \b{X})^g\b{X}^T \b{X}} \b{\hat b}
\end{align*}
which completes the proof.
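This parametrization of the solution set is easy to illustrate with NumPy. Below, the rank-deficient design (whose kernel is spanned by $(1,1,-1)^T$) and the particular choice of $\b{z}$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((10, 2))
X = np.hstack([B, B.sum(axis=1, keepdims=True)])   # rank 2: ker(X) = span{(1, 1, -1)}
y = rng.standard_normal(10)

G = np.linalg.pinv(X.T @ X)

# Particular solution (z = 0) and another one obtained with z in ker(X).
b1 = G @ X.T @ y
z = np.array([1.0, 1.0, -1.0])          # lies in ker(X), hence (I - G X^T X) z = z
b2 = G @ X.T @ y + (np.eye(3) - G @ X.T @ X) @ z

# Both solve the normal equations X^T X b = X^T y ...
assert np.allclose(X.T @ X @ b1, X.T @ y)
assert np.allclose(X.T @ X @ b2, X.T @ y)

# ... they differ (b2 = b1 + z), yet give the same fitted values P_X y.
assert np.allclose(b2 - b1, z)
assert np.allclose(X @ b1, X @ b2)
```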
===== Estimability and the Gauss-Markov theorem =====
We now consider the model $\b{y}=\b{X}\b{b}+e$ where $e$ is a zero-mean random vector
with covariance matrix $\sigma^2 I_n$. We say that $\lambda^T \b{b}$ is //**estimable**// if there exists $a\in \rset^n$ such that $a^T\b{y}$ is an
unbiased estimator of $\lambda^T \b{b}$, which is equivalent to $\lambda^T \b{b}=a^T\b{X}\b{b}$ for all $\b{b}\in \rset^p$ and therefore to
$\lambda^T=a^T\b{X}$.
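The condition $\lambda^T=a^T\b{X}$, i.e. $\lambda \in \img(\b{X}^T)$, can be tested numerically by checking whether $\b{X}^T a = \lambda$ is solvable. The sketch below (the rank-deficient design and the helper function ''is_estimable'' are illustrative constructions, not part of the text) does this with a least-squares solve:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((9, 2))
X = np.hstack([B, B.sum(axis=1, keepdims=True)])   # col3 = col1 + col2, rank 2

def is_estimable(lam, X, tol=1e-8):
    """lambda^T b is estimable iff lambda is in img(X^T),
    i.e. the linear system X^T a = lambda has a solution."""
    a, *_ = np.linalg.lstsq(X.T, lam, rcond=None)
    return bool(np.allclose(X.T @ a, lam, atol=tol))

# Here X b = (b1 + b3) col1 + (b2 + b3) col2, so only combinations
# with lam1 + lam2 = lam3 are identifiable.
assert is_estimable(np.array([1.0, 0.0, 1.0]), X)       # b1 + b3 is estimable
assert not is_estimable(np.array([1.0, 0.0, 0.0]), X)   # b1 alone is not
```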
__The Gauss-Markov Theorem__. For any linear unbiased estimator $a^T \b{y}$ of $\lambda^T \b{b}$,
$$
\Var(a^T \b{y})\geq \Var(\lambda^T \b{\hat b})
$$
where $\b{\hat b}$ is the least squares estimator of $\b{b}$. We then say that $\lambda^T \b{\hat b}$ is BLUE (Best Linear Unbiased Estimator).
Moreover, equality holds only if $a^T \b{y}=\lambda^T\b{\hat b}$, saying that the BLUE is unique.
Note that since $a^T\b{y}$ is an
unbiased estimator of $\lambda^T \b{b}$, we have $\lambda^T =a^T\b{X}$. This implies that
$$
\lambda^T \b{\hat b}=a^T\b{X}\b{\hat b}=a^T P_\b{X}\b{y} \quad (\star\star)
$$
Then, $\PE[\lambda^T \b{\hat b}]=a^T P_\b{X} \b{X}\b{b}=a^T \b{X}\b{b}=\lambda^T \b{b}$, showing that $\lambda^T \b{\hat b}$ is unbiased.
Moreover, using again ($\star\star$), $\lambda^T \b{\hat b}=a^T P^T_\b{X}\b{y}=(P_\b{X}a)^T \b{y}$ so that, by ($\star$) (see [[world:gauss_markov#star]]),
$$
\Var(\lambda^T \b{\hat b})= \sigma^2 \|P_\b{X}a\|^2 \leq \sigma^2 \|a\|^2 =\Var(a^T \b{y})
$$
with equality only if $P_\b{X}a=a$ which implies $\lambda^T \b{\hat b}=(P_\b{X}a)^T \b{y}=a^T \b{y}$.
In the course of the proof, we have seen that if $\lambda^T \b{b}$ is estimable, then choosing $a$ such that $\lambda^T=a^T\b{X}$,
$$
\Var(\lambda^T \b{\hat b})= \sigma^2 \|P_\b{X}a\|^2=\sigma^2 a^T P_\b{X}^T P_\b{X} a=\sigma^2 a^T P_\b{X} a=\sigma^2 a^T \b{X} (\b{X}^T\b{X})^g \b{X}^T a=\sigma^2 \lambda^T (\b{X}^T\b{X})^g \lambda,
$$
and a side effect is that $\lambda^T (\b{X}^T\b{X})^g \lambda$ does not depend on the chosen pseudo-inverse whenever $\lambda \in \img(\b{X}^T)$.
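Both the Gauss-Markov inequality and this closed-form variance can be verified numerically. In the sketch below, the random design, the choice $\sigma^2=1$, and the particular $a$ (from which $\lambda^T = a^T \b{X}$ is built so that $a^T\b{y}$ is unbiased) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((7, 3))
G = np.linalg.pinv(X.T @ X)
P = X @ G @ X.T                         # P_X

# Any linear unbiased estimator a^T y of lambda^T b satisfies lambda^T = a^T X.
a = rng.standard_normal(7)
lam = X.T @ a

sigma2 = 1.0
var_lin  = sigma2 * (a @ a)             # Var(a^T y)          = sigma^2 ||a||^2
var_blue = sigma2 * (a @ P @ a)         # Var(lambda^T b_hat) = sigma^2 ||P_X a||^2
                                        # (P symmetric idempotent: a^T P a = ||P a||^2)

# Gauss-Markov inequality: the BLUE has the smallest variance.
assert var_blue <= var_lin + 1e-12

# Closed form: Var(lambda^T b_hat) = sigma^2 lambda^T (X^T X)^g lambda.
assert np.isclose(var_blue, sigma2 * (lam @ G @ lam))
```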
----
[[https://youtu.be/n8ROeIlZkSI|Video of a short course on the Gauss Markov Theorem]]