{{page>:defs}} ====== Statement ====== Let $\seq{X_n}{n \in \nset}$ be a sequence of iid random vectors taking values on $\rset^p$ on the same probability space $(\Omega,\mcf, \PP)$. For any $k \in [1:n]$, denote by $X^k_{(n)}$ the $k$-th nearest neighboor of $X_0$ among the set $\set{X_k}{k \in [1:n]}$, that is we have the properties * $\|X_0-X^1_{(n)}\| \leq \|X_0-X^2_{(n)}\| \leq \ldots \leq \|X_0-X^n_{(n)}\|$. * $\{X^1_{(n)}, \ldots, X^n_{(n)}\}=\{X_1,\ldots,X_n\}$. The following proposition proves the consistency property of the kNN (kth nearest neighboor). **__Proposition__** For any $k \in \nset$, we have $\PP$-a.s., $$ \lim_{n \to \infty} X^k_{(n)}=X_0 $$ ===== Proof ===== Note that since $n \mapsto \|X_0-X^k_{(n)}\|$ is non-increasing, $\lim_n X^k_{(n)}=X_0$ means that for any $\epsilon \in \bbQ_+^*$, the open ball $\ball{X_0,\epsilon}$ contains at least $k$ points among $(X_n)_{n \geq 1}$. Conversely, if $\lim_n X^k_{(n)}=X_0$ does not hold, it means that for some $\epsilon \in \bbQ_+^*$, the ball $\ball{X_0,\epsilon}$ contains at most $k-1$ points among $(X_n)_{n \geq 1}$. This in turn implies that $\| X_0- X_{m} \| \geq \epsilon$ for any sufficiently large $m$. Hence, $$ \{\lim_{n \to \infty} X^k_{(n)}=X_0\}^c \subset \cup_{(\epsilon,n) \in \bbQ^*_+ \times \nset^*}A_{\epsilon,n} \quad \mbox{where} \quad A_{\epsilon,n}=\{\forall m \geq n, \ \| X_0- X_{m} \| \geq \epsilon\} $$ From this inclusion property, in order to show that $\PP$-a.s., $\lim_{n \to \infty} X^k_{(n)}=X_0$ and hence $\PP\lr{\{\lim_{n \to \infty} X^k_{(n)}=X_0\}^c}=0$, we only need to show that $\PP(A_{\epsilon,n})=0$ for any $ (\epsilon,n) \in \bbQ^*_+ \times \nset^* $. Using that the $(X_m)_{m \geq 0}$ are iid, \begin{align*} \PP(A_{\epsilon,n})&=\PP(A_{\epsilon,1})=\PP(\cap_{m=1}^\infty \{\| X_0- X_{m} \| \geq \epsilon\})=\PE\lrb{\lim_{\ell \to \infty} \PE\lrb{\left. \prod_{m=1}^\ell\indiacc{\| X_0- X_{m} \| \geq \epsilon} \right| X_0 }}= \PE\lrb{\lim_{\ell \to \infty} h(X_0)^\ell} \end{align*} where $h(X_0)=\PP\lr{ \| X_0- X_{1} \| \geq \epsilon | X_0 }$. To complete the proof, we only need to show that $h(X_0) \in [0,1)$, $\PP$-a.s. Equivalently, since we already have $h(X_0) \in [0,1]$, we only need to show that \begin{equation} \label{eq:h} \PP\lr{ h(X_0)=1}=0 \end{equation} Choose a $\epsilon/2$-net countable covering of $\rset^p$, that is $\rset^p \subset \cup_{n \in \nset} A_n$ where $A_n = \ball{a_n,\epsilon/2}$. Note that on $\{h(X_0)=1\}\cap \{X_0 \in A_n\} $, \begin{align*} 1=h(X_0)=\PP\lr{ \| X_0- X_{1} \| \geq \epsilon | X_0 } &\leq \PP\lr{ \| X_0-a_n\| + \| a_n-X_{1} \| \geq \epsilon | X_0 } \\ &\leq \PP\lr{ \epsilon/2 + \| a_n-X_{1} \| \geq \epsilon | X_0 }= \PP\lr{\| a_n-X_{1} \|\geq \epsilon/2 | X_0 }\\ &=\PP\lr{\| a_n-X_{1} \| \geq \epsilon/2 }=\PP(X_1 \notin A_n)=\PP(X_0 \notin A_n) \end{align*} Hence, $\{h(X_0)=1\}\cap \{X_0 \in A_n\} \subset \{1=\PP(X_0 \notin A_n)\}\cap \{X_0 \in A_n\}$ $$ \PP\lr{h(X_0)=1,X_0 \in A_n} \leq \PP\lr{1 = \PP(X_0 \notin A_n),X_0 \in A_n}=\indiacc{1 = \PP(X_0 \notin A_n)} \PP(X_0 \in A_n) =0 $$ This implies that $$ \PP\lr{ h(X_0)=1} = \PP\lr{ h(X_0)=1, X_0 \in \cup_{n \in \nset}A_n } \leq \sum_{n \in \nset} \PP\lr{h(X_0)=1,X_0 \in A_n}=0 $$ which proves \eqref{eq:h} and the proof of the Proposition is completed.