A multivariate extension of Azadkia-Chatterjee's rank coefficient
The Azadkia-Chatterjee coefficient is a rank-based measure of dependence between a random variable $Y \in \mathbb{R}$ and a random vector ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$. In this paper, we propose a multivariate extension that measures the dependence between random vectors ${\boldsymbol Y} \in \mathbb{R}^{d_Y}$ and ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$, based on $n$ i.i.d. samples. The proposed coefficient converges almost surely to a limit with the following properties: i) it lies in $[0, 1]$; ii) it is equal to zero if and only if ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent; and iii) it is equal to one if and only if ${\boldsymbol Y}$ is almost surely a function of ${\boldsymbol Z}$. Remarkably, the only assumption required by this convergence is that ${\boldsymbol Y}$ is not almost surely a constant vector. We further prove that under the same mild condition and after a proper scaling, this coefficient converges in distribution to a standard normal random variable when ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent. This asymptotic normality result allows us to construct a Wald-type hypothesis test of independence based on this coefficient. To compute this coefficient, we propose a merge sort based algorithm that runs in $O(n (\log n)^{d_Y})$. Finally, we show that it can be used to measure the conditional dependence between ${\boldsymbol Y}$ and ${\boldsymbol Z}$ conditional on a third random vector ${\boldsymbol X}$, and prove that the measure is monotonic with respect to the deviation from an independence distribution under certain model restrictions.