Definition
Measure theory
To define the Hellinger distance in terms of measure theory, let $P$ and $Q$ denote two probability measures on a measure space $\mathcal{X}$ that are absolutely continuous with respect to an auxiliary measure $\lambda$. Such a measure always exists, e.g. $\lambda = (P + Q)$. The square of the Hellinger distance between $P$ and $Q$ is defined as the quantity

$$H^{2}(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^{2} \lambda(dx).$$
Here, $P(dx) = p(x)\,\lambda(dx)$ and $Q(dx) = q(x)\,\lambda(dx)$, i.e. $p$ and $q$ are the Radon–Nikodym derivatives of $P$ and $Q$ respectively with respect to $\lambda$. This definition does not depend on $\lambda$: the Hellinger distance between $P$ and $Q$ does not change if $\lambda$ is replaced with a different probability measure with respect to which both $P$ and $Q$ are absolutely continuous. For compactness, the above formula is often written as

$$H^{2}(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{P(dx)} - \sqrt{Q(dx)} \right)^{2}.$$
Probability theory using Lebesgue measure
To define the Hellinger distance in terms of elementary probability theory, we take $\lambda$ to be the Lebesgue measure, so that $dP/d\lambda$ and $dQ/d\lambda$ are simply probability density functions. If we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral
$$H^{2}(f,g) = \frac{1}{2} \int \left( \sqrt{f(x)} - \sqrt{g(x)} \right)^{2} \, dx = 1 - \int \sqrt{f(x)\,g(x)} \, dx,$$
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance $H(P,Q)$ satisfies the property (derivable from the Cauchy–Schwarz inequality)

$$0 \leq H(P,Q) \leq 1.$$
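As a quick numerical illustration of the definition, the form $H^{2}(f,g) = 1 - \int \sqrt{f(x)g(x)}\,dx$ can be approximated by simple quadrature. A minimal Python sketch (all names here are illustrative, not from any particular library), checked against two unit-variance normal densities:

```python
import math

def hellinger_sq(f, g, lo, hi, n=200_000):
    """Approximate H^2(f, g) = 1 - integral of sqrt(f(x) g(x)) dx by the
    midpoint rule on [lo, hi]; the tails outside [lo, hi] must be negligible."""
    dx = (hi - lo) / n
    bc = dx * sum(
        math.sqrt(f(lo + (i + 0.5) * dx) * g(lo + (i + 0.5) * dx))
        for i in range(n)
    )
    return 1.0 - bc

def normal_pdf(mu, sigma=1.0):
    # Density of N(mu, sigma^2)
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi)
    )

# N(0,1) vs N(1,1); the closed form given later in this article is 1 - exp(-1/8)
h2 = hellinger_sq(normal_pdf(0.0), normal_pdf(1.0), -12.0, 13.0)
```

The bound $0 \leq H \leq 1$ holds automatically here because the Bhattacharyya integral of two densities lies in $[0, 1]$.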
Discrete distributions
For two discrete probability distributions $P = (p_{1}, \ldots, p_{k})$ and $Q = (q_{1}, \ldots, q_{k})$, their Hellinger distance is defined as

$$H(P,Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_{i}} - \sqrt{q_{i}} \right)^{2}},$$
which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.
$$H(P,Q) = \frac{1}{\sqrt{2}} \bigl\| \sqrt{P} - \sqrt{Q} \bigr\|_{2}.$$
Also,

$$1 - H^{2}(P,Q) = \sum_{i=1}^{k} \sqrt{p_{i} q_{i}}.$$
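A direct implementation of the discrete formula, together with a check of the identity just stated, might look like this (a sketch; function and variable names are illustrative):

```python
import math

def hellinger_discrete(p, q):
    """H(P, Q) = (1/sqrt(2)) * Euclidean norm of sqrt(P) - sqrt(Q)."""
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)) / 2.0
    )

p = [0.36, 0.48, 0.16]
q = [0.30, 0.50, 0.20]
h = hellinger_discrete(p, q)

# The identity 1 - H^2 = sum_i sqrt(p_i q_i) follows by expanding the square
# and using that each distribution sums to 1:
bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```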
Properties
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.
Sometimes the factor $1/\sqrt{2}$ in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q)$, as it can be defined as

$$H(P,Q) = \sqrt{1 - BC(P,Q)}.$$
Hellinger distances are used in the theory of sequential and asymptotic statistics.[5][6]
The squared Hellinger distance between two normal distributions $P \sim \mathcal{N}(\mu_{1}, \sigma_{1}^{2})$ and $Q \sim \mathcal{N}(\mu_{2}, \sigma_{2}^{2})$ is:

$$H^{2}(P,Q) = 1 - \sqrt{\frac{2\sigma_{1}\sigma_{2}}{\sigma_{1}^{2} + \sigma_{2}^{2}}} \, e^{-\frac{1}{4} \frac{(\mu_{1} - \mu_{2})^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2}}}.$$
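This closed form translates to a one-line function. A sketch (illustrative name) with two sanity checks: identical distributions give 0, and unit variances reduce the formula to $1 - e^{-(\mu_{1}-\mu_{2})^{2}/8}$:

```python
import math

def hellinger_sq_normal(mu1, s1, mu2, s2):
    """Closed-form squared Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    v = s1 ** 2 + s2 ** 2
    return 1.0 - math.sqrt(2.0 * s1 * s2 / v) * math.exp(
        -0.25 * (mu1 - mu2) ** 2 / v
    )
```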
The squared Hellinger distance between two multivariate normal distributions $P \sim \mathcal{N}(\mu_{1}, \Sigma_{1})$ and $Q \sim \mathcal{N}(\mu_{2}, \Sigma_{2})$ is[7]

$$H^{2}(P,Q) = 1 - \frac{\det(\Sigma_{1})^{1/4} \det(\Sigma_{2})^{1/4}}{\det\left(\frac{\Sigma_{1}+\Sigma_{2}}{2}\right)^{1/2}} \exp\left\{ -\frac{1}{8} (\mu_{1}-\mu_{2})^{T} \left(\frac{\Sigma_{1}+\Sigma_{2}}{2}\right)^{-1} (\mu_{1}-\mu_{2}) \right\}$$
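For illustration, the bivariate case can be coded without any linear-algebra library by writing out the $2 \times 2$ determinant and inverse by hand. A sketch with illustrative names; with equal identity covariances and means one unit apart it must reduce to the univariate value $1 - e^{-1/8}$:

```python
import math

def hellinger_sq_mvn2(mu1, S1, mu2, S2):
    """Squared Hellinger distance between two bivariate normals.
    Covariances are 2x2 nested sequences; the 2x2 determinant and
    inverse are written out by hand to keep the sketch dependency-free."""
    det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # M = (Sigma_1 + Sigma_2) / 2
    M = [[(S1[i][j] + S2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    dM = det(M)
    inv = [[M[1][1] / dM, -M[0][1] / dM], [-M[1][0] / dM, M[0][0] / dM]]
    d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    coef = det(S1) ** 0.25 * det(S2) ** 0.25 / dM ** 0.5
    return 1.0 - coef * math.exp(-quad / 8.0)

I2 = ((1.0, 0.0), (0.0, 1.0))
```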
The squared Hellinger distance between two exponential distributions $P \sim \mathrm{Exp}(\alpha)$ and $Q \sim \mathrm{Exp}(\beta)$ is:

$$H^{2}(P,Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha + \beta}.$$
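This follows from the Bhattacharyya integral $\int_{0}^{\infty} \sqrt{\alpha\beta}\, e^{-(\alpha+\beta)x/2}\,dx = 2\sqrt{\alpha\beta}/(\alpha+\beta)$, and is a one-liner in code (illustrative name, rate parameterization):

```python
import math

def hellinger_sq_exp(alpha, beta):
    """Closed form for Exp(alpha) vs Exp(beta), with alpha and beta as rates."""
    return 1.0 - 2.0 * math.sqrt(alpha * beta) / (alpha + beta)
```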
The squared Hellinger distance between two Weibull distributions $P \sim \mathrm{W}(k, \alpha)$ and $Q \sim \mathrm{W}(k, \beta)$ (where $k$ is a common shape parameter and $\alpha, \beta$ are the scale parameters respectively) is:

$$H^{2}(P,Q) = 1 - \frac{2(\alpha\beta)^{k/2}}{\alpha^{k} + \beta^{k}}.$$
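A sketch of this closed form (illustrative name). Note the consistency check: for $k = 1$, $\mathrm{W}(1, \alpha)$ is an exponential with scale $\alpha$, i.e. rate $1/\alpha$, and the Weibull formula indeed agrees with the exponential formula above evaluated at rates $1/\alpha$ and $1/\beta$:

```python
import math

def hellinger_sq_weibull(k, alpha, beta):
    """Closed form for W(k, alpha) vs W(k, beta): shared shape k, scales alpha, beta."""
    return 1.0 - 2.0 * (alpha * beta) ** (k / 2.0) / (alpha ** k + beta ** k)
```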
The squared Hellinger distance between two Poisson distributions with rate parameters $\alpha$ and $\beta$, so that $P \sim \mathrm{Poisson}(\alpha)$ and $Q \sim \mathrm{Poisson}(\beta)$, is:

$$H^{2}(P,Q) = 1 - e^{-\frac{1}{2} (\sqrt{\alpha} - \sqrt{\beta})^{2}}.$$
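The closed form can be cross-checked against the discrete definition by summing $\sqrt{p_{k} q_{k}}$ over the Poisson pmfs, truncated where the tail is negligible. A sketch (illustrative names):

```python
import math

def hellinger_sq_poisson(alpha, beta):
    """Closed form: 1 - exp(-(sqrt(alpha) - sqrt(beta))^2 / 2)."""
    return 1.0 - math.exp(-0.5 * (math.sqrt(alpha) - math.sqrt(beta)) ** 2)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Truncated Bhattacharyya sum; for these rates the tail beyond k = 60 is negligible
a, b = 2.0, 5.0
bc = sum(math.sqrt(poisson_pmf(a, k) * poisson_pmf(b, k)) for k in range(60))
```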
The squared Hellinger distance between two beta distributions $P \sim \mathrm{Beta}(a_{1}, b_{1})$ and $Q \sim \mathrm{Beta}(a_{2}, b_{2})$ is:

$$H^{2}(P,Q) = 1 - \frac{B\left(\frac{a_{1}+a_{2}}{2}, \frac{b_{1}+b_{2}}{2}\right)}{\sqrt{B(a_{1}, b_{1})\, B(a_{2}, b_{2})}}$$

where $B$ is the beta function.
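Since $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, this can be evaluated with the standard-library `math.lgamma`, working in log space for numerical stability. A sketch (illustrative names):

```python
import math

def log_beta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def hellinger_sq_beta(a1, b1, a2, b2):
    """Closed-form squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2)."""
    log_term = log_beta((a1 + a2) / 2.0, (b1 + b2) / 2.0) - 0.5 * (
        log_beta(a1, b1) + log_beta(a2, b2)
    )
    return 1.0 - math.exp(log_term)
```

For example, Beta(1, 1) vs Beta(2, 2) gives $1 - (\pi/8)\sqrt{6}$, since $B(1,1)=1$, $B(2,2)=1/6$, and $B(3/2,3/2)=\pi/8$.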
The squared Hellinger distance between two gamma distributions $P \sim \mathrm{Gamma}(a_{1}, b_{1})$ and $Q \sim \mathrm{Gamma}(a_{2}, b_{2})$ is:

$$H^{2}(P,Q) = 1 - \Gamma\left(\frac{a_{1}+a_{2}}{2}\right) \left(\frac{b_{1}+b_{2}}{2}\right)^{-(a_{1}+a_{2})/2} \sqrt{\frac{b_{1}^{a_{1}}\, b_{2}^{a_{2}}}{\Gamma(a_{1})\, \Gamma(a_{2})}}$$

where $\Gamma$ is the gamma function.
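This formula corresponds to the shape/rate parameterization (density $b^{a} x^{a-1} e^{-bx}/\Gamma(a)$). A log-space sketch using `math.lgamma` (illustrative name); note that with $a_{1} = a_{2} = 1$ it must reduce to the exponential formula above:

```python
import math

def hellinger_sq_gamma(a1, b1, a2, b2):
    """Closed form for Gamma(shape a, rate b), evaluated in log space."""
    log_term = (
        math.lgamma((a1 + a2) / 2.0)
        - (a1 + a2) / 2.0 * math.log((b1 + b2) / 2.0)
        + 0.5 * (a1 * math.log(b1) + a2 * math.log(b2)
                 - math.lgamma(a1) - math.lgamma(a2))
    )
    return 1.0 - math.exp(log_term)
```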