Jensen–Shannon divergence - Wikipedia
Jensen–Shannon divergence
From Wikipedia, the free encyclopedia
Statistical distance measure
In probability theory and statistics, the Jensen–Shannon divergence, named after Johan Jensen and Claude Shannon, is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1][2] or total divergence to the average.[3] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and always has a finite value. The square root of the Jensen–Shannon divergence is a metric, often referred to as the Jensen–Shannon distance. The closer the Jensen–Shannon distance is to zero, the more similar the two distributions are.[4][5][6]
Definition
Consider the set {\displaystyle M_{+}^{1}(A)} of probability distributions, where {\displaystyle A} is a set provided with some σ-algebra of measurable subsets. In particular, we can take {\displaystyle A} to be a finite or countable set with all subsets being measurable.
The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler divergence {\displaystyle D(P\parallel Q)}. It is defined by
{\displaystyle {\rm {JSD}}(P\parallel Q)={\frac {1}{2}}D(P\parallel M)+{\frac {1}{2}}D(Q\parallel M),}
where {\displaystyle M={\frac {1}{2}}(P+Q)} is a mixture distribution of {\displaystyle P} and {\displaystyle Q}.
The geometric Jensen–Shannon divergence[7] (or G-Jensen–Shannon divergence) yields a closed-form formula for divergence between two Gaussian distributions by taking the geometric mean.
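The two-distribution definition above can be sketched in a few lines of Python. This is a minimal illustration for discrete distributions given as lists of probabilities; the function names `kl_divergence` and `jsd` and the example vectors are assumptions made for this sketch, not part of any library.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms with p[i] == 0 contribute nothing (convention 0 * log 0 = 0).
    Assumes q[i] > 0 wherever p[i] > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi, base)
    return total

def jsd(p, q, base=2.0):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    # M = (P + Q) / 2 is positive wherever P or Q is, so the KL terms are finite.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jsd(p, q))  # symmetric: jsd(q, p) gives the same value
print(jsd(p, p))  # identical distributions give 0
```

Because each distribution is compared against the mixture rather than against the other distribution directly, the result stays finite even when `p` and `q` have different supports, which is where the plain Kullback–Leibler divergence would diverge.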
A more general definition, allowing for the comparison of more than two probability distributions, is:
{\displaystyle {\begin{aligned}{\rm {JSD}}_{\pi _{1},\ldots ,\pi _{n}}(P_{1},P_{2},\ldots ,P_{n})&=\sum _{i}\pi _{i}D(P_{i}\parallel M)\\&=H\left(M\right)-\sum _{i=1}^{n}\pi _{i}H(P_{i})\end{aligned}}}
where
{\displaystyle {\begin{aligned}M&:=\sum _{i=1}^{n}\pi _{i}P_{i}\end{aligned}}}
and {\displaystyle \pi _{1},\ldots ,\pi _{n}} are weights that are selected for the probability distributions {\displaystyle P_{1},P_{2},\ldots ,P_{n}}, and {\displaystyle H(P)} is the Shannon entropy for distribution {\displaystyle P}. For the two-distribution case described above,
{\displaystyle P_{1}=P,\ P_{2}=Q,\ \pi _{1}=\pi _{2}={\frac {1}{2}}.}
Hence, for those distributions {\displaystyle P,Q},
{\displaystyle {\rm {JSD}}=H(M)-{\frac {1}{2}}{\bigg (}H(P)+H(Q){\bigg )}.}
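The two expressions for the generalized divergence (the weighted sum of Kullback–Leibler terms and the entropy difference) can be compared numerically. The following is a sketch for discrete distributions; the helper names (`entropy`, `kl`, `mixture`, `jsd_general_kl`, `jsd_general_entropy`) and the example distributions and weights are assumptions chosen for illustration.

```python
import math

def entropy(p, base=2.0):
    """Shannon entropy H(p), with 0 * log 0 taken as 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0.0)

def kl(p, q, base=2.0):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 where p > 0."""
    return sum(x * math.log(x / y, base) for x, y in zip(p, q) if x > 0.0)

def mixture(dists, weights):
    """Weighted mixture M = sum_i pi_i * P_i."""
    size = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]

def jsd_general_kl(dists, weights, base=2.0):
    """First form: sum_i pi_i * D(P_i || M)."""
    m = mixture(dists, weights)
    return sum(w * kl(d, m, base) for w, d in zip(weights, dists))

def jsd_general_entropy(dists, weights, base=2.0):
    """Second form: H(M) - sum_i pi_i * H(P_i)."""
    m = mixture(dists, weights)
    return entropy(m, base) - sum(w * entropy(d, base) for w, d in zip(weights, dists))

# Hypothetical example: three distributions with non-uniform weights.
dists = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
weights = [0.5, 0.3, 0.2]

a = jsd_general_kl(dists, weights)
b = jsd_general_entropy(dists, weights)
print(a, b)  # the two forms agree up to floating-point rounding
```

The entropy form is usually the cheaper one to evaluate, since it needs one entropy per distribution plus one for the mixture and no per-distribution ratio against `M`.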
Bounds
The Jensen–Shannon divergence is bounded by 1 for two discrete probability distributions, given that one uses the base 2 logarithm:[8]
{\displaystyle 0\leq {\rm {JSD}}(P\parallel Q)\leq 1}
With this normalization, it is a lower bound on the total variation distance between P and Q:
{\displaystyle {\rm {JSD}}(P\parallel Q)\leq {\frac {1}{2}}\|P-Q\|_{1}={\frac {1}{2}}\sum _{\omega \in \Omega }|P(\omega )-Q(\omega )|}
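As a numerical illustration of the inequality above (a spot check on a few examples, not a proof), the base-2 divergence can be compared with the total variation distance. The function names `jsd_base2` and `total_variation` and the example pairs are assumptions made for this sketch.

```python
import math

def jsd_base2(p, q):
    """Jensen-Shannon divergence with base-2 logarithms (range [0, 1])."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(u, v):
        # 0 * log 0 is treated as 0; v > 0 wherever u > 0 because v is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical example pairs, including one with disjoint supports.
pairs = [
    ([0.9, 0.1], [0.5, 0.5]),
    ([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]),
    ([1.0, 0.0], [0.0, 1.0]),
]
for p, q in pairs:
    d, tv = jsd_base2(p, q), total_variation(p, q)
    # Small tolerance guards against floating-point rounding at equality.
    print(f"JSD = {d:.4f}  TV = {tv:.4f}  bound holds: {d <= tv + 1e-12}")
```

The last pair has disjoint supports, so both quantities reach their maximum of 1 and the bound holds with equality.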
With the base-e logarithm, which is commonly used in statistical thermodynamics, the upper bound is {\displaystyle \ln(2)}. In general, the bound in base b is {\displaystyle \log _{b}(2)}:
{\displaystyle 0\leq {\rm {JSD}}(P\parallel Q)\leq \log _{b}(2)}
More generally, for n probability distributions the Jensen–Shannon divergence is bounded by {\displaystyle \log _{b}(n)}:[8]
{\displaystyle 0\leq {\rm {JSD}}_{\pi _{1},\ldots ,\pi _{n}}(P_{1},P_{2},\ldots ,P_{n})\leq \log _{b}(n)}
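The upper bound can be illustrated with a case that attains it: n distributions with pairwise disjoint supports and uniform weights. The sketch below uses the entropy form of the generalized divergence; the names `entropy` and `jsd_weighted` and the example inputs are assumptions for illustration only.

```python
import math

def entropy(p, base):
    """Shannon entropy H(p) in the given base, with 0 * log 0 taken as 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0.0)

def jsd_weighted(dists, weights, base=2.0):
    """Generalized JSD computed as H(M) - sum_i pi_i * H(P_i)."""
    size = len(dists[0])
    m = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]
    return entropy(m, base) - sum(w * entropy(d, base) for w, d in zip(weights, dists))

n = 4
# n point masses on distinct outcomes: pairwise disjoint supports.
one_hot = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
uniform = [1.0 / n] * n

print(jsd_weighted(one_hot, uniform, base=2.0), math.log(n, 2.0))  # both close to log2(4)
print(jsd_weighted(one_hot, uniform, base=math.e), math.log(n))    # both close to ln(4)

# Overlapping distributions (hypothetical values) stay strictly below the bound.
overlapping = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
print(jsd_weighted(overlapping, [1.0 / 3] * 3, base=2.0), math.log(3, 2.0))
```

In the disjoint case every individual entropy is zero and the mixture is uniform over n outcomes, so the divergence equals the entropy of the mixture, which is the bound itself.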
Relation to mutual information
The Jensen–Shannon divergence is the mutual information between a random variable {\displaystyle X} associated to a mixture distribution between {\displaystyle P} and {\displaystyle Q} and the binary indicator variable {\displaystyle Z} that is used to switch between {\displaystyle P} and {\displaystyle Q} to produce the mixture. Let {\displaystyle X} be some abstract function on the underlying set of events that discriminates well between events, and choose the value of {\displaystyle X} according to {\displaystyle P} if {\displaystyle Z=0} and according to {\displaystyle Q} if {\displaystyle Z=1}, where {\displaystyle Z} is equiprobable. That is, we are choosing {\displaystyle X} according to the probability measure {\displaystyle M=(P+Q)/2}, and its distribution is the mixture distribution. We compute
{\displaystyle {\begin{aligned}I(X;Z)&=H(X)-H(X|Z)\\&=-\sum M\log M+{\frac {1}{2}}\left[\sum P\log P+\sum Q\log Q\right]\\&=-\sum {\frac {P}{2}}\log M-\sum {\frac {Q}{2}}\log M+{\frac {1}{2}}\left[\sum P\log P+\sum Q\log Q\right]\\&={\frac {1}{2}}\sum P\left(\log P-\log M\right)+{\frac {1}{2}}\sum Q\left(\log Q-\log M\right)\\&={\rm {JSD}}(P\parallel...
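The relation between the mutual information {\displaystyle I(X;Z)} and the divergence can be checked numerically by building the joint distribution of {\displaystyle (Z,X)} explicitly. This is a sketch for discrete distributions; the function names `mutual_information` and `jsd` and the example vectors are assumptions for illustration.

```python
import math

def mutual_information(joint, base=2.0):
    """I(X;Z) from a joint table joint[z][x] = Pr(Z = z, X = x)."""
    pz = [sum(row) for row in joint]        # marginal of Z
    px = [sum(col) for col in zip(*joint)]  # marginal of X (the mixture M)
    total = 0.0
    for z, row in enumerate(joint):
        for x, pxz in enumerate(row):
            if pxz > 0.0:
                total += pxz * math.log(pxz / (px[x] * pz[z]), base)
    return total

def jsd(p, q, base=2.0):
    """Jensen-Shannon divergence of two discrete distributions."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log(a / b, base) for a, b in zip(u, v) if a > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical example distributions.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

# Z is a fair coin: X is drawn from P when Z = 0 and from Q when Z = 1.
joint = [[0.5 * a for a in p], [0.5 * b for b in q]]

print(mutual_information(joint), jsd(p, q))  # equal up to rounding
```

Summing the joint table over `z` recovers the mixture `M`, which is why the marginal of `X` in `mutual_information` plays the same role as `m` in `jsd`.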