From Kullback-Leibler Divergence to Jensen–Shannon Metric

Kullback-Leibler divergence

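Kullback-Leibler divergence is defined for two random variables X and Y, taken here to be discrete with probability mass functions p_X and p_Y, by

\[ KL(X \,\|\, Y) = \sum_x p_X(x) \log \frac{p_X(x)}{p_Y(x)} \]

For continuous random variables the sum becomes an integral over densities.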

K-L divergence is non-negative, and it’s zero if and only if X and Y have the same distribution. But it is not a metric. For one thing, it’s not symmetric: in general KL(X || Y) ≠ KL(Y || X).
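The asymmetry is easy to see numerically. The following is a minimal sketch for the Bernoulli case; the function name kl_bernoulli and the parameter values 0.1 and 0.2 are chosen here for illustration.

```python
from math import log

def kl_bernoulli(p, q):
    """KL(X || Y) for X ~ Bernoulli(p) and Y ~ Bernoulli(q), with 0 < p, q < 1."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

forward = kl_bernoulli(0.1, 0.2)   # divergence of Bernoulli(0.1) from Bernoulli(0.2)
backward = kl_bernoulli(0.2, 0.1)  # same pair, arguments swapped
print(forward, backward)  # roughly 0.0367 and 0.0444
```

The two directions differ, while kl_bernoulli(p, p) is zero for any valid p.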

Jeffreys divergence

We can fix the symmetry problem by defining

\[ J(X, Y) = KL(X \,\|\, Y) + KL(Y \,\|\, X) \]

The J stands for Jeffreys, after Harold Jeffreys. J is called either the symmetrized K-L divergence or Jeffreys’ divergence. It’s still a divergence, not a distance.
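Expanding the two K-L terms makes the symmetry visible. The following is a short derivation for the discrete case, taking J(X, Y) = KL(X || Y) + KL(Y || X) and writing p_i and q_i for the probabilities that X and Y assign to outcome i (this notation is assumed here):

```latex
\begin{aligned}
J(X, Y) &= \sum_i p_i \log\frac{p_i}{q_i} + \sum_i q_i \log\frac{q_i}{p_i} \\
        &= \sum_i (p_i - q_i) \log\frac{p_i}{q_i}
\end{aligned}
```

Swapping p and q flips the sign of both factors in every term, so the sum is unchanged.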

A distance (metric) d has to have four properties:

d(x, x) = 0

d(x, y) > 0 if x ≠ y

d(x, y) = d(y, x)

d(x, z) ≤ d(x, y) + d(y, z)

K-L divergence satisfies the first two properties. Jeffreys’ divergence satisfies the first three, but not the last one, the triangle inequality.

To show that J doesn’t satisfy the triangle inequality, let X, Y, and Z be Bernoulli random variables with p equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from X to Y, plus the divergence from Y to Z, is less than the divergence from X to Z. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.

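    from math import log

    def kl(p, q):
        # K-L divergence between Bernoulli(p) and Bernoulli(q)
        return p*log(p/q) + (1-p)*log((1-p)/(1-q))

    def j(p, q):
        # Jeffreys divergence: K-L divergence summed over both directions
        return kl(p, q) + kl(q, p)

    a = j(0.1, 0.2)
    b = j(0.2, 0.3)
    c = j(0.1, 0.3)
    print(a + b, c)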

This prints approximately 0.135 and 0.270, so a + b < c and the triangle inequality fails.
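One way to see why the inequality fails, sketched here for Bernoulli variables only, is to write the Jeffreys divergence in terms of the logit function:

```latex
J(p, q) = (p - q)\bigl[\operatorname{logit}(p) - \operatorname{logit}(q)\bigr],
\qquad \operatorname{logit}(p) = \log\frac{p}{1 - p}
```

For parameters x < y < z, multiplying out (z - x) = (y - x) + (z - y) against the corresponding split of the logit difference gives

```latex
J(x, z) = J(x, y) + J(y, z)
        + (y - x)\bigl[\operatorname{logit}(z) - \operatorname{logit}(y)\bigr]
        + (z - y)\bigl[\operatorname{logit}(y) - \operatorname{logit}(x)\bigr]
```

Both cross terms are positive because logit is increasing, so the direct divergence always exceeds the sum through the middle point. When the spacing is equal, as with 0.1, 0.2, 0.3, the cross terms add up to exactly J(x, y) + J(y, z), which is why the second number printed is twice the first.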

Jensen-Shannon distance

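Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable M to be the equal mixture of X and Y, so that its distribution is the average of the distributions of X and Y. Then average the K-L divergences of X and Y from M. This defines the Jensen-Shannon divergence

\[ JSD(X \,\|\, Y) = \tfrac{1}{2} KL(X \,\|\, M) + \tfrac{1}{2} KL(Y \,\|\, M) \]

It’s still not a metric, but its square root is, which defines the Jensen-Shannon distance

\[ d_{JS}(X, Y) = \sqrt{JSD(X \,\|\, Y)} \]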
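The same construction works for any pair of discrete distributions on a common set of outcomes, not just Bernoulli variables. The following is a rough sketch under that assumption; the names kl_div and js_distance are illustrative, and zero-probability terms are skipped using the usual 0 log 0 = 0 convention.

```python
from math import log, sqrt

def kl_div(p, q):
    """K-L divergence between two discrete distributions given as
    equal-length sequences of probabilities. Terms with p_i == 0 are skipped."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (natural log)."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]  # equal mixture of p and q
    jsd = 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
    return sqrt(max(jsd, 0.0))  # clamp guards against tiny negative rounding error

x = [0.1, 0.9]  # Bernoulli(0.1) written as a two-outcome distribution
y = [0.2, 0.8]  # Bernoulli(0.2)
print(js_distance(x, y), js_distance(y, x))
```

Because the mixture M puts positive probability wherever X or Y does, both K-L terms stay finite even when X and Y have different supports, which is not true of K-L divergence itself.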

The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.

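    def d(p, q):
        # Jensen-Shannon distance between Bernoulli(p) and Bernoulli(q);
        # relies on kl() from the previous listing
        m = 0.5*(p + q)
        jsd = 0.5*kl(p, m) + 0.5*kl(q, m)
        return jsd**0.5

    a = d(0.1, 0.2)
    b = d(0.2, 0.3)
    c = d(0.1, 0.3)
    print(a + b, c)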

This prints approximately 0.1817 and 0.1801. Now a layover makes the trip longer.
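A single example does not establish the triangle inequality, so it can be reassuring to check more triples. The sketch below counts violations over a small grid of Bernoulli parameters for both Jeffreys divergence and Jensen-Shannon distance. It is a numerical spot check, not a proof; the helper triangle_violations, the grid, and the tolerance are all choices made for this illustration.

```python
from itertools import permutations
from math import log, sqrt

def kl(p, q):
    # K-L divergence between Bernoulli(p) and Bernoulli(q)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def jeffreys(p, q):
    # Symmetrized K-L divergence
    return kl(p, q) + kl(q, p)

def js_dist(p, q):
    # Jensen-Shannon distance; the clamp guards against rounding below zero
    m = 0.5 * (p + q)
    return sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

def triangle_violations(dist, points, tol=1e-12):
    """Count ordered triples (x, y, z) of distinct points for which
    dist(x, z) exceeds dist(x, y) + dist(y, z) by more than tol."""
    return sum(
        1
        for x, y, z in permutations(points, 3)
        if dist(x, z) > dist(x, y) + dist(y, z) + tol
    )

grid = [i / 10 for i in range(1, 10)]  # Bernoulli parameters 0.1, ..., 0.9
print("Jeffreys violations:", triangle_violations(jeffreys, grid))
print("Jensen-Shannon violations:", triangle_violations(js_dist, grid))
```

On this grid Jeffreys divergence violates the inequality for some triples, while Jensen-Shannon distance does not.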

One thought on “Turning K-L divergence into a metric”

Emil

27 May 2026 at 23:49

That’s quite funny. All of them, Jeffreys’ divergence, JS divergence, and JS distance, seem to extend K-L in a non-trivial way. I googled ("wikipedied"?) for d_JS and after reading the article I still wonder how one would come up with a definition like that. Thanks for a great write-up!

John D. Cook, PhD
