QUESTION:
What is the relationship between the variance and entropy of independent one-hot vectors?

This document proves inequalities relating variance, collision entropy, and Shannon entropy of sequences of independent one-hot vectors.
Introduction
Both variance and entropy are measures of uncertainty. Variance presumes that values are points in a space, separated by distances. In this document, the variance of a random vector refers to its expected squared distance from its mean, which equals the sum of the variances of its components.
Random one-hot vectors are a convenient spatial representation for categorical random variables. A one-hot vector has all components equal to 0 except one component, which equals 1.
This representation has been used in genetics [1]. For genetic loci with only two alleles, a one-hot vector has two mutually redundant components, so "half" of such one-hot vectors (a single component) are typically used in genetics (e.g. [2] p. 40, [3], [4]). The variance of the "half one-hot vector" is exactly half the variance of its full one-hot vector.
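For example, if the first allele has probability $p$, the full one-hot vector is $(X,\,1-X)$ with $X \sim \mathrm{Bernoulli}(p)$, while the half one-hot vector is just $X$:

$$\operatorname{Var}\big((X,\,1-X)\big) = p(1-p) + p(1-p) = 2\,p(1-p), \qquad \operatorname{Var}(X) = p(1-p)$$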
Main Result
Given $N$ independent random one-hot vectors $X_1, X_2, \ldots, X_N$, define $X_* = X_1 \times X_2 \times \cdots \times X_N$ as their Cartesian product. The variance of $X_*$ can be adjusted to form a lower bound on the collision entropy, $H_2(X_*)$, and the Shannon entropy, $H(X_*)$, per one-hot vector:

$$-\log_2\!\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}$$
If every $X_i$ takes only two equally likely values, then the lower bounds reach equality:

$$-\log_2\!\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) = \frac{H_2(X_*)}{N} = \frac{H(X_*)}{N} = 1$$
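As a numerical illustration (a minimal sketch, not part of the proof), the following Python snippet checks the inequality chain on randomly drawn categorical distributions, computing $\operatorname{Var}(X_*)$ as the sum of component variances as defined in the Introduction, and also checks the equality case of fair binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pmf(m, rng):
    """Random probability vector of length m (illustrative helper)."""
    w = rng.random(m)
    return w / w.sum()

N = 5
pmfs = [random_pmf(rng.integers(2, 6), rng) for _ in range(N)]

# Var(X_i) = 1 - sum_j p_{i,j}^2; Var(X_*) sums over the independent X_i.
var_star = sum(1.0 - np.sum(p**2) for p in pmfs)

# Collision and Shannon entropies are additive over independent X_i.
h2_star = sum(-np.log2(np.sum(p**2)) for p in pmfs)
h_star = sum(-np.sum(p * np.log2(p)) for p in pmfs)

lower = -np.log2(1.0 - var_star / N)
print(lower, h2_star / N, h_star / N)      # printed in non-decreasing order

# Equality case: every X_i uniform over two values gives 1 = 1 = 1.
fair = [np.array([0.5, 0.5])] * N
var_fair = sum(1.0 - np.sum(p**2) for p in fair)
h2_fair = sum(-np.log2(np.sum(p**2)) for p in fair)
print(-np.log2(1.0 - var_fair / N), h2_fair / N)   # both 1.0
```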
Proof
Let $M_i$ be the length of $X_i$ (the number of categorical values represented by $X_i$). Let $p_{i,j}$ denote the probability of $X_i$ taking the $j$-th categorical value. For every $1 \le i \le N$,

$$\sum_{j=1}^{M_i} p_{i,j} = 1$$
The expectation and variance of the $i$-th one-hot vector $X_i$ are

$$\begin{aligned}
E(X_i) &= (p_{i,1}, p_{i,2}, \ldots, p_{i,M_i}) \\
\operatorname{Var}(X_i) &= \sum_{j=1}^{M_i} p_{i,j}\left[(1 - p_{i,j})^2 + \sum_{k \ne j}(0 - p_{i,k})^2\right] \\
&= \sum_{j=1}^{M_i} p_{i,j}\left[1 - 2 p_{i,j} + \sum_{k=1}^{M_i} p_{i,k}^2\right] \\
&= 1 - 2\sum_{j=1}^{M_i} p_{i,j}^2 + \sum_{k=1}^{M_i} p_{i,k}^2 \\
&= 1 - \sum_{j=1}^{M_i} p_{i,j}^2
\end{aligned}$$
Thus the variance of $X_i$ equals the probability of two independent samples from $X_i$ being distinct. This probability of distinction has been called logical entropy [5].
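A quick Monte Carlo check of this interpretation (an illustrative sketch with made-up probabilities) compares the empirical sum of component variances, the empirical rate at which two independent samples differ, and the closed form $1 - \sum_j p_{i,j}^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.3])              # illustrative probabilities

# Empirical sum of component variances of the one-hot vector.
samples = rng.choice(len(p), size=200_000, p=p)
one_hot = np.eye(len(p))[samples]
empirical_var = one_hot.var(axis=0).sum()

# Empirical probability that two independent samples are distinct.
resamples = rng.choice(len(p), size=200_000, p=p)
distinct_rate = np.mean(samples != resamples)

closed_form = 1.0 - np.sum(p**2)           # 0.62 for these probabilities
print(empirical_var, distinct_rate, closed_form)
```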
The complement $1 - \operatorname{Var}(X_i) = \sum_{j=1}^{M_i} p_{i,j}^2$ is the chance of repetition, i.e. the expected probability of the sampled value. Taking the negative log gives the Rényi entropy of order 2, also called collision entropy:

$$-\log_2\big(1 - \operatorname{Var}(X_i)\big) = -\log_2\!\left(\sum_{j=1}^{M_i} p_{i,j}^2\right) = H_2(X_i)$$
Since the negative log is a convex function, the negative log of the expected probability (collision entropy) is a lower bound on the expected negative log of probability (Shannon entropy) by Jensen's inequality:

$$H_2(X_i) = -\log_2\!\left(\sum_{j=1}^{M_i} p_{i,j}^2\right) \le \sum_{j=1}^{M_i} p_{i,j}\left(-\log_2 p_{i,j}\right) = H(X_i)$$
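For a concrete instance (with illustrative numbers), take $p_i = (0.7, 0.2, 0.1)$:

$$H_2(X_i) = -\log_2(0.49 + 0.04 + 0.01) \approx 0.89 \;\le\; -0.7\log_2 0.7 - 0.2\log_2 0.2 - 0.1\log_2 0.1 \approx 1.16 = H(X_i)$$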
The total variance can be adjusted to equal the average probability of one-hot vector repetition (per one-hot vector):

$$1 - \frac{\operatorname{Var}(X_*)}{N} = 1 - \frac{1}{N}\sum_{i=1}^{N}\operatorname{Var}(X_i) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M_i} p_{i,j}^2$$
Taking the negative log and applying Jensen's inequality again establishes yet another lower bound:

$$-\log_2\!\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M_i} p_{i,j}^2\right) \le \frac{1}{N}\sum_{i=1}^{N}\left(-\log_2\sum_{j=1}^{M_i} p_{i,j}^2\right) = \frac{1}{N}\sum_{i=1}^{N} H_2(X_i)$$
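As a concrete instance of this averaging step (illustrative numbers), suppose $N = 2$ with collision probabilities $\sum_j p_{1,j}^2 = 0.5$ and $\sum_j p_{2,j}^2 = 0.9$:

$$-\log_2\!\left(\frac{0.5 + 0.9}{2}\right) \approx 0.51 \;\le\; \frac{-\log_2 0.5 - \log_2 0.9}{2} \approx 0.58$$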
Both collision entropy and Shannon entropy are additive for independent variables, so $\frac{1}{N}\sum_{i=1}^{N} H_2(X_i) = \frac{H_2(X_*)}{N}$ and $\frac{1}{N}\sum_{i=1}^{N} H(X_i) = \frac{H(X_*)}{N}$. Putting everything together, we get

$$-\log_2\!\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}$$
References
1. Menozzi P, Piazza A, Cavalli-Sforza L (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786–792. https://doi.org/10.1126/science.356262
2. Weir BS (1996) Genetic data analysis II: Methods for discrete population genetic data. Sinauer Associates, Sunderland, Mass
5. Ellerman D (2017) Logical information theory: New logical foundations for information theory. Logic Journal of the IGPL 25:806–835. https://doi.org/10.1093/jigpal/jzx022