QUESTION:
What
is the relationship between the variance and entropy of independent
one-hot vectors?
This document proves inequalities relating variance, collision
entropy and Shannon entropy of sequences of independent one-hot
vectors.
Introduction
Both variance and entropy are measures of uncertainty. Variance presumes that values lie as points in a space where distances between them are meaningful. In this document, the variance of a random vector means its expected squared distance from its mean, which equals the sum of the variances of its components.
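To make this definition concrete, here is a minimal numerical sketch (using NumPy, with an arbitrary made-up sample array) showing that the sum of per-component variances coincides with the mean squared Euclidean distance from the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random((10_000, 3))   # hypothetical random vectors in R^3

# Sum of the per-component variances ...
sum_component_vars = samples.var(axis=0).sum()
# ... equals the mean squared Euclidean distance from the mean vector.
mean_sq_dist = ((samples - samples.mean(axis=0)) ** 2).sum(axis=1).mean()

print(sum_component_vars, mean_sq_dist)   # the two values agree
```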
Random one-hot vectors are a convenient spatial representation for categorical random variables. A one-hot vector has all components equal to 0 except one component that equals 1.
This representation has been used in genetics [1]. For genetic loci with only two alleles, the two components of the one-hot vector are mutually redundant: either one determines the other. "Half" of such a one-hot vector, i.e., a single component, is typically used in genetics (e.g. [2] p. 40, [3], [4]). The variance of the "half one-hot vector" is exactly half the variance of the full one-hot vector.
Main Result
Given $N$ independent random one-hot vectors $X_1, X_2, \ldots, X_N$, define $\vec{X} = X_1 \times X_2 \times \cdots \times X_N$ as the Cartesian product.
The variance of $\vec{X}$ can be adjusted to form a lower bound to the collision entropy, $H_2(\vec{X})$, and Shannon entropy, $H(\vec{X})$:

$$-\log_2\left(1 - \frac{\mathrm{Var}(\vec{X})}{N}\right) \le \frac{H_2(\vec{X})}{N} \le \frac{H(\vec{X})}{N}$$
If every $X_i$ takes only two equally likely values, then the lower bounds reach equality:

$$-\log_2\left(1 - \frac{\mathrm{Var}(\vec{X})}{N}\right) = \frac{H_2(\vec{X})}{N} = \frac{H(\vec{X})}{N} = 1$$
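The bounds can be checked numerically. The sketch below (the helper `bounds` and the example probabilities are hypothetical) computes the adjusted variance, $H_2(\vec{X})/N$, and $H(\vec{X})/N$ directly from the $p_{i,j}$, and also verifies the equality case for equally likely binary variables:

```python
import numpy as np

def bounds(dists):
    """dists: one probability vector per one-hot variable (all probabilities > 0)."""
    N = len(dists)
    var_total = sum(1 - np.sum(p ** 2) for p in dists)   # each vector's variance is 1 - sum p^2
    h2 = sum(-np.log2(np.sum(p ** 2)) for p in dists)    # collision entropy, additive
    h = sum(-np.sum(p * np.log2(p)) for p in dists)      # Shannon entropy, additive
    return -np.log2(1 - var_total / N), h2 / N, h / N

# Arbitrary example distributions: the inequality chain holds.
lower, h2_rate, h_rate = bounds([np.array([0.5, 0.3, 0.2]), np.array([0.9, 0.1])])
assert lower <= h2_rate <= h_rate

# Two equally likely values per variable: all three quantities equal 1.
print(bounds([np.array([0.5, 0.5])] * 4))   # (1.0, 1.0, 1.0)
```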
Proof
Let $M_i$ be the length of $X_i$ (the number of categorical values represented by $X_i$). Let $p_{i,j}$ represent the probability of $X_i$ taking the $j$-th categorical value. For every $1 \le i \le N$,

$$\sum_{j=1}^{M_i} p_{i,j} = 1$$
The expectation and variance of the $i$-th one-hot vector $X_i$ are

$$E(X_i) = (p_{i,1}, p_{i,2}, \ldots, p_{i,M_i})$$

$$\begin{aligned}
\mathrm{Var}(X_i) &= \sum_{j=1}^{M_i} p_{i,j}\left[(1 - p_{i,j})^2 + \sum_{k \ne j}(0 - p_{i,k})^2\right] \\
&= \sum_{j=1}^{M_i} p_{i,j}\left[1 - 2p_{i,j} + \sum_{k=1}^{M_i} p_{i,k}^2\right] \\
&= 1 - 2\sum_{j=1}^{M_i} p_{i,j}^2 + \sum_{k=1}^{M_i} p_{i,k}^2 \\
&= 1 - \sum_{j=1}^{M_i} p_{i,j}^2
\end{aligned}$$
Thus the variance of $X_i$ equals the probability that two independent samples from $X_i$ are distinct. This probability of distinction has been called logical entropy [5].
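A small Monte Carlo sketch (with hypothetical category probabilities) illustrating that the one-hot variance $1 - \sum_j p_{i,j}^2$ matches the probability that two independent draws land in different categories:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])        # hypothetical category probabilities

var_one_hot = 1 - np.sum(p ** 2)     # variance of the one-hot vector: 0.62

# Monte Carlo estimate of P(two independent draws fall in different categories).
a = rng.choice(len(p), size=200_000, p=p)
b = rng.choice(len(p), size=200_000, p=p)
prob_distinct = np.mean(a != b)

print(var_one_hot, prob_distinct)    # both are approximately 0.62
```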
Since the variance defined here is additive over the concatenated components, the total variance can be adjusted to equal the average probability of one-hot vector repetition (per one-hot vector):

$$1 - \frac{\mathrm{Var}(\vec{X})}{N} = 1 - \frac{1}{N}\sum_{i=1}^{N} \mathrm{Var}(X_i) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M_i} p_{i,j}^2$$
Taking the negative logarithm and applying Jensen's inequality (using the convexity of $-\log_2$) then establishes yet another lower bound:

$$-\log_2\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M_i} p_{i,j}^2\right) \le \frac{1}{N}\sum_{i=1}^{N}\left(-\log_2\sum_{j=1}^{M_i} p_{i,j}^2\right) = \frac{1}{N}\sum_{i=1}^{N} H_2(X_i)$$
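Both of these steps can be illustrated numerically (again with hypothetical probabilities): the adjusted total variance equals the average collision probability, and Jensen's inequality bounds its negative log by the average per-variable collision entropy:

```python
import numpy as np

dists = [np.array([0.5, 0.3, 0.2]), np.array([0.9, 0.1]), np.array([0.25, 0.25, 0.5])]
N = len(dists)

collision = np.array([np.sum(p ** 2) for p in dists])  # per-variable P(two draws coincide)
var_total = np.sum(1 - collision)                      # variance of the concatenated vector

# The adjusted total variance is the average collision probability ...
assert np.isclose(1 - var_total / N, collision.mean())
# ... and Jensen's inequality bounds its negative log by the average collision entropy.
assert -np.log2(collision.mean()) <= np.mean(-np.log2(collision))
```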
Collision and Shannon entropy are additive for independent variables, and collision entropy never exceeds Shannon entropy (Rényi entropy is non-increasing in its order). Putting everything together we get

$$-\log_2\left(1 - \frac{\mathrm{Var}(\vec{X})}{N}\right) \le \frac{H_2(\vec{X})}{N} \le \frac{H(\vec{X})}{N}$$
References
1.
Menozzi P, Piazza A, Cavalli-Sforza L (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786–792. https://doi.org/10.1126/science.356262
2.
Weir BS (1996) Genetic data analysis II: Methods for discrete population genetic data. Sinauer Associates, Sunderland, Mass
5.
Ellerman D (2017) Logical information theory: New logical foundations for information theory. Logic Journal of the IGPL 25:806–835. https://doi.org/10.1093/jigpal/jzx022