Statistics and Probability Letters

Statistics and Probability Letters 79 (2009) 270–274
journal homepage: www.elsevier.com/locate/stapro
Complete monotonicity of the entropy in the central limit theorem for
gamma and inverse Gaussian distributions
Yaming Yu ∗
Department of Statistics, University of California, Irvine 92697, USA
Abstract
Let Hg(α) be the differential entropy of the gamma distribution Gam(α, √α). It is shown
that (1/2) log(2π e) − Hg (α) is a completely monotone function of α . This refines the
monotonicity of the entropy in the central limit theorem for gamma random variables.
A similar result holds for the inverse Gaussian family. How generally this complete
monotonicity holds is left as an open problem.
© 2008 Elsevier B.V. All rights reserved.
Article history:
Received 16 June 2008
Received in revised form 11 August 2008
Accepted 14 August 2008
Available online 22 August 2008
1. Introduction
The classical central limit theorem (CLT) states that for a sequence of independent and identically distributed (i.i.d.)
random variables Xi, i = 1, 2, . . ., with a finite mean µ = E(X1) and a finite variance σ² = Var(X1), the normalized partial
sum Zn = (Σ_{i=1}^n Xi − nµ)/√(nσ²) tends to N(0, 1) in distribution as n → ∞. There exists an information theoretic version
of the CLT. The differential entropy for a continuous random variable S with density f(s) is defined as
\[ H(S) = -\int_{-\infty}^{\infty} f(s) \log f(s)\, ds. \]
Barron (1986) proved that if the (differential) entropy of Zn is eventually finite, then it tends to (1/2) log(2π e), the entropy
of N (0, 1), as n → ∞. As a consequence of Pinsker’s inequality (Cover and Thomas, 1991), convergence in entropy implies
convergence in distribution in the normal CLT case.
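As a minimal numerical sketch of the definition above, the following evaluates −∫ f log f for the standard normal density and compares it with (1/2) log(2πe). The helper names, the integration window [−12, 12] and the step count are illustrative assumptions, not part of the paper.

```python
import math

def differential_entropy(density, lo, hi, steps=20000):
    """Approximate -integral of f*log(f) over [lo, hi] with the composite midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * width
        f = density(s)
        if f > 0.0:                      # skip points where the density underflows to 0
            total -= f * math.log(f) * width
    return total

def standard_normal_density(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

# Assumed window [-12, 12]: the normal tails outside it are negligible.
numeric = differential_entropy(standard_normal_density, -12.0, 12.0)
exact = 0.5 * math.log(2.0 * math.pi * math.e)
print(numeric, exact)
```

The two printed numbers should agree to many digits, which is the limiting value that H(Zn) approaches in the entropic CLT.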
An interesting feature of the information theoretic CLT is that H (Zn ) increases monotonically in n (which reminds us of
the second law of thermodynamics); since the entropy is invariant under a location shift, equivalently for a sequence of
i.i.d. random variables Xi with a finite second moment, we have
\[ H\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \right) \le H\left( \frac{1}{\sqrt{n+1}} \sum_{i=1}^{n+1} X_i \right), \quad n \ge 1. \tag{1} \]
Although the special case n = 1 is known to follow from Shannon’s classical entropy power inequality (Stam, 1959;
Blachman, 1965), it was not until 2004 that the above inequality was finally proved by Artstein et al. (2004); see also
Madiman and Barron (2007) for generalizations.
∗ Tel.: +1 949 824 7361. E-mail address: yamingy@uci.edu.
doi:10.1016/j.spl.2008.08.008
This paper is concerned with a possible refinement of this monotonicity, at least when the distribution of Xi belongs to
some special families. Recall that a function f(x) on x ∈ (0, ∞) is called completely monotonic if its derivatives of all orders
exist and satisfy
\[ (-1)^k f^{(k)}(x) \ge 0, \quad x > 0, \]
for all k ≥ 0. In particular, a completely monotone function is both monotonically decreasing and convex. A discrete
sequence fn on n = 1, 2, . . . is completely monotonic if
\[ (-1)^k \Delta^k f_n \ge 0, \quad n = 1, 2, \ldots, \]
for all k ≥ 0, where ∆ denotes the first difference operator, i.e., ∆fn = fn+1 − fn. Some basic properties of completely
monotone functions can be found in Feller (1966).
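The discrete definition can be checked on a finite prefix of a sequence. The sketch below uses exact rational arithmetic on fn = 1/n, which is completely monotonic, and on fn = n², which is not. The function names and the cut-offs (20 terms, differences up to order 8) are hypothetical choices for illustration, and a finite check can only refute complete monotonicity, never prove it.

```python
from fractions import Fraction

def forward_differences(seq, order):
    """Apply the first-difference operator (Delta f)_n = f_{n+1} - f_n `order` times."""
    out = list(seq)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def looks_completely_monotone(seq, max_order):
    """Check (-1)^k Delta^k f_n >= 0 for k = 0..max_order on a finite prefix."""
    for k in range(max_order + 1):
        sign = -1 if k % 2 else 1
        if any(sign * d < 0 for d in forward_differences(seq, k)):
            return False
    return True

reciprocals = [Fraction(1, n) for n in range(1, 21)]   # f_n = 1/n
squares = [Fraction(n * n) for n in range(1, 21)]      # f_n = n^2, increasing

print(looks_completely_monotone(reciprocals, 8))   # passes every sign condition
print(looks_completely_monotone(squares, 8))       # fails already at k = 1
```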
Theorem 1. Let Xi, i = 1, 2, . . ., be i.i.d. random variables with distribution F, mean µ, and variance σ² ∈ (0, ∞). Denote
\[ h(n) = \frac{1}{2} \log(2\pi e) - H\left( \frac{1}{\sqrt{n\sigma^2}} \sum_{i=1}^{n} (X_i - \mu) \right), \quad n = 1, 2, \ldots. \]
Then h(n) is a completely monotonic function of n if F is either a gamma distribution or an inverse Gaussian distribution.
Note that in Theorem 1, h(n) is also the relative entropy, or Kullback–Leibler divergence, between (1/√(nσ²)) Σ_{i=1}^n (Xi
− µ) and the standard normal. Theorem 1 indicates that the behavior of relative entropy in the classical CLT is (at least for
gamma and inverse Gaussian families) highly regular in a sense.
In Sections 2 and 3, we derive Theorem 1 for the gamma and inverse Gaussian families respectively. Part of the reason
these two families are chosen is that they are analytically tractable. It seems intuitive that Theorem 1 should hold for a
wide class of distributions, but the proofs presented here depend heavily on properties of the gamma and inverse Gaussian
distributions, and are unlikely to work for the general situation. How Theorem 1 can be extended is an open problem.
2. The gamma case
Let Hg (α, β), α > 0, β > 0, be the entropy of a Gam(α, β) random variable whose density is specified by
\[ p(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)} \beta^{\alpha} x^{\alpha-1} e^{-\beta x}, \quad x > 0, \]
where Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx is Euler’s gamma function. If the Xi’s are i.i.d. with a Gam(α, β) distribution, then the entropy of
the standardized sum is
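\[ H\left( \frac{\beta}{\sqrt{n\alpha}} \sum_{i=1}^{n} \left( X_i - \frac{\alpha}{\beta} \right) \right) = H_g\left( n\alpha, \sqrt{n\alpha} \right). \]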
With a slight abuse of notation let Hg(α) = Hg(α, √α). It is easy to see that Theorem 2 below implies Theorem 1 if the
common distribution of Xi belongs to the gamma family.
Theorem 2. The function (1/2) log(2π e) − Hg(α) is completely monotonic on α ∈ (0, ∞).
Proof. Direct calculation gives
\[ H_g(\alpha) = \log \Gamma(\alpha) + \alpha - \frac{1}{2} \log(\alpha) + (1 - \alpha)\psi(\alpha) \tag{2} \]
where ψ(α) is the digamma function defined by ψ(α) = Γ′(α)/Γ(α). It is clear that (1/2) log(2π e) > Hg(α), because the
standard normal achieves maximum entropy among continuous distributions with unit variance. Moreover,
\[ H_g'(\alpha) = (1 - \alpha)\psi'(\alpha) + 1 - \frac{1}{2\alpha}. \tag{3} \]
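Formulas (2) and (3) are easy to evaluate numerically. The sketch below is one possible way to do so with the Python standard library only: `digamma` and `trigamma` are hypothetical helpers built from the recurrences ψ(x) = ψ(x + 1) − 1/x and ψ′(x) = ψ′(x + 1) + 1/x² together with the standard asymptotic series, and the shift threshold 10 is an assumption chosen for accuracy. It then compares (3) with a central finite difference of (2).

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x                  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1/12 - inv2 * (1/120 - inv2 * (1/252 - inv2 * (1/240 - inv2 / 132))))
    return acc + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x) for x > 0, by the same shift-then-series strategy."""
    acc = 0.0
    while x < 10.0:
        acc += 1.0 / (x * x)            # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1/6 - inv2 * (1/30 - inv2 * (1/42 - inv2 * (1/30 - inv2 * 5/66))))
    return acc + inv + 0.5 * inv2 + series

def gamma_entropy(alpha):
    """H_g(alpha) from formula (2): entropy of Gam(alpha, sqrt(alpha))."""
    return math.lgamma(alpha) + alpha - 0.5 * math.log(alpha) + (1.0 - alpha) * digamma(alpha)

def gamma_entropy_derivative(alpha):
    """H_g'(alpha) from formula (3)."""
    return (1.0 - alpha) * trigamma(alpha) + 1.0 - 0.5 / alpha

alpha, step = 2.5, 1e-5
finite_difference = (gamma_entropy(alpha + step) - gamma_entropy(alpha - step)) / (2.0 * step)
print(finite_difference, gamma_entropy_derivative(alpha))
```

For α = 1 the distribution is the unit exponential, and formula (2) gives Hg(1) = 1, its known entropy.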
By Leibniz’s rule, for k ≥ 2,
\[ (-1)^k H_g^{(k)}(\alpha) = (-1)^k \left[ (1 - \alpha)\psi^{(k)}(\alpha) - (k - 1)\psi^{(k-1)}(\alpha) + \frac{(-1)^k (k - 1)!}{2\alpha^k} \right]. \tag{4} \]
The function ψ (k) (α), k ≥ 1, admits an integral representation (Abramowitz and Stegun, 1964, p. 260)
\[ \psi^{(k)}(\alpha) = (-1)^{k+1} \int_0^\infty \frac{t^k e^{-\alpha t}}{1 - e^{-t}}\, dt. \tag{5} \]
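Lemma 1. If k ≥ 2 then
\[ \alpha\psi^{(k)}(\alpha) + k\psi^{(k-1)}(\alpha) = (-1)^k \int_0^\infty \frac{t^k e^{-\alpha t - t}}{(1 - e^{-t})^2}\, dt. \tag{6} \]
For k = 1 we have
\[ \alpha\psi'(\alpha) = 1 + \int_0^\infty \frac{1 - e^{-t} - te^{-t}}{(1 - e^{-t})^2}\, e^{-\alpha t}\, dt. \tag{7} \]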
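Proof. Using (5) and integration by parts, we get
\[ \begin{aligned}
k\psi^{(k-1)}(\alpha) &= (-1)^k \int_0^\infty \frac{k t^{k-1} e^{-\alpha t}}{1 - e^{-t}}\, dt \\
&= (-1)^{k+1} \int_0^\infty t^k \left[ \frac{-\alpha e^{-\alpha t}}{1 - e^{-t}} - \frac{e^{-\alpha t - t}}{(1 - e^{-t})^2} \right] dt \\
&= -\alpha\psi^{(k)}(\alpha) + (-1)^k \int_0^\infty \frac{t^k e^{-\alpha t - t}}{(1 - e^{-t})^2}\, dt,
\end{aligned} \]
which proves (6). The proof of (7) is similar.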
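Both identities of Lemma 1 can be sanity-checked at α = 1, where the left-hand sides have closed forms through the standard values ψ′(1) = π²/6 and ψ″(1) = −2ζ(3). The sketch below evaluates the right-hand sides of (6) (for k = 2) and (7) by the midpoint rule. The truncation of the integrals at t = 60, the step count, and all helper names are assumptions made for illustration.

```python
import math

APERY = 1.2020569031595942           # zeta(3), hard-coded constant

def midpoint_integral(func, upper, steps):
    """Composite midpoint rule on [0, upper]; never evaluates func at t = 0."""
    width = upper / steps
    return sum(func((i + 0.5) * width) for i in range(steps)) * width

def rhs_of_6(k, alpha, upper=60.0, steps=60000):
    def integrand(t):
        s = -math.expm1(-t)          # 1 - e^{-t}, computed stably for small t
        return t ** k * math.exp(-(alpha + 1.0) * t) / (s * s)
    return (-1) ** k * midpoint_integral(integrand, upper, steps)

def rhs_of_7(alpha, upper=60.0, steps=60000):
    def integrand(t):
        s = -math.expm1(-t)
        return (s - t * math.exp(-t)) / (s * s) * math.exp(-alpha * t)
    return 1.0 + midpoint_integral(integrand, upper, steps)

trigamma_at_1 = math.pi ** 2 / 6.0       # psi'(1)
tetragamma_at_1 = -2.0 * APERY           # psi''(1)

lhs_6 = 1.0 * tetragamma_at_1 + 2.0 * trigamma_at_1   # alpha*psi'' + k*psi' with alpha = 1, k = 2
lhs_7 = 1.0 * trigamma_at_1                           # alpha*psi' with alpha = 1
print(lhs_6, rhs_of_6(2, 1.0))
print(lhs_7, rhs_of_7(1.0))
```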
In view of (4)–(6), we have (k ≥ 2)
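\[ \begin{aligned}
(-1)^k H_g^{(k)}(\alpha) &= (-1)^k \left[ \psi^{(k)}(\alpha) + \psi^{(k-1)}(\alpha) + (-1)^{k+1} \int_0^\infty \frac{t^k e^{-\alpha t - t}}{(1 - e^{-t})^2}\, dt + \frac{(-1)^k (k - 1)!}{2\alpha^k} \right] \\
&= \int_0^\infty \left[ \frac{1 - t}{1 - e^{-t}} - \frac{te^{-t}}{(1 - e^{-t})^2} + \frac{1}{2} \right] t^{k-1} e^{-\alpha t}\, dt \\
&= \int_0^\infty \left[ \frac{1}{1 - e^{-t}} - \frac{t}{(1 - e^{-t})^2} + \frac{1}{2} \right] t^{k-1} e^{-\alpha t}\, dt.
\end{aligned} \tag{8} \]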
A parallel calculation using (7) reveals that the expression (8) is also valid for k = 1.
Lemma 2. The function
\[ u(t) \equiv \frac{1}{1 - e^{-t}} - \frac{t}{(1 - e^{-t})^2} + \frac{1}{2} \]
is negative on t ∈ (0, ∞).
Proof. Put s = 1 − e^{−t}. Then
\[ u(t) = \frac{s + s^2/2 + \log(1 - s)}{s^2} < 0, \]
given the elementary inequality log(1 − s) < −s − s²/2 for 0 < s < 1.
Lemma 2 immediately yields
\[ (-1)^k H_g^{(k)}(\alpha) < 0, \quad k \ge 1, \]
thus completing the proof of Theorem 2.
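The two ingredients of the proof can be spot-checked numerically. The sketch below evaluates u(t) on a grid to confirm its sign, and integrates u(t) e^{−αt} at α = 1, where (8) with k = 1 and (3) together predict the value −Hg′(1) = −1/2 (the trigamma term in (3) vanishes at α = 1). The grid, the truncation at t = 60 and the function names are assumptions for illustration only.

```python
import math

def u(t):
    """u(t) = 1/(1 - e^{-t}) - t/(1 - e^{-t})^2 + 1/2 from Lemma 2, for t > 0."""
    s = -math.expm1(-t)              # 1 - e^{-t}
    return 1.0 / s - t / (s * s) + 0.5

def signed_derivative_integral(alpha, k=1, upper=60.0, steps=60000):
    """Midpoint-rule value of the last integral in (8): u(t) * t^(k-1) * e^{-alpha t}."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += u(t) * t ** (k - 1) * math.exp(-alpha * t)
    return total * width

grid_is_negative = all(u(0.01 * j) < 0.0 for j in range(1, 5001))
value_at_one = signed_derivative_integral(1.0)
print(grid_is_negative, value_at_one)
```

The direct formula for u(t) loses accuracy for extremely small t because of cancellation, which is why the grid starts at t = 0.01 and the midpoint rule avoids t = 0.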
As an illustration, Fig. 1 displays the functions Hg^{(k)}(α) on (1, 10) for k = 0, 1, 2, 3. The calculations are based on (2)–(4).
The monotone behavior is in clear agreement with Theorem 2.
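In the same spirit, the discrete statement of Theorem 1 can be illustrated for exponential summands (α = 1), where h(n) = (1/2) log(2πe) − Hg(n). The sketch below computes h(1), ..., h(15) from (2) and checks the sign pattern of the finite differences up to order 4. The `digamma` helper, the sequence length and the maximum order are assumptions, and a finite check is only an illustration of the theorem, not a proof.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1/12 - inv2 * (1/120 - inv2 * (1/252 - inv2 * (1/240 - inv2 / 132))))
    return acc + math.log(x) - 0.5 / x - series

def gamma_entropy(alpha):
    """H_g(alpha) from formula (2)."""
    return math.lgamma(alpha) + alpha - 0.5 * math.log(alpha) + (1.0 - alpha) * digamma(alpha)

def entropy_gap(n, shape=1.0):
    """h(n) for i.i.d. Gam(shape, rate) summands; the rate does not affect the standardized sum."""
    return 0.5 * math.log(2.0 * math.pi * math.e) - gamma_entropy(n * shape)

def signed_differences(seq, order):
    """Return (-1)^order * Delta^order applied to seq."""
    out = list(seq)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return [(-1) ** order * d for d in out]

gaps = [entropy_gap(n) for n in range(1, 16)]
all_signs_ok = all(d > 0.0 for k in range(5) for d in signed_differences(gaps, k))
print(gaps[:3], all_signs_ok)
```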
3. The inverse Gaussian case
The inverse Gaussian distribution IG(µ, λ), µ > 0, λ > 0, has density
\[ p(x; \mu, \lambda) = \left( \frac{\lambda}{2\pi x^3} \right)^{1/2} \exp\left[ -\frac{\lambda (x - \mu)^2}{2\mu^2 x} \right], \quad x > 0. \tag{9} \]
The mean is µ and the variance is µ³/λ. Some well-known properties are listed below; further details can be found in
Seshadri (1993).
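The stated mean and variance can be confirmed directly from the density (9) by numerical integration. The following sketch does this for the arbitrary example values µ = 1.5 and λ = 2; the function names, the truncation of the integral at x = 80 and the step count are assumptions.

```python
import math

def inverse_gaussian_density(x, mu, lam):
    """Density (9) of IG(mu, lam) at x > 0."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

def expectation(func, mu, lam, upper=80.0, steps=40000):
    """Midpoint-rule approximation of E[func(X)] for X ~ IG(mu, lam)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += func(x) * inverse_gaussian_density(x, mu, lam)
    return total * width

mu, lam = 1.5, 2.0                      # example parameters, chosen arbitrarily
mass = expectation(lambda x: 1.0, mu, lam)
mean = expectation(lambda x: x, mu, lam)
variance = expectation(lambda x: (x - mu) ** 2, mu, lam)
print(mass, mean, variance, mu ** 3 / lam)
```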
Fig. 1. The functions Hg^{(k)}(α), k = 0, 1, 2, 3, on α ∈ (1, 10).
Lemma 3. The inverse Gaussian family is closed under scaling and (i.i.d.) convolution. Specifically,
• if X ∼ IG(µ, λ) and a > 0 is a constant, then aX ∼ IG(aµ, aλ);
• if X1, . . . , Xn are i.i.d. random variables distributed as IG(µ, λ), then Σ_{i=1}^n Xi ∼ IG(nµ, n²λ).
Lemma 4. If X ∼ IG(µ, λ) then λ(X − µ)²/(µ²X) has a χ² distribution with one degree of freedom.
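Two of these properties lend themselves to a quick numerical check. The sketch below verifies the scaling rule of Lemma 3 pointwise, by comparing the change-of-variables density of aX with the IG(aµ, aλ) density, and checks that the statistic in Lemma 4 has the first two moments of a χ² variable with one degree of freedom (namely 1 and 3). Parameter values, helper names and integration settings are illustrative assumptions; matching two moments is consistent with Lemma 4 but does not by itself establish the distribution.

```python
import math

def ig_density(x, mu, lam):
    """Density (9) of IG(mu, lam) at x > 0."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

def scaled_density(y, a, mu, lam):
    """Density of aX at y when X ~ IG(mu, lam), by change of variables."""
    return ig_density(y / a, mu, lam) / a

def chi_square_statistic_moment(power, mu, lam, upper=60.0, steps=30000):
    """Midpoint-rule value of E[(lam (X - mu)^2 / (mu^2 X))^power] for X ~ IG(mu, lam)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        statistic = lam * (x - mu) ** 2 / (mu ** 2 * x)
        total += statistic ** power * ig_density(x, mu, lam)
    return total * width

scaling_holds = all(
    math.isclose(scaled_density(y, 2.5, 1.2, 0.8), ig_density(y, 2.5 * 1.2, 2.5 * 0.8), rel_tol=1e-12)
    for y in (0.3, 1.0, 2.7)
)
first_moment = chi_square_statistic_moment(1, 1.0, 2.0)
second_moment = chi_square_statistic_moment(2, 1.0, 2.0)
print(scaling_holds, first_moment, second_moment)
```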
Let Hig (µ, λ) denote the entropy of an IG(µ, λ) random variable. By Lemma 3, for an i.i.d. sequence Xi ∼ IG(µ, λ), the
entropy of the standardized sum is
\[ H\left( (n\mu^3/\lambda)^{-1/2} \sum_{i=1}^{n} (X_i - \mu) \right) = H_{ig}\left( (n\lambda/\mu)^{1/2}, (n\lambda/\mu)^{3/2} \right). \]
It is then clear that Theorem 3 implies Theorem 1 in the inverse Gaussian case.
Theorem 3. The function (1/2) log(2π e) − Hig(θ^{1/2}, θ^{3/2}), θ ∈ (0, ∞), is a completely monotone function of θ.
Proof. Let J(θ) = (1/2) log(2π e) − Hig(θ^{1/2}, θ^{3/2}). As in the proof of Theorem 2, we know J(θ) > 0 and only need to show
(−1)^k J^{(k)}(θ) ≥ 0 for all k ≥ 1. Write µ = θ^{1/2} for notational convenience and let X ∼ IG(µ, µ³). Direct calculation using
(9) gives
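\[ \begin{aligned}
J(\theta) &= \frac{1}{2} \log(2\pi e) + E \log p(X; \mu, \mu^3) \\
&= \frac{1}{2} - \frac{3}{2} E \log(X/\mu) - E\, \frac{\mu (X - \mu)^2}{2X} \\
&= -\frac{3}{2} E \log(X/\mu)
\end{aligned} \]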
where Lemma 4 is used in the last equality. To calculate E log(X /µ), let us recall the moment generating function of X :
\[ E e^{tX} = \exp\left[ \mu^2 \left( 1 - \sqrt{1 - 2t/\mu} \right) \right]. \]
Using this and the integral identity
\[ \log(a) = \int_0^\infty \frac{e^{-t} - e^{-at}}{t}\, dt, \quad a > 0, \]
we may express E log(X/µ) as
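\[ \begin{aligned}
E \log(\mu X/\mu^2) &= -2 \log(\mu) + E \int_0^\infty \frac{e^{-t} - e^{-\mu X t}}{t}\, dt \\
&= -2 \log(\mu) + \int_0^\infty \frac{e^{-t} - e^{\mu^2 (1 - \sqrt{1 + 2t})}}{t}\, dt \\
&= -\log(\theta) + \int_0^\infty \frac{e^{-t} - e^{\theta (1 - \sqrt{1 + 2t})}}{t}\, dt.
\end{aligned} \]
Thus
\[ \frac{dE \log(X/\mu)}{d\theta} = -\frac{1}{\theta} + \int_0^\infty \frac{\sqrt{1 + 2t} - 1}{t}\, e^{\theta (1 - \sqrt{1 + 2t})}\, dt. \]
By a change of variables s = √(1 + 2t) − 1 in the above integral, we obtain
\[ \begin{aligned}
\frac{dE \log(X/\mu)}{d\theta} &= -\frac{1}{\theta} + \int_0^\infty \frac{s(1 + s)}{s^2/2 + s}\, e^{-\theta s}\, ds \\
&= -\int_0^\infty e^{-\theta s}\, ds + \int_0^\infty \frac{2(1 + s)}{s + 2}\, e^{-\theta s}\, ds \\
&= \int_0^\infty \frac{s}{s + 2}\, e^{-\theta s}\, ds.
\end{aligned} \]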
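The final expression for the derivative can be cross-checked without using the moment generating function at all. The sketch below computes E log(X/µ) for X ∼ IG(µ, µ³), µ = θ^{1/2}, directly from the density (9), differentiates it in θ by a central finite difference, and compares the result with the integral of s/(s + 2) · e^{−θs}. The evaluation point θ = 1, the finite-difference step, the integration cut-offs and the function names are all assumptions made for this illustration.

```python
import math

def expected_log_ratio(theta, upper=80.0, steps=40000):
    """E log(X/mu) for X ~ IG(mu, mu^3) with mu = sqrt(theta), by the midpoint rule on density (9)."""
    mu = math.sqrt(theta)
    lam = mu ** 3
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        density = math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))
        total += math.log(x / mu) * density
    return total * width

def derivative_formula(theta, upper=80.0, steps=40000):
    """Midpoint-rule value of the integral of s/(s + 2) * exp(-theta * s) over (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width
        total += s / (s + 2.0) * math.exp(-theta * s)
    return total * width

theta, step = 1.0, 1e-3
finite_difference = (expected_log_ratio(theta + step) - expected_log_ratio(theta - step)) / (2.0 * step)
closed_form = derivative_formula(theta)
relative_entropy = -1.5 * expected_log_ratio(theta)     # J(theta) = -(3/2) E log(X/mu)
print(finite_difference, closed_form, relative_entropy)
```

Since the derivative of E log(X/µ) is positive, J(θ) = −(3/2) E log(X/µ) is decreasing in θ, which is the k = 1 case of the sign pattern established next.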
It is then clear that
\[ (-1)^k \frac{d^k E \log(X/\mu)}{d\theta^k} = -\int_0^\infty \frac{s^k}{s + 2}\, e^{-\theta s}\, ds < 0 \]
for all k ≥ 1; in other words
\[ (-1)^k J^{(k)}(\theta) > 0, \quad k \ge 1. \]
Acknowledgment
The author would like to thank an anonymous referee for his/her valuable comments.
References
Abramowitz, M., Stegun, I.A. (Eds.), 1964. Handbook of Mathematical Functions. Dover, New York.
Artstein, S., Ball, K.M., Barthe, F., Naor, A., 2004. Solution of Shannon’s problem on the monotonicity of entropy. J. Amer. Math. Soc. 17, 975–982.
Barron, A.R., 1986. Entropy and the central limit theorem. Ann. Probab. 14, 336–342.
Blachman, N.M., 1965. The convolution inequality for entropy powers. IEEE Trans. Inform. Theory 11, 267–271.
Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. Wiley, New York.
Feller, W., 1966. An Introduction to Probability Theory and its Applications, vol. 2. Wiley, New York.
Madiman, M., Barron, A.R., 2007. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inform. Theory 53,
2317–2329.
Seshadri, V., 1993. The Inverse Gaussian Distribution. Oxford Univ. Press, Oxford.
Stam, A.J., 1959. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inform. Control 2, 101–112.