1. Prerequisites in Measure Theory, Probability, and Ergodic Theory
We start by briefly recalling basic notions and facts which will be used in subsequent chapters.

1.1. Notation

The sets of natural, integer, non-negative integer, and real numbers will be denoted by $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Z}_+$, $\mathbb{R}$, respectively. The set of extended real numbers $\overline{\mathbb{R}}$ is $\mathbb{R} \cup \{\pm\infty\}$.

If $\Lambda$ and $V$ are non-empty sets, by $V^\Lambda$ we will denote the space of $V$-valued functions defined on $\Lambda$:
\[ V^\Lambda = \{ f : \Lambda \to V \}. \]
If $\Pi \subseteq \Lambda$ and $f \in V^\Lambda$, by $f_\Pi$ we will denote the restriction of $f$ to $\Pi$, i.e., $f_\Pi \in V^\Pi$. If $\Pi_1$ and $\Pi_2$ are disjoint, $f \in V^{\Pi_1}$, and $g \in V^{\Pi_2}$, by $f_{\Pi_1} g_{\Pi_2}$ we will denote the unique element $h \in V^{\Pi_1 \cup \Pi_2}$ such that $h_{\Pi_1} = f$ and $h_{\Pi_2} = g$.

1.2. Measure theory

1.2.1. Suppose $\Omega$ is a non-empty set. A collection $\mathcal{A}$ of subsets of $\Omega$ is called a $\sigma$-algebra if it satisfies the following properties:
(i) $\Omega \in \mathcal{A}$;
(ii) if $A \in \mathcal{A}$, then $A^c = \Omega \setminus A \in \mathcal{A}$;
(iii) for any sequence $\{A_n\}_{n \in \mathbb{N}}$ with $A_n \in \mathcal{A}$ for every $n \in \mathbb{N}$, one has $\bigcup_n A_n \in \mathcal{A}$.
Equivalently, a $\sigma$-algebra $\mathcal{A}$ is a collection of subsets of $\Omega$ closed under countable set operations such as intersection, union, complement, and symmetric difference. An element of $\mathcal{A}$ is called a measurable subset of $\Omega$.

The intersection of two $\sigma$-algebras is again a $\sigma$-algebra. This property allows us, for a given collection $\mathcal{C}$ of subsets of $\Omega$, to define uniquely the minimal $\sigma$-algebra $\sigma(\mathcal{C})$ containing $\mathcal{C}$, namely the intersection of all $\sigma$-algebras containing $\mathcal{C}$.

A pair $(\Omega, \mathcal{A})$, where $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$, is called a measurable space. If the space $\Omega$ is metrizable (e.g., $\Omega = [0,1]$ or $\mathbb{R}$), one can define the Borel $\sigma$-algebra of $\Omega$, denoted by $\mathcal{B}(\Omega)$, as the minimal $\sigma$-algebra containing all open subsets of $\Omega$.

1.2.2. Suppose $(\Omega_1, \mathcal{A}_1)$ and $(\Omega_2, \mathcal{A}_2)$ are two measurable spaces. The map $T : \Omega_1 \to \Omega_2$ is called $(\mathcal{A}_1, \mathcal{A}_2)$-measurable if for any $A_2 \in \mathcal{A}_2$ the full preimage of $A_2$ is a measurable subset of $\Omega_1$, i.e.,
\[ T^{-1} A_2 = \{ \omega \in \Omega_1 : T(\omega) \in A_2 \} \in \mathcal{A}_1. \]
By definition, a random variable is a measurable map from $(\Omega_1, \mathcal{A}_1)$ into $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

If $(\Omega, \mathcal{A})$ is a measurable space and $T : \Omega \to \Omega$ is a measurable map, then $T$ is called a measurable transformation of $(\Omega, \mathcal{A})$, or an endomorphism of $(\Omega, \mathcal{A})$. If, furthermore, $T$ is invertible and $T^{-1} : \Omega \to \Omega$ is also measurable, then $T$ is called a measurable isomorphism of $(\Omega, \mathcal{A})$.

1.2.3. Suppose $(\Omega, \mathcal{A})$ is a measurable space. A function $\mu : \mathcal{A} \to [0, +\infty]$ is called a measure if
• $\mu(\emptyset) = 0$;
• for any countable collection of pairwise disjoint sets $\{A_n\}_n$, $A_n \in \mathcal{A}$, one has
\[ \mu\Big( \bigcup_n A_n \Big) = \sum_n \mu(A_n). \tag{1.2.1} \]
The triple $(\Omega, \mathcal{A}, \mu)$ is called a measure space. If $\mu(\Omega) = 1$, then $\mu$ is called a probability measure, and $(\Omega, \mathcal{A}, \mu)$ a probability (measure) space.

1.2.4. Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space. For any $A \in \mathcal{A}$, the indicator function of $A$ is
\[ \mathbb{I}_A(\omega) = \begin{cases} 1, & \omega \in A, \\ 0, & \omega \notin A. \end{cases} \]
The Lebesgue integral of $f = \mathbb{I}_A$ is defined as
\[ \int \mathbb{I}_A \, d\mu = \mu(A). \]
A function $f$ is called simple if there exist $K \in \mathbb{N}$, measurable sets $\{A_k\}_{k=1}^K$ with $A_k \in \mathcal{A}$, and non-negative numbers $\{\alpha_k\}_{k=1}^K$ such that
\[ f(\omega) = \sum_{k=1}^K \alpha_k \mathbb{I}_{A_k}(\omega). \]
The Lebesgue integral of the simple function $f$ is defined as
\[ \int f \, d\mu = \sum_{k=1}^K \alpha_k \mu(A_k). \]
Suppose $\{f_n\}$ is a monotonically increasing sequence of simple functions and $f = \lim_n f_n$. Then the Lebesgue integral of $f$ is defined as
\[ \int f \, d\mu = \lim_n \int f_n \, d\mu. \]
It turns out that every non-negative measurable function $f : \Omega \to \overline{\mathbb{R}}_+$ can be represented as the limit of an increasing sequence of simple functions.

For a measurable function $f : \Omega \to \overline{\mathbb{R}}$, let $f_+$ and $f_-$ be the positive and the negative part of $f$, respectively, i.e., $f = f_+ - f_-$. The function $f$ is called Lebesgue integrable if
\[ \int f_+ \, d\mu, \ \int f_- \, d\mu < +\infty, \]
and the Lebesgue integral of $f$ is then defined as
\[ \int f \, d\mu = \int f_+ \, d\mu - \int f_- \, d\mu. \]
The set of all Lebesgue integrable functions on $(\Omega, \mathcal{A}, \mu)$ will be denoted by $L^1(\Omega, \mathcal{A}, \mu)$, or by $L^1(\Omega)$ or $L^1(\mu)$ if this does not lead to confusion. If $(\Omega, \mathcal{A}, \mu)$ is a probability space, we will sometimes denote the Lebesgue integral $\int f \, d\mu$ by $\mathbb{E}_\mu f$ or $\mathbb{E} f$.
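As an illustration of the construction above, here is a minimal Python sketch that approximates $\int f \, d\mu$ for a non-negative $f$ on $([0,1], \mathcal{B}([0,1]), \mathrm{Leb})$ via the standard increasing simple functions $f_n = \min(n, \lfloor 2^n f \rfloor / 2^n)$. The helper names and the grid-based estimate of $\mu(A_k)$ are choices made for this sketch only, not part of the notes.

```python
import numpy as np

def simple_approximation(f, points, n):
    """Evaluate the increasing simple approximation f_n = min(n, floor(2^n f)/2^n)."""
    values = f(points)
    return np.minimum(n, np.floor(2**n * values) / 2**n)

def lebesgue_integral(f, n=20, grid_size=10**6):
    """Approximate the Lebesgue integral of a non-negative f over ([0,1], B, Leb):
    the integral of the simple function f_n is sum_k alpha_k * mu(A_k), and here
    mu(A_k) is estimated by the fraction of grid points falling into A_k."""
    points = (np.arange(grid_size) + 0.5) / grid_size   # fine grid standing in for [0, 1)
    f_n = simple_approximation(f, points, n)
    alphas, counts = np.unique(f_n, return_counts=True) # distinct values alpha_k of f_n
    return float(np.sum(alphas * counts) / grid_size)

# Example: f(x) = x^2; its Lebesgue integral over [0,1] is 1/3.
print(lebesgue_integral(lambda x: x**2))   # ~0.3333
```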
1.2.5. Conditional expectation with respect to a sub-$\sigma$-algebra. Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space, and $\mathcal{F}$ is a sub-$\sigma$-algebra of $\mathcal{A}$. Suppose $f \in L^1(\Omega, \mathcal{A}, \mu)$ is an $\mathcal{A}$-measurable Lebesgue integrable function. The conditional expectation of $f$ given $\mathcal{F}$ will be denoted by $\mathbb{E}(f \mid \mathcal{F})$; it is by definition an $\mathcal{F}$-measurable function on $\Omega$ such that
\[ \int_C \mathbb{E}(f \mid \mathcal{F}) \, d\mu = \int_C f \, d\mu \]
for every $C \in \mathcal{F}$.

1.2.6. Martingale convergence theorem. Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space, and $\{\mathcal{A}_n\}$ is a sequence of sub-$\sigma$-algebras of $\mathcal{A}$ such that $\mathcal{A}_n \subseteq \mathcal{A}_{n+1}$ for all $n$. Denote by $\mathcal{A}_\infty$ the minimal $\sigma$-algebra containing all $\mathcal{A}_n$. Then for any $f \in L^1(\Omega, \mathcal{A}, \mu)$,
\[ \mathbb{E}_\mu(f \mid \mathcal{A}_n) \to \mathbb{E}_\mu(f \mid \mathcal{A}_\infty) \]
$\mu$-almost surely and in $L^1(\Omega)$.

1.2.7. Absolutely continuous measures and the Radon-Nikodym theorem. Suppose $\nu$ and $\mu$ are two measures on the measurable space $(\Omega, \mathcal{A})$. The measure $\nu$ is absolutely continuous with respect to $\mu$ (denoted by $\nu \ll \mu$) if $\nu(A) = 0$ for all $A \in \mathcal{A}$ such that $\mu(A) = 0$.

The Radon-Nikodym theorem states that if $\mu$ and $\nu$ are two $\sigma$-finite measures on $(\Omega, \mathcal{A})$ and $\nu \ll \mu$, then there exists a non-negative measurable function $f$, called the (Radon-Nikodym) density of $\nu$ with respect to $\mu$, such that
\[ \nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{A}. \]

1.3. Stochastic processes

Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space. A stochastic process $\{X_n\}$ is a collection of random variables $X_n : \Omega \to \mathbb{R}$, indexed by $n \in T$, where the time set $T$ is $\mathbb{Z}_+$ or $\mathbb{Z}$. A stochastic process can be described by its finite-dimensional distributions (marginals): for every $(n_1, \ldots, n_k) \in T^k$,
\[ \mu(X_{n_1} \in \cdot\,, \ldots, X_{n_k} \in \cdot\,) \]
is a probability measure on $\mathbb{R}^k$. In the opposite direction, a consistent family of finite-dimensional distributions can be used to define a stochastic process.

The process $\{X_n\}$ is called stationary if for all $(n_1, \ldots, n_k) \in T^k$ and all $t$,
\[ \mu(X_{n_1} \in \cdot\,, \ldots, X_{n_k} \in \cdot\,) = \mu(X_{n_1 + t} \in \cdot\,, \ldots, X_{n_k + t} \in \cdot\,), \]
i.e., the time-shift does not affect the finite-dimensional marginal distributions.

1.4. Ergodic theory

Ergodic theory originates from the Boltzmann-Maxwell ergodic hypothesis and is the study of measure-preserving dynamical systems $(\Omega, \mathcal{A}, \mu, T)$ where
• $(\Omega, \mathcal{A}, \mu)$ is a (Lebesgue) probability space;
• $T : \Omega \to \Omega$ is measure preserving: for all $A \in \mathcal{A}$,
\[ \mu(T^{-1} A) = \mu(\{\omega \in \Omega : T(\omega) \in A\}) = \mu(A). \]

Examples: $\Omega = \mathbb{T}^1 = \mathbb{R}/\mathbb{Z} \cong [0, 1)$, $\mu =$ Lebesgue measure,
• Circle rotation: $T_\alpha(x) = x + \alpha \bmod 1$,
• Doubling map: $T(x) = 2x \bmod 1$.

Example: Bernoulli shifts. Take a finite set (alphabet) $A = \{1, \ldots, N\}$ and
\[ \Omega = A^{\mathbb{Z}_+} = \{ \omega = (\omega_n)_{n \ge 0} : \omega_n \in A \}. \]
A measure $p = (p(1), \ldots, p(N))$ on $A$ can be extended to the product measure $\mu = p^{\mathbb{Z}_+}$ on $A^{\mathbb{Z}_+}$:
\[ \mu(\omega : \omega_{i_1} = a_{i_1}, \ldots, \omega_{i_n} = a_{i_n}) = p(a_{i_1}) \cdots p(a_{i_n}). \]
The measure $\mu$ is clearly preserved by the left shift $T : \Omega \to \Omega$, $(T\omega)_n = \omega_{n+1}$ for all $n$.

Definition 1.1. A measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$ is ergodic if every invariant set is trivial:
\[ A = T^{-1} A \ \Rightarrow\ \mu(A) = 0 \text{ or } 1, \]
equivalently, if every invariant function is constant:
\[ f(\omega) = f(T(\omega)) \ (\mu\text{-a.e.}) \ \Rightarrow\ f(\omega) = \mathrm{const} \ (\mu\text{-a.e.}). \]

Theorem 1.2 (Birkhoff's Pointwise Ergodic Theorem). Suppose $(\Omega, \mathcal{A}, \mu, T)$ is an ergodic measure-preserving dynamical system. Then for all $f \in L^1(\Omega, \mu)$,
\[ \frac{1}{n} \sum_{t=0}^{n-1} f(T^t(x)) \to \int f(x) \, \mu(dx) \quad \text{as } n \to \infty, \]
$\mu$-almost surely and in $L^1$.
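As a numerical illustration of Theorem 1.2, here is a small Python sketch that compares the time average of the observable $f(x) = x$ along an orbit of the circle rotation $T_\alpha$ with the space average $\int_0^1 x \, dx = 1/2$. The choices $f(x) = x$, $\alpha = \sqrt{2} \bmod 1$, and the starting point are arbitrary and made only for this example.

```python
import math

def birkhoff_average(T, f, x0, n):
    """Time average (1/n) * sum_{t=0}^{n-1} f(T^t(x0)) along the orbit of x0."""
    x, total = x0, 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

# Circle rotation by an irrational angle (ergodic for Lebesgue measure on [0, 1)).
alpha = math.sqrt(2) % 1
T = lambda x: (x + alpha) % 1
f = lambda x: x                      # integrable observable with space average 1/2

print(birkhoff_average(T, f, x0=0.1, n=10**6))   # ~0.5
```

The doubling map is avoided here only because iterating $x \mapsto 2x \bmod 1$ in binary floating point collapses every orbit to $0$ after about 53 steps; the rotation has no such numerical artifact.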
1.5. Entropy

Suppose $p = (p_1, \ldots, p_N)$ is a probability vector, i.e., $p_i \ge 0$ and $\sum_{i=1}^N p_i = 1$. The entropy of $p$ is
\[ H(p) = -\sum_{i=1}^N p_i \log_2 p_i. \]

1.5.1. Shannon's entropy rate per symbol

Definition 1.3. Suppose $\mathbf{Y} = \{Y_k\}$ is a stationary process with values in a finite alphabet $A$, and $\mathbb{P}$ is the corresponding translation-invariant measure. Fix $n \in \mathbb{N}$, and consider the distribution of $n$-tuples $(Y_0, \ldots, Y_{n-1}) \in A^n$. Denote by $H_n$ the entropy of this distribution:
\[ H_n = H(Y_0^{n-1}) = -\sum_{(a_0, \ldots, a_{n-1}) \in A^n} \mathbb{P}[Y_0^{n-1} = a_0^{n-1}] \log_2 \mathbb{P}[Y_0^{n-1} = a_0^{n-1}]. \]
The entropy (rate) of the process $\mathbf{Y} = \{Y_k\}$, equivalently of the measure $\mathbb{P}$, denoted by $h(\mathbf{Y})$ or $h(\mathbb{P})$, is defined as
\[ h(\mathbf{Y}) = h(\mathbb{P}) = \lim_{n \to \infty} \frac{1}{n} H_n \]
(the limit exists!).

Similarly, the entropy can be defined for stationary random fields $\{Y_n\}_{n \in \mathbb{Z}^d}$, $Y_n \in A$. Let $\Lambda_n = [0, n-1]^d \cap \mathbb{Z}^d$. Then
\[ h(\mathbf{Y}) = \lim_{n \to \infty} \frac{1}{|\Lambda_n|} \Big( -\sum_{a_{\Lambda_n} \in A^{\Lambda_n}} \mathbb{P}[Y_{\Lambda_n} = a_{\Lambda_n}] \log_2 \mathbb{P}[Y_{\Lambda_n} = a_{\Lambda_n}] \Big); \]
again, one can easily show that the limit exists.

1.5.2. Kolmogorov-Sinai entropy of measure-preserving systems

Suppose $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving dynamical system. Suppose $\mathcal{C} = \{C_1, \ldots, C_N\}$ is a finite measurable partition of $\Omega$, i.e., a partition of $\Omega$ into measurable sets $C_k \in \mathcal{A}$. For every $\omega \in \Omega$ and $n \in \mathbb{Z}_+$ (or $\mathbb{Z}$), put
\[ Y_n(\omega) = Y_n^{\mathcal{C}}(\omega) = j \in \{1, \ldots, N\} \iff T^n(\omega) \in C_j. \]

Proposition: For any $\mathcal{C}$, the corresponding process $\mathbf{Y}^{\mathcal{C}} = \{Y_n\}$, $Y_n : \Omega \to \{1, \ldots, N\}$, is a stationary process with
\[ \mathbb{P}[Y_0 = j_0, \ldots, Y_n = j_n] = \mu\big( \omega \in \Omega : \omega \in C_{j_0}, T(\omega) \in C_{j_1}, \ldots, T^n(\omega) \in C_{j_n} \big). \]

Definition 1.4. If $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving dynamical system, and $\mathcal{C} = \{C_1, \ldots, C_N\}$ is a finite measurable partition of $\Omega$, then the entropy of $(\Omega, \mathcal{A}, \mu, T)$ with respect to $\mathcal{C}$ is defined as the Shannon entropy of the corresponding symbolic process $\mathbf{Y}^{\mathcal{C}}$:
\[ h_\mu(T, \mathcal{C}) = h(\mathbf{Y}^{\mathcal{C}}). \]
Finally, the measure-theoretic or Kolmogorov-Sinai entropy of $(\Omega, \mathcal{A}, \mu, T)$ is defined as
\[ h_\mu(T) = \sup_{\mathcal{C} \text{ finite}} h_\mu(T, \mathcal{C}). \]
The following theorem of Sinai eliminates the need to consider all finite partitions.

Definition 1.5. A partition $\mathcal{C}$ is called a generating partition (or generator) of the dynamical system $(\Omega, \mathcal{A}, \mu, T)$ if the smallest $\sigma$-algebra containing all $T^{-n}(C_j)$, $j = 1, \ldots, N$, $n \in \mathbb{Z}$, is $\mathcal{A}$.

Theorem 1.6 (Ya. Sinai). If $\mathcal{C}$ is a generating partition, then
\[ h_\mu(T) = h_\mu(T, \mathcal{C}). \]
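As a concrete illustration of Definitions 1.3 and 1.4, the following Python sketch computes the block entropies $H_n$ for the Bernoulli process with marginal $p = (1/2, 1/4, 1/4)$ (an arbitrary choice for this example) directly from the product-measure formula. Since the process is i.i.d., $H_n = n H(p)$, so every ratio $H_n / n$ already equals $H(p) = 1.5$ bits; this value is the entropy rate of the process and, since the time-zero partition $\{\omega : \omega_0 = a\}$ is generating for the shift, also the Kolmogorov-Sinai entropy of the corresponding Bernoulli shift.

```python
from itertools import product
from math import log2, prod

def entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i of a probability vector."""
    return -sum(q * log2(q) for q in probs if q > 0)

def block_entropy(p, n):
    """H_n for the Bernoulli (i.i.d.) process with marginal p: the cylinder set
    {omega : omega_0 = a_0, ..., omega_{n-1} = a_{n-1}} has measure p(a_0)...p(a_{n-1})."""
    blocks = product(range(len(p)), repeat=n)
    return entropy([prod(p[a] for a in block) for block in blocks])

p = (0.5, 0.25, 0.25)
print(entropy(p))                           # H(p) = 1.5 bits
for n in (1, 2, 3, 4):
    print(n, block_entropy(p, n) / n)       # each H_n / n equals H(p) = 1.5
```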