A martingale bound for the entropy associated with a trimmed filtration on $\mathbb {R}^d$

Using martingale methods, we provide bounds for the entropy of a probability measure on $\mathbb {R}^d$ with the right-hand side given in a certain integral form. As a corollary, in the one-dimensional case, we obtain a weighted log-Sobolev inequality.


Introduction
A probability measure $\mu$ on $\mathbb{R}^d$ is said to satisfy the log-Sobolev inequality if for every smooth compactly supported function $f: \mathbb{R}^d \to \mathbb{R}$, the entropy of $f^2$, which by definition equals
$$\mathrm{Ent}_\mu f^2 = \int_{\mathbb{R}^d} f^2 \log f^2\,d\mu - \Big(\int_{\mathbb{R}^d} f^2\,d\mu\Big)\log\Big(\int_{\mathbb{R}^d} f^2\,d\mu\Big),$$
possesses a bound
$$\mathrm{Ent}_\mu f^2 \le 2c\int_{\mathbb{R}^d} |\nabla f|^2\,d\mu \tag{1}$$
with some constant $c$. The least possible constant $c$ such that (1) holds for every compactly supported smooth $f$ is called the log-Sobolev constant for the measure $\mu$; the multiplier $2$ in (1) is chosen in such a way that the log-Sobolev constant of the standard Gaussian measure on $\mathbb{R}^d$ equals $1$.
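As a quick numerical illustration (our own sketch, not part of the argument), one can check (1) with $c = 1$ for the standard Gaussian measure on $\mathbb{R}$; the test function $f(x) = 1 + x^2$ below is a hypothetical choice.

```python
import math

# Numerical sketch (illustration only): check the log-Sobolev inequality
# Ent_mu(f^2) <= 2c * int |f'|^2 dmu with c = 1 for the standard Gaussian
# measure on R, using the hypothetical test function f(x) = 1 + x^2.

def gauss_density(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(h, a=-12.0, b=12.0, n=40001):
    # composite trapezoid rule on [a, b]; the Gaussian tails beyond are negligible
    step = (b - a) / (n - 1)
    total = 0.5 * (h(a) + h(b))
    for i in range(1, n - 1):
        total += h(a + i * step)
    return total * step

f = lambda x: 1.0 + x * x   # test function, strictly positive
df = lambda x: 2.0 * x      # its derivative

Ef2 = integrate(lambda x: f(x) ** 2 * gauss_density(x))
entropy = integrate(lambda x: f(x) ** 2 * math.log(f(x) ** 2) * gauss_density(x)) \
    - Ef2 * math.log(Ef2)
dirichlet = integrate(lambda x: df(x) ** 2 * gauss_density(x))

print(entropy, 2.0 * dirichlet)
assert 0.0 < entropy <= 2.0 * dirichlet
```

Here $E f^2 = 6$ and $E |f'|^2 = 4$ in closed form, so the quadrature can be cross-checked against exact values.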
The weighted log-Sobolev inequality has the form
$$\mathrm{Ent}_\mu f^2 \le 2\int_{\mathbb{R}^d} |W\nabla f|^2\,d\mu, \tag{2}$$
where the function $W$, taking values in $\mathbb{R}^{d\times d}$, has the meaning of a weight. Clearly, one can consider (1) as a particular case of (2) with the constant weight $W$ equal to $\sqrt{c}$ multiplied by the identity matrix. The problem of giving explicit conditions on $\mu$ that ensure the log-Sobolev inequality or its modifications is studied intensively in the literature, in particular because of the numerous connections of these inequalities with measure concentration, semigroup properties, and so on (see, e.g., [8]). Motivated by this general problem, in this paper we propose an approach that is based mainly on martingale methods and provides explicit bounds for the entropy with the right-hand side given in a certain integral form.
Our approach is motivated by the well-known fact that, on the path space of a Brownian motion, the log-Sobolev inequality possesses a simple proof based on fine martingale properties of the space (cf. [1,6]). We observe that a part of this proof is, to a large extent, insensitive to the structure of the probability space; we formulate the corresponding martingale bound for the entropy in Section 1.1. To apply this general bound on a probability space of the form $(\mathbb{R}^d, \mu)$, one needs a proper martingale structure therein. In Section 2, we introduce such a structure in terms of a trimming filtration, defined via a set of trimmed regions in $\mathbb{R}^d$. This leads to an integral bound for the entropy on $(\mathbb{R}^d, \mu)$. In Section 3, we show how this bound can be used to obtain a weighted log-Sobolev inequality; this is done in the one-dimensional case $d = 1$, although we expect that similar arguments should be effective in the multidimensional case as well; this is a subject of our further research.

A martingale bound for the entropy
Let $(\Omega, \mathcal F, P)$ be a probability space with filtration $\mathbb F = \{\mathcal F_t, t \in [0,1]\}$, which is right-continuous and complete, that is, every $\mathcal F_t$ contains all $P$-null sets from $\mathcal F$. Let $\{M_t, t \in [0,1]\}$ be a nonnegative square-integrable martingale w.r.t. $\mathbb F$ on this space, with càdlàg trajectories. We will use the following standard facts and notation (see [4]).
The martingale $M$ has a unique decomposition $M = M^c + M^d$, where $M^c$ is a continuous martingale and $M^d$ is a purely discontinuous martingale (see [4], Definition 9.20). Denote by $\langle M^c\rangle$ the quadratic variation of $M^c$, by $[M]$ the optional quadratic variation of $M$, and by $\langle M\rangle$ the predictable quadratic variation of $M$, that is, the projection of $[M]$ on the set of $\mathbb F$-predictable processes. Alternatively, $\langle M\rangle$ is identified as the $\mathbb F$-predictable process that appears in the Doob-Meyer decomposition for $M^2$, that is, the $\mathbb F$-predictable nondecreasing process $A$ such that $A_0 = 0$ and $M^2 - A$ is a martingale. For a nonnegative r.v. $\xi$, define its entropy by
$$\mathrm{Ent}\,\xi = E\xi\log\xi - E\xi\log(E\xi)$$
with the convention $0\log 0 = 0$.

Theorem 1. Let the $\sigma$-algebra $\mathcal F_0$ be degenerate. Then for any nonnegative square-integrable martingale $\{M_t, t\in[0,1]\}$ with càdlàg trajectories,
$$\mathrm{Ent}\, M_1 \le E\int_0^1 \frac{1}{M_{t-}}\,d\langle M\rangle_t.$$

Proof. Consider first the case where
$$0 < c_1 \le M_t \le c_2, \quad t \in [0,1], \tag{3}$$
with some positive constants $c_1, c_2$. Consider a smooth function $\Phi$, bounded with all its derivatives, such that $\Phi(x) = x\log x$, $x \in [c_1, c_2]$. Then, by the Itô formula (see [4], Theorem 12.19),
$$\Phi(M_1) = \Phi(M_0) + \int_0^1 \Phi'(M_{t-})\,dM_t + \frac12\int_0^1 \Phi''(M_{t-})\,d\langle M^c\rangle_t + \sum_{0 < t \le 1}\Big(\Phi(M_t) - \Phi(M_{t-}) - \Phi'(M_{t-})\Delta M_t\Big).$$
Clearly, $E\int_0^1 \Phi'(M_{t-})\,dM_t = 0$. Because $\mathcal F_0$ is assumed to be degenerate, $M_0 = E[M_1|\mathcal F_0] = EM_1$ a.s., and hence $\mathrm{Ent}\, M_1 = E\Phi(M_1) - \Phi(M_0)$. Then, because $\Phi''(x) = 1/x$ on $[c_1, c_2]$ and
$$\Phi(b) - \Phi(a) - \Phi'(a)(b - a) \le \frac{(b-a)^2}{a}, \quad a, b \in [c_1, c_2],$$
we obtain
$$\mathrm{Ent}\, M_1 \le \frac12 E\int_0^1 \frac{1}{M_{t-}}\,d\langle M^c\rangle_t + E\sum_{0 < t \le 1}\frac{(\Delta M_t)^2}{M_{t-}} \le E\int_0^1 \frac{1}{M_{t-}}\,d[M]_t.$$
Because the process $M_{t-}$, $t\in[0,1]$, is $\mathbb F$-predictable, we have
$$E\int_0^1 \frac{1}{M_{t-}}\,d[M]_t = E\int_0^1 \frac{1}{M_{t-}}\,d\langle M\rangle_t,$$
which completes the proof of the required bound under assumption (3). The upper bound in this assumption can be removed using the following standard localization procedure. For $N \ge 1$, define
$$\tau_N = \inf\{t : M_t \ge N\}$$
with the convention $\inf\varnothing = 1$. Then, repeating the above argument, we get
$$\mathrm{Ent}\, M_{\tau_N} \le E\int_0^{\tau_N} \frac{1}{M_{t-}}\,d\langle M\rangle_t \le E\int_0^1 \frac{1}{M_{t-}}\,d\langle M\rangle_t.$$
We have $M_{\tau_N} \to M_1$, $N\to\infty$, a.s. On the other hand, $EM_{\tau_N}^2 \le EM_1^2$. Hence, the family $\{M_{\tau_N}\log M_{\tau_N}, N \ge 1\}$ is uniformly integrable, and $\mathrm{Ent}\, M_{\tau_N} \to \mathrm{Ent}\, M_1$, $N \to \infty$. Passing to the limit as $N\to\infty$, we obtain the required statement under the assumption $M_t \ge c_1 > 0$. Taking $M_t + (1/n)$ instead of $M_t$ and then passing to the limit as $n\to\infty$, we complete the proof of the theorem.
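A discrete-time counterpart of this bound reads $\mathrm{Ent}\, M_n \le E\sum_k (M_k - M_{k-1})^2/M_{k-1}$ for a positive martingale $M_0, \dots, M_n$, by the same convexity argument. The following toy sketch (our own; the payoff $X$ is a hypothetical choice) verifies it exactly for the Doob martingale of a positive random variable on three fair coin flips.

```python
import itertools, math

# Toy sketch (our own discrete-time illustration): for a positive martingale
# M_0, ..., M_n, check  Ent M_n <= E sum_k (M_k - M_{k-1})^2 / M_{k-1},
# the discrete counterpart of Ent M_1 <= E int (1/M_{t-}) d<M>_t.

def doob(values, k, prefix):
    # M_k(prefix) = E[X | first k flips == prefix], flips uniform on {-1, +1}
    n = len(next(iter(values)))
    tails = list(itertools.product((-1, 1), repeat=n - k))
    return sum(values[prefix + t] for t in tails) / len(tails)

n = 3
# hypothetical positive payoff X = exp(0.7 * (sum of flips))
values = {w: math.exp(0.7 * sum(w)) for w in itertools.product((-1, 1), repeat=n)}

paths = list(itertools.product((-1, 1), repeat=n))
EX = doob(values, 0, ())
ent = sum(v * math.log(v) for v in values.values()) / len(paths) - EX * math.log(EX)

rhs = 0.0
for w in paths:
    for k in range(1, n + 1):
        prev = doob(values, k - 1, w[:k - 1])
        cur = doob(values, k, w[:k])
        rhs += (cur - prev) ** 2 / prev
rhs /= len(paths)

print(ent, rhs)
assert 0.0 < ent <= rhs
```

Everything is computed exactly over the eight sample paths, so no simulation error enters the comparison.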
We further give two examples in which the martingale bound for the entropy obtained above is applied. In these examples, it is more convenient to assume that $t$ varies in $[0,\infty)$ instead of $[0,1]$; the corresponding version of Theorem 1 can be proved by literally the same argument.
Example 1 (Log-Sobolev inequality on a Brownian path space; [1,6]). Let $B_t$, $t \ge 0$, be a Wiener process on $(\Omega, \mathcal F, P)$ such that $\mathcal F = \sigma(B)$, and let $\{\mathcal F_t\}$ be the natural filtration for $B$. Then for every $\zeta \in L_2(\Omega, P)$, the following martingale representation is available:
$$\zeta = E\zeta + \int_0^\infty \eta_t\,dB_t \tag{4}$$
with the Itô integral of a (unique) square-integrable $\{\mathcal F_t\}$-adapted process $\{\eta_t\}$ in the right-hand side (cf. [3]). Take $\xi \in L_4(\Omega, P)$ and put $\zeta = \xi^2$ and $M_t = E[\zeta|\mathcal F_t]$. Then the calculation from the proof of Theorem 1 gives the bound
$$\mathrm{Ent}\,\xi^2 \le \frac12 E\int_0^\infty \frac{\eta_t^2}{M_t}\,dt. \tag{6}$$
Note the extra factor $1/2$, which appears because the martingale $M$ is continuous.
Next, recall the Ocone representation [10] for the process $\{\eta_t\}$, which is valid if $\zeta$ possesses the Malliavin derivative $D\zeta = \{D_t\zeta, t\ge 0\}$:
$$\eta_t = E[D_t\zeta|\mathcal F_t]. \tag{5}$$
We omit the details concerning the Malliavin calculus, referring the reader, if necessary, to [9]. Because the Malliavin derivative possesses the chain rule, we have $D_t\zeta = 2\xi D_t\xi$, and consequently the following log-Sobolev-type inequality holds:
$$\mathrm{Ent}\,\xi^2 \le 2E\|D\xi\|_H^2, \tag{7}$$
where $D\xi$ is considered as a random element in $H = L_2(0,\infty)$. By a proper approximation procedure, one can show that (7) holds for every $\xi \in L_2(\Omega, P)$ that has a Malliavin derivative $D\xi \in L_2(\Omega, P, H)$.
The previous example is classical and well known. The next one apparently is new, which is a bit surprising because its main ingredients (the Malliavin calculus on the Poisson space and the respective analogue of the Clark-Ocone representation (4), (5)) are well known (cf. [2,5]).
Example 2 (Log-Sobolev inequality on the Poisson path space). Let $N_t$, $t \ge 0$, be a Poisson process with intensity $\lambda$, and let $\mathcal F = \sigma(N)$. Denote by $\tau_k$, $k \ge 1$, the moments of consecutive jumps of the process $N$, and by $\mathcal F_t = \sigma(N_s, s \le t)$, $t \ge 0$, the natural filtration for $N$. For any variable of the form $\xi = F(\tau_1, \dots, \tau_n)$ with some $n \ge 1$ and some compactly supported $F \in C^1(\mathbb R^n)$, define the random element $D\xi$ in $H = L_2(0,\infty)$ in the standard way (see [2,5]). Denote by the same symbol $D$ the closure of $D$, considered as an unbounded operator $L_2(\Omega, P) \to L_2(\Omega, P, H)$. Then the following analogue of the Clark-Ocone representation (4), (5) is available ([5]): for every $\zeta$ that possesses the stochastic derivative $D\zeta$, the following martingale representation holds:
$$\zeta = E\zeta + \int_0^\infty \eta_t\,d\tilde N_t,$$
where $\tilde N_t = N_t - \lambda t$ denotes the compensated Poisson process corresponding to $N$, and $\{\eta_t\}$ is the projection in $L_2(\Omega, P, H)$ of $D\zeta$ on the subspace generated by the $\{\mathcal F_t\}$-predictable processes.
Proceeding in the same way as in the previous example, we obtain the following log-Sobolev-type inequality on the Poisson path space:
$$\mathrm{Ent}\,\xi^2 \le 4\lambda E\|D\xi\|_H^2.$$

Trimmed regions on R d and associated integral bounds for the entropy
Let $\mu$ be a probability measure on $\mathbb R^d$ with Borel $\sigma$-algebra $\mathcal B(\mathbb R^d)$. Our further aim is to apply the general martingale bound from Theorem 1 in the particular setting
$$(\Omega, \mathcal F, P) = \big(\mathbb R^d, \mathcal B(\mathbb R^d), \mu\big).$$
In what follows, we denote $\mathcal N_\mu = \{A \in \mathcal F : \mu(A) = 0\}$ (the class of $\mu$-null Borel sets).
Fix a family $\{D_t, t \in [0,1]\}$ of closed subsets of $\mathbb R^d$ such that:
(i) $D_s \subset D_t$ for $s < t$;
(ii) $\mu(D_0) = 0$ and $\bigcup_{t < 1} D_t = \mathbb R^d$;
(iii) $\bigcap_{s > t} D_s = D_t$ for every $t < 1$, and $\overline{\bigcup_{s < t} D_s} = D_t$ for every $t > 0$.
We call the sets $D_t$, $t \in [0,1]$, trimmed regions, following the terminology used frequently in multivariate analysis (cf. [7]). Given the family $\{D_t\}$, we define the respective trimmed filtration $\{\mathcal F_t\}$ by the following convention: denote $Q_t = \mathbb R^d \setminus D_t$, $t \in [0,1]$.
Then, by definition, a set $A \in \mathcal F$ belongs to $\mathcal F_t$ if either $A \cap Q_t \in \mathcal N_\mu$ or $Q_t \setminus A \in \mathcal N_\mu$. By construction, $\mathbb F = \{\mathcal F_t\}$ is complete. It is also clear that, by property (ii) of the family $\{D_t\}$, the $\sigma$-algebra $\mathcal F_0$ is degenerate and, by property (iii), the filtration $\mathbb F$ is continuous. Hence, we can apply Theorem 1.
Fix a Borel measurable function $g: \mathbb R^d \to \mathbb R^+$ that is square integrable w.r.t. $\mu$. Consider it as a random variable on $(\Omega, \mathcal F, P) = (\mathbb R^d, \mathcal B(\mathbb R^d), \mu)$ and define
$$g_t = E[g|\mathcal F_t], \quad t \in [0,1].$$
Since the $\sigma$-algebra $\mathcal F_t$ possesses an explicit description, we can calculate every $g_t$ directly; namely, for $t > 0$ and $\mu$-a.a. $x$, we have
$$g_t(x) = g(x)1_{D_t}(x) + G_t 1_{Q_t}(x), \tag{8}$$
where we denote
$$G_t = \frac{1}{\mu(Q_t)}\int_{Q_t} g(y)\,\mu(dy). \tag{9}$$
In what follows, we consider the modification of the process $\{g_t\}$ defined by (8) for every $x \in \mathbb R^d$. Its trajectories can be described as follows. Denote
$$\tau(x) = \inf\{t : x \in D_t\}; \tag{10}$$
then by property (iii) of the family $\{D_t\}$ we have $\tau(x) = \min\{t : x \in D_t\}$, and by property (ii) we have $\tau(x) < 1$, $x \in \mathbb R^d$, and $\tau(x) = 0 \Leftrightarrow x \in D_0$. Then, for a fixed $x \in \mathbb R^d$, we have
$$g_t(x) = \begin{cases} G_t, & t < \tau(x),\\ g(x), & t \ge \tau(x),\end{cases}$$
which is a càdlàg function because $\{G_t\}$ is continuous on $[0,1)$.
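The path structure just described is easy to visualize on a toy example of our own: take $\mu$ uniform on $[0,1]$ and the (hypothetical) trimmed regions $D_t = [1/2 - t/2,\, 1/2 + t/2]$, for which $\tau(x) = |2x - 1|$.

```python
# Illustration (our own toy example): mu = uniform measure on [0, 1],
# trimmed regions D_t = [1/2 - t/2, 1/2 + t/2].  Compute G_t, tau(x), and
# the cadlag martingale path t -> g_t(x): G_t before tau(x), g(x) after.

def G(t, g, n=20001):
    # G_t = (1 / mu(Q_t)) * int_{Q_t} g dmu, Q_t = [0,1] \ D_t (trapezoid rule)
    if t >= 1.0:
        raise ValueError("G_t is defined for t < 1 only")
    left, right = (1.0 - t) / 2.0, (1.0 + t) / 2.0
    def trap(a, b):
        if b <= a:
            return 0.0
        step = (b - a) / (n - 1)
        total = 0.5 * (g(a) + g(b))
        for i in range(1, n - 1):
            total += g(a + i * step)
        return total * step
    return (trap(0.0, left) + trap(right, 1.0)) / (1.0 - t)

def tau(x):
    # first trimming time of x: x = 1/2 -+ t/2  =>  t = |2x - 1|
    return abs(2.0 * x - 1.0)

def g_path(t, x, g):
    # the path from (8): the average G_t before tau(x), frozen at g(x) after
    return g(x) if t >= tau(x) else G(t, g)

g = lambda x: x * x
x = 0.75                      # tau(x) = 0.5 exactly
before = g_path(0.499, x, g)  # just before the jump: close to G_{tau(x)}
after = g_path(0.5, x, g)     # at the jump time: equals g(x)

assert abs(G(0.0, g) - 1.0 / 3.0) < 1e-6   # G_0 = E g = 1/3
assert after == g(x)
assert abs(before - G(0.5, g)) < 1e-2      # left limit equals G_{tau(x)}
```

The jump of the path at $\tau(x)$ equals $g(x) - G_{\tau(x)}$, the quantity that drives the integral bound of the next theorem.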

Theorem 2.
Let $g: \mathbb R^d \to \mathbb R^+$ be a Borel measurable function, square integrable w.r.t. $\mu$. Let $\{D_t\}$ be a family of trimmed regions that satisfies (i)-(iii). Then
$$\mathrm{Ent}_\mu\, g \le \int_{\mathbb R^d} \frac{\big(g(x) - G_{\tau(x)}\big)^2}{G_{\tau(x)}}\,\mu(dx),$$
where the functions $G$ and $\tau$ are defined by (9) and (10), respectively.

Proof.
We have already verified the assumptions of Theorem 1: the filtration $\{\mathcal F_t\}$ is complete and right-continuous, and the square-integrable martingale $\{g_t\}$ has càdlàg trajectories. Because $g_1 = g$ a.s. and $\mathcal F_0$ is degenerate, by Theorem 1 we have the bound
$$\mathrm{Ent}_\mu\, g \le E\int_0^1 \frac{1}{g_{t-}}\,d\langle g\rangle_t.$$
Hence, we only have to specify the integral in the right-hand side of this bound. Namely, our aim is to prove that
$$E\int_0^1 \frac{1}{g_{t-}}\,d\langle g\rangle_t = \int_{\mathbb R^d} \frac{\big(g(x) - G_{\tau(x)}\big)^2}{G_{\tau(x)}}\,\mu(dx). \tag{11}$$
First, we observe the following.

Lemma 1.
Let $0 < s < t < 1$, and let $\alpha$ be a bounded $\mathcal F_s$-measurable random variable. Then
$$E\alpha\big(\langle g\rangle_t - \langle g\rangle_s\big) = \int_{D_t\setminus D_s}\alpha(x)\big(g(x) - G_{\tau(x)}\big)^2\,\mu(dx).$$

Proof. By the definition of $\langle g\rangle$,
$$E\alpha\big(\langle g\rangle_t - \langle g\rangle_s\big) = E\alpha\big(g_t^2 - g_s^2\big).$$
We have $E\alpha(g_t^2 - g_s^2) = E\alpha\big(E[g_t^2|\mathcal F_s] - g_s^2\big)$, and applying formula (8) with $g_t^2$ in place of $g$ and $s$ in place of $t$, we can compute $E[g_t^2|\mathcal F_s]$ explicitly. Because $\alpha$ is $\mathcal F_s$-measurable, it equals a constant on $Q_s$ $\mu$-a.s. Denote this constant by $A$; then the previous calculation gives
$$E\alpha\big(g_t^2 - g_s^2\big) = A\Big(\int_{D_t\setminus D_s} g^2(x)\,\mu(dx) + G_t^2\mu_t - G_s^2\mu_s\Big) =: H_{t,s},$$
where $\mu_v = \mu(Q_v)$ and $I_v = \int_{Q_v} g(y)\,\mu(dy)$, so that $G_v = I_v/\mu_v$. Observe that $\mu_v$, $v \in [0,1]$, and $I_v$, $v \in [0,1]$, are continuous functions of bounded variation and $\mu_v > 0$, $v < 1$. Then
$$G_t^2\mu_t - G_s^2\mu_s = \int_s^t \big(2G_v\,dI_v - G_v^2\,d\mu_v\big).$$
It is easy to show that
$$\int_s^t G_v\,dI_v = -\int_{D_t\setminus D_s} g(x)G_{\tau(x)}\,\mu(dx). \tag{12}$$
Indeed, because $G$ is continuous on $[0,1)$, the left-hand side integral can be approximated by the integral sum $\sum_{k=1}^m G_{v_k}(I_{v_k} - I_{v_{k-1}})$ corresponding to a partition $s = v_0 < v_1 < \dots < v_m = t$. This sum equals $-\sum_{k=1}^m \int_{D_{v_k}\setminus D_{v_{k-1}}} g(x)G_{v_k}\,\mu(dx)$, which differs from the right-hand side of (12) by a residue term that is dominated by the modulus of continuity of $G$ on $[s,t]$ multiplied by $\int g\,d\mu$ and tends to zero as the size of the partition tends to zero. This proves (12). Similarly, we can show that
$$\int_s^t G_v^2\,d\mu_v = -\int_{D_t\setminus D_s} G_{\tau(x)}^2\,\mu(dx).$$
We can summarize this calculation as follows:
$$H_{t,s} = A\int_{D_t\setminus D_s}\big(g(x) - G_{\tau(x)}\big)^2\,\mu(dx).$$
Because $\alpha(x) = A$ for $\mu$-a.a. $x \in D_t\setminus D_s$, this completes the proof.
Let us continue with the proof of (11). Assume first that $g \ge c$ with some $c > 0$. Then $g_t \ge c$, and consequently the process $1/g_{t-}$ is left-continuous and bounded. In addition, the function $G_t = I_t/\mu_t$ is bounded on every segment $[0,T] \subset [0,1)$.
Fix $T < 1$ and take a sequence $\{\lambda_n\}$ of dyadic partitions of $[0,T]$, $\lambda_n = \{t_k^n, k = 0,\dots,2^n\}$, $t_k^n = Tk2^{-n}$, and define
$$g_t^n = g_{t_{k-1}^n}, \quad t \in \big(t_{k-1}^n, t_k^n\big], \qquad g_0^n = g_0.$$
For every fixed $t > 0$, the value $g_t^n$ equals the value of $g$ at some (dyadic) point $t_n < t$, and $t_n \to t-$. Hence,
$$\frac{1}{g_t^n} \to \frac{1}{g_{t-}}, \quad n \to \infty,$$
pointwise. In addition, because of the additional assumption $g \ge c$, this sequence is bounded by $1/c$. Hence, by the dominated convergence theorem,
$$E\int_0^T \frac{1}{g_t^n}\,d\langle g\rangle_t \to E\int_0^T \frac{1}{g_{t-}}\,d\langle g\rangle_t, \quad n \to \infty;$$
here we take into account that the point $t = 0$ in the left-hand side integral is negligible because $g_t \to Eg$, $t \to 0+$, in $L_2$, and consequently $\langle g\rangle_t \to 0$, $t \to 0+$, in $L_1$. By Lemma 1,
$$E\int_0^T \frac{1}{g_t^n}\,d\langle g\rangle_t = \sum_{k=1}^{2^n}\int_{D_{t_k^n}\setminus D_{t_{k-1}^n}}\frac{\big(g(x) - G_{\tau(x)}\big)^2}{g_{t_{k-1}^n}(x)}\,\mu(dx);$$
recall that $g_{t_{k-1}^n}(x) = G_{t_{k-1}^n}$ for $x \notin D_{t_{k-1}^n}$. Next, for $x \in D_{t_k^n}\setminus D_{t_{k-1}^n}$, we have $|\tau(x) - t_{k-1}^n| \le 2^{-n}$. Because $G_t$, $t \in [0,T]$, is uniformly continuous and separated from zero, and $G_{\tau(x)}$, $x \in D_T$, is bounded, we obtain that
$$E\int_0^T \frac{1}{g_{t-}}\,d\langle g\rangle_t = \int_{D_T}\frac{\big(g(x) - G_{\tau(x)}\big)^2}{G_{\tau(x)}}\,\mu(dx).$$
Taking $T \to 1-$ and applying the monotone convergence theorem to both sides of the above identity, we get (11). To remove the additional assumption $g \ge c$, consider the family $g^n = g + 1/n$; then $g_t^n = g_t + (1/n)$. Hence, we can write (11) for $g^n$, apply the monotone convergence theorem to both sides of this identity, and get (11) for $g$.
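The integral bound of Theorem 2 can be probed numerically. The sketch below (our own) presumes that the bound takes the form $\mathrm{Ent}_\mu\, g \le \int (g(x) - G_{\tau(x)})^2/G_{\tau(x)}\,\mu(dx)$ — our reading of the theorem, stated here as an assumption — and checks it for the uniform measure on $[0,1]$ with symmetric trimmed segments and the hypothetical function $g(x) = x^2 + 0.1$.

```python
import math

# Numerical sketch (our own): check the entropy bound, assumed here in the
# form  Ent_mu(g) <= int (g - G_tau)^2 / G_tau dmu,  for mu = uniform on
# [0, 1], trimmed regions D_t = [1/2 - t/2, 1/2 + t/2], g(x) = x^2 + 0.1.

N = 100000
xs = [(i + 0.5) / N for i in range(N)]   # midpoint-rule nodes for mu

def g(x):
    return x * x + 0.1

def G(t):
    # average of g over Q_t = [0, (1-t)/2) U ((1+t)/2, 1], in closed form
    a, b = (1.0 - t) / 2.0, (1.0 + t) / 2.0
    mass = 1.0 - t
    integral = a ** 3 / 3.0 + (1.0 - b ** 3) / 3.0 + 0.1 * mass
    return integral / mass

def tau(x):
    return abs(2.0 * x - 1.0)

Eg = sum(g(x) for x in xs) / N
entropy = sum(g(x) * math.log(g(x)) for x in xs) / N - Eg * math.log(Eg)
bound = sum((g(x) - G(tau(x))) ** 2 / G(tau(x)) for x in xs) / N

print(entropy, bound)
assert 0.0 < entropy <= bound
```

For this example the two sides are close (roughly $0.10$ versus $0.11$), which suggests the integral bound is rather tight here.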

One corollary: a weighted log-Sobolev inequality on R
In this section, we show how the integral bound for the entropy established in Theorem 2 can be used to obtain weighted log-Sobolev inequalities. Consider a continuous probability measure $\mu$ on $(\mathbb R, \mathcal B(\mathbb R))$ and denote by $p_\mu$ the density of its absolutely continuous part. Fix a family of segments $D_t = [a_t, b_t]$, $t \in [0,1)$, where $a_0 = b_0$, the function $a_t$ is continuous and decreasing to $-\infty$ as $t \to 1-$, and the function $b_t$ is continuous and increasing to $+\infty$ as $t \to 1-$. Then the family $\{D_t\}$ (with $D_1 = \mathbb R$) satisfies the assumptions imposed before. Hence, Theorem 2 is applicable.
We call a function $f: \mathbb R \to \mathbb R$ symmetric w.r.t. the family $\{D_t\}$ if $f(a_t) = f(b_t)$ for every $t \in [0,1)$. In the following proposition, we apply Theorem 2 to $g = f^2$, where $f$ is smooth and symmetric.
Proposition 1. Let $f: \mathbb R \to \mathbb R$ be a smooth function that is symmetric w.r.t. the family $\{D_t\}$. Then
$$\mathrm{Ent}_\mu f^2 \le \int_{\mathbb R} W(x)\,f'(x)^2\,\mu(dx),$$
where the weight $W$ is given explicitly in terms of $F_\mu$ and the family $\{D_t\}$, separately for $x > a_0$ and $x < a_0$.

Proof.
Write
$$g(x) - G_{\tau(x)} = \frac{1}{\mu(Q_{\tau(x)})}\int_{Q_{\tau(x)}}\big(g(x) - g(y)\big)\,\mu(dy), \qquad g = f^2.$$
Let us analyze the expression on the right-hand side. Observe that now $Q_{\tau(x)}$ is the union of the two intervals $(-\infty, a_{\tau(x)})$ and $(b_{\tau(x)}, +\infty)$. Denote by $F_\mu$ the distribution function of $\mu$. The point $x$ equals either $a_{\tau(x)}$ or $b_{\tau(x)}$; hence, because $g = f^2$ is symmetric,
$$g(x) = g\big(a_{\tau(x)}\big) = g\big(b_{\tau(x)}\big).$$
Then we have
$$g(x) - g(y) = \int_y^{a_{\tau(x)}} g'(u)\,du, \quad y < a_{\tau(x)}, \qquad g(x) - g(y) = -\int_{b_{\tau(x)}}^{y} g'(u)\,du, \quad y > b_{\tau(x)}.$$
Consequently, using Fubini's theorem, we get
$$\int_{Q_{\tau(x)}}\big(g(x) - g(y)\big)\,\mu(dy) = \int_{-\infty}^{a_{\tau(x)}} F_\mu(u)\,g'(u)\,du - \int_{b_{\tau(x)}}^{+\infty}\big(1 - F_\mu(u)\big)\,g'(u)\,du.$$
Because $g = f^2$ and hence $g' = 2ff'$, by the Cauchy inequality the square of this expression is bounded by an integral of $f'(u)^2$ with an explicit weight. Hence, by Theorem 2 and Fubini's theorem, we arrive at the required bound; similarly to the proof of (12), we can show that the last identity in this calculation holds because $\mu_0 = 1$. This completes the proof.
Next, we develop a symmetrization procedure in order to remove the restriction that $f$ be symmetric. For any $x \ne a_0$, one border point of the segment $D_{\tau(x)}$ equals $x$; let us denote by $s(x)$ the other border point, and put $s(a_0) = a_0$. Define the $\sigma$-algebra $\bar{\mathcal F}$ of symmetric sets $A \in \mathcal F$, that is, such that $x \in A \Leftrightarrow s(x) \in A$. For a function $f \in L_2(\mathbb R, \mu)$, consider its $L_2$-symmetrization
$$\bar f = \big(E[f^2|\bar{\mathcal F}]\big)^{1/2}.$$
It can be seen easily that there exists a measurable function $p: \mathbb R \to [0,1]$ such that, for $\mu$-a.a. $x \in \mathbb R$,
$$\bar f^2(x) = p(x)f^2(x) + q(x)f^2\big(s(x)\big),$$
where we denote $q(x) = 1 - p(x)$. We have
$$\mathrm{Ent}_\mu f^2 = \mathrm{Ent}_\mu \bar f^2 + E\,\mathrm{Ent}\big(f^2\,\big|\,\bar{\mathcal F}\big),$$
and, consequently, the difference $\mathrm{Ent}_\mu f^2 - \mathrm{Ent}_\mu \bar f^2$ is controlled by the conditional entropies over the two-point "atoms" $\{x, s(x)\}$. It is well known (cf. [8]) that for a Bernoulli measure $\nu = p\delta_1 + q\delta_{-1}$ ($p + q = 1$), the following discrete analogue of the log-Sobolev inequality holds:
$$\mathrm{Ent}_\nu h^2 \le C_p\big(h(1) - h(-1)\big)^2.$$
This yields the bound
$$\mathrm{Ent}_\mu f^2 - \mathrm{Ent}_\mu \bar f^2 \le \int_{\mathbb R} C_{p(x)}\big(f(x) - f(s(x))\big)^2\,\mu(dx).$$
By the Cauchy inequality, $(f(x) - f(s(x)))^2$ is bounded by an integral of $f'(u)^2$ over the segment with endpoints $s(x)$ and $x$, and, similarly to the proof of (12), the resulting bound for the difference $\mathrm{Ent}_\mu f^2 - \mathrm{Ent}_\mu (\bar f)^2$ can be formulated in terms of $f'$.
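The discrete (two-point) log-Sobolev inequality mentioned above can be checked directly in the symmetric case; the sketch below (our own) uses the classical constant $1/2$ for $p = q = 1/2$ — an assumption of this sketch, consistent with the value $C_{1/2} = 1/2$ used later in the text.

```python
import math

# Sketch (our own): verify the two-point log-Sobolev inequality for the
# symmetric Bernoulli measure nu = (delta_1 + delta_{-1}) / 2:
#   Ent_nu(h^2) <= (1/2) * (h(1) - h(-1))^2,
# i.e. the constant C_p equals 1/2 at p = q = 1/2 (assumed classical value).

def ent_two_point(a, b):
    # Ent_nu(h^2) for h(1) = a, h(-1) = b, nu uniform on {1, -1}
    def xlogx(x):
        return 0.0 if x == 0.0 else x * math.log(x)
    mean = (a * a + b * b) / 2.0
    return (xlogx(a * a) + xlogx(b * b)) / 2.0 - xlogx(mean)

worst = 0.0
for i in range(201):
    for j in range(201):
        a, b = i * 0.05, j * 0.05          # grid of nonnegative values
        lhs = ent_two_point(a, b)
        rhs = 0.5 * (a - b) ** 2
        assert lhs <= rhs + 1e-12
        if rhs > 0.0:
            worst = max(worst, lhs / rhs)

print(worst)  # the ratio stays below 1; it approaches 1 as a -> b
```

The ratio tends to $1$ as $a \to b$, so the constant $1/2$ cannot be improved in the symmetric case.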
Note that $C_p \le 1$ for any $p \in [0,1]$, and hence the factor $C_{p(x)}$ in the above bound can be replaced by $1$. Assuming that the bound from Proposition 1 is applicable to $\bar f$ (which is yet to be studied because $\bar f$ may fail to be smooth), we obtain an inequality, valid without the assumption of symmetry of $f$. The right-hand side of this inequality contains the derivative of $\bar f$ and hence depends on the choice of the family of trimmed regions $\{D_t\}$. We further give a particular corollary, which appears when $\{D_t\}$ is the set of quantile trimmed regions. In what follows, we assume $\mu$ to possess a positive distribution density $p_\mu$ and choose $\{D_t = [a_t, b_t]\}$ in the following way. Denote $q_v = F_\mu^{-1}(v)$, that is, the quantile of $\mu$ of level $v$, and put
$$a_t = q_{1/2 - t/2}, \qquad b_t = q_{1/2 + t/2}, \qquad t \in [0,1).$$
In particular, $a_0 = b_0 = m$, the median of $\mu$. Denote also $\bar F_\mu = \min(F_\mu, 1 - F_\mu)$; observe that now we have
$$\bar F_\mu(a_t) = \bar F_\mu(b_t) = \frac{1-t}{2}, \quad t \in [0,1).$$

Theorem 3. Let $\mu$ be a probability measure on $\mathbb R$ with positive distribution density $p_\mu$. Then, for any absolutely continuous $f$, inequality (13) holds with the functions $U$ and $W$ therein written explicitly in terms of $\bar F_\mu$, $p_\mu$, and $s$.

Proof. First, observe that now the $L_2$-symmetrization of a function $f$ has the form
$$\bar f(x) = \Big(\frac{f^2(x) + f^2(s(x))}{2}\Big)^{1/2}. \tag{14}$$
This identity is evident for functions $f$ of the form $1_{(-\infty, F_\mu^{-1}(v))}$, $v \in (0, 1/2]$, and $1_{[F_\mu^{-1}(v), \infty)}$, $v \in [1/2, 1)$, and then easily extends to general $f$. Next, observe that
$$s(x) = F_\mu^{-1}\big(1 - F_\mu(x)\big), \tag{15}$$
and because $F_\mu$ is absolutely continuous and strictly increasing, $s(x)$ is absolutely continuous as well. Then $\bar f$ is absolutely continuous with
$$\bar f'(x) = \frac{f(x)f'(x) + f(s(x))f'(s(x))\,s'(x)}{\sqrt{2\big(f^2(x) + f^2(s(x))\big)}};$$
here and below the derivatives are well defined for a.a. $x$. Using a standard localization/approximation procedure, we can show that Proposition 1 is applicable to any absolutely continuous function. Hence, it is applicable to $\bar f$, and (13) holds.
We have
$$s'(x) = -\frac{p_\mu(x)}{p_\mu(s(x))},$$
which follows by differentiating the identity $F_\mu(s(x)) = 1 - F_\mu(x)$ implied by (15). The function $W(x)$ in (13) now can be rewritten in terms of $\bar F_\mu$ and $p_\mu$. Let us analyze the second integral in the right-hand side. Change the variables $y = s(x)$; observe that we have $x = s(y)$ and $\bar F_\mu(x) = \bar F_\mu(y)$. Then we finally get that this integral also takes the form $\int_{\mathbb R} W(y)f'(y)^2\,\mu(dy)$, and therefore the required form of the bound follows. On the other hand, by identity (14), we now have $C_{p(x)} = 1/2$, and the function $U(x)$ in (13) can be rewritten accordingly, which completes the proof of the statement.
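The reflection point $s(x) = F_\mu^{-1}(1 - F_\mu(x))$ and the derivative identity $s'(x) = -p_\mu(x)/p_\mu(s(x))$, which follows by differentiating $F_\mu(s(x)) = 1 - F_\mu(x)$, can be sanity-checked for a concrete $\mu$; the sketch below (our own) uses the exponential distribution, where everything is available in closed form.

```python
import math

# Sketch (our own): for mu = Exp(1), F(x) = 1 - exp(-x), check the reflection
# point s(x) = F^{-1}(1 - F(x)) and the identity s'(x) = -p(x)/p(s(x)),
# obtained by differentiating F(s(x)) = 1 - F(x).

F = lambda x: 1.0 - math.exp(-x)
Finv = lambda v: -math.log(1.0 - v)
p = lambda x: math.exp(-x)
s = lambda x: Finv(1.0 - F(x))          # here s(x) = -log(1 - exp(-x))

Fbar = lambda x: min(F(x), 1.0 - F(x))  # \bar F_mu = min(F_mu, 1 - F_mu)

for x in (0.3, 0.6931, 1.0, 2.5):
    # s is an involution, and \bar F_mu is s-invariant
    assert abs(s(s(x)) - x) < 1e-9
    assert abs(Fbar(s(x)) - Fbar(x)) < 1e-12
    # finite-difference check of s'(x) = -p(x)/p(s(x))
    h = 1e-6
    num = (s(x + h) - s(x - h)) / (2.0 * h)
    assert abs(num - (-p(x) / p(s(x)))) < 1e-4
```

The median $m = \log 2$ is the fixed point of $s$, in agreement with $s(a_0) = a_0$.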